Typed Logic: Using Difference Lists

Wednesday, August 13, 2008

Using Difference Lists

I posted my enthusiasm about difference lists for Haskell earlier this month, but then became sidetracked by combinators and reading lists, so, let's get back on track.

Difference lists are a (very) useful data structure in Prolog, particularly when it comes to parsing with definite clause grammars (DCGs) and when it comes to list construction (again, using DCGs). Why? Well, to answer that question, let's examine the structure of difference lists. The Prolog (or relational) representation of a difference list is of the form:

X = [1,2,3|Y] - Y

What that relation describes is that X is the difference of the pair of lists of (first) [1,2,3] appended with the list Y minus (second) the list Y.

Well, what is the value of the list Y? To a Prologian, this question is not important (for Y, being a logic variable, can be any list or, even, not a list at all because the variable is still uninstantiated (a "free" variable)). Its importance is that it gives us an "instant" index into the list after the elements we've already processed. The significance of that index for parsing is that we now have a stream over the parsed (list) data, which gives us the ability to translate BNF grammars directly into working parsers. The significance for list construction is that we "simply" walk "backward" through the DCG to construct a list from a (parsed) internal representation, without paying the linear cost of either (a) traversing the list each time to add a new element (which results in exponential time list construction) or (b) building the list in reverse (adding new elements to the head of the list) and then applying reverse to the end result.

"Big deal!" you scoff, "zippers [also called 'finger trees'] already give us indices into data structures. Who needs a funny-looking specialized zipper just for lists?"

"Good point!" is my response. Zippers are powerful and flexible data structures that not only provide that index over any data structure, but, even better than difference list's index that only goes forward (with any dignity) in constant time, the zipper's index can go both forward and backward through that (generic) structure in constant time. So, if you are not familiar with zippers, I request that you run, don't walk, to a good overview to learn more about their utility.

With all their flexibility and power, in certain situations, a zipper is not the correct choice for some list processing tasks; the difference list's simplicity shines in these situations. First, for the unidirectional index of the difference list (it moves forward through the list well, but backwards ... not so much), this, for list construction purposes, is hardly ever an issue, and if it becomes one, then backtracking is easily installed with some MonadPlus magic. Next, you, the coder, are responsible for constructing a zipper list data type each time you need one, but Don Stewart has provided Data.DList; no construction necessary! Last, an example structure of a zipper list, for the list a++b is (reverse a,b), so if we wish to reconstruct the original list, for an a of large magnitude, we still pay the linear (reverse) cost in reconstruction as well as the linear cost of appending b to that result.

You heard it here first, folks — difference lists: [zipper lists] can't touch this!

Okay, I've finished throwing it down for now. Let's look at difference lists from the Haskell perspective and then use this data structure in a list construction example.

> import Data.DList

For my examples, I'm going to be looking at the subset of difference lists that I can compare and see:

> instance Eq a => Eq (DList a) where
>     x == y = toList x == toList y

> instance (Eq a, Show a) => Show (DList a) where
>     show x | x == empty = "\\x -> [x]"
>            | otherwise  = "\\x -> [" ++ show (head x)
>                           ++ doShow (toList (tail x)) ++ ":x]"
>         where doShow [] = ""
>               doShow (a:b) = ", " ++ show a ++ doShow b

With the above instance declarations, we see that

(singleton 1) `snoc` 2 `snoc` 3

gives us the representation

λ x . [1, 2, 3:x]

... snoc appends an element to the end of a list (so it's the reverse of cons).

Prologians are always smug with their ability to create an infinite stream of 1s with X = [1|X]. And, no doubt about it, Prolog is a programming language that makes list construction easy ... *ahem* almost as easy as list construction in Haskell ("Your list comprehensions outshine even Prolog for sure..."):

repeat 1
[1,1..]
... and now with difference lists ...
let x = unDL (singleton 1) x in x

But no self-respecting Prologian would freeze their interpreter with a such a rash proof; Haskellians have no such concern, thanks to weak head normal form:

take 5 (repeat 1)

So there!

Let's examine an application of difference lists. I have a little scanner that I use to help me in my oration. The meat of the algorithm (before difference lists) was as follows:

> lettersIn word = lettersIn' word []

> lettersIn' [] ans = reverse ans
> lettersIn' (h:t) ans | isLetter h = lettersIn' t (h:ans)
>                      | otherwise  = reverse ans

Ugh, yes, reverses galore! This is an oft-recurring pattern in Haskell code that is very easily fixed with some difference list magic:

> lettersIn word = lettersIn' word empty

> lettersIn' [] ans = toList ans
> lettersIn' (h:t) ans | isLetter h = lettersIn' t (ans `snoc` h)
>                      | otherwise  = toList ans

A very simply example, I grant you, for if we take a list whose elements are all in the set of the alphabet, the above code is a very complicated way of reinventing the id function. Or, put another way, who's going to teach me about scanl? Hm, would I be required to use the continuation Monad to escape the scan, I wonder? But you get the general idea for list construction, right? Instead of building the list by adding to the head ("cons"ing) and then applying reverse to the result, the different list allows one to "snoc" each element and return the correctly ordered result. For a single list construction, the benefit may not be all that obvious, but when one builds, e.g., an XML generator for a complex internal representation, reversing the reverse of the reversed reversed child element becomes an odious labour — difference lists eliminate all the internal juggling, replacing that complication with a straightforward constant-time list and element in-place append.

So, next time you need to do some list construction, instead of falling back on the reverse standby, flex your new difference list muscles; you'll be pleasantly surprised.