Saturday, June 21, 2014

keepEquals with Difference Lists

Yesterday's 1HaskellADay exercise was keepEqual, but it could easily have been ‘keepSimple’ or ‘write this in your sleep’ or ... something like that.

It was a good exercise, because there’s the obvious solution but then you look to improve the efficiencies.

And then, of course, there’re the wrong solutions, but that’s what we have proofs for, to prove ourselves into the correct implementations.

So the first stab I did was wrong:

keepEqual1 : Eq a => List a -> List a -> List a
keepEqual1 list1 list2 = list1 >>= \h1 =>
                         list2 >>= \h2 =>
                         (if h1 == h2 then return h1 else [])

The problem here is that this algorithm, albeit cleverish, is, well: wrong.

It iterates through each element of list1, fine, but it compares the currently-selected element of that list to every element of list2, when keepEqual should only compare the elements sitting at the same position.  There’s also a tiny problem if either list1 or list2 is infinite: the binding operation goes through every element of the list (inner, then outer) before returning, which could be a problem if you ever want a return in this eternity.

Minor problem there.

The other problem is that this is a quadratic algorithm: for each element of list1, it iterates through the whole of list2 looking for matches.
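
To make the wrongness concrete, here is a small sketch of the same all-pairs behaviour with the nesting spelled out. The name allPairsEqual is made up for illustration (it is not part of the exercise), and it assumes the Prelude's concatMap:

```idris
-- Hypothetical restatement of keepEqual1: every element of xs is
-- compared against every element of ys, not just its positional partner.
allPairsEqual : Eq a => List a -> List a -> List a
allPairsEqual xs ys =
  concatMap (\x => concatMap (\y => if x == y then [x] else []) ys) xs

-- Positionally only the first pair (1, 1) matches, so we would want [1].
-- The all-pairs version reports each 1 in xs once per 1 in ys instead.
example : List Nat
example = allPairsEqual [1, 2, 1] [1, 1, 3]   -- [1, 1, 1, 1]
```

Two nested passes, one over each list, is also where the quadratic cost comes from.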


So. Cleverish approach was a fail on my part. Shall we try the traditional approach?

keepEqual2 : Eq a => List a -> List a -> List a
keepEqual2 [] _ = []
keepEqual2 _ [] = []
keepEqual2 (h1 :: t1) (h2 :: t2) =
   (if h1 == h2 then [h1] else []) ++ (keepEqual2 t1 t2)

So, that’s traditional. That works (and we can choose to verify that it does work), and it terminates as soon as either list runs out, thereby neatly dodging the non-termination-with-infinite-list-arguments issue.

The problem here is that we are representing choice with a list-comprehension-esque algorithm, so at every step we concatenate the rest of the result onto either a single-element list or, in the case of a non-match, the empty list.

That algorithm, concatenation-no-matter-what, just screams: “Improve me! Improve me, please!”

So, okay, one improvement is we can turn our choice-point from the concatenation above into either construction or a plain return:

   (if h1 == h2
    then (h1 :: (keepEqual2 t1 t2))
    else keepEqual2 t1 t2)

Okay, is that yummy-looking?

No. No, it is not.

I mean, for efficiency’s sake we are eliminating the whole ‘concatenate after the empty list’ operation for the not-equals case, and keepEqual2 is being called only once in this branch, but ...

But it’s still written twice, and that’s ugly. Why do we have to express something twice for the concept of ‘okay, add the head only if it’s the same and then continue on with the rest of the two lists’? I mentioned the continuation (ooh! continuations!) just once here in my denotation, so why do I have to mention it twice in my implementation?

Well, I don’t have to, actually. We’re working in a functional domain, so let’s get functional!

   (if h1 == h2 then ((::) h1) else id) (keepEqual2 t1 t2)

Boom! ... actually, not so ‘boom’, because the overloaded (::) confuses Idris (Vect, Stream, and List all use (::) as a constructor), so let’s clarify that:

cons : a -> List a -> List a
cons a list = a :: list

So, now we can write:

   (if h1 == h2 then (cons h1) else id) (keepEqual2 t1 t2)
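
Slotted back into the full definition, that line gives something like the sketch below. The name keepEqual3 is only there to keep it apart from keepEqual2; cons is the helper defined above, repeated so the snippet stands alone:

```idris
-- Pins (::) to List so Idris does not have to pick between List, Vect and Stream.
cons : a -> List a -> List a
cons x xs = x :: xs

-- keepEqual3 is a hypothetical name: keepEqual2 with the branch chosen as a function.
keepEqual3 : Eq a => List a -> List a -> List a
keepEqual3 [] _ = []
keepEqual3 _ [] = []
keepEqual3 (h1 :: t1) (h2 :: t2) =
  -- Pick the function first (keep the head, or do nothing),
  -- then apply it to the one and only recursive call.
  (if h1 == h2 then cons h1 else id) (keepEqual3 t1 t2)

-- Positions 0 and 2 agree; the second list runs out before position 3.
example : List Nat
example = keepEqual3 [1, 2, 3, 4] [1, 5, 3]   -- [1, 3]
```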

I actually had this little partial-application revelation just now, so my actual implementation involved me creating the difference list data type, which allows constant-time prepend and append operations. Which is better? Partial application or the difference list?

Well, let’s take a look at my difference list implementation so we can judge their respective merits.

data DList a = DL (List a -> List a)

A difference list is the difference between two lists, and we represent this difference by capturing it in a function: one that prepends the list’s contents onto whatever list it is handed.

unDL : DList a -> (List a -> List a)
unDL (DL f) = f

And we can, of course, extract our function from the difference list type.

(<<) : DList a -> a -> DList a
list << a = DL ((unDL list) . (cons a))

Our append operator (<<) appends an element to the tail end of a DList in constant time (if it were needed, the corresponding (>>) operator prepends an element to the head of a DList, also in constant time).

So, we have our DList structure, but we need to start from something, and then, eventually convert back to a plain-old List structure. The below functions provide those functionalities for us:

emptyDL : DList a
emptyDL = DL id

dlToList : DList a -> List a
dlToList dlist = with List (unDL dlist [])
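
To see why appending stays cheap, it may help to trace a tiny example. The sketch below repeats the definitions so it stands alone, and uses the made-up names snoc and prepend in place of the (<<) and (>>) operators, purely to sidestep operator fixity; treat it as an illustration rather than the code I actually ran:

```idris
data DList a = DL (List a -> List a)

unDL : DList a -> (List a -> List a)
unDL (DL f) = f

cons : a -> List a -> List a
cons x xs = x :: xs

emptyDL : DList a
emptyDL = DL id

-- snoc is a hypothetical stand-in for (<<): one extra composition at the tail end.
snoc : DList a -> a -> DList a
snoc list x = DL (unDL list . cons x)

-- prepend is a hypothetical stand-in for (>>): one extra composition at the head.
prepend : a -> DList a -> DList a
prepend x list = DL (cons x . unDL list)

dlToList : DList a -> List a
dlToList dlist = unDL dlist []

-- Builds ((id . cons 1) . cons 2) . cons 3 and only then applies it to [].
oneTwoThree : List Nat
oneTwoThree = dlToList (snoc (snoc (snoc emptyDL 1) 2) 3)                 -- [1, 2, 3]

zeroToThree : List Nat
zeroToThree = dlToList (prepend 0 (snoc (snoc (snoc emptyDL 1) 2) 3))     -- [0, 1, 2, 3]
```

Each snoc or prepend only wraps one more function around the existing one; the actual list is built once, when dlToList feeds in the empty list.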

Now, with the difference list type, we rewrite our keepEqual in the natural style.

keepEqual : Eq a => List a -> List a -> List a
keepEqual' : Eq a => DList a -> List a -> List a -> List a

keepEqual = keepEqual' emptyDL

keepEqual' dl [] _ = dlToList dl
keepEqual' dl _ [] = dlToList dl
keepEqual' dl (h1 :: t1) (h2 :: t2) =
   keepEqual' (if (h1 == h2) then dl << h1 else dl) t1 t2

So, what do you think? For this example the DList doesn’t really shine all that much, but if we had something where appending to the end (in constant time) made a whole lot of sense, then we would have a clear win for it.

I like DList.

1 comment:

Pseudonym said...

One place where difference lists work extremely well is the Show typeclass, which shouldn't be surprising.

However, one of my favourites is GHC's sort function in Data.List. It's well worth a look.