Thursday, July 31, 2008

Don't know; don't care: Whatever

The Maybe Monad type (covered in this blog in May) is a practical data type in Haskell and is used extensively. Since it is also a MonadPlus type, it is also immediately useful for logic programming, particularly of the semideterministic variety. Personal, I view Maybe as a second-order Boolean type, which, in itself, is a very powerful mode of expression, as George Boole demonstrated. Do yourself a favor, reread his works on algebra, or perhaps some musings from others, such as Dijkstra. Just as Donald Knuth showed that any n-tree could be transformed into an equivalent binary tree, and Paul Tarau put that to good, ahem, clause and effect, the Boolean algebra can model many other forms of algebras ... and does, for, after all, last I checked all my calculations are reduced to boolean representations (some call them binary digits).

Yes, Maybe's all well and good (and for the most part, it is — it's very well-suited to describe a large class of problems). But, as a dyed-in-the-wool logic programmer, Haskell has a bit of a nagging problem capturing the concept of logic variables ... that is, a thing that either may be grounded to a particular value (which Maybe already has when the instance is Just x) or may be free (which Maybe does not have). In short, the problem is with Nothing. And what is the problem? Values resolving to Nothing in a monadic computation, because they represent mzero, or failure, propagate throughout the entire (possibly chained) computation, forcing it to abort. Now, under the Maybe protocol, this is the desired result, but, when doing logic programming, it is often desirable to proceed with the computation until the value is grounded.

Now, deferred computation in Haskell is not a novel concept. In fact, there are several approaches to deferred, or nondeterministic, assignment, including my own unification recipe as well as the credit card transformation (which the author immediately disavows). The problem with these approaches is that they require resolution within that expression. What we need is a data type that captures succinctly this decided or undecided state. This we do with the Whatever data type:

data Whatever x = Only x | Anything


Just like Mabye, Whatever can be monadic:

instance Functor Whatever where
fmap
f (Only x) = Only (f x)
fmap f Anything = Anything

instance Monad Whatever where
return
x = Only x
Only x >>= f = f x
Anything >>= f = Anything


You'll note that the monadic definition of Whatever is the same as Maybe, with the exception that Maybe defines fail on Nothing, whereas Whatever has no such semantics.

Even though Whatever is monadic, the interesting properties of this data type come to the fore in its more ordinary usage ...

instance Eq x ⇒ Eq (Whatever x) where
Only x ≡ Only y = x ≡ y
Anything ≡ _ = True
_ ≡ Anything = True


... and with that definition, simple logic puzzles, such as the following, can be constructed:

In a certain bank the positions of cashier, manager, and teller are held by Brown, Jones and Smith, though not necessarily respectively.

The teller, who was an only child, earns the least.
Smith, who married Brown's sister, earns more than the manager.

What position does each man fill?1


An easy enough puzzle solution to construct in Prolog, I suppose, and now given the Whatever data type, easy enough in Haskell, too! Starting from the universe of pairs that contain the solution ...

data Man = Smith | Jones | Brown deriving (Eq, Show)
data Position = Cashier | Manager | Teller deriving (Eq, Show)
type Sibling = Bool
type Ans = (Position, Man)

universe :: [Ans]
universe = [(pos, man) | pos ← [Cashier, Manager, Teller],
man ← [Smith, Jones, Brown]]


... we restrict that universe under the constraint of the first rule ("The teller, who was an only child...") by applying "only child"ness to both the Teller and to the Man holding that Position (as obtained from the fact: "... Brown's sister ..."), but, importantly, abstaining from defining a sibling restriction on the other Positions ...

-- first rule definition: The teller must be an only child
sibling :: PositionWhatever Sibling
sibling Teller = Only False
sibling _ = Anything

-- fact: Brown has a sibling
hasSibling :: ManWhatever Sibling
hasSibling Brown = Only True
hasSibling _ = Anything

-- seed: the universe constrained by the sibling relation
seed :: [Ans]
seed = filter (λ(pos, man).sibling pos ≡ hasSibling man) universe


And, given that, we need define the rest of the rules. The first implied rule, that each Position is occupied by a different Man is straightforward when implemented by the choose monadic operator defined elsewhere, the other rule concerns the pecking order when it comes to earnings: "The teller ... earns the least" (earnings rule 1) and "Smith, ..., earns more than the manager." (earnings rule 2). This "earnings" predicate we implement monadically, to fit in the domain of choose ...

-- the earnings predicate, a suped-up guard
makesLessThan :: AnsAnsStateT [Ans] [] ()

-- earning rule 1: the teller makes less than the others
(Teller, _) `makesLessThan` _ = StateT $ λans. [((), ans)]
_ `makesLessThan` (Teller, _) = StateT $ λ_. []

-- and the general ordering of earnings, allowed as long as it's
-- different people and positions
(pos1, man1) `makesLessThan` (pos2, man2) = StateT $ λans.
if pos1 ≡ pos2 ∨ man1 ≡ man2 then []
else [((), ans)]


... we then complete the earnings predicate in the solution definition:

rules :: StateT [Ans] [] [Ans]
rules = do teller@(Teller, man1) ← choose
mgr@(Manager, man2) ← choose
guard $ man1 ≠ man2
cashier@(Cashier, man3) ← choose
guard (man1 ≠ man3 ∧ man2 ≠ man3)

-- we've extracted an unique person for each position,
-- now we define earnings rule 2:
-- "Smith makes more than the manager"
-- using my good-enough unification2 to find Smith

let k = const3
let smith = (second $ k Smith)∈[teller,mgr,cashier]
mgr `makesLessThan` smith
return [teller, mgr, cashier]


... and then to obtain the solution, we simply evaluate the rules over the seed:

evalStateT rules seed


The pivotal rôle that the Whatever data type is in the sibling relation (defined by the involuted darts sibling and hasSibling). We can abduce the derived fact that Brown is not an only child, which, with this fact reduces the universe, removing the possibility that Brown may be a Teller ...

universe \\ seed ≡ [(Teller,Brown)]


... but what about the other participants? For Jones, when we are doing the sibling/hasSibling involution, we don't know his familial status (we eventually follow a chain of reasoning that leads us to the knowledge that he is an only child) and, at that point in the reasoning we don't care. For Smith, we never have enough information derived from the problem to determine if he has siblings, and here again, we don't care. That is the power that the Whatever data type gives us,4 we may not have enough information reachable from the expression to compute the exact value in question, but we may proceed with the computation anyway. Unlike Maybe's resolution to one of the members of a type (Just x) or to no solution (Nothing), the Whatever data type allows the resolution (Only x), but it also allows the value to remain unresolved within the type constraint (Anything).5 So, I present to you, for your logic problem-solving pleasure, the Whatever data type.


Endnotes












1 Problem 1 from 101 Puzzles in Thought and Logic, by C. R. Wylie, Jr. Dover Publications, Inc. New York, NY, 1957.
2 f ∈ list = head [x|x ← list, f x ≡ x] When x is atomic, of course f ≡ id, but when x is a compound term, then there are many definitions for f for each data type of x that satisfy the equality f x ≡ x and that do not equate to id. We see one such example in the code above.
3 I really must define a library of combinators for Haskell, à la Smullyan's To Mock a Mockingbird, as I've done for Prolog.
4 This semantic is syntactically implement in Prolog as anonymous logic variables.
5 I find it ironic that Maybe says everything about the resolution of a value, but Whatever allows you to say Nothing about that value's resolution.

Thursday, July 24, 2008

My Top-"10" Movie List

The following article is presented as a literate (literal) Haskell program. You can read it, and you can just as easily feed it to a Haskell interpreter, such as ghci or hugs.

I often get into conversations with acquaintances about movies, and when a (really) good movie is mentioned, I exclaim, "That's on my Top '10' list!", because, well, it is. After about 17 of these exclamations in a (series of) conversations, I'm often called on my assertions: "Well, what are your top 10 favorite movies then?" So, I've been forced to show my hand. Which language, though, to display this (ever-changing) list? HTML, no? XML? No, dreck, and no-no-no (been there, done that; barely survived to tell the tale). Haskell, of course. What follows is my top "10" movies as a program as activated data. If you need to see the movies in something other than code, I've posted the run result: voilà.

> module Movies where

The philosophy of this system, other than the fact that I count to 10 accurately, but I appear to have difficulty stopping there (just as the protagonist (?) in the movie "High Fidelity" has in counting to 5), is that I use the State monad to control how the output of the database is formatted. The resulting movie database is a tabled-HTML output.

Quite simply, the State monad contains a (modular) counter, and when my logic determines, then things happen at each cycle.

The Reader transformer monad carries the constant of when I wish to do these things, so the net result is a State monad counter wrapped in a Reader (constant) transformer. This article is not about the (very useful) topic of monad transformers, so I recommend the excellent article that covers that material.
> import Control.Monad.State
> import Control.Monad.Reader

> data Cat a = Cat String [a]

... unfortunately, the type-class system balks at using bare or aliased types, such as string, so here I wrap String in the "S" data type and provide my HTML representation against that.
> type Caddy = Cat (S Int)   -- quite the cad, indeed!

> data S s = S s String

> instance Show (S s) where
> show (S s str) = cdata str []

Not particularly happy with the cdata transformation, either, but it's a quick and dirty implementation to handle a problem of character data in the database that would muck up an HTML representation.
> cdata [] ans = reverse ans 
> cdata (a:b) ans = cdata b ((if a == '&' then 'n' else a) : ans)

The typeclass Html is a pretty printer that generates HTML from the data set.
> class Html a where
> asHtml :: a → ReaderT Int (State Int) String

Here, when the typeclass is given an instance of "String" (wrapped in the S data type), it pretty prints columns, wrapping every n columns were n is the constant held by the ReaderT transformer monad.
> instance Html (S s) where
> asHtml x = do pre ← prefix
> post ← postfix
> return (pre ++ "<td>" ++ show x
> ++ "</td>" ++ post)
> where prefix = do idx ← get
> return (if idx == 0 then "<tr>" else "")
> postfix = do sz ← ask
> idx ← get
> put ((idx + 1) `mod` sz)
> return (if idx + 1 == sz
> then "</tr>\n" else "")

This is the engine control of the pretty printer. When given a category, it prints the title, resets the counter to zero, and then hands off the printing of columns to either itself (if the category (e.g. "Top 10 movies") contains categories (such as the genres)) or to the "String" printer (if the category just contains the movie list).
> instance Html a => Html (Cat a) where
> asHtml (Cat title list)
> = do rows ← ask
> put 0
> cells ← mapM asHtml list
> rest ← get
> return ("\n<tr><th colspan='"
> ++ show rows ++ "'>" ++ title
> ++ "</th></tr>\n" ++ foldl (++) "" cells
> ++ closeCells ((rows - rest) `mod` rows))
> where closeCells 0 = ""
> closeCells x = foldl
> (λx y → x ++ "<td>&nbsp;</td>")
> "" [1..x]
> ++ "</tr>\n"

The database of movies follows:
> scifi, documentary, ferrn, romcom :: Caddy
> western, musicals, comedy :: Caddy
> horror, brit, oldies, sitcom, indie, drama :: Caddy

> movies :: Cat Caddy
> movies = Cat "My Top 10 Movies"
> [scifi, ferrn, romcom, musicals,
> comedy, indie, documentary,
> horror, brit, sitcom, drama, oldies,
> western, anime]

> anime = Cat "Animation"
> (map (S 1) ["Emporer's New Groove", "Wallace & Gromit",
> "Shrek"])
> scifi = Cat "Science Fiction"
> (map (S 2) ["Solaris", "Blade Runner", "Galaxy Quest",
> "Donnie Darko", "Fight Club"])
> ferrn = Cat "Foreign"
> (map (S 3) ["Le Placard", "Yojimbo", "Bleu, Blanc, Rouge",
> "Siti no Samurai", "Sanjuro", "Eiron Shimbum",
> "Schultze gets the Blues", "Eat Drink Man Woman",
> "Legend of Drunken Master", "Shaolin Soccer",
> "Kung Fu (Hustle)", "Cronos"])
> romcom = Cat "Romantic Comedy"
> (map (S 4) ["George of the Jungle", "Charade", "Bull Durham",
> "Bride and Prejudice", "L.A. Story",
> "Moonstruck", "My Big, Fat Greek Wedding",
> "Bullets Over Broadway", "Clueless", "Ocean's 11",
> "Whole Nine Yards", "Princess Bride"])
> musicals = Cat "Musical"
> (map (S 5) ["Guys and Dolls", "Singin' in the Rain",
> "Mary Poppins"])
> brit = Cat "Brit"
> (map (S 6) ["Cold Comfort Farm", "Importance of Being Earnest",
> "Hot Fuzz"])
> oldies = Cat "Ole Timey"
> (map (S 7) ["O Brother, Where art Thou?",
> "To Have and Have Not", "Hopscotch"])
> western = Cat "Western"
> (map (S 8) ["Shanghai Noon", "Pale Rider",
> "Dances with Wolves"])
> comedy = Cat "Comedy"
> (map (S 9) ["Blazing Saddles", "Arsenic and Old Lace",
> "Dodgeball", "Young Frankenstein",
> "Harold & Kumar go to White Castle",
> "Revenge of the Pink Panther"])
> sitcom = Cat "Sitcom"
> (map (S 10) ["Addams Family", "Addams Family Values",
> "State and Main", "Rushmore"])
> horror = Cat "Horror/Suspense"
> (map (S 11) ["Jacob's Ladder", "Silence of the Lambs",
> "Shawn of the Dead"])
> drama = Cat "Drama"
> (map (S 12) ["Grave of the Fireflies", "Village", "Payback",
> "Shadowlands", "Station Agent", "Pleasantville",
> "Shoot 'em up", "Searching for Bobby Fischer"])
> indie = Cat "Indies"
> (map (S 13) ["Bagdad Cafe", "Ghost World",
> "Living in Oblivion", "My New Gun"])
> documentary = Cat "Documentaries"
> (map (S 14) ["Good Night and Good Luck",
> "thirty two short films about Glenn Gould",
> "Koyaanisqatsi", "Unzipped", "Ed Wood"])

This function calls the monadic system to print the movie database.
> showMovies :: Int → String
> showMovies cols = "<table>\n"
> ++ evalState (runReaderT (asHtml movies) cols) 0
> ++ "\n</table>\n"

> test = putStrLn (showMovies 3)

You can download it and see the embedded HTML in a code browser, such as eclipse. At any rate, executing test will get the above movie database in a tabled-HTML form.

What's the payoff?

Ah, yes. Well, if I had encoded my movie list in one of the currently popular metadata formats, such as HTML or XML/XSTL, then changing the number of columns per row would be prohibitively difficult with the former (HTML), and getting the metadata output as a properly formatted set of table rows would have been prohibitively difficult with the latter (XSTL). The Haskell program demonstrates both: you've seen how (relatively) easy it is to output the structured data as table rows of three columns per row, to change the number of columns (e.g.: to two columns per row), it's as simple as rewriting the test function to ...
test = putStrLn (showMovies 2)

... paid off; or, if you prefer, Q.E.D.

Alternative Implementations

Using the Reader monad is one of the ways to pass around "global" values, defining a ("global") constant function is another, sigfpe discusses those, as well as using the Reader Comonad, in an article on comonadic plumbing. I chose the Reader transformer monad instead of the Reader comonad for this article, because combining a comonad with a monad introduces arrows as the glue. Another article will explore comonads first before we dive into the thick of monad/comonad/arrow plumbing.