Friday, August 29, 2008

Scanner-parsers II: State Monad Transformers

Okay, let's implement the same Mars rover command language with pretty much the same solution, but this time, for our scanner-parser of the Position, we're going to use the StateT monad transformer over the list monad to erase the housekeeping of the input stream.
> module MarsRoverStateMonad where

> import Control.Monad.State

... other imports and data/instance definitions as before ...
Recall our Scan type was
> -- type Scan = String → (a, String)
This type looks exactly like the type for the State monad, and that documentation page reveals that parsing is very well handled by the StateT String [] a type. Let's transform our Scan type into a monad:
> type Scan a = StateT String [] a
Since, also, everything in the Position data type derives the class Read it becomes a simple matter to constrain our polymorphic type (a) to the values we wish to obtain from our parser, like so:
> next :: Read a ⇒ Scan a
> next = get >>= λ(x:xs) . put xs >> if x ≡ ' '
> then next
> else return (read [x])
This function, next, is very simple for two reasons: first, the only separator that concerns us in this command language is the space (ASCII 32), and second, the tokens it consumes are only one character each. Both of these simplicities in this language can be easily and obviously enhanced when implementing scanner-parsers for richer languages.

With this new scanner (next) based on the StateT monad transformer, we can now rewrite our lifting functions in the monadic style. This frees us to concentrate on the token at the current position and lets the monad handle the stream positioning.
> liftLoc = next >>= λx . next >>= λy . return (Loc x y)
> liftOrient = next
> liftPos = liftLoc >>= λloc .
> liftOrient >>= λdir . return (Pos loc dir)
Note that the abstraction of the type signatures of these functions did not change at all, but the implementations simplified significantly (that is, if you consider the monadic style a simplification over the previous implementation that uses let-bindings to sequence computations).

Now to run the system, we replace in runRovers the
let (start, _) = liftPos pos
let start = head $ evalStateT (liftPos) pos
and we're done!


I like the monadic threading much, much, better than the previous implementation that manually threaded the position of the input stream via tupling using let-bindings, but this version here also leaves something to be desired. Note the parsing functions (liftPos and liftLoc) both follow the the format of:
grabA >>= λa . grabB >>= λb . return (Τ a b)
I don't like that I need to identify and bind the variables a and b; I don't like that one bit! In my eyes, identifying these variables does nothing to make the program more declarative. I would much rather write these functions in the following manner...
> liftLoc = next >>=> next >=>> Loc
> liftPos = liftLoc >>=> liftOrient >=>> Pos
...with the my weird monadic stream operators defined (obviously) as follows...
> (>>=>) :: Monad m ⇒ m a → m b → m (a, m b)
> mA >>=> mB = mA >>= λa . return (a, mB)

> (>=>>) :: Monad m ⇒ m (a, m b) → (a → b → c) → m c
> mTup >=>> τ = mTup >>= λ(a, mB) .
> mB >>= λb . return (τ a b)
...but then I worry that this syntax is too esoteric for its own good. Thoughts?

Yes, a person leaving a remark pointed out that I've reinvented liftM2. Obviously so, in retrospect. I'm on a roll! After a year of entries, I suppose I'll have reinvented most of Haskell's standard library.

Actually for binary situations (that is, for types that take two arguments from the stream to construct), I think I will use these special streaming monadic operators, but when types become more complex, then more, and more complex, operators are required (each new argument for a type doubles the complexity of each new operator). This tells me (as it has already told Uustalu and Vene) that monads are not the best way to go about scanning/parsing streams to extract an internal representation of the grammar.


meteficha said...

import Control.Monad

liftLoc = liftM2 Loc next next
liftPos = liftM2 Pos liftLoc liftOrient

geophf said...

@meteficha, point well taken and very nicely done. You caught me in the act of unconsciously reinventing liftM2. Here's your ⊥-trophy. I suppose after a year of blogging, I'll have reinvented most of the standard library. *sigh*

However, your solution, albeit nicely done and elegant, is not at all to my taste; I guess I have the sequencing of DCGs ingrained into me. Also both our solutions handle the binary case fine, but the general case adds complexity for each new argument to be parsed, no?

urso said...

idea from applicative... but StateT not instance of applicative

mf <*> mx = do f <- mf
x <- mx
return $ f x

liftLoc = return Loc <*> next <*> next

liftPos = return Pos <*> liftLoc <*> liftOrient

-- or use ap:
liftLoc = return Loc `ap` next `ap` next
liftPos = return Pos `ap` liftLoc `ap` liftOrient

Erik said...

Or even better, go completely for Applicative Functors:

liftLoc = Loc <$> next <*> next