Thursday, April 14, 2011

Attributed Variables: their uses and one implementation

Here's a problem that comes up enough, computationally: find a value within some range. You don't know what the value is at the moment, but you will know it, eventually, as the range becomes further restricted by refinements of the constraints or data.

Now, in general, this is an easy problem for Haskell to solve, and Haskell provides more than one way to approach the solution. For example, if we have some x in some domain D, one way to look at an arbitrary x until we find the one, true x we are looking for is the following code snippet:

> [ x | x <- D, ... further constraints ]

More variables? Interrelated ones? No problem:

> [ (x, y, z) | x <- D, y <- D, z <- D, x < y, ... further constraints ]

Okay, let’s take it to the next level. What if the interrelation is such that one variable consumes a value from the domain, so that that value is now unavailable to the other variables? Pithily: the variables’ values are mutually exclusive.

Well, one way to go about that is to embed a guard for each variable into the list comprehension:

> [ (x, y, z) | x <- D, y <- D, y /= x, z <- D, z `notElem` [x,y], ...

The drawback here is that this kind of guarded coding becomes tedious and prone to errors. On top of that we see that the guard follows the select, causing unnecessary backtracking through the search space,[1] especially given that we know the guarded values a priori.

Well, then, we can automate guarded selection with a state (transformer) monad:[2]

> ...
> f = do x <- choose
>        y <- choose
>        z <- choose
>        ...
> runStateT f [0..9]
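
In case you're curious, here is a minimal sketch of what choose might look like, assuming the StateT-over-the-list-monad approach of that article (the definition here is my reconstruction, not the article's):

> import Control.Monad.State (StateT(..))

> -- Pick one element from the remaining domain, threading the
> -- unchosen elements through as the new state, in the list monad.
> choose :: StateT [a] [] a
> choose = StateT picks
>   where picks []     = []
>         picks (x:xs) = (x, xs) : [ (y, x:ys) | (y, ys) <- picks xs ]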


Okay, excellent! All of this works and works well … for a fixed and predetermined variable set.

The Problem



Things start to get tricky for me, programmatically, however, when one set of constrained values spills over onto another set of constrained values, as when the results of one generation in a cellular automaton or a genetic (algorithm) pool affect the next generation:

> evalStateT (gen row1guards) domain >>= \ans1 ->
> evalStateT (gen (row2guards ans1)) domain >>= \ans2 ->
> evalStateT (gen (row3guards ans1 ans2)) domain >>= \ans3 ->
> ... etc


The state transformer isn’t enough by itself, so it needs to become some RWS monad, perhaps pushing the interim results into the Reader monad and the accumulated ones into the Writer, while the State monad handles the flow of the computation?
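
To put a (purely speculative) shape to that thought, such a stack might look like the following sketch, where Gen and row are names I'm making up for illustration:

> import Control.Monad.RWS (RWST)

> -- Earlier rows are read, finished rows are written out, and the
> -- remaining domain is threaded as state: three monads of management.
> type Gen row a = RWST [row] [row] [Int] [] a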

It seems to me that the work becomes more about monad management and less about the computation itself.[3]

Then there’s the even more general computation. What if we do not know, a priori, how many values we are generating in our problem solving? The problem space I’m looking at is something like:
For a variable number of variables, map selection and guards for each variable.

How do I write that as a list comprehension, or as a monadic expression? The problem with Haskell variables is that they aren't (variables), and so something easily accomplished in a Prolog program:

kakuro([], Vars, Functor) :-
    Fn =.. [Functor, Vars],
    call(Fn).
kakuro([H|T], Acc, Functor) :-
    assign(H),
    kakuro(T, [H|Acc], Functor).


called with:

?- kakuro([A, B, C, D], [], sum_equals_10).

or:

?- kakuro([X, Y, Z], [], sum_equals_10).

... or with a list of any other length isn't so easily written out in Haskell when I go to attack the problem. After all, what is the type of a list of unbound variables? Haskell doesn't (so easily) allow this, and I was stuck: I didn't know how to proceed using monadic programming or list comprehensions.

A solution



Fortunately for me, I've been studying constraint-based programming, and that research has led me to some papers on attributed variables,[4] which facilitate that programming style.

Attributed variables are variables, ranging over a domain, that can have attributes attached to them that restrict (or constrict) the range of values. It's a very simple step to go from a set of unbound attributed variables to a (simple, or facile) constraint solver by adding (constraining) attributes to the variables.

More formally, I declare (denotationally) an attributed variable as:

> data Attribute a = Attribute [Constraint a] (Domain a)

where Constraint and Domain are declared as:

> newtype Constraint a = Constraint { resolve :: a -> Bool }

> type Domain a = [a]


Or, to put words to it, an Attributed Variable is an object that has a set of (constraining) functions over an assigned domain.[5]
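
For example (an illustration of the declaration, not library code): a digit variable that must be even can be written directly as:

> evenDigit :: Attribute Int
> evenDigit = Attribute [Constraint even] [1..9]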

The above denotational declaration of attributed variables works, but, operationally, I find it useful to ground an attributed variable once it's reduced to a single value:

> data Attribute a = Attribute [Constraint a] (Domain a)
>                  | Ground a


So, that's my declaration.[8] How is this implemented in practice?

Implementation



The approach I take is that an attributed variable is created in a free state over its domain:

> makeAttrib :: Domain a -> Attribute a
> makeAttrib list = Attribute [] list


And then, as constraints are added to the attributed variable, the system 'tries to solve' the attributed variable over its set of constraints:

> constrain :: Attribute a -> (a -> Bool) -> Attribute a
> constrain (Attribute constraints dataset) constraint =
>     solve (Attribute (Constraint constraint : constraints) dataset)
> constrain (Ground x) _ = Ground x
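
For example (a sketch using just the definitions above): each added constraint narrows the domain, and narrowing all the way down to one value grounds the variable, as the text goes on to show:

> -- narrows [1..9] to [2,4,6,8]:
> evens :: Attribute Int
> evens = constrain (makeAttrib [1..9]) even

> -- narrows [2,4,6,8] to the singleton [8], so the result is Ground 8:
> eight :: Attribute Int
> eight = constrain evens (> 6)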


And if the solution results in a singleton list, then the attributed variable becomes Ground:

> solve :: Attribute a -> Attribute a
> solve attr@(Attribute _ []) = attr
> solve (Attribute constraints list) =
>     let result = solve' constraints list
>     in  if length result == 1
>         then Ground (head result)
>         else Attribute constraints result


Of course, the 'solution' to a Ground attributed variable is itself:

> solve (Ground x) = Ground x

The above code is simply a dispatch along the binary type. The worker-bee, solve', is simply (again) filter, specialized to a list of constraints:

> solve' :: [Constraint a] -> [a] -> [a]
> solve' _ [] = []
> solve' constraints (h:t) =
>     if solveConstraints constraints h
>     then h : solve' constraints t
>     else solve' constraints t

> solveConstraints :: [Constraint a] -> a -> Bool
> solveConstraints [] _ = True
> solveConstraints (f:rest) x = resolve f x && solveConstraints rest x


So, something from Control.Applicative to reduce that to a one-liner? I leave that as an exercise for the reader.
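
For the impatient, here is one possibility (a sketch, and by no means the only answer): sequenceA over the ((->) a) Applicative reads one candidate value through every constraint at once:

> import Data.Traversable (sequenceA)

> solve' :: [Constraint a] -> [a] -> [a]
> solve' = filter . (and .) . sequenceA . map resolve

> -- or, staying in the Prelude:
> -- solve' constraints = filter (\x -> all (`resolve` x) constraints)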

And there you have it, that is a working implementation of attributed variables. I have test functions for groundedness, solvability and the current state of the domain of an active attributed variable, but those are simple helper functions, easily implemented, and provided in the attached source code.
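
For reference, here is a minimal sketch of those helpers (isGround and solvable are my assumed names; domain is the one the kakuro code below relies on):

> isGround :: Attribute a -> Bool
> isGround (Ground _) = True
> isGround _          = False

> solvable :: Attribute a -> Bool
> solvable (Attribute _ []) = False
> solvable _                = True

> -- the current candidate values; a Ground variable has exactly one
> domain :: Attribute a -> Domain a
> domain (Ground x)          = [x]
> domain (Attribute _ range) = range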

Uses



So, now, what do we have?

Now we have variables for which we may specify a domain ('loosely' typing a variable) and which we may then constrain. How can we use such a system?

One way is that we can write constraint logic programs in Haskell. The illustration from Prolog (above) has a nearly direct transliteration into Haskell (with the additional benefit of strong typing):

> import Control.Monad (guard)
> import Data.List (nub, sort)

> kakuro :: [Int] -> Int -> [[Int]]
> kakuro list total = nub $ summer (sort $ map trib list) [] total

> trib :: Int -> Attribute Int
> trib x | x == 0    = makeAttrib [1..9]
>        | otherwise = Ground x

> summer :: [Attribute Int] -> [Int] -> Int -> [[Int]]
> summer [] list total = do
>     guard $ sum list == total
>     return $ sort list
> summer (h:t) list total = do
>     a <- domain $ constrain h (flip notElem list)
>     summer t (a:list) total


And with the above code, we can now do the following:

*Kakuro> kakuro [0,0,0] 10
[[1,2,7],[1,3,6],[1,4,5],[2,3,5]]
*Kakuro> kakuro [0,0,0,0] 10
[[1,2,3,4]]


Note that I use sort above, but attributed variables are not (denotationally) ordered types. Operationally, though, in logic programming it makes sense to eliminate early: if we have Ground values, we wish to consume those first. So, to that end, I declare the following instances:

> instance (Eq a) => Eq (Attribute a) where
>     Ground x == Ground y = x == y
>     _ == _               = False

> instance (Ord a) => Ord (Attribute a) where
>     Ground x <= Ground y = x <= y
>     Ground _ <= _        = True
>     _ <= _               = False


And with those instance declarations, sort automagically works. Neat!
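
A quick illustration of that behaviour, which follows from the instances above (Ground values sort ahead of still-active attributed variables):

> groundsFirst :: [Attribute Int]
> groundsFirst = sort [makeAttrib [1..9], Ground 3, Ground 1]
> -- ==> [Ground 1, Ground 3, Attribute [] [1..9]]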

Interlude: Haskell is "Neat"



I don't know about you, but I'm guessing that many of you are taking the gift that Haskell is for granted. Or I suppose I should say 'the way of thinking that using Haskell provides.' Over and over again in industry (yes: stone knives and bearskins) I hear 'that can't be done.' And I find myself puzzled. What's the showstopper here?

For example, this past week at work, we needed to convert a data-mapping document in Excel (stop snickering, please) into an equivalently flat XML schema representation with an accompanying XML sample file. So, fields of the form:

/Field/Name="BusinessAddress"
/Field/Value="plaintext"

/Field/Name="BusinessAddress"
/Field/Value="encrypted"


And there were enumerated values as well:

/Field/Name="PropertyOccupancyTypeOwnerOccupied"
/Field/Value="plaintext"

/Field/Name="PropertyOccupancyTypeRentedToTenant"
/Field/Value="plaintext"

/Field/Name="PropertyOccupancyTypeVacant"
/Field/Value="plaintext"


There were over 1,000 of these element names. I mentioned I would write a parser to generate the XSD and XML and was told by the development team point-blank that that was impossible and the only way to go about this transcription was manually, line-by-line, by hand.

So I wrote the script. I suppose I could have used Perl, but as I'm on a Windows box, the shell isn't very inviting for Perl-like piping, so I used Haskell. I stripped the metainformation and pushed the parsed names into a map (or, to be precise, a Data.Map), where enumerated (mutually exclusive instance) types went into a list named by their type name and other values were singletons. Then, for the XSD, I simply iterated over the elements of the map (recursively), and for the sample XML file, I just chose an element (in this case, the first element) of each type.
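
A hypothetical sketch of that grouping step (the types and names here are made up for illustration; the actual work code was more involved):

> import qualified Data.Map as Map

> -- Group parsed (typeName, elementName) pairs so that enumerated
> -- values pile up under their type name and the rest stay singletons.
> groupFields :: [(String, String)] -> Map.Map String [String]
> groupFields = foldr (\(ty, el) -> Map.insertWith (++) ty [el]) Map.empty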

The net result was that all the elements the analysis team had compiled into an Excel spreadsheet, by hand, over a six-week period, I was able to deliver in two days as a full XSD schema and a comprehensive XML sample file.

Two members of the development team assisted me on this task; one of them was the lead developer. He was intrigued and asked me, "What does this script do?" after being impressed that he could generate a schema definition file and a sample file in seconds. So I explained Haskell as a 'data-flow' language, and I traced the flow of the input file through the monad and function composition, showing the translation of the internal representation into XSD and then, in a different context, into XML. I'm not sure he got my explanation, but he did walk away impressed with what Haskell can do.

Yes, these are simple things: parsing, mapping, iterating. But for most of the software engineering community in industry, it is easier and faster to go through, line-by-line, by hand, than to write a piece of code to automate that process ("Can that even be done?" "No, it's impossible, just do it manually").

I have encountered this over and over again in industry. Teams are willing to expend more than two hundred person-hours on a recurring task, and are willing to accept the associated fatigue errors, to schedule systems by hand. Then they are shocked when I build a functional or logic-based system that autogenerates a pristine result in sub-second time.

Where does this leave us?

What I see as status quo is that industry is resigned to slog through repetitive tasks manually, because it's too hard in Language X to consider tackling the problem: more time would be spent writing, compiling and fixing a program to do the task than to do the task by hand.

On the academic side, I see researchers isolated from industry, feeling their advances have no practical use as there appears to be no interest from industry. I attended a talk on dependent types where the thesis was that this typing makes coding more declarative, more powerful and less error prone. I approached the lecturer and asked how I could implement such a system and was asked what application dependent types had for industry.

Hm. Let me think on that for a while.

Why would industry be interested in coding more declaratively, more powerfully, and with fewer errors? I was flabbergasted that the researcher saw no benefit for industry in his own presentation.

We have a powerful tool in Haskell. We can do more, cleaner, safer, and do it faster than the mainstream alternatives industry picks over and over again by default.

I've found, however, that talking about typing, or monads, or pure functional programming only causes the eyes to glaze.

What lights a fire in management's eyes? When you rescue an overdue project, delivering before the deadline, and you educate other workers to be as productive ... perhaps not writing Haskell themselves (I've been asked: "Is it like Pascal?" ... and that's after a code walkthrough), but using the tools you've used to deliver these 'tremendous' results. You earn their trust that when you say, 'just let me write a Haskell script to do that: I'll take a day to write it, and we'll start getting results right away,' that is actually what happens. I've found that 'we should use Haskell because of [all the wonderful arguments put forward]' does not convince anybody. Delivering 'incredible' and 'impossible' results, significantly faster than anticipated (two days instead of two months for the XML project)? That moves mountains.

Where I work, Haskell is now installed on two lead developers' systems besides my own, and I'm given permission to code in Haskell on a 'we only allow Language X' project.

Other Uses



So attributed variables can be used to help one solve kakuro puzzles, or to write a sudoku solver. Fine.

And I've been inching my way toward attributed variables so that they can be applied to symbolic logic puzzles as well, replacing my Whatever type with attributed variables over the enumerated domain.

So the domain is not restricted to Number, but these are not very interesting or pragmatic examples in and of themselves. Not very gripping.

So how about this example?

*Stocks> makeAttrib [Buy, Sell, Hold]

with the Constraints being input feeds filtered through weighted moving averages and other indicators. Now this is a system of keen interest to me, one 'currently under active and intense research.'
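
Here is a purely illustrative sketch of the flavor of such a constraint (the signal and its parameters are invented for this example, not the system under research):

> data Action = Buy | Sell | Hold deriving (Eq, Show)

> -- a simple moving average over the n most recent prices
> sma :: Int -> [Double] -> Double
> sma n prices = sum (take n prices) / fromIntegral n

> -- a crossover signal as a constraint over the action domain:
> -- in an uptrend, rule out selling; in a downtrend, rule out buying
> crossover :: [Double] -> Action -> Bool
> crossover prices action
>     | sma 5 prices > sma 20 prices = action /= Sell
>     | sma 5 prices < sma 20 prices = action /= Buy
>     | otherwise                    = True

> decide :: [Double] -> Attribute Action
> decide prices = constrain (makeAttrib [Buy, Sell, Hold]) (crossover prices)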

Will such a system work or not work? I'm not at liberty to say,[9] but here is an area that appears to fit well in the language space of constraint programming.

Of the problem spaces attacked by software projects — workflow analysis, data mining, planning and scheduling, budgeting, forecasting, (war)gaming, modelling and simulation — many seem good candidates for constraint-based reasoning. Attributed variables give Haskell programmers another way to express these problem domains.



Endnotes:



[1] Y’all don’t mind if I speak logically, and in the logic-programming paradigm, do you?

[2] See my article in The Monad.Reader, issue 11, “MonadPlus: What a Super Monad!” for an implementation of choose, et al.

[3] No, I’m not speaking disparagingly of the monadic programming style. I rather like that style, and use it in my domain language of predicate logic that I layer on top of ‘regular’ Haskell programming. The beauty of monads, for me, is that they are invisible (in coding them) and powerful (in their effect). When the code becomes more about the monads and less about what they are there to do, that is when I see an opportunity to examine things in a different light.

[4] http://www.swi-prolog.org/pldoc/doc_for?object=section(2,'6.1',swi('/doc/Manual/attvar.html'))

[5] My representation of Domain is straightforward and workable for most cases, but it hides more than a few compromises (or 'sell-outs') beneath the surface. For example, it makes the (gross) assumption that the range of the domain is enumerable. If the Domain is over ℝ, that no longer holds (at all), so then Domain shouldn't have a list representation but a pair of (bookending) bounds. But then, what if the Domain extends beyond ℝ and goes into the surreal numbers?[6] Then we have infinitesimals such as '*' (pron: 'star'), where:

not (* > 0), not (* < 0), not (* = 0), and yet * + * = 0

and:

* || 0 (that is: 'star' is incomparable to 0)

So how does one bounds-check on incomparables?

But these concerns are, in the main, not ones to lose one's head over. For the most part, simple linear programming solves a vast majority of the World's problems,[7] and those interesting cases are exactly that: interesting cases.

[6] http://en.wikipedia.org/wiki/Surreal_number

[7] The preface of How to Solve It: Modern Heuristics by Zbigniew Michalewicz and David B. Fogel makes this assertion, from hard-won experience in the field. Seconded.

[8] This is one approach to attributed variables, that is: constraints (or attributes) are applied to the variable over the domain until it is reduced to a single value, at which point the constraints and even the domain become superfluous and simply go away. An advantage of this approach is that a test for groundedness becomes trivial. A disadvantage is that, once grounded, one loses the path (or information) that got one there, except through the data flow through the algorithm (the stack trace, as it were). Another disadvantage is that an attributed variable, once grounded, silently ignores further constraints. This may lead to unexpected program behavior, because an AV grounded to 5 and then given a new constraint of x > 23 should fail, but my implementation forwards the 5 value in a success state.
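To illustrate the pitfall with the implementation above:

constrain (Ground 5) (> 23) ==> Ground 5 (a 'success,' where a failure might be expected)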

[9] An audit trail of at least a year of data is necessary before any claims can be made.