Thursday, September 18, 2008

Animal: an RDR implementation study, part I: types

Synopsis

Ripple-down rules provide a fast and efficient representation of knowledge in context for use, e.g., by expert systems. We present here a complete implementation of one type of RDR system in Haskell. But what analogy is sufficient to describe what an RDR system is? The literature, albeit comprehensive, seems to concentrate more on the details of making such a system work, but none have presented the essence: the computer guessing-game Animal does a good job of this illustration, and we use it here to build an example knowledge base for this RDR system.

Motivation/Introduction

As a knowledge engineer I have worked with Subject-Matter Experts (SMEs) to build various rule-based expert systems. A common pitfall of such systems, its ὕβρις, is that they attempt to abstract decision making from any context. And, as such, fail to notice the nuances or have the situational awareness needed to render useful judgments. In a knowledge-engineered rule-based/bayesian-like hybrid system I developed, the bayesian decisions lead to over 99% of the positive findings in the transactions analyzed.

This would be the end of the story if there were no hard limits, but there are always such hard limits. Bayesian-like systems tend to scorn the advice and guidance of SMEs: the data set itself is the experts, not the SMEs. Despite the success of using the data, bayesian-like systems also tend to overreact — only 1 transaction out of 1000 transactions it flagged actually lead to a decision — these systems need serious throttling to be successful. Resources, then, are a real-world constraints that rule-based systems better model than bayesian systems in practice. In fact, hard constraints in general are modeled much better by rule-based systems.

But the rule-based systems, popularized by, e.g., iLog JRules™ and used in many expert systems, do not speak the language of the SMEs. Having worked with SMEs across the U.S.A. over a period of years, rules invariably tend to be defined by exception. Whenever we, as the knowledge-engineers, attempt to nail down a definition with the SMEs, the conversations always proceed as follows:
SME: Yeah, a CC transaction of over $57.38 is always suspect.
Me: So, we'll flag those, then [thinking: Ha! that was an easy rule; finally!]
SME: No, no, no! Only from young males or senior citizens in the following three income brackets.
Me: Oh, okay, I'll add that to the constraint.
SME: No, but it needs to be in the following zip-codes ...
Several hours later we're still ironing out the rule, and then, as lunch break approaches, the chair either tables the rule or passes a simplified, useless, version of it.

Note how the rule set was defined above, the SMEs agreed to a general case, and then continue to refine that definition by adding (often conflicting) constraints. In a context-free rule-based system, modelling such a rule set is by no means impossible, but the task quickly becomes a chore in the nightmare of complexity.

RDRs (Ripple-down Rules), on the other hand, embrace the context. The syntax of an RDR system is as follows:
<rule> ::= IF condition THEN conclusion
<knowledge base> ::= ⊥
| <rule>
EXCEPT <knowledge base>
ELSE <knowledge base>
The semantics is as follows. If the condition is met, then that conclusion is saved as a viable result (replacing any prior conclusion) and the EXCEPT branch is followed recursively until , at which time the most recently saved conclusion is the result. On the other hand, if condition fails, the ELSE branch is selected and followed recursively. Of course, the knowledge base must be applied to something. In this system, we have a very simple environment where the condition tests for the presence of a String in that environment.

The initial knowledge base for every RDR system starts as:
IF (const True) THEN none EXCEPT ⊥ ELSE ⊥
As the SMEs interact with the RDR system, they add to knowledge to refine the conclusions guided by refinements in the conditions. The system is very permissive, redundancy is permitted, even encouraged, because a condition in depth of one path of EXCEPTs and ELSEs has a very different meaning, in context, than along another path.

Example

Our RDR system with be the Animal guessing game with the following knowledge base:
IF (const True) THEN "not an animal"
EXCEPT IF (present "four legs") THEN "pony"
EXCEPT IF (present "barks") THEN "dog"
EXCEPT ⊥
ELSE IF (present "meows") THEN "cat"
EXCEPT ⊥
ELSE ⊥
ELSE IF (present "swims") THEN "fish"
EXCEPT ⊥
ELSE IF (present "spins web") THEN "spider"
EXCEPT ⊥
ELSE ⊥
ELSE ⊥


Types
> module RDRTypes where

> import Control.Monad.State
> import Data.Map (Map)
> import qualified Data.Map as Map
I must apologize for not introducing the next three modules properly. These modules are part of my canon and will be introduced in depth elsewhere. For now, I must settle for the following descriptions:
> import Data.Transitive
defines a generic protocol of walking a data structure one step at a time, either "forward" (with advance) or backward (with withdraw).
> import Data.Zipper
The "simple Ariadne zipper" illustrated in the Haskell Wikibooks.
> import Data.BinaryTree
The only novel structure here is that the tree is shaped to conform to the structure of RDRs: the data is in the branch, not the leaves.
From Predicate Logic-based Incremental KA, Barry Drake and Ghassan Beydoun (Nov 2000), file named PRDR.pdf

2.1. Ripple Down Rules (RDR)
An RDR knowledge base is a collection of simple rules organised in a binary tree structure. Each rule has the form, "If condition then conclusion". Every rule can have two branches to other rules: a false-branch (also called the “or” branch) and a true-branch (also called the “exception” branch). An example RDR tree is shown in figure 2.1. When a rule is satisfied, the true branch is taken, otherwise a false branch is taken. The root node of an RDR tree contains the default rule whose condition is always satisfied, that is, it is of the form, “If true then default conclusion”. This default rule has only a true-branch.
The RDR IF-THEN rule contains a condition and conclusion that interact with the Environment (defined later) to inform the decision of the system.
> data Condition a env = Cond a (env → Bool)
> instance Show a ⇒ Show (Condition a env) where
> show (Cond c _) = "IF " ++ show c

> data Conclusion a b env = Concl a (env → b)
> instance Show a ⇒ Show (Conclusion a b env) where
> show (Concl c _) = "THEN " ++ show c
Given the above, an IF-THEN Rule is simply the conjunction of the the Condition and Conclusion:
> data Rule a b c kb
> = Rule (Condition a kb) (Conclusion b c kb)
> instance (Show a, Show b) ⇒ Show (Rule a b c kb) where
> show (Rule a b) = show a ++ " " ++ show b
The Environment is composed of a dictionary (keys to values) and the current most valid conclusion under consideration. In our example (Animal), we merely test for the existence of a key, but more complex system usually treat the keys as attributed values and perform more than simple existence-check tests.
> type Knowledge k v = Map k v
> data Environment k v a b
> = Env (Knowledge k v) (Conclusion a b (Knowledge k v))
> instance (Show k, Show v, Show a)
> ⇒ Show (Environment k v a b) where
> show (Env kv conc) = "{" ++ show kv ++ ": "
> ++ show conc ++ "}"
The above elements are what comprise the simple types for the RDR system, so what is left is those elements that form the structure. This system is in the shape of a binary tree, so, of course, we use that data structure. As we append new rule branches to leaves of the tree, we use the Zipper data type to allow us to add these nodes in place.
> type RuleBranch a b c k v
> = BinaryTree (Rule a b c (Knowledge k v))
> type RuleTree a b c k v
> = Zipper BinDir (RuleBranch a b c k v)

> data RuleTreeEnv a b c k v = RuleEnv (RuleTree a b c k v)
> (Environment k v b c)
> instance (Show a, Show b, Show k, Show v)
> => Show (RuleTreeEnv a b c k v) where
> show (RuleEnv tree env) = "| " ++ show tree ++ " : "
> ++ show env ++ " |"
The RDR system is built around the concept of context, and the State Monad captures that concept well. The final type is used to shuttle around the knowledge base as well as the currently viable conclusion based on the rule finding.
> type RuleFinding a b c k v
> = State (RuleTreeEnv a b c k v)
> (Conclusion b c (Knowledge k v))
The above types describe the RDR system. In the next entry, we will show the implementation of the system when it comes to building and adding rules as well as traversing the rule tree to reach a conclusion.

1 comment:

Anonymous said...

A really good and interesting post, Douglas!

This immediately made me think of the "plumber" system inside the Plan 9 OS. In most OSs, there is the "copy-and-paste" approach, where the user specifies the destination of data. In Plan 9, however, he "plumber" uses a set of "plumbing rules" to determine the destination of data. That seems to be something akin to ripple-down rules. Anyway, very interesting stuff!