## Wednesday, April 2, 2014

### 'B' is for Bayesian Classifiers

So. B.

Bayes was this little guy, way back when, who happened to notice something that most people just naturally assume to be true. It's not, but people assume this.

"Hey, look, guys, something seems to happen because of everything else that's happening!"

Stunning revelation, isn't it?

But then he, unlike everybody else at that time (prior and even unto now), codified that belief.

('Belief.' Heh. The Bayesian network is also called the belief network).

He said, the likelihood of something is dependent on the likelihood of all the other factors you've observed.

So, for example, the probability of you picking the bracket perfectly is a (more than a few) billion (plus) to your one guess. Double your odds? Guess twice.

Yeah, right.

But I digress.

As usual.

So, formalized, Bayes said:

p(x | X) = p(a | x) * p(b | x) * p(c | x) ... * p(z | x)

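Or, that is:

The probability of x, one case out of the set of X's (like the probability of the Red Sox winning this game), comes from multiplying together the probabilities of everything else you observed, each one given that the Red Sox won before. (Strictly speaking, that product is only proportional to the probability, and the full formula also multiplies in p(x), how often the Red Sox win overall.)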

So, was it raining when the Red Sox won?
Was it Tuesday when the Red Sox won?
Did you wear your lucky shirt when the Red Sox won?

You take all that historical data, compute the probabilities of each, multiply them all together, and you come up with some probability.
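Here's a minimal sketch of that counting-and-multiplying step, in Python. The game history, the fact names, and both function names are made up for illustration; nothing here comes from a real data set.

```python
# Hypothetical game history: each record is (outcome, facts observed that day).
history = [
    ("win",  {"raining", "tuesday"}),
    ("win",  {"lucky_shirt"}),
    ("win",  {"raining", "lucky_shirt"}),
    ("win",  {"tuesday", "lucky_shirt"}),
    ("loss", {"raining"}),
    ("loss", {"tuesday"}),
]


def conditional(history, fact, outcome):
    """Fraction of games with this outcome in which the fact was observed."""
    matching = [facts for result, facts in history if result == outcome]
    return sum(fact in facts for facts in matching) / len(matching)


def product_of_conditionals(history, observed, outcome):
    """Multiply p(fact | outcome) over every fact observed today."""
    score = 1.0
    for fact in observed:
        score *= conditional(history, fact, outcome)
    return score


today = {"raining", "lucky_shirt"}
win_score = product_of_conditionals(history, today, "win")
print(win_score)
```

With this made-up history, it rained in 2 of the 4 wins and the lucky shirt showed up in 3 of the 4, so the product comes out to 0.5 * 0.75 = 0.375.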

Then, you do the same for all cases (Red Sox winning is one case, Red Sox losing is what we don't even want to consider, but it is a case, too, so just deal with that ... for eighty-six years), and the case with the highest probability is ... well, the one most likely to occur.

... according to Bayes.
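The whole score-every-case-and-pick-the-winner routine might look something like the sketch below. Again, the history and the names `score` and `classify` are hypothetical. One assumption to flag loudly: this version also multiplies in how often each case happened overall (the prior), which is the usual way to do it.

```python
# Hypothetical history: (outcome, facts observed that day).
history = [
    ("win",  {"raining", "lucky_shirt"}),
    ("win",  {"lucky_shirt"}),
    ("win",  {"tuesday", "lucky_shirt"}),
    ("loss", {"raining", "tuesday"}),
    ("loss", {"raining", "lucky_shirt"}),
]


def score(history, observed, outcome):
    """Prior for the outcome times the product of p(fact | outcome)."""
    games = [facts for result, facts in history if result == outcome]
    total = len(games) / len(history)  # assumption: include the prior
    for fact in observed:
        total *= sum(fact in facts for facts in games) / len(games)
    return total


def classify(history, observed):
    """Score every case seen in the history and return the best one."""
    outcomes = {result for result, _ in history}
    scores = {outcome: score(history, observed, outcome) for outcome in outcomes}
    return max(scores, key=scores.get), scores


best, scores = classify(history, {"tuesday", "lucky_shirt"})
print(best, scores)
```

On this toy data, a Tuesday with the lucky shirt on scores about 0.2 for a win and about 0.1 for a loss, so the classifier says "win." One thing to watch for: a fact that never showed up with a given case has a probability of zero, and that zeroes out the whole product for that case.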

How does this work in practice?

Well, pretty darn well, but that depends on what other systems you're comparing it against and what you're using it for.

For example (or 'fer realz'), a Bayesian classifier we built against a transactional system that ran one hundred thousand transactions per day was able to sniff out more than ninety-nine percent of the transactions that were ... 'fishy': erroneous or malicious. Compared to what other system? A rule-based system made up of rules composed by subject-matter experts and analysts familiar with the transactions.

Another for example, a little Bayesian classifier I built to pick winning football teams worked ... pretty well. We had ten or so members of that pool, and I walked away a winner three or four times. Not stellar, but not bad; not bad at all.

The beauty of a Bayesian classifier is that it's simplicity itself to implement. The one catch is that the probabilities run away from you on a computer: multiply enough small probabilities together and the product underflows IEEE floating point (it rounds down to zero) so quickly your head will spin, and your system gets all weirded out. So, instead of multiplying probabilities, just use a little bit of seventh grade math (in my day) and simply sum their logarithms:

p(x | X) = p(a | x) * p(b | x) * p(c | x) ...

can be re-represented as:

log p(x | X) = log p(a | x) + log p(b | x) + log p(c | x) ...

And this way we (greatly) reduce the chances of your ALU going off into la-la land.
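To see the difference, here's a small Python sketch. The list of two thousand probabilities of 0.01 each is made up purely to force the problem; a real classifier would have its own, varied numbers.

```python
import math

# 2000 made-up conditional probabilities, each fairly small.
probabilities = [0.01] * 2000

# The straight product underflows: the true value (1e-4000) is far
# smaller than anything a 64-bit float can represent.
product = 1.0
for p in probabilities:
    product *= p

# Summing logarithms keeps the same information in a comfortable range.
log_sum = sum(math.log(p) for p in probabilities)

print(product)   # 0.0
print(log_sum)   # a finite negative number, roughly -9210
```

Since the logarithm is an increasing function, the case with the highest log score is also the case with the highest probability, so you can compare the sums directly and never convert back.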
