Thursday, August 13, 2015

Yeah, but how do I do that?

So, my article on FP IRL has garnered some interest, and I have been asked, 'Yeah, but how do I get started in FP? How can I start using this stuff at my job?'

So, here's my answer. Here's what I do and how I do it.

So, it depends on how and where you want to start this adventure, yes? The beauty of today is that there are so many resources freely available for you to work with. The problem is that you're already good at what you do, so it'll be hard to move away from what you know into a domain where it should be easy but is actually really, really different, and that difference can be frustrating: caveat coder.

Also, there are effete programmers out there that tell you how you should not code.

"Oh, Prolog's not pure and it's not functional. You can't do that."

I don't listen to what I can't do. When somebody says to me: "you can't do that," it really means they are saying: "I can't do that." And I'm not interested in listening to the whining of losers. What I'm interested in is delivering results by coding well. If you want to do that, I want to do that with you. If you want to tell me what I can't do, the door is over there, don't let it hit you on your way out of my life.

Sorry. Not sorry.

So.

I host @1HaskellADay where I post a problem that you can solve in any language you want, but I post the problem, and the solution, in Haskell, every day, Monday through Friday. You can learn FP one day at a time that way, be it Haskell, Scala, Idris, whatever you'd like. You write a solution in Haskell, I retweet your solution so everybody can see you're a Haskell coder.

So. That.

Also, there are contests on-line, some with money prizes (Kaggle, right?), that you can take on in language X. You may or may not win, but you'll surely learn what you can do easily, and what comes hard, in your language of choice.

The way I learn a language is I don't. Not in the traditional sense, that is, of learning a language's syntax and semantics. If I don't have a 'why' then the 'how' of a language is uninteresting to me.

So I make a 'why' to learn a language, then I learn it.

What I do is take a real-world problem and solve it in that language. That's how I learn a language, and yes, I code wrongly, for a long time, but then I start to code better and better in that language, until I'm an expert in it.

Reading anything and everything on the fundamentals of that language, as I encounter them, helps me a lot, too.

So, as you can see, I'm learning the 'languages' Neo4J and AWS right now (yes, I know, they aren't languages. Thanks). Lots of fun. I'm doing stuff obviously wrong, but the solutions I provide are ones they need at work, and I'm the only one stepping up to the plate and swinging hard and fast enough to keep them funding these adventures in this tech.

Get that. They are paying me at work to learn stuff that I'm having a blast doing. Why?

Maybe it's because when the VP says, 'Hey, I got a problem here for ya,' I come running?

Here's something I do not do.

I DO NOT ASK: 'can I code in X?' because the answer is always: 'No.'

What I do is code in X and then hand them a result that so wows them, they feed me the next impossible problem to solve, and I get to set the terms. It's like instead of 'doing my job,' I take ownership of the company and its problems, looking for the best solution for the company as its owner. And, like an owner, I say what I do and how I do it, because I know what's best for the company in these new waters we're exploring together in partnership.

Try it that way. Don't say 'we should do X,' because that's what (in management's eyes) whiny little techies say. No, don't say anything. Just code it in X, deliver the solution, demo it to the VP and then to the whole company, and get people coming up to you saying, 'Wow. Just wow. How the hell did you do that?'

No kidding: it takes a hell of a lot of courage to be a water-walker. It has for me, anyway, because the risk is there: that you'll fail. Fail hard. Because I have failed hard. But I choose that risk over drowning, doing what they tell me and how they tell me to do it, because I'm just employee number 89030 and my interview was this: "Do you know Java?" "Yes, I know Java." And, yay, I'm a Java programmer, just like everybody else, doing what everybody else does. Yay, so yay. So boring.

I've failed twice in my 25 years in this field, and it wasn't for lack of trying. Twice.

Do you know how many times I have succeeded? I don't. I've lost count. I've saved three teens' lives, and that was just in one month. Put a price on that, and that was because I stepped up to the plate and tried when nobody else would or could. And there are my other successes, too, and the beauty of my successes is that the whole team won: we all got to keep working on really neat stuff that mattered and got company and customer attention.

And, bonus, "Hey, I've got a business and I need your help."

Three times so far.

Taking the risk leads to success. Success breeds success.

It starts with you, not asking, but just taking that risk, trying, a few times or right off the bat, and succeeding.

And that success, and the feeling you get from knowing you've done something, that you've really done something ...

They can't take that away from you.

Ever.

Uploading Data to GrapheneDB

Today.

We look at how to upload a set of data (potentially massive) to GrapheneDB. I say '(potentially massive)' because an option, of course, is to enter Cypher, line-by-line in the Neo4J web admin interface, but this becomes onerous when there are larger data sets with complex (or, strike that, even 'simple') relations.

An ETL-as-copy-paste is not a solution for the long term, no matter how you slice it (trans: no matter for how long you have that intern).

So, let's look at a viable long-term solution using a specific example.

Let's do this.

The Data

The data is actually a problem in and of itself, as it is the set of Top 5 securities by group, and it is reported by various outlets, but the reports are, in the main, (deeply) embedded in ((very) messy) HTML, or have a nice little fee attached to them if you want to tap into a direct feed.

As I'm a start-up, I have more time than money (not always true for all start-ups, but it's a good rule of thumb for this one), so, instead of buying a subscription to the Top 5s feed, I built an HTML scraper in Haskell to download the sets of Top 5s. Scraping HTML is not in the scope of this article, so if you wish to delve into that topic, please review Tagsoup in all its glory.

Okay, prerequisite:

Step 0. scraped data from HTML: done (without explanation. Deal.)

Next, I save the data locally. I suppose I could go with a database instance, such as MySQL, but for now I have only 50 or so days' worth of data, which I'm happily managing in a file and a little map in memory.
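
For flavor, here's a minimal sketch of what 'a file and a little map' can amount to (the derived Show/Read serialization is just a placeholder, and the key and value types are left abstract; they're whatever your scraper produces):

import qualified Data.Map as Map
import Data.Map (Map)

-- Persist the in-memory map via its Show/Read instances: plenty for ~50
-- days of Top 5s, no database required. Key and value types are left
-- abstract here because they're whatever the scraper hands you.
saveTopFives :: (Show k, Show v) => FilePath -> Map k v -> IO ()
saveTopFives file = writeFile file . show . Map.toList

loadTopFives :: (Ord k, Read k, Read v) => FilePath -> IO (Map k v)
loadTopFives file = Map.fromList . read <$> readFile file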

Step 1. store data we care about locally: done

Okay, so we have scraped data, automatically stored away for us. What does it all mean? That's when I got the idea of having a way of visualizing and querying these data. Neo4J was a fit, and GrapheneDB, being DaaS (you just need to know that 'DaaS' means 'Data as a Service'), makes sense for collaborating as a geographically dispersed team.

Two Options

So, how do I get the data there? There were two options we explored. One was to load the data into a local Neo4J instance and then snap-restore in the Cloud from that image. That works the first time, but it seems to me rather ponderous, as this is a daily process, and I do not wish to replicate my database locally every day and snap-restore to the Cloud. So I chose the latter option, which was to build a system that takes the local map, translates it into Cypher queries (to save as graph nodes and edges), translates those Cypher queries into JSON, and then creates a web client that ships that JSON-of-Cypher-queries over the wire to the targeted web service.

... Neo4J and GrapheneDB are web services that allow REST data queries... (very) helpful, that.

Step 2. Translate the local data to Cypher queries

Okay, this is not going to be a Cypher tutorial. And this is not going to be the Cypher you like. I have my Cypher, you have yours, critique away. Whatevs. Let's get on with it.

The data is of the following structure, as you saw above:

Date -> { ("Mkt_Cap", ([highs], [lows])), ("Price", ([highs], [lows])), ("Volume", [leaders]) }
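In Haskell terms, that shape is roughly the following (a sketch only; the actual type and field names in my code differ):

import Data.Map (Map)
import Data.Time (Day)

type Symbol = String

-- A rough rendering of the scraped structure: each trading day carries its
-- groupings, and each grouping carries its Top 5 symbols (highs and lows,
-- or just the volume leaders).
data Grouping = MktCap { highs :: [Symbol], lows :: [Symbol] }
              | Price  { highs :: [Symbol], lows :: [Symbol] }
              | Volume { leaders :: [Symbol] }

type TopFives = Map Day [Grouping]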

And we wish to follow this structure in our graph-relational data. Let's look at the top-tier of the data-structure:



You see I've also added a 'Year'-node. I do this so I can anchor around a locus of days if I wish to explore data across a set of days.

So, okay, from there, do I then create grouping nodes of 'leaders' and 'losers' for the categorization of stocks? This gets into the data-modeling question. I chose to label the relations to the stocks as such instead of creating grouping nodes. There are tradeoffs in these modeling decisions, but I'm happy with the result:



The module that converts the internal data structures is named Analytics.Trading.Web.Upload.Cypher. Looking at that module you see it's very MERGE-centric. Why? Here's why:


What you see here is that symbols, primarily $AAPL, but also others like $T and $INTC, find themselves on the Top 5s lists over and over again. By using MERGE, we make sure the symbol node is created if this is its first reference, or linked to if we've seen it before in this data set.
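
Here's a hedged sketch of the kind of MERGE-generating function involved (this is not the actual Analytics.Trading.Web.Upload.Cypher code; the node labels and relation naming are my guesses):

-- Generate the MERGE-centric Cypher for one symbol in one grouping on one
-- day: three MERGE statements per stock-symbol node. MERGE creates the
-- Symbol node on first reference and simply matches (links to) it on every
-- later reference. Labels and relation names here are assumptions.
mergeSymbol :: String    -- date, e.g. "2015-08-13"
            -> String    -- grouping, e.g. "Mkt_Cap"
            -> String    -- role, e.g. "LEADER" or "LOSER"
            -> String    -- ticker, e.g. "AAPL"
            -> String
mergeSymbol date grp role sym = unlines
  [ "MERGE (d:Day { date: '" ++ date ++ "' })"
  , "MERGE (s:Symbol { name: '" ++ sym ++ "' })"
  , "MERGE (d)-[:" ++ grp ++ "_" ++ role ++ "]->(s)" ]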

In this domain, MERGE and I are like this: very close friends.

Okay, Map-to-Cypher: it's a very close ... well, mapping, because the relational calculus and domain-to-codomain mappings have a high correspondence.

I'm at a 'disadvantage' here: I come to functional programming from a Prolog background, so I think of functional data structures relationally, and, usually, my functional data structures map very nicely onto graphs.

I don't know how to help you with your data structures, especially if you've been doing the Java/SQL object/relation-mapping stuff using JPA ... I mean, other than saying: 'Switch to ... Haskell?' Sorry.

Okay, so we have a set of Cypher queries, grouped in the following structures:

Date -> [groups] where groups are Mkt_Cap, Volume, and Price

Then, for each group for that date

group -> Leader [symbols] -> Losers [symbol]

So we have (with the three groups) four sets of Cypher queries, each grouped Cypher query weighing in at thirty MERGE statements (three MERGE statements for each stock-symbol node). Not bad.

How do we convert this list of grouped Cypher queries into JSON that Neo4J understands?

Two things make this very easy.

  1. The JSON structure that Neo4J accepts is very simple: it is just a group of "statements", each of which is a single Cypher "statement" element. Very simple JSON! (thank you, Neo4J!)
  2. There is a module in Haskell, Data.Aeson, that facilitates converting Haskell data structures into JSON, so the actual code to convert the Cypher queries reduces to three definitions:
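
Something like this (a sketch of the same idea; the type names are my own, and the real module's three definitions differ in detail):

{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON(toJSON), object, (.=), encode)
import qualified Data.ByteString.Lazy.Char8 as BL

-- A batch of Cypher queries destined for one REST request (a hypothetical
-- stand-in for the types in Analytics.Trading.Web.Upload.Cypher).
newtype Statements = Statements [String]

-- Neo4J accepts: { "statements": [ { "statement": "<cypher>" }, ... ] }
instance ToJSON Statements where
  toJSON (Statements queries) =
    object ["statements" .= [object ["statement" .= q] | q <- queries]]

-- And encode gives us the JSON bytes to put on the wire.
statementsToJSON :: Statements -> BL.ByteString
statementsToJSON = encode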

With that, we have Cypher queries packaged up in JSON.

Step 3: SHIP IT!
So now that we have our data set converted to Cypher and packaged as JSON, we want to send it to GrapheneDB. Before I went right to that database (I could have, but I didn't), I tested my results on a Neo4J instance running on my laptop: I ran the REST call and verified the results. BAM! It worked for the one day I uploaded.
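
The REST call itself is small. Here's a sketch using http-conduit (the endpoint path shown is Neo4J's transactional-commit endpoint; authentication against GrapheneDB, which you'd add with basic auth and the credentials from its "Connection" tab, is omitted here):

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Conduit
import qualified Data.ByteString.Lazy.Char8 as BL

-- POST the JSON-of-Cypher-queries to a Neo4J REST endpoint. The same
-- function serves the laptop instance and GrapheneDB: only the URL (and,
-- for GrapheneDB, the credentials) changes.
postCypher :: String          -- e.g. "http://localhost:7474/db/data/transaction/commit"
           -> BL.ByteString   -- the encoded statements from the previous step
           -> IO (Response BL.ByteString)
postCypher url payload = do
  request0 <- parseUrl url
  let request = request0 { method         = "POST"
                         , requestHeaders = [("Content-Type", "application/json")]
                         , requestBody    = RequestBodyLBS payload }
  manager <- newManager tlsManagerSettings
  httpLbs request manager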


After I got that feel-good confirmation in the small, I simply switched the URL from localhost:7474 to the URL GrapheneDB provides in the "Connection" tab, and voilà: we have data here!


(lots of it!)

Step n: Every day
So now that I have the back-data uploaded, I simply run my little scraper -> ETL-over-REST -> GrapheneDB system and I have up-to-the-day Top 5s stock securities for my analysis, in the Cloud.

LOLSweet!

Wednesday, August 12, 2015

Recipe: Getting Haskell running on AWS Linux EC2 instance

Steps for standing up dev tools, and Haskell, on AWS EC-2

So, you’ve configured your EC2 instance with the standard AWS Linux image, and now you want to compile your sources and have running executables to demo, not only in Java and Perl but possibly in C, C++, and, for my prototypes and graph systems, Haskell. How to do this?

Unfortunately, the AMI for Haskell is rather unforthcoming, as it seems to compile GHC from source, an endeavor that takes hours of AWS time. Unnecessary. So, I’ve cut out those middle steps and have a Haskell image on my S3 that can be downloaded and used once the dev-tools are in place.

Let’s do this.

1. Get the ghc image from my public-S3-bucket:
$ wget https://s3-us-west-2.amazonaws.com/haskell-7-10-2/haskell-platform-7.10.2-a-unknown-linux-deb7.tar.gz

(you can either get this from my S3 bucket and be stuck with that version of Haskell on a vanilla Linux instance, or you can go to haskell.org and get the most recent version for your particular flavor of Linux)

2. Install the dev-tools from Amazon (this takes a bit)

$ sudo yum groupinstall "Development Tools"

3. Now you need to provide a soft link to libgmp.so:

$ cd /usr/lib64
$ sudo ln -s libgmp.so.3 libgmp.so
$ sudo ln -s libgmp.so.3 libgmp.so.10
$ cd

Yes, you need both of those soft links.

4. Once you have all that done, unpack the Haskell Platform tarball:

$ sudo tar xzf haskell-platform-7.10.2-a-unknown-linux-deb7.tar.gz

5. And now ghci will work for you:

$ ghci
Prelude> 3 + 4
~> 7


YAY! Do a happy dance!

(Pure) Functional Programming Claims IRL

So, THIS happened:


– question from a young programmer

So, does functional programming stack up in the real world? In industry?

Yes. Check my LinkedIn profile. I have been programming, in industry, for as long as you have been alive.

Here are some (pure) functional programming examples to back up this claim, because academics can talk all they want, but they are not in there, in the trenches, with you where it counts.

I am, because I happened to have dug a few of those trenches. You're welcome.

Case study 1: ATS-L

I worked on a project at DHS called 'ATS' ('Automated Targeting System'). The existing system, ATS-C, was a 100,000-line Prolog behemoth that used purely dynamic types (no type hints, no boxed types), and every rule started with an assert and ended with a retract. And 100,000 lines.

It was impossible to know what was going on in that system, without running the code in the debugger and pulling from the dynamic environment. Consequently, the ATS-C guy had (still has) job security. Not his aim, but that is a nice plus.

It took us 72 hours to go through every line of his code to correct the Int-rollover problem when the primary key for the index exceeded two billion.

So, I was called in to 'help.' HA! But then eventually I built ATS-L. I wrote it in 10,000 lines of purely functional Prolog (yes, that is possible to do while remaining authentic to logic programming in Prolog), so every rule call gave the same truth-verification from the same set of arguments, every time.

Shocker! I know.

ATS-L had the same level of functionality as ATS-C and handled 1,000x the number of transactions per hour. And since it was purely functional Prolog, I could reason about my program in the large and in the small. Importantly, so could others, as I passed on that work after maintaining it for three years.

In short: 1/10th the SLOC with the same level of functionality with increased real-time responsiveness and a vastly reduced level of maintenance.

Oh, and I also wrote 726 unit tests and put them on an automated midnight run, generating a report every single day. If my system broke, or something changed, I knew it, and management knew it when the report was automatically emailed to them.

Case Study 2: CDDS

I worked three years at Fannie Mae, devising, within a team, an appraisal review process: CDDS. We had a good team of five engineers, and I was given the 'Sales Comparison Approach,' which covered 600 of the 2,100 data elements spread across over 100 data tables, one of which ingested 100 million elements per month. All the elements were optional. All of them, so primary-key dependencies were an ... interesting problem. The upshot was that the Sales Comparison Approach was an impossible task to code, as we were coding it in Java, of course.

What did I do? I coded it in Java.

After I implemented the Maybe type, then the Monad type-class ... in Java.

After I completed the system and tuned it, storing only the values that were present in the submitted forms, my manager reported up the chain that SCA and CDDS would have failed if I had not been there to implement it.

How did I implement it? In Java. I didn't use one for-loop, and my if-statements simply weren't there. I used the Maybe monad to model semi-determinism, lifting the present data to Just x and the absent data ('null') to Nothing, and then I executed actions against the monadic data.
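
Here's the pattern, sketched back in Haskell rather than the Java it shipped in (the type and field names are invented for illustration, not the SCA model):

import Data.Time (Day)

-- Absent ('null') form fields become Nothing, present ones become Just x,
-- and the computation short-circuits on the first absence: no for-loops,
-- no if-statements.
data Comparable = Comparable { salePrice  :: Maybe Double
                             , saleDate   :: Maybe Day
                             , adjustment :: Maybe Double }

adjustedPrice :: Comparable -> Maybe Double
adjustedPrice comp = do
  price <- salePrice comp    -- Nothing here quietly ends the computation
  adj   <- adjustment comp
  return (price + adj)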

Simple. Provable. Implemented. Done.

Oh, and I had written 1,000 of the 1,100 unit test cases. SCA had 1,000 unit test cases; the rest of the system had a total of 100.

My code coverage was fiiiiiiine.

Case Study 3: Sunset Dates

This one was interesting.

I worked at Freddie Mac for a year, and they had a problem: calculate the sunset date for a mortgage based on the most recent date from one of possibly five indicators, each of which changed with each mortgage transaction.
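
In hindsight, the shape of the calculation is small. Here's a much-simplified Haskell sketch, assuming the 'latest of whichever indicators are present' reading; the devil, as the next paragraphs show, was in everything around it:

import Data.Maybe (catMaybes)
import Data.Time (Day)

-- The sunset date is the most recent date among whichever of the five
-- indicators are actually present at this point in the mortgage's life;
-- with none present, there is no sunset date yet.
sunsetDate :: [Maybe Day] -> Maybe Day
sunsetDate indicators =
  case catMaybes indicators of
    []    -> Nothing
    dates -> Just (maximum dates)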

Three different software teams tackled this problem over a period of six months and none of them implemented a system that passed UAT.

I sat down with the UAT tester and kept getting part of the story. I lifted our conversations up into the categorical domain, and then dropped that into a Java implementation (I used both monads and comonads, which I had implemented myself).

It took me two solid months working with this tester and a front-end developer, but we passed UAT and we got the customer and their SMA to sign off on it.

A three-person team, using purely functional programming ... in Java, won that work where standard imperative approaches had failed, over and over again.

Funny story. I was seriously asked on that project: "What's a tuple?"

Case Study 4: Dependency Graphs of Program Requirements ('TMQER')

I can't compare what I wrote, in Haskell, to an alternative system, because the alternative, traditional imperative approach was never essayed. We had a set of 521 requirements for a program, with multiple parent and child dependencies, so it wasn't a tree; it was a graph. So I parsed the requirements document into a Haskell Data.Graph and provided not only a distance matrix, as requested (which is not what the customer wanted at all: it was just what they said and thought they wanted), but also clustering reports of which requirements were the 'heaviest,' having the most dependencies, and which requirements were show-stoppers for how many follow-on requirements.
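
To give a flavor of the Data.Graph side of that (a sketch only; the parsing and the actual clustering reports aren't shown, and the names are mine):

import Data.Graph (graphFromEdges, reachable, vertices)

type Requirement = String

-- Build the dependency graph from (requirement, id, ids-it-feeds) triples
-- and ask, for each requirement, how many follow-on requirements it
-- reaches: a rough 'show-stopper' weight.
showStopperWeights :: [(Requirement, Int, [Int])] -> [(Requirement, Int)]
showStopperWeights reqs =
  [ (name, length (reachable graph v) - 1)   -- minus the vertex itself
  | v <- vertices graph
  , let (name, _, _) = nodeFromVertex v ]
  where (graph, nodeFromVertex, _) = graphFromEdges reqs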

Then I uploaded my Haskell Graph into Neo4J, making heavily-clustered requirements an obvious visual cue. And we won that contract.

The project wasn't attempted in Java. The project was attempted in R, and it couldn't be done. They estimated the graph-manipulation algorithm would be 200 lines of code in R, and they couldn't get it working.

With comonads, I did it in one line of Haskell. One line for a graph deforestation algorithm to get to the bare essentials of what was important to the project. Wanna see it?



How hard was that? In Haskell, a pure functional programming language, not hard at all.

Not only did we win a contract that our competing companies said was impossible, but our VP got wind of this and started vetting my tech with other companies.

We have a contract in the works, right now, using Haskell and Neo4J on AWS, answering questions about fuzzy relations in social networks that a company expert in social engineering needs us to answer.

And I can answer these questions using graph theory and purely functional programming.

Case study 5: the one that got away

Oh, and then there was the one that got away. It had to do with a neural network I built in Mercury, a purely functional logic programming language with Prolog-like syntax. The network classified images into 'interesting' and (mostly) 'not-interesting' (where 'interesting' had very specific, different meanings), and, using a pulse-coupled neural network, it eliminated 99% of the waste images quickly, so that analysts could concentrate on doing work instead of sieving through a deluge of useless images to get to the ones they needed to see.

I built a working prototype and demoed it.

This had never been done before. Ever.

Then, a Big Six came in and said, 'we can do that for you with 250 programmers and Java' and stole the project. After ten years and billions of dollars, they were unable to reproduce my work.

Pure Functional Programming Claims IRL

So, let's do a real-money tally.

ATS-L, in just one month of the three years I maintained it (it is still up and running ten years later, ladies and gentlemen), made $26 million in seizures and rescued three teens being human-trafficked over the border.

CDDS has been in production since 2010, and the appraisals it verified helped Fannie Mae make 62 billion dollars in net profit in one quarter the year it went live, actually contributing to the rescue of Fannie Mae from insolvency.

TMQER rescued from failure a government-run program with a funding price-tag of over 100 million dollars of government (your) taxpayer (your) money. You're welcome.

For sunset dates, I wish I had a dollar amount, but you can estimate for me: three teams of business analysts and software engineers over a six-month period said it couldn't be done (or tried it and failed). I scrapped all that code, wrote the system from first principles (Category Theory), and got it working and approved in two months. You do the math.

... Oh, and then there's my current project. I might actually be able to own this thing. Hmmmm.

So, yes, Virginia,

1. there is a Santa Claus
2. those academics are actually onto something. (Pure) functional programming actually does matter. It actually does allow you to program better, faster, and more cleanly, and with these enhanced skill-sets you become the one they turn to when other teams throw up their hands at an 'impossible' task. And then you deliver, ahead of expectations on both delivery time and budget.

Hm.

Monday, August 3, 2015

July 2015 1HaskellADay Problems and Solutions

July 2015
  • July 31st, 2015: Today's #haskell problem is a timid little thing, encouraged by @BeRewt http://lpaste.net/5965902538434674688 WEAKSAUCE solution to today's #haskell problem http://lpaste.net/8568087437690011648
  • July 30th, 2015: Today's #haskell problem shows we are not in the Cool World https://en.wikipedia.org/wiki/Cool_World ... shucks and oh, well! http://lpaste.net/3949921219851059200 The answer was to use a shell-script, so whatevs! :/ 
  • July 29th, 2015: For today's #haskell problem we get a 'little' Bifunctor-ish ... and learn sharing is caring http://lpaste.net/2449110323500679168 all at the same time :) Workin' that uncurry on the bimap, yo! http://lpaste.net/1674876759692017664
  • July 28th, 2015: Wait. WUT? It's today, already? How did that happen? Today's #haskell problem http://lpaste.net/2307182433419657216 is about structuring data well.
  • July 27th, 2015: We get all zweitletztes in today's #Haskell problem http://lpaste.net/3467966382467973120 So, did you use the word 'zweitletztes' in a sentence today? Did your friends all say, 'Ooh!'? http://lpaste.net/9086454874664599552
  • July 24th, 2015: Graphite! No: Graph it! for today's #haskell problem http://lpaste.net/3543526934652649472
  • July 23rd, 2015: Employee relationships http://lpaste.net/6361591630532706304 ... nah, it ain't like DAT! for today's #haskell problem Peeps be contracting corps they be working at both! 3 #1Liner later we have a solution http://lpaste.net/5632145019319091200 
  • July 22nd, 2015: SHAPES! MORE SHAPES! ... IN SVG! for today's #haskell problem http://lpaste.net/4776707050010836992 ♫ Everything's coming up ellipses (in SVG) ... http://lpaste.net/309926248828633088  
  • July 21st, 2015: LET's talk about Methods, baybee! No. Okay, let's talk about SHAPES! NOW you're talkin'! http://lpaste.net/287420001722302464 … today's #haskell problem SHAPES! SHAPES! SHAPES-SHAPES-SHAPES! http://lpaste.net/6497686044492693504 (ooh! Look! A rectangle!) (although an ellipse is more impressive, but what can you do?)  
  • July 20th, 2015: Weekly #commute-share expenses http://lpaste.net/3714496233948053504. Does this look like today's #haskell problem? Hells, yeah! 
    When everything looks like a monoid ... http://lpaste.net/2994127577280413696
  • July 17th, 2015: Remember when texting you had to press 7 four times to get the letter 'S'? With today's #haskell exercise, you will! http://lpaste.net/105251841490550784 
    BREAKING: Most used word in text between teens is 'lol.' http://lpaste.net/4604658056066760704 ... :/ Hm. Okay. Why am I not surprised?
  • July 16th, 2015: Bob, Alice, Ann and Sean have one thing in common. Their type. Okay, another thing: lenses. Today's #haskell problem http://lpaste.net/8864895915798822912 A solution. Remember, the solution didn't choose the Lens Life, the Lens Life chose it http://lpaste.net/1412556263379697664 #thuglife for #lenses
  • July 15th, 2015: ACK! ACK! ACK! Who cares about primitive recursion when you can TOTES GO AWESOME! in today's #haskell problem http://lpaste.net/1080559337258090496 ACK! That's a lot of memory utilization for the dynamic approach! ACK! Good thing we have a static solution, too! http://lpaste.net/8903612632200642560
  • July 14th, 2015: For today's #haskell problem we actually write a clojure. I mean 'closure.' http://lpaste.net/3938883175375175680 And the solution ... there should be some function that takes a plain function and 'state-ifies' it (modify <.< ) http://lpaste.net/2651395586560884736
  • July 13th, 2015: ♫ "Summertime, and the variadic function is boring!" We'll change that – WITH MONOIDS! – in today's #haskell problem http://lpaste.net/8230300509106339840 ♫ I got that Summertime, Summertime sadness! (and the solution to today's #haskell problem: http://lpaste.net/3589269637730140160)
  • July 10th, 2015: Why are sole-sourced endpoints called 'web'-services? Today #haskell problem we REALLY take on making a _web_ service http://lpaste.net/374473349520162816
  • July 9th, 2015: In today's #haskell exercise we learn that aesthetics are relative http://lpaste.net/7263104093437034496 so 'it's all good!' #EverythingIsGood #SoNothingIs
  • July 8th, 2015: For today's #haskell problem, we are asked to write a web service, and make it Snappy! http://lpaste.net/8642459490819506176 Really? A web service? In Haskell? REALLY? http://lpaste.net/2749821827369926656  
  • July 7th, 2015: For today's #haskell problem we look at how we can transform JSON into ... MORE JSON! Why? BECAUSE WE CAN! http://lpaste.net/4037780681470771200 ♫ Let's talk about Maryland, baybee! http://lpaste.net/8906149300015202304 
  • July 6th, 2015: ♫ Day-by-day ♫ we do a #haskell problem. THIS day we'll do something JSON-y http://lpaste.net/399247231984599040 with webservices as a long-view this week. It's cool when the solution comes with dippin' dots! Geddit? 'Cool'? Dippin' dots? http://lpaste.net/803145685917499392
  • July 3rd, 2015: So, we look at the iTunes XML library for today's #haskell problem, http://lpaste.net/1641625286095142912 ... specifically: yours. #ick
  • July 2nd, 2015: Silly geese need linearization http://lpaste.net/3051751310949875712 ... AND today's #haskell problem! The solution today is found in a Pink Floyd song ... oh, and here, too: http://lpaste.net/4081680603060109312
  • July 1st, 2015: Today's #haskell problem we do not sort quarters for the laundry, no! We sort words by frequency! Because reasons. http://lpaste.net/2869005254278512640 O! The sordid details of sorting quarters as a solution to today's #haskell http://lpaste.net/4041657933132464128