Thursday, October 1, 2015

September 2015 1HaskellADay Problems and Solutions

  • September 30th, 2015: Now, not only DLists are Functors, for today's #haskell problem we make them Applicative! Come to find that Applied Applicative DLists taste like Apples
  • September 29th, 2015: For today's #haskell problem we look at DLists as Functors! I know! Exciting! and we enfunctorfy DLists ... which are already functions ... ... hmmm ... That sounds like DLists are APPLICATIVE!
  • September 28th, 2015: So we indexed a set of rows last week, let's re(re)cluster them, AGAIN! for today's #haskell problem. And now we re(re)clustered the data, now with colors!
  • September 24th, 2015: Okay, yesterday we indexed rows, so, for today's #haskell problem, let's save and load those rows as CSV
  • September 23rd, 2015: Data Row, o, Data Row: wherefore art thoust identity? Today's #haskell problem adds unique ids for rows of data Simply by using Data.Array we get our data in (uniquely-identified) rows
  • September 22nd, 2015: For today's #haskell problem we go To Infinity ... and Beyond. Yes: we're coding Haskell-on-the-web, yo! simpleHTTP makes HTTP GET-requests, ... well: simple!
  • September 21st, 2015: For today's #haskell problem, we'll fade a circle to black
  • September 17th, 2015: For today's #haskell problem, we receive data one way, but want to see it in another way. What to do? Data. EnCSVified. (that's a word, now)
  • September 16th, 2015: Today's #haskell problem asks 'why JSONify when you can represent clusters yourself?' Why, indeed!
  • September 15th, 2015: For today's #haskell problem we 'unJSONify' some, well, JSON
  • September 14th, 2015: For today's #haskell problem, we relook and recenter clusters from the cluster center. So, re-en-cluster-i-fied ... uh: clusters! YAY! (with, ooh! pics!)
  • September 11th, 2015: Yesterday we displayed one cluster. For today's #haskell problem, let's display them all!
  • September 10th, 2015: This past week we've been clustering data, for today's #Haskell problem we look at visualizing one of these clusters Cluster: shone! ('Schön'? sure!) 
  • September 9th, 2015: Okay, yesterday we clustered some data. For today's #haskell problem: let's see some clustered results, then! It don't mean a thing, ... If it ain't got the (spreadsheet/CSV) schwing.
  • September 8th, 2015: Today we get to do what all those other peeps do in other programming languages. Today we get to WRITE A PROGRAM! wow. I'M K-MEANSIN' ON FIRE TODAY! (okay, geophf, calm down now) A program in Haskell
  • September 7th, 2015: Happy Labor Day in the U.S.A. Today's #haskell problem is to look at recentering clusters for the K-Means algorithm. SEMIGROUPOID! (not 'monoid') is the key to the solution for today's #haskell problem (ScoreCard has no valid 'zero')
  • September 4th, 2015: Today's #haskell problem we store color-coding for score cards we obtain from rows of data And, color-coded score cards ... SAVED! (makes me wanna scream 'SAIL!')
  • September 3rd, 2015: For today's #haskell problem we look at reclustering the rows of data using K-Means clustering K-Means clustering in #haskell (well, for 1 epoch. Something's not right with step 3: recentered) (and it's DOG-slow)
  • September 2nd, 2015: Drawing information from #BigData is magical, or so says today's #haskell problem Ooh! Big Data is o-so-pretty! But what does it mean? Stay tuned! 
  • September 1st, 2015: For today's #haskell problem we look (obliquely) at the problem of 'indices as identity' What is identity, anyway? 100+ clusters for 3,000 rows? Sounds legit.
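
For readers who skipped the September 29th and 30th problems, here is a minimal sketch of what 'DLists as Functors, then as Applicatives' can look like. This is not the posted solution: the names `DList`, `unDL`, `fromList`, `toList` and `append` are assumptions made for illustration, and the instances take the simplest route of converting to a plain list and back.

```haskell
-- A difference list wraps a "prepend this list" function.
newtype DList a = DList { unDL :: [a] -> [a] }

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

toList :: DList a -> [a]
toList (DList f) = f []

-- Appending is just function composition.
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

-- fmap goes through an ordinary list (simple, not the most efficient).
instance Functor DList where
  fmap f = fromList . map f . toList

-- The Applicative borrows the list Applicative: every function is
-- applied to every element.
instance Applicative DList where
  pure x    = DList (x :)
  fs <*> xs = fromList (toList fs <*> toList xs)
```

So `toList (fromList [(+ 1), (* 2)] <*> fromList [10, 20])` behaves like the same expression on plain lists.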

Thursday, September 3, 2015

1Liners August 2015

  • August 20th, 2015: Okay this: \(a,b) -> foo a b c d e Somehow curry-i-tize the above expression (make a and b go away!) Is this Applicative?
    • JP @japesinator uncurry $ flip flip e . flip flip d . flip flip c . foo
    • Conor McBride @pigworker (|foo fst snd (|c|) (|d|) (|e|)|)
  • August 19th, 2015: points-free define unintify: unintify :: (Int, Int) -> (Float, Float) where unintify (a,b) = (fromIntegral a, fromIntegral b)
  • August 19th, 2015: points-free define timeser: timeser :: (Float, Float) -> (Float, Float) -> (Float, Float) where timeser (a,b) (c,d) = (a*c, b*d)
  • August 18th, 2015: foo :: (Float, Float) -> (Float, Float) -> Int -> (Float, Float) points-free if: foo (a,b) (c,d) e = ((c-a)/e, (d-b)/e) Arrows? Bimaps?
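
The two August 19th one-liners have no answers listed above. One possible points-free take, offered as a sketch rather than as the 'official' answer, leans on `bimap` from Data.Bifunctor and `join` on functions:

```haskell
import Control.Monad (join)
import Data.Bifunctor (bimap)

-- join bimap f == bimap f f, so the same conversion hits both halves.
unintify :: (Int, Int) -> (Float, Float)
unintify = join bimap fromIntegral

-- join bimap (*) turns (a,b) into ((a*), (b*));
-- uncurry bimap then applies that pair of functions to (c,d).
timeser :: (Float, Float) -> (Float, Float) -> (Float, Float)
timeser = uncurry bimap . join bimap (*)
```

Arrow fans can read `join bimap f` as `f *** f`; it is the same idea as the `both = join bimap` answer from July.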

1Liners July 2015

  • July 29th, 2015: ... on a roll: Point-free-itize
    foo :: (a -> b, a -> b) -> (a, a) -> (b, b)
    foo (f,g) (x,y) = (f x, g y)
    • \[ c^3 \] @das_kube uncurry (***)
  • July 29th, 2015: I can't believe this wasn't a #1Liner already. Point-free-itize dup:
    dup :: a -> (a,a)
    dup x = (x,x)
    • Antonio Nikishaev @lelff join (,)
    • \[ c^3 \] @das_kube id &&& id
  • July 23rd, 2015: define pairsies so that, e.g.: pairsies [1,2,3] = {{1, 2}, {1, 3}, {2, 3}} pairsies :: [a] -> Set (Set a)
    • pairsies list = concat (list =>> (head &&& tail >>> sequence))
  • July 23rd, 2015: define both :: (a -> b) -> (a,a) -> (b,b)
    • Chris Copeland @chrisncopeland point-freer: both = uncurry . on (,)
    • Brian McKenna @puffnfresh both = join bimap
  • July 23rd, 2015: point-free-itize: gen :: Monad m => (m a, m b) -> m (a, b)
    • Bob Ippolito @etrepum gen = uncurry (liftM2 (,))
  • July 17th, 2015: You may have seen this before, but here we go. point-free-itize swap:
    swap :: (a,b) -> (b,a)

1Liners Pre-July 2015

  • Point-free define: foo :: (Ord a, Ord b) => [([a], [b])] -> (Set a, Set b)
    • Андреев Кирилл @nonaem00 foo = (Set.fromList . concat *** Set.fromList . concat) . unzip
  • point-free-itize computeTotalWithTax :: Num b => ((a, b), b) -> b computeTotalWithTax ((a, b), c) = b + c
  • point-free-itize foo (k,v) m = Map.insert k v m with obvs types for k, v, and m.
  • point-free-itize: shower :: forall a. forall b. Show a => [b -> a] -> b -> [a] shower fns thing = map (app . flip (,) thing) fns
  • row :: String -> (Item, (USD, Measure)) given csv :: String -> [String] and line is = "apple,$1.99 Lb" hint: words "a b" = ["a","b"] ... all types mentioned above are in today's @1HaskellADay problem at
  • For Read a, point-free-itize: f a list = read a:list (f is used in a foldr-expression)
    • Or you could just do: map read
  • point-free-itize f such that: f a b c = a + b + c

Tuesday, September 1, 2015

August 2015 1HaskellADay Problems and Solutions

  • August 31st, 2015: What do 3,000 circles look like? We answer this question in today's #haskell problem Ah! Of course! 3,000 circles (unscaled, with numeric indices) look like a mess! Of course! 
  • August 28th, 2015: For today's #haskell problem: you said you wuz #BigData but you wuz only playin'! View and scale 'some' data today. Playahz gunna play ... with ... wait: lenses? WAT?
  • August 27th, 2015: Today's #haskell problem inspired from twitter: prove the soundness of ME + YOU = FOREVER Today's #haskell solution is a simpl(istic)e and specific arithmetic (dis)prover ME+YOU /= FOREVER It ain't happenin'
  • August 26th, 2015: You've heard of The Darkness? Well, today's #haskell problem is all about the Brightness Bright eyes! burnin' like fire! 
  • August 25th, 2015: Well, color me surprised! Today's #haskell problem asks to color by Num(bers) And we find out how colors can be numbers, or numbers (Integers) can be colors ... either way
  • August 24th, 2015: You thought I would say 'Purple' (as in Rain) for today's #haskell problem, but I was only playin' #PSA Circles are NOT jerkles ... because ... I don't even know what 'jerkles' ARE!
  • August 21st, 2015: So, ooh! PRITTY COLOURS YESTERDAY! BUT WHAT DO THEY MEAN? Today's #haskell problem we cluster data DO IT TO IT!
  • August 20th, 2015: For today's #haskell problem, now that we have yesterday solved, let's COLOUR the dots! Okay, very hack-y but, indeed: colour-y! (and the index colours need work, too ...)
  • August 19th, 2015: Let's look at some cells in a bounding box, shall we? for today's #haskell problem Share your results here on twitter! Ooh! I see blue dots! K3wl! 
  • August 18th, 2015: In #hadoop you can store a lot of data...but then you have to interpret that stored data for today's #haskell program Today, the SCA/Society for Creative Anachronisms solved the problem. No: SCA/Score Card Analysis ... my bad!
  • August 17th, 2015: For Today's #haskell problem we learn that bb does NOT mean 'Big Brother' (1984). What DOES it mean, then? Tune in! We learn that @geophf cannot come up with interesting title names for so early in the morning!
  • August 14th, 2015: We find out in today's #haskell problem that if 'kinda-prime' numbers had a taste, they would be 'yummy.'
  • August 13th, 2015: We generalize to divbyx-rule by using Singapore Maths-laaaaah for today's #Haskell problem "divby7 is too easy now-laaaah!" ... but there are interesting results for tomorrow's problem
  • August 12th, 2015: Is divby3 a fixpoint? We address this question in today's #haskell problem "There, fixed divby3 for ya!" you crow, in on the 'fix'-joke *groan *fixpoint-humour
  • August 11th, 2015: Today, I ask you to step up your composable-game, #haskell-tweeps! Today's div-by #haskell problem So we ♫ 'head for the mountains!' ♫ for our composable solution of divby10 and divby30 but leave an open question ...
  • August 10th, 2015: Neat little paper on divisibility rules leads to today's #Haskell problem: divide by 3 rule! A number is divisible by three if the sum of its digits is. PROVED!
  • August 7th, 2015: For today's #haskell problem we relook yesterday's with Data.Monoid and fold(r) ... for fun(r) We're using code from the future (or the bonus answer, anyway) to answer today's #haskell problem
  • August 6th, 2015: For today's #haskell problem, @elizabethfoss provides us the opportunity to do ... MATHS! TALLY HO! Today's solution has Haskell talking with a LISP! (geddit? ;)
  • August 5th, 2015: Today's #Haskell problem shows us that 'anagramatic' is a word now, by way of @argumatronic We learned that #thuglife and #GANGSTA are a bifunctor, but not anagrams with @argumatronic 
  • August 4th, 2015: We actually write a PROGRAM for today's #haskell problem that DOES STUFF! WOW! #curbmyenthusiasm #no Today we learnt to talk like a pirate ... BACKWARDS! ARGGGH! ... no ... wait: !HGGGRA Yeah, that's it.
  • August 3rd, 2015: For today's #haskell problem, we design a Hadoop database ... ya know, without all that bothersome MapReduce stuff ;)

Thursday, August 13, 2015

Yeah, but how do I do that?

So, my article on FP IRL has garnered some interest, and I have been asked, 'Yeah, but how do I get started into FP? How can I start using this stuff at my job?'

So, here's my answer. Here's what I do and how I do it.

So, it depends on how and where you want to start this adventure, yes? The beauty of today is that there are so many resources freely available to let you work on them. The problem is that you're good at what you do already, so it'll be hard to move away from what you know already into the domain where it should be easy but it's actually really, really different and that difference can be frustrating: caveat coder.

Also, there are effete programmers out there that tell you how you should not code.

"Oh, Prolog's not pure and it's not functional. You can't do that."

I don't listen to what I can't do. When somebody says to me: "you can't do that," it really means they are saying: "I can't do that." And I'm not interested in listening to the whining of losers. What I'm interested in is delivering results by coding well. If you want to do that, I want to do that with you. If you want to tell me what I can't do, the door is over there, don't let it hit you on your way out of my life.

Sorry. Not sorry.


I host @1HaskellADay where I post a problem that you can solve in any language you want, but I post the problem, and the solution, in Haskell, every day, Monday through Friday. You can learn FP one day at a time that way, be it Haskell, Scala, Idris, whatever you'd like. You write a solution in Haskell, I retweet your solution so everybody can see you're a Haskell coder.

So. That.

Also, there are contests on-line, some with money prizes (kaggle, right?), that you can take on in language X. You may or may not win, but you'll surely learn what you can do easily, and what comes hard in your language of choice.

The way I learn a language is I don't. Not in the traditional sense, that is, of learning a language's syntax and semantics. If I don't have a 'why' then the 'how' of a language is uninteresting to me.

So I make a 'why' to learn a language, then I learn it.

What I do is I have a real-world problem, and solve it in that language. That's how I learn a language, and yes, so I code wrongly, for a long time, but then I start to code better and better in that language, until I'm an expert in that language.

Reading any and everything on the fundamentals of that language, as I encounter them, helps me a lot, too.

So, as you can see, I'm learning the 'languages' Neo4J and AWS right now (yes, I know, they aren't languages. Thanks). Lots of fun. I'm doing stuff obviously wrong, but the solutions I provide are ones they need at work, and I'm the only one stepping up to the plate and swinging hard and fast enough to keep them funding these adventures in this tech.

Get that. They are paying me at work to learn stuff that I'm having a blast doing. Why?

Maybe it's because when the VP says, 'Hey, I got a problem here for ya,' I come running?

Here's something I do not do.

I DO NOT ASK: 'can I code in X?' because the answer is always: 'No.'

What I do, is code in X and then hand them a result that so wows them, they feed me the next impossible problem to solve, and I get to set the terms. It's like instead of 'doing my job,' I instead take ownership of the company and its problems, looking for the best solution for the company as its owner. And, like an owner, I say what I do and how I do it, because I know what's best for the company in these new waters we're exploring together in partnership.

Try it that way. Don't say 'we should do X' because that's what (in management's eyes) whiny little techies say. No, don't say anything. Just code it in X, deliver the solution, that you demo to the VP and then to the whole company, and get people coming up to you saying, 'Wow. Just wow. How the hell did you do that?'

No kidding: it takes a hell of a lot of courage to be a water-walker. It has for me, anyway, because the risk is there: that you'll fail. Fail hard. Because I have failed hard. But I choose that risk over drowning, doing what they tell me and how they tell me to do it, because I'm just employee number 89030 and my interview was this: "Do you know Java?" "Yes, I know Java." And, yay, I'm a Java programmer, just like everybody else, doing what everybody else does. Yay, so yay. So boring.

I've failed twice in my 25 years in this field, and it wasn't for lack of trying. Twice.

Do you know how many times I have succeeded? I don't. I've lost count. I've saved three teens' lives and that was just in one month. Put a price on that, and that was because I stepped up to the plate and tried, when nobody else would or could. And my other successes, too, and the beauty of my successes is that the whole team won, we all got to keep working on really neat stuff that mattered and got company and customer attention.

And, bonus, "Hey, I've got a business and I need your help."

Three times so far.

Taking the risk leads to success. Success breeds success.

It starts with you, not asking, but just taking that risk, trying, a few times or right off the bat, and succeeding.

And that success, and the feeling you get from knowing you've done something, and you've done something.

They can't take that away from you.


Uploading Data to GrapheneDB


We look at how to upload a set of data (potentially massive) to GrapheneDB. I say '(potentially massive)' because an option, of course, is to enter Cypher, line-by-line in the Neo4J web admin interface, but this becomes onerous when there are larger data sets with complex (or, strike that, even 'simple') relations.

An ETL-as-copy-paste is not a solution for the long term, no matter how you slice it (trans: no matter for how long you have that intern).

So, let's look at a viable long-term solution using a specific example.

Let's do this.

The Data

The data is actually a problem in and of itself, as it is the set of Top 5 securities by group, and it is reported by various outlets, but the reports are (deeply) embedded into ((very) messy) HTML, in the main, or have a nice, little fee attached to them if you want to tap into a direct feed.

As I'm a start-up, I have more time than money (not always true for all start-ups, but that's a good rule of thumb for this one), so, instead of buying a subscription to the top 5s-feed, I built an HTML-scraper in Haskell to download the sets of Top 5s. Scraping HTML is not in the scope of this article, so if you wish to delve into that topic, please review Tagsoup in all its glory.

Okay, prerequisite,

Step 0. scraped data from HTML: done (without explanation. Deal.)

Next, I save the data locally. I suppose I could go into a database instance, such as MySQL, but for now, I have only 50 or so days' worth of data, which I'm happily managing in a file and a little map in memory.

Step 1. store data we care about locally: done

Okay, so we have scraped data, automatically stored away for us. What does it all mean? That's when I got the idea of having a way of visualizing and querying these data. Neo4J was a fit, and GrapheneDB, being DaaS (you just have to know that 'DaaS' means 'Data as a Service'), makes sense for collaborating as a geographically-dispersed team.

Two Options

So, how do I get the data there? We explored two options. One was to load the data into a local neo4j-instance and then snap-restore in the Cloud with that image. That works, the first time, but it seems to me to be rather ponderous, as this is a daily process, and I do not wish to replicate my database locally daily and snap-restore to the Cloud. So, I chose the latter option, which was to build a system that takes the local map, translates that into Cypher queries (to save as graph nodes and edges), then translates those Cypher queries into JSON, and then to create a web client that ships that JSON-of-Cypher-queries over the wire to the targeted web service.

... Neo4J and GrapheneDB are web services that allow REST data queries... (very) helpful, that.

Step 2. Translate the local data to Cypher queries

Okay, this is not going to be a Cypher tutorial. And this is not going to be the Cypher you like. I have my Cypher, you have yours, critique away. Whatevs. Let's get on with it.

The data is of the following structure, as you saw above:

Date -> { ("Mkt_Cap", ([highs], [lows])), ("Price", ([highs], [lows])), ("Volume", [leaders]) }
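
To make that shape concrete, here is a hedged sketch of how it might be typed in Haskell. None of these names come from the actual code base: `Top5s`, `Top5sByDay`, `rows`, the `Day`-as-String choice and the "LEADER"/"LOSER" tags are all assumptions made for illustration.

```haskell
import qualified Data.Map as Map

type Symbol = String
type Day    = String  -- e.g. "2015-08-12"; the date format is an assumption

-- One day's Top 5s, mirroring the structure above.
data Top5s = Top5s
  { mktCap :: ([Symbol], [Symbol])  -- (highs, lows)
  , price  :: ([Symbol], [Symbol])  -- (highs, lows)
  , volume :: [Symbol]              -- leaders only
  } deriving (Eq, Show)

-- The "little map in memory": one entry per day.
type Top5sByDay = Map.Map Day Top5s

-- Flatten one day's entry into (group, relation, symbol) rows,
-- which is a convenient shape for generating graph edges later.
rows :: Top5s -> [(String, String, Symbol)]
rows t =
  tag "Mkt_Cap" (mktCap t)
    ++ tag "Price" (price t)
    ++ [ ("Volume", "LEADER", s) | s <- volume t ]
  where
    tag g (his, los) =
      [ (g, "LEADER", s) | s <- his ] ++ [ (g, "LOSER", s) | s <- los ]
```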

And we wish to follow this structure in our graph-relational data. Let's look at the top-tier of the data-structure:

You see I've also added a 'Year'-node. I do this so I can anchor around a locus of days if I wish to explore data across a set of days.

So, okay, from there, do I then create grouping nodes of 'leaders' and 'losers' for the categorization of stocks? This gets into the data-modelling question. I chose to label the relations to the stocks as such instead of creating grouping nodes. There're tradeoffs in these modeling decisions, but I'm happy with the result:

The module that converts the internal data structures is named Analytics.Trading.Web.Upload.Cypher. Looking at that module you see it's very MERGE-centric. Why? Here's why:

What you see here is that symbols, such as, well, primarily $AAPL, and others like $T and $INTC find themselves on the Top 5s lists, over and over again. By using MERGE we make sure the symbol is created if this is its first reference, or linked-to if we've seen it before in this data set.

In this domain, MERGE and I are like this: very close friends.
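
Here is a rough sketch of what MERGE-centric generation can look like. This is not the code in Analytics.Trading.Web.Upload.Cypher: the function names, the node labels (`Group`, `Symbol`), the property names and the relation label are assumptions, and the sketch also assumes the strings contain no single quotes that would need escaping.

```haskell
-- Three MERGE clauses for one stock symbol: the group node for that day,
-- the symbol node, and the relation between them. MERGE matches an
-- existing pattern or creates it, so re-running does not duplicate $AAPL.
mergeSymbol :: String -> String -> String -> String -> [String]
mergeSymbol day grp rel sym =
  [ "MERGE (g:Group { name: '" ++ grp ++ "', date: '" ++ day ++ "' })"
  , "MERGE (s:Symbol { name: '" ++ sym ++ "' })"
  , "MERGE (g)-[:" ++ rel ++ "]->(s)"
  ]

-- One Cypher query per symbol: the clauses joined by spaces.
symbolQuery :: String -> String -> String -> String -> String
symbolQuery day grp rel sym = unwords (mergeSymbol day grp rel sym)
```

For example, `symbolQuery "2015-08-12" "Price" "LEADER" "AAPL"` yields a single query string made of the three clauses above.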

Okay, Map-to-Cypher, it's a very close ... well, mapping, because the relational calculus and domain-to-co-domain-mappings have a high correspondence.

I'm at a 'disadvantage' here: I come to functional programming from a Prolog-background: I think of functional data structures relationally, so, usually, mappings of my functional data structures fall very nicely into graphs.

I don't know how to help you with your data structures, especially if you've been doing the Java/SQL object/relation-mapping stuff using JPA ... I mean, other than saying: 'Switch to ... Haskell?' Sorry.

Okay, so we have a set of Cypher queries, grouped in the following structures:

Date -> [groups] where groups are Mkt_Cap, Volume, and Price

Then, for each group for that date

group -> Leader [symbols] -> Losers [symbol]

So we have (with the three groups) four sets of Cypher queries, each grouped Cypher query weighing in with thirty MERGE statements (three MERGE statements for each stock symbol node). Not bad.

How do we convert this list of grouped Cypher queries into JSON that Neo4J understands?

Two things make this very easy.

  1. The JSON-structure that Neo4J accepts is very simple: it is a group of "statements", each of which is a single Cypher-"statement" element. Very simple JSON! (thank you, Neo4J!)
  2. There is a module in Haskell, Data.Aeson, that facilitates converting from data structures in Haskell into JSON-structure, so the actual code to convert the Cypher queries reduces to three definitions:

With that, we have Cypher queries packaged up in JSON.
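
For a feel of the resulting payload, here is a dependency-free sketch. To be plain about the swap: the real code uses Data.Aeson, whereas this sketch hand-rolls the JSON text so it needs nothing beyond base. The names `escape` and `statementsJSON` are invented here.

```haskell
import Data.List (intercalate)
import Text.Printf (printf)

-- Escape a string for use inside a JSON string literal.
escape :: String -> String
escape = concatMap esc
  where
    esc :: Char -> String
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc '\n' = "\\n"
    esc '\t' = "\\t"
    esc c
      | c < ' '   = printf "\\u%04x" (fromEnum c)  -- other control characters
      | otherwise = [c]

-- Wrap a list of Cypher queries as {"statements":[{"statement":"..."}, ...]}.
statementsJSON :: [String] -> String
statementsJSON qs =
  "{\"statements\":[" ++ intercalate "," (map stmt qs) ++ "]}"
  where
    stmt q = "{\"statement\":\"" ++ escape q ++ "\"}"
```

In the Neo4J 2.x REST API, a document of this shape is what gets POSTed to the transactional endpoint (`/db/data/transaction/commit`); that this is the endpoint used in this system is an assumption.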

Step 3: SHIP IT!

So now that we have our data set, converted to Cypher, and packaged as JSON, we want to send it to GrapheneDB. Before I went right to that database (I could have, but I didn't), I tested my results on a Neo4J instance running on my laptop, ran the REST call and verified the results. BAM! It worked for the one day I uploaded.

After I got that feel-good confirmation in the small, I simply switched the URL from localhost:7474 to the URL GrapheneDB provides in the "Connection" tab, and voilà: we have data here!

(lots of it!)

Step n: Every day

So now that I have the back-data uploaded, I just simply run my scraper->ETL-over-REST->GrapheneDB little system and I have up-to-the-day Top 5s stock securities for my analysis, on the Cloud.