Tuesday, December 1, 2015

November 2015 1HaskellADay One-liners

  • November 3rd, 2015:
Why is 'a' the standard label for type-variables? If you don't care what the type is, shouldn't the type be 'eh'? #imponderables
  • November 3rd, 2015:
{-# LANGUAGE OverloadedStrings #-}
import Network.HTTP
type URL = String
respBodyAsText :: URL -> IO String
define respBodyAsText
respBodyAsText url = simpleHTTP (getRequest url) >>= getResponseBody
  • November 2nd, 2015: 
You have f :: a -> IO b, g :: b -> IO (), h :: b -> IO Ans
You wish to sequence f, g, h as j :: a -> IO Ans
Define j points-free
Dimitri Sabadie @phaazon_ fmap snd . runKleisli (Kleisli g &&& Kleisli h) . f

November 2015 1HaskellADay Problems and Solutions

November 2015

  • November 30th, 2015: Pride and Prejudice on the command-line? No. Today's #haskell problem: read in a stream http://lpaste.net/7832470045098770432 The solution defines a new Kleisli arrow. http://lpaste.net/4713907846389956608 ... AND Jane Austen prefers the pronouns SHE and HER. So there's that.
  • November 27th, 2015: Simply getting the command-line arguments for #BlackFriday #haskell problem http://lpaste.net/131531689812819968 ...and then there's that bonus. OH NOES! 'Simple' solution, am I right? http://lpaste.net/1418503822422048768
  • November 26th, 2015: A little worker-pool in #haskell wishes you Happy Thanksgiving from the #USA for today's problem: Erlangesque-Haskell http://lpaste.net/2732286163095126016 And today, a #haskell solution says ('sez') "Go get'm Black Friday dealz, yo!" http://lpaste.net/7453476641931526144 (but: caveat emptor!)
  • November 25th, 2015: Today's #haskell problem has a Secret Decoder Ring! http://lpaste.net/317245813698854912 ... as long as you use the HaHaJK-type. BREAKING: SHA1-HASH DECODED using #haskell! http://lpaste.net/7305841715271696384 Reported here first: my bonnie lies over BOTH the ocean AND the sea!
  • November 24th, 2015: For today's #haskell problem we look at parsing URI ... not Andropov https://en.wikipedia.org/wiki/Yuri_Andropov ... not today. http://lpaste.net/3031598688741883904 Today's #haskell URI-parsing exercise makes Yuri (Andropov) SAD and MAD ... Don't worry, Yuri: URIs are just a FAD http://lpaste.net/8338275656215298048
  • November 23rd, 2015: For today's #haskell problem we ride West on ol' Silver declaiming: "JSON! Ho!" http://lpaste.net/7278810874737852416 And the solution allows us to look at JSON and declaim: HA! http://lpaste.net/2880528191972179968
  • November 20th, 2015: Today's #haskell problem comes with honorable mentions and stuff! http://lpaste.net/7575693578471473152 ♫ My heart...beats...fasta!
     
    ... AAANNNNNDDDDD our solution, down to 4.6 seconds from 151 seconds. http://lpaste.net/2479927048856928256 Not a bad start!
  • November 19th, 2015: In today's #haskell problem we say: '@geophf your RID-analyzer is SO efficient!' http://lpaste.net/6802158616863309824 ... NOT! Update: today geophf cries Efficienc-me? No! Efficienc-you! http://lpaste.net/7547436765292789760
  • November 18th, 2015: Today JSON and the Argonauts sail off into the high seas of the RID to adventures beyond the Thunderdome! http://lpaste.net/7016479864345591808 No...wait.
  • November 17th, 2015: Today's #haskell problem generates a report with no title... o! the irony! http://lpaste.net/4139233297970495488 The solution shows Jane Austen getting her Aggro on ... even if just a little bit http://lpaste.net/8111201736003158016 
  • November 16th, 2015: New Regressive Imagery Dictionary/RID(-structure)? That means New-NEW JSON for today's #Haskell problem http://lpaste.net/40452467304955904 And there is the RID, in all its JSON-iferific glory! http://lpaste.net/262135232898007040
  • November 13th, 2015: Today's #haskell problem–Project RIDenberg–classifies a big-ole document with FULL! ON! RID! http://lpaste.net/2340251327956779008 (exclamation mandatory) Today's solution shows us that the RID is as fascinating as ... well: Mr. Wickham. http://lpaste.net/5788646192996941824 (There. I said it.)
  • November 11th, 2015: Today's #haskell problem goes the Full Monty... NO! WAIT! NOT 'MONTY'! WE GO FULL RID! (Regressive Imagery Dictionary) http://lpaste.net/2598821222603030528 ... annnnnndddd this #haskell solution gives us the full RID as a graph http://lpaste.net/3885537396636254208 
  • November 10th, 2015: For today's #haskell problem we look at parsing a (small) document and matching it to a (small) RID http://lpaste.net/237534433320632320 QWERTY-style! Our solution (also) answers that age-old nagging question: "What DOES the fox say?" http://lpaste.net/6858775962385907712 … No, really: I need to know this.
  • November 9th, 2015: LAST week we looked at cataloguing the RID/Regressive Imagery Dictionary. Today's #haskell problem completes that. http://lpaste.net/7116276040808267776 Not that every problem and every solution can be modeled as a Graph, but ... the solution-as-Graph is here: http://lpaste.net/2599543283914899456 *blush
  • November 6th, 2015: Today's #haskell problem looks at the RID as Friends and Relations http://lpaste.net/3230687675795111936 ... actually: it just looks at RID-as-relations Ooh! Pritty Bubblés for a solution to the RID-as-relations problem http://lpaste.net/3885537396636254208 
  • November 5th, 2015: Today's #haskell problem is to JSONify the RID because JSON, and because indentation as semantic-delimiters is weird http://lpaste.net/3601240609232257024 A solution shows PRIMARY-RID-JSON in 420 lines, as opposed to the raw text at over 1800 lines. Cool story, bro! http://lpaste.net/2808636942017626112
  • November 4th, 2015: For today's #Haskell problem please Graph that RID! YEAH! http://lpaste.net/7822865785260343296 I WANT YOUR SEX(y graph of the RID) poses a solution at http://lpaste.net/5625628956930605056
  • November 3rd, 2015: YESTERDAY we used a Python program to map a document to the RID in #Haskell TODAY we map part of the RID to #Haskell http://lpaste.net/4829042411923570688 A solution gives us a Pathway to BACON, http://lpaste.net/3419022958092353536 ... because priorities.
  • November 2nd, 2015: We look at EQ for today's #Haskell problem; not the Class Eq, but the Emotional Quotient of a document. Fun! http://lpaste.net/856570908666494976 runKleisli (from @phaazon_) and (>=>) to the rescue for the solution today http://lpaste.net/5618280040253882368

Tuesday, November 24, 2015

(Really) Big Data: from the trenches

Okay, people throw around 'big data' experience, but what is it really like? What does it feel like to manage a petabyte of data? How do you get your hands around it? What is the magic formula that makes it all work seamlessly, without Bridge Lines opened on Easter Sunday with three Vice Presidents on the line asking for status updates by the minute during an application outage?

Are you getting the feel for big data yet?

Nope.

Big data is not terabytes. 'Normal'/SQL databases like Oracle or DB2 or GreenPlum or whatever can manage those, and big data vendors have no qualms about handling your 'big data' of two terabytes, even as they scoff into your purchase order.

"I've got a huge data problem of x terabytes."

No, you don't. You think you do, but you can manage your data just fine and not even make Hadoop hiccough.

Now let's talk about big data.

1.7 petabytes
2.5 billion transactions per day.
Oh, and growing to SIX BILLION transactions per day.

This is my experience: the vendor had to write a new version of HBase, because their version, the one that could handle 'any size of data, no matter how big,' crashed when we hit 600 TB.

Yeah. Big data.

So, what's it like?

Storage Requirements/Cluster Sizing


1. Your data is bigger than you think it is/bigger than the server farm you planned for it.

Oh, and 0. first.

0. You have a million-USD budget ... per month.

Are you still here? Because that's the kind of money you have to lay out for the transactional requirements and storage requirements you're going to need.

Get that lettuce out.

So, back to 1.

You have this formula, right? From the vendor, the one that says: elastic replication is at 2.4, so for 600 TB you need 1.44 petabytes of space.

Wrong.

Wrong. Wrong. WRONG.

First: throw out the vendors' formulae. They work GREAT for small data in the lab. They suck for big data IRL.

Here's what happens in industry.

You need a backup. You make a backup. A backup is the exact same size as your active HTables, because the HTables are already compressed, in bz2 format: a backup compresses no further.

Double the size of your cluster for that backup-operation.

Not a problem, right? You just shunt that backup off-cluster. You shunt that TWO PETABYTE BACKUP to AWS S3?!?!?

Do you know how long that takes?

26 hours.

Do you know how long it takes to do a restore from backup?

Well, boss, we have to load the backup from S3. That will take 26 hours, then we ...

Boss: No.

me: What?

Boss: No. DR ('disaster recovery') requires an immediate switch-over.

Me: well, the only way to do that is to keep the backup local.

Boss: Okay.

Double the size of your cluster, right?

Nope.

What happens if the most recent backup, that is, today's backup, is corrupted? (You're backing up every day, just before the ETL run and then right after it, because you CANNOT have data corruption here, people. You just can't.)

You have to go to the previous backup.

So now you have two FULL HTable backups locally on your 60-node cluster!

And all the other backups are shunted, month-by-month, to AWS S3.

Do you know how much 2 petabytes, then 4 petabytes, then 6 petabytes in AWS S3 costs ... per month?

So, what to do then?

You shunt the 'old' backups, older than x years old, every month, to Glacier.

Yeah, baby.

That's the first thing: your cluster is 3 times the size of what it needs to be, or else you're dead in one month. Personal experience bears this out. First, you need the wiggle room, or else you stress out the poor nodes of your poor cluster, and you start getting HBase warnings, then critical error messages, about space utilization. Second, you need that extra space when the ETL job loads in a billion-row transaction of the 2.5 billion transactions you're loading in that day.

Been there. Done that.
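
Want the arithmetic? Here's a minimal sketch (Haskell, because that's how I roll) using the numbers in this post; treating a backup as the size of the raw HTable data is my reading, because nobody hands you this part of the formula:

    -- Back-of-the-envelope cluster sizing, in TB. Illustrative only.
    rawTB, replication :: Double
    rawTB       = 600     -- active HTable data
    replication = 2.4     -- the vendor's 'elastic replication' factor

    vendorFormula :: Double      -- what the lab formula says you need
    vendorFormula = rawTB * replication           -- 1,440 TB (1.44 PB)

    withLocalBackups :: Double   -- plus two full local backups, for DR
    withLocalBackups = vendorFormula + 2 * rawTB  -- 2,640 TB

    realWorld :: Double          -- the trenches: 3x what the formula said
    realWorld = 3 * vendorFormula                 -- 4,320 TB

    main :: IO ()
    main = mapM_ print [vendorFormula, withLocalBackups, realWorld]

The gap between the first number and the last is the gap between the lab and that Easter Sunday Bridge Line.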

Disaster Recovery


Okay, what about that DR, that Disaster Recovery?

Your 60-node cluster goes down. Because, look: you're not an idiot, so you didn't build a data center and put all those computers in there yourself; you shunted all that to Amazon and let them handle that maintenance nightmare.

Then the VP of AWS Oregon region contacts you and tells you everything's going down in that region: security patch. No exceptions.

You had a 24/7 contract with 99.999% availability with them.

Sorry, Charlie: you're going down. A hard shutdown. On Thursday.

What are you going to do?

First, you're lucky if Amazon tells you: they usually just do it and let you figure it out on your own. So you have to be ready, at any time, for the cluster to go down for no stated reason.

We had two separate teams monitoring our cluster: 24/7. And they opened that Bridge Line the second a critical warning fired.

And if a user called in and said the application was non-responsive?

Ooh, ouch. God help you. You have not seen panic in ops until you've seen the panic when one user calls in, and you come to find the cluster is down and no warning caught it.

Set up monitoring systems on your cluster. No joke.

With big data, your life? Over.

Throughput


Not an issue. Or rather, it becomes an issue when you're shunting your backup to S3 and the cluster gets really slow. We rolled out to 1,600 users, and we stress-tested it, you know. Nobody had problems during normal operations; it's when you ask the cluster to do something like an ETL run or a backup transfer that you engage all disks of all nodes in reads and writes.

A user request hits all your region servers, too.

Do your backups at 2 am or on the weekends. Do your ETL after 10 pm. We learned to do that.

Maintenance


Amazon is perfect; Amazon is wonderful; you'll never have to maintain nor monitor your cluster again! It's all push-of-the-button.

I will give Amazon this: we had in-house clusters with in-house teams monitoring our clusters, 'round the clock. Amazon made maintenance this: "Please replace this node."

Amazon: "Done."

But you can't ask anything other than that. Your data on that node? Gone. That's it, no negotiations. But Hadoop/HBase takes care of that for you, right? So you're good, right?

Just make sure you have your backup/backout/DR plans in place and tested with real, honest-to-God we're-restarting-the-cluster-from-this-backup data or else you'll never know until you're in hot water.

Vendors


Every vendor will promise you the Moon ... and 'we can do that.' Every vendor believes it.

Then you find out what's what. We did. Multiple times, multiple vendors. Most couldn't handle our big data when push came to shove, even though they promised they could handle data of any size. They couldn't. Or they couldn't handle it in a manageable way: if the ETL process takes 26 hours and it's daily, you're screwed. Our ETL process got down to 1.5 hours, but that was after tuning on their part and on ours: we had four consultants from the vendor in-house every day for a year running. Part of our contract agreement. If you are blazing the big data trail, your vendor is, too: we were inventing stuff on the fly just to manage the data coming in, and to ensure the data came out in quick, responsive ways.

You're going to have to do that, too, with real big data, and that costs money. Lots.

And it also costs you the work of cutting through what vendors are saying to you to what their product can actually handle. Their sales people have their sales pitch, but what really happened is that we had to go through three revisions of their product just so it could be a Hadoop/HBase-compliant database that could handle 1.7 petabytes of data.

That's all.

Oh, and grow by 2.5 billion rows per day.

Which leads to ...

Backout/Aging Data


Look, you have big data. Some of it's relevant today, some of it isn't. You have to separate the two, clearly and daily. If you don't, then a month, two months, two years down the road you're screwed, because you're now dealing with a full-to-the-gills cluster AND having to disambiguate data you've entangled (haven't you?) with the promise of looking at aging data gracefully ... 'later.'

Well, later is right now, and your cluster is full and in one month it's going critical.

What are you going to do?

Have a plan to age data. Have a plan to version data. Have a data-correction plan.

These things can't keep being pushed off to be considered 'later,' because 'later' will be far too late, and you'll end up crashing your cluster (bad), or corrupting your data when you slice and dice it the wrong way, come to find (much, much worse). Oh, and version your backups, tying them to the application version, because when you upgrade your application, your old data gets all screwy under the new application, and your new data format gets all screwy under the old application when somebody puts in a special request to view three-year-old data.
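
What does such a plan even look like? Here's a minimal sketch of an aging-and-versioning policy; the thresholds, and the appVersion tag on each backup, are my own illustrative choices, not anything we actually shipped:

    import Data.Time.Calendar (Day, fromGregorian, diffDays)

    data Tier = Local | S3 | Glacier deriving Show

    -- Every backup records the application version that wrote it, so a
    -- restore pairs old data with an application that understands it.
    data Backup = Backup { taken :: Day, appVersion :: String }

    -- Illustrative thresholds: the newest couple of dailies stay local
    -- for instant DR, a month's worth sits in S3, and everything older
    -- gets defrosted from Glacier on request.
    tierFor :: Day -> Backup -> Tier
    tierFor today b
      | age <= 2  = Local
      | age <= 31 = S3
      | otherwise = Glacier
      where age = diffDays today (taken b)

    main :: IO ()
    main = print (tierFor (fromGregorian 2015 11 24)
                          (Backup (fromGregorian 2015 9 1) "2.3"))  -- Glacier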

Have a very clear picture of what your users need, the vast majority of the time, and deliver that and no more.

We turned a 4+ hour query, one that terminated when it couldn't deliver a 200k+ row result, on GreenPlum...

Get that? 4+ hours to learn your query failed.

No soup for you.

Into a 10-second query against Hadoop HBase that returns 1M+ rows.

Got that?

We changed peoples' lives. What was impossible before for our 1600 users was now in hand in 10 seconds.

But why?

Because we studied all their queries.

One particular query was issued 85% of the time.

We built our Hadoop/HBase application around that, and shunted the other 15% of the queries to other tools that could manage that load.

Also, we studied our users: all their queries were against transactions from within the last month.

We kept two years of data on-hand.

Stupid.

And that two years grew to more, month by month.

Stupider.

We had no graceful data aging/versioning/correcting plans, so, 18 months into production we were faced with a growing problem.

Growing daily.

The users do queries going back up to a month? No problem: here's your data in less than 10 seconds, guaranteed. You want to do research? You put in a request.

Your management has to put their foot down. They have to be very clear what this new-fangled application is delivering and the boundaries on what data they get.

Our management did, for the queries, and our users loved us. Going from: you put in a query, it takes four hours, and only 16 queries are allowed to run against the system at any one time, to: anyone, anywhere can submit a query and it returns right away?

Life-changing, and we did psychological studies as well as user-experience studies, too, so I'm not exaggerating.

What our management did not do is put bounds on how far back you could go into the data set. The old application had a 5-year history, so we thought two years was good. It wasn't. Everybody queried only on today, or yesterday, or, rarely, last week or two weeks ago. We should have said: one month of data, and if you want more, you submit a request to defrost that old stuff. We didn't, and we paid for it in long, long meetings around the problem of how to separate old data from new, and what to do to restore old data if ever (never?) a request for old data came. If we had had a monthly shunt to S3 and then to Glacier, that would have been a well-understood and automatic right-sizing from the get-go.
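
That policy, as code instead of as a meeting, is small. A sketch, with a made-up tag for the 85% query and the one-month boundary we should have enforced:

    import Data.Time.Calendar (Day, fromGregorian, diffDays)

    data Route = FastPath         -- the tuned HBase path, sub-10-seconds
               | OtherTools       -- the other 15% of the queries
               | ResearchRequest  -- defrost the old stuff, answer later
      deriving Show

    data Query = Query { kind :: String, oldest :: Day }

    route :: Day -> Query -> Route
    route today q
      | diffDays today (oldest q) > 31 = ResearchRequest
      | kind q == "daily-rollup"       = FastPath    -- invented tag name
      | otherwise                      = OtherTools

    main :: IO ()
    main = print (route (fromGregorian 2015 11 24)
                        (Query "daily-rollup" (fromGregorian 2015 11 20)))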

You do that for your big data set.

Last Words


Look. There's no cookbook or "Big Data for Dummies" that is going to give you all the right answers. We had to crawl through three vendors to get to one who didn't work out of the box but who could at least work with us, night and day, to get to a solution that could eventually work with our data set. So you don't have to do that. We did that for you.

You're welcome.

But you may have to do that because you're using Brand Y not our Brand X or you're using Graph databases, not Hadoop, or you're using HIVE or you're using ... whatever. Vendors think they've seen it all, and then they encounter your data-set with its own particular quirks.

Maybe, or maybe it all will magically just work for you.

And let's say it does all magically work, and let's say you've got your ETL tuned, and your HTables properly structured for fast in-and-out operations.

Then there's the day-to-day daily grind of keeping a cluster up and running. If your cluster is in-house ... good luck with that. Have your will made out and ready for when you die from stress and lack of sleep. If your cluster is from an external vendor, just be ready for the ... eh ... quarterly, at least ... times they pull the rug out from under you, sometimes without telling you and sometimes without reasonably fair warning. Then it's nights and weekends for you to prep, with all hands on deck and everybody looking at you for answers.

Then, ... what next?

Well: you have big data? It's because you have Big Bureaucracy. The two go together, invariably. That means your Big Data team is telling you they're upgrading from HBase 0.94 to HBase whatever, and that means all your data can go bye-bye. What's your transition plan? We're phasing in that change next month.

And then somebody inserts a row in the transaction, and it's ... wrong.

How do you tease a transaction out of an HTable and correct it?

An UPDATE SQL statement?

Hahaha! Good joke! You so funny!

Tweep: "I wish twitter had an edit function."

Me: Hahaha! You so funny!

(The non-joke answer: in HBase you don't update in place. You write a new cell version, or a delete tombstone, and let compaction settle it, and your ETL and your reports have to understand that.)

And, ooh! Parallelism! We had, count'm, three thousand region servers for our MapReduce jobs. You got your hands around parallelism? Optimizing MapReduce? Monitoring the cluster as the next 2.5 billion rows are processed by your ETL-job?

And then a disk goes bad, at least once a week? Stop the job? Of course not. Replace the disk (which means replacing the entire node because it's AWS) during the op? What are the impacts of that? Do you know? What if two disks go down during an op?

Do you know what that means?

At replication of 2.4, two bad disks means one more disk going bad will get you a real possibility of data corruption.
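
Back-of-the-envelope, under assumptions the post doesn't spell out (mine: 60 nodes of 12 disks each, replicas on distinct, uniformly random disks):

    -- Chance that one block, with r replicas, loses ALL of them when
    -- k of the cluster's n disks are dead at the same time.
    pBlockLoss :: Int -> Int -> Int -> Double
    pBlockLoss n k r =
      product [ fromIntegral (k - i) / fromIntegral (n - i) | i <- [0 .. r - 1] ]

    main :: IO ()
    main = do
      let perBlock = pBlockLoss (60 * 12) 3 2  -- 3 dead disks, 2-replica blocks
      print perBlock          -- ~1.2e-5: tiny ... per block
      print (perBlock * 1e9)  -- over a billion blocks: ~12,000 of them, gone

Tiny odds per block; catastrophe in aggregate.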

How are your backups doing? Are they doing okay? Because if they're sitting on the cluster, now your backups are corrupted, too. Have you thought of that?

Think about that.

I think I've given you enough experience-from-the-trenches to think on when spec'ing out your own big data cluster. Go do that, and (re)discover these problems, and come up with a whole host of fires you have to put out on your own, too.

Hope this helped. Share and enjoy.

cheers, geophf

Monday, November 2, 2015

October 2015 1HaskellADay 1-Liners

  • October 15th, 2015: Matrix-themed problem
    dotProduct :: Num a => [a] -> [a] -> a
    dotProduct a b = sum (zipWith (*) a b)
    point-free-itize the definition
    • Freddy Román @frcepeda dotProduct = sum `dot` zipWith (*) where dot = (.).(.)
  • October 13th, 2015: You're given either fst or snd, but don't know which. Define a function that returns its dual:
    dual :: ((a,a) -> a) -> ((a,a) -> a)
    n.b.: The tuples here have BOTH fst and snd as the same type: a
    Also, a-values have NO typeclass constraints. You CAN NOT use (==) nor (>)
    • Michael Thomas @mjtjunior dual f = f (snd,fst)
    • Francisco Soares Nt @frsoares dual = uncurry . flip . curry
    • Fernando Castor @fernandocastor dual f = \(x,y) -> f (y, x)
      (I hadn't seen @mjtjunior's answer beforehand)
    • Андреев Кирилл @nonaem00 (. Data.Tuple.swap)

Friday, October 30, 2015

October 2015 1HaskellADay Problems and Solutions

October 2015

  • October 29th, 2015: This is a perfect introduction to today's #haskell problem: dynamic predictions http://lpaste.net/6734184072140029952 because cats. And today's #haskell problem has the added benefit of containing the longest epic mid-type-declaration-comment of epic epicness. Epically. ... but what you didn't see for today's #haskell problem is the preparation #fallingasleepoverthekeyboard #again And the 'S' in the anSwer is not for 'S'tatistician, but for geophf waiting for a 'S'uper heroine to give the anSwer http://lpaste.net/8373695753289203712
  • October 28th, 2015: Today's #haskell problem, we DEFINE WHAT 'AVERAGE' IS! Nope. But we do take on predictive analytics! http://lpaste.net/6882676007984168960 So there's that. And here's the predictions-distributions. One day we'll even do ROC-analysis. Or not. http://lpaste.net/4234314648314183680
  • October 27th, 2015: For today's #haskell problem we say "HEY! YOU! GET YOU SOME RANDOM, YO!" and then define a random number generator http://lpaste.net/5973373084290252800 A (random) solution (not really) to yesterday's random (really) #haskell problem http://lpaste.net/3547465428253016064
  • October 26th, 2015: Well, bleh! It only took all day to compose, but here's today's #haskell problem! "Learning R...in Haskell!" http://lpaste.net/2843567468654362624 Okay, that (randomly) hurt! -- one possible solution to this problem is posted at http://lpaste.net/4854130028864077824
  • October 23rd, 2015: Today's #haskell problem, thanks to Jim Webber's keynote at @GraphConnect, is about triadic closure http://lpaste.net/2004709237044805632
  • October 22nd, 2015: Today's #haskell problem is thanks to Jim Webber's keynote at the @neo4j @GraphConnect: WWI Alliances http://lpaste.net/4042786156616613888 WWI-Alliances http://lpaste.net/4413387094903226368 … and as a @neo4j-graph
  • October 16th, 2015: Today's #haskell problem asks you to create MAJYCK! with LENSES over MATRICES using SCIENCE! (lens = magic ICYMI) http://lpaste.net/4391386661800378368
  • October 15th, 2015: Today's #haskell problem is a real (silly) problem: 'efficientize' row and col definitions for Data.Matrix http://lpaste.net/7329174284021006336 Zippidy Doo-Dah! Zippidy day! My, oh, my we've 'efficientized' Matrix RowCol (that scans. Kinda) http://lpaste.net/4076800205951860736
  • October 14th, 2015: For today's #haskell problem we look at multiplying matrices, because SCIENCE! http://lpaste.net/2775082411233378304 Today criss-cross is gonna JUMP-JUMP! ... and sauce the apples http://lpaste.net/6379071958448865280 (What this has to do with matrix-multiplication, I do not know)
  • October 13th, 2015: A rose by any other name would smell as sweet. A matrix-transpose by any other name is still today's #haskell problem http://lpaste.net/7639242339784851456 Today we transpose matrices ... LIKE A GANGSTA! http://lpaste.net/4495861517937278976
  • October 12th, 2015: We go from eh-matrices to ÜBERMATRICES for today's #haskell problem http://lpaste.net/3386266226073272320 And we übered those matrices at http://lpaste.net/4104557754952187904
  • October 8th, 2015: We haven't touched Data.Matrix in a while, and it didn't age well. Let's fix this for today's #haskell problem http://lpaste.net/4256620462181711872 Matrices, REBORN! http://lpaste.net/5942967859750633472 (or at least, prenatal, but we'll get there)
  • October 7th, 2015: So, after all that work making DList Foldable/Traversable/Monadible (eh?) TODAY's #haskell problem relaxes MultiMap http://lpaste.net/2435920706567929856 That MultiMap is now hella-relaxed, yo! http://lpaste.net/8471070633348825088
  • October 6th, 2015: So YESTERDAY we looked at Foldable. @argumatronic said "Step it up: do Traversable!" So for TODAY'S #haskell problem http://lpaste.net/4868288594713772032 So we WuTang Traversible CLANNED that solution! http://lpaste.net/4953464088320540672
  • October 5th, 2015: For today's #haskell problem we go from Monadical to Foldable, thanks to @argumatronic http://lpaste.net/2602721740102565888 Wait. Is 'monadical' a word? DList. Foldable instance. Done. http://lpaste.net/8044707151110733824
  • October 2nd, 2015: For today's #haskell problem we make multimaps fast with difference lists ... OR. DO. WE! http://lpaste.net/8174150842572603392 And today we find out HOW FAST MULTIMAPS ARE WITH DLISTS! (in all caps, no less) http://lpaste.net/341094126416035840
  • October 1st, 2015: Yesterday we made Difference Lists Applicative, for today's #haskell problem we make them monadic http://lpaste.net/1062399498271064064 So, difference lists are monadic now ... so there's that ... http://lpaste.net/4828960021565931520

Wednesday, October 21, 2015

GraphConnect Lightning Talk: Project Planning Troubles? Graph Theory to the Rescue!



Project in Trouble?
GRAPHS TO THE RESCUE
Doug Auclair
for NCI, inc.
Washington, D.C.

The 4-slide Presentation
  • Okay, what're we talkin' 'bout here?
  • So, borin', amirite?
  • The State of Graphs (at NCI)
  • I can haz Linkies, plz


The Project/The Plan
  • US Gov't project: Spreadsheet management
  • The 'need': Okay, they don't know what they need
  • The 'what they don't want': excitement
  • What they got: DIS!


WHERE-D'AT, NCI?
  • Project successes/progress:
    • Gov't client: One delivery, on approved software list
    • Commercial client: Our first, PoC-stage
    • What we need, yo!


I CAN HAZ LINKIES?


MORE THAN 4 PAGES
(What they had...)
(eww, spreadsheets, blah-bla-blah)

WHAT THEY GOT
(but under the covers)
(oooh! graphs!)

WHAT THEY GOT
WHAT THEY WANTED
(sigh, more spreadsheets)
(WITH the answers they needed, s'il vous plaît)

BUT THEY ALSO GOT
(sigh, spreadsheets)
(but showing data in ways they couldn't see before)

WHERE I'M GOING WITH THIS
This...
Unclustered, but colorized-by-algorithm data

... to this
Clustered data, represented colored-by-datum, and expandable-on-demand nodes

Wednesday, October 14, 2015

September 2015 1HaskellADay 1Liners

  • September 18th, 2015:
    Okay so we know the tuple2list function doesn't exist generally, but ...
    t2l :: (a,a) -> [a]
    define t2l points-free
    • JP @japesinator uncurry ((. pure) . (:))
    • 熊井さん @lotz84_ reverse . uncurry ((flip (:)) . (:[]))
    • bazzargh @bazzargh ap[fst,snd].(:[])
  • September 17th, 2015:
    (bifunctor crowd is rubbing their hands here...)
    you have x,y :: [(a, b)]
    you need p,q :: [b]
    define fi such that fi (x,y) ~> (p,q)
    • obadz @obadzz fi = bimap f f where f = map snd ?
      • [revised:] fi = join bimap f where f = map snd
  • September 17th, 2015:
    dist :: (Num a, Floating a) => [a] -> [a] -> a
    dist is the sqrt of the sum of the squared-diffs
    define the curried fn: dist vs
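    • (a possible answer, untested: dist vs = sqrt . sum . map (^2) . zipWith (-) vs)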
  • September 11th, 2015: You have this: doubleplus = join (***) (+)
    define f point-free: f :: Num a => (a, a) -> (a, a) -> (a, a)
    f (a,b) (c, d) = (a+c, b+d)
    • obadz @obadzz (<<*>>) . (<<*>>) ((+), (+))
    • Greg Manning @ghyu f = uncurry bimap . doubleplus
  • September 8th, 2015: Given f :: a -> b -> c define g :: [(a, [b])] -> [c] points-free
    • Freddy Román @frcepeda g = concatMap (uncurry (map . f))
    • Gautier DI FOLCO @gautier_difolco
      concatMap (uncurry (zipWith ($)) . bimap (map f . repeat) id)
    • Daniel Gazard @danielgazard g = map (uncurry f) . concatMap (uncurry ((<$>) . (,)))
  • September 8th, 2015: Given f :: Monoid b => [a] -> b
    define b :: [a] -> (b, [b]) -> (b, [b]) points-free
    such that
    b c (x, r) = (x `mappend` f c, f c:r)
    • Chris Copeland @nopasetic b = uncurry bimap . bimap ((<>) . f) ((:) . f) . join (,)
    • Gautier DI FOLCO @gautier_difolco uncurry bimap . bimap (<>) (:) . join (,) . f
  • September 8th, 2015: Given data X a b c = X a b c
    and f :: a -> b
    define foo :: a -> c -> X a b c
    points-free
    • failingattempt @failingattempt foo = X <*> f
  • September 4th, 2015: summer :: (a, Float) -> (a, Float) -> (a, Float)
    summer (x, a) (_, b) = (x, a + b)
    define summer, points-free
    • Stijn van Drongelen @rhymoid fmap . (+) . snd
  • September 3rd, 2015: suggested by @BeRewt: f :: [a -> b] -> a -> [b] define f points-free
    • \[ c^3 \] @das_kube sequence
    • Matthew Avant @mavant flip $ (<**>) . pure
  • September 3rd, 2015: What is it with y'all's #1Liner-speed-of-response? (I'm lovin' it, actually) sq :: Num a => a -> a sq x = x * x define sq points-free
    • Nicoλas @BeRewt join (*)
    • \[ c^3 \] @das_kube uncurry (*) . (id &&& id) (yes, ugly)
  • September 3rd, 2015: snds :: (a,b) -> (a,b) -> (b,b) define snds, points-free
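    • (a possible answer, untested: snds = (,) `on` snd, with on from Data.Function)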
    • September 3rd, 2015: tuple :: a -> b -> (a, b) define tuple, points-free
      • Nicoλas @BeRewt (,)
      • \[ c^3 \] @das_kube curry $ id *** id
      • Olivier Iffrig @oiffrig tuple = (,) Alternatively, tuple = curry id
    • September 3rd, 2015:
      you have f :: a -> b -> c you want g :: (a -> b -> [c]) -> a -> b -> d convert f to function that is accepted by g
      • obadz @obadzz ((:[]) .) . f
        • Nicoλas @BeRewt I prefer 'return' to '(:[])'
      • Matthew Avant @mavant ((.).(.)) (:[]) f, I suppose
    • September 1st, 2015: You have Functor-instance R a, declared R (a,a,a) You have f :: a -> a -> a define mappend :: R a -> R a -> R a, using f once only
      • obadz @obadzz unless you've got an applicative instance then mappend = liftA2 f

Thursday, October 1, 2015

September 2015 1HaskellADay Problems and Solutions


    • September 30th, 2015: Now, not only DLists are Functors, for today's #haskell problem we make them Applicative! http://lpaste.net/4112595477009006592 Come to find that Applied Applicative DLists taste like Apples http://lpaste.net/792542463930662912
    • September 29th, 2015: For today's #haskell problem we look at DLists as Functors! I know! Exciting! http://lpaste.net/8043489210054737920 and we enfunctorfy DLists ... which are already functions ... http://lpaste.net/2809553771506434048 ... hmmm ... That sounds like DLists are APPLICATIVE!
    • September 28th, 2015: So we indexed a set of rows last week, let's re(re)cluster them, AGAIN! http://lpaste.net/5440430731631263744 for today's #haskell problem. And now we re(re)clustered the data, now with colors! http://lpaste.net/3487731800489328640
    • September 24th, 2015: Okay, yesterday we indexed rows, so, for today's #haskell problem, let's save and load those rows as CSV http://lpaste.net/2517275148160073728 The solution to this problem has no title (oh, well!) http://lpaste.net/6580653869074743296
    • September 23rd, 2015: Data Row, o, Data Row: wherefore art thoust identity? Today's #haskell problem adds unique ids for rows of data http://lpaste.net/348288308305985536 Simply by using Data.Array we get our data in (uniquely-identified) rows http://lpaste.net/2095758966012248064
    • September 22nd, 2015: For today's #haskell problem we go To Infinity ... and Beyond. Yes: we're coding Haskell-on-the-web, yo! http://lpaste.net/1872496352533938176 simpleHTTP makes HTTP GET-requests, ... well: simple! http://lpaste.net/1054132180147503104
    • September 21st, 2015: For today's #haskell problem, we'll fade a circle to black http://lpaste.net/1617260331761926144
    • September 17th, 2015: For today's #haskell problem, we receive data one way, but want to see it in another way. What to do? http://lpaste.net/4132020892534308864 Data. EnCSVified. (that's a word, now) http://lpaste.net/6250288158647255040
    • September 16th, 2015: Today's #haskell problem asks 'why JSONify when you can represent clusters yourself?' Why, indeed! http://lpaste.net/9129398633454632960
    • September 15th, 2015: For today's #haskell problem we 'unJSONify' some, well, JSON http://lpaste.net/2687400576576126976
    • September 14th, 2015: For today's #haskell problem, we relook and recenter clusters from the cluster center http://lpaste.net/8570863670190407680 So, re-en-cluster-i-fied ... uh: clusters! YAY! (with, ooh! pics!)
    • September 11th, 2015: Yesterday we displayed one cluster. For today's #haskell problem, let's display them all! http://lpaste.net/6049110928429940736
    • September 10th, 2015: This past week we've been clustering data, for today's #Haskell problem we look at visualizing one of these clusters http://lpaste.net/6385518627050749952 Cluster: shone! ('Schön'? sure!) http://lpaste.net/6270301353331916800
    • September 9th, 2015: Okay, yesterday we clustered some data. For today's #haskell problem: let's see some clustered results, then! http://lpaste.net/4070941204840185856 It don't mean a thing, ... If it ain't got the (spreadsheet/CSV) schwing. http://lpaste.net/8321027338137501696
    • September 8th, 2015: Today we get to do what all those other peeps do in other programming languages. Today we get to WRITE A PROGRAM! http://lpaste.net/4909208397410205696 wow. I'M K-MEANSIN' ON FIRE TODAY! (okay, geophf, calm down now) A program in Haskell http://lpaste.net/8770140562062835712
    • September 7th, 2015: Happy Labor Day in the U.S.A. Today's #haskell problem is to look at recentering clusters for the K-Means algorithm http://lpaste.net/4755592492567494656 SEMIGROUPOID! (not 'monoid') is the key to the solution for today's #haskell problem http://lpaste.net/9124358884469768192 (ScoreCard has no valid 'zero')
    • September 4th, 2015: Today's #haskell problem we store color-coding for score cards we obtain from rows of data http://lpaste.net/6316202708206354432 And, color-coded score cards ... SAVED! (makes me wanna scream 'SAIL!') http://lpaste.net/1760010119669612544
    • September 3rd, 2015: For today's #haskell problem we look at reclustering the rows of data using K-Means clustering http://lpaste.net/5001877831559413760 K-Means clustering in #haskell (well, for 1 epoch. Something's not right with step 3: recentered) http://lpaste.net/87582513538531328 (and it's DOG-slow)
    • September 2nd, 2015: Drawing information from #BigData is magical, or so says today's #haskell problem http://lpaste.net/8551081789559406592 Ooh! Big Data is o-so-pretty! http://lpaste.net/2956050488883150848 But what does it mean? Stay tuned! 
    • September 1st, 2015: For today's #haskell problem we look (obliquely) at the problem of 'indices as identity' http://lpaste.net/2039451038523588608 What is identity, anyway? 100+ clusters for 3,000 rows? Sounds legit. http://lpaste.net/4068559039884165120

Thursday, September 3, 2015

1Liners August 2015

    • August 20th, 2015: Okay this: \(a,b) -> foo a b c d e Somehow curry-i-tize the above expression (make a and b go away!) Is this Applicative?
      • JP @japesinator uncurry $ flip flip e . flip flip d . flip flip c . foo
      • Conor McBride @pigworker (|foo fst snd (|c|) (|d|) (|e|)|)
    • August 19th, 2015: points-free define unintify: unintify :: (Int, Int) -> (Float, Float) where unintify (a,b) = (fromIntegral a, fromIntegral b)
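      • (a possible answer, untested: unintify = fromIntegral *** fromIntegral, or join (***) fromIntegral)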
    • August 19th, 2015: points-free define timeser: timeser :: (Float, Float) -> (Float, Float) -> (Float, Float) where timeser (a,b) (c,d) = (a*c, b*d)
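      • (a possible answer, untested: timeser = uncurry (***) . join (***) (*))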
    • August 18th, 2015: foo :: (Float, Float) -> (Float, Float) -> Int -> (Float, Float) points-free if: foo (a,b) (c,d) e = ((c-a)/e, (d-b)/e) Arrows? Bimaps?
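      • (a possible stab, untested and not fully points-free: foo p q e = join (***) (/ fromIntegral e) (uncurry (***) (join (***) subtract p) q))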

1Liners July 2015

    • July 29th, 2015: ... on a roll: Point-free-itize
      foo :: (a -> b, a -> b) -> (a, a) -> (b, b)
      foo (f,g) (x,y) = (f x, g y)
      • \[ c^3 \] @das_kube uncurry (***)
    • July 29th, 2015: I can't believe this wasn't a #1Liner already. Point-free-itize dup:
      dup :: a -> (a,a)
      dup x = (x,x)
      • Antonio Nikishaev @lelff join (,)
      • \[ c^3 \] @das_kube id &&& id
    • July 23rd, 2015: define pairsies so that, e.g.: pairsies [1,2,3] = {{1, 2}, {1, 3}, {2, 3}} pairsies :: [a] -> Set (Set a)
      • pairsies list = concat (list =>> (head &&& tail >>> sequence))
    • July 23rd, 2015: define both :: (a -> b) -> (a,a) -> (b,b)
      • Chris Copeland @chrisncopeland point-freer: both = uncurry . on (,)
      • Brian McKenna @puffnfresh both = join bimap
    • July 23rd, 2015: point-free-itize: gen :: Monad m => (m a, m b) -> m (a, b)
      • Bob Ippolito @etrepum gen = uncurry (liftM2 (,))
    • July 17th, 2015: You may have seen this before, but here we go. point-free-itize swap:
      swap :: (a,b) -> (b,a)
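      • (a possible answer, untested: swap = snd &&& fst)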

1Liners Pre-July 2015

    • Point-free define: foo :: (Ord a, Ord b) => [([a], [b])] -> (Set a, Set b)
      • Андреев Кирилл @nonaem00 foo = (Set.fromList . concat *** Set.fromList . concat) . unzip
    • point-free-itize computeTotalWithTax :: Num b => ((a, b), b) -> b computeTotalWithTax ((a, b), c) = b + c
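      • (a possible answer, untested: computeTotalWithTax = uncurry ((+) . snd))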
    • point-free-itize foo (k,v) m = Map.insert k v m with obvs types for k, v, and m.
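      • (a possible answer, untested: foo = uncurry Map.insert)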
    • point-free-itize: shower :: forall a. forall b. Show a => [b -> a] -> b -> [a] shower fns thing = map (app . flip (,) thing) fns
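      • (a possible answer, untested: shower = flip (map . flip id), or sequence in the function monad)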
    • row :: String -> (Item, (USD, Measure)) given csv :: String -> [String] and line is = "apple,$1.99 Lb" hint: words "a b" = ["a","b"] ... all types mentioned above are in today's @1HaskellADay problem at http://lpaste.net/4698665561507233792
    • For Read a, point-free-itize: f a list = read a:list (f is used in a foldr-expression)
      • Or you could just do: map read
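      • (or, keeping f as given, untested: f = (:) . read)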
    • point-free-itize f such that: f a b c = a + b + c
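      • (a possible answer, untested: f = ((+) .) . (+))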

Tuesday, September 1, 2015

August 2015 1HaskellADay Problems and Solutions

August 2015

    • August 31st, 2015: What do 3,000 circles look like? We answer this question in today's #haskell problem http://lpaste.net/2137455526429065216 Ah! Of course! 3,000 circles (unscaled, with numeric indices) look like a mess! Of course! http://lpaste.net/7343132373682225152 
    • August 28th, 2015: For today's #haskell problem: you said you wuz #BigData but you wuz only playin'! http://lpaste.net/5582316999184220160 View and scale 'some' data today. Playahz gunna play ... with ... wait: lenses? WAT? http://lpaste.net/3840444612304961536
    • August 27th, 2015: Today's #haskell problem inspired from twitter: prove the soundness of ME + YOU = FOREVER http://lpaste.net/324064108640993280 Today's #haskell solution is a simpl(istic)e and specific arithmetic (dis)prover http://lpaste.net/181333197114572800 ME+YOU /= FOREVER It ain't happenin'
    • August 26th, 2015: You've heard of The Darkness? Well, today's #haskell problem is all about the Brightness http://lpaste.net/8121218308407033856 Bright eyes! burnin' like fire! http://lpaste.net/6043805058976448512 
    • August 25th, 2015: Well, color me surprised! Today's #haskell problem asks to color by Num(bers) http://lpaste.net/3178349384714682368 And we find out how colors can be numbers, or numbers (Integers) can be colors ... either way http://lpaste.net/2477807113129164800
    • August 24th, 2015: You thought I would say 'Purple' (as in Rain) for today's #haskell problem, but I was only playin' http://lpaste.net/1249981301570666496 #PSA Circles are NOT jerkles ... because ... I don't even know what 'jerkles' ARE! http://lpaste.net/9210044109090193408
    • August 21st, 2015: So, ooh! PRITTY COLOURS YESTERDAY! BUT WHAT DO THEY MEAN? Today's #haskell problem we cluster data http://lpaste.net/4331594901654339584 DO IT TO IT!
    • August 20th, 2015: For today's #haskell problem, now that we have yesterday solved, let's COLOUR the dots! http://lpaste.net/6809671420902113280 Okay, very hack-y but, indeed: colour-y! http://lpaste.net/6080330641977638912 (and the index colours need work, too ...)
    • August 19th, 2015: Let's look at some cells in a bounding box, shall we? for today's #haskell problem http://lpaste.net/1002579649038909440 Share you results here on twitter! Ooh! I see blue dots! K3wl! http://lpaste.net/8209124658184716288 
    • August 18th, 2015: In #hadoop you can store a lot of data...but then you have to interpret that stored data for today's #haskell program http://lpaste.net/4873818871913512960 Today, the SCA/Society for Creative Anachronisms solved the problem. No: SCA/Score Card Analysis ... my bad! http://lpaste.net/4574974476227182592
    • August 17th, 2015: For Today's #haskell problem we learn that bb does NOT mean 'Big Brother' (1984). What DOES it mean, then? Tune in! http://lpaste.net/4021501857770766336 We learn that @geophf cannot come up with interesting title names for lpaste.net so early in the morning! http://lpaste.net/2865245555871711232
    • August 14th, 2015: We find out in today's #haskell problem that if 'kinda-prime' numbers had a taste, they would be 'yummy.' http://lpaste.net/8115330625503756288
    • August 13th, 2015: We generalize to divbyx-rule by using Singapore Maths-laaaaah http://lpaste.net/7999541914976124928 for today's #Haskell problem "divby7 is too easy now-laaaah!" http://lpaste.net/3424362847981797376 ... but there are interesting results for tomorrow's problem
    • August 12th, 2015: Is divby3 a fixpoint? We address this question in today's #haskell problem http://lpaste.net/4853502121825271808 "There, fixed divby3 for ya!" you crow, in on the 'fix'-joke http://lpaste.net/1103072314578173952 *groan *fixpoint-humour
    • August 11th, 2015: Today, I ask you to step up your composable-game, #haskell-tweeps! http://lpaste.net/793067287459397632 Today's div-by #haskell problem So we ♫ 'head for the mountains!' ♫ for our composable solution of divby10 and divby30 http://lpaste.net/7676519782581010432 but leave an open question ...
    • August 10th, 2015: Neat little paper on divisibility rules (http://www.cicm-conference.org/2015/fm4m/FMM_2015_paper_6.pdf) leads to today's #Haskell problem http://lpaste.net/5543352342910337024: divide by 3 rule! A number is divisible by three if the sum of its digits are. PROVED! http://lpaste.net/3154300080412950528
    • August 7th, 2015: For today's #haskell problem we relook yesterday's with Data.Monoid and fold(r) ... for fun(r) http://lpaste.net/7556489465431064576 We're using code from the future (or the bonus answer, anyway) to answer today's #haskell problem http://lpaste.net/1417188853759868928
    • August 6th, 2015: For today's #haskell problem, @elizabethfoss provides us the opportunity to do ... MATHS! http://lpaste.net/4497234580327104512 TALLY HO! Today's solution has Haskell talking with a LISP! (geddit? ;) http://lpaste.net/8042173730291449856
    • August 5th, 2015: Today's #Haskell problem shows us that 'anagramatic' is a word now, by way of @argumatronic http://lpaste.net/7378835938597666816 We learned that #thuglife and #GANGSTA are a bifunctor, but not anagrams with http://lpaste.net/4458286223454109696 @argumatronic 
    • August 4th, 2015: We actually write a PROGRAM for today's #haskell problem that DOES STUFF! WOW! http://lpaste.net/7617945186101886976 #curbmyenthusiasm #no Today we learnt to talk like a pirate ... BACKWARDS! http://lpaste.net/1920179878318047232 ARGGGH! ... no ... wait: !HGGGRA Yeah, that's it.
    • August 3rd, 2015: For today's #haskell problem, we design a Hadoop database ... ya know, without all that bothersome MapReduce stuff http://lpaste.net/3336770284219793408 ;)