Pages

Thursday, December 5, 2013

Abstract Nonsense, chapter 3: the Tuple


We consider the tuple.

Really, after Unit and Id, the tuple is one of the simplest data structures around. In its base for the tuple is of type

data Tuple a b = (a, b)

Tuple has two fundamental functions:

fst :: (a, b) -> a
snd :: (a, b) -> b

and that's all there is to it, and that's all there is to the tuple.

(really, the above declaration should be:

data (a, b) = (a, b)

but the typeful LHS ('left hand side') declaration being the same syntax (and I say 'the same syntax,' not 'the same semantics') as the instance definition on the RHS ('right hand side') may make some people's head hurt. That's also why I gave the list declaration as:

data list a = nil | (a, list a)

and not the standard syntax of:

data [a] = [] | (a:[a])

There is the question of programs are written to be read as well as to be executed by a machine).

So: simple.

Or.
Is.
It?

(I believe this has become my trademark.)

Of course if you have the Cartesian cross-product of two typed values, then why not three? why not one? why not ... 'n'?

So, of course, the tuple is any number of typed values associated through their cross products, but I like to keep things simple here (and in my code), so I'll just be working with the above given definition of tuple and where I need a 'triple' (heh: 'triple') ('triple' was the term used for monads up until recently) I'll define one, or I'll simply compose tuples, for, after all, any n-relation can be reduced to a set of 2-relations, e.g.:

data Triple a b c = (a, (b, c))
data Quad a b c d = (a, (b, (c, d)))
data Quinz a b c d e = ... etc

So from tuple you have triple, or quad, or quinz, or whatever.

So: tuple.

What can we do with it?

Well, everything, actually. You see most programming languages take the tuple for granted, and call it the parameter-set of functions. Haskell does not do this. Haskell be smart. Haskell be curry(ing) the arguments to functions.

Haskell B. Curry. Geddit? No? Look it up.

So, in programming language X, where X is most likely the language of your choice you invoke a function in the 'standard' way:

foo(bar, baz, ...)

Not so with Haskell ... I mean, not necessarily so (and not even ordinarily so).

Nor even with other Lisp-like languages. When everything's a list, you get a tremendous amount of conceptual freedom and from that freedom comes power.

But ... power? Later.

So, with Haskell, you can force the tupling of arguments to a function:

foo :: (bar, baz) -> quux

But why would you want to do that, requiring all the arguments to a function to be present when you can curry them:

foo :: bar -> baz -> quux

And then apply the arguments individually as needed, building a partial function (the partial function itself being a (functional) value) until you finally apply to get the solution (in this example, the quux-typed value).

So, currying: the 'anti-tuple.'

So we showed a simple, elegant use contra the tuple, but what are the uses of the tuple.

Well, like I said: plentiful, many-ful, many-fold. Manifold.

(I love writing words.) (Bear with me.)

(which means something totally different than 'bare with me.')

One of the primary uses that come to mind for the tuple, besides freely-relating two disconnected values (which, in and of itself is a big, huge win ... ever see a list of parameters to a function where there are obviously-paired values, except they aren't paired, so the code gets all mucked up treating something that isn't as if it were because it should be?), is to 'pretend' that two values are one and then to fool everybody else with this conceit.

Where does this work?

Have you ever needed to return two values from a function?

But of course you can't because a function returns 'a,' 'one,' 'single' value. A function doesn't return two values, and you don't want to have to write a class to package those values together (because why? That's something you may have to look into yourself to answer: if two values are related, why aren't you making that relation explicit in the type system? Types makes things easier, but only when you use them), so what do you do?

Return the tupled pair.

You now have functions that return two values, well-typed, at that, and if they can return two, then can return three, or four, or ... 'n.'

That's one place where tuples shine: 'Oh, I need the value from the list, but I also need the list-index where that value was.' You tuple the index with the value, and your function now returns the data you need to proceed with your work.

But then, once you start using tuples one place, you find yourself starting to see uses for tuples in many other places you never considered before, because you just coded around two disparate values.

'Coding around' things is the algorithmic approach, tuple provides a data-centric solution. You ever notice how greatly algorithm simplifies when the right data types are used?

Tuple.

Examples

Arrows

Of course, the fundamental data type of category theory, the map can be viewed as a list (which is a linked set of tuples (or 'pairs')) of tuples, where each tuple is composed of the key for the first value and the value for the second.

So that gives us another example: one is 'Map.Entry<K, V>' so obviously 'Tuple' I'm wondering why the Java folks just didn't name it 'Duck' since it was quacking like one.

But then ... Arrow.

Arrow captures the entirety of the function f :: a -> b, so if you look at it slightly differently, an arrow, or morphism, or function, is the mapping of all values applicable to the function f from Category A to Category B.

So, one way to look at an arrow f is simple the set of all tuples (a, b) where the 'a's are all possible inputs to f with the 'b's being the corresponding outcomes.

But there are more to arrows than that (and, fear not, this article will not be side-track to an arrow-exclusive): the primary arrow operation is 'first' (arrows are fundamentally, or minimally, defined on 'arr' (which is the 'arrow-maker' function) and 'first').

What does first do?

First supposes that the input has been split from some value a into two (tupled) values (a, a). And first takes an unit function and applies it to the the tuple's first value, leaving the second value unchanged.

So, if you have some value a, and a function f :: a -> b then

first f

does the following

first f (a, a) -> (b, a)

Or, actually, and more generally, this works for any tupled input value so:

first f (a, d) -> (b, d)

works as well.

Exercise: Given than f :: a -> b and first f :: (a, d) -> (b, d), write an implementation of first.

Ordering

We mentioned before that Map.Entry<K, V> is just another name for 'Tuple<K, V>', so then, looking at it this way, either List<Tuple<K, V>> or Set<Tuple<K, V>> is another name for Map<K, V> depending on which search strategy we wish to use.

... with a significant difference: as both the K and V in the tuple are used to compute the tupled instance's identity, then both those data structures permit 'duplicate' entries: a key may be associated with zero, one or more values.

And, here is where the search strategy becomes significant: List has an explicit positional ordering: first comes the first value, next come the second value, etc. This positioning may be exploited for the inherent positional index to a tuple ... or I exploited this inherent indexing, at any rate.

Problem statement: so, have a list of values, rows from a data table that we wish to display in order.

Simple enough: we make the data table entity a comparable type, sort the section, and we're done.

Or.
Are.
We?

The problem here is that ...

Well, there's always a 'the problem here is' when a model encounters the real world and when developed code hits production data.

This particular problem is that there are scenarios where an entity is represented by compound data, and so you need two rows to indicate one entity.  In these cases, you have the same entity id, so what do we do? We can complicate the sort-criterium for the comparison for the entire entity because of this special case, or we can sort simply and handle sorting these two-row entities on a case-by-case basis.

But then we sort the list twice. Once naturally and automatically, and we don't need to think about that sort.

But how do we juxtapose two rows with the same entity id in a list? We can't change the indices on a list pel-mel, can we?

Or.
Can.
We?

Using tuples, we can indeed do just that. If we have the mostly-well-sorted list

[Ent {13, ... Nov 20, asdf},
 Ent {27, ... Aug 3, aega},
 Ent {27, ... null, hjg},
 Ent {34, ... null, asjkl},
 Ent {34, ... Apr 26, aehe},
...]

And we wish to have the 'null' date higher up on the list, as per customer requirements. One way to do that is to add this to the search criteria. But what if the search criteria is already based off of three values of the primary key (and this date is not part of the primary key, because it is nullable)? Do we wish to add this new criterium for these specialized cases?

Yes, we can do that.

Or, as we did in the project, we did another sort, manually. You see, the question comes up, how do we recognized that two rows represent one value, and, when recognized, how do we swap the row order for that one case where we wish to?

You can't say, 'oh, this list index is now less one and the other list is now index plus one'

Not with a straight-up list, anyway, you can't1 change the indices there. But if we make the indices explicit by tupling them with their listed data:

int index = 0;
for(Ent entity : entities) {
    newList.add(OrderedTuple.pair(index++, entity);
}

And then, when we encounter our special case, we verify the order or correct it:

OrderedTuple<Integer, Ent> oldPair =
    OrderedTuple.pair(-1, Ent.prototype());
for(OrderedTuple<Integer, Ent> pair: newList) {
    Ent newEnt = pair.snd();
    Ent oldEnt = oldPair.snd();
    if(newEnt.id == oldEnt.id) {
        if(newEnt.date == null) {
            int pairKey = pair.fst();
            pair.changeKey(pairKey - 1);
            oldPair.changeKey(pairKey);
         }
     }
     oldPair = pair;
}

// now we sort on the revised indices:
Collections.sort(newList);

// and return that sorted entity list (sans the explicit indicies):
List<Ent> revisedList = new ArrayList<Ent>();
for(OrderedTuple<Integer, Ent> pair : newList) {
  revisedList.add(pair.snd());
}
return revisedList;

And that's one way to skin that cat.

So, what's OrderedTuple? It's simply a mutable tuple with an ordering imposed on the keyed value:

@Effectual 
public class OrderedTuple<K extends Comparable<K>, V> 
    extends MutableTuple<K, V>
    implements Comparable<OrderedTuple<K, V>> {
  protected OrderedTuple(K key, V val) { super(key, val); }
  public static <KEY extends Comparable<KEY> VALUE>
    OrderedTuple<KEY, VALUE> pair(KEY k, VALUE v) {
return new OrderedTuple<KEY, VALUE>(k, v);
  }

  // Comparable implementation --------------
  @Override public int compareTo(OrderedTuple<K, V> tup) {
    return fst().compareTo(tup.fst());
  }
}

Where mutable tuple simply decorates the tuple protocol with setters:

@Effectual public class MutableTuple<K, V> extends Tuple<K, V> {
  protected MutableTuple(K k, V v) {
    super(k, v);
    key = new State<K>(k);
    value = new State<V>(v);
  }
  @Override public K fst() { return key.arg(); }
  @Override public V snd() { return value.arg(); }
  @Effectual public void changeKey(k newKey) { key.put(newKey); }
  @Effectual public void changeValue(v newValue) { value.put(newValue); }

  private final State<K> key;
  private final State<V> value;
}

Entanglement

Tuples are a way of pairing or associating things that are not explicitly already paired. As we saw above, normally the indices of a list are implied, but they are there and are associated with the elements of the list. A tuple makes that implied associate explicit. Once something is explicated then it can also be exploited in ways unimagined heretofore.

Another, novel, way I've used tupling is entanglement. Something over here, an event or an object, changes, and that change is supposed to reflect throughout the system. How do we do that?

One way is to hardcode it:

if(x changed) then change y

But this is at the algorithmic level, and things happening procedurally are notoriously hard to track.

We could, instead, make this a structure and realize entanglement as a data type.

This is what I do. I associate an observed object with an action, and that action is to change a value entangled (implicitly) with the observed object.

@Effectual public class Entanglement<T> extends Tuple<T, Effect<T>> {
  private Entanglement(T source, Effect<T> setterF) {
    super(source, setterF);
  }

  // the point of it all: we entangle two objects
  public static <T1> Entanglement<T1> entangle(T1 src, Effect<T1> destF) {
    return new Entanglement<T1>(src, destF);
  }

  // when we observe an object, that act of observation, itself, 
  // changes the destination object via entanglement:
  public Maybe<T> observe() throws Failure {
    return (Maybe<T>) Maybe.nullOr(fst()).liftThenBind(snd());
  }
}

A very, very simple class for a very simple concept.

Where is such a concept used?

Well, as in the above scenario with Ent types, I was collecting a set of three of the most current dates from Ent rows where 'most current' is defined as the the last update date auditing column is greatest on a particular set of entity types.

The thing is that any of these three dates could be null, so if I wholesale replaced dates with the most recent Ent row, then I found I would be nulling out already established dates.

I had to update dates on the most recent non-null dates, and this occurred on a case-by-case basis.

A further complication: one of the dates has an associated duration type-code, so every time I replaced that date, I had to update the duration code, too.

Algorithmically this turned into a nightmarish mess. Three development teams were called in to code the solution, and three teams each, eventually, threw up their hands, declaring this to be an intractable problem.

I used entanglement to solve this problem.

For the associated date-duration-code, I had one entanglement:

Entanglement.entangle(incomingRow.getDateA(), new Effect<Date>() {
  @Override protected void execute(Date newA) throws Failure {
    stored.setDateA(newA);
    int dur = incomingRow.getDurCd();
    if(dur > 0) {
      stored.setDurCd(dur);
    }
  }
}).observe();

For the stand-alone date, I used a simple entanglement:

Entanglement.entangle(incomingRow.getDateB(), new Effect<Date>() {
  @Override protected void execute(Date newDateB) throws Failure {
    stored.setDateB(newDateB);
  }
}).observe();

Then I simply iterated up the result set of the set of entities of concern and I was able to get the correct dates for each entity row that were required.

Easy, peasy!

Using the entanglement type (which, itself, is simply a tuple of observed value and an effect) I overhauled a patchwork system, that actually wasn't working at all, of over one-thousand lines of code with a fifty line replacement that satisfied customer requirements.

My take-away here is that solving problems algorithmically leads to code bloat and complication, but when the problem is reconceptualized into data structures, the problem is simplified and modularized.

Tuple is a data type that does just that: often, in code, two values are implied to be associated, but there is no direct correlation. A tuple makes that association explicit.

κCalc: the Kappa calculus

One more point to cover on tuples.

The λ calculus can be considered to be the marriage of two dual calculi: the κ calculus, which is the calculus of composition,2 and allows functors of the first order only, and the ζ calculus, which is the calculus of application. So, in the κ calculus, values are lifted into tupled types, or dropped to the unit type. An entire calculus exists around the manipulation of tuples, and it has its own arithmetic and everything.

You can do anything with tuples! Tuples are like Tiger Balm that way.

-----

(1) Ever hear a British-English person say the word 'can't' ...? It is to die for! But I digress, and also, in digression, have embarrassed Mr. Sh, my boss.

(2) When I read this paper on the κ calculus, I was so pleased! I had defined my morphism type to be a type that contained only the identity and the composition functions, but not application! And I had wondered if I were doing the right thing. Isn't the whole point of it all to apply a function? Isn't that what a morphism ... 'is'? And, actually, no, that's not the case at all. There do exist morphisms free of the concept of function application, and the κ calculus has arrows where elimination occurs from (de)composition, not from function application.

There exist a whole class of arrows that do no application at all. And I got that correct!

But this topic can be discussed in the article where we consider the morphism.

I suppose I'll start that article with the words: "We consider the morphism."

Hm, I wonder if the map article and the morphism article are one and the same?

1 comment: