Storm – Beautiful Data

Our brain is wired to experiencing the world as one consistent model of reality. New data we interpret either as confirmation of the model or as an update to replace one of its parameters with a new value. Our sensory organs also reduces the incoming stimuli, drop most of the impressions, preprocess what is identified as signals to simple patterns that are propagated to our mind. What we remember as the edge of our table – a straight line, limiting the surface – was in fact received as a fine grid of multicoloured pixels by our retina. For sake of saving computation and storage power, and to keep a stable, consistent view, we forsake the richness of information. And we use to build our data bases to work exactly that way.

One of the realy disruptive shifts in our business is imo to break this paradigm: “Make your source of truth immutable.” Nathan Marz (who has just yesterday left the Twitter team) tells us to have a base layer of incoming data. Nothing here gets updated or changed. New records are just attached. From such an immutable data source, we can reconstruct the state of our data set at any given point of time in the past; even if someone messes with the database, we could roll back without the need to reset everything. This rather unstructured worm is of course not fit to get access to information with low latency. In Marz’ paradigm it is the “source of truth”, is a repository to feed into a second level of more “classic” data bases that provides precalculated, prepopulated tables that can be accessed at real time.

What Nathan Marz advocates as a way to make data bases more tolerant against human fault entails in fact a deep, even philosophical perspective. With the classic database we would keep master data and transaction data in different tables. We would regard a master record as something that should provide one consistent view on the object recorded. Take a clients data base of some retailer: Address or payment information we would expect to be a static property of the client, to be kept “up to date” – if the person moves, we would update the record. Other information we would even regard as unchangeable: Name, gender or birthday for example. This is exactly how we would be looking at the world if we had remained at the state of the naive phenomenology of the early modern ages. Concepts like “identity” of a human being reflect this integral perspective of an object with master properties – ideas like “character” (individual or even bound to ethnicity or nation) stem from this taking an object as in reality being independent from the temporal state of data that we could comprehend. (Please excuse my getting rather abstract now.)

Temporal logic was developed not in philosophy but rather in computer science. The idea is, that those apodictical clauses of “true” or “false” – tertium non datur” – that we are used to deal with in propositional calculus since the time of the ancient Greeks, would not be correctly applicable to real world systems like people interacting with other in time. – The “classic” example would be a sentence like “I am hungry” that would never necessaryly be true or false because it would depend on the specific circumstances at that point in time when I would have stated it; nevertheless it should be regarded as a valid property to describe me at that time.

In such way, the immutable database might not reflect our gut feeling about reality, but it certainly is a far more accurate “source of truth”, and not only because it is more tolerant against human operators tampering with the data.

With the concept of one immutable source of truth, this “master record” is just a view on the data at one given point in time. We would finally have “the forth dimension” in our data.

Tag: Storm

The immutability paradigm – or: how to add the “fourth dimension” to our data