The problems of scale, speed, persistence and context are the most important design problems we'll have to deal with during the next decade. Scale because we're creating and recording more data than at any time in human history – much of it of dubious value, but none of it obviously value-less. Speed because data flows now. Ceaselessly. In high volume. Persistence because data has to be stored at multiple latencies, from milliseconds to decades. And context because the context of creation is different from the context of transmission, which is different from the context of use.
There are a lot of red herrings, false premises and just-plain-dementia that get in the way of seeing the problem clearly. We must work through what we mean by “structured” and “unstructured,” what we mean by “big data,” and why we need new technologies to solve some of our data problems. But “new technologies” doesn’t mean reinventing old technologies while ignoring the lessons of the past. There are reasons relational databases survived while hierarchical, document and object databases were market failures – and those technologies may be poised to fail again, 20 years later.
What we believe about data’s structure, schema, and semantics is as important as the NoSQL and relational databases we use. The technologies impose constraints on the real problem: how we make sense of data in order to tell a computer what to do, or to inform human decisions. Most discussions of data and code gloss over the unconscious tradeoffs that are made when selecting these technology handcuffs.
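To make the tradeoff concrete, here is a minimal sketch in Python (using the standard-library sqlite3 and json modules, with hypothetical field names) contrasting the two kinds of handcuffs: a relational store enforces structure at write time, while a document store defers every structural decision to each reader.

```python
import sqlite3
import json

# Relational: the schema is declared up front and enforced on write.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT NOT NULL, ts TEXT NOT NULL)"
)
db.execute(
    "INSERT INTO events (kind, ts) VALUES (?, ?)", ("click", "2013-01-01T00:00:00Z")
)

# A record that violates the declared structure is rejected at write time.
try:
    db.execute(
        "INSERT INTO events (kind, ts) VALUES (?, ?)", (None, "2013-01-01T00:00:01Z")
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the database refuses the malformed record

# Document-style: structure is whatever the writer happened to send;
# the burden of interpretation moves to every reader ("schema on read").
doc = json.loads('{"kind": "click", "extra": {"campaign": "x"}}')
ts = doc.get("ts", "unknown")  # each reader must decide what a missing field means
```

Neither choice is free: the relational schema rejects data you didn't anticipate, while the document approach silently accepts it and pushes the cost onto every consumer. That is the unconscious tradeoff the paragraph above refers to.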