======================================================
1. Building Data Start-ups: Fast, Big, and Focused
======================================================
* 2 parts today:
(i) forces behind big data opportunity
(ii) big data stack and how to compete with in
* building a data start-up is a bit like Sumo Wrestling
* data is heavy, has weight - we need agile strategies to succeed
* today: talk about opportunities for data, strategies for success
* in a nutshell: data start-ups must be fast, big, and focused
================================================
2. The Big Data Opportunity
================================================
* it's a cliche by now: there is a mountain of data in this world
* understanding these forces is critical to data start-up's strategy
<transition>: what are some of the tectonic forces at work?
================================================
3-4. Attack of the Exponentials
================================================
* these are something that i call 'attack of exponentials'
* VCs like curves like
[transition]
* in the past few decades, the cost of storage, CPU, and bandwidth has been exponentially dropping, while network access has shot up
* in 1980, a terabyte of storage cost $14 MILLION - today it's $47 dollars
<transition>: exponential economics, together with two other forces
================================================
5. Intersection of Three Forces
================================================
* ... form the inputs to this massive increase in data, the data singularity
* sensor networks the phones, GPS devices, laptops, and instrumented spimes
* cloud computing has democratized and made computing power & storage a utility
( "even if it turns out that the cloud is actually just some place in Virginia.")
================================================
6-7. Data Value Must Exceed Data Cost
================================================
* the laws of economics have not changed: value must exceed cost
* the upper left side of this graph shows data whose value exceeded
its cost of collecting, storing, and computing over a decade ago
* the human genome data cost $3 billion (in 2000)
[shift slide]
* but as the tide shifts, new classes of data are revealed as being valuable
* the dog genome cost only $30 million (in 2005)
* web log data used to be tossed; now it's cheap enough to collect,
store, and compute over
* i encourage all of you, think of a data source that was previously
not collected, or not kept around, and mull the possibilities
<transition>: with that, i would like to now talk about the emerging stack,
and the strategies for being successful within it
================================================
8-9, 10-11. Success on the Data Stack
================================================
* here is my vision of the emerging big data stack
* at bottom is data - persistence layer - databases - the brawn
* in the middle is analytics - the intelligence layer
* at the top - services, what you all the brains and brawn
[ transitions in quite succession ]
* I argue that data start-ups, to succeed, must have
== FAST data, BIG analytics, and FOCUSED services ==
* let's take each of these in turn,
exploring the competitive axes at each layer
starting from the bottom of the stack, data
================================================
12. FAST
================================================
* as I said before, data is heavy
* being able to move big data quickly is key
* let's pull the data layer out of the stack & examine it
================================================
13. Fast Data
================================================
* so we have the two competitive axes on the data layer
* the first axis is scale: for data, the scaling issue has been solved.
* Hadoop