
Building Data Start-Ups: Fast, Big, and Focused


======================================================
1. Building Data Start-ups: Fast, Big, and Focused
======================================================

* 2 parts today:

(i) forces behind big data opportunity
(ii) big data stack and how to compete within it

* building a data start-up is a bit like Sumo Wrestling

* data is heavy, has weight - we need agile strategies to succeed

* today: talk about opportunities for data, strategies for success

* in a nutshell: data start-ups must be fast, big, and focused


================================================
2. The Big Data Opportunity
================================================

* it's a cliche by now: there is a mountain of data in this world

* understanding the forces behind it is critical to a data start-up's strategy

<transition>: what are some of the tectonic forces at work?


================================================
3-4. Attack of the Exponentials
================================================

* these are what I call the 'attack of the exponentials'

* VCs like curves like

[transition]

* in the past few decades, the cost of storage, CPU, and bandwidth has dropped exponentially, while network access has shot up

* in 1980, a terabyte of storage cost $14 MILLION - today it's $47
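
A rough sketch of what "exponential" means for the storage figures above. It assumes "today" is 2011 (the year of the talk) and that the cost fell at a constant yearly rate between the two data points, which is a simplification; the function names are illustrative only.

```python
import math

def annual_factor(cost_start, cost_end, years):
    """Constant yearly multiplier that takes cost_start to cost_end."""
    return (cost_end / cost_start) ** (1.0 / years)

def halving_time(factor):
    """Years for the cost to halve, given a yearly multiplier below 1."""
    return math.log(0.5) / math.log(factor)

# Figures from the talk: $14 million per terabyte in 1980, $47 "today".
# Assumption: "today" means 2011.
storage_factor = annual_factor(14_000_000, 47, 2011 - 1980)
storage_halving = halving_time(storage_factor)

print(f"cost multiplies by about {storage_factor:.3f} each year")
print(f"cost halves roughly every {storage_halving:.2f} years")
```

Under these assumptions the cost of a terabyte falls by about a third every year, which is the kind of curve the slide is pointing at.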

<transition>: exponential economics, together with two other forces

================================================
5. Intersection of Three Forces
================================================

* ... form the inputs to this massive increase in data, the data singularity

* sensor networks: the phones, GPS devices, laptops, and instrumented spimes

* cloud computing has democratized computing power & storage and made them a utility

( "even if it turns out that the cloud is actually just some place in Virginia.")

================================================
6-7. Data Value Must Exceed Data Cost
================================================

* the laws of economics have not changed: value must exceed cost

* the upper left side of this graph shows data whose value exceeded
its cost of collecting, storing, and computing over a decade ago

* the human genome data cost $3 billion (in 2000)

[shift slide]

* but as the tide shifts, new classes of data are revealed as being valuable

* the dog genome cost only $30 million (in 2005)

* web log data used to be tossed; now it's cheap enough to collect,
store, and compute over

* i encourage all of you, think of a data source that was previously
not collected, or not kept around, and mull the possibilities
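
One way to mull the possibilities is to ask when a falling cost curve crosses a given value. The sketch below treats the two genome figures from the talk as points on a single cost curve (a simplification, since they are different genomes) and assumes a constant yearly decline. The $1 million value threshold is purely hypothetical, as are the function names.

```python
import math

def annual_cost_factor(cost_start, cost_end, years):
    """Constant yearly multiplier that takes cost_start to cost_end."""
    return (cost_end / cost_start) ** (1.0 / years)

def years_until_viable(cost_now, value, factor):
    """Years until cost <= value if cost is multiplied by factor each year."""
    if cost_now <= value:
        return 0.0
    return math.log(value / cost_now) / math.log(factor)

# Figures from the talk: $3 billion (2000) and $30 million (2005).
factor = annual_cost_factor(3e9, 30e6, 2005 - 2000)

# Hypothetical: a data set worth $1 million to whoever collects it.
hypothetical_value = 1e6
wait = years_until_viable(30e6, hypothetical_value, factor)

print(f"yearly cost multiplier: {factor:.3f}")
print(f"years after 2005 until cost <= value: {wait:.1f}")
```

The point is not the specific number but the shape: once cost falls exponentially, a data source that is far too expensive today can cross the value line within a few years.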

<transition>: with that, i would like to now talk about the emerging stack,
and the strategies for being successful within it

================================================
8-9, 10-11. Success on the Data Stack
================================================

* here is my vision of the emerging big data stack

* at bottom is data - persistence layer - databases - the brawn

* in the middle is analytics - the intelligence layer

* at the top - services, which put all the brains and brawn to use

[ transitions in quick succession ]

* I argue that data start-ups, to succeed, must have

== FAST data, BIG analytics, and FOCUSED services ==

* let's take each of these in turn,
exploring the competitive axes at each layer
starting from the bottom of the stack, data

================================================
12. FAST
================================================

* as I said before, data is heavy

* being able to move big data quickly is key

* let's pull the data layer out of the stack & examine it

================================================
13. Fast Data
================================================

* so we have the two competitive axes on the data layer

* the first axis is scale: for data, the scaling issue has been solved.

* Hadoop

  • Thank you for sharing this presentation - I like that you went beyond saying what big data is and what it is used for, and talked about why we have big data and why it is useful.

    FYI, I cited this slide set in a presentation I prepared - an introduction to big data for marketers [available here: http://www.slideshare.net/acanhoto/cim-2012-big-data]

Building Data Start-Ups: Fast, Big, and Focused

  1. Building Data Start-ups: Fast, Big, and Focused. Michael E. Driscoll, CTO, Metamarkets, @medriscoll. O’Reilly Strata Online | May 25, 2011
  2. The Big Data Opportunity
  3. The Attack of the Exponentials
  4. The Attack of the Exponentials
  5. The Intersection of Three Forces Yields Higher Volume & Velocity of Data: exponential economics, sensor networks, cloud computing
  6. Data Value Must Exceed Data Cost
  7. Data Value Must Exceed Data Cost ... New Classes of Data are Now Valuable
  8. Success on the Data Stack. Stack: Services, Analytics, Data
  9. Success on the Data Stack: Fast. Stack: Services, Analytics, Fast Data
  10. Success on the Data Stack: Fast, Big. Stack: Services, Big Analytics, Fast Data
  11. Success on the Data Stack: Fast, Big, and Focused. Stack: Focused Services, Big Analytics, Fast Data
  12. #1: Fast
  13. Success on the Data Stack: Fast Data. [Chart: speed (batch to real-time) vs. scale (megabytes to petabytes); products marked as free, open-source or commercial: Kdb, Netezza, Esper, Vertica, MongoDB, InfoBright, Aster, MySQL, MapR, Greenplum, Postgres, Hadoop]
  14. Fast Data With Cheap Memory. 1964, Univac 2k: $51 million/MB. 2011, DDR 1GB: 1 cent/MB. Data sources: http://www.sharkyextreme.com & http://www.webservicessummit.com/Trends/TechTrends1/img11.html, plotted with ggplot2
  15. #2: Big
  16. Success on the Data Stack: Big Analytics. [Chart: speed (batch to real-time) vs. scale (megabytes to petabytes); tools marked as free, open-source or commercial: custom (hardware), Revolution R, R, custom distributed, SAP, SAS, SciPy, SPSS]
  17. The Promise of Analytics. extract, learn, predict; DATA, FEATURES, MODELS. “More data usually beats better algorithms.”
  18. #3: Focused
  19. Success on the Data Stack: Focused Services. Stack: Focused Services, Analytics, Data
  20. “Real-time, large-scale analytics in a focused vertical.” Credit: Joe Reisinger, Metamarkets
  21. Success on the Data Stack: Fast, Big, and Focused. Stack: Focused Services, Big Analytics, Fast Data
  22. Thank You. Questions? Michael E. Driscoll, CTO, Metamarkets, @medriscoll. O’Reilly Strata Online | May 25, 2011
