Building Data Start-Ups: Fast, Big, and Focused

======================================================
1. Building Data Start-ups: Fast, Big, and Focused
======================================================

* 2 parts today:

(i) forces behind big data opportunity
(ii) the big data stack and how to compete within it

* building a data start-up is a bit like Sumo Wrestling

* data is heavy, has weight - we need agile strategies to succeed

* today: talk about opportunities for data, strategies for success

* in a nutshell: data start-ups must be fast, big, and focused


================================================
2. The Big Data Opportunity
================================================

* it's a cliche by now: there is a mountain of data in this world

* understanding the forces behind it is critical to a data start-up's strategy

<transition>: what are some of the tectonic forces at work?


================================================
3-4. Attack of the Exponentials
================================================

* this is something that I call the 'attack of the exponentials'

* VCs like curves like these

[transition]

* in the past few decades, the cost of storage, CPU, and bandwidth has been dropping exponentially, while network access has shot up

* in 1980, a terabyte of storage cost $14 MILLION - today it's $47
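
To make the storage figures concrete, here is a minimal sketch of the arithmetic behind an exponential price drop. It assumes that "today" means 2011, the year of the talk, and that the decline was a constant yearly multiplier; both are simplifications, and the function names are only illustrative.

```python
import math


def annual_cost_factor(start_cost, end_cost, years):
    """Constant yearly multiplier that takes start_cost to end_cost over `years`."""
    return (end_cost / start_cost) ** (1.0 / years)


def halving_time_years(factor):
    """Years for the cost to halve under a constant yearly multiplier below 1."""
    return math.log(0.5) / math.log(factor)


# Figures from the talk: $14 million per terabyte in 1980, $47 "today".
# Assumption: "today" is 2011, so the span is 31 years.
factor = annual_cost_factor(14_000_000, 47, 2011 - 1980)
print(f"yearly cost multiplier: {factor:.3f}")
print(f"halving time: {halving_time_years(factor):.2f} years")
```

Under those assumptions the price falls by roughly a third every year, which means it halves about every 1.7 years.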

<transition>: exponential economics, together with two other forces

================================================
5. Intersection of Three Forces
================================================

* ... form the inputs to this massive increase in data, the data singularity

* sensor networks: the phones, GPS devices, laptops, and instrumented spimes

* cloud computing has democratized computing power & storage and made them a utility

( "even if it turns out that the cloud is actually just some place in Virginia.")

================================================
6-7. Data Value Must Exceed Data Cost
================================================

* the laws of economics have not changed: value must exceed cost

* the upper left side of this graph shows data whose value exceeded
its cost of collecting, storing, and computing over a decade ago

* the human genome data cost $3 billion (in 2000)

[shift slide]

* but as the tide shifts, new classes of data are revealed as being valuable

* the dog genome cost only $30 million (in 2005)

* web log data used to be tossed; now it's cheap enough to collect,
store, and compute over

* I encourage all of you to think of a data source that was previously
not collected, or not kept around, and mull the possibilities
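
The "value must exceed cost" rule can be sketched as a small calculation: given a cost that shrinks by a constant factor each year, find the first year in which a data set becomes worth keeping. The function name and every number in the example are hypothetical, chosen only to illustrate the shifting tide.

```python
def first_viable_year(start_year, start_cost, yearly_factor, value, horizon=100):
    """Return the first year in which cost <= value, or None within the horizon."""
    cost = start_cost
    for offset in range(horizon + 1):
        if cost <= value:
            return start_year + offset
        cost *= yearly_factor  # cost shrinks (or grows) by a constant factor
    return None


# Hypothetical example: a data set that cost $1,000,000 to collect, store,
# and compute over in 2000, worth $50,000 to its user, with costs halving
# every year. All three numbers are made up for illustration.
print(first_viable_year(2000, 1_000_000, 0.5, 50_000))
```

The value of the data never changes in this sketch; only the falling cost moves it from "tossed" to "worth collecting".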

<transition>: with that, I would now like to talk about the emerging stack,
and the strategies for being successful within it

================================================
8-9, 10-11. Success on the Data Stack
================================================

* here is my vision of the emerging big data stack

* at bottom is data - persistence layer - databases - the brawn

* in the middle is analytics - the intelligence layer

* at the top - services, which use all the brains and brawn

[ transitions in quick succession ]

* I argue that data start-ups, to succeed, must have

== FAST data, BIG analytics, and FOCUSED services ==

* let's take each of these in turn,
exploring the competitive axes at each layer
starting from the bottom of the stack, data

================================================
12. FAST
================================================

* as I said before, data is heavy

* being able to move big data quickly is key

* let's pull the data layer out of the stack & examine it
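
One way to see why data is "heavy" is to estimate how long a terabyte takes to move over a network link. This is a rough sketch: the link speeds are assumptions picked for illustration, and it ignores protocol overhead, latency, and disk throughput.

```python
def transfer_hours(size_bytes, link_bits_per_second):
    """Idealized transfer time in hours: payload bits divided by link rate."""
    return size_bytes * 8 / link_bits_per_second / 3600


TERABYTE = 10**12  # decimal terabyte, in bytes

# Assumed link speeds, for illustration only.
for label, rate in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{label}: {transfer_hours(TERABYTE, rate):.1f} hours per terabyte")
```

Even on an idealized 100 Mbit/s link a single terabyte takes most of a day, which is why moving the computation to the data often wins over moving the data.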

================================================
13. Fast Data
================================================

* so we have the two competitive axes on the data layer

* the first axis is scale: for data, the scaling issue has been solved.

* Hadoop

  • I want to first thank O’Reilly for putting together this event, and all of you for tuning in from around the globe.
    The Data Opportunity in 2 parts:
    I. The Opportunity: Why now, what forces are driving the data explosion
    II. The Technology Stack: What does the Big Data technology stack look like – where are the opportunities and risks?
    Data is heavy.

Transcript

  • 1. Building Data Start-ups:
    Fast, Big, and Focused
    Michael E. Driscoll, CTO, Metamarkets
    @medriscoll
    O’Reilly Strata Online | May 25, 2011
  • 2. The Big Data
    Opportunity
  • 3. The Attack of the Exponentials
  • 4. The Attack of the Exponentials
  • 5. The Intersection of Three Forces
    Yields Higher Volume & Velocity of Data
    exponential economics
    sensor networks
    cloud computing
  • 6. Data Value Must Exceed Data Cost
  • 7. Data Value Must Exceed Data Cost
    ... New Classes of Data are Now Valuable
  • 8. Success on the Data Stack
    Services
    Analytics
    Data
  • 9. Success on the Data Stack
    Fast
    Services
    Analytics
    Fast
    Data
  • 10. Success on the Data Stack
    Fast, Big
    Services
    Big
    Analytics
    Fast
    Data
  • 11. Success on the Data Stack
    Fast, Big, and Focused
    Focused
    Services
    Big
    Analytics
    Fast
    Data
  • 12. #1: Fast
  • 13. Success on the Data Stack
    Fast Data
    [chart: products plotted by speed (batch to real-time) and scale (megabytes to petabytes); legend: free, open-source vs. commercial]
    products shown: Kdb, Netezza, Esper, Vertica, MongoDB, InfoBright, Aster, MySQL, MapR, Greenplum, Postgres, Hadoop
    stack: Services / Analytics / Data
  • 14. Fast Data With Cheap Memory
    1964 – Univac 2k
    $51 million/MB
    2011 – DDR 1GB
    1 cent/MB
    data sources: http://www.sharkyextreme.com & http://www.webservicessummit.com/Trends/TechTrends1/img11.html, plotted with ggplot2
  • 15. #2: Big
  • 16. Success on the Data Stack
    Big Analytics
    [chart: tools plotted by speed (batch to real-time) and scale (megabytes to petabytes); legend: free, open-source vs. commercial]
    tools shown: custom (hardware), Revolution R, R, custom distributed, SAP, SAS, SciPy, SPSS
    stack: Services / Analytics / Data
  • 17. The Promise of Analytics
    extract
    learn
    predict
    DATA
    FEATURES
    MODELS
    “More data usually beats better algorithms.”
  • 18. #3: Focused
  • 19. Success on the Data Stack
    Focused Services
    Focused
    Services
    Analytics
    Data
  • 20. “Real-time, large-scale analytics in a focused vertical.”
    credit: Joe Reisinger, Metamarkets
  • 21. Success on the Data Stack
    Fast, Big, and Focused
    Focused
    Services
    Big
    Analytics
    Fast
    Data
  • 22. Thank You. Questions?
    Michael E. Driscoll, CTO, Metamarkets
    @medriscoll
    O’Reilly Strata Online | May 25, 2011