1. Building Data Start-ups: Fast, Big, and Focused

======================================================

* 2 parts today:

(i) forces behind big data opportunity

(ii) big data stack and how to compete within it

* building a data start-up is a bit like Sumo Wrestling

* data is heavy, has weight - we need agile strategies to succeed

* today: talk about opportunities for data, strategies for success

* in a nutshell: data start-ups must be fast, big, and focused

================================================

2. The Big Data Opportunity

================================================

* it's a cliche by now: there is a mountain of data in this world

* understanding these forces is critical to a data start-up's strategy

<transition>: what are some of the tectonic forces at work?

================================================

3-4. Attack of the Exponentials

================================================

* these are something that I call the 'attack of the exponentials'

* VCs like curves like

[transition]

* in the past few decades, the cost of storage, CPU, and bandwidth has been exponentially dropping, while network access has shot up

* in 1980, a terabyte of storage cost $14 MILLION - today it's $47
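
As a rough check on the storage figures above, the implied yearly price decline can be worked out in a few lines of Python. This is only a sketch, and it rests on one assumption not stated in the notes: that "today" means 2011, the year of the talk. The variable names are illustrative.

```python
import math

# Figures quoted in the notes: price of one terabyte of storage.
cost_1980 = 14_000_000.0  # dollars, 1980
cost_today = 47.0         # dollars, "today"

# Assumption: "today" is 2011, the year the talk was given.
years = 2011 - 1980

# Constant yearly multiplier that turns cost_1980 into cost_today.
annual_factor = (cost_today / cost_1980) ** (1 / years)

# Time for the price to halve at that constant rate.
halving_years = math.log(2) / -math.log(annual_factor)

print(f"price falls about {1 - annual_factor:.0%} per year")
print(f"price halves roughly every {halving_years:.1f} years")
```

Under these assumptions the price falls by roughly a third each year, which is what makes the curve exponential rather than merely steep.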

<transition>: exponential economics, together with two other forces

================================================

5. Intersection of Three Forces

================================================

* ... form the inputs to this massive increase in data, the data singularity

* sensor networks: the phones, GPS devices, laptops, and instrumented spimes

* cloud computing has democratized computing power & storage and made them a utility

( "even if it turns out that the cloud is actually just some place in Virginia.")

================================================

6-7. Data Value Must Exceed Data Cost

================================================

* the laws of economics have not changed: value must exceed cost

* the upper left side of this graph shows data whose value exceeded its cost of collecting, storing, and computing over a decade ago

* the human genome data cost $3 billion (in 2000)

[shift slide]

* but as the tide shifts, new classes of data are revealed as being valuable

* the dog genome cost only $30 million (in 2005)

* web log data used to be tossed; now it's cheap enough to collect, store, and compute over

* I encourage all of you: think of a data source that was previously not collected, or not kept around, and mull the possibilities
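
The value-must-exceed-cost rule can be sketched as a small calculation: given a cost that keeps falling, when does a data source become worth keeping? The decline rate below is borrowed from the two genome figures in the notes (treating two different projects as points on one trend is an assumption), and the $1 million value is purely hypothetical, as is the helper name `years_until_worthwhile`.

```python
# Genome sequencing costs quoted in the notes.
human_genome_cost = 3_000_000_000.0  # dollars, 2000
dog_genome_cost = 30_000_000.0       # dollars, 2005

# Assumption: the two projects lie on one steady cost curve,
# so the yearly cost multiplier is the fifth root of the 5-year ratio.
annual_factor = (dog_genome_cost / human_genome_cost) ** (1 / 5)

# Hypothetical: what the data is worth to whoever collects it.
data_value = 1_000_000.0


def years_until_worthwhile(cost, value, factor):
    """Count whole years until a falling cost drops below a fixed value."""
    years = 0
    while cost >= value:
        cost *= factor
        years += 1
    return years


breakeven = years_until_worthwhile(human_genome_cost, data_value, annual_factor)
print(f"cost drops below value after {breakeven} years")
```

The point of the sketch is the shape of the argument, not the numbers: any fixed value is eventually overtaken by an exponentially falling cost, so the open question for a start-up is which data sources cross that line next.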

<transition>: with that, I would like to now talk about the emerging stack, and the strategies for being successful within it

================================================

8-9, 10-11. Success on the Data Stack

================================================

* here is my vision of the emerging big data stack

* at bottom is data - persistence layer - databases - the brawn

* in the middle is analytics - the intelligence layer

* at the top - services, what you do with all the brains and brawn

[ transitions in quick succession ]

* I argue that data start-ups, to succeed, must have

== FAST data, BIG analytics, and FOCUSED services ==

* let's take each of these in turn,

exploring the competitive axes at each layer

starting from the bottom of the stack, data

================================================

12. FAST

================================================

* as I said before, data is heavy

* being able to move big data quickly is key

* let's pull the data layer out of the stack & examine it

================================================

13. Fast Data

================================================

* so we have the two competitive axes on the data layer

* the first axis is scale: for data, the scaling issue has been solved.

* Hadoop

Published in:
Technology

License: CC Attribution License

- 1. Building Data Start-ups: Fast, Big, and Focused. Michael E. Driscoll, CTO, Metamarkets (@medriscoll). O’Reilly Strata Online | May 25, 2011
- 2. The Big Data Opportunity
- 3. The Attack of the Exponentials
- 4. The Attack of the Exponentials
- 5. The Intersection of Three Forces Yields Higher Volume & Velocity of Data: exponential economics, sensor networks, cloud computing
- 6. Data Value Must Exceed Data Cost
- 7. Data Value Must Exceed Data Cost ... New Classes of Data are Now Valuable
- 8. Success on the Data Stack: Services, Analytics, Data
- 9. Success on the Data Stack: Fast. Services, Analytics, Fast Data
- 10. Success on the Data Stack: Fast, Big. Services, Big Analytics, Fast Data
- 11. Success on the Data Stack: Fast, Big, and Focused. Focused Services, Big Analytics, Fast Data
- 12. #1: Fast
- 13. Success on the Data Stack: Fast Data. [Chart: data layer products plotted by speed (batch to real-time) and scale (megabytes to petabytes), marked as free, open-source or commercial: Kdb, Netezza, Esper, Vertica, MongoDB, InfoBright, Aster, MySQL, MapR, Greenplum, Postgres, Hadoop]
- 14. Fast Data With Cheap Memory: 1964 – Univac 2k, $51 million/MB; 2011 – DDR 1GB, 1 cent/MB. Data sources: http://www.sharkyextreme.com & http://www.webservicessummit.com/Trends/TechTrends1/img11.html, plotted with ggplot2
- 15. #2: Big
- 16. Success on the Data Stack: Big Analytics. [Chart: analytics layer products plotted by speed (batch to real-time) and scale (megabytes to petabytes), marked as free, open-source or commercial: custom (hardware), Revolution R, R, custom distributed, SAP, SAS, SciPy, SPSS]
- 17. The Promise of Analytics: extract, learn, predict; DATA, FEATURES, MODELS. “More data usually beats better algorithms.”
- 18. #3: Focused
- 19. Success on the Data Stack: Focused Services. Focused Services, Analytics, Data
- 20. “Real-time, large-scale analytics in a focused vertical.” Credit: Joe Reisinger, Metamarkets
- 21. Success on the Data Stack: Fast, Big, and Focused. Focused Services, Big Analytics, Fast Data
- 22. Thank You. Questions? Michael E. Driscoll, CTO, Metamarkets (@medriscoll). O’Reilly Strata Online | May 25, 2011

FYI, I cited this slide set in a presentation I prepared - an introduction to big data for marketers [available here: http://www.slideshare.net/acanhoto/cim-2012-big-data]