
Building Data Start-Ups: Fast, Big, and Focused

======================================================
1. Building Data Start-ups: Fast, Big, and Focused
======================================================
* 2 parts today: (i) forces behind the big data opportunity (ii) the big data stack and how to compete within it
* building a data start-up is a bit like Sumo Wrestling
* data is heavy, has weight - we need agile strategies to succeed
* today: talk about opportunities for data, strategies for success
* in a nutshell: data start-ups must be fast, big, and focused

================================================
2. The Big Data Opportunity
================================================
* it's a cliche by now: there is a mountain of data in this world
* understanding these forces is critical to a data start-up's strategy
<transition>: what are some of the tectonic forces at work?

================================================
3-4. Attack of the Exponentials
================================================
* this is something that i call the 'attack of the exponentials'
* VCs like curves like [transition]
* in the past few decades, the cost of storage, CPU, and bandwidth has been dropping exponentially, while network access has shot up
* in 1980, a terabyte of storage cost $14 MILLION - today it's $47
<transition>: exponential economics, together with two other forces

================================================
5. Intersection of Three Forces
================================================
* ... form the inputs to this massive increase in data, the data singularity
* sensor networks: the phones, GPS devices, laptops, and instrumented spimes
* cloud computing has democratized computing power & storage and made them a utility ("even if it turns out that the cloud is actually just some place in Virginia.")

================================================
6-7. Data Value Must Exceed Data Cost
================================================
* the laws of economics have not changed: value must exceed cost
* the upper left side of this graph shows data whose value exceeded its cost of collecting, storing, and computing over a decade ago
* the human genome data cost $3 billion (in 2000) [shift slide]
* but as the tide shifts, new classes of data are revealed as being valuable
* the dog genome cost only $30 million (in 2005)
* web log data used to be tossed; now it's cheap enough to collect, store, and compute over
* i encourage all of you, think of a data source that was previously not collected, or not kept around, and mull the possibilities
<transition>: with that, i would like to now talk about the emerging stack, and the strategies for being successful within it

================================================
8-9, 10-11. Success on the Data Stack
================================================
* here is my vision of the emerging big data stack
* at bottom is data - persistence layer - databases - the brawn
* in the middle is analytics - the intelligence layer
* at the top - services, what you all the brains and brawn [ transitions in quick succession ]
* I argue that data start-ups, to succeed, must have == FAST data, BIG analytics, and FOCUSED services ==
* let's take each of these in turn, exploring the competitive axes at each layer starting from the bottom of the stack, data

================================================
12. FAST
================================================
* as I said before, data is heavy
* being able to move big data quickly is key
* let's pull the data layer out of the stack & examine it

================================================
13. Fast Data
================================================
* so we have the two competitive axes on the data layer
* the first axis is scale: for data, the scaling issue has been solved.
* Hadoop


The Intersection of Three Forces Yields Higher Volume & Velocity of Data: exponential economics | sensor networks | cloud computing

Data Value Must Exceed Data Cost ... New Classes of Data are Now Valuable

Success on the Data Stack: Services | Analytics | Data

Success on the Data Stack (Fast): Services | Analytics | Fast Data

Success on the Data Stack (Fast, Big): Services | Big Analytics | Fast Data

Success on the Data Stack (Fast, Big, and Focused): Focused Services | Big Analytics | Fast Data

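Success on the Data Stack: Fast Data [chart: speed (batch to real-time) vs. scale (megabytes to petabytes); products shown: Kdb, Netezza, Esper, Vertica, MongoDB, InfoBright, Aster, MySQL, MapR, Greenplum, Postgres, Hadoop; legend: free, open-source vs. commercial; stack: Services | Analytics | Data]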

Fast Data With Cheap Memory: 1964 – Univac 2k, $51 million/MB | 2011 – DDR 1GB, 1 cent/MB (data sources: http://www.sharkyextreme.com & http://www.webservicessummit.com/Trends/TechTrends1/img11.html, plotted with ggplot2)
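The two price points on this slide, and the terabyte figures in the speaker notes, imply a steady halving of cost if the decline is treated as a constant exponential. The sketch below works out that implied halving time. The function names are made up for illustration, the constant-rate assumption is a simplification, and reading "today" in the notes as 2011 (the year of the talk) is an assumption.

```python
import math


def halving_time_years(start_cost, end_cost, start_year, end_year):
    """Years for cost to halve, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return (end_year - start_year) / halvings


def annual_decline(start_cost, end_cost, start_year, end_year):
    """Average fractional cost drop per year under the same assumption."""
    years = end_year - start_year
    return 1 - (end_cost / start_cost) ** (1 / years)


# Memory, from the slide: $51 million/MB in 1964, 1 cent/MB in 2011.
memory = (51e6, 0.01, 1964, 2011)

# Storage, from the speaker notes: $14 million/TB in 1980, $47/TB "today".
# Assumption: "today" is taken to be 2011, the year of the talk.
storage = (14e6, 47.0, 1980, 2011)

for name, figures in (("memory", memory), ("storage", storage)):
    print(
        f"{name}: halves every {halving_time_years(*figures):.2f} years, "
        f"about {annual_decline(*figures):.0%} cheaper per year"
    )
```

Under these assumptions both costs halve roughly every one and a half to two years, which is the kind of curve the "attack of the exponentials" slides point at.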

Success on the Data Stack: Big Analytics [chart: speed (batch to real-time) vs. scale (megabytes to petabytes); products shown: custom (hardware), Revolution R, R, custom distributed, SAP, SAS, SciPy, SPSS; legend: free, open-source vs. commercial; stack: Services | Analytics | Data]

The Promise of Analytics: extract | learn | predict; DATA | FEATURES | MODELS. “More data usually beats better algorithms.”

Success on the Data Stack: Focused Services. Focused Services | Analytics | Data

“Real-time, large-scale analytics in a focused vertical.” (credit: Joe Reisinger, Metamarkets)

Thank You. Questions? Michael E. Driscoll, CTO, Metamarkets | @medriscoll | O’Reilly Strata Online | May 25, 2011