Big data, why now?

This talk presents an economic explanation for the explosive growth of big data techniques.

Published in Technology, Business
  • Why is big data so fashionable with big and small companies from different industries? What has suddenly changed?
  • Google searches are up 10x over just four years ago.
  • Hadoop use is exploding. This example shows job trends for Hadoop, which is further evidence that you should pay attention during this talk.
  • But we have seen constant growth for a long time. Simple growth would only explain some kinds of companies (probably big ones) starting with big data, followed by slow adoption elsewhere. Databases started with big companies and took 20 years or more to reach everywhere because the need exceeded the cost at different times for different companies. The internet, on the other hand, largely happened to everybody at the same time, so it changed things in nearly all industries at all scales nearly simultaneously. Why is big data exploding right now, and why is it exploding at all?
  • The different kinds of scaling laws have different shapes, and I think that shape is the key.
  • The value of analytics always increases with more data, but the rate of increase drops dramatically after an initial quick increase.
  • In classical analytics, the cost of doing analytics increases sharply with scale.
  • The result is a net value that has a sharp optimum in the area where value is increasing rapidly and cost is not yet increasing so rapidly.
  • New techniques such as Hadoop result in linear scaling of cost. This is a change in shape and it causes a qualitative change in the way that costs trade off against value to give net value. As technology improves, the slope of this cost line is also changing rapidly over time.
  • This next sequence shows how the net value changes with linear cost models of different slopes.
  • Notice how the best net value has jumped up significantly.
  • And as the line approaches horizontal, the highest net value occurs at dramatically larger data scale.
  • With exponential growth, a constant interval of time implies a constant factor of growth. Thus the accumulation of all of history before 10 time units ago is less than half the accumulation in the last 10 units alone. This is true at all times.
  • Startups use this fact to their advantage and completely change everything to allow time-efficient development initially with conversion to computer-efficient systems later.
  • Here the later history is shown after the initial exponential growth phase. This changes the economics of the company dramatically.
  • The startup can throw away history because it is so small. That means that the startup has almost no compatibility requirement because the data lost due to lack of compatibility is a small fraction of the total data.
  • A large enterprise cannot do that. They have to have access to the old data and have to share between old data and Hadoop-accessible data. This doesn’t have to happen at the proof-of-concept level, but it really must happen when Hadoop first goes to production.
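
The net-value argument in the notes above can be made concrete with a small numerical sketch. Everything in it is an assumption chosen only to reproduce the shapes described: a value curve with fast early gains and diminishing returns, an exponentially growing "classical" cost, and linear costs with three illustrative slopes. None of the constants or function names come from the talk.

```python
import math

# Data scale in arbitrary units, matching the 0..2,000 axis of the charts.
SCALES = range(0, 2001)


def value(scale):
    """Assumed value curve: quick initial gains, then diminishing returns."""
    return scale / (scale + 50.0)


def classical_cost(scale):
    """Assumed classical cost curve: grows exponentially with scale."""
    return 0.01 * (math.exp(scale / 100.0) - 1.0)


def linear_cost(slope):
    """Return a cost function that grows linearly with the given slope."""
    return lambda scale: slope * scale


def best_scale(cost):
    """Return (scale, net value) at the point where value minus cost peaks."""
    best = max(SCALES, key=lambda s: value(s) - cost(s))
    return best, value(best) - cost(best)


classical = best_scale(classical_cost)
steep = best_scale(linear_cost(1e-3))        # expensive linear scaling
shallow = best_scale(linear_cost(1e-4))      # cheaper linear scaling
nearly_flat = best_scale(linear_cost(1e-5))  # cost line approaching horizontal

for label, (scale, net) in [
    ("classical", classical),
    ("steep", steep),
    ("shallow", shallow),
    ("nearly flat", nearly_flat),
]:
    print(f"{label:12s} optimum at scale {scale:4d}, net value {net:.3f}")
```

Under these assumed curves, the steep linear cost gives a lower best net value than the classical curve, while the shallower slopes give a higher best net value at a much larger scale. This matches the qualitative story in the notes, though the exact numbers depend entirely on the constants chosen.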

Transcript

  • 1. MapR: The Next Generation Big Data Platform ©MapR Technologies - Confidential
  • 2. Big is the next big thing
      – Big data and Hadoop are exploding
      – Companies are being funded
      – Books are being written
      – Applications sprouting up everywhere
  • 3. Slow Motion Explosion
  • 4. Hadoop Explosion
  • 5. Why Now? (6/1/2012)
      – But Moore’s law has applied for a long time
      – Why is Hadoop exploding now?
      – Why not 10 years ago?
      – Why not 20?
  • 6. Size Matters, but …
      – If it were just availability of data then existing big companies would adopt big data technology first
  • 7. Size Matters, but …
      – If it were just availability of data then existing big companies would adopt big data technology first
      – They didn’t
  • 8. Or Maybe Cost
      – If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte
  • 9. Or Maybe Cost
      – If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte
      – They didn’t
  • 10. Backwards adoption
      – Under almost any threshold argument startups would not adopt big data technology first
  • 11. Backwards adoption
      – Under almost any threshold argument startups would not adopt big data technology first
      – They did
  • 12. Everywhere at Once?
      – Something very strange is happening
          – Big data is being applied at many different scales
          – At many value scales
          – By large companies and small
  • 13. Everywhere at Once?
      – Something very strange is happening
          – Big data is being applied at many different scales
          – At many value scales
          – By large companies and small
      – Why?
  • 14. The Conventional Answer
      – More data is being produced more quickly
      – Data sizes are bigger than even a very large computer can hold
      – Cost to create and store continues to decrease
  • 15. Analytics Scaling Laws
      – Analytics scaling is all about the 80-20 rule
          – Big gains for little initial effort
          – Rapidly diminishing returns
      – The key to net value is how costs scale
          – Old school – exponential scaling
          – Big data – linear scaling, low constant
      – Cost/performance has changed radically
          – IF you can use many commodity boxes
  • 16. You’re kidding, people do that?
      – We didn’t know that!
      – We should have known that
      – We knew that
  • 17. [Chart: Value (0 to 1) against Scale (0 to 2,000). Labels, top to bottom: NSA, non-proliferation; Industry-wide data consortium; In-house analytics; Intern with a spreadsheet; Anybody with eyes]
  • 18. Net value optimum has a sharp peak well before maximum effort [Chart: Value against Scale]
  • 19. But scaling laws are changing both slope and shape
  • 20. More than just a little [Chart: Value against Scale]
  • 21. They are changing a LOT! [Chart: Value against Scale]
  • 22. (no text)
  • 23. (no text)
  • 24. [Chart: Value against Scale]
  • 25. [Chart: Value against Scale]
  • 26. A tipping point is reached and things change radically … Initially, linear cost scaling actually makes things worse [Chart: Value against Scale]
  • 27. Pre-requisites for Tipping
      – To reach the tipping point, algorithms must scale out horizontally
          – On commodity hardware
          – That can and will fail
      – Data practice must change
          – Denormalized is the new black
          – Flexible data dictionaries are the rule
          – Structured data becomes rare
  • 28. But there is more
      – Especially for large enterprises
  • 29. Physics of startup companies
  • 30. For startups
      – History is always small
      – The future is huge
      – Must adopt new technology to survive
      – Compatibility is not as important
          – In fact, incompatibility is assumed
  • 31. Physics of large companies [Chart labels: Absolute growth still very large; Startup phase]
  • 32. For large businesses
      – Present state is always large
      – Relative growth is much smaller
      – Absolute growth rate can be very large
      – Must adopt new technology to survive
          – Cautiously!
          – But must integrate technology with legacy
      – Compatibility is crucial
  • 33. The startup technology picture [Diagram labels: No compatibility requirement; Old computers and software; Expected hardware and software growth; Current computers and software]
  • 34. The large enterprise picture [Diagram labels: Must work together; ?; Current hardware and software; Proof of concept Hadoop cluster; Long-term Hadoop cluster]
  • 35. So that is why and why now
  • 36. So that is why, and why now
      – What can you do with it?
      – And how?
  • 37. Scale-free Computing
      – Map-reduce
          – pure functions for practical batch parallel computation
          – high level languages like Hive and Pig available
          – MapR provides standard access systems via NFS and ODBC
      – BSP
          – pure functions for synchronous iterative actor-based compute
          – Apache Giraph provides practical implementation
      – Actors
          – tuple passing with transformations
          – Storm provides practical implementation
  • 38. Future Proof Schemas
      – Denormalize data where possible to avoid seeks
          – use embedded lists
          – duplicate data
      – Flexible Schemas
          – use standard system for data serialization
          – must provide protocol migration without versioning
          – Protobufs (Google), Avro (Apache) and Thrift can all be used
  • 39. Open Compute and Storage
      – Big data has mass and inertia
          – once it lands, it should not move
      – Computation must move to the data
          – map-reduce, Storm, Giraph … all OK
          – conventional relational models … not OK
      – One model is not enough
          – must allow access by multiple models of computation
  • 40. More Information
      – Contact:
          – tdunning@maprtech.com
          – @ted_dunning
      – Slides and such:
          – http://info.mapr.com/ted-paris-05-2012
  • 41. Thank You
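
Slide 37 describes map-reduce as pure functions for batch parallel computation. A minimal single-process sketch of that idea is shown below as a word count. It is not Hadoop or MapR code: the function names (`map_words`, `shuffle`, `reduce_counts`, `map_reduce`) and the sample input are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map step: a pure function from one record to (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the job a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def reduce_counts(key, values):
    """Reduce step: a pure function from one key and its values to a result."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    """Apply mapper to every record, group by key, then reduce each group."""
    mapped = chain.from_iterable(mapper(record) for record in records)
    return dict(reducer(key, vals) for key, vals in shuffle(mapped).items())


# Hypothetical input records, one line of text each.
lines = ["big data why now", "why big data"]
counts = map_reduce(lines, map_words, reduce_counts)
print(counts)
```

Because the mapper and reducer have no side effects, each record and each key group can be processed independently, which is what lets a framework spread the work over commodity machines and rerun pieces that fail.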