
Chicago finance-big-data

A talk about what scalability really means in terms of interacting processes and the statistics of growth.

Published in: Technology, Business

  1. Scalability in Hadoop and Similar Systems
     ©MapR Technologies - Confidential
  2. Big is the next big thing
     – Big data and Hadoop are exploding
     – Companies are being funded
     – Books are being written
     – Applications sprouting up everywhere
  3. Slow Motion Explosion
  4. Hadoop Explosion
  5. Why Now? (9/18/2012)
     – But Moore’s law has applied for a long time
     – Why is Hadoop exploding now?
     – Why not 10 years ago?
     – Why not 20?
  6. Size Matters, but …
     If it were just availability of data then existing big companies would adopt big data technology first
  7. Size Matters, but …
     If it were just availability of data then existing big companies would adopt big data technology first
     They didn’t
  8. Or Maybe Cost
     If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte
  9. Or Maybe Cost
     If it were just a net positive value then finance companies should adopt first because they have higher opportunity value / byte
     They didn’t
  10. Backwards adoption
      Under almost any threshold argument startups would not adopt big data technology first
  11. Backwards adoption
      Under almost any threshold argument startups would not adopt big data technology first
      They did
  12. Everywhere at Once?
      Something very strange is happening
      – Big data is being applied at many different scales
      – At many value scales
      – By large companies and small
  13. Everywhere at Once?
      Something very strange is happening
      – Big data is being applied at many different scales
      – At many value scales
      – By large companies and small
      Why?
  14. The Conventional Answer
      – More data is being produced more quickly
      – Data sizes are bigger than even a very large computer can hold
      – Cost to create and store continues to decrease
  15. Analytics Scaling Laws
      Analytics scaling is all about the 80-20 rule
      – Big gains for little initial effort
      – Rapidly diminishing returns
      The key to net value is how costs scale
      – Old school – exponential scaling
      – Big data – linear scaling, low constant
      Cost/performance has changed radically
      – IF you can use many commodity boxes
  16. – You’re kidding, people do that?
      – We didn’t know that!
      – We should have known that
      – We knew that
  17. [Chart: Value (0 to 1) against Scale (0 to 2,000). Labels, top to bottom: NSA, non-proliferation; Industry-wide data consortium; In-house analytics; Intern with a spreadsheet; Anybody with eyes]
  18. Net value optimum has a sharp peak well before maximum effort
      [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  19. But scaling laws are changing both slope and shape
  20. More than just a little
      [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  21. They are changing a LOT!
      [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  22. (no text)
  23. (no text)
  24. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  25. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  26. A tipping point is reached and things change radically …
      Initially, linear cost scaling actually makes things worse
      [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  27. Pre-requisites for Tipping
      To reach the tipping point,
      – Algorithms must scale out horizontally
          – On commodity hardware
          – That can and will fail
      – Data practice must change
          – Denormalized is the new black
          – Flexible data dictionaries are the rule
          – Structured data becomes rare
  28. Yeah… but wait
  29. The Standard Sort of Model
      People talk about the law of large numbers as if it were …
      Well, as if it were a law
      It’s not …
      It is a context- and assumption-dependent theorem
  30. What if …
      These assumptions are: Changes have a
      – stationary,
      – independent,
      – finite variance
      distribution
      What happens if these assumptions are wrong?
      And which of them is really wrong?
  31. For Example
      [Chart: Stuff against Time]
  32. End point has nice tractable distribution
      [Chart: Stuff against Time]
  33. What if the Assumptions are Wrong?
      Take the finite variance assumption as a simple example
      Dropping it leads to Lévy stable distributions
      Like the Cauchy distribution
  34. Is it Really Different?
  35. [Chart: Stuff against Time]
  36. What About Real Life?
  37. (no text)
  38. But is it Really Infinite Variance?
      Or are there other kinds of phenomena that show this?
      What about the independence assumption?
      What if the supposedly independent components of the system communicate?
      Like we do. Every day. All the time.
  39. Why the Difference?
      [Diagram labels: The space of all things that change; The space of interacting things; Infinite variance; Law of large numbers; Interacting agents]
      Apologies and credit to Simon DeDeo, SFI
  40. What Happens with Interactions
      Social phenomena defeat the law of large numbers
      Distributions are well modeled by “rich get richer” processes
      – Pitman-Yor process, Indian Buffet
      Limiting distributions are heavy tailed, power law
      We see these distributions everywhere
      – price of cotton in the 19th century
      – word frequencies
      – popularity of GitHub projects
      – equity pricing and volumes
      – sizes of cities
      – popularity of web sites
  41. What are the Implications?
  42. [Chart: Value (0 to 1) against Scale (0 to 2,000)]
  43. In a Nutshell
      – Scalability is much more important than we thought
      – Mashups are more important than we thought
      – Network effects are more important than we thought
      – Exploration is more important than we thought
      – Hadoop-style linear scaling must be mixed with ad hoc analysis
  44. Thank You
  45. whoami?
      Ted Dunning
      – @ted_dunning
      – tdunning@maprtech.com (MapR distribution for Hadoop)
      – tdunning@apache.com (Mahout, Hadoop, Lucene, Zookeeper, Drill)
      – ted.dunning@gmail.com (me)
      More info: http://www.mapr.com/company/events/hadoop-in-finance-2012
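
Slides 29 to 33 argue that the law of large numbers depends on its assumptions, and that dropping finite variance leads to distributions such as the Cauchy. The sketch below is not from the talk. It is a minimal illustration, using only the Python standard library, that compares running means of Gaussian and Cauchy samples. The function names, the sample size, and the seed are all assumptions chosen for the example.

```python
import math
import random


def running_means(samples):
    """Return the mean of the first k samples, for k = 1..len(samples)."""
    means, total = [], 0.0
    for count, value in enumerate(samples, start=1):
        total += value
        means.append(total / count)
    return means


def gaussian_samples(n, rng):
    """Draw n standard normal values (finite variance)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def cauchy_samples(n, rng):
    """Draw n standard Cauchy values (infinite variance, undefined mean).

    Uses inverse-CDF sampling: tan(pi * (u - 0.5)) with u uniform on [0, 1).
    """
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(2012)  # arbitrary seed, only for repeatability
    n = 100_000  # illustrative sample size
    gauss_means = running_means(gaussian_samples(n, rng))
    cauchy_means = running_means(cauchy_samples(n, rng))
    # Print the running mean at a few checkpoints for both distributions.
    for checkpoint in (100, 1_000, 10_000, 100_000):
        print(checkpoint,
              round(gauss_means[checkpoint - 1], 4),
              round(cauchy_means[checkpoint - 1], 4))
```

The Gaussian running mean should settle near zero as more samples arrive. The Cauchy running mean typically keeps jumping whenever a single extreme sample lands, because the average of n standard Cauchy values is itself standard Cauchy regardless of n.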

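Slide 40 names the Pitman-Yor process as one "rich get richer" model whose limiting distributions are heavy tailed. The sketch below is not from the talk. It simulates the standard seating scheme for that process (a new customer joins an existing table with weight proportional to its size minus a discount, or opens a new table) so the resulting skew in table sizes can be inspected. The function name and the parameter values (`discount=0.5`, `concentration=1.0`) are assumptions chosen for illustration.

```python
import random


def pitman_yor_table_sizes(n_customers, discount=0.5, concentration=1.0, seed=0):
    """Simulate Pitman-Yor seating and return the list of table sizes.

    With n customers seated at K tables, the next customer
      - opens a new table with weight (concentration + discount * K),
      - joins table k with weight (size_k - discount).
    These weights sum to (n + concentration).
    """
    rng = random.Random(seed)
    sizes = []
    for n in range(n_customers):
        r = rng.random() * (n + concentration)
        new_table_weight = concentration + discount * len(sizes)
        if r < new_table_weight:
            sizes.append(1)
            continue
        r -= new_table_weight
        for k, size in enumerate(sizes):
            r -= size - discount
            if r < 0:
                sizes[k] += 1
                break
        else:
            # Floating-point round-off left r slightly above zero.
            sizes[-1] += 1
    return sizes


if __name__ == "__main__":
    sizes = sorted(pitman_yor_table_sizes(5_000, seed=42), reverse=True)
    print("tables:", len(sizes))
    print("largest five:", sizes[:5])
    print("median size:", sizes[len(sizes) // 2])
```

A few tables usually end up holding a large share of the customers while most tables stay very small, which is the heavy-tailed pattern the slide describes. The inner loop is linear in the number of tables, which is fine for a demonstration of this size but would need a faster sampling structure for large runs.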