Interesting statistical question. Thought about since Aristotle. Randomness + Resources + Time = Anything Possible. No real monkeys; need virtual monkeys.
Shakespeare lazy. Heavily influenced English literature. Big Data isn’t always a huge file. It can be high computation.
This is not a map of MT and ID. 1 to 20 node testing. Keep efficiency up. RDBMS efficiency in gutter.
Engineers not spending time coding to scale. Busy adding new features. No code changes for scaling. Took 1.5 months on one computer and 3.5 days on 20 nodes. Spending on new computers gives a consistent, linear increase. Compare spending on RDBMS and Hadoop.
Million Monkeys
Jesse Anderson | Curriculum Developer and Instructor
November 2012
About Me
• Cloudera - Educational Services Team
• Twitter - @jessetanderson
• Blog and more info: http://www.jesse-anderson.com
• Screencasts on Pragmatic Programmers: Buy It Now on http://www.jesse-anderson.com
• President - Northern Nevada Software Developers Group
About Cloudera
• Cloudera is “The commercial Hadoop company”
• Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo
• Provides consulting and training services for Hadoop users
• Staff includes committers to virtually all Hadoop projects
Introduction
• Infinite Monkey Theorem
• Hadoop
• Million Monkeys Algorithm
• Business Case
Exponential Growth (aka Big Data)
The odds of finding a group of characters are 1 in 26 raised to the power of the number of contiguous characters.

Contiguous Characters   Combinations
8                       208,827,064,576
9                       5,429,503,678,976
10                      141,167,095,653,376
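The combination counts on this slide can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic; the function name `combinations` is made up for illustration, and the 26-letter alphabet is taken from the slide.

```python
# Number of possible character groups for a 26-letter alphabet.
ALPHABET_SIZE = 26  # from the slide: "1 in 26 raised to the power of ..."


def combinations(length: int) -> int:
    """Return how many distinct groups of `length` contiguous characters exist."""
    return ALPHABET_SIZE ** length


if __name__ == "__main__":
    for n in (8, 9, 10):
        # Each extra character multiplies the search space by 26.
        print(f"{n:>2}  {combinations(n):,}")
```

Each additional character multiplies the search space by 26, which is why the workload becomes a computation problem long before it becomes a file-size problem.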
Business Value of Scalability
• Scaling does not require massive re-engineering and complete rewrites of code
• Adding more computers to cluster gets a predictable increase in computational power and storage
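As a rough check on the scaling claim, the run times quoted in the speaker notes (1.5 months on one computer, 3.5 days on 20 nodes) can be turned into a speedup figure. This is a back-of-the-envelope sketch only: the 30-day month is an assumption, the function names are hypothetical, and the notes do not say whether the single computer matched the cluster nodes' hardware.

```python
DAYS_PER_MONTH = 30  # assumption: the notes only say "1.5 months"

single_node_days = 1.5 * DAYS_PER_MONTH  # run time on one computer
cluster_days = 3.5                       # run time on the cluster
nodes = 20                               # cluster size from the notes


def speedup(single_days: float, multi_days: float) -> float:
    """How many times faster the cluster run was than the single-computer run."""
    return single_days / multi_days


def efficiency(single_days: float, multi_days: float, node_count: int) -> float:
    """Speedup per node; 1.0 would mean perfectly linear scaling."""
    return speedup(single_days, multi_days) / node_count


if __name__ == "__main__":
    print(f"speedup:    {speedup(single_node_days, cluster_days):.1f}x")
    print(f"efficiency: {efficiency(single_node_days, cluster_days, nodes):.0%}")
```

Under these assumptions the cluster run is roughly 13 times faster than the single-computer run. The per-node efficiency figure depends heavily on the month-length and hardware assumptions, so treat it as indicative only.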
Going Viral (and taking over the world)
• Covered internationally in BBC, Wall Street Journal, Wired and Slashdot
• 26,000 unique visits from 119 countries in one day
Next Steps
• Books
  • Hadoop: The Definitive Guide - Tom White
  • Hadoop Operations - Eric Sammer
• Cloudera Training
  • Developer, Admin, Hive and Pig, HBase, Essentials
• CDH
  • Cloudera's Distribution Including Apache Hadoop
  • Open Source
  • VM Image
Conclusion
• MapReduce breaks up problem efficiently
• No code changes to scale
• Incredible scalability
• Enables previously impossible tasks
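To picture how a map/reduce split might break up the monkey problem, here is a toy, single-process sketch in plain Python. It is not the talk's Hadoop code: the target text, the group length, and the names `map_monkey`, `reduce_matches` and `run` are all hypothetical, chosen only to illustrate the shape of the idea (independent "monkeys" emit matches, a reduce step merges them).

```python
import random
import string

# Hypothetical stand-in for the target text (lowercase letters only).
TEXT = "tobeornottobethatisthequestion"
GROUP_LEN = 2  # kept tiny so random matches happen quickly in a demo


def map_monkey(seed, attempts):
    """Map step: one virtual monkey types random groups and emits those found in TEXT."""
    rng = random.Random(seed)  # per-monkey generator, so monkeys are independent
    for _ in range(attempts):
        group = "".join(rng.choice(string.ascii_lowercase) for _ in range(GROUP_LEN))
        if group in TEXT:
            yield group, 1


def reduce_matches(pairs):
    """Reduce step: merge (group, count) pairs from all monkeys into one tally."""
    counts = {}
    for group, n in pairs:
        counts[group] = counts.get(group, 0) + n
    return counts


def run(monkeys=4, attempts=2000):
    """Run every monkey in turn; on a cluster these map calls would run in parallel."""
    pairs = []
    for seed in range(monkeys):
        pairs.extend(map_monkey(seed, attempts))
    return reduce_matches(pairs)


if __name__ == "__main__":
    found = run()
    print(f"{len(found)} distinct groups matched")
```

Because each monkey works independently, adding monkeys (or nodes) changes only the `monkeys` argument in this sketch, which mirrors the "no code changes to scale" point in spirit.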