Million Monkeys User Group

Million Monkeys presentation given to Silicon Mountain Technology Group on 11-12-2012.

  1. Million Monkeys • Jesse Anderson | Curriculum Developer and Instructor • November 2012
  2. About Me • Cloudera - Educational Services Team • Twitter - @jessetanderson • Blog and more info: http://www.jesse-anderson.com • Screencasts on Pragmatic Programmers: available at http://www.jesse-anderson.com • President - Northern Nevada Software Developers Group
  3. About Cloudera • Cloudera is “The commercial Hadoop company” • Founded by leading experts on Hadoop from Facebook, Google, Oracle and Yahoo • Provides consulting and training services for Hadoop users • Staff includes committers to virtually all Hadoop projects
  4. Introduction • Infinite Monkey Theorem • Hadoop • Million Monkeys Algorithm • Business Case
  5. Infinite Monkey Theorem
  6. Exponential Growth (aka Big Data) • The odds of finding a given group of characters are 1 in 26 raised to the power of the number of contiguous characters (26^n) • 8 contiguous characters: 208,827,064,576 combinations • 9 contiguous characters: 5,429,503,678,976 combinations • 10 contiguous characters: 141,167,095,653,376 combinations
  7. Hadoop • Apache project • Reliable, scalable, distributed computing • Software framework • MapReduce • Hadoop Distributed File System (HDFS) • Other projects
  8. Map • Create or process the input data
  9. Reduce • Process the data from Map into something usable
  10. Data Flow
  11. Million Monkeys Algorithm
  12. Business Case
  13. Hadoop Scalability • [Chart: percent of linear scalability (0 to 100) versus number of nodes (1 to 20), comparing RDBMS and Hadoop] • RDBMS = Relational Database
  14. Business Value of Scalability • Scaling does not require massive re-engineering and complete rewrites of code • Adding more computers to the cluster gets a predictable increase in computational power and storage
  15. Going Viral (and taking over the world) • Covered internationally by the BBC, the Wall Street Journal, Wired and Slashdot • 26,000 unique visits from 119 countries in one day
  16. Next Steps • Books: Hadoop: The Definitive Guide (Tom White); Hadoop Operations (Eric Sammer) • Cloudera Training: Developer, Admin, Hive and Pig, HBase, Essentials • CDH: Cloudera's Distribution Including Apache Hadoop; open source; available as a VM image
  17. Conclusion • MapReduce breaks up problems efficiently • No code changes needed to scale • Incredible scalability • Enables previously impossible tasks
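
The arithmetic on slide 6 (1 in 26 raised to the number of contiguous characters) can be checked with a short sketch. It assumes a 26-letter lowercase alphabet, as the slide does, with no spaces or punctuation; the function name `combinations` is illustrative and not from the talk.

```python
# Size of the search space for random groups of letters.
# Assumption (as on the slide): a 26-letter alphabet, a-z only.
ALPHABET_SIZE = 26


def combinations(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return how many distinct strings of `length` characters exist (alphabet_size ** length)."""
    return alphabet_size ** length


if __name__ == "__main__":
    for n in (8, 9, 10):
        # One random group matches a specific target with odds of 1 in combinations(n).
        print(f"{n} contiguous characters: 1 in {combinations(n):,}")
```

Each extra character multiplies the search space by 26, which is why the slide frames the problem as exponential growth.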
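
Slides 8 to 10 describe Map creating or processing the input data and Reduce turning Map's output into something usable. The following single-process sketch mimics that data flow (map, then group values by key, then reduce) using a word count as the example job. It is a simulation under that assumption, not the Hadoop API: in Hadoop the same phases run as `Mapper` and `Reducer` tasks spread across the cluster, and the names `run_mapreduce`, `word_mapper` and `sum_reducer` here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Single-process sketch of the MapReduce data flow: map -> shuffle/sort -> reduce.
# Not the Hadoop API; it only imitates the phases a Hadoop job runs on a cluster.

KeyValue = Tuple[str, int]


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[KeyValue]],
    reducer: Callable[[str, List[int]], KeyValue],
) -> Dict[str, int]:
    # Map phase: each input record becomes zero or more (key, value) pairs.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted for the same key.
            grouped[key].append(value)

    # Sort and reduce phase: each key, with all its values, goes to one reducer call.
    results: Dict[str, int] = {}
    for key in sorted(grouped):
        out_key, out_value = reducer(key, grouped[key])
        results[out_key] = out_value
    return results


def word_mapper(line: str) -> Iterator[KeyValue]:
    # Emit (word, 1) for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(word: str, counts: List[int]) -> KeyValue:
    # Collapse the list of 1s into a single total for the word.
    return word, sum(counts)


if __name__ == "__main__":
    lines = ["to be or not to be", "that is the question"]
    print(run_mapreduce(lines, word_mapper, sum_reducer))
```

Because the mapper and reducer only see one record or one key at a time, the same two functions could in principle be spread over many machines without changing their code, which is the property slides 14 and 17 emphasize.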
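
Slide 11 gives only the title "Million Monkeys Algorithm", so the talk's actual algorithm is not described in this text. As a purely hypothetical illustration of the infinite-monkey idea in map/reduce terms, the sketch below has each simulated monkey type random lowercase groups (map) and then keeps and counts the groups that occur in a target text (reduce). The group length of 3, the target phrase, the seeding scheme and all function names are assumptions chosen so that matches appear quickly; this is not the project's code.

```python
import random
import string
from typing import Dict, Iterable, Iterator, Set, Tuple

# Hypothetical illustration of the infinite-monkey idea in map/reduce style.
# NOT the Million Monkeys project's actual code; all names are made up.

GROUP_LENGTH = 3  # assumption: short groups so random matches are likely


def target_groups(text: str, length: int) -> Set[str]:
    """Return every contiguous run of `length` letters in `text`, ignoring non-letters."""
    letters = "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)
    return {letters[i : i + length] for i in range(len(letters) - length + 1)}


def monkey_mapper(monkey_id: int, groups: int, length: int) -> Iterator[Tuple[str, int]]:
    """Map step: one simulated monkey emits (random_group, 1) pairs."""
    rng = random.Random(monkey_id)  # seeded per monkey so runs are reproducible
    for _ in range(groups):
        group = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        yield group, 1


def match_reducer(pairs: Iterable[Tuple[str, int]], targets: Set[str]) -> Dict[str, int]:
    """Reduce step: sum the counts of typed groups that also appear in the target text."""
    found: Dict[str, int] = {}
    for group, count in pairs:
        if group in targets:
            found[group] = found.get(group, 0) + count
    return found


if __name__ == "__main__":
    targets = target_groups("To be or not to be", GROUP_LENGTH)
    # Ten monkeys, each typing 1,000 random groups.
    pairs = (pair for m in range(10) for pair in monkey_mapper(m, 1_000, GROUP_LENGTH))
    found = match_reducer(pairs, targets)
    print(f"Matched {len(found)} of {len(targets)} target groups: {sorted(found)}")
```

With 3-letter groups there are 26^3 = 17,576 possibilities, so a few thousand random groups already hit some targets; at the 8 to 10 characters on slide 6 the same loop would almost never find a match, which is the scale problem the talk hands to Hadoop.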
