Hadoop Introduction, Berlin Buzzwords 2011

Hadoop introduction berlin buzzwords 2011 from Berlin Buzzwords 2011

Published in: Technology

  1. An Introduction • Kai Voigt, Cloudera Inc • Berlin Buzzwords, Monday, June 6th 2011
  2. Big Data • Capture • Storage • Search • Analytics
  3. hadoop.apache.org • Google Filesystem (GFS) / MapReduce • Hadoop Distributed Filesystem (HDFS) / MapReduce
  4. Hadoop Distributed Filesystem • Easy Access • Distributed • Redundant • Scalable
  5. File Splits • Large File (6440M) • Blocks 1, 2, 3, 4, 5, 6, ..., 100, 101 • Blocks 1 to 100: 64M each • Block 101: 40M
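The split on slide 5 is plain integer arithmetic. A minimal Python sketch, assuming the slide's 64M block size; `split_into_blocks` is a hypothetical helper for illustration, not a Hadoop API:

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the size in MB of each block for a file of the given size."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds what is left over.
        sizes.append(remainder)
    return sizes


blocks = split_into_blocks(6440)
print(len(blocks), blocks[0], blocks[-1])  # 101 64 40
```

The 6440M file from the slide yields 100 full blocks of 64M plus one final block of 40M, which matches the diagram.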
  6. Block Placement • Blocks 1, 2, and 3, three copies of each, spread across Node 1, Node 2, Node 3, Node 4, Node 5 (diagram)
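The idea behind slide 6 is that every block lives on several different nodes, so losing one node loses no data. The sketch below is a simplified illustration in Python, assuming three replicas per block and a plain round-robin assignment. It is not HDFS's actual placement policy, which also takes racks into account, and `place_blocks` is a hypothetical helper.

```python
from itertools import cycle, islice


def place_blocks(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    ring = cycle(nodes)
    for block in block_ids:
        # Consecutive nodes on the ring are distinct as long as
        # replication <= len(nodes).
        placement[block] = list(islice(ring, replication))
    return placement


nodes = ["Node 1", "Node 2", "Node 3", "Node 4", "Node 5"]
placement = place_blocks([1, 2, 3], nodes)

# Simulate the loss of one node: every block keeps at least two copies.
failed = "Node 2"
survivors = {
    block: [node for node in holders if node != failed]
    for block, holders in placement.items()
}
for block, holders in survivors.items():
    print(block, holders)
```

With five nodes and three replicas, any single node failure leaves at least two copies of each block available.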
  7. MapReduce • Input (Block 1, Block 2, Block 3, Block 4) → map() → Intermediate Data → reduce() → Output
  8. WordCount Example
       map (offset, line) {
         foreach word in line {
           emit (word, 1)
         }
       }
  9. WordCount Example
       reduce (word, count[]) {
         total = 0;
         foreach number in count[] {
           total += number
         }
         emit (word, total)
       }
  10. map() [the keys were pictures on the slide and are missing from this transcript] • ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1)
  11. Sort & Shuffle • ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) ( , 1) → ( , (1,1)) ( , (1,1,1,1)) ( , (1,1)) ( , (1,1)) ( , (1))
  12. reduce() • ( , (1,1)) ( , (1,1,1,1)) ( , (1,1)) ( , (1,1)) ( , (1)) → ( , 2) ( , 4) ( , 2) ( , 2) ( , 1)
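The three stages on slides 10 to 12 (map, sort and shuffle, reduce) can be imitated on a single machine. The following is a minimal Python sketch of the WordCount pseudocode, not Hadoop's Java API. The names `map_fn`, `shuffle`, `reduce_fn`, and `word_count` and the sample input lines are assumptions made for illustration, and the line index stands in for the byte offset that Hadoop would pass to the mapper.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(offset, line):
    """Emit (word, 1) for every word in the line."""
    for word in line.split():
        yield word, 1


def shuffle(pairs):
    """Sort pairs by key and collect the values of equal keys into a list."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_fn(word, counts):
    """Sum the counts collected for one word."""
    total = 0
    for number in counts:
        total += number
    yield word, total


def word_count(lines):
    # The line index is used in place of a byte offset.
    mapped = [pair
              for offset, line in enumerate(lines)
              for pair in map_fn(offset, line)]
    return dict(result
                for word, counts in shuffle(mapped)
                for result in reduce_fn(word, counts))


lines = ["apple banana apple", "cherry banana apple"]
print(word_count(lines))  # {'apple': 3, 'banana': 2, 'cherry': 1}
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the sort and shuffle between the two phases. Here everything runs in one process to keep the data flow visible.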
  13. Use Case: Recommendations • "People looking at this article also looked at these articles" • "You might also know these people" • iTunes Genius Playlist • Banner Placement
  14. Use Case: Text Processing • Document Indexing • Semantic Analytics
  15. Use Case: Machine Learning • Data, Data, Data • Spam vs No Spam • Credit Card Fraud • "People of Interest" Information
  16. Use Case: Graphs • Shortest Paths • Bottleneck Nodes • Flow Optimization • Spanning Trees • Spanning Routes
  17. Hadoop Ecosystem • Hive & Pig: High Level Language Access • HBase: Real Time Access • Sqoop: SQL to/from Hadoop • Flume: Distributed Data Collection • Oozie: Job Workflow • Mahout: Machine Learning Library • many more
  18. Homework • Cloudera's Distribution including Hadoop (CDH3) • Online Tutorials • WordCount Example • Conference Demo Cluster
  19. Quick Demo!
  20. Thank you! • Kai Voigt • kai@cloudera.com • http://www.cloudera.com/ • http://hadoop.apache.org/
