Tackling Big Data with Hadoop

An introduction to Hadoop, presented at Vermont Code Camp 2011.

Published in: Technology, News & Politics

Tackling Big Data with Hadoop

  1. TACKLING BIG DATA WITH HADOOP • David Howell • Sunday, September 11, 11
  2. WHAT IS BIG DATA?
  3. WHAT IS BIG DATA? Google web crawl
  4. WHAT IS BIG DATA? stream of Twitter messages
  5. WHAT IS BIG DATA? Annoying Farmville requests on Facebook
  6. WHAT IS BIG DATA? terabyte-scale data sets; awkward to work with using traditional tools
  7. WHAT IS BIG DATA? requires distributed computing
  8. MEDIUM DATA: dozens to hundreds of gigabytes; still awkward to work with using traditional tools
  9. MAP-REDUCE: http://labs.google.com/papers/mapreduce.html
  10.
  11.
  12. COUNTING AT SCALE
  13. function map_1(t, search_phrase)
          emit(search_phrase, 1)
      sort and shuffle
      function reduce_1(search_phrase, counts)
          total = 0
          for count in counts
              total += count
          emit(search_phrase, total)
      function map_2(search_phrase, total)
          emit(total, search_phrase)
      sort and shuffle
      function reduce_2(total, search_phrases)
          for search_phrase in search_phrases
              emit(search_phrase, total)
  14. map / shuffle / reduce
          cat IN | sort | uniq -c > OUT
      map / shuffle / reduce
          awk '{print $2,$1}' OUT | sort > FINAL
  15. WHY BOTHER?
  16. HADOOP
  17. DISTRIBUTED COMPUTING PLATFORM
  18. TOOLS IN THE PLATFORM. Map-Reduce APIs: Java, C++, UNIX pipes. Higher Level APIs: Hive, Cascading, Pig
  19. THE ORIGIN STORY
  20. WHO’S USING IT?
  21. HADOOP: How does it work?
  22.
  23.
  24.
  25.
  26. DEMO!
  27. YOUR DATA PLATFORM: ad hoc unstructured prototyping experiment data-driven curiosity play
  28. LEARN MORE: http://hadoop.apache.org/ • http://www.cloudera.com/ • Hadoop: The Definitive Guide • @dehowell • dave@poorlytrainedape.com • http://github.com/dehowell/hadoop-crypto-demo
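
The two-pass counting job in the slide pseudocode (count each search phrase, then re-key by total so the output is ordered by count) can be tried without a cluster. What follows is a minimal in-memory sketch in Python, not the Hadoop API. The helper names `run_job` and `count_phrases` and the sample log are assumptions made for illustration; only `map_1`, `reduce_1`, `map_2` and `reduce_2` come from the slides.

```python
from collections import defaultdict
from itertools import chain


def run_job(records, mapper, reducer):
    """Run one map-reduce pass in memory: map, sort and shuffle, reduce.

    `run_job` is a hypothetical stand-in for what a framework such as
    Hadoop would do across many machines.
    """
    # Map: each (key, value) input record yields zero or more output pairs.
    mapped = chain.from_iterable(mapper(key, value) for key, value in records)

    # Shuffle: collect every value emitted under the same key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Sort the keys, then reduce each group. Values keep their arrival
    # order here; a real cluster gives no such ordering guarantee.
    output = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def map_1(t, search_phrase):
    # The first argument (t in the slides) is ignored; emit a 1 per phrase.
    yield search_phrase, 1


def reduce_1(search_phrase, counts):
    yield search_phrase, sum(counts)


def map_2(search_phrase, total):
    # Swap key and value so the second shuffle sorts by total.
    yield total, search_phrase


def reduce_2(total, search_phrases):
    for search_phrase in search_phrases:
        yield search_phrase, total


def count_phrases(log):
    """Chain the two jobs: count phrases, then order them by count."""
    return run_job(run_job(log, map_1, reduce_1), map_2, reduce_2)


if __name__ == "__main__":
    # Made-up sample log of (t, search_phrase) records.
    sample_log = [(1, "hadoop"), (2, "pig"), (3, "hadoop"),
                  (4, "hive"), (5, "hadoop"), (6, "pig")]
    for phrase, total in count_phrases(sample_log):
        print(phrase, total)
```

The second job does no arithmetic. Its only purpose is to put the totals in the key position so that the sort-and-shuffle step orders the final output by count.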
