Transcript

  • 1. Hadoop Simple. Scalable.
  • 2. @markgunnels mark@catamorphiclabs.com
  • 3. Java. Clojure. Ruby. Cloudera Certified
  • 4. posscon.org April 15, 16, and 17
  • 5. Agenda: Overview. Massively Large Data Sets and the problems therein. Distributed File System. MapReduce. Pig.
  • 6. Overview
  • 7. Doug Cutting Genius
  • 8. Favorite Hadoop Story New York Times
  • 9. 4 Terabytes of Source Articles.
  • 10. 24 Hours.
  • 11. 5.5 Terabytes of PDFs.
  • 12. Did it again.
  • 13. $240.
  • 14. Infoporn from Yahoo: 73 hours. 490 TB Shuffling. 280 TB Output. 4000 Nodes. 16 PB Disk Space. 32K Cores. 64 TB RAM.
  • 15. Hadoop solves...
  • 16. Analyzing Massively Large Datasets
  • 17. Two Problems You have to distribute.
  • 18. Data Storage: Capacity has grown far faster than read speeds. Datasets won't fit on one disk. Tolerate node failure.
  • 19. Data Analysis: Combine data from many machines. Tolerate node failure.
  • 20. How Hadoop solves these problems.
  • 21. Send Code to Data. Not Data to Code.
  • 22. Data Storage HDFS
  • 23. Name Node. Data Nodes. Master - Slave Relationship
  • 24. Shard massive files across multiple machines. MB, GB, and TB
  • 25. Tolerant of Node Failure: File blocks replicated across 3 nodes by default.
  • 26. HDFS behaves like a normal file system. No true appends yet.
  • 27. Demonstration.
  • 28. Data Analysis MapReduce
  • 29. Job Tracker. Task Nodes. Master - Slave Relationship.
  • 30. map
  • 31. Demonstration
  • 32. pmap
  • 33. Demonstration
  • 34. reduce
  • 35. Demonstration
  • 36. (reduce (pmap))
  • 37. Demonstration.
  • 38. MapReduce Java
  • 39. Nobody likes it. :-)
  • 40. MapReduce Ruby. Python. Unix Utilities.
  • 41. MapReduce Clojure
  • 42. Hadoop Ecosystem: Pig. ZooKeeper. Hive. Cascading.
  • 43. Pig
  • 44. HBase
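
Slides 30 to 37 build up from map, to a parallel map (pmap), to reduce, and finally to (reduce (pmap)), but the demonstrations themselves are not in the transcript. The block below is a minimal sketch of that same progression in Python, not the speaker's original demo. The function names (count_words, merge_counts, word_count) and the sample lines are hypothetical, and a local thread pool stands in for the cluster, so it shows only the shape of the computation, not Hadoop's distribution or fault tolerance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(line):
    """Map step: turn one line of text into per-word counts."""
    return Counter(line.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(lines, workers=4):
    """Parallel map followed by reduce, in the spirit of (reduce (pmap))."""
    # Parallel map: each line is counted independently of the others,
    # which is what makes the work easy to spread across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, lines))
    # Reduce: fold the partial counts into a single total.
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for a large dataset.
    lines = ["hadoop is simple", "hadoop is scalable"]
    totals = word_count(lines)
    print(totals["hadoop"], totals["simple"])  # prints: 2 1
```

Because count_words needs nothing beyond its own line, the map step can run wherever that line happens to be stored, which is the idea behind slide 21's "Send Code to Data."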