Your SlideShare is downloading. ×
Hadoop - Simple. Scalable.
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Hadoop - Simple. Scalable.

1,801

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,801
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
27
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Hadoop Simple. Scalable.
  • 2. @markgunnels mark@catamorphiclabs.com
  • 3. Java. Clojure. Ruby. Cloudera Certified
  • 4. posscon.org April 15, 16, and 17
  • 5. Agenda Overview Massively Large Data Sets and the problems therein Distributed File System MapReduce Pig
  • 6. Overview
  • 7. Doug Cutting Genius
  • 8. Favorite Hadoop Story New York Times
  • 9. 4 Terabytes of Source Articles.
  • 10. 24 Hours.
  • 11. 5.5 Terabytes of PDFs.
  • 12. Did it again.
  • 13. $240.
  • 14. Infoporn from Yahoo 73 hours 490 TB Shuffling 280 TB Output 4000 Nodes 16 PB Disk Space 32K Cores 64 TB RAM
  • 15. Hadoop solves...
  • 16. Analyzing Massively Large Datasets
  • 17. Two Problems You have to distribute.
  • 18. Data Storage Capacity has increased rapidly beyond read speeds. Datasets won't fit on one disk. Tolerate node failure.
  • 19. Data Analysis Combine data from many machines. Tolerate node failure.
  • 20. How Hadoop solves these problems.
  • 21. Send Code to Data. Not Data to Code.
  • 22. Data Storage HDFS
  • 23. Name Node. Data Nodes. Master - Slave Relationship
  • 24. Shard massive files across multiple machines. MB, GB, and TB
  • 25. Tolerant of Node Failure Files replicated across at least 3 nodes.
  • 26. HDFS behaves like a normal file system. No true appends yet.
  • 27. Demonstration.
  • 28. Data Analysis MapReduce
  • 29. Job Tracker. Task Nodes. Master - Slave Relationship.
  • 30. map
  • 31. Demonstration
  • 32. pmap
  • 33. Demonstration
  • 34. reduce
  • 35. Demonstration
  • 36. (reduce (pmap))
  • 37. Demonstration.
  • 38. MapReduce Java
  • 39. Nobody likes it. :-)
  • 40. MapReduce Ruby. Python. Unix Utilities.
  • 41. MapReduce Clojure
  • 42. Hadoop Ecosystem Pigkeeper. Hive. Cascading.
  • 43. Pig
  • 44. HBase

×