Your SlideShare is downloading. ×
0
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Mr hadoop seedrocket
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Mr hadoop seedrocket

1,199

Published on

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
1,199
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
11
Comments
0
Likes
3
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. MapReduce & Hadoop● Juanjo Mostazo● SeedRocket BCN● June 2011
  • 2. M/R: Motivation● Process big amount of data toproduce other data● Scale up VS Scale out
  • 3. M/R: What is it?● Different programming paradigm● Based on a google paper (2004)● Automatic parallelization and distribution● I/O Scheduling● Fault tolerance● Status and monitoring
  • 4. M/R: Basics● Input & Output: set of key/value pairs● Big amount of data group & sort● Job = Two phases = Mapper & Reducer● Map (in_key, in_value) → list(interm_key, interm_value)● Reduce (interm_key, list(interm_value)) → list (out_key, out_value)
  • 5. M/R: Example(word counter)
  • 6. M/R: Workflow
  • 7. Hadoop: What is it?● Framework based on GMR / GFS● Apache project● Developed in Java● Multiple applications● Used by many companies● And growing!
  • 8. Hadoop: HDFS dist. systems● Data is sent tocomputation(whereas herecomputation goes todata)● Nodes must get infofrom each other● Very difficult torecover from partialfailure
  • 9. Hadoop: HDFS concepts● Distributed file system.Layer on top ext3, xfs...● Works better on huge files● Redundancy (default 3)● Bad seeking, no append!● Good rack scale. Notgood data center scale● File divided in 64Mb –128Mb blocks
  • 10. Hadoop: HDFS Architecture
  • 11. Hadoop: Architecture v1
  • 12. Hadoop: Architecture v2
  • 13. Hadoop: Architecture v3
  • 14. M/R: Example(word counter)
  • 15. Hadoop: EcoSystem ● FLUME ● ZOOKEEPER
  • 16. Hadoop: Advanced stuff● Distributed caches● Partitioner● Sort comparator● Group comparator● Combiner● Input format & Record reader● MultiInput● MultiOutput● Compression (LZO)
  • 17. Hadoop: Conclussions● Simplify large-scale computation● Hide parallel programming issues● Easy to get into & develop● Deeply used & maintained by community● Possibility yo throw away RDBMs! (Bottleneck)
  • 18. MapReduce & Hadoop● Juanjo Mostazo● juanj.mostazo@gmail.com

×