An Introduction to MapReduce


  1. An Introduction to MapReduce
     Presented by Frane Bandov at the Operating Complex IT-Systems seminar, Berlin, 1/26/2010
  2. Outline
     •  Introduction
     •  Google MapReduce
        –  Idea
        –  Overview
        –  Fault Tolerance
        –  GFS: Google File System
        –  Job Example
     •  Alternative Implementations
     •  Reception and Criticism
     •  Trends and Future Development
     •  Conclusion
     2/16/10 An Introduction to MapReduce 2
  4. Introduction – Problem
     Sometimes we have to deal with huge amounts of data.
     [Chart: data volume in TBytes (axis 0 to 250) for You, Facebook, Yahoo! Groups and the German Climate Computing Centre]
  5. Introduction – Problem
     The data needs to be processed, but how?
     •  We can't process all of this data on one machine → distribute the processing to many machines
  6. Introduction – Approach
     Distributed computing is the solution:
     “Let’s write our own distributed computing software as a solution to our problem.”
     Checklist:
     •  design protocols
     •  design data structures
     •  write the code
     •  assure failure tolerance
     Drawbacks:
     •  Development takes a long time
     •  Expensive: what is the cost-benefit ratio?
     •  Why build complex software for simple computations?
  8. Google MapReduce – Idea
     •  A framework for distributed computing
     •  Don't care about protocols, failure tolerance, etc.
     •  Just write your simple computation
  9. Google MapReduce – Idea
     MapReduce paradigm
     •  Map: apply a function to all elements of a list
           square x = x * x;
           map square [1, 2, 3, 4, 5];   →  [1, 4, 9, 16, 25]
     •  Reduce: combine all elements of a list
           reduce (+) [1, 2, 3, 4, 5];   →  15
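The slide's map and reduce examples are written in functional pseudocode. A minimal Python equivalent, assuming the built-in `map` and `functools.reduce` with `operator.add` as stand-ins for the slide's `map` and `reduce (+)`:

```python
from functools import reduce
import operator

def square(x: int) -> int:
    """The slide's `square x = x * x`."""
    return x * x

numbers = [1, 2, 3, 4, 5]

# Map: apply `square` to every element of the list.
squares = list(map(square, numbers))    # [1, 4, 9, 16, 25]

# Reduce: combine all elements of the list with (+).
total = reduce(operator.add, numbers)   # 15
```

Neither step needs to know how the list is stored, which is what lets a framework split the same two operations across many machines.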
  10. Google MapReduce – Idea
     Basic functioning: Input → Map → Reduce → Output
  11. Google MapReduce – Overview
     [Diagram: a MapReduce-based user program and a master coordinating workers. Input file in GFS (Splits 1 to 5) → Map phase (workers writing Intermediate Files 1 to 3) → Reduce phase (workers) → Output files (File 1, File 2) in GFS]
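The data flow in the overview diagram can be simulated in a single process. This is a sketch under stated assumptions, not Google's implementation: each map task turns one input split into R intermediate "files" (in-memory lists), and reduce task r reads file r from every map task. The partitioning rule hash(key) mod R, the function names and the word-count job are all illustrative assumptions.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

KV = Tuple[str, int]
MapFn = Callable[[str], Iterable[KV]]
ReduceFn = Callable[[str, List[int]], int]

def partition(key: str, num_reducers: int) -> int:
    # Assumed rule: hash(key) mod R. crc32 is used instead of the built-in hash(),
    # which is randomized per process for str and would differ between workers.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_map_task(split: List[str], map_fn: MapFn, num_reducers: int) -> List[List[KV]]:
    """One map worker: process one input split into R intermediate 'files'."""
    files: List[List[KV]] = [[] for _ in range(num_reducers)]
    for record in split:
        for key, value in map_fn(record):
            files[partition(key, num_reducers)].append((key, value))
    return files

def run_reduce_task(r: int, map_outputs: List[List[List[KV]]],
                    reduce_fn: ReduceFn) -> Dict[str, int]:
    """One reduce worker: fetch file r from every map task, group by key, reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for files in map_outputs:
        for key, value in files[r]:
            grouped[key].append(value)
    return {key: reduce_fn(key, grouped[key]) for key in sorted(grouped)}

def run_job(splits: List[List[str]], map_fn: MapFn, reduce_fn: ReduceFn,
            num_reducers: int) -> List[Dict[str, int]]:
    # A real master hands these tasks to idle workers in parallel; here they run in turn.
    map_outputs = [run_map_task(s, map_fn, num_reducers) for s in splits]
    return [run_reduce_task(r, map_outputs, reduce_fn) for r in range(num_reducers)]

# Hypothetical word-count job: 5 splits and 2 reducers, roughly mirroring the diagram.
def word_count_map(line: str) -> Iterable[KV]:
    return ((word, 1) for word in line.split())

def word_count_reduce(word: str, counts: List[int]) -> int:
    return sum(counts)

outputs = run_job([["a b"], ["b c"], ["c d"], ["d a"], ["a"]],
                  word_count_map, word_count_reduce, num_reducers=2)
# Merged across both output "files": a=3, b=2, c=2, d=2
```

Because every map task applies the same partition function, all values for one key end up in the same reduce task, so the two output files never contain the same key.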
  12. MapReduce – Fault Tolerance
     •  Workers are periodically pinged by the master
     •  No answer within a certain time → the worker is marked as failed
     Mapper fails:
     –  Reset its map job to idle
     –  Even if the job was completed, it must be redone: its intermediate files are stored on the failed worker's local disk and are now inaccessible
     –  Notify the reducers where to get the new intermediate files
     Reducer fails:
     –  Reset its in-progress job to idle
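The master's bookkeeping for worker failures can be sketched as below. Assumptions: time is passed in explicitly instead of using real pings, and the class, method and state names are hypothetical. Notifying reducers of a re-executed map task's new location is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

IDLE, IN_PROGRESS, COMPLETED = "idle", "in-progress", "completed"

@dataclass
class Task:
    kind: str                     # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None  # worker the task is (or was) assigned to

@dataclass
class Master:
    timeout: float  # seconds a worker may go without answering a ping
    last_reply: Dict[str, float] = field(default_factory=dict)
    tasks: List[Task] = field(default_factory=list)

    def record_ping_reply(self, worker: str, now: float) -> None:
        self.last_reply[worker] = now

    def check_workers(self, now: float) -> List[str]:
        """Mark workers with no answer within `timeout` as failed and reset their tasks."""
        failed = [w for w, t in self.last_reply.items() if now - t > self.timeout]
        for worker in failed:
            del self.last_reply[worker]
            self._reset_tasks_of(worker)
        return failed

    def _reset_tasks_of(self, worker: str) -> None:
        for task in self.tasks:
            if task.worker != worker:
                continue
            if task.kind == "map" and task.state != IDLE:
                # Even a completed map task is redone: its intermediate files
                # sit on the failed worker's local disk and are now unreachable.
                task.state, task.worker = IDLE, None
            elif task.kind == "reduce" and task.state == IN_PROGRESS:
                # A completed reduce task's output is already stored in GFS.
                task.state, task.worker = IDLE, None

master = Master(timeout=10.0, tasks=[
    Task("map", COMPLETED, "w1"), Task("reduce", IN_PROGRESS, "w1"),
    Task("reduce", COMPLETED, "w1"), Task("map", IN_PROGRESS, "w2"),
])
master.record_ping_reply("w1", now=0.0)
master.record_ping_reply("w2", now=5.0)
failed = master.check_workers(now=12.0)  # w1 has been silent for 12 s > 10 s
```

Idle tasks go back into the pool the master assigns from, so the re-execution needs no special code path.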
  13. MapReduce – Fault Tolerance
     Master fails:
     –  The master periodically writes checkpoints
     –  In case of failure, the MapReduce operation is aborted
     –  The operation can be restarted from the last checkpoint
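Checkpointing the master can be sketched as periodically saving its task table and reloading it on restart. This is a hypothetical illustration: the JSON file format, the function names and the rule that in-progress tasks become idle again on restart are assumptions, not details from the slides.

```python
import json
import os
import tempfile
from typing import Dict

def write_checkpoint(path: str, task_states: Dict[str, str]) -> None:
    """Persist the master's task table (task id -> state)."""
    # Write to a temporary file first, then rename it over the old checkpoint,
    # so a crash mid-write never leaves a truncated checkpoint behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(task_states, f)
    os.replace(tmp_path, path)

def restart_from_checkpoint(path: str) -> Dict[str, str]:
    """Load the last checkpoint for a new master.

    Tasks recorded as in-progress are set back to idle, because the new master
    cannot tell whether they finished after the checkpoint was written.
    """
    with open(path) as f:
        states = json.load(f)
    return {task: ("idle" if state == "in-progress" else state)
            for task, state in states.items()}

ckpt = os.path.join(tempfile.mkdtemp(), "master.ckpt")
write_checkpoint(ckpt, {"map-0": "completed", "map-1": "in-progress", "reduce-0": "idle"})
restored = restart_from_checkpoint(ckpt)
# {'map-0': 'completed', 'map-1': 'idle', 'reduce-0': 'idle'}
```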
  14. Google MapReduce – GFS
     Google File System
     •  In-house distributed file system at Google
     •  Stores all input and output files
     •  Stores files…
        –  divided into 64 MB blocks
        –  on at least 3 different machines
     •  Machines running GFS also run MapReduce
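The storage rule on this slide (64 MB blocks, each on at least 3 machines) can be illustrated with a toy placement function. The round-robin placement and the function names are assumptions for illustration; real GFS placement also considers factors such as disk usage and spreading replicas across racks.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide
REPLICAS = 3                   # "on at least 3 different machines"

def num_blocks(file_size: int) -> int:
    """Number of 64 MB blocks needed for a file (the last one may be partial)."""
    return (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE

def place_blocks(file_size: int, machines: List[str]) -> Dict[int, List[str]]:
    """Assign each block to REPLICAS distinct machines using a toy round-robin rule."""
    if len(machines) < REPLICAS:
        raise ValueError("need at least as many machines as replicas")
    return {
        b: [machines[(b + i) % len(machines)] for i in range(REPLICAS)]
        for b in range(num_blocks(file_size))
    }

placement = place_blocks(200 * 1024 * 1024, ["m1", "m2", "m3", "m4"])
# 200 MB -> 4 blocks:
# {0: ['m1', 'm2', 'm3'], 1: ['m2', 'm3', 'm4'], 2: ['m3', 'm4', 'm1'], 3: ['m4', 'm1', 'm2']}
```

Because machines running GFS also run MapReduce, the master can try to schedule a map task on one of the machines already holding a replica of its input block.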
  15.–18. Google MapReduce – Job Example
  20. Alternative Implementations
     Apache Hadoop
     •  Open-source implementation in Java
     •  Jobs can be written in C++, Java, Python, etc.
     •  Used by Yahoo!, Facebook, Amazon and others
     •  Most commonly used implementation
     •  HDFS as an open-source implementation of GFS
     •  Can also use Amazon S3, HTTP(S) or FTP
     •  Extensions: Hive, Pig, HBase
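One way Hadoop runs jobs written in languages other than Java is Hadoop Streaming, where the mapper and reducer are programs that read lines on stdin and write tab-separated key/value lines on stdout, and the framework sorts the map output by key before the reducer sees it. A minimal word-count sketch; the script name `wc.py` and the `map`/`reduce` command-line arguments are hypothetical.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator

def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; relies on the input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

The pipeline can be tried locally without a cluster as `cat input.txt | python wc.py map | sort | python wc.py reduce`; on a cluster, the two commands would be passed to the streaming jar's `-mapper` and `-reducer` options.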
  21. Alternative Implementations
     •  Mars: MapReduce implementation for nVidia GPUs using the CUDA framework
     •  MapReduce-Cell: implementation for the Cell multi-core processor
     •  Qizmt: MySpace's implementation of MapReduce in C#
  22. Alternative Implementations
     There are many other open- and closed-source implementations of MapReduce!
  24. Reception and Criticism
     •  Yahoo!: Hadoop on a 10,000-server cluster
     •  Facebook analyses the daily log (25 TB) on a 1,000-server cluster
     •  Amazon Elastic MapReduce: Hadoop clusters for rent on EC2 and S3
     •  IBM and Google: support university courses in distributed programming
     •  UC Berkeley announced that it will teach MapReduce programming to freshmen
  25. Reception and Criticism
  26. Reception and Criticism
     •  Criticism mainly from the RDBMS experts DeWitt and Stonebraker
     •  MapReduce
        –  is a step backwards in database access
        –  is a poor implementation
        –  is not novel
        –  is missing features that are routinely provided by modern DBMSs
        –  is incompatible with DBMS tools
  27. Reception and Criticism
     Response to criticism
     •  MapReduce is not an RDBMS
     •  It is well suited to processing and structuring huge amounts of unstructured data
     •  MapReduce's big innovation is that it enables distributing data processing across a network of cheap and possibly unreliable computers
  29. Trends and Future Development
     Trend of using MapReduce/Hadoop as a parallel database
     •  Hive: query language for Hadoop
     •  HBase: column-oriented distributed database (modeled after Google’s BigTable)
     •  Map-Reduce-Merge: adding a merge step to the paradigm allows implementing features of relational algebra
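To make the Map-Reduce-Merge point concrete, here is a hypothetical merge step implementing an equi-join, one of the relational operators plain MapReduce lacks. It combines the outputs of two separate reduce jobs, each assumed to be sorted by key with one record per key. The function name and the department data are illustrative only; this is not the Map-Reduce-Merge paper's API.

```python
from typing import Iterator, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

def merge_join(left: List[Tuple[str, A]],
               right: List[Tuple[str, B]]) -> Iterator[Tuple[str, A, B]]:
    """Equi-join two reduce outputs that are sorted by key with unique keys."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # key only on the left: no match, advance left
        elif lk > rk:
            j += 1          # key only on the right: no match, advance right
        else:
            yield lk, left[i][1], right[j][1]
            i += 1
            j += 1

dept_headcount = [("eng", 12), ("ops", 4), ("sales", 7)]   # output of reduce job 1
dept_budget = [("eng", 900), ("hr", 150), ("sales", 300)]  # output of reduce job 2
joined = list(merge_join(dept_headcount, dept_budget))
# [('eng', 12, 900), ('sales', 7, 300)]
```

Since reduce output is already sorted by key, the merge is a single linear pass over both inputs.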
  30. Trends and Future Development
     Trend of using the MapReduce paradigm to better utilize multi-core CPUs
     •  Qt Concurrent
        –  Simplified C++ version of MapReduce for distributing tasks between multiple processor cores
     •  Mars
     •  MapReduce-Cell
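The single-machine, multi-core use of the paradigm can be sketched with Python's `concurrent.futures`: map a function over the items on a pool of workers, then fold the results. This is only an analogy to Qt Concurrent's mapped-reduce style, not its API; `mapped_reduced` here is a hypothetical helper, and unlike some Qt reduce modes it always folds results in input order.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import reduce
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")
R = TypeVar("R")

def mapped_reduced(items: Iterable[T], map_fn: Callable[[T], U],
                   reduce_fn: Callable[[R, U], R], initial: R,
                   executor: Optional[Executor] = None) -> R:
    """Run map_fn on the items across a worker pool, then fold the results."""
    own_pool = executor is None
    pool = executor if executor is not None else ThreadPoolExecutor()
    try:
        mapped = pool.map(map_fn, items)  # results are yielded in input order
        return reduce(reduce_fn, mapped, initial)
    finally:
        if own_pool:
            pool.shutdown()

total = mapped_reduced(range(1, 6), lambda x: x * x, lambda acc, y: acc + y, 0)
# 1 + 4 + 9 + 16 + 25 = 55
```

In CPython, threads do not run CPU-bound Python code on several cores at once, so for real multi-core speedups one would pass a `ProcessPoolExecutor` instead, with `map_fn` defined at module level so it can be pickled.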
  32. Conclusion
     MapReduce
     •  provides an easy solution for processing large amounts of data
     •  brings a paradigm shift in programming
     •  changed the world by making data processing more efficient and cheaper
     •  is the foundation of many other approaches and solutions
  33. Questions?
  34. Thank You!
