Past Present and Future of Data Processing in Apache Hadoop

Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric that supports MapReduce and other application paradigms. This changes the game by recasting Hadoop as a much more powerful data-processing system. As a result, Hadoop looks very different from what it was 12 months ago. Ever wonder what it might look like in 12 months, or 24 months, or longer? This talk will take you through some ideas for YARN itself and the many ways it is moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.

Transcript

  • 1. Data Processing with Hadoop: Looking Back, Looking Ahead. Arun C. Murthy, Founder & Architect, @acmurthy (@hortonworks)
  • 2. Hello! Founder/Architect at Hortonworks Inc.: Lead for MapReduce/YARN/Tez; formerly Architect, Hadoop MapReduce, at Yahoo; responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k-node footprint). Apache Hadoop, ASF: former VP, Apache Hadoop, ASF (Chair of the Apache Hadoop PMC); long-term committer/PMC member (full time for 7 years); release manager for hadoop-2.x. © Hortonworks Inc. 2013
  • 3. Once upon a time … long, long ago, there was a kingdom we shall call Apache Hadoop. http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
  • 4. Hadoop begat … a two-headed monster on every node in the kingdom; each belonged to a different clan and answered to a different master. http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
  • 5. Knights of Bytes - HDFS … stored data uncompromisingly in directories/files, nary a care about contents. http://whoiscraigmoser.com/Images/identity/knight.png
  • 6. Prince of Processing - MapReduce. He ruled with an iron fist by mapping, and then by mercilessly reducing data. http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
  • 7. Peace Reigned … for a while, with the odd change in the direction of the wind. http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
  • 8. Slowly, but surely … "Human beings define reality through misery and suffering." - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 9. Slowly, but surely … "Human beings define reality through misery and suffering." - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  • 10. Slowly, but surely … people of the kingdom clamored for more. A palpable sense of greed & expectation. http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
  • 11. Signs of Distress. SQL, said some; others said Machine Learning; still others said Real-Time Event Processing. http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
  • 12. A Meeting at the Summit. MapReduce is dead! Err… not quite. We need more options! We need more! True… http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
  • 13. A Meeting at the Summit. A common thread, YARN, running through all applications… Long live the King! http://whipup.net/wp-content/images/2008/08/yarn.gif
  • 14. The Edict. Henceforth, in the Kingdom of King YARN … MapReduce has been relegated to the status of, merely, one of the applications! http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
  • 15. Reign of King YARN. King YARN came to the throne with promises to return power to all applications equally, lower performance taxes and resource management… http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
  • 16. Oh the Shame! Well, at least, Prince MapReduce still had powerful allies like Highness Hive, Powerful Pig, Cheery Cascading… http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
  • 17. Things get worse before better. Unfortunately, things got a lot worse for Prince MapReduce… http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
  • 18. Knight Tez. He did MapReduce, and so much more… Smartly aligned himself with Kingdom YARN. http://twomorrows.com/alterego/media/08shiningknight.gif
  • 19. Knight Tez. Long-term alliances of MapReduce with Hive, Pig, Cascading etc. broke up … they decided to throw in their lot with Knight Tez! http://informatica.upg-ploiesti.ro/62689/img/partners.jpg http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png
  • 20. Happily ever after… (nothing cute to say)
  • 21. On a more serious note…
  • 22. Every season has a flavor… SQL-on-Hadoop is the new black! SQL-on-Hadoop will be solved within the existing ecosystem.
  • 23. Looking ahead. What will it be next year? Real-time event processing? Machine Learning?
  • 24. Play to our strengths. Invest in the Apache Hadoop platform and the ecosystem (Hive et al).
  • 25. Seriously… Technical Details
  • 26. Hadoop MapReduce – The System
  • 27. Hadoop MapReduce – The Paradigm. [Diagram: map tasks m0 to m4 feeding reduce tasks r0 to r2]
  • 28. Hadoop YARN. [Architecture diagram: Clients, Resource Manager, Node Managers, App Mstr, Containers; arrows labeled MapReduce Status, Job Submission, Node Status, Resource Request]
  • 29. Tez - Core Ideas. Tez Task - <Input, Processor, Output>. YARN ApplicationMaster to run DAG of Tasks. [Diagram: a Task composed of Input, Processor and Output]
  • 30. Pig/Hive-MR versus Pig/Hive-Tez. Query: SELECT a.state, COUNT(*) FROM a JOIN b ON (a.id = b.id) GROUP BY a.state. [Diagram: Pig/Hive - MR with an I/O Synchronization Barrier versus Pig/Hive - Tez with I/O Pipelining]
  • 31. Pig/Hive-MR versus Pig/Hive-Tez. Query: SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state. [Diagram: Pig/Hive - MR runs Job 1 to Job 4 separated by I/O Synchronization Barriers; Pig/Hive - Tez runs a Single Job]
  • 32. Thank You! Questions (surely) & Answers (maybe). @acmurthy
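
As a rough illustration of the Pig/Hive-MR side of slide 30, the sketch below runs that slide's query (a join followed by a GROUP BY count) as two chained MapReduce-style jobs in plain Python. It is a toy in-memory model, not the Hadoop, Hive or Tez API. The helper `map_reduce`, the mapper and reducer functions, `run_query`, and the sample rows are all hypothetical names and data introduced here for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory MapReduce: map every record, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted only to make the output deterministic
        output.extend(reducer(key, groups[key]))
    return output


# Hypothetical sample rows standing in for tables a and b on the slide.
a_rows = [{"id": 1, "state": "CA"}, {"id": 2, "state": "NY"}, {"id": 3, "state": "CA"}]
b_rows = [{"id": 1}, {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]


def join_mapper(tagged):
    # Key every row by its join column and remember which table it came from.
    table, row = tagged
    yield row["id"], (table, row)


def join_reducer(key, values):
    # Emit one joined row per matching (a, b) pair for this id.
    a_side = [row for table, row in values if table == "a"]
    b_side = [row for table, row in values if table == "b"]
    for a_row in a_side:
        for _ in b_side:
            yield {"state": a_row["state"]}


def count_mapper(row):
    yield row["state"], 1


def count_reducer(state, ones):
    yield (state, sum(ones))


def run_query(a, b):
    tagged = [("a", r) for r in a] + [("b", r) for r in b]
    # Job 1: the join. Its full output is materialized before job 2 starts.
    joined = map_reduce(tagged, join_mapper, join_reducer)
    # Job 2: GROUP BY a.state with COUNT(*).
    return map_reduce(joined, count_mapper, count_reducer)


print(run_query(a_rows, b_rows))
```

In this model the `joined` list plays the role of the intermediate output sitting between the two jobs, which is where the slide's diagram places the I/O synchronization barrier. The Tez side of the diagram is labeled as I/O pipelining instead; that behavior is not modeled here.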