Past Present and Future of Data Processing in Apache Hadoop

Apache Hadoop MapReduce has undergone a complete overhaul to emerge as Apache Hadoop YARN, a generic compute fabric that supports MapReduce and other application paradigms. This changes the game, recasting Hadoop as a much more powerful data-processing system. As a result, Hadoop looks very different from how it looked 12 months ago. Ever wonder what it might look like in 12 months, 24 months, or longer? This talk will take you through some ideas for YARN itself and the many ways it's moving the needle for MapReduce, Pig, Hive, Cascading and other data-processing tools in the Hadoop ecosystem.

Published in: Technology

Transcript of "Past Present and Future of Data Processing in Apache Hadoop"

  1. Data Processing with Hadoop: Looking Back, Looking Ahead. Arun C. Murthy, Founder & Architect, @acmurthy (@hortonworks)
  2. Hello! • Founder/Architect at Hortonworks Inc. – Lead - Map-Reduce/YARN/Tez – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k nodes footprint) • Apache Hadoop, ASF – Frmr. VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time for 7 years) – Release Manager for hadoop-2.x © Hortonworks Inc. 2013
  3. Once upon a time … … long, long ago, there was a kingdom we shall call Apache Hadoop http://2.bp.blogspot.com/-hIp99urgxCk/UAsSFo4i8YI/AAAAAAAAAFg/IzjNDwrBBVg/s1600/magickingdo
  4. Hadoop begat … … a two-headed monster on every node in the kingdom; each belonged to a different clan and answered to a different master http://4.bp.blogspot.com/_C7CsfdqySYc/TNSKvIwiFcI/AAAAAAAAAbs/2FSU2TV_rRA/s1600/Two-Headed+Monster+-+With+Identifiers+-+Jan+19,+2009_0.jpg
  5. Knights of Bytes - HDFS … stored data uncompromisingly in directories/files, nary a care about contents http://whoiscraigmoser.com/Images/identity/knight.png
  6. Prince of Processing - MapReduce: He ruled with an iron fist by mapping, and then by mercilessly reducing data http://media.comicvine.com/uploads/14/144886/2868181-sauron.jpg
  7. Peace Reigned … for a while with the odd change in the direction of the wind http://www.get-covers.com/wp-content/uploads/2012/07/Peace.jpg
  8. Slowly, but surely … "Human beings define reality through misery and suffering." - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  9. Slowly, but surely … "Human beings define reality through misery and suffering." - Agent Smith http://api.ning.com/files/*oWmhl7LBlXuodD2itWUUtOautEVfD*pbBn57L8ThCyYIykiTuzkO4lJY1bwaNbJF7GecTDwsVj3EFHpDM-F1y-UW4b3Xsvh/matrix_revolutions_agent_smith_04.bmp
  10. Slowly, but surely … … people of the kingdom clamored for more. A palpable sense of greed & expectation. http://sidoxia.files.wordpress.com/2011/11/wall-st-greed-st1.jpg
  11. Signs of Distress: SQL said some, others said Machine Learning, still others said Real-Time Event Processing http://www.truth-seeker.info/wp-content/uploads/2012/11/distress.jpg
  12. A Meeting at the Summit: "MapReduce is dead!" Err… not quite. "We need more options! We need more!" True… http://4.bp.blogspot.com/-oqr1t6avx6g/TW55kUnmQvI/AAAAAAAAMMk/q9Jc87MSG4g/s400/arab%2Bleague%2Bround%2Btable%2B%2Bbig%2Bgood%2B2011.bmp
  13. A Meeting at the Summit: A common thread YARN running through all applications… Long live the King! http://whipup.net/wp-content/images/2008/08/yarn.gif
  14. The Edict: Henceforth, in the Kingdom of King YARN… MapReduce has been relegated to the status of, merely, one of the applications! http://www.napavintners.org/images/winery_Labels/EdictWines-800HW.jpg
  15. Reign of King YARN: King YARN came to throne with promises to return power to all applications equally, lower performance taxes and resource management… http://images.fineartamerica.com/images-medium-large/the-coronation-the-crown-that-queen-everett.jpg
  16. Oh the Shame! Well, at least, Prince MapReduce still had powerful allies like Highness Hive, Powerful Pig, Cheery Cascading… http://www.gibbsmagazine.com/MPj03414090000%5B1%5D.jpg
  17. Things get worse before better: Unfortunately, things got a lot worse for the Prince MapReduce… http://www.deviantart.com/download/144412184/Smile__Tomorrow_will_be_worse__by_daGrevis.jpg
  18. Knight Tez: He did MapReduce, and so much more… Smartly aligned himself to Kingdom YARN. http://twomorrows.com/alterego/media/08shiningknight.gif
  19. Knight Tez: Long term alliances of MapReduce with Hive, Pig, Cascading etc. broke up… … they decided to throw their lot with Knight Tez! http://informatica.upg-ploiesti.ro/62689/img/partners.jpg http://www.officialpsds.com/images/thumbs/broken-glass-psd44132.png
  20. Happily ever after… (nothing cute to say)
  21. On a more serious note…
  22. Every season has a flavor… SQL-on-Hadoop is the new black! SQL-on-Hadoop will be solved within the existing ecosystem
  23. Looking ahead: What will it be next year? Real-time event processing? Machine Learning?
  24. Play to our strengths: Invest in the Apache Hadoop platform and the ecosystem (Hive et al).
  25. Seriously… Technical Details
  26. Hadoop MapReduce – The System
  27. Hadoop MapReduce – The Paradigm [diagram: map tasks m0, m1, m2, m3, m4 feeding reduce tasks r0, r1, r2]
  28. Hadoop YARN [architecture diagram: Clients, a Resource Manager, and Node Managers hosting App Mstr and Container instances; arrows labeled MapReduce Status, Job Submission, Node Status, Resource Request]
  29. Tez - Core Ideas: Task <Input, Processor & Output> [diagram: Input, Processor, Output inside a Task] Tez Task - <Input, Processor, Output>. YARN ApplicationMaster to run DAG of Tasks
  30. Pig/Hive-MR versus Pig/Hive-Tez: SELECT a.state, COUNT(*) FROM a JOIN b ON (a.id = b.id) GROUP BY a.state [diagram: Pig/Hive - MR with an I/O Synchronization Barrier; Pig/Hive - Tez with I/O Pipelining]
  31. Pig/Hive-MR versus Pig/Hive-Tez: SELECT a.state, COUNT(*), AVERAGE(c.price) FROM a JOIN b ON (a.id = b.id) JOIN c ON (a.itemId = c.itemId) GROUP BY a.state [diagram: Pig/Hive - MR as Job 1, Job 2, Job 3, Job 4 separated by I/O Synchronization Barriers; Pig/Hive - Tez as a Single Job]
  32. Thank You! Questions (surely) & Answers (maybe) @acmurthy
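
The paradigm on slide 27 (five map tasks, three reduce tasks) can be illustrated with a small in-memory sketch. This is not Hadoop code: the function names, the toy partitioner, and the sample records are assumptions made for illustration, and the job simply counts records per state, loosely echoing the GROUP BY query on slide 30.

```python
from collections import defaultdict


def map_task(records):
    """Map phase: emit one (state, 1) pair per input record."""
    return [(record["state"], 1) for record in records]


def partition(key, num_reducers):
    """Pick a reducer for a key (a deterministic stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group all map output by key, bucketed per reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_task(bucket):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in bucket.items()}


def run_job(splits, num_reducers=3):
    map_outputs = [map_task(split) for split in splits]  # m0..m4
    # Every map finishes before any reduce starts: the synchronization barrier.
    buckets = shuffle(map_outputs, num_reducers)
    results = {}
    for bucket in buckets:  # r0..r2
        results.update(reduce_task(bucket))
    return results


# Hypothetical input: five splits, one per map task.
splits = [
    [{"state": "CA"}, {"state": "NY"}],
    [{"state": "CA"}],
    [{"state": "TX"}, {"state": "NY"}],
    [{"state": "CA"}],
    [{"state": "TX"}],
]
counts = run_job(splits)
```

Each key lands in exactly one reducer's bucket, so merging the per-reducer dictionaries never overwrites a count.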

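The Tez idea on slide 29, a task described as <Input, Processor, Output> with an ApplicationMaster running a DAG of such tasks, might be sketched as follows. This is a toy model and not the Tez API: the `Task` class, `run_dag`, the processor functions, and the sample tables are all hypothetical, and task outputs are simply kept in memory. It runs the slide 30 query shape (join, then group by state) as a single DAG rather than as separate jobs.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str                                 # Output is stored under this name
    inputs: List[str]                         # Input: names of upstream outputs
    processor: Callable[[List[list]], list]   # Processor: the task's logic


def run_dag(tasks: List[Task], sources: Dict[str, list]) -> Dict[str, list]:
    """Run tasks in dependency order, passing outputs straight to consumers."""
    outputs = dict(sources)
    by_name = {task.name: task for task in tasks}
    graph = {task.name: set(task.inputs) for task in tasks}
    for name in TopologicalSorter(graph).static_order():
        if name in outputs:  # source tables are already available
            continue
        task = by_name[name]
        data = [outputs[upstream] for upstream in task.inputs]
        outputs[name] = task.processor(data)
    return outputs


def join_on_id(data):
    """Keep rows of `a` whose id appears in `b` (assumes ids in `b` are unique)."""
    left, right = data
    right_ids = {row["id"] for row in right}
    return [row for row in left if row["id"] in right_ids]


def count_by_state(data):
    """Count joined rows per state."""
    (rows,) = data
    counts = {}
    for row in rows:
        counts[row["state"]] = counts.get(row["state"], 0) + 1
    return sorted(counts.items())


# Hypothetical tables standing in for `a` and `b` from the slide's query.
sources = {
    "a": [
        {"id": 1, "state": "CA"},
        {"id": 2, "state": "NY"},
        {"id": 3, "state": "CA"},
        {"id": 4, "state": "TX"},
    ],
    "b": [{"id": 1}, {"id": 3}, {"id": 4}],
}
tasks = [
    Task("join", ["a", "b"], join_on_id),
    Task("group", ["join"], count_by_state),
]
outputs = run_dag(tasks, sources)
```

The point of the sketch is only the shape: the join's output feeds the grouping task directly inside one DAG, where a chain of MapReduce jobs would write and re-read intermediate data between jobs.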