Apache Hadoop YARN: Present and Future

Published in: Technology
  • Graph processing – Giraph, Hama
    Stream processing – Samza, Storm, Spark, DataTorrent
    MapReduce
    Tez – fast query execution

    Weave/REEF – frameworks to help with writing applications


    List of some of the applications which already support YARN, in some form.
    Samza, Storm, S4 and DataTorrent are streaming frameworks
    Giraph and Hama are graph processing systems
    There are some GitHub projects – caching systems, on-demand web-server spin up

    Weave and REEF are frameworks on top of YARN to make writing applications easier
  • Transcript

    • 1. © Hortonworks Inc. 2014 Apache Hadoop YARN: Present and Future • Vinod Kumar Vavilapalli, vinodkv [at] apache.org, @tshooter • Jian He, jianhe [at] apache.org
    • 2. Who are we? • Vinod Kumar Vavilapalli – 7 Hadoop-years old – Previously @Yahoo!, now @Hortonworks – Hadoop MapReduce and YARN Development lead & Architect at Hortonworks – Apache Hadoop YARN project lead – Apache Hadoop PMC, Apache Member – 99% + code in Apache, Hadoop • Jian He – Software Engineer @ Hortonworks – Apache Hadoop Committer – Master's degree from Brown University – Focus on YARN/MapReduce • Architecting the Future of Big Data
    • 3. A quick show of hands.. • Hadoop 1 • Hadoop 2 & YARN • YARN for MapReduce2 • YARN for beyond MR2
    • 4. Agenda • Apache Hadoop 2 : Overview • Community • Present • Future
    • 5. Apache Hadoop 2: Next Generation Architecture
    • 6. YARN: the Data Operating System • Resource Management Platform • MapReduce v2 • Beyond MapReduce with Tez, Storm, Spark; in Hadoop! • Did I mention Services like HBase, Accumulo on YARN with Apache Slider?
    • 7. Why? • 2.0 >= 2 * 1.0 – YARN: Next generation architecture • Scale • Agility • Return on Investment: 2x throughput on same hardware! • Ready for improvements in hardware • Not convinced? Let’s see what others are saying!
    • 8. Yahoo! • Leader/Visionary on all things Hadoop! • On YARN (0.23.x) • Moving fast to 2.x • http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
    • 9. Twitter • Talk: “Hadoop 2 @Twitter, Elephant Scale” By: Lohit Vijayarenu & Gera Shegalov
    • 10. eBay • Has one of the largest Hadoop clusters in the industry, with tens to hundreds of petabytes of data • Migrated production clusters to Hadoop 2
    • 11. YARN Community: At Apache Software Foundation
    • 12. YARN contributions • [Chart: YARN Releases - 06/02/14; series 2.0.x, 2.1.x, 2.2.x, 2.3.x, 2.4.x, 2.x, trunk; axis 0 to 400]
    • 13. Contributors • 104 and counting • Few ‘big’ contributors • And a long tail • [Chart: contributions per contributor; axis 0 to 100]
    • 14. Present
    • 15. Apache Hadoop releases: Apache Hadoop 2.2 • 15 October, 2013 • The 1st GA release of Apache Hadoop 2.x • YARN – First stable and supported release of YARN – YARN level APIs solidified for the future – Binary Compatibility for MapReduce applications built on hadoop-1.x – Performance – Scale! • Support for running Hadoop on Microsoft Windows • Substantial amount of integration testing with rest of projects in the ecosystem – Pig, Hive, Oozie, HBase..
    • 16. Apache Hadoop releases (contd): Apache Hadoop 2.3 • 24 February, 2014 • First post GA release for the year 2014 • Alpha features in YARN – ResourceManager High Availability – Application History Server – Will be covered in detail in the 2.4 section • Number of bug-fixes, enhancements
    • 17. Apache Hadoop releases (contd): Apache Hadoop 2.4 • 7 April, 2014 • Most recent release • Stabilizing features in YARN – Details follow – ResourceManager HA – YARN Timeline Server (beyond history server) – Preemption in YARN CapacityScheduler – Container-preserving AM recovery.
    • 18. ResourceManager High Availability • RM – single point of failure • Goal : Downtime invisible to end-users – Apps not required to be re-submitted – NMs to rebind with newly started RM • Two stories: – Recovery of state – Failover
    • 19. ResourceManager High Availability • Active/Standby – Leader election (ZooKeeper) • Standby on transition to Active loads all the state from the state store. • NM, AM, clients, redirect to the new RM – RMProxy lib • Talk: “Highly Available Resource Management for YARN” By: Karthik Kambatla, Xuan Gong
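The Active/Standby flow on slides 18 and 19 can be illustrated with a toy simulation. This is only a sketch under stated assumptions: `StateStore`, `ResourceManager` and `FailoverProxy` are hypothetical names invented here, not YARN classes, and ZooKeeper leader election is replaced by explicit calls to `transition_to_active` and `transition_to_standby`.

```python
class StateStore:
    """Hypothetical stand-in for the shared RM state store."""

    def __init__(self):
        self._apps = {}

    def save_app(self, app_id, state):
        self._apps[app_id] = state

    def load_all(self):
        return dict(self._apps)


class ResourceManager:
    """Toy RM: serves requests only while active (assumption of this sketch)."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def transition_to_active(self):
        # On becoming active, the standby loads all state from the store.
        self.apps = self.store.load_all()
        self.active = True

    def transition_to_standby(self):
        self.active = False
        self.apps = {}

    def _check_active(self):
        if not self.active:
            raise ConnectionError(f"{self.name} is not active")

    def submit(self, app_id):
        self._check_active()
        self.apps[app_id] = "RUNNING"
        self.store.save_app(app_id, "RUNNING")

    def status(self, app_id):
        self._check_active()
        return self.apps[app_id]


class FailoverProxy:
    """Toy client-side proxy: on failure, retry against the next RM."""

    def __init__(self, rms):
        self.rms = rms
        self.current = 0

    def call(self, method, *args):
        for _ in range(len(self.rms)):
            rm = self.rms[self.current]
            try:
                return getattr(rm, method)(*args)
            except ConnectionError:
                self.current = (self.current + 1) % len(self.rms)
        raise ConnectionError("no active ResourceManager")
```

In this model an application submitted through the proxy before a failover is still visible afterwards, because the newly active RM reloads it from the store and the proxy redirects the client without a resubmission.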
    • 20. YARN Timeline Server • Few MR specific implementations: History and web-UI • YARN: Not just MR anymore! • Previous state – MapReduce specific Job History Server – YARN level ‘History’ lost beyond ResourceManager Restart
    • 21. YARN Timeline Server (contd) • Entity and Event collection: RM and Applications periodically send events to the Timeline Server • Pluggable store: Depending on site requirements • REST APIs or RPC: Applications and user-interfaces can access information via REST/RPC • Visualizations: Users can build tools and visualizations using the APIs • Apps and System: Applications as well as the system entities/events
    • 22. YARN Timeline Server (contd) • [Diagram: YARN Timeline Server connected to App1, App2, RM, Ambari and a custom app monitoring client; labels: Events, RPC, REST API] • Talk: “Analyzing Historical Data of Applications on Hadoop YARN: for Fun and Profit” By: Zhijie Shen, Mayank Bansal
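The entity/event collection and pluggable store described on slide 21 can be sketched as a small in-memory model. Everything here is an assumption made for illustration: `InMemoryStore`, `TimelineServer`, `post_event` and `get_events` are hypothetical names and do not reflect the real Timeline Server API or its REST paths.

```python
class InMemoryStore:
    """One possible backend; any object with put/get could be plugged in."""

    def __init__(self):
        self._events = {}

    def put(self, entity_type, entity_id, event):
        self._events.setdefault((entity_type, entity_id), []).append(event)

    def get(self, entity_type, entity_id):
        return list(self._events.get((entity_type, entity_id), []))


class TimelineServer:
    """Toy collector: the RM and applications post events, clients read them."""

    def __init__(self, store):
        self.store = store  # pluggable, chosen per site requirements

    def post_event(self, entity_type, entity_id, event_type, timestamp, info=None):
        event = {"type": event_type, "timestamp": timestamp, "info": info or {}}
        self.store.put(entity_type, entity_id, event)

    def get_events(self, entity_type, entity_id):
        # Return events oldest first, regardless of arrival order.
        events = self.store.get(entity_type, entity_id)
        return sorted(events, key=lambda e: e["timestamp"])
```

A monitoring tool or visualization would only call `get_events`, so the storage backend can be swapped without touching the readers.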
    • 23. Capacity Scheduler Preemption • Enforce SLAs • Preempt across queues • STEP 1, Gather queue state: current capacity, guaranteed capacity • STEP 2, Identify preemptions: select applications to preempt from over-capacity queues • STEP 3, Issue preemptions: issue preemptions for containers to the application • STEP 4, Kill containers: track containers for which preemption has been issued but not yet executed; forcibly kill these containers after a timeout
    • 24. Capacity Scheduler Preemption (contd) • [Diagram: the Scheduler sends preemptions to the Application; the Application releases resources; the Scheduler kills containers forcibly after a timeout]
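The four preemption steps on slide 23 can be walked through with a toy model. This is a sketch, not the CapacityScheduler implementation: the `Container`, `Queue` and `PreemptionMonitor` types are hypothetical, resources are a single integer, and the policy (newest containers first, never going below the guarantee, ignoring demand from other queues) is an assumption chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    cid: str
    size: int              # resource units held (single dimension in this sketch)
    released: bool = False
    killed: bool = False


@dataclass
class Queue:
    name: str
    guaranteed: int
    containers: list = field(default_factory=list)

    @property
    def used(self):
        return sum(c.size for c in self.containers
                   if not (c.released or c.killed))


def identify_preemptions(queues):
    """Steps 1-2: read queue state, pick containers in over-capacity queues."""
    targets = []
    for q in queues:
        excess = q.used - q.guaranteed
        for c in reversed(q.containers):   # newest first (arbitrary policy)
            if excess <= 0:
                break
            if c.released or c.killed:
                continue
            if c.size <= excess:           # never cut below the guarantee
                targets.append(c)
                excess -= c.size
    return targets


class PreemptionMonitor:
    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}                  # cid -> (container, time issued)

    def issue(self, targets, now):
        """Step 3: ask the owning application to give containers back."""
        for c in targets:
            self.pending.setdefault(c.cid, (c, now))

    def enforce(self, now):
        """Step 4: kill containers still held once the timeout has passed."""
        killed = []
        for cid, (c, issued) in list(self.pending.items()):
            if c.released:
                del self.pending[cid]
            elif now - issued >= self.timeout:
                c.killed = True
                killed.append(cid)
                del self.pending[cid]
        return killed
```

The gap between `issue` and `enforce` is what gives an application the chance to release resources gracefully before the scheduler kills its containers.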
    • 25. Container-preserving AM restart • Problem – Containers are killed when AM goes down. – New AM needs to know where the previous containers are running – Previous containers need to know about the new AM. (WIP) • [Diagram: Container1, Container2, Container3; AM1 restarts as AM2]
    • 26. Apache Hadoop releases (contd): Apache Hadoop 2.5.x • Next releases – 2.4.1 – 2.5.x • YARN – Details follow in future’s section – ResourceManager work-preserving restart for High Availability – YARN Timeline Server security & enhancement. – Lots more
    • 27. Future
    • 28. Future: Operational enhancements • Rolling upgrades – No/minimal impact to users – Ideal: Always rolling! • HDFS upgrades effort is in • YARN – RM restart – NM restart – Upgrades • Talk: “Hadoop Rolling Upgrades – Taking Availability to the Next Level” By: Suresh Srinivas, Hortonworks & Jason Lowe, Yahoo!
    • 29. Future: Enabling apps • Beyond MapReduce – Apache Tez, Apache Slider, Apache Storm. • Discussing next – Long running services – Multi-dimensional resource scheduling – Isolation – Web services
    • 30. Future: Long running services • You can run them already! • Few enhancements needed – Logs – Security – Management/monitoring • Resource sharing across workload types • Talk: “Bring your Service to YARN” By: Sumit Mohanty
    • 31. Multi-resource scheduling • Today – memory & cpu – Physical memory / virtual memory – CPU Cores – Virtual cores • CPU stuff: More bake in • Disks – Space – IOPS • Network
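Scheduling on more than one resource, as on slide 31, means a request fits a node only if every dimension fits. The sketch below shows that check for memory and virtual cores; the `Resource` type, its field names and the first-fit `place` function are assumptions made for illustration, not YARN's scheduler logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other):
        # Every dimension must fit; one short dimension rejects the request.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other):
        return Resource(self.memory_mb - other.memory_mb,
                        self.vcores - other.vcores)


def place(request, nodes):
    """First-fit placement: return the chosen node name, or None.

    `nodes` maps node name -> free Resource and is updated in place.
    """
    for name, free in nodes.items():
        if request.fits_in(free):
            nodes[name] = free.minus(request)
            return name
    return None
```

Adding disk space, IOPS or network bandwidth would, in this model, amount to adding fields to `Resource` and extending the two methods.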
    • 32. Fine-grain isolation for multi-tenancy • Custom memory-monitoring • Cgroups • Linux Containers • VMs
    • 33. Other features • Application SLAs – Run my application at 6:00 AM tomorrow and guarantee capacity for me! • Node labels – Some of the nodes in my cluster have specialized hardware, give them to me! • Node affinity/anti-affinity – Get me on to the nodes where my data is – Get me off of this node • Better online queue-management – Centralized – Quality feedback • Web-services – RESTful APIs for submitting, monitoring and killing apps – Beyond java-only clients
    • 34. YARN Ecosystem • Beyond the core YARN project: Briefly
    • 35. Eco-system: Applications Powered by YARN • Classic: Apache Hadoop MapReduce (Batch) • Batch & Interactive: Apache Tez • Stream Processing: Apache Storm, Apache Samza • Apache Spark (Iterative applications) • YARN Frameworks: Apache Twill, Microsoft REEF • Existing apps: Apache Slider • Graph Processing: Apache Giraph • There's an app for that... YARN App Marketplace! • Talk: “Apache Tez - A New Chapter in Hadoop Data Processing” By: Bikas Saha, Hitesh Shah
    • 36. Recap
    • 37. Recap • YARN helps Apache Hadoop 2 to be twice as good! • Exciting journey with Hadoop for this decade… – Hadoop is no longer a one-trick pony, err elephant – Beyond just MapReduce • Hadoop 2: Architecture for the future – Centralized data, multiple apps • Lots of exciting new features – Exciting spectrum of application types, workloads and use-cases
    • 38. Couple more things..
    • 39. The Book is out!
    • 41. Thank you! • Download Sandbox: Experience Apache Hadoop. Both 2.x and 1.x versions available! http://hortonworks.com/products/hortonworks-sandbox/ • Questions Time!