Apache Hadoop YARN: Present and Future

Published in: Technology
  1. Apache Hadoop YARN: Present and Future. Vinod Kumar Vavilapalli, Hortonworks
  2. © Hortonworks Inc. 2014. Apache Hadoop YARN: Present and Future. Vinod Kumar Vavilapalli, vinodkv [at] apache.org, @tshooter
  3. A quick show of hands
       • Hadoop 2
     [Image: "Real life Hadoop Logo"]
     Architecting the Future of Big Data
  4. Who am I?
       • 6.75 Hadoop-years old
       • Last thing at school: a two-node Tomcat cluster. Three months later, first thing on the job, brought down an 800-node cluster ;)
       • Previously @Yahoo!
       • Now @Hortonworks
       • Two hats
         – Hortonworks: Hadoop MapReduce and YARN development lead
         – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
       • Worked/working on
         – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
         – Apache Ambari: kickstarted the project and its first release
         – Stinger: high-performance data processing with Hadoop/Hive
       • Lots of troubleshooting on clusters
       • 99%+ code in Apache, Hadoop
  5. Agenda
       • Apache Hadoop 2: Overview
       • Past
       • Present
       • Future
  6. Apache Hadoop 2: Next Generation Architecture
  7. What is YARN?
       • Resource management platform
         – MapReduce v2
         – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
         – Did I mention services like HBase, Accumulo on YARN with HoYA/Slider?
       • How is it different from Hadoop 1? ..
  8. Hadoop 1 vs Hadoop 2
       • HADOOP 1.0: Single Use System (Batch Apps)
         – HDFS (redundant, reliable storage)
         – MapReduce (cluster resource management & data processing)
       • HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
         – HDFS2 (redundant, highly-available & reliable storage)
         – YARN (cluster resource management)
         – MapReduce (data processing)
         – Others
  9. Key Benefits of YARN
       • Scale
       • New programming models & services
       • Improved cluster utilization
       • Agility
       • To infinity and beyond ..
  10. Why Migrate?
       • 2.0 >= 2 * 1.0
         – HDFS: Lots of ground-breaking features
         – YARN: Next generation architecture
       • Return on investment: 2x throughput on same hardware!
       • Ready for improvements in hardware
       • Not convinced? Let’s see what others are saying!
  11. Yahoo!
       • Leader/Visionary on all things Hadoop!
       • On YARN (0.23.x)
       • Moving fast to 2.x
       • http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
  12. Twitter
  13. eBay
       • Has one of the largest Hadoop clusters in the industry, with many petabytes of data
       • Migrated production clusters to Hadoop-2
       • Go to Mayank’s talk, “Hadoop-2 @ ebay”!
         – Thursday, April 3
         – Track: Deployment and Operations
       • Should be convinced by now... No?
  14. YARN: the Data Operating System
  15. Present
  16. Apache Hadoop releases: Apache Hadoop 2.2
       • 15 October, 2013
       • The 1st GA release of Apache Hadoop 2.x
       • YARN
         – First stable and supported release of YARN
         – Binary compatibility for MapReduce applications built on hadoop-1.x
         – YARN level APIs solidified for the future
         – Performance
         – Scale!
       • HDFS
         – High Availability for HDFS
         – HDFS Federation
         – HDFS Snapshots
         – NFSv3 access to data in HDFS
       • Support for running Hadoop on Microsoft Windows
       • Substantial amount of integration testing with rest of projects in the ecosystem
  17. Apache Hadoop releases (contd): Apache Hadoop 2.3
       • 24 February, 2014
       • First post-GA release for the year 2014
       • Alpha features in YARN
         – ResourceManager HA
         – Application History
         – Will cover in the 2.4 content
       • HDFS
         – Details follow..
       • Number of bug-fixes, enhancements
  18. HDFS: Heterogeneous Storage
  19. HDFS: DataNode caching
  20. Apache Hadoop releases (contd): Apache Hadoop 2.4
       • Very soon!
       • YARN
         – Details follow..
         – ResourceManager restart fail-over for high availability
         – Preemption
         – Application History and timeline
       • HDFS
         – FileSystem ACLs
         – Rolling upgrades
  21. ResourceManager Restart and fail-over
     [Diagram label: ZooKeeper]
  22. Capacity Scheduler Preemption
  23. Application History and Timeline
       • Few MR specific implementations: History and web-UI
       • Not just MR anymore!
       • History
         – MapReduce specific Job History Server
         – Beyond ResourceManager Restart
       • Timeline
         – Framework specific event collection and UIs
       • Run analytics on historical apps!
  24. Future
  25. Future: Operational enhancements
       • Rolling upgrades
         – No/minimal impact to users
         – Ideal: Always rolling!
       • HDFS in
       • YARN
  26. Future: Enabling more apps
       • Beyond MR
       • Discussing next
         – Long running services
         – Isolation
         – Multi-dimensional resource scheduling
  27. Future: Long running services
       • You can run them already!
       • Few enhancements needed
         – Logs
         – Security
         – Management/monitoring
       • Resource sharing across workload types
       • Project Slider
  28. Fine-grain isolation for multi-tenancy
       • Custom memory-monitoring
       • Cgroups
       • Linux Containers
       • VMs
  29. Multi-resource scheduling
       • Today: memory & CPU
         – Physical memory / virtual memory
         – CPU cores
         – Virtual cores
       • CPU stuff: More bake in
       • Disks
         – Space
         – IOPS
       • Network
  30. Other features
       • Application SLAs
       • Node labels
       • Node affinity/anti-affinity
       • Better online queue-management
  31. YARN Ecosystem. Beyond the core YARN project: Briefly
  32. Eco-system
       • Applications Powered by YARN
         – Apache Giraph: Graph Processing
         – Apache Hama: BSP
         – Apache Hadoop MapReduce: Batch
         – Apache Tez: Batch/Interactive
         – Apache S4: Stream Processing
         – Apache Samza: Stream Processing
         – Apache Storm: Stream Processing
         – Apache Spark: Iterative applications
         – HOYA: HBase on YARN
       • YARN Frameworks
         – Apache Twill
         – REEF by Microsoft
         – Spring support for Hadoop 2
       • There's an app for that... YARN App Marketplace!
  33. Apache TEZ
       • Moving beyond MR
       • A data processing framework that can execute a complex DAG of tasks.
       • “Apache Tez - A New Chapter in Hadoop Data Processing”
         – By Siddharth Seth: YARN & Tez Committer/PMC Member
         – Thursday, April 3 (4:20-5:00pm)
  34. Recap
  35. Recap
       • Apache Hadoop 2 is, at least, twice as good!
       • Exciting journey with Hadoop for this decade…
         – Hadoop is no longer a one-trick pony, err elephant
         – Beyond just HDFS & MapReduce
       • Architecture for the future
         – Centralized data
         – Exciting spectrum of application types, workloads and use cases
  36. Couple more things..
  37. The Book is out!
  38. [No text on slide]
  39. Thank you!
       • Download Sandbox: Experience Apache Hadoop
       • Both 2.x and 1.x versions available!
       • http://hortonworks.com/products/hortonworks-sandbox/
       • Questions Time!
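
Slide 29 lists memory and CPU (virtual cores) as the resources YARN schedules on today. As a rough illustration of what scheduling on more than one dimension involves, the Python sketch below checks whether a request fits on a node and computes an application's dominant share, meaning its largest fractional use of any single resource. The `Resource` class, the function names, and all the numbers are hypothetical; none of them come from the talk or from YARN's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector (hypothetical model, not a YARN class)."""
    memory_mb: int
    vcores: int


def fits(request: Resource, available: Resource) -> bool:
    """A request fits only if every dimension fits, not just memory."""
    return (request.memory_mb <= available.memory_mb
            and request.vcores <= available.vcores)


def dominant_share(used: Resource, cluster: Resource) -> float:
    """Largest fraction of the cluster that `used` consumes in any one dimension."""
    return max(used.memory_mb / cluster.memory_mb,
               used.vcores / cluster.vcores)


# Made-up cluster and application usage, for illustration only.
cluster = Resource(memory_mb=100 * 1024, vcores=50)
app_a = Resource(memory_mb=40 * 1024, vcores=5)    # memory-heavy
app_b = Resource(memory_mb=10 * 1024, vcores=15)   # CPU-heavy

# A node with plenty of memory but a single free core cannot host a 2-vcore request.
node_free = Resource(memory_mb=8 * 1024, vcores=1)
request = Resource(memory_mb=2 * 1024, vcores=2)

if __name__ == "__main__":
    print("request fits on node:", fits(request, node_free))
    print("app_a dominant share:", dominant_share(app_a, cluster))
    print("app_b dominant share:", dominant_share(app_b, cluster))
```

In this toy model, a scheduler that wants to be fair across both dimensions would compare applications by dominant share rather than by memory alone, so the memory-heavy `app_a` and the CPU-heavy `app_b` are measured on the same scale.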