Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future

Title: Apache Hadoop YARN: Present and Future

Abstract: Apache Hadoop YARN evolves the Hadoop compute platform from being centered only around MapReduce to being a generic data processing platform that can take advantage of a multitude of programming paradigms, all on the same data. In this talk, we'll talk about the journey of YARN from a concept to being the cornerstone of Hadoop 2 GA releases. We'll cover the current status of YARN, how it is faring today and how it stands apart from the monochromatic world that is Hadoop 1.0. We'll then move on to the exciting future of YARN - features that are making YARN a first-class resource-management platform for enterprise Hadoop: rolling upgrades, high availability, support for long-running services alongside applications, fine-grain isolation for multi-tenancy, preemption, application SLAs and application history, to name a few.

Published in: Engineering, Technology
  • Slide notes (applications on YARN): a list of some of the applications which already support YARN, in some form.
      – Graph processing: Giraph, Hama
      – Stream processing: Samza, Storm, Spark, DataTorrent
      – MapReduce
      – Tez: fast query execution
      – Weave/REEF: frameworks to help with writing applications
    Samza, Storm, S4 and DataTorrent are streaming frameworks. Giraph and Hama are graph processing systems. There are some GitHub projects too: caching systems, on-demand web-server spin-up. Weave and REEF are frameworks on top of YARN that make writing applications easier.
  • Transcript of "Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future"

    1. Apache Hadoop YARN: Present and Future
       Vinod Kumar Vavilapalli, vinodkv [at] apache.org, @tshooter
       © Hortonworks Inc. 2014
    2. A quick show of hands..
       • Hadoop 2
       Real life Hadoop Logo
       Architecting the Future of Big Data
    3. Who am I?
       • 6.75 Hadoop-years old
       • Last thing at school: a two-node Tomcat cluster. Three months later, first thing on the job, brought down an 800-node cluster ;)
       • Previously @Yahoo!
       • Now @Hortonworks
       • Two hats
         – Hortonworks: Hadoop MapReduce and YARN development lead
         – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
       • Worked/working on
         – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
         – Apache Ambari: kickstarted the project and its first release
         – Stinger: high-performance data processing with Hadoop/Hive
       • Lots of troubleshooting on clusters
       • 99%+ code in Apache, Hadoop
    4. Agenda
       • Apache Hadoop 2: Overview
       • Past
       • Present
       • Future
    5. Apache Hadoop 2: Next Generation Architecture
    6. What is YARN?
       • Resource Management Platform
         – MapReduce v2
         – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
         – Did I mention services like HBase, Accumulo on YARN with HoYA/Slider?
       • How is it different from Hadoop 1? ..
    7. Hadoop 1 vs Hadoop 2
       HADOOP 1.0: Single Use System (Batch Apps)
         – MapReduce (cluster resource management & data processing)
         – HDFS (redundant, reliable storage)
       HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
         – MapReduce (data processing), Others
         – YARN (cluster resource management)
         – HDFS2 (redundant, highly-available & reliable storage)
    8. Key Benefits of YARN
       • Scale
       • New Programming Models & Services
       • Improved cluster utilization
       • Agility
       • To infinity and beyond ..
    9. Why Migrate?
       • 2.0 >= 2 * 1.0
         – HDFS: Lots of ground-breaking features
         – YARN: Next generation architecture
       • Return on Investment: 2x throughput on same hardware!
       • Ready for improvements in hardware
       • Not convinced? Let's see what others are saying!
    10. Yahoo!
        • Leader/Visionary on all things Hadoop!
        • On YARN (0.23.x)
        • Moving fast to 2.x
        http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
    11. Twitter
    12. Ebay
        • Has one of the largest Hadoop clusters in the industry with many petabytes of data
        • Migrated production clusters to Hadoop-2
        • Go to Mayank's talk: "Hadoop-2 @ ebay"!
          – Thursday, April 3
          – Track: Deployment and Operations
        • Should be convinced by now.. No?
    13. YARN: the Data Operating System
    14. Present
    15. Apache Hadoop releases: Apache Hadoop 2.2
        • 15 October, 2013
        • The 1st GA release of Apache Hadoop 2.x
        • YARN
          – First stable and supported release of YARN
          – Binary compatibility for MapReduce applications built on hadoop-1.x
          – YARN level APIs solidified for the future
          – Performance
          – Scale!
        • HDFS
          – High Availability for HDFS
          – HDFS Federation
          – HDFS Snapshots
          – NFSv3 access to data in HDFS
        • Support for running Hadoop on Microsoft Windows
        • Substantial amount of integration testing with the rest of the projects in the ecosystem
    16. Apache Hadoop releases (contd): Apache Hadoop 2.3
        • 24 February, 2014
        • First post GA release for the year 2014
        • Alpha features in YARN
          – ResourceManager HA
          – Application History
          – Will cover in the 2.4 content
        • HDFS
          – Details follow..
        • Number of bug-fixes, enhancements
    17. HDFS: Heterogeneous Storage
    18. HDFS: DataNode caching
    19. Apache Hadoop releases (contd): Apache Hadoop 2.4
        • Very soon!
        • YARN
          – Details follow..
          – ResourceManager restart fail-over for high availability
          – Preemption
          – Application History and timeline
        • HDFS
          – FileSystem ACLs
          – Rolling upgrades
    20. ResourceManager Restart and fail-over
        ZooKeeper
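The slide pairs ResourceManager restart and fail-over with ZooKeeper but carries no further text. As a rough illustration of the general idea only, the toy model below assumes that the active ResourceManager persists application state to a shared store and that a standby reloads it when it takes over. Every class and method name here (`StateStore`, `ToyResourceManager`, `become_active`, `submit`) is hypothetical and is not part of YARN's API; the in-memory dictionary merely stands in for a store such as ZooKeeper.

```python
"""Toy model of restart/fail-over backed by a shared state store.

All names are hypothetical; this is not YARN code or its API.
"""
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Stands in for a shared, durable store such as ZooKeeper."""
    apps: dict = field(default_factory=dict)

    def save(self, app_id, state):
        self.apps[app_id] = state

    def load_all(self):
        return dict(self.apps)


class ToyResourceManager:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.active = False
        self.apps = {}

    def become_active(self):
        # Recover whatever the previous active instance persisted.
        self.apps = self.store.load_all()
        self.active = True

    def submit(self, app_id):
        if not self.active:
            raise RuntimeError(f"{self.name} is standby")
        self.apps[app_id] = "ACCEPTED"
        # Persist before acknowledging, so a successor can recover it.
        self.store.save(app_id, "ACCEPTED")


store = StateStore()
rm1 = ToyResourceManager("rm1", store)
rm2 = ToyResourceManager("rm2", store)

rm1.become_active()
rm1.submit("app_1")
rm1.submit("app_2")

# Simulate rm1 going away and rm2 taking over.
rm1.active = False
rm2.become_active()
recovered = sorted(rm2.apps)
print(recovered)
```

The point of the sketch is the ordering: state is written to the shared store at submission time, so the instance that takes over never depends on the memory of the one that failed.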
    21. Capacity Scheduler Preemption
    22. Application History and Timeline
        • Few MR specific implementations: History and web-UI
        • Not just MR anymore!
        • History
          – MapReduce specific Job History Server
          – Beyond ResourceManager Restart
        • Timeline
          – Framework specific event collection and UIs
        • Run analytics on historical apps!
    23. Future
    24. Future: Operational enhancements
        • Rolling upgrades
          – No/minimal impact to users
          – Ideal: Always rolling!
        • HDFS in
        • YARN
    25. Future: Enabling more apps
        • Beyond MR
        • Discussing next
          – Long running services
          – Isolation
          – Multi-dimensional resource scheduling
    26. Future: Long running services
        • You can run them already!
        • Few enhancements needed
          – Logs
          – Security
          – Management/monitoring
        • Resource sharing across workload types
        • Project Slider
    27. Fine-grain isolation for multi-tenancy
        • Custom memory-monitoring
        • Cgroups
        • Linux Containers
        • VMs
    28. Multi-resource scheduling
        • Today: memory & CPU
          – Physical memory / virtual memory
          – CPU cores: virtual cores
        • CPU stuff: More bake in
        • Disks
          – Space
          – IOPS
        • Network
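Scheduling on memory and CPU together means a request has to fit on every dimension, and comparing applications needs a single number per application. The sketch below is one common way to do that, a dominant-share comparison: each application's share is the larger of its memory fraction and its vcore fraction. This is an illustrative assumption, not a statement of how any particular YARN scheduler is configured; `Resource`, `fits` and `dominant_share` are hypothetical names, and the cluster and application sizes are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector: memory and virtual cores."""
    memory_mb: int
    vcores: int


def fits(request, free):
    """A request fits only if it fits on every dimension."""
    return request.memory_mb <= free.memory_mb and request.vcores <= free.vcores


def dominant_share(used, cluster):
    """Largest fraction of the cluster this usage takes on any one dimension."""
    return max(used.memory_mb / cluster.memory_mb, used.vcores / cluster.vcores)


# Made-up numbers for illustration.
cluster = Resource(memory_mb=100_000, vcores=100)
app_a = Resource(memory_mb=40_000, vcores=10)   # memory-heavy
app_b = Resource(memory_mb=10_000, vcores=30)   # CPU-heavy

share_a = dominant_share(app_a, cluster)
share_b = dominant_share(app_b, cluster)

# Offer the next container to whichever app has the smaller dominant share.
next_app = "a" if share_a < share_b else "b"

free = Resource(memory_mb=2_048, vcores=1)
can_place = fits(Resource(memory_mb=1_024, vcores=2), free)
print(share_a, share_b, next_app, can_place)
```

Adding disks or network, as the slide anticipates, would extend the vector with more fields; the "fits on every dimension" and "largest fraction" rules carry over unchanged.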
    29. Other features
        • Application SLAs
        • Node labels
        • Node affinity/anti-affinity
        • Better online queue-management
    30. YARN Ecosystem
        Beyond the core YARN project: Briefly
    31. Eco-system
        Applications Powered by YARN
          Apache Giraph – Graph Processing
          Apache Hama – BSP
          Apache Hadoop MapReduce – Batch
          Apache Tez – Batch/Interactive
          Apache S4 – Stream Processing
          Apache Samza – Stream Processing
          Apache Storm – Stream Processing
          Apache Spark – Iterative applications
          HOYA – HBase on YARN
        YARN Frameworks
          Apache Twill
          REEF by Microsoft
          Spring support for Hadoop 2
        There's an app for that... YARN App Marketplace!
    32. Apache TEZ
        • Moving beyond MR
        • A data processing framework that can execute a complex DAG of tasks.
        • "Apache Tez - A New Chapter in Hadoop Data Processing"
          – By Siddharth Seth: YARN & Tez Committer/PMC Member
          – Thursday, April 3 (4:20-5:00pm)
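To make "execute a complex DAG of tasks" concrete, here is a minimal sketch of dependency-ordered execution using Python's standard `graphlib` module (Python 3.9+). It does not use or imitate the Tez API; the vertex names and the `run_dag` helper are hypothetical. Each "wave" holds the tasks whose inputs are all complete, so tasks within a wave could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical job: each vertex maps to the set of vertices it depends on.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
}


def run_dag(graph):
    """Return tasks grouped into waves that respect every dependency."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are done
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished
    return waves


waves = run_dag(dag)
print(waves)
```

A chain of MapReduce jobs would express the same work as separate jobs with intermediate output written between them; describing it as one graph is what lets a framework plan the whole pipeline at once.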
    33. Recap
    34. Recap
        • Apache Hadoop 2 is, at least, twice as good!
        • Exciting journey with Hadoop for this decade…
          – Hadoop is no longer a one-trick pony, err elephant
          – Beyond just HDFS & MapReduce
        • Architecture for the future
          – Centralized data
          – Exciting spectrum of application types, workloads and use cases
    35. Couple more things..
    36. The Book is out!
    37. (no text)
    38. Thank you!
        Download Sandbox: Experience Apache Hadoop
        Both 2.x and 1.x Versions Available!
        http://hortonworks.com/products/hortonworks-sandbox/
        Questions Time!