Apache Hadoop YARN: Present and Future

  • A list of some of the applications that already support YARN in some form. Graph processing: Giraph and Hama. Stream processing: Samza, Storm, S4, Spark and DataTorrent. MapReduce. Tez: fast query execution. Weave and REEF: frameworks on top of YARN that make writing applications easier. There are also some GitHub projects, such as caching systems and on-demand web-server spin-up.

Apache Hadoop YARN: Present and Future Presentation Transcript

  • Apache Hadoop YARN: Present and Future Vinod Kumar Vavilapalli Hortonworks
  • Apache Hadoop YARN: Present and Future. Vinod Kumar Vavilapalli, vinodkv [at] apache.org, @tshooter. © Hortonworks Inc. 2014
  • A quick show of hands..
      • Hadoop 2
      [Image: real-life Hadoop logo. Slide footer: Architecting the Future of Big Data]
  • Who am I?
      • 6.75 Hadoop-years old
      • Last thing at school: a two-node Tomcat cluster. Three months later, first thing on the job, brought down an 800-node cluster ;)
      • Previously @Yahoo!
      • Now @Hortonworks
      • Two hats
          – Hortonworks: Hadoop MapReduce and YARN development lead
          – Apache: Apache Hadoop YARN lead, Apache Hadoop PMC, Apache Member
      • Worked/working on
          – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
          – Apache Ambari: kickstarted the project and its first release
          – Stinger: high-performance data processing with Hadoop/Hive
      • Lots of troubleshooting on clusters
      • 99%+ of code in Apache, Hadoop
  • Agenda
      • Apache Hadoop 2: Overview
      • Past
      • Present
      • Future
  • Apache Hadoop 2: Next Generation Architecture
  • What is YARN?
      • Resource Management Platform
          – MapReduce v2
          – Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
          – Did I mention services like HBase and Accumulo on YARN with HoYA/Slider?
      • How is it different from Hadoop 1? ..
  • Hadoop 1 vs Hadoop 2
      • HADOOP 1.0: Single Use System (Batch Apps)
          – MapReduce (cluster resource management & data processing)
          – HDFS (redundant, reliable storage)
      • HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
          – MapReduce (data processing), Others
          – YARN (cluster resource management)
          – HDFS2 (redundant, highly-available & reliable storage)
  • Key Benefits of YARN
      • Scale
      • New Programming Models & Services
      • Improved cluster utilization
      • Agility
      • To infinity and beyond ..
  • Why Migrate?
      • 2.0 >= 2 * 1.0
          – HDFS: Lots of ground-breaking features
          – YARN: Next generation architecture
      • Return on Investment: 2x throughput on same hardware!
      • Ready for improvements in hardware
      • Not convinced? Let’s see what others are saying!
  • Yahoo!
      • Leader/Visionary on all things Hadoop!
      • On YARN (0.23.x)
      • Moving fast to 2.x
      http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
  • Twitter
  • eBay
      • Has one of the largest Hadoop clusters in the industry, with many petabytes of data
      • Migrated production clusters to Hadoop-2
      • Go to Mayank’s talk, “Hadoop-2 @ ebay”!
          – Thursday, April 3
          – Track: Deployment and Operations
      • Should be convinced by now... No?
  • YARN: the Data Operating System
  • Present
  • Apache Hadoop releases: Apache Hadoop 2.2
      • 15 October, 2013
      • The 1st GA release of Apache Hadoop 2.x
      • YARN
          – First stable and supported release of YARN
          – Binary compatibility for MapReduce applications built on hadoop-1.x
          – YARN level APIs solidified for the future
          – Performance
          – Scale!
      • HDFS
          – High Availability for HDFS
          – HDFS Federation
          – HDFS Snapshots
          – NFSv3 access to data in HDFS
      • Support for running Hadoop on Microsoft Windows
      • Substantial amount of integration testing with the rest of the projects in the ecosystem
  • Apache Hadoop releases (contd): Apache Hadoop 2.3
      • 24 February, 2014
      • First post-GA release for the year 2014
      • Alpha features in YARN
          – ResourceManager HA
          – Application History
          – Will cover in the 2.4 content
      • HDFS
          – Details follow..
      • A number of bug-fixes and enhancements
  • HDFS: Heterogeneous Storage
  • HDFS: DataNode caching
  • Apache Hadoop releases (contd): Apache Hadoop 2.4
      • Very soon!
      • YARN
          – Details follow..
          – ResourceManager restart and fail-over for high availability
          – Preemption
          – Application History and timeline
      • HDFS
          – FileSystem ACLs
          – Rolling upgrades
  • ResourceManager Restart and fail-over [Diagram label: ZooKeeper]
  • Capacity Scheduler Preemption
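The slide names Capacity Scheduler preemption without detail. As a rough illustration only, the toy model below assumes each queue has a guaranteed percentage of the cluster, and reclaims capacity from queues running above their guarantee when queues below their guarantee have pending demand. The function name `preemption_plan`, the tuple layout and the policy itself are hypothetical; none of them is taken from the talk or from YARN's scheduler code.

```python
def preemption_plan(total, queues):
    """Return {queue: units to reclaim} for queues above their guarantee.

    Toy model (an assumption, not YARN's algorithm).
    `queues` maps name -> (guaranteed_percent, used_units, pending_units).
    """
    guaranteed = {q: pct * total // 100 for q, (pct, _, _) in queues.items()}

    # Demand from queues running below their guarantee, capped at the guarantee.
    needed = sum(
        min(pending, max(0, guaranteed[q] - used))
        for q, (_, used, pending) in queues.items()
    )

    # Idle capacity is handed out first; only the shortfall is preempted.
    free = total - sum(used for _, used, _ in queues.values())
    to_reclaim = max(0, needed - free)

    plan = {}
    # Take from the most over-capacity queues first, never below their guarantee.
    by_surplus = sorted(
        queues, key=lambda q: queues[q][1] - guaranteed[q], reverse=True
    )
    for q in by_surplus:
        surplus = queues[q][1] - guaranteed[q]
        if to_reclaim <= 0 or surplus <= 0:
            break
        take = min(surplus, to_reclaim)
        plan[q] = take
        to_reclaim -= take
    return plan
```

For example, on a 100-unit cluster where a hypothetical `dev` queue (40% guarantee) uses 80 units and a `prod` queue (60% guarantee) uses 20 with 50 pending, the sketch reclaims 40 units from `dev`, which brings `dev` back to its guarantee and no lower.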
  • Application History and Timeline
      • Few MR specific implementations: History and web-UI
      • Not just MR anymore!
      • History
          – MapReduce specific Job History Server
          – Beyond ResourceManager Restart
      • Timeline
          – Framework specific event collection and UIs
      • Run analytics on historical apps!
  • Future
  • Future: Operational enhancements
      • Rolling upgrades
          – No/minimal impact to users
          – Ideal: Always rolling!
      • HDFS in
      • YARN
  • Future: Enabling more apps
      • Beyond MR
      • Discussing next
          – Long running services
          – Isolation
          – Multi-dimensional resource scheduling
  • Future: Long running services
      • You can run them already!
      • Few enhancements needed
          – Logs
          – Security
          – Management/monitoring
      • Resource sharing across workload types
      • Project Slider
  • Fine-grain isolation for multi-tenancy
      • Custom memory-monitoring
      • Cgroups
      • Linux Containers
      • VMs
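To make "custom memory-monitoring" concrete, here is a minimal sketch of the idea: periodically sample each container's memory and flag the ones that exceed their limit. The `ContainerSample` type, the `find_violators` function and the `vmem_to_pmem_ratio` parameter are illustrative assumptions; the slide does not describe how the monitoring is implemented.

```python
from dataclasses import dataclass


@dataclass
class ContainerSample:
    """One memory reading for a container (hypothetical structure)."""
    container_id: str
    pmem_limit_mb: int  # physical memory the container was granted
    pmem_used_mb: int   # physical memory currently in use
    vmem_used_mb: int   # virtual memory currently in use


def find_violators(samples, vmem_to_pmem_ratio=2.0):
    """Return (container_id, kind) for containers over a memory limit.

    The virtual-memory limit is assumed to be the physical limit
    multiplied by `vmem_to_pmem_ratio`.
    """
    violators = []
    for s in samples:
        vmem_limit_mb = s.pmem_limit_mb * vmem_to_pmem_ratio
        if s.pmem_used_mb > s.pmem_limit_mb:
            violators.append((s.container_id, "physical"))
        elif s.vmem_used_mb > vmem_limit_mb:
            violators.append((s.container_id, "virtual"))
    return violators
```

A polling monitor like this can only react after a limit is crossed, which is one reason the slide goes on to list kernel-level mechanisms such as cgroups.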
  • Multi-resource scheduling
      • Today
          – Memory & CPU
          – Physical memory / virtual memory
          – CPU cores
          – Virtual cores
      • CPU stuff: More bake in
      • Disks
          – Space
          – IOPS
      • Network
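Scheduling on more than one resource raises the question of how to compare requests that differ in both memory and CPU. One common approach, shown here purely as an assumption for illustration, is to rank applications by their dominant share: the larger of their memory share and their vcore share. The `Resource` type and the function names are hypothetical and are not YARN APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector: memory in MB and virtual cores."""
    memory_mb: int
    vcores: int


def dominant_share(used: Resource, cluster: Resource) -> float:
    """Return the larger of the memory share and the vcore share."""
    return max(used.memory_mb / cluster.memory_mb, used.vcores / cluster.vcores)


def fits(request: Resource, available: Resource) -> bool:
    """A request fits only if every dimension fits."""
    return (
        request.memory_mb <= available.memory_mb
        and request.vcores <= available.vcores
    )


def pick_next(usage: dict[str, Resource], cluster: Resource) -> str:
    """Pick the application with the smallest dominant share (ties by name)."""
    return min(sorted(usage), key=lambda app: dominant_share(usage[app], cluster))
```

Adding disks or network, as the slide proposes, would mean adding fields to the vector and terms to the `max` and to `fits`; the comparison logic stays the same.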
  • Other features
      • Application SLAs
      • Node labels
      • Node affinity/anti-affinity
      • Better online queue-management
  • YARN Ecosystem. Beyond the core YARN project: briefly
  • Eco-system: There's an app for that... YARN App Marketplace!
      • Applications Powered by YARN
          – Apache Giraph: Graph Processing
          – Apache Hama: BSP
          – Apache Hadoop MapReduce: Batch
          – Apache Tez: Batch/Interactive
          – Apache S4: Stream Processing
          – Apache Samza: Stream Processing
          – Apache Storm: Stream Processing
          – Apache Spark: Iterative applications
          – HOYA: HBase on YARN
      • YARN Frameworks
          – Apache Twill
          – REEF by Microsoft
          – Spring support for Hadoop 2
  • Apache TEZ
      • Moving beyond MR
      • A data processing framework that can execute a complex DAG of tasks.
      • “Apache Tez - A New Chapter in Hadoop Data Processing”
          – By Siddharth Seth: YARN & Tez Committer/PMC Member
          – Thursday, April 3 (4:20-5:00pm)
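Executing "a complex DAG of tasks" means running each task only after everything it depends on has finished. The sketch below shows that idea in plain Python using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not the Tez API; `run_dag` and the task names in the usage note are made up for illustration.

```python
from graphlib import TopologicalSorter


def run_dag(dependencies, tasks):
    """Run each task after all of its upstream tasks.

    `dependencies` maps a task name to the set of task names it depends on.
    `tasks` maps a task name to a callable that takes the list of upstream
    results (ordered by upstream task name) and returns this task's result.
    Returns (execution_order, results).
    """
    order = []
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        upstream = [results[dep] for dep in sorted(dependencies.get(name, ()))]
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results
```

With hypothetical tasks `scan_a` and `scan_b` feeding a `join` that feeds an `aggregate`, `run_dag({"join": {"scan_a", "scan_b"}, "aggregate": {"join"}}, tasks)` runs both scans, then the join, then the aggregate. A real engine would also run independent tasks such as the two scans in parallel, which this sequential sketch does not attempt.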
  • Recap
  • Recap
      • Apache Hadoop 2 is, at least, twice as good!
      • Exciting journey with Hadoop for this decade…
          – Hadoop is no longer a one-trick pony, err elephant
          – Beyond just HDFS & MapReduce
      • Architecture for the future
          – Centralized data
          – Exciting spectrum of application types, workloads and use cases
  • Couple more things..
  • The Book is out!
  • Thank you!
      • Download Sandbox: Experience Apache Hadoop. Both 2.x and 1.x versions available! http://hortonworks.com/products/hortonworks-sandbox/
      • Questions Time!