Apache Hadoop YARN:
State of the union
Wangda Tan, Billie Rinaldi
@ Hortonworks
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Speaker intro
• Wangda Tan: Apache Hadoop PMC member, mostly focused on GPU / deep learning on YARN; has worked on scheduler features such as node labels and preemption.
• Billie Rinaldi: Apache Hadoop committer and PMC member of various other top-level Apache projects and incubating projects; currently focusing on long-running services and Docker containers on Apache Hadoop YARN.
Agenda
• Introduction
• Past
• State of the Union
Multi-colored YARN
• Multi-colored YARN
– Apps
– Long-running services
• It’s all about data!
• Layers that enable applications and higher-order frameworks that interact with data
https://www.flickr.com/photos/happyskrappy/15699919424
Categories of recent initiatives
• Containerization
• Containers
• GPUs / FPGAs
• More powerful scheduling
• Much faster scheduling
• Scale
• SLAs
• Usability
• Service workloads
Hadoop Compute Platform – Today and Tomorrow
[Diagram] Platform Services: Storage; Resource Management; Service Discovery; Cluster Mgmt; Monitoring; Alerts; Security; Governance
[Diagram] Running on the platform: IOT Assembly; Kafka; Storm; HBase; Solr; MR; Tez; Spark; Hive / Pig; LLAP; Flink; REEF
Past: A quick history
A brief Timeline: Pre GA
• Sub-project of Apache Hadoop
• Alphas and betas
– In production at several large sites for MapReduce already by that time
Timeline markers: June-July 2010, August 2011, May 2012, August 2013
A brief Timeline: GA Releases 1/3
• 2.2 (15 October 2013): 1st GA; MR binary compatibility; YARN API cleanup; Testing!
• 2.3 (24 February 2014): 1st post-GA release; bug fixes; alpha features
• 2.4 (07 April 2014): RM fail-over; CS preemption; Timeline Service V1
• 2.5 (11 August 2014): Writable REST APIs; Timeline Service V1 security
• 2.6 (18 November 2014): Rolling upgrades; Docker; Node labels
• 2.7 (21 April 2015): Moving to JDK 7+; Pluggable YARN authentication
Most essential requirements for enterprise usage
A brief Timeline: GA Releases 2/3
Releases: 2.7.2 (25 January 2016), 2.7.3 (25 August 2016), 2.6.5 (18 October 2016), 2.7.4 (04 August 2017)
Releases: 2.8.0 (22 March 2017), 2.8.1 (08 June 2017), 2.8.2 (03 Oct 2017), 2.8.3 (12 Dec 2017)
Highlights: Application Priority; Reservations; Node labels improvements
Enterprise consumption; stabilization needed
A brief Timeline: GA Releases 3/3
Releases: 3.0.0-alpha1-4 (Sep 16 – Aug 17), 3.0.0-beta1 (03 Oct 2017), 3.0.0-GA (13 Dec 2017), 2.9.0 (17 Nov 2017), 3.0.1 (25 March 2018), 3.1.0 (06 April 2018)
Features highlighted:
• GPU/FPGA; Native Service; Placement Constraints
• YARN Federation; Opportunistic Container
• Resource types; New YARN UI; Timeline Service V2
More requirements arrive (computation-intensive, larger, services)
Apache Hadoop 2.8/2.9
Application priorities – YARN-1963
• Allocate resources to important apps first.
• Applies within a leaf queue.
FIFO policy: App 1, App 2, App 3, App 4
FIFO policy with priorities: App 1, App 2, App 3, App 4 (ordered from higher priority to lower priority)
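The ordering idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the CapacityScheduler implementation: the `App` class, the field names, and the sample priority values are hypothetical, and it assumes that a larger integer means a more important application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    submit_order: int  # lower value = submitted earlier
    priority: int = 0  # assumption: larger value = more important


def fifo_order(apps):
    """Plain FIFO: earliest submission first."""
    return sorted(apps, key=lambda a: a.submit_order)


def fifo_with_priority_order(apps):
    """Highest priority first; FIFO breaks ties inside one priority level."""
    return sorted(apps, key=lambda a: (-a.priority, a.submit_order))


# Hypothetical apps in one leaf queue.
APPS = [
    App("App 1", 1, priority=0),
    App("App 2", 2, priority=5),
    App("App 3", 3, priority=0),
    App("App 4", 4, priority=5),
]

if __name__ == "__main__":
    print([a.name for a in fifo_order(APPS)])
    print([a.name for a in fifo_with_priority_order(APPS)])
```

With these sample values, the two higher-priority apps move ahead of the others while keeping their relative submission order.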
Queue priorities – YARN-5864
• For interactive / SLA-sensitive workloads
• Today
– Give resources to the least satisfied queue first
• With priorities
– Give resources to the highest-priority queue first (for important workloads).
Example queue hierarchy under root:
– Queue A: 20% configured capacity, 5% used; usage = 5/20 = 25%
– Queue B: 80% configured capacity, 8% used; usage = 8/80 = 10%
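A small sketch of the two orderings, using the capacities from the slide. The priority values assigned to queues A and B are hypothetical (the slide does not give any), and the function names are made up for illustration; the real scheduler logic is more involved.

```python
# Capacities are percentages of the cluster, taken from the slide.
# The "priority" values are an assumption added for illustration.
QUEUES = {
    "A": {"configured": 20, "used": 5, "priority": 1},
    "B": {"configured": 80, "used": 8, "priority": 0},
}


def usage(queue):
    """Fraction of the queue's configured capacity that is in use."""
    return queue["used"] / queue["configured"]


def least_satisfied_first(queues):
    """Order used today: the queue with the lowest relative usage goes first."""
    return sorted(queues, key=lambda name: usage(queues[name]))


def highest_priority_first(queues):
    """Order with priorities: priority wins, relative usage breaks ties."""
    return sorted(queues, key=lambda name: (-queues[name]["priority"], usage(queues[name])))


if __name__ == "__main__":
    print(least_satisfied_first(QUEUES))
    print(highest_priority_first(QUEUES))
```

Under the least-satisfied rule, queue B (10% usage) is served before queue A (25% usage); giving A a higher priority reverses that order.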
Reservations – YARN-1051
• “Run my workload tomorrow at 6AM”
• Persistence of the plans with RM failover: YARN-2573
Reservation-based Scheduling: If You’re Late Don’t Blame Us! - Carlo, et al. 2015
Apache Hadoop 3.0/3.1
Looking at the Scale!
• Many sites run clusters made up of a large number of nodes
– Yahoo!, Twitter, LinkedIn, Microsoft, Alibaba, etc.
• Previously, the largest clusters had
– 6K-8K nodes
• Now: 40K nodes (federated), 20K nodes (single cluster).
• Roadmap: to 100K nodes and beyond
Moving towards Global & Fast Scheduling
• Problems
– The current one-node-at-a-time allocation cycle can lead to suboptimal decisions.
– Several coarse-grained locks
• Current efforts have improved the scheduler to
– Look at several nodes at a time
– Use fine-grained locks
– Use multiple allocator threads
– Allocate 3K+ containers per second ≈ 10 million allocations per hour!
– Deliver 10X throughput gains with recently added enhancements
– Make much better placement decisions
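The per-hour figure follows from the per-second rate by simple arithmetic. The snippet below is only a back-of-the-envelope check; it treats "3k+" as exactly 3,000, which is an assumption, and the names are illustrative.

```python
# Back-of-the-envelope check of the throughput figure quoted on the slide.
CONTAINERS_PER_SECOND = 3_000  # assumption: "3k+" taken as exactly 3,000
SECONDS_PER_HOUR = 60 * 60


def allocations_per_hour(per_second):
    """Scale a per-second allocation rate to a per-hour rate."""
    return per_second * SECONDS_PER_HOUR


if __name__ == "__main__":
    print(f"{allocations_per_hour(CONTAINERS_PER_SECOND):,} allocations per hour")
```

At 3,000 containers per second this gives 10.8 million allocations per hour, which the slide rounds to about 10 million.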
Resource profiles and custom resource types
• Past
– Supported only memory and CPU
• Now
– A generalized resource vector
– Custom resource types!
• Easier resource requests using profiles
NodeManager resources (diagram): Memory, CPU, GPU, FPGA
Profile   Memory   CPU        GPU
Small     2 GB     4 Cores    0 Cores
Medium    4 GB     8 Cores    0 Cores
Large     16 GB    16 Cores   4 Cores
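One way to picture the "generalized vector" is as a mapping from resource name to amount, with a profile being a named vector. The sketch below uses the three profiles from the table; the dictionary keys, function names, and the sample node size are hypothetical and do not reflect the actual YARN configuration format.

```python
# Profiles from the slide, expressed as resource vectors (name -> amount).
PROFILES = {
    "small": {"memory_gb": 2, "cpu": 4, "gpu": 0},
    "medium": {"memory_gb": 4, "cpu": 8, "gpu": 0},
    "large": {"memory_gb": 16, "cpu": 16, "gpu": 4},
}


def fits(request, available):
    """A request fits if every resource type it names is available in full."""
    return all(available.get(name, 0) >= amount for name, amount in request.items())


def allocate(request, available):
    """Return what remains on the node after granting the request."""
    if not fits(request, available):
        raise ValueError("request does not fit on this node")
    remaining = dict(available)
    for name, amount in request.items():
        remaining[name] = remaining.get(name, 0) - amount
    return remaining


# Hypothetical node capacity, chosen only for the example.
NODE = {"memory_gb": 32, "cpu": 24, "gpu": 4}

if __name__ == "__main__":
    after_large = allocate(PROFILES["large"], NODE)
    print(after_large)
    print(fits(PROFILES["large"], after_large), fits(PROFILES["medium"], after_large))
```

Because the check iterates over whatever resource names the request contains, adding a new type such as FPGA needs no change to the matching logic, which is the point of the vector model.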
GPU support on YARN
• Why is isolation needed?
– Multiple processes sharing a single GPU will be:
• Serialized.
• Prone to OOM.
• GPU isolation on YARN:
– Granularity is per GPU device.
– Cgroups / Docker are used to enforce the isolation.
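Per-device granularity means a container is handed whole GPUs and no two containers share one. The bookkeeping side of that can be sketched as below; the class and method names are hypothetical, and the actual enforcement through Cgroups or Docker is not modeled here.

```python
class GpuDeviceAllocator:
    """Tracks which whole GPU devices on one node belong to which container."""

    def __init__(self, device_ids):
        self._free = list(device_ids)
        self._assigned = {}  # container id -> list of device ids

    def allocate(self, container_id, count):
        """Assign `count` whole devices, or return None if too few are free."""
        if count > len(self._free):
            return None
        devices = [self._free.pop(0) for _ in range(count)]
        self._assigned[container_id] = devices
        return devices

    def release(self, container_id):
        """Return a finished container's devices to the free pool."""
        self._free.extend(self._assigned.pop(container_id, []))


if __name__ == "__main__":
    gpus = GpuDeviceAllocator([0, 1, 2, 3])
    print(gpus.allocate("container_1", 2))
    print(gpus.allocate("container_2", 2))
    print(gpus.allocate("container_3", 1))  # no free device left
```

Since a device is never assigned to two containers at once, the serialization and out-of-memory problems from sharing a GPU do not arise between containers.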
FPGA on YARN!
• FPGA isolation on YARN:
– Granularity is per FPGA device.
– Cgroups are used to enforce the isolation.
• Currently, only the Intel OpenCL SDK for FPGA is supported, but the implementation is extensible to other FPGA SDKs.
Better placement strategies (YARN-6592)
• Affinity (diagram: HBase, Storm)
• Anti-affinity (diagram: HBase RegionServer, HBase RegionServer)
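The two constraint kinds can be illustrated with a toy node filter. This is a sketch only: the tag names, node names, and tuple-based constraint format are hypothetical and are not the YARN placement constraint API.

```python
def satisfies(node_tags, constraint):
    """Check one (kind, tag) constraint against the tags already on a node."""
    kind, tag = constraint
    present = tag in node_tags
    if kind == "affinity":
        return present      # place next to containers carrying this tag
    if kind == "anti-affinity":
        return not present  # avoid nodes that already carry this tag
    raise ValueError(f"unknown constraint kind: {kind}")


def candidate_nodes(nodes, constraint):
    """Names of nodes on which a new container may be placed."""
    return [name for name, tags in nodes.items() if satisfies(tags, constraint)]


# Hypothetical cluster state: node name -> tags of containers running there.
NODES = {
    "node1": {"hbase-regionserver"},
    "node2": {"storm"},
    "node3": set(),
}

if __name__ == "__main__":
    print(candidate_nodes(NODES, ("anti-affinity", "hbase-regionserver")))
    print(candidate_nodes(NODES, ("affinity", "storm")))
```

Anti-affinity keeps a second RegionServer off a node that already runs one, while affinity steers a container toward a node that runs a related service.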
YARN Federation!
• Enables applications to scale to clusters of up to 100K nodes
• Federation divides a large (10-100K nodes) cluster into smaller units called sub-clusters
• Federation negotiates with the sub-clusters' RMs and provides resources to the application
• Applications can schedule tasks on any node
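As a rough picture of how one application's request might be spread over sub-clusters, the sketch below splits a container count greedily by free capacity. The greedy rule, the function name, and the sub-cluster names are assumptions made for illustration; the slides do not describe the actual routing policy.

```python
def split_request(num_containers, free_by_sub_cluster):
    """Greedily spread a container request over sub-clusters.

    Returns (plan, unmet): plan maps sub-cluster name to containers taken
    there, and unmet is what could not be placed anywhere.
    """
    plan = {}
    remaining = num_containers
    # Visit the sub-clusters with the most free capacity first.
    for name, free in sorted(free_by_sub_cluster.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            plan[name] = take
            remaining -= take
    return plan, remaining


if __name__ == "__main__":
    free = {"sub-cluster-1": 100, "sub-cluster-2": 30, "sub-cluster-3": 80}
    print(split_request(150, free))
    print(split_request(300, free))
```

From the application's point of view the result is a single pool of containers, even though several ResourceManagers supplied them.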
Packaging
• Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t need to be
• Native integration ++ in YARN
– Support for “Container Runtimes” in LCE: YARN-3611
– Process runtime
– Docker runtime
Services support
• Application & Services upgrades
– “Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
• Simplified discovery of services via DNS mechanisms: YARN-4757
– regionserver-0.hbase-app-3.hadoop.yarn.site
• Placement policies
• Container restart
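The example host name on this slide appears to be built from a component instance, a service name, a user, and a domain. The helper below reproduces it under that reading; the function name and the interpretation of each label are assumptions, not something the slide states.

```python
def service_dns_name(component, instance, service, user, domain):
    """Build a host name of the form <component>-<n>.<service>.<user>.<domain>.

    The meaning given to each label is an assumption based on the slide's example.
    """
    return f"{component}-{instance}.{service}.{user}.{domain}"


if __name__ == "__main__":
    print(service_dns_name("regionserver", 0, "hbase-app-3", "hadoop", "yarn.site"))
```

A client that knows the service name can then reach a given instance through ordinary DNS resolution instead of asking YARN where the container landed.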
Simplified APIs for service definitions
• Applications need simple APIs
• Need to be deployable “easily”
• Simple REST API layer fronting YARN
– YARN-4793 Simplified API layer for services and beyond
• Spawn services & Manage them
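To give a feel for what a service definition submitted to such a REST layer could look like, the sketch below builds a JSON document in Python. The field names (`name`, `version`, `components`, `number_of_containers`, `launch_command`, `resource`) follow my recollection of the YARN services API and should be checked against the documentation for the Hadoop release in use; the service itself is a made-up example and nothing is sent over the network.

```python
import json


def build_service_spec(name, component, containers, command, cpus, memory_mb):
    """Assemble a minimal service definition as a plain dictionary.

    Field names are an assumption; verify them against the YARN services
    API documentation before relying on them.
    """
    return {
        "name": name,
        "version": "1.0.0",
        "components": [
            {
                "name": component,
                "number_of_containers": containers,
                "launch_command": command,
                "resource": {"cpus": cpus, "memory": str(memory_mb)},
            }
        ],
    }


# Hypothetical service: two containers that simply sleep.
SPEC = build_service_spec("sleeper-service", "sleeper", 2, "sleep 900000", 1, 256)

if __name__ == "__main__":
    print(json.dumps(SPEC, indent=2))
```

The appeal of this style is that a whole service, including instance count and resources per instance, is described declaratively in one document rather than through a custom ApplicationMaster.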
Services Framework
• Platform is only as good as the tools
• A native YARN services framework
– YARN-4692: [Umbrella] Native YARN framework layer for services and beyond
• Assembly: Supporting a DAG of apps:
– SLIDER-875
User experience
• API-based queue management, decentralized (YARN-5734)
• Improved logs management (YARN-4904)
• Live application logs
User experience
New web UI
Timeline Service
• Application History
– “Where did my containers run?”
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application? Succeeded?”
• Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
• Collect and use past data
– To schedule “my application” better
– To do better capacity planning
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop Clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
Apache Hadoop 3.2 and beyond
Node Attributes (YARN-3409)
• Node Partition vs. Node Attribute
• Partition:
• One partition for one node
• ACL
• Shares between queues
• Preemption enforced.
• Attribute:
• For container placement
• No ACL/Shares on attributes
• First-come-first-serve
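The difference between the two concepts can be sketched as a data model: each node carries exactly one partition but any number of attributes, and placement filters on both. The class, field names, and sample attribute keys below are hypothetical; ACLs, queue shares, and preemption for partitions are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    partition: str  # exactly one partition per node
    attributes: dict = field(default_factory=dict)  # any number of key/value pairs


def eligible_nodes(nodes, partition, required_attributes):
    """Nodes in the given partition whose attributes match every requirement."""
    return [
        node.name
        for node in nodes
        if node.partition == partition
        and all(node.attributes.get(k) == v for k, v in required_attributes.items())
    ]


# Hypothetical cluster used only for the example.
NODES = [
    Node("n1", "default", {"os": "centos7", "java": "1.8"}),
    Node("n2", "default", {"os": "ubuntu16", "java": "1.8"}),
    Node("n3", "gpu", {"os": "centos7", "java": "1.8"}),
]

if __name__ == "__main__":
    print(eligible_nodes(NODES, "default", {"os": "centos7"}))
```

Attributes only narrow down where a container may land, which is why they need no capacity sharing rules of their own.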
Container overcommit (YARN-1011)
• Each node has some allocated but unutilized capacity
• Use such capacity to run opportunistic tasks
• Preempt such tasks when needed
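The bookkeeping behind overcommit can be sketched as two small calculations: how much allocated capacity sits idle, and which opportunistic tasks to preempt when that capacity is needed back. The function names, the memory-only view, and the "most recently started first" victim order are assumptions for illustration, not the YARN-1011 design.

```python
def unutilized(allocated_mb, utilized_mb):
    """Capacity that is allocated to containers but not actually in use."""
    return max(allocated_mb - utilized_mb, 0)


def tasks_to_preempt(opportunistic_tasks, needed_mb):
    """Pick opportunistic tasks to stop until `needed_mb` has been freed.

    `opportunistic_tasks` is a list of (task_id, memory_mb) in start order;
    the most recently started tasks are chosen first (an assumption).
    """
    victims = []
    freed = 0
    for task_id, memory_mb in reversed(opportunistic_tasks):
        if freed >= needed_mb:
            break
        victims.append(task_id)
        freed += memory_mb
    return victims


if __name__ == "__main__":
    print(unutilized(8192, 3072))
    print(tasks_to_preempt([("o1", 1024), ("o2", 2048), ("o3", 1024)], 2000))
```

Guaranteed containers keep their full allocation on paper; opportunistic tasks only borrow the idle part and give it back through preemption.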
Auto-spawning of system services (YARN-8048)
• System services are services required by YARN that need to be started during bootstrap.
• For example, YARN ATSv2 needs HBase, so HBase is a system service of YARN.
• Only an admin can configure them
• Started along with the ResourceManager
• Place spec files under the yarn.service.system-service.dir FS path
Lessons learned running a container cloud on YARN
https://dataworkssummit.com/berlin-2018/session/lessons-learned-running-a-container-cloud-on-yarn/
4PM, Room I, Wed April 18th
-- Related Session --
Billie Rinaldi
Deep learning on YARN: running distributed Tensorflow, etc. on Hadoop clusters
https://dataworkssummit.com/berlin-2018/session/deep-learning-on-yarn-running-distributed-tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters/
2PM, Room II, Wed April 18th
-- Related Session --
Wangda Tan
BoF’s: Apache Hadoop – YARN, HDFS
https://dataworkssummit.com/berlin-2018/bofs/#apache-hadoop-8211-yarn-hdfs
Thursday April 19th
-- Related Session --

Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 

Dataworks Berlin Summit '18 - Apache Hadoop YARN State Of The Union

  • 1. Apache Hadoop YARN: State of the union Wangda Tan, Billie Rinaldi @ Hortonworks
  • 2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Speaker intro • Wangda Tan: Apache Hadoop PMC member, mostly focused on GPU/deep learning on YARN; worked on scheduler features such as node labels and preemption. • Billie Rinaldi: Apache Hadoop committer, PMC member of various other top-level Apache projects and incubating projects, currently focusing on long-running services and Docker containers on Apache Hadoop YARN
  • 3. Agenda • Introduction • Past • State of the Union
  • 4. Multi-colored YARN • Multi-colored YARN – Apps – Long-running services • It’s all about data! • Layers that enable applications and higher-order frameworks that interact with data https://www.flickr.com/photos/happyskrappy/15699919424
  • 5. Categories of recent initiatives: Containerization, Containers, GPUs / FPGAs, More powerful scheduling, Much faster scheduling, Scale, SLAs, Usability, Service workloads
  • 6. Hadoop Compute Platform – Today and Tomorrow Platform Services Storage Resource Management Service Discovery Cluster Mgmt Monitoring Alerts IOT Assembly Kafka Storm HBase Solr Security Governance MR Tez Spark Hive / Pig LLAP Flink REEF
  • 7. Past: A quick history
  • 8. A brief Timeline: Pre GA • Sub-project of Apache Hadoop • Alphas and betas – In production at several large sites for MapReduce already by that time. Timeline markers: June-July 2010, August 2011, May 2012, August 2013
  • 9. A brief Timeline: GA Releases 1/3 • 2.2 (15 October 2013): 1st GA, MR binary compatibility, YARN API cleanup, Testing! • 2.3 (24 February 2014): 1st Post GA, Bug fixes, Alpha features • 2.4 (07 April 2014): RM Fail-over, CS Preemption, Timeline Service V1 • 2.5 (11 August 2014): Writable REST APIs, Timeline Service V1 security • 2.6 (18 November 2014): Rolling Upgrades, Docker, Node labels • 2.7 (21 Apr 2015): Moving to JDK 7+, Pluggable YARN authentication. Theme: Most Essential Requirements for enterprise usage
  • 10. A brief Timeline: GA Releases 2/3 • Releases: 2.7.2 (25 January 2016), 2.7.3 (25 August 2016), 2.6.5 (18 October 2016), 2.7.4 (04 August 2017), 2.8.0 (22 March 2017), 2.8.1 (08 June 2017), 2.8.2 (03 Oct 2017), 2.8.3 (12 Dec 2017) • Features in this period: Application Priority, Reservations, Node labels improvements. Theme: Enterprise consumption, need for stabilization
  • 11. A brief Timeline: GA Releases 3/3 • Releases: 3.0.0-alpha1-4 (Sep 16 – Aug 17), 3.0.0-beta1 (03 Oct 2017), 2.9.0 (17 Nov 2017), 3.0.0-GA (13 Dec 2017), 3.0.1 (25 March 2018), 3.1.0 (06 April 2018) • Features: YARN Federation, Opportunistic Container, Resource types, New YARN UI, Timeline service V2 • Features: GPU/FPGA, Native Service, Placement Constraints. Theme: More requirements arrive (compute-intensive, larger clusters, services)
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 2.8/2.9
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Application priorities – YARN-1963 • Allocate resource to important apps first. • Within a leaf-queue FIFO Policy App 1 App 2 App 3 App 4 FIFO Policy With priorities App 1App 2App 3App 4 Higher priority  Lower priority
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Queue priorities – YARN-5864  For interactive / SLA sentitive workload  Today – Give to the least satisfied queue first  With priorities – Give to the highest priority queue first (for important workload). root A 20% Configured Capacity But 5% of used Capacity Usage = 5/20 = 25% B 80% Configured Capacity But 8% of used Capacity Usage 8/80 = 10%
  • 15. Reservations – YARN-1051 • “Run my workload tomorrow at 6AM” • Persistence of the plans with RM failover: YARN-2573 Reservation-based Scheduling: If You’re Late Don’t Blame Us! - Carlo, et al. 2015
  • 16. Apache Hadoop 3.0/3.1
  • 17. Looking at the Scale! • Many sites run clusters with a large number of nodes – Yahoo!, Twitter, LinkedIn, Microsoft, Alibaba etc. • Previously, largest clusters – 6K-8K nodes • Now: 40K nodes (federated), 20K nodes (single cluster). • Roadmap: To 100K and beyond
  • 18. Moving towards Global & Fast Scheduling • Problems – The current one-node-at-a-time allocation cycle can lead to suboptimal decisions. – Several coarse-grained locks • Current efforts have improved this: – Look at several nodes at a time – Fine-grained locks – Multiple allocator threads – YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour! – 10X throughput gains with recently added enhancements – Much better placement decisions
  • 19. Resource profiles and custom resource types • Past – Supported only Memory and CPU • Now – A generalized vector – Custom Resource Types! • Easier resource requests using profiles. NodeManager resources: Memory, CPU, GPU, FPGA. Profiles (Memory / CPU / GPU): Small: 2 GB / 4 Cores / 0 Cores; Medium: 4 GB / 8 Cores / 0 Cores; Large: 16 GB / 16 Cores / 4 Cores
  • 20. GPU support on YARN • Why is isolation needed? – Multiple processes sharing a single GPU will be: • Serialized. • Prone to OOM. • GPU isolation on YARN: – Granularity is per GPU device. – Use cgroups / Docker to enforce the isolation.
  • 21. FPGA on YARN! • FPGA isolation on YARN: – Granularity is per FPGA device. – Use cgroups to enforce the isolation. • Currently, only the Intel OpenCL SDK for FPGA is supported, but the implementation is extensible to other FPGA SDKs.
  • 22. Better placement strategies (YARN-6592) • Affinity • Anti-affinity (diagram labels: HBase, Storm, HBase RegionServer, HBase RegionServer)
  • 23. YARN Federation! • Enables applications to scale to hundreds of thousands of nodes • Federation divides a large (10-100k nodes) cluster into smaller units called sub-clusters • Federation negotiates with the sub-clusters’ RMs and provides resources to the application • Applications can schedule tasks on any node
  • 24. Packaging • Containers – Lightweight mechanism for packaging and resource isolation – Popularized and made accessible by Docker – Can replace VMs in some cases – Or more accurately, VMs got used in places where they didn’t need to be • Native integration ++ in YARN – Support for “Container Runtimes” in LCE: YARN-3611 – Process runtime – Docker runtime
  • 25. Services support • Application & Services upgrades – “Do an upgrade of my Spark / HBase apps with minimal impact to end-users” – YARN-4726 • Simplified discovery of services via DNS mechanisms: YARN-4757 – regionserver-0.hbase-app-3.hadoop.yarn.site • Placement policies • Container restart
  • 26. Simplified APIs for service definitions • Applications need simple APIs • Need to be deployable “easily” • Simple REST API layer fronting YARN – YARN-4793 Simplified API layer for services and beyond • Spawn services & manage them
  • 27. Services Framework • A platform is only as good as its tools • A native YARN services framework – YARN-4692 – [Umbrella] Native YARN framework layer for services and beyond • Assembly: Supporting a DAG of apps: – SLIDER-875
  • 28. User experience • API-based queue management, decentralized (YARN-5734) • Improved logs management (YARN-4904) • Live application logs
  • 29. User experience: New web UI
  • 30. User experience: New web UI
  • 31. Timeline Service • Application History – “Where did my containers run?” – “Why is my application slow?” – “Is it really slow?” – “Why is my application failing?” – “What happened with my application? Succeeded?” • Cluster History – Run analytics on historical apps! – “User with most resource utilization” – “Largest application run” – “Why is my cluster slow?” – “Why is my cluster down?” – “What happened in my clusters?” • Collect and use past data – To schedule “my application” better – To do better capacity planning
  • 32. Timeline Service 2.0 • Next generation – Today’s solution helped us understand the space – Limited scalability and availability • “Analyzing Hadoop Clusters is becoming a big-data problem” – Don’t want to throw away the Hadoop application metadata – Large scale – Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.” • Timeline data stored in HBase and accessible to queries
  • 33. Apache Hadoop 3.2 and beyond
  • 34. Node Attributes (YARN-3409) • Node Partition vs. Node Attribute • Partition: – One partition per node – ACL – Shares between queues – Preemption enforced • Attribute: – For container placement – No ACL/Shares on attributes – First-come, first-served
  • 35. Container overcommit (YARN-1011) • Each node has some allocated but unutilized capacity • Use such capacity to run opportunistic tasks • Preempt such tasks when needed
  • 36. Auto-spawning of system services (YARN-8048) • System services are services required by YARN that need to be started during bootstrap. • For example, YARN ATSv2 needs HBase, so HBase is a system service of YARN. • Only an admin can configure them • Started along with the ResourceManager • Place spec files under the yarn.service.system-service.dir FS path
  • 37. Related session: Lessons learned running a container cloud on YARN (Billie Rinaldi), 4PM, Room I, Wed April 18th https://dataworkssummit.com/berlin-2018/session/lessons-learned-running-a-container-cloud-on-yarn/
  • 38. Related session: Deep learning on YARN: running distributed Tensorflow, etc. on Hadoop clusters (Wangda Tan), 2PM, Room II, Wed April 18th https://dataworkssummit.com/berlin-2018/session/deep-learning-on-yarn-running-distributed-tensorflow-mxnet-caffe-xgboost-on-hadoop-clusters/
  • 39. Related session: BoF’s: Apache Hadoop – YARN, HDFS, Thursday April 19th https://dataworkssummit.com/berlin-2018/bofs/#apache-hadoop-8211-yarn-hdfs
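
The ordering rules on slides 13 and 14 can be illustrated with a small sketch. This is not the CapacityScheduler code; the class names, field names, and the "larger number means more important" convention are assumptions made for illustration. It orders apps within a leaf queue by priority and then submission order, and orders queues either by relative usage (used capacity divided by configured capacity, as in the 5/20 and 8/80 example) or by priority first.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    priority: int      # assumption: larger number = more important
    submit_order: int  # position in the FIFO submission sequence


@dataclass
class Queue:
    name: str
    configured_pct: float  # configured capacity, as % of the cluster
    used_pct: float        # used capacity, as % of the cluster
    priority: int = 0      # assumption: larger number = more important

    def relative_usage(self) -> float:
        # Slide 14: usage = used capacity / configured capacity.
        return self.used_pct / self.configured_pct


def fifo_with_priorities(apps):
    # Higher-priority apps first; ties keep FIFO submission order.
    return sorted(apps, key=lambda a: (-a.priority, a.submit_order))


def least_satisfied_first(queues):
    # "Today": the queue with the lowest relative usage is served first.
    return sorted(queues, key=lambda q: q.relative_usage())


def highest_priority_first(queues):
    # "With priorities": priority decides, relative usage breaks ties.
    return sorted(queues, key=lambda q: (-q.priority, q.relative_usage()))
```

With the slide's numbers, queue B (10% relative usage) is served before queue A (25%) under the least-satisfied rule, while giving A a higher priority moves it to the front.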

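
The "generalized vector" idea on slide 19 can be sketched as a dictionary of named resource amounts. The profile values below are copied from the slide's table; the key names, the `fits` and `allocate` helpers, and the example node are hypothetical and do not reflect YARN's actual configuration format or API.

```python
# Profiles from slide 19 (memory in GB, CPU in cores, GPU in devices).
PROFILES = {
    "small":  {"memory_gb": 2,  "cpu_cores": 4,  "gpu": 0},
    "medium": {"memory_gb": 4,  "cpu_cores": 8,  "gpu": 0},
    "large":  {"memory_gb": 16, "cpu_cores": 16, "gpu": 4},
}


def fits(request, available):
    # A request fits when every requested resource type is available in
    # at least the requested amount; unknown types count as zero.
    return all(available.get(name, 0) >= amount
               for name, amount in request.items())


def allocate(available, request):
    # Return the node's remaining resources after granting the request.
    if not fits(request, available):
        raise ValueError("insufficient resources for request")
    remaining = dict(available)
    for name, amount in request.items():
        remaining[name] = remaining.get(name, 0) - amount
    return remaining
```

Because each resource is just a named entry in the vector, adding a type such as FPGA needs no change to the comparison logic, which is the point of moving beyond a fixed memory and CPU pair.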
Editor's Notes

  1. For new people, 10%
  2. Application-centric
  3. Categories of recent initiatives
  4. Many users were requesting the most necessary features, which is why we released so fast. Many of these features are necessary to run a YARN cluster, such as RM fail-over / HA, etc.
  5. 2.6/2.7/2.8 are the versions that most production clusters are using, so much of the feature development stayed in the background and much of the community effort focused on stabilizing features.
  6. Again, the most essential YARN no longer meets users’ requirements, which is why we released 3 minor releases in 6 months, including features like YARN Federation (larger clusters), GPU / FPGA, etc.
  7. TF provides options to use less GPU memory than the whole device provides, but we cannot enforce this externally.
  8. High-level talk on ATSv2; it is a scalable solution compared to 1.5.