SlideShare a Scribd company logo
1 of 29
YARN High Availability
Karthik Kambatla – Cloudera Inc
Xuan Gong – Hortonworks Inc
Outline
• Background
– YARN architecture and need for HA
• RM HA architecture
– Persisting the state
– Active/ Standby pair and Fencing
– Failover and redirection
• Configuring HA
• Demo
6/17/2014 YARN High Availability, Hadoop Summit 2
YARN Architecture
6/17/2014 YARN High Availability, Hadoop Summit 3
Resource
Manager
Node Manager
Node Manager
Node Manager
App
Master
Container
Client
Client
Cluster State
Applications
State
Fault-tolerance
6/17/2014 YARN High Availability, Hadoop Summit 4
Resource
Manager
Node Manager
Node Manager
Node Manager
App
Master
Container
Client
Client
App
Master
ContainerCluster State
Applications
State
Naïve RM Restart
6/17/2014 YARN High Availability, Hadoop Summit 5
Resource
Manager
Node Manager
Node Manager
Client
Client
App
Master
Cluster State
Applications
State
ResourceManager is a YARN cluster’s
single point of failure.
6/17/2014 YARN High Availability, Hadoop Summit 6
Need stateful restart and multiple RMs.
Highly Available Resource Manager
a.k.a. HARMful YARN
• Currently shipped
– Beta in Apache Hadoop 2.3.0
– Stable in Apache Hadoop 2.4.0
– More stable in Apache Hadoop 2.4.1
6/17/2014 YARN High Availability, Hadoop Summit 7
Stateful RM Restart (Phase 1)
6/17/2014 YARN High Availability, Hadoop Summit 8
Node Manager
Node Manager
App
Master
Container
Client
Client
Resource
Manager
Cluster State
Applications
State
RM Store
App
Master
Container
RM Store Implementations
• Memory store – testing purposes
• Filesystem based store
– Any file system: local, HDFS or any other
• Zookeeper based store (ZKRMStateStore)
– Recommended (for fencing)
– Loading 10,000 applications takes about 8.5 secs.
6/17/2014 YARN High Availability, Hadoop Summit 9
Implications to Running applications
• In-flight work is lost.
• AMs are restarted.
• AMs could checkpoint completed work.
– MapReduce AM does.
– Consider a job with 100 map tasks
• If RM goes down after 90 map tasks finish.
• After restart, only the remaining 10 are run.
6/17/2014 YARN High Availability, Hadoop Summit 10
Stateful RM Restart (Phase 2)
• Under development (YARN-556)
– No loss of in-flight work
• Related work
– Work-preserving NodeManager restart (YARN-
1336)
– Work-preserving ApplicationMaster restart (YARN-
1489)
6/17/2014 YARN High Availability, Hadoop Summit 11
Multiple RMs
• Active / Standby architecture
– Potentially multiple standbys
– Warm standby
• Running
• Loads state and starts RPC servers on becoming Active
– Manual / automatic failover
– Clients and Web UI failover automatically
6/17/2014 YARN High Availability, Hadoop Summit 12
Active / Standby
6/17/2014 YARN High Availability, Hadoop Summit 13
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Manual Failover through CLI
6/17/2014 YARN High Availability, Hadoop Summit 14
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Client Failover
(ConfiguredFailoverProxyProvider)
6/17/2014 YARN High Availability, Hadoop Summit 15
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
App
Master
Automatic Failover
6/17/2014 YARN High Availability, Hadoop Summit 16
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Elector
Elector
ZK
Automatic Failover
6/17/2014 YARN High Availability, Hadoop Summit 17
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Elector
Elector
ZK
Automatic Failover
• Zookeeper based
– Uses ActiveStandbyElector for Active election
• No need for a FailoverController
– Can’t monitor RM process health and recover
6/17/2014 YARN High Availability, Hadoop Summit 18
Network Hiccup
6/17/2014 YARN High Availability, Hadoop Summit 19
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Elector
Elector
ZK
Multiple Actives?
6/17/2014 YARN High Availability, Hadoop Summit 20
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Active
Resource
Manager
Elector
Elector
ZK
Fencing
• The state store gets corrupted when multiple
RMs assume the Active role.
• Exclusive access to a single RM.
– ZKRMStateStore takes care of this.
– Shared “admin” access.
– Exclusive “create-delete” access on transition to
Active
6/17/2014 YARN High Availability, Hadoop Summit 21
Network Hiccup
6/17/2014 YARN High Availability, Hadoop Summit 22
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Elector
Elector
ZK
Active / Standby
6/17/2014 YARN High Availability, Hadoop Summit 23
Node Manager
Node Manager
App
Master
Client
Client
Active
Resource
Manager
RM Store
Standby
Resource
Manager
Elector
Elector
ZK
In-flight RPCs
• In-flight RPCs: Retry or not?
– E.g. Submit application – we clearly don’t want
two applications submitted.
• Depends on whether failover happens before, during,
or after the RM acts on the call.
• Solution
– Annotate APIs as Idempotent or AtMostOnce
6/17/2014 YARN High Availability, Hadoop Summit 24
Web UI
• Standby RM has no/stale information.
• Users don’t know which RM is Active.
• Redirect Web UI and REST calls to Active RM.
– Except a few pages that give information about
the RM.
6/17/2014 YARN High Availability, Hadoop Summit 25
Admin Refresh
• Admin refresh ($ yarn rmadmin –refresh):
– Refreshes that particular RM – Active/Standby
– Uses local configuration file
• FileSystemBasedConfigurationProvider
– Upload the configuration files to (potentially shared)
filesystem like HDFS.
6/17/2014 YARN High Availability, Hadoop Summit 26
Setting up HA
Config name Value
yarn.resourcemanager.ha.enabled true
yarn.resourcemanager.ha.rm-ids rm1,rm2
yarn.resourcemanager.hostname.rm1 <host1>
yarn.resourcemanager.hostname.rm2 <host2>
yarn.resourcemanager.recovery.enabled true
yarn.resourcemanager.store.class ZKRMStateStore1
yarn.resourcemanager.zk-address <zk-quorum>
yarn.resourcemanager.cluster-id <cluster-id>
6/17/2014 YARN High Availability, Hadoop Summit 27
1. org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
Demo!
6/17/2014 YARN High Availability, Hadoop Summit 28
Questions?
6/17/2014 YARN High Availability, Hadoop Summit 29

More Related Content

What's hot

Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnOmid Vahdaty
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignMichael Noll
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Amy W. Tang
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersAshraf Uddin
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engineWalter Liu
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Simplilearn
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsHan Zhou
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
TC Flower Offload
TC Flower OffloadTC Flower Offload
TC Flower OffloadNetronome
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStackKamesh Pemmaraju
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiManish Gupta
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 

What's hot (20)

Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Fluentd vs. Logstash for OpenStack Log Management
Fluentd vs. Logstash for OpenStack Log ManagementFluentd vs. Logstash for OpenStack Log Management
Fluentd vs. Logstash for OpenStack Log Management
 
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
Espresso: LinkedIn's Distributed Data Serving Platform (Talk)
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
 
Airflow - a data flow engine
Airflow - a data flow engineAirflow - a data flow engine
Airflow - a data flow engine
 
Spark overview
Spark overviewSpark overview
Spark overview
 
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginn...
 
Large scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutionsLarge scale overlay networks with ovn: problems and solutions
Large scale overlay networks with ovn: problems and solutions
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
TC Flower Offload
TC Flower OffloadTC Flower Offload
TC Flower Offload
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStack
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 

Viewers also liked

YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High AvailabilityCloudera, Inc.
 
Apache Accumulo and Cloudera
Apache Accumulo and ClouderaApache Accumulo and Cloudera
Apache Accumulo and ClouderaJoey Echeverria
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARNArinto Murdopo
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 
Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloCloudera, Inc.
 
Analyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitAnalyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitDataWorks Summit
 
AnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceAnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceJUNWEI GUAN
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
Unit testing Agile OpenSpace
Unit testing Agile OpenSpaceUnit testing Agile OpenSpace
Unit testing Agile OpenSpaceAndrei Savu
 
CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013Cloudera Japan
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutAmbarish Hazarnis
 
リクルート式Hadoopの使い方
リクルート式Hadoopの使い方リクルート式Hadoopの使い方
リクルート式Hadoopの使い方Recruit Technologies
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installationSumitra Pundlik
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIClouderaUserGroups
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformCloudera, Inc.
 

Viewers also liked (20)

YARN High Availability
YARN High AvailabilityYARN High Availability
YARN High Availability
 
Apache Accumulo and Cloudera
Apache Accumulo and ClouderaApache Accumulo and Cloudera
Apache Accumulo and Cloudera
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache Accumulo
 
Analyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and ProfitAnalyzing Historical Data of Applications on YARN for Fun and Profit
Analyzing Historical Data of Applications on YARN for Fun and Profit
 
AnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business IntelligenceAnalyzingMovieData and Business Intelligence
AnalyzingMovieData and Business Intelligence
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
Unit testing Agile OpenSpace
Unit testing Agile OpenSpaceUnit testing Agile OpenSpace
Unit testing Agile OpenSpace
 
CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013CDH5最新情報 #cwt2013
CDH5最新情報 #cwt2013
 
Recommendation Engine using Apache Mahout
Recommendation Engine using Apache MahoutRecommendation Engine using Apache Mahout
Recommendation Engine using Apache Mahout
 
リクルート式Hadoopの使い方
リクルート式Hadoopの使い方リクルート式Hadoopの使い方
リクルート式Hadoopの使い方
 
Cloudera hadoop installation
Cloudera hadoop installationCloudera hadoop installation
Cloudera hadoop installation
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Extending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via APIExtending and Automating Cloudera Manager via API
Extending and Automating Cloudera Manager via API
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics PlatformSamsung’s First 90-Days Building a Next-Generation Analytics Platform
Samsung’s First 90-Days Building a Next-Generation Analytics Platform
 

Similar to YARN High Availability Architecture and Configuration

Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInHortonworks
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesDataWorks Summit
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012hitesh1892
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Hortonworks
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014InMobi Technology
 
Accumulo Summit 2014: Accumulo on YARN
Accumulo Summit 2014: Accumulo on YARNAccumulo Summit 2014: Accumulo on YARN
Accumulo Summit 2014: Accumulo on YARNAccumulo Summit
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnDataWorks Summit
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 

Similar to YARN High Availability Architecture and Configuration (20)

Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedIn
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Yarn
YarnYarn
Yarn
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Apache Hadoop YARN: best practices
Apache Hadoop YARN: best practicesApache Hadoop YARN: best practices
Apache Hadoop YARN: best practices
 
Running Services on YARN
Running Services on YARNRunning Services on YARN
Running Services on YARN
 
Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012
 
October 2014 HUG : Apache Slider
October 2014 HUG : Apache SliderOctober 2014 HUG : Apache Slider
October 2014 HUG : Apache Slider
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014What's new in Hadoop Yarn- Dec 2014
What's new in Hadoop Yarn- Dec 2014
 
Accumulo Summit 2014: Accumulo on YARN
Accumulo Summit 2014: Accumulo on YARNAccumulo Summit 2014: Accumulo on YARN
Accumulo Summit 2014: Accumulo on YARN
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarn
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 

YARN High Availability Architecture and Configuration

  • 1. YARN High Availability Karthik Kambatla – Cloudera Inc Xuan Gong – Hortonworks Inc
  • 2. Outline • Background – YARN architecture and need for HA • RM HA architecture – Persisting the state – Active/ Standby pair and Fencing – Failover and redirection • Configuring HA • Demo 6/17/2014 YARN High Availability, Hadoop Summit 2
  • 3. YARN Architecture 6/17/2014 YARN High Availability, Hadoop Summit 3 Resource Manager Node Manager Node Manager Node Manager App Master Container Client Client Cluster State Applications State
  • 4. Fault-tolerance 6/17/2014 YARN High Availability, Hadoop Summit 4 Resource Manager Node Manager Node Manager Node Manager App Master Container Client Client App Master ContainerCluster State Applications State
  • 5. Naïve RM Restart 6/17/2014 YARN High Availability, Hadoop Summit 5 Resource Manager Node Manager Node Manager Client Client App Master Cluster State Applications State
  • 6. ResourceManager is a YARN cluster’s single point of failure. 6/17/2014 YARN High Availability, Hadoop Summit 6 Need stateful restart and multiple RMs.
  • 7. Highly Available Resource Manager a.k.a. HARMful YARN • Currently shipped – Beta in Apache Hadoop 2.3.0 – Stable in Apache Hadoop 2.4.0 – More stable in Apache Hadoop 2.4.1 6/17/2014 YARN High Availability, Hadoop Summit 7
  • 8. Stateful RM Restart (Phase 1) 6/17/2014 YARN High Availability, Hadoop Summit 8 Node Manager Node Manager App Master Container Client Client Resource Manager Cluster State Applications State RM Store App Master Container
  • 9. RM Store Implementations • Memory store – testing purposes • Filesystem based store – Any file system: local, HDFS or any other • Zookeeper based store (ZKRMStateStore) – Recommended (for fencing) – Loading 10,000 applications takes about 8.5 secs. 6/17/2014 YARN High Availability, Hadoop Summit 9
  • 10. Implications to Running applications • In-flight work is lost. • AMs are restarted. • AMs could checkpoint completed work. – MapReduce AM does. – Consider a job with 100 map tasks • If RM goes down after 90 map tasks finish. • After restart, only the remaining 10 are run. 6/17/2014 YARN High Availability, Hadoop Summit 10
  • 11. Stateful RM Restart (Phase 2) • Under development (YARN-556) – No loss of in-flight work • Related work – Work-preserving NodeManager restart (YARN- 1336) – Work-preserving ApplicationMaster restart (YARN- 1489) 6/17/2014 YARN High Availability, Hadoop Summit 11
  • 12. Multiple RMs • Active / Standby architecture – Potentially multiple standbys – Warm standby • Running • Loads state and starts RPC servers on becoming Active – Manual / automatic failover – Clients and Web UI failover automatically 6/17/2014 YARN High Availability, Hadoop Summit 12
  • 13. Active / Standby 6/17/2014 YARN High Availability, Hadoop Summit 13 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager
  • 14. Manual Failover through CLI 6/17/2014 YARN High Availability, Hadoop Summit 14 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager
  • 15. Client Failover (ConfiguredFailoverProxyProvider) 6/17/2014 YARN High Availability, Hadoop Summit 15 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager App Master
  • 16. Automatic Failover 6/17/2014 YARN High Availability, Hadoop Summit 16 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager Elector Elector ZK
  • 17. Automatic Failover 6/17/2014 YARN High Availability, Hadoop Summit 17 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager Elector Elector ZK
  • 18. Automatic Failover • Zookeeper based – Uses ActiveStandbyElector for Active election • No need for a FailoverController – Can’t monitor RM process health and recover 6/17/2014 YARN High Availability, Hadoop Summit 18
  • 19. Network Hiccup 6/17/2014 YARN High Availability, Hadoop Summit 19 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager Elector Elector ZK
  • 20. Multiple Actives? 6/17/2014 YARN High Availability, Hadoop Summit 20 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Active Resource Manager Elector Elector ZK
  • 21. Fencing • The state store gets corrupted when multiple RMs assume the Active role. • Exclusive access to a single RM. – ZKRMStateStore takes care of this. – Shared “admin” access. – Exclusive “create-delete” access on transition to Active 6/17/2014 YARN High Availability, Hadoop Summit 21
  • 22. Network Hiccup 6/17/2014 YARN High Availability, Hadoop Summit 22 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager Elector Elector ZK
  • 23. Active / Standby 6/17/2014 YARN High Availability, Hadoop Summit 23 Node Manager Node Manager App Master Client Client Active Resource Manager RM Store Standby Resource Manager Elector Elector ZK
  • 24. In-flight RPCs • In-flight RPCs: Retry or not? – E.g. Submit application – we clearly don’t want two applications submitted. • Depends on whether failover happens before, during, or after the RM acts on the call. • Solution – Annotate APIs as Idempotent or AtMostOnce 6/17/2014 YARN High Availability, Hadoop Summit 24
  • 25. Web UI • Standby RM has no/stale information. • Users don’t know which RM is Active. • Redirect Web UI and REST calls to Active RM. – Except a few pages that give information about the RM. 6/17/2014 YARN High Availability, Hadoop Summit 25
  • 26. Admin Refresh • Admin refresh ($ yarn rmadmin –refresh): – Refreshes that particular RM – Active/Standby – Uses local configuration file • FileSystemBasedConfigurationProvider – Upload the configuration files to (potentially shared) filesystem like HDFS. 6/17/2014 YARN High Availability, Hadoop Summit 26
  • 27. Setting up HA Config name Value yarn.resourcemanager.ha.enabled true yarn.resourcemanager.ha.rm-ids rm1,rm2 yarn.resourcemanager.hostname.rm1 <host1> yarn.resourcemanager.hostname.rm2 <host2> yarn.resourcemanager.recovery.enabled true yarn.resourcemanager.store.class ZKRMStateStore1 yarn.resourcemanager.zk-address <zk-quorum> yarn.resourcemanager.cluster-id <cluster-id> 6/17/2014 YARN High Availability, Hadoop Summit 27 1. org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
  • 28. Demo! 6/17/2014 YARN High Availability, Hadoop Summit 28
  • 29. Questions? 6/17/2014 YARN High Availability, Hadoop Summit 29