1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Running Services on
YARN
Munich, April 2017
Varun Vasudev
About myself
⬢ Apache Hadoop contributor since 2014
⬢ Apache Hadoop committer and PMC member
⬢ Currently working for Hortonworks
⬢ vvasudev@apache.org
Introduction to Apache Hadoop YARN
⬢ Architectural center of big data workloads
⬢ Enterprise adoption
–Secure mode is popular
–Multi-tenant support
⬢ SLAs
–Tolerance for slow running jobs decreasing
–Consistent performance desired
⬢ Diverse workloads increasing
–LLAP on Slider
Introduction to Apache Hadoop YARN
[Figure: YARN as the Data Operating System (Cluster Resource Management), layered on HDFS (Hadoop Distributed File System) and providing batch, interactive and real-time data access. Engines shown: Script (Pig), SQL (Hive), Java/Scala (Cascading), NoSQL (HBase, Accumulo), Stream (Storm), Search (Solr), In-Memory (Spark), Others (ISV engines), with Tez and Slider as intermediate layers.]
YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
Several important trends in the age of Hadoop 3.0+
[Figure: YARN and other platform services (storage, resource management, security, service discovery, management, monitoring, alerts, governance) supporting IoT assemblies (Kafka, Storm, HBase, Solr), MR, Tez, Spark and innovating frameworks such as Flink and DL (TensorFlow), across various environments: on premise, private cloud, public cloud.]
Services workloads becoming more popular
⬢ Users are running more and more long-running services like LLAP, HiveServer, HBase, etc.
⬢ Service workloads are gaining more importance
–Need a webserver to serve results from a MR job
–New YARN UI can be run in its own container
–ATSv2 would like to launch ATS reader containers as well
–Applications want to run their own shuffle service
Application Lifecycle
[Figure: application lifecycle on Node 1, whose NodeManager offers 128G and 16 vcores.]
⬢ The ResourceManager (active) launches the Application 1 AM
–The NodeManager launches the AM process via a ContainerExecutor (DCE, LCE, WSCE) and monitors/isolates memory and CPU
⬢ The AM requests containers and the ResourceManager allocates them
–Container 1 and Container 2 processes are launched on the node using DCE, LCE, WSCE, with memory and CPU monitored/isolated
⬢ History Server (ATS – leveldb, JHS – HDFS)
⬢ Log aggregation to HDFS
Application Lifecycle
⬢ Designed for batch jobs
–Jobs run for hours or days
–Jobs use frameworks (like MR, Tez, Spark) that are aware of YARN
–Container failure is bad but frameworks have logic to handle it
•Local container state loss is handled
–Jobs are chained/pipelined using application ids
–Debugging is an offline event
⬢ Doesn’t carry over cleanly for services
–Services run for longer periods of time
–Services may or may not be aware of YARN
–Container loss is a bigger problem and can have severe consequences
–Services would like to discover other services
–Debugging is an online event
Enabling Services on YARN
⬢ AM to manage services
⬢ Service discovery
⬢ Container lifecycle
⬢ Scheduler changes
⬢ YARN UI
⬢ Application upgrades
⬢ Other issues
–Log collection
–Support for monitoring
AM to manage services
⬢ Any service/job on YARN requires an AM
–AMs are hard to write
–Different services will re-implement the same functionalities
–AM has to keep up with changes in Apache Hadoop
⬢ Native YARN framework layer for services (YARN-5079)
–Provides an AM that ships as part of Apache Hadoop that can be used to manage services
–AM is from the Apache Slider project
–AM provides REST APIs to manage applications
–Has support for functionality such as port scheduling, flexing the number of containers
–Maintained by the Apache Hadoop developers so it’s kept up to date with the rest of YARN
–New YARN REST APIs to launch services
YARN REST API to launch services
{
  "name": "vvasudev-druid-2017-03-16",
  "resource": {
    "cpus": 16,
    "memory": "51200"
  },
  "components": [
    {
      "name": "vvasudev-druid",
      "dependencies": [],
      "artifact": {
        "id": "druid-image:0.1.0.0-25",
        "type": "DOCKER"
      },
      "configuration": {
        "properties": {
          "env.CUSTOM_SERVICE_PROPERTIES": "true",
          "env.ZK_HOSTS": "zkhost1:2181,zkhost2:2181,zkhost3:2181"
        }
      }
    }
  ],
  "number_of_containers": 5,
  "launch_command": "/opt/druid/start-druid.sh",
  "queue": "yarn-services"
}
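The specification above can also be assembled and sanity-checked in code before it is submitted. The following is a minimal Python sketch, not a client that ships with Hadoop: the helper names `build_service_spec` and `validate_spec` are hypothetical, the field names are copied from the slide, and the REST endpoint is left out because the slide does not state it.

```python
import json
from typing import Any, Dict, List

# Top-level fields that appear in the example specification on the slide.
REQUIRED_TOP_LEVEL = (
    "name", "resource", "components",
    "number_of_containers", "launch_command", "queue",
)


def build_service_spec(name: str, component: str, image: str,
                       zk_hosts: List[str], containers: int,
                       queue: str) -> Dict[str, Any]:
    """Hypothetical helper: build a spec shaped like the slide's example."""
    return {
        "name": name,
        "resource": {"cpus": 16, "memory": "51200"},
        "components": [
            {
                "name": component,
                "dependencies": [],
                "artifact": {"id": image, "type": "DOCKER"},
                "configuration": {
                    "properties": {
                        "env.CUSTOM_SERVICE_PROPERTIES": "true",
                        "env.ZK_HOSTS": ",".join(zk_hosts),
                    }
                },
            }
        ],
        "number_of_containers": containers,
        "launch_command": "/opt/druid/start-druid.sh",
        "queue": queue,
    }


def validate_spec(spec: Dict[str, Any]) -> str:
    """Check the assumed required fields and return the JSON payload."""
    missing = [key for key in REQUIRED_TOP_LEVEL if key not in spec]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if spec["number_of_containers"] < 1:
        raise ValueError("number_of_containers must be at least 1")
    return json.dumps(spec, indent=2)


spec = build_service_spec(
    name="vvasudev-druid-2017-03-16",
    component="vvasudev-druid",
    image="druid-image:0.1.0.0-25",
    zk_hosts=["zkhost1:2181", "zkhost2:2181", "zkhost3:2181"],
    containers=5,
    queue="yarn-services",
)
payload = validate_spec(spec)
```

Building the document from a function keeps quoting and nesting correct, which is easy to get wrong when the JSON is edited by hand.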
Service discovery
⬢ Long running services require a way to discover them
–Application ids are constant for the lifetime of the application
–Container ids are constant for the lifetime of the container, but containers will come up and go down
⬢ Add support for discovery of long running services using DNS and the Registry Service (YARN-4757)
–DNS is well understood
–Registry service will hold a record mapping the application to its DNS name
–YARN has a DNS server but currently this is for testing and experimentation only
–YARN will need to add support for DNS updates to fit into existing DNS solutions
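The idea behind the bullets above is that a stable DNS name outlives any individual container. The following in-memory Python sketch illustrates it; the class `ServiceRegistry`, its methods and the `instance.service.user.domain` naming scheme are assumptions made for this illustration and are not taken from the YARN registry implementation.

```python
from typing import Dict, Optional


class ServiceRegistry:
    """Toy registry: maps a stable DNS-style name to the current container IP."""

    def __init__(self, domain: str) -> None:
        self._domain = domain
        self._records: Dict[str, str] = {}

    def dns_name(self, user: str, service: str, instance: str) -> str:
        # Assumed naming scheme, chosen only for this illustration.
        return f"{instance}.{service}.{user}.{self._domain}"

    def register(self, user: str, service: str, instance: str, ip: str) -> str:
        """Create or overwrite the record when a container comes up."""
        name = self.dns_name(user, service, instance)
        self._records[name] = ip
        return name

    def unregister(self, user: str, service: str, instance: str) -> None:
        """Drop the record when a container goes down for good."""
        self._records.pop(self.dns_name(user, service, instance), None)

    def resolve(self, name: str) -> Optional[str]:
        return self._records.get(name)


registry = ServiceRegistry("example.com")
name = registry.register("vvasudev", "druid", "druid-0", "10.0.0.5")
first_ip = registry.resolve(name)

# The container is restarted on another node: the name stays, the IP changes.
registry.register("vvasudev", "druid", "druid-0", "10.0.0.9")
second_ip = registry.resolve(name)
```

Clients that look up the name keep working across container restarts, which is the property that container ids alone cannot offer.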
Service Discovery
[Figure: a User, a DNS Server and the Registry Service alongside the ResourceManager, an ApplicationManager, three NodeManagers and three Zookeeper nodes.]
Container lifecycle
⬢ When the container exits, the NodeManager (NM) reclaims all the resources immediately
–NM also cleans up any local state that the container maintained
⬢ AM may or may not be able to get a container back on the same node
–NM has to download any private resources again for the container, leading to delays in restarts
⬢ Added support for first class container retries (YARN-4725)
–AM can specify retry policy when starting the container
–On process exit, the NM will not clean up any state or resources
–Instead it will attempt to retry the container
–AM can specify limits on the number of retries as well as the delay between retries
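The retry behaviour described above can be sketched as a small loop. This is a Python illustration under stated assumptions, not the NodeManager's code: `RetryPolicy`, `run_with_retries` and the convention that exit code 0 means success are hypothetical names and choices made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical policy an AM might pass when starting a container."""
    max_retries: int          # retries allowed after the first launch
    retry_interval_s: float   # delay between a failure and the next attempt


def run_with_retries(launch: Callable[[], int],
                     policy: RetryPolicy,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> Tuple[int, int]:
    """Relaunch in place until success or until the retry limit is reached.

    Returns the last exit code and the total number of launch attempts.
    Local state is assumed to be kept between attempts.
    """
    attempts = 0
    while True:
        exit_code = launch()
        attempts += 1
        if exit_code == 0 or attempts > policy.max_retries:
            return exit_code, attempts
        sleep(policy.retry_interval_s)


# Simulated process that fails twice and then succeeds.
exit_codes = iter([1, 1, 0])
delays = []
result = run_with_retries(lambda: next(exit_codes),
                          RetryPolicy(max_retries=3, retry_interval_s=2.0),
                          sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run; a real caller would leave the default.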
Container Lifecycle
[Figure: a NodeManager and its Container process using local Disk 1, Disk 2 and Disk 3 and HDFS; legend: Application, Container, Data.]
Scheduler improvements
⬢ In case of services, affinity and anti-affinity become important
–Affinity and anti-affinity apply at a container and an application level, e.g. don’t schedule two HBase region servers on the same node but schedule the Spark containers on the same nodes as the region server
⬢ Support is being added for affinity and anti-affinity in the RM (YARN-5907)
–Slider AM already has some basic support for container affinity and anti-affinity via retries
–RM can do a better job of container placement if it has first class support
–AMs can specify affinity and anti-affinity policies to get the right placement they need
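The region server and Spark example above can be sketched as a simple first-fit placement. This is an illustrative Python sketch only: the function `place`, the tag strings and the node names are hypothetical, and the RM scheduler work tracked in YARN-5907 is not claimed to work this way.

```python
from typing import Dict, Optional, Set

# Node name -> tags of the containers already placed on that node.
Cluster = Dict[str, Set[str]]


def place(cluster: Cluster, tag: str,
          affinity: Optional[str] = None,
          anti_affinity: Optional[str] = None) -> Optional[str]:
    """Place one container with the given tag on the first node that fits.

    anti_affinity: skip nodes that already host a container with this tag.
    affinity: only accept nodes that already host a container with this tag.
    Returns the chosen node name, or None if no node satisfies the policy.
    """
    for node in sorted(cluster):
        tags = cluster[node]
        if anti_affinity is not None and anti_affinity in tags:
            continue
        if affinity is not None and affinity not in tags:
            continue
        tags.add(tag)
        return node
    return None


cluster: Cluster = {"node1": set(), "node2": set(), "node3": set()}

# Two region servers must not share a node.
region_server_nodes = [
    place(cluster, "hbase-rs", anti_affinity="hbase-rs") for _ in range(2)
]

# A Spark container should land next to a region server.
spark_node = place(cluster, "spark", affinity="hbase-rs")
```

With first-class support the scheduler can evaluate such constraints at allocation time, instead of an AM releasing and re-requesting containers until the placement happens to fit.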
Scheduler improvements - Affinity and Anti-affinity
⬢ Anti-Affinity
–Some services don’t want their daemons to run on the same host/rack, for better fault recovery or performance.
–For example, don’t run more than one HBase region server in the same fault zone.
Scheduler Improvements - Affinity and Anti-affinity
⬢ Affinity
–Some services want to run their daemons on the same host/rack, etc. for performance.
–For example, run Storm workers as close together as possible for better data exchange performance.
(SW = Storm Worker)
YARN UI (YARN-3368)
YARN UI - Services
Application upgrades
⬢ YARN has no support for container or application upgrades
–Container upgrade support needs to be added in the NM
–Application upgrade support has to be added in the RM
⬢ Support added for container upgrade and rollback (YARN-4726)
–Application upgrade support is still to be implemented
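Upgrade with rollback amounts to keeping the previous launch definition until the new one is confirmed. The Python sketch below models that idea; the class `UpgradableContainer` and its `upgrade`, `rollback` and `commit` methods are hypothetical names for this illustration and do not describe the NodeManager API.

```python
from typing import Optional


class UpgradableContainer:
    """Toy model: remember the previous image so an upgrade can be undone."""

    def __init__(self, image: str) -> None:
        self.current: str = image
        self.previous: Optional[str] = None

    def upgrade(self, new_image: str) -> None:
        """Switch to the new image but keep the old one for rollback."""
        self.previous = self.current
        self.current = new_image

    def rollback(self) -> None:
        """Return to the image that ran before the last upgrade."""
        if self.previous is None:
            raise RuntimeError("no upgrade to roll back")
        self.current, self.previous = self.previous, None

    def commit(self) -> None:
        """Accept the upgrade; rollback is no longer possible."""
        self.previous = None


container = UpgradableContainer("druid-image:0.1.0.0-25")
container.upgrade("druid-image:0.1.0.0-26")   # hypothetical newer tag
image_after_upgrade = container.current
container.rollback()
image_after_rollback = container.current
```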
Other issues
⬢ Log rotation
–Log rotation used to run on application completion
–Support has been added to fetch the logs for running containers
⬢ Support for container monitoring/health checks
In Conclusion
⬢ Service workloads are becoming more and more popular on YARN
⬢ Fundamental pieces to add support for services are in place, but a few additional pieces remain
Thank you!

More Related Content

What's hot

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 

What's hot (20)

Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the EnterpriseEnabling Apache Zeppelin and Spark for Data Science in the Enterprise
Enabling Apache Zeppelin and Spark for Data Science in the Enterprise
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
Debugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionDebugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in Production
 
LLAP: Building Cloud First BI
LLAP: Building Cloud First BILLAP: Building Cloud First BI
LLAP: Building Cloud First BI
 
An Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, FutureAn Overview on Optimization in Apache Hive: Past, Present, Future
An Overview on Optimization in Apache Hive: Past, Present, Future
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
 
Major advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL complianceMajor advancements in Apache Hive towards full support of SQL compliance
Major advancements in Apache Hive towards full support of SQL compliance
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San JoseCloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
Cloudy with a chance of Hadoop - DataWorks Summit 2017 San Jose
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Data Guarantees and Fault Tolerance in Streaming Systems
Data Guarantees and Fault Tolerance in Streaming SystemsData Guarantees and Fault Tolerance in Streaming Systems
Data Guarantees and Fault Tolerance in Streaming Systems
 

Viewers also liked

Viewers also liked (14)

Hadoop YARN Services
Hadoop YARN ServicesHadoop YARN Services
Hadoop YARN Services
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
File Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and ParquetFile Format Benchmark - Avro, JSON, ORC and Parquet
File Format Benchmark - Avro, JSON, ORC and Parquet
 
Solving Cyber at Scale
Solving Cyber at ScaleSolving Cyber at Scale
Solving Cyber at Scale
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache MetronMaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron
 
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
Dancing Elephants - Efficiently Working with Object Stories from Apache Spark...
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security Apache Metron: Community Driven Cyber Security
Apache Metron: Community Driven Cyber Security
 
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
Bringing it All Together: Apache Metron (Incubating) as a Case Study of a Mod...
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 

Similar to Running Services on YARN

Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
Hortonworks
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
DataWorks Summit
 

Similar to Running Services on YARN (20)

Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Combine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARNCombine SAS High-Performance Capabilities with Hadoop YARN
Combine SAS High-Performance Capabilities with Hadoop YARN
 
Scheduling Policies in YARN
Scheduling Policies in YARNScheduling Policies in YARN
Scheduling Policies in YARN
 
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016Hadoop Summit - Scheduling policies in YARN - San Jose 2016
Hadoop Summit - Scheduling policies in YARN - San Jose 2016
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Cloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerationsCloudy with a chance of Hadoop - real world considerations
Cloudy with a chance of Hadoop - real world considerations
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
Druid deep dive
Druid deep diveDruid deep dive
Druid deep dive
 
Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedIn
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoop
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
YARN - Past, Present, & Future
YARN - Past, Present, & FutureYARN - Past, Present, & Future
YARN - Past, Present, & Future
 
Taming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop ManagementTaming the Elephant: Efficient and Effective Apache Hadoop Management
Taming the Elephant: Efficient and Effective Apache Hadoop Management
 
Big data spain keynote nov 2016
Big data spain keynote nov 2016Big data spain keynote nov 2016
Big data spain keynote nov 2016
 
The Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral ProcessingThe Unbearable Lightness of Ephemeral Processing
The Unbearable Lightness of Ephemeral Processing
 

More from DataWorks Summit/Hadoop Summit

How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 

Running Services on YARN

  • 1. 1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Running Services on YARN Munich, April 2017 Varun Vasudev
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About myself ⬢ Apache Hadoop contributor since 2014 ⬢ Apache Hadoop committer and PMC member ⬢ Currently working for Hortonworks ⬢ vvasudev@apache.org
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Introduction to Apache Hadoop YARN ⬢ Architectural center of big data workloads ⬢ Enterprise adoption –Secure mode is popular –Multi-tenant support ⬢ SLAs –Tolerance for slow running jobs decreasing –Consistent performance desired ⬢ Diverse workloads increasing –LLAP on Slider
  • 4. Introduction to Apache Hadoop YARN
      [Diagram: "YARN: Data Operating System (Cluster Resource Management)". Batch, interactive & real-time data access engines sit on top of YARN, which sits on top of HDFS (Hadoop Distributed File System). Engine labels: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines); Tez and Slider appear as intermediate layers under some engines.]
      YARN: The Architectural Center of Hadoop
        •Common data platform, many applications
        •Support multi-tenant access & processing
        •Batch, interactive & real-time use cases
  • 5. Several important trends in the age of Hadoop 3.0+
      [Diagram: YARN and other platform services (Storage, Resource Management, Security, Service Discovery, Management, Monitoring, Alerts, Governance). Workload labels: IoT Assembly (Kafka, Storm, HBase, Solr); MR, Tez, Spark, …; innovating frameworks: Flink, DL (TensorFlow), etc. Various environments: On Premise, Private Cloud, Public Cloud.]
  • 6. Services workloads becoming more popular
      ⬢ Users are running more and more long-running services like LLAP, HiveServer, HBase, etc.
      ⬢ Service workloads are gaining importance
        –Need a webserver to serve results from an MR job
        –New YARN UI can be run in its own container
        –ATSv2 would like to launch ATS reader containers as well
        –Applications want to run their own shuffle service
  • 7. Application Lifecycle
      [Diagram: ResourceManager (active) and Node 1 (NodeManager, 128G, 16 vcores).]
        –Launch Application 1 AM: the AM process is launched via a ContainerExecutor (DCE, LCE, WSCE); memory and CPU are monitored/isolated
        –AM requests containers; ResourceManager allocates containers
        –Container 1 and Container 2 processes are launched on the node using DCE, LCE, WSCE; memory and CPU are monitored/isolated
        –History Server (ATS – leveldb, JHS – HDFS)
        –Log aggregation to HDFS
  • 8. Application Lifecycle
      ⬢ Designed for batch jobs
        –Jobs run for hours, days
        –Jobs use frameworks (like MR, Tez, Spark) which are aware of YARN
        –Container failure is bad but frameworks have logic to handle it
          •Local container state loss is handled
        –Jobs are chained/pipelined using application ids
        –Debugging is an offline event
      ⬢ Doesn’t carry over cleanly for services
        –Services run for longer periods of time
        –Services may or may not be aware of YARN
        –Container loss is a bigger problem and can have really bad consequences
        –Services would like to discover other services
        –Debugging is an online event
  • 9. Enabling Services on YARN
  • 10. Enabling Services on YARN
      ⬢ AM to manage services
      ⬢ Service discovery
      ⬢ Container lifecycle
      ⬢ Scheduler changes
      ⬢ YARN UI
      ⬢ Application upgrades
      ⬢ Other issues
        –Log collection
        –Support for monitoring
  • 11. AM to manage services
      ⬢ Any service/job on YARN requires an AM
        –AMs are hard to write
        –Different services will re-implement the same functionality
        –AM has to keep up with changes in Apache Hadoop
      ⬢ Native YARN framework layer for services (YARN-5079)
        –Provides an AM, shipped as part of Apache Hadoop, that can be used to manage services
        –AM is from the Apache Slider project
        –AM provides REST APIs to manage applications
        –Supports functionality such as port scheduling and flexing the number of containers
        –Maintained by the Apache Hadoop developers, so it’s kept up to date with the rest of YARN
        –New YARN REST APIs to launch services
  • 12. YARN REST API to launch services
      {
        "name": "vvasudev-druid-2017-03-16",
        "resource": { "cpus": 16, "memory": "51200" },
        "components": [
          {
            "name": "vvasudev-druid",
            "dependencies": [ ],
            "artifact": { "id": "druid-image:0.1.0.0-25", "type": "DOCKER" },
            "configuration": {
              "properties": {
                "env.CUSTOM_SERVICE_PROPERTIES": "true",
                "env.ZK_HOSTS": "zkhost1:2181,zkhost2:2181,zkhost3:2181"
              }
            }
          }
        ],
        "number_of_containers": 5,
        "launch_command": "/opt/druid/start-druid.sh",
        "queue": "yarn-services"
      }
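To make the shape of that request concrete, here is a minimal Python sketch that assembles the same service definition and wraps it in an HTTP POST. The field names are taken from the slide. The endpoint URL, the host name and both function names are assumptions made for illustration, since the slide does not give the REST path; the request is built but never sent.

```python
import json
import urllib.request

# Placeholder endpoint: the slides do not state the REST path or port,
# so this URL is hypothetical and must be replaced for a real cluster.
SERVICES_URL = "http://rm.example.com:8088/services"


def build_service_spec(name, component, image, containers, cpus, memory_mb,
                       launch_command, queue, env=None):
    """Return a dict laid out like the service definition shown on the slide."""
    # Environment variables are passed as "env."-prefixed properties, as on the slide.
    properties = {"env." + key: value for key, value in (env or {}).items()}
    return {
        "name": name,
        "resource": {"cpus": cpus, "memory": str(memory_mb)},
        "components": [
            {
                "name": component,
                "dependencies": [],
                "artifact": {"id": image, "type": "DOCKER"},
                "configuration": {"properties": properties},
            }
        ],
        "number_of_containers": containers,
        "launch_command": launch_command,
        "queue": queue,
    }


def build_launch_request(spec, url=SERVICES_URL):
    """Wrap the spec in a JSON POST request without sending it."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


spec = build_service_spec(
    name="vvasudev-druid-2017-03-16",
    component="vvasudev-druid",
    image="druid-image:0.1.0.0-25",
    containers=5,
    cpus=16,
    memory_mb=51200,
    launch_command="/opt/druid/start-druid.sh",
    queue="yarn-services",
    env={"CUSTOM_SERVICE_PROPERTIES": "true",
         "ZK_HOSTS": "zkhost1:2181,zkhost2:2181,zkhost3:2181"},
)
request = build_launch_request(spec)
```

Passing `request` to `urllib.request.urlopen` would submit it; a secure cluster would additionally need authentication, which this sketch leaves out.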
  • 13. Service discovery
      ⬢ Long-running services require a way to be discovered
        –Application ids are constant for the lifetime of the application
        –Container ids are constant for the lifetime of the container, but containers will come up and go down
      ⬢ Add support for discovery of long-running services using DNS and the Registry Service (YARN-4757)
        –DNS is well understood
        –Registry service will have a record mapping the application to its DNS name
        –YARN has a DNS server, but currently this is for testing and experimentation only
        –YARN will need to add support for DNS updates to fit into existing DNS solutions
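From a client's point of view, DNS-based discovery means looking up a stable name instead of tracking container ids. The Python sketch below illustrates that idea only. The `<component>.<service>.<user>.<domain>` layout, the function names and the example domain are assumptions for illustration; the slides do not specify the naming scheme.

```python
import socket


def service_dns_name(component, service, user, domain):
    """Build a lookup name from its parts (the layout is an assumed convention)."""
    labels = [component, service, user]
    for label in labels:
        # Each part must be a single DNS label, so it cannot be empty or contain dots.
        if not label or "." in label:
            raise ValueError("invalid DNS label: %r" % (label,))
    return ".".join(label.lower() for label in labels) + "." + domain.strip(".").lower()


def resolve(name, port):
    """Return the sorted addresses the configured DNS server reports for name."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Hypothetical names; calling resolve(name, 8082) would only work against a DNS
# server that actually holds records for the service.
name = service_dns_name("druid-0", "vvasudev-druid", "vvasudev", "ycluster.example.com.")
```

Because the name stays fixed while containers come and go, a client that re-resolves on connection failure picks up replacement containers without knowing their ids.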
  • 14. Service Discovery
      [Diagram: User, DNS Server, Registry Service, three Zookeeper nodes, ResourceManager, three NodeManagers, ApplicationManager.]
  • 15. Container lifecycle
      ⬢ When the container exits, the NodeManager (NM) reclaims all the resources immediately
        –NM also cleans up any local state that the container maintained
      ⬢ AM may or may not be able to get a container back on the same node
        –NM has to download any private resources again for the container, leading to delays in restarts
      ⬢ Added support for first-class container retries (YARN-4725)
        –AM can specify a retry policy when starting the container
        –On process exit, the NM will not clean up any state or resources
        –Instead it will attempt to retry the container
        –AM can specify limits on the number of retries as well as the delay between retries
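The retry behaviour described on this slide can be modelled in a few lines of Python. This is a toy model rather than the NodeManager's code: the `RetryPolicy` fields, the function name and the state strings are hypothetical, and treating exit code 0 as normal completion is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int         # hypothetical field: limit on in-place restarts
    retry_interval_ms: int   # hypothetical field: delay before each restart


def plan_restarts(policy, exit_codes):
    """Replay a sequence of process exit codes against a retry policy.

    Returns (restarts, total_delay_ms, final_state). Local state is assumed
    to survive every restart, which is the point of retrying in place.
    """
    restarts = 0
    for code in exit_codes:
        if code == 0:
            # Assumption: a clean exit ends the container instead of retrying it.
            return restarts, restarts * policy.retry_interval_ms, "COMPLETED"
        if restarts >= policy.max_retries:
            # Retry budget exhausted: only now would resources be reclaimed.
            return restarts, restarts * policy.retry_interval_ms, "FAILED"
        restarts += 1
    # Every observed failure was retried and the process is running again.
    return restarts, restarts * policy.retry_interval_ms, "RUNNING"


policy = RetryPolicy(max_retries=2, retry_interval_ms=1000)
recovered = plan_restarts(policy, [1, 1, 0])
exhausted = plan_restarts(policy, [1, 1, 1])
```

Under these assumptions `recovered` ends in the completed state after two restarts, while `exhausted` fails on the third crash because the limit of two retries is already used up.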
  • 16. Container Lifecycle
      [Diagram: NodeManager with a container process and local disks (Disk 1, Disk 2, Disk 3), plus HDFS. Legend: Application, Container, Data.]
  • 17. Scheduler improvements
      ⬢ For services, affinity and anti-affinity become important
        –Affinity and anti-affinity apply at a container and an application level, e.g. don’t schedule two HBase region servers on the same node, but schedule the Spark containers on the same nodes as the region servers
      ⬢ Support is being added for affinity and anti-affinity in the RM (YARN-5907)
        –Slider AM already has some basic support for container affinity and anti-affinity via retries
        –RM can do a better job of container placement if it has first-class support
        –AMs can specify affinity and anti-affinity policies to get the placement they need
  • 18. Scheduler improvements - Affinity and Anti-affinity
      ⬢ Anti-Affinity
        –Some services don’t want their daemons to run on the same host/rack, for better fault recovery or performance.
        –For example, don’t run more than one HBase region server in the same fault zone.
  • 19. Scheduler improvements - Affinity and Anti-affinity
      ⬢ Affinity
        –Some services want to run their daemons on the same host/rack, etc. for performance.
        –For example, run Storm workers as close together as possible for better data exchange performance. (SW = Storm Worker)
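The two kinds of constraint can be illustrated with a small greedy placement routine in Python. It is a sketch of the idea, not the ResourceManager's scheduler: the function name, the tag strings and the first-fit strategy are assumptions, and node capacity is ignored.

```python
def place_containers(requests, nodes, anti_affinity_tags=frozenset(), affinity=None):
    """First-fit placement that honours simple per-node constraints.

    requests: list of (container_name, tag) pairs, placed in order.
    anti_affinity_tags: tags that may appear at most once per node.
    affinity: maps a tag to another tag that must already be on the chosen node.
    """
    affinity = affinity or {}
    placement = {}
    tags_on_node = {node: [] for node in nodes}
    for name, tag in requests:
        for node in nodes:
            present = tags_on_node[node]
            if tag in anti_affinity_tags and tag in present:
                continue  # anti-affinity: this node already hosts the same tag
            wanted = affinity.get(tag)
            if wanted is not None and wanted not in present:
                continue  # affinity: the required neighbour is not on this node
            placement[name] = node
            present.append(tag)
            break
        else:
            raise RuntimeError("no node satisfies the constraints for " + name)
    return placement


# Region servers must be spread out; Spark containers must sit next to one.
placement = place_containers(
    requests=[("rs-1", "hbase-rs"), ("rs-2", "hbase-rs"),
              ("spark-1", "spark"), ("spark-2", "spark")],
    nodes=["node1", "node2", "node3"],
    anti_affinity_tags={"hbase-rs"},
    affinity={"spark": "hbase-rs"},
)
```

An AM that only sees containers after allocation has to release and re-request until the constraints happen to hold, which is the retry-based approach attributed to Slider above; a scheduler that knows the constraints can apply them during allocation, as this routine does.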
  • 20. YARN UI (YARN-3368)
  • 21. YARN UI - Services
  • 22. Application upgrades
      ⬢ YARN has no support for container or application upgrades
        –Container upgrade support needs to be added in the NM
        –Application upgrade support has to be added in the RM
      ⬢ Support added for container upgrade and rollback (YARN-4726)
        –Application upgrade support is still to be done
  • 23. Other issues
      ⬢ Log rotation
        –Log rotation used to run on application completion
        –Support has been added to fetch the logs for running containers
      ⬢ Support for container monitoring/health checks
  • 24. In Conclusion
      ⬢ Services workloads are becoming more and more popular on YARN
      ⬢ The fundamental pieces to support services are in place, but a few additional pieces remain
  • 25. Thank you!