© Hortonworks Inc. 2015
Apache Hadoop YARN 2015
Present and Future
Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter
Who am I?
• 7.75 Hadoop-years old
– Don’t fall for the job postings asking
for 10 years #Hadoop Experience yet

• Past
– 2007: Last thing at school – a two-node
Tomcat cluster. Three months later,
first thing at the job, brought down an
800-node cluster ;)
– Team that ran Hadoop @ Yahoo!
• Present: @Hortonworks
• Two hats
– Hortonworks: Hadoop MapReduce
and YARN Development lead
– Apache: Apache Hadoop PMC,
Apache Member
• Worked/working on
– YARN, Hadoop MapReduce,
HadoopOnDemand,
CapacityScheduler, Hadoop security
– Apache Ambari: Kickstarted the
project’s first release
– Stinger: High performance data
processing with Hadoop/Hive
• Lots of troubleshooting on
clusters (@tshooter)
• 99% + code in Apache, Hadoop
– Open Source
– Community driven
Architecting the Future of Big Data
Agenda
• Apache Hadoop YARN : Overview
• Past
• Present
• Future
Overview
The Why and the What
Why Hadoop YARN?
• Resource Management
• A messy problem
– Multiple apps, frameworks, their lifecycles
and evolution
• Varied expectations
– On isolation, capacity allocations,
scheduling
– Admin: “Best use of my cluster”
– Users: “Get me as much as possible,
as fast as possible”
• Tenancy
– “I am running this cluster for one
user”
– It almost never stops there
– Groups, Teams, Users
• Ad hoc structures get bad real fast
• What’s different?
– Centered around Data
• The “-ilities”
– Admission policies. Sharing. Security.
Elasticity. SLAs. ROI
[Figure labels: Data, ?, Applications, Admins, Users]
What is Hadoop YARN?
HDFS (Scalable, Reliable Storage)
YARN (Cluster Resource Management)
Applications (Running Natively in Hadoop)
• Store all your data in one place … (HDFS)
• Interact with that data in multiple ways … (YARN Platform + Apps)
• Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack)
[Figure labels: Queues, Admins/Users, Cluster Resources, Pipelines]
Past
A quick history
A brief Timeline before the BigBang
• Sub-project of Apache Hadoop
• Releases tied to Hadoop releases
• Gmail-like alphas and betas
– In production at several large sites for
MapReduce already by that time
Timeline
• June-July 2010: 1st line of code
• August 2011: Open sourced
• May 2012: First 2.0 alpha
• August 2013: First 2.0 beta
Apache Hadoop YARN releases
• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
– First stable and supported release of YARN
– Binary Compatibility for MapReduce applications built on Hadoop-1.x
– YARN level APIs solidified for the future
– Performance
– Scale from the get-go!
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with rest of projects in the
ecosystem
Apache Hadoop 2.2
Releases (contd)
• 24 February, 2014
• First post GA release for the year 2014
• Number of bug-fixes, enhancements
• Alpha features in YARN
– ResourceManager Failover
– Application History
Apache Hadoop 2.3
Releases (contd)
• 07 April, 2014
• YARN
– ResourceManager Fail-over
– Preemption aided Scheduling
– Application History and Timeline Service V1
Apache Hadoop 2.4
Releases (contd)
• 11 August, 2014
• YARN
– YARN's REST APIs for submitting & killing applications
– Timeline Service V1 Security
Apache Hadoop 2.5
Present
Apache Hadoop releases (contd)
• 18 November 2014
• Last major release at the time of this talk
• YARN
– Support for rolling upgrades
– Support for long running services
– Support for node labels
– Alpha/Beta features: Time-based resource reservations, running applications
natively in Docker containers
Apache Hadoop 2.6
Rolling Upgrades
At a click of a button
Work preserving ResourceManager restart
• ResourceManager remembers some state
• Reconstructs the remaining from nodes and apps
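
The two bullets above can be sketched roughly as follows. This is a toy model in Python, not YARN's Java implementation: it assumes the ResourceManager persisted only a map of application IDs to queues, and that each node re-sends the list of containers it is still running. All function and field names here are hypothetical.

```python
from collections import defaultdict

def recover_rm_state(persisted_apps, node_reports):
    """Rebuild a toy ResourceManager view after a restart.

    persisted_apps: {app_id: queue} saved before the restart (hypothetical store).
    node_reports:   {node_id: [(container_id, app_id), ...]} re-sent by nodes.
    """
    containers_by_app = defaultdict(list)
    orphans = []
    for node_id, containers in node_reports.items():
        for container_id, app_id in containers:
            if app_id in persisted_apps:
                containers_by_app[app_id].append((node_id, container_id))
            else:
                # Container belongs to an app the RM has no record of.
                orphans.append((node_id, container_id))
    return dict(containers_by_app), orphans

apps = {"app_1": "default", "app_2": "etl"}
reports = {
    "node-a": [("c1", "app_1"), ("c2", "app_2")],
    "node-b": [("c3", "app_1"), ("c9", "app_7")],
}
running, orphans = recover_rm_state(apps, reports)
print(running)
print(orphans)
```

The point of the split is that the persisted part stays small, while the bulk of the state (which containers run where) is rediscovered from the cluster itself.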
Work preserving NodeManager restart
• NodeManager remembers state on each machine
• Reconnects to running containers
ResourceManager Fail-over
• Active/Standby Mode
• Depends on fast-recovery
[Figure label: ZooKeeper]
YARN Rolling Upgrades Workflow
• Servers first
– Masters followed by Slaves
• Upgrade of Applications/Frameworks is decoupled!
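
As a minimal illustration of the "servers first, masters followed by slaves" ordering, the sketch below sorts a host list by role. The host names and the role strings "master" and "worker" are made up for the example; a real rolling upgrade would also drain and restart each daemon, which is not modeled here.

```python
def upgrade_order(hosts):
    """Return host names in upgrade order: masters first, then workers.

    hosts: list of (hostname, role) where role is "master" or "worker".
    The role names are illustrative; the slide calls workers "Slaves".
    """
    rank = {"master": 0, "worker": 1}
    # sorted() is stable, so hosts keep their listed order within a role.
    return [name for name, role in sorted(hosts, key=lambda h: rank[h[1]])]

cluster = [("nm-1", "worker"), ("rm-1", "master"), ("nm-2", "worker"), ("rm-2", "master")]
print(upgrade_order(cluster))
```

Applications do not appear in the ordering at all, which mirrors the last bullet: their upgrade is decoupled from the server upgrade.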
YARN Rolling Upgrades Snapshot
Stack Rolling Upgrades
Rolling Updates Session by Sanjay Radia
Thursday April 16, 2015 11:45-12:25
@ Silver Hall
Services on YARN
Long running services
• You could run them already before
2.6!
• Enhancements needed
– Logs
– Security
– Management/monitoring
– Sharing and Placement
– Discovery
• Resource sharing across
workload types
• Fault tolerance of long running
services
– Work preserving AM restart
– AM forgetting faults
• Service registry
• Project Slider:
http://slider.incubator.apache.org/
• HBase, Storm, Kafka already!
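
One way to read "AM forgetting faults" is that only recent ApplicationMaster failures count against the retry limit, so a service that has been up for months is not killed because of failures from long ago. The sketch below shows that idea with a sliding time window. The function name, the window length and the limit are assumptions for illustration, not YARN settings.

```python
def should_give_up(failure_times, now, window_secs, max_failures):
    """Return True if failures inside the last window_secs exceed max_failures.

    failure_times: timestamps (in seconds) of past AM failures.
    """
    recent = [t for t in failure_times if now - t <= window_secs]
    return len(recent) > max_failures

# Three failures overall, but only one of them in the last hour.
failures = [100, 5000, 90000]
print(should_give_up(failures, now=90500, window_secs=3600, max_failures=2))
print(should_give_up(failures, now=90500, window_secs=10**9, max_failures=2))
```

With a one-hour window the service keeps running; with an effectively unbounded window the same history would exhaust the retry budget.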
“Bringing Long Running Services to Hadoop YARN”
by Steve Loughran
Thursday April 16, 2015 12:40-13:20
@ Copper Hall
Cluster Management Features
Preemption aided Scheduling
• Admins
– “Make the best use of cluster resources”
• Users
– “Give me resources fast”
• Solution
– Elastic queues
– Loan idle capacities to others
– Take it back on demand
– Balance across queues: already in
– Balance across users in a queue: WIP
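
The "loan idle capacity, take it back on demand" idea can be sketched as below. This is a deliberately simplified model with abstract resource units and hypothetical queue names; the real scheduler's preemption policy considers much more (queue hierarchy, per-container choices, grace periods).

```python
def preemption_targets(guaranteed, used, pending):
    """Compute how much each over-capacity queue should give back.

    guaranteed/used/pending: {queue: resource units}. Units are abstract.
    Returns {queue: units to reclaim}.
    """
    # Unmet demand from queues that are below their guarantee.
    shortfall = sum(
        min(pending[q], guaranteed[q] - used[q])
        for q in guaranteed
        if used[q] < guaranteed[q] and pending[q] > 0
    )
    reclaim = {}
    for q in guaranteed:
        over = used[q] - guaranteed[q]
        if over > 0 and shortfall > 0:
            take = min(over, shortfall)
            reclaim[q] = take
            shortfall -= take
    return reclaim

guaranteed = {"a": 50, "b": 50}
used = {"a": 80, "b": 20}      # queue a borrowed b's idle capacity
pending = {"a": 0, "b": 25}    # b now wants 25 units back
print(preemption_targets(guaranteed, used, pending))
```

Queue a keeps 5 of its borrowed units because b only asked for 25; nothing is reclaimed beyond actual demand.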
Fine-grain isolation for multi-tenancy
• Memory
– Custom monitoring
– Inelastic Resource
• CPU
– Cgroups on Linux
– Elastic Resource
• Support on Windows
– WIP
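
To make "inelastic" concrete: memory cannot be throttled the way CPU shares can, so a monitor can only compare observed usage with the allocation and kill offenders. The sketch below shows that check on hypothetical numbers; the container IDs and dictionaries are invented for the example and do not reflect NodeManager internals.

```python
def containers_to_kill(limits_mb, usage_mb):
    """Return container ids whose observed memory exceeds their allocation.

    Memory is treated as inelastic: a container over its limit is killed
    rather than throttled. Both dicts map container id -> megabytes.
    """
    return sorted(c for c, used in usage_mb.items() if used > limits_mb[c])

limits = {"c1": 1024, "c2": 2048, "c3": 512}
usage = {"c1": 900, "c2": 2300, "c3": 512}
print(containers_to_kill(limits, usage))
```

A container that sits exactly at its limit (c3 here) is left alone; only c2, which exceeds it, is selected.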
Multi-resource scheduling
• Multi-dimensional bin-packing
– Application A says “I want 8GB RAM
and 2 CPUs”
– Application B says “I want 1GB RAM
and 10 CPUs”
• Today – memory & CPU
– Physical memory / virtual memory
– CPU cores – virtual cores
• Scheduling constrained based on
the “bottleneck” resource
– Watch out for utilization drop on the
non-scarce resource
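
The utilization drop on the non-scarce resource is easy to see in a small packing exercise. The sketch below places the slide's two request shapes (8 GB with 2 CPUs, 1 GB with 10 CPUs) on a single hypothetical node with 16 GB and 12 cores, using a simple first-fit rule. The node size and the greedy rule are assumptions for illustration.

```python
def pack(node_mem_gb, node_cpus, requests):
    """Greedily place requests (name, mem_gb, cpus) on one node.

    Returns (placed names, memory utilization, CPU utilization).
    """
    free_mem, free_cpu, placed = node_mem_gb, node_cpus, []
    for name, mem, cpus in requests:
        # A request fits only if every dimension fits.
        if mem <= free_mem and cpus <= free_cpu:
            free_mem -= mem
            free_cpu -= cpus
            placed.append(name)
    mem_util = (node_mem_gb - free_mem) / node_mem_gb
    cpu_util = (node_cpus - free_cpu) / node_cpus
    return placed, mem_util, cpu_util

# Hypothetical node: 16 GB RAM, 12 cores. Two copies of B, then A.
requests = [("B1", 1, 10), ("B2", 1, 10), ("A1", 8, 2)]
placed, mem_util, cpu_util = pack(16, 12, requests)
print(placed, mem_util, cpu_util)
```

CPU is the bottleneck here: the node's cores are fully used after B1 and A1 are placed, while a little over half of its memory is in use and B2 cannot be scheduled.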
Node Labels
• Partitions
– Admin: “I have machines of different
types”
– Impact on capacity planning: “Hey,
we bought those Windows machines”
• Types
– Exclusive: “This is my Precious!”
– Non-exclusive: “I get binding
preference. Use it for others when
idle”
• Constraints
– “Take me to a machine running JDK
version 9”
– No impact on capacity planning
– WIP
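
The difference between exclusive and non-exclusive partitions can be written as a small predicate. This is a sketch under one assumption that the slide does not spell out: idle nodes of a non-exclusive partition are lent only to applications that asked for the default partition (represented by an empty label). The function and parameter names are hypothetical.

```python
def can_run(app_partition, node_partition, exclusive, node_idle):
    """Decide whether an app asking for app_partition may use a node.

    app_partition / node_partition: label strings, "" for the default partition.
    exclusive: whether the node's partition is exclusive.
    node_idle: whether the node's own partition has no demand for it right now.
    """
    if app_partition == node_partition:
        return True
    # Non-exclusive partitions lend idle nodes to apps without a label.
    return (not exclusive) and node_idle and app_partition == ""

print(can_run("windows", "windows", exclusive=True, node_idle=False))
print(can_run("", "windows", exclusive=True, node_idle=True))
print(can_run("", "gpu", exclusive=False, node_idle=True))
```

The second call is the "This is my Precious!" case: even an idle node in an exclusive partition is not shared.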
[Figure labels: Default Partition; Partition B (Linux); Partition C (Windows); node tags JDK 8, JDK 7, JDK 7]
Operational and Developer tooling
Application History and Timeline Service
• Before
– Few MR specific implementations:
History and web-UI
• Not just MR anymore!
• History
– “Why was my application slow?”
– “Where did my containers run?”
– MapReduce specific Job History
Server
– Need a generic solution beyond
ResourceManager Restart
• Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
• Application Timeline
– Framework specific event collection
and UIs
– “Show me the Counters for my
running MapReduce task”
– “Show me the slowest Storm stream
processing bolt while it is running”
• Present
– A LevelDB based implementation
– Integrated into MapReduce, Apache
Tez, Apache Hive
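
The two analytics questions quoted above ("User with most resource utilization", "Largest application run") reduce to simple aggregations once history is kept. The sketch below runs them over a made-up list of records; the record layout (application id, user, memory-seconds) is an assumption for the example and not the Timeline Service's data model.

```python
from collections import Counter

# Hypothetical history records: (app_id, user, memory_mb_seconds).
history = [
    ("app_1", "alice", 4_000_000),
    ("app_2", "bob", 9_500_000),
    ("app_3", "alice", 7_000_000),
]

def top_user(records):
    """User with the highest total resource usage across all apps."""
    totals = Counter()
    for _, user, usage in records:
        totals[user] += usage
    return totals.most_common(1)[0]

def largest_app(records):
    """The single application with the highest usage."""
    return max(records, key=lambda r: r[2])

print(top_user(history))
print(largest_app(history))
```

Note that the two answers differ: bob ran the largest single application, but alice used more in total.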
Other features
• Web Services
– No need for installed Hadoop Clients
– Submit an app
– Monitor / Kill it
• Multi-homing Environments
– Clients on a public network
– Cluster traffic on a private network
– Fault tolerance
– Security
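
For the Web Services bullet, the sketch below builds (without sending) the HTTP requests a client could use to monitor and kill an application through the ResourceManager's REST interface, using only Python's standard library. The paths follow the ResourceManager REST API as I understand it (`/ws/v1/cluster/apps/{appid}` and its `/state` sub-resource); treat them as something to verify against the documentation for your Hadoop version. The host name and application id are placeholders, and a secured cluster would also need authentication, which is not shown.

```python
import json
import urllib.request

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager address

def rm_request(method, path, body=None):
    """Build (but do not send) a JSON request against the RM web services."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        RM + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )

def status_request(app_id):
    """Request that fetches an application's report."""
    return rm_request("GET", f"/ws/v1/cluster/apps/{app_id}")

def kill_request(app_id):
    """Request that asks the RM to move an application to the KILLED state."""
    return rm_request("PUT", f"/ws/v1/cluster/apps/{app_id}/state", {"state": "KILLED"})

req = kill_request("application_1400000000000_0001")
print(req.get_method(), req.full_url)
# Pass a request to urllib.request.urlopen() to actually send it.
```

Because everything goes over plain HTTP and JSON, the client machine needs no Hadoop installation, which is the point of the "No need for installed Hadoop Clients" bullet.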
Future
Apache Hadoop releases (contd)
• Hadoop 2.7
– Likely the week of April 19-24, 2015
– Moving to JDK 7 and beyond
• Future
Apache Hadoop 2.7, 2.8 and beyond
Future: Timeline Service Next Generation
• Next generation
– Today’s solution helped understand the space
– Limited scalability and availability
• Analyzing Hadoop Clusters is a big-data problem
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the
FileSystem with rogue applications. Now.”
• Timeline data stored in HBase and accessible to queries
Future: Improved Usability
• Generic run-time information
– “What is my actual usage by the running container?”
– “How many rack local containers did I get?”
– “How healthy is the scheduler?”
– “Why is my application stuck? What limits did it hit?”
• With Timeline Service
– Why is my application slow?
– Why is my cluster slow?
– Why is my application failing?
– Why is my cluster down?
– What happened with my application? Succeeded?
– What happened in my clusters?
• Collect and use past data
– To schedule my application better
– To do better capacity planning
Future: Containerized Applications
• Running Containerized
Applications on YARN
• Docker
• Multiple use-cases
– Run my existing service on YARN
– Slider + Docker
– Run my existing MapReduce
application on YARN via a docker
image
Future: Scheduling
• Support priorities across
applications within the same
queue
• Policy Driven scheduling
– “I want app level fairness in queue A,
user level fairness in queue B, and
throughput focus in all other queues”
• Node anti-affinity
– “Do not run two copies of my service
daemon on the same machine”
• Gang scheduling
– “Run all of my app at once”
• Dynamic scheduling of containers
based on actual utilization
• Stabilized App Reservations
– “Create a reservation for my app with
X resources to run at 6AM tomorrow”
• Time based policies
– “10% cluster capacity for queue A
from 6-9AM, but 20% from 9AM-12PM”
• Prioritized queues
– Admin’s queue takes precedence
over everything else
• Lot more ..
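
Of the ideas above, time-based policies are the simplest to sketch: capacity becomes a function of the hour. The example below encodes the slide's queue A policy as a list of hour ranges, reading the second range as 9 AM to noon. The function name, the tuple layout and the default of 0% outside any range are assumptions for illustration.

```python
def capacity_percent(policy, hour, default=0):
    """Look up a queue's capacity for an hour of the day (0-23).

    policy: list of (start_hour, end_hour, percent), end exclusive.
    """
    for start, end, percent in policy:
        if start <= hour < end:
            return percent
    return default

# 10% for queue A from 6 to 9 AM, 20% from 9 AM to noon.
queue_a = [(6, 9, 10), (9, 12, 20)]
print([capacity_percent(queue_a, h) for h in (5, 6, 9, 12)])
```

Making the end hour exclusive avoids ambiguity at the 9 AM boundary, where the two ranges meet.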
Future: More Resource Types
• Node level Isolation and Cluster
level Scheduling
• Disks
– Space
– IOPS: Read/Write
• Network
– Incoming bandwidth
– Outgoing bandwidth
Thank you!
Sandbox: Hadoop in a VM!
Questions Time!

Hadoop Summit Europe 2015 - YARN Present and Future

  • 1. © Hortonworks Inc. 2015 Apache Hadoop YARN 2015 Present and Future Vinod Kumar Vavilapalli vinodkv [at] apache.org @tshooter Page 1
  • 2. © Hortonworks Inc. 2015 Who am I? • 7.75 Hadoop-years old – Don’t fall for the job postings asking for 10 years #Hadoop Experience yet  • Past – 2007: Last thing at School – a two node Tomcat cluster. Three months later, first thing at job, brought down a 800 node cluster ;) – Team that ran Hadoop @ Yahoo! • Present: @Hortonworks • Two hats – Hortonworks: Hadoop MapReduce and YARN Development lead – Apache: Apache Hadoop PMC, Apache Member • Worked/working on – YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security – Apache Ambari: Kickstarted the project’s first release – Stinger: High performance data processing with Hadoop/Hive • Lots of trouble shooting on clusters (@tshooter) • 99% + code in Apache, Hadoop – Open Source – Community driven Page 2 Architecting the Future of Big Data
  • 3. © Hortonworks Inc. 2015 Agenda • Apache Hadoop YARN : Overview • Past • Present • Future Page 3 Architecting the Future of Big Data
  • 4. © Hortonworks Inc. 2015 Overview The Why and the What Architecting the Future of Big Data Page 4
  • 5. © Hortonworks Inc. 2015 Why Hadoop YARN? • Resource Management • A messy problem – Multiple apps, frameworks, their life- cycles and evolution • Varied expectations – On isolation, capacity allocations, scheduling – Admin: “Best use of my cluster” – Users: “Get me as much as possible, as fast as possible” • Tenancy – “I am running this cluster for one user” – It almost never stops there – Groups, Teams, Users • Adhoc structures get bad real fast • What’s different? – Centered around Data • ‘iIities – Admission policies. Sharing. Security. Elasticity. SLAs. ROI Page 5 Architecting the Future of Big Data Data ? Applications Admins Users
  • 6. © Hortonworks Inc. 2015 What is Hadoop YARN? Page 6 HDFS (Scalable, Reliable Storage) YARN (Cluster Resource Management) Applications (Running Natively in Hadoop) • Store all your data in one place … (HDFS) • Interact with that data in multiple ways … (YARN Platform + Apps) • Scale as you go, shared, multi-tenant, secure … (The Hadoop Stack) Queues Admins/Users Cluster Resources Pipelines
  • 7. © Hortonworks Inc. 2015 Past A quick history Architecting the Future of Big Data Page 7
  • 8. © Hortonworks Inc. 2015 A brief Timeline before the BigBang • Sub-project of Apache Hadoop • Releases tied to Hadoop releases • Gmail like alphas and betas  – In production at several large sites for MapReduce already by that time Page 8 Architecting the Future of Big Data 1st line of Code Open sourced First 2.0 alpha First 2.0 beta June-July 2010 August 2011 May 2012 August 2013
  • 9. © Hortonworks Inc. 2015 Apache Hadoop YARN releases • 15 October, 2013 • The 1st GA release of Apache Hadoop 2.x • YARN – First stable and supported release of YARN – Binary Compatibility for MapReduce applications built on Hadoop-1.x – YARN level APIs solidified for the future – Performance – Scale from the get-go! • Support for running Hadoop on Microsoft Windows • Substantial amount of integration testing with rest of projects in the ecosystem Page 9 Architecting the Future of Big Data Apache Hadoop 2.2
  • 10. © Hortonworks Inc. 2015 Releases (contd) • 24 February, 2014 • First post GA release for the year 2014 • Number of bug-fixes, enhancements • Alpha features in YARN – ResourceManager Failover – Application History Page 10 Architecting the Future of Big Data Apache Hadoop 2.3
  • 11. © Hortonworks Inc. 2015 Releases (contd) • 07 April, 2014 • YARN – ResourceManager Fail-over – Preemption aided Scheduling – Application History and Timeline Service V1 Page 11 Architecting the Future of Big Data Apache Hadoop 2.4
  • 12. Releases (contd): Apache Hadoop 2.5 • 11 August, 2014 • YARN – YARN's REST APIs – Submitting & killing applications – Timeline Service V1 Security
  • 13. Present
  • 14. Apache Hadoop releases (contd): Apache Hadoop 2.6 • 18 November 2014 • Last major release at the time of this talk • YARN – Support for rolling upgrades – Support for long running services – Support for node labels – Alpha/Beta features: Time-based resource reservations, running applications natively in Docker containers
  • 15. Rolling Upgrades: At a click of a button
  • 16. Work preserving ResourceManager restart • ResourceManager remembers some state • Reconstructs the remaining state from nodes and apps
  • 17. Work preserving NodeManager restart • NodeManager remembers state on each machine • Reconnects to running containers
  • 18. ResourceManager Fail-over • Active/Standby Mode • Depends on fast-recovery [Diagram label: ZooKeeper]
  • 19. YARN Rolling Upgrades Workflow • Servers first – Masters followed by Slaves • Upgrade of Applications/Frameworks is decoupled!
  • 20. YARN Rolling Upgrades Snapshot
  • 21. Stack Rolling Upgrades • Related session: Rolling Updates Session by Sanjay Radia, Thursday April 16, 2015, 11:45-12:25 @ Silver Hall
  • 22. Services on YARN
  • 23. Long running services • You could run them already before 2.6! • Enhancements needed – Logs – Security – Management/monitoring – Sharing and Placement – Discovery • Resource sharing across workload types • Fault tolerance of long running services – Work preserving AM restart – AM forgetting faults • Service registry • Project Slider: http://slider.incubator.apache.org/ • HBase, Storm, Kafka already! • Related session: “Bringing Long Running Services to Hadoop YARN” by Steve Loughran, Thursday April 16, 2015, 12:40-13:20 @ Copper Hall
  • 24. Cluster Management Features
  • 25. Preemption aided Scheduling • Admins – “Make the best use of cluster resources” • Users – “Give me resources fast” • Solution – Elastic queues – Loan idle capacities to others – Take it back on demand – Balance across queues: already in – Balance across users in a queue: WIP
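The "loan idle capacity, take it back on demand" idea on the preemption slide can be sketched as a toy model. This is only an illustration under simplifying assumptions, not the CapacityScheduler's actual preemption policy: the function name `preemption_plan`, the queue names, and the single abstract resource unit are all hypothetical.

```python
def preemption_plan(guaranteed, used, pending):
    """Return {queue: amount to take back} in a toy elastic-queue model.

    guaranteed: capacity each queue is entitled to
    used:       capacity each queue currently holds (may exceed its guarantee)
    pending:    outstanding demand per queue
    All three are dicts keyed by queue name, in the same abstract unit.
    """
    # Capacity that under-served queues are entitled to reclaim right now.
    owed = sum(
        max(0, min(pending.get(q, 0), guaranteed[q] - used.get(q, 0)))
        for q in guaranteed
    )
    # Idle capacity can satisfy demand without preempting anyone.
    idle = max(0, sum(guaranteed.values()) - sum(used.get(q, 0) for q in guaranteed))
    needed = max(0, owed - idle)

    # Take loans back from the queues furthest above their guarantee first.
    excess = {
        q: used.get(q, 0) - guaranteed[q]
        for q in guaranteed
        if used.get(q, 0) > guaranteed[q]
    }
    plan = {}
    for queue, over in sorted(excess.items(), key=lambda kv: kv[1], reverse=True):
        take = min(over, needed)
        if take > 0:
            plan[queue] = take
            needed -= take
    return plan


# Queue A borrowed 30 units of B's idle share; B now wants them back.
print(preemption_plan({"A": 50, "B": 50}, {"A": 80, "B": 20}, {"B": 30}))
```

In this model a queue is never pushed below its own guarantee, and nothing is preempted while idle capacity can still cover the demand.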
  • 26. Fine-grain isolation for multi-tenancy • Memory – Custom monitoring – Inelastic Resource • CPU – Cgroups on Linux – Elastic Resource • Support on Windows – WIP
  • 27. Multi-resource scheduling • Multi-dimensional bin-packing – Application A says “I want 8GB RAM and 2 CPUs” – Application B says “I want 1GB RAM and 10 CPUs” • Today – Memory & CPU – Physical memory / virtual memory – CPU cores – Virtual cores • Scheduling is constrained by the “bottleneck” resource – Watch out for utilization drop on the non-scarce resource
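The "bottleneck" effect on the multi-resource scheduling slide can be made concrete with a small calculation. The sketch below uses the two requests quoted on the slide; the node size (64 GB, 16 vcores) and all names (`Resource`, `containers_that_fit`, `bottleneck`, `utilization`) are assumptions made for illustration and are not YARN APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_gb: int
    vcores: int


def containers_that_fit(node: Resource, request: Resource) -> int:
    """Containers of one shape that fit on a node: the scarcer resource decides."""
    return min(node.memory_gb // request.memory_gb, node.vcores // request.vcores)


def bottleneck(node: Resource, request: Resource) -> str:
    """Name the resource that runs out first for this request shape."""
    by_memory = node.memory_gb // request.memory_gb
    by_vcores = node.vcores // request.vcores
    return "memory" if by_memory <= by_vcores else "vcores"


def utilization(node: Resource, request: Resource) -> dict:
    """Fraction of each node resource used once the node is packed."""
    n = containers_that_fit(node, request)
    return {
        "memory": n * request.memory_gb / node.memory_gb,
        "vcores": n * request.vcores / node.vcores,
    }


NODE = Resource(memory_gb=64, vcores=16)   # hypothetical node size
APP_A = Resource(memory_gb=8, vcores=2)    # "8GB RAM and 2 CPUs"
APP_B = Resource(memory_gb=1, vcores=10)   # "1GB RAM and 10 CPUs"

for name, app in (("A", APP_A), ("B", APP_B)):
    print(name, containers_that_fit(NODE, app), bottleneck(NODE, app), utilization(NODE, app))
```

On this assumed node, application B is limited by vcores after a single container, leaving almost all of the memory unused, which is the utilization drop the slide warns about.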
  • 28. Node Labels • Partitions – Admin: “I have machines of different types” – Impact on capacity planning: “Hey, we bought those Windows machines” • Types – Exclusive: “This is my Precious!” – Non-exclusive: “I get binding preference. Use it for others when idle” • Constraints – “Take me to a machine running JDK version 9” – No impact on capacity planning – WIP [Diagram labels: Default Partition, Partition B, Linux, Partition C, Windows, JDK 8, JDK 7, JDK 7]
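The difference between exclusive and non-exclusive partitions on the node labels slide can be sketched as a single placement rule. This is a simplified reading of the slide, not YARN's implementation: `Partition`, `may_run`, and the convention that an empty string means the default partition are assumptions made for this example.

```python
from dataclasses import dataclass

DEFAULT = ""  # assumed convention: empty label means the default partition


@dataclass(frozen=True)
class Partition:
    name: str
    exclusive: bool


def may_run(requested_label: str, partition: Partition, partition_is_idle: bool) -> bool:
    """Decide whether a request for `requested_label` may use `partition`.

    - A request naming the partition always qualifies.
    - A request with no label may borrow a non-exclusive partition,
      but only while that partition has idle capacity.
    - Everything else is refused.
    """
    if requested_label == partition.name:
        return True
    if requested_label == DEFAULT and not partition.exclusive:
        return partition_is_idle
    return False


windows = Partition(name="windows", exclusive=True)    # "This is my Precious!"
gpu = Partition(name="gpu", exclusive=False)           # shared when idle

print(may_run("windows", windows, partition_is_idle=True))   # labelled request
print(may_run(DEFAULT, windows, partition_is_idle=True))     # unlabelled on exclusive
print(may_run(DEFAULT, gpu, partition_is_idle=True))         # unlabelled on idle shared
```

The partition names `windows` and `gpu` are placeholders; only the exclusive versus non-exclusive distinction comes from the slide.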
  • 29. Operational and Developer tooling
  • 30. Application History and Timeline Service • Before – Few MR specific implementations: History and web-UI • Not just MR anymore! • History – “Why was my application slow?” – “Where did my containers run?” – MapReduce specific Job History Server – Need a generic solution beyond ResourceManager Restart • Run analytics on historical apps! – “User with most resource utilization” – “Largest application run” • Application Timeline – Framework specific event collection and UIs – “Show me the Counters for my running MapReduce task” – “Show me the slowest Storm stream processing bolt while it is running” • Present – A LevelDB based implementation – Integrated into MapReduce, Apache Tez, Apache Hive
  • 31. Other features • Web Services – No need for installed Hadoop Clients – Submit an app – Monitor / Kill it • Multi-homing Environments – Clients on a public network – Cluster traffic on a private network – Fault tolerance – Security
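The "no installed Hadoop clients" point on the web services slide means an application can be driven with plain HTTP. The sketch below only builds the requests against the ResourceManager REST paths under `/ws/v1/cluster/apps` and does not send them. The host name `rm.example.com`, the helper names, and the application id are hypothetical, and the submission body is left to the caller because its fields depend on the application.

```python
import json
import urllib.request

RM = "http://rm.example.com:8088"  # hypothetical ResourceManager web address


def new_application_request(rm: str = RM) -> urllib.request.Request:
    """Ask the ResourceManager for a fresh application id."""
    return urllib.request.Request(
        f"{rm}/ws/v1/cluster/apps/new-application", data=b"", method="POST"
    )


def submit_request(submission_context: dict, rm: str = RM) -> urllib.request.Request:
    """Submit an application; the caller supplies the submission context."""
    return urllib.request.Request(
        f"{rm}/ws/v1/cluster/apps",
        data=json.dumps(submission_context).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_request(app_id: str, rm: str = RM) -> urllib.request.Request:
    """Monitor an application (a GET, since no body is attached)."""
    return urllib.request.Request(
        f"{rm}/ws/v1/cluster/apps/{app_id}", headers={"Accept": "application/json"}
    )


def kill_request(app_id: str, rm: str = RM) -> urllib.request.Request:
    """Kill an application by moving its state to KILLED."""
    return urllib.request.Request(
        f"{rm}/ws/v1/cluster/apps/{app_id}/state",
        data=json.dumps({"state": "KILLED"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = kill_request("application_0000000000000_0001")  # placeholder id
print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call; a secured cluster would additionally require authentication, which this sketch leaves out.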
  • 32. Future
  • 33. Apache Hadoop releases (contd): Apache Hadoop 2.7, 2.8 and beyond • Hadoop 2.7 – Likely April 19-24 week, 2015 – Moving to JDK 7 and beyond • Future
  • 34. Future: Timeline Service Next Generation • Next generation – Today’s solution helped understand the space – Limited scalability and availability • Analyzing Hadoop Clusters is a big-data problem – Don’t want to throw away the Hadoop application metadata – Large scale – Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rogue applications. Now.” • Timeline data stored in HBase and accessible to queries
  • 35. Future: Improved Usability • Generic run-time information – “What is my actual usage by the running container?” – “How many rack local containers did I get?” – “How healthy is the scheduler?” – “Why is my application stuck? What limits did it hit?” • With Timeline Service – Why is my application slow? – Why is my cluster slow? – Why is my application failing? – Why is my cluster down? – What happened with my application? Succeeded? – What happened in my clusters? • Collect and use past data – To schedule my application better – To do better capacity planning
  • 36. Future: Containerized Applications • Running Containerized Applications on YARN • Docker • Multiple use-cases – Run my existing service on YARN – Slider + Docker – Run my existing MapReduce application on YARN via a Docker image
  • 37. Future: Scheduling • Support priorities across applications within the same queue • Policy Driven scheduling – “I want app level fairness in queue A, user level fairness in queue B, and throughput focus in all other queues” • Node anti-affinity – “Do not run two copies of my service daemon on the same machine” • Gang scheduling – “Run all of my app at once” • Dynamic scheduling of containers based on actual utilization • Stabilized App Reservations – “Create a reservation for my app with X resources to run at 6AM tomorrow” • Time based policies – “10% cluster capacity for queue A from 6-9AM, but 20% from 9AM-12PM” • Prioritized queues – Admin’s queue takes precedence over everything else • Lots more …
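The time based policy quoted on the scheduling slide amounts to a lookup from time of day to queue capacity. The following is a minimal sketch of that lookup, not a YARN feature as shipped: `Window`, `capacity_at`, and the 5% fallback are hypothetical, and the slide's second window is read here as 9AM to noon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start_hour: int      # inclusive, 24-hour clock
    end_hour: int        # exclusive
    capacity_pct: float  # share of the cluster during this window


def capacity_at(hour: int, windows: list, default_pct: float) -> float:
    """Return the queue's capacity share for a given hour of the day."""
    for window in windows:
        if window.start_hour <= hour < window.end_hour:
            return window.capacity_pct
    return default_pct


# "10% cluster capacity for queue A from 6-9AM, but 20% from 9" to noon.
QUEUE_A = [Window(6, 9, 10.0), Window(9, 12, 20.0)]

for hour in (7, 9, 13):
    print(hour, capacity_at(hour, QUEUE_A, default_pct=5.0))
```

Using half-open windows avoids ambiguity at 9AM, where the two policies on the slide meet.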
  • 38. Future: More Resource Types • Node level Isolation and Cluster level Scheduling • Disks – Space – IOPS: Read/Write • Network – Incoming bandwidth – Outgoing bandwidth
  • 39. Thank you! Sandbox: Hadoop in a VM! Questions Time!

Editor's Notes

  1. YARN is not the first general Resource Management platform. So what’s different? It’s data!
  2. Queues reflect org structures. Hierarchical in nature.