SlideShare a Scribd company logo
Rohith Sharma K S & Sunil G
Apache Hadoop PMC
Hortonworks Inc
Apache Hadoop YARN:
Present and Future 2017
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About.html
Rohith Sharma K S
 Apache Hadoop PMC and Committer
 Sr.Software Engineer @ Hortonworks
 7+ years for being “Hadooper”
Sunil G
 Apache Hadoop PMC and Committer
 Sr.Software Engineer @ Hortonworks
 Overall 10+ experience in enterprises software
development (5+ years for being “Hadooper”)
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
 Introduction
 Past
 Present & Future
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Compute Platform – Today and Tomorrow
 It’s all about data!
 Layers that enable applications and
higher order frameworks that interact
with data
 Multi-colored YARN
– Apps
– Long running services
 Admins and admin tools (Ambari) for
cluster management and monitoring
https://www.flickr.com/photos/happyskrappy/15699919424
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Why?
• Different asks from different actors
• On isolation, capacity allocations, scheduling
Faster!
More! Best for my cluster
Throughput
Utilization
Elasticity
Service uptime
Security
ROIEverything! Right now!
SLA!
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Compute Platform – Today and Tomorrow
Platform Services
Storage
Resource
Management
Service
Discovery Management
Monitoring
Alerts
Holiday Assembly
HBase
HTTP
IOT Assembly
Kafk
a
Storm HBase Solr
Security
Governance
MR Tez
Spark
Hive / Pig
LLAP
Flink
REEF
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Past: A quick history
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: Pre GA
• Sub-project of Apache Hadoop
• Releases tied to Hadoop releases
• Alphas and betas
– In production at several large sites for MapReduce already by that time
June-July 2010 August 2011 May 2012 August 2013
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 1/3
2.2 2.3 2.4 2.5 2.6 2.7
15 October 2013 24 February 2014 07 April 2014 11 August 2014
• 1st GA
• MR binary
compatibility
• YARN API
cleanup
• Testing!
• 1st Post GA
• Bug fixes
• Alpha features
• RM Fail-over
• CS
Preemption
• Timeline
Service V1
• Writable
REST APIs
• Timeline
Service V1
security
• Rolling
Upgrades
• Services
• Node labels
18 November 2014
• Moving to
JDK 7+
• Pluggable
YARN
authentication
21 Apr 2015
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 2/3
2.7.2 2.7.3 2.6.5 2.7.4 2.8.0 2.8.1
25 January 2016 25 August 2016 18 October 2016 04 August 2017
• Stabilization • Stabilization
• All community
parties
brought
together
• Stabilization • Stabilization • Application
Priority
• Reservations
• Node labels
improvements
22 March 2017
• Stabilization
• Next major
release
08 June 2017
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
A brief Timeline: GA Releases 3/3
3.0.0-alpha1 3.0.0-alpha2 3.0.0-alpha3 3.0.0-alpha4
03 September 2016 25 January 2017 26 May 2017 04 August 2017
• First alpha! • Second alpha! • Third alpha! • Fourth alpha!
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Present & Future
Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page 13
Containerization
Containers
GPUs /
FPGAs
More
powerful
scheduling
Much
faster
scheduling
Scale
SLAs
Usability
Service
workloads
Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Last few Hadoop releases
Apache Hadoop 2.8.x
Apache Hadoop 3.0.x
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 2.8.x
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Application priorities
• YARN-1963
• Within a leaf-queue
FIFO Policy App 1 App 2 App 3 App 4
FIFO Policy
With priorities
App 1
P8
App 2
P4
App 3
P3
App 4
P2
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Queue priorities
 Today
– Give to the least satisfied queue first
 With priorities
– Give to the highest priority queue first
root
A
20% Configured Capacity
But 5% of used Capacity
Usage = 5/20 = 25%
B
80% Configured Capacity
But 8% of used Capacity
Usage 8/80 = 10%
Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Preemption within a queue
• Between apps of different priorities
• Between apps of different users
App 1 P3 App 2 P2 App 3 P1
App 1 U1 App 2 U2 App 3 U3
App 1 P3
App 2 P2
Candidates for
preemption
Queue Full
App 2 U2
App 1 U1
Candidates for
preemption
Queue Full
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Per-queue Policy-driven scheduling
Previously Now
Ingestion
FIFO
Adhoc
User-fairness
Adhoc
FIFO
Ingestion
FIFO
• Coarse policies
• One scheduling algorithm in the cluster
• Rigid
• Difficult to experiment
• Fine grained policies
• One scheduling algorithm per queue
• Flexible
• Very easy to experiment!
Batch
FIFO
Batch
FIFO
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Reservations
• “Run my workload tomorrow at 6AM”
• Persistence of the plans with RM failover: YARN-2573
Timeline
Resources
6:00AM
Block #1
Timeline
Resources
6:00AM
Block #1
Block #2
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache Hadoop 3.x
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Looking at the Scale!
 Tons of sites with clusters made up of multiple thousands of nodes
– Yahoo!, Twitter, LinkedIn, Microsoft
 Largest clusters the last couple of years
– 6K-8K
 Roadmap: To 100K thousands and beyond
 Current progress: 40K nodes!
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Moving towards Global & Fast Scheduling
 Effort led by Wangda Tan from Hortonworks.
 Problems
– Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions.
– Several coarse grained locks
 Current effort made us where we improved to
– Look at several nodes at a time
– Fine grained locks
– Multiple allocator threads
– YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour!
– 10X throughput gains with enhancement added recently
– Much better placement decisions
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Capacities in numbers vs percentages
root
(100%)
Sales
(30%)
Engineering
(65%)
Default
(5%)
Dev
(50%)
QE
(50%)
root
[memory=100Gi,vcores=100]
Sales
[memory=30Gi,vcores
=30]
Engineering
[memory=65Gi, vcores=60]
Default
[memory=1Gi,
vcores=1]
Dev
[memory=30Gi,
vcores=50]
QE
[memory=35Gi,
vcores=10]
 Current model using percentages has worked well for various use-cases
 Few challenges
– percentage based capacity is resilient to any changes in the cluster resource
– one percentage value for all resource-types
 YARN-5881
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Resource profiles and custom resource types
 Past
– Supports only Memory and CPU
 Now
– A generalized vector
– Admins can create custom Resource Types!
– Ease of resource requesting model using
profiles
NodeManager
Memory
CPU
GPU
NodeManager
Memory
CPU
GPU
ResourceManager
Small
Medium
Large
Profile Memory CPU GPU
Small 2 GB 4 Cores 1 Cores
Medium 4 GB 8 Cores 1 Cores
Large 16 GB 16 Cores 4 Cores Application Master
Small (3 containers)
Medium (2 containers)
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
GPUs on YARN!
 GPU can speed up compute-intensive applications 10x - 300x times
 Different levels of support
– Take me to a machine where GPUs are available with Partitions / Node Labels
– Take me to a machine where GPUs are available
• give me a full device only to me for the lifetime of my container
• give me multiple full devices only to me for the lifetime of my container
• give me full device(s) only to me for a portion of the lifetime of my container
• give me a slice of device(s) to me for a full / portion of the lifetime of my
container
 More dimensions:
– CPUs and memory and GPUs and on-GPU memory
– Topology of multiple GPUs
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
YARN Federation!
 Enables applications to scale to 100k of thousands of nodes
 Federation divides a large (10-100k nodes) cluster into smaller units called sub-clusters
 Federation negotiates with sub-clusters RM’s and provide resources to the application
 Applications can schedule tasks on any node
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Better placement strategies
 Affinity
 Anti-affinity
 Node Constraints
HBase
Storm
YARN
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Packaging
 Containers
– Lightweight mechanism for packaging and resource isolation
– Popularized and made accessible by Docker
– Can replace VMs in some cases
– Or more accurately, VMs got used in places where they didn’t
need to be
 Native integration ++ in YARN
– Support for “Container Runtimes” in LCE: YARN-3611
– Process runtime
– Docker runtime
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplified APIs for service definitions
 Applications need simple APIs
 Need to be deployable “easily”
 Simple REST API layer fronting YARN
– YARN-4793 Simplified API layer for services and beyond
 Spawn services & Manage them
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services support
 Application & Services upgrades
– ”Do an upgrade of my Spark / HBase apps with minimal impact to end-users”
– YARN-4726
 Simplified discovery of services via DNS mechanisms: YARN-4757
– regionserver30.hbase-app-3.0.hadoop.yarn.site
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Services Framework
 Platform is only as good as the tools
 A native YARN services framework
– YARN-4692
– [Umbrella] Native YARN framework layer for services and
beyond
 Assembly: Supporting a DAG of apps:
– SLIDER-875
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
API based queue management
Decentralized
Improved logs
management
Live application logs
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
User experience
New web UI
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service
 Application History
– “Where did my containers run?”
– “Why is my application slow?”
– “Is it really slow?”
– “Why is my application failing?”
– “What happened with my application?
Succeeded?”
 Cluster History
– Run analytics on historical apps!
– “User with most resource utilization”
– “Largest application run”
– “Why is my cluster slow?”
– “Why is my cluster down?”
– “What happened in my clusters?”
 Collect and use past data
– To schedule “my application” better
– To do better capacity planning
37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Timeline Service 2.0
• Next generation
– Today’s solution helped us understand the space
– Limited scalability and availability
• “Analyzing Hadoop Clusters is becoming a big-data problem”
– Don’t want to throw away the Hadoop application metadata
– Large scale
– Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rouge applications. Now.”
• Timeline data stored in HBase and accessible to queries
38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BoF’s: Apache Hadoop – YARN, HDFS
Sanjay Radia
Hortonworks
Sunil G
Hortonworks
https://dataworkssummit.com/sydney-2017/birds-of-a-feather/apache-hadoop-yarn-hdfs
Thursday September 21st Room C2.1 6.00PM
-- Related Session --
Rohith Sharma K S
Hortonworks
39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank you!

More Related Content

What's hot

Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
DataWorks Summit/Hadoop Summit
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
DataWorks Summit/Hadoop Summit
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
DataWorks Summit/Hadoop Summit
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
DataWorks Summit
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
DataWorks Summit/Hadoop Summit
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
DataWorks Summit
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
DataWorks Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
DataWorks Summit
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
DataWorks Summit
 
Creating the Internet of Your Things
Creating the Internet of Your ThingsCreating the Internet of Your Things
Creating the Internet of Your Things
DataWorks Summit/Hadoop Summit
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
DataWorks Summit
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
Hortonworks
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
Hortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution
Hortonworks
 

What's hot (20)

Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Why is my Hadoop* job slow?
Why is my Hadoop* job slow?Why is my Hadoop* job slow?
Why is my Hadoop* job slow?
 
Running Zeppelin in Enterprise
Running Zeppelin in EnterpriseRunning Zeppelin in Enterprise
Running Zeppelin in Enterprise
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Apache Hadoop Crash Course
Apache Hadoop Crash CourseApache Hadoop Crash Course
Apache Hadoop Crash Course
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
Its Finally Here! Building Complex Streaming Analytics Apps in under 10 min w...
 
Protecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against DisastersProtecting your Critical Hadoop Clusters Against Disasters
Protecting your Critical Hadoop Clusters Against Disasters
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
 
Creating the Internet of Your Things
Creating the Internet of Your ThingsCreating the Internet of Your Things
Creating the Internet of Your Things
 
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3Deep learning on yarn  running distributed tensorflow etc on hadoop cluster v3
Deep learning on yarn running distributed tensorflow etc on hadoop cluster v3
 
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
Hortonworks Protegrity Webinar: Leverage Security in Hadoop Without Sacrifici...
 
Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014Splunk-hortonworks-risk-management-oct-2014
Splunk-hortonworks-risk-management-oct-2014
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Edw Optimization Solution
Edw Optimization Solution Edw Optimization Solution
Edw Optimization Solution
 

Similar to YARN - Past, Present, & Future

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Vinod Kumar Vavilapalli
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Yahoo Developer Network
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
alanfgates
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
DataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Robin Cloud Platform for Big Data and Databases
Robin Cloud Platform for Big Data and DatabasesRobin Cloud Platform for Big Data and Databases
Robin Cloud Platform for Big Data and Databases
Robin Systems
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
StampedeCon
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
DataWorks Summit
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache Ambari
Jayush Luniya
 

Similar to YARN - Past, Present, & Future (20)

Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration storyApache Hadoop 3 updates with migration story
Apache Hadoop 3 updates with migration story
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and FutureHadoop Summit San Jose 2015: YARN - Past, Present and Future
Hadoop Summit San Jose 2015: YARN - Past, Present and Future
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Robin Cloud Platform for Big Data and Databases
Robin Cloud Platform for Big Data and DatabasesRobin Cloud Platform for Big Data and Databases
Robin Cloud Platform for Big Data and Databases
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache AmbariManaging Enterprise Hadoop Clusters with Apache Ambari
Managing Enterprise Hadoop Clusters with Apache Ambari
 

More from DataWorks Summit

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 

YARN - Past, Present, & Future

  • 1. Rohith Sharma K S & Sunil G Apache Hadoop PMC Hortonworks Inc Apache Hadoop YARN: Present and Future 2017
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About.html Rohith Sharma K S  Apache Hadoop PMC and Committer  Sr.Software Engineer @ Hortonworks  7+ years for being “Hadooper” Sunil G  Apache Hadoop PMC and Committer  Sr.Software Engineer @ Hortonworks  Overall 10+ experience in enterprises software development (5+ years for being “Hadooper”)
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Agenda  Introduction  Past  Present & Future
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hadoop Compute Platform – Today and Tomorrow  It’s all about data!  Layers that enable applications and higher order frameworks that interact with data  Multi-colored YARN – Apps – Long running services  Admins and admin tools (Ambari) for cluster management and monitoring https://www.flickr.com/photos/happyskrappy/15699919424
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Why? • Different asks from different actors • On isolation, capacity allocations, scheduling Faster! More! Best for my cluster Throughput Utilization Elasticity Service uptime Security ROIEverything! Right now! SLA!
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hadoop Compute Platform – Today and Tomorrow Platform Services Storage Resource Management Service Discovery Management Monitoring Alerts Holiday Assembly HBase HTTP IOT Assembly Kafk a Storm HBase Solr Security Governance MR Tez Spark Hive / Pig LLAP Flink REEF
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Past: A quick history
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: Pre GA • Sub-project of Apache Hadoop • Releases tied to Hadoop releases • Alphas and betas – In production at several large sites for MapReduce already by that time June-July 2010 August 2011 May 2012 August 2013
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 1/3 2.2 2.3 2.4 2.5 2.6 2.7 15 October 2013 24 February 2014 07 April 2014 11 August 2014 • 1st GA • MR binary compatibility • YARN API cleanup • Testing! • 1st Post GA • Bug fixes • Alpha features • RM Fail-over • CS Preemption • Timeline Service V1 • Writable REST APIs • Timeline Service V1 security • Rolling Upgrades • Services • Node labels 18 November 2014 • Moving to JDK 7+ • Pluggable YARN authentication 21 Apr 2015
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 2/3 2.7.2 2.7.3 2.6.5 2.7.4 2.8.0 2.8.1 25 January 2016 25 August 2016 18 October 2016 04 August 2017 • Stabilization • Stabilization • All community parties brought together • Stabilization • Stabilization • Application Priority • Reservations • Node labels improvements 22 March 2017 • Stabilization • Next major release 08 June 2017
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved A brief Timeline: GA Releases 3/3 3.0.0-alpha1 3.0.0-alpha2 3.0.0-alpha3 3.0.0-alpha4 03 September 2016 25 January 2017 26 May 2017 04 August 2017 • First alpha! • Second alpha! • Third alpha! • Fourth alpha!
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Present & Future
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Page 13 Containerization Containers GPUs / FPGAs More powerful scheduling Much faster scheduling Scale SLAs Usability Service workloads
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Last few Hadoop releases Apache Hadoop 2.8.x Apache Hadoop 3.0.x
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 2.8.x
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Application priorities • YARN-1963 • Within a leaf-queue FIFO Policy App 1 App 2 App 3 App 4 FIFO Policy With priorities App 1 P8 App 2 P4 App 3 P3 App 4 P2
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Queue priorities  Today – Give to the least satisfied queue first  With priorities – Give to the highest priority queue first root A 20% Configured Capacity But 5% of used Capacity Usage = 5/20 = 25% B 80% Configured Capacity But 8% of used Capacity Usage 8/80 = 10%
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Preemption within a queue • Between apps of different priorities • Between apps of different users App 1 P3 App 2 P2 App 3 P1 App 1 U1 App 2 U2 App 3 U3 App 1 P3 App 2 P2 Candidates for preemption Queue Full App 2 U2 App 1 U1 Candidates for preemption Queue Full
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Per-queue Policy-driven scheduling Previously Now Ingestion FIFO Adhoc User-fairness Adhoc FIFO Ingestion FIFO • Coarse policies • One scheduling algorithm in the cluster • Rigid • Difficult to experiment • Fine grained policies • One scheduling algorithm per queue • Flexible • Very easy to experiment! Batch FIFO Batch FIFO
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Reservations • “Run my workload tomorrow at 6AM” • Persistence of the plans with RM failover: YARN-2573 Timeline Resources 6:00AM Block #1 Timeline Resources 6:00AM Block #1 Block #2
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache Hadoop 3.x
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Looking at the Scale!  Tons of sites with clusters made up of multiple thousands of nodes – Yahoo!, Twitter, LinkedIn, Microsoft  Largest clusters the last couple of years – 6K-8K  Roadmap: To 100K thousands and beyond  Current progress: 40K nodes!
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Moving towards Global & Fast Scheduling  Effort led by Wangda Tan from Hortonworks.  Problems – Current design of one-node-at-a-time allocation cycle can lead to suboptimal decisions. – Several coarse grained locks  Current effort made us where we improved to – Look at several nodes at a time – Fine grained locks – Multiple allocator threads – YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour! – 10X throughput gains with enhancement added recently – Much better placement decisions
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Capacities in numbers vs percentages root (100%) Sales (30%) Engineering (65%) Default (5%) Dev (50%) QE (50%) root [memory=100Gi,vcores=100] Sales [memory=30Gi,vcores =30] Engineering [memory=65Gi, vcores=60] Default [memory=1Gi, vcores=1] Dev [memory=30Gi, vcores=50] QE [memory=35Gi, vcores=10]  Current model using percentages has worked well for various use-cases  Few challenges – percentage based capacity is resilient to any changes in the cluster resource – one percentage value for all resource-types  YARN-5881
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Resource profiles and custom resource types  Past – Supports only Memory and CPU  Now – A generalized vector – Admins can create custom Resource Types! – Ease of resource requesting model using profiles NodeManager Memory CPU GPU NodeManager Memory CPU GPU ResourceManager Small Medium Large Profile Memory CPU GPU Small 2 GB 4 Cores 1 Cores Medium 4 GB 8 Cores 1 Cores Large 16 GB 16 Cores 4 Cores Application Master Small (3 containers) Medium (2 containers)
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved GPUs on YARN!  GPU can speed up compute-intensive applications 10x - 300x times  Different levels of support – Take me to a machine where GPUs are available with Partitions / Node Labels – Take me to a machine where GPUs are available • give me a full device only to me for the lifetime of my container • give me multiple full devices only to me for the lifetime of my container • give me full device(s) only to me for a portion of the lifetime of my container • give me a slice of device(s) to me for a full / portion of the lifetime of my container  More dimensions: – CPUs and memory and GPUs and on-GPU memory – Topology of multiple GPUs
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved YARN Federation!  Enables applications to scale to 100k of thousands of nodes  Federation divides a large (10-100k nodes) cluster into smaller units called sub-clusters  Federation negotiates with sub-clusters RM’s and provide resources to the application  Applications can schedule tasks on any node
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Better placement strategies  Affinity  Anti-affinity  Node Constraints HBase Storm YARN
  • 29. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Packaging  Containers – Lightweight mechanism for packaging and resource isolation – Popularized and made accessible by Docker – Can replace VMs in some cases – Or more accurately, VMs got used in places where they didn’t need to be  Native integration ++ in YARN – Support for “Container Runtimes” in LCE: YARN-3611 – Process runtime – Docker runtime
  • 30. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplified APIs for service definitions  Applications need simple APIs  Need to be deployable “easily”  Simple REST API layer fronting YARN – YARN-4793 Simplified API layer for services and beyond  Spawn services & Manage them
  • 31. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Services support  Application & Services upgrades – ”Do an upgrade of my Spark / HBase apps with minimal impact to end-users” – YARN-4726  Simplified discovery of services via DNS mechanisms: YARN-4757 – regionserver30.hbase-app-3.0.hadoop.yarn.site
  • 32. 32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Services Framework  Platform is only as good as the tools  A native YARN services framework – YARN-4692 – [Umbrella] Native YARN framework layer for services and beyond  Assembly: Supporting a DAG of apps: – SLIDER-875
  • 33. 33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience API based queue management Decentralized Improved logs management Live application logs
  • 34. 34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience New web UI
  • 35. 35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved User experience New web UI
  • 36. 36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Timeline Service  Application History – “Where did my containers run?” – “Why is my application slow?” – “Is it really slow?” – “Why is my application failing?” – “What happened with my application? Succeeded?”  Cluster History – Run analytics on historical apps! – “User with most resource utilization” – “Largest application run” – “Why is my cluster slow?” – “Why is my cluster down?” – “What happened in my clusters?”  Collect and use past data – To schedule “my application” better – To do better capacity planning
  • 37. 37 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Timeline Service 2.0 • Next generation – Today’s solution helped us understand the space – Limited scalability and availability • “Analyzing Hadoop Clusters is becoming a big-data problem” – Don’t want to throw away the Hadoop application metadata – Large scale – Enable near real-time analysis: “Find me the user who is hammering the FileSystem with rouge applications. Now.” • Timeline data stored in HBase and accessible to queries
  • 38. 38 © Hortonworks Inc. 2011 – 2016. All Rights Reserved BoF’s: Apache Hadoop – YARN, HDFS Sanjay Radia Hortonworks Sunil G Hortonworks https://dataworkssummit.com/sydney-2017/birds-of-a-feather/apache-hadoop-yarn-hdfs Thursday September 21st Room C2.1 6.00PM -- Related Session -- Rohith Sharma K S Hortonworks
  • 39. 39 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank you!

Editor's Notes

  1. For new people, 10%
  2. Appplication centric
  3. Happens to be polarized User vs Administrators More : existing tools I know, give me something more Admin : Cross workloads, need to monitor and get best out it.
  4. Categories of recent initiatives