Page1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Apache Hadoop YARN
Yet Another Resource Negotiator
Page © Hortonworks Inc. 2014
Quick Bio
• Hadoop user for ~3 years
• One of the co-authors of Apache Hadoop YARN
• Originally used Hadoop for location-based services
• Destination Prediction
• Traffic Analysis
• Effects of weather at client locations on call center call types
• Pending Patent in Automotive/Telematics domain
• Defensive Paper on M2M Validation
• Started on analytics to be better at an MMORPG
Page © Hortonworks Inc. 2014
Agenda
• Hadoop History
• Hadoop 1 Recap
• What is YARN
• MapReduce on YARN
• Multi-Workload, Multi-Tenant
• Example YARN App
• YARN Futures
• Short Demo
• Takeaways & QA
Page © Hortonworks Inc. 2014
History Lesson
•Requirement #1: Scalability –
•The next-generation compute platform should scale horizontally to tens of thousands of
nodes and concurrent applications
•Phase 0: The Era of Ad Hoc Clusters
•Per User, Ingress & Egress every time
•No data persisted on HDFS
•Phase 1: Hadoop on Demand (HOD)
•Private ‘spin-up, spin-down processing’ Clusters on Shared Commodity Hardware
•Data persisted on HDFS as shared service
•Phase 2: Dawn of Shared Compute Cluster
•Multi-Tenant shared MapReduce & HDFS
•Phase 3: Emergence of YARN
•Multi-Tenant, Multi-Workload, Beyond MapReduce
Page © Hortonworks Inc. 2014
Hadoop 1 Recap
Page © Hortonworks Inc. 2014
Hadoop MapReduce Classic
• JobTracker
• TaskTracker
• Tasks
Page7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
MapReduce Classic: Limitations
• Scalability
• Maximum Cluster size – 4,000 nodes
• Maximum concurrent tasks – 40,000
• Availability
• Failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
• Low resource utilization
• Lacks support for alternate paradigms and services
• Iterative applications implemented using MapReduce are 10x slower
Page © Hortonworks Inc. 2014
What is YARN
Page © Hortonworks Inc. 2014
What is YARN?
•Cluster Operating System
•Enables Generic Data Processing Tasks with ‘Containers’
•Big Compute (Metal Detectors) for Big Data (Haystack)
•Resource Manager
•Global resource scheduler
•Node Manager
•Per-machine agent
•Manages container life-cycle & resource monitoring
•Application Master
•Per-application master that manages application scheduling and task execution
•E.g. MapReduce Application Master
•Container
•Basic unit of allocation
•Fine-grained resource allocation across multiple resource types
•(memory, cpu, disk, network, gpu etc.)
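To make those pieces concrete, here is a minimal sketch of a client asking the ResourceManager to launch an ApplicationMaster. The application name, command, and memory size are illustrative assumptions, not part of this deck:

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

// Connect to the ResourceManager
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(new YarnConfiguration());
yarnClient.start();

// Ask the RM for a new application id
YarnClientApplication app = yarnClient.createApplication();
ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
appContext.setApplicationName("hello-yarn");

// The ApplicationMaster is just a command the NodeManager runs in a container
ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
amContainer.setCommands(Collections.singletonList("/bin/date"));
appContext.setAMContainerSpec(amContainer);

// Fine-grained resource ask for the AM container
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(256);
appContext.setResource(capability);

yarnClient.submitApplication(appContext);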
Page © Hortonworks Inc. 2014
YARN what is it good for?
•Compute for Data Processing
•Compute for Embarrassingly Parallel Problems
•Problems with tiny datasets and/or tasks that don’t depend on one another
•e.g. Exhaustive Search, Trade Simulations, Climate Models, Genetic Algorithms
•Beyond MapReduce
•Enables Multi Workload Compute Applications on a Single Shared Infrastructure
•Stream Processing, NoSQL, Search, InMemory, Graphs, etc
•ANYTHING YOU CAN START FROM CLI!
•Slider & Code Reuse
•Run existing applications on YARN: HBase on YARN, Storm on YARN
•Reuse existing Java code in containers, making serial applications parallel
Page © Hortonworks Inc. 2014
Multi-workload Processing
HADOOP 1.0: Single Use System (Batch Apps)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)

HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• HDFS2 (redundant, reliable storage)
• YARN (cluster resource management)
• MapReduce & Others (data processing)
Page12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Beyond MapReduce
YARN : Data Operating System

DATA ACCESS
• Batch: MapReduce
• Script: Pig
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• Others: In-Memory Analytics, ISV engines

DATA MANAGEMENT
• Storage: HDFS (Hadoop Distributed File System), spanning nodes 1 … N

GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon
• Ingest & egress: Sqoop, Flume, NFS, WebHDFS

SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS / Resources: YARN / Access: Hive, … / Pipeline: Falcon / Cluster: Knox

OPERATIONS
• Provision, Manage & Monitor: Ambari
• Coordination: Zookeeper
• Scheduling: Oozie
Page © Hortonworks Inc. 2014
MapReduce on YARN
Page © Hortonworks Inc. 2014
Apache Hadoop MapReduce on YARN
• Original use-case
• Most complex application to build
• Data-locality
• Fault tolerance
• ApplicationMaster recovery: Check point to HDFS
• Intra-application Priorities: Maps vs. Reduces
• Security
• Isolation
• Binary compatible with Apache Hadoop 1.x
Page © Hortonworks Inc. 2014
Efficiency Gains of MRv2
• Key Optimizations
• No hard segmentation of resource into map and reduce slots
• YARN scheduler is more efficient
• The MRv2 framework has become more efficient than MRv1; for instance, the shuffle phase in MRv2 is more performant thanks to a different web server
• Yahoo! has over 30,000 nodes running YARN across over 365 PB of data
• They calculate running about 400,000 jobs per day for about 10 million hours of compute time
• They have also estimated a 60% – 150% improvement in node usage per day
Page © Hortonworks Inc. 2014
An Example Calculating Node Capacity
Important Parameters
– mapreduce.[map|reduce].memory.mb
  – The physical RAM hard-limit enforced by Hadoop on the task
– mapreduce.[map|reduce].java.opts
  – The JVM heap size (-Xmx)
– yarn.scheduler.minimum-allocation-mb
  – The smallest container YARN will allocate
– yarn.nodemanager.resource.memory-mb
  – The amount of physical RAM on the node available to containers
Page © Hortonworks Inc. 2014
Calculating Node Capacity Continued
• Let’s say we need a 1g map and a 2g reduce
• mapreduce.[map|reduce].java.opts = [-Xmx1g | -Xmx2g]
• Remember a container has more overhead than just your heap!
• Add 512mb to the container limit for overhead
• mapreduce.[map|reduce].memory.mb = [1536 | 2560]
• We have 36g per node and minimum allocations of 512mb
• yarn.nodemanager.resource.memory-mb=36864
• yarn.scheduler.minimum-allocation-mb=512
• Our 36g node can support
• 24 Maps OR 14 Reducers OR any combination allowed by the resources on the node
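Sanity-checking that capacity math with the settings above:

yarn.nodemanager.resource.memory-mb = 36864
36864 / 1536 (map container)    = 24 concurrent maps
36864 / 2560 (reduce container) = 14.4 → 14 concurrent reduces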
Page © Hortonworks Inc. 2014
Multi-Workload, Multi-Tenant
© Hortonworks Inc. 2012
YARN as OS for Data Lake
• ResourceManager (Scheduler) allocates containers across a grid of NodeManagers
• Batch (MapReduce): map1.1, map1.2, reduce1.1
• Interactive SQL (Tez vertices): vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2
• Real-Time (Storm): nimbus0, nimbus1, nimbus2
All three workloads run side by side in containers on the same NodeManagers.
© Hortonworks Inc. 2012
Multi-Tenant YARN
ResourceManager (Scheduler) with hierarchical queues; each level’s capacities sum to 100% of the parent:
• root: Adhoc 10%, DW 60%, Mrkting 30%
• Mid-level queues: Dev 10% / Reserved 20% / Prod 70%, and Prod 80% / Dev 20%
• Leaf queues under a Prod queue: P0 70%, P1 30%
© Hortonworks Inc. 2013
Multi-Tenancy with CapacityScheduler
• Queues
• Economics as queue-capacity
– Hierarchical Queues
• SLAs
– Preemption
• Resource Isolation
– Linux: cgroups
– MS Windows: Job Control
– Roadmap: Virtualization (Xen, KVM)
• Administration
– Queue ACLs
– Run-time re-configuration for queues
– Charge-back
Capacity Scheduler – Hierarchical Queues
ResourceManager (Scheduler):
• root: Adhoc 10%, DW 70%, Mrkting 20%
• Mid-level queues: Dev 10% / Reserved 20% / Prod 70%, and Prod 80% / Dev 20%
• Leaf queues under a Prod queue: P0 70%, P1 30%
Capacity Scheduler Configuration
(Diagram legend: root queue and sub-queues, each showing its guaranteed queue capacity and max queue capacity.)

ROOT
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=adhoc,batch,prod

ADHOC (guaranteed 25%, max 50%)
yarn.scheduler.capacity.root.adhoc.acl_submit_applications=*
yarn.scheduler.capacity.root.adhoc.capacity=25
yarn.scheduler.capacity.root.adhoc.maximum-capacity=50
yarn.scheduler.capacity.root.adhoc.state=RUNNING
yarn.scheduler.capacity.root.adhoc.user-limit-factor=2

BATCH (guaranteed 25%, max 75%)
yarn.scheduler.capacity.root.batch.capacity=25
yarn.scheduler.capacity.root.batch.maximum-capacity=75

PROD (guaranteed 50%)
yarn.scheduler.capacity.root.prod.acl_administer_queue=yarn
yarn.scheduler.capacity.root.prod.acl_submit_applications=yarn,mapred
yarn.scheduler.capacity.root.prod.capacity=50
yarn.scheduler.capacity.root.prod.queues=reports,ops

PROD - Reports (guaranteed 80% of prod, max 100% of cluster)
yarn.scheduler.capacity.root.prod.reports.state=RUNNING
yarn.scheduler.capacity.root.prod.reports.capacity=80
yarn.scheduler.capacity.root.prod.reports.maximum-capacity=100
yarn.scheduler.capacity.root.prod.reports.user-limit-factor=3
yarn.scheduler.capacity.root.prod.reports.minimum-user-limit-percent=20
yarn.scheduler.capacity.root.prod.reports.maximum-applications=1

PROD - Ops (guaranteed 20% of prod, max 50% of cluster)
yarn.scheduler.capacity.root.prod.ops.capacity=20
yarn.scheduler.capacity.root.prod.ops.maximum-capacity=50
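Queue definitions like these live in capacity-scheduler.xml and can be changed at run time; a minimal sketch of the refresh step, assuming the file has already been updated on the ResourceManager host:

yarn rmadmin -refreshQueues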
Page © Hortonworks Inc. 2014
An Example YARN App
Page © Hortonworks Inc. 2014
Moya – Memcached on YARN
•Proof-of-concept project
•Minimum effort
•Used Distributed Shell as a skeleton
•GitHub:
•https://github.com/josephxsxn/moya
•Today
–Launches N jmemcached server daemons
–Provides configuration information via ZooKeeper
Page © Hortonworks Inc. 2014
Moya Architecture
• ResourceManager (Scheduler) launches AM 1 on a NodeManager
• AM 1 negotiates Containers 1.1, 1.2, and 1.3 on other NodeManagers, each running a memcached daemon
• ZooKeeper Quorum (ZK 1, ZK 2, ZK 3) holds configuration info; the AM and containers heartbeat to it
• A program using a Memcache client reads the member list from ZooKeeper and sends memcached requests directly to the containers
Page © Hortonworks Inc. 2014
What’s inside the Moya AppMaster?
•Negotiates for all other application containers
//Request containers
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(requestPriority);
// Set up resource type requirements
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);
//Memory Req, Hosts, Rack, Priority, Number of Containers
ContainerRequest containerAsk = new ContainerRequest(capability, null, null, pri, numContainers);
resourceManager.addContainerRequest(containerAsk);

//Resource Manager calls us back with a list of allocatedContainers
// Launch each container by creating a ContainerLaunchContext
for (Container allocatedContainer : allocatedContainers) {
  ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
  //Setup command details on next slide
  nmClient.startContainer(allocatedContainer, ctx);
}
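For context, a minimal sketch (not from the deck) of how resourceManager is typically wired up with the synchronous AMRMClient; conf, hostname, port, and trackingUrl are assumed to exist:

// Create, initialize, and start the AM-to-RM client
AMRMClient<ContainerRequest> resourceManager = AMRMClient.createAMRMClient();
resourceManager.init(conf);
resourceManager.start();
// Register so the RM knows where this ApplicationMaster lives
resourceManager.registerApplicationMaster(hostname, port, trackingUrl);
// Heartbeat the RM; the response carries any newly allocated containers
AllocateResponse response = resourceManager.allocate(0.0f);
List<Container> allocatedContainers = response.getAllocatedContainers();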
Page © Hortonworks Inc. 2014
Moya AppMaster Continued
•Sets up the Container, Container Environment, Initial Commands, and Local Resources

//Setup local resources like our runnable jar
Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
LocalResource libsJarRsrc = Records.newRecord(LocalResource.class);
libsJarRsrc.setType(LocalResourceType.FILE);
libsJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
//Get the path which was provided by the Client when it placed the libs on HDFS
libsJarRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(libsPath)));
localResources.put("Runnable.jar", libsJarRsrc);
ctx.setLocalResources(localResources);

//Setup the environment for the container: classpath is the framework classpath plus the container dir
Map<String, String> env = new HashMap<String, String>();
StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$()).append(File.pathSeparatorChar).append("./*");
env.put("CLASSPATH", classPathEnv.toString());
ctx.setEnvironment(env);

//Initial commands to start up the runnable jar
Vector<CharSequence> vargs = new Vector<CharSequence>(5);
vargs.add(Environment.JAVA_HOME.$() + "/bin/java -jar Runnable.jar");

//Convert vargs to strings and add to the launch context
List<String> commands = new ArrayList<String>();
for (CharSequence arg : vargs) { commands.add(arg.toString()); }
ctx.setCommands(commands);
Page © Hortonworks Inc. 2014
A Look at the Moya Memcached Container
•Simple!
•Joins the ZooKeeper Moya group with hostname and port as the member name
//initialize the server
daemon = new MemCacheDaemon<LocalCacheElement>();
CacheStorage<Key, LocalCacheElement> storage;
InetSocketAddress c = new InetSocketAddress(8555);
storage = ConcurrentLinkedHashMap.create(
    ConcurrentLinkedHashMap.EvictionPolicy.FIFO, 15000, 67108864);
daemon.setCache(new CacheImpl(storage));
daemon.setBinary(false);
daemon.setAddr(c);
daemon.setIdleTime(120);
daemon.setVerbose(true);
daemon.start();
//StartJettyTest.main(new String[] {}); // What's this?
// Add self to the ZooKeeper /moya/ group
JoinGroup.main(new String[]
    { "172.16.165.155:2181", "moya",
      InetAddress.getLocalHost().getHostName() + ":" + c.getPort() });
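And the client side of the architecture diagram, as a hypothetical sketch using the spymemcached library; the host and key names are assumptions, and any memcached client would do, since the containers speak the standard protocol:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

// Connect to a Moya container discovered via the ZooKeeper /moya/ group
MemcachedClient client = new MemcachedClient(new InetSocketAddress("container-host", 8555));
client.set("greeting", 3600, "hello from YARN"); // key, TTL in seconds, value
Object value = client.get("greeting");
client.shutdown();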
Page © Hortonworks Inc. 2014
YARN Futures
Page © Hortonworks Inc. 2014
Targeted Futures (*asterisks included free)
•YARN Node Labels (YARN-796)
•Needed for long-lived services
•Apache Slider*
•A framework to support deployment and management of arbitrary applications on YARN
•HBase on YARN, Storm on YARN
•Ambari Deploys YARN HA*
•CPU Scheduling (YARN-2)*
•Helping enable Storm and HBase on YARN scheduling
•CGroups Resource Isolation across RAM and CPU (YARN-3)*
•Application Timeline Server (ATS) goes GA (YARN-321)*
•Enable generic data collection captured in YARN apps
•MRv2 Integration with ATS (MAPREDUCE-5858)*
•Docker Container Executor (YARN-1964)
•Work Preserving Restart (YARN-1489)
Page © Hortonworks Inc. 2014
A Short Demo
Page © Hortonworks Inc. 2014
Preemption / Ambari REST Multi Tenant Load Demo
• Multiple workloads hitting queues with & without preemption
• Multi Tenant Queues
  • adhoc (default): min 25%, max 50%
  • batch: min 25%, max 75%
  • prod: min 50%
    • prod.reports: min 80% of prod, max 100% of cluster
    • prod.ops: min 20% of prod, max 50% of cluster
• Demonstrate cluster automation with Ambari REST API
• Scheduler Changes & Refresh
• YARN Configuration Changes
• YARN Restart
• Launching MR Jobs
(Diagram: preemption operations order)
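The Ambari calls behind that automation look roughly like this; a hedged sketch, assuming placeholder host, cluster name, and credentials (state INSTALLED stops a service, STARTED starts it):

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop YARN"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/mycluster/services/YARN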
Page © Hortonworks Inc. 2014
Takeaways & QA
Page © Hortonworks Inc. 2014
Thank You!