Apache Hadoop YARN
Yet Another Resource Negotiator
Presented at the Toronto Hadoop User Group, 2014 – YARN Roadmap

Quick Bio
• Hadoop user for ~3 years
• One of the co-authors of Apache Hadoop YARN
• Originally used Hadoop for location-based services
  – Destination prediction
  – Traffic analysis
  – Effects of weather at client locations on call-center call types
• Pending patent in the automotive/telematics domain
• Defensive paper on M2M validation
• Started on analytics to get better at an MMORPG

Agenda
• Hadoop History
• Hadoop 1 Recap
• What is YARN
• MapReduce on YARN
• Multi-Workload, Multi-Tenant
• Example YARN App
• YARN Futures
• Short Demo
• Takeaways & QA

History Lesson
• Requirement #1: Scalability
  – The next-generation compute platform should scale horizontally to tens of thousands of nodes and concurrent applications
• Phase 0: The era of ad hoc clusters
  – Per user; ingress & egress every time; no data persisted on HDFS
• Phase 1: Hadoop on Demand (HOD)
  – Private "spin-up, spin-down" processing clusters on shared commodity hardware
  – Data persisted on HDFS as a shared service
• Phase 2: Dawn of the shared compute cluster
  – Multi-tenant shared MapReduce & HDFS
• Phase 3: Emergence of YARN
  – Multi-tenant, multi-workload, beyond MapReduce

Hadoop 1 Recap

Hadoop MapReduce Classic
• JobTracker
• TaskTracker
• Tasks

MapReduce Classic: Limitations
• Scalability
  – Maximum cluster size: 4,000 nodes
  – Maximum concurrent tasks: 40,000
• Availability
  – A JobTracker failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
  – Low resource utilization
• Lacks support for alternate paradigms and services
  – Iterative applications implemented using MapReduce are 10x slower

What is YARN

What is YARN?
• Cluster operating system
  – Enables generic data-processing tasks with "containers"
  – Big Compute (metal detectors) for Big Data (the haystack)
• ResourceManager
  – Global resource scheduler
• NodeManager
  – Per-machine agent
  – Manages the life-cycle of containers & resource monitoring
• ApplicationMaster
  – Per-application master that manages application scheduling and task execution
  – E.g., the MapReduce ApplicationMaster
• Container
  – Basic unit of allocation
  – Fine-grained resource allocation across multiple resource types (memory, CPU, disk, network, GPU, etc.)
(A minimal client sketch of these pieces follows.)
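
To make the roles concrete, here is a minimal sketch (not from the deck) that uses the Hadoop 2.x YarnClient API to ask the ResourceManager for the running NodeManagers and the container resources each one advertises:

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListNodes {
  public static void main(String[] args) throws Exception {
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());   // reads yarn-site.xml from the classpath
    yarnClient.start();
    // One NodeReport per NodeManager known to the ResourceManager
    for (NodeReport node : yarnClient.getNodeReports(NodeState.RUNNING)) {
      System.out.println(node.getNodeId() + " -> "
          + node.getCapability().getMemory() + " MB, "
          + node.getCapability().getVirtualCores() + " vcores");
    }
    yarnClient.stop();
  }
}
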
YARN: What is it good for?
• Compute for data processing
• Compute for embarrassingly parallel problems
  – Problems with tiny datasets and/or pieces that don't depend on one another
  – E.g., exhaustive search, trade simulations, climate models, genetic algorithms
• Beyond MapReduce
  – Enables multi-workload compute applications on a single shared infrastructure
  – Stream processing, NoSQL, search, in-memory, graphs, etc.
  – Anything you can start from the CLI!
• Slider & code reuse
  – Run existing applications on YARN: HBase on YARN, Storm on YARN
  – Reuse existing Java code in containers, making serial applications parallel

Multi-workload Processing
• Hadoop 1.0 – single-use system (batch apps)
  – HDFS (redundant, reliable storage)
  – MapReduce (cluster resource management & data processing)
• Hadoop 2.0 – multi-purpose platform (batch, interactive, online, streaming, …)
  – HDFS2 (redundant, reliable storage)
  – YARN (cluster resource management)
  – MapReduce and others (data processing)

Beyond MapReduce
(Diagram: the Hadoop 2.x "Data Operating System" stack – YARN over HDFS, surrounded by:)
• Operations – provision, manage & monitor: Ambari, ZooKeeper; scheduling: Oozie
• Governance & integration – data workflow, lifecycle & governance: Falcon, Sqoop, Flume, NFS, WebHDFS
• Data access – batch: MapReduce; script: Pig; SQL: Hive/Tez, HCatalog; NoSQL: HBase, Accumulo; stream: Storm; search: Solr; others: in-memory, analytics, ISV engines
• Security – authentication, authorization, accounting, data protection across storage (HDFS), resources (YARN), access (Hive, …), pipeline (Falcon), cluster (Knox)

MapReduce on YARN

Apache Hadoop MapReduce on YARN
• The original use case
• The most complex application to build
  – Data locality
  – Fault tolerance
  – ApplicationMaster recovery: checkpoint to HDFS
  – Intra-application priorities: maps vs. reduces
  – Security
  – Isolation
• Binary compatible with Apache Hadoop 1.x

Efficiency Gains of MRv2
• Key optimizations
  – No hard segmentation of resources into map and reduce slots
  – The YARN scheduler is more efficient
  – The MRv2 framework is more efficient than MRv1; for instance, the shuffle phase in MRv2 performs better because it uses a different web server
• Yahoo has over 30,000 nodes running YARN across over 365 PB of data
  – They calculate running about 400,000 jobs per day for about 10 million hours of compute time
  – They have also estimated a 60%–150% improvement in node usage per day

An Example: Calculating Node Capacity
Important parameters:
• mapreduce.[map|reduce].memory.mb – the physical RAM hard limit enforced by Hadoop on the task
• mapreduce.[map|reduce].java.opts – the heap size (-Xmx) of the task JVM
• yarn.scheduler.minimum-allocation-mb – the smallest container YARN will allocate
• yarn.nodemanager.resource.memory-mb – the amount of physical RAM on the node made available to containers

Calculating Node Capacity, Continued
• Let's pretend we need a 1 GB map and a 2 GB reduce
  – mapreduce.[map|reduce].java.opts = [-Xmx1g | -Xmx2g]
• Remember, a container has more overhead than just your heap!
  – Add 512 MB to the container limit for overhead
  – mapreduce.[map|reduce].memory.mb = [1536 | 2560]
• We have 36 GB per node and minimum allocations of 512 MB
  – yarn.nodemanager.resource.memory-mb = 36864
  – yarn.scheduler.minimum-allocation-mb = 512
• Our 36 GB node can therefore support
  – 24 maps OR 14 reducers OR any combination allowed by the resources on the node
(The settings are collected as configuration below.)
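
Collected as configuration, the worked example looks roughly like this (values from the slide; the usual Hadoop 2.x split between mapred-site.xml and yarn-site.xml is assumed):

# mapred-site.xml
mapreduce.map.memory.mb      = 1536     # 1 GB heap + 512 MB overhead
mapreduce.map.java.opts      = -Xmx1g
mapreduce.reduce.memory.mb   = 2560     # 2 GB heap + 512 MB overhead
mapreduce.reduce.java.opts   = -Xmx2g

# yarn-site.xml
yarn.nodemanager.resource.memory-mb  = 36864    # 36 GB offered to containers
yarn.scheduler.minimum-allocation-mb = 512

# Resulting capacity per node:
#   36864 / 1536 = 24 map containers, or
#   36864 / 2560 = 14 reduce containers (14.4, rounded down),
#   or any mix that fits within 36864 MB.
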
Multi-Workload, Multi-Tenant

YARN as the OS for the Data Lake
(Diagram: a single ResourceManager/Scheduler allocates containers across many NodeManagers for three workloads at once – batch MapReduce tasks (map 1.1, map 1.2, reduce 1.1), interactive SQL vertices (vertex 1.1.1–1.2.2), and real-time Storm nimbus daemons (nimbus0–nimbus2).)

Multi-Tenant YARN
(Diagram: the ResourceManager's Scheduler divides the cluster with hierarchical queues.)
• root: Adhoc 10%, DW 60%, Mrkting 30%
• Under DW: Dev 10%, Reserved 20%, Prod 70%
• Under Mrkting: Dev 20%, Prod 80%
• Under Prod: P0 70%, P1 30%

Multi-Tenancy with the CapacityScheduler
• Queues
  – Economics as queue capacity
  – Hierarchical queues
• SLAs
  – Preemption
• Resource isolation
  – Linux: cgroups
  – MS Windows: Job Control
  – Roadmap: virtualization (Xen, KVM)
• Administration
  – Queue ACLs
  – Run-time reconfiguration of queues
  – Charge-back
(The preemption and cgroups settings are sketched below; the queue layout itself is on the next slide.)
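
For reference, a hedged sketch of the stock yarn-site.xml settings behind the preemption and cgroups bullets above; the class names are the implementations that ship with Hadoop 2.x. Queue definitions themselves (next slide) can then be re-read at runtime with yarn rmadmin -refreshQueues.

# Preemption: let the CapacityScheduler reclaim capacity for under-served queues
yarn.resourcemanager.scheduler.monitor.enable = true
yarn.resourcemanager.scheduler.monitor.policies = org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy

# Resource isolation on Linux via cgroups
yarn.nodemanager.container-executor.class = org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
yarn.nodemanager.linux-container-executor.resources-handler.class = org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler
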
Capacity Scheduler Configuration
(Diagram legend: each queue has a guaranteed capacity and a maximum capacity; sub-queues divide their parent's share.)

ROOT
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=adhoc,batch,prod

ADHOC (guaranteed 25%, max 50%)
yarn.scheduler.capacity.root.adhoc.acl_submit_applications=*
yarn.scheduler.capacity.root.adhoc.capacity=25
yarn.scheduler.capacity.root.adhoc.maximum-capacity=50
yarn.scheduler.capacity.root.adhoc.state=RUNNING
yarn.scheduler.capacity.root.adhoc.user-limit-factor=2

BATCH (guaranteed 25%, max 75%)
yarn.scheduler.capacity.root.batch.capacity=25
yarn.scheduler.capacity.root.batch.maximum-capacity=75

PROD (guaranteed 50%)
yarn.scheduler.capacity.root.prod.acl_administer_queue=yarn
yarn.scheduler.capacity.root.prod.acl_submit_applications=yarn,mapred
yarn.scheduler.capacity.root.prod.capacity=50
yarn.scheduler.capacity.root.prod.queues=reports,ops

PROD – REPORTS (80% of prod, max 100% of cluster)
yarn.scheduler.capacity.root.prod.reports.state=RUNNING
yarn.scheduler.capacity.root.prod.reports.capacity=80
yarn.scheduler.capacity.root.prod.reports.maximum-capacity=100
yarn.scheduler.capacity.root.prod.reports.user-limit-factor=3
yarn.scheduler.capacity.root.prod.reports.minimum-user-limit-percent=20
yarn.scheduler.capacity.root.prod.reports.maximum-applications=1

PROD – OPS (20% of prod, max 50% of cluster)
yarn.scheduler.capacity.root.prod.ops.capacity=20
yarn.scheduler.capacity.root.prod.ops.maximum-capacity=50

An Example YARN App

Moya – Memcached on YARN
• Proof-of-concept project, minimum effort
• Used the Distributed Shell example as a skeleton
• GitHub: https://github.com/josephxsxn/moya
• Today:
  – Launches N jmemcached server daemons
  – Provides configuration information via ZooKeeper

Moya Architecture
(Diagram: the ResourceManager's Scheduler launches the Moya ApplicationMaster (AM 1), which starts memcached containers (1.1, 1.2, 1.3) on NodeManagers; the containers register with a ZooKeeper quorum (ZK 1–3), which the AM uses for configuration info and heartbeats; a client program using a memcache client looks up the members and sends memcached requests to the containers directly.)

What's inside the Moya AppMaster?
• Negotiates for all of the application's other containers (client setup is sketched after this slide)

// Request containers
Priority pri = Records.newRecord(Priority.class);
pri.setPriority(requestPriority);

// Set up resource type requirements
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(containerMemory);

// Memory requirement, hosts, racks, priority, number of containers
ContainerRequest containerAsk =
    new ContainerRequest(capability, null, null, pri, numContainers);
resourceManager.addContainerRequest(containerAsk);

// The ResourceManager calls us back with a list of allocatedContainers.
// Launch each one by creating a ContainerLaunchContext for it.
for (Container allocatedContainer : allocatedContainers) {
    ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
    // ... command and resource setup on the next slide ...
    nmClient.startContainer(allocatedContainer, ctx);
}
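
The slide picks up mid-stream; for context, a hedged, snippet-style sketch of the ApplicationMaster bootstrap that has to happen first, using the synchronous AMRMClient/NMClient helpers from the Hadoop 2.x yarn-client module (host, port, and tracking URL below are placeholders):

import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

YarnConfiguration conf = new YarnConfiguration();

// Client used to talk to the ResourceManager (register, request, heartbeat)
AMRMClient<ContainerRequest> resourceManager = AMRMClient.createAMRMClient();
resourceManager.init(conf);
resourceManager.start();
// Register so the RM starts honoring our allocate() heartbeats
resourceManager.registerApplicationMaster("appmaster-host", 0, "");   // placeholders

// Client used to talk to NodeManagers (startContainer on the slide)
NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);
nmClient.start();

// Containers granted by the RM come back on later allocate() calls:
// List<Container> allocatedContainers = resourceManager.allocate(0.0f).getAllocatedContainers();
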
Moya AppMaster, Continued
• Sets up the container: local resources, environment, and initial commands (a client-side upload sketch follows)

// Set up local resources, like our runnable jar
Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
LocalResource libsJarRsrc = Records.newRecord(LocalResource.class);
libsJarRsrc.setType(LocalResourceType.FILE);
libsJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);

// Get the path the Client provided when it placed the libs on HDFS
libsJarRsrc.setResource(ConverterUtils.getYarnUrlFromURI(new URI(libsPath)));
localResources.put("Runnable.jar", libsJarRsrc);
ctx.setLocalResources(localResources);

// Set up the environment for the container
Map<String, String> env = new HashMap<String, String>();
StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$())
    .append(File.pathSeparatorChar).append("./*");

// Initial command to start the runnable jar
Vector<CharSequence> vargs = new Vector<CharSequence>(5);
vargs.add(Environment.JAVA_HOME.$() + "/bin/java -jar Runnable.jar");

// Convert vargs to a String and add it to the launch context
ctx.setCommands(commands);
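
One detail the slide glosses over: besides the URL, a LocalResource normally also carries the file's size and timestamp, which the client captures when it uploads the jar. A hedged sketch of that client-side step (the HDFS path and class name are placeholders, not Moya's actual layout):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class UploadRunnableJar {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Client side: copy the runnable jar to HDFS so NodeManagers can localize it
    Path dst = new Path(fs.getHomeDirectory(), "moya/Runnable.jar");   // placeholder path
    fs.copyFromLocalFile(new Path("Runnable.jar"), dst);
    FileStatus status = fs.getFileStatus(dst);

    // The LocalResource needs URL, size, and timestamp so the NodeManager can
    // verify that what it downloads matches what the client uploaded.
    LocalResource libsJarRsrc = Records.newRecord(LocalResource.class);
    libsJarRsrc.setType(LocalResourceType.FILE);
    libsJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
    libsJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(dst));
    libsJarRsrc.setSize(status.getLen());
    libsJarRsrc.setTimestamp(status.getModificationTime());
  }
}
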
A Look at the Moya Memcached Container
• Simple!
• Joins the ZooKeeper Moya group with hostname and port as the member name

// Initialize the jmemcached server
daemon = new MemCacheDaemon<LocalCacheElement>();
CacheStorage<Key, LocalCacheElement> storage;
InetSocketAddress c = new InetSocketAddress(8555);
storage = ConcurrentLinkedHashMap.create(
    ConcurrentLinkedHashMap.EvictionPolicy.FIFO, 15000, 67108864);
daemon.setCache(new CacheImpl(storage));
daemon.setBinary(false);
daemon.setAddr(c);
daemon.setIdleTime(120);
daemon.setVerbose(true);
daemon.start();
//StartJettyTest.main(new String[] {}); // What's this?

// Add self to the ZooKeeper /moya/ group
JoinGroup.main(new String[] {
    "172.16.165.155:2181",
    "moya",
    InetAddress.getLocalHost().getHostName() + ":" + c.getPort() });
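
To close the loop from the architecture slide: once a client has looked up a member's host and port from the ZooKeeper /moya group, any memcached client library can talk to it. A hedged sketch using spymemcached (host, key, and class name are illustrative):

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MoyaClientExample {
  public static void main(String[] args) throws Exception {
    // Host and port would normally be read from the ZooKeeper /moya group membership
    MemcachedClient client =
        new MemcachedClient(new InetSocketAddress("container-host", 8555));
    client.set("greeting", 3600, "hello from YARN").get();   // key, TTL in seconds, value
    System.out.println(client.get("greeting"));
    client.shutdown();
  }
}
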
YARN Futures

Targeted Futures (*asterisks included free)
• YARN node labels (YARN-796)
  – Needed for long-lived services
• Apache Slider*
  – A framework to support deployment and management of arbitrary applications on YARN
  – HBase on YARN, Storm on YARN
• Ambari deploys YARN HA*
• CPU scheduling (YARN-2)*
  – Helps enable Storm and HBase on YARN scheduling
• cgroups resource isolation across RAM and CPU (YARN-3)*
• Application Timeline Server (ATS) goes GA (YARN-321)*
  – Enables generic data collection captured in YARN apps
• MRv2 integration with ATS (MAPREDUCE-5858)*
• Docker container executor (YARN-1964)
• Work-preserving restart (YARN-1489)

A Short Demo

Preemption / Ambari REST Multi-Tenant Load Demo
• Multiple workloads hitting queues with & without preemption
• Multi-tenant queues:
  – adhoc (default): min 25%, max 50%
  – batch: min 25%, max 75%
  – prod: min 50%
    – prod.reports: min 80% of prod, max 100% of cluster
    – prod.ops: min 20% of prod, max 50% of cluster
• Demonstrate cluster automation with the Ambari REST API (example calls sketched below)
  – Scheduler changes & refresh
  – YARN configuration changes
  – YARN restart
  – Launching MR jobs
(Diagram: preemption operations order.)
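
As a rough illustration of the Ambari REST calls this kind of automation scripts against (the host, cluster name, and credentials below are placeholders; the payloads follow Ambari's documented service API):

# Stop YARN (desired state INSTALLED), e.g. before pushing new scheduler configuration
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop YARN"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/mycluster/services/YARN

# Start YARN again once the configuration change is in place
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Start YARN"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/mycluster/services/YARN
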
Takeaways & QA

Thank You!
