YARN Code Overview
Ocular bleeding is no reason to stop programming!

© Hortonworks Inc. 2013

Quick Bio – Joseph Niemiec
• Hadoop user for 2+ years
• 1 of 5 authors of Apache Hadoop YARN (March 2014)
• Originally used Hadoop for location-based services
– Destination Prediction
– Traffic Analysis
– Effects of weather at client locations on call center call types

• Pending Patent in Automotive/Telematics domain
• Defensive Paper on M2M Validation
• Started on analytics to be better at an MMORPG

Agenda
• What Is YARN
• YARN Concepts & Architecture
• Code and more Code
• Q&A

From Batch To Anything
• HADOOP 1.0: Single Use System (Batch Apps)
– MapReduce (cluster resource management & data processing)
– HDFS (redundant, reliable storage)

• HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
– MapReduce and Others (data processing)
– YARN (cluster resource management)
– HDFS2 (redundant, reliable storage)

Concepts
• Application
–Application is a job submitted to the framework
–Examples
– Map Reduce Job
– MoYa Cluster

• Container
–Basic unit of allocation
–Fine-grained resource allocation across multiple resource
types (memory, cpu, disk, network, gpu etc.)
– container_0 = 2GB, 1CPU
– container_1 = 1GB, 6 CPU

–Replaces the fixed map/reduce slots
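
The container idea above can be sketched in a few lines of plain Java. This is a simplified model, not the Hadoop YARN classes: the names ContainerSketch, Resource and Node, and the 8 GB / 8 core node size, are assumptions made up for illustration. It shows a node handing out containers of different shapes until its free capacity runs out, instead of counting fixed map/reduce slots.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model for illustration only; these are NOT the Hadoop YARN classes.
class ContainerSketch {

    // A resource vector: memory in MB plus a number of CPU cores.
    static final class Resource {
        final int memoryMb;
        final int cores;

        Resource(int memoryMb, int cores) {
            this.memoryMb = memoryMb;
            this.cores = cores;
        }

        // True when this ask is no larger than 'other' in every dimension.
        boolean fitsIn(Resource other) {
            return memoryMb <= other.memoryMb && cores <= other.cores;
        }

        Resource minus(Resource other) {
            return new Resource(memoryMb - other.memoryMb, cores - other.cores);
        }

        @Override
        public String toString() {
            return "<" + memoryMb + "MB, " + cores + " core(s)>";
        }
    }

    // A node grants containers of any shape while the ask still fits its free capacity.
    static final class Node {
        private Resource free;
        private final List<Resource> containers = new ArrayList<>();

        Node(Resource capacity) {
            this.free = capacity;
        }

        boolean allocate(Resource ask) {
            if (!ask.fitsIn(free)) {
                return false;
            }
            free = free.minus(ask);
            containers.add(ask);
            return true;
        }

        Resource available() {
            return free;
        }

        int containerCount() {
            return containers.size();
        }
    }

    public static void main(String[] args) {
        Node node = new Node(new Resource(8192, 8)); // hypothetical node: 8 GB, 8 cores
        System.out.println("container_0 granted: " + node.allocate(new Resource(2048, 1)));
        System.out.println("container_1 granted: " + node.allocate(new Resource(1024, 6)));
        System.out.println("third ask granted:   " + node.allocate(new Resource(1024, 2)));
        System.out.println("free after grants:   " + node.available());
    }
}
```

The third ask is refused because only one core is left, even though plenty of memory remains; every resource type has to fit.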

Architecture
• Resource Manager
–Global resource scheduler
–Hierarchical queues

• Node Manager
–Per-machine agent
–Manages the life cycle of containers
–Container resource monitoring

• Application Master
–Per-application
–Manages application scheduling and task execution
–E.g. MapReduce Application Master

To the code!

Q&A

YARN - ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the
ApplicationMaster, provided by Client
– ResourceManager responsible for allocating and launching
ApplicationMaster container

ApplicationSubmissionContext
resourceRequest
containerLaunchContext
appName
queue
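
As a rough sketch of the slide above, the four fields can be modelled in plain Java. This is a stand-in, not the real org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext: the class names, the launchAm method, the memory limit and every value in main are hypothetical. It only shows the shape of the hand-off: the client fills in a complete specification, and the ResourceManager side turns it into one ApplicationMaster container.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for illustration only; this is NOT the Hadoop class
// org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext.
class SubmissionSketch {

    // Mirrors the four fields listed on the slide.
    static final class SubmissionContext {
        final String appName;
        final String queue;
        final int amMemoryMb;          // resourceRequest: memory for the AM container
        final int amCores;             // resourceRequest: cores for the AM container
        final List<String> amCommands; // containerLaunchContext, reduced to its commands

        SubmissionContext(String appName, String queue, int amMemoryMb, int amCores,
                          List<String> amCommands) {
            this.appName = appName;
            this.queue = queue;
            this.amMemoryMb = amMemoryMb;
            this.amCores = amCores;
            this.amCommands = Collections.unmodifiableList(amCommands);
        }
    }

    // Hypothetical ResourceManager-side step: validate the specification, then
    // describe the ApplicationMaster container that would be launched from it.
    static String launchAm(SubmissionContext ctx, int maxMemoryMb) {
        if (ctx.amCommands.isEmpty()) {
            throw new IllegalArgumentException("no command to start the ApplicationMaster");
        }
        if (ctx.amMemoryMb > maxMemoryMb) {
            throw new IllegalArgumentException("AM asks for more memory than allowed");
        }
        return ctx.appName + " on queue " + ctx.queue + ": AM container <" + ctx.amMemoryMb
                + "MB, " + ctx.amCores + " core(s)> runs " + ctx.amCommands;
    }

    public static void main(String[] args) {
        // All values below are made up for the example.
        SubmissionContext ctx = new SubmissionContext(
                "demo-app", "default", 1024, 1,
                Arrays.asList("java", "-Xmx768m", "DemoAppMaster"));
        System.out.println(launchAm(ctx, 8192));
    }
}
```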

YARN – Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by ApplicationMaster to NodeManager to
launch the Container
– Complete specification for a process
– LocalResource used to specify container binary and
dependencies
– NodeManager responsible for downloading from shared namespace
(typically HDFS)

ContainerLaunchContext
container
commands
environment
localResources

LocalResource
uri
type
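
A minimal sketch of the launch context described above, again in plain Java rather than the Hadoop classes. LaunchSketch, its nested types, the plan method and the hdfs:/// paths are all assumptions for illustration. The point it tries to make is the ordering: the NodeManager first localizes every LocalResource from the shared namespace, and only then starts the process with the given environment and commands.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; NOT the Hadoop YARN classes.
class LaunchSketch {

    enum ResourceType { FILE, ARCHIVE }

    static final class LocalResource {
        final String uri;
        final ResourceType type;

        LocalResource(String uri, ResourceType type) {
            this.uri = uri;
            this.type = type;
        }
    }

    static final class LaunchContext {
        final List<String> commands;
        final Map<String, String> environment;
        final Map<String, LocalResource> localResources; // local name -> remote resource

        LaunchContext(List<String> commands, Map<String, String> environment,
                      Map<String, LocalResource> localResources) {
            this.commands = commands;
            this.environment = environment;
            this.localResources = localResources;
        }
    }

    // Hypothetical NodeManager-side plan: localize every resource, then start the process.
    static List<String> plan(LaunchContext ctx) {
        List<String> steps = new ArrayList<>();
        for (Map.Entry<String, LocalResource> e : ctx.localResources.entrySet()) {
            LocalResource r = e.getValue();
            String verb = r.type == ResourceType.ARCHIVE ? "download and unpack " : "download ";
            steps.add(verb + r.uri + " as " + e.getKey());
        }
        steps.add("export " + ctx.environment);
        steps.add("run " + String.join(" ", ctx.commands));
        return steps;
    }

    public static void main(String[] args) {
        // Paths and names below are made up for the example.
        Map<String, LocalResource> resources = new LinkedHashMap<>();
        resources.put("app.jar", new LocalResource("hdfs:///apps/demo/app.jar", ResourceType.FILE));
        resources.put("conf", new LocalResource("hdfs:///apps/demo/conf.tgz", ResourceType.ARCHIVE));
        Map<String, String> env = new LinkedHashMap<>();
        env.put("CLASSPATH", "./*");
        LaunchContext ctx = new LaunchContext(
                Arrays.asList("java", "-cp", "app.jar", "DemoTask"), env, resources);
        for (String step : plan(ctx)) {
            System.out.println(step);
        }
    }
}
```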

YARN – Resource Allocation & Usage
• ResourceRequest

priority   capability      resourceName   numContainers
0          <2gb, 1 core>   host01         1
0          <2gb, 1 core>   rack0          1
0          <2gb, 1 core>   *              1
1          <4gb, 1 core>   *              1
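
The ResourceRequest rows can be sketched as data in plain Java. This assumes the slide's table pairs priority 0 with <2gb, 1 core> on host01, rack0 and *, and priority 1 with <4gb, 1 core> on *. The class RequestSketch and its candidates method are hypothetical, not part of the Hadoop API. The sketch shows how resourceName expresses locality: a row names a host, a rack, or * for any node, so one free node may satisfy several rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model for illustration only; NOT the Hadoop YARN ResourceRequest class.
class RequestSketch {

    static final class Request {
        final int priority;
        final int memoryMb;
        final int cores;
        final String resourceName; // a host name, a rack name, or "*" for anywhere
        final int numContainers;

        Request(int priority, int memoryMb, int cores, String resourceName, int numContainers) {
            this.priority = priority;
            this.memoryMb = memoryMb;
            this.cores = cores;
            this.resourceName = resourceName;
            this.numContainers = numContainers;
        }
    }

    // The four rows from the slide, under the pairing assumed in the lead-in.
    static List<Request> slideTable() {
        return Arrays.asList(
                new Request(0, 2048, 1, "host01", 1),
                new Request(0, 2048, 1, "rack0", 1),
                new Request(0, 2048, 1, "*", 1),
                new Request(1, 4096, 1, "*", 1));
    }

    // Rows that a free node on (host, rack) could serve; "*" matches every node.
    static List<Request> candidates(List<Request> table, String host, String rack) {
        List<Request> out = new ArrayList<>();
        for (Request r : table) {
            boolean matches = r.resourceName.equals("*")
                    || r.resourceName.equals(host)
                    || r.resourceName.equals(rack);
            if (matches) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Request> table = slideTable();
        System.out.println("host01/rack0 can serve " + candidates(table, "host01", "rack0").size() + " rows");
        System.out.println("host02/rack0 can serve " + candidates(table, "host02", "rack0").size() + " rows");
        System.out.println("host09/rack7 can serve " + candidates(table, "host09", "rack7").size() + " rows");
    }
}
```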

YARN – Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of the ResourceRequest provided by ResourceManager
to the ApplicationMaster
– A specific amount of resources (cpu, memory etc.) on a specific
machine
Container
containerId
resourceName
capability

tokens
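
To round off the fields listed above, here is a toy model of a granted Container in plain Java. Nothing here is the Hadoop API: GrantSketch, the container_N id format (borrowed from the earlier Concepts slide), and the hash-based token are assumptions for illustration, and the token is in no way real security. It sketches the idea that a container is a grant for a specific machine, and that the token lets that machine check the grant came from the ResourceManager.

```java
// Simplified model for illustration only; NOT the Hadoop YARN Container class.
class GrantSketch {

    static final class Container {
        final String containerId;
        final String resourceName; // the specific host the grant is for
        final int memoryMb;        // capability
        final int cores;           // capability
        final String token;

        Container(String containerId, String resourceName, int memoryMb, int cores, String token) {
            this.containerId = containerId;
            this.resourceName = resourceName;
            this.memoryMb = memoryMb;
            this.cores = cores;
            this.token = token;
        }
    }

    static final class ResourceManager {
        private final String secret;
        private int next = 0;

        ResourceManager(String secret) {
            this.secret = secret;
        }

        // Answer a request with a container bound to one host.
        Container grant(String host, int memoryMb, int cores) {
            String id = "container_" + next++;
            return new Container(id, host, memoryMb, cores, sign(id, host));
        }

        // Toy token: a hash over a shared secret. Not a real security mechanism.
        String sign(String containerId, String host) {
            return Integer.toHexString((secret + "|" + containerId + "|" + host).hashCode());
        }
    }

    // Hypothetical NodeManager-side check before starting anything.
    static boolean nodeAccepts(ResourceManager rm, String thisHost, Container c) {
        return c.resourceName.equals(thisHost)
                && c.token.equals(rm.sign(c.containerId, c.resourceName));
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager("demo-secret"); // hypothetical secret
        Container c = rm.grant("host01", 2048, 1);
        System.out.println(c.containerId + " on " + c.resourceName);
        System.out.println("host01 accepts: " + nodeAccepts(rm, "host01", c));
        System.out.println("host02 accepts: " + nodeAccepts(rm, "host02", c));
    }
}
```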


Hortonworks Yarn Code Walk Through January 2014

Editor's Notes

  • #5 So while Hadoop 1.x had its uses, this is really about turning Hadoop into the next-generation platform. So what does that mean? A platform should be able to do multiple things, ergo more than just batch processing. It needs Batch, Interactive, Online, and Streaming capabilities to really turn Hadoop into a Next Gen Platform. It SCALES! Yahoo plans to move into a 10k node cluster.
  • #6 Now we have a concept of deploying applications into the Hadoop cluster. These applications run in containers of set resources.
  • #7 The RM takes the place of the JT and still has scheduling queues and such, like the fair, capacity and hierarchical queues.