
Introduction to YARN Apps

The new YARN framework promises to make Hadoop a general-purpose platform for Big Data and enterprise data hub applications. In this talk, you'll learn about writing and taking advantage of applications built on YARN.

Published in: Technology

Transcript of "Introduction to YARN Apps"

  1. Intro to YARN Apps
     Sandy Ryza
  2. Introduction
     •  What’s YARN?
     •  YARN apps
     •  Building YARN apps
  3. The OS analogy
     Traditional Operating System
     Storage: File System
     Execution/Scheduling: Processes/Kernel Scheduler
  4. The OS analogy
     Hadoop
     Storage: Hadoop Distributed File System (HDFS)
     Execution/Scheduling: YARN!
  5. Goal: Multitenancy
     •  Different types of applications on the same cluster
     •  Different users and organizations on the same cluster
  6. ResourceManager (RM)
     •  Central service that tracks
        o  Nodes
           §  Resources
        o  Applications
        o  Containers
     •  Houses scheduler, which is in charge of all container placement decisions
  7. NodeManager (NM)
     •  One on every node
     •  Launches container processes
     •  Enforces resource allocations
     •  Monitors liveness
  8. Application Master (AM)
     •  User/application code
     •  Every application instance has one
     •  Runs inside a container on the cluster
     •  Requests resources from ResourceManager
  9. YARN
     [Diagram: Client, ResourceManager, JobHistory Server, and two NodeManagers whose Containers run the Application Master, a Map Task, and a Reduce Task]
  10. Processing Frameworks / YARN apps
      •  MapReduce
         o  Batch processing, fault tolerant
      •  Impala
         o  Low latency SQL on Hadoop
      •  Spark
         o  Load data into memory, great for iterative algorithms
      •  Storm
         o  Stream processing
  11. YARN app models
      •  Application master (AM) per job
         •  Simplest for batch
         •  Used by MapReduce
  12. YARN app models
      •  Application master per session
         •  Runs multiple jobs on behalf of the same user
         •  Recently added in Tez
         •  Spark interactive mode
  13. YARN app models
      •  Singleton AM as permanent service
         •  Always on, waits around for jobs to come in
         •  Used for Impala
  14. YARN/MR Scheduling
      ResourceManager (Fair Scheduler): decides which jobs to give resources to
      MapReduce Application Master: decides which tasks to give resources to within a job
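The first of the two levels above, deciding which jobs get resources, can be illustrated with a toy max-min fair split. This is only a sketch of the general idea of fair sharing, not the Fair Scheduler's actual algorithm or code; `FairShareSketch` and `allocate` are hypothetical names, and containers are assumed to be identical and indivisible.

```java
import java.util.Arrays;

/** Toy max-min fair sharing. Illustrative only; not the Fair Scheduler's code. */
final class FairShareSketch {
    /**
     * Splits `capacity` identical containers among jobs with the given demands.
     * No job receives more than it asked for; capacity a small job does not
     * need is shared among the jobs that still want more.
     */
    static int[] allocate(int capacity, int[] demands) {
        int[] granted = new int[demands.length];
        int remaining = capacity;
        boolean progress = true;
        while (remaining > 0 && progress) {
            progress = false;
            // Each round hands at most one container to every unsatisfied job.
            for (int i = 0; i < demands.length && remaining > 0; i++) {
                if (granted[i] < demands[i]) {
                    granted[i]++;
                    remaining--;
                    progress = true;
                }
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        // 10 containers, three jobs asking for 2, 8 and 8.
        System.out.println(Arrays.toString(allocate(10, new int[] {2, 8, 8})));
        // prints [2, 4, 4]
    }
}
```

The second level, choosing which tasks inside a job run in the containers it was granted, is left to the application master and is not modelled here.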
  15. Scheduling on Hadoop
      [Diagram: Application Master 1, Application Master 2, ResourceManager, Node 1, Node 2, Node 3]
  16. Scheduling on Hadoop
      Application Master 1, to the ResourceManager: “I want 2 containers with 1024 MB and 1 core each”
  17. Scheduling on Hadoop
      ResourceManager: “Noted”
  18. Scheduling on Hadoop
      A node, to the ResourceManager: “I’m still here”
  19. Scheduling on Hadoop
      ResourceManager: “I’ll reserve some space on node1 for AM1”
  20. Scheduling on Hadoop
      Application Master 1, to the ResourceManager: “Got anything for me?”
  21. Scheduling on Hadoop
      ResourceManager, to Application Master 1: “Here’s a security token to let you launch a container on Node 1”
  22. Scheduling on Hadoop
      Application Master 1, to the node: “Hey, launch my container with this shell command”
  23. Scheduling on Hadoop
      [Diagram: same layout, with a Container now running on a node]
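The exchange in slides 15 to 23 can be sketched as a small in-memory model. This is a toy under stated assumptions, not the YARN API: `SchedulingSketch`, `Grant`, `request`, `nodeHeartbeat`, and `amHeartbeat` are hypothetical names, the "token" is just a string, and placement is simple first-fit on whichever node heartbeats.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the request / heartbeat / allocate exchange. Not the YARN API. */
final class SchedulingSketch {
    /** A granted slice of a node, plus a stand-in for the security token. */
    static final class Grant {
        final String node; final int memoryMb; final int cores; final String token;
        Grant(String node, int memoryMb, int cores, String token) {
            this.node = node; this.memoryMb = memoryMb; this.cores = cores; this.token = token;
        }
    }

    /** One outstanding container request from an application master. */
    private static final class Ask {
        final String am; final int memoryMb; final int cores;
        Ask(String am, int memoryMb, int cores) {
            this.am = am; this.memoryMb = memoryMb; this.cores = cores;
        }
    }

    private final Deque<Ask> pending = new ArrayDeque<>();
    private final Map<String, int[]> free = new HashMap<>();        // node -> {memoryMb, cores}
    private final Map<String, List<Grant>> ready = new HashMap<>(); // AM -> grants not yet picked up
    private int nextToken = 0;

    void addNode(String node, int memoryMb, int cores) {
        free.put(node, new int[] {memoryMb, cores});
    }

    /** AM to RM: "I want `count` containers of this size." The RM only notes it. */
    void request(String am, int count, int memoryMb, int cores) {
        for (int i = 0; i < count; i++) {
            pending.add(new Ask(am, memoryMb, cores));
        }
    }

    /** Node to RM: "I'm still here." The RM places pending asks that fit on this node. */
    void nodeHeartbeat(String node) {
        int[] cap = free.get(node);
        if (cap == null) {
            return; // unknown node
        }
        int n = pending.size();
        for (int i = 0; i < n; i++) {
            Ask ask = pending.poll();
            if (ask.memoryMb <= cap[0] && ask.cores <= cap[1]) {
                cap[0] -= ask.memoryMb;
                cap[1] -= ask.cores;
                ready.computeIfAbsent(ask.am, k -> new ArrayList<>())
                     .add(new Grant(node, ask.memoryMb, ask.cores, "token-" + nextToken++));
            } else {
                pending.add(ask); // does not fit here; keep waiting for another heartbeat
            }
        }
    }

    /** AM to RM: "Got anything for me?" Returns and clears the grants made so far. */
    List<Grant> amHeartbeat(String am) {
        List<Grant> grants = ready.remove(am);
        return grants == null ? new ArrayList<>() : grants;
    }

    public static void main(String[] args) {
        SchedulingSketch rm = new SchedulingSketch();
        rm.addNode("node1", 1024, 1);
        rm.addNode("node2", 4096, 4);
        rm.request("AM1", 2, 1024, 1);
        System.out.println(rm.amHeartbeat("AM1").size()); // 0: nothing is granted until a node heartbeats
        rm.nodeHeartbeat("node1");
        rm.nodeHeartbeat("node2");
        for (Grant g : rm.amHeartbeat("AM1")) {
            System.out.println(g.node + " " + g.memoryMb + " MB " + g.token);
        }
    }
}
```

The point of the model is the ordering: a request alone yields nothing, allocation happens when a node reports in, and the application master learns about it only on its next heartbeat.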
  24. Should you build a YARN app?
      •  MapReduce can’t run arbitrary DAGs?
         o  Use Spark
  25. Should you build a YARN app?
      •  MapReduce can’t store data in memory?
         o  Use Spark
  26. Should you build a YARN app?
      •  Iterative processing?
         o  Use Spark
  27. Should you build a YARN app?
      •  Have an existing distributed app that runs all tasks at once?
         o  Use distributed shell
  28. When to build a YARN app
      •  Allocating and releasing containers dynamically
      •  Weird scheduling requirements
         o  Gang
         o  Complex locality
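As one illustration of a "gang" requirement, an application master might hold the containers it is given and start nothing until the whole gang is available. The sketch below is a toy under that assumption; `GangSketch` and its methods are hypothetical, containers are represented by plain string ids, and a real application master would return the surplus containers to the ResourceManager rather than append them to a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy gang launcher (hypothetical names): start nothing until the whole gang is held. */
final class GangSketch {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();
    private final List<String> launched = new ArrayList<>();
    private final List<String> released = new ArrayList<>();

    GangSketch(int gangSize) {
        this.gangSize = gangSize;
    }

    /** Called whenever the scheduler hands over newly allocated containers. */
    void onContainersAllocated(List<String> containerIds) {
        for (String id : containerIds) {
            if (launched.isEmpty() && held.size() < gangSize) {
                held.add(id);     // still collecting the gang
            } else {
                released.add(id); // surplus: hand it back instead of sitting on it
            }
        }
        if (launched.isEmpty() && held.size() == gangSize) {
            launched.addAll(held); // the whole gang starts together
            held.clear();
        }
    }

    List<String> launched() { return launched; }
    List<String> released() { return released; }
    int waiting() { return held.size(); }

    public static void main(String[] args) {
        GangSketch gang = new GangSketch(3);
        gang.onContainersAllocated(Arrays.asList("c1", "c2"));
        System.out.println(gang.launched()); // prints []
        gang.onContainersAllocated(Arrays.asList("c3", "c4"));
        System.out.println(gang.launched()); // prints [c1, c2, c3]
        System.out.println(gang.released()); // prints [c4]
    }
}
```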
  29. What YARN does for you
      •  Deploys your bits
      •  Runs your processes
      •  Monitors your processes
      •  Kills your processes when they misbehave
  30. What YARN does not do for you
      •  Communication between your processes
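  31. AMRMClientAsync
      import java.util.List;
      import org.apache.hadoop.net.NetUtils;
      import org.apache.hadoop.yarn.api.records.Container;
      import org.apache.hadoop.yarn.api.records.Priority;
      import org.apache.hadoop.yarn.api.records.Resource;
      import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
      import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

      // Invoked by the client when the ResourceManager allocates containers.
      AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
        public void onContainersAllocated(List<Container> containers) {
          for (Container container : containers) {
            startTask(container);
          }
        }
        [... more methods]
      };
      // Heartbeat to the ResourceManager every 1000 ms.
      // (The client must be init()-ed with a Configuration and start()-ed before use.)
      AMRMClientAsync<ContainerRequest> amClient =
          AMRMClientAsync.createAMRMClientAsync(1000, handler);
      amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");
      // Ask for a container with 1024 MB and 1 core, preferring node1/node2 or rack1.
      amClient.addContainerRequest(
          new ContainerRequest(
              Resource.newInstance(1024, 1),
              new String[] {"node1", "node2"},
              new String[] {"rack1"},
              Priority.newInstance(2)));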
  32. NMClientAsync
      import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

      NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
        [... listen for containers stopped and started]
      };
      NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);
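  33. Launching Containers
      import java.util.Arrays;
      import org.apache.hadoop.yarn.api.records.Container;
      import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

      public void startContainer(Container container) {
        // localResources, environment, serviceData, tokens and acls are prepared elsewhere.
        ContainerLaunchContext launchContext =
            ContainerLaunchContext.newInstance(
                localResources, environment, Arrays.asList("sleep 1000"),
                serviceData, tokens, acls);
        nmClient.startContainerAsync(container, launchContext);
      }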
  34. Local resources
      [Diagram: two Nodes, each hosting two Containers and a local copy of file.txt; file.txt is also stored in HDFS]
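The local-resources picture can be read as: a file kept in shared storage is copied to each node that runs a container needing it. The toy model below shows that reading, assuming a copy already on a node can be reused by later containers there (in YARN this depends on how the resource is declared). `LocalizationSketch`, `localize`, and `downloads` are hypothetical names, not part of any YARN API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of per-node localization with reuse. Not the YARN API. */
final class LocalizationSketch {
    private final Map<String, Set<String>> nodeCache = new HashMap<>(); // node -> files present locally
    private int downloads = 0;

    /**
     * Makes `file` available to a container starting on `node`.
     * The file is fetched from shared storage only if the node lacks a copy.
     */
    void localize(String node, String file) {
        Set<String> cached = nodeCache.computeIfAbsent(node, k -> new HashSet<>());
        if (cached.add(file)) {
            downloads++; // the first container on this node pays for the copy
        }
    }

    /** Number of copies pulled from shared storage so far. */
    int downloads() {
        return downloads;
    }

    public static void main(String[] args) {
        LocalizationSketch cluster = new LocalizationSketch();
        cluster.localize("node1", "file.txt"); // first container on node1
        cluster.localize("node1", "file.txt"); // second container reuses the copy
        cluster.localize("node2", "file.txt"); // node2 needs its own copy
        System.out.println(cluster.downloads()); // prints 2
    }
}
```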