Introduction to YARN Apps

The new YARN framework promises to make Hadoop a general-purpose platform for Big Data and enterprise data hub applications. In this talk, you'll learn about writing and taking advantage of applications built on YARN.

Transcript

  • 1. Intro to YARN Apps, Sandy Ryza
  • 2. Introduction
      • What’s YARN?
      • YARN apps
      • Building YARN apps
  • 3. The OS analogy: traditional operating system
      • Storage: file system
      • Execution/scheduling: processes/kernel scheduler
  • 4. The OS analogy: Hadoop
      • Storage: Hadoop Distributed File System (HDFS)
      • Execution/scheduling: YARN!
  • 5. Goal: multitenancy
      • Different types of applications on the same cluster
      • Different users and organizations on the same cluster
  • 6. ResourceManager (RM)
      • Central service that tracks
          o Nodes
              § Resources
          o Applications
          o Containers
      • Houses the scheduler, which is in charge of all container placement decisions
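To make the ResourceManager’s bookkeeping on slide 6 concrete, here is a minimal sketch in Java, the language of the code slides later in the deck. It assumes a toy cluster that tracks only memory and vcores, and every class name in it (NodeState, ContainerRecord, ToyResourceManager) is hypothetical rather than part of YARN. The naive first-fit placement in allocate stands in for the real scheduler, which applies much richer policies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified model of what a ResourceManager keeps track of.
// The class names are illustrative only; they are not YARN's real classes.
final class NodeState {
    final String name;
    int freeMemoryMb;
    int freeVcores;

    NodeState(String name, int memoryMb, int vcores) {
        this.name = name;
        this.freeMemoryMb = memoryMb;
        this.freeVcores = vcores;
    }

    boolean fits(int memoryMb, int vcores) {
        return freeMemoryMb >= memoryMb && freeVcores >= vcores;
    }
}

final class ContainerRecord {
    final String id;
    final String appId;
    final String node;
    final int memoryMb;
    final int vcores;

    ContainerRecord(String id, String appId, String node, int memoryMb, int vcores) {
        this.id = id;
        this.appId = appId;
        this.node = node;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

final class ToyResourceManager {
    // Nodes, and the resources still free on each one.
    private final Map<String, NodeState> nodes = new LinkedHashMap<>();
    // Applications, and the containers currently allocated to each one.
    private final Map<String, List<ContainerRecord>> containersByApp = new LinkedHashMap<>();
    private int nextContainerId = 1;

    void addNode(String name, int memoryMb, int vcores) {
        nodes.put(name, new NodeState(name, memoryMb, vcores));
    }

    void registerApplication(String appId) {
        containersByApp.putIfAbsent(appId, new ArrayList<>());
    }

    // The "scheduler" part: a naive first-fit placement decision.
    // Returns null when no node currently has room.
    ContainerRecord allocate(String appId, int memoryMb, int vcores) {
        List<ContainerRecord> owned = containersByApp.get(appId);
        if (owned == null) {
            throw new IllegalStateException("unknown application " + appId);
        }
        for (NodeState node : nodes.values()) {
            if (node.fits(memoryMb, vcores)) {
                node.freeMemoryMb -= memoryMb;
                node.freeVcores -= vcores;
                ContainerRecord c = new ContainerRecord(
                        "container_" + nextContainerId++, appId, node.name, memoryMb, vcores);
                owned.add(c);
                return c;
            }
        }
        return null;
    }

    // Releasing a container frees its resources on the node it ran on.
    void release(ContainerRecord c) {
        if (containersByApp.get(c.appId).remove(c)) {
            NodeState node = nodes.get(c.node);
            node.freeMemoryMb += c.memoryMb;
            node.freeVcores += c.vcores;
        }
    }

    int freeMemoryMb(String node) {
        return nodes.get(node).freeMemoryMb;
    }

    int containerCount(String appId) {
        return containersByApp.get(appId).size();
    }
}
```

For example, with a 2048 MB/2-vcore node1 and a 1024 MB/1-vcore node2, three 1024 MB/1-vcore requests land on node1, node1 and node2. A fourth returns null until a container is released.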
  • 7. NodeManager (NM)
      • One on every node
      • Launches container processes
      • Enforces resource allocations
      • Monitors liveness
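The last two NodeManager duties on slide 7 can be illustrated with a single monitoring pass. This is a hypothetical sketch, not YARN’s container monitor: ToyContainerMonitor is a made-up name, and it assumes the NodeManager already has a snapshot of per-container memory use. The real NodeManager measures usage itself and enforces limits through its own configurable memory checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of one NodeManager monitoring pass (not YARN's real monitor).
final class ToyContainerMonitor {
    // containerId -> memory (MB) the ResourceManager allocated to it; sorted for stable output
    private final Map<String, Integer> allocatedMb = new TreeMap<>();

    void launched(String containerId, int memoryMb) {
        allocatedMb.put(containerId, memoryMb);
    }

    void forget(String containerId) {
        allocatedMb.remove(containerId);
    }

    // Enforcement: containers whose observed usage exceeds their allocation are candidates to kill.
    List<String> overLimit(Map<String, Integer> observedMb) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : allocatedMb.entrySet()) {
            Integer used = observedMb.get(e.getKey());
            if (used != null && used > e.getValue()) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Liveness: containers that were launched but no longer appear in the usage snapshot.
    List<String> exited(Map<String, Integer> observedMb) {
        List<String> result = new ArrayList<>();
        for (String id : allocatedMb.keySet()) {
            if (!observedMb.containsKey(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

With allocations of 1024, 512 and 256 MB for c1, c2 and c3, a snapshot of {c1: 1100, c2: 300} flags c1 as over its limit and c3 as gone.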
  • 8. Application Master (AM)
      • User/application code
      • Every application instance has one
      • Runs inside a container on the cluster
      • Requests resources from the ResourceManager
  • 9. YARN [architecture diagram: a Client, the ResourceManager and a JobHistory Server; two NodeManagers whose containers run the Application Master, a map task and a reduce task]
  • 10. Processing frameworks / YARN apps
      • MapReduce
          o Batch processing, fault tolerant
      • Impala
          o Low-latency SQL on Hadoop
      • Spark
          o Loads data into memory, great for iterative algorithms
      • Storm
          o Stream processing
  • 11. YARN app models: application master (AM) per job
      • Simplest model for batch
      • Used by MapReduce
  • 12. YARN app models: application master per session
      • Runs multiple jobs on behalf of the same user
      • Recently added in Tez
      • Spark interactive mode
  • 13. YARN app models: singleton AM as a permanent service
      • Always on, waits for jobs to come in
      • Used for Impala
  • 14. YARN/MR scheduling
      • ResourceManager (Fair Scheduler): decides which jobs to give resources to
      • MapReduce Application Master: decides which tasks within a job to give resources to
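Slide 14’s split of responsibilities can be sketched as two cooperating pieces. A cluster-level scheduler only chooses which job gets the next container, and a per-job application master chooses which of its own tasks uses it. This is a hypothetical simplification with made-up names (ToyFairScheduler, ToyJob). Here "fair" just means the job with the fewest running tasks goes next, with ties going to the earlier submission. The real Fair Scheduler works with weighted shares, queues and more.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Level 2: a hypothetical per-job application master that owns the task-level decision.
final class ToyJob {
    final String name;
    final Deque<String> pendingTasks = new ArrayDeque<>();
    final List<String> runningTasks = new ArrayList<>();

    ToyJob(String name, String... tasks) {
        this.name = name;
        for (String t : tasks) {
            pendingTasks.add(t);
        }
    }

    boolean wantsResources() {
        return !pendingTasks.isEmpty();
    }

    // The AM decides which of its own tasks gets the container (plain FIFO here).
    String assignContainer() {
        String task = pendingTasks.poll();
        runningTasks.add(task);
        return task;
    }
}

// Level 1: a hypothetical cluster scheduler that only picks *which job* gets a container.
final class ToyFairScheduler {
    private final List<ToyJob> jobs = new ArrayList<>();

    void submit(ToyJob job) {
        jobs.add(job);
    }

    // Simplified equal-weight fairness: the job with the fewest running tasks wins.
    // Returns "job:task" for the allocation, or null if no job wants resources.
    String allocateOneContainer() {
        ToyJob best = null;
        for (ToyJob job : jobs) {
            if (!job.wantsResources()) {
                continue;
            }
            if (best == null || job.runningTasks.size() < best.runningTasks.size()) {
                best = job;
            }
        }
        if (best == null) {
            return null;
        }
        return best.name + ":" + best.assignContainer();
    }
}
```

With job A holding tasks m1, m2, m3 and job B holding r1, successive allocations go A:m1, B:r1, A:m2, A:m3, and then null. The scheduler never looks at task names, and the job never sees the other job.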
  • 15. Scheduling on Hadoop [diagram: Application Master 1, Application Master 2, the ResourceManager, and Node 1, Node 2 and Node 3]
  • 16. Scheduling on Hadoop: Application Master 1 to ResourceManager: “I want 2 containers with 1024 MB and 1 core each”
  • 17. Scheduling on Hadoop: ResourceManager to Application Master 1: “Noted”
  • 18. Scheduling on Hadoop: “I’m still here” (heartbeat to the ResourceManager)
  • 19. Scheduling on Hadoop: ResourceManager: “I’ll reserve some space on node1 for AM1”
  • 20. Scheduling on Hadoop: Application Master 1 to ResourceManager: “Got anything for me?”
  • 21. Scheduling on Hadoop: ResourceManager to Application Master 1: “Here’s a security token to let you launch a container on Node 1”
  • 22. Scheduling on Hadoop: Application Master 1 to Node 1: “Hey, launch my container with this shell command”
  • 23. Scheduling on Hadoop [diagram: the launched container is now running on the cluster]
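The conversation on slides 15 to 23 can be modeled in a few classes. This is a hypothetical toy, not YARN’s protocol code. ToyRM, ToyNM, Allocation and Tokens are invented names, resources are reduced to memory, and the "security token" is an HMAC over the container’s identity using a made-up shared secret, a stand-in for YARN’s real container tokens. The sketch preserves the ordering on the slides. An ask is only recorded. Placement happens when a node heartbeats. The AM collects its containers on its own next heartbeat. A NodeManager refuses to launch anything without a valid token for that node.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical toy model of the slide conversation; not YARN's real protocol classes.
final class Allocation {
    final String containerId, appId, node, token;

    Allocation(String containerId, String appId, String node, String token) {
        this.containerId = containerId;
        this.appId = appId;
        this.node = node;
        this.token = token;
    }
}

final class Tokens {
    // Stand-in for a secret key shared by the ResourceManager and the NodeManagers.
    private static final byte[] SECRET = "toy-shared-secret".getBytes(StandardCharsets.UTF_8);

    static String sign(String containerId, String appId, String node) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(SECRET, "HmacSHA256"));
            byte[] payload = (containerId + "|" + appId + "|" + node).getBytes(StandardCharsets.UTF_8);
            return Base64.getEncoder().encodeToString(mac.doFinal(payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class ToyRM {
    private final int containerMb;
    private final Map<String, Integer> pendingAsks = new LinkedHashMap<>(); // appId -> containers still wanted
    private final Map<String, Deque<Allocation>> readyForAm = new LinkedHashMap<>();
    private int nextId = 1;

    ToyRM(int containerMb) {
        this.containerMb = containerMb;
    }

    // AM: "I want N containers" -> RM: "Noted". Nothing is placed yet.
    void ask(String appId, int count) {
        pendingAsks.merge(appId, count, Integer::sum);
    }

    // Node heartbeat ("I'm still here"): containers are placed on the node that reported free space.
    void nodeHeartbeat(String node, int freeMb) {
        for (Map.Entry<String, Integer> e : pendingAsks.entrySet()) {
            while (e.getValue() > 0 && freeMb >= containerMb) {
                String id = "container_" + nextId++;
                readyForAm.computeIfAbsent(e.getKey(), k -> new ArrayDeque<>())
                        .add(new Allocation(id, e.getKey(), node, Tokens.sign(id, e.getKey(), node)));
                e.setValue(e.getValue() - 1);
                freeMb -= containerMb;
            }
        }
    }

    // AM heartbeat ("Got anything for me?"): hand over whatever was placed since the last call.
    List<Allocation> amHeartbeat(String appId) {
        Deque<Allocation> ready = readyForAm.remove(appId);
        return ready == null ? new ArrayList<>() : new ArrayList<>(ready);
    }
}

final class ToyNM {
    private final String node;
    final List<String> launched = new ArrayList<>();

    ToyNM(String node) {
        this.node = node;
    }

    // AM: "launch my container with this shell command". Refused without a valid token for this node.
    boolean launch(Allocation a, String command) {
        boolean valid = a.node.equals(node)
                && Tokens.sign(a.containerId, a.appId, a.node).equals(a.token);
        if (valid) {
            launched.add(a.containerId + ": " + command);
        }
        return valid;
    }
}
```

Calling amHeartbeat right after ask returns nothing. Only after nodeHeartbeat("node1", 4096) does the AM receive its two containers, and only node1’s ToyNM will launch them.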
  • 24. Should you build a YARN app?
      • MapReduce can’t run arbitrary DAGs?
          o Use Spark
  • 25. Should you build a YARN app?
      • MapReduce can’t store data in memory?
          o Use Spark
  • 26. Should you build a YARN app?
      • Iterative processing?
          o Use Spark
  • 27. Should you build a YARN app?
      • Have an existing distributed app that runs all tasks at once?
          o Use distributed shell
  • 28. When to build a YARN app
      • Allocating and releasing containers dynamically
      • Weird scheduling requirements
          o Gang
          o Complex locality
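Gang scheduling, the first requirement on slide 28, means no task starts until every member of the group has a container. YARN hands out containers incrementally over heartbeats, so the AM has to do the holding itself. The helper below is a hypothetical sketch (GangCollector is not a YARN API) that works on plain container ID strings. In a real AM, the surplus containers could be handed back with AMRMClientAsync.releaseAssignedContainer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not a YARN API) for gang scheduling inside an AM:
// containers arrive over several heartbeats, but no task starts until the whole gang is present.
final class GangCollector {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();
    private boolean launched = false;

    GangCollector(int gangSize) {
        this.gangSize = gangSize;
    }

    // Feed each batch of newly allocated container IDs in here.
    // Returns the complete gang exactly once, when its last member arrives; otherwise an empty list.
    // Containers beyond the gang size go into `surplus` so the AM can give them back to the RM.
    List<String> onAllocated(List<String> containers, List<String> surplus) {
        for (String c : containers) {
            if (launched || held.size() == gangSize) {
                surplus.add(c);
            } else {
                held.add(c);
            }
        }
        if (!launched && held.size() == gangSize) {
            launched = true;
            return Collections.unmodifiableList(new ArrayList<>(held));
        }
        return Collections.emptyList();
    }
}
```

With a gang size of 3, a first batch of c1 and c2 returns nothing. A second batch of c3 and c4 returns [c1, c2, c3] and marks c4 as surplus.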
  • 29. What YARN does for you
      • Deploys your bits
      • Runs your processes
      • Monitors your processes
      • Kills your processes when they misbehave
  • 30. What YARN does not do for you
      • Communication between your processes
  • 31. AMRMClientAsync
        AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
          public void onContainersAllocated(List<Container> containers) {
            for (Container container : containers) {
              startContainer(container);  // see slide 33
            }
          }
          [... more methods]
        };
        // Heartbeat to the RM every 1000 ms
        AMRMClientAsync<ContainerRequest> amClient =
            AMRMClientAsync.createAMRMClientAsync(1000, handler);
        amClient.init(new YarnConfiguration());
        amClient.start();
        // AM host, RPC port, tracking URL
        amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");
        // 1024 MB / 1 vcore, preferably on node1 or node2, else rack1
        amClient.addContainerRequest(
            new ContainerRequest(
                Resource.newInstance(1024, 1),
                new String[] {"node1", "node2"},
                new String[] {"rack1"},
                Priority.newInstance(2)));
  • 32. NMClientAsync
        NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
          [... listen for containers stopped and started]
        };
        NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);
        nmClient.init(new YarnConfiguration());
        nmClient.start();
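  • 33. Launching containers
        public void startContainer(Container container) {
          // localResources, environment, serviceData, tokens and acls are built elsewhere
          ContainerLaunchContext launchContext = ContainerLaunchContext.newInstance(
              localResources, environment, Arrays.asList("sleep 1000"),
              serviceData, tokens, acls);
          // The container's NodeManager runs the command; results arrive via nmHandler
          nmClient.startContainerAsync(container, launchContext);
        }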
  • 34. Local resources [diagram: file.txt stored in HDFS is copied onto each of two nodes, where the containers running on that node use the local copy]
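The local-resources picture on slide 34 (one file in HDFS, a copy per node, several containers per node) can be reduced to a per-node cache. This is a hypothetical sketch rather than YARN’s localization service. ToyLocalizer and the /local/<node>/filecache/ path layout are made up, and the sketch assumes every container on a node may share the cached copy. Real YARN also scopes caches by LocalResourceVisibility, which is ignored here.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-node resource localization: a file stored in HDFS is
// downloaded to a node the first time a container there needs it, and later
// containers on the same node reuse the cached local copy.
final class ToyLocalizer {
    private final Map<String, Set<String>> cachedByNode = new HashMap<>();
    private int downloads = 0;

    // Returns the (made-up) local path a container on `node` should read for `hdfsPath`.
    String localize(String node, String hdfsPath) {
        Set<String> cache = cachedByNode.computeIfAbsent(node, n -> new HashSet<>());
        if (cache.add(hdfsPath)) {
            downloads++; // first container on this node to need the file triggers the copy
        }
        String fileName = hdfsPath.substring(hdfsPath.lastIndexOf('/') + 1);
        return "/local/" + node + "/filecache/" + fileName;
    }

    int downloadCount() {
        return downloads;
    }
}
```

Two containers on node1 and one on node2 asking for the same file cause two downloads, one per node. That matches the single file.txt per node in the diagram.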