Introduction to YARN Apps

The new YARN framework promises to make Hadoop a general-purpose platform for Big Data and enterprise data hub applications. In this talk, you'll learn about writing and taking advantage of applications built on YARN.

Published in: Technology

  1. Intro to YARN Apps (Sandy Ryza)
  2. Introduction
     • What’s YARN?
     • YARN apps
     • Building YARN apps
  3. The OS analogy: Traditional Operating System
     • Storage: File System
     • Execution/Scheduling: Processes/Kernel Scheduler
  4. The OS analogy: Hadoop
     • Storage: Hadoop Distributed File System (HDFS)
     • Execution/Scheduling: YARN!
  5. Goal: Multitenancy
     • Different types of applications on the same cluster
     • Different users and organizations on the same cluster
  6. ResourceManager (RM)
     • Central service that tracks
       o Nodes
         § Resources
       o Applications
       o Containers
     • Houses the scheduler, which is in charge of all container placement decisions
  7. NodeManager (NM)
     • One on every node
     • Launches container processes
     • Enforces resource allocations
     • Monitors liveness
  8. Application Master (AM)
     • User/application code
     • Every application instance has one
     • Runs inside a container on the cluster
     • Requests resources from the ResourceManager
  9. YARN (diagram): Client, ResourceManager, JobHistory Server, and two NodeManagers whose containers run the Application Master, a Map Task, and a Reduce Task
  10. Processing Frameworks / YARN apps
      • MapReduce
        o Batch processing, fault tolerant
      • Impala
        o Low latency SQL on Hadoop
      • Spark
        o Load data into memory, great for iterative algorithms
      • Storm
        o Stream processing
  11. YARN app models
      • Application master (AM) per job
      • Simplest for batch
      • Used by MapReduce
  12. YARN app models
      • Application master per session
      • Runs multiple jobs on behalf of the same user
      • Recently added in Tez
      • Spark interactive mode
  13. YARN app models
      • Singleton AM as permanent service
      • Always on, waits around for jobs to come in
      • Used for Impala
  14. YARN/MR Scheduling
      • ResourceManager (Fair Scheduler): decides which jobs to give resources to
      • MapReduce Application Master: decides which tasks to give resources to within a job
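
The two levels of decision on this slide can be sketched in plain Java. This is only an illustration, not the Fair Scheduler or the MapReduce Application Master: the class `TwoLevelScheduling`, its methods, and both policies (give the next container to the job holding the fewest, and run pending map tasks before reduce tasks) are simplifying assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-level scheduling sketch; none of this is YARN API. */
final class TwoLevelScheduling {

    /** Level 1 (cluster-wide): pick the job that currently holds the fewest containers. */
    static String pickApp(Map<String, Integer> containersHeld) {
        String best = null;
        for (Map.Entry<String, Integer> e : containersHeld.entrySet()) {
            if (best == null || e.getValue() < containersHeld.get(best)) {
                best = e.getKey();
            }
        }
        return best; // null when no job is registered
    }

    /** Level 2 (inside one job): assumed policy of map tasks before reduce tasks. */
    static String pickTask(Deque<String> pendingMaps, Deque<String> pendingReduces) {
        if (!pendingMaps.isEmpty()) {
            return pendingMaps.poll();
        }
        return pendingReduces.poll(); // null when nothing is pending
    }

    public static void main(String[] args) {
        Map<String, Integer> held = new LinkedHashMap<>();
        held.put("job1", 3);
        held.put("job2", 1);
        Deque<String> maps = new ArrayDeque<>();
        maps.add("map-0");
        Deque<String> reduces = new ArrayDeque<>();
        reduces.add("reduce-0");
        // The cluster level chooses the job, the job level chooses the task.
        System.out.println(pickApp(held) + " runs " + pickTask(maps, reduces));
    }
}
```

The point of the split is that the cluster-level scheduler never needs to know what a task is; it only hands out containers, and each application decides what to run in them.
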
  15. Scheduling on Hadoop (diagram, repeated on slides 15 to 23): Application Master 1, Application Master 2, ResourceManager, Node 1, Node 2, Node 3
  16. Scheduling on Hadoop: Application Master 1 to ResourceManager: “I want 2 containers with 1024 MB and 1 core each”
  17. Scheduling on Hadoop: ResourceManager to Application Master 1: “Noted”
  18. Scheduling on Hadoop: a node to ResourceManager: “I’m still here”
  19. Scheduling on Hadoop: ResourceManager: “I’ll reserve some space on node1 for AM1”
  20. Scheduling on Hadoop: Application Master 1 to ResourceManager: “Got anything for me?”
  21. Scheduling on Hadoop: ResourceManager to Application Master 1: “Here’s a security token to let you launch a container on Node 1”
  22. Scheduling on Hadoop: Application Master 1 to the node: “Hey, launch my container with this shell command”
  23. Scheduling on Hadoop: a Container now appears in the diagram, running on the cluster
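
The exchange on slides 16 to 23 can be replayed with a small in-memory model. This is a sketch under stated assumptions, not YARN code: `ToyResourceManager`, `Slot`, and every method name are hypothetical, placement is first-fit in request order, and the "security token" is reduced to simply handing the placed slot back to the application master.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of the request / heartbeat / allocate cycle; not the real ResourceManager. */
final class ToyResourceManager {

    /** One requested container; node is filled in once the request is placed. */
    static final class Slot {
        final String appId;
        final int memoryMb;
        final int vcores;
        String node;

        Slot(String appId, int memoryMb, int vcores) {
            this.appId = appId;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    private final Map<String, int[]> free = new LinkedHashMap<>(); // node -> {memoryMb, vcores}
    private final Queue<Slot> pending = new ArrayDeque<>();
    private final Map<String, List<Slot>> placed = new LinkedHashMap<>();

    void addNode(String node, int memoryMb, int vcores) {
        free.put(node, new int[] {memoryMb, vcores});
    }

    /** AM to RM: "I want N containers". The RM only notes the request. */
    void request(String appId, int count, int memoryMb, int vcores) {
        for (int i = 0; i < count; i++) {
            pending.add(new Slot(appId, memoryMb, vcores));
        }
    }

    /** Node to RM: "I'm still here". Space is reserved when a node checks in. */
    void nodeHeartbeat(String node) {
        int[] capacity = free.get(node);
        if (capacity == null) {
            return; // unknown node
        }
        while (!pending.isEmpty()) {
            Slot next = pending.peek();
            if (next.memoryMb > capacity[0] || next.vcores > capacity[1]) {
                break; // head of the queue does not fit on this node
            }
            pending.remove();
            capacity[0] -= next.memoryMb;
            capacity[1] -= next.vcores;
            next.node = node;
            placed.computeIfAbsent(next.appId, k -> new ArrayList<>()).add(next);
        }
    }

    /** AM to RM: "Got anything for me?". Hands over placed slots exactly once. */
    List<Slot> amHeartbeat(String appId) {
        List<Slot> granted = placed.remove(appId);
        return granted == null ? new ArrayList<>() : granted;
    }

    public static void main(String[] args) {
        ToyResourceManager rm = new ToyResourceManager();
        rm.addNode("node1", 4096, 4);
        rm.request("AM1", 2, 1024, 1);
        System.out.println("before node heartbeat: " + rm.amHeartbeat("AM1").size());
        rm.nodeHeartbeat("node1");
        System.out.println("after node heartbeat: " + rm.amHeartbeat("AM1").size());
    }
}
```

In this model nothing is allocated at request time; the application master only learns about its containers on a later heartbeat, which mirrors the order of the messages on the slides.
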
  24. Should you build a YARN app?
      • MapReduce can’t run arbitrary DAGs?
        o Use Spark
  25. Should you build a YARN app?
      • MapReduce can’t store data in memory?
        o Use Spark
  26. Should you build a YARN app?
      • Iterative processing?
        o Use Spark
  27. Should you build a YARN app?
      • Have an existing distributed app that runs all tasks at once?
        o Use distributed shell
  28. When to build a YARN app
      • Allocating and releasing containers dynamically
      • Weird scheduling requirements
        o Gang
        o Complex locality
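
As an illustration of the "Gang" requirement on slide 28, the sketch below holds allocated containers until the whole group is available and only then launches them together. It is a minimal model, not YARN code: `GangLauncher`, `offer`, and the use of plain string container IDs are assumptions made for the example, and a real application master would release the surplus containers back to the ResourceManager.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical gang-scheduling helper: launch nothing until all members can start. */
final class GangLauncher {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();
    private final List<String> launched = new ArrayList<>();
    private final List<String> surplus = new ArrayList<>();

    GangLauncher(int gangSize) {
        if (gangSize <= 0) {
            throw new IllegalArgumentException("gangSize must be positive");
        }
        this.gangSize = gangSize;
    }

    /** Accepts a batch of newly allocated container IDs; returns true once the gang has launched. */
    boolean offer(List<String> containerIds) {
        for (String id : containerIds) {
            if (launched.isEmpty() && held.size() < gangSize) {
                held.add(id);      // still collecting gang members
            } else {
                surplus.add(id);   // not needed; a real AM would release these
            }
        }
        if (launched.isEmpty() && held.size() == gangSize) {
            launched.addAll(held); // all members present: start them together
            held.clear();
        }
        return !launched.isEmpty();
    }

    List<String> launched() {
        return new ArrayList<>(launched);
    }

    List<String> surplus() {
        return new ArrayList<>(surplus);
    }

    public static void main(String[] args) {
        GangLauncher gang = new GangLauncher(3);
        System.out.println(gang.offer(Arrays.asList("c1", "c2")));
        System.out.println(gang.offer(Arrays.asList("c3", "c4")));
        System.out.println(gang.launched() + " surplus " + gang.surplus());
    }
}
```

Holding containers idle while waiting for the rest of the gang wastes cluster capacity, which is one reason this kind of requirement tends to need a custom application rather than an existing framework.
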
  29. What YARN does for you
      • Deploys your bits
      • Runs your processes
      • Monitors your processes
      • Kills your processes when they misbehave
  30. What YARN does not do for you
      • Communication between your processes
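  31. AMRMClientAsync
      import java.util.List;
      import org.apache.hadoop.net.NetUtils;
      import org.apache.hadoop.yarn.api.records.Container;
      import org.apache.hadoop.yarn.api.records.Priority;
      import org.apache.hadoop.yarn.api.records.Resource;
      import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
      import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

      AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
        @Override
        public void onContainersAllocated(List<Container> containers) {
          // Called when the ResourceManager grants containers: start one task in each.
          for (Container container : containers) {
            startTask(container);
          }
        }
        // [... more methods]
      };
      // Heartbeat to the ResourceManager every 1000 ms.
      AMRMClientAsync<ContainerRequest> amClient =
          AMRMClientAsync.createAMRMClientAsync(1000, handler);
      // The client must be initialized and started (init, then start) before registering.
      amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");
      // Ask for one container with 1024 MB and 1 core, preferring the listed nodes and rack.
      amClient.addContainerRequest(
          new ContainerRequest(
              Resource.newInstance(1024, 1),
              new String[] {"node1", "node2"},
              new String[] {"rack1"},
              Priority.newInstance(2)));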
  32. NMClientAsync
      import org.apache.hadoop.yarn.client.api.async.NMClientAsync;

      NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
        // [... listen for containers stopped and started]
      };
      NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);
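  33. Launching Containers
      import java.util.Arrays;
      import org.apache.hadoop.yarn.api.records.Container;
      import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;

      public void startContainer(Container container) {
        // Describe the process to run: files to localize, environment, and the shell command.
        ContainerLaunchContext launchContext = ContainerLaunchContext.newInstance(
            localResources, environment, Arrays.asList("sleep 1000"),
            serviceData, tokens, acls);
        // Ask the container's NodeManager to start it; the outcome arrives via the callback handler.
        nmClient.startContainerAsync(container, launchContext);
      }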
  34. Local resources (diagram): file.txt is stored in HDFS, with copies on the nodes alongside the containers that use it
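
The local resources diagram can be read as a per-node cache in front of HDFS. The sketch below models that reading in plain Java. It is an assumption-laden illustration rather than YARN's localization code: `ToyNodeLocalizer` and the path `/user/app/file.txt` are hypothetical, HDFS is stood in for by a map, and the model assumes a file is fetched once per node and then shared by the containers of the same application on that node.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-node localizer: copy a file from "HDFS" once, then serve it locally. */
final class ToyNodeLocalizer {
    private final Map<String, String> hdfs;                          // path -> contents, stands in for HDFS
    private final Map<String, String> localCache = new HashMap<>();  // files already on this node
    private int downloads = 0;

    ToyNodeLocalizer(Map<String, String> hdfs) {
        this.hdfs = hdfs;
    }

    /** Returns the node-local copy, downloading from "HDFS" only on first use. */
    String localize(String hdfsPath) {
        String cached = localCache.get(hdfsPath);
        if (cached != null) {
            return cached;
        }
        String contents = hdfs.get(hdfsPath);
        if (contents == null) {
            throw new IllegalArgumentException("no such file: " + hdfsPath);
        }
        downloads++;
        localCache.put(hdfsPath, contents);
        return contents;
    }

    int downloads() {
        return downloads;
    }

    public static void main(String[] args) {
        Map<String, String> hdfs = new HashMap<>();
        hdfs.put("/user/app/file.txt", "hello");
        ToyNodeLocalizer node = new ToyNodeLocalizer(hdfs);
        node.localize("/user/app/file.txt"); // first container on the node
        node.localize("/user/app/file.txt"); // second container reuses the local copy
        System.out.println("downloads: " + node.downloads());
    }
}
```
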
