Introduction to YARN Apps
 



The new YARN framework promises to make Hadoop a general-purpose platform for Big Data and enterprise data hub applications. In this talk, you'll learn about writing and taking advantage of applications built on YARN.


Usage Rights

© All Rights Reserved


Introduction to YARN Apps Presentation Transcript

  • 1. Intro to YARN Apps (Sandy Ryza)
  • 2. Introduction
      • What's YARN?
      • YARN apps
      • Building YARN apps
  • 3. The OS analogy: traditional operating system
      • Storage: file system
      • Execution/scheduling: processes and the kernel scheduler
  • 4. The OS analogy: Hadoop
      • Storage: Hadoop Distributed File System (HDFS)
      • Execution/scheduling: YARN!
  • 5. Goal: Multitenancy
      • Different types of applications on the same cluster
      • Different users and organizations on the same cluster
  • 6. ResourceManager (RM)
      • Central service that tracks:
          o Nodes and their resources
          o Applications
          o Containers
      • Houses the scheduler, which makes all container placement decisions
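The next block is a toy, pure-Java model of the bookkeeping this slide lists. It tracks free resources per node and the containers each application holds, and it makes a placement decision for each request. Everything in it is a hypothetical illustration, not YARN's API or any YARN scheduler's algorithm. That includes `ToyResourceManager`, `allocate`, and the first-fit policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of ResourceManager bookkeeping; these names are not part of the YARN API.
class ToyResourceManager {
    // Free capacity per node as {memoryMb, vcores}, kept in registration order.
    private final Map<String, int[]> freeByNode = new LinkedHashMap<>();
    // For each application, the nodes its containers were placed on.
    private final Map<String, List<String>> containersByApp = new LinkedHashMap<>();

    void addNode(String node, int memoryMb, int vcores) {
        freeByNode.put(node, new int[] {memoryMb, vcores});
    }

    void addApplication(String appId) {
        containersByApp.put(appId, new ArrayList<>());
    }

    // Placement decision (first fit): the first node with enough free memory and vcores
    // gets the container; returns null if no node has room right now.
    String allocate(String appId, int memoryMb, int vcores) {
        for (Map.Entry<String, int[]> entry : freeByNode.entrySet()) {
            int[] free = entry.getValue();
            if (free[0] >= memoryMb && free[1] >= vcores) {
                free[0] -= memoryMb;
                free[1] -= vcores;
                containersByApp.get(appId).add(entry.getKey());
                return entry.getKey();
            }
        }
        return null;
    }

    int freeMemoryMb(String node) {
        return freeByNode.get(node)[0];
    }

    List<String> containerNodes(String appId) {
        return containersByApp.get(appId);
    }
}
```

Take node1 with 2048 MB and 2 vcores, and node2 with 4096 MB and 4 vcores. Three requests for 1024 MB and 1 vcore land on node1, node1, then node2. First fit is only the simplest policy to show. YARN's pluggable schedulers, such as the Fair Scheduler, make this decision with queues and fairness in mind.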
  • 7. NodeManager (NM)
      • One on every node
      • Launches container processes
      • Enforces resource allocations
      • Monitors liveness
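The next block roughly sketches two of the NodeManager's duties: enforcing allocations and monitoring liveness. It assumes each container reports its current memory use and whether its process is still alive. Running containers that use more memory than they were granted are flagged for killing, and exited ones are reported as finished. `ToyNodeManager` and `ContainerInfo` are hypothetical names. The real NodeManager's monitoring is configurable and considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a NodeManager's monitoring pass; not YARN's implementation.
class ToyNodeManager {
    static final class ContainerInfo {
        final String id;
        final long allocatedMb; // memory granted by the ResourceManager
        final long usedMb;      // memory the container process currently uses
        final boolean alive;    // whether the process is still running

        ContainerInfo(String id, long allocatedMb, long usedMb, boolean alive) {
            this.id = id;
            this.allocatedMb = allocatedMb;
            this.usedMb = usedMb;
            this.alive = alive;
        }
    }

    // Enforcement: running containers that exceed their memory allocation should be killed.
    static List<String> overLimit(List<ContainerInfo> containers) {
        List<String> toKill = new ArrayList<>();
        for (ContainerInfo c : containers) {
            if (c.alive && c.usedMb > c.allocatedMb) {
                toKill.add(c.id);
            }
        }
        return toKill;
    }

    // Liveness: containers whose processes have exited are reported as completed.
    static List<String> exited(List<ContainerInfo> containers) {
        List<String> done = new ArrayList<>();
        for (ContainerInfo c : containers) {
            if (!c.alive) {
                done.add(c.id);
            }
        }
        return done;
    }
}
```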
  • 8. Application Master (AM)
      • User/application code
      • Every application instance has one
      • Runs inside a container on the cluster
      • Requests resources from the ResourceManager
  • 9. YARN [diagram: a Client, the ResourceManager, the JobHistory Server, and NodeManagers whose containers run the Application Master, a Map Task, and a Reduce Task]
  • 10. Processing frameworks / YARN apps
      • MapReduce: batch processing, fault tolerant
      • Impala: low-latency SQL on Hadoop
      • Spark: loads data into memory; great for iterative algorithms
      • Storm: stream processing
  • 11. YARN app models: application master (AM) per job
      • Simplest model for batch
      • Used by MapReduce
  • 12. YARN app models: application master per session
      • Runs multiple jobs on behalf of the same user
      • Recently added in Tez
      • Spark interactive mode
  • 13. YARN app models: singleton AM as a permanent service
      • Always on; waits for jobs to come in
      • Used for Impala
  • 14. YARN/MR scheduling
      • ResourceManager (Fair Scheduler): decides which jobs to give resources to
      • MapReduce Application Master: decides which tasks within a job to give resources to
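The two-level split can be sketched as two small functions, under simplifying assumptions. At the ResourceManager level, one function computes max-min fair shares of a pool of identical containers. At the Application Master level, the other picks which pending tasks run in the containers the application was given. The class name, the method names, and the round-robin fair-share computation are all illustrative. This is not the Fair Scheduler's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-level scheduling sketch; names are illustrative only.
class TwoLevelScheduling {
    // Level 1 (ResourceManager): divide `capacity` containers among applications by max-min
    // fairness. Containers are handed out one at a time, round-robin, skipping applications
    // whose demand is already met. The real Fair Scheduler also has weights, queues,
    // minimum shares, and preemption.
    static int[] fairShares(int capacity, int[] demands) {
        int[] shares = new int[demands.length];
        int remaining = capacity;
        boolean gaveAny = true;
        while (remaining > 0 && gaveAny) {
            gaveAny = false;
            for (int i = 0; i < demands.length && remaining > 0; i++) {
                if (shares[i] < demands[i]) {
                    shares[i]++;
                    remaining--;
                    gaveAny = true;
                }
            }
        }
        return shares;
    }

    // Level 2 (Application Master): choose which of its own pending tasks get the containers
    // it was granted. Here the AM simply takes tasks in list order.
    static List<String> assignTasks(int grantedContainers, List<String> pendingTasks) {
        int n = Math.min(grantedContainers, pendingTasks.size());
        return new ArrayList<>(pendingTasks.subList(0, n));
    }
}
```

With 10 containers and demands of 2, 6, and 8, the shares come out to 2, 4, and 4. The small job is fully served and the rest is split evenly between the larger two. Each AM then decides on its own which tasks use its share.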
  • 15. Scheduling on Hadoop [diagram: Application Master 1 (AM1), Application Master 2 (AM2), the ResourceManager, and Nodes 1, 2, and 3; each following slide adds one step of the exchange]
  • 16. AM1 to ResourceManager: "I want 2 containers with 1024 MB and 1 core each."
  • 17. ResourceManager to AM1: "Noted."
  • 18. A heartbeat reaches the ResourceManager: "I'm still here."
  • 19. ResourceManager: "I'll reserve some space on Node 1 for AM1."
  • 20. AM1 to ResourceManager: "Got anything for me?"
  • 21. ResourceManager to AM1: "Here's a security token to let you launch a container on Node 1."
  • 22. AM1 to Node 1: "Hey, launch my container with this shell command."
  • 23. The container is now running.
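The exchange in slides 15 to 23 can be replayed with a toy, single-threaded simulation. It assumes one request size per application and uses a random string as a stand-in for a container token. `HandshakeSim` and its methods are hypothetical. They do not correspond to YARN's RPC protocol or client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical toy simulation of the request/heartbeat/launch conversation; not YARN's protocol.
class HandshakeSim {
    // A granted container: where it may run, its size, and a stand-in for the security token.
    static final class Grant {
        final String app, node, token;
        final int memoryMb, vcores;

        Grant(String app, String node, int memoryMb, int vcores, String token) {
            this.app = app; this.node = node; this.memoryMb = memoryMb;
            this.vcores = vcores; this.token = token;
        }
    }

    static final class ResourceManager {
        private final Map<String, Integer> pending = new LinkedHashMap<>(); // app -> containers still wanted
        private final Map<String, int[]> askSize = new HashMap<>();         // app -> {memoryMb, vcores}
        private final Map<String, List<Grant>> ready = new HashMap<>();     // app -> grants not yet picked up

        // AM: "I want N containers with M MB and C cores each."  RM: "Noted."
        void request(String app, int count, int memoryMb, int vcores) {
            pending.merge(app, count, Integer::sum);
            askSize.put(app, new int[] {memoryMb, vcores});
        }

        // A node reports its free capacity; pending containers that fit are placed on it.
        void nodeHeartbeat(String node, int freeMemoryMb, int freeVcores) {
            for (String app : new ArrayList<>(pending.keySet())) {
                int[] size = askSize.get(app);
                while (pending.get(app) > 0 && freeMemoryMb >= size[0] && freeVcores >= size[1]) {
                    freeMemoryMb -= size[0];
                    freeVcores -= size[1];
                    pending.put(app, pending.get(app) - 1);
                    ready.computeIfAbsent(app, k -> new ArrayList<>())
                         .add(new Grant(app, node, size[0], size[1], UUID.randomUUID().toString()));
                }
            }
        }

        // AM: "Got anything for me?"  Hands over (and forgets) this app's new grants.
        List<Grant> amHeartbeat(String app) {
            List<Grant> grants = ready.remove(app);
            return grants == null ? new ArrayList<>() : grants;
        }
    }

    static final class NodeManager {
        final String name;
        final List<String> running = new ArrayList<>();

        NodeManager(String name) { this.name = name; }

        // AM: "Launch my container with this shell command."  Refuses grants for other nodes.
        boolean launch(Grant grant, String command) {
            if (!grant.node.equals(name)) return false;
            running.add(grant.app + ": " + command);
            return true;
        }
    }
}
```

In this model nothing is placed until a node reports in. The AM only learns about its containers on its next heartbeat, which matches the slides' gap between "Noted" and "Here's a security token". The NodeManager side refuses a grant made out for a different node.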
  • 24. Should you build a YARN app?
      • MapReduce can't run arbitrary DAGs?
          o Use Spark
  • 25. Should you build a YARN app?
      • MapReduce can't store data in memory?
          o Use Spark
  • 26. Should you build a YARN app?
      • Iterative processing?
          o Use Spark
  • 27. Should you build a YARN app?
      • Have an existing distributed app that runs all tasks at once?
          o Use distributed shell
  • 28. When to build a YARN app
      • Allocating and releasing containers dynamically
      • Weird scheduling requirements
          o Gang scheduling
          o Complex locality
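Gang scheduling is one requirement that pushes you toward writing your own AM. The AM must not start any task until every container in the group has been allocated. The sketch below shows that hold-then-launch logic in plain Java, assuming containers arrive in batches (as they would through an allocation callback) and are identified by strings. `GangLauncher` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of gang scheduling inside an Application Master.
class GangLauncher {
    private final int gangSize;
    private final List<String> held = new ArrayList<>();     // allocated but not yet launched
    private final List<String> released = new ArrayList<>(); // surplus to hand back to the RM

    GangLauncher(int gangSize) {
        this.gangSize = gangSize;
    }

    // Called with each batch of newly allocated container IDs. Returns the containers to
    // launch now: either a complete gang or nothing, so no task starts without its peers.
    List<String> onAllocated(List<String> containerIds) {
        held.addAll(containerIds);
        if (held.size() < gangSize) {
            return new ArrayList<>();
        }
        List<String> gang = new ArrayList<>(held.subList(0, gangSize));
        released.addAll(held.subList(gangSize, held.size())); // more than needed: give back
        held.clear();
        return gang;
    }

    List<String> released() {
        return released;
    }
}
```

A real AM would also need a timeout that gives held containers back if the gang never completes. Hoarding partial allocations can starve other applications. Surplus containers would be returned to the ResourceManager, for example with `AMRMClientAsync.releaseAssignedContainer`.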
  • 29. What YARN does for you
      • Deploys your bits
      • Runs your processes
      • Monitors your processes
      • Kills your processes when they misbehave
  • 30. What YARN does not do for you
      • Communication between your processes
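YARN leaves inter-process communication to the application. One common pattern is for the AM to start its own server and tell each container where to reach it. It can do this through the `environment` map passed to `ContainerLaunchContext.newInstance` (slide 33). The next block sketches that pattern. The variable names `MY_APP_AM_HOST` and `MY_APP_AM_PORT`, and the class `AmAddressPassing`, are hypothetical choices of the application, not anything defined by YARN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the AM publishes its address to workers via environment variables.
class AmAddressPassing {
    // Application-chosen variable names (assumptions, not YARN constants).
    static final String HOST_VAR = "MY_APP_AM_HOST";
    static final String PORT_VAR = "MY_APP_AM_PORT";

    // AM side: build the environment for a worker container's launch context.
    static Map<String, String> workerEnvironment(String amHost, int amPort) {
        Map<String, String> env = new HashMap<>();
        env.put(HOST_VAR, amHost);
        env.put(PORT_VAR, Integer.toString(amPort));
        return env;
    }

    // Worker side: recover "host:port" (a real worker would pass System.getenv()).
    static String amAddress(Map<String, String> env) {
        String host = env.get(HOST_VAR);
        String port = env.get(PORT_VAR);
        if (host == null || port == null) {
            throw new IllegalStateException("AM address not provided in environment");
        }
        return host + ":" + Integer.parseInt(port);
    }
}
```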
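  • 31. AMRMClientAsync
    // Receives asynchronous responses from the ResourceManager
    AMRMClientAsync.CallbackHandler handler = new AMRMClientAsync.CallbackHandler() {
      @Override
      public void onContainersAllocated(List<Container> containers) {
        for (Container container : containers) {
          startTask(container);
        }
      }
      // [... more methods]
    };

    // Heartbeat to the ResourceManager every 1000 ms
    AMRMClientAsync<ContainerRequest> amClient =
        AMRMClientAsync.createAMRMClientAsync(1000, handler);
    amClient.init(new YarnConfiguration());
    amClient.start();
    amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");

    // One container with 1024 MB and 1 vcore, preferably on node1/node2 or in rack1
    amClient.addContainerRequest(
        new ContainerRequest(
            Resource.newInstance(1024, 1),
            new String[] {"node1", "node2"},
            new String[] {"rack1"},
            Priority.newInstance(2)));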
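The `ContainerRequest` above names preferred nodes (`node1`, `node2`) and a preferred rack (`rack1`). The hypothetical helper below shows how a candidate node ranks against those preferences, using the usual node-local, rack-local, and off-switch levels. By default a request's locality may be relaxed, so the ResourceManager can fall back from the first level to the later ones. This illustrates the idea only and is not scheduler code.

```java
import java.util.Arrays;

// Hypothetical sketch: rank a candidate node against a request's node and rack preferences.
class LocalityMatch {
    enum Level { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

    static Level classify(String[] preferredNodes, String[] preferredRacks,
                          String candidateNode, String candidateRack) {
        if (Arrays.asList(preferredNodes).contains(candidateNode)) {
            return Level.NODE_LOCAL;   // exactly where the request asked
        }
        if (Arrays.asList(preferredRacks).contains(candidateRack)) {
            return Level.RACK_LOCAL;   // different node, same rack
        }
        return Level.OFF_SWITCH;       // anywhere else in the cluster
    }
}
```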
  • 32. NMClientAsync
    NMClientAsync.CallbackHandler nmHandler = new NMClientAsync.CallbackHandler() {
      // [... listen for containers stopped and started]
    };
    NMClientAsync nmClient = NMClientAsync.createNMClientAsync(nmHandler);
    nmClient.init(new YarnConfiguration());
    nmClient.start();
  • 33. Launching Containers
    public void startContainer(Container container) {
      // Local files, environment variables, commands, auxiliary-service data, security tokens, ACLs
      ContainerLaunchContext launchContext = ContainerLaunchContext.newInstance(
          localResources, environment, Arrays.asList("sleep 1000"),
          serviceData, tokens, acls);
      nmClient.startContainerAsync(container, launchContext);
    }
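The command list handed to `ContainerLaunchContext.newInstance` is usually more than a bare `sleep 1000`. Applications typically redirect stdout and stderr into the container's log directory. The sketch below assumes the string `"<LOG_DIR>"`, which is the value of `ApplicationConstants.LOG_DIR_EXPANSION_VAR`. The NodeManager expands that placeholder to the container's log directory at launch time. It is hard-coded here so the example runs without Hadoop, and `LaunchCommand` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds the command list for a container launch context.
class LaunchCommand {
    // Assumed to equal ApplicationConstants.LOG_DIR_EXPANSION_VAR; expanded by the NodeManager.
    static final String LOG_DIR = "<LOG_DIR>";

    static List<String> commands(String shellCommand) {
        List<String> cmds = new ArrayList<>();
        // Send the process's stdout and stderr to files in the container's log directory.
        cmds.add(shellCommand + " 1>" + LOG_DIR + "/stdout 2>" + LOG_DIR + "/stderr");
        return cmds;
    }
}
```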
  • 34. Local resources [diagram: file.txt stored in HDFS is copied to each node, where the node's containers use it]
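The diagram's point is that a file in HDFS is copied to a node before containers there start, rather than being fetched again by every container. The toy model below captures that with a per-node cache and a download counter. `ToyLocalizer` and the `/local-cache/` path layout are hypothetical. In real YARN, how widely a localized file is shared depends on its `LocalResourceVisibility` (`PUBLIC`, `PRIVATE`, or `APPLICATION`).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model of resource localization: one copy per node, shared by its containers.
class ToyLocalizer {
    private final Map<String, Set<String>> cachedByNode = new HashMap<>(); // node -> files already copied
    private int downloads = 0;

    // Called before a container starts on `node`; returns the local path the container would use.
    String localize(String node, String hdfsPath) {
        Set<String> cache = cachedByNode.computeIfAbsent(node, k -> new HashSet<>());
        if (cache.add(hdfsPath)) {
            downloads++; // first container on this node that needs the file: copy it from HDFS
        }
        String fileName = hdfsPath.substring(hdfsPath.lastIndexOf('/') + 1);
        return "/local-cache/" + node + "/" + fileName; // hypothetical local layout
    }

    int downloads() {
        return downloads;
    }
}
```

Four containers on two nodes that each need `file.txt` cause only two copies from HDFS, one per node. This matches the diagram's layout.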