Hortonworks Yarn Code Walk Through January 2014

This slide deck accompanies the recording of the webinar "YARN Code Walk Through", held on Jan. 22, 2014. The recording is available at Hortonworks.com/webinars under Past Webinars, or at
https://hortonworks.webex.com/hortonworks/lsr.php?AT=pb&SP=EC&rID=129468197&rKey=b645044305775657

Published in: Technology, Business

  • Hadoop 1.x had its uses, but this is really about turning Hadoop into a next-generation platform. What does that mean? A platform should be able to do multiple things, that is, more than just batch processing. It needs batch, interactive, online, and streaming capabilities to really turn Hadoop into a next-gen platform. It also scales: Yahoo plans to move to a 10k-node cluster.
  • Now we have a concept of deploying applications into the Hadoop cluster. These applications run in containers with a set amount of resources.
  • The ResourceManager (RM) takes the place of the JobTracker (JT) and still has scheduling queues, such as the fair, capacity, and hierarchical queues.

    1. YARN Code Overview: Ocular bleeding is no reason to stop programming! © Hortonworks Inc. 2013
    2. Quick Bio – Joseph Niemiec
        • Hadoop user for 2+ years
        • One of five authors of Apache Hadoop YARN
        • Originally used Hadoop for location-based services (March 2014)
            – Destination Prediction
            – Traffic Analysis
            – Effects of weather at client locations on call center call types
        • Pending patent in the Automotive/Telematics domain
        • Defensive paper on M2M validation
        • Started on analytics to be better at an MMORPG
    3. Agenda
        • What Is YARN
        • YARN Concepts & Architecture
        • Code and more Code
        • Q&A
    4. From Batch To Anything
        • HADOOP 1.0: Single Use System (Batch Apps)
            – MapReduce (cluster resource management & data processing)
            – HDFS (redundant, reliable storage)
        • HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
            – MapReduce (data processing), Others (data processing)
            – YARN (cluster resource management)
            – HDFS2 (redundant, reliable storage)
    5. Concepts
        • Application
            – Application is a job submitted to the framework
            – Examples: Map Reduce Job, MoYa Cluster
        • Container
            – Basic unit of allocation
            – Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.)
                – container_0 = 2GB, 1 CPU
                – container_1 = 1GB, 6 CPU
            – Replaces the fixed map/reduce slots
    6. Architecture
        • Resource Manager
            – Global resource scheduler
            – Hierarchical queues
        • Node Manager
            – Per-machine agent
            – Manages the life-cycle of container
            – Container resource monitoring
        • Application Master
            – Per-application
            – Manages application scheduling and task execution
            – E.g. MapReduce Application Master
    7. To the code!
    8. Q&A
    9. YARN - ApplicationMaster
        • ApplicationMaster
            – ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by Client
            – ResourceManager responsible for allocating and launching ApplicationMaster container
        • ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue
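The idea that the submission context is a "complete specification" can be illustrated with a small plain-Java model. This is only a sketch: `SubmissionSketch` and its `isComplete` check are hypothetical names invented here, not the real `ApplicationSubmissionContext` class, and the resource request is flattened to two integers for brevity.

```java
import java.util.List;

// Hypothetical stand-in for the four fields named on the slide. It is NOT the
// real org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext class.
final class SubmissionSketch {
    final String appName;
    final String queue;
    final int amMemoryMb;           // resourceRequest: memory for the AM container
    final int amVcores;             // resourceRequest: cores for the AM container
    final List<String> amCommands;  // containerLaunchContext: how to start the AM

    SubmissionSketch(String appName, String queue, int amMemoryMb, int amVcores,
                     List<String> amCommands) {
        this.appName = appName;
        this.queue = queue;
        this.amMemoryMb = amMemoryMb;
        this.amVcores = amVcores;
        this.amCommands = amCommands;
    }

    // "Complete" here means a ResourceManager-like component could allocate and
    // launch the AM container from this object alone (an assumption of the sketch).
    boolean isComplete() {
        return !appName.isEmpty()
                && !queue.isEmpty()
                && amMemoryMb > 0
                && amVcores > 0
                && !amCommands.isEmpty();
    }
}
```

In this model a client would fill in every field before handing the object over; a context with no launch commands is rejected because nothing could be started from it.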
    10. YARN – Resource Allocation & Usage
        • ContainerLaunchContext
            – The context provided by ApplicationMaster to NodeManager to launch the Container
            – Complete specification for a process
            – LocalResource used to specify container binary and dependencies
            – NodeManager responsible for downloading from shared namespace (typically HDFS)
        • ContainerLaunchContext fields: container, commands, environment, localResources
        • LocalResource fields: uri, type
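A rough way to picture what a NodeManager does with a launch context is to turn it into an ordered list of steps: fetch each local resource, set the environment, then run the commands. The classes below are hypothetical plain-Java stand-ins, not the YARN API, and the step ordering and the `FILE`/`ARCHIVE` handling are simplifying assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a LocalResource: where it lives and what kind it is.
final class LocalResourceSketch {
    enum Type { FILE, ARCHIVE }
    final String uri;
    final Type type;

    LocalResourceSketch(String uri, Type type) {
        this.uri = uri;
        this.type = type;
    }
}

// Hypothetical model of the launch context fields shown on the slide.
final class LaunchContextSketch {
    final List<String> commands;
    final Map<String, String> environment;
    final Map<String, LocalResourceSketch> localResources; // keyed by local file name

    LaunchContextSketch(List<String> commands, Map<String, String> environment,
                        Map<String, LocalResourceSketch> localResources) {
        this.commands = commands;
        this.environment = environment;
        this.localResources = localResources;
    }

    // Steps a NodeManager-like agent might take, in order.
    List<String> launchPlan() {
        List<String> plan = new ArrayList<>();
        for (Map.Entry<String, LocalResourceSketch> e : localResources.entrySet()) {
            LocalResourceSketch r = e.getValue();
            String verb = r.type == LocalResourceSketch.Type.ARCHIVE ? "fetch+unpack" : "fetch";
            plan.add(verb + " " + r.uri + " -> " + e.getKey());
        }
        for (Map.Entry<String, String> e : environment.entrySet()) {
            plan.add("export " + e.getKey() + "=" + e.getValue());
        }
        for (String c : commands) {
            plan.add("run " + c);
        }
        return plan;
    }
}
```

The point of the model is that the context carries everything needed to start the process, so the agent never has to ask the ApplicationMaster for more information.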
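    11. YARN – Resource Allocation & Usage
        • ResourceRequest fields: priority, capability, resourceName, numContainers
        • Example requests
            – priority 0, capability <2gb, 1 core>: resourceName host01, numContainers 1
            – priority 0, capability <2gb, 1 core>: resourceName rack0, numContainers 1
            – priority 0, capability <2gb, 1 core>: resourceName *, numContainers 1
            – priority 1, capability <4gb, 1 core>: resourceName *, numContainers 1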
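The host01, rack0 and `*` entries on this slide suggest that one locality preference is expressed as several rows at the same priority, from most to least specific. The sketch below models that expansion in plain Java. `RequestRow` and `LocalityExpansion` are hypothetical names, and the rule "one row each for host, rack and wildcard" is an assumption made for illustration rather than a statement of what the scheduler requires.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row with the four ResourceRequest fields named on the slide.
final class RequestRow {
    final int priority;
    final int memoryMb;
    final int vcores;
    final String resourceName;
    final int numContainers;

    RequestRow(int priority, int memoryMb, int vcores, String resourceName, int numContainers) {
        this.priority = priority;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
        this.resourceName = resourceName;
        this.numContainers = numContainers;
    }
}

final class LocalityExpansion {
    static final String ANY = "*"; // wildcard: any node in the cluster

    // Expands "n containers, preferably on this host" into host, rack and wildcard rows.
    static List<RequestRow> expand(int priority, int memoryMb, int vcores,
                                   String host, String rack, int n) {
        List<RequestRow> rows = new ArrayList<>();
        rows.add(new RequestRow(priority, memoryMb, vcores, host, n));
        rows.add(new RequestRow(priority, memoryMb, vcores, rack, n));
        rows.add(new RequestRow(priority, memoryMb, vcores, ANY, n));
        return rows;
    }
}
```

Calling `LocalityExpansion.expand(0, 2048, 1, "host01", "rack0", 1)` yields three rows that share a priority and capability and differ only in `resourceName`.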
    12. YARN – Resource Allocation & Usage
        • Container
            – The basic unit of allocation in YARN
            – The result of the ResourceRequest provided by ResourceManager to the ApplicationMaster
            – A specific amount of resources (cpu, memory etc.) on a specific machine
        • Container fields: containerId, resourceName, capability, tokens
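To make "a specific amount of resources on a specific machine" concrete, here is a toy allocator in plain Java. It is a sketch under stated assumptions: `NodeSketch` and `ContainerSketch` are hypothetical classes, only memory and CPU are tracked, the id format is made up, and the `tokens` field from the slide is left out entirely.

```java
// Hypothetical result of an allocation: resources pinned to one node.
final class ContainerSketch {
    final String containerId;
    final String nodeName;
    final int memoryMb;
    final int vcores;

    ContainerSketch(String containerId, String nodeName, int memoryMb, int vcores) {
        this.containerId = containerId;
        this.nodeName = nodeName;
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }
}

// Hypothetical node that hands out containers until its capacity runs out.
final class NodeSketch {
    private final String name;
    private int freeMemoryMb;
    private int freeVcores;
    private int nextId = 0;

    NodeSketch(String name, int memoryMb, int vcores) {
        this.name = name;
        this.freeMemoryMb = memoryMb;
        this.freeVcores = vcores;
    }

    // Returns a container if the request fits on this node, otherwise null.
    ContainerSketch allocate(int memoryMb, int vcores) {
        if (memoryMb > freeMemoryMb || vcores > freeVcores) {
            return null;
        }
        freeMemoryMb -= memoryMb;
        freeVcores -= vcores;
        return new ContainerSketch("container_" + (nextId++), name, memoryMb, vcores);
    }
}
```

Unlike fixed map/reduce slots, differently shaped requests (for example 2 GB with 1 core and 1 GB with 6 cores) can share the same node as long as the totals fit.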
