SlideShare a Scribd company logo
1 of 26
Writing Application Frameworks
on Apache Hadoop YARN


Hitesh Shah
hitesh@hortonworks.com




© Hortonworks Inc. 2011      Page 1
Hitesh Shah - Background
• Member of Technical Staff at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various
  infrastructure pieces all the way from data storage
  platforms to high throughput online ad-serving
  systems.




     Architecting the Future of Big Data
                                                        Page 2
     © Hortonworks Inc. 2011
Agenda

• YARN Architecture and Concepts
• Writing a New Framework




   Architecting the Future of Big Data
                                         Page 3
   © Hortonworks Inc. 2011
YARN Architecture
• Resource Manager
  – Global resource scheduler
  – Hierarchical queues

• Node Manager
  – Per-machine agent
  – Manages the life-cycle of container
  – Container resource monitoring

• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – E.g. MapReduce Application Master

     Architecting the Future of Big Data
                                                        Page 4
     © Hortonworks Inc. 2011
YARN Architecture

                                                            Node
                                                           Manager


                                                    Container   App Mstr


            Client

                                         Resource           Node
                                         Manager           Manager
            Client

                                                    App Mstr    Container




               MapReduce Status                             Node
                                                           Manager
                 Job Submission
                 Node Status
               Resource Request                     Container   Container




   Architecting the Future of Big Data
                                                                            Page 5
   © Hortonworks Inc. 2011
YARN Concepts
• Application ID
  – Application Attempt IDs
• Container
  – ContainerLaunchContext
• ResourceRequest
  – Host/Rack/Any match
  – Priority
  – Resource constraints
• Local Resource
  – File/Archive
  – Visibility – public/private/application


      Architecting the Future of Big Data
                                              Page 6
      © Hortonworks Inc. 2011
What you need for a new Framework
• Application Submission Client
  – For example, the MR Job Client
• Application Master
  – The core framework library
• Application History ( optional )
  – History of all previously run instances
• Auxiliary Services ( optional )
  – Long-running application-specific services running on the
    NodeManager




     Architecting the Future of Big Data
                                                                Page 7
     © Hortonworks Inc. 2011
Use Case: Distributed Shell
• Take a user-provided script               Node
  or application and run it on a            Manager
  set of nodes in the Cluster
                                               DS AppMaster

• Input:
   – User Script to execute
   – Number of containers to run on         Node
                                            Manager
   – Variable arguments for each
     different container                         Shell Script
   – Memory requirements for the
     shell script                           Node
   – Output Location/Dir                    Manager
                                                 Shell Script


      Architecting the Future of Big Data
                                                                Page 8
      © Hortonworks Inc. 2011
Client: RPC calls
•  Uses ClientRM Protocol
                                                        ClientRMProtocol#getNewApplication

•  Get a new Application
   ID from the RM
                                                        ClientRMProtocol#submitApplication



•  Application Submission                      CLIENT
                                                                                                RM

                                                        ClientRMProtocol#getApplicationReport


•  Application Monitoring
                                                         ClientRMProtocol#killApplication


•  Kill the Application?




         Architecting the Future of Big Data
                                                                                                Page 9
         © Hortonworks Inc. 2011
Client
• Registration with the RM
   – New Application ID


• Application Submission
   – User information
   – Scheduler queue
   – Define the container for the Distributed Shell App Master via
     the ContainerLaunchContext

•  Application Monitoring
   – AppMaster host details with tokens if needed, tracking url
   – Application Status (submitted/running/finished)


       Architecting the Future of Big Data
                                                                  Page 10
       © Hortonworks Inc. 2011
Defining a Container
• ContainerLaunchContext class
  – Can run a shell script, a java process or launch a VM


• Command(s) to run
• Local resources needed for the process to run
  – Dependent jars, native libs, data files/archives
• Environment to setup
  – Java Classpath
• Security-related data
  – Container Tokens



      Architecting the Future of Big Data
                                                            Page 11
      © Hortonworks Inc. 2011
Application Master: RPC calls
•  AMRM and CM protocols
                                              Client

•  Register AM with RM                                         AMRM.registerAM


•  Ask RM to allocate
   resources                                                       AMRM.allocate
                                                          AM
                                                                                          RM
•  Launch tasks on
   allocated containers                                                       AMRM.
                                                                             finishAM
                                                 App-specific
                                                    RPC
•  Manage tasks to final
   completion
                                                                CM.startContainer

•  Inform RM of completion                               NM      NM




        Architecting the Future of Big Data
                                                                                       Page 12
        © Hortonworks Inc. 2011
Application Master
•  Setup RPC to handle requests from Client and/or tasks launched
   on Containers

•  Register and send regular heartbeats to the RM

•  Request resources from the RM.

•  Launch user shell script on containers as and when allocated.

•  Monitor status of user script of remote containers and manage
   failures by retrying if needed.

•  Inform RM of completion when application is done.


       Architecting the Future of Big Data
                                                                   Page 13
       © Hortonworks Inc. 2011
AMRM#allocate
• Request:
  – Containers needed
      – Not a delta protocol
      – Locality constraints: Host/Rack/Any
      – Resource constraints: memory
      – Priority-based assignments

  – Containers to release – extra/unwanted?
      – Only non-launched containers

• Response:
  – Allocated Containers
      – Launch or release

  – Completed Containers
      – Status of completion

     Architecting the Future of Big Data
                                              Page 14
     © Hortonworks Inc. 2011
YARN Applications
• Data Processing:
  – OpenMPI on Hadoop
  – Spark (UC Berkeley)
       –  Shark ( Hive-on-Spark )

  – Real-time data processing
       –  Storm ( Twitter )
       –  Apache S4

  – Graph processing – Apache Giraph
• Beyond data:
  – Deploying Apache HBase via YARN (HBASE-4329)
  – Hbase Co-processors via YARN (HBASE-4047)




      Architecting the Future of Big Data
                                                   Page 15
      © Hortonworks Inc. 2011
References

• Doc on writing new applications:
  – WritingYarnApplications.html ( available at
   http://hadoop.apache.org/common/docs/r2.0.0-
   alpha/ )




     Architecting the Future of Big Data
                                                  Page 16
     © Hortonworks Inc. 2011
Questions?


Thank You!
Hitesh Shah
hitesh@hortonworks.com




       Architecting the Future of Big Data
                                             Page 17
       © Hortonworks Inc. 2011
Appendix: Code Examples




  Architecting the Future of Big Data
                                        Page 18
  © Hortonworks Inc. 2011
Client: Registration
ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress = NetUtils.createSocketAddr(
  yarnConf.get(YarnConfiguration.RM_ADDRESS));

applicationsManager = ((ClientRMProtocol)
  rpc.getProxy(ClientRMProtocol.class,
               rmAddress, appsManagerServerConf));

GetNewApplicationRequest request =
  Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
  applicationsManager.getNewApplication(request);




       Architecting the Future of Big Data
                                                            Page 19
       © Hortonworks Inc. 2011
Client: App Submission
ApplicationSubmissionContext appContext;

ContainerLaunchContext amContainer;
amContainer.setLocalResources(Map<String, LocalResource> localResources);
amContainer.setEnvironment(Map<String, String> env);
String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster " + " arg1 arg2
“;
amContainer.setCommands(List<String> commands);
Resource capability; capability.setMemory(amMemory); amContainer.setResource
(capability);

appContext.setAMContainerSpec(amContainer);

SubmitApplicationRequest appRequest;
appRequest.setApplicationSubmissionContext(appContext);

applicationsManager.submitApplication(appRequest);


        Architecting the Future of Big Data
                                                                         Page 20
        © Hortonworks Inc. 2011
Client: App Monitoring
•  Get Application Status

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
  applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();


•  Kill the application

KillApplicationRequest killRequest =
      Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

       Architecting the Future of Big Data
                                                                    Page 21
       © Hortonworks Inc. 2011
AM: Ask RM for Containers
ResourceRequest rsrcRequest;
rsrcRequest.setHostName("*”); // hostname, rack, wildcard
rsrcRequest.setPriority(pri);
Resource capability; capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability)
rsrcRequest.setNumContainers(numContainers);

List<ResourceRequest> requestedContainers;
List<ContainerId> releasedContainers;

AllocateRequest req;
req.setResponseId(rmRequestID);
req.addAllAsks(requestedContainers);
req.addAllReleases(releasedContainers);
req.setProgress(currentProgress);
AllocateResponse allocateResponse = resourceManager.allocate(req);



        Architecting the Future of Big Data
                                                                     Page 22
        © Hortonworks Inc. 2011
AM: Launch Containers
AMResponse amResp = allocateResponse.getAMResponse();

ContainerManager cm = (ContainerManager)rpc.getProxy
  (ContainerManager.class, cmAddress, conf);

List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
   ContainerLaunchContext ctx;
   ctx.setContainerId(allocatedContainer .getId());
   ctx.setResource(allocatedContainer .getResource());
   // set env, command, local resources, …

    StartContainerRequest startReq;
    startReq.setContainerLaunchContext(ctx);
    cm.startContainer(startReq);
}

         Architecting the Future of Big Data
                                                                         Page 23
         © Hortonworks Inc. 2011
AM: Monitoring Containers
•  Running Containers
GetContainerStatusRequest statusReq;
statusReq.setContainerId(containerId);
GetContainerStatusResponse statusResp =
  cm.getContainerStatus(statusReq);


•  Completed Containers
AMResponse amResp = allocateResponse.getAMResponse();
List<Container> completedContainersStatus =
  amResp.getCompletedContainerStatuses();
for (ContainerStatus containerStatus : completedContainers) {
    // containerStatus.getContainerId()
    // containerStatus.getExitStatus()
    // containerStatus.getDiagnostics()
}



        Architecting the Future of Big Data
                                                                Page 24
        © Hortonworks Inc. 2011
AM: I am done
FinishApplicationMasterRequest finishReq;
finishReq.setAppAttemptId(appAttemptID);

finishReq.setFinishApplicationStatus
   (FinalApplicationStatus.SUCCEEDED); // or FAILED

finishReq.setDiagnostics(diagnostics);

resourceManager.finishApplicationMaster(finishReq);




       Architecting the Future of Big Data
                                                      Page 25
       © Hortonworks Inc. 2011
Thank You!


             Page 26

More Related Content

What's hot

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopHortonworks
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Debugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionDebugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionXuan Gong
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveDataWorks Summit
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambariHortonworks
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureDataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsDataWorks Summit
 

What's hot (20)

Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Apache Slider
Apache SliderApache Slider
Apache Slider
 
YARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo HadoopYARN - Next Generation Compute Platform fo Hadoop
YARN - Next Generation Compute Platform fo Hadoop
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Hadoop YARN Services
Hadoop YARN ServicesHadoop YARN Services
Hadoop YARN Services
 
Debugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in ProductionDebugging Apache Hadoop YARN Cluster in Production
Debugging Apache Hadoop YARN Cluster in Production
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
LLAP: long-lived execution in Hive
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
 
Hortonworks technical workshop operations with ambari
Hortonworks technical workshop   operations with ambariHortonworks technical workshop   operations with ambari
Hortonworks technical workshop operations with ambari
 
YARN and the Docker container runtime
YARN and the Docker container runtimeYARN and the Docker container runtime
YARN and the Docker container runtime
 
Yarn
YarnYarn
Yarn
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
A Multi Colored YARN
A Multi Colored YARNA Multi Colored YARN
A Multi Colored YARN
 
Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010Database as a Service - Tutorial @ICDE 2010
Database as a Service - Tutorial @ICDE 2010
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
Apache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and FutureApache Hadoop YARN 2015: Present and Future
Apache Hadoop YARN 2015: Present and Future
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agentsTuning Apache Ambari performance for Big Data at scale with 3000 agents
Tuning Apache Ambari performance for Big Data at scale with 3000 agents
 

Similar to Writing app framworks for hadoop on yarn

Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012hitesh1892
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史Insight Technology, Inc.
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Apache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationApache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationHortonworks
 
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...Zhijie Shen
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Cloudera, Inc.
 
Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInHortonworks
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
MapReduce Container ReUse
MapReduce Container ReUseMapReduce Container ReUse
MapReduce Container ReUseHortonworks
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Tsuyoshi OZAWA
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHortonworks
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireCarter Shanklin
 
Building a multi-tenanted Cloud-native AppServer
Building a multi-tenanted Cloud-native AppServerBuilding a multi-tenanted Cloud-native AppServer
Building a multi-tenanted Cloud-native AppServerAfkham Azeez
 

Similar to Writing app framworks for hadoop on yarn (20)

Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012Writing YARN Applications Hadoop Summit 2012
Writing YARN Applications Hadoop Summit 2012
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Apache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationApache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup Presentation
 
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
 
Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedIn
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
MapReduce Container ReUse
MapReduce Container ReUseMapReduce Container ReUse
MapReduce Container ReUse
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014
 
Running Legacy Applications with Containers
Running Legacy Applications with ContainersRunning Legacy Applications with Containers
Running Legacy Applications with Containers
 
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next GenHadoop World 2011, Apache Hadoop MapReduce Next Gen
Hadoop World 2011, Apache Hadoop MapReduce Next Gen
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFire
 
Building a multi-tenanted Cloud-native AppServer
Building a multi-tenanted Cloud-native AppServerBuilding a multi-tenanted Cloud-native AppServer
Building a multi-tenanted Cloud-native AppServer
 
Yarn
YarnYarn
Yarn
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 

Recently uploaded (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Writing app framworks for hadoop on yarn

  • 1. Writing Application Frameworks on Apache Hadoop YARN Hitesh Shah hitesh@hortonworks.com © Hortonworks Inc. 2011 Page 1
  • 2. Hitesh Shah - Background • Member of Technical Staff at Hortonworks Inc. • Committer for Apache MapReduce and Ambari • Earlier, spent 8+ years at Yahoo! building various infrastructure pieces all the way from data storage platforms to high throughput online ad-serving systems. Architecting the Future of Big Data Page 2 © Hortonworks Inc. 2011
  • 3. Agenda • YARN Architecture and Concepts • Writing a New Framework Architecting the Future of Big Data Page 3 © Hortonworks Inc. 2011
  • 4. YARN Architecture • Resource Manager – Global resource scheduler – Hierarchical queues • Node Manager – Per-machine agent – Manages the life-cycle of container – Container resource monitoring • Application Master – Per-application – Manages application scheduling and task execution – E.g. MapReduce Application Master Architecting the Future of Big Data Page 4 © Hortonworks Inc. 2011
  • 5. YARN Architecture Node Manager Container App Mstr Client Resource Node Manager Manager Client App Mstr Container MapReduce Status Node Manager Job Submission Node Status Resource Request Container Container Architecting the Future of Big Data Page 5 © Hortonworks Inc. 2011
  • 6. YARN Concepts • Application ID – Application Attempt IDs • Container – ContainerLaunchContext • ResourceRequest – Host/Rack/Any match – Priority – Resource constraints • Local Resource – File/Archive – Visibility – public/private/application Architecting the Future of Big Data Page 6 © Hortonworks Inc. 2011
  • 7. What you need for a new Framework • Application Submission Client – For example, the MR Job Client • Application Master – The core framework library • Application History ( optional ) – History of all previously run instances • Auxiliary Services ( optional ) – Long-running application-specific services running on the NodeManager Architecting the Future of Big Data Page 7 © Hortonworks Inc. 2011
  • 8. Use Case: Distributed Shell • Take a user-provided script Node or application and run it on a Manager set of nodes in the Cluster DS AppMaster • Input: – User Script to execute – Number of containers to run on Node Manager – Variable arguments for each different container Shell Script – Memory requirements for the shell script Node – Output Location/Dir Manager Shell Script Architecting the Future of Big Data Page 8 © Hortonworks Inc. 2011
  • 9. Client: RPC calls •  Uses ClientRM Protocol ClientRMProtocol#getNewApplication •  Get a new Application ID from the RM ClientRMProtocol#submitApplication •  Application Submission CLIENT RM ClientRMProtocol#getApplicationReport •  Application Monitoring ClientRMProtocol#killApplication •  Kill the Application? Architecting the Future of Big Data Page 9 © Hortonworks Inc. 2011
  • 10. Client • Registration with the RM – New Application ID • Application Submission – User information – Scheduler queue – Define the container for the Distributed Shell App Master via the ContainerLaunchContext •  Application Monitoring – AppMaster host details with tokens if needed, tracking url – Application Status (submitted/running/finished) Architecting the Future of Big Data Page 10 © Hortonworks Inc. 2011
  • 11. Defining a Container • ContainerLaunchContext class – Can run a shell script, a java process or launch a VM • Command(s) to run • Local resources needed for the process to run – Dependent jars, native libs, data files/archives • Environment to setup – Java Classpath • Security-related data – Container Tokens Architecting the Future of Big Data Page 11 © Hortonworks Inc. 2011
  • 12. Application Master: RPC calls •  AMRM and CM protocols Client •  Register AM with RM AMRM.registerAM •  Ask RM to allocate resources AMRM.allocate AM RM •  Launch tasks on allocated containers AMRM. finishAM App-specific RPC •  Manage tasks to final completion CM.startContainer •  Inform RM of completion NM NM Architecting the Future of Big Data Page 12 © Hortonworks Inc. 2011
  • 13. Application Master •  Setup RPC to handle requests from Client and/or tasks launched on Containers •  Register and send regular heartbeats to the RM •  Request resources from the RM. •  Launch user shell script on containers as and when allocated. •  Monitor status of user script of remote containers and manage failures by retrying if needed. •  Inform RM of completion when application is done. Architecting the Future of Big Data Page 13 © Hortonworks Inc. 2011
  • 14. AMRM#allocate • Request: – Containers needed – Not a delta protocol – Locality constraints: Host/Rack/Any – Resource constraints: memory – Priority-based assignments – Containers to release – extra/unwanted? – Only non-launched containers • Response: – Allocated Containers – Launch or release – Completed Containers – Status of completion Architecting the Future of Big Data Page 14 © Hortonworks Inc. 2011
  • 15. YARN Applications • Data Processing: – OpenMPI on Hadoop – Spark (UC Berkeley) –  Shark ( Hive-on-Spark ) – Real-time data processing –  Storm ( Twitter ) –  Apache S4 – Graph processing – Apache Giraph • Beyond data: – Deploying Apache HBase via YARN (HBASE-4329) – Hbase Co-processors via YARN (HBASE-4047) Architecting the Future of Big Data Page 15 © Hortonworks Inc. 2011
  • 16. References • Doc on writing new applications: – WritingYarnApplications.html ( available at http://hadoop.apache.org/common/docs/r2.0.0- alpha/ ) Architecting the Future of Big Data Page 16 © Hortonworks Inc. 2011
  • 17. Questions? Thank You! Hitesh Shah hitesh@hortonworks.com Architecting the Future of Big Data Page 17 © Hortonworks Inc. 2011
  • 18. Appendix: Code Examples Architecting the Future of Big Data Page 18 © Hortonworks Inc. 2011
  • 19. Client: Registration ClientRMProtocol applicationsManager; YarnConfiguration yarnConf = new YarnConfiguration(conf); InetSocketAddress rmAddress = NetUtils.createSocketAddr( yarnConf.get(YarnConfiguration.RM_ADDRESS)); applicationsManager = ((ClientRMProtocol) rpc.getProxy(ClientRMProtocol.class, rmAddress, appsManagerServerConf)); GetNewApplicationRequest request = Records.newRecord(GetNewApplicationRequest.class); GetNewApplicationResponse response = applicationsManager.getNewApplication(request); Architecting the Future of Big Data Page 19 © Hortonworks Inc. 2011
  • 20. Client: App Submission ApplicationSubmissionContext appContext; ContainerLaunchContext amContainer; amContainer.setLocalResources(Map<String, LocalResource> localResources); amContainer.setEnvironment(Map<String, String> env); String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster " + " arg1 arg2 “; amContainer.setCommands(List<String> commands); Resource capability; capability.setMemory(amMemory); amContainer.setResource (capability); appContext.setAMContainerSpec(amContainer); SubmitApplicationRequest appRequest; appRequest.setApplicationSubmissionContext(appContext); applicationsManager.submitApplication(appRequest); Architecting the Future of Big Data Page 20 © Hortonworks Inc. 2011
  • 21. Client: App Monitoring •  Get Application Status GetApplicationReportRequest reportRequest = Records.newRecord(GetApplicationReportRequest.class); reportRequest.setApplicationId(appId); GetApplicationReportResponse reportResponse = applicationsManager.getApplicationReport(reportRequest); ApplicationReport report = reportResponse.getApplicationReport(); •  Kill the application KillApplicationRequest killRequest = Records.newRecord(KillApplicationRequest.class); killRequest.setApplicationId(appId); applicationsManager.forceKillApplication(killRequest); Architecting the Future of Big Data Page 21 © Hortonworks Inc. 2011
  • 22. AM: Ask RM for Containers ResourceRequest rsrcRequest; rsrcRequest.setHostName("*”); // hostname, rack, wildcard rsrcRequest.setPriority(pri); Resource capability; capability.setMemory(containerMemory); rsrcRequest.setCapability(capability) rsrcRequest.setNumContainers(numContainers); List<ResourceRequest> requestedContainers; List<ContainerId> releasedContainers; AllocateRequest req; req.setResponseId(rmRequestID); req.addAllAsks(requestedContainers); req.addAllReleases(releasedContainers); req.setProgress(currentProgress); AllocateResponse allocateResponse = resourceManager.allocate(req); Architecting the Future of Big Data Page 22 © Hortonworks Inc. 2011
  • 23. AM: Launch Containers AMResponse amResp = allocateResponse.getAMResponse(); ContainerManager cm = (ContainerManager)rpc.getProxy (ContainerManager.class, cmAddress, conf); List<Container> allocatedContainers = amResp.getAllocatedContainers(); for (Container allocatedContainer : allocatedContainers) { ContainerLaunchContext ctx; ctx.setContainerId(allocatedContainer .getId()); ctx.setResource(allocatedContainer .getResource()); // set env, command, local resources, … StartContainerRequest startReq; startReq.setContainerLaunchContext(ctx); cm.startContainer(startReq); } Architecting the Future of Big Data Page 23 © Hortonworks Inc. 2011
  • 24. AM: Monitoring Containers •  Running Containers GetContainerStatusRequest statusReq; statusReq.setContainerId(containerId); GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq); •  Completed Containers AMResponse amResp = allocateResponse.getAMResponse(); List<Container> completedContainersStatus = amResp.getCompletedContainerStatuses(); for (ContainerStatus containerStatus : completedContainers) { // containerStatus.getContainerId() // containerStatus.getExitStatus() // containerStatus.getDiagnostics() } Architecting the Future of Big Data Page 24 © Hortonworks Inc. 2011
  • 25. AM: I am done FinishApplicationMasterRequest finishReq; finishReq.setAppAttemptId(appAttemptID); finishReq.setFinishApplicationStatus (FinalApplicationStatus.SUCCEEDED); // or FAILED finishReq.setDiagnostics(diagnostics); resourceManager.finishApplicationMaster(finishReq); Architecting the Future of Big Data Page 25 © Hortonworks Inc. 2011
  • 26. Thank You! Page 26