SlideShare a Scribd company logo
1 of 25
Writing Application Frameworks
on Apache Hadoop YARN


Hitesh Shah
hitesh@hortonworks.com




© Hortonworks Inc. 2011      Page 1
Hitesh Shah - Background
• Member of Technical Staff at Hortonworks Inc.
• Committer for Apache MapReduce and Ambari
• Earlier, spent 8+ years at Yahoo! building various
  infrastructure pieces all the way from data storage
  platforms to high throughput online ad-serving
  systems.




     Architecting the Future of Big Data
                                                        Page 2
     © Hortonworks Inc. 2011
Agenda

•YARN Architecture and Concepts
•Writing a New Framework




   Architecting the Future of Big Data
                                         Page 3
   © Hortonworks Inc. 2011
YARN Architecture
• Resource Manager
  –Global resource scheduler
  –Hierarchical queues
• Node Manager
  –Per-machine agent
  –Manages the life-cycle of container
  –Container resource monitoring
• Application Master
  –Per-application
  –Manages application scheduling and task execution
  –E.g. MapReduce Application Master

     Architecting the Future of Big Data
                                                       Page 4
     © Hortonworks Inc. 2011
YARN Architecture

                                                            Node
                                                           Manager


                                                    Container   App Mstr


            Client

                                         Resource           Node
                                         Manager           Manager
            Client

                                                    App Mstr    Container




              MapReduce Status                              Node
                                                           Manager
                 Job Submission
                 Node Status
               Resource Request                     Container   Container




   Architecting the Future of Big Data
                                                                            Page 5
   © Hortonworks Inc. 2011
YARN Concepts
• Application ID
  –Application Attempt IDs
• Container
  –ContainerLaunchContext
• ResourceRequest
  –Host/Rack/Any match
  –Priority
  –Resource constraints
• Local Resource
  –File/Archive
  –Visibility – public/private/application


      Architecting the Future of Big Data
                                             Page 6
      © Hortonworks Inc. 2011
What you need for a new Framework
• Application Submission Client
  –For example, the MR Job Client
• Application Master
  –The core framework library
• Application History ( optional )
  –History of all previously run instances
• Auxiliary Services ( optional )
  –Long-running application-specific services running on the
   NodeManager




     Architecting the Future of Big Data
                                                               Page 7
     © Hortonworks Inc. 2011
Use Case: Distributed Shell
• Take a user-provided script               Node
  or application and run it on a            Manager
  set of nodes in the Cluster
                                               DS AppMaster

• Input:
   – User Script to execute
   – Number of containers to run on         Node
                                            Manager
   – Variable arguments for each
     different container                         Shell Script
   – Memory requirements for the
     shell script                           Node
   – Output Location/Dir                    Manager
                                                 Shell Script


      Architecting the Future of Big Data
                                                                Page 8
      © Hortonworks Inc. 2011
Client: RPC calls
• Uses ClientRM Protocol
                                                        ClientRMProtocol#getNewApplication

• Get a new Application
  ID from the RM
                                                        ClientRMProtocol#submitApplication



• Application Submission                       CLIENT
                                                                                                RM

                                                        ClientRMProtocol#getApplicationReport


• Application Monitoring
                                                         ClientRMProtocol#killApplication


• Kill the Application?




         Architecting the Future of Big Data
                                                                                                Page 9
         © Hortonworks Inc. 2011
Client
• Registration with the RM
  –New Application ID


• Application Submission
  –User information
  –Scheduler queue
  –Define the container for the Distributed Shell App Master via
   the ContainerLaunchContext

• Application Monitoring
  – AppMaster host details with tokens if needed, tracking url
  – Application Status (submitted/running/finished)


      Architecting the Future of Big Data
                                                                 Page 10
      © Hortonworks Inc. 2011
Defining a Container
• ContainerLaunchContext class
  –Can run a shell script, a java process or launch a VM


• Command(s) to run
• Local resources needed for the process to run
  –Dependent jars, native libs, data files/archives
• Environment to setup
  –Java Classpath
• Security-related data
  –Container Tokens



      Architecting the Future of Big Data
                                                           Page 11
      © Hortonworks Inc. 2011
Application Master: RPC calls
• AMRM and CM protocols
                                             Client

• Register AM with RM                                         AMRM.registerAM


• Ask RM to allocate
  resources                                                       AMRM.allocate
                                                         AM
                                                                                         RM
• Launch tasks on
  allocated containers                                                       AMRM.
                                                                            finishAM
                                                App-specific
• Manage tasks to final                            RPC

  completion
                                                               CM.startContainer

• Inform RM of completion                               NM      NM




       Architecting the Future of Big Data
                                                                                      Page 12
       © Hortonworks Inc. 2011
Application Master
• Setup RPC to handle requests from Client and/or tasks launched
  on Containers

• Register and send regular heartbeats to the RM

• Request resources from the RM.

• Launch user shell script on containers as and when allocated.

• Monitor status of user script of remote containers and manage
  failures by retrying if needed.

• Inform RM of completion when application is done.


      Architecting the Future of Big Data
                                                                  Page 13
      © Hortonworks Inc. 2011
AMRM#allocate
• Request:
  – Containers needed
      – Not a delta protocol
      – Locality constraints: Host/Rack/Any
      – Resource constraints: memory
      – Priority-based assignments

  – Containers to release – extra/unwanted?
      – Only non-launched containers

• Response:
  – Allocated Containers
      – Launch or release

  – Completed Containers
      – Status of completion

     Architecting the Future of Big Data
                                              Page 14
     © Hortonworks Inc. 2011
YARN Applications
• Data Processing:
  – OpenMPI on Hadoop
  – Spark (UC Berkeley)
       – Shark ( Hive-on-Spark )

  – Real-time data processing
       – Storm ( Twitter )
       – Apache S4

  – Graph processing – Apache Giraph
• Beyond data:
  – Deploying Apache HBase via YARN (HBASE-4329)
  – Hbase Co-processors via YARN (HBASE-4047)




      Architecting the Future of Big Data
                                                   Page 15
      © Hortonworks Inc. 2011
References

•Doc on writing new applications:
  –WritingYarnApplications.html ( available at
   http://hadoop.apache.org/common/docs/r2.0.0-
   alpha/ )




     Architecting the Future of Big Data
                                                 Page 16
     © Hortonworks Inc. 2011
Questions?


Thank You!
Hitesh Shah
hitesh@hortonworks.com




       Architecting the Future of Big Data
                                             Page 17
       © Hortonworks Inc. 2011
Appendix: Code
Examples



  Architecting the Future of Big Data
                                        Page 18
  © Hortonworks Inc. 2011
Client: Registration
ClientRMProtocol applicationsManager;
YarnConfiguration yarnConf = new YarnConfiguration(conf);
InetSocketAddress rmAddress = NetUtils.createSocketAddr(
  yarnConf.get(YarnConfiguration.RM_ADDRESS));

applicationsManager = ((ClientRMProtocol)
  rpc.getProxy(ClientRMProtocol.class,
               rmAddress, appsManagerServerConf));

GetNewApplicationRequest request =
  Records.newRecord(GetNewApplicationRequest.class);
GetNewApplicationResponse response =
  applicationsManager.getNewApplication(request);




       Architecting the Future of Big Data
                                                            Page 19
       © Hortonworks Inc. 2011
Client: App Submission
ApplicationSubmissionContext appContext;

ContainerLaunchContext amContainer;
amContainer.setLocalResources(Map<String, LocalResource> localResources);
amContainer.setEnvironment(Map<String, String> env);
String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster " + " arg1 arg2
“;
amContainer.setCommands(List<String> commands);
Resource capability; capability.setMemory(amMemory);
amContainer.setResource(capability);

appContext.setAMContainerSpec(amContainer);

SubmitApplicationRequest appRequest;
appRequest.setApplicationSubmissionContext(appContext);

applicationsManager.submitApplication(appRequest);


        Architecting the Future of Big Data
                                                                          Page 20
        © Hortonworks Inc. 2011
Client: App Monitoring
• Get Application Status

GetApplicationReportRequest reportRequest =
    Records.newRecord(GetApplicationReportRequest.class);
reportRequest.setApplicationId(appId);
GetApplicationReportResponse reportResponse =
  applicationsManager.getApplicationReport(reportRequest);
ApplicationReport report = reportResponse.getApplicationReport();


• Kill the application

KillApplicationRequest killRequest =
      Records.newRecord(KillApplicationRequest.class);
killRequest.setApplicationId(appId);
applicationsManager.forceKillApplication(killRequest);

       Architecting the Future of Big Data
                                                                    Page 21
       © Hortonworks Inc. 2011
AM: Ask RM for Containers
ResourceRequest rsrcRequest;
rsrcRequest.setHostName("*”); // hostname, rack, wildcard
rsrcRequest.setPriority(pri);
Resource capability; capability.setMemory(containerMemory);
rsrcRequest.setCapability(capability)
rsrcRequest.setNumContainers(numContainers);

List<ResourceRequest> requestedContainers;
List<ContainerId> releasedContainers;

AllocateRequest req;
req.setResponseId(rmRequestID);
req.addAllAsks(requestedContainers);
req.addAllReleases(releasedContainers);
req.setProgress(currentProgress);
AllocateResponse allocateResponse = resourceManager.allocate(req);



        Architecting the Future of Big Data
                                                                     Page 22
        © Hortonworks Inc. 2011
AM: Launch Containers
AMResponse amResp = allocateResponse.getAMResponse();

ContainerManager cm = (ContainerManager)rpc.getProxy
  (ContainerManager.class, cmAddress, conf);

List<Container> allocatedContainers = amResp.getAllocatedContainers();
for (Container allocatedContainer : allocatedContainers) {
   ContainerLaunchContext ctx;
   ctx.setContainerId(allocatedContainer .getId());
   ctx.setResource(allocatedContainer .getResource());
   // set env, command, local resources, …

    StartContainerRequest startReq;
    startReq.setContainerLaunchContext(ctx);
    cm.startContainer(startReq);
}

        Architecting the Future of Big Data
                                                                         Page 23
        © Hortonworks Inc. 2011
AM: Monitoring Containers
• Running Containers
GetContainerStatusRequest statusReq;
statusReq.setContainerId(containerId);
GetContainerStatusResponse statusResp =
  cm.getContainerStatus(statusReq);


• Completed Containers
AMResponse amResp = allocateResponse.getAMResponse();
List<Container> completedContainersStatus =
  amResp.getCompletedContainerStatuses();
for (ContainerStatus containerStatus : completedContainers) {
    // containerStatus.getContainerId()
    // containerStatus.getExitStatus()
    // containerStatus.getDiagnostics()
}



        Architecting the Future of Big Data
                                                                Page 24
        © Hortonworks Inc. 2011
AM: I am done
FinishApplicationMasterRequest finishReq;
finishReq.setAppAttemptId(appAttemptID);

finishReq.setFinishApplicationStatus
   (FinalApplicationStatus.SUCCEEDED); // or FAILED

finishReq.setDiagnostics(diagnostics);

resourceManager.finishApplicationMaster(finishReq);




       Architecting the Future of Big Data
                                                      Page 25
       © Hortonworks Inc. 2011

More Related Content

What's hot

Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersDataWorks Summit
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopHortonworks
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureVinod Kumar Vavilapalli
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsHortonworks
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerVertiCloud Inc
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureVinod Kumar Vavilapalli
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupRommel Garcia
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN AppsCloudera, Inc.
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Simplilearn
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnDataWorks Summit
 
An Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnAn Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnMike Frampton
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processinghitesh1892
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceHortonworks
 

What's hot (20)

Towards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN ClustersTowards SLA-based Scheduling on YARN Clusters
Towards SLA-based Scheduling on YARN Clusters
 
Apache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with HadoopApache Hadoop YARN - The Future of Data Processing with Hadoop
Apache Hadoop YARN - The Future of Data Processing with Hadoop
 
Hadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and FutureHadoop Summit Europe 2015 - YARN Present and Future
Hadoop Summit Europe 2015 - YARN Present and Future
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data ApplicationsApache Hadoop YARN - Enabling Next Generation Data Applications
Apache Hadoop YARN - Enabling Next Generation Data Applications
 
Yarn
YarnYarn
Yarn
 
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource Manager
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
YARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User GroupYARN - Presented At Dallas Hadoop User Group
YARN - Presented At Dallas Hadoop User Group
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Introduction to YARN Apps
Introduction to YARN AppsIntroduction to YARN Apps
Introduction to YARN Apps
 
Hadoop YARN overview
Hadoop YARN overviewHadoop YARN overview
Hadoop YARN overview
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
 
Writing app framworks for hadoop on yarn
Writing app framworks for hadoop on yarnWriting app framworks for hadoop on yarn
Writing app framworks for hadoop on yarn
 
An Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnAn Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop Yarn
 
Apache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data ProcessingApache Tez - Accelerating Hadoop Data Processing
Apache Tez - Accelerating Hadoop Data Processing
 
NextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduceNextGen Apache Hadoop MapReduce
NextGen Apache Hadoop MapReduce
 

Viewers also liked

Lars George HBase Seminar with O'REILLY Oct.12 2012
Lars George HBase Seminar with O'REILLY Oct.12 2012Lars George HBase Seminar with O'REILLY Oct.12 2012
Lars George HBase Seminar with O'REILLY Oct.12 2012Cloudera Japan
 
並列データベースシステムの概念と原理
並列データベースシステムの概念と原理並列データベースシステムの概念と原理
並列データベースシステムの概念と原理Makoto Yui
 
Data analytics with hadoop hive on multiple data centers
Data analytics with hadoop hive on multiple data centersData analytics with hadoop hive on multiple data centers
Data analytics with hadoop hive on multiple data centersHirotaka Niisato
 
【17-E-3】 オンライン機械学習で実現する大規模データ処理
【17-E-3】 オンライン機械学習で実現する大規模データ処理【17-E-3】 オンライン機械学習で実現する大規模データ処理
【17-E-3】 オンライン機械学習で実現する大規模データ処理Developers Summit
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Hortonworks
 
Cloudera Manager4.0とNameNode-HAセミナー資料
Cloudera Manager4.0とNameNode-HAセミナー資料Cloudera Manager4.0とNameNode-HAセミナー資料
Cloudera Manager4.0とNameNode-HAセミナー資料Cloudera Japan
 
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)Satoshi Yamada
 
20120830 DBリファクタリング読書会第三回
20120830 DBリファクタリング読書会第三回20120830 DBリファクタリング読書会第三回
20120830 DBリファクタリング読書会第三回都元ダイスケ Miyamoto
 
あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界Yoshinori Nakanishi
 
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方kwatch
 
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterHadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterBill Graham
 
SQLチューニング入門 入門編
SQLチューニング入門 入門編SQLチューニング入門 入門編
SQLチューニング入門 入門編Miki Shimogai
 
Datalogからsqlへの トランスレータを書いた話
Datalogからsqlへの トランスレータを書いた話Datalogからsqlへの トランスレータを書いた話
Datalogからsqlへの トランスレータを書いた話Yuki Takeichi
 
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~Miki Shimogai
 

Viewers also liked (16)

Lars George HBase Seminar with O'REILLY Oct.12 2012
Lars George HBase Seminar with O'REILLY Oct.12 2012Lars George HBase Seminar with O'REILLY Oct.12 2012
Lars George HBase Seminar with O'REILLY Oct.12 2012
 
並列データベースシステムの概念と原理
並列データベースシステムの概念と原理並列データベースシステムの概念と原理
並列データベースシステムの概念と原理
 
Data analytics with hadoop hive on multiple data centers
Data analytics with hadoop hive on multiple data centersData analytics with hadoop hive on multiple data centers
Data analytics with hadoop hive on multiple data centers
 
【17-E-3】 オンライン機械学習で実現する大規模データ処理
【17-E-3】 オンライン機械学習で実現する大規模データ処理【17-E-3】 オンライン機械学習で実現する大規模データ処理
【17-E-3】 オンライン機械学習で実現する大規模データ処理
 
Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012Future of HCatalog - Hadoop Summit 2012
Future of HCatalog - Hadoop Summit 2012
 
Cloudera Manager4.0とNameNode-HAセミナー資料
Cloudera Manager4.0とNameNode-HAセミナー資料Cloudera Manager4.0とNameNode-HAセミナー資料
Cloudera Manager4.0とNameNode-HAセミナー資料
 
Database smells
Database smellsDatabase smells
Database smells
 
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
PostgreSQLの実行計画を読み解こう(OSC2015 Spring/Tokyo)
 
20120830 DBリファクタリング読書会第三回
20120830 DBリファクタリング読書会第三回20120830 DBリファクタリング読書会第三回
20120830 DBリファクタリング読書会第三回
 
あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界あなたの知らないPostgreSQL監視の世界
あなたの知らないPostgreSQL監視の世界
 
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方
【SQLインジェクション対策】徳丸先生に怒られない、動的SQLの安全な組み立て方
 
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterHadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
 
SQLチューニング入門 入門編
SQLチューニング入門 入門編SQLチューニング入門 入門編
SQLチューニング入門 入門編
 
ならば(その弐)
ならば(その弐)ならば(その弐)
ならば(その弐)
 
Datalogからsqlへの トランスレータを書いた話
Datalogからsqlへの トランスレータを書いた話Datalogからsqlへの トランスレータを書いた話
Datalogからsqlへの トランスレータを書いた話
 
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
PostgreSQLクエリ実行の基礎知識 ~Explainを読み解こう~
 

Similar to Writing Yarn Applications Hadoop Summit 2012

[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史Insight Technology, Inc.
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider projectSteve Loughran
 
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...Zhijie Shen
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Apache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationApache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationHortonworks
 
Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInHortonworks
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireCarter Shanklin
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Cloudera, Inc.
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Tsuyoshi OZAWA
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureDataWorks Summit
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARNDataWorks Summit
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Tsuyoshi OZAWA
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo DataWorks Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 

Similar to Writing Yarn Applications Hadoop Summit 2012 (20)

[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
Overview of slider project
Overview of slider projectOverview of slider project
Overview of slider project
 
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
ApacheCon North America 2014 - Apache Hadoop YARN: The Next-generation Distri...
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Apache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup PresentationApache Hadoop YARN - Hortonworks Meetup Presentation
Apache Hadoop YARN - Hortonworks Meetup Presentation
 
Field Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedInField Notes: YARN Meetup at LinkedIn
Field Notes: YARN Meetup at LinkedIn
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Virtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFireVirtualizing Latency Sensitive Workloads and vFabric GemFire
Virtualizing Latency Sensitive Workloads and vFabric GemFire
 
Apache Slider
Apache SliderApache Slider
Apache Slider
 
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
Hadoop World 2011: Next Generation Apache Hadoop MapReduce - Mohadev Konar, H...
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014Taming YARN @ Hadoop Conference Japan 2014
Taming YARN @ Hadoop Conference Japan 2014
 
Apache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and FutureApache Hadoop YARN: Past, Present and Future
Apache Hadoop YARN: Past, Present and Future
 
Get most out of Spark on YARN
Get most out of Spark on YARNGet most out of Spark on YARN
Get most out of Spark on YARN
 
Running Legacy Applications with Containers
Running Legacy Applications with ContainersRunning Legacy Applications with Containers
Running Legacy Applications with Containers
 
Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014Taming YARN @ Hadoop conference Japan 2014
Taming YARN @ Hadoop conference Japan 2014
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...SOFTTECHHUB
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...FIDO Alliance
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxFIDO Alliance
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 

Recently uploaded (20)

The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
The Ultimate Prompt Engineering Guide for Generative AI: Get the Most Out of ...
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...Hyatt driving innovation and exceptional customer experiences with FIDO passw...
Hyatt driving innovation and exceptional customer experiences with FIDO passw...
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 

Writing Yarn Applications Hadoop Summit 2012

  • 1. Writing Application Frameworks on Apache Hadoop YARN Hitesh Shah hitesh@hortonworks.com © Hortonworks Inc. 2011 Page 1
  • 2. Hitesh Shah - Background • Member of Technical Staff at Hortonworks Inc. • Committer for Apache MapReduce and Ambari • Earlier, spent 8+ years at Yahoo! building various infrastructure pieces all the way from data storage platforms to high throughput online ad-serving systems. Architecting the Future of Big Data Page 2 © Hortonworks Inc. 2011
  • 3. Agenda •YARN Architecture and Concepts •Writing a New Framework Architecting the Future of Big Data Page 3 © Hortonworks Inc. 2011
  • 4. YARN Architecture • Resource Manager –Global resource scheduler –Hierarchical queues • Node Manager –Per-machine agent –Manages the life-cycle of container –Container resource monitoring • Application Master –Per-application –Manages application scheduling and task execution –E.g. MapReduce Application Master Architecting the Future of Big Data Page 4 © Hortonworks Inc. 2011
  • 5. YARN Architecture Node Manager Container App Mstr Client Resource Node Manager Manager Client App Mstr Container MapReduce Status Node Manager Job Submission Node Status Resource Request Container Container Architecting the Future of Big Data Page 5 © Hortonworks Inc. 2011
  • 6. YARN Concepts • Application ID –Application Attempt IDs • Container –ContainerLaunchContext • ResourceRequest –Host/Rack/Any match –Priority –Resource constraints • Local Resource –File/Archive –Visibility – public/private/application Architecting the Future of Big Data Page 6 © Hortonworks Inc. 2011
  • 7. What you need for a new Framework • Application Submission Client –For example, the MR Job Client • Application Master –The core framework library • Application History ( optional ) –History of all previously run instances • Auxiliary Services ( optional ) –Long-running application-specific services running on the NodeManager Architecting the Future of Big Data Page 7 © Hortonworks Inc. 2011
  • 8. Use Case: Distributed Shell • Take a user-provided script Node or application and run it on a Manager set of nodes in the Cluster DS AppMaster • Input: – User Script to execute – Number of containers to run on Node Manager – Variable arguments for each different container Shell Script – Memory requirements for the shell script Node – Output Location/Dir Manager Shell Script Architecting the Future of Big Data Page 8 © Hortonworks Inc. 2011
  • 9. Client: RPC calls • Uses ClientRM Protocol ClientRMProtocol#getNewApplication • Get a new Application ID from the RM ClientRMProtocol#submitApplication • Application Submission CLIENT RM ClientRMProtocol#getApplicationReport • Application Monitoring ClientRMProtocol#killApplication • Kill the Application? Architecting the Future of Big Data Page 9 © Hortonworks Inc. 2011
  • 10. Client • Registration with the RM –New Application ID • Application Submission –User information –Scheduler queue –Define the container for the Distributed Shell App Master via the ContainerLaunchContext • Application Monitoring – AppMaster host details with tokens if needed, tracking url – Application Status (submitted/running/finished) Architecting the Future of Big Data Page 10 © Hortonworks Inc. 2011
  • 11. Defining a Container • ContainerLaunchContext class –Can run a shell script, a java process or launch a VM • Command(s) to run • Local resources needed for the process to run –Dependent jars, native libs, data files/archives • Environment to setup –Java Classpath • Security-related data –Container Tokens Architecting the Future of Big Data Page 11 © Hortonworks Inc. 2011
  • 12. Application Master: RPC calls • AMRM and CM protocols Client • Register AM with RM AMRM.registerAM • Ask RM to allocate resources AMRM.allocate AM RM • Launch tasks on allocated containers AMRM. finishAM App-specific • Manage tasks to final RPC completion CM.startContainer • Inform RM of completion NM NM Architecting the Future of Big Data Page 12 © Hortonworks Inc. 2011
  • 13. Application Master • Setup RPC to handle requests from Client and/or tasks launched on Containers • Register and send regular heartbeats to the RM • Request resources from the RM. • Launch user shell script on containers as and when allocated. • Monitor status of user script of remote containers and manage failures by retrying if needed. • Inform RM of completion when application is done. Architecting the Future of Big Data Page 13 © Hortonworks Inc. 2011
  • 14. AMRM#allocate • Request: – Containers needed – Not a delta protocol – Locality constraints: Host/Rack/Any – Resource constraints: memory – Priority-based assignments – Containers to release – extra/unwanted? – Only non-launched containers • Response: – Allocated Containers – Launch or release – Completed Containers – Status of completion Architecting the Future of Big Data Page 14 © Hortonworks Inc. 2011
  • 15. YARN Applications • Data Processing: – OpenMPI on Hadoop – Spark (UC Berkeley) – Shark ( Hive-on-Spark ) – Real-time data processing – Storm ( Twitter ) – Apache S4 – Graph processing – Apache Giraph • Beyond data: – Deploying Apache HBase via YARN (HBASE-4329) – Hbase Co-processors via YARN (HBASE-4047) Architecting the Future of Big Data Page 15 © Hortonworks Inc. 2011
  • 16. References •Doc on writing new applications: –WritingYarnApplications.html ( available at http://hadoop.apache.org/common/docs/r2.0.0- alpha/ ) Architecting the Future of Big Data Page 16 © Hortonworks Inc. 2011
  • 17. Questions? Thank You! Hitesh Shah hitesh@hortonworks.com Architecting the Future of Big Data Page 17 © Hortonworks Inc. 2011
  • 18. Appendix: Code Examples Architecting the Future of Big Data Page 18 © Hortonworks Inc. 2011
  • 19. Client: Registration ClientRMProtocol applicationsManager; YarnConfiguration yarnConf = new YarnConfiguration(conf); InetSocketAddress rmAddress = NetUtils.createSocketAddr( yarnConf.get(YarnConfiguration.RM_ADDRESS)); applicationsManager = ((ClientRMProtocol) rpc.getProxy(ClientRMProtocol.class, rmAddress, appsManagerServerConf)); GetNewApplicationRequest request = Records.newRecord(GetNewApplicationRequest.class); GetNewApplicationResponse response = applicationsManager.getNewApplication(request); Architecting the Future of Big Data Page 19 © Hortonworks Inc. 2011
  • 20. Client: App Submission ApplicationSubmissionContext appContext; ContainerLaunchContext amContainer; amContainer.setLocalResources(Map<String, LocalResource> localResources); amContainer.setEnvironment(Map<String, String> env); String command = "${JAVA_HOME}" + /bin/java" + " MyAppMaster " + " arg1 arg2 “; amContainer.setCommands(List<String> commands); Resource capability; capability.setMemory(amMemory); amContainer.setResource(capability); appContext.setAMContainerSpec(amContainer); SubmitApplicationRequest appRequest; appRequest.setApplicationSubmissionContext(appContext); applicationsManager.submitApplication(appRequest); Architecting the Future of Big Data Page 20 © Hortonworks Inc. 2011
  • 21. Client: App Monitoring • Get Application Status GetApplicationReportRequest reportRequest = Records.newRecord(GetApplicationReportRequest.class); reportRequest.setApplicationId(appId); GetApplicationReportResponse reportResponse = applicationsManager.getApplicationReport(reportRequest); ApplicationReport report = reportResponse.getApplicationReport(); • Kill the application KillApplicationRequest killRequest = Records.newRecord(KillApplicationRequest.class); killRequest.setApplicationId(appId); applicationsManager.forceKillApplication(killRequest); Architecting the Future of Big Data Page 21 © Hortonworks Inc. 2011
  • 22. AM: Ask RM for Containers ResourceRequest rsrcRequest; rsrcRequest.setHostName("*”); // hostname, rack, wildcard rsrcRequest.setPriority(pri); Resource capability; capability.setMemory(containerMemory); rsrcRequest.setCapability(capability) rsrcRequest.setNumContainers(numContainers); List<ResourceRequest> requestedContainers; List<ContainerId> releasedContainers; AllocateRequest req; req.setResponseId(rmRequestID); req.addAllAsks(requestedContainers); req.addAllReleases(releasedContainers); req.setProgress(currentProgress); AllocateResponse allocateResponse = resourceManager.allocate(req); Architecting the Future of Big Data Page 22 © Hortonworks Inc. 2011
  • 23. AM: Launch Containers AMResponse amResp = allocateResponse.getAMResponse(); ContainerManager cm = (ContainerManager)rpc.getProxy (ContainerManager.class, cmAddress, conf); List<Container> allocatedContainers = amResp.getAllocatedContainers(); for (Container allocatedContainer : allocatedContainers) { ContainerLaunchContext ctx; ctx.setContainerId(allocatedContainer .getId()); ctx.setResource(allocatedContainer .getResource()); // set env, command, local resources, … StartContainerRequest startReq; startReq.setContainerLaunchContext(ctx); cm.startContainer(startReq); } Architecting the Future of Big Data Page 23 © Hortonworks Inc. 2011
  • 24. AM: Monitoring Containers • Running Containers GetContainerStatusRequest statusReq; statusReq.setContainerId(containerId); GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq); • Completed Containers AMResponse amResp = allocateResponse.getAMResponse(); List<Container> completedContainersStatus = amResp.getCompletedContainerStatuses(); for (ContainerStatus containerStatus : completedContainers) { // containerStatus.getContainerId() // containerStatus.getExitStatus() // containerStatus.getDiagnostics() } Architecting the Future of Big Data Page 24 © Hortonworks Inc. 2011
  • 25. AM: I am done FinishApplicationMasterRequest finishReq; finishReq.setAppAttemptId(appAttemptID); finishReq.setFinishApplicationStatus (FinalApplicationStatus.SUCCEEDED); // or FAILED finishReq.setDiagnostics(diagnostics); resourceManager.finishApplicationMaster(finishReq); Architecting the Future of Big Data Page 25 © Hortonworks Inc. 2011