© Hortonworks Inc. 2013
Apache Hadoop YARN
Enabling next generation data applications
Page 1
Arun C. Murthy
Founder & Architect
@acmurthy (@hortonworks)
Hello!
• Founder/Architect at Hortonworks Inc.
– Lead - Map-Reduce/YARN/Tez
– Formerly, Architect Hadoop MapReduce, Yahoo
– Responsible for running Hadoop MapReduce as a service for all of
Yahoo (~50k nodes footprint)
• Apache Hadoop, ASF
– Former VP, Apache Hadoop, ASF (Chair of the Apache Hadoop PMC)
– Long-term Committer/PMC member (full time >6 years)
– Release Manager for hadoop-2.x
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
Agenda
• Why YARN?
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
The 1st Generation of Hadoop: Batch
HADOOP 1.0
Built for Web-Scale Batch Apps
•  All other usage patterns must leverage that same infrastructure
•  Forces the creation of silos for managing mixed workloads
[Diagram: separate silos, each a single app (BATCH, INTERACTIVE, ONLINE) running on its own HDFS]
Hadoop MapReduce Classic
• JobTracker
– Manages cluster resources and job scheduling
• TaskTracker
– Per-node agent
– Manages tasks
MapReduce Classic: Limitations
• Scalability
– Maximum Cluster size – 4,000 nodes
– Maximum concurrent tasks – 40,000
– Coarse synchronization in JobTracker
• Availability
– Failure kills all queued and running jobs
• Hard partition of resources into map and reduce slots
– Low resource utilization
• Lacks support for alternate paradigms and services
– Iterative applications implemented using MapReduce are
10x slower
Our Vision: Hadoop as Next-Gen Platform
HADOOP 1.0: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing)
HADOOP 2.0: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) + MapReduce and others (data processing)
Single Use System (Batch Apps) → Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
YARN: Taking Hadoop Beyond Batch
Applications Run Natively IN Hadoop
[Diagram: HDFS2 (Redundant, Reliable Storage) and YARN (Cluster Resource Management) hosting BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), and OTHER (Search, Weave, …)]
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
5 Key Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
Agenda
• Why YARN
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
A Brief History of YARN
•  Originally conceived & architected by the team at Yahoo!
– Arun Murthy created the original JIRA in 2008 and led the PMC
•  The team at Hortonworks has been working on YARN for 4 years
•  YARN based architecture running at scale at Yahoo!
– Deployed on 35,000 nodes for 6+ months
•  Multitude of YARN applications
Concepts
• Application
– An application is a job submitted to the framework
– Example – a MapReduce job
• Container
– Basic unit of allocation
– Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.)
– container_0 = 2 GB, 1 CPU
– container_1 = 1 GB, 6 CPU
– Replaces the fixed map/reduce slots
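The contrast above between fixed slots and fine-grained containers can be sketched in a few lines of plain Java (a toy model, not the YARN API; the `Resource` and `Node` classes and the numbers are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of fine-grained allocation: a node's capacity is a shared
// pool of memory and CPU, and containers of different shapes carve it up
// (unlike fixed map/reduce slots, which waste whatever a task doesn't use).
class Resource {
    final int memoryMb, vcores;
    Resource(int memoryMb, int vcores) { this.memoryMb = memoryMb; this.vcores = vcores; }
}

class Node {
    int freeMemoryMb, freeVcores;
    final List<Resource> containers = new ArrayList<>();
    Node(int memoryMb, int vcores) { this.freeMemoryMb = memoryMb; this.freeVcores = vcores; }

    // Grant a container only if both dimensions still fit on this node.
    boolean allocate(Resource r) {
        if (r.memoryMb > freeMemoryMb || r.vcores > freeVcores) return false;
        freeMemoryMb -= r.memoryMb;
        freeVcores -= r.vcores;
        containers.add(r);
        return true;
    }
}

public class ContainerModel {
    public static void main(String[] args) {
        Node node = new Node(8192, 8);                      // 8 GB, 8 cores
        boolean a = node.allocate(new Resource(2048, 1));   // container_0 = 2 GB, 1 CPU
        boolean b = node.allocate(new Resource(1024, 6));   // container_1 = 1 GB, 6 CPU
        boolean c = node.allocate(new Resource(6144, 1));   // no room left for 6 GB
        System.out.println(a + " " + b + " " + c);          // true true false
    }
}
```

Because both dimensions are tracked per node, oddly shaped asks (1 GB/6 CPU) pack alongside conventional ones, which is exactly what fixed slots cannot do.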
YARN Architecture
[Diagram: a ResourceManager (with pluggable Scheduler) arbitrating a cluster of NodeManagers; application 1's AM runs Containers 1.1–1.3 and application 2's AM runs Containers 2.1–2.4 on nodes across the cluster]
Architecture
• Resource Manager
– Global resource scheduler
– Hierarchical queues
• Node Manager
– Per-machine agent
– Manages the life-cycle of containers
– Container resource monitoring
• Application Master
– Per-application
– Manages application scheduling and task execution
– E.g. MapReduce Application Master
Design Centre
• Split up the two major functions of JobTracker
– Cluster resource management
– Application life-cycle management
• MapReduce becomes a user-land library
YARN Architecture - Walkthrough
[Diagram: Client2 submits to the ResourceManager (Scheduler), which starts AM2; AM 1 and AM2 then run their containers (1.1–1.3 and 2.1–2.4) on NodeManagers across the cluster]
Review - Benefits of YARN
1. Scale
2. New Programming Models & Services
3. Improved cluster utilization
4. Agility
5. Beyond Java
Agenda
• Why YARN
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
YARN Applications
• Data processing applications and services
– Online Serving – HOYA (HBase on YARN)
– Real-time event processing – Storm, S4, other commercial platforms
– Tez – Generic framework to run a complex DAG
– MPI: OpenMPI, MPICH2
– Master-Worker
– Machine Learning: Spark
– Graph processing: Giraph
– Enabled by allowing the use of paradigm-specific application masters
Run all on the same Hadoop cluster!
YARN – Implementing Applications
• What APIs do I need to use?
– Only three protocols
– Client to ResourceManager
– Application submission
– ApplicationMaster to ResourceManager
– Container allocation
– ApplicationMaster to NodeManager
– Container launch
– Use client libraries for all 3 actions
– Module yarn-client
– Provides both synchronous and asynchronous libraries
– Use a 3rd-party library like Weave
– http://continuuity.github.io/weave/
YARN – Implementing Applications
• What do I need to do?
– Write a submission Client
– Write an ApplicationMaster (well, copy-paste)
– DistributedShell is the new WordCount!
– Get containers, run whatever you want!
YARN – Implementing Applications
• What else do I need to know?
– Resource Allocation & Usage
– ResourceRequest
– Container
– ContainerLaunchContext
– LocalResource
– ApplicationMaster
– ApplicationId
– ApplicationAttemptId
– ApplicationSubmissionContext
YARN – Resource Allocation & Usage
• ResourceRequest
– Fine-grained resource ask to the ResourceManager
– Ask for a specific amount of resources (memory, cpu etc.) on a specific machine or rack
– Use the special value * as the resource name to ask for any machine

ResourceRequest fields: priority, resourceName, capability, numContainers
YARN – Resource Allocation & Usage
• ResourceRequest example

  priority | capability    | resourceName | numContainers
  ---------+---------------+--------------+--------------
  0        | <2gb, 1 core> | host01       | 1
  0        | <2gb, 1 core> | rack0        | 1
  0        | <2gb, 1 core> | *            | 1
  1        | <4gb, 1 core> | *            | 1
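Rows sharing a priority and capability are one logical ask expressed at three locality levels (node, rack, any). A toy matcher in plain Java (illustrative only, not the actual scheduler logic) shows how the most specific matching level wins:

```java
import java.util.Arrays;
import java.util.List;

// Toy locality matching for a ResourceRequest: prefer the named host,
// then its rack, then "*" (any machine), mirroring the table above.
public class LocalityMatcher {
    // Returns the most specific resourceName in the ask that matches the
    // offered node, or null if neither host, rack, nor "*" was requested.
    static String match(List<String> resourceNames, String host, String rack) {
        if (resourceNames.contains(host)) return host;
        if (resourceNames.contains(rack)) return rack;
        if (resourceNames.contains("*")) return "*";
        return null;
    }

    public static void main(String[] args) {
        List<String> ask = Arrays.asList("host01", "rack0", "*");
        System.out.println(match(ask, "host01", "rack0")); // host01 (node-local)
        System.out.println(match(ask, "host02", "rack0")); // rack0  (rack-local)
        System.out.println(match(ask, "host07", "rack3")); // *      (off-switch)
    }
}
```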
YARN – Resource Allocation & Usage
• Container
– The basic unit of allocation in YARN
– The result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster
– A specific amount of resources (cpu, memory etc.) on a specific machine

Container fields: containerId, resourceName, capability, tokens
YARN – Resource Allocation & Usage
• ContainerLaunchContext
– The context provided by the ApplicationMaster to the NodeManager to launch the Container
– Complete specification for a process
– LocalResource used to specify container binary and dependencies
– NodeManager responsible for downloading them from a shared namespace (typically HDFS)

ContainerLaunchContext fields: container, commands, environment, localResources
LocalResource fields: uri, type
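To make the fields concrete, here is a toy rendering of what the NodeManager effectively does with a launch context (plain Java; the `render` helper and the simplified string-keyed maps are illustrative, not the real protocol records):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy rendering of a container launch: export the environment, localize
// the LocalResources (e.g. from HDFS), then exec the command line.
public class LaunchContextSketch {
    static String render(Map<String, String> environment,
                         Map<String, String> localResources,
                         List<String> commands) {
        StringBuilder script = new StringBuilder();
        for (Map.Entry<String, String> e : environment.entrySet())
            script.append("export ").append(e.getKey()).append('=')
                  .append(e.getValue()).append('\n');
        for (Map.Entry<String, String> r : localResources.entrySet())
            script.append("# localized ").append(r.getKey())
                  .append(" from ").append(r.getValue()).append('\n');
        script.append(String.join(" ", commands)).append('\n');
        return script.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("CLASSPATH", "./*");
        Map<String, String> resources = new LinkedHashMap<>();
        resources.put("app.jar", "hdfs:///apps/my-app/app.jar");
        System.out.print(render(env, resources,
                Arrays.asList("java", "-Xmx256M", "MyWorker")));
    }
}
```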
YARN - ApplicationMaster
• ApplicationMaster
– Per-application controller aka container_0
– Parent for all containers of the application
– ApplicationMaster negotiates all its containers from the ResourceManager
– The ApplicationMaster container is a child of the ResourceManager – think of the init process in Unix
– RM restarts the ApplicationMaster attempt if required (unique ApplicationAttemptId)
– Code for the application is submitted along with the application itself
YARN - ApplicationMaster
• ApplicationMaster
– ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client
– ResourceManager responsible for allocating and launching the ApplicationMaster container

ApplicationSubmissionContext fields: resourceRequest, containerLaunchContext, appName, queue
YARN Application API - Overview
• hadoop-yarn-client module
• YarnClient is the submission client API
• Both synchronous & asynchronous APIs for resource allocation and container start/stop
• Synchronous API
– AMRMClient
– AMNMClient
• Asynchronous API
– AMRMClientAsync
– AMNMClientAsync
YARN Application API – The Client
[Diagram: Client2 talks to the ResourceManager (Scheduler): (1) New Application Request: YarnClient.createApplication, (2) Submit Application: YarnClient.submitApplication; the RM then launches the AM and containers on NodeManagers]
YARN Application API – The Client
• YarnClient
– createApplication to create an application
– submitApplication to start an application
– Application developer needs to provide the ApplicationSubmissionContext
– APIs to get other information from the ResourceManager
– getAllQueues
– getApplications
– getNodeReports
– APIs to manipulate a submitted application, e.g. killApplication
YARN Application API – Resource Allocation
[Diagram: the ApplicationMaster and ResourceManager (Scheduler) interact via (1) registerApplicationMaster, (2) AMRMClient.allocate, (3) Container, (4) unregisterApplicationMaster]
YARN Application API – Resource Allocation
• AMRMClient - Synchronous API for the ApplicationMaster to interact with the ResourceManager
– Prologue / epilogue – registerApplicationMaster / unregisterApplicationMaster
– Resource negotiation with the ResourceManager
– Internal book-keeping - addContainerRequest / removeContainerRequest / releaseAssignedContainer
– Main API – allocate
– Helper APIs for cluster information
– getAvailableResources
– getClusterNodeCount
YARN Application API – Resource Allocation
• AMRMClientAsync - Asynchronous API for the ApplicationMaster
– Extension of AMRMClient to provide an asynchronous CallbackHandler
– Callbacks make it easier for the application developer to build a mental model of the interaction with the ResourceManager
– onContainersAllocated
– onContainersCompleted
– onNodesUpdated
– onError
– onShutdownRequest
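The callback style can be sketched without any YARN dependency (plain Java; the `CallbackHandler` interface below imitates AMRMClientAsync's handler but is a hypothetical stand-in):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy version of the asynchronous pattern: a heartbeat thread would pump
// ResourceManager responses and fire these callbacks, so the AM reacts to
// events instead of polling allocate() in a loop.
public class AsyncHandlerSketch {
    interface CallbackHandler {
        void onContainersAllocated(List<String> containerIds);
        void onContainersCompleted(List<String> containerIds);
    }

    public static void main(String[] args) {
        final AtomicInteger running = new AtomicInteger();
        CallbackHandler handler = new CallbackHandler() {
            public void onContainersAllocated(List<String> ids) { running.addAndGet(ids.size()); }
            public void onContainersCompleted(List<String> ids) { running.addAndGet(-ids.size()); }
        };
        // Simulated heartbeat responses from the ResourceManager.
        handler.onContainersAllocated(Arrays.asList("c1", "c2", "c3"));
        handler.onContainersCompleted(Arrays.asList("c1"));
        System.out.println(running.get()); // 2
    }
}
```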
YARN Application API – Using Resources
[Diagram: AM 1 launches Container 1.1 on a NodeManager via AMNMClient.startContainer and monitors it via AMNMClient.getContainerStatus; the ResourceManager (Scheduler) is not on this path]
YARN Application API – Using Resources
• AMNMClient - Synchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
– startContainer
– stopContainer
– getContainerStatus
YARN Application API – Using Resources
• AMNMClientAsync - Asynchronous API for the ApplicationMaster to launch / stop containers at the NodeManager
– Simple (trivial) APIs
– startContainerAsync
– stopContainerAsync
– getContainerStatusAsync
– CallbackHandler to make it easier for the application developer to build a mental model of the interaction with the NodeManager
– onContainerStarted
– onContainerStopped
– onStartContainerError
– onContainerStatusReceived
YARN Application API - Development
• Un-Managed Mode for the ApplicationMaster
– Run the ApplicationMaster on a development machine rather than in-cluster
– No submission client
– hadoop-yarn-applications-unmanaged-am-launcher
– Easier to step through with a debugger, browse logs etc.

$ bin/hadoop jar hadoop-yarn-applications-unmanaged-am-launcher.jar \
    Client \
    -jar my-application-master.jar \
    -cmd 'java MyApplicationMaster <args>'
Sample YARN Application – DistributedShell
• Overview
– YARN application to run n copies of a shell command
– Simplest example of a YARN application – get n containers and run a specific Unix command

$ bin/hadoop jar hadoop-yarn-applications-distributedshell.jar \
    org.apache.hadoop.yarn.applications.distributedshell.Client \
    -shell_command '/bin/date' \
    -num_containers <n>
Code: https://github.com/hortonworks/simple-yarn-app
Sample YARN Application – DistributedShell
• Code Overview
– User submits the application to the ResourceManager via org.apache.hadoop.yarn.applications.distributedshell.Client
– Client provides the ApplicationSubmissionContext to the ResourceManager
– It is the responsibility of org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster to negotiate the n containers
– ApplicationMaster launches containers with the user-specified command as ContainerLaunchContext.commands
Sample YARN Application – DistributedShell
• Client – Code Walkthrough
– hadoop-yarn-client module
– Steps:
– YarnClient.createApplication
– Specify the ApplicationSubmissionContext, in particular the ContainerLaunchContext with commands, and other key pieces such as the resource capability for the ApplicationMaster container, plus queue, appName, appType etc.
– YarnClient.submitApplication
// Create yarnClient
YarnClient yarnClient = YarnClient.createYarnClient();
yarnClient.init(new Configuration());
yarnClient.start();

// Create application via yarnClient
YarnClientApplication app = yarnClient.createApplication();
// Set up the container launch context for the ApplicationMaster
ContainerLaunchContext amContainer =
    Records.newRecord(ContainerLaunchContext.class);
List<String> commands = new ArrayList<String>();
commands.add("$JAVA_HOME/bin/java");
commands.add("-Xmx256M");
commands.add(
    "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
commands.add("--container_memory 1024");
commands.add("--container_cores 1");
commands.add("--num_containers 3");
amContainer.setCommands(commands);

// Set up resource type requirements for the ApplicationMaster
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(256);
capability.setVirtualCores(2);
// Finally, set up the ApplicationSubmissionContext for the application
ApplicationSubmissionContext appContext =
    app.getApplicationSubmissionContext();
appContext.setQueue("my-queue");                        // queue
appContext.setAMContainerSpec(amContainer);
appContext.setResource(capability);
appContext.setApplicationName("my-app");                // application name
appContext.setApplicationType("DISTRIBUTED_SHELL");     // application type

// Submit application to the ResourceManager
yarnClient.submitApplication(appContext);
Sample YARN Application – DistributedShell
• ApplicationMaster – Code Walkthrough
– Again, the hadoop-yarn-client module
– Steps:
– AMRMClient.registerApplicationMaster
– Negotiate containers from the ResourceManager by providing a ContainerRequest to AMRMClient.addContainerRequest
– Take each resultant Container returned via a subsequent call to AMRMClient.allocate, build a ContainerLaunchContext with the Container and commands, then launch them using AMNMClient.startContainer
– Use LocalResources to specify software/configuration dependencies for each worker container
– Wait till done… AllocateResponse.getCompletedContainersStatuses from subsequent calls to AMRMClient.allocate
– AMRMClient.unregisterApplicationMaster
// Initialize clients to the ResourceManager and NodeManagers
Configuration conf = new Configuration();

AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
rmClient.init(conf);
rmClient.start();

NMClient nmClient = NMClient.createNMClient();
nmClient.init(conf);
nmClient.start();

// Register with the ResourceManager
rmClient.registerApplicationMaster("", 0, "");
// Priority for worker containers - priorities are intra-application
Priority priority = Records.newRecord(Priority.class);
priority.setPriority(0);

// Resource requirements for worker containers
Resource capability = Records.newRecord(Resource.class);
capability.setMemory(128);
capability.setVirtualCores(1);

// Make container requests to the ResourceManager
for (int i = 0; i < n; ++i) {
  ContainerRequest containerAsk =
      new ContainerRequest(capability, null, null, priority);
  rmClient.addContainerRequest(containerAsk);
}
// Obtain allocated containers and launch them
int allocatedContainers = 0;
while (allocatedContainers < n) {
  AllocateResponse response = rmClient.allocate(0);
  for (Container container : response.getAllocatedContainers()) {
    ++allocatedContainers;

    // Launch container by creating a ContainerLaunchContext
    ContainerLaunchContext ctx =
        Records.newRecord(ContainerLaunchContext.class);
    ctx.setCommands(Collections.singletonList("/bin/date"));
    nmClient.startContainer(container, ctx);
  }
  Thread.sleep(100);
}
// Now wait for containers to complete successfully
int completedContainers = 0;
while (completedContainers < n) {
  AllocateResponse response =
      rmClient.allocate((float) completedContainers / n);
  for (ContainerStatus status : response.getCompletedContainersStatuses()) {
    if (status.getExitStatus() == 0) {
      ++completedContainers;
    }
  }
  Thread.sleep(100);
}

// Un-register with the ResourceManager
rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
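The whole cycle (request → allocate → launch → wait for completion → unregister) can be simulated end-to-end without a cluster; `FakeRM` below is a hypothetical stand-in for the ResourceManager, not part of any YARN API:

```java
import java.util.ArrayList;
import java.util.List;

// Toy end-to-end run of the ApplicationMaster loop from the walkthrough:
// heartbeat until n containers are allocated, "launch" each one, then
// count completions, mirroring the two loops above.
public class AmLoopSketch {
    static class FakeRM {
        private int pending;                          // requests not yet granted
        private final List<String> launched = new ArrayList<>();
        FakeRM(int n) { pending = n; }

        // One heartbeat: the RM grants at most one container per call.
        List<String> allocate() {
            List<String> granted = new ArrayList<>();
            if (pending > 0) granted.add("container_" + pending--);
            return granted;
        }

        void launch(String id) { launched.add(id); }  // startContainer stand-in

        // Every launched container completes successfully in this toy.
        List<String> completed() { return launched; }
    }

    public static void main(String[] args) {
        int n = 3;
        FakeRM rm = new FakeRM(n);

        int allocated = 0;
        while (allocated < n) {                // mirrors the allocate loop above
            for (String c : rm.allocate()) {
                rm.launch(c);
                ++allocated;
            }
        }
        int completed = rm.completed().size(); // mirrors the wait loop above
        System.out.println(allocated + " " + completed); // 3 3
    }
}
```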
Apache Hadoop MapReduce on YARN
• Original use-case
• Most complex application to build
– Data-locality
– Fault tolerance
– ApplicationMaster recovery: Checkpoint state to HDFS
– Intra-application Priorities: Maps vs. Reduces
– Needed complex feedback mechanism from ResourceManager
– Security
– Isolation
• Binary compatible with Apache Hadoop 1.x
Apache Hadoop MapReduce on YARN
[Diagram: two MapReduce ApplicationMasters (MR AM 1, MR AM2) running their map and reduce tasks (map1.1–1.2, reduce1.1, map2.1–2.2, reduce2.1–2.2) as containers on NodeManagers, with the ResourceManager (Scheduler) arbitrating resources]
Apache Tez on YARN
• Replaces MapReduce as the primitive for Pig, Hive, Cascading etc.
– Lower latency for interactive queries
– Higher throughput for batch queries
– http://incubator.apache.org/projects/tez.html

[Figure: the same Hive query planned as a chain of MapReduce jobs (Hive - MR) vs. a single Tez DAG (Hive - Tez)]
SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id)
GROUP BY a
UNION SELECT x, AVERAGE(y) AS AVG
FROM c
GROUP BY x
ORDER BY AVG;
Apache Tez on YARN
[Diagram: a Tez AM1 running DAG vertices (vertex1.1.1, vertex1.1.2, vertex1.2.1, vertex1.2.2) alongside an MR AM 1 running map/reduce tasks, all as containers on NodeManagers under the ResourceManager (Scheduler)]
HOYA - Apache HBase on YARN
• Hoya – Apache HBase becomes a user-level application
• Use cases
– Small HBase clusters in a large YARN cluster
– Dynamic HBase clusters
– Transient/intermittent clusters for workflows
• APIs to create, start, stop & delete HBase clusters
• Flex cluster size: increase/decrease size with load
• Recover from RegionServer loss with a new container
Code: https://github.com/hortonworks/hoya
HOYA - Apache HBase on YARN
[Diagram: a HOYA AM1 managing an HBase Master (HBase Master1.1) and RegionServers 1.1–1.3 as containers on NodeManagers, alongside an MR AM 1 with its map/reduce tasks, under the ResourceManager (Scheduler)]
HOYA - Highlights
• Cluster specification stored as JSON in HDFS
• Config directory cached – dynamically patched before being pushed up as local resources for the Master & RegionServers
• HBase tar file stored in HDFS – clusters can use the same or different HBase versions
• Handling of cluster flexing uses the same code as unplanned container loss
• No Hoya code on RegionServers: client and AM only
Storm on YARN
• Ability to deploy multiple Storm clusters on YARN for
real-time event processing
• Yahoo – Primary contributor
– 200+ nodes in production
• Ability to recover from faulty nodes
– Get new containers
• Auto-scale for load balancing
– Get new containers as load increases
– Release containers as load decreases
Code: https://github.com/yahoo/storm-yarn
Storm on YARN
[Diagram: a Storm AM1 running Storm daemons (labelled Nimbus1.1–1.4 on the slide) as containers on NodeManagers, alongside an MR AM 1 with its map/reduce tasks, under the ResourceManager (Scheduler)]
General Architectural Considerations
• Fault Tolerance
– Checkpoint
• Security
• Always-On services
• Scheduler features
– Whitelist resources
– Blacklist resources
– Labels for machines
– License management
Agenda
• Why YARN
• YARN Architecture and Concepts
• Building applications on YARN
• Next Steps
HDP 2.0 Community Preview & YARN Certification Program
Goal: Accelerate the number of certified YARN-based solutions
• HDP 2.0 Community Preview
– Contains the latest community Beta of Apache Hadoop 2.0 & YARN
– Delivered as an easy-to-use Sandbox VM, as well as RPMs and tarballs
– Enables the YARN Cert Program: the community & commercial ecosystem can test and certify new and existing YARN-based apps
• YARN Certification Program
– More than 14 partners in the program at launch
–  Splunk*
–  Elastic Search*
–  Altiscale*
–  Concurrent*
–  Microsoft
–  Platfora
–  Tableau
–  (IBM) DataStage
–  Informatica
–  Karmasphere
–  and others
* Already certified
Forum & Office Hours
• YARN Forum
– Community of Hadoop YARN developers
– Focused on collaboration and Q&A
–  http://hortonworks.com/community/forums/forum/yarn
• Office Hours
– http://www.meetup.com/HSquared-Hadoop-Hortonworks-User-Group/
– Bi-weekly office hours with Hortonworks engineers & architects
– Every other Thursday, starting Aug 15th, from 4:30 – 5:30 PM (PST)
– At Hortonworks Palo Alto HQ
– West Bayshore Rd., Palo Alto, CA 94303 USA
Technical Resources
• ASF hadoop-2.1.0-beta
– Coming soon!
– http://hadoop.apache.org/common/releases.html
– Release Documentation: http://hadoop.apache.org/common/docs/r2.1.0-beta
• Blogs:
– http://hortonworks.com/hadoop/yarn
– http://hortonworks.com/blog/category/apache-hadoop/yarn/
– http://hortonworks.com/blog/introducing-apache-hadoop-yarn/
– http://hortonworks.com/blog/apache-hadoop-yarn-background-and-an-overview/
What Next?
• Download the Book
• Download the Community preview
• Join the Beta Program
• Participate via the YARN forum
• Come see us during the YARN
Office Hours
• Follow us… @hortonworks
Thank You!
http://hortonworks.com/hadoop/yarn

 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containerspranav_joshi
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2benjaminwootton
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Janos Matyas
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks
 
Docker Swarm Cluster
Docker Swarm ClusterDocker Swarm Cluster
Docker Swarm ClusterFernando Ike
 
Managing Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesManaging Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesMarc Sluiter
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 

Viewers also liked (11)

Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
Building a Graph Database in Neo4j with Spark & Spark SQL to gain new insight...
 
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily.....
 
Hadoop Cluster on Docker Containers
Hadoop Cluster on Docker ContainersHadoop Cluster on Docker Containers
Hadoop Cluster on Docker Containers
 
Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2Configuring Your First Hadoop Cluster On EC2
Configuring Your First Hadoop Cluster On EC2
 
Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014 Docker based Hadoop provisioning - Hadoop Summit 2014
Docker based Hadoop provisioning - Hadoop Summit 2014
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks Technical Workshop: What's New in HDP 2.3
 
Docker Swarm Cluster
Docker Swarm ClusterDocker Swarm Cluster
Docker Swarm Cluster
 
Simplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & TroubleshootingSimplified Cluster Operation & Troubleshooting
Simplified Cluster Operation & Troubleshooting
 
Managing Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing KubernetesManaging Docker Containers In A Cluster - Introducing Kubernetes
Managing Docker Containers In A Cluster - Introducing Kubernetes
 
Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 

Similar to Apache Hadoop YARN - Enabling Next Generation Data Applications

Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Hakka Labs
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformBikas Saha
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopHortonworks
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextDataWorks Summit
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoophitesh1892
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN ApplicationsHortonworks
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopHortonworks
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史Insight Technology, Inc.
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Hortonworks
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionDataWorks Summit
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing DataWorks Summit
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformTsuyoshi OZAWA
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionDataWorks Summit
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo DataWorks Summit
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionWangda Tan
 

Similar to Apache Hadoop YARN - Enabling Next Generation Data Applications (20)

Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
 
MHUG - YARN
MHUG - YARNMHUG - YARN
MHUG - YARN
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014Hortonworks Yarn Code Walk Through January 2014
Hortonworks Yarn Code Walk Through January 2014
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 
YARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute PlatformYARN - Hadoop Next Generation Compute Platform
YARN - Hadoop Next Generation Compute Platform
 
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of HadoopApache Hadoop YARN: Understanding the Data Operating System of Hadoop
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop
 
Yarnthug2014
Yarnthug2014Yarnthug2014
Yarnthug2014
 
Apache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's NextApache Hadoop MapReduce: What's Next
Apache Hadoop MapReduce: What's Next
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
Get Started Building YARN Applications
Get Started Building YARN ApplicationsGet Started Building YARN Applications
Get Started Building YARN Applications
 
YARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache HadoopYARN: Future of Data Processing with Apache Hadoop
YARN: Future of Data Processing with Apache Hadoop
 
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から  by NTT 小沢健史
[db tech showcase Tokyo 2014] C32: Hadoop最前線 - 開発の現場から by NTT 小沢健史
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014Developing YARN Applications - Integrating natively to YARN July 24 2014
Developing YARN Applications - Integrating natively to YARN July 24 2014
 
Apache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the UnionApache Hadoop YARN: State of the Union
Apache Hadoop YARN: State of the Union
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
YARN: a resource manager for analytic platform
YARN: a resource manager for analytic platformYARN: a resource manager for analytic platform
YARN: a resource manager for analytic platform
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo Apache Hadoop YARN: state of the union - Tokyo
Apache Hadoop YARN: state of the union - Tokyo
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The UnionDataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyHortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakHortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsHortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysHortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerHortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsHortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeHortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidHortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleHortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATAHortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseHortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseHortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationHortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementHortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCHortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?Paolo Missier
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfSrushith Repakula
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptxFIDO Alliance
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...
Event-Driven Architecture Masterclass: Integrating Distributed Data Stores Ac...ScyllaDB
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe中 央社
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfFIDO Alliance
 
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...
Human Expert Website Manual WCAG 2.0 2.1 2.2 Audit - Digital Accessibility Au...Skynet Technologies
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideCollecting & Temporal Analysis of Behavioral Web Data - Tales From The Inside
Collecting & Temporal Analysis of Behavioral Web Data - Tales From The InsideStefan Dietze
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...FIDO Alliance
 

Recently uploaded (20)

1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
Apache Hadoop YARN - Enabling Next Generation Data Applications

  • 1. © Hortonworks Inc. 2013 Apache Hadoop YARN Enabling next generation data applications Page 1 Arun C. Murthy Founder & Architect @acmurthy (@hortonworks)
  • 2. © Hortonworks Inc. 2013 Hello! • Founder/Architect at Hortonworks Inc. – Lead - Map-Reduce/YARN/Tez – Formerly, Architect Hadoop MapReduce, Yahoo – Responsible for running Hadoop MapReduce as a service for all of Yahoo (~50k nodes footprint) • Apache Hadoop, ASF – Frmr. VP, Apache Hadoop, ASF (Chair of Apache Hadoop PMC) – Long-term Committer/PMC member (full time >6 years) – Release Manager for hadoop-2.x Page 2
  • 3. © Hortonworks Inc. 2013 Agenda • Why YARN? • YARN Architecture and Concepts • Building applications on YARN • Next Steps Page 3
  • 4. © Hortonworks Inc. 2013 Agenda • Why YARN? • YARN Architecture and Concepts • Building applications on YARN • Next Steps Page 4
  • 5. © Hortonworks Inc. 2013 The 1st Generation of Hadoop: Batch. HADOOP 1.0 – Built for Web-Scale Batch Apps • All other usage patterns must leverage that same infrastructure • Forces the creation of silos for managing mixed workloads. [Diagram: separate silos, each pairing a single app (BATCH, INTERACTIVE, ONLINE) with its own HDFS]
  • 6. © Hortonworks Inc. 2013 Hadoop MapReduce Classic • JobTracker – Manages cluster resources and job scheduling • TaskTracker – Per-node agent – Manage tasks Page 6
  • 7. © Hortonworks Inc. 2013 MapReduce Classic: Limitations • Scalability – Maximum Cluster size – 4,000 nodes – Maximum concurrent tasks – 40,000 – Coarse synchronization in JobTracker • Availability – Failure kills all queued and running jobs • Hard partition of resources into map and reduce slots – Low resource utilization • Lacks support for alternate paradigms and services – Iterative applications implemented using MapReduce are 10x slower Page 7
  • 8. © Hortonworks Inc. 2013 Our Vision: Hadoop as Next-Gen Platform. HADOOP 1.0: HDFS (redundant, reliable storage) + MapReduce (cluster resource management & data processing) – a single-use system for batch apps. HADOOP 2.0: HDFS2 (redundant, reliable storage) + YARN (cluster resource management) + MapReduce and others (data processing) – a multi-purpose platform for batch, interactive, online, streaming, … Page 8
  • 9. © Hortonworks Inc. 2013 YARN: Taking Hadoop Beyond Batch. Page 9. Applications run natively IN Hadoop on YARN (cluster resource management) over HDFS2 (redundant, reliable storage): BATCH (MapReduce), INTERACTIVE (Tez), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), ONLINE (HBase), OTHER (Search, Weave, …). Store ALL DATA in one place… Interact with that data in MULTIPLE WAYS with predictable performance and quality of service
  • 10. © Hortonworks Inc. 2013 5 Key Benefits of YARN: 1. Scale 2. New Programming Models & Services 3. Improved cluster utilization 4. Agility 5. Beyond Java. Page 10
  • 11. © Hortonworks Inc. 2013 Agenda • Why YARN • YARN Architecture and Concepts • Building applications on YARN • Next Steps Page 11
  • 12. © Hortonworks Inc. 2013 A Brief History of YARN •  Originally conceived & architected by the team at Yahoo! – Arun Murthy created the original JIRA in 2008 and led the PMC •  The team at Hortonworks has been working on YARN for 4 years •  YARN based architecture running at scale at Yahoo! – Deployed on 35,000 nodes for 6+ months •  Multitude of YARN applications Page 12
  • 13. © Hortonworks Inc. 2013 Concepts • Application – An application is a job submitted to the framework – Example – a MapReduce job • Container – Basic unit of allocation – Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, gpu etc.) – container_0 = 2 GB, 1 CPU – container_1 = 1 GB, 6 CPU – Replaces the fixed map/reduce slots 13
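The last bullet — containers replacing fixed map/reduce slots — is where the utilization win comes from. A toy sketch of the idea (plain Java, no YARN APIs; the node size and slot split are made-up numbers for illustration):

```java
// Toy illustration (NOT YARN code) of why fungible containers beat fixed
// map/reduce slots. Assume an 8 GB node. Under MRv1 it might be carved into
// 4 GB of map slots + 4 GB of reduce slots; a map-only phase can then use at
// most the map-slot half, while YARN containers can claim the whole node.
public class SlotsVsContainers {
    // Fixed-slot model: a map-only workload is confined to the map slots.
    static int usableGbWithSlots(int mapSlotsGb, int reduceSlotsGb, boolean mapOnly) {
        return mapOnly ? mapSlotsGb : mapSlotsGb + reduceSlotsGb;
    }

    // Container model: any mix of work can use all memory on the node.
    static int usableGbWithContainers(int nodeGb) {
        return nodeGb;
    }

    public static void main(String[] args) {
        System.out.println("slots:      " + usableGbWithSlots(4, 4, true) + " GB");  // slots:      4 GB
        System.out.println("containers: " + usableGbWithContainers(8) + " GB");      // containers: 8 GB
    }
}
```

The same reasoning applies to CPU and the other resource types listed on the slide.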
  • 14. © Hortonworks Inc. 2012 YARN Architecture. [Diagram: ResourceManager (with Scheduler) coordinating a set of NodeManagers; AM 1 runs Containers 1.1–1.3 and AM2 runs Containers 2.1–2.4 across the nodes]
  • 15. © Hortonworks Inc. 2013 Architecture • Resource Manager – Global resource scheduler – Hierarchical queues • Node Manager – Per-machine agent – Manages the life-cycle of container – Container resource monitoring • Application Master – Per-application – Manages application scheduling and task execution – E.g. MapReduce Application Master 15
  • 16. © Hortonworks Inc. 2013 Design Centre • Split up the two major functions of JobTracker – Cluster resource management – Application life-cycle management • MapReduce becomes user-land library 16
  • 17. © Hortonworks Inc. 2012 YARN Architecture - Walkthrough. [Diagram: Client2 submits to the ResourceManager (Scheduler); AM 1 runs Containers 1.1–1.3 and AM2 runs Containers 2.1–2.4 on the NodeManagers]
  • 18. © Hortonworks Inc. 2013 Review - 5 Key Benefits of YARN: 1. Scale 2. New Programming Models & Services 3. Improved cluster utilization 4. Agility 5. Beyond Java. Page 18
  • 19. © Hortonworks Inc. 2013 Agenda • Why YARN • YARN Architecture and Concepts • Building applications on YARN • Next Steps Page 19
  • 20. © Hortonworks Inc. 2013 YARN Applications • Data processing applications and services – Online Serving – HOYA (HBase on YARN) – Real-time event processing – Storm, S4, other commercial platforms – Tez – Generic framework to run a complex DAG – MPI: OpenMPI, MPICH2 – Master-Worker – Machine Learning: Spark – Graph processing: Giraph – Enabled by allowing the use of paradigm-specific application master Run all on the same Hadoop cluster! Page 20
  • 21. © Hortonworks Inc. 2013 YARN – Implementing Applications • What APIs do I need to use? – Only three protocols – Client to ResourceManager – Application submission – ApplicationMaster to ResourceManager – Container allocation – ApplicationMaster to NodeManager – Container launch – Use client libraries for all 3 actions – Module yarn-client – Provides both synchronous and asynchronous libraries – Use 3rd party like Weave –  http://continuuity.github.io/weave/ 21
  • 22. © Hortonworks Inc. 2013 YARN – Implementing Applications • What do I need to do? – Write a submission Client – Write an ApplicationMaster (well copy-paste) – DistributedShell is the new WordCount! – Get containers, run whatever you want! 22
  • 23. © Hortonworks Inc. 2013 YARN – Implementing Applications • What else do I need to know? – Resource Allocation & Usage – ResourceRequest – Container – ContainerLaunchContext – LocalResource – ApplicationMaster – ApplicationId – ApplicationAttemptId – ApplicationSubmissionContext 23
  • 24. © Hortonworks Inc. 2013 YARN – Resource Allocation & Usage • ResourceRequest – Fine-grained resource ask to the ResourceManager – Ask for a specific amount of resources (memory, cpu etc.) on a specific machine or rack – Use the special value * as the resource name for any machine. Page 24. A ResourceRequest carries: priority, resourceName, capability, numContainers
  • 25. © Hortonworks Inc. 2013 YARN – Resource Allocation & Usage • ResourceRequest example. Page 25
    priority | capability    | resourceName | numContainers
    0        | <2gb, 1 core> | host01       | 1
    0        | <2gb, 1 core> | rack0        | 1
    0        | <2gb, 1 core> | *            | 1
    1        | <4gb, 1 core> | *            | 1
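The table above reads as a locality-relaxation ladder: for the priority-0 ask, the scheduler prefers host01, falls back to any node in rack0, and finally to any node (*). A toy sketch of that matching logic (plain Java, not the real YARN scheduler; host and rack names are illustrative):

```java
import java.util.*;

// Toy sketch (NOT the YARN scheduler): how the host/rack/* entries of a
// ResourceRequest let the scheduler relax locality when assigning a free node.
public class LocalityRelaxation {
    static String placeContainer(String freeHost, String rackOfFreeHost,
                                 Set<String> requestedHosts,
                                 Set<String> requestedRacks,
                                 boolean relaxToAny) {
        if (requestedHosts.contains(freeHost))     return "NODE_LOCAL";  // exact host match
        if (requestedRacks.contains(rackOfFreeHost)) return "RACK_LOCAL"; // same rack
        if (relaxToAny)                            return "OFF_SWITCH";  // the * entry
        return "NO_MATCH";
    }

    public static void main(String[] args) {
        Set<String> hosts = new HashSet<>(Arrays.asList("host01"));
        Set<String> racks = new HashSet<>(Arrays.asList("rack0"));
        System.out.println(placeContainer("host01", "rack0", hosts, racks, true)); // NODE_LOCAL
        System.out.println(placeContainer("host07", "rack0", hosts, racks, true)); // RACK_LOCAL
        System.out.println(placeContainer("host99", "rack9", hosts, racks, true)); // OFF_SWITCH
    }
}
```

Dropping the * entry (no relaxation) is how an application can insist on strict locality.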
  • 26. © Hortonworks Inc. 2013 YARN – Resource Allocation & Usage • Container – The basic unit of allocation in YARN – The result of a ResourceRequest, provided by the ResourceManager to the ApplicationMaster – A specific amount of resources (cpu, memory etc.) on a specific machine. Page 26. A Container carries: containerId, resourceName, capability, tokens
  • 27. © Hortonworks Inc. 2013 YARN – Resource Allocation & Usage • ContainerLaunchContext – The context provided by the ApplicationMaster to the NodeManager to launch the Container – Complete specification for a process – LocalResource used to specify the container binary and dependencies – NodeManager responsible for downloading from the shared namespace (typically HDFS). Page 27. A ContainerLaunchContext carries: container, commands, environment, localResources; a LocalResource carries: uri, type
  • 28. © Hortonworks Inc. 2013 YARN - ApplicationMaster • ApplicationMaster – Per-application controller aka container_0 – Parent for all containers of the application – ApplicationMaster negotiates all its containers from the ResourceManager – The ApplicationMaster container is a child of the ResourceManager – Think init process in Unix – RM restarts the ApplicationMaster attempt if required (unique ApplicationAttemptId) – Code for the application is submitted along with the application itself. Page 28
  • 29. © Hortonworks Inc. 2013 YARN - ApplicationMaster • ApplicationMaster – ApplicationSubmissionContext is the complete specification of the ApplicationMaster, provided by the Client – ResourceManager responsible for allocating and launching the ApplicationMaster container. Page 29. An ApplicationSubmissionContext carries: resourceRequest, containerLaunchContext, appName, queue
  • 30. © Hortonworks Inc. 2013 YARN Application API - Overview • hadoop-yarn-client module • YarnClient is the submission client API • Both synchronous & asynchronous APIs for resource allocation and container start/stop • Synchronous API – AMRMClient – AMNMClient • Asynchronous API – AMRMClientAsync – AMNMClientAsync. Page 30
  • 31. © Hortonworks Inc. 2012 YARN Application API – The Client. [Diagram: Client2 talks to the ResourceManager (Scheduler): 1. New application request via YarnClient.createApplication; 2. Submit application via YarnClient.submitApplication]
  • 32. © Hortonworks Inc. 2013 YARN Application API – The Client • YarnClient – createApplication to create an application – submitApplication to start an application – The application developer needs to provide an ApplicationSubmissionContext – APIs to get other information from the ResourceManager – getAllQueues – getApplications – getNodeReports – APIs to manipulate a submitted application, e.g. killApplication. Page 32
  • 33. © Hortonworks Inc. 2012 YARN Application API – Resource Allocation. [Diagram: the ApplicationMaster 1. registers via registerApplicationMaster, 2./3. negotiates Containers via AMRMClient.allocate, and 4. finishes via unregisterApplicationMaster, all against the ResourceManager (Scheduler)]
  • 34. © Hortonworks Inc. 2013 YARN Application API – Resource Allocation • AMRMClient - Synchronous API for the ApplicationMaster to interact with the ResourceManager – Prologue / epilogue – registerApplicationMaster / unregisterApplicationMaster – Resource negotiation with the ResourceManager – Internal book-keeping - addContainerRequest / removeContainerRequest / releaseAssignedContainer – Main API – allocate – Helper APIs for cluster information – getAvailableResources – getClusterNodeCount. Page 34
  • 35. © Hortonworks Inc. 2013 YARN Application API – Resource Allocation • AMRMClientAsync - Asynchronous API for the ApplicationMaster – Extension of AMRMClient to provide an asynchronous CallbackHandler – Callbacks make it easier for the application developer to build a mental model of the interaction with the ResourceManager – onContainersAllocated – onContainersCompleted – onNodesUpdated – onError – onShutdownRequest. Page 35
  • 36. © Hortonworks Inc. 2012 YARN Application API – Using Resources. [Diagram: AM 1 launches Container 1.1 on a NodeManager via AMNMClient.startContainer and checks it via AMNMClient.getContainerStatus; ResourceManager (Scheduler) shown alongside]
  • 37. © Hortonworks Inc. 2013 YARN Application API – Using Resources • AMNMClient - Synchronous API for the ApplicationMaster to launch / stop containers at the NodeManager – Simple (trivial) APIs – startContainer – stopContainer – getContainerStatus. Page 37
  • 38. © Hortonworks Inc. 2013 YARN Application API – Using Resources • AMNMClientAsync - Asynchronous API for the ApplicationMaster to launch / stop containers at the NodeManager – Simple (trivial) APIs – startContainerAsync – stopContainerAsync – getContainerStatusAsync – CallbackHandler to make it easier for the application developer to build a mental model of the interaction with the NodeManager – onContainerStarted – onContainerStopped – onStartContainerError – onContainerStatusReceived. Page 38
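The callback style used by both async clients above can be sketched in miniature. These are NOT the real YARN classes — the handler interface and the scripted "heartbeat" below are toy stand-ins that only illustrate the pattern: the application supplies a handler and reacts to events instead of polling:

```java
import java.util.*;

// Toy sketch of the asynchronous-client callback pattern. The interface
// mirrors, but is not, the real AMRMClientAsync/NMClientAsync handlers.
public class AsyncClientSketch {
    interface CallbackHandler {
        void onContainersAllocated(List<String> containers);
        void onContainersCompleted(List<String> containers);
    }

    // A fake client that "heartbeats" a scripted sequence of events.
    static void drive(CallbackHandler handler) {
        handler.onContainersAllocated(Arrays.asList("container_01", "container_02"));
        handler.onContainersCompleted(Arrays.asList("container_01"));
    }

    // Example handler that just counts what it saw.
    static class CountingHandler implements CallbackHandler {
        int allocated = 0, completed = 0;
        public void onContainersAllocated(List<String> cs) { allocated += cs.size(); }
        public void onContainersCompleted(List<String> cs) { completed += cs.size(); }
    }

    public static void main(String[] args) {
        CountingHandler h = new CountingHandler();
        drive(h);
        System.out.println(h.allocated + " allocated, " + h.completed + " completed");
    }
}
```

In the real clients the events are driven by the heartbeat to the ResourceManager/NodeManager rather than a scripted sequence.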
  • 39. © Hortonworks Inc. 2013 YARN Application API - Development • Un-Managed Mode for the ApplicationMaster – Run the ApplicationMaster on a development machine rather than in-cluster – No submission client – hadoop-yarn-applications-unmanaged-am-launcher – Easier to step through a debugger, browse logs etc. Page 39
    $ bin/hadoop jar hadoop-yarn-applications-unmanaged-am-launcher.jar \
        Client \
        -jar my-application-master.jar \
        -cmd 'java MyApplicationMaster <args>'
  • 40. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • Overview – YARN application to run n copies of a shell command – Simplest example of a YARN application – get n containers and run a specific Unix command. Page 40
    $ bin/hadoop jar hadoop-yarn-applications-distributedshell.jar \
        org.apache.hadoop.yarn.applications.distributedshell.Client \
        -shell_command '/bin/date' \
        -num_containers <n>
    Code: https://github.com/hortonworks/simple-yarn-app
  • 41. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • Code Overview – The user submits the application to the ResourceManager via org.apache.hadoop.yarn.applications.distributedshell.Client – The Client provides an ApplicationSubmissionContext to the ResourceManager – It is the responsibility of org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster to negotiate the n containers – The ApplicationMaster launches containers with the user-specified command as ContainerLaunchContext.commands. Page 41
  • 42. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • Client – Code Walkthrough – hadoop-yarn-client module – Steps: – YarnClient.createApplication – Specify the ApplicationSubmissionContext, in particular the ContainerLaunchContext with commands, and other key pieces such as the resource capability for the ApplicationMaster container, plus queue, appName, appType etc. – YarnClient.submitApplication. Page 42
    // Create yarnClient
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();

    // Create application via yarnClient
    YarnClientApplication app = yarnClient.createApplication();
  • 43. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • Client – Code Walkthrough. Page 43
    // Set up the container launch context for the application master
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);
    List<String> commands = new ArrayList<String>();
    commands.add("$JAVA_HOME/bin/java");
    commands.add("-Xmx256M");
    commands.add(
        "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster");
    commands.add("--container_memory 1024");
    commands.add("--container_cores 1");
    commands.add("--num_containers 3");
    amContainer.setCommands(commands);   // command to launch the ApplicationMaster process

    // Set up resource type requirements for the ApplicationMaster container
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(256);
    capability.setVirtualCores(2);
  • 44. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • Client – Code Walkthrough. Page 44
    // Finally, set up the ApplicationSubmissionContext for the ApplicationMaster
    ApplicationSubmissionContext appContext =
        app.getApplicationSubmissionContext();
    appContext.setQueue("my-queue");                      // queue
    appContext.setAMContainerSpec(amContainer);
    appContext.setResource(capability);
    appContext.setApplicationName("my-app");              // application name
    appContext.setApplicationType("DISTRIBUTED_SHELL");   // application type

    // Submit application to the ResourceManager
    yarnClient.submitApplication(appContext);
  • 45. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • ApplicationMaster – Code Walkthrough – Again, hadoop-yarn-client module – Steps: – AMRMClient.registerApplicationMaster – Negotiate containers from the ResourceManager by providing a ContainerRequest to AMRMClient.addContainerRequest – Take each resultant Container returned via a subsequent call to AMRMClient.allocate, build a ContainerLaunchContext with the Container and commands, then launch them using AMNMClient.startContainer – Use LocalResources to specify software/configuration dependencies for each worker container – Wait till done… AllocateResponse.getCompletedContainersStatuses from subsequent calls to AMRMClient.allocate – AMRMClient.unregisterApplicationMaster. Page 45
  • 46. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • ApplicationMaster – Code Walkthrough. Page 46
    // Initialize clients to the ResourceManager and NodeManagers
    Configuration conf = new Configuration();

    AMRMClient rmClient = AMRMClient.createAMRMClient();
    rmClient.init(conf);
    rmClient.start();

    NMClient nmClient = NMClient.createNMClient();
    nmClient.init(conf);
    nmClient.start();

    // Register with the ResourceManager
    rmClient.registerApplicationMaster("", 0, "");
  • 47. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • ApplicationMaster – Code Walkthrough. Page 47
    // Priority for worker containers - priorities are intra-application
    Priority priority = Records.newRecord(Priority.class);
    priority.setPriority(0);

    // Resource requirements for worker containers
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(128);
    capability.setVirtualCores(1);

    // Make container requests to the ResourceManager
    for (int i = 0; i < n; ++i) {
      ContainerRequest containerAsk =
          new ContainerRequest(capability, null, null, priority);
      rmClient.addContainerRequest(containerAsk);
    }
  • 48. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • ApplicationMaster – Code Walkthrough. Page 48
    // Obtain allocated containers and launch them
    int allocatedContainers = 0;
    while (allocatedContainers < n) {
      AllocateResponse response = rmClient.allocate(0);
      for (Container container : response.getAllocatedContainers()) {
        ++allocatedContainers;

        // Launch container by creating a ContainerLaunchContext
        ContainerLaunchContext ctx =
            Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList("/bin/date"));
        nmClient.startContainer(container, ctx);
      }
      Thread.sleep(100);
    }
  • 49. © Hortonworks Inc. 2013 Sample YARN Application – DistributedShell • ApplicationMaster – Code Walkthrough. Page 49
    // Now wait for containers to complete successfully
    int completedContainers = 0;
    while (completedContainers < n) {
      AllocateResponse response =
          rmClient.allocate((float) completedContainers / n);
      for (ContainerStatus status : response.getCompletedContainersStatuses()) {
        if (status.getExitStatus() == 0) {
          ++completedContainers;
        }
      }
      Thread.sleep(100);
    }

    // Un-register with the ResourceManager
    rmClient.unregisterApplicationMaster(SUCCEEDED, "", "");
  • 50. © Hortonworks Inc. 2013 Apache Hadoop MapReduce on YARN • Original use-case • Most complex application to build – Data-locality – Fault tolerance – ApplicationMaster recovery: Check point to HDFS – Intra-application Priorities: Maps v/s Reduces – Needed complex feedback mechanism from ResourceManager – Security – Isolation • Binary compatible with Apache Hadoop 1.x Page 50
  • 51. © Hortonworks Inc. 2012 Apache Hadoop MapReduce on YARN. [Diagram: ResourceManager (Scheduler) with two MapReduce ApplicationMasters; MR AM 1 runs map1.1, map1.2, reduce1.1 and MR AM2 runs map2.1, map2.2, reduce2.1, reduce2.2 on the NodeManagers]
  • 52. © Hortonworks Inc. 2013 Apache Tez on YARN • Replaces MapReduce as the primitive for Pig, Hive, Cascading etc. – Lower latency for interactive queries – Higher throughput for batch queries – http://incubator.apache.org/projects/tez.html. Page 52. Example query (compared as Hive on MR vs. Hive on Tez): SELECT a.x, AVERAGE(b.y) AS avg FROM a JOIN b ON (a.id = b.id) GROUP BY a UNION SELECT x, AVERAGE(y) AS AVG FROM c GROUP BY x ORDER BY AVG;
  • 53. © Hortonworks Inc. 2012 Apache Tez on YARN. [Diagram: ResourceManager (Scheduler) running an MR AM (map1.1, map1.2, reduce1.1) alongside a Tez AM with DAG vertices 1.1.1, 1.1.2, 1.2.1, 1.2.2 on the NodeManagers]
  • 54. © Hortonworks Inc. 2013 HOYA - Apache HBase on YARN • Hoya – Apache HBase becomes user-level application • Use cases – Small HBase cluster in large YARN cluster – Dynamic HBase clusters – Transient/intermittent clusters for workflows • APIs to create, start, stop & delete HBase clusters • Flex cluster size: increase/decrease size with load • Recover from Region Server loss with new container. Page 54 Code: https://github.com/hortonworks/hoya
  • 55. © Hortonworks Inc. 2012 HOYA - Apache HBase on YARN. [Diagram: ResourceManager (Scheduler) running an MR AM (map1.1, map1.2, reduce1.1) alongside a HOYA AM managing an HBase Master and RegionServers 1.1–1.3 in containers on the NodeManagers]
  • 56. © Hortonworks Inc. 2013 HOYA - Highlights • Cluster specification stored as JSON in HDFS • Config directory cached - dynamically patched before pushing up as local resources for the Master & RegionServers • HBase tar file stored in HDFS – clusters can use the same/different HBase versions • Handling of cluster flexing is the same code as unplanned container loss • No Hoya code on the RegionServers: client and AM only. Page 56
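For the first bullet — a JSON cluster specification stored in HDFS — a hypothetical sketch of what such a file might contain. The field names, paths, and version below are illustrative only, not Hoya's actual schema:

```json
{
  "name": "my-hbase-cluster",
  "hbase_version": "0.96.0",
  "roles": {
    "master": { "instances": 1, "memory_mb": 1024 },
    "regionserver": { "instances": 3, "memory_mb": 2048 }
  },
  "config_dir": "hdfs://namenode/hoya/conf",
  "hbase_tarball": "hdfs://namenode/hoya/hbase-0.96.0.tar.gz"
}
```

Flexing the cluster then amounts to editing the instance counts and letting the AM reconcile running containers against the spec.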
  • 57. © Hortonworks Inc. 2013 Storm on YARN • Ability to deploy multiple Storm clusters on YARN for real-time event processing • Yahoo – Primary contributor – 200+ nodes in production • Ability to recover from faulty nodes – Get new containers • Auto-scale for load balancing – Get new containers as load increases – Release containers as load decreases Page 57 Code: https://github.com/yahoo/storm-yarn
  • 58. © Hortonworks Inc. 2012 Storm on YARN. [Diagram: ResourceManager (Scheduler) running an MR AM (map1.1, map1.2, reduce1.1) alongside a Storm AM with Nimbus containers 1.1–1.4 on the NodeManagers]
  • 59. © Hortonworks Inc. 2013 General Architectural Considerations • Fault Tolerance – Checkpoint • Security • Always-On services • Scheduler features – Whitelist resources – Blacklist resources – Labels for machines – License management Page 59
  • 60. © Hortonworks Inc. 2013 Agenda • Why YARN • YARN Architecture and Concepts • Building applications on YARN • Next Steps Page 60
  • 61. © Hortonworks Inc. 2013 HDP 2.0 Community Preview & YARN Certification Program. Goal: Accelerate the # of certified YARN-based solutions • HDP 2.0 Community Preview – Contains the latest community Beta of Apache Hadoop 2.0 & YARN – Delivered as an easy-to-use Sandbox VM, as well as RPMs and tarballs – Enables the YARN Cert Program: community & commercial ecosystem to test and certify new and existing YARN-based apps • YARN Certification Program – More than 14 partners in the program at launch – Splunk* – Elastic Search* – Altiscale* – Concurrent* – Microsoft – Platfora – Tableau – (IBM) DataStage – Informatica – Karmasphere – and others (* Already certified)
  • 62. © Hortonworks Inc. 2013 Forum & Office Hours • YARN Forum – Community of Hadoop YARN developers – Focused on collaboration and Q&A –  http://hortonworks.com/community/forums/forum/yarn • Office Hours – http://www.meetup.com/HSquared-Hadoop-Hortonworks-User- Group/ – Bi-weekly office hours with Hortonworks engineers & architects – Every other Thursday, starting on Aug 15th from 4:30 – 5:30 PM (PST) – At Hortonworks Palo Alto HQ – West Bayshore Rd. Palo Alto, CA 94303 USA Page 62
  • 63. © Hortonworks Inc. 2013 Technical Resources • ASF hadoop-2.1.0-beta – Coming soon! –  http://hadoop.apache.org/common/releases.html – Release Documentation: http://hadoop.apache.org/common/docs/r2.1.0-beta • Blogs: –  http://hortonworks.com/hadoop/yarn –  http://hortonworks.com/blog/category/apache-hadoop/yarn/ –  http://hortonworks.com/blog/introducing-apache-hadoop-yarn/ –  http://hortonworks.com/blog/apache-hadoop-yarn-background- and-an-overview/ Page 63
  • 64. © Hortonworks Inc. 2013 What Next? • Download the Book • Download the Community preview • Join the Beta Program • Participate via the YARN forum • Come see us during the YARN Office Hours • Follow us… @hortonworks Thank You! Page 64 http://hortonworks.com/hadoop/yarn