YARN: Hadoop Beyond MapReduce - Andreas Neumann
by jaxconf on Jun 11, 2013
When Apache Hadoop was first introduced to the Open Source world, it was focused on implementing Google's Map/Reduce, a framework for batch processing of very large files in a distributed system. Hadoop implemented this framework along with a built-in cluster resource manager specialized in the execution of Map/Reduce programs. The popularity of Hadoop has since led to the (re-)emergence of various other "Big Data" processing paradigms, such as real-time stream processing (Storm, S4), message-passing (MPI), and graph processing (Giraph). However, because Hadoop was specialized in Map/Reduce, it has been hard to leverage existing Hadoop infrastructure for these other paradigms.

This changes with the release of Apache Hadoop 2.0, the next-generation Map/Reduce engine: its new resource manager, YARN, not only improves availability, scalability, security and multi-tenancy, but also decouples cluster management from the Map/Reduce engine. YARN manages the cluster's compute resources as "compute slots". Through its application master API, each application can obtain slots from YARN and is free to use them for any type of computation. YARN manages and schedules the available slots in the cluster, monitors running slots, and notifies the application if one of its slots has died. The application master decides how many slots to allocate and which tasks run in those slots. With this architecture, YARN supports running arbitrary compute paradigms that all share the resources of one cluster. This allows for innovation, agility and better hardware utilization.

This talk will give a detailed introduction to YARN's concepts, architecture and interfaces. We will illustrate the use of YARN's APIs and show examples of YARN applications. We will share the experience and lessons learned from building a real-time streaming engine on top of YARN and running it at scale in production, and we will present best practices and design patterns around YARN.
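
The division of labor between YARN and an application master, as described in the abstract, can be sketched as a small in-memory toy model. This is only an illustration under stated assumptions: `ToyResourceManager`, `ToyApplicationMaster`, and all of their methods are hypothetical names invented for this sketch, not the real YARN API, and the policy of re-requesting a slot after a failure is just one choice an application master might make.

```python
class ToyResourceManager:
    """Hypothetical stand-in for YARN: hands out slots and reports dead ones."""

    def __init__(self, total_slots):
        self.total_slots = total_slots
        self.owners = {}   # slot id -> callback of the owning application
        self.next_id = 0

    def allocate(self, count, on_slot_lost):
        """Grant up to `count` free slots and return their ids."""
        free = self.total_slots - len(self.owners)
        granted = []
        for _ in range(min(count, free)):
            slot_id = self.next_id
            self.next_id += 1
            self.owners[slot_id] = on_slot_lost
            granted.append(slot_id)
        return granted

    def kill_slot(self, slot_id):
        """Simulate a slot dying: free it, then notify the owning application."""
        callback = self.owners.pop(slot_id)
        callback(slot_id)


class ToyApplicationMaster:
    """Hypothetical application master: decides how many slots to request
    and which task runs in which slot."""

    def __init__(self, name, resource_manager):
        self.name = name
        self.rm = resource_manager
        self.tasks = {}    # slot id -> task name

    def run(self, task_names):
        """Request one slot per task and place tasks in the granted slots."""
        slots = self.rm.allocate(len(task_names), self.on_slot_lost)
        for slot_id, task in zip(slots, task_names):
            self.tasks[slot_id] = task
        return slots

    def on_slot_lost(self, slot_id):
        """Assumed recovery policy: ask for a replacement slot and rerun the task."""
        task = self.tasks.pop(slot_id)
        replacement = self.rm.allocate(1, self.on_slot_lost)
        if replacement:
            self.tasks[replacement[0]] = task


if __name__ == "__main__":
    # Two different "paradigms" share one four-slot cluster.
    rm = ToyResourceManager(total_slots=4)
    batch = ToyApplicationMaster("batch", rm)
    stream = ToyApplicationMaster("stream", rm)
    batch.run(["map-0", "map-1"])
    stream.run(["source", "transform", "sink"])  # only two slots are left
    rm.kill_slot(0)                              # batch loses a slot and recovers
    print(batch.tasks, stream.tasks)
```

In this sketch the resource manager knows nothing about what runs inside a slot; it only tracks ownership and reports failures, which mirrors the decoupling of cluster management from the compute engine that the abstract describes.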