Apachecon Hadoop YARN - Under The Hood (at ApacheCon Europe)


YARN is a new resource management architecture in Hadoop that provides improved scaling for large applications and high cluster utilization. It introduces the concept of separating resource management from job scheduling and tracking. This allows it to scale to larger clusters and support a wider variety of applications beyond just MapReduce. Key aspects of YARN include the use of an event-driven architecture for asynchronous processing of heartbeats, declarative state management for improved debuggability, and application master recovery for fault tolerance.

Technology

Hadoop YARN - Under the Hood

Sharad Agarwal
sharad@apache.org

Recap: Hadoop 1.0 Map-Reduce
JobTracker
• Manages cluster resources and job scheduling
TaskTracker
• Per-node agent
• Manages tasks

YARN Architecture
[Figure: YARN architecture diagram. Components: Client, Resource Manager, Node Manager, App Mstr (application master), Container. Arrows: Job Submission, Node Status, Resource Request, MapReduce Status.]
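To make the roles in the architecture diagram concrete, here is a minimal sketch of a resource manager that only tracks free capacity per node and answers container requests. Everything in it is hypothetical: the class names, the fields, and the first-fit placement rule are assumptions for illustration and are not YARN's actual API or scheduling policy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContainerRequest:
    """Resource definition for one process an application master wants launched."""
    memory_mb: int
    vcores: int


@dataclass
class Node:
    """Free capacity that a node manager has reported for its node."""
    name: str
    free_memory_mb: int
    free_vcores: int


@dataclass
class ToyResourceManager:
    """Tracks cluster capacity only; it keeps no per-application state."""
    nodes: list = field(default_factory=list)

    def allocate(self, request):
        """Return the name of the first node that fits the request, or None."""
        for node in self.nodes:
            if (node.free_memory_mb >= request.memory_mb
                    and node.free_vcores >= request.vcores):
                node.free_memory_mb -= request.memory_mb
                node.free_vcores -= request.vcores
                return node.name
        return None


rm = ToyResourceManager([Node("node-1", 2048, 2), Node("node-2", 8192, 8)])
request = ContainerRequest(memory_mb=4096, vcores=2)
first = rm.allocate(request)    # node-1 is too small, so node-2 is chosen
second = rm.allocate(request)   # node-2 still has room for one more
third = rm.allocate(request)    # nothing fits any longer
```

The point of the sketch is the split of responsibilities: the resource manager answers "where is there room?", while tracking tasks, counters and progress would live in a per-application master.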

What does the new architecture get us?

Scale
Compute Platform

Scale for a compute platform
• Application size
  • Number of sub-tasks
  • Application-level state
    • e.g. counters
• Number of concurrent tasks in a single cluster

Application size scaling in Hadoop 1.0
JTHeap ∝ TotalTasks, Nodes, JobCounters

Application size scaling in YARN is by architecture

Why a limitation on cluster size?
[Plot: Hadoop 1.0 cluster utilization versus cluster size]

[Diagram: a heartbeat request enters the JobTracker (JIP, TIP, Scheduler) and a heartbeat response is returned]
• Synchronous heartbeat processing
• JobTracker global lock
JT transaction rate limit: 200 heartbeats/sec
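A short worked calculation shows why a fixed heartbeat rate limits cluster size. It takes the 200 heartbeats/sec figure from the slide and assumes, as a simplification, that the load is spread evenly so every node heartbeats once per interval. The function name is hypothetical.

```python
def min_heartbeat_interval_seconds(num_nodes, max_heartbeats_per_sec=200):
    """Shortest interval at which every node can heartbeat without exceeding the cap."""
    if num_nodes <= 0 or max_heartbeats_per_sec <= 0:
        raise ValueError("num_nodes and max_heartbeats_per_sec must be positive")
    return num_nodes / max_heartbeats_per_sec


interval_1k = min_heartbeat_interval_seconds(1000)  # 1000 / 200 = 5.0 seconds
interval_4k = min_heartbeat_interval_seconds(4000)  # 4000 / 200 = 20.0 seconds
```

Under this assumption a 4,000-node cluster cannot hear from each node more often than every 20 seconds. Since a TaskTracker receives new work in the heartbeat response, a longer interval means free slots sit idle for longer, which is one plausible reading of the utilization drop.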

Highly Concurrent Systems
• scale much better (if done right)
• make effective use of multi-core hardware
• managing eventual consistency of states is hard
• need a systemic framework to manage this

Event Model
[Diagram: an Event Queue feeds an Event Dispatcher, which delivers events to Component A, Component B, ... Component N]
• Mutations only via events
• Components only expose read APIs
• Use re-entrant locks
• Components follow a clear lifecycle
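The event model can be sketched in a few lines: one queue, one dispatcher thread, and components whose state changes only inside their event handlers. This is an illustrative sketch built on Python's standard `queue` and `threading` modules, not the YARN implementation; `Dispatcher`, `NodeCounter` and the `NODE_ADDED` event type are hypothetical names.

```python
import queue
import threading


class Dispatcher:
    """Single thread that drains the event queue and calls registered handlers."""

    def __init__(self):
        self._events = queue.Queue()
        self._handlers = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, event_type, handler):
        self._handlers.setdefault(event_type, []).append(handler)

    def start(self):
        self._thread.start()

    def post(self, event_type, payload=None):
        """Enqueue an event; the caller never waits for it to be processed."""
        self._events.put((event_type, payload))

    def stop(self):
        """Enqueue a sentinel and wait until all earlier events are handled."""
        self._events.put(None)
        self._thread.join()

    def _run(self):
        while True:
            item = self._events.get()
            if item is None:
                break
            event_type, payload = item
            for handler in self._handlers.get(event_type, []):
                handler(payload)


class NodeCounter:
    """Component: state mutates only in handle(); callers get a read API."""

    def __init__(self):
        self._lock = threading.RLock()  # re-entrant lock guarding the state
        self._count = 0

    def handle(self, _payload):
        with self._lock:
            self._count += 1

    def count(self):
        with self._lock:
            return self._count


dispatcher = Dispatcher()
counter = NodeCounter()
dispatcher.register("NODE_ADDED", counter.handle)
dispatcher.start()
for _ in range(3):
    dispatcher.post("NODE_ADDED")
dispatcher.stop()
final_count = counter.count()
```

Because all mutations funnel through one dispatcher thread, producers on many threads can post events without coordinating with each other, and readers only need the component's own lock.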

Asynchronous Heartbeat Handling
[Diagram: a heartbeat request reaches the Heartbeat Listener, which posts to the Event Q, gets commands from the NodeManager Meta, and returns the heartbeat response]
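Asynchronous heartbeat handling can be sketched as a listener that only enqueues the reported status and returns whatever commands are already waiting for that node, instead of doing the scheduling work inside the request. The class and method names are hypothetical, and the sketch is single-threaded for brevity; a real listener would need to guard the pending-command store against concurrent access.

```python
import queue
from collections import defaultdict, deque


class HeartbeatListener:
    """Answers heartbeats immediately; real processing happens off the event queue."""

    def __init__(self):
        self.event_queue = queue.Queue()
        # Per-node commands prepared earlier by the event-processing side.
        self._pending_commands = defaultdict(deque)

    def add_command(self, node_id, command):
        """Called by event handlers once they have decided what a node should do."""
        self._pending_commands[node_id].append(command)

    def on_heartbeat(self, node_id, status):
        """Enqueue the status for later processing and return the waiting commands."""
        self.event_queue.put(("NODE_STATUS", node_id, status))
        commands = list(self._pending_commands[node_id])
        self._pending_commands[node_id].clear()
        return commands


listener = HeartbeatListener()
listener.add_command("node-1", "LAUNCH container-7")
response = listener.on_heartbeat("node-1", {"free_memory_mb": 2048})
second_response = listener.on_heartbeat("node-1", {"free_memory_mb": 1024})
queued = listener.event_queue.qsize()
```

The heartbeat path does a constant amount of work regardless of how expensive scheduling is, so no global lock has to be held while a response is computed.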

YARN: Better utilization, bigger cluster
[Plot: cluster utilization versus cluster size, with one curve for YARN and one for Hadoop 1.0]

State management in JT
Very Hard to Maintain
Debugging even harder

Complex State Management
• Lightweight state machine library
• Declarative way of specifying the state transitions
• Invalid transitions are handled automatically
• Fits nicely with the event model
• Debuggability is drastically improved: the lineage of object states can easily be determined
• Handy while recovering the state
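A declarative state machine of the kind listed above can be sketched as a transition table plus a small driver. This is a minimal illustration and not the YARN state machine library: the state names, the event names and the choice to reject invalid transitions by raising an exception are all assumptions.

```python
class InvalidTransition(Exception):
    """Raised when an event arrives in a state that does not accept it."""


class StateMachine:
    def __init__(self, initial, transitions):
        self.state = initial
        self._transitions = transitions  # {(state, event): next_state}
        self.lineage = [initial]         # every state visited, in order

    def handle(self, event):
        key = (self.state, event)
        if key not in self._transitions:
            raise InvalidTransition(
                f"{event!r} is not valid in state {self.state!r}")
        self.state = self._transitions[key]
        self.lineage.append(self.state)
        return self.state


# The whole life cycle is declared as data, so it can be read at a glance.
TASK_TRANSITIONS = {
    ("NEW", "SCHEDULE"): "SCHEDULED",
    ("SCHEDULED", "LAUNCH"): "RUNNING",
    ("RUNNING", "SUCCEED"): "SUCCEEDED",
    ("RUNNING", "FAIL"): "FAILED",
}

task = StateMachine("NEW", TASK_TRANSITIONS)
task.handle("SCHEDULE")
task.handle("LAUNCH")
task.handle("SUCCEED")

try:
    task.handle("LAUNCH")  # a finished task cannot be launched again
    rejected = False
except InvalidTransition:
    rejected = True
```

The `lineage` list is what makes debugging easier: after a failure it shows exactly which states the object passed through, and the same record is useful when rebuilding state during recovery.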

MR Application Master Recovery
• Hadoop 1.0
  • Application needs to resubmit the job
  • All completed tasks are lost
• YARN
  • Application execution state is checkpointed in HDFS
  • Rebuilds the state by replaying the events
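Recovery by event replay can be sketched as an append-only log that is read back after a restart. In this sketch a local temporary file stands in for HDFS, and the event format (`TASK_STARTED`, `TASK_FINISHED`) is a hypothetical one chosen for illustration, not the MapReduce job history format.

```python
import json
import os
import tempfile


def append_event(log_path, event):
    """Persist one event as a JSON line before acting on it."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")


def recover_completed_tasks(log_path):
    """Replay the log from the start and rebuild the set of finished tasks."""
    completed = set()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            event = json.loads(line)
            if event["type"] == "TASK_FINISHED":
                completed.add(event["task"])
    return completed


log_path = os.path.join(tempfile.mkdtemp(), "job_events.jsonl")
append_event(log_path, {"type": "TASK_STARTED", "task": "map_0"})
append_event(log_path, {"type": "TASK_FINISHED", "task": "map_0"})
append_event(log_path, {"type": "TASK_STARTED", "task": "map_1"})
# Assume the application master crashes here, before map_1 finishes.

recovered = recover_completed_tasks(log_path)
all_tasks = {"map_0", "map_1"}
to_rerun = all_tasks - recovered
```

A restarted application master only has to rerun the tasks that never logged a finish event, which is the contrast with Hadoop 1.0, where all completed tasks are lost.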

Resource Manager HA
• Based on ZooKeeper
• Coming Soon
• YARN-128

YARN: New Possibilities
• Open MPI - MR-2911
• Master-Worker - MR-3315
• Distributed Shell
• Graph processing - Giraph-13
• BSP - HAMA-431
• CEP
  • S4 - S4-25
  • Storm - https://github.com/nathanmarz/storm/issues/74
• Iterative processing - Spark https://github.com/mesos/spark-yarn/

YARN - a solid foundation to take Hadoop to the next level on
Scale, High Availability, Utilization and Alternate Compute Paradigms







Thank You
@twitter: sharad_ag

Editor's Notes

We will talk about how YARN is built fundamentally differently from Hadoop 1.0. What is the motivation for doing so? What does it buy us? Hadoop 1.0 is classic MR; Hadoop 2.0 has MR on YARN.
I work primarily on the Map-Reduce side and was part of the team when YARN was conceptualized. I work at InMobi, which is a mobile advertising company. I lead the development of big data platforms at InMobi, from data collection to data analytics systems. I don't see many folks from India. I am the organizer of the Hadoop meetup group.
Quick primer on the Hadoop 1.0 architecture. There is a single master known as the JobTracker. Slave daemons are called TaskTrackers. Clients submit jobs to the JobTracker. Individual jobs contain map and reduce definitions. The JobTracker knows about the cluster resources and schedules the map and reduce tasks accordingly.
There is a single master known as the Resource Manager (RM), which manages the resources of the cluster. Slave daemons known as NodeManagers manage the resources of individual nodes. Clients submit applications (jobs are now called applications in YARN) to the Resource Manager. Each application has its own master process, called the Application Master, which is spawned when the application starts running and is responsible for managing the application's lifecycle. If the Application Master wants to spawn more processes in the cluster, it asks the Resource Manager to spawn one. The resource definition of a process that needs to be launched in the cluster is a container; it specifies things like RAM, disk, CPU, etc. Fundamentally, application state management is distributed, and the RM is only responsible for cluster management.
What the new architecture gets us. Todo: put animation. Scale and a general-purpose distributed compute platform. First, let's understand what scale means in the Hadoop context.
For a distributed compute platform, scalability is at two levels: how big a single application can be, and the number of concurrently running tasks in a single cluster. Application size is the number of sub-tasks plus application-level state. The number of concurrent tasks is essentially the cluster size.
In Hadoop 1.0, application size is constrained by the JobTracker. The JT is a huge monolithic master. It keeps cluster-level metadata, task-level metadata, and application-specific metadata. You see things like counter limits for the same reason. TODO: put the formulae.
Todo: put animation. Because application management is distributed. Let's look at the other dimension: the number of concurrent tasks.
Todo: animation. Why does nobody run more than, say, 3k or 4k nodes under a JT? Because as the cluster size grows, utilization drops. The steeper the curve, the more you sacrifice on utilization. So we said utilization at 4k nodes is acceptable, and the cluster size should not grow beyond that. Let's see why it drops.
Task scheduling happens in the heartbeat. The JobTracker has a global lock, and heartbeats are processed synchronously. JT throughput is limited to, say, 200 heartbeats/sec. As the cluster size increases, the interval at which a TT sends heartbeats increases. The JT is not very concurrent; we need to design for better concurrency.
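
A rough worked example of this bottleneck, assuming every TaskTracker heartbeats at the same rate and the JT sustains at most R = 200 heartbeats/sec (the figure quoted in this note). N is the number of nodes; N = 4000 is an illustrative value taken from the 4k cluster size mentioned in the previous note.

```latex
T_{\min} = \frac{N}{R} = \frac{4000\ \text{heartbeats}}{200\ \text{heartbeats/s}} = 20\ \text{s}
```

Under these assumptions, each node can heartbeat, and so be offered new tasks, at most once every 20 seconds. Since scheduling happens in the heartbeat, a slot freed just after a heartbeat can sit idle for most of that interval, which is consistent with utilization dropping as the cluster grows.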
Same as in slides
Asynchronous processing of events. Each component encapsulates its state. Mutations happen only via events. Reads can happen directly.
In the Resource Manager, heartbeat processing is asynchronous, so it can handle a large number of heartbeats/sec. So what is the impact of this on utilization?
As the cluster size increases, the drop in utilization is much lower. A YARN cluster can have a large number of nodes within a single cluster.
Let's look at the state management aspects. State management is crucial in distributed systems, where there are a lot of moving parts.
This is the state transition picture for a Job. There are similar ones for the different entities: job, task, attempt. Task and attempt have even more nodes.
This is a very small snippet of JobTracker code. It is like this throughout. No one dares to touch it. Very, very fragile.
For this reason, YARN has a very lightweight state machine library.
All valid state transitions are declared upfront. Apart from the obvious benefits, one can now visualize, discuss, and argue about proposed changes to the state machine, which is not possible in Hadoop 1.0. I remember that we designed the first version of all the state transitions in a spreadsheet, in which we could see which valid transitions we were missing.
Let's look at the HA story.
Same as in slides
Since the work done by the Resource Manager is limited to cluster management and scheduling, it is much simpler to build HA into the RM than into the JT, which has a huge state.
There are several compute paradigms being built over YARN. This lists some of them. There are several others as well.
Same as slide. YARN is a general-purpose distributed compute platform.
