Economic Scheduling of Hadoop Jobs


A presentation on the Dynamic Priority MapReduce Scheduler by Thomas Sandholm and Kevin Lai of HP Labs, Palo Alto. The scheduler is a contribution to Hadoop 0.21+.


Economic Scheduling of Hadoop Jobs
The Dynamic Priority MapReduce Scheduler

Thomas Sandholm, HP Labs, Palo Alto
Kevin Lai, HP Labs, Palo Alto


The Problem

» Allocate slots on compute nodes for job tasks
» Classic approach: throughput optimization
› Cross-user priorities inferred from heuristics
› Social scheduling
» Our approach: user value optimization
› Users are given an incentive to scale up or down
› Automate demand conflict resolution


Other Hadoop Schedulers

» FIFO
» HOD
» Fairshare
» Capacity

» Designed for no queues or a few static, fixed QoS queues
» Work well in corporate clusters


Dynamic Priority Scheduler Requirements

» Users may come and go frequently
» Users may be unknown to providers
» Users may want to schedule jobs across data centers and Hadoop installations

Manual, social scheduling of users (assumed to be cooperating) breaks down


Our Solution: Automated Resource Allocation

[Figure labels: Budget Remaining, Share, Spending Rate, Running Tasks, Pending Tasks]

Proportional-Share Scheduling

» $q_i = b_i / (b_i + p)$
» $p = \sum_{j \neq i} b_j$

» Huberman et al., Spawn ’92
» Waldspurger et al., Lottery Scheduling ’95
» Lai et al., Tycoon ’05
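
The formula above can be sketched in a few lines of Python. This is only an illustration, assuming that b_i is the spending rate a user bids and that p is the sum of all other users' bids; the function names, the user names, and the idea of multiplying the share by a slot count are assumptions made for the example. The numbers reuse the arithmetic from the Architecture slide, 4/(4+1.5+2)*15.

```python
from typing import Dict


def proportional_shares(bids: Dict[str, float]) -> Dict[str, float]:
    """Return q_i = b_i / (b_i + p) per user, where p is the sum of the other users' bids."""
    total = sum(bids.values())
    if total <= 0:
        # Nobody is bidding, so nobody gets a share.
        return {user: 0.0 for user in bids}
    shares = {}
    for user, bid in bids.items():
        others = total - bid  # p: what everyone else is bidding
        shares[user] = bid / (bid + others)
    return shares


def slots_for(bids: Dict[str, float], total_slots: int) -> Dict[str, float]:
    """Scale each share by the number of slots in the cluster (hypothetical helper)."""
    return {user: share * total_slots
            for user, share in proportional_shares(bids).items()}


# Hypothetical users bidding 4, 1.5 and 2 for a cluster with 15 slots.
bids = {"alice": 4.0, "bob": 1.5, "carol": 2.0}
allocation = slots_for(bids, total_slots=15)
print(allocation)
```

Because every share has the same denominator (the sum of all bids), the shares always add up to one, which is why a user can raise or lower a bid to scale up or down relative to everyone else.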


Key Design Principles

» Pay-per-use: the spending rate is deducted from the budget only if a job performed work
» Work-conserving: users are never charged more than their spending rates but can get more slots if other users are idle
» Preemptive: higher-spending users may cause tasks from lower-spending users to be killed
» Scalable: no memory or history-based fair-share smoothing
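
As a rough sketch of how the pay-per-use and work-conserving principles might interact, the Python below simulates a single allocation interval. It is not the scheduler's code: the `Queue` fields, the `allocate` function, the rounding policy, and the choice to charge the spending rate once per interval are all assumptions made for illustration, and preemption is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Queue:
    name: str
    budget: float         # remaining budget
    spending_rate: float  # amount the user bids per interval
    demand: int           # tasks wanting a slot this interval (hypothetical field)
    slots: int = 0        # slots granted in the current interval


def allocate(queues: List[Queue], total_slots: int) -> None:
    """Run one allocation interval over the given queues (illustrative only)."""
    for q in queues:
        q.slots = 0
    # Only queues with work to do and enough budget compete for slots.
    active = [q for q in queues
              if q.demand > 0 and q.spending_rate > 0 and q.budget >= q.spending_rate]
    free = total_slots
    # Work-conserving: slots a queue cannot use are re-split among the rest.
    while free > 0:
        hungry = [q for q in active if q.slots < q.demand]
        if not hungry:
            break
        total_rate = sum(q.spending_rate for q in hungry)
        granted = 0
        for q in hungry:
            share = int(free * q.spending_rate / total_rate)
            give = min(share, q.demand - q.slots, free - granted)
            q.slots += give
            granted += give
        if granted == 0:
            # Rounding left less than one slot each: highest spender goes first.
            for q in sorted(hungry, key=lambda h: -h.spending_rate):
                if granted == free:
                    break
                q.slots += 1
                granted += 1
        free -= granted
    # Pay-per-use: charge only queues that actually ran something, and never
    # more than their spending rate, even if they received extra slots.
    for q in queues:
        if q.slots > 0:
            q.budget -= q.spending_rate


queues = [Queue("a", 10.0, 4.0, 20), Queue("b", 10.0, 1.5, 0), Queue("c", 10.0, 2.0, 20)]
allocate(queues, total_slots=15)
for q in queues:
    print(q.name, q.slots, q.budget)
```

In this run queue "b" is idle, so it keeps its full budget while "a" and "c" split all 15 slots in proportion to their spending rates and are each charged only their own rate.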


Implementation

» Standalone Hadoop MapReduce JobTracker scheduler plugin
» HTTP/XML/REST servlet to provide secure management and monitoring of queues
» Generic queue allocation/accounting classes (could move into mapred core)
» Pluggable scheduler enforcing shares when scheduling jobs (could be replaced by capacity/fairshare enforcers)


Configuration

Option                                       Example
mapred.jobtracker.taskScheduler              org.apache.hadoop.mapred.DynamicPriorityScheduler
mapred.priority-scheduler.kill-interval      0
mapred.dynamic-scheduler.alloc-interval      20
mapred.dynamic-scheduler.budget-file         /etc/hadoop.budget
mapred.priority-scheduler.acl-file           /etc/hadoop.acl
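
The options above would presumably be set in a Hadoop XML configuration file using the usual `<configuration>`/`<property>` layout. The sketch below, which assumes nothing beyond the option names and example values on the slide, builds that XML with Python's standard library; the `build_configuration` helper is hypothetical, and which file the result belongs in (for example `mapred-site.xml`) is an assumption.

```python
import xml.etree.ElementTree as ET

# Option names and example values exactly as listed on the slide.
OPTIONS = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.DynamicPriorityScheduler",
    "mapred.priority-scheduler.kill-interval": "0",
    "mapred.dynamic-scheduler.alloc-interval": "20",
    "mapred.dynamic-scheduler.budget-file": "/etc/hadoop.budget",
    "mapred.priority-scheduler.acl-file": "/etc/hadoop.acl",
}


def build_configuration(options):
    """Build a <configuration> element with one <property> per option."""
    root = ET.Element("configuration")
    for name, value in options.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


# Serialize to text that could be pasted into the configuration file.
xml_text = ET.tostring(build_configuration(OPTIONS), encoding="unicode")
print(xml_text)
```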


Experiment

Fairshare vs Capacity vs FIFO vs DP
2-80 simulated users/queues
2 Clusters
PiEstimator Simulation


Budget Dynamics
[Figure panels: DynPrio preempt, DynPrio no preempt, FIFO scheduler, Capacity scheduler; annotations: "Funding runs out", "Budget replenished"]


Service Differentiation

[Figure panels: DynPrio, FIFO]


More info

» Papers
› SIGMETRICS 2009
› Workshop on Job Scheduling for Parallel Processing (JSSPP’10)
› International Conference on Cloud Computing and Virtualization (CCV’10)
» HADOOP-4768 JIRA
» Source: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/




Architecture

4/(4+1.5+2)*15 = 8


Dynamic Adjustment
