Evaluation of Work Stealing Algorithms
Juan Sebastián Numpaque - Nicolás Cardozo
@ncardoz
{js.numpaque10, n.cardozo}@uniandes.edu.co
CCC’21 - 15 Congreso Colombiano de Computación (15th Colombian Computing Conference) - November 22 to 26 - (Virtual)
Scheduling computation
[Figure: a computation DAG with tasks v1-v15 scheduled onto processors P1-P4, shown once for static and once for dynamic scheduling]
Work stealing
[Blumofe et al. Scheduling multithreaded computations by work stealing. 1995]
Idle processors steal tasks from processors with tasks in their queue
[Figure: animation over the computation DAG (tasks v1-v15, processors P1-P4) with a task queue holding v3, v2, v1]
Work stealing improves on dynamic scheduling with respect to:
• Automated work balancing
• Better portability
• Scalability with the number of processors
Work stealing algorithms are good,
but how good are they?
Work stealing
[Figure: per-processor task queues (Queue P1 to Queue P4, each with a head) for processors P1-P4, populated step by step with tasks V1-V5; the last step contrasts LIFO and FIFO]
Work stealing algorithms
FIFO
• A task’s children are enqueued at the back of the queue in the processor that executed the parent task
• If the processor is idle, it takes the task at the queue’s head
• Tasks are stolen from the head of another processor’s queue
LIFO
• A task’s children are enqueued at the head of the queue in the processor that executed the parent task
• If the processor is idle, it takes the task at the queue’s head
• Tasks are stolen from the back of another processor’s queue
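The two queue disciplines can be sketched as a single per-processor queue with a policy switch. This is a minimal illustration, not the authors' implementation: the class `WorkerQueue`, its method names, and the use of `collections.deque` are assumptions made here. Following the usual meaning of the terms, enqueueing at the back and taking from the head is treated as FIFO, and enqueueing at the head and taking from the head as LIFO.

```python
from collections import deque


class WorkerQueue:
    """Hypothetical per-processor task queue; index 0 is the head, -1 the back."""

    def __init__(self, policy):
        if policy not in ("FIFO", "LIFO"):
            raise ValueError("policy must be 'FIFO' or 'LIFO'")
        self.policy = policy
        self.tasks = deque()

    def push_children(self, children):
        # FIFO: children go to the back; LIFO: children go to the head.
        for child in children:
            if self.policy == "FIFO":
                self.tasks.append(child)
            else:
                self.tasks.appendleft(child)

    def take(self):
        # The owning processor always takes the task at the head.
        return self.tasks.popleft() if self.tasks else None

    def steal(self):
        # A thief takes from the victim's head (FIFO) or back (LIFO).
        if not self.tasks:
            return None
        if self.policy == "FIFO":
            return self.tasks.popleft()
        return self.tasks.pop()


fifo = WorkerQueue("FIFO")
fifo.push_children(["v2", "v3", "v4"])
lifo = WorkerQueue("LIFO")
lifo.push_children(["v2", "v3", "v4"])
```

In this sketch the FIFO owner and thief compete for the same end of the queue, while the LIFO owner works on the newest task and a thief removes the oldest one from the opposite end.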
Priority-based work stealing
Longest path over the computation nodes
[Figure: computation DAG with tasks v1-v15, annotated step by step with node sequences through v7, v8, and v13]
Tasks farther from the end node (v14) should take priority over tasks closer to the end of the computation
• A task’s children are enqueued at the back of the queue, ordered by priority
• If the processor is idle, it takes the task at the queue’s head
• Tasks are stolen from the head of another processor’s queue
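One way to derive such priorities is to compute, for every task, the length of its longest path to an end node. The sketch below assumes the DAG is given as a dictionary from task to successor list and measures distance in edges; the function name `longest_path_priorities` and the toy graph are hypothetical and are not taken from the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path_priorities(successors):
    """Return {task: edges on the longest path from task to an end node}."""
    priority = {}
    # Passing successor lists as "dependencies" makes static_order() yield
    # every child before its parents, so child priorities are ready in time.
    for task in TopologicalSorter(successors).static_order():
        children = successors.get(task, ())
        priority[task] = 1 + max(priority[c] for c in children) if children else 0
    return priority


# Hypothetical toy DAG (not the graph from the slides).
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
priorities = longest_path_priorities(dag)
# Tasks farthest from the end node come first.
ordered = sorted(dag, key=priorities.get, reverse=True)
```

A queue kept in the order of `ordered` would hand out the tasks that are farthest from the end of the computation first.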
• Performance of the algorithm depends on the way tasks are chosen (avoid possible bottlenecks!)
• Classic algorithms are not fair
Evaluation
We evaluate the performance and fairness of existing work stealing algorithms and our proposed approach
1. Generate random computation DAGs
   • the number of graph nodes varies in [50, 1600]
   • the edge density varies in {0.2, 0.5, 0.8}
2. Scale the number of processors in the execution in [1, 96]
3. Execute all the tasks in the DAG using each algorithm
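Step 1 could look roughly like the following. The slides do not say how the graphs are generated, so this sketch makes two assumptions: density is read as the probability of an edge between any ordered pair of nodes, and acyclicity is guaranteed by only allowing edges from a lower to a higher node index. The name `random_dag` and the seed parameter are illustrative.

```python
import random


def random_dag(num_nodes, density, seed=None):
    """Return {node: [successors]} for a random DAG on nodes 0..num_nodes-1."""
    rng = random.Random(seed)
    successors = {node: [] for node in range(num_nodes)}
    # Edges only go from a lower index to a higher one, so no cycle can form.
    for src in range(num_nodes):
        for dst in range(src + 1, num_nodes):
            if rng.random() < density:
                successors[src].append(dst)
    return successors


# Node counts as they appear on the result plots; densities from the slide.
NODE_COUNTS = [50, 100, 200, 400, 800, 1600]
DENSITIES = [0.2, 0.5, 0.8]

# Build a small subset of the configurations to keep the example quick.
dags = {(n, d): random_dag(n, d, seed=0) for n in NODE_COUNTS[:2] for d in DENSITIES}
```

Fixing the seed makes each configuration reproducible, so every algorithm can be run on the same graph.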
Performance results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: execution time in ms against No. of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO at density = 0.2; panels labeled 8, 32, and 96 processors]
[Figure: same axes and series at density = 0.5; panels labeled 8, 32, and 96 processors]
[Figure: same axes and series at density = 0.8; panels labeled 8, 32, and 96 processors]
Fairness results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: load (No. of tasks) against No. of processors (1 to 8) for PRIO, FIFO, and LIFO; panels for 0.2, 0.5, and 0.8 density]
[Figure: load (No. of tasks) against No. of processors (1 to 32) for PRIO, FIFO, and LIFO; panels for 0.2, 0.5, and 0.8 density]
Conclusion
• FIFO falls short in both performance and balance at scale
• LIFO scales better than the other algorithms
• Priority performs well, but its performance can decay rapidly with many nodes; however, it presents the best balance
@ncardoz n.cardozo@uniandes.edu.co
https://flaglab.github.io
Questions?
