[CCC'21] Evaluation of Work Stealing Algorithms

Juan Sebastián Numpaque - Nicolás Cardozo
@ncardoz
{js.numpaque10, n.cardozo}@uniandes.edu.co
CCC’21 - 15th Congreso Colombiano de Computación (Colombian Conference on Computing) - November 22 to 26 - (Virtual)
Evaluation of Work Stealing Algorithms

Scheduling computation
[Figure: a computation DAG with nodes v1 to v15 and processors P1 to P4, shown under static and dynamic scheduling]

Work stealing
[Blumofe et al. Scheduling multithreaded computations by work stealing. 1995]
[Figure: the computation DAG with nodes v1 to v15, processors P1 to P4, and a task queue holding v3, v2, v1]
Idle processors steal tasks from processors with tasks in their queue
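The stealing step can be illustrated with a minimal sketch. It assumes one double-ended queue per processor and a uniformly random choice among the processors that still hold tasks; the function name `steal`, the task names, and the choice of queue end are illustrative assumptions, not details taken from the slides.

```python
import random
from collections import deque


def steal(queues, thief, rng):
    """Move one task from a random non-empty queue into the thief's queue.

    Returns (victim, task), or None when no other processor has work.
    """
    victims = [p for p, q in enumerate(queues) if p != thief and q]
    if not victims:
        return None
    victim = rng.choice(victims)
    # Which end of the victim's queue is used is a policy decision;
    # this sketch simply takes the element at index 0.
    task = queues[victim].popleft()
    queues[thief].append(task)
    return victim, task


# Hypothetical state: processor 0 holds three tasks, the others are idle.
queues = [deque(["t1", "t2", "t3"]), deque(), deque(), deque()]
result = steal(queues, thief=1, rng=random.Random(0))
```

With only one non-empty queue the victim is forced, so the idle processor 1 ends up holding the task that was at index 0 of processor 0's queue.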

Work stealing
Work stealing improves on dynamic scheduling with respect to:
• Automated work balancing
• Better portability
• Scalability with the number of processors

Work stealing algorithms are good,
but how good are they?

Work stealing
[Figure (animation): processors P1 to P4, each with its own queue (Queue P1 to Queue P4); tasks V1 to V5 are progressively moved from a queue, whose head is marked, to the processors; the final frame adds the labels LIFO and FIFO]

Work stealing algorithms
FIFO
• A task’s children are enqueued at the back of the queue of the processor that executed the parent task
• If the processor is idle, it takes the task at the queue’s head
• Tasks are stolen from another processor’s queue head
LIFO
• A task’s children are enqueued at the head of the queue of the processor that executed the parent task
• Tasks are stolen from the back of another processor’s queue
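The two queue disciplines can be sketched side by side. This is an illustrative sketch rather than the authors' implementation: the class and method names are hypothetical, index 0 of the deque stands for the queue's head, and the rule that a LIFO owner takes its own work from the head is an assumption, since the slide only states where children are enqueued and where tasks are stolen.

```python
from collections import deque


class FifoQueue:
    """FIFO policy: children go to the back; owner and thieves use the head."""

    def __init__(self):
        self.q = deque()

    def push_children(self, children):
        self.q.extend(children)  # enqueue at the back

    def take(self):
        return self.q.popleft() if self.q else None  # owner takes the head

    def steal(self):
        return self.q.popleft() if self.q else None  # thief also takes the head


class LifoQueue:
    """LIFO policy: children go to the head; thieves take from the back."""

    def __init__(self):
        self.q = deque()

    def push_children(self, children):
        for child in children:
            self.q.appendleft(child)  # enqueue at the head

    def take(self):
        # Assumption: the owner takes the most recently enqueued task.
        return self.q.popleft() if self.q else None

    def steal(self):
        return self.q.pop() if self.q else None  # thief takes the back


fifo, lifo = FifoQueue(), LifoQueue()
fifo.push_children(["a", "b", "c"])
lifo.push_children(["a", "b", "c"])
fifo_taken, fifo_stolen = fifo.take(), fifo.steal()
lifo_taken, lifo_stolen = lifo.take(), lifo.steal()
```

Under FIFO the owner and the thief compete for the same end of the queue, whereas under LIFO they work at opposite ends.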

Priority-based work stealing
Longest path over the computation nodes
[Figure (animation): the computation DAG with nodes v1 to v15; successive frames annotate node sequences such as v7 v8 v13 and v5 v7 v8 v13]
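One way to turn the longest-path idea into a priority is to assign each node the number of edges on its longest path to a sink. The sketch below assumes the DAG is given as a dictionary of successor lists; the function name and the five-node example graph are hypothetical, because the edges of the DAG in the figure cannot be recovered from the slides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def longest_path_priorities(successors):
    """Map each node to the edge count of its longest path to a sink."""
    # TopologicalSorter treats the mapped values as prerequisites, so passing
    # the successor lists yields every node after all of its successors.
    order = TopologicalSorter(successors).static_order()
    priority = {}
    for node in order:
        children = successors.get(node, ())
        # A sink has no children and gets priority 0.
        priority[node] = 1 + max((priority[c] for c in children), default=-1)
    return priority


# Hypothetical DAG: a -> b -> d -> e and a -> c -> e.
example = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": ["e"], "e": []}
priorities = longest_path_priorities(example)
```

Here `b` receives a higher priority than `c`, because the longest path from `b` to the sink is one edge longer.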

Tasks further away from the end node (v14) should take priority over tasks closer to the end of the computation
• A task’s children are enqueued at the back of the queue, ordered by priority
• Tasks are stolen from another processor’s queue head
[Figure: the computation DAG with nodes v1 to v15]
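A priority-ordered task queue along these lines could look like the following sketch. It reads "ordered by priority" as keeping the highest-priority task at the head, which is an interpretation rather than something the slide spells out; the class name, the insertion-order tie-break, and the example priorities are assumptions.

```python
import heapq
from itertools import count


class PriorityTaskQueue:
    """Queue whose head is always the task with the highest priority."""

    def __init__(self):
        self._heap = []
        self._tie = count()  # insertion order breaks ties between equal priorities

    def push_children(self, children, priority):
        for child in children:
            # heapq is a min-heap, so the priority is negated.
            heapq.heappush(self._heap, (-priority[child], next(self._tie), child))

    def pop_head(self):
        """Used by the owner when idle and by thieves when stealing."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


# Hypothetical priorities, for example longest-path distances to the end node.
priority = {"x": 1, "y": 3, "z": 2}
queue = PriorityTaskQueue()
queue.push_children(["x", "y", "z"], priority)
drained = [queue.pop_head() for _ in range(3)]
```

Whoever reaches the head first, owner or thief, receives the task that is furthest from the end of the computation.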

• The performance of the algorithm depends on the way tasks are chosen (avoid possible bottlenecks!)
• Classic algorithms are not fair

Evaluation
We evaluate the performance and fairness of existing work stealing algorithms and our proposed approach
1. Generate random computation DAGs
   the number of graph nodes varies in [50, 1600]
   the edge density varies in {0.2, 0.5, 0.8}
2. Scale the number of processors in the execution over [1, 96]
3. Execute all the tasks in the DAG using each algorithm
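Step 1 can be sketched as follows. The generator assumes that "density" is the probability of an edge between any ordered pair of nodes, which is one plausible reading and not necessarily the construction used in the evaluation; the function name, the seed parameter, and the variable names are hypothetical.

```python
import random


def random_dag(num_nodes, density, seed=None):
    """Return a random DAG as {node: [successors]} over nodes 0..num_nodes-1.

    Edges only go from a lower-numbered node to a higher-numbered one,
    which guarantees that the graph has no cycles.
    """
    rng = random.Random(seed)
    successors = {v: [] for v in range(num_nodes)}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < density:  # assumed meaning of "density"
                successors[u].append(v)
    return successors


# Parameter values taken from the evaluation setup and the plot axes.
node_counts = [50, 100, 200, 400, 800, 1600]
densities = [0.2, 0.5, 0.8]

dag = random_dag(node_counts[0], densities[1], seed=1)
edge_count = sum(len(s) for s in dag.values())
```

Fixing the seed makes a generated graph reproducible, so each algorithm can be run on exactly the same DAG.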

Performance results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: execution time in ms against the number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO at density = 0.2; four panels, with the extracted panel labels 8, 32, and 96 processors]

Performance results
[Figure: execution time in ms against the number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO at density = 0.5; four panels, with the extracted panel labels 8, 32, and 96 processors]

Performance results
[Figure: execution time in ms against the number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO at density = 0.8; four panels, with the extracted panel labels 8, 32, and 96 processors]

Fairness results
[Figure: load (number of tasks) per processor, for processors 1 to 8, with PRIO, FIFO, and LIFO; one panel each for densities 0.2, 0.5, and 0.8]

Fairness results
[Figure: load (number of tasks) per processor, for processors 1 to 32, with PRIO, FIFO, and LIFO; one panel each for densities 0.2, 0.5, and 0.8]

Conclusion
• FIFO falls short in both performance and balance at scale
• LIFO scales better than the other algorithms
• Priority performs well, but its performance can decay rapidly with many nodes; however, it presents the best balance
@ncardoz n.cardozo@uniandes.edu.co
https://flaglab.github.io

