This work evaluates the two classic work-stealing algorithms (FIFO and LIFO) together with a newly proposed implementation in which task priority is calculated using the longest path as a metric.
[CCC'21] Evaluation of Work Stealing Algorithms
1. Juan Sebastián Numpaque - Nicolás Cardozo
@ncardoz
{js.numpaque10, n.cardozo}@uniandes.edu.co
CCC’21 - 15th Congreso Colombiano de Computación (Colombian Conference on Computing) - November 22 to 26 (virtual)
Evaluation of Work Stealing Algorithms
4. Work stealing
[Blumofe et al. Scheduling multithreaded computations by workstealing. 1995]
[Figure: computation DAG with tasks v1 to v15, executed by processors P1 to P4]
Idle processors steal tasks from processors with tasks in their queue
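The stealing step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `steal_if_idle`, the choice of the fullest queue as the victim, and stealing from the head of the victim's queue are all assumptions made for illustration.

```python
from collections import deque

def steal_if_idle(queues, idle):
    """Move one task from a victim's queue to the idle processor's queue.

    Returns the victim's index, or None if nothing was stolen.
    """
    if queues[idle]:
        return None  # not idle: the processor still has local work
    # Assumption: the victim is the processor with the most queued tasks.
    victim = max(range(len(queues)), key=lambda p: len(queues[p]))
    if not queues[victim]:
        return None  # every queue is empty, nothing to steal
    # Assumption: the task is taken from the head of the victim's queue.
    queues[idle].append(queues[victim].popleft())
    return victim

# Four processors; only the first one has tasks in its queue.
queues = [deque(["v1", "v2", "v3"]), deque(), deque(), deque()]
victim = steal_if_idle(queues, idle=1)
print(victim, [list(q) for q in queues])
# 0 [['v2', 'v3'], ['v1'], [], []]
```

Other victim-selection strategies (for example, picking a victim at random) fit the same structure by changing the line that computes `victim`.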
7. Work stealing
Work stealing improves on dynamic scheduling with respect to:
• Automated work balancing
• Better portability
• Scalability to the number of processors
9. Work stealing
[Blumofe et al. Scheduling multithreaded computations by workstealing. 1995]
[Figure, animated over slides 9 to 12: tasks V1 to V5 distributed across the queues of processors P1 to P4, with the queue head marked; the last frame carries the labels LIFO and FIFO]
13. Work stealing algorithms
FIFO
• A task's children are enqueued at the back of the queue in the processor that executed the parent task
• If the processor is idle, it takes the task at the queue's head
• Tasks are stolen from another processor's queue head
LIFO
• A task's children are enqueued at the head of the queue in the processor that executed the parent task
• If the processor is idle, it takes the task at the queue's head
• Tasks are stolen from the back of another processor's queue
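The two queue disciplines can be sketched in Python as follows. This is an illustrative sketch rather than the evaluated implementation; the class `WorkQueue` and its method names are hypothetical, and index 0 of the deque is taken to be the queue's head.

```python
from collections import deque

class WorkQueue:
    """One processor's task queue; index 0 of the deque is the head."""

    def __init__(self, policy):
        if policy not in ("FIFO", "LIFO"):
            raise ValueError("policy must be 'FIFO' or 'LIFO'")
        self.policy = policy
        self.tasks = deque()

    def enqueue_children(self, children):
        for child in children:
            if self.policy == "FIFO":
                self.tasks.append(child)      # FIFO: back of the queue
            else:
                self.tasks.appendleft(child)  # LIFO: head of the queue

    def take_local(self):
        # Both policies: the owner takes the task at the queue's head.
        return self.tasks.popleft() if self.tasks else None

    def steal(self):
        # Called by a thief on the victim's queue.
        if not self.tasks:
            return None
        if self.policy == "FIFO":
            return self.tasks.popleft()  # FIFO: steal from the head
        return self.tasks.pop()          # LIFO: steal from the back

fifo, lifo = WorkQueue("FIFO"), WorkQueue("LIFO")
for q in (fifo, lifo):
    q.enqueue_children(["v2", "v3", "v4"])
print(fifo.take_local(), fifo.steal())  # v2 v3
print(lifo.take_local(), lifo.steal())  # v4 v2
```

Under LIFO the owner and the thief work at opposite ends of the queue, whereas under FIFO both take tasks from the head.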
19. Priority-based work stealing
Tasks further away from the end node (v14) should take priority over tasks closer to the end of the computation
• A task's children are enqueued at the back of the queue ordered by priority
• If the processor is idle, it takes the task at the queue's head
• Tasks are stolen from another processor's queue head
[Figure: computation DAG with tasks v1 to v15]
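One way to compute such a priority is the length of the longest path from each task to the end of the computation. The Python sketch below (Python 3.9+ for `graphlib`) is an assumption-laden illustration, not the authors' code: the function `longest_path_to_end`, the small example DAG, and the use of a binary heap to keep the queue ordered are all hypothetical, and the DAG is assumed to have a single end node.

```python
import heapq
from graphlib import TopologicalSorter

def longest_path_to_end(successors):
    """Longest path length (in edges) from each task to a sink of the DAG."""
    # TopologicalSorter expects node -> dependencies. Passing the successor
    # lists instead yields every task after all of its successors.
    order = TopologicalSorter(successors).static_order()
    dist = {}
    for task in order:
        dist[task] = max(
            (1 + dist[child] for child in successors.get(task, ())),
            default=0,  # a task with no successors is an end node
        )
    return dist

# Hypothetical DAG (not the one on the slide): task -> list of children.
successors = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["end"],
    "d": ["end"],
}
priority = longest_path_to_end(successors)

# Keep the queue ordered by priority; heapq is a min-heap, so the
# priority is negated to place the farthest task at the head.
queue = []
for child in successors["a"]:
    heapq.heappush(queue, (-priority[child], child))
print(heapq.heappop(queue)[1])  # b
```

Here `b` is two edges away from the end node and `c` only one, so `b` is taken first.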
20. • Performance of the algorithm depends on the way tasks are chosen (avoid possible bottlenecks!)
• Classic algorithms are not fair
21. Evaluation
We evaluate the performance and fairness of existing work-stealing algorithms and our proposed approach
1. Generate random computation DAGs
the number of graph nodes varies in [50, 1600]
the edge density varies in {0.2, 0.5, 0.8}
2. Scale the number of processors in the execution over [1, 96]
3. Execute all the tasks in the DAG using each algorithm
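The first step can be sketched in Python as follows. The slides do not define how density is measured or how the DAGs are generated, so this sketch assumes density is the probability of including each possible forward edge; the function `random_dag` and its parameters are hypothetical.

```python
import random

def random_dag(n, density, seed=None):
    """Random DAG on nodes 0..n-1 as a dict: node -> list of successors.

    Assumption: each of the n*(n-1)/2 possible edges u -> v with u < v is
    included independently with probability `density`. Because edges only
    go from a lower to a higher index, the graph cannot contain a cycle.
    """
    rng = random.Random(seed)
    successors = {u: [] for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < density:
                successors[u].append(v)
    return successors

# Sweep a few of the configurations listed above.
for n in (50, 100):
    for density in (0.2, 0.5, 0.8):
        dag = random_dag(n, density, seed=0)
        edges = sum(len(children) for children in dag.values())
        print(n, density, edges)
```

Fixing the seed makes a configuration reproducible, so every algorithm can be run on the same graph.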
22. Performance results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: execution time in ms versus number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO; panels labeled 8, 32, and 96 processors; density = 0.2]
23. Performance results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: execution time in ms versus number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO; panels labeled 8, 32, and 96 processors; density = 0.5]
24. Performance results
https://flaglab.github.io/WorkStealingAlgorithms/
[Figure: execution time in ms versus number of DAG nodes (50, 100, 200, 400, 800, 1600) for PRIO, FIFO, and LIFO; panels labeled 8, 32, and 96 processors; density = 0.8]
27. Conclusion
• FIFO falls short in both performance and balance at scale
• LIFO scales better than the other algorithms
• Priority performs well, but its performance can decay rapidly with many nodes; however, it presents the best balance
@ncardoz n.cardozo@uniandes.edu.co
https://flaglab.github.io