Low Energy Task Scheduling based on Work Stealing

The LEGaTO project has received funding from the European Union’s Horizon 2020 research and innovation
programme under the grant agreement No 780681. www.legato-project.eu
Jing Chen
Chalmers University of Technology
Directed Acyclic Graph (DAG)
• A task-based way to express multithreaded applications.
• Nodes are tasks.
• Edges are dependencies.
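As a minimal sketch of the DAG model, the snippet below uses Python's standard `graphlib` module to list which tasks become ready once their dependencies finish. The task names (`copy`, `stencil`, `reduce`, `log`) are hypothetical and only illustrate the idea; they are not taken from the runtime described here.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
dependencies = {
    "copy": set(),
    "stencil": {"copy"},
    "log": {"copy"},
    "reduce": {"stencil"},
}

def ready_batches(deps):
    """Yield batches of tasks whose dependencies have all completed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # tasks that may run in parallel
        yield batch
        sorter.done(*batch)                 # mark the whole batch as finished

batches = list(ready_batches(dependencies))
print(batches)
```

Tasks in the same batch have no dependencies on each other, so a scheduler is free to run them on different cores.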
Asymmetric platforms feature
• High-performance, power-hungry cores
• Small, energy-efficient cores
Dynamic Task Scheduling
Work stealing: better scalability on larger systems and less
communication contention than a centralized scheme.
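The following single-threaded simulation is a rough sketch of random work stealing, not the runtime's implementation. Each worker owns a double-ended queue, takes its own work from one end, and when idle steals from the other end of a randomly chosen victim. The `Worker` class, `next_task` and `run` are hypothetical names, and for simplicity only non-empty victims are considered.

```python
import random
from collections import deque

class Worker:
    """A worker with its own task queue (hypothetical helper class)."""
    def __init__(self, wid):
        self.wid = wid
        self.queue = deque()

    def pop_local(self):
        # The owner takes the most recently pushed task.
        return self.queue.pop() if self.queue else None

    def steal_from(self, victim):
        # A thief takes the oldest task from the opposite end.
        return victim.queue.popleft() if victim.queue else None

def next_task(worker, workers, rng):
    task = worker.pop_local()
    if task is not None:
        return task
    # Simplification: only look at victims that currently hold work.
    victims = [w for w in workers if w is not worker and w.queue]
    if not victims:
        return None
    return worker.steal_from(rng.choice(victims))

def run(workers, rng):
    """Round-robin simulation; returns the tasks executed by each worker."""
    executed = {w.wid: [] for w in workers}
    progress = True
    while progress:
        progress = False
        for w in workers:
            task = next_task(w, workers, rng)
            if task is not None:
                executed[w.wid].append(task)
                progress = True
    return executed

rng = random.Random(0)
workers = [Worker(i) for i in range(4)]
for t in range(8):
    workers[0].queue.append(f"task{t}")  # all work starts on worker 0
executed = run(workers, rng)
print(executed)
```

Because there is no central queue, idle workers only touch another worker's queue when they run out of local work, which is where the reduced contention comes from.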
Performance Improvement NOT enough for energy reduction
DVFS: dynamic voltage and frequency scaling
• Users are usually not permitted to manipulate DVFS settings.
• Overhead: tens of μs to over one ms. Multithreading application: μs-level fine-grained tasks => NOT realistic to use DVFS per task
• State of the art: per-core DVFS => Significant hardware cost (inductors and capacitors) => Most systems only feature cluster-based DVFS
• State of the art: complete control of the platform => If other applications run on the same cluster => The energy of these applications is badly affected
Low Energy Runtime Design
Power Profiling
• Helps the runtime understand CPU power consumption trends
(number/type of cores, different frequencies)
• We evaluate two power profiling techniques:
(a) Directly sample power from the onboard power sensor, e.g. the
INA3221 on the NVIDIA Jetson TX2.
(b) Intel RAPL energy model: sample energy at fixed time intervals, then:
Power_{n+1} = (Energy_{n+1} - Energy_n) / (t_{n+1} - t_n)
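The energy-to-power conversion in (b) can be sketched as follows. The function name `power_from_energy` and the sample values are made up for illustration; the sketch assumes the energy readings are already in joules and ignores counter wraparound, which a real RAPL reader would have to handle.

```python
def power_from_energy(samples):
    """Convert (time_s, energy_J) samples into average power per interval.

    Implements Power_{n+1} = (Energy_{n+1} - Energy_n) / (t_{n+1} - t_n).
    """
    powers = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        powers.append((e1 - e0) / dt)
    return powers

# Made-up readings: cumulative energy sampled every 0.5 s.
samples = [(0.0, 100.0), (0.5, 110.0), (1.0, 125.0)]
powers = power_from_energy(samples)
print(powers)  # [20.0, 30.0] watts
```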
Dynamic Performance Modeling
• Provides accurate predictions for future tasks given a set of resources
• Independent of platforms and frequencies
• Achieves the scalability and portability goals
Idleness Tracing
• Provides information about the real-time status of cores
• Puts cores to "sleep" when they are under-utilized
• Sleep time follows an exponential backoff strategy
• Provides the real-time parallel slackness of active cores =>
calculation of the shared board static power attributed to each running task
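A possible shape for the exponential backoff sleep is sketched below. The class name `IdleBackoff` and the constants (100 μs base, doubling factor, 10 ms cap) are assumptions chosen for illustration and are not values from the runtime.

```python
class IdleBackoff:
    """Tracks how long an idle core should sleep (hypothetical helper)."""

    def __init__(self, base_us=100, factor=2, max_us=10_000):
        self.base_us = base_us    # first sleep duration, in microseconds
        self.factor = factor      # growth factor per consecutive idle poll
        self.max_us = max_us      # upper bound on a single sleep
        self.next_us = base_us

    def on_idle(self):
        """Return the sleep duration for this idle poll, then lengthen the next one."""
        sleep_us = self.next_us
        self.next_us = min(self.next_us * self.factor, self.max_us)
        return sleep_us

    def on_work_found(self):
        """Reset the backoff as soon as the core finds work again."""
        self.next_us = self.base_us

backoff = IdleBackoff()
sleeps = [backoff.on_idle() for _ in range(9)]
print(sleeps)
```

A worker loop would call `on_idle()` after each failed attempt to find or steal work and pass the result to a sleep call, then call `on_work_found()` once a task is obtained.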
Task Mapping Algorithm (Per task level)
For a given configuration (start core, number of cores):
• Performance Tracer => Execution Time Prediction
• Power Profiles => Dynamic Power Prediction
• Power Profiles + Idleness Tracer => Static Power Prediction
Energy Prediction = (Static Power + Dynamic Power) × Execution Time
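The per-task selection can be sketched as a search over candidate configurations for the lowest predicted energy. The `Config` type, the function names and all numeric predictions below are hypothetical; in the runtime the three inputs would come from the performance tracer, the power profiles and the idleness tracer.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    start_core: int
    num_cores: int

def predicted_energy(exec_time_s, dynamic_power_w, static_power_w):
    """Energy Prediction = (Static Power + Dynamic Power) x Execution Time."""
    return (static_power_w + dynamic_power_w) * exec_time_s

def pick_lowest_energy(predictions):
    """predictions maps Config -> (exec_time_s, dynamic_power_w, static_power_w)."""
    return min(predictions, key=lambda cfg: predicted_energy(*predictions[cfg]))

# Made-up predictions for one task on three candidate configurations.
predictions = {
    Config(0, 1): (8.0, 2.0, 1.0),  # (1.0 + 2.0) * 8.0 = 24.0 J
    Config(0, 2): (4.5, 3.5, 1.0),  # (1.0 + 3.5) * 4.5 = 20.25 J
    Config(0, 4): (3.0, 7.0, 1.0),  # (1.0 + 7.0) * 3.0 = 24.0 J
}
best = pick_lowest_energy(predictions)
print(best)
```

In this made-up example the fastest configuration (4 cores) is not the one with the lowest energy, which is the situation the energy prediction is meant to detect.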
Experimental Results
Name | Acronym | Notion
Random Work Stealing (+Sleep) | RWS (+S) | Typical greedy scheduling (enhanced with Sleep)
Fastest Cores with Criticality (+Sleep) | FCC (+S) | Critical tasks are mapped to the set of cores that minimizes execution time and are excluded from work stealing; noncritical tasks follow the parent queue and only search for the number of cores that minimizes the execution time of the task (enhanced with Sleep)
Lowest Cost with Criticality (+Sleep) | LCC (+S) | Differs from FCC in that it minimizes parallel cost instead of execution time, where parallel cost = execution time × number of cores (enhanced with Sleep)
Lowest Energy without Criticality | LENC | Task scheduling targets the lowest energy; no criticality awareness is needed
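To make the difference between the three objectives concrete, the sketch below applies them to the same set of candidate core counts. The numbers are invented for illustration and criticality handling is left out entirely; only the objective functions (execution time, parallel cost, energy) follow the definitions above.

```python
# Made-up predictions per core count: (execution time in s, total power in W).
candidates = {
    1: (10.0, 3.0),
    2: (6.0, 4.0),
    4: (4.0, 7.0),
}

def fastest(cands):
    """FCC-style objective: minimize execution time."""
    return min(cands, key=lambda n: cands[n][0])

def lowest_cost(cands):
    """LCC-style objective: minimize execution time * number of cores."""
    return min(cands, key=lambda n: cands[n][0] * n)

def lowest_energy(cands):
    """LENC-style objective: minimize power * execution time."""
    return min(cands, key=lambda n: cands[n][1] * cands[n][0])

choices = (fastest(candidates), lowest_cost(candidates), lowest_energy(candidates))
print(choices)  # (4, 1, 2)
```

With these invented inputs each objective picks a different core count, which shows why minimizing time or parallel cost does not automatically minimize energy.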
[Figure: 2D-Heat on one Haswell node, 1000 iterations, resolution=10240. Energy consumption [J] (x1000) for RWS, RWS+Sleep, FCC, FCC+Sleep, LCC, LCC+Sleep, LENC.]
[Figure: VGG-16 on NVIDIA Jetson TX2. Energy [J] for RWS, RWS+S, FCC, FCC+S, LCC, LCC+S, LENC at the frequency settings MAX&MAX, MAX&MIN, MIN&MAX, MIN&MIN.]
• MAX&MIN (x-axis) means that on the TX2 the Denver cluster frequency is
at its maximum and the A57 cluster frequency is at its minimum.
• LENC achieves the lowest energy, e.g. 31%-74% less energy than
RWS, 19%-68% less than FCC, and 25%-73% less than LCC.
• Haswell is a symmetric platform. 2D-Heat includes two kernels:
copy (memory-bound) and stencil (compute-bound).
• The sleep strategy reduces energy by 38% for RWS+S vs. RWS, 9%
for FCC+S vs. FCC, and 33% for LCC+S vs. LCC.
• LENC achieves low energy through task type awareness:
(a) Copy tasks choose number of cores = 5.
(b) Stencil tasks choose number of cores = 10.
Background
The importance of task feature awareness:
• Naive assignment causes a mismatch between task types and
core types, e.g. running compute-bound kernels on only the powerful
Denver cluster of the TX2 is more energy efficient than using all cores.
