1. The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under the grant agreement No 780681
LEGaTO: Low-Energy,
Heterogeneous Computing
Workshop
HiPEAC Autumn CSW 2020
Osman Unsal
16/October/2020
3. FPL 2020
The industry challenge
• Dennard scaling is dead
• Moore's law is slowing down
• Trend towards heterogeneous architectures
• Increasing focus on energy: ICT sector is responsible for 5%
of global energy consumption
• Creates a programmability challenge
4. FPL 2020
The future challenge of computing: MW, not FLOPS
“… without dramatic increases in
efficiency, ICT industry could use
20% of all electricity and emit up
to 5.5% of the world’s carbon
emissions by 2025.”
“We have a tsunami of data
approaching. Everything which
can be is being digitalised. It is a
perfect storm.”
“ … a single $1bn Apple data
centre planned for Athenry in Co
Galway, expects to eventually
use 300MW of electricity, or over
8% of the national capacity and
more than the daily entire usage
of Dublin. It will require 144
large diesel generators as back
up for when the wind does not
blow.”
5. FPL 2020
How did we get here?
Decades of exponential growth in performance
End of Dennard scaling
Moore’s Law is slowing down
Explore new architectures & models of computation
Exponential growth in demand & data
Move towards accelerators
6. FPL 2020
The End of Performance
• Frequency levels off, cores fill in the gap
http://doi.ieeecomputersociety.org/10.1109/MC.2015.368
7. FPL 2020
The real challenge: data movements
Pedram, Richardson, Galal, Kvatinsky, and Horowitz, “Dark Memory and Accelerator-Rich
System Optimization in the Dark Silicon Era”, April 2016, IEEE Design & Test
“As a result the most important optimization must be
done at the algorithm level, to reduce off-chip memory
accesses, to create Dark Memory. The algorithms must
first be (re)written for both locality and parallelism
before you tailor the hardware to accelerate them.”
In agreement with Intel:
“My hypothesis is that we can solve [the software crisis
in parallel computing], but only if we work from the
algorithm down to the hardware, not the traditional
hardware first mentality.” Tim Mattson, principal
engineer at Intel.
8. FPL 2020
FPGAs to the rescue?
• The model of computation is key
• Build ultra-deep, highly efficient pipelines
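The assembly-line idea behind deep pipelines can be sketched in a few lines of Python. This is only an illustration, not how an FPGA toolchain is programmed: the names `stage`, `build_pipeline`, `sequential_cycles` and `pipelined_cycles` are hypothetical, and the cycle model assumes one item enters the pipeline per cycle and every stage takes exactly one cycle.

```python
from typing import Callable, Iterable, Iterator, List


def stage(fn: Callable[[int], int], upstream: Iterable[int]) -> Iterator[int]:
    """One pipeline stage: apply fn to each item as it streams past."""
    for item in upstream:
        yield fn(item)


def build_pipeline(source: Iterable[int],
                   stages: List[Callable[[int], int]]) -> Iterator[int]:
    """Chain stages so each item flows through all of them in order."""
    stream: Iterator[int] = iter(source)
    for fn in stages:
        stream = stage(fn, stream)
    return stream


def sequential_cycles(n_items: int, depth: int) -> int:
    """Cycles if each item finishes all stages before the next one starts."""
    return n_items * depth


def pipelined_cycles(n_items: int, depth: int) -> int:
    """Cycles if a new item enters every cycle (assumed idealised model):
    depth cycles to fill the pipeline, then one result per cycle."""
    return depth + n_items - 1


# Hypothetical three-stage pipeline: double, add one, square.
results = list(build_pipeline(range(5),
                              [lambda x: x * 2, lambda x: x + 1, lambda x: x * x]))

# Under the idealised model, a 100-stage pipeline over 1000 items.
speedup = sequential_cycles(1000, 100) / pipelined_cycles(1000, 100)
```

Under these assumptions, the deeper the pipeline and the longer the data stream, the closer the speedup gets to the pipeline depth, which is why the model of computation matters more than raw clock frequency.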
9. FPL 2020
LEGaTO Ambition
• Create software-stack support for energy-efficient
heterogeneous computing
o Starting with a mature Made-in-Europe software stack, and optimizing
this stack to support energy efficiency
o Computing on a commercial, cutting-edge, European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
14. FPL 2020
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
needs intensive computation. A biomarker is
a measurable value that can indicate the
state of an organism, often the
presence, absence or severity of a specific
disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
user's behavior and anticipate future behavior
is still an open task, and is necessary to obtain
broad user acceptance of assisted living among
the general public
15. FPL 2020
Use Cases
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNNs for
automated driving systems, and CNN- and
LSTM-based methods for realistic
rendering of graphics for gaming and
multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actuators in
industrial and private environments
16. FPL 2020
Boosting Use Cases by Using the LEGaTO toolchain
• Smart City: 7x gain in energy efficiency using FPGAs
• Health Care: 822x speedup using FPGAs, enabling a new world of
biomarker analysis
• Machine Learning: up to 16x gain in energy efficiency and
performance on the same hardware, using the EmbeDL optimizer
17. FPL 2020
Partial List of LEGaTO Contributions – detailed today
• Task-based dataflow runtimes:
o OmpSs
o XiTAO
o MaxJ
o DFiant
• Fault Tolerant Interface
o Integrated API for CPU/GPU/FPGA checkpointing
• HEATS scheduler
• FPGA undervolting
• SCONE: Realizing the full potential of Intel SGX
• Secure task-based programming
• Malleability through node composition
• Smart mirror: 10X energy savings
o Using shared-memory programming style on distributed GPUs
• COM-HPC energy efficient small form factor standard
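To see why the FPGA undervolting item in the list above is attractive for energy efficiency, recall the standard first-order model of dynamic CMOS power, P = a · C · V² · f. The sketch below only evaluates that textbook formula; the function names and the example voltages are hypothetical, static (leakage) power is ignored, and none of the numbers are LEGaTO measurements.

```python
def dynamic_power(c_eff: float, voltage: float, freq_hz: float,
                  activity: float = 1.0) -> float:
    """First-order dynamic power model: P = activity * C * V^2 * f."""
    return activity * c_eff * voltage ** 2 * freq_hz


def undervolt_saving(v_nominal: float, v_reduced: float) -> float:
    """Fraction of dynamic power saved when only the supply voltage drops.

    Capacitance, frequency and activity are assumed unchanged, so they
    cancel out and the saving depends only on the voltage ratio squared.
    """
    return 1.0 - (v_reduced / v_nominal) ** 2


# Hypothetical example: lowering the supply from 1.0 V to 0.8 V.
saving = undervolt_saving(1.0, 0.8)
```

Because power scales with the square of the voltage, a 20% voltage reduction in this model removes roughly a third of the dynamic power at the same clock frequency, provided the circuit still operates correctly at the lower voltage.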
processing on massive scale will have a significant energy impact
MW will be new focus, not FLOPS
data centres need to reduce energy!
frequency scaling (or Dennard scaling) stopped a long time ago
they rely on more cores
seems to suggest parallelism is the answer
second issue has to do with data transfers
moving data off chip is extremely inefficient
even on chip has significant impact
end-game for computing is efficient data movements
for large scale compute, parallelism might not be the most efficient
assembly line model
not even a new idea, compute equivalent is dataflow
this view is Maxeler specific, but the solution is not
Maxeler makes it more explicit to model and develop your application this way
here focus is performance but low energy very related