A low-energy toolset for heterogeneous computing
LEGaTO Ambition
• Optimizing OmpSs to support energy-efficiency
• Working on RECS|Box hardware with CPU + GPU + FPGA + FPGA-based Dataflow Engines (DFE)
Create software-stack support for energy-efficient heterogeneous computing
Main goal: energy efficiency
• Fault tolerance
• Security
• Programmer productivity
Boosted Use-Cases
Smart City
7x gain in energy efficiency using FPGAs
Health Care
822x speedup using FPGAs, enabling new possibilities in biomarker analysis
Machine Learning
Up to 16x gain in energy efficiency and performance on the same hardware, using the EmbeDL optimizer
Smart Mirror
• Image: Object, face and gestures
• Speech with DeepSpeech
Displays personalized information
All detections run simultaneously
• Start: 16 FPS at 500 W
• Now: 25 FPS at 400 W
• Goal: 10 FPS at 50 W
Local machine learning frameworks
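The Smart Mirror figures can be compared on a single energy-efficiency metric by dividing throughput by power draw: FPS / W = (frames/s) / (J/s) = frames per joule. This metric is our interpretation for illustration and is not stated on the slide; the sketch below assumes the quoted wattages are average whole-system power while all detections run.

```python
# Energy efficiency as frames per joule: (frames/s) / (J/s) = FPS / W.
# Assumption: the slide's wattages are average system power during operation.

def frames_per_joule(fps: float, watts: float) -> float:
    """Return how many frames are processed per joule of energy."""
    return fps / watts

# (FPS, W) pairs as quoted on the Smart Mirror slide
stages = {"Start": (16, 500), "Now": (25, 400), "Goal": (10, 50)}

baseline = frames_per_joule(*stages["Start"])
for name, (fps, watts) in stages.items():
    eff = frames_per_joule(fps, watts)
    # Relative gain compared with the starting configuration
    print(f"{name}: {eff:.4f} frames/J ({eff / baseline:.2f}x vs. start)")
```

Under this metric, the current setup (0.0625 frames/J) is about 1.95x more efficient than the start (0.032 frames/J), and the 10 FPS at 50 W goal (0.2 frames/J) would be 6.25x the starting efficiency, even though it targets a lower frame rate.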
Cluster Server
• 56-63% less TCO over 5 years
• Heterogeneous, flexible and scalable microserver cluster
• Turnkey appliances optimised to relevant classes of applications
x86 and ARM CPU, GPU, FPGA
TCO reduced by >50% after 5 years
Reduced TCO with RECS®|Box Deneb
• Lower upgrade costs by replacing only the microservers in regular upgrade cycles
• Reduced data centre footprint through ultra-high density of computation power
• Lower energy costs through superior energy efficiency
LEGaTO Partners

Embedded World 2020
