SlideShare a Scribd company logo
The LEGaTO project has received funding from the European Union's Horizon 2020 research
and innovation programme under the grant agreement No 780681
LEGaTO: Low-Energy, Secure,
and Resilient Toolset for
Heterogeneous Computing
DATE 2020
Valerio Schiavoni (Uni. Neuchatel)
11/March/2020
DATE 2020
The industry challenge
• Dennard scaling is dead
• Moore's law is slowing down
• Trend towards heterogeneous architectures
• Increasing focus on energy: ICT sector is responsible for
5% of global energy consumption
• Creates a programmability challenge
DATE 2020
LEGaTO Ambition
• Create software stack-support for energy-efficient
heterogeneous computing
o Starting with Made-in-Europe mature software stack, and
optimizing this stack to support energy-efficiency
o Computing on a commercial cutting-edge European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
DATE 2020
The future challenge of computing: MW, not FLOPS
4
“… without dramatic increases
in efficiency, ICT industry could
use 20% of all electricity and
emit up to 5.5% of the world’s
carbon emissions by 2025.”
“We have a tsunami of data
approaching. Everything which
can be is being digitalised. It is
a perfect storm.”
“ … a single $1bn Apple data
centre planned for Athenry in Co
Galway, expects to eventually
use 300MW of electricity, or
over 8% of the national capacity
and more than the daily entire
usage of Dublin. It will require
144 large diesel generators as
back up for when the wind does
not blow.”
DATE 2020
How did we get here?
5
Decades of exponential growth in performance
End of Dennard scaling
Moore’s Law is slowing down
Explore new architectures & models of computation
Exponential growth in demand & data
Move towards accelerators
DATE 2020
The End of Performance
• Frequency levels off, cores fill in the gap
http://doi.ieeecomputersociety.org/10.1109/MC.2015.368
DATE 2020
The real challenge: data movements
7
Pedram, Richardson, Galal, Kvatinsky, and Horowitz, “Dark Memory and Accelerator-
Rich System Optimization in the Dark Silicon Era”, April 2016, IEEE Design & Test
“As a result the most important optimization must be
done at the algorithm level, to reduce off- chip memory
accesses, to create Dark Memory. The algorithms must
first be (re)written for both locality and parallelism
before you tailor the hardware to accelerate them.”
“My hypothesis is that we can solve [the software crisis
in parallel computing], but only if we work from the
algorithm down to the hardware — not the traditional
hardware first mentality.” Tim Mattson, principal
engineer at Intel.
In agreement with Intel:
DATE 2020
FPGAs to the rescue?
• The model of computation is key
• Build ultra-deep, highly efficient pipelines
8
DATE 2020
LEGaTO Objectives
19.03.20 9
DATE 2020
Partners
DATE 2020
Overview
The LEGaTO Compute Stack
DATE 2020
Hardware Platforms
• Christmann (RECS Box)
o Heterogeneous platform: Supports CPU (X86 or Arm), GPU
(Nvidia) and FPGA (Xillinx and Altera)
• MAXELER
o Maxeler DFE platforms and toolset
• Heterogeneous microserver approach, integrating CPU, GPU and FPGA
technology
• High-speed, low latency communication infrastructure,
enabling accelerators, scalable across multiple servers
• Modular, blade-style design, hot pluggable/swappable
19/03/2020 14
RECS|Box overview
Concept
19/03/2020 15
• Dedicated network for
monitoring, control and
iKVM
• Multiple 1Gb/10Gb
Ethernet links per
Microserver – integrated 40
Gb Ethernet backbone
• Dedicated high-speed, low-
latency communication
infrastructure
• Host-2-Host PCIe between
CPUs
• Switched high-speed
serial lanes between
FPGAs
• Enables pooling of
accelerators
and assignment to
different hosts
Baseboard
Baseboard
Baseboard
3/16
Microserver
s
per
Baseboard
Microserver
Module
CPU Mem I/O
Microserver
Module
Microserver
Module
Backplane
Up to 15 Baseboards per
Server
Distributed
Monitoring
and KVM
Distributed
Monitoring
and KVM
Ethernet
Communication
Infrastructure
Ethernet
Communication
Infrastructure
High-speed
Low-latency
Communication
Storage /
I/O-Ext.
High-speed
Low-latency
Communication
RECS|Box overview
Architecture
CPU Mem I/O CPU Mem I/O
19/03/2020 16
RECS|Box overview
Microserver
Intel x86
Arm Server SoCs
Xilinx FPGAs
Intel FPGAs
Low-Power
Microserver
High-Performance
Microserver
CPU Microserver FPGA MicroserverGPGPU
NVIDIA Jetson TX1/TX2
4 Core A57@1.73 GHz + Maxwell GPGPU@1.5 GHz
2 Core Denver + 4 Core A57 + Pascal GPGPU@1.5 GHz
ARMv8 Server SoC
32 Core A72@2.1 GHz
NVIDIA Tesla P100/V100
Pascal/Volta GPGPU@1.3 GHz Intel Stratix 10 SoC
2,800 kLE
Xilinx Zynq 7020
85 kLC
PCIe-Extensions
PCIe SSD
Xeon Phi
RAPTOR
FPGA acc.
DATE 2020
Maxeler Dataflow Computing Systems
MPC-X3000
• 8 Dataflow Engines in 1U
• Up to 1TB of DFE RAM
• Zero-copy RDMA
• Dynamic allocation of DFEs to
conventional CPU servers through
Infiniband
• Equivalent performance to
20-50 x86 servers
DATE 2020
Dataflow System at Jülich Supercomputing Center
• Pilot system deployed in October 2017 as part of PRACE PCP
• System configuration
o 1U Maxeler MPC-X with 8 MAX5 DFEs
o 1U AMD EPYC server
• 2x AMD 7601 EPYC CPUs
• 1 TB RAM
• 9.6 TB SSD (data storage)
o one 1U login head node
• Runs 4 scientific workloads
18
MPC-X node
Remote users
MAX5 DFE EPYC CPU
1TB DDR4
Head/Build node
ipmi
56 Gbps 2x Infiniband @ 56 Gbps
10 Gbps
10 Gbps
Supermicro EPYC node
http://www.prace-ri.eu/pcp/
DATE 2020
Scaling Dataflow Computing into the Cloud
• MAX5 DFEs are compatible with Amazon EC2 F1
instances
• MaxCompiler supports programming F1
• Hybrid Cloud Model:
o On-premise Dataflow system from Maxeler
o Elastically expand to the Amazon Cloud when
needed
19
On-premise
Cloud
DATE 2020
Programming Models and Runtimes:
Task based programming + Dataflow!
• OmpSs / OmpSs@FPGA (BSC)
• MaxCompiler (Maxeler)
• DFiant HLS (TECHNION)
• XiTAO (CHALMERS)
19 March, 2020 20
DATE 2020
21
Towards a single source for any target
• A need if we expect programmers to survive
o Architectures appear like mushrooms
• Productivity
o Develop once → run everywhere
• Performance
• Key concept behind OmpSs
o Sequential task based program on single address/name space
o + directionality annotations
o Executed in parallel: Automatic runtime computation of dependences among
tasks
o LEGaTO: Extend tasks with resource requirements, propagate through the
stack to find the most energy efficient solution at run time
22
Example for OmpSs@FPGA (Matrix multiply with
OmpSs and Vivado HLS annotations)
DATE 2020
23
AXIOM board:
Xilinx Zynq Ultrascale+ chip,
with 4 ARM Cortex-A53 cores,
and the ZU9EG FPGA.
IPs configuration 1*256, 3*128 Number of instances * size
Frequency (MHz) 200, 250, 300 Working frequency of the FPGA
Number of SMP cores SMP: 1 to 4
FPGA: 3+1 helper, 2+2 helpers
Combination of SMP and helper threads
Number of FPGA helper threads SMP: 0; FPGA: 1, 2 Helper threads are used to manage tasks on
the FPGA
Number of pending tasks 4, 8, 16 and 32 Number of tasks sent to the IP cores before
waiting for their finalization
OmpSs@FPGA Experiments (Matrix multiply)
DATE 2020
DFiant HLS
• Aims to bridge the programmability gap by combining constructs and
semantics from software, hardware and dataflow languages
• Programming model accommodates a middle-ground between low-
level HDL and high-level sequential programming
High-Level Synthesis
Languages and Tools
(e.g., C and Vivado HLS)
Register-Transfer Level
HDLs
(e.g., VHDL)
DFiant: A Dataflow HDL
 Automatic pipelining
 Not an HDL
 Problem with state
 Separating timing
from functionality
 Concurrency
 Fine-grain control
 Automatic pipelining Concurrency
 Fine-grain control
 Bound to clock
 Explicit pipelining
DATE 2020
PCI
Express
Manager
DFE
Memory
MyManager (.maxj)
x
x
+
30
x
Manager m = new Manager();
Kernel k =
new MyKernel();
m.setKernel(k);
m.setIO(
link(“x", CPU),
m.build();
link(“y", CPU));
MyKernel(DATA_SIZE,
x, DATA_SIZE*4,
y, DATA_SIZE*4);
for (int i =0; i < DATA_SIZE; i++)
y[i]= x[i] * x[i] + 30;
Main
Memory
CPU
CPU
Code
Host Code (.c)
MaxCompiler: Application Development
25
SLiC
MaxelerOS
DFEVar x = io.input("x", dfeInt(32));
DFEVar result = x * x + 30;
io.output("y", result, dfeInt(32));
MyKernel (.maxj)
int*x, *y;
y
x
x
+
30
y
x
Task-based kernel identification/DFE mapping
OmpSs identifies “static” task graphs while running and DFE kilo-task instance is
selected
Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
DataFlow Engines
(DFEs) produce
one pair of results
at each clock cycle
XiTAO: Generalized Tasks
Reduces Overheads
Reduces Parallel Slackness
Overcommits memory resources
 Improve Parallel Slackness by
reducing dimensionality
 Reduce Cache/BW pressure by
allocating multiple resources
OmpSs + XiTao
19.03.20 28
Extend the XiTAO with
support for heterogeneous
resources
Why extremely low-voltage for FPGAs?
Why Energy efficiency through Undervolting?
• Still way to go for <20MW@exascale
• FPGAs are at least 20X less power- and energy-efficient than ASICs.
• Undervolting directly delivers dynamic and static power/energy reduction.
• Note: undervolting is not DVFS
• Frequency is not lowered!
FPGA Undervolting Preliminary Experiments
Voltage scaling below standard nominal level, in on-chip
memories of Xilinx technology:
o Power quadratically decreases, deliver savings of more than 10X.
o A conservative voltage guardband exists below nominal.
o Fault exponentially increases, below the guardband.
o Faults can be detected / managed.
DATE 2020
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
needs intensive computation. A biomarker is
a measurable value that can indicate the
state of an organism, and is often the
presence, absence or severity of a specific
disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
users behavior and anticipate future behavior
is still an open task and necessary to obtain
a broad user acceptance of assisted living in
the general public
DATE 2020
Use Cases
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNN networks for
automated driving systems and CNN-
and LSTM-based methods for realistic
rendering of graphics for gaming and
multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actors in an
industrial and private surrounding
DATE 2020
Smart City
7x gain in energy efficiency
using FPGAs
Health Care
822x speedup using
FPGAs, enabling a new
world of Biomarker
analysis
Machine Learning
up to 16x gain in energy
efficiency and
performance on the same
hardware, using the
EmbeDL optimizer
Boosting Use Cases by Using LEGaTO toolchain
DATE 2020
o Image: Object, face and gestures
o Speech with DeepSpeech
Displays personalized
information
All detections simultaniously
• Start: 16 FPS at 500 W
• Now: 25 FPS at 400 W
• Goal: 10 FPS at 50 W
Local machine learning
frameworks
Smart Mirror Use Case
Questions?
https://legato-project.eu/
legato-project@bsc.es

More Related Content

What's hot

Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
inside-BigData.com
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
Edge AI and Vision Alliance
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
Edge AI and Vision Alliance
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
Putchong Uthayopas
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
Edge AI and Vision Alliance
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlap
inside-BigData.com
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
Ganesan Narayanasamy
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
RCCSRENKEI
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
Volodymyr Saviak
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
Edge AI and Vision Alliance
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
Ganesan Narayanasamy
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
Ganesan Narayanasamy
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
Edge AI and Vision Alliance
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
NVIDIA Taiwan
 
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen HuangAi Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
NVIDIA Taiwan
 
SGI HPC Update for June 2013
SGI HPC Update for June 2013SGI HPC Update for June 2013
SGI HPC Update for June 2013
inside-BigData.com
 
04 New opportunities in photon science with high-speed X-ray imaging detecto...
04 New opportunities in photon science with high-speed X-ray imaging  detecto...04 New opportunities in photon science with high-speed X-ray imaging  detecto...
04 New opportunities in photon science with high-speed X-ray imaging detecto...
RCCSRENKEI
 

What's hot (20)

Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlap
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen HuangAi Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
 
SGI HPC Update for June 2013
SGI HPC Update for June 2013SGI HPC Update for June 2013
SGI HPC Update for June 2013
 
04 New opportunities in photon science with high-speed X-ray imaging detecto...
04 New opportunities in photon science with high-speed X-ray imaging  detecto...04 New opportunities in photon science with high-speed X-ray imaging  detecto...
04 New opportunities in photon science with high-speed X-ray imaging detecto...
 

Similar to DATE 2020: Design, Automation and Test in Europe Conference

SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
LEGATO project
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
LEGATO project
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
LEGATO project
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
Alison B. Lowndes
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
VigneshwarRamaswamy
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
Wilhelm van Belkum
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Lablup Inc.
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
CastLabKAIST
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT Project
 
Dell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation WebinarDell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation Webinar
Bill Wong
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
LEGATO project
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
Edge AI and Vision Alliance
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
inside-BigData.com
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
Shinnosuke Furuya
 
uCluster
uClusteruCluster
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
mustafa sarac
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
Ganesan Narayanasamy
 

Similar to DATE 2020: Design, Automation and Test in Europe Conference (20)

SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
Dell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation WebinarDell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation Webinar
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
uCluster
uClusteruCluster
uCluster
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
 

More from LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
LEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
LEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
LEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
LEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
LEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
LEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGATO project
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
LEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
LEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
LEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
LEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
LEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
LEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
LEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
LEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
LEGATO project
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
LEGATO project
 

More from LEGATO project (20)

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
 

Recently uploaded

atom, elements, molecule and compounds in telugu.pptx
atom, elements, molecule and compounds in telugu.pptxatom, elements, molecule and compounds in telugu.pptx
atom, elements, molecule and compounds in telugu.pptx
ManjulaVani3
 
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
Faculty of Applied Chemistry and Materials Science
 
MARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
MARIGREEN PROJECT - overview, Oana Cristina PârvulescuMARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
MARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
Faculty of Applied Chemistry and Materials Science
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
J. Bovas Joel BFSc
 
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
Thane Heins
 
Adjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyerAdjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyer
pablovgd
 
Introduction to Space (Our Solar System)
Introduction to Space (Our Solar System)Introduction to Space (Our Solar System)
Introduction to Space (Our Solar System)
vanshgarg8002
 
Speed-accuracy trade-off for the diffusion models
Speed-accuracy trade-off for the diffusion modelsSpeed-accuracy trade-off for the diffusion models
Speed-accuracy trade-off for the diffusion models
sosukeito
 
Rapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd SannanRapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd Sannan
Faculty of Applied Chemistry and Materials Science
 
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
Faculty of Applied Chemistry and Materials Science
 
Ancient Theory, Abiogenesis , Biogenesis
Ancient Theory, Abiogenesis , BiogenesisAncient Theory, Abiogenesis , Biogenesis
Ancient Theory, Abiogenesis , Biogenesis
SoniaBajaj10
 
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen BergstedtFish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Faculty of Applied Chemistry and Materials Science
 
GENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
GENE THERAPY [Autosaved].pptx A brief explanation about Gene TherapyGENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
GENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
shubhamve111yadav
 
Structure of Sperm / Spermatozoon .pdf
Structure of  Sperm / Spermatozoon  .pdfStructure of  Sperm / Spermatozoon  .pdf
Structure of Sperm / Spermatozoon .pdf
SELF-EXPLANATORY
 
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
Faculty of Applied Chemistry and Materials Science
 
Synopsis: Analysis of a Metallic Specimen
Synopsis: Analysis of a Metallic SpecimenSynopsis: Analysis of a Metallic Specimen
Synopsis: Analysis of a Metallic Specimen
Sérgio Sacani
 
Traditional, current and future use of fish and seaweed for fertilisation - ...
Traditional, current and future use of fish and seaweed for fertilisation -  ...Traditional, current and future use of fish and seaweed for fertilisation -  ...
Traditional, current and future use of fish and seaweed for fertilisation - ...
Faculty of Applied Chemistry and Materials Science
 
Surface properties of the seas of Titan as revealed by Cassini mission bistat...
Surface properties of the seas of Titan as revealed by Cassini mission bistat...Surface properties of the seas of Titan as revealed by Cassini mission bistat...
Surface properties of the seas of Titan as revealed by Cassini mission bistat...
Sérgio Sacani
 
Post RN - Biochemistry (Unit 1) Basic concept of Chemistry
Post RN - Biochemistry (Unit 1) Basic concept of ChemistryPost RN - Biochemistry (Unit 1) Basic concept of Chemistry
Post RN - Biochemistry (Unit 1) Basic concept of Chemistry
Areesha Ahmad
 

Recently uploaded (20)

atom, elements, molecule and compounds in telugu.pptx
atom, elements, molecule and compounds in telugu.pptxatom, elements, molecule and compounds in telugu.pptx
atom, elements, molecule and compounds in telugu.pptx
 
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
End of pipe treatment: Unlocking the potential of RAS waste - Carlos Octavio ...
 
MARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
MARIGREEN PROJECT - overview, Oana Cristina PârvulescuMARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
MARIGREEN PROJECT - overview, Oana Cristina Pârvulescu
 
Potential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptxPotential of Marine renewable and Non renewable energy.pptx
Potential of Marine renewable and Non renewable energy.pptx
 
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
PART 1 & PART 2 The New Natural Principles of Newtonian Mechanics, Electromec...
 
Adjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyerAdjusted NuGOweek 2024 Ghent programme flyer
Adjusted NuGOweek 2024 Ghent programme flyer
 
Introduction to Space (Our Solar System)
Introduction to Space (Our Solar System)Introduction to Space (Our Solar System)
Introduction to Space (Our Solar System)
 
Speed-accuracy trade-off for the diffusion models
Speed-accuracy trade-off for the diffusion modelsSpeed-accuracy trade-off for the diffusion models
Speed-accuracy trade-off for the diffusion models
 
Rapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd SannanRapid pulse drying of marine biomasses - Sigurd Sannan
Rapid pulse drying of marine biomasses - Sigurd Sannan
 
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
AlgaeBrew project - Unlocking the potential of microalgae for the valorisatio...
 
Ancient Theory, Abiogenesis , Biogenesis
Ancient Theory, Abiogenesis , BiogenesisAncient Theory, Abiogenesis , Biogenesis
Ancient Theory, Abiogenesis , Biogenesis
 
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen BergstedtFish in the Loop: Exploring RAS - Julie Hansen Bergstedt
Fish in the Loop: Exploring RAS - Julie Hansen Bergstedt
 
GENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
GENE THERAPY [Autosaved].pptx A brief explanation about Gene TherapyGENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
GENE THERAPY [Autosaved].pptx A brief explanation about Gene Therapy
 
Structure of Sperm / Spermatozoon .pdf
Structure of  Sperm / Spermatozoon  .pdfStructure of  Sperm / Spermatozoon  .pdf
Structure of Sperm / Spermatozoon .pdf
 
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
Accessing Data to Support Pesticide Residue and Emerging Contaminant Analysis...
 
Analytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina BujorAnalytical methods for blue residues characterization - Oana Crina Bujor
Analytical methods for blue residues characterization - Oana Crina Bujor
 
Synopsis: Analysis of a Metallic Specimen
Synopsis: Analysis of a Metallic SpecimenSynopsis: Analysis of a Metallic Specimen
Synopsis: Analysis of a Metallic Specimen
 
Traditional, current and future use of fish and seaweed for fertilisation - ...
Traditional, current and future use of fish and seaweed for fertilisation -  ...Traditional, current and future use of fish and seaweed for fertilisation -  ...
Traditional, current and future use of fish and seaweed for fertilisation - ...
 
Surface properties of the seas of Titan as revealed by Cassini mission bistat...
Surface properties of the seas of Titan as revealed by Cassini mission bistat...Surface properties of the seas of Titan as revealed by Cassini mission bistat...
Surface properties of the seas of Titan as revealed by Cassini mission bistat...
 
Post RN - Biochemistry (Unit 1) Basic concept of Chemistry
Post RN - Biochemistry (Unit 1) Basic concept of ChemistryPost RN - Biochemistry (Unit 1) Basic concept of Chemistry
Post RN - Biochemistry (Unit 1) Basic concept of Chemistry
 

DATE 2020: Design, Automation and Test in Europe Conference

  • 1. The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681 LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing DATE 2020 Valerio Schiavoni (Uni. Neuchatel) 11/March/2020
  • 2. DATE 2020 The industry challenge • Dennard scaling is dead • Moore's law is slowing down • Trend towards heterogeneous architectures • Increasing focus on energy: ICT sector is responsible for 5% of global energy consumption • Creates a programmability challenge
  • 3. DATE 2020 LEGaTO Ambition • Create software stack-support for energy-efficient heterogeneous computing o Starting with Made-in-Europe mature software stack, and optimizing this stack to support energy-efficiency o Computing on a commercial cutting-edge European-developed heterogeneous hardware substrate with CPU + GPU + FPGA + FPGA-based Dataflow Engines (DFE) • Main goal: energy efficiency
  • 4. DATE 2020 The future challenge of computing: MW, not FLOPS 4 “… without dramatic increases in efficiency, ICT industry could use 20% of all electricity and emit up to 5.5% of the world’s carbon emissions by 2025.” “We have a tsunami of data approaching. Everything which can be is being digitalised. It is a perfect storm.” “ … a single $1bn Apple data centre planned for Athenry in Co Galway, expects to eventually use 300MW of electricity, or over 8% of the national capacity and more than the daily entire usage of Dublin. It will require 144 large diesel generators as back up for when the wind does not blow.”
  • 5. DATE 2020 How did we get here? 5 Decades of exponential growth in performance End of Dennard scaling Moore’s Law is slowing down Explore new architectures & models of computation Exponential growth in demand & data Move towards accelerators
  • 6. DATE 2020 The End of Performance • Frequency levels off, cores fill in the gap http://doi.ieeecomputersociety.org/10.1109/MC.2015.368
  • 7. DATE 2020 The real challenge: data movements 7 Pedram, Richardson, Galal, Kvatinsky, and Horowitz, “Dark Memory and Accelerator- Rich System Optimization in the Dark Silicon Era”, April 2016, IEEE Design & Test “As a result the most important optimization must be done at the algorithm level, to reduce off- chip memory accesses, to create Dark Memory. The algorithms must first be (re)written for both locality and parallelism before you tailor the hardware to accelerate them.” “My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware — not the traditional hardware first mentality.” Tim Mattson, principal engineer at Intel. In agreement with Intel:
  • 8. DATE 2020 FPGAs to the rescue? • The model of computation is key • Build ultra-deep, highly efficient pipelines 8
  • 13. DATE 2020 Hardware Platforms • Christmann (RECS Box) o Heterogeneous platform: Supports CPU (X86 or Arm), GPU (Nvidia) and FPGA (Xillinx and Altera) • MAXELER o Maxeler DFE platforms and toolset
  • 14. • Heterogeneous microserver approach, integrating CPU, GPU and FPGA technology • High-speed, low latency communication infrastructure, enabling accelerators, scalable across multiple servers • Modular, blade-style design, hot pluggable/swappable 19/03/2020 14 RECS|Box overview Concept
  • 15. 19/03/2020 15 • Dedicated network for monitoring, control and iKVM • Multiple 1Gb/10Gb Ethernet links per Microserver – integrated 40 Gb Ethernet backbone • Dedicated high-speed, low- latency communication infrastructure • Host-2-Host PCIe between CPUs • Switched high-speed serial lanes between FPGAs • Enables pooling of accelerators and assignment to different hosts Baseboard Baseboard Baseboard 3/16 Microserver s per Baseboard Microserver Module CPU Mem I/O Microserver Module Microserver Module Backplane Up to 15 Baseboards per Server Distributed Monitoring and KVM Distributed Monitoring and KVM Ethernet Communication Infrastructure Ethernet Communication Infrastructure High-speed Low-latency Communication Storage / I/O-Ext. High-speed Low-latency Communication RECS|Box overview Architecture CPU Mem I/O CPU Mem I/O
  • 16. 19/03/2020 16 RECS|Box overview Microserver Intel x86 Arm Server SoCs Xilinx FPGAs Intel FPGAs Low-Power Microserver High-Performance Microserver CPU Microserver FPGA MicroserverGPGPU NVIDIA Jetson TX1/TX2 4 Core A57@1.73 GHz + Maxwell GPGPU@1.5 GHz 2 Core Denver + 4 Core A57 + Pascal GPGPU@1.5 GHz ARMv8 Server SoC 32 Core A72@2.1 GHz NVIDIA Tesla P100/V100 Pascal/Volta GPGPU@1.3 GHz Intel Stratix 10 SoC 2,800 kLE Xilinx Zynq 7020 85 kLC PCIe-Extensions PCIe SSD Xeon Phi RAPTOR FPGA acc.
  • 17. DATE 2020 Maxeler Dataflow Computing Systems MPC-X3000 • 8 Dataflow Engines in 1U • Up to 1TB of DFE RAM • Zero-copy RDMA • Dynamic allocation of DFEs to conventional CPU servers through Infiniband • Equivalent performance to 20-50 x86 servers
  • 18. DATE 2020 Dataflow System at Jülich Supercomputing Center • Pilot system deployed in October 2017 as part of PRACE PCP • System configuration o 1U Maxeler MPC-X with 8 MAX5 DFEs o 1U AMD EPYC server • 2x AMD 7601 EPYC CPUs • 1 TB RAM • 9.6 TB SSD (data storage) o one 1U login head node • Runs 4 scientific workloads 18 MPC-X node Remote users MAX5 DFE EPYC CPU 1TB DDR4 Head/Build node ipmi 56 Gbps 2x Infiniband @ 56 Gbps 10 Gbps 10 Gbps Supermicro EPYC node http://www.prace-ri.eu/pcp/
  • 19. DATE 2020 Scaling Dataflow Computing into the Cloud • MAX5 DFEs are compatible with Amazon EC2 F1 instances • MaxCompiler supports programming F1 • Hybrid Cloud Model: o On-premise Dataflow system from Maxeler o Elastically expand to the Amazon Cloud when needed 19 On-premise Cloud
  • 20. DATE 2020 Programming Models and Runtimes: Task based programming + Dataflow! • OmpSs / OmpSs@FPGA (BSC) • MaxCompiler (Maxeler) • DFiant HLS (TECHNION) • XiTAO (CHALMERS) 19 March, 2020 20
  • 21. DATE 2020 21 Towards a single source for any target • A need if we expect programmers to survive o Architectures appear like mushrooms • Productivity o Develop once → run everywhere • Performance • Key concept behind OmpSs o Sequential task based program on single address/name space o + directionality annotations o Executed in parallel: Automatic runtime computation of dependences among tasks o LEGaTO: Extend tasks with resource requirements, propagate through the stack to find the most energy efficient solution at run time
  • 22. 22 Example for OmpSs@FPGA (Matrix multiply with OmpSs and Vivado HLS annotations)
  • 23. DATE 2020 23 AXIOM board: Xilinx Zynq Ultrascale+ chip, with 4 ARM Cortex-A53 cores, and the ZU9EG FPGA. IPs configuration 1*256, 3*128 Number of instances * size Frequency (MHz) 200, 250, 300 Working frequency of the FPGA Number of SMP cores SMP: 1 to 4 FPGA: 3+1 helper, 2+2 helpers Combination of SMP and helper threads Number of FPGA helper threads SMP: 0; FPGA: 1, 2 Helper threads are used to manage tasks on the FPGA Number of pending tasks 4, 8, 16 and 32 Number of tasks sent to the IP cores before waiting for their finalization OmpSs@FPGA Experiments (Matrix multiply)
  • 24. DATE 2020 DFiant HLS • Aims to bridge the programmability gap by combining constructs and semantics from software, hardware and dataflow languages • Programming model accommodates a middle-ground between low- level HDL and high-level sequential programming High-Level Synthesis Languages and Tools (e.g., C and Vivado HLS) Register-Transfer Level HDLs (e.g., VHDL) DFiant: A Dataflow HDL  Automatic pipelining  Not an HDL  Problem with state  Separating timing from functionality  Concurrency  Fine-grain control  Automatic pipelining Concurrency  Fine-grain control  Bound to clock  Explicit pipelining
  • 25. DATE 2020 PCI Express Manager DFE Memory MyManager (.maxj) x x + 30 x Manager m = new Manager(); Kernel k = new MyKernel(); m.setKernel(k); m.setIO( link(“x", CPU), m.build(); link(“y", CPU)); MyKernel(DATA_SIZE, x, DATA_SIZE*4, y, DATA_SIZE*4); for (int i =0; i < DATA_SIZE; i++) y[i]= x[i] * x[i] + 30; Main Memory CPU CPU Code Host Code (.c) MaxCompiler: Application Development 25 SLiC MaxelerOS DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30; io.output("y", result, dfeInt(32)); MyKernel (.maxj) int*x, *y; y x x + 30 y x
  • 26. Task-based kernel identification/DFE mapping OmpSs identifies “static” task graphs while running and DFE kilo-task instance is selected Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines DataFlow Engines (DFEs) produce one pair of results at each clock cycle
  • 27. XiTAO: Generalized Tasks Reduces Overheads Reduces Parallel Slackness Overcommits memory resources  Improve Parallel Slackness by reducing dimensionality  Reduce Cache/BW pressure by allocating multiple resources
  • 28. OmpSs + XiTao 19.03.20 28 Extend the XiTAO with support for heterogeneous resources
  • 29. Why extremely low-voltage for FPGAs? Why Energy efficiency through Undervolting? • Still way to go for <20MW@exascale • FPGAs are at least 20X less power- and energy-efficient than ASICs. • Undervolting directly delivers dynamic and static power/energy reduction. • Note: undervolting is not DVFS • Frequency is not lowered!
  • 30. FPGA Undervolting Preliminary Experiments Voltage scaling below standard nominal level, in on-chip memories of Xilinx technology: o Power quadratically decreases, deliver savings of more than 10X. o A conservative voltage guardband exists below nominal. o Fault exponentially increases, below the guardband. o Faults can be detected / managed.
  • 31. DATE 2020 Use Cases • Healthcare: Infection biomarkers o Statistical search for biomarkers, which often needs intensive computation. A biomarker is a measurable value that can indicate the state of an organism, and is often the presence, absence or severity of a specific disease • Smart Home: Assisted Living o The ability of the home to learn from the users behavior and anticipate future behavior is still an open task and necessary to obtain a broad user acceptance of assisted living in the general public
  • 32. DATE 2020 Use Cases • Smart City: operational urban pollutant dispersion modelling o Modeling city landscape + sensor data + wind prediction to issue a “pollutant weather prediction” • Machine Learning: Automated driving and graphics rendering o Object detection using CNN networks for automated driving systems and CNN- and LSTM-based methods for realistic rendering of graphics for gaming and multi-camera systems • Secure IoT Gateway o Variety of sensors and actors in an industrial and private surrounding
  • 33. DATE 2020 Smart City 7x gain in energy efficiency using FPGAs Health Care 822x speedup using FPGAs, enabling a new world of Biomarker analysis Machine Learning up to 16x gain in energy efficiency and performance on the same hardware, using the EmbeDL optimizer Boosting Use Cases by Using LEGaTO toolchain
  • 34. DATE 2020 o Image: Object, face and gestures o Speech with DeepSpeech Displays personalized information All detections simultaniously • Start: 16 FPS at 500 W • Now: 25 FPS at 400 W • Goal: 10 FPS at 50 W Local machine learning frameworks Smart Mirror Use Case