SlideShare a Scribd company logo
The LEGaTO project has received funding from the European Union's Horizon 2020 research
and innovation programme under the grant agreement No 780681
LEGaTO: Low-Energy, Secure,
and Resilient Toolset for
Heterogeneous Computing
DATE 2020
Valerio Schiavoni (Uni. Neuchatel)
11/March/2020
DATE 2020
The industry challenge
• Dennard scaling is dead
• Moore's law is slowing down
• Trend towards heterogeneous architectures
• Increasing focus on energy: ICT sector is responsible for
5% of global energy consumption
• Creates a programmability challenge
DATE 2020
LEGaTO Ambition
• Create software stack-support for energy-efficient
heterogeneous computing
o Starting with Made-in-Europe mature software stack, and
optimizing this stack to support energy-efficiency
o Computing on a commercial cutting-edge European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
DATE 2020
The future challenge of computing: MW, not FLOPS
4
“… without dramatic increases
in efficiency, ICT industry could
use 20% of all electricity and
emit up to 5.5% of the world’s
carbon emissions by 2025.”
“We have a tsunami of data
approaching. Everything which
can be is being digitalised. It is
a perfect storm.”
“ … a single $1bn Apple data
centre planned for Athenry in Co
Galway, expects to eventually
use 300MW of electricity, or
over 8% of the national capacity
and more than the daily entire
usage of Dublin. It will require
144 large diesel generators as
back up for when the wind does
not blow.”
DATE 2020
How did we get here?
5
Decades of exponential growth in performance
End of Dennard scaling
Moore’s Law is slowing down
Explore new architectures & models of computation
Exponential growth in demand & data
Move towards accelerators
DATE 2020
The End of Performance
• Frequency levels off, cores fill in the gap
http://doi.ieeecomputersociety.org/10.1109/MC.2015.368
DATE 2020
The real challenge: data movements
7
Pedram, Richardson, Galal, Kvatinsky, and Horowitz, “Dark Memory and Accelerator-
Rich System Optimization in the Dark Silicon Era”, April 2016, IEEE Design & Test
“As a result the most important optimization must be
done at the algorithm level, to reduce off- chip memory
accesses, to create Dark Memory. The algorithms must
first be (re)written for both locality and parallelism
before you tailor the hardware to accelerate them.”
“My hypothesis is that we can solve [the software crisis
in parallel computing], but only if we work from the
algorithm down to the hardware — not the traditional
hardware first mentality.” Tim Mattson, principal
engineer at Intel.
In agreement with Intel:
DATE 2020
FPGAs to the rescue?
• The model of computation is key
• Build ultra-deep, highly efficient pipelines
8
DATE 2020
LEGaTO Objectives
19.03.20 9
DATE 2020
Partners
DATE 2020
Overview
The LEGaTO Compute Stack
DATE 2020
Hardware Platforms
• Christmann (RECS Box)
o Heterogeneous platform: Supports CPU (X86 or Arm), GPU
(Nvidia) and FPGA (Xillinx and Altera)
• MAXELER
o Maxeler DFE platforms and toolset
• Heterogeneous microserver approach, integrating CPU, GPU and FPGA
technology
• High-speed, low latency communication infrastructure,
enabling accelerators, scalable across multiple servers
• Modular, blade-style design, hot pluggable/swappable
19/03/2020 14
RECS|Box overview
Concept
19/03/2020 15
• Dedicated network for
monitoring, control and
iKVM
• Multiple 1Gb/10Gb
Ethernet links per
Microserver – integrated 40
Gb Ethernet backbone
• Dedicated high-speed, low-
latency communication
infrastructure
• Host-2-Host PCIe between
CPUs
• Switched high-speed
serial lanes between
FPGAs
• Enables pooling of
accelerators
and assignment to
different hosts
Baseboard
Baseboard
Baseboard
3/16
Microserver
s
per
Baseboard
Microserver
Module
CPU Mem I/O
Microserver
Module
Microserver
Module
Backplane
Up to 15 Baseboards per
Server
Distributed
Monitoring
and KVM
Distributed
Monitoring
and KVM
Ethernet
Communication
Infrastructure
Ethernet
Communication
Infrastructure
High-speed
Low-latency
Communication
Storage /
I/O-Ext.
High-speed
Low-latency
Communication
RECS|Box overview
Architecture
CPU Mem I/O CPU Mem I/O
19/03/2020 16
RECS|Box overview
Microserver
Intel x86
Arm Server SoCs
Xilinx FPGAs
Intel FPGAs
Low-Power
Microserver
High-Performance
Microserver
CPU Microserver FPGA MicroserverGPGPU
NVIDIA Jetson TX1/TX2
4 Core A57@1.73 GHz + Maxwell GPGPU@1.5 GHz
2 Core Denver + 4 Core A57 + Pascal GPGPU@1.5 GHz
ARMv8 Server SoC
32 Core A72@2.1 GHz
NVIDIA Tesla P100/V100
Pascal/Volta GPGPU@1.3 GHz Intel Stratix 10 SoC
2,800 kLE
Xilinx Zynq 7020
85 kLC
PCIe-Extensions
PCIe SSD
Xeon Phi
RAPTOR
FPGA acc.
DATE 2020
Maxeler Dataflow Computing Systems
MPC-X3000
• 8 Dataflow Engines in 1U
• Up to 1TB of DFE RAM
• Zero-copy RDMA
• Dynamic allocation of DFEs to
conventional CPU servers through
Infiniband
• Equivalent performance to
20-50 x86 servers
DATE 2020
Dataflow System at Jülich Supercomputing Center
• Pilot system deployed in October 2017 as part of PRACE PCP
• System configuration
o 1U Maxeler MPC-X with 8 MAX5 DFEs
o 1U AMD EPYC server
• 2x AMD 7601 EPYC CPUs
• 1 TB RAM
• 9.6 TB SSD (data storage)
o one 1U login head node
• Runs 4 scientific workloads
18
MPC-X node
Remote users
MAX5 DFE EPYC CPU
1TB DDR4
Head/Build node
ipmi
56 Gbps 2x Infiniband @ 56 Gbps
10 Gbps
10 Gbps
Supermicro EPYC node
http://www.prace-ri.eu/pcp/
DATE 2020
Scaling Dataflow Computing into the Cloud
• MAX5 DFEs are compatible with Amazon EC2 F1
instances
• MaxCompiler supports programming F1
• Hybrid Cloud Model:
o On-premise Dataflow system from Maxeler
o Elastically expand to the Amazon Cloud when
needed
19
On-premise
Cloud
DATE 2020
Programming Models and Runtimes:
Task based programming + Dataflow!
• OmpSs / OmpSs@FPGA (BSC)
• MaxCompiler (Maxeler)
• DFiant HLS (TECHNION)
• XiTAO (CHALMERS)
19 March, 2020 20
DATE 2020
21
Towards a single source for any target
• A need if we expect programmers to survive
o Architectures appear like mushrooms
• Productivity
o Develop once → run everywhere
• Performance
• Key concept behind OmpSs
o Sequential task based program on single address/name space
o + directionality annotations
o Executed in parallel: Automatic runtime computation of dependences among
tasks
o LEGaTO: Extend tasks with resource requirements, propagate through the
stack to find the most energy efficient solution at run time
22
Example for OmpSs@FPGA (Matrix multiply with
OmpSs and Vivado HLS annotations)
DATE 2020
23
AXIOM board:
Xilinx Zynq Ultrascale+ chip,
with 4 ARM Cortex-A53 cores,
and the ZU9EG FPGA.
IPs configuration 1*256, 3*128 Number of instances * size
Frequency (MHz) 200, 250, 300 Working frequency of the FPGA
Number of SMP cores SMP: 1 to 4
FPGA: 3+1 helper, 2+2 helpers
Combination of SMP and helper threads
Number of FPGA helper threads SMP: 0; FPGA: 1, 2 Helper threads are used to manage tasks on
the FPGA
Number of pending tasks 4, 8, 16 and 32 Number of tasks sent to the IP cores before
waiting for their finalization
OmpSs@FPGA Experiments (Matrix multiply)
DATE 2020
DFiant HLS
• Aims to bridge the programmability gap by combining constructs and
semantics from software, hardware and dataflow languages
• Programming model accommodates a middle-ground between low-
level HDL and high-level sequential programming
High-Level Synthesis
Languages and Tools
(e.g., C and Vivado HLS)
Register-Transfer Level
HDLs
(e.g., VHDL)
DFiant: A Dataflow HDL
 Automatic pipelining
 Not an HDL
 Problem with state
 Separating timing
from functionality
 Concurrency
 Fine-grain control
 Automatic pipelining Concurrency
 Fine-grain control
 Bound to clock
 Explicit pipelining
DATE 2020
PCI
Express
Manager
DFE
Memory
MyManager (.maxj)
x
x
+
30
x
Manager m = new Manager();
Kernel k =
new MyKernel();
m.setKernel(k);
m.setIO(
link(“x", CPU),
m.build();
link(“y", CPU));
MyKernel(DATA_SIZE,
x, DATA_SIZE*4,
y, DATA_SIZE*4);
for (int i =0; i < DATA_SIZE; i++)
y[i]= x[i] * x[i] + 30;
Main
Memory
CPU
CPU
Code
Host Code (.c)
MaxCompiler: Application Development
25
SLiC
MaxelerOS
DFEVar x = io.input("x", dfeInt(32));
DFEVar result = x * x + 30;
io.output("y", result, dfeInt(32));
MyKernel (.maxj)
int*x, *y;
y
x
x
+
30
y
x
Task-based kernel identification/DFE mapping
OmpSs identifies “static” task graphs while running and DFE kilo-task instance is
selected
Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
DataFlow Engines
(DFEs) produce
one pair of results
at each clock cycle
XiTAO: Generalized Tasks
Reduces Overheads
Reduces Parallel Slackness
Overcommits memory resources
 Improve Parallel Slackness by
reducing dimensionality
 Reduce Cache/BW pressure by
allocating multiple resources
OmpSs + XiTao
19.03.20 28
Extend the XiTAO with
support for heterogeneous
resources
Why extremely low-voltage for FPGAs?
Why Energy efficiency through Undervolting?
• Still way to go for <20MW@exascale
• FPGAs are at least 20X less power- and energy-efficient than ASICs.
• Undervolting directly delivers dynamic and static power/energy reduction.
• Note: undervolting is not DVFS
• Frequency is not lowered!
FPGA Undervolting Preliminary Experiments
Voltage scaling below standard nominal level, in on-chip
memories of Xilinx technology:
o Power quadratically decreases, deliver savings of more than 10X.
o A conservative voltage guardband exists below nominal.
o Fault exponentially increases, below the guardband.
o Faults can be detected / managed.
DATE 2020
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
needs intensive computation. A biomarker is
a measurable value that can indicate the
state of an organism, and is often the
presence, absence or severity of a specific
disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
users behavior and anticipate future behavior
is still an open task and necessary to obtain
a broad user acceptance of assisted living in
the general public
DATE 2020
Use Cases
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNN networks for
automated driving systems and CNN-
and LSTM-based methods for realistic
rendering of graphics for gaming and
multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actors in an
industrial and private surrounding
DATE 2020
Smart City
7x gain in energy efficiency
using FPGAs
Health Care
822x speedup using
FPGAs, enabling a new
world of Biomarker
analysis
Machine Learning
up to 16x gain in energy
efficiency and
performance on the same
hardware, using the
EmbeDL optimizer
Boosting Use Cases by Using LEGaTO toolchain
DATE 2020
o Image: Object, face and gestures
o Speech with DeepSpeech
Displays personalized
information
All detections simultaniously
• Start: 16 FPS at 500 W
• Now: 25 FPS at 400 W
• Goal: 10 FPS at 50 W
Local machine learning
frameworks
Smart Mirror Use Case
Questions?
https://legato-project.eu/
legato-project@bsc.es

More Related Content

What's hot

Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
inside-BigData.com
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
Edge AI and Vision Alliance
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
Jax Jargalsaikhan
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
Edge AI and Vision Alliance
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
Putchong Uthayopas
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
Edge AI and Vision Alliance
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlap
inside-BigData.com
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
Ganesan Narayanasamy
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
RCCSRENKEI
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
Volodymyr Saviak
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
Edge AI and Vision Alliance
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
Ganesan Narayanasamy
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
Ganesan Narayanasamy
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
Edge AI and Vision Alliance
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
NVIDIA Taiwan
 
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen HuangAi Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
NVIDIA Taiwan
 
SGI HPC Update for June 2013
SGI HPC Update for June 2013SGI HPC Update for June 2013
SGI HPC Update for June 2013
inside-BigData.com
 
04 New opportunities in photon science with high-speed X-ray imaging detecto...
04 New opportunities in photon science with high-speed X-ray imaging  detecto...04 New opportunities in photon science with high-speed X-ray imaging  detecto...
04 New opportunities in photon science with high-speed X-ray imaging detecto...
RCCSRENKEI
 

What's hot (20)

Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
 
GIST AI-X Computing Cluster
GIST AI-X Computing ClusterGIST AI-X Computing Cluster
GIST AI-X Computing Cluster
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
"Using SGEMM and FFTs to Accelerate Deep Learning," a Presentation from ARM
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre..."Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Pre...
 
dCUDA: Distributed GPU Computing with Hardware Overlap
 dCUDA: Distributed GPU Computing with Hardware Overlap dCUDA: Distributed GPU Computing with Hardware Overlap
dCUDA: Distributed GPU Computing with Hardware Overlap
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
 
SGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 KievSGI HPC DAY 2011 Kiev
SGI HPC DAY 2011 Kiev
 
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP..."Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
"Efficient Implementation of Convolutional Neural Networks using OpenCL on FP...
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen HuangAi Forum at Computex 2017 - Keynote Slides by Jensen Huang
Ai Forum at Computex 2017 - Keynote Slides by Jensen Huang
 
SGI HPC Update for June 2013
SGI HPC Update for June 2013SGI HPC Update for June 2013
SGI HPC Update for June 2013
 
04 New opportunities in photon science with high-speed X-ray imaging detecto...
04 New opportunities in photon science with high-speed X-ray imaging  detecto...04 New opportunities in photon science with high-speed X-ray imaging  detecto...
04 New opportunities in photon science with high-speed X-ray imaging detecto...
 

Similar to DATE 2020: Design, Automation and Test in Europe Conference

SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
LEGATO project
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
Sagar Dolas
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
LEGATO project
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
LEGATO project
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
Sagar Dolas
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
Alison B. Lowndes
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
VigneshwarRamaswamy
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
Wilhelm van Belkum
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Lablup Inc.
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
CastLabKAIST
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT Project
 
Dell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation WebinarDell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation Webinar
Bill Wong
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
LEGATO project
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
Edge AI and Vision Alliance
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
inside-BigData.com
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
Shinnosuke Furuya
 
uCluster
uClusteruCluster
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
mustafa sarac
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
HPCC Systems
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
Ganesan Narayanasamy
 

Similar to DATE 2020: Design, Automation and Test in Europe Conference (20)

SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO Integration
LEGaTO IntegrationLEGaTO Integration
LEGaTO Integration
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Hardware architecture of Summit Supercomputer
 Hardware architecture of Summit Supercomputer Hardware architecture of Summit Supercomputer
Hardware architecture of Summit Supercomputer
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
AI Accelerators for Cloud Datacenters
AI Accelerators for Cloud DatacentersAI Accelerators for Cloud Datacenters
AI Accelerators for Cloud Datacenters
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
Dell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation WebinarDell NVIDIA AI Powered Transformation Webinar
Dell NVIDIA AI Powered Transformation Webinar
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム組み込みから HPC まで ARM コアで実現するエコシステム
組み込みから HPC まで ARM コアで実現するエコシステム
 
uCluster
uClusteruCluster
uCluster
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
OpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in ZurichOpenPOWER Boot camp in Zurich
OpenPOWER Boot camp in Zurich
 

More from LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
LEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
LEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
LEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
LEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
LEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
LEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGATO project
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
LEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
LEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
LEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
LEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
LEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
LEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
LEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
LEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
LEGATO project
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
LEGATO project
 

More from LEGATO project (20)

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEs
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow Framework
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use Case
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edge
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGX
 
HiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat dataHiPerMAb: A statistical tool for judging the potential of short fat data
HiPerMAb: A statistical tool for judging the potential of short fat data
 

Recently uploaded

Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
eitps1506
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
ananya23nair
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills MN
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
RDhivya6
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
Sérgio Sacani
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
yourprojectpartner05
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Sérgio Sacani
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
ABHISHEK SONI NIMT INSTITUTE OF MEDICAL AND PARAMEDCIAL SCIENCES , GOVT PG COLLEGE NOIDA
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Sérgio Sacani
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
Sérgio Sacani
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
vadgavevedant86
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
Sérgio Sacani
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
sandertein
 

Recently uploaded (20)

Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
 
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
23PH301 - Optics - Optical Lenses.pptx
23PH301 - Optics  -  Optical Lenses.pptx23PH301 - Optics  -  Optical Lenses.pptx
23PH301 - Optics - Optical Lenses.pptx
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
Anti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark UniverseAnti-Universe And Emergent Gravity and the Dark Universe
Anti-Universe And Emergent Gravity and the Dark Universe
 
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptxLEARNING TO LIVE WITH LAWS OF MOTION .pptx
LEARNING TO LIVE WITH LAWS OF MOTION .pptx
 
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
Evidence of Jet Activity from the Secondary Black Hole in the OJ 287 Binary S...
 
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
MICROBIAL INTERACTION PPT/ MICROBIAL INTERACTION AND THEIR TYPES // PLANT MIC...
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...Discovery of An Apparent Red, High-Velocity Type Ia Supernova at  𝐳 = 2.9  wi...
Discovery of An Apparent Red, High-Velocity Type Ia Supernova at 𝐳 = 2.9 wi...
 
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDSJAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
JAMES WEBB STUDY THE MASSIVE BLACK HOLE SEEDS
 
Summary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdfSummary Of transcription and Translation.pdf
Summary Of transcription and Translation.pdf
 
The debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically youngThe debris of the ‘last major merger’ is dynamically young
The debris of the ‘last major merger’ is dynamically young
 
cathode ray oscilloscope and its applications
cathode ray oscilloscope and its applicationscathode ray oscilloscope and its applications
cathode ray oscilloscope and its applications
 

DATE 2020: Design, Automation and Test in Europe Conference

  • 1. The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681 LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing DATE 2020 Valerio Schiavoni (Uni. Neuchatel) 11/March/2020
  • 2. DATE 2020 The industry challenge • Dennard scaling is dead • Moore's law is slowing down • Trend towards heterogeneous architectures • Increasing focus on energy: ICT sector is responsible for 5% of global energy consumption • Creates a programmability challenge
  • 3. DATE 2020 LEGaTO Ambition • Create software stack-support for energy-efficient heterogeneous computing o Starting with Made-in-Europe mature software stack, and optimizing this stack to support energy-efficiency o Computing on a commercial cutting-edge European-developed heterogeneous hardware substrate with CPU + GPU + FPGA + FPGA-based Dataflow Engines (DFE) • Main goal: energy efficiency
  • 4. DATE 2020 The future challenge of computing: MW, not FLOPS 4 “… without dramatic increases in efficiency, ICT industry could use 20% of all electricity and emit up to 5.5% of the world’s carbon emissions by 2025.” “We have a tsunami of data approaching. Everything which can be is being digitalised. It is a perfect storm.” “ … a single $1bn Apple data centre planned for Athenry in Co Galway, expects to eventually use 300MW of electricity, or over 8% of the national capacity and more than the daily entire usage of Dublin. It will require 144 large diesel generators as back up for when the wind does not blow.”
  • 5. DATE 2020 How did we get here? 5 Decades of exponential growth in performance End of Dennard scaling Moore’s Law is slowing down Explore new architectures & models of computation Exponential growth in demand & data Move towards accelerators
  • 6. DATE 2020 The End of Performance • Frequency levels off, cores fill in the gap http://doi.ieeecomputersociety.org/10.1109/MC.2015.368
  • 7. DATE 2020 The real challenge: data movements 7 Pedram, Richardson, Galal, Kvatinsky, and Horowitz, “Dark Memory and Accelerator- Rich System Optimization in the Dark Silicon Era”, April 2016, IEEE Design & Test “As a result the most important optimization must be done at the algorithm level, to reduce off- chip memory accesses, to create Dark Memory. The algorithms must first be (re)written for both locality and parallelism before you tailor the hardware to accelerate them.” “My hypothesis is that we can solve [the software crisis in parallel computing], but only if we work from the algorithm down to the hardware — not the traditional hardware first mentality.” Tim Mattson, principal engineer at Intel. In agreement with Intel:
  • 8. DATE 2020 FPGAs to the rescue? • The model of computation is key • Build ultra-deep, highly efficient pipelines 8
  • 13. DATE 2020 Hardware Platforms • Christmann (RECS Box) o Heterogeneous platform: Supports CPU (X86 or Arm), GPU (Nvidia) and FPGA (Xillinx and Altera) • MAXELER o Maxeler DFE platforms and toolset
  • 14. • Heterogeneous microserver approach, integrating CPU, GPU and FPGA technology • High-speed, low latency communication infrastructure, enabling accelerators, scalable across multiple servers • Modular, blade-style design, hot pluggable/swappable 19/03/2020 14 RECS|Box overview Concept
  • 15. 19/03/2020 15 • Dedicated network for monitoring, control and iKVM • Multiple 1Gb/10Gb Ethernet links per Microserver – integrated 40 Gb Ethernet backbone • Dedicated high-speed, low- latency communication infrastructure • Host-2-Host PCIe between CPUs • Switched high-speed serial lanes between FPGAs • Enables pooling of accelerators and assignment to different hosts Baseboard Baseboard Baseboard 3/16 Microserver s per Baseboard Microserver Module CPU Mem I/O Microserver Module Microserver Module Backplane Up to 15 Baseboards per Server Distributed Monitoring and KVM Distributed Monitoring and KVM Ethernet Communication Infrastructure Ethernet Communication Infrastructure High-speed Low-latency Communication Storage / I/O-Ext. High-speed Low-latency Communication RECS|Box overview Architecture CPU Mem I/O CPU Mem I/O
  • 16. 19/03/2020 16 RECS|Box overview Microserver Intel x86 Arm Server SoCs Xilinx FPGAs Intel FPGAs Low-Power Microserver High-Performance Microserver CPU Microserver FPGA MicroserverGPGPU NVIDIA Jetson TX1/TX2 4 Core A57@1.73 GHz + Maxwell GPGPU@1.5 GHz 2 Core Denver + 4 Core A57 + Pascal GPGPU@1.5 GHz ARMv8 Server SoC 32 Core A72@2.1 GHz NVIDIA Tesla P100/V100 Pascal/Volta GPGPU@1.3 GHz Intel Stratix 10 SoC 2,800 kLE Xilinx Zynq 7020 85 kLC PCIe-Extensions PCIe SSD Xeon Phi RAPTOR FPGA acc.
  • 17. DATE 2020 Maxeler Dataflow Computing Systems MPC-X3000 • 8 Dataflow Engines in 1U • Up to 1TB of DFE RAM • Zero-copy RDMA • Dynamic allocation of DFEs to conventional CPU servers through Infiniband • Equivalent performance to 20-50 x86 servers
  • 18. DATE 2020 Dataflow System at Jülich Supercomputing Center • Pilot system deployed in October 2017 as part of PRACE PCP • System configuration o 1U Maxeler MPC-X with 8 MAX5 DFEs o 1U AMD EPYC server • 2x AMD 7601 EPYC CPUs • 1 TB RAM • 9.6 TB SSD (data storage) o one 1U login head node • Runs 4 scientific workloads 18 MPC-X node Remote users MAX5 DFE EPYC CPU 1TB DDR4 Head/Build node ipmi 56 Gbps 2x Infiniband @ 56 Gbps 10 Gbps 10 Gbps Supermicro EPYC node http://www.prace-ri.eu/pcp/
  • 19. DATE 2020 Scaling Dataflow Computing into the Cloud • MAX5 DFEs are compatible with Amazon EC2 F1 instances • MaxCompiler supports programming F1 • Hybrid Cloud Model: o On-premise Dataflow system from Maxeler o Elastically expand to the Amazon Cloud when needed 19 On-premise Cloud
  • 20. DATE 2020 Programming Models and Runtimes: Task based programming + Dataflow! • OmpSs / OmpSs@FPGA (BSC) • MaxCompiler (Maxeler) • DFiant HLS (TECHNION) • XiTAO (CHALMERS) 19 March, 2020 20
  • 21. DATE 2020 21 Towards a single source for any target • A need if we expect programmers to survive o Architectures appear like mushrooms • Productivity o Develop once → run everywhere • Performance • Key concept behind OmpSs o Sequential task based program on single address/name space o + directionality annotations o Executed in parallel: Automatic runtime computation of dependences among tasks o LEGaTO: Extend tasks with resource requirements, propagate through the stack to find the most energy efficient solution at run time
  • 22. 22 Example for OmpSs@FPGA (Matrix multiply with OmpSs and Vivado HLS annotations)
  • 23. DATE 2020 23 AXIOM board: Xilinx Zynq Ultrascale+ chip, with 4 ARM Cortex-A53 cores, and the ZU9EG FPGA. IPs configuration 1*256, 3*128 Number of instances * size Frequency (MHz) 200, 250, 300 Working frequency of the FPGA Number of SMP cores SMP: 1 to 4 FPGA: 3+1 helper, 2+2 helpers Combination of SMP and helper threads Number of FPGA helper threads SMP: 0; FPGA: 1, 2 Helper threads are used to manage tasks on the FPGA Number of pending tasks 4, 8, 16 and 32 Number of tasks sent to the IP cores before waiting for their finalization OmpSs@FPGA Experiments (Matrix multiply)
  • 24. DATE 2020 DFiant HLS • Aims to bridge the programmability gap by combining constructs and semantics from software, hardware and dataflow languages • Programming model accommodates a middle-ground between low- level HDL and high-level sequential programming High-Level Synthesis Languages and Tools (e.g., C and Vivado HLS) Register-Transfer Level HDLs (e.g., VHDL) DFiant: A Dataflow HDL  Automatic pipelining  Not an HDL  Problem with state  Separating timing from functionality  Concurrency  Fine-grain control  Automatic pipelining Concurrency  Fine-grain control  Bound to clock  Explicit pipelining
  • 25. DATE 2020 PCI Express Manager DFE Memory MyManager (.maxj) x x + 30 x Manager m = new Manager(); Kernel k = new MyKernel(); m.setKernel(k); m.setIO( link(“x", CPU), m.build(); link(“y", CPU)); MyKernel(DATA_SIZE, x, DATA_SIZE*4, y, DATA_SIZE*4); for (int i =0; i < DATA_SIZE; i++) y[i]= x[i] * x[i] + 30; Main Memory CPU CPU Code Host Code (.c) MaxCompiler: Application Development 25 SLiC MaxelerOS DFEVar x = io.input("x", dfeInt(32)); DFEVar result = x * x + 30; io.output("y", result, dfeInt(32)); MyKernel (.maxj) int*x, *y; y x x + 30 y x
  • 26. Task-based kernel identification/DFE mapping OmpSs identifies “static” task graphs while running and DFE kilo-task instance is selected Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines DataFlow Engines (DFEs) produce one pair of results at each clock cycle
  • 27. XiTAO: Generalized Tasks Reduces Overheads Reduces Parallel Slackness Overcommits memory resources  Improve Parallel Slackness by reducing dimensionality  Reduce Cache/BW pressure by allocating multiple resources
  • 28. OmpSs + XiTao 19.03.20 28 Extend the XiTAO with support for heterogeneous resources
  • 29. Why extremely low-voltage for FPGAs? Why Energy efficiency through Undervolting? • Still way to go for <20MW@exascale • FPGAs are at least 20X less power- and energy-efficient than ASICs. • Undervolting directly delivers dynamic and static power/energy reduction. • Note: undervolting is not DVFS • Frequency is not lowered!
  • 30. FPGA Undervolting Preliminary Experiments Voltage scaling below standard nominal level, in on-chip memories of Xilinx technology: o Power quadratically decreases, deliver savings of more than 10X. o A conservative voltage guardband exists below nominal. o Fault exponentially increases, below the guardband. o Faults can be detected / managed.
  • 31. DATE 2020 Use Cases • Healthcare: Infection biomarkers o Statistical search for biomarkers, which often needs intensive computation. A biomarker is a measurable value that can indicate the state of an organism, and is often the presence, absence or severity of a specific disease • Smart Home: Assisted Living o The ability of the home to learn from the users behavior and anticipate future behavior is still an open task and necessary to obtain a broad user acceptance of assisted living in the general public
  • 32. DATE 2020 Use Cases • Smart City: operational urban pollutant dispersion modelling o Modeling city landscape + sensor data + wind prediction to issue a “pollutant weather prediction” • Machine Learning: Automated driving and graphics rendering o Object detection using CNN networks for automated driving systems and CNN- and LSTM-based methods for realistic rendering of graphics for gaming and multi-camera systems • Secure IoT Gateway o Variety of sensors and actors in an industrial and private surrounding
  • 33. DATE 2020 Smart City 7x gain in energy efficiency using FPGAs Health Care 822x speedup using FPGAs, enabling a new world of Biomarker analysis Machine Learning up to 16x gain in energy efficiency and performance on the same hardware, using the EmbeDL optimizer Boosting Use Cases by Using LEGaTO toolchain
  • 34. DATE 2020 o Image: Object, face and gestures o Speech with DeepSpeech Displays personalized information All detections simultaniously • Start: 16 FPS at 500 W • Now: 25 FPS at 400 W • Goal: 10 FPS at 50 W Local machine learning frameworks Smart Mirror Use Case