The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under the grant agreement No 780681
LEGaTO: Low-Energy,
Heterogeneous Computing
Use of AI in the Project
AI4EU Café Webinar
Osman Unsal
Barcelona Supercomputing Center
28/October/2020
AI4EU Cafe
The future challenge of computing: MW, not FLOPS
“… without dramatic increases
in efficiency, ICT industry could
use 20% of all electricity and
emit up to 5.5% of the world’s
carbon emissions by 2025.”
“We have a tsunami of data
approaching. Everything which
can be is being digitalised. It is
a perfect storm.”
“ … a single $1bn Apple data
centre planned for Athenry in Co
Galway, expects to eventually
use 300MW of electricity, or
over 8% of the national capacity
and more than the daily entire
usage of Dublin. It will require
144 large diesel generators as
back up for when the wind does
not blow.”
How did we get here?
Decades of exponential growth in performance
End of Dennard scaling
Moore’s Law is slowing down
Explore new architectures & models of computation
Exponential growth in demand & data
Move towards accelerators
FPGAs to the rescue?
• The model of computation is key
• Build ultra-deep, highly efficient pipelines
LEGaTO Ambition
• Create software-stack support for energy-efficient
heterogeneous computing
o Starting with a mature, Made-in-Europe software stack and optimizing
this stack for energy efficiency
o Computing on a commercial, cutting-edge, European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
LEGaTO Objectives
Overview
Partners
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
needs intensive computation. A biomarker is
a measurable value that can indicate the
state of an organism, and is often the
presence, absence or severity of a specific
disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
user's behavior and anticipate future
behavior is still an open problem, and solving it
is necessary for broad public acceptance of
assisted living
Use Cases
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNNs for
automated driving systems and CNN-
and LSTM-based methods for realistic
rendering of graphics for gaming and
multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actuators in
industrial and private settings
LEGaTO Healthcare Use Case and AI
• Leverage tree-based methods and LASSO regression for
infection biomarker research
• Integrated with the LEGaTO SCONE security technology
o Efficient deployment of Intel SGX security extensions
• LEGaTO scheduling techniques help to accelerate one of
the key algorithms using random forest
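As a toy illustration of why LASSO suits biomarker search, the sketch below fits an L1-penalised linear model by cyclic coordinate descent and reports which candidate features keep a non-zero weight. The data, the penalty `alpha`, and all function names are invented for illustration; the slides do not describe the actual LEGaTO pipeline at this level, so this should not be read as its implementation.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; values within [-alpha, alpha] become 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, sweeps=50):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale
    return w


# Hypothetical toy data: 4 samples, 3 candidate markers.
# Only marker 0 drives the response strongly; marker 1 has a tiny effect.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [3.05, 2.95, -2.95, -3.05]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

The L1 penalty drives the weak and irrelevant markers to exactly zero, which is the property that makes the method attractive when only a few of many measured values are expected to matter.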
LEGaTO ML (DNN) Use Case
• Covered in the presentation by Hans Salomonsson (Embedl)
LEGaTO Smart Home Use Case and AI
• Covered in the presentation by Nils Kucza (University of Bielefeld)
LEGaTO Student Research Perspective on AI
• Scheduling VGG across heterogeneous cores in mobile
edge devices
• On Nvidia Jetson TX2
o 4-core ARM A57 and
o 2-core Denver 2
• Covered in the presentation by Pirah Noor Soomro (Chalmers University)
LEGaTO Undervolting Technology for DNN
• Following slides
Reduced-Voltage Operation in Modern FPGAs
for Neural Network Acceleration
Behzad Salami, Baturay Onural, Ismail Yuksel,
Fahrettin Koc, Oguz Ergin, Adrian Cristal,
Osman Unsal, Hamid Sarbazi-Azad
Executive Summary
• Motivation: Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
• Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
• Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by undervolting below the nominal level
• Evaluation Setup
 5 Image classification workloads
 3 Xilinx UltraScale+ ZCU102 platforms
 2 On-chip voltage rails
• Main Results
 Large voltage guardband (i.e., 33%)
 >3X power-efficiency gain
Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
Motivation and Background
• Motivation
 Power consumption of neural networks is a main concern
 Hardware acceleration: GPUs, FPGAs, and ASICs
 FPGAs: Increasingly popular but less power-efficient than equivalent ASICs
 Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
 Can undervolting FPGAs improve the power efficiency of neural networks?
• Background
 Neural Networks: Widely deployed with an inherent resilience to errors
 FPGAs: Higher throughput than GPUs and better flexibility than ASICs
 Undervolting: Reduces power consumption, may incur reliability or performance issues
Our Goal
• Primary Goal
 Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by:
 Undervolting (i.e., underscaling voltage below nominal level)
• Secondary Goals
 Study the voltage behavior of real FPGAs (e.g., guardband)
 Study the power-efficiency gain of undervolting for neural networks
 Study the reliability overhead
 Study frequency underscaling as a way to prevent accuracy loss
 Study the effect of environmental temperature
Overall Methodology
• 5 CNN image classification
workloads: VGGNet, GoogleNet,
AlexNet, ResNet50, Inception
• Xilinx DNNDK to map CNNs onto the FPGA
 By default optimized for INT8
• 3 identical samples of Xilinx ZCU102
 Zynq UltraScale+ architecture
 Hard-core ARM for data orchestration
 FPGA for CNN acceleration
• 2 on-chip voltage rails, accessed via PMBus
 VCCINT: DSPs, LUTs, buffers, …
 VCCBRAM: BRAMs
 Vnom = 850 mV (set by manufacturer)
Vast majority (>99.9%) of the power is dissipated on VCCINT
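The measurement procedure implied by this setup can be sketched as a downward voltage sweep on VCCINT. Everything hardware-facing below is a stand-in: `FakeBoard`, its methods, the 10 mV step, and the accuracy values are hypothetical, and the 570 mV and 540 mV thresholds are borrowed from the results slides purely to make the simulation concrete. A real harness would issue PMBus commands to the voltage regulator instead.

```python
V_NOM_MV = 850  # nominal VCCINT level stated on the slide


class FakeBoard:
    """Hypothetical stand-in for a ZCU102; a real harness would talk PMBus."""

    def __init__(self, v_min_mv, v_crash_mv):
        self.v_min_mv = v_min_mv
        self.v_crash_mv = v_crash_mv
        self.vccint_mv = V_NOM_MV

    def set_vccint(self, millivolts):
        self.vccint_mv = millivolts

    def run_inference(self):
        """Return classification accuracy in [0, 1], or None if the board hangs."""
        if self.vccint_mv < self.v_crash_mv:
            return None
        if self.vccint_mv < self.v_min_mv:
            return 0.10  # made-up degraded accuracy in the critical region
        return 0.90  # made-up baseline accuracy


def sweep(board, baseline_accuracy, step_mv=10):
    """Lower VCCINT step by step and label each voltage point."""
    regions = {}
    voltage = V_NOM_MV
    while voltage > 0:
        board.set_vccint(voltage)
        accuracy = board.run_inference()
        if accuracy is None:
            regions[voltage] = "crash"
            break
        regions[voltage] = "guardband" if accuracy >= baseline_accuracy else "critical"
        voltage -= step_mv
    return regions


regions = sweep(FakeBoard(v_min_mv=570, v_crash_mv=540), baseline_accuracy=0.90)
print(regions[850], regions[560], regions[530])
```

Labelling each voltage point by comparing accuracy against the nominal baseline is one simple way to separate the three regions; the slides do not state the step size or the exact criterion the authors used.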
Overall Voltage Behavior
• Guardband: Large region below the nominal level (Vnom = 850 mV)
 No performance or reliability loss
 Added by the vendor to cover worst-case conditions
 Large guardband, 33% on average
• Critical: Narrower region below the guardband (Vmin = 570 mV)
 A narrow voltage region
 Neural network accuracy collapses
• Crash: FPGA crashes below the critical region (Vcrash = 540 mV)
 FPGA stops operating
Slight variation of voltage behavior across platforms and benchmarks
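The region widths follow directly from the three voltage levels on this slide. A small worked calculation, assuming the guardband percentage is expressed relative to the nominal level (the slide does not state the reference, but this reading reproduces the quoted 33%):

```python
V_NOM_MV = 850    # nominal level
V_MIN_MV = 570    # boundary between guardband and critical region
V_CRASH_MV = 540  # boundary between critical region and crash

guardband_mv = V_NOM_MV - V_MIN_MV
guardband_pct = 100 * guardband_mv / V_NOM_MV
critical_mv = V_MIN_MV - V_CRASH_MV
critical_pct = 100 * critical_mv / V_NOM_MV

print(f"guardband: {guardband_mv} mV ({guardband_pct:.1f}% of nominal)")
print(f"critical:  {critical_mv} mV ({critical_pct:.1f}% of nominal)")
# guardband: 280 mV (32.9% of nominal)
# critical:  30 mV (3.5% of nominal)
```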
Power-Reliability Trade-off
Power-efficiency (GOPs/W) gain
• >3X power-efficiency gain (2.6X by eliminating the guardband and a further 43% in the critical region)
• Slight variation across 3 platforms and 5 workloads
Reliability overhead (i.e., CNN accuracy loss)
[Figure panels: VGGNet, GoogleNet, AlexNet, ResNet, Inception]
• No accuracy loss in the guardband, accuracy collapse in the critical region
• Slight variation across 3 platforms and 5 workloads
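How the two figures combine into ">3X" can be checked with a line of arithmetic. The slide does not say whether the further 43% is measured relative to the already-undervolted point or to the nominal baseline, so the sketch below computes both readings; the variable names are illustrative only.

```python
guardband_gain = 2.6   # gain from eliminating the guardband
critical_extra = 0.43  # further 43% in the critical region

# Reading 1: the 43% is relative to the guardband-free operating point.
compound_gain = guardband_gain * (1 + critical_extra)

# Reading 2: the 43% is relative to the nominal baseline.
additive_gain = guardband_gain + critical_extra

print(f"compound: {compound_gain:.2f}X, additive: {additive_gain:.2f}X")
```

Either reading lands above 3X, so the headline claim holds regardless of which baseline the 43% refers to.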
Environmental Temperature
• Effects of environmental temperature on power-reliability
 Vary fan speed to test temperatures in [34 °C, 50 °C]
 On-board temperature monitored via PMBus
• Temperature effects on power consumption
 ↓ Temp → ↓ Power (power rises and falls with temperature)
 Undervolting reduces the impact of temperature on power consumption
• Temperature effects on reliability
 ↓ Temp → ↑ Accuracy loss (accuracy loss is inversely related to temperature)
 In our temperature range, Vmin and Vcrash do not change significantly
[Figure: results shown for GoogleNet]
https://legato-project.eu/
legato-project@bsc.es
Framework Overview
[Figure: execution timeline in seconds (axis marks at 61, 200, 500) for cores Denver 0, Denver 1, and A57 2 to A57 5, showing the training phase and pipeline stages 1 to 3, alongside the VGG-16 layer stack of 13 Conv, 5 MAXPOOL, and 3 FC layers]
Execution of VGG-16 on Nvidia Jetson TX2
Best Configuration: 3-stage pipeline, 6-5-10 layer partitioning, 2-2-2 core assignment
Preliminary Results: Comparison of Pipe Search algorithm with Brute Force algorithm

Approach               Number of trials  Training time [s]  Total execution time of 2000 frames [s]  Best configuration*  Seed**
Exhaustive Search      1970              8129.21            8166.9                                   3,7,4,10,2,1,1       ….
Pipe Search Algorithm  41                116.305            2915.91                                  3,7,4,10,2,1,1       3,6,5,10,2,1,1

* Throughput-maximizing pipeline configuration. The sequence contains three distinct sections:
1- Number of pipeline stages
2- Layer distribution among pipeline stages
3- Core placement for each pipeline stage
** Seed is a configuration calculated using computational hints. A good seed minimizes the number of trials in search-space exploration.

Experimental Setup:
Hardware: Nvidia Jetson TX2
Used cores: 2 Denver, 2 A57
Application: VGG-16
Total 21 layers, 16 of 21 are compute-intensive layers
Input frames = 2000
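The configuration strings in the table can be unpacked mechanically from the three-section description in the footnote. The sketch below is one reading of that encoding: it assumes the third section lists a core count per stage (the footnote says "core placement", and the encoding does not reveal which core type each stage uses). The function name `decode_config` is made up for this example.

```python
def decode_config(sequence):
    """Split '3,7,4,10,2,1,1' into stage count, layers per stage, cores per stage."""
    values = [int(tok) for tok in sequence.split(",") if tok.strip()]
    stages = values[0]
    if len(values) != 1 + 2 * stages:
        raise ValueError(f"expected {1 + 2 * stages} fields, got {len(values)}")
    layers = values[1:1 + stages]
    cores = values[1 + stages:1 + 2 * stages]
    return stages, layers, cores


TOTAL_LAYERS = 21  # VGG-16 layer count from the experimental setup
USED_CORES = 4     # 2 Denver + 2 A57

best = decode_config("3,7,4,10,2,1,1")
seed = decode_config("3,6,5,10,2,1,1")

# Sanity check: every layer and every used core is assigned exactly once.
for stages, layers, cores in (best, seed):
    assert sum(layers) == TOTAL_LAYERS
    assert sum(cores) == USED_CORES

# Ratios taken directly from the table rows.
trial_reduction = 1970 / 41
training_speedup = 8129.21 / 116.305
print(best, seed)
print(f"{trial_reduction:.1f}x fewer trials, {training_speedup:.1f}x shorter training")
```

Under this reading the seed differs from the best configuration by a single layer moved between the first two stages, which is consistent with the footnote's point that a good seed leaves little of the search space to explore.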

 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming Models
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack Runtimes
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing Workshop
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow Computing
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edge
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
 

Recently uploaded

GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SELF-EXPLANATORY
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
anitaento25
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
rakeshsharma20142015
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
Cherry
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
Cherry
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
Richard Gill
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
azzyixes
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
SAMIR PANDA
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
pablovgd
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
ossaicprecious19
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
yusufzako14
 

Recently uploaded (20)

GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
insect morphology and physiology of insect
insect morphology and physiology of insectinsect morphology and physiology of insect
insect morphology and physiology of insect
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Viksit bharat till 2047 India@2047.pptx
Viksit bharat till 2047  India@2047.pptxViksit bharat till 2047  India@2047.pptx
Viksit bharat till 2047 India@2047.pptx
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
Large scale production of streptomycin.pptx
Large scale production of streptomycin.pptxLarge scale production of streptomycin.pptx
Large scale production of streptomycin.pptx
 
Penicillin...........................pptx
Penicillin...........................pptxPenicillin...........................pptx
Penicillin...........................pptx
 
Richard's entangled aventures in wonderland
Richard's entangled aventures in wonderlandRichard's entangled aventures in wonderland
Richard's entangled aventures in wonderland
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
justice-and-fairness-ethics with example
justice-and-fairness-ethics with examplejustice-and-fairness-ethics with example
justice-and-fairness-ethics with example
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
NuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final versionNuGOweek 2024 Ghent - programme - final version
NuGOweek 2024 Ghent - programme - final version
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
plant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptxplant biotechnology Lecture note ppt.pptx
plant biotechnology Lecture note ppt.pptx
 

LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project

  • 1. LEGaTO: Low-Energy, Heterogeneous Computing. Use of AI in the Project. AI4EU Café Webinar. Osman Unsal, Barcelona Supercomputing Center, 28/October/2020. The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681.
  • 2. The future challenge of computing: MW, not FLOPS
      “… without dramatic increases in efficiency, ICT industry could use 20% of all electricity and emit up to 5.5% of the world’s carbon emissions by 2025.”
      “We have a tsunami of data approaching. Everything which can be is being digitalised. It is a perfect storm.”
      “… a single $1bn Apple data centre planned for Athenry in Co Galway, expects to eventually use 300MW of electricity, or over 8% of the national capacity and more than the daily entire usage of Dublin. It will require 144 large diesel generators as back up for when the wind does not blow.”
  • 3. How did we get here?
      • Decades of exponential growth in performance
      • End of Dennard scaling
      • Moore’s Law is slowing down
      • Explore new architectures & models of computation
      • Exponential growth in demand & data
      • Move towards accelerators
  • 4. FPGAs to the rescue?
      • The model of computation is key
      • Build ultra-deep, highly efficient pipelines
  • 5. LEGaTO Ambition
      • Create software stack-support for energy-efficient heterogeneous computing
          o Starting with Made-in-Europe mature software stack, and optimizing this stack to support energy-efficiency
          o Computing on a commercial cutting-edge European-developed heterogeneous hardware substrate with CPU + GPU + FPGA + FPGA-based Dataflow Engines (DFE)
      • Main goal: energy efficiency
  • 9. Use Cases
      • Healthcare: Infection biomarkers
          o Statistical search for biomarkers, which often needs intensive computation. A biomarker is a measurable value that can indicate the state of an organism, and is often the presence, absence or severity of a specific disease
      • Smart Home: Assisted Living
          o The ability of the home to learn from the user's behavior and anticipate future behavior is still an open task and necessary to obtain a broad user acceptance of assisted living in the general public
  • 10. Use Cases
      • Smart City: operational urban pollutant dispersion modelling
          o Modeling city landscape + sensor data + wind prediction to issue a “pollutant weather prediction”
      • Machine Learning: Automated driving and graphics rendering
          o Object detection using CNN networks for automated driving systems and CNN- and LSTM-based methods for realistic rendering of graphics for gaming and multi-camera systems
      • Secure IoT Gateway
          o Variety of sensors and actors in an industrial and private surrounding
  • 11. LEGaTO Healthcare Use Case and AI
      • Leverage tree-based methods and LASSO regression for Infection Biomarker research
      • Integrated with LEGaTO Scone security technology
          o Efficient deployment of Intel SGX security extensions
      • LEGaTO scheduling techniques help to accelerate one of the key algorithms using random forest
  • 12. LEGaTO ML (DNN) Use Case
      • In presentation of Hans Salomonsson (Embedl)
  • 13. LEGaTO Smart Home Use Case and AI
      • In presentation of Nils Kucza (University of Bielefeld)
  • 14. LEGaTO Student Research Perspective on AI
      • Scheduling VGG across heterogeneous cores in mobile edge devices
      • On Nvidia Jetson TX2
          o 4-core ARM A57 and
          o 2-core Denver 2
      • In presentation of Pirah Noor Soomro (Chalmers University)
  • 15. LEGaTO Undervolting Technology for DNN
      • Following slides
  • 16. Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration. Behzad Salami, Baturay Onural, Ismail Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal, Osman Unsal, Hamid Sarbazi-Azad
  • 17. Executive Summary
      • Motivation: Power consumption of neural networks is a main concern
          o Hardware acceleration: GPUs, FPGAs, and ASICs
      • Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
      • Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by undervolting below nominal level
      • Evaluation Setup
          o 5 image classification workloads
          o 3 Xilinx UltraScale+ ZCU102 platforms
          o 2 on-chip voltage rails
      • Main Results
          o Large voltage guardband (i.e., 33%)
          o >3X power-efficiency gain
  • 18. Outline
      • Motivation and Background
      • Our Goal
      • Methodology
      • Results
          - Overall Voltage Behavior
          - Power-Reliability Trade-off
          - Environmental Temperature
      • Prior Works
      • Summary, Conclusion, and Future Works
  • 20. Motivation and Background
      • Motivation
          o Power consumption of neural networks is a main concern
          o Hardware acceleration: GPUs, FPGAs, and ASICs
          o FPGAs: Getting popular but less power-efficient than equivalent ASICs
          o Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
          o Any potential of “Undervolting FPGAs” for power-efficiency of neural networks?
      • Background
          o Neural Networks: Widely deployed with an inherent resilience to errors
          o FPGAs: Higher throughput than GPUs and better flexibility than ASICs
          o Undervolting: Reduces power consumption, may incur reliability or performance issues
  • 22. Our Goal
      • Primary Goal
          o Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by undervolting (i.e., underscaling voltage below nominal level)
      • Secondary Goals
          o Study the voltage behavior of real FPGAs (e.g., guardband)
          o Study the power-efficiency gain of undervolting for neural networks
          o Study the reliability overhead
          o Study the frequency underscaling to prevent the accuracy loss
          o Study the effect of environmental temperature
  • 24. Overall Methodology
      • 5 CNN image classification workloads, i.e., VGGNet, GoogleNet, AlexNet, ResNet50, Inception
      • Xilinx DNNDK to map CNN into FPGA
          o By default optimized for INT8
      • 3 identical samples of Xilinx ZCU102
          o Zynq UltraScale+ architecture
          o Hard-core ARM for data orchestration
          o FPGA for CNN acceleration
      • 2 on-chip voltage rails, via PMBus
          o VCCINT: DSPs, LUTs, buffers, …
          o VCCBRAM: BRAMs
          o Vnom = 850 mV (set by manufacturer)
      • Vast majority (>99.9%) of the power is dissipated on VCCINT
  • 27. Overall Voltage Behavior
      • Guardband: Large region below nominal level (Vnom = 850 mV)
          o No performance or reliability loss
          o Added by the vendor to ensure the worst-case conditions
          o Large guardband, average of 33%
      • Critical: Narrower region below guardband (Vmin = 570 mV)
          o A narrow voltage region
          o Neural network accuracy collapse
      • Crash: FPGA crashes below critical region (Vcrash = 540 mV)
          o FPGA stops operating
      • Slight variation of voltage behavior across platforms and benchmarks
  • 29. Power-Reliability Trade-off
      • Power-efficiency (GOPs/W) gain
          o >3X power saving (2.6X by eliminating guardband and further 43% in critical region)
          o Slight variation across 3 platforms and 5 workloads
      • Reliability overhead (i.e., CNN accuracy loss), plotted for VGGNet, GoogleNet, AlexNet, ResNet, and Inception
          o No accuracy loss in the guardband, accuracy collapse in the critical region
          o Slight variation across 3 platforms and 5 workloads
  • 31. Environmental Temperature (plots shown for GoogleNet)
      • Effects of environmental temperature on power-reliability
          o Use fan speed to test temperature in [34 ℃, 50 ℃]
          o On-board temperature monitored by PMBus
      • Temperature effects on power consumption
          o ↓ Temp → ↓ Power (direct relation of power and temp)
          o By undervolting, the impact of temperature on power consumption reduces.
      • Temperature effects on reliability
          o ↓ Temp → ↑ Accuracy loss (inverse relation of reliability and temp)
          o In our temperature range, Vmin and Vcrash do not change significantly.
  • 34. Execution of VGG-16 on Nvidia Jetson TX2
      [Figure: timeline in seconds for cores Denver 0, Denver 1, and A57 2 to A57 5, showing the training phase followed by pipeline stages 1 to 3 over the VGG-16 Conv64, MAXPOOL, and FC layers]
      • Best Configuration: 3 staged pipeline, 6-5-10 layer partitioning, 2-2-2 core assignment
  • 35. Preliminary Results: Comparison of Pipe Search algorithm with Brute Force Algorithm
      Approach | Number of trials | Training Time [s] | Total execution time of 2000 frames [s] | Best Configuration* | Seed**
      Exhaustive Search | 1970 | 8129.21 | 8166.9 | 3,7,4,10,2,1,1, | ….
      Pipe Search Algorithm | 41 | 116.305 | 2915.91 | 3,7,4,10,2,1,1, | 3,6,5,10,2,1,1,
      • Experimental Setup: Hardware: Nvidia Jetson TX2. Used cores: 2 Denver, 2 A57. Application: VGG-16, total 21 layers, 16/21 are compute-intensive layers. Input frames = 2000
      *. Throughput-maximizing pipeline configuration. The sequence contains three distinct sections: 1- Number of pipeline stages 2- Layer distribution among pipeline stages 3- Core placement for each pipeline stage.
      **. Seed is a configuration which is calculated using computational hints. A good seed minimizes the number of trials in search space exploration.
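
The configuration sequences on slide 35 can be read mechanically. The sketch below is only an illustration, not the project's code: it assumes the first number is the stage count, the next group is the layer count per stage, and the last group is the core placement per stage, as the slide's footnote describes. The names `PipelineConfig`, `decode_config`, and `pick_max_throughput` are hypothetical, and the per-stage times are made-up placeholders, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class PipelineConfig:
    layers_per_stage: Tuple[int, ...]  # section 2 of the sequence
    core_placement: Tuple[int, ...]    # section 3 of the sequence


def decode_config(sequence: str) -> PipelineConfig:
    """Split a sequence such as '3,7,4,10,2,1,1,' into its sections.

    Assumed layout: stage count, then one layer count per stage,
    then one core-placement value per stage.
    """
    values = [int(v) for v in sequence.split(",") if v.strip()]
    if not values:
        raise ValueError("empty configuration sequence")
    stages = values[0]
    if len(values) != 1 + 2 * stages:
        raise ValueError("sequence length does not match the stage count")
    return PipelineConfig(
        layers_per_stage=tuple(values[1:1 + stages]),
        core_placement=tuple(values[1 + stages:]),
    )


def bottleneck(stage_times: List[float]) -> float:
    """A pipeline advances at the pace of its slowest stage."""
    return max(stage_times)


def pick_max_throughput(candidates: Dict[str, List[float]]) -> str:
    """Return the candidate whose slowest stage is fastest."""
    return min(candidates, key=lambda name: bottleneck(candidates[name]))


best = decode_config("3,7,4,10,2,1,1,")
seed = decode_config("3,6,5,10,2,1,1,")
print(best.layers_per_stage, best.core_placement)
print(sum(best.layers_per_stage), sum(seed.layers_per_stage))

# Placeholder per-stage times (seconds per frame), invented for illustration.
measured = {
    "candidate_a": [1.0, 1.1, 0.9],
    "candidate_b": [0.9, 1.4, 0.9],
}
print(pick_max_throughput(measured))
```

Under this reading, both the best configuration and the seed distribute all 21 VGG-16 layers across three stages, and the throughput-maximizing choice is simply the candidate with the smallest bottleneck stage time.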

Editor's Notes

  1. Processing on a massive scale will have a significant energy impact. MW will be the new focus, not FLOPS. Data centres need to reduce energy!
  2. For large-scale compute, parallelism might not be the most efficient approach. The assembly-line model is not even a new idea; its compute equivalent is dataflow. This view is Maxeler-specific, but the solution is not. It is more explicit to model and develop your application this way. Here the focus is performance, but low energy is very related.
  3. Thank you.
  4. To begin, I will give a brief overview. [CLICK] Our motivation is that the power consumption of neural networks is a first-class concern in state-of-the-art applications, due to the massive amount of data movement and computational power required. [CLICK] To alleviate this issue, hardware acceleration using GPUs, FPGAs, and ASICs is a promising solution. [CLICK] Among these architectures, FPGAs are getting popular, since they deliver higher throughput than GPUs and provide better flexibility than ASICs. [CLICK] But the problem is that FPGAs are at least 10X less power-efficient than equivalent ASICs. [CLICK] Our goal is to alleviate this issue by undervolting off-the-shelf FPGAs running neural network applications. Undervolting means scaling the supply voltage below the default level that is set by the FPGA vendor. [CLICK] Our study is based on [CLICK] 5 image classification workloads, [CLICK] 3 real Xilinx ZCU102 devices, which are based on the Zynq architecture, [CLICK] and 2 on-chip voltage rails. [CLICK] Among the main results, [CLICK] we discover a large voltage guardband of 33% of the nominal voltage level. This guardband is set by the vendor to ensure correct functionality in the worst-case conditions. Eliminating this guardband does not incur any performance or reliability overhead. [CLICK] By applying this technique, we achieve a power-efficiency gain of more than 3X.
  5. Here is the outline for the talk.
  6. I will first discuss the motivation behind the work and also will briefly provide the necessary background.
  7. [CLICK] First, motivation. [CLICK] The main motivation behind this work is the increasing interest in neural networks, whose use is limited by their high power consumption. [CLICK] Using efficient accelerators usually delivers better power-efficiency than general-purpose processors. [CLICK] Among accelerator frameworks, FPGAs are getting popular thanks to their shorter time to market; however, their power-efficiency is at least 10 times lower than that of ASIC-based neural networks. [CLICK] As a hardware-level technique, undervolting has recently been studied for off-the-shelf CPUs, GPUs, and DRAMs. These studies have shown significant potential for this technique, since vendors usually add large guardbands below the nominal level. This guardband is usually unnecessary for many real-world applications, and eliminating it delivers power-efficiency without compromising performance or reliability. [CLICK] In this work, we aim to experimentally study the potential of undervolting real FPGA devices for neural networks. [CLICK] Now I will give a brief background, [CLICK] neural networks first. Neural networks are getting popular since they have shown a significant potential to classify unseen objects. They have an inherent resilience to errors. [CLICK] FPGAs second. FPGAs have reconfigurable, massively parallel, and deeply pipelined architectures. They combine advantages of both GPUs and ASICs in terms of flexibility and efficiency. [CLICK] Finally, undervolting. We refer to undervolting as scaling the supply voltage below the nominal level that is set by the vendor. We apply undervolting until the underlying FPGA device stops operating. The direct advantage is the power saving; however, it may incur a performance or reliability overhead. This trade-off is experimentally studied in this work.
  8. Let me elaborate on our main goals of this work.
  9. [CLICK] Our primary goal is to [CLICK] bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by [CLICK] voltage underscaling of real FPGAs below the nominal level. [CLICK] Besides that, we aim to [CLICK] study the voltage behavior of real FPGAs, such as voltage guardbands, [CLICK] study the power-efficiency gain of undervolting for neural networks, [CLICK] study the reliability overhead, [CLICK] study frequency underscaling to prevent the accuracy loss below the guardband, [CLICK] and finally, study the effect of environmental temperature.
  10. I will briefly explain the experimental methodology next.
  11. [CLICK] Our experimental methodology is summarized in this figure. Our focus is the classification phase of convolutional neural networks, so we start with a pre-trained model. [CLICK] We selected 5 state-of-the-art image classification benchmarks, as listed here. They differ in the number of layers and neurons, and their model sizes range from a few KBs to hundreds of MBs. [CLICK] For mapping CNN models into the FPGA, we use a tool from Xilinx called DNNDK. By using this tool, we make sure that our study is general enough and not tied to a specific design. [CLICK] DNNDK supports several Xilinx FPGAs. We perform our experiments on 3 identical samples of Xilinx ZCU102. This architecture is composed of an ARM processor and an FPGA. The DNN computations are performed in the FPGA part, and the ARM processor is used to orchestrate the data movement. [CLICK] Lastly, we access the voltage rails on the FPGA platform using the PMBus. Among the different voltage rails, we focus on the on-chip ones, VCCINT and VCCBRAM. VCCINT is shared by DSPs, LUTs, buffers, and routing resources, and VCCBRAM is used only by the on-chip memories. Note that this is a fixed setup of the platform, set by the vendor. The default voltage level for both of these rails is 850 mV. [CLICK] We measured the power consumption of these voltage rails at the nominal level and observed that the BRAM power is negligible. This can be the result of the efficient on-chip memory design in the Xilinx UltraScale+ family. Hence, we focus on undervolting VCCINT.
  12. I will now discuss the experimental results.
  13. I will start with presenting and discussing the voltage behavior we experimentally observed for FPGAs.
  14. Here, we show the overall voltage behavior. [CLICK] Undervolting below the nominal level, we observe three voltage regions: guardband, critical, and crash. [CLICK] The guardband is added by the vendor to ensure correct functionality in the worst-case conditions. We measured it to be 33% on average. There is no reliability or performance loss in this region, so eliminating it can deliver significant power savings in normal conditions. [CLICK] Below the guardband, there is a narrower region in which the FPGA operates but with CNN accuracy loss. We call it the critical region. A minimum safe voltage level, or Vmin, separates the guardband and critical regions. [CLICK] Finally, below the critical region, at Vcrash, the FPGA does not operate. Vcrash is measured to be 540 mV on average. [CLICK] Note that there is a slight variation of voltage behavior across the 3 platforms and 5 benchmarks studied.
  15. I will now discuss the effect of reduced-voltage operation on the trade-off between power consumption and DNN accuracy, according to the voltage behavior just discussed.
  16. [CLICK] First, power. There is a total gain of more than 3X in power-efficiency when undervolting from the nominal level to the crash point. 2.6X of this is achieved by eliminating the guardband, and a further 43% is the result of undervolting in the critical region, which comes at the cost of accuracy loss. [CLICK] There is a slight variation across the 3 platforms and 5 benchmarks. [CLICK] The CNN accuracy is substantially reduced in the critical region, [CLICK] and there is an accuracy collapse for all benchmarks. [CLICK] As can be seen, this again comes with a slight variation across the 3 platforms and 5 benchmarks.
  17. Lastly, I will present our experimental analysis about the effect of the environmental temperature in the power-reliability trade-off.
  18. Temperature has a significant impact on the power consumption as well as on the reliability. [CLICK] We studied its impact while undervolting VCCINT. [CLICK] In our tests, we vary the speed of the FPGA fan, which lets us set the temperature in the range [34 ℃, 50 ℃]. We also use PMBus to monitor the on-chip temperature. [CLICK] First, power: as can be seen in this figure, temperature directly impacts the power consumption, meaning that lowering the temperature leads to lower power consumption. This is mainly due to the reduction of static power. [CLICK] However, by undervolting further, the impact of temperature on the power consumption reduces and, as you can see, in the critical voltage region the impact is negligible. This can be explained by noting that, by undervolting, the static power also gradually reduces. As a result, the temperature, which also impacts the static power, does not show much effect on it. [CLICK] On the other side, temperature has an inverse relation with reliability: as stated on the slide, lowering the temperature increases the accuracy loss. [CLICK] Also, we observe a negligible change in the voltage regions with temperature changes, although in wider temperature ranges we may observe differences. This needs further experiments and equipment.
  19. The framework consists of 2 modules. In the offline module, we determine computational hints from the network descriptor. The hints give a notion of the computational weight of each layer, based on which we model the initial partition of the network layers into pipeline stages. In the second module, which is online, we measure a number of configurations that are expected to be good candidates for a balanced pipeline on the given platform. The training finishes when the algorithm has found the optimal solution for the mapping. The rest of the input data is then processed in a pipelined fashion.
  20. This figure shows: 1) the training phase of the execution, and 2) the normal phase of the pipeline processing. During training, after 61 input frames, the algorithm converged to a 3-stage pipeline. Pipeline Stage 1 comprises the first 6 layers and executes on 2 Denver cores. Pipeline Stage 2 comprises the next 5 layers and executes on 2 A57 cores. Pipeline Stage 3 comprises the last 10 layers and executes on 2 A57 cores. The throughput-maximizing pipeline is the one with the smallest bubble size, in which the slowest stage takes the minimum possible time. This configuration is chosen because it minimizes the bottleneck of the pipeline. As we can also observe in the figure, there is only a small bubble in the pipeline.
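
The undervolting figures quoted on slides 27 and 29 (and in notes 14 and 16) can be cross-checked with a short calculation. This is a minimal sketch, not the authors' tooling: `classify_region` is a hypothetical helper, the three voltage levels are the average values reported on the slides, and how the "further 43%" combines with the 2.6X gain is not stated in the deck, so both a compounding and an additive reading are computed as assumptions.

```python
# Average VCCINT levels reported on the slides, in millivolts.
V_NOM_MV = 850    # nominal level set by the manufacturer
V_MIN_MV = 570    # minimum safe voltage: guardband / critical boundary
V_CRASH_MV = 540  # the FPGA stops operating at this level


def classify_region(voltage_mv: float) -> str:
    """Name the voltage region for a given VCCINT level (hypothetical helper)."""
    if voltage_mv >= V_NOM_MV:
        return "nominal"
    if voltage_mv >= V_MIN_MV:
        return "guardband"   # no performance or reliability loss
    if voltage_mv > V_CRASH_MV:
        return "critical"    # FPGA runs, but CNN accuracy degrades
    return "crash"           # FPGA does not operate


# Guardband size as a fraction of the nominal voltage.
guardband_fraction = (V_NOM_MV - V_MIN_MV) / V_NOM_MV

# Power-efficiency gains quoted on slide 29.
GUARDBAND_GAIN = 2.6   # from eliminating the guardband
CRITICAL_EXTRA = 0.43  # "further 43%" in the critical region

# Assumption 1: the 43% compounds on top of the 2.6X gain.
compounded_gain = GUARDBAND_GAIN * (1 + CRITICAL_EXTRA)
# Assumption 2: the 43% is added relative to the nominal baseline.
additive_gain = GUARDBAND_GAIN + CRITICAL_EXTRA

print(f"guardband: {guardband_fraction:.1%} of nominal")
print(f"total gain: {compounded_gain:.2f}X (compounded), {additive_gain:.2f}X (additive)")
for mv in (850, 600, 550, 540):
    print(mv, classify_region(mv))
```

The guardband works out to roughly a third of the nominal voltage, matching the 33% quoted in the talk, and either reading of the 43% gives a total above 3X.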