The LEGaTO project has received funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 780681. www.legato-project.eu
Moldable pipelines for CNNs on heterogeneous edge devices
Pirah Noor Soomro, Chalmers University of Technology
A framework for efficient execution of CNNs on heterogeneous edge devices that contain different types of compute resources.
• We implement a short, guided online training phase to find a near-optimal configuration for a balanced pipeline.
• We design a simple, programmer-friendly interface and use the information provided through it to generate a balanced, high-throughput CNN pipeline.
Motivation
• Modern edge devices combine different types of cores on a single chip.
• Existing DNN libraries do not provide heterogeneity-aware implementations of CNNs for edge devices.
• Existing solutions [1,2] for CNN pipelines on edge devices require offline training followed by an exhaustive design space exploration (DSE).
Background
Edge device: NVIDIA Jetson TX2
4 energy-efficient cores (ARM Cortex-A57) and 2 high-performance cores (NVIDIA Denver)
Methodology
References
1. Wang, Siqi, et al. "High-throughput CNN inference on embedded ARM big.LITTLE multi-core processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2019).
2. Lu, Zongqing, et al. "Modeling the resource requirements of convolutional neural networks on mobile devices." Proceedings of the 25th ACM International Conference on Multimedia (2017).
Conclusion
• A balanced VGG pipeline increases throughput by 22% compared to the baseline (a data-parallel implementation of VGG-16 on the TX2).
• Computational hints provide a good seed for exploring near-optimal configurations.
• Our approach combines offline partitioning with online molding (changing the number of cores) of pipeline stages to generate a balanced pipeline.
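The online molding step can be pictured as a small feedback loop. The sketch below is hypothetical: the greedy rule, the `work / cores` timing model and the assumption of interchangeable cores are ours, not the framework's (on the TX2, Denver and A57 cores differ, and stage times would be measured rather than modelled). It moves one core at a time from a lightly loaded stage to the bottleneck stage for as long as that shortens the bottleneck.

```python
from __future__ import annotations


def stage_times(work: list[float], cores: list[int]) -> list[float]:
    # Hypothetical model: a stage's time per input scales as work / cores.
    return [w / c for w, c in zip(work, cores)]


def mold(work: list[float], cores: list[int], max_steps: int = 100) -> list[int]:
    """Greedily move one core at a time from a lightly loaded stage to the bottleneck."""
    cores = list(cores)
    for _ in range(max_steps):
        times = stage_times(work, cores)
        slow = max(range(len(times)), key=times.__getitem__)  # bottleneck stage
        # Only stages that keep at least one core after donating can give one away.
        donors = [i for i in range(len(times)) if i != slow and cores[i] > 1]
        if not donors:
            break
        fast = min(donors, key=times.__getitem__)
        trial = cores.copy()
        trial[fast] -= 1
        trial[slow] += 1
        if max(stage_times(work, trial)) >= times[slow]:
            break  # the move would not shorten the bottleneck; keep the current layout
        cores = trial
    return cores
```

For example, `mold([2.0, 8.0, 2.0], [2, 2, 2])` returns `[1, 4, 1]`, at which point all three modelled stage times are equal.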
Convolutional Neural Networks
• Consecutive layers of computationally intensive convolutional kernels.
• Each layer has a different computational complexity, represented by its input descriptors.
• The figure on the right represents the VGG-16 CNN.
• Widely used for classification of streaming input data.
• A pipelined implementation is favored for streaming input.
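Since each layer's cost is represented by its input descriptors, a per-layer estimate can be computed directly from them. This is a minimal sketch assuming a descriptor of (height, width, input channels, output channels, kernel size), stride 1 and "same" padding, as in VGG-16; the `ConvDescriptor` class and its multiply-accumulate (MAC) count are illustrative, not the framework's actual descriptor format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConvDescriptor:
    """Input descriptor of one conv layer (assumes stride 1 and 'same' padding)."""
    height: int
    width: int
    in_channels: int
    out_channels: int
    kernel: int = 3

    def macs(self) -> int:
        # Each output pixel of each output channel needs in_channels * k * k MACs;
        # with 'same' padding the output is height x width.
        return (self.height * self.width * self.out_channels
                * self.in_channels * self.kernel * self.kernel)


def vgg16_conv_descriptors(size: int = 224) -> list[ConvDescriptor]:
    """Descriptors of the 13 VGG-16 conv layers for a size x size RGB input."""
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]  # (channels, convs)
    layers: list[ConvDescriptor] = []
    in_channels = 3
    for out_channels, repeats in blocks:
        for _ in range(repeats):
            layers.append(ConvDescriptor(size, size, in_channels, out_channels))
            in_channels = out_channels
        size //= 2  # the 2x2 max-pool after each block halves height and width
    return layers


if __name__ == "__main__":
    for i, layer in enumerate(vgg16_conv_descriptors(), start=1):
        print(f"conv{i:02d}: {layer.macs() / 1e9:.2f} GMACs")
```

Under these assumptions conv01 needs about 0.09 GMACs while the heaviest layers need about 1.85 GMACs, a difference of more than 20x, so equal layer counts per stage do not imply equal work per stage.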
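[Methodology flowchart; recoverable labels: New device? | Generate computational hints for device | Generate pipeline stages | Run DSE using hints as seed | No / Yes | Pipeline balanced? | Proceed | No / Yes, interference detected | Performance degraded]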
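The flowchart's step of using computational hints as a seed for DSE could be realised in many ways; the poster does not specify one. As one hypothetical option, the sketch below cuts the layer sequence into contiguous stages so that each stage's share of the total estimated cost follows a per-stage capacity weight (for example, number of cores times an assumed relative core speed). `seed_partition` and its inputs are illustrative names.

```python
from __future__ import annotations


def seed_partition(layer_costs: list[float], stage_capacity: list[float]) -> list[int]:
    """Split layers into contiguous stages whose cost shares follow stage_capacity.

    Returns the number of layers assigned to each stage (every stage gets >= 1).
    """
    n, k = len(layer_costs), len(stage_capacity)
    if not 0 < k <= n:
        raise ValueError("need between 1 and len(layer_costs) stages")
    total_cost, total_cap = sum(layer_costs), sum(stage_capacity)
    counts: list[int] = []
    start, prefix, cap_so_far = 0, 0.0, 0.0
    for s in range(k - 1):
        cap_so_far += stage_capacity[s]
        target = total_cost * cap_so_far / total_cap  # ideal cumulative cost at this cut
        max_end = n - (k - 1 - s)  # leave at least one layer per remaining stage
        end = start + 1            # every stage gets at least one layer
        prefix += layer_costs[start]
        # Grow the stage while adding the next layer brings the cut closer to target.
        while end < max_end and abs(prefix + layer_costs[end] - target) < abs(prefix - target):
            prefix += layer_costs[end]
            end += 1
        counts.append(end - start)
        start = end
    counts.append(n - start)
    return counts
```

The returned layer counts would only be a starting point; the guided DSE would then measure neighbouring partitions on the device.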
[Plots: Figure 1 (execution time per input for configurations 1 to 4) and Figure 2 (computational intensity of pipeline stages PS1 to PS6 for configurations 1 to 4)]
Experiments and observations
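Configurations tested (nPS = number of pipeline stages; D = Denver core, A57 = Cortex-A57 core; brackets give the number of VGG-16 layers per stage):
1) 3PS: 2D-2A57-2A57 [7-7-7]
2) 6PS: 1D-1D-1A57-1A57-1A57-1A57 [4-4-4-4-3-2]
3) 3PS: 2D-2A57-2A57 [7-4-10]
4) 2PS: 4A57-2D [13-8]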
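The configuration labels can be read as: number of pipeline stages (nPS), the cores assigned to each stage (D for Denver, A57 for Cortex-A57) and, in brackets, the number of VGG-16 layers per stage. This reading is inferred from the Figure 3 caption and from every bracket summing to 21 layers. Under that assumption, a small hypothetical parser turns a label into per-stage layer ranges:

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cores: int        # number of cores assigned to the stage
    core_type: str    # "D" (Denver) or "A57" (Cortex-A57)
    first_layer: int  # 1-based, inclusive
    last_layer: int   # 1-based, inclusive


def parse_config(label: str) -> list[Stage]:
    """Parse a label such as '3PS: 2D-2A57-2A57 [7-4-10]' into pipeline stages."""
    m = re.fullmatch(r"\s*(\d+)PS:\s*([\w-]+)\s*\[([\d-]+)\]\s*", label)
    if m is None:
        raise ValueError(f"unrecognised configuration label: {label!r}")
    n_stages = int(m.group(1))
    placements = m.group(2).split("-")
    layer_counts = [int(x) for x in m.group(3).split("-")]
    if not len(placements) == len(layer_counts) == n_stages:
        raise ValueError("stage count does not match placements or layer counts")
    stages: list[Stage] = []
    first = 1
    for placement, count in zip(placements, layer_counts):
        pm = re.fullmatch(r"(\d+)(D|A57)", placement)
        if pm is None:
            raise ValueError(f"unrecognised core placement: {placement!r}")
        stages.append(Stage(int(pm.group(1)), pm.group(2), first, first + count - 1))
        first += count
    return stages
```

For configuration 3 this gives layers 1 to 7 on 2 Denver cores, 8 to 11 on 2 A57 cores and 12 to 21 on the other 2 A57 cores.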
Figure 4. Timeline of a 4-stage pipeline on a 20-core machine. The marked training phase represents trying various pipeline configurations to select the best one for a balanced pipeline.
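The training phase in Figure 4 tries several pipeline configurations and keeps the best one. A minimal sketch of that selection loop follows, assuming a hypothetical callback `process_stream(config, n)` that builds the pipeline for `config` and pushes `n` inputs through it. Timing a short stream, rather than a single input, means the overlap between stages is included in the measured time per input.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Hashable, Iterable


def select_configuration(
    candidates: Iterable[Hashable],
    process_stream: Callable[[Hashable, int], None],  # hypothetical pipeline runner
    inputs_per_trial: int = 8,
) -> tuple[Hashable, dict[Hashable, float]]:
    """Time each candidate on a short input stream; return the best and all timings."""
    time_per_input: dict[Hashable, float] = {}
    for config in candidates:
        start = time.perf_counter()
        process_stream(config, inputs_per_trial)  # run a short stream through this pipeline
        time_per_input[config] = (time.perf_counter() - start) / inputs_per_trial
    if not time_per_input:
        raise ValueError("no candidate configurations given")
    best = min(time_per_input, key=time_per_input.get)  # lowest time per input wins
    return best, time_per_input
```

`inputs_per_trial` trades training time against measurement noise.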
VGG pipelines on TX2
Four different pipeline configurations are tested on the TX2.
• Figure 1 shows that configuration 3 is the fastest.
• Figure 2 supports this observation: configuration 3 has the most balanced distribution of computation among its 3 pipeline stages.
• Figure 3 shows the pipeline timelines: configurations 1, 2 and 4 are imbalanced, while configuration 3 yields a comparatively balanced pipeline.
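These observations follow from how a pipeline behaves in steady state: one result leaves the pipeline per slowest-stage time, so faster stages sit idle and their cores are wasted. The sketch computes this from per-stage times; any stage times fed to it here are hypothetical, not measurements from Figure 1 or 3.

```python
from __future__ import annotations


def pipeline_metrics(stage_times: list[float]) -> dict[str, float]:
    """Steady-state metrics of a linear pipeline from per-stage times (s per input)."""
    if not stage_times:
        raise ValueError("need at least one stage")
    bottleneck = max(stage_times)             # the slowest stage sets the pace
    mean = sum(stage_times) / len(stage_times)
    return {
        "time_per_input": bottleneck,         # one result leaves every `bottleneck` seconds
        "throughput": 1.0 / bottleneck,       # inputs per second
        "imbalance": bottleneck / mean,       # 1.0 means perfectly balanced
    }
```

With stage times of 1, 4 and 1 s the pipeline delivers one input every 4 s even though the mean stage time is 2 s; redistributing the same work as 2, 2 and 2 s would reach 2 s per input.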
[Diagram: TX2 cores C0 to C5, each with private L1I and L1D caches; one L2 cache is shared by C0 to C3 (4 A57s) and another by C4 and C5 (2 Denvers)]
Network description in template language:
    main() {
        …
        Conv1 = CONV(ip, op, weights);
        Conv2 = CONV(Conv1, op, weights);
        …
        network.add(Conv1);
        network.add(Conv2);
        …
        network.execute();
    }
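To illustrate what `network.execute()` has to do with the layers registered through `network.add()`, here is a hypothetical Python stand-in. The class mirrors only the names in the template; the `layers_per_stage` argument, the thread-per-stage design and the queue-based hand-off are assumptions. Core pinning is not shown, and pure-Python layers will not run in parallel because of the GIL; layers backed by libraries that release the GIL during heavy kernels could.

```python
from __future__ import annotations

import queue
import threading
from typing import Any, Callable

Layer = Callable[[Any], Any]  # hypothetical: a layer maps one activation to the next


class Network:
    """Hypothetical Python stand-in for the template interface (not the framework's API)."""

    def __init__(self) -> None:
        self.layers: list[Layer] = []

    def add(self, layer: Layer) -> None:
        self.layers.append(layer)

    def execute(self, inputs: list[Any], layers_per_stage: list[int]) -> list[Any]:
        """Stream inputs through the layers as a pipeline with one thread per stage."""
        if sum(layers_per_stage) != len(self.layers) or min(layers_per_stage) < 1:
            raise ValueError("each stage needs >= 1 layer and all layers must be covered")
        bounds, start = [], 0
        for count in layers_per_stage:
            bounds.append((start, start + count))
            start += count
        # queues[i] feeds stage i; the last queue collects the results.
        queues = [queue.Queue() for _ in range(len(bounds) + 1)]
        done = object()  # sentinel marking the end of the input stream

        def run_stage(lo: int, hi: int, q_in: queue.Queue, q_out: queue.Queue) -> None:
            while True:
                item = q_in.get()
                if item is done:
                    q_out.put(done)  # propagate end-of-stream to the next stage
                    return
                for layer in self.layers[lo:hi]:
                    item = layer(item)
                q_out.put(item)

        threads = [
            threading.Thread(target=run_stage, args=(lo, hi, queues[i], queues[i + 1]))
            for i, (lo, hi) in enumerate(bounds)
        ]
        for t in threads:
            t.start()
        for x in inputs:
            queues[0].put(x)
        queues[0].put(done)
        results = []
        while True:
            item = queues[-1].get()
            if item is done:
                break
            results.append(item)
        for t in threads:
            t.join()
        return results


if __name__ == "__main__":
    net = Network()
    net.add(lambda x: x + 1)  # stand-ins for CONV layers
    net.add(lambda x: x * 2)
    net.add(lambda x: x - 3)
    print(net.execute([1, 2, 3], layers_per_stage=[2, 1]))  # prints [1, 3, 5]
```

Because every stage has a single worker and FIFO queues, results come out in input order.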
Figure 1. Execution time (s) per input of 4 different VGG pipeline configurations (lower is better). The baseline is a data-parallel implementation of VGG-16 on the TX2.
Figure 2. Distribution of computational load among pipeline stages (PS). The numbers are derived from the network's input descriptors.
Figure 3. Timelines of the VGG pipelines. Labels read as follows, e.g. 1): a 3-stage pipeline whose first stage is scheduled on 2 Denver cores, second stage on 2 A57 cores, and third stage on the other 2 A57 cores. Configuration 3 is the most balanced of the four configurations.
[Diagram: with kernel-level parallelism on TX2 cores C0 to C5 (A57 and Denver), layers are executed one after another; in a 2-stage pipeline on the TX2, one stage runs layers 1-10 and the other runs layers 11-21, each processing inputs 1, 2 and 3 in turn]
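In the 2-stage pipeline, inputs overlap: while the second stage runs layers 11-21 on one input, the first stage can already run layers 1-10 on the next. The sketch below computes when each stage finishes each input, assuming one worker per stage, unbounded buffers between stages and a fixed (hypothetical) time per input in each stage.

```python
from __future__ import annotations


def pipeline_timeline(stage_times: list[float], n_inputs: int) -> list[list[float]]:
    """finish[s][i] is the time at which stage s finishes input i.

    Assumes one worker per stage, unbounded buffers between stages and a
    fixed time per input in each stage.
    """
    finish = [[0.0] * n_inputs for _ in stage_times]
    for s, t in enumerate(stage_times):
        for i in range(n_inputs):
            ready = finish[s - 1][i] if s > 0 else 0.0  # input i has left the previous stage
            free = finish[s][i - 1] if i > 0 else 0.0   # stage s has finished input i - 1
            finish[s][i] = max(ready, free) + t
    return finish
```

The last input finishes at the sum of the stage times plus (n - 1) times the slowest stage time, so over a long stream the time per input approaches that of the slowest stage, which is why balancing the stages matters.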
[Figure: VGG-16 layer stack: Conv 64, Conv 64, Maxpool, Conv 128, Conv 128, Maxpool, Conv 256 (x3), Maxpool, Conv 512 (x3), Maxpool, Conv 512 (x3), Maxpool, FC (x3); followed by repeated partial copies of the stack (pipeline illustration)]

ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
IvanMallco1
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 

Recently uploaded (20)

Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
Lab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerinLab report on liquid viscosity of glycerin
Lab report on liquid viscosity of glycerin
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Seminar of U.V. Spectroscopy by SAMIR PANDA
 Seminar of U.V. Spectroscopy by SAMIR PANDA Seminar of U.V. Spectroscopy by SAMIR PANDA
Seminar of U.V. Spectroscopy by SAMIR PANDA
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdfSCHIZOPHRENIA Disorder/ Brain Disorder.pdf
SCHIZOPHRENIA Disorder/ Brain Disorder.pdf
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
filosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptxfilosofia boliviana introducción jsjdjd.pptx
filosofia boliviana introducción jsjdjd.pptx
 
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptxBody fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 

Moldable pipelines for CNNs on heterogeneous edge devices

The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 780681. www.legato-project.eu

Pirah Noor Soomro, Chalmers University of Technology

A framework for efficient execution of CNNs on heterogeneous edge devices that contain different types of compute resources.
• We implement a brief, guided online training phase to find a near-optimal configuration for a balanced pipeline.
• We designed a simple, programmer-friendly interface and use the information provided through it to generate balanced, high-throughput CNN pipelines.

Motivation
• Modern edge devices integrate different types of cores on a single chip.
• Existing DNN libraries do not provide heterogeneity-aware CNN implementations for edge devices.
• Existing solutions [1,2] for CNN pipelines on edge devices require offline training followed by an exhaustive design space exploration (DSE).

References
1. Wang, Siqi, et al. "High-throughput CNN inference on embedded ARM big.LITTLE multi-core processors." IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2019).
2. Lu, Zongqing, et al. "Modeling the resource requirements of convolutional neural networks on mobile devices." Proceedings of the 25th ACM International Conference on Multimedia (2017).

Conclusion
• A balanced VGG pipeline increases throughput by 22% compared to the baseline.
• Computational hints provide a good seed for exploring near-optimal configurations.
• Our approach combines offline partitioning with online molding (changing the number of cores) of pipeline stages to produce a balanced pipeline.

Background
Edge device: NVIDIA Jetson TX2, with 4 energy-efficient cores (ARM Cortex-A57) and 2 high-performance cores (NVIDIA Denver).

Convolutional Neural Networks
• Consecutive layers of computationally intensive convolutional kernels.
• Each layer has a different computational complexity, characterized by its input descriptors.
• The figure shows the VGG-16 CNN.
• CNNs are widely used for classification of streaming input data.
• A pipelined implementation is favored for streaming input.

[Figure: VGG-16 layers: Conv 64, Conv 64, Maxpool, Conv 128, Conv 128, Maxpool, Conv 256, Conv 256, Conv 256, Maxpool, Conv 512, Conv 512, Conv 512, Maxpool, Conv 512, Conv 512, Conv 512, Maxpool, FC, FC, FC]

Methodology
[Flowchart: Network description → Generate computational hints from description → Generate pipeline stages → Run DSE using hints as seed → Pipeline balanced? (No / Yes) → Proceed → Interference detected, performance degraded? (No / Yes)]

Network description in template language:

    main(){
      …
      Conv1 = CONV(ip, op, weights);
      Conv2 = CONV(Conv1, op, weights);
      …
      network.add(Conv1);
      network.add(Conv2);
      …
      network.execute();
    }

[Diagram: Jetson TX2 cores. 4 A57s (C0-C3), each with private L1I and L1D caches and a shared L2; 2 Denvers (C4, C5), each with private L1I and L1D caches and a shared L2.]

[Diagram: kernel-level parallelism on TX2, where all cores (A57 C0-C3, Denver C4-C5) execute the layers one after another, versus a 2-stage pipeline on TX2, where layers 1-10 and layers 11-21 form two stages that process inputs 1, 2, 3 in an overlapped fashion.]

Experiments and observations
VGG pipelines on TX2
Four different pipeline configurations are tested on TX2. Each is written as the number of pipeline stages (PS): cores per stage (D = Denver, A57 = Cortex-A57) [layers per stage]:
1) 3PS: 2D-2A57-2A57 [7-7-7]
2) 6PS: 1D-1D-1A57-1A57-1A57-1A57 [4-4-4-4-3-2]
3) 3PS: 2D-2A57-2A57 [7-4-10]
4) 2PS: 4A57-2D [13-8]
• Figure 1 shows that configuration 3 is the fastest.
• Figure 2 supports this observation: configuration 3 has the most balanced distribution of computation among its 3 pipeline stages.
• Figure 3 shows the pipeline timelines: configurations 1, 2, and 4 are imbalanced, while configuration 3 yields a comparatively balanced pipeline.

Figure 1. Execution time (s) per input for 4 different VGG pipeline configurations (lower is better). The baseline is a data-parallel implementation of VGG-16 on TX2. [Plot: execution time per input for configurations 1-4]
Figure 2. Distribution of computational load among pipeline stages (PS). The numbers are derived from the network's input descriptors. [Plot: computational intensity of stages PS1-PS6 for configurations 1-4]
Figure 3. Timeline of the VGG pipelines. Configuration labels read as follows: 1) a 3-stage pipeline whose first stage is scheduled on 2 Denver cores, second stage on 2 A57 cores, and third stage on the other 2 A57 cores. Configuration 3 is the most balanced of the four configurations.
Figure 4. Timeline of a 4-stage pipeline on a 20-core machine. The training phase tries various pipeline configurations and selects the best one for a balanced pipeline.
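The per-stage loads in Figure 2 are derived from the network's input descriptors. A minimal sketch of such computational hints could look like the following, assuming the standard VGG-16 layer dimensions with a 224x224 input and using multiply-accumulate (MAC) counts as the cost measure. The names `Layer`, `vgg16_layers`, `macs`, `per_core_loads` and `imbalance` are hypothetical and are not the framework's interface. Pooling cost is ignored, and Denver and A57 cores are treated as equally fast, which real hints would need to refine.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Layer:
    """Hypothetical input descriptor for one layer."""
    kind: str      # "conv", "pool" or "fc"
    in_ch: int     # input channels (input features for "fc")
    out_ch: int    # output channels (output features for "fc")
    size: int = 0  # output height/width for "conv" ("same" padding, stride 1)
    k: int = 3     # kernel size for "conv"


def vgg16_layers(img: int = 224) -> List[Layer]:
    """Standard VGG-16: 13 conv + 5 maxpool + 3 fc = 21 layers."""
    layers: List[Layer] = []
    ch, size = 3, img
    for width, reps in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
        for _ in range(reps):
            layers.append(Layer("conv", ch, width, size))
            ch = width
        layers.append(Layer("pool", ch, ch))
        size //= 2  # each 2x2 maxpool halves the spatial size
    layers += [Layer("fc", ch * size * size, 4096),
               Layer("fc", 4096, 4096),
               Layer("fc", 4096, 1000)]
    return layers


def macs(layer: Layer) -> int:
    """Computational hint: multiply-accumulates per input image (pooling counted as 0)."""
    if layer.kind == "conv":
        return layer.size * layer.size * layer.out_ch * layer.in_ch * layer.k * layer.k
    if layer.kind == "fc":
        return layer.in_ch * layer.out_ch
    return 0


def per_core_loads(layers: Sequence[Layer], stage_sizes: Sequence[int],
                   cores: Sequence[int]) -> List[float]:
    """Splits layers into consecutive stages; returns each stage's MACs divided by its core count."""
    if sum(stage_sizes) != len(layers):
        raise ValueError("stage sizes must cover every layer exactly once")
    loads, start = [], 0
    for n, c in zip(stage_sizes, cores):
        loads.append(sum(macs(layer) for layer in layers[start:start + n]) / c)
        start += n
    return loads


def imbalance(loads: Sequence[float]) -> float:
    """Max/mean ratio of stage loads; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Configurations 1-4 from the poster: (layers per stage, cores per stage).
CONFIGS: Dict[int, Tuple[List[int], List[int]]] = {
    1: ([7, 7, 7], [2, 2, 2]),
    2: ([4, 4, 4, 4, 3, 2], [1, 1, 1, 1, 1, 1]),
    3: ([7, 4, 10], [2, 2, 2]),
    4: ([13, 8], [4, 2]),
}

if __name__ == "__main__":
    net = vgg16_layers()
    for cid, (sizes, cores) in CONFIGS.items():
        loads = per_core_loads(net, sizes, cores)
        print(f"config {cid}: GMACs per core {[round(x / 1e9, 2) for x in loads]}, "
              f"imbalance {imbalance(loads):.2f}")
```

Under this simplified model, configuration 3 has the lowest max/mean ratio of per-core load, while the other three configurations concentrate most of the work in a single stage. This ranking is consistent with the observation from Figures 1 and 2.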
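Online molding changes the number of cores per stage after the offline partition is fixed. The following is a hypothetical illustration of that idea, not the authors' algorithm. `seed_cores` gives each stage one core, then repeatedly assigns the next spare core to the stage with the largest hinted load per core. `mold` then runs a greedy hill climb that moves one core at a time between stages for as long as the measured bottleneck time keeps improving. The `measure` callback is an assumed stand-in for timing real stages on the device. Cores are also treated as interchangeable, which a real TX2 implementation (A57 vs. Denver) could not do.

```python
from itertools import permutations
from typing import Callable, List, Sequence

StageTimer = Callable[[List[int]], List[float]]


def seed_cores(loads: Sequence[float], total_cores: int) -> List[int]:
    """Offline seed from hints: one core per stage, then each spare core goes to the
    stage with the largest hinted load per core."""
    if total_cores < len(loads):
        raise ValueError("need at least one core per stage")
    cores = [1] * len(loads)
    for _ in range(total_cores - len(loads)):
        worst = max(range(len(loads)), key=lambda i: loads[i] / cores[i])
        cores[worst] += 1
    return cores


def mold(cores: Sequence[int], measure: StageTimer, max_trials: int = 50) -> List[int]:
    """Greedy hill climb: move one core between two stages whenever that lowers the
    measured bottleneck (slowest stage) time; stop when no single move helps
    or the trial budget is spent."""
    best = list(cores)
    best_time = max(measure(best))
    trials = 0
    while trials < max_trials:
        improved = False
        for src, dst in permutations(range(len(best)), 2):
            if best[src] == 1 or trials >= max_trials:
                continue  # every stage keeps at least one core
            candidate = list(best)
            candidate[src] -= 1
            candidate[dst] += 1
            trials += 1
            t = max(measure(candidate))
            if t < best_time:
                best, best_time, improved = candidate, t, True
                break  # restart the neighbor scan from the new configuration
        if not improved:
            break
    return best


if __name__ == "__main__":
    loads = [5.64, 4.62, 5.21]  # illustrative hinted work per stage (arbitrary units)
    speed = [1.0, 1.0, 0.5]     # hypothetical: stage 3 runs at half speed on the device

    def measure(cores: List[int]) -> List[float]:
        # Stand-in for timing each pipeline stage on real hardware.
        return [w / (c * s) for w, c, s in zip(loads, cores, speed)]

    seed = seed_cores(loads, total_cores=6)
    print("seed:", seed, "molded:", mold(seed, measure))
```

In this toy run the hint-based seed is [2, 2, 2]. The hypothetical measurement makes stage 3 twice as slow as its hint predicts, so molding moves one core from stage 2 to stage 3, which gives [2, 1, 3].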