SlideShare a Scribd company logo
1 of 18
Download to read offline
Transfer Learning for Performance
Modeling of Deep Neural Network
Systems
1
Md Shahriar Iqbal Lars Kotthoff Pooyan Jamshidi
Performance can change across systems
Can we expect similar performance in new environment? 2
Environment 1.0
Dev/Staging
Environment 2.0
Production
Push Performance
- Inference time
- Energy consumption
Case study: Performance behavior changes across
environments and so does optimal configuration
Default is
2.8 times bad
Default is
2.1 times bad
Default is
2.6 times bad
Default is
5.1 times bad
optimal
Optimal in
Environment 1.0
is 1.9 times bad
Optimal in
Environment 1.0
is 4.9 times bad
optimal
3
How can the developer tune parameters for energy
consumption during source code development?
DNN system stack allows developers to tune
parameters for energy consumption
core status, core frequency,
gpu status, gpu frequency,
memory controller frequency etc
4
Network
Level
Model
Compiler
Level
Deployment
Level
Hardware/OS
Level
2100
configurations possible
>100
DNN Systems are highly configurable
Building configuration space of DNN systems
5
Hardware/OS
Level
Core
frequency
Core
status
gpu
status
gpu
frequency
Core
status
Inference time
Energy consumption
A typical approach to understand system
behavior is sensitivity analysis
C O1
xO2
xO3
xO4....
xO20
c1
0x0x0x0....x1 y1
=f(c1
)
c2
0x0x1x0....x1 y2
=f(c2
)
c3
1x0x0x0....x0 y3
=f(c3
)
. . . . . . . f*
->f
cn
1x1x1x0....x1 yn
=f(cn
)
Training Set
learn
6
Performance models could be in any form of
black box models
C O1
xO2
xO3
xO4....
xO20
c1
0x0x0x0....x1 y1
=f(c1
)
c2
0x0x1x0....x1 y2
=f(c2
)
c3
1x0x0x0....x0 y3
=f(c3
)
. . . . . . . f*
->f
cn
1x1x1x0....x1 yn
=f(cn
)
Training Set
7
Evaluating a performance model
C O1
xO2
xO3
xO4....
xO20
c1
0x0x0x0....x1 y1
=f(c1
)
c2
0x0x1x0....x1 y2
=f(c2
)
c3
1x0x0x0....x0 y3
=f(c3
) MAPE(f*
,f)=
. . . . . . . f*
->f
cn
1x1x1x0....x1 yn
=f(cn
)
Training Set
8
A Performance Model contains useful
information about option interactions
f:C->R
f(.)=1.2+3O1
+5O2
+7O7
+8O1
O2
-3.4O1
O7
+5O3
O7
+11.2O1
O2
O7
9
What are the roadblocks for performance
modeling of DNN systems?
● Configuration space grows exponentially with addition of new options.
● Performance measurements are costly.
220
=1048576 configurations takes nearly 2 years for measurement considering
only 60s on average per configuration.
Transfer learning can help to reuse the cheap measurements
from old environments to new environments to quickly learn
system behavior using performance modeling. 10
Direct Model
Transfer
Learning
Model learned in source
environment is used directly
in target environment
11
s fs
(s)
t fs
(t)
Experimentation Cost: 0
Linear/Non-Linear
Model Shift
Transfer Learning
Linear and non-linear model
shift are learned using
source model (all samples)
and target model (random
sub-samples).
Experimentation Cost: 0
s fs
(s)
t*
ft
(t*
)
f*
Experimentation Cost: m(t*
)
12
Guided Sampling
Transfer
Learning
Influential configurations
are sampled in target
environment by studying
options interactions from
source environment to build
performance model.
s fs
(s)
t*
ft
(t*
)
Experimentation Cost: m(t*
)
13
Experimental Setup
Measurement
Analysis
Select
Environment Generate
Configuration
Inference
Compute
Inference Time
Compute Energy
Consumption
Extract
knowledge to
sample
configuration
space from
Source
Measurements
Transfer
Knowledge
Build Performance
Model in Target
Compute Prediction
Error
fs
(s)=a1
O1
+a2
O3
+a3
O7
+a4
O1
O3
+a5
O1
O7
+..+a7
O1
O3
O
14
Results: Inference Time
Cost Reduction of Error
Direct Model Transfer 0 1
Linear Model Shift 2.48 hours (0.15%) 3.48
Non-Linear Model Shift 2.48 hours (0.15%) 5.76
Guided Sampling 2.48 hours (0.15%) 22.93 15
Results: Energy Consumption
Cost Reduction of Error
Direct Model Transfer 0 1
Linear Model Shift 2.48 hours (0.15%) 7.19
Non-Linear Model Shift 2.48 hours (0.15%) 16.31
Guided sampling 2.48 hours (0.15%) 42.85 16
Useful tool for practitioners
● To build performance models in new environments using
only 2.44% of the configuration space.
● To avoid invalid configurations to quickly find optimal
configurations.
● To learn the performance landscape of a system for
performance debugging e.g., performance regression etc.
17
Summary
18

More Related Content

What's hot

Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 
High Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud TechnologiesHigh Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud Technologiesjaliyae
 
Task scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingTask scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingRamandeep Kaur
 
Room Acoustics Simulation Using GPU
Room Acoustics Simulation Using GPURoom Acoustics Simulation Using GPU
Room Acoustics Simulation Using GPUChirag Batra
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...Rafael Ferreira da Silva
 
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHM
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHMJOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHM
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHMmailjkb
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architecturesIslam Samir
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsDon William
 
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...Journal For Research
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy Ehsan Sharifi
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwarePooyan Jamshidi
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Rafael Ferreira da Silva
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersCleverence Kombe
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Naoki Shibata
 
Cloud Partitioning of Load Balancing Using Round Robin Model
Cloud Partitioning of Load Balancing Using Round Robin ModelCloud Partitioning of Load Balancing Using Round Robin Model
Cloud Partitioning of Load Balancing Using Round Robin ModelIJCERT
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPIJSRED
 
Multi-phase-field simulations with OpenPhase
Multi-phase-field simulations with OpenPhaseMulti-phase-field simulations with OpenPhase
Multi-phase-field simulations with OpenPhasePFHub PFHub
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersAshraf Uddin
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...Soumya Banerjee
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersAbolfazl Asudeh
 

What's hot (20)

Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 
High Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud TechnologiesHigh Performance Parallel Computing with Clouds and Cloud Technologies
High Performance Parallel Computing with Clouds and Cloud Technologies
 
Task scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud ComputingTask scheduling Survey in Cloud Computing
Task scheduling Survey in Cloud Computing
 
Room Acoustics Simulation Using GPU
Room Acoustics Simulation Using GPURoom Acoustics Simulation Using GPU
Room Acoustics Simulation Using GPU
 
A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...A science-gateway for workflow executions: online and non-clairvoyant self-h...
A science-gateway for workflow executions: online and non-clairvoyant self-h...
 
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHM
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHMJOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHM
JOB SCHEDULING USING ANT COLONY OPTIMIZATION ALGORITHM
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures
 
Task allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessorsTask allocation and scheduling inmultiprocessors
Task allocation and scheduling inmultiprocessors
 
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...
TASK SCHEDULING USING AMALGAMATION OF MET HEURISTICS SWARM OPTIMIZATION ALGOR...
 
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy A Study on Task Scheduling in Could Data Centers for Energy Efficacy
A Study on Task Scheduling in Could Data Centers for Energy Efficacy
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
 
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
Characterizing a High Throughput Computing Workload: The Compact Muon Solenoi...
 
Map reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clustersMap reduce - simplified data processing on large clusters
Map reduce - simplified data processing on large clusters
 
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost an...
 
Cloud Partitioning of Load Balancing Using Round Robin Model
Cloud Partitioning of Load Balancing Using Round Robin ModelCloud Partitioning of Load Balancing Using Round Robin Model
Cloud Partitioning of Load Balancing Using Round Robin Model
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Multi-phase-field simulations with OpenPhase
Multi-phase-field simulations with OpenPhaseMulti-phase-field simulations with OpenPhase
Multi-phase-field simulations with OpenPhase
 
MapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large ClustersMapReduce: Simplified Data Processing on Large Clusters
MapReduce: Simplified Data Processing on Large Clusters
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
 
MapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large ClustersMapReduce : Simplified Data Processing on Large Clusters
MapReduce : Simplified Data Processing on Large Clusters
 

Similar to Opml 19-presentation-pdf

AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesIan Foster
 
Scheduling in next generation os
Scheduling in next generation osScheduling in next generation os
Scheduling in next generation osSantosh Nage
 
Run-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsNECST Lab @ Politecnico di Milano
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference acceleratorsDarshanG13
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...Naoki Shibata
 
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen HypervisorMatteo Ferroni
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Comparison of Open Source Virtualization Technology
Comparison of Open Source Virtualization TechnologyComparison of Open Source Virtualization Technology
Comparison of Open Source Virtualization TechnologyBenoit des Ligneris
 
Computer Network Performance evaluation based on Network scalability using OM...
Computer Network Performance evaluation based on Network scalability using OM...Computer Network Performance evaluation based on Network scalability using OM...
Computer Network Performance evaluation based on Network scalability using OM...Jaipal Dhobale
 
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...Tulipp. Eu
 
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...Mark J. Feldman
 
Optimal Decision Policy for Deteriorating Systems Under Variable Operating C...
Optimal Decision Policy for  Deteriorating Systems Under Variable Operating C...Optimal Decision Policy for  Deteriorating Systems Under Variable Operating C...
Optimal Decision Policy for Deteriorating Systems Under Variable Operating C...Dinesh Rajapaksha
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIlya Kryukov
 
The Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionThe Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionGreenLabAtDI
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsRuhaim Izmeth
 
Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsMohammad Jafar Mashhadi
 

Similar to Opml 19-presentation-pdf (20)

AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
RAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme ScalesRAMSES: Robust Analytic Models for Science at Extreme Scales
RAMSES: Robust Analytic Models for Science at Extreme Scales
 
Scheduling in next generation os
Scheduling in next generation osScheduling in next generation os
Scheduling in next generation os
 
Run-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environmentsRun-time power management in cloud and containerized environments
Run-time power management in cloud and containerized environments
 
Inference accelerators
Inference acceleratorsInference accelerators
Inference accelerators
 
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
(Slides) A Method for Distributed Computaion of Semi-Optimal Multicast Tree i...
 
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
[EWiLi2016] Enabling power-awareness for the Xen Hypervisor
 
Green scheduling
Green schedulingGreen scheduling
Green scheduling
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Comparison of Open Source Virtualization Technology
Comparison of Open Source Virtualization TechnologyComparison of Open Source Virtualization Technology
Comparison of Open Source Virtualization Technology
 
Computer Network Performance evaluation based on Network scalability using OM...
Computer Network Performance evaluation based on Network scalability using OM...Computer Network Performance evaluation based on Network scalability using OM...
Computer Network Performance evaluation based on Network scalability using OM...
 
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
Quantifying Energy Consumption for Practical Fork-Join Parallelism on an Embe...
 
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...Cruz:Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Oper...
 
Presentation
PresentationPresentation
Presentation
 
Unit 3 part2
Unit 3 part2Unit 3 part2
Unit 3 part2
 
Optimal Decision Policy for Deteriorating Systems Under Variable Operating C...
Optimal Decision Policy for  Deteriorating Systems Under Variable Operating C...Optimal Decision Policy for  Deteriorating Systems Under Variable Operating C...
Optimal Decision Policy for Deteriorating Systems Under Variable Operating C...
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver Library
 
The Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy ConsumptionThe Influence of the Java Collection Framework on Overall Energy Consumption
The Influence of the Java Collection Framework on Overall Energy Consumption
 
Fast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating SystemsFast switching of threads between cores - Advanced Operating Systems
Fast switching of threads between cores - Advanced Operating Systems
 
Black-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software SystemsBlack-box Behavioral Model Inference for Autopilot Software Systems
Black-box Behavioral Model Inference for Autopilot Software Systems
 

Recently uploaded

办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 

Recently uploaded (20)

办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

Opml 19-presentation-pdf

  • 1. Transfer Learning for Performance Modeling of Deep Neural Network Systems 1 Md Shahriar Iqbal Lars Kotthoff Pooyan Jamshidi
  • 2. Performance can change across systems Can we expect similar performance in new environment? 2 Environment 1.0 Dev/Staging Environment 2.0 Production Push Performance - Inference time - Energy consumption
  • 3. Case study: Performance behavior changes across environments and so does optimal configuration Default is 2.8 times bad Default is 2.1 times bad Default is 2.6 times bad Default is 5.1 times bad optimal Optimal in Environment 1.0 is 1.9 times bad Optimal in Environment 1.0 is 4.9 times bad optimal 3 How can the developer tune parameters for energy consumption during source code development?
  • 4. DNN system stack allows developers to tune parameters for energy consumption core status, core frequency, gpu status, gpu frequency, memory controller frequency etc 4 Network Level Model Compiler Level Deployment Level Hardware/OS Level 2100 configurations possible >100 DNN Systems are highly configurable
  • 5. Building configuration space of DNN systems 5 Hardware/OS Level Core frequency Core status gpu status gpu frequency Core status Inference time Energy consumption
  • 6. A typical approach to understand system behavior is sensitivity analysis C O1 xO2 xO3 xO4.... xO20 c1 0x0x0x0....x1 y1 =f(c1 ) c2 0x0x1x0....x1 y2 =f(c2 ) c3 1x0x0x0....x0 y3 =f(c3 ) . . . . . . . f* ->f cn 1x1x1x0....x1 yn =f(cn ) Training Set learn 6
  • 7. Performance models could be in any form of black box models C O1 xO2 xO3 xO4.... xO20 c1 0x0x0x0....x1 y1 =f(c1 ) c2 0x0x1x0....x1 y2 =f(c2 ) c3 1x0x0x0....x0 y3 =f(c3 ) . . . . . . . f* ->f cn 1x1x1x0....x1 yn =f(cn ) Training Set 7
  • 8. Evaluating a performance model C O1 xO2 xO3 xO4.... xO20 c1 0x0x0x0....x1 y1 =f(c1 ) c2 0x0x1x0....x1 y2 =f(c2 ) c3 1x0x0x0....x0 y3 =f(c3 ) MAPE(f* ,f)= . . . . . . . f* ->f cn 1x1x1x0....x1 yn =f(cn ) Training Set 8
  • 9. A Performance Model contains useful information about option interactions f:C->R f(.)=1.2+3O1 +5O2 +7O7 +8O1 O2 -3.4O1 O7 +5O3 O7 +11.2O1 O2 O7 9
  • 10. What are the roadblocks for performance modeling of DNN systems? ● Configuration space grows exponentially with addition of new options. ● Performance measurements are costly. 220 =1048576 configurations takes nearly 2 years for measurement considering only 60s on average per configuration. Transfer learning can help to reuse the cheap measurements from old environments to new environments to quickly learn system behavior using performance modeling. 10
  • 11. Direct Model Transfer Learning Model learned in source environment is used directly in target environment 11 s fs (s) t fs (t) Experimentation Cost: 0
  • 12. Linear/Non-Linear Model Shift Transfer Learning Linear and non-linear model shift are learned using source model (all samples) and target model (random sub-samples). Experimentation Cost: 0 s fs (s) t* ft (t* ) f* Experimentation Cost: m(t* ) 12
  • 13. Guided Sampling Transfer Learning Influential configurations are sampled in target environment by studying options interactions from source environment to build performance model. s fs (s) t* ft (t* ) Experimentation Cost: m(t* ) 13
  • 14. Experimental Setup Measurement Analysis Select Environment Generate Configuration Inference Compute Inference Time Compute Energy Consumption Extract knowledge to sample configuration space from Source Measurements Transfer Knowledge Build Performance Model in Target Compute Prediction Error fs (s)=a1 O1 +a2 O3 +a3 O7 +a4 O1 O3 +a5 O1 O7 +..+a7 O1 O3 O 14
  • 15. Results: Inference Time Cost Reduction of Error Direct Model Transfer 0 1 Linear Model Shift 2.48 hours (0.15%) 3.48 Non-Linear Model Shift 2.48 hours (0.15%) 5.76 Guided Sampling 2.48 hours (0.15%) 22.93 15
  • 16. Results: Energy Consumption Cost Reduction of Error Direct Model Transfer 0 1 Linear Model Shift 2.48 hours (0.15%) 7.19 Non-Linear Model Shift 2.48 hours (0.15%) 16.31 Guided sampling 2.48 hours (0.15%) 42.85 16
  • 17. Useful tool for practitioners ● To build performance models in new environments using only 2.44% of the configuration space. ● To avoid invalid configurations to quickly find optimal configurations. ● To learn the performance landscape of a system for performance debugging e.g., performance regression etc. 17