A Survey of Model Compression Methods
Rustem Feyzkhanov
Staff Machine Learning Engineer
Instrumental
Agenda
• Cloud-to-edge ML pipeline and optimizations
• Model serialization
• Pruning
• Quantization
• Weight clustering
• Knowledge distillation
• Architecture search
• Summary
© 2023 Instrumental
Challenges with ML on Edge
• Latency
• Memory usage
• Energy usage
• Disk usage
Cloud-to-edge ML pipeline
Pipeline stages: Data gathering → Architecture search → Model training → Model compilation → Deployment and model inference
• Data acquisition
• Data labeling
• Feature selection
• Model architecture
• Cloud training pipeline
• Parameter optimizations
• Serialization
• Post training optimizations
• Compiling model in cloud for hardware
• Serving model to the edge
• Initializing model
• Making predictions
Optimizations in ML pipeline
[Figure: pipeline stages from data gathering to deployment and model inference, annotated “Easier to implement/add” and “Affects accuracy less”]
Model serialization
Inference frameworks and model serialization
• Hardware-specific optimizations
• Framework built-in native model compression methods
• Significantly affects speed
Inference frameworks
Frameworks focused on running inference. They are lightweight, often target specific hardware, and require a separate serialization step.
• Hardware agnostic: e.g., TorchScript, ONNX, TFLite*
• Hardware specific:
  • CPU: e.g., OpenVINO (Intel)
  • GPU: e.g., TensorRT
  • Mobile: e.g., CoreML
  • NPU: Check out Embedded Vision Alliance members :-)
Inference frameworks - example
• Object detection - YOLOv5s
Framework | Size, MB | CPU inference, ms | GPU (V100) inference, ms
PyTorch | 29.5 | 127.61 | 10.19
TorchScript | 29.4 | 131.23 | 6.85
TensorRT | 33.3 | N/A | 1.89
ONNX | 29.3 | 69.34 | 14.63
OpenVINO | 29.3 | 66.52 | N/A
TFLite | 29.0 | 316.61 | N/A
From https://docs.ultralytics.com/yolov5/tutorials/model_export/
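Latency figures like the ones above depend heavily on hardware, batch size, and runtime version, so they are worth re-measuring on the target device. A minimal sketch of a timing harness follows. It assumes the exported model is wrapped in a `predict` callable; `measure_latency_ms` and `fake_predict` are hypothetical names, and `fake_predict` is only a stand-in, not a real model.

```python
import statistics
import time


def measure_latency_ms(predict, sample, warmup=5, runs=30):
    """Median wall-clock latency of predict(sample), in milliseconds."""
    for _ in range(warmup):          # warm-up calls are not timed
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fake_predict(sample):
    """Hypothetical stand-in for a real model's inference call."""
    return sum(x * x for x in sample)


latency = measure_latency_ms(fake_predict, list(range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

The median is used here because a few slow outliers (cache misses, scheduling) would otherwise skew a mean. Other statistics may suit a particular deployment better.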
Pruning
• Eliminating redundant or unimportant parameters
• Affects accuracy, but this can be addressed by fine-tuning
• Affects size more than speed
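One likely reason pruning helps size more than speed: unless the runtime has sparse kernels, it still multiplies by the zeros, while runs of zeros compress very well on disk. A minimal sketch with synthetic weights follows. Float32 storage and zlib as the compressor are illustrative assumptions, not taken from the slides, and the helper names are hypothetical.

```python
import random
import struct
import zlib


def prune_smallest(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def compressed_size(weights):
    """Bytes needed to store the weights as float32 after zlib compression."""
    raw = struct.pack(f"{len(weights)}f", *weights)
    return len(zlib.compress(raw))


rng = random.Random(0)                       # fixed seed, synthetic weights
dense = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
sparse = prune_smallest(dense, 0.75)         # 75% sparsity

print("dense :", compressed_size(dense), "bytes")
print("sparse:", compressed_size(sparse), "bytes")
```

Both lists hold the same number of values, so a dense matrix multiply does the same amount of work on either. Only the stored size changes.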
Types of pruning
• Weight pruning
• Structural pruning
  • Neuron
  • Layer
  • Filter
  • Channel
[Figure: weight pruning vs. neuron pruning]
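The difference between the two kinds of pruning can be sketched on a single small layer. This is an illustration under assumptions, not the method behind any result in the slides: importance is taken to be plain magnitude (L1 norm for a neuron), each row of the matrix is treated as one neuron, and the function names are hypothetical.

```python
def prune_weights(weights, sparsity):
    """Weight pruning: zero the smallest-magnitude entries; the shape is unchanged."""
    entries = sorted(
        (abs(w), r, c) for r, row in enumerate(weights) for c, w in enumerate(row)
    )
    k = int(len(entries) * sparsity)         # number of entries to zero
    pruned = [list(row) for row in weights]
    for _, r, c in entries[:k]:
        pruned[r][c] = 0.0
    return pruned


def prune_neurons(weights, n_remove):
    """Neuron pruning: drop the rows (one row per neuron) with the smallest L1 norm."""
    by_norm = sorted(range(len(weights)), key=lambda r: sum(abs(w) for w in weights[r]))
    drop = set(by_norm[:n_remove])
    return [list(row) for r, row in enumerate(weights) if r not in drop]


layer = [
    [0.90, -0.10, 0.40],
    [0.05, -0.70, 0.20],
    [-0.30, 0.60, -0.02],
    [0.01, 0.02, -0.03],
]
print(prune_weights(layer, 0.5))   # same 4x3 shape, half of the entries zeroed
print(prune_neurons(layer, 1))     # 3x3: the weakest row is removed
```

Weight pruning leaves a sparse matrix of the original shape. Structural pruning produces a smaller dense matrix, which ordinary dense kernels can exploit directly.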
Weight pruning example
• Image classification example – weight pruning
Configuration | Number of parameters | Top-1 accuracy, ImageNet
InceptionV3 Original | 27.1M | 78.1%
50% sparsity | 13.6M | 78.0%
75% sparsity | 6.8M | 76.1%
87.5% sparsity | 3.3M | 74.6%
From https://www.tensorflow.org/model_optimization/guide/pruning
Quantization
• Reducing numerical precision of weights and activations
• Affects accuracy, but this can be addressed during training
• Check out the talk “Practical Approaches to DNN Quantization” later today
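To make "reducing numerical precision" concrete, here is a minimal sketch of affine (scale and zero-point) quantization of floats to 8-bit integers. It is one common scheme, shown under assumptions: a single scale for the whole tensor, a range forced to include zero, and hypothetical function names. Real toolchains differ in details such as per-channel scales and calibration.

```python
def quantize_int8(values):
    """Map floats to int8 codes using one scale and zero-point for the tensor."""
    lo = min(min(values), 0.0)               # the range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0         # avoid a zero scale for all-zero input
    zero_point = round(-128 - lo / scale)    # integer code that represents 0.0
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from the int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [0.86, -1.88, -1.44, -1.7, 1.58, 0.12, 1.9, 0.46, 1.37]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"scale={scale:.5f} zero_point={zero_point} max_error={max_error:.5f}")
```

Each code takes one byte instead of the four bytes of a float32 value, and the rounding error per weight is bounded by about half the scale.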
Quantization example
• Image classification example – quantization
Configuration | Size, MB | Inference (CPU), ms | Top-1 accuracy, ImageNet
InceptionV3 Original | 95.7 | 1130 | 78.1%
Post training quantization | 23.9 | 845 | 77.2%
Quantization aware training | 23.9 | 543 | 77.5%
From https://www.tensorflow.org/lite/performance/model_optimization
Weight clustering
• Cluster model weights and store a cluster index for each weight
• Only optimizes model size
• Similar to quantization, but doesn’t change computational complexity
Weight clustering example
• Replaces weights with a reference to the closest centroid
• Centroids are usually rounded, but not quantized, so the main gain is model size
Weight matrix:
0.86 -1.88 -1.44
-1.7 1.58 0.12
1.9 0.46 1.37
Index matrix:
1 2 2
2 3 1
3 1 3
Centroid | Index
0.33 | 1
-1.26 | 2
1.92 | 3
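The assignment step in this example can be reproduced in a few lines, using the weight matrix and centroids from the slide. The storage estimate at the end is an idealized assumption (indices bit-packed at the minimum width, centroids kept as float32) and not a statement about any particular file format. The function names are hypothetical.

```python
import math

weights = [[0.86, -1.88, -1.44], [-1.7, 1.58, 0.12], [1.9, 0.46, 1.37]]
centroids = [0.33, -1.26, 1.92]              # indices 1, 2, 3 on the slide


def assign_to_centroids(weights, centroids):
    """For each weight, the 1-based index of the nearest centroid."""
    return [
        [min(range(len(centroids)), key=lambda i: abs(w - centroids[i])) + 1 for w in row]
        for row in weights
    ]


def reconstruct(indices, centroids):
    """Rebuild the approximate weight matrix from indices and centroids."""
    return [[centroids[i - 1] for i in row] for row in indices]


indices = assign_to_centroids(weights, centroids)
print(indices)  # [[1, 2, 2], [2, 3, 1], [3, 1, 3]]

# Idealized storage estimate: bit-packed indices plus float32 centroids.
n = sum(len(row) for row in weights)
bits_per_index = math.ceil(math.log2(len(centroids)))
dense_bits = n * 32
clustered_bits = n * bits_per_index + len(centroids) * 32
print(dense_bits, clustered_bits)  # 288 114
```

At inference time the indices are expanded back to float centroids, so the arithmetic is the same as for the original model. This is why the gain is in size rather than speed.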
Weight clustering example
• Image classification example
Configuration | Size, MB | Top-1 accuracy, ImageNet
MobileNetV2 Original | 12.38 | 71.7%
Last 3 layers, 32 clusters | 7.03 | 70.9%
Last 3 layers, 16 clusters | 6.68 | 70.7%
All layers, 32 clusters | 4.05 | 69.7%
From https://www.tensorflow.org/model_optimization/guide/clustering
Knowledge distillation
• A smaller, more efficient "student" model learns to mimic the behavior of a larger, pre-trained "teacher" model
Knowledge distillation - example
• Only the student model is trained
• The student model and the teacher model run on the same images
• Error is propagated back for the student model
[Figure: distillation diagram; labels: Training data, Teacher, Student, Predictions, True label]
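The slides do not give the loss function. A common formulation, assumed here, combines a soft term (KL divergence between temperature-softened teacher and student predictions) with a hard term (cross-entropy against the true label). The `temperature` and `alpha` values and the logits below are illustrative, not taken from the cited case study.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=4.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard


teacher = [4.0, 1.0, 0.5]          # teacher is confident in class 0
close_student = [3.5, 1.2, 0.4]    # student roughly agrees with the teacher
far_student = [0.2, 3.0, 0.1]      # student prefers the wrong class

loss_close = distillation_loss(close_student, teacher, true_label=0)
loss_far = distillation_loss(far_student, teacher, true_label=0)
print(f"close: {loss_close:.4f}  far: {loss_far:.4f}")
```

Only the student's logits would receive gradients in training. The teacher's outputs act as fixed targets, which matches the bullet that only the student model is trained.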
Knowledge distillation - example
• Image classification example
Model | Base model | No. of parameters | Test accuracy
Teacher model | VGG16 | 27 M | 77%
Student model with distillation | VGG16 pruned | 296 k | 75%
Student model without distillation | VGG16 pruned | 296 k | 64%
From https://www.analyticsvidhya.com/blog/2022/01/knowledge-distillation-theory-and-end-to-end-case-study/
Optimizing model architecture
• Finding smaller models that have fewer parameters and make faster predictions
Model architecture - example
• Object detection – YOLOv5 family
Model | Size, MB | mAP, COCO | CPU inference, ms | GPU (V100) inference, ms
YOLOv5n | 4.1 | 45.7 | 45 | 6.3
YOLOv5s | 14.8 | 56.8 | 98 | 6.4
YOLOv5m | 42.8 | 64.1 | 224 | 8.2
YOLOv5l | 93.6 | 67.3 | 430 | 10.1
YOLOv5x | 174.1 | 68.9 | 766 | 12.1
From https://github.com/ultralytics/yolov5
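One simple way to use a table like this is to pick the most accurate model that fits a latency budget. The sketch below copies the YOLOv5 figures from the table above. The budget values and the `best_under_budget` helper are hypothetical, and latencies measured on your own hardware should replace the published ones.

```python
# (name, size in MB, mAP on COCO, CPU latency in ms, V100 GPU latency in ms)
YOLOV5_FAMILY = [
    ("YOLOv5n", 4.1, 45.7, 45, 6.3),
    ("YOLOv5s", 14.8, 56.8, 98, 6.4),
    ("YOLOv5m", 42.8, 64.1, 224, 8.2),
    ("YOLOv5l", 93.6, 67.3, 430, 10.1),
    ("YOLOv5x", 174.1, 68.9, 766, 12.1),
]


def best_under_budget(models, cpu_budget_ms):
    """Most accurate model whose CPU latency fits the budget, or None."""
    candidates = [m for m in models if m[3] <= cpu_budget_ms]
    return max(candidates, key=lambda m: m[2], default=None)


for budget in (10, 100, 250):
    choice = best_under_budget(YOLOV5_FAMILY, budget)
    print(budget, "ms ->", choice[0] if choice else "no model fits")
```

The same pattern extends to other constraints, such as size on disk or GPU latency, by filtering on a different column.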
Summary
Comparing optimization techniques
Optimization | Size decrease | Speed increase
Inference framework | Low | High
Pruning | High | Low
Quantization | High | High
Weight clustering | Low | Low
Architecture search | High | High
Recommendations on choosing model compression techniques
• Use a test dataset to assess performance changes
• Impacts vary (e.g., less for image classification, more for object detection)
• Integrate compilation optimizations into training for optimal model selection
• Test on target hardware for deployment
• Balance trade-offs: Understand acceptable accuracy loss and business metric impact
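The first and last recommendations can be combined into a small acceptance check: run the baseline and the compressed model on the same held-out test set and compare the accuracy drop against an agreed threshold. This is a minimal sketch. The labels, predictions, `max_drop` values, and function names are made-up placeholders.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accept_compressed(baseline_preds, compressed_preds, labels, max_drop=0.01):
    """Return (accepted, drop), where drop is baseline minus compressed accuracy."""
    drop = accuracy(baseline_preds, labels) - accuracy(compressed_preds, labels)
    return drop <= max_drop, drop


# Placeholder data: 10 test labels and two sets of predictions.
labels = [0, 1, 1, 0, 2, 2, 1, 0, 2, 1]
baseline_preds = [0, 1, 1, 0, 2, 2, 1, 0, 2, 0]     # 9 of 10 correct
compressed_preds = [0, 1, 1, 0, 2, 2, 1, 0, 1, 0]   # 8 of 10 correct

accepted, drop = accept_compressed(baseline_preds, compressed_preds, labels, max_drop=0.05)
print(f"accuracy drop: {drop:.2f}, accepted: {accepted}")
```

For tasks such as object detection, the same structure applies with a task-specific metric in place of `accuracy`.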
Open-source vs commercial
• Open-source
  • Inference or training frameworks with built-in solutions
• Commercial
  • A number of companies specifically focus on optimizing your models (some of them are Alliance Members)
• Begin with an open-source solution for quick optimization gains
• Consider commercial for specialized hardware (e.g., mobile) with easy trial options
• Use a test set to validate accuracy trade-offs for both approaches
Conclusions
• Model compression is vital for cloud-to-edge ML pipelines
• Streamlined training pipeline enables easy exploration of approaches
• Integrating compression in training pipeline ensures optimal accuracy
Resources
• Inference framework guides
  • OpenVINO https://docs.openvino.ai/latest/openvino_docs_model_optimization_guide.html
  • TFLite https://www.tensorflow.org/lite/performance/model_optimization
  • ONNX https://onnxruntime.ai/docs/performance/model-optimizations/
• Books
  • TinyML https://www.oreilly.com/library/view/tinyml/9781492052036/
  • Deep Learning with PyTorch https://livebook.manning.com/book/deep-learning-with-pytorch/

Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

“A Survey of Model Compression Methods,” a Presentation from Instrumental

  • 1. A Survey of Model Compression Methods
       Rustem Feyzkhanov, Staff Machine Learning Engineer, Instrumental
  • 2. Agenda (© 2023 Instrumental)
       • Cloud-to-edge ML pipeline and optimizations
       • Model serialization
       • Pruning
       • Quantization
       • Weight clustering
       • Knowledge distillation
       • Architecture search
       • Summary
  • 3. Challenges with ML on Edge
       • Latency
       • Memory usage
       • Energy usage
       • Disk usage
  • 4. Cloud-to-edge ML pipeline
       Stages: Data gathering, Architecture search, Model training, Model compilation, Deployment and model inference
       Activities across the stages, in order:
       • Data acquisition
       • Data labeling
       • Feature selection
       • Model architecture
       • Cloud training pipeline
       • Parameter optimizations
       • Serialization
       • Post training optimizations
       • Compiling model in cloud for hardware
       • Serving model to the edge
       • Initializing model
       • Making predictions
  • 5. Optimizations in ML pipeline
       [Pipeline diagram: Data gathering, Architecture search, Model training, Model compilation, Deployment and model inference]
  • 6. Optimizations in ML pipeline
       [Pipeline diagram, with the labels "Easier to implement/add" and "Affects accuracy less"]
  • 8. Inference frameworks and model serialization
       • Hardware specific optimizations
       • Framework built-in native model compression methods
       • Significantly affects speed
       [Pipeline diagram]
  • 9. Inference frameworks
       Frameworks focused on running inference. Lightweight, focused on specific hardware, require separate serialization.
       • Hardware agnostic: e.g., TorchScript, ONNX, TFLite*
       • Hardware specific:
         • CPU: e.g., OpenVINO (Intel)
         • GPU: e.g., TensorRT
         • Mobile: e.g., CoreML
         • NPU: Check out Embedded Vision Alliance members :-)
  • 10. Inference frameworks - example
       • Object detection - YOLOv5s

       Framework    Size, MB  CPU inference, ms  GPU (V100) inference, ms
       PyTorch      29.5      127.61             10.19
       TorchScript  29.4      131.23             6.85
       TensorRT     33.3      N/A                1.89
       ONNX         29.3      69.34              14.63
       OpenVINO     29.3      66.52              N/A
       TFLite       29.0      316.61             N/A

       From https://docs.ultralytics.com/yolov5/tutorials/model_export/
  • 12. Pruning
       • Eliminating redundant or unimportant parameters
       • Affects accuracy, but can be addressed by finetuning
       • Affects size more than speed
       [Pipeline diagram]
  • 13. Types of pruning
       • Weight pruning
       • Structural pruning
         • Neuron
         • Layer
         • Filter
         • Channel
       [Figures: Weight pruning, Neuron pruning]
  • 14. Weight pruning example
       • Image classification example: weight pruning, InceptionV3

       Configuration   Number of parameters  Top-1 accuracy, ImageNet
       Original        27.1M                 78.1%
       50% sparsity    13.6M                 78.0%
       75% sparsity    6.8M                  76.1%
       87.5% sparsity  3.3M                  74.6%

       From https://www.tensorflow.org/model_optimization/guide/pruning
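
As a rough illustration of weight pruning, the sketch below zeroes the smallest-magnitude weights in a flat list until a target sparsity is reached. The function name `magnitude_prune`, the one-shot (no finetuning) procedure, and the sample weights are assumptions made for illustration; the slides do not say which pruning criterion or schedule produced the table above.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Hypothetical weights, chosen only for illustration.
weights = [0.86, -1.88, -1.44, -1.7, 1.58, 0.12, 1.9, 0.46, 1.37]
pruned = magnitude_prune(weights, 0.5)  # int(9 * 0.5) = 4 weights are zeroed
print(pruned)
```

In practice the zeroed weights only reduce size or latency if the storage format or runtime exploits sparsity, which is consistent with the slide's note that pruning affects size more than speed.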
  • 16. Quantization
       • Reducing numerical precision of weights and activations
       • Affects accuracy, but can be addressed during training
       • Check out the talk “Practical Approaches to DNN Quantization” later today
       [Pipeline diagram]
  • 17. Quantization example
       • Image classification example: quantization, InceptionV3

       Configuration                Size, MB  Inference (CPU), ms  Top-1 accuracy, ImageNet
       Original                     95.7      1130                 78.1%
       Post training quantization   23.9      845                  77.2%
       Quantization aware training  23.9      543                  77.5%

       From https://www.tensorflow.org/lite/performance/model_optimization
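
To make "reducing numerical precision" concrete, here is a minimal sketch of affine (asymmetric) int8 quantization of a list of floats, followed by dequantization. The scheme, the function names, and the sample weights are assumptions for illustration; the slides do not specify which quantization scheme was used. Storing 8-bit codes instead of 32-bit floats is consistent with the roughly 4x size reduction in the table above (95.7 MB to 23.9 MB).

```python
def quantize_int8(values):
    """Affine quantization of floats to integer codes in [-128, 127]."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)   # keep 0.0 inside the represented range
    scale = (hi - lo) / 255.0 or 1.0      # fall back to 1.0 if all values are zero
    zero_point = round(-128 - lo / scale)
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.86, -1.88, -1.44, -1.7, 1.58, 0.12, 1.9, 0.46, 1.37]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, zero_point, max_error)
```

The round-trip error per value is bounded by about half the scale, which is the accuracy cost that post-training quantization accepts and that quantization-aware training tries to compensate for.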
  • 19. Weight clustering
       • Cluster model weights and use indices
       • Only optimizes model size
       • Similar to quantization, but doesn’t change computation complexity
       [Pipeline diagram]
  • 20. Weight clustering example
       • Replaces weights with references to the closest centroids
       • Centroids are usually rounded, but not quantized, so the main gain is model size

       Weight matrix          Index matrix    Centroid  Index
        0.86  -1.88  -1.44    1  2  2          0.33     1
       -1.7    1.58   0.12    2  3  1         -1.26     2
        1.9    0.46   1.37    3  1  3          1.92     3

  • 21. Weight clustering example
       • Image classification example, MobileNetV2

       Configuration               Size, MB  Top-1 accuracy, ImageNet
       Original                    12.38     71.7%
       Last 3 layers, 32 clusters  7.03      70.9%
       Last 3 layers, 16 clusters  6.68      70.7%
       All layers, 32 clusters     4.05      69.7%

       From https://www.tensorflow.org/model_optimization/guide/clustering
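
The index lookup from slide 20 can be sketched as a nearest-centroid assignment. The code below uses the weights and centroids from that slide and takes the centroids as given; how they were obtained (for example by k-means) is not stated in the slides. The helper name `assign_to_centroids` and the storage estimate (32-bit floats, bit-packed indices) are assumptions for illustration.

```python
import math


def assign_to_centroids(weights, centroids):
    """Return, for each weight, the 1-based index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: abs(w - centroids[k])) + 1
        for w in weights
    ]


# Values from slide 20, flattened row by row.
weights = [0.86, -1.88, -1.44, -1.7, 1.58, 0.12, 1.9, 0.46, 1.37]
centroids = [0.33, -1.26, 1.92]

indices = assign_to_centroids(weights, centroids)
# At inference time each index is replaced by its centroid value again.
reconstructed = [centroids[i - 1] for i in indices]

# Rough storage estimate, assuming 32-bit floats and tightly packed indices.
bits_per_index = math.ceil(math.log2(len(centroids)))
original_bits = 32 * len(weights)
clustered_bits = bits_per_index * len(weights) + 32 * len(centroids)
print(indices, original_bits, clustered_bits)
```

Because the reconstructed weights are still floats, the arithmetic at inference time is unchanged, which matches the slide's point that clustering only optimizes model size.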
  • 23. Knowledge distillation
       • A smaller, more efficient "student" model learns to mimic the behavior of a larger, pre-trained "teacher" model
       [Pipeline diagram]
  • 24. Knowledge distillation - example
       • Only the student model is trained
       • The student model and the teacher model run on the same images
       • The error is propagated back for the student model
       [Diagram elements: Training data, Teacher, Student, Predictions (one per model), True label]
  • 25. Knowledge distillation - example
       • Image classification example

       Model                               Base model    No. of parameters  Test accuracy
       Teacher model                       VGG16         27 M               77%
       Student model with distillation     VGG16 pruned  296 k              75%
       Student model without distillation  VGG16 pruned  296 k              64%

       From https://www.analyticsvidhya.com/blog/2022/01/knowledge-distillation-theory-and-end-to-end-case-study/
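
One common way to combine the teacher's predictions and the true label into a single student loss is sketched below: a weighted sum of cross-entropy against the true label and a temperature-softened KL divergence against the teacher. This particular formulation, the `temperature` and `alpha` parameters, and the sample logits are assumptions for illustration; the slides do not state which loss was used.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a hard-label term and a soft (teacher) term."""
    # Hard term: cross-entropy between student predictions and the true label.
    hard = -math.log(softmax(student_logits)[true_label])
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    # The temperature**2 factor keeps the soft term's scale comparable.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Hypothetical logits for a 3-class problem; the true class is 0.
teacher = [4.0, 1.0, 0.5]
loss_match = distillation_loss(teacher, teacher, true_label=0)
loss_far = distillation_loss([0.5, 1.0, 4.0], teacher, true_label=0)
print(loss_match, loss_far)
```

Only the student's logits would receive gradients from this loss; the teacher is used purely to produce targets, in line with "only the student model is trained".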
  • 26. Optimizing model architecture
       • Finding smaller models that have fewer parameters and make faster predictions
       [Pipeline diagram]
  • 27. Model architecture - example
       • Object detection: YOLOv5 family

       Model    Size, MB  mAP, COCO  CPU inference, ms  GPU (V100) inference, ms
       YOLOv5n  4.1       45.7       45                 6.3
       YOLOv5s  14.8      56.8       98                 6.4
       YOLOv5m  42.8      64.1       224                8.2
       YOLOv5l  93.6      67.3       430                10.1
       YOLOv5x  174.1     68.9       766                12.1

       From https://github.com/ultralytics/yolov5
  • 29. Comparing optimization techniques

       Optimization         Size decrease  Speed increase
       Inference framework  Low            High
       Pruning              High           Low
       Quantization         High           High
       Weight clustering    Low            Low
       Architecture search  High           High

  • 30. Recommendations on choosing model compression techniques
       • Use a test dataset to assess performance changes
       • Impacts vary (e.g., less for image classification, more for object detection)
       • Integrate compilation optimizations into training for optimal model selection
       • Test on target hardware for deployment
       • Balance trade-offs: Understand acceptable accuracy loss and business metric impact
  • 31. Open-source vs commercial
       • Open-source
         • Inference or training frameworks with built-in solutions
       • Commercial
         • A number of companies specifically focus on optimizing your models (some of them are Alliance Members)
  • 32. Open-source vs commercial
       • Begin with an open-source solution for quick optimization gains
       • Consider commercial for specialized hardware (e.g., mobile) with easy trial options
       • Use a test set to validate accuracy trade-offs for both approaches
  • 33. Conclusions
       • Model compression is vital for cloud-to-edge ML pipelines
       • Streamlined training pipeline enables easy exploration of approaches
       • Integrating compression in training pipeline ensures optimal accuracy
  • 34. Resources
       • Inference framework guides
         • OpenVINO https://docs.openvino.ai/latest/openvino_docs_model_optimization_guide.html
         • TFLite https://www.tensorflow.org/lite/performance/model_optimization
         • ONNX https://onnxruntime.ai/docs/performance/model-optimizations/
       • Books
         • TinyML https://www.oreilly.com/library/view/tinyml/9781492052036/
         • Deep Learning with PyTorch https://livebook.manning.com/book/deep-learning-with-pytorch/
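
The "balance trade-offs" recommendation can be sketched as a simple selection rule: among the candidate configurations measured on a test set, pick the smallest one whose accuracy drop stays within an agreed budget. The helper name, the dictionary layout, and the budget values are assumptions for illustration; the candidate numbers are the InceptionV3 pruning results from slide 14.

```python
def pick_smallest_within_budget(candidates, baseline_accuracy, max_drop):
    """Return the smallest candidate whose accuracy drop (in percentage
    points) does not exceed max_drop, or None if no candidate qualifies."""
    acceptable = [
        c for c in candidates
        if baseline_accuracy - c["accuracy"] <= max_drop
    ]
    if not acceptable:
        return None
    return min(acceptable, key=lambda c: c["params_m"])


# Parameters in millions and Top-1 accuracy in percent, from slide 14.
candidates = [
    {"name": "Original", "params_m": 27.1, "accuracy": 78.1},
    {"name": "50% sparsity", "params_m": 13.6, "accuracy": 78.0},
    {"name": "75% sparsity", "params_m": 6.8, "accuracy": 76.1},
    {"name": "87.5% sparsity", "params_m": 3.3, "accuracy": 74.6},
]

choice = pick_smallest_within_budget(candidates, baseline_accuracy=78.1, max_drop=1.0)
print(choice["name"])
```

The same rule could rank by latency measured on the target hardware instead of parameter count, depending on which constraint matters for the deployment.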