SlideShare a Scribd company logo
1 of 34
Download to read offline
A Survey of Model
Compression Methods
Rustem Feyzkhanov
Staff Machine Learning Engineer
Instrumental
• Cloud-to-edge ML pipeline and optimizations
• Model serialization
• Pruning
• Quantization
• Weight clustering
• Knowledge distillation
• Architecture search
• Summary
Agenda
2
© 2023 Instrumental
• Latency
• Memory usage
• Energy usage
• Disk usage
Challenges with ML on Edge
3
© 2023 Instrumental
Cloud-to-edge ML pipeline
4
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
• Data
acquisition
• Data
labeling
• Feature
selection
• Model
architecture
• Cloud training
pipeline
• Parameter
optimizations
• Serialization
• Post training
optimizations
• Compiling
model in cloud
for hardware
• Serving model
to the edge
• Initializing
model
• Making
predictions
Optimizations in ML pipeline
5
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Optimizations in ML pipeline
6
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Easier to implement/add
Affects accuracy less
7
© 2023 Instrumental
Model serialization
• Hardware specific optimizations
• Framework built-in native model compression methods
• Significantly affects speed
Inference frameworks and model serialization
8
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Frameworks focused on running inference. Lightweight, focused on
specific hardware, require separate serialization.
• Hardware agnostic: e.g., TorchScript, ONNX, TFLite*
• Hardware specific:
• CPU: e.g., OpenVINO (Intel)
• GPU: e.g., TensorRT
• Mobile: e.g., CoreML
• NPU: Check out Embedded Vision Alliance members :-)
Inference frameworks
9
© 2023 Instrumental
• Object detection - YOLOv5s
Inference frameworks - example
10
© 2023 Instrumental
Framework Size, MB CPU inference, ms GPU (V100) inf, ms
PyTorch 29.5 127.61 10.19
TorchScript 29.4 131.23 6.85
TensorRT 33.3 N/A 1.89
ONNX 29.3 69.34 14.63
OpenVINO 29.3 66.52 N/A
TFLite 29.0 316.61 N/A
From https://docs.ultralytics.com/yolov5/tutorials/model_export/
11
© 2023 Instrumental
Pruning
• Eliminating redundant or unimportant parameters
• Affects accuracy, but can be addressed by finetuning
• Affects size more than speed
Pruning
12
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
• Weight pruning
• Structural pruning
• Neuron
• Layer
• Filter
• Channel
Types of pruning
13
© 2023 Instrumental
Weight pruning
Neuron pruning
Weight pruning example
14
© 2023 Instrumental
• Image classification example – weight pruning
Configuration
Number of
parameters
Top-1 accuracy,
ImageNet
InceptionV3 Original 27.1M 78.1%
50% sparsity 13.6M 78.0%
75% sparsity 6.8M 76.1%
87.5% sparsity 3.3M 74.6%
From https://www.tensorflow.org/model_optimization/guide/pruning
15
© 2023 Instrumental
Quantization
• Reducing numerical precision of weights and activations
• Affects accuracy, but can be addressed during training
• Checkout the talk “Practical Approaches to DNN
Quantization” later today
Quantization
16
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Quantization example
17
© 2023 Instrumental
• Image classification example – quantization
Configuration Size, MB
Inference
(CPU), ms
Top-1 accuracy,
ImageNet
InceptionV3 Original 95.7 1130 78.1%
Post training
quantization
23.9 845 77.2%
Quantization
aware training
23.9 543 77.5%
From https://www.tensorflow.org/lite/performance/model_optimization
18
© 2023 Instrumental
Weight clustering
• Cluster model weights and use indices
• Only optimizes model size
• Similar to quantization, but doesn’t change computation
complexity
Weight clustering
19
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Weight clustering example
20
© 2023 Instrumental
• Replaces weights with
reference to the closest
centroids
• Centroids are usually
rounded, but not
quantized so main gain is
model size
0.86 -1.88 -1.44
-1.7 1.58 0.12
1.9 0.46 1.37
1 2 2
2 3 1
3 1 3
Centroid Index
0.33 1
-1.26 2
1.92 3
Weight
matrix
Weight clustering example
21
© 2023 Instrumental
• Image classification example
Configuration Size, MB
Top-1 accuracy,
ImageNet
MobileNetV2 Original 12.38 71.7%
Last 3 layers, 32 clusters 7.03 70.9%
Last 3 layers, 16 clusters 6.68 70.7%
All layers, 32 clsuters 4.05 69.7%
From https://www.tensorflow.org/model_optimization/guide/clustering
22
© 2023 Instrumental
Knowledge distillation
• Smaller, more efficient "student" model learns to mimic the
behavior of a larger, pre-trained "teacher" model
Knowledge distillation
23
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Student
Teacher
Training
data
Knowledge distillation - example
24
© 2023 Instrumental
• Only student model is
trained
• Student model and
teacher model run on the
same images
• Error is propagated back
for student mode Predictions
Predictions
True label
Knowledge distillation - example
25
© 2023 Instrumental
• Image classification example
From https://www.analyticsvidhya.com/blog/2022/01/knowledge-distillation-theory-and-end-to-end-case-study/
Base model
No. of
parameters
Test
accuracy
Teacher model VGG16 27 M 77%
Student model with
Distillation
VGG16 pruned 296 k 75%
Student model without
Distillation
VGG16 pruned 296 k 64%
• Finding smaller models which have less parameters and have
faster predictions
Optimizing model architecture
26
© 2023 Instrumental
Data
gathering
Architecture
search
Model
training
Model
compilation
Deployment
and model
inference
Model architecture - example
27
© 2023 Instrumental
• Object detection – YOLOv5 family
From https://github.com/ultralytics/yolov5
Model Size, MB mAP, COCO CPU inf, ms GPU (V100) inf, ms
YOLOv5n 4.1 45.7 45 6.3
YOLOv5s 14.8 56.8 98 6.4
YOLOv5m 42.8 64.1 224 8.2
YOLOv5l 93.6 67.3 430 10.1
YOLOv5x 174.1 68.9 766 12.1
28
© 2023 Instrumental
Summary
Comparing optimization techniques
29
© 2023 Instrumental
Optimization Size decrease Speed increase
Inference framework Low High
Pruning High Low
Quantization High High
Weight clustering Low Low
Architecture search High High
• Use a test dataset to assess performance changes
• Impacts vary (e.g., less for image classification, more for object
detection)
• Integrate compilation optimizations into training for optimal model
selection
• Test on target hardware for deployment
• Balance trade-offs: Understand acceptable accuracy loss and
business metric impact
Recommendations on choosing model
compression techniques
30
© 2023 Instrumental
• Open-source
• Inference or training frameworks with built-in solutions
• Commercial
• A number of companies specifically focus on optimizing your
models (some of them are Alliance Members)
Open-source vs commercial
31
© 2023 Instrumental
• Begin with an open-source solution for quick optimization gains
• Consider commercial for specialized hardware (e.g., mobile) with
easy trial options
• Use a test set to validate accuracy trade-offs for both approaches
Open-source vs commercial
32
© 2023 Instrumental
• Model compression is vital for cloud-to-edge ML pipelines
• Streamlined training pipeline enables easy exploration of
approaches
• Integrating compression in training pipeline ensures optimal
accuracy
Conclusions
33
© 2023 Instrumental
• Inference framework guides
• OpenVINO https://docs.openvino.ai/latest/openvino_docs_model_optimization_guide.html
• TFLite https://www.tensorflow.org/lite/performance/model_optimization
• ONNX https://onnxruntime.ai/docs/performance/model-optimizations/
• Books
• TinyML https://www.oreilly.com/library/view/tinyml/9781492052036/
• Deep Learning with PyTorch https://livebook.manning.com/book/deep-learning-with-pytorch/
Resources
34
© 2023 Instrumental

More Related Content

Similar to “A Survey of Model Compression Methods,” a Presentation from Instrumental

Practical data science
Practical data sciencePractical data science
Practical data scienceDing Li
 
Software Testing in Cloud Platform A Survey_final
Software Testing in Cloud Platform A Survey_finalSoftware Testing in Cloud Platform A Survey_final
Software Testing in Cloud Platform A Survey_finalwww.pixelsolutionbd.com
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic Algorithm
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic AlgorithmCloud Computing Task Scheduling Algorithm Based on Modified Genetic Algorithm
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic AlgorithmIRJET Journal
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...Edge AI and Vision Alliance
 
Metrology sampling models using tool sensor data
Metrology sampling models using tool sensor dataMetrology sampling models using tool sensor data
Metrology sampling models using tool sensor dataArvind Mozumdar
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RIRJET Journal
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningCloudLightning
 
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Luca Zavarella
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoMLArpitha Gurumurthy
 
Machine learning systems for engineers
Machine learning systems for engineersMachine learning systems for engineers
Machine learning systems for engineersCameron Joannidis
 
Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at OracleSandesh Rao
 
Solving the Issue of Mysterious Database Benchmarking Results
Solving the Issue of Mysterious Database Benchmarking ResultsSolving the Issue of Mysterious Database Benchmarking Results
Solving the Issue of Mysterious Database Benchmarking ResultsScyllaDB
 
Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Paul Brebner
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using wekaPrashant Menon
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 antimo musone
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)Arnab Biswas
 
Cloud – A strategic opportunity
Cloud – A strategic opportunityCloud – A strategic opportunity
Cloud – A strategic opportunityMicrosoft
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt
 

Similar to “A Survey of Model Compression Methods,” a Presentation from Instrumental (20)

Practical data science
Practical data sciencePractical data science
Practical data science
 
Software Testing in Cloud Platform A Survey_final
Software Testing in Cloud Platform A Survey_finalSoftware Testing in Cloud Platform A Survey_final
Software Testing in Cloud Platform A Survey_final
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic Algorithm
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic AlgorithmCloud Computing Task Scheduling Algorithm Based on Modified Genetic Algorithm
Cloud Computing Task Scheduling Algorithm Based on Modified Genetic Algorithm
 
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
“MLOps: Managing Data and Workflows for Efficient Model Development and Deplo...
 
Metrology sampling models using tool sensor data
Metrology sampling models using tool sensor dataMetrology sampling models using tool sensor data
Metrology sampling models using tool sensor data
 
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and RSvm Classifier Algorithm for Data Stream Mining Using Hive and R
Svm Classifier Algorithm for Data Stream Mining Using Hive and R
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
Unleashing the Power of Machine Learning Prototyping Using Azure AutoML and P...
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoML
 
Machine learning systems for engineers
Machine learning systems for engineersMachine learning systems for engineers
Machine learning systems for engineers
 
Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at Oracle
 
Solving the Issue of Mysterious Database Benchmarking Results
Solving the Issue of Mysterious Database Benchmarking ResultsSolving the Issue of Mysterious Database Benchmarking Results
Solving the Issue of Mysterious Database Benchmarking Results
 
Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...
 
Data mining techniques using weka
Data mining techniques using wekaData mining techniques using weka
Data mining techniques using weka
 
Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015 Azure Machine Learning Dotnet Campus 2015
Azure Machine Learning Dotnet Campus 2015
 
A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)A survey on Machine Learning In Production (July 2018)
A survey on Machine Learning In Production (July 2018)
 
Cloud – A strategic opportunity
Cloud – A strategic opportunityCloud – A strategic opportunity
Cloud – A strategic opportunity
 
SigOpt for Machine Learning and AI
SigOpt for Machine Learning and AISigOpt for Machine Learning and AI
SigOpt for Machine Learning and AI
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

“A Survey of Model Compression Methods,” a Presentation from Instrumental

  • 1. A Survey of Model Compression Methods Rustem Feyzkhanov Staff Machine Learning Engineer Instrumental
  • 2. • Cloud-to-edge ML pipeline and optimizations • Model serialization • Pruning • Quantization • Weight clustering • Knowledge distillation • Architecture search • Summary Agenda 2 © 2023 Instrumental
  • 3. • Latency • Memory usage • Energy usage • Disk usage Challenges with ML on Edge 3 © 2023 Instrumental
  • 4. Cloud-to-edge ML pipeline 4 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference • Data acquisition • Data labeling • Feature selection • Model architecture • Cloud training pipeline • Parameter optimizations • Serialization • Post training optimizations • Compiling model in cloud for hardware • Serving model to the edge • Initializing model • Making predictions
  • 5. Optimizations in ML pipeline 5 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 6. Optimizations in ML pipeline 6 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference Easier to implement/add Affects accuracy less
  • 8. • Hardware specific optimizations • Framework built-in native model compression methods • Significantly affects speed Inference frameworks and model serialization 8 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 9. Frameworks focused on running inference. Lightweight, focused on specific hardware, require separate serialization. • Hardware agnostic: e.g., TorchScript, ONNX, TFLite* • Hardware specific: • CPU: e.g., OpenVINO (Intel) • GPU: e.g., TensorRT • Mobile: e.g., CoreML • NPU: Check out Embedded Vision Alliance members :-) Inference frameworks 9 © 2023 Instrumental
  • 10. • Object detection - YOLOv5s Inference frameworks - example 10 © 2023 Instrumental Framework Size, MB CPU inference, ms GPU (V100) inf, ms PyTorch 29.5 127.61 10.19 TorchScript 29.4 131.23 6.85 TensorRT 33.3 N/A 1.89 ONNX 29.3 69.34 14.63 OpenVINO 29.3 66.52 N/A TFLite 29.0 316.61 N/A From https://docs.ultralytics.com/yolov5/tutorials/model_export/
  • 12. • Eliminating redundant or unimportant parameters • Affects accuracy, but can be addressed by finetuning • Affects size more than speed Pruning 12 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 13. • Weight pruning • Structural pruning • Neuron • Layer • Filter • Channel Types of pruning 13 © 2023 Instrumental Weight pruning Neuron pruning
  • 14. Weight pruning example 14 © 2023 Instrumental • Image classification example – weight pruning Configuration Number of parameters Top-1 accuracy, ImageNet InceptionV3 Original 27.1M 78.1% 50% sparsity 13.6M 78.0% 75% sparsity 6.8M 76.1% 87.5% sparsity 3.3M 74.6% From https://www.tensorflow.org/model_optimization/guide/pruning
  • 16. • Reducing numerical precision of weights and activations • Affects accuracy, but can be addressed during training • Checkout the talk “Practical Approaches to DNN Quantization” later today Quantization 16 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 17. Quantization example 17 © 2023 Instrumental • Image classification example – quantization Configuration Size, MB Inference (CPU), ms Top-1 accuracy, ImageNet InceptionV3 Original 95.7 1130 78.1% Post training quantization 23.9 845 77.2% Quantization aware training 23.9 543 77.5% From https://www.tensorflow.org/lite/performance/model_optimization
  • 19. • Cluster model weights and use indices • Only optimizes model size • Similar to quantization, but doesn’t change computation complexity Weight clustering 19 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 20. Weight clustering example 20 © 2023 Instrumental • Replaces weights with reference to the closest centroids • Centroids are usually rounded, but not quantized so main gain is model size 0.86 -1.88 -1.44 -1.7 1.58 0.12 1.9 0.46 1.37 1 2 2 2 3 1 3 1 3 Centroid Index 0.33 1 -1.26 2 1.92 3 Weight matrix
  • 21. Weight clustering example 21 © 2023 Instrumental • Image classification example Configuration Size, MB Top-1 accuracy, ImageNet MobileNetV2 Original 12.38 71.7% Last 3 layers, 32 clusters 7.03 70.9% Last 3 layers, 16 clusters 6.68 70.7% All layers, 32 clsuters 4.05 69.7% From https://www.tensorflow.org/model_optimization/guide/clustering
  • 23. • Smaller, more efficient "student" model learns to mimic the behavior of a larger, pre-trained "teacher" model Knowledge distillation 23 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 24. Student Teacher Training data Knowledge distillation - example 24 © 2023 Instrumental • Only student model is trained • Student model and teacher model run on the same images • Error is propagated back for student mode Predictions Predictions True label
  • 25. Knowledge distillation - example 25 © 2023 Instrumental • Image classification example From https://www.analyticsvidhya.com/blog/2022/01/knowledge-distillation-theory-and-end-to-end-case-study/ Base model No. of parameters Test accuracy Teacher model VGG16 27 M 77% Student model with Distillation VGG16 pruned 296 k 75% Student model without Distillation VGG16 pruned 296 k 64%
  • 26. • Finding smaller models which have less parameters and have faster predictions Optimizing model architecture 26 © 2023 Instrumental Data gathering Architecture search Model training Model compilation Deployment and model inference
  • 27. Model architecture - example 27 © 2023 Instrumental • Object detection – YOLOv5 family From https://github.com/ultralytics/yolov5 Model Size, MB mAP, COCO CPU inf, ms GPU (V100) inf, ms YOLOv5n 4.1 45.7 45 6.3 YOLOv5s 14.8 56.8 98 6.4 YOLOv5m 42.8 64.1 224 8.2 YOLOv5l 93.6 67.3 430 10.1 YOLOv5x 174.1 68.9 766 12.1
  • 29. Comparing optimization techniques 29 © 2023 Instrumental Optimization Size decrease Speed increase Inference framework Low High Pruning High Low Quantization High High Weight clustering Low Low Architecture search High High
  • 30. • Use a test dataset to assess performance changes • Impacts vary (e.g., less for image classification, more for object detection) • Integrate compilation optimizations into training for optimal model selection • Test on target hardware for deployment • Balance trade-offs: Understand acceptable accuracy loss and business metric impact Recommendations on choosing model compression techniques 30 © 2023 Instrumental
  • 31. • Open-source • Inference or training frameworks with built-in solutions • Commercial • A number of companies specifically focus on optimizing your models (some of them are Alliance Members) Open-source vs commercial 31 © 2023 Instrumental
  • 32. • Begin with an open-source solution for quick optimization gains • Consider commercial for specialized hardware (e.g., mobile) with easy trial options • Use a test set to validate accuracy trade-offs for both approaches Open-source vs commercial 32 © 2023 Instrumental
  • 33. • Model compression is vital for cloud-to-edge ML pipelines • Streamlined training pipeline enables easy exploration of approaches • Integrating compression in training pipeline ensures optimal accuracy Conclusions 33 © 2023 Instrumental
  • 34. • Inference framework guides • OpenVINO https://docs.openvino.ai/latest/openvino_docs_model_optimization_guide.html • TFLite https://www.tensorflow.org/lite/performance/model_optimization • ONNX https://onnxruntime.ai/docs/performance/model-optimizations/ • Books • TinyML https://www.oreilly.com/library/view/tinyml/9781492052036/ • Deep Learning with PyTorch https://livebook.manning.com/book/deep-learning-with-pytorch/ Resources 34 © 2023 Instrumental