Practical Approaches to DNN Quantization
Dwith Chenna
Senior Embedded DSP Eng., Computer Vision
Magic Leap Inc.
Contents
• Why Quantization?
• Quantization Scheme
• Types of Quantization
• Post Training Quantization
• Quantization Tools
• Network Architecture
• Calibration Dataset
• Min/Max Tuning
• Quantization Evaluation
• Quantization Analysis
• Quantization Aware Training
• Best Practices
© 2023 Magic Leap
Why Quantization?
• Quantization is a powerful tool for enabling deep learning on edge devices
• Edge hardware is resource constrained, with limited memory and a low power budget
Why Quantization?
• Model compression: up to 4x smaller network size and memory bandwidth (float32 to int8)
• Latency reduction: up to 2-3x, since int8 compute is significantly faster than float32 [1]
• Trade-off: potential loss of model accuracy
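Quantization Scheme
• Convert full-precision floating-point numbers to int8 [2]
$q = \mathrm{round}(r / s) + z$
q - quantized value, r - real value, s - scale, z - zero point
• Convert a quantized value back to its floating-point representation
$r = s \, (q - z)$
• Given the range $[r_{min}, r_{max}]$ of the floating-point distribution, we obtain the scale and zero point as:
$s = \dfrac{r_{max} - r_{min}}{q_{max} - q_{min}}, \quad z = \mathrm{round}\left(q_{min} - \dfrac{r_{min}}{s}\right)$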
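The mapping between real and quantized values can be sketched in a few lines of plain Python. This is a minimal illustration that assumes the common affine form q = round(r / s) + z with an int8 range of [-128, 127]; the function names and sample values are hypothetical and not taken from the slides.

```python
def compute_qparams(r_min, r_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping [r_min, r_max] onto [q_min, q_max]."""
    # Assumption: widen the range to include 0.0 so real zero maps to an exact integer.
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    scale = (r_max - r_min) / (q_max - q_min)
    if scale == 0.0:
        scale = 1.0  # all-zero input: any positive scale works
    zero_point = round(q_min - r_min / scale)
    zero_point = max(q_min, min(q_max, zero_point))
    return scale, zero_point


def quantize(r, scale, zero_point, q_min=-128, q_max=127):
    """Real value -> integer, clamped to the representable range."""
    q = round(r / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Integer -> approximate real value."""
    return scale * (q - zero_point)


values = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = compute_qparams(min(values), max(values))
quantized = [quantize(v, scale, zero_point) for v in values]
restored = [dequantize(q, scale, zero_point) for q in quantized]
```

For values inside the calibrated range, the round-trip error stays within half a quantization step (scale / 2).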
Quantization Scheme: Symmetric
• Assumes a symmetric distribution for simplicity, so zero point = 0
• Symmetric per tensor
  • Calculate one scale for the entire tensor
• Symmetric per channel
  • Calculate one scale for each channel of the tensor
• Computationally efficient
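To make the per-tensor versus per-channel distinction concrete, the sketch below compares the error on a small-range channel under both choices. The weights are made-up values, and the scale rule max|w| / 127 is an assumed (though common) convention for symmetric int8.

```python
def symmetric_scale(values, q_max=127):
    """Symmetric scale: the largest magnitude maps to q_max, zero point is 0."""
    max_abs = max(abs(v) for v in values)
    return max_abs / q_max if max_abs > 0 else 1.0


def quantize_symmetric(values, scale, q_max=127):
    return [max(-q_max, min(q_max, round(v / scale))) for v in values]


def mean_abs_error(values, scale):
    """Average round-trip error when quantizing `values` with `scale`."""
    q = quantize_symmetric(values, scale)
    return sum(abs(v - scale * qi) for v, qi in zip(values, q)) / len(values)


# Hypothetical weights: one list per output channel, with very different ranges.
weights = [
    [0.02, -0.01, 0.015, -0.005],
    [1.5, -2.0, 0.75, 1.0],
]

per_tensor_scale = symmetric_scale([w for channel in weights for w in channel])
per_channel_scales = [symmetric_scale(channel) for channel in weights]

# Error on the small-range channel under each scheme.
err_per_tensor = mean_abs_error(weights[0], per_tensor_scale)
err_per_channel = mean_abs_error(weights[0], per_channel_scales[0])
```

With a single per-tensor scale, the large-range channel dictates the step size, so the small-range channel collapses onto just a few integer levels; a per-channel scale avoids this.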
Quantization Scheme: Asymmetric
• Accounts for shifts in the distribution, giving better utilization of the quantization range
• Asymmetric per tensor
  • Scale and zero point for the entire tensor
• Asymmetric per channel
  • Scale and zero point for each channel of the tensor
• Better handling of diverse distributions
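A small experiment illustrates the "better utilization of the quantization range" point. It assumes a one-sided distribution in [0, 6] (as after a ReLU6) and counts how many distinct int8 levels each scheme actually uses; the helper names and data are hypothetical.

```python
def symmetric_qparams(r_min, r_max, q_max=127):
    """Zero point fixed at 0; the scale covers the largest magnitude."""
    return max(abs(r_min), abs(r_max)) / q_max, 0


def asymmetric_qparams(r_min, r_max, q_min=-128, q_max=127):
    """Scale and zero point chosen so [r_min, r_max] fills [q_min, q_max]."""
    scale = (r_max - r_min) / (q_max - q_min)
    return scale, round(q_min - r_min / scale)


def used_levels(values, scale, zero_point, q_min=-128, q_max=127):
    """Number of distinct integer codes produced for `values`."""
    codes = {max(q_min, min(q_max, round(v / scale) + zero_point)) for v in values}
    return len(codes)


# Hypothetical one-sided activations spread evenly over [0, 6].
activations = [6.0 * i / 999 for i in range(1000)]
r_min, r_max = min(activations), max(activations)

sym_scale, sym_zp = symmetric_qparams(r_min, r_max)
asym_scale, asym_zp = asymmetric_qparams(r_min, r_max)

sym_levels = used_levels(activations, sym_scale, sym_zp)
asym_levels = used_levels(activations, asym_scale, asym_zp)
```

The symmetric scheme leaves the negative half of the int8 range unused for this data, so its step size is about twice as coarse as the asymmetric one.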
Types of Quantization
• Post Training Quantization (PTQ)
  • Simple yet efficient
  • Uses an already trained model and a calibration dataset
• Quantization Aware Training (QAT)
  • Emulates inference-time quantization during training
  • Resource intensive, as it needs retraining
Post Training Quantization
• Dynamic Quantization
  • Weights are quantized ahead of time
  • Activations are quantized during inference (dynamically)
• Static Quantization
  • Weights and activations are both quantized ahead of time
  • Memory bandwidth and compute savings
  • Needs a representative dataset
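The difference between static and dynamic activation quantization can be illustrated without any framework. In this sketch the static path fixes its range from a hypothetical calibration set, while the dynamic path recomputes the range for every input; all names and numbers are assumptions made for illustration.

```python
def asymmetric_qparams(r_min, r_max, q_min=-128, q_max=127):
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    scale = (r_max - r_min) / (q_max - q_min) or 1.0
    return scale, round(q_min - r_min / scale)


def quantize_dequantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Round-trip through int8 so the quantization effect is visible in floats."""
    out = []
    for v in values:
        q = max(q_min, min(q_max, round(v / scale) + zero_point))
        out.append(scale * (q - zero_point))
    return out


# Static: parameters are fixed ahead of time from calibration data.
calibration_batches = [[-1.0, 0.2, 0.9], [-0.8, 0.1, 1.0]]
calib_min = min(min(batch) for batch in calibration_batches)
calib_max = max(max(batch) for batch in calibration_batches)
STATIC_PARAMS = asymmetric_qparams(calib_min, calib_max)


def static_quantize(activations):
    return quantize_dequantize(activations, *STATIC_PARAMS)


def dynamic_quantize(activations):
    # Dynamic: parameters are recomputed from the tensor seen at inference time.
    params = asymmetric_qparams(min(activations), max(activations))
    return quantize_dequantize(activations, *params)


# An input whose range was not covered by the calibration data.
unseen = [-0.5, 0.3, 4.0]
static_out = static_quantize(unseen)
dynamic_out = dynamic_quantize(unseen)
```

The static path clips the value 4.0 to roughly the calibrated maximum, which is why the calibration set has to be representative; the dynamic path adapts at the cost of computing min/max during inference.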
Post Training Quantization
• Best quantization scheme for deep neural networks?
• Weights: Symmetric per channel
  • Static distribution makes weights easy to quantize
  • Weight distributions tend to be symmetric [3]
  • Symmetric per channel handles diversity in weight distributions
Figure: Empirical distribution in a pre-trained network
Post Training Quantization
• Activations: Asymmetric/Symmetric per tensor
  • The distribution changes with every inference, which makes its statistics difficult to determine
  • Approximated with a representative (calibration) dataset
  • Batch normalization yields distributions that are better suited to quantization
Quantization Tools: TFLite
• TFLite supports 8-bit integer PTQ [1]
• Quantization scheme
  • Weights: Symmetric per channel
  • Activations: Asymmetric per tensor
• Quantization analysis
  • Selective quantization with mixed precision (float32/16 + int8/int16)
  • Layerwise quantization error with custom metrics
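A minimal sketch of full-integer PTQ with the TFLite converter, following the workflow in the TensorFlow documentation cited as [1]. The helper names, the SavedModel path argument and the single-input assumption are hypothetical; TensorFlow is imported lazily so that the dataset helper can be used on its own.

```python
def make_representative_dataset(samples):
    """Wrap calibration samples in the generator form the converter expects."""
    def generator():
        for sample in samples:
            # Each yielded item is a list with one entry per model input
            # (assumption: the model has a single input).
            yield [sample]
    return generator


def convert_to_int8(saved_model_dir, samples):
    """Full-integer post-training quantization of a SavedModel (sketch).

    `samples` should be float32 arrays shaped like the model input,
    including the batch dimension.
    """
    import tensorflow as tf  # lazy import: only needed for the conversion itself

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = make_representative_dataset(samples)
    # Restrict to int8 kernels so conversion fails loudly on unsupported ops.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # serialized .tflite model as bytes
```

Restricting `supported_ops` to int8 builtins is a design choice: it surfaces layers without quantized kernels at conversion time rather than silently falling back to float.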
Quantization Tools: PyTorch
• PyTorch supports 8-bit integer PTQ [4]
• Quantization scheme
  • Weights: (A)symmetric per tensor/channel
  • Activations: (A)symmetric per tensor/channel
• Quantization analysis
  • Layerwise quantization error through custom metrics
Network Architecture
• FLOPs are not everything!
• Neural Architecture Search (NAS)
  • Most NAS-based models (e.g., EfficientNet) try to minimize compute
  • This results in deeper, leaner networks that work well with cache-based systems
Network Architecture
• Efficient architecture for quantization [5]
Network Architecture
• Quantization aware
  • Larger models have redundancy, which makes them more robust to quantization
• Quantization scheme
  • Enables the use of simpler, more efficient quantization schemes
Network Architecture
• Optimization tool chain
  • Aggressive layer fusion for optimal memory bandwidth
  • Optimal quantization parameter selection
• Hardware
  • Better suited to the target hardware (CPU/GPU/DSP/accelerator)
Calibration Dataset
• Representative dataset used to estimate the activation distributions
• Needs to cover the diversity of the use case
• Size: ~100-1000 images are typically sufficient [6]
Min/Max Tuning
• Minimize quantization error and eliminate outliers
• Trade-off: range vs. quantization error
• Mean/Standard deviation
  • Assumes a normal distribution
  • Min/Max: mean +/- 3*STD
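The mean +/- 3*STD rule can be sketched as follows. The activations are made up (a tight cluster plus one outlier), and intersecting the result with the observed min/max is an extra assumption so that the range never extends beyond the data.

```python
import statistics


def clip_range_mean_std(values, num_std=3.0):
    """Min/max estimate of mean +/- num_std * std, limited to the observed range."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    low = max(min(values), mean - num_std * std)
    high = min(max(values), mean + num_std * std)
    return low, high


# Hypothetical activations: values in [-1, 1] plus a single extreme outlier.
activations = [i / 100 for i in range(-100, 101)] + [50.0]

full_range = (min(activations), max(activations))
low, high = clip_range_mean_std(activations)
```

Here the outlier at 50.0 is excluded from the range, so the int8 levels are spent on the bulk of the distribution. Note that the outlier still inflates the standard deviation, which is one reason histogram-based methods are often preferred.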
Min/Max Tuning
• Histogram
  • Ignore the last x percent
• Moving average (TensorFlow default)
• Search max/min (PyTorch/TensorRT)
  • Find the histogram range that preserves the most entropy
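Two of these strategies are easy to sketch in plain Python: percentile clipping and a moving average of per-batch min/max. Splitting the ignored percentage evenly between both tails and the momentum value of 0.9 are assumptions for illustration; the entropy-based search is not shown.

```python
def percentile_range(values, ignore_percent=1.0):
    """Drop `ignore_percent` of the samples, split evenly between both tails."""
    ordered = sorted(values)
    k = int(len(ordered) * ignore_percent / 100.0 / 2.0)
    return ordered[k], ordered[len(ordered) - 1 - k]


def moving_average_range(batches, momentum=0.9):
    """Exponential moving average of the per-batch minimum and maximum."""
    running_min, running_max = min(batches[0]), max(batches[0])
    for batch in batches[1:]:
        running_min = momentum * running_min + (1 - momentum) * min(batch)
        running_max = momentum * running_max + (1 - momentum) * max(batch)
    return running_min, running_max


# Hypothetical activations: values in [-1, 1] plus a single extreme outlier.
values = [i / 100 for i in range(-100, 101)] + [50.0]
p_low, p_high = percentile_range(values, ignore_percent=1.0)

# Hypothetical calibration batches; only the last one contains the outlier.
batches = [[-1.0, 1.0], [-1.0, 1.0], [-1.0, 50.0]]
ma_low, ma_high = moving_average_range(batches)
```

Percentile clipping removes the outlier entirely, while the moving average only dampens it: a single batch with a maximum of 50.0 moves the running maximum from 1.0 to about 5.9.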
Quantization Evaluation
• Evaluate which quantization scheme best fits the model [2]
  • ResNet50: Symmetric per tensor
  • MobileNet: Asymmetric per channel
Quantization Evaluation
• Effects of quantization scheme on model accuracy [2]
  • Classification accuracy of the quantized model
Quantization Analysis
• What to do when quantization fails?
• Individual layer support for quantization
  • Identifying a few problematic layers can significantly improve performance
• Common pitfalls
  • Handling input/output quantization
  • Layer fusion before quantization
Quantization Analysis
• Analyze the sensitivity of individual layers to quantization [1]
• Selective quantization: mixed-precision inference for testing
Figure: activation range (y-axis) vs. layer number (x-axis), and root mean square error (RMSE) vs. activation range. There are many layers with wide ranges, and some layers have high RMSE/scale values.
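A per-layer report of range, RMSE and RMSE/scale, of the kind the figure summarizes, can be sketched as below. The layer names and activation dumps are hypothetical, and asymmetric per-tensor int8 is assumed; real tools collect these statistics from the quantized model itself.

```python
import math


def quant_error_stats(values, q_min=-128, q_max=127):
    """Range, scale, RMSE and RMSE/scale for asymmetric per-tensor int8."""
    r_min, r_max = min(min(values), 0.0), max(max(values), 0.0)
    scale = (r_max - r_min) / (q_max - q_min) or 1.0
    zero_point = round(q_min - r_min / scale)
    squared_error = 0.0
    for v in values:
        q = max(q_min, min(q_max, round(v / scale) + zero_point))
        squared_error += (v - scale * (q - zero_point)) ** 2
    rmse = math.sqrt(squared_error / len(values))
    return {
        "range": r_max - r_min,
        "scale": scale,
        "rmse": rmse,
        "rmse/scale": rmse / scale,
    }


# Hypothetical activation dumps for two layers with different ranges.
layers = {
    "conv1": [i / 50.0 for i in range(-50, 51)],
    "conv2": [i / 5.0 for i in range(-50, 51)],
}
report = {name: quant_error_stats(acts) for name, acts in layers.items()}

# Layers sorted by absolute error, worst first: candidates to keep in float
# or to move to a wider bit width.
by_rmse = sorted(report, key=lambda name: report[name]["rmse"], reverse=True)
```

Normalizing RMSE by the scale makes layers with different ranges comparable: a value well above what plain rounding would produce points to clipping or a poorly chosen range rather than to rounding noise alone.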
Quantization Analysis
• Non-linear activations: precision requirement and quantization support
  • ReLU/ReLU6 preferred over Sigmoid/LeakyReLU
• Weight/activation distribution: visualization or metrics for the data distribution, e.g., range
• Layer fusion (Conv + BN + ReLU / Conv + BN / Conv + ReLU) before quantization
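Conv + BN fusion works because batch normalization at inference time is a per-channel affine transform that can be folded into the preceding layer's weights and bias. The sketch below uses a 1x1 convolution on a single pixel (equivalent to a dense layer) and made-up parameters to keep the arithmetic visible.

```python
import math


def fold_batch_norm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-output-channel BN parameters into conv weights and bias.

    `weights` holds one list of weights per output channel.
    """
    folded_w, folded_b = [], []
    for w_ch, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        factor = g / math.sqrt(v + eps)
        folded_w.append([w * factor for w in w_ch])
        folded_b.append((b - m) * factor + bt)
    return folded_w, folded_b


def conv_1x1(x, weights, bias):
    """1x1 convolution applied to one pixel: a dot product per output channel."""
    return [sum(w * xi for w, xi in zip(w_ch, x)) + b for w_ch, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


# Hypothetical layer: 3 input channels, 2 output channels.
weights = [[0.5, -1.0, 0.25], [2.0, 0.1, -0.3]]
bias = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.05, -0.1]
mean, var = [0.3, -0.4], [0.9, 2.5]

x = [1.0, -2.0, 0.5]
reference = batch_norm(conv_1x1(x, weights, bias), gamma, beta, mean, var)
folded_w, folded_b = fold_batch_norm(weights, bias, gamma, beta, mean, var)
fused = conv_1x1(x, folded_w, folded_b)
```

Fusing before quantization means only the folded weights and the fused output need quantization parameters, instead of quantizing the intermediate conv output separately.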
Quantization Analysis
• Use a larger bit width for more sensitive layers, e.g., fully connected layers and the network head
  • Int16 activation support in TFLite
• Min/Max tuning: clip outlier weights that cause all other weights to be represented less precisely
Quantization Analysis
• Large differences in weight values across output channels lead to more quantization error
  • Asymmetric/Symmetric per channel quantization
  • Weight equalization techniques to minimize the variation [7]
Quantization Analysis
• Weight equalization makes the model quantization friendly
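One common form of weight equalization is cross-layer scaling between two layers separated by a ReLU. The sketch below assumes that form with the scaling factor s_i = sqrt(r1_i / r2_i), where r1_i and r2_i are the weight ranges of channel i in the first and second layer; it is an illustration on made-up weights, not a reproduction of any toolkit's implementation.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def dense(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def equalize(w1, b1, w2):
    """Rescale hidden channel i by 1/s_i in layer 1 and by s_i in layer 2.

    Because relu(a / s) == relu(a) / s for s > 0, the network output is unchanged.
    Assumes every channel has at least one non-zero weight in both layers.
    """
    w1_eq, b1_eq = [], []
    w2_eq = [row[:] for row in w2]
    for i, (row, b) in enumerate(zip(w1, b1)):
        r1 = max(abs(w) for w in row)
        r2 = max(abs(out_row[i]) for out_row in w2)
        s = math.sqrt(r1 / r2)
        w1_eq.append([w / s for w in row])
        b1_eq.append(b / s)
        for out_row in w2_eq:
            out_row[i] *= s
    return w1_eq, b1_eq, w2_eq


def channel_range_ratio(weights):
    """Largest per-channel range divided by the smallest."""
    ranges = [max(abs(w) for w in row) for row in weights]
    return max(ranges) / min(ranges)


# Hypothetical two-layer network with badly mismatched channel ranges.
w1, b1 = [[0.01, -0.02], [4.0, 2.0]], [0.001, 0.5]
w2, b2 = [[5.0, 0.1], [-3.0, 0.05]], [0.0, 1.0]

w1_eq, b1_eq, w2_eq = equalize(w1, b1, w2)

x = [1.0, -0.5]
before = dense(relu(dense(x, w1, b1)), w2, b2)
after = dense(relu(dense(x, w1_eq, b1_eq)), w2_eq, b2)
```

After equalization the per-channel ranges of the first layer are much closer to each other, so a single per-tensor scale wastes far fewer levels, while the floating-point output of the network stays the same.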
Quantization Aware Training
• When everything else fails!
• QAT is a fine-tuning process
  • Start from a trained floating-point model and fine-tune with reduced momentum and learning rate
Quantization Aware Training
• Insert quantization nodes during training [8]
• Simulate quantization using floating-point operations
• Tune quantization parameters during training
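A quantization node of this kind is often called "fake quantization": it quantizes and immediately dequantizes, so the forward pass sees the rounding and clipping error while all tensors stay in floating point. The sketch below shows the forward function and, as an assumption not stated on the slide, a straight-through estimator for the backward pass, which is a common choice because rounding has zero gradient almost everywhere.

```python
def fake_quantize(x, scale, zero_point, q_min=-128, q_max=127):
    """Forward pass: quantize then dequantize, staying in floating point."""
    q = round(x / scale) + zero_point
    q = max(q_min, min(q_max, q))
    return scale * (q - zero_point)


def fake_quantize_grad(x, scale, zero_point, q_min=-128, q_max=127):
    """Backward pass (straight-through estimator, an assumption here):
    pass the gradient unchanged inside the range, block it where the value was clamped.
    """
    q = round(x / scale) + zero_point
    return 1.0 if q_min <= q <= q_max else 0.0


scale, zero_point = 0.1, 0

inside = fake_quantize(0.234, scale, zero_point)      # rounded to the nearest step
clipped = fake_quantize(100.0, scale, zero_point)     # clamped to the top of the range
grad_inside = fake_quantize_grad(0.234, scale, zero_point)
grad_clipped = fake_quantize_grad(100.0, scale, zero_point)
```

During fine-tuning, the weights adapt to the error introduced by `fake_quantize`, and the scale and zero point can be updated from running statistics or learned along with the weights.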
Quantization Aware Training
• QAT can achieve < 1% accuracy degradation with respect to floating-point inference [2]
Best Practices
• Model selection
  • NAS: efficient architecture for quantization
• Quantization tools
  • Support for quantization schemes and analysis tools
• Calibration dataset
  • Representative dataset with ~100-1000 samples
Best Practices
• Quantization accuracy
  • Evaluate the best-fit quantization scheme for the model
• Quantization analysis
  • Identify potentially problematic layers
• Quantization aware training
  • Fine-tune the model for quantization
References
1. https://www.tensorflow.org/lite/performance/post_training_quantization
2. Quantizing deep convolutional networks for efficient inference: A whitepaper [link]
3. Fixed Point Quantization of Deep Convolutional Networks [link]
4. https://pytorch.org/docs/stable/quantization.html
5. https://deci.ai/resources/achieve-fp32-accuracy-int8-inference-speed/
6. SelectQ: Calibration Data Selection for Post-Training Quantization [link]
7. AI Model Efficiency Toolkit (AIMET) [link]
8. Aspects and best practices of quantization aware training for custom network accelerators [link]

More Related Content

Similar to “Practical Approaches to DNN Quantization,” a Presentation from Magic Leap

Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...jemin lee
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)DonghyunKang12
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPTUpender Upr
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsQuantUniversity
 
Networkproposalppt 101202160050-phpapp01
Networkproposalppt 101202160050-phpapp01Networkproposalppt 101202160050-phpapp01
Networkproposalppt 101202160050-phpapp01hamza khan
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP Project
 
Securing Kubernetes Workloads
Securing Kubernetes WorkloadsSecuring Kubernetes Workloads
Securing Kubernetes WorkloadsJim Bugwadia
 
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...Tahmid Abtahi
 
Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...WMLab,NCU
 
Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Mahdi Hosseini Moghaddam
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONI Ruby
 
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...Edge AI and Vision Alliance
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreePradeeban Kathiravelu, Ph.D.
 
Service mesh on Kubernetes - Istio 101
Service mesh on Kubernetes - Istio 101Service mesh on Kubernetes - Istio 101
Service mesh on Kubernetes - Istio 101Huy Vo
 
Cp7101 design and management of computer networks -network
Cp7101 design and management of computer networks -networkCp7101 design and management of computer networks -network
Cp7101 design and management of computer networks -networkDr Geetha Mohan
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningCloudLightning
 
Application of Parallel Processing
Application of Parallel ProcessingApplication of Parallel Processing
Application of Parallel Processingare you
 

Similar to “Practical Approaches to DNN Quantization,” a Presentation from Magic Leap (20)

Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...Integer quantization for deep learning inference: principles and empirical ev...
Integer quantization for deep learning inference: principles and empirical ev...
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
Grid computing Seminar PPT
Grid computing Seminar PPTGrid computing Seminar PPT
Grid computing Seminar PPT
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
 
Networkproposalppt 101202160050-phpapp01
Networkproposalppt 101202160050-phpapp01Networkproposalppt 101202160050-phpapp01
Networkproposalppt 101202160050-phpapp01
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
RECAP: The Simulation Approach
RECAP: The Simulation ApproachRECAP: The Simulation Approach
RECAP: The Simulation Approach
 
Cluster Computing
Cluster ComputingCluster Computing
Cluster Computing
 
Securing Kubernetes Workloads
Securing Kubernetes WorkloadsSecuring Kubernetes Workloads
Securing Kubernetes Workloads
 
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
A Framework for Scene Recognition Using Convolutional Neural Network as Featu...
 
Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...Probabilistic consolidation of virtual machines in self organizing cloud data...
Probabilistic consolidation of virtual machines in self organizing cloud data...
 
Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...Application of machine learning and cognitive computing in intrusion detectio...
Application of machine learning and cognitive computing in intrusion detectio...
 
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTIONMOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
MOBILE CLOUD COMPUTING USING CRYPTOGRAPHIC HASH FUNCTION
 
Grid computiing
Grid computiingGrid computiing
Grid computiing
 
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisc...
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
Service mesh on Kubernetes - Istio 101
Service mesh on Kubernetes - Istio 101Service mesh on Kubernetes - Istio 101
Service mesh on Kubernetes - Istio 101
 
Cp7101 design and management of computer networks -network
Cp7101 design and management of computer networks -networkCp7101 design and management of computer networks -network
Cp7101 design and management of computer networks -network
 
Simulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightningSimulating Heterogeneous Resources in CloudLightning
Simulating Heterogeneous Resources in CloudLightning
 
Application of Parallel Processing
Application of Parallel ProcessingApplication of Parallel Processing
Application of Parallel Processing
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...


“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap

  • 1. Practical Approaches to DNN Quantization. Dwith Chenna, Senior Embedded DSP Eng., Computer Vision, Magic Leap Inc.
  • 2. Contents
      • Why Quantization?
      • Quantization Scheme
      • Types of Quantization
      • Post Training Quantization
      • Quantization Tools
      • Network Architecture
      • Calibration Dataset
      • Min/Max Tuning
      • Quantization Evaluation
      • Quantization Analysis
      • Quantization Aware Training
      • Best Practices
      © 2023 Magic Leap
  • 3. Why Quantization?
      • Quantization is a powerful tool for enabling deep learning on edge devices
      • Edge hardware is resource-constrained, with limited memory and a low power budget
  • 4. Why Quantization?
      • Model compression: up to 4x smaller network size and memory bandwidth (float32 to int8)
      • Latency reduction: up to 2x-3x, since int8 compute is significantly faster than float32 [1]
      • Trade-off: potential loss of model accuracy
  • 5. Quantization Scheme
      • Convert full-precision floating-point numbers to int8 [2]
          • q: quantized value, r: real value, s: scale, z: zero point
      • Convert the quantized value back to its floating-point representation
      • The scale and zero point are derived from the range of the floating-point distribution
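A minimal sketch of this scheme, assuming the standard affine mapping q = clamp(round(r / s) + z) with inverse r ≈ s (q - z), and a scale and zero point derived from an observed range [r_min, r_max]. The function names, the int8 limits of -128 and 127, and the choice to widen the range so that 0.0 is exactly representable are illustrative assumptions, not taken from the slides.

```python
def quant_params(r_min, r_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping [r_min, r_max] onto [q_min, q_max]."""
    # Assumption: widen the range to include 0.0 so zero maps to an exact code.
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)
    s = (r_max - r_min) / (q_max - q_min)
    if s == 0.0:          # degenerate all-zero tensor
        s = 1.0
    z = round(q_min - r_min / s)
    return s, z


def quantize(r, s, z, q_min=-128, q_max=127):
    """Real value -> integer code, clamped to the representable range."""
    q = round(r / s) + z
    return max(q_min, min(q_max, q))


def dequantize(q, s, z):
    """Integer code -> approximate real value."""
    return s * (q - z)


s, z = quant_params(-1.0, 3.0)
values = [-1.0, 0.0, 0.5, 3.0]
codes = [quantize(r, s, z) for r in values]
recovered = [dequantize(q, s, z) for q in codes]
```

With this range the round-trip error of every in-range value stays within half a quantization step (s / 2), which is the precision cost the trade-off bullet refers to.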
  • 6. Quantization Scheme: Symmetric
      • Assumes a symmetric distribution for simplicity, so zero point = 0
      • Symmetric per tensor
          • Calculate one scale for the entire tensor
      • Symmetric per channel
          • Calculate a scale for each channel of the tensor
      • Computationally efficient
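The difference between the two granularities can be sketched as follows. This is an illustration under assumptions: the weight values are made up, the scale is taken as max|w| / 127, and `symmetric_scale` and `roundtrip_error` are hypothetical helper names.

```python
def symmetric_scale(values, q_max=127):
    """Scale for symmetric quantization; the zero point is fixed at 0."""
    max_abs = max(abs(v) for v in values)
    return max_abs / q_max if max_abs > 0 else 1.0


def roundtrip_error(values, scale):
    """Largest absolute error after quantize -> dequantize with the given scale."""
    return max(abs(v - scale * round(v / scale)) for v in values)


# Hypothetical weight tensor with two output channels of very different magnitude.
weights = [
    [0.02, -0.01, 0.015],  # channel 0: small weights
    [1.50, -2.00, 0.75],   # channel 1: large weights
]

# Per tensor: one scale shared by every channel.
per_tensor_scale = symmetric_scale([w for channel in weights for w in channel])
# Per channel: one scale per output channel.
per_channel_scales = [symmetric_scale(channel) for channel in weights]

# Error on the small-magnitude channel under each choice.
err_tensor = roundtrip_error(weights[0], per_tensor_scale)
err_channel = roundtrip_error(weights[0], per_channel_scales[0])
```

The shared per-tensor scale is dictated by the large channel, so the small channel collapses onto one or two integer codes; its own per-channel scale keeps the error far smaller.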
  • 7. Quantization Scheme: Asymmetric
      • Accounts for shifts in the distribution, giving better utilization of the quantization range
      • Asymmetric per tensor
          • One scale and zero point for the entire tensor
      • Asymmetric per channel
          • A scale and zero point for each channel of the tensor
      • Better handling of diverse distributions
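To illustrate "better utilization of the quantization range", the sketch below counts how many int8 codes a one-sided range such as [0, 6] (for example, a ReLU6 output) occupies under each scheme. The helper names and the example range are assumptions chosen for illustration, and the range is assumed to contain 0.

```python
def symmetric_params(r_min, r_max, q_max=127):
    """Symmetric: scale from the largest magnitude, zero point fixed at 0."""
    return max(abs(r_min), abs(r_max)) / q_max, 0


def asymmetric_params(r_min, r_max, q_min=-128, q_max=127):
    """Asymmetric: scale from the full width, zero point shifts the range."""
    s = (r_max - r_min) / (q_max - q_min)
    return s, round(q_min - r_min / s)


def codes_used(r_min, r_max, s, z):
    """Number of distinct integer codes that [r_min, r_max] maps onto."""
    return (round(r_max / s) + z) - (round(r_min / s) + z) + 1


r_min, r_max = 0.0, 6.0
sym_s, sym_z = symmetric_params(r_min, r_max)
asym_s, asym_z = asymmetric_params(r_min, r_max)

sym_codes = codes_used(r_min, r_max, sym_s, sym_z)      # half of the int8 range
asym_codes = codes_used(r_min, r_max, asym_s, asym_z)   # the full int8 range
```

The asymmetric scheme spends all 256 codes on the observed range and therefore has roughly half the step size, at the cost of carrying a non-zero zero point through the arithmetic.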
  • 9. Types of Quantization
      • Post Training Quantization (PTQ)
          • Simple yet efficient
          • Uses an already trained model and a calibration dataset
      • Quantization Aware Training (QAT)
          • Emulates inference-time quantization
          • Resource-intensive, as it requires retraining
  • 10. Post Training Quantization
      • Dynamic quantization
          • Weights are quantized ahead of time
          • Activations are quantized during inference (dynamically)
      • Static quantization
          • Both weights and activations are quantized, using parameters fixed before inference
          • Memory bandwidth and compute savings
          • Needs a representative dataset
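A small sketch of how the two modes treat activations, assuming symmetric int8 quantization and made-up data. `static_quantize` and `dynamic_quantize` are hypothetical names; real frameworks implement this inside their kernels.

```python
def symmetric_scale(values, q_max=127):
    max_abs = max(abs(v) for v in values)
    return max_abs / q_max if max_abs > 0 else 1.0


def quantize(values, scale, q_max=127):
    """Symmetric int8 quantization with saturation."""
    return [max(-q_max, min(q_max, round(v / scale))) for v in values]


# Static: the activation scale is computed once, offline, from calibration data.
calibration_batches = [[0.1, -0.4, 0.9], [0.3, -1.2, 0.5]]
static_scale = symmetric_scale([v for batch in calibration_batches for v in batch])


def static_quantize(activations):
    return quantize(activations, static_scale), static_scale


# Dynamic: the activation scale is recomputed from each input at inference time.
def dynamic_quantize(activations):
    scale = symmetric_scale(activations)
    return quantize(activations, scale), scale


# An input whose range exceeds anything seen during calibration.
outlier_input = [0.2, -3.0, 1.0]
static_q, _ = static_quantize(outlier_input)
dynamic_q, dynamic_scale = dynamic_quantize(outlier_input)
```

Here the static path saturates -3.0 to the calibrated limit of -1.2, which is why the calibration set has to be representative, while the dynamic path adapts but pays for the range computation on every inference.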
  • 11. Post Training Quantization
      • Best quantization scheme for deep neural networks?
      • Weights: symmetric per channel
          • Weights are static, which makes them easy to quantize
          • Weight distributions tend to be symmetric [3]
          • Symmetric per-channel quantization handles diversity in weight distributions
      • [Figure: empirical distribution in a pre-trained network]
  • 12. Post Training Quantization
      • Activations: asymmetric/symmetric per tensor
          • Distributions change with every inference, which makes their statistics difficult to determine
          • Approximated using a representative/calibration dataset
          • Batch normalization yields distributions that are better suited to quantization
  • 14. Quantization Tools: TFLite
      • TFLite supports 8-bit integer PTQ [1]
      • Quantization scheme
          • Weights: symmetric per channel
          • Activations: asymmetric per tensor
      • Quantization analysis
          • Selective quantization with mixed precision (float32/float16 + int8/int16)
          • Layer-wise quantization error with custom metrics
  • 15. Quantization Tools: PyTorch
      • PyTorch supports 8-bit integer PTQ [4]
      • Quantization scheme
          • Weights: (a)symmetric per tensor/channel
          • Activations: (a)symmetric per tensor/channel
      • Quantization analysis
          • Layer-wise quantization error through custom metrics
  • 17. Network Architecture
      • FLOPs are not everything!
      • Neural Architecture Search (NAS)
          • Most NAS-based models (e.g., EfficientNet) try to minimize compute
          • This results in deeper, leaner networks that work well with cache-based systems
  • 18. Network Architecture
      • Efficient architecture for quantization [5]
  • 19. Network Architecture
      • Quantization aware
          • Larger models have redundancy, which makes them more robust to quantization
      • Quantization scheme
          • Enables the use of simpler, more efficient quantization schemes
  • 20. Network Architecture
      • Optimization toolchain
          • Aggressive layer fusion for optimal memory bandwidth
          • Optimal quantization parameter selection
      • Hardware
          • Better suited to the target hardware (CPU/GPU/DSP/accelerator)
  • 22. Calibration Dataset
      • A representative dataset is used to estimate the activation distributions
      • It needs to cover the diversity of the use case
      • Size: ~100-1000 images are statistically significant [6]
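Estimating an activation range from a calibration set can be sketched as a simple observer that records the running min/max of a layer's output. `MinMaxObserver` and `run_layer` are hypothetical names for this illustration, and the three-sample dataset only stands in for the ~100-1000 images suggested above.

```python
class MinMaxObserver:
    """Tracks the running min/max of one activation tensor over calibration data."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")
        self.num_samples = 0

    def observe(self, activations):
        self.min_val = min(self.min_val, min(activations))
        self.max_val = max(self.max_val, max(activations))
        self.num_samples += 1


def run_layer(sample):
    """Hypothetical stand-in for one layer's forward pass (here: a ReLU)."""
    return [max(0.0, x) for x in sample]


# Tiny stand-in for a representative dataset.
calibration_set = [[-0.5, 0.2, 1.1], [0.7, -2.0, 3.4], [0.0, 0.9, 2.2]]

observer = MinMaxObserver()
for sample in calibration_set:
    observer.observe(run_layer(sample))

activation_range = (observer.min_val, observer.max_val)
```

The recorded range is only as good as the samples that produced it, which is why the dataset must cover the diversity of the deployment inputs.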
  • 23. Min/Max Tuning
      • Goal: minimize quantization error and eliminate outliers
      • Trade-off: range vs. quantization error
      • Mean/standard deviation
          • Assumes a normal distribution
          • Min/Max: mean +/- 3*STD
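The mean +/- 3*STD rule can be sketched with the standard library as below. Clamping the result to the observed min/max is an added assumption (so the range never extends past the data), and the sample values are made up.

```python
import statistics


def mean_std_range(values, num_std=3.0):
    """Clip the quantization range to mean +/- num_std standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    # Assumption: never extend the range beyond what was actually observed.
    lo = max(min(values), mean - num_std * std)
    hi = min(max(values), mean + num_std * std)
    return lo, hi


# Well-behaved activations in [-2, 2] plus a single large outlier.
activations = [0.1 * i for i in range(-20, 21)] + [50.0]
lo, hi = mean_std_range(activations)
```

The outlier at 50.0 is excluded from the range, so the remaining values are quantized with a smaller step; the price is that the outlier itself saturates.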
  • 24. Min/Max Tuning
      • Histogram
          • Ignore the last x percent
      • Moving average (TensorFlow default)
      • Search max/min (PyTorch/TensorRT)
          • Find the histogram range that covers most of the entropy
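Two of these strategies are easy to sketch: percentile clipping (dropping the last x percent of samples from each tail) and a moving average of per-batch min/max. The moving average is assumed here to be exponential with a momentum of 0.9, and both function names are hypothetical. The entropy-based search is not shown.

```python
def percentile_range(values, clip_percent=1.0):
    """Drop clip_percent of the samples from each tail and return (min, max)."""
    ordered = sorted(values)
    k = int(len(ordered) * clip_percent / 100.0)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return kept[0], kept[-1]


def moving_average_range(batches, momentum=0.9):
    """Exponential moving average of the per-batch min and max."""
    avg_min = avg_max = None
    for batch in batches:
        b_min, b_max = min(batch), max(batch)
        if avg_min is None:
            avg_min, avg_max = b_min, b_max
        else:
            avg_min = momentum * avg_min + (1 - momentum) * b_min
            avg_max = momentum * avg_max + (1 - momentum) * b_max
    return avg_min, avg_max


# 100 ordinary values plus one outlier at each end.
values = [float(i) for i in range(100)] + [1000.0, -1000.0]
p_min, p_max = percentile_range(values, clip_percent=1.0)

# The last batch has an unusually wide range; the average only moves partway.
batches = [[0.0, 1.0], [0.0, 1.0], [-1.0, 9.0]]
ma_min, ma_max = moving_average_range(batches)
```

Both approaches trade a little saturation on rare extreme values for a finer step across the bulk of the distribution.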
  • 26. Quantization Evaluation
      • Evaluate which quantization scheme best fits the model [2]
          • ResNet50: symmetric per tensor
          • MobileNet: asymmetric per channel
  • 27. Quantization Evaluation
      • Effect of the quantization scheme on model accuracy [2]
      • Classification accuracy of the quantized model
  • 29. Quantization Analysis
      • What to do when quantization fails?
      • Individual layer support for quantization
          • Identifying a few problematic layers can significantly improve performance
      • Common pitfalls
          • Handling input/output quantization
          • Layer fusion before quantization
  • 30. Quantization Analysis
      • Analyze each layer's sensitivity to quantization [1]
      • Selective quantization: mixed-precision inference for testing
      • [Figures: layer number (x-axis) vs. activation range (y-axis); root mean square error (RMSE) vs. activation range]
      • Many layers have wide ranges, and some layers have high RMSE/scale values
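A per-layer RMSE/scale metric of the kind mentioned above can be sketched as follows. The layer names, activations, and scales are made up, and `rmse_over_scale` is a hypothetical helper. For pure rounding error spread uniformly within one step, the ratio is about 1/sqrt(12) ≈ 0.29, so values far above that point to clipping or a poorly chosen range.

```python
import math


def rmse_over_scale(float_values, scale, zero_point=0, q_min=-128, q_max=127):
    """RMSE between float values and their quantize/dequantize round trip, in units of scale."""
    squared_error = 0.0
    for r in float_values:
        q = max(q_min, min(q_max, round(r / scale) + zero_point))
        squared_error += (r - scale * (q - zero_point)) ** 2
    return math.sqrt(squared_error / len(float_values)) / scale


# Hypothetical layers: identical activations, different quantization scales.
activations = [0.1 * i for i in range(-50, 51)]  # spans [-5, 5]
layers = {
    "conv1": (activations, 5.0 / 127),  # scale covers the full range
    "head": (activations, 1.0 / 127),   # scale too small: values beyond +/-1 clip
}

scores = {name: rmse_over_scale(vals, scale) for name, (vals, scale) in layers.items()}
worst_layer = max(scores, key=scores.get)
```

Layers that rank highest on this metric are the natural candidates to keep in higher precision during selective (mixed-precision) quantization.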
  • 31. Quantization Analysis
      • Non-linear activations: precision requirements and quantization support
          • ReLU/ReLU6 preferred over Sigmoid/LeakyReLU
      • Weight/activation distribution: visualization or metrics for the data distribution, such as range
      • Layer fusion: fuse Conv + BN + ReLU, Conv + BN, or Conv + ReLU before quantization
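Fusing Conv + BN amounts to folding the batch-norm statistics into the convolution's weights and bias. The sketch below assumes per-output-channel BN parameters and uses a 1x1 convolution at a single pixel (a dot product per channel) to keep the check small; all names and values are illustrative.

```python
import math


def fold_batchnorm(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into conv weights and bias.

    weights: one list of weights per output channel.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        factor = g / math.sqrt(v + eps)
        folded_w.append([w * factor for w in w_c])
        folded_b.append(bt + (b_c - mu) * factor)
    return folded_w, folded_b


def conv1x1(x, weights, bias):
    """1x1 convolution at one pixel: a dot product per output channel."""
    return [sum(w * xi for w, xi in zip(w_c, x)) + b for w_c, b in zip(weights, bias)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    return [g * (yi - mu) / math.sqrt(v + eps) + bt
            for yi, g, bt, mu, v in zip(y, gamma, beta, mean, var)]


w = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [4.0, 0.25]
x = [1.0, 2.0]

reference = batchnorm(conv1x1(x, w, b), gamma, beta, mean, var)  # Conv then BN
fused_w, fused_b = fold_batchnorm(w, b, gamma, beta, mean, var)
fused = conv1x1(x, fused_w, fused_b)                             # single fused Conv
```

Quantizing the fused weights, rather than the original ones, means the ranges seen by the quantizer match what actually runs at inference time.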
  • 32. Quantization Analysis
      • Use a larger bit width for more sensitive layers, e.g., fully connected layers and the network head
          • Int16 activation support in TFLite
      • Min/Max tuning: outlier weights cause all other weights to be represented less precisely
  • 33. Quantization Analysis
      • Large differences in weight values across output channels lead to more quantization error
          • Asymmetric/symmetric per-channel quantization
          • Weight equalization techniques to minimize the variation [7]
  • 34. Quantization Analysis
      • Weight equalization makes the model quantization-friendly
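One common form of weight equalization is cross-layer equalization between two consecutive layers separated by a ReLU. The sketch below assumes that form, with per-channel scale s_i = sqrt(r1_i / r2_i), where r1_i is the range of output channel i in the first layer and r2_i the range of the matching input channel in the second layer. It is an illustration with made-up weights, not the AIMET implementation, and it assumes every range is non-zero.

```python
import math


def equalize(w1, b1, w2):
    """Rescale channel i of layer 1 by 1/s_i and the matching inputs of layer 2 by s_i.

    w1: [out1][in1], b1: [out1], w2: [out2][out1].
    """
    n = len(w1)
    scales = []
    for i in range(n):
        r1 = max(abs(v) for v in w1[i])        # range of output channel i
        r2 = max(abs(row[i]) for row in w2)    # range of the matching input channel
        scales.append(math.sqrt(r1 / r2))
    new_w1 = [[v / scales[i] for v in w1[i]] for i in range(n)]
    new_b1 = [b1[i] / scales[i] for i in range(n)]
    new_w2 = [[row[i] * scales[i] for i in range(n)] for row in w2]
    return new_w1, new_b1, new_w2


def forward(x, w1, b1, w2):
    """Layer 1 -> ReLU -> layer 2 (no bias on layer 2, for brevity)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Channel 0 of layer 1 is tiny and channel 1 is large; layer 2 is the opposite.
w1 = [[0.01, -0.02], [4.0, 2.0]]
b1 = [0.005, -1.0]
w2 = [[8.0, 0.1], [-4.0, 0.05]]
x = [1.0, 0.5]

e_w1, e_b1, e_w2 = equalize(w1, b1, w2)
before = forward(x, w1, b1, w2)
after = forward(x, e_w1, e_b1, e_w2)

ranges_l1 = [max(abs(v) for v in row) for row in e_w1]
ranges_l2 = [max(abs(row[i]) for row in e_w2) for i in range(len(e_w1))]
```

The network output is unchanged because ReLU commutes with positive scaling, while each channel's range in the first layer now matches the range of its consumer in the second, so a single per-tensor scale wastes far less precision.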
  • 36. Quantization Aware Training
      • When everything else fails!
      • QAT is a fine-tuning process
          • Start from the trained floating-point model, with reduced momentum and learning rate
  • 37. Quantization Aware Training
      • Insert quantization nodes during training [8]
      • Simulate quantization using floating-point operations
      • Tune quantization parameters during training
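A quantization node of this kind is often called a "fake quantize" operation: it rounds and clamps in the forward pass but keeps everything in floating point. The sketch below shows the forward pass together with a straight-through estimator for the backward pass. The straight-through estimator is an assumption here (it is common practice, but the slides do not name it), and the function names are hypothetical.

```python
def fake_quantize(x, scale, zero_point=0, q_min=-128, q_max=127):
    """Forward pass: quantize, then immediately dequantize, staying in float."""
    q = round(x / scale) + zero_point
    q = max(q_min, min(q_max, q))
    return scale * (q - zero_point)


def fake_quantize_grad(x, scale, zero_point=0, q_min=-128, q_max=127):
    """Straight-through estimator: pass the gradient through unchanged inside
    the representable range, and block it where the value was clipped."""
    lo = scale * (q_min - zero_point)
    hi = scale * (q_max - zero_point)
    return 1.0 if lo <= x <= hi else 0.0


scale = 0.1
in_range = fake_quantize(0.26, scale)      # snaps to the nearest step
clipped = fake_quantize(100.0, scale)      # saturates at the top of the range
grad_in_range = fake_quantize_grad(0.26, scale)
grad_clipped = fake_quantize_grad(100.0, scale)
```

Because the training loss sees the rounded and clipped values, the weights can adapt to the quantization grid during fine-tuning.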
  • 38. Quantization Aware Training
      • QAT can achieve < 1% accuracy degradation w.r.t. floating-point inference [2]
  • 40. Best Practices
      • Model selection
          • NAS: efficient architecture for quantization
      • Quantization tools
          • Support for quantization schemes and analysis tools
      • Calibration dataset
          • Representative dataset with ~100-1000 samples
  • 41. Best Practices
      • Quantization accuracy
          • Evaluate the best-fit quantization scheme for the model
      • Quantization analysis
          • Identify potentially problematic layers
      • Quantization aware training
          • Fine-tune the model for quantization
  • 42. References
      1. https://www.tensorflow.org/lite/performance/post_training_quantization
      2. Quantizing deep convolutional networks for efficient inference: A whitepaper
      3. Fixed Point Quantization of Deep Convolutional Networks
      4. https://pytorch.org/docs/stable/quantization.html
      5. https://deci.ai/resources/achieve-fp32-accuracy-int8-inference-speed/
      6. SelectQ: Calibration Data Selection for Post-Training Quantization
      7. AI Model Efficiency Toolkit (AIMET)
      8. Aspects and best practices of quantization aware training for custom network accelerators