© GO Inc.
Model Quantization Technologies with AIMET
郭 卓然

Outline
● Challenges of edge AI workloads
● Introduction to AIMET
● Post-training quantization (PTQ) techniques
● Quantization-aware training (QAT)
● Quantization simulation

Challenges of edge AI workloads
● Power and thermal efficiency are essential for on-device AI
1. Limited resources:
Edge devices typically have limited computational power, memory, and energy
resources compared with cloud servers and desktop PCs.
2. Latency and real-time processing:
Edge AI often requires real-time or near-real-time processing to enable applications

Methods to improve model performance on edge devices
● Model quantization:
Reduce bit precision while keeping the desired accuracy
● Model compression:
Reduce model size while keeping the desired accuracy
● Neural architecture search:
Design smaller neural networks suited to real hardware

Model Quantization
[Figure: INT8, FP16, and FP32 placed along two axes labeled "Inference Speed" and "Higher Accuracy"]
● INT8 inference is faster than FP32, but it typically sacrifices some accuracy
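One reason lower precision helps on edge devices is that each weight occupies fewer bytes. The sketch below only does the storage arithmetic. The parameter count is a hypothetical value chosen for illustration, not a figure from the slides.

```python
# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}


def weight_storage_mib(num_params: int, precision: str) -> float:
    """Return the weight storage in MiB for a model with num_params weights."""
    return num_params * BYTES_PER_WEIGHT[precision] / (1024 ** 2)


# Hypothetical model size, chosen only for illustration.
NUM_PARAMS = 10_000_000

for precision in ("FP32", "FP16", "INT8"):
    size = weight_storage_mib(NUM_PARAMS, precision)
    print(f"{precision}: {size:.1f} MiB")
```

Storage is only part of the picture: actual speedups also depend on whether the target hardware has fast integer arithmetic units.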

Introduction to AIMET
AIMET (AI Model Efficiency Toolkit) provides model quantization and compression techniques for AI models.
Reference: https://github.com/quic/aimet

AIMET Features
● Supports model quantization and compression techniques
● Supports both TensorFlow and PyTorch
● Benchmarks and tests for many models
● User-friendly APIs
● Provides visualization tools for debugging and analyzing models

AIMET Model Quantization Use Cases
● Post-Training Quantization (PTQ):
Quantizes the model after it has been trained
● Quantization-Aware Training (QAT):
Fine-tunes the model to recover accuracy lost to quantization
● Quantization Simulation:
Predicts on-target accuracy before the model is deployed to hardware

Post-Training Quantization (PTQ)
PTQ:
● Quantizes a trained model without retraining it
Features of PTQ:
● PTQ methods can be data-free
● PTQ methods can also use calibration data for range analysis
○ Calibration data is used to determine the step size for activations
○ The step size for weights can be determined without any data
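As a rough illustration of range analysis, the sketch below derives a scale (step size) and an integer offset from observed minimum and maximum values. The function name `compute_encoding`, the toy ReLU layer, and the plain min/max scheme are assumptions made for this example; AIMET's own quantization schemes may choose ranges differently.

```python
def compute_encoding(values, bitwidth=8):
    """Derive an asymmetric (scale, offset) pair from the min/max of values."""
    num_steps = 2 ** bitwidth - 1
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    scale = (hi - lo) / num_steps
    if scale == 0.0:  # guard against an all-zero tensor
        scale = 1.0
    offset = round(lo / scale)  # integer grid index of the range minimum
    return scale, offset


def relu_layer(x, weights, bias):
    """Toy single-output layer: ReLU(w . x + b)."""
    return max(0.0, sum(w * xi for w, xi in zip(weights, x)) + bias)


weights = [0.5, -1.25, 0.75]
bias = 0.1

# Weights: the range is read from the tensor itself, so no data is needed.
weight_scale, weight_offset = compute_encoding(weights)

# Activations: the range is only known after running calibration inputs.
calibration_inputs = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0], [3.0, 0.5, 1.0]]
activations = [relu_layer(x, weights, bias) for x in calibration_inputs]
act_scale, act_offset = compute_encoding(activations)

print("weights:", weight_scale, weight_offset)
print("activations:", act_scale, act_offset)
```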
AutoQuant:
● AIMET provides the AutoQuant feature, which analyzes the model, determines the sequence of
quantization techniques, and applies them.
○ AutoQuant saves time by automating the quantization of neural networks.

AutoQuant
● Designed to find the best combination of quantization methods to maximize model performance
● AutoQuant applies these optimizations for better performance:
○ Cross-Layer Equalization (CLE):
Equalizes weight ranges in consecutive layers
■ Markus Nagel, Mart van Baalen, "Data-Free Quantization Through Weight Equalization and Bias Correction"
https://arxiv.org/pdf/1906.04721.pdf
○ Bias Correction (BC):
Corrects the bias parameters of individual layers in the quantized model
■ Markus Nagel, Mart van Baalen, "Data-Free Quantization Through Weight Equalization and Bias Correction"
https://arxiv.org/pdf/1906.04721.pdf
○ Adaptive Rounding (AdaRound):
Determines the optimal rounding for weight tensors to improve quantized performance
■ Markus Nagel, Rana Ali Amjad, "Up or Down? Adaptive Rounding for Post-Training Quantization"
https://arxiv.org/pdf/2004.10568.pdf
Workflow of AutoQuant
https://quic.github.io/aimet-pages/releases/latest/user_guide/auto_quant.html#ug-auto-quant
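The general idea of trying techniques in sequence and stopping once a target is met can be sketched as a small loop. This is not AIMET's implementation: `auto_quantize`, the "keep a technique only if it helps" rule, the toy model representation, and the accuracy numbers are all assumptions made up for illustration.

```python
def auto_quantize(model, techniques, evaluate, target_accuracy):
    """Apply techniques in order until the target accuracy is reached.

    Returns (best_model, best_accuracy, names_of_applied_techniques).
    """
    best_model = model
    best_accuracy = evaluate(model)
    applied = []
    for name, apply_technique in techniques:
        if best_accuracy >= target_accuracy:
            break  # good enough, skip the remaining steps
        candidate = apply_technique(best_model)
        accuracy = evaluate(candidate)
        if accuracy > best_accuracy:  # keep a technique only if it helps
            best_model, best_accuracy = candidate, accuracy
            applied.append(name)
    return best_model, best_accuracy, applied


# Toy stand-ins: a "model" is just the tuple of techniques applied so far,
# and the accuracies are made-up numbers for illustration only.
TOY_ACCURACY = {
    (): 0.60,
    ("cle",): 0.68,
    ("cle", "bias_correction"): 0.67,
    ("cle", "adaround"): 0.71,
}


def toy_evaluate(model):
    return TOY_ACCURACY.get(model, 0.0)


techniques = [
    ("cle", lambda m: m + ("cle",)),
    ("bias_correction", lambda m: m + ("bias_correction",)),
    ("adaround", lambda m: m + ("adaround",)),
]

result = auto_quantize((), techniques, toy_evaluate, target_accuracy=0.70)
print(result)
```

In this toy run the bias-correction step is discarded because it does not improve the stand-in accuracy, and the loop ends with the model that has the highest score.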

Cross-Layer Equalization & Bias Correction
Cross-Layer Equalization (CLE)
● Equalizes weight ranges across layers by exploiting the scale-equivariance property of activation functions.
● Especially beneficial for models with depthwise separable convolution layers.
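A minimal sketch of the equalization idea for two fully connected layers joined by a ReLU, following the scaling rule from the Nagel et al. paper cited earlier in these slides. Because ReLU(s·x) = s·ReLU(x) for s > 0, dividing hidden channel i by s_i in the first layer and multiplying it by s_i in the second leaves the output unchanged. The helper names and the toy weights are assumptions, and the sketch assumes every channel has a nonzero range.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def forward(w1, b1, w2, x):
    """y = W2 . ReLU(W1 . x + b1)"""
    hidden = [max(0.0, h + b) for h, b in zip(matvec(w1, x), b1)]
    return matvec(w2, hidden)


def equalize(w1, b1, w2):
    """Rescale hidden channel i by 1/s_i in layer 1 and by s_i in layer 2."""
    new_w1, new_b1 = [], []
    new_w2 = [row[:] for row in w2]
    for i, row in enumerate(w1):
        r1 = max(abs(w) for w in row)                # range of channel i in layer 1
        r2 = max(abs(out_row[i]) for out_row in w2)  # range of channel i in layer 2
        s = math.sqrt(r1 * r2) / r2                  # both ranges become sqrt(r1 * r2)
        new_w1.append([w / s for w in row])
        new_b1.append(b1[i] / s)
        for out_row in new_w2:
            out_row[i] *= s
    return new_w1, new_b1, new_w2


# Toy weights with badly mismatched per-channel ranges.
w1 = [[8.0, -4.0], [0.1, 0.05]]
b1 = [1.0, 0.02]
w2 = [[0.1, 4.0], [-0.05, 2.0]]

eq_w1, eq_b1, eq_w2 = equalize(w1, b1, w2)
x = [0.5, -1.0]
print(forward(w1, b1, w2, x))
print(forward(eq_w1, eq_b1, eq_w2, x))
```

After equalization each hidden channel has the same range on both sides, so a single per-tensor step size wastes less of the integer grid on the small channels.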
Bias Correction
● Corrects shifts in layer outputs introduced by quantization. When the noise from weight quantization is
biased, it also shifts the layer outputs.
● Adapts a layer's bias parameter with a correction term that compensates for the bias in the noise.
https://quic.github.io/aimet-pages/releases/latest/user_guide/post_training_quant_techniques.html#ug-post-training-quantization
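For a linear layer the output shift has a simple form: if the quantized weights are W + ε, the expected output moves by ε · E[x], so subtracting that term from the bias restores the mean. The sketch below estimates E[x] from a few calibration inputs. The coarse step size, the toy weights, and the helper names are assumptions chosen so that the rounding error is visibly one-sided.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(values):
    return sum(values) / len(values)


def quantize_nearest(weights, step):
    """Snap each weight to the nearest multiple of step."""
    return [round(w / step) * step for w in weights]


weights = [0.7, 0.7, 0.2]
bias = 0.1
step = 0.5  # deliberately coarse so every weight rounds downward
calibration_inputs = [[1.0, 2.0, 3.0], [2.0, 1.0, 0.0], [0.0, 3.0, 3.0]]

q_weights = quantize_nearest(weights, step)
error = [q - w for q, w in zip(q_weights, weights)]  # epsilon per weight

# Expected shift of the layer output: epsilon . E[x]
mean_input = [mean(column) for column in zip(*calibration_inputs)]
expected_shift = dot(error, mean_input)
corrected_bias = bias - expected_shift

fp_mean = mean([dot(weights, x) + bias for x in calibration_inputs])
q_mean = mean([dot(q_weights, x) + bias for x in calibration_inputs])
corrected_mean = mean([dot(q_weights, x) + corrected_bias for x in calibration_inputs])

print("float mean output:    ", fp_mean)
print("quantized mean output:", q_mean)
print("after bias correction:", corrected_mean)
```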

AdaRound
Markus Nagel, Rana Ali Amjad, "Up or Down? Adaptive Rounding for Post-Training Quantization"
https://arxiv.org/pdf/2004.10568.pdf
● With the "nearest rounding" technique, each weight value is quantized to the nearest integer
value.
● AdaRound allows a weight value to be quantized to the farther of its two neighboring integer values
when that gives a better result.
By default, AIMET uses the "nearest rounding" technique for quantization.
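Why rounding to the farther grid point can help is easiest to see on a tiny layer. The sketch below brute-forces every round-down/round-up combination and keeps the one with the smallest output error on some calibration inputs. This exhaustive search is a stand-in used only for illustration: it grows as 2^n and is not how AdaRound works, which optimizes a continuous relaxation instead. The weights, step size, and inputs are made up.

```python
import itertools
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def output_error(weights, quantized, inputs):
    """Sum of squared differences between float and quantized layer outputs."""
    return sum((dot(weights, x) - dot(quantized, x)) ** 2 for x in inputs)


def round_nearest(weights, step):
    return [round(w / step) * step for w in weights]


def round_adaptive(weights, step, inputs):
    """Try every floor/ceil combination and keep the one with the lowest error."""
    choices = [
        (math.floor(w / step) * step, math.ceil(w / step) * step)
        for w in weights
    ]
    best = min(
        itertools.product(*choices),
        key=lambda q: output_error(weights, q, inputs),
    )
    return list(best)


weights = [0.4, 0.4, 0.3]
step = 1.0
calibration_inputs = [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

nearest = round_nearest(weights, step)
adaptive = round_adaptive(weights, step, calibration_inputs)
nearest_error = output_error(weights, nearest, calibration_inputs)
adaptive_error = output_error(weights, adaptive, calibration_inputs)

print("nearest: ", nearest, nearest_error)
print("adaptive:", adaptive, adaptive_error)
```

Nearest rounding sends all three weights to zero, so their combined contribution disappears. Rounding one of them up is worse for that single weight but better for the layer output.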

AdaRound Techniques
● AdaRound results compared with the baseline:
Chirag Patel, Tijmen Blankevoort, "Intelligence at scale through AI model efficiency"
https://www.qualcomm.com/content/dam/qcomm-martech/dm-assets/documents/presentation_-_intelligence_at_scale_through_ai_model_efficiency.pdf

Quantization-Aware Training (QAT)
● Simulate quantization noise in the forward pass.
● Fine-tune using training data.
[Figure: model graph with nodes Input, Weight, Wt quant, Conv/FC, Bias, +, RELU, Act quant, and Output, plus a Backprop path. Caption: simulation ops are added automatically at appropriate places in the model graph.]
● Learn quantization parameters (QAT with Range Learning)
● Fine-tune model weights
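A minimal sketch of the training loop idea for a one-weight model y = w·x: the forward pass sees the quantize-dequantized weight, while the gradient is passed to the underlying float weight as if rounding were the identity (the straight-through estimator). The function names, data, step size, and learning rate are assumptions for illustration and do not reflect AIMET's API.

```python
def fake_quant(w, step):
    """Quantize-dequantize: snap w to the nearest multiple of step."""
    return round(w / step) * step


def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def qat_fit(xs, ys, w, step, lr=0.05, epochs=200):
    """Fine-tune the float weight w while the forward pass uses fake_quant(w)."""
    for _ in range(epochs):
        w_q = fake_quant(w, step)  # forward pass sees the quantized weight
        # Gradient of the mean squared error with respect to w_q.
        grad = sum(2 * (w_q * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Straight-through estimator: treat d(w_q)/d(w) as 1.
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [0.6 * x for x in xs]  # data generated by a weight that is off the grid
step = 0.25

w_start = 0.0
w_final = qat_fit(xs, ys, w_start, step)

loss_before = mse(fake_quant(w_start, step), xs, ys)
loss_after = mse(fake_quant(w_final, step), xs, ys)
print("quantized loss before:", loss_before)
print("quantized loss after: ", loss_after)
```

Since 0.6 is not on the quantization grid, the float weight settles near the boundary between the two closest grid points rather than reaching zero loss.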

Quantization-Aware Training (QAT)
AIMET supports two modes of QAT:
1. Regular QAT:
● Updated:
○ Trainable parameters such as weights and biases
● Held constant:
○ Scale and offset quantization parameters
2. QAT with Range Learning:
● Updated:
○ Trainable parameters such as module weights and biases
○ Scale/offset parameters for weight quantizers
○ Scale/offset parameters for activation quantizers

Quantization Simulation
AIMET's Quantization Simulation provides functionality to simulate how a quantized model behaves on hardware.
https://quic.github.io/aimet-pages/AimetDocs/user_guide/quantization_sim.html

Quantization Simulation
https://quic.github.io/aimet-pages/AimetDocs/user_guide/quantization_sim.html
● AIMET can simulate quantization noise
● Because a dequantized value may not be exactly the same as the original value,
the difference between the two values is the quantization noise.
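The noise can be measured directly by quantizing a value, dequantizing it, and subtracting. The sketch below uses one common asymmetric convention (integer = round(x / scale) − offset, clamped to the grid). The scale, offset, and sample values are assumptions for illustration.

```python
def quantize(x, scale, offset, bitwidth=8):
    """Map a float to an integer grid index, clamped to the representable range."""
    q = round(x / scale) - offset
    return max(0, min(2 ** bitwidth - 1, q))


def dequantize(q, scale, offset):
    """Map an integer grid index back to a float."""
    return (q + offset) * scale


def quantization_noise(x, scale, offset, bitwidth=8):
    """Difference between the original value and its quantize-dequantized copy."""
    return x - dequantize(quantize(x, scale, offset, bitwidth), scale, offset)


# Hypothetical encoding covering roughly [-2.56, 2.54] with 8 bits.
SCALE, OFFSET = 0.02, -128

for value in (0.013, -1.234, 2.0, 3.5):
    noise = quantization_noise(value, SCALE, OFFSET)
    print(f"x={value:+.3f}  noise={noise:+.4f}")
```

Values inside the range pick up rounding noise of at most half a step, while a value outside the range (3.5 here) is clipped to the largest representable value and carries a much larger error.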

Quantization Simulation
https://quic.github.io/aimet-pages/AimetDocs/user_guide/quantization_sim.html
● AIMET analyzes the model and determines the optimal quantization encodings for each layer

Quantization Simulation
Accuracy of CV models on the AIMET simulator without QAT, compared with PyTorch and SNPE accuracy
● AIMET-quantized models can provide good accuracy, comparable to floating-point models.
● Gap between AIMET quant and SNPE quant:
○ Execution on different runtimes (GPU and DSP) can lead to different results.
○ The default quantization algorithm in AIMET may not be fully aligned with the algorithm used on hardware.
Model (accuracy) | PyTorch Official (GPU) | PyTorch (CPU) | AIMET quant (GPU) | SNPE quant (DSP)
ResNet18         | 69.758%                | 69.76%        | 69.608%           | 69.294%
ResNet50         | 76.13%                 | 76.146%       | 75.86%            | 75.422%
MobileNetV2      | 71.878%                | 71.87%        | 71.164%           | 69.226%
InceptionV3      | 77.294%                | 77.472%       | 76.564%           | 76.842%
SNPE: Snapdragon Neural Processing Engine
DSP: Digital Signal Processor
https://pytorch.org/vision/main/models.html
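The gaps in the table are easier to compare as percentage-point drops from the official PyTorch accuracy. The short script below does that arithmetic using only the numbers from the table. The dictionary layout and function name are illustrative choices.

```python
# Top-1 accuracy (%) copied from the table above.
ACCURACY = {
    "ResNet18": {"pytorch": 69.758, "aimet": 69.608, "snpe": 69.294},
    "ResNet50": {"pytorch": 76.13, "aimet": 75.86, "snpe": 75.422},
    "MobileNetV2": {"pytorch": 71.878, "aimet": 71.164, "snpe": 69.226},
    "InceptionV3": {"pytorch": 77.294, "aimet": 76.564, "snpe": 76.842},
}


def accuracy_drop(model, runtime):
    """Percentage points lost relative to the official PyTorch accuracy."""
    row = ACCURACY[model]
    return round(row["pytorch"] - row[runtime], 3)


for model in ACCURACY:
    aimet_drop = accuracy_drop(model, "aimet")
    snpe_drop = accuracy_drop(model, "snpe")
    print(f"{model:12s} AIMET drop: {aimet_drop:.3f}  SNPE drop: {snpe_drop:.3f}")
```

On these figures MobileNetV2 shows the largest drop on the DSP, about 2.65 percentage points, while the other three models stay within roughly 0.75 points on both quantized paths.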

Summary
Pros:
● AIMET provides QAT (Quantization-Aware Training) and PTQ (Post-Training
Quantization) techniques to improve the accuracy of quantized models.
● AIMET is designed with user-friendliness in mind: it offers a user-friendly interface
and clear documentation.
● AIMET offers debugging tools and visualization capabilities.
Cons:
● Quantization simulation may ignore hardware-specific effects that affect model
performance.
Please refrain from reproducing or copying the text, images, or other content without permission.
