SlideShare a Scribd company logo
1 of 27
Download to read offline
Introduction to
Optimizing ML Models
for the Edge
Kumaran Ponnambalam
Principal Engineer - AI
Cisco Systems, Emerging Tech &
Incubation
© 2023 Cisco and/or its affiliates. All rights reserved.
Agenda
• Deploying deep learning models at the edge
• Model compression techniques
• Quantization
• Pruning
• Low rank approximation
• Knowledge distillation
• Leveraging edge hardware
• Model optimization best practices
2
3
Deep learning models at the edge
© 2023 Cisco and/or its affiliates. All rights reserved.
Edge AI : Growth & challenges
• Exponential growth in Edge AI applications
• Logistics, smart homes, transportation, security etc.
• Computer vision, NLP, time series
• Challenges with using cloud-based models
• Latency
• Reliable network connectivity
• Security & privacy
• Challenges deploying deep learning models at the edge
• Huge model footprint ( > available memory)
• Limited processing capacity
4
© 2023 Cisco and/or its affiliates. All rights reserved.
Deep learning models at the edge
5
Deep learning models need to be
optimized
for efficient and effective
inference at the edge
© 2023 Cisco and/or its affiliates. All rights reserved.
Edge AI : Goals for optimization
• Maintain model performance thresholds (Accuracy, F1, Recall, etc.)
• Reduce model sizes (compression)
• Improve model performance
• Latency
• FLOPS
• Power usage
• Leverage edge hardware capabilities
• Edge CPUs / GPUs
• Hardware accelerators
6
7
Model compression techniques
© 2023 Cisco and/or its affiliates. All rights reserved.
Model compression benefits
• Smaller memory footprint
• Reduced CPU/GPU time
• Lower latency
• Improved scaling per deployment
• Negligible loss of accuracy in most cases
• Easier packaging, transport and deployment
8
© 2023 Cisco and/or its affiliates. All rights reserved.
Quantization
• Reduce the storage size of parameters
• 32-bit to 8-bit (4X reduction)
• Less memory requirements
• Lower compute (FP vs INT operations)
• Energy saving
• Possible loss of accuracy (depends on
model)
• Popular ML frameworks support
quantization techniques
9
FP32
0.76 -0.10 1.45
-2.20 0.92 -0.89
-0.01 2.14 1.78
INT8
95 -13 181
-275 115 -111
-1 268 223
© 2023 Cisco and/or its affiliates. All rights reserved.
Types of quantization
10
• Done on a float trained model
after training
• Convert weights, biases and
activations to integers
• Simple to implement
• Loss of accuracy
Post-training quantization
• Done during training
• Impact of quantization validated
and adjusted
• Post-training quantization on
this model results in smaller/no
loss of accuracy
Quantization aware training
© 2023 Cisco and/or its affiliates. All rights reserved.
Quantization performance
11
Retrieved from : https://www.softserveinc.com/en-us/blog/deep-learning-model-compression-and-optimization
© 2023 Cisco and/or its affiliates. All rights reserved.
Model pruning
• Eliminate model elements with low impact
on outcomes
• Nodes
• Connections
• Layers
• Iterative pruning (increased sparsity) with
test for performance
• Size vs accuracy trade-off
• Effectiveness depends on nature of data
• Popular ML frameworks support pruning
techniques
12
© 2023 Cisco and/or its affiliates. All rights reserved.
Types of pruning
13
• Remove individual elements
• Connections
• Nodes
• Random removal with validation
• Can achieve higher size
reductions based on amount of
pruning
Unstructured Pruning
• Remove part of the network
• Layers
• Channels
• Filters
• Easier process
• Benefits based on model
Structured Pruning
© 2023 Cisco and/or its affiliates. All rights reserved.
Pruning performance
14
Retrieved from : https://nips.cc/virtual/2020/public/poster_703957b6dd9e3a7980e040bee50ded65.html
© 2023 Cisco and/or its affiliates. All rights reserved.
Low-rank approximation
• Reduce the number of parameters
necessary to represent the model
• Create matrix of lower rank
• Eliminate redundant data
• Measure performance with low rank
matrix
• Benefits vary based on the use case
• Popular ML frameworks have out-of-the-
box support
15
© 2023 Cisco and/or its affiliates. All rights reserved.
Low-rank approximation performance
16
Retrieved from : https://www.researchgate.net/figure/Post-training-results-of-low-rank-approximation-no-fine-
tuning-fine-tuning-with_tbl1_362859051
© 2023 Cisco and/or its affiliates. All rights reserved.
Knowledge distillation
• Train a small student model to mimic the
outputs of a large teacher model
• Distillation process compares outputs of
the models for the same inputs and
adjusts parameters for the student
• Smaller model footprint for student
• Comparable accuracy / performance
• Training dataset can be use-case specific
17
Large
teacher model
Small
student
model
Distillation
process
© 2023 Cisco and/or its affiliates. All rights reserved.
Comparison of techniques
18
Quantization Pruning Low-rank
Approximation
Knowledge
Distillation
Cost Low Low Medium High
During
training
Yes Yes Yes Yes
Post training Yes Yes Yes Yes
Pretrained
models
Yes Yes Yes No
© 2023 Cisco and/or its affiliates. All rights reserved.
Compression process
• Create a baseline of the original model
• Parameters, training data, test results
• Set threshold levels for compression expectations
• Expected minimum accuracy, maximum resource usage
• Use an iterative approach
• Try model compression in stages
• Test with baseline training data
• Compare with baseline test results and thresholds
• Try different techniques to identify best approach
• Combining approaches is possible ( e.g., quantization and pruning )
19
20
Leveraging edge hardware
© 2023 Cisco and/or its affiliates. All rights reserved.
Edge specialized infrastructure
• Edge optimized hardware
• Deliver best performance for edge constrained environments
• Low end processors: Micro-controller units, neural processing units (NPU)
• High end processors: Google Edge TPU, NVIDIA Jetson
• Application specific AI accelerators
• Edge frameworks
• Compile models to optimize for edge specific hardware
• Leverage hardware specific capabilities
• Create deployable packages for models
• E.g., NVIDIA TensorRT, Apache TVM, ONNX runtime
21
© 2023 Cisco and/or its affiliates. All rights reserved.
Edge frameworks - benefits
• Optimize execution graph for hardware
• Reduce memory requirements
• Remove unwanted steps/instructions
• Fuse steps/instructions
• Choose best values for configuration options
• Evaluate multiple execution strategies and choose the best one
• Create an optimized executable for inference
• Package model for ready deployment
22
© 2023 Cisco and/or its affiliates. All rights reserved.
Edge compilation process
• Choose the right framework based on the deployment hardware
• E.g., TensorRT is most suited for NVIDIA processors
• Use the trained, validated and compressed model as input
• Compile the model
• Plan for multiple iterations
• Try available options for optimization/adaption
• Validate model performance
• Use same benchmarks as compression
• Test on hardware specific development kits
• Create deployable artifact
23
24
Model optimization best practices
© 2023 Cisco and/or its affiliates. All rights reserved.
Best practices for optimization - 1
• Performance baselines and goals need to be established and validated
throughout the process
• Accuracy, latency, FLOPS, etc., based on the use case
• Helps ensure that the model performs as desired while going through optimizations
• Track results against baseline for all model training iterations over time
• Choose hardware / frameworks when beginning model training
• Deployment infrastructure may impact model architecture and optimizations needed
• Dependency / overlaps can be understood ahead of time
• Multiple deployment options may need to be supported
25
© 2023 Cisco and/or its affiliates. All rights reserved.
Best practices for optimization - 2
• Include edge hardware development kits/emulators as part of the training
lifecycle
• Automate optimization, compilation and testing
• Use similar hardware configurations as deployment
• Include collaborating edge applications also in end-to-end testing
• Automate validating results and model promotion
• Monitor deployment performance
• Some optimizations may carry negative impact when deployed in actual hardware
• Monitor performance and validate against set baseline
• Improve models based on experience
26
27
Thank You

More Related Content

Similar to “Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisco Systems

Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Paul Brebner
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity ManagementEDB
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Lionel Briand
 
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...DevOps.com
 
IBM PureFlex - Expert Integrated System
IBM PureFlex - Expert Integrated SystemIBM PureFlex - Expert Integrated System
IBM PureFlex - Expert Integrated SystemIBM Danmark
 
1457 - Reviewing Experiences from the PureExperience Program
1457 - Reviewing Experiences from the PureExperience Program1457 - Reviewing Experiences from the PureExperience Program
1457 - Reviewing Experiences from the PureExperience ProgramHendrik van Run
 
Mass Scale Networking
Mass Scale NetworkingMass Scale Networking
Mass Scale NetworkingSteve Iatrou
 
Resume of sidharam prachcande PLM Consultant
Resume of sidharam prachcande  PLM ConsultantResume of sidharam prachcande  PLM Consultant
Resume of sidharam prachcande PLM ConsultantSidharth prachande
 
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification SuccessA00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification SuccessPalakMazumdar1
 
A Year of “Testing” the Cloud for Development and Test
A Year of “Testing” the Cloud for Development and TestA Year of “Testing” the Cloud for Development and Test
A Year of “Testing” the Cloud for Development and TestTechWell
 
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...Midori Oge
 
A Year of Testing in the Cloud: Lessons Learned
A Year of Testing in the Cloud: Lessons LearnedA Year of Testing in the Cloud: Lessons Learned
A Year of Testing in the Cloud: Lessons LearnedTechWell
 
Microsoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenMicrosoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenGoDataDriven
 
An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...wweinmeyer79
 
Ibm PureApplication system
Ibm PureApplication systemIbm PureApplication system
Ibm PureApplication systemkhawkwf
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...danielschulz2005
 
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...Curiosity Software Ireland
 

Similar to “Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisco Systems (20)

Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...Past Experiences and Future Challenges using Automatic Performance Modelling ...
Past Experiences and Future Challenges using Automatic Performance Modelling ...
 
Machine Learning for Capacity Management
 Machine Learning for Capacity Management Machine Learning for Capacity Management
Machine Learning for Capacity Management
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...Making Model-Driven Verification Practical and Scalable: Experiences and Less...
Making Model-Driven Verification Practical and Scalable: Experiences and Less...
 
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
Microservices at Scale: How to Reduce Overhead and Increase Developer Product...
 
IBM PureFlex - Expert Integrated System
IBM PureFlex - Expert Integrated SystemIBM PureFlex - Expert Integrated System
IBM PureFlex - Expert Integrated System
 
1457 - Reviewing Experiences from the PureExperience Program
1457 - Reviewing Experiences from the PureExperience Program1457 - Reviewing Experiences from the PureExperience Program
1457 - Reviewing Experiences from the PureExperience Program
 
Mass Scale Networking
Mass Scale NetworkingMass Scale Networking
Mass Scale Networking
 
Resume of sidharam prachcande PLM Consultant
Resume of sidharam prachcande  PLM ConsultantResume of sidharam prachcande  PLM Consultant
Resume of sidharam prachcande PLM Consultant
 
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification SuccessA00-440: Useful Questions for SAS ModelOps Specialist Certification Success
A00-440: Useful Questions for SAS ModelOps Specialist Certification Success
 
A Year of “Testing” the Cloud for Development and Test
A Year of “Testing” the Cloud for Development and TestA Year of “Testing” the Cloud for Development and Test
A Year of “Testing” the Cloud for Development and Test
 
20703-1B_00.pptx
20703-1B_00.pptx20703-1B_00.pptx
20703-1B_00.pptx
 
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...
Apache CloudStack Examination - CloudStack Collaboration Conference in Europe...
 
A Year of Testing in the Cloud: Lessons Learned
A Year of Testing in the Cloud: Lessons LearnedA Year of Testing in the Cloud: Lessons Learned
A Year of Testing in the Cloud: Lessons Learned
 
Technical Without Code
Technical Without CodeTechnical Without Code
Technical Without Code
 
Microsoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDrivenMicrosoft DevOps for AI with GoDataDriven
Microsoft DevOps for AI with GoDataDriven
 
An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...
 
Ibm PureApplication system
Ibm PureApplication systemIbm PureApplication system
Ibm PureApplication system
 
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
Productionizing Predictive Analytics using the Rendezvous Architecture - for ...
 
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...
Curiosity and fourTheorem present: From Coverage Guesswork to Targeted Test G...
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

“Introduction to Optimizing ML Models for the Edge,” a Presentation from Cisco Systems

  • 1. Introduction to Optimizing ML Models for the Edge Kumaran Ponnambalam Principal Engineer - AI Cisco Systems, Emerging Tech & Incubation
  • 2. © 2023 Cisco and/or its affiliates. All rights reserved. Agenda • Deploying deep learning models at the edge • Model compression techniques • Quantization • Pruning • Low rank approximation • Knowledge distillation • Leveraging edge hardware • Model optimization best practices 2
  • 4. © 2023 Cisco and/or its affiliates. All rights reserved. Edge AI : Growth & challenges • Exponential growth in Edge AI applications • Logistics, smart homes, transportation, security etc. • Computer vision, NLP, time series • Challenges with using cloud-based models • Latency • Reliable network connectivity • Security & privacy • Challenges deploying deep learning models at the edge • Huge model footprint ( > available memory) • Limited processing capacity 4
  • 5. © 2023 Cisco and/or its affiliates. All rights reserved. Deep learning models at the edge 5 Deep learning models need to be optimized for efficient and effective inference at the edge
  • 6. © 2023 Cisco and/or its affiliates. All rights reserved. Edge AI : Goals for optimization • Maintain model performance thresholds (Accuracy, F1, Recall, etc.) • Reduce model sizes (compression) • Improve model performance • Latency • FLOPS • Power usage • Leverage edge hardware capabilities • Edge CPUs / GPUs • Hardware accelerators 6
  • 8. © 2023 Cisco and/or its affiliates. All rights reserved. Model compression benefits • Smaller memory footprint • Reduced CPU/GPU time • Lower latency • Improved scaling per deployment • Negligible loss of accuracy in most cases • Easier packaging, transport and deployment 8
  • 9. © 2023 Cisco and/or its affiliates. All rights reserved. Quantization • Reduce the storage size of parameters • 32-bit to 8-bit (4X reduction) • Less memory requirements • Lower compute (FP vs INT operations) • Energy saving • Possible loss of accuracy (depends on model) • Popular ML frameworks support quantization techniques 9 FP32 0.76 -0.10 1.45 -2.20 0.92 -0.89 -0.01 2.14 1.78 INT8 95 -13 181 -275 115 -111 -1 268 223
  • 10. © 2023 Cisco and/or its affiliates. All rights reserved. Types of quantization 10 • Done on a float trained model after training • Convert weights, biases and activations to integers • Simple to implement • Loss of accuracy Post-training quantization • Done during training • Impact of quantization validated and adjusted • Post-training quantization on this model results in smaller/no loss of accuracy Quantization aware training
  • 11. © 2023 Cisco and/or its affiliates. All rights reserved. Quantization performance 11 Retrieved from : https://www.softserveinc.com/en-us/blog/deep-learning-model-compression-and-optimization
  • 12. © 2023 Cisco and/or its affiliates. All rights reserved. Model pruning • Eliminate model elements with low impact on outcomes • Nodes • Connections • Layers • Iterative pruning (increased sparsity) with test for performance • Size vs accuracy trade-off • Effectiveness depends on nature of data • Popular ML frameworks support pruning techniques 12
  • 13. © 2023 Cisco and/or its affiliates. All rights reserved. Types of pruning 13 • Remove individual elements • Connections • Nodes • Random removal with validation • Can achieve higher size reductions based on amount of pruning Unstructured Pruning • Remove part of the network • Layers • Channels • Filters • Easier process • Benefits based on model Structured Pruning
  • 14. © 2023 Cisco and/or its affiliates. All rights reserved. Pruning performance 14 Retrieved from : https://nips.cc/virtual/2020/public/poster_703957b6dd9e3a7980e040bee50ded65.html
  • 15. © 2023 Cisco and/or its affiliates. All rights reserved. Low-rank approximation • Reduce the number of parameters necessary to represent the model • Create matrix of lower rank • Eliminate redundant data • Measure performance with low rank matrix • Benefits vary based on the use case • Popular ML frameworks have out-of-the- box support 15
  • 16. © 2023 Cisco and/or its affiliates. All rights reserved. Low-rank approximation performance 16 Retrieved from : https://www.researchgate.net/figure/Post-training-results-of-low-rank-approximation-no-fine- tuning-fine-tuning-with_tbl1_362859051
  • 17. © 2023 Cisco and/or its affiliates. All rights reserved. Knowledge distillation • Train a small student model to mimic the outputs of a large teacher model • Distillation process compares outputs of the models for the same inputs and adjusts parameters for the student • Smaller model footprint for student • Comparable accuracy / performance • Training dataset can be use-case specific 17 Large teacher model Small student model Distillation process
  • 18. © 2023 Cisco and/or its affiliates. All rights reserved. Comparison of techniques 18 Quantization Pruning Low-rank Approximation Knowledge Distillation Cost Low Low Medium High During training Yes Yes Yes Yes Post training Yes Yes Yes Yes Pretrained models Yes Yes Yes No
  • 19. © 2023 Cisco and/or its affiliates. All rights reserved. Compression process • Create a baseline of the original model • Parameters, training data, test results • Set threshold levels for compression expectations • Expected minimum accuracy, maximum resource usage • Use an iterative approach • Try model compression in stages • Test with baseline training data • Compare with baseline test results and thresholds • Try different techniques to identify best approach • Combining approaches is possible ( e.g., quantization and pruning ) 19
  • 21. © 2023 Cisco and/or its affiliates. All rights reserved. Edge specialized infrastructure • Edge optimized hardware • Deliver best performance for edge constrained environments • Low end processors: Micro-controller units, neural processing units (NPU) • High end processors: Google Edge TPU, NVIDIA Jetson • Application specific AI accelerators • Edge frameworks • Compile models to optimize for edge specific hardware • Leverage hardware specific capabilities • Create deployable packages for models • E.g., NVIDIA TensorRT, Apache TVM, ONNX runtime 21
  • 22. © 2023 Cisco and/or its affiliates. All rights reserved. Edge frameworks - benefits • Optimize execution graph for hardware • Reduce memory requirements • Remove unwanted steps/instructions • Fuse steps/instructions • Choose best values for configuration options • Evaluate multiple execution strategies and choose the best one • Create an optimized executable for inference • Package model for ready deployment 22
  • 23. © 2023 Cisco and/or its affiliates. All rights reserved. Edge compilation process • Choose the right framework based on the deployment hardware • E.g., TensorRT is most suited for NVIDIA processors • Use the trained, validated and compressed model as input • Compile the model • Plan for multiple iterations • Try available options for optimization/adaption • Validate model performance • Use same benchmarks as compression • Test on hardware specific development kits • Create deployable artifact 23
  • 25. © 2023 Cisco and/or its affiliates. All rights reserved. Best practices for optimization - 1 • Performance baselines and goals need to be established and validated throughout the process • Accuracy, latency, FLOPS, etc., based on the use case • Helps ensure that the model performs as desired while going through optimizations • Track results against baseline for all model training iterations over time • Choose hardware / frameworks when beginning model training • Deployment infrastructure may impact model architecture and optimizations needed • Dependency / overlaps can be understood ahead of time • Multiple deployment options may need to be supported 25
  • 26. © 2023 Cisco and/or its affiliates. All rights reserved. Best practices for optimization - 2 • Include edge hardware development kits/emulators as part of the training lifecycle • Automate optimization, compilation and testing • Use similar hardware configurations as deployment • Include collaborating edge applications also in end-to-end testing • Automate validating results and model promotion • Monitor deployment performance • Some optimizations may carry negative impact when deployed in actual hardware • Monitor performance and validate against set baseline • Improve models based on experience 26