© 2021 Chamberlain Group
Practical Guide to Implementing ML on Embedded Devices
Nathan Kopp
Chamberlain Group
About the Chamberlain Group
Chamberlain Group (CGI) is a global leader in access solutions and products.
VISION: Giving the Power of Access and Knowledge
MISSION: People everywhere rely on CGI to move safely through their world, confident that what they value most is secure within reach.
END-MARKETS SERVED: Residential, Commercial, Automotive
Over 8,000 Employees Worldwide
CGI is a global team with solutions and operations designed to serve customers in a variety of markets worldwide.
Goals of this talk
• Survey the landscape of edge inference implementation
• Explore software & hardware choices
• Examine model & optimization choices
More is possible than you think!
Choices
Hardware
Inference Engine
Training Framework
Model
Optimizations
Everything is intertwined.
Choices
Hardware
Inference Engine
Training Framework
Model
Optimizations
TFLite
Limited Layer Choice
TensorFlow
8-bit Quantized
Only
Some
Models
Choices
Hardware
Inference Engine
Training Framework
Model
Optimizations
TensorRT
Complex Pruning
PyTorch
16-bit FP
YOLOv5
Hardware
Hardware Choices
MCU
Cortex-M, RISC-V
+SIMD +DSP +NPU +FPGA
CPU
Cortex-A, x86
+SIMD +DSP +GPU +NPU +FPGA
FPGA
On-Chip Memory, L1 & L2 Caches
Bus Speed
Register Count
Multiple Cores & Clock Speed
Out-of-Order Execution
Hardware: Common Configurations
Low Volume: USB; Software → Hardware
Mass Production: MIPI; Hardware → Software
Memory, IO, etc.
Model & Software
Model Optimization Workflow
Reference Model
• State-of-the-art
• FP32 GPU
• Purpose: validate your training data
Optimization
• Choose backbone
• Choose head
• Quantization
• Pruning
• NAS
Convert or Compile
• Convert
• Compile
Deploy and Test
• Verify
• Review Choices
• Analyze
• Optimize
• Integrate
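The quantization step in the workflow above can be sketched in a few lines. This is a minimal illustration of one common scheme (per-tensor, asymmetric 8-bit affine quantization), not the method used in the talk; the function names `quant_params`, `quantize`, and `dequantize` and the sample weights are hypothetical.

```python
def quant_params(values, num_bits=8):
    """Compute the scale and zero point that map [min, max] onto unsigned integers."""
    qmax = (1 << num_bits) - 1
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1], clamping out-of-range values."""
    qmax = (1 << num_bits) - 1
    return [min(qmax, max(0, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in qvalues]


# Hypothetical FP32 weights, used only to show the round trip.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
scale, zero_point = quant_params(weights)
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, max_err)
```

The round-trip error stays within one quantization step (`scale`), which is the accuracy cost traded for 4x smaller weights and integer arithmetic.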
First, you will need data!
• Your problem space is different!
• Smaller datasets are usually OK
• Smaller models need less data
• Fine tuning needs less data
Research Datasets
Your
Application
Model Mash-Up
Output
Layer 5
Layer 4
Layer 3
Layer 2
Layer 1
Input
Head & Neck:
• Interprets results
• Inexpensive to fine-tune
• Lower data requirements than backbone
Backbone:
• Extracts features
• Costly to train; needs lots of data & time
• Recommendation: pre-trained weights
Model
Structure
Model
Optimizations
Inference
Engine
Training
Data
Optimization Options
Model Zoo
Pretrained Fine-tune with your data Train from scratch
Community-Supported Mix & Match Head & BB
Code it yourself
Quantization, Pruning, Compression
NAS / OFA
Decomposition
Hand-Optimized Code
Pre-Optimized Model
Pre-Optimized Runtime Community-Supported Runtimes
Commercial  → Open Source
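As an illustration of the pruning option listed above, here is a minimal sketch of unstructured magnitude pruning, one of several pruning approaches. The function name `magnitude_prune` and the sample weights are hypothetical; real tools typically prune per layer and fine-tune afterwards.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by ascending magnitude; the first n_prune entries are zeroed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Zeroed weights only save time if the inference engine can exploit sparsity, which is one reason the optimization and engine choices are intertwined.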
Inference Engine
• Runtime
• Model is interpreted
• Model deployed separately
• Easier OTA updates
• Compiler
• Model is compiled
• Model is part of firmware
• Weights are often constants
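To make "weights are often constants" concrete, here is a minimal sketch of what a compiler-style flow does conceptually: it turns trained weights into a constant array that is built into the firmware image. The helper `emit_c_array` and the array name `conv1_weights` are hypothetical; real compilers also generate the layer code itself.

```python
def emit_c_array(name, values, ctype="int8_t", per_line=8):
    """Return C source text declaring `values` as a constant array."""
    lines = [f"static const {ctype} {name}[{len(values)}] = {{"]
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append("    " + ", ".join(str(v) for v in chunk) + ",")
    lines.append("};")
    return "\n".join(lines)


# Hypothetical quantized weights for one layer.
source = emit_c_array("conv1_weights", [12, -7, 0, 33, -128, 127, 5, -1, 9, 4])
print(source)
```

Because the array is `const`, a typical toolchain can place it in flash rather than RAM, but updating the model then means updating the firmware.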
• Code Optimizations
• Memory Usage
• Cache-aware (e.g., tiling)
• Efficient register usage
• Vectorization
• Use SIMD
• Use DSP, NPU, GPU
• Parallelization
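The cache-aware tiling mentioned above can be sketched as a blocked matrix multiply. This is an illustration of the loop structure only: the function names and tile size are hypothetical, and pure Python will not show the speedup that optimized C or SIMD kernels get from the same reordering.

```python
def matmul_naive(a, b):
    """Reference triple loop: streams through all of b for every row of a."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same arithmetic, but processed in tile x tile blocks that fit in cache."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one small block of a, b, and c at a time.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

On real hardware the tile size would be chosen so that the working blocks fit in L1 cache or in the register file.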
Final Step: Integrate into your app
• Consensus over time
• No model gets it right all the time
• High frame rate:
• More samples for consensus
• Lower per-sample accuracy
• Low frame rate:
• Fewer samples for consensus
• Higher per-sample accuracy
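Consensus over time can be sketched as a sliding-window vote over per-frame results. This is one simple way to do it, not necessarily the approach used in the talk; the class name `ConsensusFilter` and the window and threshold values are hypothetical.

```python
from collections import deque


class ConsensusFilter:
    """Report a detection only when enough recent frames agree."""

    def __init__(self, window=5, required=3):
        if not 0 < required <= window:
            raise ValueError("required must be between 1 and window")
        self.history = deque(maxlen=window)  # oldest results drop off automatically
        self.required = required

    def update(self, detected):
        """Record one per-frame result and return the smoothed decision."""
        self.history.append(bool(detected))
        return sum(self.history) >= self.required


filt = ConsensusFilter(window=5, required=3)
frames = [True, False, True, True, False, False, False]  # hypothetical raw results
decisions = [filt.update(f) for f in frames]
print(decisions)  # [False, False, False, True, True, False, False]
```

A higher frame rate lets the window cover more samples in the same wall-clock time, which is why a faster but less accurate model can still give a reliable final answer.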
100 ms inference time does NOT mean 10 FPS!
Reserve CPU cycles for:
• Ingesting from the sensor/buffer
• Interpreting the output
• Network
• Other app functions
• Temperature management
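A rough frame-budget calculation shows why 100 ms of inference does not give 10 FPS. This is a sketch assuming all stages run sequentially on one core; every number other than the 100 ms inference time is a hypothetical placeholder, and `duty_cycle` is an assumed way to model thermal headroom.

```python
def achievable_fps(inference_ms, capture_ms, postprocess_ms, other_ms, duty_cycle=1.0):
    """Frames per second when every stage runs back to back on one core."""
    frame_ms = inference_ms + capture_ms + postprocess_ms + other_ms
    return duty_cycle * 1000.0 / frame_ms


# Inference alone would suggest 10 FPS.
print(achievable_fps(100, 0, 0, 0))  # 10.0

# Hypothetical overheads: sensor ingest, output interpretation, network and app
# work, plus running at 80% duty cycle to keep the temperature in check.
fps = achievable_fps(inference_ms=100, capture_ms=15, postprocess_ms=10,
                     other_ms=25, duty_cycle=0.8)
print(round(fps, 2))  # 5.33
```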
Example
Example: Object Detection on ARMv7
Task: Vehicle Detection
Reference Model: YOLOv5-s, FP32, PyTorch
Compute Constraints: ARMv7 w/NEON, no accelerators
ASUS Tinkerboard (v1): Cortex-A17 @ 1.8 GHz
Raspberry Pi 2 B v1.1: Cortex-A7 @ 900 MHz
Seeed NPI i.MX6ULL: Cortex-A7 @ 800 MHz
Model
Structure
Model
Optimizations
Inference
Engine
Training
Data
Optimization Options
Model Zoo
Pretrained Fine-tune with your data Train from scratch
Community Supported Mix & Match Head & BB
Code it yourself
Quantization, Pruning, Compression
NAS / OFA
Decomposition
Hand-Optimized Code
Pre-Optimized Model
Pre-Optimized Runtime Community Supported Optimizations
Commercial  → Open Source
Models & Tools Tested
Why so much faster?
• Both 32-bit ARMv7
• NEON: 64-bit vs 32-bit
• FP: 16 registers vs 32 registers
• Out-of-order execution
• Deeper pipeline
• DMIPS/MHz: 4.0 vs 1.9
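The DMIPS/MHz figures can be combined with the clock speeds from the board list to get a rough single-core estimate. This sketch assumes the comparison is between the Cortex-A17 at 1.8 GHz (4.0 DMIPS/MHz) and the Cortex-A7 at 900 MHz (1.9 DMIPS/MHz); DMIPS is a generic integer benchmark, so this is only a first-order guide and not a prediction of inference speed.

```python
def dmips(dmips_per_mhz, clock_mhz):
    """Estimated Dhrystone MIPS for one core."""
    return dmips_per_mhz * clock_mhz


a17 = dmips(4.0, 1800)  # Cortex-A17 @ 1.8 GHz
a7 = dmips(1.9, 900)    # Cortex-A7 @ 900 MHz
ratio = a17 / a7
print(round(a17), round(a7), round(ratio, 2))  # 7200 1710 4.21
```

Wider NEON units, more registers, and out-of-order execution affect neural network kernels beyond what this scalar estimate captures.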
[Chart: speed improvement compared to YOLOv5-s on PyTorch. Labeled values: 509x, 54x, 50x, 10x, 6x. Model labels: 80-class, 80-class, 1-class.]
[Chart: reference model and highlighted speed improvements of 509x, 54x, and 50x.]
Conclusions
• Inference on edge devices has become both possible and practical
• Small hardware features can make a big difference in speed
• Selecting the right model and the right inference engine for your hardware can expand the scope of what is possible
Resources
Detectors used in the Example:
YOLOv5
https://github.com/ultralytics/yolov5
NanoDet
https://github.com/RangiLyu/nanodet
QuickYOLO
https://github.com/tehtea/QuickYOLO
Xailient
https://www.xailient.com/
Chamberlain Group:
https://chamberlaingroup.com
Note:
Many more links and resources are
available at the end of the slide deck.
Backup Material
Half of implementing deep learning
is fighting Python & C++ errors
and resolving library incompatibilities.
Pay close attention
to documented
versions!
Use “virtualenv”
Become a CMake
expert!
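One way to act on "pay close attention to documented versions" is to check the active virtualenv against a pinned list at startup. This is a minimal sketch using the standard-library `importlib.metadata`; the helper `check_versions` and the package names `examplelib` and `otherlib` are hypothetical, and `fake_lookup` stands in for a real environment so the example is self-contained.

```python
from importlib import metadata


def check_versions(pinned, lookup=metadata.version):
    """Return {package: (wanted, found)} for each mismatched or missing package."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


def fake_lookup(name):
    """Stand-in for metadata.version with a fixed set of installed packages."""
    installed = {"examplelib": "1.2.0"}
    if name not in installed:
        raise metadata.PackageNotFoundError(name)
    return installed[name]


problems = check_versions({"examplelib": "1.2.0", "otherlib": "0.9"}, lookup=fake_lookup)
print(problems)  # {'otherlib': ('0.9', None)}
```

In a real project, omit the `lookup` argument so the check runs against the packages actually installed.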
Papers
Paper Title | URL
Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks | https://arxiv.org/abs/2011.09398
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization | https://arxiv.org/abs/1906.02107
FCOS: Fully Convolutional One-Stage Object Detection | https://arxiv.org/abs/1904.01355
Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | https://arxiv.org/abs/1912.02424
Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | https://arxiv.org/abs/2006.04388
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | https://arxiv.org/abs/1807.11164
Edge Inference Engines
Generic, Open-Source
Inference Engine | Type | Notes
TFLite | Runtime | Runtime for TF/Keras; https://www.tensorflow.org/lite
TFLite Micro | Runtime | TFLite for MCUs; https://www.tensorflow.org/lite/microcontrollers
Larq Compute Engine | Runtime | Binarized TFLite; https://github.com/larq/compute-engine
NCNN | Runtime | Tencent runtime; https://github.com/Tencent/ncnn
MNN | Runtime | Alibaba runtime; https://github.com/alibaba/MNN
Apache TVM | Compiler | Compiler and optimizer; https://tvm.apache.org/
Apache MicroTVM | Compiler | TVM for MCUs; https://tvm.apache.org/docs/microtvm/index.html
Glow | Compiler | Compiler for ONNX; https://ai.facebook.com/tools/glow/
Microsoft ELL | Compiler | Compiler; https://github.com/Microsoft/ELL
deepC | Compiler | ONNX -> LLVM; https://github.com/ai-techsystems/deepC
NNoM | Library | Keras -> C; https://github.com/majianjia/nnom
Note: This list is not comprehensive.
Edge Inference Engines
Hardware-Specific and Commercial
Vendor | Inference Engine | Type | Notes
Nvidia | TensorRT | Runtime | Support for Nvidia GPUs, such as Jetson Nano
Arm | Arm® NN | Runtime | Optimized for Arm Cortex-A CPU, Mali GPU, Ethos NPU
Arm | CMSIS-NN | Library | Library used by various runtimes and compilers
NXP | NXP eIQ™ | Both | Optimized for NXP; TFLite and Glow with Arm-NN & CMSIS-NN
Qualcomm | SNPE | Runtime | For Qualcomm Snapdragon processors
Intel | OpenVINO™ | Runtime | Runtime for Intel products, including Movidius
Morpho | SoftNeuro | Runtime | Commercial platform; limited details publicly available
Edge Impulse | EON | Compiler | Commercial platform; targeted at microcontrollers
STMicro | STM32Cube.AI | Compiler | Optimized for STM32
Kendryte | nncase | Compiler | Kendryte K210; https://github.com/kendryte/nncase
Note: This list is not comprehensive.
Model Optimization Tools
Generic, Open-Source
Tool | Framework(s) | URL
TensorFlow MOT | TensorFlow | https://www.tensorflow.org/model_optimization
Microsoft NNI | PyTorch | https://github.com/microsoft/nni
IntelLabs Distiller | PyTorch | https://github.com/IntelLabs/distiller
Riptide | TensorFlow + TVM | https://github.com/jwfromm/Riptide
Qualcomm AIMET | PyTorch, TensorFlow | https://github.com/quic/aimet
Note: This list is not comprehensive.
Model Optimization Tools
Hardware-Specific and Commercial
Tool | Framework(s) | URL
OpenVINO NNCF | PyTorch | https://github.com/openvinotoolkit/nncf
NXP eIQ | TensorFlow, TFLite, ONNX | https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ
Deeplite | PyTorch, TensorFlow, ONNX | https://www.deeplite.ai/
Edge Impulse | Keras |
Note: This list is not comprehensive.
Peripheral Accelerators
Product | Off-the-Shelf SBC | USB
Nvidia GPU | Jetson Nano, TX1, TX2 | -
Movidius Myriad X | - | Intel Neural Compute Stick 2
Google Edge TPU | Coral Dev Board, Dev Board Mini | Coral USB Accelerator
Gyrfalcon Lightspeeur® | - | Orange Pi AI Stick Lite
Rockchip RK1808 | - | Toybrick RK1808
Note: This list is not comprehensive.
SoCs w/Embedded Accelerators
Product | Acceleration | Single Board Computer
Qualcomm Snapdragon (various) | DSP + GPU (+NPU) | (by request only)
Ambarella CV2, CV5, CV22S, CV25S, CV28M | DSP + NPU | (by request only)
NXP i.MX 8 | DSP + GPU | SolidRun $160+
NXP i.MX 8M Plus | DSP + GPU + NPU | SolidRun, Wandboard $180+
Rockchip RK3399Pro | NPU | Rock Pi N10 $99+
Allwinner V831 | NPU | Sipeed MAIX-II Dock $29
Sophon BM1880 | NPU | Sophon Edge $129
Note: This list is not comprehensive.
MCUs for Inference
Vendor | Product | Features that support inference
Various | Cortex-M4/7/33/35P | SIMD instructions, FPU; Future Ethos-U55 microNPU
Raspberry Pi | RP2040 | Memory, bus fabric
Maxim Integrated | MAX78000 | Cortex-M4, CNN accelerator
Kendryte | K210 | DNN accelerator
Espressif | ESP32-S3 | SIMD instructions, FPU
Note: This list is not comprehensive.

Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
“Silicon Slip-Ups: The Ten Most Common Errors Processor Suppliers Make (Numbe...
 
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
“How Axelera AI Uses Digital Compute-in-memory to Deliver Fast and Energy-eff...
 
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
“How Arm’s Machine Learning Solution Enables Vision Transformers at the Edge,...
 
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
“Nx EVOS: A New Enterprise Operating System for Video and Visual AI,” a Prese...
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
 
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a..."OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
"OpenCV for High-performance, Low-power Vision Applications on Snapdragon," a...
 
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
“Deploying Large Models on the Edge: Success Stories and Challenges,” a Prese...
 
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
“Scaling Vision-based Edge AI Solutions: From Prototype to Global Deployment,...
 
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
“What’s Next in On-device Generative AI,” a Presentation from Qualcomm
 
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 

Recently uploaded

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
Neo4j
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Zilliz
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Speck&Tech
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
TIPNGVN2
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
Matthew Sinclair
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 

Recently uploaded (20)

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
Introducing Milvus Lite: Easy-to-Install, Easy-to-Use vector database for you...
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 

“A Practical Guide to Implementing ML on Embedded Devices,” a Presentation from the Chamberlain Group

  • 1. © 2021 Chamberlain Group Practical Guide to Implementing ML on Embedded Devices Nathan Kopp Chamberlain Group
  • 2. About the Chamberlain Group
      Chamberlain Group (CGI) is a global leader in access solutions and products.
      Vision and mission: Giving the Power of Access and Knowledge. People everywhere rely on CGI to move safely through their world, confident that what they value most is secure within reach.
      End-markets served: Residential, Commercial, Automotive
      Over 8,000 Employees Worldwide: CGI is a global team with solutions and operations designed to serve customers in a variety of markets worldwide.
  • 4. Goals of this talk
      • Survey the landscape of edge inference implementation
      • Explore software & hardware choices
      • Examine model & optimization choices
      More is possible than you think!
  • 5. Choices: Hardware, Inference Engine, Training Framework, Model, Optimizations. Everything is intertwined.
  • 6. © 2021 Chamberlain Group Choices Hardware Inference Engine Training Framework Model Optimizations 6 TFLite Limited Layer Choice TensorFlow 8-bit Quantized Only Some Models
  • 7. Choices: Hardware, Inference Engine, Training Framework, Model, Optimizations. Example: TensorRT, Complex Pruning, PyTorch, 16-bit FP, YOLOv5
  • 9. Hardware Choices
      • MCU: Cortex-M, RISC-V; +SIMD, +DSP, +NPU, +FPGA
      • CPU: Cortex-A, x86; +SIMD, +DSP, +GPU, +NPU, +FPGA
      • FPGA
      • On-Chip Memory, L1 & L2 Caches
      • Bus Speed
      • Register Count
      • Multiple Cores & Clock Speed
      • Out-of-Order Execution
  • 13. Hardware: Common Configurations
      Diagram labels: Memory, IO, etc.; Low Volume; Mass Production; USB; MIPI; Software → Hardware; Hardware → Software
  • 15. Model Optimization Workflow
      • Reference Model: State-of-the-art; FP32 GPU; Purpose: validate your training data
      • Optimization: Choose backbone; Choose head; Quantization; Pruning; NAS
      • Convert or Compile: Convert; Compile
      • Deploy and Test: Verify; Review Choices; Analyze; Optimize; Integrate
  • 17. First, you will need data!
      • Your problem space is different!
      • Smaller datasets are usually OK
      • Smaller models need less data
      • Fine tuning needs less data
      Diagram labels: Research Datasets; Your Application
  • 19. Model Mash-Up
      Layer stack: Output, Layer 5, Layer 4, Layer 3, Layer 2, Layer 1, Input
      • Head & Neck: Interprets results; Inexpensive to fine-tune; Lower data requirements than backbone
      • Backbone: Extracts features; Costly to train, needs lots of data & time; Recommendation: pre-trained weights
  • 20. Optimization Options
      Categories: Model Structure; Model Optimizations; Inference Engine; Training Data
      Options shown (axis: Commercial → Open Source): Model Zoo; Pretrained; Fine-tune with your data; Train from scratch; Community-Supported; Mix & Match Head & BB; Code it yourself; Quantization, Pruning, Compression; NAS / OFA; Decomposition; Hand-Optimized Code; Pre-Optimized Model; Pre-Optimized Runtime; Community-Supported Runtimes
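Two of the optimization options named above, quantization and pruning, can be illustrated with a minimal sketch. This is not code from the talk: the function names, the symmetric per-tensor int8 scheme, and the sample weights are all assumptions chosen for illustration. Real toolchains typically add per-channel scales, calibration data, and retraining.

```python
# Hypothetical illustration (not from the talk): symmetric per-tensor int8
# quantization and magnitude pruning on a flat list of weights.

def quantize_int8(weights):
    """Map floats to the int8 range [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

weights = [0.02, -0.75, 0.31, 1.27, -0.004, 0.5]  # made-up sample values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
pruned = prune_by_magnitude(weights, 0.5)
```

The rounding error per weight is bounded by half the scale, which is why a tensor with a few large outliers quantizes poorly: the outliers stretch the scale for every other weight.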
  • 22. Inference Engine
      • Runtime: Model is interpreted; Model deployed separately; Easier OTA updates
      • Compiler: Model is compiled; Model is part of firmware; Weights are often constants
      • Code Optimizations
          • Memory Usage: Cache-aware (e.g., tiling); Efficient register usage
          • Vectorization: Use SIMD; Use DSP, NPU, GPU
          • Parallelization
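The cache-aware tiling mentioned above can be sketched as a blocked matrix multiply. This is a hypothetical illustration, not the talk's code: the function names and the tile size are assumptions, and pure Python shows only the loop structure, not the speedup. An inference engine would apply the same blocking in C or assembly so that each tile of the operands stays in cache or registers while it is reused.

```python
# Hypothetical sketch of loop tiling (blocking) for matrix multiplication.

def matmul_naive(a, b):
    """Reference triple loop: c = a @ b for lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c

def matmul_tiled(a, b, tile=2):
    """Same result, but computed tile by tile to improve data reuse."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):          # tile of output rows
        for j0 in range(0, m, tile):      # tile of output columns
            for p0 in range(0, k, tile):  # tile of the reduction dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c

a = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
b = [[1.0, 0.0, 2.0, 0.0, 1.0], [0.0, 1.0, 0.0, 2.0, 1.0],
     [3.0, 0.0, 1.0, 0.0, 1.0], [0.0, 3.0, 0.0, 1.0, 1.0]]
tiled_result = matmul_tiled(a, b, tile=2)
naive_result = matmul_naive(a, b)
```

The `min(...)` bounds handle matrix dimensions that are not a multiple of the tile size, as with the 3x4 and 4x5 inputs here.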
  • 23. Final Step: Integrate into your app
      • Consensus over time: No model gets it right all the time
      • High frame rate: More samples for consensus; Lower per-sample accuracy
      • Low frame rate: Fewer samples for consensus; Higher per-sample accuracy
      • 100 ms inference time does NOT mean 10 FPS! Reserve CPU cycles for: Ingesting from the sensor/buffer; Interpreting the output; Network; Other app functions; Temperature management
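Both points on this slide can be sketched in a few lines: the frame-rate budget and consensus over time. The code below is a hypothetical illustration, not the talk's implementation. The 60 ms overhead figure, the `ConsensusFilter` class, the majority rule, and the window size are all assumptions.

```python
# Hypothetical sketch: frame-rate budgeting and majority-vote consensus.
from collections import Counter, deque

def effective_fps(inference_ms, overhead_ms):
    """Frames per second once per-frame overhead is added to inference time."""
    return 1000.0 / (inference_ms + overhead_ms)

class ConsensusFilter:
    """Report a label only when it wins a strict majority of recent frames."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)

    def update(self, label):
        self.history.append(label)
        winner, count = Counter(self.history).most_common(1)[0]
        return winner if count > len(self.history) / 2 else None

# 100 ms of inference plus an assumed 60 ms for capture, post-processing,
# networking, and other app work gives 6.25 FPS, not 10.
fps = effective_fps(inference_ms=100, overhead_ms=60)

filt = ConsensusFilter(window=5)
per_frame = ["car", "none", "car", "car", "none", "car"]  # made-up detections
smoothed = [filt.update(label) for label in per_frame]
```

A larger window tolerates more per-frame mistakes but reacts more slowly, which is the trade-off between high and low frame rates listed above.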
  • 25. Example: Object Detection on ARMv7
      • Task: Vehicle Detection
      • Reference Model: YOLOv5-s, FP32, PyTorch
      • Compute Constraints: ARMv7 w/NEON, no accelerators
      • Boards: ASUS Tinkerboard (v1), Cortex-A17 @ 1.8 GHz; Raspberry Pi 2 B v1.1, Cortex-A7 @ 900 MHz; Seeed NPI i.MX6ULL, Cortex-A7 @ 800 MHz
  • 26. Optimization Options
      Categories: Model Structure; Model Optimizations; Inference Engine; Training Data
      Options shown (axis: Commercial → Open Source): Model Zoo; Pretrained; Fine-tune with your data; Train from scratch; Community Supported; Mix & Match Head & BB; Code it yourself; Quantization, Pruning, Compression; NAS / OFA; Decomposition; Hand-Optimized Code; Pre-Optimized Model; Pre-Optimized Runtime; Community Supported Optimizations
  • 28. Models & Tools Tested: Why so much faster?
      • Both 32-bit ARMv7
      • NEON: 64-bit vs 32-bit
      • FP: 16 registers vs 32 registers
      • Out-of-order execution
      • Deeper pipeline
      • DMIPS/MHz: 4.0 vs 1.9
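The DMIPS/MHz figures above can be combined with clock speed for a rough estimate of the gap between two boards. This sketch assumes the 4.0 and 1.9 figures refer to the Cortex-A17 at 1.8 GHz and the Cortex-A7 at 900 MHz listed in the example setup; the transcript does not state that pairing explicitly. DMIPS measures integer throughput only, so it does not capture NEON or memory effects.

```python
# Rough estimate under stated assumptions: relative integer throughput
# is DMIPS/MHz multiplied by clock frequency in MHz.

def relative_dmips(dmips_per_mhz, clock_mhz):
    """Approximate Dhrystone MIPS for a single core."""
    return dmips_per_mhz * clock_mhz

a17 = relative_dmips(4.0, 1800)  # assumed: Cortex-A17 @ 1.8 GHz
a7 = relative_dmips(1.9, 900)    # assumed: Cortex-A7 @ 900 MHz
ratio = a17 / a7                 # roughly 4.2x on this metric alone
```

Measured inference speedups can differ substantially from this single-number estimate, since the other listed factors (NEON width, register count, out-of-order execution) act on top of it.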
  • 30. Models & Tools Tested: Speed improvement compared to YOLOv5-s on PyTorch
      Chart labels: 509x, 54x, 50x, 10x, 6x; 80-class, 80-class, 1-class
  • 31. Models & Tools Tested: Speed Improvement
      Chart labels: Reference Model; 509x, 54x, 50x
  • 32. Conclusions
      • Inference on edge devices has become both possible and practical
      • Small hardware features can make a big difference in speed
      • Selecting the right model and the right inference engine for your hardware can expand the scope of what is possible
  • 33. Example of Resource Slide
      Detectors used in the Example:
      • YOLOv5: https://github.com/ultralytics/yolov5
      • NanoDet: https://github.com/RangiLyu/nanodet
      • QuickYOLO: https://github.com/tehtea/QuickYOLO
      • Xailient: https://www.xailient.com/
      Chamberlain Group: https://chamberlaingroup.com
      Note: Many more links and resources are available at the end of the slide deck.
  • 35. Half of implementing deep learning is fighting Python & C++ errors and resolving library incompatibilities.
      • Pay close attention to documented versions!
      • Use “virtualenv”
      • Become a CMake expert!
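One way to act on the advice about documented versions is to check installed packages against pinned versions at startup, inside an isolated environment (for example one created with `python -m venv .venv`). The sketch below is an assumption-laden illustration: the `check_versions` helper and the package name in `PINNED` are hypothetical. Only `importlib.metadata` from the Python standard library (3.8+) is used.

```python
# Hypothetical sketch: compare installed package versions with pinned ones.
from importlib import metadata

def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches

# Hypothetical pin; substitute the versions your toolchain documents.
PINNED = {"hypothetical-pkg-not-installed-xyz": "1.2.3"}
problems = check_versions(PINNED)
```

An empty result means the environment matches the pins; anything else is worth fixing before debugging model conversion or build errors.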
  • 36. Papers
      Paper Title | URL
      Larq Compute Engine: Design, Benchmark, and Deploy State-of-the-Art Binarized Neural Networks | https://arxiv.org/abs/2011.09398
      Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization | https://arxiv.org/abs/1906.02107
      FCOS: Fully Convolutional One-Stage Object Detection | https://arxiv.org/abs/1904.01355
      Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection | https://arxiv.org/abs/1912.02424
      Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | https://arxiv.org/abs/2006.04388
      ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design | https://arxiv.org/abs/1807.11164
  • 37. Edge Inference Engines: Generic, Open-Source
      Inference Engine | Type | Notes
      TFLite | Runtime | Runtime for TF/Keras; https://www.tensorflow.org/lite
      TFLite Micro | Runtime | TFLite for MCUs; https://www.tensorflow.org/lite/microcontrollers
      Larq Compute Engine | Runtime | Binarized TFLite; https://github.com/larq/compute-engine
      NCNN | Runtime | Tencent runtime; https://github.com/Tencent/ncnn
      MNN | Runtime | Alibaba runtime; https://github.com/alibaba/MNN
      Apache TVM | Compiler | Compiler and optimizer; https://tvm.apache.org/
      Apache MicroTVM | Compiler | TVM for MCUs; https://tvm.apache.org/docs/microtvm/index.html
      Glow | Compiler | Compiler for ONNX; https://ai.facebook.com/tools/glow/
      Microsoft ELL | Compiler | Compiler; https://github.com/Microsoft/ELL
      deepC | Compiler | ONNX -> LLVM; https://github.com/ai-techsystems/deepC
      NNoM | Library | Keras -> C; https://github.com/majianjia/nnom
      Note: This list is not comprehensive.
  • 38. Edge Inference Engines: Hardware-Specific and Commercial
      Vendor | Inference Engine | Type | Notes
      Nvidia | TensorRT | Runtime | Support for Nvidia GPUs, such as Jetson Nano
      Arm | Arm® NN | Runtime | Optimized for Arm Cortex-A CPU, Mali GPU, Ethos NPU
      Arm | CMSIS-NN | Library | Library used by various runtimes and compilers
      NXP | NXP eIQ™ | Both | Optimized for NXP; TFLite and Glow with Arm-NN & CMSIS-NN
      Qualcomm | SNPE | Runtime | For Qualcomm Snapdragon processors
      Intel | OpenVINO™ | Runtime | Runtime for Intel products, including Movidius
      Morpho | SoftNeuro | Runtime | Commercial platform; limited details publicly available
      Edge Impulse | EON | Compiler | Commercial platform; targeted at microcontrollers
      STMicro | STM32Cube.AI | Compiler | Optimized for STM32
      Kendryte | nncase | Compiler | Kendryte K210; https://github.com/kendryte/nncase
      Note: This list is not comprehensive.
  • 39. Model Optimization Tools: Generic, Open-Source
      Tool | Framework(s) | URL
      TensorFlow MOT | TensorFlow | https://www.tensorflow.org/model_optimization
      Microsoft NNI | PyTorch | https://github.com/microsoft/nni
      IntelLabs Distiller | PyTorch | https://github.com/IntelLabs/distiller
      Riptide | TensorFlow + TVM | https://github.com/jwfromm/Riptide
      Qualcomm AIMET | PyTorch, TensorFlow | https://github.com/quic/aimet
      Note: This list is not comprehensive.
  • 40. Model Optimization Tools: Hardware-Specific and Commercial
      Tool | Framework(s) | URL
      OpenVINO NNCF | PyTorch | https://github.com/openvinotoolkit/nncf
      NXP eIQ | TensorFlow, TFLite, ONNX | https://www.nxp.com/design/software/development-software/eiq-ml-development-environment:EIQ
      Deeplite | PyTorch, TensorFlow, ONNX | https://www.deeplite.ai/
      Edge Impulse | Keras | (none listed)
      Note: This list is not comprehensive.
  • 41. Peripheral Accelerators
      Product | Off-the-Shelf SBC | USB
      Nvidia GPU | Jetson Nano, TX1, TX2 | -
      Movidius Myriad X | - | Intel Neural Compute Stick 2
      Google Edge TPU | Coral Dev Board, Dev Board Mini | Coral USB Accelerator
      Gyrfalcon Lightspeeur® | - | Orange Pi AI Stick Lite
      Rockchip RK1808 | - | Toybrick RK1808
      Note: This list is not comprehensive.
  • 42. SoCs w/Embedded Accelerators
      Product | Acceleration | Single Board Computer
      Qualcomm Snapdragon (various) | DSP + GPU (+NPU) | (by request only)
      Ambarella CV2, CV5, CV22S, CV25S, CV28M | DSP + NPU | (by request only)
      NXP i.MX 8 | DSP + GPU | SolidRun, $160+
      NXP i.MX 8M Plus | DSP + GPU + NPU | SolidRun, Wandboard, $180+
      Rockchip RK3399Pro | NPU | Rock Pi N10, $99+
      Allwinner V831 | NPU | Sipeed MAIX-II Dock, $29
      Sophon BM1880 | NPU | Sophon Edge, $129
      Note: This list is not comprehensive.
  • 43. MCUs for Inference
      Vendor | Product | Features that support inference
      Various | Cortex-M4/7/33/35P | SIMD instructions, FPU; Future Ethos-U55 microNPU
      Raspberry Pi | RP2040 | Memory, bus fabric
      Maxim Integrated | MAX78000 | Cortex-M4, CNN accelerator
      Kendryte | K210 | DNN accelerator
      Espressif | ESP32-S3 | SIMD instructions, FPU
      Note: This list is not comprehensive.