SlideShare a Scribd company logo
1 of 29
Download to read offline
State-of-the-Art Model
Quantization and
Optimization for
Efficient Edge AI
Hyunjin Kim
Compiler Team Lead
DEEPX AI
DEEPX: Edge AI Solution
2
© 2023 DeepX AI
NPU
Technology
NPU based SoC
Technology
AI HW/SW
Optimization
Developing one of the most efficient NPU technologies
Developing SoC ASIC with NPU for commercial products
Optimizing both AI HW & SW to provide the highest NPU efficiency
DEEPX: Edge AI Solution
3
© 2023 DeepX AI
NPU
Technology
NPU based SoC
Technology
AI HW/SW
Optimization
Developing one of the most efficient NPU technologies
Developing SoC ASIC with NPU for commercial products
Optimizing both AI HW & SW to provide the highest NPU efficiency
This Talk!
• Satisfy the EDGE AI inference requirement
• EDGE AI inference requirement
• Maintain high accuracy with quantization
• Support various AI models (operation coverage)
• Minimize HW Cost: SRAM
DEEPX Mission
4
© 2023 DeepX AI
• Satisfy the EDGE AI inference requirement
• EDGE AI inference requirement
• Maintain high accuracy with quantization
• Support various AI models (Operation Coverage)
• Minimize HW Cost: SRAM
DEEPX Mission
5
© 2023 DeepX AI
Our Solution:
HW/SW Co-design
• HW/SW Co-Design
• Selective quantization to avoid accuracy drop
• 8-bit quantized ops + FP32 ops
High Accuracy with Quantization
6
© 2023 DeepX AI
DXNN-Lite DXNN-Pro DXNN-Master
Norm. Accuracy
Over FP32 (avg.)
99.01% 99.42% 100.54%
Time Cost ~ 1 min - 1~2 hours 50+ GPU Hours
* Training env.
• High operation coverage with minimum implementation cost
• HW/SW Co-Design
• Increase op coverage via combinations of supported ops
• TransConv: Resize + Padding + Conv
• DilatedConv: multiple Convs
• ArgMax: Two-stage implementation
Operation Coverage
7
© 2023 DeepX AI
• DEEPX NPUs have small SRAM sizes.
• HW/SW Co-Design
• DEEPX compiler’s memory optimization maximizes SRAM
utilization to reduce DRAM accesses
Minimize HW Cost: SRAM
8
© 2023 DeepX AI
SRAM: 1 MB SRAM: 8.25 MB
SRAM: 2.25 MB
Memory Optimization: DRAM Accesses
9
© 2023 DeepX AI
49%
0%
20%
40%
60%
80%
100%
120%
MobileNetv1 MobileNetv2 MobileNetv3-Large ResNet50 EfficientNet-B0 YOLOv4 (telechips) YOLOv5s Average
Normalized DRAM Accesses Over Base [M1-4K, 2.25 MB]
Base #DRAM
Memory Optimization: Performance
10
© 2023 DeepX AI
137%
0%
20%
40%
60%
80%
100%
120%
140%
160%
MobileNetv1 MobileNetv2 MobileNetv3-Large ResNet50 EfficientNet-B0 YOLOv4 (telechips) YOLOv5s Average
Normalized Performance (FPS) Over Base [M1-4K, 2.25 MB]
Base MemOpt
11
Case Study: Fused Operations
© 2023 DeepX AI
• Segmentation Models
• Resize → Argmax at the end for pixel-wise classification
Fused Operation: Resize + Argmax (1/4)
12
bisenet_v2
© 2023 DeepX AI
• Segmentation Models
• Resize → Argmax at the end for pixel-wise classification
Fused Operation: Resize + Argmax (2/4)
13
bisenet_v2
1 x 171 x 640 x 640 x 4 (FP32) = 267 MB
1 x 640 x 640 x 2 (INT16) = 800 KB
© 2023 DeepX AI
• Segmentation Models
• Resize → Argmax at the end for pixel-wise classification
Fused Operation: Resize + Argmax (3/4)
14
bisenet_v2
1 x 171 x 640 x 640 x 4 (FP32) = 267 MB
Fusing Resize and Argmax by performing Argmax during
Resize via 2-stage Argmax (171 → 1)
1 x 640 x 640 x 2 (INT16) = 800 KB
© 2023 DeepX AI
• Segmentation Models
• Resize → Argmax at the end for pixel-wise classification
Fused Operation: Resize + Argmax (4/4)
15
Fused
bisenet_v2
1 x 171 x 640 x 640 x 4 (FP32) = 267 MB
Fusing Resize and Argmax by performing Argmax during
Resize via 2-stage Argmax (171 → 1)
1 x 640 x 640 x 2 (INT16) = 800 KB
© 2023 DeepX AI
16
DXNN SDK Tutorial
© 2023 DeepX AI
DXNN™ – DEEPX NPU SDK
17
© 2023 DeepX AI
 Beta version of DXNN has been released to customers.
DXNN SDK Tutorial: Overview
18
© 2023 DeepX AI
Model ONNX file
Model Config file
- input shape
- calibration dataset
- preprocessing
NPU Binary
NPU binary config
Weight Binary
• DX-COM compiles the model onxx file
• It also performs the quantization as well.
• The config file includes the model meta data.
• input shape, input pre-processing parameter, …
$./dx_com/dx_com 
--model_path sample/yolov5s.onnx 
--config_path sample/yolov5s.json 
--output_dir output/yolov5s
DXNN SDK Tutorial: Model Compilation
19
© 2023 DeepX AI
$run_model --model /dxrt/models/yolov5s  # compilation output directory
--input /dxrt/models/yolov5s/input/npu_input_0.bin 
--ref /dxrt/models/yolov5s/output/output_last/npu_output_0.bin 
--output output.bin
--loop 5
DXNN SDK Tutorial: Run Inference
20
© 2023 DeepX AI
DX-RT
Model Inference
Option
Input
Tensors
Post Processing
Output
Tensors
DXNN SDK Tutorial: Post Processing
21
© 2023 DeepX AI
• User should configure the parameters for post-processing.
• The parameters are not included in the compilation output.
• Example: yolov5s
YoloParam yolov5s_512 = {
.imgSize = 512,
.confThreshold = 0.25,
.scoreThreshold = 0.3,
.iouThreshold = 0.4,
.numClasses = 80,
.layers = {
…
};
#include "dxrt/dxrt_api.h“
#1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER
dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC);
DXNN SDK Tutorial: DX-RT API #1
22
© 2023 DeepX AI
#include "dxrt/dxrt_api.h“
#1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER
dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC);
auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model
DXNN SDK Tutorial: DX-RT API #2
23
© 2023 DeepX AI
#include "dxrt/dxrt_api.h“
#1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER
dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC);
auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model
auto inputs = ie.GetInput(); #3 Set inputs
DXNN SDK Tutorial: DX-RT API #3
24
© 2023 DeepX AI
#include "dxrt/dxrt_api.h“
#1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER
dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC);
auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model
auto inputs = ie.GetInput(); #3 Set inputs
auto outputs = ie.Run(); #4 Run Inference
DXNN SDK Tutorial: DX-RT API #4
25
© 2023 DeepX AI
#include "dxrt/dxrt_api.h“
#1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER
dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC);
auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model
auto inputs = ie.GetInput(); #3 Set inputs
auto outputs = ie.Run(); #4 Run Inference
auto result = yolo.PostProc(outputs); #5 Perform Post Processing
DXNN SDK Tutorial: DX-RT API #5
26
© 2023 DeepX AI
DXNN SDK Tutorial: Result
27
© 2023 DeepX AI
• DEEPX’s HW/SW codesign leads to
• Meeting the required specifications with minimum HW cost
• Selective quantization
• The combination of supported ops
• Minimize HW Cost: SRAM
• Case Study
• Reduced DRAM access by fused operation
Conclusion
28
© 2023 DeepX AI
Resources
2023 Embedded Vision Summit
Previous Talks from DEEPX
1. “Toward the Era of AI Everywhere”
(Lokwon Kim, CEO)
2. “DEEPX’s New M1 NPU Delivers
Flexibility, Accuracy, Efficiency and
Performance”
(Jay Kim, Vice President)
29
1. Demo Booth: #103
2. DEEPX Developers Page:
https://deepx.ai/developers
3. Linkedin & Youtube
© 2023 DeepX AI

More Related Content

Similar to “State-of-the-art Model Quantization and Optimization for Efficient Edge AI,” a Presentation from DEEPX

Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsDeepak Shankar
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
Accelerating SDN/NFV with transparent offloading architecture
Accelerating SDN/NFV with transparent offloading architectureAccelerating SDN/NFV with transparent offloading architecture
Accelerating SDN/NFV with transparent offloading architectureOpen Networking Summits
 
Android on IA devices and Intel Tools
Android on IA devices and Intel ToolsAndroid on IA devices and Intel Tools
Android on IA devices and Intel ToolsXavier Hallade
 
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...Intel® Software
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...Edge AI and Vision Alliance
 
Crypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsCrypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsHannes Tschofenig
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Databricks
 
Presentazione IBM Flex System e System x Evento Venaria 14 ottobre
Presentazione IBM Flex System e System x Evento Venaria 14 ottobrePresentazione IBM Flex System e System x Evento Venaria 14 ottobre
Presentazione IBM Flex System e System x Evento Venaria 14 ottobrePRAGMA PROGETTI
 
0xdroid osdc-2010-100426084937-phpapp02
0xdroid osdc-2010-100426084937-phpapp020xdroid osdc-2010-100426084937-phpapp02
0xdroid osdc-2010-100426084937-phpapp02chon2010
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
0xdroid -- community-developed Android distribution by 0xlab
0xdroid -- community-developed Android distribution by 0xlab0xdroid -- community-developed Android distribution by 0xlab
0xdroid -- community-developed Android distribution by 0xlabNational Cheng Kung University
 
Supermicro X12 Performance Update
Supermicro X12 Performance UpdateSupermicro X12 Performance Update
Supermicro X12 Performance UpdateRebekah Rodriguez
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performanceinside-BigData.com
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Deepak Shankar
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceAlison B. Lowndes
 

Similar to “State-of-the-art Model Quantization and Optimization for Efficient Edge AI,” a Presentation from DEEPX (20)

Energy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systemsEnergy efficient AI workload partitioning on multi-core systems
Energy efficient AI workload partitioning on multi-core systems
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Accelerating SDN/NFV with transparent offloading architecture
Accelerating SDN/NFV with transparent offloading architectureAccelerating SDN/NFV with transparent offloading architecture
Accelerating SDN/NFV with transparent offloading architecture
 
Android on IA devices and Intel Tools
Android on IA devices and Intel ToolsAndroid on IA devices and Intel Tools
Android on IA devices and Intel Tools
 
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
Tuning For Deep Learning Inference with Intel® Processor Graphics | SIGGRAPH ...
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
RISC V in Spacer
RISC V in SpacerRISC V in Spacer
RISC V in Spacer
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
Crypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M ProcessorsCrypto Performance on ARM Cortex-M Processors
Crypto Performance on ARM Cortex-M Processors
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
 
Presentazione IBM Flex System e System x Evento Venaria 14 ottobre
Presentazione IBM Flex System e System x Evento Venaria 14 ottobrePresentazione IBM Flex System e System x Evento Venaria 14 ottobre
Presentazione IBM Flex System e System x Evento Venaria 14 ottobre
 
BSC LMS DDL
BSC LMS DDL BSC LMS DDL
BSC LMS DDL
 
0xdroid osdc-2010-100426084937-phpapp02
0xdroid osdc-2010-100426084937-phpapp020xdroid osdc-2010-100426084937-phpapp02
0xdroid osdc-2010-100426084937-phpapp02
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
0xdroid -- community-developed Android distribution by 0xlab
0xdroid -- community-developed Android distribution by 0xlab0xdroid -- community-developed Android distribution by 0xlab
0xdroid -- community-developed Android distribution by 0xlab
 
Supermicro X12 Performance Update
Supermicro X12 Performance UpdateSupermicro X12 Performance Update
Supermicro X12 Performance Update
 
Trends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient PerformanceTrends in Systems and How to Get Efficient Performance
Trends in Systems and How to Get Efficient Performance
 
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
Compare Performance-power of Arm Cortex vs RISC-V for AI applications_oct_2021
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

“State-of-the-art Model Quantization and Optimization for Efficient Edge AI,” a Presentation from DEEPX

  • 1. State-of-the-Art Model Quantization and Optimization for Efficient Edge AI Hyunjin Kim Compiler Team Lead DEEPX AI
  • 2. DEEPX: Edge AI Solution 2 © 2023 DeepX AI NPU Technology NPU based SoC Technology AI HW/SW Optimization Developing one of the most efficient NPU technologies Developing SoC ASIC with NPU for commercial products Optimizing both AI HW & SW to provide the highest NPU efficiency
  • 3. DEEPX: Edge AI Solution 3 © 2023 DeepX AI NPU Technology NPU based SoC Technology AI HW/SW Optimization Developing one of the most efficient NPU technologies Developing SoC ASIC with NPU for commercial products Optimizing both AI HW & SW to provide the highest NPU efficiency This Talk!
  • 4. • Satisfy the EDGE AI inference requirement • EDGE AI inference requirement • Maintain high accuracy with quantization • Support various AI models (operation coverage) • Minimize HW Cost: SRAM DEEPX Mission 4 © 2023 DeepX AI
  • 5. • Satisfy the EDGE AI inference requirement • EDGE AI inference requirement • Maintain high accuracy with quantization • Support various AI models (Operation Coverage) • Minimize HW Cost: SRAM DEEPX Mission 5 © 2023 DeepX AI Our Solution: HW/SW Co-design
  • 6. • HW/SW Co-Design • Selective quantization to avoid accuracy drop • 8-bit quantized ops + FP32 ops High Accuracy with Quantization 6 © 2023 DeepX AI DXNN-Lite DXNN-Pro DXNN-Master Norm. Accuracy Over FP32 (avg.) 99.01% 99.42% 100.54% Time Cost ~ 1 min - 1~2 hours 50+ GPU Hours * Training env.
  • 7. • High operation coverage with minimum implementation cost • HW/SW Co-Design • Increase op coverage via combinations of supported ops • TransConv: Resize + Padding + Conv • DilatedConv: multiple Convs • ArgMax: Two-stage implementation Operation Coverage 7 © 2023 DeepX AI
  • 8. • DEEPX NPUs have small SRAM sizes. • HW/SW Co-Design • DEEPX compiler’s memory optimization maximizes SRAM utilization to reduce DRAM accesses Minimize HW Cost: SRAM 8 © 2023 DeepX AI SRAM: 1 MB SRAM: 8.25 MB SRAM: 2.25 MB
  • 9. Memory Optimization: DRAM Accesses 9 © 2023 DeepX AI 49% 0% 20% 40% 60% 80% 100% 120% MobileNetv1 MobileNetv2 MobileNetv3-Large ResNet50 EfficientNet-B0 YOLOv4 (telechips) YOLOv5s Average Normalized DRAM Accesses Over Base [M1-4K, 2.25 MB] Base #DRAM
  • 10. Memory Optimization: Performance 10 © 2023 DeepX AI 137% 0% 20% 40% 60% 80% 100% 120% 140% 160% MobileNetv1 MobileNetv2 MobileNetv3-Large ResNet50 EfficientNet-B0 YOLOv4 (telechips) YOLOv5s Average Normalized Performance (FPS) Over Base [M1-4K, 2.25 MB] Base MemOpt
  • 11. 11 Case Study: Fused Operations © 2023 DeepX AI
  • 12. • Segmentation Models • Resize → Argmax at the end for pixel-wise classification Fused Operation: Resize + Argmax (1/4) 12 bisenet_v2 © 2023 DeepX AI
  • 13. • Segmentation Models • Resize → Argmax at the end for pixel-wise classification Fused Operation: Resize + Argmax (2/4) 13 bisenet_v2 1 x 171 x 640 x 640 x 4 (FP32) = 267 MB 1 x 640 x 640 x 2 (INT16) = 800 KB © 2023 DeepX AI
  • 14. • Segmentation Models • Resize → Argmax at the end for pixel-wise classification Fused Operation: Resize + Argmax (3/4) 14 bisenet_v2 1 x 171 x 640 x 640 x 4 (FP32) = 267 MB Fusing Resize and Argmax by performing Argmax during Resize via 2-stage Argmax (171 → 1) 1 x 640 x 640 x 2 (INT16) = 800 KB © 2023 DeepX AI
  • 15. • Segmentation Models • Resize → Argmax at the end for pixel-wise classification Fused Operation: Resize + Argmax (4/4) 15 Fused bisenet_v2 1 x 171 x 640 x 640 x 4 (FP32) = 267 MB Fusing Resize and Argmax by performing Argmax during Resize via 2-stage Argmax (171 → 1) 1 x 640 x 640 x 2 (INT16) = 800 KB © 2023 DeepX AI
  • 16. 16 DXNN SDK Tutorial © 2023 DeepX AI
  • 17. DXNN™ – DEEPX NPU SDK 17 © 2023 DeepX AI  Beta version of DXNN has been released to customers.
  • 18. DXNN SDK Tutorial: Overview 18 © 2023 DeepX AI Model ONNX file Model Config file - input shape - calibration dataset - preprocessing NPU Binary NPU binary config Weight Binary
  • 19. • DX-COM compiles the model onxx file • It also performs the quantization as well. • The config file includes the model meta data. • input shape, input pre-processing parameter, … $./dx_com/dx_com --model_path sample/yolov5s.onnx --config_path sample/yolov5s.json --output_dir output/yolov5s DXNN SDK Tutorial: Model Compilation 19 © 2023 DeepX AI
  • 20. $run_model --model /dxrt/models/yolov5s # compilation output directory --input /dxrt/models/yolov5s/input/npu_input_0.bin --ref /dxrt/models/yolov5s/output/output_last/npu_output_0.bin --output output.bin --loop 5 DXNN SDK Tutorial: Run Inference 20 © 2023 DeepX AI DX-RT Model Inference Option Input Tensors Post Processing Output Tensors
  • 21. DXNN SDK Tutorial: Post Processing 21 © 2023 DeepX AI • User should configure the parameters for post-processing. • The parameters are not included in the compilation output. • Example: yolov5s YoloParam yolov5s_512 = { .imgSize = 512, .confThreshold = 0.25, .scoreThreshold = 0.3, .iouThreshold = 0.4, .numClasses = 80, .layers = { … };
  • 22. #include "dxrt/dxrt_api.h“ #1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC); DXNN SDK Tutorial: DX-RT API #1 22 © 2023 DeepX AI
  • 23. #include "dxrt/dxrt_api.h“ #1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC); auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model DXNN SDK Tutorial: DX-RT API #2 23 © 2023 DeepX AI
  • 24. #include "dxrt/dxrt_api.h“ #1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC); auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model auto inputs = ie.GetInput(); #3 Set inputs DXNN SDK Tutorial: DX-RT API #3 24 © 2023 DeepX AI
  • 25. #include "dxrt/dxrt_api.h“ #1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC); auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model auto inputs = ie.GetInput(); #3 Set inputs auto outputs = ie.Run(); #4 Run Inference DXNN SDK Tutorial: DX-RT API #4 25 © 2023 DeepX AI
  • 26. #include "dxrt/dxrt_api.h“ #1 Set Inference Options: 1 NPU, 1 WORKER THREAD, 2 MEMORY BUFFER dxrt::InferenceOption option(1, 1, 2, dxrt::InferenceMode::MODE_SYNC); auto ie = dxrt::InferenceEngine("yolov5s", &option); #2 Load Model auto inputs = ie.GetInput(); #3 Set inputs auto outputs = ie.Run(); #4 Run Inference auto result = yolo.PostProc(outputs); #5 Perform Post Processing DXNN SDK Tutorial: DX-RT API #5 26 © 2023 DeepX AI
  • 27. DXNN SDK Tutorial: Result 27 © 2023 DeepX AI
  • 28. • DEEPX’s HW/SW codesign leads to • Meeting the required specifications with minimum HW cost • Selective quantization • The combination of supported ops • Minimize HW Cost: SRAM • Case Study • Reduced DRAM access by fused operation Conclusion 28 © 2023 DeepX AI
  • 29. Resources 2023 Embedded Vision Summit Previous Talks from DEEPX 1. “Toward the Era of AI Everywhere” (Lokwon Kim, CEO) 2. “DEEPX’s New M1 NPU Delivers Flexibility, Accuracy, Efficiency and Performance” (Jay Kim, Vice President) 29 1. Demo Booth: #103 2. DEEPX Developers Page: https://deepx.ai/developers 3. Linkedin & Youtube © 2023 DeepX AI