SlideShare a Scribd company logo
1 of 20
Download to read offline
Processing Raw Images
Efficiently with the
MAX78000 AI Neural
Network Accelerator
Mehmet Gorkem Ulkar
Principal Engineer, Machine Learning
Analog Devices
Agenda
2
© 2023 Analog Devices
1. Challenges of AI at the edge
2. MAX78000 overview
3. MAX78000 sample applications
4. Energy requirements for data manipulation
5. Proposal: CNN based de-bayerization
6. Results
7. Q&A
Mehmet Gorkem Ulkar,
PhD
Dallas, TX
Principal ML Engineer
Keep Your Data Close: The Physics of Data
Sources:
• Rick Zarr, TI, 2008, The True Cost of an Internet “Click” - estimate of transfer cost for 30KB page from server http://energyzarr.typepad.com/energyzarrnationalcom/2008/08/the-true-cost-o.html
• J Kunkel et al, University of Hamburg 2010,Collecting Energy Consumption of Scientific Data
• Horowitz ISSCC 2014, 1300-2600 pJ per 64b access
• Chris Rowen, Cadence Design Systems, January 2016, Get Real! – Neural Network Technology for Embedded Systems
1E-13
1E-14
1E-12
1E-11
1E-10
1E-09
1E-08
1E-07
1E-03
1E-04
1E-05
1E-06
1E-05 1E-04 1E-03 1E-02 1E-01 1E+02 1E+03 1E+04 1E+05 1E+06
J
per
64b
1E+00 1E+01
Distance (m)
Credit: Cadence
623
mi
~1000 km
3
► In inference, computational
effort is in forward
propagation
 On classic hardware, almost all spent in
a triple nested matrix multiplication loop
 O(n3) to O(n2.8) *
► Very energy intensive even with
fast matrix multiply using
integer math on DSP or GPU
– large number of memory accesses
*Strassen’s algorithm
Software Inference: Slow and Power Hungry
© 2023 Analog Devices 4
CNN Accelerator: MAX78000/MAX78002
• The conv operation is parallelizable in the channel
dimension.
 64 processors in total, more channels are processed in a multi-pass
fashion
• Proper architecture that minimizes data movement
provides energy efficiency
 Each input channel is processed in parallel using different processors
to minimize data movement
 Each processor uses dedicated memory
© 2023 Analog Devices 5
MAX78000 AI Micro - System-on-Chip
© 2023 Analog Devices 6
7
Model, Training, Deployment: Development Flow
© 2023 Analog Devices 7
MAX78000 Benchmarks
0
500
1000
1500
2000
MAX78000 MAX32650 STM32F7
Inference Time ms
0
50
100
150
200
250
300
MAX78000 MAX32650 STM32F7
Inference Energy mJ
Network MACs
MAX78000
CNN at 50 MHz1,
1.2V
MAX326502
Cortex-M4, 120 MHz,
1.2V
STM32F72
Cortex-M7, 216 MHz, 2.1V
▣ KWS20 13,801,088 2.0 ms, 0.14 mJ 350 ms, 8.37 mJ 125 ms, 30.1 mJ3
▣ FaceID 55,234,560 13.89 ms4, 0.40 mJ 1760 ms5, 42.1 mJ 714 ms5, 153 mJ + 59 mJ6
128 billion operations/second,
2ARM DSP with CMSIS-NN, running
exact same INT8 network as
MAX78000, 3STMF722ZE, internal
memory,
4Includes time to load input,
5Does not include time to load input,
6STMF746NG + external 3.3V SDRAM
IS42S32400F-6BL + SDRAM
controller
© 2023 Analog Devices 8
Battery Life Leader in Independent Benchmarks
Best
A Battery-Free Long-Range Wireless Smart Camera for Face Detection: An accurate
benchmark of novel Edge AI platforms and milliwatt microcontrollers Michele MAGNO, Head
of the Project-based learning Center, ETH Zurich, D-ITET, EMEA TinyML Talks June 2021
9
Thinking About Edge AI Use Cases…
If my [application] ______________
then do _________________
sees
hears
senses object/sound/event/situation/…
action
If my [camera] sees a bear, then take a high-resolution picture and send over cell network
If my [thermostat] hears glass break, then send a text message to the owner
If my [factory robot] sees a person nearby, then shutdown until they leave
If my [pet door] sees a cat with a mouse in its mouth, then lock the pet door and send me a text message
© 2023 Analog Devices 10
• Embeddings saved in memory on a
rolling basis
 No redundant calculations
Action Recognition
Dataset
Validation
Acc.
Paramet
ers
Kinetics-400
(4 classes +
other)
79.8% 379k
© 2023 Analog Devices 11
No Url
People Tracking
12
https://github.com/MaximIntegratedAI/refdes
Trail Camera
13
System Energy: From Traditional Systems to
MAX78000
• Accelerator drastically lowers CNN energy
• Input and data manipulation become much
larger relative contributors to energy
• MAX78000 improves data loading, better
algorithms can help with data manipulation:
e.g. better ways of handling raw images
CNN
CNN
Traditional
micro
Accelerator
only
MAX78000
express
loader
CNN
hardware
algo
improvement
Data
input
Data
manipulation
CNN
operation
Output
CNN
14
Energy
© 2023 Analog Devices 14
Data Manipulation: Debayerization
15
In order to obtain an RGB format, the raw image
must be debayerized. There are several
debayerization methods*:
• Bilinear Interpolation
• Sequential Demosaicing
• Iterative Demosaicing
• Machine Learning Methods
• Adaptive Color Plane Interpolation
Figure 1. Bayer Filter (Nkansah et. al., 2022)
► Outside the CNN accelerator
► Increased system energy consumption
© 2023 Analog Devices
* Dammer, K., Grosz R., (2017). Demosaising using a Convolutional Neural Network approach. Lund University, Lund, Sweden.
CNN based Debayerization
16
• Approach 1: Learning the manipulation & interpolation by a CNN model
and embedding this network into an accelerator ⟶ Efficient way of
debayerization
Figure 3. The Network of B2RGBNet (Syu et. al., 2018)
#
parameters:
124715
© 2023 Analog Devices
CNN based Debayerization
17
• Approach 2: Using folding and fixed 1x1 kernels
Step 1: Folding the pixels into channels Step 2: Convolution with the fixed kernel to
obtain RGB
© 2023 Analog Devices
Accuracy Results
18
0
0.001
0.002
0.003
0.004
0.005
0.006
0.007
ImageNet
Mean Squared Reconstruction Error
Bilinear Interpolation b2rgb conv w/fold + transconv + conv conv w/fold + b2rgb
© 2023 Analog Devices
• MAX78000 enables battery-powered smart applications at the edge
• Effective data manipulation and preprocessing are much more important
when using highly-efficient NN inference engines
• Two methods proposed to perform interpolation inside CNN accelerator,
MAX78000
• Results show better accuracies compared to simple conventional
interpolation; the work is ongoing
Conclusion
19
© 2023 Analog Devices
• We are waiting for you at the ADI booth!
• Upper-level AI repo: https://github.com/MaximIntegratedAI
• Open-source training repo: https://github.com/MaximIntegratedAI/ai8x-training/
• Open-source synthesis repo: https://github.com/MaximIntegratedAI/ai8x-
synthesis
• Data-folding paper: L3U-net: Low-Latency Lightweight U-net Based Image
Segmentation Model for Parallel CNN Processors
https://arxiv.org/pdf/2203.16528.pdf
• B2RGBNet paper: Learning Deep Convolutional Networks for Demosaicing
https://arxiv.org/pdf/1802.03769.pdf
Resources
20
© 2023 Analog Devices

More Related Content

What's hot

“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
Edge AI and Vision Alliance
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
Edge AI and Vision Alliance
 

What's hot (20)

Matrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionMatrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer Vision
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Explicit Density Models
Explicit Density ModelsExplicit Density Models
Explicit Density Models
 
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
Vision Transformer(ViT) / An Image is Worth 16*16 Words: Transformers for Ima...
 
Use Variable Rate Shading (VRS) to Improve the User Experience in Real-Time G...
Use Variable Rate Shading (VRS) to Improve the User Experience in Real-Time G...Use Variable Rate Shading (VRS) to Improve the User Experience in Real-Time G...
Use Variable Rate Shading (VRS) to Improve the User Experience in Real-Time G...
 
Transformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptxTransformers In Vision From Zero to Hero (DLI).pptx
Transformers In Vision From Zero to Hero (DLI).pptx
 
Classical force fields as physics-based neural networks
Classical force fields as physics-based neural networksClassical force fields as physics-based neural networks
Classical force fields as physics-based neural networks
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Quantum Computing
Quantum ComputingQuantum Computing
Quantum Computing
 
How to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memoHow to Burn Multi-GPUs using CUDA stress test memo
How to Burn Multi-GPUs using CUDA stress test memo
 
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
“Efficient Neuromorphic Computing with Dynamic Vision Sensor, Spiking Neural ...
 
有點硬又不會太硬的DNN加速器
有點硬又不會太硬的DNN加速器有點硬又不會太硬的DNN加速器
有點硬又不會太硬的DNN加速器
 
Efficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® GraphicsEfficient Rendering with DirectX* 12 on Intel® Graphics
Efficient Rendering with DirectX* 12 on Intel® Graphics
 
VAE 처음부터 알아보기
VAE 처음부터 알아보기VAE 처음부터 알아보기
VAE 처음부터 알아보기
 
TPU paper slide
TPU paper slideTPU paper slide
TPU paper slide
 
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...“How Transformers are Changing the Direction of Deep Learning Architectures,”...
“How Transformers are Changing the Direction of Deep Learning Architectures,”...
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
Fast stone capture key
Fast stone capture keyFast stone capture key
Fast stone capture key
 
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
“The Future of AI is Here Today: Deep Dive into Qualcomm’s On-Device AI Offer...
 
Understanding deep learning requires rethinking generalization (2017) 1/2
Understanding deep learning requires rethinking generalization (2017) 1/2Understanding deep learning requires rethinking generalization (2017) 1/2
Understanding deep learning requires rethinking generalization (2017) 1/2
 

Similar to “Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator,” a Presentation from Analog Devices

“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
Edge AI and Vision Alliance
 
Lightning Acquisition and Processing On Sensor Node Using NI cRIO
Lightning Acquisition and Processing On Sensor Node Using NI cRIOLightning Acquisition and Processing On Sensor Node Using NI cRIO
Lightning Acquisition and Processing On Sensor Node Using NI cRIO
ijceronline
 

Similar to “Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator,” a Presentation from Analog Devices (20)

IRJET- Handwritten Decimal Image Compression using Deep Stacked Autoencoder
IRJET- Handwritten Decimal Image Compression using Deep Stacked AutoencoderIRJET- Handwritten Decimal Image Compression using Deep Stacked Autoencoder
IRJET- Handwritten Decimal Image Compression using Deep Stacked Autoencoder
 
imagefiltervhdl.pptx
imagefiltervhdl.pptximagefiltervhdl.pptx
imagefiltervhdl.pptx
 
Paper id 42201619
Paper id 42201619Paper id 42201619
Paper id 42201619
 
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
 
Lightning Acquisition and Processing On Sensor Node Using NI cRIO
Lightning Acquisition and Processing On Sensor Node Using NI cRIOLightning Acquisition and Processing On Sensor Node Using NI cRIO
Lightning Acquisition and Processing On Sensor Node Using NI cRIO
 
Smartplug ppt
Smartplug pptSmartplug ppt
Smartplug ppt
 
Revisiting Sensor MAC for Periodic Monitoring: Why Should Transmitters Be Ear...
Revisiting Sensor MAC for Periodic Monitoring: Why Should Transmitters Be Ear...Revisiting Sensor MAC for Periodic Monitoring: Why Should Transmitters Be Ear...
Revisiting Sensor MAC for Periodic Monitoring: Why Should Transmitters Be Ear...
 
Design & Implementation Of Fault Identification In Underground Cables Using IOT
Design & Implementation Of Fault Identification In Underground Cables Using IOTDesign & Implementation Of Fault Identification In Underground Cables Using IOT
Design & Implementation Of Fault Identification In Underground Cables Using IOT
 
A White Paper On Neural Network Quantization
A White Paper On Neural Network QuantizationA White Paper On Neural Network Quantization
A White Paper On Neural Network Quantization
 
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORSCOMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
COMPOSITE IMAGELET IDENTIFIER FOR ML PROCESSORS
 
PROTOTYPE OF IOT BASED DC MICROGRID AUTOMATION
PROTOTYPE OF IOT BASED DC MICROGRID AUTOMATIONPROTOTYPE OF IOT BASED DC MICROGRID AUTOMATION
PROTOTYPE OF IOT BASED DC MICROGRID AUTOMATION
 
Poster_group22
Poster_group22Poster_group22
Poster_group22
 
IRJET- Earthquake Early Warning System for Android
IRJET-  	  Earthquake Early Warning System for AndroidIRJET-  	  Earthquake Early Warning System for Android
IRJET- Earthquake Early Warning System for Android
 
Energy efficient platform designed for sdma applications in mobile wireless ...
Energy efficient platform designed for sdma applications in mobile  wireless ...Energy efficient platform designed for sdma applications in mobile  wireless ...
Energy efficient platform designed for sdma applications in mobile wireless ...
 
BKK16-100K2 ARM Research - Sensors to Supercomputers
BKK16-100K2 ARM Research - Sensors to SupercomputersBKK16-100K2 ARM Research - Sensors to Supercomputers
BKK16-100K2 ARM Research - Sensors to Supercomputers
 
Signal Classification and Identification for Cognitive Radio
Signal Classification and Identification for Cognitive RadioSignal Classification and Identification for Cognitive Radio
Signal Classification and Identification for Cognitive Radio
 
International Journal of Computational Engineering Research(IJCER)
 International Journal of Computational Engineering Research(IJCER)  International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Agent based Load Management for Microgrid
Agent based Load Management for MicrogridAgent based Load Management for Microgrid
Agent based Load Management for Microgrid
 
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdfTutorial-on-DNN-09A-Co-design-Sparsity.pdf
Tutorial-on-DNN-09A-Co-design-Sparsity.pdf
 
Application of Microcontroller in Transmitter Section of Wireless System
Application of Microcontroller in Transmitter Section of Wireless SystemApplication of Microcontroller in Transmitter Section of Wireless System
Application of Microcontroller in Transmitter Section of Wireless System
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
Edge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 

“Processing Raw Images Efficiently on the MAX78000 Neural Network Accelerator,” a Presentation from Analog Devices

  • 1. Processing Raw Images Efficiently with the MAX78000 AI Neural Network Accelerator Mehmet Gorkem Ulkar Principal Engineer, Machine Learning Analog Devices
  • 2. Agenda 2 © 2023 Analog Devices 1. Challenges of AI at the edge 2. MAX78000 overview 3. MAX78000 sample applications 4. Energy requirements for data manipulation 5. Proposal: CNN based de-bayerization 6. Results 7. Q&A Mehmet Gorkem Ulkar, PhD Dallas, TX Principal ML Engineer
  • 3. Keep Your Data Close: The Physics of Data Sources: • Rick Zarr, TI, 2008, The True Cost of an Internet “Click” - estimate of transfer cost for 30KB page from server http://energyzarr.typepad.com/energyzarrnationalcom/2008/08/the-true-cost-o.html • J Kunkel et al, University of Hamburg 2010,Collecting Energy Consumption of Scientific Data • Horowitz ISSCC 2014, 1300-2600 pJ per 64b access • Chris Rowen, Cadence Design Systems, January 2016, Get Real! – Neural Network Technology for Embedded Systems 1E-13 1E-14 1E-12 1E-11 1E-10 1E-09 1E-08 1E-07 1E-03 1E-04 1E-05 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+02 1E+03 1E+04 1E+05 1E+06 J per 64b 1E+00 1E+01 Distance (m) Credit: Cadence 623 mi ~1000 km 3
  • 4. ► In inference, computational effort is in forward propagation  On classic hardware, almost all spent in a triple nested matrix multiplication loop  O(n3) to O(n2.8) * ► Very energy intensive even with fast matrix multiply using integer math on DSP or GPU – large number of memory accesses *Strassen’s algorithm Software Inference: Slow and Power Hungry © 2023 Analog Devices 4
  • 5. CNN Accelerator: MAX78000/MAX78002 • The conv operation is parallelizable in the channel dimension.  64 processors in total, more channels are processed in a multi-pass fashion • Proper architecture that minimizes data movement provides energy efficiency  Each input channel is processed in parallel using different processors to minimize data movement  Each processor uses dedicated memory © 2023 Analog Devices 5
  • 6. MAX78000 AI Micro - System-on-Chip © 2023 Analog Devices 6
  • 7. 7 Model, Training, Deployment: Development Flow © 2023 Analog Devices 7
  • 8. MAX78000 Benchmarks 0 500 1000 1500 2000 MAX78000 MAX32650 STM32F7 Inference Time ms 0 50 100 150 200 250 300 MAX78000 MAX32650 STM32F7 Inference Energy mJ Network MACs MAX78000 CNN at 50 MHz1, 1.2V MAX326502 Cortex-M4, 120 MHz, 1.2V STM32F72 Cortex-M7, 216 MHz, 2.1V ▣ KWS20 13,801,088 2.0 ms, 0.14 mJ 350 ms, 8.37 mJ 125 ms, 30.1 mJ3 ▣ FaceID 55,234,560 13.89 ms4, 0.40 mJ 1760 ms5, 42.1 mJ 714 ms5, 153 mJ + 59 mJ6 128 billion operations/second, 2ARM DSP with CMSIS-NN, running exact same INT8 network as MAX78000, 3STMF722ZE, internal memory, 4Includes time to load input, 5Does not include time to load input, 6STMF746NG + external 3.3V SDRAM IS42S32400F-6BL + SDRAM controller © 2023 Analog Devices 8
  • 9. Battery Life Leader in Independent Benchmarks Best A Battery-Free Long-Range Wireless Smart Camera for Face Detection: An accurate benchmark of novel Edge AI platforms and milliwatt microcontrollers Michele MAGNO, Head of the Project-based learning Center, ETH Zurich, D-ITET, EMEA TinyML Talks June 2021 9
  • 10. Thinking About Edge AI Use Cases… If my [application] ______________ then do _________________ sees hears senses object/sound/event/situation/… action If my [camera] sees a bear, then take a high-resolution picture and send over cell network If my [thermostat] hears glass break, then send a text message to the owner If my [factory robot] sees a person nearby, then shutdown until they leave If my [pet door] sees a cat with a mouse in its mouth, then lock the pet door and send me a text message © 2023 Analog Devices 10
  • 11. • Embeddings saved in memory on a rolling basis  No redundant calculations Action Recognition Dataset Validation Acc. Paramet ers Kinetics-400 (4 classes + other) 79.8% 379k © 2023 Analog Devices 11
  • 14. System Energy: From Traditional Systems to MAX78000 • Accelerator drastically lowers CNN energy • Input and data manipulation become much larger relative contributors to energy • MAX78000 improves data loading, better algorithms can help with data manipulation: e.g. better ways of handling raw images CNN CNN Traditional micro Accelerator only MAX78000 express loader CNN hardware algo improvement Data input Data manipulation CNN operation Output CNN 14 Energy © 2023 Analog Devices 14
  • 15. Data Manipulation: Debayerization 15 In order to obtain an RGB format, the raw image must be debayerized. There are several debayerization methods*: • Bilinear Interpolation • Sequential Demosaicing • Iterative Demosaicing • Machine Learning Methods • Adaptive Color Plane Interpolation Figure 1. Bayer Filter (Nkansah et. al., 2022) ► Outside the CNN accelerator ► Increased system energy consumption © 2023 Analog Devices * Dammer, K., Grosz R., (2017). Demosaising using a Convolutional Neural Network approach. Lund University, Lund, Sweden.
  • 16. CNN based Debayerization 16 • Approach 1: Learning the manipulation & interpolation by a CNN model and embedding this network into an accelerator ⟶ Efficient way of debayerization Figure 3. The Network of B2RGBNet (Syu et. al., 2018) # parameters: 124715 © 2023 Analog Devices
  • 17. CNN based Debayerization 17 • Approach 2: Using folding and fixed 1x1 kernels Step 1: Folding the pixels into channels Step 2: Convolution with the fixed kernel to obtain RGB © 2023 Analog Devices
  • 18. Accuracy Results 18 0 0.001 0.002 0.003 0.004 0.005 0.006 0.007 ImageNet Mean Squared Reconstruction Error Bilinear Interpolation b2rgb conv w/fold + transconv + conv conv w/fold + b2rgb © 2023 Analog Devices
  • 19. • MAX78000 enables battery-powered smart applications at the edge • Effective data manipulation and preprocessing are much more important when using highly-efficient NN inference engines • Two methods proposed to perform interpolation inside CNN accelerator, MAX78000 • Results show better accuracies compared to simple conventional interpolation; the work is ongoing Conclusion 19 © 2023 Analog Devices
  • 20. • We are waiting for you at the ADI booth! • Upper-level AI repo: https://github.com/MaximIntegratedAI • Open-source training repo: https://github.com/MaximIntegratedAI/ai8x-training/ • Open-source synthesis repo: https://github.com/MaximIntegratedAI/ai8x- synthesis • Data-folding paper: L3U-net: Low-Latency Lightweight U-net Based Image Segmentation Model for Parallel CNN Processors https://arxiv.org/pdf/2203.16528.pdf • B2RGBNet paper: Learning Deep Convolutional Networks for Demosaicing https://arxiv.org/pdf/1802.03769.pdf Resources 20 © 2023 Analog Devices