SlideShare a Scribd company logo
1 of 25
Download to read offline
Efficiently Map AI and
Vision Applications onto
Multi-Core AI Processors
Using CEVA’s Parallel
Processing Framework
Rami Drucker
SW Architect
CEVA
Executive Summary
• Next generation
AI and CV
applications
require higher
than ever
computing power.
• Edge devices use
multi-core
processors to
deliver high
performance.
• However,
developers must
efficiently map
their applications
onto the multiple
cores, which can
be difficult.
• CEVA has
introduced the
Architecture
Planner tool as a
new element of
CDNN, CEVA’s
comprehensive AI
SDK.
• In this talk, we’ll
show how the
Architecture
Planner tool
analyzes the
network model
and maps the
workload onto the
multiple cores in
an efficient
manner.
• We’ll explain key
techniques used
by the Tool,
including
symmetrical and
asymmetrical
multi-processing
paradigms.
2
NeuPro-M – A Family of AI Processors
A Full System Solution
NeuPro-M AI Core
NeuPro-M Common Subsystem
Multi
Engine
r
Data &
Weights
Compression
AXI Matrix
Host
Interface
Realtime
Interface
Functional
Safety
Security logic
Core Shared
Memory
NeuPro-M Master Controller
- Optimized for performance, control and code size
AXI
Engine#8
Engine #...
Engine #2
Engine #1
NeuPro-M Engine
AXI
Mixed
Precision
Neural Engine
Complementary
Activation
Unit
Unstructured
Sparsity
Engine
(Data/Weights)
Winograd
Transform
Engine
Vector
Processor
Unit
Engine Local
Controller
Matrix
Decomposition
Engine
local
Shared
Memory
4K
MAC
3
CDNN Open Development Platform
CDNN SDK
NeuPro-M Core HW
NPM
Controller
NPM
Sub-system
CDNN
Frontend
CDNN
Backend
Optimized Layers / Operators
Multi-thread
CDNN
Android NN
CEVA’s
Host / OS Interface
CEVA’s
Parallel Programing
Solution
NPM Engine
SensPro DSP HW
SP
Controller
Halide
SPU
VCU 4
CDNN Toolchain Workflow
1 Neural Network Training Optimizer Tool
Real-Time Multi-Network Inferencing
4 CDNN Offline Optimization
3
2 Architecture Planner Tools
Training
Framework
Image
Database
OUTPUT
► BW to DDR
► L1-L2 BW
► Cycle count
► Overall performance
► Precision
► Output Image
Data Flow
Resolution
L2
Analyzer
L1
Analyzer
Layer
Performance
in a front basket"
with a cat
riding a bicycle,
with long hair
“Young girl
Parallel processing of Multi- networks
running over Several engines/cores
Input Image
Neural
Network
+ +
User configuration
• L1/L2 MSS size
• DDR max BW
• DDR latency
Graph
Manager
Quantizer
– Winograd
– Mix Precision
– BW level
Precision Wizard
System Planner
Input
Output
Hidden
Fixed-point
Customized Network + Weight
FEATURES
 Graph Compiler
 Memory Optimizations
 BW reduction
 Code size reduction
 Precision
 Scale per Channel
 Scale per Layer
 Scale per Module
 CDNN-Invite
 Node API
 Custom Device API
 Open-source front-end
 QAT: Quantization Aware Training
 Pruning
 Structured pruning
 Semi-Structured Pruning
 Unstructured pruning
 HW-Aware Network Optimization
 Model size reduction
5
CDNN Multi-Engine Features
• CDNN supports the following parallel processing paradigms to leverage the
compute power of multi-engine device and maximize overall system
throughput
• Partitioning by tiles  different tiles, shared weights. This is efficient in first
layers where input maps are large
• Partitioning by output maps  same tile, different weights
• Partitioning by sub-graphs  different sub-graphs are assigned to engines
• Pipeline partitioning  sequence of nodes assigned to engine
• Batch partitioning  same network, different input image
• Partitioning by networks  different network per engine
6
Quad-Engine Core – Segmentation Network
Data Flow
NeuPro-M™
AI Processor
L1 MEM
engine #0
L1 MEM
engine #3
L1 MEM
engine #1
L2
Shared
MSS
L1 MEM
engine #0
NPM
engine #0
NPM
engine #3
NPM
engine #1
NPM
engine #2
L1 MEM
engine #1
L1 MEM
engine #2
L1 MEM
engine #3
External Memory /
ISP real-time client
X
Y
AXI
P a r a l l e l
p r o c e s s i n g
L1 MEM
engine #2
Overlap
area
Output
Input:
7
Parallel Processing Paradigms
Model by model
NeuPro-M
engine #0
NeuPro-M
engine #3
NeuPro-M
engine #1
NeuPro-M
engine #2
3
Object detection
Image processing
Lane detection
Free space segmentation
Frame by frame
NeuPro-M
engine #0
NeuPro-M
engine #2
NeuPro-M
engine #1
1
3
2
Object detection
Object detection
Object detection
Object detection
NeuPro-M
engine #...
#
Layer by layer
NeuPro-M
engine #0
NeuPro-M
engine #3
NeuPro-M
engine #1
NeuPro-M
engine #2
Input
Output
Neural Network
Tile by tile
NeuPro-M
engine #0
NeuPro-M
engine #3
NeuPro-M
engine #1
NeuPro-M
engine #2
3
F r e e s p a c e s e g m e n t a t i o n
NeuPro-M SDK - Adaptive computing methods:
II
I
III
IV
4
3
2
1
I
n
p
u
t
8
CDNN Architecture Planner
9
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
• Given graph G=(V,E)
• Given input and output nodes (a,u)
• Given cost function f(v) for each node
• We have multiple paths from a to u
• Parallel execution is appropriate in this
case
• We will attempt to parallelize the model
across 3 DSPs
CDNN Multi-Engine Offline Algorithm
10
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Multi-Engine Offline Algorithm
11
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 1
12
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
f 13 17
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 2
13
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
j 18 22 l 14 20 f 13 17
m 20 24 k 17 21
h 21 25
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 3
14
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
j 18 22 l 14 20 f 13 17
p 22 26 m 20 24 k 17 21
q 26 30 r 24 27 h 21 25
o 25 29
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 4
15
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
j 18 22 l 14 20 f 13 17
p 22 26 m 20 24 k 17 21
q 26 30 r 24 27 h 21 25
t 27 46 o 25 29
s 29 41
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 5
16
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
j 18 22 l 14 20 f 13 17
p 22 26 m 20 24 k 17 21
q 26 30 r 24 27 h 21 25
u 46 47 t 27 46 o 25 29
s 29 41
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
CDNN Offline Algorithm – Layer 6
17
a
c b
g
u
d
n
h
o
e
j
p
f
k
q
l m
r
s t
0
1
2
3
4
5
6
Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish
a 0 4 ----- 0 4 ----- 0 4
n 4 14 b 4 11 c 4 9
e 14 18 g 11 14 d 9 13
j 18 22 l 14 20 f 13 17
p 22 26 m 20 24 k 17 21
q 26 30 r 24 27 h 21 25
----- 30 46 t 27 46 o 25 29
u 46 47 s 29 41
4
5 7
10
3
4
6
4
4
4
4
4
4
4
4
4
13
3
19
1
Total inference time for 3 DSPs: 47
Single DSP inference time is the sum of all nodes:
4+5+7+10+9*4+3+13+13+19+1 = 111
CDNN Offline Algorithm – Finish
18
• Algorithm performs load balancing
across sub-graphs to minimize idle
time of DSPs
• Parallel execution is appropriate in
this case
• We will attempt to parallelize the
model across 1, 2, 3, and 4 DSPs
and compare the results
• The algorithm implements
backtracking traversal of the graph
to limit the computational
complexity of the process
Example: Inception_v3
19
• Backtracking algorithm
produced the best results
• The bar chart compares the
performance in cycles of a
single DSP vs two, three,
and four DSPs
Number of DSP Units
Execution
cycles
3,568,884
2,166,036
2,013,499 2,001,637
Results: Inception_v3
20
DSP1 DSP2 DSP3
Time
• Many neural networks comprise
a long thread of nodes that are
computed one after the other
• Parallelization is difficult
because there are no nodes
that can run in parallel
• A pipeline approach is adequate
in this case
• We will attempt to parallelize
the model across 3 DSPs
• To minimize idle time, the
algorithm needs to balance the
load across DSPs
CDNN Multi-engine Pipeline Execution
21
• Given Graph G=(V,E)
• The resultant frame rate equals to
1/CYCLE_COUNT of the slowest
section of the pipeline
Example: Mobilenet_v2
22
• The pipeline algorithm
produced the best result
with 4 DSPs
Number of DSP Units
Execution
cycles
165,376
84,436
64,468
43,659
Results: Mobilenet_v2
23
• We introduced CEVA NeuPro-M multi-engine processor for AI and CV
applications.
• These cores are complemented by CDNN for multi-engine, a highly
optimized graph compiler and runtime framework.
• We presented the results of sub-graph and pipeline algorithms, which were
developed at CEVA to distribute the network inference workload across
multiple engines.
• Surprisingly, even for Inception_V3, the pipeline approach outperformed
the backtracking technique.
• Further testing would need to be carried out to confirm the initial results.
Summary and Conclusions
24
Resources
For more information, please visit our booth, #420
www.ceva-dsp.com
25

More Related Content

Similar to “Efficiently Map AI and Vision Applications onto Multi-core AI Processors Using CEVA’s Parallel Processing Framework,” a Presentation from CEVA

AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 
Introduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelIntroduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelMyNOG
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalJunli Gu
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesMarina Kolpakova
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Intel® Software
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptxruvex
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.pptHermellaGashaw
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processingvishal choudhary
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on androidKoan-Sin Tan
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelKoichi Shirahata
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Julien SIMON
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processingAcad
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxAkshitAgiwal1
 

Similar to “Efficiently Map AI and Vision Applications onto Multi-core AI Processors Using CEVA’s Parallel Processing Framework,” a Presentation from CEVA (20)

26_Fan.pdf
26_Fan.pdf26_Fan.pdf
26_Fan.pdf
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
Introduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, IntelIntroduction to Programmable Networks by Clarence Anslem, Intel
Introduction to Programmable Networks by Clarence Anslem, Intel
 
OpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation finalOpenCL caffe IWOCL 2016 presentation final
OpenCL caffe IWOCL 2016 presentation final
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization ApproachesPragmatic Optimization in Modern Programming - Ordering Optimization Approaches
Pragmatic Optimization in Modern Programming - Ordering Optimization Approaches
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
Performance Optimization of Deep Learning Frameworks Caffe* and Tensorflow* f...
 
DigitRecognition.pptx
DigitRecognition.pptxDigitRecognition.pptx
DigitRecognition.pptx
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.pptComputer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
Computer_Architecture_3rd_Edition_by_Moris_Mano_Ch_09.ppt
 
Unit 3-pipelining & vector processing
Unit 3-pipelining & vector processingUnit 3-pipelining & vector processing
Unit 3-pipelining & vector processing
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming ModelPerformance Analysis of Lattice QCD on GPUs in APGAS Programming Model
Performance Analysis of Lattice QCD on GPUs in APGAS Programming Model
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)Deep Dive on Amazon EC2 Instances (March 2017)
Deep Dive on Amazon EC2 Instances (March 2017)
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 
pipeline and vector processing
pipeline and vector processingpipeline and vector processing
pipeline and vector processing
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Recently uploaded (20)

Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 

“Efficiently Map AI and Vision Applications onto Multi-core AI Processors Using CEVA’s Parallel Processing Framework,” a Presentation from CEVA

  • 1. Efficiently Map AI and Vision Applications onto Multi-Core AI Processors Using CEVA’s Parallel Processing Framework Rami Drucker SW Architect CEVA
  • 2. Executive Summary • Next generation AI and CV applications require higher than ever computing power. • Edge devices use multi-core processors to deliver high performance. • However, developers must efficiently map their applications onto the multiple cores, which can be difficult. • CEVA has introduced the Architecture Planner tool as a new element of CDNN, CEVA’s comprehensive AI SDK. • In this talk, we’ll show how the Architecture Planner tool analyzes the network model and maps the workload onto the multiple cores in an efficient manner. • We’ll explain key techniques used by the Tool, including symmetrical and asymmetrical multi-processing paradigms. 2
  • 3. NeuPro-M – A Family of AI Processors A Full System Solution NeuPro-M AI Core NeuPro-M Common Subsystem Multi Engine r Data & Weights Compression AXI Matrix Host Interface Realtime Interface Functional Safety Security logic Core Shared Memory NeuPro-M Master Controller - Optimized for performance, control and code size AXI Engine#8 Engine #... Engine #2 Engine #1 NeuPro-M Engine AXI Mixed Precision Neural Engine Complementary Activation Unit Unstructured Sparsity Engine (Data/Weights) Winograd Transform Engine Vector Processor Unit Engine Local Controller Matrix Decomposition Engine local Shared Memory 4K MAC 3
  • 4. CDNN Open Development Platform CDNN SDK NeuPro-M Core HW NPM Controller NPM Sub-system CDNN Frontend CDNN Backend Optimized Layers / Operators Multi-thread CDNN Android NN CEVA’s Host / OS Interface CEVA’s Parallel Programing Solution NPM Engine SensPro DSP HW SP Controller Halide SPU VCU 4
  • 5. CDNN Toolchain Workflow 1 Neural Network Training Optimizer Tool Real-Time Multi-Network Inferencing 4 CDNN Offline Optimization 3 2 Architecture Planner Tools Training Framework Image Database OUTPUT ► BW to DDR ► L1-L2 BW ► Cycle count ► Overall performance ► Precision ► Output Image Data Flow Resolution L2 Analyzer L1 Analyzer Layer Performance in a front basket" with a cat riding a bicycle, with long hair “Young girl Parallel processing of Multi- networks running over Several engines/cores Input Image Neural Network + + User configuration • L1/L2 MSS size • DDR max BW • DDR latency Graph Manager Quantizer – Winograd – Mix Precision – BW level Precision Wizard System Planner Input Output Hidden Fixed-point Customized Network + Weight FEATURES  Graph Compiler  Memory Optimizations  BW reduction  Code size reduction  Precision  Scale per Channel  Scale per Layer  Scale per Module  CDNN-Invite  Node API  Custom Device API  Open-source front-end  QAT: Quantization Aware Training  Pruning  Structured pruning  Semi-Structured Pruning  Unstructured pruning  HW-Aware Network Optimization  Model size reduction 5
  • 6. CDNN Multi-Engine Features • CDNN supports the following parallel processing paradigms to leverage the compute power of multi-engine device and maximize overall system throughput • Partitioning by tiles  different tiles, shared weights. This is efficient in first layers where input maps are large • Partitioning by output maps  same tile, different weights • Partitioning by sub-graphs  different sub-graphs are assigned to engines • Pipeline partitioning  sequence of nodes assigned to engine • Batch partitioning  same network, different input image • Partitioning by networks  different network per engine 6
  • 7. Quad-Engine Core – Segmentation Network Data Flow NeuPro-M™ AI Processor L1 MEM engine #0 L1 MEM engine #3 L1 MEM engine #1 L2 Shared MSS L1 MEM engine #0 NPM engine #0 NPM engine #3 NPM engine #1 NPM engine #2 L1 MEM engine #1 L1 MEM engine #2 L1 MEM engine #3 External Memory / ISP real-time client X Y AXI P a r a l l e l p r o c e s s i n g L1 MEM engine #2 Overlap area Output Input: 7
  • 8. Parallel Processing Paradigms Model by model NeuPro-M engine #0 NeuPro-M engine #3 NeuPro-M engine #1 NeuPro-M engine #2 3 Object detection Image processing Lane detection Free space segmentation Frame by frame NeuPro-M engine #0 NeuPro-M engine #2 NeuPro-M engine #1 1 3 2 Object detection Object detection Object detection Object detection NeuPro-M engine #... # Layer by layer NeuPro-M engine #0 NeuPro-M engine #3 NeuPro-M engine #1 NeuPro-M engine #2 Input Output Neural Network Tile by tile NeuPro-M engine #0 NeuPro-M engine #3 NeuPro-M engine #1 NeuPro-M engine #2 3 F r e e s p a c e s e g m e n t a t i o n NeuPro-M SDK - Adaptive computing methods: II I III IV 4 3 2 1 I n p u t 8
  • 10. a c b g u d n h o e j p f k q l m r s t 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 • Given graph G=(V,E) • Given input and output nodes (a,u) • Given cost function f(v) for each node • We have multiple paths from a to u • Parallel execution is appropriate in this case • We will attempt to parallelize the model across 3 DSPs CDNN Multi-Engine Offline Algorithm 10
  • 11. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Multi-Engine Offline Algorithm 11
  • 12. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 1 12
  • 13. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 f 13 17 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 2 13
  • 14. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 j 18 22 l 14 20 f 13 17 m 20 24 k 17 21 h 21 25 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 3 14
  • 15. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 j 18 22 l 14 20 f 13 17 p 22 26 m 20 24 k 17 21 q 26 30 r 24 27 h 21 25 o 25 29 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 4 15
  • 16. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 j 18 22 l 14 20 f 13 17 p 22 26 m 20 24 k 17 21 q 26 30 r 24 27 h 21 25 t 27 46 o 25 29 s 29 41 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 5 16
  • 17. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 j 18 22 l 14 20 f 13 17 p 22 26 m 20 24 k 17 21 q 26 30 r 24 27 h 21 25 u 46 47 t 27 46 o 25 29 s 29 41 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 CDNN Offline Algorithm – Layer 6 17
  • 18. a c b g u d n h o e j p f k q l m r s t 0 1 2 3 4 5 6 Dsp1 Start finish Dsp2 Start finish Dsp3 Start finish a 0 4 ----- 0 4 ----- 0 4 n 4 14 b 4 11 c 4 9 e 14 18 g 11 14 d 9 13 j 18 22 l 14 20 f 13 17 p 22 26 m 20 24 k 17 21 q 26 30 r 24 27 h 21 25 ----- 30 46 t 27 46 o 25 29 u 46 47 s 29 41 4 5 7 10 3 4 6 4 4 4 4 4 4 4 4 4 13 3 19 1 Total inference time for 3 DSPs: 47 Single DSP inference time is the sum of all nodes: 4+5+7+10+9*4+3+13+13+19+1 = 111 CDNN Offline Algorithm – Finish 18
  • 19. • Algorithm performs load balancing across sub-graphs to minimize idle time of DSPs • Parallel execution is appropriate in this case • We will attempt to parallelize the model across 1, 2, 3, and 4 DSPs and compare the results • The algorithm implements backtracking traversal of the graph to limit the computational complexity of the process Example: Inception_v3 19
  • 20. • Backtracking algorithm produced the best results • The bar chart compares the performance in cycles of a single DSP vs two, three, and four DSPs Number of DSP Units Execution cycles 3,568,884 2,166,036 2,013,499 2,001,637 Results: Inception_v3 20
  • 21. DSP1 DSP2 DSP3 Time • Many neural networks comprise a long thread of nodes that are computed one after the other • Parallelization is difficult because there are no nodes that can run in parallel • A pipeline approach is adequate in this case • We will attempt to parallelize the model across 3 DSPs • To minimize idle time, the algorithm needs to balance the load across DSPs CDNN Multi-engine Pipeline Execution 21
  • 22. • Given Graph G=(V,E) • The resultant frame rate equals to 1/CYCLE_COUNT of the slowest section of the pipeline Example: Mobilenet_v2 22
  • 23. • The pipeline algorithm produced the best result with 4 DSPs Number of DSP Units Execution cycles 165,376 84,436 64,468 43,659 Results: Mobilenet_v2 23
  • 24. • We introduced CEVA NeuPro-M multi-engine processor for AI and CV applications. • These cores are complemented by CDNN for multi-engine, a highly optimized graph compiler and runtime framework. • We presented the results of sub-graph and pipeline algorithms, which were developed at CEVA to distribute the network inference workload across multiple engines. • Surprisingly, even for Inception_V3, the pipeline approach outperformed the backtracking technique. • Further testing would need to be carried out to confirm the initial results. Summary and Conclusions 24
  • 25. Resources For more information, please visit our booth, #420 www.ceva-dsp.com 25