Copyright © 2017 Synopsys 1
Moving CNNs from Academic Theory to
Embedded Reality
Tom Michiels, System Architect
May 2017
Synopsys at a Glance
• Embedded vision processor leverages many silicon-proven IP products
  • DesignWare®: ARC® HS processor, AXI, DMA, Memory Compiler …
  • HAPS® FPGA-based rapid prototyping system
• >5,500 Masters/PhD degrees
• >2,400 IP designers
• >2,100 applications engineers
• >$2.2B FY15 revenue
• 33% of revenue on R&D
• >10,200 employees
Requirements for embedded CNN implementations
CNN for a Wide Range of Vision Applications
• Object detection, classification & localization, face recognition
• Visual attention, facial expression recognition
• Gesture recognition / hand tracking
• Scene recognition and labelling, semantic segmentation
  • Sky, mountain, road, tree, building …
• Resolution upscaling
Computation Requirements for CNNs
[Chart: accuracy versus computational complexity, with marks at 1 GOPs/frame and 10 GOPs/frame]
• LeNet (1994): 4 layers
• AlexNet (2012): 8 layers, 100 MByte
• VGG-19 (2014): 19 layers, 270 MByte
• GoogLeNet (2014): 22 layers, 20 MByte
• ResNet (2015): 152 layers!, 10 MByte
Typical Power, Performance and Area
Based on 28 nm process node
• Advanced CNN applications: 100–1000 GMAC/s
  • Object classification, detection, localization
  • Scene segmentation
  • Super resolution
  • Recursive neural networks
• Implementation on GPP and GP-GPU: 1-10 W, 50-100 mm²
• Typical customer targets for 1080p @30 fps: <500 mW, 1-2 mm²
CNN Technical Challenges & Solutions
Bit Width Impact on Detection Accuracy
Functional simulation model with varying bit widths (ILSVRC graphs / Caffe-trained models)
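To make the bit-width trade-off concrete, the following is a minimal sketch of how a value might be rounded to a signed fixed-point format of a given width and how the resulting error can be measured. It is not the functional simulation model referred to on the slide: the function names `quantize` and `max_quant_error`, the value range [-1, 1) and the saturating rounding scheme are all assumptions made for illustration.

```c
#include <assert.h>

/* Quantize x in [-1, 1) to a signed fixed-point value with `bits` bits
 * (1 sign bit, bits-1 fractional bits) and convert it back to double.
 * Format and rounding mode are illustrative assumptions. */
double quantize(double x, int bits)
{
    assert(bits >= 2 && bits <= 16);
    long scale = 1L << (bits - 1);                       /* 128 for 8 bits */
    long q = (long)(x * scale + (x >= 0 ? 0.5 : -0.5));  /* round to nearest */
    if (q > scale - 1) q = scale - 1;                    /* saturate high */
    if (q < -scale) q = -scale;                          /* saturate low */
    return (double)q / scale;
}

/* Largest absolute quantization error over an even sweep of [-1, 1). */
double max_quant_error(int bits, int samples)
{
    double worst = 0.0;
    for (int i = 0; i < samples; i++) {
        double x = -1.0 + 2.0 * i / samples;
        double err = x - quantize(x, bits);
        if (err < 0) err = -err;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Sweeping `bits` and feeding the quantized weights and feature maps through a network is one way to produce an accuracy-versus-bit-width curve; the numeric error alone does not predict classification accuracy.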
Efficient Implementation of Convolution
It’s just 7 nested loops!
for (layer = 0; …; layer++) {              /* loop over all layers */
  for (d_z = 0; …; …) {                    /* loop over the three dimensions */
    for (d_y = 0; …; …) {                  /*   of the output blob */
      for (d_x = 0; …; …) {
        r = 0;
        for (s_z = 0; …; …) {              /* loop over the Z-dimension of the input */
          for (c_y = 0; …; …) {            /* loop over the X-Y dimension */
            for (c_x = 0; …; …) {          /*   of the convolution stencil */
              r += kernel[layer][d_z][s_z][c_y][c_x]
                 * F[layer][s_z][d_y-c_y][d_x-c_x];
            }
          }
        }
        F[layer+1][d_z][d_y][d_x] = ReLu( r );
      }
    }
  }
}
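The loop nest on this slide leaves its bounds elided. As a minimal runnable sketch of one layer (the six inner loops; the seventh loop over layers would wrap the call), the version below fills in bounds under assumptions that are not from the slides: row-major flat arrays, stride 1 and no padding. The names `conv_relu` and `conv_relu_demo` are hypothetical.

```c
#include <assert.h>

/* One convolution layer followed by ReLU.
 * Assumed layout: in[in_z][in_h][in_w], kernel[out_z][in_z][k][k],
 * out[out_z][out_h][out_w], all stored row-major in flat arrays.
 * No padding, stride 1: out_h = in_h - k + 1, out_w = in_w - k + 1. */
void conv_relu(const float *in, int in_z, int in_h, int in_w,
               const float *kernel, int k,
               float *out, int out_z)
{
    assert(in_h >= k && in_w >= k);
    int out_h = in_h - k + 1, out_w = in_w - k + 1;

    for (int d_z = 0; d_z < out_z; d_z++)           /* output blob: Z */
      for (int d_y = 0; d_y < out_h; d_y++)         /* output blob: Y */
        for (int d_x = 0; d_x < out_w; d_x++) {     /* output blob: X */
            float r = 0.0f;
            for (int s_z = 0; s_z < in_z; s_z++)    /* input Z-dimension */
              for (int c_y = 0; c_y < k; c_y++)     /* stencil Y */
                for (int c_x = 0; c_x < k; c_x++) { /* stencil X */
                    /* offset by k-1 so the slide's "d - c" index stays >= 0 */
                    int y = d_y + (k - 1) - c_y;
                    int x = d_x + (k - 1) - c_x;
                    r += kernel[((d_z * in_z + s_z) * k + c_y) * k + c_x]
                       * in[(s_z * in_h + y) * in_w + x];
                }
            out[(d_z * out_h + d_y) * out_w + d_x] = r > 0.0f ? r : 0.0f;
        }
}

/* Usage example: 3x3 single-channel input and one 2x2 kernel whose only
 * non-zero tap is kernel[0][0]; returns the output value at (d_y, d_x). */
float conv_relu_demo(float tap, int d_y, int d_x)
{
    const float in[9] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
    const float kernel[4] = {tap, 0, 0, 0};
    float out[4];
    conv_relu(in, 1, 3, 3, kernel, 2, out, 1);
    return out[d_y * 2 + d_x];
}
```

With `tap = 1` the output equals the input shifted by one row and one column, and with `tap = -1` the ReLU clamps every output to zero.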
Efficient Implementation of Convolution
Design Choices
• Over which of these 7 loops do we vectorize?
• Do we split loops into fine-grain and coarse-grain parts?
• How do we nest these loops?
• What intermediate data can we cache?
Efficiency Impact
• Efficiency of vectorization
• Data reuse of register and local memory
• External memory bandwidth
• Local memory size and bandwidth requirements
• Cost of mux logic
• Opportunity to exploit sparseness of kernels
Different Ways of Vectorizing Convolutions
• Vectorizing too much over one dimension is not efficient
• Vectorizing over both input feature maps and convolution stencils increases computation without increasing accumulator memory access
• Challenge is efficient vectorization over the convolution stencil
• Vectorizing over the Z-dimension of the input feature maps increases parallelism without increasing accumulator bandwidth
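The point about accumulator bandwidth can be illustrated with a small sketch. It computes one output value from `in_z` input feature maps (1x1 stencil for brevity) and counts accumulator updates, once with scalar MACs and once vectorized over the input Z-dimension. The vector width `VEC`, the `dot_result` type and the function names are hypothetical and only serve the illustration; real hardware would reduce the lanes in an adder tree rather than a loop.

```c
#include <assert.h>

#define VEC 4   /* hypothetical vector width, chosen only for illustration */

/* Result value plus the number of accumulator updates it took. */
typedef struct { float value; int acc_updates; } dot_result;

/* Scalar version: one accumulator update per MAC. */
dot_result dot_scalar(const float *in, const float *w, int in_z)
{
    dot_result r = {0.0f, 0};
    for (int s_z = 0; s_z < in_z; s_z++) {
        r.value += w[s_z] * in[s_z];
        r.acc_updates++;
    }
    return r;
}

/* Vectorized over s_z: VEC products are summed first, then the
 * accumulator is updated once per vector of MACs. */
dot_result dot_vec_z(const float *in, const float *w, int in_z)
{
    assert(in_z % VEC == 0);
    dot_result r = {0.0f, 0};
    for (int s_z = 0; s_z < in_z; s_z += VEC) {
        float partial = 0.0f;
        for (int lane = 0; lane < VEC; lane++)
            partial += w[s_z + lane] * in[s_z + lane];
        r.value += partial;
        r.acc_updates++;
    }
    return r;
}

/* Usage example: 8 input feature maps, all weights 1; returns the number
 * of accumulator updates for the chosen variant. */
int acc_updates_demo(int vectorized)
{
    const float in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    const float w[8]  = {1, 1, 1, 1, 1, 1, 1, 1};
    dot_result r = vectorized ? dot_vec_z(in, w, 8) : dot_scalar(in, w, 8);
    assert(r.value == 36.0f);
    return r.acc_updates;
}
```

Both variants perform 8 MACs, but the Z-vectorized one touches the accumulator only twice, which is the sense in which parallelism grows without extra accumulator bandwidth.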
Vectorizing versus Loop Nesting
• The vectorization dimension and the loop dimension together determine bandwidth
• Orthogonal loop order → lower bandwidth
[Figure: 3x3 stencil with 12x3 and 8x1 blocks; panels "Iterate Horizontally" and "Iterate Vertically"]
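One possible reading of the figure, offered as an assumption rather than as the author's explanation: a 12x3 input window feeds a 3x3 stencil that produces an 8x1 output vector, and the window then slides either along the vector direction or orthogonally to it. Under that reading, the sketch below counts how many new input elements each step has to fetch. The function names are hypothetical.

```c
#include <assert.h>

/* Step along the vector direction (iterate horizontally): the window moves
 * vec_w columns to the right, so vec_w new columns of win_h elements each
 * must be fetched. */
int new_elems_horizontal(int win_w, int win_h, int vec_w)
{
    assert(vec_w <= win_w && win_h >= 1);
    return vec_w * win_h;
}

/* Step orthogonal to the vector direction (iterate vertically): the window
 * moves down one row, so a single new row of win_w elements is fetched and
 * the other win_h - 1 rows are reused. */
int new_elems_vertical(int win_w, int win_h)
{
    assert(win_w >= 1 && win_h >= 1);
    return win_w;
}
```

For the 12x3 window and 8-wide vector this gives 24 new elements per horizontal step against 12 per vertical step, which is consistent with the slide's claim that an orthogonal loop order lowers bandwidth.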
Cost of Memory Access
• Energy cost of local and external memory access
1. Reduce external memory access by optimizing local memory reuse
2. Reduce local memory access by optimizing register reuse
Different Ways of Nesting Convolution Loops
• Once vectorized, each of the 6 nested loops can be tiled, and the resulting loop levels can be nested in any order
• The choice of loop ordering will impact
  • Data reuse opportunities
  • Bandwidth
  • Local memory requirements
• To optimize for power and performance, different loop ordering is needed for different layers
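As a minimal sketch of what tiling a pair of these loops looks like, the code below splits the d_y and d_x loops into coarse-grain loops (which tile) and fine-grain loops (position inside the tile). The convolution body is replaced by a visit counter so the coverage can be checked. The names `tiled_traverse` and `tiling_demo` and the tile size are hypothetical.

```c
#include <assert.h>

/* Visits every (d_y, d_x) of an out_h x out_w output tile by tile and
 * increments visits[] for each position. Returns the number of tiles.
 * Edge tiles are clipped when the size is not a multiple of `tile`. */
int tiled_traverse(int out_h, int out_w, int tile, int *visits)
{
    assert(tile > 0);
    int tiles = 0;
    for (int t_y = 0; t_y < out_h; t_y += tile)          /* coarse-grain Y */
        for (int t_x = 0; t_x < out_w; t_x += tile) {    /* coarse-grain X */
            int y_end = t_y + tile < out_h ? t_y + tile : out_h;
            int x_end = t_x + tile < out_w ? t_x + tile : out_w;
            for (int d_y = t_y; d_y < y_end; d_y++)      /* fine-grain Y */
                for (int d_x = t_x; d_x < x_end; d_x++)  /* fine-grain X */
                    visits[d_y * out_w + d_x]++;  /* convolution body here */
            tiles++;
        }
    return tiles;
}

/* Usage example: a 5x7 output with 4x4 tiles. Returns the tile count if
 * every position was visited exactly once, otherwise -1. */
int tiling_demo(void)
{
    int visits[5 * 7] = {0};
    int tiles = tiled_traverse(5, 7, 4, visits);
    for (int i = 0; i < 5 * 7; i++)
        if (visits[i] != 1) return -1;
    return tiles;
}
```

Swapping the order of the two coarse-grain loops, or moving a coarse-grain loop outside the loop over feature maps, changes which data stays resident in local memory without changing the result.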
External Memory Bandwidth Reduction: Example
• CNN layers can have 10s of MBs of feature maps and coefficients
• Storing these intermediate feature maps in external memory may not be necessary if the coefficients of the subsequent layers fit in local memory
• Convolutions can be tiled between network layers to keep the intermediate feature maps in local memory
[Figure: two 3x3 convolution stencils]
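A rough sketch of the arithmetic behind tiling across layers: to produce one output tile after several stacked k x k convolutions, each earlier layer needs a slightly larger tile, and only those tiles have to live in local memory. The function names are hypothetical, and stride 1 without padding is an assumption.

```c
#include <assert.h>

/* Side length of the input tile needed to produce a tile x tile output
 * after `layers` stacked k x k convolutions (stride 1, no padding). */
int fused_input_side(int tile, int k, int layers)
{
    assert(tile > 0 && k > 0 && layers >= 0);
    return tile + layers * (k - 1);
}

/* Number of intermediate feature-map values that must be held in local
 * memory when two k x k layers are fused: the intermediate tile has side
 * tile + (k - 1) and mid_z feature maps. */
long fused_intermediate_values(int tile, int k, int mid_z)
{
    long side = fused_input_side(tile, k, 1);
    return side * side * mid_z;
}
```

For a 16x16 output tile behind two 3x3 layers, the input tile is 20x20 and, with 20 intermediate feature maps, 6,480 intermediate values stay local instead of making a round trip through external memory.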
Automotive Example
• Scene segmentation on 5-channel 1920x1080 images
• Segmenting into 11 categories
• Weights: Over 100 K values
Network: 5x5, 20 fmap → Max 2x2 → 5x5, 40 fmap → Max 2x2 → 5x5, 80 fmap → 1x1, 11 fmap
Feature-map sizes shown in the figure: 1920x1080x5 and 473x263x5
Frames per second: 18 FPS
Cycles per frame: 51M cycles
MAX VM (storage of feature maps): 151K bytes
MAX WM (storage of weights): 155K bytes
DMA BW read: 503 MB/s
DMA BW write: 102 MB/s
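The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below assumes unpadded stride-1 convolutions, 2x2 max pooling with stride 2, and no bias terms; none of these details are stated on the slide, and the function names are hypothetical.

```c
#include <assert.h>

/* Output side of a k-wide convolution, stride 1, no padding (assumed). */
int conv_out(int n, int k) { assert(n >= k); return n - k + 1; }

/* Output side of 2x2 max pooling with stride 2 (assumed, rounds down). */
int pool_out(int n) { return n / 2; }

/* Runs one spatial dimension through the layer sequence on the slide:
 * 5x5 conv, 2x2 max, 5x5 conv, 2x2 max, 5x5 conv, 1x1 conv. */
int network_out(int n)
{
    n = pool_out(conv_out(n, 5));
    n = pool_out(conv_out(n, 5));
    n = conv_out(n, 5);
    return conv_out(n, 1);
}

/* Weight count, biases not counted: k * k * in_fmaps * out_fmaps per layer. */
long weight_count(void)
{
    return 5L * 5 * 5  * 20    /* 5 input channels -> 20 feature maps */
         + 5L * 5 * 20 * 40    /* 20 -> 40 feature maps */
         + 5L * 5 * 40 * 80    /* 40 -> 80 feature maps */
         + 1L * 1 * 80 * 11;   /* 80 -> 11 categories */
}

/* Cycles per second implied by cycles per frame and frames per second. */
long cycles_per_second(long cycles_per_frame, int fps)
{
    return cycles_per_frame * fps;
}
```

Under these assumptions the spatial size comes out at 473x263, matching the size shown on the slide, and the weight count is 103,380, in line with "over 100 K values". At 12 bits per weight that would be about 155K bytes, which is consistent with the MAX WM row, although the slide does not state the weight width. The table's 51M cycles at 18 FPS corresponds to 918M cycles per second.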
External Memory Bandwidth Reduction: Example
/* The layer loop should not be the outer loop if you want to
   minimize the external bandwidth */
for (layer = 0; …; layer++) {
  for (d_z = 0; …; …) {
    for (d_y = 0; …; …) {
      for (d_x = 0; …; …) {
        r = 0;
        for (s_z = 0; …; …) {
          for (c_y = 0; …; …) {
            for (c_x = 0; …; …) {
              r += kernel[layer][d_z][s_z][c_y][c_x]
                 * F[layer][s_z][d_y-c_y][d_x-c_x];
            }
          }
        }
        F[layer+1][d_z][d_y][d_x] = ReLu( r );
      }
    }
  }
}
Convolutional Networks with LSTM
CNNs are used in conjunction with Recurrent Neural Networks (like LSTM)
[Figure: CNN and LSTM blocks]
• Image Caption Generation
• People Detection in Crowded Scenes
Convolutions vs Fully Connected Layers, LSTM, RNN
• Characteristics of convolutions
  • Data reuse to keep the MACs busy efficiently
  • Complicated loop-order and vector trade-offs
  • Low bandwidth per MAC
• Challenge of fully connected layers, LSTM, RNN
  • High bandwidth per MAC
  • Data management, not raw compute power
  • Exploiting sparsity in computation
Very different characteristics
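The difference in bandwidth per MAC can be approximated by counting how many MACs each fetched weight takes part in. This is a simplified sketch with hypothetical function names; it ignores feature-map traffic and assumes a single input vector (no batching) for the fully connected case.

```c
#include <assert.h>

/* Convolution: every weight is applied at each of the out_h * out_w
 * output positions, so one weight fetch serves that many MACs. */
long conv_macs_per_weight(int out_h, int out_w)
{
    assert(out_h > 0 && out_w > 0);
    return (long)out_h * out_w;
}

/* Fully connected layer on a single input vector: every weight is used
 * for exactly one MAC, so each MAC needs its own weight fetch. */
long fc_macs_per_weight(void)
{
    return 1;
}
```

For a 473x263 output map a convolution weight is reused 124,399 times, while a fully connected weight is used once, which is why the latter is dominated by data management instead of raw compute.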
Conclusions
• Choose bit-widths wisely!
  • Lower bit-width saves power and area, but below 10 bits classification accuracy drops
• Reduce internal and external memory bandwidth
  • Choose loop nesting based on the shapes of the convolution layers
Resources
• Website: Synopsys DesignWare EV6 Embedded Vision Processors
• 2016 Embedded Vision Summit presentations:
  • Programming Embedded Vision Processors Using OpenVX
  • Using the OpenCL C Kernel Language for Embedded Vision Processors
• Embedded Vision Alliance article: Facial Analysis Delivers Diverse Vision Processing Capabilities
• Visit the Synopsys booth for demos on Deep Learning & CNN
Thank You
Tom Michiels, System Architect
May 2017

 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Moving CNNs from Academic Theory to Embedded Reality

  • 1. Copyright © 2017 Synopsys 1 Moving CNNs from Academic Theory to Embedded Reality Tom Michiels, System Architect May 2017
  • 2. Synopsys at a Glance
      • Embedded vision processor leverages many silicon-proven IP products
      • DesignWare®: ARC® HS processor, AXI, DMA, Memory Compiler …
      • HAPS® FPGA-based rapid prototyping system
      • >5,500 Masters/PhD degrees; >2,400 IP designers; >2,100 applications engineers; >$2.2B FY15 revenue; 33% of revenue on R&D; >10,200 employees
  • 3. Requirements for Embedded CNN Implementations
      [Figure labels: car, sky, building]
  • 4. CNN for a Wide Range of Vision Applications
      • Object detection, classification & localization, face recognition
      • Visual attention, facial expression recognition
      • Gesture recognition / hand tracking
      • Scene recognition and labelling, semantic segmentation
          • Sky, mountain, road, tree, building …
      • Resolution upscaling
      [Figure labels: car, sky, building, road]
  • 5. Computation Requirements for CNNs
      [Chart: accuracy versus computational complexity, with axis marks at 1 GOPs/frame and 10 GOPs/frame]
      • LeNet (1994): 4 layers
      • AlexNet (2012): 8 layers, 100 MByte
      • VGG-19 (2014): 19 layers, 270 MByte
      • GoogLeNet (2014): 22 layers, 20 MByte
      • ResNet (2015): 152 layers! 10 MByte
  • 6. Typical Power, Performance and Area
      • Advanced CNN applications
          • Object classification, detection, localization
          • Scene segmentation
          • Super resolution
          • Recursive neural networks
      • Implementation on GPP and GP-GPU
      • Typical customer targets for 1080p @30 fps
      • Figures shown (based on 28 nm process node): <500 mW, 1-2 mm2; 100 – 1000 GMAC/s; 1-10 W, 50-100 mm2
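As a rough sanity check on the throughput range quoted on slide 6, the sketch below converts a MAC/s figure into a per-pixel compute budget. It assumes that the 100 – 1000 GMAC/s range applies to the 1080p @30 fps target, which the flattened slide text does not state explicitly. The function name `macs_per_pixel` is illustrative and not from the presentation.

```c
/* MACs available per pixel per frame for a given sustained throughput,
 * image resolution and frame rate. All inputs are assumptions supplied
 * by the caller; nothing here is specific to any processor. */
double macs_per_pixel(double macs_per_second, int width, int height, int fps)
{
    double pixels_per_second = (double)width * (double)height * (double)fps;
    return macs_per_second / pixels_per_second;
}
```

Under that assumption, 100 GMAC/s at 1920x1080 and 30 fps works out to roughly 1,600 MACs per pixel per frame, and 1000 GMAC/s to roughly 16,000.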
  • 7. CNN Technical Challenges & Solutions
  • 8. Bit Width Impact on Detection Accuracy
      Functional simulation model with varying bit widths (ILSVRC graphs / Caffe-trained models)
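The slide does not say which quantization scheme the functional simulation model uses. As a minimal sketch of how reduced bit widths can be emulated in such a model, the function below rounds a value to signed fixed point with saturation and returns the dequantized result. The name `quantize_fixed` and the split into total and fractional bits are assumptions made for illustration.

```c
/* Quantize x to signed fixed point with `bits` total bits, of which
 * `frac_bits` are fractional, rounding to nearest and saturating at the
 * representable range. Returns the dequantized value so that the
 * quantization error can be inspected. Assumes bits and frac_bits are
 * small enough to fit in a long. */
double quantize_fixed(double x, int bits, int frac_bits)
{
    double scale = (double)(1L << frac_bits);
    long qmax = (1L << (bits - 1)) - 1;   /* e.g. 127 for 8 bits  */
    long qmin = -(1L << (bits - 1));      /* e.g. -128 for 8 bits */
    double scaled = x * scale;
    long q;

    if (scaled >= (double)qmax)
        q = qmax;                         /* saturate high */
    else if (scaled <= (double)qmin)
        q = qmin;                         /* saturate low */
    else
        q = (long)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));  /* round to nearest */

    return (double)q / scale;
}
```

Running weights and activations through a function like this at several bit widths, then comparing classification accuracy against the floating-point model, is one way to produce the kind of curve the slide title describes.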
  • 9. Efficient Implementation of Convolution
      It's just 7 nested loops!
        for (layer = 0; ..; layer++) {                  // loop over all layers
          for (d_z = 0; …; …) {                         // loop over the three dimensions
            for (d_y = 0; …; …) {                       //   of the output blob
              for (d_x = 0; …; …) {
                r = 0;
                for (s_z = 0; …; …) {                   // loop over the Z-dimension of the input
                  for (c_y = 0; …; …) {                 // loop over the X-Y dimension
                    for (c_x = 0; …; …) {               //   of the convolution stencil
                      r += kernel[layer][d_z][s_z][c_y][c_x] * F[layer][s_z][d_y-c_y][d_x-c_x];
                    }
                  }
                }
                F[layer+1][d_z][d_y][d_x] = ReLu(r);
              }
            }
          }
        }
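The slide elides the loop bounds, so the following is only a sketch of one layer of that loop nest with the bounds filled in under stated assumptions: stride 1, no padding, square k x k stencils, and flat row-major arrays. The index offset `(k - 1)` keeps the slide's flipped `d_y - c_y` indexing while avoiding negative indices. The name `conv_layer` and the memory layout are hypothetical, not taken from the presentation.

```c
static float relu(float v) { return v > 0.0f ? v : 0.0f; }

/* One convolution layer followed by ReLU (assumed: stride 1, no padding).
 *   in:     [sz][in_h][in_w]              input feature maps
 *   kernel: [dz][sz][k][k]                coefficients
 *   out:    [dz][in_h-k+1][in_w-k+1]      output feature maps
 */
void conv_layer(const float *in, const float *kernel, float *out,
                int sz, int in_h, int in_w, int dz, int k)
{
    int out_h = in_h - k + 1;
    int out_w = in_w - k + 1;

    for (int d_z = 0; d_z < dz; d_z++) {             /* output blob, Z */
        for (int d_y = 0; d_y < out_h; d_y++) {      /* output blob, Y */
            for (int d_x = 0; d_x < out_w; d_x++) {  /* output blob, X */
                float r = 0.0f;
                for (int s_z = 0; s_z < sz; s_z++) {         /* input Z */
                    for (int c_y = 0; c_y < k; c_y++) {      /* stencil Y */
                        for (int c_x = 0; c_x < k; c_x++) {  /* stencil X */
                            int y = d_y + (k - 1) - c_y;
                            int x = d_x + (k - 1) - c_x;
                            r += kernel[((d_z * sz + s_z) * k + c_y) * k + c_x]
                               * in[(s_z * in_h + y) * in_w + x];
                        }
                    }
                }
                out[(d_z * out_h + d_y) * out_w + d_x] = relu(r);
            }
        }
    }
}
```

Calling this once per layer, with the output of one call as the input of the next, reproduces the outermost `layer` loop of the slide.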
  • 10. Efficient Implementation of Convolution
        for (layer = 0; ..; layer++) {
          for (d_z = 0; …; …) {
            for (d_y = 0; …; …) {
              for (d_x = 0; …; …) {
                r = 0;
                for (s_z = 0; …; …) {
                  for (c_y = 0; …; …) {
                    for (c_x = 0; …; …) {
                      r += kernel[layer][d_z][s_z][c_y][c_x] * F[layer][s_z][d_y-c_y][d_x-c_x];
                    }
                  }
                }
                F[layer+1][d_z][d_y][d_x] = ReLu(r);
              }
            }
          }
        }
      Design Choices
      • Over which of these 7 loops do we vectorize?
      • Do we split up loops into fine-grain and coarse-grain?
      • How do we nest these loops?
      • What intermediate data can we cache?
      Efficiency Impact
      • Efficiency of vectorization
      • Data reuse of register and local memory
      • External memory bandwidth
      • Local memory size and bandwidth requirements
      • Cost of mux logic
      • Opportunity to exploit sparseness of kernels
  • 11. Different Ways of Vectorizing Convolutions
      • Vectorizing too much over one dimension is not efficient
      • Vectorizing over both input feature maps and convolution stencils increases computation without increasing accumulator memory access
      • Challenge is efficient vectorization over the convolution stencil
      • Vectorizing over Z-dimension of the input feature maps increases parallelism without increasing accumulator bandwidth
  • 12. Vectorizing versus Loop Nesting
      • Vectorizing and the loop dimension will determine bandwidth
      • Orthogonal loop order → lower bandwidth
      [Figure: 8x1 vectors, 3x3 stencils and 12x3 input windows, shown for "Iterate Horizontally" and "Iterate Vertically"]
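One possible reading of the figure on slide 12 is that an 8x1 vector of outputs with a 3x3 stencil touches a window of input pixels, and that the direction in which the loop advances decides how much of that window can be reused. The sketch below counts the newly fetched input pixels per step under that reading. It uses the minimal footprint width (`vec_w + k_w - 1`, which is 10 for the slide's numbers) rather than the 12 shown in the figure, and the function name is hypothetical.

```c
/* Input pixels that must be newly fetched per step when a vec_w x 1 vector
 * of outputs with a k_w x k_h stencil advances by one vector position.
 * Assumes stride 1 and perfect reuse of the overlapping part of the window. */
int new_inputs_per_step(int vec_w, int k_w, int k_h, int iterate_vertically)
{
    int footprint_w = vec_w + k_w - 1;  /* input columns touched by one vector */

    if (iterate_vertically)
        return footprint_w;             /* one new row; k_h - 1 rows are reused */

    return vec_w * k_h;                 /* vec_w new columns; k_w - 1 are reused */
}
```

For an 8x1 vector and a 3x3 stencil this gives 10 new pixels per step when iterating vertically (orthogonal to the vector direction) and 24 when iterating horizontally, which is consistent with the slide's claim that an orthogonal loop order lowers bandwidth.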
  • 13. Cost of Memory Access
      • Energy cost of local and external memory access
      • Reduce external memory access by optimizing local memory reuse
      • Reduce local memory access by optimizing register reuse
  • 14. Different Ways of Nesting Convolution Loops
      • Once vectorized, every one of the 6 nested loops can be tiled, and every level of the loop can be nested
      • The choice of loop ordering will impact
          • Data reuse opportunities
          • Bandwidth
          • Local memory requirements
      • To optimize for power and performance, different loop ordering is needed for different layers
  • 15. External Memory Bandwidth Reduction: Example
      • CNN layers can have 10s of MBs of feature maps and coefficients
      • Storing these intermediate feature maps in external memory may not be necessary if the coefficients of the subsequent layers fit in local memory
      • Convolutions can be tiled between network layers to keep the intermediate feature maps in local memory
      [Figure: two consecutive 3x3 convolutions]
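To make the tiling between layers concrete, the sketch below computes how large an input tile must be to produce a given output tile through several stacked convolutions, and how many bytes one intermediate tile occupies. It assumes stride 1, no padding and square tiles; the function names and the bytes-per-value parameter are illustrative and not from the presentation.

```c
/* Edge length of the input tile needed to produce an out_tile-wide output
 * tile through n_layers stacked k x k convolutions (stride 1, no padding).
 * Each layer adds a halo of k - 1 pixels. */
int fused_input_tile(int out_tile, int k, int n_layers)
{
    return out_tile + n_layers * (k - 1);
}

/* Local memory needed to hold one square tile of feature maps. */
long tile_bytes(int tile_edge, int channels, int bytes_per_value)
{
    return (long)tile_edge * tile_edge * channels * bytes_per_value;
}
```

For the two 3x3 layers in the figure, a 16x16 output tile needs an 18x18 intermediate tile and a 20x20 input tile. With a hypothetical 32 intermediate feature maps at 2 bytes per value, the intermediate tile takes 20,736 bytes, which is the quantity that has to fit in local memory for the intermediate data to stay on chip.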
  • 16. Automotive Example
      • Scene segmentation on 5-channel 1920x1080 images
      • Segmenting into 11 categories
      • Weights: over 100 K values
      • Network: 5x5, 20 fmap; Max 2x2; 5x5, 40 fmap; Max 2x2; 5x5, 80 fmap; 1x1, 11 fmap
      • Dimensions shown in the figure: 1920x1080x5 and 473x263x5
      • Results:
          Frames per second: 18 FPS
          Cycles per frame: 51M cycles
          Max VM (storage of feature maps): 151K bytes
          Max WM (storage of weights): 155K bytes
          DMA BW read: 503 MB/s
          DMA BW write: 102 MB/s
      [Figure labels: road, building, car, sky]
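The numbers on slide 16 can be cross-checked with a little arithmetic. The sketch below walks the spatial size through the listed layers, assuming unpadded stride-1 convolutions and non-overlapping 2x2 max pooling (neither is stated on the slide), and derives the clock rate implied by the frame rate and cycle count. All function names are hypothetical.

```c
/* Output edge of a k-wide convolution with stride 1 and no padding (assumed). */
int conv_out(int n, int k) { return n - k + 1; }

/* Output edge of non-overlapping p-wide max pooling, rounding down (assumed). */
int pool_out(int n, int p) { return n / p; }

/* Edge length after the slide's layer sequence:
 * 5x5 conv, max 2x2, 5x5 conv, max 2x2, 5x5 conv, 1x1 conv. */
int network_out(int n)
{
    n = conv_out(n, 5);
    n = pool_out(n, 2);
    n = conv_out(n, 5);
    n = pool_out(n, 2);
    n = conv_out(n, 5);
    return conv_out(n, 1);
}

/* Clock rate implied by a frame rate and a cycle count per frame. */
double implied_clock_hz(double fps, double cycles_per_frame)
{
    return fps * cycles_per_frame;
}
```

Under these assumptions a 1920x1080 input shrinks to 473x263, which matches the dimensions shown in the figure. If the 18 FPS and 51M cycles per frame refer to the same run, they imply a clock of about 918 MHz.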
  • 17. External Memory Bandwidth Reduction: Example
        for (layer = 0; ..; layer++) {
          for (d_z = 0; …; …) {
            for (d_y = 0; …; …) {
              for (d_x = 0; …; …) {
                r = 0;
                for (s_z = 0; …; …) {
                  for (c_y = 0; …; …) {
                    for (c_x = 0; …; …) {
                      r += kernel[layer][d_z][s_z][c_y][c_x] * F[layer][s_z][d_y-c_y][d_x-c_x];
                    }
                  }
                }
                F[layer+1][d_z][d_y][d_x] = ReLu(r);
              }
            }
          }
        }
      Annotation (pointing at one loop in the figure): This loop should not be the outer loop if you want to minimize the external bandwidth
  • 18. Convolutional Networks with LSTM
      • CNNs are used in conjunction with Recurrent Neural Networks (like LSTM)
      • Image caption generation
      • People detection in crowded scenes
      [Figure: CNN feeding an LSTM]
  • 19. Convolutions vs Fully Connected Layers, LSTM, RNN: Very Different Characteristics
      • Characteristics of convolutions
          • Data reuse to keep the MACs busy efficiently
          • Complicated loop-order and vector trade-offs
          • Low bandwidth per MAC
      • Challenge of fully connected layers, LSTM, RNN
          • High bandwidth per MAC
          • Data management, not raw compute power
          • Exploiting sparsity in computation
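The difference in bandwidth per MAC can be illustrated by counting how often each weight is used. In the sketch below, a convolution layer reuses every coefficient once per output position, while a fully connected layer processing a single input vector uses every weight exactly once. The struct and function names are hypothetical, and the counts assume dense (non-sparse) layers with stride 1.

```c
typedef struct {
    long long macs;     /* multiply-accumulates per frame or input vector */
    long long weights;  /* coefficients that must be fetched */
} layer_cost;

/* Convolution: dz output maps of out_h x out_w, sz input maps, k x k stencil. */
layer_cost conv_cost(int dz, int out_h, int out_w, int sz, int k)
{
    layer_cost c;
    c.weights = (long long)dz * sz * k * k;
    c.macs = c.weights * out_h * out_w;   /* each weight reused at every output position */
    return c;
}

/* Fully connected layer applied to a single input vector. */
layer_cost fc_cost(int n_out, int n_in)
{
    layer_cost c;
    c.weights = (long long)n_out * n_in;
    c.macs = c.weights;                   /* each weight used exactly once */
    return c;
}
```

As an example with made-up layer shapes, a 3x3 convolution with 64 input and 64 output maps on a 56x56 output performs 3,136 MACs per weight, whereas a 4096-to-1000 fully connected layer performs one. The fully connected layer therefore needs one weight fetch per MAC, which is why the slide describes it as a data management problem rather than a compute problem.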
  • 20. Conclusions
      • Choose bit-widths wisely!
          • Lower bit-width saves power and area, but below 10 bits classification accuracy drops
      • Reduce internal and external memory bandwidth
      • Choose loop nesting based on the shapes of the convolution layers
  • 21. Resources
      • Website: Synopsys DesignWare EV6 Embedded Vision Processors
      • 2016 Embedded Vision Summit presentations:
          • Programming Embedded Vision Processors Using OpenVX
          • Using the OpenCL C Kernel Language for Embedded Vision Processors
      • Embedded Vision Alliance article: Facial Analysis Delivers Diverse Vision Processing Capabilities
      • Visit the Synopsys booth for demos on Deep Learning & CNN
  • 22. Thank You
      Tom Michiels, System Architect, May 2017