Copyright © 2017 Cadence Design Systems 1
Techniques to Reduce Power Consumption in Embedded DNN Implementations
Samer Hijazi
May 2017
Motivation
Cadence’s mission
• Enable better, faster, cooler silicon systems sooner
Imaging/video recognition
• A strong driver for creating advanced SoCs
Neural networks are a crucial innovation
• But they need a breakthrough in efficiency
Typical Computer Vision Problems
• Classification (e.g., ImageNet and GTSRB)
• Detection
  • Draw a bounding box around each region of interest (ROI)
• Pixel-by-pixel segmentation
  • For a 1080p image, that is ~2M pixels in and ~2M pixel labels out!
Why Haven’t DNNs Gone Embedded Yet?
CNN Evolution
• Today’s deep learning industry motto is “Deeper is Better”
Network (year)             Application                               Depth (layers)
LeNet-5 (1998)             Handwritten digit recognition (MNIST)     7
AlexNet (2012)             ImageNet                                  8
DeepFace (2014)            Face recognition                          7
VGG-19 (2014)              ImageNet                                  19
GoogLeNet (2015)           ImageNet                                  22
ResNet (2015)              ImageNet                                  152
Inception-ResNet (2016)    ImageNet                                  246
The DNN Power Question
• Today’s state-of-the-art hardware consumes ~40 W per TMAC/s (tera multiply-accumulates per second)
• ~4 TMAC/s are needed for many real-time DNN applications
  • e.g., glass surround analysis, gesture HMI
• This means ~160 W!
• Even an O(10^2) improvement in efficiency is not enough
Embedded device power budgets and form factors cannot accommodate the current trend of DNNs!
Courtesy of Dr. Stephen Hicks, Nuffield Department of Clinical Neurosciences, University of Oxford
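The arithmetic behind the 160 W figure, and why a 100x efficiency gain still falls short, can be checked in a few lines of Python. This is a sketch: it assumes the per-TMAC figure means watts per TMAC/s of sustained throughput, and the 1 W embedded budget is a hypothetical value chosen only for illustration.

```python
# Back-of-the-envelope DNN power estimate from the figures on this slide.
# Assumption: "~40 W/TMAC" is read as ~40 W per TMAC/s of sustained throughput.
WATTS_PER_TMACS = 40.0   # slide: today's state-of-the-art hardware
REQUIRED_TMACS = 4.0     # slide: typical real-time DNN workload


def dnn_power_watts(tmacs: float, watts_per_tmacs: float) -> float:
    """Power needed to sustain `tmacs` tera-MACs per second."""
    return tmacs * watts_per_tmacs


def required_gain(power_w: float, budget_w: float) -> float:
    """Factor by which efficiency must improve to fit in `budget_w` watts."""
    return power_w / budget_w


power = dnn_power_watts(REQUIRED_TMACS, WATTS_PER_TMACS)
print(f"Power at today's efficiency: {power:.0f} W")   # 160 W
print(f"After a 100x gain: {power / 100:.1f} W")       # 1.6 W
# Hypothetical 1 W embedded budget (illustrative, not from the talk):
print(f"Gain needed for 1 W: {required_gain(power, 1.0):.0f}x")  # 160x
```

Even under this generous 1 W assumption, a 100x gain leaves the design over budget, which is consistent with the slide's conclusion.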
How to Save Power?
CNNs use an excessive number of multiplies and data moves per pixel!
• There are four ways to attack this problem:
1. Optimize network architecture
2. Optimize the problem definition
3. Minimize the number of bits per computation
4. Optimize CNN hardware (not covered in this talk)
1. Optimize network architecture
Complexity vs. Performance
[Figure: accuracy vs. complexity for a starter network, with the target recognition rate and the embedded-device and cloud complexity budgets marked]
Automatic Optimization of Network Structure
• The ingredients:
  • A superset network architecture with many knobs to dial (CactusNet)
  • A measure of redundancy vs. accuracy
  • Gradually trim CactusNet
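The “gradually trim” step can be pictured as a greedy search over the network’s knobs. The sketch below is hypothetical: the knob names, the step size, and `toy_accuracy` (a made-up stand-in for retraining and validating each candidate) are illustrative assumptions, not CactusNet’s actual procedure.

```python
from typing import Callable, Dict

Knobs = Dict[str, int]  # hypothetical: knob name -> channel count of one branch


def macs(knobs: Knobs) -> int:
    # Toy complexity model (assumption): cost grows linearly with channel count.
    return sum(knobs.values())


def greedy_trim(knobs: Knobs, accuracy: Callable[[Knobs], float],
                target: float, step: int = 8) -> Knobs:
    """Shrink one knob at a time, always picking the reduction that keeps
    accuracy highest, until every further step would miss `target`."""
    current = dict(knobs)
    while True:
        best, best_acc = None, float("-inf")
        for name, value in current.items():
            if value - step <= 0:
                continue  # never remove a branch entirely in this sketch
            trial = {**current, name: value - step}
            acc = accuracy(trial)  # in practice: fine-tune, then validate
            if acc >= target and acc > best_acc:
                best, best_acc = trial, acc
        if best is None:
            return current
        current = best


def toy_accuracy(k: Knobs) -> float:
    # Made-up accuracy curve with diminishing returns (illustration only).
    return 1.0 - 0.5 / k["branch_3x3"] - 2.0 / k["branch_5x5"]


start = {"branch_3x3": 64, "branch_5x5": 64}
trimmed = greedy_trim(start, toy_accuracy, target=0.95)
print(trimmed, macs(start), "->", macs(trimmed))
```

Because every step removes the same cost in this toy model, picking the most accurate candidate is the same as picking the smallest accuracy loss per MAC saved; with unequal per-knob costs, candidates would instead be ranked by that ratio.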
CactusNet
• A general CNN reference architecture with many control knobs
[Figure: CactusNet as a chain of Cactus Modules (M), concatenations (C), and pooling layers. Each Cactus Module contains parallel convolution branches (kernel labels 1, 3, 5/3, and 7/3, plus 1x1 layers), each followed by BN and ReLU; branch outputs are concatenated and passed through 1x1 BN ReLU layers.]
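Each knob on a module (branch width, bottleneck width, kernel size) changes its MAC count, which is the complexity axis the optimization trades against accuracy. A minimal sketch, assuming stride-1 “same” convolutions and ignoring BN, ReLU, and concatenation costs. The branch configuration is hypothetical and only loosely mirrors the figure: the meaning of the “7/3” and “5/3” labels is not stated, so plain k x k kernels are used.

```python
from typing import List, Tuple

# (bottleneck_channels, kernel_size, out_channels); bottleneck 0 = no 1x1 reduction.
Branch = Tuple[int, int, int]


def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs of a stride-1, 'same'-padded k x k convolution on an h x w map."""
    return h * w * c_in * c_out * k * k


def module_macs(h: int, w: int, c_in: int, branches: List[Branch]) -> int:
    """Total MACs of a multi-branch module whose branch outputs are concatenated."""
    total = 0
    for bottleneck, k, c_out in branches:
        if bottleneck:
            total += conv_macs(h, w, c_in, bottleneck, 1)   # 1x1 reduction
            total += conv_macs(h, w, bottleneck, c_out, k)  # k x k convolution
        else:
            total += conv_macs(h, w, c_in, c_out, k)
    return total


# Hypothetical knob settings for one module on a 28x28x192 input.
branches = [(0, 1, 64), (96, 3, 128), (16, 5, 32)]
print(module_macs(28, 28, 192, branches))
```

Dialing a bottleneck width down shrinks both its 1x1 and its k x k term, which is why bottleneck widths tend to be effective knobs.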
Cactus Network Compression Procedure
[Figure: a four-step compression loop connecting convolutional neural network training on a labelled dataset (producing learned weights), sensitivity analysis, optimization against an accuracy vs. complexity model (producing a compressed network architecture), and transfer learning (producing the compressed network’s initialization)]
“Replicants”
• Propagate learned knowledge between networks with different architectures. Why?
  • To accelerate the creation of a family of networks
[Figure: performance vs. complexity, showing a starter network and a family of replicants derived from it]
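One simple way to seed a replicant with knowledge from a trained network is to copy the weights of every channel the two architectures share and initialize the rest randomly before fine-tuning. This is a hypothetical sketch of that idea for a single convolution layer with equal kernel sizes; the talk does not describe how Cadence’s replicants actually transfer knowledge.

```python
import random
from typing import List

Kernel = List[List[List[List[float]]]]  # [c_out][c_in][k][k]


def init_replicant(parent: Kernel, c_out: int, c_in: int,
                   scale: float = 0.01, seed: int = 0) -> Kernel:
    """Initialize a conv kernel of a different width from a trained parent.
    Channels present in both networks copy the parent's weights; channels the
    parent lacks get small Gaussian values (hypothetical scheme)."""
    rng = random.Random(seed)
    old_out, old_in, k = len(parent), len(parent[0]), len(parent[0][0])

    def weight(o: int, i: int, r: int, c: int) -> float:
        if o < old_out and i < old_in:
            return parent[o][i][r][c]
        return rng.gauss(0.0, scale)

    return [[[[weight(o, i, r, c) for c in range(k)] for r in range(k)]
             for i in range(c_in)] for o in range(c_out)]


# Hypothetical trained parent: 4 output x 3 input channels, 3x3 kernels.
parent = [[[[float(o + i + r + c) for c in range(3)] for r in range(3)]
           for i in range(3)] for o in range(4)]
narrow = init_replicant(parent, c_out=2, c_in=2)  # smaller replicant
wide = init_replicant(parent, c_out=6, c_in=5)    # larger replicant
```

Truncation is crude (it ignores which channels matter most), so in practice the replicant would still be fine-tuned on the labelled dataset.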
German Traffic Sign Recognition Benchmark (GTSRB)
• 51,840 images of German road signs in 43 classes
• CactusNet outperforms every other known network on GTSRB
[Figure: Performance vs. complexity: correct classification rate (%, axis 98.5 to 100.1) vs. million MACs (log scale, 1 to 10,000) for the Baseline (Multi-Scale replica), Committee of CNNs, Hinge Loss CNN, and CactusNet (CNN optimization); the chart is annotated with factors of 37x and 30x]
Multi-Scale: Pierre Sermanet and Yann LeCun, “Traffic Sign Recognition with Multi-Scale Convolutional Networks”, IEEE IJCNN, 2011
Committee of CNNs: Ciresan, D.; Meier, U.; Schmidhuber, J., "Multi-column deep neural networks for image classification," IEEE CVPR, 2012
Hinge Loss CNN: Jin, Junqi, Kun Fu, and Changshui Zhang. "Traffic sign recognition with hinge loss trained convolutional neural networks." IEEE Transactions on Intelligent Transportation Systems 15.5 (2014): 1991-2000
ImageNet (ILSVRC 2012) Results
• ResNet-50 has the best accuracy/complexity ratio on ImageNet
• CactusNet outperforms ResNet-50 on accuracy and complexity
Set          Number of images   Max size    Min size
train        1,281,167          3456x2304   60x60
validation   50,000             3657x2357   80x60
test         100,000            3464x2880   63x84
[Figure: Cactus optimization results compared with pre-trained reference networks*]
*CNN accuracies for pre-trained networks from http://www.vlfeat.org/matconvnet/pretrained/
2. Optimize the problem definition
How to Reduce the Problem Size for Pixel Segmentation?
KITTI Road Segmentation Dataset
• 289 training and 290 test images of size 375x1242 pixels
• Segmenting one image means solving ~466K (375 x 1242) per-pixel classification problems
• By exploiting correlations, we made the problem 22x smaller
Network     Precision   Recall   MaxF    Road accuracy   Overall accuracy   GMACs(1)
Cadence     95.3%       94.4%    94.8%   96.3%           98.9%              10.6
FCN(2)      94.0%       93.7%    93.8%   x               x                  105.3
SegNet(3)   x           x        x       97.4%           89.7%              112.5
RPP(4)      95.9%       96.9%    96.4%   x               x                  404.5
1. GMACs to process an input of size 256x1280
2. http://lmb.informatik.uni-freiburg.de/Publications/2016/OB16b/oliveira16iros.pdf
3. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labeling; Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla; Machine Intelligence Lab, Department of Engineering, University of Cambridge, UK.
4. Anonymous submission on the KITTI server, currently at number 3: http://www.cvlibs.net/datasets/kitti/eval_road_detail.php?result=581d48d10c399e61fe19182cd3483e628c46e893
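The ~466K figure and the relative costs in the table can be reproduced with simple arithmetic. This sketch only restates numbers from the slide; it says nothing about how the 22x reduction was achieved.

```python
H, W = 375, 1242               # KITTI road image size (slide)
pixels = H * W                 # one classification problem per pixel
print(pixels)                  # 465750, i.e. ~466K
print(round(pixels / 22))      # 21170 problems after the 22x reduction

# GMACs for a 256x1280 input (from the table); ratio relative to the Cadence network.
gmacs = {"Cadence": 10.6, "FCN": 105.3, "SegNet": 112.5, "RPP": 404.5}
for name, g in gmacs.items():
    print(f"{name}: {g / gmacs['Cadence']:.1f}x")
```

By this measure the Cadence network needs roughly 10x fewer MACs than FCN or SegNet and roughly 38x fewer than RPP.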
3. Minimize the number of bits per computation
Minimizing Number Formats in DNN
• Two quantization approaches:
1. Post-training quantization
2. During-training quantization
  • This requires changing the training infrastructure and process
System               Methodology                                        AlexNet Top-1 Error [%]
                                                                        32b Float   8b Fixed
UC Davis Ristretto   Dynamic FXP, minifloat (16b floating-point),       43.1        43.8
                     multiplier-free (shifts only); fine tuning
                     (during-training quantization)
Google TensorFlow    Static FXP                                         42.2        49.4
Cadence              Post-training quantization based on dynamic        42.2        42.5
                     FXP; fully fixed-point C modeling, 32b
                     accumulators
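Post-training quantization with a dynamic fixed-point format can be sketched as: inspect the trained values of one tensor, pick the integer/fractional split whose range covers them, then round. This is an illustrative sketch, not Cadence’s or Ristretto’s implementation; the helper names and the round-half-to-even rounding (Python’s built-in `round`) are assumptions.

```python
from typing import List, Tuple


def choose_int_bits(values: List[float], total_bits: int = 8) -> int:
    """Fewest integer bits (besides the sign bit) such that 2**int_bits > max |v|.
    Values within one LSB of 2**int_bits will saturate slightly."""
    max_abs = max(abs(v) for v in values)
    int_bits = 0
    while int_bits < total_bits - 1 and 2.0 ** int_bits <= max_abs:
        int_bits += 1
    return int_bits


def quantize(values: List[float], int_bits: int,
             total_bits: int = 8) -> Tuple[List[int], int]:
    """Round to signed fixed-point codes with saturation; returns (codes, frac_bits)."""
    frac_bits = total_bits - 1 - int_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    codes = [min(qmax, max(qmin, round(v * 2 ** frac_bits))) for v in values]
    return codes, frac_bits


def dequantize(codes: List[int], frac_bits: int) -> List[float]:
    return [c / 2 ** frac_bits for c in codes]


# Post-training: the format is derived from already-trained weights; no retraining.
weights = [0.5, -1.25, 3.2, 0.01]
int_bits = choose_int_bits(weights)      # 2 -> 1 sign, 2 int, 5 frac bits
codes, frac_bits = quantize(weights, int_bits)
print(codes, dequantize(codes, frac_bits))
```

During-training quantization would instead apply this quantize/dequantize round trip inside the forward pass so the weights can adapt, which is why it requires changes to the training infrastructure.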
Fine-Tuning Quantization Benefit
[Figure: accuracy with fine tuning vs. post-training quantization only, for MNIST, CIFAR, and ImageNet networks]
• Average 0.73% accuracy improvement after fine tuning
From: Gysel, Philipp. "Ristretto: Hardware-oriented approximation of convolutional neural networks." arXiv preprint arXiv:1605.06402 (2016).
“Static” Fixed-Point
4D Quantization: a single int/frac split for the entire 4D convolutional kernel
[Figure: input feature maps $x_1, \dots, x_j, \dots, x_M$ are mapped to output feature maps $y_1, \dots, y_i, \dots, y_N$ by 2D filters $F_{ij}$, i.e., $y_i = \sum_{j=1}^{M} F_{ij} * x_j$]
8b split: 1 sign bit, 3 int bits, 4 frac bits (s b2 b1 b0 . b-1 b-2 b-3 b-4)
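With a static format, one split serves the whole kernel, so large coefficients saturate and very small ones round to zero. A minimal sketch of the 1 sign / 3 int / 4 frac format above, assuming two’s-complement codes; the coefficient values are hypothetical.

```python
def to_fixed(v: float, int_bits: int = 3, frac_bits: int = 4) -> float:
    """Quantize v to signed 8b fixed point (1 sign + int_bits + frac_bits)
    and return the represented value. Saturates at the format's limits."""
    total = 1 + int_bits + frac_bits
    qmin, qmax = -(2 ** (total - 1)), 2 ** (total - 1) - 1
    code = max(qmin, min(qmax, round(v * 2 ** frac_bits)))
    return code / 2 ** frac_bits


# "Static": every coefficient of the 4D kernel shares the same 3/4 split,
# so the representable range is [-8, 7.9375] with a step of 1/16.
kernel_values = [0.03, -0.4, 2.7, 9.1]  # hypothetical coefficients from different filters
print([to_fixed(v) for v in kernel_values])  # [0.0, -0.375, 2.6875, 7.9375]
```

The 0.03 coefficient vanishes and 9.1 is clipped to 7.9375: no single split suits both, which motivates the finer-grained “dynamic” formats.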
“Dynamic” Fixed-Point
Optimize the int/frac split over finer subsets of the processing chain
3D Quantization: a separate int/frac split for each 3D convolutional kernel #i (the filters $F_{i1}, \dots, F_{iM}$ that produce output map $y_i$)
• Example 8b splits for two different kernels:
  • 1 sign bit, 2 int bits, 5 frac bits (s b1 b0 . b-1 b-2 b-3 b-4 b-5)
  • 1 sign bit, 4 int bits, 3 frac bits (s b3 b2 b1 b0 . b-1 b-2 b-3)
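Moving from 4D to 3D granularity means choosing the split separately for each output kernel. A sketch using a simple range-based rule (the fewest integer bits that cover each kernel’s largest magnitude); the talk does not specify how the split is actually optimized, and the example coefficients are hypothetical.

```python
from typing import List, Tuple


def choose_int_bits(values: List[float], total_bits: int = 8) -> int:
    """Fewest integer bits (besides the sign bit) such that 2**int_bits > max |v|."""
    max_abs = max(abs(v) for v in values)
    int_bits = 0
    while int_bits < total_bits - 1 and 2.0 ** int_bits <= max_abs:
        int_bits += 1
    return int_bits


def per_kernel_formats(kernels: List[List[float]],
                       total_bits: int = 8) -> List[Tuple[int, int]]:
    """kernels[i] holds the flattened coefficients of 3D kernel #i
    (F_i1 ... F_iM). Returns one (int_bits, frac_bits) split per kernel."""
    formats = []
    for coeffs in kernels:
        int_bits = choose_int_bits(coeffs, total_bits)
        formats.append((int_bits, total_bits - 1 - int_bits))
    return formats


# Hypothetical coefficients: one kernel is small-valued, the other has a large peak.
kernels = [[0.2, -3.0, 1.1], [12.5, -0.7]]
print(per_kernel_formats(kernels))  # [(2, 5), (4, 3)]
```

The two resulting splits happen to match the example formats on the slide: the small-valued kernel keeps more fractional precision.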
“Dynamic” Fixed-Point
There is a trade-off between performance and overhead.
2D Quantization: a separate int/frac split for each 2D filter $F_{ij}$
• Possible 8b int/frac splits (plus 1 sign bit): 4/3, 3/4, 2/5, 1/6
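Finer granularity tracks each filter’s range more closely but costs metadata. In the 2D case, products from filters with different splits also have different scales, so they must be shifted to a common format before they are accumulated. The sketch below counts only the metadata part of that overhead for a hypothetical layer. It illustrates the trade-off and is not a model of Cadence’s hardware.

```python
import math
from typing import Tuple


def format_overhead(n_out: int, m_in: int, granularity: str,
                    n_splits: int = 4) -> Tuple[int, int]:
    """Return (number of int/frac descriptors, bits to store them) for one layer.
    '4D': one split per layer; '3D': one per output kernel; '2D': one per
    2D filter F_ij. n_splits is how many splits are allowed (e.g. 4/3, 3/4, 2/5, 1/6)."""
    count = {"4D": 1, "3D": n_out, "2D": n_out * m_in}[granularity]
    bits_each = math.ceil(math.log2(n_splits)) if n_splits > 1 else 0
    return count, count * bits_each


# Hypothetical layer: 128 input maps, 256 output maps, 3x3 filters, 8b coefficients.
coeff_bits = 256 * 128 * 3 * 3 * 8
for g in ("4D", "3D", "2D"):
    count, bits = format_overhead(256, 128, g)
    print(f"{g}: {count} descriptors, {100 * bits / coeff_bits:.3f}% of coefficient storage")
```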
Cadence Quantization of Network Zoo
Network            Top-5 Error [%]     Top-1 Error [%]
                   FLP    8b FXP*      FLP    8b FXP*
GoogLeNet          13.5   13.5         34.9   35.0
VGG-VeryDeep-19    10.7   10.7         29.5   29.7
Cactus-it6         8.2    8.5          25.3   26.1
ResNet-50          8.3    8.6          25.1   25.8
ResNet-101         7.6    8.2          24.0   25.3
ResNet-152         7.2    7.5          23.5   24.1
Cactus-it6 uses ~2x fewer MACs than ResNet-50 and ~10x fewer MACs than VGG-19.
*Estimates based on 8b fixed-point coefficients & data, 32b floating-point accumulators
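The footnote above assumes 32b floating-point accumulators, whereas the earlier Cadence AlexNet result used fully fixed-point modeling with 32b accumulators. A minimal sketch of the fixed-point variant, assuming two’s-complement 8b codes; the function names and the example formats are illustrative.

```python
INT32_MIN, INT32_MAX = -2 ** 31, 2 ** 31 - 1


def fxp_dot(x_codes, w_codes, x_frac: int, w_frac: int) -> float:
    """Dot product of 8b fixed-point data and coefficients using a 32b
    integer accumulator. Products carry x_frac + w_frac fractional bits."""
    acc = 0
    for x, w in zip(x_codes, w_codes):
        if not (-128 <= x <= 127 and -128 <= w <= 127):
            raise ValueError("inputs must be 8b signed codes")
        acc += x * w  # each product fits in 16 bits
        if not INT32_MIN <= acc <= INT32_MAX:
            raise OverflowError("32b accumulator overflow")
    return acc / 2 ** (x_frac + w_frac)


def max_safe_terms() -> int:
    """Worst-case number of products that cannot overflow: (-128) * (-128) = 2**14."""
    return INT32_MAX // (128 * 128)


# Data with 5 frac bits (0.5, -1.25), coefficients with 4 frac bits (2.0, 0.5).
print(fxp_dot([16, -40], [32, 8], x_frac=5, w_frac=4))  # 0.375
print(max_safe_terms())                                  # 131071
```

For a 3x3 convolution over 512 input maps (4,608 products per output), even the worst case stays far inside the 32b range.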
CNN 4b Quantization
Method              Data   Coefficients       Computational complexity   AlexNet Top-1
                                              savings vs. 8b x 8b        error
Conventional        8b     8b                 -                          42.5%
Hybrid              8b     8b: 54%, 4b: 46%   33%                        43.1%
Theoretical limit   8b     8b: 0%, 4b: 100%   50%                        -
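A simple cost model reproduces the theoretical-limit row. This sketch assumes an 8b x 4b MAC costs half of an 8b x 8b MAC and that the 4b share is measured in MACs; both are assumptions, not statements from the talk.

```python
def complexity_savings(frac_4b: float, cost_4b: float = 0.5) -> float:
    """Compute savings vs. an all-8b x 8b baseline.
    frac_4b: fraction of MACs that use 4b coefficients (assumption: MAC-weighted).
    cost_4b: cost of one 8b x 4b MAC relative to one 8b x 8b MAC (assumed 0.5)."""
    return frac_4b * (1.0 - cost_4b)


print(complexity_savings(1.0))   # 0.5 -> the 50% theoretical limit
print(complexity_savings(0.46))
```

Under this model, the hybrid row’s 33% savings would require about 66% of MACs on 4b coefficients. The 54%/46% split is therefore presumably measured some other way (for example, by coefficient count), or the real cost model differs; the talk does not say which.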
In Summary
Take Away Points
• CNNs will achieve their full potential with an O(10^3) to O(10^4) improvement in power efficiency.
• DSP techniques can decrease CNN compute and data movement by O(10^1) to O(10^2).
Selected References & Resources
Network Compression
[1] Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv preprint arXiv:1510.00149 (2015).
[2] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size." arXiv preprint arXiv:1602.07360 (2016).
Network Quantization
[3] Gysel, Philipp. "Ristretto: Hardware-oriented approximation of convolutional neural networks." arXiv preprint arXiv:1605.06402 (2016).
[4] Warden, P. "How to quantize neural networks with TensorFlow." https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
Segmentation
[5] Badrinarayanan, Vijay, Ankur Handa, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling." arXiv preprint arXiv:1505.07293 (2015).
[6] Oliveira, Gabriel L., Wolfram Burgard, and Thomas Brox. "Efficient deep models for monocular road segmentation." 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016.
MatConvNet
[7] Vedaldi, Andrea, and Karel Lenc. "MatConvNet: Convolutional neural networks for MATLAB." Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015.
Benchmarks
[8] Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision 115.3 (2015): 211-252.
[9] Stallkamp, Johannes, et al. "Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition." Neural Networks 32 (2012): 323-332.
© 2017 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo and the other Cadence marks found at www.cadence.com/go/trademarks are trademarks or registered trademarks of Cadence Design Systems, Inc. All other trademarks are the property of their respective holders.

More Related Content

More from Edge AI and Vision Alliance

“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
Edge AI and Vision Alliance
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
Edge AI and Vision Alliance
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
Edge AI and Vision Alliance
 
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
 
“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental
 
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

"Techniques to Reduce Power Consumption in Embedded DNN Implementations," a Presentation from Cadence

  • 1. Copyright © 2017 Cadence Design Systems 1 Samer Hijazi May 2017 Techniques to Reduce Power Consumption in Embedded DNN Implementations
  • 2. Copyright © 2017 Cadence Design Systems 2 Cadence’s mission • Enable better, faster, cooler silicon systems sooner Imaging/video recognition • Strong driver for creating advanced SoCs Neural networks are a crucial innovation • But, need a breakthrough in efficiency Motivation
  • 3. Copyright © 2017 Cadence Design Systems 3 Typical Computer Vision Problems • Classifications (e.g., ImageNet and GTSRB) • Detections • Draw a bounding box around each ROI • Pixel by Pixel segmentation • For a 1080p image, we have 2M pixels in and out!
  • 4. Copyright © 2017 Cadence Design Systems 4 Why Haven’t DNNs Gone Embedded Yet?
  • 5. Copyright © 2017 Cadence Design Systems 5 CNN Evolution • Today’s deep learning industry motto is “Deeper is Better” Network Application Conv Layers LeNet-5 for MNIST (1998) Handwritten Digit Recognition 7 AlexNet (2012) ImageNet 8 Deepface (2014) Face recognition 7 VGG-19 (2014) ImageNet 19 GoogLeNet (2015) ImageNet 22 ResNet (2015) ImageNet 152 Inception-ResNet (2016) ImageNet 246
  • 6. Copyright © 2017 Cadence Design Systems 6 • Today’s state-of-the-art hardware consumes ~40 W/TMAC • ~4 TMACs are needed for many DNN real-time applications • e.g., glass surround analysis, gesture HMI • This means ~160 W! • Even O(102) improvement in efficiency is not enough The DNN Power Question Courtesy of Dr. Stephen Hicks, Nuffield Department of Clinical Neurosciences, University of Oxford Embedded device power budgets and form-factors cannot accommodate the current trend of DNNs!
  • 7. Copyright © 2017 Cadence Design Systems 7 How to Save Power? CNNs use an excessive number of multiplies and data moves per pixel! • To solve this problem we can do four things 1. Optimize network architecture 2. Optimize the problem definition 3. Minimize the number of bits per computation 4. Optimize CNN hardware (not covered in this talk)
  • 8. Copyright © 2017 Cadence Design Systems 8 1. Optimize network architecture 2. Optimize the problem definition 3. Minimize the number of bits per computation
  • 9. Copyright © 2017 Cadence Design Systems 9 Complexity vs. Performance Accuracy Target recognition rate Complexity Embedded Device Budget CloudBudget Starter Network
  • 10. Copyright © 2017 Cadence Design Systems 10 Automatic Optimizations of Network Structure • The ingredients • A superset network architecture with many knobs to dial • CactusNet • Measure redundancy vs. accuracy • Gradually trim CactusNet
  • 11. Copyright © 2017 Cadence Design Systems 11 CactusNet • A general CNN reference architecture with lots of control knobs CM M M M M C MM M M C …… 1x1 BN ReLU 1x1 BN ReLU 7/3 BN ReLU 5/3 BN ReLU 3 BN ReLU 1 BN ReLU Concat 1x1BNReLU Concat 1x1BNReLU 1x1 BN ReLU Pooling Layer C Concatenate M Cactus Module Cactus Module
  • 12. Copyright © 2017 Cadence Design Systems 12 2 3 Convolutional Neural Network Training Labelled Dataset Compressed Network Architecture Learned Weights Sensitivity Analysis Optimization Accuracy vs. Complexity Model 4 1 Transfer Learning Compressed Network Initialization Cactus Network Compression Procedure
  • 13. Copyright © 2017 Cadence Design Systems 13 “Replicants” • Propagate learned knowledge between networks with different architectures. Why? • Accelerate the creation of a family of networks Performance Starter Network Replicants Complexity …
  • 14. Copyright © 2017 Cadence Design Systems 14 • 51,840 images of German road signs in 43 classes • CactusNet Outperforms every other known network on GTSRB German Traffic Sign Recognition Benchmark (GTSRB) 98.5 98.7 98.9 99.1 99.3 99.5 99.7 99.9 100.1 1 10 100 1000 10000 %CORRECTCLASSIFICATIONRATE MILLION MACS (LOG SCALE) Performance vs. Complexity CNN Optimization Committee of CNN Hinge Loss CNN Committee Of CNNs Baseline (Multi-Scale Replica) CactusNet 37x 30x Multi-Scale: Pierre Sermanet and Yann LeCun, “Traffic Sign Recognition with Multi-Scale Convolutional Netorks”, IEEE IJCNN, 2011 Committee of CNNs: Ciresan, D.; Meier, U.; Schmidhuber, J., "Multi-column deep neural networks for image classification," IEEE CVPR, 2012 Hinge Loss CNN: Jin, Junqi, Kun Fu, and Changshui Zhang. "Traffic sign recognition with hinge loss trained convolutional neural networks." IEEE Transactions on Intelligent Transportation Systems 15.5 (2014): 1991-2000
  • 15. Copyright © 2017 Cadence Design Systems 15 Results ImageNet (2012) • ResNet-50 has the best accuracy/complexity ratio on ImageNet • CactusNet outperforms ResNet-50 on accuracy and complexity Set Num of images Max size minsize train 1281167 3456x2304 60x60 validation 50000 3657x2357 80x60 test 100000 3464x2880 63x84 Cactus Optimization *CNN accuracies for pre-trained networks from http://www.vlfeat.org/matconvnet/pretrained/ *
  • 16. 1. Optimize network architecture 2. Optimize the problem definition 3. Minimize the number of bits per computation
  • 17. How to Reduce the Problem Size for Pixel Segmentation?
  • 18. KITTI Road Segmentation Dataset • 289 training and 290 test images of size 375x1242 pixels • Segmenting one image means solving 466K (375 x 1242) classification problems, one per pixel • By exploiting correlations, we made the problem 22x smaller
    Network   Precision  Recall  MaxF   Road Accuracy  Overall Accuracy  GMACs¹
    Cadence   95.3%      94.4%   94.8%  96.3%          98.9%             10.6
    FCN²      94.0%      93.7%   93.8%  x              x                 105.3
    SegNet³   x          x       x      97.4%          89.7%             112.5
    RPP⁴      95.9%      96.9%   96.4%                                   404.5
    1. GMACs to process an input of size 256x1280
    2. http://lmb.informatik.uni-freiburg.de/Publications/2016/OB16b/oliveira16iros.pdf
    3. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labeling; Vijay Badrinarayanan, Ankur Handa, Roberto Cipolla; Machine Intelligence Lab, Department of Engineering, University of Cambridge, UK.
    4. Anonymous submission on KITTI server, currently at number 3: http://www.cvlibs.net/datasets/kitti/eval_road_detail.php?result=581d48d10c399e61fe19182cd3483e628c46e893
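Several numbers on this slide can be checked directly. First, 375 x 1242 = 465,750 pixels, hence the 466K per-pixel classification problems. Second, MaxF in the KITTI road benchmark is the F-measure (harmonic mean of precision and recall) at the best threshold, so each row's MaxF should equal 2PR/(P + R) of its own precision and recall. The sketch recomputes these and the GMAC ratios relative to the Cadence network, using only numbers from the table. The constant names are ours.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported MaxF %) from the KITTI table
ROWS = {
    "Cadence": (95.3, 94.4, 94.8),
    "FCN": (94.0, 93.7, 93.8),
    "RPP": (95.9, 96.9, 96.4),
}
# GMACs per 256x1280 input, from the same table
GMACS = {"Cadence": 10.6, "FCN": 105.3, "SegNet": 112.5, "RPP": 404.5}

PIXELS_PER_IMAGE = 375 * 1242  # one classification problem per pixel

if __name__ == "__main__":
    print(f"pixels per image: {PIXELS_PER_IMAGE:,}")  # 465,750
    for name, (p, r, reported) in ROWS.items():
        print(f"{name}: computed F = {f_measure(p, r):.1f}, reported MaxF = {reported}")
    for name, g in GMACS.items():
        print(f"{name}: {g / GMACS['Cadence']:.1f}x the Cadence GMACs")
```

All three MaxF values match their precision/recall pairs to one decimal. FCN and SegNet need roughly 10x, and RPP about 38x, the GMACs of the Cadence network.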
  • 19. 1. Optimize network architecture 2. Optimize the problem definition 3. Minimize the number of bits per computation
  • 20. Minimizing Number Formats in DNN • Two quantization approaches: 1. Post-training quantization 2. During-training quantization, which requires changing the training infrastructure and process
                                                                                   AlexNet Top-1 Error [%]
    System              Methodology                                                32b Float  8b Fixed
    UC Davis Ristretto  Dynamic FXP, Minifloat (16b floating-point),               43.1       43.8
                        Multiplier-free (shifts only); fine tuning
                        (during-training quantization)
    Google TensorFlow   Static FXP                                                 42.2       49.4
    Cadence             Post-training quantization based on Dynamic FXP;           42.2       42.5
                        fully fixed-point C-modeling, 32b accumulators
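The Cadence row describes post-training quantization with fully fixed-point arithmetic and 32b accumulators. Here is a minimal sketch of that arithmetic. It assumes round-to-nearest with saturation and power-of-two scales, and the helper names are ours. 8b data and 8b coefficients are multiplied as integers and accumulated within a 32b range. The binary point of the result sits at the sum of the operands' fractional bits.

```python
import numpy as np

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1


def to_fixed(x, frac_bits, bits=8):
    """Quantize floats to signed `bits`-bit integers with `frac_bits`
    fractional bits (round to nearest, ties to even as np.round does;
    saturate at the format's range)."""
    q = np.round(np.asarray(x, dtype=float) * 2.0 ** frac_bits)
    return np.clip(q, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1).astype(np.int64)


def fixed_dot(x_q, w_q):
    """Integer dot product; raise if the final sum does not fit a 32b accumulator."""
    acc = int(np.dot(x_q, w_q))  # computed in int64, then checked against int32 limits
    if not INT32_MIN <= acc <= INT32_MAX:
        raise OverflowError("32b accumulator overflow")
    return acc


if __name__ == "__main__":
    x = [0.5, -0.25, 0.75]  # activations, 4 fractional bits
    w = [1.5, 0.5, -0.125]  # coefficients, 4 fractional bits
    acc = fixed_dot(to_fixed(x, 4), to_fixed(w, 4))
    print(acc, acc / 2.0 ** (4 + 4))  # 136 0.53125 (float result is also 0.53125)
```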
  • 21. Fine-Tuning Quantization Benefit [Figure: accuracy with fine tuning vs. post-training quantization only, on MNIST, CIFAR, and ImageNet. Average accuracy improvement after fine tuning: 0.73%.] From: Gysel, Philipp. "Ristretto: Hardware-oriented approximation of convolutional neural networks." arXiv preprint arXiv:1605.06402 (2016).
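Fine tuning continues training after quantization so the network can adapt to the reduced precision. A common way to do this, sketched here for a toy linear model, uses a straight-through estimator. The forward pass uses quantized weights, while the gradient updates full-precision shadow weights as if rounding were the identity. The toy model, learning rate, and Q-format are assumptions. The sketch shows the mechanism only and makes no claim about the size of the gain.

```python
import numpy as np


def quantize(w, frac_bits=3, bits=8):
    """Round to a signed fixed-point grid with step 2**-frac_bits, saturating."""
    step = 2.0 ** -frac_bits
    lo, hi = -(2 ** (bits - 1)) * step, (2 ** (bits - 1) - 1) * step
    return np.clip(np.round(w / step) * step, lo, hi)


def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))


def fine_tune(X, y, w_float, frac_bits=3, lr=0.05, steps=200):
    """Quantization-aware fine tuning of a linear model y ~ X @ w.

    Forward pass: quantized weights. Backward pass (straight-through
    estimator): the gradient w.r.t. the quantized weights is applied to the
    full-precision shadow weights. Returns the final quantized weights.
    """
    w = np.array(w_float, dtype=float)  # shadow weights (a copy of the input)
    for _ in range(steps):
        wq = quantize(w, frac_bits)
        grad = 2.0 * X.T @ (X @ wq - y) / len(y)
        w -= lr * grad
    return quantize(w, frac_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 4))
    w_trained = np.array([0.31, -0.69, 1.07, 0.2])  # stand-in for trained weights
    y = X @ w_trained
    print("post-training only:", mse(X, y, quantize(w_trained)))
    print("after fine tuning: ", mse(X, y, fine_tune(X, y, w_trained)))
```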
  • 22. “Static” Fixed-Point: 4D Quantization [Figure: input feature maps x_1 … x_M and output feature maps y_1 … y_N, connected by the 4D convolutional kernel whose 2D filter F_ij links input x_j to output y_i. All filters share one 8b split: 1 sign bit, 3 int bits, 4 frac bits.]
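With "static" quantization, a single int/frac split serves the whole 4D kernel. This is a minimal sketch that assumes the split is chosen so the largest coefficient magnitude fits, with round-to-nearest and saturation at the edges. frac_bits_for and static_quantize are hypothetical helper names. Under this rule, a maximum magnitude in [4, 8) yields the slide's split of 1 sign, 3 int, and 4 frac bits.

```python
import math

import numpy as np


def frac_bits_for(max_abs, bits=8):
    """Fractional bits for a signed `bits`-bit word whose integer part just
    covers max_abs (int bits = floor(log2(max_abs)) + 1; may go negative)."""
    if max_abs == 0:
        return bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1
    return bits - 1 - int_bits


def quantize_fixed(x, frac_bits, bits=8):
    """Round to the fixed-point grid 2**-frac_bits, saturating at the range."""
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q / scale


def static_quantize(kernel, bits=8):
    """One int/frac split for the entire 4D kernel of shape (N, M, k, k)."""
    f = frac_bits_for(float(np.max(np.abs(kernel))), bits)
    return quantize_fixed(kernel, f, bits), f


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernel = rng.normal(scale=0.5, size=(16, 8, 3, 3))  # random stand-in kernel
    q, f = static_quantize(kernel)
    print("frac bits:", f, "max abs error:", float(np.max(np.abs(q - kernel))))
```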
  • 23. “Dynamic” Fixed-Point: 3D Quantization • Optimize the definition of int/frac bits over finer subsets of the processing chain [Figure: the same input/output feature-map layout, but each 3D convolutional kernel (#1: F_11 … F_1M; #i: F_i1 … F_iM) has its own 8b split. The two examples are 1 sign bit, 2 int bits, 5 frac bits (s b_1 b_0 . b_-1 … b_-5) and 1 sign, 4 int, 3 frac bits (s b_3 b_2 b_1 b_0 . b_-1 b_-2 b_-3).]
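Moving from one split per 4D kernel to one split per 3D kernel (one per output feature map y_i) lets small-magnitude kernels keep more fractional bits. The sketch below quantizes a kernel both ways and compares the mean squared error. It assumes each split is chosen from the largest magnitude it must cover, with round-to-nearest and saturation, and all helper names are hypothetical.

```python
import math

import numpy as np


def frac_bits_for(max_abs, bits=8):
    """Fractional bits so that max_abs fits a signed `bits`-bit word."""
    if max_abs == 0:
        return bits - 1
    return bits - 1 - (math.floor(math.log2(max_abs)) + 1)


def quantize_fixed(x, frac_bits, bits=8):
    """Round to the grid 2**-frac_bits, saturating at the signed range."""
    scale = 2.0 ** frac_bits
    q = np.clip(np.round(x * scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q / scale


def quantize_whole(kernel, bits=8):
    """Static: one split for the whole (N, M, k, k) kernel."""
    f = frac_bits_for(float(np.max(np.abs(kernel))), bits)
    return quantize_fixed(kernel, f, bits), [f]


def quantize_per_kernel(kernel, bits=8):
    """Dynamic: one split per 3D kernel kernel[i] of shape (M, k, k)."""
    out = np.empty(kernel.shape, dtype=float)
    splits = []
    for i in range(kernel.shape[0]):
        f = frac_bits_for(float(np.max(np.abs(kernel[i]))), bits)
        out[i] = quantize_fixed(kernel[i], f, bits)
        splits.append(f)
    return out, splits


if __name__ == "__main__":
    small = np.linspace(-0.1, 0.1, 27).reshape(3, 3, 3)  # 3D kernel #1
    large = np.linspace(-5.3, 5.3, 27).reshape(3, 3, 3)  # 3D kernel #2
    kernel = np.stack([small, large])                     # shape (2, 3, 3, 3)
    for name, fn in (("static 4D", quantize_whole), ("dynamic 3D", quantize_per_kernel)):
        q, splits = fn(kernel)
        print(name, splits, float(np.mean((q - kernel) ** 2)))
```

Here the small kernel gets 10 fractional bits instead of 4, so its rounding error nearly vanishes, while the large kernel is quantized identically either way.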
  • 24. “Dynamic” Fixed-Point: 2D Quantization • There is a trade-off between performance and overhead. [Figure: each 2D filter F_ij of the kernel gets its own int/frac split, e.g., 4/3, 3/4, 2/5, 1/6.]
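The slide states the trade-off without numbers, so here is a rough, hypothetical cost model to make it concrete. It assumes all input feature maps share one format. Finer granularity means more stored int/frac splits. At 2D granularity, the M partial sums feeding each output value also carry different binary points, so each needs an alignment shift before accumulation, whereas 3D and 4D splits keep the accumulation aligned. format_overhead and this cost model are our assumptions, not figures from the slide.

```python
def format_overhead(n_out, n_in, k, h_out, w_out, granularity):
    """Hypothetical cost model for one conv layer.

    Returns (int/frac formats to store, extra alignment shifts, conv MACs).
    With "2d" granularity each of the n_in partial sums feeding an output
    value needs one alignment shift; "3d" and "4d" need none.
    """
    macs = n_out * n_in * k * k * h_out * w_out
    if granularity == "4d":
        return 1, 0, macs
    if granularity == "3d":
        return n_out, 0, macs
    if granularity == "2d":
        return n_out * n_in, n_out * n_in * h_out * w_out, macs
    raise ValueError(f"unknown granularity: {granularity}")


if __name__ == "__main__":
    # Hypothetical 3x3 layer: 32 -> 64 channels on a 56x56 output.
    for g in ("4d", "3d", "2d"):
        formats, shifts, macs = format_overhead(64, 32, 3, 56, 56, g)
        print(f"{g}: {formats} formats, {shifts} shifts ({shifts / macs:.1%} of MACs)")
```

Under this model, a 3x3 layer at 2D granularity adds one shift per nine MACs (about 11%), which is the kind of overhead that must be weighed against the accuracy gained.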
  • 25. Cadence Quantization of Network Zoo
    Network            Top-5 Error [%]    Top-1 Error [%]
                       FLP    8b FXP*     FLP    8b FXP*
    GoogLeNet          13.5   13.5        34.9   35.0
    VGG-VeryDeep-19    10.7   10.7        29.5   29.7
    Cactus-it6         8.2    8.5         25.3   26.1
    ResNet-50          8.3    8.6         25.1   25.8
    ResNet-101         7.6    8.2         24.0   25.3
    ResNet-152         7.2    7.5         23.5   24.1
    Cactus-it6 needs ~2x fewer MACs than ResNet-50 and ~10x fewer MACs than VGG-19.
    *Estimates based on 8b fixed-point coefficients and data, 32b floating-point accumulators
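The cost of 8b quantization for each network is the difference between its FLP and 8b FXP columns. This short sketch computes it from the table's numbers and uses no other data; ZOO and quantization_loss are our names.

```python
# (top-5 FLP, top-5 8b FXP, top-1 FLP, top-1 8b FXP) in %, from the table
ZOO = {
    "GoogLeNet": (13.5, 13.5, 34.9, 35.0),
    "VGG-VeryDeep-19": (10.7, 10.7, 29.5, 29.7),
    "Cactus-it6": (8.2, 8.5, 25.3, 26.1),
    "ResNet-50": (8.3, 8.6, 25.1, 25.8),
    "ResNet-101": (7.6, 8.2, 24.0, 25.3),
    "ResNet-152": (7.2, 7.5, 23.5, 24.1),
}


def quantization_loss(zoo):
    """Error increase (percentage points) from FLP to 8b FXP: (top-5, top-1)."""
    return {
        name: (round(t5_fxp - t5_flp, 1), round(t1_fxp - t1_flp, 1))
        for name, (t5_flp, t5_fxp, t1_flp, t1_fxp) in zoo.items()
    }


if __name__ == "__main__":
    for name, (d5, d1) in quantization_loss(ZOO).items():
        print(f"{name:16s} top-5 +{d5:.1f}  top-1 +{d1:.1f}")
```

On these numbers, the top-1 increase ranges from 0.1 points (GoogLeNet) to 1.3 points (ResNet-101).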
  • 26. CNN 4b Quantization
    Method             Data  Coefficients       Computational Complexity Savings vs. 8b x 8b  AlexNet Top-1 Error
    Conventional       8b    8b                 -                                              42.5%
    Hybrid             8b    8b: 54%, 4b: 46%   33%                                            43.1%
    Theoretical Limit  8b    8b: 0%, 4b: 100%   50%
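The theoretical-limit row fits a simple cost model in which an 8b x 4b multiply costs half as much as an 8b x 8b one. Under that model, savings equal half the fraction of MACs that use 4b coefficients. The sketch encodes this assumed model; hybrid_savings and macs_fraction_for are hypothetical names. Under this model, the hybrid row's 33% would require about 66% of MACs to use 4b coefficients. So either the 54%/46% split is not a split of MACs, or the savings were computed with a different cost model; the slide does not say which.

```python
def hybrid_savings(frac_4b_macs, cost_ratio=0.5):
    """Fraction of 8b x 8b compute saved when `frac_4b_macs` of the MACs use
    4b coefficients and an 8b x 4b MAC costs `cost_ratio` of an 8b x 8b MAC
    (0.5 assumes cost proportional to the product of operand widths)."""
    return frac_4b_macs * (1.0 - cost_ratio)


def macs_fraction_for(savings, cost_ratio=0.5):
    """Inverse of hybrid_savings: share of 4b MACs needed for `savings`."""
    return savings / (1.0 - cost_ratio)


if __name__ == "__main__":
    print(hybrid_savings(1.0))      # theoretical limit: 0.5
    print(hybrid_savings(0.46))     # if 46% of MACs used 4b: 0.23
    print(macs_fraction_for(0.33))  # share needed for 33% savings: 0.66
```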
  • 27. In Summary
  • 28. Take Away Points • CNNs will achieve their full potential with an O(10^3)~O(10^4) improvement in power efficiency. • DSP techniques can decrease CNN compute and data movement by O(10^1)~O(10^2).
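The power discussion earlier in the talk estimates ~40 W/TMAC on current hardware and ~4 TMACs for many real-time applications, about 160 W. Combining that estimate with the take-away figures shows why O(10^3)~O(10^4) is the target. The sketch simply divides the 160 W estimate by an efficiency improvement factor; the constant and function names are ours.

```python
WATTS_PER_TMAC = 40.0  # ~40 W/TMAC for today's hardware (from the talk)
TMACS_NEEDED = 4.0     # ~4 TMACs for many real-time DNN applications (from the talk)


def power_watts(improvement: float) -> float:
    """Power after improving efficiency by `improvement`x over today's hardware."""
    return WATTS_PER_TMAC * TMACS_NEEDED / improvement


if __name__ == "__main__":
    for factor in (1, 10**2, 10**3, 10**4):
        print(f"{factor:>6}x improvement -> {power_watts(factor):8.3f} W")
```

At 10^2 the estimate is still 1.6 W, which the talk considers not enough. 10^3 to 10^4 brings it down to 160 mW to 16 mW.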
  • 29. Selected References & Resources
    Network Compression
    [1] Han, Song, Huizi Mao, and William J. Dally. "Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding." arXiv preprint arXiv:1510.00149 (2015).
    [2] Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size." arXiv preprint arXiv:1602.07360 (2016).
    Network Quantization
    [3] Gysel, Philipp. "Ristretto: Hardware-oriented approximation of convolutional neural networks." arXiv preprint arXiv:1605.06402 (2016).
    [4] Warden, P. "How to quantize neural networks with TensorFlow." https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/
    Segmentation
    [5] Badrinarayanan, Vijay, Ankur Handa, and Roberto Cipolla. "SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling." arXiv preprint arXiv:1505.07293 (2015).
    [6] Oliveira, Gabriel L., Wolfram Burgard, and Thomas Brox. "Efficient deep models for monocular road segmentation." 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2016.
    MatConvNet
    [7] Vedaldi, Andrea, and Karel Lenc. "MatConvNet: Convolutional neural networks for MATLAB." Proceedings of the 23rd ACM International Conference on Multimedia. ACM, 2015.
    Benchmarks
    [8] Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International Journal of Computer Vision 115.3 (2015): 211-252.
    [9] Stallkamp, Johannes, et al. "Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition." Neural Networks 32 (2012): 323-332.
  • 30. © 2017 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo and the other Cadence marks found at www.cadence.com/go/trademarks are trademarks or registered trademarks of Cadence Design Systems, Inc. All other trademarks are the property of their respective holders.