SlideShare a Scribd company logo
1 of 26
Download to read offline
1
DNN Model and
Hardware Co-Design
ISCA Tutorial (2019)
Website: http://eyeriss.mit.edu/tutorial.html
Joel Emer, Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang
2
Approaches
• Reduce size of operands for storage/compute
– Floating point à Fixed point
– Bit-width reduction
– Non-linear quantization
• Reduce number of operations for storage/compute
– Exploit Activation Statistics (Compression)
– Network Pruning
– Compact Network Architectures
3
Taxonomy
• Precision refers to the number of levels
– Number of bits = log2 (number of levels)
• Quantization: mapping data to a smaller set of levels
– Linear, e.g., fixed-point
– Non-linear
• Computed (e.g., floating point, log-domain)
• Table lookup (e.g., learned)
Objective: Reduce size to improve speed and/or reduce energy
while preserving accuracy
4
Cost of Operations
Operation: Energy
(pJ)
8b Add 0.03
16b Add 0.05
32b Add 0.1
16b FP Add 0.4
32b FP Add 0.9
8b Mult 0.2
32b Mult 3.1
16b FP Mult 1.1
32b FP Mult 3.7
32b SRAM Read (8KB) 5
32b DRAM Read 640
Area
(µm2)
36
67
137
1360
4184
282
3495
1640
7700
N/A
N/A
[Horowitz, “Computing’s Energy Problem (and what we can do about it)”, ISSCC 2014]
Relative Energy Cost
1 10 102 103 104
Relative Area Cost
1 10 102 103
5
Number Representation
FP32
FP16
Int32
Int16
Int8
S E M
1 8 23
S E M
1 5 10
M
31
S
S M
1
1 15
S M
1 7
Range Accuracy
10-38 – 1038 .000006%
6x10-5 - 6x104 .05%
0 – 2x109 ½
0 – 6x104 ½
0 – 127 ½
Image Source: B. Dally
6
Floating Point à Fixed Point
1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0
32-bit float
exponent (8-bits) mantissa (23-bits)
sign
8-bit
fixed
0 1 1 0 0 1 1 0
sign
integer
(4-bits)
mantissa (7-bits)
fractional
(3-bits)
s = 0
12.75 m=102
Floating Point
Fixed Point
e = 74
s = 1 m = 20484
-1.112934 x 10-16
7
N-bit Precision
Accumulate
+
Weight
(N-bits)
Activation
(N-bits)
N x N
multiply
2N-bits
2N+M-bits
Output
(N-bits)
Quantize
to N-bits
For no loss in precision, M is determined based on largest
filter size (in the range of 10 to 16 bits for popular DNNs)
8
Dynamic Fixed Point
1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0
32-bit float
exponent (8-bits) mantissa (23-bits)
sign
8-bit
dynamic
fixed
0 1 1 0 0 1 1 0
sign
integer
([7-f ]-bits)
mantissa (7-bits)
fractional
(f-bits)
f = 3
s = 0
12.75 m=102
8-bit
dynamic
fixed
0 1 1 0 0 1 1 0
sign mantissa (7-bits)
fractional
(f-bits)
f = 9
s = 0
0.19921875 m=102
Allow f to vary based on data type and layer
Floating Point
Fixed Point
e = 74
s = 1 m = 20484
-1.112934 x 10-16
9
Impact on Accuracy
[Gysel et al., Ristretto, ICLR 2016]
w/o fine tuning
Top-1 accuracy
on of CaffeNet
on ImageNet
10
Avoiding Dynamic Fixed Point
AlexNet
(Layer 6)
Image Source: Moons
et al, WACV 2016
Batch normalization ‘centers’ dynamic range
‘Centered’ dynamic ranges might reduce need for
dynamic fixed point
11
Nvidia PASCAL
“New half-precision, 16-bit
floating point instructions
deliver over 21 TeraFLOPS for
unprecedented training
performance. With 47 TOPS
(tera-operations per second)
of performance, new 8-bit
integer instructions in Pascal
allow AI algorithms to deliver
real-time responsiveness for
deep learning inference.”
– Nvidia.com (April 2016)
12
Google’s Tensor Processing Unit (TPU)
“ With its TPU Google has
seemingly focused on delivering
the data really quickly by cutting
down on precision. Specifically,
it doesn’t rely on floating point
precision like a GPU
….
Instead the chip uses integer
math…TPU used 8-bit integer.”
- Next Platform (May 19, 2016)
[Jouppi et al., ISCA 2017]
13
Microsoft BrainWave
[Chung et al., Hot Chips 2017]
Narrow Precision for Inference
Custom 8-bit floating point format (“ms-fp8”)
14
Precision Varies from Layer to Layer
[Moons et al., WACV 2016]
[Judd et al., ArXiv 2016]
15
Bitwidth Scaling (Speed)
Bit-Serial Processing: Reduce Bit-width à Skip Cycles
Speed up of 2.24x vs. 16-bit fixed
[Judd et al., Stripes, CAL 2016]
16
Bitwidth Scaling (Power)
[Moons et al., VLSI 2016]
Reduce Bit-width à
Shorter Critical Path
à Reduce Voltage
Power reduction of
2.56x vs. 16-bit fixed
On AlexNet Layer 2
17
Reconfigure Spatial Multiply
[Envision, ISSCC 2017]
Configure 16bx16b multiplication into two 8x8b or
four 4x4b (up to 256-64=192 adders are idle).
Body bias to reduce leakage at low precision since
more adders are idle (1.2x reduction)
18
Reconfigure Spatial Multiply
[Bit Fusion, ISCA 2018]
Build larger multipliers (Fused Unit) from small 2x2 multipliers with
programmable shifters (BitBrick)
One 8bx8b, four 2bx8b, sixteen 2bx2b
19
Binary Nets
• Binary Connect (BC)
– Weights {-1,1}, Activations 32-bit float
– MAC à addition/subtraction
– Accuracy loss: 19% on AlexNet
• Binarized Neural Networks (BNN)
– Weights {-1,1}, Activations {-1,1}
– MAC à XNOR
– Accuracy loss: 29.8% on AlexNet
Binary Filters
[Courbariaux, arXiv 2016]
[Courbariaux, NeurIPS 2015]
20
Scale the Weights and Activations
[Rastegari et al., BWN & XNOR-Net, ECCV 2016]
• Binary Weight Nets (BWN)
– Weights {-α, α} à except first and last layers are 32-bit float
– Activations: 32-bit float
– α determined by the l1-norm of all weights in a filter
– Accuracy loss: 0.8% on AlexNet
• XNOR-Net
– Weights {-α, α}
– Activations {-βi, βi} à except first and last layers are 32-bit float
– βi determined by the l1-norm of all activations across channels
for given position i of the input feature map
– Accuracy loss: 11% on AlexNet
Hardware needs to support
both activation precisions
Scale factors (α, βi) can change per filter or position in filter
21
Ternary Nets
• Allow for weights to be zero
– Increase sparsity, but also increase number of bits (2-bits)
• Ternary Weight Nets (TWN)
– Weights {-w, 0, w} à except first and last layers are 32-bit float
– Activations: 32-bit float
– Accuracy loss: 3.7% on AlexNet
• Trained Ternary Quantization (TTQ)
– Weights {-w1, 0, w2} à except first and last layers are 32-bit float
– Activations: 32-bit float
– Accuracy loss: 0.6% on AlexNet
[Li et al., arXiv 2016]
[Zhu et al., ICLR 2017]
22
Computed Non-linear Quantization
Log Domain Quantization
Product = X << W
Product = X * W
[Lee et al., LogNet, ICASSP 2017]
23
Log Domain Quantization
• Weights: 5-bits for CONV, 4-bit for FC; Activations: 4-bits
• Accuracy loss: 3.2% on AlexNet
[Miyashita et al., arXiv 2016],
[Lee et al., LogNet, ICASSP 2017]
Shift and Add
WS
24
Reduce Precision Overview
• Learned mapping of data to quantization levels
(e.g., k-means)
• Additional Properties
– Fixed or Variable (across data types, layers, channels, etc.)
[Han et al., ICLR 2016]
Implement with
look up table
25
Non-Linear Quantization Table Lookup
Trained Quantization: Find K weights via K-means clustering
to reduce number of unique weights per layer (weight sharing)
[Han et al., Deep Compression, ICLR 2016]
Weight
Decoder/
Dequant
U x 16b
Weight
index
(log2U-bits)
Weight
(16-bits)
Weight
Memory
CRSM x
log2U-bits
Output
Activation
(16-bits)
MAC
Input
Activation
(16-bits)
Example: AlexNet (no accuracy loss)
256 unique weights for CONV layer
16 unique weights for FC layer
Does not reduce
precision of MAC
Overhead
Smaller Weight
Memory
Consequences: Narrow weight memory and second access from (small) table
26
Summary of Reduce Precision
Category Method Weights
(# of bits)
Activations
(# of bits)
Accuracy Loss vs.
32-bit float (%)
Dynamic Fixed
Point
w/o fine-tuning 8 10 0.4
w/ fine-tuning 8 8 0.6
Reduce weight Ternary weights
Networks (TWN)
2* 32 3.7
Trained Ternary
Quantization (TTQ)
2* 32 0.6
Binary Connect (BC) 1 32 19.2
Binary Weight Net
(BWN)
1* 32 0.8
Reduce weight
and activation
Binarized Neural Net
(BNN)
1 1 29.8
XNOR-Net 1* 1 11
Non-Linear LogNet 5(conv), 4(fc) 4 3.2
Weight Sharing 8(conv), 4(fc) 16 0
* first and last layers are 32-bit float

More Related Content

Similar to Tutorial-on-DNN-07-Co-design-Precision.pdf

Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...T. E. BOGALE
 
Efficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT ArchitectureEfficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT ArchitectureIJMER
 
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Carlos Reaño González
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET Journal
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioOPNFV
 
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Ealwan Lee
 
IRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL AcceleratorIRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL AcceleratorIRJET Journal
 
ASIC Implementation for SOBEL Accelerator
ASIC Implementation for SOBEL AcceleratorASIC Implementation for SOBEL Accelerator
ASIC Implementation for SOBEL AcceleratorIRJET Journal
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
feedback_optimizations_v2
feedback_optimizations_v2feedback_optimizations_v2
feedback_optimizations_v2Ani Sridhar
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Multimedia Security - JPEG Artifact details
Multimedia Security - JPEG Artifact detailsMultimedia Security - JPEG Artifact details
Multimedia Security - JPEG Artifact detailsSebastiano Battiato
 
Performance Enhancement for Quality Inter-Layer Scalable Video Coding
Performance Enhancement for Quality Inter-Layer Scalable Video CodingPerformance Enhancement for Quality Inter-Layer Scalable Video Coding
Performance Enhancement for Quality Inter-Layer Scalable Video CodingIJCSIS Research Publications
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520IJRAT
 
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...pgdayrussia
 
Analysis of low pdp using SPST in bilateral filter
Analysis of low pdp using SPST in bilateral filterAnalysis of low pdp using SPST in bilateral filter
Analysis of low pdp using SPST in bilateral filterIJTET Journal
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)byteLAKE
 

Similar to Tutorial-on-DNN-07-Co-design-Precision.pdf (20)

Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
Joint CSI Estimation, Beamforming and Scheduling Design for Wideband Massive ...
 
Efficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT ArchitectureEfficient Implementation of Low Power 2-D DCT Architecture
Efficient Implementation of Low Power 2-D DCT Architecture
 
Gt3612201224
Gt3612201224Gt3612201224
Gt3612201224
 
L14.pdf
L14.pdfL14.pdf
L14.pdf
 
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early...
 
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...IRJET-  	  Implementation of Radix-16 and Binary 64 Division VLSI Realization...
IRJET- Implementation of Radix-16 and Binary 64 Division VLSI Realization...
 
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
Boosting the Performance of Nested Spatial Mapping with Unequal Modulation in...
 
IRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL AcceleratorIRJET-ASIC Implementation for SOBEL Accelerator
IRJET-ASIC Implementation for SOBEL Accelerator
 
ASIC Implementation for SOBEL Accelerator
ASIC Implementation for SOBEL AcceleratorASIC Implementation for SOBEL Accelerator
ASIC Implementation for SOBEL Accelerator
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
feedback_optimizations_v2
feedback_optimizations_v2feedback_optimizations_v2
feedback_optimizations_v2
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Multimedia Security - JPEG Artifact details
Multimedia Security - JPEG Artifact detailsMultimedia Security - JPEG Artifact details
Multimedia Security - JPEG Artifact details
 
Deep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLabDeep Learning Initiative @ NECSTLab
Deep Learning Initiative @ NECSTLab
 
Performance Enhancement for Quality Inter-Layer Scalable Video Coding
Performance Enhancement for Quality Inter-Layer Scalable Video CodingPerformance Enhancement for Quality Inter-Layer Scalable Video Coding
Performance Enhancement for Quality Inter-Layer Scalable Video Coding
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520
 
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коро...
 
Analysis of low pdp using SPST in bilateral filter
Analysis of low pdp using SPST in bilateral filterAnalysis of low pdp using SPST in bilateral filter
Analysis of low pdp using SPST in bilateral filter
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 

Recently uploaded

Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...Pooja Nehwal
 
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...Call Girls in Nagpur High Profile
 
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai Gaped
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai GapedCall Girls Dubai Slut Wife O525547819 Call Girls Dubai Gaped
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai Gapedkojalkojal131
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service ThanePooja Nehwal
 
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...Suhani Kapoor
 
Dubai Call Girls O528786472 Call Girls In Dubai Wisteria
Dubai Call Girls O528786472 Call Girls In Dubai WisteriaDubai Call Girls O528786472 Call Girls In Dubai Wisteria
Dubai Call Girls O528786472 Call Girls In Dubai WisteriaUnited Arab Emirates
 
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service SaharanpurVIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service SaharanpurSuhani Kapoor
 
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查awo24iot
 
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Call Girls in Nagpur High Profile
 
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...Call Girls in Nagpur High Profile
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...ur8mqw8e
 
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Pooja Nehwal
 
Call Girls Service Kolkata Aishwarya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Call Girls Service Kolkata Aishwarya 🤌  8250192130 🚀 Vip Call Girls KolkataCall Girls Service Kolkata Aishwarya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Call Girls Service Kolkata Aishwarya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...Amil baba
 
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikLow Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...Call Girls in Nagpur High Profile
 
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 

Recently uploaded (20)

Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Sakshi Call 7001035870 Meet With Nagpur Escorts
 
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...
9892124323, Call Girl in Juhu Call Girls Services (Rate ₹8.5K) 24×7 with Hote...
 
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...
VVIP Pune Call Girls Kalyani Nagar (7001035870) Pune Escorts Nearby with Comp...
 
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai Gaped
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai GapedCall Girls Dubai Slut Wife O525547819 Call Girls Dubai Gaped
Call Girls Dubai Slut Wife O525547819 Call Girls Dubai Gaped
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
 
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...
VIP Call Girls Hitech City ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With R...
 
Dubai Call Girls O528786472 Call Girls In Dubai Wisteria
Dubai Call Girls O528786472 Call Girls In Dubai WisteriaDubai Call Girls O528786472 Call Girls In Dubai Wisteria
Dubai Call Girls O528786472 Call Girls In Dubai Wisteria
 
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service SaharanpurVIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
VIP Call Girl Saharanpur Aashi 8250192130 Independent Escort Service Saharanpur
 
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
 
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
 
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...
VVIP Pune Call Girls Balaji Nagar (7001035870) Pune Escorts Nearby with Compl...
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
 
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
《伯明翰城市大学毕业证成绩单购买》学历证书学位证书区别《复刻原版1:1伯明翰城市大学毕业证书|修改BCU成绩单PDF版》Q微信741003700《BCU学...
 
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
 
Call Girls Service Kolkata Aishwarya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Call Girls Service Kolkata Aishwarya 🤌  8250192130 🚀 Vip Call Girls KolkataCall Girls Service Kolkata Aishwarya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Call Girls Service Kolkata Aishwarya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...
NO1 Verified Amil Baba In Karachi Kala Jadu In Karachi Amil baba In Karachi A...
 
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikLow Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...
VVIP Pune Call Girls Warje (7001035870) Pune Escorts Nearby with Complete Sat...
 
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls Kolkata Chhaya 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls Kolkata Chhaya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 

Tutorial-on-DNN-07-Co-design-Precision.pdf

  • 1. 1 DNN Model and Hardware Co-Design ISCA Tutorial (2019) Website: http://eyeriss.mit.edu/tutorial.html Joel Emer, Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang
  • 2. 2 Approaches • Reduce size of operands for storage/compute – Floating point à Fixed point – Bit-width reduction – Non-linear quantization • Reduce number of operations for storage/compute – Exploit Activation Statistics (Compression) – Network Pruning – Compact Network Architectures
  • 3. 3 Taxonomy • Precision refers to the number of levels – Number of bits = log2 (number of levels) • Quantization: mapping data to a smaller set of levels – Linear, e.g., fixed-point – Non-linear • Computed (e.g., floating point, log-domain) • Table lookup (e.g., learned) Objective: Reduce size to improve speed and/or reduce energy while preserving accuracy
  • 4. 4 Cost of Operations Operation: Energy (pJ) 8b Add 0.03 16b Add 0.05 32b Add 0.1 16b FP Add 0.4 32b FP Add 0.9 8b Mult 0.2 32b Mult 3.1 16b FP Mult 1.1 32b FP Mult 3.7 32b SRAM Read (8KB) 5 32b DRAM Read 640 Area (µm2) 36 67 137 1360 4184 282 3495 1640 7700 N/A N/A [Horowitz, “Computing’s Energy Problem (and what we can do about it)”, ISSCC 2014] Relative Energy Cost 1 10 102 103 104 Relative Area Cost 1 10 102 103
  • 5. 5 Number Representation FP32 FP16 Int32 Int16 Int8 S E M 1 8 23 S E M 1 5 10 M 31 S S M 1 1 15 S M 1 7 Range Accuracy 10-38 – 1038 .000006% 6x10-5 - 6x104 .05% 0 – 2x109 ½ 0 – 6x104 ½ 0 – 127 ½ Image Source: B. Dally
  • 6. 6 Floating Point à Fixed Point 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 32-bit float exponent (8-bits) mantissa (23-bits) sign 8-bit fixed 0 1 1 0 0 1 1 0 sign integer (4-bits) mantissa (7-bits) fractional (3-bits) s = 0 12.75 m=102 Floating Point Fixed Point e = 74 s = 1 m = 20484 -1.112934 x 10-16
  • 7. 7 N-bit Precision Accumulate + Weight (N-bits) Activation (N-bits) N x N multiply 2N-bits 2N+M-bits Output (N-bits) Quantize to N-bits For no loss in precision, M is determined based on largest filter size (in the range of 10 to 16 bits for popular DNNs)
  • 8. 8 Dynamic Fixed Point 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 32-bit float exponent (8-bits) mantissa (23-bits) sign 8-bit dynamic fixed 0 1 1 0 0 1 1 0 sign integer ([7-f ]-bits) mantissa (7-bits) fractional (f-bits) f = 3 s = 0 12.75 m=102 8-bit dynamic fixed 0 1 1 0 0 1 1 0 sign mantissa (7-bits) fractional (f-bits) f = 9 s = 0 0.19921875 m=102 Allow f to vary based on data type and layer Floating Point Fixed Point e = 74 s = 1 m = 20484 -1.112934 x 10-16
  • 9. 9 Impact on Accuracy [Gysel et al., Ristretto, ICLR 2016] w/o fine tuning Top-1 accuracy on of CaffeNet on ImageNet
  • 10. 10 Avoiding Dynamic Fixed Point AlexNet (Layer 6) Image Source: Moons et al, WACV 2016 Batch normalization ‘centers’ dynamic range ‘Centered’ dynamic ranges might reduce need for dynamic fixed point
  • 11. 11 Nvidia PASCAL “New half-precision, 16-bit floating point instructions deliver over 21 TeraFLOPS for unprecedented training performance. With 47 TOPS (tera-operations per second) of performance, new 8-bit integer instructions in Pascal allow AI algorithms to deliver real-time responsiveness for deep learning inference.” – Nvidia.com (April 2016)
  • 12. 12 Google’s Tensor Processing Unit (TPU) “ With its TPU Google has seemingly focused on delivering the data really quickly by cutting down on precision. Specifically, it doesn’t rely on floating point precision like a GPU …. Instead the chip uses integer math…TPU used 8-bit integer.” - Next Platform (May 19, 2016) [Jouppi et al., ISCA 2017]
  • 13. 13 Microsoft BrainWave [Chung et al., Hot Chips 2017] Narrow Precision for Inference Custom 8-bit floating point format (“ms-fp8”)
  • 14. 14 Precision Varies from Layer to Layer [Moons et al., WACV 2016] [Judd et al., ArXiv 2016]
  • 15. 15 Bitwidth Scaling (Speed) Bit-Serial Processing: Reduce Bit-width à Skip Cycles Speed up of 2.24x vs. 16-bit fixed [Judd et al., Stripes, CAL 2016]
  • 16. 16 Bitwidth Scaling (Power) [Moons et al., VLSI 2016] Reduce Bit-width à Shorter Critical Path à Reduce Voltage Power reduction of 2.56x vs. 16-bit fixed On AlexNet Layer 2
  • 17. 17 Reconfigure Spatial Multiply [Envision, ISSCC 2017] Configure 16bx16b multiplication into two 8x8b or four 4x4b (up to 256-64=192 adders are idle). Body bias to reduce leakage at low precision since more adders are idle (1.2x reduction)
  • 18. 18 Reconfigure Spatial Multiply [Bit Fusion, ISCA 2018] Build larger multipliers (Fused Unit) from small 2x2 multipliers with programmable shifters (BitBrick) One 8bx8b, four 2bx8b, sixteen 2bx2b
  • 19. 19 Binary Nets • Binary Connect (BC) – Weights {-1,1}, Activations 32-bit float – MAC à addition/subtraction – Accuracy loss: 19% on AlexNet • Binarized Neural Networks (BNN) – Weights {-1,1}, Activations {-1,1} – MAC à XNOR – Accuracy loss: 29.8% on AlexNet Binary Filters [Courbariaux, arXiv 2016] [Courbariaux, NeurIPS 2015]
  • 20. 20 Scale the Weights and Activations [Rastegari et al., BWN & XNOR-Net, ECCV 2016] • Binary Weight Nets (BWN) – Weights {-α, α} à except first and last layers are 32-bit float – Activations: 32-bit float – α determined by the l1-norm of all weights in a filter – Accuracy loss: 0.8% on AlexNet • XNOR-Net – Weights {-α, α} – Activations {-βi, βi} à except first and last layers are 32-bit float – βi determined by the l1-norm of all activations across channels for given position i of the input feature map – Accuracy loss: 11% on AlexNet Hardware needs to support both activation precisions Scale factors (α, βi) can change per filter or position in filter
  • 21. 21 Ternary Nets • Allow for weights to be zero – Increase sparsity, but also increase number of bits (2-bits) • Ternary Weight Nets (TWN) – Weights {-w, 0, w} à except first and last layers are 32-bit float – Activations: 32-bit float – Accuracy loss: 3.7% on AlexNet • Trained Ternary Quantization (TTQ) – Weights {-w1, 0, w2} à except first and last layers are 32-bit float – Activations: 32-bit float – Accuracy loss: 0.6% on AlexNet [Li et al., arXiv 2016] [Zhu et al., ICLR 2017]
  • 22. 22 Computed Non-linear Quantization Log Domain Quantization Product = X << W Product = X * W [Lee et al., LogNet, ICASSP 2017]
  • 23. 23 Log Domain Quantization • Weights: 5-bits for CONV, 4-bit for FC; Activations: 4-bits • Accuracy loss: 3.2% on AlexNet [Miyashita et al., arXiv 2016], [Lee et al., LogNet, ICASSP 2017] Shift and Add WS
  • 24. 24 Reduce Precision Overview • Learned mapping of data to quantization levels (e.g., k-means) • Additional Properties – Fixed or Variable (across data types, layers, channels, etc.) [Han et al., ICLR 2016] Implement with look up table
  • 25. 25 Non-Linear Quantization Table Lookup Trained Quantization: Find K weights via K-means clustering to reduce number of unique weights per layer (weight sharing) [Han et al., Deep Compression, ICLR 2016] Weight Decoder/ Dequant U x 16b Weight index (log2U-bits) Weight (16-bits) Weight Memory CRSM x log2U-bits Output Activation (16-bits) MAC Input Activation (16-bits) Example: AlexNet (no accuracy loss) 256 unique weights for CONV layer 16 unique weights for FC layer Does not reduce precision of MAC Overhead Smaller Weight Memory Consequences: Narrow weight memory and second access from (small) table
  • 26. 26 Summary of Reduce Precision Category Method Weights (# of bits) Activations (# of bits) Accuracy Loss vs. 32-bit float (%) Dynamic Fixed Point w/o fine-tuning 8 10 0.4 w/ fine-tuning 8 8 0.6 Reduce weight Ternary weights Networks (TWN) 2* 32 3.7 Trained Ternary Quantization (TTQ) 2* 32 0.6 Binary Connect (BC) 1 32 19.2 Binary Weight Net (BWN) 1* 32 0.8 Reduce weight and activation Binarized Neural Net (BNN) 1 1 29.8 XNOR-Net 1* 1 11 Non-Linear LogNet 5(conv), 4(fc) 4 3.2 Weight Sharing 8(conv), 4(fc) 16 0 * first and last layers are 32-bit float