L12-1
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Accelerator Architecture
(continued)
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 15, 2023
L12-2
Sze and Emer
Announcements
• Last chance to submit topic selection survey
• Please let us know if you are considering dropping… ☹
• Paper assignment later this week.
March 15, 2023
L12-3
Sze and Emer
Goals of Today’s Lecture
• Taxonomy of dataflows for CNNs
– Output Stationary (last lecture)
– Weight Stationary
– Input Stationary
• Advanced dataflow
– Row Stationary (Eyeriss)
• Data Orchestration
March 15, 2023
L12-4
Sze and Emer
Background Reading
• DNN Accelerators
– Efficient Processing of Deep Neural Networks
• Chapter 5
This book and its online/e-book version are available through
MIT libraries.
March 15, 2023
L12-5
Sze and Emer
Spatial Architecture for DNN
[Figure: DRAM feeding a Global Buffer (100 – 500 kB), which feeds a 2D array of Processing Elements (PEs); each PE contains control, an ALU, and a Reg File (0.5 – 1.0 kB)]
Local Memory Hierarchy
• Global Buffer
• Direct inter-PE network
• PE-local memory (RF)
March 15, 2023
L12-6
Sze and Emer
Energy Cost of Data Movement
Normalized energy cost* of delivering an operand to the ALU:
• DRAM → ALU: 200×
• Global Buffer (100 – 500 kB) → ALU: 6×
• Another PE (NoC: 200 – 1000 PEs) → ALU: 2×
• RF (0.5 – 1.0 kB) → ALU: 1×
• ALU: 1× (Reference)
* measured from a commercial 65nm process
March 15, 2023
L12-7
Sze and Emer
Types of Data Reuse in DNN
• Convolutional Reuse – CONV layers only (sliding window)
– Reuse: activations and filter weights
• Fmap Reuse – CONV and FC layers
– Reuse: activations
• Filter Reuse – CONV and FC layers (batch size > 1)
– Reuse: filter weights
March 15, 2023
L12-8
Sze and Emer
Output Stationary (OS)
• Minimize partial sum R/W energy consumption
− maximize local accumulation
• Broadcast/multicast filter weights and reuse
activations spatially across the PE array
[Figure: Global Buffer streams activations and weights to a row of PEs P0–P7; each PE holds its psum locally]
Output activations are in the outer loop
March 15, 2023
L12-9
Sze and Emer
1-D Convolution – Output Stationary
March 15, 2023
S
Weights
W
Inputs
Q = W-ceil(S/2)†
Outputs
* =
int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for q in [0, Q):
  for s in [0, S):
    o[q] += i[q+s]*f[s]

† Assuming: ‘valid’ style convolution
No constraints on loop permutations!
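The output-stationary loop nest above can be run directly in plain Python (a minimal sketch; the example arrays are illustrative, and it uses Q = W - S + 1 'valid' outputs):

```python
# Output-stationary 1-D 'valid' convolution: q is the outer loop,
# so each output o[q] is fully accumulated before moving on.
def conv1d_os(i, f):
    W, S = len(i), len(f)
    Q = W - S + 1                      # number of 'valid' outputs
    o = [0] * Q
    for q in range(Q):                 # outer loop: outputs (stationary)
        for s in range(S):             # inner loop: filter taps
            o[q] += i[q + s] * f[s]
    return o

print(conv1d_os([1, 2, 3, 4, 5], [1, 0, -1]))  # → [-2, -2, -2]
```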
L12-10
Sze and Emer
1-D Convolution
March 15, 2023
S
Weights
W
Inputs
Q = W-ceil(S/2)†
Outputs
* =
int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for s in [0, S):
  for q in [0, Q):
    o[q] += i[q+s]*f[s]
† Assuming: ‘valid’ style convolution
L12-33
Sze and Emer
1-D Convolution
March 15, 2023
S
Weights
W
Inputs
Q = W-ceil(S/2)†
Outputs
* =
int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for s in [0, S):
  for q in [0, Q):
    o[q] += i[q+s]*f[s]

What dataflow is this? Weight stationary
† Assuming: ‘valid’ style convolution
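Because the loop body has no dependence that constrains loop order, swapping the two loops gives the weight-stationary version of the same computation (a minimal sketch with illustrative arrays):

```python
# Weight-stationary 1-D convolution: s is the outer loop, so each
# filter weight f[s] is held while it is applied to every output.
def conv1d_ws(i, f):
    W, S = len(i), len(f)
    Q = W - S + 1
    o = [0] * Q
    for s in range(S):                 # outer loop: weights (stationary)
        for q in range(Q):             # inner loop: partial-sum updates
            o[q] += i[q + s] * f[s]
    return o

print(conv1d_ws([1, 2, 3, 4, 5], [1, 0, -1]))  # → [-2, -2, -2]
```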
L12-34
Sze and Emer
Weight Stationary - Animation
March 15, 2023
L12-35
Sze and Emer
Weight Stationary - Spacetime
March 15, 2023
L12-36
Sze and Emer
1-D Convolution Einsum + WS
• Serial WS design: traversal order (fastest to slowest): Q, S
• WS design for parallel weights: traversal order (fastest to slowest): Q; parallel ranks: S
Can you write the loop nest? I hope so 😊
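The loop nests asked for above can be sketched as follows; the parallel-weights version unrolls S across PEs, leaving only Q as a serial loop (here the parallel lanes are emulated serially, which is valid because the per-lane products are combined by an associative sum):

```python
# Serial WS:                         Parallel-weights WS:
#   for s in [0, S):                   for q in [0, Q):
#     for q in [0, Q):                   parallel-for s in [0, S):
#       o[q] += i[q+s]*f[s]                o[q] += i[q+s]*f[s]
def conv1d_parallel_ws(i, f):
    W, S = len(i), len(f)
    Q = W - S + 1
    o = [0] * Q
    for q in range(Q):
        # the S lanes compute in parallel; their products are reduced
        # spatially (e.g., an adder tree) into one output
        o[q] = sum(i[q + s] * f[s] for s in range(S))
    return o

print(conv1d_parallel_ws([1, 2, 3, 4, 5], [1, 0, -1]))  # → [-2, -2, -2]
```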
L12-37
Sze and Emer
Parallel Weight Stationary - Animation
March 15, 2023
L12-38
Sze and Emer
Weight Stationary (WS)
• Minimize weight read energy consumption
− maximize convolutional and filter reuse of weights
• Broadcast activations and accumulate psums
spatially across the PE array.
[Figure: Global Buffer streams activations and psums through a row of PEs W0–W7; each PE holds its weight locally]
Weights are in the outer loop
March 15, 2023
L12-39
Sze and Emer
WS Example: nn-X (NeuFlow)
[Farabet et al., ICCV 2009]
[Figure: a 3×3 2D convolution engine; weights held stationary while activations stream in and psums stream out]
March 15, 2023
L12-40
Sze and Emer
WS Example: NVDLA (simplified)
Released Sept 29, 2017
[Figure: Global Buffer feeding the PE array; Image Source: Nvidia]
M x C MACs: the number of input and output channels must match the array, otherwise MACs idle
Range:
• C (16 to 128)
• M (4 to 16)
*Note: In this class we use M for number of kernels (filters), while NVDLA uses K
http://nvdla.org
March 15, 2023
L12-41
Sze and Emer
WS Example: Nvidia DLA
• Convolution Buffer
– Stores both weights and activations
– Ratio of weights and activation data varies across layers
– Four access ports: R+W feature & R+W weights
• Convolution read bandwidth determines size of read port (C*datasize)
– Optional compression support
– Prefer to either store all weights or all activations
March 15, 2023
L12-42
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Figure: M filters of size R×S×C are overlaid on an H×W×C input fmap to produce an M×P×Q output fmap of incomplete partial sums. Example dims: M=8, C=3, R=2, S=2, H=3, W=3, P=2, Q=2]
L12-46
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Figure: PE array snapshot. Across ‘Atomic-M’ (number of kernels, i.e. filters) columns and ‘Atomic-C’ (number of input channels) rows, inputs I000, I100, I200 are broadcast to resident weights F0000…F7200, producing psums O000…O700.
Output feature map: O(m, p, q); Input fmap: I(c, p+r, q+s); Weights: F(m, c, r, s)]
L12-47
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Animation: cycle through the input and output fmaps while holding the weights; same dims as before (M=8, C=3, R=2, S=2, H=3, W=3, P=2, Q=2)]
L12-50
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Figure sequence: the same weights F(m,c,0,0) stay resident while the broadcast inputs step through I001/I101/I201, I010/I110/I210, I011/I111/I211, producing psums O001…O701, O010…O710, O011…O711.
Output feature map: O(m, p, q); Input fmap: I(c, p+r, q+s); Weights: F(m, c, r, s)]
L12-53
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Animation: load new weights into the array; same dims as before]
L12-57
Sze and Emer
WS Example: NVDLA (simplified)
March 15, 2023
[Figure sequence: with the next weight position resident (F0001…F7201), the broadcast inputs step through I001, I002, I011, I012 and corresponding channels, accumulating into the same psums O000…O711.
Output feature map: O(m, p, q); Input fmap: I(c, p+r, q+s); Weights: F(m, c, r, s)]
L12-61
Sze and Emer
Loop Nest: NVDLA (simplified)
March 15, 2023
M = 8; C = 3; R = 2; S = 2; P = 2; Q = 2

int i[C,H,W];   # Input activations
int f[M,C,R,S]; # Filter weights
int o[M,P,Q];   # Output activations

for r in [0,R):
  for s in [0,S):
    for p in [0,P):
      for q in [0,Q):
        parallel-for m in [0, M):
          parallel-for c in [0, C):
            o[m,p,q] += i[c,p+r,q+s] * f[m,c,r,s]

How can we tell this is weight stationary? The top loops are r and s.
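The loop nest above runs as plain Python if the parallel-for loops are emulated serially (a sketch using the slide's example dims; the function name and test input are mine):

```python
# Simplified-NVDLA weight-stationary loop nest: r and s are outermost,
# so each weight position f[:][:][r][s] is reused across all (p, q)
# before the next weights are loaded. parallel-for m/c are plain loops.
M, C, R, S, H, W, P, Q = 8, 3, 2, 2, 3, 3, 2, 2

def conv_nvdla_ws(i, f):
    o = [[[0] * Q for _ in range(P)] for _ in range(M)]
    for r in range(R):            # weight position (stationary)
        for s in range(S):
            for p in range(P):    # cycle through output fmap
                for q in range(Q):
                    for m in range(M):      # 'Atomic-M' lanes
                        for c in range(C):  # 'Atomic-C' lanes
                            o[m][p][q] += i[c][p + r][q + s] * f[m][c][r][s]
    return o

i = [[[c + h + w for w in range(W)] for h in range(H)] for c in range(C)]
f = [[[[1] * S for _ in range(R)] for _ in range(C)] for _ in range(M)]
print(conv_nvdla_ws(i, f)[0][0][0])  # sum of a 2x2x3 input window → 24
```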
L12-62
Sze and Emer
NVDLA (Simplified) - Animation
March 15, 2023
M = 2; C = 2; R = 3; S = 3; H = 4; W = 8; P = 2; Q = 6
L12-63
Sze and Emer
CONV-layer Einsum
March 15, 2023
Traversal order (fastest to slowest): Q, P, S, R
Parallel Ranks: C, M
L12-64
Sze and Emer
WS Example: NVDLA (simplified)
Released Sept 29, 2017
[Figure: Global Buffer and PE array; Image Source: Nvidia]
http://nvdla.org
March 15, 2023
L12-65
Sze and Emer
WS Example: Nvidia DLA
• Single Data Point Operations
– Immediately after CNN
– Linear functions
• e.g. bias addition, precision scaling (before writing to memory), batch normalization, element-wise operation
• Scaling: all values, per channel, per pixel
– Non-linear functions use LUT
• e.g. ReLU, PReLU, sigmoid, tanh
• Planar Data Operations
– For pooling: Max, min, average
• Multi-Plane Operation
– Cross-channel processing for LRN
March 15, 2023
L12-66
Sze and Emer
Taxonomy: More Examples
• Weight Stationary (WS)
[Chakradhar, ISCA 2010] [nn-X (NeuFlow), CVPRW 2014]
[Park, ISSCC 2015] [ISAAC, ISCA 2016] [PRIME, ISCA 2016]
• Output Stationary (OS)
[ShiDianNao, ISCA 2015]
[Peemen, ICCD 2013]
[Gupta, ICML 2015] [Moons, VLSI 2016] [Thinker, VLSI 2017]
[TPU, ISCA 2017]
March 15, 2023
L12-67
Sze and Emer
Input Stationary
• Hold inputs stationary rather than outputs or weights
• Used for sparse CNNs [Parashar et al., SCNN, ISCA 2017]
– Sparse CNN is one where many weights are zero
– Advantage: inputs are larger than weights and thus require a larger memory, so holding them stationary reduces reads to that larger memory
• Not analyzed for dense CNNs
March 15, 2023
L12-68
Sze and Emer
1-D Convolution – Output Stationary
March 15, 2023
S
Weights
W
Inputs
Q = W-ceil(S/2)†
Outputs
* =
int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for q in [0, Q):
  for s in [0, S):
    w = q + s
    o[q] += i[w]*f[s]

† Assuming: ‘valid’ style convolution
How can we implement input stationary with no input index?
L12-69
Sze and Emer
1-D Convolution – Input Stationary
March 15, 2023
S
Weights
W
Inputs
Q = W-ceil(S/2)†
Outputs
* =
int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for w in [0, W):
  for s in [0, S):
    q = w - s
    o[q] += i[w]*f[s]

† Assuming: ‘valid’ style convolution
Beware: q must be >= 0 and < Q
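The input-stationary loop nest, with the bounds check the slide warns about, can be run as a minimal Python sketch (illustrative arrays):

```python
# Input-stationary 1-D convolution: w is the outer loop, so each input
# i[w] is held while it is scattered to every output it contributes to.
def conv1d_is(i, f):
    W, S = len(i), len(f)
    Q = W - S + 1
    o = [0] * Q
    for w in range(W):                 # outer loop: inputs (stationary)
        for s in range(S):
            q = w - s                  # derived output index
            if 0 <= q < Q:             # beware: q must be in [0, Q)
                o[q] += i[w] * f[s]
    return o

print(conv1d_is([1, 2, 3, 4, 5], [1, 0, -1]))  # → [-2, -2, -2]
```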
L12-70
Sze and Emer
1-D Convolution Einsum + IS
Traversal order (fastest to slowest): S, W
Note: w = q+s, so q = w-s, and therefore the output index q can be derived from the stationary input index w
L12-71
Sze and Emer
Variants of Output Stationary
Parallel output region spans # output activations (P, Q) and # output channels (M):
• OSA: single output channel, multiple output activations (targeting CONV layers)
• OSB: multiple output channels, multiple output activations
• OSC: multiple output channels, single output activation (targeting FC layers)
L12-72
Sze and Emer
OS tilings
• OSA: Parallel ranks P0, Q0
• OSB: Parallel ranks M0, P0, Q0
• OSC: Parallel rank M0
L12-73
Sze and Emer
Energy Efficiency Comparison
[Chart: Normalized Energy/MAC (0 to 2) across CNN dataflows: NLR, WS, OSA, OSB, OSC (variants of OS), Row Stationary]
• Same total area • 256 PEs
• AlexNet CONV layers • Batch size = 16
[Chen et al., ISCA 2016]
March 15, 2023
L12-74
Sze and Emer
Energy-Efficient Dataflow:
Row Stationary (RS)
• Maximize reuse and accumulation at RF
• Optimize for overall energy efficiency
instead of for only a certain data type
[Chen et al., ISCA 2016]
March 15, 2023
L12-75
Sze and Emer
1D Row Convolution in PE
[Animation: a filter row (a b c) slides across an input fmap row (a b c d e) inside one PE; the Reg File holds the filter row, a sliding window of the input row, and the partial sums]
• Maximize row convolutional reuse in RF
− Keep a filter row and fmap sliding window in RF
• Maximize row psum accumulation in RF
March 15, 2023
L12-80
Sze and Emer
2D Convolution in PE Array
[Figure: 3×3 PE array for 2D convolution.
PE 1: Filter Row 1 * Fmap Row 1; PE 2: Row 2 * Row 2; PE 3: Row 3 * Row 3 → Output Row 1
PE 4: Row 1 * Row 2; PE 5: Row 2 * Row 3; PE 6: Row 3 * Row 4 → Output Row 2
PE 7: Row 1 * Row 3; PE 8: Row 2 * Row 4; PE 9: Row 3 * Row 5 → Output Row 3]
March 15, 2023
L12-82
Sze and Emer
Convolutional Reuse Maximized
• Filter rows are reused across PEs horizontally
• Fmap rows are reused across PEs diagonally
• Partial sums accumulate across PEs vertically
March 15, 2023
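The row-stationary decomposition above can be sketched by giving each (filter row, fmap row) pair its own 1-D convolution and accumulating psum rows vertically (a minimal sketch; the function names and test data are mine):

```python
# Row-stationary 2D convolution: each "PE" computes a 1-D convolution of
# one filter row with one fmap row; the psum rows from the PEs in one
# column are then accumulated vertically into one output row.
def conv1d(row, frow):
    Q = len(row) - len(frow) + 1
    return [sum(row[q + s] * frow[s] for s in range(len(frow)))
            for q in range(Q)]

def conv2d_rs(fmap, filt):
    R = len(filt)                       # number of filter rows
    P = len(fmap) - R + 1               # number of output rows
    out = []
    for p in range(P):                  # one PE column per output row
        psums = [conv1d(fmap[p + r], filt[r]) for r in range(R)]
        out.append([sum(col) for col in zip(*psums)])  # vertical accumulate
    return out

fmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filt = [[1, 0], [0, 1]]
print(conv2d_rs(fmap, filt))  # → [[6, 8], [12, 14]]
```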
L12-85
Sze and Emer
Dataflow Simulation Results
March 15, 2023
L12-86
Sze and Emer
Evaluate Reuse in Different Dataflows
• Weight Stationary
– Minimize movement of filter weights
• Output Stationary
– Minimize movement of partial sums
• No Local Reuse
– No PE local storage. Maximize global buffer size.
• Row Stationary
Evaluation Setup
• same total area • 256 PEs
• AlexNet • batch size = 16
[Chart: normalized energy cost of data movement, as before: DRAM 200×, Buffer 6×, PE 2×, RF 1×, ALU 1× (reference)]
March 15, 2023
L12-87
Sze and Emer
Dataflow Comparison: CONV Layers
[Chart: Normalized Energy/MAC (0 to 2), broken down into psums, weights, and activations, for CNN dataflows WS, OSA, OSB, OSC, NLR, RS]
RS optimizes for the best overall energy efficiency
[Chen et al., ISCA 2016]
L12-88
Sze and Emer
Dataflow Comparison: CONV Layers
[Chart: Normalized Energy/MAC (0 to 2), broken down into ALU, RF, NoC, buffer, and DRAM, for CNN dataflows WS, OSA, OSB, OSC, NLR, RS]
RS uses 1.4× – 2.5× lower energy than other dataflows
[Chen et al., ISCA 2016]
L12-89
Sze and Emer
Dataflow Comparison: FC Layers
[Chart: Normalized Energy/MAC (0 to 2), broken down into psums, weights, and activations, for CNN dataflows WS, OSA, OSB, OSC, NLR, RS]
RS uses at least 1.3× lower energy than other dataflows
[Chen et al., ISCA 2016]
L12-90
Sze and Emer
Row Stationary: Layer Breakdown
[Chart: Normalized Energy (1 MAC = 1), 0 to 2.0e10, broken down into ALU, RF, NoC, buffer, and DRAM, for AlexNet layers L1–L8; RF dominates in the CONV layers, DRAM dominates in the FC layers]
[Chen et al., ISCA 2016]
L12-91
Sze and Emer
Row Stationary: Layer Breakdown
[Chart: same energy breakdown across AlexNet layers; CONV layers account for 80% of total energy, FC layers for 20%]
CONV layers dominate energy consumption!
L12-92
Sze and Emer
Data Orchestration
March 15, 2023
L12-93
Sze and Emer
What is Data Orchestration?
Feeding data to a computation unit exactly when it wants it:
• When data is moved over a transfer substrate
• Where data is placed in available staging buffers
• Who the “actors” are that touch data and their synchronization with each other
• How data is accessed (strides, patterns, etc.), including when it is no longer needed
ML ASICs use workload knowledge to optimize orchestration at design-time without caches
March 15, 2023
L12-94
Sze and Emer
Buffer Hierarchies in DNN Accelerators
• Percentage of area devoted to on-chip buffers: [chart across accelerators]
There is a desire for a reusable, efficient storage hierarchy that is separate from any given accelerator
March 15, 2023
L12-95
Sze and Emer
Guiding Principles for Data Orchestration
• Efficient reuse: small storage physically close to consuming units for reused data
• Precise synchronization: only wait for exactly the data you need, respond quickly (e.g., no barriers or remote polling)
• Delivery/use overlap: next tile should be available when current is done (e.g., double-buffering)
• Storage usage efficiency: minimize idle storage waiting for long round-trip latency
• Bandwidth efficiency: maximize delivery rate by controlling outstanding requests
• Cross-unit use: amortize data access and communication
March 15, 2023
L12-96
Sze and Emer
Approaches: Implicit versus Explicit
[Figure: implicit vs. explicit data placement]
March 15, 2023
L12-97
Sze and Emer
Approaches: Coupled versus Decoupled
[Figure: implicit + coupled vs. implicit + decoupled]
March 15, 2023
L12-98
Sze and Emer
Explicit Decoupled Data Orchestration
[Figure: implicit + decoupled vs. explicit + decoupled]
March 15, 2023
L12-99
Sze and Emer
Classifying Orchestration Approaches
[Figure: approaches classified along two axes, implicit vs. explicit data placement and coupled vs. decoupled: Load… Math… Store…; cache-level-specific loads; HW prefetching; SW prefetching; SW-controlled replacement policy; Decoupled Access Execute; GPU shared memory; ASIC-style fill and iterate engines]
March 15, 2023
L12-100
Sze and Emer
Buffering Strategies
March 15, 2023
Three strategies for overlapping fills and drains of buffer storage:
• Fill and use: fill the buffer with values, then drain them
• Double buffer: fill one half while draining the other
• Rolling use: fills and drains proceed continuously around the buffer
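The double-buffer strategy can be illustrated with a tiny Python sketch (the function and variable names are mine; a real design fills the next tile concurrently with the drain rather than sequentially as here):

```python
# Double buffering: while the consumer drains buffer `use`, the producer
# fills buffer `fill`; the two swap roles each tile, so delivery of the
# next tile overlaps use of the current one.
def double_buffered(tiles, process):
    bufs = [None, None]
    fill, use = 0, 1
    results = []
    bufs[fill] = tiles[0]                  # prefetch the first tile
    for t in range(len(tiles)):
        fill, use = use, fill              # swap fill/use roles
        if t + 1 < len(tiles):
            bufs[fill] = tiles[t + 1]      # fill next tile ("in parallel")
        results.append(process(bufs[use])) # drain the current tile
    return results

print(double_buffered([[1, 2], [3, 4], [5, 6]], sum))  # → [3, 7, 11]
```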
L12-101
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-102
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-103
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-104
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-105
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-106
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-107
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-108
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-109
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-110
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-111
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-112
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-113
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-114
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-115
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-116
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-117
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-118
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
L12-173
Sze and Emer
Buffets
• Compared to FIFO:
– Allows random access into live window
– Allows updates of values in live window
• Compared to scratchpad:
– Adds scoreboarding for synchronization
– Allows arbitrary degrees of buffering
• Compared to cache:
– Addresses local (therefore fewer bits) and no tag store
– Push model versus pull model for smaller landing zone
– Raises level of abstraction beyond single value transfers
Buffet interface (diagram): Fill() takes data from memory; Read(local_offset) returns data to the datapath; Update(local_offset, data) writes a value back; Shrink(num) frees space.
Can be specialized, for example: “read-only” buffets that have fills but not updates
March 15, 2023
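A minimal behavioral sketch of the four buffet operations, assuming a simple list-backed model (the scoreboarding that makes a Read wait until its value has been filled is omitted here, and the lower-case method names are ours):

```python
class Buffet:
    """Software sketch of a buffet's four operations, after the slide's
    interface. Offsets are local to the live window, so they need fewer
    bits than global addresses and there is no tag store."""

    def __init__(self):
        self.data = []   # live window; index 0 is local offset 0

    def fill(self, value):
        """Append a value pushed in from the outer memory level."""
        self.data.append(value)

    def read(self, local_offset):
        """Random access into the live window (unlike a FIFO)."""
        return self.data[local_offset]

    def update(self, local_offset, value):
        """In-place update of a value in the live window (unlike a
        read-only buffer)."""
        self.data[local_offset] = value

    def shrink(self, num):
        """Free `num` values from the front of the live window."""
        self.data = self.data[num:]
```

A "read-only" specialization would simply omit `update`, as the slide notes.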
L12-174
Sze and Emer
Buffet Usage Model
March 15, 2023
Diagram: a fill sequence and fill values enter the buffet storage; a read sequence drives read values out; an update sequence carries update values back in.
L12-175
Sze and Emer
Buffet Behavioral Attributes
• Based on the ‘fill’ address sequence, will pop values from the ‘fill’ LI channel until the fill buffer is full.
• Based on the ‘read’ address sequence, will try to push values down the ‘read’ LI channel, but only if the value has been ‘filled’.
• The ‘read’ address sequence can also inform the buffer that a value can be dropped, i.e., its space freed.
• Based on the ‘update’ address sequence, will try to pop values from the ‘update’ LI channel.
• An implementation may include multiple logical buffers inside a single physical buffer.
March 15, 2023
L12-176
Sze and Emer
Buffets: Composable Idiom for E.D.D.O. (Explicit Decoupled Data Orchestration)
Transfers between levels only depend on a credit
flow from the adjacent level.
March 15, 2023
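The credit-flow idea can be sketched as follows (our illustration of the principle, not the buffets implementation): the upstream level holds credits equal to the free slots in the adjacent downstream buffer, may transfer only while it holds credits, and gets a credit back whenever the downstream level frees a slot:

```python
class CreditLink:
    """Toy model of credit-based flow control between adjacent buffer
    levels. Because each level only tracks credits from its neighbor,
    levels compose without global coordination."""

    def __init__(self, capacity):
        self.credits = capacity   # free slots in the downstream buffer
        self.downstream = []      # stand-in for the downstream storage

    def try_send(self, value):
        """Transfer one value downstream, but only if a credit is held."""
        if self.credits == 0:
            return False          # sending now would overflow the receiver
        self.credits -= 1
        self.downstream.append(value)
        return True

    def receiver_free_slot(self):
        """Downstream level consumes (shrinks) one value, returning a
        credit upstream."""
        self.downstream.pop(0)
        self.credits += 1
```

Because the sender never transfers without a credit, the downstream buffer can never overflow, regardless of latency on the link.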
L12-177
Sze and Emer
Data Communications
March 13, 2023
L12-178
Sze and Emer
Configurable Systolic Array - WARP
March 13, 2023
Source: WARP Architecture and Implementation, ISCA 1986
L12-179
Sze and Emer
Latency-Insensitive (LI) Channels
March 13, 2023
Producer OUT interface: IsFull(), Push(), PushNB(). Consumer IN interface: IsEmpty(), Peek(), Pop(), PopNB(). Connections link producer to consumer.
Credit: Pellauer
Inter-unit communication can use Latency-Insensitive
(LI) Channels that, for functionality purposes, have no
latency requirements
Semantics: FIFO abstraction with flow control on all interfaces
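As a sketch, an LI channel can be modeled as a bounded FIFO with flow control on both ends. The blocking Push()/Pop() variants would simply wait on IsFull()/IsEmpty(), so only the non-blocking ("NB") forms are shown; the `depth` parameter is our assumption, since functionally the channel tolerates any depth:

```python
from collections import deque

class LIChannel:
    """Sketch of a latency-insensitive channel: a bounded FIFO with
    flow control on the producer (OUT) and consumer (IN) interfaces.
    Method names follow the slide, lower-cased."""

    def __init__(self, depth):
        self.q = deque()
        self.depth = depth

    # --- Producer (OUT) side ---
    def is_full(self):
        return len(self.q) >= self.depth

    def push_nb(self, value):
        """Non-blocking push: fails instead of stalling when full."""
        if self.is_full():
            return False
        self.q.append(value)
        return True

    # --- Consumer (IN) side ---
    def is_empty(self):
        return len(self.q) == 0

    def peek(self):
        """Inspect the head value without removing it."""
        return self.q[0]

    def pop_nb(self):
        """Non-blocking pop: returns None instead of stalling when empty."""
        if self.is_empty():
            return None
        return self.q.popleft()
```

Because correctness depends only on the FIFO ordering and flow control, the physical link between producer and consumer can take any number of cycles.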
L12-180
Sze and Emer
March 13, 2023
Next:
Mapping
  • 1. L12-1 Sze and Emer 6.5930/1 Hardware Architectures for Deep Learning Accelerator Architecture (continued) Joel Emer and Vivienne Sze Massachusetts Institute of Technology Electrical Engineering & Computer Science March 15, 2023
  • 2. L12-2 Sze and Emer Announcements • Last chance to submit topic selection survey • Please let us know if you are considering dropping… ☹ • Paper assignment later this week. March 15, 2023
  • 3. L12-3 Sze and Emer Goals of Today’s Lecture • Taxonomy of dataflows for CNNs – Output Stationary (last lecture) – Weight Stationary – Input Stationary • Advanced dataflow – Row Stationary (Eyeriss) • Data Orchestration March 15, 2023
  • 4. L12-4 Sze and Emer Background Reading • DNN Accelerators – Efficient Processing of Deep Neural Networks • Chapter 5 All these books and their online/e-book versions are available through MIT libraries. March 15, 2023
  • 5. L12-5 Sze and Emer Spatial Architecture for DNN Processing Element (PE) Global Buffer (100 – 500 kB) ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU ALU DRAM Local Memory Hierarchy • Global Buffer • Direct inter-PE network • PE-local memory (RF) Control Reg File 0.5 – 1.0 kB March 15, 2023
  • 6. L12-6 Sze and Emer Energy Cost of Data Movement ALU Buffer ALU RF ALU Normalized Energy Cost* 200× 6× PE ALU 2× 1× 1× (Reference) DRAM ALU 0.5 – 1.0 kB 100 – 500 kB * measured from a commercial 65nm process NoC: 200 – 1000 PEs March 15, 2023
  • 7. L12-7 Sze and Emer Types of Data Reuse in DNN Filter Reuse Convolutional Reuse Fmap Reuse CONV layers only (sliding window) CONV and FC layers CONV and FC layers (batch size > 1) Filter Input Fmap Filters 2 1 Input Fmap Filter 2 1 Input Fmaps Activations Filter weights Reuse: Activations Reuse: Filter weights Reuse: March 15, 2023
  • 8. L12-8 Sze and Emer • Minimize partial sum R/W energy consumption − maximize local accumulation • Broadcast/Multicast filter weights and reuse activations spatially across the PE array Output Stationary (OS) Global Buffer P0 P1 P2 P3 P4 P5 P6 P7 Activation Weight PE Psum March 15, 2023 Output activations are in the outer loop
  • 9. L12-9 Sze and Emer 1-D Convolution – Output Stationary March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for q in [0, Q): for s in [0, S): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution No constraints on loop permutations!
  • 10. L12-10 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 11. L12-11 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 12. L12-12 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 13. L12-13 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 14. L12-14 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 15. L12-15 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 16. L12-16 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 17. L12-17 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 18. L12-18 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 19. L12-19 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 20. L12-20 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 21. L12-21 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 22. L12-22 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 23. L12-23 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 24. L12-24 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 25. L12-25 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 26. L12-26 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 27. L12-27 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 28. L12-28 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 29. L12-29 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 30. L12-30 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 31. L12-31 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] † Assuming: ‘valid’ style convolution
  • 32. L12-32 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] What dataflow is this? † Assuming: ‘valid’ style convolution
  • 33. L12-33 Sze and Emer 1-D Convolution March 15, 2023 S Weights W Inputs Q = W-ceil(S/2)† Outputs * = int i[W]; # Input activations int f[S]; # Filter weights int o[Q]; # Output activations for s in [0, S): for q in [0, Q): o[q] += i[q+s]*f[s] What dataflow is this? Weight stationary † Assuming: ‘valid’ style convolution
  • 34. L12-34 Sze and Emer Weight Stationary - Animation March 15, 2023
  • 35. L12-35 Sze and Emer Weight Stationary - Spacetime March 15, 2023
  • 36. L12-36 Sze and Emer 1-D Convolution Einsum + WS Traversal order (fastest to slowest): Q, S WS design for parallel weights Traversal order (fastest to slowest): Q Parallel Ranks: S Serial WS design Einsum: Einsum: Can you write the loop nest? I hope so 😊
  • 37. L12-37 Sze and Emer Parallel Weight Stationary - Animation March 15, 2023
  • 38. L12-38 Sze and Emer Weight Stationary (WS) • Minimize weight read energy consumption − maximize convolutional and filter reuse of weights • Broadcast activations and accumulate psums spatially across the PE array. Global Buffer W0 W1 W2 W3 W4 W5 W6 W7 Psum Activation PE Weight March 15, 2023 Weights are in the outer loop
  • 39. L12-39 Sze and Emer WS Example: nn-X (NeuFlow) [Farabet et al., ICCV 2009] A 3×3 2D Convolution Engine weights activations psums March 15, 2023
  • 40. L12-40 Sze and Emer WS Example: NVDLA (simplified) Image Source: Nvidia March 15, 2023 Released Sept 29, 2017 Global Buffer PE Array M x C MACs: The number of input and output channels must match array, otherwise MACs idle Range: • C (16 to 128) • M (4 to 16) *Note: In this class we use M for number of kernels (filters), while NVDLA uses K http://nvdla.org
  • 41. L12-41 Sze and Emer WS Example: Nvidia DLA • Convolution Buffer – Stores both weights and activations – Ratio of weights and activation data varies across layers – Four access ports: R+W feature & R+W weights • Convolution read bandwidth determine size of read port (C*datasize) – Optional compression support – Prefer to either store all weights or all activations March 15, 2023
  • 42. L12-42 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 P output fmap filters M … R S 0 R C M-1 H input fmap W Q C C S Filter overlay Incomplete partial sum
  • 43. L12-43 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 P output fmap filters M … R S 0 R C M-1 H input fmap W Q C C S Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 44. L12-44 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 2 output fmap filters 8 … 2 2 0 2 3 7 3 input fmap 3 2 3 3 2 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 45. L12-45 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 2 output fmap … 2 2 1 2 8 3 3 2 input fmap 3 3 3 2 8 filters 0 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 46. L12-46 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I000 O000 F0000 F1000 F2000 F3000 F4000 F5000 F6000 F7000 F0100 F1100 F2100 F3100 F4100 F5100 F6100 F7100 F0200 F1200 F2200 F3200 F4200 F5200 F6200 F7200 O100 O200 O300 O400 O500 O600 Psum O700 I100 I200 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 47. L12-47 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 2 output fmap 2 input fmap Cycle through input and output fmap (hold weight) 3 3 3 8 … 2 2 1 2 8 3 3 2 filters 1 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 48. L12-48 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap … 2 2 1 2 8 input fmap 3 3 2 Cycle through input and output fmap (hold weight) 3 3 3 2 2 8 filters 0 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 49. L12-49 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap input fmap Cycle through input and output fmap (hold weights) 3 3 3 2 2 8 … 2 2 1 2 8 3 3 2 filters 0 7 M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2 Filter overlay Incomplete partial sum
  • 50. L12-50 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I001 O001 F0000 F1000 F2000 F3000 F4000 F5000 F6000 F7000 F0100 F1100 F2100 F3100 F4100 F5100 F6100 F7100 F0200 F1200 F2200 F3200 F4200 F5200 F6200 F7200 O101 O201 O301 O401 O501 O601 O701 I101 I201 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
• 51. L12-51 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I010 O010 F0000 F1000 F2000 F3000 F4000 F5000 F6000 F7000 F0100 F1100 F2100 F3100 F4100 F5100 F6100 F7100 F0200 F1200 F2200 F3200 F4200 F5200 F6200 F7200 O110 O210 O310 O410 O510 O610 O710 I110 I210 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 52. L12-52 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I011 O011 F0000 F1000 F2000 F3000 F4000 F5000 F6000 F7000 F0100 F1100 F2100 F3100 F4100 F5100 F6100 F7100 F0200 F1200 F2200 F3200 F4200 F5200 F6200 F7200 O111 O211 O311 O411 O511 O611 O711 I111 I211 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 53. L12-53 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap input fmap Load new weights 3 3 3 … 2 2 1 2 8 3 3 2 filters 0 7 2 2 8 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 54. L12-54 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap input fmap 3 3 3 Cycle through input and output fmap (hold weights) 2 2 8 … 2 2 1 2 8 3 3 2 filters 0 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 55. L12-55 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap input fmap Cycle through input and output fmap (hold weights) 2 2 8 3 3 3 … 2 2 1 2 8 3 3 2 filters 0 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 56. L12-56 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 output fmap input fmap Cycle through input and output fmap (hold weights) 3 3 3 2 2 2 8 … 2 2 1 2 8 3 3 2 filters 0 7 Filter overlay Incomplete partial sum M=8 C=3 R=2 S=2 H=3 W=3 P=2 Q=2
  • 57. L12-57 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I001 O000 F0001 F1001 F2001 F3001 F4001 F5001 F6001 F7001 F0101 F1101 F2101 F3101 F4101 F5101 F6101 F7101 F0201 F1201 F2201 F3201 F4201 F5201 F6201 F7201 O100 O200 O300 O400 O500 O600 O700 I101 I201 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 58. L12-58 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I002 O001 F0001 F1001 F2001 F3001 F4001 F5001 F6001 F7001 F0101 F1101 F2101 F3101 F4101 F5101 F6101 F7101 F0201 F1201 F2201 F3201 F4201 F5201 F6201 F7201 O101 O201 O301 O401 O501 O601 O701 I102 I202 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 59. L12-59 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I011 O010 F0001 F1001 F2001 F3001 F4001 F5001 F6001 F7001 F0101 F1101 F2101 F3101 F4101 F5101 F6101 F7101 F0201 F1201 F2201 F3201 F4201 F5201 F6201 F7201 O110 O210 O310 O410 O510 O610 O710 I111 I211 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
  • 60. L12-60 Sze and Emer WS Example: NVDLA (simplified) March 15, 2023 Number of Kernels (i.e. Filters) ‘Atomic-M’ Number of Input Channels ‘Atomic-C’ I012 O011 F0001 F1001 F2001 F3001 F4001 F5001 F6001 F7001 F0101 F1101 F2101 F3101 F4101 F5101 F6101 F7101 F0201 F1201 F2201 F3201 F4201 F5201 F6201 F7201 O111 O211 O311 O411 O511 O611 O711 I112 I212 Output feature map: O(m, p, q) Input fmap: I(c, p+r, q+s) Weights: F(m, c, r, s)
• 61. L12-61 Sze and Emer Loop Nest: NVDLA (simplified) March 15, 2023

M = 8; C = 3; R = 2; S = 2; P = 2; Q = 2
int i[C,H,W];    # Input activations
int f[M,C,R,S];  # Filter weights
int o[M,P,Q];    # Output activations

for r in [0,R):
  for s in [0,S):
    for p in [0,P):
      for q in [0,Q):
        parallel-for m in [0,M):
          parallel-for c in [0,C):
            o[m,p,q] += i[c,p+r,q+s] * f[m,c,r,s]

How can we tell this is weight stationary? The top (slowest) loops are r and s.
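The loop nest above can be checked with a short runnable sketch. This is a serialized Python model, not NVDLA code: the parallel-for loops over m and c become ordinary loops, the input data is made up for illustration, and the result is compared against a direct convolution.

```python
# Serialized model of the simplified NVDLA weight-stationary loop nest.
# Dimensions match the slide's example: M=8, C=3, R=S=2, H=W=3, P=Q=2.
M, C, R, S = 8, 3, 2, 2
H, W = 3, 3
P, Q = H - R + 1, W - S + 1  # 'valid' convolution: P = Q = 2

# Arbitrary illustrative data.
i = [[[(c * 7 + h * 3 + w) % 5 for w in range(W)] for h in range(H)] for c in range(C)]
f = [[[[(m + c + r + s) % 3 for s in range(S)] for r in range(R)] for c in range(C)] for m in range(M)]
o = [[[0] * Q for _ in range(P)] for _ in range(M)]

# Weight stationary: r and s are the outermost (slowest) loops, so each
# weight f[m][c][r][s] stays resident while p and q cycle underneath it.
for r in range(R):
    for s in range(S):
        for p in range(P):
            for q in range(Q):
                for m in range(M):      # parallel-for in hardware
                    for c in range(C):  # parallel-for in hardware
                        o[m][p][q] += i[c][p + r][q + s] * f[m][c][r][s]

# Reference: the same convolution in conventional loop order.
ref = [[[sum(i[c][p + r][q + s] * f[m][c][r][s]
             for c in range(C) for r in range(R) for s in range(S))
         for q in range(Q)] for p in range(P)] for m in range(M)]
assert o == ref
```

Reordering the loops changes only which operand stays resident, not the result, which is why the same Einsum admits WS, OS, and IS mappings.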
  • 62. L12-62 Sze and Emer NVDLA (Simplified) - Animation March 15, 2023 M = 2 C = 2 R = 3 S = 3 H = 4 W = 8 P = 2 Q = 6
  • 63. L12-63 Sze and Emer CONV-layer Einsum March 15, 2023 Traversal order (fastest to slowest): Q, P, S, R Parallel Ranks: C, M
  • 64. L12-64 Sze and Emer WS Example: NVDLA (simplified) Image Source: Nvidia March 15, 2023 Released Sept 29, 2017 Global Buffer http://nvdla.org
• 65. L12-65 Sze and Emer WS Example: Nvidia DLA • Single Data Point Operations – Immediately after CNN – Linear functions • e.g. bias addition, precision scaling (before writing to memory), batch normalization, element-wise operations • Scaling: all values, per channel, per pixel – Non-linear functions use LUT • e.g. ReLU, PReLU, sigmoid, tanh • Planar Data Operations – For pooling: Max, min, average • Multi-Plane Operation – Cross-channel processing for LRN March 15, 2023
  • 66. L12-66 Sze and Emer Taxonomy: More Examples • Weight Stationary (WS) [Chakradhar, ISCA 2010] [nn-X (NeuFlow), CVPRW 2014] [Park, ISSCC 2015] [ISAAC, ISCA 2016] [PRIME, ISCA 2016] • Output Stationary (OS) [ShiDianNao, ISCA 2015] [Peemen, ICCD 2013] [Gupta, ICML 2015] [Moons, VLSI 2016] [Thinker, VLSI 2017] [TPU, ISCA 2017] March 15, 2023
• 67. L12-67 Sze and Emer Input Stationary • Hold inputs stationary rather than outputs or weights • Used for sparse CNNs [Parashar et al., SCNN, ISCA 2017] – A sparse CNN is one where many weights are zero – Advantage: the inputs are larger than the weights and thus require a larger memory, so holding inputs stationary reduces reads to that larger memory • Not analyzed here for the dense case March 15, 2023
• 68. L12-68 Sze and Emer 1-D Convolution – Output Stationary March 15, 2023
Weights (size S) * Inputs (size W) = Outputs (size Q = W - ceil(S/2)†)

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for q in [0,Q):
  for s in [0,S):
    w = q + s
    o[q] += i[w] * f[s]

† Assuming: ‘valid’ style convolution
How can we implement input stationary with no input index?
• 69. L12-69 Sze and Emer 1-D Convolution – Input Stationary March 15, 2023
Weights (size S) * Inputs (size W) = Outputs (size Q = W - ceil(S/2)†)

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for w in [0,W):
  for s in [0,S):
    q = w - s
    o[q] += i[w] * f[s]

† Assuming: ‘valid’ style convolution
Beware: q must satisfy q >= 0 and q < Q
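The two 1-D dataflows can be checked against each other with a small runnable sketch (plain Python, illustrative data). It also shows why the input-stationary form needs the bounds guard on q noted above: near the edges of the input, some (w, s) pairs map to output positions that do not exist.

```python
# 1-D convolution two ways: output stationary vs. input stationary.
W, S = 8, 3
Q = W - S + 1  # 'valid' convolution output width (= W - ceil(S/2) for S=3)

i = [v % 4 for v in range(W)]  # arbitrary illustrative inputs
f = [1, 2, 1]                  # arbitrary illustrative weights

# Output stationary: each o[q] is fully accumulated before moving on.
o_os = [0] * Q
for q in range(Q):
    for s in range(S):
        o_os[q] += i[q + s] * f[s]

# Input stationary: each i[w] is read once and scattered to all the
# partial sums it contributes to; out-of-range q's must be skipped.
o_is = [0] * Q
for w in range(W):
    for s in range(S):
        q = w - s
        if 0 <= q < Q:  # the "beware" condition from the slide
            o_is[q] += i[w] * f[s]

assert o_os == o_is
```

Both nests enumerate the same multiply-accumulates; only the order (and hence which operand is held stationary in the innermost storage) differs.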
• 70. L12-70 Sze and Emer 1-D Convolution Einsum + IS Traversal order (fastest to slowest): S, W Note: w = q+s, so q = w-s, and therefore the output index q can be derived from the input index w during traversal.
• 71. L12-71 Sze and Emer Variants of Output Stationary

Variant | # Output Channels | # Output Activations | Notes
OSA     | Single            | Multiple             | Targeting CONV layers
OSB     | Multiple          | Multiple             |
OSC     | Multiple          | Single               | Targeting FC layers

(Figure: parallel output region within the P × Q × M output space for each variant)
• 72. L12-72 Sze and Emer OS tilings – OSA parallel: P0, Q0 – OSB parallel: M0, P0, Q0 – OSC parallel: M0
• 73. L12-73 Sze and Emer Energy Efficiency Comparison (chart: Normalized Energy/MAC for CNN dataflows NLR, WS, the variants of OS (OSA, OSB, OSC), and Row Stationary) • Same total area • 256 PEs • AlexNet CONV layers • Batch size = 16 [Chen et al., ISCA 2016] March 15, 2023
  • 74. L12-74 Sze and Emer Energy-Efficient Dataflow: Row Stationary (RS) • Maximize reuse and accumulation at RF • Optimize for overall energy efficiency instead for only a certain data type [Chen et al., ISCA 2016] March 15, 2023
• 75.–78. L12-75 to L12-78 Sze and Emer 1D Row Convolution in PE March 15, 2023 (animation: a filter row [a b c] slides across an input fmap row [a b c d e] held in the PE’s register file, producing one row of partial sums)
  • 79. L12-79 Sze and Emer 1D Row Convolution in PE PE b a c Reg File d c e c b a • Maximize row convolutional reuse in RF − Keep a filter row and fmap sliding window in RF • Maximize row psum accumulation in RF March 15, 2023
• 80. L12-80 Sze and Emer 2D Convolution in PE Array (PE 1: Filter Row 1 * Fmap Row 1 contributes to Output Row 1) March 15, 2023
• 81. L12-81 Sze and Emer 2D Convolution in PE Array March 15, 2023
Output Row 1 = (Filter Row 1 * Fmap Row 1) + (Filter Row 2 * Fmap Row 2) + (Filter Row 3 * Fmap Row 3) [PEs 1, 2, 3]
Output Row 2 = (Filter Row 1 * Fmap Row 2) + (Filter Row 2 * Fmap Row 3) + (Filter Row 3 * Fmap Row 4) [PEs 4, 5, 6]
Output Row 3 = (Filter Row 1 * Fmap Row 3) + (Filter Row 2 * Fmap Row 4) + (Filter Row 3 * Fmap Row 5) [PEs 7, 8, 9]
• 82. L12-82 Sze and Emer Convolutional Reuse Maximized March 15, 2023 Filter rows are reused across PEs horizontally: Filter Row 1 feeds PEs 1, 4, 7; Filter Row 2 feeds PEs 2, 5, 8; Filter Row 3 feeds PEs 3, 6, 9.
• 83. L12-83 Sze and Emer Convolutional Reuse Maximized March 15, 2023 Fmap rows are reused across PEs diagonally: each fmap row contributes to several output rows, each via a different filter row (e.g., Fmap Row 3 feeds PEs 3, 5, and 7).
• 84. L12-84 Sze and Emer Maximize 2D Accumulation in PE Array March 15, 2023 Partial sums accumulate across PEs vertically: the three 1-D partial-sum rows for a given output row (computed against Filter Rows 1, 2, and 3) are summed across the PEs in that column.
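A minimal Python model of this mapping makes the accumulation concrete. Sizes here are hypothetical, and `pe_1d_conv` stands in for one PE's 1-D row convolution; summing its outputs down a PE column yields the full 2-D convolution.

```python
# Row-stationary sketch: PE (p, r) holds filter row r and fmap row p + r,
# runs a 1-D row convolution, and partial-sum rows accumulate vertically
# into output row p.
R, H, W = 3, 5, 5          # square filter for simplicity
P = H - R + 1              # output rows
Q = W - R + 1              # output cols

fmap = [[(h * 5 + w) % 7 for w in range(W)] for h in range(H)]   # made-up data
filt = [[(r + s) % 3 for s in range(R)] for r in range(R)]       # made-up data

def pe_1d_conv(f_row, i_row):
    """One PE: 1-D convolution of a filter row with an fmap row."""
    return [sum(f_row[s] * i_row[q + s] for s in range(R)) for q in range(Q)]

out = []
for p in range(P):
    psum_row = [0] * Q
    for r in range(R):  # vertical accumulation across the PE column
        for q, v in enumerate(pe_1d_conv(filt[r], fmap[p + r])):
            psum_row[q] += v
    out.append(psum_row)

# Reference: direct 2-D convolution.
ref = [[sum(filt[r][s] * fmap[p + r][q + s] for r in range(R) for s in range(R))
        for q in range(Q)] for p in range(P)]
assert out == ref
```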
  • 85. L12-85 Sze and Emer Dataflow Simulation Results March 15, 2023
• 86. L12-86 Sze and Emer Evaluate Reuse in Different Dataflows • Weight Stationary – Minimize movement of filter weights • Output Stationary – Minimize movement of partial sums • No Local Reuse – No PE local storage; maximize global buffer size • Row Stationary Evaluation Setup • same total area • 256 PEs • AlexNet • batch size = 16 Normalized energy cost*: DRAM→ALU 200×, Buffer→ALU 6×, PE→ALU 2×, RF→ALU 1×, ALU 1× (reference) March 15, 2023
• 87. L12-87 Sze and Emer Dataflow Comparison: CONV Layers (chart: Normalized Energy/MAC for CNN dataflows WS, OSA, OSB, OSC, NLR, RS, broken down into psums, weights, and activations) RS optimizes for the best overall energy efficiency [Chen et al., ISCA 2016]
• 88. L12-88 Sze and Emer Dataflow Comparison: CONV Layers (chart: Normalized Energy/MAC for WS, OSA, OSB, OSC, NLR, RS, broken down by ALU, RF, NoC, buffer, and DRAM) RS uses 1.4× – 2.5× lower energy than other dataflows [Chen et al., ISCA 2016]
• 89. L12-89 Sze and Emer Dataflow Comparison: FC Layers (chart: Normalized Energy/MAC for CNN dataflows WS, OSA, OSB, OSC, NLR, RS, broken down into psums, weights, and activations) RS uses at least 1.3× lower energy than other dataflows [Chen et al., ISCA 2016]
• 90. L12-90 Sze and Emer Row Stationary: Layer Breakdown (chart: normalized energy, 1 MAC = 1, for AlexNet layers L1–L8, broken down by ALU, RF, NoC, buffer, and DRAM) RF dominates in the CONV layers; DRAM dominates in the FC layers [Chen et al., ISCA 2016]
• 91. L12-91 Sze and Emer Row Stationary: Layer Breakdown (chart: normalized energy, 1 MAC = 1, for AlexNet layers L1–L8, broken down by ALU, RF, NoC, buffer, and DRAM) CONV layers dominate energy consumption: 80% of total energy vs. 20% for the FC layers
  • 92. L12-92 Sze and Emer Data Orchestration March 15, 2023
• 93. L12-93 Sze and Emer What is Data Orchestration? Feeding data to a computation unit exactly when it wants it:
• When data is moved over a transfer substrate
• Where data is placed in available staging buffers
• Who the “actors” are that touch data and their synchronization with each other
• How data is accessed (strides, patterns, etc.), including when it is no longer needed
ML ASICs use workload knowledge to optimize orchestration at design-time without caches March 15, 2023
• 94. L12-94 Sze and Emer Buffer Hierarchies in DNN Accelerators • Percentage of area devoted to on-chip buffers (chart) • A desire exists for a reusable, efficient storage hierarchy that is separate from any given accelerator March 15, 2023
• 95. L12-95 Sze and Emer Guiding Principles for Data Orchestration
• Efficient reuse – small storage physically close to consuming units for reused data
• Precise synchronization – only wait for exactly the data you need; respond quickly (e.g., no barriers or remote polling)
• Delivery/use overlap – the next tile should be available when the current one is done (e.g., double-buffering)
• Storage usage efficiency – minimize storage sitting idle waiting out long round-trip latencies
• Bandwidth efficiency – maximize delivery rate by controlling outstanding requests
• Cross-unit use – amortize data access and communication
March 15, 2023
  • 96. L12-96 Sze and Emer Approaches: Implicit versus Explicit Explicit: Implicit: March 15, 2023
  • 97. L12-97 Sze and Emer Approaches: Coupled versus Decoupled Implicit + Coupled Implicit + Decoupled March 15, 2023
  • 98. L12-98 Sze and Emer Explicit Decoupled Data Orchestration Implicit + Decoupled Explicit + Decoupled March 15, 2023
• 99. L12-99 Sze and Emer Classifying Orchestration Approaches March 15, 2023 (2×2 taxonomy: implicit vs. explicit data placement × coupled vs. decoupled)
• Implicit + coupled: Load... Math... Store...; cache-level specific loads; SW-controlled replacement policy; SW prefetching
• Implicit + decoupled: HW prefetching; Decoupled Access Execute
• Explicit + coupled: GPU shared memory
• Explicit + decoupled: ASIC-style fill and iterate engines
• 100. L12-100 Sze and Emer Buffering Strategies March 15, 2023
• Fill and use: Fill values → Buffer Storage → Drain values
• Double buffer: Fill values → two Buffer Storage banks → Drain values
• Rolling use: Fill values → Buffer Storage (rolling window) → Drain values
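The double-buffering strategy can be sketched in a few lines of Python. This is a sequential model with made-up tile data: the fill of the next tile is interleaved with the drain of the current one, whereas real hardware performs the two in overlapping time.

```python
# Double buffering: while compute drains one buffer, the fill side
# loads the next tile into the other, overlapping delivery with use.
tiles = [[1, 2], [3, 4], [5, 6]]   # illustrative tiles arriving from memory
bufs = [None, None]                # the two buffer banks

consumed = []
bufs[0] = tiles[0]                 # prologue: fill bank 0 before compute starts
for t in range(len(tiles)):
    cur, nxt = t % 2, (t + 1) % 2
    if t + 1 < len(tiles):
        bufs[nxt] = tiles[t + 1]   # fill the next tile into the idle bank
    consumed.extend(bufs[cur])     # drain (use) the current bank
    bufs[cur] = None               # bank is now free for the fill after next

assert consumed == [1, 2, 3, 4, 5, 6]
```

"Fill and use" corresponds to dropping the overlap (fill, then use, serially); "rolling use" generalizes the two fixed banks into a sliding window over one storage.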
• 101.–172. L12-101 through L12-172 Sze and Emer Buffering Strategies March 15, 2023 (animation frames repeating the same three strategies: fill and use, double buffer, rolling use)
• 173. L12-173 Sze and Emer Buffets
• Compared to a FIFO: allows random access into the live window; allows updates of values in the live window
• Compared to a scratchpad: adds scoreboarding for synchronization; allows arbitrary degrees of buffering
• Compared to a cache: addresses are local (therefore fewer bits) and there is no tag store; push model versus pull model allows a smaller landing zone; raises the level of abstraction beyond single-value transfers
Interface: Fill() from memory; Read(local_offset) and Update(local_offset, data) to/from the datapath; Shrink(num) to free space
Can be specialized, for example: “read-only” buffets that have fills but not updates March 15, 2023
  • 174. L12-174 Sze and Emer Buffet Usage Model March 15, 2023 Buffet Storage Read sequence Read values Update sequence Update values Fill values Fill sequence
• 175. L12-175 Sze and Emer Buffet Behavioral Attributes • Based on the ‘fill’ address sequence, will pop values from the ‘fill’ LI channel until the fill buffer is full. • Based on the ‘read’ address sequence, will try to push values down the ‘read’ LI channel, but only if the value has been ‘filled’. • The ‘read’ address sequence can also inform the buffer that a value can be dropped, i.e., space freed. • Based on the ‘update’ address sequence, will try to pop values from the ‘update’ LI channel. • An implementation may include multiple logical buffers inside a single physical buffer. March 15, 2023
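A hypothetical software model of this behavior helps fix the semantics. Method names mirror the slide's Fill/Read/Update/Shrink; this is a sketch, not any real RTL interface, and blocking on unfilled or full state is modeled with assertions rather than stalling.

```python
# Toy buffet: local offsets into a live window, reads gated on fills,
# in-place updates, and shrink to free the oldest entries.
class Buffet:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []        # live window, addressed by local offset
        self.base = 0         # number of entries shrunk away so far

    def fill(self, value):
        # Hardware would apply backpressure on the fill LI channel instead.
        assert len(self.data) < self.capacity, "fill buffer full"
        self.data.append(value)

    def read(self, offset):
        # Hardware would stall the read until the offset has been filled.
        assert offset < len(self.data), "read of unfilled offset"
        return self.data[offset]

    def update(self, offset, value):
        assert offset < len(self.data)
        self.data[offset] = value

    def shrink(self, num):
        del self.data[:num]   # free space; subsequent fills may proceed
        self.base += num

b = Buffet(capacity=4)
for v in [10, 20, 30]:
    b.fill(v)
assert b.read(1) == 20      # random access within the live window (vs. FIFO)
b.update(1, 25)             # in-place update of a live value
assert b.read(1) == 25
b.shrink(2)                 # drop the two oldest entries
assert b.read(0) == 30      # offsets are relative to the new window
```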
  • 176. L12-176 Sze and Emer Buffets: Composable Idiom for E.D.D.O. Transfers between levels only depend on a credit flow from the adjacent level. March 15, 2023
  • 177. L12-177 Sze and Emer Data Communications March 13, 2023
  • 178. L12-178 Sze and Emer Configurable Systolic Array - WARP March 13, 2023 Source: WARP Architecture and Implementation, ISCA 1986
  • 179. L12-179 Sze and Emer Latency-Insensitive (LI) Channels March 13, 2023 IsFull() Push() PushNB() IsEmpty() Peek() Pop() PopNB() OUT IN Connections Producer Consumer Credit: Pellauer Inter-unit communication can use Latency-Insensitive (LI) Channels that, for functionality purposes, have no latency requirements Semantics: FIFO abstraction with flow control on all interfaces
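A simple software model of an LI channel using the interface names on the slide: a bounded FIFO with flow control on both ends. The blocking Push()/Pop() variants are modeled here by their non-blocking forms returning success flags; this is a sketch of the abstraction, not the hardware timing semantics.

```python
from collections import deque

# Latency-insensitive channel: FIFO abstraction with flow control
# on both the producer (IN) and consumer (OUT) interfaces.
class LIChannel:
    def __init__(self, depth):
        self.q = deque()
        self.depth = depth

    # Producer side
    def is_full(self):
        return len(self.q) >= self.depth

    def push_nb(self, v):          # PushNB(): fails instead of blocking
        if self.is_full():
            return False
        self.q.append(v)
        return True

    # Consumer side
    def is_empty(self):
        return not self.q

    def peek(self):                # inspect head without consuming it
        return self.q[0]

    def pop_nb(self):              # PopNB(): returns (ok, value)
        if self.is_empty():
            return False, None
        return True, self.q.popleft()

ch = LIChannel(depth=2)
assert ch.push_nb(1) and ch.push_nb(2)
assert not ch.push_nb(3)           # full: backpressure makes the producer wait
ok, v = ch.pop_nb()
assert ok and v == 1
assert ch.peek() == 2
```

Because correctness depends only on the FIFO ordering and flow control, units connected this way keep working regardless of how many cycles the wires or buffers in between add.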
  • 180. L12-180 Sze and Emer March 13, 2023 Next: Mapping