6.5930/1
Hardware Architectures for Deep Learning
Lecture 12: Accelerator Architecture (continued)
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 15, 2023
Announcements
• Last chance to submit the topic selection survey
• Please let us know if you are considering dropping… ☹
• Paper assignment later this week.
Goals of Today's Lecture
• Taxonomy of dataflows for CNNs
  – Output Stationary (last lecture)
  – Weight Stationary
  – Input Stationary
• Advanced dataflow
  – Row Stationary (Eyeriss)
• Data Orchestration
Background Reading
• DNN Accelerators
  – Efficient Processing of Deep Neural Networks, Chapter 5
All these books and their online/e-book versions are available through MIT libraries.
Spatial Architecture for DNN
(Figure: DRAM feeds a Global Buffer (100 – 500 kB), which feeds a spatial array of Processing Elements (PEs); each PE contains an ALU, control, and a register file (0.5 – 1.0 kB).)
Local Memory Hierarchy
• Global Buffer
• Direct inter-PE network
• PE-local memory (RF)
Energy Cost of Data Movement
Normalized energy cost* per access, relative to one ALU operation:
  ALU operation: 1× (reference)
  RF → ALU (0.5 – 1.0 kB): 1×
  PE → ALU (NoC: 200 – 1000 PEs): 2×
  Buffer → ALU (100 – 500 kB): 6×
  DRAM → ALU: 200×
* measured from a commercial 65nm process
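The cost table above can be turned into a back-of-the-envelope energy model. A minimal sketch, assuming made-up illustrative access counts (only the normalized costs come from the slide):

```python
# Sketch: estimating dataflow energy from the normalized access costs above.
# The access counts below are made-up illustrative numbers, not measured data.
COST = {"RF": 1, "PE": 2, "Buffer": 6, "DRAM": 200, "ALU": 1}

def energy(access_counts):
    """Total normalized energy for a dict of {level: number of accesses}."""
    return sum(COST[level] * n for level, n in access_counts.items())

# A dataflow with good RF reuse: most operand accesses hit the register file.
good_reuse = {"ALU": 1000, "RF": 3000, "Buffer": 100, "DRAM": 10}
# The same MACs with every operand fetched from DRAM.
no_reuse = {"ALU": 1000, "DRAM": 3000}

print(energy(good_reuse))  # 6600
print(energy(no_reuse))    # 601000
```

The two orders of magnitude between the scenarios is why dataflow choice, not MAC count, dominates accelerator energy.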
Types of Data Reuse in DNN
• Convolutional Reuse – CONV layers only (sliding window)
  One filter, one input fmap. Reuse: activations and filter weights
• Fmap Reuse – CONV and FC layers
  Multiple filters, one input fmap. Reuse: activations
• Filter Reuse – CONV and FC layers (batch size > 1)
  One filter, multiple input fmaps. Reuse: filter weights
Output Stationary (OS)
• Minimize partial sum R/W energy consumption
  – maximize local accumulation
• Broadcast/multicast filter weights and reuse activations spatially across the PE array
(Figure: Global Buffer feeding a PE array P0–P7; each PE holds its psum stationary while activations and weights stream through.)
Output activations are in the outer loop
1-D Convolution – Output Stationary
Weights: S    Inputs: W    Outputs: Q = W-ceil(S/2)†

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for q in [0, Q):
  for s in [0, S):
    o[q] += i[q+s]*f[s]

† Assuming: ‘valid’ style convolution
No constraints on loop permutations!
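The loop nest above can be run directly. A minimal sketch with made-up sizes and data (S = 3, so the slide's Q = W - ceil(S/2) coincides with the usual W - S + 1):

```python
import math

# Sketch of the output-stationary 1-D convolution loop nest above.
W, S = 8, 3
Q = W - math.ceil(S / 2)         # slide's output size for 'valid' convolution
i = list(range(W))               # input activations (made-up data)
f = [1, 2, 3]                    # filter weights (made-up data)
o = [0] * Q                      # output activations

for q in range(Q):               # output index in the outer loop -> output stationary
    for s in range(S):
        o[q] += i[q + s] * f[s]  # o[q] accumulates locally before moving on

print(o)  # [8, 14, 20, 26, 32, 38]
```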
1-D Convolution – Weight Stationary
Weights: S    Inputs: W    Outputs: Q = W-ceil(S/2)†

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for s in [0, S):
  for q in [0, Q):
    o[q] += i[q+s]*f[s]

† Assuming: ‘valid’ style convolution
What dataflow is this? Weight stationary – the weight index s is in the outer loop.
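Because there are no constraints on loop permutations here, swapping the loops (weight index s outermost) must give the same outputs. A quick check with made-up data:

```python
import math

# Sketch: the weight-stationary order (s outer) computes the same outputs as
# the output-stationary order (q outer). Sizes and data are made up.
W, S = 8, 3
Q = W - math.ceil(S / 2)
i = list(range(W))
f = [1, 2, 3]

o_ws = [0] * Q
for s in range(S):               # weight index in the outer loop -> weight stationary
    for q in range(Q):           # f[s] is reused across all Q outputs
        o_ws[q] += i[q + s] * f[s]

o_os = [0] * Q
for q in range(Q):
    for s in range(S):
        o_os[q] += i[q + s] * f[s]

assert o_ws == o_os              # loop interchange is legal: same results
```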
1-D Convolution Einsum + WS
Serial WS design
  Traversal order (fastest to slowest): Q, S
WS design for parallel weights
  Traversal order (fastest to slowest): Q
  Parallel Ranks: S
Can you write the loop nest? I hope so 😊
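One possible answer for the parallel-weights WS design, sketched in Python with the parallel-for over weight positions modeled as a spatial sum (sizes and data are made up):

```python
import math

# A guess at the loop nest for the parallel-weights WS design: Q is the serial
# traversal; the S weight positions are parallel ranks, modeled here as a
# spatial psum reduction across S PEs, each holding one stationary weight.
W, S = 8, 3
Q = W - math.ceil(S / 2)
i = list(range(W))
f = [1, 2, 3]

o = [0] * Q
for q in range(Q):                                 # serial traversal: Q
    # parallel-for s in [0, S): each of S PEs holds one stationary weight
    o[q] = sum(i[q + s] * f[s] for s in range(S))  # spatial reduction of psums
```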
Weight Stationary (WS)
• Minimize weight read energy consumption
  – maximize convolutional and filter reuse of weights
• Broadcast activations and accumulate psums spatially across the PE array
(Figure: Global Buffer feeding a PE array; each PE holds one weight W0–W7 stationary while activations and psums move through.)
Weights are in the outer loop
WS Example: nn-X (NeuFlow)
A 3×3 2-D convolution engine (figure: weights held in the array; activations stream in, psums stream out)
[Farabet et al., ICCV 2009]
WS Example: NVDLA (simplified)
Released Sept 29, 2017 – http://nvdla.org (image source: Nvidia)
(Figure: Global Buffer feeding a PE array of M × C MACs.)
• The number of input and output channels must match the array dimensions, otherwise MACs idle
• Range: C (16 to 128), M (4 to 16)
*Note: In this class we use M for the number of kernels (filters), while NVDLA uses K
WS Example: Nvidia DLA
• Convolution Buffer
  – Stores both weights and activations
  – The ratio of weight to activation data varies across layers
  – Four access ports: R+W feature & R+W weights
    • Convolution read bandwidth determines the size of the read port (C × datasize)
  – Optional compression support
  – Prefers to store either all weights or all activations
WS Example: NVDLA (simplified)
(Figure: M filters, each R×S×C, are overlaid on a C×H×W input fmap to produce an M×P×Q output fmap; as the filter overlay slides, outputs hold incomplete partial sums until all channels and filter positions have been accumulated.)
Example dimensions: M=8, C=3, R=2, S=2, H=3, W=3, P=2, Q=2
Loop Nest: NVDLA (simplified)
M = 8; C = 3; R = 2; S = 2; P = 2; Q = 2

int i[C,H,W];   # Input activations
int f[M,C,R,S]; # Filter weights
int o[M,P,Q];   # Output activations

for r in [0,R):
  for s in [0,S):
    for p in [0,P):
      for q in [0,Q):
        parallel-for m in [0, M):
          parallel-for c in [0, C):
            o[m,p,q] += i[c,p+r,q+s] * f[m,c,r,s]

How can we tell this is weight stationary? The top loops are r and s, the weight indices.
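A serial Python sketch of this loop nest; the parallel-for loops over m and c are modeled as ordinary loops, and the data values are made up:

```python
# Sketch of the simplified-NVDLA weight-stationary loop nest, with sizes
# matching the slide's example and made-up input/filter values.
M, C, R, S, P, Q = 8, 3, 2, 2, 2, 2
H, W = P + R - 1, Q + S - 1      # 'valid' convolution: H = 3, W = 3

i = [[[c + h + w for w in range(W)] for h in range(H)] for c in range(C)]
f = [[[[m + c + r + s for s in range(S)] for r in range(R)] for c in range(C)]
     for m in range(M)]
o = [[[0] * Q for _ in range(P)] for _ in range(M)]

for r in range(R):                       # weight indices on top -> weight stationary
    for s in range(S):
        for p in range(P):
            for q in range(Q):
                for m in range(M):       # parallel-for in hardware
                    for c in range(C):   # parallel-for in hardware
                        o[m][p][q] += i[c][p + r][q + s] * f[m][c][r][s]
```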
NVDLA (Simplified) – Animation
M = 2, C = 2, R = 3, S = 3, H = 4, W = 8, P = 2, Q = 6
CONV-layer Einsum
Traversal order (fastest to slowest): Q, P, S, R
Parallel Ranks: C, M
WS Example: NVDLA (simplified)
Released Sept 29, 2017 – http://nvdla.org
(Image source: Nvidia; figure: Global Buffer and PE array.)
WS Example: Nvidia DLA
• Single Data Point Operations
  – Applied immediately after the CNN
  – Linear functions
    • e.g., bias addition, precision scaling (before writing to memory), batch normalization, element-wise operations
    • Scaling: all values, per channel, or per pixel
  – Non-linear functions use a LUT
    • e.g., ReLU, PReLU, sigmoid, tanh
• Planar Data Operations
  – For pooling: max, min, average
• Multi-Plane Operations
  – Cross-channel processing for LRN
Input Stationary
• Hold inputs stationary rather than outputs or weights
• Used for sparse CNNs [Parashar et al., SCNN, ISCA 2017]
  – A sparse CNN is one where many of the weights are zero
  – Advantage: the inputs are larger than the weights and thus require a larger memory; keeping them stationary reduces reads to that larger memory
• Not analyzed here for the dense case
1-D Convolution – Output Stationary
Weights: S    Inputs: W    Outputs: Q = W-ceil(S/2)†

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for q in [0, Q):
  for s in [0, S):
    w = q + s
    o[q] += i[w]*f[s]

† Assuming: ‘valid’ style convolution
How can we implement input stationary with no input index?
1-D Convolution – Input Stationary
Weights: S    Inputs: W    Outputs: Q = W-ceil(S/2)†

int i[W]; # Input activations
int f[S]; # Filter weights
int o[Q]; # Output activations

for w in [0, W):
  for s in [0, S):
    q = w - s
    if 0 <= q < Q:   # beware: q must be >= 0 and < Q
      o[q] += i[w]*f[s]

† Assuming: ‘valid’ style convolution
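A runnable sketch of the input-stationary nest, including the bounds guard on q, cross-checked against the output-stationary order (sizes and data are made up):

```python
import math

# Sketch of the input-stationary 1-D convolution loop nest above.
W, S = 8, 3
Q = W - math.ceil(S / 2)
i = list(range(W))
f = [1, 2, 3]

o_is = [0] * Q
for w in range(W):               # input index in the outer loop -> input stationary
    for s in range(S):
        q = w - s                # derived output index
        if 0 <= q < Q:           # beware: q must be >= 0 and < Q
            o_is[q] += i[w] * f[s]

# Cross-check against the output-stationary order.
o_os = [0] * Q
for q in range(Q):
    for s in range(S):
        o_os[q] += i[q + s] * f[s]
assert o_is == o_os
```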
1-D Convolution Einsum + IS
Traversal order (fastest to slowest): S, W
Note: w = q + s, so q = w - s, and therefore the output index can be recovered from the input and weight indices.
Variants of Output Stationary

          # Output Channels   # Output Activations   Notes
OSA       Single              Multiple               Targeting CONV layers
OSB       Multiple            Multiple
OSC       Multiple            Single                 Targeting FC layers

(Figure: for each variant, the parallel output region within the M × P × Q output volume.)
OS tilings
OSA – Parallel: P0, Q0
OSB – Parallel: M0, P0, Q0
OSC – Parallel: M0
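To make the variants concrete, a serial sketch contrasting OSA and OSC on a tiny made-up layer with 1×1 filters; the loops marked parallel-for would run spatially in hardware:

```python
# Contrast of OS variants on a tiny made-up layer with 1x1 filters:
# OSA serializes over output channels and parallelizes across output
# activations; OSC does the opposite. Parallel-for loops modeled serially.
M, C, P, Q = 4, 3, 2, 2
i = [[[c + 1] * Q for _ in range(P)] for c in range(C)]   # i[c][p][q] = c + 1
f = [[m + c for c in range(C)] for m in range(M)]         # f[m][c] = m + c

o_osa = [[[0] * Q for _ in range(P)] for _ in range(M)]
for m in range(M):                 # serial over output channels
    for p in range(P):             # parallel-for over activations in hardware
        for q in range(Q):         # parallel-for over activations in hardware
            for c in range(C):
                o_osa[m][p][q] += i[c][p][q] * f[m][c]

o_osc = [[[0] * Q for _ in range(P)] for _ in range(M)]
for p in range(P):                 # serial over output activations
    for q in range(Q):
        for m in range(M):         # parallel-for over channels in hardware
            for c in range(C):
                o_osc[m][p][q] += i[c][p][q] * f[m][c]

assert o_osa == o_osc              # same results; only the parallel region differs
```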
Energy Efficiency Comparison
(Chart: normalized energy/MAC for CNN dataflows – WS, the OS variants OSA/OSB/OSC, NLR, and Row Stationary.)
• Same total area • 256 PEs
• AlexNet CONV layers • Batch size = 16
[Chen et al., ISCA 2016]
Energy-Efficient Dataflow: Row Stationary (RS)
• Maximize reuse and accumulation at the RF
• Optimize for overall energy efficiency instead of for only a certain data type
[Chen et al., ISCA 2016]
1D Row Convolution in PE
(Animation: a filter row [a b c] is convolved with an input-fmap row [a b c d e] inside a single PE; the reg file holds the filter row, a sliding window of inputs, and the partial sums as the window advances.)
• Maximize row convolutional reuse in RF
  – Keep a filter row and fmap sliding window in RF
• Maximize row psum accumulation in RF
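The animation above can be sketched as follows: the filter row stays resident in the reg file, a sliding window of inputs rolls through it, and each psum is fully accumulated before being drained (data values are made up):

```python
# Sketch of a row-stationary PE computing one 1-D row convolution.
filt = [1, 2, 3]                 # filter row [a b c], held in the RF
row = [1, 2, 3, 4, 5]            # input fmap row [a b c d e] (made-up values)
S = len(filt)

window = row[:S]                 # RF holds a sliding window of S inputs
psums = []
for q in range(len(row) - S + 1):
    psum = 0
    for s in range(S):           # psum accumulates entirely inside the RF
        psum += window[s] * filt[s]
    psums.append(psum)           # drain the completed psum
    if q + S < len(row):
        window = window[1:] + [row[q + S]]   # slide: drop oldest, fetch one input

print(psums)  # [14, 20, 26]
```

Note that each output requires fetching only one new input from outside the PE, which is the row convolutional reuse the slide describes.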
Evaluate Reuse in Different Dataflows
• Weight Stationary
  – Minimize movement of filter weights
• Output Stationary
  – Minimize movement of partial sums
• No Local Reuse
  – No PE local storage; maximize global buffer size
• Row Stationary
Evaluation setup: same total area, 256 PEs, AlexNet, batch size = 16
(Normalized energy cost per access, as before: RF 1×, PE 2×, Buffer 6×, DRAM 200×, ALU 1× reference.)
Dataflow Comparison: CONV Layers
(Chart: normalized energy/MAC for WS, OSA, OSB, OSC, NLR, RS, broken down into psums, weights, and activations.)
RS optimizes for the best overall energy efficiency
[Chen et al., ISCA 2016]
Dataflow Comparison: CONV Layers
(Chart: normalized energy/MAC for WS, OSA, OSB, OSC, NLR, RS, broken down into ALU, RF, NoC, buffer, and DRAM.)
RS uses 1.4× – 2.5× lower energy than the other dataflows
[Chen et al., ISCA 2016]
Dataflow Comparison: FC Layers
(Chart: normalized energy/MAC for WS, OSA, OSB, OSC, NLR, RS, broken down into psums, weights, and activations.)
RS uses at least 1.3× lower energy than the other dataflows
[Chen et al., ISCA 2016]
Row Stationary: Layer Breakdown
(Chart: normalized energy (1 MAC = 1) for AlexNet layers L1–L8, broken down into ALU, RF, NoC, buffer, and DRAM.)
CONV layers: RF dominates. FC layers: DRAM dominates.
[Chen et al., ISCA 2016]
Row Stationary: Layer Breakdown
(Same chart as the previous slide.)
CONV layers dominate energy consumption: roughly 80% of total energy vs. 20% for FC layers.
What is Data Orchestration?
Feeding data to a computation unit exactly when it wants it:
• When data is moved over a transfer substrate
• Where data is placed in available staging buffers
• How data is accessed (strides, patterns, etc.), including when it is no longer needed
• Who the “actors” are that touch data, and their synchronization with each other
ML ASICs use workload knowledge to optimize orchestration at design time, without caches.
Buffer Hierarchies in DNN Accelerators
• Percentage of area devoted to on-chip buffers: (chart)
A desire exists for a reusable, efficient storage hierarchy that is separate from any given accelerator.
Guiding Principles for Data Orchestration
• Efficient reuse – small storage physically close to consuming units for reused data
• Delivery/use overlap – the next tile should be available when the current one is done (e.g., double-buffering)
• Bandwidth efficiency – maximize delivery rate by controlling outstanding requests
• Precise synchronization – only wait for exactly the data you need, and respond quickly (e.g., no barriers or remote polling)
• Storage usage efficiency – minimize idle storage waiting out long round-trip latency
• Cross-unit use – amortize data access and communication
Explicit Decoupled Data Orchestration
(Figure: implicit + decoupled vs. explicit + decoupled orchestration.)
Classifying Orchestration Approaches
Axes: implicit vs. explicit data placement; coupled vs. decoupled access.
• Implicit + coupled: Load… Math… Store…; SW prefetching; cache-level specific loads; SW-controlled replacement policy
• Implicit + decoupled: HW prefetching; Decoupled Access Execute
• Explicit + coupled: GPU shared memory
• Explicit + decoupled: ASIC-style fill and iterate engines
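An explicit + decoupled orchestration can be sketched as a fill engine and a compute engine sharing an explicitly managed, bounded staging buffer. The workload and the names below are illustrative, not the API of any real accelerator:

```python
from queue import Queue
from threading import Thread

# Sketch of explicit + decoupled orchestration: a separate fill engine pushes
# tiles into an explicitly managed staging buffer (a bounded FIFO) while the
# compute engine drains them.
DONE = object()

def fill_engine(tiles, buf):
    for t in tiles:
        buf.put(t)               # blocks when the staging storage is full
    buf.put(DONE)

def compute_engine(buf, results):
    while True:
        t = buf.get()            # precise sync: wait only for the data needed
        if t is DONE:
            break
        results.append(sum(t))   # the "math" on each tile

tiles = [[1, 2], [3, 4], [5, 6]]
buf = Queue(maxsize=2)           # small staging buffer close to the consumer
results = []
producer = Thread(target=fill_engine, args=(tiles, buf))
consumer = Thread(target=compute_engine, args=(buf, results))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # [3, 7, 11]
```

The bounded queue gives the decoupling: fills run ahead of compute by up to the staging capacity, overlapping delivery with use.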
Buffering Strategies
• Fill and use: fill the buffer storage with values, drain (use) them, then refill
• Double buffer: drain one half of the buffer storage while filling the other
• Rolling use: fill and drain roll through the buffer storage concurrently
101. L12-101
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
102. L12-102
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
103. L12-103
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
104. L12-104
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
105. L12-105
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
106. L12-106
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
107. L12-107
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
108. L12-108
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
109. L12-109
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
110. L12-110
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
111. L12-111
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
112. L12-112
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
113. L12-113
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
114. L12-114
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
115. L12-115
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
116. L12-116
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
117. L12-117
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
118. L12-118
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
119. L12-119
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
120. L12-120
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
121. L12-121
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
122. L12-122
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
123. L12-123
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
124. L12-124
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
125. L12-125
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
126. L12-126
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
127. L12-127
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
128. L12-128
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
129. L12-129
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
130. L12-130
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
131. L12-131
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
132. L12-132
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
133. L12-133
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
134. L12-134
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
135. L12-135
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
136. L12-136
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
137. L12-137
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
138. L12-138
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
139. L12-139
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
140. L12-140
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
141. L12-141
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
142. L12-142
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
143. L12-143
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
144. L12-144
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
145. L12-145
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
146. L12-146
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
147. L12-147
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
148. L12-148
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
149. L12-149
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
150. L12-150
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
151. L12-151
Sze and Emer
Buffering Strategies
March 15, 2023
Fill and use:
Buffer Storage
Drain values
Fill values
Double buffer
Buffer Storage
Drain values
Fill values
Rolling use
Buffer Storage
Drain values
Fill values
173. L12-173
Sze and Emer
Buffets
• Compared to a FIFO:
– Allows random access into the live window
– Allows updates of values in the live window
• Compared to a scratchpad:
– Adds scoreboarding for synchronization
– Allows arbitrary degrees of buffering
• Compared to a cache:
– Addresses are local (therefore fewer bits) and there is no tag store
– Push model versus pull model allows a smaller landing zone
– Raises the level of abstraction beyond single-value transfers
[Figure: A buffet module. Fill() brings data in from memory; on the datapath side, Read(local_offset) returns data, Update(local_offset, data) modifies data, and Shrink(num) frees entries.]
Can be specialized, for example: “read-only” buffets that have fills but not updates
March 15, 2023
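The interface in the figure can be sketched in a few lines of Python. This is a behavioral sketch only, with illustrative names; a real buffet would add scoreboarding and latency-insensitive channels around these operations.

```python
class Buffet:
    """Behavioral sketch of a buffet: FIFO-like fill/shrink ordering, but
    with random access and in-place updates within the live window."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.window = []  # live window; index 0 is the oldest filled entry

    def fill(self, value):
        # Values arrive in order from the outer memory level.
        assert len(self.window) < self.capacity, "buffer full"
        self.window.append(value)

    def read(self, local_offset):
        # Random access into the live window (unlike a FIFO).
        return self.window[local_offset]

    def update(self, local_offset, value):
        # In-place update of a live value (unlike a FIFO).
        self.window[local_offset] = value

    def shrink(self, num):
        # Free the oldest `num` entries, making room for new fills.
        del self.window[:num]


b = Buffet(capacity=4)
b.fill(10)
b.fill(20)
assert b.read(1) == 20   # random access
b.update(0, 99)          # in-place update
assert b.read(0) == 99
b.shrink(1)              # drop the oldest entry
assert b.read(0) == 20
```

A "read-only" specialization, as noted on the slide, would simply omit `update`.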
174. L12-174
Sze and Emer
Buffet Usage Model
March 15, 2023
[Figure: Buffet storage with a fill sequence delivering fill values, a read sequence producing read values, and an update sequence consuming update values.]
175. L12-175
Sze and Emer
Buffet Behavioral Attributes
• Based on the ‘fill’ address sequence, the buffet pops values from the ‘fill’ LI
channel until the fill buffer is full.
• Based on the ‘read’ address sequence, it tries to push values down the ‘read’
LI channel, but only if the value has been ‘filled’.
• The ‘read’ address sequence can also inform the buffer that a value can be
dropped, i.e., its space freed.
• Based on the ‘update’ address sequence, it tries to pop values from the
‘update’ LI channel.
• An implementation may include multiple logical buffers inside a single
physical buffer.
March 15, 2023
176. L12-176
Sze and Emer
Buffets: Composable Idiom for E.D.D.O. (Explicit Decoupled Data Orchestration)
Transfers between levels only depend on a credit
flow from the adjacent level.
March 15, 2023
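The credit flow between adjacent levels can be sketched as a simple counter: the upstream level may send only while it holds credits, and the downstream level returns a credit whenever it frees a slot (e.g., via Shrink). The names here are illustrative.

```python
class CreditLink:
    """Sketch of credit-based flow control between adjacent buffer levels."""

    def __init__(self, credits):
        self.credits = credits  # free slots currently known in the downstream buffer

    def can_send(self):
        return self.credits > 0

    def send(self):
        # Upstream transfers one value, consuming one credit.
        assert self.can_send(), "no credits: downstream buffer is full"
        self.credits -= 1

    def return_credit(self):
        # Downstream freed a slot (e.g., after a shrink), returning a credit.
        self.credits += 1


link = CreditLink(credits=2)
link.send()
link.send()
assert not link.can_send()  # downstream full; upstream must stall
link.return_credit()        # downstream shrank one entry
assert link.can_send()
```

Because each level only tracks credits for its immediate neighbor, levels compose without global coordination, which is the point of the slide.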
179. L12-179
Sze and Emer
Latency-Insensitive (LI) Channels
March 13, 2023
[Figure: An LI channel connecting a Producer (OUT side: IsFull(), Push(), PushNB()) to a Consumer (IN side: IsEmpty(), Peek(), Pop(), PopNB()). Credit: Pellauer]
Inter-unit communication can use Latency-Insensitive
(LI) Channels that, for functionality purposes, have no
latency requirements
Semantics: FIFO abstraction with flow control on all interfaces
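The interface on the slide maps directly onto a bounded FIFO with blocking and non-blocking variants on both ends. A minimal Python sketch (method names follow the slide; depth is illustrative):

```python
from collections import deque


class LIChannel:
    """Latency-insensitive channel: a bounded FIFO with flow control on
    both the producer (OUT) and consumer (IN) interfaces."""

    def __init__(self, depth):
        self.depth = depth
        self.q = deque()

    # --- Producer (OUT) side ---
    def is_full(self):
        return len(self.q) >= self.depth

    def push(self, value):
        # In hardware a full channel stalls the producer; here we assert.
        assert not self.is_full(), "channel full: producer must stall"
        self.q.append(value)

    def push_nb(self, value):
        # Non-blocking push: report success instead of stalling.
        if self.is_full():
            return False
        self.q.append(value)
        return True

    # --- Consumer (IN) side ---
    def is_empty(self):
        return len(self.q) == 0

    def peek(self):
        return self.q[0]

    def pop(self):
        return self.q.popleft()

    def pop_nb(self):
        # Non-blocking pop: None signals an empty channel.
        return None if self.is_empty() else self.q.popleft()


ch = LIChannel(depth=2)
assert ch.push_nb(1)
assert ch.push_nb(2)
assert not ch.push_nb(3)   # full: flow control back-pressures the producer
assert ch.peek() == 1
assert ch.pop() == 1
assert ch.pop_nb() == 2
assert ch.pop_nb() is None  # empty
```

Because correctness never depends on how long a value sits in the channel, producer and consumer timing can vary freely, which is what makes the channels latency-insensitive.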