1. L08-1
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Vectorized Kernel Computation
(continued)
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 1, 2023
2. L08-2
ISA Datatypes

Type    S  E  M    Range                Accuracy
FP32    1  8  23   10^-38 – 10^38       .000006%
FP16    1  5  10   6x10^-5 – 6x10^4     .05%
Int32   1  –  31   0 – 2x10^9           ½
Int16   1  –  15   0 – 6x10^4           ½
Int8    1  –  7    0 – 127              ½

(S = sign bits, E = exponent bits, M = mantissa/magnitude bits)
Image Source: B. Dally
4. L08-4
Computational Transforms
5. L08-5
FC – Vector Operation Counts
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
Vector operation on L lanes
Order: m, chw1, chw0 Factors: M, C*H*W/L, L
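The loop nest above can be written as a minimal runnable Python sketch (the function name is mine, not from the slides; the parallel_for over lanes is simulated serially):

```python
def fc_vectorized(i, f, M, CHW, L):
    """Fully-connected layer o[m] = sum_chw i[chw] * f[CHW*m + chw],
    with the chw dimension split into CHW/L vector operations of L lanes."""
    assert CHW % L == 0
    o = [0] * M
    for m in range(M):                      # Level 2 loops
        for chw1 in range(CHW // L):
            for chw0 in range(L):           # Level 1: one vector op on L lanes
                o[m] += i[L*chw1 + chw0] * f[CHW*m + L*chw1 + chw0]
    return o
```

Each (m, chw1) iteration corresponds to one vector MAC across L lanes, so the total MAC count is M*(C*H*W/L)*L = M*C*H*W.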
6. L08-6
FC – Vector Operation Counts
How many MACs? M*(C*H*W/L)*L = M*C*H*W
How many reads of “inputs”? C*H*W*M
How many reads of “weights”? C*H*W*M
How many writes of “outputs”? M
Measuring reads/writes in units of 32-bit integers
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
7. L08-7
Compute Intensity (MACs/Read)
MACs/Read = C*H*W*M / (C*H*W*M + C*H*W*M) ≈ 1/2
If the system can support 1 Read/MAC, will it run at full throttle? No
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
8. L08-8
Roofline Model
[Roofline plot: MACs/Cycle (0–10) vs. Compute Intensity in MACs/read (0–2); 8 MAC Lanes with 1 Read/MAC]
MACs/cycle limited by
number of lanes
Williams, Samuel, Andrew Waterman, and David Patterson. "Roofline: an insightful visual performance model for multicore architectures."
Communications of the ACM 52.4 (2009): 65-76.
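The roofline bound itself is one line of arithmetic; the sketch below (my naming) encodes the slide's example machine of 8 MAC lanes with bandwidth for 1 read per MAC, i.e., 8 reads/cycle:

```python
def roofline_macs_per_cycle(ci, peak_macs=8.0, reads_per_cycle=8.0):
    """Attainable MACs/cycle = min(peak compute, compute intensity * read bandwidth)."""
    return min(peak_macs, ci * reads_per_cycle)
```

At the earlier code's compute intensity of 1/2 this machine attains only 4 MACs/cycle; the ridge point, where operand delivery stops being the limit, sits at a compute intensity of 1.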
9. L08-9
Roofline Model
[Roofline plot: MACs/Cycle (0–10) vs. Compute Intensity in MACs/read (0–2); 8 MAC Lanes with 1 Read/MAC]
Where will the previous slide’s code be? Compute intensity = 1/2
MACs/cycle limited by
number of lanes
MACs/cycle limited
by operand delivery
10. L08-10
Roofline Model
[Roofline plot: MACs/Cycle (0–10) vs. Compute Intensity in MACs/read (0–2); 8 MAC Lanes with 1 Read/MAC]
Where will the previous slide’s code be? Compute intensity = 1/2
How can we change the compute intensity? Code changes, e.g., different splitting,
loop inversions?
11. L08-11
FC – Splitting/Reordered
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m0 in [0, L):
o[m1*L+m0] += i[chw] * f[CHW*(m1*L+m0) + chw]
L == Lanes
Partition “m”
Order: m1, chw, m0 Factors: M/L, C*H*W, L
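The same split, as a runnable Python sketch (function name is mine): with the output dimension spread across lanes, every i[chw] read is shared by all L lanes, which is what hoisting it out of the lane loop exploits.

```python
def fc_split_m(i, f, M, CHW, L):
    """FC layer with the output (m) dimension split across L lanes."""
    assert M % L == 0
    o = [0] * M
    for m1 in range(M // L):                # Level 2 loops
        for chw in range(CHW):
            for m0 in range(L):             # Level 1: one vector op across outputs
                o[m1*L + m0] += i[chw] * f[CHW*(m1*L + m0) + chw]
    return o
```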
12. L08-12
FC – Reordered + loop invariant hoisted
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m1 in [0, L):
o[m2*L+m1] += i[chw] * f[CHW*(m2*L+m1) + chw]
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
parallel_for m1 in [0, L):
o[m2*L+m1] += i_chw * f[CHW*(m2*L+m1) + chw]
L == Lanes
13. L08-13
FC – Operation Counts
How many MACs? C*H*W*M
How many reads of “inputs”? C*H*W*M/L
How many reads of “weights”? C*H*W*M
How many writes of “outputs”? M
Measuring reads/writes in units of 32-bit integers
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
parallel_for m0 in [0, L):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
L == Lanes
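The counts above can be checked with an instrumented sketch of the hoisted loop nest (counting accesses only; the function name is mine):

```python
def fc_hoisted_counts(M, CHW, L):
    """Count scalar reads and MACs in the hoisted FC loop nest."""
    input_reads = weight_reads = macs = 0
    for m1 in range(M // L):
        for chw in range(CHW):
            input_reads += 1                # i_chw = i[chw]: one read shared by L lanes
            for m0 in range(L):
                weight_reads += 1           # f[CHW*(m1*L+m0) + chw]
                macs += 1
    return macs, input_reads, weight_reads
```

For M=8, C*H*W=16, L=4 this gives 128 MACs, 32 input reads (= C*H*W*M/L) and 128 weight reads (= C*H*W*M), matching the slide.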
14. L08-14
Compute Intensity
MACs/Read = C*H*W*M / (C*H*W*M + C*H*W*M/L) ≈ L/(1+L)
If the system can support 1 Read/MAC, will it run at full throttle? No
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
parallel_for m0 in [0, L):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
15. L08-15
Roofline Model
[Roofline plot: MACs/Cycle (0–10) vs. Compute Intensity in MACs/read (0–2); 1 MAC/Read, 8 lanes]
Where will the previous slide’s code be? Compute intensity = L/(1+L) = 8/9
Why might points be below the line? Other overheads (e.g. instructions, stalls)
Is being on the flat part always best? Not necessarily…
16. L08-16
Computation Transformations
• Goal: bitwise same result, but reduce the number of operations
• Focuses mostly on compute
17. L08-17
Gauss’s Multiplication Algorithm
4 multiplications + 3 additions
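The slide's worked example is a figure that did not survive the export; the standard form of Gauss's trick computes a complex product with 3 real multiplies instead of 4, at the cost of extra additions. A sketch with my own naming:

```python
def gauss_complex_mult(a, b, c, d):
    """(a + bj) * (c + dj) using 3 multiplies: k1 = c(a+b), k2 = a(d-c), k3 = b(c+d)."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return (k1 - k3, k1 + k2)               # (real part, imaginary part)
```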
18. L08-18
Strassen
P1 = a(f – h)
P2 = (a + b)h
P3 = (c + d)e
P4 = d(g – e)
P5 = (a + d)(e + h)
P6 = (b - d)(g + h)
P7 = (a – c)(e + f)
8 multiplications + 4 additions
7 multiplications + 18 additions
7 multiplications + 13 additions (for constant B matrix – weights)
[Cong et al., ICANN, 2014]
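The products P1..P7 above combine into the 2x2 result via the standard Strassen recombination (the recombination formulas are not on the slide); a one-level sketch:

```python
def strassen_2x2(A, B):
    """One level of Strassen on 2x2 matrices [[a,b],[c,d]] x [[e,f],[g,h]]:
    7 multiplies (the slide's P1..P7) instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = a * (f - h)
    p2 = (a + b) * h
    p3 = (c + d) * e
    p4 = d * (g - e)
    p5 = (a + d) * (e + h)
    p6 = (b - d) * (g + h)
    p7 = (a - c) * (e + f)
    return [[p5 + p4 - p2 + p6, p1 + p2],
            [p3 + p4,           p1 + p5 - p3 - p7]]
```

Applied recursively to matrix blocks, this is what yields the Θ(N^2.807) complexity on the next slide.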
19. L08-19
Strassen
Comes at the price of reduced numerical stability and
requires significantly more memory
[Chart: operation-count complexity vs. matrix size N, naïve vs. Strassen]
Image Source: http://www.stoimen.com/blog/2012/11/26/computer-algorithms-strassens-matrix-multiplication/
• Reduce the complexity of matrix multiplication from Θ(N^3) to Θ(N^2.807) by reducing multiplications
20. L08-20
Python to C++ Chart
[Leiserson, There’s plenty of room at the top, Science, 2020]
21. L08-21
Tensor Computations
Matrix Multiply
CONV Layer
22. L08-22
Convolution (CONV) Layer
[Figure: N input fmaps (C×H×W) are convolved with M filters (C×R×S) to produce N output fmaps (M×P×Q); batch size N]
23. L08-23
CONV Layer Implementation
Naïve 7-layer for-loop implementation:
for n in [0..N):
  for m in [0..M):
    for q in [0..Q):
      for p in [0..P):
        O[n][m][p][q] = B[m];
        for c in [0..C):
          for r in [0..R):
            for s in [0..S):
              O[n][m][p][q] += I[n][c][Up+r][Uq+s] × F[m][c][r][s];
        O[n][m][p][q] = Activation(O[n][m][p][q]);
For each output fmap value: convolve a window and apply activation
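A runnable version of the 7-loop nest (stride U defaults to 1; the activation is assumed to be ReLU here, which the slide leaves abstract):

```python
def conv_layer(I, F, B, N, M, C, H, W, R, S, U=1):
    """Naive CONV layer: I[n][c][h][w] inputs, F[m][c][r][s] filters, B[m] biases."""
    P = (H - R) // U + 1
    Q = (W - S) // U + 1
    O = [[[[0.0] * Q for _ in range(P)] for _ in range(M)] for _ in range(N)]
    for n in range(N):
        for m in range(M):
            for p in range(P):
                for q in range(Q):
                    acc = B[m]
                    for c in range(C):                  # convolve a window
                        for r in range(R):
                            for s in range(S):
                                acc += I[n][c][U*p + r][U*q + s] * F[m][c][r][s]
                    O[n][m][p][q] = max(acc, 0.0)       # activation (ReLU assumed)
    return O
```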
24. L08-24
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
6 multiplications + 4 additions
[Figure: F(2,3) — a 3-tap filter applied to 4 inputs, producing 2 outputs]
25. L08-25
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
[Figure: F(2,3) — a 3-tap filter applied to 4 inputs, producing 2 outputs]
4 multiplications + 12 additions + 2 shifts
4 multiplications + 8 additions (for constant weights)
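The F(2,3) formulas themselves did not survive the export; the sketch below uses the standard minimal-filtering form from Lavin & Gray. The two divisions by 2 correspond to the slide's shifts, and both sit on the filter side, so with constant weights those terms are precomputed (hence the 4-multiply, 8-addition count):

```python
def winograd_f23(d, g):
    """F(2,3): two outputs of a 3-tap convolution from four inputs, 4 multiplies.
    d = (d0, d1, d2, d3) inputs, g = (g0, g1, g2) filter taps."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2     # filter factor precomputable
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2     # filter factor precomputable
    m4 = (d1 - d3) * g2
    return (m1 + m2 + m3, m2 - m3 - m4)     # (y0, y1)
```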
33. L08-33
Winograd Summary
• Winograd is an optimized computation for
convolutions
• It can significantly reduce multiplies
– For example, by 2.25X for a 3x3 filter
• But, each filter size (and output size) is a
different computation.
34. L08-34
Winograd as a Transform
Transform inputs
Element-wise
Multiplication
Transform output
[Lavin, ArXiv 2015]
Note: G f G^T (the filter transform) can be precomputed
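In 1-D matrix form, the flow on this slide is Y = A^T [ (G g) ⊙ (B^T d) ]; the matrices in the sketch below are the standard F(2,3) ones from Lavin, and G g is exactly the precomputable filter transform:

```python
def winograd_f23_transform(d, g):
    """F(2,3) as transforms: filter transform G g, input transform B^T d,
    element-wise multiply (4 multiplies), then output transform A^T."""
    def matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]
    G  = [[1, 0, 0], [0.5, 0.5, 0.5], [0.5, -0.5, 0.5], [0, 0, 1]]
    Bt = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]
    At = [[1, 1, 1, 0], [0, 1, -1, -1]]
    U = matvec(G, g)                        # precomputable for constant weights
    V = matvec(Bt, d)                       # transform inputs
    return matvec(At, [u * v for u, v in zip(U, V)])   # transform output
```

The 2-D version used for CONV layers nests the same transforms on both sides of the tile.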
35. L08-35
Fast Fourier Transform (FFT) Flow
[Figure: filter (R×S weights) * input fmap (H×W) = output fmap (E×F), one output activation per position. Flow: FFT(W) × FFT(I) = FFT(O), then an IFFT recovers the output]
36. L08-36
FFT Overview
• Convert filter and input to the frequency domain to make convolution a simple multiply, then convert back to the space domain
• Convert direct convolution’s O(No^2 Nf^2) computation to O(No^2 log2 No)
• Note that the computational benefit of FFT decreases with decreasing filter size
[Mathieu, ArXiv 2013], [Vasilache, ArXiv 2014]
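The identity behind the flow can be demonstrated end to end; the sketch below (my naming) uses a naive O(n^2) DFT in place of a real FFT to stay dependency-free:

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (an FFT stand-in for illustration)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
               for k in range(n)) for j in range(n)]
    return [v / n for v in out] if inverse else out

def fft_conv(i, w):
    """Linear convolution via FFT(I) * FFT(W) = FFT(O), after zero-padding
    both operands to length len(i) + len(w) - 1 (the '0-completion' below)."""
    n = len(i) + len(w) - 1
    I = dft(list(i) + [0] * (n - len(i)))
    W = dft(list(w) + [0] * (n - len(w)))
    O = dft([a * b for a, b in zip(I, W)], inverse=True)
    return [round(v.real, 9) for v in O]
```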
37. L08-37
FFT Costs
• Input and Filter matrices are ‘0-completed’, i.e., expanded to size (P+R-1) × (Q+S-1)
• Frequency domain matrices are same dimensions as
input, but complex.
• FFT often reduces computation, but requires much more
memory space and bandwidth
38. L08-38
Optimization opportunities
• FFT of real matrix is symmetric allowing one to save ½ the computes
• Filters can be pre-computed and stored, but convolutional filter in
frequency domain is much larger than in space domain
• Can reuse frequency domain version of input for creating different
output channels to avoid FFT re-computations
• Can accumulate across channels before performing the inverse transform to reduce the number of IFFTs
39. L08-39
cuDNN: Speed up with Transformations
Source: Nvidia
40. L08-40
UCNN – Convolution (Simplified)
Filters * Input Fmap = Output Fmap
Filters: [a b; c d] and [e f; g h]
Input Fmap:
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Output Fmap: [o00 o01; o10 o11]
o = a*i1 + b*i2 + c*i3 + d*i4 + e*i5 + f*i6 + g*i7 + h*i8
7 additions
8 multiplications
[Hegde, ISCA 2018]
41. L08-41
UCNN – Convolution (Simplified)
Filters * Input Fmap = Output Fmap
Filters: [a b; c d] and [b a; g h]
Input Fmap:
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Output Fmap: [o00 o01; o10 o11]
o = a*i1 + b*i2 + c*i3 + d*i4 + e*i5 + f*i6 + g*i7 + h*i8
7 additions
8 multiplications
[Hegde, ISCA 2018]
42. L08-42
UCNN – Convolution (Simplified)
Filters * Input Fmap = Output Fmap
Filters: [a b; c d] and [b a; g h]
Input Fmap:
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Output Fmap: [o00 o01; o10 o11]
o = a*i1 + b*i2 + c*i3 + d*i4 + b*i5 + a*i6 + g*i7 + h*i8
o = a*i1 + b*i2 + c*i3 + d*i4 + e*i5 + f*i6 + g*i7 + h*i8
7 additions
8 multiplications
[Hegde, ISCA 2018]
43. L08-43
UCNN – Convolution (Simplified)
Filters * Input Fmap = Output Fmap
Filters: [a b; c d] and [b a; g h]
Input Fmap:
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Output Fmap: [o00 o01; o10 o11]
o = a*i1 + b*i2 + c*i3 + d*i4 + b*i5 + a*i6 + g*i7 + h*i8
o = a*i1 + b*i2 + c*i3 + d*i4 + e*i5 + f*i6 + g*i7 + h*i8
o = a*(i1 + i6) + b*(i2 + i5) + c*i3 + d*i4 + g*i7 + h*i8
7 additions
8 multiplications
[Hegde, ISCA 2018]
44. L08-44
UCNN – Convolution (Simplified)
Filters * Input Fmap = Output Fmap
Filters: [a b; c d] and [b a; g h]
Input Fmap:
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Output Fmap: [o00 o01; o10 o11]
o = a*i1 + b*i2 + c*i3 + d*i4 + b*i5 + a*i6 + g*i7 + h*i8
o = a*i1 + b*i2 + c*i3 + d*i4 + e*i5 + f*i6 + g*i7 + h*i8
o = a*(i1 + i6) + b*(i2 + i5) + c*i3 + d*i4 + g*i7 + h*i8
Before: 8 multiplications + 7 additions; After: 6 multiplications + 6 additions
[Hegde, ISCA 2018]
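The factorization generalizes to any dot product whose weights repeat, which is the core of UCNN's grouping idea; a minimal sketch with my own naming:

```python
def dot_factorized(weights, inputs):
    """Sum the inputs that share a weight value first, then multiply once per
    distinct weight: multiplies drop from len(weights) to len(set(weights))."""
    groups = {}
    for w, x in zip(weights, inputs):
        groups[w] = groups.get(w, 0) + x    # input sum per distinct weight
    return sum(w * s for w, s in groups.items())
```

For the slide's weights (a, b, c, d, b, a, g, h) there are six distinct values, so eight multiplies become six.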
103. L08-103
Convolution (CONV) Layer
Filters (M × CRS) × Input fmaps (CRS × (H-R+1)(W-S+1)N) = Output fmaps (M × (H-R+1)(W-S+1)N)
• Dimensions of matrices for matrix multiply in
convolution layers with batch size N
N=2 in example
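The Toeplitz construction behind these dimensions can be sketched directly (the function name is mine; N=1 for brevity): unfold the input into a (C*R*S) × (H-R+1)(W-S+1) matrix, flatten the filters to M × (C*R*S), and multiply.

```python
def im2col_conv(I, F, C, H, W, M, R, S):
    """Convolution as matrix multiply (N=1): filters flatten to M x (C*R*S),
    the input unfolds into a (C*R*S) x (H-R+1)*(W-S+1) Toeplitz matrix."""
    P, Q = H - R + 1, W - S + 1
    toeplitz = [[I[c][p + r][q + s] for p in range(P) for q in range(Q)]
                for c in range(C) for r in range(R) for s in range(S)]
    filt = [[F[m][c][r][s] for c in range(C) for r in range(R) for s in range(S)]
            for m in range(M)]
    return [[sum(f * x for f, x in zip(frow, col)) for col in zip(*toeplitz)]
            for frow in filt]
```

Note the input expansion: overlapping windows are replicated across columns, which is the memory cost this transform trades for a plain matrix multiply.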
104. L08-104
1-D Toeplitz Convolution Einsum
Simplify to 1-D with N=1, C=1, M=1, U=1
Break into two steps