SlideShare a Scribd company logo
1 of 158
Download to read offline
L08-1
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Vectorized Kernel Computation
(continued)
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 1, 2023
L08-2
Sze and Emer
Sze and Emer
ISA Datatypes
FP32
FP16
Int32
Int16
Int8
S E M
1 8 23
S E M
1 5 10
M
31
S
S M
1
1 15
S M
1 7
Range Accuracy
10-38 – 1038 .000006%
6x10-5 - 6x104 .05%
0 – 2x109 ½
0 – 6x104 ½
0 – 127 ½
Image Source: B. Dally
March 1, 2023
L08-3
Sze and Emer
Sze and Emer
Intel – MMX/SSE/AVX
Width Int8 Int16 Int 32 Int64 FP16 FP32 FP64 Features
MMX 64 8 4 2 1
SSE 128 4
SSE2 128 16 8 4 2 4 2
SSE3 128 16 8 4 2 4 2 R
AVX 256 32 16 8 4 16 8 4
AVX2 256 32 16 8 4 16 8 4 GUMR
AVX3 512 64 32 16 8 ? 16 8 GUMRP
March 1, 2023
G: gather
U: unaligned
M: MAC
R: reductions/permutations
P: Predicate masks
Source: Myriad non-authoritative sources on web
L08-4
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Computational Transforms
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 1, 2023
L08-5
Sze and Emer
Sze and Emer
FC – Vector Operation Counts
March 1, 2023
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
Vector operation
on L lanes
Order: m, chw1, chw0 Factors: M, C*H*W/L, L
L08-6
Sze and Emer
Sze and Emer
FC – Vector Operation Counts
March 1, 2023
How many MACs?
How many reads of “inputs”
How many reads of “weights”
How many writes of “outputs”
Measuring reads/writes in
units of 32-bit integers
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
L08-7
Sze and Emer
Sze and Emer
Compute Intensity (MACs/Read)
March 1, 2023
MACs/Read? C∗H∗W∗M
C∗H∗W∗M C∗H∗W∗M
~
1
2
If system can support 1 Read/MAC
will system run at full throttle?
// Level 2 loops
for m in [0, M):
for chw2 in [0, C*H*W/L:
// Level 1 loops
parallel_for chw1 in [0, L):
o[m] += i[L*chw2+chw1] * f[C*H*W*m + L*chw2+chw1];
L == Lanes
L08-8
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC .
March 1, 2023
MACs/cycle limited by
number of lanes
Williams, Samuel, Andrew Waterman, and David Patterson. "Roofline: an insightful visual performance model for multicore architectures."
Communications of the ACM 52.4 (2009): 65-76.
L08-9
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC
March 1, 2023
Where will the previous slide’s code be?
MACs/cycle limited by
number of lanes
MACs/cycle limited
by operand delivery
L08-10
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC
March 1, 2023
Where will the previous slide’s code be?
o
How can we change the compute intensity?
MACs/cycle limited by
number of lanes
MACs/cycle limited
by operand delivery
L08-11
Sze and Emer
Sze and Emer
FC – Splitting/Reordered
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m0 in [0, L):
o[m1*L+m1] += i[chw] * f[CHW*(m1*L+m0) + chw]
L == Lanes
Partiion
“m”
Partition
“m”
Order: m1, chw, m0 Factors: M/L, C*H*W, L
L08-12
Sze and Emer
Sze and Emer
FC – Reordered + loop invariant hoisted
March 1, 2023
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m1 in [0, L):
o[m2*L+m1] += i[chw] * f[CHW*(m2*L+m1) + chw]
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m2*L+m1] += i_chw * f[CHW*(m2*L+m1) + chw]
L == Lanes
L == Lanes
L08-13
Sze and Emer
Sze and Emer
FC – Operation Counts
March 1, 2023
How many MACs?
How many reads of “inputs”
How many reads of “weights”
How many writes of “outputs”
Measuring reads/writes in units of 32-bit integers
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
L == Lanes
L08-14
Sze and Emer
Sze and Emer
Compute Intensity
March 1, 2023
MACs/Read? C∗H∗W∗M
C∗H∗W∗M C∗H∗W∗M/L
~
𝐿
1 + 𝐿
If system can support 1 Read/MAC
will system run at full throttle?
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
L08-15
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
1 MAC/Read - 8 lanes
March 1, 2023
Where will the previous slide’s code be?
Why might points be below the line?
x
o
Is being on the flat part always best?
L08-16
Sze and Emer
Sze and Emer
Computation Transformations
• Goal: Bitwise same result, but reduce
number of operations
• Focuses mostly on compute
March 1, 2023
L08-17
Sze and Emer
Sze and Emer
Gauss’s Multiplication Algorithm
4 multiplications + 3 additions
March 1, 2023
L08-18
Sze and Emer
Sze and Emer
Strassen
P1 = a(f – h)
P2 = (a + b)h
P3 = (c + d)e
P4 = d(g – e)
P5 = (a + d)(e + h)
P6 = (b - d)(g + h)
P7 = (a – c)(e + f)
8 multiplications + 4 additions
7 multiplications + 18 additions
7 multiplications + 13 additions (for constant B matrix – weights)
[Cong et al., ICANN, 2014]
March 1, 2023
L08-19
Sze and Emer
Sze and Emer
Strassen
Comes at the price of reduced numerical stability and
requires significantly more memory
N
Naïve
Strassen
Complexity
Image Source: http://www.stoimen.com/blog/2012/11/26/computer-algorithms-strassens-matrix-multiplication/
• Reduce the complexity of matrix multiplication
from Θ(N3) to Θ(N2.807) by reducing
multiplications
March 1, 2023
L08-20
Sze and Emer
Sze and Emer
Python to C++ Chart
March 1, 2023
[Leiserson, There’s plenty of room at the top, Science, 2020]
L08-21
Sze and Emer
Sze and Emer
Tensor Computations
March 1, 2023
Matrix Multiply
CONV Layer
L08-22
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
…
M
…
Many
Input fmaps (N) Many
Output fmaps (N)
…
R
S
R
S
C
C
filters
P
Q
H
C
H
W
C
P
1 1
N
N
W Q
March 1, 2023
Batch Size (N)
L08-23
Sze and Emer
Sze and Emer
CONV Layer Implementation
Naïve 7-layer for-loop implementation:
for n in [0..N):
for m in [0..M):
for q in [0..Q):
for p in [0..P):
O[n][m][p][q] = B[m];
for c in [0..C):
for r in [0..R):
for s in [0..S):
O[n][m][p][q] += I[n][c][Up+r][Uq+s]
× F[m][c][r][s];
O[n][m][p][q] = Activation(O[n][m][p][q]);
for each output fmap value
convolve
a
window
and
apply
activatio
n
March 1, 2023
L08-24
Sze and Emer
Sze and Emer
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
6 multiplications + 4 additions
filter
March 1, 2023
inputs outputs
F(2,3) =
L08-25
Sze and Emer
Sze and Emer
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
filter
March 1, 2023
inputs outputs
F(2,3) =
4 multiplications + 12 additions + 2 shifts
4 multiplications + 8 additions (for constant weights)
L08-26
Sze and Emer
Sze and Emer
Winograd 2D - F(2x2, 3x3)
• 1D Winograd is nested to make 2D Winograd
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Winograd: 16 multiplications  2.25 times reduction
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Original: 36 multiplications
Filter Input Fmap Output Fmap
* =
March 1, 2023
L08-27
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
Filter Input Fmap Output Fmap
March 1, 2023
L08-28
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
March 1, 2023
L08-29
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
March 1, 2023
L08-30
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
o02 o03
o12 o12
March 1, 2023
L08-31
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
o02 o03
o12 o12
Halo columns
March 1, 2023
L08-32
Sze and Emer
Sze and Emer
Winograd Performance Varies
Source: Nvidia
March 1, 2023
L08-33
Sze and Emer
Sze and Emer
Winograd Summary
• Winograd is an optimized computation for
convolutions
• It can significantly reduce multiplies
– For example, for 3x3 filter by 2.25X
• But, each filter size (and output size) is a
different computation.
March 1, 2023
L08-34
Sze and Emer
Sze and Emer
Winograd as a Transform
Transform inputs
Element-wise
Multiplication
Transform output
[Lavin, ArXiv 2015]
Note: GfGT can be precomputed
March 1, 2023
filter
input
L08-35
Sze and Emer
Sze and Emer
R
filter (weights)
S
Fast Fourier Transform (FFT) Flow
E
F
input fmap output fmap
H
W
an output
activation
* =
FFT(W)
F
F
T
FFT(I)
X = FFT(O)
F
F
T
I
F
F
T
March 1, 2023
L08-36
Sze and Emer
Sze and Emer
FFT Overview
• Convert filter and input to frequency domain to make convolution
a simple multiply then convert back to space domain.
• Convert direct convolution O(No
2Nf
2) computation to O(No
2log2No)
• Note that computational benefit of FFT decreases with decreasing
size of filter
March 1, 2023
[Mathieu, ArXiv 2013], [Vasilache, ArXiv 2014]
L08-37
Sze and Emer
Sze and Emer
FFT Costs
• Input and Filter matrices are ‘0-completed’,
– i.e., expanded to size P+R-1 x Q+S-1
• Frequency domain matrices are same dimensions as
input, but complex.
• FFT often reduces computation, but requires much more
memory space and bandwidth
March 1, 2023
L08-38
Sze and Emer
Sze and Emer
Optimization opportunities
• FFT of real matrix is symmetric allowing one to save ½ the computes
• Filters can be pre-computed and stored, but convolutional filter in
frequency domain is much larger than in space domain
• Can reuse frequency domain version of input for creating different
output channels to avoid FFT re-computations
• Can accumulate across channels before performing inverse
transform to reduce number of IFFT
March 1, 2023
L08-39
Sze and Emer
Sze and Emer
cuDNN: Speed up with Transformations
Source: Nvidia
March 1, 2023
L08-40
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
e f
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-41
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-42
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-43
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-44
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖
6 additions
6 multiplications
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-45
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
March 1, 2023
L08-46
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
March 1, 2023
L08-47
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Filtering steps
March 1, 2023
L08-48
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
March 1, 2023
L08-49
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
2D dot Product
March 1, 2023
L08-50
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
2D dot Product
March 1, 2023
L08-51
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
Filtering steps
2D dot Product
March 1, 2023
L08-52
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
Filtering steps
2D dot Product
March 1, 2023
L08-53
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
Filtering steps
2D dot Product
March 1, 2023
L08-54
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
Filtering steps
2D dot Product
March 1, 2023
L08-55
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Filtering steps
2D dot Product
March 1, 2023
L08-56
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-57
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-58
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-59
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-60
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-61
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-62
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-63
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-64
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
2
3
5
6
1 2 3 4 ● = 2
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-65
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
2
3
5
6
1 2 3 4 ● = 2
Flattened
Filtering steps
.
.
.
2D dot Product
1D dot
Product
March 1, 2023
L08-66
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
March 1, 2023
L08-67
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
March 1, 2023
L08-68
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1
March 1, 2023
L08-69
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1
March 1, 2023
L08-70
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-71
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
March 1, 2023
L08-72
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-73
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-74
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-75
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4
1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-76
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-77
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1
2
March 1, 2023
L08-78
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4
2
3
5
6
1
2
March 1, 2023
L08-79
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1
2
March 1, 2023
L08-80
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-81
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
1 2
March 1, 2023
L08-82
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
1 2 3
March 1, 2023
L08-83
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3
March 1, 2023
L08-84
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
March 1, 2023
L08-85
Sze and Emer
Sze and Emer
1 2 3 4
Convolution (CONV) Layer
1 2 3 4
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
March 1, 2023
L08-86
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Convert to matrix multiply using the Toeplitz Matrix
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2 3 4
1 2 4 5
2 3 5 6
4 5 7 8
5 6 8 9
1 2 3 4 × =
Convolution:
Matrix Multiply (by Toeplitz Matrix)
March 1, 2023
L08-87
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3 4
1 2 4 5
2 3 5 6
4 5 7 8
5 6 8 9
1 2 3 4 × =
Data is repeated
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Matrix Multiply (by Toeplitz Matrix)
March 1, 2023
L08-88
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels
1 2
3 4
Filter 1
Input Fmap
Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Output Fmap
Key:
Black: Input channel 1
Yellow: Input channel 2
C
Filter
C
March 1, 2023
L08-89
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels
1 2
3 4
Filter 1
Input Fmap
Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Output Fmap
Key:
Black: Input channel 1
Yellow: Input channel 2
C
Filter
C
3D Stacks…
March 1, 2023
L08-90
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Output Fmap
Filter
March 1, 2023
L08-91
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-92
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-93
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-94
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-95
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-96
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-97
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-98
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
1 2
3 4
Filter 2
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2
3 4
1 2 3
4 5 6
7 8 9
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-99
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
1 2
3 4
Filter 2
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2
3 4
1 2 3
4 5 6
7 8 9
1 2
3 4
Chnl 2
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-100
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-101
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4
Chnl 1 Chnl 2
Filter 1
Filter 2
Chnl 1
Chnl 2
Chnl 1
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-102
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 3
1 2 4
Chnl 1 Chnl 2
Filter 1
Filter 2
Chnl 1
Chnl 2
Chnl 1
Chnl 2
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-103
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
M
CRS
CRS
(H-R+1)(W-S+1)N
Filters Input fmaps
×
Output fmaps
M
=
(H-R+1)(W-S+1)N
• Dimensions of matrices for matrix multiply in
convolution layers with batch size N
March 1, 2023 N=2 in example
L08-104
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Simplify to 1-D with N=1, C=1, M=1, U=1
Break into two steps
L08-105
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 4 5 6
3 4 5
2 3 4
1 2 3
L08-106
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3
4 5 6
3 4 5
2 3 4
1 2 3
L08-107
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1 2 3
L08-108
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1 2 3
L08-109
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1
2
3
L08-110
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x =
4 5 6
3 4 5
2 3 4
1
2
3
L08-111
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2 3 4
1
2
3
L08-112
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2 3 4
1
2
3
L08-113
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-114
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-115
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-116
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3
4
5
2
3
4
1
2
3
L08-117
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3
=
4 5 6
3
4
5
2
3
4
1
2
3
L08-118
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-119
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-120
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-121
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-122
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-123
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
L08-124
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-125
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-126
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
L08-127
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
L08-128
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
L08-129
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
L08-130
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
L08-131
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
L08-132
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
L08-133
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
L08-134
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
5 6
4
5
L08-135
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-136
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-137
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
L08-138
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-139
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
q s q+s
0 0 0
L08-140
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
q s q+s
0 0 0
L08-141
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
q s q+s
0 0 0
0 1 1
L08-142
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
q s q+s
0 0 0
0 1 1
L08-143
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
q s q+s
0 0 0
0 1 1
0 2 2
L08-144
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
q s q+s
0 0 0
0 1 1
0 2 2
L08-145
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
L08-146
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
L08-147
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
L08-148
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
L08-149
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
L08-150
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
L08-151
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
L08-152
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
L08-153
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-154
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-155
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
5 6
4
5
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-156
Sze and Emer
Sze and Emer
2-D Toeplitz Convolution Einsum
March 1, 2023
L08-157
Sze and Emer
Sze and Emer
2-D Toeplitz Convolution Einsum
March 1, 2023
Break out Toeplitz conversion
Flatten ranks
L08-158
Sze and Emer
Sze and Emer
Next Lecture: GPUs
Thank you!
March 1, 2023

More Related Content

Similar to L08-handout.pdf

Cryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and PracticeCryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and Practice
TaunyaCoffman887
 
Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)
MargenePurnell14
 

Similar to L08-handout.pdf (20)

L07-handout.pdf
L07-handout.pdfL07-handout.pdf
L07-handout.pdf
 
L05.pdf
L05.pdfL05.pdf
L05.pdf
 
L13.pdf
L13.pdfL13.pdf
L13.pdf
 
Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP
 
Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...
 
Project Management Techniques
Project Management TechniquesProject Management Techniques
Project Management Techniques
 
RBootcam Day 2
RBootcam Day 2RBootcam Day 2
RBootcam Day 2
 
Cryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and PracticeCryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and Practice
 
Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)
 
Mat lab
Mat labMat lab
Mat lab
 
pert_n_cpm.ppt
pert_n_cpm.pptpert_n_cpm.ppt
pert_n_cpm.ppt
 
pert_n_cpm.ppt
pert_n_cpm.pptpert_n_cpm.ppt
pert_n_cpm.ppt
 
A new radix 4 fft algorithm
A new radix 4 fft algorithmA new radix 4 fft algorithm
A new radix 4 fft algorithm
 
Reed solomon explained v1 0
Reed solomon explained v1 0Reed solomon explained v1 0
Reed solomon explained v1 0
 
Introduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordIntroduction to Data Science With R Lab Record
Introduction to Data Science With R Lab Record
 
A flexible method to create wave file features
A flexible method to create wave file features A flexible method to create wave file features
A flexible method to create wave file features
 
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
High Speed Memory Efficient Multiplier-less 1-D 9/7 Wavelet Filters Based NED...
 
Introduction of MatLab
Introduction of MatLab Introduction of MatLab
Introduction of MatLab
 
Radix-3 Algorithm for Realization of Type-II Discrete Sine Transform
Radix-3 Algorithm for Realization of Type-II Discrete Sine TransformRadix-3 Algorithm for Realization of Type-II Discrete Sine Transform
Radix-3 Algorithm for Realization of Type-II Discrete Sine Transform
 
Radix-3 Algorithm for Realization of Type-II Discrete Sine Transform
Radix-3 Algorithm for Realization of Type-II Discrete Sine TransformRadix-3 Algorithm for Realization of Type-II Discrete Sine Transform
Radix-3 Algorithm for Realization of Type-II Discrete Sine Transform
 

More from TRNHONGLINHBCHCM (12)

L06-handout.pdf
L06-handout.pdfL06-handout.pdf
L06-handout.pdf
 
L09-handout.pdf
L09-handout.pdfL09-handout.pdf
L09-handout.pdf
 
L06.pdf
L06.pdfL06.pdf
L06.pdf
 
L03.pdf
L03.pdfL03.pdf
L03.pdf
 
L11.pdf
L11.pdfL11.pdf
L11.pdf
 
L09.pdf
L09.pdfL09.pdf
L09.pdf
 
L04.pdf
L04.pdfL04.pdf
L04.pdf
 
L12.pdf
L12.pdfL12.pdf
L12.pdf
 
L01.pdf
L01.pdfL01.pdf
L01.pdf
 
L14.pdf
L14.pdfL14.pdf
L14.pdf
 
Projects.pdf
Projects.pdfProjects.pdf
Projects.pdf
 
PaperReview.pdf
PaperReview.pdfPaperReview.pdf
PaperReview.pdf
 

Recently uploaded

"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
mphochane1998
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
Health
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 

Recently uploaded (20)

A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments""Lesotho Leaps Forward: A Chronicle of Transformative Developments"
"Lesotho Leaps Forward: A Chronicle of Transformative Developments"
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 

L08-handout.pdf

  • 1. L08-1 Sze and Emer 6.5930/1 Hardware Architectures for Deep Learning Vectorized Kernel Computation (continued) Joel Emer and Vivienne Sze Massachusetts Institute of Technology Electrical Engineering & Computer Science March 1, 2023
  • 2. L08-2 Sze and Emer Sze and Emer ISA Datatypes FP32 FP16 Int32 Int16 Int8 S E M 1 8 23 S E M 1 5 10 M 31 S S M 1 1 15 S M 1 7 Range Accuracy 10-38 – 1038 .000006% 6x10-5 - 6x104 .05% 0 – 2x109 ½ 0 – 6x104 ½ 0 – 127 ½ Image Source: B. Dally March 1, 2023
  • 3. L08-3 Sze and Emer Sze and Emer Intel – MMX/SSE/AVX Width Int8 Int16 Int 32 Int64 FP16 FP32 FP64 Features MMX 64 8 4 2 1 SSE 128 4 SSE2 128 16 8 4 2 4 2 SSE3 128 16 8 4 2 4 2 R AVX 256 32 16 8 4 16 8 4 AVX2 256 32 16 8 4 16 8 4 GUMR AVX3 512 64 32 16 8 ? 16 8 GUMRP March 1, 2023 G: gather U: unaligned M: MAC R: reductions/permutations P: Predicate masks Source: Myriad non-authoritative sources on web
  • 4. L08-4 Sze and Emer 6.5930/1 Hardware Architectures for Deep Learning Computational Transforms Joel Emer and Vivienne Sze Massachusetts Institute of Technology Electrical Engineering & Computer Science March 1, 2023
  • 5. L08-5 Sze and Emer Sze and Emer FC – Vector Operation Counts March 1, 2023 // Level 2 loops for m in [0, M): for chw1 in [0, C*H*W/L): // Level 1 loops parallel_for chw0 in [0, L): o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0]; L == Lanes Vector operation on L lanes Order: m, chw1, chw0 Factors: M, C*H*W/L, L
  • 6. L08-6 Sze and Emer Sze and Emer FC – Vector Operation Counts March 1, 2023 How many MACs? How many reads of “inputs” How many reads of “weights” How many writes of “outputs” Measuring reads/writes in units of 32-bit integers // Level 2 loops for m in [0, M): for chw1 in [0, C*H*W/L): // Level 1 loops parallel_for chw0 in [0, L): o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0]; L == Lanes
  • 7. L08-7 Sze and Emer Sze and Emer Compute Intensity (MACs/Read) March 1, 2023 MACs/Read? C∗H∗W∗M C∗H∗W∗M C∗H∗W∗M ~ 1 2 If system can support 1 Read/MAC will system run at full throttle? // Level 2 loops for m in [0, M): for chw2 in [0, C*H*W/L: // Level 1 loops parallel_for chw1 in [0, L): o[m] += i[L*chw2+chw1] * f[C*H*W*m + L*chw2+chw1]; L == Lanes
  • 8. L08-8 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC . March 1, 2023 MACs/cycle limited by number of lanes Williams, Samuel, Andrew Waterman, and David Patterson. "Roofline: an insightful visual performance model for multicore architectures." Communications of the ACM 52.4 (2009): 65-76.
  • 9. L08-9 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC March 1, 2023 Where will the previous slide’s code be? MACs/cycle limited by number of lanes MACs/cycle limited by operand delivery
  • 10. L08-10 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC March 1, 2023 Where will the previous slide’s code be? o How can we change the compute intensity? MACs/cycle limited by number of lanes MACs/cycle limited by operand delivery
  • 11. L08-11 Sze and Emer Sze and Emer FC – Splitting/Reordered // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): // Level 1 loops parallel_for m0 in [0, L): o[m1*L+m1] += i[chw] * f[CHW*(m1*L+m0) + chw] L == Lanes Partiion “m” Partition “m” Order: m1, chw, m0 Factors: M/L, C*H*W, L
  • 12. L08-12 Sze and Emer Sze and Emer FC – Reordered + loop invariant hoisted March 1, 2023 // Level 2 loops for m2 in [0, M/L): for chw in [0, C*H*W): // Level 1 loops parallel_for m1 in [0, L): o[m2*L+m1] += i[chw] * f[CHW*(m2*L+m1) + chw] // Level 2 loops for m2 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m2*L+m1] += i_chw * f[CHW*(m2*L+m1) + chw] L == Lanes L == Lanes
  • 13. L08-13 Sze and Emer Sze and Emer FC – Operation Counts March 1, 2023 How many MACs? How many reads of “inputs” How many reads of “weights” How many writes of “outputs” Measuring reads/writes in units of 32-bit integers // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw] L == Lanes
  • 14. L08-14 Sze and Emer Sze and Emer Compute Intensity March 1, 2023 MACs/Read? C∗H∗W∗M C∗H∗W∗M C∗H∗W∗M/L ~ 𝐿 1 + 𝐿 If system can support 1 Read/MAC will system run at full throttle? // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
  • 15. L08-15 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 1 MAC/Read - 8 lanes March 1, 2023 Where will the previous slide’s code be? Why might points be below the line? x o Is being on the flat part always best?
  • 16. L08-16 Sze and Emer Sze and Emer Computation Transformations • Goal: Bitwise same result, but reduce number of operations • Focuses mostly on compute March 1, 2023
  • 17. L08-17 Sze and Emer Sze and Emer Gauss’s Multiplication Algorithm 4 multiplications + 3 additions March 1, 2023
  • 18. L08-18 Sze and Emer Sze and Emer Strassen P1 = a(f – h) P2 = (a + b)h P3 = (c + d)e P4 = d(g – e) P5 = (a + d)(e + h) P6 = (b - d)(g + h) P7 = (a – c)(e + f) 8 multiplications + 4 additions 7 multiplications + 18 additions 7 multiplications + 13 additions (for constant B matrix – weights) [Cong et al., ICANN, 2014] March 1, 2023
  • 19. L08-19 Sze and Emer Sze and Emer Strassen Comes at the price of reduced numerical stability and requires significantly more memory N Naïve Strassen Complexity Image Source: http://www.stoimen.com/blog/2012/11/26/computer-algorithms-strassens-matrix-multiplication/ • Reduce the complexity of matrix multiplication from Θ(N3) to Θ(N2.807) by reducing multiplications March 1, 2023
  • 20. L08-20 Sze and Emer Sze and Emer Python to C++ Chart March 1, 2023 [Leiserson, There’s plenty of room at the top, Science, 2020]
  • 21. L08-21 Sze and Emer Sze and Emer Tensor Computations March 1, 2023 Matrix Multiply CONV Layer
  • 22. L08-22 Sze and Emer Sze and Emer Convolution (CONV) Layer … M … Many Input fmaps (N) Many Output fmaps (N) … R S R S C C filters P Q H C H W C P 1 1 N N W Q March 1, 2023 Batch Size (N)
  • 23. L08-23 Sze and Emer Sze and Emer CONV Layer Implementation Naïve 7-layer for-loop implementation: for n in [0..N): for m in [0..M): for q in [0..Q): for p in [0..P): O[n][m][p][q] = B[m]; for c in [0..C): for r in [0..R): for s in [0..S): O[n][m][p][q] += I[n][c][Up+r][Uq+s] × F[m][c][r][s]; O[n][m][p][q] = Activation(O[n][m][p][q]); for each output fmap value convolve a window and apply activatio n March 1, 2023
  • 24. L08-24 Sze and Emer Sze and Emer Winograd 1D – F(2,3) • Targeting convolutions instead of matrix multiply • Notation: F(size of output, filter size) [Lavin et al., CVPR 2016] 6 multiplications + 4 additions filter March 1, 2023 inputs outputs F(2,3) =
  • 25. L08-25 Sze and Emer Sze and Emer Winograd 1D – F(2,3) • Targeting convolutions instead of matrix multiply • Notation: F(size of output, filter size) [Lavin et al., CVPR 2016] filter March 1, 2023 inputs outputs F(2,3) = 4 multiplications + 12 additions + 2 shifts 4 multiplications + 8 additions (for constant weights)
  • 26. L08-26 Sze and Emer Sze and Emer Winograd 2D - F(2x2, 3x3) • 1D Winograd is nested to make 2D Winograd i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 Winograd: 16 multiplications  2.25 times reduction f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Original: 36 multiplications Filter Input Fmap Output Fmap * = March 1, 2023
  • 27. L08-27 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 Filter Input Fmap Output Fmap March 1, 2023
  • 28. L08-28 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap March 1, 2023
  • 29. L08-29 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap March 1, 2023
  • 30. L08-30 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap o02 o03 o12 o12 March 1, 2023
  • 31. L08-31 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap o02 o03 o12 o12 Halo columns March 1, 2023
  • 32. L08-32 Sze and Emer Sze and Emer Winograd Performance Varies Source: Nvidia March 1, 2023
  • 33. L08-33 Sze and Emer Sze and Emer Winograd Summary • Winograd is an optimized computation for convolutions • It can significantly reduce multiplies – For example, for 3x3 filter by 2.25X • But, each filter size (and output size) is a different computation. March 1, 2023
  • 34. L08-34 Sze and Emer Sze and Emer Winograd as a Transform Transform inputs Element-wise Multiplication Transform output [Lavin, ArXiv 2015] Note: GfGT can be precomputed March 1, 2023 filter input
  • 35. L08-35 Sze and Emer Sze and Emer R filter (weights) S Fast Fourier Transform (FFT) Flow E F input fmap output fmap H W an output activation * = FFT(W) F F T FFT(I) X = FFT(O) F F T I F F T March 1, 2023
  • 36. L08-36 Sze and Emer Sze and Emer FFT Overview • Convert filter and input to frequency domain to make convolution a simple multiply then convert back to space domain. • Convert direct convolution O(No 2Nf 2) computation to O(No 2log2No) • Note that computational benefit of FFT decreases with decreasing size of filter March 1, 2023 [Mathieu, ArXiv 2013], [Vasilache, ArXiv 2014]
  • 37. L08-37 Sze and Emer Sze and Emer FFT Costs • Input and Filter matrices are ‘0-completed’, – i.e., expanded to size P+R-1 x Q+S-1 • Frequency domain matrices are same dimensions as input, but complex. • FFT often reduces computation, but requires much more memory space and bandwidth March 1, 2023
  • 38. L08-38 Sze and Emer Sze and Emer Optimization opportunities • FFT of real matrix is symmetric allowing one to save ½ the computes • Filters can be pre-computed and stored, but convolutional filter in frequency domain is much larger than in space domain • Can reuse frequency domain version of input for creating different output channels to avoid FFT re-computations • Can accumulate across channels before performing inverse transform to reduce number of IFFT March 1, 2023
  • 39. L08-39 Sze and Emer Sze and Emer cuDNN: Speed up with Transformations Source: Nvidia March 1, 2023
  • 40. L08-40 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = e f g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 41. L08-41 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 42. L08-42 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 43. L08-43 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 44. L08-44 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖 6 additions 6 multiplications 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 45. L08-45 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 March 1, 2023
  • 46. L08-46 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 March 1, 2023
  • 47. L08-47 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Filtering steps March 1, 2023
  • 48. L08-48 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps March 1, 2023
  • 49. L08-49 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps 2D dot Product March 1, 2023
  • 50. L08-50 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps 2D dot Product March 1, 2023
  • 51. L08-51 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Filtering steps 2D dot Product March 1, 2023
  • 52. L08-52 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Filtering steps 2D dot Product March 1, 2023
  • 53. L08-53 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 Filtering steps 2D dot Product March 1, 2023
  • 54. L08-54 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 Filtering steps 2D dot Product March 1, 2023
  • 55. L08-55 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Filtering steps 2D dot Product March 1, 2023
  • 56. L08-56 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 57. L08-57 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 58. L08-58 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 59. L08-59 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 60. L08-60 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product March 1, 2023
  • 61. L08-61 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 62. L08-62 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 63. L08-63 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 64. L08-64 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 65. L08-65 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Flattened Filtering steps . . . 2D dot Product 1D dot Product March 1, 2023
  • 66. L08-66 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened March 1, 2023
  • 67. L08-67 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened March 1, 2023
  • 68. L08-68 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 March 1, 2023
  • 69. L08-69 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 March 1, 2023
  • 70. L08-70 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 71. L08-71 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: March 1, 2023
  • 72. L08-72 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 73. L08-73 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 74. L08-74 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 75. L08-75 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 76. L08-76 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 77. L08-77 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 78. L08-78 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 79. L08-79 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 80. L08-80 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 81. L08-81 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 1 2 March 1, 2023
  • 82. L08-82 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 1 2 3 March 1, 2023
  • 83. L08-83 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 March 1, 2023
  • 84. L08-84 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 March 1, 2023
  • 85. L08-85 Sze and Emer Sze and Emer 1 2 3 4 Convolution (CONV) Layer 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 March 1, 2023
  • 86. L08-86 Sze and Emer Sze and Emer Convolution (CONV) Layer Convert to matrix multiply using the Toeplitz Matrix 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × = Convolution: Matrix Multiply (by Toeplitz Matrix) March 1, 2023
  • 87. L08-87 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × = Data is repeated 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Matrix Multiply (by Toeplitz Matrix) March 1, 2023
  • 88. L08-88 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels 1 2 3 4 Filter 1 Input Fmap Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Output Fmap Key: Black: Input channel 1 Yellow: Input channel 2 C Filter C March 1, 2023
  • 89. L08-89 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels 1 2 3 4 Filter 1 Input Fmap Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Output Fmap Key: Black: Input channel 1 Yellow: Input channel 2 C Filter C 3D Stacks… March 1, 2023
  • 90. L08-90 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Output Fmap Filter March 1, 2023
  • 91. L08-91 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 92. L08-92 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 93. L08-93 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 94. L08-94 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 95. L08-95 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 96. L08-96 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 97. L08-97 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 98. L08-98 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 1 2 3 4 Filter 2 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 9 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 99. L08-99 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 1 2 3 4 Filter 2 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 9 1 2 3 4 Chnl 2 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 100. L08-100 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 101. L08-101 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Chnl 1 Chnl 2 Filter 1 Filter 2 Chnl 1 Chnl 2 Chnl 1 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 102. L08-102 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 3 1 2 4 Chnl 1 Chnl 2 Filter 1 Filter 2 Chnl 1 Chnl 2 Chnl 1 Chnl 2 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 103. L08-103 Sze and Emer Sze and Emer Convolution (CONV) Layer M CRS CRS (H-R+1)(W-S+1)N Filters Input fmaps × Output fmaps M = (H-R+1)(W-S+1)N • Dimensions of matrices for matrix multiply in convolution layers with batch size N March 1, 2023 N=2 in example
  • 104. L08-104 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Simplify to 1-D with N=1, C=1, M=1, U=1 Break into two steps
  • 105. L08-105 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 5 6 3 4 5 2 3 4 1 2 3
  • 106. L08-106 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 5 6 3 4 5 2 3 4 1 2 3
  • 107. L08-107 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 108. L08-108 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 109. L08-109 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 110. L08-110 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x = 4 5 6 3 4 5 2 3 4 1 2 3
  • 111. L08-111 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 112. L08-112 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 113. L08-113 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 114. L08-114 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 115. L08-115 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 116. L08-116 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 117. L08-117 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 118. L08-118 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 119. L08-119 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 120. L08-120 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 121. L08-121 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 122. L08-122 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 123. L08-123 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5
  • 124. L08-124 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 125. L08-125 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 126. L08-126 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1
  • 127. L08-127 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2
  • 128. L08-128 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3
  • 129. L08-129 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2
  • 130. L08-130 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3
  • 131. L08-131 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3
  • 132. L08-132 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3
  • 133. L08-133 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4
  • 134. L08-134 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 5 6 4 5
  • 135. L08-135 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 136. L08-136 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 137. L08-137 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5
  • 138. L08-138 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 139. L08-139 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 q s q+s 0 0 0
  • 140. L08-140 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 q s q+s 0 0 0
  • 141. L08-141 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 q s q+s 0 0 0 0 1 1
  • 142. L08-142 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 q s q+s 0 0 0 0 1 1
  • 143. L08-143 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 q s q+s 0 0 0 0 1 1 0 2 2
  • 144. L08-144 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2
  • 145. L08-145 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1
  • 146. L08-146 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1
  • 147. L08-147 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2
  • 148. L08-148 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2
  • 149. L08-149 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3
  • 150. L08-150 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3
  • 151. L08-151 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2
  • 152. L08-152 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2
  • 153. L08-153 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 154. L08-154 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 155. L08-155 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 5 6 4 5 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 156. L08-156 Sze and Emer Sze and Emer 2-D Toeplitz Convolution Einsum March 1, 2023
  • 157. L08-157 Sze and Emer Sze and Emer 2-D Toeplitz Convolution Einsum March 1, 2023 Break out Toeplitz conversion Flatten ranks
  • 158. L08-158 Sze and Emer Sze and Emer Next Lecture: GPUs Thank you! March 1, 2023