SlideShare a Scribd company logo
1 of 158
Download to read offline
L08-1
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Vectorized Kernel Computation
(continued)
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 1, 2023
L08-2
Sze and Emer
Sze and Emer
ISA Datatypes
FP32
FP16
Int32
Int16
Int8
S E M
1 8 23
S E M
1 5 10
M
31
S
S M
1
1 15
S M
1 7
Range Accuracy
10-38 – 1038 .000006%
6x10-5 - 6x104 .05%
0 – 2x109 ½
0 – 6x104 ½
0 – 127 ½
Image Source: B. Dally
March 1, 2023
L08-3
Sze and Emer
Sze and Emer
Intel – MMX/SSE/AVX
Width Int8 Int16 Int 32 Int64 FP16 FP32 FP64 Features
MMX 64 8 4 2 1
SSE 128 4
SSE2 128 16 8 4 2 4 2
SSE3 128 16 8 4 2 4 2 R
AVX 256 32 16 8 4 16 8 4
AVX2 256 32 16 8 4 16 8 4 GUMR
AVX3 512 64 32 16 8 ? 16 8 GUMRP
March 1, 2023
G: gather
U: unaligned
M: MAC
R: reductions/permutations
P: Predicate masks
Source: Myriad non-authoritative sources on web
L08-4
Sze and Emer
6.5930/1
Hardware Architectures for Deep Learning
Computational Transforms
Joel Emer and Vivienne Sze
Massachusetts Institute of Technology
Electrical Engineering & Computer Science
March 1, 2023
L08-5
Sze and Emer
Sze and Emer
FC – Vector Operation Counts
March 1, 2023
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
Vector operation
on L lanes
Order: m, chw1, chw0 Factors: M, C*H*W/L, L
L08-6
Sze and Emer
Sze and Emer
FC – Vector Operation Counts
March 1, 2023
How many MACs? M *(C*H*W/L) * L = M*C*H*W
How many reads of “inputs” C*H*W*M
How many reads of “weights” C*H*W*M
How many writes of “outputs” M
Measuring reads/writes in
units of 32-bit integers
// Level 2 loops
for m in [0, M):
for chw1 in [0, C*H*W/L):
// Level 1 loops
parallel_for chw0 in [0, L):
o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0];
L == Lanes
L08-7
Sze and Emer
Sze and Emer
Compute Intensity (MACs/Read)
March 1, 2023
MACs/Read? C∗H∗W∗M
C∗H∗W∗M C∗H∗W∗M
~
1
2
If system can support 1 Read/MAC
will system run at full throttle?
No
// Level 2 loops
for m in [0, M):
for chw2 in [0, C*H*W/L:
// Level 1 loops
parallel_for chw1 in [0, L):
o[m] += i[L*chw2+chw1] * f[C*H*W*m + L*chw2+chw1];
L == Lanes
L08-8
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC .
March 1, 2023
MACs/cycle limited by
number of lanes
Williams, Samuel, Andrew Waterman, and David Patterson. "Roofline: an insightful visual performance model for multicore architectures."
Communications of the ACM 52.4 (2009): 65-76.
L08-9
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC
March 1, 2023
Where will the previous slide’s code be? Compute intensity = 1/2
MACs/cycle limited by
number of lanes
MACs/cycle limited
by operand delivery
L08-10
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
8 MAC Lanes with 1 Read/MAC
March 1, 2023
Where will the previous slide’s code be? Compute intensity = 1/2
o
How can we change the compute intensity? Code changes, e.g., different splitting,
loop inversions?
MACs/cycle limited by
number of lanes
MACs/cycle limited
by operand delivery
L08-11
Sze and Emer
Sze and Emer
FC – Splitting/Reordered
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m0 in [0, L):
o[m1*L+m1] += i[chw] * f[CHW*(m1*L+m0) + chw]
L == Lanes
Partiion
“m”
Partition
“m”
Order: m1, chw, m0 Factors: M/L, C*H*W, L
L08-12
Sze and Emer
Sze and Emer
FC – Reordered + loop invariant hoisted
March 1, 2023
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
// Level 1 loops
parallel_for m1 in [0, L):
o[m2*L+m1] += i[chw] * f[CHW*(m2*L+m1) + chw]
// Level 2 loops
for m2 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m2*L+m1] += i_chw * f[CHW*(m2*L+m1) + chw]
L == Lanes
L == Lanes
L08-13
Sze and Emer
Sze and Emer
FC – Operation Counts
March 1, 2023
How many MACs? C*H*W*M
How many reads of “inputs” C*H*W*M/L
How many reads of “weights” C*H*W*M
How many writes of “outputs” M
Measuring reads/writes in units of 32-bit integers
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
L == Lanes
L08-14
Sze and Emer
Sze and Emer
Compute Intensity
March 1, 2023
MACs/Read? C∗H∗W∗M
C∗H∗W∗M C∗H∗W∗M/L
~
𝐿
1 + 𝐿
If system can support 1 Read/MAC
will system run at full throttle?
No
// Level 2 loops
for m1 in [0, M/L):
for chw in [0, C*H*W):
i_chw = i[chw]
// Level 1 loops
for chw in [0, C*H*W):
o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
L08-15
Sze and Emer
Sze and Emer
Roofline Model
0
2
4
6
8
10
0 0.25 0.5 0.75 1 1.25 1.5 1.75 2
MACs/Cycle
Compute Intensity (MACs/read)
1 MAC/Read - 8 lanes
March 1, 2023
Where will the previous slide’s code be? Compute intensity = L/(1+L) = 8/9
Why might points be below the line? Other overheads (e.g. instructions, stalls)
x
o
Is being on the flat part always best? Not necessarily…
L08-16
Sze and Emer
Sze and Emer
Computation Transformations
• Goal: Bitwise same result, but reduce
number of operations
• Focuses mostly on compute
March 1, 2023
L08-17
Sze and Emer
Sze and Emer
Gauss’s Multiplication Algorithm
4 multiplications + 3 additions
March 1, 2023
L08-18
Sze and Emer
Sze and Emer
Strassen
P1 = a(f – h)
P2 = (a + b)h
P3 = (c + d)e
P4 = d(g – e)
P5 = (a + d)(e + h)
P6 = (b - d)(g + h)
P7 = (a – c)(e + f)
8 multiplications + 4 additions
7 multiplications + 18 additions
7 multiplications + 13 additions (for constant B matrix – weights)
[Cong et al., ICANN, 2014]
March 1, 2023
L08-19
Sze and Emer
Sze and Emer
Strassen
Comes at the price of reduced numerical stability and
requires significantly more memory
N
Naïve
Strassen
Complexity
Image Source: http://www.stoimen.com/blog/2012/11/26/computer-algorithms-strassens-matrix-multiplication/
• Reduce the complexity of matrix multiplication
from Θ(N3) to Θ(N2.807) by reducing
multiplications
March 1, 2023
L08-20
Sze and Emer
Sze and Emer
Python to C++ Chart
March 1, 2023
[Leiserson, There’s plenty of room at the top, Science, 2020]
L08-21
Sze and Emer
Sze and Emer
Tensor Computations
March 1, 2023
Matrix Multiply
CONV Layer
L08-22
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
…
M
…
Many
Input fmaps (N) Many
Output fmaps (N)
…
R
S
R
S
C
C
filters
P
Q
H
C
H
W
C
P
1 1
N
N
W Q
March 1, 2023
Batch Size (N)
L08-23
Sze and Emer
Sze and Emer
CONV Layer Implementation
Naïve 7-layer for-loop implementation:
for n in [0..N):
for m in [0..M):
for q in [0..Q):
for p in [0..P):
O[n][m][p][q] = B[m];
for c in [0..C):
for r in [0..R):
for s in [0..S):
O[n][m][p][q] += I[n][c][Up+r][Uq+s]
× F[m][c][r][s];
O[n][m][p][q] = Activation(O[n][m][p][q]);
for each output fmap value
convolve
a
window
and
apply
activatio
n
March 1, 2023
L08-24
Sze and Emer
Sze and Emer
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
6 multiplications + 4 additions
filter
March 1, 2023
inputs outputs
F(2,3) =
L08-25
Sze and Emer
Sze and Emer
Winograd 1D – F(2,3)
• Targeting convolutions instead of matrix multiply
• Notation: F(size of output, filter size)
[Lavin et al., CVPR 2016]
filter
March 1, 2023
inputs outputs
F(2,3) =
4 multiplications + 12 additions + 2 shifts
4 multiplications + 8 additions (for constant weights)
L08-26
Sze and Emer
Sze and Emer
Winograd 2D - F(2x2, 3x3)
• 1D Winograd is nested to make 2D Winograd
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
Winograd: 16 multiplications  2.25 times reduction
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Original: 36 multiplications
Filter Input Fmap Output Fmap
* =
March 1, 2023
L08-27
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
Filter Input Fmap Output Fmap
March 1, 2023
L08-28
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
March 1, 2023
L08-29
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
March 1, 2023
L08-30
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
o02 o03
o12 o12
March 1, 2023
L08-31
Sze and Emer
Sze and Emer
Winograd Halos
• Winograd works on a small region (tile) of output at a time,
and therefore uses inputs repeatedly
i00 i01 i02 i03 i04 i05
i10 i11 i12 i13 i14 i15
i20 i21 i22 i23 i24 i25
i30 i31 i32 i33 i34 i35
f00 f01 f02
f10 f11 f12
f20 f21 f22
o00 o01
o10 o11
Filter Input Fmap Output Fmap
o02 o03
o12 o12
Halo columns
March 1, 2023
L08-32
Sze and Emer
Sze and Emer
Winograd Performance Varies
Source: Nvidia
March 1, 2023
L08-33
Sze and Emer
Sze and Emer
Winograd Summary
• Winograd is an optimized computation for
convolutions
• It can significantly reduce multiplies
– For example, for 3x3 filter by 2.25X
• But, each filter size (and output size) is a
different computation.
March 1, 2023
L08-34
Sze and Emer
Sze and Emer
Winograd as a Transform
Transform inputs
Element-wise
Multiplication
Transform output
[Lavin, ArXiv 2015]
Note: GfGT can be precomputed
March 1, 2023
filter
input
L08-35
Sze and Emer
Sze and Emer
R
filter (weights)
S
Fast Fourier Transform (FFT) Flow
E
F
input fmap output fmap
H
W
an output
activation
* =
FFT(W)
F
F
T
FFT(I)
X = FFT(O)
F
F
T
I
F
F
T
March 1, 2023
L08-36
Sze and Emer
Sze and Emer
FFT Overview
• Convert filter and input to frequency domain to make convolution
a simple multiply then convert back to space domain.
• Convert direct convolution O(No
2Nf
2) computation to O(No
2log2No)
• Note that computational benefit of FFT decreases with decreasing
size of filter
March 1, 2023
[Mathieu, ArXiv 2013], [Vasilache, ArXiv 2014]
L08-37
Sze and Emer
Sze and Emer
FFT Costs
• Input and Filter matrices are ‘0-completed’,
– i.e., expanded to size P+R-1 x Q+S-1
• Frequency domain matrices are same dimensions as
input, but complex.
• FFT often reduces computation, but requires much more
memory space and bandwidth
March 1, 2023
L08-38
Sze and Emer
Sze and Emer
Optimization opportunities
• FFT of real matrix is symmetric allowing one to save ½ the computes
• Filters can be pre-computed and stored, but convolutional filter in
frequency domain is much larger than in space domain
• Can reuse frequency domain version of input for creating different
output channels to avoid FFT re-computations
• Can accumulate across channels before performing inverse
transform to reduce number of IFFT
March 1, 2023
L08-39
Sze and Emer
Sze and Emer
cuDNN: Speed up with Transformations
Source: Nvidia
March 1, 2023
L08-40
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
e f
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-41
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-42
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-43
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-44
Sze and Emer
Sze and Emer
UCNN – Convolution (Simplified)
March 1, 2023
i00 i01 i02 i03
i10 i11 i12 i13
i20 i21 i22 i23
i30 i31 i32 i33
a b
c d
o00 o01
o10 o11
Filters Input Fmap Output Fmap
* =
b a
g h
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖
𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖
6 additions
6 multiplications
7 additions
8 multiplications
[Hegde, ISCA 2018]
L08-45
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
March 1, 2023
L08-46
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
March 1, 2023
L08-47
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Filtering steps
March 1, 2023
L08-48
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
March 1, 2023
L08-49
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
2D dot Product
March 1, 2023
L08-50
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
Filtering steps
2D dot Product
March 1, 2023
L08-51
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
Filtering steps
2D dot Product
March 1, 2023
L08-52
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
Filtering steps
2D dot Product
March 1, 2023
L08-53
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
Filtering steps
2D dot Product
March 1, 2023
L08-54
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
Filtering steps
2D dot Product
March 1, 2023
L08-55
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Filtering steps
2D dot Product
March 1, 2023
L08-56
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-57
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-58
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-59
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-60
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
March 1, 2023
L08-61
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-62
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-63
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-64
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
2
3
5
6
1 2 3 4 ● = 2
Flattened
Filtering steps
2D dot Product
1D dot
Product
March 1, 2023
L08-65
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2
4 5
1 2
3 4
● =
1
2 3
5 6
1 2
3 4
● =
2
4 5
7 8
1 2
3 4
● = 3
5 6
8 9
1 2
3 4
● = 4
1
2
4
5
1 2 3 4 ● = 1
2
3
5
6
1 2 3 4 ● = 2
Flattened
Filtering steps
.
.
.
2D dot Product
1D dot
Product
March 1, 2023
L08-66
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
March 1, 2023
L08-67
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
March 1, 2023
L08-68
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1
March 1, 2023
L08-69
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1
March 1, 2023
L08-70
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-71
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
March 1, 2023
L08-72
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-73
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
March 1, 2023
L08-74
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-75
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4
1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-76
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-77
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1
2
March 1, 2023
L08-78
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4
2
3
5
6
1
2
March 1, 2023
L08-79
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1
2
March 1, 2023
L08-80
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
1 2
March 1, 2023
L08-81
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
1 2
March 1, 2023
L08-82
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
1 2 3
March 1, 2023
L08-83
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3
March 1, 2023
L08-84
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
March 1, 2023
L08-85
Sze and Emer
Sze and Emer
1 2 3 4
Convolution (CONV) Layer
1 2 3 4
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
× =
Flattened
1
2
4
5
1 2 3 4 ● = 1 2
3
5
6
1 2 3 4 ● = 2 . . .
Matrix Multiply (by Toeplitz Matrix)
1 2 3 4 1
2
4
5
1 2 3 4 2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
March 1, 2023
L08-86
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
Convert to matrix multiply using the Toeplitz Matrix
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
1 2 3 4
1 2 4 5
2 3 5 6
4 5 7 8
5 6 8 9
1 2 3 4 × =
Convolution:
Matrix Multiply (by Toeplitz Matrix)
March 1, 2023
L08-87
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
1 2 3 4
1 2 4 5
2 3 5 6
4 5 7 8
5 6 8 9
1 2 3 4 × =
Data is repeated
1 2 3
4 5 6
7 8 9
1 2
3 4
Filter Input Fmap Output Fmap
* =
1 2
3 4
Convolution:
Matrix Multiply (by Toeplitz Matrix)
March 1, 2023
L08-88
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels
1 2
3 4
Filter 1
Input Fmap
Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Output Fmap
Key:
Black: Input channel 1
Yellow: Input channel 2
C
Filter
C
March 1, 2023
L08-89
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels
1 2
3 4
Filter 1
Input Fmap
Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Output Fmap
Key:
Black: Input channel 1
Yellow: Input channel 2
C
Filter
C
3D Stacks…
March 1, 2023
L08-90
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Output Fmap
Filter
March 1, 2023
L08-91
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-92
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-93
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-94
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-95
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-96
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1 2 3 4
×
Input Fmap as
Toeplitz matrix
• Multiple Input Channels
1 2 3 4 1 2 3 4 1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Toeplitz
converted inputs
Output Fmap
Filter
March 1, 2023
L08-97
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2 3
4 5 6
7 8 9
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-98
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
1 2
3 4
Filter 2
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2
3 4
1 2 3
4 5 6
7 8 9
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-99
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
• Multiple Input Channels and Output Channels
1 2
3 4
Filter 1 Chnl 1
* =
1 2
3 4
1 2
3 4
Filter 2
Chnl 1 Chnl 2
1 2 3
4 5 6
7 8 9
Chnl 1 Chnl 2
1 2
3 4
1 2
3 4
1 2 3
4 5 6
7 8 9
1 2
3 4
Chnl 2
Key:
Black: Input channel 1
Yellow: Input channel 2
Underlined: Output
channel 2
Input Fmap Output Fmap
C
Filter
C
March 1, 2023
L08-100
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
Chnl 1 Chnl 2
Filter 1
Chnl 1
Chnl 2
Chnl 1
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-101
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4
Chnl 1 Chnl 2
Filter 1
Filter 2
Chnl 1
Chnl 2
Chnl 1
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-102
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
=
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1
2
4
5
2
3
5
6
4
5
7
8
5
6
8
9
1 2 3 4
×
• Multiple Input Channels and Output Channels
1 2 3 4 1 2 3 4
1 2 3 4 1 2 3 4 3
1 2 4
Chnl 1 Chnl 2
Filter 1
Filter 2
Chnl 1
Chnl 2
Chnl 1
Chnl 2
Input Fmap as
Toeplitz matrix
Output Fmap
Filter
March 1, 2023
L08-103
Sze and Emer
Sze and Emer
Convolution (CONV) Layer
M
CRS
CRS
(H-R+1)(W-S+1)N
Filters Input fmaps
×
Output fmaps
M
=
(H-R+1)(W-S+1)N
• Dimensions of matrices for matrix multiply in
convolution layers with batch size N
March 1, 2023 N=2 in example
L08-104
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Simplify to 1-D with N=1, C=1, M=1, U=1
Break into two steps
L08-105
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 4 5 6
3 4 5
2 3 4
1 2 3
L08-106
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3
4 5 6
3 4 5
2 3 4
1 2 3
L08-107
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1 2 3
L08-108
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1 2 3
L08-109
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x
4 5 6
3 4 5
2 3 4
1
2
3
L08-110
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x =
4 5 6
3 4 5
2 3 4
1
2
3
L08-111
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2 3 4
1
2
3
L08-112
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2 3 4
1
2
3
L08-113
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-114
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-115
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3 4 5
2
3
4
1
2
3
L08-116
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2
=
4 5 6
3
4
5
2
3
4
1
2
3
L08-117
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3
=
4 5 6
3
4
5
2
3
4
1
2
3
L08-118
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-119
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-120
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
1 2 3 4 5 6
1 2 3
Filter Input Fmap Output Fmap
* = 1 2 3 4
1 2 3 x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
L08-121
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-122
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-123
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
L08-124
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-125
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-126
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
L08-127
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
L08-128
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
L08-129
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
L08-130
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
L08-131
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
L08-132
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
L08-133
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
L08-134
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
5 6
4
5
L08-135
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-136
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
L08-137
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
L08-138
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
L08-139
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
q s q+s
0 0 0
L08-140
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
q s q+s
0 0 0
L08-141
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
q s q+s
0 0 0
0 1 1
L08-142
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
q s q+s
0 0 0
0 1 1
L08-143
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
q s q+s
0 0 0
0 1 1
0 2 2
L08-144
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
q s q+s
0 0 0
0 1 1
0 2 2
L08-145
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
L08-146
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
L08-147
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
L08-148
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
L08-149
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
L08-150
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
L08-151
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
L08-152
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
L08-153
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-154
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-155
Sze and Emer
Sze and Emer
1-D Toeplitz Convolution Einsum
March 1, 2023
Filter Input Fmap Output Fmap
1 2 3 4 5 6
x 1 2 3 4
=
4
5
6
3
4
5
2
3
4
1
2
3
1 2 3
1 2 3
0 1 2 3 4 5
0 1 2 3 <- q
s
V
0
1
2
1
2
3 4
2
3
3
4
5 6
4
5
q s q+s
0 0 0
0 1 1
0 2 2
1 0 1
1 1 2
1 2 3
2 0 2
2 1 3
L08-156
Sze and Emer
Sze and Emer
2-D Toeplitz Convolution Einsum
March 1, 2023
L08-157
Sze and Emer
Sze and Emer
2-D Toeplitz Convolution Einsum
March 1, 2023
Break out Toeplitz conversion
Flatten ranks
L08-158
Sze and Emer
Sze and Emer
Next Lecture: GPUs
Thank you!
March 1, 2023

More Related Content

Similar to L08.pdf

Extended network and algorithm finding maximal flows
Extended network and algorithm finding maximal flows Extended network and algorithm finding maximal flows
Extended network and algorithm finding maximal flows IJECEIAES
 
Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP IJECEIAES
 
Ceng232 Decoder Multiplexer Adder
Ceng232 Decoder Multiplexer AdderCeng232 Decoder Multiplexer Adder
Ceng232 Decoder Multiplexer Addergueste731a4
 
Introduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordIntroduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordLakshmi Sarvani Videla
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsEellekwameowusu
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics PipelineMark Kilgard
 
Cryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and PracticeCryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and PracticeTaunyaCoffman887
 
Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)MargenePurnell14
 
FPGA based BCH Decoder
FPGA based BCH DecoderFPGA based BCH Decoder
FPGA based BCH Decoderijsrd.com
 
B61301007 matlab documentation
B61301007 matlab documentationB61301007 matlab documentation
B61301007 matlab documentationManchireddy Reddy
 
Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...IJECEIAES
 
On Edge Control Set of a Graph in Transportation Problems
On Edge Control Set of a Graph in Transportation ProblemsOn Edge Control Set of a Graph in Transportation Problems
On Edge Control Set of a Graph in Transportation ProblemsEswar Publications
 
Introduction of MatLab
Introduction of MatLab Introduction of MatLab
Introduction of MatLab Imran Nawaz
 

Similar to L08.pdf (20)

L05.pdf
L05.pdfL05.pdf
L05.pdf
 
L15.pdf
L15.pdfL15.pdf
L15.pdf
 
Project Management Techniques
Project Management TechniquesProject Management Techniques
Project Management Techniques
 
Extended network and algorithm finding maximal flows
Extended network and algorithm finding maximal flows Extended network and algorithm finding maximal flows
Extended network and algorithm finding maximal flows
 
Mat lab
Mat labMat lab
Mat lab
 
L02.pdf
L02.pdfL02.pdf
L02.pdf
 
Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP Wave File Features Extraction using Reduced LBP
Wave File Features Extraction using Reduced LBP
 
Ceng232 Decoder Multiplexer Adder
Ceng232 Decoder Multiplexer AdderCeng232 Decoder Multiplexer Adder
Ceng232 Decoder Multiplexer Adder
 
Introduction to Data Science With R Lab Record
Introduction to Data Science With R Lab RecordIntroduction to Data Science With R Lab Record
Introduction to Data Science With R Lab Record
 
Mm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithmsMm chap08 -_lossy_compression_algorithms
Mm chap08 -_lossy_compression_algorithms
 
CS 354 More Graphics Pipeline
CS 354 More Graphics PipelineCS 354 More Graphics Pipeline
CS 354 More Graphics Pipeline
 
Cryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and PracticeCryptography and Network Security Principles and Practice
Cryptography and Network Security Principles and Practice
 
Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)Cryptography and Network Security Principles and Practice (2)
Cryptography and Network Security Principles and Practice (2)
 
RBootcam Day 2
RBootcam Day 2RBootcam Day 2
RBootcam Day 2
 
Optimisation random graph presentation
Optimisation random graph presentationOptimisation random graph presentation
Optimisation random graph presentation
 
FPGA based BCH Decoder
FPGA based BCH DecoderFPGA based BCH Decoder
FPGA based BCH Decoder
 
B61301007 matlab documentation
B61301007 matlab documentationB61301007 matlab documentation
B61301007 matlab documentation
 
Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...
 
On Edge Control Set of a Graph in Transportation Problems
On Edge Control Set of a Graph in Transportation ProblemsOn Edge Control Set of a Graph in Transportation Problems
On Edge Control Set of a Graph in Transportation Problems
 
Introduction of MatLab
Introduction of MatLab Introduction of MatLab
Introduction of MatLab
 

More from TRNHONGLINHBCHCM (12)

L06-handout.pdf
L06-handout.pdfL06-handout.pdf
L06-handout.pdf
 
L09-handout.pdf
L09-handout.pdfL09-handout.pdf
L09-handout.pdf
 
L06.pdf
L06.pdfL06.pdf
L06.pdf
 
L03.pdf
L03.pdfL03.pdf
L03.pdf
 
L11.pdf
L11.pdfL11.pdf
L11.pdf
 
L09.pdf
L09.pdfL09.pdf
L09.pdf
 
L04.pdf
L04.pdfL04.pdf
L04.pdf
 
L12.pdf
L12.pdfL12.pdf
L12.pdf
 
L01.pdf
L01.pdfL01.pdf
L01.pdf
 
L14.pdf
L14.pdfL14.pdf
L14.pdf
 
Projects.pdf
Projects.pdfProjects.pdf
Projects.pdf
 
PaperReview.pdf
PaperReview.pdfPaperReview.pdf
PaperReview.pdf
 

Recently uploaded

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 

Recently uploaded (20)

main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 

L08.pdf

  • 1. L08-1 Sze and Emer 6.5930/1 Hardware Architectures for Deep Learning Vectorized Kernel Computation (continued) Joel Emer and Vivienne Sze Massachusetts Institute of Technology Electrical Engineering & Computer Science March 1, 2023
  • 2. L08-2 Sze and Emer Sze and Emer ISA Datatypes FP32 FP16 Int32 Int16 Int8 S E M 1 8 23 S E M 1 5 10 M 31 S S M 1 1 15 S M 1 7 Range Accuracy 10-38 – 1038 .000006% 6x10-5 - 6x104 .05% 0 – 2x109 ½ 0 – 6x104 ½ 0 – 127 ½ Image Source: B. Dally March 1, 2023
  • 3. L08-3 Sze and Emer Sze and Emer Intel – MMX/SSE/AVX Width Int8 Int16 Int 32 Int64 FP16 FP32 FP64 Features MMX 64 8 4 2 1 SSE 128 4 SSE2 128 16 8 4 2 4 2 SSE3 128 16 8 4 2 4 2 R AVX 256 32 16 8 4 16 8 4 AVX2 256 32 16 8 4 16 8 4 GUMR AVX3 512 64 32 16 8 ? 16 8 GUMRP March 1, 2023 G: gather U: unaligned M: MAC R: reductions/permutations P: Predicate masks Source: Myriad non-authoritative sources on web
  • 4. L08-4 Sze and Emer 6.5930/1 Hardware Architectures for Deep Learning Computational Transforms Joel Emer and Vivienne Sze Massachusetts Institute of Technology Electrical Engineering & Computer Science March 1, 2023
  • 5. L08-5 Sze and Emer Sze and Emer FC – Vector Operation Counts March 1, 2023 // Level 2 loops for m in [0, M): for chw1 in [0, C*H*W/L): // Level 1 loops parallel_for chw0 in [0, L): o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0]; L == Lanes Vector operation on L lanes Order: m, chw1, chw0 Factors: M, C*H*W/L, L
  • 6. L08-6 Sze and Emer Sze and Emer FC – Vector Operation Counts March 1, 2023 How many MACs? M *(C*H*W/L) * L = M*C*H*W How many reads of “inputs” C*H*W*M How many reads of “weights” C*H*W*M How many writes of “outputs” M Measuring reads/writes in units of 32-bit integers // Level 2 loops for m in [0, M): for chw1 in [0, C*H*W/L): // Level 1 loops parallel_for chw0 in [0, L): o[m] += i[L*chw1+chw0] * f[C*H*W*m + L*chw1+chw0]; L == Lanes
  • 7. L08-7 Sze and Emer Sze and Emer Compute Intensity (MACs/Read) March 1, 2023 MACs/Read? C∗H∗W∗M C∗H∗W∗M C∗H∗W∗M ~ 1 2 If system can support 1 Read/MAC will system run at full throttle? No // Level 2 loops for m in [0, M): for chw2 in [0, C*H*W/L: // Level 1 loops parallel_for chw1 in [0, L): o[m] += i[L*chw2+chw1] * f[C*H*W*m + L*chw2+chw1]; L == Lanes
  • 8. L08-8 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC . March 1, 2023 MACs/cycle limited by number of lanes Williams, Samuel, Andrew Waterman, and David Patterson. "Roofline: an insightful visual performance model for multicore architectures." Communications of the ACM 52.4 (2009): 65-76.
  • 9. L08-9 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC March 1, 2023 Where will the previous slide’s code be? Compute intensity = 1/2 MACs/cycle limited by number of lanes MACs/cycle limited by operand delivery
  • 10. L08-10 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 8 MAC Lanes with 1 Read/MAC March 1, 2023 Where will the previous slide’s code be? Compute intensity = 1/2 o How can we change the compute intensity? Code changes, e.g., different splitting, loop inversions? MACs/cycle limited by number of lanes MACs/cycle limited by operand delivery
  • 11. L08-11 Sze and Emer Sze and Emer FC – Splitting/Reordered // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): // Level 1 loops parallel_for m0 in [0, L): o[m1*L+m1] += i[chw] * f[CHW*(m1*L+m0) + chw] L == Lanes Partiion “m” Partition “m” Order: m1, chw, m0 Factors: M/L, C*H*W, L
  • 12. L08-12 Sze and Emer Sze and Emer FC – Reordered + loop invariant hoisted March 1, 2023 // Level 2 loops for m2 in [0, M/L): for chw in [0, C*H*W): // Level 1 loops parallel_for m1 in [0, L): o[m2*L+m1] += i[chw] * f[CHW*(m2*L+m1) + chw] // Level 2 loops for m2 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m2*L+m1] += i_chw * f[CHW*(m2*L+m1) + chw] L == Lanes L == Lanes
  • 13. L08-13 Sze and Emer Sze and Emer FC – Operation Counts March 1, 2023 How many MACs? C*H*W*M How many reads of “inputs” C*H*W*M/L How many reads of “weights” C*H*W*M How many writes of “outputs” M Measuring reads/writes in units of 32-bit integers // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw] L == Lanes
  • 14. L08-14 Sze and Emer Sze and Emer Compute Intensity March 1, 2023 MACs/Read? C∗H∗W∗M C∗H∗W∗M C∗H∗W∗M/L ~ 𝐿 1 + 𝐿 If system can support 1 Read/MAC will system run at full throttle? No // Level 2 loops for m1 in [0, M/L): for chw in [0, C*H*W): i_chw = i[chw] // Level 1 loops for chw in [0, C*H*W): o[m1*L+m0] += i_chw * f[CHW*(m1*L+m0) + chw]
  • 15. L08-15 Sze and Emer Sze and Emer Roofline Model 0 2 4 6 8 10 0 0.25 0.5 0.75 1 1.25 1.5 1.75 2 MACs/Cycle Compute Intensity (MACs/read) 1 MAC/Read - 8 lanes March 1, 2023 Where will the previous slide’s code be? Compute intensity = L/(1+L) = 8/9 Why might points be below the line? Other overheads (e.g. instructions, stalls) x o Is being on the flat part always best? Not necessarily…
  • 16. L08-16 Sze and Emer Sze and Emer Computation Transformations • Goal: Bitwise same result, but reduce number of operations • Focuses mostly on compute March 1, 2023
  • 17. L08-17 Sze and Emer Sze and Emer Gauss’s Multiplication Algorithm 4 multiplications + 3 additions March 1, 2023
  • 18. L08-18 Sze and Emer Sze and Emer Strassen P1 = a(f – h) P2 = (a + b)h P3 = (c + d)e P4 = d(g – e) P5 = (a + d)(e + h) P6 = (b - d)(g + h) P7 = (a – c)(e + f) 8 multiplications + 4 additions 7 multiplications + 18 additions 7 multiplications + 13 additions (for constant B matrix – weights) [Cong et al., ICANN, 2014] March 1, 2023
  • 19. L08-19 Sze and Emer Sze and Emer Strassen Comes at the price of reduced numerical stability and requires significantly more memory N Naïve Strassen Complexity Image Source: http://www.stoimen.com/blog/2012/11/26/computer-algorithms-strassens-matrix-multiplication/ • Reduce the complexity of matrix multiplication from Θ(N3) to Θ(N2.807) by reducing multiplications March 1, 2023
  • 20. L08-20 Sze and Emer Sze and Emer Python to C++ Chart March 1, 2023 [Leiserson, There’s plenty of room at the top, Science, 2020]
  • 21. L08-21 Sze and Emer Sze and Emer Tensor Computations March 1, 2023 Matrix Multiply CONV Layer
  • 22. L08-22 Sze and Emer Sze and Emer Convolution (CONV) Layer … M … Many Input fmaps (N) Many Output fmaps (N) … R S R S C C filters P Q H C H W C P 1 1 N N W Q March 1, 2023 Batch Size (N)
  • 23. L08-23 Sze and Emer Sze and Emer CONV Layer Implementation Naïve 7-layer for-loop implementation: for n in [0..N): for m in [0..M): for q in [0..Q): for p in [0..P): O[n][m][p][q] = B[m]; for c in [0..C): for r in [0..R): for s in [0..S): O[n][m][p][q] += I[n][c][Up+r][Uq+s] × F[m][c][r][s]; O[n][m][p][q] = Activation(O[n][m][p][q]); for each output fmap value convolve a window and apply activatio n March 1, 2023
  • 24. L08-24 Sze and Emer Sze and Emer Winograd 1D – F(2,3) • Targeting convolutions instead of matrix multiply • Notation: F(size of output, filter size) [Lavin et al., CVPR 2016] 6 multiplications + 4 additions filter March 1, 2023 inputs outputs F(2,3) =
  • 25. L08-25 Sze and Emer Sze and Emer Winograd 1D – F(2,3) • Targeting convolutions instead of matrix multiply • Notation: F(size of output, filter size) [Lavin et al., CVPR 2016] filter March 1, 2023 inputs outputs F(2,3) = 4 multiplications + 12 additions + 2 shifts 4 multiplications + 8 additions (for constant weights)
  • 26. L08-26 Sze and Emer Sze and Emer Winograd 2D - F(2x2, 3x3) • 1D Winograd is nested to make 2D Winograd i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 Winograd: 16 multiplications  2.25 times reduction f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Original: 36 multiplications Filter Input Fmap Output Fmap * = March 1, 2023
  • 27. L08-27 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 Filter Input Fmap Output Fmap March 1, 2023
  • 28. L08-28 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap March 1, 2023
  • 29. L08-29 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap March 1, 2023
  • 30. L08-30 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap o02 o03 o12 o12 March 1, 2023
  • 31. L08-31 Sze and Emer Sze and Emer Winograd Halos • Winograd works on a small region (tile) of output at a time, and therefore uses inputs repeatedly i00 i01 i02 i03 i04 i05 i10 i11 i12 i13 i14 i15 i20 i21 i22 i23 i24 i25 i30 i31 i32 i33 i34 i35 f00 f01 f02 f10 f11 f12 f20 f21 f22 o00 o01 o10 o11 Filter Input Fmap Output Fmap o02 o03 o12 o12 Halo columns March 1, 2023
  • 32. L08-32 Sze and Emer Sze and Emer Winograd Performance Varies Source: Nvidia March 1, 2023
  • 33. L08-33 Sze and Emer Sze and Emer Winograd Summary • Winograd is an optimized computation for convolutions • It can significantly reduce multiplies – For example, for 3x3 filter by 2.25X • But, each filter size (and output size) is a different computation. March 1, 2023
  • 34. L08-34 Sze and Emer Sze and Emer Winograd as a Transform Transform inputs Element-wise Multiplication Transform output [Lavin, ArXiv 2015] Note: GfGT can be precomputed March 1, 2023 filter input
  • 35. L08-35 Sze and Emer Sze and Emer R filter (weights) S Fast Fourier Transform (FFT) Flow E F input fmap output fmap H W an output activation * = FFT(W) F F T FFT(I) X = FFT(O) F F T I F F T March 1, 2023
  • 36. L08-36 Sze and Emer Sze and Emer FFT Overview • Convert filter and input to frequency domain to make convolution a simple multiply then convert back to space domain. • Convert direct convolution O(No 2Nf 2) computation to O(No 2log2No) • Note that computational benefit of FFT decreases with decreasing size of filter March 1, 2023 [Mathieu, ArXiv 2013], [Vasilache, ArXiv 2014]
  • 37. L08-37 Sze and Emer Sze and Emer FFT Costs • Input and Filter matrices are ‘0-completed’, – i.e., expanded to size P+R-1 x Q+S-1 • Frequency domain matrices are same dimensions as input, but complex. • FFT often reduces computation, but requires much more memory space and bandwidth March 1, 2023
  • 38. L08-38 Sze and Emer Sze and Emer Optimization opportunities • FFT of real matrix is symmetric allowing one to save ½ the computes • Filters can be pre-computed and stored, but convolutional filter in frequency domain is much larger than in space domain • Can reuse frequency domain version of input for creating different output channels to avoid FFT re-computations • Can accumulate across channels before performing inverse transform to reduce number of IFFT March 1, 2023
  • 39. L08-39 Sze and Emer Sze and Emer cuDNN: Speed up with Transformations Source: Nvidia March 1, 2023
  • 40. L08-40 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = e f g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 41. L08-41 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 42. L08-42 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 43. L08-43 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 44. L08-44 Sze and Emer Sze and Emer UCNN – Convolution (Simplified) March 1, 2023 i00 i01 i02 i03 i10 i11 i12 i13 i20 i21 i22 i23 i30 i31 i32 i33 a b c d o00 o01 o10 o11 Filters Input Fmap Output Fmap * = b a g h 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + b 𝑖 +𝑎 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = 𝑎 𝑖 +𝑏 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑒 𝑖 +𝑓 𝑖 + 𝑔 𝑖 + h 𝑖 𝑜 = (𝑎 + 𝑏)𝑖 +(𝑎 + 𝑏) 𝑖 + c 𝑖 + 𝑑 𝑖 + 𝑔 𝑖 + h 𝑖 6 additions 6 multiplications 7 additions 8 multiplications [Hegde, ISCA 2018]
  • 45. L08-45 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 March 1, 2023
  • 46. L08-46 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 March 1, 2023
  • 47. L08-47 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Filtering steps March 1, 2023
  • 48. L08-48 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps March 1, 2023
  • 49. L08-49 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps 2D dot Product March 1, 2023
  • 50. L08-50 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 Filtering steps 2D dot Product March 1, 2023
  • 51. L08-51 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Filtering steps 2D dot Product March 1, 2023
  • 52. L08-52 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Filtering steps 2D dot Product March 1, 2023
  • 53. L08-53 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 Filtering steps 2D dot Product March 1, 2023
  • 54. L08-54 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 Filtering steps 2D dot Product March 1, 2023
  • 55. L08-55 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Filtering steps 2D dot Product March 1, 2023
  • 56. L08-56 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 57. L08-57 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 58. L08-58 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 59. L08-59 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 Flattened Filtering steps 2D dot Product March 1, 2023
  • 60. L08-60 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product March 1, 2023
  • 61. L08-61 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 62. L08-62 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 63. L08-63 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 64. L08-64 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Flattened Filtering steps 2D dot Product 1D dot Product March 1, 2023
  • 65. L08-65 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 4 5 7 8 1 2 3 4 ● = 3 5 6 8 9 1 2 3 4 ● = 4 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 Flattened Filtering steps . . . 2D dot Product 1D dot Product March 1, 2023
  • 66. L08-66 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened March 1, 2023
  • 67. L08-67 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened March 1, 2023
  • 68. L08-68 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 March 1, 2023
  • 69. L08-69 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 March 1, 2023
  • 70. L08-70 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 71. L08-71 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: March 1, 2023
  • 72. L08-72 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 73. L08-73 Sze and Emer Sze and Emer Convolution (CONV) Layer Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . March 1, 2023
  • 74. L08-74 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 75. L08-75 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 76. L08-76 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 77. L08-77 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 78. L08-78 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 79. L08-79 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 80. L08-80 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 1 2 March 1, 2023
  • 81. L08-81 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 1 2 March 1, 2023
  • 82. L08-82 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 1 2 3 March 1, 2023
  • 83. L08-83 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 March 1, 2023
  • 84. L08-84 Sze and Emer Sze and Emer Convolution (CONV) Layer × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 March 1, 2023
  • 85. L08-85 Sze and Emer Sze and Emer 1 2 3 4 Convolution (CONV) Layer 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 × = Flattened 1 2 4 5 1 2 3 4 ● = 1 2 3 5 6 1 2 3 4 ● = 2 . . . Matrix Multiply (by Toeplitz Matrix) 1 2 3 4 1 2 4 5 1 2 3 4 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 March 1, 2023
  • 86. L08-86 Sze and Emer Sze and Emer Convolution (CONV) Layer Convert to matrix multiply using the Toeplitz Matrix 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × = Convolution: Matrix Multiply (by Toeplitz Matrix) March 1, 2023
  • 87. L08-87 Sze and Emer Sze and Emer Convolution (CONV) Layer 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × = Data is repeated 1 2 3 4 5 6 7 8 9 1 2 3 4 Filter Input Fmap Output Fmap * = 1 2 3 4 Convolution: Matrix Multiply (by Toeplitz Matrix) March 1, 2023
  • 88. L08-88 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels 1 2 3 4 Filter 1 Input Fmap Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Output Fmap Key: Black: Input channel 1 Yellow: Input channel 2 C Filter C March 1, 2023
  • 89. L08-89 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels 1 2 3 4 Filter 1 Input Fmap Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Output Fmap Key: Black: Input channel 1 Yellow: Input channel 2 C Filter C 3D Stacks… March 1, 2023
  • 90. L08-90 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Output Fmap Filter March 1, 2023
  • 91. L08-91 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 92. L08-92 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 93. L08-93 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 94. L08-94 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 95. L08-95 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 96. L08-96 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 3 4 × Input Fmap as Toeplitz matrix • Multiple Input Channels 1 2 3 4 1 2 3 4 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Toeplitz converted inputs Output Fmap Filter March 1, 2023
  • 97. L08-97 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 5 6 7 8 9 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 98. L08-98 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 1 2 3 4 Filter 2 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 9 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 99. L08-99 Sze and Emer Sze and Emer Convolution (CONV) Layer • Multiple Input Channels and Output Channels 1 2 3 4 Filter 1 Chnl 1 * = 1 2 3 4 1 2 3 4 Filter 2 Chnl 1 Chnl 2 1 2 3 4 5 6 7 8 9 Chnl 1 Chnl 2 1 2 3 4 1 2 3 4 1 2 3 4 5 6 7 8 9 1 2 3 4 Chnl 2 Key: Black: Input channel 1 Yellow: Input channel 2 Underlined: Output channel 2 Input Fmap Output Fmap C Filter C March 1, 2023
  • 100. L08-100 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 Chnl 1 Chnl 2 Filter 1 Chnl 1 Chnl 2 Chnl 1 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 101. L08-101 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Chnl 1 Chnl 2 Filter 1 Filter 2 Chnl 1 Chnl 2 Chnl 1 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 102. L08-102 Sze and Emer Sze and Emer Convolution (CONV) Layer = 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 4 5 2 3 5 6 4 5 7 8 5 6 8 9 1 2 3 4 × • Multiple Input Channels and Output Channels 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 3 1 2 4 Chnl 1 Chnl 2 Filter 1 Filter 2 Chnl 1 Chnl 2 Chnl 1 Chnl 2 Input Fmap as Toeplitz matrix Output Fmap Filter March 1, 2023
  • 103. L08-103 Sze and Emer Sze and Emer Convolution (CONV) Layer M CRS CRS (H-R+1)(W-S+1)N Filters Input fmaps × Output fmaps M = (H-R+1)(W-S+1)N • Dimensions of matrices for matrix multiply in convolution layers with batch size N March 1, 2023 N=2 in example
  • 104. L08-104 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Simplify to 1-D with N=1, C=1, M=1, U=1 Break into two steps
  • 105. L08-105 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 5 6 3 4 5 2 3 4 1 2 3
  • 106. L08-106 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 4 5 6 3 4 5 2 3 4 1 2 3
  • 107. L08-107 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 108. L08-108 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 109. L08-109 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 4 5 6 3 4 5 2 3 4 1 2 3
  • 110. L08-110 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x = 4 5 6 3 4 5 2 3 4 1 2 3
  • 111. L08-111 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 112. L08-112 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 113. L08-113 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 114. L08-114 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 115. L08-115 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 116. L08-116 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 117. L08-117 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 118. L08-118 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 119. L08-119 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 120. L08-120 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 1 2 3 4 5 6 1 2 3 Filter Input Fmap Output Fmap * = 1 2 3 4 1 2 3 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3
  • 121. L08-121 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 122. L08-122 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 123. L08-123 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5
  • 124. L08-124 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 125. L08-125 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 126. L08-126 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1
  • 127. L08-127 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2
  • 128. L08-128 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3
  • 129. L08-129 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2
  • 130. L08-130 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3
  • 131. L08-131 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3
  • 132. L08-132 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3
  • 133. L08-133 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4
  • 134. L08-134 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 5 6 4 5
  • 135. L08-135 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 136. L08-136 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3
  • 137. L08-137 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5
  • 138. L08-138 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2
  • 139. L08-139 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 q s q+s 0 0 0
  • 140. L08-140 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 q s q+s 0 0 0
  • 141. L08-141 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 q s q+s 0 0 0 0 1 1
  • 142. L08-142 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 q s q+s 0 0 0 0 1 1
  • 143. L08-143 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 q s q+s 0 0 0 0 1 1 0 2 2
  • 144. L08-144 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2
  • 145. L08-145 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1
  • 146. L08-146 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1
  • 147. L08-147 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2
  • 148. L08-148 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2
  • 149. L08-149 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3
  • 150. L08-150 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3
  • 151. L08-151 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2
  • 152. L08-152 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2
  • 153. L08-153 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 154. L08-154 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 155. L08-155 Sze and Emer Sze and Emer 1-D Toeplitz Convolution Einsum March 1, 2023 Filter Input Fmap Output Fmap 1 2 3 4 5 6 x 1 2 3 4 = 4 5 6 3 4 5 2 3 4 1 2 3 1 2 3 1 2 3 0 1 2 3 4 5 0 1 2 3 <- q s V 0 1 2 1 2 3 4 2 3 3 4 5 6 4 5 q s q+s 0 0 0 0 1 1 0 2 2 1 0 1 1 1 2 1 2 3 2 0 2 2 1 3
  • 156. L08-156 Sze and Emer Sze and Emer 2-D Toeplitz Convolution Einsum March 1, 2023
  • 157. L08-157 Sze and Emer Sze and Emer 2-D Toeplitz Convolution Einsum March 1, 2023 Break out Toeplitz conversion Flatten ranks
  • 158. L08-158 Sze and Emer Sze and Emer Next Lecture: GPUs Thank you! March 1, 2023