New Pointwise Convolution in Deep Neural Networks through Extremely Fast and Non-parametric Transforms
Joonhyun Jeong | Sung-Ho Bae
Kyung-Hee University
INDEX
01 Background & Motivation
02 Method
03 Conclusion
Background & Motivation
Background
Standard convolution kernels
Spatial specific convolution kernels
Channel specific convolution (pointwise convolution) kernels
Depthwise separable convolution
* notations
N = number of output channels
M = number of input channels
Dk = filter (kernel) size
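With these notations, the standard depthwise-separable cost decomposition (stated here for context, not text from the slide) is: a standard convolution layer needs Dk x Dk x M x N weights, while a depthwise separable layer splits this into Dk x Dk x M (depthwise) + M x N (pointwise), so the pointwise part dominates whenever N is large.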
Motivation
• But existing pointwise convolution needs a lot of parameters and FLOPs!
• In MobileNet-V1, pointwise convolution accounts for roughly 75% of the parameters and 95% of the FLOPs, with all other layers making up the remaining 25% and 5% (pie charts: "Ratio of params in MobileNet-V1", "Ratio of FLOPs in MobileNet-V1").
• This study focuses on reducing the number of weights and FLOPs needed in pointwise convolution through conventional transforms (a rough cost sketch follows below).
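A minimal back-of-the-envelope sketch (ours, not from the slides; the layer sizes are hypothetical) of why the 1x1 projection dominates the cost:

```python
# Counting parameters and FLOPs of a standard pointwise (1x1) convolution layer.
def pointwise_conv_cost(in_channels: int, out_channels: int, height: int, width: int):
    """Return (params, flops) of a 1x1 convolution, ignoring bias."""
    params = in_channels * out_channels                   # one weight per input/output channel pair
    flops = in_channels * out_channels * height * width   # multiply-accumulates at every spatial position
    return params, flops

# Example: a hypothetical late-stage MobileNet-V1-like layer.
print(pointwise_conv_cost(in_channels=512, out_channels=512, height=14, width=14))
# -> (262144, 51380224): the 1x1 projection alone costs ~0.26M weights and ~51M FLOPs.
```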
Method
• Pointwise Convolution with conventional transforms
• Optimal block structure for conventional transforms
• Optimal hierarchical block levels for conventional transforms
Pointwise Convolution (PC) using conventional transforms
Discrete Cosine Transform (DCT) kernels
Discrete Walsh-Hadamard Transform (DWHT) kernels
Conventional pointwise convolution kernels
(figure: the first, second, third, ... kernels of each type, visualized side by side)
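For concreteness, a minimal NumPy sketch (our illustration, not the authors' code) of how fixed DCT-II and Walsh-Hadamard basis rows can be generated to stand in for learned pointwise kernels:

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine kernel."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    basis[0] /= np.sqrt(2.0)              # DC row gets the usual 1/sqrt(2) scaling
    return basis

def dwht_matrix(n: int) -> np.ndarray:
    """Walsh-Hadamard basis (n must be a power of two); entries are only +1/-1."""
    h = np.array([[1.0]])
    while h.shape[0] < n:                 # Sylvester construction
        h = np.block([[h, h], [h, -h]])
    return h

# Used as pointwise kernels: output channel k of the transform is just basis row k
# applied across the input channels, so nothing has to be stored or learned.
```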
• Two nice properties of conventional transforms
➢ No learnable parameters needed: no MAC (Memory Access Cost) for weight parameters.
➢ Fast computation versions exist: complexity drops from O(N²) to O(N log N).
• Thus we can construct an efficient neural network in terms of both memory footprint and computation speed.
• Fast version of the Discrete Walsh-Hadamard Transform (butterfly sketch below)
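The butterfly below is a standard fast-DWHT sketch (ours, assuming a power-of-two channel count), illustrating why the transform needs only additions and subtractions:

```python
import numpy as np

def fast_dwht(x: np.ndarray) -> np.ndarray:
    """Walsh-Hadamard transform of x in O(N log N); len(x) must be a power of two."""
    y = x.astype(np.float64).copy()
    h = 1
    while h < len(y):
        for i in range(0, len(y), 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b   # butterfly step: no multiplications
        h *= 2
    return y

# Applied as a pointwise convolution, this runs once per spatial position over
# the channel vector, replacing the learned 1x1 kernel entirely.
```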
• Fast version of the Discrete Cosine Transform: adopted Kok's fast DCT algorithm (1997).
No multiplications are needed for DWHT (only additions and subtractions)!
DWHT and DCT can be extremely fast!
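Not Kok's 1997 algorithm itself; as an illustration of the speed class, a library stand-in (SciPy's FFT-based DCT-II) that also runs in O(N log N) over the channel vector at each spatial position:

```python
import numpy as np
from scipy.fft import dct

channel_vec = np.random.randn(64)                # the channel vector at one spatial position
coeffs = dct(channel_vec, type=2, norm='ortho')  # FFT-based DCT-II across channels, O(N log N)
print(coeffs.shape)                              # (64,) "output channels" of the fixed transform
```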
Optimal block structure for conventional transforms
Baseline: ShuffleNet-V2
* notations
CTPC = Conventional Transform Pointwise Convolution
(block diagrams compare placing ReLU after CTPC vs. no ReLU after CTPC)
➤ Applying ReLU after the conventional transform degraded accuracy significantly (a layer sketch follows the notations below).
* notations
(b)-DCT : CTPC in block (b) is DCT
(b)-DWHT : CTPC in block (b) is DWHT
(c)-DCT : CTPC in block (c) is DCT
(c)-DWHT : CTPC in block (c) is DWHT
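A hedged PyTorch sketch (our illustration, not the authors' released code): a CTPC layer realized as a 1x1 convolution whose weight is a fixed Walsh-Hadamard basis, and, per the finding above, with no ReLU applied after it. Normalization of the basis is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WalshHadamardPointwise(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        assert (channels & (channels - 1)) == 0, "channel count must be a power of two"
        h = torch.tensor([[1.0]])
        while h.shape[0] < channels:                        # Sylvester construction of the basis
            h = torch.cat([torch.cat([h, h], dim=1),
                           torch.cat([h, -h], dim=1)], dim=0)
        # Registered as a buffer: saved with the model, never trained.
        self.register_buffer("weight", h.view(channels, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to a 1x1 convolution with a fixed, non-learnable kernel and no bias.
        return F.conv2d(x, self.weight)

x = torch.randn(1, 64, 14, 14)
y = WalshHadamardPointwise(64)(x)   # same spatial shape, zero learnable parameters, no ReLU after
```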
Optimal hierarchical block levels for conventional transforms (ShuffleNet-V2)
* notations
(a) : baseline block
(highlighted range) : (a) blocks in this range are all replaced by our optimal block
(figure: nine ShuffleNet-V2 variants, i.e. Low-level, Mid-level, and High-level models 1, 2, and 3, differing in which range of blocks is replaced)
➤ High-level blocks are favored by the proposed pointwise convolution layer.
(figure: results for high-level block, middle-level block, and low-level block replacement)
Optimal hierarchical block levels for conventional transforms (MobileNet-V1)
Compared to the baseline model: 1.49% accuracy increase, 79.1% of weights reduced, and 48.4% of FLOPs reduced!
Conclusion
• We proposed an extremely fast and non-parametric pointwise convolution!
• In particular, DWHT is extremely efficient in computation because it requires no multiplications, only additions and subtractions.
• We found the optimal block structure and hierarchical block levels for conventional transforms.
THANK YOU
Joonhyun Jeong | Sung-Ho Bae
Kyung-Hee University
