"Separable Convolutions for Efficient Implementation of CNNs and Other Vision Algorithms," a Presentation from Phiar

© 2019 Phiar Technologies, Inc.
Separable Convolutions for
Efficient Implementation of CNNs
and Other Vision Algorithms
Chen-Ping Yu, PhD
Phiar Technologies, Inc.
May 2019

• AI-powered AR navigation platform for driving
• First product: AR navigation mobile app
• On-device processing with mobile sensors: AI + SLAM + path planning

Outline
• Spatial convolution in computer vision
• Separable convolution in computer vision
• Application in deep learning CNNs
• Low rank filter expansion (Jaderberg et al., 2014)
• Flattened CNN (Jin et al., 2015)
• MobileNet (Howard et al., 2017)
• Takeaways
• Resources

Spatial convolution
• Running (sliding) a filter over an input image, computing a weighted sum at each position
• Smoothing (Gaussian filter)
• Template matching
Image source: http://deeplearning.net/software/theano_versions/dev/tutorial/conv_arithmetic.html; http://www.cse.psu.edu/~rtc12/CSE486/lecture07.pdf

Spatial convolution
• Convolve an image f with a filter k: the response is highest at the locations that match k
• This holds when both f and k are normalized (zero mean and unit standard deviation)
• Strictly speaking this is cross-correlation (k is not flipped), but the two terms are often used interchangeably
Image source: http://www.cse.psu.edu/~rtc12/CSE486/lecture07.pdf
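As a minimal sketch of the normalized case (pure Python, no padding, population standard deviation; `normalize` and `ncc_match` are hypothetical names, not a library API), each image patch and the template are rescaled to zero mean and unit standard deviation, so every score is a correlation coefficient in [-1, 1] that peaks at 1.0 where the patch matches the template:

```python
import math

def normalize(values):
    """Shift values to zero mean and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return None  # constant patch: correlation is undefined
    return [(v - mean) / std for v in values]

def ncc_match(image, template):
    """Slide template over image (no padding); return the score map and best (row, col)."""
    th, tw = len(template), len(template[0])
    t = normalize([v for row in template for v in row])
    scores = []
    for r in range(len(image) - th + 1):
        score_row = []
        for c in range(len(image[0]) - tw + 1):
            patch = normalize([image[r + i][c + j] for i in range(th) for j in range(tw)])
            # Mean of products of two normalized vectors = their correlation coefficient.
            score_row.append(0.0 if patch is None else
                             sum(a * b for a, b in zip(patch, t)) / len(t))
        scores.append(score_row)
    best = max(((r, c) for r in range(len(scores)) for c in range(len(scores[0]))),
               key=lambda rc: scores[rc[0]][rc[1]])
    return scores, best

f = [[6, 3, 3, 6],
     [4, 2, 1, 8],
     [2, 5, 6, 4],
     [9, 3, 2, 1]]
k = [[2, 1],
     [5, 6]]  # template cut out of f

scores, best = ncc_match(f, k)
print(best, round(scores[best[0]][best[1]], 6))  # (1, 1) 1.0
```

The 2x2 template here is cut from the 4x4 example image used on the following slides, so the peak lands at its source location (0-based row 1, column 1).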

Spatial convolution – zero padding, stride = 1
Input f (4x4), zero-padded to 6x6:
0 0 0 0 0 0
0 6 3 3 6 0
0 4 2 1 8 0
0 2 5 6 4 0
0 9 3 2 1 0
0 0 0 0 0 0
Filter k (3x3):
-1 0 1
-2 0 2
-1 0 1
At each step, k is centered on the next pixel of f (stride 1; pixels outside the image count as 0), and the element-wise products are summed into one value of f ⊗ k.
Step 1 (output row 1, column 1):
0*-1 + 0*0 + 0*1 +
0*-2 + 6*0 + 3*2 +
0*-1 + 4*0 + 2*1 = 8
Step 2 (output row 1, column 2):
0*-1 + 0*0 + 0*1 +
6*-2 + 3*0 + 3*2 +
4*-1 + 2*0 + 1*1 = -9
Step 3 (output row 1, column 3):
0*-1 + 0*0 + 0*1 +
3*-2 + 3*0 + 6*2 +
2*-1 + 1*0 + 8*1 = 12
Step 4 (output row 1, column 4):
0*-1 + 0*0 + 0*1 +
3*-2 + 6*0 + 0*2 +
1*-1 + 8*0 + 0*1 = -7
Step 5 (output row 2, column 1):
0*-1 + 6*0 + 3*1 +
0*-2 + 4*0 + 2*2 +
0*-1 + 2*0 + 5*1 = 12
f ⊗ k after five steps:
8 -9 12 -7
12
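The walk-through above can be reproduced with a short pure-Python sketch, assuming plain nested lists and an odd-sized kernel (`correlate2d_same` is a hypothetical helper, not a library function). As on the slides, it computes cross-correlation, so k is not flipped:

```python
def correlate2d_same(f, k):
    """Cross-correlate image f with odd-sized kernel k: zero padding, stride 1,
    output the same size as f. The kernel is not flipped."""
    H, W = len(f), len(f[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * W for _ in range(H)]
    for r in range(H):
        for c in range(W):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    y, x = r + i - ph, c + j - pw
                    if 0 <= y < H and 0 <= x < W:  # outside the image counts as 0
                        total += f[y][x] * k[i][j]
            out[r][c] = total
    return out

f = [[6, 3, 3, 6],
     [4, 2, 1, 8],
     [2, 5, 6, 4],
     [9, 3, 2, 1]]
sobel = [[-1, 0, 1],
         [-2, 0, 2],
         [-1, 0, 1]]

for row in correlate2d_same(f, sobel):
    print(row)
```

The first printed row and the first value of the second row are the five values computed step by step above (8, -9, 12, -7, then 12).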

Separable convolution
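Sobel filter (edge detector): k (3x3) = c (3x1) ⊗ r (1x3)
$$\begin{bmatrix}-1 & 0 & 1\\ -2 & 0 & 2\\ -1 & 0 & 1\end{bmatrix} = \begin{bmatrix}1\\ 2\\ 1\end{bmatrix} \otimes \begin{bmatrix}-1 & 0 & 1\end{bmatrix}$$
It turns out that k can be decomposed into a column vector c convolved with a row vector r, each a 1D filter.
This is also equivalent to the outer product, i.e., the matrix multiplication $k = c\,r$ of a 3x1 matrix by a 1x3 matrix.
Let $k = c \otimes r$. Then
$$f \otimes k = f \otimes (c \otimes r) = f \otimes c \otimes r = (f \otimes c) \otimes r$$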
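A quick check of the decomposition, assuming plain Python lists (`outer` and `matmul` are illustrative helpers): the outer product of c and r, and the matrix product of a 3x1 matrix with a 1x3 matrix, both give back the Sobel filter.

```python
def outer(c, r):
    """Outer product of column vector c and row vector r: k[i][j] = c[i] * r[j]."""
    return [[ci * rj for rj in r] for ci in c]

def matmul(a, b):
    """Plain matrix product of a (m x p) and b (p x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

c = [1, 2, 1]     # 3x1 smoothing column
r = [-1, 0, 1]    # 1x3 central-difference row
sobel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]

print(outer(c, r) == sobel)                    # True
print(matmul([[x] for x in c], [r]) == sobel)  # True: (3x1)(1x3) -> 3x3
```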
Separable convolution
Filter: $k = c \otimes r$, with $c = [1\;2\;1]^\top$ (3x1) and $r = [-1\;0\;1]$ (1x3)
Image f:
6 3 3 6
4 2 1 8
2 5 6 4
9 3 2 1
Associativity of convolution: $f \otimes (c \otimes r) = (f \otimes c) \otimes r$
f ⊗ k (one 3x3 pass, zero padding, stride 1):
8 -9 12 -7
12 -5 14 -11
15 -2 2 -15
11 -10 -5 -10
(f ⊗ c) ⊗ r (a 3x1 pass followed by a 1x3 pass):
8 -9 12 -7
12 -5 14 -11
15 -2 2 -15
11 -10 -5 -10
The two results are identical.
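A sketch of the comparison on this slide, assuming zero padding and stride 1 for every pass (all helper names are hypothetical): one direct 3x3 pass versus a 3x1 pass followed by a 1x3 pass. Zero-padding each 1D pass gives exactly the same result as zero-padding the 2D pass, so the outputs match element for element:

```python
def correlate2d(f, k):
    """Direct 2D cross-correlation: zero padding, stride 1, same-size output."""
    H, W, ph, pw = len(f), len(f[0]), len(k) // 2, len(k[0]) // 2
    return [[sum(f[y + i - ph][x + j - pw] * k[i][j]
                 for i in range(len(k)) for j in range(len(k[0]))
                 if 0 <= y + i - ph < H and 0 <= x + j - pw < W)
             for x in range(W)] for y in range(H)]

def correlate_cols(f, c):
    """First pass: vertical 1D kernel c applied down every column."""
    H, W, p = len(f), len(f[0]), len(c) // 2
    return [[sum(f[y + i - p][x] * c[i] for i in range(len(c)) if 0 <= y + i - p < H)
             for x in range(W)] for y in range(H)]

def correlate_rows(f, r):
    """Second pass: horizontal 1D kernel r applied along every row."""
    H, W, p = len(f), len(f[0]), len(r) // 2
    return [[sum(f[y][x + j - p] * r[j] for j in range(len(r)) if 0 <= x + j - p < W)
             for x in range(W)] for y in range(H)]

f = [[6, 3, 3, 6],
     [4, 2, 1, 8],
     [2, 5, 6, 4],
     [9, 3, 2, 1]]
c = [1, 2, 1]     # column part of the Sobel filter
r = [-1, 0, 1]    # row part of the Sobel filter
k = [[ci * rj for rj in r] for ci in c]   # k = c r (outer product)

direct = correlate2d(f, k)                           # f ⊗ k
separable = correlate_rows(correlate_cols(f, c), r)  # (f ⊗ c) ⊗ r
print(direct == separable)  # True
for row in separable:
    print(row)
```

The printed rows are the 4x4 result shown on this slide.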

Separable convolution: significantly reduced complexity
Regular: one 3x3 filter k => 9 multiplications per pixel
Separable: a 3x1 filter c, then a 1x3 filter r => 3 + 3 = 6 multiplications per pixel
Let n = side length of the filter:
• Regular 2D filter: $n \cdot n \Rightarrow O(n^2)$; separable: $n + n = 2n \Rightarrow O(n)$
• Regular 3D filter: $n \cdot n \cdot n \Rightarrow O(n^3)$; separable: $n + n + n = 3n \Rightarrow O(n)$
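The per-pixel counts above can be tabulated with a tiny helper, assuming the filter is exactly separable and ignoring any savings at the image boundary (`mults_per_pixel` is a hypothetical name):

```python
def mults_per_pixel(n, dims):
    """Multiplications per output pixel for an n x ... x n filter in `dims` dimensions:
    (regular, separable), where the separable version runs one 1D pass per dimension."""
    return n ** dims, dims * n

for dims in (2, 3):
    for n in (3, 5, 15):
        full, sep = mults_per_pixel(n, dims)
        print(f"{dims}D, n={n}: regular {full}, separable {sep}, {full / sep:.1f}x fewer")
```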

Separable convolution example: 3D tumor detection
Image source: http://www.cse.psu.edu/~rtc12/CSE486/lecture07.pdf; Yu et al., 2014: http://chenpingyu.org/docs/yu_isbi2014.pdf
• Yu et al., Stony Brook U., ISBI 2014
• Intel i7 dual core @ 2.7 GHz
• Head MRI: 256 x 256 x 256
• 8 scales of 3D LoG filters
• Regular conv: > 2 hours
• Separable conv: < 2 min
Figures: 2D Laplacian of Gaussian (LoG) filter, a “blob” detector; 3D separable LoG
Multiplications per voxel: from $n^3$ to $9n$
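One way to arrive at the $9n$ figure, which is our reading rather than a description of the paper's code: the 3D LoG is not a single separable filter, but it is a sum of three separable terms, $\nabla^2 G = g''(x)g(y)g(z) + g(x)g''(y)g(z) + g(x)g(y)g''(z)$, where g is a 1D Gaussian. Each term needs three 1D passes of n taps (3n multiplications), so the three terms cost 9n. The sketch below (unnormalized Gaussian, odd kernel size n, hypothetical helper names) checks that the sum of separable terms reproduces the closed-form kernel:

```python
import math

def gauss_1d(n, sigma):
    """Sampled, unnormalized 1D Gaussian g(x) = exp(-x^2 / (2 sigma^2)), n odd taps."""
    h = n // 2
    return [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-h, h + 1)]

def gauss_d2_1d(n, sigma):
    """Second derivative of g: g''(x) = (x^2 / sigma^4 - 1 / sigma^2) * g(x)."""
    h, s2 = n // 2, sigma * sigma
    return [(x * x / (s2 * s2) - 1 / s2) * math.exp(-(x * x) / (2 * s2))
            for x in range(-h, h + 1)]

def log3d_direct(n, sigma):
    """Full n x n x n LoG kernel: ((x^2 + y^2 + z^2) / sigma^4 - 3 / sigma^2) * G."""
    h, s2 = n // 2, sigma * sigma
    rng = range(-h, h + 1)
    return [[[((x*x + y*y + z*z) / (s2 * s2) - 3 / s2) * math.exp(-(x*x + y*y + z*z) / (2 * s2))
              for z in rng] for y in rng] for x in rng]

def log3d_separable(n, sigma):
    """Same kernel as g''(x)g(y)g(z) + g(x)g''(y)g(z) + g(x)g(y)g''(z)."""
    g, d2 = gauss_1d(n, sigma), gauss_d2_1d(n, sigma)
    return [[[d2[i] * g[j] * g[l] + g[i] * d2[j] * g[l] + g[i] * g[j] * d2[l]
              for l in range(n)] for j in range(n)] for i in range(n)]

n, sigma = 9, 1.5
direct, sep = log3d_direct(n, sigma), log3d_separable(n, sigma)
max_err = max(abs(a - b) for pa, pb in zip(direct, sep)
              for ra, rb in zip(pa, pb) for a, b in zip(ra, rb))
print(f"max |direct - separable| = {max_err:.2e}")
# Each separable term costs three 1D passes of n taps (3n); three terms cost 9n.
print(f"mults per voxel: direct {n ** 3}, separable {9 * n}")  # 729 vs 81
```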

Deep learning example: Low rank filter expansion
• Jaderberg et al., University of Oxford, BMVC 2014
• Reconstruct 3D filters in a pre-trained network with 1D and 2D filters
• Approximation 1: use purely 1D filters
• Approximation 2: use 1D filters followed by 2D filters
• 4-layer CNN; text recognition; 2.5–4.5x speedup; <1% accuracy loss
Image source: https://arxiv.org/pdf/1405.3866.pdf
Figure: approximation schemes 1) and 2)
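The core idea of approximating a filter with 1D filters can be illustrated on a single 2D kernel. This toy sketch is not the paper's method (which jointly reconstructs the filter banks of a trained network); it finds the best rank-1 approximation k ≈ outer(c, r) by power iteration, with hypothetical helper names. A separable kernel such as Sobel is recovered exactly, while a non-separable one such as the 3x3 Laplacian leaves a residual:

```python
import math

def rank1_approx(k, iters=50):
    """Rank-1 approximation k ≈ outer(c, r) by power iteration
    (alternating c = k r, r = k^T c). Returns (c, r) with c of unit length."""
    rows, cols = len(k), len(k[0])
    # Start from the largest row of k so the first product k r is non-zero.
    r = list(max(k, key=lambda row: sum(v * v for v in row)))
    for _ in range(iters):
        c = [sum(k[i][j] * r[j] for j in range(cols)) for i in range(rows)]
        norm = math.sqrt(sum(v * v for v in c))
        c = [v / norm for v in c]
        r = [sum(k[i][j] * c[i] for i in range(rows)) for j in range(cols)]
    return c, r

def residual(k, c, r):
    """Frobenius norm of k - outer(c, r)."""
    return math.sqrt(sum((k[i][j] - c[i] * r[j]) ** 2
                         for i in range(len(k)) for j in range(len(k[0]))))

sobel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]      # rank 1: exactly separable
laplacian = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]    # rank 2: not separable

for name, kernel in (("sobel", sobel), ("laplacian", laplacian)):
    c, r = rank1_approx(kernel)
    print(name, round(residual(kernel, c, r), 4))
```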

Deep learning example: Flattened CNN
• Jin et al., Purdue University, ICLR Workshop 2015
• Train a network from scratch with a sequence of 1D filters
• Baseline: 3 conv layers (5x5) + 2 FC layers; each conv layer is replaced by 2 sets of flattened layers
• CIFAR-10/100, MNIST: 2–3.5x speedup at the same or better accuracy
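A sketch of a flattened filter, assuming one 1D pass across channels, then a vertical 1D pass, then a horizontal 1D pass (the ordering and all names here are assumptions for illustration). Because every step is linear, this sequence equals one regular C x n x n filter whose weights are the rank-1 product a[ch] * v[i] * h[j]:

```python
import random

def correlate2d(img, k):
    """Zero-padded, stride-1 2D cross-correlation with an odd-sized kernel k."""
    H, W, ph, pw = len(img), len(img[0]), len(k) // 2, len(k[0]) // 2
    return [[sum(img[y + i - ph][x + j - pw] * k[i][j]
                 for i in range(len(k)) for j in range(len(k[0]))
                 if 0 <= y + i - ph < H and 0 <= x + j - pw < W)
             for x in range(W)] for y in range(H)]

def full_3d(inp, w):
    """Regular conv with one C x n x n filter w: per-channel 2D passes, summed over channels."""
    per_channel = [correlate2d(inp[ch], w[ch]) for ch in range(len(inp))]
    H, W = len(inp[0]), len(inp[0][0])
    return [[sum(o[y][x] for o in per_channel) for x in range(W)] for y in range(H)]

def flattened(inp, a, v, h):
    """Flattened filter: 1D across channels (a), then 1D vertical (v), then 1D horizontal (h)."""
    H, W = len(inp[0]), len(inp[0][0])
    mixed = [[sum(a[ch] * inp[ch][y][x] for ch in range(len(inp))) for x in range(W)]
             for y in range(H)]
    vertical = correlate2d(mixed, [[t] for t in v])   # n x 1 kernel
    return correlate2d(vertical, [h])                  # 1 x n kernel

random.seed(0)
C, H, W, n = 3, 5, 6, 3
inp = [[[random.randint(-5, 5) for _ in range(W)] for _ in range(H)] for _ in range(C)]
a = [random.randint(-3, 3) for _ in range(C)]
v = [random.randint(-3, 3) for _ in range(n)]
h = [random.randint(-3, 3) for _ in range(n)]
# Equivalent regular filter: the rank-1 tensor w[ch][i][j] = a[ch] * v[i] * h[j].
w = [[[a[ch] * v[i] * h[j] for j in range(n)] for i in range(n)] for ch in range(C)]

print(flattened(inp, a, v, h) == full_3d(inp, w))        # True
print("weights per filter:", C * n * n, "vs", C + n + n)  # 27 vs 9
```

Training a flattened network from scratch, as on this slide, amounts to learning a, v and h directly instead of the full C x n x n weights.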

Deep learning example: MobileNet V1
• Howard et al., Google Inc., arXiv 2017
• Depthwise separable convolutions: a 3 x 3 x 1 depthwise filter per input channel, then 1 x 1 x D pointwise filters across all D channels
• 28 layers; several variants (width and resolution multipliers)
• ImageNet, Stanford Dogs, Im2GPS, YFCC100M, COCO
• At comparable accuracy: 4.2M parameters vs. 138M for VGG-16
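Following the cost analysis in the MobileNet paper, a standard layer with $D_K \times D_K$ kernels, M input channels, N output channels and a $D_F \times D_F$ output map costs $D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$ multiply-adds, while the depthwise separable version costs $D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$. A sketch with a hypothetical 3x3, 512-to-512-channel layer on a 14x14 map:

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard conv: dk x dk kernels, m in / n out channels, df x df output."""
    return dk * dk * m * n * df * df

def depthwise_separable_cost(dk, m, n, df):
    """Depthwise (one dk x dk x 1 filter per input channel) + pointwise (n filters of 1 x 1 x m)."""
    return dk * dk * m * df * df + m * n * df * df

# Hypothetical layer: 3x3 kernels, 512 -> 512 channels, 14x14 output feature map.
dk, m, n, df = 3, 512, 512, 14
std = standard_conv_cost(dk, m, n, df)
sep = depthwise_separable_cost(dk, m, n, df)
print(std, sep, round(std / sep, 2))  # 462422016 52283392 8.84
```

The ratio of the two costs is $1/N + 1/D_K^2$, so with 3x3 kernels the savings approach 9x as N grows.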

Takeaways - deep learning applicability
• Faster inference and training (~20% faster training on ImageNet)
• Per-pixel computational savings grow with filter size:
• A 3 x 3 x 64 filter: from 576 multiplications down to 70
• A 15 x 15 x 64 filter: from 14,400 multiplications down to 94
• A 35 x 35 x 64 filter: from 78,400 multiplications down to 134
• Makes larger filters affordable, allowing more contextual information
• Allows deeper and wider networks; use residual connections to avoid vanishing gradients
• Especially useful in early layers, where large input sizes dominate the cost
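The numbers on this slide are consistent with counting C + n + n multiplications for a flattened filter (one 1D pass across the C channels, then n vertical and n horizontal taps) versus n·n·C for a full filter; that counting rule is our inference. A small sketch that reproduces them (`flattened_mults` is a hypothetical name):

```python
def flattened_mults(n, channels):
    """Per-pixel multiplications: full n x n x C filter vs. a flattened sequence of
    1D filters (C across channels, n vertical, n horizontal)."""
    return n * n * channels, channels + n + n

for n in (3, 15, 35):
    full, flat = flattened_mults(n, 64)
    print(f"{n} x {n} x 64: {full:,} -> {flat}")
```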

Resources, and we are hiring!
Relevant Papers & Materials
• Yu et al., 2014, “3D Blob Based Tumor Detection and Segmentation in MR Images.” http://chenpingyu.org/docs/yu_isbi2014.pdf
• Jaderberg et al., 2014, “Speeding Up Convolutional Neural Networks with Low Rank Expansions.” https://arxiv.org/pdf/1405.3866.pdf
• Jin et al., 2015, “Flattened Convolutional Neural Networks for Feedforward Acceleration.”
• Howard et al., 2017, “MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.”
• Computer Vision Lecture Notes, Penn State University. http://www.cse.psu.edu/~rtc12/CSE486/lecture07.pdf
Website
https://www.phiar.net
Currently Hiring for:
• SLAM Engineer
• Computer Vision/Deep Learning Engineer
• Full-Stack Software Engineer
• Product Manager
Embedded Vision Summit