1. The document describes a mobile image recognition system using a CNN model called Network-in-Network. It was implemented as iOS and Android apps that can recognize food images without needing an online server.
2. The system achieves high accuracy of 78.8% for top-1 and 95.2% for top-5 recognition of food images from the UECFOOD100 dataset, with a processing time of 55.7 ms per image. It uses techniques such as batch normalization and multi-threading to optimize performance on mobile devices.
3. The architecture was modified from the original Network-in-Network by adding batch normalization, reducing layers and kernels, and using multiple image sizes to balance recognition accuracy and speed. Global average pooling replaces the fixed average pooling, which allows the network to accept multiple input resolutions.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/how-transformers-are-changing-the-direction-of-deep-learning-architectures-a-presentation-from-synopsys/
Tom Michiels, System Architect for DesignWare ARC Processors at Synopsys, presents the “How Transformers are Changing the Direction of Deep Learning Architectures” tutorial at the May 2022 Embedded Vision Summit.
The neural network architectures used in embedded real-time applications are evolving quickly. Transformers are a leading deep learning approach for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions.
In this presentation, Michiels introduces transformers and contrasts them with the CNNs commonly used for vision tasks today. He examines the key features of transformer model architectures and shows performance comparisons between transformers and CNNs. He concludes the presentation with insights on why Synopsys thinks transformers are an important approach for future visual perception tasks.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/how-transformers-are-changing-the-nature-of-deep-learning-models-a-presentation-from-synopsys/
Tom Michiels, System Architect for ARC Processors at Synopsys, presents the “How Transformers Are Changing the Nature of Deep Learning Models” tutorial at the May 2023 Embedded Vision Summit.
The neural network models used in embedded real-time applications are evolving quickly. Transformer networks are a deep learning approach that has become dominant for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions.
In this presentation, Michiels introduces transformers and contrasts them with the CNNs commonly used for vision tasks today. He examines the key features of transformer model architectures and shows performance comparisons between transformers and CNNs. He concludes with insights on why his company thinks transformers will become increasingly important for visual perception tasks.
Future Internet: Managing Innovation and Testbed (Shinji Shimojo)
Innovation is a key word for ICT research and development. However, the road toward innovation is full of uncertainties and obstacles. Key elements for overcoming these obstacles seem to be agile management of people, software and hardware. In addition, we think that involving users in R&D has a strong effect on how uncertainty in R&D is managed. In this talk, I describe our approach to user involvement in JGN-X, an international future internet testbed, and Knowledge Capital, Osaka, a smart city experimental testbed.
Applying Deep Learning Vision Technology to low-cost/power Embedded Systems (Jenny Midwinter)
Slides from Ottawa Machine Learning Meetup from January 16, 2016.
Pierre Paulin, Director of R&D at Synopsys (Embedded Vision Subsystems), will be making a presentation on:
“Applying Deep Learning Vision Technology to Low-Cost, Low-Power Embedded Systems: An Industrial Perspective”
MIPI DevCon 2021: MIPI CSI-2 v4.0 Panel Discussion with the MIPI Camera Worki... (MIPI Alliance)
Panel discussion with Haran Thanigasalam, Intel Corporation, MIPI Camera Working Group chair; Natsuko Ibuki, Google LLC; Yuichi Mizutani, Sony Corporation; and Wonseok Lee, Samsung Electronics Co.
Rate and Performance Analysis of Indoor Optical Camera Communications in Opti... (Willy Anugrah Cahyadi)
This is a summary of my Ph.D. dissertation. The main topic is optical camera communication (OCC), which was being standardized in 2018 as a revision to the IEEE 802.15.7-2011 standard.
Inria Tech Talk: Improve Your Robotics and Augmented Reality Applications (Stéphanie Roger)
Whether you work in industry, robotics, healthcare or augmented reality, take advantage of ViSP to develop new business opportunities and industrial transfer.
ViSP (Visual Servoing Platform) is a technology used in robotics and augmented reality to control a robot with a camera.
Rethinking the Mobile Code Offloading Paradigm: From Concept to Practice (MobileSoft)
Rethinking the Mobile Code Offloading Paradigm: From Concept to Practice, by José I. Benedetto, Andrés Neyem, Jaime Navón and Guillermo Valenzuela. MobileSoft 2017, Buenos Aires.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/12/trends-in-neural-network-topologies-for-vision-at-the-edge-a-presentation-from-synopsys/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Pierre Paulin, Director of R&D for Embedded Vision at Synopsys, presents the “Trends in Neural Network Topologies for Vision at the Edge” tutorial at the September 2020 Embedded Vision Summit.
The widespread adoption of deep neural networks (DNNs) in embedded vision applications has increased the importance of creating DNN topologies that maximize accuracy while minimizing computation and memory requirements. This has led to accelerated innovation in DNN topologies.
In this talk, Paulin summarizes the key trends in neural network topologies for embedded vision applications, highlighting techniques employed by widely used networks such as EfficientNet and MobileNet to boost both accuracy and efficiency. He also touches on other optimization methods—such as pruning, compression and layer fusion—that developers can use to further reduce the memory and computation demands of modern DNNs.
This presentation describes my research and teaching activities since completing my PhD, and it is mainly based on the talk I gave to defend my HDR (Habilitation à Diriger des Recherches).
Abstractions and Directives for Adapting Wavefront Algorithms to Future Archi... (inside-BigData.com)
In this deck from PASC18, Robert Searles from the University of Delaware presents: Abstractions and Directives for Adapting Wavefront Algorithms to Future Architectures.
"Architectures are rapidly evolving, and exascale machines are expected to offer billion-way concurrency. We need to rethink algorithms, languages and programming models among other components in order to migrate large scale applications and explore parallelism on these machines. Although directive-based programming models allow programmers to worry less about programming and more about science, expressing complex parallel patterns in these models can be a daunting task especially when the goal is to match the performance that the hardware platforms can offer. One such pattern is wavefront. This paper extensively studies a wavefront-based miniapplication for Denovo, a production code for nuclear reactor modeling.
We parallelize the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm in the main kernel of Minisweep (the miniapplication) using CUDA, OpenMP and OpenACC. Our OpenACC implementation running on NVIDIA's next-generation Volta GPU boasts an 85.06x speedup over serial code, which is larger than CUDA's 83.72x speedup over the same serial implementation. Our experimental platform includes SummitDev, an ORNL representative architecture of the upcoming Summit supercomputer. Our parallelization effort across platforms also motivated us to define an abstract parallelism model that is architecture independent, with a goal of creating software abstractions that can be used by applications employing the wavefront sweep motif."
Watch the video: https://wp.me/p3RLHQ-iPU
Read the Full Paper: https://doi.org/10.1145/3218176.3218228
and
https://pasc18.pasc-conference.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/10/mipi-csi-2-image-sensor-interface-standard-features-enable-efficient-embedded-vision-systems-a-presentation-from-the-mipi-alliance/
Haran Thanigasalam, Camera and Imaging Consultant to the MIPI Alliance, presents the “MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedded Vision Systems” tutorial at the May 2023 Embedded Vision Summit.
As computer vision applications continue to evolve rapidly, there’s a growing need for a smarter standardized interface connecting multiple image sensors to processors for real-time perception and decision-making. In this presentation, Thanigasalam provides a deep dive into the latest version of the widely implemented CSI-2 v4.0 interface from MIPI Alliance.
This new version includes key features specifically designed to support computer vision applications, including democratized Smart Region of Interest, Always-On Sentinel Conduit, Multi-Pixel Compression and Latency Reduction and Transport Efficiency. These novel features enable sophisticated machine awareness with reduced system power and processing needs, making them well suited for consumer, commercial and infrastructure platforms.
The Network Revolution, John Zannos, Canonical (Alan Quayle)
The Network Revolution, changing how network enabled services are consumed.
John Zannos, Vice President - Cloud Platform / Alliances, Canonical
The telecom industry has been undergoing a revolution. NFV, cloud and IoT have rapidly changed what it means to be a carrier. Carriers are expected to supply services at internet speed with carrier-grade reliability. Moving to a software-based model for network services is the only way to accelerate time to market for new services.
Presented at TADSummit 2016, 15-16 Nov, Lisbon in the Sponsors' Plenary
Distributed Video Coding (DVC) has recently become popular among video coding researchers due to its attractive and promising features. In contrast to conventional video codecs, DVC shifts the complexity balance between the encoder and decoder. However, most reported DVC schemes have a high decoding delay, which hinders their practical application in real-time systems. In this work, we focus on speeding up the Side Information (SI) generation module in DVC, a major function in the DVC coding algorithm and one of the most time-consuming steps at the decoder. By implementing it with the Compute Unified Device Architecture (CUDA) on a general-purpose graphics processing unit (GPGPU), we show experimentally that the proposed parallelized SI generation algorithm obtains a considerable speedup.
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System (AI Frontiers)
This presentation will demonstrate our recent progress in developing advanced computer vision algorithms using embedded platforms for video-based face recognition, vehicle attribute analysis, urban management event detection, and high-density crowd counting. These algorithms combine the traditional CV approach with recent advances in deep learning to make high-performance computer vision systems practical and enable products in several vertical markets including intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. We will demonstrate algorithm design and optimization schemes for several recently available processors from Movidius, Nvidia, and ARM.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/fomo-real-time-object-detection-on-microcontrollers-a-presentation-from-edge-impulse/
Jan Jongboom, Co-founder and CTO of Edge Impulse, delivers the “FOMO: Real-Time Object Detection on Microcontrollers” tutorial at the May 2022 Embedded Vision Summit.
Object detection models are vital for many computer vision applications. They can show where an object is in a video stream, or how many objects are present. But they’re also very resource-intensive—models like MobileNet SSD can analyze a few frames per second on a Raspberry Pi 4, using a significant amount of RAM. This has put object detection out of reach for the most interesting devices: microcontrollers.
Microcontrollers are cheap, small, ubiquitous and energy efficient—and are thus attractive for adding computer vision to everyday devices. But microcontrollers are also very resource-constrained, with clock speeds of up to 200 MHz and less than 256 Kbytes of RAM—far too little to run complex object detection models. But… that’s about to change! In this talk, Jongboom outlines his company’s work on FOMO (“faster objects, more objects”), a novel DNN architecture for object detection, designed from the ground up to run on microcontrollers.
These slides compare the runtime of the OpenCV DNN module with our method; they are part of the slides presented at a workshop of the 13th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS) (http://mobiquitous.org/2016/show/home). Details of the image recognition part are omitted.
1. Introduction of Mobile CNN
ⓒ 2016 UEC Tokyo.
2016/11/10 (Thu)
Ryosuke Tanno
Yanai Laboratory, Department of Informatics,
The University of Electro-Communications
2. Self Introduction
• Affiliation: first-year master's student at the University of Electro-Communications (Yanai Laboratory)
• Research:
– Bachelor: Implementation and Comparative Analysis of Image Recognition Systems based on Deep Learning on Mobile OS
– Master: Image Recognition and Image Transfer based on Deep Learning
3. Contributions
• Stand-alone DCNN-based mobile image recognition
– No need for a recognition server or communication
– Built-in trained DCNN model with UECFOOD-100
– Implemented as iOS/Android apps
– Released as an iOS app at https://goo.gl/4m2tQz and as an Android app (APK) at http://foodcam.mobi/
• Excellent performance with reasonable speed and model size
– UECFOOD100: 78.8% (top-1), 95.2% (top-5) in 55.7 ms with 5.5M weights (22 MB)
– Employing Network-in-Network
– Adding batch normalization and additional layers
• Multi-scale recognition
– Users can choose the balance between speed and accuracy
– 26.2 ms for 160x160 images ⇔ 55.7 ms for 227x227 images (on iPhone 7 Plus)
4. CNN architecture (1)
• The numbers of weights in AlexNet and VGG-16 are too large for mobile.
• GoogLeNet is too complicated for an efficient parallel implementation. (It has many branches.)
5. CNN architecture (2)
• We adopt Network-in-Network (NIN).
– No fully-connected layers (which means far fewer weights)
– Straight flow consisting of many conv layers
⇒ easy to implement in parallel
• Efficient computation of the conv layers is needed!
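NIN's characteristic building block is the 1x1 convolution (its "cccp" layer), which is simply the same linear map over channels applied at every pixel. A minimal pure-Python sketch of that idea (illustrative only, not the app's implementation):

```python
# A 1x1 convolution is a per-pixel linear map: every spatial position's
# channel vector is multiplied by one shared weight matrix.

def conv1x1(feature_map, weights):
    """feature_map: H x W x C_in nested lists; weights: C_out x C_in."""
    h = len(feature_map)
    w = len(feature_map[0])
    c_out = len(weights)
    return [[[sum(weights[o][i] * feature_map[y][x][i]
                  for i in range(len(weights[o])))
              for o in range(c_out)]
             for x in range(w)]
            for y in range(h)]

# 2x2 map with 2 input channels -> 1 output channel (weights sum channels)
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
w = [[1.0, 1.0]]          # one output channel: sum of the two inputs
print(conv1x1(fmap, w))   # [[[3.0], [7.0]], [[11.0], [15.0]]]
```

Because there is no spatial window, a 1x1 conv adds per-channel nonlinearity and mixing while staying trivially parallel across pixels, which fits the slide's point about parallel-friendly straight flows.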
6. Extension of NIN: adding BN, 5 layers, multiple image sizes
• Modified models (BN, 5-layer, multi-scale)
– Added BN layers just after all the conv/cccp layers
– Replaced the 5x5 conv with two 3x3 conv layers
– Reduced the number of kernels in conv4 from 1024 to 768
– Replaced the fixed average pooling with Global Average Pooling (GAP)
• Multiple image sizes (4-layer vs. 5-layer+BN): trade-off between accuracy and speed
– 227x227: 55.7 ms, 78.8%
– 180x180: 35.5 ms, 76.0%
– 160x160: 26.3 ms, 71.5%
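Global Average Pooling is what makes the multiple-image-size trick above possible: each channel's spatial map collapses to its mean, so the classifier's input shape no longer depends on the input resolution. A minimal pure-Python sketch (illustrative, not the actual implementation):

```python
# Global Average Pooling: reduce each channel's H x W map to its mean.
# Works for any spatial size, unlike a fixed-window average pooling.

def global_average_pool(feature_map):
    """feature_map: H x W x C nested lists -> list of C channel means."""
    h = len(feature_map)
    w = len(feature_map[0])
    c = len(feature_map[0][0])
    return [sum(feature_map[y][x][ch] for y in range(h) for x in range(w))
            / (h * w)
            for ch in range(c)]

# The same function handles a 2x2 and a 3x3 map with no code change.
small = [[[1.0], [2.0]], [[3.0], [4.0]]]            # 2x2, 1 channel
large = [[[float(v)] for v in row] for row in
         [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]         # 3x3, 1 channel
print(global_average_pool(small))  # [2.5]
print(global_average_pool(large))  # [5.0]
```

Feeding a 160x160 or 227x227 image into the same network simply produces smaller or larger final maps; GAP absorbs the difference.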
7. Fast Implementation on Mobile
• Speeding up conv layers → speeding up GEMM
– The computation of a conv layer is decomposed into an "im2col" operation and general matrix multiplication (GEMM)
– Multi-threading: use 2 cores on iOS and 4 cores on Android in parallel
– SIMD instructions (NEON on ARM-based processors)
• Total: iOS: 2 cores x 4 = 8 parallel calculations; Android: 4 cores x 4 = 16 parallel calculations
– BLAS library (highly optimized on iOS ⇔ not optimized on Android)
• BLAS (iOS: BLAS in the iOS Accelerate Framework; Android: OpenBLAS)
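The im2col + GEMM decomposition above can be sketched in pure Python (illustrative only; the apps delegate the GEMM to BLAS or NEON code): each k x k patch is flattened into a row, and the whole convolution then becomes a single matrix multiply.

```python
# im2col + GEMM: every conv output pixel is a dot product between a
# filter and an image patch, so unrolling all patches turns the conv
# layer into one big matrix multiplication.
# Sketch: 1-channel input, "valid" convolution, stride 1.

def im2col(img, k):
    """img: H x W list of lists -> one flattened k*k patch per row."""
    h, w = len(img), len(img[0])
    cols = []
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            cols.append([img[y + dy][x + dx]
                         for dy in range(k) for dx in range(k)])
    return cols  # (out_h * out_w) rows of length k*k

def conv_via_gemm(img, kernels, k):
    """kernels: list of flattened k*k filters -> per-filter output maps."""
    cols = im2col(img, k)
    # The GEMM step: filter matrix times patch matrix.
    return [[sum(f * p for f, p in zip(filt, col)) for col in cols]
            for filt in kernels]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
box = [1, 1, 1, 1]                   # 2x2 box filter (sums each patch)
print(conv_via_gemm(img, [box], 2))  # [[12, 16, 24, 28]]
```

This is why an optimized GEMM (Accelerate's BLAS on iOS, OpenBLAS or hand-written NEON on Android) dominates the conv-layer runtime discussed in the evaluation.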
9. Evaluation: Processing time
• iOS: BLAS >> NEON; Android: BLAS << NEON
– The BLAS library in the iOS Accelerate Framework is very efficient!
• Trade-off between accuracy and speed by changing the size of input images
– The fastest setting achieves "real" real-time recognition at 26.2 ms
[Chart: processing times for the iOS and Android configurations, with the fastest and most accurate settings highlighted]
10. Comparison to FV-based FoodCam with the UEC-FOOD100 dataset
• Much improved: 65.3% ⇒ 81.5% (top-1)
• Even for 160x160 inputs, improved: 65.3% ⇒ 71.5%
[Chart: top-N accuracy for N = 1 to 10 (y-axis 60.0% to 100.0%) for AlexNet, NIN 5-layer (104 ms), NIN 4-layer (67 ms), NIN 4-layer 160x160 (33 ms), and FV (Color+HOG) (65 ms); NIN reaches 81.5% top-1 and 96.2% top-5 vs. 65.3% top-1 and 86.7% top-5 for FV]