2016 UEC Tokyo.
Introduction of Mobile CNN
2016/11/10(Thu)
Department of Informatics,
The University of Electro-Communications,
Yanai Laboratory,
Ryosuke Tanno
• Affiliation: first-year master's student at the University of
Electro-Communications (Yanai Laboratory)
• Research:
– Bachelor: Implementation and Comparative Analysis of Image
Recognition System based on Deep Learning on Mobile OS
– Master: Image Recognition and Image Transfer
based on Deep Learning
Self Introduction
Contributions
• Stand-alone DCNN-based mobile image recognition
– No need for a recognition server or network communication.
– Built-in trained DCNN model with UECFOOD-100
– Implemented as an iOS/Android app.
– Released as an iOS app at https://goo.gl/4m2tQz
– and as an Android app (APK) at http://foodcam.mobi/
• Excellent performance with reasonable speed and model size
– UECFOOD-100: 78.8% (top-1), 95.2% (top-5)
in 55.7 ms with 5.5M weights (22 MB)
– Employing Network-in-Network
– Adding batch normalization and additional layers
• Multi-scale recognition
– User can choose the balance between speed and accuracy
• 26.2 ms for 160x160 images ⇔ 55.7 ms for 227x227 images (on iPhone 7 Plus)
CNN architecture (1)
• The numbers of weights in AlexNet and VGG-16 are
too large for mobile.
• GoogLeNet is too complicated
for an efficient parallel implementation
(it has many branches).
CNN architecture (2)
• We adopt Network-in-Network (NIN).
– No fully-connected layers (which greatly reduces the number of weights)
– A straight flow consisting of many conv layers
⇒ easy to implement in parallel.
Efficient computation of the conv layers is needed!
Network-In-Network (NIN)
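NIN replaces fully-connected layers with stacks of 1x1 ("cccp") convolutions, which are just per-pixel linear maps over the channel dimension. A minimal numpy sketch of one cccp layer (shapes, names, and the random weights are illustrative, not the app's actual code):

```python
import numpy as np

def cccp(feature_map, weights, bias):
    """1x1 'cccp' conv: a per-pixel linear map over channels.
    feature_map: (C_in, H, W); weights: (C_out, C_in); bias: (C_out,)"""
    c_in, h, w = feature_map.shape
    flat = feature_map.reshape(c_in, h * w)      # (C_in, H*W)
    out = weights @ flat + bias[:, None]         # a plain GEMM
    return np.maximum(out, 0).reshape(-1, h, w)  # ReLU

# one NIN mlpconv unit = a spatial conv followed by two cccp layers;
# here we only sketch the cccp part on a random feature map
x = np.random.rand(96, 27, 27).astype(np.float32)
w1 = np.random.rand(96, 96).astype(np.float32)
b1 = np.zeros(96, np.float32)
y = cccp(x, w1, b1)
print(y.shape)  # (96, 27, 27)
```

Because a 1x1 conv is already a matrix multiplication, no im2col step is needed for the cccp layers; only the spatial convs require it.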
Extension of NIN
adding BN, 5 layers, and multiple image sizes
• Modified models (BN, 5 layers, multi-scale)
– added BN layers just after every conv/cccp layer
– replaced the 5x5 conv with two 3x3 conv layers
– reduced the number of kernels in conv4 from 1024 to 768
– replaced fixed average pooling with Global Average Pooling (GAP)
• Multiple image sizes (4-layer and 5-layer+BN models)
– Trade-off between accuracy and speed:
• 227x227: 55.7 ms, 78.8%
• 180x180: 35.5 ms, 76.0%
• 160x160: 26.3 ms, 71.5%
[Figure: network diagrams ending in Global Average Pooling (GAP)]
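Global Average Pooling is what makes the multiple input sizes possible: each channel's entire H x W map is averaged to a single value, so the output length depends only on the channel count, not the input resolution. A small numpy illustration (the feature-map sizes are assumptions for the sketch):

```python
import numpy as np

def global_average_pooling(feature_maps):
    """Average each channel's whole H x W map down to one value,
    so the output size is independent of the input resolution."""
    return feature_maps.mean(axis=(1, 2))

# final 100-class maps: larger inputs give larger maps, same output
x227 = np.random.rand(100, 7, 7)  # maps from a 227x227 input (sizes assumed)
x160 = np.random.rand(100, 5, 5)  # smaller maps from a 160x160 input
print(global_average_pooling(x227).shape)  # (100,)
print(global_average_pooling(x160).shape)  # (100,)
```

This is why one trained network can accept 227x227, 180x180, or 160x160 images at inference time: only the spatial map sizes change, and GAP collapses them identically.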
• Speeding up conv layers → speeding up GEMM
– the computation of a conv layer is decomposed into an "im2col"
operation and general matrix multiplication (GEMM)
– Multi-threading: 2 cores on iOS and 4 cores on Android are used
in parallel
– SIMD instructions (NEON on ARM-based processors)
• Total: iOS: 2 cores × 4-way SIMD = 8 parallel operations; Android: 4 cores × 4 = 16
– BLAS library (highly optimized on iOS ⇔ not optimized on Android)
• iOS: BLAS in the iOS Accelerate Framework; Android: OpenBLAS
Fast Implementation on Mobile
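The im2col + GEMM decomposition above can be sketched in plain numpy. This is an illustration only (no padding, square kernels, made-up sizes), not the optimized mobile code:

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unfold (C, H, W) into a (C*k*k, n_patches) matrix so that a
    k x k convolution becomes a single matrix multiplication."""
    c, h, w = x.shape
    oh, ow = (h - k) // stride + 1, (w - k) // stride + 1
    cols = np.empty((c * k * k, oh * ow), dtype=x.dtype)
    idx = 0
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
            cols[:, idx] = patch.ravel()
            idx += 1
    return cols, oh, ow

def conv_as_gemm(x, kernels, stride=1):
    """kernels: (C_out, C_in, k, k) -> conv output via im2col + GEMM."""
    c_out, c_in, k, _ = kernels.shape
    cols, oh, ow = im2col(x, k, stride)
    out = kernels.reshape(c_out, -1) @ cols  # the GEMM
    return out.reshape(c_out, oh, ow)

x = np.random.rand(3, 8, 8).astype(np.float32)
kern = np.random.rand(4, 3, 3, 3).astype(np.float32)
y = conv_as_gemm(x, kern)
print(y.shape)  # (4, 6, 6)
```

The flattened-kernel matrix times the patch matrix is exactly what BLAS (or a hand-written NEON kernel) is then asked to compute.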
Fast computation of conv layers
- efficient GEMM with 4 cores and BLAS/NEON -
• Conv = im2col + GEMM (General Matrix Multiplication)
[Figure: im2col unfolds the input feature maps into a matrix of patches (patch 1-5); the conv kernels (kernel 1-4) are multiplied against it in a single matrix multiplication (= the conv layer computation). The GEMM is partitioned over multiple cores, one block of kernel rows per core (Core 1-4); inside each core, NEON or BLAS is used.]
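The per-core partition in the figure amounts to splitting the GEMM's kernel rows across workers. A hypothetical Python sketch of that idea (numpy releases the GIL inside its BLAS calls, so the threads can genuinely overlap; the matrix sizes below are illustrative):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_gemm(kernels, patches, n_cores=4):
    """Split the kernel rows into one slice per core, mirroring the
    'kernel k on core k' partition in the figure; each worker runs
    its own (smaller) GEMM and the results are stacked in order."""
    slices = np.array_split(np.arange(kernels.shape[0]), n_cores)
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        parts = pool.map(lambda s: kernels[s] @ patches, slices)
    return np.vstack(list(parts))

kernels = np.random.rand(768, 1152)  # e.g. conv4's 768 kernels, flattened
patches = np.random.rand(1152, 169)  # im2col'ed 13x13 output positions
out = parallel_gemm(kernels, patches)
assert np.allclose(out, kernels @ patches)  # same result as one big GEMM
print(out.shape)  # (768, 169)
```

Partitioning by kernel rows keeps each worker's output block disjoint, so no synchronization is needed beyond the final stack.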
Evaluation: Processing time
• iOS: BLAS >> NEON; Android: BLAS << NEON
– The BLAS library in the iOS Accelerate Framework is very efficient!
[Chart: processing times on iOS and Android. Changing the input image size trades accuracy for speed: 160x160 is the fastest, achieving "real" real-time at 26.2 ms, while 227x227 is the most accurate.]
Comparison to FV-based FoodCam
with UEC-FOOD100 dataset
• Much improved (65.3% ⇒ 81.5% top-1)
• Even for 160x160 inputs, improved (65.3% ⇒ 71.5%)
[Chart: top-k accuracy (k = 1-10, y-axis 60-100%) for AlexNet, NIN 5-layer [104 ms], NIN 4-layer [67 ms], NIN 4-layer 160x160 [33 ms], and FV (Color+HOG) [65 ms]. Best NIN: 81.5% top-1 / 96.2% top-5; FV baseline: 65.3% top-1 / 86.7% top-5.]
iOS: DeepFoodCam
https://itunes.apple.com/jp/app/deepfoodcam/id1111261423?mt=8
iOS: RealTimeMultiStyleTransfer
https://itunes.apple.com/jp/app/realtimemultistyletransfer/id1161707531?mt=8