This document proposes a method called deep embedding for scalable image recognition on mobile and IoT devices. Deep neural networks achieve high accuracy but have too many parameters to run on resource-limited devices. The method uses kernel preserving projection to map features from a pretrained DNN into a lower-dimensional space, reducing parameters by 86% while dropping accuracy by only 1.12%. This allows image classification to be done directly on mobile and IoT devices with a small, efficient model that encodes the high-level semantic information of DNNs.
9. Solution: pure mobile system
[System diagram: dataset → on-device feature extraction → LIBLINEAR classification on the device, or send the low-dimensional feature to a server for more complicated jobs]
10. Problem: Limited Storage & Computing Power
• A DNN model has too many parameters to fit on a storage- and compute-limited system such as a mobile or IoT device
• How can we perform image classification on mobile & IoT devices?
11. Krizhevsky et al. model size (AlexNet)
A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.
Layer: Model Size (MB)
Conv1: float*(48+48)*(3*11^2)     = 0.1
Conv2: float*(128+128)*(48*5^2)   = 1.2
Conv3: float*(192+192)*(256*3^2)  = 3.4
Conv4: float*(192+192)*(192*3^2)  = 2.5
Conv5: float*(128+128)*(192*3^2)  = 1.7
FC6:   float*((128+128)*6^2)*4096 = 144 (66%)
FC7:   float*4096*4096            = 64 (29%)
Total = 217 MB
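As a sanity check, the layer sizes above can be recomputed directly from the parameter counts (a sketch; it assumes 4-byte floats and MB = 2^20 bytes, which reproduces the slide's numbers):

```python
# Recompute AlexNet layer sizes from the parameter counts on the slide.
FLOAT = 4  # bytes per float32 weight

params = {
    "conv1": (48 + 48) * (3 * 11**2),
    "conv2": (128 + 128) * (48 * 5**2),
    "conv3": (192 + 192) * (256 * 3**2),
    "conv4": (192 + 192) * (192 * 3**2),
    "conv5": (128 + 128) * (192 * 3**2),
    "fc6": ((128 + 128) * 6**2) * 4096,  # conv5 output flattened: 256*6*6 = 9216
    "fc7": 4096 * 4096,
}

sizes_mb = {name: FLOAT * n / 2**20 for name, n in params.items()}
total = sum(sizes_mb.values())
fc_share = (sizes_mb["fc6"] + sizes_mb["fc7"]) / total

for name, mb in sizes_mb.items():
    print(f"{name}: {mb:.1f} MB")
print(f"total: {total:.0f} MB, fully connected share: {fc_share:.0%}")
# fc6 + fc7 account for roughly 95% of the model, motivating
# the focus on shrinking the fully connected layers.
```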
12. Solution: Semantic-Rich Low-Dim. Feature
• In recent years, the activations of the fully connected layers of the AlexNet model have come to be viewed as general, high-level semantic features
• 95% of the model parameters are in the fully connected layers
15. Kernel Preserving Projection (KPP)
• find a linear transformation that projects features into a lower-dimensional space while "preserving the relevance distances in kernel space"
Y.-C. Su et al., "Scalable Mobile Visual Classification by Kernel Preserving Projection over High Dimensional Features," IEEE, 2014.
16. Kernel Preserving Projection (KPP)
• find an explicit transform φ(x) such that
  k(xᵢ, xⱼ) ≈ φ(xᵢ) · φ(xⱼ)
• In matrix representation, we want to find a matrix P ∈ ℝ^(d×D) such that
  K ≈ (PX)ᵀ(PX) = XᵀPᵀPX
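The matrix formulation above can be sketched in numpy. This is an illustrative construction only, not necessarily the solver from the cited paper: it takes the top-d eigenpairs of K to get a d-dimensional target embedding Φ (so ΦᵀΦ ≈ K), then fits P by least squares so that PX ≈ Φ:

```python
import numpy as np

def rbf_kernel(X, gamma):
    # X: D x N matrix holding N feature vectors as columns.
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # pairwise squared distances
    return np.exp(-gamma * d2)                        # N x N kernel matrix

def kpp(X, d, gamma):
    # Find P (d x D) so that (PX)^T (PX) approximates K.
    K = rbf_kernel(X, gamma)
    w, V = np.linalg.eigh(K)              # eigendecompose the PSD kernel matrix
    idx = np.argsort(w)[::-1][:d]         # keep the top-d eigenpairs
    # Phi is d x N with Phi^T Phi = rank-d approximation of K.
    Phi = np.sqrt(np.maximum(w[idx], 0.0))[:, None] * V[:, idx].T
    # Least-squares fit of a linear map: X^T P^T ≈ Phi^T.
    A, *_ = np.linalg.lstsq(X.T, Phi.T, rcond=None)
    return A.T                            # P, shape d x D

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 200))        # D=64 dims, N=200 samples (toy data)
P = kpp(X, d=16, gamma=1.0 / X.shape[0])
Y = P @ X                                 # 16-dim projected features
print(P.shape, Y.shape)
```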
18. Deep Embedding
• Experimental results show that on hand-crafted features, the RBF kernel performs best
• Though infinite-dimensional, the RBF feature space itself is semantically meaningless!
19. Deep Embedding
• For the RBF kernel,
  k(xᵢ, xⱼ) = φ(xᵢ)ᵀ φ(xⱼ) = e^(−γ‖xᵢ − xⱼ‖²)
• For Deep Embedding,
  φ(x) = ReLU(x_conv5 × W_fc6)
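The deep-embedding feature map is just one affine layer plus a ReLU. A minimal sketch, with shapes taken from the AlexNet table above (9216 = 256·6·6 flattened conv5 activations; the random matrix here merely stands in for the pretrained fc6 weights):

```python
import numpy as np

def relu(z):
    # Elementwise rectified linear unit.
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
x_conv5 = rng.standard_normal(9216)                # flattened conv5 activations
W_fc6 = rng.standard_normal((9216, 4096)) * 0.01   # stand-in for pretrained fc6 weights
phi = relu(x_conv5 @ W_fc6)                        # 4096-dim deep-embedding feature
print(phi.shape)
```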
22. Result
In the experiment, we use LIBLINEAR as our classifier and perform 10-fold cross-validation on the Scene15 benchmark dataset. We first compare KPP(RBF) with other methods on a state-of-the-art hand-crafted feature (VLAD) to show how KPP outperforms the others.
24. Result - Deep Embed
- The accuracy boost from 75.6% (hand-crafted) to 89.5% (AlexNet) shows the power of DNNs
- Deep embedding outperforms the other methods by a large margin on DNN features
The final model results in:
- Requiring only 14% of the parameters, 86% of the space saved (217 MB → 30 MB)
- An accuracy drop of only 1.12% (89.5% → 88.38%)
- Suitable for mobile & IoT device computing!