Scalable Image Recognition
Model with Deep Embedding
Chieh-En Tsai
b01902004@cml.csie.ntu.edu.tw
Motivation
Motivation: the rise of DNNs
• Deep Neural Networks have achieved the best
performance on a variety of visual tasks.
Motivation: popular mobile devices
• Devices like smartphones, in-car cameras, GoPro,
and IoT devices are popping up everywhere.
A huge amount of valuable images is stored not on servers,
but on mobile & IoT devices
Motivation: exploit DNNs
• High performance brought by DNNs
• Valuable data brought by mobile & IoT
devices
How do we get the best of both worlds?
Solution: client-server system
La Tour Eiffel
7–12 sec per query on average
Can’t support real-time applications
Or, another way
Solution: pure mobile system
[Diagram: dataset → feature extraction → linear classification (LIBLINEAR) on the device, or send the low-dim. feature to the server for more complicated jobs]
Problem: Limited Storage &
Computing Power
• A DNN model has too many parameters to fit in a
storage- and compute-limited system such as a
mobile or IoT device
• How can we perform image classification on mobile
& IoT devices?
Krizhevsky et al. model size (AlexNet)
A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012.
Layer: Model Size (MB)
Conv1: float * (48+48) * (3*11^2) = 0.1
Conv2: float * (128+128) * (48*5^2) = 1.2
Conv3: float * (192+192) * (256*3^2) = 3.4
Conv4: float * (192+192) * (192*3^2) = 2.5
Conv5: float * (128+128) * (192*3^2) = 1.7
FC6: float * ((128+128)*6^2) * 4096 = 144 (66%)
FC7: float * 4096 * 4096 = 64 (29%)
Total = 217 MB
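The per-layer sizes in the table above can be reproduced with a short sketch (assuming 4-byte floats and the two-GPU filter split of the original AlexNet):

```python
# Parameter storage per AlexNet layer, assuming 4-byte floats.
# Filter counts follow the original two-GPU split (e.g. 48+48 in conv1).
FLOAT = 4  # bytes per parameter

layers = {
    # name: (number of output units/filters, weights per unit)
    "conv1": (48 + 48,   3 * 11 ** 2),
    "conv2": (128 + 128, 48 * 5 ** 2),
    "conv3": (192 + 192, 256 * 3 ** 2),
    "conv4": (192 + 192, 192 * 3 ** 2),
    "conv5": (128 + 128, 192 * 3 ** 2),
    "fc6":   (4096,      (128 + 128) * 6 ** 2),
    "fc7":   (4096,      4096),
}

sizes_mb = {name: FLOAT * n * w / 2 ** 20 for name, (n, w) in layers.items()}
total_mb = sum(sizes_mb.values())

for name, mb in sizes_mb.items():
    print(f"{name}: {mb:6.1f} MB ({100 * mb / total_mb:4.1f}%)")
print(f"total: {total_mb:6.1f} MB")
```

Running this confirms that FC6 and FC7 together account for roughly 95% of the 217 MB total, which is exactly the observation the next slide exploits.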
Solution:
Semantic-Rich Low-Dim. Feature
• In recent years, the activations of the fully
connected layers of AlexNet have been viewed as
general, high-level semantic features
• Yet ~95% of the model parameters sit in the fully
connected layers
Solution:
Semantic-Rich Low-Dim. Feature
Drop the fully connected layers from the final model
while still encoding their information!
How?
Kernel Preserving Projection (KPP)
• Find a linear transformation that projects
features into a lower-dimensional space
while “preserving the relevance distances of the
kernel space”
Y.-C. Su et al., “Scalable Mobile Visual Classification by Kernel Preserving Projection over High Dimensional Features,” IEEE, 2014
Kernel Preserving Projection (KPP)
• Find an explicit transform 𝜙(𝑥) such that:
  𝑘(𝑥ᵢ, 𝑥ⱼ) ≈ 𝜙(𝑥ᵢ) ∙ 𝜙(𝑥ⱼ)
• In matrix form, we want to find a matrix 𝑷 ∈ ℝ^(d×D) such that:
  𝑲 ≈ (𝑷𝑿)ᵀ(𝑷𝑿) = 𝑿ᵀ𝑷ᵀ𝑷𝑿
Kernel Preserving Projection (KPP)
• MVProjection:
  𝑷* = argmin_𝑷 ‖𝑲 − 𝑿ᵀ𝑷ᵀ𝑷𝑿‖_F − 𝜆‖𝑿ᵀ𝑷ᵀ𝑷𝑿‖_F
• L1MVProjection:
  𝑷* = argmin_𝑷 ‖𝑲 − 𝑿ᵀ𝑷ᵀ𝑷𝑿‖_F − 𝜆‖𝑿ᵀ𝑷ᵀ𝑷𝑿‖_F + 𝜂‖𝑷‖₁
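A minimal sketch of the KPP idea, ignoring the λ and η regularizers above: since the goal is 𝑲 ≈ (𝑷𝑿)ᵀ(𝑷𝑿), one simple way to fit a 𝑷 (assuming the kernel matrix is positive semi-definite) is to take the rank-d eigendecomposition of 𝑲 as the target low-dimensional embedding and solve 𝑷𝑿 = 𝑩 by least squares. This is an illustration of the objective, not the optimizer used in the paper.

```python
import numpy as np

def fit_kpp(X, K, d):
    """Fit a projection P (d x D) such that (P X)^T (P X) ~= K.

    Sketch only: take the rank-d eigendecomposition K ~= U diag(s) U^T,
    set B = diag(sqrt(s)) U^T as the d x N target embedding, then solve
    P X = B in the least-squares sense via the pseudoinverse.
    X: D x N feature matrix (one column per sample), K: N x N PSD kernel.
    """
    s, U = np.linalg.eigh(K)                    # eigenvalues, ascending
    top = np.argsort(s)[::-1][:d]               # indices of d largest
    s_top = np.clip(s[top], 0.0, None)          # guard tiny negatives
    B = np.sqrt(s_top)[:, None] * U[:, top].T   # d x N target embedding
    P = B @ np.linalg.pinv(X)                   # d x D, so that P X ~= B
    return P

# toy check with a linear kernel (hypothetical sizes, for illustration)
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))   # D=20 features, N=100 samples
K = X.T @ X                          # linear kernel, rank <= 20
P = fit_kpp(X, K, d=10)
approx = (P @ X).T @ (P @ X)         # should approximate K
```

The approximation error here is exactly the energy in the dropped eigenvalues; the paper's MVProjection / L1MVProjection objectives additionally trade this off against the −λ and +η terms.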
Deep Embedding
• Experimental results show that on hand-crafted
features, the RBF kernel performs best
• Though infinite-dimensional, the RBF space itself is
semantically meaningless!
Deep Embedding
• For the RBF kernel,
  𝑘(𝑥ᵢ, 𝑥ⱼ) = 𝜙(𝑥ᵢ)ᵀ ∙ 𝜙(𝑥ⱼ) = e^(−𝛾‖𝑥ᵢ−𝑥ⱼ‖²)
• For Deep Embedding,
  𝜙(𝑥) = ReLU(𝑥_conv5 × 𝑾_fc6)
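The contrast between the two embeddings can be written out in a few lines (γ and the array shapes here are illustrative, not the values used in the experiments):

```python
import numpy as np

def rbf_kernel(xi, xj, gamma=0.5):
    # k(xi, xj) = exp(-gamma * ||xi - xj||^2):
    # the feature map phi is implicit and infinite-dimensional.
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def deep_embedding(x_conv5, W_fc6):
    # phi(x) = ReLU(x_conv5 @ W_fc6):
    # an explicit, finite-dimensional, semantically meaningful map,
    # whose inner product phi(xi) . phi(xj) plays the kernel's role.
    return np.maximum(x_conv5 @ W_fc6, 0.0)
```

The point of the slide is that the fc6 weights give an *explicit* 𝜙 with semantic structure, so the projection can target it directly instead of the opaque RBF space.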
Deep Embedding
Not only is the model reduced,
but so is the classifier
Result
In the experiment, we use LIBLINEAR as our
classifier and perform 10-fold cross-validation on the
Scene-15 benchmark dataset. We first compare KPP (RBF)
against other methods on a hand-crafted state-of-the-art
feature (VLAD) to show how KPP outperforms the
others.
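The 10-fold protocol can be sketched as follows; the classifier is left pluggable (in the experiments it would be LIBLINEAR on the projected features, while the nearest-centroid stand-in below is purely illustrative):

```python
import numpy as np

def ten_fold_accuracy(X, y, train_and_predict, n_folds=10, seed=0):
    """Mean accuracy over an n-fold cross-validation split.

    train_and_predict(X_tr, y_tr, X_te) -> predicted labels for X_te.
    X: (N, D) feature rows, y: (N,) integer labels.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for i in range(n_folds):
        te = folds[i]
        tr = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        pred = train_and_predict(X[tr], y[tr], X[te])
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs))

def nearest_centroid(X_tr, y_tr, X_te):
    # stand-in classifier: assign each test point to the nearest class mean
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    d = ((X_te[:, None, :] - centroids[None]) ** 2).sum(-1)
    return classes[np.argmin(d, axis=1)]
```

Each sample is used for testing exactly once, and the reported number is the mean accuracy over the 10 folds.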
Result
Result-Deep Embed
- The accuracy boost from 75.6% (hand-crafted) to 89.5% (AlexNet)
shows the power of DNNs
- Deep embedding outperforms the other methods by a
large margin on DNN features.
The final model:
- Requires only 14% of the parameters, saving 86% of the
space (217 MB → 30 MB)
- Loses only 1.12% accuracy (89.5% → 88.38%)
- Is suitable for mobile & IoT device computing!
Result-Deep Embed
[Chart: final model size reduced to 30 MB]
Thank you!
