This document proposes a method called deep embedding for scalable image recognition on mobile and IoT devices. Deep neural networks achieve high accuracy but have too many parameters to run on resource-limited devices. The method uses kernel preserving projection to map features from a pretrained DNN into a lower-dimensional space, reducing parameters by 86% while dropping accuracy by only 1.12%. This allows image classification to be done directly on mobile and IoT devices with a small, efficient model that encodes the high-level semantic information of DNNs.
9. Solution: pure mobile system
[System diagram: dataset → on-device feature extraction → LIBLINEAR classification on the device, or send the low-dimensional feature to a server for more complicated jobs]
10. Problem: Limited Storage & Computing Power
• A DNN model has too many parameters to fit on a storage- and compute-limited system such as a mobile or IoT device
• How can we perform image classification on mobile & IoT devices?
11. Krizhevsky et al. model size (AlexNet)
A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, 2012.
Layer: Model Size (MB)
Conv1: float*(48+48)*(3*11^2)     = 0.1
Conv2: float*(128+128)*(48*5^2)   = 1.2
Conv3: float*(192+192)*(256*3^2)  = 3.4
Conv4: float*(192+192)*(192*3^2)  = 2.5
Conv5: float*(128+128)*(192*3^2)  = 1.7
FC6:   float*((128+128)*6^2)*4096 = 144 (66%)
FC7:   float*4096*4096            = 64 (29%)
Total = 217 MB
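As a sanity check, the layer sizes above can be recomputed directly from the parameter counts (a sketch; it assumes 4-byte floats and MB = 2^20 bytes, which reproduces the slide's numbers):

```python
# Recompute AlexNet layer sizes from the parameter counts on the slide.
FLOAT = 4  # bytes per float32 weight

params = {
    "conv1": (48 + 48) * (3 * 11**2),
    "conv2": (128 + 128) * (48 * 5**2),
    "conv3": (192 + 192) * (256 * 3**2),
    "conv4": (192 + 192) * (192 * 3**2),
    "conv5": (128 + 128) * (192 * 3**2),
    "fc6": ((128 + 128) * 6**2) * 4096,  # conv5 output flattened: 256*6*6 = 9216
    "fc7": 4096 * 4096,
}

sizes_mb = {name: FLOAT * n / 2**20 for name, n in params.items()}
total = sum(sizes_mb.values())
fc_share = (sizes_mb["fc6"] + sizes_mb["fc7"]) / total

for name, mb in sizes_mb.items():
    print(f"{name}: {mb:.1f} MB")
print(f"total: {total:.0f} MB, fully connected share: {fc_share:.0%}")
# fc6 + fc7 account for roughly 95% of the model, motivating
# the focus on shrinking the fully connected layers.
```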
12. Solution: Semantic-Rich Low-Dim. Feature
• In recent years, the activations of the fully connected layers of the AlexNet model have come to be viewed as general, high-level semantic features
• 95% of the model parameters are in the fully connected layers
15. Kernel Preserving Projection (KPP)
• find a linear transformation that projects features into a lower-dimensional space while "preserving the relevance distances in kernel space"
Y.-C. Su et al., "Scalable Mobile Visual Classification by Kernel Preserving Projection over High Dimensional Features," IEEE, 2014.
16. Kernel Preserving Projection (KPP)
• find an explicit transform φ(x) such that
  k(xᵢ, xⱼ) ≈ φ(xᵢ) · φ(xⱼ)
• In matrix representation, we want to find a matrix P ∈ ℝ^(d×D) such that
  K ≈ (PX)ᵀ(PX) = XᵀPᵀPX
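The matrix formulation above can be sketched in numpy. This is an illustrative construction only, not necessarily the solver from the cited paper: it takes the top-d eigenpairs of K to get a d-dimensional target embedding Φ (so ΦᵀΦ ≈ K), then fits P by least squares so that PX ≈ Φ:

```python
import numpy as np

def rbf_kernel(X, gamma):
    # X: D x N matrix holding N feature vectors as columns.
    sq = np.sum(X**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)  # pairwise squared distances
    return np.exp(-gamma * d2)                        # N x N kernel matrix

def kpp(X, d, gamma):
    # Find P (d x D) so that (PX)^T (PX) approximates K.
    K = rbf_kernel(X, gamma)
    w, V = np.linalg.eigh(K)              # eigendecompose the PSD kernel matrix
    idx = np.argsort(w)[::-1][:d]         # keep the top-d eigenpairs
    # Phi is d x N with Phi^T Phi = rank-d approximation of K.
    Phi = np.sqrt(np.maximum(w[idx], 0.0))[:, None] * V[:, idx].T
    # Least-squares fit of a linear map: X^T P^T ≈ Phi^T.
    A, *_ = np.linalg.lstsq(X.T, Phi.T, rcond=None)
    return A.T                            # P, shape d x D

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 200))        # D=64 dims, N=200 samples (toy data)
P = kpp(X, d=16, gamma=1.0 / X.shape[0])
Y = P @ X                                 # 16-dim projected features
print(P.shape, Y.shape)
```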
18. Deep Embedding
• Experimental results show that on hand-crafted features, the RBF kernel performs best
• Though infinite-dimensional, the RBF feature space itself is semantically meaningless!
19. Deep Embedding
• For the RBF kernel,
  k(xᵢ, xⱼ) = φ(xᵢ)ᵀ φ(xⱼ) = e^(−γ‖xᵢ − xⱼ‖²)
• For Deep Embedding,
  φ(x) = ReLU(x_conv5 × W_fc6)
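The deep-embedding feature map is just one affine layer plus a ReLU. A minimal sketch, with shapes taken from the AlexNet table above (9216 = 256·6·6 flattened conv5 activations; the random matrix here merely stands in for the pretrained fc6 weights):

```python
import numpy as np

def relu(z):
    # Elementwise rectified linear unit.
    return np.maximum(z, 0.0)

rng = np.random.default_rng(1)
x_conv5 = rng.standard_normal(9216)                # flattened conv5 activations
W_fc6 = rng.standard_normal((9216, 4096)) * 0.01   # stand-in for pretrained fc6 weights
phi = relu(x_conv5 @ W_fc6)                        # 4096-dim deep-embedding feature
print(phi.shape)
```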
22. Result
In the experiment, we use LIBLINEAR as our classifier and perform 10-fold cross-validation on the Scene15 benchmark dataset. We first compare KPP(RBF) with other methods on a state-of-the-art hand-crafted feature (VLAD) to show how KPP outperforms the others.
24. Result - Deep Embed
- The accuracy boost from 75.6% (hand-crafted) to 89.5% (AlexNet) shows the power of DNNs
- Deep embedding outperforms the other methods by a large margin on DNN features
The final model results in:
- Requiring only 14% of the parameters, 86% of the space saved (217 MB → 30 MB)
- An accuracy drop of only 1.12% (89.5% → 88.38%)
- Suitable for mobile & IoT device computing!