Qualcomm research-imagenet2015

Description of how we as Qualcomm Research used deep learning at ImageNet 2015

Published in: Science

  1. NeoNet: Object centric training for image recognition
     Daniel Fontijne, Koen E. A. van de Sande, Eren Gölge, R. Blythe Towal, Anthony Sarah, Cees G. M. Snoek
     Qualcomm Technologies, Inc., December 17, 2015
     Presented by: Daniel Fontijne, Senior Staff Engineer
  2. Summary
     Key component: object centric training.

     | Task | Score | Ranking |
     | --- | --- | --- |
     | Classification | 4.8 | - |
     | Localization | 12.6 | 3 |
     | Detection | 53.6 | 2 |
     | Places 2 | 17.6 | 3 |

  3. Agenda: 1 Foundation, 2 Classification, 3 Localization, 4 Detection, 5 Places 2
  4. Foundation: Batch-normalized inception
     The base network for all our submissions is the inception network as introduced in the batch normalization paper by Ioffe & Szegedy.
     Reference: Ioffe & Szegedy, ICML 2015.
  5. Network in an inception module
     Note: the 5x5 path is not used.
     Reference: Lin et al., ICLR 2014.
  6. Agenda: 1 Foundation, 2 Classification, 3 Localization, 4 Detection, 5 Places 2
  7. Classification overview
     − Ensemble of 12 networks.
     − Train ‘really long’, 350 epochs.
     − Randomized ReLU.
     − Test at 14 scales, 10 crops.
     − Object preserving crops.
     Reference: Xu et al., ICML workshop 2015.
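The test-time recipe on this slide (14 scales, 10 crops, averaged over an ensemble) can be illustrated with a minimal sketch. The slide does not say how the 10 crops are chosen; the layout below (four corners plus center, each plain and mirrored) is an assumption, and `ten_crop_boxes` and `average_scores` are hypothetical helper names.

```python
def ten_crop_boxes(width, height, crop):
    """Return 10 views as (x0, y0, x1, y1, mirrored).

    Assumed layout: four corners plus center, each plain and mirrored.
    """
    xs = (0, width - crop)
    ys = (0, height - crop)
    origins = [(x, y) for y in ys for x in xs]
    origins.append(((width - crop) // 2, (height - crop) // 2))
    views = []
    for mirrored in (False, True):
        for x, y in origins:
            views.append((x, y, x + crop, y + crop, mirrored))
    return views


def average_scores(score_lists):
    """Average per-class scores over views (or over networks in an ensemble)."""
    n = len(score_lists)
    return [sum(column) / n for column in zip(*score_lists)]


# 14 scales times 10 crops gives 140 views per network.
views_per_network = 14 * len(ten_crop_boxes(256, 256, 224))
```

Averaging is only one way to combine views; the slides do not state which combination rule was used.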
  8. Quiz: What is this?
  9. Answer: Flower
  10. Quiz: In case you got that right, what is this?
  11. Answer: Butterfly
  12. Object preserving crops
     Random crop selection might miss the object of interest: the network tries to remember ‘butterfly’ when presented with leaves.
     Solution: use the provided boxes to ensure the crop contains the object.
     − For images without box annotation, use the best box predicted by the localization system.
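A minimal sketch of an object preserving crop follows, assuming that "contains the object" means the square crop fully covers the annotated box whenever the box fits. The function name, the square-crop choice, and the fallback of centering on the box when it is larger than the crop are all assumptions, not the authors' implementation.

```python
import random


def object_preserving_crop(img_w, img_h, box, crop, rng=random):
    """Sample a crop x crop window that covers `box` = (x0, y0, x1, y1)."""

    def pick_origin(box_lo, box_hi, limit):
        # Valid origins o satisfy o <= box_lo and o + crop >= box_hi,
        # while keeping the window inside the image.
        lo = max(0, box_hi - crop)
        hi = min(box_lo, limit - crop)
        if lo > hi:
            # The box does not fit in the crop: center the crop on the box.
            origin = (box_lo + box_hi - crop) // 2
            return max(0, min(origin, max(0, limit - crop)))
        return rng.randint(lo, hi)

    cx = pick_origin(box[0], box[2], img_w)
    cy = pick_origin(box[1], box[3], img_h)
    return cx, cy, cx + crop, cy + crop
```

For example, `object_preserving_crop(256, 256, (100, 100, 150, 150), 224)` returns a random 224x224 window that still contains the box, where a plain random crop offers no such guarantee.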
  13. Component breakdown

     | Configuration | Epochs | Single view | Multi-view |
     | --- | --- | --- | --- |
     | First attempt at inception + batch norm | 112 | 8.63% | 6.58% |
     | Train ~325 epochs | 324 | 8.77% | 6.34% |
     | 32 images / mini-batch | 130 | 8.74% | 6.68% |
     | Object preserving, 32 images / mini-batch | 120 | 8.59% | 6.51% |
     | Object preserving with generated boxes | 130 | 8.47% | 6.46% |
     | Ensemble of 12 | - | - | 4.84% |

  14. Final classification results
     NeoNet is competitive on object classification. Top-5 classification error on test set:

     | Entry | Top-5 error |
     | --- | --- |
     | SuperVision ('12) | 16.4 |
     | Clarifai ('13) | 11.7 |
     | GoogLeNet ('14) | 6.7 |
     | Ioffe & Szegedy, ICML '15 | 4.9 |
     | NeoNet | 4.8 |
     | Trimps-Soushen | 4.6 |
     | ReCeption | 3.6 |
     | MSRA | 3.6 |

  15. Agenda: 1 Foundation, 2 Classification, 3 Localization, 4 Detection, 5 Places 2
  16. Localization overview
     Foundations.
     − Generate box proposals using fast selective search.
     − Train box-classification networks on crops.
     Object centric training.
     − Object pre-training network.
     − Object localization network.
     − Object alignment network.
     References: Girshick et al., PAMI 2016; Uijlings et al., IJCV 2013.
  17. Object centric pre-training
     Use the bounding box annotations for pre-training.
     Increase the number of classes from N to 2*N+1:
     − N classes for the object, well-framed.
     − N classes for partially framed objects.
     − 1 class for ‘background’, i.e., object not visible.
     1% – 1.5% improvement compared to standard pre-training.
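The 2*N+1 labeling can be sketched as a function of how much of the object box falls inside a training crop. The slide does not define "well-framed" or "partially framed"; the coverage measure, the hypothetical threshold `well_framed_min=0.9`, and the label layout (0..N-1 well-framed, N..2N-1 partial, 2N background) are assumptions made for illustration.

```python
def coverage(crop, box):
    """Fraction of the object box area that lies inside the crop."""
    ix = max(0, min(crop[2], box[2]) - max(crop[0], box[0]))
    iy = max(0, min(crop[3], box[3]) - max(crop[1], box[1]))
    area = (box[2] - box[0]) * (box[3] - box[1])
    return ix * iy / area if area > 0 else 0.0


def object_centric_label(class_id, crop, box, n_classes, well_framed_min=0.9):
    """Map (class, framing) to one of 2 * n_classes + 1 labels."""
    c = coverage(crop, box)
    if c == 0.0:
        return 2 * n_classes          # background: object not visible
    if c >= well_framed_min:
        return class_id               # well-framed object
    return n_classes + class_id       # partially framed object
```

With N = 1000 this yields 2001 distinct labels, which matches the 2001-output head mentioned on the next slide.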
  18. Object centric pre-training
     Dual-head network to account for missing bounding boxes.
     − One with 1000 outputs.
     − One with 2001 outputs. No error gradient when box annotation is missing.
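One way to obtain "no error gradient when box annotation is missing" is to drop the second head's loss term for those images, as in the sketch below. The equal weighting of the two heads and the use of softmax cross-entropy on both are assumptions; `dual_head_loss` is a hypothetical name.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def dual_head_loss(class_logits, centric_logits, class_label, centric_label=None):
    """Sum the losses of both heads; skip the object centric head without a box."""
    loss = cross_entropy(class_logits, class_label)
    if centric_label is not None:
        # Only images with a box annotation contribute to the 2N+1 head.
        loss += cross_entropy(centric_logits, centric_label)
    return loss
```

Because the skipped term is absent from the loss, its gradient with respect to the second head is zero for that image, while the shared layers still learn from the 1000-way head.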
  19. Object localization network
     − Fully connected layer on top of Inception 4e and 5b.
     − Re-train Inception 5b and new head.
     − Then fine-tune entire network.
  20. Quiz: Is this an entire skyscraper?
  21. Bordering the object
     A 40% border worked best.
     − Such that in 7x7 resolution of Inception 5b there is a 1 pixel border.
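The link between a 40% border and a 1 pixel border at 7x7 resolution can be checked with a little arithmetic, assuming the border is measured relative to the object size and split evenly over both sides: the crop is then 1.4 times the object, so the object covers 7 / 1.4 = 5 cells and one cell remains on each side. The helper names below are hypothetical.

```python
def add_border(box, border):
    """Grow (x0, y0, x1, y1) by `border` times its size, half on each side."""
    x0, y0, x1, y1 = box
    pad_x = (x1 - x0) * border / 2
    pad_y = (y1 - y0) * border / 2
    return (x0 - pad_x, y0 - pad_y, x1 + pad_x, y1 + pad_y)


def border_cells(grid, border):
    """Border width per side, in cells of a grid x grid feature map."""
    return grid * (border / 2) / (1 + border)


# A 40% border on a 7x7 map leaves roughly one cell on each side.
cells = border_cells(7, 0.4)
```

Under the same assumption, the 100% border used later for alignment would make the crop twice the object size.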
  22. Object alignment network
     − Extra head for object box alignment.
     − Classification head is also used, but with cross entropy cost.
  23. Object alignment border
     − Object box alignment moves corners up to 50% of the width and height.
     − 100% border allows network to ‘see’ full range of possible alignments.
     − ~2% gain.
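A minimal sketch of applying an alignment to a box is shown below. The slides do not give the output parameterization of the alignment head; here it is assumed to be four offsets (one per corner coordinate) expressed as fractions of the box width and height and limited to 50%. `align_box` is a hypothetical name.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def align_box(box, offsets, max_shift=0.5):
    """Shift corners of (x0, y0, x1, y1) by fractions of its width and height.

    `offsets` = (dx0, dy0, dx1, dy1), each clamped to [-max_shift, max_shift].
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    dx0, dy0, dx1, dy1 = (clamp(o, -max_shift, max_shift) for o in offsets)
    return (x0 + dx0 * w, y0 + dy0 * h, x1 + dx1 * w, y1 + dy1 * h)
```

Pushing every corner outward by the full 50% turns a 100x100 box into a 200x200 one, which is exactly the extent of a 100% border when that border is split evenly over both sides. This is consistent with the slide's remark that the 100% border lets the network see the full range of possible alignments.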
  24. Component breakdown

     | Configuration | Top-5 localization error |
     | --- | --- |
     | First attempt | 24.0% |
     | 40% border, FC on top of inception 5b | 22.5% |
     | FC on top of inception 5b+4e | 21.8% |
     | Object centric pre-training | 20.3% |
     | Ensemble of 8 | 17.5% |
     | Object alignment | 15.5% |
     | Final result with ILSVRC blacklist applied | 14.5% |
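For readers unfamiliar with the metric, the sketch below shows how a top-5 localization hit could be decided for one image. It assumes the usual ILSVRC-style rule that a prediction counts when its label is correct and its box overlaps a ground-truth box of that class by more than 50% intersection over union; the exact threshold handling and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_localized(predictions, truth_label, truth_boxes, min_iou=0.5):
    """True if any of the first five (label, box) predictions is a hit."""
    for label, box in predictions[:5]:
        if label == truth_label and any(iou(box, t) > min_iou for t in truth_boxes):
            return True
    return False
```

The error reported in the table would then be the fraction of images for which `is_localized` is false.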
  25. Final localization results
     NeoNet is competitive on object localization. Top-5 localization error on test set:

     | Entry | Top-5 error |
     | --- | --- |
     | UvA ('11) | 42.5 |
     | SuperVision ('12) | 34.2 |
     | OverFeat ('13) | 30.0 |
     | VGG ('14) | 25.3 |
     | NeoNet | 12.6 |
     | Trimps-Soushen | 12.3 |
     | MSRA | 9.0 |

  26. Agenda: 1 Foundation, 2 Classification, 3 Localization, 4 Detection, 5 Places 2
  27. Improved selective search

     | | Fast | Improved |
     | --- | --- | --- |
     | Color spaces | 2 | 3 |
     | Segmentations | 2 | 4 |
     | Similarity functions | 2 | 4 |
     | Average boxes | 1,600 | 5,000 |
     | MABO | 77.5 | 82.6 |
     | Time (s) | 0.8 | 2.4 |
     | mAP | 41.2 | 44.0 |
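MABO (mean average best overlap) in the table measures proposal quality: for every ground-truth object take the best overlap with any proposal in its image, average per class, then average over classes. The sketch below follows that definition; the data layout (`objects` as tuples, `proposals` as a dictionary keyed by image id) is an assumption chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mabo(objects, proposals):
    """objects: list of (image_id, class_id, box); proposals: image_id -> boxes."""
    best_per_class = {}
    for image_id, class_id, box in objects:
        overlaps = [iou(box, p) for p in proposals.get(image_id, [])]
        best_per_class.setdefault(class_id, []).append(max(overlaps, default=0.0))
    if not best_per_class:
        return 0.0
    abo = [sum(v) / len(v) for v in best_per_class.values()]
    return sum(abo) / len(abo)
```

The function returns a value in [0, 1]; the table reports the same quantity as a percentage.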
  28. Object detection network
     Five inception-style networks for feature extraction:
     − Two trained on 1,000 object classes, no input border, fine-tuning on detection boxes.
     − Three trained on 1,000 object windows with input border, no fine-tuning.
  29. Component breakdown

     | Configuration | mAP on validation set |
     | --- | --- |
     | Best object class network | 44.6 |
     | Best object centric network | 47.7 |
     | Ensemble of 5 | 51.9 |

  30. Component breakdown

     | Configuration | mAP on validation set |
     | --- | --- |
     | Best object class network | 44.6 |
     | Best object centric network | 47.7 |
     | Ensemble of 5 | 51.9 |
     | + context | 53.2 |

     Context: four classification networks fine-tuned with 200 detection class labels.
  31. Component breakdown

     | Configuration | mAP on validation set |
     | --- | --- |
     | Best object class network | 44.6 |
     | Best object centric network | 47.7 |
     | Ensemble of 5 | 51.9 |
     | + context | 53.2 |
     | + object alignment | 54.6 |

  32. Final detection results
     NeoNet is competitive on object detection. Mean average precision on test set:

     | Entry | mAP |
     | --- | --- |
     | UvA/Euvision ('13) | 22.6 |
     | GoogLeNet ('14) | 43.9 |
     | Deep-ID Net | 52.7 |
     | NeoNet | 53.6 |
     | MSRA | 62.1 |

  33. Agenda: 1 Foundation, 2 Classification, 3 Localization, 4 Detection, 5 Places 2
  34. Places 2 overview
     Our best submission: an ensemble of two inception nets.
     − Reduce fully connected layer from 1,000 to 401 outputs.
     − Use pre-trained weights from ImageNet 1,000 (~325 epochs).
     − Train Inception 5b and fully connected layer for two epochs.
     − Fine-tune entire network for eight epochs.
     Adding other networks reduced the accuracy.
  35. Component breakdown (top-5 error)

     | Configuration | Single view | Multi view |
     | --- | --- | --- |
     | ~325 epochs pre-training | 17.9% | 16.8% |
     | First attempt. 112 epochs pre-training. | 19.1% | 17.9% |
     | 512 channel 5b, Alex-style FC head | 20.0% | 18.4% |
     | 32 images / batch | 18.7% | 17.6% |
     | Randomized ReLU | 18.2% | 17.5% |
     | Ensemble of 7 | - | 16.7% |
     | Ensemble of 2 | - | 16.5% |

  36. Final Places 2 results
     NeoNet is competitive on scene classification. Top-5 classification error on test set:

     | Entry | Top-5 error |
     | --- | --- |
     | HiVision | 20 |
     | MERL | 19.4 |
     | ntu_rose | 19.3 |
     | Trimps-Soushen | 18.0 |
     | NeoNet | 17.6 |
     | SIAT_MMLAB | 17.4 |
     | WM | 16.9 |

  37. On device recognition at 18 ms
  38. Summary
     Key component: object centric training.

     | Task | Score | Ranking |
     | --- | --- | --- |
     | Classification | 4.8 | - |
     | Localization | 12.6 | 3 |
     | Detection | 53.6 | 2 |
     | Places 2 | 17.6 | 3 |

  39. Thank you
     Nothing in these materials is an offer to sell any of the components or devices referenced herein.
     ©2013-2015 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.
     Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Zeroth is a trademark of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.
     References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.
     For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog