ARTIFICIAL
INTELLIGENCE
RESEARCH
INSTITUTE
Learning Deep Features for
Discriminative Localization
CVPR 2016
Authors : Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Computer Science and Artificial Intelligence Laboratory, MIT
Presented by Kwang Hee Lee
In this work…
▪ Propose: a simple modification of the GAP (Global Average Pooling) layer + Class Activation Mapping (CAM)
▪ Localize the discriminative image regions using CAM in a single forward pass
▪ Achieve 37.1% top-5 error for object localization on ILSVRC2014 (close to the 34.2% top-5 test error achieved by fully supervised AlexNet)
▪ A generic localizable deep representation that can be applied to a variety of tasks (transferred to other recognition datasets for generic classification, localization and concept discovery)
Related Work
▪ Global Average Pooling
  • First proposed in Network In Network (NIN).
  • Fully-connected layers are prone to overfitting.
  • Structural regularizer: used in place of FC layers to reduce the number of parameters and prevent overfitting.
  • Because GAP sums out the spatial information within each channel, it is more robust to spatial translations of the input.
Network in Network : https://arxiv.org/pdf/1312.4400.pdf
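As a rough illustration of the translation robustness of GAP, the sketch below pools a small feature tensor with NumPy. The function name `global_average_pool`, the (channels, height, width) layout, and the toy values are assumptions made for this example and do not come from the NIN paper.

```python
import numpy as np

def global_average_pool(features: np.ndarray) -> np.ndarray:
    """Average each channel over its spatial grid: (K, H, W) -> (K,)."""
    return features.mean(axis=(1, 2))

# One channel with a single active location (toy data).
fmap = np.zeros((1, 4, 4))
fmap[0, 1, 1] = 8.0

# Move the activation one pixel to the right (circular shift, demo only).
shifted = np.roll(fmap, shift=1, axis=2)

pooled = global_average_pool(fmap)
pooled_shifted = global_average_pool(shifted)

# The pooled value is the same in both cases because the spatial
# position of the activation is averaged out.
print(pooled, pooled_shifted)
```

The pooled vector also has no learnable parameters, which is the sense in which GAP acts as a structural regularizer compared with an FC layer.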
Related Work
▪ Weakly-supervised Object Localization
  • Not end-to-end systems [1,2,15]
  • Require multiple forward passes of a network [1,2,15]
  • Global max pooling (point localization) [16]
  • This work: end-to-end system, single forward pass, global average pooling
  • The authors claim that their contribution is not GAP itself, but the observation that it can be applied to accurate discriminative localization.
Related Work
▪ Visualizing CNN
  • Passing through the fully connected layers discards the network's good localization ability, so only an incomplete activation map is obtained.
  • The proposed method replaces the fully connected layers with Global Average Pooling, which maintains performance while making the network understandable from beginning to end.
Visualizing and understanding convolutional networks. ECCV 2014
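Class Activation Mapping (CAM)
Global average pooling of feature map $f_k$:
$$F_k = \sum_{x,y} f_k(x, y)$$
Score for class $c$, which is fed to the softmax, $\mathrm{Softmax}(S_c)$:
$$S_c = \sum_k w_k^c \sum_{x,y} f_k(x, y) = \sum_{x,y} \sum_k w_k^c f_k(x, y)$$
Class activation map for class $c$:
$$M_c(x, y) = \sum_k w_k^c f_k(x, y)$$
$M_c(x, y)$: importance of the activation at spatial grid (x, y) leading to the classification of an image to class c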
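The CAM definition can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes: `features` stands for the last conv layer activations $f_k$ with layout (K, H, W), `weights` for the post-pooling weights $w_k^c$ with layout (C, K), and the sizes and random values are made up for the example.

```python
import numpy as np

def class_activation_map(features: np.ndarray, weights: np.ndarray, c: int) -> np.ndarray:
    """M_c(x, y) = sum_k w_k^c * f_k(x, y).

    features: (K, H, W) activations f_k of the last conv layer.
    weights:  (C, K) weights w_k^c of the layer that follows pooling.
    """
    return np.tensordot(weights[c], features, axes=([0], [0]))

rng = np.random.default_rng(0)
K, H, W, C = 8, 14, 14, 3            # illustrative sizes only
features = rng.random((K, H, W))
weights = rng.standard_normal((C, K))

# Class scores: S_c = sum_k w_k^c * F_k, with F_k = sum_{x,y} f_k(x, y).
F = features.sum(axis=(1, 2))
scores = weights @ F

cam = class_activation_map(features, weights, c=1)

# Summing the map over the grid recovers the class score S_c,
# which is why M_c can be read as a per-location contribution.
print(np.isclose(cam.sum(), scores[1]))
```

Because the map reuses the weights already learned for classification, it is obtained from the same forward pass that produces the class scores.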
Global average pooling (GAP) vs global max pooling (GMP)
▪ GAP loss encourages the network to identify the extent of the object.
▪ GMP loss encourages the network to identify just one discriminative part.
▪ GAP takes every discriminative part into account when computing the score of a map, whereas with GMP the low scores of all regions other than the most discriminative one have no impact on the overall score.
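The difference between the two pooling choices can be seen on a toy example. The sketch below uses two made-up 3x3 maps that share the same peak; the helper names `gap_score` and `gmp_score` are hypothetical and only serve this illustration.

```python
import numpy as np

def gap_score(score_map: np.ndarray) -> float:
    """Global average pooling: every location contributes."""
    return float(score_map.mean())

def gmp_score(score_map: np.ndarray) -> float:
    """Global max pooling: only the strongest location contributes."""
    return float(score_map.max())

# Only the most discriminative part is activated.
partial = np.array([[0.1, 0.1, 0.1],
                    [0.1, 1.0, 0.1],
                    [0.1, 0.1, 0.1]])

# Same peak, but the rest of the object is activated as well.
full = np.array([[0.6, 0.6, 0.6],
                 [0.6, 1.0, 0.6],
                 [0.6, 0.6, 0.6]])

print("GAP:", gap_score(partial), gap_score(full))
print("GMP:", gmp_score(partial), gmp_score(full))
```

GMP assigns both maps the same score, so it gives the network no reason to activate the rest of the object, while the GAP score grows as more of the object is covered.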
Weakly-supervised Object Localization
▪ Setup
  • Modify AlexNet, VGGnet, and GoogLeNet
  • Replace the fully connected layers with GAP
  • Remove several conv layers to increase the mapping resolution
  • Modifications
    • AlexNet: remove the layers after conv5, mapping resolution 13x13 (AlexNet-GAP)
    • VGGnet: remove the layers after conv5-3, mapping resolution 14x14 (VGGnet-GAP)
    • GoogLeNet: remove the layers after inception4e, mapping resolution 14x14 (GoogLeNet-GAP)
    • Finally, add a 3x3 conv layer (stride 1, pad 1, 1024 units), followed by a GAP layer and softmax
  • Fine-tuning on the ILSVRC dataset (1000 classes)
  • Classification performance comparison
    • Compared with the original AlexNet, VGGnet, GoogLeNet, and NIN
  • Localization performance comparison
    • Original GoogLeNet, NIN, backpropagation instead of CAM
    • GoogLeNet-GAP compared with GoogLeNet-GMP
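A quick arithmetic check shows why the added 3x3 conv layer (stride 1, pad 1) keeps the mapping resolution unchanged. The helper `conv_output_size` is a hypothetical name for the standard convolution output-size formula and is not taken from the paper.

```python
def conv_output_size(size: int, kernel: int, stride: int, pad: int) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

# Mapping resolutions listed in the setup above.
resolutions = [("AlexNet-GAP", 13), ("VGGnet-GAP", 14), ("GoogLeNet-GAP", 14)]

for name, resolution in resolutions:
    out = conv_output_size(resolution, kernel=3, stride=1, pad=1)
    print(f"{name}: {resolution}x{resolution} -> {out}x{out}")
```

With these settings the CAM computed after the new layer has the same 13x13 or 14x14 grid as the layer it was attached to.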
Weakly-supervised Object Localization
▪ Classification Results
  • Removing the conv layers causes only a small performance drop of 1-2%.
  • Good classification performance is important for achieving high localization performance.
Weakly-supervised Object Localization
▪ Localization Results
  • Bounding box generation: a simple thresholding technique to segment the heatmap (keep the regions above 20% of the max value of the CAM, then take the largest connected component).
  • Two bounding boxes (one tight and one loose) are selected from the CAMs of the first and second predicted classes, and one loose bounding box from the CAM of the third predicted class.
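The thresholding step can be sketched as follows. This is a minimal version under stated assumptions: the function name `cam_to_bbox` is hypothetical, 4-connectivity is an assumed choice, the CAM is assumed to have a positive maximum, and the box is returned in CAM grid coordinates rather than image pixels.

```python
from collections import deque
import numpy as np

def cam_to_bbox(cam: np.ndarray, ratio: float = 0.2):
    """Return (x_min, y_min, x_max, y_max) around the largest connected
    component of the region where cam > ratio * cam.max()."""
    mask = cam > ratio * cam.max()
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for y, x in zip(*np.nonzero(mask)):
        if seen[y, x]:
            continue
        # Breadth-first search over 4-connected neighbours (assumed choice).
        component, queue = [], deque([(y, x)])
        seen[y, x] = True
        while queue:
            cy, cx = queue.popleft()
            component.append((cy, cx))
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(component) > len(best):
            best = component
    ys, xs = zip(*best)
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))

# Toy CAM: one large blob and one small, separate blob.
cam = np.zeros((6, 6))
cam[1:4, 2:5] = 1.0
cam[5, 0] = 0.9

bbox = cam_to_bbox(cam)
print(bbox)
```

The small blob passes the 20% threshold but is ignored because only the largest connected component defines the box.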
Deep Features for Generic Localization
▪ Higher-level layer + linear SVM
Deep Features for Generic Localization
▪ Fine-grained Recognition
▪ Dataset: 200 bird species, 11,788 images, bounding box annotations
Deep Features for Generic Localization
▪ Pattern Discovery
  • Discovering informative objects in the scenes
  • 10 scene categories from the SUN dataset (4,675 fully annotated images in total)
  • GAP layer (GoogLeNet-GAP) + one-vs-all linear SVM
  • Informative objects: the top 6 most frequently appearing objects
Deep Features for Generic Localization
▪ Pattern Discovery
  • Concept localization in weakly labeled images
Deep Features for Generic Localization
▪ Pattern Discovery
  • Weakly supervised text detector
Deep Features for Generic Localization
▪ Pattern Discovery
  • Interpreting visual question answering
Visualizing Class-Specific Units
▪ Class-specific units
Thank you.
Q & A

PR12-CAM