Learning Deep Features for Discriminative Localization
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
Computer Science and Artificial Intelligence Laboratory, MIT
2017.06.06(Tue)
Junya Tanaka(M1)
Introduction
•The convolutional units of the various layers of a CNN behave as
object detectors
oThis localization ability is lost when fully-connected layers are
used for classification
•Past work
oAvoid the use of fully-connected layers to minimize the
number of parameters while maintaining high performance
oNetwork in Network (NIN) [M. Lin et al., 2014]
Global average pooling which acts as a structural regularizer,
preventing over-fitting during training
oGoogLeNet [Szegedy et al., 2015]
Introduction
•Classify the image and localize class-specific image
regions in a single forward-pass
oFor action classification, the discriminative regions localized
are the objects that the humans are interacting with rather than
the humans themselves
This approach can be easily transferred to other recognition datasets
for generic classification, localization, and concept discovery.
Related Work
•Weakly-supervised object localization:
oBergamo et al.
Self-taught object localization that masks out image regions to
find the regions causing the maximal activations
oCinbis et al. and Pinheiro et al.
Combining multiple-instance learning with CNN features
oOquab et al.
Transferring mid-level image representations and evaluating the
output on multiple overlapping patches
•Visualizing CNNs:
oZeiler et al.
Using deconvolutional networks to visualize what patterns activate
each unit
Class Activation Mapping
•Identify the importance of the image regions by
projecting back the weights of the output layer on to
the convolutional feature maps
•The CAMs of two classes from the dataset
Class Activation Mapping
•Highlight the differences in the CAMs for a single
image when using different classes to generate the
maps
•Observe that the discriminative regions for different
categories are different even for a given image
Class Activation Mapping
•The procedure for generating class activation maps
oGAP outputs the spatial average of the feature map of
each unit at the last conv layer
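oA weighted sum of these values is used to generate the
final output
oThe same weighted sum applied to the feature maps themselves
(before pooling) gives the class activation map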
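The procedure above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the feature maps and class weights are random placeholders (8 units on a 7x7 grid), the function names are made up for this example, and the bias term of the output layer is ignored.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the last conv layer's feature maps for one class.

    feature_maps:  array of shape (K, H, W), one map per unit k
    class_weights: array of shape (K,), output-layer weights for the class
    """
    # Contract over the unit axis: M_c(x, y) = sum_k w_k^c * f_k(x, y)
    return np.tensordot(class_weights, feature_maps, axes=([0], [0]))

def gap_class_score(feature_maps, class_weights):
    """Class score as the network computes it: GAP, then a weighted sum."""
    pooled = feature_maps.mean(axis=(1, 2))   # shape (K,)
    return float(class_weights @ pooled)

rng = np.random.default_rng(0)
feature_maps = rng.random((8, 7, 7))   # hypothetical: 8 units, 7x7 maps
class_weights = rng.normal(size=8)     # hypothetical weights for one class

cam = class_activation_map(feature_maps, class_weights)   # shape (7, 7)
score = gap_class_score(feature_maps, class_weights)
```

Because both operations are linear, the spatial average of `cam` equals `score` (up to the omitted bias), which is why each location of the map can be read as that location's contribution to the class score.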
Class Activation Mapping
•It is important to highlight the intuitive difference
between GAP and GMP
•Global average pooling vs Global max pooling:
oGAP (encourages identifying the full extent of the object)
The value is maximized by finding all discriminative parts of an
object, since any low activation reduces the output of that map
oGMP (encourages identifying just one discriminative part)
Low scores for all image regions except the most discriminative one
do not affect the score, since only the max is taken
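A toy comparison may make the difference concrete. The two 4x4 feature maps below are hypothetical: in one, a unit fires on a single location; in the other, it fires on a whole 2x2 object region.

```python
import numpy as np

def global_average_pool(feature_map):
    return float(feature_map.mean())

def global_max_pool(feature_map):
    return float(feature_map.max())

# Hypothetical 4x4 feature maps for the same unit on two images.
one_part = np.zeros((4, 4))
one_part[1, 1] = 1.0           # only the single most discriminative part fires

full_extent = np.zeros((4, 4))
full_extent[1:3, 1:3] = 1.0    # the whole 2x2 object region fires

gmp_scores = (global_max_pool(one_part), global_max_pool(full_extent))
gap_scores = (global_average_pool(one_part), global_average_pool(full_extent))
# gmp_scores == (1.0, 1.0): max pooling cannot tell the two maps apart
# gap_scores == (0.0625, 0.25): average pooling rewards covering more of the object
```

With max pooling there is no incentive for the unit to respond beyond its single strongest location, whereas with average pooling every additional activated location raises the pooled value.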
Weakly-supervised Object Localization
•Evaluate the localization ability of CAM when trained
on the ILSVRC
•Verify that this technique does not adversely impact
the classification performance when learning to
localize and provide detailed results on weakly-
supervised object localization
Setup
•The effect of using CAM on the following popular
CNNs: AlexNet, VGGnet, and GoogLeNet
oRemove the fully-connected layers before the final output
oReplace them with GAP followed by a fully-connected
softmax layer
•Classification
oCompare this approach against the original AlexNet,
VGGnet, GoogLeNet and NIN
•Localization
oCompare against the original GoogLeNet, NIN and
GoogLeNet-GMP (to compare GMP with GAP)
Results
•Classification
oThis approach does not significantly hurt classification
performance
Results
•Localization
oNeed to generate a bounding box and its associated object
category
Use a simple thresholding technique to segment the heatmap
Segment the regions whose value is above 20% of the max
value of the CAM
Take the bounding box that covers the largest connected component
in the segmentation map
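The thresholding steps above could look roughly like the following sketch. The function name and the toy heatmap are made up for illustration, and 4-connectivity is an assumption of this example; the slides do not say which connectivity is used.

```python
from collections import deque
import numpy as np

def cam_to_bounding_box(cam, threshold_ratio=0.2):
    """Return (row_min, col_min, row_max, col_max) of the largest connected
    component of the region where cam > threshold_ratio * cam.max()."""
    mask = cam > threshold_ratio * cam.max()
    if not mask.any():
        raise ValueError("CAM has no region above the threshold")
    visited = np.zeros(mask.shape, dtype=bool)
    height, width = mask.shape
    best_component = []

    for start in zip(*np.nonzero(mask)):
        if visited[start]:
            continue
        # Breadth-first search over 4-connected neighbours (assumed connectivity).
        component = []
        queue = deque([start])
        visited[start] = True
        while queue:
            r, c = queue.popleft()
            component.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and mask[nr, nc] and not visited[nr, nc]:
                    visited[nr, nc] = True
                    queue.append((nr, nc))
        if len(component) > len(best_component):
            best_component = component

    rows, cols = zip(*best_component)
    return int(min(rows)), int(min(cols)), int(max(rows)), int(max(cols))

# Hypothetical 6x6 heatmap: one 3x3 blob, one isolated pixel, one weak pixel.
cam = np.zeros((6, 6))
cam[1:4, 2:5] = 1.0   # main object region
cam[5, 0] = 0.5       # small separate region above the threshold
cam[0, 0] = 0.1       # below 20% of the max, so it is discarded

box = cam_to_bounding_box(cam)   # (1, 2, 3, 4)
```

In practice the CAM would first be upsampled to the input image size so that the box is expressed in image coordinates; that step is left out here.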
Results
•Localization
oThis approach is effective at weakly-supervised object
localization
Results
•Localization
•CAM approach significantly outperforms the
backpropagation approach (see Fig. for a comparison of
the outputs)
Deep Features for Generic Localization
•The responses from the higher-level layers of CNN
have been shown to be effective generic features
•Show that GAP CNNs also identify the
discriminative image regions used for categorization
oTrain a linear SVM on the output of the GAP layer
oCompare the performance of features from our best
network, GoogLeNet-GAP, with the fc7 features from
AlexNet, and ave pool from GoogLeNet
Fine-grained Recognition
•Explore fine-grained recognition of birds
•Evaluate the generic localization ability and use it to
further improve performance
oGoogLeNet-GAP can successfully localize important image
crops, boosting classification performance
Pattern Discovery
•Technique can identify common elements or
patterns in images beyond objects, such as text or
high-level concepts
oGiven a set of images containing a common concept, identify
which regions the network recognizes as being important and
whether these correspond to the input pattern
•Pattern discovery experiments using deep features
oTrain a linear SVM on the GAP layer of the GoogLeNet-
GAP network
oApply the CAM technique
Pattern Discovery
•Discovering informative objects in the scenes:
oTake 10 scene categories from the SUN dataset, each
containing at least 200 fully annotated images,
for a total of 4675 fully annotated images
oTrain a one-vs-all linear SVM for each scene category
oCompute the CAMs using the weights of the linear SVM
oThe high activation regions frequently correspond to
objects indicative of the particular scene category
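As a rough sketch of this recipe, the example below trains a single binary linear SVM on GAP features and then reuses its weights to build a CAM. Everything here is a stand-in: the feature maps are synthetic, the SVM is a small hand-written subgradient-descent solver rather than any particular library, and only one classifier is trained, whereas the one-vs-all setup would repeat this once per scene category.

```python
import numpy as np

def train_linear_svm(features, labels, reg=0.01, lr=0.1, steps=2000):
    """Fit w, b by subgradient descent on the L2-regularised hinge loss.
    labels must be +1.0 / -1.0."""
    n_samples, n_units = features.shape
    w = np.zeros(n_units)
    b = 0.0
    for _ in range(steps):
        margins = labels * (features @ w + b)
        active = margins < 1.0   # samples that violate the margin
        signed = labels[active][:, None] * features[active]
        grad_w = reg * w - signed.sum(axis=0) / n_samples
        grad_b = -labels[active].sum() / n_samples
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def make_feature_maps(unit, row, col, n_units=3, size=4):
    """Synthetic maps: one unit fires on a 2x2 patch; the last unit is a
    faint constant background that carries no class information."""
    maps = np.zeros((n_units, size, size))
    maps[unit, row:row + 2, col:col + 2] = 1.0
    maps[-1] = 0.1
    return maps

# Unit 0 fires on images of the target category, unit 1 on the others.
positives = [make_feature_maps(0, r, c) for r, c in [(0, 0), (1, 2), (2, 1)]]
negatives = [make_feature_maps(1, r, c) for r, c in [(0, 1), (2, 2), (1, 0)]]
all_maps = np.stack(positives + negatives)      # shape (6, 3, 4, 4)
labels = np.array([1.0] * 3 + [-1.0] * 3)

gap_features = all_maps.mean(axis=(2, 3))       # GAP layer output, shape (6, 3)
svm_w, svm_b = train_linear_svm(gap_features, labels)

# CAM for the third positive image, with SVM weights in place of softmax weights.
cam = np.tensordot(svm_w, positives[2], axes=([0], [0]))
peak = np.unravel_index(cam.argmax(), cam.shape)
```

On this toy data the SVM assigns a positive weight to unit 0 and a negative weight to unit 1, so the CAM for a positive image peaks inside the 2x2 patch where unit 0 fires.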
Pattern Discovery
•Weakly supervised text detector:
oTrain a weakly supervised text detector using 350 Google
StreetView images containing text
oApproach highlights the text accurately without using
bounding box annotations
Visualizing Class-Specific Units
•The class-specific units of a CNN
oEstimating the receptive field
oSegmenting the top activation images of each unit in the
final convolutional layer
oUsing GAP and the ranked softmax weight
Identifying the parts of the object that are most discriminative for
classification and exactly which units detect these parts
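Ranking units by their softmax weight for a class can be sketched as follows. The weight matrix is a made-up 2-class, 5-unit example and the helper name is hypothetical; in a real network the matrix would come from the layer that follows GAP.

```python
import numpy as np

def top_units_for_class(softmax_weights, class_index, top_k=3):
    """Indices of the units with the largest softmax weight for one class,
    most important first. softmax_weights has shape (num_classes, num_units)."""
    class_row = softmax_weights[class_index]
    return np.argsort(class_row)[::-1][:top_k].tolist()

# Hypothetical weights: 2 classes, 5 units in the final convolutional layer.
softmax_weights = np.array([
    [0.2, 1.5, -0.3, 0.9, 0.0],
    [1.1, -0.4, 0.7, 0.1, 2.0],
])

ranked_class0 = top_units_for_class(softmax_weights, class_index=0)   # [1, 3, 0]
ranked_class1 = top_units_for_class(softmax_weights, class_index=1)   # [4, 0, 2]
```

The top-ranked units are the ones whose top activation images would then be segmented to see which object parts they detect.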
Conclusion
•Propose a general technique called CAM for CNNs
with global average pooling
•Classification-trained CNNs can learn to perform
object localization, without using any bounding box
annotations
•Class activation maps allow us to visualize the
predicted class scores on any given image, highlighting
the discriminative object parts detected by the CNN
•Demonstrated that global average pooling CNNs
can perform accurate object localization

Paper review

  • 1. Learning Deep Features for Discriminative Localization Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba Computer Science and Artificial Intelligence Laboratory, MIT 2017.06.06(Tue) Junya Tanaka(M1)
  • 2. Introduction • The various layers of a CNN behave as object detectors ◦ This ability is lost when fully-connected layers are used for classification • Past work ◦ Avoid fully-connected layers to minimize the number of parameters while maintaining high performance ◦ Network in Network (NIN) [M. Lin et al., 2014]: global average pooling acts as a structural regularizer, preventing overfitting during training ◦ GoogLeNet [Szegedy et al., 2015]
  • 3. Introduction • Classify the image and localize class-specific image regions in a single forward pass ◦ Localize the discriminative regions as the objects that the humans are interacting with rather than the humans themselves. This approach can be easily transferred to other recognition datasets for generic classification, localization, and concept discovery.
  • 4. Related Work • Weakly-supervised object localization: ◦ Bergamo et al.: self-taught object localization that masks out image regions ◦ Cinbis et al. and Pinheiro et al.: combining multiple-instance learning with CNN features ◦ Oquab et al.: transferring mid-level image representations and evaluating the output on multiple overlapping patches • Visualizing CNNs: ◦ Zeiler et al.: using deconvolutional networks to visualize what patterns activate each unit
  • 5. Class Activation Mapping • Identify the importance of the image regions by projecting the weights of the output layer back onto the convolutional feature maps • The CAMs of two classes from the dataset
  • 6. Class Activation Mapping • Highlight the differences in the CAMs for a single image when using different classes to generate the maps • Observe that the discriminative regions for different categories are different even for a given image
  • 7. Class Activation Mapping • The procedure for generating class activation maps ◦ GAP outputs the spatial average of the feature map of each unit at the last convolutional layer ◦ A weighted sum of these values is used to generate the final output
  • 8. Class Activation Mapping • It is important to highlight the intuitive difference between GAP and GMP • Global average pooling vs. global max pooling: ◦ GAP (the extent of the object): the value is maximized by finding all discriminative parts of an object, since all low activations reduce the output of the particular map ◦ GMP (just one discriminative part): low scores for all image regions except the most discriminative one do not affect the score, since only the max is taken
  • 9. Weakly-supervised Object Localization • Evaluate the localization ability of CAM when trained on ILSVRC • Verify that this technique does not adversely impact the classification performance when learning to localize, and provide detailed results on weakly-supervised object localization
  • 10. Setup • The effect of using CAM on the following popular CNNs: AlexNet, VGGnet, and GoogLeNet ◦ Remove the fully-connected layers before the final output ◦ Replace them with GAP followed by a fully-connected softmax layer • Classification ◦ Compare this approach against the original AlexNet, VGGnet, GoogLeNet and NIN • Localization ◦ Compare against the original GoogLeNet, NIN and GoogLeNet-GMP (to compare GMP with GAP)
  • 11. Results • Classification ◦ This approach does not significantly hurt classification performance
  • 12. Results • Localization ◦ Need to generate a bounding box and its associated object category ◦ Simple thresholding technique to segment the heatmap: segment the regions whose value is above 20% of the max value of the CAM, then take the bounding box that covers the largest connected component in the segmentation map
  • 13. Results • Localization ◦ This approach is effective at weakly-supervised object localization
  • 15. Results • Localization • CAM approach significantly outperforms the backpropagation approach (see Fig. for a comparison of the outputs)
  • 16. Deep Features for Generic Localization • The responses from the higher-level layers of a CNN have been shown to be effective generic features • Show that GAP CNNs also identify the discriminative image regions used for categorization ◦ Train a linear SVM on the output of the GAP layer ◦ Compare the performance of features from our best network, GoogLeNet-GAP, with the fc7 features from AlexNet and the ave pool features from GoogLeNet
  • 17. Fine-grained Recognition • Explore fine-grained recognition of birds • Evaluate the generic localization ability and use it to further improve performance ◦ GoogLeNet-GAP can successfully localize important image crops, boosting classification performance
  • 18. Pattern Discovery • The technique can identify common elements or patterns in images beyond objects, such as text or high-level concepts ◦ Given a set of images containing a common concept ◦ Identify which regions our network recognizes as important and whether this corresponds to the input pattern • Several pattern discovery tasks using deep features ◦ Train a linear SVM on the GAP layer of the GoogLeNet-GAP network ◦ Apply the CAM technique
  • 19. Pattern Discovery • Discovering informative objects in the scenes: ◦ 10 scene categories from the SUN dataset, each containing at least 200 fully annotated images, for a total of 4675 fully annotated images ◦ Train a one-vs-all linear SVM for each scene category ◦ Compute the CAMs using the weights of the linear SVM ◦ The high activation regions frequently correspond to objects indicative of the particular scene category
  • 20. Pattern Discovery • Weakly supervised text detector: ◦ Train a weakly supervised text detector using 350 Google StreetView images containing text ◦ The approach highlights the text accurately without using bounding box annotations
  • 21. Visualizing Class-Specific Units • The class-specific units of a CNN ◦ Estimating the receptive field ◦ Segmenting the top activation images of each unit in the final convolutional layer ◦ Using GAP and the ranked softmax weights: identifying the parts of the object that are most discriminative for classification and exactly which units detect these parts
  • 22. Conclusion • Propose a general technique called CAM for CNNs with global average pooling • Classification-trained CNNs can learn to perform object localization without using any bounding box annotations • Class activation maps allow us to visualize the predicted class scores • Demonstrated that global average pooling CNNs can perform accurate object localization
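
The procedure on slide 7 (GAP over the last convolutional feature maps, then a weighted sum) can be illustrated with a minimal sketch in plain Python. The function names and the toy numbers are hypothetical and not taken from the slides; the sketch assumes the feature maps are given as nested lists, that GAP is a spatial mean, and that the output layer has no bias term. Under those assumptions, applying the class weights to the feature maps themselves, instead of to their averages, gives the class activation map.

```python
def global_average_pool(feature_maps):
    """Return the spatial mean of each feature map (each map is H x W)."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def class_score(feature_maps, class_weights):
    """Class score before softmax: weighted sum of the GAP outputs (no bias)."""
    pooled = global_average_pool(feature_maps)
    return sum(w * f for w, f in zip(class_weights, pooled))


def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the feature maps at every spatial location."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(class_weights, feature_maps):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Toy example: two 2x2 feature maps and the weights of one class.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
class_weights = [2.0, -1.0]

cam = class_activation_map(feature_maps, class_weights)
score = class_score(feature_maps, class_weights)
```

With a mean-based GAP and no bias, the class score equals the spatial mean of the map, which is why the map can be read as each location's contribution to that class.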

Editor's Notes

  1. This suggests that our approach works as expected.
  2. Layers for easy handling
  3. There is a small performance drop of 1-2% when removing the additional layers from the various networks, but the classification performance is largely preserved for our GAP networks. GoogLeNet-GAP and GoogLeNet-GMP have similar performance on classification, as expected. Networks need to perform well on classification in order to achieve high performance on localization, as it involves identifying both the object category and the bounding box location accurately.
  4. Fig. 6(a) shows some example bounding boxes generated using this technique.
  5. We observe that our GAP networks outperform all the baseline approaches, with GoogLeNet-GAP achieving the lowest top-5 localization error of 43%.
  6. Further, we observe that GoogLeNet-GAP significantly outperforms GoogLeNet on localization, despite this being reversed for classification. We believe that the low mapping resolution of GoogLeNet (7 × 7) prevents it from obtaining accurate localizations. Last, we observe that GoogLeNet-GAP outperforms GoogLeNet-GMP by a reasonable margin, illustrating the importance of average pooling over max pooling for identifying the extent of objects.
  7. GoogLeNet-GAP performs comparably to existing approaches, achieving an accuracy of 63.0% when using the full image without any bounding box annotations for both train and test. When using bounding box annotations, this accuracy increases to 70.5%. Without annotations, we instead use bounding boxes obtained from the network's own localization, and then use GoogLeNet-GAP to extract features again from the crops inside the bounding box, for training and testing. We find that this improves the performance considerably over the full-image result, to 67.8%.
  8. Figure 9. Informative objects for two scene categories. For the dining room and bathroom categories, we show examples of original images (top), and a list of the 6 most frequent objects in that scene category with the corresponding frequency of appearance. At the bottom: the CAMs and a list of the 6 objects that most frequently overlap with the high activation regions.
  9. The text is accurately detected on the image even though our network is not trained with text or any bounding box annotations.
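
The thresholding technique on slide 12 (referenced in note 4) can be sketched as follows. This is an illustrative reading, not the authors' code: the function name `cam_bounding_box` is hypothetical, the CAM is assumed to be a nested list already resized to the resolution at which the box is wanted, and 4-connectivity is an assumption because the slides do not say which connectivity is used.

```python
from collections import deque


def cam_bounding_box(cam, threshold_ratio=0.2):
    """Box (x_min, y_min, x_max, y_max) around the largest region above the threshold.

    Returns None when no value exceeds threshold_ratio * max(cam).
    """
    height, width = len(cam), len(cam[0])
    cutoff = threshold_ratio * max(max(row) for row in cam)
    mask = [[value > cutoff for value in row] for row in cam]
    seen = [[False] * width for _ in range(height)]
    largest = []

    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one 4-connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(largest):
                largest = component

    if not largest:
        return None
    ys = [point[0] for point in largest]
    xs = [point[1] for point in largest]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy heatmap: a 2x2 hot region plus one isolated hot cell in the corner.
toy_cam = [
    [0.0, 0.0, 0.0, 0.0, 0.9],
    [0.0, 0.5, 1.0, 0.0, 0.0],
    [0.0, 0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
box = cam_bounding_box(toy_cam)
```

The isolated corner cell passes the 20% threshold but forms a smaller component, so it is ignored and the box covers only the 2x2 region.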