DIVAR - 2ND INTERVIEW MINI PROJECT 1
Object Localization With Classification Networks
Mahan Fathi
UnsupervisedObjectLocalization
I. PRELIMINARIES
This is a mini project for the second phase of the interview process. Its purpose is object localization on Divar's raw image dataset, and by ‘raw’ I mean that no tags or ground-truth bounding boxes are available. Of course, running off-the-shelf trained networks on the dataset is the first thing that comes to mind, but I infer that doing so is not exactly in line with the interviewers' expectations: there is not much to passing image batches through a trained network and storing the outputs. So I opted for a more demanding method, one that calls for more familiarity with neural nets and automatic-differentiation frameworks. As a result, here I present an algorithm capable of producing fairly nice bounding boxes using only good old, readily available classification networks.
IMPORTANT: One other thing I would like to mention is that my dearest grandfather passed away only a few hours after I received the project email and the countdown started. I could not even find the time to tell you about it, and I did not know whether it was the right thing to do, but I would appreciate it if you bear in mind that this is the result of a single weekend's work and that I had a really tough time getting through it in such a short time.
II. INTRODUCTION
Guided back-propagation has proven to be a fast and interpretable tool for visualizing the parts of an image to which a specific neuron fires the most (loosely speaking; I am just waving hands here). This is done by gating the gradients through the ReLU units while back-propagating from a single neuron all the way back to the input image. Classification networks store valuable spatial information about the picture, and the idea is to make the most of these gradients for object localization when they correspond to the object. By targeting the neurons with the biggest positive impact on the classification score, we should be able to spot the parts of the image that correspond to the object.
III. METHOD OVERVIEW
An introduction to the method was given in the previous section; here I go into more detail. The first thing we need is a classification network. I picked VGG16 for its simple architecture, its availability, and its relatively small size, for the sake of my limited VRAM. One hyper-parameter of the algorithm is the layer from which the top neurons are drawn. After experimenting with the outputs, I finally settled on the input neurons of block5_conv2. One major perk of back-propagating from an intermediate layer, rather than from the final feature vector (also referred to as the net embedding, or fc7 features), is that it gives the algorithm more generality and flexibility to work on a wider range of image categories. Diving into the algorithm, the first step is to pass the image through the network to calculate the class scores. Assuming that VGGNet outputs the right class for the image, the aforementioned top neurons are those with the largest contribution to the maximum class score. These neurons are then guided-back-propagated to the original image and turned into masks, and only a handful of those masks survive the selection step described later in this report. The surviving masks are finally joined, and the bounding box for the object is simply the smallest rectangle enclosing this area.
The overall algorithm is given below in pseudo-code; each step is then described individually.
Algorithm 1 Object Localization with VGGNet
1: procedure LocalizationWithVGG
2:     Pass the image through VGGNet to obtain the classification
3:     Identify the kmax most important neurons via the DAM heuristic
4:     Use guided back-propagation to map the neurons back onto the image
5:     Generate masks from the gradients' saliency maps and apply each to the image separately
6:     Pass the resulting images through the network again to get class scores; keep the top k neurons
7:     Join the final masks and find the enclosing bounding box
IV. METHOD DESCRIPTION
A. Passing image through net
We need the class scores (the top class in particular) and, later on, the feature-layer activations. Although my narration makes it look as if these steps are performed sequentially, it is important to note that the computations are not actually carried out that way; the whole network is wired up in TensorFlow.
B. Finding kmax neurons
As mentioned in the previous section, we focus on the input neurons of the block5_conv2 layer. We need a notion of importance to select the kmax neurons. This selection is necessary because there are roughly 1000 neurons in this layer, and back-propagating from all of them is not computationally practical, so we form a subset of size kmax and back-propagate only from those. To introduce a notion of importance, I use the DAM heuristic, which multiplies each neuron's activation by the derivative of the top class score with respect to that activation. I form this matrix for the input layer of block5_conv2 and take the kmax neuron indices with the highest values:
$$\mathrm{DAM} = \text{activations} \odot \frac{\partial\,\text{topClassScore}}{\partial\,\text{activations}}$$
I set kmax to 10, and these 10 neurons are then used for back-propagation.
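A minimal NumPy sketch of this selection step, assuming the activations and the gradient of the top class score with respect to them are already available as arrays of the same shape (in the project these come out of the TensorFlow graph; the function name `dam_top_neurons` is hypothetical):

```python
import numpy as np


def dam_top_neurons(activations, grads, k_max):
    """Return the flat indices of the k_max units with the largest DAM score.

    activations: array of layer activations (any shape).
    grads: d(top class score) / d(activations), same shape as activations.
    """
    # DAM score: element-wise product of each activation and its gradient.
    scores = (activations * grads).ravel()
    # argsort is ascending, so take the last k_max entries and reverse them
    # to list the most important units first.
    return np.argsort(scores)[-k_max:][::-1]
```

Here a "neuron" is any single entry of the activation tensor; `np.unravel_index(idx, activations.shape)` maps a flat index back to its position (for a conv layer, row, column and channel).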
C. Guided back-propagation from neurons to image
VGGNet uses ReLU non-linearities, and guided back-propagation makes the differentiation of these units a tad different: the back-propagated signal through them must additionally be thresholded at zero, so only positive gradients pass. A handy switch implemented in the code lets me toggle between normal and guided back-propagation whenever I want; here it is set to guided. The output of guided back-propagation is a matrix the size of the original input image, so we now have kmax (here 10) different images, each attending to different parts of the original image. The negative saliency map of the guided back-propagated gradients is shown in Fig. 1.
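To make the gating rule concrete, here is a toy NumPy sketch on a hypothetical two-layer ReLU network (not VGG16, and not the project's TensorFlow code) that back-propagates from one output neuron to the input using either plain or guided ReLU gradients:

```python
import numpy as np


def relu_backward(pre, grad, guided):
    """Backward pass through a ReLU whose forward input was `pre`.

    Plain backprop passes `grad` wherever the forward input was positive.
    Guided backprop additionally zeroes entries where `grad` is negative.
    """
    mask = pre > 0
    if guided:
        mask = mask & (grad > 0)
    return np.where(mask, grad, 0.0)


def input_gradient(x, W1, W2, neuron, guided=True):
    """Gradient of relu(W2 @ relu(W1 @ x))[neuron] with respect to x (toy network)."""
    pre1 = W1 @ x
    pre2 = W2 @ np.maximum(pre1, 0.0)
    # Seed the backward pass with a one-hot vector on the chosen neuron.
    grad = np.zeros_like(pre2)
    grad[neuron] = 1.0
    grad = relu_backward(pre2, grad, guided)
    grad = relu_backward(pre1, W2.T @ grad, guided)
    return W1.T @ grad
```

With x = [1, 2], W1 = [[1, -1], [-1, 2], [1, 1]] and W2 = [[1, -1, 2]], plain back-propagation gives [3, 0] while guided back-propagation gives [2, 2]: the negative gradient flowing into the second hidden unit is zeroed, so only positive evidence reaches the input.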
D. Generating masks
To clean up the back-propagated gradients, only pixels whose values lie above a chosen percentile are kept (their value is set to one for every channel), and the rest are set to zero. This binary image then goes through dilation followed by erosion, i.e. a morphological closing, which fills tiny holes and merges nearby islands of active pixels in the mask. This procedure is carried out for every one of the kmax neurons.
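The masking step can be sketched in NumPy as follows. The percentile is a hypothetical parameter (its exact value is not stated here), the saliency map is assumed to be already reduced to a single channel, and the 3x3 square structuring element is an assumption; `scipy.ndimage.binary_dilation` and `scipy.ndimage.binary_erosion` would be the usual library route, but plain NumPy keeps the example self-contained:

```python
import numpy as np


def dilate3x3(mask):
    """Binary dilation with a 3x3 square: a pixel turns on if any neighbour is on."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def erode3x3(mask):
    """Binary erosion with a 3x3 square: a pixel stays on only if all neighbours are on."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def close_mask(mask):
    """Dilation followed by erosion (morphological closing).

    The mask is padded by one pixel first so that regions touching the image
    border are not eaten away by the erosion step, then cropped back.
    """
    canvas = np.pad(mask, 1, mode="constant", constant_values=False)
    return erode3x3(dilate3x3(canvas))[1:-1, 1:-1]


def saliency_to_mask(saliency, percentile):
    """Keep pixels at or above the given percentile of a 2-D saliency map, then close."""
    binary = saliency >= np.percentile(saliency, percentile)
    return close_mask(binary)
```

To set the mask to one for every channel, as described above, broadcast it with `mask[..., None]`.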
E. Selecting top k neurons
The kmax masks are applied to the image separately, producing kmax masked images. These images are passed through the CNN once again, and the k masks (corresponding to k different neurons) whose images have the lowest softmax classification loss are selected. Here again, the ground-truth class is taken to be VGG's output for the original image.
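A NumPy sketch of this selection, assuming the logits of the kmax masked images have already been computed (each masked image being, for instance, `image * mask[..., None]`) and that the target is VGG's top class on the original image; `select_top_k` is a hypothetical helper name:

```python
import numpy as np


def softmax_cross_entropy(logits, target):
    """Cross-entropy of each row of `logits` (n_images, n_classes) w.r.t. one class index."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[:, target]


def select_top_k(masked_logits, target, k):
    """Indices of the k masked images whose loss on `target` is lowest."""
    losses = softmax_cross_entropy(masked_logits, target)
    return np.argsort(losses)[:k]
```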
F. Spitting out the bounding box
The bounding box is now simply the smallest rectangle that encloses the union of all top-k masks; see the red bounding box in Fig. 1.
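A short sketch of this final step, assuming the top-k masks are 2-D boolean arrays of the image size and that the box is reported as inclusive (top, left, bottom, right) pixel indices. Returning `None` for an empty union is an illustrative choice; what the original code does in that case is not stated here:

```python
import numpy as np


def enclosing_box(masks):
    """Smallest box around the union of several 2-D boolean masks.

    Returns (top, left, bottom, right) as inclusive pixel indices,
    or None if every mask is empty.
    """
    union = np.any(np.stack(masks), axis=0)
    rows = np.flatnonzero(union.any(axis=1))
    cols = np.flatnonzero(union.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])
```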
V. VALIDATION METHOD
Normally I would compare the generated bounding box with the ground truth using the Intersection over Union (IoU) metric. However, as already discussed, the dataset has no ground truth, so the options for validation are very limited. I finally decided to compare the classification score of the image cropped to the bounding box with that of the original image. This might strike you as a self-fulfilling prophecy, since I am in some sense maximizing this very score by picking the neurons with the largest contribution to it. To resolve this issue, one could use a second network for validation, which makes sense to me; I went with ResNet-50. Since both networks are trained on ImageNet, mapping classes between them is straightforward. Table I summarizes the validation results. Note that VGG16's predictions on the original images are once again treated as the ground truth here.
Fig. 1. Negative Saliency Map of Guided Back-propagated gradients.
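For reference, the IoU metric mentioned above could be computed as follows, assuming boxes are given as inclusive (top, left, bottom, right) pixel indices; this is a generic sketch, since no ground-truth boxes exist for this dataset:

```python
def box_area(box):
    """Area in pixels of a box given as inclusive (top, left, bottom, right)."""
    top, left, bottom, right = box
    return max(0, bottom - top + 1) * max(0, right - left + 1)


def iou(box_a, box_b):
    """Intersection over Union of two boxes in inclusive pixel coordinates."""
    # The intersection is itself a box; box_area returns 0 if it is empty.
    inter = box_area((
        max(box_a[0], box_b[0]),
        max(box_a[1], box_b[1]),
        min(box_a[2], box_b[2]),
        min(box_a[3], box_b[3]),
    ))
    union = box_area(box_a) + box_area(box_b) - inter
    return inter / union if union > 0 else 0.0
```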
TABLE I
VALIDATION RESULTS
Input Images           VGG16     ResNet-50
Original               100.00%   70.20%
Bounding Box Cropped   56.60%    42.20%
I sampled 100 images each from the electronics and vehicles categories, 250 from personal, and 50 from for-the-home to form a validation dataset of 500 images. I then cropped each image to its bounding box by setting the pixel values outside the box to zero, and cached the results to disk. The drop to 70.20% when switching networks might imply that the dataset is not the healthiest dataset out there. Nevertheless, the bounding-box-cropped results (56.60% for VGG16 and 42.20% for ResNet-50, with everything outside the box discarded) look quite impressive to me!
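The cropping used for Table I, where pixels outside the box are set to zero rather than cut away, might look like this in NumPy, assuming a (height, width, channels) image and inclusive (top, left, bottom, right) box indices; the helper name is hypothetical:

```python
import numpy as np


def zero_outside_box(image, box):
    """Keep pixels inside an inclusive (top, left, bottom, right) box; zero the rest.

    The output has the same shape as the input, so it can be fed to the
    same classifier without resizing.
    """
    top, left, bottom, right = box
    cropped = np.zeros_like(image)
    cropped[top:bottom + 1, left:right + 1] = image[top:bottom + 1, left:right + 1]
    return cropped
```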
Fig. 2. t-SNE representation of the personal category.
VI. T-SNE REPRESENTATION
I would like to briefly mention the t-SNE representation of the dataset using the fc7 embeddings of VGG16, shown in Fig. 2. Notice how similar photos cluster together. Glancing at the dataset this way is very useful for inferring some cornerstone facts before designing the algorithm, so this was the first thing I did. The result is easy on the eyes, by the way. A larger t-SNE picture of the personal category, with more tiles, is included in the attachment.
VII. ALGORITHM PROS AND CONS
• Pros:
– Collages or photos containing multiple objects are handled nicely, since k is high enough to detect objects from all over the picture.
– When the algorithm encounters a dull/monotone image, it outputs the full image as the bounding box. Such bounding boxes are more frequent in the for-the-home category.
• Cons:
– Multiple neurons might attend to the same part of the image; for instance, it turns out that neurons are very sensitive to car wheels. One solution is to increase k.
– Some parts of the algorithm cannot be parallelized, which might slow the computations down a little.
VIII. ABOUT THE CODE
• Find the code here: https://github.com/MahanFathi/UnsupervisedObjectLocalization.
• Dependencies: TensorFlow, NumPy, SciPy, scikit-learn, Matplotlib, LAPJV.
Fig. 3. Results.

More Related Content

What's hot

Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)SungminYou
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognitionYUNG-KUEI CHEN
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkRichard Kuo
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKMd Rajib Bhuiyan
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewQuantUniversity
 
MNIST and machine learning - presentation
MNIST and machine learning - presentationMNIST and machine learning - presentation
MNIST and machine learning - presentationSteve Dias da Cruz
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetSungminYou
 
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Ashray Bhandare
 
Problems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksProblems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksVipul Vaibhaw
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNNAshray Bhandare
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnSumeraHangi
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyNUPUR YADAV
 
CNN and its applications by ketaki
CNN and its applications by ketakiCNN and its applications by ketaki
CNN and its applications by ketakiKetaki Patwari
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural networkItachi SK
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between CapsulesKyuhwan Jung
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Muhammad Haroon
 

What's hot (20)

Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)Deep learning lecture - part 1 (basics, CNN)
Deep learning lecture - part 1 (basics, CNN)
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognition
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
MNIST and machine learning - presentation
MNIST and machine learning - presentationMNIST and machine learning - presentation
MNIST and machine learning - presentation
 
Convolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNetConvolutional neural network from VGG to DenseNet
Convolutional neural network from VGG to DenseNet
 
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
Bio-inspired Algorithms for Evolving the Architecture of Convolutional Neural...
 
Problems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networksProblems with CNNs and Introduction to capsule neural networks
Problems with CNNs and Introduction to capsule neural networks
 
Deep Learning - CNN and RNN
Deep Learning - CNN and RNNDeep Learning - CNN and RNN
Deep Learning - CNN and RNN
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
CNN and its applications by ketaki
CNN and its applications by ketakiCNN and its applications by ketaki
CNN and its applications by ketaki
 
Neural networks
Neural networksNeural networks
Neural networks
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Dynamic Routing Between Capsules
Dynamic Routing Between CapsulesDynamic Routing Between Capsules
Dynamic Routing Between Capsules
 
Deep Learning
Deep Learning Deep Learning
Deep Learning
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Tech talk
Tech talkTech talk
Tech talk
 

Similar to Unsupervised Object Detection

Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisNaeem Shehzad
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural NetworksIRJET Journal
 
Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docxjaffarbikat
 
Classification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsClassification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsIRJET Journal
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesIRJET Journal
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET Journal
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Alex Conway
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningTrong-An Bui
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep LearningIRJET Journal
 
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...Sangeetha Mam
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Alex Conway
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningIRJET Journal
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptxnitin571047
 
BMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorialBMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorialpotaters
 
Deep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionDeep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionAlex Conway
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_reportMatt Vitelli
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksParrotAI
 
IRJET- Real-Time Object Detection using Deep Learning: A Survey
IRJET- Real-Time Object Detection using Deep Learning: A SurveyIRJET- Real-Time Object Detection using Deep Learning: A Survey
IRJET- Real-Time Object Detection using Deep Learning: A SurveyIRJET Journal
 

Similar to Unsupervised Object Detection (20)

Mnist report ppt
Mnist report pptMnist report ppt
Mnist report ppt
 
Decomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesisDecomposing image generation into layout priction and conditional synthesis
Decomposing image generation into layout priction and conditional synthesis
 
Handwritten Digit Recognition using Convolutional Neural Networks
Handwritten Digit Recognition using Convolutional Neural  NetworksHandwritten Digit Recognition using Convolutional Neural  Networks
Handwritten Digit Recognition using Convolutional Neural Networks
 
Deep Neural Network DNN.docx
Deep Neural Network DNN.docxDeep Neural Network DNN.docx
Deep Neural Network DNN.docx
 
Classification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its VariantsClassification of Images Using CNN Model and its Variants
Classification of Images Using CNN Model and its Variants
 
Garbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning TechniquesGarbage Classification Using Deep Learning Techniques
Garbage Classification Using Deep Learning Techniques
 
IRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural NetworksIRJET- Mango Classification using Convolutional Neural Networks
IRJET- Mango Classification using Convolutional Neural Networks
 
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
Convolutional Neural Networks for Image Classification (Cape Town Deep Learni...
 
Review-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learningReview-image-segmentation-by-deep-learning
Review-image-segmentation-by-deep-learning
 
Image Classification using Deep Learning
Image Classification using Deep LearningImage Classification using Deep Learning
Image Classification using Deep Learning
 
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...
MODELLING AND SYNTHESIZING OF 3D SHAPE WITH STACKED GENERATIVE ADVERSARIAL NE...
 
Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017Deep Learning for Computer Vision - PyconDE 2017
Deep Learning for Computer Vision - PyconDE 2017
 
A Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep LearningA Survey on Image Processing using CNN in Deep Learning
A Survey on Image Processing using CNN in Deep Learning
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptx
 
BMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorialBMVA summer school MATLAB programming tutorial
BMVA summer school MATLAB programming tutorial
 
Deep Neural Networks for Computer Vision
Deep Neural Networks for Computer VisionDeep Neural Networks for Computer Vision
Deep Neural Networks for Computer Vision
 
mvitelli_ee367_final_report
mvitelli_ee367_final_reportmvitelli_ee367_final_report
mvitelli_ee367_final_report
 
Colloquium.pptx
Colloquium.pptxColloquium.pptx
Colloquium.pptx
 
Introduction to Convolutional Neural Networks
Introduction to Convolutional Neural NetworksIntroduction to Convolutional Neural Networks
Introduction to Convolutional Neural Networks
 
IRJET- Real-Time Object Detection using Deep Learning: A Survey
IRJET- Real-Time Object Detection using Deep Learning: A SurveyIRJET- Real-Time Object Detection using Deep Learning: A Survey
IRJET- Real-Time Object Detection using Deep Learning: A Survey
 

Recently uploaded

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesRashidFaridChishti
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Post office management system project ..pdf
Post office management system project ..pdfPost office management system project ..pdf
Post office management system project ..pdfKamal Acharya
 
Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)Introduction to Artificial Intelligence ( AI)
Introduction to Artificial Intelligence ( AI)ChandrakantDivate1
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxpritamlangde
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelDrAjayKumarYadav4
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdfKamal Acharya
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdfAldoGarca30
 
Worksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxWorksharing and 3D Modeling with Revit.pptx
Worksharing and 3D Modeling with Revit.pptxMustafa Ahmed
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...Amil baba
 

Recently uploaded (20)

Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Post office management system project ..pdf

Unsupervised Object Detection

DIVAR - 2ND INTERVIEW MINI PROJECT 1
Object Localization With Classification Networks
Mahan Fathi
UnsupervisedObjectLocalization

I. PRELIMINARIES
This mini project is the second phase of the interview process. Its purpose is object localization on Divar's raw image dataset, where "raw" means that no tags or ground-truth bounding boxes are available. The first idea that comes to mind is to run off-the-shelf trained networks on the dataset. However, I infer that doing so is not quite what the interviewers expect, because there is not much to passing image batches through a trained network and storing the outputs. I therefore opted for a more demanding method, one that calls for more familiarity with neural networks and automatic-differentiation frameworks. The result is an algorithm that produces fairly good bounding boxes using nothing but readily available classification networks.

IMPORTANT: I would also like to mention that my dear grandfather passed away only a few hours after I received the project email and the countdown started. I could not find the time to tell you, and I was unsure whether it was appropriate to do so. I would appreciate it if you kept in mind that this is the result of a single weekend's work, done in a short time and under very difficult circumstances.

II. INTRODUCTION
Guided back-propagation has proven to be a fast and interpretable tool for visualizing the parts of an image that a specific neuron responds to most strongly (loosely speaking). It works by gating the gradients at the ReLU units while back-propagating from a single neuron all the way back to the input image. Classification networks retain valuable spatial information about the picture. The idea is to make the most of these gradients for object localization whenever they correspond to the object.
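The gating rule can be illustrated numerically. This is a minimal NumPy sketch, not the author's TensorFlow code. A hypothetical toy neuron, w2 · relu(W1 @ x), stands in for a real network. The names `relu_backward`, `guided_relu_backward` and `input_gradient` are invented for illustration. In a TensorFlow graph, the same rule would usually be attached to the ReLU op through a gradient override.

```python
import numpy as np


def relu_backward(x, grad_out):
    """Ordinary back-prop through ReLU: pass the gradient only where the input was positive."""
    return np.where(x > 0, grad_out, 0.0)


def guided_relu_backward(x, grad_out):
    """Guided back-prop: additionally block every negative incoming gradient."""
    return np.where((x > 0) & (grad_out > 0), grad_out, 0.0)


def input_gradient(x, W1, w2, guided=False):
    """Gradient of the toy neuron w2 . relu(W1 @ x) with respect to the input x."""
    h = W1 @ x                       # pre-activations of the hidden ReLU layer
    grad_h = w2                      # d(neuron) / d(relu(h))
    backward = guided_relu_backward if guided else relu_backward
    return W1.T @ backward(h, grad_h)


x = np.array([-1.0, 2.0, 3.0])
W1 = np.eye(3)
w2 = np.array([5.0, -4.0, 6.0])
ordinary = input_gradient(x, W1, w2)
guided = input_gradient(x, W1, w2, guided=True)
```

With W1 set to the identity, the ordinary gradient is [0, -4, 6]. The guided gradient also drops the negative entry and gives [0, 0, 6]. Only positive evidence for the neuron survives the trip back to the input.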
By targeting the neurons with the largest positive impact on the classification score, we should be able to spot the parts of the image that we infer to correspond to the object.

III. METHOD OVERVIEW
The previous section introduced the method; this section goes into more detail. The first thing we need is a classification network. I picked VGG16 for its simple architecture, its availability, and its relatively modest size, given my limited VRAM. One algorithm hyper-parameter is the layer from which the top neurons are drawn. After experimenting with the output, I settled on the inputs to block5_conv2. Back-propagating from an intermediate layer has one major advantage over back-propagating from the final feature vectors (also called network embeddings or fc7 features). It makes the algorithm more general and flexible, so it works on a wider range of image categories.

The algorithm first passes the image through the network to compute the class scores. Assuming that VGGNet outputs the right class for the image, the top neurons mentioned above are those that contribute most to the maximum class score. These neurons are then guided back-propagated to the original image. Only a handful of them make it to the mask-generation step, which is described later. The resulting masks are finally joined, and the bounding box for the object is simply the smallest rectangle enclosing this area. Below, an overall representation of the algorithm is given in pseudo-code; each step is then discussed individually.
Algorithm 1 Object Localization with VGGNet
1: procedure LOCALIZATIONWITHVGG
2:   Pass the image through VGGNet to obtain the classification
3:   Identify the kmax most important neurons via the DAM heuristic
4:   Use guided back-propagation to map the neurons back into the image
5:   Generate masks from the gradients' saliency maps and apply each mask to the image separately
6:   Pass the resulting images through the network again to get class scores, and pick the final top k neurons
7:   Join the final masks and find the enclosing bounding box

IV. METHOD DESCRIPTION
A. Passing the image through the network
We need the class scores (the top class in particular) and, later, the feature-layer activations. My narration makes these steps look sequential, but the computations are not actually carried out that way: the network is wired up in TensorFlow.

B. Finding the kmax neurons
As mentioned in the previous section, we focus on the input neurons of block5_conv2. Selecting the kmax neurons requires a notion of importance. The selection is necessary because there are roughly 1000 neurons in this layer, and back-propagating from all of them is not computationally practical. We therefore form a subset of size kmax and back-propagate only from it. To define importance, I use the DAM heuristic. It is proportional to the product of a neuron's activation and the derivative of the top class score with respect to that activation. I form this matrix for the input layer of block5_conv2 and take the kmax neuron indices with the highest values:

$$\mathrm{activations} \odot \frac{\partial\,\mathrm{topClassScore}}{\partial\,\mathrm{activations}}$$

I have set kmax to 10, and these 10 neurons are then used for back-propagation.

C. Guided back-propagation from neurons to the image
VGGNet uses ReLU non-linearities, and guided back-propagation changes how these units are differentiated slightly. The back-propagated signal at these units must additionally be thresholded at zero, so negative gradients are blocked. The code implements a handle that lets me switch between normal and guided back-propagation whenever I want; here the guided mode is switched on. The output of guided back-propagation is a matrix the size of the original input image. We now have kmax (that is, 10) different images, each attending to a different part of the original image. Fig. 1 shows the negative saliency map of the guided back-propagated gradients.

D. Generating masks
To clean up the back-propagated gradients, only pixels whose values fall within a chosen top percentile are kept: their value is set to one in every channel, and all other pixels are set to zero. The binary image then goes through morphological dilation followed by erosion. This fills small holes and merges tiny islands of active pixels into contiguous regions. The procedure is carried out for each of the kmax neurons.

E. Selecting the top k neurons
The kmax masks are applied to the image separately, producing kmax masked images.
These images are passed through the CNN once again. The k masks (corresponding to k different neurons) whose masked images give the lowest softmax classification loss are selected. Here, again, the ground-truth class is taken to be VGG's output for the original image.

F. Outputting the bounding box
The bounding box is now simply the smallest rectangle enclosing the union of all k top masks; see the red bounding box in Fig. 1.

V. VALIDATION METHOD
Normally I would compare the generated bounding box with the ground truth using the Intersection over Union (IoU) metric. However, as discussed above, the dataset has no ground truth, so the options for validation are very limited. I decided to compare the classification score of the image cropped to the bounding box with that of the original image.

Fig. 1. Negative Saliency Map of Guided Back-propagated gradients.

This might strike you as a self-fulfilling prophecy, since I am in some sense maximizing this very score by picking the neurons that contribute most to it. To resolve this issue, one can use a second network for validation, which makes sense to me; I chose ResNet-50. Both networks are trained on ImageNet, so mapping classes between them is straightforward. Table I summarizes the validation results. Note that VGGNet's outputs on the original images are again treated as the ground truth.

TABLE I
VALIDATION RESULTS
Input Images            VGG16      ResNet-50
Original                100.00%    70.20%
Bounding Box Cropped    56.60%     42.20%

I sampled 100 images each from the electronics and vehicles categories, 250 from personal, and 50 from for-the-home, forming a validation set of 500 images. I then cropped each image to its bounding box by setting the pixel values outside the box to zero, and cached the results to disk. Accuracy drops to 70.20% when the network is switched, which may indicate that the dataset is not the cleanest out there. Nevertheless, the bounding-box-cropped results look quite impressive to me!
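Once the activations, their gradients and the guided back-propagated images are available, several of the steps above reduce to simple array operations. The NumPy/SciPy sketch below covers the DAM ranking (IV.B), percentile thresholding with dilation followed by erosion (IV.D), lowest-loss selection (IV.E), the enclosing box (IV.F) and the zero-out crop used for validation. It is a sketch under assumptions, not the repository's code. The text leaves several choices unspecified, so all of the following are hypothetical:

- the function names;
- the 95th-percentile default;
- the use of gradient magnitude as saliency;
- the number of morphology iterations.

```python
import numpy as np
from scipy import ndimage


def dam_scores(activations, grads):
    """DAM-style importance: activation times d(top class score)/d(activation), elementwise."""
    return activations * grads


def top_kmax_indices(scores, kmax=10):
    """Flat indices of the kmax highest-scoring neurons, best first."""
    return np.argsort(scores.ravel())[::-1][:kmax]


def gradient_to_mask(grad_image, percentile=95.0, iterations=2):
    """Binarise an (H, W, 3) gradient image, then close small holes.

    Assumption: saliency is the per-pixel maximum absolute gradient over channels.
    """
    saliency = np.abs(grad_image).max(axis=-1)
    mask = saliency > np.percentile(saliency, percentile)      # keep the top percentile
    mask = ndimage.binary_dilation(mask, iterations=iterations)
    return ndimage.binary_erosion(mask, iterations=iterations)  # dilation then erosion = closing


def softmax_loss(logits, label):
    """Softmax cross-entropy of one logit vector against an integer label."""
    z = logits - logits.max()                  # shift for numerical stability
    return np.log(np.exp(z).sum()) - z[label]


def select_top_k(masked_logits, label, k):
    """Indices of the k masked images with the lowest loss for `label`."""
    losses = np.array([softmax_loss(l, label) for l in masked_logits])
    return np.argsort(losses)[:k]


def union_bbox(masks):
    """Inclusive (row0, col0, row1, col1) box around the union of boolean masks."""
    union = np.logical_or.reduce(masks)
    rows = np.flatnonzero(union.any(axis=1))
    cols = np.flatnonzero(union.any(axis=0))
    if rows.size == 0:
        return None                            # no active pixel at all
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])


def zero_outside(image, box):
    """Copy of `image` with every pixel outside the inclusive box set to zero."""
    r0, c0, r1, c1 = box
    out = np.zeros_like(image)
    out[r0:r1 + 1, c0:c1 + 1] = image[r0:r1 + 1, c0:c1 + 1]
    return out
```

In a full pipeline, `grads` would come from the framework's automatic differentiation of the top class score with respect to the block5_conv2 inputs. `masked_logits` would come from a second forward pass over the kmax masked images. Both are outside this sketch. Note that the closing step fills holes and merges nearby fragments, but it does not delete an isolated speck far from everything else.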
Fig. 2. t-SNE representation of the personal category.

VI. T-SNE REPRESENTATION
I would briefly like to point to the t-SNE representation of the dataset computed from VGG16's fc7 embeddings, shown in Fig. 2. Notice how similar photos cluster together. A quick look at the dataset like this is very useful for inferring some basic facts that inform the design of the algorithm, so it was the first thing I did. The result is easy on the eyes, too. A larger t-SNE picture of the personal category, with more tiles, is included in the attachment.

VII. ALGORITHM PROS AND CONS
• Pros:
– Collages and photos containing multiple objects are handled nicely, because k is high enough to detect objects all over the picture.
– When the algorithm encounters a dull or monotone image, it outputs the full image as the bounding box. Bounding boxes of this kind are more frequent in the for-the-home category.
• Cons:
– Multiple neurons may attend to the same part of the image; for instance, neurons turn out to be very sensitive to car wheels. One solution is to increase k.
– Some parts of the algorithm cannot be parallelized, which may slow the computation down a little.

VIII. ABOUT THE CODE
• Find the code here: https://github.com/MahanFathi/UnsupervisedObjectLocalization
• Dependencies: TensorFlow, NumPy, SciPy, Scikit-learn, matplotlib, LAPJV.

Fig. 3. Results.
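A tiled t-SNE picture like Fig. 2 needs each image to occupy its own grid cell. The 2-D coordinates would typically come from something like `sklearn.manifold.TSNE(n_components=2).fit_transform(fc7_features)`, where `fc7_features` is a hypothetical (n, 4096) array of embeddings. LAPJV appears in the dependency list, and the sketch assumes, without confirmation from the text, that it solves the point-to-cell assignment. It substitutes SciPy's `linear_sum_assignment` for LAPJV. The function name `snap_to_grid` is invented for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def snap_to_grid(points):
    """Give each 2-D point its own cell on a square grid, minimising total squared displacement.

    Returns cells[i], the row-major index of the grid cell assigned to points[i].
    """
    n = len(points)
    side = int(np.ceil(np.sqrt(n)))
    span = np.maximum(np.ptp(points, axis=0), 1e-12)            # avoid division by zero
    p = (points - points.min(axis=0)) / span                    # normalise to [0, 1]
    g = np.linspace(0.0, 1.0, side)
    grid = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # (side * side, 2) as (x, y)
    cost = ((p[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    _, cells = linear_sum_assignment(cost)                      # one distinct cell per point
    return cells
```

Image tile i would then be pasted at grid position `divmod(cells[i], side)`, which gives (row, column). Solving the assignment globally, rather than rounding each point to its nearest cell, ensures that no two tiles collide.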