© Vigen Sahakyan 2016
Content-Based Image Retrieval by Deep Learning
Agenda
● Goals
● What is CBIR?
● What is Deep Learning?
● AutoEncoder
● Tool description
Goals
● We want to build an image search system, based on machine learning, that searches by image content. Such a system has many applications in public safety, the military, medical diagnosis, etc.
● The modern web holds billions of unlabeled images but only a few thousand labeled ones. The problem is how to exploit the power of this unlabeled data in our system.
● In this presentation we explain our CBIR system, which extracts meaningful information from unlabeled data using a widely used deep learning technique called the AutoEncoder.
What is CBIR?
● Content-Based Image Retrieval (CBIR) is the process of searching for images that are similar to a query image.
● "Content-based" means that the search analyzes the contents of the image rather than metadata such as keywords, tags, or descriptions associated with the image.
● It remains an open problem in computer vision.
● It has many applications in fields such as public safety, the military, medical diagnosis, robotics, etc.
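To make "searching by content" concrete: a CBIR system typically compares feature vectors computed from the images themselves and ranks database images by their distance to the query's vector. The sketch below is a generic illustration, not the system described in this presentation; the feature vectors, file names, the helper names `euclidean` and `retrieve`, and the choice of Euclidean distance are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, k=3):
    """Return the ids of the k images whose feature vectors are closest to the query.

    `database` is a hypothetical dict mapping image id -> content feature vector
    (computed from pixels, not from tags or descriptions).
    """
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]


# Hypothetical 3-dimensional content features for a tiny image database.
db = {
    "cat_1.jpg": [0.9, 0.1, 0.2],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.1, 0.9, 0.7],
}
print(retrieve([0.88, 0.12, 0.18], db, k=2))  # ['cat_1.jpg', 'cat_2.jpg']
```

The quality of such a search depends almost entirely on the feature vectors, which is where learned features such as an AutoEncoder's code come in.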
What is Deep Learning?
1. Deep learning is a branch of machine learning based on a set of algorithms that attempt to model
high-level abstractions in data by using multiple processing layers.
2. It is used in machine learning to discover high-level features automatically.
3. With deep learning we can extract high-level features such as shape, texture, and contrast from image datasets (the images do not need to be labeled).
4. There are many deep learning algorithms, such as convolutional and recursive neural networks, deep belief networks, and restricted Boltzmann machines. In this work we used an AutoEncoder.
5. Deep learning has many applications in fields such as computer vision, search engines, speech recognition, and artificial intelligence.
AutoEncoder
● The aim of an autoencoder is to learn a representation (encoding) for a set of data,
typically for the purpose of dimensionality reduction.
● Recently, the autoencoder concept has become more widely used for learning generative models of data.
● The AutoEncoder is also a neural network. The difference is that it is trained with unsupervised learning: the target output is the input vector itself, so no labels are needed. The difference between the output vector and the input vector is treated as the error for backpropagation. In this way the network learns a code in its hidden layer (the encoded value).
● Input ≈ Decode(Encode(Input))
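The idea of a network trained to reproduce its own input can be sketched in a few dozen lines. This is a minimal sketch under our own assumptions, not the model used in this work: one sigmoid hidden layer, separate encoder and decoder weights, squared reconstruction error, and per-example stochastic gradient descent. The class name `AutoEncoder`, the toy four-dimensional data, and the learning rate are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoEncoder:
    """Single-hidden-layer autoencoder with sigmoid units (illustrative sketch)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def loss(self, x):
        """Squared reconstruction error: the input itself is the target."""
        y = self.decode(self.encode(x))
        return 0.5 * sum((yi - xi) ** 2 for yi, xi in zip(y, x))

    def sgd_step(self, x, lr=0.5):
        h = self.encode(x)
        y = self.decode(h)
        # Error signal at the output pre-activations (sigmoid derivative y * (1 - y)).
        d_out = [(yi - xi) * yi * (1 - yi) for yi, xi in zip(y, x)]
        # Backpropagate to the hidden layer using W2 before it is updated.
        d_hid = [sum(self.W2[k][j] * d_out[k] for k in range(len(d_out))) * h[j] * (1 - h[j])
                 for j in range(len(h))]
        for k, dk in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[k][j] -= lr * dk * hj
            self.b2[k] -= lr * dk
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.W1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj


# Toy data: four 4-dimensional inputs squeezed through a 2-unit code.
data = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
ae = AutoEncoder(n_in=4, n_hidden=2)
before = sum(ae.loss(x) for x in data)
rng = random.Random(1)
for epoch in range(2000):
    rng.shuffle(data)  # visit examples in random order (stochastic gradient descent)
    for x in data:
        ae.sgd_step(x)
after = sum(ae.loss(x) for x in data)
print(before, after)
```

Because `n_hidden` is smaller than `n_in`, the hidden code is a compressed representation, which is the dimensionality-reduction use of autoencoders.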
Tool description
1. First, the web service receives a raw image (.jpg, .png, etc.) and passes it to the preprocessing step.
2. Preprocess the raw image:
a. Resize the image to the input size of our model.
b. Convert the resized image to grayscale.
3. Flatten the pixels of the preprocessed image into a row vector.
4. Call the normalization module.
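The preprocessing steps can be sketched in plain Python. This sketch assumes the image has already been decoded from its file format into rows of (R, G, B) tuples (in practice an image library would do this), uses nearest-neighbour resizing and the ITU-R BT.601 luma weights for grayscale, and takes 28x28 as the model size to match MNIST. All function names are hypothetical.

```python
def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of an image stored as rows of (R, G, B) tuples."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def to_grayscale(img):
    """Convert RGB pixels to 8-bit gray values using ITU-R BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in img]


def to_row_vector(gray):
    """Flatten the 2-D grayscale image, row by row, into a single list."""
    return [p for row in gray for p in row]


def preprocess(img, size=28):
    """Hypothetical pipeline: resize -> grayscale -> row vector (size assumed to be 28x28)."""
    return to_row_vector(to_grayscale(resize_nearest(img, size, size)))


white = [[(255, 255, 255)] * 56 for _ in range(56)]  # a 56x56 all-white RGB image
vec = preprocess(white)
print(len(vec))  # 784
```

Resizing before the grayscale conversion follows the order given in the steps above; the two operations commute for nearest-neighbour resizing, so the order mainly affects cost.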
Tool description
Every neuron applies the sigmoid function to its value, so it is useful to normalize the inputs: this helps training converge faster and improves the error rate.
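1. We apply min-max normalization to the input values: $z_i = \frac{x_i - \min(x)}{\max(x) - \min(x)}$
2. In our case the inputs are 8-bit grayscale pixels, so we take $\min(x) = 0$ and $\max(x) = 255$, which gives $z_i = x_i / 255$.
3. Call the encoding module.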
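Both normalization formulas are short enough to write out directly. A minimal sketch: the function names are hypothetical, and returning zeros for a constant vector (where max(x) = min(x)) is our own choice to avoid division by zero.

```python
def min_max_normalize(x):
    """z_i = (x_i - min(x)) / (max(x) - min(x)); maps values into [0, 1]."""
    lo, hi = min(x), max(x)
    if hi == lo:  # constant vector: assumed convention to avoid division by zero
        return [0.0] * len(x)
    return [(xi - lo) / (hi - lo) for xi in x]


def normalize_pixels(x):
    """Fixed-range variant for 8-bit pixels: min = 0, max = 255, so z_i = x_i / 255."""
    return [xi / 255 for xi in x]


pixels = [0, 51, 255]
print(min_max_normalize(pixels))  # [0.0, 0.2, 1.0]
print(normalize_pixels(pixels))   # [0.0, 0.2, 1.0]
```

The two variants differ when an image does not span the full 0 to 255 range: for example, [10, 20] becomes [0.0, 1.0] under per-image min-max normalization but stays close to 0 when divided by 255, which keeps absolute brightness comparable across images.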
Tool description
We pretrained our AutoEncoder model with stochastic gradient descent on a dataset of 60,000 unlabeled images of handwritten digits. During training, the AutoEncoder learned many high-level features of these images.
1. We feed the normalized row vector of the image into the AutoEncoder and obtain a more compact feature vector (each component is a sigmoid activation that can be read as the probability that a given high-level feature is present in the image).
2. We pass the compact vector to the classifier module. (It does not need to be normalized, because the sigmoid function has already mapped its values into the range 0 to 1.)
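Encoding a normalized row vector amounts to one matrix-vector product followed by the sigmoid, which is why every component of the compact vector already lies between 0 and 1. In this minimal sketch the function `encode` and the 2x4 weight matrix are hypothetical stand-ins for the pretrained AutoEncoder's encoder weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W, b):
    """Hidden-layer code h = sigmoid(W x + b); W has one row per hidden unit."""
    return [sigmoid(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


# Hypothetical pretrained weights: 2 hidden units over a 4-pixel input.
W = [[4.0, 4.0, -4.0, -4.0],
     [-4.0, -4.0, 4.0, 4.0]]
b = [0.0, 0.0]
code = encode([1.0, 1.0, 0.0, 0.0], W, b)
print(code)  # first feature close to 1, second close to 0
```

A real MNIST encoder would map 784 inputs to a few hundred hidden units, but the computation per unit is the same.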
Tool description
We pretrained our neural network classifier on several thousand labeled examples that had first been passed through the AutoEncoder.
1. We feed the row vector encoded by the AutoEncoder into the classifier and call the result retrieval module to determine the result class from the output layer.
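The classifier's forward pass can be sketched as a single layer of sigmoid output nodes, one per class, applied to the AutoEncoder code. The presentation does not describe the classifier's architecture, so the single layer, the function name `classify`, the three classes and the weights below are hypothetical (an MNIST classifier would have ten output nodes).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def classify(code, W_out, b_out):
    """One sigmoid output node per class; returns a probability-like score for each class."""
    return [sigmoid(sum(w * h for w, h in zip(row, code)) + b)
            for row, b in zip(W_out, b_out)]


# Hypothetical trained weights for 3 classes over a 2-dimensional code.
W_out = [[6.0, -6.0],   # class 0 fires on feature 0
         [-6.0, 6.0],   # class 1 fires on feature 1
         [-6.0, -6.0]]  # class 2 fires when neither feature is present
b_out = [-3.0, -3.0, 3.0]
scores = classify([1.0, 0.0], W_out, b_out)
print(scores)  # class 0 scores highest
```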
Tool description
Each node in the output layer outputs the probability that its class is the correct one.
1. If the probability of one of the output classes is greater than the threshold (0.5), that class is taken as the result class.
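The decision rule can be sketched as follows. The source only states the 0.5 threshold; picking the highest-scoring class when several exceed it, and returning `None` when none does, are our own assumptions, and `result_class` is a hypothetical name.

```python
def result_class(probabilities, threshold=0.5):
    """Return the index of the most probable class if it exceeds the threshold, else None."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] > threshold else None


print(result_class([0.1, 0.93, 0.2]))  # 1
print(result_class([0.3, 0.4, 0.2]))   # None
```

Returning `None` lets the web service report "no confident match" instead of forcing a class for ambiguous inputs.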
Result
We tested our algorithm on the MNIST handwritten digit dataset and compared it with results reported in a couple of well-known articles.

Method                       MNIST accuracy
Our algorithm                95%
Yann LeCun algorithm         95.3%
Aurelio Ranzato algorithm    99%
