5. Literature
Google search engine:
“Learning Fine-grained Image Similarity with Deep Ranking” (2014)
Visual search and recommendation system from Flipkart, India’s largest e-commerce company:
“Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce” (2017)
6. Main idea for image search
Images are projected into a Euclidean embedding space, such that the more similar two images are, the smaller the distance between them in that space.
Our goal is to learn an embedding function f(.) that assigns smaller distances to more similar pairs:
Consider three images pᵢ, pᵢ⁺ and pᵢ⁻. If pᵢ and pᵢ⁺ are more similar than pᵢ and pᵢ⁻, then

‖f(pᵢ) − f(pᵢ⁺)‖ < ‖f(pᵢ) − f(pᵢ⁻)‖
Consider a sample of triplets (pᵢ, pᵢ⁺, pᵢ⁻); the function f(.) can be learned by minimizing the following loss function:

max(0, ‖f(pᵢ) − f(pᵢ⁺)‖ − ‖f(pᵢ) − f(pᵢ⁻)‖)
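The loss above can be sketched directly in NumPy. This is a minimal illustration, not the slides' implementation; note that in practice a small margin term is often added inside the hinge, omitted here to match the formula above.

```python
import numpy as np

def triplet_loss(f_q, f_pos, f_neg):
    """Hinge loss on one triplet of embedding vectors.

    f_q, f_pos, f_neg: embeddings of the query, positive and
    negative images. The loss is zero when the positive is already
    closer to the query than the negative.
    """
    d_pos = np.linalg.norm(f_q - f_pos)  # distance to the similar image
    d_neg = np.linalg.norm(f_q - f_neg)  # distance to the dissimilar image
    return max(0.0, d_pos - d_neg)

# A well-ranked triplet (positive closer than negative) gives zero loss:
q = np.array([0.0, 0.0])
pos = np.array([0.1, 0.0])
neg = np.array([1.0, 0.0])
print(triplet_loss(q, pos, neg))  # 0.0
```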
7. CNN architecture
The triplet sampling characterizes the relative similarity relationship between three images.
Query image pᵢ, positive image pᵢ⁺ and negative image pᵢ⁻ are fed independently into three identical deep neural networks.
The ranking layer evaluates the loss of the triplet. It has no parameters.
The network parameters are learned with the classical backpropagation algorithm so as to minimize the ranking loss function.
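The "three identical networks" setup can be sketched as follows. This is a toy NumPy sketch: a single linear layer stands in for the deep network, but the key point survives, namely that one shared set of weights serves all three branches, and that the ranking layer itself holds no parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# One set of weights shared by the three branches: feeding the query,
# positive and negative images through the *same* function f is what
# "three identical networks" amounts to in practice.
W = rng.standard_normal((4, 2))  # toy linear embedding, 4-d input -> 2-d

def f(x):
    """Shared embedding branch (stand-in for the deep network)."""
    return x @ W

def ranking_layer(e_q, e_pos, e_neg):
    """Parameter-free ranking layer: it only evaluates the triplet loss."""
    return max(0.0, np.linalg.norm(e_q - e_pos) - np.linalg.norm(e_q - e_neg))

p, p_pos, p_neg = rng.standard_normal((3, 4))
loss = ranking_layer(f(p), f(p_pos), f(p_neg))
```

During training, the gradient of `loss` flows back into the single shared `W`, so all three branches stay identical.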
8. CNN architecture
16-layer VGG net: captures abstract, high-level features of the input image
Shallow conv layers 1 and 2: capture fine-grained details of the input image
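The deep and shallow pathways have to be fused into a single embedding. The slides do not spell out the fusion step, so the sketch below assumes one common scheme: l2-normalize each pathway before concatenating, then renormalize, so that no single pathway dominates the distances.

```python
import numpy as np

def fuse(deep, shallow1, shallow2):
    """Fuse the deep pathway with the two shallow conv pathways.

    Each vector is l2-normalized before concatenation so the
    pathways contribute on a comparable scale (an assumed scheme,
    not necessarily the one used in these slides).
    """
    l2 = lambda v: v / np.linalg.norm(v)
    return l2(np.concatenate([l2(deep), l2(shallow1), l2(shallow2)]))

# Toy pathway outputs; real pathway dimensions would be much larger.
embedding = fuse(np.ones(8), np.ones(4), np.ones(4))
```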
9. And with few resources?
[Figure: the three branches output the embeddings f(pᵢ), f(pᵢ⁺) and f(pᵢ⁻)]
Training a full network is computationally expensive.
Pre-trained classification networks can be fine-tuned to address the image search problem:
TensorFlow is well suited to downloading and modifying pre-trained networks.
The following results are computed from the pre-trained Inception-V3 model.
[Diagram: a new layer of ReLU neurons added on top of the bottleneck; only one step of backpropagation]
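The cheap fine-tuning idea — keep the pre-trained network frozen and update only a small new top layer — can be sketched as follows. This is a toy NumPy sketch with random stand-ins for the 2048-dimensional Inception-V3 bottleneck features, and a single logistic neuron in place of the new layer, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen bottleneck features (random stand-ins for Inception-V3
# outputs, which are 2048-dimensional) and toy binary labels.
# Only the new top layer's weights w are updated; the rest of the
# network stays fixed.
X = rng.standard_normal((8, 2048))
y = rng.integers(0, 2, size=8).astype(float)

w = np.zeros(2048)                 # weights of the new top layer
p = 1.0 / (1.0 + np.exp(-X @ w))   # forward pass (sigmoid output)
grad = X.T @ (p - y) / len(y)      # logistic-loss gradient w.r.t. w only
w -= 0.1 * grad                    # one step of backpropagation
```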
10. And with few resources?
To check that Inception model bottlenecks can be adapted to visual search, we first look at the nearest images in the bottleneck space:
Results seem promising for images with a white background:
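The nearest-image lookup in bottleneck space is a plain nearest-neighbor search. A minimal NumPy sketch, with tiny 2-d vectors standing in for the real 2048-d bottlenecks:

```python
import numpy as np

def nearest_images(query, catalog, k=5):
    """Indices of the k catalog images whose bottleneck vectors are
    closest to the query's, by Euclidean distance."""
    dists = np.linalg.norm(catalog - query, axis=1)
    return np.argsort(dists)[:k]

# Toy 2-d "bottlenecks"; real Inception-V3 bottlenecks are 2048-d.
catalog = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
print(nearest_images(np.array([0.9, 0.1]), catalog, k=2))  # [1 0]
```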
11. And with few resources?
To check that Inception model bottlenecks can be adapted to visual search, we first look at the nearest images in the bottleneck space:
Results need to be improved for “real life” pictures:
12. And with few resources?
We use a selection of 18,000 products (2 × (3,000 seats + 3,000 armchairs + 3,000 sofas)) from the Cdiscount catalog.
Inside-category triplet: the query and positive images are pictures of the same product, while the negative image is taken from another product in the same category.
Outside-category triplet: the query and positive images are taken from products of the same category, while the negative image represents a product from another category.
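The two sampling strategies can be sketched as follows. The nested-dict catalog layout (category → product → image paths) is an assumption for illustration, not Cdiscount's actual data format.

```python
import random

def sample_triplet(catalog, mode):
    """Draw one (query, positive, negative) triplet of image paths.

    `catalog` maps category -> {product_id: [image paths]} (an
    assumed layout). `mode` is "inside" or "outside".
    """
    category = random.choice(sorted(catalog))
    product = random.choice(sorted(catalog[category]))
    if mode == "inside":
        # Same product for query/positive; negative from another
        # product of the same category.
        query, positive = random.sample(catalog[category][product], 2)
        other = random.choice([p for p in sorted(catalog[category]) if p != product])
        negative = random.choice(catalog[category][other])
    else:  # "outside"
        # Same category (different products) for query/positive;
        # negative from another category.
        query = random.choice(catalog[category][product])
        other = random.choice([p for p in sorted(catalog[category]) if p != product])
        positive = random.choice(catalog[category][other])
        other_cat = random.choice([c for c in sorted(catalog) if c != category])
        neg_product = random.choice(sorted(catalog[other_cat]))
        negative = random.choice(catalog[other_cat][neg_product])
    return query, positive, negative

catalog = {
    "seat": {"s1": ["seat/s1/a.jpg", "seat/s1/b.jpg"],
             "s2": ["seat/s2/a.jpg", "seat/s2/b.jpg"]},
    "sofa": {"f1": ["sofa/f1/a.jpg", "sofa/f1/b.jpg"],
             "f2": ["sofa/f2/a.jpg", "sofa/f2/b.jpg"]},
}
q, pos, neg = sample_triplet(catalog, "inside")
```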
13. And with few resources?
Lots of “bad” triplets…
…lead to “bad” learning…