Similarity is a strange notion: it's the same, but different. You probably don't lose any sleep over it, but how can a deterministic algorithm deal with it?
This presentation shows how to build deep neural networks that quantify similarity between images, enabling downstream tasks like content search and data clustering. We'll also get an understanding of how the data is represented inside the networks, and why this works in the first place.
Does "similar" have a very specific meaning in your application? There are many techniques for fine-tuning networks for similarity, whether or not you have labeled ground-truth examples.
2. Agenda
– What is image similarity?
– Deep learning solution
– Why does it work?
– How to improve?
3. What is image similarity?
It’s a way to compare images to know if they share similar content
4. What is image similarity?
It’s a way to compare images to know if they share similar content
Similar content is loosely defined
– Independent of resolution
– Independent of translation, rotation
– Independent of color
– Independent of style
– Independent of object
Images have meaning
5. What is image similarity?
It’s a way to compare images to know if they share similar content
Comparison is precisely quantified
– These are 98.647% similar:
6. What can you do with similarity?
– Reverse search
(Figure: a reverse image search for "lotr, image, samwise" surfacing the wiki entry for Chrysophylax Dives, the dragon from "Farmer Giles of Ham" as illustrated by Alan Lee, with its biographical information: a hoard in the mountains composed of gold, silver, rings, necklaces, jewels, diamonds, and more.)
14. How to search similar content?
– Compare pixel by pixel? 1 MPixel -> 1 million operations to check a single image!
– Find key points (?)
– Measure bright/dark areas (?)
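Instead of comparing a million pixels per image, a network can first compress each image into a short vector, after which comparison is cheap. A minimal numpy sketch, where the random vectors stand in for real network outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
embedding_a = rng.normal(size=128)                      # stand-in for a network output
embedding_b = embedding_a + 0.1 * rng.normal(size=128)  # a slightly perturbed "similar" image
embedding_c = rng.normal(size=128)                      # an unrelated image

# Comparing two 128-d vectors takes ~128 multiply-adds, versus ~1 million
# pixel comparisons for a 1 MPixel image.
sim_close = cosine_similarity(embedding_a, embedding_b)
sim_far = cosine_similarity(embedding_a, embedding_c)
```

The speed-up comes from doing the expensive work (the network forward pass) once per image at indexing time, not once per comparison.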
15. How to search similar content?
Non-image content?
– Text
– Bank records
– Music
– Customer profiles
40. Decoder draws lines
– Categories are mutually exclusive
– Many categories should organize as a circle
– Categories can be identified by their relative angle
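If mutually exclusive categories organize as equally spaced directions on a circle, the decoder can indeed identify a category from its relative angle alone. A small illustrative sketch (the six-category layout is an assumption for the example):

```python
import numpy as np

# Six mutually exclusive categories laid out at equally spaced angles
# on a circle; each category is a unit direction in 2-d embedding space.
n_categories = 6
angles = 2 * np.pi * np.arange(n_categories) / n_categories
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def category_of(embedding_2d):
    """Identify the category whose direction best aligns with the embedding
    (largest dot product, i.e. smallest relative angle)."""
    return int(np.argmax(directions @ np.asarray(embedding_2d)))
```

The decoder then only needs straight lines (sector boundaries) to separate the categories.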
50. Similarity search demo
– Upload a catalogue of images
– Pick a model
– Index the images
– Compress the query
– Compare compressed values
– Pull top 3 matches from catalogue
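The demo steps above can be sketched end to end. Random vectors stand in for a real model's embeddings, and names like `embed` and `top_k` are hypothetical:

```python
import numpy as np

def embed(images, rng):
    """Stand-in for the picked model: maps each image to a short vector."""
    return rng.normal(size=(len(images), 64))

def top_k(index, query_vec, k=3):
    """Return indices of the k most cosine-similar catalogue entries."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = index_n @ q                      # compare compressed values
    return np.argsort(sims)[::-1][:k]

rng = np.random.default_rng(1)
catalogue = [f"img_{i}.jpg" for i in range(10)]    # upload a catalogue
index = embed(catalogue, rng)                      # index the images
query_vec = index[4] + 0.05 * rng.normal(size=64)  # compress the query
matches = [catalogue[i] for i in top_k(index, query_vec)]  # top 3 matches
```

A real system would replace the brute-force `argsort` with an approximate nearest-neighbor index once the catalogue grows large.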
53. Instance classification
– Decoder simple
Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised Feature Learning via Non-parametric Instance Discrimination. Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 3733–3742. https://doi.org/10.1109/CVPR.2018.00393
55. Instance classification
– Decoder simple
– Preserve some information by learning category
– Preserve all information by learning individual examples
(Figure: a category label, "Sam", versus an instance label, "that one picture called img_59875.jpg".)
56. Instance classification
– Works without labels
– “Squeeze your eyes” model: no semantic information
– Continuity and small embedding size push similar images together
– Individual classification prevents hash collisions
59. Explicit similarity
To me,
– Odd numbers (1, 3, 5, ...) are more similar to each other than to even numbers (2, 4, 6, ...)
– Consecutive numbers are more similar
Express similarity as a probability for the loss function:

Compared to:                  1      3      5        7          9   0
In degrees:                   0°     22.5°  45°      67.5°      90° 90°
In cos sim:                   1      0.92   0.70     0.38       0   0
In crossentropy probability:  0.95   0.05   0.00001  0.0000...  0   0
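The table's conversion from a hand-assigned angle to cosine similarity, and then to a target distribution for cross-entropy, can be written out directly. The temperature sharpening the softmax is a hypothetical choice; the slide's exact probabilities depend on it:

```python
import numpy as np

# Hand-assigned angles for how similar other digits feel compared to "1".
digits = ["1", "3", "5", "7", "9", "0"]
angles_deg = np.array([0.0, 22.5, 45.0, 67.5, 90.0, 90.0])

cos_sims = np.cos(np.radians(angles_deg))  # -> 1, 0.92, 0.70, 0.38, 0, 0

# Sharpen into a probability target for a cross-entropy loss.
temperature = 0.07                         # hypothetical value
logits = cos_sims / temperature
probs = np.exp(logits - logits.max())      # subtract max for stability
probs /= probs.sum()
```

The softmax preserves the ranking (the compared digit itself gets the highest probability) while turning arbitrary similarity scores into a valid distribution.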
60. Explicit similarity
(Figure: two embedding vectors at angle = 22.5°, cos(angle) = 0.92, and two at angle = 90°, cos(angle) = 0, alongside the comparison table from the previous slide.)
61. Contrastive loss
– Tune double-network
– Predict similar or not
(Figure: a Siamese pair of networks whose outputs are compared by cosine similarity; a dissimilar pair is labeled 0, a similar pair is labeled 1.)
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., … Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1386–1393. https://doi.org/10.1109/CVPR.2014.180
63. Contrastive loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
Garcia, N., & Vogiatzis, G. (2019). Learning non-metric visual similarity for image retrieval. Image and Vision Computing, 82, 18–25. https://doi.org/10.1016/j.imavis.2019.01.001
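A minimal sketch of a contrastive loss on the comparator's cosine similarity. The hinge-with-margin form and the margin value are assumptions; the cited papers use variants of this idea:

```python
def contrastive_loss(cos_sim, is_similar, margin=0.5):
    """Pull similar pairs toward cos_sim = 1; push dissimilar pairs
    below a margin. `margin` is a hypothetical hyperparameter."""
    if is_similar:
        return 1.0 - cos_sim           # label 1: want cos_sim -> 1
    return max(0.0, cos_sim - margin)  # label 0: want cos_sim < margin
```

The margin stops the network from wasting capacity pushing already-dissimilar pairs further apart.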
64. Triplet loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
– Reference, positive, negative triplet
Hermans, A., Beyer, L., & Leibe, B. (2017). In Defense of the Triplet Loss for Person Re-Identification. Retrieved from http://arxiv.org/abs/1703.07737
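The reference/positive/negative triple gives the triplet loss discussed by Hermans et al.; a minimal sketch on embedding vectors (the margin value here is chosen for illustration):

```python
import numpy as np

def triplet_loss(reference, positive, negative, margin=0.2):
    """The reference must be closer to the positive than to the
    negative by at least `margin`, otherwise the triplet is penalized."""
    d_pos = np.linalg.norm(reference - positive)
    d_neg = np.linalg.norm(reference - negative)
    return float(max(0.0, d_pos - d_neg + margin))
```

Unlike the pairwise contrastive loss, the triplet form only constrains relative distances, which matches how "similar" is usually defined: A is more like B than like C.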
65. Octoplet loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
– 8 inputs
(Figure: eight inputs with binary similarity codes: 1001, 0100, 0101, 0110, 0010, 1111, 0101, 0111.)
66. Summary
Image similarity is a way to quantify the visual and semantic content of images
Deep neural networks can compress general data to make comparison fast
Cosine similarity gives intuitive values
You can train models using anything from:
- No extra information
- Labeled information
- Arbitrary “is similar” information
76. Labeling distances
– Deduce distances from dataset organization

Distance:     1                                   2                                3              4
ImageNet:     same image with noise/augmentation  image in same category           image from other category
Large image:  same image with noise/augmentation  neighbor in space/time/sequence
Text:         following sentence                  same book                        same author    same century

Gildenblat, J., & Klaiman, E. (2019). Self-Supervised Similarity Learning for Digital Pathology. 1–8. Retrieved from http://arxiv.org/abs/1905.08139
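The text row of the table can be turned into a labeling function. The metadata fields here (`sentence_id`, `book`, `author`, `century`) are hypothetical names for however the corpus happens to be organized:

```python
def text_distance(a, b):
    """Deduce a similarity distance (1 = most similar) from corpus
    organization, following the table's text row."""
    if a["book"] == b["book"] and abs(a["sentence_id"] - b["sentence_id"]) == 1:
        return 1  # following sentence
    if a["book"] == b["book"]:
        return 2  # same book
    if a["author"] == b["author"]:
        return 3  # same author
    if a["century"] == b["century"]:
        return 4  # same century
    return 5      # anything else falls outside the four labeled distances
```

Such labels cost nothing to produce, which is the appeal of deducing distances from dataset organization rather than hand-annotating pairs.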