Similarity is a strange notion: it's the same, but different. You probably don't lose any sleep over it, but how can a deterministic algorithm deal with it?
This presentation shows how to build deep neural networks that quantify similarity between images, enabling downstream tasks like content search and data clustering. We'll also get an understanding of how the data is represented inside the networks, and why this works in the first place.
Does "similar" have a very specific meaning in your application? There are many techniques for fine-tuning networks for similarity, whether or not you have labeled ground-truth examples.
2. Agenda
– What is image similarity?
– Deep learning solution
– Why does it work?
– How to improve?
3. What is image similarity?
It’s a way to compare images to know if they share similar content
4. What is image similarity?
It’s a way to compare images to know if they share similar content
Similar content is loosely defined
– Independent of resolution
– Independent of translation, rotation
– Independent of color
– Independent of style
– Independent of object
Images have meaning
5. What is image similarity?
It’s a way to compare images to know if they share similar content
Comparison is precisely quantified
– These are 98.647% similar:
6. What can you do with similarity?
– Reverse search
(Figure: a reverse image search for "lotr, image, samwise" surfacing the wiki entry for Chrysophylax Dives, the dragon from "Farmer Giles of Ham" as illustrated by Alan Lee, with its biographical information: a hoard in the mountains composed of gold, silver, rings, necklaces, jewels, diamonds, and more.)
14. How to search similar content?
– Compare pixel by pixel? 1 MPixel -> 1 million operations to check a single image!
– Find key points (?)
– Measure bright/dark areas (?)
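Instead of comparing a million pixels per image, a network can first compress each image into a short vector, after which comparison is cheap. A minimal numpy sketch, where the random vectors stand in for real network outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
embedding_a = rng.normal(size=128)                      # stand-in for a network output
embedding_b = embedding_a + 0.1 * rng.normal(size=128)  # a slightly perturbed "similar" image
embedding_c = rng.normal(size=128)                      # an unrelated image

# Comparing two 128-d vectors takes ~128 multiply-adds, versus ~1 million
# pixel comparisons for a 1 MPixel image.
sim_close = cosine_similarity(embedding_a, embedding_b)
sim_far = cosine_similarity(embedding_a, embedding_c)
```

The speed-up comes from doing the expensive work (the network forward pass) once per image at indexing time, not once per comparison.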
15. How to search similar content?
Non-image content?
– Text
– Bank records
– Music
– Customer profiles
40. Decoder draws lines
– Categories are mutually exclusive
– Many categories should organize as a circle
– Categories can be identified by their relative angle
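If mutually exclusive categories organize as equally spaced directions on a circle, the decoder can indeed identify a category from its relative angle alone. A small illustrative sketch (the six-category layout is an assumption for the example):

```python
import numpy as np

# Six mutually exclusive categories laid out at equally spaced angles
# on a circle; each category is a unit direction in 2-d embedding space.
n_categories = 6
angles = 2 * np.pi * np.arange(n_categories) / n_categories
directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)

def category_of(embedding_2d):
    """Identify the category whose direction best aligns with the embedding
    (largest dot product, i.e. smallest relative angle)."""
    return int(np.argmax(directions @ np.asarray(embedding_2d)))
```

The decoder then only needs straight lines (sector boundaries) to separate the categories.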
50. Similarity search demo
– Upload a catalogue of images
– Pick a model
– Index the images
– Compress the query
– Compare compressed values
– Pull top 3 matches from catalogue
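The demo steps above can be sketched end to end. Random vectors stand in for a real model's embeddings, and names like `embed` and `top_k` are hypothetical:

```python
import numpy as np

def embed(images, rng):
    """Stand-in for the picked model: maps each image to a short vector."""
    return rng.normal(size=(len(images), 64))

def top_k(index, query_vec, k=3):
    """Return indices of the k most cosine-similar catalogue entries."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = query_vec / np.linalg.norm(query_vec)
    sims = index_n @ q                      # compare compressed values
    return np.argsort(sims)[::-1][:k]

rng = np.random.default_rng(1)
catalogue = [f"img_{i}.jpg" for i in range(10)]    # upload a catalogue
index = embed(catalogue, rng)                      # index the images
query_vec = index[4] + 0.05 * rng.normal(size=64)  # compress the query
matches = [catalogue[i] for i in top_k(index, query_vec)]  # top 3 matches
```

A real system would replace the brute-force `argsort` with an approximate nearest-neighbor index once the catalogue grows large.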
53. Instance classification
– Decoder simple
Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised Feature Learning via Non-parametric Instance Discrimination. Proceedings of the IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, 3733–3742. https://doi.org/10.1109/CVPR.2018.00393
55. Instance classification
– Decoder simple
– Preserve some information by learning category
– Preserve all information by learning individual examples
(Figure: a category label, "Sam", versus an instance label, "that one picture called img_59875.jpg".)
56. Instance classification
– Works without labels
– “Squeeze your eyes” model: no semantic information
– Continuity and small embedding size push similar images together
– Individual classification prevents hash collisions
59. Explicit similarity
To me,
– Odd numbers (1, 3, 5, ...) are more similar to each other than to even numbers (2, 4, 6, ...)
– Consecutive numbers are more similar
Express similarity as a probability for the loss function:

Compared to:                  1      3      5        7          9   0
In degrees:                   0°     22.5°  45°      67.5°      90° 90°
In cos sim:                   1      0.92   0.70     0.38       0   0
In crossentropy probability:  0.95   0.05   0.00001  0.0000...  0   0
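The table's conversion from a hand-assigned angle to cosine similarity, and then to a target distribution for cross-entropy, can be written out directly. The temperature sharpening the softmax is a hypothetical choice; the slide's exact probabilities depend on it:

```python
import numpy as np

# Hand-assigned angles for how similar other digits feel compared to "1".
digits = ["1", "3", "5", "7", "9", "0"]
angles_deg = np.array([0.0, 22.5, 45.0, 67.5, 90.0, 90.0])

cos_sims = np.cos(np.radians(angles_deg))  # -> 1, 0.92, 0.70, 0.38, 0, 0

# Sharpen into a probability target for a cross-entropy loss.
temperature = 0.07                         # hypothetical value
logits = cos_sims / temperature
probs = np.exp(logits - logits.max())      # subtract max for stability
probs /= probs.sum()
```

The softmax preserves the ranking (the compared digit itself gets the highest probability) while turning arbitrary similarity scores into a valid distribution.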
60. Explicit similarity
(Figure: two embedding vectors at angle = 22.5°, cos(angle) = 0.92, and two at angle = 90°, cos(angle) = 0, alongside the comparison table from the previous slide.)
61. Contrastive loss
– Tune double-network
– Predict similar or not
(Figure: a Siamese pair of networks whose outputs are compared by cosine similarity; a dissimilar pair is labeled 0, a similar pair is labeled 1.)
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., … Wu, Y. (2014). Learning fine-grained image similarity with deep ranking. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1386–1393. https://doi.org/10.1109/CVPR.2014.180
63. Contrastive loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
Garcia, N., & Vogiatzis, G. (2019). Learning non-metric visual similarity for image retrieval. Image and Vision Computing, 82, 18–25. https://doi.org/10.1016/j.imavis.2019.01.001
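A minimal sketch of a contrastive loss on the comparator's cosine similarity. The hinge-with-margin form and the margin value are assumptions; the cited papers use variants of this idea:

```python
def contrastive_loss(cos_sim, is_similar, margin=0.5):
    """Pull similar pairs toward cos_sim = 1; push dissimilar pairs
    below a margin. `margin` is a hypothetical hyperparameter."""
    if is_similar:
        return 1.0 - cos_sim           # label 1: want cos_sim -> 1
    return max(0.0, cos_sim - margin)  # label 0: want cos_sim < margin
```

The margin stops the network from wasting capacity pushing already-dissimilar pairs further apart.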
64. Triplet loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
– Reference, positive, negative triplet
Hermans, A., Beyer, L., & Leibe, B. (2017). In Defense of the Triplet Loss for Person Re-Identification. Retrieved from http://arxiv.org/abs/1703.07737
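The reference/positive/negative triple gives the triplet loss discussed by Hermans et al.; a minimal sketch on embedding vectors (the margin value here is chosen for illustration):

```python
import numpy as np

def triplet_loss(reference, positive, negative, margin=0.2):
    """The reference must be closer to the positive than to the
    negative by at least `margin`, otherwise the triplet is penalized."""
    d_pos = np.linalg.norm(reference - positive)
    d_neg = np.linalg.norm(reference - negative)
    return float(max(0.0, d_pos - d_neg + margin))
```

Unlike the pairwise contrastive loss, the triplet form only constrains relative distances, which matches how "similar" is usually defined: A is more like B than like C.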
65. Octoplet loss
– Tune double-network
– Predict similar or not
– Siamese weights or not
– Replace comparator
– 8 inputs
(Figure: eight inputs with binary similarity codes: 1001, 0100, 0101, 0110, 0010, 1111, 0101, 0111.)
66. Summary
Image similarity is a way to quantify the visual and semantic content of images
Deep neural networks can compress general data to make comparison fast
Cosine similarity gives intuitive values
You can train models using anything from:
- No extra information
- Labeled information
- Arbitrary “is similar” information
76. Labeling distances
– Deduce distances from dataset organization

Distance:     1                                   2                                3              4
ImageNet:     same image with noise/augmentation  image in same category           image from other category
Large image:  same image with noise/augmentation  neighbor in space/time/sequence
Text:         following sentence                  same book                        same author    same century

Gildenblat, J., & Klaiman, E. (2019). Self-Supervised Similarity Learning for Digital Pathology. 1–8. Retrieved from http://arxiv.org/abs/1905.08139
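The text row of the table can be turned into a labeling function. The metadata fields here (`sentence_id`, `book`, `author`, `century`) are hypothetical names for however the corpus happens to be organized:

```python
def text_distance(a, b):
    """Deduce a similarity distance (1 = most similar) from corpus
    organization, following the table's text row."""
    if a["book"] == b["book"] and abs(a["sentence_id"] - b["sentence_id"]) == 1:
        return 1  # following sentence
    if a["book"] == b["book"]:
        return 2  # same book
    if a["author"] == b["author"]:
        return 3  # same author
    if a["century"] == b["century"]:
        return 4  # same century
    return 5      # anything else falls outside the four labeled distances
```

Such labels cost nothing to produce, which is the appeal of deducing distances from dataset organization rather than hand-annotating pairs.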