Recognizing the main colors in an image is important both for product tagging and for finding visually similar images. One problem is separating the foreground from the background, since users are only interested in the main object. Another is encoding the color information as a mathematical descriptor suited for visual similarity. Finally, encoding and predicting color is just the tip of the iceberg: besides color, we want to detect and encode other visual aspects. In this talk, we go through the challenges Velebit AI faced and the solutions it used to develop its image tagging and similarity services.
[DSC Adria 23]Mladen Fernezir How to Encode Color for Visual Similarity and Product Tagging.pptx
1. How to Encode Color for Visual Similarity and Product Tagging
Mladen Fernežir, Lead Data Scientist & Co-Founder
2. Velebit AI
● AI custom R&D
● AI consultancy
● Data engineering
● Deployment and monitoring
● Fast research and prototyping
● Images, text, tabular data
● Team with 8 years of experience
● AI solutions for online marketplaces
4. Keywords and Topics
Fashion Items and Online Marketplaces
Color Tagging
Visual Similarity
Deep Learning Modeling
Technical Challenges and Solutions
Data Collection and Labeling
5. Motivation and use-cases
● Visual aspects matter for online marketplaces
● People are selling and buying various items
● Color, material, style, and brand recognition
● Tagging for faster selling
● Automatic category prediction
● Finding visually similar items
● Visual recommendations
9. Ronneberger, Olaf et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation.” MICCAI (2015).
10. Real-world items are messy
● Multiple colors in the image
● Differences in lighting and contrast
● The real item color can appear different in a photo
● People often find it hard to agree on the color
12. Approaches for color tagging
● We modify some open-source solutions, e.g. https://github.com/algolia/color-extractor
○ background removal
○ automatic color pixel clustering
○ match pixel clusters to color names
● human labeling and cleaning
● teacher-student approach
○ get labels with a slow approach (human or automatic)
○ train a fast neural network to reproduce those labels
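The three extraction steps above (background removal, pixel clustering, matching clusters to color names) can be sketched roughly as follows. This is a minimal illustration, not the pipeline from the talk or from color-extractor: the `PALETTE` dictionary, the function names, and the assumption that a foreground mask is already available are all hypothetical. For brevity the sketch matches clusters to names in raw RGB; a perceptual space such as CIELAB would be the better choice.

```python
import numpy as np

# Hypothetical reference palette (sRGB, 0-255). A real system would use a
# larger, curated list of color names.
PALETTE = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
}


def _nearest(points, centers):
    """Index of the closest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on an (N, 3) array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = _nearest(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, _nearest(points, centers)


def tag_colors(image, foreground_mask, k=2):
    """Return {color name: share of foreground pixels}.

    image: (H, W, 3) uint8 array; foreground_mask: (H, W) boolean array,
    assumed to come from a separate background-removal step.
    """
    pixels = image[foreground_mask].astype(float)  # background is dropped here
    centers, labels = kmeans(pixels, k)
    names = list(PALETTE)
    reference = np.array([PALETTE[n] for n in names], dtype=float)
    tags = {}
    for j, center in enumerate(centers):
        share = float(np.mean(labels == j))
        name = names[int(np.linalg.norm(reference - center, axis=1).argmin())]
        tags[name] = tags.get(name, 0.0) + share
    return tags
```

A slow procedure like this can play the role of the teacher: its output tags become training labels for the fast student network.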
13. Why Deep Learning and neural networks?
● We want multiple objectives all at once
○ General category prediction
○ Other visual attributes
● We want a solution that is fast enough in production
● Deep Learning doesn't require manual feature crafting to learn complex concepts
● Teacher (any slow process): provides correct labels
● Student (a neural network):
○ reproduces the labels
○ does other tasks in parallel
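One way to picture a student that handles several tasks at once is a shared feature layer with one output head per task. The sketch below is an assumption-laden toy in NumPy, not the network from the talk: the class name, layer sizes, and the choice of a multi-label color head plus a single-label category head are illustrative, and the input `x` stands in for features that a real CNN would compute from the image.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


class MultiTaskStudent:
    """Toy student: one shared layer, two task-specific heads (hypothetical)."""

    def __init__(self, in_dim, hidden, n_colors, n_categories):
        self.w_shared = rng.normal(0, 0.1, (in_dim, hidden))
        self.w_color = rng.normal(0, 0.1, (hidden, n_colors))
        self.w_cat = rng.normal(0, 0.1, (hidden, n_categories))

    def forward(self, x):
        features = np.maximum(x @ self.w_shared, 0.0)   # shared ReLU features
        color_probs = sigmoid(features @ self.w_color)  # multi-label: several colors may apply
        cat_probs = softmax(features @ self.w_cat)      # single-label: one category per item
        return features, color_probs, cat_probs


def multitask_loss(color_probs, cat_probs, teacher_colors, teacher_cats, eps=1e-9):
    """Sum of binary cross-entropy (colors) and cross-entropy (category).

    teacher_colors: (N, n_colors) 0/1 array of teacher-provided color tags.
    teacher_cats:   (N,) integer array of teacher-provided category ids.
    """
    bce = -np.mean(
        teacher_colors * np.log(color_probs + eps)
        + (1 - teacher_colors) * np.log(1 - color_probs + eps)
    )
    rows = np.arange(len(teacher_cats))
    ce = -np.mean(np.log(cat_probs[rows, teacher_cats] + eps))
    return float(bce + ce)
```

Training would minimize `multitask_loss` against the teacher's labels; the optimizer and any task weighting are left out of this sketch.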
14. Neural network color tagging (illustration)
iATC_Deep-mISF: A Multi-Label Classifier for Predicting the Classes of Anatomical Therapeutic
Chemicals by Deep Learning
20. We encode images as vectors
● We train the neural network on multiple tasks
● We extract the network features to encode images as vectors
● Similar images get similar high-dimensional vectors
t-SNE visualization of CNN codes
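Once every image is a vector, finding visually similar items reduces to a nearest-neighbor search. The sketch below uses cosine similarity, which is a common choice but an assumption here; the talk does not say which similarity measure is used, and the function name is hypothetical.

```python
import numpy as np


def most_similar(query, catalog, top_k=3):
    """Return (indices, scores) of the top_k catalog vectors closest to query.

    query:   (D,) feature vector of the query image.
    catalog: (N, D) feature vectors of the indexed images.
    """
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    scores = c @ q                        # cosine similarity per catalog item
    order = np.argsort(-scores)[:top_k]   # highest similarity first
    return order, scores[order]
```

For large catalogs, an approximate nearest-neighbor index would typically replace this brute-force scan.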
23. Metric learning to encode pixels
● Color pixels can be compared in the CIELAB color space by how close they are perceptually
● Colors that are close in the CIELAB space get encoded into close vectors (embeddings)
Feature Fusion for Image Retrieval With Adaptive Bitrate Allocation and Hard Negative Mining
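The talk does not name the training loss, so the following is only one plausible formulation: penalize the gap between the distance of two embeddings and the CIELAB distance of the colors they represent. The function names are hypothetical, and the colors are assumed to be given already as CIELAB triples.

```python
import numpy as np


def pairwise_dist(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def distance_matching_loss(embeddings, lab_colors):
    """Mean squared gap between embedding distances and CIELAB distances.

    embeddings: (N, D) vectors produced by the model for N colors.
    lab_colors: (N, 3) CIELAB coordinates of the same N colors.
    """
    d_emb = pairwise_dist(embeddings)
    d_lab = pairwise_dist(lab_colors)
    upper = np.triu_indices(len(lab_colors), k=1)  # count each pair once
    return float(np.mean((d_emb[upper] - d_lab[upper]) ** 2))
```

The loss is zero when the embedding preserves all CIELAB distances exactly. Triplet or contrastive losses are common alternatives for the same goal.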
24. CIELAB space
● Pixels close in the RGB space are not necessarily perceptually similar
● Pixels close in the CIELAB space are perceptually similar
● We encode the CIELAB relations into color embeddings
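To make the RGB versus CIELAB comparison concrete, here is a small sketch of the standard sRGB to CIELAB conversion (D65 white point) together with the CIE76 color difference, which is simply the Euclidean distance in CIELAB. The function names are hypothetical; libraries such as scikit-image provide equivalent conversions.

```python
import numpy as np

# Linear sRGB -> XYZ matrix and D65 reference white.
_RGB_TO_XYZ = np.array([
    [0.4124564, 0.3575761, 0.1804375],
    [0.2126729, 0.7151522, 0.0721750],
    [0.0193339, 0.1191920, 0.9503041],
])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])


def srgb_to_lab(rgb):
    """Convert sRGB values in 0-255 (shape (..., 3)) to CIELAB."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma curve to get linear light.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    lightness = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([lightness, a, b], axis=-1)


def delta_e76(rgb1, rgb2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))
```

Comparing, for example, `delta_e76((0, 0, 200), (0, 0, 255))` with `delta_e76((0, 200, 0), (0, 255, 0))` shows that two pairs equally far apart in RGB generally differ in CIELAB distance.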
25. How to encode entire images into color vectors?
● There are many pixels in an image
● Encoding all of them separately would give a very long vector
● We don't want to encode background pixels
● It works well to learn a proxy task first:
○ image color tagging
○ we take network features related to color tagging to create image vectors
● We can also do image-level metric learning for color
26. Metric learning to encode images
● We cluster pixels in an image to form image color signatures
● We use the Earth Mover's Distance to compare two color signatures
● Metric learning can encode this relationship into image-level color vectors
Approximation Techniques for Indexing the Earth Movers Distance in Multimedia Databases
The Earth Mover’s Distance as a Metric for Image Retrieval
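As a rough illustration of comparing two color signatures, the Earth Mover's Distance can be written as a small transport problem and solved with `scipy.optimize.linprog`. This sketch assumes each signature is a list of cluster colors (for example in CIELAB) with weights, that the weights are normalized so both signatures carry the same total mass, and that the ground distance is Euclidean; the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def earth_movers_distance(colors_a, weights_a, colors_b, weights_b):
    """EMD between two color signatures with normalized weights.

    colors_*:  (n, 3) and (m, 3) cluster centers.
    weights_*: (n,) and (m,) cluster sizes; each is rescaled to sum to 1.
    """
    colors_a = np.asarray(colors_a, dtype=float)
    colors_b = np.asarray(colors_b, dtype=float)
    wa = np.asarray(weights_a, dtype=float)
    wb = np.asarray(weights_b, dtype=float)
    wa, wb = wa / wa.sum(), wb / wb.sum()
    n, m = len(wa), len(wb)

    # Ground distance between every cluster of A and every cluster of B.
    cost = np.linalg.norm(colors_a[:, None, :] - colors_b[None, :, :], axis=2)

    # Flow variables f[i, j] are flattened row-major: index i * m + j.
    a_eq = np.zeros((n + m, n * m))
    for i in range(n):
        a_eq[i, i * m:(i + 1) * m] = 1.0   # cluster i of A ships out wa[i]
    for j in range(m):
        a_eq[n + j, j::m] = 1.0            # cluster j of B receives wb[j]
    b_eq = np.concatenate([wa, wb])

    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return float(result.fun)
```

Solving a linear program per image pair is slow, which is one motivation for learning image-level vectors whose distances approximate the EMD.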
28. Color Tagging
● teacher process to label the data
● ignore background
● neural network student
● multiple other parallel tasks
Color Encoding for Visual Similarity
● vectors to represent color
● pixel-level color vectors
● image-level color vectors
● color tagging as a proxy task
● metric learning