Typical approaches to solving classification problems require collecting a dataset for each new class and retraining the model. Metric learning lets you train a model once and then add new classes easily with 5-10 reference images.
We'll talk about metric learning based on YouScan's experience: the task, the data, different losses and approaches, the metrics we used, pitfalls and peculiarities, and the things that worked and didn't.
1. Yulia Honcharenko, "Application of metric learning for logo recognition"
2. Yulia Honcharenko
Data Scientist, @yuliok
YouScan provides real-time monitoring and analytics of brand mentions on social networks, blogs, forums, review sites and online news.
3.
4. Agenda
1. Logo recognition in YouScan: old and new approach
2. Data
3. How can we measure performance: NMI, Recall@n, F1-score
4. Baseline
5. Metric learning: Triplet loss, Proxy-NCA loss, SoftTriple
6. Small tips & tricks
7. Synthetic data
8. Results
6. Problems
• We need ~200 labelled images for each new class
• We need to retrain two models to add every new class
• F1-score and mAP decrease as new classes are added
• Minimum time to add a new logo is ~3 days
7. Solution: a two-step approach
• Detect all potential logos
• Match each potential logo to the existing logo "standards" from our base
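The second step above can be sketched as a nearest-standard lookup in embedding space. This is a minimal illustration, not the production code: the function names (`cosine`, `match_to_standards`) and the threshold value are my own assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_to_standards(crop_emb, standards, threshold=0.5):
    """Return the best-matching brand for a detected crop, or None if no
    standard is close enough. `standards` maps brand -> reference embeddings."""
    best_brand, best_sim = None, threshold
    for brand, refs in standards.items():
        for ref in refs:
            sim = cosine(crop_emb, ref)
            if sim > best_sim:
                best_brand, best_sim = brand, sim
    return best_brand
```

Adding a new brand then means adding 5-10 reference embeddings to `standards`; no retraining is involved.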
11. Data
• Public logo datasets + our own data
• 973 train classes, 43 test classes
• Min count of images per class: 3
12. Metrics: F1-score of the end-to-end approach by IoU threshold
Example with IoU = 0.7 (a detection counts as a true positive only if IoU >= threshold):
• +1 false positive for Apple
• +1 true positive for Instagram (IoU >= threshold)
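For reference, the IoU (Intersection over Union) used to decide true vs false positives can be computed as follows; this is a standard formulation, with the `(x1, y1, x2, y2)` box convention assumed by me.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```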
15. Baseline: learning to fine-tune
Take the weight matrix W = [w1, w2, ..., wc] as an example, where each class has a d-dimensional weight vector. In the training stage, for an input feature f(xi) we compute its cosine similarity to each weight vector and obtain the similarity scores [si,1, si,2, ..., si,c] for all classes, where

si,j = f(xi)^T wj / (||f(xi)|| ||wj||)

We can then obtain the prediction probability for each class by normalizing these similarity scores with a softmax function. Here, the classifier makes a prediction based on the cosine distance between the input feature and the learned weight vectors representing each class.
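The baseline classifier described above can be sketched in a few lines; this is an illustration of the cosine-similarity-plus-softmax idea, not the actual model (function names are mine).

```python
import math

def cosine_softmax_probs(feature, weights):
    """Class probabilities: softmax over cosine similarities between an
    input feature f(x_i) and per-class weight vectors w_1..w_C."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    scores = [cos(feature, w) for w in weights]
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]
```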
16. Distance metric learning
Distance metric learning (or simply, metric learning) aims at automatically constructing task-specific distance metrics from supervised data, in a machine learning manner.
19. Metrics: NMI of test-set k-means clusters
• Divide our test set into N (= number of classes) clusters by k-means
• Compute NMI (Normalized Mutual Information) between the cluster labels and the ground-truth labels
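NMI between the k-means assignment and the ground truth can be computed from scratch as below (in practice a library such as scikit-learn's `normalized_mutual_info_score` would be used; this pure-Python sketch uses the sqrt-normalized variant, which is one of several common normalizations).

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two label assignments,
    normalized by sqrt(H(A) * H(B))."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    # Mutual information I(A; B)
    mi = sum((nab / n) * math.log(n * nab / (ca[a] * cb[b]))
             for (a, b), nab in cab.items())
    def entropy(counts):
        return -sum((m / n) * math.log(m / n) for m in counts.values())
    ha, hb = entropy(ca), entropy(cb)
    if ha == 0.0 or hb == 0.0:
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)
```

NMI is 1 when clusters match the classes up to a relabeling, and 0 when they are independent.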
21. Triplet loss
The triplet loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity:

L_triplet(xa, xp, xn) = max(0, m + ||f(xa) − f(xp)||2^2 − ||f(xa) − f(xn)||2^2)
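On precomputed embeddings, the formula above is straightforward to evaluate; the sketch below (my own minimal version, with a margin value chosen for illustration) mirrors it term by term.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, m + ||f(xa) - f(xp)||^2 - ||f(xa) - f(xn)||^2)."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin m.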
22. Proxies
Intuitively, we would like P to approximate the set of all data points, i.e. for each x there is one element in P which is close to x w.r.t. the distance metric d. We call such an element a proxy for x:

p(x) = argmin_{p in P} d(x, p)

Proxy approximation error: the maximum distance from any x to its proxy, max_x d(x, p(x))
23. NCA loss
The NCA (Neighbourhood Components Analysis) loss tries to make x closer to y than to any element in a set Z, using exponential weighting:

L_NCA(x, y, Z) = −log( exp(−d(x, y)) / Σ_{z in Z} exp(−d(x, z)) )
24. Proxy-NCA loss
Just use proxies instead of the raw elements. So the algorithm does the following steps: sample a triplet, form the proxy triplet for the sample, calculate the loss:

l = −log( exp(−d(x, p(y))) / Σ_{p(z) in p(Z)} exp(−d(x, p(z))) )
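A direct transcription of this loss, assuming Euclidean distance and following the slide's convention that the denominator sums over the negative proxies p(Z) only (function names are mine):

```python
import math

def euclid(u, v):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def proxy_nca_loss(x, pos_proxy, neg_proxies):
    """Proxy-NCA: -log( exp(-d(x, p(y))) / sum_{p(z) in p(Z)} exp(-d(x, p(z))) )."""
    num = math.exp(-euclid(x, pos_proxy))
    den = sum(math.exp(-euclid(x, p)) for p in neg_proxies)
    return -math.log(num / den)
```

Pulling x toward its class proxy and away from the other proxies lowers the loss, which is what makes the gradient cheap: only the proxies, not all samples, appear in the sum.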
25. SoftTriple
In the conventional SoftMax loss, each class has a representative center in the last fully connected layer. Examples in the same class will be collapsed to the same center, which may be inappropriate for real-world data as illustrated. In contrast, the SoftTriple loss keeps multiple centers (e.g., 2 centers per class in this example) in the fully connected layer, and each image is assigned to one of them. This is more flexible for modeling intra-class variance in real-world data sets.
26. SoftTriple
Now we assume that each class has K centers. Then the similarity between an example xi and a class c can be defined as

S_{i,c} = max_k (xi^T w_c^k)
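The per-class similarity above is just a max of dot products over that class's K centers; a one-function sketch (my own naming):

```python
def class_similarity(x, centers):
    """S_{i,c} = max_k x^T w_c^k: similarity of example x to a class
    represented by a list of K center vectors."""
    return max(sum(a * b for a, b in zip(x, w)) for w in centers)
```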
27. HardTriple
The HardTriple loss improves the SoftMax loss by providing multiple centers for each class. However, it requires the max operator to obtain the nearest center in each class, while this operator is not smooth and the assignment can be sensitive between multiple centers. Inspired by the SoftMax loss, we can improve the robustness by smoothing the max operator.
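One common way to smooth the max, used in this spirit by SoftTriple, is a softmax-weighted average over the centers; the sketch below is my own minimal version, with the temperature name `gamma` and its default chosen for illustration.

```python
import math

def smoothed_class_similarity(x, centers, gamma=0.1):
    """Smoothed max over a class's centers: sum_k q_k * (x^T w_c^k),
    where q_k = softmax_k((x^T w_c^k) / gamma). As gamma -> 0 this
    approaches the hard max used by HardTriple."""
    dots = [sum(a * b for a, b in zip(x, w)) for w in centers]
    exps = [math.exp(d / gamma) for d in dots]
    z = sum(exps)
    return sum((e / z) * d for e, d in zip(exps, dots))
```

Unlike the hard max, this expression is differentiable everywhere and spreads gradient across nearby centers instead of assigning each example to exactly one.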
28. SoftTriple
Compared with the SoftMax loss, we first increase the dimension of the FC layer to include multiple centers for each class (e.g., 2 centers per class in this example). Then we obtain the similarity for each class by different operators. Finally, the distribution over classes is computed from the similarity obtained for each class.
32. Validation problems
• The validation set was made from corrected old-approach predictions, so it is biased and contains a lot of false positives
• The validation set has no images without a logo, so recall of the new approach is definitely higher on production data, but not on validation data
35. Class "other"
• False positives from the first iterations
• Unseen data: faces, eyes and other non-logo things that come out of our detector as false positives
• ~23k images
36. Class "other": problems
• Every new class brings new "other" samples
• 5-shot learning becomes 50-shot learning
Solution:
• Add the "other" class to the train set
44. Small things that helped
• Remove small images from the dataset
• Add more classes
• Add +5 pixels on each side of the detector prediction
Augmentations:
• Add different blurs to the augmentations
• Randomly add a random number of pixels on a random side
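The two box-padding tricks above (a fixed +5-pixel margin at inference, random per-side padding as augmentation) can be sketched as follows; the function names and the `(x1, y1, x2, y2)` convention are my own assumptions.

```python
import random

def expand_box(box, pad, img_w, img_h):
    """Expand a detector box (x1, y1, x2, y2) by `pad` pixels on each
    side, clipped to the image bounds."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - pad), max(0, y1 - pad),
            min(img_w, x2 + pad), min(img_h, y2 + pad))

def random_expand_box(box, max_pad, img_w, img_h, rng=random):
    """Augmentation: pad each side independently by 0..max_pad pixels."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - rng.randint(0, max_pad)),
            max(0, y1 - rng.randint(0, max_pad)),
            min(img_w, x2 + rng.randint(0, max_pad)),
            min(img_h, y2 + rng.randint(0, max_pad)))
```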
45. Things that didn't help
• Spatial Transformer Network
• Any synthetic data (text/images)
49. Results
• We don't need 100-200 images labeled with bounding boxes anymore; we just need 5-10 crops, a.k.a. standards
• We don't need to retrain the detector and classifier; our models are universal and work with different logos
• It's easier to control things: we can add/delete standards when we see samples/logos our model can't deal with (earlier we had to add these samples to the train set and retrain the model)
50. Please be creative when you create a logo for your startup
Thank you for your attention!