Metric-learn,
a Scikit-learn compatible package
October 6, 2018
William de Vazelhes
wdevazelhes
william.de-vazelhes@inria.fr
1 / 48
About me:
William de Vazelhes
Engineer @Inria Lille, Magnet team, since 2017
work on metric-learn, with @bellet and @nvauquie.
Joint work with Inria Parietal team (scikit-learn developers), esp. @ogrisel,
@GaelVaroquaux, @agramfort
few contributions to scikit-learn
2 / 48
Summary
Introduction to Machine Learning with scikit-learn
 Introduction to Metric Learning
Presentation of the metric-learn package
3 / 48
Definition
Machine learning is a field of computer science that uses statistical
techniques to give computer systems the ability to "learn" (e.g.,
progressively improve performance on a specific task) with data,
without being explicitly programmed. -- Wikipedia
5 / 48
Applications
6 / 48
scikit-learn: Machine Learning in Python
used by > 500,000 data scientists daily around the world
30k stars on GitHub
1000+ contributors
A lot of estimators
A lot of machine learning routines
Very detailed documentation
v0.20.0 released just a few days ago
7 / 48
Running example: Face Recognition
We have a dataset of labeled images:
'Smith' 'Cooper'
'Stevens' 'Smith'
'Stevens'
...: ...
We want to classify a new image:
? → 'Cooper'
9 / 48
Load dataset from scikit-learn
Input data: 400 greyscale images of 64 x 64 → 400 samples of 4096 features
each
(400, 4096) (400,)
[[0.30991736 0.3677686 0.41735536 ... 0.15289256 0.16115703 0.1570248 ]
[0.45454547 0.47107437 0.5123967 ... 0.15289256 0.15289256 0.15289256]
...
[0.21487603 0.21900827 0.21900827 ... 0.57438016 0.59090906 0.60330576]
[0.5165289 0.46280992 0.28099173 ... 0.35950413 0.3553719 0.38429752]]
['Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Hart' 'Mcmahon' 'Mcmahon' '
'Mcmahon' 'Mcmahon' 'Mcmahon' 'Mcmahon' 'Mcmahon' 'Mcmahon' ... 'Mccarty' 'Mccarty' 'Rivers'
'Rivers' 'Rivers' 'Rivers' 'Rivers' 'Rivers']
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
dataset = fetch_olivetti_faces()
names = np.array(['Hart', 'Mcmahon', 'Cain', 'Mahoney', 'Long', 'Green', 'Vega', 'H
X, y = dataset.data, names[dataset.target]
print(X.shape, y.shape)
print(X)
print(y)
10 / 48
Split between train/test
Train set: to train the ML algorithm
Test set: to simulate some unseen data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
(300, 4096) (300,)
(100, 4096) (100,)
11 / 48
Train the classifier
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)
12 / 48
Predict/score on new samples
clf.predict(X_test)
array(['Villa', 'Benitez', 'Benson', 'Petersen', 'Acosta', 'Pace',
'Christian', 'Perkins', 'Green', 'Keller', 'Mahoney', 'Benson',
...
'Benitez', 'Gilmore',
'Hurst', 'Mcmahon', 'Keller', 'Vega', 'Hart', 'Porter'],
dtype='<U11')
clf.score(X_test, y_test)
0.91
13 / 48
Select hyperparameters...
Create validation set for evaluating the models
0.96
0.9733333333333334
clf_1 = LogisticRegression(C=0.1)
clf_2 = LogisticRegression(C=1)
X_train_bis, X_validation, y_train_bis, y_validation = train_test_split(X_train,
for clf in [clf_1, clf_2]:
    clf.fit(X_train_bis, y_train_bis)
    print(clf.score(X_validation, y_validation))
14 / 48
... which is easy with GridSearchCV
from sklearn.model_selection import GridSearchCV
clf = LogisticRegression()
grid = {'C': [0.1, 1, 5], 'penalty': ['l1', 'l2']}
clf = GridSearchCV(clf, grid)
clf.fit(X_train, y_train)
print(clf.best_params_)
print(clf.best_score_)
{'C': 5, 'penalty': 'l2'}
0.9633333333333334
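A self-contained sketch of the same grid search, run on synthetic data so that it needs no download. It assumes scikit-learn is installed; the dataset size, the `max_iter` and `cv` settings and the reduced grid are illustrative choices, not the ones used in the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the face dataset (assumption: 300 samples, 20 features).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# GridSearchCV refits the estimator for every value of C on several
# train/validation folds, then keeps the best-scoring setting.
grid = {'C': [0.1, 1, 5]}
search = GridSearchCV(LogisticRegression(max_iter=1000), grid, cv=3)
search.fit(X_train, y_train)

best_C = search.best_params_['C']
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(best_C, search.best_score_, test_accuracy)
```

The test set is only touched once, after the search, so it still simulates unseen data.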
15 / 48
Summary
Introduction to Machine Learning with scikit-learn
Introduction to Metric Learning
Presentation of the metric-learn package
16 / 48
Face matching for access authorization
Many people in an organisation, but only a few pictures of each
Incoming picture: does it match a member?
We also have a huge database of unlabeled images of many people (from
a faces database)
Mechanical Turk workers labeled pairs of images as "same person"/"different persons"
(labeling images directly is hard)
https://www.facefirst.com/wp-content/uploads/2018/04/Screen-Shot-2018-04-26-at-4.12.56-PM.png
17 / 48
Learn a good metric
Learn a metric $d$ that puts similar points closer together and dissimilar points
further apart
18 / 48
Applications of Metric Learning
https://proxy.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.computerhope.com%2Fjargon%2Ff%2Fface-id-truedepth-camera.jpg&f=1.jpg https://rrc.ru/upload/splunk/splunk-workshop/Discovery%20Day%20Russia%20-%20Machine%20Learning.pdf https://i2.wp.com/www.touahria.com/wp-
19 / 48
Loading pairs of images
Dataset: Pairs of similar points and dissimilar points
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_pairs
dataset = fetch_lfw_pairs()
pairs = dataset.pairs
y = 2 * dataset.target - 1  # map {0, 1} targets to {-1, +1}
# show the two images of the first pair side by side
for i in range(2):
    plt.subplot(1, 2, i+1)
    plt.imshow(pairs[0, i, :, :], cmap='Greys_r')
print(y[0])
1
20 / 48
Loading pairs of images
pairs = pairs.reshape(pairs.shape[0], 2, -1)
print(pairs)
print(y)
[[[ 73.666664 70.666664 81.666664 ... 152. 159.66667 155. ]
[ 66. 74.333336 84.333336 ... 225.66667 229.66667 233.33333 ]]
[[ 86.333336 113.333336 133.33333 ... 157.66667 87.333336 49.666668]
[109. 92.666664 114.333336 ... 106. 114.333336 122.333336]]
[[ 37.333332 35.333332 34. ... 192.33333 197. 198. ]
[ 24. 28.333334 32. ... 51.333332 52.333332 52. ]]
...
[[ 73. 94.333336 121.333336 ... 226.66667 229. 227.66667 ]
[ 23. 20.333334 21.333334 ... 64. 71. 82.333336]]
[[119. 110.333336 112.666664 ... 244.33333 239.66667 230.33333 ]
[106.333336 94.333336 88.333336 ... 145.33333 130. 102.333336]]
[[ 23.333334 20. 23.333334 ... 190.33333 187.66667 174.66667 ]
[ 34.666668 44.666668 70. ... 146.33333 151. 159. ]]]
[ 1 1 1 ... -1 -1 -1]
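The reshape and the label mapping can be checked on a small array. This is a minimal sketch with random data standing in for `dataset.pairs`; the image size (4 x 5) and the number of pairs are arbitrary assumptions.

```python
import numpy as np

# Toy stand-in for dataset.pairs: 3 pairs of 4x5 "images" (assumed sizes).
rng = np.random.RandomState(0)
image_pairs = rng.rand(3, 2, 4, 5)
target = np.array([1, 0, 1])  # assumed convention: 1 = same person, 0 = different

# Flatten each image into a feature vector: (n_pairs, 2, height * width).
pairs = image_pairs.reshape(image_pairs.shape[0], 2, -1)
# Map {0, 1} targets to {-1, +1} labels.
y = 2 * target - 1

print(pairs.shape)
print(y)
```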
21 / 48
Split between train and test
pairs_train, pairs_test, y_train, y_test = train_test_split(pairs, y)
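Illustration (toy data): each row is one pair of points with its label; the rows are divided between a train part and a test part.
[[3.2, 6.8, 9.1], [2.5, 1.8, 2.5]]    1
[[3.1, 6.7, 1.8], [3.2, 6.8, 9.1]]   -1
[[3.5, 4.9, 1.0], [8.5, 7.2, 9.0]]    1
[[4.5, 9.0, 4.2], [3.8, 6.4, 2.6]]    1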
22 / 48
How do you learn on this data?
Example: Mahalanobis Metric for Clustering (MMC)
Parameters to learn: a transformation matrix $L$
That transforms $x_i$ into a new representation $L x_i$
Associated metric: $\|L x_i - L x_j\|$, the Euclidean distance in the new space
Problem to solve ($S$: similar pairs, $D$: dissimilar pairs):
$$\min_L \sum_{(x_i, x_j) \in S} \|L x_i - L x_j\|^2 \quad \text{s.t.} \quad \sum_{(x_i, x_j) \in D} \|L x_i - L x_j\| \geq 1$$
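To make the problem concrete, the objective and the constraint can be evaluated for a given matrix with NumPy. This sketch only evaluates the two quantities and does not solve the optimization; the function names and the toy points are illustrative assumptions, not part of metric-learn.

```python
import numpy as np

def mmc_objective(L, X, similar):
    """Sum of squared distances ||L x_i - L x_j||^2 over similar pairs."""
    diffs = X[similar[:, 0]] - X[similar[:, 1]]
    return np.sum((diffs @ L.T) ** 2)

def mmc_constraint(L, X, dissimilar):
    """Sum of distances ||L x_i - L x_j|| over dissimilar pairs (must be >= 1)."""
    diffs = X[dissimilar[:, 0]] - X[dissimilar[:, 1]]
    return np.sum(np.linalg.norm(diffs @ L.T, axis=1))

X = np.array([[0., 0.], [0., 1.], [2., 0.]])
similar = np.array([[0, 2]])     # x_0 and x_2 should end up close
dissimilar = np.array([[0, 1]])  # x_0 and x_1 should stay far apart

identity = np.eye(2)
# Hypothetical candidate: shrink the first axis, keep the second.
L = np.diag([0.1, 1.0])

obj_identity = mmc_objective(identity, X, similar)
obj_L = mmc_objective(L, X, similar)
con_identity = mmc_constraint(identity, X, dissimilar)
con_L = mmc_constraint(L, X, dissimilar)
print(obj_identity, con_identity)
print(obj_L, con_L)
```

Shrinking the axis along which the similar pair differs lowers the objective while the constraint on the dissimilar pair still holds.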
23 / 48
What can you do with this learned metric?
KNN classification: find the nearest neighbors of some $x_i$ w.r.t. the
learned metric
Clustering: use the learned metric to cluster together similar samples
...
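A minimal sketch of nearest-neighbor search under a learned linear metric, using NumPy only. The matrix `L` is a hand-picked, hypothetical "learned" transformation, and `nearest_neighbors` is an illustrative helper, not a metric-learn or scikit-learn function.

```python
import numpy as np

def nearest_neighbors(L, X_train, x_query, k=1):
    """Indices of the k training points closest to x_query under ||L a - L b||."""
    distances = np.linalg.norm((X_train - x_query) @ L.T, axis=1)
    return np.argsort(distances)[:k]

X_train = np.array([[0., 1.], [2., 0.]])
y_train = np.array(['Smith', 'Cooper'])
x_query = np.array([0., 0.])

euclidean = np.eye(2)
L = np.diag([0.1, 1.0])  # hypothetical learned transformation

label_euclidean = y_train[nearest_neighbors(euclidean, X_train, x_query)[0]]
label_learned = y_train[nearest_neighbors(L, X_train, x_query)[0]]
print(label_euclidean, label_learned)
```

The same query gets a different nearest neighbor once distances are measured in the transformed space.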
24 / 48
Summary
Introduction to Machine Learning with scikit-learn
 Introduction to Metric Learning
Presentation of the metric-learn package
25 / 48
Introduction
created by CJ Carey (@perimosocordiae) and Yuan Tang (@terrytangyuan)
472 stars on GitHub
9 algorithms
documentation
13 contributors:
perimosocordiae 4,601 ++ 3,211 --
terrytangyuan 1,268 ++ 218 --
bhargavvader 897 ++ 26 --
wdevazelhes 706 ++ 213 --
Callidior 635 ++ 38 --
svecon 458 ++ 143 --
dsquareindia 141 ++ 1 --
ab-anssi 102 ++ 38 --
anirudt 6 ++ 0 --
arikpoz 4 ++ 2 --
toto 3 ++ 3 --
shalan 1 ++ 1 --
michaelstewart 1 ++ 1 --
+ other contributions
26 / 48
Introduction
Metric-learn v0.4.0 was released 1 month ago
But it is not yet compatible with scikit-learn
Rest of the talk: about v0.5.0 (release in a few weeks)
27 / 48
Challenge: make it scikit-learn compatible
28 / 48
Sklearn compatibility
After loading and splitting we had pairs of images with labels 1, -1, 1, 1, divided into train and test.
Concretely represented by (toy data, one pair and its label per row):
[[3.2, 6.8, 9.1], [2.5, 1.8, 2.5]]    1
[[3.1, 6.7, 1.8], [3.2, 6.8, 9.1]]   -1
[[3.5, 4.9, 1.0], [8.5, 7.2, 9.0]]    1
[[4.5, 9.0, 4.2], [3.8, 6.4, 2.6]]    1
29 / 48
Sklearn compatibility
Scikit-learn routines work with this format !
from metric_learn import MMC
from sklearn.model_selection import GridSearchCV
grid = {'alpha': [0.1, 1, 10]}
mmc = MMC()
metric_learner = GridSearchCV(mmc, grid)
metric_learner.fit(pairs_train, y_train)
But this 3D array is highly redundant: a sample that appears in several pairs
is duplicated in each of them
31 / 48
Sklearn compatibility
Other solution: 2D arrays of indices
First argument of the metric learner is now an array of indices (2D, one pair of indices per row)
Also pass the X array when initializing the metric learner
Illustration (toy data): pairs of indices with their labels, divided into train and test
[0, 3]    1
[4, 0]   -1
[1, 5]    1
[6, 7]    1
and the X array they point into:
[[3.2, 6.8, 9.1],
 [3.5, 4.9, 1.0],
 [1.5, 2.9, 4.0],
 [2.5, 1.8, 2.5],
 [3.1, 6.7, 1.8],
 [8.5, 7.2, 9.0],
 [4.5, 9.0, 4.2],
 [3.8, 6.4, 2.6]]
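How index pairs map back to the 3D array of pairs can be sketched with NumPy fancy indexing, using the toy values from this slide. This only illustrates the data layout; it is not how metric-learn's preprocessor is implemented.

```python
import numpy as np

# The 8 samples and the 4 index pairs from the slide.
X = np.array([[3.2, 6.8, 9.1], [3.5, 4.9, 1.0], [1.5, 2.9, 4.0],
              [2.5, 1.8, 2.5], [3.1, 6.7, 1.8], [8.5, 7.2, 9.0],
              [4.5, 9.0, 4.2], [3.8, 6.4, 2.6]])
pairs_indices = np.array([[0, 3], [4, 0], [1, 5], [6, 7]])

# Fancy indexing rebuilds the 3D array of pairs: (n_pairs, 2, n_features).
pairs = X[pairs_indices]
print(pairs.shape)
print(pairs[1])  # rows X[4] and X[0]
```

Sample 0 is stored once in `X` but appears in two pairs; with thousands of features per sample, storing two integers per pair is much cheaper than storing two full feature vectors.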
32 / 48
Sklearn compatibility
Other solution: 2D arrays of indices
from metric_learn import MMC
from sklearn.model_selection import GridSearchCV
grid = {'alpha': [0.1, 1, 10]}
mmc = MMC(preprocessor=data)
metric_learner = GridSearchCV(mmc, grid)
metric_learner.fit(pairs_train_indices, y_train)
33 / 48
Sklearn compatibility
Other solution: 2D arrays of indices
Other example of accepted data:
path_pairs_train = [['img_1.png', 'img_2.png'], ['img_2.png', 'img_4.png'], ['img_2
root = '~/images'
# ImgLoader is assumed to be a user-defined callable that loads images from paths
itml = ITML(preprocessor=ImgLoader(root))
itml.fit(path_pairs_train, y_train)
34 / 48
Sklearn compatibility
Note
Pairs will be formed batch-wise from indices inside the algorithm:
def fit(self, indices, y):
    # pseudocode: d, yield_batches, some_computation and preprocessor are placeholders
    weights_update = np.zeros((d, d))
    for batch_indices in yield_batches(indices):
        weights_update += some_computation(preprocessor(batch_indices))
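A runnable sketch of the batch-wise idea, assuming the per-batch computation is a sum of outer products of pair differences (an illustrative choice, not necessarily what any metric-learn algorithm computes). `yield_batches` and `sum_of_outer_products` are hypothetical helpers.

```python
import numpy as np

def yield_batches(indices, batch_size):
    """Hypothetical helper: yield consecutive slices of the index pairs."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

def sum_of_outer_products(X, indices, batch_size=2):
    """Accumulate sum_k (x_i - x_j)(x_i - x_j)^T one batch at a time."""
    d = X.shape[1]
    total = np.zeros((d, d))
    for batch_indices in yield_batches(indices, batch_size):
        batch_pairs = X[batch_indices]  # (batch, 2, d), formed on the fly
        diffs = batch_pairs[:, 0] - batch_pairs[:, 1]
        total += diffs.T @ diffs
    return total

rng = np.random.RandomState(0)
X = rng.rand(8, 3)
indices = np.array([[0, 3], [4, 0], [1, 5], [6, 7], [2, 6]])

batched = sum_of_outer_products(X, indices, batch_size=2)
# Reference: the same quantity computed from all pairs at once.
all_diffs = X[indices[:, 0]] - X[indices[:, 1]]
full = all_diffs.T @ all_diffs
print(np.allclose(batched, full))
```

Only one batch of pairs exists in memory at any time, which is the point of passing indices rather than the full 3D array.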
35 / 48
 Package Overview
36 / 48
Algorithms
Fully Supervised:
classification: NCA, LMNN, LFDA, Covariance
regression: MLKR
Weakly Supervised:
pairs: MMC, ITML, SDML
quadruplets: LSML
Every pair- or quadruplet-based algorithm comes with a *_Supervised version
that creates pairs/quadruplets from labeled data on the fly
37 / 48
Quadruplets based algorithms
"A is more similar to B than C is to D"
less supervision: relative similarity judgments (you do not explicitly "force" some
similarities to be small or large)
notion of ordering between pairwise similarities
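A quadruplet constraint only compares two distances. The sketch below assumes quadruplets are stored as an array of shape (n_quadruplets, 4, n_features) and that "more similar" means a smaller distance under `L`; `predict_quadruplets` is an illustrative helper, not the package's API.

```python
import numpy as np

def predict_quadruplets(L, quadruplets):
    """+1 where d(A, B) < d(C, D) under ||L a - L b||, -1 otherwise."""
    A, B, C, D = (quadruplets[:, k] for k in range(4))
    d_ab = np.linalg.norm((A - B) @ L.T, axis=1)
    d_cd = np.linalg.norm((C - D) @ L.T, axis=1)
    return np.where(d_ab < d_cd, 1, -1)

# Two quadruplets of 2D points.
quadruplets = np.array([
    [[0., 0.], [0., 1.], [0., 0.], [3., 0.]],  # d(A, B) = 1 < d(C, D) = 3
    [[0., 0.], [2., 0.], [0., 0.], [0., 1.]],  # d(A, B) = 2 > d(C, D) = 1
])
predictions = predict_quadruplets(np.eye(2), quadruplets)
print(predictions)
```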
38 / 48
Weakly Supervised Learners
Scoring pairs/quadruplets based algorithms
for all metric learners (even supervised ones):
score_pairs: returns a similarity score
for pairs learners:
predict: +1 or -1 according to similar or not (uses threshold)
benefit from accuracy, roc_auc, from scikit-learn
for quadruplets learners:
predict +1 if A is more similar to B than C is to D, -1 otherwise
benefit from accuracy, roc_auc, from scikit-learn
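For pairs, prediction with a threshold can be sketched as follows. The rule used here (distance below a threshold means similar) and the threshold value are assumptions for illustration; the slides do not say how the threshold is chosen.

```python
import numpy as np

def pair_distances(L, pairs):
    """Distance ||L x_i - L x_j|| for each pair in a (n_pairs, 2, d) array."""
    return np.linalg.norm((pairs[:, 0] - pairs[:, 1]) @ L.T, axis=1)

def predict_pairs(L, pairs, threshold):
    """+1 (similar) when the distance is below the threshold, else -1."""
    return np.where(pair_distances(L, pairs) < threshold, 1, -1)

pairs = np.array([
    [[0., 0.], [0.5, 0.]],  # distance 0.5
    [[0., 0.], [0., 3.]],   # distance 3
    [[1., 1.], [1., 2.]],   # distance 1
])
y_true = np.array([1, -1, 1])

y_pred = predict_pairs(np.eye(2), pairs, threshold=2.0)
accuracy = np.mean(y_pred == y_true)
print(y_pred, accuracy)
```

Because the predictions are plain +1/-1 labels, standard classification metrics apply to them directly.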
40 / 48
Mahalanobis metric learning (cf. MMC before)
For now: all algorithms define a Euclidean distance in an embedding space
that is obtained through a linear transformation $L$:
metric: $\|L x_i - L x_j\|$
All have the transform method
They can do dimensionality reduction
mmc.fit(pairs_train, y_train)
mmc.transform(X_test)
# result is an array of shape (X_test.shape[0], dim_output)
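The link between the linear transformation and a Mahalanobis distance can be verified numerically. In this sketch `L` is random rather than learned, and the shapes (5 input features, 2 output dimensions) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
L = rng.randn(2, 5)  # maps 5 input features to 2 output dimensions
M = L.T @ L          # corresponding Mahalanobis matrix, shape (5, 5)
x_i, x_j = rng.randn(5), rng.randn(5)

# Euclidean distance between the transformed points ...
d_embedded = np.linalg.norm(L @ x_i - L @ x_j)
# ... equals the Mahalanobis distance sqrt((x_i - x_j)^T M (x_i - x_j)).
diff = x_i - x_j
d_mahalanobis = np.sqrt(diff @ M @ diff)

X_test = rng.randn(10, 5)
X_embedded = X_test @ L.T  # a linear transform of the samples: shape (10, 2)
print(d_embedded, d_mahalanobis, X_embedded.shape)
```

A non-square `L` with fewer rows than columns is what gives dimensionality reduction.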
42 / 48
Testing and Continuous Integration
def test_fit_mmc():
???
We do not know in advance what we want to test
But hopefully:
We know some properties of objects we work with
testing the gradient: compare it with a finite-difference approximation
(scipy.optimize.check_grad)
test that a transformation is indeed linear: f(ax+by) = a f(x) + b f(y)
...
We can use toy examples
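Both property tests can be sketched in a few lines, assuming SciPy is installed. The loss below (a sum of squared transformed differences) is a stand-in chosen because its gradient is easy to write down; it is not the loss of a specific metric-learn algorithm.

```python
import numpy as np
from scipy.optimize import check_grad

rng = np.random.RandomState(0)
diffs = rng.randn(6, 3)  # stand-ins for the differences x_i - x_j of similar pairs

def loss(flat_L):
    """Sum of squared distances ||L (x_i - x_j)||^2, with L passed flattened."""
    L = flat_L.reshape(3, 3)
    return np.sum((diffs @ L.T) ** 2)

def grad(flat_L):
    """Analytical gradient of the loss: 2 L sum_k diff_k diff_k^T."""
    L = flat_L.reshape(3, 3)
    return (2 * L @ diffs.T @ diffs).ravel()

# check_grad returns the norm of (analytical - finite-difference) gradients.
gradient_error = check_grad(loss, grad, rng.randn(9))

# Linearity: f(a x + b y) == a f(x) + b f(y) for f(x) = L x.
L = rng.randn(3, 3)
x, y, a, b = rng.randn(3), rng.randn(3), 2.0, -0.5
is_linear = np.allclose(L @ (a * x + b * y), a * (L @ x) + b * (L @ y))
print(gradient_error, is_linear)
```

Neither check needs to know the expected learned metric in advance, only a property it must satisfy.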
43 / 48
Designing toy examples
Simple example that exhibits a property that you can test:
Ex: 3 points in 2D (not collinear), with $x_0$ and $x_1$ close but they shouldn't be, and $x_0$ and $x_2$
far apart but they shouldn't be
import numpy as np
from metric_learn import MMC

def test_mmc_toy_example():
    data = np.array([[0, 0], [0, 1], [2, 0]])
    pairs = np.array([[0, 1], [0, 2]])  # indices into data
    y = np.array([-1, 1])  # x_0, x_1 dissimilar; x_0, x_2 similar
    mmc = MMC(preprocessor=data)
    mmc.fit(pairs, y)
    data_transformed = mmc.transform(data)
    # after learning, x_1 should be further from x_0 than x_2 is
    assert (np.linalg.norm(data_transformed[1] - data_transformed[0]) >
            np.linalg.norm(data_transformed[2] - data_transformed[0]))
44 / 48
Recap: v0.5.0 (in a few weeks)
scikit-learn compatibility (cross-validation, GridSearchCV...)
"Preprocessor" to reduce memory consumption
Next steps
submit to sklearn-contrib
stochastic optimizers for scaling up
more choice to form pairs/quadruplets from labeled data
general functions like regularizers etc
more testing
more documentation, incl. examples
...
45 / 48
Conclusion
Metric learning: learn similarities from weakly supervised information
Many use cases
open source package metric-learn
v0.5.0: compatibility with scikit-learn
46 / 48
Check it out !
open source
raise issues
submit PRs
any contribution is welcome !
47 / 48
Questions ?
Contact
william.de-vazelhes@inria.fr
48 / 48
