5. Background: Low-Rank Metric Learning and Existing Methods
If M is low-rank, the Mahalanobis distance is effectively defined in a low-dimensional space.
Rank minimization is NP-hard [2] ⇒ use a tractable approximation instead.
Trace Norm regularization
▶ Minimizes the sum of all singular values of M:

  Reg(M) = ∑_s σ_s(M).

▶ Drawback: a change in one large singular value affects the whole penalty.
Fantope regularization
▶ Minimizes the sum of the k smallest singular values of M:

  Reg(M) = ∑_{s=1}^{k} σ_s(M),  with σ_1 ≤ ⋯ ≤ σ_d.

▶ Drawback: sensitive to the hyper-parameter k.
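To make the two surrogates concrete, here is a minimal NumPy sketch (the helper names are mine, not the paper's) that computes both regularizers from a singular value decomposition:

```python
import numpy as np

def trace_norm(M):
    """Trace norm: sum of all singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def fantope_reg(M, k):
    """Fantope regularizer: sum of the k smallest singular values of M."""
    sigma = np.linalg.svd(M, compute_uv=False)  # returned in descending order
    return sigma[-k:].sum()
```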
6. Proposed Method
The proposed method uses Capped Trace Norm regularization.
Capped Trace Norm regularization
▶ Only penalizes the singular values that are smaller than ϵ:

  Reg(M) = ∑_s min(σ_s(M), ϵ).

▶ Large singular values are capped at ϵ, so changing them has no effect on the penalty.
▶ More stable with respect to its hyper-parameter than Fantope regularization.
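A minimal sketch of the capped regularizer, with toy values (mine, not the paper's) illustrating why a large singular value can change without affecting the penalty:

```python
import numpy as np

def capped_trace_norm(M, eps):
    """Capped trace norm: singular values are clipped at eps before summing."""
    sigma = np.linalg.svd(M, compute_uv=False)
    return np.minimum(sigma, eps).sum()

# A singular value already above eps can grow arbitrarily without changing
# the penalty: both calls below return 1.0 + 1.0 + 0.2 + 0.1 = 2.3.
print(capped_trace_norm(np.diag([5.0, 3.0, 0.2, 0.1]), eps=1.0))
print(capped_trace_norm(np.diag([50.0, 3.0, 0.2, 0.1]), eps=1.0))
```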
7. Proposed Method
Optimization Problem:
  min_{M ∈ S^d_+} ∑_{(i,j,k,l)∈A} [ δ_{ijkl} + ⟨M, x_{ij} x_{ij}^⊤ − x_{kl} x_{kl}^⊤⟩ ]_+ + (γ/2) ∑_s min(σ_s(M), ϵ)

The first term measures the degree of violation of the quadruplet-wise constraints; the second term is the regularization. Here x_{ij} = x_i − x_j and

  A = { (i, j, k, l) : d_M(x_k, x_l) ≥ d_M(x_i, x_j) + δ_{ijkl} }.
⇒ This objective is non-convex (because of the min(·, ϵ) in the regularization term).
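As a sanity check on the formulation, here is a sketch that evaluates this objective directly; it assumes x_{ij} = x_i − x_j and, for simplicity, one shared margin δ instead of per-quadruplet δ_{ijkl} (the function name is mine):

```python
import numpy as np

def capped_objective(M, X, quads, delta, gamma, eps):
    """Hinge loss over quadruplet constraints plus the capped trace norm.
    X: (n, d) data matrix; quads: iterable of index tuples (i, j, k, l)."""
    loss = 0.0
    for i, j, k, l in quads:
        xij, xkl = X[i] - X[j], X[k] - X[l]
        # <M, xij xij^T - xkl xkl^T> = d_M(x_i, x_j) - d_M(x_k, x_l)
        loss += max(delta + xij @ M @ xij - xkl @ M @ xkl, 0.0)  # hinge [.]_+
    sigma = np.linalg.svd(M, compute_uv=False)
    return loss + 0.5 * gamma * np.minimum(sigma, eps).sum()
```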
8. Proposed Method: Algorithm
Singular value decomposition of M:
  M = U Σ U^⊤ = ∑_s σ_s u_s u_s^⊤,

where the u_s are the columns of U and Σ = diag(σ_1, …, σ_d).
Define D:
  D = (1/2) ∑_{s=1}^{k} σ_s^{-1} u_s u_s^⊤,

where k is the number of singular values smaller than ϵ (so the sum runs over the k smallest singular values).
With D fixed, the problem is transformed into the following convex optimization:
  min_{M ∈ S^d_+} ∑_{(i,j,k,l)∈A} [ δ_{ijkl} + ⟨M, x_{ij} x_{ij}^⊤ − x_{kl} x_{kl}^⊤⟩ ]_+ + (γ/2) Tr(M^⊤ D M),

where D is held fixed.
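A sketch of the reweighting matrix; `build_D` is a hypothetical name, and an eigendecomposition is used since M is symmetric PSD (its eigenvalues equal its singular values):

```python
import numpy as np

def build_D(M, k):
    """D = (1/2) * sum_{s=1}^{k} sigma_s^{-1} u_s u_s^T over the k smallest
    singular values of M (assumed strictly positive here; a real
    implementation would guard against division by zero)."""
    sigma, U = np.linalg.eigh(M)        # ascending eigenvalues for symmetric M
    Uk, sk = U[:, :k], sigma[:k]        # k smallest values and their vectors
    return 0.5 * (Uk / sk) @ Uk.T

# Why this works: at the current M, Tr(M^T D M) = (1/2) * sum of the k
# smallest singular values, and the gradient (gamma/2)(DM + MD) equals
# (gamma/2) * sum_{s<=k} u_s u_s^T, a subgradient of the capped term.
```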
9. Proposed Method: Algorithm
Proximal Gradient Descent is used to solve the convex subproblem.

Key Points:
▶ They prove the convergence of their optimization algorithm.
▶ k is a hyper-parameter.
▶ ϵ is adaptively determined.
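The slide gives no pseudocode, so the following is only a rough stand-in: a projected-subgradient loop over the reweighted convex surrogate (reusing the hypothetical build_D from the previous sketch), rather than the authors' exact proximal gradient solver:

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone S^d_+."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.maximum(w, 0.0)) @ V.T

def fit_metric(X, quads, delta, gamma, k, lr=1e-3, n_outer=20, n_inner=10):
    """Alternate between (a) rebuilding D from the current spectrum and
    (b) projected (sub)gradient steps on the convex surrogate."""
    d = X.shape[1]
    M = np.eye(d)
    for _ in range(n_outer):
        D = build_D(M, k)                          # reweighting step
        for _ in range(n_inner):
            G = gamma * 0.5 * (D @ M + M @ D)      # grad of (gamma/2) Tr(M D M)
            for i, j, p, q in quads:               # (p, q) play the role of (k, l)
                xij, xpq = X[i] - X[j], X[p] - X[q]
                if delta + xij @ M @ xij - xpq @ M @ xpq > 0:  # violated
                    G += np.outer(xij, xij) - np.outer(xpq, xpq)
            M = project_psd(M - lr * G)            # gradient step + PSD projection
    return M
```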
10. Experiment: Synthetic Data
Data:
1. Generate T ∈ S^d_+ with rank(T) = e.
2. Generate quadruplet-wise constraints that are satisfied by the Mahalanobis distance induced by T, and split them into training set A, validation set V, and test set T (see the generation sketch below).
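The slide does not spell out the generation recipe; a plausible sketch, assuming Gaussian data points, T = B B^⊤ for a random B, and a fixed margin δ (the sample size n is my choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d, e, n, delta = 100, 10, 500, 1.0

B = rng.standard_normal((d, e))
T = B @ B.T                                # rank-e PSD ground-truth metric

X = rng.standard_normal((n, d))
quads = []
while len(quads) < 10_000:                 # one split of 10^4 constraints
    i, j, k, l = rng.integers(0, n, size=4)
    xij, xkl = X[i] - X[j], X[k] - X[l]
    if xkl @ T @ xkl >= xij @ T @ xij + delta:   # satisfied under T
        quads.append((i, j, k, l))
```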
Setting:
▶ d = 100
▶ e = 10
▶ |A| = |V| = |T| = 10^4
▶ γ is tuned in {10^-2, 10^-1, 1, 10, 10^2}
▶ k is tuned from 5 to 20
Compared Methods:
▶ ML: No regularization
▶ ML+Trace: Trace Norm regularization
▶ ML+Fantope: Fantope regularization
▶ ML+capped: Proposed Method
11. Experiment: Synthetic Data
Accuracy and rank(M):

Method        Accuracy  rank(M)
ML            85.62%    53
ML + Trace    88.44%    41
ML + Fantope  95.50%    10
ML + capped   95.43%    10

Table 1: Synthetic experiment results.
⇒ Fantope regularization and Capped Trace Norm regularization both recover the true rank (e = 10) and outperform the other methods.
12. Experiment: Synthetic Data
Accuracy as the hyper-parameter k changes:

[Figure: accuracy (%) vs. rank k (6 to 20) for ML+Trace, ML+Fantope, and the proposed method; accuracy spans roughly 88–96%.]
⇒ The proposed method outperforms Fantope regularization for most values of k.
⇒ The proposed method is more stable with respect to k than Fantope regularization.
13. Experiment: Labeled Faces in the Wild
Task: decide whether two face images show the same person.
Data:
▶ 13,233 images of 5,749 people.
▶ SIFT features are used.
Setting:
▶ Use pairwise constraints.
▶ γ is tuned in {10^-2, 10^-1, 1, 10, 10^2}
▶ k is tuned in {30, 35, 40, 45, 50, 55, 60, 65, 70}
Compared Methods:
▶ IDENTITY: Euclidean Distance
▶ MAHALANOBIS: Traditional Mahalanobis Distance
▶ KISSME: [3]
▶ ITML: [4]
▶ LDML: [5]
15. Experiment: Labeled Faces in the Wild
Accuracy as the hyper-parameter k changes:

[Figure: accuracy (%) vs. rank k (30 to 70) for ML+Trace, ML+Fantope, and the proposed method; accuracy spans roughly 79.5–82.5%.]
⇒ The proposed method achieves better results than metric learning with Fantope regularization.
16. Conclusion
They proposed a novel low-rank regularizer: Capped Trace Norm regularization.
They proposed an algorithm for the resulting optimization problem and proved its convergence.
Experimental results show that their method outperforms state-of-the-art metric learning methods.
17. References I
Zhouyuan Huo, Feiping Nie, and Heng Huang.
Robust and effective metric learning using capped trace norm: Metric learning via
capped trace norm.
In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 1605–1614. ACM, 2016.
Aurélien Bellet, Amaury Habrard, and Marc Sebban.
A survey on metric learning for feature vectors and structured data.
arXiv preprint arXiv:1306.6709, 2013.
Martin Koestinger, Martin Hirzer, Paul Wohlhart, Peter M Roth, and Horst
Bischof.
Large scale metric learning from equivalence constraints.
In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.
2288–2295. IEEE, 2012.
18. References II
Jason V Davis, Brian Kulis, Prateek Jain, Suvrit Sra, and Inderjit S Dhillon.
Information-theoretic metric learning.
In Proceedings of the 24th international conference on Machine learning, pp.
209–216. ACM, 2007.
Matthieu Guillaumin, Jakob Verbeek, and Cordelia Schmid.
Is that you? metric learning approaches for face identification.
In 2009 IEEE 12th International Conference on Computer Vision, pp. 498–505.
IEEE, 2009.