6. Key idea
• Privileged information allows us to distinguish
between easy and hard examples
– If the privileged data is easy to classify, then the
original data would also be easy to classify.
– … under the assumption that the privileged data is
similarly informative about the problem at hand.
7. Linear SVM
• Ordinary convergence rate = $O(N^{-1/2})$
• It improves to $O(N^{-1})$
– if we knew the optimal slack values $\xi_i$ in advance
(OracleSVM [Vapnik+ 2009])
$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}}$
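The minimization above is cut off after its variable list. For reference, the standard soft-margin linear SVM over the same variables is written out below. This is the textbook form and is only assumed to be what the slide showed; $C > 0$ (the regularization constant) and the labels $y_i \in \{-1, +1\}$ are notation introduced here, not taken from the slide.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}} \quad
  & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \\
\text{subject to} \quad
  & y_i\,(\langle w, x_i \rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
\end{aligned}
```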
8. Slack variables in SVM
• Slack variables tell us which training examples
are easy / hard to classify
– $\xi_i = 0$ → easy
– $\xi_i \gg 0$ → hard
$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}}$
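To make the easy / hard reading of the slack variables concrete, here is a minimal sketch that computes the hinge slack $\xi_i = \max(0, 1 - y_i(\langle w, x_i \rangle + b))$ for a fixed linear classifier. The function name `slack_values` and the toy data are made up for illustration; labels are assumed to be +1 / -1.

```python
def slack_values(w, b, X, y):
    """Return the hinge slack xi_i = max(0, 1 - y_i * (<w, x_i> + b)) per example."""
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        slacks.append(max(0.0, 1.0 - y_i * score))
    return slacks


# Toy data (hypothetical): a fixed classifier and four labelled points.
w, b = [1.0, 0.0], 0.0
X = [[2.0, 0.0], [0.5, 1.0], [-1.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
xi = slack_values(w, b, X, y)
```

In this toy example the first and third points lie outside the margin (slack 0, easy), the second lies inside the margin, and the fourth is misclassified and receives the largest slack (hard).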
9. SVM+
• The first model for LUPI (learning using privileged information)
– Use privileged data as a proxy to the oracle
– Parameterize $\xi_i = \langle w^*, x_i^* \rangle + b^*$
[Vapnik+ NN2009, NIPS2010]
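Substituting this parameterization into the soft-margin SVM gives an optimization problem of the following shape. This is a sketch of the SVM+ primal as it is commonly stated, not a transcription of the slide; the trade-off constants $\gamma > 0$ and $C > 0$ are notation assumed here.

```latex
\begin{aligned}
\min_{w,\, b,\, w^*,\, b^*} \quad
  & \frac{1}{2}\left(\lVert w \rVert^2 + \gamma \lVert w^* \rVert^2\right)
    + C \sum_{i=1}^{N} \left(\langle w^*, x_i^* \rangle + b^*\right) \\
\text{subject to} \quad
  & y_i\,(\langle w, x_i \rangle + b) \ge 1 - \left(\langle w^*, x_i^* \rangle + b^*\right), \\
  & \langle w^*, x_i^* \rangle + b^* \ge 0, \quad i = 1, \dots, N
\end{aligned}
```

The slack is no longer a free variable per example: it is tied to the privileged features $x_i^*$ through a second linear function, which is what lets the privileged data act as a proxy for the oracle.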
10. Why should SVM+ be improved?
• Cannot be solved by popular SVM packages
– Although good optimization algorithms were
derived [Pechyony+ 2011], they work only with the dual.
11. Learning to rank setup instead
• The underlying idea is the same
• Use the privileged data to identify easy /
hard-to-separate sample pairs
– Instead of using it to identify easy /
hard-to-classify samples
12. SVMrank
• Slack variables tell us which training example
pairs are easy / hard / impossible to separate
[Joachims KDD2002]
13. Proposed method: Rank transfer
• The strategy is similar to SVM+, but indirect.
1. SVMrank on $X^*$ (the ranking function $f^*$)
2. Margins $\rho_{ij} = f^*(x_i^*) - f^*(x_j^*)$ $\forall i, j : y_i > y_j$
• $\rho_{ij} \gg 0$: easy, $\rho_{ij} \approx 0$: hard, $\rho_{ij} < 0$: impossible
3. SVMrank on $X$ with data-dependent margins
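Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the privileged-space ranking scores are already available as a list, that the data-dependent margin enters the pairwise hinge loss as max(0, ρ_ij − ⟨w, x_i − x_j⟩), and that pairs with negative margins are simply skipped. The names `pairwise_margins` and `margin_rank_loss` and all numbers are hypothetical.

```python
def pairwise_margins(priv_scores, labels):
    """Margins rho_ij = f*(x_i*) - f*(x_j*) for every pair (i, j) with y_i > y_j."""
    margins = {}
    for i, y_i in enumerate(labels):
        for j, y_j in enumerate(labels):
            if y_i > y_j:
                margins[(i, j)] = priv_scores[i] - priv_scores[j]
    return margins


def margin_rank_loss(w, X, margins):
    """Sum of pairwise hinge losses max(0, rho_ij - <w, x_i - x_j>)."""
    total = 0.0
    for (i, j), rho in margins.items():
        if rho < 0:  # assumption: pairs misranked in the privileged space are skipped
            continue
        diff = sum(w_k * (a - b) for w_k, a, b in zip(w, X[i], X[j]))
        total += max(0.0, rho - diff)
    return total


# Hypothetical privileged-space scores f*(x_i*) and binary relevance labels.
priv_scores = [2.0, 0.5, 1.0, 0.0]
labels = [1, 1, 0, 0]
margins = pairwise_margins(priv_scores, labels)

# Hypothetical one-dimensional original features and a candidate weight vector.
X = [[1.0], [0.2], [0.0], [0.1]]
loss = margin_rank_loss([1.0], X, margins)
```

Here the pair (1, 2) gets a negative margin because the privileged ranker orders it incorrectly, so it contributes nothing to the loss on $X$.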
14. Intuition
• If it was difficult to correctly rank a pair on $X^*$,
it will also be difficult on $X$
1. Pairs $(i, j)$ with small margins $\rho_{ij}$ have more
limited influence on $w$
2. Incorrectly ranked pairs are ignored.
15. Why not Rank transfer?
• We can use standard SVM packages!
– For the SVMrank on $X^*$ this is clear.
– For the SVMrank on $X$ we need variable
transformations
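The slide does not spell out the transformation. One rescaling that would work is sketched below, assuming strictly positive margins $\rho_{ij} > 0$ and a ranking problem whose constraints have the margin-augmented form shown in the first line; $C$, $z_{ij}$, and $\tilde{\xi}_{ij}$ are notation introduced here.

```latex
\begin{aligned}
&\min_{w,\, \xi_{ij}} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{(i,j)} \xi_{ij}
  \quad \text{s.t.} \quad \langle w, x_i - x_j \rangle \ge \rho_{ij} - \xi_{ij}, \quad \xi_{ij} \ge 0 \\[4pt]
&\text{Divide each constraint by } \rho_{ij} > 0 \text{ and set }
  z_{ij} = \frac{x_i - x_j}{\rho_{ij}}, \quad \tilde{\xi}_{ij} = \frac{\xi_{ij}}{\rho_{ij}}: \\[4pt]
&\min_{w,\, \tilde{\xi}_{ij}} \ \frac{1}{2}\lVert w \rVert^2 + \sum_{(i,j)} \left(C \rho_{ij}\right) \tilde{\xi}_{ij}
  \quad \text{s.t.} \quad \langle w, z_{ij} \rangle \ge 1 - \tilde{\xi}_{ij}, \quad \tilde{\xi}_{ij} \ge 0
\end{aligned}
```

The second problem is an ordinary bias-free soft-margin SVM on the rescaled difference vectors $z_{ij}$, all with label +1 and per-example cost $C\rho_{ij}$, which is a form that standard packages supporting per-example weights can handle.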
16. Experiments
• 4 different types of privileged information
– All of those can be handled in a unified framework.
• 4 different methods to be compared
– SVM, SVMrank, SVM+, Rank transfer
• Evaluation metric = Average Precision
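For readers unfamiliar with the metric, here is a minimal sketch of non-interpolated average precision for a single binary task. The slides do not say which variant of AP was used, so this choice is an assumption; the function name and example numbers are illustrative.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of precision@k over the ranks k of the positives."""
    # Rank examples by decreasing score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    # Convention chosen here: AP is 0.0 when there are no positives.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Positives at ranks 1 and 3 give (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```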
17. (1) Attributes as privileged info
• Animals with Attributes Dataset
– 10 species ( = classes), 85 properties ( = attributes)
• Features: 2000-dim SURF
• Privileged: 85-dim predicted attributes
[Lampert+ PAMI2014]
• Learn 1-vs-1 classifiers with 100 training
samples
20. (2) Bounding box as privileged info
• Fine-grained setup on ILSVRC2012
– 17 classes covering a variety of snakes
• Features: 4096-dim Fisher vector from the
whole images
• Privileged: 4096-dim Fisher vector from the
bounding box regions
• Learn 1-vs-rest classifiers
21. (2) Results
• SVM+ is the best, ranking strategies do not
seem suitable for this setup.
22. (3) Texts as privileged info
• IsraelImages dataset [Bekkerman+ CVPR2007]
– 11 classes, 1800 images, each with a textual description
of up to 18 words
• Features: 4096-dim Fisher vectors
• Privileged: BoWs from the texts
• Learn 1-vs-1 classifiers
[Example images: Desert, Trees]
23. (3) Results
• Reference (privileged only) is the best
• All the other methods produce almost the same results.
– Note that high accuracy in the privileged space
does not necessarily mean that the privileged
information is helpful for the target task.
24. (4) Rationales as privileged info
• Hot or Not dataset [Donahue+ ICCV2011]
• Features: 500-dim densely sampled SIFT from
the whole image
• Privileged: 500-dim densely sampled SIFT
from the rationales
25. (4) Results
• Reference is the best.
• Rank transfer performs better for the male class.
• Hard to draw a conclusion.
28. Last words
• The idea is nice and easy to use.
• More privileged information, better
performance? --- needs discussion
• Which types of privileged information are
suitable? --- unknown