Transcript of "Neighborhood Component Analysis 20071108"

1. Neighbourhood Component Analysis (T.S. Yo)
2. References
3. Outline
   ● Introduction
   ● Learn the distance metric from data
   ● The size of K
   ● Procedure of NCA
   ● Experiments
   ● Discussions
4. Introduction (1/2)
   ● KNN
     – Simple and effective
     – Nonlinear decision surface
     – Non-parametric
     – Quality improves with more data
     – Only one parameter, K → easy to tune
5. Introduction (2/2)
   ● Drawbacks of KNN
     – Computationally expensive: the whole training set is searched at test time
     – How do we define the "distance" properly?
   ● Learn the distance metric from the data, and force it to be low rank.
6. Learn the Distance from Data (1/5)
   ● What is a good distance metric?
     – The one that minimizes (optimizes) the cost!
   ● Then, what is the cost?
     – The expected test error
     – Best estimated by the leave-one-out (LOO) cross-validation error on the training data
   Kohavi, Ron (1995). "A study of cross-validation and bootstrap for accuracy estimation and model selection". Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12): 1137-1143. Morgan Kaufmann, San Mateo.
7. Learn the Distance from Data (2/5)
   ● Modeling the LOO error:
     – Let pij be the probability that point xj is selected as point xi's neighbour.
     – The probability that points are correctly classified when xi is used as the reference is pi:
   ● Maximizing pi for all xi means minimizing the LOO error.
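   The sum on this slide appears only as an image; written out in the notation of Goldberger et al., it is

       p_i = \sum_{j \in C_i} p_{ij}, \qquad C_i = \{ j \mid c_j = c_i \},

   i.e. the probability mass that xi assigns to neighbours of its own class.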
8. Learn the Distance from Data (3/5)
   ● Then, how do we define pij?
     – According to the softmax of the distance dij
     – Relatively smoother than dij
   [Figure: plot of the softmax kernel exp(-x)]
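   The softmax itself is shown as an image on the slide; from the original paper it reads

       p_{ij} = \frac{\exp(-d_{ij})}{\sum_{k \neq i} \exp(-d_{ik})}, \qquad p_{ii} = 0.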
9. Learn the Distance from Data (4/5)
   ● How do we define dij?
   ● Limit the distance measure to the Mahalanobis (quadratic) distance.
   ● That is to say, we project the original feature vectors x into another vector space with a transformation matrix A.
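   Spelling out the quadratic form (the slide's equation is an image; this follows the original paper):

       d_{ij} = \lVert A x_i - A x_j \rVert^2 = (x_i - x_j)^{\top} A^{\top} A \,(x_i - x_j).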
10. Learn the Distance from Data (5/5)
   ● Substitute the dij in pij:
   ● Now we have the objective function:
   ● Maximize f(A) w.r.t. A → minimize the overall LOO error
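   The two equations referred to above are images on the slide; substituting dij into pij and summing gives, per the original paper,

       p_{ij} = \frac{\exp(-\lVert A x_i - A x_j \rVert^2)}{\sum_{k \neq i} \exp(-\lVert A x_i - A x_k \rVert^2)},
       \qquad
       f(A) = \sum_i \sum_{j \in C_i} p_{ij} = \sum_i p_i.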
11. The Size of k
   ● For the probability distribution pij:
   ● The perplexity can be used as an estimate of the number of neighbours to consider, k.
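   The slide does not reproduce the formula; assuming the standard definition of perplexity over the neighbour distribution pi, the estimate would be

       k_i \approx 2^{H(p_i)}, \qquad H(p_i) = -\sum_{j \neq i} p_{ij} \log_2 p_{ij}.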
12. Procedure of NCA (1/2)
   ● Use the objective function and its gradient to learn the transformation matrix A and K from the training data, Dtrain (with or without dimension reduction).
   ● Project the test data, Dtest, into the transformed space.
   ● Perform traditional KNN (with K and ADtrain) on the transformed test data, ADtest.
13. Procedure of NCA (2/2)
   ● Functions used for optimization
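   The functions on this slide are shown as an image. As an illustration only, here is a minimal NumPy/SciPy sketch of the NCA objective and the surrounding procedure. The names nca_objective and learn_nca, and the use of scipy.optimize and scikit-learn, are choices made for this sketch rather than the author's code, and the gradient is approximated numerically by L-BFGS-B instead of being supplied analytically as the slides suggest.

       import numpy as np
       from scipy.optimize import minimize

       def nca_objective(a_flat, X, y, dim):
           """Negative of f(A): the expected number of correctly classified points."""
           n, d = X.shape
           A = a_flat.reshape(dim, d)
           Z = X @ A.T                                   # project: z_i = A x_i
           sq_dists = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
           np.fill_diagonal(sq_dists, np.inf)            # enforce p_ii = 0
           logits = -sq_dists
           logits -= logits.max(axis=1, keepdims=True)   # numerical stability
           P = np.exp(logits)
           P /= P.sum(axis=1, keepdims=True)             # softmax rows: p_ij
           same_class = (y[:, None] == y[None, :])
           p_i = (P * same_class).sum(axis=1)            # p_i = sum_{j in C_i} p_ij
           return -p_i.sum()                             # minimize -f(A)

       def learn_nca(X, y, dim, seed=0):
           """Learn a dim x d transformation A by maximizing f(A)."""
           rng = np.random.default_rng(seed)
           a0 = rng.normal(scale=0.1, size=dim * X.shape[1])
           res = minimize(nca_objective, a0, args=(X, y, dim), method="L-BFGS-B")
           return res.x.reshape(dim, X.shape[1])

       # Usage sketch: learn A on D_train, then run ordinary KNN in the
       # transformed space (scikit-learn used here purely for brevity):
       #   from sklearn.neighbors import KNeighborsClassifier
       #   A = learn_nca(X_train, y_train, dim=2)
       #   knn = KNeighborsClassifier(n_neighbors=k).fit(X_train @ A.T, y_train)
       #   y_pred = knn.predict(X_test @ A.T)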
14. Experiments – Datasets (1/2)
   ● 4 from the UCI ML Repository, 2 self-made
15. Experiments – Datasets (2/2)
   n2d is a mixture of two bivariate normal distributions with different means and covariance matrices. ring consists of 2-d concentric rings plus 8 dimensions of uniform random noise.
16. Experiments – Results (1/4)
   Error rates of KNN and NCA with the same K. It is shown that NCA generally improves the performance of KNN.
17. Experiments – Results (2/4)
18. Experiments – Results (3/4)
   ● Compare with other classifiers
19. Experiments – Results (4/4)
   ● Rank 2 dimension reduction
20. Discussions (1/8)
   ● Rank 2 transformation for wine
21. Discussions (2/8)
   ● Rank 1 transformation for n2d
22. Discussions (3/8)
   ● Results of Goldberger et al. (40 realizations of 30%/70% splits)
23. Discussions (4/8)
   ● Results of Goldberger et al. (rank 2 transformation)
24. Discussions (5/8)
   ● The experimental results suggest that KNN classification can be improved with the distance metric learned by the NCA algorithm.
   ● NCA also outperforms traditional dimension reduction methods on several datasets.
25. Discussions (6/8)
   ● Compared to other classification methods (e.g. LDA and QDA), NCA usually does not give the best accuracy.
   ● Some odd behaviour in dimension reduction suggests that further investigation of the optimization algorithm is necessary.
26. Discussions (7/8)
   ● Optimizing a matrix
   ● Can we optimize these functions? (Michael L. Overton)
     – Globally, no. Related problems are NP-hard (Blondel-Tsitsiklis, Nemirovski).
     – Locally, yes.
       ● But not by standard methods for nonconvex, smooth optimization
       ● Steepest descent, BFGS or nonlinear conjugate gradient will typically jam because of nonsmoothness
27. Discussions (8/8)
   ● Other methods that learn a distance metric from data
     – Discriminant Common Vectors (DCV)
       ● Similar to NCA, DCV focuses on optimizing the distance metric under a certain objective function
     – Laplacianfaces (LAP)
       ● Puts more emphasis on dimension reduction
   J. Liu and S. Chen, "Discriminant Common Vectors versus Neighbourhood Components Analysis and Laplacianfaces: A comparative study in small sample size problem", Image and Vision Computing.
28. Questions?
29. Thank you!
30. Derive the Objective Function (1/5)
   ● From the assumptions, we have:
31. Derive the Objective Function (2/5)
32. Derive the Objective Function (3/5)
33. Derive the Objective Function (4/5)
34. Derive the Objective Function (5/5)
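   The derivation on slides 30 to 34 consists of equation images that are not captured in this transcript. For reference, the end result as stated in the original NCA paper (Goldberger et al.), with x_{ij} = x_i - x_j, is

       f(A) = \sum_i \sum_{j \in C_i} p_{ij},
       \qquad
       \frac{\partial f}{\partial A} = 2A \sum_i \Big( p_i \sum_k p_{ik}\, x_{ik} x_{ik}^{\top} - \sum_{j \in C_i} p_{ij}\, x_{ij} x_{ij}^{\top} \Big).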