Neighborhood Component Analysis 20071108
    Presentation Transcript

    • Neighbourhood Component Analysis (T.S. Yo)
    • References
    • Outline
      ● Introduction
      ● Learn the distance metric from data
      ● The size of K
      ● Procedure of NCA
      ● Experiments
      ● Discussions
    • Introduction (1/2)
      ● KNN
        – Simple and effective
        – Nonlinear decision surface
        – Non-parametric
        – Quality improves with more data
        – Only one parameter, K -> easy to tune
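    As a quick illustration of the baseline described above, a minimal KNN sketch using scikit-learn; the iris data, the 70/30 split, and K = 3 are illustrative choices, not taken from the slides:

      # Plain KNN: one parameter (K), no training beyond storing the data.
      from sklearn.datasets import load_iris
      from sklearn.model_selection import train_test_split
      from sklearn.neighbors import KNeighborsClassifier

      X, y = load_iris(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

      knn = KNeighborsClassifier(n_neighbors=3)   # K is the only parameter to tune
      knn.fit(X_train, y_train)                   # "training" just stores the data
      print("test accuracy:", knn.score(X_test, y_test))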
    • Introduction (2/2)
      ● Drawbacks of KNN
        – Computationally expensive: searches through the whole training data at test time
        – How to define the “distance” properly?
      ● Learn the distance metric from data, and force it to be low rank.
    • Learn the Distance from Data (1/5)
      ● What is a good distance metric?
        – The one that minimizes (optimizes) the cost!
      ● Then, what is the cost?
        – The expected test error
        – Best estimated with the leave-one-out (LOO) cross-validation error on the training data
      Kohavi, Ron (1995). "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection". Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 2(12): 1137–1143. Morgan Kaufmann, San Mateo.
    • Learn the Distance from Data (2/5)
      ● Modeling the LOO error:
        – Let p_ij be the probability that point x_j is selected as point x_i's neighbour.
        – The probability that x_i is correctly classified when it is used as the reference is
          p_i = Σ_{j ∈ C_i} p_ij, where C_i = {j | class(x_j) = class(x_i)}.
      ● Maximizing p_i for all x_i means minimizing the LOO error.
    • Learn the Distance from Data (3/5)
      ● Then, how do we define p_ij?
        – According to the softmax of the distance d_ij
        – Relatively smoother than d_ij itself
      [Figure: plot of exp(-x), illustrating the softmax weighting]
    • Learn the Distance from Data (4/5)
      ● How do we define d_ij?
      ● Limit the distance measure to the Mahalanobis (quadratic) distance.
      ● That is, project the original feature vectors x into another vector space with a q×D transformation matrix A:
        d_ij = ||A x_i − A x_j||²
    • Learn the Distance from Data (5/5)
      ● Substitute d_ij into p_ij:
        p_ij = exp(−||A x_i − A x_j||²) / Σ_{k≠i} exp(−||A x_i − A x_k||²),  p_ii = 0
      ● Now we have the objective function:
        f(A) = Σ_i Σ_{j ∈ C_i} p_ij = Σ_i p_i
      ● Maximizing f(A) w.r.t. A → minimizing the overall LOO error
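    A minimal numerical sketch of the objective above, following the formulas on the last three slides; the function name nca_objective and the dense pairwise-distance computation are my own illustrative choices:

      import numpy as np

      def nca_objective(A, X, y):
          # f(A) = sum_i sum_{j in C_i} p_ij: the expected number of points that
          # the stochastic nearest-neighbour rule classifies correctly.
          Z = X @ A.T                                                # project: z_i = A x_i
          d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(axis=2)  # ||A x_i - A x_j||^2
          np.fill_diagonal(d2, np.inf)                               # forces p_ii = 0
          e = np.exp(-d2)
          p = e / e.sum(axis=1, keepdims=True)                       # softmax over j != i
          same = (y[:, None] == y[None, :])                          # C_i: points with x_i's class
          return float((p * same).sum())                             # maximize this w.r.t. A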
    • The Size of K
      ● For the probability distribution p_i = (p_i1, ..., p_in), the perplexity
        exp(−Σ_j p_ij ln p_ij)
        can be used as an estimate of the number of neighbours to consider, K.
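    A small sketch of this estimate, assuming perplexity is taken as exp of the entropy of p_i; the function name effective_k is illustrative:

      import numpy as np

      def effective_k(p_i, eps=1e-12):
          # Perplexity of the neighbour distribution p_i = (p_i1, ..., p_in):
          # exp(H(p_i)) acts as the "effective number of neighbours" of point i.
          p = np.asarray(p_i, dtype=float)
          p = p[p > eps]                    # drop zero entries (e.g. p_ii = 0)
          H = -(p * np.log(p)).sum()        # entropy in nats
          return float(np.exp(H))

      print(effective_k([0.2] * 5))         # probability spread evenly over 5 neighbours -> 5.0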
    • Procedure of NCA (1/2)
      ● Use the objective function and its gradient to learn the transformation matrix A and K from the training data D_train (with or without dimension reduction).
      ● Project the test data D_test into the transformed space.
      ● Perform traditional KNN (with K and A·D_train) on the transformed test data A·D_test.
    • Procedure of NCA (2/2)
      ● Functions used for optimization
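    Putting the procedure of the last two slides together, a hedged sketch that reuses nca_objective from the earlier snippet; for brevity it lets scipy approximate the gradient numerically instead of using the analytic gradient the slide refers to, and the rank q and the initialization are illustrative:

      import numpy as np
      from scipy.optimize import minimize
      from sklearn.neighbors import KNeighborsClassifier

      def nca_knn(X_train, y_train, q, k, seed=0):
          # Learn a q x D transformation A by maximizing nca_objective (see the
          # sketch above), then fit KNN in the transformed space.
          n, D = X_train.shape
          rng = np.random.default_rng(seed)
          A0 = 0.01 * rng.standard_normal((q, D))       # small random start

          def neg_f(a_flat):                            # scipy minimizes, so negate f(A)
              return -nca_objective(a_flat.reshape(q, D), X_train, y_train)

          res = minimize(neg_f, A0.ravel(), method="L-BFGS-B")   # numerical gradient for brevity
          A = res.x.reshape(q, D)

          knn = KNeighborsClassifier(n_neighbors=k)
          knn.fit(X_train @ A.T, y_train)               # KNN on A * D_train
          return A, knn

      # At test time: project D_test with the learned A, then classify.
      # A, knn = nca_knn(X_train, y_train, q=2, k=3)
      # y_pred = knn.predict(X_test @ A.T)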
    • Experiments – Datasets (1/2)
      ● 4 from the UCI ML Repository, 2 self-made
    • Experiments – Datasets (2/2)
      n2d is a mixture of two bivariate normal distributions with different means and covariance matrices. ring consists of 2-d concentric rings and 8 dimensions of uniform random noise.
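    For concreteness, one way the two self-made datasets could be generated; the means, covariances, radii and sample sizes below are illustrative guesses, not the values used in the experiments:

      import numpy as np

      rng = np.random.default_rng(0)

      def make_n2d(n_per_class=200):
          # Mixture of two bivariate normals with different means and covariances.
          a = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], n_per_class)
          b = rng.multivariate_normal([2, 2], [[1.0, -0.3], [-0.3, 0.5]], n_per_class)
          return np.vstack([a, b]), np.repeat([0, 1], n_per_class)

      def make_ring(n_per_class=200, noise_dims=8):
          # Two concentric 2-d rings plus 8 dimensions of uniform random noise.
          theta = rng.uniform(0, 2 * np.pi, 2 * n_per_class)
          radius = np.repeat([1.0, 2.0], n_per_class)          # inner / outer ring
          ring = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
          noise = rng.uniform(-1, 1, (2 * n_per_class, noise_dims))
          return np.hstack([ring, noise]), np.repeat([0, 1], n_per_class)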
    • Experiments – Results (1/4)
      Error rates of KNN and NCA with the same K. The results show that NCA generally improves the performance of KNN.
    • Experiments – Results (2/4)
    • Experiments – Results (3/4)
      ● Comparison with other classifiers
    • Experiments – Results (4/4)
      ● Rank-2 dimension reduction
    • Discussions (1/8)
      ● Rank-2 transformation for wine
    • Discussions (2/8)
      ● Rank-1 transformation for n2d
    • Discussions (3/8)
      ● Results of Goldberger et al. (40 realizations of 30%/70% splits)
    • Discussions (4/8)
      ● Results of Goldberger et al. (rank-2 transformation)
    • Discussions (5/8)
      ● The experimental results suggest that KNN classification can be improved with the distance metric learned by the NCA algorithm.
      ● NCA also outperforms traditional dimension reduction methods on several datasets.
    • Discussions (6/8)
      ● Compared to other classification methods (e.g. LDA and QDA), NCA usually does not give the best accuracy.
      ● Some odd performance on dimension reduction suggests that further investigation of the optimization algorithm is necessary.
    • Discussions (7/8)
      ● Optimizing over a matrix
      ● Can we optimize these functions? (Michael L. Overton)
        – Globally, no. Related problems are NP-hard (Blondel-Tsitsiklis, Nemirovski).
        – Locally, yes.
          ● But not by standard methods for nonconvex, smooth optimization
          ● Steepest descent, BFGS, or nonlinear conjugate gradient will typically jam because of nonsmoothness
    • Discussions (8/8)
      ● Other methods that learn a distance metric from data
        – Discriminant Common Vectors (DCV)
          ● Similar to NCA, DCV focuses on optimizing the distance metric for certain objective functions
        – Laplacianfaces (LAP)
          ● Emphasizes dimension reduction more
      J. Liu and S. Chen, "Discriminant Common Vectors Versus Neighbourhood Components Analysis and Laplacianfaces: A Comparative Study in Small Sample Size Problem". Image and Vision Computing.
    • Questions?
    • Thank you!
    • Derive the Objective Function (1/5)
      ● From the assumptions, we have:
    • Derive the Objective Function (2/5)
    • Derive the Objective Function (3/5)
    • Derive the Objective Function (4/5)
    • Derive the Objective Function (5/5)
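    The derivation slides above are figures in the original deck; for reference, a sketch of the closed-form gradient that the derivation arrives at in Goldberger et al.'s NCA paper, written here in LaTeX with x_ij = x_i − x_j:

      % Gradient of the NCA objective f(A) = \sum_i \sum_{j \in C_i} p_{ij}
      \frac{\partial f}{\partial A}
        = 2A \sum_i \Bigl( p_i \sum_k p_{ik}\, x_{ik} x_{ik}^{\top}
          \;-\; \sum_{j \in C_i} p_{ij}\, x_{ij} x_{ij}^{\top} \Bigr)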