SVM Incremental Learning, Adaptation and Optimization - IJCNN 2003

Incremental Learning Workshop - Invited Paper


  1. SVM Incremental Learning, Adaptation and Optimization
     Chris Diehl, Applied Physics Laboratory, Johns Hopkins University, Chris.Diehl@jhuapl.edu
     Gert Cauwenberghs, ECE Department, Johns Hopkins University, gert@jhu.edu
     July 23, 2003, IJCNN – Incremental Learning Workshop
  2. Motivation
     • What is the objective of machine learning?
       – To identify a model from the data that generalizes well
     • When learning incrementally, the search for a good model involves repeatedly:
       – Selecting a hypothesis class (HC)
       – Searching the HC by minimizing an objective function over the model parameter space
       – Evaluating the resulting model's generalization performance
  3. Contributions
     • A unified framework for incremental learning and adaptation of SVM classifiers that supports:
       – Learning and unlearning of individual or multiple examples
       – Adaptation of the current SVM to changes in regularization and kernel parameters
       – Generalization performance assessment through exact and approximate leave-one-out error estimation
     • Generalizes the incremental learning algorithm presented in G. Cauwenberghs and T. Poggio, "Incremental and Decremental Support Vector Machine Learning," Advances in Neural Information Processing Systems (NIPS 2000), vol. 13, 2001.
  4. SVM Learning for Classification
     • Quadratic programming problem (dual form):

       \min_{0 \le \alpha_i \le C,\, b} W = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i Q_{ij} \alpha_j - \sum_{i=1}^{N} \alpha_i + b \sum_{i=1}^{N} y_i \alpha_i

       where Q_{ij} = y_i y_j K(x_i, x_j; \theta) and f(x) = \sum_{i=1}^{N} \alpha_i y_i K(x_i, x; \theta) + b

     • Model selection: repeatedly
       – Select (C, θ)
       – Solve the quadratic program
       – Perform cross-validation (approximate or exact)
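     A minimal sketch of this model-selection loop, using scikit-learn's batch solver as a stand-in for the quadratic program; the grid, the helper name select_model and the 5-fold cross-validation are illustrative assumptions, not part of the slides:

       # Sketch only: each (C, gamma) candidate is scored by re-solving the QP from
       # scratch; the incremental algorithm in the following slides avoids this.
       from sklearn.svm import SVC
       from sklearn.model_selection import cross_val_score

       def select_model(X, y, C_grid, gamma_grid):
           best = None
           for C in C_grid:                  # regularization parameter C
               for gamma in gamma_grid:      # kernel parameter (theta in the slides)
                   clf = SVC(C=C, kernel="rbf", gamma=gamma)
                   score = cross_val_score(clf, X, y, cv=5).mean()  # approximate CV
                   if best is None or score > best[0]:
                       best = (score, C, gamma)
           return best                       # (best score, best C, best gamma)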
  5. Karush-Kuhn-Tucker (KKT) Conditions

       g_i = \frac{\partial W}{\partial \alpha_i} = y_i f(x_i) - 1 \;\begin{cases} > 0 & \forall i \in R \\ = 0 & \forall i \in S \\ < 0 & \forall i \in E \end{cases}

       h = \frac{\partial W}{\partial b} = \sum_{j=1}^{N} y_j \alpha_j = 0

     • Reserve vectors R: α = 0
     • Margin vectors S: 0 ≤ α ≤ C
     • Error vectors E: α = C
     [Figure: training examples from classes y = ±1 shown relative to the margins f(x) = 1 and f(x) = −1, partitioned into reserve, margin and error vectors]
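     To make the R/S/E partition concrete, here is a hypothetical NumPy helper that categorizes examples from their coefficients α_i and margins g_i = y_i f(x_i) − 1; the function name and tolerance handling are assumptions, not the authors' code:

       import numpy as np

       def partition_kkt(alpha, g, C, tol=1e-8):
           """Split example indices into reserve (R), margin (S) and error (E) vectors."""
           R = np.where((alpha <= tol) & (g > tol))[0]        # g_i > 0, alpha_i = 0
           S = np.where(np.abs(g) <= tol)[0]                  # g_i = 0, 0 <= alpha_i <= C
           E = np.where((alpha >= C - tol) & (g < -tol))[0]   # g_i < 0, alpha_i = C
           return R, S, E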
  6. Incrementing Additional Examples into the Solution
     • Strategy: preserve the KKT conditions for the examples in R, S and E while incrementing the examples in U into the solution
     • Unlearned vectors U: 0 ≤ α < C

       f(x) = \sum_{i \in E} \alpha_i y_i K(x_i, x; \theta) + \sum_{j \in S} \alpha_j y_j K(x_j, x; \theta) + b

       f'(x) = \sum_{i \in E} \alpha_i y_i K(x_i, x; \theta) + \sum_{j \in S} (\alpha_j + \Delta\alpha_j) y_j K(x_j, x; \theta) + \sum_{k \in U} \Delta\alpha_k y_k K(x_k, x; \theta) + b + \Delta b

     [Figure: examples from classes y = ±1 relative to the margins f(x) = ±1, with the unlearned vectors U to be incremented into the solution]
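     The perturbed decision function f'(x) translates directly into code. The sketch below is an illustration under an assumed data layout (K_x holds the kernel values K(x_i, x; θ) for one test point x), not the authors' implementation:

       import numpy as np

       def f_perturbed(K_x, y, alpha, b, S, E, U, d_alpha_S, d_alpha_U, d_b):
           """Evaluate f'(x) after perturbing margin-vector coefficients by d_alpha_S,
           adding unlearned vectors with coefficients d_alpha_U, and shifting b by d_b."""
           f = np.sum(alpha[E] * y[E] * K_x[E]) + b                 # error-vector terms
           f += np.sum((alpha[S] + d_alpha_S) * y[S] * K_x[S])      # perturbed margin-vector terms
           f += np.sum(d_alpha_U * y[U] * K_x[U]) + d_b             # newly incremented terms + bias change
           return f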
  7. Part 1: Computing Coefficient and Margin Sensitivities
     • For small perturbations of the unlearned vector coefficients, the error and reserve vectors will not change category
     • Need to modify Δα_k ∀ k ∈ S and Δb so that the margin vectors remain on the margin
     • The transformation from the original SVM to the final SVM will be controlled via the perturbation parameter p
  8. Part 1: Computing Coefficient and Margin Sensitivities
     • The coefficient sensitivities β_k = Δα_k / Δp ∀ k ∈ S and β = Δb / Δp will be computed by enforcing the constraints γ_i = Δg_i / Δp = 0 ∀ i ∈ S and Δh / Δp = 0
     • The margin sensitivities γ_i ∀ i ∈ {E, R, U} can then be computed based on the coefficient sensitivities
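     For concreteness, a plausible form of the resulting linear system, following the Cauwenberghs and Poggio derivation; this reconstruction, including the assumed perturbation direction λ_U for the unlearned coefficients, is not reproduced from the slides:

       % Enforcing \Delta g_i = 0 for i \in S and \Delta h = 0 while the unlearned
       % coefficients move in direction \lambda_U per unit of p (assumed notation):
       \begin{bmatrix} 0 & \mathbf{y}_S^{\top} \\ \mathbf{y}_S & Q_{SS} \end{bmatrix}
       \begin{bmatrix} \beta \\ \boldsymbol{\beta}_S \end{bmatrix}
       = - \begin{bmatrix} \mathbf{y}_U^{\top} \\ Q_{SU} \end{bmatrix} \boldsymbol{\lambda}_U,
       \qquad
       \gamma_i = Q_{iS}\,\boldsymbol{\beta}_S + Q_{iU}\,\boldsymbol{\lambda}_U + y_i\,\beta,
       \quad i \in E \cup R \cup U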
  9. Part 2: Bookkeeping
     • Fundamental assumption: no examples change category (e.g. margin to error vector) during a perturbation
     • Perturbation process (see the sketch after this list):
       – Using the coefficient and margin sensitivities, compute the smallest Δp that leads to a category change
         · Margin vectors: track changes in α
         · Error, reserve, unlearned vectors: track changes in g
       – Update the example categories and classifier parameters
       – Recompute the coefficient and margin sensitivities and repeat the process until p = 1
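     A minimal sketch of this bookkeeping loop; compute_sensitivities, candidate_events, apply_step and update_categories are hypothetical helpers standing in for the linear solve on slide 8 and the category tracking, so this shows only the control flow, not the authors' code:

       def perturb_to_solution(state):
           """Advance the perturbation parameter p from 0 to 1 in the largest steps
           that keep every example in its current category (R, S, E or U)."""
           p = 0.0
           while p < 1.0:
               beta, gamma = compute_sensitivities(state)   # coefficient / margin sensitivities
               dp = 1.0 - p                                 # largest admissible step
               for event in candidate_events(state, beta, gamma):
                   dp = min(dp, event.step)                 # smallest step causing a category change
               apply_step(state, beta, gamma, dp)           # update alpha, b and the margins g
               update_categories(state)                     # move examples between R, S, E and U
               p += dp
           return state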
  10. Decremental Learning and Cross-Validation
     • The perturbation process is fully reversible!
     • This makes it possible to unlearn individual or multiple examples in order to perform exact leave-one-out (LOO) or k-fold cross-validation (see the sketch below)
     • The LOO approximation based on Vapnik and Chapelle's span rule is easily computed using the margin sensitivities
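     A sketch of exact LOO estimation built on the reversible perturbation process; decrement, increment and predict are hypothetical routines for unlearning, relearning and evaluating one example:

       def loo_error(state, X, y):
           """Exact leave-one-out error via unlearn / test / relearn."""
           errors = 0
           for i in range(len(y)):
               removed = decrement(state, i)        # unlearn example i (fully reversible)
               if predict(state, X[i]) != y[i]:     # test on the held-out example
                   errors += 1
               increment(state, removed)            # relearn example i
           return errors / len(y)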
  11. Regularization and Kernel Parameter Perturbation Strategies
     • Regularization parameter perturbation
       – Involves incrementing/decrementing error vector coefficients
       – Bookkeeping changes slightly since the regularization parameters change during a perturbation of the SVM
     • Kernel parameter perturbation
       – First modify the kernel parameter, then correct violations of the KKT conditions
       – Becomes increasingly expensive as the number of margin vectors in the original SVM increases
