Linear Classification

  1. Introduction to Statistical Machine Learning
     © 2011 Christfried Webers
     Statistical Machine Learning Group
     NICTA and College of Engineering and Computer Science, The Australian National University
     Canberra, February – June 2011
     (Many figures from C. M. Bishop, "Pattern Recognition and Machine Learning")

  2. Part VII: Linear Classification 1
     Outline: Classification; Generalised Linear Model; Inference and Decision;
     Discriminant Functions; Fisher's Linear Discriminant; The Perceptron Algorithm.

  3. Classification
     Goal: given input data x, assign it to one of K discrete classes C_k, where
     k = 1, ..., K. Equivalently, divide the input space into different regions.
     [Figure: length of the petal [in cm] for a given sepal [cm] for iris flowers
     (Iris Setosa, Iris Versicolor, Iris Virginica).]

  4. How to represent binary class labels?
     Class labels are no longer real values, as in regression, but a discrete set.
     Two classes: t ∈ {0, 1}, where t = 1 represents class C1 and t = 0 represents
     class C2. The value of t can be interpreted as the probability of class C1,
     with only the two values 0 and 1 possible.
     Note: other conventions to map classes into integers are possible; check the setup.

  5. How to represent multi-class labels?
     If there are more than two classes (K > 2), we call it a multi-class setup.
     Often used: the 1-of-K coding scheme, in which t is a vector of length K with
     all values 0 except for t_j = 1, where j is the index of the class C_j to encode.
     Example: given 5 classes {C1, ..., C5}, membership in class C2 is encoded as
     the target vector
         t = (0, 1, 0, 0, 0)^T.
     Note: other conventions to map multiple classes into integers are possible;
     check the setup.

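     A minimal NumPy sketch of this coding scheme (the helper name one_hot and the
     zero-based class index are illustrative assumptions, not from the slides):

         import numpy as np

         def one_hot(j, K):
             # 1-of-K target vector with a single 1 at (zero-based) position j.
             t = np.zeros(K)
             t[j] = 1.0
             return t

         # Membership in class C2 among the 5 classes from the example:
         print(one_hot(1, 5))  # [0. 1. 0. 0. 0.]
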
  6. Linear Model
     Idea: use again a linear model, as in regression, where y(x, w) is a linear
     function of the parameters w:
         y(x_n, w) = w^T φ(x_n).
     But in general y(x_n, w) ∈ R. Example: which class is y(x, w) = 0.71623?

  7. Generalised Linear Model
     Apply a mapping f : R → Z to the linear model to get discrete class labels:
         y(x_n, w) = f(w^T φ(x_n)).
     f(·) is called the activation function, and f^{-1}(·) the link function.
     [Figure: example of an activation function, f(z) = sign(z).]

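     A small sketch of such a model with f(z) = sign(z); the feature map
     φ(x) = (1, x) and all numbers below are illustrative assumptions:

         import numpy as np

         def phi(x):
             # Assumed basis: phi(x) = (1, x), a bias feature plus the raw input.
             return np.array([1.0, x])

         def glm_predict(x, w, f=np.sign):
             # Generalised linear model y(x, w) = f(w^T phi(x)).
             return f(w @ phi(x))

         # The real value 0.71623 from the previous slide now maps to a class:
         w = np.array([0.2, 0.5])
         print(glm_predict(1.03246, w))  # sign(0.71623) = 1.0, i.e. class C1
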
  8. Three Models for Decision Problems
     In increasing order of complexity:
     Discriminant functions
       Find a discriminant function f(x) which maps each input directly onto a
       class label.
     Discriminative models
       1. Solve the inference problem of determining the posterior class
          probabilities p(C_k | x).
       2. Use decision theory to assign each new x to one of the classes.
     Generative models
       1. Solve the inference problem of determining the class-conditional
          probabilities p(x | C_k).
       2. Also infer the prior class probabilities p(C_k).
       3. Use Bayes' theorem to find the posterior p(C_k | x).
       4. Alternatively, model the joint distribution p(x, C_k) directly.
       5. Use decision theory to assign each new x to one of the classes.

  9. Two Classes
     Definition: a discriminant is a function that maps from an input vector x to
     one of K classes, denoted by C_k.
     Consider first two classes (K = 2). Construct a linear function of the inputs x,
         y(x) = w^T x + w_0,
     such that x is assigned to class C1 if y(x) ≥ 0, and to class C2 otherwise.
     Here w is the weight vector and w_0 the bias (sometimes −w_0 is called the
     threshold).

  10. Two Classes
     The decision boundary y(x) = 0 is a (D − 1)-dimensional hyperplane in the
     D-dimensional input space (the decision surface).
     w is orthogonal to any vector lying in the decision surface.
     Proof: assume x_A and x_B are two points lying in the decision surface. Then
         0 = y(x_A) − y(x_B) = w^T (x_A − x_B).

  11. Two Classes
     For any point x on the decision surface, the normal distance from the origin
     to the surface is
         w^T x / ||w|| = −w_0 / ||w||.
     [Figure: regions R1 (y > 0) and R2 (y < 0) separated by the surface y = 0,
     with w normal to the surface; a point x decomposes into x_⊥ on the surface
     plus an offset y(x)/||w|| along w/||w||.]

  12. Two Classes
     The value of y(x) gives a signed measure of the perpendicular distance r of
     the point x from the decision surface, r = y(x)/||w||. Writing
     x = x_⊥ + r w/||w|| with x_⊥ on the surface,
         y(x) = w^T (x_⊥ + r w/||w||) + w_0
              = w^T x_⊥ + w_0 + r (w^T w)/||w||
              = r ||w||.
     [Figure: same geometry as on the previous slide.]

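     The signed distance is easy to compute directly; a short sketch with made-up
     numbers:

         import numpy as np

         def signed_distance(x, w, w0):
             # r = y(x) / ||w||: positive on the C1 side of w^T x + w0 = 0.
             return (w @ x + w0) / np.linalg.norm(w)

         w, w0 = np.array([1.0, 1.0]), -1.0                    # surface x1 + x2 - 1 = 0
         print(signed_distance(np.array([1.0, 1.0]), w, w0))   # +1/sqrt(2), region R1
         print(signed_distance(np.array([0.0, 0.0]), w, w0))   # w0/||w|| = -1/sqrt(2) at the origin
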
  13. Two Classes
     More compact notation: add an extra dimension to the input space and set its
     value to x_0 = 1. Also define w̃ = (w_0, w) and x̃ = (1, x), so that
         y(x) = w̃^T x̃.
     The decision surface is now a D-dimensional hyperplane in the
     (D + 1)-dimensional expanded input space.

  14. Multi-Class
     Number of classes K > 2. Can we combine a number of two-class discriminant
     functions using K − 1 one-versus-the-rest classifiers?
     [Figure: two one-versus-the-rest boundaries (C1 vs. not C1, C2 vs. not C2)
     define regions R1, R2, R3 and leave an ambiguously classified region, marked "?".]

  15. Multi-Class
     Number of classes K > 2. Can we combine a number of two-class discriminant
     functions using K(K − 1)/2 one-versus-one classifiers?
     [Figure: pairwise boundaries between C1, C2, C3 define regions R1, R2, R3 and
     again leave an ambiguously classified region, marked "?".]

  16. Multi-Class
     Number of classes K > 2. Solution: use K linear functions
         y_k(x) = w_k^T x + w_k0
     and assign input x to class C_k if y_k(x) > y_j(x) for all j ≠ k.
     The decision boundary between class C_k and class C_j is given by
     y_k(x) = y_j(x).
     [Figure: decision regions R_i, R_j, R_k, with points x_A and x_B whose
     connecting line segment stays inside one region.]

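     A sketch of this decision rule; the weights below are made up for illustration:

         import numpy as np

         def classify(x, W, w0):
             # Assign x to the class k with the largest y_k(x) = w_k^T x + w_k0.
             # Row k of W is w_k^T; the returned index is zero-based.
             return np.argmax(W @ x + w0)

         W = np.array([[ 1.0,  0.0],    # three illustrative classes in 2D
                       [ 0.0,  1.0],
                       [-1.0, -1.0]])
         w0 = np.zeros(3)
         print(classify(np.array([2.0, 0.5]), W, w0))  # 0, i.e. class C1
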
  17. Least Squares for Classification
     Regression with a linear function of the model parameters and minimisation of
     a sum-of-squares error function resulted in a closed-form solution for the
     parameter values. Is this also possible for classification?
     Given input data x belonging to one of K classes C_k, use the 1-of-K binary
     coding scheme. Each class is described by its own linear model:
         y_k(x) = w_k^T x + w_k0,    k = 1, ..., K.

  18. Least Squares for Classification
     With the conventions
         w̃_k = (w_k0, w_k) ∈ R^{D+1},
         x̃ = (1, x) ∈ R^{D+1},
         W̃ = (w̃_1, ..., w̃_K) ∈ R^{(D+1)×K},
     we get for the (vector-valued) discriminant function
         y(x) = W̃^T x̃ ∈ R^K.
     For a new input x, the class is then given by the index of the largest value
     in the vector y(x).

  19. Determine W
     Given a training set {x_n, t_n}, n = 1, ..., N, where t_n is the class in the
     1-of-K coding scheme, define a matrix T whose row n is t_n^T, and let X̃ be
     the matrix whose row n is x̃_n^T.
     The sum-of-squares error can now be written as
         E_D(W̃) = (1/2) tr{ (X̃ W̃ − T)^T (X̃ W̃ − T) }.
     The minimum of E_D(W̃) is reached for
         W̃ = (X̃^T X̃)^{-1} X̃^T T = X̃^† T,
     where X̃^† is the pseudo-inverse of X̃.

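     A runnable sketch of this closed-form solution; the three Gaussian blobs are
     an assumption made purely for illustration:

         import numpy as np

         rng = np.random.default_rng(0)
         N, D, K = 90, 2, 3
         centres = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
         labels = np.repeat(np.arange(K), N // K)
         X = centres[labels] + rng.normal(size=(N, D))

         X_tilde = np.hstack([np.ones((N, 1)), X])    # row n is (1, x_n^T)
         T = np.eye(K)[labels]                        # row n is t_n^T (1-of-K)

         W_tilde = np.linalg.pinv(X_tilde) @ T        # W = X^dagger T

         pred = np.argmax(X_tilde @ W_tilde, axis=1)  # largest y_k(x) wins
         print(np.mean(pred == labels))               # training accuracy

     Each row of X_tilde @ W_tilde sums to one here, which illustrates the
     constraint property discussed on the next slide.
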
  20. Discriminant Function for Multi-Class
     The discriminant function y(x) is therefore
         y(x) = W̃^T x̃ = T^T (X̃^†)^T x̃,
     where X̃ is given by the training data and x is the new input.
     Interesting property: if for every t_n the same linear constraint
     a^T t_n + b = 0 holds, then the prediction y(x) obeys the same constraint,
         a^T y(x) + b = 0.
     For the 1-of-K coding scheme, the sum of all components of t_n is one, and
     therefore all components of y(x) will sum to one.
     BUT: the components are not probabilities, as they are not constrained to the
     interval (0, 1).

  21. Deficiencies of the Least Squares Approach
     Magenta curve: decision boundary for the least-squares approach. (Green curve:
     decision boundary for the logistic regression model described later.)
     [Figure: two-class data shown without and with additional outlying points;
     the least-squares boundary is pulled towards the outliers, the logistic
     regression boundary is not.]

  22. Deficiencies of the Least Squares Approach
     Magenta curve: decision boundary for the least-squares approach. (Green curve:
     decision boundary for the logistic regression model described later.)
     [Figure: a three-class example in which the least-squares boundaries leave
     almost no region for the middle class, while logistic regression separates
     all three classes.]

  23. Fisher's Linear Discriminant
     View linear classification as dimensionality reduction:
         y(x) = w^T x.
     If y ≥ −w_0 then class C1, otherwise C2.
     But there are many projections from a D-dimensional input space onto one
     dimension, and projection always means loss of information.
     For classification we want to preserve the class separation in one dimension.
     Can we find a projection which maximally preserves the class separation?

  24. Fisher's Linear Discriminant
     [Figure: samples from two classes in a two-dimensional input space and their
     histograms when projected onto two different one-dimensional spaces.]

  25. Fisher's Linear Discriminant - First Try
     Given N1 input data of class C1 and N2 input data of class C2, calculate the
     centres of the two classes:
         m_1 = (1/N1) Σ_{n ∈ C1} x_n,    m_2 = (1/N2) Σ_{n ∈ C2} x_n.
     Choose w so as to maximise the separation of the projected class means,
         m_2 − m_1 = w^T (m_2 − m_1),
     where the scalar m_k = w^T m_k is the projection of the mean of class C_k.
     Problem: with non-uniform within-class covariance, the classes can still
     overlap strongly under this projection.
     [Figure: two elongated classes whose projections onto the mean-difference
     direction overlap considerably.]

  26. Fisher's Linear Discriminant
     Measure also the within-class variance of each class,
         s_k^2 = Σ_{n ∈ C_k} (y_n − m_k)^2,    where y_n = w^T x_n.
     Maximise the Fisher criterion
         J(w) = (m_2 − m_1)^2 / (s_1^2 + s_2^2).
     [Figure: the projection maximising the Fisher criterion separates the two
     classes much better.]

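     A direct implementation of this criterion, useful for comparing candidate
     directions w; the data and the two directions below are made up:

         import numpy as np

         def fisher_criterion(w, X1, X2):
             # J(w) = (m2 - m1)^2 / (s1^2 + s2^2) for projections y_n = w^T x_n.
             y1, y2 = X1 @ w, X2 @ w
             s1_sq = np.sum((y1 - y1.mean()) ** 2)
             s2_sq = np.sum((y2 - y2.mean()) ** 2)
             return (y2.mean() - y1.mean()) ** 2 / (s1_sq + s2_sq)

         rng = np.random.default_rng(1)
         X1 = rng.normal([0.0, 0.0], [2.0, 0.3], size=(100, 2))  # elongated clouds
         X2 = rng.normal([1.0, 1.5], [2.0, 0.3], size=(100, 2))
         print(fisher_criterion(np.array([1.0, 0.0]), X1, X2))   # project onto x1
         print(fisher_criterion(np.array([0.0, 1.0]), X1, X2))   # onto x2: larger J
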
  27. Fisher's Linear Discriminant
     The Fisher criterion can be rewritten as
         J(w) = (w^T S_B w) / (w^T S_W w).
     S_B is the between-class covariance,
         S_B = (m_2 − m_1)(m_2 − m_1)^T,
     and S_W is the within-class covariance,
         S_W = Σ_{n ∈ C1} (x_n − m_1)(x_n − m_1)^T + Σ_{n ∈ C2} (x_n − m_2)(x_n − m_2)^T.

  28. Fisher's Linear Discriminant
     The Fisher criterion
         J(w) = (w^T S_B w) / (w^T S_W w)
     has a maximum for Fisher's linear discriminant,
         w ∝ S_W^{-1} (m_2 − m_1).
     Fisher's linear discriminant is NOT itself a discriminant, but it can be used
     to construct one by choosing a threshold y_0 in the projection space.

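     A minimal sketch of the closed-form direction, plus one simple threshold
     choice (the midpoint of the projected class means, an assumption on our part):

         import numpy as np

         def fisher_direction(X1, X2):
             # w proportional to S_W^{-1} (m2 - m1); rows are samples of C1, C2.
             m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
             S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
             return np.linalg.solve(S_W, m2 - m1)   # no explicit matrix inverse

         rng = np.random.default_rng(2)
         X1 = rng.normal([0.0, 0.0], [2.0, 0.3], size=(100, 2))
         X2 = rng.normal([1.0, 1.5], [2.0, 0.3], size=(100, 2))
         w = fisher_direction(X1, X2)
         y0 = 0.5 * (X1.mean(axis=0) + X2.mean(axis=0)) @ w     # threshold
         print(np.mean(X1 @ w <= y0), np.mean(X2 @ w > y0))     # per-class accuracy
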
  29. Fisher's Discriminant For Multi-Class
     Assume that the dimensionality of the input space, D, is greater than the
     number of classes, K.
     Use D' > 1 linear 'features' y_k = w_k^T x and write everything in vector form
     (no bias involved!):
         y = W^T x.
     The within-class covariance is then the sum of the covariances of all K classes,
         S_W = Σ_{k=1}^{K} S_k,
     where
         S_k = Σ_{n ∈ C_k} (x_n − m_k)(x_n − m_k)^T,
         m_k = (1/N_k) Σ_{n ∈ C_k} x_n.

  30. Fisher's Discriminant For Multi-Class
     Between-class covariance:
         S_B = Σ_{k=1}^{K} (m_k − m)(m_k − m)^T,
     where m is the total mean of the input data,
         m = (1/N) Σ_{n=1}^{N} x_n.
     One possible way to define a function of W which is large when the
     between-class covariance is large and the within-class covariance is small is
         J(W) = tr{ (W S_W W^T)^{-1} (W S_B W^T) }.
     The maximum of J(W) is determined by the D' eigenvectors of S_W^{-1} S_B with
     the largest eigenvalues.

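     A sketch of the multi-class construction; S_W^{-1} S_B is not symmetric, so a
     plain eigendecomposition with a cast back to real values is used, which is
     adequate for illustration:

         import numpy as np

         def fisher_multiclass(X, labels, n_features):
             # Columns are the n_features leading eigenvectors of S_W^{-1} S_B.
             m = X.mean(axis=0)
             D = X.shape[1]
             S_W, S_B = np.zeros((D, D)), np.zeros((D, D))
             for k in np.unique(labels):
                 Xk = X[labels == k]
                 mk = Xk.mean(axis=0)
                 S_W += (Xk - mk).T @ (Xk - mk)
                 S_B += np.outer(mk - m, mk - m)
             evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
             order = np.argsort(evals.real)[::-1]
             return evecs.real[:, order[:n_features]]

     At most K − 1 of the eigenvalues can be nonzero, which is exactly the rank
     argument on the next slide.
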
  31. Fisher's Discriminant For Multi-Class
     How many linear 'features' can one find with this method?
     S_B is of rank at most K − 1, because it is a sum of K rank-one matrices whose
     means are coupled by the global constraint via the total mean m.
     The projection onto the subspace spanned by S_B therefore cannot yield more
     than K − 1 linear features.

  32. The Perceptron Algorithm
     Frank Rosenblatt (1928 - 1971),
     "Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms"
     (Spartan Books, 1962).

  33. The Perceptron Algorithm
     The perceptron machine ("Mark 1") was one of the first computers that could
     learn new skills by trial and error.

  34. The Perceptron Algorithm
     Two-class model: create a feature vector φ(x) by a fixed nonlinear
     transformation of the input x, and use the generalised linear model
         y(x) = f(w^T φ(x)),
     with φ(x) containing some bias element φ_0(x) = 1.
     Nonlinear activation function:
         f(a) = +1 if a ≥ 0,  −1 if a < 0.
     Target coding for the perceptron:
         t = +1 if C1,  −1 if C2.

  35. The Perceptron Algorithm - Error Function
     Idea: minimise the total number of misclassified patterns.
     Problem: as a function of w, this number is piecewise constant, and therefore
     its gradient is zero almost everywhere.
     Better idea: using the (−1, +1) target coding scheme, we want all patterns to
     satisfy w^T φ(x_n) t_n > 0.
     Perceptron criterion: add the errors of all patterns belonging to the set M
     of misclassified patterns,
         E_P(w) = − Σ_{n ∈ M} w^T φ(x_n) t_n.

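     The criterion and the misclassified set M are straightforward to compute; in
     the sketch below, patterns with w^T φ_n t_n = 0 are counted as misclassified,
     a boundary convention of our own choosing:

         import numpy as np

         def perceptron_criterion(w, Phi, t):
             # E_P(w) = -sum over misclassified n of w^T phi_n t_n, t_n in {-1, +1}.
             # Row n of Phi is phi(x_n).
             margins = (Phi @ w) * t
             mis = margins <= 0
             return -margins[mis].sum(), np.flatnonzero(mis)
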
  36. Perceptron - Stochastic Gradient Descent
     Perceptron criterion (with the notation φ_n = φ(x_n)):
         E_P(w) = − Σ_{n ∈ M} w^T φ_n t_n.
     One iteration at step τ:
     1. Choose a misclassified training pair (x_n, t_n).
     2. Update the weight vector w by
            w^(τ+1) = w^(τ) − η ∇E_P(w) = w^(τ) + η φ_n t_n.
     As y(x, w) does not depend on the norm of w, one can set η = 1:
         w^(τ+1) = w^(τ) + φ_n t_n.

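     Putting the update rule into a complete training loop; the data generation
     and the epoch cap are illustrative assumptions:

         import numpy as np

         def perceptron_train(Phi, t, max_epochs=100):
             # SGD with eta = 1: on each misclassified pattern, w <- w + phi_n t_n.
             w = np.zeros(Phi.shape[1])
             for _ in range(max_epochs):
                 mistakes = 0
                 for phi_n, t_n in zip(Phi, t):
                     if (w @ phi_n) * t_n <= 0:   # misclassified (or on the boundary)
                         w = w + phi_n * t_n      # the update step from this slide
                         mistakes += 1
                 if mistakes == 0:                # all patterns correct: done
                     break
             return w

         rng = np.random.default_rng(3)
         X = rng.uniform(-1.0, 1.0, size=(50, 2))
         t = np.where(X[:, 0] + X[:, 1] > 0.2, 1, -1)   # linearly separable labels
         Phi = np.hstack([np.ones((50, 1)), X])         # phi_0(x) = 1 bias feature
         w = perceptron_train(Phi, t)
         print(np.mean(np.sign(Phi @ w) == t))          # 1.0 after convergence
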
  37. The Perceptron Algorithm - Update 1
     Update of the perceptron weights using a misclassified pattern (green):
         w^(τ+1) = w^(τ) + φ_n t_n.
     [Figure: data and decision boundary before and after the update.]

  38. The Perceptron Algorithm - Update 2
     A further update of the perceptron weights using a misclassified pattern (green):
         w^(τ+1) = w^(τ) + φ_n t_n.
     [Figure: data and decision boundary before and after the update.]

  39. The Perceptron Algorithm - Convergence
     Does the algorithm converge? For a single update step,
         −w^(τ+1)T φ_n t_n = −w^(τ)T φ_n t_n − (φ_n t_n)^T φ_n t_n < −w^(τ)T φ_n t_n,
     because (φ_n t_n)^T φ_n t_n = ||φ_n t_n||^2 > 0, so the error contribution of
     the pattern just used has decreased.
     BUT: contributions to the error from the other misclassified patterns might
     have increased.
     AND: some correctly classified patterns might now be misclassified.
     Perceptron Convergence Theorem: if the training set is linearly separable,
     the perceptron algorithm is guaranteed to find a solution in a finite number
     of steps.
