Support Vector Machine
  (and Statistical Learning Theory)

           Tutorial
             Jason Weston
          NEC Labs America
  4 Independence Way, Princeton, USA.
         jasonw@nec-labs.com
1 Support Vector Machines: history
 • SVMs were introduced in COLT-92 by Boser, Guyon & Vapnik. They have
   become rather popular since.

 • Theoretically well motivated algorithm: developed from Statistical
   Learning Theory (Vapnik & Chervonenkis) since the 60s.

 • Empirically good performance: successful applications in many
   fields (bioinformatics, text, image recognition, . . . )
2 Support Vector Machines: history II
 • Centralized website: www.kernel-machines.org.

 • Several textbooks exist, e.g. "An Introduction to Support Vector
   Machines" by Cristianini and Shawe-Taylor.
 • A large and diverse community works on them: from machine
   learning, optimization, statistics, neural networks, functional
   analysis, etc.
3 Support Vector Machines: basics
[Boser, Guyon, Vapnik ’92],[Cortes & Vapnik ’95]


[Figure: points of two classes, − and +, separated by a hyperplane with
maximal margin.]
Nice properties: convex, theoretically motivated, nonlinear with kernels.
4 Preliminaries:
 • Machine learning is about learning structure from data.

 • Although the class of algorithms called "SVMs" can do more, in this
   talk we focus on pattern recognition.

 • So we want to learn the mapping: X → Y, where x ∈ X is some
   object and y ∈ Y is a class label.
 • Let’s take the simplest case: 2-class classification. So: x ∈ Rn ,
   y ∈ {±1}.
5 Example:

Suppose we have 50 photographs of elephants and 50 photos of tigers.




[Figure: an elephant photograph vs. a tiger photograph.]

We digitize them into 100 × 100 pixel images, so we have x ∈ Rn where
n = 10,000.
Now, given a new (different) photograph we want to answer the question:
is it an elephant or a tiger? [we assume it is one or the other.]
6 Training sets and prediction models
 • input/output sets X , Y

 • training set (x1 , y1 ), . . . , (xm , ym )
 • "generalization": given a previously unseen x ∈ X , find a suitable
   y ∈ Y.

 • i.e., want to learn a classifier: y = f (x, α), where α are the
   parameters of the function.
 • For example, if we are choosing our model from the set of
   hyperplanes in Rn , then we have:

                          f (x, {w, b}) = sign(w · x + b).
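A minimal sketch of this decision function in code (the weights and the test points below are made-up toy values):

```python
import numpy as np

def predict(x, w, b):
    """Decision function f(x, {w, b}) = sign(w . x + b)."""
    return int(np.sign(np.dot(w, x) + b))

w = np.array([1.0, -1.0])  # hypothetical hyperplane parameters
b = 0.5
print(predict(np.array([2.0, 0.0]), w, b))   # w.x + b = 2.5  -> +1
print(predict(np.array([0.0, 2.0]), w, b))   # w.x + b = -1.5 -> -1
```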
7 Empirical Risk and the true Risk
 • We can try to learn f (x, α) by choosing a function that performs well
   on training data:
             Remp (α) = (1/m) Σi=1..m ℓ(f (xi , α), yi ) = Training Error

    where ℓ is the zero-one loss: ℓ(y, ŷ) = 1 if y ≠ ŷ, and 0
    otherwise. Remp is called the empirical risk.
 • By doing this we are trying to minimize the overall risk:

               R(α) = ∫ ℓ(f (x, α), y) dP (x, y) = Test Error

    where P (x, y) is the (unknown) joint distribution function of x and y.
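A sketch of computing Remp with the zero-one loss on a toy data set (both the classifier and the data are made up):

```python
import numpy as np

def zero_one_loss(y_pred, y_true):
    # 1 if the prediction is wrong, 0 otherwise
    return 0 if y_pred == y_true else 1

def empirical_risk(f, xs, ys):
    # average loss over the m training examples
    return sum(zero_one_loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)

f = lambda x: int(np.sign(x[0]))   # hypothetical classifier
xs = [np.array([1.0]), np.array([-2.0]), np.array([3.0]), np.array([-1.0])]
ys = [1, -1, -1, -1]               # f is wrong on the third example
print(empirical_risk(f, xs, ys))   # -> 0.25
```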
8 Choosing the set of functions
What if we allowed f (x, α) to be any function from X to {±1}?
Training set (x1 , y1 ), . . . , (xm , ym ) ∈ X × {±1}
Test set x̄1 , . . . , x̄m̄ ∈ X ,
such that the two sets do not intersect.
For any f there exists f ∗ :
 1. f ∗ (xi ) = f (xi ) for all i
 2. f ∗ (x̄j ) ≠ f (x̄j ) for all j
Based on the training data alone, there is no means of choosing which
function is better. On the test set however they give different results. So
generalization is not guaranteed.
=⇒ a restriction must be placed on the functions that we allow.
9 Empirical Risk and the true Risk
Vapnik & Chervonenkis showed that an upper bound on the true risk can
be given by the empirical risk + an additional term:


           R(α) ≤ Remp (α) + √[ (h(log(2m/h) + 1) − log(η/4)) / m ]

(a bound which holds with probability at least 1 − η)
where h is the VC dimension of the set of functions parameterized by α.
 • The VC dimension of a set of functions is a measure of their capacity
   or complexity.
 • If you can describe a lot of different phenomena with a set of
   functions then the value of h is large.
[VC dim = the maximum number of points that can be separated in all
possible ways by that set of functions.]
10 VC dimension:

The VC dimension of a set of functions is the maximum number of points
that can be separated in all possible ways by that set of functions. For
hyperplanes in Rn , the VC dimension can be shown to be n + 1.
[Figure: three points in the plane labeled in all 2³ = 8 possible ways, each
labeling separated by a line, illustrating that lines in R² shatter three
points.]
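The shattering claim for n = 2 can be checked numerically. A sketch (the three points and the use of a plain perceptron are assumed choices): for three non-collinear points, every one of the 2³ labelings is realized by some line.

```python
import itertools
import numpy as np

# Three non-collinear toy points in R^2 (assumed)
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

def separable(points, labels, epochs=100):
    """Run a perceptron with bias; return True if it separates the labeling."""
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            if y * (w @ x + b) <= 0:        # misclassified -> perceptron update
                w, b = w + y * x, b + y
    return all(y * (w @ x + b) > 0 for x, y in zip(points, labels))

# Every one of the 2^3 = 8 labelings should be linearly separable
print(all(separable(points, labs)
          for labs in itertools.product([-1, 1], repeat=3)))
```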
11 VC dimension and capacity of functions

Simplification of bound:
  Test Error ≤ Training Error + Complexity of set of Models

 • Actually, a lot of bounds of this form have been proved (different
   measures of capacity). The complexity function is often called a
   regularizer.

 • If you take a high capacity set of functions (ones that can explain a
   lot) you get low training error. But you might "overfit".

 • If you take a very simple set of models, you have low complexity, but
   won’t get low training error.
12 Capacity of a set of functions (classification)




[Images taken from a talk by B. Schoelkopf.]
13 Capacity of a set of functions (regression)
[Figure: regression example with axes x, y. Samples of a true function are
fit by a low-capacity hyperplane and by a high-capacity sine curve.]
14 Controlling the risk: model complexity

[Figure: structural risk minimization. Over nested sets of functions
S1 ⊂ · · · ⊂ S∗ ⊂ · · · ⊂ Sn with VC dimensions h1 < · · · < h∗ < · · · < hn,
the empirical risk (training error) decreases with capacity while the
confidence interval grows; the bound on the risk is minimized at h∗.]
15 Capacity of hyperplanes

Vapnik & Chervonenkis also showed the following:
Consider hyperplanes (w · x) = 0 where w is normalized w.r.t a set of
points X ∗ such that: mini |w · xi | = 1.
The set of decision functions fw (x) = sign(w · x) defined on X ∗ such
that ||w|| ≤ A has a VC dimension satisfying

                               h ≤ R2 A2 .

where R is the radius of the smallest sphere around the origin containing
X ∗.
=⇒ minimize ||w||2 and have low capacity
=⇒ minimizing ||w||2 equivalent to obtaining a large margin classifier
[Figure: a separating hyperplane {x | <w, x> + b = 0}; points with
<w, x> + b > 0 are labeled +1, points with <w, x> + b < 0 are labeled −1.]

[Figure: the same hyperplane with its margin hyperplanes
{x | <w, x> + b = +1} and {x | <w, x> + b = −1}.]

Note: take x1 with <w, x1> + b = +1 (so y1 = +1) and x2 with
<w, x2> + b = −1 (so y2 = −1). Then

        <w, (x1 − x2)> = 2   =⇒   <w/||w||, (x1 − x2)> = 2/||w||

i.e. the distance between the two margin hyperplanes is 2/||w||.
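The margin derivation can be checked numerically; a sketch with a made-up w and b:

```python
import numpy as np

w, b = np.array([3.0, 4.0]), 0.0          # hypothetical hyperplane, ||w|| = 5
u = w / np.linalg.norm(w)                 # unit normal

# Points on the two margin hyperplanes <w, x> + b = +1 and -1
x1 = (1.0 - b) * u / np.linalg.norm(w)
x2 = (-1.0 - b) * u / np.linalg.norm(w)

print(np.dot(w, x1) + b, np.dot(w, x2) + b)   # -> 1.0 -1.0
print(np.dot(u, x1 - x2))                     # margin width 2/||w|| = 0.4
```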
16 Linear Support Vector Machines (at last!)

So, we would like to find the function which minimizes an objective like:
  Training Error + Complexity term
We write that as:

             (1/m) Σi=1..m ℓ(f (xi , α), yi ) + Complexity term

For now we will choose the set of hyperplanes (we will extend this later),
so f (x) = (w · x) + b:

             (1/m) Σi=1..m ℓ(w · xi + b, yi ) + ||w||2

subject to mini |w · xi | = 1.
17 Linear Support Vector Machines II
That function before was a little difficult to minimize because of the step
function in ℓ(y, ŷ) (either 1 or 0).
Let’s assume we can separate the data perfectly. Then we can optimize
the following:
Minimize ||w||2 , subject to:


                      (w · xi + b) ≥ 1, if yi = 1
                    (w · xi + b) ≤ −1, if yi = −1
The last two constraints can be compacted to:

                            yi (w · xi + b) ≥ 1

This is a quadratic program.
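As an illustration only (not how SVM solvers work in practice), this quadratic program can be handed to a general-purpose constrained optimizer. The toy data set and the use of scipy's SLSQP method are assumptions of the sketch:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data (assumed)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(p):                  # p = (w1, w2, b); minimize ||w||^2
    return p[0] ** 2 + p[1] ** 2

# Constraints y_i (w . x_i + b) - 1 >= 0
cons = [{"type": "ineq",
         "fun": lambda p, i=i: y[i] * (X[i] @ p[:2] + p[2]) - 1.0}
        for i in range(len(X))]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=cons)
w, b = res.x[:2], res.x[2]
print(np.sign(X @ w + b))          # all training points correctly classified
```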
18 SVMs : non-separable case
To deal with the non-separable case, one can rewrite the problem as:
Minimize:
                          ||w||2 + C Σi=1..m ξi
subject to:

                     yi (w · xi + b) ≥ 1 − ξi ,         ξi ≥ 0

This is just the same as the original objective:

             (1/m) Σi=1..m ℓ(w · xi + b, yi ) + ||w||2

except ℓ is no longer the zero-one loss, but is called the "hinge loss":
ℓ(y, ŷ) = max(0, 1 − y ŷ). This is still a quadratic program!
[Figure: the non-separable case. A hyperplane with margin separates − and +
points; a slack variable ξi measures how far a misclassified + example lies
on the wrong side of its margin.]
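A rough sketch of minimizing this hinge-loss objective by subgradient descent in plain numpy (the data, learning rate, and iteration count are all assumed; real SVM solvers use the quadratic program instead):

```python
import numpy as np

# Toy data (assumed)
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C, lr = 1.0, 0.05
w, b = np.zeros(2), 0.0

for _ in range(500):
    margins = y * (X @ w + b)
    active = margins < 1.0                  # examples with nonzero hinge loss
    # Subgradient of ||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))
    grad_w = 2 * w - C * (y[active][:, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    w, b = w - lr * grad_w, b - lr * grad_b

print((np.sign(X @ w + b) == y).all())      # separates the toy data
```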
19 Support Vector Machines - Primal
 • Decision function:
                            f (x) = w · x + b

 • Primal formulation:
                min P (w, b) = (1/2) ||w||2 + C Σi H1 [ yi f (xi ) ]
                w,b

                (first term: maximize margin; second term: minimize
                training error)

    Ideally H1 would count the number of errors; approximate it with the
    Hinge Loss H1 (z) = max(0, 1 − z).

    [Figure: plot of H1 (z): zero for z ≥ 1, increasing linearly as z falls
    below 1.]
20 SVMs : non-linear case
Linear classifiers aren’t complex enough sometimes. SVM solution:
Map data into a richer feature space including nonlinear features, then
construct a hyperplane in that space so all other equations are the same!
Formally, preprocess the data with:

                               x → Φ(x)

and then learn the map from Φ(x) to y:

                          f (x) = w · Φ(x) + b.
21 SVMs : polynomial mapping

                              Φ : R2 → R3
           (x1 , x2 ) → (z1 , z2 , z3 ) := (x1² , √2 x1 x2 , x2²)

[Figure: data that is not linearly separable in the input space (x1 , x2 )
becomes linearly separable in the feature space (z1 , z2 , z3 ).]
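The point of this particular Φ is that dot products in R³ reduce to a simple function of dot products in R²: Φ(x) · Φ(z) = (x · z)². A quick numeric check (toy vectors assumed):

```python
import numpy as np

def phi(x):
    # The polynomial map above: (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2)
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(x), phi(z)), np.dot(x, z) ** 2)   # both equal (x . z)^2
```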
22 SVMs : non-linear case II
For example, MNIST handwriting recognition.
60,000 training examples, 10,000 test examples, 28 × 28 pixel images.
A linear SVM has around 8.5% test error.
A polynomial SVM has around 1% test error.

[Figure: sample MNIST digit images with their labels.]
23 SVMs : full MNIST results

                     Classifier                    Test Error
                     linear                        8.4%
                     3-nearest-neighbor            2.4%
                     RBF-SVM                       1.4%
                     Tangent distance              1.1%
                     LeNet                         1.1%
                     Boosted LeNet                 0.7%
                     Translation invariant SVM     0.56%


Choosing a good mapping Φ(·) (encoding prior knowledge + getting right
complexity of function class) for your problem improves results.
24 SVMs : the kernel trick
Problem: the dimensionality of Φ(x) can be very large, making w hard to
represent explicitly in memory, and hard for the QP to solve.
The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for
SVMs as a special case):

    w = Σ_{i=1}^m αi Φ(xi)

for some variables α. Instead of optimizing w directly we can thus
optimize α.
The decision rule is now:

    f(x) = Σ_{i=1}^m αi Φ(xi) · Φ(x) + b

We call K(xi, x) = Φ(xi) · Φ(x) the kernel function.
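This expansion means prediction only ever needs kernel evaluations, never w itself. A minimal sketch of the decision rule (the support points, coefficients αi, and bias b below are made-up toy values, not a trained model):

```python
# Sketch (not from the tutorial): evaluate f(x) = sum_i alpha_i K(x_i, x) + b
# via kernel calls, so Phi(x) is never formed explicitly.

def linear_kernel(u, v):
    """K(u, v) = u . v  (i.e. Phi = identity)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def decision_function(train_x, alphas, b, x, kernel=linear_kernel):
    """f(x) = sum_i alpha_i * K(x_i, x) + b."""
    return sum(a * kernel(xi, x) for a, xi in zip(alphas, train_x)) + b

# Toy expansion: two "support vectors" with opposite-signed coefficients
# (in the dual, alpha_i carries the sign of the label y_i).
train_x = [(1.0, 2.0), (-1.0, 0.5)]
alphas = [0.8, -0.3]
b = 0.1

score = decision_function(train_x, alphas, b, (2.0, 1.0))  # 0.8*4 - 0.3*(-1.5) + 0.1 = 3.75
label = 1 if score >= 0 else -1
```

Swapping in any other kernel (polynomial, RBF, ...) changes only the `kernel` argument, not the decision rule.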
25 Support Vector Machines - kernel trick II

We can rewrite all the SVM equations we saw before, but with the
w = Σ_{i=1}^m αi Φ(xi) equation:

 • Decision function:

       f(x) = Σ_i αi Φ(xi) · Φ(x) + b
            = Σ_i αi K(xi, x) + b

 • Dual formulation:

       min P(w, b) = (1/2) || Σ_{i=1}^m αi Φ(xi) ||²  +  C Σ_i H1[ yi f(xi) ]

   where the first term maximizes the margin and the second minimizes the
   training error.
26 Support Vector Machines - Dual
But people normally write it like this:

 • Dual formulation:

       min_α D(α) = (1/2) Σ_{i,j} αi αj Φ(xi) · Φ(xj)  −  Σ_i yi αi
       s.t.  Σ_i αi = 0   and   0 ≤ yi αi ≤ C

 • Dual decision function:

       f(x) = Σ_i αi K(xi, x) + b

 • The kernel function K(·, ·) is used to make an (implicit) nonlinear feature
   map, e.g.
    – Polynomial kernel: K(x, x′) = (x · x′ + 1)^d.
    – RBF kernel: K(x, x′) = exp(−γ ||x − x′||²).
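Both example kernels can be written down in a few lines; a throwaway sketch (the test vectors and the hyperparameters d and γ are arbitrary choices):

```python
import math

def polynomial_kernel(u, v, d=2):
    """K(u, v) = (u . v + 1)^d"""
    return (sum(ui * vi for ui, vi in zip(u, v)) + 1) ** d

def rbf_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2)"""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)

x, xp = (1.0, 2.0), (3.0, 0.0)
k_poly = polynomial_kernel(x, xp)  # (1*3 + 2*0 + 1)^2 = 16
k_rbf = rbf_kernel(x, x)           # zero distance => exp(0) = 1.0
```

Note how the RBF kernel of a point with itself is always 1, and decays toward 0 as the points move apart.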
27 Polynomial-SVMs

The kernel K(x, x′) = (x · x′)^d gives the same result as the explicit
mapping + dot product that we described before:

    Φ : R² → R³,   (x1, x2) → (z1, z2, z3) := (x1², √2 x1 x2, x2²)

    Φ((x1, x2)) · Φ((x1′, x2′)) = (x1², √2 x1 x2, x2²) · (x1′², √2 x1′ x2′, x2′²)
                                = x1² x1′² + 2 x1 x1′ x2 x2′ + x2² x2′²

is the same as:

    K(x, x′) = (x · x′)² = ((x1, x2) · (x1′, x2′))²
             = (x1 x1′ + x2 x2′)² = x1² x1′² + x2² x2′² + 2 x1 x1′ x2 x2′

Interestingly, even if d is large the kernel still requires only n
multiplications to compute, whereas the explicit representation may not fit
in memory!
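The identity above is easy to check numerically; a quick sketch with two arbitrary points:

```python
import math

def phi(x1, x2):
    # Explicit feature map R^2 -> R^3 from the slide.
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, xp = (1.5, -2.0), (0.5, 3.0)

explicit = dot(phi(*x), phi(*xp))  # Phi(x) . Phi(x'): 3 multiplies per feature, in R^3
via_kernel = dot(x, xp) ** 2       # (x . x')^2: stays in R^2

assert abs(explicit - via_kernel) < 1e-9  # both equal 27.5625 here
```

For degree d in n input dimensions the explicit map has O(n^d) coordinates, while the kernel side is still one n-dimensional dot product plus one exponentiation.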
28 RBF-SVMs
The RBF kernel K(x, x′) = exp(−γ ||x − x′||²) is one of the most
popular kernel functions. It adds a "bump" around each data point:

    f(x) = Σ_{i=1}^m αi exp(−γ ||xi − x||²) + b

[Figure: two input points x and x′ are mapped by Φ to localized bumps Φ(x) and Φ(x′) in feature space.]

Using this one can get state-of-the-art results.
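The bump picture can be made concrete; a toy sketch (the coefficients αi, bias b, and γ are made up, not a trained model) with one positive and one negative bump, where the sign of f(x) follows whichever bump is nearer:

```python
import math

def rbf_decision(train_x, alphas, b, x, gamma=1.0):
    """f(x) = sum_i alpha_i * exp(-gamma * ||x_i - x||^2) + b"""
    total = b
    for a, xi in zip(alphas, train_x):
        sq_dist = sum((xij - xj) ** 2 for xij, xj in zip(xi, x))
        total += a * math.exp(-gamma * sq_dist)
    return total

# A positive bump at (0, 0) and a negative bump at (3, 0).
train_x = [(0.0, 0.0), (3.0, 0.0)]
alphas = [1.0, -1.0]
b = 0.0

near_pos = rbf_decision(train_x, alphas, b, (0.1, 0.0))  # close to the + bump
near_neg = rbf_decision(train_x, alphas, b, (2.9, 0.0))  # close to the - bump
```

Here `near_pos` is positive and `near_neg` negative: each query point is dominated by the bump it sits under, which is exactly the locality the figure illustrates.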
29 SVMs : more results

There is much more in the field of SVMs / kernel machines than we could
cover here, including:

 • Regression, clustering, semi-supervised learning and other domains.
 • Lots of other kernels, e.g. string kernels to handle text.
 • Lots of research into modifications, e.g. to improve generalization
   ability, or to tailor the method to a particular task.
 • Lots of research into speeding up training.

Please see textbooks such as the ones by Cristianini & Shawe-Taylor or
by Schoelkopf and Smola.
30 SVMs : software

Lots of SVM software:

 • LibSVM (C++)
 • SVMLight (C)

As well as complete machine learning toolboxes that include SVMs:

 • Torch (C++)
 • Spider (Matlab)
 • Weka (Java)

All available through www.kernel-machines.org.

More Related Content

Similar to Jason svm tutorial

Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?NC State Turf Pathology
 
How SaaS founders can raise Angel funds
How SaaS founders can raise Angel fundsHow SaaS founders can raise Angel funds
How SaaS founders can raise Angel fundsKevin Dewalt
 
48x48 square templatev12
48x48 square templatev1248x48 square templatev12
48x48 square templatev12Jay Buckley
 
Honey comb weave design
Honey comb weave designHoney comb weave design
Honey comb weave designHassan7717348
 
M&A in der Medienbranche
M&A in der MedienbrancheM&A in der Medienbranche
M&A in der Medienbrancheyopi5000
 
B pprocessv3 brown papers-gsw
B pprocessv3 brown papers-gswB pprocessv3 brown papers-gsw
B pprocessv3 brown papers-gswwoznite65
 
36x60 horizontal templatev12
36x60 horizontal templatev1236x60 horizontal templatev12
36x60 horizontal templatev12Jay Buckley
 
42x60 horizontal templatev12
42x60 horizontal templatev1242x60 horizontal templatev12
42x60 horizontal templatev12Jay Buckley
 
Preventing vaw in-elections
Preventing vaw in-electionsPreventing vaw in-elections
Preventing vaw in-electionsJamaity
 
SignWriting in an ASCII World
SignWriting in an ASCII WorldSignWriting in an ASCII World
SignWriting in an ASCII WorldStephen Slevinski
 
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...SignWriting For Sign Languages
 

Similar to Jason svm tutorial (20)

Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?Fungicide Resistance: Can it be Prevented?
Fungicide Resistance: Can it be Prevented?
 
Cdn Tabulacion Exel
Cdn Tabulacion ExelCdn Tabulacion Exel
Cdn Tabulacion Exel
 
How SaaS founders can raise Angel funds
How SaaS founders can raise Angel fundsHow SaaS founders can raise Angel funds
How SaaS founders can raise Angel funds
 
Diagramacion
DiagramacionDiagramacion
Diagramacion
 
48x48 square templatev12
48x48 square templatev1248x48 square templatev12
48x48 square templatev12
 
Honey comb weave design
Honey comb weave designHoney comb weave design
Honey comb weave design
 
Bule E XíCara
Bule E XíCaraBule E XíCara
Bule E XíCara
 
M&A in der Medienbranche
M&A in der MedienbrancheM&A in der Medienbranche
M&A in der Medienbranche
 
Diamond Age Games Deck
Diamond Age Games DeckDiamond Age Games Deck
Diamond Age Games Deck
 
Copa do mundo 2018 continentes paises vencedores
Copa do mundo 2018 continentes paises vencedoresCopa do mundo 2018 continentes paises vencedores
Copa do mundo 2018 continentes paises vencedores
 
Copa do mundo 2018 goleadores
Copa do mundo 2018 goleadoresCopa do mundo 2018 goleadores
Copa do mundo 2018 goleadores
 
B pprocessv3 brown papers-gsw
B pprocessv3 brown papers-gswB pprocessv3 brown papers-gsw
B pprocessv3 brown papers-gsw
 
Sayasaya
SayasayaSayasaya
Sayasaya
 
36x60 horizontal templatev12
36x60 horizontal templatev1236x60 horizontal templatev12
36x60 horizontal templatev12
 
42x60 horizontal templatev12
42x60 horizontal templatev1242x60 horizontal templatev12
42x60 horizontal templatev12
 
Preventing vaw in-elections
Preventing vaw in-electionsPreventing vaw in-elections
Preventing vaw in-elections
 
SignWriting in an ASCII World
SignWriting in an ASCII WorldSignWriting in an ASCII World
SignWriting in an ASCII World
 
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
SIGNWRITING SYMPOSIUM PRESENTATION 49, Part 1: SignWriting in an ASCII World ...
 
Copa do mundo 2018 bolas
Copa do mundo 2018 bolasCopa do mundo 2018 bolas
Copa do mundo 2018 bolas
 
Icraf seminar(de clerck)
Icraf seminar(de clerck)Icraf seminar(de clerck)
Icraf seminar(de clerck)
 

Recently uploaded

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Recently uploaded (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Jason svm tutorial

  • 1. Support Vector Machine (and Statistical Learning Theory) Tutorial Jason Weston NEC Labs America 4 Independence Way, Princeton, USA. jasonw@nec-labs.com
  • 2. 1 Support Vector Machines: history • SVMs introduced in COLT-92 by Boser, Guyon & Vapnik. Became rather popular since. • Theoretically well motivated algorithm: developed from Statistical Learning Theory (Vapnik & Chervonenkis) since the 60s. • Empirically good performance: successful applications in many fields (bioinformatics, text, image recognition, . . . )
  • 3. 2 Support Vector Machines: history II • Centralized website: www.kernel-machines.org. • Several textbooks, e.g. ”An introduction to Support Vector Machines” by Cristianini and Shawe-Taylor is one. • A large and diverse community work on them: from machine learning, optimization, statistics, neural networks, functional analysis, etc.
  • 4. 3 Support Vector Machines: basics [Boser, Guyon, Vapnik ’92],[Cortes & Vapnik ’95] - - - - margin - margin + + + + + Nice properties: convex, theoretically motivated, nonlinear with kernels..
  • 5. 4 Preliminaries: • Machine learning is about learning structure from data. • Although the class of algorithms called ”SVM”s can do more, in this talk we focus on pattern recognition. • So we want to learn the mapping: X → Y, where x ∈ X is some object and y ∈ Y is a class label. • Let’s take the simplest case: 2-class classification. So: x ∈ Rn , y ∈ {±1}.
  • 6. 5 Example: Suppose we have 50 photographs of elephants and 50 photos of tigers. vs. We digitize them into 100 x 100 pixel images, so we have x ∈ Rn where n = 10, 000. Now, given a new (different) photograph we want to answer the question: is it an elephant or a tiger? [we assume it is one or the other.]
  • 7. 6 Training sets and prediction models • input/output sets X , Y • training set (x1 , y1 ), . . . , (xm , ym ) • ”generalization”: given a previously seen x ∈ X , find a suitable y ∈ Y. • i.e., want to learn a classifier: y = f (x, α), where α are the parameters of the function. • For example, if we are choosing our model from the set of hyperplanes in Rn , then we have: f (x, {w, b}) = sign(w · x + b).
  • 8. 7 Empirical Risk and the true Risk • We can try to learn f (x, α) by choosing a function that performs well on training data: m 1 Remp (α) = (f (xi , α), yi ) = Training Error m i=1 where is the zero-one loss function, (y, y ) = 1, if y = y , and 0 ˆ ˆ otherwise. Remp is called the empirical risk. • By doing this we are trying to minimize the overall risk: R(α) = (f (x, α), y)dP (x, y) = Test Error where P(x,y) is the (unknown) joint distribution function of x and y.
  • 9. 8 Choosing the set of functions What about f (x, α) allowing all functions from X to {±1}? Training set (x1 , y1 ), . . . , (xm , ym ) ∈ X × {±1} Test set x1 , . . . , xm ∈ X , ¯ ¯¯ such that the two sets do not intersect. For any f there exists f ∗ : 1. f ∗ (xi ) = f (xi ) for all i 2. f ∗ (xj ) = f (xj ) for all j Based on the training data alone, there is no means of choosing which function is better. On the test set however they give different results. So generalization is not guaranteed. =⇒ a restriction must be placed on the functions that we allow.
  • 10. 9 Empirical Risk and the true Risk Vapnik & Chervonenkis showed that an upper bound on the true risk can be given by the empirical risk + an additional term: h(log( 2m + 1) − log( η ) 4 R(α) ≤ Remp (α) + h m where h is the VC dimension of the set of functions parameterized by α. • The VC dimension of a set of functions is a measure of their capacity or complexity. • If you can describe a lot of different phenomena with a set of functions then the value of h is large. [VC dim = the maximum number of points that can be separated in all possible ways by that set of functions.]
  • 11. 10 VC dimension: The VC dimension of a set of functions is the maximum number of points that can be separated in all possible ways by that set of functions. For hyperplanes in Rn , the VC dimension can be shown to be n + 1. xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx
  • 12. 11 VC dimension and capacity of functions Simplification of bound: Test Error ≤ Training Error + Complexity of set of Models • Actually, a lot of bounds of this form have been proved (different measures of capacity). The complexity function is often called a regularizer. • If you take a high capacity set of functions (explain a lot) you get low training error. But you might ”overfit”. • If you take a very simple set of models, you have low complexity, but won’t get low training error.
  • 13. 12 Capacity of a set of functions (classification) [Images taken from a talk by B. Schoelkopf.]
  • 14. 13 Capacity of a set of functions (regression) y sine curve fit hyperplane fit true function x
  • 15. 14 Controlling the risk: model complexity Bound on the risk Confidence interval Empirical risk (training error) h1 h* hn S1 S* Sn
  • 16. 15 Capacity of hyperplanes Vapnik & Chervonenkis also showed the following: Consider hyperplanes (w · x) = 0 where w is normalized w.r.t a set of points X ∗ such that: mini |w · xi | = 1. The set of decision functions fw (x) = sign(w · x) defined on X ∗ such that ||w|| ≤ A has a VC dimension satisfying h ≤ R2 A2 . where R is the radius of the smallest sphere around the origin containing X ∗. =⇒ minimize ||w||2 and have low capacity =⇒ minimizing ||w||2 equivalent to obtaining a large margin classifier
  • 17. <w, x> + b > 0 x q x q x <w, x> + b < 0 w x q q q {x | <w, x> + b = 0}
  • 18. {x | <w, x> + b = +1} {x | <w, x> + b = −1} Note: x <w, x1> + b = +1 r x x1 yi = +1 <w, x2> + b = −1 x2r => <w , (x1−x2)> = 2 x , w yi = −1 w x => < , (x1−x2) = 2 > ||w|| ||w|| r r r {x | <w, x> + b = 0}
  • 19. 16 Linear Support Vector Machines (at last!) So, we would like to find the function which minimizes an objective like: Training Error + Complexity term We write that as: m 1 (f (xi , α), yi ) + Complexity term m i=1 For now we will choose the set of hyperplanes (we will extend this later), so f (x) = (w · x) + b: m 1 (w · xi + b, yi ) + ||w||2 m i=1 subject to mini |w · xi | = 1.
• 20. 17 Linear Support Vector Machines II
That function before was a little difficult to minimize because of the step function in ℓ(y, ŷ) (either 1 or 0).
Let's assume we can separate the data perfectly. Then we can optimize the following:
Minimize ||w||², subject to:
    (w · xᵢ + b) ≥ 1, if yᵢ = +1
    (w · xᵢ + b) ≤ −1, if yᵢ = −1
The last two constraints can be compacted to:
    yᵢ(w · xᵢ + b) ≥ 1
This is a quadratic program.
• 21. 18 SVMs: non-separable case
To deal with the non-separable case, one can rewrite the problem as:
Minimize:
    ||w||² + C Σᵢ ξᵢ
subject to:
    yᵢ(w · xᵢ + b) ≥ 1 − ξᵢ,   ξᵢ ≥ 0
This is just the same as the original objective:
    (1/m) Σᵢ ℓ(w · xᵢ + b, yᵢ) + ||w||²
except ℓ is no longer the zero-one loss, but is called the "hinge loss": ℓ(y, ŷ) = max(0, 1 − yŷ). This is still a quadratic program!
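As a concrete illustration (not from the slides): at the optimum the slack ξᵢ equals the hinge loss max(0, 1 − yᵢ(w · xᵢ + b)), so the soft-margin objective can be evaluated directly. A minimal numpy sketch, with made-up toy data and an arbitrary (w, b):

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)  # xi_i = hinge loss of example i
    return w @ w + C * slacks.sum()

# Hypothetical 2-D toy data: two points per class.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = np.array([0.25, 0.25])
b = 0.0

print(soft_margin_objective(w, b, X, y, C=1.0))  # all slacks are 0 here, so this is ||w||^2
```

With this (w, b) every point satisfies yᵢ(w · xᵢ + b) ≥ 1, so the C term vanishes and only the margin term remains.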
• 22. [Figure: the non-separable case — a margin with some + and − points on the wrong side; ξᵢ measures how far example i falls inside the margin.]
• 23. 19 Support Vector Machines - Primal
• Decision function: f(x) = w · x + b
• Primal formulation:
    min P(w, b) = ½||w||² + C Σᵢ H₁[yᵢ f(xᵢ)]
    (first term: maximize margin; second term: minimize training error)
Ideally H₁ would count the number of errors; approximate it with the hinge loss:
    H₁(z) = max(0, 1 − z)
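Because the hinge loss is convex, the primal above can be minimized directly by (sub)gradient descent. The following is a rough sketch of that idea on hypothetical toy data, not the QP solver an SVM package would actually use:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * f(x_i))."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # examples with nonzero hinge loss
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical linearly separable toy data.
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))  # should match y on this easy data
```

The ½||w||² term contributes the gradient w (pulling toward a larger margin); each margin-violating example contributes −C·yᵢxᵢ (pulling toward lower training error), mirroring the two terms of P(w, b).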
• 24. 20 SVMs: non-linear case
Linear classifiers aren't complex enough sometimes. SVM solution: map the data into a richer feature space including nonlinear features, then construct a hyperplane in that space; all the other equations stay the same!
Formally, preprocess the data with: x → Φ(x) and then learn the map from Φ(x) to y: f(x) = w · Φ(x) + b.
• 25. 21 SVMs: polynomial mapping
    Φ : R² → R³
    (x₁, x₂) → (z₁, z₂, z₃) := (x₁², √2 x₁x₂, x₂²)
[Figure: data that is not linearly separable in the (x₁, x₂) plane becomes linearly separable in the (z₁, z₂, z₃) feature space.]
• 26. 22 SVMs: non-linear case II
For example, MNIST hand-writing recognition: 60,000 training examples, 10,000 test examples, 28x28 pixel images.
Linear SVM has around 8.5% test error. Polynomial SVM has around 1% test error.
[Figure: sample MNIST digit images with their labels.]
• 27. 23 SVMs: full MNIST results
    Classifier                   Test Error
    linear                       8.4%
    3-nearest-neighbor           2.4%
    RBF-SVM                      1.4%
    Tangent distance             1.1%
    LeNet                        1.1%
    Boosted LeNet                0.7%
    Translation invariant SVM    0.56%
Choosing a good mapping Φ(·) (encoding prior knowledge + getting the right complexity of function class) for your problem improves results.
• 28. 24 SVMs: the kernel trick
Problem: the dimensionality of Φ(x) can be very large, making w hard to represent explicitly in memory, and hard for the QP to solve.
The Representer theorem (Kimeldorf & Wahba, 1971) shows that (for SVMs as a special case):
    w = Σᵢ αᵢ Φ(xᵢ)
for some variables α. Instead of optimizing w directly, we can thus optimize α.
The decision rule is now:
    f(x) = Σᵢ αᵢ Φ(xᵢ) · Φ(x) + b
We call K(xᵢ, x) = Φ(xᵢ) · Φ(x) the kernel function.
• 29. 25 Support Vector Machines - kernel trick II
We can rewrite all the SVM equations we saw before, substituting w = Σᵢ αᵢ Φ(xᵢ):
• Decision function:
    f(x) = Σᵢ αᵢ Φ(xᵢ) · Φ(x) + b = Σᵢ αᵢ K(xᵢ, x) + b
• Formulation:
    min P(w, b) = ½ ||Σᵢ αᵢ Φ(xᵢ)||² + C Σᵢ H₁[yᵢ f(xᵢ)]
    (first term: maximize margin; second term: minimize training error)
• 30. 26 Support Vector Machines - Dual
But people normally write it like this:
• Dual formulation:
    min D(α) = ½ Σᵢ,ⱼ αᵢαⱼ Φ(xᵢ) · Φ(xⱼ) − Σᵢ yᵢαᵢ   s.t.   Σᵢ αᵢ = 0,   0 ≤ yᵢαᵢ ≤ C
• Dual decision function:
    f(x) = Σᵢ αᵢ K(xᵢ, x) + b
• The kernel function K(·, ·) is used to make an (implicit) nonlinear feature map, e.g.
    – Polynomial kernel: K(x, x′) = (x · x′ + 1)^d
    – RBF kernel: K(x, x′) = exp(−γ||x − x′||²)
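Both kernels above are one-line computations on the input vectors; no explicit feature map is ever built. A small sketch (function names are mine, not from the slides):

```python
import numpy as np

def poly_kernel(x, xp, d=2):
    """Polynomial kernel K(x, x') = (x . x' + 1)^d."""
    return (np.dot(x, xp) + 1.0) ** d

def rbf_kernel(x, xp, gamma=0.5):
    """RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2)."""
    diff = np.asarray(x) - np.asarray(xp)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
xp = np.array([3.0, 0.0])
print(poly_kernel(x, xp))  # (1*3 + 2*0 + 1)^2 = 16.0
print(rbf_kernel(x, x))    # zero distance, so the kernel value is 1.0
```

Note the cost is O(n) in the input dimension for both, regardless of how high-dimensional the implicit feature space is.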
• 31. 27 Polynomial-SVMs
The kernel K(x, x′) = (x · x′)^d gives the same result as the explicit mapping + dot product that we described before:
    Φ : R² → R³,  (x₁, x₂) → (z₁, z₂, z₃) := (x₁², √2 x₁x₂, x₂²)
    Φ((x₁, x₂)) · Φ((x′₁, x′₂)) = (x₁², √2 x₁x₂, x₂²) · (x′₁², √2 x′₁x′₂, x′₂²)
                                = x₁²x′₁² + 2x₁x′₁x₂x′₂ + x₂²x′₂²
is the same as:
    K(x, x′) = (x · x′)² = ((x₁, x₂) · (x′₁, x′₂))² = (x₁x′₁ + x₂x′₂)² = x₁²x′₁² + x₂²x′₂² + 2x₁x′₁x₂x′₂
Interestingly, even if d is large the kernel still requires only n multiplications to compute, whereas the explicit representation may not fit in memory!
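The equality is easy to check numerically. A minimal sketch with made-up vectors (helper names are mine):

```python
import math

def phi(x1, x2):
    """Explicit degree-2 feature map: (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, xp = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(*x), phi(*xp))  # dot product in the 3-D feature space
via_kernel = dot(x, xp) ** 2       # K(x, x') = (x . x')^2, computed in 2-D
print(explicit, via_kernel)        # the two values agree
```

Here the explicit route does 3 multiplications in feature space (after building Φ), while the kernel route needs only the 2-D dot product plus a square.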
• 32. 28 RBF-SVMs
The RBF kernel K(x, x′) = exp(−γ||x − x′||²) is one of the most popular kernel functions. It adds a "bump" around each data point:
    f(x) = Σᵢ αᵢ exp(−γ||xᵢ − x||²) + b
[Figure: the map Φ sends points x, x′ to Φ(x), Φ(x′) in feature space.]
Using this one can get state-of-the-art results.
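A minimal sketch of that kernel expansion, with hypothetical support points and coefficients α, b (not a trained model), showing the "bump" behaviour: the decision value is positive near the positive-coefficient point and negative near the other.

```python
import numpy as np

def rbf_decision(x, support_X, alpha, b, gamma=1.0):
    """f(x) = sum_i alpha_i * exp(-gamma * ||x_i - x||^2) + b."""
    sq_dists = ((support_X - x) ** 2).sum(axis=1)
    return alpha @ np.exp(-gamma * sq_dists) + b

# Two "support vectors" with opposite-signed coefficients (made-up values).
support_X = np.array([[0.0, 0.0], [4.0, 4.0]])
alpha = np.array([1.0, -1.0])
b = 0.0

print(rbf_decision(np.array([0.1, 0.1]), support_X, alpha, b))  # near the + bump
print(rbf_decision(np.array([3.9, 3.9]), support_X, alpha, b))  # near the - bump
```

Far from every support point all the exponentials decay toward zero, so f(x) tends to b, which is why the RBF expansion behaves like a sum of local bumps.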
• 33. 29 SVMs: more results
There is much more in the field of SVMs / kernel machines than we could cover here, including:
• Regression, clustering, semi-supervised learning and other domains.
• Lots of other kernels, e.g. string kernels to handle text.
• Lots of research in modifications, e.g. to improve generalization ability, or tailoring to a particular task.
• Lots of research in speeding up training.
Please see textbooks such as the ones by Cristianini & Shawe-Taylor or by Schoelkopf and Smola.
• 34. 30 SVMs: software
Lots of SVM software:
• LibSVM (C++)
• SVMLight (C)
As well as complete machine learning toolboxes that include SVMs:
• Torch (C++)
• Spider (Matlab)
• Weka (Java)
All available through www.kernel-machines.org.