This document provides a summary of a lecture on support vector machines (SVMs). The lecture discusses how SVMs find the optimal separating hyperplane between two classes by maximizing the margin between them. It covers both the separable and non-separable cases, and how SVMs can be extended to non-linear classification using the kernel trick. The lecture concludes by mentioning further issues such as multi-class classification and algorithms for building SVMs.
Lecture 12 - SVM
1. Introduction to Machine Learning
Lecture 12: Support Vector Machines
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lecture 11
1st generation NN: perceptrons and others
Also multi-layer perceptrons
3. Recap of Lecture 11
2nd generation NN
Some people figured out how to adapt the weights of internal layers
Seemed to be very powerful and able to solve almost anything
Reality showed that this was not exactly true
4. Today’s Agenda
Moving to SVM
Linear SVM
The separable case
The non-separable case
Non-Linear SVM
5. Introduction
SVM (Vapnik, 1995)
A clever type of perceptron
Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe
A clever optimization technique is used to select the best subset of features
Many NN researchers switched to SVM in the 1990s because they work better
Here, we’ll take a slow path into SVM concepts
6. Shattering Points with Oriented Hyperplanes
Remember the idea:
I want to build hyperplanes that separate the points of two classes
In a two-dimensional space, these are lines
E.g., a linear classifier: which is the best separating line?
Remember, a hyperplane is represented by the equation w · x + b = 0
7. Linear SVM
I want the line that maximizes the margin between examples of both classes!
The examples lying on the margin are the support vectors.
8. Linear SVM
In more detail:
Let’s assume two classes, yi ∈ {-1, +1}
Each example is described by a set of features x (x is a vector; for clarity, we will mark vectors in bold in the remainder of the slides)
The problem can be formulated as follows. All training examples must satisfy (in the separable case):
w · xi + b ≥ +1 for yi = +1
w · xi + b ≤ -1 for yi = -1
These can be combined into a single constraint: yi(w · xi + b) - 1 ≥ 0 for all i
9. Linear SVM
What are the support vectors?
Let’s find the points that lie on the hyperplane H1: w · x + b = +1
Their perpendicular distance to the origin is |1 - b| / ||w||
Let’s find the points that lie on the hyperplane H2: w · x + b = -1
Their perpendicular distance to the origin is |-1 - b| / ||w||
The margin is 2 / ||w||, as derived below.
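The slide states the margin without the intermediate step; it follows from the standard distance formula for parallel hyperplanes:

```latex
% H1 and H2 are the parallel hyperplanes w.x = 1 - b and w.x = -1 - b.
% Two parallel hyperplanes w.x = c_1 and w.x = c_2 lie |c_1 - c_2| / ||w|| apart, so
\[
\text{margin} \;=\; \frac{\lvert (1-b) - (-1-b) \rvert}{\lVert \mathbf{w} \rVert}
\;=\; \frac{2}{\lVert \mathbf{w} \rVert}.
\]
```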
10. Linear SVM
Therefore, the problem is:
Find the hyperplane that minimizes ½ ||w||²
Subject to yi(w · xi + b) - 1 ≥ 0 for all i
But let us change to the Lagrangian formulation, because:
The constraints will be placed on the Lagrange multipliers themselves (easier to handle)
Training data will appear only in the form of dot products between vectors
11. Linear SVM
The Lagrangian formulation comes to be
g g
Where αi are the Lagrange multipliers
So,
So now we need to
Minimize Lp w.r.t w, b
Simultaneously require that the derivatives of Lp w.r.t to α
vanish
All subject to the constraints αi ≥ 0
Slide 11
Artificial Intelligence Machine Learning
12. Linear SVM
Transformation to the dual problem:
This is a convex problem, so we can equivalently solve the dual problem
That is, maximize
LD = Σi αi - ½ Σi Σj αi αj yi yj (xi · xj)
w.r.t. the αi
Subject to the constraint Σi αi yi = 0
And with αi ≥ 0
13. Linear SVM
This is a quadratic programming problem. You can solve it with many methods, such as gradient descent.
We will not cover these methods in class, but a sketch with an off-the-shelf solver follows.
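The slides prescribe no tool; here is a minimal sketch of the dual QP above using the generic solver from cvxopt (the library choice, tolerance, and function name are my own assumptions):

```python
import numpy as np
from cvxopt import matrix, solvers

def linear_svm_dual(X, y):
    """Hard-margin dual for separable data: X is (n, d), y is (n,) in {-1, +1}."""
    n = X.shape[0]
    K = X @ X.T                                   # Gram matrix of dot products xi . xj
    P = matrix((np.outer(y, y) * K).astype(float))  # quadratic term yi yj (xi . xj)
    q = matrix(-np.ones(n))                       # maximize sum(alpha) -> minimize -sum(alpha)
    G = matrix(-np.eye(n))                        # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(float))    # equality constraint sum_i alpha_i yi = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    sv = alpha > 1e-6                             # support vectors carry nonzero multipliers
    w = ((alpha * y)[:, None] * X).sum(axis=0)    # w = sum_i alpha_i yi xi
    bias = np.mean(y[sv] - X[sv] @ w)             # from yi (w . xi + b) = 1 on any support vector
    return alpha, w, bias
```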
14. The Non-Separable Case
What if I cannot separate the two classes?
We will not be able to solve the Lagrangian formulation proposed above
Any idea?
15. The Non-Separable Case
Just relax the constraints by p
y permitting some errors
g
Slide 15
Artificial Intelligence Machine Learning
16. The Non-Separable Case
That means that the Lagrangian is rewritten
g g
We change the objective
function to be minimized to
uco o ed o
Therefore, we are maximizing the margin and minimizing the error
C i a constant to be chosen b th user
is t tt b h by the
The dual problem becomes
Subject to and
Slide 16
Artificial Intelligence Machine Learning
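A quick illustration of the role of C, using scikit-learn's SVC on toy data (the library and the data are my own choice, not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clouds: not linearly separable.
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C tolerates more margin violations (more support vectors);
    # large C penalizes errors heavily, shrinking the margin.
    print(f"C={C:>6}: {clf.support_vectors_.shape[0]} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```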
17. Non-Linear SVM
What happens if the decision function is not a linear function of the data?
In our equations, data appears only in the form of dot products xi · xj
Wouldn’t you like to have polynomial, logarithmic, … functions to fit the data?
18. Non-Linear SVM
The kernel trick:
Map the data into a higher-dimensional space
Mercer’s theorem: any continuous, symmetric, positive semi-definite kernel function K(x, y) can be expressed as a dot product in a high-dimensional space
Now, we have a kernel function K(xi, xj) = Φ(xi) · Φ(xj), where Φ is the mapping
An example is checked numerically after this slide
All we have talked about still holds when using the kernel function
The only difference is that now my decision function will be f(x) = Σi αi yi K(xi, x) + b
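A small numeric check of the kernel trick (my own illustration): for the degree-2 polynomial kernel in 2-D, K(x, y) = (x · y + 1)² equals the dot product of the explicit feature maps, without ever constructing them:

```python
import numpy as np

def phi(x):
    """Explicit feature map for K(x, y) = (x . y + 1)^2 with x in R^2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1**2, s * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k_trick = (x @ y + 1.0) ** 2     # computed in the original 2-D space
k_explicit = phi(x) @ phi(y)     # computed in the 6-D feature space
print(k_trick, k_explicit)       # both print 4.0 -> (1*3 + 2*(-1) + 1)^2 = 4
```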
19. Non-Linear SVM
Some typical kernels:
Polynomial: K(x, y) = (x · y + 1)^p
Gaussian radial basis function: K(x, y) = exp(-||x - y||² / (2σ²))
Sigmoid: K(x, y) = tanh(κ (x · y) - δ)
[Figure: a visual example of a polynomial kernel with p = 3]
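The same three kernels written out in numpy (my own transcription of the formulas above):

```python
import numpy as np

def poly_kernel(x, y, p=3):
    return (x @ y + 1.0) ** p

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma**2))

def sigmoid_kernel(x, y, kappa=1.0, delta=1.0):
    # Note: the sigmoid kernel satisfies Mercer's condition only for
    # some (kappa, delta) values.
    return np.tanh(kappa * (x @ y) - delta)
```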
20. Some Further Issues
We have to classify data:
Described by nominal attributes and continuous attributes
Probably with missing values
That may have more than two classes
How does SVM deal with them?
SVM is defined over continuous attributes: no problem!
Nominal attributes: map them into a continuous space
Multiple classes: build SVMs that discriminate each pair of classes (see the sketch below)
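The pairwise strategy from this slide, sketched with scikit-learn (my choice of library; the slides name no tool). SVC already trains one binary SVM per pair of classes internally:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes -> 3*(3-1)/2 = 3 pairwise SVMs
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(clf.decision_function(X[:1]).shape)   # (1, 3): one score per class pair
```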
21. Some Further Issues
I’ve seen lots of formulas… but I want to program an SVM builder. How do I get my SVM?
We have already mentioned that there are many methods to solve the quadratic programming problem
Many algorithms have been designed specifically for SVM
One of the most significant: Sequential Minimal Optimization (SMO)
Currently, there are many new algorithms
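The slides name SMO but give no pseudocode; below is a sketch of the simplified SMO variant, in which the second multiplier is chosen at random rather than by Platt's heuristics. All identifiers are my own; this is an illustration, not the full algorithm:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Train a linear soft-margin SVM; returns (alpha, b)."""
    n = X.shape[0]
    K = X @ X.T                       # linear kernel; swap in any Mercer kernel
    alpha, b = np.zeros(n), 0.0
    rng = np.random.default_rng(0)
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]     # prediction error on x_i
            # Only optimize pairs where alpha_i violates the KKT conditions.
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = rng.integers(n - 1)
                j += j >= i                           # random j != i
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds keeping 0 <= alpha <= C and sum(alpha * y) unchanged.
                if y[i] != y[j]:
                    L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]  # curvature along the pair
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(aj_old - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj_old) < 1e-5:
                    continue
                alpha[i] += y[i] * y[j] * (aj_old - alpha[j])
                # Recompute b so the KKT conditions hold for the updated pair.
                b1 = b - Ei - y[i] * (alpha[i] - ai_old) * K[i, i] \
                     - y[j] * (alpha[j] - aj_old) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai_old) * K[i, j] \
                     - y[j] * (alpha[j] - aj_old) * K[j, j]
                b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```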
22. Next Class
Association Rules