CIS 621: Machine Learning PIEAS Biomedical Informatics Research Lab
Support Vector Machines
Dr. Fayyaz ul Amir Afsar Minhas
PIEAS Biomedical Informatics Research Lab
Department of Computer and Information Sciences
Pakistan Institute of Engineering & Applied Sciences
PO Nilore, Islamabad, Pakistan
http://faculty.pieas.edu.pk/fayyaz/
Classification
• Before moving on with the discussion, let us restrict ourselves to the following problem
– T = given training set = {(x(i), y_i), i = 1…N}
• x(i) ∈ Rᵐ (data point i)
• y_i: class label of data point i (+1 or −1)
[Figure: 2D scatter of +1 and −1 examples in the (x1, x2) plane]
Use of Linear Discriminant in Classification
• Classifiers such as the Single Layer Perceptron (with
linear activation function) and SVM use a linear
discriminant function to differentiate between
patterns of different classes
• The linear discriminant function is given by
f(x) = sgn(wᵀx + b)
[Figure: the boundary f(x) = wᵀx + b = 0 splits the (x1, x2) plane into a region with f(x) > 0 and a region with f(x) < 0; w and b are 'learned' from the training data using some error criterion]
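A minimal NumPy sketch of this discriminant (illustrative only: the w and b below are hypothetical values, not learned from data):

```python
import numpy as np

def linear_discriminant(X, w, b):
    """Evaluate f(x) = w^T x + b for each row of X."""
    return X @ w + b

def classify(X, w, b):
    """Predict +1/-1 from the sign of the discriminant."""
    return np.sign(linear_discriminant(X, w, b))

# Hypothetical parameters (in practice w and b are learned from training data).
w = np.array([2.0, 2.0])
b = -1.0
X = np.array([[0.0, 0.0], [1.0, 1.0]])
print(classify(X, w, b))   # [-1.  1.]
```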
Use of Linear Discriminant in Classification
• There are a large number of lines (or in
general ‘hyperplanes’) separating the two
classes
[Figure: several lines f(x) = wᵀx + b = 0, all separating the two classes in the (x1, x2) plane]
Which separator is the best?
Use of Linear Discriminant in Classification
[Figure: a boundary placed too close to the +1 class; a nearby point is misclassified as −1 by the 'red' discriminant]
The boundary which lies at the
maximum distance from data
points of both classes gives better
tolerance to noise and better
generalization
Margin of a linear classifier
• The width by which the boundary of a linear
classifier can be increased before hitting a
data point is called the margin of the linear
classifier
[Figure: two linear classifiers on the same data, one with a small margin and one with a large margin]
Linear classifiers with larger margins are better
Support Vector Machines (SVM)
• Support Vector Machines are linear classifiers
that produce the optimal separating boundary
(hyper-plane)
– Find w and b so as to maximize the margin while classifying all the training patterns correctly (for a linearly separable problem)
Finding Margin of a Linear Classifier
• Consider a linear classifier with the
boundary
• We know that the vector w is
perpendicular to the boundary
– Consider two points x(1) and x(2) on
the boundary
f(x) = wᵀx + b = 0 for all x on the boundary
f(x(1)) = wᵀx(1) + b = 0 … (1)
f(x(2)) = wᵀx(2) + b = 0 … (2)
Subtracting (1) from (2): wᵀ(x(2) − x(1)) = 0, so w ⊥ (x(2) − x(1))
[Figure: the boundary f(x) = wᵀx + b = 0 with two points x(1), x(2) on it; w is normal to the boundary]
Finding Margin of a Linear Classifier
• Let x(s) be a point in the feature space
with its projection x(p) on the
boundary
• We know that
f(x(p)) = wᵀx(p) + b = 0
• Writing x(s) = x(p) + r ŵ, where ŵ = w/‖w‖ is the unit normal and r is the perpendicular distance of x(s) from the boundary:
f(x(s)) = wᵀx(s) + b = wᵀx(p) + b + r (wᵀw)/‖w‖ = 0 + r ‖w‖
⟹ r = f(x(s)) / ‖w‖ (the perpendicular distance of a point from a line)
[Figure: a point x(s), its projection x(p) onto the boundary f(x) = wᵀx + b = 0, and the distance r between them]
Example
• Consider the line
– x1+2x2+3 = 0
• The distance of
(4,2) is
– r = 4.92
[Figure: the line x1 + 2x2 + 3 = 0 and the point (4, 2) in the (x1, x2) plane]
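A quick NumPy check of this example; the line x1 + 2x2 + 3 = 0 corresponds to w = (1, 2)ᵀ and b = 3:

```python
import numpy as np

w = np.array([1.0, 2.0])   # from x1 + 2*x2 + 3 = 0
b = 3.0
x = np.array([4.0, 2.0])

# Perpendicular distance r = f(x) / ||w||
r = (w @ x + b) / np.linalg.norm(w)
print(round(r, 2))   # 4.92
```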
Finding Margin of a Linear Classifier
• The points in the training set that
lie closest (having minimum
perpendicular distance) to the
separating hyper-plane are called
support vectors
• The margin is twice the perpendicular distance of the hyper-plane from the nearest support vector of the two classes
[Figure: separating boundary with the nearest support vector x* marked]
γ = 2r* = 2 f(x*) / ‖w‖
Geometric vs. Functional Margin
• Functional Margin
– The functional margin y_i (wᵀx(i) + b) gives the position of the point with respect to the plane: its sign tells us whether the point is classified correctly, but its value changes if w and b are rescaled.
• Geometric Margin
– The geometric margin y_i (wᵀx(i) + b)/‖w‖ gives the distance between the given training example and the given plane, independent of the magnitude of w.
Finding Margin of a Linear Classifier
• Assume that all data points are at least distance 1 from the hyperplane; then the following two constraints follow for a training set {(x(i), y_i)}
wᵀx(i) + b ≥ +1 if y_i = +1
wᵀx(i) + b ≤ −1 if y_i = −1
Since the constraints place the nearest support vector at f(x*) = 1, the margin becomes
γ = 2r* = 2 f(x*) / ‖w‖ = 2 / ‖w‖
Support Vector Machines
• Support Vector Machines, in their basic form, are linear
classifiers that are trained in a way so as to maximize the
margin
• Principles of Operation
– Define what an optimal hyper-plane is (in a way that can be identified in a computationally efficient manner)
• Maximize margin
– Allows noise tolerance
– Extend the above definition for non-linearly separable
problems
• have a penalty term for misclassifications
– Map data to an alternate space where it is easier to
classify with linear decision surfaces
• reformulate problem so that data is mapped implicitly to this space
(using kernels)
Margin Maximization in SVM
• We know that if we require
wᵀx(i) + b ≥ +1 if y_i = +1
wᵀx(i) + b ≤ −1 if y_i = −1
• Then the margin is γ = 2 / ‖w‖
• Margin maximization can therefore be performed by reducing the norm of the w vector
SVM as an Optimization problem
• We can present SVM as the following
optimization problem
max_{w,b} γ = 2 / ‖w‖
s.t. wᵀx(i) + b ≤ −1 for all i with y_i = −1
     wᵀx(i) + b ≥ +1 for all i with y_i = +1
OR, equivalently,
min_{w,b} (1/2) wᵀw
s.t. y_i (wᵀx(i) + b) ≥ 1 for all i
Example: Solution of the OR problem
Training data: x(1) = (0, 0), x(2) = (0, 1), x(3) = (1, 0), x(4) = (1, 1); y = (−1, +1, +1, +1)
Solution: α = (4, 2, 2, 0), w* = Σ_i α_i y_i x(i) = (2, 2)ᵀ, b* = −1
[Figure: the four points and the optimal separating boundary 2x1 + 2x2 − 1 = 0; (0, 0), (0, 1) and (1, 0) are the 3 support vectors]
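This solution can be reproduced with a linear SVM trained with a very large C to approximate the hard margin. A sketch using scikit-learn (an assumed tool choice; the slides do not specify a solver):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, 1, 1, 1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.coef_, clf.intercept_)   # approx. [[2. 2.]] [-1.]
print(clf.support_vectors_)        # (0,0), (0,1), (1,0)
```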
Handling Non-Separable Patterns
• All the derivation till now was based on the
requirement that the data be linearly
separable
– Practical Problems are Non-Separable
• Non-separable data can be handled by relaxing the constraint
y_i (wᵀx(i) + b) ≥ 1
• This is achieved by introducing slack variables ξ_i ≥ 0:
wᵀx(i) + b ≥ +1 − ξ_i for y_i = +1
wᵀx(i) + b ≤ −1 + ξ_i for y_i = −1
i.e., y_i (wᵀx(i) + b) ≥ 1 − ξ_i with ξ_i ≥ 0 (the ξ_i are the slack variables)
Soft Margin SVM as an Optimization Problem
• The overall optimization problem can be
written as
min_{w,b,ξ} (1/2) wᵀw + C Σ_{i=1}^{N} ξ_i
s.t. y_i (wᵀx(i) + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i
Or, equivalently,
min_{w,b,ξ} (1/2) wᵀw + C Σ_{i=1}^{N} ξ_i
s.t. 1 − ξ_i − y_i (wᵀx(i) + b) ≤ 0 and −ξ_i ≤ 0 for all i
C is the weight of the penalty due to margin violation. The objective function now optimizes two conflicting objectives: maximization of the margin and minimization of the margin violations.
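At the optimum the slacks satisfy ξ_i = max(0, 1 − y_i(wᵀx(i) + b)) (the hinge loss), so the objective can be evaluated directly for any candidate (w, b). A small sketch; the data below is the five-point example used on the following slides:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * sum of slacks, slack_i = max(0, 1 - y_i*(w.x_i + b))."""
    margins = y * (X @ w + b)
    slacks = np.maximum(0.0, 1.0 - margins)
    return 0.5 * (w @ w) + C * slacks.sum()

# The OR points plus one 'noisy' negative example inside the positive region.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.8, 0.5]], dtype=float)
y = np.array([-1, 1, 1, 1, -1], dtype=float)
w, b = np.array([2.0, 2.0]), -1.0
print(soft_margin_objective(w, b, X, y, C=1.0))   # only the noisy point has slack
```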
Handling Non-Separable Patterns (Soft Margin)
[Figure: soft-margin boundary f(x) = wᵀx + b = 0 with margin hyper-planes f(x) = +1 and f(x) = −1; points with 0 < ξ_i ≤ 1 lie inside the margin but are still correctly classified, while points with ξ_i > 1 are misclassified]
ξ_i tells us about the extent of margin violation of example i
Example
Training data: x(1) = (0, 0), x(2) = (0, 1), x(3) = (1, 0), x(4) = (1, 1), x(5) = (0.8, 0.5); y = (−1, +1, +1, +1, −1); C = 100
Solution: α = (34, 52, 82, 0, 100), w* = Σ_i α_i y_i x(i) = (2, 2)ᵀ, b* = −1.65
[Figure: the five points and the resulting boundary for C = 100; the point (0.8, 0.5) labeled −1 violates the margin]
Example…
[Figure: decision boundaries for the same data with C = 1 and with C = 100]
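A sketch of this comparison with scikit-learn (an assumed tool choice; the slides' plots came from another tool), using the five-point data from the previous example:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.8, 0.5]])
y = np.array([-1, 1, 1, 1, -1])

for C in (1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2.0 / np.linalg.norm(w)   # gamma = 2 / ||w||
    print(f"C={C:>3}: w={w}, b={b:.2f}, margin={margin:.2f}")
```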
Wanna Play?
• Use the Java Applet at:
• https://www.csie.ntu.edu.tw/~cjlin/libsvm/
• Set “-t 0 -c 100”
SVMs until now
• Vapnik and Chervonenkis:
– Hard SVM (1962)
– Theoretical foundations for SVMs
– Structural Risk Minimization
• Corinna Cortes
– Soft SVM (1995)
• Enter: Bernhard Schölkopf (1997)
– Representer Theorem
– Complete Kernel trick!
– Kernels not only allow nonlinear
boundaries but also allow
representation of non-vectoral data
[Photos: R. A. Fisher 1890-1962, Rosenblatt 1928-1971, V. Vapnik 1936-, Chervonenkis 1938-2014, C. Cortes 1961-, Schölkopf 1968-]
http://www.svms.org/history.html
Reading
• Sections 10.1-10.3
• Sections 13.1-13.3
• Alpaydin, Ethem. Introduction to Machine
Learning. Cambridge, Mass. MIT Press, 2010.
Nonlinear Separation through Transformation
• Given a classification problem with a
nonlinear boundary, we can, at times, find a
mapping or transformation of the feature
space which makes the classification problem
linearly separable in the transformed space
[Figure: a mapping φ(·) takes points from the input space to a feature space in which the two classes become linearly separable]
Examples: Transformation
• Let's define the mapping
– φ(x1, x2) = (x1, x2, x1x2)ᵀ
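A quick check that this mapping makes XOR linearly separable. Here XOR is encoded with ±1 inputs (an assumption about the figure's convention); with that encoding the third coordinate x1·x2 alone separates the two classes:

```python
import numpy as np

def phi(x):
    """Map (x1, x2) -> (x1, x2, x1*x2)."""
    return np.array([x[0], x[1], x[0] * x[1]])

X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
y = np.array([1, -1, -1, 1])          # XOR-style labels

Z = np.array([phi(x) for x in X])
# In the mapped space the plane z3 = 0 (w = (0,0,1), b = 0) separates the classes.
print(np.sign(Z @ np.array([0.0, 0.0, 1.0])))   # [ 1. -1. -1.  1.] == y
```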
XOR: Linear Separability
Examples: Transformation
– Does this mapping do it?
• φ(x1, x2) = (x1, x2, 1)ᵀ
– What about this one?
• φ(x1, x2) = (x1 + x2 − 1)²
Transformation Examples
• Can you find a transform that makes the following classification problems linearly separable? Can you draw the data points in the new transformed feature space?
[Figure: three example datasets (I), (II), (III) with non-linear class boundaries]
So How can we make a non-linear SVM?
• Transform the data and then apply the SVM!
• But defining transformations is hard
• A mathematical trick allows us to do this
easily!
– Kernel Trick!
Kernelized SVM
• We know that the discriminant function of the
SVM can be written as:
– f(x) = wᵀx + b
• The Representer theorem (Schölkopf 2001) allows us to represent the SVM weight vector as a linear combination of input vectors, with each example's contribution weighted by a factor α_i
– w = Σ_{i=1}^{N} α_i x_i
Kernelized SVM
• Thus, we can write
– f(x) = wᵀx + b = b + Σ_{j=1}^{N} α_j x_jᵀ x
• Notice that there is a dot product of the test example with every given input example. Let's denote the dot product as k(u, v) = uᵀv
• This allows us to write: f(x) = b + wᵀx = b + Σ_{j=1}^{N} α_j k(x_j, x)
• Now, wᵀw = (Σ_{i=1}^{N} α_i x_i)ᵀ(Σ_{j=1}^{N} α_j x_j) = Σ_{i,j=1}^{N} α_i α_j k(x_i, x_j)
• Thus, the SVM can be rewritten as
min_{α,b} (1/2) Σ_{i,j=1}^{N} α_i α_j k(x_i, x_j) + C Σ_{i=1}^{N} ξ_i
such that for all i = 1…N and ξ_i ≥ 0:
y_i (b + Σ_{j=1}^{N} α_j k(x_j, x_i)) ≥ 1 − ξ_i
Kernelized SVM
• In this formulation
– The weight vector is not present
– We know k(x_i, x_j) for any two given training examples
– All the dot products have been replaced with k(x_j, x_i)
– Solving the optimization yields the α values
• So if we can solve this problem:
min_{α,b} (1/2) Σ_{i,j=1}^{N} α_i α_j k(x_i, x_j) + C Σ_{i=1}^{N} ξ_i
such that for all i = 1…N and ξ_i ≥ 0:
y_i (Σ_{j=1}^{N} α_j k(x_j, x_i) + b) ≥ 1 − ξ_i
• Then we can test new patterns using:
f(x) = b + Σ_{j=1}^{N} α_j k(x_j, x)
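A minimal sketch of the resulting decision function, assuming the α_j and b have already been found by an optimizer. The values below are hypothetical, chosen to match the earlier OR example; note that, as in the slide's representer form, α_j here absorbs the label y_j:

```python
import numpy as np

def decision_function(x, X_train, alpha, b, kernel):
    """f(x) = b + sum_j alpha_j * k(x_j, x)."""
    return b + sum(a * kernel(xj, x) for a, xj in zip(alpha, X_train))

def linear_kernel(u, v):
    return float(np.dot(u, v))

# Hypothetical values for illustration (alpha_j = alpha'_j * y_j from the OR solution).
X_train = [np.array([0.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 0.0])]
alpha = [-4.0, 2.0, 2.0]
b = -1.0
print(decision_function(np.array([1.0, 1.0]), X_train, alpha, b, linear_kernel))  # 3.0
```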
Kernelized SVM
• Now the best thing about this formulation is that we can
replace the regular Euclidean dot product with a dot product
in a transformed space which will allow non-linear
classification!
• It doesn't really matter what k(u, v) is!
• Thus, instead of k(u, v) = uᵀv
• We can have k(u, v) = φ(u)ᵀ φ(v)
• As a matter of fact, we can define any similarity measure as k that satisfies certain properties called "Mercer's conditions"
• k(u, v) is called a kernel function, and the resulting SVM is called the kernelized SVM
– It can solve classification problems that are not linearly separable
Kernel Functions ↔ Feature Transformation
• For the XOR problem we defined the transformation
– φ(x1, x2) = (x1, x2, x1x2)ᵀ
• We can thus define a kernel
– k(x(i), x(j)) = φ(x(i))ᵀ φ(x(j)) = x1(i) x1(j) + x2(i) x2(j) + x1(i) x2(i) x1(j) x2(j)
• Thus, the transformation produces a kernel
• Any valid kernel function implicitly transforms the examples into "some" feature space
– The SVM can now find an optimal separating boundary in that space
– That boundary can be non-linear in the original space
Some kernel functions
• We can use arbitrary kernels, for example …
– The dot-product kernel
• K(u, v) = uᵀv
– Feature transform: φ(u) = u
– The homogeneous polynomial kernels
• K(u, v) = (uᵀv)ᵖ, where p is called the degree
– For p = 2 and 2D data u = (u1, u2)ᵀ: φ(u) = (u1², u2², √2 u1u2)ᵀ
– The Radial Basis Function (RBF) kernel
• K(u, v) = exp(−‖u − v‖² / 2σ²), where σ controls the spread of the Gaussian
– Feature transform?
• The RBF and homogeneous polynomial kernels implement non-linear boundaries
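Sketches of these kernels in NumPy, with a numeric check that the degree-2 homogeneous polynomial kernel equals an ordinary dot product in the feature space (u1², u2², √2·u1u2):

```python
import numpy as np

def linear_kernel(u, v):
    return u @ v

def poly_kernel(u, v, p=2):
    return (u @ v) ** p

def rbf_kernel(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def phi_poly2(u):
    """Explicit feature map for the homogeneous polynomial kernel, p=2, 2D input."""
    return np.array([u[0] ** 2, u[1] ** 2, np.sqrt(2) * u[0] * u[1]])

u, v = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly_kernel(u, v))            # 16.0
print(phi_poly2(u) @ phi_poly2(v))  # 16.0 (same value, computed explicitly)
```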
Example
• Suppose we have 5 1D data points
– x1=1, x2=2, x3=4, x4=5, x5=6, with 1, 2, 6 as class 1 and 4, 5 as class 2 ⟹ y1=1, y2=1, y3=−1, y4=−1, y5=1
• We use the polynomial kernel of degree 2
– K(u, v) = (uv + 1)²
– C is set to 100
• We first find α_i (i = 1, …, 5)
Example
• By using a QP solver, we get
– 1=0, 2=2.5, 3=0, 4=7.333, 5=4.833
– Note that the constraints are indeed satisfied
– The support vectors are {x2=2, x4=5, x5=6}
– The bias turns out to be b = 9
• The discriminant function is
f(x) = Σ_i α_i y_i (x_i x + 1)² + b = 0.6667 x² − 5.333 x + 9
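A numeric check of this example (a sketch, not the original course code); the quadratic above follows from expanding the sum:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 6.0])
y = np.array([1, 1, -1, -1, 1])
alpha = np.array([0.0, 2.5, 0.0, 7.333, 4.833])
b = 9.0

def f(z):
    """f(z) = sum_i alpha_i * y_i * (x_i*z + 1)^2 + b."""
    return np.sum(alpha * y * (x * z + 1) ** 2) + b

for z in (1, 2, 4, 5, 6):
    print(z, round(f(z), 2))   # positive for 1, 2, 6 (class 1); negative for 4, 5 (class 2)
```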
Example
[Figure: value of the discriminant function along the 1D input axis; points 1, 2 and 6 lie where f(x) > 0 (class 1), points 4 and 5 where f(x) < 0 (class 2)]
Solving the XOR
• C=1
• With polynomial kernel of degree 2
[Figure: XOR decision surface with the polynomial kernel of degree 2]
Solving the XOR
• C=1
• With RBF Kernel (sigma = 0.5)
[Figure: XOR decision surface with the RBF kernel, σ = 0.5]
Solving the XOR
• C=1
• With RBF Kernel (sigma = 0.3)
[Figure: XOR decision surface with the RBF kernel, σ = 0.3]
3D Plot
[Figure: 3D surface plot of the discriminant function values over the input plane]
Using the SVM
• Read:
• Ben-Hur, Asa, and Jason Weston. 2010. “A User’s Guide
to Support Vector Machines.” In Data Mining
Techniques for the Life Sciences, edited by Oliviero
Carugo and Frank Eisenhaber, 223–39. Methods in
Molecular Biology 609. Humana Press.
http://dx.doi.org/10.1007/978-1-60327-241-4_13
• http://pyml.sourceforge.net/doc/howto.pdf
Steps for Feature based Classification
• Prepare the pattern matrix
• Select the kernel function to use
• Select the parameter of the kernel function and
the value of C
– You can use the values suggested by the SVM
software, or you can set apart a validation set to
determine the values of the parameter
• Execute the training algorithm and obtain the α_i
• Unseen data can be classified using the α_i and the support vectors
Choosing the Kernel Function
• Probably the trickiest part of using an SVM.
• The kernel function is important because it creates
the kernel matrix, which summarizes all the data
• In practice, a low degree polynomial kernel or RBF
kernel with a reasonable width is a good initial try
Choosing C
• Cross-validation
– To assess how good a classifier is we can use
cross-validation
• Divide the data randomly into k parts
– If the data is imbalanced, use stratified sampling
• Use k-1 parts for training
• And the held-out part for testing to evaluate accuracy
or ROC curve or other performance metrics
• To choose C, you can do nested cross-validation
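A sketch of nested cross-validation with scikit-learn (an assumed tool choice): GridSearchCV performs the inner loop that picks C, while the outer cross_val_score estimates the performance of the whole procedure:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Inner CV chooses C; outer CV evaluates the full model-selection procedure.
search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100]}, cv=5)
outer_scores = cross_val_score(search, X, y, cv=5)   # nested cross-validation
print(outer_scores.mean())
```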
Handling data imbalance
• If the data is imbalanced (too much of one
class and only a small number of examples
from the other)
– You can set an individual C for each example
– Can also be used to reflect a priori knowledge
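In scikit-learn (an assumed tool choice), the per-example penalty idea maps onto class_weight (one weight per class) or sample_weight (one weight per example); a sketch:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Imbalanced toy data: 90% of one class, 10% of the other.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

# Effective penalty C_i = C * weight; the rarer class gets a larger penalty.
clf = SVC(kernel="rbf", C=1.0, class_weight="balanced").fit(X, y)

# Equivalently, per-example weights can also encode a priori knowledge.
w = np.where(y == 1, 9.0, 1.0)
clf2 = SVC(kernel="rbf", C=1.0).fit(X, y, sample_weight=w)
```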
Strengths and Weaknesses of SVM
• Strengths
– Only a few training points determine the final boundary
• Support Vectors
– Margin maximization and the kernel trick
– Training is relatively easy
• No local optima, unlike in neural networks
– It scales relatively well to high dimensional data
– Tradeoff between classifier complexity and error can be controlled
explicitly (through C)
– Non-traditional data like strings and trees can be used as input to
SVM, instead of feature vectors
• Weaknesses
– Need to choose a “good” kernel function.
Advantages of kernels
• Once we replace the dot product with a kernel
function (i.e., perform the kernel trick or ‘kernelize’
the formulation), the SVM formulation no longer
requires any features!
• As long as you have a kernel function, everything
works
– Remember a kernel function is simply a mapping
from two examples to a scalar
• Tells us how similar the two examples are to each other
But how is that an advantage?
• When the number of dimensions is very large, an implicit
representation through a kernel is helpful
• Let’s say we have a document classification problem
– We define an M-dimensional feature vector for each
document that indicates if it has any of the pre-specified
number of words in it (1) or not (0)
– M can be very large (equal to the size of the dictionary)
– The dot product of two feature vectors is equal to the
number of common words between the two documents
– Why not simply count the number of words?
• We can now do that with the kernel trick
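A sketch of that shortcut: the kernel counts shared words directly, without ever building the M-dimensional indicator vectors:

```python
def word_overlap_kernel(doc1, doc2):
    """k(d1, d2) = number of distinct words the two documents share.
    Equals the dot product of their binary bag-of-words vectors."""
    return len(set(doc1.lower().split()) & set(doc2.lower().split()))

d1 = "support vector machines maximize the margin"
d2 = "kernels let support vector machines go nonlinear"
print(word_overlap_kernel(d1, d2))   # 3 ('support', 'vector', 'machines')
```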
Sufficient Conditions for a kernel function
• Definition of an inner product
– Symmetry: ⟨x, y⟩ = ⟨y, x⟩
– Linearity: ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩
• Principle of superposition
– Positive definiteness
• ⟨x, x⟩ ≥ 0
• ⟨x, x⟩ = 0 iff x = 0
• These conditions need to be satisfied by the kernel function too
– The kernel matrix must be symmetric, positive semi-definite
– This requirement is called Mercer's condition
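A practical sketch of checking this condition on a finite sample: build the kernel matrix and verify that it is symmetric with non-negative eigenvalues:

```python
import numpy as np

def rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(10, 2))
K = np.array([[rbf(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)  # eigenvalues >= 0 (up to round-off)
```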
Kernel Construction
SVM in Scikit Learn
• http://scikit-learn.org/stable/modules/svm.html
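A minimal usage sketch with that API (the XOR-style labels are just an example):

```python
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, 1, 1, -1]                     # XOR-style labels

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.predict([[0.9, 0.9]]))       # expected: [-1], like the nearby (1, 1) point
```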
End of Lecture
We want to make a machine that will be
proud of us.
- Danny Hillis