Support Vector Machine
By: Amr Koura
Agenda
● Definition.
● Kernel Functions.
● Optimization Problem.
● Soft Margin Hyperplanes.
● V-SVC.
● SMO algorithm.
● Demo.
Definition
Definition
● A supervised learning model with associated learning algorithms that analyze data and recognize patterns.
● Applications:
- Machine learning.
- Pattern recognition.
- Classification and regression analysis.
Binary Classifier
● Given a set of points $P = \{(X_i, Y_i)\}$ such that $X_i \in \mathbb{R}^d$ and $Y_i \in \{-1, 1\}$, build a model that assigns a new example to a class in $\{-1, 1\}$.
Question
● What if the examples are not linearly
separable?
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html
Kernel Function
Kernel Function
● SVMs can efficiently perform non-linear classification using the kernel trick.
● The kernel trick implicitly maps the input into a high-dimensional space where the examples become linearly separable (a small sketch follows below).
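A minimal sketch of the idea (not from the slides): for a degree-2 polynomial kernel on 2-D inputs, the implicit feature map can be written out explicitly, and the kernel value equals an ordinary dot product in the mapped space. The data values here are arbitrary.

import numpy as np

def phi(x):
    # Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 0.5])

lhs = np.dot(phi(x), phi(y))   # dot product in the mapped (3-D) space
rhs = np.dot(x, y) ** 2        # kernel value k(x, y) = <x, y>^2 in the original space
print(lhs, rhs)                # both print 16.0: the kernel never computes phi explicitly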
Kernel Function
https://en.wikipedia.org/wiki/Support_vector_machine
Kernel Function
● Linear Kernel.
● Polynomial Kernel.
● Gaussian RBF Kernel.
● Sigmoid Kernel.
Linear Kernel Function
● $k(X, Y) = \langle X, Y \rangle$: the dot product between X and Y.
Polynomial Kernel Function
$k(X, Y) = (\gamma \langle X, Y \rangle + c)^d$
where $\gamma$ is a scaling factor, d is the degree of the polynomial, and c is a free parameter that trades off the influence of higher-order versus lower-order terms in the polynomial.
Gaussian RBF Kernel
$k(X, Y) = \exp\left(-\frac{\lVert X - Y \rVert^2}{2 \sigma^2}\right)$
where $\lVert X - Y \rVert^2$ denotes the squared Euclidean distance.
Other form: $k(X, Y) = \exp\left(-\gamma \lVert X - Y \rVert^2\right)$ with $\gamma = \frac{1}{2 \sigma^2}$.
Sigmoid Kernel Function
$k(X, Y) = \tanh(\gamma \langle X, Y \rangle + r)$
where $\gamma$ is a scaling factor and r is a shifting parameter.
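As a quick reference, the four kernels above can be written in a few lines of NumPy. This is only a sketch; the default parameter values (gamma, c, d, sigma, r) are arbitrary choices, not values from the slides.

import numpy as np

def linear_kernel(x, y):
    return np.dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, c=1.0, d=3):
    return (gamma * np.dot(x, y) + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    return np.tanh(gamma * np.dot(x, y) + r)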
Optimization Problem
Optimization Problem
● Need to find hyperplane with maximum margin.
https://en.wikipedia.org/wiki/Support_vector_machine
Optimization Problem
● Distance between the two hyperplanes: $\frac{2}{\lVert W \rVert}$.
● Goal:
1- minimize $\lVert W \rVert$.
2- prevent points from falling into the margin.
● Constraints:
$W \cdot X_i - b \geq 1$ for $Y_i = 1$ and $W \cdot X_i - b \leq -1$ for $Y_i = -1$.
● Together:
$\min_{W, b} \lVert W \rVert$, s.t. $y_i (W \cdot X_i - b) \geq 1$ for $1 \leq i \leq n$.
Optimization Problem
● Mathematically convenient form:
$\arg\min_{W, b} \frac{1}{2} \lVert W \rVert^2$, s.t. $y_i (W \cdot X_i - b) \geq 1$.
● Introducing Lagrange multipliers $\alpha_i$, the problem becomes a quadratic optimization problem:
$\arg\min_{W, b} \max_{\alpha \geq 0} \left\{ \frac{1}{2} \lVert W \rVert^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i (W \cdot X_i - b) - 1 \right] \right\}$
Optimization Problem
● The solution can be expressed as a linear combination of the $X_i$:
$W = \sum_{i=1}^{n} \alpha_i Y_i X_i$
● $\alpha_i \neq 0$ only for the points that are support vectors.
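As a one-step justification (not spelled out on the slides): differentiating the Lagrangian from the previous slide and setting the derivatives to zero gives

$\frac{\partial L}{\partial W} = W - \sum_{i=1}^{n} \alpha_i y_i X_i = 0 \;\Rightarrow\; W = \sum_{i=1}^{n} \alpha_i y_i X_i, \qquad \frac{\partial L}{\partial b} = \sum_{i=1}^{n} \alpha_i y_i = 0.$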
Optimization problem
● The QP is solved iff:
1) the KKT conditions are fulfilled for every example;
2) $Q_{ij} = y_i \, y_j \, k(X_i, X_j)$ is positive semi-definite.
● The KKT conditions are:
$\alpha_i = 0 \Rightarrow y_i f(x_i) \geq 1$
$0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1$
$\alpha_i = C \Rightarrow y_i f(x_i) \leq 1$
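A minimal NumPy sketch for checking these conditions numerically on a trained model; the function name and the tolerance are hypothetical, and f is assumed to hold the decision values $f(x_i)$.

import numpy as np

def kkt_satisfied(alpha, y, f, C, tol=1e-3):
    # Check the three KKT cases above for every training example.
    m = y * f                                                              # margins y_i * f(x_i)
    ok_zero  = (alpha <= tol) & (m >= 1 - tol)                             # alpha_i = 0     -> margin >= 1
    ok_free  = (alpha > tol) & (alpha < C - tol) & (np.abs(m - 1) <= tol)  # 0 < alpha_i < C -> margin = 1
    ok_bound = (alpha >= C - tol) & (m <= 1 + tol)                         # alpha_i = C     -> margin <= 1
    return bool(np.all(ok_zero | ok_free | ok_bound))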
Soft Margin
Hyperplanes
Soft Margin Hyperplanes
● The soft margin approach chooses a hyperplane that splits the examples as cleanly as possible while still maximizing the margin.
● Non-negative slack variables $\xi_i$ measure the degree of misclassification.
Soft Margin Hyperplanes
Learning with Kernels, by Schölkopf.
Soft Margin Hyperplanes
● The optimization problem:
$\arg\min_{W, \xi, b} \frac{1}{2} \lVert W \rVert^2 + \frac{C}{n} \sum_{i=1}^{n} \xi_i$, s.t. $y_i (W \cdot X_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$.
● Using Lagrange multipliers, the dual problem is to maximize
$W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i, j=1}^{n} \alpha_i \alpha_j y_i y_j k(x_i, x_j)$
s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$ and $0 \leq \alpha_i \leq \frac{C}{n}$.
● C is essentially a regularisation parameter,
which controls the trade-off between achieving
a low error on the training data and minimising
the norm of the weights.
● After the optimizer computes the $\alpha_i$, W can be computed as
$W = \sum_{i=1}^{n} \alpha_i Y_i X_i$
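A minimal NumPy sketch of this recovery step for the linear kernel; alpha, Y, X and b are assumed to come from some dual solver and are hypothetical here.

import numpy as np

def primal_weights(alpha, Y, X):
    # W = sum_i alpha_i * Y_i * X_i (linear kernel only)
    return (alpha * Y) @ X

def predict(X_new, W, b):
    # decision rule sign(W . x + b), matching the constraint y_i(W . X_i + b) >= 1 - xi_i above
    return np.sign(X_new @ W + b)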
V-SVC
V-SVC
● In the previous formulation, the parameter C traded off (1) minimizing training errors against (2) maximizing the margin.
● Replace C by a parameter V (the Greek letter ν, "nu"), which controls the number of margin errors and support vectors.
● V is an upper bound on the fraction of margin errors (and hence on the training error rate) and a lower bound on the fraction of support vectors.
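In practice this formulation is available off the shelf; below is a minimal usage sketch with scikit-learn's NuSVC (assuming scikit-learn is installed), where the nu argument plays the role of the V parameter described above and the toy data is made up.

import numpy as np
from sklearn.svm import NuSVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9]])
y = np.array([-1, -1, 1, 1])

clf = NuSVC(nu=0.5, kernel='rbf', gamma='scale')   # nu bounds the fraction of margin errors
clf.fit(X, y)
print(clf.predict([[0.1, 0.0], [1.1, 1.0]]))       # expected: [-1  1]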
V-SVC
● The optimization problem becomes:
$\min_{W, \xi, b, \rho} \frac{1}{2} \lVert W \rVert^2 - V \rho + \frac{1}{n} \sum_{i=1}^{n} \xi_i$
s.t. $y_i (W \cdot X_i + b) \geq \rho - \xi_i$, $\xi_i \geq 0$, and $\rho \geq 0$.
V-SVC
● Using Lagrange multipliers, the dual problem is:
$\max_{\alpha \in \mathbb{R}^n} W(\alpha) = -\frac{1}{2} \sum_{i, j=1}^{n} \alpha_i \alpha_j Y_i Y_j k(X_i, X_j)$
s.t. $0 \leq \alpha_i \leq \frac{1}{n}$, $\sum_{i=1}^{n} \alpha_i Y_i = 0$, and $\sum_{i=1}^{n} \alpha_i \geq V$.
● The decision function is
$f(X) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i k(X, X_i) + b \right)$
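A minimal sketch of this decision function in NumPy; alpha, y, X and b are assumed to be outputs of a ν-SVC solver, and kernel can be any of the kernel functions sketched earlier.

import numpy as np

def decision(x_new, alpha, y, X, b, kernel):
    # f(x) = sgn( sum_i alpha_i * y_i * k(x, X_i) + b )
    s = sum(a * yi * kernel(x_new, xi) for a, yi, xi in zip(alpha, y, X))
    return np.sign(s + b)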
SMO Algorithm
SMO Algorithm
● Sequential Minimal Optimization (SMO) is an algorithm used to solve the quadratic programming problem.
● Algorithm (a simplified end-to-end sketch follows below):
1- select a pair of examples (details are coming).
2- optimize the target function with respect to the selected pair analytically.
3- repeat until the pair selected in step 1 is already optimal or the number of iterations exceeds a user-defined limit.
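As a concrete illustration of the loop above, here is a minimal NumPy sketch for the C-SVM dual with a linear kernel. It uses the well-known simplified SMO variant, which picks the second multiplier at random instead of using the working-set heuristic described on the following slides; all names and default parameters are illustrative only.

import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    # Simplified SMO for the soft-margin SVM dual with a linear kernel.
    n = X.shape[0]
    K = X @ X.T                                       # linear kernel matrix
    alpha, b, passes = np.zeros(n), 0.0, 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]     # error E_i = f(x_i) - y_i
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = np.random.choice([k for k in range(n) if k != i])
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                # Bounds keeping both multipliers in [0, C] with their linear relation fixed
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]                    # curvature along the step (must be < 0)
                if eta >= 0:
                    continue
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)    # analytic solution, clipped
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                # Update the threshold b so the KKT conditions hold for i or j
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b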
SMO Algorithm
2- optimize the target function with respect to the selected pair analytically.
- the update of $\alpha_i$ and $\alpha_j$ depends on the difference between the approximation errors at $x_i$ and $x_j$, and on
$X = K_{ii} + K_{jj} - 2 Y_i Y_j K_{ij}$
Solve for two Lagrange multipliers
http://research.microsoft.com/pubs/68391/smo-book.pdf
Solve for two Lagrange multipliers
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
Solve for two Lagrange multipliers
// Two-variable sub-problem for the case Y_i != Y_j (then Y_i*Y_j*K_ij = -K_ij,
// so X matches K_ii + K_jj - 2*Y_i*Y_j*K_ij from the previous slide).
// G[] is the gradient of the dual objective; C_i, C_j are the box constraints.
double X = Kii + Kjj + 2*Kij;
double delta = (-G[i] - G[j]) / X;   // unconstrained step along the feasible direction
double diff = alpha[i] - alpha[j];   // this difference is preserved by the update
alpha[i] += delta;
alpha[j] += delta;
// Clip the result back into the box [0, C_i] x [0, C_j]:
if (alpha[i] > C_i) { alpha[i] = C_i; alpha[j] = C_i - diff; }   // region I
if (alpha[j] > C_j) { alpha[j] = C_j; alpha[i] = C_j + diff; }   // region II
if (alpha[j] < 0)   { alpha[j] = 0;   alpha[i] = diff; }         // region III
if (alpha[i] < 0)   { alpha[i] = 0;   alpha[j] = -diff; }        // region IV
SMO Algorithm
● 1- select a pair of examples:
we need to find the pair (i, j) for which the difference between the classification errors,
$\left( f(x_i) - y_i \right) - \left( f(x_j) - y_j \right)$,
is maximum.
The pair is considered optimal if this difference is less than $2\xi$, where $\xi$ here denotes a small user-defined tolerance (not the slack variable).
SMO Algorithm
1- select a pair of examples (continued):
Define the following index sets:
$I_0 = \{ i : \alpha_i \in (0, C_i) \}$
$I_{+,0} = \{ i : \alpha_i = 0, y_i = 1 \}$, $I_{+,C} = \{ i : \alpha_i = C_i, y_i = 1 \}$
$I_{-,0} = \{ i : \alpha_i = 0, y_i = -1 \}$, $I_{-,C} = \{ i : \alpha_i = C_i, y_i = -1 \}$
Then pick $i = \arg\max_{i \in I_0 \cup I_{+,0} \cup I_{-,C}} \left( f(x_i) - y_i \right)$ (max difference)
and $j = \arg\min_{j \in I_0 \cup I_{-,0} \cup I_{+,C}} \left( f(x_j) - y_j \right)$ (min difference).
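A minimal NumPy sketch of this selection rule; alpha, y and f (the current decision values) are hypothetical inputs, C is a common box bound for all examples, and eps is the tolerance from the previous slide.

import numpy as np

def select_pair(alpha, y, f, C, eps=1e-3):
    E = f - y                                                              # approximation errors E_i = f(x_i) - y_i
    free = (alpha > 0) & (alpha < C)                                       # I_0
    up  = free | ((alpha == 0) & (y == 1)) | ((alpha == C) & (y == -1))    # I_0 u I_{+,0} u I_{-,C}
    low = free | ((alpha == 0) & (y == -1)) | ((alpha == C) & (y == 1))    # I_0 u I_{-,0} u I_{+,C}
    i = np.where(up)[0][np.argmax(E[up])]                                  # maximal E over the first set
    j = np.where(low)[0][np.argmin(E[low])]                                # minimal E over the second set
    optimal = (E[i] - E[j]) < 2 * eps                                      # stopping criterion
    return i, j, optimal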
SMO algorithm complexity
● Memory complexity: no additional matrix is required to solve the problem; only a 2×2 matrix is needed in each iteration.
● Memory complexity is therefore linear in the size of the training set.
● SMO's running time scales between linear and quadratic in the size of the training set.