A Simple Review on SVM
Honglin Yu
Australian National University, NICTA
September 2, 2013
Outline
1 The Tutorial Routine
Overview
Linear SVC in Separable Case: Largest Margin Classifier
Soft Margin
Solving SVM
Kernel Trick and Non-linear SVM
2 Some Topics
Why the Name: Support Vectors?
Why SVC Works Well: A Simple Example
Relation with Logistic Regression etc.
3 Packages
Overview
SVMs (Support Vector Machines) are supervised learning methods.
They include methods for both classification and regression.
In this talk, we focus on binary classification.
Symbols
training data: (x_1, y_1), ..., (x_m, y_m) ∈ X × {±1}
patterns: x_i, i = 1, 2, ..., m
pattern space: X
targets: y_i, i = 1, 2, ..., m
features: x_i = Φ(x_i)
feature space: H
feature mapping: Φ : X → H
Separable Case: Largest Margin Classifier
Figure: Simplest Case
“Separable” means: ∃ a line w · x + b = 0 that correctly separates all the training data.
“Margin”: d₊ + d₋, where
$$d_\pm = \min_{i\,:\,y_i = \pm 1} \operatorname{dist}\big(x_i,\ \{x : w \cdot x + b = 0\}\big)$$
In this case, the SVC simply looks for a line that maximizes the margin.
Separable Case: Largest Margin Classifier
Another way of expressing separability: y_i(w · x_i + b) > 0
Because the training data is finite, ∃ ε > 0 such that y_i(w · x_i + b) ≥ ε
Dividing by ε, this is equivalent to y_i(w′ · x_i + b′) ≥ 1 with w′ = w/ε, b′ = b/ε
w · x + b = 0 and w′ · x + b′ = 0 are the same line.
So we can directly write the constraints as y_i(w · x_i + b) ≥ 1
This removes the scaling redundancy in (w, b)
We also want the separating plane to lie in the middle (which means d₊ = d₋).
So the optimization problem can be formulated as
$$\arg\max_{w,b}\ \frac{2 \min_i |w \cdot x_i + b|}{\|w\|} \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\ i = 1, 2, \dots, N \tag{1}$$
This is equivalent to:
$$\arg\min_{w,b}\ \|w\|^2 \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\ i = 1, 2, \dots, N \tag{2}$$
But so far it has only been confirmed that Eq. (2) is a necessary condition for finding the plane we want (correct and in the middle).
Largest Margin Classifier
It can be proved that, when the data is separable, for the following problem
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1,\ i = 1, \dots, m \tag{3}$$
we have:
1. When ‖w‖ is minimized, the equality holds for some x_i.
2. The equality holds for at least some x_i, x_j where y_i y_j < 0.
3. Based on 1) and 2) we can calculate that the margin is 2/‖w‖, so the margin is maximized.
Proof of Previous Slide (Warning: My Proof)
1. If ∃ c > 1 such that ∀ x_i, y_i(w · x_i + b) ≥ c, then w/c and b/c also satisfy the constraints, and the length ‖w/c‖ is smaller.
2. If not, assume that ∃ c > 1 such that
$$y_i(w \cdot x_i + b) \ge 1 \ \text{ where } y_i = 1, \qquad y_i(w \cdot x_i + b) \ge c \ \text{ where } y_i = -1 \tag{4}$$
Adding (c−1)/2 to each side where y_i = 1 and subtracting (c−1)/2 from each side where y_i = −1 (i.e. shifting b by (c−1)/2), we get:
$$y_i\Big(w \cdot x_i + b + \frac{c-1}{2}\Big) \ge \frac{c+1}{2} \tag{5}$$
Because (c+1)/2 > 1, similarly to 1), the ‖w‖ here is not the smallest.
3. Pick x_1, x_2 where the equality holds and y_1 y_2 < 0; the margin is just the distance between x_1 and the line y_2(w · x + b) = 1, which can easily be calculated as 2/‖w‖.
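As a quick numerical sanity check of the 2/‖w‖ result, here is a minimal sketch (added here; the separable toy data and the large C standing in for a hard margin are assumptions, not from the slides):
import numpy as np
from sklearn.svm import SVC

# separable toy data (assumed for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e10).fit(X, y)  # huge C approximates a hard margin

w, b = clf.coef_[0], clf.intercept_[0]
# distance of each point to the plane w.x + b = 0
d = np.abs(X @ w + b) / np.linalg.norm(w)
print(d[y == 1].min() + d[y == -1].min(), 2 / np.linalg.norm(w))  # the two agree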
Non Separable Case
Figure: Non-separable case: misclassified points exist
Non Separable Case
The constraints y_i(w · x_i + b) ≥ 1, i = 1, 2, ..., m cannot all be satisfied.
Solution: add slack variables ξ_i and reformulate the problem as
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \dots, m \tag{6}$$
C sets the trade-off between a large margin (1/‖w‖) and a small slack penalty (Σ_i ξ_i).
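To see the trade-off concretely, a small sketch (the overlapping toy blobs and the C grid are assumptions): as C grows, slack is punished harder and ‖w‖ typically grows, so the margin 2/‖w‖ shrinks.
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# two overlapping Gaussian blobs (assumed toy data)
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])
y = np.array([-1] * 50 + [1] * 50)
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # report margin width 2/||w|| and the number of support vectors
    print(C, 2 / np.linalg.norm(clf.coef_[0]), len(clf.support_))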
Solving SVM: Lagrangian Dual
Constrained optimization → Lagrangian dual
Primal form:
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \dots, m \tag{7}$$
The primal Lagrangian:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C\sum_i \xi_i - \sum_i \alpha_i\big\{y_i(w \cdot x_i + b) - 1 + \xi_i\big\} - \sum_i \mu_i \xi_i$$
Because (7) is convex with affine constraints, the Karush-Kuhn-Tucker conditions hold at the optimum.
Applying KKT Conditions
Stationarity:
$$\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_i \alpha_i y_i x_i \qquad \frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_i \alpha_i y_i = 0 \qquad \frac{\partial L}{\partial \xi_i} = 0 \ \Rightarrow\ C - \alpha_i - \mu_i = 0,\ \forall i$$
Primal feasibility: y_i(w · x_i + b) ≥ 1 − ξ_i, ∀i
Dual feasibility: α_i ≥ 0, µ_i ≥ 0
Complementary slackness, ∀i:
µ_i ξ_i = 0
α_i {y_i(w · x_i + b) − 1 + ξ_i} = 0
The x_i with α_i > 0 are called support vectors.
Dual Form
Using the equations derived from the KKT conditions, eliminate w, b, ξ_i, µ_i from the primal form to get the dual form:
$$\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t. } \sum_i \alpha_i y_i = 0,\ \ C \ge \alpha_i \ge 0 \tag{8}$$
And the decision function is: ȳ = sign(Σ_i α_i y_i x_i^T x + b)
(b = y_k − w · x_k for any k with C > α_k > 0)
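As an illustration (a sketch added here, not from the slides): a fitted scikit-learn SVC exposes the dual solution, so we can evaluate Σ_i α_i y_i x_i^T x + b by hand and compare with the library; dual_coef_ holds the products α_i y_i for the support vectors.
import numpy as np
from sklearn.svm import SVC

# toy data (assumed for illustration)
X = np.array([[0., 0.], [1., 1.], [2., 0.], [3., 3.], [4., 2.], [5., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])
clf = SVC(kernel="linear", C=1.0).fit(X, y)

alpha_y = clf.dual_coef_[0]          # alpha_i * y_i, support vectors only
sv = clf.support_vectors_
b = clf.intercept_[0]
x_new = np.array([2.5, 2.0])
manual = alpha_y @ (sv @ x_new) + b  # sum_i alpha_i y_i (x_i . x) + b
print(manual, clf.decision_function([x_new])[0])  # identical values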
We Need Nonlinear Classifier
Figure: Case that a linear classifier cannot handle
Finding an appropriate form of separating curve is hard, but we can transform the data!
Mapping Training Data to Feature Space
Φ(x) = (x, x²)^T
Figure: Feature Mapping Helps Classification
To solve a nonlinear classification problem, we can define a mapping Φ : X → H and do linear classification in the feature space H.
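A minimal sketch of this idea (the 1-D toy data are an assumption): no single threshold on x separates the classes, but after mapping x ↦ (x, x²) a line does.
import numpy as np
from sklearn.svm import SVC

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])   # +1 in the middle, -1 outside
Phi = np.column_stack([x, x ** 2])        # feature map (x, x^2)
clf = SVC(kernel="linear", C=1e10).fit(Phi, y)
print(clf.score(Phi, y))                  # 1.0: separable in feature space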
Recap the Dual Form: An Important Fact
Dual form:
$$\max_{\alpha}\ \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t. } \sum_i \alpha_i y_i = 0,\ \ C \ge \alpha_i \ge 0 \tag{9}$$
Decision function: ȳ = sign(Σ_i α_i y_i x_i^T x + b)
To train an SVC, or to use it to predict, we only need to know the inner products between the x's!
If we want to apply a linear SVC in H, we do NOT need to know Φ(x); we ONLY need to know k(x, x′) = ⟨Φ(x), Φ(x′)⟩.
k(x, x′) is called the “kernel function”.
Kernel Functions
The input of a kernel function k : X × X → R is two patterns x, x′ in X; the output is the canonical inner product between Φ(x) and Φ(x′) in H.
By using k(·, ·), we can implicitly transform the data by some Φ(·) (which often maps into an infinite-dimensional space). E.g. for X = R and k(x, x′) = (xx′ + 1)², Φ(x) = (x², √2 x, 1)^T.
But not for every function X × X → R can we find a corresponding Φ: kernel functions must satisfy Mercer's condition.
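A quick numerical check of that example (a sketch; the test points are arbitrary): the kernel value and the explicit feature-space inner product coincide.
import numpy as np

def k(x, xp):                     # polynomial kernel on scalars
    return (x * xp + 1) ** 2

def phi(x):                       # the explicit feature map for k
    return np.array([x ** 2, np.sqrt(2) * x, 1.0])

for x, xp in [(0.3, -1.2), (2.0, 0.5)]:
    print(k(x, xp), phi(x) @ phi(xp))   # pairs agree exactly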
Conditions of Kernel Functions
Necessity: the kernel matrix K = [k(x_i, x_j)]_{m×m} must be positive semidefinite:
$$t^T K t = \sum_{i,j} t_i t_j\, k(x_i, x_j) = \sum_{i,j} t_i t_j \langle \Phi(x_i), \Phi(x_j) \rangle = \Big\langle \sum_i t_i \Phi(x_i), \sum_j t_j \Phi(x_j) \Big\rangle = \Big\| \sum_i t_i \Phi(x_i) \Big\|^2 \ge 0$$
Sufficiency in continuous form (Mercer's condition):
For any symmetric function k : X × X → R which is square integrable in X × X, if it satisfies
$$\int_{X \times X} k(x, x')\, f(x)\, f(x')\, dx\, dx' \ge 0 \quad \text{for all } f \in L^2(X),$$
then there exist functions φ_i : X → R and numbers λ_i ≥ 0 such that
$$k(x, x') = \sum_i \lambda_i\, \varphi_i(x)\, \varphi_i(x') \quad \text{for all } x, x' \in X$$
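As an illustration of the necessity half (a sketch; the sample points are random): the Gram matrix of a valid kernel has no negative eigenvalues, up to floating-point rounding.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)                                 # arbitrary sample points
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
K = np.exp(-0.5 * sq)                                # RBF Gram matrix, gamma = 0.5
print(np.linalg.eigvalsh(K).min())                   # >= 0 up to rounding error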
Commonly Used Kernel Functions
Linear kernel: k(x, x′) = x^T x′
RBF kernel: k(x, x′) = e^{−γ‖x−x′‖²}, for γ = 1/2 (from wiki)
Polynomial kernel: k(x, x′) = (γ x^T x′ + r)^d, for γ = 1, d = 2 (from wiki)
etc.
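These map directly onto scikit-learn's SVC arguments. A hedged sketch (the XOR-style toy data and the large C are assumptions) showing that both nonlinear kernels can fit data that no linear boundary separates:
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [1., 0.], [0., 1.]])
y = np.array([-1, -1, 1, 1])      # XOR-style labels: not linearly separable
for clf in (SVC(kernel="rbf", gamma=0.5, C=1e6),
            SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1e6)):
    print(clf.fit(X, y).score(X, y))   # both should reach 1.0 on the training set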
Mechanical Analogy
Remember from the KKT conditions,
$$\frac{\partial L}{\partial w} = 0 \ \Rightarrow\ w = \sum_i \alpha_i y_i x_i \qquad \frac{\partial L}{\partial b} = 0 \ \Rightarrow\ \sum_i \alpha_i y_i = 0$$
Imagine every support vector x_i exerts a force F_i = α_i y_i (w/‖w‖) on the “separating plane + margin”. Then we have
$$\sum \text{Forces} = \sum_i \alpha_i y_i \frac{w}{\|w\|} = \frac{w}{\|w\|} \sum_i \alpha_i y_i = 0$$
$$\sum \text{Torques} = \sum_i x_i \times \Big(\alpha_i y_i \frac{w}{\|w\|}\Big) = \Big(\sum_i \alpha_i y_i x_i\Big) \times \frac{w}{\|w\|} = w \times \frac{w}{\|w\|} = 0$$
This is why the {x_i} are called “support vectors”.
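A numerical check of the force and torque balance (a sketch; the toy data are assumed, and in 2-D the torque reduces to the scalar cross product x_u f_v − x_v f_u):
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 0.], [0., 1.], [3., 2.], [4., 3.], [3., 3.]])
y = np.array([-1, -1, -1, 1, 1, 1])
clf = SVC(kernel="linear", C=1e10).fit(X, y)

alpha_y = clf.dual_coef_[0]                       # alpha_i * y_i at the support vectors
sv = clf.support_vectors_
u = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # unit normal w/||w||
F = alpha_y[:, None] * u                          # force exerted by each support vector
print(F.sum(axis=0))                              # total force  ~ [0, 0]
print((sv[:, 0] * F[:, 1] - sv[:, 1] * F[:, 0]).sum())  # total torque ~ 0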
Why SVC Works Well
Let's first consider using linear regression to do classification; the decision function is ȳ = sign(w · x + b).
Figure: Feature Mapping Helps Classification
In SVM, by contrast, we only consider the points near the boundary.
Min-Loss Framework
Primal form:
$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\xi_i \quad \text{s.t. } y_i(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, 2, \dots, m \tag{10}$$
Rewritten in min-loss form:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\max\{0,\ 1 - y_i(w \cdot x_i + b)\} \tag{11}$$
The term max{0, 1 − y_i(w · x_i + b)} is called the hinge loss.
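A minimal sketch of why (10) and (11) agree (the toy values are assumptions): at the optimum each slack ξ_i equals the hinge loss of point i, so evaluating (11) directly gives the primal objective.
import numpy as np

def svm_objective(w, b, X, y, C):
    # each optimal slack xi_i equals max(0, 1 - y_i (w . x_i + b))
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * hinge.sum()

X = np.array([[1., 2.], [-1., 0.5], [0.5, -1.]])
y = np.array([1, -1, -1])
print(svm_objective(np.array([1.0, -0.5]), 0.2, X, y, C=1.0))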
See C-SVM and the Largest Margin Classifier (LMC) from a Unified Direction
Rewriting the LMC (hard-margin) classifier:
$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} \infty \cdot \big(\operatorname{sign}(1 - y_i(w \cdot x_i + b)) + 1\big) \tag{12}$$
Regularised logistic regression (y_i ∈ {0, 1}, not {−1, 1}; p_i = 1/(1 + e^{−w·x_i})):
$$\min_{w}\ \frac{1}{2}\|w\|^2 + \sum_{i=1}^{m} -\big(y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\big) \tag{13}$$
Relation with Logistic Regression etc.
Figure: black: 0-1 loss; red: logistic loss (−log(1/(1 + e^{−y_i w·x_i}))); blue: hinge loss; green: quadratic loss.
The 0-1 loss and the hinge loss are not affected by correctly classified outliers.
BTW, logistic regression can also be “kernelised”.
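For reference, a sketch evaluating the four losses as functions of the margin t = y(w · x + b) (the grid of margins is an assumption); note how hinge and 0-1 are exactly zero for confidently correct points while the quadratic loss grows again:
import numpy as np

t = np.array([-2.0, -1.0, 0.0, 1.0, 3.0])   # margin values y * (w.x + b)
zero_one = (t < 0).astype(float)
logistic = np.log(1.0 + np.exp(-t))         # = -log(1 / (1 + e^{-t}))
hinge = np.maximum(0.0, 1.0 - t)
quadratic = (1.0 - t) ** 2
for row in zip(t, zero_one, logistic, hinge, quadratic):
    print(row)                              # at t = 3: hinge 0, quadratic 4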
Commonly Used Packages
libsvm (and liblinear), SVMlight, and scikit-learn (sklearn, whose SVC wraps libsvm)
Code example in sklearn:
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
y = np.array([1, 1, 2, 2])
clf = SVC()
clf.fit(X, y)
print(clf.predict([[-0.8, -1]]))   # array([1])
Things Not Covered
Algorithms (SMO, SGD)
Generalisation bound and VC dimension
ν-SVM, one-class SVM etc.
SVR
etc.