Slide 1
Support Vector Machines: Maximum Margin Classifiers
Machine Learning and Pattern Recognition, September 16, 2008
Piotr Mirowski
Based on slides by Sumit Chopra and Fu-Jie Huang
Slide 2
Outline
What is behind Support Vector Machines?
Constrained Optimization
Lagrange Duality
Support Vector Machines in Detail
Kernel Trick
LibSVM demo
Slide 3
Binary Classification Problem
Given: training data generated according to a distribution D:
$(x_1, y_1), \dots, (x_p, y_p) \in (\mathbb{R}^n \times \{-1, 1\})^p$
where $\mathbb{R}^n$ is the input space and $\{-1, 1\}$ the label space.
Problem: find a classifier (a function)
$h(x): \mathbb{R}^n \to \{-1, 1\}$
such that it generalizes well on a test set obtained from the same distribution D.
Solution:
Linear approach: linear classifiers (e.g. Perceptron)
Non-linear approach: non-linear classifiers (e.g. neural networks, SVM)
Slide 4
Linearly Separable Data
Assume that the training data is linearly separable.
[Figure: two classes of points in the $(x_1, x_2)$ plane, separable by a line]
Slide 5
Linearly Separable Data
Assume that the training data is linearly separable.
Then the classifier is:
$h(x) = \vec{w} \cdot \vec{x} + b$, where $\vec{w} \in \mathbb{R}^n$, $b \in \mathbb{R}$
Inference: $\mathrm{sign}(h(x)) \in \{-1, 1\}$
The decision boundary is the hyperplane $\vec{w} \cdot \vec{x} + b = 0$.
[Figure: the hyperplane in the $(x_1, x_2)$ plane with normal vector $\vec{w}$; along the axis parallel to $\vec{w}$, the abscissa of the origin is given by $b$]
Slide 6
Linearly Separable Data
Assume that the training data is linearly separable.
Maximize the margin ρ (or 2ρ) so that, for the closest points:
$h(x) = \vec{w} \cdot \vec{x} + b \in \{-1, 1\}$
The two margin hyperplanes $\vec{w} \cdot \vec{x} + b = 1$ and $\vec{w} \cdot \vec{x} + b = -1$
lie on either side of the decision boundary $\vec{w} \cdot \vec{x} + b = 0$, so that:
$\rho = \frac{1}{\|\vec{w}\|}, \qquad 2\rho = \frac{2}{\|\vec{w}\|}$
[Figure: margin ρ between the hyperplanes in the $(x_1, x_2)$ plane]
Slide 7
Optimization Problem
A constrained optimization problem:
$\min_{\vec{w}} \; \frac{1}{2} \|\vec{w}\|^2 \quad \text{s.t.:} \quad y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1, \quad i = 1, \dots, m$
(where $\vec{x}_i$ is the input and $y_i$ the label)
Equivalent to maximizing the margin $\rho = \frac{1}{\|\vec{w}\|}$
A convex optimization problem:
the objective is convex, and the constraints are affine, hence convex.
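This QP can be handed to a generic convex solver almost verbatim. Below is a minimal sketch (not from the original slides) using the cvxpy modeling library on a made-up toy dataset:

```python
# Minimal sketch of the hard-margin primal QP (assumes cvxpy; toy data is illustrative).
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [1.0, 0.0]])  # inputs x_i
y = np.array([1.0, 1.0, -1.0, -1.0])                            # labels y_i

w = cp.Variable(2)
b = cp.Variable()

# min (1/2)||w||^2  s.t.  y_i (w . x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "margin =", 1.0 / np.linalg.norm(w.value))
```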
Slide 8
Optimization Problem
Compare the constrained problem:
$\min_{\vec{w}} \; \frac{1}{2} \|\vec{w}\|^2 \quad \text{s.t.:} \quad y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1, \quad i = 1, \dots, m$
(objective + constraints)
with the unconstrained, regularized formulation:
$\min_{\vec{w}} \; \underbrace{\sum_{i=1}^{p} \ell\big(-y_i (\vec{w} \cdot \vec{x}_i + b)\big)}_{\text{energy/errors}} \; + \; \underbrace{\frac{\lambda}{2} \|\vec{w}\|^2}_{\text{regularization}}$
where $\ell$ is a per-sample loss.
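To make the comparison concrete, here is a minimal sketch (not from the slides) that minimizes the regularized formulation by subgradient descent, taking $\ell$ to be the hinge loss $\max(0, 1 - y_i(\vec{w} \cdot \vec{x}_i + b))$; the step size and λ below are arbitrary illustrative choices:

```python
# Minimal sketch: subgradient descent on hinge loss + L2 regularization
# (assumptions: hinge loss, arbitrary step size lr and regularizer lam; toy data as above).
import numpy as np

X = np.array([[2.0, 2.0], [2.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = np.zeros(2), 0.0
lam, lr = 0.01, 0.1

for t in range(1000):
    margins = y * (X @ w + b)
    viol = margins < 1                      # points inside the margin
    # subgradient of sum_i max(0, 1 - y_i (w.x_i + b)) + (lam/2)||w||^2
    grad_w = lam * w - (y[viol, None] * X[viol]).sum(axis=0)
    grad_b = -y[viol].sum()
    w -= lr * grad_w
    b -= lr * grad_b

print("w =", w, "b =", b)
```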
Slide 9
Optimization: Some Theory
The problem:
$\min_{x} f_0(x)$  (objective function)
s.t.:
$f_i(x) \leq 0, \quad i = 1, \dots, m$  (inequality constraints)
$h_i(x) = 0, \quad i = 1, \dots, p$  (equality constraints)
Solution $x^o$ of the problem:
a global (unique) optimum if the problem is convex;
a local optimum if the problem is not convex.
Slide 10
Optimization: Some Theory
Example: standard Linear Program (LP)
$\min_{x} \; c^T x \quad \text{s.t.:} \quad Ax = b, \quad x \geq 0$
Example: least-squares solution of linear equations
(with L2-norm regularization of the solution x), i.e. ridge regression:
$\min_{x} \; x^T x \quad \text{s.t.:} \quad Ax = b$
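As a preview of the Lagrangian machinery developed on the following slides, the second example has a closed-form solution (a standard derivation, not spelled out on the slide, assuming A has full row rank):

```latex
L(x, \nu) = x^T x + \nu^T (A x - b)
\nabla_x L = 2x + A^T \nu = 0 \;\Rightarrow\; x = -\tfrac{1}{2} A^T \nu
A x = b \;\Rightarrow\; -\tfrac{1}{2} A A^T \nu = b \;\Rightarrow\; \nu = -2 (A A^T)^{-1} b
\Rightarrow\; x^o = A^T (A A^T)^{-1} b
```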
Slide 11
Big Picture
Constrained / unconstrained optimization
Hierarchy of the objective function $f_0$:
smooth (infinitely differentiable) vs. non-smooth;
convex (has a global optimum) vs. non-convex.
[Diagram: $f_0$ splits into convex vs. non-convex, each smooth vs. non-smooth; SVMs sit in the convex branch, neural networks (NN) in the non-convex branch]
Slide 12
Toy Example: Equality Constraint
Example 1:
$\min \; x_1 + x_2 \equiv f \quad \text{s.t.:} \quad x_1^2 + x_2^2 - 2 = 0 \equiv h_1$
where
$\nabla f = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{pmatrix}, \qquad \nabla h_1 = \begin{pmatrix} \partial h_1 / \partial x_1 \\ \partial h_1 / \partial x_2 \end{pmatrix}$
At the optimal solution $(-1, -1)$:
$\nabla f(x^o) = \lambda_1^o \nabla h_1(x^o)$
[Figure: the circle $x_1^2 + x_2^2 = 2$ with gradients $\nabla f$ and $\nabla h_1$ drawn at several points]
Slide 13
Toy Example: Equality Constraint
$x$ is not an optimal solution if there exists $s \neq 0$ such that
$h_1(x + s) = 0$ and $f(x + s) < f(x)$.
Using a first-order Taylor expansion:
$h_1(x + s) \approx h_1(x) + \nabla h_1(x)^T s = \nabla h_1(x)^T s = 0 \quad (1)$
$f(x + s) - f(x) \approx \nabla f(x)^T s < 0 \quad (2)$
Such an $s$ can exist only when $\nabla h_1(x)$ and $\nabla f(x)$ are not parallel.
Slide 14
Toy Example: Equality Constraint
Thus at the solution $\nabla h_1(x) \parallel \nabla f(x)$, i.e. we have
$\nabla f(x^o) = \lambda_1^o \nabla h_1(x^o)$
The Lagrangian:
$L(x, \lambda_1) = f(x) - \lambda_1 h_1(x)$
where $\lambda_1$ is the Lagrange multiplier (or dual variable) for $h_1$.
Thus, $x$ being a solution implies
$\nabla_x L(x^o, \lambda_1^o) = \nabla f(x^o) - \lambda_1^o \nabla h_1(x^o) = 0$
This is just a necessary (not a sufficient) condition.
Slide 15
Toy Example: Inequality Constraint
Example 2:
$\min \; x_1 + x_2 \equiv f \quad \text{s.t.:} \quad 2 - x_1^2 - x_2^2 \geq 0 \equiv c_1$
where
$\nabla f = \begin{pmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{pmatrix}, \qquad \nabla c_1 = \begin{pmatrix} \partial c_1 / \partial x_1 \\ \partial c_1 / \partial x_2 \end{pmatrix}$
[Figure: the disk $x_1^2 + x_2^2 \leq 2$ with gradients $\nabla f$ and $\nabla c_1$ drawn at several points; the optimum is at $(-1, -1)$]
Slide 16
Toy Example: Inequality Constraint
$x$ is not an optimal solution if there exists $s \neq 0$ such that
$c_1(x + s) \geq 0$ and $f(x + s) < f(x)$.
Using a first-order Taylor expansion:
$c_1(x + s) \approx c_1(x) + \nabla c_1(x)^T s \geq 0 \quad (1)$
$f(x + s) - f(x) \approx \nabla f(x)^T s < 0 \quad (2)$
Slide 17
Toy Example: Inequality Constraint
Case 1: inactive constraint, $c_1(x) > 0$
Any sufficiently small $s$ satisfies (1), so as long as $\nabla f(x) \neq 0$,
condition (2) can be met by taking
$s = -\alpha \nabla f(x)$ where $\alpha > 0$.
Case 2: active constraint, $c_1(x) = 0$
$\nabla f(x) = \lambda_1 \nabla c_1(x)$, where $\lambda_1 \geq 0$
[Figure: in the active case, $\nabla f(x)$ and $\nabla c_1(x)$ point in the same direction at the boundary]
Slide 18
Toy Example: Inequality Constraint
Thus we have the Lagrangian (as before):
$L(x, \lambda_1) = f(x) - \lambda_1 c_1(x)$
where $\lambda_1$ is the Lagrange multiplier (or dual variable) for $c_1$.
The optimality conditions:
$\nabla_x L(x^o, \lambda_1^o) = \nabla f(x^o) - \lambda_1^o \nabla c_1(x^o) = 0$ for some $\lambda_1 \geq 0$
and the complementarity condition:
$\lambda_1^o c_1(x^o) = 0$
i.e. either $c_1(x^o) = 0$ (active) or $\lambda_1^o = 0$ (inactive).
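Completing Example 2 (not on the original slide): since $f$ decreases without bound as $x_1, x_2 \to -\infty$, the constraint must be active at the optimum, and Case 2 applies:

```latex
\nabla f(x) = \lambda_1 \nabla c_1(x): \quad (1, 1)^T = \lambda_1 (-2x_1, -2x_2)^T
\Rightarrow\; x_1 = x_2 = -\tfrac{1}{2\lambda_1}, \quad x_1^2 + x_2^2 = 2 \;\Rightarrow\; \lambda_1 = \tfrac{1}{2} \;(\geq 0)
\Rightarrow\; x^o = (-1, -1), \quad f(x^o) = -2
```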
Slide 19
Same Concepts in a More General Setting
Slide 20
Lagrange Function
The problem:
$\min_{x} f_0(x)$  (objective function)
s.t.:
$f_i(x) \leq 0, \quad i = 1, \dots, m$  (m inequality constraints)
$h_i(x) = 0, \quad i = 1, \dots, p$  (p equality constraints)
The standard tool for constrained optimization is the Lagrange function:
$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x)$
where the $\lambda_i$ and $\nu_i$ are the dual variables or Lagrange multipliers.
Slide 21
Lagrange Dual Function
Defined, for $(\lambda, \nu)$, as the minimum value of the Lagrange function over x:
$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) = \inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big)$
with $g: \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$
(m inequality constraints, p equality constraints).
Slide 22
Lagrange Dual Function
Interpretation of the Lagrange dual function:
write the original problem as an unconstrained problem,
but with hard indicator (penalty) functions:
$\min_{x} \; f_0(x) + \sum_{i=1}^{m} I_0(f_i(x)) + \sum_{i=1}^{p} I_1(h_i(x))$
where the indicator functions are
$I_0(u) = \begin{cases} 0 & u \leq 0 \text{ (satisfied)} \\ \infty & u > 0 \text{ (unsatisfied)} \end{cases} \qquad I_1(u) = \begin{cases} 0 & u = 0 \text{ (satisfied)} \\ \infty & u \neq 0 \text{ (unsatisfied)} \end{cases}$
Slide 23
Lagrange Dual Function
Interpretation of the Lagrange dual function:
the Lagrange multipliers in the dual function can be seen as a "softer"
version of the indicator (penalty) functions:
$\min_{x} \; f_0(x) + \sum_{i=1}^{m} I_0(f_i(x)) + \sum_{i=1}^{p} I_1(h_i(x))$
becomes
$\inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big)$
Slides 24-27
Lagrange Dual Function
The Lagrange dual function gives a lower bound on the optimal
value of the problem:
$g(\lambda, \nu) \leq p^o$
Proof: let $\bar{x}$ be a feasible optimal point and let $\lambda \geq 0$. Then we have:
$f_i(\bar{x}) \leq 0, \quad i = 1, \dots, m$
$h_i(\bar{x}) = 0, \quad i = 1, \dots, p$
Thus, since $\lambda \geq 0$:
$L(\bar{x}, \lambda, \nu) = f_0(\bar{x}) + \sum_{i=1}^{m} \lambda_i f_i(\bar{x}) + \sum_{i=1}^{p} \nu_i h_i(\bar{x}) \leq f_0(\bar{x})$
Hence:
$g(\lambda, \nu) = \inf_{x \in D} L(x, \lambda, \nu) \leq L(\bar{x}, \lambda, \nu) \leq f_0(\bar{x})$
Slide 28
Sufficient Condition
If $(x^o, \lambda^o, \nu^o)$ is a saddle point of the Lagrangian, i.e. if
$\forall x \in \mathbb{R}^n, \; \forall \lambda \geq 0: \quad L(x^o, \lambda, \nu) \leq L(x^o, \lambda^o, \nu^o) \leq L(x, \lambda^o, \nu^o)$
... then $x^o$ is a solution of $p^o$.
[Figure: saddle shape of $L(x, \lambda, \nu)$, minimized over $x$ at $x^o$ and maximized over $(\lambda, \nu)$ at $(\lambda^o, \nu^o)$]
Slide 29
Lagrange Dual Problem
The Lagrange dual function gives a lower bound
on the optimal value of the problem.
We seek the "best" (greatest) lower bound on the optimal value:
maximize $g(\lambda, \nu)$ s.t.: $\lambda \geq 0$
The dual optimal value and solution:
$d^o = g(\lambda^o, \nu^o)$
The Lagrange dual problem is convex
even if the original problem is not.
Slide 30
Primal / Dual Problems
Primal problem (optimal value $p^o$):
$\min_{x \in D} f_0(x) \quad \text{s.t.:} \quad f_i(x) \leq 0, \; i = 1, \dots, m; \quad h_i(x) = 0, \; i = 1, \dots, p$
Dual problem (optimal value $d^o$):
$\max_{\lambda, \nu} g(\lambda, \nu) \quad \text{s.t.:} \quad \lambda \geq 0$
where
$g(\lambda, \nu) = \inf_{x \in D} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x) \Big)$
Slide 31
Weak Duality
Weak duality theorem:
$d^o \leq p^o$
Optimal duality gap:
$p^o - d^o \geq 0$
This bound is sometimes used to get an estimate of the
optimal value of an original problem that is difficult to solve.
Slide 32
Strong Duality
Slater's condition: there exists an $x \in D$ that is strictly feasible:
$f_i(x) < 0 \text{ for } i = 1, \dots, m; \qquad h_i(x) = 0 \text{ for } i = 1, \dots, p$
Strong duality theorem:
if Slater's condition holds and the problem is convex,
then strong duality is attained:
$d^o = p^o$
Strong duality means:
$\exists (\lambda^o, \nu^o) \text{ with } d^o = g(\lambda^o, \nu^o) = \max_{\lambda, \nu} g(\lambda, \nu) = \inf_{x} L(x, \lambda^o, \nu^o) = p^o$
Strong duality does not hold in general.
Slide 33
Optimality Conditions: First Order
Complementary slackness:
if strong duality holds, then at the optimum $(x^o, \lambda^o, \nu^o)$:
$\lambda_i^o f_i(x^o) = 0, \quad i = 1, \dots, m$
Proof:
$f_0(x^o) = g(\lambda^o, \nu^o)$  (strong duality)
$= \inf_{x} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i^o f_i(x) + \sum_{i=1}^{p} \nu_i^o h_i(x) \Big)$
$\leq f_0(x^o) + \sum_{i=1}^{m} \lambda_i^o f_i(x^o) + \sum_{i=1}^{p} \nu_i^o h_i(x^o)$
$\leq f_0(x^o)$
using $\forall i: f_i(x^o) \leq 0, \; \lambda_i^o \geq 0$ and $\forall i: h_i(x^o) = 0$.
The chain of inequalities therefore holds with equality, forcing every
non-positive term $\lambda_i^o f_i(x^o)$ to be zero.
Slide 34
Optimality Conditions: First Order
Karush-Kuhn-Tucker (KKT) conditions:
if strong duality holds, then at optimality:
$f_i(x^o) \leq 0, \quad i = 1, \dots, m$
$h_i(x^o) = 0, \quad i = 1, \dots, p$
$\lambda_i^o \geq 0, \quad i = 1, \dots, m$
$\lambda_i^o f_i(x^o) = 0, \quad i = 1, \dots, m$
$\nabla f_0(x^o) + \sum_{i=1}^{m} \lambda_i^o \nabla f_i(x^o) + \sum_{i=1}^{p} \nu_i^o \nabla h_i(x^o) = 0$
The KKT conditions are:
➔ necessary in general (local optimum)
➔ necessary and sufficient in the case of convex problems (global optimum)
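As a numerical illustration (not from the slides, and assuming the cvxpy library), the toy inequality problem of Slide 15 can be solved and its KKT quantities inspected directly:

```python
# Minimal sketch: solve  min x1 + x2  s.t.  x1^2 + x2^2 <= 2  and read off
# the Lagrange multiplier and complementary slackness from the solver.
import cvxpy as cp
import numpy as np

x = cp.Variable(2)
constraint = cp.sum_squares(x) <= 2           # i.e. -(2 - x1^2 - x2^2) <= 0
prob = cp.Problem(cp.Minimize(cp.sum(x)), [constraint])
prob.solve()

lam = constraint.dual_value                   # Lagrange multiplier lambda_1
print("x* =", x.value)                        # approx. (-1, -1)
print("lambda =", lam)                        # approx. 0.5
print("complementary slackness:", lam * (2 - np.sum(x.value ** 2)))  # approx. 0
```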
Slide 35
What are Support Vector Machines?
Linear classifiers
(Mostly) binary classifiers
Supervised training
Good generalization, with explicit bounds
Slide 36
Main Ideas Behind Support Vector Machines
Maximal margin
Dual space
Linear classifiers in high-dimensional space, using a non-linear mapping
Kernel trick
Slide 37
Quadratic Programming
Slide 38
Using the Lagrangian
[The derivations on these two slides appeared as images not preserved in this transcript.]
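As a hedged reconstruction of the standard steps these slides cover, the Lagrangian of the hard-margin QP of Slide 7 and its stationarity conditions are:

```latex
L(\vec{w}, b, \alpha) = \tfrac{1}{2} \|\vec{w}\|^2 - \sum_{i=1}^{m} \alpha_i \big( y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \big), \quad \alpha_i \geq 0
\frac{\partial L}{\partial \vec{w}} = 0 \;\Rightarrow\; \vec{w} = \sum_{i=1}^{m} \alpha_i y_i \vec{x}_i
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0
```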
Slide 39
Dual Space
Slide 40
Strong Duality
Slide 41
Duality Gap
Slide 42
No Duality Gap Thanks to Convexity
[The content of Slides 39-42 appeared as images not preserved in this transcript.]
Slide 43
Dual Form
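The dual-form slide itself is an image; the standard SVM dual QP it refers to (a reconstruction, obtained by substituting the stationarity conditions above back into the Lagrangian) is:

```latex
\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \tfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, (\vec{x}_i \cdot \vec{x}_j)
\text{s.t.:} \quad \alpha_i \geq 0, \quad \sum_{i=1}^{m} \alpha_i y_i = 0
h(x) = \sum_{i=1}^{m} \alpha_i y_i \, (\vec{x}_i \cdot \vec{x}) + b
```

The support vectors are the training points with $\alpha_i > 0$, and the data enter only through inner products $\vec{x}_i \cdot \vec{x}_j$, which is what makes the kernel trick below possible.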
Slide 44
Non-linear Separation of Datasets
Linear separation is impossible in most problems, motivating non-linear separation.
(Illustration from Prof. Mohri's lecture notes.)
Slide 45
Non-separable Datasets
Solutions:
1) Non-linear classifiers
2) Increase the dimensionality of the dataset and add a non-linear mapping Φ
Slide 46
Kernel Trick
A kernel acts as a "similarity measure" between two data samples.
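A minimal usage sketch (not part of the slides, assuming scikit-learn, whose SVC class is backed by the LibSVM library demoed in this lecture):

```python
# Minimal sketch: RBF-kernel SVM on a toy non-linearly-separable dataset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (np.sum(X ** 2, axis=1) < 1.0).astype(int)   # label = inside the unit circle

clf = SVC(kernel="rbf", gamma=1.0, C=10.0)        # K(x, z) = exp(-gamma ||x - z||^2)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
```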
Slide 47
Kernel Trick Illustrated
[The illustration is an image not preserved in this transcript.]
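A standard example of what such an illustration shows (a reconstruction, not the original figure): for $x \in \mathbb{R}^2$, the degree-2 polynomial feature map satisfies

```latex
\Phi(x) = (x_1^2, \; \sqrt{2}\, x_1 x_2, \; x_2^2)
\Phi(x) \cdot \Phi(z) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (x \cdot z)^2
```

so the kernel $K(x, z) = (x \cdot z)^2$ evaluates an inner product in feature space without ever computing $\Phi$ explicitly.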
Slide 48
Curse of Dimensionality Due to the Non-Linear Mapping
Slide 49
Positive Symmetric Definite Kernels (Mercer Condition)
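The slide body is an image; as a hedged reconstruction, the condition it names requires the symmetric kernel $K$ to be positive semi-definite:

```latex
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j K(x_i, x_j) \geq 0 \quad \text{for all } n, \; x_1, \dots, x_n, \; c \in \mathbb{R}^n
```

which guarantees that $K(x, z) = \Phi(x) \cdot \Phi(z)$ for some feature map $\Phi$.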
Slide 50
Advantages of SVM
Work very well...
Error bounds easy to obtain:
generalization error is small and predictable
Fool-proof method:
(mostly) three kernels to choose from:
  Gaussian
  linear and polynomial
  sigmoid
Very small number of parameters to optimize
Slide 51
Limitations of SVM
Size limitation:
the size of the kernel matrix is quadratic in the number of training vectors
Speed limitations:
1) During training: a very large quadratic programming problem is solved numerically
Solutions:
  Chunking
  Sequential Minimal Optimization (SMO), which breaks the QP problem into many small QP problems solved analytically
  Hardware implementations
2) During testing: prediction cost scales with the number of support vectors
Solution: online SVM