Support Vector Machines

Overview
Proposed by Vapnik and his colleagues
- Started in 1963, taking shape in the late 70's as part of his statistical learning theory (with Chervonenkis)
- Current form established in the early 90's (with Cortes)
Became popular in the last decade
- Classification, regression (function approximation), optimization
- Compares favorably to the multi-layer perceptron (MLP)
Basic ideas
- Overcome the linear separability problem by transforming the problem into a higher-dimensional space using kernel functions
- (becomes equivalent to a 2-layer perceptron when the kernel is a sigmoid function)
- Maximize the margin of the decision boundary
Linear Classifiers
[Scatter plot of datapoints labeled +1 and -1; diagram of input x fed to classifier f with parameters (w, b), producing estimate y_est]
f(x, w, b) = sign(w · x + b)
How would you classify this data?
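As a concrete illustration of this decision rule, here is a minimal Python sketch; the weight vector and bias below are made-up values, not a trained classifier.

```python
# A minimal sketch of the linear classifier f(x, w, b) = sign(w . x + b);
# w and b here are illustrative assumptions, not learned values.
import numpy as np

def f(x, w, b):
    # +1 on one side of the hyperplane w.x + b = 0, -1 on the other
    return np.sign(w @ x + b)

w = np.array([1.0, -2.0])   # illustrative weight vector
b = 0.5                     # illustrative bias
print(f(np.array([3.0, 1.0]), w, b))   # -> 1.0
```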
Linear Classifiers
[Scatter plot: several candidate separating lines drawn through the +1 / -1 points]
f(x, w, b) = sign(w · x + b)
Any of these would be fine..
..but which is best?
Classifier Margin
[Scatter plot: a separating line with its margin shaded on both sides]
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
Maximum Margin (Linear SVM)
[Scatter plot: the maximum-margin separating line, with the datapoints on the margin circled]
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin.
This is the simplest kind of SVM (called an LSVM).
Support vectors are those datapoints that the margin pushes up against.
Why Maximum Margin?
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM), and the support vectors are the datapoints that the margin pushes up against.
1. Intuitively this feels safest.
2. If we've made a small error in the location of the boundary (it's been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. CV is easy, since the model is immune to removal of any non-support-vector datapoints.
4. There's some theory that this is a good thing.
5. Empirically it works very very well.
Specifying a line and margin
[Diagram: the Classifier Boundary between a Plus-Plane and a Minus-Plane, with a "Predict Class = +1" zone on one side and a "Predict Class = -1" zone on the other]
How do we represent this mathematically? …in m input dimensions?
Specifying a line and margin
Conditions for an optimal separating hyperplane for data points (x_1, y_1), …, (x_l, y_l), where y_i = ±1:
1. w · x_i + b ≥ +1 if y_i = +1 (points in the plus class)
2. w · x_i + b ≤ -1 if y_i = -1 (points in the minus class)
[Diagram: the Plus-Plane w·x + b = +1, the Classifier Boundary w·x + b = 0, and the Minus-Plane w·x + b = -1, bounding the "Predict Class = +1" and "Predict Class = -1" zones]
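A small sketch of checking these two conditions for a candidate (w, b); the data and candidate values below are illustrative assumptions.

```python
# Check conditions 1 and 2 for a candidate (w, b) on a toy dataset.
import numpy as np

def satisfies_margin(X, y, w, b):
    # y_i (w . x_i + b) >= 1 expresses both conditions at once
    return bool(np.all(y * (X @ w + b) >= 1.0))

X = np.array([[2.0, 2.0], [2.0, 1.0], [0.0, 0.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])
print(satisfies_margin(X, y, w=np.array([1.0, 1.0]), b=-1.5))   # -> True
```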
Computing the margin width
Plus-plane  = { x : w · x + b = +1 }
Minus-plane = { x : w · x + b = -1 }
Claim: the vector w is perpendicular to the Plus-Plane. Why?
Let u and v be two vectors on the Plus-Plane. What is w · (u − v)?
And so of course the vector w is also perpendicular to the Minus-Plane.
[Diagram: the planes w·x + b = +1, 0, -1 and the margin width M. How do we compute M in terms of w and b?]
Computing the margin width
Plus-plane  = { x : w · x + b = +1 }
Minus-plane = { x : w · x + b = -1 }
The vector w is perpendicular to the Plus-Plane.
Let x⁻ be any point on the minus plane, and let x⁺ be the closest plus-plane point to x⁻. (These are any locations in Rᵐ, not necessarily datapoints.)
Claim: x⁺ = x⁻ + λw for some value of λ. Why?
The line from x⁻ to x⁺ is perpendicular to the planes, so to get from x⁻ to x⁺ travel some distance in direction w.
Computing the margin width
What we know:
  w · x⁺ + b = +1
  w · x⁻ + b = -1
  x⁺ = x⁻ + λw
  |x⁺ − x⁻| = M
It's now easy to get M in terms of w and b:
  w · (x⁻ + λw) + b = 1
  ⟹ w · x⁻ + b + λ w·w = 1
  ⟹ -1 + λ w·w = 1
  ⟹ λ = 2 / (w·w)
Computing the margin width
What we know:
  w · x⁺ + b = +1
  w · x⁻ + b = -1
  x⁺ = x⁻ + λw
  |x⁺ − x⁻| = M
  λ = 2 / (w·w)
So the margin width is
  M = |x⁺ − x⁻| = |λw| = λ √(w·w) = 2 / √(w·w)
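A quick numeric check of M = 2/√(w·w): pick an arbitrary (illustrative) w and b, walk from a point on the minus plane along w, and measure the distance to the plus plane.

```python
# Verify the margin-width derivation numerically for an arbitrary w, b.
import numpy as np

w = np.array([3.0, 4.0])          # any nonzero w will do (illustrative)
b = -2.0
lam = 2.0 / (w @ w)               # lambda from the derivation above
x_minus = np.array([0.0, (-1 - b) / w[1]])   # a point with w.x + b = -1
x_plus = x_minus + lam * w
print(w @ x_plus + b)                        # -> 1.0, so x_plus is on the plus plane
print(np.linalg.norm(x_plus - x_minus),      # measured margin M
      2.0 / np.sqrt(w @ w))                  # -> both 0.4
```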
Learning the Maximum Margin Classifier
Given a guess of w and b we can
- Compute whether all data points are in the correct half-planes
- Compute the width of the margin, M = 2 / √(w·w)
So now we just need to write a program to search the space of w's and b's to find the widest margin that matches all the datapoints. How?
Gradient descent? Simulated annealing? Matrix inversion? EM? Newton's method?
Learning via Quadratic Programming
The optimal separating hyperplane can be found by solving the dual problem
  maximize  Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)
  subject to  α_i ≥ 0 for all i, and Σ_i α_i y_i = 0
- This is a quadratic function of the α_i
- Once the α_i are found, the weight vector is w = Σ_i α_i y_i x_i and the decision function is f(x) = sign(Σ_i α_i y_i (x_i · x) + b)
This optimization problem can be solved by quadratic programming.
QP is a well-studied class of optimization algorithms for maximizing a quadratic function subject to linear constraints.
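As a hedged illustration, here is a minimal hard-margin SVM trained by handing this dual to CVXOPT's QP solver (one possible solver choice; the toy data is made up).

```python
# Hard-margin linear SVM via the dual QP, using CVXOPT's solvers.qp,
# which minimizes (1/2)a'Pa + q'a s.t. Ga <= h, Aa = b.
import numpy as np
from cvxopt import matrix, solvers

def train_linear_svm(X, y):
    """X: (R, m) datapoints; y: (R,) labels in {-1, +1}."""
    R = X.shape[0]
    K = X @ X.T                                        # Gram matrix x_i . x_j
    P = matrix(np.outer(y, y) * K + 1e-8 * np.eye(R))  # tiny ridge for stability
    q = matrix(-np.ones(R))                            # minimize (1/2)a'Pa - sum(a)
    G = matrix(-np.eye(R))                             # -alpha_i <= 0, i.e. alpha_i >= 0
    h = matrix(np.zeros(R))
    A = matrix(y.reshape(1, -1).astype(float))         # sum_i alpha_i y_i = 0
    b = matrix(0.0)
    solvers.options["show_progress"] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    sv = alpha > 1e-6                                  # the support vectors
    w = (alpha[sv] * y[sv]) @ X[sv]                    # w = sum_i alpha_i y_i x_i
    b0 = np.mean(y[sv] - X[sv] @ w)                    # from y_s (w . x_s + b) = 1
    return w, b0

X = np.array([[2.0, 2.0], [2.0, 1.0], [1.5, 2.0],
              [0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])     # illustrative toy data
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, b0 = train_linear_svm(X, y)
print(w, b0)   # the maximum-margin line is w . x + b0 = 0
```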
Quadratic Programming
Find  arg max over u of the quadratic criterion  c + dᵀu + uᵀRu / 2
Subject to  n additional linear inequality constraints  (a_i · u ≤ b_i for i = 1 … n)
And subject to  e additional linear equality constraints  (a_j · u = b_j for j = n+1 … n+e)
There exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent.
(But they are very fiddly… you probably don't want to write one yourself.)
Learning the Maximum Margin Classifier
Given a guess of w, b we can
- Compute whether all data points are in the correct half-planes
- Compute the margin width, M = 2 / √(w·w)
Assume R datapoints, each (x_k, y_k) where y_k = ±1.
What should our quadratic optimization criterion be? How many constraints will we have? What should they be?
(Answer: minimize w·w subject to the R constraints y_k (w · x_k + b) ≥ 1.)
Suppose we're in 1 dimension
What would SVMs do with this data?
[1-d datapoints of both classes laid out along the x-axis]
Suppose we're in 1 dimension
Not a big surprise: [the maximum-margin separator is a single point on the axis, with the positive "plane" on one side and the negative "plane" on the other]
Harder 1-dimensional dataset
That's wiped the smirk off SVM's face. What can be done about this?
[1-d datapoints along the x-axis that no single threshold can separate]
Harder 1-dimensional dataset
Remember how permitting non-linear basis functions made linear regression so much nicer? Let's permit them here too (see the sketch below).
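A minimal sketch of the idea, assuming the usual quadratic basis z_k = (x_k, x_k²) and an illustrative dataset with the negatives clustered in the middle: data that no threshold separates on the line becomes linearly separable in the plane.

```python
# Lift 1-d points to 2-d with the basis map x -> (x, x^2).
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])   # illustrative 1-d data
y = np.array([1, 1, -1, -1, -1, 1, 1])                 # negatives in the middle

Z = np.column_stack([x, x ** 2])            # z_k = (x_k, x_k^2)
# In the lifted space the horizontal line z2 = 2 separates the classes:
print(np.all((Z[:, 1] > 2) == (y == 1)))    # -> True
```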
Common SVM basis functions
z_k = ( polynomial terms of x_k of degree 1 to q )
z_k = ( radial basis functions of x_k )
z_k = ( sigmoid functions of x_k )
Explosion of feature space dimensionality
• Consider a degree-2 polynomial kernel function z = φ(x) for a data point x = (x_1, x_2, …, x_n):
  z_1 = x_1, …, z_n = x_n
  z_{n+1} = (x_1)², …, z_{2n} = (x_n)²
  z_{2n+1} = x_1 x_2, …, z_N = x_{n−1} x_n
  where N = n(n + 3)/2
• When constructing polynomials of degree 5 for a 256-dimensional input space, the feature space is roughly 10-billion-dimensional
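Both counts above can be checked exactly with math.comb (the degree-5 count here tallies all monomials of degree ≤ 5, constant term included):

```python
# Verify the feature-space dimensionality claims.
from math import comb

n = 256
print(n * (n + 3) // 2)    # degree-2 feature count N: 33152
print(comb(n + 5, 5))      # monomials of degree <= 5: 9,711,475,137 (~10 billion)
```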
Kernel trick
Instead of computing φ(x) explicitly, use a kernel function that returns the feature-space dot product directly: K(a, b) = φ(a) · φ(b).
Example: polynomial kernel  K(a, b) = (a · b + 1)^d
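A sketch verifying the trick for d = 2: (a·b + 1)² equals φ(a)·φ(b) for an explicit degree-2 feature map φ (the map and test vectors below are illustrative).

```python
# The polynomial kernel computes a huge dot product without building phi(x).
import numpy as np
from itertools import combinations

def phi(x):
    # 1, linear, squared, and cross terms, scaled so dot products match
    n = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i, j in combinations(range(n), 2)]
    return np.concatenate([[1.0], np.sqrt(2) * x, x ** 2, cross])

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0])
print((a @ b + 1) ** 2)        # kernel: O(n) work   -> 30.25
print(phi(a) @ phi(b))         # explicit features: O(n^2) work, same value
```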
Kernel trick + QP
The max margin classifier can be found by solving
  maximize  Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
  subject to  α_i ≥ 0 for all i, and Σ_i α_i y_i = 0
The weight vector w = Σ_i α_i y_i φ(x_i) never needs to be computed and stored, because the decision function is
  f(x) = sign(Σ_i α_i y_i K(x_i, x) + b)
SVM Kernel Functions
Use kernel functions which compute K(a, b) = φ(a) · φ(b).
K(a, b) = (a ⋅ b + 1)^d is an example of an SVM polynomial kernel function.
Beyond polynomials there are other very high dimensional basis functions that can be made practical by finding the right kernel function:
- Radial-basis-style kernel function: K(a, b) = exp(−|a − b|² / (2σ²))
- Neural-net-style kernel function: K(a, b) = tanh(κ a ⋅ b − δ)
σ, κ and δ are magic parameters that must be chosen by a model selection method such as CV or VCSRM.
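Sketches of the three kernels as plain functions (the default σ, κ, δ values are illustrative, not tuned); any of them can replace the dot product K(x_i, x_j) in the dual QP sketch earlier.

```python
# Common SVM kernels as plain Python functions on vectors a, b.
import numpy as np

def poly_kernel(a, b, d=2):
    return (a @ b + 1.0) ** d

def rbf_kernel(a, b, sigma=1.0):
    diff = a - b
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

def tanh_kernel(a, b, kappa=1.0, delta=0.0):
    return np.tanh(kappa * (a @ b) - delta)
```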
SVM Performance
Anecdotally they work very very well indeed.
- Overcomes the linear separability problem by transforming the input space to a higher-dimensional feature space
- Overcomes the dimensionality explosion via the kernel trick
- Generalizes well (overfitting is not as serious)
- Finds the maximum margin separator by quadratic programming
Example: currently the best-known classifier on a well-studied hand-written-character recognition benchmark.
Several reliable people doing practical real-world work claim that SVMs have saved them when their other favorite classifiers did poorly.
Hand-written character recognition
• MNIST: a data set of hand-written digits
  − 60,000 training samples
  − 10,000 test samples
  − Each sample consists of 28 × 28 = 784 pixels
• Various techniques have been tried (failure rate on the test samples):
  − Linear classifier: 12.0%
  − 2-layer BP net (300 hidden nodes): 4.7%
  − 3-layer BP net (300+200 hidden nodes): 3.05%
  − Support vector machine (SVM): 1.4%
  − Convolutional net: 0.4%
  − 6-layer BP net (7500 hidden nodes): 0.35%
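A hedged sketch of running an SVM on MNIST with scikit-learn; fetch_openml and SVC are real APIs, but the hyperparameters below are illustrative guesses, not the settings behind the 1.4% figure (training on all 60,000 samples is slow; subsample to experiment).

```python
# Train an RBF-kernel SVM on MNIST and report the test failure rate.
from sklearn.datasets import fetch_openml
from sklearn.svm import SVC

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0                                   # scale pixels to [0, 1]
X_train, y_train = X[:60000], y[:60000]
X_test, y_test = X[60000:], y[60000:]

clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # illustrative hyperparameters
clf.fit(X_train, y_train)
print("test error: %.2f%%" % (100 * (1 - clf.score(X_test, y_test))))
```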
SVM Performance
There is a lot of excitement and religious fervor about SVMs as of 2001. Despite this, some practitioners are a little skeptical.
Doing multi-class classification
SVMs can only handle two-class outputs (i.e. a categorical output variable with arity 2).
To extend to output arity N, learn N SVMs:
- SVM 1 learns "Output==1" vs "Output != 1"
- SVM 2 learns "Output==2" vs "Output != 2"
  ⋮
- SVM N learns "Output==N" vs "Output != N"
SVMs can also be extended to compute real-valued functions (regression).
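A minimal one-vs-rest wrapper following the recipe above (scikit-learn's OneVsRestClassifier packages the same idea; this sketch is illustrative).

```python
# One binary SVM per class; predict the class whose SVM is most confident.
import numpy as np
from sklearn.svm import SVC

class OneVsRestSVM:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One binary SVM per class c: "Output == c" (+1) vs "Output != c" (-1)
        self.models_ = {c: SVC(kernel="rbf").fit(X, np.where(y == c, 1, -1))
                        for c in self.classes_}
        return self

    def predict(self, X):
        # Largest decision value wins
        scores = np.column_stack([self.models_[c].decision_function(X)
                                  for c in self.classes_])
        return self.classes_[np.argmax(scores, axis=1)]
```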
References
An excellent tutorial on VC-dimension and Support Vector Machines:
C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
http://citeseer.nj.nec.com/burges98tutorial.html
The VC/SRM/SVM Bible:
V. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
Download SVM-light:
http://svmlight.joachims.org/