Support Vector Machines
Machine Learning Group
Department of Computer Sciences
University of Texas at Austin
Perceptron Revisited: Linear Separators
• Binary classification can be viewed as the task of separating classes in feature space. The separating hyperplane is wTx + b = 0, with wTx + b > 0 on one side and wTx + b < 0 on the other, and the classifier is f(x) = sign(wTx + b).
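To make the decision rule concrete, here is a minimal NumPy sketch (our addition; the weights w and bias b are made-up values, not from the slides):

```python
import numpy as np

# Hypothetical separator parameters -- chosen arbitrarily for the sketch.
w = np.array([2.0, -1.0])
b = -0.5

def f(x):
    """Linear classifier f(x) = sign(w^T x + b)."""
    return np.sign(x @ w + b)

print(f(np.array([1.0, 0.0])))   # w^T x + b =  1.5 -> +1.0
print(f(np.array([0.0, 1.0])))   # w^T x + b = -1.5 -> -1.0
```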
Linear Separators
• Which of the linear separators is optimal?
Classification Margin
• Distance from example xi to the separator is r = (wTxi + b) / ||w||.
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.
[Figure: the separating hyperplane, the distance r from an example to it, and the margin ρ.]
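A small NumPy sketch (our addition, with a made-up separator and points) makes the geometry concrete: the signed distance of each example is r = (wTxi + b) / ||w||, and the margin is twice the smallest absolute distance:

```python
import numpy as np

w = np.array([2.0, -1.0])      # hypothetical separator
b = -0.5
X = np.array([[1.0, 0.0],      # illustrative examples
              [0.0, 1.0],
              [2.0, 1.0]])

# Signed distance of each example from the hyperplane w^T x + b = 0.
r = (X @ w + b) / np.linalg.norm(w)

# The closest examples are the support vectors; the margin rho is
# twice their distance to the hyperplane.
rho = 2 * np.min(np.abs(r))
print(r, rho)
```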
Maximum Margin Classification
• Maximizing the margin is good both intuitively and according to PAC theory.
• It implies that only support vectors matter; other training examples can be ignored.
Linear SVM Mathematically
• Let the training set {(xi, yi)}i=1..n, xi ∈ Rd, yi ∈ {-1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):
wTxi + b ≤ -ρ/2 if yi = -1
wTxi + b ≥ ρ/2 if yi = 1
or, equivalently, yi(wTxi + b) ≥ ρ/2.
• For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is r = ys(wTxs + b) / ||w|| = 1 / ||w||.
• Then the margin can be expressed through the (rescaled) w and b as ρ = 2r = 2 / ||w||.
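A quick numeric check of the rescaling step (our addition; the separator and the "support vector" are made up): dividing w and b by the slide's ρ/2 turns the constraint into the equality ys(wTxs + b) = 1 while leaving the geometric distance unchanged:

```python
import numpy as np

w, b = np.array([2.0, -1.0]), -0.5        # hypothetical separator
xs, ys = np.array([1.0, 0.0]), 1.0        # pretend xs is a support vector

half_rho = ys * (w @ xs + b)              # the slide's rho/2 for this separator
w2, b2 = w / half_rho, b / half_rho       # rescale w and b by rho/2
print(ys * (w2 @ xs + b2))                # 1.0: the equality after rescaling
print(1 / np.linalg.norm(w2))             # distance of xs from the hyperplane
print(ys * (w @ xs + b) / np.linalg.norm(w))  # same distance before rescaling
```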
Linear SVMs Mathematically (cont.)
• Then we can formulate the quadratic optimization problem:
Find w and b such that
ρ = 2 / ||w|| is maximized
and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1
Which can be reformulated as:
Find w and b such that
Φ(w) = ||w||2 = wTw is minimized
and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1
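The minimization form can be handed directly to a convex-optimization package. A minimal sketch (our addition, assuming cvxpy is installed; the four training points are made up and linearly separable):

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Minimize w^T w subject to y_i (w^T x_i + b) >= 1 for all i.
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()

print(w.value, b.value)
print("margin rho =", 2 / np.linalg.norm(w.value))
```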
Solving the Optimization Problem
• Need to optimize a quadratic function subject to linear constraints.
• Quadratic optimization problems are a well-known class of mathematical
programming problems for which several (non-trivial) algorithms exist.
• The solution involves constructing a dual problem where a Lagrange
multiplier αi is associated with every inequality constraint in the primal
(original) problem:
Find w and b such that
Φ(w) = wTw is minimized
and for all (xi, yi), i=1..n : yi(wTxi + b) ≥ 1

Find α1…αn such that
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
The Optimization Problem Solution
• Given a solution α1…αn to the dual problem, the solution to the primal is:
w = Σαiyixi
b = yk - ΣαiyixiTxk for any k with αk > 0
• Each non-zero αi indicates that the corresponding xi is a support vector.
• Then the classifying function is (note that we don't need w explicitly):
f(x) = ΣαiyixiTx + b
• Notice that it relies on an inner product between the test point x and the support vectors xi – we will return to this later.
• Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all training points.
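Putting the last two slides together, a sketch (our addition, on the same toy data and again assuming cvxpy) solves the dual and recovers w, b, and the classifier. Writing the quadratic term αᵀPα as ||MTα||2, with M the matrix whose rows are yixi, keeps the objective in a form the solver accepts:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

# Dual objective: sum(alpha) - 1/2 alpha^T P alpha, with P_ij = y_i y_j x_i^T x_j.
# Since P = M M^T where M holds rows y_i x_i, the quadratic term is ||M^T alpha||^2.
M = y[:, None] * X
alpha = cp.Variable(n)
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(M.T @ alpha))
cp.Problem(objective, [alpha >= 0, y @ alpha == 0]).solve()

a = alpha.value
w = (a * y) @ X                      # w = sum_i alpha_i y_i x_i
k = int(np.argmax(a))                # any index with alpha_k > 0
b = y[k] - X[k] @ w                  # b = y_k - w^T x_k

def f(x):
    """f(x) = sum_i alpha_i y_i x_i^T x + b -- no explicit w needed."""
    return np.sign((a * y) @ (X @ x) + b)

print(np.round(a, 3))                # non-zero entries mark the support vectors
print(f(np.array([3.0, 1.0])))       # a point on the positive side
```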
Non-linear SVMs
• Datasets that are linearly separable with some noise work out great:
• But what are we going to do if the dataset is just too hard?
• How about… mapping data to a higher-dimensional space:
[Figures: 1-D datasets plotted on axis x; the hard dataset becomes separable after mapping to the two axes (x, x2).]
Non-linear SVMs: Feature spaces
• General idea: the original feature space can always be mapped to some
higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
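For example (a sketch we added, with synthetic 1-D data), the mapping φ(x) = (x, x2) makes a dataset that no single threshold on x can separate linearly separable:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: positives in the middle, negatives on both sides --
# not separable by any single threshold on x.
x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Map to the higher-dimensional feature space phi(x) = (x, x^2).
Phi = np.column_stack([x, x ** 2])

# A linear separator in the mapped space classifies the set perfectly.
clf = SVC(kernel="linear", C=1e6).fit(Phi, y)   # large C ~ hard margin
print(clf.score(Phi, y))                         # 1.0
```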
Non-linear SVMs Mathematically
• Dual problem formulation:
• The solution is:
• Optimization techniques for finding αi’s remain the same!
Find α1…αn such that
Q(α) = Σαι - ½ΣΣαiαjyiyjK(xi, xj) is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
f(x) = ΣαiyiK(xi, x) + b
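As a sketch of what this buys us (our addition; the Gaussian kernel and the placeholder α, b values are illustrative choices), the classifier needs only kernel evaluations K(xi, x), never the mapped vectors φ(x):

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision(x, X, y, alpha, b):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b -- only kernel values are used."""
    s = sum(a_i * y_i * rbf_kernel(x_i, x)
            for a_i, y_i, x_i in zip(alpha, y, X))
    return np.sign(s + b)

# Hypothetical values standing in for a trained model.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([1.0, 1.0])
print(decision(np.array([0.1, 0.0]), X, y, alpha, b=0.0))   # +1.0
```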
SVM applications
• SVMs were originally proposed by Boser, Guyon and Vapnik in 1992 and gained increasing popularity in the late 1990s.
• SVMs are currently among the best performers for a number of classification tasks ranging from text to genomic data.
• SVMs can be applied to complex data types beyond feature vectors (e.g. graphs, sequences, relational data) by designing kernel functions for such data.
• SVM techniques have been extended to a number of tasks such as regression [Vapnik et al. '97], principal component analysis [Schölkopf et al. '99], etc.
• The most popular optimization algorithms for SVMs use decomposition to hill-climb over a subset of αi's at a time, e.g. SMO [Platt '99] and [Joachims '99].
• Tuning SVMs remains a black art: selecting a specific kernel and its parameters is usually done in a try-and-see manner.
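In practice that try-and-see loop is usually automated with a cross-validated grid search over kernels and parameters. A minimal scikit-learn sketch (our addition; the dataset and grid values are arbitrary choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Candidate kernels and parameters; 5-fold cross-validation picks the best.
param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.001],
}
search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```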