LECTURE 16: SUPPORT VECTOR MACHINES
ECE 8443 – Pattern Recognition
• Objectives:
Empirical Risk Minimization
Large-Margin Classifiers
Soft Margin Classifiers
SVM Training
Relevance Vector Machines
• Resources:
DML: Introduction to SVMs
AM: SVM Tutorial
JP: SVM Resources
OC: Taxonomy
NC: SVM Tutorial
[Figure: two-class example data (Class 1, Class 2)]
ECE 8443: Lecture 16, Slide 1
Generative Models
• Thus far we have essentially considered techniques that perform classification
indirectly by modeling the training data, optimizing the parameters of that
model, and then performing classification by choosing the closest model. This
approach is known as a generative model: supervised training of such models
assumes we know the form of the underlying density function, which is often
not true in real applications.
• Convergence in maximum likelihood does
not guarantee optimal classification.
• Gaussian MLE modeling tends to
overfit data.
[Figure: ML decision boundary (MLE Gaussian fit) compared with the optimal decision boundary]
[Figure: discrimination vs. class-dependent PCA]
• Real data often not separable by hyperplanes.
• Goal: balance representation and discrimination
in a common framework (rather than alternating
between the two).
ECE 8443: Lecture 16, Slide 2
Risk Minimization
• The expected risk can be defined as:
$R(\alpha) = \int \tfrac{1}{2}\,\lvert y - f(\mathbf{x}, \alpha)\rvert \; dP(\mathbf{x}, y)$
• Empirical risk is defined as:
$R_{emp}(\alpha) = \frac{1}{2l}\sum_{i=1}^{l} \lvert y_i - f(\mathbf{x}_i, \alpha)\rvert$
• These are related by the Vapnik-Chervonenkis (VC) dimension:
$R(\alpha) \le R_{emp}(\alpha) + f(h), \qquad f(h) = \sqrt{\frac{h\,(\log(2l/h) + 1) - \log(\eta/4)}{l}}$
where $f(h)$ is referred to as the VC confidence and $\eta$ is a confidence measure in
the range [0,1].
• The VC dimension, h, is a measure of the capacity of the learning machine.
• The principle of structural risk minimization (SRM) involves finding the subset
of functions that minimizes the bound on the actual risk.
• Optimal hyperplane classifiers achieve zero empirical risk for linearly
separable data.
• A Support Vector Machine is an approach that gives the least upper bound on
the risk.
[Figure: the bound on the expected risk is the sum of the empirical risk and the confidence in the risk; plotted against VC dimension, the expected risk reaches its optimum where the two terms balance]
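The bound above is straightforward to evaluate numerically. Below is a minimal Python sketch of the VC confidence and the resulting risk bound; the sample size l, capacity h, and confidence level eta are illustrative values, not from the lecture.

import math

def vc_confidence(h, l, eta=0.05):
    """VC confidence f(h) for capacity h, sample size l, confidence level eta."""
    return math.sqrt((h * (math.log(2 * l / h) + 1) - math.log(eta / 4)) / l)

def risk_bound(emp_risk, h, l, eta=0.05):
    """Upper bound on the expected risk: R <= R_emp + f(h)."""
    return emp_risk + vc_confidence(h, l, eta)

# For a fixed sample size, the bound grows as the capacity h increases.
for h in (10, 100, 1000):
    print(h, round(risk_bound(0.05, h, l=10000), 3))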
ECE 8443: Lecture 16, Slide 3
Large-Margin Classification
• Hyperplanes C0 - C2 achieve perfect classification
(zero empirical risk):
 C0 is optimal in terms of generalization.
 The data points that define the boundary
are called support vectors.
 A hyperplane can be defined by:
 We will impose the constraints:
The data points that satisfy the equality are
called support vectors.
• Support vectors are found using a constrained optimization:
• The final classifier is computed using the support vectors and the weights:
Hyperplane: $\mathbf{w} \cdot \mathbf{x} + b = 0$
Constraints: $y_i(\mathbf{x}_i \cdot \mathbf{w} + b) - 1 \ge 0$
Primal Lagrangian (constrained optimization):
$L_p = \frac{1}{2}\lVert\mathbf{w}\rVert^2 - \sum_{i=1}^{N} \alpha_i\, y_i(\mathbf{x}_i \cdot \mathbf{w} + b) + \sum_{i=1}^{N} \alpha_i$
Final classifier: $f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i (\mathbf{x}_i \cdot \mathbf{x}) + b$
[Figure: class 1 and class 2 separated by candidate hyperplanes C0–C2; the margin hyperplanes H1 and H2 pass through the support vectors, and C0 is the optimal classifier]
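To make this concrete, here is a minimal sketch (assuming scikit-learn is available; the toy points are made up for illustration) that fits a nearly hard-margin linear SVM and reads back the support vectors, the weights $\alpha_i y_i$, and the bias $b$ used by the final classifier.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],    # class 1
              [4.0, 4.5], [5.0, 5.0], [4.5, 3.5]])   # class 2
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

# f(x) = sum_i alpha_i y_i (x_i . x) + b, using only the support vectors
print("support vectors:", clf.support_vectors_)
print("alpha_i * y_i:  ", clf.dual_coef_)
print("bias b:         ", clf.intercept_)
print("decision values:", clf.decision_function(X))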
ECE 8443: Lecture 16, Slide 4
[Figure: overlapping Class 1 and Class 2 data with a soft-margin decision boundary]
Soft-Margin Classification
• In practice, the number of support vectors will grow unacceptably large for
real problems with large amounts of data.
• Also, the system will be very sensitive to mislabeled training data or outliers.
• Solution: introduce “slack variables”
or a soft margin:
This gives the system the ability to
ignore data points near the boundary,
and effectively pushes the margin
towards the centroid of the training data.
• This is now a constrained optimization
with an additional constraint:
• The solution to this problem can still be found using Lagrange multipliers.
$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$
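A small sketch (scikit-learn assumed; the overlapping toy data are made up) of how the soft-margin trade-off parameter C behaves: smaller C tolerates more margin violations and typically retains more support vectors.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)          # overlapping classes

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.n_support_.sum()} support vectors")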
ECE 8443: Lecture 16, Slide 5
Nonlinear Decision Surfaces
[Figure: the mapping $\phi(\cdot)$ carries points from the input space into a higher-dimensional feature space]
• Thus far we have only considered linear decision surfaces. How do we
generalize this to a nonlinear surface?
• Our approach will be to transform the data to a higher dimensional space
where the data can be separated by a linear surface.
• Define a kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$
• Examples of kernel functions include the polynomial kernel: $K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^t \mathbf{x}_j + 1)^d$
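As a quick check of the kernel identity above, a small numpy sketch (the two sample points are made up) verifies that the degree-2 polynomial kernel equals an explicit dot product in a feature space built from the squared and cross terms.

import numpy as np

def poly_kernel(xi, xj, d=2):
    # K(xi, xj) = (xi . xj + 1)^d, evaluated entirely in the input space
    return (xi @ xj + 1) ** d

def phi_deg2(x):
    # Explicit degree-2 feature map for 2-D input: phi(x) . phi(z) = (x . z + 1)^2
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

xi, xj = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(poly_kernel(xi, xj))          # kernel value in the input space
print(phi_deg2(xi) @ phi_deg2(xj))  # same value as a dot product in feature space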
ECE 8443: Lecture 16, Slide 6
Kernel Functions
Other popular kernels are a radial basis function (popular in neural networks):
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\!\left(-\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2 / (2\sigma^2)\right)$
and a sigmoid function:
$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(k\,\mathbf{x}_i^t \mathbf{x}_j - \delta)$
• Our optimization does not change significantly:
$\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$
subject to $C \ge \alpha_i \ge 0$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$.
• The final classifier has a similar form:
$f(\mathbf{x}) = \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$
• Let’s work some examples.
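As one such example, a short sketch (scikit-learn assumed; the XOR-style toy data are made up) contrasts a linear SVM with an RBF-kernel SVM on data that no hyperplane in the input space can separate.

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]] * 10, dtype=float)
X += np.random.default_rng(1).normal(0, 0.05, X.shape)    # jitter the XOR corners
y = np.array([-1, 1, 1, -1] * 10)                          # XOR labels

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=10.0, gamma=2.0).fit(X, y)  # gamma is ignored by the linear kernel
    print(kernel, "training accuracy:", clf.score(X, y))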
ECE 8443: Lecture 16, Slide 7
SVM Limitations
[Figure: training-set error decreases monotonically with model complexity, while open-loop (test) error reaches a minimum at the optimum complexity and then rises]
• Uses a binary (yes/no) decision rule
• Generates a distance from the hyperplane, but this distance is often not a
good measure of our “confidence” in the classification
• Can produce a “probability” as a function of the distance (e.g., using
sigmoid fits), but such estimates are generally inadequate
• Number of support vectors grows linearly with the size of the data set
• Requires the estimation of trade-off parameter, C, via held-out sets
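As the last bullet notes, C must be tuned on data not used for training. A minimal sketch (scikit-learn assumed; the grid of C values and the toy data are illustrative) using cross-validation in place of a dedicated held-out set:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(1.5, 1, (100, 2))])
y = np.array([-1] * 100 + [1] * 100)

search = GridSearchCV(SVC(kernel="rbf"), {"C": [0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"],
      "cross-validated accuracy:", round(search.best_score_, 3))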
ECE 8443: Lecture 16, Slide 8
Evidence Maximization
• Build a fully specified probabilistic model – incorporate prior
information/beliefs as well as a notion of confidence in predictions.
• MacKay posed a special form for regularization in neural networks – sparsity.
• Evidence maximization: evaluate candidate models based on their
“evidence”, P(D|Hi).
• Evidence approximation:
• Likelihood of data given best fit parameter set:
• Penalty that measures how well our posterior model
fits our prior assumptions:
• We can set the prior in favor of sparse,
smooth models.
• Incorporates an automatic relevance
determination (ARD) prior over each weight.
Evidence approximation: $P(D|H_i) \approx P(D|\hat{\mathbf{w}}, H_i)\, P(\hat{\mathbf{w}}|H_i)\, \Delta\mathbf{w}$
Likelihood of the data given the best-fit parameters: $P(D|\hat{\mathbf{w}}, H_i)$
Penalty (Occam factor), comparing the posterior $P(\mathbf{w}|D, H_i)$ to the prior $P(\mathbf{w}|H_i)$: $P(\hat{\mathbf{w}}|H_i)\, \Delta\mathbf{w}$
ARD prior: $P(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$
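A tiny numpy sketch (the weight and precision values below are hypothetical, purely for illustration) of evaluating the ARD log-prior above; large precisions alpha_i shrink the corresponding weights toward zero, which is how sparsity is encouraged.

import numpy as np

def ard_log_prior(w, alpha):
    """log P(w | alpha) for independent zero-mean Gaussians with precisions alpha_i."""
    return np.sum(0.5 * (np.log(alpha) - np.log(2 * np.pi) - alpha * w ** 2))

w = np.array([0.0, 1.7, -0.4, 0.01])
alpha = np.array([1e3, 0.5, 2.0, 1e3])   # large precision => weight pruned toward 0
print(ard_log_prior(w, alpha))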
ECE 8443: Lecture 16, Slide 9
Relevance Vector Machines
• Still a kernel-based learning machine:
• Incorporates an automatic relevance determination (ARD) prior over each
weight (MacKay)
• A flat (non-informative) prior over α completes the Bayesian specification.
• The goal in training becomes finding:
• Estimation of the “sparsity” parameters is inherent in the optimization – no
need for a held-out set.
• A closed-form solution to this maximization problem is not available.
Rather, we iteratively re-estimate α.
Kernel expansion: $y(\mathbf{x}; \mathbf{w}) = w_0 + \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i)$
Logistic link: $P(t=1 \mid \mathbf{x}; \mathbf{w}) = \dfrac{1}{1 + e^{-y(\mathbf{x};\mathbf{w})}}$
ARD prior: $P(\mathbf{w}|\boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$
Training goal: $(\hat{\mathbf{w}}, \hat{\boldsymbol{\alpha}}) = \arg\max_{\mathbf{w},\boldsymbol{\alpha}}\; p(\mathbf{w}, \boldsymbol{\alpha} \mid \mathbf{t}, X)$, where
$p(\mathbf{w}, \boldsymbol{\alpha} \mid \mathbf{t}, X) = \dfrac{p(\mathbf{t} \mid \mathbf{w}, \boldsymbol{\alpha}, X)\; p(\mathbf{w}, \boldsymbol{\alpha} \mid X)}{p(\mathbf{t} \mid X)}$
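A minimal numpy sketch (illustrative only; it shows the RVM functional form, not the iterative training loop) of the kernel expansion passed through the logistic link. The training points and weights are hypothetical; with an ARD prior, most weights are driven to zero and only the “relevance vectors” contribute.

import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    return np.exp(-np.linalg.norm(a - b) ** 2 / (2 * sigma ** 2))

def rvm_posterior(x, X_train, w, w0, sigma=1.0):
    """P(t=1 | x; w) = sigmoid(w0 + sum_i w_i K(x, x_i))."""
    y = w0 + sum(wi * rbf_kernel(x, xi, sigma) for wi, xi in zip(w, X_train))
    return 1.0 / (1.0 + np.exp(-y))

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])
w = np.array([0.0, 1.7, -0.4])     # sparse: only two relevance vectors remain
print(rvm_posterior(np.array([1.2, 0.8]), X_train, w, w0=-0.1))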
ECE 8443: Lecture 16, Slide 10
Summary
• Support Vector Machines are one example of a kernel-based learning machine
that is trained in a discriminative fashion.
• Integrates notions of risk minimization, large-margin and soft margin
classification.
• Two fundamental innovations:
 maximize the margin between the classes using actual data points,
 transform the data into a higher-dimensional space in which the data is linearly
separable.
• Training can be computationally expensive but classification is very fast.
• Note that SVMs are inherently non-probabilistic (e.g., non-Bayesian).
• SVMs can be used to estimate posteriors by mapping the SVM output to a
likelihood-like quantity using a nonlinear function (e.g., sigmoid).
• SVMs are not inherently suited to an N-way classification problem. Typical
approaches include a pairwise (“one vs. one”) comparison or a “one vs. all” approach.
ECE 8443: Lecture 16, Slide 11
Summary
• Many alternate forms include Transductive SVMs, Sequential SVMs, Support
Vector Regression, Relevance Vector Machines, and data-driven kernels.
• Key lesson learned: a linear algorithm in the feature space is equivalent to a
nonlinear algorithm in the input space. Standard linear algorithms can be
generalized (e.g., kernel principal component analysis, kernel independent
component analysis, kernel canonical correlation analysis, kernel k-means).
• What we didn’t discuss:
 How do you train SVMs?
 Computational complexity?
 How to deal with large amounts of data?
See Ganapathiraju for an excellent, easy to understand discourse on SVMs
and Hamaker (Chapter 3) for a nice overview on RVMs. There are many other
tutorials available online (see the links on the title slide) as well.
 Other methods based on kernels – more to follow.