1. Machine Learning for Language Technology
Lecture 10: SVM and MIRA
Marina Santini
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Autumn 2014
Acknowledgement: Thanks to Prof. Joakim Nivre for course design and materials
7. Maximizing the margin
Linear Classifiers: Repetition & Extension
• The notion of margin: a way of predicting what will be a good separation on the test set.
• Intuitively, if we make the margin between the opposite groups as wide as possible, our chances of guessing correctly on the test set should increase.
• The generalization error on unseen test data is proportional to the inverse of the margin: the larger the margin, the smaller the generalization error (made precise below).
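For reference, here is the standard definition of the geometric margin of a linear classifier; the notation (weight vector w, labeled examples (x_i, y_i) with y_i in {-1, +1}) is assumed here, not taken from the slides:

```latex
% Geometric margin of example (x_i, y_i) under weight vector w,
% and the margin of the classifier (worst case over the training set):
\gamma_i = \frac{y_i \, (\mathbf{w} \cdot \mathbf{x}_i)}{\lVert \mathbf{w} \rVert},
\qquad
\gamma = \min_i \gamma_i
```

Generalization bounds for linear classifiers grow with 1/γ, which is the sense in which a larger margin predicts a smaller test error.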
12. Perceptron vs. SVMs/MIRA
Linear Classifiers: Repetition & Extension
Perceptron: If the training set is separable by some margin, the perceptron will find a weight vector that separates the data, but it will not necessarily pick the vector that maximizes the margin. If we are lucky, it will be a vector with the largest margin, but there is no guarantee.

SVMs/MIRA: SVMs and MIRA seek the weight vector that maximizes the margin. Here the margin is normalized to 1: we put a constraint on the weight vector saying that every training example must be classified with a margin of at least 1. We keep the margin fixed and minimize the norm. That is, we want the smallest weight vector that gives us margin 1 (see the update sketch below).
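To make the contrast concrete, here is a minimal sketch (an assumed illustration, not code from the lecture) of the two update rules on a single training example. The MIRA rule shown is the standard closed-form passive-aggressive update: the smallest change to w that classifies the example with margin 1.

```python
import numpy as np

def perceptron_update(w, x, y):
    """Perceptron: update by y * x, but only on a mistake.

    It stops adjusting as soon as the example is on the correct side,
    so the final w separates the data without maximizing the margin.
    """
    if y * np.dot(w, x) <= 0:          # mistake: margin not even positive
        w = w + y * x
    return w

def mira_update(w, x, y):
    """MIRA: smallest change to w that achieves functional margin >= 1.

    Closed form: w' = w + tau * y * x with
    tau = max(0, 1 - y * (w . x)) / ||x||^2.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this example
    if loss > 0.0:
        w = w + (loss / np.dot(x, x)) * y * x
    return w

# Tiny usage example with two made-up 2-d points (labels in {-1, +1}).
w = np.zeros(2)
for x, y in [(np.array([1.0, 2.0]), +1.0), (np.array([2.0, -1.0]), -1.0)]:
    w = mira_update(w, x, y)
print(w)   # weights after one pass over the two examples
```

The passive part (no change when the margin is already at least 1) and the aggressive part (jump exactly to margin 1 otherwise) are what distinguish MIRA from the mistake-driven perceptron.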
We do not minimize the norm; we minimize the norm squared divided by 2, to make the math easier (trust the people who suggested this ☺).
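Written out, this is the standard hard-margin objective (textbook form, with the same assumed notation as above):

```latex
\min_{\mathbf{w}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i) \ge 1 \quad \text{for all } i
```

Squaring the norm removes the square root, and the factor 1/2 makes the gradient come out clean (∇ ½‖w‖² = w); neither change moves the minimizer, since x ↦ x²/2 is monotone for x ≥ 0.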