2. Overview
• MT decoding: (ê, d̂) = argmax_(e,d) wᵀh(f, e, d)
• Need to find w that assigns higher scores to better translations (e, d)
• Better translations = translations with lower error
f: source sentence, e: target sentence, d: derivation
w: weight vector, h(・): feature function
(a scoring sketch follows below)
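To make the scoring rule concrete, here is a minimal sketch of linear-model decoding over an enumerated candidate list; the candidate set and feature values are made up, and a real decoder searches a vast space rather than a short list:

```python
import numpy as np

def score(w, h):
    """Linear model score wᵀh for one candidate (e, d)."""
    return float(np.dot(w, h))

def decode(w, candidates):
    """Pick the highest-scoring (e, d) from a candidate list.

    candidates: list of (e, d, h) tuples, h being the feature vector
    h(f, e, d).  A stand-in for a real decoder's search.
    """
    return max(candidates, key=lambda cand: score(w, cand[2]))

# Toy example with made-up 2-dim features:
w = np.array([0.5, -1.0])
candidates = [
    ("I saw a black cat", "d1", np.array([2.0, 0.5])),
    ("see black cat",     "d2", np.array([1.0, 1.5])),
]
print(decode(w, candidates)[0])  # "I saw a black cat"
```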
3. Loss Minimization
• Given parallel corpus (F, E), find w that minimizes loss function l(・):
  ŵ = argmin_w l(F, E; w) + λ‖w‖²  (the λ‖w‖² part is the regularization term)
• e.g., l(F, E; w) = 1 − BLEU(E, decode_w(F))
• λ is a regularization constant to avoid overfitting
(a loss sketch follows below)
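A sketch of this objective, assuming hypothetical `decode` and `corpus_bleu` hooks (neither is specified on the slide) and an L2 regularizer:

```python
import numpy as np

def regularized_loss(w, F, E, decode, corpus_bleu, lam=0.01):
    """l(F, E; w) = 1 − BLEU(E, decode_w(F)), plus a regularization term.

    decode and corpus_bleu are assumed hooks (e.g. a 1-best decoder and
    any corpus-level BLEU implementation); lam is the constant λ.
    """
    E_hat = [decode(w, f) for f in F]          # decode_w(F)
    loss = 1.0 - corpus_bleu(E, E_hat)         # error part of l(・)
    return loss + lam * float(np.dot(w, w))    # + λ‖w‖² (assumed L2)
```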
4. Problems to Consider
1. Search space is vast
  • impossible to consider all candidates
  • producing the correct translation is rarely possible
2. Approximation of error function
  • Error metrics (e.g. BLEU) are not differentiable
  • Split corpus-level metrics into sentence level
3. How to calculate argmin wᵀh
5. Batch Learning
• Given parallel corpus (F, E), initialize w and iteratively:
  1. decode whole corpus F with current w, and get k-best lists C
  2. optimize w
  3. loop until convergence
  (a loop sketch follows below)
• vs. online learning
  • optimize w per sentence
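A sketch of the batch loop, assuming hypothetical `decode_kbest` and `optimize` hooks; any of the optimizers on the later slides could fill the latter:

```python
import numpy as np

def batch_tune(F, E, w0, decode_kbest, optimize, max_iters=20):
    """Batch learning: re-decode the whole corpus, refit w, repeat."""
    w = np.asarray(w0, dtype=float)
    C = [[] for _ in F]                      # one k-best list per sentence
    for _ in range(max_iters):
        # 1. decode whole corpus F with current w, get k-best lists C
        for i, f in enumerate(F):
            C[i].extend(decode_kbest(w, f))
        # 2. optimize w on the accumulated k-best lists
        w_new = optimize(w, C, E)
        # 3. loop until convergence
        if np.allclose(w, w_new):
            break
        w = w_new
    return w
```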
6. Minimum Error Rate Training (MERT)
• Given error function error(E, Ê), directly minimize it
  • E: reference translations, Ê: system translations
  • e.g. error(E, Ê) = 1 − BLEU(E, Ê)
• In other words, ŵ = argmin_w error(E, decode_w(F))
• Since error(・) is not differentiable w.r.t. w, gradient-based methods are not applicable
  • Instead, use Powell’s method
    • gradients not required
7. Powell’s Method
• Iteratively, fix a direction, and find the optimal w in that direction
• Applicable when gradients are not available
  (a search sketch follows below)
[Figure: iterates w0 → w1 → w2 → w3, each step moving along one fixed direction at a time in the (x1, x2) plane]
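A simplified sketch of the idea: line-search along one direction at a time, with no gradients needed. Full Powell’s method also updates its direction set after each pass; this sketch keeps fixed coordinate directions, and `minimize_scalar` is just one convenient gradient-free 1-D optimizer (it assumes the objective is reasonably well behaved along each line):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def coordinate_search(f, w0, n_rounds=10):
    """Powell-style search, simplified to coordinate directions:
    repeatedly fix a direction b_m and find the best step along it."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_rounds):
        for m in range(len(w)):
            b = np.zeros_like(w)
            b[m] = 1.0                                   # direction b_m
            res = minimize_scalar(lambda g: f(w + g * b))
            w = w + res.x * b                            # jump to the 1-D optimum
    return w

# e.g. minimizing a non-differentiable function:
w_opt = coordinate_search(lambda w: abs(w[0] - 1) + abs(w[1] + 2), [0.0, 0.0])
print(np.round(w_opt, 3))  # ≈ [ 1. -2.]
```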
8. Optimization in One Direction
• 1-best translation parameterized by scalar γ:
  ê(γ) = argmax_(e,d) (w + γ·bm)ᵀh = argmax_(e,d) wᵀh + γ·bmᵀh
  • bm: one-hot vector with mth dim = 1
  • each candidate’s score is a line in γ, with intercept wᵀh and slope bmᵀh
[Figure: each candidate c1–c4 is a line in γ; candidates with the highest score are selected, forming the upper envelope, and below it the error is a step function of γ over the envelope’s regions (c1, c3, c4)]
e.g.) f = 黒い 猫 を 見た
  e = I saw a black cat
  c1 = I saw black cat
  c2 = saw a black cat
  …
(a line-sweep sketch follows below)
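A sketch of the sweep for one sentence, assuming each candidate has been reduced to a triple (intercept wᵀh, slope bmᵀh, sentence error) and that at least two slopes differ; the numbers are made up:

```python
def line_sweep(cands):
    """MERT sweep along one direction: each candidate's score is a line
    intercept + γ·slope; probe one γ inside every interval of the upper
    envelope and return (γ*, min_error).

    cands: list of (intercept, slope, error) triples.
    """
    # γ values where some pair of lines crosses
    xs = sorted(
        (b2 - b1) / (a1 - a2)
        for i, (b1, a1, _) in enumerate(cands)
        for (b2, a2, _) in cands[i + 1:]
        if a1 != a2
    )
    # one probe point per interval between crossings, plus both ends
    probes = [xs[0] - 1.0] + [(x + y) / 2 for x, y in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    # error of the envelope's winning candidate at each probe
    best = min((max(cands, key=lambda c: c[0] + g * c[1])[2], g) for g in probes)
    return best[1], best[0]

gamma, err = line_sweep([(0.5, 1.0, 0.2), (1.0, -0.5, 0.4), (0.0, 2.0, 0.1)])
print(gamma, err)  # γ* in the region where the lowest-error line wins
```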
9. Corpus-level Error
• Sentence-level losses are summed to get corpus-level error (a sketch follows after the figure below)
[Figure: the sentence-level envelopes and sentence-level errors of sentence 1 and sentence 2 are added to form the multi-sentence error; γ* marks the point chosen — find the γ that minimizes the overall error!]
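Continuing the sketch above, the corpus-level objective just sums each sentence’s step-function error. Exact MERT merges the crossing points of all sentences; probing a shared grid of γ values, as here, is a simplifying assumption:

```python
def corpus_sweep(per_sentence_cands, gammas):
    """Sum per-sentence errors of the winning candidates at each probed
    γ and return the γ with the lowest total.

    per_sentence_cands: one list of (intercept, slope, error) triples
    per sentence, as in line_sweep above; gammas: probe points.
    """
    def err_at(cands, g):
        # error of the candidate on top of the envelope at γ = g
        return max(cands, key=lambda c: c[0] + g * c[1])[2]

    totals = [(sum(err_at(c, g) for c in per_sentence_cands), g) for g in gammas]
    return min(totals)[1]   # γ* minimizing the multi-sentence error
```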
10. Problems of Powell’s Method
• Sensitive to initialization of w
• Not suitable for high-dimensional feature vectors
11. Softmax Loss
• Translation probability:
  p(e, d | f; w) = exp(wᵀh(f, e, d)) / Σ_(e′,d′) exp(wᵀh(f, e′, d′))
• Loss is the negative log-likelihood of oracle translations:
  l = −Σ_i log p(e*_i, d*_i | f_i; w)
  where oracle translations are the lowest-error candidates:
  (e*, d*) = argmin_(e,d) error(E, e)
• Gradient-based methods (e.g. L-BFGS) are applicable
  (a loss/gradient sketch follows below)
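A sketch for one sentence’s k-best list; the closed-form gradient (expected features minus oracle features) is the standard softmax gradient, stated here as an assumption since the slide only shows the loss:

```python
import numpy as np

def softmax_loss_and_grad(w, kbest_feats, oracle_idx):
    """Negative log-likelihood of the oracle under p(e,d|f;w) ∝ exp(wᵀh).

    kbest_feats: (k, dim) array of feature vectors h
    oracle_idx: index of the oracle (lowest-error) candidate
    """
    scores = kbest_feats @ w
    scores = scores - scores.max()              # for numerical stability
    p = np.exp(scores) / np.exp(scores).sum()   # softmax over the k-best
    loss = -np.log(p[oracle_idx])
    # ∂l/∂w = E_p[h] − h(oracle): expected features minus oracle features
    grad = p @ kbest_feats - kbest_feats[oracle_idx]
    return loss, grad

# ready for a gradient-based optimizer, e.g.
# scipy.optimize.minimize(fun, w0, jac=True, method="L-BFGS-B")
```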
12. Max Margin Loss
• Make sure distances between correct translations and incorrect translations are large
• For example:
  l = Σ max(0, {error(e′) − error(e*)} − {wᵀh(e*) − wᵀh(e′)})
  for all oracle and non-oracle pairs … i.e. penalize when the diff in error is greater than the diff in score
• Optimization methods for SVM are applicable (e.g. SMO)
f: 黒い猫を見た,  e (correct): I saw a black cat

                                 error   score (= wᵀh)
  e* (oracle): I saw black cat    0.1        0.4
  e  (system): see red dog        0.9        0.3

→ diff in error is large but diff in score is small: bad! (a loss sketch follows below)
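A sketch of the pairwise hinge above, reusing the table’s toy numbers; for the single pair shown, the diff in error is 0.8 but the diff in score is only 0.1, so the penalty is 0.7:

```python
import numpy as np

def max_margin_loss(scores, errors, oracle_idx):
    """Hinge loss over all oracle / non-oracle pairs:
    max(0, diff in error − diff in score)."""
    loss = 0.0
    for j in range(len(errors)):
        if j == oracle_idx:
            continue
        d_error = errors[j] - errors[oracle_idx]   # 0.9 − 0.1 = 0.8: large
        d_score = scores[oracle_idx] - scores[j]   # 0.4 − 0.3 = 0.1: small!
        loss += max(0.0, d_error - d_score)        # penalized: bad!
    return loss

print(max_margin_loss(np.array([0.4, 0.3]), np.array([0.1, 0.9]), 0))  # ≈ 0.7
```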
13. Pairwise Ranking Optimization (PRO)
• Parameter estimation as ranking problem
  • Classifier learns w to rank candidates by error
• Generate training examples from pairs of candidates
  • positive example: h(cand1) − h(cand2) = (−4, 6)
  • negative example: h(cand3) − h(cand1) = (3, −7)
• wᵀ{h(cand1) − h(cand2)} > 0 ⇔ wᵀh(cand1) > wᵀh(cand2)
• Off-the-shelf linear binary classifiers can be used
f: 黒い猫を見た,  e (correct): I saw a black cat

                               error      h      score (= wᵀh)
  e (cand1): I see black cat    0.3    (−1, 2)       ???
  e (cand2): see black dog      0.7    (3, −4)       ???
  e (cand3): see red dog        0.9    (2, −5)       ???

(a PRO sketch follows below)
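A sketch built from the table’s candidates; scikit-learn’s LogisticRegression stands in for the “off-the-shelf linear binary classifier”, and full PRO additionally samples pairs and keeps only those with a large error gap, which this sketch skips:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# k-best features and errors from the slide's table (cand1..cand3)
h = np.array([[-1, 2], [3, -4], [2, -5]], dtype=float)
err = np.array([0.3, 0.7, 0.9])

# Pairwise examples: label 1 when the first candidate has lower error,
# so positive examples are h(better) − h(worse), e.g. h(cand1) − h(cand2).
X, y = [], []
for i in range(len(h)):
    for j in range(len(h)):
        if i != j:
            X.append(h[i] - h[j])
            y.append(1 if err[i] < err[j] else 0)

# Without an intercept the decision function is exactly wᵀ(h_i − h_j).
clf = LogisticRegression(fit_intercept=False).fit(np.array(X), np.array(y))
w = clf.coef_[0]
print((h @ w).argsort()[::-1])  # candidates ranked by learned score wᵀh
```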
14. Minimum Bayes Risk
• Minimize expected loss:
  l = Σ_(e,d) p(e, d | f; w) · error(E, e)
  where p(e, d | f; w) ∝ exp(γ·wᵀh(f, e, d))
• γ = 0: all candidates are equally likely
• γ = 1: softmax
• γ → ∞: highest scoring candidate with probability 1 (MERT)
• Differentiable and considers many candidates (e, d)
  (a sketch follows below)
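A sketch of the expected-loss objective over one k-best list, with γ as the temperature described on the slide:

```python
import numpy as np

def expected_loss(w, feats, errors, gamma=1.0):
    """Minimum Bayes risk objective: expected error under
    p(e,d|f;w) ∝ exp(γ·wᵀh).  γ=0 gives a uniform distribution,
    γ=1 the softmax distribution, γ→∞ approaches 1-best (MERT)."""
    s = gamma * (feats @ w)
    s = s - s.max()                   # numerical stability
    p = np.exp(s) / np.exp(s).sum()   # temperature-scaled softmax
    return float(p @ errors)          # Σ p(e,d|f) · error(E, e)
```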
15. Sentence-level BLEU
• Sentence-level error functions are needed for optimization
• BLEU is a corpus-level metric
  • 4-gram precision is often 0 at the sentence level (a sketch follows below)
  • deviates from human judgments
• Sentence-level error
  • Linear BLEU
  • (Expected BLEU)
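To see why plain BLEU breaks at the sentence level, here is a sketch of sentence BLEU; the add-one style smoothing is a common workaround, not something this slide specifies:

```python
import math
from collections import Counter

def sentence_bleu(hyp, ref, N=4, smooth=1.0):
    """Sentence-level BLEU.  With smooth=0 this is plain BLEU, where the
    4-gram precision is often 0 for a single sentence and the whole
    score collapses to 0; smoothing keeps it usable per sentence."""
    hyp, ref = hyp.split(), ref.split()
    logp = 0.0
    for n in range(1, N + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())   # clipped matches
        total = max(sum(h.values()), 1)
        if match + smooth == 0:
            return 0.0                                    # p_n = 0 ⇒ BLEU = 0
        logp += math.log((match + smooth) / (total + smooth)) / N
    bp = min(0.0, 1.0 - len(ref) / max(len(hyp), 1))      # brevity penalty
    return math.exp(bp + logp)

print(sentence_bleu("I saw black cat", "I saw a black cat"))
```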
16. Linear BLEU
• Linear approximation of change in BLEU
  • c: sum of sentence lengths, m_n: # matched n-grams
• Add one sentence: (c, m_n) → (c′, m_n′)
  Δ log BLEU ≈ Σ_n (∂ log BLEU / ∂m_n)·Δm_n + (∂ log BLEU / ∂c)·Δc
• Linear BLEU error of candidate e (a sketch follows after the figure below)
[Figure: log BLEU as a function of the corpus statistics; adding one sentence moves (c, m_n) to (c′, m_n′), and the change Δ is determined by the # matched n-grams in e]
e