Yin-Wen Chang - 2017 - A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
1. A Polynomial-Time Dynamic Programming Algorithm for Phrase-Based Decoding with a Fixed Distortion Limit
Yin-Wen Chang¹
(Joint work with Michael Collins¹,²)
¹Google, New York
²Columbia University
July 31, 2017
2. Introduction
Background:
Phrase-based decoding without further constraints is NP-hard
Proof: reduction from the travelling salesman problem (TSP) [Knight (1999)]
A hard distortion limit is commonly imposed in phrase-based machine translation (PBMT) systems
Question:
Is phrase-based decoding with a fixed distortion limit NP-hard or not?
3. Introduction
A related problem: bandwidth-limited TSP
[Figure: nodes 1, 2, . . . , i, . . . , j in a row, annotated with the constraint |i − j| ≤ d]
This work: a new decoding algorithm
Process the source words from left to right
Maintain multiple “tapes” on the target side
Run time: O(n · d! · l · h^{d+1})
n: source sentence length
d: distortion limit
4. Overview of the proposed decoding algorithm
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Process the source words from left to right
Maintain multiple “tapes” on the target side
Step 1: π1 ← π1 · (1, 2, this must)
Step 2: π2 ← π2 · (3, 4, our concern)
Step 3: π1 ← π1 · (5, 5, also)
Step 4: π1 ← π1 · (6, 6, be) · π2
Result: π1 = (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
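The tape updates shown on the overview slides can be replayed in a few lines of Python. This is only an illustrative sketch: the names `Phrase` and `replay` are hypothetical, and a phrase is assumed to be a tuple (source start, source end, target text) as in the slide notation.

```python
from typing import Dict, List, Tuple

# Assumption: a phrase is (source_start, source_end, target_text).
Phrase = Tuple[int, int, str]


def replay() -> List[Phrase]:
    """Replay the four updates of the running example on two tapes."""
    tapes: Dict[int, List[Phrase]] = {1: [], 2: []}
    tapes[1] = tapes[1] + [(1, 2, "this must")]    # π1 ← π1 · (1, 2, this must)
    tapes[2] = tapes[2] + [(3, 4, "our concern")]  # π2 ← π2 · (3, 4, our concern)
    tapes[1] = tapes[1] + [(5, 5, "also")]         # π1 ← π1 · (5, 5, also)
    # π1 ← π1 · (6, 6, be) · π2: the second tape is consumed by the merge.
    tapes[1] = tapes[1] + [(6, 6, "be")] + tapes.pop(2)
    return tapes[1]


derivation = replay()
translation = " ".join(text for _, _, text in derivation)
```

Source phrases are consumed in the order 1-2, 3-4, 5, 6, while the target string is only fixed once the two tapes are joined in the last step.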
9. Outline
Introduction of the phrase-based decoding problem
Target-side left-to-right: the usual decoding algorithm
Source-side left-to-right: the proposed algorithm
Time complexity of the proposed algorithm
Conclusion and future work
13. Phrase-based decoding problem
das muss unsere sorge gleichermaßen sein
Phrase-by-phrase: this must our concern also be
Reordered: this must also be our concern
Segment the German sentence into non-overlapping phrases
Find an English translation for each German phrase
Reorder the English phrases to get a better English sentence
Derivation: complete translation with phrase mappings
Sub-derivation: partial translation
18. Score a derivation
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
this must also be our concern
Phrase translation score: score(das muss, this must) + · · ·
Language model score:
score(<s> this must also be our concern </s>)
= score(this | <s>) + score(must | this) + · · ·
Reordering score: η · |2 + 1 − 5| (“this must” ends at source position 2; the next phrase “also” starts at position 5)
19. Fixed distortion limit: distortion distance ≤ d
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
this must also be our concern
Distortion distance: |2 + 1 − 5| = 2
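The distortion distance on this slide can be written as a small function. The sketch below is hypothetical (the function names are not from the talk) and assumes the distance between consecutive phrases is |t + 1 − s|, where t is the end of the previous source phrase and s is the start of the next one. It checks only consecutive phrase pairs, not any jump from the start of the sentence.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # assumed layout: (source_start, source_end, target_text)


def distortion_distance(prev_end: int, next_start: int) -> int:
    """Distance |t + 1 - s| between the end of one phrase and the start of the next."""
    return abs(prev_end + 1 - next_start)


def within_limit(derivation: List[Phrase], d: int) -> bool:
    """True if every pair of consecutive phrases respects the distortion limit d."""
    pairs = zip(derivation, derivation[1:])
    return all(distortion_distance(p[1], q[0]) <= d for p, q in pairs)


example = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]
distances = [distortion_distance(p[1], q[0]) for p, q in zip(example, example[1:])]
```

For the running example the three jumps have distances 2, 0 and 4, so under these assumptions the derivation is admissible only when d is at least 4.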
21. Target-side left-to-right: the usual decoding algorithm
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Step 1: this must
Step 2: this must also
Step 3: this must also be
Step 4: this must also be our concern
25. Target-side left-to-right: dynamic programming algorithm
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
DP state: (last target word, end position of the last source phrase, coverage bit-string)
Sub-derivation → DP state:
this must → (must, 2, 110000)
this must also → (also, 5, 110010)
this must also be → (be, 6, 110011)
this must also be our concern → (concern, 4, 111111)
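The target-side DP states listed on these slides can be reproduced with a short sketch. The function name `target_side_state` is hypothetical, and the state layout (last target word, end of the last source phrase, coverage bit-string) is read off the slide examples; keeping only one target word assumes a bigram language model.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # assumed layout: (source_start, source_end, target_text)


def target_side_state(sub_derivation: List[Phrase], n: int) -> Tuple[str, int, str]:
    """Summarise a target-side left-to-right sub-derivation as a DP state."""
    covered = ["0"] * n
    for start, end, _ in sub_derivation:
        for pos in range(start, end + 1):
            covered[pos - 1] = "1"  # source positions are 1-based
    _, last_end, last_text = sub_derivation[-1]
    return (last_text.split()[-1], last_end, "".join(covered))


derivation = [(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]
states = [target_side_state(derivation[:k], 6) for k in range(1, len(derivation) + 1)]
```

The coverage bit-string is what makes the number of such states exponential in the sentence length when no further structure is exploited.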
31. Source-side left-to-right: the proposed algorithm
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Step 1: this must
Step 2: this must | our concern (two tapes)
Step 3: this must also | our concern (two tapes)
Step 4: this must also be our concern
35. Source-side left-to-right: dynamic programming state
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Derivation: (1, 2, this must)(5, 5, also)(6, 6, be)(3, 4, our concern)
Example sub-derivation: π1 = (1, 2, this must), π2 = (3, 4, our concern)
DP state: j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)
Sub-derivation → DP state:
this must → j = 2, σ1 = (1, this, 2, must)
this must | our concern → j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)
this must also | our concern → j = 5, σ1 = (1, this, 5, also), σ2 = (3, our, 4, concern)
this must also be our concern → j = 6, σ1 = (1, this, 4, concern)
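The source-side DP states on these slides can be derived from the tapes with the sketch below. The names `signature` and `source_side_state` are hypothetical. It assumes that each tape is summarised by σ = (first source position, first target word, last source position, last target word) and that j is the largest source position covered so far, which matches the slide examples.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]          # assumed layout: (source_start, source_end, target_text)
Signature = Tuple[int, str, int, str]  # (s, ws, t, wt)


def signature(tape: List[Phrase]) -> Signature:
    """Summarise one tape by its first and last source positions and target words."""
    first_start, _, first_text = tape[0]
    _, last_end, last_text = tape[-1]
    return (first_start, first_text.split()[0], last_end, last_text.split()[-1])


def source_side_state(tapes: List[List[Phrase]]) -> Tuple[int, List[Signature]]:
    """Return (j, [σ1, ..., σr]) for a list of non-empty tapes."""
    j = max(end for tape in tapes for _, end, _ in tape)
    return j, [signature(tape) for tape in tapes]


two_tapes = [[(1, 2, "this must"), (5, 5, "also")], [(3, 4, "our concern")]]
merged = [[(1, 2, "this must"), (5, 5, "also"), (6, 6, "be"), (3, 4, "our concern")]]
state_two_tapes = source_side_state(two_tapes)
state_merged = source_side_state(merged)
```

Unlike the target-side state, no coverage bit-string is stored: only the ends of each tape are kept.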
41. The number of DP states (fixed distortion limit d)
State: (j, {σ1, σ2, . . . , σr})    r: number of “tapes”
j ∈ {1, . . . , n}    n: source sentence length → O(n)
σ = (s, ws, t, wt)    Ex: σ = (1, this, 5, also)
σ = (s(σ), ws(σ), t(σ), wt(σ)) → O(g(d) · h^{d+1})
s, t: source word indices
ws, wt: translated target words
[Figure: source positions 1 2 3 4 . . . j − d . . . j, j + 1 . . . , with the labels “Translated source words”, “Next phrase starts at”, and “s, t can only occur here”; the constraints below give the exact ranges]
s(σ1) = 1
s(σi) ∈ {j − d + 2, . . . , j} ∀i ∈ {2, . . . , r}
t(σi) ∈ {j − d, . . . , j} ∀i ∈ {1, . . . , r}
r is bounded by d + 1.
Number of states: O(n · g(d) · h^{d+1})
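The range constraints on s, t and r can be checked mechanically. The following is a minimal sketch with a hypothetical function name, `is_valid_state`; it encodes only the three constraints and the bound on r stated on the slide, and says nothing about how g(d) is derived.

```python
from typing import List, Tuple

Signature = Tuple[int, str, int, str]  # (s, ws, t, wt)


def is_valid_state(j: int, sigmas: List[Signature], d: int) -> bool:
    """Check the slide's constraints on a state (j, {σ1, ..., σr}) for limit d."""
    r = len(sigmas)
    if r == 0 or r > d + 1:       # r is bounded by d + 1
        return False
    if sigmas[0][0] != 1:         # s(σ1) = 1
        return False
    for i, (s, _, t, _) in enumerate(sigmas):
        if i > 0 and not (j - d + 2 <= s <= j):  # s(σi) in {j-d+2, ..., j}, i >= 2
            return False
        if not (j - d <= t <= j):                # t(σi) in {j-d, ..., j}
            return False
    return True


state = [(1, "this", 5, "also"), (3, "our", 4, "concern")]
ok_with_d4 = is_valid_state(5, state, d=4)
ok_with_d2 = is_valid_state(5, state, d=2)
```

Because s and t are confined to a window of width about d around j, the number of possible position patterns depends on d alone, not on n.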
56. Extend a sub-derivation by four operations
Current sub-derivation: (j, π1, π2, . . . , πr)
Consider a new phrase p starting at source position j + 1 → O(l)
New segment: πr+1 = p
Append: πi = πi · p
Prepend: πi = p · πi
Concatenate: πi = πi · p · πi′ → O(r^2) = O(d^2)
1 2 3 4 5 6
das muss unsere sorge gleichermaßen sein
Example: π1 = (1, 2, this must), π2 = (3, 4, our concern), p = (5, 5, also)
New segment: π3 = (5, 5, also)
Append: π1 = (1, 2, this must)(5, 5, also), or π2 = (3, 4, our concern)(5, 5, also)
Prepend: π2 = (5, 5, also)(3, 4, our concern)
Concatenate: π1 = (1, 2, this must)(5, 5, also)(3, 4, our concern)
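The four operations can be enumerated with a short sketch. The function name `extensions` is hypothetical, and the code deliberately applies no filtering: it does not check the distortion limit or the constraint s(σ1) = 1, both of which a real decoder would have to enforce.

```python
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # assumed layout: (source_start, source_end, target_text)
Tape = List[Phrase]


def extensions(tapes: List[Tape], p: Phrase) -> List[Tuple[str, List[Tape]]]:
    """List every way to add phrase p to the tapes using the four operations."""
    r = len(tapes)
    out: List[Tuple[str, List[Tape]]] = [("new segment", tapes + [[p]])]
    for i in range(r):
        out.append(("append", tapes[:i] + [tapes[i] + [p]] + tapes[i + 1:]))
        out.append(("prepend", tapes[:i] + [[p] + tapes[i]] + tapes[i + 1:]))
    for i in range(r):
        for k in range(r):
            if i != k:
                # Join two different tapes with p in between.
                merged = tapes[i] + [p] + tapes[k]
                rest = [t for idx, t in enumerate(tapes) if idx not in (i, k)]
                out.append(("concatenate", [merged] + rest))
    return out


tapes = [[(1, 2, "this must")], [(3, 4, "our concern")]]
candidates = extensions(tapes, (5, 5, "also"))
```

With r tapes this produces 1 + 2r + r(r − 1) candidates, so the ordered tape pairs in the concatenate step account for the quadratic term.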
66. Bound on running time: O(n · d! · l · h^{d+1})
# DP states: O(n · g(d) · h^{d+1})
# transitions: O(d^2 · l)
n: source sentence length
d: distortion limit
l: bound on the number of phrases starting at any position
h: bound on the maximum number of target translations for any source word
67. Summary
Problem: Phrase-based decoding with a fixed distortion limit
A new decoding algorithm with O(n · d! · l · h^{d+1}) time
Operate from left to right on the source side
Maintain multiple “tapes” on the target side
68. Follow-up paper in EMNLP discussing experimental results
To appear in EMNLP 2017:
“Source-side left-to-right or target-side left-to-right?
An empirical comparison of two phrase-based decoding algorithms”
Beam search with a trigram language model
Constraints on the number of “tapes”
Achieves efficiency and accuracy similar to Moses
69. Future work
Finite state transducer (FST) formulation
[Figure: FST arc from state (j = 4, σ1 = (1, this, 2, must), σ2 = (3, our, 4, concern)) to state (j = 5, σ1 = (1, this, 5, also), σ2 = (3, our, 4, concern)), labeled “1 · (5, 5, also), 2”]
Neural machine translation
An NMT system using this kind of approach?
Replace the attention model by absorbing source words strictly left to right?