Insertion Position Selection Model for Flexible Non-Terminals
in Dependency Tree-to-Tree Machine Translation

Toshiaki Nakazawa (Japan Science and Technology Agency, JST)
John Richardson, Sadao Kurohashi (Kyoto University)

4/11/2016 @ EMNLP 2016
Where to insert?

[Figure: the sentence "I found Pikachu by chance" with the floating word
"yesterday" and its candidate insertion positions; illustrative probabilities
over the positions: 0.7, 0.25, 0.02, 0.01, 0.01, 0.01.]
Where to insert?

[Figure: the sentence "I found Pikachu by chance yesterday" with the floating
phrase "in the park" and its candidate insertion positions; illustrative
probabilities: 0.2, 0.1, 0.6, 0.01, 0.01, 0.01, 0.1. Photo: @Texas State
Capitol.]
Dependency Tree-to-Tree Translation

[Figure: the Japanese input dependency tree 私 は 昨日 公園 で 偶然 ピカチュウ を
見つけた ("Yesterday I found Pikachu by chance in the park") is matched against
translation rules such as ピカチュウ → Pikachu, 偶然 → chance, 公園 → the park,
and 昨日 → yesterday; substituting the non-terminals ([X7]) builds the output
dependency tree rooted at "found".]
Dependency Tree-to-Tree Translation

Flexible Non-terminals [Richardson+, 2016]

[Figure: the same translation with flexible non-terminals [X]: floating
subtrees such as "yesterday" and "in the park" can attach at many insertion
positions in the output tree, producing several candidate outputs, e.g.
"I found Pikachu by chance yesterday in the park".]
Translation Quality and Decoding Speed w/ and w/o Flexible Non-terminals

• Using ASPEC (Asian Scientific Paper Excerpt Corpus) JE and JC
• Time is relative decoding time

              Ja->En        En->Ja        Ja->Zh        Zh->Ja
              BLEU   Time   BLEU   Time   BLEU   Time   BLEU   Time
  w/o Flex    20.28  1.00   28.77  1.00   24.85  1.00   30.51  1.00
  w/ Flex     21.61  6.28   30.57  3.30   28.79  5.16   34.32  5.28
Appropriate Insertion Position Selection

• Roughly half of all translation rules were augmented with flexible
  non-terminals [Richardson+, 2016]
• Flexible non-terminals make the search space much bigger -> slower decoding
  and increased search error
• Our idea: reduce the number of possible insertion positions in translation
  rules with a neural network model (a pruning sketch follows)
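A minimal sketch of this pruning step, assuming a generic score_fn (e.g. the
neural model described later in the deck); the function name and list-based
interface are illustrative, not from the paper:

    def prune_insertion_positions(rule_positions, score_fn, top_k=1):
        """Keep only the top-k scoring insertion positions of a rule.

        rule_positions: candidate positions allowed by the translation rule.
        score_fn: maps the candidate list to one score per candidate
        (hypothetical interface for illustration)."""
        scores = score_fn(rule_positions)
        ranked = sorted(range(len(rule_positions)),
                        key=lambda i: scores[i], reverse=True)
        return [rule_positions[i] for i in ranked[:top_k]]

Keeping only the top-scoring position(s) shrinks the decoder's search space,
which is where the speedup reported later comes from.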
INSERTION POSITION SELECTION MODEL
Insertion Position Selection Model

• For each candidate insertion position, predict a score, given:
  – input side: the floating word (I) and its parent word (Ps), with their
    distance (Ds)
  – target side: the previous (Sp) and next (Sn) sibling words of the
    insertion position and its parent (Pt), with the distance (Dt)

A sketch of these features as a record type follows.
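A minimal sketch of the seven features above as a Python record; plain strings
for words and integers for tree distances are an assumption for illustration:

    from dataclasses import dataclass

    @dataclass
    class InsertionFeatures:
        I: str    # floating word to be inserted
        Ps: str   # parent of I on the input (source) side
        Ds: int   # distance between I and Ps
        Sp: str   # previous sibling of the candidate position (target side)
        Sn: str   # next sibling of the candidate position
        Pt: str   # parent of the candidate position
        Dt: int   # distance between the candidate position and Pt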
Information for Selection Model

[Figure: in the running example the floating word is I = 昨日 [yesterday],
whose input-side parent is Ps = 見つけた [found] at distance Ds = 4; for one
candidate position [X] on the target side, Sp and Sn are the previous and next
sibling words and Pt = "found" is the parent, at distance Dt = -2.
Non-terminals are reverted to the original word in the parallel corpus.]
Information for Selection Model

[Figure: a second candidate position in the same example, with Dt = -3; a
missing sibling at this position is filled with the special [POST-BOTTOM]
token.]
Neural Network Model

[Figure: a fully-connected feed-forward network. Word embeddings (220-dim) for
I, Ps, Sp, Sn, and Pt and distance embeddings (100-dim) for Ds and Dt are
concatenated and fed through the network to produce one score per insertion
position (1 .. N). A softmax over the N scores (e.g. 0.1, 0.6, ..., 0.1) is
compared against the gold one-hot labels (e.g. 0, 1, ..., 0); the loss is the
softmax cross-entropy.]
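A minimal sketch of such a scorer in PyTorch (an assumption; the slides do not
name a toolkit). The 220/100 embedding dimensions follow the figure; the
hidden size, depth, and distance bucketing are illustrative:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class InsertionPositionScorer(nn.Module):
        """Scores the candidate insertion positions of one floating word."""
        def __init__(self, vocab_size, num_dist_buckets,
                     word_dim=220, dist_dim=100, hidden_dim=256):
            super().__init__()
            self.word_emb = nn.Embedding(vocab_size, word_dim)   # I, Ps, Sp, Sn, Pt
            # Distances can be negative (e.g. Dt = -2), so they are assumed
            # to be mapped to non-negative bucket indices beforehand.
            self.dist_emb = nn.Embedding(num_dist_buckets, dist_dim)
            self.ff = nn.Sequential(
                nn.Linear(5 * word_dim + 2 * dist_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),  # one scalar score per candidate
            )

        def forward(self, I, Ps, Ds, Sp, Sn, Pt, Dt):
            # Each argument: LongTensor of shape (n_candidates,); I, Ps, Ds
            # are simply repeated across the candidates of one rule.
            x = torch.cat([self.word_emb(I), self.word_emb(Ps),
                           self.dist_emb(Ds), self.word_emb(Sp),
                           self.word_emb(Sn), self.word_emb(Pt),
                           self.dist_emb(Dt)], dim=-1)
            return self.ff(x).squeeze(-1)  # (n_candidates,) raw scores

    # Training, per the figure: softmax cross-entropy against the gold index.
    # scores = model(I, Ps, Ds, Sp, Sn, Pt, Dt)
    # loss = F.cross_entropy(scores.unsqueeze(0), torch.tensor([gold_index]))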
Training Data Creation

• Training data for the NN model can be created automatically from a
  word-aligned parallel corpus
  – treat each aligned target word as the floating word, remove it from the
    target tree, and label its original position as the gold insertion
    position
[Figure: for the target tree of "I found Pikachu by chance", one word (e.g.
"chance", aligned to 偶然) is removed; every possible re-insertion position
[X] is enumerated, and the position the word was removed from is labeled 1,
all others 0.]
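A minimal sketch of this extraction, assuming a hypothetical tree API
(position_of, remove_node, and insertion_positions are illustrative names,
not from the paper):

    def make_examples(tgt_tree, alignments):
        """One training example per aligned target word: remove the word,
        enumerate candidate re-insertion positions, mark the original
        position as gold (label 1) and all others as 0."""
        examples = []
        for src_word, tgt_word in alignments:
            gold = tgt_tree.position_of(tgt_word)        # hypothetical API
            pruned = tgt_tree.remove_node(tgt_word)      # hypothetical API
            candidates = pruned.insertion_positions()    # hypothetical API
            labels = [1 if pos == gold else 0 for pos in candidates]
            # Per-candidate features (I, Ps, Ds, Sp, Sn, Pt, Dt) would be
            # extracted from src/tgt trees here, as on the earlier slides.
            examples.append((src_word, tgt_word, candidates, labels))
        return examples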
EXPERIMENTS
Insertion Position Selection Experiment

• Parallel corpus: ASPEC-JE/JC (2M/680K sentences)
• Data size (the JE data is shared by Ja->En and En->Ja, the JC data by
  Ja->Zh and Zh->Ja):

                 Ja->En  En->Ja  Ja->Zh  Zh->Ja
    Training         15.7M           5.7M
    Development       160K            58K
    Test              160K            58K
    Ave. # IP     3.39    3.15    3.72    3.41

• Comparison: L2-regularized logistic regression (using Multi-core LIBLINEAR)
Experimental Results

                          Ja->En  En->Ja  Ja->Zh  Zh->Ja
    Training                  15.7M           5.7M
    Development                160K            58K
    Test                       160K            58K
    Ave. # IP              3.39    3.15    3.72    3.41
    Mean loss              0.089   0.058   0.105   0.056
    Top 1 Accuracy (%)     97.08   97.72   96.51   97.99
    Top 2 Accuracy (%)     98.94   99.52   98.97   99.56
    Logit Accuracy (%)     55.00   89.03   68.04   83.16

(Logit = the L2-regularized logistic regression baseline.)
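A minimal sketch of how the Top-1/Top-2 numbers above can be computed,
assuming all_scores holds the per-candidate model scores for each test
example and all_gold the index of each correct insertion position:

    def top_k_accuracy(all_scores, all_gold, k):
        """Percentage of examples whose gold position is among the k
        highest-scoring candidates."""
        hits = 0
        for scores, gold in zip(all_scores, all_gold):
            topk = sorted(range(len(scores)), key=lambda i: scores[i],
                          reverse=True)[:k]
            hits += gold in topk
        return 100.0 * hits / len(all_gold)

    # e.g. top_k_accuracy(test_scores, test_gold, k=1) -> "Top 1 Accuracy (%)"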
Translation Experiment
• Parallel corpus: ASPEC-JE/JC (2M/680K
sentences)
• Decoder: KyotoEBMT [Richardson+, 2014]
• 5 Settings
– Phrase-based and hierarchical phrase-based SMTs
– w/o Flex: not using flexible non-terminals
– w/ Flex: baseline with flexible non-terminals
– Prop: using insertion position selection (only top 1)
• BLEU and relative decoding time
Translation Experimental Results

              Ja->En        En->Ja        Ja->Zh        Zh->Ja
              BLEU   Time   BLEU   Time   BLEU   Time   BLEU   Time
    PBSMT     18.45  -      27.48  -      27.96  -      34.65  -
    HPBSMT    18.72  -      30.19  -      27.71  -      35.43  -
    w/o Flex  20.28  1.00   28.77  1.00   24.85  1.00   30.51  1.00
    w/ Flex   21.61  6.28   30.57  3.30   28.79  5.16   34.32  5.28
    Prop      22.07  2.25   30.50  1.27   29.83  2.21   34.71  1.89
Conclusion

• Proposed an insertion position selection model to reduce the number of
  insertion positions for flexible non-terminals in translation rules
• Both automatic evaluation scores and decoding speed are improved
Future Work

• Use grandchildren's information
  – Recursive NN [Liu et al., 2015] or Convolutional NN [Mou et al., 2015]
• Shift to NMT!!
  – Actually, we have already shifted and participated in the WAT2016 shared
    tasks
  – However, NMT is still far from perfect
J->E Adequacy in WAT2016

[Figure: stacked bar chart of human adequacy ratings (1-5) per system.
Average adequacy scores of 3.83, 3.76, and 3.71 were reported; the pairing of
adequacy scores to systems follows the original chart.]

    Team name                 BLEU
    Kyoto-U (NMT)             26.22
    NAIST/CMU (NMT)           26.39
    NAIST (2015 best, F2T)    25.41
Thank You!

Advertisement: I'm co-organizing the 3rd Workshop on Asian Translation
(WAT2016), in conjunction with COLING 2016.
Invited talk by Google about GNMT!
Please come to the workshop!
http://lotus.kuee.kyoto-u.ac.jp/WAT/
