Modality-Preserving Phrase-based Statistical Machine Translation

Masamichi Ideue, Masao Utiyama,
Eiichiro Sumita and Kazuhide Yamamoto
(Nagaoka University of Technology and NICT)

Purpose of our study
Japanese-to-English translation that preserves
negation and question modality in phrase-based SMT.
Input: 私はりんごが好きではありません。
Correct translation: I don't like apples.
MT translation: I like apples.
MT users would not be able to detect such a modality error.

Related Studies
• Class-Dependent Modeling for Dialog
Translation [Finch et al., 2009]
• Discriminative Reranking for SMT
using Various Global Features [Goh
et al., 2010]
Neither study discussed which expressions
influence modality.
Our study focuses on the characteristic modality
words in negations and questions.

Proposed Method
Add feature functions that take into account the
characteristic words of negations and
questions.

Added feature functions
Feature value: the number of phrase pairs that include
characteristic words of question
(negation) in both the Japanese phrase and
the English phrase.
Input f: 財布はどこにありますか？
Hypothesis e: Where is the purse ?
Feature value for this example: 2
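The feature above can be sketched in Python as follows. This is a minimal illustration, not the authors' Moses implementation: the function name `count_modality_phrase_pairs`, the small word lists, and the segmentation of the example into three phrase pairs are all assumptions made for the example. A phrase pair is counted only when both sides contain a characteristic word.

```python
# Hypothetical characteristic-word lists (small subsets, for illustration only).
QUESTION_WORDS_JA = {"か", "？", "どこ"}
QUESTION_WORDS_EN = {"?", "Where", "What", "Is", "Do"}


def count_modality_phrase_pairs(phrase_pairs, words_ja, words_en):
    """Count phrase pairs whose Japanese side and English side both
    contain at least one characteristic word."""
    count = 0
    for ja_tokens, en_tokens in phrase_pairs:
        has_ja = any(tok in words_ja for tok in ja_tokens)
        has_en = any(tok in words_en for tok in en_tokens)
        if has_ja and has_en:
            count += 1
    return count


# Assumed segmentation of the slide's example into phrase pairs.
example_pairs = [
    (["財布", "は"], ["the", "purse"]),
    (["どこ", "に", "あり", "ます"], ["Where", "is"]),
    (["か", "？"], ["?"]),
]

feature_value = count_modality_phrase_pairs(
    example_pairs, QUESTION_WORDS_JA, QUESTION_WORDS_EN
)
print(feature_value)  # 2
```

Under this assumed segmentation, the second and third phrase pairs each contribute one count, which matches the value 2 shown for the example.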

Characteristic Words Extraction
• Manual extraction
• Automatic extraction
  • Using the LLR (log-likelihood ratio) score
Characteristic words are extracted from a
parallel corpus in the travel domain.

Manual Extraction
(English)
Negation: not, t, don, Don, haven, isn, No, won, wasn, doesn, didn, cannot, hadn
Question: ?, Why, Will, What, Could, Is, How, Does, Can, Do, Are, Which, When, Where, Have, Does, Did, Was, May

Manual Extraction
(Japanese)
Negation: ない (nai), ません (masen)
Question: ?, か。 (ka.)
• Few characteristic words clearly express
the modalities.
• Whether a word expresses modality
tends to depend on the domain.

Automatic Extraction
• Automatic extraction is based on LLR.
• LLR is convenient for extracting characteristic
words in the travel domain (Chujo et al., 2006).
• The top N words in the ranking by LLR score are
extracted as the characteristic words.
Order by LLR score (Question):
1 ?
2 Will
3 Could
4 How
5 Can
... ...
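The top-N selection can be sketched as below. The function name `top_n_words` is hypothetical and the scores are made up for illustration; only the resulting order follows the ranking shown on the slide.

```python
def top_n_words(llr_scores, n):
    """Return the n words with the highest LLR score, best first."""
    ranked = sorted(llr_scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _score in ranked[:n]]


# Made-up scores; real values would come from the LLR computation.
example_scores = {
    "?": 5000.0,
    "Will": 320.5,
    "Could": 298.1,
    "How": 250.7,
    "Can": 211.3,
    "the": 0.4,
}

characteristic_words = top_n_words(example_scores, 5)
print(characteristic_words)
```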

Calculation of LLR
(In case of negation)
If a word w tends to occur only in negation sentences,
its LLR score becomes high.
         Negation   Affirmation   Total
w=1      a          b             a+b
w=0      c          d             c+d
Total    a+c        b+d           n
(a, b, c, d: occurrence frequency in each condition)
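The slide text does not preserve the LLR formula itself. The sketch below assumes the widely used Dunning-style log-likelihood ratio (G²) over the 2x2 table, LLR = 2 Σ O ln(O/E), where O is an observed cell and E is its expected value under independence. The function name `llr` and the counts are illustrative assumptions.

```python
import math


def llr(a, b, c, d):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    a: negation sentences containing w     b: affirmation sentences containing w
    c: negation sentences without w        d: affirmation sentences without w
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts if the word and the sentence type were independent.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # A cell with zero observed count contributes 0 (the limit of x * log x).
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


# Made-up counts: 1,000 negation and 9,000 affirmation sentences.
skewed = llr(90, 10, 910, 8990)     # w occurs mostly in negation sentences
balanced = llr(10, 90, 990, 8910)   # w occurs at the same rate in both types
print(skewed, balanced)
```

Note that this statistic is symmetric: a word that occurs almost only in affirmation sentences also scores high, so an implementation would probably also compare a/(a+c) with b/(b+d) before treating the word as characteristic of negation.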

Sentence type classification
To build the contingency table, we
classified the sentences in the parallel
corpus by type, using the manually extracted
English characteristic words.
English                 Japanese                  Type
He is not an artist.    彼は芸術家ではない。      negation
I like apples.          私はりんごが好きです。    affirmation
Are you a doctor?       あなたは医者ですか。      question
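A rough sketch of this classification step, using the manually extracted English words. The slides do not state how a sentence containing both a question word and a negation word is handled; checking for question words first is an assumption of this example, as are the tokenizer and the function names.

```python
import re

# Manually extracted English characteristic words, as listed on the slide.
NEGATION_WORDS = {"not", "t", "don", "Don", "haven", "isn", "No", "won",
                  "wasn", "doesn", "didn", "cannot", "hadn"}
QUESTION_WORDS = {"?", "Why", "Will", "What", "Could", "Is", "How", "Does",
                  "Can", "Do", "Are", "Which", "When", "Where", "Have",
                  "Did", "Was", "May"}


def tokenize(sentence):
    # Word tokens and single punctuation marks, so "don't" becomes don, ', t.
    return re.findall(r"\w+|[^\w\s]", sentence)


def classify(sentence):
    """Label an English sentence as question, negation, or affirmation."""
    tokens = set(tokenize(sentence))
    if tokens & QUESTION_WORDS:   # assumed priority: question before negation
        return "question"
    if tokens & NEGATION_WORDS:
        return "negation"
    return "affirmation"


corpus = ["He is not an artist.", "I like apples.", "Are you a doctor?"]
labels = [classify(s) for s in corpus]
print(labels)
```

The label of each English sentence would then be shared with its Japanese counterpart, so that word counts per sentence type can be collected on both sides of the parallel corpus.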

Extracted Words by LLR
(English)
Question: do, any, there, have, this, don, long, it, isn, did, your, much, how, time
Negation: can, yet, any, but, know, worry, I, anything, it, so, afraid, understand, what, enough

Extracted Words by LLR
(Japanese)
Question: か, どこ, 何, どう, いくら, は, いただけ, どの, 何時, あり, でしょ, もらえ, いかが, どんな
Negation: ませ, ない, ん, は, なかっ, あまり, まだ, あり, でき, じゃ, いいえ, そんなに, そんな, たく

Experiments
SMT toolkit:      Moses
Tuning:           Minimum Error Rate Training
Parallel corpus:  Basic Travel Expression Corpus (BTEC; 70,000 pairs)
Test set:         1,500 sentences (500 sentences each for negation, question, and affirmation)
Development set:  1,500 sentences (built in the same way as the test set)

Experiments
• Based on a preliminary experimental
evaluation with BLEU, N was
set to 30 (LLR30).
• The baseline method uses no additional
features.

Manual Evaluation
• To verify the effect on
translation quality of adding
the proposed features.
• To verify the accuracy for each
modality.
We randomly extracted 90 pairs to test the
methods for each modality (total 270 pairs).

Translation Quality
                                    Good (S,A,B)   S    A    B    C    D
Baseline (no additional features)   151            60   57   34   26   93
Manual extraction                   153            55   54   44   29   88
LLR30                               154            60   56   38   28   88
(number of sentences)
All methods have nearly the same translation
quality when S, A, and B are counted as good
translations.

Accuracy of each modality
                    Aff     Neg     Que
Baseline            86.67   39.22   90.48
Manual extraction   87.41   64.71   90.48
LLR30               87.41   62.75   95.24
(Percentage of outputs that preserved the modality of the input.)
• The proposed methods showed a marked improvement
for the negation modality.
• The accuracy of LLR30 was higher than that
of the baseline in all modalities.

Translation Example
Input (Question):
サーカスと動物園、どっちに行こうか。
Baseline:
Let's go to the circus and, the zoo? (X)
Proposed method (manual extraction):
Which one shall we go to the circus and
zoo? (O)

Translation Example
Input (Question):
キャンセルしてもかまいませんか。
Baseline:
May I cancel? (O)
Proposed method:
I don't mind if you cancel it? (X)
"masen" alone marks negation, but "masen ka" marks a question.
We have to treat word combinations.

Conclusion
• We proposed additional features that
consider characteristic words for
modality-preserving PBSMT.
• The proposed method produced more translations that
preserve the modality of the input sentence than the baseline,
without decreasing translation quality.
• Automatic extraction performed the same
as or better than manual extraction.

Translation Example
Please go easy. (O)
Input (Affirmation):
やさしく打ってくださいね。
Proposed method (English side only):
Please go easy, isn't it? (X)

Calculation of LLR
• Pr(D|H_indep) is the probability of the observed data under the null
hypothesis that the occurrence of a word w is independent of
whether the sentence is a negation or an affirmation.
• Pr(D|H_dep) is the probability when the occurrence
depends on the sentence type.
(In case of negation)
If a word tends to occur only in negation
sentences, its LLR score becomes high.

Related Studies
• Class-Dependent Modeling for Dialog Translation
[Finch et al., 2009]
  • Two models are trained: one for question sentences and one for
  other sentences.
• Discriminative Reranking for SMT using Various Global
Features [Goh et al., 2010]
  • Probabilities of sentence types such as negations and
  questions are used.
Neither study discussed which
expressions influence modality.
Our study focuses on the characteristic modality
words in negations and questions.
