Modality-Preserving Phrase-Based Statistical Machine Translation
Masamichi Ideue, Masao Utiyama, Eiichiro Sumita and Kazuhide Yamamoto
(Nagaoka University of Technology and NICT)
Purpose of our study
Japanese-to-English translation that preserves negation and
question modality with phrase-based SMT.
Input: 私はりんごが好きではありません。
Translation (correct): I don’t like apples.
MT translation: I like apples.
MT users would not be able to detect such a modality error.
Related Studies
• Class-Dependent Modeling for Dialog
Translation [Finch et al., 2009]
• Discriminative Reranking for SMT
using Various Global Features [Goh
et al., 2010]
Our study focused on characteristic modality
words in negations and questions.
Neither of the studies discussed what
expressions influence modalities.
Proposed Method
Add feature functions that take into account
characteristic words of negation and
question.
Added feature functions
The number of phrase pairs that include
characteristic words of question
(or negation) in both the Japanese phrase
and the English phrase.
Input f: 財布 は どこ に あり ます か ?
Hypothesis e: Where is the purse ?
Feature value (question): 2
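The feature described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `count_modality_phrase_pairs`, the small word sets, and the segmentation of the example into phrase pairs are all assumptions made here. It counts the phrase pairs whose Japanese side and English side both contain a characteristic word.

```python
# Hypothetical word sets; in the slides these come from manual or LLR-based extraction.
QUESTION_JA = {"か", "?", "どこ"}
QUESTION_EN = {"?", "Where"}


def count_modality_phrase_pairs(phrase_pairs, src_words, tgt_words):
    """Count phrase pairs whose source and target sides both contain a characteristic word."""
    count = 0
    for src_phrase, tgt_phrase in phrase_pairs:
        src_hit = any(tok in src_words for tok in src_phrase.split())
        tgt_hit = any(tok in tgt_words for tok in tgt_phrase.split())
        if src_hit and tgt_hit:
            count += 1
    return count


# Hypothetical segmentation of the slide's example into phrase pairs.
pairs = [
    ("財布 は", "the purse"),
    ("どこ に あり ます", "Where is"),
    ("か ?", "?"),
]
feature_value = count_modality_phrase_pairs(pairs, QUESTION_JA, QUESTION_EN)
print(feature_value)  # 2, matching the value shown on the slide
```

In a phrase-based decoder such a count would be one more term in the log-linear model, with its weight tuned alongside the other features.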
Characteristic Words Extraction
• Manual extraction
• Automatic extraction
  • Using the LLR (log-likelihood ratio) score
Characteristic words are extracted from a
travel-domain parallel corpus.
Manual Extraction
(English)
Negation: not, don, haven, No, wasn, didn, hadn, t, Don, isn, won, doesn, cannot
Question: ?, Will, Could, How, Can, Are, When, Have, Did, May, Why, What, Is, Does, Do, Which, Where, Does, Was
Manual Extraction
(Japanese)
Negation: ない (nai), ません (masen)
Question: ?, か。 (ka.)
• Few characteristic words clearly express
the modalities.
• Whether a word expresses a modality
tends to depend on the domain.
Automatic Extraction
• Automatic extraction is based on LLR.
• LLR is useful for extracting characteristic
words in the travel domain (Chujo et al., 2006).
Order by LLR score (Question):
1 ?
2 Will
3 Could
4 How
5 Can
... ...
The top N words in the ranking by LLR score
are extracted as the characteristic words.
Calculation of LLR
(For negation)
If a word tends to occur only in negation sentences,
its LLR score becomes high.
            Negation   Affirmation   Total
w=1         a          b             a+b
w=0         c          d             c+d
Total       a+c        b+d           n
(a, b, c, d: occurrence frequency in each condition)
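The slide text does not include the formula itself, so the following is only a sketch, assuming the standard log-likelihood ratio over the 2x2 table. The sign flip for words that are under-represented in negation is an extra assumption made here so that only words skewed toward negation rank high. The names `llr` and `top_n_words` and the counts in `tables` are hypothetical.

```python
import math


def llr(a, b, c, d):
    """Signed log-likelihood ratio for the 2x2 table (a, b, c, d)."""
    n = a + b + c + d

    def term(observed, row_total, col_total):
        # By convention, 0 * log(0) is taken as 0.
        if observed == 0:
            return 0.0
        return observed * math.log(observed * n / (row_total * col_total))

    score = (
        term(a, a + b, a + c)
        + term(b, a + b, b + d)
        + term(c, c + d, a + c)
        + term(d, c + d, b + d)
    )
    # Assumption: negate the score when w is rarer in negation than independence predicts.
    return score if a * d >= b * c else -score


def top_n_words(tables, n_words):
    """tables maps word -> (a, b, c, d); return the n_words highest-scoring words."""
    ranked = sorted(tables, key=lambda w: llr(*tables[w]), reverse=True)
    return ranked[:n_words]


# Made-up counts for illustration only.
tables = {
    "not": (80, 2, 20, 98),
    "apples": (5, 5, 95, 95),
    "yes": (1, 40, 99, 60),
}
print(top_n_words(tables, 2))  # ['not', 'apples']
```

A word spread evenly over both sentence types ("apples" here) scores zero, while a word concentrated in negation sentences scores high.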
Sentence type classification
To build the contingency table, we
classified the sentence pairs in the parallel
corpus by type, using the manually extracted
English characteristic words.
English                 Japanese                  Type
He is not an artist.    彼は芸術家ではない。      negation
I like apples.          私はりんごが好きです。    affirmation
Are you a doctor?       あなたは医者ですか。      question
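The classification step and the resulting contingency counts might look like the sketch below. It is an illustration under several assumptions: the tokenizer is a simple regular expression, negation is checked before question (the slides do not say how sentences matching both are handled), and the Japanese token lists are a hypothetical segmentation. The word sets are the manually extracted English words listed earlier.

```python
import re

NEGATION_EN = {"not", "t", "don", "Don", "haven", "isn", "No", "won",
               "wasn", "doesn", "didn", "cannot", "hadn"}
QUESTION_EN = {"?", "Why", "Will", "What", "Could", "Is", "How", "Does", "Can",
               "Do", "Are", "Which", "When", "Where", "Have", "Did", "Was", "May"}


def classify(english):
    """Label an English sentence as negation, question, or affirmation."""
    tokens = re.findall(r"\w+|[^\w\s]", english)
    # Assumption: negation takes precedence when both kinds of word occur.
    if any(t in NEGATION_EN for t in tokens):
        return "negation"
    if any(t in QUESTION_EN for t in tokens):
        return "question"
    return "affirmation"


def contingency(word, corpus, target="negation", other="affirmation"):
    """Return (a, b, c, d) for word over (english, japanese_tokens) pairs."""
    a = b = c = d = 0
    for english, ja_tokens in corpus:
        kind = classify(english)
        present = word in ja_tokens
        if kind == target:
            if present:
                a += 1
            else:
                c += 1
        elif kind == other:
            if present:
                b += 1
            else:
                d += 1
    return a, b, c, d


# The three example pairs from the slide, with a hypothetical Japanese segmentation.
corpus = [
    ("He is not an artist.", ["彼", "は", "芸術", "家", "で", "は", "ない", "。"]),
    ("I like apples.", ["私", "は", "りんご", "が", "好き", "です", "。"]),
    ("Are you a doctor?", ["あなた", "は", "医者", "です", "か", "。"]),
]
print([classify(en) for en, _ in corpus])
print(contingency("ない", corpus))
```

Because the sentence type is decided on the English side, the same labels can be used to score both English and Japanese words.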
Extracted Words by LLR
(English)
Negation: can, know, I, it, afraid, what, yet, but, worry, anything, so, understand, enough, any
Question: do, there, this, long, isn, your, how, any, have, don, it, did, much, time
Extracted Words by LLR
(Japanese)
Negation: ませ, ん, なかっ, まだ, でき, いいえ, そんな, ない, は, あまり, あり, じゃ, そんなに, たく
Question: か, 何, いくら, いただけ, 何時, でしょ, いかが, どこ, どう, は, どの, あり, もらえ, どんな
Experiments
SMT toolkit: Moses
Tuning: Minimum Error Rate Training
Parallel corpus: Basic Travel Expression Corpus (BTEC; 70,000 pairs)
Test set: 1,500 sentences (500 sentences each for negation, question, and affirmation)
Development set: 1,500 sentences (composed in the same way as the test set)
Experiments
• Based on a preliminary evaluation
with BLEU, N was set to 30 (LLR30).
• The baseline method uses no additional
features.
Manual Evaluation
• To verify the effect on translation
quality of adding the proposed features.
• To verify the accuracy for each
modality.
We randomly extracted 90 pairs per modality to test
the methods (270 pairs in total).
Translation Quality
(number of sentences)
                                    Good (S,A,B)   S    A    B    C    D
Baseline (no additional features)   151            60   57   34   26   93
Manual extraction                   153            55   54   44   29   88
LLR30                               154            60   56   38   28   88
All methods have nearly the same translation
quality if S, A, and B are counted as good
translations.
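As a quick arithmetic check of the table, the sketch below recomputes the "Good" column as S + A + B and the share of good translations per method. The counts are the ones reported on the slide; the names `quality` and `good_count` are introduced here for illustration.

```python
# Counts per grade (S, A, B, C, D) as reported on the slide.
quality = {
    "Baseline": (60, 57, 34, 26, 93),
    "Manual extraction": (55, 54, 44, 29, 88),
    "LLR30": (60, 56, 38, 28, 88),
}


def good_count(grades):
    """Number of sentences graded S, A, or B."""
    s, a, b, _c, _d = grades
    return s + a + b


for method, grades in quality.items():
    total = sum(grades)
    share = good_count(grades) / total
    print(f"{method}: good={good_count(grades)} of {total} ({share:.1%})")
```

Each row sums to the 270 evaluated pairs, and the good counts differ by at most three sentences across methods.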
Accuracy of each modality
                      Aff     Neg     Que
Baseline              86.67   39.22   90.48
Manual extraction     87.41   64.71   90.48
LLR30                 87.41   62.75   95.24
(Percentage of outputs that preserved the modality of the input.)
• The proposed methods showed a marked improvement
in negation modality.
• LLR30 was more accurate than the baseline
in all modalities.
Translation Example
Input (Question):
サーカスと動物園、どっちに行こうか。
Baseline:
Let's go to the circus and, the zoo? (X)
Proposed method (manual extraction):
Which one shall we go to the circus and
zoo? (O)
Translation Example
Input (Question):
キャンセルしてもかまいませんか。
Baseline:
May I cancel? (O)
Proposed method (manual extraction):
I don't mind if you cancel it? (X)
"masen" alone marks negation, but "masen ka" marks a question.
We have to handle word combinations.
Conclusion
• We proposed additional features that
take characteristic words into account for
modality-preserving PBSMT.
• The proposed method produced more translations that
preserved the modality of the input sentence than the
baseline, without a decrease in translation quality.
• Automatic extraction performed comparably to
or better than manual extraction.
LLR
            Negation   Affirmation   Total
w=1         a          b             a+b
w=0         c          d             c+d
Total       a+c        b+d           n
Translation Example
Input (Affirmation):
やさしく打ってくださいね。
Proposed method (manual extraction):
Please go easy. (O)
Proposed method (English side only):
Please go easy, isn't it? (X)
Calculation of LLR
• Pr(D|H_indep) is the probability of the observed data D
under the null hypothesis that the occurrences of a word w in
negative and affirmative sentences are independent
of one another.
• Pr(D|H_dep) is the probability of D under the hypothesis
that the occurrences are dependent.
(In case of negation)
If a word tends to occur in negation
only, the LLR score becomes high.
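The formula itself did not survive in the slide text. As a sketch, assuming the standard likelihood-ratio form with maximum-likelihood estimates of both hypotheses (an assumption, not necessarily the authors' exact definition), the score can be written in terms of the contingency counts a, b, c, d and n = a + b + c + d:

```latex
\begin{align}
\mathrm{LLR}(w)
  &= \log \frac{\Pr(D \mid H_{\mathrm{dep}})}{\Pr(D \mid H_{\mathrm{indep}})} \\
  &= a \log \frac{a\,n}{(a+b)(a+c)}
   + b \log \frac{b\,n}{(a+b)(b+d)}
   + c \log \frac{c\,n}{(c+d)(a+c)}
   + d \log \frac{d\,n}{(c+d)(b+d)}
\end{align}
```

Terms with a zero count are taken as zero. The value is zero when the counts match what independence predicts and grows as the word's distribution departs from it.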
Related Studies
• Class-Dependent Modeling for Dialog Translation
[Finch et al., 2009]
  • Two models are trained: one for question sentences and
  one for other sentences.
• Discriminative Reranking for SMT using Various Global
Features [Goh et al., 2010]
  • Probabilities of sentence types such as negations and
  questions are used.
Our study focused on characteristic modality
words in negations and questions.
Neither of the studies discussed what
expressions influence modalities.

GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AIEnchancing adoption of Open Source Libraries. A case study on Albumentations.AI
Enchancing adoption of Open Source Libraries. A case study on Albumentations.AI
 
“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”“I’m still / I’m still / Chaining from the Block”
“I’m still / I’m still / Chaining from the Block”
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 

Modality-Preserving Phrase-based Statistical Machine Translation

  • 1. Modality-Preserving Phrase-Based Statistical Machine Translation
      Masamichi Ideue, Masao Utiyama, Eiichiro Sumita and Kazuhide Yamamoto (Nagaoka University of Technology and NICT)
  • 2. Purpose of our study
      Japanese-to-English translation that preserves negation and question modality, using phrase-based SMT.
      Input: 私はりんごが好きではありません。
      Translation: I don’t like apples.
      MT translation: I like apples.
      MT users would not be able to detect such a modality error.
  • 3. Related Studies
      • Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]
      • Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]
      Our study focuses on words that are characteristic of modality in negations and questions.
      Neither of these studies discussed which expressions influence modality.
  • 4. Proposed Method
      Add feature functions that take into account the characteristic words of negation and question.
  • 5. Added feature functions
      Each feature is the number of phrase pairs that include characteristic words of question (or negation) in both the Japanese phrase and the English phrase.
      Hypothesis e: Where is the purse ?
      Input f: 財布 は どこ に あり ます か ?
      Feature value in this example: 2
  • 6. Characteristic Words Extraction
      Extract characteristic words from the parallel corpus in the travel domain.
      • Manual extraction
      • Automatic extraction
        • Using the LLR (log-likelihood ratio) score
  • 7. Manual Extraction (English)
      Negation: not, t, don, Don, haven, isn, No, won, wasn, doesn, didn, cannot, hadn
      Question: ?, Why, Will, What, Could, Is, How, Does, Can, Do, Are, Which, When, Where, Have, Does, Did, Was, May
  • 8. Manual Extraction (Japanese)
      Negation: ない (nai), ません (masen)
      Question: ?, か。 (ka.)
      • Few characteristic words clearly express the modalities.
      • Whether a word expresses a modality tends to depend on the domain.
  • 9. Automatic Extraction
      • Automatic extraction is based on LLR.
      • LLR is convenient for extracting characteristic words in the travel domain (Chujo et al., 2006).
      • The top N words in the ranking by LLR score are extracted as the characteristic words.
      Order by LLR score (Question): 1. ?  2. Will  3. Could  4. How  5. Can  ...
  • 10. Calculation of LLR (in the case of negation)
      If a word tends to occur only in negations, its LLR score becomes high.
                 Negation   Affirmation   Total
        w=1      a          b             a+b
        w=0      c          d             c+d
        Total    a+c        b+d           n
      (a, b, c, d: occurrence frequency in each condition)
  • 11. Sentence type classification
      To build the contingency table, we classified the sentences in the parallel corpus by type, using the manually extracted English characteristic words.
        English | Japanese | Type
        He is not an artist. | 彼は芸術家ではない。 | negation
        I like apples. | 私はりんごが好きです。 | affirmation
        Are you a doctor? | あなたは医者ですか。 | question
  • 12. Extracted Words by LLR (English) Negation Question do any there have this don long it isn did your much how time can yet any but know worry I anything it so afraid understand what enough
  • 13. Extracted Words by LLR (Japanese)
      Negation: ませ, ない, ん, は, なかっ, あまり, まだ, あり, でき, じゃ, いいえ, そんなに, そんな, たく
      Question: か, どこ, 何, どう, いくら, は, いただけ, どの, 何時, あり, でしょ, もらえ, いかが, どんな
  • 14. Experiments
      SMT toolkit: Moses
      Tuning: Minimum Error Rate Training
      Parallel corpus: Basic Travel Expression Corpus (BTEC; 70,000 pairs)
      Test set: 1,500 sentences (500 sentences each for negation, question, and affirmation)
      Development set: 1,500 sentences (composed in the same way as the test set)
  • 15. Experiments
      • Based on a preliminary evaluation with BLEU, N was set to 30 (LLR30).
      • The baseline method uses no additional features.
  • 16. Manual Evaluation
      • To verify the effect on translation quality when the proposed features are added.
      • To verify the accuracy for each modality.
      We randomly extracted 90 pairs per modality to test the methods (270 pairs in total).
  • 17. Translation Quality (number of sentences)
                                            Good (S,A,B)   S    A    B    C    D
        Baseline (no additional features)   151            60   57   34   26   93
        Manual Extraction                   153            55   54   44   29   88
        LLR30                               154            60   56   38   28   88
      All the methods have the same translation quality if S, A and B are counted as good translations.
  • 18. Accuracy of each modality
                            Aff     Neg     Que
        Baseline            86.67   39.22   90.48
        Manual Extraction   87.41   64.71   90.48
        LLR30               87.41   62.75   95.24
      (Percentage of outputs that preserved the modality of the input.)
      • The proposed methods showed a marked improvement in negation modality.
      • The accuracy of LLR30 was better than that of the baseline in all modalities.
  • 19. Translation Example
      Input (Question): サーカスと動物園、どっちに行こうか。
      Baseline: Let’s go to the circus and, the zoo? (X)
      Proposed method (Manual Extraction): Which one shall we go to the circus and zoo? (O)
  • 20. Translation Example
      Input (Question): キャンセルしてもかまいませんか。
      Baseline: May I cancel? (O)
      Proposed method (Manual Extraction): I don’t mind if you cancel it? (X)
      masen (negation) vs. masen ka (question): word combinations need to be handled.
  • 21. Conclusion
      • We proposed additional features that consider characteristic words for modality-preserving PBSMT.
      • The proposed method produced more translations that preserved the modality of the input sentence than the baseline, without a decrease in translation quality.
      • Automatic extraction performed as well as or better than manual extraction.
  • 22. LLR
  • 23. LLR
                 Negation   Affirmation   Total
        w=1      a          b             a+b
        w=0      c          d             c+d
        Total    a+c        b+d           n
  • 24. Translation Example
      Input (Affirmation): やさしく打ってくださいね。
      Proposed method (Manual Extraction): Please go easy. (O)
      Proposed method (English side only): Please go easy, isn’t it? (X)
  • 25. Calculation of LLR (in the case of negation)
      • Pr(D|H_indep) is the probability of the observed data D under the null hypothesis that the occurrences of a word w in the negative and affirmative sentences are independent of one another.
      • Pr(D|H_dep) is the probability of D under the hypothesis that the occurrences are dependent.
      If a word tends to occur only in negations, its LLR score becomes high.
  • 26. Calculation of LLR
                 Negation   Affirmation   Total
        w=1      a          b             a+b
        w=0      c          d             c+d
        Total    a+c        b+d           n
      (a, b, c, d: occurrence frequency in each condition)
  • 27. Related Studies
      • Class-Dependent Modeling for Dialog Translation [Finch et al., 2009]
        • Two models are trained: one for question sentences and one for other sentences.
      • Discriminative Reranking for SMT using Various Global Features [Goh et al., 2010]
        • Probabilities of sentence types such as negations and questions are used.
      Our study focuses on words that are characteristic of modality in negations and questions.
      Neither of these studies discussed which expressions influence modality.
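
The LLR-based automatic extraction described on the "Automatic Extraction" and "Calculation of LLR" slides can be sketched roughly as follows. The transcript does not give the LLR formula itself, so this sketch assumes the standard log-likelihood ratio statistic for a 2x2 contingency table, 2 * sum of O * ln(O / E) over the four cells. The function names, the use of per-sentence counts for a, b, c and d, and the filter that keeps only words over-represented in the target class are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def llr(a, b, c, d):
    """Log-likelihood ratio for a 2x2 contingency table (assumed standard form).

    a: target-class sentences containing the word (w=1, e.g. negation)
    b: other-class sentences containing the word (w=1, e.g. affirmation)
    c: target-class sentences without the word (w=0)
    d: other-class sentences without the word (w=0)
    """
    n = a + b + c + d
    cells = (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    )
    total = 0.0
    for observed, row_sum, col_sum in cells:
        if observed > 0:  # an empty cell contributes nothing to the sum
            expected = row_sum * col_sum / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def rank_characteristic_words(target_sents, other_sents, top_n):
    """Return the top_n words ranked by LLR score for the target class.

    Sentences are lists of tokens. Counts are per sentence (an assumption;
    the slides only say "occurrence frequency in each condition").
    """
    target_df = Counter(w for sent in target_sents for w in set(sent))
    other_df = Counter(w for sent in other_sents for w in set(sent))
    n_target, n_other = len(target_sents), len(other_sents)

    scored = []
    for word, a in target_df.items():
        b = other_df.get(word, 0)
        c, d = n_target - a, n_other - b
        # Assumed filter: keep words relatively more frequent in the target
        # class, since the raw statistic is also high for words that avoid it.
        if a * n_other > b * n_target:
            scored.append((llr(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top_n]]


# Toy data, invented for illustration only (not taken from BTEC).
negation = [["i", "do", "not", "like", "it"],
            ["he", "is", "not", "here"],
            ["not", "yet"]]
affirmation = [["i", "like", "it"],
               ["he", "is", "here"],
               ["yes", "it", "is"]]
top_words = rank_characteristic_words(negation, affirmation, top_n=1)
```

With the setting reported on the "Experiments" slide, top_n would be 30 (LLR30), and the two sentence lists would come from the sentence type classification step.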