Post-editese: an Exacerbated Translationese
Antonio Toral
MT Summit, Dublin, 23rd August 2019
Table of contents
1. Intro and Motivation
2. Datasets
3. Experiments
4. Conclusions and Future
Abbreviations
• Translation types: HT (human from scratch), PE (post-editing), MT
(machine translation)
• Languages: two-letter ISO 639-1 codes, e.g. DE, EN, ES, ZH ...
• MT types: RBMT (rule-based), SMT (statistical), NMT (neural)
Intro and Motivation
The Reader
Does PE affect the reading experience?
→ Are PE translations = HT?
PE vs HT. Theory and Practice
In theory, PE≠HT
1. Translator is primed by MT output while post-editing (Green et al., 2013)
2. PE should contain the footprint of MT
3. HT should be preferred over PE
PE vs HT. Theory and Practice
In practice, quality of PE
• comparable to that of HT. E.g., Garcia (2010)
• or even better. E.g. Plitt and Masselot (2010)
But... Quality is typically measured as number of errors (Koponen, 2016)
PE vs HT. Beyond Number of Errors
Characteristics of PE vs HT
• Czulo and Nitzke (2016): terminology in PE is closer to MT than to HT
• Daems et al. (2017): discrimination between PE and HT not possible
• Farrell (2018): lexical variability in PE<HT
PE vs HT. Translationese
Research has proven the existence of translationese: HT≠original text
• Normalisation
• Simplification
• Interference
• Explicitation
This paper: quantitative analysis of PE vs HT in terms of translationese principles
Datasets
PE vs HT. Datasets
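| Dataset (year) | MT systems    | Direction | PE type | Domain    | # Sent. pairs |
|----------------|---------------|-----------|---------|-----------|---------------|
| Taraxü (2011)  | 2 SMT, 2 RBMT | en→de     | Light   | News      | 272           |
|                |               | de→en     |         |           | 240           |
|                |               | es→de     |         |           | 101           |
| IWSLT (2016)   | 4 NMT, 4 SMT  | en→de     | Light   | Subtitles | 600           |
|                | 2 NMT, 3 SMT  | en→fr     |         |           |               |
| MS (2018)      | 1 NMT         | zh→en     | Full    | News      | 1,000         |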
Competence mismatch. PE=professional, HT=anyone
Experiments
Lexical Variety
\[ \text{type-token ratio} = \frac{\text{number of types}}{\text{number of tokens}} \tag{1} \]
→ Simplification principle
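A minimal sketch of how the type-token ratio in (1) could be computed. The slides do not say how the translations were tokenised, so the lowercased whitespace tokeniser here (`tokenize`) is an assumption made purely for illustration.

```python
def tokenize(text):
    # Assumption: lowercased whitespace tokenisation. The slides do not
    # specify the tokeniser that was actually used.
    return text.lower().split()


def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)


# Toy example: 6 tokens, 5 distinct types ("the" occurs twice).
tokens = tokenize("the cat sat on the mat")
print(type_token_ratio(tokens))
```

A lower ratio means the same words are reused more often, which is how the measure relates to simplification.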
Lexical Variety Results (Microsoft)
[Figure: type-token ratio (y-axis) for zh→en; series: ht, nmt1, nmt2, penmt]
HT > PE > MT
Lexical Variety Results (all)
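| Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
|------------------|--------------|--------------|--------------|-------------|-------------|----------|
| HT               | 0.26         | 0.27         | 0.31         | 0.20        | 0.16        | 0.14     |
| PE               | -2.05%       | -1.81%       | -1.27%       | -3.86%      | -1.17%      | -4.76%   |
| MT               | -2.94%       | -3.62%       | -5.91%       | -10.93%     | -6.04%      | -6.96%   |
| PE-NMT           |              |              |              | -4.21%      | -1.88%      | -4.76%   |
| PE-SMT           | -1.59%       | -1.31%       | -1.03%       | -3.50%      | -0.70%      |          |
| PE-RBMT          | -2.79%       | -2.04%       | -3.05%       |             |             |          |
| NMT              |              |              |              | -12.22%     | -8.18%      | -7.33%   |
| SMT              | -2.36%       | -2.36%       | -6.42%       | -9.63%      | -4.61%      |          |
| RBMT             | -3.08%       | -4.26%       | -7.78%       |             |             |          |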
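The results tables give an absolute value for HT and percentages for the other translation types. A plausible reading, which the slides do not state explicitly, is that each percentage is the relative difference from the HT value. The sketch below shows that calculation under this assumption; the numbers are hypothetical and not taken from the tables.

```python
def relative_diff_percent(value, baseline):
    """Relative difference of `value` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


# Hypothetical scores, for illustration only.
ht_score = 0.200  # assumed baseline (HT)
pe_score = 0.190  # assumed score of a post-edited translation
print(f"{relative_diff_percent(pe_score, ht_score):+.2f}%")
```

Under this reading, a negative percentage means the translation type scores below HT on that measure.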
Lexical Density
\[ \text{lexical density} = \frac{\text{number of content words}}{\text{number of total words}} \tag{2} \]
Content words: adverbs, adjectives, nouns and verbs (UDPipe)
→ Simplification principle
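A minimal sketch of the lexical density in (2), assuming the text has already been PoS-tagged with Universal Dependencies tags (the slides name UDPipe as the tagger; the tagging step is not shown here). Content words are taken to be the UPOS tags ADJ, ADV, NOUN and VERB. Leaving punctuation out of the word count and not counting PROPN or AUX as content words are assumptions of this sketch.

```python
# UPOS tags treated as content words, following the slide's definition
# (adverbs, adjectives, nouns and verbs).
CONTENT_UPOS = {"ADJ", "ADV", "NOUN", "VERB"}


def lexical_density(tagged_tokens):
    """Share of content words among all words.

    `tagged_tokens` is a list of (word, upos) pairs.
    Assumption: punctuation tokens (UPOS "PUNCT") are not counted as words.
    """
    words = [(word, tag) for word, tag in tagged_tokens if tag != "PUNCT"]
    if not words:
        raise ValueError("need at least one word")
    content = sum(1 for _, tag in words if tag in CONTENT_UPOS)
    return content / len(words)


# Toy example, tagged by hand: 3 content words out of 4 words.
example = [
    ("The", "DET"),
    ("quick", "ADJ"),
    ("fox", "NOUN"),
    ("jumps", "VERB"),
    (".", "PUNCT"),
]
print(lexical_density(example))
```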
Lexical Density Results (Taraxü)
[Figure: lexical density (y-axis) for en→de, de→en and es→de; series: HT, MT, PE]
HT > PE ≈ MT
Lexical Density Results (all)
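| Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
|------------------|--------------|--------------|--------------|-------------|-------------|----------|
| HT               | 0.55         | 0.53         | 0.53         | 0.48        | 0.46        | 0.59     |
| PE               | -1.00%       | -2.48%       | -4.31%       | -3.46%      | -1.24%      | -0.46%   |
| MT               | -0.81%       | -0.69%       | -4.53%       | -5.14%      | -0.94%      | -2.37%   |
| PE-NMT           |              |              |              | -3.88%      | -1.47%      | -0.46%   |
| PE-SMT           | -0.54%       | -2.87%       | -4.78%       | -3.04%      | -1.09%      |          |
| PE-RBMT          | -1.46%       | -2.09%       | -3.84%       |             |             |          |
| NMT              |              |              |              | -6.31%      | -3.14%      | -2.37%   |
| SMT              | -0.80%       | 0.14%        | -3.45%       | -3.98%      | 0.53%       |          |
| RBMT             | -0.83%       | -1.51%       | -5.61%       |             |             |          |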
Length Ratio
\[ \text{length ratio} = \frac{|\text{length}_{ST} - \text{length}_{TT}|}{\text{length}_{ST}} \tag{3} \]
Hypothesis: compared to HT, PE translations are closer in length to source text
→ Normalisation principle
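A minimal sketch of the length ratio in (3) for a single source/translation pair, reading ST as the source text and TT as the target text. The slides do not say whether length is counted in characters or tokens, so the `measure` parameter (defaulting to character count) is an assumption of this sketch.

```python
def length_ratio(source, target, measure=len):
    """Absolute length difference between source and target, relative to source.

    `measure` maps a text to its length. Assumption: characters by default;
    pass e.g. `lambda s: len(s.split())` to count whitespace tokens instead.
    """
    source_len = measure(source)
    if source_len == 0:
        raise ValueError("source text must not be empty")
    return abs(source_len - measure(target)) / source_len


# Toy example: a 10-character source and an 8-character translation.
print(length_ratio("a" * 10, "b" * 8))

# The same function, counting whitespace-separated tokens instead.
print(length_ratio("one two three four", "uno dos tres",
                   measure=lambda s: len(s.split())))
```

A value closer to zero means the translation is closer in length to its source, which is what the hypothesis predicts for PE relative to HT.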
Length Ratio Results (Taraxü)
[Figure: length ratio (y-axis) for en→de, de→en, es→de and de→es; series: ht, pesmt1, pesmt2, perbmt1, perbmt2]
HT > PE-SMT ≥ PE-RBMT
Length Ratio Results (all)
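| Dataset | Direction | Length ratio HT | PE     | MT     |
|---------|-----------|-----------------|--------|--------|
| Taraxü  | de→en     | 0.16            | -38.5% | -36.9% |
| Taraxü  | en→de     | 0.22            | -33.4% | -38.5% |
| Taraxü  | es→de     | 0.17            | -25.2% | -21.0% |
| IWSLT   | en→de     | 0.17            | -3.4%  | -18.8% |
| IWSLT   | en→fr     | 0.18            | 6.7%   | -10.9% |
| MS      | zh→en     | 1.41            | -9.9%  | -9.1%  |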
Competence mismatch. PE=professional, HT=anyone
Perplexity on PoS Sequences
Process:
1. PoS tag monolingual corpora (Universal Dependencies tag set) for source
and target languages
2. Build language models on PoS tagged data
3. PoS tag each translation (MT, PE and HT) and calculate:
\[ \text{PP diff} = PP(\text{translation}, LM_{\text{source}}) - PP(\text{translation}, LM_{\text{target}}) \tag{4} \]
Hypothesis: \( \text{PP diff}_{PE} < \text{PP diff}_{HT} \)
→ Interference principle
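The process above can be sketched as follows, assuming the PoS tagging has already been done and each sentence is available as a list of Universal Dependencies tags. The slides do not specify the type of language model, so the bigram model with add-one smoothing (`BigramLM`) and the toy corpora are assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # sentence boundary markers


class BigramLM:
    """Bigram language model over PoS tags with add-one smoothing (assumed)."""

    def __init__(self, sequences):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = {EOS}
        for seq in sequences:
            padded = [BOS] + list(seq) + [EOS]
            self.vocab.update(seq)
            for prev, cur in zip(padded, padded[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def prob(self, prev, cur):
        # One extra slot in the denominator reserves mass for unseen tags.
        size = len(self.vocab) + 1
        return (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + size)

    def perplexity(self, sequences):
        log_prob, count = 0.0, 0
        for seq in sequences:
            padded = [BOS] + list(seq) + [EOS]
            for prev, cur in zip(padded, padded[1:]):
                log_prob += math.log(self.prob(prev, cur))
                count += 1
        return math.exp(-log_prob / count)


def pp_diff(translation, lm_source, lm_target):
    """PP(translation, LM_source) - PP(translation, LM_target), as in (4)."""
    return lm_source.perplexity(translation) - lm_target.perplexity(translation)


# Toy monolingual corpora: the "source" language puts adjectives after
# nouns, the "target" language puts them before.
lm_source = BigramLM([["NOUN", "ADJ"], ["DET", "NOUN", "ADJ", "VERB"]])
lm_target = BigramLM([["ADJ", "NOUN"], ["DET", "ADJ", "NOUN", "VERB"]])

target_like = [["DET", "ADJ", "NOUN", "VERB"]]  # follows target word order
source_like = [["DET", "NOUN", "ADJ", "VERB"]]  # keeps source word order

print(pp_diff(target_like, lm_source, lm_target))
print(pp_diff(source_like, lm_source, lm_target))
```

In this toy setting the translation that keeps the source word order gets the lower PP diff, which is the pattern the hypothesis expects for PE compared with HT.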
Perplexity Results (Taraxü)
[Figure: perplexity difference (y-axis) for en→de and de→en; series: ht, smt1, pesmt1, smt2, pesmt2, rbmt1, perbmt1, rbmt2, perbmt2]
Perplexity Results (all)
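| Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
|------------------|--------------|--------------|--------------|-------------|-------------|----------|
| HT               | 5.12         | 5.09         | 9.41         | 5.01        | 2.47        | 17.23    |
| PE               | -13.84%      | -11.29%      | -8.58%       | -6.26%      | -2.03%      | -3.26%   |
| MT               | -33.65%      | -32.25%      | -20.71%      | -18.66%     | -11.07%     | -3.1%    |
| PE-NMT           |              |              |              | -3.41%      | -1.40%      | -3.26%   |
| PE-SMT           | -11.72%      | -13.37%      | -10.48%      | -9.10%      | -2.46%      |          |
| PE-RBMT          | -15.95%      | -9.20%       | -6.68%       |             |             |          |
| NMT              |              |              |              | -5.89%      | -2.58%      | -3.10%   |
| SMT              | -30.07%      | -41.71%      | -26.30%      | -31.43%     | -7.95%      |          |
| RBMT             | -37.24%      | -22.80%      | -15.13%      |             |             |          |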
Conclusions and Future
Conclusions
PE≠HT. PEs:
• Are simpler (lexical variety and density)
• Are more normalised (length ratio)
• Have more interference from the source language (PoS sequences)
MT paradigms
• (PE)SMT better than (PE)NMT in lexical variety and density
• (PE)NMT has less interference than (PE)SMT
Discussion
1. Does PE contribute to the impoverishment of the target language?
2. In this study HT better than PE. But number of errors HT≥PE
• PE may be better suited than HT for some domains, e.g. technical
3. It’s not the fault of the post-editing process per se... but of MT
• PEs should be better if MT is. E.g. interference in PE-NMT<PE-SMT
because interference in NMT<SMT
Future
• Effect of PE guidelines, translator’s expertise, etc.
• Measures with deeper linguistic information
• Automatic discrimination between PE and HT
• More data (industry?)
Data and code available: https://bit.ly/2zeKf0b
Thanks: L. Bentivogli, S. Castilho, J. Daems, M. Farrell, L. Macken, L. Marg
and M. Popović
Thank you!
Questions?
Antonio Toral
@atoral
References
L. Bowker and J. Buitrago Ciro. Investigating the usefulness of machine
translation for newcomers at the public library. Translation and Interpreting
Studies, 10(2):165–186, 2015. ISSN 1932-2798. doi: 10.1075/tis.10.2.01bow.
URL http://www.jbe-platform.com/content/journals/10.1075/tis.10.2.01bow.
O. Czulo and J. Nitzke. Patterns of terminological variation in post-editing and
of cognate use in machine translation in contrast to human translation. In
Proceedings of the 19th Annual Conference of the European Association for
Machine Translation, EAMT 2016, Riga, Latvia, May 30 - June 1, 2016, pages
106–114. European Association for Machine Translation, 2016. URL
https://aclanthology.info/papers/W16-3401/w16-3401.
J. Daems, O. De Clercq, and L. Macken. Translationese and post-editese: how
comparable is comparable quality? Linguistica Antverpiensia, New Series:
Themes in Translation Studies, 16:89–103, 2017. ISSN 0304-2294.
URL https://lans-tts.uantwerpen.be/index.php/LANS-TTS/article/view/434/409.
M. Farrell. Machine Translation Markers in Post-Edited Machine Translation
Output. In Proceedings of the 40th Conference Translating and the Computer,
pages 50–59, 2018.
R. Fiederer and S. O’Brien. Quality and machine translation: A realistic
objective. The Journal of Specialised Translation, 11:52–74, 2009.
I. Garcia. Is machine translation ready yet? Target. International Journal of
Translation Studies, 22(1):7–21, 2010.
S. Green, J. Heer, and C. D. Manning. The efficacy of human post-editing for
language translation. Chi 2013, pages 439–448, 2013. doi:
10.1145/2470654.2470718. URL
http://vis.stanford.edu/papers/post-editing.
M. Koponen. Is machine translation post-editing worth the effort? A survey of
research into post-editing and effort. Journal of Specialised Translation, 25
(25):131–148, 2016. ISSN 0169-2607. URL
https://sites.google.com/site/wptp2015/.
M. Plitt and F. Masselot. A productivity test of statistical machine translation
post-editing in a typical localisation context. The Prague bulletin of
mathematical linguistics, 93:7–16, 2010.

Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 

Post-editese: Translation Characteristics Exacerbated by Machine Translation

  • 1. Post-editese: an Exacerbated Translationese. Antonio Toral. MT Summit, Dublin, 23rd August 2019
  • 2. Table of contents
      1. Intro and Motivation
      2. Datasets
      3. Experiments
      4. Conclusions and Future
  • 3. Abbreviations
      • Translation types: HT (human translation from scratch), PE (post-editing), MT (machine translation)
      • Languages: two-letter ISO codes, e.g. DE, EN, ES, ZH
      • MT types: RBMT (rule-based), SMT (statistical), NMT (neural)
  • 7. The Reader. Does PE affect the reading experience? → Are PE translations = HT?
  • 8. PE vs HT. Theory and Practice. In theory, PE ≠ HT:
      1. The translator is primed by the MT output while post-editing (Green et al., 2013)
      2. PE should contain the footprint of MT
      3. HT should be preferred over PE
  • 10. PE vs HT. Theory and Practice. In practice, the quality of PE is:
      • comparable to that of HT, e.g. Garcia (2010)
      • or even better, e.g. Plitt and Masselot (2010)
    But quality is typically measured as the number of errors (Koponen, 2016)
  • 11. PE vs HT. Beyond Number of Errors. Characteristics of PE vs HT:
      • Czulo and Nitzke (2016): terminology in PE is closer to MT than to HT
      • Daems et al. (2017): discrimination between PE and HT not possible
      • Farrell (2018): lexical variability in PE < HT
  • 13. PE vs HT. Translationese. Research has proven the existence of translationese: HT ≠ original text
      • Normalisation
      • Simplification
      • Interference
      • Explicitation
    This paper: quantitative analysis of PE vs HT in terms of translationese principles
  • 21. PE vs HT. Datasets

    | Dataset | MT systems | Direction | PE type | Domain | # Sent. pairs |
    |---|---|---|---|---|---|
    | Taraxü (2011) | 2 SMT, 2 RBMT | en→de | Light | News | 272 |
    | | | de→en | | | 240 |
    | | | es→de | | | 101 |
    | IWSLT (2016) | 4 NMT, 4 SMT | en→de | Light | Subtitles | 600 |
    | | 2 NMT, 3 SMT | en→fr | | | |
    | MS (2018) | 1 NMT | zh→en | Full | News | 1,000 |

    Competence mismatch: PE = professionals, HT = anyone
  • 24. Lexical Variety. $\text{type-token ratio} = \frac{\text{number of types}}{\text{number of tokens}}$ (1) → Simplification principle
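The type-token ratio defined on the slide above (number of distinct types divided by number of tokens) can be sketched in a few lines of Python. This is only an illustration: the whitespace tokenisation and lowercasing are assumptions made here, not choices documented in the slides, and `type_token_ratio` is a hypothetical helper name.

```python
def type_token_ratio(tokens):
    """Return the number of distinct types divided by the number of tokens."""
    if not tokens:
        raise ValueError("cannot compute a type-token ratio on an empty text")
    return len(set(tokens)) / len(tokens)


# Toy example. Tokenisation by whitespace and lowercasing are assumptions.
text = "the cat sat on the mat and the dog sat too"
tokens = text.lower().split()

ttr = type_token_ratio(tokens)  # 8 types over 11 tokens
print(f"{ttr:.3f}")  # prints 0.727
```

A lower value means more repetition of the same word forms, which is how the slides link this measure to the simplification principle. Note that the ratio is sensitive to text length, so it is only comparable across translations of the same source text.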
  • 26. Lexical Variety Results (Microsoft). [Figure: type-token ratio for zh→en; bars for HT, NMT1, NMT2 and PE-NMT] HT > PE > MT
  • 29. Lexical Variety Results (all)

    | Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
    |---|---|---|---|---|---|---|
    | HT | 0.26 | 0.27 | 0.31 | 0.20 | 0.16 | 0.14 |
    | PE | -2.05% | -1.81% | -1.27% | -3.86% | -1.17% | -4.76% |
    | MT | -2.94% | -3.62% | -5.91% | -10.93% | -6.04% | -6.96% |
    | PE-NMT | | | | -4.21% | -1.88% | -4.76% |
    | PE-SMT | -1.59% | -1.31% | -1.03% | -3.50% | -0.70% | |
    | PE-RBMT | -2.79% | -2.04% | -3.05% | | | |
    | NMT | | | | -12.22% | -8.18% | -7.33% |
    | SMT | -2.36% | -2.36% | -6.42% | -9.63% | -4.61% | |
    | RBMT | -3.08% | -4.26% | -7.78% | | | |
  • 31. Lexical Density. $\text{lexical density} = \frac{\text{number of content words}}{\text{number of total words}}$ (2). Content words: adverbs, adjectives, nouns and verbs (UDPipe) → Simplification principle
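As a rough sketch of the lexical density measure above, the function below works on tokens that have already been PoS-tagged with Universal Dependencies tags (for instance by UDPipe, which is not called here). The tag set `CONTENT_UPOS`, the exclusion of proper nouns and auxiliaries, and the function name are assumptions for illustration; the slides only say that content words are adverbs, adjectives, nouns and verbs.

```python
# Universal PoS tags counted as content words (assumption: PROPN and AUX are excluded).
CONTENT_UPOS = {"ADJ", "ADV", "NOUN", "VERB"}


def lexical_density(tagged_tokens, content_tags=CONTENT_UPOS):
    """Return the share of tokens whose PoS tag is in content_tags.

    tagged_tokens is a list of (word, upos) pairs.
    """
    if not tagged_tokens:
        raise ValueError("cannot compute lexical density on an empty text")
    content = sum(1 for _, tag in tagged_tokens if tag in content_tags)
    return content / len(tagged_tokens)


# Toy pre-tagged sentence; in practice the tags would come from a tagger.
tagged = [
    ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
    ("over", "ADP"), ("the", "DET"), ("lazy", "ADJ"), ("dog", "NOUN"),
]

density = lexical_density(tagged)  # 5 content words over 8 tokens
print(density)  # prints 0.625
```

Whether punctuation tokens count towards the total is left to the caller here: filter them out of `tagged_tokens` first if they should not.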
  • 33. Lexical Density Results (Taraxü). [Figure: lexical density of HT, MT and PE for en→de, de→en and es→de] HT > PE ≈ MT
  • 36. Lexical Density Results (all)

    | Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
    |---|---|---|---|---|---|---|
    | HT | 0.55 | 0.53 | 0.53 | 0.48 | 0.46 | 0.59 |
    | PE | -1.00% | -2.48% | -4.31% | -3.46% | -1.24% | -0.46% |
    | MT | -0.81% | -0.69% | -4.53% | -5.14% | -0.94% | -2.37% |
    | PE-NMT | | | | -3.88% | -1.47% | -0.46% |
    | PE-SMT | -0.54% | -2.87% | -4.78% | -3.04% | -1.09% | |
    | PE-RBMT | -1.46% | -2.09% | -3.84% | | | |
    | NMT | | | | -6.31% | -3.14% | -2.37% |
    | SMT | -0.80% | 0.14% | -3.45% | -3.98% | 0.53% | |
    | RBMT | -0.83% | -1.51% | -5.61% | | | |
  • 39. Length Ratio. $\text{length ratio} = \frac{|\text{length}_{ST} - \text{length}_{TT}|}{\text{length}_{ST}}$ (3), where ST is the source text and TT the target text. Hypothesis: compared to HT, PE translations are closer in length to the source text → Normalisation principle
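The length ratio above is the absolute difference between source and target length, normalised by the source length. A minimal sketch follows; the slides do not state the unit of length, so measuring in characters is an assumption here, and the `length` parameter (a hypothetical addition) lets the caller switch to tokens.

```python
def length_ratio(source, target, length=len):
    """Return |length(source) - length(target)| / length(source).

    By default length is measured in characters (an assumption);
    pass e.g. length=lambda s: len(s.split()) to count tokens instead.
    """
    source_length = length(source)
    if source_length == 0:
        raise ValueError("source text must not be empty")
    return abs(source_length - length(target)) / source_length


# Toy sentence pair: 9 source characters, 5 target characters.
ratio = length_ratio("Thank you", "Danke")
print(f"{ratio:.3f}")  # prints 0.444
```

A value of 0 means the translation has exactly the length of its source, so under the hypothesis on the slide PE would be expected to score lower than HT.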
  • 41. Length Ratio Results (Taraxü). [Figure: length ratio for en→de, de→en, es→de and de→es; bars for HT, PE-SMT1, PE-SMT2, PE-RBMT1 and PE-RBMT2] HT > PE-SMT ≥ PE-RBMT
  • 43. Length Ratio Results (all)

    | Dataset | Direction | Length ratio HT | Length ratio PE | Length ratio MT |
    |---|---|---|---|---|
    | Taraxü | de→en | 0.16 | -38.5% | -36.9% |
    | Taraxü | en→de | 0.22 | -33.4% | -38.5% |
    | Taraxü | es→de | 0.17 | -25.2% | -21.0% |
    | IWSLT | en→de | 0.17 | -3.4% | -18.8% |
    | IWSLT | en→fr | 0.18 | 6.7% | -10.9% |
    | MS | zh→en | 1.41 | -9.9% | -9.1% |

    Competence mismatch: PE = professionals, HT = anyone
  • 47. Perplexity on PoS Sequences. Process:
      1. PoS tag monolingual corpora (Universal Dependencies tag set) for the source and target languages
      2. Build language models on the PoS-tagged data
      3. PoS tag each translation (MT, PE and HT) and calculate: $\text{PP diff} = \text{PP}(\text{translation}, \text{LM}_{source}) - \text{PP}(\text{translation}, \text{LM}_{target})$ (4)
    Hypothesis: $\text{PP diff}_{PE} < \text{PP diff}_{HT}$ → Interference principle
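The three steps above can be sketched with a toy language model over PoS tags. The slides do not say which kind of language model was used, so the add-one smoothed bigram model below is an assumption chosen for brevity, and the class and function names (`BigramLM`, `pp_diff`) and the tiny corpora are hypothetical. Tagging is assumed to have been done already.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


class BigramLM:
    """Add-one smoothed bigram model over PoS tag sequences (illustrative only)."""

    def __init__(self, tagged_sentences):
        self.history = Counter()  # how often each tag occurs as a left context
        self.bigrams = Counter()
        self.vocab = {EOS}
        for tags in tagged_sentences:
            seq = [BOS] + list(tags) + [EOS]
            self.vocab.update(tags)
            for prev, cur in zip(seq, seq[1:]):
                self.history[prev] += 1
                self.bigrams[(prev, cur)] += 1

    def prob(self, prev, cur):
        # One extra outcome reserves probability mass for unseen tags.
        outcomes = len(self.vocab) + 1
        return (self.bigrams[(prev, cur)] + 1) / (self.history[prev] + outcomes)

    def perplexity(self, tagged_sentences):
        log_sum, n = 0.0, 0
        for tags in tagged_sentences:
            seq = [BOS] + list(tags) + [EOS]
            for prev, cur in zip(seq, seq[1:]):
                log_sum += math.log(self.prob(prev, cur))
                n += 1
        return math.exp(-log_sum / n)


def pp_diff(translation, lm_source, lm_target):
    """PP(translation, LM_source) - PP(translation, LM_target)."""
    return lm_source.perplexity(translation) - lm_target.perplexity(translation)


# Toy corpora: the "source" language puts adjectives after nouns,
# the "target" language puts them before.
lm_source = BigramLM([["DET", "NOUN", "ADJ", "VERB"], ["DET", "NOUN", "ADJ"]])
lm_target = BigramLM([["DET", "ADJ", "NOUN", "VERB"], ["DET", "ADJ", "NOUN"]])

ht_like = [["DET", "ADJ", "NOUN", "VERB"]]  # follows target-language order
pe_like = [["DET", "NOUN", "ADJ", "VERB"]]  # keeps source-language order

ht_diff = pp_diff(ht_like, lm_source, lm_target)
pe_diff = pp_diff(pe_like, lm_source, lm_target)
print(ht_diff > pe_diff)  # prints True
```

In this toy setting the translation that keeps the source-language tag order gets the smaller difference, which is the direction the hypothesis predicts for PE relative to HT.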
  • 48. Perplexity Results (Taraxü). [Figure: perplexity difference for en→de and de→en; bars for HT, SMT1, PE-SMT1, SMT2, PE-SMT2, RBMT1, PE-RBMT1, RBMT2 and PE-RBMT2]
  • 51. Perplexity Results (all)

    | Translation type | Taraxü de→en | Taraxü en→de | Taraxü es→de | IWSLT en→de | IWSLT en→fr | MS zh→en |
    |---|---|---|---|---|---|---|
    | HT | 5.12 | 5.09 | 9.41 | 5.01 | 2.47 | 17.23 |
    | PE | -13.84% | -11.29% | -8.58% | -6.26% | -2.03% | -3.26% |
    | MT | -33.65% | -32.25% | -20.71% | -18.66% | -11.07% | -3.1% |
    | PE-NMT | | | | -3.41% | -1.40% | -3.26% |
    | PE-SMT | -11.72% | -13.37% | -10.48% | -9.10% | -2.46% | |
    | PE-RBMT | -15.95% | -9.20% | -6.68% | | | |
    | NMT | | | | -5.89% | -2.58% | -3.10% |
    | SMT | -30.07% | -41.71% | -26.30% | -31.43% | -7.95% | |
    | RBMT | -37.24% | -22.80% | -15.13% | | | |
  • 54. Conclusions. PE ≠ HT. PEs:
      • are simpler (lexical variety and density)
      • are more normalised (length ratio)
      • have more interference from the source language (PoS sequences)
    MT paradigms:
      • (PE)SMT is better than (PE)NMT in lexical variety and density
      • (PE)NMT has less interference than (PE)SMT
  • 59. Discussion
      1. Does PE contribute to the impoverishment of the target language?
      2. In this study, HT is better than PE. But in number of errors, HT ≥ PE
          • PE may be better suited than HT for some domains, e.g. technical
      3. It's not the fault of the post-editing process per se, but of MT
          • PEs should be better if MT is, e.g. interference in PE-NMT < PE-SMT because interference in NMT < SMT
  • 60. Future
      • Effect of PE guidelines, translator's expertise, etc.
      • Measures with deeper linguistic information
      • Automatic discrimination between PE and HT
      • More data (industry?)
    Data and code available: https://bit.ly/2zeKf0b
  • 62. Thanks: L. Bentivogli, S. Castilho, J. Daems, M. Farrell, L. Macken, L. Marg and M. Popović. Go raibh maith agaibh! Ceisteanna? (Thank you! Questions?) Antonio Toral @ atoral
  • 63. References i
      • L. Bowker and J. Buitrago Ciro. Investigating the usefulness of machine translation for newcomers at the public library. Translation and Interpreting Studies, 10(2):165–186, 2015. ISSN 1932-2798. doi: 10.1075/tis.10.2.01bow. URL http://www.jbe-platform.com/content/journals/10.1075/tis.10.2.01bow.
  • 64. References ii
      • O. Czulo and J. Nitzke. Patterns of terminological variation in post-editing and of cognate use in machine translation in contrast to human translation. In Proceedings of the 19th Annual Conference of the European Association for Machine Translation, EAMT 2016, Riga, Latvia, May 30 - June 1, 2016, pages 106–114. European Association for Machine Translation, 2016. URL https://aclanthology.info/papers/W16-3401/w16-3401.
      • J. Daems, O. De Clercq, and L. Macken. Translationese and post-editese: how comparable is comparable quality? Linguistica Antverpiensia, New Series: Themes in Translation Studies, 16:89–103, 2017. ISSN 0304-2294. URL https://lans-tts.uantwerpen.be/index.php/LANS-TTS/article/view/434/409.
  • 65. References iii
      • M. Farrell. Machine Translation Markers in Post-Edited Machine Translation Output. In Proceedings of the 40th Conference Translating and the Computer, pages 50–59, 2018.
      • R. Fiederer and S. O'Brien. Quality and machine translation: A realistic objective. The Journal of Specialised Translation, 11:52–74, 2009.
      • I. Garcia. Is machine translation ready yet? Target. International Journal of Translation Studies, 22(1):7–21, 2010.
      • S. Green, J. Heer, and C. D. Manning. The efficacy of human post-editing for language translation. CHI 2013, pages 439–448, 2013. doi: 10.1145/2470654.2470718. URL http://vis.stanford.edu/papers/post-editing.
  • 66. References iv
      • M. Koponen. Is machine translation post-editing worth the effort? A survey of research into post-editing and effort. Journal of Specialised Translation, 25(25):131–148, 2016. ISSN 0169-2607. URL https://sites.google.com/site/wptp2015/.
      • M. Plitt and F. Masselot. A productivity test of statistical machine translation post-editing in a typical localisation context. The Prague Bulletin of Mathematical Linguistics, 93:7–16, 2010.