Slides for a talk by Michael Carl, associate professor at Copenhagen Business School, on recent developments in machine translation, in particular the CasMaCat system, which uses interactive methods of working with the user.
From CasMaCat to SEECAT: Patterns of Interaction in Advanced Computer-Assisted Translation
1. From CasMaCat to SEECAT
Patterns of Interaction in
Advanced Computer Assisted Translation
Michael Carl
CRITT, Copenhagen Business School
Moscow, April, 2014
2. Overview
Post-editing Patterns in CasMaCat
Prototype-I: From scratch translation vs. PE
Prototype-II: IMT and advanced PE
Activity Patterns in Post-editing
SEECAT summer project 2013
Extend CasMaCat prototype with speech and gaze input
6. Experiment 1: Prototype-I
Time saving: PEMT vs. translation from scratch
Domain: newspaper article
Languages: EN → ES
1) Target Segments empty: from-scratch translation
2) Target Segments filled with pre-translated MT output
Moses, trained on news texts
Average time saving of 25% for PEMT
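The 25% figure is a relative time saving. A minimal sketch of that arithmetic follows, assuming the saving is measured against from-scratch translation time; the function name and the timings are hypothetical and chosen only to reproduce a 25% saving, they are not data from the experiment.

```python
def time_saving(from_scratch_seconds: float, post_editing_seconds: float) -> float:
    """Relative time saved by post-editing, as a fraction of from-scratch time."""
    if from_scratch_seconds <= 0:
        raise ValueError("from_scratch_seconds must be positive")
    return (from_scratch_seconds - post_editing_seconds) / from_scratch_seconds


# Hypothetical timings for one text (not measurements from the study).
saving = time_saving(from_scratch_seconds=1200.0, post_editing_seconds=900.0)
print(f"Time saving for PEMT: {saving:.0%}")
```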
7. Productivity per Participant
Translating (grey), post-editing (black), in words per hour
(Elming, Winther-Balling & Carl, 2014)
10. Translation production (1)
(Winther-Balling & Carl, 2014)
Translation task: from-scratch translation
(almost) always takes longer than post-editing.
Inefficiency: the more keystrokes are
produced the longer it takes to produce the
translation.
Alternating processing: shifting attention
frequently between different areas (TT, ST,
keyboard) is time-consuming.
11. Translation production (2)
(Winther-Balling & Carl, 2014)
Average word frequency: lower word
frequency results in slower production time;
this tendency is more pronounced for student
translators.
Number of different possible translations:
high translation ambiguity has a slow-down
effect only in post-editing.
Alignment crossing: crossing distance has
significant effects only for post-editing
German and Spanish.
13. Usage of advanced IMT
Nine Post-editors
Three datasets (3,000 words each)
Three different CASMACAT configurations:
1) Traditional post-editing (without IMT)
2) Post-editing using IMT
3) Post-editing using advanced IMT
(featuring: word/cursor alignment & prediction length
control).
26. Speaking your translation
More than 4 times quicker (Brown et al., 1994)
Up to 6 times (Dragsted et al., 2011)
Using Dragon speech
44% faster if ASR error rate < 4% (Desilets et al., 2008)
Based on estimation
None of the studies used a GUI or gaze data
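The slide does not say how the ASR error rate is defined. A common measure is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes that definition; the function name and the example sentences are illustrative and not taken from the cited studies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j]: edit distance between the reference prefix processed so far
    # and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Illustrative sentences: one substituted word out of seven.
wer = word_error_rate("the hotel is near the old town",
                      "the hotel is near the all town")
print(f"WER: {wer:.1%}  (below the 4% threshold: {wer < 0.04})")
```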
27. SEECAT: Speech & Eye-Tracking Enabled CAT
Use speech input as a post-editing tool in order to
enhance efficiency for language translators.
Use eyetracker to synchronize reading and
speaking with the MT output, for positioning of input
cursor.
Demonstrate increase in translation throughput
using speech input for post-editing over a system
without speech input.
30. SEECAT: Workbench
SPANISH typing + speech
SPANISH typing + speech - 100% accurate
HINDI speech - with inaccuracies
GIVE IT A TRY:
http://bridge.cbs.dk/prototype2/seecat_speech/
ASR: English, Hindi, Spanish
31. PRE-PILOT EXPERIMENTS (I)
Subjects: 2 participants
Text type: tourism domain (6 texts - 10 segments).
Language pair: English to Spanish
Dependent variable: TIME (productivity gain)
Tasks:
i. Translation from scratch through typing (only keyboard)
ii. Translation from scratch through ASR (only speech)
iii. Post-editing through typing (only keyboard)
iv. Post-editing through ASR (only speech)
v. Translation from scratch through typing + ASR
vi. Post-editing through typing + ASR
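The six tasks can be read as a crossing of two task types with three input modes. The short sketch below enumerates that crossing; the variable names are hypothetical, and the output order differs from the order on the slide.

```python
from itertools import product

# Two task types crossed with three input modes gives six conditions.
task_types = ["Translation from scratch", "Post-editing"]
input_modes = ["typing", "ASR", "typing + ASR"]

conditions = [f"{task} through {mode}"
              for mode, task in product(input_modes, task_types)]

for condition in conditions:
    print(condition)
```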
35. Conclusions
Translation Process Research is an active field of
research
Multi-modal input can help to improve productivity
both in translation and post-editing.
Further experimentation is needed to:
Understand cognitive processes
Provide better support for translators
Maximize productivity, and quality, with less effort
37. References
Jesus Gonzalez-Rubio, Daniel Ortiz, Jose Miguel Bened, Francisco Casacuberta.
Interactive Machine Translation using Hierarchical Translation Models. Proceedings
of the Conference on Empirical Methods in Natural Language Processing
(EMNLP13). October 18-21, 2013 Seattle, USA.
Vicent Alabau, Ragnar Bonk, Christian Buck, Michael Carl, Francisco Casacuberta,
Mercedes Garcia-Martinez, Jesus Gonzalez, Philipp Koehn, Luis Leiva, Bartolome
Mesa-Lao, Daniel Ortiz, Herve Saint-Amand, German Sanchis, Chara Tsoukala:
"CASMACAT: An Open Source Workbench for Advanced Computer Aided
Translation", The Prague Bulletin of Mathematical Linguistics, Number 100, October
2013, pages 101-112.
Elming, Jakob, Michael Carl, and Laura Winther Balling. "Investigating User
Behaviour in Post-editing and Translation Using the CASMACAT Workbench." In
Expertise in Post-editing: Processes, Technology and Applications, edited by
Sharon O’Brien, Michael Simard, Lucia Specia, Michael Carl and Laura Winther
Balling. Cambridge Scholars Publishing.