European Commission, Directorate-General for Translation
Machine Translation at the European Commission and the Relation to Terminology Work
Andreas Eisele
Language applications, ICT Unit
MT@DGT
MT at the European Commission
Structure of the presentation
• Usage Scenarios for Translation
• Technological Paradigms for MT
  • Statistical Machine Translation (SMT)
  • Rule-based Machine Translation (RBMT)
  • Hybrid MT
• MT@EC: Recent Developments and Perspectives
• Relation to Terminology Work
Usage Scenarios and Requirements for MT
Requirements for MT depend on the way it is used:
a) MT for assimilation (“inbound”): from many languages (L2, L3, …, Ln) into L1
  • Requirements: robustness, coverage
  • Practically unlimited demand, but free web-based services reduce the incentive to improve the technology
b) MT for dissemination (“outbound”): from L1 into many languages (L2, L3, …, Ln)
  • Requirement: textual quality
  • Publishable quality can only be authored by humans; Translation Memories and CAT tools are almost mandatory for professional translators
c) MT for direct communication: between L1 and L2
  • Challenges: speech recognition errors, specific style (chat), context dependence
  • Topic of many running and completed research projects (VerbMobil, TC Star, TransTac, …); the US military uses systems for spoken MT; first applications for smartphones
Statistical Machine Translation: Theory
• Developed by F. Jelinek at IBM (1988-1995), based on the “noisy channel” paradigm (successful for pattern and speech recognition)
  • Channel model: a sentence E is generated with probability $P(E)$ and passed through the channel, which turns it into the observed sentence F with probability $P(F|E)$
• Decoding: given the observation F (the source text), find the most likely cause E* (the translation):
  $E^* = \arg\max_E P(E|F) = \arg\max_E P(E,F) = \arg\max_E P(E) \cdot P(F|E)$
• Three subproblems, each of which has approximate solutions:
  • $P(E)$: (target) language model, approximated by n-gram models: $P(e_1 \dots e_n) = \prod_i P(e_i | e_{i-2} e_{i-1})$
  • $P(F|E)$: translation model, approximated by the transfer of “phrases”: $P(F|E) = \prod_i P(f_i|e_i) \cdot P(d_i)$
  • Search for E*: decoding, i.e. the translation step itself, approximated by heuristic (beam) search
• Models are trained on (parallel) corpora; correspondences (alignments) between languages are estimated via the EM algorithm (GIZA++ by F. J. Och); search/decoding is possible with Moses (Koehn et al.)
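The decoding rule above can be illustrated with a minimal sketch. All probabilities, the German input "das Haus" and the two candidate translations are invented for illustration; the distortion term P(d_i) is ignored, and the candidates are simply enumerated, whereas a real decoder such as Moses searches a very large space heuristically.

```python
import math

# Toy trigram language model P(e_i | e_{i-2} e_{i-1}); all values are invented.
TRIGRAM = {
    ("<s>", "<s>", "the"): 0.5,
    ("<s>", "the", "house"): 0.4,
    ("the", "house", "</s>"): 0.6,
    ("<s>", "<s>", "house"): 0.1,
    ("<s>", "house", "the"): 0.05,
    ("house", "the", "</s>"): 0.05,
}

# Toy phrase translation probabilities P(f_i | e_i), keyed as (f, e); invented.
PHRASE = {("das", "the"): 0.7, ("Haus", "house"): 0.8}

FLOOR = 1e-6  # probability assigned to events missing from the toy tables


def log_lm(words):
    """log P(E) under the trigram model, with sentence boundary padding."""
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    return sum(
        math.log(TRIGRAM.get(tuple(padded[i - 2 : i + 1]), FLOOR))
        for i in range(2, len(padded))
    )


def log_tm(pairs):
    """log P(F|E) for a list of (f_i, e_i) phrase pairs, without distortion."""
    return sum(math.log(PHRASE.get(pair, FLOOR)) for pair in pairs)


def decode(candidates):
    """Return the candidate E that maximises log P(E) + log P(F|E)."""
    return max(candidates, key=lambda c: log_lm(c["e"]) + log_tm(c["pairs"]))


candidates = [
    {"e": ["the", "house"], "pairs": [("das", "the"), ("Haus", "house")]},
    {"e": ["house", "the"], "pairs": [("Haus", "house"), ("das", "the")]},
]
best = decode(candidates)
print(" ".join(best["e"]))
```

Both candidates have the same translation-model score, so the language model alone decides in favour of the fluent word order.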
Basic Architecture for Statistical MT
[Figure: data flow of a statistical MT system]
• Translation Model (Adequacy): Parallel Corpus → Alignment, Phrase Extraction → Phrase Table
• Language Model (Fluency): Monolingual Corpus → Counting, Smoothing → nGram-Model
• Machine translation: Source Text → Decoder (using the Phrase Table and the nGram-Model) → N-best Lists → Target Text
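The "Counting, Smoothing" step that turns a monolingual corpus into an n-gram model can be sketched as follows. This is a simplified illustration, not the method used in production systems: it uses bigrams rather than longer n-grams, add-one (Laplace) smoothing rather than a more refined scheme, and a three-sentence invented corpus.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count histories and bigrams over tokenised sentences with boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    vocab = {word for sentence in sentences for word in sentence} | {"</s>"}
    return histories, bigrams, vocab


def prob(word, history, model):
    """Add-one smoothed estimate of P(word | history)."""
    histories, bigrams, vocab = model
    return (bigrams[(history, word)] + 1) / (histories[history] + len(vocab))


corpus = [["the", "house"], ["the", "garden"], ["a", "house"]]  # invented toy corpus
model = train_bigram_model(corpus)
print(prob("house", "the", model))  # bigram seen in the corpus
print(prob("garden", "a", model))   # unseen bigram still receives some probability
```

Smoothing matters because a decoder multiplies many such probabilities; a single zero for an unseen word sequence would rule out an otherwise good translation.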
Examples of SMT Models
A selection of 23 out of 3897 ways to translate “operations” from EN to FR (phrase table entries; fields are separated by |||, the last field holds the model scores)
operations ||| action ||| (0) ||| (0) ||| 0.00338724 0.0017156 0.00316685 0.0034059 2.718
operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718
operations ||| activité ||| (0) ||| (0) ||| 0.0102038 0.0079204 0.00744879 0.0084917 2.718
operations ||| activités ||| (0) ||| (0) ||| 0.019962 0.0194538 0.0366753 0.0451576 2.718
operations ||| des actions ||| (0,1) ||| (0) (0) ||| 0.0304499 0.0269505 0.00973472 0.00438066 2.718
operations ||| des activités ||| (0,1) ||| (0) (0) ||| 0.00877089 0.00997725 0.00246435 0.00206379 2.718
operations ||| des opérations ||| (0,1) ||| (0) (0) ||| 0.294821 0.281318 0.0406896 0.0238681 2.718
operations ||| exploitation ||| (0) ||| (0) ||| 0.0437821 0.0365346 0.0208856 0.029298 2.718
operations ||| fonctionnement ||| (0) ||| (0) ||| 0.0141471 0.01165 0.00919948 0.0099513 2.718
operations ||| gestion ||| (0) ||| (0) ||| 0.00141338 0.0013098 0.00286578 0.0032669 2.718
operations ||| intervention ||| (0) ||| (0) ||| 0.00561479 0.0026006 0.00110394 0.0013554 2.718
operations ||| interventions ||| (0) ||| (0) ||| 0.0830237 0.0778631 0.0102142 0.0149096 2.718
operations ||| les actions ||| (0,1) ||| (0) (0) ||| 0.0339458 0.0271478 0.00931099 0.00712787 2.718
operations ||| les activités ||| (0,1) ||| (0) (0) ||| 0.00915348 0.0101746 0.00296613 0.00335805 2.718
operations ||| les interventions ||| (0,1) ||| (0) (0) ||| 0.0565693 0.0393793 0.00207406 0.00110872 2.718
operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718
operations ||| manipulations ||| (0) ||| (0) ||| 0.0985325 0.183951 0.00104818 0.0034523 2.718
operations ||| operations ||| (0) ||| (0) ||| 0.786026 0.557952 0.00200716 0.0023981 2.718
operations ||| opération ||| (0) ||| (0) ||| 0.0245776 0.021675 0.00785022 0.0085959 2.718
operations ||| opérationnel ||| (0) ||| (0) ||| 0.00656403 0.0069192 0.0012266 0.0013902 2.718
operations ||| opérations effectuées ||| (0,1) ||| (0) (0) ||| 0.110801 0.285316 0.00132696 0.00229301 2.718
operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718
operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718
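Entries of this kind are easy to inspect programmatically. The sketch below parses a few of the lines shown above and ranks the French candidates. The slide does not label the score columns; reading the third score as the direct probability P(target | source) follows the usual Moses convention and should be treated as an assumption here.

```python
from typing import NamedTuple


class PhraseEntry(NamedTuple):
    source: str
    target: str
    scores: tuple


def parse_line(line):
    """Split one '|||'-separated phrase-table line; the last field holds the scores."""
    fields = [field.strip() for field in line.split("|||")]
    scores = tuple(float(value) for value in fields[-1].split())
    return PhraseEntry(fields[0], fields[1], scores)


lines = [
    "operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718",
    "operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718",
    "operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718",
    "operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718",
]
entries = [parse_line(line) for line in lines]

# Rank by the third score (assumed: direct phrase translation probability).
ranked = sorted(entries, key=lambda entry: entry.scores[2], reverse=True)
for entry in ranked:
    print(f"{entry.target:<16} {entry.scores[2]:.4f}")
```

Under that reading, “opérations” is by far the most probable rendering, while the long tail of alternatives is what gives the decoder room to adapt to context.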
SMT from a Translator’s perspective
• SMT can be seen as a generalisation of Translation Memory to the sub-segmental level
• The phrases are text snippets taken from real-world translations (i.e. as good as what you entered)
• Re-combination of those phrases in new contexts may lead to significant problems:
  • Alignment errors → spurious/lost meaning
  • Ignorance of morphology
  • Grammatical errors
  • Wrong disambiguation
• SMT will not recover implicit information from the source text nor handle structural mismatches
Note: current research prototypes include some linguistics and show significant improvements
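Architectures for Rule-Based (RB) MT
[Figure: the “Vauquois Triangle” (Vauquois, 1976)]
• Analysis side, bottom to top: Text → Syntactic Analysis → Syntactic Structure → Semantic Analysis → Semantic Structure → Interlingua
• Generation side, top to bottom: Interlingua → Semantic Structure → Semantic Generation → Syntactic Structure → Syntactic Generation → Text
• Horizontal paths between the two sides: Direct Translation (text to text), Syntax-based Transfer (between syntactic structures), Semantic Transfer (between semantic structures)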
RBMT at the Commission
The past: ECMT
• Single technological solution (“one-size-fits-all”)
• Developed between 1975 and 1998
• 28 language pairs available (ten languages)
• Suspended since December 2010
The future?
• Hands-on workshop at DGT on Apertium (May 2011)
• Open-source solution, backed by a strong developer community, originally focused on regional languages
• Lexicons for many European languages are being developed
• Could provide building blocks for a hybrid solution…
Strengths and Weaknesses of MT Paradigms
(RBMT: translate pro ↔ SMT: Koehn 2005, examples from EuroParl)
EN: I wish the negotiators continued success with their work in this important area.
RBMT: Ich wünsche, dass die Unterhändler Erfolg mit ihrer Arbeit in diesem wichtigen Bereich fortsetzten.
  (“continued” analysed as a verb instead of an adjective)
SMT: Ich wünsche der Verhandlungsführer fortgesetzte Erfolg bei ihrer Arbeit in diesem wichtigen Bereich.
  (three wrong inflectional endings)
Strengths and Weaknesses of MT Paradigms
(RBMT: translate pro, SMT: Koehn 2005)
EN: We seem sometimes to have lost sight of this fact.
RBMT: Wir scheinen manchmal Anblick dieser Tatsache verloren zu haben.
SMT: Manchmal scheinen wir aus den Augen verloren haben, diese Tatsache.

EN: The leaders of Europe have not formulated a clear vision.
RBMT: Die Leiter von Europa haben keine klare Vision formuliert.
SMT: Die Führung Europas nicht formuliert eine klare Vision.

EN: I would like to close with a procedural motion.
RBMT: Ich möchte mit einer verfahrenstechnischen Bewegung schließen.
SMT: Ich möchte abschließend eine Frage zur Geschäftsordnung ε.
Strengths and Weaknesses of MT Paradigms
Problems with Reliability of Lexicon Acquisition
[November 2007, corrected in the meantime]
See translationparty.com for more hilarious examples
Strengths and Weaknesses of MT Paradigms
                       RBMT   SMT
Syntax, Morphology      ++     --
Structural Semantics    +      --
Lexical Semantics       -      +
Lexical Adaptivity      --     +
Lexical Reliability     +      -
In the early 90s, SMT and RBMT were seen in sharp contrast. But their advantages and disadvantages are complementary.
→ The search for integrated (hybrid) methods is now seen as a natural extension of both approaches
Hybrid MT Architectures (from EuroMatrix/Plus)
Possible ways to combine SMT with RBMT
[Figure: architecture diagrams; the legend distinguishes SMT modules from RBMT modules]
Technological approach for MT@EC
• Started in June 2010 to implement an action plan
• Start with SMT as baseline technology
• Integrate linguistic knowledge as needed
• For morphologically simple or structurally similar language pairs (LPs), the baseline technology may be “good enough”
• For more challenging languages, techniques and tools from the market and from research will be incorporated
• Collaboration with DGT’s language departments (LDs) will be crucial
MT action lines: MT@EC architecture outline
[Figure: architecture diagram with three action lines]
• 1. Data (DATA HUB): DATA MODELLING turns language resources built around Euramis into MT data, i.e. language resources specific for each MT engine
• 2. Engines (ENGINES HUB): MT engines by language, subject…
• 3. Service: Users and Services reach the DISPATCHER, which manages MT requests, through customised interfaces; USER FEEDBACK is part of the loop
MT@EC overall planning
If all goes well…
• 2011: MT engines available to DGT staff to use as a CAT tool (“benchmark” engines, quality enhancement via feedback loop)
• 2012: Beta versions of the MT@EC service for selected test users outside DGT (comparison of engines)
• 2013: Operational MT@EC service for the Commission, other EU institutions, and public administrations
MT@EC Maturity Check
Purpose
• Collect a first round of feedback about the main issues in the current baseline engines
• Identify engines that could already be useful as they are now
• Limit the effort for the translators involved
Approach
• Let translators compare several hundred MT results with reference translations, showing differences in color
• Ask via a web interface whether the editing effort appears acceptable (“useful”) or not (“useless”)
Maturity Check: User Interface
Highlighting of differences between translations:
• Words of both translations are shown in black if the same word and both neighbours appear in the other translation as well.
• Words are shown in blue if the same word appears in the other translation, but at least one of its neighbours differs.
• Words that do not show up in the other translation (omissions, insertions, different lexical choice) are shown in red.
• If common parts of unmatched words are identified, they are displayed in violet.
SRC (3g6558): the date, time and location of the inspection, and
DE REF: das Datum , die Uhrzeit und den Inspektionsort und
DE MT: Datum , Uhrzeit und Ort der Inspektion sowie
Rating options: useful / useless / irrelevant
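The black/blue/red rules can be sketched in a few lines. This is one possible reading of the rules, not the actual implementation of the interface: a word counts as black when some occurrence in the other translation has the same left and right neighbour, sentence boundaries are treated as neighbours, and the violet rule for shared word parts is left out.

```python
def colour_words(words, other):
    """Colour each word of `words` relative to `other`: black, blue or red."""
    boundary = None  # sentinel neighbour at sentence start and end
    padded = [boundary] + list(words) + [boundary]
    other_padded = [boundary] + list(other) + [boundary]
    # All (left neighbour, word, right neighbour) triples of the other translation.
    other_triples = set(zip(other_padded, other_padded[1:], other_padded[2:]))
    other_vocab = set(other)
    colours = []
    for triple in zip(padded, padded[1:], padded[2:]):
        word = triple[1]
        if word not in other_vocab:
            colours.append("red")    # omission, insertion or different lexical choice
        elif triple in other_triples:
            colours.append("black")  # same word with the same two neighbours
        else:
            colours.append("blue")   # same word, but at least one neighbour differs
    return colours


ref = "das Datum , die Uhrzeit und den Inspektionsort und".split()
mt = "Datum , Uhrzeit und Ort der Inspektion sowie".split()
for word, colour in zip(mt, colour_words(mt, ref)):
    print(f"{word:<12} {colour}")
```

For the example segment, the shared words come out blue because the dropped articles change their neighbours, and "Ort" and "Inspektion" come out red; the violet rule would additionally mark their overlap with "Inspektionsort".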
Maturity Check: Summary of Results
61 translators from 21 language departments provided more than 16000 individual judgments
[Figure: bar chart with one bar per language, showing the share of “useful” vs. “useless” judgments on an axis from 0% to 100%]
• Languages shown: ES, FR, IT, PT, RO, DE, DA, NL, SV, BG, CS, PL, SK, SL, EL, MT, LT, LV, ET, FI, HU
• Group labels: Romance languages, Germanic languages, Slavic languages, Hellenic, Semitic, Baltic languages, Finno-Ugric
• Morphology annotations: analytic, inflected, composita, highly inflected languages, strong agglutination
DGT's SMT maturity check outcome as a ( ) sentences ratio + morphology
Maturity Check: Qualitative Error Analysis
Translators were also asked (via a wiki page) to rank the main types of observed errors. The following error types were ranked highest across all languages:
• Words or sub-sentences misplaced
• Word prefixes/infixes/suffixes wrong
• Term usage inconsistent within the text
• Words/stems/vocabulary wrong
• Words missing
• Congruence (agreement) wrong
How MT relates to Terminology Work
• MT should respect existing terminology
  • In case of doubt, “official” terms should be preferred over alternative wordings
  • SMT models can be tuned to respect such preferences
• Training corpora contain inconsistent terminology
  • This causes inconsistencies in MT results unless properly handled
  • Systematic detection of such cases will improve MT quality
• Training SMT from translation memories can identify new terminology as used in practice
  • Frequent terms not in IATE can be identified and manually validated
  • This can speed up the development of IATE for new languages
  → Experiments with the RO LD (Romanian language department) are ongoing
  → First results: 2275 out of 2415 manually checked RO terms were good
  → Precision of 94%
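The idea of proposing frequent terms that are missing from the term base, and the reported precision, can be illustrated with a small sketch. The term lists and the frequency threshold are invented; how terms are actually extracted from the translation memories and compared with IATE is not described on the slide.

```python
from collections import Counter


def term_candidates(tm_terms, known_terms, min_count=2):
    """Terms seen at least `min_count` times in the TM but absent from the term base."""
    counts = Counter(tm_terms)
    return sorted(
        term for term, count in counts.items()
        if count >= min_count and term not in known_terms
    )


def precision(good, checked):
    """Share of manually checked candidates that were judged good."""
    return good / checked


# Invented toy data standing in for terms extracted from a translation memory.
tm_terms = ["acquis", "acquis", "comitology", "comitology", "comitology", "typo"]
known_terms = {"acquis"}  # hypothetical stand-in for terms already in the term base
print(term_candidates(tm_terms, known_terms))

# Figures reported on the slide for the RO experiment.
print(f"{precision(2275, 2415):.1%}")
```

The frequency threshold is what keeps one-off noise out of the candidate list; the manual validation step then determines the precision, which for the reported figures is 2275 / 2415, about 94%.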
Thank you

More Related Content

Similar to machine-translation-for-translators-and-terminology-applications-in-the-eu1.ppt

Erp modeling using petri net(updated)
Erp modeling using petri net(updated)Erp modeling using petri net(updated)
Erp modeling using petri net(updated)akash roy
 
MGMT 7020 Technology and Project Management4. Systems I.docx
MGMT 7020   Technology and Project Management4.  Systems I.docxMGMT 7020   Technology and Project Management4.  Systems I.docx
MGMT 7020 Technology and Project Management4. Systems I.docxbuffydtesurina
 
Pack sol landscape v2
Pack sol landscape v2Pack sol landscape v2
Pack sol landscape v2Don Hall
 
Erp modeling using petri net
Erp modeling using petri netErp modeling using petri net
Erp modeling using petri netakash roy
 
Transfer pricing documentation for tax purposes of IT services for an interna...
Transfer pricing documentation for tax purposes of IT services for an interna...Transfer pricing documentation for tax purposes of IT services for an interna...
Transfer pricing documentation for tax purposes of IT services for an interna...Ole Kristian Gravrok
 
D1.3 public report
D1.3 public reportD1.3 public report
D1.3 public reportxpuig
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingEray Cakici
 
The larger picture – an icao update
The larger picture – an icao updateThe larger picture – an icao update
The larger picture – an icao updatealban_sylaj
 
MSF ITC Strat Plan - Executive summary v11
MSF ITC Strat Plan - Executive summary v11MSF ITC Strat Plan - Executive summary v11
MSF ITC Strat Plan - Executive summary v11Joan Lluis
 
IRJET - Design and Manufacturing of Gear Error Profile Detector
IRJET - Design and Manufacturing of Gear Error Profile DetectorIRJET - Design and Manufacturing of Gear Error Profile Detector
IRJET - Design and Manufacturing of Gear Error Profile DetectorIRJET Journal
 
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinarkantanmt
 
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation TechniquesIRJET Journal
 
Document nsn
Document nsnDocument nsn
Document nsnLm Tài
 
A Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesA Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesJoe Osborn
 
DWS15 - Future networks forum - Virtualisation - Atos -Cedric Carel
DWS15 - Future networks forum - Virtualisation - Atos -Cedric CarelDWS15 - Future networks forum - Virtualisation - Atos -Cedric Carel
DWS15 - Future networks forum - Virtualisation - Atos -Cedric CarelIDATE DigiWorld
 

Similar to machine-translation-for-translators-and-terminology-applications-in-the-eu1.ppt (20)

Erp modeling using petri net(updated)
Erp modeling using petri net(updated)Erp modeling using petri net(updated)
Erp modeling using petri net(updated)
 
MGMT 7020 Technology and Project Management4. Systems I.docx
MGMT 7020   Technology and Project Management4.  Systems I.docxMGMT 7020   Technology and Project Management4.  Systems I.docx
MGMT 7020 Technology and Project Management4. Systems I.docx
 
Pack sol landscape v2
Pack sol landscape v2Pack sol landscape v2
Pack sol landscape v2
 
Erp modeling using petri net
Erp modeling using petri netErp modeling using petri net
Erp modeling using petri net
 
Transfer pricing documentation for tax purposes of IT services for an interna...
Transfer pricing documentation for tax purposes of IT services for an interna...Transfer pricing documentation for tax purposes of IT services for an interna...
Transfer pricing documentation for tax purposes of IT services for an interna...
 
D1.3 public report
D1.3 public reportD1.3 public report
D1.3 public report
 
CIM UNIT 1.ppt
CIM UNIT 1.pptCIM UNIT 1.ppt
CIM UNIT 1.ppt
 
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in SchedulingConstraint Programming - An Alternative Approach to Heuristics in Scheduling
Constraint Programming - An Alternative Approach to Heuristics in Scheduling
 
The larger picture – an icao update
The larger picture – an icao updateThe larger picture – an icao update
The larger picture – an icao update
 
MSF ITC Strat Plan - Executive summary v11
MSF ITC Strat Plan - Executive summary v11MSF ITC Strat Plan - Executive summary v11
MSF ITC Strat Plan - Executive summary v11
 
IRJET - Design and Manufacturing of Gear Error Profile Detector
IRJET - Design and Manufacturing of Gear Error Profile DetectorIRJET - Design and Manufacturing of Gear Error Profile Detector
IRJET - Design and Manufacturing of Gear Error Profile Detector
 
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
 
Robotics for Skoltech
Robotics for SkoltechRobotics for Skoltech
Robotics for Skoltech
 
QVT & MTL In Eclipse
QVT & MTL In EclipseQVT & MTL In Eclipse
QVT & MTL In Eclipse
 
Intento Enterprise MT Hub
Intento Enterprise MT HubIntento Enterprise MT Hub
Intento Enterprise MT Hub
 
IRJET - A Review on Chatbot Design and Implementation Techniques
IRJET -  	  A Review on Chatbot Design and Implementation TechniquesIRJET -  	  A Review on Chatbot Design and Implementation Techniques
IRJET - A Review on Chatbot Design and Implementation Techniques
 
Document nsn
Document nsnDocument nsn
Document nsn
 
A Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation TechniquesA Review On Chatbot Design And Implementation Techniques
A Review On Chatbot Design And Implementation Techniques
 
Eliminate 7 Mudas
Eliminate 7 MudasEliminate 7 Mudas
Eliminate 7 Mudas
 
DWS15 - Future networks forum - Virtualisation - Atos -Cedric Carel
DWS15 - Future networks forum - Virtualisation - Atos -Cedric CarelDWS15 - Future networks forum - Virtualisation - Atos -Cedric Carel
DWS15 - Future networks forum - Virtualisation - Atos -Cedric Carel
 

Recently uploaded

Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Servicediscovermytutordmt
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdfOrient Homes
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,noida100girls
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdfRenandantas16
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth MarketingShawn Pang
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Roomdivyansh0kumar0
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Serviceritikaroy0888
 
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneCall girls in Ahmedabad High profile
 
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...lizamodels9
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 

Recently uploaded (20)

Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Catalogue ONG NUOC PPR DE NHAT .pdf
Catalogue ONG NUOC PPR DE NHAT      .pdfCatalogue ONG NUOC PPR DE NHAT      .pdf
Catalogue ONG NUOC PPR DE NHAT .pdf
 
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
BEST Call Girls In Greater Noida ✨ 9773824855 ✨ Escorts Service In Delhi Ncr,
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
Tech Startup Growth Hacking 101  - Basics on Growth MarketingTech Startup Growth Hacking 101  - Basics on Growth Marketing
Tech Startup Growth Hacking 101 - Basics on Growth Marketing
 
Best Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting PartnershipBest Practices for Implementing an External Recruiting Partnership
Best Practices for Implementing an External Recruiting Partnership
 
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130  Available With RoomVIP Kolkata Call Girl Howrah 👉 8250192130  Available With Room
VIP Kolkata Call Girl Howrah 👉 8250192130 Available With Room
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Mehrauli Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service PuneVIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
VIP Call Girls Pune Kirti 8617697112 Independent Escort Service Pune
 
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
Lowrate Call Girls In Laxmi Nagar Delhi ❤️8860477959 Escorts 100% Genuine Ser...
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 

machine-translation-for-translators-and-terminology-applications-in-the-eu1.ppt

  • 1. Directorate-General for Translation EUROPEAN COMMISSION Machine Translation at the European Commission and the Relation to Terminology Work Andreas Eisele Language applications, ICT Unit
  • 2. - 2 - MT@DGT MT at the European Commission Structure of the presentation • Usage Scenarios for Translation • Technological Paradigms for MT  Statistical Machine Translation (SMT)  Rule-based Machine Translation (RBMT)  Hybrid MT • MT@EC: Recent Developments and Perspectives • Relation to Terminology Work
  • 3. - 3 - MT@DGT Usage Scenarios and Requirements for MT Requirements for MT depend on the way it is used a) MT for assimilation „inbound“ b) MT for dissemination „outbound“ c) MT for direct communication Textual quality MT L2 L3 … Ln L1 MT L2 L3 … Ln L1 MT L1 L2 Robustness Coverage Speech recognition errors, specific style (chat) context dependence Publishable quality can only be authored by humans; Translation Memories & CAT-Tools are almost mandatory for professional translators Practically unlimited demand; but free web-based services reduce incentive to improve technology Topic of many running and completed research projects (VerbMobil, TC Star, TransTac, …) US-Military uses systems for spoken MT, first applications for smartphones
  • 4. - 4 - MT@DGT Statistical Machine Translation: Theory  Developed by F. Jelinek at IBM (1988-1995), based on „distorted channel“ Paradigm (successful for pattern- and speech recognition )  Decoding: Given observation F, find most likely cause E*  Three subproblems P(E): (Target) Language Model P(F|E): Translation Model Search for E*: Decoding, MT  Models are trained with (parallel) corpora, correspondences (alignments) between languages are estimated via EM-Algorithm (GIZA++ by F.J.Och) search/decoding possible via Moses (Koehn e.a.) P(E) P(F|E)  E  F  E* = argmaxE P(E|F) = argmaxE P(E,F) = argmaxE P(E) * P(F|E) each has approximate solutions nGram-Models P(e1…en) = ΠP(ei|ei-2 ei-1) Transfer of „phrases“ P(F|E) = ΠP(fi|ei)*P(di) Heuristic (beam) search source text translation
  • 5. - 5 - MT@DGT Machine translation Language Model (Fluency) Translation Model (Adequacy) Basic Architecture for Statistical MT Monolingual Corpus Phrase Table Parallel Corpus nGram- Model Alignment, Phrase Extraction Counting, Smoothing Decoder Source Text Target Text N-best Lists
  • 6. Examples of SMT Models
    A selection of 23 out of 3897 ways to translate "operations" from EN to FR:
    operations ||| action ||| (0) ||| (0) ||| 0.00338724 0.0017156 0.00316685 0.0034059 2.718
    operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718
    operations ||| activité ||| (0) ||| (0) ||| 0.0102038 0.0079204 0.00744879 0.0084917 2.718
    operations ||| activités ||| (0) ||| (0) ||| 0.019962 0.0194538 0.0366753 0.0451576 2.718
    operations ||| des actions ||| (0,1) ||| (0) (0) ||| 0.0304499 0.0269505 0.00973472 0.00438066 2.718
    operations ||| des activités ||| (0,1) ||| (0) (0) ||| 0.00877089 0.00997725 0.00246435 0.00206379 2.718
    operations ||| des opérations ||| (0,1) ||| (0) (0) ||| 0.294821 0.281318 0.0406896 0.0238681 2.718
    operations ||| exploitation ||| (0) ||| (0) ||| 0.0437821 0.0365346 0.0208856 0.029298 2.718
    operations ||| fonctionnement ||| (0) ||| (0) ||| 0.0141471 0.01165 0.00919948 0.0099513 2.718
    operations ||| gestion ||| (0) ||| (0) ||| 0.00141338 0.0013098 0.00286578 0.0032669 2.718
    operations ||| intervention ||| (0) ||| (0) ||| 0.00561479 0.0026006 0.00110394 0.0013554 2.718
    operations ||| interventions ||| (0) ||| (0) ||| 0.0830237 0.0778631 0.0102142 0.0149096 2.718
    operations ||| les actions ||| (0,1) ||| (0) (0) ||| 0.0339458 0.0271478 0.00931099 0.00712787 2.718
    operations ||| les activités ||| (0,1) ||| (0) (0) ||| 0.00915348 0.0101746 0.00296613 0.00335805 2.718
    operations ||| les interventions ||| (0,1) ||| (0) (0) ||| 0.0565693 0.0393793 0.00207406 0.00110872 2.718
    operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718
    operations ||| manipulations ||| (0) ||| (0) ||| 0.0985325 0.183951 0.00104818 0.0034523 2.718
    operations ||| operations ||| (0) ||| (0) ||| 0.786026 0.557952 0.00200716 0.0023981 2.718
    operations ||| opération ||| (0) ||| (0) ||| 0.0245776 0.021675 0.00785022 0.0085959 2.718
    operations ||| opérationnel ||| (0) ||| (0) ||| 0.00656403 0.0069192 0.0012266 0.0013902 2.718
    operations ||| opérations effectuées ||| (0,1) ||| (0) (0) ||| 0.110801 0.285316 0.00132696 0.00229301 2.718
    operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718
    operations ||| travaux ||| (0) ||| (0) ||| 0.00273044 0.0024213 0.00194025 0.0023517 2.718
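The entries on this slide use a `|||`-separated phrase-table layout: source phrase, target phrase, two alignment fields, then five numeric scores. As a minimal sketch, the Python below parses three of the entries and ranks them. It assumes the usual Moses score order, in which the third score is the direct phrase translation probability P(target | source); the slide itself does not label the scores, so this reading is an assumption. The names `PhraseEntry` and `parse_entry` are hypothetical helpers, not part of any toolkit.

```python
from typing import NamedTuple, Tuple


class PhraseEntry(NamedTuple):
    source: str
    target: str
    scores: Tuple[float, ...]


def parse_entry(line: str) -> PhraseEntry:
    """Split one '|||'-separated phrase-table line into its parts."""
    fields = [field.strip() for field in line.split("|||")]
    source, target = fields[0], fields[1]
    # The last field holds the numeric scores; the alignment fields are ignored here.
    scores = tuple(float(value) for value in fields[-1].split())
    return PhraseEntry(source, target, scores)


# Three entries copied from the slide.
lines = [
    "operations ||| les opérations ||| (0,1) ||| (0) (0) ||| 0.413399 0.281515 0.0564235 0.0388363 2.718",
    "operations ||| actions ||| (0) ||| (0) ||| 0.0575431 0.0534003 0.0731052 0.0958526 2.718",
    "operations ||| opérations ||| (0) ||| (0) ||| 0.636821 0.562135 0.409237 0.522254 2.718",
]

entries = [parse_entry(line) for line in lines]

# Assumption: scores[2] is P(target | source), so a higher value means a
# more likely translation of the source phrase.
ranked = sorted(entries, key=lambda entry: entry.scores[2], reverse=True)

for entry in ranked:
    print(f"{entry.target}: {entry.scores[2]}")
```

Under that assumption, "opérations" ranks first among the three, ahead of "actions" and "les opérations". A real decoder combines all scores with a language model rather than using one column alone.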
  • 7. SMT from a Translator’s perspective
    - SMT can be seen as a generalisation of Translation Memory to sub-segmental level
    - The phrases are text snippets taken from real-world translations (i.e. as good as what you entered)
    - Re-combination of those phrases in new contexts may lead to significant problems:
        • Alignment errors → spurious/lost meaning
        • Ignorance of morphology
        • Grammatical errors
        • Wrong disambiguation
    - SMT will not recover implicit information from source text nor handle structural mismatches
    Side note: current research prototypes include some linguistics and show significant improvements
  • 8. Architectures for Rule-Based (RB) MT
    [Figure: the "Vauquois Triangle" (Vauquois, 1976). Levels: Text, Syntactic Structure, Semantic Structure, Interlingua. Analysis steps: Syntactic Analysis, Semantic Analysis. Generation steps: Semantic Generation, Syntactic Generation. Transfer paths: Direct Translation, Syntax-based Transfer, Semantic Transfer.]
  • 9. RBMT at the Commission
    The past: ECMT
    - Single technological solution (“one-size-fits-all”)
    - Developed between 1975 and 1998
    - 28 language pairs available (ten languages)
    - Suspended since December 2010
    The future?
    - Hands-on workshop at DGT on Apertium (May 2011)
    - Open-source solution, backed by a strong developer community, originally focused on regional languages
    - Lexicons for many European languages being developed
    - Could provide building blocks for hybrid solution…
  • 10. Strengths and Weaknesses of MT Paradigms
    (RBMT: translate pro ↔ SMT: Koehn 2005, examples from EuroParl)
    EN: I wish the negotiators continued success with their work in this important area.
    RBMT: Ich wünsche, dass die Unterhändler Erfolg mit ihrer Arbeit in diesem wichtigen Bereich fortsetzten.
      Error: "continued" is translated as a verb instead of an adjective
    SMT: Ich wünsche der Verhandlungsführer fortgesetzte Erfolg bei ihrer Arbeit in diesem wichtigen Bereich.
      Error: three wrong inflectional endings
  • 11. Strengths and Weaknesses of MT Paradigms
    Example 1
      English: We seem sometimes to have lost sight of this fact.
      RBMT (translate pro): Wir scheinen manchmal Anblick dieser Tatsache verloren zu haben.
      SMT (Koehn 2005): Manchmal scheinen wir aus den Augen verloren haben, diese Tatsache.
    Example 2
      English: The leaders of Europe have not formulated a clear vision.
      RBMT (translate pro): Die Leiter von Europa haben keine klare Vision formuliert.
      SMT (Koehn 2005): Die Führung Europas nicht formuliert eine klare Vision.
    Example 3
      English: I would like to close with a procedural motion.
      RBMT (translate pro): Ich möchte mit einer verfahrenstechnischen Bewegung schließen.
      SMT (Koehn 2005): Ich möchte abschließend eine Frage zur Geschäftsordnung ε.
  • 12. Strengths and Weaknesses of MT Paradigms
    Problems with Reliability of Lexicon Acquisition
    [Figure: example from November 2007, corrected in the meantime]
    See translationparty.com for more hilarious examples
  • 13. Strengths and Weaknesses of MT Paradigms
                            RBMT   SMT
    Syntax, Morphology      ++     --
    Structural Semantics    +      --
    Lexical Semantics       -      +
    Lexical Adaptivity      --     +
    Lexical Reliability     +      -
    In the early 90s, SMT and RBMT were seen in sharp contrast. But their advantages and disadvantages are complementary.
    → The search for integrated (hybrid) methods is now seen as a natural extension of both approaches
  • 14. Hybrid MT Architectures (from EuroMatrix/Plus)
    Possible ways to combine SMT with RBMT
    [Figure: architecture diagrams; the legend distinguishes SMT modules from RBMT modules]
  • 15. Technological approach for MT@EC
    - Started in June 2010 to implement an action plan
    - Start with SMT as baseline technology
    - Integrate linguistic knowledge as needed
    - For morphologically simple/structurally similar LPs, baseline technology may be “good enough”
    - For more challenging languages, techniques and tools from market and research will be incorporated
    - Collaboration with DGT’s LDs will be crucial
  • 16. MT@EC architecture outline
    MT action lines: 1. Data, 2. Engines, 3. Service
    [Figure: architecture diagram with the following labelled components]
    - DISPATCHER: managing MT requests
    - ENGINES HUB: MT engines by language, subject…
    - DATA HUB: language resources built around Euramis
    - MT data: language resources specific for each MT engine
    - DATA MODELLING
    - USER FEEDBACK
    - Customised interfaces
    - Users and Services
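The slide names a dispatcher that manages MT requests and engines organised "by language, subject…". The sketch below illustrates one way such routing could work. It is not the MT@EC implementation: the `Dispatcher` class, its `register` and `translate` methods, the fallback to a general engine, and the stub engines are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Engine = Callable[[str], str]            # an engine maps source text to target text
Key = Tuple[str, str, Optional[str]]     # (source language, target language, subject)


class Dispatcher:
    """Hypothetical router: picks an engine by language pair and subject."""

    def __init__(self) -> None:
        self._engines: Dict[Key, Engine] = {}

    def register(self, src: str, tgt: str, engine: Engine,
                 subject: Optional[str] = None) -> None:
        self._engines[(src, tgt, subject)] = engine

    def translate(self, text: str, src: str, tgt: str,
                  subject: Optional[str] = None) -> str:
        # Prefer a subject-specific engine, then fall back to the general one.
        engine = self._engines.get((src, tgt, subject))
        if engine is None:
            engine = self._engines.get((src, tgt, None))
        if engine is None:
            raise LookupError(f"no engine registered for {src}->{tgt}")
        return engine(text)


# Stub engines that only tag the text, standing in for real MT engines.
dispatcher = Dispatcher()
dispatcher.register("EN", "FR", lambda text: f"[general EN-FR] {text}")
dispatcher.register("EN", "FR", lambda text: f"[legal EN-FR] {text}", subject="legal")

print(dispatcher.translate("operations", "EN", "FR", subject="legal"))
print(dispatcher.translate("operations", "EN", "FR", subject="finance"))
```

The second call has no "finance" engine registered, so it falls back to the general EN-FR engine. Whether an unknown subject should fall back or be rejected is a design choice the slide does not address.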
  • 17. MT@EC overall planning
    If all goes well…
    - 2011: MT engines available to DGT staff to use as a CAT tool (“benchmark” engines, quality enhancement via feedback loop)
    - 2012: Beta versions of the MT@EC service for selected test users outside DGT (comparison of engines)
    - 2013: Operational MT@EC service for the Commission, other EU institutions, and public administrations
  • 18. MT@EC Maturity Check
    Purpose
    - Collect first round of feedback about main issues in the current baseline engines
    - Identify engines that could already be useful as they are now
    - Limit the effort for the translators involved
    Approach
    - Let translators compare several hundred MT results with reference translations, showing differences in color
    - Ask via web interface whether editing effort appears acceptable (“useful”) or not (“useless”)
  • 19. Maturity Check: User Interface
    Highlighting of differences between translations:
    - Words of both translations are shown in black if the same word and both neighbours appear in the other translation as well.
    - Words are shown in blue if the same word appears in the other translation, but at least one of its neighbours differs.
    - Words that do not show up in the other translation (omissions, insertions, different lexical choice) are shown in red.
    - If common parts of unmatched words are identified, they are displayed in violet.
    Example:
      SRC (3g6558): the date, time and location of the inspection, and
      DE REF: das Datum , die Uhrzeit und den Inspektionsort und
      DE MT: Datum , Uhrzeit und Ort der Inspektion sowie
    Response options: useful / useless / irrelevant
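The black/blue/red rules on this slide can be sketched as a small classifier. This is an illustrative reading, not the tool used at DGT. It assumes that "both neighbours appear" means the word occurs in the other translation with the same immediate left and right neighbours, and that sentence boundaries count as neighbours. The violet rule for shared parts of unmatched words is left out. The function names and boundary markers are hypothetical.

```python
from typing import List, Set, Tuple

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def context_triples(tokens: List[str]) -> Set[Tuple[str, str, str]]:
    """Collect every (left neighbour, word, right neighbour) triple."""
    padded = [BOS] + tokens + [EOS]
    return {(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)}


def colour_words(tokens: List[str], other: List[str]) -> List[str]:
    """Assign black, blue or red to each token, judged against `other`."""
    other_words = set(other)
    other_triples = context_triples(other)
    padded = [BOS] + tokens + [EOS]
    colours = []
    for i in range(1, len(padded) - 1):
        triple = (padded[i - 1], padded[i], padded[i + 1])
        if triple in other_triples:
            colours.append("black")   # same word with both neighbours matching
        elif padded[i] in other_words:
            colours.append("blue")    # same word, at least one neighbour differs
        else:
            colours.append("red")     # word absent from the other translation
    return colours


# The example pair from the slide, tokenised on whitespace.
ref = "das Datum , die Uhrzeit und den Inspektionsort und".split()
mt = "Datum , Uhrzeit und Ort der Inspektion sowie".split()

mt_colours = colour_words(mt, ref)
for word, colour in zip(mt, mt_colours):
    print(f"{word}: {colour}")
```

For the slide's example, this reading marks "Datum", ",", "Uhrzeit" and "und" in blue and the remaining four MT words in red. The actual interface would additionally show the shared stem of "Inspektion" and "Inspektionsort" in violet.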
  • 20. Maturity Check: Summary of Results
    61 translators from 21 language departments provided more than 16000 individual judgments
    [Figure: bar chart of "useful" vs. "useless" judgments (0% to 100%) per target language: ES FR IT PT RO DE DA NL SV BG CS PL SK SL EL MT LT LV ET FI HU. Language-group labels: Romance languages, Germanic languages, Slavic languages, Hellenic, Semitic, Baltic lang., Finno-Ugric. Morphology labels: inflected, analytic, highly inflected languages, composita, strong agglutination. Chart title: DGT's SMT maturity check outcome as a ( ) sentences ratio + morphology]
  • 21. Maturity Check: Qualitative Error Analysis
    Translators were also asked (via a wiki page) to rank main types of observed errors. The following error types were ranked highest across all languages:
    - Words or sub-sentences misplaced
    - Word prefixes/infixes/suffixes wrong
    - Terms usage inconsistent within the text
    - Words/stems/vocabulary wrong
    - Words missing
    - Congruence wrong
  • 22. How MT relates to Terminology Work
    - MT should respect existing terminology
        • In case of doubt, “official” terms should be preferred over alternative wordings
        • SMT models can be tuned to respect such preferences
    - Training corpora contain inconsistent terminology
        • Causes inconsistencies in MT results, unless properly handled
        • Systematic detection of such cases will improve MT quality
    - Training SMT from translation memories can identify new terminology as used in practice
        • Frequent terms not in IATE can be identified and manually validated
        • This can speed up the development of IATE for new languages
    - Experiments with RO LD ongoing → first results: 2275 out of 2415 manually checked RO terms were good → precision of 94%
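The last two points can be made concrete with a short sketch. The first function illustrates, under simplifying assumptions, how frequent phrases absent from a term base could be shortlisted for manual validation. The second reproduces the precision figure from the slide. The function names, the `min_count` threshold and the toy data are hypothetical; the real experiment works on SMT phrase tables and IATE, which are not modelled here.

```python
from collections import Counter
from typing import Iterable, List, Set


def candidate_terms(observed: Iterable[str], known_terms: Set[str],
                    min_count: int = 2) -> List[str]:
    """Return frequent phrases that are missing from the term base."""
    counts = Counter(observed)
    return sorted(term for term, count in counts.items()
                  if count >= min_count and term not in known_terms)


def precision(good: int, checked: int) -> float:
    """Share of manually checked candidates that were judged good."""
    return good / checked


# Toy data (hypothetical): phrases seen in translations and a tiny term base.
observed_phrases = ["fonctionnement", "exploitation", "exploitation",
                    "opérations", "opérations", "opérations"]
term_base = {"opérations"}

print(candidate_terms(observed_phrases, term_base))

# Figures from the slide: 2275 good terms out of 2415 manually checked.
ro_precision = precision(2275, 2415)
print(f"{ro_precision:.1%}")
```

With the slide's figures the precision is 2275 / 2415 ≈ 0.942, which the slide rounds to 94%.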