SlideShare a Scribd company logo
1 of 15
Download to read offline
Active Curation of Bi-Text Resources in
Commercial Localization Workflows
Dave Lewis TCD, Andrzej Zydroń
XTM International
•  Open Data on the Web: W3C Semantic Web
standards allow data to be published on Web
– Fine-grained URI-based inter-linking
– Extensible meta-data
– Standard Query APIs
•  Enables a Localization Web
– Terms and translations become linkable resources
– Meta-data from L10n workflows adds value
– Leverage in training Machine Translation and
Automatic Term Extraction
The Localization Web
The Localization Web = Decentralised Annotated Global
Translation Memory and Term Base
Web of Multilingual Content
Domain Terminology
•  Rich word
and
phrase
resources
to assist
translators
Babelfy: Public Lexical Resources
•  Translation
suggestions
can be fed
into MT for
more reliable
translation
Links to BabelNet offer suggestions
for Definitions and Translations
Babelfy & Babelnet offer more term
suggestions
•  Public resources
may not always
yield the right
definitions or
translations for the
context
•  Need to track
human validation/
rejection to train
automatic term
extraction
Active Curation of Linked Language
Resources
The company has also reduced its production
capacity by ceasing manufacture of chest
freezers and freestanding microwave ovens
Extraction &
Segmentation
production capacity
capacité de production
✔
✔ Annotation with
Existing Terms
chest freezer
microwave oven
réfrigérateur
four à micro-onde
?
?
?
?
Auto suggestion from
Babelfy/Babelnet
D'autre part, la société a réduit sa capacité de
production en arrêtant la production de
réfrigérateur et de fours micro-onde pose-libre
Machine Translate
with Term Translations
MT Vendor?
D'autre part, la société a réduit sa capacité de
production en arrêtant la production de
congélateurs coffres et do fours micro-ondes
pose-libre
✗
congélateurs coffres
fours micro-ondes
✔
Postedit and capture
terms in context
✔✔
✔
✔
✔
✔
PE
PE
PE
PE
PE
PE
PE
✗
PE
✔
•  CSV of the Web: tables and JSON meta-
data
•  JSON-Linked Data
•  Provenance Vocabulary
•  Data Catalogue
•  Open Annotation
•  ITS2.0 Vocabulary
•  Also:
– Provenance Plan
– Open Data Rights Language
Linked Data Based on W3C
Standards
Language
Resource
s
Language
Workers
Language
Technology
Language Lifecycle Dependencies
Parallel
Text &
Term base
Posteditors
Machine
Translation
Active Curation: Dynamic MT Retraining
•  Tighten curation cycle: from projects to
segments
– Prioritise postedits for retraining
•  Prioritise Term
Identification by
posteditors
•  Assemble MT-ready,
lexically-rich term bases
•  TermWeb/XTM/DCU
•  Introducing Next Gen Machine Translation
•  Massive scale bilingual dictionaries
•  BabelNet
•  Automatic Term Extraction: forced
decoding
•  Dynamic retraining
•  Optimal segment translation route
•  L3Data curation, sharing
Next Generation Machine Translation
Data Management Lifecycles
Publish
Correct &
refine
Lex-
concept
lifecycleCorrect &
refine
Discover &
use
Discover &
use
Correct &
refine
Bitext lifecycle
Discover
data
(Re)train-
MT
Revise and
annotate
Publish
Content
lifecycle
Publish
I18n &
source QA
Trans
QA
Post-
edit
Automated
translation
Consume Create
•  Better in-context postediting:
– XTM-Easyling
•  Feeding term suggestions from posteditor to Terminology Management
– XTM-Interverbum
•  Dynamic Retraining
– XTM-DCU
•  Bilingual Dictionary SMT improvements
– XTM-DCU
•  NER, terminology enforcements, forced decoding
– XTM-Interverbum-DCU
•  Postediting prioritisation and term flagging
– TCD-DCU-XTM
•  Publishing interlinks of parallel text, lexically rich term bases
– TCD: DG-T TM, EurVoc, Snomed-CT, LEMON, BabelNet
•  Closing the loop – operational instrumentation of postediting
– XTM
Systems Integration

More Related Content

Similar to Active Curation of Bi-Text Resources in Commercial Localization Workflows

EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...
EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...
EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...European Data Forum
 
Sp2010 high availlability
Sp2010 high availlabilitySp2010 high availlability
Sp2010 high availlabilitySamuel Zürcher
 
J1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan KumarJ1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan KumarMS Cloud Summit
 
Modernize Your Infrastructure and Apps with Microsoft Azure
Modernize Your Infrastructure and Apps with Microsoft AzureModernize Your Infrastructure and Apps with Microsoft Azure
Modernize Your Infrastructure and Apps with Microsoft AzureWinWire Technologies Inc
 
A Tight Ship: How Containers and SDS Optimize the Enterprise
 A Tight Ship: How Containers and SDS Optimize the Enterprise A Tight Ship: How Containers and SDS Optimize the Enterprise
A Tight Ship: How Containers and SDS Optimize the EnterpriseEric Kavanagh
 
The Standards Mosaic Opening the Way to New Technologies
The Standards Mosaic Opening the Way to New TechnologiesThe Standards Mosaic Opening the Way to New Technologies
The Standards Mosaic Opening the Way to New TechnologiesDave Lewis
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Gautier Poupeau
 
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint Architect
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint ArchitectSharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint Architect
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint ArchitectNoorez Khamis
 
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...Introduction On Integrating Translation Applications, Wcm To Achieve A Common...
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...Dhiraj Makam
 
How companies-use-no sql-and-couchbase-10152013
How companies-use-no sql-and-couchbase-10152013How companies-use-no sql-and-couchbase-10152013
How companies-use-no sql-and-couchbase-10152013Dipti Borkar
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Sascha Wenninger
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreAndy Powell
 
Real-world software design practices when developing ASP.NET web systems by B...
Real-world software design practices when developing ASP.NET web systems by B...Real-world software design practices when developing ASP.NET web systems by B...
Real-world software design practices when developing ASP.NET web systems by B...Bojan Veljanovski
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseDataWorks Summit
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 

Similar to Active Curation of Bi-Text Resources in Commercial Localization Workflows (20)

EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...
EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...
EDF2013: Selected talk by David Lewis: Linked Data Reuse in the Language Serv...
 
Sp2010 high availlability
Sp2010 high availlabilitySp2010 high availlability
Sp2010 high availlability
 
J1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan KumarJ1 - Keynote Data Platform - Rohan Kumar
J1 - Keynote Data Platform - Rohan Kumar
 
Modernize Your Infrastructure and Apps with Microsoft Azure
Modernize Your Infrastructure and Apps with Microsoft AzureModernize Your Infrastructure and Apps with Microsoft Azure
Modernize Your Infrastructure and Apps with Microsoft Azure
 
A Tight Ship: How Containers and SDS Optimize the Enterprise
 A Tight Ship: How Containers and SDS Optimize the Enterprise A Tight Ship: How Containers and SDS Optimize the Enterprise
A Tight Ship: How Containers and SDS Optimize the Enterprise
 
The Standards Mosaic Opening the Way to New Technologies
The Standards Mosaic Opening the Way to New TechnologiesThe Standards Mosaic Opening the Way to New Technologies
The Standards Mosaic Opening the Way to New Technologies
 
Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...Why I don't use Semantic Web technologies anymore, event if they still influe...
Why I don't use Semantic Web technologies anymore, event if they still influe...
 
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint Architect
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint ArchitectSharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint Architect
SharePoint Saturday Toronto 2015 - Inside the mind of a SharePoint Architect
 
song.ppt
song.pptsong.ppt
song.ppt
 
song.ppt
song.pptsong.ppt
song.ppt
 
song.ppt
song.pptsong.ppt
song.ppt
 
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...Introduction On Integrating Translation Applications, Wcm To Achieve A Common...
Introduction On Integrating Translation Applications, Wcm To Achieve A Common...
 
How companies-use-no sql-and-couchbase-10152013
How companies-use-no sql-and-couchbase-10152013How companies-use-no sql-and-couchbase-10152013
How companies-use-no sql-and-couchbase-10152013
 
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
Navigating SAP’s Integration Options (Mastering SAP Technologies 2013)
 
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin CoreOpen for Business - Open Archives, OpenURL, RSS and the Dublin Core
Open for Business - Open Archives, OpenURL, RSS and the Dublin Core
 
Real-world software design practices when developing ASP.NET web systems by B...
Real-world software design practices when developing ASP.NET web systems by B...Real-world software design practices when developing ASP.NET web systems by B...
Real-world software design practices when developing ASP.NET web systems by B...
 
MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013MongoDB Partner Program Update - November 2013
MongoDB Partner Program Update - November 2013
 
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
NISO/DCMI September 25 Webinar: Implementing Linked Data in Developing Countr...
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data WarehouseInnovation in the Enterprise Rent-A-Car Data Warehouse
Innovation in the Enterprise Rent-A-Car Data Warehouse
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 

Recently uploaded

定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 

Recently uploaded (20)

定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

Active Curation of Bi-Text Resources in Commercial Localization Workflows

  • 1. Active Curation of Bi-Text Resources in Commercial Localization Workflows Dave Lewis TCD, Andrzej Zydroń XTM International
  • 2. •  Open Data on the Web: W3C Semantic Web standards allow data to be published on Web – Fine-grained URI-based inter-linking – Extensible meta-data – Standard Query APIs •  Enables a Localization Web – Terms and translations become linkable resources – Meta-data from L10n workflows adds value – Leverage in training Machine Translation and Automatic Term Extraction The Localization Web The Localization Web = Decentralised Annotated Global Translation Memory and Term Base
  • 5. •  Rich word and phrase resources to assist translators Babelfy: Public Lexical Resources
  • 6. •  Translation suggestions can be fed into MT for more reliable translation Links to BabelNet offer suggestions for Definitions and Translations
  • 7. Babelfy & Babelnet offer more term suggestions
  • 8. •  Public resources may not always yield the right definitions or translations for the context •  Need to track human validation/ rejection to train automatic term extraction
  • 9. Active Curation of Linked Language Resources The company has also reduced its production capacity by ceasing manufacture of chest freezers and freestanding microwave ovens Extraction & Segmentation production capacity capacité de production ✔ ✔ Annotation with Existing Terms chest freezer microwave oven réfrigérateur four à micro-onde ? ? ? ? Auto suggestion from Babelfy/Babelnet D'autre part, la société a réduit sa capacité de production en arrêtant la production de réfrigérateur et de fours micro-onde pose-libre Machine Translate with Term Translations MT Vendor? D'autre part, la société a réduit sa capacité de production en arrêtant la production de congélateurs coffres et do fours micro-ondes pose-libre ✗ congélateurs coffres fours micro-ondes ✔ Postedit and capture terms in context ✔✔ ✔ ✔ ✔ ✔ PE PE PE PE PE PE PE ✗ PE ✔
  • 10. •  CSV of the Web: tables and JSON meta- data •  JSON-Linked Data •  Provenance Vocabulary •  Data Catalogue •  Open Annotation •  ITS2.0 Vocabulary •  Also: – Provenance Plan – Open Data Rights Language Linked Data Based on W3C Standards
  • 12. Parallel Text & Term base Posteditors Machine Translation Active Curation: Dynamic MT Retraining •  Tighten curation cycle: from projects to segments – Prioritise postedits for retraining •  Prioritise Term Identification by posteditors •  Assemble MT-ready, lexically-rich term bases
  • 13. •  TermWeb/XTM/DCU •  Introducing Next Gen Machine Translation •  Massive scale bilingual dictionaries •  BabelNet •  Automatic Term Extraction: forced decoding •  Dynamic retraining •  Optimal segment translation route •  L3Data curation, sharing Next Generation Machine Translation
  • 14. Data Management Lifecycles Publish Correct & refine Lex- concept lifecycleCorrect & refine Discover & use Discover & use Correct & refine Bitext lifecycle Discover data (Re)train- MT Revise and annotate Publish Content lifecycle Publish I18n & source QA Trans QA Post- edit Automated translation Consume Create
  • 15. •  Better in-context postediting: – XTM-Easyling •  Feeding term suggestions from posteditor to Terminology Management – XTM-Interverbum •  Dynamic Retraining – XTM-DCU •  Bilingual Dictionary SMT improvements – XTM-DCU •  NER, terminology enforcements, forced decoding – XTM-Interverbum-DCU •  Postediting prioritisation and term flagging – TCD-DCU-XTM •  Publishing interlinks of parallel text, lexically rich term bases – TCD: DG-T TM, EurVoc, Snomed-CT, LEMON, BabelNet •  Closing the loop – operational instrumentation of postediting – XTM Systems Integration