How to Successfully Integrate Machine Translation in Your Company
Diego Bartolome
@diegobartolome
dbc@tauyou.com
and others
70+ clients
18 countries
~700 Million words in 2014
All language pairs
[Figure: sustaining vs. disruptive technology, plotted against the performance demanded in high-end and low-end markets]
Objectives for Machine Translation
Productivity gains
Direct cost reduction
Quality consistency
New uses for Machine Translation
Multilingual customer support
Social Media monitoring
Applications enabled by Big Data
Internet of Everything / Internet of Things
Speech-to-Speech translation
Questions: First Round
What is your experience with MT?
1. Quality Metrics
2. Cost reduction
3. Impact on Delivery Times
4. Feedback from Post-editors
5. Your Feelings
Learning about Machine Translation
https://www.taus.net/think-tank/reports/translate-reports/taus-translation-technology-landscape-report
https://www.taus.net/think-tank/reports/translate-reports/moses-mt-market-report
http://www.lt-innovate.eu/resources/document/lt-20-13
http://www.gala-global.org/onDemand
Machine Translation Types
Google/Bing Translator vs. Moses
Advantages
Big(ger) data
State-of-the-art technology
Learning curve
Disadvantages
Black-box
Confidentiality
Control
Internal vs. external
Core competence
Resources
ROI
Time to market
Costs of Machine Translation
Internal development – people and time
Free tools – Google + Bing
DIY solutions
Traditional pricing model
tauyou managed solution
Revenue from Machine Translation
Translation as a Service
Private Machine Translation Portal
MT of internal communication (flat rate)
….
and many others!
Questions: Round 2
1. Where do you provide value now?
2. Where do you think the value will be?
3. How important is confidentiality?
4. Do you care about control?
5. How much could you invest in MT (time, people, money)?
6. When will your solution be available?
On Language Quality (I)
Source: translate.autodesk.com
On Language Quality (II)
Source: Philipp Koehn
Some Languages Sorted
From EN into
1) FR, ES, PT, IT
2) DE, NL, HE
3) ZH, JA, KO
4) RU, AR, TR, HI
On Domain Quality
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
… Quality Order of your domains ...
Questions: Round 3
1. What is your main motivation?
2. Can you try more than 1 domain?
3. Can you train at least 2 language pairs?
4. Can you pilot several MT vendors?
5. What are your current expectations?
Data acquisition
OPUS corpora
http://opus.lingfil.uu.se/
WMT workshops
e.g. http://www.statmt.org/wmt13/
Multilingual websites
TAUS
Corpora building
Related vs. unrelated materials
Percentage of out-of-domain data
Does monolingual data help?
Corpora extension with linguistic processing
Ad-hoc corpus for file translation
The more, the better?
Data cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data (optimization)
Remark
Data cleaning and selection is a key process
Just more data may harm the quality
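The cleaning steps above could be sketched as follows. This is a minimal illustration, not the speaker's actual pipeline: the function name `clean_segments`, the thresholds `max_ratio` and `max_len`, and the choice of heuristics (empty sides, untranslated segments, length, length ratio, repetitions) are assumptions made for the example.

```python
def clean_segments(pairs, max_ratio=3.0, max_len=100):
    """Filter (source, target) segment pairs with simple heuristics.

    Thresholds are illustrative assumptions and should be tuned
    per language pair and domain.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace on both sides.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # likely untranslated (may also drop legitimate names)
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_len or n_tgt > max_len:
            continue  # overly long segment
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length ratio, possible misalignment
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # repetition
        seen.add(key)
        kept.append((src, tgt))
    return kept
```

Terminology and punctuation checks are language-specific and are left out of this sketch on purpose.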
Training strategies
One single system with all TMs
+ glossaries
+ linguistic processing input/output
+ forbidden words lists
Layered approach
Generic → domain → subdomain → client
Model optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve tokenization, recasing, ...
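As an illustration of "filter the translation tables", the sketch below prunes a Moses-style phrase table. It assumes the usual text format `source ||| target ||| scores ...` and assumes that the third score is the direct phrase translation probability; the function name `filter_phrase_table` and the parameters `min_prob`, `top_n` and `score_index` are hypothetical and not part of Moses.

```python
from collections import defaultdict

def filter_phrase_table(lines, min_prob=0.01, top_n=20, score_index=2):
    """Keep, per source phrase, the top_n entries whose chosen score
    is at least min_prob. Malformed lines are skipped."""
    by_source = defaultdict(list)
    for line in lines:
        fields = [f.strip() for f in line.split("|||")]
        if len(fields) < 3:
            continue  # not a phrase-table entry
        try:
            scores = [float(s) for s in fields[2].split()]
        except ValueError:
            continue  # scores are not numeric
        if len(scores) <= score_index:
            continue
        prob = scores[score_index]
        if prob >= min_prob:
            by_source[fields[0]].append((prob, line.rstrip("\n")))
    kept = []
    for entries in by_source.values():
        # Highest-scoring translation options first.
        best = sorted(entries, key=lambda e: e[0], reverse=True)[:top_n]
        kept.extend(entry for _, entry in best)
    return kept
```

Simple thresholding like this is only one option; the right cut-off depends on the data and should be validated on the tune and test sets.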
Workflow integration
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
Continuous improvement
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
Source content optimization
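Rule-based automatic post-editing can be as simple as an ordered list of search-and-replace rules applied to the MT output. The sketch below is an assumption-laden example: the three rules, the `RULES` list and the `post_edit` function are invented for illustration and suit an English target; real rules would come from recurring post-editor corrections.

```python
import re

# Ordered (pattern, replacement) rules. All three are illustrative.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                 # no space before punctuation
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),   # collapse a doubled word
    (re.compile(r" {2,}"), " "),                           # collapse repeated spaces
]

def post_edit(text, rules=RULES):
    """Apply each rule in order to the MT output and trim the result."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text.strip()
```

Rules like these are language-dependent: French, for example, keeps a space before some punctuation marks, and a doubled word is occasionally correct.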
Linguistic processing notes
In the source and/or target language
Grammar checking
Entity detection
Proper nouns, alphanumeric words, ...
Compound word splitting
Sentence reordering
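One common use of entity detection is to protect alphanumeric words (part numbers, codes) before translation and restore them afterwards. The sketch below assumes the MT engine passes placeholders through unchanged; the placeholder format `__ENT0__` and the functions `mask_entities` and `unmask_entities` are hypothetical.

```python
import re

# A word that contains at least one digit and at least one letter.
ALNUM = re.compile(r"\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b")

def mask_entities(text):
    """Replace alphanumeric words with numbered placeholders.

    Returns the masked text and the list of original entities.
    """
    found = []

    def replace(match):
        found.append(match.group(0))
        return f"__ENT{len(found) - 1}__"

    return ALNUM.sub(replace, text), found

def unmask_entities(text, found):
    """Put the original entities back in place of the placeholders."""
    for index, entity in enumerate(found):
        text = text.replace(f"__ENT{index}__", entity)
    return text
```

Proper nouns need a different detector (for example a glossary lookup), which this sketch does not cover.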
Questions: Round 4
What is your preferred option?
How much can you invest in improvements?
The Post-editor profile
Do skills needed differ from translation?
Post-editing guidelines (TAUS)
Full vs. light post-editing
http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines
Compensation
Questions: Round 5
Do you have the right resources to start?
Quality Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
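Word Error Rate is the word-level edit distance between a reference and the MT output, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization (the function names `edit_distance` and `wer` are chosen for this example):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, 1):
        cur = [i]
        for j, hyp_tok in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                        # deletion
                cur[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_tok != hyp_tok)  # substitution or match
            ))
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Using the post-edited segment as the reference and the raw MT output as the hypothesis gives a rough measure of post-editing effort.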
Questions: Round 6
Are you able to measure?
Once upon an industry ...
"Change before you have to."
Jack Welch

 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

Machine Translation Master Class at the EUATC Conference by Diego Bartolome

  • 1. How to Successfully Integrate Machine Translation in your Company Diego Bartolome @diegobartolome dbc@tauyou.com
  • 3. 70+ clients 18 countries ~700 Million words in 2014 All language pairs
  • 6. performance demanded in high end markets performance demanded in low end markets sustaining technology disruptive technology
  • 7. Objectives for Machine Translation Productivity gains Direct cost reduction Quality consistency
  • 8. New uses for Machine Translation Multilingual customer support Social Media monitoring Applications enabled by Big Data Internet of Everything /Internet of Things Speech-to-Speech translation
  • 9. Questions: First Round What is your experience with MT? 1. Quality Metrics 2. Cost reduction 3. Impact on Delivery Times 4. Feedback from Post-editors 5. Your Feelings
  • 10. Learning about Machine Translation https://www.taus.net/think-tank/reports/translate-reports/taus-translation-technology-landscape-report https://www.taus.net/think-tank/reports/translate-reports/moses-mt-market-report http://www.lt-innovate.eu/resources/document/lt-20-13 http://www.gala-global.org/onDemand
  • 12. Google/Bing Translator vs. Moses Advantages Big(ger) data State-of-the-art technology Learning curve Disadvantages Black-box Confidentiality Control
  • 13. Internal vs. external Core competence Resources ROI Time to market
  • 14. Costs of Machine Translation Internal development – people and time Free tools – Google + Bing DOiY solutions Traditional pricing model tauyou managed solution
  • 15. Revenue from Machine Translation Translation as a Service Private Machine Translation Portal MT of internal communication (flat rate) …. and many others!
  • 16. Questions: Round 2 1. Where do you provide value now? 2. Where do you think the value will be? 3. How important is confidentiality? 4. Do you care about control? 5. How much could you invest in MT? (time, people, money) 6. When will your solution be available?
  • 17. On Language Quality (I) Source: translate.autodesk.com
  • 18. On Language Quality (II) Source: Philipp Koehn
  • 19. Some Languages Sorted From EN into 1) FR, ES, PT, IT 2) DE, NL, HE 3) ZH, JA, KO 4) RU, AR, TR, HI
  • 20. On Domain Quality Who is willing to pay? Where does your revenue come from? What are your key skills? What domains achieve good quality? … Quality Order of your domains ...
  • 21. Questions: Round 3 1. What is your main motivation? 2. Can you try more than 1 domain? 3. Can you train at least 2 language pairs? 4. Can you pilot several MT vendors? 5. What are your current expectations?
  • 22. Data acquisition OPUS corpora http://opus.lingfil.uu.se/ WMT workshops e.g. http://www.statmt.org/wmt13/ Multilingual websites TAUS
  • 23. Corpora building Related vs. unrelated materials Percentage of out-of-domain Does mono-lingual data help? Corpora extension with linguistic processing Ad-hoc corpus for file translation The more, the better?
  • 24. Data cleaning Clean translation memories Length, punctuation, terminology, … Inconsistencies, repetitions, ... Segment splitting Optimize weight of most frequent n-grams Validate their translations Add out-of-domain data (optimization)
  • 25. Remark Data cleaning and selection is a key process Just more data may harm the quality
  • 26. Training strategies One single system with all TMs + glossaries + linguistic processing input/output + forbidden words lists Layered approach Generic → domain → subdomain → client
  • 27. Models optimization Filter the translation tables Remove the garbage + tune weights Optimize language models Adapt them to the translation purpose Tune parameters correctly Tune set, test set, optimization parameters Improve tokenization, recasing, ...
  • 28. Workflow integration Use MT as a secondary TM Bilingual pre-translated translation files CAT tool integration Differentiated workflow
  • 29. Continuous improvement Qualitative Use updated TMs in new trainings Immediate (incremental) retraining Rule-based automatic post-editing Selective pre- and/or post-processing Source content optimization
  • 30. Linguistic processing notes In the source and/or target language Grammar checking Entities detection Proper nouns, alphanumeric words, ... Compound words splitting Sentence reordering
  • 31. Questions: Round 4 What is your preferred option? How much can you invest in improvements?
  • 32. The Post-editor profile Do skills needed differ from translation? Post-editing guidelines (TAUS) Full vs. light post-editing http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines Compensation
  • 33. Questions: Round 5 Do you have the right resources to start?
  • 34. Quality Metrics SMT metrics: BLEU, NIST Feedback from translators Translation time vs. Post-editing time Word Error Rate (WER) or Edit Distance Cost reduction
  • 35. Questions: Round 6 Are you able to measure?
  • 37. Once upon an industry ...
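
The Quality Metrics slide lists Word Error Rate (WER) or edit distance as one way to measure MT output. A minimal sketch of how that could be computed is shown here, assuming whitespace tokenization and taking the post-edited segment as the reference. The function names `edit_distance` and `word_error_rate` are illustrative and do not come from the slides or from any particular toolkit.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # drop a reference token
                curr[j - 1] + 1,     # add a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    post_edited = "the blue house is big"  # what the post-editor delivered
    mt_output = "the house blue is big"    # raw MT output
    print(word_error_rate(post_edited, mt_output))
```

Averaged over a pilot project, a score like this gives a rough indication of how much the post-editors had to change. It is best read alongside the other items on the same slide, such as translator feedback and translation time vs. post-editing time.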