tauyou MT platform: the basics
Diego Bartolome
@diegobartolome
dbc@tauyou.com
[Figure: sustaining technology vs. disruptive technology; performance demanded in high-end markets and in low-end markets]
Objectives for Machine Translation
Productivity gains
Direct cost reduction
Quality consistency
New uses for Machine Translation
Multilingual customer support
Social Media monitoring
Applications enabled by Big Data
Internet of Everything / Internet of Things
Speech-to-Speech translation
Questions
What is your experience with MT?
1. Quality Metrics
2. Cost reduction
3. Impact on Delivery Times
4. Feedback on quality
5. Your Feelings
Machine Translation Types
Google/Bing Translator vs. tauyou
Advantages
Big(ger) data
State-of-the-art technology
Learning curve
Disadvantages
Black-box
Confidentiality
Control
Costs of Machine Translation
Internal development – people and time
Free tools – Google + Bing
DIY solutions
Traditional pricing model
tauyou managed solution
Revenue from Machine Translation
Translation as a Service
Private Machine Translation Portal
MT of internal communication (flat rate)
….
and many others!
Questions
1. Where do you provide value now?
2. Where do you think the value will be?
3. How important is confidentiality?
4. Do you care about control?
5. How much could you invest in MT?
(time, people, money)
6. When will your solution be available?
On Language Quality
Some Languages Sorted
From EN into
1) FR, ES, PT, IT
2) DE, NL, HE, DA, NO, SV
3) ZH, JA, RU
4) KO, AR, TR, HI
On Domain Quality
Who is willing to pay?
Where does your revenue come from?
What are your key skills?
What domains achieve good quality?
… Quality Order of your domains ...
Questions
1. What is your main motivation?
2. Can you try more than 1 domain?
3. Can you train at least 2 language pairs?
4. Can you pilot several MT vendors?
5. What are your expectations?
Data acquisition
OPUS corpora
http://opus.lingfil.uu.se/
WMT workshops
e.g. http://www.statmt.org/wmt16/
Multilingual websites
TAUS
Corpora building
Related vs. unrelated materials
Percentage of out-of-domain
Does monolingual data help?
Corpora extension with linguistic processing
Ad-hoc corpus for file translation
The more, the better?
Data cleaning
Clean translation memories
Length, punctuation, terminology, …
Inconsistencies, repetitions, ...
Segment splitting
Optimize weight of most frequent n-grams
Validate their translations
Add out-of-domain data (optimization)
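Some of the cleaning checks above can be sketched in a few lines of Python. This is a minimal illustration, assuming the translation memory is already loaded as (source, target) string pairs; the function name `clean_tm`, the length-ratio limit of 3.0, and the end-punctuation check are assumptions made for the example, not the tauyou pipeline. Whitespace tokenization is also an assumption and would not suit languages such as ZH or JA.

```python
import re

END_PUNCT = ".!?:;"

def clean_tm(pairs, max_ratio=3.0):
    """Return TM segment pairs that pass simple sanity checks.

    pairs: iterable of (source, target) strings.
    max_ratio: assumed upper bound on the word-count ratio between sides.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace on both sides.
        src = re.sub(r"\s+", " ", src).strip()
        tgt = re.sub(r"\s+", " ", tgt).strip()
        if not src or not tgt:
            continue  # one side is empty
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # suspicious length mismatch
        # Both sides should agree on whether they end with punctuation.
        if (src[-1] in END_PUNCT) != (tgt[-1] in END_PUNCT):
            continue
        if (src, tgt) in seen:
            continue  # exact repetition
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Terminology and inconsistency checks (same source, different targets) would need a glossary and a grouping step, which are left out here.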
Remark
Data cleaning and selection are key processes
Simply adding more data may harm quality
Training strategies
One single system with all TMs
+ glossaries
+ linguistic processing input/output
+ forbidden words lists
Layered approach
Generic → domain → subdomain → client
Models optimization
Filter the translation tables
Remove the garbage + tune weights
Optimize language models
Adapt them to the translation purpose
Tune parameters correctly
Tune set, test set, optimization parameters
Improve tokenization, recasing, ...
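Filtering a translation table can be illustrated with a small Python sketch. It assumes, purely for illustration, that the table is held in memory as a dictionary from a source phrase to a list of (target phrase, probability) candidates; the function name `filter_table` and the thresholds `min_prob` and `top_k` are hypothetical, and a real toolkit would store several scores per entry.

```python
def filter_table(table, min_prob=0.01, top_k=20):
    """Drop unlikely candidates and keep at most top_k per source phrase.

    table: dict mapping source phrase -> list of (target phrase, probability).
    """
    filtered = {}
    for src, candidates in table.items():
        # Remove the garbage: candidates below the probability floor.
        good = [(tgt, p) for tgt, p in candidates if p >= min_prob]
        # Keep the most probable candidates first.
        good.sort(key=lambda c: c[1], reverse=True)
        if good:
            filtered[src] = good[:top_k]
    return filtered
```

Both thresholds would normally be chosen by checking quality on a held-out tune set rather than fixed in advance.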
Workflow integration
Use MT as a secondary TM
Bilingual pre-translated translation files
CAT tool integration
Differentiated workflow
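Using MT as a secondary TM can be sketched as follows: look for a fuzzy TM match first and fall back to MT only when no match is good enough. This is a minimal Python sketch under stated assumptions: the TM is a plain dictionary, `mt_translate` is a hypothetical callable standing in for whatever MT engine is used, the 0.75 threshold is illustrative, and the character-level ratio from `difflib` is only a stand-in for the fuzzy-match scoring of a real CAT tool.

```python
from difflib import SequenceMatcher

def pretranslate(segment, tm, mt_translate, threshold=0.75):
    """Return (translation, origin, best TM score) for one source segment.

    tm: dict mapping source segment -> target segment.
    mt_translate: hypothetical callable taking a source string.
    """
    best_score, best_target = 0.0, None
    for src, tgt in tm.items():
        score = SequenceMatcher(None, segment, src).ratio()
        if score > best_score:
            best_score, best_target = score, tgt
    if best_target is not None and best_score >= threshold:
        return best_target, "TM", best_score
    # No usable TM match: use MT as the secondary resource.
    return mt_translate(segment), "MT", best_score
```

Keeping the origin label makes it possible to route TM matches and MT output through a differentiated workflow afterwards.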
Continuous improvement
Qualitative
Use updated TMs in new trainings
Immediate (incremental) retraining
Rule-based automatic post-editing
Selective pre- and/or post-processing
Source content optimization
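Rule-based automatic post-editing can be as simple as an ordered list of search-and-replace rules applied to the MT output. The Python sketch below is illustrative only: the three rules are hypothetical examples and are language-specific (removing the space before punctuation would be wrong for French, and a doubled word is sometimes legitimate), so a real rule set would be built from recurring post-editor corrections.

```python
import re

# Ordered (pattern, replacement) rules; each one is an illustrative assumption.
RULES = [
    (re.compile(r"\s+([,.;:!?])"), r"\1"),                # no space before punctuation
    (re.compile(r"\b(\w+) \1\b", re.IGNORECASE), r"\1"),  # collapse a doubled word
    (re.compile(r" {2,}"), " "),                          # collapse repeated spaces
]

def post_edit(text, rules=RULES):
    """Apply each rule in order to the raw MT output."""
    for pattern, repl in rules:
        text = pattern.sub(repl, text)
    return text
```

For example, `post_edit("Pulse el el botón  rojo .")` returns `"Pulse el botón rojo."`.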
Linguistic processing notes
In the source and/or target language
Grammar checking
Entities detection
Proper nouns, alphanumeric words, ...
Compound words splitting
Sentence reordering
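One way to handle alphanumeric words such as part numbers is to mask them before translation and restore them afterwards, so the engine cannot alter them. The Python sketch below shows the idea under stated assumptions: the regular expression, the `__ENTn__` placeholder format, and the function names are hypothetical, and the MT engine is assumed to pass the placeholders through unchanged.

```python
import re

# A token that mixes letters and digits, e.g. "XJ200" or "A4".
ALNUM = re.compile(r"\b(?=\w*\d)(?=\w*[A-Za-z])\w+\b")

def mask_entities(text):
    """Replace alphanumeric tokens with numbered placeholders."""
    entities = []

    def repl(match):
        entities.append(match.group(0))
        return f"__ENT{len(entities) - 1}__"

    return ALNUM.sub(repl, text), entities

def unmask_entities(text, entities):
    """Put the original tokens back in place of the placeholders."""
    for i, ent in enumerate(entities):
        text = text.replace(f"__ENT{i}__", ent)
    return text
```

Proper nouns would need a different detector (a name list or a tagger), which is outside this sketch.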
The Post-editor profile
Do the skills needed differ from those for translation?
Post-editing guidelines
Full vs. light post-editing
http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines
Compensation
Questions
Do you have the right resources to start?
Quality Metrics
SMT metrics: BLEU, NIST
Feedback from translators
Translation time vs. Post-editing time
Word Error Rate (WER) or Edit Distance
Cost reduction
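Word Error Rate is the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. A minimal Python sketch follows, assuming whitespace tokenization and taking the post-edited segment as the reference and the raw MT output as the hypothesis; the function name is chosen for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j]: edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

With the reference "the cat sat on the mat" and the hypothesis "the cat sit on mat", there is one substitution and one deletion, so the result is 2/6.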
Questions
Are you able to measure?
"Change before you have to."
Jack Welch

Workshop on the tauyou machine translation platform

  • 1. tauyou MT platform: the basics Diego Bartolome @diegobartolome dbc@tauyou.com
  • 4. performance demanded in high end markets performance demanded in low end markets sustaining technology disruptive technology
  • 5. Objectives for Machine Translation Productivity gains Direct cost reduction Quality consistency
  • 6. New uses for Machine Translation Multilingual customer support Social Media monitoring Applications enabled by Big Data Internet of Everything / Internet of Things Speech-to-Speech translation
  • 7. Questions What is your experience with MT? 1. Quality Metrics 2. Cost reduction 3. Impact on Delivery Times 4. Feedback on quality 5. Your Feelings
  • 9. Google/Bing Translator vs. tauyou Advantages Big(ger) data State-of-the-art technology Learning curve Disadvantages Black-box Confidentiality Control
  • 10. Costs of Machine Translation Internal development – people and time Free tools – Google + Bing DIY solutions Traditional pricing model tauyou managed solution
  • 11. Revenue from Machine Translation Translation as a Service Private Machine Translation Portal MT of internal communication (flat rate) …. and many others!
  • 12. Questions 1. Where do you provide value now? 2. Where do you think the value will be? 3. How important is confidentiality? 4. Do you care about control? 5. How much could you invest in MT? (time, people, money) 6. When will your solution be available?
  • 14. Some Languages Sorted From EN into 1) FR, ES, PT, IT 2) DE, NL, HE, DA, NO, SV 3) ZH, JA, RU 4) KO, AR, TR, HI
  • 15. On Domain Quality Who is willing to pay? Where does your revenue come from? What are your key skills? What domains achieve good quality? … Quality Order of your domains ...
  • 16. Questions 1. What is your main motivation? 2. Can you try more than 1 domain? 3. Can you train at least 2 language pairs? 4. Can you pilot several MT vendors? 5. What are your expectations?
  • 17. Data acquisition OPUS corpora http://opus.lingfil.uu.se/ WMT workshops e.g. http://www.statmt.org/wmt16/ Multilingual websites TAUS
  • 18. Corpora building Related vs. unrelated materials Percentage of out-of-domain Does mono-lingual data help? Corpora extension with linguistic processing Ad-hoc corpus for file translation The more, the better?
  • 19. Data cleaning Clean translation memories Length, punctuation, terminology, … Inconsistencies, repetitions, ... Segment splitting Optimize weight of most frequent n-grams Validate their translations Add out-of-domain data (optimization)
  • 20. Remark Data cleaning and selection are key processes. Simply adding more data may harm quality.
  • 21. Training strategies One single system with all TMs + glossaries + linguistic processing input/output + forbidden words lists Layered approach Generic → domain → subdomain → client
  • 22. Models optimization Filter the translation tables Remove the garbage + tune weights Optimize language models Adapt them to the translation purpose Tune parameters correctly Tune set, test set, optimization parameters Improve tokenization, recasing, ...
  • 23. Workflow integration Use MT as a secondary TM Bilingual pre-translated translation files CAT tool integration Differentiated workflow
  • 24. Continuous improvement Qualitative Use updated TMs in new trainings Immediate (incremental) retraining Rule-based automatic post-editing Selective pre- and/or post-processing Source content optimization
  • 25. Linguistic processing notes In the source and/or target language Grammar checking Entities detection Proper nouns, alphanumeric words, ... Compound words splitting Sentence reordering
  • 26. The Post-editor profile Do the skills needed differ from those for translation? Post-editing guidelines Full vs. light post-editing http://www.slideshare.net/TAUS/taus-mt-postediting-guidelines Compensation
  • 27. Questions Do you have the right resources to start?
  • 28. Quality Metrics SMT metrics: BLEU, NIST Feedback from translators Translation time vs. Post-editing time Word Error Rate (WER) or Edit Distance Cost reduction
  • 29. Questions Are you able to measure?
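
Two of the metrics on slide 28, Word Error Rate (WER) and translation time vs. post-editing time, can be computed with very little tooling. The following is a minimal sketch, not the tauyou implementation: it assumes whitespace tokenization, treats the post-edited segment as the reference and the raw MT output as the hypothesis, and uses illustrative function names (`word_error_rate`, `time_saving`) that do not come from the slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def time_saving(translation_seconds: float, post_editing_seconds: float) -> float:
    """Fraction of time saved by post-editing instead of translating from scratch."""
    if translation_seconds <= 0:
        raise ValueError("translation_seconds must be positive")
    return 1.0 - post_editing_seconds / translation_seconds


if __name__ == "__main__":
    post_edited = "the cat sat on the mat"   # reference (6 words)
    mt_output = "the cat sat in mat"         # hypothesis, 2 edits away
    print(f"WER: {word_error_rate(post_edited, mt_output):.2f}")
    print(f"Time saving: {time_saving(600, 420):.0%}")
```

A WER of 0 means the post-editor changed nothing; higher values mean more editing effort per reference word. Tracking both numbers per domain and language pair gives a simple baseline for answering the question on slide 29.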