PRESENTATION
AMTA
ALEX YANISHEVSKY
Welocalize
November 2015
How Much Cake is Enough:
The Case for Domain-Specific
Engines
What is the Tipping Point?
How Much Cake Is Too Much Cake?
I
LOVE
CAKE…
TOO MUCH
CAKE! ugh!
AGENDA
• How many engines
• How to split domains
• How to measure success
• How to improve
HOW MANY ENGINES: CRITERIA
• Environment: elegant deployment?
• Cost
• How different are they from each other?
• Maintenance: engineering + linguistic feedback implementation
HOW TO SPLIT DOMAINS: CRITERIA
• Content owner feedback
• Historical experience based on business unit or portfolio
• Naming convention
• Style analysis: difference in characteristics based on lexical diversity,
sentence length + syntactic complexity
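The style-analysis criterion can be illustrated with a minimal sketch, assuming lexical diversity is approximated by a type-token ratio and sentence length by average tokens per sentence. The function name `style_profile`, the naive tokenizer and the sample texts are hypothetical; syntactic complexity is left out because it would need a parser.

```python
import re


def style_profile(text):
    """Return simple style metrics for a block of English text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive tokenizer: lowercase runs of letters, digits and apostrophes.
    tokens = re.findall(r"[A-Za-z0-9']+", text.lower())
    if not sentences or not tokens:
        return {"sentences": 0, "tokens": 0,
                "type_token_ratio": 0.0, "avg_sentence_length": 0.0}
    return {
        "sentences": len(sentences),
        "tokens": len(tokens),
        # Lexical diversity: distinct tokens divided by all tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Sentence length: average number of tokens per sentence.
        "avg_sentence_length": len(tokens) / len(sentences),
    }


# Hypothetical samples standing in for two content types.
legal_sample = ("The licensee shall not, under any circumstances, sublicense "
                "the software. The licensor may terminate this agreement "
                "upon breach.")
ui_sample = "Click Save. Click Cancel. Click Save again."

legal_profile = style_profile(legal_sample)
ui_profile = style_profile(ui_sample)
print(legal_profile)
print(ui_profile)
```

Type-token ratio is sensitive to text length, so profiles are best compared on samples of similar size.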
HOW TO SPLIT DOMAINS: TOOLS
• Build domain-specific language models + select TUs for domain by PPL
• Source Content Profiler – helps identify domain based on language models,
as well as other stylistic characteristics
• Style Scorer – higher score indicates better match to style established by
client’s documents
HOLISTIC APPROACH BASED ON SEVERAL TOOLS:
TOOLS: PERPLEXITY EVALUATOR
TU LEVEL
<tu srclang="EN-US" tuid="75438">
  <prop type="x-ppl:train2">208</prop>
  <prop type="x-ppl:techdoc6">191.025</prop>
  <prop type="x-ppl:support2">325.983</prop>
  <prop type="x-ppl:sales1">97.0736</prop>
  <prop type="x-ppl:productLoc1">396.398</prop>
  <prop type="x-ppl:legal1">617.876</prop>
  <tuv xml:lang="EN-US">
    <seg>Consistent feature set across multiple platforms (Windows, Mac, iOS, Android).</seg>
  </tuv>
  <tuv changedate="20140325T122530Z" changeid="serviceaaa" creationdate="20140325T122530Z" creationid="serviceaaa" lastusagedate="20140325T122530Z" usagecount="0" xml:lang="ES-XL">
    <prop type="x-ALS:Context">TEXT</prop>
    <prop type="x-ALS:Source File">DATATC39720SRCEN-USco-02__battle-card_enco-02__battle-card_en.inx</prop>
    <seg>Conjunto de características coherente en varias plataformas (Windows, Mac, iOS, Android)</seg>
  </tuv>
</tu>
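A minimal sketch of how the `x-ppl:*` properties on a TU might be read to pick a domain, assuming that the lowest perplexity marks the best-fitting language model. The selection rule, the optional `max_ppl` cutoff and the function names are assumptions for illustration, not the tool's actual logic. The sample TU is a trimmed copy of the one above.

```python
import xml.etree.ElementTree as ET

PPL_PREFIX = "x-ppl:"

# Trimmed copy of the TU shown above (target-side metadata omitted).
SAMPLE_TU = """
<tu srclang="EN-US" tuid="75438">
  <prop type="x-ppl:train2">208</prop>
  <prop type="x-ppl:techdoc6">191.025</prop>
  <prop type="x-ppl:support2">325.983</prop>
  <prop type="x-ppl:sales1">97.0736</prop>
  <prop type="x-ppl:productLoc1">396.398</prop>
  <prop type="x-ppl:legal1">617.876</prop>
  <tuv xml:lang="EN-US">
    <seg>Consistent feature set across multiple platforms.</seg>
  </tuv>
</tu>
"""


def ppl_scores(tu):
    """Map each language-model name to the perplexity stored on the TU."""
    scores = {}
    for prop in tu.findall("prop"):  # direct children of <tu> only
        prop_type = prop.get("type", "")
        if prop_type.startswith(PPL_PREFIX):
            scores[prop_type[len(PPL_PREFIX):]] = float(prop.text)
    return scores


def best_domain(tu, max_ppl=None):
    """Return the model name with the lowest perplexity, or None.

    max_ppl is a hypothetical cutoff: if even the best model scores above
    it, the TU is treated as belonging to no domain.
    """
    scores = ppl_scores(tu)
    if not scores:
        return None
    domain = min(scores, key=scores.get)
    if max_ppl is not None and scores[domain] > max_ppl:
        return None
    return domain


tu = ET.fromstring(SAMPLE_TU)
print(best_domain(tu))
```

For this TU the lowest value belongs to `sales1`, which fits a segment taken from a sales battle card.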
TOOLS: SOURCE CONTENT PROFILER
IN CONJUNCTION WITH DCU/ADAPT
TOOLS: STYLE SCORER
COMBINES PPL RATIOS, DISSIMILARITY SCORE + CLASSIFICATION SCORE
TEST CATEGORY    TRAINING CATEGORY    SCORE
SUPPORT          TECH DOC             3.16
TECH DOC         TECH DOC             2.94
TECH DOC         LEGAL                0.02
WHY USE STYLE SCORER?
• Identify similarity of source document to “gold standard” documents from that
domain and other domains
• Identify similarity of target document to “gold standard” documents from that
domain and other domains
• Example: Is this really a support document? To what degree is it similar to
other support documents, tech doc documents, etc.?
• Dissimilarity can point to worse quality for raw MT and/or reduced post-editing
productivity
STYLE SCORER + SCP
• SCP helps classify a document
• Style Scorer tells you how good a match a document is to a profile
• SCP only works on English source
• Style Scorer works on English source + non-English target
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.
CASE STUDY
ONE DOMAIN?
One ring
to rule them all
CASE STUDY: HOW MANY DOMAINS?
• Started with 6 Domains: Technical Documentation, Legal, Support, Training,
Product UI, Sales/Marketing
• Found that Technical Documentation, Support + Training were very similar based
on LM scores against each other, Style Scorer results, sentence length +
grammatical structures
• Found that Product UI was close enough to the above three that a separate
engine was not warranted
• Found that Legal + Sales/Marketing were different enough from the above domains
and from each other based on LM scores against each other, Style Scorer results +
sentence length
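Scoring domains "against each other" can be sketched as a cross-perplexity matrix: train one language model per domain, then measure its perplexity on every domain's text. The sketch below assumes a unigram model with add-one smoothing as a stand-in for a real n-gram toolkit. The class name, function name and toy corpora are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class UnigramLM:
    """Unigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.counts = Counter(tok for s in sentences for tok in tokenize(s))
        self.total = sum(self.counts.values())
        # One extra slot so that unseen words share some probability mass.
        self.vocab_size = len(self.counts) + 1

    def prob(self, token):
        return (self.counts[token] + 1) / (self.total + self.vocab_size)

    def perplexity(self, sentences):
        tokens = [tok for s in sentences for tok in tokenize(s)]
        if not tokens:
            raise ValueError("cannot compute perplexity of empty text")
        log_sum = sum(math.log(self.prob(tok)) for tok in tokens)
        return math.exp(-log_sum / len(tokens))


def cross_perplexity(corpora):
    """Return {(model_domain, text_domain): perplexity} for all pairs."""
    models = {name: UnigramLM(sents) for name, sents in corpora.items()}
    return {
        (model_name, text_name): models[model_name].perplexity(corpora[text_name])
        for model_name in corpora
        for text_name in corpora
    }


# Toy corpora; real domains would use thousands of segments each.
corpora = {
    "techdoc": ["click the button to save the file",
                "open the file and click save"],
    "support": ["click the button to open the file",
                "save the file and click the button"],
    "legal": ["the licensee shall indemnify the licensor",
              "this agreement shall terminate upon breach"],
}

ppl = cross_perplexity(corpora)
for pair in sorted(ppl):
    print(pair, round(ppl[pair], 2))
```

Low perplexity in both directions between two domains suggests they could share one engine. High perplexity suggests they are candidates for separate engines.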
CASE STUDY: GATHERING ASSETS
• TMs: old, somewhat recent + current
• Termbases in MultiTerm
• Existing user dictionaries + normalization dictionaries
• New user dictionaries based on term extractions + auto-import for some
languages
CASE STUDY: CURATING ASSETS
• Cleaned TMs
• Introduced metadata for PPL into TUs based on LM perplexity
• Kept the UDs + normalization dictionaries as is
• Additional term extraction for weak languages or languages with
insufficient assets
CASE STUDY: ENGINE ITERATIONS
Based on options in Systran:
• RBMT only
• Hybrid with Stemming, LM Order, Distortion, etc.
• SMT only
HOW TO MEASURE SUCCESS
• Automatic scores
• Human evaluations
• Some reduction in linguistic issues
Forthcoming
• Informed PE Distance
• Productivity
• Systematic reduction of linguistic issues
CASE STUDY: AUTOMATIC SCORES FOR GENERAL/TECH DOC
CASE STUDY: HUMAN EVALUATIONS FOR GENERAL/TECH DOC
CASE STUDY: AUTOMATIC SCORES FOR SALES/MARKETING1
CASE STUDY: AUTOMATIC SCORES FOR SALES/MARKETING2
CASE STUDY: AUTOMATIC SCORES FOR LEGAL1
CASE STUDY: AUTOMATIC SCORES FOR LEGAL2
CASE STUDY: AUTOMATIC SCORES FOR TECH DOC
HOW TO IMPROVE
Scoring
• Linguistically-informed automatic scores: PE distance with different
weights for POS
• Productivity
Engine
• Eradicate high-frequency inconsistencies between TMs, Termbases +
User Dictionaries (UDs)
• Create domain-specific UDs
• Send best reply: TMT Prime, send best translation irrespective of domain
Source
• Pre-MT source check: was this content properly categorized?
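The idea of a PE distance with different weights for POS can be sketched as a token-level edit distance in which each edit costs the weight of the part of speech involved. This is a minimal illustration, not the scoring actually used. The weight values, the tag set and the function names are hypothetical, and the input is assumed to be already POS-tagged.

```python
# Hypothetical weights: edits to content words cost more than edits to
# function words or punctuation.
POS_WEIGHTS = {"NOUN": 1.0, "VERB": 1.0, "ADJ": 0.8, "DET": 0.3, "PUNCT": 0.1}
DEFAULT_WEIGHT = 0.5


def weight(tagged_token):
    """Return the edit cost for a (token, pos) pair."""
    return POS_WEIGHTS.get(tagged_token[1], DEFAULT_WEIGHT)


def weighted_pe_distance(mt, pe):
    """POS-weighted Levenshtein distance between MT output and its post-edit.

    mt and pe are lists of (token, pos) tuples.
    """
    rows, cols = len(mt) + 1, len(pe) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):  # delete every MT token
        dist[i][0] = dist[i - 1][0] + weight(mt[i - 1])
    for j in range(1, cols):  # insert every PE token
        dist[0][j] = dist[0][j - 1] + weight(pe[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            if mt[i - 1][0] == pe[j - 1][0]:
                substitute = dist[i - 1][j - 1]  # same token, no cost
            else:
                substitute = dist[i - 1][j - 1] + max(
                    weight(mt[i - 1]), weight(pe[j - 1]))
            delete = dist[i - 1][j] + weight(mt[i - 1])
            insert = dist[i][j - 1] + weight(pe[j - 1])
            dist[i][j] = min(substitute, delete, insert)
    return dist[-1][-1]


mt = [("the", "DET"), ("file", "NOUN"), ("save", "VERB")]
pe_det_fix = [("a", "DET"), ("file", "NOUN"), ("save", "VERB")]
pe_noun_fix = [("the", "DET"), ("folder", "NOUN"), ("save", "VERB")]

print(weighted_pe_distance(mt, pe_det_fix))   # determiner changed
print(weighted_pe_distance(mt, pe_noun_fix))  # noun changed
```

With these weights a corrected determiner counts for less than a corrected noun, so the score reflects how much the post-editor changed the content rather than only how many tokens changed.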
SUMMARY
• Domain-specific engines yield better results, as evidenced by automatic scores
and human evaluations. There is some evidence of a reduction in linguistic issues.
• Group closely related content into one domain
• Determine how many engines your infrastructure can support
THANK YOU
ALEX YANISHEVSKY
Welocalize
November 2015

How Much Cake to Eat: The Case for Targeted MT Engines

  • 1. PRESENTATION AMTA ALEX YANISHEVSKY Welocalize November 2015 How Much Cake is Enough: The Case for Domain-Specific Engines
  • 2. What is the Tipping Point? How Much Cake Is Too Much Cake? I LOVE CAKE… TOO MUCH CAKE! ugh!
  • 3. •How many engines •How to split domains •How to measure success •How to improve AGENDA
  • 4. HOW MANY ENGINES: CRITERIA • Environment: elegant deployment? • Cost • How different are they from each other? • Maintenance: engineering + linguistic feedback implementation
  • 5. HOW TO SPLIT DOMAINS: CRITERIA • Content owner feedback • Historical experience based on business unit or portfolio • Naming convention • Style analysis: difference in characteristics based on lexical diversity, sentence length + syntactic complexity
  • 6. HOW TO SPLIT DOMAINS: TOOLS • Build domain-specific language models + select TUs for domain by PPL • Source Content Profiler – helps identify domain based on language models, as well as other stylistic characteristics • Style Scorer – higher score indicates better match to style established by client’s documents HOLISTIC APPROACH BASED ON SEVERAL TOOLS:
  • 7. TOOLS: PERPLEXITY EVALUATOR, TU LEVEL <tu srclang="EN-US" tuid="75438"> <prop type="x-ppl:train2">208</prop> <prop type="x-ppl:techdoc6">191.025</prop> <prop type="x-ppl:support2">325.983</prop> <prop type="x-ppl:sales1">97.0736</prop> <prop type="x-ppl:productLoc1">396.398</prop> <prop type="x-ppl:legal1">617.876</prop> <tuv xml:lang="EN-US"> <seg>Consistent feature set across multiple platforms (Windows, Mac, iOS, Android).</seg> </tuv> <tuv changedate="20140325T122530Z" changeid="serviceaaa" creationdate="20140325T122530Z" creationid="serviceaaa" lastusagedate="20140325T122530Z" usagecount="0" xml:lang="ES-XL"> <prop type="x-ALS:Context">TEXT</prop> <prop type="x-ALS:Source File">DATATC39720SRCEN-USco-02__battle-card_enco-02__battle-card_en.inx</prop> <seg>Conjunto de características coherente en varias plataformas (Windows, Mac, iOS, Android)</seg> </tuv> </tu>
  • 8. TOOLS: SOURCE CONTENT PROFILER IN CONJUNCTION WITH DCU/ADAPT
  • 9. TOOLS: STYLE SCORER COMBINES PPL RATIOS, DISSIMILARITY SCORE + CLASSIFICATION SCORE. Test category / training category / score: Support / Tech Doc / 3.16; Tech Doc / Tech Doc / 2.94; Tech Doc / Legal / 0.02
  • 10. WHY USE STYLE SCORER? • Identify similarity of source document to “gold standard” documents from that domain and other domains • Identify similarity of target document to “gold standard” documents from that domain and other domains • Example: Is this really a support document? To what degree is it similar to other support documents, tech doc documents, etc.? • Dissimilarity can point to worse quality for raw MT and/or reduced post-editing productivity
  • 11. STYLE SCORER + SCP • SCP helps classify a document • Style Scorer tells you how good a match a document is to a profile • SCP only works on English source • Style Scorer works on English source + non-English target
  • 12. CASE STUDY: ONE DOMAIN? One ring to rule them all. Three Rings for the Elven-kings under the sky, Seven for the Dwarf-lords in their halls of stone, Nine for Mortal Men doomed to die, One for the Dark Lord on his dark throne In the Land of Mordor where the Shadows lie. One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In the Land of Mordor where the Shadows lie.
  • 13. CASE STUDY: HOW MANY DOMAINS? • Started with 6 Domains: Technical Documentation, Legal, Support, Training, Product UI, Sales/Marketing • Found that Technical Documentation, Support + Training were very similar based on LM scores against each other, Style Scorer, length of sentences, similar grammatical structures • Found that Product UI was close enough to above 3 that making a separate engine was not warranted • Found that Legal + Sales/Marketing were different enough from above domains and from each other based on LM scores against each other, Style Scorer + length of sentences
  • 14. CASE STUDY: GATHERING ASSETS • TMs: old, somewhat recent, current • Termbases in MultiTerm • Existing user dictionaries + normalization dictionaries • New user dictionaries based on term extractions + auto-import for some languages
  • 15. CASE STUDY: CURATING ASSETS • Cleaned TMs • Introduced metadata for PPL into TUs based on LM perplexity • Kept the UDs + normalization dictionaries as is • Additional term extraction for weak languages or languages with insufficient assets
  • 16. CASE STUDY: ENGINE ITERATIONS Based on options in Systran: • RBMT only • Hybrid with Stemming, LM Order, Distortion, etc. • SMT only
  • 17. HOW TO MEASURE SUCCESS • Automatic scores • Human evaluations • Some reduction in linguistic issues. Forthcoming: • Informed PE Distance • Productivity • Systematic reduction of linguistic issues
  • 18. CASE STUDY: AUTOMATIC SCORES FOR GENERAL/TECH DOC
  • 19. CASE STUDY: HUMAN EVALUATIONS FOR GENERAL/TECH DOC
  • 20. CASE STUDY: AUTOMATIC SCORES SALES/MARKETING1
  • 21. CASE STUDY: AUTOMATIC SCORES SALES/MARKETING2
  • 22. CASE STUDY: AUTOMATIC SCORES LEGAL1
  • 23. CASE STUDY: AUTOMATIC SCORES LEGAL2
  • 24. CASE STUDY: AUTOMATIC SCORES TECH DOC
  • 25. HOW TO IMPROVE. Scoring: • Linguistically-informed automatic scores: PE distance with different weights for POS • Productivity. Engine: • Eradicate high-frequency inconsistencies between TMs, Termbases + User Dictionaries (UDs) • Create domain-specific UDs • Send best reply: TMT Prime, send best translation irrespective of domain. Source: • Pre-MT source check: was this content properly categorized?
  • 26. SUMMARY • Domain-specific engines yield better results, as evidenced by automatic scores and human evaluations. There is also some evidence of a reduction in linguistic issues. • Group closely related content into one domain • Determine how many engines your infrastructure can support
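
The TU-level perplexity metadata on slide 7 suggests one simple way to assign a TU to a domain: read the `x-ppl:<model>` props and pick the model with the lowest perplexity. The following is a minimal sketch under that assumption. The helper names `ppl_scores` and `best_domain` are hypothetical, the target-language `<tuv>` is omitted for brevity, and the actual selection logic used in the case study may differ (for example, it may apply thresholds or combine PPL with other signals).

```python
import xml.etree.ElementTree as ET

PPL_PREFIX = "x-ppl:"  # prop naming convention taken from the slide 7 example


def ppl_scores(tu):
    """Return {model_name: perplexity} from a TU's direct x-ppl props."""
    scores = {}
    for prop in tu.findall("prop"):
        prop_type = prop.get("type", "")
        if prop_type.startswith(PPL_PREFIX):
            scores[prop_type[len(PPL_PREFIX):]] = float(prop.text)
    return scores


def best_domain(tu):
    """Hypothetical rule: the model with the lowest perplexity wins."""
    scores = ppl_scores(tu)
    return min(scores, key=scores.get)


# Perplexity values copied from the slide 7 TU.
TU_XML = """<tu srclang="EN-US" tuid="75438">
  <prop type="x-ppl:train2">208</prop>
  <prop type="x-ppl:techdoc6">191.025</prop>
  <prop type="x-ppl:support2">325.983</prop>
  <prop type="x-ppl:sales1">97.0736</prop>
  <prop type="x-ppl:productLoc1">396.398</prop>
  <prop type="x-ppl:legal1">617.876</prop>
  <tuv xml:lang="EN-US">
    <seg>Consistent feature set across multiple platforms (Windows, Mac, iOS, Android).</seg>
  </tuv>
</tu>"""

tu = ET.fromstring(TU_XML)
print(best_domain(tu))
```

For the slide 7 TU, the lowest value is the one recorded for `sales1` (97.0736), so this rule would route the segment to the sales model.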

Editor's Notes

  1. LM built using KenLM with ngram=5
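
For readers unfamiliar with the PPL figures, perplexity can be derived from the total log10 probability that an n-gram model assigns to a sentence. The sketch below assumes base-10 log scores and counts one extra token for the end-of-sentence marker, which to the best of our understanding matches the convention in KenLM's Python bindings. The function name and the input numbers are illustrative only and do not come from the slides.

```python
def perplexity(total_log10_prob: float, num_words: int) -> float:
    """Perplexity from a sentence-level log10 probability.

    num_words is the number of whitespace-separated words; one token is
    added for the end-of-sentence marker (an assumed convention).
    """
    tokens = num_words + 1
    return 10.0 ** (-total_log10_prob / tokens)


# Illustrative values: a 9-word sentence with total log10 probability -20.0.
ppl = perplexity(-20.0, 9)
print(ppl)
```

Lower perplexity means the model finds the sentence less surprising, which is why a low score against a domain LM is read as a sign that the TU belongs to that domain.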