This document discusses an approach to improving machine translation quality through automated analysis of source texts, translation candidates, and post-edited outputs. It involves profiling source content, evaluating translation candidates for perplexity and style, normalizing user-generated content, checking numbers and dates, and scoring post-edits for consistency. Metrics collected at each stage then feed back into the system to refine machine translation and better support the post-editing process.
2. Background
• MT is here to stay
– Better MT = less PE effort = higher throughput for less money
• MT quality depends on training data quantity, quality, and relevance
– Selecting in-domain data increases scores by 10-20 BLEU points over generic engines
• LSPs have less control over quantity, so we need to focus on quality & relevance
3. A data-driven approach
• Analytics at each step
• Training
– Perplexity Evaluator
– Candidate Scorer
– Source Content Profiler (joint project w/CNGL)
– StyleScorer
• MT Production
– Number checking
– StyleScorer
– UGC Normalization
• Post-Editing
– WeScore
– StyleScorer
4. Candidate Scorer
• Uses corpus of known “difficult” text
• Compares part of speech (POS) n-grams
– Generates per-sentence scores (see the sketch below)
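The deck doesn't name a toolkit or the exact comparison metric, so the sketch below is only an assumption of the shape of such a scorer: NLTK for tagging, and an overlap score defined as the fraction of a sentence's POS n-grams that also occur in the "difficult" profile.

```python
# Hypothetical POS n-gram candidate scorer; NLTK is an assumption here.
# One-time setup: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk
from collections import Counter

def pos_ngrams(sentence: str, n: int = 3) -> Counter:
    """Extract POS-tag n-grams from one sentence."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(sentence))]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def difficulty_profile(difficult_corpus: list[str], n: int = 3) -> Counter:
    """Aggregate POS n-gram counts over the known-'difficult' corpus."""
    profile = Counter()
    for sent in difficult_corpus:
        profile += pos_ngrams(sent, n)
    return profile

def score(sentence: str, profile: Counter, n: int = 3) -> float:
    """Per-sentence score: share of the sentence's POS n-grams that also
    appear in the difficult-text profile (higher = more 'difficult')."""
    grams = pos_ngrams(sentence, n)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    return sum(c for g, c in grams.items() if g in profile) / total
```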
6. Perplexity (PPL) Evaluator
• Build language models (LMs) from multiple corpora
– Known “good” sentences for MT
– Known “bad” sentences for MT
– Client-specific in-domain data
• Each document gets a PPL score against each LM (toy example below)
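As a minimal illustration of one-PPL-score-per-LM, here is a toy add-one-smoothed bigram model; a production setup would use a toolkit such as KenLM or SRILM, and the corpora below are placeholders, not real data.

```python
import math
from collections import Counter

class BigramLM:
    """Toy add-one-smoothed bigram LM (stand-in for KenLM/SRILM)."""
    def __init__(self, sentences):
        self.uni, self.bi = Counter(), Counter()
        for toks in sentences:
            toks = ["<s>"] + toks + ["</s>"]
            self.uni.update(toks[:-1])           # history counts
            self.bi.update(zip(toks, toks[1:]))
        self.v = len(self.uni) + 1               # smoothing vocabulary size

    def perplexity(self, sentences):
        lp = n = 0
        for toks in sentences:
            toks = ["<s>"] + toks + ["</s>"]
            for w1, w2 in zip(toks, toks[1:]):
                lp += math.log((self.bi[(w1, w2)] + 1) / (self.uni[w1] + self.v))
                n += 1
        return math.exp(-lp / n)

# Placeholder corpora standing in for the known-good, known-bad,
# and client in-domain sets from the slide.
corpora = {
    "good":   [["click", "save", "to", "continue"]],
    "bad":    [["lol", "idk", "brb"]],
    "client": [["select", "save", "from", "the", "menu"]],
}
doc = [["click", "save"]]
scores = {name: BigramLM(sents).perplexity(doc) for name, sents in corpora.items()}
```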
8. StyleScorer
• Combines PPL ratio, dissimilarity score, and classification score (illustrative sketch below)
– Each document receives a score from 0-4
– Higher score indicates better match to style established by client’s documents
– Does not require parallel data
• Source scored for training/tuning suitability
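The slides name the three components but not the combination formula, so the following is only a guess at the shape of such a scorer: normalize each component into [0, 1], average, and scale to the 0-4 range.

```python
# Assumed combination formula; the real StyleScorer weighting is not
# given in the deck.
def style_score(ppl_ratio: float, dissimilarity: float, clf_prob: float) -> float:
    """ppl_ratio: PPL(reference LM)/PPL(client LM), higher = closer to client style;
    dissimilarity: 0-1, lower = closer; clf_prob: classifier P(client style)."""
    ppl_component = min(ppl_ratio / 2.0, 1.0)               # squash into [0, 1] (assumed)
    sim_component = 1.0 - max(0.0, min(dissimilarity, 1.0))
    combined = (ppl_component + sim_component + clf_prob) / 3.0
    return round(4.0 * combined, 2)                         # map to the 0-4 range
```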
9. Source Content Profiler
• CNGL project (beta)
– Classification of docs into profiles
– Features based on (rough sketch after this list):
• Word & sentence length
• Readability score
• Syntactic structure
• Terminology
• Tag ratios
• Do Not Translate lists
• Glossary matches
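A rough sketch of the kind of features listed above; the regexes and the syllable heuristic behind the readability score are assumptions, not the CNGL implementation, and the syntactic-structure and terminology features are omitted for brevity.

```python
import re

def profile_features(doc: str, glossary: set[str], dnt: set[str]) -> dict:
    """Extract simple profiling features from one document (illustrative only)."""
    sents = [s for s in re.split(r"[.!?]+\s*", doc) if s]
    words = re.findall(r"[A-Za-z']+", doc)
    tags = re.findall(r"<[^>]+>|\{\d+\}", doc)   # markup / placeholder tags
    # Crude syllable count: runs of vowels per word, minimum one.
    syll = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    n_w, n_s = max(len(words), 1), max(len(sents), 1)
    return {
        "avg_word_len": sum(map(len, words)) / n_w,
        "avg_sent_len": n_w / n_s,
        # Flesch reading ease: 206.835 - 1.015(words/sent) - 84.6(syll/word)
        "flesch": 206.835 - 1.015 * (n_w / n_s) - 84.6 * (syll / n_w),
        "tag_ratio": len(tags) / n_w,
        "glossary_hits": sum(w.lower() in glossary for w in words),
        "dnt_hits": sum(w.lower() in dnt for w in words),
    }
```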
12. UGC normalization
• Make substitutions in source for known MT pain points before translating (sketch after the examples)
– Frequent misspellings – “teh”, “mroe”, etc.
– Abbreviations – “imho”, “tyvm”, etc.
– Missing punctuation – “cant”, “theyll”, etc.
– Emoticons
– Spelling variants/slang – “cuz”, “usu”, etc.
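A minimal dictionary-driven pass of this kind might look as follows; the mapping mixes the slide's examples with assumed expansions, and dropping emoticons is an assumed policy, not the production rule.

```python
import re

# Slide examples plus assumed expansions; not the production mapping.
SUBS = {
    "teh": "the", "mroe": "more",                 # frequent misspellings
    "imho": "in my humble opinion",               # abbreviations
    "tyvm": "thank you very much",
    "cant": "can't", "theyll": "they'll",         # missing punctuation
    "cuz": "because", "usu": "usually",           # slang / spelling variants
    ":)": "", ";)": "",                           # emoticons (dropped here, by assumption)
}

def _rx(key: str) -> str:
    # Word boundaries for word-like keys; literal match for emoticons.
    return rf"\b{re.escape(key)}\b" if key[0].isalnum() else re.escape(key)

_PATTERN = re.compile("|".join(_rx(k) for k in sorted(SUBS, key=len, reverse=True)),
                      re.IGNORECASE)

def normalize(text: str) -> str:
    return _PATTERN.sub(lambda m: SUBS[m.group(0).lower()], text).strip()

print(normalize("imho teh app cant load mroe pages cuz of caching :)"))
# -> in my humble opinion the app can't load more pages because of caching
```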
13. Number checking
• Verify that numeric MT output is localized correctly (sketch below)
– Currency – “$1B” vs “1 млрд. $”
– Dates – “2/28/2014” vs “28/2/2014”
– Time – “2pm” vs “14h00”
– Separator & radix – “1,234.5” vs “1 234,5”
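One simple way to implement such a check: extract number-like tokens from source and MT output, strip grouping and radix characters, and compare the resulting digit sets. The patterns below are illustrative assumptions, not the production rules; note this variant deliberately accepts legitimate reorderings such as “2/28/2014” vs “28/2/2014” and only flags digits that changed or disappeared.

```python
import re

def digits(text: str) -> list[str]:
    """Pull out number-like tokens and strip grouping/radix characters so
    '1,234.5', '1 234,5', and '1.234,5' all compare as '12345'."""
    tokens = re.findall(r"\d[\d .,\u00a0]*\d|\d", text)
    return sorted(re.sub(r"[ .,\u00a0]", "", t) for t in tokens)

def numbers_match(source: str, target: str) -> bool:
    return digits(source) == digits(target)

assert numbers_match("Total: $1,234.50", "Итого: 1 234,50 $")
assert not numbers_match("Due 2/28/2014", "Срок: 28/3/2014")
```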
14. StyleScorer revisited
• MT output is compared to client’s historical (in-domain) PE data
– Treat each target segment as a document
– Lower scores indicate segments likely to require greater PE effort
16. WeScore
• Dashboard for viewing MT metrics
– Tokenizes input from a variety of formats & runs several scoring algorithms in parallel
– Exports detailed analysis to spreadsheet for sentence-by-sentence review (approximate sketch below)
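WeScore itself isn't public, but the described behavior (several scorers in parallel, spreadsheet export) can be approximated with open-source metrics; the sketch below assumes sacrebleu is installed and uses BLEU, chrF, and TER as stand-in algorithms.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from sacrebleu.metrics import BLEU, CHRF, TER

METRICS = {"bleu": BLEU(effective_order=True), "chrf": CHRF(), "ter": TER()}

def _score_all(name, hyps, refs):
    m = METRICS[name]
    return name, [m.sentence_score(h, [r]).score for h, r in zip(hyps, refs)]

def export_scores(hyps, refs, path="scores.csv"):
    # Run the scoring algorithms in parallel, one thread per metric.
    with ThreadPoolExecutor() as pool:
        results = dict(pool.map(lambda n: _score_all(n, hyps, refs), METRICS))
    # Export one row per sentence for side-by-side review.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["hypothesis", "reference", *METRICS])
        for i, (h, r) in enumerate(zip(hyps, refs)):
            writer.writerow([h, r, *(results[n][i] for n in METRICS)])

export_scores(["the cat sat on the mat"], ["the cat sat on a mat"])
```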
19. StyleScorer III
• PE output is compared to client’s historical (in-domain) data
– Treat each PE segment as a document
– Lower score indicates possible deviation from established style
20. Feedback loop
• Data collected and lessons learned
– Update client-specific data for future engine training
– Mine data for generalizable patterns in problem areas
– Work with post-editors to understand how to make a better system & how to improve PE experience and throughput