#5 Predicting Machine Translation Quality

Predicting machine translation quality

I am @bittlingmayer.
My company is @SignalNLabs
interests: translation quality, translation crowdsourcing,
transliteration, browser translation integrations, topic
classification, automatic source-side correction
previously @Google, @Adobe, @Cerner
Ciao!

Today’s topics
◉ Why translation quality?
◉ What is the problem?
◉ Our data model
◉ Our learning infra

Quality estimation?
sentence-level quality
good machine translation vs bad
1

Quality evaluation?
corpus-level quality given reference translations
machine translation vs human translation
2

Why quality?
Why is predicting quality useful?

Machine translation
should not be a gamble.

Optimisation Function
$4.50: 1M chars by machine
$10,000: 1M chars at 5¢/word by human
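The gap between the two prices is what makes prediction worth optimising: if a predictor can pick out the bad machine translations, only those need human rates. A minimal cost sketch, assuming 5 characters per word (the ratio implied by $10,000 for 1M chars at 5¢/word) and that every sentence is machine-translated first. The function names are hypothetical.

```python
# Prices from the slide. CHARS_PER_WORD is an assumption: 5 chars/word is the
# ratio that makes 1M chars at 5 cents/word come out to $10,000.
MACHINE_USD_PER_MILLION_CHARS = 4.50
HUMAN_USD_PER_WORD = 0.05
CHARS_PER_WORD = 5


def machine_cost(chars: float) -> float:
    """Cost of machine-translating `chars` characters."""
    return chars / 1_000_000 * MACHINE_USD_PER_MILLION_CHARS


def human_cost(chars: float) -> float:
    """Cost of human translation, priced per word."""
    return chars / CHARS_PER_WORD * HUMAN_USD_PER_WORD


def routed_cost(chars: float, fraction_to_human: float) -> float:
    """Machine-translate everything, then pay humans to redo the fraction
    that a quality predictor flags as bad."""
    return machine_cost(chars) + human_cost(chars * fraction_to_human)


for frac in (0.0, 0.2, 1.0):
    print(f"{frac:.0%} to humans: ${routed_cost(1_000_000, frac):,.2f}")
```

With 20% flagged, the sketch gives about $2,004.50 per million characters instead of $10,000, which is why the quality of the predictor, not the translator alone, drives the cost.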

Perfect Prediction == Perfect Translation
Reinforcement Learning: the translator takes actions [translations] in a state; the predictor returns the reward [scores, rankings].
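One way to read "Perfect Prediction == Perfect Translation": a predictor that scores translations can rank several candidates and keep the best one, and in the reinforcement learning framing the same scores serve as the reward. The sketch below shows only the simpler reranking step, not a reinforcement learning loop; the candidate list and all scores in `TOY_SCORES` are hypothetical.

```python
from typing import Callable, Sequence


def pick_best(candidates: Sequence[str], predict: Callable[[str], float]) -> str:
    """Return the candidate translation with the highest predicted quality.

    With a perfect `predict`, this always returns the best available candidate."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=predict)


# Hypothetical predictor: a lookup of hand-assigned scores, for illustration only.
TOY_SCORES = {
    "The car is driving.": 0.0,   # untranslated copy of the source
    "Автомобиль вождения.": 0.3,  # ungrammatical
    "Машина едет.": 0.9,          # fluent candidate
}

best = pick_best(list(TOY_SCORES), lambda t: TOY_SCORES.get(t, 0.0))
print(best)  # Машина едет.
```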

What’s the problem?
Is it really harder than self-driving cars?

What is solvable? (chart: payoff vs. effort)
bad input
50% of errors
context/customisation
like a human
like Search, FB, Maps...
source-side ambiguity
ideally interactive
bad output

What is quality?
Can we quantify the quality of a translation?

What is sentence-level quality?
(chart: accuracy vs. fluency; regions: Low Quality, Good Enough, Misleading, Human Quality)

Recall vs Precision vs Accuracy
(diagram: actual bad vs. predicted bad)
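Treating "bad translation" as the positive class, the three metrics come from the overlap between the actual-bad and predicted-bad sets. A minimal sketch; returning 0.0 when a ratio is undefined is a convention chosen here, not something the slide specifies.

```python
from typing import Sequence


def qe_metrics(actual_bad: Sequence[bool], predicted_bad: Sequence[bool]) -> dict:
    """Precision, recall and accuracy with "bad translation" as the positive class."""
    if len(actual_bad) != len(predicted_bad):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(actual_bad, predicted_bad))
    tp = sum(a and p for a, p in pairs)              # bad, flagged as bad
    fp = sum((not a) and p for a, p in pairs)        # good, wrongly flagged
    fn = sum(a and (not p) for a, p in pairs)        # bad, missed
    tn = sum((not a) and (not p) for a, p in pairs)  # good, correctly passed
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


m = qe_metrics([True, True, False, False, False], [True, False, True, False, False])
print(m)  # {'precision': 0.5, 'recall': 0.5, 'accuracy': 0.6}
```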

Trivial 90% Accuracy Example
(diagram: actual bad vs. predicted bad; predicted bad: 100%)
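Reading the slide as a test set in which 90% of the translations are actually bad, a "predictor" that flags everything as bad reaches 90% accuracy and 100% recall without learning anything. The 90/10 split is an assumption matching the slide's title.

```python
# Hypothetical test set: 90 bad translations, 10 good ones.
n_bad, n_good = 90, 10

# Trivial predictor: flag every translation as bad (predicted bad: 100%).
tp, fp = n_bad, n_good  # every bad one caught, every good one wrongly flagged
fn, tn = 0, 0

accuracy = (tp + tn) / (n_bad + n_good)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(accuracy, recall, precision)  # 0.9 1.0 0.9
```

Precision simply equals the share of bad translations in the data, so on skewed data it is worth reporting precision and recall alongside accuracy.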

How does quality vary?
(grid: from English / from top languages / from other, each to English / to top languages / to other)
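A grid like this can be filled by bucketing each language pair by source and target tier and averaging the human scores per cell. A minimal sketch; which languages count as "top" (`TOP_LANGUAGES`) is a hypothetical choice, and pairs are assumed to be written like `en-zh`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tiering; the deck does not say which languages count as "top".
TOP_LANGUAGES = {"zh", "es", "fr", "de", "ru", "ja"}


def tier(lang: str) -> str:
    if lang == "en":
        return "English"
    return "top languages" if lang in TOP_LANGUAGES else "other"


def quality_grid(rows):
    """Mean score per (source tier, target tier) from (pair, score) rows."""
    buckets = defaultdict(list)
    for pair, score in rows:
        src, trg = pair.split("-", 1)
        buckets[(tier(src), tier(trg))].append(score)
    return {cell: mean(scores) for cell, scores in buckets.items()}


grid = quality_grid([("en-zh", 1.0), ("en-zh", 0.0), ("en-ru", 0.3), ("ko-eu", 1.0)])
print(grid)
```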

How does quality vary?
Wikipedia
news
dialogues, film subtitles, Coursera, Medium
“everyday” reviews, customer service
your children’s WhatsApp messages
my WhatsApp messages

How do we solve it?
With data and features

What is our data model?
pair  | source              | target               | score
en-zh | Hello               | 您好                 | 1.0
en-zh | The car is driving. | The car is driving.  | 0.0
en-ru | The car is driving. | Автомобиль вождения. | 0.3
...   | ...                 | ...                  | ...
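Each row pairs a language pair, a source sentence and a candidate translation with a human score between 0.0 and 1.0. A minimal sketch of that record as a Python dataclass; the class name and the range check are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTranslation:
    """One row of the data model: language pair, source, candidate, human score."""
    pair: str    # e.g. "en-zh"
    source: str
    target: str
    score: float  # 0.0 (bad) to 1.0 (good)

    def __post_init__(self):
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score out of range: {self.score}")

    @property
    def languages(self) -> tuple[str, str]:
        src, trg = self.pair.split("-", 1)
        return src, trg


rows = [
    ScoredTranslation("en-zh", "Hello", "您好", 1.0),
    ScoredTranslation("en-zh", "The car is driving.", "The car is driving.", 0.0),
    ScoredTranslation("en-ru", "The car is driving.", "Автомобиль вождения.", 0.3),
]
print(rows[2].languages)  # ('en', 'ru')
```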

What is our data model?
pair  | source              | target               | src_length_bytes | ... | trg_spam_prob | score
en-zh | Hello               | 您好                 | 5                | ... | 0.5           | 1.0
en-zh | The car is driving. | The car is driving.  | 19               | ... | 0.2           | 0.0
en-ru | The car is driving. | Автомобиль вождения. | 19               | ... | 0.1           | 0.3
...   | ...                 | ...                  | ...              | ... | ...           | ...
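The feature columns are computed from the raw source and target text. A minimal sketch assuming UTF-8: `src_length_bytes` matches the slide's column, while `trg_length_bytes` and `trg_equals_src` are hypothetical extra signals. `trg_spam_prob` needs a trained model and is left out.

```python
def extract_features(source: str, target: str) -> dict:
    """Illustrative subset of per-row signals (UTF-8 byte lengths assumed)."""
    return {
        "src_length_bytes": len(source.encode("utf-8")),
        "trg_length_bytes": len(target.encode("utf-8")),      # hypothetical
        "trg_equals_src": float(source.strip() == target.strip()),  # hypothetical
    }


print(extract_features("Hello", "您好"))
# {'src_length_bytes': 5, 'trg_length_bytes': 6, 'trg_equals_src': 0.0}
```

Byte length depends on the encoding: 您好 is 2 characters but 6 bytes in UTF-8, so byte and character counts are different signals.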

10-1000 features: signals engineered by us
1000-10M rows: sentences* hand-scored by linguists
language-agnostic: Language is just another feature.
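"Language is just another feature" suggests one model for all pairs, with the source and target language encoded as ordinary columns. A minimal sketch using one-hot encoding; the function name and the fixed language vocabulary are hypothetical.

```python
def language_features(pair: str, languages: list[str]) -> dict:
    """One-hot encode source and target language for a pair like "en-ru"."""
    src, trg = pair.split("-", 1)
    feats = {}
    for lang in languages:
        feats[f"src_is_{lang}"] = 1.0 if lang == src else 0.0
        feats[f"trg_is_{lang}"] = 1.0 if lang == trg else 0.0
    return feats


# Language columns sit next to the other engineered signals in the same row.
row = {"src_length_bytes": 19, **language_features("en-ru", ["en", "ru", "zh"])}
print(row)
```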

Human scores
Evaluate many translations by hand

Human Evaluation Score Types
Labels
  good/bad
  0.0-1.0
  multilabels
  word-level labels
Ranking
  rank multiple systems
Post-Edit
  to comprehensible
  to human quality
requires smaller dataset and budget
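Post-edits can be turned into a continuous score by measuring how much editing the machine output needed, similar in spirit to HTER (human-targeted translation edit rate). A minimal sketch using word-level Levenshtein distance divided by the post-edit length; unlike TER it ignores block shifts, and the example post-edit is hypothetical.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(prev[j] + 1,             # delete h
                         cur[j - 1] + 1,          # insert r
                         prev[j - 1] + (h != r))  # substitute, or match for free
        prev = cur
    return prev[-1]


def post_edit_rate(mt: str, post_edited: str) -> float:
    """Edits per post-edited word; 0.0 means the MT output was left untouched."""
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edit must not be empty")
    return word_edit_distance(mt.split(), ref) / len(ref)


print(post_edit_rate("Автомобиль вождения.", "Автомобиль едет."))  # 0.5
```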

QuEst baseline features
quest.dcs.shef.ac.uk/quest_files/features_blackbox_baseline_17

number of tokens in the source sentence
number of tokens in the target sentence
average source token length
LM probability of source sentence
LM probability of target sentence
number of occurrences of the target word within the target hypothesis (averaged for all words in the hypothesis - type/token ratio)
average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.2)
average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.01) weighted by the inverse frequency of each word in the source corpus
percentage of unigrams in quartile 1 of frequency (lower frequency words) in a corpus of the source language (SMT training corpus)
percentage of unigrams in quartile 4 of frequency (higher frequency words) in a corpus of the source language
percentage of bigrams in quartile 1 of frequency of source words in a corpus of the source language
percentage of bigrams in quartile 4 of frequency of source words in a corpus of the source language
percentage of trigrams in quartile 1 of frequency of source words in a corpus of the source language
percentage of trigrams in quartile 4 of frequency of source words in a corpus of the source language
percentage of unigrams in the source sentence seen in a corpus (SMT training corpus)
number of punctuation marks in the source sentence
number of punctuation marks in the target sentence
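Several of these baseline features need only the two sentences. A minimal sketch of the surface ones, assuming whitespace tokenisation and ASCII punctuation (QuEst uses its own tokeniser, so real values will differ); the LM-probability, IBM 1 and corpus-frequency features need trained resources and are omitted.

```python
import string

PUNCT = set(string.punctuation)  # ASCII only: "。" or "«" would not be counted


def surface_features(source: str, target: str) -> dict:
    """Baseline-style surface features computed from whitespace tokens."""
    src, trg = source.split(), target.split()
    return {
        "src_num_tokens": len(src),
        "trg_num_tokens": len(trg),
        "src_avg_token_length": sum(map(len, src)) / len(src) if src else 0.0,
        # average occurrences per target word type = tokens / types
        # (the inverse of the type/token ratio)
        "trg_tokens_per_type": len(trg) / len(set(trg)) if trg else 0.0,
        "src_num_punct": sum(ch in PUNCT for ch in source),
        "trg_num_punct": sum(ch in PUNCT for ch in target),
    }


print(surface_features("The car is driving.", "Автомобиль вождения."))
```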

number of tokens
length
LM probability
number of occurrences of the target word within the target hypothesis
average number of translations per source word in the sentence
…
percentage of unigrams in quartile 1 of frequency (lower frequency words)
…
percentage of unigrams in quartile n of frequency (higher frequency words)
…
percentage of trigrams in quartile 1 of frequency of source words
…
percentage of trigrams in quartile n of frequency of source words
number of punctuation marks
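The quartile features generalise to n frequency bins: rank the corpus n-grams by frequency, split them into bins, and report what share of a sentence's n-grams falls in a given bin. A minimal sketch; splitting n-gram types into equal-sized groups by rank is an assumption, and QuEst's exact quartile definition may differ.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_bins(counts: Counter, n_bins: int = 4) -> list[set]:
    """Split n-gram types into n_bins near-equal groups, lowest frequency first."""
    ranked = sorted(counts, key=lambda g: (counts[g], g))  # ties broken alphabetically
    k = len(ranked)
    return [set(ranked[i * k // n_bins:(i + 1) * k // n_bins]) for i in range(n_bins)]


def pct_in_bin(sentence: str, bins: list[set], bin_index: int, n: int = 1) -> float:
    """Share of the sentence's n-grams that fall in frequency bin `bin_index`."""
    grams = ngrams(sentence.split(), n)
    if not grams:
        return 0.0
    return sum(g in bins[bin_index] for g in grams) / len(grams)


corpus = "the car is driving the car is red the bus".split()
bins = frequency_bins(Counter(ngrams(corpus, 1)))
print(pct_in_bin("the bus is driving", bins, 0))  # quartile 1 (rare words): 0.25
print(pct_in_bin("the bus is driving", bins, 3))  # quartile 4 (frequent words): 0.5
```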

vot tak narod ho4et napisat'
Возможно, вы имели в виду: вот так народ хочет написать ("Did you mean: вот так народ хочет написать")

human       | vot tak narod ho4et napisat'  | vot tak narod ho4et napisat'
search      | вот так народ хочет написать  | That's how people want to write
translation | Вот так народ хочет написать. | So people want to write.
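Informal Latin-script Russian like this can be mapped back to Cyrillic before translation, as a source-side correction step. A toy sketch with a deliberately small, hypothetical mapping table (including the informal "4" for "ч"); it is not what the search engine does, and real input is far more ambiguous.

```python
# Hypothetical, deliberately small Latin-to-Cyrillic table for informal Russian.
MULTI = [("shch", "щ"), ("sh", "ш"), ("ch", "ч"), ("zh", "ж"), ("ya", "я"), ("yu", "ю")]
SINGLE = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "e": "е", "z": "з",
    "i": "и", "j": "й", "k": "к", "l": "л", "m": "м", "n": "н", "o": "о",
    "p": "п", "r": "р", "s": "с", "t": "т", "u": "у", "f": "ф", "h": "х",
    "c": "ц", "y": "ы", "4": "ч", "'": "ь",
}


def translit_to_cyrillic(text: str) -> str:
    """Greedy left-to-right transliteration; output is lowercase."""
    out, i, lower = [], 0, text.lower()
    while i < len(lower):
        for latin, cyr in MULTI:  # multi-letter sequences take priority
            if lower.startswith(latin, i):
                out.append(cyr)
                i += len(latin)
                break
        else:
            out.append(SINGLE.get(lower[i], lower[i]))  # unknown chars pass through
            i += 1
    return "".join(out)


print(translit_to_cyrillic("vot tak narod ho4et napisat'"))  # вот так народ хочет написать
```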

English → Swahili | Google      | Microsoft       | Wiktionary                        | ...
Merry Christmas   | Krismasi!   | Krismasi Njema! | heri ya Krismasi / Krismasi njema | ...
eat apples        | kula mapera | kula apples     | ∅                                 | ...
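Disagreement between independent sources (and a missing entry, ∅) is itself a usable signal. A minimal sketch of one such hypothetical feature: the mean pairwise Jaccard similarity of the sources' token sets, with ∅ represented as an empty string and skipped.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def mean_pairwise_jaccard(outputs: list[str]) -> float:
    """Average token-set Jaccard similarity over all pairs of non-empty outputs."""
    sets = [s for s in (tokens(o) for o in outputs) if s]
    pairs = list(combinations(sets, 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


print(mean_pairwise_jaccard(["kula mapera", "kula apples", ""]))  # 1/3
print(mean_pairwise_jaccard(["Krismasi!", "Krismasi Njema!", "heri ya Krismasi"]))
```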

lexical signals
sygnały leksykalne

parse tree to sequence conversion
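The slide gives only the title. One common way to turn a parse tree into a sequence that a sequence model can consume is a depth-first, bracketed linearisation. A minimal sketch; the nested-tuple tree format is a hypothetical choice, and the deck does not say which conversion it used.

```python
from typing import Union

# A tree is either a word (str) or a tuple: (label, child, child, ...).
Tree = Union[str, tuple]


def linearize(tree: Tree) -> list[str]:
    """Depth-first, bracketed linearisation of a constituency tree."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    seq = [f"({label}"]
    for child in children:
        seq.extend(linearize(child))
    seq.append(")")
    return seq


tree = ("S", ("NP", ("DT", "The"), ("NN", "car")),
             ("VP", ("VBZ", "is"), ("VBG", "driving")))
print(" ".join(linearize(tree)))
# (S (NP (DT The ) (NN car ) ) (VP (VBZ is ) (VBG driving ) ) )
```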

50-99+% accuracy
Depends on the benchmark! ;-)
1000-10M rows
10-1000 features

Can we use parallel corpora?
source (Korean) | target (Basque) | score
받아 들여지는 사실 | Onartutako gertaerak | 1.0
조언 및 제안 | Aholkuak eta iradokizunak | 1.0
향후 계획에 대해 물어 | Etorkizuneko egitasmoei buruz galdetzea | 1.0
승인 요청 | onespena eskatzea | 1.0
도움을 요청 | Laguntza eskatzea | 1.0
사람을 요구하는 대기 | Jende galdetzea itxaron | 1.0
누군가의 의견을 물어 | Norbait iritzia eskatzea | 1.0
미래에 대한 태도 | Etorkizunari Garrantzia | 1.0
제공 정보 방지 | emanez informazio saihestea | 1.0
나쁜 사람들 | Bad pertsona | 1.0
… | … | …
영어 전문가 인 | Aditu batek ingelesez izatea | 1.0
존재 럭키 | Being Lucky | 1.0
오래 되 | zaharra izatea | 1.0
가난 | pobrea izatea | 1.0
안심되는 | ari irekietan | 1.0
부자가되는 | aberatsa izatea | 1.0
확인 인 / 특정 | Ziur izatea / zenbait | 1.0
걱정되는 | ari kezkaturik | 1.0
지루한! | Aspergarria! | 1.0
당신의 마음을 변경 | Your Mind aldatzeak | 1.0
사람을 응원합니다 | Pertsonak txaloak Up | 1.0
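Every pair above gets a score of 1.0 simply for being in the corpus, yet several targets contain untranslated English ("Being Lucky", "Bad pertsona", "Your Mind aldatzeak"). A minimal sketch of one filter signal; the tiny `ENGLISH_WORDS` list is hypothetical, and a real filter would use a full lexicon or a language-identification model.

```python
import re

# Hypothetical mini word list, for illustration only.
ENGLISH_WORDS = {"bad", "being", "lucky", "your", "mind", "up", "the", "is"}


def english_token_ratio(target: str) -> float:
    """Fraction of target tokens that look like untranslated English."""
    toks = re.findall(r"\w+", target.lower())
    if not toks:
        return 0.0
    return sum(t in ENGLISH_WORDS for t in toks) / len(toks)


for trg in ["Aholkuak eta iradokizunak", "Being Lucky", "Pertsonak txaloak Up"]:
    print(f"{english_token_ratio(trg):.2f}  {trg}")
# 0.00  Aholkuak eta iradokizunak
# 1.00  Being Lucky
# 0.33  Pertsonak txaloak Up
```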

What is our learning infra?
H2O.ai deeplearning

Why doesn’t deep learning work for translation?

Want to learn more?
The real experts
◉ Dr. Lucia Specia
◉ quest.dcs.shef.ac.uk
◉ statmt.org/wmt15/quality-estimation-task.html
ACL 2016 will be held in Berlin in August.
Reading

Any questions?
You can find me at
◉ @bittlingmayer
◉ adam@signaln.com
Thanks!
