Kantanfest: Dimitar Shterionov - Part 1

KantanNeural™ from A to Z
1/3: To NMT or not to NMT?
Dimitar Shterionov

The Rise of MT
[Figure: Quality of MT over time. Relative quality (vertical axis) against time (horizontal axis), with timeline marks at 1954, 1966, 1970, 1982, 1993, 2003, 2005, 2016 and 2020.]
31/07/2017 KantanFest, Dublin, Ireland

Breakthrough in Neural MT

Yet another MT paradigm?

Which technique is faster?
Which technique is better?
How can I integrate NMT in my pipeline?
How can I compare PBSMT and NMT?
How can I improve my NMT engine?
When to use PBSMT and when NMT?

Is NMT better than PBSMT???

Can NMT be better than PBSMT???

• Various empirical evaluations (since 2015) …
Scientific Rigour – NMT vs PBSMT

• Experiment Setup
  • Identical Training, Test and Tune Data
  • NMT training limited to 4 days
  • Evaluation:
    • Automated Scores: F-Measure, TER, BLEU
    • Ranking with KantanLQR™, A/B Testing
• Publications and Presentations
  • EAMT 2017
  • MT Summit 2017
  • LocWorld34 NMT GALA Track

• A small parenthesis…
There are so many factors:
  • Learning algorithm and rate
  • Number of epochs
  • ANN properties
  • Data: preprocessing, segmentation
You need the right data!

Training: Identical Corpora
Language Arc                  | Parallel Sentences | TWC         | UWC     | Domain(s)
English->German               | 8,820,562          | 110,150,238 | 859,167 | Legal/Medical
English->Chinese(Simplified)  | 6,522,064          | 84,426,931  | 956,864 | Legal/Technical
English->Japanese             | 8,545,366          | 87,252,129  | 676,244 | Legal/Technical
English->Italian              | 2,756,185          | 35,295,535  | 765,930 | Medical
English->Spanish              | 3,681,332          | 44,917,538  | 952,089 | Legal
TWC = total word count; UWC = unique word count.

Training: Automated Scores
Language Arc                  | SMT                              | NMT
                              | F-Measure  BLEU    TER     Time  | F-Measure  BLEU    TER     Perplexity  Time
English->German               | 62.00%     54.08%  54.31%  18h   | 62.53%     47.53%  53.41%  3.02        92h
English->Chinese(Simplified)  | 77.16%     45.36%  46.85%  6h    | 71.85%     39.39%  47.01%  2.00        10h
English->Japanese             | 80.04%     63.27%  43.77%  9h    | 69.51%     40.55%  49.46%  1.89        68h
English->Italian              | 69.74%     56.98%  42.54%  8h    | 64.88%     42.00%  48.73%  2.70        83h
English->Spanish              | 71.53%     54.78%  41.87%  9h    | 69.41%     49.24%  44.89%  2.59        71h
“In information theory, perplexity is a measurement of how well a
probability distribution or probability model predicts a sample. It may be
used to compare probability models. A low perplexity indicates the
probability distribution is good at predicting the sample.”
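
The quoted definition can be made concrete with a small calculation. The sketch below is illustrative only: the function name `perplexity` and the per-token probabilities are assumptions, not values taken from the engines in the table. It computes perplexity as the exponential of the average negative log-probability that a model assigns to the tokens of a sample.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sample, given the probability the model gave each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural logarithm).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities from two models on the same sample.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident_model))
print(perplexity(uncertain_model))
```

The model that assigns higher probability to each token gets the lower perplexity, which is why lower values indicate a better fit to the sample.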

[Chart: F-Measure (FM), BLEU and TER in percent (axis 0 to 90) for SMT and NMT, per language arc: English->German, English->Chinese(S), English->Japanese, English->Italian, English->Spanish. Series: SMT-FM, SMT-BLEU, SMT-TER, NMT-FM, NMT-BLEU, NMT-TER.]

Alternative translations
(BLEU score of each MT output in parentheses)

EN→DE
Source: All dossiers must be individually analysed by the ministry responsible for the economy and scientific policy.
Reference: Jeder Antrag wird von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik individuell geprüft.
PBSMT (BLEU 58%): Alle Unterlagen müssen einzeln analysiert werden von den Dienststellen des zuständigen Ministers für Wirtschaft und Wissenschaftspolitik.
NMT (BLEU 0%): Alle Unterlagen müssen von dem für die Volkswirtschaft und die wissenschaftliche Politik zuständigen Ministerium einzeln analysiert werden.

ES→EN
Source: En este punto muestro mi desacuerdo con el informe.
Reference: On this point, I am not in agreement with the report before us.
PBSMT (BLEU 72%): At this point, I am not in agreement with the report.
NMT (BLEU 7%): In this point I disagree with the report.

ES→EN
Source: Debemos apoyarles a todos para que alcancen este objetivo.
Reference: We must give them all our support to reach that goal.
PBSMT (BLEU 100%): We must give them all our support to reach that goal.
NMT (BLEU 0%): We have to support everyone to achieve this goal.
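
To see why a valid paraphrase can receive a BLEU score of zero, the following minimal sketch implements a simplified, unsmoothed sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, brevity penalty). It is an assumption-laden illustration, not the scorer used for the slides: the function names, the lowercasing and the whitespace tokenisation are all choices made here, so it is not expected to reproduce every score shown above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU without smoothing (illustrative only)."""
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if matches == 0 or total == 0:
            return 0.0  # without smoothing, one empty n-gram order zeroes the score
        log_precisions.append(math.log(matches / total))
    # Penalise hypotheses that are shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "We must give them all our support to reach that goal."
pbsmt = "We must give them all our support to reach that goal."
nmt = "We have to support everyone to achieve this goal."

print(sentence_bleu(pbsmt, reference))
print(sentence_bleu(nmt, reference))
```

On the third example, the PBSMT output matches the reference word for word and scores 1.0 under this sketch, while the NMT paraphrase shares no two-word sequence with the reference and scores 0.0, even though it conveys the same meaning.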

Ranking
Average Scores from A/B Testing (in percent)
      | EN→ZH-CN | EN→JA | EN→DE | EN→IT | EN→ES | AVERAGE
Same  | 37       | 21    | 13    | 24    | 10    | 21
SMT   | 24       | 21    | 34    | 19    | 28    | 25.2
NMT   | 39       | 58    | 53    | 56    | 62    | 53.6
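
The averages in the ranking chart can be checked with a few lines of arithmetic. The sketch below assumes the chart values read as three series (Same, SMT, NMT) over the five language arcs, a reading supported by each arc summing to roughly 100 percent; the names `ab_results` and `average_share` are illustrative.

```python
# A/B testing results read off the chart, in percent, per language arc.
ab_results = {
    "EN→ZH-CN": {"Same": 37, "SMT": 24, "NMT": 39},
    "EN→JA": {"Same": 21, "SMT": 21, "NMT": 58},
    "EN→DE": {"Same": 13, "SMT": 34, "NMT": 53},
    "EN→IT": {"Same": 24, "SMT": 19, "NMT": 56},
    "EN→ES": {"Same": 10, "SMT": 28, "NMT": 62},
}


def average_share(results, label):
    """Unweighted mean of one series across all language arcs."""
    return sum(row[label] for row in results.values()) / len(results)


for label in ("Same", "SMT", "NMT"):
    print(label, average_share(ab_results, label))
```

The unweighted means come out at 21, 25.2 and 53.6, matching the AVERAGE column: on average, evaluators preferred the NMT output in a little over half of the comparisons.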

BLEU underestimation of NMT
• Take the translations from the NMT engine considered better than their PBSMT counterparts.
• How many of those are scored by BLEU lower than their PBSMT counterparts?
• Do the same for the PBSMT translations.
       | EN→ZH-CN | EN→JA | EN→DE | EN→IT | EN→ES | Average
NMT    | 40%      | 59%   | 55%   | 34%   | 53%   | 48%
PBSMT  | 12%      | 0%    | 9%    | 9%    | 0%    | 6%
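
The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Segment` record, the function `underestimation_rate` and all sample values are hypothetical and are not the evaluation data behind the table.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    preferred: str     # human A/B verdict: "NMT", "PBSMT" or "Same"
    bleu_nmt: float    # BLEU of the NMT output for this segment
    bleu_pbsmt: float  # BLEU of the PBSMT output for this segment


def underestimation_rate(segments, system):
    """Share of segments where humans preferred `system` but BLEU scored it lower."""
    if system not in ("NMT", "PBSMT"):
        raise ValueError("system must be 'NMT' or 'PBSMT'")
    preferred = [s for s in segments if s.preferred == system]
    if not preferred:
        return 0.0
    if system == "NMT":
        lower = [s for s in preferred if s.bleu_nmt < s.bleu_pbsmt]
    else:
        lower = [s for s in preferred if s.bleu_pbsmt < s.bleu_nmt]
    return len(lower) / len(preferred)


# Hypothetical segments, for illustration only.
segments = [
    Segment("NMT", 0.07, 0.72),
    Segment("NMT", 0.00, 1.00),
    Segment("NMT", 0.60, 0.40),
    Segment("NMT", 0.55, 0.50),
    Segment("PBSMT", 0.30, 0.65),
    Segment("PBSMT", 0.40, 0.45),
    Segment("Same", 0.50, 0.50),
]

print(underestimation_rate(segments, "NMT"))
print(underestimation_rate(segments, "PBSMT"))
```

A high rate for one system and a low rate for the other, as in the table, suggests that BLEU disagrees with human judgement far more often when the preferred output comes from that system.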

Take-away messages…
• NMT is a new, efficient paradigm for MT
• NMT does not solve the problem of language … but it is getting there
• NMT can be much better than PBSMT
• Evaluating NMT:
  • BLEU, TER, F-Measure may underestimate NMT when compared to PBSMT
  • Using KantanLQR™ (A/B Testing) facilitates MT ranking
To NMT or not to NMT?

Quality Evaluation
Thank you…
