State of the Machine Translation by Intento (July 2018)


Evaluation of 19 major Cloud Machine Translation Engines (Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM SMT and NMT, Microsoft SMT and NMT, ModernMT, PROMT, SAP, SDL Language Cloud, Systran SMT and PNMT, Tencent, Yandex, Youdao) for 48 language pairs: pricing, performance, quality, and language coverage. We also analyse how the MT landscape changed over the last year.


  1. STATE OF THE MACHINE TRANSLATION by Intento, July 2018
  2. About
     July 2018 © Intento, Inc.
     - At Intento, we make Cloud Cognitive AI easy to discover, access, and evaluate for a specific use.
     - Evaluation is a pain for everyone: to compare different services, you have to sign a lot of contracts and integrate many APIs.
     - As we show in this report, the Machine Translation landscape is complex, with a 4x difference in quality and a 195x difference in price across pre-built models available from different vendors.
     - We deliver this overview report for FREE. To evaluate on your own dataset, reach us at hello@inten.to
  3. Intento MT Gateway: that’s how we run such evaluations
     - Vendor-agnostic API
     - Sync and async modes
     - CLI tools and SDKs
     - Works with files of any size
     - Much faster due to hyper-threading
     - Get your API key at inten.to
  4. Important highlights
     - Amazon and SAP went from preview to production.
     - Amazon, Baidu, IBM, Microsoft, and PROMT increased their language coverage.
     - For 7 language pairs, the available MT quality rose by 5% or more since March 2018: en-ko (▲25%), en-nl (▲11%), nl-en (▲14%), ru-de (▲8%), ja-fr (▲10%), en-cs (▲5%), en-tr (▲7%) (see slide 15).
     - For 13 language pairs, the best MT provider has changed since March 2018: en-zh, de-ru, ru-de, en-tr, en-pt, nl-en, en-nl, ja-en, zh-it, cs-en, en-cs, en-it, ru-en.
     - To get the best quality across 48 language pairs, one needs 9 engines (see slide 18).
  5. Overview
     1 TRANSLATION QUALITY
     2 PRICING
     3 LANGUAGE COVERAGE
     4 HISTORICAL PROGRESS
     5 CONCLUSIONS
     48 Language Pairs, 19 Machine Translation Engines
  6. Benchmark changes since March 2018
     - Added 3 engines: ModernMT*, Alibaba**, Youdao**
     - Updated to new versions: IBM (v3/NMT), Microsoft (v3/NMT)
     - Updated SAP*** and Amazon from preview to public
     - Added a detailed chart of best and optimal engines (slides 18-19)
     - Added a Pricing section (slide 21)
     * evaluated on one language pair (cost-prohibitive)
     ** not yet available outside China
     *** not evaluated (cost-prohibitive and unstable)
  7. Machine Translation Engines Evaluated*
     - Alibaba Cloud Machine Translation
     - Amazon Translate
     - Baidu Translate API
     - DeepL API
     - Google Cloud Translation API
     - GTCom YeeCloud MT
     - IBM Watson NMT Language Translator
     - IBM Watson SMT Language Translator
     - Microsoft NMT Translator Text API
     - Microsoft SMT Translator Text API
     - ModernMT API
     - PROMT Cloud API
     - SAP Translation Hub
     - SDL Language Cloud Translation Toolkit
     - Systran PNMT Enterprise Server
     - Systran REST Translation API
     - Tencent Cloud TMT API (preview)
     - Yandex Translate API
     - Youdao Cloud Translation API
     * We have evaluated general-purpose Cloud Machine Translation services with prebuilt translation models, provided via API. Some vendors also provide web-based, on-premise or custom MT engines, which may differ in every respect from what we evaluated.
  8. 1 Translation Quality
     1.1 Evaluation Methodology
     1.2 Available MT Quality
     1.3 Top-Performing Engines
     1.4 Best General-Purpose Engines
     1.5 Optimal General-Purpose Engines
     1.6 Price vs. Performance
  9. Evaluation methodology (I)
     - Translation quality is evaluated by computing the LEPOR score between reference translations and the MT output (slide 11).
     - Currently, our goal is to evaluate translation performance between the most popular languages (slide 12).
     - We use public datasets from StatMT/WMT, CASMACAT News Commentary and Tatoeba (slide 13).
     - We performed a LEPOR metric convergence analysis to identify the minimal viable number of segments in the dataset; see slide 14 for details.
  10. Evaluation methodology (II)
     We judge that the MT quality of service A is better than that of service B for language pair C if:
     - the mean LEPOR score of A is greater than that of B for pair C, and
     - the lower bound of A's 95% LEPOR confidence interval is greater than the upper bound of B's 95% LEPOR confidence interval for pair C.
     See slide 14 for an example.
     Different language pairs (and different datasets) differ in translation difficulty. To compare the overall MT performance of different services, we regularize LEPOR scores across all language pairs (see Appendix A for details).
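The two-part rule on slide 10 can be sketched as below. The slide does not say how the 95% confidence interval is obtained, so this sketch assumes sentence-level scores and a normal-approximation interval (mean ± 1.96 standard errors); a bootstrap interval would be an equally plausible choice. The score lists are hypothetical, and `ci95` / `is_better` are illustrative names, not part of any Intento tool.

```python
import math
from statistics import mean, stdev


def ci95(scores):
    """Approximate 95% CI of the mean sentence score (normal approximation, z = 1.96)."""
    half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
    m = mean(scores)
    return m - half_width, m + half_width


def is_better(scores_a, scores_b):
    """Two-part rule: A's mean is higher AND A's CI lower bound exceeds B's CI upper bound.

    For symmetric intervals the second condition already implies the first;
    both are kept to mirror the rule as stated on the slide.
    """
    low_a, _ = ci95(scores_a)
    _, high_b = ci95(scores_b)
    return mean(scores_a) > mean(scores_b) and low_a > high_b


# Hypothetical sentence-level scores for one language pair.
service_a = [0.70, 0.72, 0.71, 0.69, 0.73] * 20
service_b = [x - 0.10 for x in service_a]
print(is_better(service_a, service_b))  # True
print(is_better(service_b, service_a))  # False
```

Requiring non-overlapping intervals, not just a higher mean, means that two services whose means differ only by noise are treated as tied rather than ranked.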
  11. LEPOR score
     LEPOR: an automatic machine translation evaluation metric that considers an enhanced Length Penalty, an n-gram Position difference Penalty, and Recall.
     In our evaluation, we used hLEPOR v.3.1 (the best metric in the ACL-WMT 2013 contest):
     - https://www.slideshare.net/AaronHanLiFeng/lepor-an-augmented-machine-translation-evaluation-metric-thesis-ppt
     - https://github.com/aaronlifenghan/aaron-project-lepor
     LIKE BLEU, BUT BETTER
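To show how the three factors named on slide 11 combine, here is a deliberately simplified, sentence-level sketch in the spirit of hLEPOR. It is not the v.3.1 reference implementation linked above. It assumes whitespace tokenisation, unigram exact matching with a nearest-position alignment (hLEPOR's own alignment uses n-gram context), and equal weights for all factors (the weights `alpha`, `beta`, `w_lp`, `w_pos`, `w_hpr` are placeholders, not the published defaults).

```python
import math


def length_penalty(c, r):
    """LEPOR-style length penalty; c = candidate length, r = reference length (tokens)."""
    if c == r:
        return 1.0
    if c < r:
        return math.exp(1 - r / c)
    return math.exp(1 - c / r)


def align(cand, ref):
    """Simplified unigram alignment (assumption): match each candidate token to the
    closest still-unused identical reference token."""
    used, pairs = set(), []
    for i, tok in enumerate(cand):
        options = [j for j, rtok in enumerate(ref) if rtok == tok and j not in used]
        if options:
            j = min(options, key=lambda j: abs(j - i))
            used.add(j)
            pairs.append((i, j))
    return pairs


def hlepor_sketch(candidate, reference, alpha=1.0, beta=1.0,
                  w_lp=1.0, w_pos=1.0, w_hpr=1.0):
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    pairs = align(cand, ref)
    if not pairs:
        return 0.0
    lp = length_penalty(len(cand), len(ref))
    # Position difference of matched tokens, positions normalised by sentence length.
    npd = sum(abs((i + 1) / len(cand) - (j + 1) / len(ref)) for i, j in pairs) / len(cand)
    pos_penalty = math.exp(-npd)
    precision, recall = len(pairs) / len(cand), len(pairs) / len(ref)
    # Weighted harmonic mean of recall and precision.
    hpr = (alpha + beta) / (alpha / recall + beta / precision)
    # Weighted harmonic mean of the three factors.
    return (w_lp + w_pos + w_hpr) / (w_lp / lp + w_pos / pos_penalty + w_hpr / hpr)


print(hlepor_sketch("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(hlepor_sketch("sat the cat", "the cat sat"))
```

A harmonic mean lets a single weak factor (for example, heavy reordering) drag the whole score down, unlike an arithmetic average.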
  12. 48 Language Pairs
     Language groups by web popularity*:
     - P1: ≥ 2.0% of websites
     - P2: 0.5-2% of websites
     - P3: 0.1-0.3% of websites
     - P4: < 0.1% of websites
     We focus on en-P1, P1-en and (partially) P1-P1 pairs.
     [Matrix: the 48 evaluated pairs among en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl. Checkmarks per row: en 15, ru 5, ja 3, de 5, es 2, fr 4, pt 1, it 3, zh 3, and 1 each for cs, tr, fi, ro, ko, ar, nl.]
     * https://w3techs.com/technologies/overview/content_language/all
  13. Datasets
     - WMT-2013 (translation task, news domain): en-es, es-en
     - WMT-2015 (translation task, news domain): fr-en, en-fr
     - WMT-2016 (translation task, news domain): cs-en, en-cs, de-en, en-de, ro-en, en-ro, fi-en, en-fi, ru-en, en-ru, tr-en, en-tr
     - WMT-2017 (translation task, news domain): zh-en, en-zh
     - NewsCommentary-2011: en-ja, ja-en, en-pt, pt-en, en-it, it-en, ru-de, ru-es, ru-fr, ru-pt, ja-fr, de-ja, es-zh, fr-ru, fr-es, it-pt, zh-it, en-ar, ar-en, en-nl, nl-en, fr-de, de-fr, de-it, ja-zh, zh-ja
     - Tatoeba: en-ko, ko-en
  14. LEPOR Convergence
     We used 900-3000 sentences per language pair. The metric stabilizes, and adding more sentences from the same domain won’t change the outcome.
     [Charts: regularised hLEPOR scores vs. number of sentences, aggregated across all language pairs, plus examples for individual language pairs; legend: confidence interval, aggregated mean.]
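One way to run the convergence check from slide 14 is to grow the sample one sentence at a time and stop when the confidence interval of the mean is narrow enough. The sketch below assumes a normal-approximation 95% interval, a hypothetical `tolerance` on its half-width, and a minimum of 30 sentences; the slide does not state the stopping criterion Intento actually used.

```python
import math
from statistics import stdev


def ci_half_width(scores):
    """Half-width of an approximate 95% CI of the mean (normal approximation)."""
    return 1.96 * stdev(scores) / math.sqrt(len(scores))


def minimal_sample_size(scores, tolerance, min_n=30):
    """Smallest prefix length n whose CI half-width is below `tolerance`.

    Returns None if even the full dataset does not reach it. Recomputing the
    standard deviation for every prefix is O(n^2), fine for a few thousand sentences.
    """
    for n in range(min_n, len(scores) + 1):
        if ci_half_width(scores[:n]) < tolerance:
            return n
    return None


# Hypothetical sentence-level scores alternating between 0.6 and 0.7.
scores = [0.6, 0.7] * 1000
n = minimal_sample_size(scores, tolerance=0.005)
print(n)
```

Once the half-width is below the tolerance, extra sentences from the same domain mostly tighten the interval further rather than moving the mean, which is what "the metric stabilizes" describes.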
  15. Available MT Quality
     [Matrix over en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl. For each evaluated pair it shows the maximal available hLEPOR score (bands: >80%, 70%, 60%, 50%, 40%, <40%), the minimal price for this quality per 1M characters* ($$$: ≥ $20; $$: $10-15; $: < $10), and the number of top-performing MT providers**. The English row (en to ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl) lists 2, 6, 3, 6, 4, 5, 5, 4, 2, 3, 1, 2, 1, 2, 1 top-performing providers; the remaining cell values are not recoverable from the text.]
     * base pricing tier
     ** up to 5% worse than the leader; SMT and NMT counted separately
  16. Sample pair analysis: English-Chinese (based on the WMT-17 dataset)
     LEPOR score: providers (price range per 1M characters)
     - 71%: Tencent (preview)
     - 70%: Google, GTCom ($10-20)
     - 68%: Baidu ($7)
     - 66.5%: Systran PNMT, Amazon ($15-?)
     - 65%: Microsoft, IBM NMT ($10-21.4)
     BEST QUALITY: Tencent (preview)
     TOP 5%: Tencent, Google, GTCom, Baidu
     BEST PRICE IN TOP 5%: Baidu
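The best / top 5% / optimal selection for English-Chinese can be reproduced with a few lines. Reading "within 5%" as a relative margin (score ≥ 95% of the leader's) reproduces the slide's top-5% list: 71 × 0.95 = 67.45, which keeps Baidu at 68% and drops the 66.5% engines. Prices are an assumption: where the slide gives a range for a group of providers, each provider gets the lower bound of that range, and Tencent (preview) has no published price (`None`).

```python
# Scores (%) from the English-Chinese slide. Prices (USD per 1M characters) are the
# LOWER bound of each stated range, an assumption; None means no published price.
EN_ZH = {
    "Tencent (preview)": (71.0, None),
    "Google": (70.0, 10.0),
    "GTCom": (70.0, 10.0),
    "Baidu": (68.0, 7.0),
    "Systran PNMT": (66.5, 15.0),
    "Amazon": (66.5, 15.0),
    "Microsoft": (65.0, 10.0),
    "IBM NMT": (65.0, 10.0),
}


def classify(results, margin=0.05):
    """Return (best engines, engines within `margin` of the best, cheapest of those)."""
    best_score = max(score for score, _ in results.values())
    best = [name for name, (score, _) in results.items() if score == best_score]
    top = [name for name, (score, _) in results.items()
           if score >= best_score * (1 - margin)]
    # Only engines with a known price can be "optimal".
    priced = [name for name in top if results[name][1] is not None]
    optimal = min(priced, key=lambda name: results[name][1]) if priced else None
    return best, top, optimal


best, top, optimal = classify(EN_ZH)
print(best)     # ['Tencent (preview)']
print(top)      # ['Tencent (preview)', 'Google', 'GTCom', 'Baidu']
print(optimal)  # Baidu
```

With an absolute 5-point margin instead, the 66.5% engines would also qualify, which does not match the slide.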
  17. TOP Performing MT Providers across 48 language pairs*
     - best: provides the best MT quality for a language pair
     - top 5%: within 5% of the best available MT quality for a language pair
     - optimal: provides the lowest price among the top 5% MT engines for a language pair
     [Chart: per-provider counts (axis 0-50) for google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, tencent, ibm-smt, baidu, systran-pnmt, gtcom, msft-smt, sdl-smt, modernmt.]
  18. Best general-purpose MT engines
     [Matrix: the best engine for each evaluated pair among en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl; cell values not recoverable from the text.]
     MT engines: google, deepl, amazon, yandex, ibm-nmt, promt, msft-nmt, ibm-smt, tencent
  19. Optimal* general-purpose MT engines
     [Matrix: the optimal engine for each evaluated pair among en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl; cell values not recoverable from the text.]
     MT engines: msft-nmt, yandex, msft-smt, baidu, google, amazon, ibm-nmt, promt, ibm-smt
     * Cheapest with a performance within 5% of the best available for this language pair
  20. Price vs. Performance*
     [Chart: performance vs. affordability, with regions labelled ACCURATE, COST-EFFECTIVE and NOT PUBLIC; annotation "As of March 2018".]
     - Performance: regularized hLEPOR score aggregated across all language pairs in the dataset
     - Affordability = 1/price, using public volume-based pricing tiers
     - Legend: performance range (regularized average, max across all pairs, min across all pairs); price range
     * only production-ready engines shown
  21. 2 Public pricing
     [Table: public prices per provider, in USD per 1M symbols; values not recoverable from the text.]
     * +20% for some language pairs
     ** estimation based on 4.79 symbols per word
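The second pricing footnote implies a conversion from per-word prices to the table's unit. A minimal sketch, assuming the estimate is simply price per word × (1,000,000 / 4.79) and that the +20% surcharge is a flat multiplier; the $0.0001-per-word price is hypothetical.

```python
SYMBOLS_PER_WORD = 4.79  # average used for the estimate in the pricing footnote


def per_million_symbols(price_per_word):
    """Convert a per-word price (USD) to USD per 1M symbols."""
    words_per_million_symbols = 1_000_000 / SYMBOLS_PER_WORD
    return price_per_word * words_per_million_symbols


def with_pair_surcharge(price, surcharge=0.20):
    """Apply a flat surcharge, e.g. the +20% some vendors add for certain pairs."""
    return price * (1 + surcharge)


# Hypothetical vendor charging $0.0001 per word.
base = per_million_symbols(0.0001)
print(round(base, 2))                       # 20.88
print(round(with_pair_surcharge(base), 2))  # 25.05
```

Because the ratio of symbols to words varies by language, any per-word price converted this way is an estimate, which is why the footnote marks it as such.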
  22. 3 Language Coverage
     3.1 Supported and Unique per Provider
     3.2 Coverage by Language Popularity
  23. Supported and Unique Language Pairs
     Unique language pairs: supported exclusively by one provider.
     [Chart, log scale: total and unique supported language pairs for Google, Yandex, Microsoft NMT, Microsoft SMT, Baidu, Tencent, Systran, Systran PNMT, PROMT, SDL Language Cloud, Youdao, SAP, ModernMT, DeepL, IBM NMT, Amazon, IBM SMT, Alibaba, GTCom; per-provider values not recoverable from the text.]
  24. Language popularity
     Language groups by web popularity*:
     - P1 (≥ 2.0% of websites): en, ru, ja, de, es, fr, pt, it, zh
     - P2 (0.5-2% of websites): pl, fa, tr, nl, ko, cs, ar, vi, el, sv, in, ro, hu
     - P3 (0.1-0.3% of websites): da, sk, fi, th, bg, he, lt, uk, hr, no, nb, sr, ca, sl, lv, et
     - P4 (< 0.1% of websites): hi, az, bs, ms, is, mk, bn, eu, ka, sq, gl, mn, kk, hy, se, uz, kr, ur, ta, nn, af, be, si, my, br, ne, sw, km, fil, ml, pa, …
     Of 29,070 possible language pairs, 13,098 are supported by at least one provider.
     * https://w3techs.com/technologies/overview/content_language/all
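The pair counts on slide 24 follow from simple arithmetic on directed pairs: n languages give n × (n − 1) source-target combinations. 29,070 is exactly 171 × 170, which suggests the count is over 171 languages; that is an inference from the numbers, not something the slide states. The same figures give the 45% coverage quoted on the next slide.

```python
import math


def ordered_pairs(n_languages):
    """Number of directed translation pairs among n languages (source != target)."""
    return n_languages * (n_languages - 1)


def languages_for_pairs(n_pairs):
    """Invert n * (n - 1) = n_pairs; returns None if n_pairs is not of that form."""
    n = (1 + math.isqrt(1 + 4 * n_pairs)) // 2
    return n if ordered_pairs(n) == n_pairs else None


possible, supported = 29070, 13098
print(languages_for_pairs(possible))  # 171
print(f"{supported / possible:.1%}")  # 45.1%
```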
  25. Language coverage by popularity
     Supported pairs cover 45% of possible language pairs.
     [Matrix: coverage (%) for each combination of popularity groups P1-P4; cell values not recoverable from the text.]
  26. Language coverage by service provider
     [Chart covering: Google Cloud Translation API, Yandex Translate API, Microsoft Translator Text API (SMT), Microsoft Translator Text API (NMT), Baidu Translate API, Tencent Cloud TMT API (preview), Systran REST Translation API, Systran PNMT Enterprise Server, PROMT Cloud API, SDL Language Cloud Translation Toolkit, Youdao Cloud Translation API, SAP Translation Hub, ModernMT API, DeepL API, IBM Watson Language Translator (NMT), Amazon Translate, IBM Watson Language Translator (SMT), Alibaba Translate, GTCom YeeCloud MT.]
  27. 4 Historical Progress
     4.1 Number of Cloud MT Vendors
     4.2 MT Quality
     4.3 Performance/Price Efficiency
  28. Independent Cloud MT Vendors with pre-built models
     - Commercial: Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM, Microsoft, ModernMT, PROMT, SAP, SDL, Systran, Yandex, Youdao
     - Preview: Tencent
     [Chart: number of preview and commercial vendors (axis 0-16) in Jul 2017, Nov 2017, Mar 2018 and Jul 2018.]
  29. Best available MT Quality
     Number of language pairs available at each level of LEPOR quality, out of the 14 pairs we have evaluated since July 2017 (ru, de, cs, tr, fi, ro, zh to en and back).
     [Chart: LEPOR bands from 30% to 80% in Jul 2017, Nov 2017, Mar 2018 and Jul 2018, with the best and worst pair marked; per-band counts not recoverable from the text.]
  30. Best available Performance/Price Efficiency
     Efficiency = (hLEPOR in %)² / (USD per 1M symbols)
     Number of language pairs available at each level of efficiency, out of the 14 pairs we have evaluated since July 2017 (ru, de, cs, tr, fi, ro, zh to en and back).
     [Chart: efficiency from 100 to 900 in Jul 2017, Nov 2017, Mar 2018 and Jul 2018, with the best and worst pair marked; per-band counts not recoverable from the text.]
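The efficiency formula on slide 30 can be evaluated directly. As an illustration, the sketch below plugs in English-Chinese figures from slide 16. It treats "per 1M characters" (slide 16) and "per 1M symbols" (this formula) as the same unit, which is an assumption, and for Google it uses the lower bound of the quoted $10-20 range.

```python
def efficiency(hlepor_percent, usd_per_million_symbols):
    """Performance/price efficiency = (hLEPOR in %)^2 / (USD per 1M symbols)."""
    return hlepor_percent ** 2 / usd_per_million_symbols


# English-Chinese: Baidu at 68% and $7; Google at 70% with an assumed $10.
print(round(efficiency(68, 7), 1))  # 660.6
print(efficiency(70, 10))           # 490.0
```

Squaring the quality term makes efficiency reward quality more strongly than price: a 10% quality gain raises efficiency by 21%, as much as cutting the price by about 17%.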
  31. 5 Conclusions
     - Machine Translation quality and efficiency improve monthly but are still far from ideal, so a careful choice of MT is a must.
     - At the same time, the MT landscape is becoming more fragmented as the focus shifts from having the best algorithms to having the best data.
     - Even for the general domain, getting the best quality across 48 language pairs requires 9 engines used simultaneously.
  32. Custom version of this report
     - You may run the evaluation for your project using our vendor-agnostic API and command-line tools.
     - We can also help translate your corpus via multiple vendors, or handle the whole evaluation for your project.
     - Reach us at hello@inten.to
  33. Evaluate vendors on your own data with no effort
     - up to +230% quality and -87% price by choosing the right vendor
     - save 12 weeks of engineering and data science effort
     Manage and optimise your vendor portfolio with our smart routing AI
     - use the best vendor for each language pair and domain with no hassle
     Single integration and contract for multiple vendors and models
     - save 5-7 weeks upfront per vendor API
     - save 1 day per month per vendor API
     Intento Single API routes requests to the best models. Reach us for pricing and contract.
  34. STATE OF THE MACHINE TRANSLATION by Intento (https://inten.to), July 2018
     Konstantin Savenkov, ks@inten.to, (415) 429-0021
     2150 Shattuck Ave, Berkeley, CA 94705
  35. Appendix A
     Overall performance of the MT services across many language pairs is computed as follows:
     1. [Standardisation] We standardise LEPOR scores within each language pair (z-scores) and compute the mean standardised score for each provider.
     2. [Scale adjustment] We restore the original scale by multiplying each MT provider's mean z-score by the global LEPOR standard deviation and adding the global mean LEPOR score.
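A minimal sketch of one reading of Appendix A. Assumptions, all labelled here and in the comments: the population standard deviation is used, a provider missing from a pair simply contributes nothing for that pair, and a pair where all providers score the same contributes a z-score of 0. The input scores are hypothetical.

```python
from statistics import mean, pstdev


def regularized_scores(scores):
    """scores: {language_pair: {provider: LEPOR}} -> {provider: regularized LEPOR}."""
    # 1. Standardise within each language pair (z-scores); population std is an assumption.
    z = {}
    for by_provider in scores.values():
        values = list(by_provider.values())
        m, s = mean(values), pstdev(values)
        for provider, value in by_provider.items():
            z.setdefault(provider, []).append((value - m) / s if s else 0.0)
    # 2. Restore the original scale with the global mean and standard deviation.
    all_values = [v for by_provider in scores.values() for v in by_provider.values()]
    g_mean, g_std = mean(all_values), pstdev(all_values)
    return {provider: g_mean + g_std * mean(zs) for provider, zs in z.items()}


# Hypothetical scores: en-zh is "harder" than en-de for both providers.
example = {"en-de": {"A": 60, "B": 50}, "en-zh": {"A": 40, "B": 30}}
print(regularized_scores(example))
```

Standardising per pair first means a provider is not rewarded merely for having been evaluated on easier pairs; step 2 only maps the result back onto a familiar LEPOR-like scale.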