The Web, Database and Neural NMT Comparison


1) The document discusses Pangeanic's new neural machine translation (NMT) technology called PangeaMT Neural and ActivaTM and compares its performance to their previous statistical machine translation (SMT) system. 2) Experimental results on English to Japanese translation show that NMT outperforms SMT in BLEU, TER, and WER scores, especially for shorter sentences between 0-25 words, producing translations requiring less post-editing effort. 3) Additional testing of NMT on other language pairs like English to French and Russian also showed superior results compared to SMT, with translators rating 85-90% of NMT translations as good or very good quality versus only 50-60


The Web, The Database and
The Neural
Garth Hedenskog, Sales Director
Pangeanic TAUS Girona, 13 June 2017

• National research project CDTI
• Workflow system with built-in crawler
• PM-less tracking and workflow initiation
• Powerful tool incorporating Pangeanic’s new technology: ActivaTM and
PangeaMT Neural

ELASTIC CENTRALIZED TM SYSTEM
• FEATURES:
• CAT tool agnostic
• Cor integratable
• Hosting options
• Tag handling capabilities
• API to NMT
• Triangulation
…summary

Our story...
• First translation company in the world to make commercial use of
Moses.
• Won a post-editing contract in 2007 to work for the European
Commission as MT output post-editors.
THAT WAS THEN, THIS IS NOW
• Pangeanic’s consortium, along with KantanMT, Prompsit and Tilde,
was awarded the largest EU contract by CEF (Connecting Europe
Facility) to supply infrastructure services to the European Union in
the field of Digital Service Infrastructures, particularly machine
translation: IADAATPA (Intelligent, Automatic Domain Adapted
Automated Translation for Public Administrations).

Training corpus
     Sentences  Running words  Vocabulary
EN   4,6M       55,9M          491,6K
JA   4,6M       76,0M          283,8K

Dev corpus
     Sentences  Running words  OOVs
EN   1,9K       24,1K          1,32
JA   1,9K       32,7K          0,86

Test corpus
     Sentences  Running words  OOVs  Average length in characters  Average number of tokens
EN   2K         27,1K          1,80  77                            14,12
JA   2K         37,0K          1,14  59                            19,08

Training data:
• TAUS data for Electronics Computer Hardware (ECH) plus SOFT (IT) 4,6M sentences / 56M words (EN)
• EN and JA tokenized (tokenizer.perl and Mecab respectively)
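
The OOVs column in the corpus tables can be illustrated with a small sketch. The slides do not state the unit, so this assumes the figure is the percentage of running words in the dev or test set that never occur in the training data; the function name oov_rate and the toy token lists are hypothetical.

```python
def oov_rate(train_tokens, eval_tokens):
    """Percentage of evaluation tokens that do not occur in the training tokens.

    Assumption: OOV is counted over running words (tokens), not over
    distinct word types. The slides do not say which was used.
    """
    if not eval_tokens:
        raise ValueError("eval_tokens must not be empty")
    vocabulary = set(train_tokens)
    unknown = sum(1 for token in eval_tokens if token not in vocabulary)
    return 100.0 * unknown / len(eval_tokens)


# Toy data, already tokenized (the deck used tokenizer.perl for EN and Mecab for JA).
train = "press the power button to start the printer".split()
test = "press the reset button to restart the scanner".split()
print(f"OOV rate: {oov_rate(train, test):.2f}%")
```

Because the rate depends on how the text is split into tokens, it is only comparable between corpora that were tokenized the same way.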
Results EN->JP:
          BLEU   TER       WER
PangeaMT  43,25  0,493174  0,607223
NMT       44,53  0,422858  0,473214
Seemingly... not such a big difference
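
For readers unfamiliar with the WER column, here is a minimal sketch of word error rate in its usual textbook form: the word-level edit distance between the MT output and the reference, divided by the reference length. It is an illustration only; the slides do not say which tool produced the reported scores, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower is better for WER and TER, while higher is better for BLEU. TER additionally allows block shifts of word sequences as a single edit, which this sketch does not model.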

Length (words)  PangeaMT                     NMT
                BLEU   TER       WER         BLEU   TER       WER
0-10            44,00  0,42865   0,471268    40,59  0,39868   0,414078
11-15           42,80  0,46528   0,591708    46,00  0,353941  0,393642
16-20           41,08  0,485096  0,617126    43,43  0,392998  0,443898
21-25           39,95  0,491183  0,649891    42,04  0,407965  0,476323
26-30           39,08  0,539768  0,693745    39,86  0,461081  0,529578
31+             35,38  0,565217  0,713226    35,65  0,561833  0,630695
Results EN->JP by length:
• In shorter sentences (0-10 words), our SMT system scores better in BLEU, but TER and WER show that
NMT is better at the character and word level, which means less post-editing effort.
• In mid-length sentences (11-25 words), NMT always scores better in BLEU, WER and TER.
• In longer sentences (26+ words), NMT tends towards the same results as PangeaMT.
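
The per-length breakdown relies on splitting the test set into length buckets before scoring each bucket separately. A minimal sketch of that grouping step follows, assuming length is counted as whitespace-separated words on the English source side; the slides do not say how or on which side length was measured, and the names BUCKETS, bucket_label and group_by_length are hypothetical.

```python
from collections import defaultdict

# Bucket edges as shown on the slide; None marks the open-ended last bucket.
BUCKETS = [(0, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, None)]


def bucket_label(n_words: int) -> str:
    """Return the slide's bucket label for a sentence of n_words words."""
    for low, high in BUCKETS:
        if high is None:
            if n_words >= low:
                return f"{low}+"
        elif low <= n_words <= high:
            return f"{low}-{high}"
    raise ValueError(f"no bucket for length {n_words}")


def group_by_length(source_sentences):
    """Map each bucket label to the indices of the sentences that fall in it."""
    groups = defaultdict(list)
    for index, sentence in enumerate(source_sentences):
        groups[bucket_label(len(sentence.split()))].append(index)
    return dict(groups)


sentences = ["Turn off the device .", " ".join(["word"] * 12)]
print(group_by_length(sentences))
```

The returned indices can then be used to pick the matching hypotheses and references for each bucket and score them with the same metric tooling as the full test set.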

Tests in French, Italian, German and Spanish, plus RU and PT, point to a very strong preference for NMT (results available on our blog).
On average, from a set of 250 random sentences, around 85%-90% were rated good or very good (A or B). ES/PT/IT
results were similar to FR.
Evaluation: Translation companies and professional freelance translators
EN-DE set of 250 sentences
       NMT         SMT
A      132  53%    34   14%
B      98   39%    95   38%
C      14   6%     97   39%
D      6    2%     24   10%
Total  250         250

EN-FR set of 250 sentences
       NMT         SMT
A      150  60%    39   16%
B      76   30%    126  50%
C      21   8%     71   28%
D      3    1%     14   6%
Total  250         250

EN-RU set of 250 sentences
       NMT         SMT
A      128  51%    39   16%
B      84   34%    43   17%
C      22   9%     60   24%
D      16   6%     108  43%
Total  250         250

EN-JP set of 250 sentences
       NMT         SMT
A      83   33%    17   7%
B      71   28%    14   6%
C      56   22%    95   38%
D      40   16%    124  50%
Total  250         250
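
The "good or very good (A or B)" share can be recomputed directly from the rating counts in the four evaluation sets. The sketch below only re-does that arithmetic; the counts are copied from the slide, while the names RATINGS and good_share are hypothetical.

```python
# Rating counts per language pair and system, copied from the evaluation tables.
RATINGS = {
    "EN-DE": {"NMT": {"A": 132, "B": 98, "C": 14, "D": 6},
              "SMT": {"A": 34, "B": 95, "C": 97, "D": 24}},
    "EN-FR": {"NMT": {"A": 150, "B": 76, "C": 21, "D": 3},
              "SMT": {"A": 39, "B": 126, "C": 71, "D": 14}},
    "EN-RU": {"NMT": {"A": 128, "B": 84, "C": 22, "D": 16},
              "SMT": {"A": 39, "B": 43, "C": 60, "D": 108}},
    "EN-JP": {"NMT": {"A": 83, "B": 71, "C": 56, "D": 40},
              "SMT": {"A": 17, "B": 14, "C": 95, "D": 124}},
}


def good_share(counts):
    """Fraction of sentences rated A or B among all rated sentences."""
    return (counts["A"] + counts["B"]) / sum(counts.values())


for pair, systems in RATINGS.items():
    nmt = good_share(systems["NMT"])
    smt = good_share(systems["SMT"])
    print(f"{pair}: NMT {nmt:.1%} vs SMT {smt:.1%}")
```

From these counts, the NMT share of A or B ratings is about 92% for EN-DE, 90% for EN-FR and 85% for EN-RU, which matches the 85%-90% figure quoted for the European pairs. EN-JP reaches about 62%, against roughly 12% for SMT.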

Conclusion
• NN does not produce miracles yet, but the initial results are very exciting.
• The shift is remarkable in all languages, especially JP, which has leapt from
the usual average-to-bad results to pretty acceptable quality.
Thank you!
garth@pangeanic.com
