Pangeanic Taus Presentation 13.06.17


1) The document presents Pangeanic's new technologies, ActivaTM and PangeaMT Neural, the latter a neural machine translation (NMT) system, and compares NMT performance with the company's previous statistical machine translation (SMT) system. 2) Experimental results on English to Japanese translation show that NMT outperforms SMT in BLEU, TER, and WER overall, most clearly for sentences of 11-25 words, producing translations that require less post-editing effort. 3) Human evaluation of NMT on other language pairs, such as English to German, French and Russian, also showed superior results compared to SMT, with translators rating 85-90% of NMT translations as good or very good quality versus only 50-60

The Web, The Database and
The Neural
Garth Hedenskog, Sales Director
Pangeanic TAUS Girona, 13 June 2017

• National research project CDTI
• Workflow system with built-in crawler
• PM-less tracking and workflow initiation
• Powerful tool incorporating Pangeanic’s new technology:
ActivaTM and PangeaMT Neural

ELASTIC CENTRALIZED TM SYSTEM
• FEATURES:
• CAT tool agnostic
• Cor integratable
• Hosting options
• Tag handling capabilities
• API to NMT
• Triangulation
…summary

Our story……
• First translation company in the world to make commercial use of
Moses.
• Wins a post-editing contract in 2007 to work for the European
Commission as MT output post-editors.
THAT WAS THEN, THIS IS NOW
• Pangeanic’s consortium, along with KantanMT, Prompsit and Tilde,
was awarded the largest EU contract by CEF (Connecting Europe
Facility) to supply infrastructure services to the European Union in
the field of Digital Service Infrastructures, and particularly machine
translation: IADAATPA (Intelligent, Automatic Domain Adapted
Automated Translation for Public Administrations).

Training corpus
      Sentences   Running words   Vocabulary
EN    4,6M        55,9M           491,6K
JA    4,6M        76,0M           283,8K

Dev corpus
      Sentences   Running words   OOVs
EN    1,9K        24,1K           1,32
JA    1,9K        32,7K           0,86

Test corpus
      Sentences   Running words   OOVs   Average length in characters   Average number of tokens
EN    2K          27,1K           1,80   77                             14,12
JA    2K          37,0K           1,14   59                             19,08

Training data:
• TAUS data for Electronics Computer Hardware (ECH) plus SOFT (IT) 4,6M sentences / 56M words (EN)
• EN and JA tokenized (tokenizer.perl and Mecab respectively)
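
The corpus figures above (sentences, running words, vocabulary, OOVs) can be reproduced on any tokenized corpus with a few lines of code. The following is a minimal sketch, not Pangeanic's tooling: the function names are hypothetical, tokens are assumed to be separated by whitespace (as after tokenizer.perl or MeCab), and the OOV figure is assumed to be a percentage of running words, which the slides do not state.

```python
from collections import Counter


def corpus_stats(sentences):
    """Return (sentence count, running words, vocabulary size) for tokenized text."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return len(sentences), sum(counts.values()), len(counts)


def oov_rate(train_sentences, eval_sentences):
    """Percentage of running words in eval_sentences never seen in training.

    Assumption: OOVs are counted per running word, not per distinct word type.
    """
    vocab = {tok for s in train_sentences for tok in s.split()}
    eval_tokens = [tok for s in eval_sentences for tok in s.split()]
    if not eval_tokens:
        return 0.0
    unseen = sum(1 for tok in eval_tokens if tok not in vocab)
    return 100.0 * unseen / len(eval_tokens)


# Toy data for illustration only.
train = ["press the power button", "connect the usb cable"]
dev = ["press the reset button"]
print(corpus_stats(train))   # (2, 8, 7)
print(oov_rate(train, dev))  # 25.0
```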
Results EN->JP:
            BLEU    TER        WER
PangeaMT    43,25   0,493174   0,607223
NMT         44,53   0,422858   0,473214
Seemingly…. Not such a big difference
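
To make the size of the gap concrete, the sketch below computes WER under its usual definition (word-level edit distance divided by reference length) and the relative error reduction implied by the overall TER and WER scores reported above. The slides do not say which scoring tools were used, so this is an illustration under that assumption, and the function names are hypothetical.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] holds the edit distance between the processed part of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, new):
    """Percentage by which an error rate drops relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


# One word missing from a six-word reference.
print(wer("the cat sat on the mat", "the cat sat on mat"))

# Overall EN->JP scores from the slide (decimal commas written as points).
print(round(relative_reduction(0.493174, 0.422858), 1))  # TER
print(round(relative_reduction(0.607223, 0.473214), 1))  # WER
```

On the reported figures this gives a relative reduction of roughly 14% in TER and 22% in WER, while BLEU rises by 1,28 points.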

Results EN->JP by length:
               PangeaMT                        NMT
Length         BLEU    TER        WER          BLEU    TER        WER
0-10 words     44,00   0,42865    0,471268     40,59   0,39868    0,414078
11-15 words    42,80   0,46528    0,591708     46,00   0,353941   0,393642
16-20 words    41,08   0,485096   0,617126     43,43   0,392998   0,443898
21-25 words    39,95   0,491183   0,649891     42,04   0,407965   0,476323
26-30 words    39,08   0,539768   0,693745     39,86   0,461081   0,529578
31+ words      35,38   0,565217   0,713226     35,65   0,561833   0,630695
• In shorter sentences (0-10 words), our SMT system scores better in BLEU, but TER and WER show that,
at character and word level, NMT has better results, which means less post-editing effort.
• In sentences of 11-25 words, NMT always gets better results in BLEU, WER and TER.
• In longer sentences (26+ words), NMT tends to have the same results as PangeaMT.
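
A breakdown by length like the one above can be produced by grouping test sentences into buckets and scoring each bucket separately. The sketch below is an illustration, not the method used for the slides: the function names are hypothetical, and it assumes that length is measured in whitespace-separated words on the source (EN) side, which the slides do not specify.

```python
from collections import defaultdict

# Inclusive word-count ranges matching the slide; anything longer falls into "31+".
BUCKETS = [(0, 10), (11, 15), (16, 20), (21, 25), (26, 30)]


def length_bucket(n_words):
    """Return the label of the length range that contains n_words."""
    for low, high in BUCKETS:
        if low <= n_words <= high:
            return f"{low}-{high}"
    return "31+"


def split_by_length(triples):
    """Group (source, reference, hypothesis) triples by source length in words."""
    groups = defaultdict(list)
    for source, reference, hypothesis in triples:
        label = length_bucket(len(source.split()))
        groups[label].append((source, reference, hypothesis))
    return dict(groups)


# Toy data for illustration only.
triples = [
    ("press the button", "ref a", "hyp a"),
    (" ".join(["word"] * 12), "ref b", "hyp b"),
    (" ".join(["word"] * 40), "ref c", "hyp c"),
]
groups = split_by_length(triples)
print({label: len(items) for label, items in groups.items()})
# {'0-10': 1, '11-15': 1, '31+': 1}
```

Each group can then be passed to whatever BLEU, TER or WER scorer is in use, so that corpus-level metrics are recomputed per bucket rather than averaged from sentence-level scores.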

Tests in FR/IT/DE/ES (FIGS), RU and PT point to a very strong preference for NMT (results available in our blog).
On average, from a set of 250 random sentences, around 85%-90% were rated good or very good (A or B).
ES/PT/IT results were similar to FR.
Evaluation: translation companies and professional freelance translators
EN-DE set of 250 sentences
         NMT           SMT
A        132   53%     34    14%
B        98    39%     95    38%
C        14    6%      97    39%
D        6     2%      24    10%
Total    250           250

EN-FR set of 250 sentences
         NMT           SMT
A        150   60%     39    16%
B        76    30%     126   50%
C        21    8%      71    28%
D        3     1%      14    6%
Total    250           250

EN-RU set of 250 sentences
         NMT           SMT
A        128   51%     39    16%
B        84    34%     43    17%
C        22    9%      60    24%
D        16    6%      108   43%
Total    250           250

EN-JP set of 250 sentences
         NMT           SMT
A        83    33%     17    7%
B        71    28%     14    6%
C        56    22%     95    38%
D        40    16%     124   50%
Total    250           250
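
The share of sentences rated good or very good (A or B) follows directly from the counts in the four tables above. The sketch below recomputes it per language pair and system; the counts are copied from the slides, while the dictionary layout and function name are illustrative choices.

```python
# Counts of sentences rated A, B, C, D (in that order), copied from the slides.
RATINGS = {
    "EN-DE": {"NMT": [132, 98, 14, 6], "SMT": [34, 95, 97, 24]},
    "EN-FR": {"NMT": [150, 76, 21, 3], "SMT": [39, 126, 71, 14]},
    "EN-RU": {"NMT": [128, 84, 22, 16], "SMT": [39, 43, 60, 108]},
    "EN-JP": {"NMT": [83, 71, 56, 40], "SMT": [17, 14, 95, 124]},
}


def good_share(counts):
    """Percentage of sentences rated A or B out of all rated sentences."""
    return 100.0 * (counts[0] + counts[1]) / sum(counts)


for pair, systems in RATINGS.items():
    shares = {name: round(good_share(c), 1) for name, c in systems.items()}
    print(pair, shares)
# EN-DE {'NMT': 92.0, 'SMT': 51.6}
# EN-FR {'NMT': 90.4, 'SMT': 66.0}
# EN-RU {'NMT': 84.8, 'SMT': 32.8}
# EN-JP {'NMT': 61.6, 'SMT': 12.4}
```

The European pairs land at about 85% to 92% for NMT, while EN-JP reaches 61,6%, up from 12,4% with SMT.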

Conclusion
• NN does not produce miracles yet, but the initial results are very exciting.
• The shift is remarkable in all languages, especially JP, which has moved from
the usual average-to-bad results to pretty acceptable quality.
Thank you!
garth@pangeanic.com

