1) The document discusses Pangeanic's new technologies, the ActivaTM translation memory system and the PangeaMT Neural machine translation (NMT) engine, and compares NMT performance to the company's previous statistical machine translation (SMT) system.
2) Experimental results on English to Japanese translation show that NMT outperforms SMT in BLEU, TER, and WER scores, especially for sentences of 11-25 words. For sentences of 0-10 words SMT scores higher in BLEU, but NMT's lower TER and WER point to less post-editing effort.
3) Additional testing of NMT on other language pairs, such as English to French and English to Russian, also showed superior results compared to SMT, with translators rating 85-90% of NMT translations as good or very good quality versus only 50-60
Pangeanic Cor-ActivaTM-Neural machine translation, TAUS Tokyo 2017 (Manuel Herranz)
Presentation of Pangeanic language technologies resulting from EU and national R&D: Cor for web crawling and website translation, linked to the Elasticsearch-based ActivaTM and NeuralMT.
kerstin bier, localization world barcelona, manuel herranz, mt, pangeanic, sy... (Manuel Herranz)
Co-presentation by Kerstin Bier and Manuel Herranz at Localization World Barcelona 2011 on the achievements and progress of a customized PangeaMT engine at Sybase: initial machine translation implementation, machine translation customization for Sybase, use of the client's data for training, and productivity results.
Our statistical machine translation platform and hybrid features were presented at the European Commission offices in Luxembourg last Tuesday, 22nd September. It is one of the tools that the European Union will consider, among other commercial machine translation solutions, to help fulfil its mandate for CEF (Connecting Europe Facility). Pangeanic's CEO, Manuel Herranz, presented the current state of the art represented by PangeaMT version 3. Representatives from the EU were particularly interested in the solid data management features, the machine translation engine retraining routines, data cleaning, and automated engine training and creation. One of the key features of the new PangeaMT version is the ability to switch translation algorithms and use other systems, such as the rule-based Apertium and the statistical toolkit Thot, as well as the default Moses. It also accepts 3rd-party calls from other systems. Its powerful API can provide machine-translated output to requests from anywhere in the world, although the platform is designed for onsite use at translation companies and organizations. PangeaMT is also compatible with several popular translation formats, such as ttx, sdlxliff, memoq, memsource, and most xml-based Tikal formats.
MT best practices for price, speed AND quality, as well as Lexcelera’s machine translation case studies and services including training, integration, post-editing and hosted MT
Translation project management at the Universitat Autònoma de Barcelona (Manuel Herranz)
Description of how a translation company works, its departments and processes, taking www.pangeanic.es as an example. Description of roles, rules and workflow, with an emphasis on machine translation processes.
Translation project management - Universitat Autònoma de Barcelona (Manuel Herranz)
Presentation on the project management model in a translation company, using www.pangeanic.es as an example. Description of departments and processes.
Panelists: Yoshiyasu Yamakawa (Intel), JP Barraza (Systran), Konstantin Dranch (Memsource), David Koot (TAUS)
The focus of this session will be on predictions and risk management. What kinds of things can you predict, and how can you manage risks by analyzing your translation data or monitoring your productivity and quality? Tracking translation data in different cycles of the translation process (translation, post-editing, review, proof-reading) offers tremendous value when it comes to predicting future trends or making informed choices. What type of data can be valuable and what kind of predictions can we make using this data? How can we make more efficient use of already available data? How can we use this type of data to improve machine translation, automatic QA, error-recognition, sampling or quality estimation? How can academia and industry work together towards a common goal?
ANALYZING ARCHITECTURES FOR NEURAL MACHINE TRANSLATION USING LOW COMPUTATIONA... (ijnlc)
With recent developments in Natural Language Processing, a growing range of architectures is being used for Neural Machine Translation. Transformer architectures achieve state-of-the-art accuracy, but they are very computationally expensive to train, and not everyone has access to setups with high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results: transformers with more encoders and decoders took longer to train but achieved lower BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable for time-constrained settings.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates go to http://www.statmt.org/mosescore/
or follow us on Twitter - #MosesCore
SDL BeGlobal: The SDL Platform for Automated Translation (SDL Trados)
Post-edited machine translation is now becoming widely accepted as a skill and as an addition to the professional translator's toolkit. Here you can see why...
State of the Machine Translation by Intento, November 2017 (Konstantin Savenkov)
Evaluation of 11 major Machine Translation providers (Google, Microsoft, IBM, SAP, Yandex, SDL, Systran, Baidu, GTCom, PROMT, DeepL) for the 35 most popular language pairs: performance, quality, language coverage, and API update frequency.
New Breakthroughs in Machine Translation Technology (kantanmt)
Tony O’Dowd takes us through some of the most innovative technologies offered on the KantanMT.com platform which are helping a growing community of KantanMT users to develop and self-manage custom Machine Translation engines in the cloud.
Maxim Khalilov then illustrates bmmt’s journey with Machine Translation on KantanMT. He discusses what they have achieved so far in terms of MT engine development and showcases the value that his team is bringing to their growing international client base through the use of Machine Translation.
Pangeanic presentation at Elia Together Athens - Manuel Herranz
Our presentation at #Eliatogether in Athens was well received by many attendees. Will disintermediation be a force to reckon with in the translation industry, as it has been in the hotel and travel industries? What is the role of machine translation in all this? How does neural machine translation work?
Methods for Handling Terminology in Machine Translation (Kerstin Berns)
The talk presents the options, advantages and disadvantages of various MT solutions in the SDL Language Cloud. Of particular interest is so-called Adaptive MT, a special type of MT system that learns from continuous corrections or user-specific adjustments to MT suggestions, using the user's post-edits to optimize the engine. This technique will also continue to play an important role in neural machine translation at SDL.
Event: ETUG 2017, Nuremberg
Manuel Herranz presents Pangeanic's use case at TMS Inspiration Days: the application of MT at LSPs and Pangeanic's development case, unveiling the feature-rich PangeaMT SaaS Power, Pangeanic's v3.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, follow us on Twitter - #MosesCore
Real-time DirectTranslation System for Sinhala and Tamil Languages (Sheeyam Shellvacumar)
Presented my research on "Real-time DirectTranslation System for Sinhala and Tamil Languages" at the FedCSIS 2015 Research Conference, hosted by the University of Lodz, Poland, from 13 to 17 September 2015.
Working with MOSES and building high-quality MT systems is not for the faint-hearted. It requires a wide range of technical and linguistic knowledge that is often difficult to find and develop within organisations. Consequently, only the biggest organisations have the financial muscle to invest in MT and reap its rewards. This puts small-to-medium-sized organisations at a distinct disadvantage. KantanMT changes everything! KantanMT is a cloud-based implementation of MOSES which enables SMEs to embrace the advantages of MT quickly and economically. This presentation will demonstrate the KantanMT approach to rapid engine training and tuning, data analytics used to predict MT quality and create tiered pricing structures, and instantaneous engine deployment, all of which are driving the new MT Revolution!
Pangeanic Taus Presentation 13.06.17
1. The Web, The Database and The Neural
Garth Hedenskog, Sales Director
Pangeanic TAUS Girona, 13 June 2017
2. • National research project (CDTI)
• Workflow system with built-in crawler
• Tracking and workflow initiation without a project manager (PM-less)
• Powerful tool incorporating Pangeanic's new technologies: ActivaTM and PangeaMT Neural
3. ELASTIC CENTRALIZED TM SYSTEM
• FEATURES:
• CAT tool agnostic
• Integrates with Cor
• Hosting options
• Tag handling capabilities
• API to NMT
• Triangulation
…summary
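The slide lists "Triangulation" among the ActivaTM features without explaining it. In translation-memory practice the term usually means deriving a new language pair through a shared pivot language (for example EN-FR from EN-ES and ES-FR memories). The sketch below assumes that meaning and links segments only when their pivot sides match exactly after whitespace normalization; `triangulate`, `normalize` and the sample data are hypothetical and are not ActivaTM's API.

```python
from collections import defaultdict


def normalize(segment):
    """Collapse runs of whitespace so trivially different pivots still match."""
    return " ".join(segment.split())


def triangulate(src_pivot, pivot_tgt):
    """Derive source->target TM pairs through a shared pivot language.

    src_pivot: list of (source, pivot) segment pairs, e.g. EN-ES
    pivot_tgt: list of (pivot, target) segment pairs, e.g. ES-FR
    Only exact (whitespace-normalized) pivot matches are linked.
    """
    # Index the second memory by its pivot-side segment.
    targets_by_pivot = defaultdict(list)
    for pivot, tgt in pivot_tgt:
        targets_by_pivot[normalize(pivot)].append(tgt)

    pairs, seen = [], set()
    for src, pivot in src_pivot:
        for tgt in targets_by_pivot.get(normalize(pivot), []):
            if (src, tgt) not in seen:  # skip duplicate derived pairs
                seen.add((src, tgt))
                pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    en_es = [("Save the file", "Guarde el archivo"), ("Close", "Cerrar")]
    es_fr = [("Guarde  el archivo", "Enregistrez le fichier"), ("Abrir", "Ouvrir")]
    print(triangulate(en_es, es_fr))
```

Exact pivot matching keeps the derived pairs conservative; a production system would more likely allow fuzzy pivot matches and score the results, which this sketch does not attempt.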
4. Our story…
• First translation company in the world to make commercial use of Moses.
• Won a post-editing contract in 2007 to work for the European Commission as MT output post-editors.
THAT WAS THEN, THIS IS NOW
• Pangeanic's consortium, with KantanMT, Prompsit and Tilde, was awarded the largest EU contract by CEF (Connecting Europe Facility) to supply infrastructure services to the European Union in the field of Digital Service Infrastructures, particularly machine translation (IADAATPA: Intelligent, Automatic Domain Adapted Automated Translation for Public Administrations).
5. Training corpus
      Sentences   Running words   Vocabulary
EN    4,6M        55,9M           491,6K
JA    4,6M        76,0M           283,8K

Dev corpus
      Sentences   Running words   OOVs
EN    1,9K        24,1K           1,32
JA    1,9K        32,7K           0,86

Test corpus
      Sentences   Running words   OOVs   Avg. length (characters)   Avg. number of tokens
EN    2K          27,1K           1,80   77                         14,12
JA    2K          37,0K           1,14   59                         19,08
Training data:
• TAUS data for Electronics Computer Hardware (ECH) plus SOFT (IT): 4,6M sentences / 56M words (EN)
• EN and JA tokenized with tokenizer.perl and MeCab, respectively
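The dev and test tables report OOVs, but the slide does not say whether these figures are percentages or how they were counted. A common definition is the percentage of running tokens in a held-out set that never occur in the training vocabulary. The minimal sketch below assumes that definition and assumes the text is already tokenized and whitespace-separated (as tokenizer.perl output for EN or MeCab output for JA would be); `oov_rate` is a hypothetical helper.

```python
def oov_rate(train_sentences, test_sentences):
    """Percentage of test tokens not found in the training vocabulary.

    Assumes pre-tokenized sentences with tokens separated by whitespace.
    """
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return 100 * unseen / len(test_tokens)


if __name__ == "__main__":
    train = ["the printer is offline", "restart the printer"]
    test = ["the scanner is offline"]
    print(f"OOV rate: {oov_rate(train, test):.2f}%")
```

Counting running tokens (rather than distinct word types) weights frequent unseen words more heavily, which is closer to how OOVs affect a translation system in practice.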
Results EN->JP:
            BLEU    TER        WER
PangeaMT    43,25   0,493174   0,607223
NMT         44,53   0,422858   0,473214
Seemingly... not such a big difference.
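The slide does not say which BLEU implementation produced these scores, only how the text was tokenized. For orientation, here is a minimal corpus-level BLEU sketch following the standard definition: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty, on the same 0-100 scale as the slide. It assumes one reference per sentence, pre-tokenized input and no smoothing, so scores from established tools may differ slightly; `corpus_bleu` here is illustrative, not the tool Pangeanic used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU (0-100) for token-list hypotheses, one reference each."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the quick brown fox jumps".split()
    print(corpus_bleu(["the quick brown fox".split()], [ref]))
```

Because BLEU only rewards n-gram overlap, two systems can have similar BLEU while differing noticeably in edit-based metrics such as TER and WER, which is what the overall EN->JP figures above show.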
6. Results EN->JP by length:

Length     PangeaMT (SMT)                 NMT
(words)    BLEU    TER        WER         BLEU    TER        WER
0-10       44,00   0,42865    0,471268    40,59   0,39868    0,414078
11-15      42,80   0,46528    0,591708    46,00   0,353941   0,393642
16-20      41,08   0,485096   0,617126    43,43   0,392998   0,443898
21-25      39,95   0,491183   0,649891    42,04   0,407965   0,476323
26-30      39,08   0,539768   0,693745    39,86   0,461081   0,529578
31+        35,38   0,565217   0,713226    35,65   0,561833   0,630695
• In shorter sentences (0-10 words), our SMT system scores better in BLEU, but TER and WER show that NMT has better results at character and word level, which means less post-editing effort.
• In sentences of 11-25 words, NMT always gets better results in BLEU, WER and TER.
• In longer sentences (26+ words), NMT reaches BLEU scores similar to PangeaMT's, although its TER and WER remain lower.
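The per-length comparison can be reproduced in outline by grouping test sentences into the slide's length buckets and scoring each bucket separately. The sketch below computes WER, the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes the buckets are based on source-sentence length in words and that WER is aggregated per bucket as total edits over total reference words; the slide states neither, and `wer_by_length` and `bucket_label` are hypothetical helpers.

```python
from collections import defaultdict

# Length buckets from the slide, in source words; None means "no upper bound".
BUCKETS = [(0, 10, "0-10"), (11, 15, "11-15"), (16, 20, "16-20"),
           (21, 25, "21-25"), (26, 30, "26-30"), (31, None, "31+")]


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete a reference word
                         cur[j - 1] + 1,          # insert a hypothesis word
                         prev[j - 1] + (r != h))  # substitute (or match)
        prev = cur
    return prev[-1]


def bucket_label(n_words):
    """Map a sentence length to its bucket label."""
    for lo, hi, label in BUCKETS:
        if n_words >= lo and (hi is None or n_words <= hi):
            return label
    raise ValueError(f"no bucket for length {n_words}")


def wer_by_length(sources, hypotheses, references):
    """Corpus WER per source-length bucket: total edits / total reference words."""
    edits, ref_words = defaultdict(int), defaultdict(int)
    for src, hyp, ref in zip(sources, hypotheses, references):
        label = bucket_label(len(src))
        edits[label] += edit_distance(ref, hyp)
        ref_words[label] += len(ref)
    return {label: edits[label] / ref_words[label]
            for label in edits if ref_words[label] > 0}
```

Lower WER means fewer word edits are needed to turn the output into the reference, which is why the slide reads TER and WER as a proxy for post-editing effort.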
7. Tests in FIGS (French, Italian, German, Spanish), RU and PT point to a very strong preference for NMT (results available in our blog).
On average, from a set of 250 random sentences, around 85%-90% were rated good or very good (A or B). ES/PT/IT results were similar to FR.
Evaluation: translation companies and professional freelance translators
EN-DE set of 250 sentences
Grade   NMT          SMT
A       132 (53%)     34 (14%)
B        98 (39%)     95 (38%)
C        14 (6%)      97 (39%)
D         6 (2%)      24 (10%)
Total   250          250

EN-FR set of 250 sentences
Grade   NMT          SMT
A       150 (60%)     39 (16%)
B        76 (30%)    126 (50%)
C        21 (8%)      71 (28%)
D         3 (1%)      14 (6%)
Total   250          250

EN-RU set of 250 sentences
Grade   NMT          SMT
A       128 (51%)     39 (16%)
B        84 (34%)     43 (17%)
C        22 (9%)      60 (24%)
D        16 (6%)     108 (43%)
Total   250          250

EN-JP set of 250 sentences
Grade   NMT          SMT
A        83 (33%)     17 (7%)
B        71 (28%)     14 (6%)
C        56 (22%)     95 (38%)
D        40 (16%)    124 (50%)
Total   250          250
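Each percentage in these tables is a grade's count divided by the 250 evaluated sentences, and the "good or very good" figure is the combined A+B share. The small sketch below recomputes that share from the slide's counts; it only does arithmetic on the published numbers, and `GRADES` and `good_share` are names introduced here for illustration.

```python
# Counts of sentences per grade (A, B, C, D) copied from the slide.
GRADES = {
    "EN-DE": {"NMT": (132, 98, 14, 6), "SMT": (34, 95, 97, 24)},
    "EN-FR": {"NMT": (150, 76, 21, 3), "SMT": (39, 126, 71, 14)},
    "EN-RU": {"NMT": (128, 84, 22, 16), "SMT": (39, 43, 60, 108)},
    "EN-JP": {"NMT": (83, 71, 56, 40), "SMT": (17, 14, 95, 124)},
}


def good_share(counts):
    """Percentage of sentences graded A or B (good or very good)."""
    a, b, c, d = counts
    return 100 * (a + b) / (a + b + c + d)


if __name__ == "__main__":
    for pair, systems in GRADES.items():
        shares = ", ".join(f"{name} {good_share(c):.1f}%" for name, c in systems.items())
        print(f"{pair}: {shares}")
```

Run on the slide's counts, the A+B share for NMT is 92.0% (EN-DE), 90.4% (EN-FR), 84.8% (EN-RU) and 61.6% (EN-JP), against 51.6%, 66.0%, 32.8% and 12.4% for SMT.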
8. • Conclusion
• Neural networks do not produce miracles yet, but the initial results are very exciting.
• The shift is remarkable in all languages, especially JP, which has moved from the usual average-to-bad results to pretty acceptable quality: a great leap.
Thank you!
garth@pangeanic.com