The Web, The Database
and The Neural
Manuel Herranz, CEO
Pangeanic TAUS Tokyo, April 2017
What changes in EN-JP?
The Aim
After building thousands of MT systems for different purposes and clients,
we found shortcomings in several areas where existing tools were
“locked”, lacked innovation, or were too inflexible.
We needed systems that talked to each other, yet were independent.
This is the result of an EU research project (ActivaTM) and a national
project in Spain (Cor)
The Web
Cor
Eases estimation in any translation format (doc or web)
National research project with EU funding
Full platform
Used by Pangeanic, LSPs and third parties
CMS-agnostic: extracts text and converts it to XLIFF
(doc or web)
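As an illustration of the "extract text and convert to XLIFF" step, here is a minimal sketch using only the Python standard library. It is not Cor's implementation: the class `TextExtractor`, the function `segments_to_xliff`, the file name `page.html` and the one-text-node-per-segment rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text nodes from HTML, skipping script and style."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.segments.append(text)


def segments_to_xliff(segments, source_lang="en", target_lang="ja",
                      original="page.html"):
    """Wraps plain-text segments in a minimal XLIFF 1.2 document."""
    xliff = ET.Element("xliff", {
        "version": "1.2",
        "xmlns": "urn:oasis:names:tc:xliff:document:1.2",
    })
    file_el = ET.SubElement(xliff, "file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, "body")
    for i, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, "trans-unit", {"id": str(i)})
        ET.SubElement(unit, "source").text = text
    return ET.tostring(xliff, encoding="unicode")


# Hypothetical input page, for illustration only.
html = ("<html><head><style>p{}</style></head>"
        "<body><h1>Hello world</h1><p>Second segment</p></body></html>")
extractor = TextExtractor()
extractor.feed(html)
xliff_doc = segments_to_xliff(extractor.segments)
```

Because the extractor only sees markup, the same path would work for pages served by any CMS; a production system would also need sentence segmentation and inline-tag handling, which this sketch leaves out.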
The Web
Cor
Translates only selected sections of a website (in batches)
Detects new or removed content so that language versions can be updated
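One simple way to detect new or removed content is to compare fingerprints of the segments from two crawls of the same page. The sketch below assumes that approach; the slides do not say how Cor does it, and the names `fingerprint` and `diff_segments` are hypothetical.

```python
import hashlib


def fingerprint(segment):
    """Hashes a segment after collapsing whitespace, so that
    purely cosmetic spacing changes are not reported."""
    normalized = " ".join(segment.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_segments(old_segments, new_segments):
    """Returns (added, removed) segments between two versions."""
    old = {fingerprint(s): s for s in old_segments}
    new = {fingerprint(s): s for s in new_segments}
    added = [s for h, s in new.items() if h not in old]
    removed = [s for h, s in old.items() if h not in new]
    return added, removed


# Hypothetical page versions, for illustration only.
previous = ["Welcome", "Our products", "Contact us"]
current = ["Welcome", "Our new products", "Contact  us"]
added, removed = diff_segments(previous, current)
```

Only the segments in `added` would need to be sent for translation, and those in `removed` can be dropped from the other language versions.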
The Web
Eases estimation in any translation format (doc or web)
Documents, too.
The Database
ActivaTM
Elasticsearch-based
All language assets in one database, irrespective of
the tool that created them
Deep learning for tag handling
CAT-tool agnostic (solves interoperability issues)
Automatic fuzzy match repair.
More powerful (strict) fuzzy matching than traditional
CAT tools
Subsegment split
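To make "fuzzy matching" concrete, the sketch below scores a new sentence against translation memory entries and returns the best match above a threshold. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in; it is not ActivaTM's scoring method and does not use Elasticsearch. The function names, the 0.75 threshold and the sample entries are assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_score(a, b):
    """Word-level similarity in [0, 1]; 1.0 means identical token sequences."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def best_match(query, tm, threshold=0.75):
    """Returns (score, source, target) for the best TM entry at or above
    the threshold, or None if nothing is similar enough."""
    best = None
    for source, target in tm:
        score = fuzzy_score(query, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Hypothetical translation memory entries, for illustration only.
tm = [
    ("Press the power button", "電源ボタンを押します"),
    ("Open the battery cover", "バッテリーカバーを開けます"),
]
match = best_match("Press the power button twice", tm)
```

A stricter matcher would raise the threshold or penalise differences in numbers and tags; fuzzy match repair would then edit the retrieved target to cover the words that differ.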
The Database
Matrix (triangulate to create new language pairs)
Statistics on all segment units, words, domains
Remote access, API
Pre-filter prior to MT (TM+MT)
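Triangulation can be pictured as a join on a shared pivot language: if the same English segment has both a Spanish and a Japanese translation, a Spanish-Japanese pair can be derived from them. The following is a minimal sketch of that idea, assuming exact matches on the pivot segment; the function name `triangulate` and the sample data are hypothetical, and the slides do not describe how ActivaTM builds its matrix.

```python
def triangulate(pivot_to_a, pivot_to_b):
    """Joins two translation memories that share a pivot language.

    Each argument maps a pivot-language segment to its translation.
    Returns a list of (segment_a, segment_b) pairs for the new language pair.
    """
    pairs = []
    for pivot, seg_a in pivot_to_a.items():
        seg_b = pivot_to_b.get(pivot)
        if seg_b is not None:
            pairs.append((seg_a, seg_b))
    return pairs


# Hypothetical EN-ES and EN-JA memories, for illustration only.
en_es = {"Save file": "Guardar archivo", "Print": "Imprimir"}
en_ja = {"Save file": "ファイルを保存", "Cancel": "キャンセル"}
es_ja = triangulate(en_es, en_ja)
```

Pairs derived this way inherit any errors or ambiguity in the pivot segment, so they are usually worth reviewing before being used as training data.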
The Neural
Artificial Neural Networks for SMT
History of ANN-based Machine Translation and Language
Modelling for SMT:
1997 [Castaño & Casacuberta 97] (Jaume I &
U. Politécnica): Machine translation using neural
networks and finite-state models
(PangeaMT: https://www.prhlt.upv.es/wp/research-areas/mt-showcase)
2007 [Schwenk & Costa-jussà 07]: Smooth bilingual
n-gram translation.
2012 [Le & Allauzen 12, Schwenk 12]: Continuous
space translation models with neural networks.
2014 [Devlin & Zbib 14]: Fast and robust neural
networks for SMT
Conventional SMT
Use of statistics has been controversial in
computational linguistics:
Chomsky 1969: “... the notion ‘probability of a
sentence’ is an entirely useless one, under any
known interpretation of this term.”
Considered to be true by most experts in (rule-based)
natural language processing and artificial
intelligence
History of Statistical Approach to MT
1989-94: IBM’s pioneering work
since 1996: only a few teams favored SMT:
U.Politécnica Valencia, RWTH Aachen, HKUST,
CMU
2006/2007: Google Translate
2006-2012: Euromatrix
2009: PangeaMT
Training data:
TAUS data for Electronics Computer Hardware (ECH) plus SOFT (IT): 4.6M sentences / 56M words (EN)
EN and JA tokenized (tokenizer.perl and MeCab respectively)
The Neural
Seemingly… not such a big difference
Results EN->JA:
The Neural
BLEU: higher is better
TER: lower is better
WER: lower is better
BLEU: measures n-gram precision against the reference
TER: derived from the Levenshtein distance, working at the character level
WER: derived from the Levenshtein distance, working at the word level
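The three ideas behind these metrics can be sketched in a few lines of Python. This is a simplified illustration, not the scoring scripts used for the results: `ngram_precision` computes a single clipped n-gram precision (full BLEU combines several n-gram orders and a brevity penalty), and `char_error_rate` is a plain character-level edit rate as described on the slide (the published TER metric also allows block shifts, which are not modelled here). All function names are assumptions.

```python
from collections import Counter


def levenshtein(ref, hyp):
    """Minimum number of insertions, deletions and substitutions
    needed to turn sequence hyp into sequence ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hypothesis.split()) / len(ref)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


def ngram_precision(reference, hypothesis, n=2):
    """Share of hypothesis n-grams found in the reference, with counts
    clipped to the number of times each n-gram occurs in the reference."""
    ref_tok, hyp_tok = reference.split(), hypothesis.split()
    ref_counts = Counter(tuple(ref_tok[i:i + n])
                         for i in range(len(ref_tok) - n + 1))
    hyp_counts = Counter(tuple(hyp_tok[i:i + n])
                         for i in range(len(hyp_tok) - n + 1))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


# Hypothetical reference and MT output, for illustration only.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
scores = (wer(reference, hypothesis),
          char_error_rate(reference, hypothesis),
          ngram_precision(reference, hypothesis))
```

The example shows why the metrics can disagree: one missing word costs a single edit for the error rates, but it breaks two reference bigrams and so weighs more heavily on n-gram precision.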
Results EN->JA:
The Neural
Results EN->JA by length:
In short sentences (0-10 words), our SMT system scores better in BLEU, but TER and WER show that
NMT does better at the character and word level, which means less post-editing effort.
In medium sentences (11-25 words), NMT always scores better in BLEU, WER and TER.
In long sentences (26+ words), NMT tends to score the same as PangeaMT.
[Figure: BLEU, TER and WER by sentence length]
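A breakdown by length like the one above can be reproduced by grouping sentence-level scores into the three buckets (0-10, 11-25 and 26 or more words). The sketch below assumes whitespace-tokenized English source sentences and one score per sentence; the names `bucket_for` and `mean_score_by_length` and the sample scores are hypothetical.

```python
from collections import defaultdict

# Bucket name, minimum and maximum number of words (None = no upper limit).
BUCKETS = [("0-10", 0, 10), ("11-25", 11, 25), ("26+", 26, None)]


def bucket_for(sentence):
    """Returns the name of the length bucket a sentence falls into."""
    n_words = len(sentence.split())
    for name, low, high in BUCKETS:
        if n_words >= low and (high is None or n_words <= high):
            return name


def mean_score_by_length(scored_sentences):
    """Averages per-sentence scores within each length bucket.

    scored_sentences: iterable of (source_sentence, score) pairs.
    """
    groups = defaultdict(list)
    for sentence, score in scored_sentences:
        groups[bucket_for(sentence)].append(score)
    return {name: sum(values) / len(values) for name, values in groups.items()}


# Hypothetical per-sentence scores, for illustration only.
sample = [
    ("Press the button", 0.2),
    ("Open the cover", 0.4),
    (" ".join(["word"] * 12), 0.5),
]
averages = mean_score_by_length(sample)
```

Running the same grouping once per system and per metric gives a table that can be compared bucket by bucket.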
The Neural
A: Very good, perfect or very light post-editing
B: OK but needs light post-editing
C: Not good but some meaning can be understood.
D: Not good at all. Needs HT.
Do we need new metrics? BLEU
does not seem to correlate well
with the perception that NMT is
much better.
The Neural
Tests in F/I/G/S, RU, PT point to a very strong preference towards NMT (results to be published in May).
On average: from a set of 250 sentences, around 60%-65% were good or very good (A or B). ES/PT/IT results similar to FR
Evaluation: Translation companies and professional freelance translators
Questions
Is NMT scary? Almost there (as good as
human)?
Is it just a matter of time (data and connectors)
before NMT becomes ubiquitous?
Where will we be in 3 years, 5 years?
Do translation companies need to change their
business model and become something
else?
Thank you!
m.herranz@pangeanic.com
