Quality Evaluation
Laura Casanellas
Machine Translation vs Traditional Translation?
The conversation has moved on: PBSMT (phrase-based statistical MT) vs NMT (neural MT)
Quality evaluation in Production
Initial indicators – the engine is likely to produce good quality output when…
Rule of Thumb
• F-Measure: 70% or more
• BLEU: 60% or more
• TER: 40% or less
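As a rough illustration, the rule of thumb can be written as a simple check. This is a minimal sketch: the thresholds come from the slide, while the class and function names are hypothetical and not part of any KantanMT API.

```python
from dataclasses import dataclass


@dataclass
class AutomaticScores:
    """Automatic scores for one engine, all expressed in percent (assumed)."""
    f_measure: float  # higher is better
    bleu: float       # higher is better
    ter: float        # lower is better


def meets_rule_of_thumb(scores: AutomaticScores) -> bool:
    # Thresholds taken from the slide: F-Measure >= 70, BLEU >= 60, TER <= 40.
    return scores.f_measure >= 70.0 and scores.bleu >= 60.0 and scores.ter <= 40.0


# Illustrative values only, not measured results.
print(meets_rule_of_thumb(AutomaticScores(f_measure=72.0, bleu=61.5, ter=38.0)))
print(meets_rule_of_thumb(AutomaticScores(f_measure=72.0, bleu=55.0, ter=38.0)))
```

Passing this check is only an initial indicator; the deck goes on to validate such scores with human evaluation.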
Quality evaluation in Production - Neural
Initial indicators – the engine is likely to produce good quality output when…
Rule of Thumb
• Perplexity: Below 3
• Epochs: Between 5 and 9 (depending on Perplexity score)
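The slide does not say how the perplexity score is computed. The sketch below assumes the standard definition (the exponential of the mean negative log-probability per token) and applies the slide's thresholds; the function names are hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Standard perplexity: exp of the mean negative natural-log probability."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def neural_rule_of_thumb(ppl, epochs):
    # Thresholds from the slide: perplexity below 3, between 5 and 9 epochs.
    return ppl < 3.0 and 5 <= epochs <= 9


# Toy input: every token was predicted with probability 0.5,
# which corresponds to a perplexity of about 2.
log_probs = [math.log(0.5)] * 4
ppl = perplexity(log_probs)
print(ppl, neural_rule_of_thumb(ppl, epochs=7))
```

A perplexity of 2 means the model is, on average, as uncertain as if it were choosing between two equally likely tokens at each step.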
Quality evaluation – Human validation
• At the final stages of engine creation or re-training and
customisation
• To validate automatic scores and improve the engine further
• Iterative process: each iteration adds to the quality improvement
of an engine
Process:
• Engine meets automatic quality indicators (BLEU, TER, F-Measure, Perplexity…)
• Iteration #1
• Linguistic feedback: Adequacy, Fluency, Overall Quality, Error Classification (MQM)
• Linguistic feedback is included, engine is re-trained
• Iteration #2…
Different set of skills – Linguists of the “now”
Linguists who understand the related concepts and who also know how the different MT technologies work.
Human Evaluation with KantanLQR
• KantanLQR helps automate the process of harvesting linguistic
feedback
• Goal: improving engine performance
• Ranking – A/B Testing
• Quality Evaluation
• KPIs can be used to implement MQM Industry Standards
• Adequacy / Fluency / Overall Quality
• Error Classification
• Customisable KPIs; e.g.: Is this translation fit for publication?
• Post-editing
• Productivity
Reviewer’s interface
Quality evaluation – KantanLQR
Human Evaluation - Results
Downloadable report provides insights into the quality
performance of the engine.
Quality evaluation – Post-editing
Target segment appears in the Post Edit box to be reviewed or
post-edited.
Productivity!!!
Quality evaluation – Productivity
# seconds per segment
Evaluation Report
Columns: Row no, Source, Target, Translation, Comment, Reviewer, Email, Processed time (s)

Row 1
Source: We believe that women & girls can reach their full potential when systems are designed to include their experiences and voices, and existing patriarchal systems are disabled.
Target: We believe que las mujeres & girls pueden dirigirse a la totalidad de los sistemas de posibles cuando estén concebidos para incluir sus experiencias y voices, y existentes patriarchal sistemas se discapacidad.
Translation: Creemos que las mujeres y las niñas pueden alcanzar su potencial cuando los sistemas están concebidos para incluir sus experiencias y voc es, y cuando los sistemas patriarcales existentes han sido desabilitados.
Comment: (none)
Reviewer: KantanMT - Carlos
Email: sample@kantanmt.com
Processed time (s): 137

Row 2
Source: We believe that the identity of marginalized communities must be safeguarded.
Target: We believe que la identidad de marginalized Comunidades Europeas deben ofrecer protección jurídica.
Translation: Creemos que se debe salvaguardar la identidad de comunidades europeas marginadas.
Comment: (none)
Reviewer: KantanMT - Laura
Email: reviewer@kantanmt.com
Processed time (s): 51

Row 3
Source: Ensuring equal dignity for minorities helps to ensure just and equitable systems.
Target: Garantizar la igualdad de respeto de las minorías ello garantizar justo y equitativa sistemas.
Translation: Garantizar el respeto a las minorías ayuda a afianzar sistemas justos y equitativos.
Comment: (none)
Reviewer: KantanMT - Laura
Email: reviewer@kantanmt.com
Processed time (s): 147

Row 4
Source: We train law students and civil society on law as an invaluable advocacy tool.
Target: We tren derecho los estudiantes y de la sociedad civil en la Ley invaluable promoción como herramienta.
Translation: Formamos a estudiantes de derecho y a la sociedad civil acerca de la ley como herramienta de defensa de un valor incalculable .
Comment: (none)
Reviewer: KantanMT - Laura
Email: reviewer@kantanmt.com
Processed time (s): 138

Row 5
Source: A powerful story of how people can change, why we all must be willing to get to know our fellow human beings, and why echo chambers are so harmful.
Target: Un potentes story de modalidades de personas puede cambio, el motivo por el cual we todas deben estar dispuesto de la infraestructuras para acordar A conocer mi fellow la trata de personas, y motivo por el cual echo salas son nocivos.
Translation: Un potentes story de modalidades de personas puede cambio, el motivo por el cual we todas deben estar dispuesto de la infraestructuras para acordar A conocer mi fellow la trata de personas, y motivo por el cual echo salas son nocivos.
Comment: (none)
Reviewer: KantanMT - Carlos
Email: sample@kantanmt.com
Processed time (s): 12
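The processed times in a report like this can be turned into productivity figures. The sketch below assumes that words are counted by splitting the source text on whitespace and that the processed time is in seconds per segment; both are assumptions, and the function names are hypothetical rather than KantanLQR features.

```python
def seconds_per_segment(times):
    """Mean processing time per segment, in seconds."""
    return sum(times) / len(times)


def words_per_hour(source_word_counts, times):
    """Throughput: total source words divided by total time, scaled to one hour."""
    return sum(source_word_counts) * 3600 / sum(times)


# Source segments and processed times (s) from the five rows of the sample report.
segments = [
    ("We believe that women & girls can reach their full potential when systems "
     "are designed to include their experiences and voices, and existing "
     "patriarchal systems are disabled.", 137),
    ("We believe that the identity of marginalized communities must be "
     "safeguarded.", 51),
    ("Ensuring equal dignity for minorities helps to ensure just and equitable "
     "systems.", 147),
    ("We train law students and civil society on law as an invaluable advocacy "
     "tool.", 138),
    ("A powerful story of how people can change, why we all must be willing to "
     "get to know our fellow human beings, and why echo chambers are so "
     "harmful.", 12),
]

word_counts = [len(source.split()) for source, _ in segments]  # whitespace tokens
times = [seconds for _, seconds in segments]

print(f"{seconds_per_segment(times):.1f} seconds per segment")
print(f"{words_per_hour(word_counts, times):.0f} words per hour")
```

Five segments are far too few for a meaningful rate, and row 5 was left unedited after 12 seconds, so a real analysis would need more data and a rule for handling such outliers.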
Quality evaluation – KantanLQR
Ranking Translations – A/B Testing
Reviewer’s interface
Quality evaluation – KantanLQR
A/B Testing - Results
We use these results to generate detailed insights such as our recent study on
Neural Industry Trials
Quality evaluation – Neural
A/B Test
• 5 language pairs
• 200 segments
• 3 reviewers per language
Average Scores from A/B Testing
Language pair        Same   SMT    NMT
ENGLISH->CHINESE     37     24     37
ENGLISH->JAPANESE    21     21     58
ENGLISH->GERMAN      13     34     53
ENGLISH->ITALIAN     24     19     56
ENGLISH->SPANISH     10     28     62
ALL                  21     25.2   53.2
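The "ALL" figures can be reproduced as plain averages over the five language pairs. The sketch below assumes the chart values map to the Same, SMT and NMT series in legend order, which is a reading of the chart and not something the slide states.

```python
from statistics import mean

# (Same, SMT, NMT) per language pair, as read from the chart (assumed mapping).
ab_results = {
    "ENGLISH->CHINESE": (37, 24, 37),
    "ENGLISH->JAPANESE": (21, 21, 58),
    "ENGLISH->GERMAN": (13, 34, 53),
    "ENGLISH->ITALIAN": (24, 19, 56),
    "ENGLISH->SPANISH": (10, 28, 62),
}

series = ("Same", "SMT", "NMT")
# zip(*...) turns the per-language rows into one column per series.
averages = [round(mean(column), 1) for column in zip(*ab_results.values())]

for name, value in zip(series, averages):
    print(f"{name}: {value}")
```

Under that reading the averages match the chart's ALL values of 21, 25.2 and 53.2, which supports the mapping.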
Evaluators SMT Adequacy SMT Fluency NMT Adequacy NMT Fluency
Evaluator 1 - IT 3.78 3.76 3.66 4.14
Evaluator 2 - IT 3.96 4.02 4.24 4.52
Evaluator 3 - ES 3.94 3.64 4.2 4.26
Evaluator 4 - ES 4.16 3.98 4.48 4.38
[Charts: Fluency (SMT Fluency vs NMT Fluency) and Adequacy (SMT Adequacy vs NMT Adequacy) per evaluator, plotting the values in the table above.]
Quality evaluation – Neural
Adequacy and Fluency – Ongoing test
• 2 language pairs (ES and IT)
• 50 segments (from the same test set)
• 2 reviewers per language (in-house)
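Fluency scores are higher for Neural for all four evaluators; Adequacy scores are higher for Neural for three of the four (Evaluator 1 - IT rated SMT adequacy 3.78 vs NMT 3.66)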
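The per-evaluator scores can be summarised with a few lines of code. This is a minimal sketch using the four rows of the evaluator table; the variable names are illustrative.

```python
from statistics import mean

# (SMT adequacy, SMT fluency, NMT adequacy, NMT fluency) per evaluator,
# copied from the evaluator table.
scores = {
    "Evaluator 1 - IT": (3.78, 3.76, 3.66, 4.14),
    "Evaluator 2 - IT": (3.96, 4.02, 4.24, 4.52),
    "Evaluator 3 - ES": (3.94, 3.64, 4.20, 4.26),
    "Evaluator 4 - ES": (4.16, 3.98, 4.48, 4.38),
}

labels = ("SMT adequacy", "SMT fluency", "NMT adequacy", "NMT fluency")
column_means = {
    label: mean(column) for label, column in zip(labels, zip(*scores.values()))
}

# Evaluators for whom NMT scored lower than SMT on each criterion.
nmt_lower_adequacy = [n for n, (sa, _, na, _) in scores.items() if na < sa]
nmt_lower_fluency = [n for n, (_, sf, _, nf) in scores.items() if nf < sf]

for label, value in column_means.items():
    print(f"{label}: {value:.3f}")
print("NMT adequacy below SMT for:", nmt_lower_adequacy)
print("NMT fluency below SMT for:", nmt_lower_fluency)
```

With only four evaluators and 50 segments, differences of this size are indicative at best, which is consistent with the slide describing the test as ongoing.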
Quality evaluation – Neural
Productivity test – ongoing test
• 2 language pairs (ES and IT)
• 150 segments (from the same test set)
• 2 reviewers per language
Rates in words/hour
Evaluator           Translation Rate   SMT PE Rate   NMT PE Rate   Increase SMT to NMT
Evaluator 1 - IT    560                985           1367          39%
Evaluator 2 - IT    576                3404          3743          10%
Evaluator 3 - ES    323                1269          1427          13%
Evaluator 4 - ES    882                1708          1784          4%
Average increase from SMT to NMT: 17%
Average increase from translation to NMT: 196%
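The percentage increases can be recomputed from the chart values. The sketch below assumes the twelve chart numbers are the translation, SMT post-editing and NMT post-editing rates in evaluator order; that mapping is inferred from the chart, not stated on the slide.

```python
from statistics import mean


def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# (translation, SMT post-editing, NMT post-editing) rates in words/hour,
# read off the chart in evaluator order (assumed mapping).
rates = {
    "Evaluator 1 - IT": (560, 985, 1367),
    "Evaluator 2 - IT": (576, 3404, 3743),
    "Evaluator 3 - ES": (323, 1269, 1427),
    "Evaluator 4 - ES": (882, 1708, 1784),
}

smt_to_nmt = {n: pct_increase(smt, nmt) for n, (_, smt, nmt) in rates.items()}
translation_to_nmt = {n: pct_increase(tr, nmt) for n, (tr, _, nmt) in rates.items()}

for name in rates:
    print(f"{name}: SMT->NMT {smt_to_nmt[name]:.1f}%, "
          f"translation->NMT {translation_to_nmt[name]:.1f}%")
print(f"mean SMT->NMT: {mean(list(smt_to_nmt.values())):.1f}%")
```

Recomputed from these rounded rates, the SMT to NMT increases come out at about 39%, 10%, 12% and 4%, close to the slide's 39%, 10%, 13% and 4%; the small difference for the third evaluator presumably comes from rounding in the source data. For translation to NMT, the slide's 196% coincides with the mean over Evaluators 1, 3 and 4, which suggests, but does not confirm, that Evaluator 2 was left out of that average.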
Quality evaluation – Looking for the gap
What is the gap between SMT and NMT output?
Ongoing evaluations
• Error classification: What types of errors can be found in the NMT segments that are more fluent than SMT? Is this language dependent?
• Productivity: When SMT and NMT output are considered equal in quality, is one more productive than the other? If so, by how much? Is this language dependent?
• Terminology: What is NMT behaviour around terminology? Is there a pattern?
What kind of productivity increase can we expect in complex languages (DE, ZH, JP)?
Stay tuned, we will publish the results!
Thanks and let’s move to today’s challenge!
Laura Casanellas, KantanMT Product Management

Kantanfest: Laura Casanellas

  • 2. Machine Translation vs Traditional Translation? The conversation has moved on PBSMT vs NMT
  • 3. Quality evaluation in Production Initial indicators – the engine is likely to produce good quality output when… Rule of Thumb F-Measure: 70% or more BLEU: 60% or more TER: 40% or less
  • 4. Initial indicators – the engine is likely to produce good quality output when… Rule of Thumb Perplexity: Below 3 Epochs: Between 5 and 9 (depending on Perplexity score) Quality evaluation in Production - Neural
  • 5. Quality evaluation – Human validation • At the final stages of engine creation or re-training and customisation • To validate automatic scores and improve the engine further • Iterative process: each iteration adds to the quality improvement of an engine • #1 Iteration: engine meets automatic quality indicators (BLEU, TER, F-Measure, Perplexity…) • Linguistic Feedback: Adequacy, Fluency, Overall Quality, Error Classification (MQM) • #2 Iteration…: linguistic feedback is included, engine is re-trained • Different set of skills – Linguists of the “now”: linguists who understand related concepts but also who know how the different MT technologies work.
  • 6. Human Evaluation with KantanLQR • KantanLQR helps automate the process of harvesting linguistic feedback • Goal: improving engine performance • Ranking – A/B Testing • Quality Evaluation • KPIs can be used to implement MQM Industry Standards • Adequacy / Fluency / Overall Quality • Error Classification • Customisable KPIs; e.g.: Is this translation fit for publication? • Post-editing • Productivity Reviewer’s interface Quality evaluation – KantanLQR
  • 7. Quality evaluation – KantanLQR Human Evaluation - Results Downloadable report provides insights into the quality performance of the engine.
  • 8. Quality evaluation – Post-editing Target segment appears in the Post Edit box to be reviewed or post-edited. Productivity!!!
  • 9. Quality evaluation – Productivity # seconds per segment Evaluation Report
      Row no | Source | Target | Translation | Comment | Reviewer | Email | Processed time(s)
      1 | We believe that women & girls can reach their full potential when systems are designed to include their experiences and voices, and existing patriarchal systems are disabled. | We believe que las mujeres & girls pueden dirigirse a la totalidad de los sistemas de posibles cuando estén concebidos para incluir sus experiencias y voices, y existentes patriarchal sistemas se discapacidad. | Creemos que las mujeres y las niñas pueden alcanzar su potencial cuando los sistemas están concebidos para incluir sus experiencias y voces, y cuando los sistemas patriarcales existentes han sido desabilitados. | | KantanMT - Carlos | sample@kantanmt.com | 137
      2 | We believe that the identity of marginalized communities must be safeguarded. | We believe que la identidad de marginalized Comunidades Europeas deben ofrecer protección jurídica. | Creemos que se debe salvaguardar la identidad de comunidades europeas marginadas. | | KantanMT - Laura | reviewer@kantanmt.com | 51
      3 | Ensuring equal dignity for minorities helps to ensure just and equitable systems. | Garantizar la igualdad de respeto de las minorías ello garantizar justo y equitativa sistemas. | Garantizar el respeto a las minorías ayuda a afianzar sistemas justos y equitativos. | | KantanMT - Laura | reviewer@kantanmt.com | 147
      4 | We train law students and civil society on law as an invaluable advocacy tool. | We tren derecho los estudiantes y de la sociedad civil en la Ley invaluable promoción como herramienta. | Formamos a estudiantes de derecho y a la sociedad civil acerca de la ley como herramienta de defensa de un valor incalculable. | | KantanMT - Laura | reviewer@kantanmt.com | 138
      5 | A powerful story of how people can change, why we all must be willing to get to know our fellow human beings, and why echo chambers are so harmful. | Un potentes story de modalidades de personas puede cambio, el motivo por el cual we todas deben estar dispuesto de la infraestructuras para acordar A conocer mi fellow la trata de personas, y motivo por el cual echo salas son nocivos. | Un potentes story de modalidades de personas puede cambio, el motivo por el cual we todas deben estar dispuesto de la infraestructuras para acordar A conocer mi fellow la trata de personas, y motivo por el cual echo salas son nocivos. | | KantanMT - Carlos | sample@kantanmt.com | 12
  • 10. Quality evaluation – KantanLQR Ranking Translations – A/B Testing Reviewer’s interface
  • 11. Quality evaluation – KantanLQR A/B Testing - Results We use these results to generate detailed insights such as our recent study on Neural Industry Trials
  • 12. Quality evaluation – Neural A/B Test • 5 language pairs • 200 segments • 3 reviewers per language
      Average Scores from A/B Testing
      Language pair | Same | SMT | NMT
      ENGLISH->CHINESE | 37 | 24 | 37
      ENGLISH->JAPANESE | 21 | 21 | 58
      ENGLISH->GERMAN | 13 | 34 | 53
      ENGLISH->ITALIAN | 24 | 19 | 56
      ENGLISH->SPANISH | 10 | 28 | 62
      ALL | 21 | 25.2 | 53.2
  • 13. Quality evaluation – Neural Adequacy and Fluency – Ongoing test • 2 language pairs (ES and IT) • 50 segments (from the same test set) • 2 reviewers per language (in-house)
      Evaluators | SMT Adequacy | SMT Fluency | NMT Adequacy | NMT Fluency
      Evaluator 1 - IT | 3.78 | 3.76 | 3.66 | 4.14
      Evaluator 2 - IT | 3.96 | 4.02 | 4.24 | 4.52
      Evaluator 3 - ES | 3.94 | 3.64 | 4.2 | 4.26
      Evaluator 4 - ES | 4.16 | 3.98 | 4.48 | 4.38
      [Charts: Fluency (SMT Fluency vs NMT Fluency) and Adequacy (SMT Adequacy vs NMT Adequacy), per evaluator]
      Fluency and Adequacy scores are higher for Neural
  • 14. Quality evaluation – Neural Productivity test – ongoing test • 2 language pairs (ES and IT) • 150 segments (from the same test set) • 2 reviewers per language
      Evaluator | Translation Rate (words/hour) | SMT PE Rate (words/hour) | NMT PE Rate (words/hour) | Increase from SMT to NMT
      Evaluator 1 - IT | 560 | 985 | 1367 | 39%
      Evaluator 2 - IT | 576 | 3404 | 3743 | 10%
      Evaluator 3 - ES | 323 | 1269 | 1427 | 13%
      Evaluator 4 - ES | 882 | 1708 | 1784 | 4%
      Average increase from SMT to NMT: 17%
      Average increase from translation to NMT: 196%
  • 15. Quality evaluation – Looking for the gap: What is the gap between SMT and NMT output? Ongoing evaluations:
      Error classification: What types of errors can be found in the NMT segments that are more fluent than SMT? Is this language dependent?
      Productivity: When SMT and NMT output are considered equal in quality, is there one more productive than the other? If so, by how much? Is this language dependent?
      Terminology: What is NMT behaviour around terminology? Is there a pattern?
      Which type of productivity increase can we expect in complex languages (DE, ZH, JP)?
      Stay tuned, we will publish the results!
  • 16. Thanks and let’s move to today’s challenge! Laura Casanellas, KantanMT Product Management
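
The rule-of-thumb indicators on slides 3 and 4 can be read as a simple pre-check before human validation starts. The sketch below is only an illustration of that reading: the threshold values come from the slides, while the function and constant names are hypothetical and are not part of any KantanMT API. It also assumes that "Below 3" is a strict bound and that the epoch range is inclusive.

```python
# Thresholds copied from slides 3 and 4. All names below are hypothetical.
MIN_F_MEASURE = 70.0    # percent, "70% or more"
MIN_BLEU = 60.0         # percent, "60% or more"
MAX_TER = 40.0          # percent, "40% or less"
MAX_PERPLEXITY = 3.0    # "Below 3", assumed to be a strict bound
EPOCH_RANGE = (5, 9)    # "Between 5 and 9", assumed inclusive


def automatic_score_failures(f_measure, bleu, ter):
    """Return the names of the slide 3 indicators that miss the rule of thumb."""
    failures = []
    if f_measure < MIN_F_MEASURE:
        failures.append("F-Measure")
    if bleu < MIN_BLEU:
        failures.append("BLEU")
    if ter > MAX_TER:  # TER is an error rate, so lower is better
        failures.append("TER")
    return failures


def neural_training_failures(perplexity, epochs):
    """Return the names of the slide 4 (neural) indicators that miss the rule of thumb."""
    failures = []
    if perplexity >= MAX_PERPLEXITY:
        failures.append("Perplexity")
    low, high = EPOCH_RANGE
    if not low <= epochs <= high:
        failures.append("Epochs")
    return failures


if __name__ == "__main__":
    # Made-up scores, used only to show the call pattern.
    print(automatic_score_failures(f_measure=72.0, bleu=61.5, ter=38.0))
    print(neural_training_failures(perplexity=2.4, epochs=7))
```

An empty list means the engine passes the initial indicators and can move on to the human validation loop described on slide 5; a non-empty list names the scores to look at first.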

Editor's Notes

  1. A translation production line nowadays typically combines an MT component with human post-editing. Although the MT component is simply a means of getting a raw translation of the original text, which is then modified in the next step to meet certain translation quality standards, the choice of MT toolset affects the efficiency of this pipeline.
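
To make the efficiency point concrete, the throughput figures from slide 14 can be turned into relative gains. The sketch below assumes the words-per-hour values map to evaluators in the order the chart lists them (translation rate, SMT post-editing rate, NMT post-editing rate); the helper name `pct_increase` is hypothetical. Values computed this way may differ slightly from the rounded percentages shown on the slide.

```python
def pct_increase(baseline, improved):
    """Relative increase of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Words per hour from slide 14: (translation, SMT post-editing, NMT post-editing).
# The assignment of values to evaluators is read from the flattened chart data.
rates = {
    "Evaluator 1 - IT": (560, 985, 1367),
    "Evaluator 2 - IT": (576, 3404, 3743),
    "Evaluator 3 - ES": (323, 1269, 1427),
    "Evaluator 4 - ES": (882, 1708, 1784),
}

for name, (translation, smt_pe, nmt_pe) in rates.items():
    smt_to_nmt = pct_increase(smt_pe, nmt_pe)
    translation_to_nmt = pct_increase(translation, nmt_pe)
    print(f"{name}: SMT->NMT {smt_to_nmt:.1f}%, translation->NMT {translation_to_nmt:.1f}%")
```

Keeping the per-evaluator figures next to any average is useful here, because the gains vary widely between reviewers.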