What LSPs can do to support Post-Editors for addressing pain-points in NMT
Toru Shishido
Solutions Consultant, Human Science
Agenda
• Post-Edit Survey 2019
• Style Guides
• MTrans Post-Edit Booster
• Conclusion
Post-Edit Survey 2019
We asked 100 translators/reviewers about MT and PE (56 replied).
• Years of experience as a linguist
• Experience with post-editing
• Whether they like or dislike post-editing jobs
• Why they do not take post-editing jobs
• What improvements they expect in the future
Post-Edit Survey 2019 – Result 1
We asked 100 translators/reviewers about MT and PE (56 replied).
• Years of experience as a linguist
• Experience with post-editing
Have you ever performed PE tasks?
• Almost every day: 10.7%
• Several times a month: 30.4%
• Several times a year: 23.2%
• Not experienced, but interested: 5.4%
• Always reject: 21.4%
• Others: 8.9%
How long have you worked as a freelance translator/reviewer?
• Less than a year: 5.4%
• 1-3 years: 14.3%
• 4-5 years: 10.7%
• 6-9 years: 10.7%
• 10-14 years: 19.6%
• 15 years or more: 39.3%
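Since the charts give only percentages, it can help to convert them back to respondent counts. A minimal sketch, assuming every percentage is a share of the 56 replies (the function name is ours, not part of the survey):

```python
# Convert survey percentages back to respondent counts.
# Assumption: each percentage is a share of the 56 replies.
RESPONDENTS = 56

def to_counts(percentages, total=RESPONDENTS):
    """Return the nearest whole number of respondents for each percentage."""
    return [round(total * p / 100) for p in percentages]

pe_experience = [10.7, 30.4, 23.2, 5.4, 21.4, 8.9]   # "Have you ever performed PE tasks?"
career_length = [5.4, 14.3, 10.7, 10.7, 19.6, 39.3]  # "How long have you worked ...?"

print(to_counts(pe_experience))   # [6, 17, 13, 3, 12, 5]
print(to_counts(career_length))   # [3, 8, 6, 6, 11, 22]
```

Both lists add up to 56, which is consistent with the percentages being rounded shares of the 56 replies.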
Post-Edit Survey 2019 – Result 2
We asked 100 translators/reviewers about MT and PE (56 replied).
• Whether they like or dislike post-editing jobs
• Why they do not take post-editing jobs
Are you satisfied with PE tasks compared to
translation/review tasks?
Scale: Very satisfied / Not satisfied at all
Why don’t you accept PE tasks?
Never been asked
Translation skills might decline
Most outputs are not accurate
Most outputs are not fluent
Incorrect terminology
Incorrect style
Inconsistent translations
Cheaper word rate
I hate MT
15 (75%)
Post-Edit Survey 2019 – Result 3
We asked 100 translators/reviewers about MT and PE (56 replied).
• What improvements they expect in the future
What improvements do you expect in MT?
Choose up to two options.
Accuracy
Fluency
Correct terminology
Correct style
Consistent translations
Improvement on numeric errors
Correct tags
Perceptions from Post-Edit Survey 2019
Feedback on MT output from our translators/reviewers:
• Rather than improvements to machine translation itself, I hope MT engines will support the minor style differences that depend on the end client.
• As a translator, I would be very happy if stylistic/cosmetic errors were corrected automatically when I open the file.
• If the error patterns are identified, post-editing is often easier than reviewing bad human translations. I'm fed up with reviewing bad HT, but I can forgive mistranslations in MT. However, I wonder why MT output contains style and numeric errors. Those should be a machine's strong points.
Pain points of Post-Edit – from our Clients
Our clients also mentioned style errors:
• Double quotation marks are always double-byte characters in translations. Ideally, double quotation marks should always be changed to Japanese brackets 「」 in translation.
• Double-byte parentheses should not be used; they must be replaced with single-byte ones, with a space inserted before the opening parenthesis and after the closing parenthesis.
• Double-byte slashes should not be used; they must be replaced with single-byte ones.
• Sometimes the word "default" is translated as "既定", but it should always be "デフォルト".
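Client requests like these are mechanical enough to express as find-and-replace rules. The sketch below is only an illustration: the function name and the exact patterns are our assumptions, and real segments may need extra care around tags and punctuation.

```python
import re

def apply_client_rules(text: str) -> str:
    """Apply the client's style requests to one MT output segment (illustrative)."""
    # Double-byte double quotation marks -> Japanese corner brackets.
    text = re.sub(r"“([^”]*)”", r"「\1」", text)
    # Double-byte parentheses -> single-byte, with one space outside each one.
    text = re.sub(r"\s*（([^）]*)）\s*", r" (\1) ", text)
    # Double-byte slash -> single-byte slash.
    text = text.replace("／", "/")
    # Terminology: always use デフォルト for "default".
    text = text.replace("既定", "デフォルト")
    return text.strip()

print(apply_client_rules("“既定”の値（初期値）を使う／変更する"))
# 「デフォルト」の値 (初期値) を使う/変更する
```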
Localization style guides - Microsoft
According to Microsoft, Microsoft (Localization) Style Guides are collections of rules that define language and style conventions for specific languages. These rules usually include general localization guidelines, information on language style and usage in technical publications, and information on market-specific data formats.
109 localization style guides are available from Microsoft.
Why does style matter?
Without a style guide:
Linguists cannot decide which style to apply, so the content ends up with more inconsistent translations.
Inconsistent translations:
For clients, inconsistent translations may damage the brand, and standardizing them can take a lot of time.
Links will not work:
https://www.science.co.jp
https://www.science.co.jp
Two character sets: Single/Double Byte Characters
Japanese text uses two sets of characters, single-byte (half-width) and double-byte (full-width), and the difference is a constant nuisance for translators.
Single-byte character sets
• Punctuation/symbol characters: ? ! ; : - () [] <> ' " # % @ * /
• Alphabet characters: ABCDEFG, ｱｲｳｴｵｶｷｸｹｺ (katakana)
• Numeric characters: 0123456789
Double-byte character sets
• Punctuation/symbol characters: ？ ！ ； ： － （） ［］ ＜＞ ’ ” ＃ ％ ＠ ＊ ／
• Alphabet characters: ＡＢＣＤＥＦＧ, アイウエオカキクケコ (katakana)
• Numeric characters: ０１２３４５６７８９
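In Unicode, the double-byte letters, digits and symbols above sit in the range U+FF01 to U+FF5E, exactly 0xFEE0 above their single-byte counterparts, so the conversion can be sketched with a translation table. The function names here are ours, for illustration only.

```python
# Map full-width ASCII variants (U+FF01..U+FF5E) to their half-width forms,
# plus the ideographic space (U+3000) to an ordinary space.
FULL_TO_HALF = {code: code - 0xFEE0 for code in range(0xFF01, 0xFF5F)}
FULL_TO_HALF[0x3000] = 0x20

def to_halfwidth(text: str) -> str:
    """Convert full-width letters, digits and symbols; leave kana and kanji alone."""
    return text.translate(FULL_TO_HALF)

def find_fullwidth(text: str) -> list:
    """List the full-width ASCII variants found in text, in order of appearance."""
    return [ch for ch in text if ord(ch) in FULL_TO_HALF]

print(to_halfwidth("ＡＢＣ（１２３）／ユーザー"))  # ABC(123)/ユーザー
print(find_fullwidth("ＵＩ設定"))                  # ['Ｕ', 'Ｉ']
```

Python's `unicodedata.normalize("NFKC", text)` performs a similar fold, but it also turns half-width katakana into full-width katakana and rewrites other compatibility characters, so the narrower table above may be safer when only letters, digits and symbols should change.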
Fun Fact: There are 24 translation patterns for the term "User Interface", and all of them are grammatically correct!
ユーザーインターフェース ユーザーインタフェース ユーザーインターフェイス ユーザーインタフェイス
ユーザインターフェース ユーザインタフェース ユーザインターフェイス ユーザインタフェイス
ユーザー▲インターフェース ユーザー▲インタフェース ユーザー▲インターフェイス ユーザー▲インタフェイス
ユーザ▲インターフェース ユーザ▲インタフェース ユーザ▲インターフェイス ユーザ▲インタフェイス
ユーザー・インターフェース ユーザー・インタフェース ユーザー・インターフェイス ユーザー・インタフェイス
ユーザ・インターフェース ユーザ・インタフェース ユーザ・インターフェイス ユーザ・インタフェイス
▲ stands for a single-byte space.
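The 24 patterns are simply every combination of three independent choices: long vowel or not in "user" (2), separator (3), and spelling of "interface" (4). A small sketch that enumerates them, using a real single-byte space where the slide shows ▲:

```python
from itertools import product

users = ["ユーザー", "ユーザ"]        # with / without the long-vowel mark
separators = ["", " ", "・"]         # none, single-byte space, middle dot
interfaces = ["インターフェース", "インタフェース",
              "インターフェイス", "インタフェイス"]

# Every combination of the three choices: 2 x 3 x 4 = 24 spellings.
variants = [u + sep + i for u, sep, i in product(users, separators, interfaces)]

print(len(variants))  # 24
```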
Following rules in style guides (1)
Most companies have their own style guides, and the rules differ slightly in areas such as spacing, brackets, and long vowels (cho-on).
Spacing rules: Katakana words (en: User interface)
• Company A … ja: ユーザー▲インターフェイス
• Company B … ja: ユーザインタフェース
• Company C … ja: ユーザー・インターフェイス
Spacing rules: Between single-byte and double-byte characters (en: From April 17 to 18)
• Company A … ja: 4▲月▲17▲日~▲18▲日
• Company B … ja: 4月17日~4月18日
• Company C … ja: 4▲月▲17▲日~18▲日
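A spacing rule of this kind can be sketched with zero-width regex lookarounds that find each boundary between half-width alphanumerics and Japanese characters. The character ranges and the function name are our assumptions; on the date example the output follows the Company C pattern (a space at each boundary, none around the tilde).

```python
import re

# Japanese characters for this sketch: hiragana, katakana, and common kanji.
JA = r"\u3040-\u30ff\u4e00-\u9fff"
HALF = r"0-9A-Za-z"

# A boundary is a position with half-width on one side and Japanese on the other.
BOUNDARY = re.compile(
    rf"(?<=[{HALF}])(?=[{JA}])|(?<=[{JA}])(?=[{HALF}])"
)

def space_between_widths(text: str) -> str:
    """Insert a single-byte space wherever half-width alphanumerics meet Japanese text."""
    return BOUNDARY.sub(" ", text)

print(space_between_widths("4月17日~18日"))  # 4 月 17 日~18 日
```

A guide like Company B's, which wants no spaces at all, would need the opposite rule: remove a space found at the same boundaries.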
Following rules in style guides (2)
Most companies have their own style guides, and the rules differ slightly in areas such as spacing, brackets, and long vowels (cho-on).
Brackets
• Company A … Use [ ] (single-byte) for user interface terms. Use 『 』 for book titles and 「 」 for chapter/section titles.
• Company B … Use ［ ］ (double-byte) for user interface terms. Use 『 』 for book, chapter and section titles.
• Company C … Use 「 」 (double-byte) for user interface terms. Use 『 』 for book, chapter and section titles.
Long vowels (cho-on)
• Company A … User: ユーザー, Printer: プリンター, Programmer: プログラマー
• Company B (depends on the number of syllables) … User: ユーザー, Printer: プリンター, Programmer: プログラマ
• Company C … User: ユーザ, Printer: プリンタ, Programmer: プログラマ
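Long-vowel rules are easy to get wrong with a plain find-and-replace: replacing ユーザ with ユーザー also hits words that already end in ー and produces ユーザーー. A hedged sketch of a Company A style rule (always add the long vowel) that avoids this with a negative lookahead; the stem list is hypothetical:

```python
import re

# Hypothetical word stems covered by an "always add the long vowel" guide.
STEMS = ["ユーザ", "プリンタ", "プログラマ"]

# Match a stem only when it is NOT already followed by the long-vowel mark.
MISSING_CHOON = re.compile("(" + "|".join(map(re.escape, STEMS)) + ")(?!ー)")

def add_choon(text: str) -> str:
    """Append ー to listed stems that lack it; leave correct forms untouched."""
    return MISSING_CHOON.sub(r"\1ー", text)

print(add_choon("ユーザとプリンター"))  # ユーザーとプリンター
```

The sketch still has a known gap: a stem that is the start of a longer word (for example ユーザビリティ) would also match, so a production rule set would need exclusions for such words.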
MTrans Post-Edit Booster – Auto Pre-Edit tool
To address post-editors' pain points related to style guides, Human Science developed a new tool called MTrans Post-Edit Booster.
With MTrans PE Booster, you can
• configure find-and-replace settings to automatically pre-edit common errors in machine translation output,
• use regular expressions in the find-and-replace settings for more advanced pre-editing,
• export and import the setting file so that everyone applies the same corrections and consistency improves, and
• improve MT output without training the MT engine, tuning the settings whenever you want.
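The general idea of shareable regex find-and-replace settings can be sketched in a few lines. This is not the tool's actual file format or API, which the slides do not describe; the JSON layout and all function names below are assumptions made for illustration.

```python
import json
import re

# Hypothetical rule format: a list of {"find": regex, "replace": replacement}.
rules = [
    {"find": r"コンピュータ(?!ー)", "replace": "コンピューター"},
    {"find": r"／", "replace": "/"},
]

def export_rules(rules, path):
    """Save rules as UTF-8 JSON so a team can share one setting file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rules, f, ensure_ascii=False, indent=2)

def import_rules(path):
    """Load rules previously saved with export_rules."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def pre_edit(segment, rules):
    """Apply every rule, in order, to one MT output segment."""
    for rule in rules:
        segment = re.sub(rule["find"], rule["replace"], segment)
    return segment

print(pre_edit("コンピュータ／コンピューター", rules))  # コンピューター/コンピューター
```

Because the rules live in a plain file rather than in the MT engine, they can be changed per client without any retraining, which is the point the last bullet makes.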
MTrans Post-Edit Booster – Experiment
We ran a test to see how much MTrans PE Booster improves post-editing productivity.
Test conditions:
• Target content … a Wikipedia page about Microsoft (excerpted, 2049 words / 109 segments)
• CAT tool … SDL Trados Studio 2017
• MT engine … Google Translate
• Style guide … Microsoft Localization Style Guide (excerpted from the 64-page guide)
  • Spacing rules (between full-width and half-width characters, parentheses, slashes)
  • Katakana prolonged sound mark (only “コンピューター”)
  • Katakana compound words (only “パーソナル コンピューター” and “オペレーティング システム”)
  • Parentheses (half-width parentheses)
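Counting how many segments break such rules, and how often, can itself be automated. The sketch below shows one way to tally both figures; the patterns are illustrative assumptions and are not the checks used in the experiment.

```python
import re

# Illustrative checks only; these patterns are assumptions, not the experiment's rules.
CHECKS = {
    "full-width parentheses": re.compile(r"[（）]"),
    "missing long vowel in コンピューター": re.compile(r"コンピュータ(?!ー)"),
    "unspaced compound": re.compile(r"パーソナルコンピューター|オペレーティングシステム"),
}

def count_style_errors(segments):
    """Return (segments with at least one hit, total hits)."""
    affected = 0
    total = 0
    for segment in segments:
        hits = sum(len(p.findall(segment)) for p in CHECKS.values())
        total += hits
        affected += hits > 0  # a bool counts as 0 or 1
    return affected, total

sample = ["パーソナルコンピュータ（PC）", "問題なし", "オペレーティングシステム"]
print(count_style_errors(sample))  # (2, 4)
```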
Experiment – MT without MTrans PE Booster
Video (2:37)
Experiment – MT with MTrans PE Booster
Comparison: original Google Translate output vs. MTrans PE Booster output
Experiment Result
Segments which include style* errors
• Original Google Translate: 95 out of 109 segments (87.2%)
• MTrans PE Booster: 0 out of 109 segments (0%)
Style* error occurrences
• Original Google Translate: 661
• MTrans PE Booster: 0
Time to correct
• Original Google Translate: 25 minutes 5 seconds
• MTrans PE Booster: 0 seconds (a few minutes for configuring replacement rules)
Quality expected
• Original Google Translate: So-so, because post-editors need to correct lots of style errors
• MTrans PE Booster: Good quality, because post-editors can take more time to brush up translations
Post-editors are …
• Original Google Translate: Unhappy
• MTrans PE Booster: Happy
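The reported figures can be cross-checked with a little arithmetic, which also gives a feel for the cost of each manual fix. Only the four input numbers come from the experiment; the derived averages are our own calculation.

```python
# Inputs reported for the original Google Translate output.
segments_total = 109
segments_with_errors = 95
error_occurrences = 661
correction_seconds = 25 * 60 + 5  # 25 minutes 5 seconds

share = segments_with_errors / segments_total * 100
seconds_per_error = correction_seconds / error_occurrences
errors_per_segment = error_occurrences / segments_total

print(f"{share:.1f}% of segments had style errors")                    # 87.2%
print(f"{seconds_per_error:.1f} s per correction on average")          # 2.3 s
print(f"{errors_per_segment:.1f} style errors per segment on average") # 6.1
```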
*In this experiment, “style” means the style guide rules listed in the test conditions.
Conclusion
How can an LSP support post-editors?
1. Hear directly from post-editors to understand what bothers them.
2. Analyze the problems.
3. If something can be solved on the LSP side, do it!
And by supporting post-editors…
post-editors can focus on what they really should do,
the final translation quality can be improved,
and both Clients and Linguists are happy.
Thank you www.science.co.jp
+81-3-5321-3111
t-shishido@science.co.jp

More Related Content

Similar to What LSPs can do to support post-editors for addressing pain-points in nmt

Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer ModelsDatabricks
 
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego BartolomeMachine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego Bartolometauyou
 
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize
 
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)TAUS - The Language Data Network
 
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019Jose Luis Bonilla Sánchez
 
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSSeeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSIconic Translation Machines
 
Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...SDL
 
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...SDL
 
Methods for Handling Terminology in Machine Translation
Methods for Handling Terminology in Machine TranslationMethods for Handling Terminology in Machine Translation
Methods for Handling Terminology in Machine TranslationKerstin Berns
 
Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersnishajj
 
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-EditingSafaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-EditingWelocalize
 
Introducing the Global English Style Guide
Introducing the Global English Style GuideIntroducing the Global English Style Guide
Introducing the Global English Style GuideEddie Hollon
 
Dtp, web design & presentation software revision
Dtp, web design & presentation software revisionDtp, web design & presentation software revision
Dtp, web design & presentation software revisionMrJRogers
 

Similar to What LSPs can do to support post-editors for addressing pain-points in nmt (20)

TAUS QE Summit 2017 eBay EN-DE MT Pilot
TAUS QE Summit 2017   eBay EN-DE MT PilotTAUS QE Summit 2017   eBay EN-DE MT Pilot
TAUS QE Summit 2017 eBay EN-DE MT Pilot
 
Conversational AI with Transformer Models
Conversational AI with Transformer ModelsConversational AI with Transformer Models
Conversational AI with Transformer Models
 
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego BartolomeMachine Translation Master Class at the EUATC Conference by Diego Bartolome
Machine Translation Master Class at the EUATC Conference by Diego Bartolome
 
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura CasanellasWelocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
Welocalize Throughputs and Post-Editing Productivity Webinar Laura Casanellas
 
TAUS Evaluating Post-Editor Performance Guidelines
TAUS Evaluating Post-Editor Performance GuidelinesTAUS Evaluating Post-Editor Performance Guidelines
TAUS Evaluating Post-Editor Performance Guidelines
 
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
Topic 4: The Magician's Hat: Turning Data into Business Intelligence (3)
 
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019
Building and Implementing MT systems @ eBay – TAUS Global Content Summit 2019
 
LLM.pdf
LLM.pdfLLM.pdf
LLM.pdf
 
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWSSeeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
Seeing the Wood for the Trees in MT Evaluation: an LSP success story from RWS
 
Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...
 
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...Machine Translation: Latest Innovations and their Impact on Commercial Transl...
Machine Translation: Latest Innovations and their Impact on Commercial Transl...
 
Methods for Handling Terminology in Machine Translation
Methods for Handling Terminology in Machine TranslationMethods for Handling Terminology in Machine Translation
Methods for Handling Terminology in Machine Translation
 
TAUS MT Post-Editing Guidelines
TAUS MT Post-Editing GuidelinesTAUS MT Post-Editing Guidelines
TAUS MT Post-Editing Guidelines
 
Frequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answersFrequently asked tcs technical interview questions and answers
Frequently asked tcs technical interview questions and answers
 
Tips and tricks for PE
Tips and tricks for PETips and tricks for PE
Tips and tricks for PE
 
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-EditingSafaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
 
Nikon - TAUS Tokyo Forum 2015
Nikon - TAUS Tokyo Forum 2015Nikon - TAUS Tokyo Forum 2015
Nikon - TAUS Tokyo Forum 2015
 
Introducing the Global English Style Guide
Introducing the Global English Style GuideIntroducing the Global English Style Guide
Introducing the Global English Style Guide
 
IRJET- Vocal Code
IRJET- Vocal CodeIRJET- Vocal Code
IRJET- Vocal Code
 
Dtp, web design & presentation software revision
Dtp, web design & presentation software revisionDtp, web design & presentation software revision
Dtp, web design & presentation software revision
 

Recently uploaded

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 

Recently uploaded (20)

Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 

What LSPs can do to support post-editors for addressing pain-points in nmt

  • 1. What LSPs can do to support Post-Editors for addressing pain-points in NMT Toru Shishido Solutions Consultant, Human Science 1 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 2. Agenda • Post-Edit Survey 2019 • Style Guides • MTrans Post-Edit Booster • Conclusion 2 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 3. Post-Edit Survey 2019 We asked 100 translators/reviewers about MT and PE (56 replied). • Years of a career as a linguist • Experiences of post-editing • Do you like or dislike post-editing job • Why don’t you perform post-editing job • What improvement do you expect in the future 3 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 4. Post-Edit Survey 2019 – Result 1 We asked 100 translators/reviewers about MT and PE (56 replied). • Years of a career as a linguist • Experiences of post-editing 4 What LSPs can do to support Post-Editors for addressing pain-points in NMT Have you ever performed PE tasks? Almost every day Several a month Several a year Not experienced, but interested Always reject Others (10.7%) (30.4%) (23.2%) (5.4%) (21.4%) (8.9%) How long have you worked as a freelance translator/reviewer? Less than a year 1-3 years 4-5 years 6-9 years 10-14 years 15 years or more (5.4%) (14.3%) (10.7%) (10.7%) (19.6%) (39.3%)
  • 5. Post-Edit Survey 2019 – Result 2 We asked 100 translators/reviewers about MT and PE (56 replied). • Do you like or dislike post-editing job • Why don’t you perform post-editing job 5 What LSPs can do to support Post-Editors for addressing pain-points in NMT Are you satisfied with PE tasks compared to translation/review tasks? Very satisfiedNot satisfied at all Why don’t you accept PE tasks? Never been asked Translation skill might be decreased Most of outputs are not accurate Most of outputs are not fluent Incorrect terminology Incorrect style Inconsistent translations Cheaper word rate I hate MT 15 (75%)
  • 6. Post-Edit Survey 2019 – Result 3 We asked 100 translators/reviewers about MT and PE (56 replied). • What improvement do you expect in the future 6 What LSPs can do to support Post-Editors for addressing pain-points in NMT What improvement do you expect to MT? Choose up to two options. Accuracy Fluency Correct terminology Correct style Consistent translations Improvement on numeric errors Correct tags
  • 7. Perceptions from Post-Edit Survey 2019 Feedback to MT outputs from our translators/reviewers: • Rather than machine translation improvements, I hope MT engines will support minor style differences depending on the end client. • As a translator, it would be very happy if the stylistic/cosmetic errors will be automatically modified when opening the file. • If the error patterns are identified, post-editing is often easier than reviewing bad human translations. I'm fed up with reviewing bad HT, but I can forgive mistranslations in MT. However, I'm wondering why there are some style and numeric errors in MT outputs. They seem the strong points of machines. 7 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 8. Pain points of Post-Edit – from our Clients Our clients also mentioned about style errors: Double quotation marks are always double-byte characters in translations. Hopefully, double quotation marks should always be changed to Japanese brackets 「」 in translation. Double-byte parentheses should not be used. Must be replaced with single-byte ones. And spaces should be inserted before the beginning parenthesis and after the closing parenthesis. Double-byte slashes should not be used. Must be replaced with single-byte ones. Sometimes the word "default" is translated to "既定", but it should always be " デフォルト". 8 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 9. Localization style guides - Microsoft According to Microsoft, Microsoft (Localization) Style Guides are collections of rules that define language and style conventions for specific languages. These rules usually include general localization guidelines, information on language style and usage in technical publications, and information on market-specific data formats. 109 localization style guides are available there. 9 What LSPs can do to support Post-Editors for addressing pain-points in NMT
  • 10. Why does style matter? Without a style guide: Linguists cannot decide which style to apply, so content ends up with more inconsistent translations. Inconsistent translations: For clients, inconsistent translations may damage the brand, and standardizing them can take a lot of time. Links will not work: https://www.science.co.jp becomes https://www.science.co.jp when MT outputs a double-byte colon and slashes.
  • 11. Two character sets: Single/Double Byte Characters Japanese has two versions of character sets: single-byte and double-byte. Having both bothers translators. Single-byte character sets. Punctuation/symbols: ? ! ; : - ( ) [ ] < > ' " # % @ * / Alphabet: ABCDEFG Katakana: ｱｲｳｴｵｶｷｸｹｺ Numerals: 0123456789 Double-byte character sets. Punctuation/symbols: ? ! ; : - ( ) [ ] < > ’ ” # % @ * / Alphabet: ABCDEFG Katakana: アイウエオカキクケコ Numerals: 0123456789
  • 12. Fun Fact: There are 24 spelling patterns for the Japanese translation of the term "User Interface", and they are all grammatically correct! ユーザーインターフェース ユーザーインタフェース ユーザーインターフェイス ユーザーインタフェイス ユーザインターフェース ユーザインタフェース ユーザインターフェイス ユーザインタフェイス ユーザー▲インターフェース ユーザー▲インタフェース ユーザー▲インターフェイス ユーザー▲インタフェイス ユーザ▲インターフェース ユーザ▲インタフェース ユーザ▲インターフェイス ユーザ▲インタフェイス ユーザー・インターフェース ユーザー・インタフェース ユーザー・インターフェイス ユーザー・インタフェイス ユーザ・インターフェース ユーザ・インタフェース ユーザ・インターフェイス ユーザ・インタフェイス ▲ stands for a single-byte space.
  • 13. Following rules in style guides (1) Most companies have their own style guides, and the rules differ slightly in areas such as spacing, brackets, and long vowels (cho-on). Spacing rules (▲ stands for a single-byte space). Katakana words, en: User interface. Company A ja: ユーザー▲インターフェイス. Company B ja: ユーザインタフェース. Company C ja: ユーザー・インターフェイス. Between single-byte and double-byte characters, en: From April 17 to 18. Company A ja: 4▲月▲17▲日~▲18▲日. Company B ja: 4月17日~4月18日. Company C ja: 4▲月▲17▲日~18▲日.
  • 14. Following rules in style guides (2) Most companies have their own style guides, and the rules differ slightly in areas such as spacing, brackets, and long vowels (cho-on). Brackets. Company A: use [ ] (single-byte) for user interface terms; use 『 』 for book titles and 「 」 for chapter/section titles. Company B: use [ ] (double-byte) for user interface terms; use 『 』 for book, chapter and section titles. Company C: use 「 」 (double-byte) for user interface terms; use 『 』 for book, chapter and section titles. Long vowels (cho-on). Company A: User … ユーザー, Printer … プリンター, Programmer … プログラマー. Company B (depends on the number of syllables): User … ユーザー, Printer … プリンター, Programmer … プログラマ. Company C: User … ユーザ, Printer … プリンタ, Programmer … プログラマ.
  • 15. MTrans Post-Edit Booster – Auto Pre-Edit tool To address post-editors' pain points relating to style guides, Human Science developed a new tool called MTrans Post-Edit Booster. With MTrans PE Booster, you can • configure find-and-replace settings to automatically pre-edit common errors in machine translation outputs, • use regular expressions in the find-and-replace settings for advanced pre-editing, • export/import the settings file so that the same corrections are applied everywhere and consistency improves, and • improve MT outputs without MT engine training, tuning the settings anytime you want.
  • 16. MTrans Post-Edit Booster – Experiment We ran a test to investigate how much MTrans PE Booster boosts post-editing productivity. Test conditions: • Target content … a Wikipedia page about Microsoft (excerpted, 2049 words / 109 segments) • CAT tool … SDL Trados Studio 2017 • MT engine … Google Translate • Style guide … Microsoft Localization Style Guide (rules excerpted from the 64-page guide): spacing rules (between full-width and half-width characters, parentheses, slashes); katakana prolonged sound mark (only “コンピューター”); katakana compound words (only “パーソナル コンピューター” and “オペレーティング システム”); parentheses (half-width parentheses)
  • 17. Experiment – MT without MTrans PE Booster Video (2:37)
  • 18. Experiment – MT with MTrans PE Booster
  • 19. Experiment – MT with MTrans PE Booster Original Google Translate MTrans PE Booster
  • 20. Experiment Result Original Google Translate vs. MTrans PE Booster. Segments which include style* errors: 95 out of 109 segments (87.2%) vs. 0 out of 109 segments (0%). Style* error occurrences: 661 vs. 0. Time to correct: 25 minutes 5 seconds vs. 0 seconds (plus a few minutes for configuring replacement rules). Quality expected: so-so, because post-editors need to correct lots of style errors, vs. good quality, because post-editors can take more time to brush up translations. Post-editors are …: unhappy vs. happy. *In this experiment, “style” means the rules listed on slide 16.
  • 21. Experiment – MT with MTrans PE Booster Original Google Translate MTrans PE Booster
  • 22. Conclusion How can an LSP support post-editors? 1. Hear directly from post-editors to understand what bothers them. 2. Analyze the problems. 3. If something can be solved on the LSP side, do it! And by supporting post-editors… post-editors can focus on what they really should do, the final translation quality can be improved, and both clients and linguists are happy.
  • 23. Thank you www.science.co.jp +81-3-5321-3111 t-shishido@science.co.jp
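
The find-and-replace pre-editing described on slides 8 and 15 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual MTrans PE Booster implementation or its settings format: the names RULES and pre_edit are hypothetical, and the patterns cover just the four client requests quoted on slide 8.

```python
import re

# Hypothetical rule list: (regex pattern, replacement), applied in order.
# These are illustrative assumptions, not PE Booster's real settings.
RULES = [
    # Double-byte double quotation marks around a phrase -> Japanese brackets
    (r"“([^“”]*)”", r"「\1」"),
    # Double-byte opening parenthesis -> space + single-byte parenthesis
    (r"\s*（", " ("),
    # Double-byte closing parenthesis -> single-byte parenthesis + space
    (r"）\s*", ") "),
    # Double-byte slash -> single-byte slash
    (r"／", "/"),
    # Terminology preference quoted by the client
    (r"既定", "デフォルト"),
]


def pre_edit(text, rules=RULES):
    """Apply every replacement rule to one MT output segment."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    # Drop any space the parenthesis rules left at the segment edges.
    return text.strip()


if __name__ == "__main__":
    print(pre_edit("既定の設定（推奨）を使用"))
    print(pre_edit("“保存”をクリック"))
```

Because the rules are plain data, the same list could be exported and shared between projects, which is the consistency benefit the slide mentions. A real rule set would need more care, for example to avoid leaving a space between a closing parenthesis and a following 。.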

Editor's Notes

  1. Hello. Thanks for having me. My name is Toru Shishido, from Human Science. I joined Human Science about two years ago and now work here as a Solutions Consultant. Together with our engineering team, I suggest and provide MT solutions for companies that want to introduce MT into their workflow or improve their MT process. Before joining Human Science, I worked at a couple of LSPs as a translator, reviewer, post-editor, project manager and MT consultant. Today I would like to share how we support post-editors with our technology and the experience we have gained in the localization business over the years.
  2. Last year I gave a presentation at the TAUS Executive Forum, where I talked about what LSPs can do to support linguists in emerging post-editing projects. That presentation covered various topics relating to post-editing projects, but this time I would like to focus on one specific topic: style. Style of translation. When people talk about customizing NMT, they tend to discuss glossaries, such as how to apply their jargon to MT outputs. However, I haven’t heard them say anything about style issues or cosmetic issues in MT outputs. So today I would like to dig deep into this challenge, because Human Science has heard lots of complaints about style issues in MT outputs, not only from freelance translators but also from our clients. Also, as I said, I would like to share how we are addressing the challenge with our experience and technology.
  3. First, I would like you to know what our freelance linguists think about PE tasks. To find out how they feel and what they want, I ran a survey about post-editing tasks, as I did last year. The idea of this survey was to gather their honest opinions about post-editing tasks and to identify the pain points of PE.
  4. The first question is about years of experience as a translator. About 60% of respondents have worked for 10 years or more, so you could say that the survey result is based on well-experienced linguists. The second question is about how often they perform post-editing tasks. Last year I asked the same question, and fewer than 5 percent answered “Almost every day”. This year it is 10.7 percent. About 30% replied “Several a month” this year, up 5 percent from last year. From this result, we can assume that the number of post-editing projects has increased since last year.
  5. Then I prepared different questions for people who perform PE tasks and for people who do not. I asked the people who take on PE tasks how satisfied they are with them. It seems that most of them do not enjoy PE tasks as much as translation or review tasks. They admit that the quality of MT outputs has improved, but they are not satisfied with inconsistent translations, especially in glossary terms and style. I asked the people who do not take on PE tasks why they do not accept them. The most common answer is the word rate. They said that PE tasks are not as easy as most people think and that the cheaper word rate is not fair. Others are afraid of PE tasks because they may affect their translation skills in a negative way.
  6. And this is the final question of the survey. I asked what improvements they expect in MT engines. Translation accuracy took first place, correct terminology second, and translation fluency third. Correct style ranked fourth, and this is what I will cover in this presentation. The first three expectations (accuracy, terminology, and fluency) are very difficult for an LSP to fix. However, we thought that we could solve style issues ourselves. So, today, I would like to share how we do it.
  7. Now I would like to share some comments from people who perform post-editing. Some post-editors say that correcting style errors is a painful task. Others say that PE is easier than reviewing bad human translation if the error patterns can be identified. In fact, a comment like this gave us the hint for our new solution, because we had also noticed that there are some common style errors in MT outputs.
  8. In this slide, I would like to share what our clients say about post-editing. Interestingly, clients have problems with style errors just as freelance post-editors do. These style rules can be configured in rule-based MT (RBMT) engines. However, it is not feasible to do so in NMT engines, or at least not as easy as with RBMT, so post-editors need to correct tons of style errors manually every day. They also said that they would like to spend more time brushing up the MT outputs, but they cannot because of the many style errors. This is why we decided to develop a new PE solution to support post-editors and make them happy.
  9. Before introducing our new PE tool, let’s take a quick look at what style means in the localization process. I would like to use the Microsoft website to explain what style is, because I guess their style guide is the most famous one among IT translators. According to Microsoft, style guides are collections of rules that define language and style conventions for specific languages. So you can say the style guide is an essential guideline for translators. You cannot go wrong if you follow the guide.
  10. What happens if you ignore the style guide? Of course, your quality score will be lower if your translation contains lots of style errors, but there is more. If there is no style guide, the number of inconsistent translations will increase. And then, with these inconsistent translations, end users may not trust the brand anymore. The final reason is a bit specific, and it is common to languages that have double-byte characters, like Japanese. In this example, the colon and slashes are translated to double-byte characters by MT, so the link is not clickable in the translation. This can cause a bad user experience. Thus, style is very important.
  11. In Japanese, we have two character sets for some characters. In general, they are all grammatically correct. However, most style guides indicate which character set should and should not be used.
  12. Here, I would like to share a fun fact about Japanese localization. When translating the term “User Interface” into Japanese, there are 24 spelling patterns, or maybe more. They are all correct in Japanese, but most Japanese style guides specify how to spell the translation of “User Interface”. These variations originate from the history of Japanese writing rules, limited space in UIs, or the background and culture of companies.
  13. Here are some examples of Japanese style. These spacing rules can be configured with RBMT, but it is not possible with NMT.
  14. Here are other examples of Japanese style: brackets and cho-on characters. I will not explain these rules in depth. But please imagine: if Company A acquired Company C in this example, these rules would get mixed up, and translators would get confused. This is not a joke. This kind of mix-up does occur sometimes.
  15. And now, finally, I would like to introduce our new approach to these challenges and the pain points of post-editors. Based on the style-related challenges we have seen and the feedback we received from our clients and freelance linguists, we developed a new tool called MTrans Post-Edit Booster. With PE Booster, most of these style errors are eliminated by the time post-editors open the bilingual file for post-editing. Besides boosting post-editing productivity, PE Booster provides further benefits for post-editors. One advantage is that you can tune the auto pre-edit settings anytime you want. As you know, LSPs cannot change the logic of MT engines. Even if the engine is customizable, customization costs time and money. However, if you can identify the trend in style errors, project managers or post-editors can change the settings anytime, and the quality of MT output improves right away.
  16. Next, I would like to show you how PE Booster can improve post-editing productivity. We ran a test to measure the productivity improvement. I used the same source file and corrected only the style errors in the bilingual file, with and without PE Booster. For the style guide, I chose Microsoft's. However, covering everything in the style guide would take a lot of time, so I narrowed the rules down to four items: the spacing rule, two katakana rules that cover only three terms, and the parentheses rule. For the MT engine, I used Google's, because I thought Microsoft Translator would already know all of Microsoft's style rules.
  17. I would like to show you a video of how Japanese post-editors struggle with tons of style errors. Please note that this post-editor needs to apply only four rules; in an actual project there are more style errors to correct. This post-editor is mostly fixing spacing errors, which is very painful for post-editors. You see? It takes about 48 seconds just to correct the style errors in the first five segments. In real post-editing tasks, you need to read the source text, read the translation, and then correct any mistranslation or brush up any unnatural phrases. However, post-editors must correct style errors as well. This file contains 109 segments, and nearly all of them have style errors. Do you think post-editors can enjoy these corrections? Of course not. In the survey I showed at the beginning of this presentation, some translators answered that they always reject PE tasks because of style errors like these in MT outputs. Also, lots of translators expect style issues to improve. In other words, they might happily accept PE requests if they did not have to take care of these style corrections. By the way, we ran the same test with the Microsoft engine. Its output contains fewer style errors such as spacing issues. However, it contains specific kinds of errors that were not found in the output of the Google engine. Of course, some of them can be fixed by PE Booster. Finally, the post-editor finished correcting the style errors, and it took 25 minutes.
  18. Now, I would like to show you how to use PE Booster. Currently, MTrans PE Booster is only available as a Trados plug-in, but we are working on further development to integrate it with other CAT tools. This is the settings dialog of PE Booster. All you need to do is enter which words should be detected and how the detected words should be replaced. You can use regular expressions, so advanced replacements are possible. I added 14 items to the settings to cover all four rules for this experiment. When you have completed the settings, apply the machine translation and open the bilingual file for post-editing.
  19. And here is the result. When you open the bilingual file, the style errors have disappeared, so post-editors can focus on polishing the translation from the start. In the raw MT, there are fifteen items to correct in the first five segments alone.
  20. And here is a comparison of the productivity and expected quality with and without PE Booster. Of course, you need a few minutes to configure the PE Booster settings, but that is much better than spending 25 minutes correcting style and cosmetic errors. And with PE Booster you can expect better quality, because without it post-editors need to spend a lot of time correcting style errors.
  21. For this experiment I applied only four rules from the Microsoft Style Guide, but the guide actually contains more rules. The red markers indicate other style errors that you would need to fix to follow the real Microsoft Style Guide. However, these marked items can also be fixed with PE Booster by adding replacement rules.
  22. OK, here’s our conclusion. How can an LSP support post-editors? Some post-editors and translators have lots of complaints about machine translation, even though its quality has improved. Some of their complaints cannot be solved by us, for various reasons. However, after listening to their pain points and analyzing them deeply, LSPs can see a light toward solving some challenges with their localization experience. If you see a light, just move toward it. By doing so, you can support not only the linguists but also your end clients. We LSPs cannot provide high-quality post-editing services without good post-editors. So I think we need to discuss MT deeply with them and provide support “to make them happier” and “to streamline the post-editing workflow”.
  23. Thank you very much
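
The count of 24 spelling patterns mentioned in note 12 can be reproduced by combining the independent choices visible on slide 12: two spellings of "user", two of "inter", two of "face", and three separators (none, a single-byte space, or a middle dot). The following is a minimal sketch of that arithmetic, 2 × 2 × 2 × 3 = 24; the names used here (USER, INTER, FACE, SEPARATORS, ui_variants) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Independent spelling choices seen in the slide's list (illustrative names).
USER = ["ユーザー", "ユーザ"]        # with / without the long vowel mark
INTER = ["インター", "インタ"]      # with / without the long vowel mark
FACE = ["フェース", "フェイス"]     # long vowel mark vs. イ
SEPARATORS = ["", " ", "・"]        # none, single-byte space, middle dot


def ui_variants():
    """Return every spelling of "User Interface" built from the choices above."""
    return [
        f"{user}{sep}{inter}{face}"
        for sep, user, inter, face in product(SEPARATORS, USER, INTER, FACE)
    ]


if __name__ == "__main__":
    variants = ui_variants()
    print(len(variants))
    for variant in variants:
        print(variant)
```

A list like this could also serve as the "find" side of a replacement rule, mapping every variant to the one spelling a given client's style guide requires.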