Getting the Right UGC Translation Quality: Human or Machine, Professional or Crowd. The Language Tools Team at Welocalize shares insights and expertise regarding the quality requirements for user-generated content. Topics: Human Translation, Machine Translation, Post-Editing, Crowdsourcing. Localization World TAUS Quality Summit Pre-Conference in Dublin, Ireland, in June 2014.
4. Adding UGC to the Mix
a major influence on people’s buying decisions
the second most trusted form of advertising (after word-of-mouth), with
70% of global consumers indicating they trust this platform*
UGC in the form of support forums reduces the cost of supporting customers
UGC is always blended in with the “branded” web UI content
No traditional translation process, quality standard or pricing
model is viable in the UGC world – something’s got to give!
*Nielsen’s 2011 Global Trust in Advertising report, which surveyed more than 28,000 people in 56 countries
6. UGC & Translation Quality Requirements
There are different kinds of UGC and traditional quality requirements are less relevant
than the impact and purpose requirements.
If your UGC is part of your brand identity, needs to produce emotional impact and borders on
"transcreated copy", you'll be best off engaging professional translators and providing brand
book-level guidelines.
If your UGC is there to convey information but translations need to preserve the desired tone
(e.g. a casual, upbeat review), the crowd approach will be best.
If your UGC is there to provide technical instructions, you will need subject matter experts who
will be able to accurately preserve the meaning of the source input in the translation.
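The three rules above amount to a lookup from content purpose to translation approach. A minimal sketch in Python, assuming hypothetical names (`UgcPurpose`, `recommend_approach`) that are not part of the original presentation:

```python
from enum import Enum


class UgcPurpose(Enum):
    """Hypothetical labels for the three kinds of UGC described above."""
    BRAND_IDENTITY = "brand identity / emotional impact"
    INFORMATION_WITH_TONE = "conveys information, tone must be preserved"
    TECHNICAL_INSTRUCTIONS = "technical instructions"


# Mapping taken from the slide text; the wording of the values is a paraphrase.
RECOMMENDED_APPROACH = {
    UgcPurpose.BRAND_IDENTITY: "professional translators with brand book-level guidelines",
    UgcPurpose.INFORMATION_WITH_TONE: "crowd translation",
    UgcPurpose.TECHNICAL_INSTRUCTIONS: "subject matter experts",
}


def recommend_approach(purpose: UgcPurpose) -> str:
    """Return the approach the slide suggests for a given content purpose."""
    return RECOMMENDED_APPROACH[purpose]


print(recommend_approach(UgcPurpose.INFORMATION_WITH_TONE))
```

In practice a single site may carry several kinds of UGC, so a lookup like this would be applied per content type rather than per customer.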
7. Characteristics of UGC
authored by non-professionals and / or non-native speakers
often similar patterns to oral speech
sometimes authored by power users / “techies”
often highly perishable
multitude of authors = diversity of styles
Examples:
• Short forms (nite (night), sayin (saying), gr8 (great))
• Acronyms (lol (laugh out loud), iirc (if I remember correctly))
• Typing errors/misspellings (wouls (would), rediculous (ridiculous))
• Punctuation omissions/errors (im (I’m), dont (don’t))
• Non-dictionary slang (that was well mint (that was very good))
• Wordplay (that was soooooo great (that was so great))
• Censor avoidance (sh1t, f***)
• Emoticons (:) (smileys), <3 (heart))
• Foreign words used intentionally (al dente, bon voyage)
(Roturier, 2011), (Jiang et al, 2012), (Clark & Araki, 2011)
8. Travel Portal – Company + Customer Content
Yellow = UGC
Green = Web UI
12. UGC & MT
UGC has become part of the brand strategy, so raw MT will not be enough in all
cases
Utility scoring should be used to measure the quality of raw MT for UGC; it rates
the comprehensibility & utility of the output rather than the linguistic quality
MT evaluation results for UGC indicate that around 50% or less of comments /
reviews are considered comprehensible
Researchers are focusing efforts on normalization and preprocessing steps of
UGC in order to improve MT output and reduce the PE effort
…. How much can and should post-editing fix?
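To make the utility-scoring idea concrete: each raw MT segment receives a rating for how understandable and usable it is, and the share of segments at or above a cut-off is reported. The sketch below assumes a hypothetical 1-4 rating scale and a threshold of 3; neither comes from the presentation, and the ratings are invented for illustration only.

```python
def comprehensible_share(ratings, threshold=3):
    """Return the fraction of segments rated at or above `threshold`.

    `ratings` holds one utility score per raw MT segment
    (assumed scale: 1 = unusable ... 4 = fully understandable).
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    usable = sum(1 for score in ratings if score >= threshold)
    return usable / len(ratings)


# Invented ratings for five machine-translated reviews (not real results).
sample_ratings = [4, 2, 3, 1, 2]
print(f"{comprehensible_share(sample_ratings):.0%} of segments rated comprehensible")
```

A share computed this way says nothing about linguistic quality, which is the point: it only tracks whether readers can use the output.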
13. UGC & MT - Normalization
Normalization is the manual or automated process of taking non-standard input
and pre-translating it using scripts, regular expressions and other processes
in order to make the source text more ‘normal’ before machine translation.
Example:
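As an illustration only (not the presenters' own example), a minimal normalization sketch using a replacement table and regular expressions might look like the following. The table entries reuse the UGC examples listed earlier in this deck; the function name `normalize` and the rule for collapsing repeated letters are assumptions.

```python
import re

# Non-standard forms taken from the examples earlier in this deck.
REPLACEMENTS = {
    "nite": "night",
    "sayin": "saying",
    "gr8": "great",
    "wouls": "would",
    "rediculous": "ridiculous",
    "im": "I'm",
    "dont": "don't",
}

# A word token: letters, digits and apostrophes.
TOKEN = re.compile(r"[A-Za-z0-9']+")
# Three or more repeats of the same letter, e.g. "soooooo".
ELONGATION = re.compile(r"([A-Za-z])\1{2,}")


def normalize(text: str) -> str:
    """Make informal source text more 'normal' before machine translation."""
    # Collapse elongated letters to a single letter ("soooooo" -> "so").
    # Crude assumption: this would wrongly turn "gooood" into "god".
    text = ELONGATION.sub(r"\1", text)

    def fix(match: re.Match) -> str:
        word = match.group(0)
        # Look the token up case-insensitively; keep it unchanged if unknown.
        return REPLACEMENTS.get(word.lower(), word)

    return TOKEN.sub(fix, text)


print(normalize("im sayin that nite was soooooo gr8"))
```

A production pipeline would need a far larger table, language-specific rules and a way to protect intentional forms such as product names, so this is a sketch of the mechanism rather than a usable normalizer.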
14. UGC – Example Post-Editing Scenarios
1. Online Marketplace – post-editing for MT engine re-training
Closer to Full PE, due to very specific PE requirements and the knowledge of
MT engine logic expected of post-editors.
E.g.: Change words instead of changing sentence structure, do not
add or leave out any information
2. Knowledgebase – crowd-sourced post-editing
Light PE quality requirements with focus on severe mistranslations and
fixing corrupted content
3. Travel portal user reviews - sanity check of raw MT output
Extra Light PE with focus on severe mistranslations and offensive
content on high volume & high perishability content
15. Levels of Post-Editing for UGC
Columns: Source / Raw MT / Dutch light PE / Dutch full PE / Comments on edits

Source: We have stayed here and their has been a few stag and hens but nothing to worry about. They have been very respectful that its a family hotel
Raw MT: We hebben hier verbleven en hun is geweest een paar hert en kippen maar niets te vrezen. Ze zijn heel respectvol dat het is een familiehotel
Dutch light PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is
Dutch full PE: We hebben hier verbleven en er zijn een paar vrijgezellenfeesten geweest maar er is niets te vrezen. Ze zijn heel respectvol dat het een familiehotel is.
Comments on edits: Light PE: mistranslation for stag and hens, literal translation entered but meant were stag and hen parties. Also typo in source has been taken over (their > hun) and this had to be corrected in the PE. Full PE: Additional rewrites, adding the word "er" to improve readability.

Source: Yes it is in the bedroom, we did not have a bed settee
Raw MT: Ja het is in de slaapkamer, we hebben niet een slaapbank
Dutch light PE: NO EDITS REQUIRED
Dutch full PE: Ja het is in de slaapkamer, we hebben geen slaapbank
Comments on edits: Full PE: "niet een" changed to "geen" for readability.

Source: my dad is in a wheelchair is this hotel suitable
Raw MT: mijn vader is in een rolstoel is dit hotel geschikt
Dutch light PE: NO EDITS REQUIRED
Dutch full PE: Mijn vader zit in een rolstoel, is dit hotel geschikt?
Comments on edits: Full PE: added punctuation, Capitalization and made the sentence more readable.
16. UGC & Quality Guidelines
Style Guide for Professional Translation – typically 20-50 pages, outlines client-specific requirements
Not appropriate for UGC (in terms of content & resources)
18. Translating UGC – Talent Search
- Knowledge of target and source locale
- Knowledge of content / subject-matter
- Product users (power users,…)
- People who like to write / use their language skills
- People who belong to a user / interest group
- May need to learn to use CAT technology
- May need to follow very specific instructions
19. Crowdsourcing
The term crowdsourcing was coined by Jeff Howe in 2006 to describe the act of taking a task
traditionally performed by an employee or contractor and outsourcing it to an undefined
and generally large group of people in the form of an open call.
Crowd-sourced translation is currently the most active area of investment in the industry.
(Kelly, 2013)
Requires a shared and easy-to-use translation / post-editing platform
In order for crowd-sourcing to be most effective (turnaround times & throughputs), a
good-sized crowd needs to be built – requiring some form of incentive / motivation
Some quality guidance is required to prevent chaos
20. Crowd & Quality Guidelines
While crowd guidelines need to be adapted to the specific crowd (professional / non-professional,
linguists / power users, paid / unpaid, ...) and to content & purpose (brand identity, technical
specialization, special user community – e.g. luxury versus student travel, ...), they should generally:
Be simple & easy to follow for all crowd members
Be brief, ideally without reference to additional checks & documents
Clarify the main purpose of the translation / post-editing assignment
Provide clear rules (rather than lists of exceptions)
They may also point out the handling of specific technical items.
21. Possible Translation Scenarios
MT + normalization; paid crowd with basic instructions on “do’s and don’ts”; crowd can be a mix of translators / customers / …
Possibly MT with Full PE, but possibly professional HT / transcreation
MT + “accuracy check” PE; crowd of technical users, savvy on product; linguistic errors are OK