What We Want, What We Need,
What We Can’t Do Without
The Enterprise User Perspective on MT Technology & Things Around It
Olga Beregovaya
VP, Language Tools
ONE THING HAS BECOME ALL THINGS; EVERYTHING
• Global Economy Multicultural Transactions
• Multiple Demand-Driven Content Scenarios
• Multiple Data Sources + Formats
27,500 words - 25,000 emoji lattices
“…And what is the use of a book,” thought Alice, “without pictures or
conversations?”
?
“The limits of my language mean the limits of my world.”
― Ludwig Wittgenstein
“Never send a human to do a machine’s job.”
― Agent Smith
• Areas of utmost interest and importance as seen from a major Language
Service Provider perspective, or “a week in the life of a Language Technology
group”
• Engine Output Quality; what is it actually, “output quality”? Are we “stuck”?
Next breakthrough?
• Domain adaptation, what can we do when there is neither data nor
budget to create it?
• Supporting “Raw” publishing scenarios (UGC, Support, MT to be consumed
by other applications) – there will be no human to fix it. Or will there?
• Metadata – an enemy or an ally?
• Collaboration: how can we make it interesting for everyone?
AND WE’LL TALK ABOUT…
•  What do translators appreciate?
•  What do translators struggle most with?
•  Fluency VS. Accuracy?
•  Final output quality?
WINS & CHALLENGES
CONTENT DRIVING QUALITY DECISIONS
LEVELS OF POST-EDITING / CONTENT TYPE / STRATEGY
FULL POST-EDITING
Content type: Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life”
Strategy: Post-edit to human translation levels; correct for terminology, grammar, fluency, style and voice
MEDIUM POST-EDITING
Content type: Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life”
Strategy: Human translation level requirement but with flexible style and fluency allowances
LIGHT POST-EDITING
Content type: Emphasis on quick turnaround and/or large volumes
Strategy: Sliding scale depending on the content purpose
DOMAINS: TRAVEL, FINANCE, LEGAL

TRAVEL
SOURCE: Si queréis viajar en grupo, sóis al menos 6 personas y deseáis ir en otras fechas diferentes a las propuestas, preguntarnos porque lo podemos organizar ;-)
TRANSCREATION: If there are at least 6 of you wishing to travel together, but on different dates that the ones offered, why don’t you ask us? We can arrange it. ;- )
TRANSLATION: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
FULL POST-EDITING: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
LIGHT POST-EDITING: If you want to travel in groups , are at least 6 people and wish to go on dates other than the proposed, ask us because we can organize ;- )
RAW MT: If you want to travel in groups , sóis at least 6 people and wish to go on dates other than the proposals, ask why we can organize ;- )

FINANCE
SOURCE: A la hora de generar las cuotas de amortización es posible utilizar dos porcentajes distintos; el fiscal o el de mercado que la empresa establezca.
TRANSCREATION: At the time of generating amortization fees, one might use two different percentages: Fiscal or Market established by the company.
TRANSLATION: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
FULL POST-EDITING: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
LIGHT POST-EDITING: When generating the amortization fees is possible to use two different percentages; the fiscal or the market rate that the company stated.
RAW MT: When generating the repayment is possible to use two different rates; the prosecutor or the market that the company stated.

LEGAL
SOURCE: Para el supuesto de que haya contratado el servicio de actualizaciones relativo al programa software objeto de licencia, usted podrá actualizar el mismo durante los periodos de vigencia que tenga contratado el servicio.
TRANSCREATION: You can update the licensed software during the period stated in your contract.
TRANSLATION: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
FULL POST-EDITING: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
LIGHT POST-EDITING: Because you engaged the update service related to the licensed software program, you can update it during periods of validity of the service contracted
RAW MT: For the assumption that engaged the update service software on the licensed program, you can update it during periods of validity has contracted the
CONSUMING MT — QUALITY SCENARIOS
THE POST-EDITOR PRODUCES:
Publishable quality
The post-editor is responsible for ensuring that client quality requirements
and style guide are met
The post-editor is expected to adhere to client StyleGuide preferences
with regard to:
  Infinitive / Imperative
  Passive / Impassive
  Formal / Informal
  Different Styles for Headers, Lists, Tables
  Special Handling of UI Options (Bilingual, English, Target?)
  Converting All the Measurements Based On the Local Conventions
+ Disambiguate Terminology
+ Correct all the grammatical errors
THE POST-EDITOR RECEIVES:
ERROR CATEGORY                  GERMAN  FRENCH  JAPANESE  RUSSIAN  CHINESE  SPANISH  ITALIAN  BRAZILIAN
WRONG TERMINOLOGY                 6.46    4.93     13.63     5.00     6.20     9.63     3.78       1.13
WRONG SPELLING                    2.00    0.86      0.88     0.13     0.30     1.13     0.56       1.27
SOURCE NOT TRANSLATED             6.38    5.36      3.88     5.13     3.60     2.50     1.22       1.73
COMPLIANCE WITH CLIENT SPECS      2.46    0.86      3.00     2.13     0.70     0.63     0.44       2.60
LITERAL TRANSLATION               7.85    8.64      5.00     4.00     9.40     5.38     7.67       7.93
TEXT/INFO ADDED                   2.69    1.36      2.13     1.25     0.80     1.88     0.44       0.80
CAPITALIZATION                    2.69    3.43      0.00     2.63     0.50     1.75     3.33       2.60
WRONG WORD FORM                   6.77    7.79      0.13     9.88     0.60     6.75     3.67       6.75
WRONG PART OF SPEECH              2.62    3.21      2.00     1.88     0.60     2.13     3.67       1.33
PUNCTUATION                       4.46    3.00      0.75     3.38     4.10     2.13     1.22       3.53
SENTENCE STRUCTURE               12.54   10.00     14.25     8.00    13.00     5.38     6.11       3.67
TAGS + MARK-UP                    1.23    0.14      0.13     0.50     0.20     0.38     0.44       0.20
LOCALE ADAPTATION                 0.46    0.29      0.75     0.63     0.20     0.75     0.44       0.13
SPACING                           0.92    0.36      2.25     1.25     4.00     0.50     0.33       0.40
OTHER                             1.92    1.50      1.88     0.13     0.50     0.13     1.44       0.27
TOTAL ERRORS                     61.46   51.71     50.63    45.88    44.70    41.00    34.78      32.53
Most time-consuming issues that translators need to
fix are:
• Sentence structure (word order)
• MT output too literal
• Wrong terminology
• Word form disagreements
• Source term left untranslated
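The ranking above can be reproduced from the error table with a few lines of Python. This is only a sketch: the figures are copied from the table, but summing each category across the eight languages is an assumption, since the deck does not say how the list was derived. The name `top_categories` is illustrative.

```python
# Error figures per category, copied from the table above.
# Column order: German, French, Japanese, Russian, Chinese, Spanish, Italian, Brazilian.
ERRORS = {
    "WRONG TERMINOLOGY":            [6.46, 4.93, 13.63, 5.00, 6.20, 9.63, 3.78, 1.13],
    "WRONG SPELLING":               [2.00, 0.86, 0.88, 0.13, 0.30, 1.13, 0.56, 1.27],
    "SOURCE NOT TRANSLATED":        [6.38, 5.36, 3.88, 5.13, 3.60, 2.50, 1.22, 1.73],
    "COMPLIANCE WITH CLIENT SPECS": [2.46, 0.86, 3.00, 2.13, 0.70, 0.63, 0.44, 2.60],
    "LITERAL TRANSLATION":          [7.85, 8.64, 5.00, 4.00, 9.40, 5.38, 7.67, 7.93],
    "TEXT/INFO ADDED":              [2.69, 1.36, 2.13, 1.25, 0.80, 1.88, 0.44, 0.80],
    "CAPITALIZATION":               [2.69, 3.43, 0.00, 2.63, 0.50, 1.75, 3.33, 2.60],
    "WRONG WORD FORM":              [6.77, 7.79, 0.13, 9.88, 0.60, 6.75, 3.67, 6.75],
    "WRONG PART OF SPEECH":         [2.62, 3.21, 2.00, 1.88, 0.60, 2.13, 3.67, 1.33],
    "PUNCTUATION":                  [4.46, 3.00, 0.75, 3.38, 4.10, 2.13, 1.22, 3.53],
    "SENTENCE STRUCTURE":           [12.54, 10.00, 14.25, 8.00, 13.00, 5.38, 6.11, 3.67],
    "TAGS + MARK-UP":               [1.23, 0.14, 0.13, 0.50, 0.20, 0.38, 0.44, 0.20],
    "LOCALE ADAPTATION":            [0.46, 0.29, 0.75, 0.63, 0.20, 0.75, 0.44, 0.13],
    "SPACING":                      [0.92, 0.36, 2.25, 1.25, 4.00, 0.50, 0.33, 0.40],
    "OTHER":                        [1.92, 1.50, 1.88, 0.13, 0.50, 0.13, 1.44, 0.27],
}


def top_categories(errors, n=5):
    """Return the n categories with the highest figure summed over all languages."""
    ranked = sorted(errors, key=lambda name: sum(errors[name]), reverse=True)
    return ranked[:n]


for name in top_categories(ERRORS):
    print(f"{name}: {sum(ERRORS[name]):.2f}")
```

Under that assumption the five highest categories come out in the same order as the bullets: sentence structure, literal translation, wrong terminology, wrong word form, source not translated.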
OR, IN A NUTSHELL…
TOP 6 ON THE TRANSLATORS’ LOVE IT-LIST
1.  Source of inspiration: reduces thinking and translation choice time
2.  Provides reference - very useful to translators new to a specific domain
3.  Reduces typing & lookup time by handling repetitive terminology and
structures well
4.  …thereby takes away the more monotonous efforts of translation
5.  Post-editors over time notice improvements; appreciate it more if they
‘co-own’ the engine
6.  MT output can be funny
LOL!
TOP 3 ON THE TRANSLATORS’ S*#!T-LIST
1. Wrong sentence structure
• Major impact on the post-editing effort (Spanish and Portuguese produce fewest errors)
• Japanese has the highest error rate and the lowest productivity gains (supported by
the cognitive effort error ranking research)
2. Wrong and inconsistent terminology
• Very time-consuming to check and fix terminology; + enough issues from Fuzzy
Matches already
• A major problem for new products where the terminology is not settled yet
• Inconsistent output for UI references
3. Correct MT to an agreed standard (=quality expectations)
• A challenging concept in the beginning for post-editors – they think they should edit
less if the quality is bad
S*#!T
FEEDBACK LOOP
Columns: SOURCE TEXT / MT OUTPUT / POST-EDITED OUTPUT / SPECIFIC ERRORS/CHANGES MADE

Example 1
SOURCE TEXT: Single-phase options range from 1.4kW to 7.7kW while three-phase PDUs, packed with output receptacles, range from 8.6kW to 21.6kW.
MT OUTPUT: Single-fase 7.7kW Opties variëren van 1.4kW om en driefasige PDU's, boordevol Output-aansluitingen, variëren van 8,6 kW tot 21.6kW.
POST-EDITED OUTPUT: 1,4 kW ... 7,7 kW ... 21,6 kW
SPECIFIC ERRORS/CHANGES MADE: Numbers and measurement units are not converted properly and no spaces inserted by MT engine (3 out of 4 occurrences, 1 is correct however, strange...)

Example 2
SOURCE TEXT: Single-phase options range from 1.4kW to 7.7kW while three-phase PDUs, packed with output receptacles, range from 8.6kW to 21.6kW.
MT OUTPUT: • Biedt maximaal 24 TB <fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT"> 2 </fmt> maximale capaciteit per-uitbreidingsbehuizing toe te voegen.
POST-EDITED OUTPUT: • Biedt een maximale capaciteit van 24 TB<fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT">2</fmt> per uitbreidingsbehuizing.
SPECIFIC ERRORS/CHANGES MADE: No space should be inserted in front of and behind a number in superscript (in this case a "2"): ...>2<... and not: > 2 <

Example 3
SOURCE TEXT: <fmt id="1" tooltip="b" endtooltip="b">Interface Speed:</fmt> 6 Gb/s SAS
MT OUTPUT: <fmt id="1" tooltip="b" endtooltip="b"> Interfacesnelheid: 6 </fmt> Gb/s SAS
POST-EDITED OUTPUT: • Biedt een maximale capaciteit van 24 TB<fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT">2</fmt> per uitbreidingsbehuizing.
SPECIFIC ERRORS/CHANGES MADE: The number is inserted before the tag and should be after the tag

Example 4
SOURCE TEXT: <fmt id="1" tooltip="b" endtooltip="b">Intermixed Drive Capacities:</fmt> Yes
MT OUTPUT: <fmt id="1" tooltip="b" endtooltip="b"> Intermixed Capaciteit van de schijven: Ja </fmt>
POST-EDITED OUTPUT: ...</fmt> Ja
SPECIFIC ERRORS/CHANGES MADE: The string is inserted before the tag and should be after the tag (and again spacing before and after tags inserted)

Example 5
SOURCE TEXT: A new feature — DR Rapid Data Access — adds tighter integration with backup software applications, starting with Symantec OpenStorage-enabled backup applications.
MT OUTPUT: Een nieuwe functie - DR-Rapid Data Access - voegt strakkere integratie met back-uptoepassingen, beginnend met Symantec OpenStorage geschikte back-uptoepassingen.
POST-EDITED OUTPUT: ... — DR Rapid Data Access — ...
SPECIFIC ERRORS/CHANGES MADE: Please ensure any special characters like — (ChrW(151)) are preserved when inserting a TM proposal, and not replaced by a normal hyphen (ChrW(45)).
Can these errors be learned and corrected automatically?
Can we simplify or omit the “feedback loop”?
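Some of the feedback items are regular enough to be handled by a rule-based automatic post-editing step. As an illustration only, the sketch below addresses the first example (1.4kW in the MT output versus 1,4 kW after post-editing). The function name `localize_units_nl` and the unit list are assumptions for this sketch and are not part of any engine or toolkit mentioned in the deck.

```python
import re

# Hypothetical unit list; a real deployment would take it from the client style guide.
NUMBER_WITH_UNIT = re.compile(r"(\d+)[.,](\d+)\s*(kW|W|TB|GB)\b")


def localize_units_nl(text):
    """Rewrite decimals followed by a unit using a decimal comma and a single space."""
    return NUMBER_WITH_UNIT.sub(r"\1,\2 \3", text)


mt_output = ("Single-fase 7.7kW Opties variëren van 1.4kW om en driefasige PDU's, "
             "boordevol Output-aansluitingen, variëren van 8,6 kW tot 21.6kW.")
print(localize_units_nl(mt_output))
```

A rule like this only fixes the formatting of numbers and units. The word-order and terminology errors in the same segment still need an engine fix or a post-editor.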
•  How much more can we squeeze out of SMT phrase-based systems?
•  Factored models?
•  Deep syntactic/semantic structures?
•  Have a closer look at rule-based systems?
•  Deep Learning?
TRANSLATION QUALITY
QUALITY DEGRADATION WITH POST-EDITING?
POST-EDITING QUALITY RESULTS
No fails on one of our 28-language PE programs, thanks to correct
terminology choices and few, consistent errors.
DOMAIN ADAPTATION
•  How much can we get out of minimal amounts of data? A little more
data? Mixed-domain data?
•  Forcing dictionaries: fluency vs. adequacy? How can we seamlessly
integrate client/user dictionaries into standard SMT workflows?
•  How often to retrain?
•  Does dynamic/interactive/“live” retraining help solve the domain
relevance problem?
“History is filled with brilliant people who wanted to fix things and just
made them worse.”
― Chuck Palahniuk
CONTENT EXPLOSION
HOW USEFUL IS MT FOR UGC?
• We performed evaluations after normalization and domain
customization of SMT engines.
• Between 54% and 96% of travel reviews scored between 3 and 5 on
the Utility scale.
WHY DO WE CARE?
BACKPACKER WEBSITE REVIEWS
Translation purpose: youthful, 5 locales, cheap
MT + normalization; paid crowd with basic instructions on “do’s + don’ts”; crowd can be mix of translators / customers / …
LUXURY HOTEL REVIEWS
Translation purpose: attract high-end clientele in 1 particular target market
possibly MT with Full PE, but possibly professional HT / transcreation
TECHNICAL FORUM
Translation purpose: save cost on user support, as many locales as possible
MT + “accuracy check” PE; crowd of technical users, savvy on product, linguistic errors are ok
Global Commerce Global Consumer Pandora’s
Box of Brand Names and Geographic Locations
SOURCE CONTENT GONE WILD
•  Short forms (nite (night), sayin (saying), gr8 (great)),
•  Acronyms (lol (laugh out loud), iirc (if I remember correctly)),
•  Typing errors/misspellings (wouls (would), rediculous (ridiculous)),
•  Punctuation omissions/errors (im (I’m), dont (don’t)),
•  Non-dictionary slang (that was well mint (that was very good)),
•  Wordplay (that was soooooo great (that was so great)),
•  Censor avoidance (sh1t, f***),
•  Emoticons (:) (smileys), <3 (heart))
•  Foreign words used intentionally (al dente, bon voyage)
(Jiang et al, 2012; Clark & Araki, 2011)
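A minimal sketch of what a pre-MT normalization pass for some of these phenomena could look like. The lexicon below contains only the examples from the list above, and the names `LEXICON` and `normalize` are assumptions made for illustration; this is not the normalization pipeline used in the evaluations.

```python
import re

# Hypothetical lookup table seeded with the examples listed above.
LEXICON = {
    "nite": "night", "sayin": "saying", "gr8": "great",
    "lol": "laugh out loud", "iirc": "if I remember correctly",
    "wouls": "would", "rediculous": "ridiculous",
    "im": "I'm", "dont": "don't",
}

# A letter repeated three or more times ("soooooo"); digits are left alone
# so that numbers such as 1000 are not damaged.
ELONGATION = re.compile(r"([^\W\d_])\1{2,}")


def normalize(text):
    """Collapse letter elongations, then replace known short forms and misspellings."""
    out = []
    for token in text.split():
        token = ELONGATION.sub(r"\1", token)
        out.append(LEXICON.get(token.lower(), token))
    return " ".join(out)


print(normalize("im sayin that was soooooo gr8"))
```

Token-level lookup of this kind does not cover slang phrases, censor avoidance or emoticons, and it cannot tell an intentional foreign word from a typo. Those cases need context or a separate policy.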
TRAVEL USER REVIEWS
Non-native writers, typos, grammar errors, two authors with
completely different styles & opinions, idioms don’t make sense…
this is our source text.
NORMALIZING TRAVEL CONTENT
Any other normalization techniques?
ONLINE RETAILER GLOBAL LISTINGS
BEHIND THE SCENES
Proper Name Ant Farm
SOURCE: Basic MIDI Applications (Keyboard Magazine Library for Electronic Musicians), Ca
MT OUTPUT: Grundlegende MIDI-Anwendungen (Tastatur Magazin Library für elektronische Musiker), Ca

SOURCE: Analog Way smart cut 2 seamless video & computer switcher - hi res scaled output
MT OUTPUT: Analoge Weise smart cut 2 nahtlose Video & Computer-Umschalter - Hallo Res skalierte Ausgabe

SOURCE: TOO FAST TOP RAT BABY FANG VAMPIRE LIPS LEOPARD ROCKABILLY PINUP USA M IRON FIST
MT OUTPUT: ZU SCHNELL TOP RATTE BABY FANG VAMPIR LIPPEN LEOPARD ROCKABILLY PINUP GIRL USA M IRON FIST

SOURCE: YRU Youth Rise Up Kreep Platform Stacked Leopard Animal Suede Spike Studded Pump
MT OUTPUT: YRU Jugend Aufstieg Kreep Plattform gestapelt Leopard Tier Wildleder Spike beschlagene Pumpe
DO WE WANT TO MESS WITH THIS THING?
• Words missing in the target / extra words
• Terms are translated with different capitalization within the same message
• Incorrect positive / negative translation
• Lack of fluency
• Mix of formal and informal form of address
• Wrong translation for the context
WHAT IS “QUALITY” FOR A GERMAN TOWER
CRANE OPERATOR?
• Do I close or not close the valve
before re-pressurizing?
• Do you mean a wheel or the pulley?
Punctuation and numbers:
• Handling locale-specific punctuation: “Security Systems” to 「セキュリティ システム」
• Slashes: space or no space? on/off to вкл / выкл
• Number formatting, i.e. trailing characters: 33 % or 33%?
• Target language/locale numbering conventions: 44,500 to 44.500
• Intelligently match punctuation, i.e. remove unmatched quotes and parentheses
• Keep source capitalization or replace with target language capitalization conventions?
• Translate or transliterate addresses based on target language conventions
• OOV handling? Leave in the source language or Transliterate (flag for the user?)
• Handling DoNotTranslate without breaking the flow of the target sentence
• Handling Acronyms – does it expand? Which part of it is in the parentheses?
• Recognize groups as proper names (Herr Vogel is not a bird!)
A HUGE ONE: Preserving the negative or positive meaning (“Do remove this part” vs. “Do not
remove this part”); handling standalone negation in source with affixed negation in the
target
AN EQUALLY HUGE ONE: language identification – file level and even more so sub-file/
sentence level
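As one concrete instance of the punctuation items above, removing unmatched parentheses can be sketched with a simple stack. The function name `drop_unmatched_parens` is hypothetical, and the sketch covers round brackets only; quotes need locale-specific pairing rules.

```python
def drop_unmatched_parens(text):
    """Remove '(' and ')' characters that have no matching partner."""
    keep = [True] * len(text)
    open_positions = []          # indexes of '(' still waiting for a ')'
    for i, ch in enumerate(text):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if open_positions:
                open_positions.pop()
            else:
                keep[i] = False  # closing bracket with nothing to close
    for i in open_positions:
        keep[i] = False          # opening brackets that were never closed
    return "".join(ch for ch, k in zip(text, keep) if k)


print(drop_unmatched_parens("Press (OK) to continue)"))
```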
UGC SHOPPING LIST
WORKFLOW, FORMATTING + METADATA
QUIZ: COUNT THE INTEGRATION
‘Segment’ vs. ‘Sentence’
A segment can be a lot of things: a sentence, a part of a sentence, a
word. So if the engine is integrated on a “segment level”, ellipsis,
anaphora and other context-related features will not be taken into account.
It doesn’t take much to break segmentation: a line break, a carriage return
or anything ambiguous will do the job, causing damage on both the training and
the runtime side
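A small sketch of how a hard line break changes what the engine sees. Both functions are illustrative assumptions, not the segmentation rules of any particular CAT tool or MT system.

```python
import re


def segment_per_line(text):
    """Naive integration: every non-empty line becomes a segment."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def segment_after_unwrapping(text):
    """Join hard-wrapped lines first, then split on sentence-final punctuation."""
    unwrapped = re.sub(r"\s*\n\s*", " ", text).strip()
    return [s for s in re.split(r"(?<=[.!?])\s+", unwrapped) if s]


text = "Close the valve\nbefore re-pressurizing. Then open it."
print(segment_per_line(text))
print(segment_after_unwrapping(text))
```

Unwrapping every line break is itself a risky rule, since it also merges headings and list items into the neighbouring sentence. The point of the sketch is only that the two strategies hand different units to the engine.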
WHAT IS A ‘SEGMENT’?
“A camel is a horse designed by a committee.”
― Sir Alexander Arnold Constantine Issigonis
LOCALIZATION TAG PLACEMENT
This is what a plain-text engine will do:
To become verified and lift your sending limit, please confirm your email
address, then add a credit or prepaid card to your account and {30} {31}
{32} {33} {34} {35}confirm{36} {37} {38} it.{39}.
{30}Para hacerse verificado y levantar su límite de envío, por favor
confirme su dirección de correo electrónico, luego añada un crédito o
tarjeta de prepago a su cuenta de y confírmelo.{31}{32}{33}{34}{35}{36}{37}
{38}{39}
This is a<ph id="1" x="&lt;b&gt;">{1}</ph>test<ph id="1"
x="&lt;/b&gt;">{2}</ph>
Dies ist ein <ph id="1" x="&lt;b&gt;">{1}</ph>Test<ph id="1"
x="&lt;/b&gt;">{2}</ph>.
AND THIS IS WHAT’S NEEDED
TAG PROJECTION TECHNIQUES?
• We’ll be happy to consume more information, but then please expose more information
• Walls, zones, pre-processing, post-processing – can we do more?
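The pre-processing/post-processing idea can be sketched as masking inline tags before translation and restoring them afterwards. This is a simplified illustration under stated assumptions: the function names are hypothetical, the engine is assumed to pass `{n}` placeholders through untouched, and the Dutch string stands in for real engine output. It does not solve tag placement when word order changes.

```python
import re

TAG = re.compile(r"</?[A-Za-z][^>]*>")   # opening or closing inline tag
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def mask_tags(segment):
    """Replace each inline tag with a numbered placeholder; return text and tag list."""
    tags = []

    def replace(match):
        tags.append(match.group(0))
        return "{%d}" % len(tags)

    return TAG.sub(replace, segment), tags


def restore_tags(segment, tags):
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: tags[int(m.group(1)) - 1], segment)


def placeholders_intact(masked_source, masked_target):
    """Check that the target carries the same placeholders in the same order."""
    return PLACEHOLDER.findall(masked_source) == PLACEHOLDER.findall(masked_target)


source = '<fmt id="1" tooltip="b" endtooltip="b">Interface Speed:</fmt> 6 Gb/s SAS'
masked, tags = mask_tags(source)
target = "{1}Interfacesnelheid:{2} 6 Gb/s SAS"   # stand-in for engine output
if placeholders_intact(masked, target):
    print(restore_tags(target, tags))
```

The `placeholders_intact` check is a cheap way to flag output like the plain-text example above, where the text changes but the tags end up detached from the words they belong to; it would not catch every misplacement.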
AN ENEMY OR AN ALLY?
• Domain, sub-domain, product
• Timestamps - deprecate TU-s in the training data?
• XLIFF metadata fields that carry information about specific terms
• UI strings and other variables markup
• Annotation fields
Can localization metadata be helpful for MT?
WE WANT TO SHARE
• Human evaluation and ranking
• Source/MT/Edits corpora (for experimentation only)
• Productivity data per-segment (side-by-side with PE distance
and other metrics) – thanks to iOmegaT
• Database of correlations between automated scoring, human
ranking and PE effort and time
• Data on correlation between specific errors and translator
preference – can it help translator-focused confidence
scoring?
We have A LOT of "field" data:
DATA
Statistics from
internal database
CORRELATION RESULTS
Adequacy & Fluency versus Productivity Delta
Productivity and Fluency across all locales: cumulative Pearson’s r of 0.77, a very strong correlation
Productivity and Adequacy across all locales: cumulative Pearson’s r of 0.71, a very strong correlation
According to our data, Human Evaluations are stronger predictors
of post-editing productivity gains than Automatic metrics
including PE distance
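The deck does not define how PE distance is computed. One common reading, assumed here for illustration, is a word-level edit distance between the MT output and its post-edited version, normalized by the length of the longer one. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,              # deletion
                               current[j - 1] + 1,           # insertion
                               previous[j - 1] + (x != y)))  # substitution
        previous = current
    return previous[-1]


def pe_distance(mt_output, post_edited):
    """Word-level edit distance normalized to the range 0..1 (assumed definition)."""
    mt_tokens, pe_tokens = mt_output.split(), post_edited.split()
    longest = max(len(mt_tokens), len(pe_tokens))
    if longest == 0:
        return 0.0
    return edit_distance(mt_tokens, pe_tokens) / longest


print(pe_distance("ask why we can organize", "ask us because we can organize it"))
```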
CORRELATION RESULTS
Automatic Metrics versus Productivity Delta
Productivity delta and BLEU: cumulative Pearson’s r of 0.24, a weak positive relationship
Productivity and PE distance: Pearson’s r of -0.436. As PE distance increases,
indicating a greater effort from the post-editor, Productivity declines; it is a
strong negative relationship
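For readers who want to run the same kind of analysis on their own data, Pearson's r can be computed as below. The sample values are invented purely to exercise the function and are not taken from the internal database behind these slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical per-locale values: human fluency score vs. productivity delta (%).
fluency = [2.8, 3.1, 3.4, 3.9, 4.2]
productivity_delta = [5.0, 12.0, 10.0, 22.0, 30.0]
print(round(pearson_r(fluency, productivity_delta), 2))
```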
•  More transparency in workings of engine and training
•  Faster systems, shorter turnaround on large systems
•  More “wizards” for training and deployment
•  Easier testing methodologies without full deployments
•  More standardized scoring and comparison metrics
•  Predictive analysis of quality – confidence and utility scores
•  Normalization integrated into workflow and standardized
•  Industry-wide proper name and title library
•  Better transliteration standards
•  Morphologically aware terminology choices
•  More research on post-editing environments
1. How to display source/target
2. How to display multiple suggestions
3. Autocomplete
4. Better ways to calculate the productivity improvements with post-editing
•  More interoperability, so translators can stay in the CAT tool they prefer
•  Simplified workflows connecting MT engines and other tools
Dear Santa,
SHALL WE?
— Napoleon Hill
Thank you.

'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Complet Documnetation for Smart Assistant Application for Disabled Person
Complet Documnetation   for Smart Assistant Application for Disabled PersonComplet Documnetation   for Smart Assistant Application for Disabled Person
Complet Documnetation for Smart Assistant Application for Disabled Person
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptxAWS Community DAY Albertini-Ellan Cloud Security (1).pptx
AWS Community DAY Albertini-Ellan Cloud Security (1).pptx
 
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
✂️ 👅 Independent Andheri Escorts With Room Vashi Call Girls 💃 9004004663
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
Best VIP Call Girls Noida Sector 75 Call Me: 8448380779
 

The Enterprise User Perspective on MT

  • 1. What We Want, What We Need, What We Can’t Do Without The Enterprise User Perspective on MT Technology & Things Around It Olga Beregovaya VP, Language Tools
  • 2. ONE THING HAS BECOME ALL THINGS; EVERYTHING • Global Economy Multicultural Transactions • Multiple Demand-Driven Content Scenarios • Multiple Data Sources + Formats
  • 3. 27,500 words - 25,000 emoji lattices …” And what is the use of a book,” thought Alice, “without pictures or conversations?” ?
  • 4. “The limits of my language means the limits of my world.” ― Ludwig Wittgenstein “Never send a human to do a machine’s job.” ― Agent Smith
  • 5. • Areas of utmost interest and importance as seen from a major Language Service Provider perspective, or “a week in the life of a Language Technology group” • Engine Output Quality; what is it actually, “output quality”? Are we “stuck”? Next breakthrough? • Domain adaptation: what can we do when there is neither data nor budget to create it? • Supporting “Raw” publishing scenarios (UGC, Support, MT to be consumed by other applications) – there will be no human to fix it. Or will there? • Metadata – an enemy or an ally? • Collaboration: how can we make it interesting for everyone? AND WE’LL TALK ABOUT…
  • 6. •  What do translators appreciate? •  What do translators struggle most with? •  Fluency VS. Accuracy? •  Final output quality? WINS & CHALLENGES
  • 7. CONTENT DRIVING QUALITY DECISIONS
    LEVELS OF POST-EDITING | CONTENT TYPE | STRATEGY
    FULL POST-EDITING | Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life” | Post-edit to human translation levels, correct for terminology, grammar, fluency, style and voice
    MEDIUM POST-EDITING | Content that meets certain “impact” criteria: visibility, number of clicks, “shelf life” | Human translation level requirement but with flexible style and fluency allowances
    LIGHT POST-EDITING | Emphasis on quick turnaround and/or large volumes | Sliding scale depending on the content purpose
  • 8. CONSUMING MT — QUALITY SCENARIOS
    DOMAINS: TRAVEL, FINANCE, LEGAL
    TRAVEL
      SOURCE: Si queréis viajar en grupo, sóis al menos 6 personas y deseáis ir en otras fechas diferentes a las propuestas, preguntarnos porque lo podemos organizar ;-)
      TRANSCREATION: If there are at least 6 of you wishing to travel together, but on different dates that the ones offered, why don’t you ask us? We can arrange it. ;- )
      TRANSLATION: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
      FULL POST-EDITING: If you want to travel in a group, you are at least 6 people and you wish to travel on different dates other than the proposed ones, let us know because we can organize it.
      LIGHT POST-EDITING: If you want to travel in groups , are at least 6 people and wish to go on dates other than the proposed, ask us because we can organize ;- )
      RAW MT: If you want to travel in groups , sóis at least 6 people and wish to go on dates other than the proposals, ask why we can organize ;- )
    FINANCE
      SOURCE: A la hora de generar las cuotas de amortización es posible utilizar dos porcentajes distintos; el fiscal o el de mercado que la empresa establezca.
      TRANSCREATION: At the time of generating amortization fees, one might use two different percentages: Fiscal or Market established by the company.
      TRANSLATION: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
      FULL POST-EDITING: When generating the amortization fees, it is possible to use two different percentages; the fiscal percentage and the company established market one.
      LIGHT POST-EDITING: When generating the amortization fees is possible to use two different percentages; the fiscal or the market rate that the company stated.
      RAW MT: When generating the repayment is possible to use two different rates; the prosecutor or the market that the company stated.
    LEGAL
      SOURCE: Para el supuesto de que haya contratado el servicio de actualizaciones relativo al programa software objeto de licencia, usted podrá actualizar el mismo durante los periodos de vigencia que tenga contratado el servicio.
      TRANSCREATION: You can update the licensed software during the period stated in your contract.
      TRANSLATION: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
      FULL POST-EDITING: In case you have signed up for the update service related to the licensed software, you will be able to update the software during the validity periods stated in the contract.
      LIGHT POST-EDITING: Because you engaged the update service related to the licensed software program, you can update it during periods of validity of the service contracted
      RAW MT: For the assumption that engaged the update service software on the licensed program, you can update it during periods of validity has contracted the
  • 9. THE POST-EDITOR PRODUCES: Publishable quality The post-editor is responsible for ensuring that client quality requirements and style guide are met The post-editor is expected to adhere to client StyleGuide preferences with regard to:   Infinitive / Imperative   Passive / Impassive   Formal / Informal   Different Styles for Headers, Lists, Tables   Special Handling of UI Options (Bilingual, English, Target?)   Converting All the Measurements Based On the Local Conventions + Disambiguate Terminology + Correct all the grammatical errors
  • 10. THE POST-EDITOR RECEIVES:
    ERROR TYPE                    GERMAN  FRENCH  JAPANESE  RUSSIAN  CHINESE  SPANISH  ITALIAN  BRAZILIAN
    WRONG TERMINOLOGY               6.46    4.93     13.63     5.00     6.20     9.63     3.78       1.13
    WRONG SPELLING                  2.00    0.86      0.88     0.13     0.30     1.13     0.56       1.27
    SOURCE NOT TRANSLATED           6.38    5.36      3.88     5.13     3.60     2.50     1.22       1.73
    COMPLIANCE WITH CLIENT SPECS    2.46    0.86      3.00     2.13     0.70     0.63     0.44       2.60
    LITERAL TRANSLATION             7.85    8.64      5.00     4.00     9.40     5.38     7.67       7.93
    TEXT/INFO ADDED                 2.69    1.36      2.13     1.25     0.80     1.88     0.44       0.80
    CAPITALIZATION                  2.69    3.43      0.00     2.63     0.50     1.75     3.33       2.60
    WRONG WORD FORM                 6.77    7.79      0.13     9.88     0.60     6.75     3.67       6.75
    WRONG PART OF SPEECH            2.62    3.21      2.00     1.88     0.60     2.13     3.67       1.33
    PUNCTUATION                     4.46    3.00      0.75     3.38     4.10     2.13     1.22       3.53
    SENTENCE STRUCTURE             12.54   10.00     14.25     8.00    13.00     5.38     6.11       3.67
    TAGS + MARK-UP                  1.23    0.14      0.13     0.50     0.20     0.38     0.44       0.20
    LOCALE ADAPTATION               0.46    0.29      0.75     0.63     0.20     0.75     0.44       0.13
    SPACING                         0.92    0.36      2.25     1.25     4.00     0.50     0.33       0.40
    OTHER                           1.92    1.50      1.88     0.13     0.50     0.13     1.44       0.27
    TOTAL ERRORS                   61.46   51.71     50.63    45.88    44.70    41.00    34.78      32.53
  • 11. Most time-consuming issues that translators need to fix are: • Sentence structure (word order) • MT output too literal • Wrong terminology • Word form disagreements • Source term left untranslated OR, IN A NUTSHELL…
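As a rough cross-check of the list on slide 11, the per-language figures from slide 10 can be averaged per category and sorted. The sketch below copies those figures verbatim; weighting all eight languages equally is an assumption made here for illustration, and error frequency is only a proxy for the time a translator spends fixing an issue.

```python
# Error figures per category, copied from the slide 10 table.
# Column order: German, French, Japanese, Russian, Chinese, Spanish, Italian, Brazilian.
ERRORS = {
    "WRONG TERMINOLOGY": [6.46, 4.93, 13.63, 5.00, 6.20, 9.63, 3.78, 1.13],
    "WRONG SPELLING": [2.00, 0.86, 0.88, 0.13, 0.30, 1.13, 0.56, 1.27],
    "SOURCE NOT TRANSLATED": [6.38, 5.36, 3.88, 5.13, 3.60, 2.50, 1.22, 1.73],
    "COMPLIANCE WITH CLIENT SPECS": [2.46, 0.86, 3.00, 2.13, 0.70, 0.63, 0.44, 2.60],
    "LITERAL TRANSLATION": [7.85, 8.64, 5.00, 4.00, 9.40, 5.38, 7.67, 7.93],
    "TEXT/INFO ADDED": [2.69, 1.36, 2.13, 1.25, 0.80, 1.88, 0.44, 0.80],
    "CAPITALIZATION": [2.69, 3.43, 0.00, 2.63, 0.50, 1.75, 3.33, 2.60],
    "WRONG WORD FORM": [6.77, 7.79, 0.13, 9.88, 0.60, 6.75, 3.67, 6.75],
    "WRONG PART OF SPEECH": [2.62, 3.21, 2.00, 1.88, 0.60, 2.13, 3.67, 1.33],
    "PUNCTUATION": [4.46, 3.00, 0.75, 3.38, 4.10, 2.13, 1.22, 3.53],
    "SENTENCE STRUCTURE": [12.54, 10.00, 14.25, 8.00, 13.00, 5.38, 6.11, 3.67],
    "TAGS + MARK-UP": [1.23, 0.14, 0.13, 0.50, 0.20, 0.38, 0.44, 0.20],
    "LOCALE ADAPTATION": [0.46, 0.29, 0.75, 0.63, 0.20, 0.75, 0.44, 0.13],
    "SPACING": [0.92, 0.36, 2.25, 1.25, 4.00, 0.50, 0.33, 0.40],
    "OTHER": [1.92, 1.50, 1.88, 0.13, 0.50, 0.13, 1.44, 0.27],
}


def rank_categories(errors):
    """Return (category, mean) pairs sorted by descending mean across languages."""
    means = {name: sum(values) / len(values) for name, values in errors.items()}
    return sorted(means.items(), key=lambda pair: pair[1], reverse=True)


# Print the five categories with the highest unweighted mean.
for name, mean in rank_categories(ERRORS)[:5]:
    print(f"{name:<30}{mean:6.2f}")
```

With these figures the five highest means are sentence structure, literal translation, wrong terminology, wrong word form and source not translated, which are the same five issues the slide lists.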
  • 12. TOP 6 ON THE TRANSLATORS’ LOVE IT-LIST 1.  Source of inspiration: reduces thinking and translation choice time 2.  Provides reference - very useful to translators new to a specific domain 3.  Reduces typing & lookup time by handling repetitive terminology and structures well 4.  …thereby takes away the more monotonous efforts of translation 5.  Post-editors over time notice improvements; appreciate it more if they ‘co-own’ the engine 6.  MT output can be funny LOL!
  • 13. TOP 3 ON THE TRANSLATORS’ S*#!T-LIST 1. Wrong sentence structure • Major impact on the post-editing effort (Spanish and Portuguese produce fewest errors) • Japanese has the highest error rate and the lowest productivity gains (supported by the cognitive effort error ranking research) 2. Wrong and inconsistent terminology • Very time-consuming to check and fix terminology; + enough issues from Fuzzy Matches already • A major problem for new products where the terminology is not settled yet • Inconsistent output for UI references 3. Correct MT to an agreed standard (=quality expectations) • A challenging concept in the beginning for post-editors – they think they should edit less if the quality is bad S*#!T
  • 14. FEEDBACK LOOP
    Columns: SOURCE TEXT | MT OUTPUT | POST-EDITED OUTPUT | SPECIFIC ERRORS/CHANGES MADE
    1. SOURCE TEXT: Single-phase options range from 1.4kW to 7.7kW while three-phase PDUs, packed with output receptacles, range from 8.6kW to 21.6kW.
       MT OUTPUT: Single-fase 7.7kW Opties variëren van 1.4kW om en driefasige PDU's, boordevol Output-aansluitingen, variëren van 8,6 kW tot 21.6kW.
       POST-EDITED OUTPUT: 1,4 kW ... 7,7 kW ... 21,6 kW
       SPECIFIC ERRORS/CHANGES MADE: Numbers and measurement units are not converted properly and no spaces are inserted by the MT engine (3 out of 4 occurrences; 1 is correct, however, strange...)
    2. SOURCE TEXT: Single-phase options range from 1.4kW to 7.7kW while three-phase PDUs, packed with output receptacles, range from 8.6kW to 21.6kW.
       MT OUTPUT: • Biedt maximaal 24 TB <fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT"> 2 </fmt> maximale capaciteit per- uitbreidingsbehuizing toe te voegen.
       POST-EDITED OUTPUT: • Biedt een maximale capaciteit van 24 TB<fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT">2</fmt> per uitbreidingsbehuizing.
       SPECIFIC ERRORS/CHANGES MADE: No space should be inserted in front of and behind a number in superscript (in this case a "2"). ...>2<... and not: > 2 <
    3. SOURCE TEXT: <fmt id="1" tooltip="b" endtooltip="b">Interface Speed:</fmt> 6 Gb/s SAS
       MT OUTPUT: <fmt id="1" tooltip="b" endtooltip="b"> Interfacesnelheid: 6 </fmt> Gb/s SAS
       POST-EDITED OUTPUT: • Biedt een maximale capaciteit van 24 TB<fmt id="1" tooltip="SUPERSCRIPT" endtooltip="SUPERSCRIPT">2</fmt> per uitbreidingsbehuizing.
       SPECIFIC ERRORS/CHANGES MADE: The number is inserted before the tag and should be after the tag
    4. SOURCE TEXT: <fmt id="1" tooltip="b" endtooltip="b">Intermixed Drive Capacities:</fmt> Yes
       MT OUTPUT: <fmt id="1" tooltip="b" endtooltip="b"> Intermixed Capaciteit van de schijven: Ja </fmt>
       POST-EDITED OUTPUT: ...</fmt> Ja
       SPECIFIC ERRORS/CHANGES MADE: The string is inserted before the tag and should be after the tag (and again spacing before and after tags inserted)
    5. SOURCE TEXT: A new feature — DR Rapid Data Access — adds tighter integration with backup software applications, starting with Symantec OpenStorage-enabled backup applications.
       MT OUTPUT: Een nieuwe functie - DR-Rapid Data Access - voegt strakkere integratie met back-uptoepassingen, beginnend met Symantec OpenStorage geschikte back-uptoepassingen.
       POST-EDITED OUTPUT: ... — DR Rapid Data Access — ...
       SPECIFIC ERRORS/CHANGES MADE: Please ensure any special characters like — (ChrW(151)) are preserved when inserting a TM proposal, and not replaced by a normal hyphen (ChrW(45)).
    Can these errors be learned and corrected automatically? Can we simplify or omit the “feedback loop”?
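Some of the recurring errors in this feedback loop are regular enough that a rule-based post-processing step could catch them. The following is a minimal sketch under stated assumptions: the function names, the unit list and the regular expressions are hypothetical and cover only the Dutch number/unit spacing and superscript-tag spacing cases shown on the slide, not a general solution.

```python
import re

# Hypothetical rule: decimal number directly followed by a unit, e.g. "21.6kW".
# The unit list is an illustrative assumption, not an exhaustive inventory.
_MEASURE = re.compile(r"(\d+)[.,](\d+)\s*(kW|TB)\b")

# Hypothetical rule: a bare number padded with spaces inside an <fmt> tag pair.
_TAGGED_NUMBER = re.compile(r"(<fmt\b[^>]*>)\s+(\d+)\s+(</fmt>)")


def normalize_measurements_nl(text: str) -> str:
    """Use a decimal comma and a single space before the unit (Dutch convention)."""
    return _MEASURE.sub(lambda m: f"{m.group(1)},{m.group(2)} {m.group(3)}", text)


def tighten_tagged_numbers(text: str) -> str:
    """Remove the spaces the engine inserted around a number inside <fmt> tags."""
    return _TAGGED_NUMBER.sub(r"\1\2\3", text)


print(normalize_measurements_nl("variëren van 8,6 kW tot 21.6kW."))
print(tighten_tagged_numbers('24 TB<fmt id="1"> 2 </fmt> per uitbreidingsbehuizing'))
```

Rules like these only address the mechanical part of the loop. Word-order and tag-placement errors, such as the number ending up on the wrong side of the closing tag, need alignment information rather than string patterns.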
  • 15. •  How much more can we squeeze out of SMT phrase-based systems? •  Factored models? •  Deep syntactic/semantic structures? •  Have a closer look at rule-based systems? •  Deep Learning? TRANSLATION QUALITY
  • 16. QUALITY DEGRADATION WITH POST-EDITING?
  • 17. POST-EDITING QUALITY RESULTS No fails on one of our 28-language PE programs, thanks to correct terminology choices and few, consistent errors.
  • 18. DOMAIN ADAPTATION •  How much can we get out of minimal amounts of data? A little more data? Mixed-domain data? •  Forcing dictionaries: fluency vs. adequacy? How can we seamlessly integrate client/user dictionaries into standard SMT workflows? •  How often to retrain? •  Does dynamic/interactive/“live” retraining help solve the domain relevance problem?
  • 19. “History is filled with brilliant people who wanted to fix things and just made them worse.” ― Chuck Palahniuk
  • 21. HOW USEFUL IS MT FOR UGC? • We performed evaluations after normalization and domain customization of SMT engines. • Between 54% and 96% of travel reviews scored between 3 and 5 on the Utility scale.
  • 22. WHY DO WE CARE?
    BACKPACKER WEBSITE REVIEWS
      Translation purpose: youthful, 5 locales, cheap
      Approach: MT + normalization; paid crowd with basic instructions on “do’s + don’ts”; crowd can be mix of translators / customers / …
    LUXURY HOTEL REVIEWS
      Translation purpose: attract high-end clientele in 1 particular target market
      Approach: possibly MT with Full PE, but possibly professional HT / transcreation
    TECHNICAL FORUM
      Translation purpose: save cost on user support, as many locales as possible
      Approach: MT + “accuracy check” PE; crowd of technical users, savvy on product, linguistic errors are ok
  • 23. Global Commerce Global Consumer Pandora’s Box of Brand Names and Geographic Locations
  • 24. SOURCE CONTENT GONE WILD •  Short forms (nite (night), sayin (saying), gr8 (great)), •  Acronyms (lol (laugh out loud), iirc (if I remember correctly)), •  Typing errors/misspellings (wouls (would), rediculous (ridiculous)), •  Punctuation omissions/errors (im (I’m), dont (don’t)), •  Non-dictionary slang (that was well mint (that was very good)), •  Wordplay (that was soooooo great (that was so great)), •  Censor avoidance (sh1t, f***), •  Emoticons (:) (smileys), <3 (heart)) •  Foreign words used intentionally (al dente, bon voyage) (Jiang et al, 2012; Clark & Araki, 2011)
  • 25. TRAVEL USER REVIEWS Non-native writers, typos, grammar errors, two authors with completely different styles & opinions, idioms don’t make sense… this is our source text.
  • 26. NORMALIZING TRAVEL CONTENT Any other normalization techniques?
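One baseline normalization technique for the noise types listed on slide 24 is a lookup lexicon combined with a rule for letter elongation. The sketch below is an illustration under assumptions: the lexicon holds only the examples quoted on that slide, and the function name and regular expressions are hypothetical. A production normalizer would need a far larger lexicon and context-sensitive handling.

```python
import re

# Lexicon built only from the examples on slide 24 (short forms, typos,
# punctuation omissions). Real coverage would require much more data.
LEXICON = {
    "nite": "night",
    "sayin": "saying",
    "gr8": "great",
    "wouls": "would",
    "rediculous": "ridiculous",
    "im": "I'm",
    "dont": "don't",
}

# A letter repeated three or more times is treated as wordplay ("soooooo").
_ELONGATION = re.compile(r"([A-Za-z])\1{2,}")
_TOKEN = re.compile(r"[A-Za-z0-9']+")


def normalize_ugc(text: str) -> str:
    """Collapse elongated letters, then replace known noisy tokens."""
    collapsed = _ELONGATION.sub(r"\1", text)
    return _TOKEN.sub(
        lambda m: LEXICON.get(m.group(0).lower(), m.group(0)), collapsed
    )


print(normalize_ugc("im sayin that nite was soooooo gr8"))
```

Collapsing a run to a single letter is deliberately crude: it handles "soooooo" but would turn "cooool" into "col", so a real system would check candidates against a dictionary before committing to a replacement.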
  • 28. BEHIND THE SCENES Proper Name Ant Farm
    SOURCE: Basic MIDI Applications (Keyboard Magazine Library for Electronic Musicians), Ca
    MT OUTPUT: Grundlegende MIDI-Anwendungen (Tastatur Magazin Library für elektronische Musiker), Ca
    SOURCE: Analog Way smart cut 2 seamless video & computer switcher - hi res scaled output
    MT OUTPUT: Analoge Weise smart cut 2 nahtlose Video & Computer-Umschalter - Hallo Res skalierte Ausgabe
    SOURCE: TOO FAST TOP RAT BABY FANG VAMPIRE LIPS LEOPARD ROCKABILLY PINUP USA M IRON FIST
    MT OUTPUT: ZU SCHNELL TOP RATTE BABY FANG VAMPIR LIPPEN LEOPARD ROCKABILLY PINUP GIRL USA M IRON FIST
    SOURCE: YRU Youth Rise Up Kreep Platform Stacked Leopard Animal Suede Spike Studded Pump
    MT OUTPUT: YRU Jugend Aufstieg Kreep Plattform gestapelt Leopard Tier Wildleder Spike beschlagene Pumpe
  • 29. DO WE WANT TO MESS WITH THIS THING? • Words missing in the target / extra words • Terms are translated with different capitalization within the same message • Incorrect positive / negative translation • Lack of fluency • Mix of formal and informal form of address • Wrong translation for the context
  • 30. WHAT IS “QUALITY” FOR A GERMAN TOWER CRANE OPERATOR? • Do I close or not close the valve before re-pressurizing? • Do you mean a wheel or the pulley?
  • 31. Punctuation and numbers: • Handling locale-specific punctuation: “Security Systems” to「セキュリティ システム」 • Slashes: space or no space? on/off to вкл / выкл • Number formatting, i.e. trailing characters: 33 % or 33%? • Target language/locale numbering conventions: 44,500 to 44.500 • Intelligently match punctuation, i.e. remove unmatched quotes and parentheses • Keep source capitalization or replace with target language capitalization conventions? • Translate or transliterate addresses based on target language conventions • OOV handling? Leave in the source language or transliterate (flag for the user?) • Handling DoNotTranslate without breaking the flow of the target sentence • Handling acronyms – does it expand? Which part of it is in the parentheses? • Recognize groups as proper names (Herr Vogel is not a bird!) A HUGE ONE: Preserving the negative or positive meaning (“Do remove this part” vs. “Do not remove this part”); handling standalone negation in source with affixed negation in the target AN EQUALLY HUGE ONE: language identification – file level and even more so sub-file/sentence level UGC SHOPPING LIST
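The "remove unmatched parentheses" item on this list is one of the few that can be handled with a purely mechanical rule. A minimal sketch, assuming only round brackets matter and that dropping the stray character is the desired fix (the function name is hypothetical):

```python
def drop_unmatched_parens(text: str) -> str:
    """Remove every '(' or ')' that has no matching partner."""
    keep = [True] * len(text)
    open_positions = []  # indices of '(' still waiting for a ')'
    for index, char in enumerate(text):
        if char == "(":
            open_positions.append(index)
        elif char == ")":
            if open_positions:
                open_positions.pop()  # matched pair, keep both
            else:
                keep[index] = False  # closing bracket with no opener
    for index in open_positions:
        keep[index] = False  # opener that was never closed
    return "".join(char for char, kept in zip(text, keep) if kept)


print(drop_unmatched_parens("Hotel was fine (breakfast included) but noisy)"))
```

Quotes are harder than brackets because opening and closing straight quotes look identical, and the polarity and language-identification items on the same list cannot be reduced to a character-level rule at all.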
  • 33. QUIZ: COUNT THE INTEGRATION
  • 34. ‘Segment’ vs. ‘Sentence’ A segment can be a lot of things - a sentence, a part of a sentence, a word, so if the engine is integrated on a “segment level”, ellipsis, anaphora and other context-related features will not be taken into account. Doesn’t take much to break segmentation: a line break, a carriage return or anything ambiguous will do the job – damage both on the training and the runtime side WHAT IS A ‘SEGMENT’?
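To illustrate how a stray line break splits one sentence into two segments, and one possible repair, here is a small sketch. It assumes a simple heuristic: a segment that does not end in sentence-final punctuation is merged with the next one. The function name and the punctuation set are hypothetical, and the heuristic would wrongly merge headings or list items that legitimately lack a final period.

```python
SENTENCE_END = (".", "!", "?")


def merge_broken_segments(segments):
    """Join consecutive segments until one ends in sentence-final punctuation."""
    merged = []
    buffer = ""
    for segment in segments:
        piece = segment.strip()
        if not piece:
            continue  # skip empty segments produced by blank lines
        buffer = f"{buffer} {piece}".strip()
        if buffer.endswith(SENTENCE_END):
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)  # trailing fragment without final punctuation
    return merged


# A line break inside the first sentence produced two segments.
raw = ["Close the valve before", "re-pressurizing.", "Then restart the pump."]
print(merge_broken_segments(raw))
```

Applying the same repair to both the training data and the runtime input keeps the two sides consistent, which is the point the slide makes about damage on both ends.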
  • 35. “A camel is a horse designed by a committee.” ― Sir Alexander Arnold Constantine Issigonis
  • 36. LOCALIZATION TAG PLACEMENT This is what a plain-text engine will do: To become verified and lift your sending limit, please confirm your email address, then add a credit or prepaid card to your account and {30} {31} {32} {33} {34} {35}confirm{36} {37} {38} it.{39}. {30}Para hacerse verificado y levantar su límite de envío, por favor confirme su dirección de correo electrónico, luego añada un crédito o tarjeta de prepago a su cuenta de y confírmelo.{31}{32}{33}{34}{35}{36}{37} {38}{39}
  • 37. This is a<ph id="1" x="&lt;b&gt;">{1}</ph>test<ph id="1" x="&lt;/b&gt;">{2}</ph> Dies ist ein <ph id="1" x="&lt;b&gt;">{1}</ph>Test<ph id="1" x="&lt;/b&gt;">{2}</ph>. AND THIS IS WHAT’S NEEDED
  • 38. TAG PROJECTION TECHNIQUES? • We’ll be happy to consume more information, but then please expose more information • Walls, zones, pre-processing, post-processing – can we do more?
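One common family of tag projection techniques relies on word alignment between source and target. The sketch below is an assumption-laden illustration, not a description of any particular engine: it presumes the engine exposes alignment as (source index, target index) pairs, and both function names are hypothetical. It reuses the "This is a test" example from slide 37 with the placeholders {1} and {2}.

```python
def project_span(source_span, alignment):
    """Map an inclusive source token span to the target span it aligns to.

    alignment is a list of (source_index, target_index) pairs. Returns None
    when no token inside the span is aligned.
    """
    start, end = source_span
    targets = [tgt for src, tgt in alignment if start <= src <= end]
    if not targets:
        return None
    return min(targets), max(targets)


def wrap_span(tokens, span, open_tag, close_tag):
    """Attach the opening and closing tags to the edges of the target span."""
    start, end = span
    wrapped = list(tokens)
    wrapped[start] = open_tag + wrapped[start]
    wrapped[end] = wrapped[end] + close_tag
    return " ".join(wrapped)


# Source: This is a {1}test{2}  ->  the tag pair wraps source token 3.
target_tokens = ["Dies", "ist", "ein", "Test"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3)]  # hypothetical engine output
span = project_span((3, 3), alignment)
print(wrap_span(target_tokens, span, "{1}", "{2}"))
```

When the aligned target tokens are not contiguous, taking the minimum and maximum index wraps everything in between, which may pull unrelated words inside the tag pair. That trade-off is part of why exposing richer engine information matters.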
  • 39. AN ENEMY OR AN ALLY? • Domain, sub-domain, product • Timestamps - deprecate TU-s in the training data? • XLIFF metadata fields that carry information about specific terms • UI strings and other variables markup • Annotation fields Can localization metadata be helpful for MT?
  • 40. WE WANT TO SHARE We have A LOT of "field" data: • Human evaluation and ranking • Source/MT/Edits corpora (for experimentation only) • Productivity data per-segment (side-by-side with PE distance and other metrics) – thanks iOmegaT • Database of correlations between automated scoring, human ranking and PE effort and time • Data on correlation between specific errors and translator preference – can it help translator-focused confidence scoring?
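The slides do not define how PE distance is calculated. One common reading, assumed here, is the character-level Levenshtein distance between the raw MT output and its post-edited version, normalized by the length of the longer string. The function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def pe_distance(mt_output: str, post_edited: str) -> float:
    """Normalized edit distance in [0, 1]; 0 means the MT was left untouched."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return levenshtein(mt_output, post_edited) / longest


# Fragment from the feedback-loop slide: decimal point and missing space fixed.
print(pe_distance("21.6kW", "21,6 kW"))
```

Word-level or TER-style variants are equally plausible definitions. Whichever is used, it has to be the same across segments for the side-by-side comparison with productivity data to be meaningful.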
  • 42. CORRELATION RESULTS Adequacy & Fluency versus Productivity Delta Productivity and Fluency across all locales with a cumulative Pearson’s r of 0.77, a very strong correlation Productivity and Adequacy across all locales with a cumulative Pearson’s r of 0.71, a very strong correlation According to our data, Human Evaluations are stronger predictors of post-editing productivity gains than Automatic metrics including PE distance
  • 43. CORRELATION RESULTS Automatic Metrics versus Productivity Delta Productivity delta and BLEU with a cumulative Pearson’s r of 0.24, a weak positive relationship With a Pearson’s r of -0.436, as PE distance increases, indicating a greater effort from the post-editor, Productivity declines; it is a strong negative relationship
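For readers who want to run the same kind of analysis on their own data, Pearson's r can be computed directly from paired per-segment or per-locale values. The sketch below uses made-up numbers purely to show the calculation; they are not the figures behind the slides, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical illustration data, NOT taken from the presentation.
fluency_scores = [2.1, 2.8, 3.0, 3.6, 4.2]
productivity_delta = [5.0, 12.0, 10.0, 22.0, 30.0]
print(round(pearson_r(fluency_scores, productivity_delta), 3))
```

The same function applies unchanged to BLEU or PE distance in place of the fluency scores, which is how the human and automatic metrics can be compared on an equal footing.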
  • 44. •  More transparency in workings of engine and training •  Faster systems, shorter turnaround on large systems •  More “wizards” for training and deployment •  Easier testing methodologies without full deployments •  More standardized scoring and comparison metrics •  Predictive analysis of quality – confidence and utility scores •  Normalization integrated into workflow and standardized •  Industry-wide proper name and title library •  Better transliteration standards •  Morphologically aware terminology choices •  More research on post-editing environments 1. How to display source/target 2. How to display multiple suggestions 3. Autocomplete 4. Better ways to calculate the productivity improvements with post-editing •  More interoperability, so translators can stay in the CAT tool they prefer •  Simplified workflows connecting MT engines and other tools Dear Santa,