Post-Editing of Machine Translation: Developing Requirements and Compensation Schemes

Establish clear and exhaustive requirements and develop fair and reasonable compensation schemes for PEMT jobs.

  • Monolingual post-editors: experts in the domain, but not bilingual. Bilingual post-editors: professional translators with domain expertise who are trained to understand MT issues, so they not only correct the error in a sentence but also help create rules for the MT engine to follow. A fully qualified professional translator needs two sets of skills to translate a text: the language skills to understand the source language and write well in the target language, and the domain knowledge to understand the content of a possibly very specialized technical document. Both skill sets may be hard to find, especially in combination. In fact, it is common practice in the translation industry to differentiate translators according to their qualifications.
  • While automated metrics such as BLEU and human metrics such as Edit Distance are useful indicators of quality, by themselves they do not provide enough information. Productivity is the metric that matters most to LSPs, as it relates directly to profit margin. Measuring productivity gives LSPs and post-editors a simple means of determining a fair rate for MT post-editing: the rate can be based on the productivity gain realized with an MT + human approach and the reduced effort required to deliver the same quality of output. If post-editing MT is 3 times faster than a human-only translation approach, there is justification for reducing the rate to about 33% of the regular rate. Since such a figure usually comes from a small sample, however, it is wiser to adjust the rate upwards to a level that can accommodate more variance in the MT output. From the translator's perspective, the pay per word is lower, but the pay per hour is higher overall.
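
The arithmetic in the note above can be sketched as follows. This is a minimal illustration, not a pricing formula from the deck: the regular rate, the baseline translation speed, and the 50% safety buffer are all hypothetical values chosen only to make the numbers concrete.

```python
def break_even_rate(regular_rate: float, speedup: float) -> float:
    """Per-word rate at which hourly earnings match translating from scratch."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return regular_rate / speedup


def hourly_earnings(rate_per_word: float, words_per_hour: float) -> float:
    """Earnings per hour at a given per-word rate and throughput."""
    return rate_per_word * words_per_hour


regular_rate = 0.12      # hypothetical per-word rate for translation from scratch
translation_speed = 300  # hypothetical words/hour without MT
speedup = 3.0            # "3 times faster", as in the note

floor = break_even_rate(regular_rate, speedup)  # about 33% of the regular rate
pemt_rate = floor * 1.5                         # hypothetical buffer for MT variance

print(f"break-even rate: {floor:.3f} per word")
print(f"buffered PEMT rate: {pemt_rate:.3f} per word")
print(f"hourly, translation: {hourly_earnings(regular_rate, translation_speed):.2f}")
print(f"hourly, post-editing: {hourly_earnings(pemt_rate, translation_speed * speedup):.2f}")
```

With these assumed figures the post-editor is paid less per word than for translation but more per hour, which is the outcome the note describes.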

Transcript

  • 1. Post-Editing of Machine Translation (PEMT): Developing Requirements and Compensation Schemes
  • 2. Overview
    • MT 101
    • Estimation of MT quality
    • Definition of post-editing
    • PEMT requirements
    • PEMT skills
    • Pricing and compensation
    © 2013 Luigi Muzii, Developing PEMT Requirements and Compensation Schemes
  • 3. Caveat emptor
    • Your presenter is an MT enthusiast
    • Using MT since 1990
    • No bias
    • MT is here to stay
    • No special knowledge
    • Could sound trivial
      • Common sense
  • 4. Methods
    • Rule-based Machine Translation (RbMT)
      • Transfer
      • Interlingua
    • Data-driven (stochastic) machine translation
      • Example-based Machine Translation (EbMT)
      • Statistical Machine Translation (SMT)
    • Both methods can work for projects where MT is suitable
    • Most commercial systems are now hybrids of some sort
    • Post-processing and cleanup
  • 5. Rule-based Machine Translation (RbMT)
    • Analytical approach
    • Grammatical representation of language
      • Morphological analysis
        – Inflection and conjugation of words
      • Syntactic analysis
        – Sequence of words and sentence structure
      • Semantic analysis
        – Meaning of words in context
    • Heavy dependence on bilingual dictionaries
  • 6. RbMT Issues
    • Disambiguation
    • Plain, correct, and consistent source
    • Constant refinement of rules
    • Accurate and comprehensive dictionaries
  • 7. Example-based Machine Translation (EbMT)
    • Bilingual corpus
    • Body of reference for similarities
      • Combination of segments
        – Best approximation
    • Fuzzy matching algorithms
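
To make "fuzzy matching" concrete, here is a minimal sketch that looks up the most similar stored segment for a new sentence. It uses word-level similarity from Python's standard `difflib` module as a stand-in; this is an assumption for illustration, not the algorithm of any particular EbMT system or CAT tool, and the two-entry memory is invented.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(segment, memory):
    """Return (score, source, target) for the stored source most similar to segment."""
    best = (0.0, None, None)
    query = segment.lower().split()
    for source, target in memory:
        # ratio() is 2 * matched_words / total_words across both segments
        score = SequenceMatcher(None, query, source.lower().split()).ratio()
        if score > best[0]:
            best = (score, source, target)
    return best


# Hypothetical bilingual reference segments (English -> Italian)
memory = [
    ("Press the red button to stop the machine.",
     "Premere il pulsante rosso per arrestare la macchina."),
    ("Open the cover before cleaning.",
     "Aprire il coperchio prima della pulizia."),
]

score, source, target = best_fuzzy_match(
    "Press the green button to stop the machine.", memory
)
print(f"{score:.1%} match: {target}")
```

The retrieved target still needs the one differing word corrected, which is why such matches are treated as an approximation to be edited rather than as a finished translation.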
  • 8. Statistical Machine Translation (SMT)
    • Empirical strategy
    • Statistical probability
      • Statistical assessment of word and phrase positioning within segments from the corpus
        – Brute-force computing
  • 9. Two Models for Learning Data
    • Translation model
      • Words and word sequences in the SL are used to find the most likely corresponding words in the TL
    • Target-language model
      • The most likely way in which the corresponding TL words will be combined
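
The interplay of the two models can be sketched with a toy example. It follows the classic noisy-channel view of SMT, in which each candidate is scored by a translation-model probability multiplied by a target-language-model probability; the slide does not name this formulation, and every candidate and probability below is made up for illustration.

```python
# Hypothetical translation-model scores: how well each candidate's words
# correspond to the source sentence (word order is not judged here).
translation_model = {
    "the house is small": 0.30,
    "small the house is": 0.30,
    "the home is little": 0.20,
}

# Hypothetical target-language-model scores: how likely each word
# sequence is as a sentence in the target language.
language_model = {
    "the house is small": 0.050,
    "small the house is": 0.001,
    "the home is little": 0.010,
}


def best_candidate(tm, lm):
    """Pick the candidate with the highest combined score tm * lm."""
    return max(tm, key=lambda candidate: tm[candidate] * lm[candidate])


best = best_candidate(translation_model, language_model)
print(best)
```

The first two candidates tie on the translation model because they contain the same words; the language model breaks the tie in favor of the fluent word order.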
  • 10. EbMT & SMT Issues
    • Parallel corpora
    • Analysis is challenging
      • Problems with large corpora
    • Translation memories instead of corpora
    • Segmentation for accuracy and matching
      • Problems with alignment
  • 11. Ambiguity in SMT
    • Translation models handle word sequences
    • Likelihood of reproducing a wrong interpretation if it is in the model
    • Collocations (dependencies between words) could be hard to capture
    • Target-language models
  • 12. SWOT Analysis for MT
    • Speed
    • Volumes
    • Consistency
    • Complexity
    • Error incidence
    • Amount of skills, expertise and understanding needed
    • Not commonplace
    • Least engaging, highly rewarding, non-binding content
    • Areas for improvement
    • Training and customization
    • Language data and rules optimization
    • Writing
    • Controlled languages
    • Reliability
    • Problematic ROI
  • 13. Modes of Use [diagram; labels: Unrestricted texts, High quality, Restricted input, Low quality, Impractical, Interactive, Fully automatic]
  • 14. Estimating MT quality
    • Biased baseline: MT is always bad
    • Most translators do not know much about MT
      • Same old jokes about silly mistakes
      • A mixture of ignorance and fear
    • Automatic metrics
    • Hard to interpret
    • PEMT effort
    • Annotation guidelines
      • Assigning 1-5 scores
  • 15. Automatic Metrics
    • BLEU
      • No indication of accuracy
      • 0 ≤ P ≤ 1
        – 1 = professional human translation
        – 0.65 = human quality
      • A score increase does not necessarily mean improved translation quality
    • NIST
      • Based on BLEU, with some alterations
    • METEOR (Metric for Evaluation of Translation with Explicit ORdering)
      • Based on BLEU
      • Harmonic mean of unigram precision and recall
    • F-Measure (F1 Score or F-Score)
      • A measure of a test’s accuracy used in machine learning
      • Based on BLEU, with some alterations
    • WER (Word Error Rate)
      • Most often used in speech recognition
      • Typically 0 ≤ P ≤ 1 (lower is better)
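
As an illustration of one of these metrics, WER is commonly computed as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The sketch below assumes plain whitespace tokenization and equal cost for substitutions, insertions, and deletions; real evaluation tools typically normalize the text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] holds the distance between the ref words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, a hypothesis much longer than its reference can score above 1, so the upper bound holds only in typical cases.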
  • 16. Prerequisites
    • Post-editing throughput must be higher than translation throughput
    • Post-editing must be less keyboard-intensive than translation
    • Post-editing must be less cognitively demanding than translation
  • 17. PEMT Effort
    • System
      • RbMT
        – Dictionary
        – Rules
        – Customizability
      • Data-driven
        – Suitability of input
        – Training data (volume, domain)
      • Product captivity
        – Different technologies that can or cannot be used within more than one tool
    • Language pair
      • The outcome in one language combination cannot be compared with that in another
    • Text type
    • Domain
  • 18. PEMT Issues
    • System
      • RbMT
        – Incorrect word/term
        – Incorrect attachment
        – Meaning not disambiguated
      • Data-driven
        – Missing words
        – Capitalization
        – Punctuation
        – Fluency inconsistency
  • 19. Post-editing
    • Gist
      • Raw MT
        – Disposable (volatile UGC)
        – Validation of automatic evaluation
    • Light
      • Making the translation understandable
        – Ignoring all stylistic issues
        – Correcting mechanical errors (capitalization and punctuation)
        – Replacing unknown words (misspelled in ST)
        – Removing redundant words
    • Heavy
      • Making the translation stylistically appropriate
        – Fixing machine-induced meaning distortion
        – Making grammatical and syntactic adjustments
        – Checking terminology (new terms)
        – Partially or completely rewriting sentences, adjusting for target language fluency
  • 20. Degrees of Post-editing
    • User requirements
    • Quality expectations
    • Perishability
    • Volume
    • Text function
    • Turn-around time
  • 21. PEMT Specs
    • Type of MT
      • In-house vs. outsourced MT
    • Type of MT output
      • Generic, untrained MT output
      • Trained MT output
    • Quality guidelines and index for raw translation
      • ≥ 40% reusable
      • BLEU
        – Acceptable: 0.3 to 0.5
        – Good: 0.5 to 1
    • Request a sample
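
The BLEU bands on this slide could be encoded in a job specification roughly as follows. This is a sketch under stated assumptions: the slide lists 0.5 in both bands, so treating exactly 0.5 as "good" is a choice made here, and the label for scores under 0.3 is invented because the slide does not name that band.

```python
def bleu_band(score: float) -> str:
    """Map a BLEU score in [0, 1] to the quality bands listed on the slide."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("BLEU score must be between 0 and 1")
    if score < 0.3:
        return "below acceptable"  # hypothetical label, not from the slide
    if score < 0.5:
        return "acceptable"
    return "good"                  # assumption: 0.5 itself counts as good


for sample_score in (0.22, 0.41, 0.63):
    print(sample_score, bleu_band(sample_score))
```

A band is only a first filter; the slide's advice to request a sample still applies, since a score says little about the kind of errors a post-editor will face.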
  • 22. PEMT Specs
    • Rationale for MT
      • Increased throughput
        – More languages
        – More content
      • Faster turnaround time
      • Reduced cost
      • Accuracy and consistency
    • Target consumer
    • Quality of the finished product
      • Reprocessing
      • Publication
    • Amount and type of PEMT
      • Gist
      • Light
      • Heavy
    • Participation in ongoing training of engine
  • 23. Warning
    • MT engines are not all equal
    • Raw output quality is not consistent from system to system and language to language
    • MT error patterns are not consistent from segment to segment
  • 24. PEMT Instructions
    • Clear and concise
    • Tools
    • Stick to style guide
      • Language-specific conventions
      • Country/region standards
      • Grammar, syntax and orthographic conventions
    • Retain as much raw translation as possible
    • Don’t hesitate too long over a problem
    • Don’t worry about style
    • Don’t embark on time-consuming research
    • Make changes only where absolutely necessary
      • Nonsense
      • Wrong words (possibly misspelled in source text)
      • Missing words
      • Punctuation, capitalization
      • Inflection
      • Gender
      • Word order
      • Formatting
  • 25. PEMT Instructions
    • Example for heavy PEMT project
    • The quality expected is publishable quality. This means no deletions or omissions in the text, full accuracy and no mistranslations with regard to the source text, compliance with the grammar and spelling rules of the target language, and compliance with the terminology in the glossary provided.
    • Make as few edits as possible
      • Just correct errors
        – Follow the glossary
        – If different terminology is found, replace it with the one in the glossary
      • Do not introduce preferential changes
      • Do not rewrite text, except to correct nonsense
      • Do not try to ‘improve’ the text
  • 26. PEMT skills
    • Working knowledge of SL
    • Excellent command of TL
    • Specialized domain knowledge
    • Ability to comply with guidelines
    • Unbiased attitude towards MT
    • Establish your own PEMT academy
  • 27. Pricing and Compensation
    • It will take a year or two more to build out a widely accepted and dominant compensation model
    • The final model will be tied to productivity
  • 28. Pricing and Compensation
    • Common approaches
      • Paying as for high fuzzy matches
        – 85%-94%
        – PEMT and post-editing of fuzzy matches are deeply different
        – Fuzzy matches are inherently correct segments: minor changes (possibly a term or two)
        – MT is not necessarily inherently correct: even ‘light PEMT’ could eventually turn out to be heavy
      • Paying a time-based fee
  • 29. Always
    • Provide candidate post-editors with MT samples before contracting
    • Agree on throughput rates
      • 450 to 750 words/hour
    • Run a pilot project
      • For every domain
      • For any new combination
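
The agreed throughput range translates directly into an effort estimate for a job. The sketch below uses the 450 to 750 words/hour range from the slide; the job size and the hourly fee are hypothetical values for illustration only.

```python
def effort_range(word_count: int, slow: float = 450, fast: float = 750):
    """Return (best_case_hours, worst_case_hours) for a post-editing job."""
    if word_count < 0 or slow <= 0 or fast < slow:
        raise ValueError("invalid word count or throughput range")
    return word_count / fast, word_count / slow


job_words = 9000   # hypothetical job size
hourly_fee = 30.0  # hypothetical time-based fee

best_hours, worst_hours = effort_range(job_words)
print(f"effort: {best_hours:.1f} to {worst_hours:.1f} hours")
print(f"cost:   {best_hours * hourly_fee:.2f} to {worst_hours * hourly_fee:.2f}")
```

The spread between the two ends of the range is what a pilot project is meant to narrow for a given domain and language combination.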
  • 30. Compensation Grid
    • General
    • Method
    • Type of output
      • Generic, trained or untrained
    • Quality of output
    • Number of references
    • Quality expectations
      • Threshold
    • File formats
      • Tagging
    • Time-based fee
    • Productivity rate
      • Productivity differs by post-editor
    • Time for filling in QA forms
      • For ongoing training of engine
    • Use a spreadsheet to track time