Overview of Multidimensional Quality Metrics (QTLaunchPad)




  1. Translation Quality Assessment: Five Easy Steps
     Using Multidimensional Quality Metrics to improve quality assessment and management
     Prepared by the QTLaunchPad project (info@qt21.eu), version 1.0 (26 April 2013)
  2. Who does this apply to?
     • Requesters of translation services looking for relevant quality metrics
     • Language Service Providers (LSPs) delivering translation services to their clients
     • The following materials will apply to negotiation between requesters and providers
     • This description does not apply to individual translators (although they may want to be aware of the contents)
  3. Step 1: Specifications
  4. Basic questions about your project
     E.g.,
     • What languages are you working in?
     • What is your subject field?
     • What sort of project is it (e.g., user interface, documentation, advertising)?
     • What technology are you using (MT, CAT, etc.)?
     • What register and style are you using?
  5. Step 2: Select Metrics
  6. Based on your specifications…
     • The MQM recommendation tool will:
       • suggest a pre-defined metric used for similar projects, or
       • recommend a custom metric that applies to your project
     • You are free to modify the metric as needed
     • Create a metrics specification file that:
       • defines the issues to be examined
       • provides weights (descriptions of how important the issues are)
     • The metrics specification file can be used by an MQM-compliant tool
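To make the idea of a metrics specification file concrete, the sketch below models one as structured data. The schema shown is purely illustrative (the field names and layout are assumptions, not a defined MQM file format); the issue types and weights are taken from the recommended metric in the example later in the deck.

```python
# Hypothetical metric specification; the schema here is illustrative only,
# not an official MQM file format.
metric_spec = {
    "project": "medical-subtitles-en-ja",  # assumed project identifier
    "issues": [
        {"type": "Fluency/Orthography", "weight": "high"},
        {"type": "Accuracy/Mistranslation", "weight": "high"},
        {"type": "Accuracy/Omission", "weight": "low",
         "note": "Some information loss is expected in captions"},
    ],
}

# An MQM-compliant tool would read such a file to know which issues to flag
# and how heavily to weight them.
high_priority = [i["type"] for i in metric_spec["issues"] if i["weight"] == "high"]
print(high_priority)
```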
  7. Step 3: Evaluation Method
  8. Three options:
     1. Sampling: Examine a portion of the text to determine whether to pass or fail the entire text. Sampling can utilize quality estimation for better results.
     2. Full error analysis: Review the entire text (needed for critical legal or safety texts).
     3. Rubric: Rate the text on a numerical scale (suitable for quick assessment of suitability).
  9. Automated Metrics
     • If sampling is used, MQM's quality estimation tools will help focus sampling on those parts of the text that need attention
     • Automatic metrics can be used in some cases where human evaluation is too expensive or time-consuming
 10. Step 4: Evaluation
 11. Evaluation…
     • Can be conducted by the requester or LSP in accordance with the agreement between the parties
     • Follows the method chosen in Step 3 (evaluation method)
     • Issues must match the metric chosen in Step 2: issues not found in the metric should not be considered errors
 12. MQM provides capabilities
     • For human evaluation:
       • Inline markup provides an audit trail:
         • Allows independent verification of errors
         • Helps ensure that issues are corrected
       • Full reporting functions:
         • See what types of errors are reported
         • Understand where errors come from
     • For automatic evaluation:
       • Integrated use of existing quality metrics to help provide evaluation
 13. translate5
     • These capabilities are being integrated into an open-source editing tool, translate5 (http://www.translate5.net)
     • All results are free to implement in additional tools (both open source and proprietary)
     • Parties interested in development should contact info@qt21.eu
 14. The source matters
     • Full MQM evaluation includes the source
     • Source quality evaluation can help identify reasons for problems and resolve them
     • Translators can be rewarded for addressing source deficiencies (scores over 100% are possible!)
 15. Step 5: Scoring
 16. Scoring Formula
     (Q = whatever set of issues is being counted within the bigger formula)
     • Provides consistency with the LISA QA Model scoring method
     • Can be customized to support other legacy systems
     • Can be applied to individual parts of the overall formula: i.e., fluency, accuracy, grammar, etc. subscores can be derived
     • Weights (not shown) can be used to adjust the importance of various issue types
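The formula itself did not survive in this text version of the slide. Based on the worked example later in the deck, the subscore for an issue set Q appears to be 100% minus its weighted penalty per evaluated word; a minimal sketch under that assumption:

```python
def subscore(penalty: float, word_count: int) -> float:
    """Score for an issue set Q: 1.0 minus its weighted penalty per word.

    Assumes the LISA-style penalty-per-word reading of the (lost) slide
    formula; the penalty is already weighted by issue-type importance.
    """
    return 1.0 - penalty / word_count

# From the worked example later in the deck: a 28-point Orthography penalty
# over a 1000-word sample yields a 97.2% subscore.
print(round(subscore(28, 1000), 3))
```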
 17. Scores help guide decisions
     • Scores are given on a 100% basis
     • Scores can be broken down into more fine-grained reports
       • E.g., a score of 96% could have 100% accuracy but 92% fluency
     • Helps target actions for quality control
 18. Example
 19. 1. Specifications
     • Language/Locale: Source: English; Target: Japanese
     • Subject field/domain: Medical
     • Text type: Narrative
     • Audience: Educated readers with an interest in medicine
     • Purpose: Education about a new procedure for managing diabetes
     • Register: Moderately formal
     • Style: No specified style; match source if possible
     • Content correspondence: Literal translation
     • Output modality: Subtitles (speech to text)
     • File format: Time-coded XML for dotSub
     • Production technology: Human translation
 20. 2. Recommended Metric (weights: high, medium, low)
     • Fluency
       • Orthography: High
       • Grammar: High
     • Accuracy
       • Mistranslation: High
       • Omission: Low (because these are captions, some information loss is expected; captions should be 60% of spoken dialogue)
       • Untranslated: High
     • Legal requirements: High (must make sure that legal claims are admissible under Japanese law)
 21. Chosen from…
     Issue types are a subset of the full catalog of types
 22. Chosen from…
 23. Quality Formula (1)
     TQ = (Atr + At - As) + (Ft - Fs), with respect to specifications
     • TQ = translation quality
     • Atr = accuracy (transfer)
     • At = accuracy for the target text
     • As = accuracy for the source text
     • Ft = fluency score for the target text
     • Fs = fluency score for the source text
 24. Quality Formula (2)
     TQ = (Atr + At - As) + (Ft - Fs), with respect to specifications
     Definition: A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.
     The gold portion = dimensions (specifications)
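Taken at face value, the formula can be computed directly from the subscores. In the sketch below, the numbers come from the full-scoring slide later in the deck; setting Atr to 100% is an assumption, made because transfer errors are already folded into the target accuracy subscore in that example.

```python
def translation_quality(atr, at, a_s, ft, fs):
    """TQ = (Atr + At - As) + (Ft - Fs), all values as percentages."""
    return (atr + at - a_s) + (ft - fs)

# Subscores from the worked example: target accuracy 97.7, source accuracy
# fixed at 100, target fluency 95.6, source fluency 95.1.
# A target fluency above the source fluency raises TQ above the raw
# target score, which is how scores over 100% on a dimension can arise.
tq = translation_quality(atr=100.0, at=97.7, a_s=100.0, ft=95.6, fs=95.1)
print(round(tq, 1))  # 98.2
```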
 25. 3. Evaluation method
     • In this example, portions of the text are marketing: sampling is an acceptable evaluation method for these parts
     • Other portions contain legal and regulatory claims: full error analysis is required for those portions
     • Inline markup can be used via the MQM namespace (because the text is in XML) to ensure corrections are made
 26. 4. Evaluation
     • Evaluation includes subsegment markup with the issues in the metric
     • Issues are stored in the MQM namespace to allow audit and revision
     • Users can select three severity levels:
       • critical: the issue renders the text unusable
       • major: the issue leaves the text usable, but is an obstacle to understanding
       • minor: the issue does not impact usability of the text
     (screenshot: translate5.net showing the MQM markup tool)
 27. 5. Scoring

     Issue type         | Weight | Minor | Major | Critical | Penalty | Adjusted | Total
     Fluency
       Orthography      | 1.0    | 8     | 2     | 1        | 28      | 28       | 97.2%
       Grammar          | 1.0    | 6     | 2     | 0        | 16      | 16       | 98.4%
       Subtotal         |        |       |       |          |         | 44       | 95.6%
     Accuracy
       Mistranslation   | 1.0    | 4     | 0     | 0        | 4       | 4        | 99.6%
       Omission         | 0.2    | 12    | 4     | 1        | 42      | 8.4      | 99.2%
       Untranslated     | 1.0    | 1     | 0     | 0        | 1       | 1        | 99.9%
       Legal requirements | 1.0  | 0     | 0     | 1        | 10      | 10       | 99.0%
       Subtotal         |        |       |       |          |         | 23.4     | 97.7%
     Total              |        |       |       |          |         | 67.4     | 93.3%

     Assumes a 1000-word sample. Because Omission is considered a low priority in this case, it is given a low weight.
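The penalties in this table are consistent with severity multipliers of 1 (minor), 5 (major), and 10 (critical); those multipliers are inferred from the figures, not stated on the slide. A sketch that reproduces the table under that assumption:

```python
# Severity multipliers inferred from the slide's figures (an assumption,
# not stated explicitly in the deck): minor=1, major=5, critical=10.
SEVERITY = {"minor": 1, "major": 5, "critical": 10}
WORDS = 1000  # the slide assumes a 1000-word sample

# issue type: (weight, minor count, major count, critical count)
issues = {
    "Orthography":        (1.0, 8, 2, 1),
    "Grammar":            (1.0, 6, 2, 0),
    "Mistranslation":     (1.0, 4, 0, 0),
    "Omission":           (0.2, 12, 4, 1),
    "Untranslated":       (1.0, 1, 0, 0),
    "Legal requirements": (1.0, 0, 0, 1),
}

def adjusted_penalty(weight, minor, major, critical):
    # Raw penalty from severity-weighted error counts, then scaled by
    # the issue-type weight (e.g., Omission counts at only 0.2).
    raw = (minor * SEVERITY["minor"]
           + major * SEVERITY["major"]
           + critical * SEVERITY["critical"])
    return weight * raw

total = sum(adjusted_penalty(*counts) for counts in issues.values())
score = 1.0 - total / WORDS
print(round(total, 1), f"{score:.1%}")  # 67.4 93.3%
```

Dropping Omission's 0.2 weight replaces its 8.4 adjusted penalty with the raw 42, giving a total of 101 and a score of 89.9%, matching the next slide.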
 28. 5. Scoring
     • Without the weighting of Omission, the score would be 89.9%
     • We can see that the translator has more problems with fluency than with accuracy
 29. 5. Full scoring (including source)

     Issue type         | Source | Target | Adjusted
     Fluency
       Orthography      | 96.1%  | 97.2%  | 101.1%
       Grammar          | 99.0%  | 98.4%  | 99.6%
       Subtotal         | 95.1%  | 95.6%  | ☞ 100.5%
     Accuracy
       Mistranslation   | (100%) | 99.6%  | 99.6%
       Omission         | (100%) | 99.2%  | 99.2%
       Untranslated     | (100%) | 99.9%  | 99.9%
       Legal requirements | (100%) | 99.0% | 99.0%
       Subtotal         | 100%   | 97.7%  | 97.7%
     Total              | 95.1%  | 89.9%  | 98.2%

     Assumes a 1000-word sample. Source accuracy is set to 100% for computational purposes.
 30. 5. Scoring (including source)
     • In many cases, some problems in a translation are not caused by the translator
     • In this case, the translator fixed problems in the source, resulting in better fluency in the target. The translator should be recognized for this work.
 31. For more information
     • Please visit http://www.qt21.eu/launchpad/
     • Write to info@qt21.eu