Overview of Multidimensional Quality Metrics (QTLaunchPad)
Translation Quality Assessment: Five Easy Steps
Using Multidimensional Quality Metrics to improve quality assessment and management
Prepared by the QTLaunchPad project (email@example.com)
Version 1.0 (26 April 2013)
Who does this apply to?
• Requesters of translation services looking for relevant quality metrics
• Language Service Providers (LSPs) delivering translation services to their clients
• The following materials apply to negotiation between requesters and providers
• This description does not apply to individual translators (although they may want to be aware of the contents)
Basic questions about your project
E.g.:
• What languages are you working in?
• What is your subject field?
• What sort of project is it (e.g., user interface, documentation, advertising)?
• What technology are you using (MT, CAT, etc.)?
• What register and style are you using?
Based on your specifications…
• The MQM recommendation tool will:
  • suggest a pre-defined metric used for similar projects, or
  • recommend a custom metric that applies to your project
• You are free to modify the metric as needed
• The tool creates a metrics specification file that:
  • defines the issues to be examined
  • provides weights (descriptions of how important the issues are)
• The metrics specification file can be used by an MQM-compliant tool
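For illustration only, such a specification might be sketched as a small JSON document. The field names and structure below are hypothetical, not the actual MQM specification-file format; they show only the idea of pairing issue types with weights.

```python
import json

# Hypothetical sketch of a metrics specification: issue types to examine,
# each with a weight describing its importance. Field names are
# illustrative, not the real MQM file format.
metric_spec = {
    "name": "example-metric",
    "issues": [
        {"type": "Orthography",    "dimension": "Fluency",  "weight": 1.0},
        {"type": "Grammar",        "dimension": "Fluency",  "weight": 1.0},
        {"type": "Mistranslation", "dimension": "Accuracy", "weight": 1.0},
        {"type": "Omission",       "dimension": "Accuracy", "weight": 0.2},
    ],
}

spec_json = json.dumps(metric_spec, indent=2)
print(spec_json)
```

A file like this could then be read by any tool that understands the agreed format, so requester and provider evaluate against the same issue list.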
Three options:
1. Sampling: Examine a portion of the text to determine whether to pass or fail the entire text. Sampling can utilize quality estimation for better results.
2. Full error analysis: Review the entire text (needed for critical legal or safety texts).
3. Rubric: Rate the text on a numerical scale (suitable for quick assessment of suitability).
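The sampling option can be sketched as follows. The sample size, the pass threshold, and the `error_count` hook are illustrative assumptions; in practice the chosen metric defines what counts as an error.

```python
import random

def sample_pass_fail(segments, error_count, sample_size=50,
                     max_error_rate=0.02, seed=42):
    """Pass/fail an entire text by examining only a random sample of segments.

    `error_count(segment)` stands in for whatever error-annotation step
    the chosen metric defines; the thresholds here are illustrative.
    """
    rng = random.Random(seed)
    sample = rng.sample(segments, min(sample_size, len(segments)))
    errors = sum(error_count(seg) for seg in sample)
    words = sum(len(seg.split()) for seg in sample)
    return errors / max(words, 1) <= max_error_rate

# Toy usage: a text whose sampled segments contain no flagged errors passes.
segments = [f"segment {i} with some words" for i in range(200)]
print(sample_pass_fail(segments, error_count=lambda seg: 0))  # True
```

Quality estimation, as noted on the next slide, could replace the random choice here by directing the sample toward segments predicted to be problematic.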
Automated Metrics
• If sampling is used, MQM's quality estimation tools will help focus sampling on those parts of the text that need attention
• Automatic metrics can be used in some cases where human evaluation is too expensive or time-consuming
Evaluation…
• Can be conducted by the requester or LSP in accordance with the agreement between the parties
• Follows the method chosen in Step 3 (evaluation method)
• Issues must match the metric chosen in Step 2: issues not found in the metric should not be considered errors
MQM provides capabilities
For human evaluation:
• Inline markup provides an audit trail:
  • allows independent verification of errors
  • helps ensure that issues are corrected
• Full reporting functions:
  • see what types of errors are reported
  • understand where errors come from
For automatic evaluation:
• Integrated use of existing quality metrics to help provide evaluation
translate5
• These capabilities are being integrated into an open-source editing tool, translate5 (http://www.translate5.net)
• All results are free to implement in additional tools (both open source and proprietary)
• Parties interested in development should contact firstname.lastname@example.org
The source matters
• Full MQM evaluation includes the source
• Source quality evaluation can help identify reasons for problems and resolve them
• Translators can be rewarded for addressing source deficiencies (scores over 100% are possible!)
Scoring Formula
(Q = whatever set of issues is being counted within the bigger formula)
• Provides consistency with the LISA QA Model scoring method
• Can be customized to support other legacy systems
• Can be applied to individual parts of the overall formula: i.e., fluency, accuracy, grammar, etc. subscores can be derived
• Weights (not shown) can be used to adjust the importance of various issue types
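The formula itself appears only as an image on the original slide. A LISA-QA-Model-style penalty score of the kind described can be sketched as follows; the 1/5/10 severity multipliers and the per-word normalization are assumptions inferred from the worked example in the Step 5 scoring table, not stated on this slide.

```python
def subscore(minor, major, critical, weight=1.0, words=1000,
             multipliers=(1, 5, 10)):
    """Penalty-based quality subscore for one issue type (Q).

    Assumed scheme: minor/major/critical errors carry penalties of
    1/5/10, scaled by the issue weight and normalized per word.
    """
    penalty = (minor * multipliers[0] + major * multipliers[1]
               + critical * multipliers[2]) * weight
    return 100.0 * (1 - penalty / words)

# E.g., 8 minor, 2 major, 1 critical orthography errors in 1000 words:
print(round(subscore(8, 2, 1), 1))  # 97.2
```

Because each issue type is scored independently, the same function yields the fluency, accuracy, and per-issue subscores the slide mentions.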
Scores help guide decisions
• Scores are given on a 100% basis
• Scores can be broken down into more fine-grained reports
  • E.g., a score of 96% could have 100% accuracy but 92% fluency
• Helps target actions for quality control
1. Specifications

Parameter               Value
Language/Locale         Source: English; Target: Japanese
Subject field/domain    Medical
Text type               Narrative
Audience                Educated readers with an interest in medicine
Purpose                 Education about a new procedure for managing diabetes
Register                Moderately formal
Style                   No specified style; match source if possible
Content correspondence  Literal translation
Output modality         Subtitles (speech to text)
File format             Time-coded XML for dotSub
Production technology   Human translation
2. Recommended Metric

Issue type            Weight (high, medium, low)  Notes
Fluency
  Orthography         High
  Grammar             High
Accuracy
  Mistranslation      High
  Omission            Low     Due to the nature of captions, some information loss is expected. Captions should be 60% of spoken dialogue.
  Untranslated        High
Legal requirements    High    Must make sure that legal claims are admissible under Japanese law.
Chosen from…
Issue types are a subset of the full catalog of types
Quality Formula (1)

TQ = (Atr + At - As) + (Ft - Fs)
with respect to specifications

• TQ = translation quality
• Atr = accuracy (transfer)
• At = accuracy for the target text
• As = accuracy for the source text
• Ft = fluency score for the target text
• Fs = fluency score for the source text
Quality Formula (2)

TQ = (Atr + At - As) + (Ft - Fs)
with respect to specifications

Definition: A quality translation demonstrates required accuracy and fluency for the audience and purpose and complies with all other negotiated specifications, taking into account end-user needs.

(On the original slide, the gold-highlighted portion of the definition corresponds to the dimensions, i.e., the specifications.)
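To make the arithmetic concrete, the formula can be transcribed directly; the sample values below are arbitrary and serve only to show how a deficient source raises the achievable score.

```python
def translation_quality(a_transfer, a_target, a_source, f_target, f_source):
    """TQ = (Atr + At - As) + (Ft - Fs), with respect to specifications.

    Inputs are scores on a 100% basis. A source text with deficiencies
    (As or Fs below 100) raises TQ, which is how a translator who fixes
    source problems can score above 100%.
    """
    return (a_transfer + a_target - a_source) + (f_target - f_source)

# Perfect source, slightly imperfect transfer and fluency gain:
print(translation_quality(99.0, 100.0, 100.0, 97.0, 95.0))  # 101.0
```

With a flawless source and target (all terms at 100), TQ is exactly 100, matching the 100%-basis scoring described earlier.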
3. Evaluation method
• In this example, portions of the text are marketing: sampling is an acceptable evaluation method for these parts
• Other portions contain legal and regulatory claims: full error analysis is required for those portions
• Inline markup can be used via the MQM namespace (because the text is in XML) to ensure corrections are made
4. Evaluation
• Evaluation includes subsegment markup with issues in the metric
• Issues are stored in the MQM namespace to allow audit and revision
• Users can select three severity levels:
  • critical: the issue renders the text unusable
  • major: the issue leaves the text usable, but is an obstacle to understanding
  • minor: the issue does not impact usability of the text

(screenshot: translate5.net showing the MQM markup tool)
5. Scoring

Issue type            Weight  Minor  Major  Critical  Penalty  Adjusted  Total
Fluency
  Orthography         1.0     8      2      1         28       28        97.2%
  Grammar             1.0     6      2      0         16       16        98.4%
  Subtotal                                                     44        95.6%
Accuracy
  Mistranslation      1.0     4      0      0         4        4         99.6%
  Omission            0.2     12     4      1         42       8.4       99.2%
  Untranslated        1.0     1      0      0         1        1         99.9%
  Legal requirements  1.0     0      0      1         10       10        99.0%
  Subtotal                                                     23.4      97.7%
Total                                                          67.4      93.3%

Assumes a 1000-word sample. Because Omission is considered a low priority in this case, it is given a low weight.
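The scoring on this slide can be reproduced programmatically. The 1/5/10 minor/major/critical multipliers are inferred from the printed penalty values rather than stated on the slide, so treat them as an assumption.

```python
# (issue, weight, minor, major, critical) taken from the table above.
rows = [
    ("Orthography",        1.0, 8, 2, 1),
    ("Grammar",            1.0, 6, 2, 0),
    ("Mistranslation",     1.0, 4, 0, 0),
    ("Omission",           0.2, 12, 4, 1),
    ("Untranslated",       1.0, 1, 0, 0),
    ("Legal requirements", 1.0, 0, 0, 1),
]
WORDS = 1000  # 1000-word sample
MULT = {"minor": 1, "major": 5, "critical": 10}  # inferred multipliers

total = 0.0
for name, weight, minor, major, critical in rows:
    penalty = (minor * MULT["minor"] + major * MULT["major"]
               + critical * MULT["critical"])
    adjusted = penalty * weight  # weight scales the raw penalty
    total += adjusted
    print(f"{name:18s} penalty={penalty:5.1f} adjusted={adjusted:5.1f} "
          f"score={100 * (1 - adjusted / WORDS):.1f}%")

print(f"Total adjusted penalty: {total:.1f} "
      f"-> overall {100 * (1 - total / WORDS):.1f}%")
```

Running this yields the same per-issue scores and the overall 93.3% shown in the table, which suggests the multipliers are consistent with the slide's arithmetic.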
5. Scoring
• Without weighting of Omission, the score would be 89.9%
• We can see that the translator has more problems with fluency than with accuracy
5. Full scoring (including source)

Issue type            Source  Target  Adjusted
Fluency
  Orthography         96.1%   97.2%   101.1%
  Grammar             99.0%   98.4%   99.6%
  Subtotal            95.1%   95.6%   100.5%
Accuracy
  Mistranslation      (100%)  99.6%   99.6%
  Omission            (100%)  99.2%   99.2%
  Untranslated        (100%)  99.9%   99.9%
  Legal requirements  (100%)  99.0%   99.0%
  Subtotal            100%    97.7%   97.7%
Total                 95.1%   89.9%   98.2%

Assumes a 1000-word sample. Source accuracy is set to 100% for computational purposes.
5. Scoring (including source)
• In many cases, some problems in a translation are not caused by the translator.
• In this case, the translator fixed problems in the source, resulting in better quality for fluency in the target. The translator should be recognized for this work.
For more information
• Please visit http://www.qt21.eu/launchpad/
• Write to email@example.com