TAUS Best Practices: Error Typology Guidelines, May 2013
Quality Evaluation using an Error Typology Approach

WHY ARE TAUS INDUSTRY GUIDELINES NEEDED?
Error typology is currently the standard approach to quality evaluation. There is some consistency in its application across the industry, but there is also variability in categories, granularity, penalties and so on. It is a largely manual process, focused only on small samples, and takes time and money to apply. Providing guidelines for best practice will enable the industry to:
• Adopt a more standard approach to error typologies, ensuring a shared language and understanding between translation buyers, suppliers and evaluators
• Move towards increased automation of this quality evaluation mechanism
• Better track and compare performance across projects, languages and vendors
For quality evaluation based on the error typology, limit the number of error categories
• The most commonly used categories are: Language, Terminology, Accuracy and Style.
• Diagnostic evaluations that seek to understand in detail the nature or cause of errors may require a more detailed error typology. For further details on error categories, refer to the TAUS DQF Framework Knowledgebase. The Error Typology should be flexible enough to allow for additional or sub-categories, if required.
Establish clear definitions for each category
• The commonly used category of ‘Language’ could be ambiguous, but an error in this category generally means a grammatical, syntactic or punctuation error.
• The category of ‘Accuracy’ is applied when incorrect meaning has been transferred or there has been an unacceptable omission or addition in the translated text.
• The category of ‘Terminology’ is applied when a glossary or other standard terminology source has not been adhered to.
• The category of ‘Style’ can be quite subjective; subjectivity can be reduced by defining it as ‘contravention of the style guide’. Where an error of this type occurs, reference should be made to a specific guideline within the target-language-specific style guide.
• List typical examples to help evaluators select the right category.
• Add different weightings to each error type depending on the content type.
Have no more than four severity levels
• The established practice is to have four severity levels: Minor, Major, Critical and Neutral. ‘Neutral’ applies when a problem needs to be logged but is not the fault of the translator, or to inform of a mistake that will be penalized if made in the future.
• Different thresholds exist for minor, major and critical errors. These should be flexible, depending on the content type, end-user profile and perishability of the content. For further information, refer to the TAUS DQF Framework Knowledgebase.
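The categories, severity levels and weightings described above can be combined into a simple penalty score. The sketch below uses the four categories and four severity levels named in these guidelines; the specific numeric weights, the pass threshold and the per-1,000-words normalization are illustrative assumptions, not TAUS-mandated values.

```python
# Severity weights and the pass threshold are illustrative assumptions;
# 'Neutral' is logged but carries no penalty.
SEVERITY_WEIGHTS = {"Minor": 1, "Major": 5, "Critical": 10, "Neutral": 0}
CATEGORIES = {"Language", "Terminology", "Accuracy", "Style"}

def quality_score(errors, word_count, threshold_per_1000=25):
    """errors: list of (category, severity) tuples logged by the evaluator.

    Returns (penalty points per 1,000 words, whether the sample passes).
    """
    for category, severity in errors:
        if category not in CATEGORIES or severity not in SEVERITY_WEIGHTS:
            raise ValueError(f"Unknown category/severity: {category}/{severity}")
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    normalized = penalty * 1000 / word_count  # penalty points per 1,000 words
    return normalized, normalized <= threshold_per_1000

score, passed = quality_score(
    [("Accuracy", "Major"), ("Language", "Minor"), ("Style", "Neutral")],
    word_count=1200,
)
```

Keeping the weights in a single table makes it easy to apply the different weightings per content type that the guidelines recommend, by swapping in an alternative `SEVERITY_WEIGHTS` mapping.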
Include a positive category/positive action for excellent translations
Acknowledging excellence is important for ensuring continued high levels of quality. Translators often complain that they only receive feedback when it is negative and hear nothing when they do an excellent job.
Use a separate QE metric for DTP and UI text
Specific issues arise for DTP (e.g. formatting, graphics) and for UI text (e.g. truncations), so each warrants its own metric.
Provide text in context to facilitate the best possible review process
• Seeing the translated text as the end user will see it better enables the evaluator to review the impact of errors.
• Allow reviewers to review chunks of coherent text, rather than isolated segments.
• Ideally, the translation should be carried out in a context-rich environment, especially if the quality evaluation is to be carried out in such an environment.
To ensure consistent quality, human evaluators must meet minimum requirements
• Ensure minimum requirements are met by developing training materials, screening tests, and guidelines with examples.
• Evaluators should be native or near-native speakers, familiar with the domain of the data.
• Evaluators should ideally be available to perform one evaluation pass without interruption.
Determine when your evaluations are suited for benchmarking by making sure results are repeatable
• Define tests and test sets for each model and determine minimal requirements for inter-rater agreement.
• Train and retain evaluator teams.
• Establish scalable and repeatable processes by using tools and automated processes for data preparation, evaluation setup and analysis.
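Inter-rater agreement can be quantified with a standard statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch follows; the pass/fail labels and any acceptance floor (e.g. treating kappa below 0.6 as insufficient agreement) are illustrative assumptions, not values prescribed by these guidelines.

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two evaluators labelling the same segments."""
    assert len(ratings_a) == len(ratings_b) and ratings_a
    n = len(ratings_a)
    # Observed agreement: fraction of segments where both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    labels = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(label) / n) * (ratings_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
kappa = cohens_kappa(a, b)  # 5/6 observed agreement, 0.5 expected by chance
```

Computing kappa over a shared test set before each benchmarking round gives a repeatable check that evaluator teams remain calibrated.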
Capture evaluation results automatically to enable comparisons across time, projects and vendors
• Use color-coding for comparing performance over time, e.g. green for meeting or exceeding expectations, amber to signal a reduction in quality, red for problems that need addressing.
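The green/amber/red scheme above maps naturally onto a small status function. In this sketch, `score` means penalty points per 1,000 words (lower is better), and the target value of 25 is an illustrative assumption rather than a TAUS-defined threshold.

```python
def rag_status(current_score, previous_score=None, target=25):
    """Color-code an evaluation result; scores are penalty points, lower is better."""
    if current_score > target:
        return "red"    # problems that need addressing
    if previous_score is not None and current_score > previous_score:
        return "amber"  # within target, but quality dipped since last evaluation
    return "green"      # meeting or exceeding expectations
```

Applied to each captured result, this yields a comparable status history across projects, languages and vendors.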
Implement a CAPA (Corrective Action, Preventive Action) process
Best practice is to have a process in place to deal with quality issues: corrective action processes along with preventive action processes. Examples might include the provision of training or the improvement of terminology management processes.
Further resources
For TAUS members: for information on when to use an error typology approach, detailed standard definitions of categories, examples of thresholds, a step-by-step process guide, a ready-to-use template and guidance on training evaluators, please refer to the TAUS Dynamic Quality Framework Knowledgebase.
Our thanks to:
Sharon O'Brien (TAUS Labs) for drafting these guidelines, and the following organizations for reviewing and refining the guidelines at the TAUS Quality Evaluation Summit, 15 March 2013, Dublin: ABBYY Language Services, Capita Translation and Interpreting, CLS Communication, Crestec, EMC Corporation, Intel, Jensen Localization, Jonckers Translation & Engineering s.r.o., KantanMT, Lexcelera, Lingo24, Lionbridge, Logrus International, McAfee, Microsoft, Moravia, Palex Languages & Software, Safaba Translation Solutions, STP Nordic, Trinity College Dublin, University of Sheffield, Vistatec, Welocalize and Yamagata Europe.
Consultation and Publication
A public consultation was undertaken between 11 and 24 April 2013. The guidelines were published on 2 May 2013.
Feedback
To give feedback on how to improve the guidelines, please write to firstname.lastname@example.org.