TRANSLATION QUALITY ASSESSMENT REDEFINED: from TQI to competences and suitability. Demid Tishin, All Correct Language Solutions, www.allcorrect.ru
What are they thinking about when they look at the target text?
Client: Will it blend?* Let's find a flaw… (*Just a joke: "Will it do", I mean.)
Quality manager: Will it blend? I wish the client said OK…
HR / Vendor Manager: What kind of work can I trust to this provider? What can I not? How quickly can we train him?
Project Manager: Return for improvement, or correct by other resources?
To answer these questions, the target text needs assessment.
TRANSLATION ASSESSMENT: THE ARMORY
What assessment techniques do you know?
TRANSLATION ASSESSMENT: THE ARMORY
- Subjective assessment ("good / bad")
- Comparing with the source according to a parameter checklist
- Automated comparison with a reference translation (BLEU etc.)
- Weighing errors and calculating a TQI
SUBJECTIVE ASSESSMENT ("GOOD / BAD")
Pros:
- Speed
Cons:
- Results not repeatable
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Impossible to give detailed reasons
- Tells nothing of the provider's abilities
COMPARING WITH THE SOURCE ACCORDING TO A PARAMETER CHECKLIST
Pros:
- Some reasoning for assessment results
Cons:
- Results not repeatable
- Results not reproducible
- Difficult for client and service provider to arrive at the same opinion
- Tells nothing of the provider's abilities
AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
The more word sequences correlate between the target and the reference, the better the translation is assumed to be. Metrics: BLEU (BiLingual Evaluation Understudy), ROUGE, NIST, METEOR etc. An overview of BLEU: Tomedes Blog, http://blog.tomedes.com/measuring-machine-translation-quality/
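As an illustration, here is a minimal from-scratch sketch of the clipped n-gram precision that underlies BLEU-style metrics. It is not the full BLEU formula (no brevity penalty, no geometric mean over n-gram orders, only a single reference), and the sample sentences are invented:

```python
# A minimal sketch of clipped n-gram precision, the core idea behind
# BLEU-style metrics. Not the full BLEU formula: no brevity penalty,
# no geometric mean over n, and only a single reference translation.
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference,
    with counts clipped so repetition cannot inflate the score."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(1, sum(cand.values()))

reference = "the plant produces gas turbines".split()
candidate = "the factory produces gas turbines".split()
for n in (1, 2):
    print(f"{n}-gram precision: {clipped_precision(candidate, reference, n):.2f}")
# 1-gram precision: 0.80
# 2-gram precision: 0.50
```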
AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
Pros:
- Speed
Cons:
- Does not account for individual style
- Limited scope (today limited to MT output)
- Does not correlate well with human assessment
- Reference translations must be prepared before assessment (justified only for batch assessment of different translations of the same source sample)
- Tells nothing of the provider's abilities
- How should the acceptability threshold be defined?
WEIGHING ERRORS AND CALCULATING TQI
Who? Lionbridge, Aliquantum, Logrus, All Correct Language Solutions… and many others.
Publicly available techniques:
- SAE J2450
- ATA Framework for Standard Error Marking
- LISA QA Model 3.1
An overview of translation quality index techniques and guidelines to create your own: http://www.aliquantum.biz/downloads.htm
WEIGHING ERRORS AND CALCULATING TQI
The components you will need:
- Error classifier
- Error weighing guidelines
- Translation assessment guidelines which yield repeatable and reproducible results
- An expert (competent and unambiguous)
- Assessment results form
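The first two components can be pictured as a simple data structure. In this hypothetical sketch the error categories and point values are invented; they loosely echo severity-weighted schemes such as SAE J2450, not any normative list:

```python
# A hypothetical sketch of an error classifier with severity weights and a
# tally that turns marked errors into total Error Points (EP). Categories
# and weights are invented; real schemes define their own.
ERROR_WEIGHTS = {
    ("terminology", "minor"): 1, ("terminology", "major"): 5,
    ("accuracy",    "minor"): 1, ("accuracy",    "major"): 5,
    ("language",    "minor"): 1, ("language",    "major"): 5,
}

def error_points(marked_errors):
    """Sum the weights of all (type, severity) pairs the expert marked."""
    return sum(ERROR_WEIGHTS[error] for error in marked_errors)

sample = [("terminology", "major"), ("language", "minor")]
print(error_points(sample))  # 6 error points feed the TQI formulas below
```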
WEIGHING ERRORS AND CALCULATING TQI
WEIGHING ERRORS AND CALCULATING TQI
WEIGHING ERRORS AND CALCULATING TQI
WEIGHING ERRORS AND CALCULATING TQI
TQI (Translation Quality Index) is the usual practical result of translation quality measurement.
ATA Framework: TQI = EP * (250 / W) - BP
SAE J2450: TQI = EP / W
LISA QA Model: TQI = (1 - EP / W) * 100
where EP = total error points, W = number of words in the sample, BP = bonus points for outstanding translation passages (ATA only, max. 3 points).
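The three formulas translate directly into code; the sample figures for EP, W and BP below are invented:

```python
# The three TQI formulas from the slide above, with invented sample figures.
EP, W, BP = 6, 250, 1  # error points, words in sample, ATA bonus points (max. 3)

tqi_ata  = EP * (250 / W) - BP   # ATA Framework: error points per 250 words, minus bonus
tqi_sae  = EP / W                # SAE J2450: normalized error points (lower is better)
tqi_lisa = (1 - EP / W) * 100    # LISA QA Model: percentage score (higher is better)

# Note (from the editor's notes): TQIs produced by different techniques
# cannot be directly compared.
print(tqi_ata, tqi_sae, round(tqi_lisa, 1))  # 5.0 0.024 97.6
```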
WEIGHING ERRORS AND CALCULATING TQI
Pros:
- Results highly reproducible (SAE J2450)
- Results highly repeatable (SAE J2450)
- Detailed error classifier with explanations and examples (LISA QA Model)
- Easy to use for quality feedback to providers
- Convenient for grading providers by their TQI for a specific project
- TQI is a simple numeric index which you can record in a database and use in your balanced scorecard, KPIs etc.
WEIGHING ERRORS AND CALCULATING TQI
Cons:
- Limited scope (SAE J2450)
- Low reproducibility of results (ATA Framework)
- A threshold of acceptable TQI is required (e.g. 94.5), while clients do not tolerate any explicitly stated imperfection
- Assessment is time-consuming (5-20 minutes per sample, provided that the expert has carefully studied the source)
- Subjective or underdeveloped error weight assignment: an attempt at forecasting error consequences (LISA QA Model)
- Tells very little of the provider's abilities
WEIGHING ERRORS AND CALCULATING TQI
Cons (continued): underdeveloped translation assessment guidelines, including but not limited to:
- requirements for the translation sample (size, presence of terminology etc.)
- how to evaluate repeated typical (pattern) errors
- how to assess flaws in the target which stem from obvious flaws in the source
- how to prevent several minor errors from scoring the same as one major error
- how to handle obviously accidental errors that change the factual meaning
WEIGHING ERRORS AND CALCULATING TQI
Cons (continued): a TQI is valid only for:
- a specific subject field (e.g. gas turbines, food production etc.)
- a specific text type (legal; technical and research; or advertising and journalism)
A slight change in any of the above (subject, text type) means that one cannot forecast the provider's TQI from former evaluations → a new (tailored) assessment is required → unjustified expenses.
TRANSLATION ASSESSMENT: THE ARMORY
None of the translation assessment methods answers the questions:
- Will it blend?
- What kind of work can I trust to this provider? What can I not?
- How quickly can we train him?
- Return for improvement, or correct by other resources?
TRANSLATION ASSESSMENT: THE ARMORY
Translation assessment techniques need improvement!
IMPROVEMENT 1: TWO ERROR DIMENSIONS
Split all errors into 2 major groups:
- Factual = errors in the designation of objects and phenomena, their logical relations, and the degree of event probability / necessity
- Connotative = errors in conveying emotional and stylistic information, non-compliance with rules, standards, checklists and guidelines etc.
IMPROVEMENT 1: TWO ERROR DIMENSIONS
(source) "That's a factory"
"That's a restaurant" (factual error)
"That's a damn fctory" (2 connotative errors, though no factual errors; the typo is deliberate)
IMPROVEMENT 1: TWO ERROR DIMENSIONS
Each text element (word, phrase, sentence etc.) can contain:
- 1 connotative error, or
- 1 factual error, or
- 1 connotative and 1 factual error simultaneously
IMPROVEMENT 1: TWO ERROR DIMENSIONS
An accidental error (e.g. an obvious typo) which obscures factual information counts as two errors (e.g. language and factual). You can at once give specific instructions to the provider (e.g. be more careful) and consider the client's interest (e.g. absence of factual distortions, whatever the reason). "To kill Edward fear not, good it is" / "To kill Edward fear, not good it is" (Isabella of France): an error in the comma → critical factual distortion.
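A minimal sketch of the rule that each text element carries at most one error per dimension; the class and field names are invented for illustration:

```python
# Each text element can carry at most one factual and one connotative
# error at the same time. Names here are illustrative only.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextElement:
    text: str
    factual_error: Optional[str] = None       # at most one per element
    connotative_error: Optional[str] = None   # at most one per element

# The "damn fctory" example: two elements, each with one connotative error
# and no factual error (the designated object is still a factory).
damn = TextElement("damn", connotative_error="register")
fctory = TextElement("fctory", connotative_error="spelling")

# The Edward comma: an accidental error that distorts factual meaning
# counts on both axes at once.
comma = TextElement("fear, not", connotative_error="punctuation",
                    factual_error="critical distortion of meaning")
```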
IMPROVEMENT 2: COMPETENCES
Map each error in the classifier to the competences that are required to avoid it.
IMPROVEMENT 2: COMPETENCES
Competence types:
- Competences of acquisition
- Competences of production
- Auxiliary (general) competences
IMPROVEMENT 2: COMPETENCES
Competence levels:
- Unsatisfactory = the provider cannot do the corresponding work
- Basic = can do the work
- Advanced = can revise and correct the work of others, or train others
IMPROVEMENT 2: COMPETENCES
Competences of acquisition:
- Source language rules
- Source literary
- Source cultural
- Subject matter
IMPROVEMENT 2: COMPETENCES
Competences of production:
- Target language rules
- Target literary
- Target cultural
- Target mode of expression (= register, functional style)
IMPROVEMENT 2: COMPETENCES
Auxiliary (general) competences:
- Research
- Technical
- General
- Carefulness, responsibility and self-organisation
- Communication (relevant for translation as a service, not the product)
- Information security (relevant for translation as a service, not the product)
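A hypothetical sketch of the mapping: the competence names follow the lists above, but the error types and the mapping itself are invented examples, not a normative classifier:

```python
# Map each error type in the classifier to the competences required to
# avoid it. Error types and the mapping are invented examples.
ERROR_TO_COMPETENCES = {
    "confusion of special notions": ["subject matter"],
    "register mismatch":            ["target mode of expression"],
    "mistranslated cultural item":  ["source cultural", "target cultural"],
    "obvious typo":                 ["carefulness and self-organisation"],
}

def competences_in_question(marked_errors):
    """Which competences do the marked errors call into question?"""
    return sorted({c for error in marked_errors
                     for c in ERROR_TO_COMPETENCES[error]})

print(competences_in_question(["obvious typo", "register mismatch"]))
# ['carefulness and self-organisation', 'target mode of expression']
```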
IMPROVEMENT 2: COMPETENCES
IMPROVEMENT 2: COMPETENCES
IMPROVEMENT 2: COMPETENCES
The client can formulate precise and objective requirements for the provider. Assessment immediately shows which competences meet the required level and which don't.
IMPROVEMENT 3: WORKFLOW ROLES
Map each workflow role (e.g. translate, compile project glossary, revise language etc.) to a set of required competences.
IMPROVEMENT 3: WORKFLOW ROLES
Example of a competence set for the role "Can translate":
- Self-organisation = basic
- Subject matter = basic
- Source language rules = basic
- Target language rules = basic
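A sketch of checking an assessed provider against this role's competence set; the ordered levels follow the three-level scale from Improvement 2, and the provider profile is an invented example:

```python
# Check an assessed provider against the "Can translate" competence set
# from this slide. The provider profile is an invented example.
LEVELS = {"unsatisfactory": 0, "basic": 1, "advanced": 2}

CAN_TRANSLATE = {
    "self-organisation":     "basic",
    "subject matter":        "basic",
    "source language rules": "basic",
    "target language rules": "basic",
}

def qualifies(provider, role):
    """True if every required competence is at or above the required level."""
    return all(LEVELS[provider.get(c, "unsatisfactory")] >= LEVELS[level]
               for c, level in role.items())

provider = {"self-organisation": "advanced", "subject matter": "basic",
            "source language rules": "basic", "target language rules": "basic"}
print(qualifies(provider, CAN_TRANSLATE))  # True: the role can be assigned
```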
IMPROVEMENT 3: WORKFLOW ROLES
IMPROVEMENT 3: WORKFLOW ROLES
The Vendor Manager / Project Manager quickly assigns workflow roles and schedules the project → saves time.
IMPROVEMENT 4: ERROR ALLOWABILITY
In each case the client indicates which error types are allowed and which are not; the expert puts down the client requirements in a list. One "not allowed" error in the sample → the text fails (client perspective).
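A sketch of the pass / fail rule; the disallowed error types are an invented client list:

```python
# One occurrence of any client-disallowed error type fails the sample.
# The disallowed set is an invented example of client requirements.
def passes(marked_error_types, disallowed):
    return not any(e in disallowed for e in marked_error_types)

client_disallowed = {"factual distortion", "terminology"}
print(passes(["obvious typo"], client_disallowed))                        # True: pass
print(passes(["obvious typo", "factual distortion"], client_disallowed))  # False: fail
```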
IMPROVEMENT 4: ERROR ALLOWABILITY
IMPROVEMENT 4: ERROR ALLOWABILITY
The assessment reflects the real client needs ("pass / fail").
IMPROVEMENT 5: PROVIDER TRAINABILITY
Single out 2 major error groups:
- Correcting the error requires minimum training / instructions: the provider can find and correct all errors of this type in his work himself
- Correcting the error requires prolonged training: the provider cannot find all his errors in the text
IMPROVEMENT 5: PROVIDER TRAINABILITY
Errors that require minimum training:
- the original order of text sections is broken
- broken cross-references
- text omissions
- numbers / dates do not correspond to the source
- glossary / style guide violated
- non-compliance with reference sources
- inconsistent terminology
- non-compliance with regional formatting standards
- broken tags, line length
- obvious language errors, etc.
IMPROVEMENT 5: PROVIDER TRAINABILITY
Errors that require prolonged training:
- understanding the source
- confusion of special notions (subject competence)
- stylistic devices and expressive means (literary competence)
- cultural phenomena, etc.
IMPROVEMENT 5: PROVIDER TRAINABILITY
IMPROVEMENT 5: PROVIDER TRAINABILITY
What is the percentage of errors requiring minimum training? The PM can instantly take a decision (return the product for further improvement, or correct with other resources) → saves time.
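A sketch of how the share of easily correctable errors could drive the PM decision; the error names echo the list above, while the 80% threshold is an invented example, not part of the technique:

```python
# Classify marked errors by trainability and let the share of errors that
# need only minimum training drive the PM decision. The 0.8 threshold is
# an invented example, not part of the technique.
MINIMUM_TRAINING = {"text omission", "number mismatch", "glossary violation",
                    "inconsistent terminology", "broken tag", "obvious typo"}

def decide(marked_errors, threshold=0.8):
    easy = sum(1 for e in marked_errors if e in MINIMUM_TRAINING)
    if easy / len(marked_errors) >= threshold:
        return "return to the provider for improvement"
    return "correct by other resources"

print(decide(["obvious typo", "number mismatch", "broken tag", "text omission"]))
# return to the provider for improvement
```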
IMPROVEMENT 5: PROVIDER TRAINABILITY
If all errors affecting a competence are easy to correct, the competence is assessed in two ways at once:
- Current state ("as is")
- Potential state (after a short training)
NOTES
- The provider has to work in normal conditions (enough time, work instructions)
- The sample should be restricted to one main subject field according to the subject classifier
- The source text should be meaningful and coherent
- It is important to differentiate between errors and preferential choices
NOTES
- To assess a sample, the expert has to possess all competences at "advanced" level. As it is difficult to find such experts in reality, several people can be assigned to assess one sample (e.g. one assesses terminology, another assesses all the other aspects)
- Quality predictions for rush jobs cannot be based on normal competence assessment (as rush-job output quality is normally lower)
EXAMPLE
EXAMPLE
EXAMPLE
EXAMPLE
CONCLUSION
The new assessment model answers all the questions:
- Will it blend? → pass / fail
- What kind of work can I trust to this provider? What can I not? → competences and workflow roles
- How quickly can I train the provider? → potential competences
- Return for improvement, or correct by other resources? → percentage of errors requiring minimum training
BENEFITS
- Provider and client speak the same "language" (error types and competences) → fewer debates
- Saves time when testing providers
- Simplifies planning of a minimum and sufficient workflow, optimizes resources
- Avoids extra text-processing stages when not necessary → better turnaround → more flexible budgets → higher rates → provider loyalty and a good image for the company
- Detailed feedback and training → provider loyalty
THE FUTURE OF THE TECHNIQUE
- Adjustment and testing
- Dedicated software tool
- Integration with QA tools
QUALITY MANAGEMENT PROCESS
Thank you! Questions? [email_address]


Editor's Notes

  • #21 1. Ideally, the number of possible error occurrences should be used instead of the number of words, but in practice such a count is impossible for any text. 2. TQIs produced by different techniques cannot be directly compared!
  • #31 "Eduardum occidere nolite timere bonum est": the Latin original behind the Isabella of France example. Without punctuation it can mean either "Do not fear to kill Edward; it is good" or "Do not kill Edward; it is good to fear".