Translation quality assessment redefined

Published in: Education

  • 1. Ideally, the number of possible error occurrences should be used instead of the number of words, but in practice no such count can be made for a text. 2. TQIs produced by different techniques cannot be compared directly!
  • Eduardum occidere nolite timere bonum est

    1. 1. TRANSLATION QUALITY ASSESSMENT REDEFINED: from TQI to competences and suitability. Demid Tishin, All Correct Language Solutions, www.allcorrect.ru
    2. 2. <ul><li>What are they thinking about when they look at the target text? </li></ul>
    3. 3. <ul><li>Client:</li></ul>Will it blend?* Let’s find a flaw… *Just a joke. I mean “Will it do?”
    4. 4. <ul><li>Quality manager:</li></ul>Will it blend? I wish the client said OK…
    5. 5. <ul><li>HR / Vendor Manager:</li></ul>What kind of work can I entrust to this provider? What can I not? How quickly can we train him?
    6. 6. <ul><li>Project Manager:</li></ul>Return for improvement, or correct using other resources?
    7. 7. <ul><li>To answer these questions the target text needs assessment </li></ul>
    8. 8. TRANSLATION ASSESSMENT: THE ARMORY <ul><li>What assessment techniques do you know?</li></ul>
    9. 9. TRANSLATION ASSESSMENT: THE ARMORY <ul><li>Subjective assessment (“good / bad”)</li></ul><ul><li>Comparing with the source according to a parameter checklist</li></ul><ul><li>Automated comparison with a reference translation (BLEU etc.)</li></ul><ul><li>Weighing errors and calculating TQI</li></ul>
    10. 10. SUBJECTIVE ASSESSMENT (“GOOD / BAD”) Pros: <ul><li>Speed</li></ul> Cons: <ul><li>Results not repeatable</li></ul><ul><li>Results not reproducible</li></ul><ul><li>Difficult for client and service provider to arrive at the same opinion</li></ul><ul><li>Impossible to give detailed reasons</li></ul><ul><li>Tells nothing of provider’s abilities</li></ul>
    11. 11. COMPARING WITH THE SOURCE ACCORDING TO A PARAMETER CHECKLIST Pros: <ul><li>Some reasoning for assessment results</li></ul> Cons: <ul><li>Results not reproducible</li></ul><ul><li>Difficult for client and service provider to arrive at the same opinion</li></ul><ul><li>Results not repeatable</li></ul><ul><li>Tells nothing of provider’s abilities</li></ul>
    12. 12. AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION <ul><li>The more word sequences correlate between the target and the reference, the better the translation</li></ul><ul><li>BLEU (BiLingual Evaluation Understudy), ROUGE, NIST, METEOR etc.</li></ul><ul><li>An overview of BLEU: Tomedes Blog http://blog.tomedes.com/measuring-machine-translation-quality/</li></ul>
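
The idea of counting word sequences shared between target and reference can be sketched as a clipped n-gram precision. This is only one ingredient of BLEU-style metrics, not a full BLEU implementation (no brevity penalty, no averaging over n-gram orders), and the function names and whitespace tokenisation are illustrative assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as often as it appears
    in the reference (the "clipping" used by BLEU-style metrics).
    Tokenisation is a plain lowercase whitespace split (an assumption).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Three of the four candidate words also appear in the reference.
score = clipped_precision("that is a restaurant", "that is a factory", 1)
```

Here `score` is 0.75, even though the candidate names the wrong kind of building: the measure counts overlapping word sequences and nothing else.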
    13. 13. AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION
    14. 14. AUTOMATED COMPARISON WITH A REFERENCE TRANSLATION Pros: <ul><li>Speed</li></ul> Cons: <ul><li>Does not account for individual style</li></ul><ul><li>Limited scope (today limited to MT output)</li></ul><ul><li>Does not correlate with human assessment</li></ul><ul><li>A number of reference translations must be prepared before assessment (justified for batch assessment of different translations of the same source sample)</li></ul><ul><li>Tells nothing of provider’s abilities</li></ul><ul><li>How should the acceptability threshold be defined?</li></ul>
    15. 15. WEIGHING ERRORS AND CALCULATING TQI <ul><li>Who? Lionbridge, Aliquantum, Logrus, All Correct Language Solutions… and many others</li></ul><ul><li>Publicly available techniques: SAE J2450; ATA Framework for Standard Error Marking; LISA QA Model 3.1</li></ul><ul><li>An overview of translation quality index techniques and guidelines to create your own: http://www.aliquantum.biz/downloads.htm</li></ul>
    16. 16. WEIGHING ERRORS AND CALCULATING TQI <ul><li>Components you will need:</li></ul><ul><li>Error classifier</li></ul><ul><li>Error weighing guidelines</li></ul><ul><li>Translation assessment guidelines, which yield repeatable and reproducible results</li></ul><ul><li>Expert (competent and unambiguous)</li></ul><ul><li>Assessment results form</li></ul>
    17. 17. WEIGHING ERRORS AND CALCULATING TQI
    18. 18. WEIGHING ERRORS AND CALCULATING TQI
    19. 19. WEIGHING ERRORS AND CALCULATING TQI
    20. 20. <ul><li>TQI (Translation Quality Index) is the usual practical result of translation quality measurement</li></ul><ul><li>ATA Framework: TQI = EP * (250 / W) - BP</li></ul><ul><li>SAE J2450: TQI = EP / W</li></ul><ul><li>LISA QA Model: TQI = (1 - EP / W) * 100</li></ul><ul><li>where EP = total Error Points</li></ul><ul><li>W = number of words in sample</li></ul><ul><li>BP = Bonus Points for outstanding translation passages (ATA only, max. 3 points)</li></ul>WEIGHING ERRORS AND CALCULATING TQI
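
The three formulas on this slide can be written out as a minimal sketch. The function names and the input checks are illustrative assumptions; the arithmetic follows the slide as given and is not taken from the official standards.

```python
def tqi_ata(error_points, words, bonus_points=0.0):
    """ATA Framework, as given on the slide: EP * (250 / W) - BP."""
    if words <= 0:
        raise ValueError("words must be positive")
    if not 0 <= bonus_points <= 3:
        raise ValueError("bonus_points must be between 0 and 3")
    return error_points * (250 / words) - bonus_points


def tqi_sae_j2450(error_points, words):
    """SAE J2450, as given on the slide: EP / W."""
    if words <= 0:
        raise ValueError("words must be positive")
    return error_points / words


def tqi_lisa(error_points, words):
    """LISA QA Model, as given on the slide: (1 - EP / W) * 100."""
    if words <= 0:
        raise ValueError("words must be positive")
    return (1 - error_points / words) * 100


# The same hypothetical sample: 5 error points in 500 words, no bonus points.
ata = tqi_ata(5, 500)
sae = tqi_sae_j2450(5, 500)
lisa = tqi_lisa(5, 500)
```

For this sample the ATA formula gives 2.5, SAE J2450 gives 0.01 and LISA gives about 99. The first two grow with the number of errors while the third shrinks, which illustrates why TQIs produced by different techniques cannot be compared directly.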
    21. 21. WEIGHING ERRORS AND CALCULATING TQI Pros <ul><li>Results highly reproducible (SAE J2450)</li></ul><ul><li>Results highly repeatable (SAE J2450)</li></ul><ul><li>Detailed error classifier with explanations and examples (LISA QA Model)</li></ul><ul><li>Easy to use for quality feedback to providers</li></ul><ul><li>Convenient for grading providers according to their TQI for a specific project</li></ul><ul><li>TQI is a simple numeric index, which you can record in a database and use in your balanced scorecard, KPIs etc.</li></ul>
    22. 22. WEIGHING ERRORS AND CALCULATING TQI Cons <ul><li>Limited scope (SAE J2450)</li></ul><ul><li>Low reproducibility of results (ATA Framework)</li></ul><ul><li>A threshold of acceptable TQI is required (e.g. 94.5), while clients do not tolerate any explicitly stated imperfection</li></ul><ul><li>Assessment is time-consuming (5-20 minutes per sample, provided that the expert has carefully studied the source)</li></ul><ul><li>Subjective or underdeveloped error weight assignment, which is an attempt at forecasting error consequences (LISA QA Model)</li></ul><ul><li>Tells very little of provider’s abilities</li></ul>
    23. 23. WEIGHING ERRORS AND CALCULATING TQI Cons <ul><li>Underdeveloped translation assessment guidelines, including but not limited to:</li></ul><ul><li>requirements for the translation sample (size, presence of terminology etc.)</li></ul><ul><li>how to evaluate repeated typical (pattern) errors?</li></ul><ul><li>how to assess flaws in the target that stem from obvious flaws in the source?</li></ul><ul><li>how to prevent several minor errors from producing the same score as one major error?</li></ul><ul><li>how to handle obviously accidental errors that change the factual meaning?</li></ul>
    24. 24. WEIGHING ERRORS AND CALCULATING TQI Cons <ul><li>TQI is valid only for:</li></ul><ul><li>a specific subject field (e.g. gas turbines, food production etc.)</li></ul><ul><li>a specific text type (Legal, Technical and Research, or Advertising and Journalism)</li></ul><ul><li>A slight change in any of the above (subject, text type) means that one cannot forecast the provider’s TQI based on former evaluations → a new (tailored) assessment is required → unjustified expenses</li></ul>
    25. 25. <ul><li>None of the translation assessment methods answers the questions:</li></ul><ul><li>Will it blend?</li></ul><ul><li>What kind of work can I entrust to this provider? What can I not?</li></ul><ul><li>How quickly can we train him?</li></ul><ul><li>Return for improvement, or correct using other resources?</li></ul>TRANSLATION ASSESSMENT: THE ARMORY
    26. 26. <ul><li>Translation assessment techniques need improvement! </li></ul>TRANSLATION ASSESSMENT: THE ARMORY
    27. 27. <ul><li>Split all errors into 2 major groups:</li></ul><ul><li>Factual = errors in the designation of objects and phenomena, their logical relations, and the degree of event probability / necessity</li></ul><ul><li>Connotative = errors in conveying emotional and stylistic information, non-compliance with rules, standards, checklists and guidelines etc.</li></ul>IMPROVEMENT 1: TWO ERROR DIMENSIONS
    28. 28. IMPROVEMENT 1: TWO ERROR DIMENSIONS <ul><li>“That’s a factory” (source)</li></ul><ul><li>“That’s a restaurant” (factual error)</li></ul><ul><li>“That’s a damn fctory” (2 connotative errors, though no factual errors)</li></ul>
    29. 29. <ul><li>Each text element (word, phrase, sentence etc.) can contain:</li></ul><ul><li>1 connotative error</li></ul><ul><li>or</li></ul><ul><li>1 factual error</li></ul><ul><li>or</li></ul><ul><li>1 connotative and 1 factual error simultaneously</li></ul>IMPROVEMENT 1: TWO ERROR DIMENSIONS
    30. 30. <ul><li>An accidental error (e.g. an obvious typo) that obscures factual information counts as two errors (e.g. one language error and one factual error).</li></ul><ul><li>You can at once give specific instructions to the provider (e.g. be more careful) and consider the client’s interest (e.g. absence of factual distortions, whatever the reason)</li></ul><ul><li>“To kill Edward fear not, good it is” / “To kill Edward fear, not good it is” (Isabella of France): an error in comma placement → critical factual distortion</li></ul>IMPROVEMENT 1: TWO ERROR DIMENSIONS
    31. 31. <ul><li>Map each error in the classifier to the competences that are required to avoid it </li></ul>IMPROVEMENT 2: COMPETENCES
    32. 32. <ul><li>Competence types:</li></ul><ul><li>Competences of acquisition</li></ul><ul><li>Competences of production</li></ul><ul><li>Auxiliary (general) competences</li></ul>IMPROVEMENT 2: COMPETENCES
    33. 33. <ul><li>Competence levels:</li></ul><ul><li>Unsatisfactory = provider cannot do the corresponding work</li></ul><ul><li>Basic = can work</li></ul><ul><li>Advanced = can revise and correct work of others or train others</li></ul>IMPROVEMENT 2: COMPETENCES
    34. 34. <ul><li>Competences of acquisition:</li></ul><ul><li>Source language rules</li></ul><ul><li>Source literary</li></ul><ul><li>Source cultural</li></ul><ul><li>Subject matter</li></ul>IMPROVEMENT 2: COMPETENCES
    35. 35. <ul><li>Competences of production:</li></ul><ul><li>Target language rules</li></ul><ul><li>Target literary</li></ul><ul><li>Target cultural</li></ul><ul><li>Target mode of expression (= register, functional style)</li></ul>IMPROVEMENT 2: COMPETENCES
    36. 36. <ul><li>Auxiliary (general) competences:</li></ul><ul><li>Research</li></ul><ul><li>Technical</li></ul><ul><li>General carefulness, responsibility and self-organisation</li></ul><ul><li>Communication (relevant for translation as a service, not the product)</li></ul><ul><li>Information security (relevant for translation as a service, not the product)</li></ul>IMPROVEMENT 2: COMPETENCES
    37. 37. IMPROVEMENT 2: COMPETENCES
    38. 38. IMPROVEMENT 2: COMPETENCES
    39. 39. <ul><li>Client can formulate precise and objective requirements for the provider</li></ul><ul><li>Assessment immediately shows which competences meet the required level and which don’t</li></ul>IMPROVEMENT 2: COMPETENCES
    40. 40. <ul><li>Map each workflow role (e.g. translate, compile project glossary, revise language etc.) to a number of required competences </li></ul>IMPROVEMENT 3: WORKFLOW ROLES
    41. 41. IMPROVEMENT 3: WORKFLOW ROLES <ul><li>Example of a competence set : </li></ul><ul><li>Self-organisation = basic </li></ul><ul><li>Subject matter = basic </li></ul><ul><li>Source language rules = basic </li></ul><ul><li>Target language rules = basic </li></ul><ul><li>Role : </li></ul><ul><li>Can translate </li></ul>
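
The competence set above can be checked mechanically against a provider's assessed levels. The sketch below is an assumption about how such a check might look: the `Level` ordering, the `ROLE_REQUIREMENTS` table and the function names are hypothetical, and only the "translate" requirements come from the slide.

```python
from enum import IntEnum


class Level(IntEnum):
    """Competence levels from the deck, ordered so they can be compared."""
    UNSATISFACTORY = 0
    BASIC = 1
    ADVANCED = 2


# Requirements per workflow role. "translate" copies the slide's example;
# any further roles would have to be defined by the assessor.
ROLE_REQUIREMENTS = {
    "translate": {
        "Self-organisation": Level.BASIC,
        "Subject matter": Level.BASIC,
        "Source language rules": Level.BASIC,
        "Target language rules": Level.BASIC,
    },
}


def missing_competences(role, assessed):
    """Return the competences whose assessed level is below the role's minimum.

    Competences absent from `assessed` are treated as unsatisfactory
    (an assumption of this sketch).
    """
    required = ROLE_REQUIREMENTS[role]
    return sorted(
        name
        for name, minimum in required.items()
        if assessed.get(name, Level.UNSATISFACTORY) < minimum
    )


def can_perform(role, assessed):
    """True if every required competence meets the role's minimum level."""
    return not missing_competences(role, assessed)


provider = {
    "Self-organisation": Level.BASIC,
    "Subject matter": Level.UNSATISFACTORY,
    "Source language rules": Level.BASIC,
    "Target language rules": Level.ADVANCED,
}
gaps = missing_competences("translate", provider)
```

For this hypothetical provider `gaps` contains only "Subject matter", so the role "can translate" is not yet assigned, and the list of gaps doubles as feedback on what to train.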
    42. 42. IMPROVEMENT 3: WORKFLOW ROLES
    43. 43. <ul><li>Vendor Manager / Project Manager quickly assigns workflow roles and schedules the project </li></ul><ul><li>saves time </li></ul>IMPROVEMENT 3: WORKFLOW ROLES
    44. 44. <ul><li>In each case the client indicates which error types are allowed and which are not. The expert records the client requirements in a list</li></ul><ul><li>One “not allowed” error in the sample → the text fails (client perspective)</li></ul>IMPROVEMENT 4: ERROR ALLOWABILITY
    45. 45. IMPROVEMENT 4: ERROR ALLOWABILITY
    46. 46. <ul><li>Assessment matches the real client needs (“pass / fail”)</li></ul>IMPROVEMENT 4: ERROR ALLOWABILITY
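
The pass / fail rule of Improvement 4 reduces to a set check: the sample fails as soon as one found error belongs to a type the client has marked as not allowed. The function name and the sample lists below are hypothetical; the error type names are borrowed from the deck's own error lists.

```python
def client_verdict(found_error_types, not_allowed):
    """Return "fail" if any found error is of a not-allowed type, else "pass"."""
    if set(not_allowed).isdisjoint(found_error_types):
        return "pass"
    return "fail"


# Client requirements as recorded by the expert (hypothetical example).
not_allowed = {"text omissions", "numbers / dates do not correspond to the source"}

verdict_a = client_verdict(
    ["inconsistent terminology", "obvious language errors"], not_allowed
)
verdict_b = client_verdict(
    ["inconsistent terminology", "text omissions"], not_allowed
)
```

`verdict_a` is "pass" because both errors are of allowed types; `verdict_b` is "fail" because of the single omission, regardless of how many other errors the sample contains.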
    47. 47. <ul><li>Single out 2 major error groups : </li></ul><ul><li>Correcting the error requires minimum training / instructions; the provider can find and correct all errors of the type in his work himself </li></ul><ul><li>Correcting the error requires prolonged training; the provider cannot find all his errors in the text </li></ul>IMPROVEMENT 5: PROVIDER TRAINABILITY
    48. 48. <ul><li>Errors that require minimum training : </li></ul><ul><li>the original order of text sections is broken </li></ul><ul><li>broken cross-references </li></ul><ul><li>text omissions </li></ul><ul><li>numbers / dates do not correspond to the source </li></ul><ul><li>glossary / style guide violated </li></ul><ul><li>non-compliance with reference sources </li></ul><ul><li>inconsistent terminology </li></ul><ul><li>non-compliance with regional formatting standards </li></ul><ul><li>broken tags, line length </li></ul><ul><li>obvious language errors </li></ul><ul><li>etc. </li></ul>IMPROVEMENT 5: PROVIDER TRAINABILITY
    49. 49. <ul><li>Errors that require prolonged training : </li></ul><ul><li>understanding the source </li></ul><ul><li>confusion of special notions (subject competence) </li></ul><ul><li>stylistic devices and expressive means (literary competence) </li></ul><ul><li>cultural phenomena </li></ul><ul><li>etc. </li></ul>IMPROVEMENT 5: PROVIDER TRAINABILITY
    50. 50. IMPROVEMENT 5: PROVIDER TRAINABILITY
    51. 51. <ul><li>What is the percentage of errors requiring minimum training?</li></ul><ul><li>The PM can instantly make a decision: return the product for further improvement or correct it using other resources</li></ul><ul><li>saves time</li></ul>IMPROVEMENT 5: PROVIDER TRAINABILITY
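
The percentage mentioned on this slide can be computed directly from the list of found errors. In the sketch below the set of "minimum training" error types is an excerpt from slide 48, any other type is assumed to require prolonged training, and the 80% threshold and the decision rule built on it are assumptions, since the deck does not state a cut-off.

```python
# Excerpt of the error types that the deck lists as needing minimum training.
MINIMUM_TRAINING_ERRORS = frozenset({
    "text omissions",
    "broken cross-references",
    "inconsistent terminology",
    "obvious language errors",
})


def minimum_training_share(error_types):
    """Fraction of found errors that need only minimum training to correct."""
    if not error_types:
        raise ValueError("no errors found, nothing to decide")
    easy = sum(1 for error in error_types if error in MINIMUM_TRAINING_ERRORS)
    return easy / len(error_types)


def pm_decision(error_types, threshold=0.8):
    """Hypothetical rule: return the text if most errors are easy to fix."""
    if minimum_training_share(error_types) >= threshold:
        return "return to provider"
    return "correct with other resources"


found = [
    "text omissions",
    "inconsistent terminology",
    "understanding the source",
    "text omissions",
]
share = minimum_training_share(found)
decision = pm_decision(found)
```

Three of the four errors are easy to correct, so `share` is 0.75; with the assumed 80% threshold the decision is "correct with other resources", while a more lenient threshold would send the text back to the provider.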
    52. 52. <ul><li>If all errors influencing the competence are easy to correct , the competence is assessed in two ways (at once) : </li></ul><ul><li>Current state (“as is”) </li></ul><ul><li>Potential state ( after a short training ) </li></ul>IMPROVEMENT 5: PROVIDER TRAINABILITY
    53. 53. <ul><li>Provider has to work in normal conditions (enough time, work instructions)</li></ul><ul><li>The sample should be restricted to one main subject field according to the subject classifier</li></ul><ul><li>The source text should be meaningful and coherent</li></ul><ul><li>It is important to differentiate between errors and preferential choices</li></ul>NOTES
    54. 54. <ul><li>To assess a sample, the expert has to possess all competences at the “advanced” level.</li></ul><ul><li>As it is difficult to find such experts in reality, several people can be assigned to assess one sample (e.g. one assesses terminology, another assesses all the other aspects)</li></ul><ul><li>Quality predictions for rush jobs cannot be based on normal competence assessment (as the quality of rush output is normally lower)</li></ul>NOTES
    55. 55. EXAMPLE
    56. 56. EXAMPLE
    57. 57. EXAMPLE
    58. 58. EXAMPLE
    59. 59. <ul><li>The new assessment model answers all the questions:</li></ul><ul><li>Will it blend? – pass / fail</li></ul><ul><li>What kind of work can I entrust to this provider? What can I not? – competences and workflow roles</li></ul><ul><li>How quickly can I train the provider? – potential competences</li></ul><ul><li>Return for improvement, or correct using other resources? – percentage of errors requiring minimum training</li></ul>CONCLUSION
    60. 60. <ul><li>Provider and client speak the same “language” (error types and competences) → fewer debates</li></ul><ul><li>Saves time when testing providers</li></ul><ul><li>Simplifies planning of minimum and sufficient workflow, optimizes resources</li></ul><ul><li>Makes it possible to avoid extra text processing stages when they are not necessary</li></ul><ul><li>(extra stages avoided) → better turnaround</li></ul><ul><li>(extra stages avoided) → more flexible budgets → higher rates → provider loyalty and good image for the company</li></ul><ul><li>Detailed feedback and training → provider loyalty</li></ul>BENEFITS
    61. 61. <ul><li>Adjustment and testing </li></ul><ul><li>Dedicated software tool </li></ul><ul><li>Integration with QA tools </li></ul>THE FUTURE OF THE TECHNIQUE
    62. 62. QUALITY MANAGEMENT PROCESS
    63. 63. <ul><li>Thank you!</li></ul><ul><li>Questions?</li></ul>
