The science of 
Language Quality Assurance 
What’s behind two new ASTM work items, 
WK46396 and WK46397 
Serge Gladkoff, 
(GALA, Logrus International) 
Chicago, November 5, 2014
LISA QA Model 
SAE J2450 
SDL TMS 
Acrocheck 
ApSIC XBench 
CheckMate 
QA Distiller 
XLIFF:Doc 
EN15038 
{ …Proprietary metrics and 
scorecards… } 
… 
What is translation quality? All of them disagree
on what quality is. Would you dare to give a
universal definition, considering that all these
authors had their own idea of what it is?
Quality 
Definition?
A THEORY OF BIG GAME HUNTING 
PROBLEM 
To Catch a Lion in the Sahara Desert. 
SOLUTION: THE BOLZANO-WEIERSTRASS 
METHOD 
Divide the desert by a line running from north to
south. The lion is then either in the eastern or in the
western part. Let's assume it is in the eastern part.
Divide this part by a line running from east to west.
The lion is either in the northern or in the southern
part. Let's assume it is in the northern part. We can
continue this process arbitrarily, thereby
constructing with each step an increasingly narrow
fence around the selected area. The diameter of the
chosen partitions converges to zero, so that the lion
is caged into a fence of arbitrarily small diameter.
GENERAL CONSIDERATIONS 
Means 
A. Concentrating on the factors
making the strongest impression
B. Separating global (holistic)
and local issues, with the
former typically being more
important and playing a bigger
role
Reflecting the perception 
and priorities of the target 
audience
GENERAL CONSIDERATIONS 
Means 
A. Covering the whole spectrum of 
potential uses, subject areas, and 
materials; 
B. From slightly post-edited MT to ultra-polished 
manual translations 
C. Common approach 
D. Same approach to technical materials 
and marketing content 
E. Only adjust acceptance criteria / 
thresholds based on expectations 
Universal applicability 
We are all humans and, irrespective of what exactly 
we are looking at, be it a restaurant menu or drug 
usage guidelines, we are making our first judgment 
about text quality using exactly the same criteria. We 
do not need a different approach or a completely new 
metric for each subject area or type of content. In 
reality, the only thing that requires adjustment is the
tolerance level. We are ready to accept a barely
comprehensible menu translation, but expect perfect 
clarity and lack of ambiguity in the medical area. In 
technical terms, this means that we are still measuring 
the same thing, i.e. readability/clarity, but with 
different expectations, and this approach applies to all 
other criteria.
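The "same measurement, different tolerance" idea can be sketched in a few lines. This is a minimal illustration, not part of the proposed methodology: the content types and threshold values in `TOLERANCE` are hypothetical, and readability is assumed to be rated on the 0 to 10 scale used later in this deck.

```python
# Hypothetical tolerance levels on a 0-10 readability scale.
# The slides give no concrete numbers; these are illustrative only.
TOLERANCE = {
    "restaurant_menu": 4.0,
    "marketing": 6.0,
    "technical": 7.0,
    "medical": 9.0,
}


def is_acceptable(readability: float, content_type: str) -> bool:
    """Same measurement (readability, 0-10) for every content type;
    only the acceptance threshold changes."""
    if not 0.0 <= readability <= 10.0:
        raise ValueError("readability must be on the 0-10 scale")
    return readability >= TOLERANCE[content_type]


# The same 6.5 rating passes for a menu but fails for drug usage guidelines.
print(is_acceptable(6.5, "restaurant_menu"))  # True
print(is_acceptable(6.5, "medical"))  # False
```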
GENERAL CONSIDERATIONS 
Means 
A. Should be clear, not overly 
complicated 
B. Should be process-friendly, 
i.e. reasonably economical 
and applicable to the real 
world 
Viability of methodology
GENERAL CONSIDERATIONS 
Means 
A. Concentrating on methodology rather than 
particular cases/uses. 
B. Issue typology is not an inalienable part of 
the methodology, but rather an add-on 
component. It can be based, for instance, on
MQM or another source, or on legacy criteria,
including those used/provided by the client. 
C. Weights assigned to particular issues are 
expected to vary within a wide range 
depending on the goals set, subject matter, 
type of material, etc. Particular issues might 
simply prove irrelevant for the job or area of 
focus, which results in zero weights being 
assigned to these issues. 
Flexibility of approach 
The client knows which types of issues are important
for their content.
ENTIRETY OF IMPRESSION 
Reader/consumer is primarily interested in 
overall readability and adequacy of the whole 
piece, and only then in readability of parts 
(sentences).
TWO KEY FACTORS 
ADEQUACY 
READABILITY
THRESHOLD OF ACCEPTANCE 
…is determined by usability expectations 
The expectation of how readable and adequate the translated content should be
determines the acceptable quality level for these two key factors.
GRADING 
If a piece has a serious defect, it
has to be discarded without
wasting time on further analysis.
If the text is inadequate or
unreadable, it does not make
sense to count typos or check
whether the terminology is right.
Good stuff 
Substandard
Acceptance threshold 
Neither Readability nor Adequacy is 100% objective
The solution lies in evaluating each of the two 
major holistic criteria (readability and adequacy) 
separately, on a PASS/FAIL basis. 
The logical thing to do is to establish an acceptance
threshold that corresponds to the lower end
of the statistical range.
How can we deal with this lack of complete 
objectivity in a real-world scenario, when no 
reference translations are available, there is a 
single reviewer who can only look at a certain 
percentage of the overall content, and we still 
need to evaluate and grade translated texts?
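A minimal sketch of the lower-end threshold, assuming that a small calibration panel has rated texts already judged acceptable, and that "lower end of the statistical range" is taken as the mean minus one standard deviation. Both the panel data and the factor `k` are assumptions for illustration; the slides do not specify them.

```python
from statistics import mean, stdev


def acceptance_threshold(panel_ratings: list[float], k: float = 1.0) -> float:
    """Threshold at the lower end of the panel's spread: mean - k * stdev.
    k = 1.0 is an assumption, not a value taken from the slides."""
    if len(panel_ratings) < 2:
        raise ValueError("need at least two ratings to estimate the spread")
    return mean(panel_ratings) - k * stdev(panel_ratings)


def pass_fail(rating: float, threshold: float) -> str:
    """Single-reviewer PASS/FAIL against the calibrated threshold."""
    return "PASS" if rating >= threshold else "FAIL"


# Hypothetical expert ratings of a text the panel considers acceptable.
panel = [7, 8, 6, 9, 7, 8]
t = acceptance_threshold(panel)
print(round(t, 2), pass_fail(6.5, t))  # 6.45 PASS
```

Placing the threshold below the panel mean means that a single reviewer whose rating of an acceptable text falls on the low side of the usual spread still records a PASS.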
The scale from 0 to 10 
A smaller scale will not fit the bell curve
An important and direct consequence is
that the scale used for holistic
translation ratings should span at least
0 to 10, and by no means
less.
Atomistic Quality
[Figure: MQM Tree (excerpt). Fluency (content) and Fluency (mechanical): Spelling, Style Guide, Typography, Grammar, Locale convention, …, Inconsistency, Idiomatic, Duplication, Ambiguity. Accuracy: Mistranslation, Omission, Addition, Untranslated. Printing: Copying, Color and black and white digital printing. Internationalization. Compatibility. (other). Design: Global font choice, Headers and footers, Margins, Page break, Kerning, ….]
Atomistic Quality 
$$QA = \frac{\sum_{i=0}^{n} N_i \cdot W_i}{V}$$
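Read with N_i as the number of issues of type i found, W_i as the weight assigned to that issue type, and V as the volume of the evaluated text (assumed here to be a word count; the slide does not say), the formula can be computed as below. The issue names and weights in the example are hypothetical; a zero weight marks an issue type irrelevant to the job, as described under Flexibility of approach.

```python
def atomistic_qa(counts: dict[str, int], weights: dict[str, float], volume: int) -> float:
    """QA = sum over i of N_i * W_i, divided by V.

    counts  -- N_i: number of issues of each type found in the evaluated text
    weights -- W_i: weight per issue type (0.0 marks an irrelevant type)
    volume  -- V: size of the evaluated text, assumed to be in words
    """
    if volume <= 0:
        raise ValueError("volume must be positive")
    missing = set(counts) - set(weights)
    if missing:
        raise KeyError(f"no weight defined for: {sorted(missing)}")
    weighted_issues = sum(n * weights[issue] for issue, n in counts.items())
    return weighted_issues / volume


# Hypothetical issue counts and weights for a 1,000-word sample.
counts = {"mistranslation": 2, "spelling": 5, "kerning": 3}
weights = {"mistranslation": 5.0, "spelling": 1.0, "kerning": 0.0}
print(atomistic_qa(counts, weights, 1000))  # 0.015
```

The result is a weighted issue density, so lower is better; how such a density is turned into a pass/fail decision or a 0 to 10 rating is not shown on the slide.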
Quality 
Triangle?
SHOWSTOPPER PROBLEM
…or quality
square!
There are things that you will 
know when you see them… 
Showstoppers…
Building the concrete LQA metrics 
The methodology fully covers all types of 
translated content, including those produced 
using MT and/or MT + post-editing.
Applying LQA metrics
Applying it correctly
Three-dimensional vector:
• Holistic readability
• Holistic adequacy
• Atomic compound detailed metrics
Readability threshold: Pass/Fail
Adequacy threshold: Pass/Fail
Atomistic rating: Detailed score
Implementation keys:
• Holistic parameters cannot be mixed
• Only materials that pass the holistic parameters (HP) are analyzed further
• Experts are required to produce a precise and reliable atomistic score
• Select the content to which the metrics are applied
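The three-dimensional evaluation and its implementation keys can be sketched as a holistic gate followed by a detailed score. This is an illustrative structure only: `LQAResult`, `evaluate`, and the `atomistic_scorer` callable (standing in for the expert's detailed review) are hypothetical names, and the thresholds are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LQAResult:
    readability_pass: bool
    adequacy_pass: bool
    atomistic_score: Optional[float]  # None when the holistic gate is not passed


def evaluate(readability: float, adequacy: float,
             readability_threshold: float, adequacy_threshold: float,
             atomistic_scorer: Callable[[], float]) -> LQAResult:
    # Each holistic parameter gets its own PASS/FAIL; they are never averaged together.
    r_ok = readability >= readability_threshold
    a_ok = adequacy >= adequacy_threshold
    if not (r_ok and a_ok):
        # Material that fails a holistic threshold is not analyzed further,
        # so the expensive expert atomistic review is skipped.
        return LQAResult(r_ok, a_ok, None)
    return LQAResult(r_ok, a_ok, atomistic_scorer())


# Readable but inadequate: no detailed score is produced.
print(evaluate(7.0, 4.0, 6.0, 6.0, lambda: 0.015))
# LQAResult(readability_pass=True, adequacy_pass=False, atomistic_score=None)
```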
In the vast majority of real-life cases, 
nobody can afford the luxury of 
employing an expert panel to evaluate 
the translation quality of any particular 
document or web portal. LSPs typically 
have to use a single reviewer who only 
looks at a certain percentage of the 
content. To produce meaningful, reliable 
results despite this limitation, proper
sampling must be done.
HOW FASTIDIOUS ARE YOU? 
[Chart: sample size (words, log scale from 100 to 100,000) and % of content to be checked (0% to 100%) versus total volume (words, from 100 to 1,000,000), at a 95% confidence level, for confidence intervals of 0.125%, 0.25%, and 0.5%.]
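The slide does not state how its curves were computed. One standard way to obtain such numbers is Cochran's sample-size formula with a finite population correction, sketched below under three assumptions: each word is treated as an independent sampling unit, "CI" means the margin of error, and p = 0.5 (the most conservative choice). Treat the output as an approximation, not as the values plotted in the chart.

```python
import math


def sample_size(total_words: int, margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction.

    total_words -- N, the total volume of the text
    margin      -- half-width of the confidence interval as a fraction (0.005 = 0.5%)
    z           -- 1.96 corresponds to a 95% confidence level
    p           -- expected proportion; 0.5 gives the largest (safest) sample
    """
    if total_words <= 0 or not 0 < margin < 1:
        raise ValueError("total_words must be positive and margin in (0, 1)")
    n0 = z * z * p * (1 - p) / (margin * margin)  # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / total_words)         # finite population correction
    return min(total_words, math.ceil(n))


for total in (10_000, 100_000, 1_000_000):
    for margin in (0.005, 0.0025, 0.00125):
        n = sample_size(total, margin)
        print(f"{total:>9,} words, CI {margin:.3%}: check {n:,} words ({n / total:.0%})")
```

The uncorrected size n0 depends only on the confidence level and the margin, so the required sample grows much more slowly than the total volume; this is why the percentage to be checked falls steeply for large texts.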
THE LUXURY OF FULL METRICS 
Develop metrics
• Significant research
- Know your area
- Sustain R&D
- Develop metrics
- Develop processes
- How much and what to QA?
Build supply chain
• Professional LSP
• Professional linguists
• Provide training
• Provide reference materials
Pay to apply
• Terminology maintenance & support
• Translation Memory maintenance and management
• Localization Quality Assurance costs
The scene of 
PUBLIC SITE
THE NEED AND THE CONSTRAINTS
CONSTRAINTS
A. Professional LQA would require a global federal program to develop applicable LQA metrics, allocate funding, and book professional LQA with specially trained LSPs.
B. Yet there is still an acute need for it, as demonstrated by the CUIDADO DE SALUD website of the Affordable Care Act.
C. A methodology for public LQA is very much needed.
D. There IS feedback on the site from the public; how is it to be handled?
Executive Order 13166
• http://www.justice.gov/crt/about/cor/13166.php
• www.lep.gov
• “requires Federal agencies to examine the services they provide, identify any need for services to those with limited English proficiency (LEP), and develop and implement a system to provide those services so LEP persons can have meaningful access to them”
CONSTRAINTS
of public feedback
The Crowd
• Cannot be trained
• Is not ready to spend a lot of time
• Is opinionated by definition
The Feedback
• Is limited in volume
• Is random by nature
• Arbitrary issue classification
• Can involve a large number of reviewers
The Approach
• Use a statistical approach to turn the tables and gain in another area what we have lost
THE METRICS 
1. Quality square approach 
There MAY be showstopper errors. 
2. The parameters are simplified (no detailed issue definitions) 
No detailed Atomistic quality issue definitions can be applied. 
3. Each reviewer produces four ratings on a 0 – 10 scale
The 0 – 10 scale is the smallest one that can accommodate the bell curve.
(Each reviewer is asked to provide examples.)
4. The calibration:
(a) Showstopper: 0 = two or more major errors, 10 = no major errors
(b) Holistic readability (fluency): 0 = incomprehensible, 10 = a poem
(c) Holistic adequacy (accuracy): 0 = inadequate, 10 = perfectly conveying meaning
(d) Atomistic (small specific errors): 0 = full of small errors, 10 = completely error-free
For crowdsourced LQA, the atomistic quality category is not formalized in any way whatsoever.
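A reviewer's submission under this scheme could be captured as four ratings plus the requested examples. The record type and field names are hypothetical, and integer-only ratings are an assumption; the anchor texts are copied from the calibration above.

```python
from dataclasses import dataclass, field

# Calibration anchors from the slide: (meaning of 0, meaning of 10).
ANCHORS = {
    "showstopper": ("two or more major errors", "no major errors"),
    "readability": ("incomprehensible", "a poem"),
    "adequacy": ("inadequate", "perfectly conveying meaning"),
    "atomistic": ("full of small errors", "completely error-free"),
}


@dataclass
class ReviewerRating:
    """One crowd reviewer's submission (hypothetical record type)."""
    reviewer_id: str
    scores: dict[str, int]
    examples: list[str] = field(default_factory=list)  # reviewers are asked for examples

    def __post_init__(self) -> None:
        # Exactly the four calibrated dimensions, each an integer from 0 to 10.
        if set(self.scores) != set(ANCHORS):
            raise ValueError(f"expected ratings for: {sorted(ANCHORS)}")
        for name, value in self.scores.items():
            if not isinstance(value, int) or not 0 <= value <= 10:
                raise ValueError(f"{name} must be an integer from 0 to 10, got {value!r}")


r = ReviewerRating(
    "r01",
    {"showstopper": 10, "readability": 6, "adequacy": 7, "atomistic": 5},
    ["Menu label left untranslated"],  # hypothetical example
)
```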
THE PROCESS
1. The LQA review scope is defined and explained briefly and clearly
To prevent reviewers from straying into other areas.
2. The content needs to be final
Updates and scope changes are outside the scope of crowdsourced review.
3. Communication is done via a simple online portal
There is no bandwidth to manage the crowd manually.
4. It is better if volunteers are language professionals
This would compensate for the lack of special training.
5. Proper sampling
No fewer than 10 reviewers for each area; the more, the better.
6. Proper processing
The results are manually vetted to remove outliers:
- discard outliers without explanation and obvious reviewer errors
- are major errors statistically significant? A 30% threshold instead of 5% is recommended.
- apply statistics to analyze the results
…an average Readability rating of 6.2 out of 10 with a standard deviation of 2.2, and an average Adequacy rating of 6.5 out of 10 with a standard deviation of 1.9.
The conclusion would be: the text is readable (rating above 5), but barely so, and leaves much to be desired in view of its importance and high level of public exposure. Again, it is up to the expert doing the analysis to define the threshold; for example, for this type of content a proper target for average readability is at least 8 out of 10.
Despite the fact that the review was by design a less than ideal, community feedback-based LQA, resulting in rating inconsistency among reviewers, most reviewers found too many noticeable and annoying technical/minor mistakes in the text, as reflected in the low average rating (the adjusted value for technical errors is 4.7 out of 10 for the average atomistic quality rating, with a 2.4 standard deviation), which is unsatisfactory. Substantial remedial work is clearly called for in this area.
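The processing step can be sketched as below, assuming the outliers have already been vetted by hand as the slide describes. Two points are our assumptions rather than the slide's: a showstopper is treated as confirmed when at least 30% of reviewers rate the showstopper dimension below 10, and the ratings in the example are hypothetical, not the survey data quoted above. The target of 8 out of 10 is the example target from the slide.

```python
from statistics import mean, stdev

MIN_REVIEWERS = 10        # "No less than 10 reviewers for each area"
SHOWSTOPPER_SHARE = 0.30  # assumed reading of the recommended 30% threshold


def summarize(ratings: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of already-vetted 0-10 ratings."""
    if len(ratings) < MIN_REVIEWERS:
        raise ValueError(f"need at least {MIN_REVIEWERS} reviewers, got {len(ratings)}")
    return mean(ratings), stdev(ratings)


def showstopper_confirmed(showstopper_scores: list[int]) -> bool:
    """Assumption: major errors count as significant when at least 30% of
    reviewers rate the showstopper dimension below 10."""
    flagged = sum(1 for s in showstopper_scores if s < 10)
    return flagged / len(showstopper_scores) >= SHOWSTOPPER_SHARE


def meets_target(average: float, target: float = 8.0) -> bool:
    """Compare an average rating with the expert-defined target."""
    return average >= target


# Hypothetical vetted readability ratings from ten reviewers.
readability = [6, 9, 3, 7, 8, 5, 4, 9, 6, 5]
avg, sd = summarize(readability)
print(f"{avg:.1f} ± {sd:.1f}", meets_target(avg))  # 6.2 ± 2.0 False
print(showstopper_confirmed([10] * 6 + [0] * 4))    # True
```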
THE ONLY PUBLIC LQA METRICS AVAILABLE
Is it appropriate?
POSITIVES
• Both holistic measures can be relied upon with reasonable confidence
• Good overall assessment
• Allows showstoppers to be identified
• Gives a good general idea of the level of technical errors
• Affordable and available for US federal agencies
CONTRA
• Only a rough judgment
• Not a good quantitative assessment
• Not a complete roster of errors, even in the selected sample
• No concrete process recommendations
YOU DECIDE!
MORE INFORMATION ABOUT MQM 
http://www.qt21.eu/mqm-definition/definition-2014-08-19.html
MORE INFORMATION ABOUT METHODOLOGY
WK46396= 
WK46397= 
The Proposal 
MQM 
The guide for LQA Methodology
THANK YOU! 
sgladkoff@logrus.net 
sgladkoff@gala-global.org

More Related Content

Similar to Serge astm-presentation-chicago-2014-final

Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...
Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...
Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...chrisbailey000
 
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...TAUS - The Language Data Network
 
Project104_Group713_ProgressReportI
Project104_Group713_ProgressReportIProject104_Group713_ProgressReportI
Project104_Group713_ProgressReportISarp Uzel
 
Poster Tweet-Norm 2013
Poster Tweet-Norm 2013Poster Tweet-Norm 2013
Poster Tweet-Norm 2013pruiz_
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label ClassificationYONG ZHENG
 
1WR RapiTests for Sensory
1WR RapiTests for Sensory1WR RapiTests for Sensory
1WR RapiTests for SensoryAlexandre Khan
 
Get help with SWE4202 Computing Infrastructure Assignment
Get help with SWE4202 Computing Infrastructure AssignmentGet help with SWE4202 Computing Infrastructure Assignment
Get help with SWE4202 Computing Infrastructure AssignmentAaravSunak
 
Comparing DOM XSS Tools On Real World Bug
Comparing DOM XSS Tools On Real World BugComparing DOM XSS Tools On Real World Bug
Comparing DOM XSS Tools On Real World BugStefano Di Paola
 
Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...SDL
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyIconic Translation Machines
 
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...Lviv Startup Club
 
Testing Intelligence
Testing IntelligenceTesting Intelligence
Testing IntelligenceLalit Bhamare
 
Estimating test effort part 2 of 2
Estimating test effort part 2 of 2Estimating test effort part 2 of 2
Estimating test effort part 2 of 2Ian McDonald
 
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docx
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docxDue 12 10 2016Week 10 Term PaperClick the link above to submit.docx
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docxsagarlesley
 
ASSIGNMENT 2 - Research Proposal Weighting 30 tow.docx
ASSIGNMENT 2 - Research Proposal    Weighting 30 tow.docxASSIGNMENT 2 - Research Proposal    Weighting 30 tow.docx
ASSIGNMENT 2 - Research Proposal Weighting 30 tow.docxsherni1
 
Application_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxApplication_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxKiranKumar918931
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_saRobert Martin
 

Similar to Serge astm-presentation-chicago-2014-final (20)

TAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology GuidelinesTAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology Guidelines
 
Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...
Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...
Measuring and Comparing the Reliability of the Structured Walkthrough Evaluat...
 
21
2121
21
 
Quality meas2001
Quality meas2001Quality meas2001
Quality meas2001
 
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...
Opening talk: Quality Evaluation at the EU - Ingemar Strandvik (European Comm...
 
Project104_Group713_ProgressReportI
Project104_Group713_ProgressReportIProject104_Group713_ProgressReportI
Project104_Group713_ProgressReportI
 
Poster Tweet-Norm 2013
Poster Tweet-Norm 2013Poster Tweet-Norm 2013
Poster Tweet-Norm 2013
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification
 
1WR RapiTests for Sensory
1WR RapiTests for Sensory1WR RapiTests for Sensory
1WR RapiTests for Sensory
 
Get help with SWE4202 Computing Infrastructure Assignment
Get help with SWE4202 Computing Infrastructure AssignmentGet help with SWE4202 Computing Infrastructure Assignment
Get help with SWE4202 Computing Infrastructure Assignment
 
Comparing DOM XSS Tools On Real World Bug
Comparing DOM XSS Tools On Real World BugComparing DOM XSS Tools On Real World Bug
Comparing DOM XSS Tools On Real World Bug
 
Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...Learn the different approaches to machine translation and how to improve the ...
Learn the different approaches to machine translation and how to improve the ...
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happy
 
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...
Veronika Snizhko: Оцінка якості NLP проєкту: чому автоматичних метрик може бу...
 
Testing Intelligence
Testing IntelligenceTesting Intelligence
Testing Intelligence
 
Estimating test effort part 2 of 2
Estimating test effort part 2 of 2Estimating test effort part 2 of 2
Estimating test effort part 2 of 2
 
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docx
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docxDue 12 10 2016Week 10 Term PaperClick the link above to submit.docx
Due 12 10 2016Week 10 Term PaperClick the link above to submit.docx
 
ASSIGNMENT 2 - Research Proposal Weighting 30 tow.docx
ASSIGNMENT 2 - Research Proposal    Weighting 30 tow.docxASSIGNMENT 2 - Research Proposal    Weighting 30 tow.docx
ASSIGNMENT 2 - Research Proposal Weighting 30 tow.docx
 
Application_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptxApplication_of_Deep_Learning_Techniques.pptx
Application_of_Deep_Learning_Techniques.pptx
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_sa
 

Recently uploaded

Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...
Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...
Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...aakahthapa70
 
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712Delhi Escorts Service
 
JABALPUR CALL GIRL 92628/71154 JABALPUR K
JABALPUR CALL GIRL 92628/71154 JABALPUR KJABALPUR CALL GIRL 92628/71154 JABALPUR K
JABALPUR CALL GIRL 92628/71154 JABALPUR KNiteshKumar82226
 
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Availablenitugupta1209
 
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...aakahthapa70
 
Call US Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...
Call US  Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...Call US  Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...
Call US Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...Pooja Nehwal
 
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...Ayesha Khan
 
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7Ayesha Khan
 
BHOPAL CALL GIRL 92628*71154 BHOPAL CALL
BHOPAL CALL GIRL 92628*71154 BHOPAL CALLBHOPAL CALL GIRL 92628*71154 BHOPAL CALL
BHOPAL CALL GIRL 92628*71154 BHOPAL CALLNiteshKumar82226
 
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝Lipikasharma29
 
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...Ayesha Khan
 
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...monikaservice1
 
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋Sheetaleventcompany
 
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712Delhi Escorts Service
 
Call Girls in Calangute Beach 8588052666 Goa Escorts ...
Call Girls in Calangute Beach 8588052666 Goa Escorts ...Call Girls in Calangute Beach 8588052666 Goa Escorts ...
Call Girls in Calangute Beach 8588052666 Goa Escorts ...nishakur201
 
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝thapagita
 
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7Ayesha Khan
 
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...aakahthapa70
 

Recently uploaded (20)

Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...
Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...
Call Girls In {Laxmi Nagar Delhi} 9667938988 Indian Russian High Profile Girl...
 
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 29, (Gurgaon) Call Us. 9711911712
 
CALL GIRLS IN GOA & ESCORTS SERVICE +919540619990
CALL GIRLS IN GOA & ESCORTS SERVICE +919540619990CALL GIRLS IN GOA & ESCORTS SERVICE +919540619990
CALL GIRLS IN GOA & ESCORTS SERVICE +919540619990
 
JABALPUR CALL GIRL 92628/71154 JABALPUR K
JABALPUR CALL GIRL 92628/71154 JABALPUR KJABALPUR CALL GIRL 92628/71154 JABALPUR K
JABALPUR CALL GIRL 92628/71154 JABALPUR K
 
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available
9811611494,Low Rate Call Girls In Connaught Place Delhi 24hrs Available
 
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...
Genuine Call Girls In {Mahipalpur Delhi} 9667938988 Indian Russian High Profi...
 
Call US Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...
Call US  Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...Call US  Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...
Call US Pooja📞 9892124323 ✅Call Girls In Mira Road ( Mumbai ) secure service...
 
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...
Book Call Girls in Lahore || 03070433345 || Young, Hot, Sexy, VIP Girls Avail...
 
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7
Call Girls In Lahore || 03010449222 ||Lahore Call Girl Available 24/7
 
BHOPAL CALL GIRL 92628*71154 BHOPAL CALL
BHOPAL CALL GIRL 92628*71154 BHOPAL CALLBHOPAL CALL GIRL 92628*71154 BHOPAL CALL
BHOPAL CALL GIRL 92628*71154 BHOPAL CALL
 
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝
Call Girls in Janakpuri Delhi 💯 Call Us 🔝9667422720🔝
 
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...
Call Girls In Islamabad | 03278838827 || 24/7 Service Islamabad Call Girls & ...
 
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...
🔝Call Girls In INA Colony Call Us ➥ 8800357707 In Call Out Call Both With Hig...
 
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Bangalore Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712
Call Girls In Sector 90, (Gurgaon) Call Us. 9711911712
 
Call Girls in Calangute Beach 8588052666 Goa Escorts ...
Call Girls in Calangute Beach 8588052666 Goa Escorts ...Call Girls in Calangute Beach 8588052666 Goa Escorts ...
Call Girls in Calangute Beach 8588052666 Goa Escorts ...
 
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝
Call Girls in Majnu ka Tilla Delhi 💯 Call Us 🔝9711014705🔝
 
Call Girls In Saket Delhi 9953056974 (Low Price) Escort Service Saket Delhi
Call Girls In Saket Delhi 9953056974 (Low Price) Escort Service Saket DelhiCall Girls In Saket Delhi 9953056974 (Low Price) Escort Service Saket Delhi
Call Girls In Saket Delhi 9953056974 (Low Price) Escort Service Saket Delhi
 
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7
Call Girls in Karachi || 03081633338 || 50+ Hot Sexy Girls Available 24/7
 
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...
Call Girls In {Aerocity Delhi} 98733@20244 Indian Russian High Profile Girls ...
 

Serge astm-presentation-chicago-2014-final

  • 1. The science of Language Quality Assurance What’s behind two new ASTM work items, WK46396 and WK46397 Serge Gladkoff, (GALA, Logrus International) Chicago, November 5, 2014
  • 2. LISA QA Model SAE J2450 SDL TMS Acrocheck ApSIC XBench CheckMate QA Distiller XLIFF:Doc EN15038 { …Proprietary metrics and scorecards… } … What is translation quality? All of them disagree on what quality is. Would you dare giving a universal definition, considering all these authors had their own idea of what it is? Quality Definition?
  • 3. A THEORY OF BIG GAME HUNTING PROBLEM To Catch a Lion in the Sahara Desert. SOLUTION: THE BOLZANO-WEIERSTRASS METHOD Divide the desert by a line running from north to south. The lion is then either in the eastern or in the western part. Lets assume it is in the eastern part. Divide this part by a line running from east to west. The lion is either in the northern or in the southern part. Lets assume it is in the northern part. We can continue this process arbitrarily and thereby constructing with each step an increasingly narrow fence around the selected area. The diameter of the chosen partitions converges to zero so that the lion is caged into a fence of arbitrarily small diameter.
  • 4. GENERAL CONSIDERATIONS Means A. Concentrating on factors making strongest impression B. Separating global (holistic) and local issues, with the former being typically more important and playing bigger role Reflecting the perception and priorities of the target audience
  • 5. GENERAL CONSIDERATIONS: Universal applicability. Means: A. Covering the whole spectrum of potential uses, subject areas, and materials; B. from slightly post-edited MT to ultra-polished manual translations; C. a common approach; D. the same approach to technical materials and marketing content; E. only the acceptance criteria/thresholds are adjusted, based on expectations. We are all human and, irrespective of what exactly we are looking at, be it a restaurant menu or drug usage guidelines, we make our first judgment about text quality using exactly the same criteria. We do not need a different approach or a completely new metric for each subject area or type of content. In reality, the only thing that requires adjustment is the tolerance level. We are ready to accept a barely comprehensible menu translation, but we expect perfect clarity and no ambiguity in the medical area. In technical terms, this means that we are still measuring the same thing, i.e., readability/clarity, but with different expectations, and the same applies to all other criteria.
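The idea that only the tolerance changes, not the measurement itself, can be sketched in a few lines of Python. This is a minimal illustration, not part of the presented methodology: the content-type names and the threshold values in `MIN_READABILITY` are hypothetical, since the slide gives no numbers.

```python
# Hypothetical minimum readability (0-10 scale) per content type.
# Illustrative values only; the presentation does not specify them.
MIN_READABILITY = {
    "restaurant_menu": 4,
    "marketing": 7,
    "medical_guidelines": 9,
}


def meets_expectations(readability: float, content_type: str) -> bool:
    """Apply the same readability measurement with a content-specific tolerance."""
    return readability >= MIN_READABILITY[content_type]


# The same rating of 6 is acceptable for a menu but not for drug usage guidelines.
print(meets_expectations(6, "restaurant_menu"))
print(meets_expectations(6, "medical_guidelines"))
```

The rating is produced the same way in both cases; only the lookup table of expectations differs, which mirrors point E on the slide.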
  • 6. GENERAL CONSIDERATIONS: Viability of methodology. Means: A. It should be clear, not overly complicated. B. It should be process-friendly, i.e., reasonably economical and applicable to the real world.
  • 7. GENERAL CONSIDERATIONS: Flexibility of approach. Means: A. Concentrating on methodology rather than particular cases/uses. B. The issue typology is not an inalienable part of the methodology, but rather an add-on component. It can be based, for instance, on MQM or another source, or on legacy criteria, including those used/provided by the client. C. The weights assigned to particular issues are expected to vary within a wide range depending on the goals set, subject matter, type of material, etc. Particular issues might simply prove irrelevant for the job or area of focus, in which case zero weights are assigned to them. The client knows what types of issues are important for their content.
  • 8. ENTIRETY OF IMPRESSION: The reader/consumer is primarily interested in the overall readability and adequacy of the whole piece, and only then in the readability of its parts (sentences).
  • 9. TWO KEY FACTORS: ADEQUACY and READABILITY
  • 10. THRESHOLD OF ACCEPTANCE …is determined by usability expectations. The expectation of how readable and adequate the translated content should be determines the acceptable quality level for these two cornerstone factors.
  • 11. GRADING: If a piece has a serious defect, it has to be discarded without wasting time on further analysis. If a text is inadequate or unreadable, it makes no sense to count typos or check whether the terminology is right. Grades: Good stuff / Substandard.
  • 12. Acceptance threshold: Neither readability nor adequacy is 100% objective. How can we deal with this lack of complete objectivity in a real-world scenario, when no reference translations are available, a single reviewer can only look at a certain percentage of the overall content, and we still need to evaluate and grade translated texts? The solution lies in evaluating each of the two major holistic criteria (readability and adequacy) separately, on a PASS/FAIL basis. The logical thing to do is to establish an acceptance threshold that corresponds to the lower end of the statistical range.
  • 13. The scale from 0 to 10: A smaller scale will not fit the bell curve. An important and direct consequence is that the scale used for holistic translation ratings should be at least from 0 to 10, and by no means smaller.
  • 14. Atomistic Quality: Fluency (content), Fluency (mechanical), Spelling, Style Guide, Typography, Grammar, Locale convention, …, Inconsistency, Idiomatic, Duplication, Ambiguity, Accuracy, Mistranslation, Omission, Addition, Untranslated, Printing, Copying, Color and black and white digital printing, Internationalization, Compatibility (other), Design, Global font choice, Headers and footers, Margins, Page break, Kerning, …
  • 16. Atomistic Quality: $QA = \frac{\sum_{i=0}^{n} N_i \cdot W_i}{V}$
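The atomistic formula on slide 16, QA = Σ N_i·W_i / V (summed over issue types i = 0…n), can be computed as shown below. The slide does not define its symbols, so this sketch assumes that N_i is the number of issues of type i found, W_i is the weight assigned to that issue type (zero when the type is irrelevant, as slide 7 allows), and V is the reviewed volume, e.g. in words. The function name `atomistic_score` and the sample counts and weights are hypothetical.

```python
from typing import Mapping


def atomistic_score(counts: Mapping[str, int],
                    weights: Mapping[str, float],
                    volume: int) -> float:
    """Compute QA = sum(N_i * W_i) / V under assumed symbol meanings.

    counts[i]  -> N_i, number of issues of type i found in the sample
    weights[i] -> W_i, weight of issue type i (0 = irrelevant for this job)
    volume     -> V, size of the reviewed sample, e.g. in words
    """
    if volume <= 0:
        raise ValueError("volume must be positive")
    # Issue types without an assigned weight are treated as weight 0.
    weighted = sum(n * weights.get(issue, 0.0) for issue, n in counts.items())
    return weighted / volume


# Hypothetical review of a 1,000-word sample.
counts = {"mistranslation": 2, "spelling": 5, "kerning": 3}
weights = {"mistranslation": 5.0, "spelling": 1.0, "kerning": 0.0}
print(atomistic_score(counts, weights, volume=1000))
```

Under this reading, QA is a weighted error density, so a lower value means better atomistic quality; kerning issues contribute nothing here because their weight was set to zero for this hypothetical job.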
  • 19. …or quality square! There are things that you will know when you see them… Showstoppers…
  • 20. Building the concrete LQA metrics (steps 1–5). The methodology fully covers all types of translated content, including content produced using MT and/or MT plus post-editing.
  • 21. Applying LQA metrics correctly. Three-dimensional vector: holistic readability, holistic adequacy, and atomic compound detailed metrics. Readability threshold → Pass/Fail; adequacy threshold → Pass/Fail; atomistic rating → detailed score. Implementation keys: holistic parameters cannot be mixed; only materials that pass the holistic parameters (HP) are analyzed further; experts are required to produce a precise and reliable atomistic score; select the content to which the metrics are applied.
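The order of operations on slide 21 (two separate PASS/FAIL holistic checks, then a detailed atomistic score only for material that passes both) can be sketched as a small gate function. This is an illustration under assumptions: the names `apply_lqa` and `LqaResult` are hypothetical, the threshold values in the usage line are invented, and the atomistic review is passed in as a callable so that it is only run when needed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LqaResult:
    readability_pass: bool
    adequacy_pass: bool
    atomistic_score: Optional[float]  # None when the text was discarded


def apply_lqa(readability: float, adequacy: float,
              readability_threshold: float, adequacy_threshold: float,
              atomistic_review: Callable[[], float]) -> LqaResult:
    # Each holistic parameter is judged on its own PASS/FAIL basis;
    # the two are never averaged or otherwise mixed.
    readability_ok = readability >= readability_threshold
    adequacy_ok = adequacy >= adequacy_threshold
    if not (readability_ok and adequacy_ok):
        # Failed holistic check: discard without detailed analysis.
        return LqaResult(readability_ok, adequacy_ok, None)
    # Only material that passes both holistic checks gets the more
    # expensive expert atomistic review.
    return LqaResult(readability_ok, adequacy_ok, atomistic_review())


# Hypothetical thresholds of 6 out of 10 for both holistic criteria.
print(apply_lqa(8, 4, 6, 6, lambda: 0.015))
print(apply_lqa(8, 7, 6, 6, lambda: 0.015))
```

Keeping the two holistic results as separate fields, rather than one combined number, reflects the rule that holistic parameters cannot be mixed.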
  • 22. In the vast majority of real-life cases, nobody can afford the luxury of employing an expert panel to evaluate the translation quality of any particular document or web portal. LSPs typically have to use a single reviewer who only looks at a certain percentage of the content. To produce meaningful, reliable results despite this limitation, proper sampling is essential.
  • 23. HOW FASTIDIOUS ARE YOU? [Chart: percentage of the total volume to be checked (0% to 100%) and sample size in words (100 to 100,000, logarithmic scale) versus total volume in words (100 to 1,000,000, logarithmic scale), at a 95% confidence level, for error margins of 0.125%, 0.25%, and 0.5%.]
  • 24. THE LUXURY OF FULL METRICS. Develop metrics: significant research (know your area, sustain R&D, develop metrics, develop processes, decide how much and what to QA). Build a supply chain: professional LSP, professional linguists, training, reference materials. Pay to apply: terminology maintenance and support, translation memory maintenance and management, localization quality assurance costs.
  • 25. The scene: a PUBLIC SITE
  • 26. THE NEED AND THE CONSTRAINTS. Constraints: A. Professional LQA would require a global federal program to develop applicable LQA metrics, allocate funding, and book professional LQA with specially trained LSPs. B. Yet there is still an acute need for it, as demonstrated by the CUIDADO DE SALUD web site of the Affordable Care Act. C. A methodology for public LQA is very much needed. D. There IS feedback on the site from the public; how is it to be handled? Executive Order 13166 (http://www.justice.gov/crt/about/cor/13166.php, www.lep.gov) “requires Federal agencies to examine the services they provide, identify any need for services to those with limited English proficiency (LEP), and develop and implement a system to provide those services so LEP persons can have meaningful access to them”.
  • 27. CONSTRAINTS OF PUBLIC FEEDBACK. The crowd: cannot be trained; is not ready to spend a lot of time; is opinionated by definition. The feedback: is limited in volume; is random by nature; uses an arbitrary issue classification; can come from a large number of reviewers. The approach: use statistics to turn the tables and gain in another area what we have lost.
  • 28. THE METRICS. 1. Quality square approach: there MAY be showstopper errors. 2. The parameters are simplified: no detailed atomistic quality issue definitions can be applied. 3. Each reviewer produces four ratings on a 0–10 scale, the smallest scale that can accommodate the bell curve. (Each reviewer is asked to provide examples.) 4. The calibration: (a) showstopper: 0 = two or more major errors, 10 = no major errors; (b) holistic readability (fluency): 0 = incomprehensible, 10 = a poem; (c) holistic adequacy (accuracy): 0 = inadequate, 10 = perfectly conveys the meaning; (d) atomistic (small specific errors): 0 = full of small errors, 10 = completely error-free. For crowdsourced LQA, the atomistic quality category is not formalized in any way whatsoever.
  • 29. THE PROCESS
    1. The LQA review scope is defined and explained briefly and clearly, to prevent reviewers from straying into other areas.
    2. The content needs to be final. Updates and scope changes are outside the scope of crowdsourced review.
    3. Communication is done via a simple online portal; there is no bandwidth to manage the crowd manually.
    4. It is better if volunteers are language professionals; this would compensate for the lack of special training.
    5. Proper sampling: no fewer than 10 reviewers for each area; the more, the better.
    6. Proper processing: the results are manually vetted to remove outliers:
       - discard outliers without explanation and obvious reviewer errors;
       - are major errors statistically significant? A 30% threshold instead of 5% is recommended;
       - apply statistics to analyze the results.
    …an average Readability Rating of 6.2 out of 10 with a standard deviation of 2.2, and Adequacy of 6.5 out of 10 with a standard deviation of 1.9. The conclusion would be: the text is readable (rating above 5), but barely so, and leaves much to be desired in view of its importance and high level of public exposure. Again, it is up to the expert doing the analysis to define the threshold, for example, that for this type of content a proper target for average readability is at least 8 out of 10.
    …the adjusted value for technical errors of 4.7 out of 10 for the average atomistic quality rating, with a 2.4 standard deviation.
    Despite the fact that the review was by design a less-than-ideal community feedback-based LQA, resulting in rating inconsistency among reviewers, most reviewers found too many noticeable and annoying technical/minor mistakes in the text, as reflected in the low average rating, which is unsatisfactory. Substantial remedial work is clearly called for in this area.
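The crowdsourced procedure from slides 28 and 29 (four 0–10 ratings per reviewer, examples requested, at least 10 reviewers, vetting, then mean and standard deviation) can be sketched as follows. This is a sketch under stated assumptions, not the presenter's tooling: the names `Review` and `summarize` are hypothetical, the vetting rule (drop any review that gives no supporting example) is one possible automation of "discard outliers w/o explanation", and the ratings in the usage lines are invented.

```python
from dataclasses import dataclass
from statistics import mean, stdev

FIELDS = ("showstopper", "readability", "adequacy", "atomistic")


@dataclass
class Review:
    """One crowd reviewer's four ratings on the 0-10 scale."""
    showstopper: int
    readability: int
    adequacy: int
    atomistic: int
    examples: str = ""  # reviewers are asked to back their ratings with examples

    def __post_init__(self) -> None:
        for name in FIELDS:
            value = getattr(self, name)
            if not 0 <= value <= 10:
                raise ValueError(f"{name} must be between 0 and 10, got {value}")


def summarize(reviews: list[Review], field: str,
              min_reviewers: int = 10) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of one rating after vetting."""
    if field not in FIELDS:
        raise ValueError(f"unknown rating field: {field}")
    # Vetting rule (assumption): drop ratings that come without any explanation.
    vetted = [r for r in reviews if r.examples.strip()]
    if len(vetted) < min_reviewers:
        raise ValueError(f"need at least {min_reviewers} vetted reviews, got {len(vetted)}")
    values = [getattr(r, field) for r in vetted]
    return mean(values), stdev(values)


# Hypothetical readability ratings from ten reviewers who gave examples.
ratings = [5, 6, 7, 8, 4, 6, 7, 5, 6, 8]
reviews = [Review(10, r, 7, 5, examples="segment 12: awkward word order") for r in ratings]
reviews.append(Review(10, 0, 0, 0))  # no explanation given: vetted out
avg, sd = summarize(reviews, "readability")
print(f"readability: mean {avg:.1f}, sd {sd:.1f}")
```

Interpreting the result is still the expert's job: following the slide, a mean above 5 reads as "readable", while the target for this kind of content (for example, 8 out of 10) is set by whoever runs the analysis.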
  • 30. THE ONLY PUBLIC LQA METRICS AVAILABLE: is it appropriate? Positives: both holistic measures can be relied upon with reasonable confidence; good overall assessment; allows showstoppers to be identified; gives a good general idea of the level of technical errors; affordable and available for US federal agencies. Contra: only a rough judgment; not a good quantitative assessment; not a complete roster of errors even in the selected sample; no concrete process recommendations. YOU DECIDE!
  • 31. MORE INFORMATION ABOUT MQM http://www.qt21.eu/mqm-definition/definition-2014-08-19.html
  • 32. MORE INFORMATION ABOUT METHODOLOGY
  • 33. The Proposal: WK46396 = MQM; WK46397 = the guide for LQA methodology.
  • 34. THANK YOU! sgladkoff@logrus.net sgladkoff@gala-global.org

Editor's Notes

  1. The results are presented on this graph. It’s not as complex as it might seem at first glance. Let me explain. The horizontal axis represents the total volume in words, from 100 on the left to 1,000,000 on the right, on a logarithmic scale. The left vertical axis displays the percentage of the total volume to be checked (from 0% to 100%), while the right vertical axis shows the sample size in words, also on a logarithmic scale for obvious reasons. Now let’s look at the curves themselves. The ones starting at 100% and decreasing represent the percentage of the total volume to be checked. The ones starting at 100 words and going up represent the word count to be checked, depending on the total volume. The color of a curve indicates its error margin: the higher the precision, the more we have to check. The red curves in the middle correspond to a medium error margin of a quarter of a percent, the green curves to a tighter error margin (1/8 of a percent), and the blue ones to a more relaxed half-percent error margin. Of course, we might be more or less fastidious depending on the situation, but generally it is recommended to stay between the blue and green curves. If we check more, we are probably checking too much; if we check less, we are probably checking too little. Now, let’s go over some specifics. When volumes are low, below 10,000 words, the percentage to be checked is close to 100%. This means we have to check everything, which is quite reasonable since the overall volume is very low: we simply can’t make reliable assumptions about the quality of the material from a small sample. In the midrange, between 20,000 and 200,000 words, the percentage to be checked starts going down. When the volume exceeds 300,000 words, the curves reach saturation: the volume to be checked stays almost flat irrespective of the overall volume, somewhere between 100,000 and 150,000 words.
This volume can be divided or multiplied by a factor of 3 depending on the precision we want to achieve.
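The presentation does not state the formula behind the sampling graph. As an assumption, the standard Cochran sample-size formula with a finite-population correction (worst-case proportion p = 0.5, z = 1.96 for 95% confidence) appears to reproduce the behavior described in the note: nearly everything is checked below 10,000 words, and at a 0.25% margin the sample levels off in the 100,000 to 150,000-word range. The sketch below treats each word as a sampling unit, which is itself a simplification; the function name `sample_size` is hypothetical.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level


def sample_size(total_words: int, margin: float,
                p: float = 0.5, z: float = Z_95) -> int:
    """Words to check for a given error margin (Cochran's formula, assumed model).

    n0 is the sample size for an unlimited volume; the finite-population
    correction shrinks it when the total volume is small.
    """
    n0 = z * z * p * (1 - p) / (margin * margin)
    n = n0 / (1 + (n0 - 1) / total_words)
    return min(total_words, math.ceil(n))


for total in (1_000, 10_000, 100_000, 1_000_000):
    row = []
    for margin in (0.00125, 0.0025, 0.005):  # green, red, blue curves
        n = sample_size(total, margin)
        row.append(f"{margin:.3%}: {n:>7,} words ({n / total:.0%})")
    print(f"{total:>9,} words total | " + " | ".join(row))
```

With p = 0.5 the model is deliberately conservative; a reviewer with prior knowledge of the expected error rate could pass a different `p` and get a smaller sample.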