#LocWorld41
Linguistic Quality and how to scale it
• Automate objective quality
• Put vendors, tools, and metrics in place for subjective quality
• Scale subjective reviews
• Design machine translation post-editing (MTPE) efforts for vendors and tools
• Automated MT quality reviews
Diagram: Quality (Objective, Subjective, Human quality at scale, MTPE, MT and MTPE quality at scale)
Goals:
Localization, internationalization, globalization, translation, regionalization, marketization…
Many, many “…ation” terms…
During the next sessions, we will make sense of them for you.
Learning and discussing examples
A globalization consultant who helps companies assess their current infrastructure and processes and create a strategy, plan, training, and analytics for globalization across the whole organization.
Former community college English teacher of research and literature
Former Platform Localization Program Manager who helped design infrastructure for Amazon
The goal is to adapt, or what I call “culturate,” products and services to a locale and region.
If I’m selling to you, I speak your language. If I’m buying, dann müssen Sie Deutsch sprechen [then you must speak German]! (Willy Brandt, former German Chancellor)
Culturation:
WELD:
Whole Enterprise Localization Design
Standard project management concerns: the “triple constraints” of scope, time, and cost.
Scope: Quality, regulations, type of translation, and regional and discipline concerns.
However content is passed back and forth, there are ample opportunities for problems in the process and in the resulting content.
Make sure to map out the process and consider the initial content deliveries.
It is best to run end-to-end trials of the process, tools, and systems to ensure that everything works and the expected results are met before ramping up production.
Localization PM: The vendor and the client will each have a PM, who is the point person for getting projects completed on time and on budget.
Localization Engineer: There will be client and vendor localization engineers. Their job is to ensure that content for translation is round-tripped effectively to every place it needs to go.
Linguist: Linguists may translate, review, edit, or post-edit translations. They are highly skilled in two or more languages and usually translate into their native language.
MLV: Multi-language vendor. Think of this as your main contact for all the individuals and smaller firms that do your linguistic work.
SLV: Single-language vendor. SLVs specialize in one specific language.
MT Provider: If you use an MT provider, they will customize neural (NMT) or statistical (SMT) machine translation for you.
QA Team: The QA team tests your localized software for linguistic and functional issues.
Project management skills
Requirements: Longevity, quality, speed, cost, etc.
Multimedia: Special care and extra time and cost are required to produce polished, usable content.
Data-intensive: Requires bilingual or monolingual data for training languages and behaviors.
Special regulations or requirements: Marketing, games, and specific industries require
Time-specific issues are
Poor Planning and design: Leads to redevelopment, missed functional
Process: Constant churn, product redesign, and redevelopment extend the localization process. Most of this can be avoided by including localization teams early.
Artificial launch timelines: Localization scope is abandoned or reduced. This often happens when a company makes assumptions about global product launches or adds new locales after initial planning.
Volume not planned: Unplanned volume affects the time and cost required for localization. It also lowers the overall quality of content rushed to meet deadlines, compromising both the process and the product.
Locales not planned: Besides the effect on the timeline and budget, additional locales strain internal resources and cause issues with storage and data stores.
Content lifecycle not planned: If the content lifecycle is not addressed, the authoring and localization processes, and the interactions between these teams, will be affected. It will also be unclear which locale is the true source locale and what customizations each locale requires. This in turn affects the creation, approval, amendment, and deprecation processes.
Human and MT quality processes differ, but the goal of quality analysis is the same
Linguistic Quality: Objective
Issue: No automated tools to perform translation checks
Data for analysis: Anecdotal evidence and metrics (cost, time, rework rates) to describe the opportunity and propose a solution.
Solution: Hire vendors with tools, and use the data to argue for internal tooling to run standard checks.
What I owned: I documented and vetted all options with stakeholders and proposed the best solution. I created contracts for initial offerings and user stories for development of internal tools.
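The “standard checks” above are the kind of objective, rule-based checks that QA tools such as Xbench or Verifika automate. A minimal sketch, assuming segments arrive as (source, target) string pairs and placeholders use a hypothetical {name} syntax; the function names and issue labels are illustrative, not any tool’s API:

```python
import re
from typing import List

# Hypothetical placeholder syntax {name}; adapt the pattern to your file formats.
PLACEHOLDER = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")
# Digits with optional thousands/decimal separators, e.g. 1,250.00 or 1.250,00.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def normalized_numbers(text: str) -> List[str]:
    """Numbers with separators stripped, since locales format them differently."""
    return sorted(re.sub(r"[.,]", "", n) for n in NUMBER.findall(text))


def objective_checks(source: str, target: str) -> List[str]:
    """Return the objective issues found in one source/target segment pair."""
    issues = []
    if not target.strip():
        issues.append("empty target")
    elif source.strip() == target.strip():
        issues.append("target identical to source (possibly untranslated)")
    # Placeholders must survive translation unchanged, though their order may move.
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(target)):
        issues.append("placeholder mismatch")
    if normalized_numbers(source) != normalized_numbers(target):
        issues.append("number mismatch")
    if "  " in target:
        issues.append("double space in target")
    return issues


segments = [
    ("You have {count} new messages.", "Sie haben {count} neue Nachrichten."),
    ("Total: 1,250.00 USD", "Summe: 1.250,00 USD"),
    ("Save", "Save"),
    ("Delete {file}?", "Datei  löschen?"),
]
for src, tgt in segments:
    print(f"{src!r}: {objective_checks(src, tgt)}")
```

Because each check is deterministic, it can run on 100% of content at almost no cost, leaving paid human review for the subjective issues.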
Linguistic Quality: Subjective
Issue: Each team had a different focus and a different idea of what quality meant.
Data for analysis: Vendor reporting for each of our main stakeholders and their focus; review costs per team; error rates and quality considerations.
Solution: We integrated MQM (Multidimensional Quality Metrics) into every vendor contract so that each team could get the quality it needed within the cost, time, and quality requirements of its business. We compiled that data in a data lake and eventually built cross-team reporting with Amazon Redshift and Amazon QuickSight.
What I owned: I gathered the vendor data, the customer data, and the rubrics. I analyzed the data, made the argument for one standard, and integrated the MQM standard into every vendor contract.
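One standard does not mean one quality bar: every team annotates errors with the same MQM categories and severities, but each team can set its own pass mark. A minimal sketch of an MQM-style score, assuming severity weights of 1/5/10 for minor/major/critical errors and the formula score = 100 × (1 − penalty points / evaluated words); both are common conventions rather than fixed parts of the standard, and the team names and thresholds are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed severity weights (a common MQM-style convention); set these per contract.
SEVERITY_WEIGHTS: Dict[str, int] = {"neutral": 0, "minor": 1, "major": 5, "critical": 10}

# Hypothetical per-team pass thresholds: one shared metric, different quality bars.
TEAM_THRESHOLDS: Dict[str, float] = {"marketing": 99.0, "support": 97.0}

Error = Tuple[str, str]  # (error category, severity)


def mqm_score(errors: List[Error], word_count: int) -> float:
    """Score = 100 * (1 - total penalty points / evaluated word count)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    penalty = sum(SEVERITY_WEIGHTS[severity] for _, severity in errors)
    return 100 * (1 - penalty / word_count)


def penalty_by_category(errors: List[Error]) -> Counter:
    """Aggregate penalty points per top-level category for cross-team reporting."""
    totals: Counter = Counter()
    for category, severity in errors:
        totals[category.split("/")[0]] += SEVERITY_WEIGHTS[severity]
    return totals


def passes(team: str, errors: List[Error], word_count: int) -> bool:
    """Check a sample's score against the (hypothetical) threshold for a team."""
    return mqm_score(errors, word_count) >= TEAM_THRESHOLDS[team]


errors = [
    ("Accuracy/Mistranslation", "major"),
    ("Fluency/Grammar", "minor"),
    ("Terminology/Inconsistent", "minor"),
    ("Style/Awkward", "minor"),
]
print(round(mqm_score(errors, word_count=500), 2))  # 98.4
print(penalty_by_category(errors))
print(passes("marketing", errors, 500), passes("support", errors, 500))  # False True
```

Aggregating penalties by category is what makes cross-team reporting possible: once every vendor reports against the same categories, the results can be loaded into one store and compared.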
Human Quality at Scale:
Issue: Review costs quickly rose to match actual translation costs. Even reviewing only 1%-5% of content at massive scale cost millions of dollars.
Data for analysis: Cost and quality data, percentage of content reviewed, major error classifications.
Solution: Vendors and better quality tooling to capture stakeholder reviews, LSP reviews, and customer feedback. We lowered costs by automating more checks and by ensuring that editors made quality control an ongoing function. We renegotiated contracts for scale and added vendors to increase competition.
What I owned: Contracts, tooling user stories, and data gathering.
MT Quality Post-Editing:
Issue: Vendors offered varying levels of MTPE, new users were unclear about what they would get from the process, and there was no standard across vendors or MT systems for measuring quality.
Data for analysis: We gathered data for MQM, LISA, and DQF, ran sample tests with all three methods, and had internal clients and vendors evaluate and rank their viability.
Solution: MQM adopted across all vendors performing MTPE.
What I owned: documenting methods, aligning samples, managing test process, vetting with stakeholders, and integrating MQM into vendor contracts.
Scaling the MT Quality Process
Issue: Standard BLEU and METEOR scores were not good indicators of MT quality.
Data for analysis: The MT research team evaluated MT quality with a vendor and used the same dataset to run BLEU, METEOR, and TER. TER correlated best with the vendor’s evaluation.
Solution: Translation error rate (TER) was adopted and required of all vendors.
What I owned: Contracts with quality-evaluation vendor teams and project management of the vendor review process.
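TER counts the edits needed to turn MT output into a reference translation, divided by the number of reference words, so lower is better and it maps naturally onto post-editing effort. The sketch below is a simplified version: it counts only word insertions, deletions, and substitutions and omits the block-shift edit that full TER also allows, so it can overestimate TER when a phrase has merely moved. Established implementations, such as the TER metric in sacreBLEU, handle shifts. The segments and human scores are hypothetical, included only to show how a metric’s correlation with human judgments can be checked:

```python
import math
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words: insertions, deletions, substitutions (no shifts)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a hypothesis word
                curr[j - 1] + 1,         # insert a reference word
                prev[j - 1] + (h != r),  # substitute (free if the words match)
            )
        prev = curr
    return prev[-1]


def simple_ter(hypothesis: str, reference: str) -> float:
    """Edits needed to turn the MT output into the reference, per reference word."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    return word_edit_distance(hypothesis.split(), ref) / len(ref)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


reference = "the cat sat on the mat"
mt_outputs = ["the cat sat on the mat", "the cat sat on mat", "a dog sat on a mat", "mat"]
human_scores = [5, 4, 2, 1]  # hypothetical 1-5 adequacy ratings

ter_scores = [simple_ter(hyp, reference) for hyp in mt_outputs]
print([round(t, 2) for t in ter_scores])
# A good error metric should correlate strongly and negatively with human scores.
print(round(pearson(ter_scores, human_scores), 2))
```

In a real evaluation, the correlation would be computed over many segments and against every candidate metric (BLEU, METEOR, TER) on the same dataset.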
Examples of tools for automating checks: TMS, Xbench, Verifika
Create a vendor pool for quality reviews, and clearly delineate the costs and delays involved.
Decentralized localization teams with mostly manual processes
Multiple formats for measuring quality
Each team needed to adapt the quality to their needs.
How long did it take to solve?
How did I do it?
What was my contribution?
Each translation type presents a new set of problems and opportunities.
Rule-based: The first form of commercial MT. Its ability to scale was limited by the need for linguistic specialists and by the specifics of each language.
Statistical: Originally limited by storage and data. It led to uniform errors and to large software companies such as Google and Microsoft dominating MT.
NMT: Limited by compute power. Shift to GPUs and cloud compute resources made NMT viable in the last decade.
1954: The Georgetown-IBM experiment translates 60+ sentences from Russian to English.
1966: The ALPAC report (by a committee of seven scientists formed in 1964) quashed government funding and research for over a decade. The report said MT was more costly, less accurate, and more time-consuming than human translation.
Rule-based: Systran
Statistical MT (brute force)
NMT: Deep learning with recurrent neural networks, but still requires a large dataset of bilingual corpora.
Unsupervised NMT: Trained on monolingual corpora. Still early, but the results are promising.
In-house: Knowledgeable about the product, but more costly at volume.
Vendor: Virtually unlimited resources, but longer ramp-up time and less product knowledge.
Freelancer: Lower cost per word, but higher management overhead.
SMT: Can be better for specific domains, but requires much more data.
NMT: Lower data requirements and higher fluency, but also prone to nonsensical output. See Shakespeare study.
Perspective is all a matter of where you are standing.
WELD:
Whole Enterprise Localization Design
CAT: Computer-assisted translation: Linguist UI
TMS: Translation management system: PM / Engineer tooling
AI: Training data
MT: Training data, evaluation, improvement processes
Development: Language, localizability
Loc: TMS
Content lifecycle: TMS, Quality
Technologies: MT, AI, ML, automation of PM, etc.
Data Types: Systems in use and how the data will be used
Management: The EU General Data Protection Regulation (GDPR)
Use: Tools and systems, e.g., UIs/UX, knowledge management (KM) systems, authoritative stores, and APIs for access
Raw data is useful in many ways to these players
The C-level management will consider the cost/benefit of the localized content for international growth.
The tax and finance teams will want to understand the allocation for revenue and costs.
And the product team will want to measure engagement, abandonment, and necessary changes to the product when it is deployed internationally.
As you become more involved in the process, many other people get involved to help scale your work and increase its impact on the organization and its international expansion.
If you stay in localization and work at a few different organizations, you will start to see design patterns in how the organization is structured, how localization teams interact with other parts of the company, and how localization teams interact with each other.
Repetitive structures and head count.
Focus across groups differs
One group takes precedence