Making Terms Matter: Early term validation
Ronan Martin, SAS Institute – 11th September 2015
the ETV model
The term validation model described here assumes the existence of a
terminology management system (TMS), maintained by either the vendor or the
client, that is used to support the translation process. In our case we
maintain the TMS in house.
We call this term validation model Early Term Validation, or ETV for short.
ETV has evolved in the context of a client-vendor relationship. It should
be noted that the "term" being validated is really the local-language
equivalent or translation of an existing source term, so it is not a process that
validates source terms, but rather target terms.
We, the client, submit new documents for translation on an ongoing basis,
and new terms crop up which have not previously been translated. This can
be because we are dealing with a completely new product, or because a new
language is being added to the list of target languages, or because there is a
new version of an existing product where significant new features have been
added. For us "document" means a set of source files for a software product.
review versus early validation
An example: A new version of a product has been developed. Global
marketing have decided that in addition to our core languages, the product
needs to be localized to an additional 12 languages.
We do not have linguistic resources in house for these languages and so they
are outsourced. SAS has local offices in each country comprising sales staff
and consultants: sometimes these are quite small.
We need to test and validate the localized version of the product. In the past
we have deployed localized versions of the software and negotiated some kind
of local testing resources. The test was for locale-specific functionality issues.
Higher level functionality issues have always been tested centrally. Also
needed was a review of the translated strings, which was done in the context
of the running application. The linguistic review was actually a major part of
the test effort.
The tester would always have a technically-based background - it would be
unusual for them to have a language-based background. This gave rise to
some problems. Experience taught us that the tester often did not have the
linguistic focus needed to review and validate the linguistic content.
On reflection, we decided that if translators have understood the concepts
correctly, and know the appropriate target term, then we can rely on their
professional competence to ensure that the rest of the text is linguistically
sound. It is the key terminology that exposes us to a quality risk.
Our decision to focus on key terminology presented us with some challenging
tasks. Firstly, we needed to have a process in place to locate key terminology -
more specifically, key terminology that did not yet have a translation in the
termbank.
Selecting key terminology is not an area I will cover here - suffice it to say
that the task can be done through a combination of term extraction and term
grading. We have found it necessary to grade terms along a "straightforward
to challenging" axis. When we think "challenging", we need to envisage the
translator's task of rendering the terms. A technical term can be quite
straightforward to translate, while an everyday term used in a
specific way can create real challenges for the translator. So this is a localization-
specific definition of key terms.
Returning to our example, let us say that for the new product release there
are 180 or so terms that currently have no target in the termbank for a given
language or set of languages. Some programmatic filtering (using the grading
mentioned above) removes many of these as being non-problematic -
additional terms are removed during a manual review. Reasons for terms
being deemed non-problematic can vary. Perhaps they are fairly standard
terms, or maybe a family of terms exists that is really just a core term with
additional words added, like "xxxxx", "xxxxx report", "xxxxx analysis".
Obviously, it is only necessary to validate the target for "xxxxx".
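The family filtering described above can be sketched roughly as follows. This is a minimal illustration, not our actual tooling: the grade scale and field names are hypothetical.

```python
# A sketch of the programmatic filtering step, assuming each candidate
# term carries a grade from 1 (straightforward) to 5 (challenging).
# The grade field and threshold are invented for illustration.

def filter_candidates(terms):
    """Drop straightforward terms and derived family members.

    `terms` is a list of dicts like {"text": "xxxxx report", "grade": 3}.
    """
    # Keep only terms graded challenging enough to need validation.
    challenging = [t for t in terms if t["grade"] >= 3]

    # Within a family like "xxxxx", "xxxxx report", "xxxxx analysis",
    # only the core term needs validation, so drop the derived forms.
    texts = {t["text"] for t in challenging}
    return [
        t for t in challenging
        if not any(other != t["text"] and t["text"].startswith(other + " ")
                   for other in texts)
    ]

terms = [
    {"text": "decision tree", "grade": 4},
    {"text": "decision tree analysis", "grade": 4},
    {"text": "report", "grade": 1},
]
print([t["text"] for t in filter_candidates(terms)])  # only the core term survives
```

The manual review that follows catches the cases this kind of mechanical filtering cannot.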
We now have 75 terms left, designated as being challenging for the translator.
Figures differ from language to language.
These 75 terms could be resolved in the course of the translation, and
reviewed afterwards. However, we have discovered that discussion of some
terms can be lengthy and jeopardize deadlines. There are many reasons why
it makes more sense to resolve these terms up-front.
I have already mentioned the client-vendor relationship, but actually things
are more complex than this.
As the client, we are a localization centre representing the head office of the
company, located in the US. This is where virtually all product development
takes place, our software being developed in English (US). In a sense our
local sales office is the initial consumer of the localized product, which will
then be passed on to the local customers.
The person who will carry out the term validation work on our side needs to
be supplied by the local office. Time for the work is probably not budgeted
into the overall schedules, so this is usually seen as an extra task.
We found that asking the subject matter expert to provide targets just did not
work. They did not have the necessary linguistic skills and sometimes proposed
strange terms which could be incorrect, misspelt and inconsistent with
existing terminology, and even inconsistent within the list of terms we gave
them.
So we decided to start the task by providing the key terminology to the
vendor and asking their translator to provide us with the target terms, which
at this stage would be proposed target terms (pending validation). The
translators who are involved in our software localization are generally
translators who are used to translating for the IT domain. For each language,
we have tested and validated the translations in the past and have a good
working history with our vendors. This was a good place to start.
The translator is not interested in doing this "extra work" for nothing, and we
have agreed to pay for the service. It could be argued that this is an integral
part of translating a text, but this is not strictly correct. We are asking them
to translate a list of terms with only a set of isolated contexts to rely on. They
do not yet have access to the set of files that we require them to translate.
We have agreed to remunerate translators at the rate of 20 terms per hour
(using the standard hourly rate for that language).
Given this rate, translators always seem to approach the task very
professionally and we can see that they have done research into existing
domain terminology (various public glossaries) and considered consistency
with existing terms. They often make notes about difficulties, sources,
possible variants. Sometimes they query us back in order to gain a fuller
understanding of the concept.
Our liaison with the subject matter expert tends to follow a rockier path.
Firstly, we often have to pressure the local office to provide us with a contact.
We need to explain to them what the task involves and what kind of person
we are looking for. Secondly, when we do receive a name or names, and
presume that this is a Subject Matter Expert or SME, we have sometimes
subsequently discovered we have been assigned the most junior person
available, and in the worst case even a student intern. However, in time we
have managed to build up a good system of contacts for all languages.
When we began carrying out ETV, we used a lot of energy on the pedagogical
process of conveying to the translator and SME what the goals of the whole
task comprised. This was a collaboration, not a war! SMEs should see the
translator as a valuable resource, and translators should try to understand
the linguistic limitations the SME may have. Also, translators had to
understand that the specific user group may be accustomed to terminology
that differs from the textbook examples they are familiar with. Ultimately,
it was the SME's decision that held, subject to occasional modifications. We
have also sometimes had to overrule the SME in extreme circumstances.
We create a list of English key terms. These are supplied to the vendor in an
Excel spreadsheet. In addition we carry out a concordance search against the
product (and related products) and supply the results as a web application
that can be accessed online, or deployed in a local browser.
This set of resources is sent to the translator. Generally, it is enough
information and we usually receive the target terms back quite promptly,
often after 2-3 days.
Now we send the list of terms and proposed targets to the SME, together with
the web application containing the set of concordances.
As regards the web app, we have taken care to create a tool that is easily
navigable and usable. We have toyed with the idea of sending additional
resources, like screen shots, but in the end we rely almost exclusively on the
concordancer web app.
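A concordance in this sense is a keyword-in-context (KWIC) listing of a term's occurrences. Our actual web app is more elaborate, but the core search could be sketched like this (the function and sample text are illustrative):

```python
import re

def concordance(term, text, width=30):
    """Return each occurrence of `term` with `width` characters of
    surrounding context, keyword-in-context style."""
    hits = []
    for m in re.finditer(re.escape(term), text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append(f"...{left}[{m.group(0)}]{right}...")
    return hits

sample = "The decision tree node shows splits. Each decision tree is pruned."
for line in concordance("decision tree", sample):
    print(line)
```

In the real process these listings are generated against the product's source strings and rendered as navigable HTML.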
Keeping track of the communication threads is a major task. As far as
possible I have automated what I could, but it is in the nature of the kind of
animated and impassioned discussion that arises from terminology analysis to
be disorganized and varied in format. The actors involved cannot be pressed
into using web forms or complex procedures. Email is the central vehicle.
To ease the process I have written a script that sets up an ETV environment,
which is really just a folder system mirroring the stages of the workflow.
The idea is that the terms pass through a workflow that corresponds roughly
to the folder sequence. In staging, the preparation work is set up. Then the
terms are sent to the translator for each language, received back, sent on to
the reviewer, and received back.
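A minimal sketch of such a setup script, assuming a Python implementation; the stage and folder names here are invented for illustration and do not reflect the real script's layout:

```python
from pathlib import Path

# One workflow folder per stage, per target language. Names are illustrative.
STAGES = [
    "01_staging",
    "02_to_translator",
    "03_from_translator",
    "04_to_reviewer",
    "05_from_reviewer",
    "06_resolved",
]

def create_etv_environment(root, languages):
    """Create the ETV folder tree: root/<language>/<stage>/."""
    for lang in languages:
        for stage in STAGES:
            Path(root, lang, stage).mkdir(parents=True, exist_ok=True)

create_etv_environment("etv_project", ["da_DK", "ja_JP"])
```

Terms then move through the folders in sequence as files are sent out and received back.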
At this point the flow splits. Uncontested targets are considered to be resolved,
and these are added to a resolved pool, still by language. I call these the
"thru-terms" because they have made it through the process. Contested
terms are looped back to the translator. The translator can then accept the
comment of the reviewer - these then also become thru-terms - or they can
send back counter comments. The thru-terms are added to the pool, and the
terms with counter-comments re-enter the cycle, being sent back to the
reviewer for further consideration.
In the majority of cases, the thru-terms quickly rise to 100%. A typical
sequence of events would be:
• 75 terms to translator
• 75 proposed targets back from translator
• 75 term pairs sent to reviewer
• 50 terms accepted by reviewer -> thru-terms
• 25 counter-proposals
• 25 counter-proposals sent back to translator
• 18 counter-proposals accepted by translator -> thru-terms
• 7 terms receive counter-comments from translator
• 4 counter-comments accepted by reviewer -> thru-terms
• 3 counter-comments refuted
• translator capitulates (sometimes reluctantly, issuing dire warnings)
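One round of this cycle can be sketched as a simple partition: uncontested proposals become thru-terms, counter-proposals either get accepted by the translator or re-enter the loop. All names and data below are illustrative stand-ins for the real email exchanges:

```python
def partition_round(proposals, reviewer_counters, translator_accepts):
    """One round of the ETV loop.

    proposals: source term -> proposed target.
    reviewer_counters: source term -> counter-proposal (absent = accepted).
    translator_accepts: set of terms whose counter the translator accepts.
    """
    thru, still_open = {}, {}
    for term, target in proposals.items():
        counter = reviewer_counters.get(term)
        if counter is None:
            thru[term] = target        # uncontested thru-term
        elif term in translator_accepts:
            thru[term] = counter       # translator accepts the counter
        else:
            still_open[term] = counter # loops back to the reviewer
    return thru, still_open

proposals = {"t1": "a", "t2": "b", "t3": "c"}
counters = {"t2": "b2", "t3": "c2"}
thru, open_ = partition_round(proposals, counters, {"t2"})
```

Running the round again on `open_` with fresh reviewer and translator responses models the iteration until the pool of open terms empties.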
Sometimes a few stragglers, 2-3 terms, cannot be resolved for some reason.
In these cases we go ahead with the localization and try and resolve them as
we go along.
The resolved term pairs are imported into the termbank and a dictionary is
built, which will be deployed in the CAT tool. Translators must keep to the
terminology in the dictionary, except where there are contextual constraints.
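As a rough illustration of the dictionary-building step: real termbank imports use richer formats such as TBX, so this tab-separated layout, the file name, and the sample target term are purely illustrative.

```python
import csv

def write_dictionary(resolved, path):
    """Write resolved term pairs (source term -> validated target term)
    as a simple tab-separated dictionary file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["source", "target"])
        for source, target in sorted(resolved.items()):
            writer.writerow([source, target])

# Sample pair; the Danish target is an invented example.
write_dictionary({"decision tree": "beslutningstræ"}, "etv_dictionary.tsv")
```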
Apart from the fact that we feel we have a localized product with an improved
quality, which is difficult to measure, we have also discovered some
interesting beneficial side-effects.
Most importantly, the early collaboration of the local offices gives them a
sense of ownership in the creation of a localized version of the product. This
takes away the risk that they will reject the localization on the grounds that
the translation is unacceptable. Previously we had experienced this
phenomenon once every 1-2 years: a localization of a major release for one
language would be completely rejected. The reason has always been terminology -
the consequences have been serious, both because a prospective customer
has to wait for a follow-up release, but also because it is extremely tricky to
fix. The translator at the vendor usually goes into defensive mode, ready for a
protracted war with the SME at our local office. The translator and SME are
mostly from the same country, and even city, and are willing to fight for their
cause, citing their own local "experts" - who may even represent different
academic schools and be known to each other. We, as a localization centre
end up in the middle. The truth is usually that the blame, if any, lies on both
sides. The translator at the vendor has been hasty in resolving terminology,
and the local office overreacts, requesting unreasonable wide-reaching
changes. Sometimes the mismatch arises from bad communication or just
plain bad luck.
So, with the ETV model the local office develops a sense of responsibility and
pride in the product because they are now the co-producers of the result.
They feel that they get a version which is matched to the needs of their
local market.
Because of this proactive approach, there is a greater awareness that a
localized version is on its way. Also, there is time to sort out the really
tricky terms before translation begins.
Although vendors do not explicitly own up to the fact, we feel that they can
give us faster turnaround times once a project with fully validated key
terminology has been launched. They no longer need to query us about
terminology and wait for a reply. Also, it must give them greater freedom in
splitting projects or switching translators.
It also provides the vendor with some security of cover in the event that
terminology is criticized. They are using terminology that the client has
requested and cannot be held to blame for inappropriate targets.
costs and resources for us
The process is time-consuming for me as the terminologist, but some portion
of the resources spent is really resources reallocated. Time-consuming
terminology queries to our PMs have tapered off to a much smaller flow.
Of greater significance is the fact that we have radically changed our testing
processes. Now we only carry out a full functionality/linguistic test for the first
release of a product, but in subsequent releases we centrally test functionality
(smoke test) and do not do a linguistic test, instead relying on early term
validation. It means that we do not need to deploy an early localized version
to the local office for testing (time-consuming), and we do not need to engage
local testing resources.
some down-sides that may crop up
Although it isn't strictly true to say that every action has an equal and
opposite reaction, some new challenges will be thrown up by these activities.
Firstly, increased focus on terminology by people who are not necessarily very
linguistically-minded, and even by those who are, will always spark off
discussions that are difficult to close. Language has an elusive character.
Science (in the form of linguistics) can go a long way, but at some stage
discussions become diffuse and strong feelings can be triggered. This seems
to be a universal principle where language is involved. Semantics cannot be
forced into a neat categorical system.
Secondly, we have discovered a recent rapid shift in language usage for many
languages, where the English term is imported and used in the context of
local-language sentences and settings. There may be many reasons for this -
the Internet, the increase in student exchange schemes, and the general
prevalence of English in technical domains.
Many translators have a strong aversion to the practice of importing
English terms and, in a sense, may see themselves as guardians of the local
tongue. SMEs are predominantly pragmatic and wish to see the terms that
they are accustomed to, and accustomed to using together with customers.
These are often the English terms. This creates a challenge which we are still
grappling with. On a term-by-term basis it is possible to decide that this or
that term should not be translated, but it is not possible to apply this strategy
across a whole set of terms. Even allowing some terms to remain in English
sets up unsightly conflicts with other related terms that are translated.
Thirdly, validating a term often means changing a proposed term. The
translator has proposed a term based on legacy translation memories
(previous translations of other products). Now the SME requests that the term
should be called something else. How far should this change be allowed to
filter back into the TMs? Should it be a global change, or just become a
variant used in this product domain only?
one final word
There are two very central components of this work. One is the
communication between the parties involved, which I have gone into some
detail in describing. The other is the whole concept of "key terminology",
which in itself is a disarmingly simple term.
Defining the notion of "key terminology" in any precise way is almost
impossible. The analogy I always think of is that of a simple fisherman (or
woman) standing in the sea, casting a small net over a shoal of fish. As it
sinks the fish dart in every direction and many escape. When it sinks to the
bottom, a certain amount of debris and seaweed is scooped up together with
the catch. Your catch is what you managed to capture at that moment, given
the existing circumstances. It isn't perfect, but it's your best shot and,
returning to terminology, among your key terms are prize examples that
really justify the kind of attention that ETV will expose them to. You will
always tend to include borderline terms that are arguably not key terms,
which people will puzzle over: "why on earth is this in there??". But on the
other hand you really have an excellent chance of exposing problematical
terms and dealing with them before they sink the whole project - which even
a handful of terms are capable of doing.