Making Terms Matter: Early term validation. Ronan Martin, SAS Institute


From the conference on September 25, 2015 in Stockholm:
Better, faster, cheaper!
Terminology management for big data and explosive content growth.

ETV - Early Term Validation
Ronan Martin – 11th September 2015

the ETV model

The term validation model described here assumes the existence of a terminology management system (TMS) that is maintained by vendor or client and used to support the translation process. In our case we maintain a TMS in house. We call this term validation model Early Term Validation, or ETV for short. ETV has evolved in the context of a client-vendor relationship.

It should be noted that the "term" being validated is really the local-language equivalent or translation of an existing source term, so the process validates target terms rather than source terms. We, the client, submit new documents for translation on an ongoing basis, and new terms crop up which have not previously been translated. This can be because we are dealing with a completely new product, because a new language is being added to the list of target languages, or because there is a new version of an existing product where significant new features have been added. For us, "document" means a set of source files for a software product.

review versus early validation

An example: a new version of a product has been developed. Global marketing has decided that in addition to our core languages, the product needs to be localized into an additional 12 languages. We do not have linguistic resources in house for these languages, so the work is outsourced. SAS has local offices in each country comprising sales staff and consultants; sometimes these are quite small.

We need to test and validate the localized version of the product. In the past we have deployed localized versions of the software and negotiated some kind of local testing resources. The test was for locale-specific functionality issues; higher-level functionality has always been tested centrally. Also needed was a review of the translated strings, which was done in the context of the running application. The linguistic review was actually a major part of the job. The tester would always have a technical background - it would be unusual for them to have a language-based background. This gave rise to some problems. Experience taught us that the tester often did not have the linguistic focus needed to review and validate the linguistic content.

On reflection, we decided that if translators have understood the concepts correctly, and know the appropriate target term, then we can rely on their
professional competence to ensure that the rest of the text is linguistically sound. It is the key terminology that exposes us to a quality risk.

key terminology

Our decision to focus on key terminology presented us with some challenging tasks. Firstly, we needed to have a process in place to locate key terminology - more specifically, key terminology that did not yet have a translation in the target language. Selecting key terminology is not an area I will cover here - suffice it to say that the task can be done through a combination of term extraction and term grading. We have found it necessary to grade terms along a "straightforward to challenging" axis. When we think "challenging", we need to envisage the translator's task of rendering the terms. A technical term can be a straightforward term to translate; likewise, an everyday term used in a specific way can create challenges for the translator. So this is a localization-specific definition of key terms.

Returning to our example, let us say that for the new product release there are 180 or so terms that currently have no target in the termbank for a given language or set of languages. Some programmatic filtering (using the grading mentioned above) removes many of these as non-problematic, and additional terms are removed during a manual review. Reasons for terms being deemed non-problematic can vary. Perhaps they are fairly standard terms, or a family of terms exists that is really just a core term with additional words added, like "xxxxx", "xxxxx report", "xxxxx analysis". Obviously, it is only necessary to validate the target for "xxxxx". The sketch below illustrates this filtering step.

We now have 75 terms left, designated as being challenging for the translator. Figures differ from language to language. These 75 terms could be resolved in the course of the translation and reviewed afterwards. However, we have discovered that discussion of some terms can be lengthy and jeopardize deadlines. There are many reasons why it makes more sense to resolve these terms up-front.
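The talk describes the grading and family filtering only in outline. Purely as an illustration, here is a minimal Python sketch of such a filter, assuming candidate terms arrive as (term, grade) pairs graded from 1 (straightforward) to 5 (challenging); the function name, threshold and sample terms are invented for the example.

    def filter_candidates(candidates, grade_threshold=3):
        """Drop straightforward terms and derived family members,
        keeping only the core terms that still need validation."""
        # Keep only terms graded as challenging for the translator.
        challenging = [(t, g) for t, g in candidates if g >= grade_threshold]

        # Sort by length so core terms ("xxxxx") are seen before
        # derived terms ("xxxxx report", "xxxxx analysis").
        challenging.sort(key=lambda pair: len(pair[0]))

        kept = []
        for term, grade in challenging:
            # A family member is covered by validating its core term alone.
            if any(term.startswith(core + " ") for core, _ in kept):
                continue
            kept.append((term, grade))
        return kept

    candidates = [
        ("output window", 1),         # standard term: removed by grading
        ("decision tree", 4),         # core term: kept
        ("decision tree report", 4),  # family member: covered by core term
        ("lift chart", 5),            # kept
    ]
    print(filter_candidates(candidates))
    # [('lift chart', 5), ('decision tree', 4)]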
actors

I have already mentioned the client-vendor relationship, but actually things are more complex than this. As the client, we are a localization centre representing the head office of the company, located in the US. This is where virtually all product development takes place, our software being developed in English (US). In a sense, our local sales office is the initial consumer of the localized product, which is then passed on to the local customers.

The person who will carry out the term validation work on our side needs to be supplied by the local office. Time for the work is probably not budgeted into the overall schedules, so this is usually seen as an extra task. We found that asking the subject matter expert to provide targets just did not work. They did not have the necessary linguistic skills and sometimes proposed strange terms which could be incorrect, misspelt and inconsistent with existing terminology, and even inconsistent within the list of terms we gave them.

So we decided to start the task by providing the key terminology to the vendor and asking their translator to provide us with the target terms, which at this stage would be proposed target terms (pending validation). The translators who are involved in our software localization are generally used to translating for the IT domain. For each language, we have tested and validated the translations in the past and have a good working history with our vendors. This was a good place to start.

The translator is not interested in doing this "extra work" for nothing, and we have agreed to pay for the service. It could be argued that this is an integral part of translating a text, but this is not strictly correct. We are asking them to translate a list of terms with only a set of isolated contexts to rely on; they do not yet have access to the set of files that we require them to translate. We have agreed to remunerate translators at the rate of 20 terms per hour (using the standard hourly rate for that language). Given this rate, translators always seem to approach the task very professionally, and we can see that they have done research into existing domain terminology (various public glossaries) and considered consistency with existing terms. They often make notes about difficulties, sources and possible variants. Sometimes they query us back in order to gain a fuller understanding of the concept.

Our liaison with the subject matter expert tends to follow a rockier path. Firstly, we often have to pressure the local office to provide us with a contact. We need to explain to them what the task involves and what kind of person we are looking for. Secondly, when we do receive a name or names, and presume that this is a subject matter expert or SME, we have sometimes subsequently discovered that we have been assigned the most junior person available - in the worst case, even a student intern. However, in time we have managed to build up a good system of contacts for all languages.

When we began carrying out ETV, we used a lot of energy on the pedagogical process of conveying to the translator and SME what the goals of the whole task were. This was a collaboration, not a war! SMEs should see the translator as a valuable resource, and translators should try to understand the linguistic limitations the SME may have. Translators also had to understand that the specific user group may be accustomed to terminology that differs from the textbook examples they are familiar with. In the final instance, it would be the SME whose decision held, with modifications. We have also sometimes had to overrule the SME in extreme circumstances.
assets

We create a list of English key terms. These are supplied to the vendor in an Excel spreadsheet. In addition, we carry out a concordance search against the product (and related products) and supply the results as a web application that can be accessed online or deployed in a local browser. This set of resources is sent to the translator. Generally it is enough information, and we usually receive the target terms back quite promptly, often after 2-3 days.

Now we send the list of terms and proposed targets to the SME, together with the web application containing the set of concordances. As regards the web app, we have taken care to create a tool that is easily navigable and usable. We have toyed with the idea of sending additional resources, like screenshots, but in the end we rely almost exclusively on the concordancer web app.

keeping track

Keeping track of the communication threads is a major task. As far as possible I have automated what I could, but the kind of animated and impassioned discussion that arises from terminology analysis tends by nature to be disorganized and varied in format. The actors involved cannot be pressed into using web forms or complex procedures. Email is the central vehicle. To ease the process I have written a script that sets up an ETV environment, which is really just a folder system, sketched below.
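The talk showed the folder system only as a screenshot, so the layout here is illustrative rather than the actual script. This is a minimal sketch of such a setup script, with invented stage names that mirror the workflow described next.

    from pathlib import Path

    STAGES = [
        "01_staging",          # preparation: term list and concordance web app
        "02_to_translator",    # English key terms sent out per language
        "03_from_translator",  # proposed targets received back
        "04_to_reviewer",      # term pairs sent on to the SME
        "05_from_reviewer",    # accepted terms and counter-proposals
        "06_resolved",         # the pool of thru-terms
    ]

    def make_etv_environment(project, languages, root="ETV"):
        """Create the per-language folder workflow for one ETV round."""
        for lang in languages:
            for stage in STAGES:
                Path(root, project, lang, stage).mkdir(parents=True, exist_ok=True)

    make_etv_environment("new_release", ["de-DE", "fr-FR", "ja-JP"])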
The idea is that the terms pass through a workflow that corresponds roughly to the folder sequence. In staging, the preparation work is set up. Then the terms are sent to the translator for each language, received back, sent on to the reviewer, and received back. At this point the flow splits. Uncontested targets are considered to be resolved and are added to a resolved pool, still by language. I call these the "thru-terms" because they have made it through the process. Contested terms are looped back to the translator. The translator can then accept the comment of the reviewer - these then also become thru-terms - or they can send back counter-comments. The thru-terms are added to the pool, and the terms with counter-comments re-enter the cycle, being sent back to the reviewer for further consideration.

In the majority of cases, the thru-terms quickly rise to 100%. This would be a typical scenario:

  • 75 terms to translator
  • 75 proposed targets back from translator
  • 75 term pairs sent to reviewer
  • 50 terms accepted by reviewer -> thru-terms
  • 25 counter-proposals
  • 25 counter-proposals sent back to translator
  • 18 counter-proposals accepted by translator -> thru-terms
  • 7 terms receive counter-comments from translator
  • 4 counter-comments accepted by reviewer -> thru-terms
  • 3 counter-comments refuted
  • translator capitulates (sometimes reluctantly, issuing dire warnings)

Sometimes a few stragglers, 2-3 terms, cannot be resolved for some reason. In these cases we go ahead with the localization and try to resolve them as we go along. The resolved term pairs are imported into the termbank and a dictionary is built, which will be deployed in the CAT tool (a sketch of this export step follows). Translators must keep to the terminology in the dictionary, except where there are contextual restraints.
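The termbank import and the CAT-tool dictionary format are product-specific and not described in the talk. Purely as an illustration, here is a minimal sketch of the export step, assuming the resolved pool is a CSV file with source and target columns and the dictionary is a simple tab-delimited list; all file names are invented.

    import csv

    def build_dictionary(resolved_csv, dictionary_txt):
        """Convert resolved term pairs into a tab-delimited dictionary file."""
        with open(resolved_csv, newline="", encoding="utf-8") as src, \
             open(dictionary_txt, "w", encoding="utf-8") as out:
            for row in csv.DictReader(src):
                # One "source<TAB>target" line per validated term pair.
                out.write(f"{row['source']}\t{row['target']}\n")

    build_dictionary("ETV/new_release/de-DE/06_resolved/terms.csv",
                     "dictionaries/de-DE.txt")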
beneficial side-effects

Apart from the fact that we feel we have a localized product of improved quality - which is difficult to measure - we have also discovered some interesting beneficial side-effects. Most importantly, the early collaboration of the local offices gives them a sense of ownership in the creation of a localized version of the product. This removes the risk that they will reject the localization on the grounds that the translation is unacceptable. Previously we had experienced this phenomenon once every 1-2 years: the localization of a major release for one language being completely rejected. The reason has always been terminology, and the consequences have been serious, both because a prospective customer has to wait for a follow-up release and because it is extremely tricky to fix. The translator at the vendor usually goes into defensive mode, ready for a protracted war with the SME at our local office. The translator and SME are mostly from the same country, and even city, and are willing to fight for their cause, citing their own local "experts" - who may even represent different academic schools and be known to each other. We, as a localization centre, end up in the middle. The truth is usually that the blame, if any, lies on both sides: the translator at the vendor has been hasty in resolving terminology, and the local office overreacts, requesting unreasonably wide-reaching changes. Sometimes the mismatch arises from bad communication or just plain bad luck.

So, with the ETV model the local office develops a sense of responsibility and pride in the product because they are now co-producers of the result. They feel that they get a version which is matched to the needs of their customers. Because of this proactive approach, there is a greater awareness that a localized version is on its way. Also, there is time to sort out the really tricky terminology issues.

Although vendors do not explicitly own up to the fact, we feel that they can give us faster turnaround times once a project with fully validated key terminology has been launched. They no longer need to query us about terminology and wait for a reply. It must also give them greater freedom in splitting projects or switching translators. It also provides the vendor with some cover in the event that terminology is criticized: they are using terminology that the client has requested and cannot be blamed for inappropriate targets.

costs and resources for us

The process is time-consuming for me as the terminologist, but some portion of the resources spent is really resources relocated. Time-consuming terminology queries to our PMs have tapered off to a much smaller flow. Of greater significance is the fact that we have radically changed our testing processes. Now we only carry out a full functionality/linguistic test for the first release of a product; in subsequent releases we centrally test functionality (smoke test) and do not do a linguistic test, relying instead on early term validation. This means that we do not need to deploy an early localized version to the local office for testing (time-consuming), and we do not need to engage testing resources.

some down-sides that may crop up

Although it isn't strictly true to say that every action has an equal and opposite reaction, some new challenges will be thrown up by these activities. Firstly, increased focus on terminology by people who are not necessarily very linguistically minded, and even by those who are, will always spark off discussions that are difficult to close. Language has an elusive character. Science (in the form of linguistics) can go a long way, but at some stage discussions become diffuse and strong feelings can be triggered. This seems to be a universal principle where language is involved. Semantics cannot be forced into a neat categorical system.

Secondly, we have noticed a recent rapid shift in language usage for many languages, where the English term is imported and used in the context of local-language sentences and settings. There may be many reasons for this - the Internet, the increase in student exchange schemes, the prevalence of English-language textbooks.
Many translators have a strong aversion to the practice of importing English terms and, in a sense, may see themselves as guardians of the local tongue. SMEs are predominantly pragmatic and wish to see the terms that they are accustomed to, and accustomed to using together with customers. These are often the English terms. This creates a challenge which we are still grappling with. On a term-by-term basis it is possible to decide that this or that term should not be translated, but it is not possible to apply this strategy across a whole set of terms. Even allowing some terms to remain in English sets up unsightly conflicts with other related terms that are translated.

Thirdly, validating a term often means changing a proposed term. The translator has proposed a term based on legacy translation memories (previous translations of other products); now the SME requests that the term be called something else. How far should this change be allowed to filter back into the TMs? Should it be a global change, or just become a variant used in this product domain only?

one final word

There are two very central components of this work. One is the communication between the parties involved, which I have described in some detail. The other is the whole concept of "key terminology", which in itself is a disarmingly simple term. Defining the notion of "key terminology" in any precise way is almost impossible. The analogy I always think of is that of a simple fisherman (or woman) standing in the sea, casting a small net over a shoal of fish. As the net sinks, the fish dart in every direction and many escape. When it reaches the bottom, a certain amount of debris and seaweed is scooped up together with the catch. Your catch is what you managed to capture at that moment, given the existing circumstances. It isn't perfect, but it's your best shot. Returning to terminology: among your key terms are prize examples that really justify the kind of attention that ETV will expose them to. You will always tend to include borderline terms that are arguably not key terms, which people will puzzle over: "why on earth is this in there?". But on the other hand, you have an excellent chance of exposing problematic terms and dealing with them before they sink the whole project - which even a handful of terms are capable of doing.
