Translation and adaptation issues influencing the normative interpretation of assessment instruments
Cross-cultural adaptation: Translation and Adaptation Issues Influencing the Normative
Interpretation of Assessment Instruments
The adaptation of assessment instruments for new target populations is generally required
when the new target population differs appreciably from the original population with which the
assessment device is used in terms of culture or cultural background, country, and language.
Most cross-cultural adaptations of assessment instruments involve the translation of an
instrument from one language into another (Geisinger, 1994). In some instances, however,
adaptations of assessment instruments are needed even when the language remains the same,
because the culture or life experiences of those speaking the same language differ. Numerous
instruments have been adapted for use with groups that differ experientially from those with
whom the instrument was originally used (Geisinger, 1994).
Example: The Minnesota Multiphase Personality Inventory-Adolescent [MMPI-A], for
example, is an adaptation of the MMPI-2 developed to be used with adolescents, whose life
experiences systematically differ from those of adults. Other measures have been developed with
different versions for men and women, for example, again on the basis of the generally different
life experiences and resultant orientations of men and women (Butcher, 1985; Butcher &
In recent years, the term test adaptation frequently has replaced test translation. This shift
in terminology documents the adaptations in references to culture, in content, and in wording that
are needed in addition to simple translation in revising a test. These changes are needed to meet
the requirements of a circumstance that differs qualitatively from the original use of an
assessment device. Few guidelines regarding test adaptations have been developed, and those
that do exist have not been widely circulated. The International Test Commission, working in
conjunction with the European Association of Psychological Assessment, the International
Association of Applied Psychology, the International Association of Cross-Cultural Psychology,
the International Association for the Evaluation of Educational Achievement, the International
Language Testing Association, and the International Union of Psychological Science, has begun
systematizing the procedures that are recommended in test adaptation (International Test
Commission, 1993, p. 1).
Procedures for Adapting Assessment Measures for New Populations
Anytime a measure is simply used with a population that differs qualitatively from the
one for which it was originally developed, one must check its continued validity and usefulness
in that new population, even if the test itself remains unchanged. When it is likely that a measure
needs to be adapted for use with a new population, the onus is even greater on those adapting the
measure to demonstrate its usefulness (Geisinger, 1994).
Does a Given Measure Need to Be Adapted?
An early decision to be made is whether or not an assessment device actually needs to be
adapted for a new intended use. The answer to this question is generally easy when applied to an
extreme circumstance. When a new target population is not likely to differ appreciably from the
original population with whom the assessment device was used, it is unlikely that the measure
needs to be adjusted to that new population. A measure that has been developed with students
from Pennsylvania State University would not likely need to be adapted if it were to be
administered to students from the University of North Carolina, for example. (However, if it
were found that some items on the instrument used local vernacular or were based on
information known only to residents of central Pennsylvania, then some minor accommodations
would be needed.) At the other extreme, when one wishes to administer a measure to individuals
who cannot communicate in the language in which the measure was originally written, then the
measure obviously must be translated and perhaps otherwise altered. Indeed, such a translation
or adaptation must typically consider cultural as well as language differences between the
original and the target populations. Descriptions of adaptations of the MMPI for Turkey, Hong
Kong, Greece, and Chile may be found in Savasir and Erol (1990),Cheung (1985), Manos
(1985), and Rissetti and Makes (1985), respectively, and serve as excellent examples of both
language and cultural adaptations of that particular measure (Geisinger, 1994).
Steps for Adapting a Measure
The procedures provided later are intended to serve as general guidelines for adapting an
assessment instrument to a new culture and language population. Others (e.g., Bracken &
Barona, 1991; Butcher, in press-a) have provided lists of the steps suggested for the adaptation of
assessment instruments from one language and culture to another. In some cases, all of the steps
provided here would be required. In other situations, either additional steps would be
recommended or some of those steps provided here might not be needed. Furthermore, in some
instances, specific steps would need to be repeated in an iterative fashion. For example, if
validation research failed to yield acceptable evidence supporting the use of the assessment
instrument, the adaptation process might begin anew. Alternatively, minor changes might be
required, followed by field testing, and so forth. Nevertheless, the steps detailed here provide a
test translator with some common approaches to adapting a measure for a new target
population(Mininel et al., 2012).
Cross-cultural adaptation procedures:
The method followed international guidelines for studies of this kind, including the
Evaluation by an
Testing of the
These steps allowed obtaining conceptual, semantic, idiomatic, experiential and
operational equivalences, in addition to content validity (Mininel et al., 2012).
1. Translate and adapt the measure: Sometimes an instrument can be translated or
adapted on a question-by-question basis. At other times, it must be adapted and translated only in
concept. A personality inventory item that asks an adolescent whether he or she prefers going to
watch a movie or to attend a dance will make little sense to an individual from a non-Western,
developing country. The choice presented in the item hence must be adapted culturally to be both
a meaningful decision and one comparable to the item when responded to by members of the
original population. Those translating or adapting an assessment instrument must meet a rigorous
set of requirements. They must be fluent in both languages, extremely knowledgeable about both
cultures, and expert in both the characteristics and the content measured on the instrument and
the uses to which the assessment instrument will be put. Back transition was once the technique
of choice in test translation. In this procedure, an original translation would render items from
the original version of the instrument to the second language, and a second translator—one not
familiar with the instrument—would translate the instrument back into the original language.
This second translation would be compared with the original version of the instrument.
Unfortunately, it has been found that, when translators knew that their work was going to be
subjected to back translation, they would use wording that ensured that a second translation
would faithfully reproduce the original version rather than a translation using the optimal
wording in the target language (Hambleton, 1993).
2. Review the translated or adapted version of the instrument . Rather than perform a
back translation, a more effective technique to ensure that the translation or adaptation has been
conducted appropriately is to use a group of individuals meeting the same criteria as the test
translator (as described in Step 1) to review carefully the quality of the translation or the
adaptation. This editorial review could be accomplished in a group meeting, with individual
reviews by the group members (much like the reviews of journal articles), or through some
combination thereof. Perhaps the most effective strategy would be to have the individuals
composing the panel (a) review the items and react in writing, (b) share their comments with one
another, and (c) meet to consider the points made by each other and to reconcile any differences
3. Adapt the draft instrument on the basis of the comments of the reviewers. The
original test translator or adapter needs to consider the comments made by the panel of experts
described in Step 2. Such deliberations should occur only after the panel has met without the
translator or adapter present to discuss their concerns. The translator or adapter can then meet
with the panel to explain the reasons for drafting the instrument in the manner used. Similarly,
the panel can explain why they reacted to the draft as they did. Through this discursive process,
the final instrument will reflect the best judgment of the entire group (Geisinger, 1994).
4. Pilot test the instrument. A few trial administrations of the assessment device should
be made simply to learn of the potential problems faced by those responding to the assessment
instrument. At this time, a small sample of individuals comparable to the eventual target
population should be identified; administered the assessment device; and then interviewed as to
the understandability of the instructions, acceptability of the time limits, wording of the items,
and so on. The instrument should be changed in light of these early findings (Geisinger, 1994).
5. Field test the instrument. Once pilot testing has been performed and the instrument is
revised accordingly, it is ready to be field tested. At this stage, the instrument is administered to a
large sample representative of the eventual population to be assessed with the instrument. (This
stage can also be performed in several waves, each involving a specific data collection project.)
A number of initial analyses should be performed with the data emerging from this data
collection. Internal consistency reliability analyses should be performed, for example. If
possible, test-retest reliability analyses also should be included in this phase of the instrument
adaptation process. Traditional or item-response theory item analyses should be performed. With
regard to all of the analytic processes described, comparisons should be made with the
corresponding results of the instrument in the original language or culture. It is also possible to
compare performance on individual items, in comparison with data collected with the original
instrument in the country of its origin. Especially useful with cognitive- type assessments, this
set of procedures has been called differential item functioning analyses and was once called item
bias detection methods. (The use of these procedures in cross-cultural testing situations may be
found in O'Brien, 1992, or van de Vijver & Poortinga, 1991. For a brief discussion of some of
the techniques themselves, see Cole & Moss, 1989.)
6. Standardize the scores. If desirable and appropriate, equate them with scores on the
original version. At this time, it is sufficient to state that most instruments that are translated or
adapted for use with a new population, unless they are used purely as research measures,
especially those research measures that are unpublished, use a standard scale such as the T scale
(Mean of 50, standard deviation of 10) to present their scores.
Using data from the field testing, if the sample is large enough (probably at least 750-1,000
participants, depending on how many subgroups in the population need adequate representation),
it probably is possible to perform a standardization. If the sample is not large enough or fully
representative, then collection of a new standardization sample is needed. The analytic
procedures involved in such processes are straightforward and provided in Peterson, Kolen, and
7. Perform validation research as appropriate. Information on the research needed to
document that (a) an assessment device measures the same qualities in both languages or
versions and (b) the new version continues to provide scores that are interpretable in the manner
proposed (Geisinger, 1994).
8. Develop a manual: and other documents for the users of the assessment device. "Test
developers have a responsibility to provide evidence regarding reliability and validity for stated
test purposes, as well as manuals and norms, when appropriate, to guide proper interpretation"
(American Educational Research Association et al., 1985, p. 25). Just adapting a measure for use
in a second setting, no matter how much it might be needed I that setting, is not professionally
acceptable. Documents attesting to the value of using the adapted instrument and describing its
appropriate use are a required part of the adaptation of any measure for professional use
9. Train users. Standard 6.6 of the Standards for Educational and Psychological Testing
(American Educational Research Association et al., 1985) states that responsibility for test use
should be assumed by or delegated only to those individuals who have the training and
experience necessary to handle this responsibility in a professional and technically adequate
manner. Any special qualifications for test administration or interpretation noted in the manual
should be met (p. 42).
10. Collect reactions from users. As one implements the newly revised instrument, it is
frequently propitious to gather comments from those actually using the instrument. Such
comments could guide future changes in the instrument and suggest avenues for needed research.
It is also appropriate to attempt to determine whether the revised assessment device is being
misused or misinterpreted by users (Geisinger, 1994).
What Do Scores on the Adapted Measure Mean?
The interpretation of any score on an assessment device is dependent on many kinds of
information. Clearly, validation (including information on fairness and bias; see Geisinger,
1992a) and reliability information are critical to test interpretation. Normative information
carries meaning to a professional regarding the placement of an individual within a population
distribution of test takers, permitting the professional to interpret the likely meaning of the score.
This interpretative meaning of test scores is especially useful for tests with well-established
validation information and norms (Geisinger, 1994).
Example: Thus, a professional admissions counselor at a university can look at an
individual's scores on the SAT or a similar measure and approximate the probability that the
prospective student will succeed at the university. Similarly, an informed admissions committee
making decisions with regard to the applications for graduate study in psychology can consider
their applicants' Graduate Record Examination scores by using their accumulated (validation)
information on how former applicants have done in graduate study and both their local norms
(whether actually computed or not) and the examination's national test norms. Clinicians use
scores from well-established measures such as the MMPI in much the same way. Experienced
test users can make highly skilled interpretations on the basis of the proper test information,
especially if they are also able to gather the other information that they need to fine-tune their
judgments. Just translating a test and using the same scoring algorithm may not lead to
meaningful scores, even when the understanding of specific scores is well-established in the
original language, nation, or culture. Cultural and other differences between the original and
target populations as well as language differences across the two forms of the assessment device
may render the use of the same scores to denote identical meaning across test forms and cultures
substantially less meaningful (Lu, 1967, as quoted by Cheung, 1985, p. 131).
American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education (1985). Standards for educational and
psychological testing. Washington,DC: American Psychological Association.
American Psychological Association (Producer). (1974-1992). Psychological abstracts. (From
PsycLIT, Version 3.11 [CD-ROM]. Washington, DC: [Producer]. Boston: Silver Platter
Angoff, W. H. (1971). Scales, norms and equivalent scores. In R. L. Thorndike (Ed.),
Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on
Ben-Porath, Y. S. (1990). Cross-cultural assessment of personality: The case for replicatory
factor analysis. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality
assessment (Vol. 8, pp. 1-26). Hillsdale, NJ: Erlbaum.
Butcher, J. N. (1985). Current developments in MMPI use: An international perspective. In J. N.
Butcher & C. D. Spielberger (Eds.), Advances in personality assessment (Vol. 4, pp. 83-
94). Hillsdale, NJ: Erlbaum. Butcher, J. N.
Cheung, F. M. (1985). Cross-cultural considerations for the translation and adaptation of the
Chinese MMPI in Hong Kong. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in
personality assessment (Vol. 4, pp. 131-158). Hillsdale, NJ: Erlbaum.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation issues
influencing the normative interpretation of assessment instruments. Psychological
assessment, 6(4), 304.
International Test Commission, European Association of Psychological Assessment,
International Association of Applied Psychology, International Association of Cross-
Cultural Psychology, International Association for the Evaluation of Educational
Achievement, International Language Testing Association, & International Union of
Psychological Science. (1993). Standards for adapting instruments and establishing score
equivalence. Manuscript in preparation.
Mininel, V. A., Felli, V. E. A., Loisel, P., & Marziale, M. H. P. (2012). Cross-cultural adaptation
of the Work Disability Diagnosis Interview (WoDDI) for the Brazilian context. Revista
Latino-Americana de Enfermagem, 20, 27-34.
Pennock-Roman, M. (1990). Test validity and language background: A study of Hispanic-
American students at six universities. New York: College Entrance Examination Board.
Pennock-Roman, M. (1992). Interpreting test performance in selective admissions for Hispanic
students. In K. F. Geisinger (Ed.), Psychological testing of Hispanics (pp. 99-135).
Washington, DC: American Psychological Association.