Statisticians can analyze text by properly preparing textual data through coding using controlled terminologies like MedDRA and the WHO Drug Dictionary. MedDRA is a medical terminology used to code adverse event reports from clinical trials into standardized terms and organize them into system organ classes to facilitate analysis. The WHO Drug Dictionary codes medications by generic name, ingredients, and ATC code to identify drug interactions and therapeutic uses. Coding clinical research data with controlled terminologies allows the data to be summarized, analyzed, and reported based on coding hierarchies and categories.
Statisticians Can’t Analyze Text
An Introduction to Coding in the Context of Clinical Research Data Management
INTRODUCTION
It is often said that textual data can’t be analyzed, but this is untrue. Statisticians can provide summary statistics and categorical analysis on text-based data. However, this can only be successful if the data is properly prepared. This poster considers the practice of coding clinical research data using controlled terminologies and externally controlled data dictionaries such as MedDRA and the WHO Drug Dictionary.

Terminology
A finite, enumerated set of terms intended to convey information unambiguously. Code lists are good examples of simple controlled terminologies.

Controlled Terminology
A controlled terminology is a dictionary of terms that is managed to ensure the terms it contains are appropriate for their intended purpose.

Controlled terminologies may be either internal – managed within the user’s own organization, or external – managed by an external organization.

Coding Using a Controlled Terminology
Using a controlled terminology, data recorded on a data collection form (verbatim text) is allocated a ‘reported term’ or synonym. The reported term translates to a single ‘preferred term’ or code. In a more complex terminology the Preferred Term maps to superior terms within a hierarchy. This allows data to be summarised, analysed and reported based on its position within the hierarchy.

MedDRA
MedDRA - the Medical Dictionary for Regulatory Activities - is a pragmatic, medically valid terminology with an emphasis on ease of use for data entry, retrieval, analysis, and display, as well as a suitable balance between sensitivity and specificity within the regulatory environment.

Using MedDRA, reported terms are assigned Low Level Terms (LLTs). Each LLT maps to a single Preferred Term (PT). Each PT maps, through the MedDRA hierarchy, to one or more System Organ Classes (SOCs). For ease of reporting each PT is also assigned a single Primary SOC.

MedDRA is maintained by a team of medical experts and is mainly used to analyse and report symptoms, diagnoses and indications based on their SOC allocation. For example, using MedDRA it is a simple task to extract a list of study subjects experiencing neurological symptoms. In a large study it would be impractical to do this from the verbatim adverse event reports alone, because they lack specificity.

Some examples:

Verbatim Term       Preferred Term            System Organ Class(es)
Pain in chest       Chest Pain                Cardiac disorders
                                              Respiratory, thoracic and mediastinal disorders
                                              General disorders and administration site conditions
Bladder infection   Urinary tract infection   Renal and urinary disorders
                                              Infections and infestations
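
The verbatim term to PT to SOC path described above can be sketched in Python. This is only an illustration under stated assumptions: the dictionaries hold just the two example rows from the table, the function names (`code_term`, `subjects_in_soc`) and the subject IDs are hypothetical, and treating the first listed SOC as the primary SOC is an assumption made for this sketch. Real coding would use the licensed MedDRA files.

```python
# Illustrative subset only; a real study would load the licensed MedDRA files.
LLT_TO_PT = {
    "pain in chest": "Chest Pain",
    "bladder infection": "Urinary tract infection",
}

# Each PT maps to one or more SOCs. The first entry is treated as the
# primary SOC (an assumption made for this sketch).
PT_TO_SOCS = {
    "Chest Pain": [
        "General disorders and administration site conditions",
        "Cardiac disorders",
        "Respiratory, thoracic and mediastinal disorders",
    ],
    "Urinary tract infection": [
        "Infections and infestations",
        "Renal and urinary disorders",
    ],
}


def code_term(verbatim):
    """Return (PT, primary SOC, all SOCs) for a verbatim term, or None if uncoded."""
    pt = LLT_TO_PT.get(verbatim.strip().lower())
    if pt is None:
        return None
    socs = PT_TO_SOCS[pt]
    return pt, socs[0], socs


def subjects_in_soc(events, soc):
    """List subjects with at least one coded event that falls in the given SOC."""
    found = set()
    for subject_id, verbatim in events:
        coded = code_term(verbatim)
        if coded is not None and soc in coded[2]:
            found.add(subject_id)
    return sorted(found)


# Hypothetical adverse event records: (subject ID, verbatim text).
events = [
    ("S001", "Pain in chest"),
    ("S002", "bladder infection"),
    ("S003", "Felt unwell"),  # not in the dictionary, so it stays uncoded
]
print(subjects_in_soc(events, "Cardiac disorders"))
```

Because the lookup goes through the PT, any verbatim spelling that a coder has mapped to the same PT is picked up by the same SOC query, which is what makes the extraction practical in a large study.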
CDISC
The Clinical Data Interchange Standards Consortium publishes a list of controlled terminologies for use in clinical trials. ‘Age Span’ is a simple example of a CDISC terminology that can be used to categorize patients by their age.

Age                   Term
PRETERM               PRETERM NEWBORN INFANT
IN UTERO              IN UTERO
0 - 27 DAYS           NEWBORN
28 DAYS - 23 MONTHS   INFANT
2 - 11 YEARS          CHILD
12 - 17 YEARS         ADOLESCENT
18 - 65 YEARS         ADULT
> 65 YEARS            ELDERLY

WHO-DDE
The WHO Drug Dictionary (Enhanced) is the world’s most comprehensive dictionary of medicinal product information. It is used by pharmaceutical companies, clinical research organisations and drug regulatory authorities for identifying drug names, their active ingredients and therapeutic use.

The dictionary links medicinal product names to the manufacturer, country, ingredients and Anatomical Therapeutic Chemical code (ATC).

Typically the WHO-DDE is used to code concomitant medications in order to identify excluded medications and possible drug interactions.

Each medication is allocated a number of ATC codes based on its chemical constituents, therapeutic action and prescribing practice. For analysis a single, primary, ATC code is often selected based on the indication for which the medication has been prescribed.
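
Selecting a primary ATC code by indication, as described for the WHO-DDE, might look like the following Python sketch. The lookup table is a hand-written illustration, not a WHO-DDE extract: the indication labels and the function names (`primary_atc`, `is_excluded`) are hypothetical, and only two of the ATC codes for acetylsalicylic acid are shown.

```python
# Illustrative data only: ATC codes for one medication, keyed by a
# hypothetical indication label. This is not a WHO-DDE extract.
ATC_BY_INDICATION = {
    "acetylsalicylic acid": {
        "pain": "N02BA01",                   # analgesic use
        "thrombosis prevention": "B01AC06",  # antithrombotic use
    },
}


def primary_atc(medication, indication):
    """Pick a single ATC code for analysis based on the recorded indication."""
    codes = ATC_BY_INDICATION.get(medication.strip().lower())
    if codes is None:
        return None  # medication not in the dictionary: needs manual review
    return codes.get(indication.strip().lower())  # None if indication is unknown


def is_excluded(medication, indication, excluded_prefixes):
    """Flag a medication whose primary ATC code starts with an excluded prefix."""
    code = primary_atc(medication, indication)
    return code is not None and code.startswith(tuple(excluded_prefixes))


# Hypothetical protocol rule: antithrombotic agents (ATC group B01) are excluded.
print(primary_atc("Acetylsalicylic acid", "Pain"))
print(is_excluded("Acetylsalicylic acid", "thrombosis prevention", ["B01"]))
```

Matching on an ATC prefix rather than on a product name lets one rule cover every product in a therapeutic group, whatever name was written on the form.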
AVAILABLE TERMINOLOGIES
CDISC terminologies are available free of charge from the CDISC web site and can be obtained from CRIC.

The Medical Dictionary for Regulatory Activities (MedDRA) is licensed to the University of Alberta and is available from CRIC free of charge.

The WHO Drug Dictionary (WHO-DDE) is a commercial product which can be purchased from the WHO Monitoring Centre, Uppsala, Sweden. However its cost may be prohibitive for smaller studies without significant funding.

Health Canada’s Drug Product Database is available free of charge. This product categorises medications based on ingredients, ATC and AHFS classification. It may therefore be a suitable alternative to the WHO-DDE for coding Canadian medications.

CRIC has a coding application that can be used in conjunction with these terminologies to code your data.

INTERNALLY CONTROLLED TERMINOLOGY
Coding Lab Data
In this example a Case Report Form requires an investigator to report all haematology results that are outside normal range. The data cannot be analysed without coding because of variations in the verbatim text.

Verbatim Text                Synonym             Preferred Term
Elevated red blood cells     Red blood cells     RBC
Patient had high RBC count   RBC                 RBC
High erythrocytes            Erythrocytes        RBC
White blood cells            WBC                 WBC
White cell count             White blood cells   WBC
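
The two-step lookup in the lab data example (verbatim text to synonym, synonym to preferred term) can be sketched in Python as follows. This is a minimal illustration rather than the CRIC coding application: the mappings are copied from the example table, the function names (`code_verbatim`, `summarise`) and the "UNCODED" label are hypothetical, and the verbatim-to-synonym step stands in for the allocation a human coder would make.

```python
from collections import Counter

# Step 1: verbatim text -> synonym, as allocated by a coder (from the table).
VERBATIM_TO_SYNONYM = {
    "elevated red blood cells": "Red blood cells",
    "patient had high rbc count": "RBC",
    "high erythrocytes": "Erythrocytes",
    "white blood cells": "WBC",
    "white cell count": "White blood cells",
}

# Step 2: synonym -> single preferred term.
SYNONYM_TO_PT = {
    "Red blood cells": "RBC",
    "RBC": "RBC",
    "Erythrocytes": "RBC",
    "WBC": "WBC",
    "White blood cells": "WBC",
}


def code_verbatim(verbatim):
    """Return the preferred term for a verbatim entry, or None if it is uncoded."""
    key = " ".join(verbatim.lower().split())  # normalise case and spacing
    synonym = VERBATIM_TO_SYNONYM.get(key)
    if synonym is None:
        return None
    return SYNONYM_TO_PT[synonym]


def summarise(verbatims):
    """Count entries per preferred term; uncoded text is reported separately."""
    return Counter(code_verbatim(v) or "UNCODED" for v in verbatims)


reports = [
    "Elevated red blood cells",
    "Patient had high RBC count",
    "High erythrocytes",
    "White blood cells",
    "White cell count",
    "Low platelets",  # hypothetical entry with no synonym yet
]
print(summarise(reports))
```

Once every verbatim entry resolves to a preferred term, the five differently worded reports reduce to two analysable categories, and anything left uncoded is visible for manual review.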
Rick Watts 2008