Statisticians Can’t Analyze Text
An Introduction to Coding in the Context of Clinical Research Data Management
INTRODUCTION
It is often said that textual data can’t be analyzed, but this is untrue.
Statisticians can provide summary statistics and categorical analysis on
text-based data. However, this can only be successful if the data is properly
prepared. This poster considers the practice of coding clinical research data
using controlled terminologies and externally controlled data dictionaries such
as MedDRA and the WHO Drug Dictionary.

Terminology
A finite, enumerated set of terms intended to convey information
unambiguously. Code lists are good examples of simple controlled
terminologies.

Controlled Terminology
A controlled terminology is a dictionary of terms that is managed to ensure the
terms it contains are appropriate for their intended purpose.

Controlled terminologies may be either internal (managed within the user’s
own organization) or external (managed by an external organization).

Coding Using a Controlled Terminology
Using a controlled terminology, data recorded on a data collection form
(verbatim text) is allocated a ‘reported term’ or synonym. The reported term
translates to a single ‘preferred term’ or code. In a more complex terminology
the Preferred Term maps to superior terms within a hierarchy. This allows data
to be summarised, analysed and reported based on its position within the
hierarchy.

MedDRA
MedDRA (the Medical Dictionary for Regulatory Activities) is a pragmatic, medically
valid terminology with an emphasis on ease of use for data entry, retrieval, analysis, and
display, as well as a suitable balance between sensitivity and specificity within the
regulatory environment.

Using MedDRA, reported terms are assigned Low Level Terms (LLTs). Each LLT maps to
a single Preferred Term (PT). Each PT maps, through the MedDRA hierarchy, to one or
more System Organ Classes (SOCs). For ease of reporting, each PT is also assigned a
single Primary SOC.

MedDRA is maintained by a team of medical experts and is mainly used to analyse and
report symptoms, diagnoses and indications based on their SOC allocation. For
example, using MedDRA it is a simple task to extract a list of study subjects experiencing
neurological symptoms. Without coding, this would be impractical in a large study
because of the lack of specificity in the verbatim adverse event reports.

Some examples:

Verbatim Term       Preferred Term            System Organ Class(es)
Pain in chest       Chest Pain                Cardiac disorders
                                              Respiratory, thoracic and mediastinal disorders
                                              General disorders and administration site conditions
Bladder infection   Urinary tract infection   Renal and urinary disorders
                                              Infections and infestations
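
The route from verbatim term to Preferred Term to System Organ Class can be sketched in a few lines of Python. This is only an illustration built from the two example rows on this poster: the lookup tables, the function names and the subject IDs are assumptions made for the sketch, the LLT step is collapsed into a direct verbatim-to-PT lookup, and nothing here reflects the layout of a licensed MedDRA release.

```python
# Illustrative coding tables taken from the two example rows on the poster.
# Hypothetical structure: a real MedDRA release is licensed and far larger.
VERBATIM_TO_PT = {
    "pain in chest": "Chest Pain",
    "bladder infection": "Urinary tract infection",
}

PT_TO_SOCS = {
    "Chest Pain": [
        "Cardiac disorders",
        "Respiratory, thoracic and mediastinal disorders",
        "General disorders and administration site conditions",
    ],
    "Urinary tract infection": [
        "Renal and urinary disorders",
        "Infections and infestations",
    ],
}


def code_event(verbatim):
    """Return (preferred_term, socs), or None if the text is not in the table."""
    pt = VERBATIM_TO_PT.get(verbatim.strip().lower())
    if pt is None:
        return None  # uncoded text would go to a human coder
    return pt, PT_TO_SOCS[pt]


def subjects_in_soc(events, soc):
    """List subjects with at least one coded event in the given SOC."""
    found = set()
    for subject_id, verbatim in events:
        coded = code_event(verbatim)
        if coded is not None and soc in coded[1]:
            found.add(subject_id)
    return sorted(found)


# Hypothetical adverse event records: (subject ID, verbatim text).
events = [
    ("S001", "Pain in chest"),
    ("S002", "Bladder infection"),
    ("S003", "pain in chest "),
    ("S004", "Felt unwell"),
]

print(subjects_in_soc(events, "Cardiac disorders"))
```

Once every event carries a coded term, a question such as "which subjects had an event in this SOC" becomes a simple lookup. Text that is not in the table is returned as uncoded rather than guessed at.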


CDISC
The Clinical Data Interchange Standards Consortium publishes a list of
controlled terminologies for use in clinical trials. ‘Age Span’ is a simple example
of a CDISC terminology that can be used to categorize patients by their age.

Age                   Term
PRETERM               PRETERM NEWBORN INFANT
IN UTERO              IN UTERO
0 - 27 DAYS           NEWBORN
28 DAYS - 23 MONTHS   INFANT
2 - 11 YEARS          CHILD
12 - 17 YEARS         ADOLESCENT
18 - 65 YEARS         ADULT
> 65 YEARS            ELDERLY

WHO-DDE
The WHO Drug Dictionary (Enhanced) is the world’s most comprehensive dictionary of
medicinal product information. It is used by pharmaceutical companies, clinical research
organisations and drug regulatory authorities for identifying drug names, their active
ingredients and therapeutic use.

The dictionary links medicinal product names to the manufacturer, country, ingredients
and Anatomical Therapeutic Chemical code (ATC).

Typically the WHO-DDE is used to code concomitant medications in order to identify
excluded medications and possible drug interactions.

Each medication is allocated a number of ATC codes based on its chemical constituents,
therapeutic action and prescribing practice. For analysis a single, primary, ATC code is
often selected based on the indication for which the medication has been prescribed.


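Selecting a single primary ATC code from the indication, as described in the WHO-DDE section, might look like the following sketch. The three ATC codes listed for acetylsalicylic acid are real ATC classifications, but the lookup tables, the indication-to-group convention and the function name are assumptions made for illustration; they are not the structure or content of the WHO-DDE itself.

```python
# Candidate ATC codes for one medication (acetylsalicylic acid is classified
# under all three). The dictionary layout is hypothetical, not the WHO-DDE's.
CANDIDATE_ATC = {
    "acetylsalicylic acid": ["A01AD05", "B01AC06", "N02BA01"],
}

# Hypothetical study convention: which ATC therapeutic subgroup (first three
# characters of the code) should be primary for a recorded indication.
INDICATION_TO_ATC_GROUP = {
    "pain": "N02",               # analgesics
    "headache": "N02",
    "stroke prevention": "B01",  # antithrombotic agents
}


def primary_atc(medication, indication):
    """Pick the one candidate code matching the indication, else None."""
    codes = CANDIDATE_ATC.get(medication.strip().lower(), [])
    group = INDICATION_TO_ATC_GROUP.get(indication.strip().lower())
    if group is None:
        return None  # unknown indication: leave for manual review
    matches = [code for code in codes if code.startswith(group)]
    # Only accept an unambiguous match; anything else needs a human coder.
    return matches[0] if len(matches) == 1 else None


print(primary_atc("Acetylsalicylic acid", "Headache"))
print(primary_atc("Acetylsalicylic acid", "Stroke prevention"))
```

Returning None for an unknown or ambiguous case keeps the choice of primary code with a human reviewer. In practice that choice depends on clinical judgement the sketch does not attempt to model.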
AVAILABLE TERMINOLOGIES
CDISC terminologies are available free of charge from the CDISC web site and
can be obtained from CRIC.

The Medical Dictionary for Regulatory Activities (MedDRA) is licensed to the
University of Alberta and is available from CRIC free of charge.

The WHO Drug Dictionary (WHO-DDE) is a commercial product which can be
purchased from the WHO Monitoring Centre, Uppsala, Sweden. However, its
cost may be prohibitive for smaller studies without significant funding.

Health Canada’s Drug Product Database is available free of charge. This
product categorises medications based on ingredients, ATC and AHFS
classification. It may therefore be a suitable alternative to the WHO-DDE for
coding Canadian medications.

CRIC has a coding application that can be used in conjunction with these
terminologies to code your data.

INTERNALLY CONTROLLED TERMINOLOGY
Coding Lab Data
In this example a Case Report Form requires an investigator to report all haematology
results that are outside normal range. The data cannot be analysed without coding
because of variations in the verbatim text.

Verbatim Text                Synonym             Preferred Term
Elevated red blood cells     Red blood cells     RBC
Patient had high RBC count   RBC                 RBC
High erythrocytes            Erythrocytes        RBC
White blood cells            WBC                 WBC
White cell count             White blood cells   WBC
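
The lab data example can be expressed as a two-step lookup: verbatim text to synonym, then synonym to preferred term. The sketch below uses only the five rows from the table. The dictionary and function names are illustrative assumptions and are not taken from the CRIC coding application. The verbatim-to-synonym step is modelled as a table of decisions already made by a coder, since the poster does not describe an automatic matching rule.

```python
import re
from collections import Counter

# Step 1: synonym allocated to each verbatim text (decisions made by a coder).
VERBATIM_TO_SYNONYM = {
    "elevated red blood cells": "Red blood cells",
    "patient had high rbc count": "RBC",
    "high erythrocytes": "Erythrocytes",
    "white blood cells": "WBC",
    "white cell count": "White blood cells",
}

# Step 2: each synonym translates to exactly one preferred term.
SYNONYM_TO_PREFERRED = {
    "Red blood cells": "RBC",
    "RBC": "RBC",
    "Erythrocytes": "RBC",
    "WBC": "WBC",
    "White blood cells": "WBC",
}


def normalise(text):
    """Collapse whitespace and case so trivial variations still match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def code_lab_text(verbatim):
    """Return the preferred term, or None if the text has not been coded."""
    synonym = VERBATIM_TO_SYNONYM.get(normalise(verbatim))
    if synonym is None:
        return None
    return SYNONYM_TO_PREFERRED[synonym]


reported = [
    "Elevated red blood cells",
    "Patient had high RBC count",
    "High erythrocytes",
    "White blood cells",
    "White  cell count",
]

# After coding, the free text can be tabulated like any categorical variable.
counts = Counter(code_lab_text(text) for text in reported)
print(counts["RBC"], counts["WBC"])
```

Five different verbatim strings reduce to two categories that can be counted and compared. Without coding, the same data would appear as five unrelated values.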
Rick Watts 2008
