A Method for Analyzing Commonalities in Clinical Trial Target Populations
Zhe (Henry) He¹, Simona Carini², Tianyong Hao¹, Ida Sim², and Chunhua Weng¹
¹ Department of Biomedical Informatics, Columbia University
² Department of Medicine, University of California, San Francisco
AMIA 2014 Annual Symposium
November 18, 2014
Disclosure 
• All the authors disclose that they have no relationships with 
commercial interests. 
Learning Objective 
After this session, the learners should be aware of a 
database called COMPACT and its use for analyzing 
common eligibility features for related clinical trials 
Motivation
• Observation 1: Clinical studies are often criticized for their lack of generalizability
[Figure: patients enrolled in studies on dementia vs. demented patients in the general population (Schoenmaker et al. 2004)]
• Observation 2: Many clinical trials on the same condition use similar or identical eligibility criteria (Hao et al. 2014)
1. Schoenmaker N, Van Gool WA. The age gap between patients in clinical studies and in the general population: a pitfall for dementia research. Lancet Neurol. 2004;3(10):627-30.
2. Hao T, Rusanov A, Boland MR, Weng C. Clustering clinical trials with similar eligibility criteria features. J Biomed Inform. 2014;pii: S1532-0464(14)00011-2.
Motivation (cont.) 
• Make transparent the design pattern of research eligibility criteria
• Tradeoffs between internal validity and external validity (generalizability)
• Opportunity: The official trial registry ClinicalTrials.gov
  • 170,000+ summaries of studies in 180+ countries
Our long-term research agenda to analyze commonalities in target populations
• Eligibility criteria text
  e.g., “Diabetic with HbA1c > 7.0% after injection of insulin”
• Discrete data elements
  e.g., Data elements: “diabetic”, “insulin”, “HbA1c” with its property “>7.0%”
• Semantic equivalence between eligibility features
  e.g., “HbA1c > 7.0%” ≡ “diabetic”
• Contextual and temporal information of features
  e.g., HbA1c > 7.0% after insulin
Innovation
• Unifying names and units
• Example expressions unified in the database:
  HbA1c <= 10.0
  HbA1c >= 7.0
  A1c > 7.0
  Hemoglobin A1c < 11.0
  HbA1c > 11.0
  A1c less than 6.5
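The idea of unifying feature names can be illustrated with a small sketch. This is not the authors' implementation: the synonym table, the operator table, and the regular expression below are all hypothetical, and unit conversion is left out entirely.

```python
import re

# Hypothetical synonym table: surface names mapped to one canonical feature name.
NAME_SYNONYMS = {
    "hba1c": "HbA1c",
    "a1c": "HbA1c",
    "hemoglobin a1c": "HbA1c",
}

# Hypothetical mapping from worded comparators to symbols.
WORD_OPERATORS = {
    "less than": "<",
    "greater than": ">",
}

# name, comparator (symbol or words), numeric value, optional percent sign
PATTERN = re.compile(
    r"^(?P<name>[A-Za-z0-9 ]+?)\s*"
    r"(?P<op><=|>=|<|>|less than|greater than)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*%?$",
    re.IGNORECASE,
)


def normalize(expression):
    """Return (canonical_name, operator, value), or None if not recognized."""
    match = PATTERN.match(expression.strip())
    if match is None:
        return None
    name = NAME_SYNONYMS.get(match["name"].strip().lower())
    if name is None:
        return None  # feature name is not in the (hypothetical) synonym table
    op = WORD_OPERATORS.get(match["op"].lower(), match["op"])
    return (name, op, float(match["value"]))


if __name__ == "__main__":
    for text in ["HbA1c <= 10.0", "A1c > 7.0", "Hemoglobin A1c < 11.0", "A1c less than 6.5"]:
        print(text, "->", normalize(text))
```

All four example expressions map to the same canonical name, so they can be compared and counted as one feature.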
Methods I 
COMPACT (Commonalities in Target Populations in Clinical Trials) 
• Metadata of clinical trials, e.g., condition, study design, phase 
• Structured eligibility criteria 
• Common eligibility features 
• Numeric, e.g., BMI, blood pressure 
• Categorical, e.g., gravidity, lung cancer 
• Properties of numeric features by disease domain
Methods II 
Pipeline of Constructing the COMPACT Database 
Methods III
Extracting Metadata of Trials
• Downloaded 159,891 trial summaries from ClinicalTrials.gov
• Excluded 704 trials with no or non-informative eligibility criteria text
• Extracted structured metadata of trials
• Retrieved NCT IDs for each condition in a list on ClinicalTrials.gov
• Consolidated synonyms of conditions, e.g., “heart attack” and “myocardial infarction”
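The synonym consolidation step could be sketched as follows. The synonym table and the trial identifiers are hypothetical placeholders; how the authors actually merged condition names is not described on the slide.

```python
# Hypothetical synonym table; only the pair named on the slide is included.
CONDITION_SYNONYMS = {
    "heart attack": "myocardial infarction",
}


def canonical_condition(name):
    """Map a condition name to its canonical form (lower-cased)."""
    key = name.strip().lower()
    return CONDITION_SYNONYMS.get(key, key)


def index_trials_by_condition(condition_to_trial_ids):
    """Merge the trial ID lists of synonymous conditions into one set each."""
    merged = {}
    for condition, trial_ids in condition_to_trial_ids.items():
        merged.setdefault(canonical_condition(condition), set()).update(trial_ids)
    return merged


if __name__ == "__main__":
    # TRIAL_A .. TRIAL_C are placeholders, not real NCT IDs.
    raw = {
        "Heart Attack": ["TRIAL_A", "TRIAL_B"],
        "Myocardial Infarction": ["TRIAL_B", "TRIAL_C"],
    }
    print(index_trials_by_condition(raw))
```

Using sets means a trial listed under both names is counted once for the consolidated condition.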
Methods IV
Extracting Categorical Features
• Unsupervised tag mining of eligibility criteria text (Miotto et al. 2013)
• N-grams in eligibility criteria text that
  • Completely or partially match a UMLS concept assigned one of 27 semantic types relevant to the clinical study domain
  • Are normalized to a Concept Unique Identifier (CUI) of the UMLS
  • Appear in at least 5% of trials on the same condition
• Excluded concepts of the semantic type “Body Part, Organ, or Organ Component”, e.g., eye, ear
1. Miotto R, Weng C. Unsupervised mining of frequent tags for clinical eligibility text indexing. J Biomed Inform. 2013;46(6):1145-51.
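The 5% frequency criterion can be illustrated with a short sketch. It assumes the concept matching has already been done and that each trial is represented by a set of concept identifiers; the function name and the placeholder identifiers are hypothetical, and this is not the tag mining algorithm of Miotto et al.

```python
from collections import Counter


def frequent_features(trial_concepts, min_fraction=0.05):
    """Keep concepts that occur in at least `min_fraction` of the trials.

    trial_concepts: dict mapping a trial ID to the set of concept IDs
    found in that trial's eligibility criteria (one condition only).
    Returns a dict mapping each kept concept to its trial count.
    """
    n_trials = len(trial_concepts)
    if n_trials == 0:
        return {}
    counts = Counter()
    for concepts in trial_concepts.values():
        counts.update(set(concepts))  # count each concept once per trial
    return {c: n for c, n in counts.items() if n / n_trials >= min_fraction}


if __name__ == "__main__":
    # 40 toy trials with placeholder concept IDs (not real UMLS CUIs).
    trials = {f"TRIAL_{i}": {"CONCEPT_COMMON"} for i in range(40)}
    trials["TRIAL_0"].add("CONCEPT_RARE")          # 1 of 40 trials = 2.5%
    for i in range(4):
        trials[f"TRIAL_{i}"].add("CONCEPT_MEDIUM")  # 4 of 40 trials = 10%
    print(frequent_features(trials))
```

The threshold is applied per condition, so the same concept may count as frequent for one condition and not for another.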
Methods V
Parsing Numeric Features
• Inclusion criterion: “HbA1c value between 7.5% and 11%”
• Numeric expressions extracted by Valx (Hao et al. 2013):
  ['HbA1c', '>=', 7.5, '%'] ['HbA1c', '<=', 11.0, '%']
• Properties of the numeric feature “HbA1c”:
  • Value range: [7.5, 11.0]
  • Boundary values: 7.5 and 11.0
• Extracted 1,045,893 numeric expressions in total
1. Hao T, Weng C. Valx - Numeric Expression Extraction and Normalization Tool. 2013. Available from: http://columbiaelixr.appspot.com/valx.
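How a value range and its boundary values might be derived from extracted expressions is sketched below. The input follows the [name, operator, value, unit] shape shown on the slide; the functions themselves are hypothetical and are not part of Valx.

```python
import math


def value_range(expressions):
    """Combine expressions for one feature into the tightest interval.

    Returns (lower, lower_closed, upper, upper_closed).
    """
    lower, lower_closed = -math.inf, False
    upper, upper_closed = math.inf, False
    for _name, op, value, _unit in expressions:
        if op in (">", ">="):
            closed = op == ">="
            # A larger lower bound is tighter; at a tie, the strict bound wins.
            if value > lower or (value == lower and not closed):
                lower, lower_closed = value, closed
        elif op in ("<", "<="):
            closed = op == "<="
            # A smaller upper bound is tighter; at a tie, the strict bound wins.
            if value < upper or (value == upper and not closed):
                upper, upper_closed = value, closed
    return lower, lower_closed, upper, upper_closed


def format_range(lower, lower_closed, upper, upper_closed):
    """Render an interval in the bracket notation used on the slides."""
    left = "[" if lower_closed else "("
    right = "]" if upper_closed else ")"
    lo = "-∞" if lower == -math.inf else f"{lower:.1f}"
    hi = "+∞" if upper == math.inf else f"{upper:.1f}"
    return f"{left}{lo}, {hi}{right}"


def boundary_values(expressions):
    """Return the sorted, de-duplicated threshold values of the expressions."""
    return sorted({value for _name, _op, value, _unit in expressions})


if __name__ == "__main__":
    exprs = [["HbA1c", ">=", 7.5, "%"], ["HbA1c", "<=", 11.0, "%"]]
    print(format_range(*value_range(exprs)))
    print(boundary_values(exprs))
```

A criterion with only one threshold, such as “HbA1c > 7.0%”, yields a half-open range with an infinite bound on the other side.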
Data Storage
Trial summaries are stored in COMPACT as:
• Metadata
  NCT02097342: Type 2 diabetes, phase 4, …
  NCT01784848: hypertension, phase 3, …
  …
• Categorical features
  NCT02097342: metformin, diabetes
  NCT01784848: hepatic, smoker, …
  …
• Numeric features
  NCT02097342: HbA1c < 7.5%, BMI >= 20 kg/m2
  NCT00859300: Age > 18, …
  …
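The three kinds of stored data could be laid out as in the sketch below, using an in-memory SQLite database. The table and column names are hypothetical and are not the actual COMPACT schema; the rows are the example entries from the slide.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE trial (nct_id TEXT PRIMARY KEY, condition_name TEXT, phase TEXT);
        CREATE TABLE categorical_feature (nct_id TEXT, feature TEXT);
        CREATE TABLE numeric_feature (
            nct_id TEXT, feature TEXT, operator TEXT, value REAL, unit TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO trial VALUES (?, ?, ?)",
        [
            ("NCT02097342", "Type 2 diabetes", "phase 4"),
            ("NCT01784848", "hypertension", "phase 3"),
        ],
    )
    conn.executemany(
        "INSERT INTO categorical_feature VALUES (?, ?)",
        [
            ("NCT02097342", "metformin"),
            ("NCT02097342", "diabetes"),
            ("NCT01784848", "hepatic"),
            ("NCT01784848", "smoker"),
        ],
    )
    conn.executemany(
        "INSERT INTO numeric_feature VALUES (?, ?, ?, ?, ?)",
        [
            ("NCT02097342", "HbA1c", "<", 7.5, "%"),
            ("NCT02097342", "BMI", ">=", 20.0, "kg/m2"),
            ("NCT00859300", "Age", ">", 18.0, None),  # no unit given on the slide
        ],
    )
    return conn


def trials_with_numeric_feature(conn, feature):
    """Return the IDs of trials that use the given numeric feature."""
    rows = conn.execute(
        "SELECT DISTINCT nct_id FROM numeric_feature WHERE feature = ? ORDER BY nct_id",
        (feature,),
    )
    return [nct_id for (nct_id,) in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(trials_with_numeric_feature(db, "HbA1c"))
```

Keeping numeric features as separate operator, value, and unit columns is what makes later aggregation over thresholds straightforward.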
Database Schema of COMPACT 
What are the commonalities in Type 2 
diabetes trials?
Results I
Frequent Categorical Features
Type 2 diabetes trials that recruit patients with HbA1c >= 7.0%

Categorical features used in the inclusion criteria
Categorical feature                        Semantic group       # Trials   Perc.
diabetes mellitus non-insulin-dependent    Disorders            520        74.3%
sulfonylurea compounds                     Chemicals & Drugs    118        16.9%
antidiabetics                              Chemicals & Drugs    94         13.4%
pharmacologic substance                    Chemicals & Drugs    91         13.0%
contraceptive methods                      Procedures           83         11.9%

Categorical features used in the exclusion criteria
Categorical feature                        Semantic group       # Trials   Perc.
diabetes mellitus insulin-dependent        Disorders            236        33.7%
pharmacologic substance                    Chemicals & Drugs    229        32.7%
allergy severity - severe                  Disorders            224        32.0%
gravidity                                  Disorders            223        31.9%
malignant neoplasm                         Disorders            190        27.1%
Results II
Frequent Numeric Features

Numeric features used in the inclusion criteria
Numeric feature             Semantic group       # Trials   Perc.
HbA1c                       Physiology           663        94.7%
BMI                         Physiology           370        52.8%
Age                         Physiology           327        46.7%
Glucose                     Chemicals & Drugs    114        15.9%

Numeric features used in the exclusion criteria
Numeric feature             Semantic group       # Trials   Perc.
Creatinine                  Chemicals & Drugs    114        16.3%
Systolic blood pressure     Physiology           85         12.1%
Diastolic blood pressure    Physiology           84         12.0%
ALT                         Chemicals & Drugs    73         10.4%

• HbA1c (glycohemoglobin): average blood sugar level
• BMI (body mass index): weight (kg) / (height (m))^2
Results III
Top Five Collective Value Ranges

HbA1c value range    Number of trials
[7.0, 10.0]          228
(7.0, +∞)            97
(-∞, 7.0]            88
[7.0, 11.0]          75
[7.0, +∞)            57

BMI value range      Number of trials
(-∞, 45.0]           113
(-∞, 40.0]           104
[25.0, 40.0]         72
(-∞, 40.0)           61
(-∞, 35.0]           58
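A ranking like the one above can be produced by counting how many trials share each value range. The sketch below assumes each trial has already been reduced to one range string per feature; the function name and the toy trial IDs are hypothetical.

```python
from collections import Counter


def top_value_ranges(trial_ranges, k=5):
    """Return the k most frequent value ranges with their trial counts.

    trial_ranges: dict mapping a trial ID to that trial's range for one
    feature, written in bracket notation such as "[7.0, 10.0]".
    """
    return Counter(trial_ranges.values()).most_common(k)


if __name__ == "__main__":
    # Toy data with placeholder trial IDs, not the real trial counts.
    toy = {
        "TRIAL_1": "[7.0, 10.0]",
        "TRIAL_2": "[7.0, 10.0]",
        "TRIAL_3": "(7.0, +∞)",
    }
    print(top_value_ranges(toy, k=2))
```

Note that ranges differing only in whether a bound is open or closed, such as (7.0, +∞) and [7.0, +∞), are counted separately, as they are in the table.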
Results IV
Collective Boundary Values of HbA1c
[Figure: HbA1c boundary values in Type 2 Diabetes Trials. x-axis: HbA1c Value (%), 0 to 25; y-axis: Number of Trials, 0 to 800]
Results V
Collective Value Range Widths of HbA1c
[Figure: HbA1c Value Range Width in Type 2 Diabetes Trials. x-axis: HbA1c Value Range Width, 0 to 12; y-axis: Number of Trials, 0 to 400]
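A distribution of range widths could be computed as sketched below. It assumes a width is defined only when both bounds are finite, which is an assumption and not stated on the slide; the function names are hypothetical.

```python
import math
from collections import Counter


def range_width(lower, upper):
    """Return upper - lower, or None when either bound is infinite."""
    if math.isinf(lower) or math.isinf(upper):
        return None
    return upper - lower


def width_histogram(ranges):
    """Count trials per range width, skipping half-open (unbounded) ranges.

    ranges: iterable of (lower, upper) pairs, one per trial.
    """
    widths = Counter()
    for lower, upper in ranges:
        width = range_width(lower, upper)
        if width is not None:
            widths[round(width, 1)] += 1
    return dict(widths)


if __name__ == "__main__":
    toy = [(7.0, 10.0), (7.0, 11.0), (7.0, math.inf), (7.5, 10.5)]
    print(width_histogram(toy))
```

Rounding to one decimal place keeps widths such as 3.0 from being split by floating-point noise.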
Results VI
VITTA: http://is.gd/VITTA
Discussion & Limitations 
• Utility of the COMPACT database 
• Limitations of NLP techniques 
• Partial semantics of the eligibility features 
• Accuracy of parsing numeric expressions 
• Limitations of ClinicalTrials.gov 
• Partial or condensed eligibility criteria 
• Condition indexing errors of trials 
Future Work 
• Improve free-text parsers 
• Sophisticated feature analysis 
• Large-scale user evaluation 
• Comparative analysis between patient population and target population
• Analysis of multiple features simultaneously
• ……
1. Weng C, Li Y, Ryan P, Zhang Y, Gao J, Liu F, Bigger JT, Hripcsak G. A Distribution-based 
Method for Assessing The Differences between Clinical Trial Target Populations and Patient 
Populations in Electronic Health Records. Applied Clinical Informatics. 2014;5(2):463-79.
Summary 
• COMPACT - a new resource for analyzing common eligibility 
features in clinical trials 
• VITTA (http://is.gd/VITTA) – a Web-based visual analysis tool 
of clinical trial target populations 
• To improve the transparency of design patterns of eligibility 
criteria for trials in a certain disease domain 
Acknowledgments 
Columbia University Medical Center
• Gregory Hruby
• Alexander Rusanov
Mount Sinai (current)
• Riccardo Miotto
Funding support: 
• NLM R01 LM009886 (PI: Weng) 
• NCATS UL1 TR000040 (PI: Ginsberg)
Thank you! 
Zhe He, Ph.D. 
zh2132@cumc.columbia.edu 
http://www.columbia.edu/~zh2132/ 
Feedback: 
http://is.gd/yhMnDX
