Increased amounts of data contained in electronic health records (EHRs) have led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than the current displays organized by data type, have the potential to greatly improve clinician efficiency. These tools require new kinds of clinical knowledge (e.g., problem-medication relationships) that are difficult to obtain. We compared association rule mining and crowdsourcing as automated methods for generating a knowledge base of problem-medication pairs using a single source of clinical data from a commercially available EHR. The association rule mining and crowdsourcing approaches identified 19,586 and 31,440 pairs, respectively. A reasonably strong positive relationship existed between the ranking measures of the two approaches (Spearman’s rho = 0.539, p < 0.0001). When comparing the top 500 pairs from each approach, only 186 overlapped. Manual inspection of the pairs indicated that crowdsourcing identified mostly common relationships, while association rule mining identified a combination of common and rare relationships. These findings indicate that the approaches are complementary, and that further research is necessary to combine and better evaluate them in order to generate an all-inclusive, highly accurate problem-medication knowledge base.
Comparison of Association Rule Mining and Crowdsourcing for Automated Generation of a Problem-Medication Knowledge Base
Allison B. McCoy, PhD¹, Adam Wright, PhD², Dean F. Sittig, PhD¹
¹ The University of Texas Health Science Center at Houston – School of Biomedical Informatics
² Brigham and Women’s Hospital, Harvard Medical School
Objective

To compare the use of association rule mining and crowdsourcing to generate a problem-medication knowledge base using a single source of clinical data from a commercially available electronic health record (EHR).

Introduction

Increased amounts of EHR data have led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than the current displays organized by data type, may improve clinician efficiency. However, these tools require new clinical knowledge (e.g., problem-medication relationships) that is difficult to obtain. Approaches to automatically generating this knowledge include:
• Standards-based ontologies, such as NDF-RT, a reference terminology for medications that provides a formal content model to describe medications and definitional relationships. However, mapping of EHR data to standard terminologies can be problematic.
• Association rule mining, a method of data mining that identifies related concepts using measures of interestingness and has been previously used to identify relationships between clinical data elements.
• Crowdsourcing, defined as outsourcing a task to a group of people, which takes advantage of clinicians manually linking laboratory tests to clinical problems during standard EHR e-ordering, a task required by many institutions for billing (Fig. 1).

Figure 1. Sample screen for linking a medication to an indicated problem during e-prescribing.

Study Setting and Data

We collected data from a large, multi-specialty, academic practice that provides ambulatory care throughout Houston, TX. Clinicians utilized Allscripts Enterprise EHR to maintain patient notes and problem lists, order laboratory tests, and prescribe medications. During the one-year study period, clinicians entered 418,221 medications and 1,222,308 problems for 53,108 patients.

Knowledge Base Generation

For association rule mining, we used minimum support and confidence thresholds of 5 and 10%, respectively, to identify related problem-medication pairs. We ranked pairs using the chi-squared statistic, which performed best when compared to a gold standard in our previous analysis.

For crowdsourcing, we retrieved links between medications and problems asserted by clinicians during e-prescribing. We performed a logistic regression using a subset of pairs that were manually reviewed for appropriateness. We ranked each pair i using the resulting predictor function, where p_i represents the number of patients having pair i and r_i represents the ratio of patients having the pair linked to the number of patients having both the medication and problem in pair i co-occurring.

f(i) = 0.14 * p_i + 2.34 * r_i + 2.45

Comparison of Approaches

We computed Spearman’s rho to test the correlation between the association rule mining and crowdsourcing rankings. For the 500 top-ranked problem-medication pairs for each approach, we then determined the number of pairs that existed in both sets and the number of pairs that were unique. We manually inspected the top-ranked pairs to classify the types best identified by each approach.

Results

Association rule mining identified 19,586 pairs, including 2,920 distinct medications and 4,759 distinct problems. Crowdsourcing identified 31,440 pairs, including 2,756 distinct medications and 4,675 distinct problems. Spearman’s rho comparing overlapping pairs was 0.539, with p < 0.0001 (Fig. 2). Of the top 500 ranked pairs for both approaches, 186 overlapped (Fig. 3).

Of the top-ranked association rule mining pairs, nine were also identified through crowdsourcing. Support and confidence varied, as did the number of patients having the pair linked and the ratio of linked pairs for those also identified through crowdsourcing. All top-ranked crowdsourcing pairs had a corresponding association rule mining rank. The number of patients having the pair linked was greater than 500 for all pairs, while the ratio of linked pairs had a wide range. Support for the association rule mining was high and confidence varied.

Top-ranked pairs uniquely identified through association rule mining included some rarely prescribed medications (e.g., glycopyrrolate) and non-clinical problems (e.g., taking medication). Top-ranked pairs uniquely identified through crowdsourcing included commonly prescribed medications with secondary indicated problems (e.g., metformin and polycystic ovarian syndrome).

Top-Ranked Association Rule Mining Problem-Medication Pairs

| Rank | Medication | Problem | Support | Confidence | Crowdsourcing Rank | Number of Patients | Ratio of Linked Pairs |
|---|---|---|---|---|---|---|---|
| 1 | Permethrin | Scabies | 125 | 0.874 | 108 | 100 | 0.8 |
| 2 | MetroNIDAZOLE | Bacterial Vaginosis | 1061 | 0.563 | 4 | 1003 | 0.945 |
| 3 | Rilutek | Motor Neuron Disease | 5 | 0.833 | 13543 | 3 | 0.6 |
| 4 | Terconazole | Vaginal Candidiasis | 404 | 0.599 | 20 | 388 | 0.960 |
| 5 | Amikacin Sulfate | Pseudomonas Wound Infection | 5 | 1 | N/A | N/A | N/A |
| 6 | Glucagon Emergency | Type I Diabetes Mellitus - Uncontrolled | 115 | 0.762 | 113 | 96 | 0.835 |
| 7 | Levothyroxine Sodium | Hypothyroidism | 1865 | 0.675 | 1 | 1396 | 0.749 |
| 8 | LevOCARNitine | Disorder Of Mitochondrial Metabolism | 65 | 0.3476 | 240 | 53 | 0.815 |
| 9 | Griseofulvin Microsize | Tinea Capitis | 99 | 0.846 | 171 | 74 | 0.747 |
| 10 | Solu-CORTEF | Congenital Adrenal Hyperplasia | 16 | 0.552 | 663 | 14 | 0.875 |

Top-Ranked Crowdsourcing Problem-Medication Pairs

| Rank | Medication | Problem | Number of Patients | Ratio of Linked Pairs | Association Rule Mining Rank | Support | Confidence |
|---|---|---|---|---|---|---|---|
| 1 | Levothyroxine Sodium | Hypothyroidism | 1396 | 0.749 | 7 | 1865 | 0.675 |
| 2 | Simvastatin | Hyperlipidemia | 1152 | 0.651 | 102 | 1769 | 0.609 |
| 3 | Lisinopril | Hypertension | 1045 | 0.402 | 315 | 2598 | 0.590 |
| 4 | MetroNIDAZOLE | Bacterial Vaginosis | 1003 | 0.945 | 2 | 1061 | 0.563 |
| 5 | Lipitor | Hyperlipidemia | 865 | 0.563 | 143 | 1537 | 0.591 |
| 6 | Hydrochlorothiazide | Hypertension | 731 | 0.429 | 468 | 1703 | 0.625 |
| 7 | AmLODIPine Besylate | Hypertension | 699 | 0.422 | 460 | 1658 | 0.635 |
| 8 | Fluticasone Propionate | Allergic Rhinitis | 630 | 0.699 | 150 | 901 | 0.525 |
| 9 | NexIUM | Esophageal Reflux | 561 | 0.613 | 211 | 915 | 0.352 |
| 10 | MetFORMIN HCl | Diabetes Mellitus | 566 | 0.353 | 437 | 1605 | 0.544 |

Figure 2. Overlapping problem-medication pairs in association rule mining and crowdsourcing. Axes: Association Rule Mining (Chi-Squared) and Crowdsourcing (Logistic Regression Fitted Value).

Figure 3. Overlap between association rule mining and crowdsourcing approaches. Values shown in the figure: 314, 184, 314.

Discussion

Both approaches effectively identified related pairs; crowdsourcing likely identified more because we did not restrict inclusion, while for association rule mining we set support and confidence thresholds. Review of overlaps between approaches found a heavy positive skew when comparing the number of pairs included with the percentage of overlap, suggesting that the percentage of overlap increases as the number of pairs included increases until a certain threshold, at which point both approaches become less accurate.

Some limitations of this work include the use of a single source of data that may not be directly utilized by other EHR systems; the use of structured elements, which may be incomplete compared to narrative text; and the lack of an evaluation of the appropriateness of the identified pairs.

Summary of Conclusions

Association rule mining and crowdsourcing are effective, complementary approaches for automatically generating a problem-medication knowledge base, which can be used to improve clinical care through summary screens. Further research is necessary to combine and better evaluate the approaches to generate an all-inclusive, highly accurate problem-medication knowledge base.

Acknowledgments

This project was supported by Grant No. 10510592 for Patient-Centered Cognitive Support under the Strategic Health IT Advanced Research Projects (SHARP) from the Office of the National Coordinator for Health Information Technology and NCRR grant 3UL1RR024148.

Please contact the first author via email: allison.b.mccoy@uth.tmc.edu
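
Illustrative sketch of the ranking measures

The measures named under Knowledge Base Generation and Comparison of Approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study. The `PairCounts` container, the function names, and the example counts are hypothetical. Confidence is assumed to be computed in the medication-to-problem direction, the chi-squared statistic is assumed to be the standard 2x2 contingency form, and p_i is assumed to enter the predictor function as a raw patient count. Only the thresholds (support 5, confidence 10%) and the coefficients 0.14, 2.34, and 2.45 come from the poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairCounts:
    """Patient counts for one medication-problem pair (hypothetical container)."""
    both: int       # patients with both the medication and the problem
    med_only: int   # patients with the medication but not the problem
    prob_only: int  # patients with the problem but not the medication
    neither: int    # patients with neither


def support(c: PairCounts) -> int:
    """Number of patients in whom the medication and problem co-occur."""
    return c.both


def confidence(c: PairCounts) -> float:
    """Share of patients on the medication who also have the problem (assumed direction)."""
    return c.both / (c.both + c.med_only)


def chi_squared(c: PairCounts) -> float:
    """Chi-squared statistic of the 2x2 medication-by-problem table."""
    a, b, cc, d = c.both, c.med_only, c.prob_only, c.neither
    n = a + b + cc + d
    denom = (a + b) * (cc + d) * (a + cc) * (b + d)
    return n * (a * d - b * cc) ** 2 / denom if denom else 0.0


def passes_thresholds(c: PairCounts, min_support: int = 5,
                      min_confidence: float = 0.10) -> bool:
    """Apply the poster's minimum support (5) and confidence (10%) thresholds."""
    return support(c) >= min_support and confidence(c) >= min_confidence


def crowdsourcing_score(p_i: float, r_i: float) -> float:
    """Predictor function from the poster: f(i) = 0.14 * p_i + 2.34 * r_i + 2.45."""
    return 0.14 * p_i + 2.34 * r_i + 2.45


def top_k_overlap(ranked_a, ranked_b, k: int = 500) -> int:
    """Count pairs that appear in the top k of both ranked lists."""
    return len(set(ranked_a[:k]) & set(ranked_b[:k]))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    example = PairCounts(both=40, med_only=10, prob_only=20, neither=930)  # made-up counts
    print(support(example), confidence(example), passes_thresholds(example))
    print(round(crowdsourcing_score(1396, 0.749), 2))
```

As a rough check, plugging the Levothyroxine Sodium and Hypothyroidism row (1396 patients, ratio 0.749) into `crowdsourcing_score` gives about 199.64. In practice a statistics library would normally replace the hand-written `chi_squared` and `spearman_rho`; they are written out here only to keep the sketch dependency-free.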