Comparison of Association Rule Mining and Crowdsourcing for Automated Generation of a Problem-Medication Knowledge Base

The growing amount of data contained in electronic health records (EHRs) has led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than the current displays organized by data type, have the potential to greatly improve clinician efficiency. These tools require new kinds of clinical knowledge (e.g., problem-medication relationships) that are difficult to obtain. We compared association rule mining and crowdsourcing as automated methods for generating a knowledge base of problem-medication pairs using a single source of clinical data from a commercially available EHR. The association rule mining and crowdsourcing approaches identified 19,586 and 31,440 pairs, respectively. A reasonably strong positive correlation existed between the two approaches' ranking measures (Spearman's rho = 0.539, p < 0.0001). Of the top 500 pairs from each approach, only 186 overlapped. Manual inspection of the pairs indicated that crowdsourcing identified mostly common relationships, while association rule mining identified a combination of common and rare relationships. These findings indicate that the approaches are complementary; further research is needed to combine and better evaluate them in order to generate an all-inclusive, highly accurate problem-medication knowledge base.
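
The correlation reported above compares the two ranking measures only over pairs that both approaches found. A minimal sketch of that computation, assuming each approach's scores are held in a dict keyed by (medication, problem); the function names and the pair names and scores are hypothetical toy values, not study data. Tied values receive averaged ranks, and rho is the Pearson correlation of the two rank vectors.

```python
import math
from typing import Dict, Hashable, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rho_on_overlap(scores_a: Dict[Hashable, float],
                   scores_b: Dict[Hashable, float]) -> float:
    """Correlate two scorings using only the keys present in both dicts."""
    shared = [k for k in scores_a if k in scores_b]
    return spearman_rho([scores_a[k] for k in shared], [scores_b[k] for k in shared])


# Hypothetical scores keyed by (medication, problem); not values from the study.
chi_squared = {("med_a", "prob_1"): 5200.0, ("med_b", "prob_2"): 310.0,
               ("med_c", "prob_3"): 45.0, ("med_d", "prob_4"): 1200.0,
               ("med_e", "prob_5"): 8.0}
fitted_value = {("med_a", "prob_1"): 190.0, ("med_b", "prob_2"): 40.0,
                ("med_c", "prob_3"): 12.0, ("med_d", "prob_4"): 25.0,
                ("med_f", "prob_6"): 60.0}

print(rho_on_overlap(chi_squared, fitted_value))  # prints 0.8
```

In practice, `scipy.stats.spearmanr` computes the same tie-aware coefficient and also returns a p-value, which this sketch does not.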

Allison B. McCoy, PhD¹, Adam Wright, PhD², Dean F. Sittig, PhD¹
¹The University of Texas Health Science Center at Houston – School of Biomedical Informatics
²Brigham and Women's Hospital, Harvard Medical School

Objective
To compare the use of association rule mining and crowdsourcing to generate a problem-medication knowledge base using a single source of clinical data from a commercially available electronic health record (EHR).

Introduction
The growing amount of EHR data has led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than the current displays organized by data type, may improve clinician efficiency. However, these tools require new clinical knowledge (e.g., problem-medication relationships) that is difficult to obtain. Approaches to automatically generating this knowledge include:
• Standards-based ontologies, such as NDF-RT, a reference terminology for medications that provides a formal content model to describe medications and their definitional relationships. However, mapping EHR data to standard terminologies can be problematic.
• Association rule mining, a data mining method that identifies related concepts using measures of interestingness and that has previously been used to identify relationships between clinical data elements.
• Crowdsourcing, defined as outsourcing a task to a group of people, which takes advantage of clinicians manually linking laboratory tests to clinical problems during standard EHR e-ordering, a task required by many institutions for billing (Fig. 1).

[Figure 1: Sample screen for linking a medication to an indicated problem during e-prescribing.]

Study Setting and Data
We collected data from a large, multi-specialty academic practice that provides ambulatory care throughout Houston, TX. Clinicians used the Allscripts Enterprise EHR to maintain patient notes and problem lists, order laboratory tests, and prescribe medications. During the one-year study period, clinicians entered 418,221 medications and 1,222,308 problems for 53,108 patients.

Knowledge Base Generation
For association rule mining, we used a minimum support of 5 and a minimum confidence of 10% to identify related problem-medication pairs. We ranked pairs using the chi-squared statistic, which performed best when compared to a gold standard in our previous analysis.

For crowdsourcing, we retrieved the links between medications and problems asserted by clinicians during e-prescribing. We performed a logistic regression using a subset of pairs that were manually reviewed for appropriateness, and we ranked each pair $i$ using the resulting predictor function

$$f(i) = 0.14\,p_i + 2.34\,r_i + 2.45$$

where $p_i$ is the number of patients having pair $i$ linked and $r_i$ is the ratio of the number of patients having pair $i$ linked to the number of patients in whom the medication and the problem of pair $i$ co-occur.

Comparison of Approaches
We computed Spearman's rho to test the correlation between the association rule mining and crowdsourcing ranking measures. For the 500 top-ranked problem-medication pairs from each approach, we then determined the number of pairs that existed in both sets and the number of pairs that were unique to one approach. We manually inspected the top-ranked pairs to classify the types of pairs best identified by each approach.

Results
Association rule mining identified 19,586 pairs, including 2,920 distinct medications and 4,759 distinct problems. Crowdsourcing identified 31,440 pairs, including 2,756 distinct medications and 4,675 distinct problems. Spearman's rho comparing overlapping pairs was 0.539, with p < 0.0001 (Fig. 2). Of the top 500 ranked pairs for both approaches, 186 overlapped (Fig. 3).

[Figure 2: Overlapping problem-medication pairs in association rule mining and crowdsourcing. Scatter plot; x-axis: crowdsourcing (logistic regression fitted value); y-axis: association rule mining (chi-squared).]

[Figure 3: Overlap between association rule mining and crowdsourcing approaches. Venn diagram of the top 500 pairs from each approach; 314 pairs were unique to each approach.]

Of the top-ranked association rule mining pairs, nine were also identified through crowdsourcing. Support and confidence varied, as did the number of patients having the pair linked and the ratio of linked pairs for those also identified through crowdsourcing.

Top-Ranked Association Rule Mining Problem-Medication Pairs

| Rank | Medication | Problem | Support | Confidence | Crowdsourcing Rank | Number of Linked Patients | Ratio of Linked Pairs |
|---|---|---|---|---|---|---|---|
| 1 | Permethrin | Scabies | 125 | 0.874 | 108 | 100 | 0.8 |
| 2 | MetroNIDAZOLE | Bacterial Vaginosis | 1061 | 0.563 | 4 | 1003 | 0.945 |
| 3 | Rilutek | Motor Neuron Disease | 5 | 0.833 | 13543 | 3 | 0.6 |
| 4 | Terconazole | Vaginal Candidiasis | 404 | 0.599 | 20 | 388 | 0.960 |
| 5 | Amikacin Sulfate | Pseudomonas Wound Infection | 5 | 1 | N/A | N/A | N/A |
| 6 | Glucagon Emergency | Type I Diabetes Mellitus - Uncontrolled | 115 | 0.762 | 113 | 96 | 0.835 |
| 7 | Levothyroxine Sodium | Hypothyroidism | 1865 | 0.675 | 1 | 1396 | 0.749 |
| 8 | LevOCARNitine | Disorder Of Mitochondrial Metabolism | 65 | 0.3476 | 240 | 53 | 0.815 |
| 9 | Griseofulvin Microsize | Tinea Capitis | 99 | 0.846 | 171 | 74 | 0.747 |
| 10 | Solu-CORTEF | Congenital Adrenal Hyperplasia | 16 | 0.552 | 663 | 14 | 0.875 |

All top-ranked crowdsourcing pairs had a corresponding association rule mining rank. The number of patients having the pair linked exceeded 500 for all pairs, while the ratio of linked pairs varied widely. Association rule mining support was high for these pairs, and confidence varied.

Top-Ranked Crowdsourcing Problem-Medication Pairs

| Rank | Medication | Problem | Number of Linked Patients | Ratio of Linked Pairs | Association Rule Mining Rank | Support | Confidence |
|---|---|---|---|---|---|---|---|
| 1 | Levothyroxine Sodium | Hypothyroidism | 1396 | 0.749 | 7 | 1865 | 0.675 |
| 2 | Simvastatin | Hyperlipidemia | 1152 | 0.651 | 102 | 1769 | 0.609 |
| 3 | Lisinopril | Hypertension | 1045 | 0.402 | 315 | 2598 | 0.590 |
| 4 | MetroNIDAZOLE | Bacterial Vaginosis | 1003 | 0.945 | 2 | 1061 | 0.563 |
| 5 | Lipitor | Hyperlipidemia | 865 | 0.563 | 143 | 1537 | 0.591 |
| 6 | Hydrochlorothiazide | Hypertension | 731 | 0.429 | 468 | 1703 | 0.625 |
| 7 | AmLODIPine Besylate | Hypertension | 699 | 0.422 | 460 | 1658 | 0.635 |
| 8 | Fluticasone Propionate | Allergic Rhinitis | 630 | 0.699 | 150 | 901 | 0.525 |
| 9 | NexIUM | Esophageal Reflux | 561 | 0.613 | 211 | 915 | 0.352 |
| 10 | MetFORMIN HCl | Diabetes Mellitus | 566 | 0.353 | 437 | 1605 | 0.544 |

Top-ranked pairs uniquely identified through association rule mining included some rarely prescribed medications (e.g., glycopyrrolate) and non-clinical problems (e.g., taking medication). Top-ranked pairs uniquely identified through crowdsourcing included commonly prescribed medications with secondary indicated problems (e.g., metformin and polycystic ovarian syndrome).

Discussion
Both approaches effectively identified related pairs; crowdsourcing likely identified more because we did not restrict inclusion, while for association rule mining we set support and confidence thresholds. Reviewing the overlap between the approaches showed a heavy positive skew when the number of pairs included was compared with the percentage of overlap, suggesting that the percentage of overlap increases with the number of pairs included up to a certain threshold, at which point both approaches become less accurate.

Limitations of this work include the use of a single data source that may not be directly usable by other EHR systems; the use of structured elements, which may be incomplete compared with narrative text; and the lack of an evaluation of the appropriateness of the identified pairs.

Summary of Conclusions
Association rule mining and crowdsourcing are effective, complementary approaches for automatically generating a problem-medication knowledge base, which can be used to improve clinical care through summary screens. Further research is needed to combine and better evaluate the approaches to generate an all-inclusive, highly accurate problem-medication knowledge base.

Acknowledgments
This project was supported by Grant No. 10510592 for Patient-Centered Cognitive Support under the Strategic Health IT Advanced Research Projects (SHARP) from the Office of the National Coordinator for Health Information Technology, and by NCRR grant 3UL1RR024148.

Please contact the first author via email: allison.b.mccoy@uth.tmc.edu
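
The association rule mining step under Knowledge Base Generation (minimum support of 5, minimum confidence of 10%, pairs ranked by chi-squared) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Each patient is a set of medications plus a set of problems, and support is counted as patients having both items; the tables are consistent with this, since each ratio of linked pairs equals linked patients divided by support. Two points are assumptions: confidence is taken in the medication-to-problem direction, and chi-squared is Pearson's statistic on a 2×2 patient table without continuity correction. The function name `mine_pairs` and the toy cohort are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Set, Tuple

MIN_SUPPORT = 5        # minimum co-occurring patients (threshold from the poster)
MIN_CONFIDENCE = 0.10  # minimum confidence of 10% (threshold from the poster)

Patient = Tuple[Set[str], Set[str]]        # (medications, problems)
Rule = Tuple[str, str, int, float, float]  # (med, problem, support, confidence, chi2)


def mine_pairs(patients: Iterable[Patient],
               min_support: int = MIN_SUPPORT,
               min_confidence: float = MIN_CONFIDENCE) -> List[Rule]:
    med_count: Counter = Counter()
    prob_count: Counter = Counter()
    pair_count: Counter = Counter()
    n = 0
    for meds, probs in patients:
        meds, probs = set(meds), set(probs)  # count each item once per patient
        n += 1
        med_count.update(meds)
        prob_count.update(probs)
        pair_count.update(product(meds, probs))

    rules: List[Rule] = []
    for (med, prob), a in pair_count.items():
        # ASSUMPTION: confidence of the rule medication => problem.
        confidence = a / med_count[med]
        if a < min_support or confidence < min_confidence:
            continue
        # 2x2 patient table: a = both, b = med only, c = problem only, d = neither.
        b = med_count[med] - a
        c = prob_count[prob] - a
        d = n - a - b - c
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # ASSUMPTION: Pearson chi-squared without Yates continuity correction.
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        rules.append((med, prob, a, confidence, chi2))

    rules.sort(key=lambda r: r[4], reverse=True)  # highest chi-squared first
    return rules


# Hypothetical toy cohort of 20 patients.
patients = ([({"permethrin"}, {"scabies"})] * 6
            + [({"permethrin"}, {"eczema"})]
            + [({"lisinopril"}, {"hypertension"})] * 4
            + [({"ibuprofen"}, {"back pain"})] * 9)

for med, prob, support, conf, chi2 in mine_pairs(patients):
    print(f"{med} -> {prob}: support={support}, confidence={conf:.3f}, chi2={chi2:.2f}")
```

In the toy cohort, the lisinopril and eczema pairs fall below the support threshold and are dropped. Because chi-squared measures departure from independence, a pair with modest support but high specificity can still rank near the top, which may help explain why rare medications appear among the top association rule mining pairs.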

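The crowdsourcing ranking function from Knowledge Base Generation can be applied as in this minimal sketch. It uses the coefficients exactly as printed on the poster. As an assumption, the denominator of $r_i$ is the association rule mining support, meaning the patients in whom the medication and problem co-occur; the published ratios of linked pairs are consistent with that choice (e.g., Permethrin/Scabies: 100 / 125 = 0.8). The class and function names are hypothetical; the sample rows are copied from the two top-ranked tables.

```python
from typing import List, NamedTuple


class CrowdPair(NamedTuple):
    medication: str
    problem: str
    linked_patients: int       # p_i: patients with the pair explicitly linked
    cooccurring_patients: int  # ASSUMPTION: equals the association rule mining support


def linked_ratio(pair: CrowdPair) -> float:
    """r_i: share of co-occurring patients in whom clinicians linked the pair."""
    return pair.linked_patients / pair.cooccurring_patients


def crowdsourcing_score(pair: CrowdPair) -> float:
    """f(i) = 0.14 * p_i + 2.34 * r_i + 2.45 (coefficients as printed on the poster)."""
    return 0.14 * pair.linked_patients + 2.34 * linked_ratio(pair) + 2.45


def rank_pairs(pairs: List[CrowdPair]) -> List[CrowdPair]:
    """Order pairs from highest to lowest score."""
    return sorted(pairs, key=crowdsourcing_score, reverse=True)


# Rows taken from the two top-ranked tables on the poster.
pairs = [
    CrowdPair("Permethrin", "Scabies", 100, 125),
    CrowdPair("Lisinopril", "Hypertension", 1045, 2598),
    CrowdPair("Levothyroxine Sodium", "Hypothyroidism", 1396, 1865),
    CrowdPair("Simvastatin", "Hyperlipidemia", 1152, 1769),
]

for pair in rank_pairs(pairs):
    print(f"{pair.medication} / {pair.problem}: "
          f"r={linked_ratio(pair):.3f}, f={crowdsourcing_score(pair):.2f}")
```

Because the logistic function is monotonic, ranking by this linear predictor gives the same order as ranking by predicted probability. The printed coefficients are rounded to two decimals, so recomputed scores may not reproduce the published order exactly for closely scored pairs.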

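The top-500 overlap in Comparison of Approaches reduces to set operations on each approach's highest-scoring pairs. A minimal sketch, assuming scores are stored in dicts keyed by pair and that ties at the cutoff are broken by insertion order (the poster does not say how ties were handled); the function names, toy scores, and k = 3 are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def top_k(scores: Dict[Hashable, float], k: int) -> Set[Hashable]:
    """Keys of the k highest scores; the stable sort keeps insertion order for ties."""
    return set(sorted(scores, key=scores.__getitem__, reverse=True)[:k])


def compare_top_k(scores_a: Dict[Hashable, float],
                  scores_b: Dict[Hashable, float],
                  k: int = 500) -> Tuple[int, int, int]:
    """Return (shared, only in A, only in B) counts for the two top-k sets."""
    top_a, top_b = top_k(scores_a, k), top_k(scores_b, k)
    return len(top_a & top_b), len(top_a - top_b), len(top_b - top_a)


# Hypothetical toy scores (not study data).
arm = {"pair_a": 9.0, "pair_b": 8.0, "pair_c": 7.0, "pair_d": 1.0}
crowd = {"pair_b": 5.0, "pair_c": 4.0, "pair_e": 3.0, "pair_a": 0.5}

print(compare_top_k(arm, crowd, k=3))  # (2, 1, 1)
```

With k = 500, the poster reports 186 shared pairs, which leaves 500 - 186 = 314 pairs unique to each approach.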