Comparison of Association Rule Mining and Crowdsourcing for Automated Generation of a Problem-Medication Knowledge Base
Allison B. McCoy, PhD¹, Adam Wright, PhD², Dean F. Sittig, PhD¹
¹The University of Texas Health Science Center at Houston – School of Biomedical Informatics
²Brigham and Women’s Hospital, Harvard Medical School




Objective

To compare the use of association rule mining and crowdsourcing to generate a problem-medication knowledge base using a single source of clinical data from a commercially available electronic health record (EHR).

Introduction

Increased amounts of EHR data have led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than the current displays by data type, may improve clinician efficiency. However, these tools require new clinical knowledge (e.g., problem-medication relationships) that is difficult to obtain. Approaches to automatically generating this knowledge include:

•   Standards-based ontologies, such as NDF-RT, a reference terminology for medications that provides a formal content model to describe medications and definitional relationships. However, mapping of EHR data to standard terminologies can be problematic.
•   Association rule mining, a method of data mining that identifies related concepts using measures of interestingness and has been previously used to identify relationships between clinical data elements.
•   Crowdsourcing, defined as outsourcing a task to a group of people, which takes advantage of manually linked laboratory tests to clinical problems by clinicians during standard EHR e-ordering, a task required by many institutions for billing (Fig. 1).

Figure 1. Sample screen for linking a medication to an indicated problem during e-prescribing.

Study Setting and Data

We collected data from a large, multi-specialty, academic practice that provides ambulatory care throughout Houston, TX. Clinicians utilized Allscripts Enterprise EHR to maintain patient notes and problem lists, order laboratory tests, and prescribe medications. During the one-year study period, clinicians entered 418,221 medications and 1,222,308 problems for 53,108 patients.

Knowledge Base Generation

For association rule mining, we used minimum support and confidence thresholds of 5 and 10%, respectively, to identify related problem-medication pairs. We ranked pairs using the chi-squared statistic, which performed best when compared to a gold standard in our previous analysis.

For crowdsourcing, we retrieved links between medications and problems asserted by clinicians during e-prescribing. We performed a logistic regression using a subset of pairs that were manually reviewed for appropriateness. We ranked each pair i using the resulting predictor function, where p_i represents the number of patients having pair i and r_i represents the ratio of patients having the pair linked to the number of patients having both the medication and problem in pair i co-occurring.

    f(i) = 0.14 * p_i + 2.34 * r_i + 2.45

Comparison of Approaches

We computed Spearman’s rho to test the correlation between the association rule mining and crowdsourcing rankings. For the 500 top-ranked problem-medication pairs for each approach, we then determined the number of pairs that existed in both sets and the number of pairs that were unique. We manually inspected the top-ranked pairs to classify the types best identified by each approach.

Results

Association rule mining identified 19,586 pairs, including 2,920 distinct medications and 4,759 distinct problems. Crowdsourcing identified 31,440 pairs, including 2,756 distinct medications and 4,675 distinct problems. Spearman’s rho comparing overlapping pairs was 0.539, with p < 0.0001 (Fig. 2). Of the top 500 ranked pairs for both approaches, 186 overlapped (Fig. 3).

Of the top-ranked association rule mining pairs, nine were also identified through crowdsourcing. Support and confidence varied, as did the number of patients having the pair linked and the ratio of linked pairs for those also identified through crowdsourcing. All top-ranked crowdsourcing pairs had a corresponding association rule mining rank. The number of patients having the pair linked was greater than 500 patients for all pairs, while the ratio of linked pairs had a wide range. Support for the association rule mining was high and confidence varied.

Top-ranked pairs uniquely identified through association rule mining included some rarely prescribed medications (e.g., glycopyrrolate) and non-clinical problems (e.g., taking medication). Top-ranked pairs uniquely identified through crowdsourcing included commonly prescribed medications with secondary indicated problems (e.g., metformin and polycystic ovarian syndrome).

Top-Ranked Association Rule Mining Problem-Medication Pairs

Rank  Medication              Problem                                  Support  Confidence  Crowdsourcing Rank  Number of Patients  Ratio of Linked Pairs
1     Permethrin              Scabies                                  125      0.874       108                 100                 0.8
2     MetroNIDAZOLE           Bacterial Vaginosis                      1061     0.563       4                   1003                0.945
3     Rilutek                 Motor Neuron Disease                     5        0.833       13543               3                   0.6
4     Terconazole             Vaginal Candidiasis                      404      0.599       20                  388                 0.960
5     Amikacin Sulfate        Pseudomonas Wound Infection              5        1           N/A                 N/A                 N/A
6     Glucagon Emergency      Type I Diabetes Mellitus - Uncontrolled  115      0.762       113                 96                  0.835
7     Levothyroxine Sodium    Hypothyroidism                           1865     0.675       1                   1396                0.749
8     LevOCARNitine           Disorder Of Mitochondrial Metabolism     65       0.3476      240                 53                  0.815
9     Griseofulvin Microsize  Tinea Capitis                            99       0.846       171                 74                  0.747
10    Solu-CORTEF             Congenital Adrenal Hyperplasia           16       0.552       663                 14                  0.875

Top-Ranked Crowdsourcing Problem-Medication Pairs

Rank  Medication              Problem              Number of Patients  Ratio of Linked Pairs  Association Rule Mining Rank  Support  Confidence
1     Levothyroxine Sodium    Hypothyroidism       1396                0.749                  7                             1865     0.675
2     Simvastatin             Hyperlipidemia       1152                0.651                  102                           1769     0.609
3     Lisinopril              Hypertension         1045                0.402                  315                           2598     0.590
4     MetroNIDAZOLE           Bacterial Vaginosis  1003                0.945                  2                             1061     0.563
5     Lipitor                 Hyperlipidemia       865                 0.563                  143                           1537     0.591
6     Hydrochlorothiazide     Hypertension         731                 0.429                  468                           1703     0.625
7     AmLODIPine Besylate     Hypertension         699                 0.422                  460                           1658     0.635
8     Fluticasone Propionate  Allergic Rhinitis    630                 0.699                  150                           901      0.525
9     NexIUM                  Esophageal Reflux    561                 0.613                  211                           915      0.352
10    MetFORMIN HCl           Diabetes Mellitus    566                 0.353                  437                           1605     0.544

Figure 2. Overlapping problem-medication pairs in association rule mining and crowdsourcing. (Plot; x-axis: Crowdsourcing (Logistic Regression Fitted Value), 0 to 200; y-axis: Association Rule Mining (Chi-Squared), 0 to 60000.)

Figure 3. Overlap between association rule mining and crowdsourcing approaches. (Diagram values: 314, 184, 314.)

Discussion

Both approaches effectively identified related pairs; crowdsourcing likely identified more because we did not restrict inclusion, while for association rule mining we set support and confidence thresholds. Review of overlaps between approaches found a heavy positive skew when comparing the number of pairs included with the percentage of overlap, suggesting that the percentage of overlap increases as the number of pairs included increases until a certain threshold, at which point both approaches become less accurate.

Some limitations of this work include the use of a single source of data that may not be directly utilized by other EHR systems; the use of structured elements, which may be incomplete compared with narrative text; and the lack of an evaluation of the appropriateness of the identified pairs.

Summary of Conclusions

Association rule mining and crowdsourcing are effective, complementary approaches for automatically generating a problem-medication knowledge base, which can be used to improve clinical care through summary screens. Further research is necessary to combine and better evaluate the approaches to generate an all-inclusive, highly accurate problem-medication knowledge base.

Acknowledgments

This project was supported by Grant No. 10510592 for Patient-Centered Cognitive Support under the Strategic Health IT Advanced Research Projects (SHARP) from the Office of the National Coordinator for Health Information Technology and NCRR grant 3UL1RR024148.

Please contact the first author via email: allison.b.mccoy@uth.tmc.edu
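
As a rough illustration of the crowdsourcing ranking and the top-ranked overlap comparison described in this poster, the following Python sketch applies the predictor function f(i) = 0.14 * p_i + 2.34 * r_i + 2.45 to four rows taken from the crowdsourcing table and counts the pairs shared by two short rankings. The names `LinkedPair`, `crowdsourcing_score`, `rank_pairs`, and `top_k_overlap` are illustrative assumptions rather than part of the original work, and k = 4 stands in for the 500 top-ranked pairs used in the study.

```python
from typing import NamedTuple


class LinkedPair(NamedTuple):
    """One problem-medication pair (hypothetical container for illustration)."""
    medication: str
    problem: str
    patients: int      # p_i: number of patients having the pair
    link_ratio: float  # r_i: linked patients / patients with both items co-occurring


def crowdsourcing_score(pair: LinkedPair) -> float:
    """Predictor function quoted on the poster: 0.14 * p_i + 2.34 * r_i + 2.45."""
    return 0.14 * pair.patients + 2.34 * pair.link_ratio + 2.45


def rank_pairs(pairs):
    """Sort pairs from highest to lowest predictor value."""
    return sorted(pairs, key=crowdsourcing_score, reverse=True)


def top_k_overlap(ranked_a, ranked_b, k):
    """Return (shared, only_a, only_b) counts for the top k of two rankings."""
    top_a = set(ranked_a[:k])
    top_b = set(ranked_b[:k])
    shared = top_a & top_b
    return len(shared), len(top_a - shared), len(top_b - shared)


# Four rows from the top-ranked crowdsourcing table, deliberately out of order.
pairs = [
    LinkedPair("MetroNIDAZOLE", "Bacterial Vaginosis", 1003, 0.945),
    LinkedPair("Lisinopril", "Hypertension", 1045, 0.402),
    LinkedPair("Levothyroxine Sodium", "Hypothyroidism", 1396, 0.749),
    LinkedPair("Simvastatin", "Hyperlipidemia", 1152, 0.651),
]
ranked = rank_pairs(pairs)
ranked_names = [p.medication for p in ranked]

# First four rows of the top-ranked association rule mining table.
arm_ranked = [
    ("Permethrin", "Scabies"),
    ("MetroNIDAZOLE", "Bacterial Vaginosis"),
    ("Rilutek", "Motor Neuron Disease"),
    ("Terconazole", "Vaginal Candidiasis"),
]
crowd_ranked = [(p.medication, p.problem) for p in ranked]

shared, only_arm, only_crowd = top_k_overlap(arm_ranked, crowd_ranked, k=4)
print(ranked_names)
print(shared, only_arm, only_crowd)
```

With these four rows the sorted order matches ranks 1 to 4 in the crowdsourcing table, and only the metronidazole pair appears in both short lists. The coefficients shown on the poster are rounded, so scores for closely spaced pairs may not reproduce the published order exactly.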

The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 

Comparison of Association Rule Mining and Crowdsourcing for Automated Generation of a Problem-Medication Knowledge Base

  • 1. Comparison of Association Rule Mining and Crowdsourcing for Automated Generation of a Problem-Medication Knowledge Base

Allison B. McCoy, PhD1, Adam Wright, PhD2, Dean F. Sittig, PhD1
1The University of Texas Health Science Center at Houston – School of Biomedical Informatics
2Brigham and Women’s Hospital, Harvard Medical School

Objective

To compare the use of association rule mining and crowdsourcing to generate a problem-medication knowledge base using a single source of clinical data from a commercially available electronic health record (EHR).

Introduction

Increased amounts of EHR data have led to inefficiencies for clinicians trying to locate relevant patient information. Automated summarization tools that create condition-specific data displays, rather than current displays by data type, may improve clinician efficiency. However, these tools require new clinical knowledge (e.g., problem-medication relationships) that is difficult to obtain. Approaches to automatically generating this knowledge include:

• Standards-based ontologies, such as NDF-RT, a reference terminology for medications that provides a formal content model to describe medications and definitional relationships. However, mapping of EHR data to standard terminologies can be problematic.
• Association rule mining, a method of data mining that identifies related concepts using measures of interestingness and has been previously used to identify relationships between clinical data elements.
• Crowdsourcing, defined as outsourcing a task to a group of people, which takes advantage of manually linked laboratory tests to clinical problems by clinicians during standard EHR e-ordering, a task required by many institutions for billing (Fig. 1).

Figure 1. Sample screen for linking a medication to an indicated problem during e-prescribing.

Study Setting and Data

We collected data from a large, multi-specialty, academic practice that provides ambulatory care throughout Houston, TX. Clinicians utilized Allscripts Enterprise EHR to maintain patient notes and problem lists, order laboratory tests, and prescribe medications. During the one-year study period, clinicians entered 418,221 medications and 1,222,308 problems for 53,108 patients.

Knowledge Base Generation

For association rule mining, we used minimum support and confidence thresholds of 5 and 10%, respectively, to identify related problem-medication pairs. We ranked pairs using the chi-squared statistic, which performed best when compared to a gold standard in our previous analysis.

For crowdsourcing, we retrieved links between medications and problems asserted by clinicians during e-prescribing. We performed a logistic regression using a subset of pairs that were manually reviewed for appropriateness. We ranked each pair i using the resulting predictor function, where p_i represents the number of patients having pair i and r_i represents the ratio of patients having the pair linked to the number of patients having both the medication and problem in pair i co-occurring.

ƒ(i) = 0.14 * p_i + 2.34 * r_i + 2.45

Comparison of Approaches

We computed Spearman’s rho to test the correlation between association rule mining and crowdsourcing. For the 500 top-ranked problem-medication pairs for each approach, we then determined the number of pairs that existed in both sets and the number of pairs that were unique. We manually inspected the top-ranked pairs to classify the types best identified by each approach.

Results

Association rule mining identified 19,586 pairs, including 2,920 distinct medications and 4,759 distinct problems. Crowdsourcing identified 31,440 pairs, including 2,756 distinct medications and 4,675 distinct problems. Spearman’s rho comparing overlapping pairs was 0.539, with p < 0.0001 (Fig. 2). Of the top 500 ranked pairs for both approaches, 186 overlapped (Fig. 3).

Of the top-ranked association rule mining pairs, nine were also identified through crowdsourcing. Support and confidence varied, as did the number of patients having the pair linked and the ratio of linked pairs for those also identified through crowdsourcing. All top-ranked crowdsourcing pairs had a corresponding association rule mining rank. The number of patients having the pair linked was greater than 500 patients for all pairs, while the ratio of linked pairs had a wide range. Support for the association rule mining was high and confidence varied.

Top-ranked pairs uniquely identified through association rule mining included some rarely prescribed medications (e.g., glycopyrrolate) and non-clinical problems (e.g., taking medication). Top-ranked pairs uniquely identified through crowdsourcing included commonly prescribed medications with secondary indicated problems (e.g., metformin and polycystic ovarian syndrome).

Top-Ranked Association Rule Mining Problem-Medication Pairs

Rank  Medication              Problem                                  Support  Confidence  Crowdsourcing Rank  Number of Patients  Ratio of Linked Pairs
1     Permethrin              Scabies                                  125      0.874       108                 100                 0.8
2     MetroNIDAZOLE           Bacterial Vaginosis                      1061     0.563       4                   1003                0.945
3     Rilutek                 Motor Neuron Disease                     5        0.833       13543               3                   0.6
4     Terconazole             Vaginal Candidiasis                      404      0.599       20                  388                 0.960
5     Amikacin Sulfate        Pseudomonas Wound Infection              5        1           N/A                 N/A                 N/A
6     Glucagon Emergency      Type I Diabetes Mellitus - Uncontrolled  115      0.762       113                 96                  0.835
7     Levothyroxine Sodium    Hypothyroidism                           1865     0.675       1                   1396                0.749
8     LevOCARNitine           Disorder Of Mitochondrial Metabolism     65       0.3476      240                 53                  0.815
9     Griseofulvin Microsize  Tinea Capitis                            99       0.846       171                 74                  0.747
10    Solu-CORTEF             Congenital Adrenal Hyperplasia           16       0.552       663                 14                  0.875

Top-Ranked Crowdsourcing Problem-Medication Pairs

Rank  Medication              Problem              Number of Patients  Ratio of Linked Pairs  Association Rule Mining Rank  Support  Confidence
1     Levothyroxine Sodium    Hypothyroidism       1396                0.749                  7                             1865     0.675
2     Simvastatin             Hyperlipidemia       1152                0.651                  102                           1769     0.609
3     Lisinopril              Hypertension         1045                0.402                  315                           2598     0.590
4     MetroNIDAZOLE           Bacterial Vaginosis  1003                0.945                  2                             1061     0.563
5     Lipitor                 Hyperlipidemia       865                 0.563                  143                           1537     0.591
6     Hydrochlorothiazide     Hypertension         731                 0.429                  468                           1703     0.625
7     AmLODIPine Besylate     Hypertension         699                 0.422                  460                           1658     0.635
8     Fluticasone Propionate  Allergic Rhinitis    630                 0.699                  150                           901      0.525
9     NexIUM                  Esophageal Reflux    561                 0.613                  211                           915      0.352
10    MetFORMIN HCl           Diabetes Mellitus    566                 0.353                  437                           1605     0.544

Figure 2. Overlapping problem-medication pairs in association rule mining and crowdsourcing. Axes: Crowdsourcing (Logistic Regression Fitted Value) against Association Rule Mining (Chi-Squared).

Figure 3. Overlap between association rule mining and crowdsourcing approaches. Values shown: 314, 184, 314.

Discussion

Both approaches effectively identified related pairs; crowdsourcing likely identified more because we did not restrict inclusion, while for association rule mining we set support and confidence thresholds. Review of overlaps between approaches found a heavy positive skew when comparing the number of pairs included with the percentage of overlap, suggesting that the percentage of overlap increases as the number of pairs included increases until a certain threshold, at which point both approaches become less accurate.

Some limitations of this work include the use of a single source of data that may not be directly utilized by other EHR systems; the use of structured elements, which may be incomplete compared with narrative text; and the lack of an evaluation of the appropriateness of the identified pairs.

Summary of Conclusions

Association rule mining and crowdsourcing are effective, complementary approaches for automatically generating a problem-medication knowledge base, which can be used to improve clinical care through summary screens. Further research is necessary to combine and better evaluate the approaches to generate an all-inclusive, highly accurate problem-medication knowledge base.

Acknowledgments

This project was supported by Grant No. 10510592 for Patient-Centered Cognitive Support under the Strategic Health IT Advanced Research Projects (SHARP) from the Office of the National Coordinator for Health Information Technology and NCRR grant 3UL1RR024148.

Please contact the first author via email: allison.b.mccoy@uth.tmc.edu
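
The ranking and comparison steps described in the poster (the crowdsourcing predictor function ƒ(i) = 0.14 * p_i + 2.34 * r_i + 2.45, Spearman's rho, and the count of pairs shared by the two top-ranked lists) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary layout keyed by (medication, problem) pair, and the average-rank treatment of ties are assumptions made for the example.

```python
# Illustrative sketch only. Function names, data layout, and tie handling
# are assumptions; only the coefficients of crowd_score come from the poster.


def crowd_score(patients_linked, linked_ratio):
    """Poster's predictor function: f(i) = 0.14 * p_i + 2.34 * r_i + 2.45."""
    return 0.14 * patients_linked + 2.34 * linked_ratio + 2.45


def average_ranks(values):
    """Return 1-based ascending ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def top_k_overlap(scores_a, scores_b, k):
    """Count pairs that fall in the k highest-scoring entries of both dicts."""
    top_a = set(sorted(scores_a, key=scores_a.get, reverse=True)[:k])
    top_b = set(sorted(scores_b, key=scores_b.get, reverse=True)[:k])
    return len(top_a & top_b)
```

As a usage example, `crowd_score(1396, 0.749)` evaluates to about 199.64 for the levothyroxine sodium and hypothyroidism row of the crowdsourcing table. To mirror the comparison in the poster, one would pass the chi-squared statistics and the crowdsourcing scores of the pairs found by both approaches to `spearman_rho`, and call `top_k_overlap` with `k=500`; the poster does not list the per-pair chi-squared values, so they would have to come from the underlying data.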