Data mining methodologies for pharmacovigilance



Medicine is the applied science or practice of the diagnosis, treatment, and prevention of disease.
Harmful effects of medicines are called Adverse Drug Reactions (ADRs); they differ from side effects.

Published in: Technology, Health & Medicine


  1. Data Mining Methodologies for Pharmacovigilance
     Abdelfattah Al Zaqqa, School of Computer Science, Princess Sumaya University for Technology (PSUT), Amman, Jordan
  2. Agenda
     • Introduction
     • Examples
     • Some facts about ADRs and drugs
     • Pharmacovigilance
     • PhV methodologies
     • Data mining
     • Computational methodology: pre-marketing
     • Computational methodology: post-marketing
     • Future perspectives
  3. Introduction
     • Medicine is the applied science or practice of the diagnosis, treatment, and prevention of disease.
     • Most medicines have both good and bad effects.
     • Harmful effects are called Adverse Drug Reactions (ADRs); they differ from side effects.
     • Side effects may be either therapeutic or adverse.
     • ADRs cause over 700,000 emergency department visits each year in the United States.
  4. Example of ADRs and side effects
     • Desired and undesired effects of aspirin therapy:
       - reduces headache or fever
       - reduces the ability of the blood to clot
       × bleeding of the intestine
  5. Facts
     • A new drug may take 10 years and billions of dollars to develop.
     • Drug interactions may also increase the risk of ADRs.
     • ADRs may cause over 100,000 deaths among hospitalized patients each year.
     • ADRs are the fourth leading cause of death in the US.
     • ADRs cost an estimated $136 billion annually in the US.
     • ADRs may lead to drug withdrawals.
  6. Pharmacovigilance (PhV)
     • Pharmacovigilance (PhV) is the science concerned with the detection, assessment, understanding and prevention of ADRs.
     • PhV = drug safety surveillance.
     • Surveillance covers both the pre-marketing stage (data from preclinical studies and clinical trials) and the post-marketing stage (throughout a drug's market life).
     • A current trend in PhV is to link preclinical human safety data with post-marketing information.
  7. PhV methodologies
     • PhV has historically relied on biological experiments or manual review of case reports.
     • In vitro Safety Pharmacology Profiling (SPP), which tests compounds with biochemical and cellular assays, is one of the fundamental preclinical methods.
     • SPP is still not efficient in cost and time.
  8. Computational methodologies for PhV
     • The data to be analyzed are vast in quantity and complex.
     • Computational methods at both the pre-marketing and post-marketing stages are more efficient in time and cost (i.e., they can detect ADRs accurately and in a timely fashion).
     • SPP, by contrast, is still not efficient in cost and time.
     • Datasets are available: agencies such as the EMA (European Medicines Agency) and national competent authorities (NCAs) maintain and develop ADR databases.
  9. What is data mining?
     • Data mining is the process of extracting previously unknown, valid and actionable information from large information sources or databases.
     • What does this process require?
       - Project goals: detection and prevention of ADRs
       - Dataset acquisition: datasets are available
       - Data cleaning and preprocessing: organizing the raw data obtained
       - Data mining: extracting useful information
       - Data interpretation: analyzing the results
       - Utilization: putting the results to use
  10. Computational methodology: pre-marketing
     • Most existing research is devoted to developing computational methods.
     • This research can be categorized into:
       I. protein target-based approaches
       II. chemical structure-based approaches
       III. integrative approaches
  11. Computational methodology: pre-marketing, protein target-based
     • Drugs typically work by activating or inhibiting the function of a protein, which in turn results in therapeutic benefits to a patient.
     • Fliri et al. found that drugs with similar in vitro protein binding profiles tend to have similar side effects.
     • Fukuzaki et al. proposed a method to predict ADRs using "cooperative pathways" (sub-pathways that function together).
     • They developed an algorithm called CoopeRativE Pathway Enumerator (CREPE) to select combinations of sub-pathways.
     • A limitation is that it depends on the availability of gene-expression data observed under identical conditions.
  12. CoopeRativE Pathway Enumerator (CREPE)
     [Figure: V = vertex, I = itemset (activation conditions)]
  13. Computational methodology: pre-marketing, protein target-based
     • More recently, Brouwers et al. proposed that the side-effect similarity of drugs could be attributed to their target proteins being close in a molecular network.
     • They proposed a pathway neighborhood measure to assess the closest distance between drug pairs according to their target proteins in the human protein-protein interaction network, and found that network neighborhoods account for only 5.8% of the side-effect similarities, compared to 64% for shared drug targets.
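A minimal sketch of a closest-distance computation between two drugs' target sets, assuming the protein-protein interaction network is an unweighted adjacency list. The network, the protein names (`P1` to `P5`) and the function `closest_target_distance` are hypothetical illustrations, not Brouwers et al.'s published measure: a multi-source breadth-first search from one drug's targets returns the number of interaction hops to the nearest target of the other drug (0 when the drugs share a target).

```python
from collections import deque

# Toy, hypothetical PPI network as an adjacency list (undirected edges listed both ways).
PPI = {
    "P1": {"P2"},
    "P2": {"P1", "P3"},
    "P3": {"P2", "P4"},
    "P4": {"P3"},
    "P5": set(),  # isolated protein
}

def closest_target_distance(targets_a, targets_b, graph):
    """Shortest path length (in hops) between any target of drug A and any
    target of drug B. Returns 0 for a shared target, None if unreachable."""
    targets_b = set(targets_b)
    seen = set(targets_a)
    # Multi-source BFS: every target of drug A starts at distance 0.
    queue = deque((t, 0) for t in targets_a)
    while queue:
        node, dist = queue.popleft()
        if node in targets_b:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

print(closest_target_distance({"P1"}, {"P4"}, PPI))
print(closest_target_distance({"P1"}, {"P5"}, PPI))
```

In this toy network the first call returns 3 (P1, P2, P3, P4) and the second returns None because P5 has no interactions.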
  14. Computational methodology: pre-marketing, protein target-based
     • Pouliot et al. applied logistic regression (LR) models to identify potential ADRs manifesting in 19 specific system organ classes (SOCs), as defined by the Medical Dictionary for Regulatory Activities (MedDRA), across 485 compounds in 508 BioAssays in the PubChem database.
     • The models were evaluated using leave-one-out cross-validation. The mean AUCs (area under the receiver operating characteristic curve) ranged from 0.60 to 0.92 across different SOCs.
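The AUC values quoted above can be computed without drawing the ROC curve, because the AUC equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (0.5 means random ranking, 1.0 perfect ranking). The sketch below uses hypothetical scores and labels and does not reproduce Pouliot et al.'s models; under leave-one-out cross-validation, each score would come from a model trained without that compound.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as 0.5 (the Mann-Whitney U statistic / (n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical predicted ADR probabilities for six compounds (1 = ADR observed).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))
```

Eight of the nine positive/negative pairs are ordered correctly here, so the AUC is 8/9 (about 0.89).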
  15. Chemical structure-based approach: pre-marketing
     • This approach attempts to link ADRs to the chemical structure of drugs.
     • Bender et al. explored this correlation; the positive predictive value was quite low (under 0.5), but the work at least provided a proof of concept.
     • Hammann et al. employed decision trees to determine the chemical, physical, and structural properties of compounds that predispose them to causing ADRs.
     • Hammann et al. focused on ADRs in the central nervous system (CNS), liver, and kidney.
     • Their decision tree models achieved positive predictive accuracies ranging from 78.9% to 90.2%.
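As an illustration of linking ADRs to chemical structure, here is a hedged nearest-neighbour sketch that is neither Bender et al.'s nor Hammann et al.'s method (the latter used decision trees). It assumes fingerprints are given as sets of "on" bit positions (hypothetical values here; in practice a cheminformatics toolkit would generate them) and predicts a new compound's ADRs from the reported ADRs of its most similar reference compounds by Tanimoto similarity. The drug names and the helper `predict_adrs` are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def predict_adrs(query_fp, reference, k=2):
    """Predict ADRs for a new compound as the union of the ADRs reported for
    its k most structurally similar reference compounds."""
    ranked = sorted(reference.items(),
                    key=lambda item: tanimoto(query_fp, item[1]["fp"]),
                    reverse=True)
    predicted = set()
    for _, entry in ranked[:k]:
        predicted |= entry["adrs"]
    return predicted

# Hypothetical reference compounds with toy fingerprints and known ADRs.
reference = {
    "drug_A": {"fp": {1, 2, 3, 4}, "adrs": {"nausea"}},
    "drug_B": {"fp": {1, 2, 3, 5}, "adrs": {"nausea", "headache"}},
    "drug_C": {"fp": {7, 8, 9}, "adrs": {"rash"}},
}
print(sorted(predict_adrs({1, 2, 3}, reference)))
```

Taking the union of neighbours' ADRs favours recall over precision; a voting threshold (e.g. an ADR must appear in most neighbours) is a common alternative design.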
  16. Chemical structure-based approach: pre-marketing
     • Pauwels et al. developed a sparse canonical correlation analysis (SCCA) method to predict high-dimensional side-effect profiles of drug molecules based on their chemical structures.
     • SCCA examines the relationships among many variables of different types simultaneously.
     • They predicted 1385 side effects in the SIDER database from the chemical structures of 888 approved drugs.
     • Their best resulting AUC (area under the curve) was between 0.6088 and 0.8932.
  17. Integrative approach: pre-marketing
     • Huang et al. proposed a computational framework to predict ADRs by integrating systems biology data, including protein targets, the protein-protein interaction network, Gene Ontology (GO) annotations, and reported side effects. Applied to heart-related ADRs (i.e., cardiotoxicity), it achieved a highest AUC of 0.771.
     • Recently, Liu et al. investigated the use of phenotypic information, together with chemical and biological properties of drugs, to predict ADRs using five machine learning algorithms: LR, Naïve Bayes (NB), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM).
  18. Integrative approach
     • Integrating chemical, biological, and phenotypic properties outperforms the chemical structure-based method (from 0.9054 to 0.9524 with SVM) and has the potential to detect clinically important ADRs in both the preclinical and post-market phases of drug surveillance.
  19. Post-marketing
     • Many ADRs may still be missed because clinical trials are often small, short, and biased by excluding patients with comorbid diseases.
     • Trials do not mirror actual clinical use in diverse populations (e.g., inpatients); thus, it is important to continue surveillance post-market.
  20. Computational methodology: post-marketing data sources
     • Spontaneous reporting systems (SRSs) have been the core data-collection system for post-marketing drug surveillance since 1960.
     • The US FDA and the World Health Organization (WHO, through VigiBase) maintain such report databases.
  21. Post-marketing: spontaneous reports
     • Disproportionality analysis (DPA) involves frequency analyses of 2x2 contingency tables to quantify the degree to which a drug and an ADR co-occur "disproportionally" compared with what would be expected if there were no association:

                    ADR          No ADR     Total
       Drug         a            b          n = a + b
       No drug      c            d          c + d
       Total        m = a + c    b + d      t = a + b + c + d
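The counting step behind this table can be sketched as follows, assuming each spontaneous report is reduced to a set of drug and ADR terms (a simplification: real reports keep drugs and reactions in separate fields). The reports and the function name `contingency_table` are hypothetical.

```python
from typing import Iterable, Set, Tuple

def contingency_table(reports: Iterable[Set[str]], drug: str, adr: str) -> Tuple[int, int, int, int]:
    """Count the 2x2 cells (a, b, c, d) for one drug-ADR pair.

    Assumes each report is a set of terms and that drug and ADR names never collide.
    """
    a = b = c = d = 0
    for items in reports:
        has_drug, has_adr = drug in items, adr in items
        if has_drug and has_adr:
            a += 1      # drug and ADR reported together
        elif has_drug:
            b += 1      # drug without the ADR
        elif has_adr:
            c += 1      # ADR without the drug
        else:
            d += 1      # neither
    return a, b, c, d

# Hypothetical spontaneous reports.
reports = [
    {"aspirin", "gi_bleeding"},
    {"aspirin", "headache"},
    {"aspirin", "gi_bleeding", "nausea"},
    {"ibuprofen", "gi_bleeding"},
    {"ibuprofen", "rash"},
    {"paracetamol", "nausea"},
]
print(contingency_table(reports, "aspirin", "gi_bleeding"))
```

For aspirin and GI bleeding this yields a = 2, b = 1, c = 1, d = 2, and the four cells always sum to the number of reports (t).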
  22. Post-marketing: spontaneous reports
     • Many approaches have been applied; the most straightforward is the calculation of frequentist metrics.
     • Definitions of the frequentist measures of association:

       Association measure                   Definition
       Relative Reporting Ratio (RRR)        (t * a) / (m * n)
       Proportional Reporting Ratio (PRR)    (a * (t - n)) / (c * n)
       Reporting Odds Ratio (ROR)            (a * d) / (c * b)
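A direct transcription of the three definitions, using the cell names of the 2x2 contingency table (a, b, c, d with n = a + b, m = a + c, t = a + b + c + d). The counts in the usage line are invented for illustration.

```python
def disproportionality(a: int, b: int, c: int, d: int) -> dict:
    """Frequentist DPA measures from a 2x2 drug/ADR contingency table.

    Zero cells make some ratios undefined and raise ZeroDivisionError here.
    """
    n = a + b          # reports mentioning the drug
    m = a + c          # reports mentioning the ADR
    t = a + b + c + d  # all reports
    return {
        "RRR": (t * a) / (m * n),        # observed a vs. expected m * n / t
        "PRR": (a * (t - n)) / (c * n),  # a / (a + b) divided by c / (c + d)
        "ROR": (a * d) / (c * b),        # odds ratio of the table
    }

print(disproportionality(a=20, b=80, c=100, d=9800))
```

For these counts the function returns RRR ≈ 16.7, PRR = 19.8 and ROR = 24.5. Which threshold turns a ratio into a "signal" is a policy choice not covered here.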
  23. Post-marketing: spontaneous reports
     • Other, more complex algorithms have also been developed, such as the gamma-Poisson shrinker (GPS) and the multi-item gamma-Poisson shrinker (MGPS).
     • DPA methods are effective in detecting single drug-ADR associations.
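GPS and MGPS are Bayesian: they shrink ratios that rest on few reports toward a prior, so that a single report cannot produce a huge score. The sketch below shows only that shrinkage idea with a single conjugate gamma prior (alpha = beta = 1, chosen arbitrarily for illustration). It is a simplification, not the published GPS, which fits a two-component gamma mixture prior to the whole database; the function name `shrunk_ratio` is hypothetical.

```python
def shrunk_ratio(a: int, expected: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of the reporting ratio lambda under a simplified model.

    Assumption: a ~ Poisson(lambda * expected), lambda ~ Gamma(shape=alpha, rate=beta).
    The posterior is then Gamma(alpha + a, beta + expected), whose mean is returned.
    """
    return (alpha + a) / (beta + expected)

# Two hypothetical drug-ADR pairs with the same raw observed/expected ratio of 10.
for a, expected in [(1, 0.1), (100, 10.0)]:
    raw = a / expected
    print(f"a={a}: raw ratio {raw:.1f}, shrunk {shrunk_ratio(a, expected):.2f}")
```

Both pairs have a raw ratio of 10, but the pair backed by a single report is pulled strongly toward the prior mean of 1 (about 1.82), while the pair with 100 reports keeps most of its signal (about 9.18).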
  24. Data mining algorithms
     • DPA methods are effective in detecting single drug-ADR associations; data mining is used for multi-item ADR associations.
     • Using a set of 162,744 reports submitted to the FDA in 2008, Harpaz et al. identified 1167 multi-item ADR associations, 67% of which were validated by a domain expert.
     • Tatonetti et al. applied a biclustering algorithm to identify groups of drugs that share a common set of ADRs in SRS data.
     • They discovered ADRs arising between drugs that could not be found with DPA methods (e.g., pravastatin and paroxetine taken together affected blood glucose).
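Multi-item associations can be illustrated with a small association-rule sketch that counts how often a pair of drugs co-occurs with an ADR and keeps rules whose support and confidence pass thresholds. The drug names, reports, thresholds and the function `drug_pair_rules` are hypothetical, and this is not Harpaz et al.'s algorithm; a real analysis would also check that the pair's ADR rate exceeds what either drug alone would explain.

```python
from collections import Counter
from itertools import combinations

def drug_pair_rules(reports, min_support=2, min_confidence=0.5):
    """Find rules {drug_1, drug_2} -> ADR from (drugs, adrs) report pairs.

    support    = number of reports containing both drugs and the ADR
    confidence = support / number of reports containing both drugs
    """
    pair_counts = Counter()
    rule_counts = Counter()
    for drugs, adrs in reports:
        for pair in combinations(sorted(drugs), 2):
            pair_counts[pair] += 1
            for adr in adrs:
                rule_counts[(pair, adr)] += 1
    rules = []
    for (pair, adr), support in rule_counts.items():
        confidence = support / pair_counts[pair]
        if support >= min_support and confidence >= min_confidence:
            rules.append((pair, adr, support, confidence))
    # Strongest rules first: by confidence, then by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))

# Hypothetical reports: (set of drugs, set of ADRs).
reports = [
    ({"drug_x", "drug_y"}, {"hyperglycaemia"}),
    ({"drug_x", "drug_y"}, {"hyperglycaemia", "nausea"}),
    ({"drug_x", "drug_y", "drug_z"}, {"hyperglycaemia"}),
    ({"drug_x"}, {"nausea"}),
    ({"drug_y", "drug_z"}, {"rash"}),
]
print(drug_pair_rules(reports))
```

Only the rule {drug_x, drug_y} -> hyperglycaemia passes here (support 3, confidence 1.0); enumerating all pairs grows quadratically with the number of drugs per report, which is why dedicated frequent-itemset algorithms are used at scale.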
  25. Post-marketing: electronic medical records
     • An electronic medical record (EMR) is a computerized medical record created in an organization that delivers care. EMRs contain not only detailed patient information but also copious longitudinal clinical data.
     • EMR databases contain data in two types of formats: (1) structured (e.g., laboratory data) and (2) unstructured (narrative clinical notes).
     • Several groups have employed computational methods on the structured or coded data in EMRs to identify specific ADR signals.
  26. Structured & unstructured
  27. Post-marketing: electronic medical records, structured data
     • Yoon et al. demonstrated that laboratory abnormalities are a valuable source for PhV by examining the odds ratio of laboratory abnormalities between a drug-exposed group and a matched unexposed group, using 10 years of EMR data.
     • Evaluating their algorithm on 470 randomly selected drug/abnormal-lab-event pairs produced a positive predictive value of 0.837 and a negative predictive value of 0.659.
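The exposed-versus-unexposed comparison on this slide can be sketched as an odds ratio with an approximate 95% confidence interval (Woolf's log method). The counts are hypothetical, patient matching is assumed to have been done already, and flagging a signal when the lower bound exceeds 1 is an assumption for illustration, not necessarily Yoon et al.'s criterion.

```python
import math

def odds_ratio_ci(exp_abn, exp_norm, unexp_abn, unexp_norm, z=1.96):
    """Odds ratio of a lab abnormality in drug-exposed vs. unexposed patients,
    with an approximate confidence interval from the log-OR standard error."""
    or_ = (exp_abn * unexp_norm) / (exp_norm * unexp_abn)
    se = math.sqrt(1 / exp_abn + 1 / exp_norm + 1 / unexp_abn + 1 / unexp_norm)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: 200 exposed and 200 matched unexposed patients.
or_, lo, hi = odds_ratio_ci(exp_abn=30, exp_norm=170, unexp_abn=10, unexp_norm=190)
print(f"OR={or_:.2f}, 95% CI=({lo:.2f}, {hi:.2f}), signal={lo > 1}")
```

Here the odds ratio is about 3.35 and the whole interval lies above 1, so this hypothetical drug/lab-abnormality pair would be flagged; any cell equal to zero would need a correction before this formula applies.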
  28. Post-marketing: electronic medical records, unstructured data
     • Natural language processing (NLP) techniques are required to extract the needed information from unstructured data.
     • Wang et al. first employed NLP techniques to extract drug-ADR pairs.
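To make the NLP step concrete, here is a deliberately naive sketch that flags a drug-ADR pair whenever terms from two small hypothetical lexicons appear in the same sentence. It is not the method of Wang et al.; real clinical NLP also needs term normalization to a controlled vocabulary, negation detection and temporal reasoning. The lexicons, the note and the function `extract_drug_adr_pairs` are hypothetical.

```python
import re

# Hypothetical mini-lexicons (lower case).
DRUGS = {"warfarin", "amoxicillin"}
ADRS = {"rash", "bleeding", "nausea"}

def extract_drug_adr_pairs(note):
    """Return (drug, adr) pairs whose terms co-occur in the same sentence.

    Sentences are split naively on . ! ? and tokens are lower-case words.
    """
    pairs = set()
    for sentence in re.split(r"[.!?]+", note.lower()):
        tokens = set(re.findall(r"[a-z]+", sentence))
        for drug in DRUGS & tokens:
            for adr in ADRS & tokens:
                pairs.add((drug, adr))
    return pairs

note = ("Patient started amoxicillin on Monday. Developed a rash two days later "
        "after amoxicillin dose. On warfarin; no bleeding noted.")
print(sorted(extract_drug_adr_pairs(note)))
```

On the sample note it returns both ('amoxicillin', 'rash') and ('warfarin', 'bleeding'); the second is a false positive caused by the negation "no bleeding", which is exactly the kind of error negation handling is meant to remove.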
  29. Non-conventional data sources: post-marketing
     1. Biomedical literature
     • Shetty and Dalal retrieved articles (published between 1949 and 2009) to prioritize drug-ADR associations.
     • DPA was applied to identify statistically significant pairs among the thousands of pairs in the remaining articles.
     • Evaluation showed that the method identified true associations with a precision of 0.41 and a recall of 0.71.
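The precision and recall figures on this and the next slide compare a system's extracted drug-ADR pairs with a manually validated set. A minimal sketch with hypothetical pairs and a hypothetical helper `precision_recall`:

```python
def precision_recall(predicted, gold):
    """Precision and recall of predicted drug-ADR pairs against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold-standard and system-extracted pairs.
gold = {("drug_a", "rash"), ("drug_a", "nausea"), ("drug_b", "headache"), ("drug_c", "insomnia")}
predicted = {("drug_a", "rash"), ("drug_b", "headache"), ("drug_b", "rash")}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f}, recall={r:.2f}")
```

Here two of three predictions are correct (precision ≈ 0.67) and two of four true pairs are found (recall = 0.5). Precision is the same quantity as the positive predictive value used on slide 27.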
  30. Non-conventional data sources
     2. Health forums
     • Data posted by users on health-related websites may also contain valuable drug safety information.
     • Drug-ADR pairs can be mined from health-related websites (e.g., DailyStrength).
     • One such system was evaluated on a manually annotated set of 3600 user posts corresponding to 6 drugs, achieving a precision of 0.78 and a recall of 0.70.
  31. Non-conventional data sources
     • Chee et al. aggregated individuals' opinions and reviews of drugs and used NLP techniques to group drugs.
     • Based on these messages, they identified drugs that might be candidates for withdrawal from the market.
  32. Future perspectives
     • This presentation provides a general overview of current computational methodologies applied to PhV, covering basic concepts and highlighting some representative work.
     • It is desirable to incorporate various data sources into one framework to understand ADRs.
     • Data mining algorithms are applicable and useful for detecting drug interactions.
     • EMR data for ADR prediction are not readily accessible for data mining; more sophisticated studies and NLP techniques are needed.
     • Establishing cause-and-effect relationships is an intrinsically hard problem in data mining and needs further investigation for PhV applications.
  33. Useful links
     • 2/06/04/disproportionality-analysis-iscoming-in-jmp-clinical-4-0/
  34. References
     • Oxford English Dictionary, definition of "medicine".
     • World Health Organization. The Importance of Pharmacovigilance. WHO, 2002.
     • Budnitz, D.S., Pollock, D.A., Weidenbach, K.N., Mendelsohn, A.B., Schroeder, T.J. and Annest, J.L. National surveillance of emergency department visits for outpatient adverse drug events. JAMA, 296(15) (Oct 18 2006), 1858-1866.
     • Hopkins, A.L. Network pharmacology: the next paradigm in drug discovery. Nat Chem Biol, 4(11) (Nov 2008), 682-690.
     • Helma, C., Gottmann, E. and Kramer, S. Knowledge discovery and data mining in toxicology. Stat Methods Med Res, 9 (2000), 329-358.
     • Fukuzaki, M., Seki, M., et al. Side effect prediction using cooperative pathways.
  35. Thank you!