Extending the "Web of Drug Identity" with knowledge extracted from United States product labels

Report on Linked Structured Product Labels (LinkedSPLs) and a study evaluating three different approaches to mapping active ingredients coded in Structured Product Labels to DrugBank.

  • Discuss the shortcomings of Structured Product Labels published by the FDA
  • Introduce LinkedSPLs and discuss its goals
  • Discuss why we need linkage to external resources. This can be done with an example use case that relies on the existence of links, which LinkedSPLs makes possible (if not already shown in the discussion of the shortcomings of existing SPLs). Examples from the paper: RxNorm provides normalized names for drug products, as well as Unified Medical Language System (UMLS) mappings from the drug product and its active ingredients to concepts in numerous other vocabularies. DrugBank contains information on the specific biochemical targets that a drug entity may influence, major enzymatic pathways, and potential drug-drug interactions. While information on the latter two items may be present in the SPLs, it is hidden in the unstructured text. Similarly, ChEBI provides a rigorous classification of drug entities using a formal ontology maintained by members of the OBO. Both DrugBank and ChEBI provide links to other important drug taxonomies (such as the ATC system), as well as to resources that provide further information on the genes that encode drug targets, the metabolism and transport of the drug, and the diseases that the drug may help treat.

    1. Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels
       Oktie Hassanzadeh, IBM Research; Qian Zhu, Mayo Clinic; Robert Freimuth, Mayo Clinic; Richard Boyce*, University of Pittsburgh
       Department of Biomedical Informatics
    2. Take home message
       • Drug product labeling is a vital, unique, and under-utilized source of claims and evidence about drugs
         – genes, diseases, drugs, drug interactions, special populations, and adverse reactions
       • All American product labeling content is available in an accessible format
         – Structured Product Labeling (SPL)
       • LinkedSPLs is a Linked Data version of SPLs
         – simplifies access to SPL content
         – interoperable with other important drug terminologies
    3. Why is drug product labeling special?
       • It complements existing knowledge sources
         – 40% of 44 pharmacokinetic drug-drug interactions affecting 25 drugs were located exclusively in product labeling [1]
         – 24% of clinical efficacy trials for 90 drugs were discussed in the product label but not in the scientific literature [2]
         – One fifth of the evidence for metabolic pathways of 16 drugs and 19 metabolites was found in product labeling but not in the scientific literature [3]
       1. Boyce RD, Collins C, Clayton M, Kloke J, Horn JR. Inhibitory metabolic drug interactions with newer psychotropic drugs: inclusion in package inserts and influences of concurrence in drug interaction screening software. Ann Pharmacother. 2012;46(10):1287–1298.
       2. Lee K, Bacchetti P, Sim I. Publication of clinical trials supporting successful new drug applications: a literature analysis. PLoS Med. 2008;5(9):e191.
       3. Boyce R, Collins C, Horn J, Kalet I. Computing with evidence: Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. Journal of Biomedical Informatics. 2009;42(6):979–989.
    4. Why product labeling has information that is not in the scientific literature
       1. Product labels contain a summary of information reported in detail in a drug’s New Drug Application
          – often difficult or impossible for a researcher to access
       2. Until recently, there was no requirement to publish pre-market drug study results
          – this has changed since ~2010
    5. Product labeling is under-utilized by translational researchers
       • Only two out of more than 2,300 MEDLINE abstracts discuss product label NLP [1]
       • Several recent informatics projects did not explicitly include product label information [2-6]
       1. Query done on 11/26: (Natural Language Processing [MeSH Terms] OR Natural Language Processing [Text Word]) AND ((Drug Labeling [MeSH Terms] OR drug labeling [Text Word]) OR (Product Labeling, Drug [MeSH Terms]) OR ("product labeling" [Text Word]))
       2. Segura-Bedmar I, Martinez P, Sanchez-Cisneros D, eds. Proceedings of the First Challenge Task: Drug-Drug Interaction Extraction 2011. Huelva, Spain; 2011. Available at: http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-761/. Accessed December 9, 2011.
       3. SEMEVAL. Task Description - Extraction of Drug-Drug Interactions from BioMedical Texts. 2012. Available at: http://www.cs.york.ac.uk/semeval-2013/task9/. Accessed November 20, 2012.
       4. Percha B, Garten Y, Altman RB. Discovery and explanation of drug-drug interactions via text mining. Pac Symp Biocomput. 2012:410–421.
       5. Tari L, Anwar S, Liang S, Cai J, Baral C. Discovering drug-drug interactions: a text-mining and reasoning approach based on properties of drug metabolism. Bioinformatics. 2010;26(18):i547–553.
       6. Duke JD, Han X, Wang Z, et al. Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated drug interactions. PLoS Comput Biol. 2012;8(8):e1002614.
    6. Doesn’t DrugBank handle this?
       • Not really!
         – DrugBank includes product label content from the Physicians’ Desk Reference (PDR) [1]
         – However, the PDR covers only a subset of the available product label content
           • Claims and evidence unique to drug product labels not included in the PDR will be missing from DrugBank
           • This can have negative effects on informatics experiments that require complete drug information
           • E.g., possibly missed drug interactions (DrugBank 3.0) include cimetidine-sertraline, cimetidine-venlafaxine, cimetidine-citalopram, and venlafaxine-haloperidol [2-5]
       1. Physicians’ Desk Reference, 66th Edition. 2012 Edition. PDR Network; 2011.
       2. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=b1de3ed9-1cb8-e419-3f25-5b0aeed5779a. Accessed November 27, 2012.
       3. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=cf2d9bee-f8e3-477a-e4b4-f0e82657b7d2. Accessed November 27, 2012.
       4. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=4259d9b1-de34-43a4-85a8-41dd214e9177. Accessed November 27, 2012.
       5. http://dailymed.nlm.nih.gov/dailymed/lookup.cfm?setid=53c3e7ac-1852-4d70-d2b6-4fca819acf26. Accessed November 27, 2012.
    7. Second take home point
       • All American product labeling content is available in an accessible format
         – Structured Product Labeling (SPL)
    8. Structured Product Labels (SPLs)
       • What you would see if you downloaded an SPL from DailyMed
       1. http://www.fda.gov/OHRMS/DOCKETS/98fr/FDA-2005-N-0464-gdl.pdf
       2. http://www.fda.gov/ForIndustry/DataStandards/StructuredProductLabeling/default.htm
       3. http://dailymed.nlm.nih.gov/dailymed/downloadLabels.cfm
    9. More about SPLs
    10. More about SPLs
    11. Third take home point
        • LinkedSPLs is a Linked Data version of SPLs
          – simplifies access to SPL content
          – interoperable with other important drug terminologies
    12. LinkedSPLs – hypothesis
        Hypothesis: A Linked Data knowledge base of drug product labels with accurate links to other relevant sources of drug information will provide a dynamic platform for drug information NLP that offers real value to translational researchers.
    13. LinkedSPLs – A research program
    14. LinkedSPLs – A research program
        Your annotations would go here!
    15. LinkedSPLs – Method
        • Currently, we are focusing on linking active ingredients in the structured portion of SPLs
        • The unstructured text is left for future work
    16. Linkage to external sources
        • There are many sources of drug information that are complementary to each other:
          – DrugBank: contains drug targets, pathways, and interactions
          – RxNorm: provides UMLS mappings
          – ChEBI: provides a rigorous classification of drugs
    17. Example
        prodName:           Nefazodone Hydrochloride
        rxNormProduct:      rxcui:1098666
        epcClass:           SEROTONIN REUPTAKE INHIBITOR
        contraindications:  Coadministration of terfenadine, astemizole, cisapride, pimozide, or carbamazepine with nefazodone hydrochloride is contraindicated….
    18. What we tested
        • Three different approaches for linking to DrugBank:
          1. Structure string (InChI)
          2. Ontology label matching (ChEBI)
          3. Unsupervised linkage point discovery (Automated) [1]
        1. O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013.
    19. Linkage to DrugBank – Results
        • 1,246 active ingredients could be mapped to DrugBank by at least one method
          – 1,096 unmapped ingredients
        • The three approaches complement each other
        Number of ingredients linked by each approach (diagonal) and by both approaches in each pair (off-diagonal):
                          InChI identifier   ChEBI identifier   InChI + ChEBI   Automatic
        InChI identifier  424                261                424             395
        ChEBI identifier  --                 707                707             650
        InChI + ChEBI     --                 --                 831             791
        Automatic         --                 --                 --              1162
    20. Conclusions
        • The automatic approach performs very well
          – more accurate links were discovered with less effort
        • A significant number of ingredients remain unmapped:
          – some salt or racemic forms of mapped ingredients (e.g., alpha tocopherol acetate D)
          – elements (e.g., gold, iodine) and a variety of natural organic compounds, including pollens (N~200)
        • Not all ingredients are included in DrugBank
          – other resources may be required to obtain complete mappings for active ingredients
    21. Want more information?
        • LinkedSPLs
          – http://purl.org/LinkedSPLs
        • Google Code project
          – code.google.com/p/swat-4-med-safety/
        • Publications
          – Hassanzadeh O, Zhu Q, Freimuth RR, Boyce R. Extending the “Web of Drug Identity” with Knowledge Extracted from United States Product Labels. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.
          – Boyce RD, Freimuth RR, Romagnoli KM, Pummer T, Hochheiser H, Empey PE. Toward semantic modeling of pharmacogenomic knowledge for clinical and translational decision support. Proceedings of the 2013 AMIA Summit on Translational Bioinformatics. San Francisco, March 2013.
          – Boyce RD, Horn JR, Hassanzadeh O, de Waard A, Schneider J, Luciano JS, Rastegar-Mojarad M, Liakata M. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.
    22. Acknowledgements
        • NIH/NIGMS (U19 GM61388; the Pharmacogenomic Research Network)
        • Agency for Healthcare Research and Quality (K12HS019461)
    23. Backup Slides
    24. Linkage in LinkedSPLs
        Figure: an active ingredient from an SPL and the corresponding active ingredient resource in LinkedSPLs (SPL resource; dailymed:activeMoiety “OLANZAPINE”; dailymed:activeMoietyUNII “N7U69T4SZR”)
    25. Linkage to DrugBank – Approach 1
        Starting with UNII: “N7U69T4SZR”
        Idea: use the NCI Resolver and InChIKey
        1. The FDA UNII table provides the structure string: 2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE
        2. The NCI Resolver provides the InChIKey: KVWDHTXUZHCGIO-UHFFFAOYSA-N
        3. The DrugBank record with the above InChIKey provides the identifier: DB00334
        Results: 429 out of 2,264 ingredients are linked, of which 424 are valid
    26. Linkage to DrugBank – Approach 2
        Starting with name: “OLANZAPINE”
        Idea: use the ChEBI identifier and the NCBO BioPortal
        1. ChEBI preferred name from NCBO BioPortal: “OLANZAPINE”
        2. ChEBI identifier from NCBO BioPortal: 7735
        3. The DrugBank record with the above ChEBI identifier provides the identifier: DB00334
        Results: 718 out of 2,264 ingredients are linked, of which 707 are valid
    27. Linkage to DrugBank – Approach 3
        Starting with all data in the FDA UNII table and DrugBank
        Idea: automatic discovery of linkage points
        Example attribute values: Preferred Substance Name “OLANZAPINE”; Molecular Formula “2-METHYL-4….”; UNII “N7U69T4SZR”; synonym “ZYPREXA”
        1. Index all FDA UNII table and DrugBank XML attributes
        2. Search for linkage points and score their similarity:
           UNII -> Substance Name ↔ DrugBank -> brands -> brand: 0.94
           UNII -> Preferred Substance Name ↔ DrugBank -> name: 0.91
           UNII -> Substance Name ↔ DrugBank -> synonyms -> synonym: 0.83
           …
        3. Prune the list of linkage points based on cardinality, coverage, and average score
        4. Establish links between the FDA UNII table and DrugBank using the linkage points:
           UNII “OLANZAPINE” ↔ DrugBank “Zyprexa”: 1.0
           …
        Results: 1,179 out of 2,264 ingredients are linked, of which 1,169 are valid
    28. Linkage Point Discovery Framework
        • A generic framework for unsupervised discovery of linkage points
        Details can be found in: O. Hassanzadeh et al. “Discovering Linkage Points over Web Data”. To appear in PVLDB, Vol. 6, Issue 6, August 2013.
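Slides 8 and 24 start from an SPL downloaded from DailyMed and pull out its active ingredient. SPLs are HL7 version 3 XML documents, so the active moiety name and UNII can be read with a standard XML parser. The sketch below is a minimal illustration, not the LinkedSPLs conversion code: it assumes the `ingredient`/`activeMoiety` nesting and the UNII code-system OID shown in the trimmed, hypothetical fragment, and real SPLs contain far more structure around it.

```python
import xml.etree.ElementTree as ET

HL7_NS = {"v3": "urn:hl7-org:v3"}    # SPL documents use the HL7 v3 namespace
UNII_OID = "2.16.840.1.113883.4.9"   # assumed code-system OID for FDA UNII codes

# Trimmed, hypothetical fragment built from the olanzapine example on slide 24.
SPL_FRAGMENT = """<document xmlns="urn:hl7-org:v3">
  <ingredient classCode="ACTIB">
    <ingredientSubstance>
      <code code="N7U69T4SZR" codeSystem="2.16.840.1.113883.4.9"/>
      <name>OLANZAPINE</name>
      <activeMoiety>
        <activeMoiety>
          <code code="N7U69T4SZR" codeSystem="2.16.840.1.113883.4.9"/>
          <name>OLANZAPINE</name>
        </activeMoiety>
      </activeMoiety>
    </ingredientSubstance>
  </ingredient>
</document>"""


def active_moieties(spl_xml: str) -> list[tuple[str, str]]:
    """Return (name, UNII) pairs for every active moiety coded with a UNII."""
    root = ET.fromstring(spl_xml)
    pairs = []
    # The inner <activeMoiety> element carries the code and the name.
    for moiety in root.iterfind(".//v3:activeMoiety/v3:activeMoiety", HL7_NS):
        code = moiety.find("v3:code", HL7_NS)
        name = moiety.find("v3:name", HL7_NS)
        if code is not None and name is not None and code.get("codeSystem") == UNII_OID:
            pairs.append(((name.text or "").strip(), code.get("code")))
    return pairs


print(active_moieties(SPL_FRAGMENT))  # [('OLANZAPINE', 'N7U69T4SZR')]
```

The (name, UNII) pair is exactly the starting point that Approaches 1 and 2 on slides 25 and 26 need.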
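Approach 1 (slide 25) chains three lookups: UNII to structure string, structure string to InChIKey, and InChIKey to DrugBank identifier. The following is a minimal sketch under stated assumptions: in-memory dictionaries stand in for the FDA UNII table and a DrugBank InChIKey index, and the resolver is passed in as a function so the example runs offline. `cactus_inchikey` shows one way to query the NCI/CADD Chemical Identifier Resolver over HTTP, assuming its `/chemical/structure/<identifier>/stdinchikey` URL pattern; it is illustrative, not the authors' code.

```python
import urllib.parse
import urllib.request
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def cactus_inchikey(structure: str) -> Optional[str]:
    """Ask the NCI/CADD Chemical Identifier Resolver for a standard InChIKey.

    Assumes the /chemical/structure/<identifier>/stdinchikey URL pattern; needs network access.
    """
    url = ("https://cactus.nci.nih.gov/chemical/structure/"
           + urllib.parse.quote(structure, safe="") + "/stdinchikey")
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            text = response.read().decode("utf-8").strip()
    except OSError:  # HTTP errors, unresolvable names, and timeouts
        return None
    return text.removeprefix("InChIKey=") or None


def link_by_inchikey(unii_to_structure: dict[str, str],
                     drugbank_by_inchikey: dict[str, str],
                     resolve: Resolver) -> dict[str, str]:
    """Map UNII -> DrugBank ID by way of structure string -> InChIKey."""
    links = {}
    for unii, structure in unii_to_structure.items():
        inchikey = resolve(structure)
        if inchikey and inchikey in drugbank_by_inchikey:
            links[unii] = drugbank_by_inchikey[inchikey]
    return links


# Hypothetical stand-ins built from the olanzapine values on slide 25.
UNII_TABLE = {
    "N7U69T4SZR": "2-METHYL-4-(4-METHYL-1-PIPERAZINYL)-10H-THIENO(2,3-B)(1,5)BENZODIAZEPINE",
}
DRUGBANK_BY_INCHIKEY = {"KVWDHTXUZHCGIO-UHFFFAOYSA-N": "DB00334"}
offline_resolver = {UNII_TABLE["N7U69T4SZR"]: "KVWDHTXUZHCGIO-UHFFFAOYSA-N"}.get

print(link_by_inchikey(UNII_TABLE, DRUGBANK_BY_INCHIKEY, offline_resolver))  # {'N7U69T4SZR': 'DB00334'}
```

Passing `cactus_inchikey` instead of `offline_resolver` would perform the live lookups. Injecting the resolver keeps the join logic testable and makes it easy to cache resolver responses across runs.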
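Approach 2 (slide 26) matches the ingredient name against ChEBI preferred names and then follows the ChEBI identifier to DrugBank. In the study, these lookups went through the NCBO BioPortal. The sketch below replaces them with hypothetical in-memory tables and uses a case- and whitespace-insensitive exact match, which is an assumption about how names were compared.

```python
def normalize(label: str) -> str:
    """Case- and whitespace-insensitive form of a name (an assumed matching rule)."""
    return " ".join(label.casefold().split())


def link_by_chebi(ingredient_names: list[str],
                  chebi_id_by_label: dict[str, str],
                  drugbank_by_chebi: dict[str, str]) -> dict[str, str]:
    """Map ingredient name -> DrugBank ID via an exact match on ChEBI preferred names."""
    label_index = {normalize(label): chebi_id for label, chebi_id in chebi_id_by_label.items()}
    links = {}
    for name in ingredient_names:
        chebi_id = label_index.get(normalize(name))
        if chebi_id is not None and chebi_id in drugbank_by_chebi:
            links[name] = drugbank_by_chebi[chebi_id]
    return links


# Hypothetical stand-ins for the BioPortal lookup and DrugBank's ChEBI cross-references,
# using the olanzapine values from slide 26.
CHEBI_ID_BY_LABEL = {"olanzapine": "7735"}
DRUGBANK_BY_CHEBI = {"7735": "DB00334"}

print(link_by_chebi(["OLANZAPINE"], CHEBI_ID_BY_LABEL, DRUGBANK_BY_CHEBI))  # {'OLANZAPINE': 'DB00334'}
```

Exact label matching is precise but brittle. Salt forms or alternate spellings simply fail to match, which is consistent with the unmapped categories listed on slide 20.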
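Approach 3 (slide 27) searches for attribute pairs ("linkage points") whose values overlap between the two sources, keeps the strongest ones, and then links records through them. The toy version below is not the framework from the PVLDB paper. It swaps in a simple score (the fraction of a source attribute's distinct normalized values that also appear in a target attribute) and prunes only by a score threshold, ignoring the cardinality and coverage criteria. Records are hypothetical dictionaries that map attribute paths to lists of values, built from the olanzapine example.

```python
from collections import defaultdict
from itertools import product

Record = dict[str, list[str]]  # attribute path -> values (hypothetical representation)


def normalize(value: str) -> str:
    """Case- and whitespace-insensitive form used for exact matching."""
    return " ".join(value.casefold().split())


def attribute_values(records: list[Record]) -> dict[str, set[str]]:
    """Collect the normalized values each attribute path takes across all records."""
    values: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for attr, vals in record.items():
            values[attr].update(normalize(v) for v in vals)
    return values


def score_linkage_points(source: list[Record], target: list[Record],
                         min_score: float = 0.5) -> list[tuple[str, str, float]]:
    """Score every (source attribute, target attribute) pair and keep those above min_score."""
    src, tgt = attribute_values(source), attribute_values(target)
    scored = []
    for s_attr, t_attr in product(src, tgt):
        if not src[s_attr]:
            continue
        # Fraction of the source attribute's distinct values found in the target attribute.
        score = len(src[s_attr] & tgt[t_attr]) / len(src[s_attr])
        if score >= min_score:
            scored.append((s_attr, t_attr, score))
    return sorted(scored, key=lambda point: point[2], reverse=True)


def link(source: list[Record], target: list[Record],
         points: list[tuple[str, str, float]], source_key: str, target_key: str) -> set[tuple[str, str]]:
    """Link records whose values match exactly on any retained linkage point."""
    links = set()
    for s_attr, t_attr, _ in points:
        index: dict[str, set[str]] = defaultdict(set)
        for rec in target:
            for v in rec.get(t_attr, []):
                index[normalize(v)].add(rec[target_key][0])
        for rec in source:
            for v in rec.get(s_attr, []):
                for target_id in index.get(normalize(v), set()):
                    links.add((rec[source_key][0], target_id))
    return links


UNII_RECORDS = [{"UNII": ["N7U69T4SZR"], "Preferred Substance Name": ["OLANZAPINE"],
                 "Substance Name": ["OLANZAPINE", "ZYPREXA"]}]
DRUGBANK_RECORDS = [{"drugbank-id": ["DB00334"], "name": ["Olanzapine"],
                     "brands/brand": ["Zyprexa"]}]

points = score_linkage_points(UNII_RECORDS, DRUGBANK_RECORDS)
for point in points:
    print(point)
print(link(UNII_RECORDS, DRUGBANK_RECORDS, points, "UNII", "drugbank-id"))  # {('N7U69T4SZR', 'DB00334')}
```

With a single record per source, every overlapping pair scores highly. On full tables, the scores spread out, which is what makes ranking and pruning worthwhile.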
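Slides 25 to 27 each report how many of the 2,264 ingredients an approach linked and how many of those links were valid. Two derived figures make the approaches easier to compare: precision (valid links divided by links made) and coverage (valid links divided by all 2,264 ingredients). The short calculation below only restates the reported counts. The dictionary layout and function name are illustrative choices.

```python
TOTAL_INGREDIENTS = 2264

# (ingredients linked, links judged valid), as reported on slides 25-27
RESULTS = {
    "Approach 1 (InChI)": (429, 424),
    "Approach 2 (ChEBI)": (718, 707),
    "Approach 3 (automatic)": (1179, 1169),
}


def summarize(linked: int, valid: int, total: int = TOTAL_INGREDIENTS) -> tuple[float, float]:
    """Return (precision, coverage) = (valid / linked, valid / total)."""
    return valid / linked, valid / total


for approach, (linked, valid) in RESULTS.items():
    precision, coverage = summarize(linked, valid)
    print(f"{approach}: precision {precision:.1%}, coverage {coverage:.1%}")
```

This gives precision of about 98.8%, 98.5%, and 99.2%, with valid links for about 18.7%, 31.2%, and 51.6% of the ingredients. All three approaches are similarly precise, so the automatic approach's advantage lies mainly in coverage.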
