Connecting the dots: drug information and Linked Data

Presented as part of the AMIA2014 Knowledge Representation + Semantics and
Clinical Information Systems Working Groups Pre-Symposium "Drug
Terminology Standards: Meaningful Use and Better Knowledge"

November 16, 2014
Washington, DC

Published in: Health & Medicine

  1. 1. Connecting the dots: drug information and Linked Data Tomasz Adamusiak MD PhD 7omasz
  2. 2. Conflict of interest disclosure • Tomasz Adamusiak is a Senior Data Scientist at Thomson Reuters, provider of intelligent information for pharma and research institutions
  3. 3. Tomasz Adamusiak MD PhD • Former NLM Fellow and bioinformatician at EBI
  4. 4. Learning Objectives • Describe Linked Data and semantic content integration technologies • Recognize the value of integrating drug information with public resources
  5. 5. AS OF 2012, ABOUT 2.5 EXABYTES OF DATA CREATED EACH DAY
  6. 6. 2.5 exabytes ≈ 7 000 Libraries of Congress By Carol M. Highsmith (Own work) [CC-BY-SA-3.0]
  7. 7. 2.5 exabytes ≈ 7 000 Libraries of Congress
  8. 8. Tim Berners-Lee: the next Web of open, linked data If you want to put something on the web there are three rules: 1. All kinds of conceptual things, they have names now that start with HTTP. 2. If I take one of these HTTP names and I look it up [...] I fetch the data using the HTTP protocol from the web, I will get back some data in a standard format 3. It's got relationships [..] the other thing that it's related to is given one of those names that starts HTTP. So, I can go ahead and look that thing up. Sir Tim Berners-Lee on the next Web (TED2009)
  9. 9. The 5 stars of open linked data ★ Putting anything up there ★★ Machine readable format ★★★ Non-proprietary format ★★★★ Use URLs to identify things ★★★★★ Provide context by linking to others Gov 2.0 Expo 2010: Tim Berners-Lee, "Open, Linked Data for a Global Community" https://www.youtube.com/watch?v=ga1aSJXCFe0#t=328
  10. 10. RDF triple is the core concept underpinning the semantic web: subject, predicate, object <http://www.example.com/index.html> <http://purl.org/dc/elements/1.1/creator> "John Smith" (diagram: example:index.html, dc:creator, John Smith)
  11. 11. Several data sources available
  12. 12. Caveat 1: missing central URI reconciliation • Responsibility for URIs: http://bio2rdf.org/mesh:68009154 http://bio2rdf.org/pubmed:11992264 http://bio2rdf.org/go:0016458 http://purl.org/obo/owl/GO#GO_0016458 • Versioning: http://sig.uw.edu/fma#Anatomical_entity (FMA 3.1) http://sig.biostr.washington.edu/fma3.0#Anatomical_entity (FMA 3.0) http://purl.obolibrary.org/obo/GO_0016458 (Foundry-compliant URI) • Requires institutional support • RxNorm in RDF?
  13. 13. Caveat 2: data locality http://gigaom.com/broadband/the-storage-vs-bandwidth-debate/
  14. 14. CONNECTING THE DOTS Given therapeutic action - PPAR gamma partial/ agonist – what were the related compounds studied, the indications for treatment, technologies of drug delivery, related genes and affected pathways?
  15. 15. EBI RDF Platform • All model elements with annotations to acetylcholine-gated channel complex (GO:0005892) • Samples treated with alcohol • Find drug-like (but currently not approved) molecules which bind 7TM1 GPCRs with high affinity • Under what experimental conditions is Ensembl gene ENSG00000129991 (TNNI3) expressed? • Pathways that reference Insulin (P01308) • What are the preferred gene name and disease annotations of all human UniProt entries that are known to be involved in a disease? ★★★★★
  16. 16. ★★★★★ Open PHACTS Discovery Platform Freely available, pharmacological data from a variety of resources + tools and services to support pharmacological research
  17. 17. ★★★★★ Bio2RDF: Linked Data for the Life Sciences • ~11 billion triples across 35 datasets • Datasets include: clinicaltrials.gov, dbSNP, GenAge, GenDR, LSR, OrphaNet, PubMed, SIDER, WormBase • Locally hosted endpoints: chembl, linkedSPL, pathwaycommons, reactome, wikipathways
  18. 18. NCBO BioPortal RDF • Provide RDF for each class in BioPortal so that we can have a URL to a concept that resolves to a set of RDF triples that provide essential information about the term • Provide an RDF dump of each ontology in BioPortal to put them in a triplestore to enable SPARQL access to the ontologies ★★★★★
  19. 19. ★★★★★ Linked Structured Product Labels http://purl.org/net/linkedSPLs • LinkedSPLs publishes all sections of FDA-approved prescription and over-the-counter drug package inserts from DailyMed for use by NLP and Semantic Web researchers • All active moieties and product labels are mapped to RxNorm PURLs provided by the NCBO BioPortal SPARQL endpoint • LinkedSPLs is provided as a service as part of the Drug Interaction Knowledge Base (DIKB) project Boyce RD et al. Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. J Biomed Semantics. 2013 Jan 26;4(1):5. PMID: 23351881.
  20. 20. Making public FDA datasets more accessible ★★★★ • Adverse events. FDA's publicly available drug adverse event and medication error reports, and medical device adverse event reports. • Recalls. Enforcement report data, containing information gathered from public notices about certain recalls of FDA-regulated products. • Labeling. Structured Product Labeling (SPL) data for FDA-regulated human prescription drug, OTC drug and biological product labeling.
  21. 21. RDF Representation of CDISC Foundational Standards • PhUSE and CDISC Draft RDF Representation • RDF could provide a foundation for interoperable end-to-end data standards in clinical research • http://github.com/phuse-org/rdf.cdisc.org
  22. 22. Thank You
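
The subject, predicate, object triple from slide 10 can be written out as a single N-Triples line. The following is a minimal sketch using only the Python standard library; the helper name `to_ntriple` is hypothetical and not from the slides, and the escaping covers only backslashes and double quotes rather than the full N-Triples grammar.

```python
def to_ntriple(subject: str, predicate: str, obj: str, literal: bool = True) -> str:
    """Serialize one RDF triple as an N-Triples line (simplified sketch)."""

    def iri(value: str) -> str:
        # IRIs are wrapped in angle brackets in N-Triples.
        return f"<{value}>"

    if literal:
        # Plain string literal: escape backslashes and double quotes only.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        # The object is itself a resource, so it is written as an IRI.
        obj_term = iri(obj)

    return f"{iri(subject)} {iri(predicate)} {obj_term} ."


# The example from slide 10: index.html has the Dublin Core creator "John Smith".
triple = to_ntriple(
    "http://www.example.com/index.html",
    "http://purl.org/dc/elements/1.1/creator",
    "John Smith",
)
print(triple)
```

Passing `literal=False` writes the object as an IRI instead, which is how a triple links one resource to another.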

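
The URI reconciliation problem from slide 12 can be illustrated with the three Gene Ontology URIs listed there, which all name the same term. The sketch below maps each known pattern to one canonical form. The function name `canonical_go_uri` is hypothetical, and choosing the OBO Foundry PURL as the canonical target is an assumption made for illustration; the slides do not prescribe a target.

```python
import re
from typing import Optional

# URI patterns for GO terms as they appear on slide 12.
# Each pattern captures the seven-digit GO accession.
_GO_PATTERNS = [
    re.compile(r"^http://bio2rdf\.org/go:(\d{7})$"),
    re.compile(r"^http://purl\.org/obo/owl/GO#GO_(\d{7})$"),
    re.compile(r"^http://purl\.obolibrary\.org/obo/GO_(\d{7})$"),
]


def canonical_go_uri(uri: str) -> Optional[str]:
    """Return the Foundry-style PURL for a recognized GO URI, else None."""
    for pattern in _GO_PATTERNS:
        match = pattern.match(uri)
        if match:
            # Assumption: the OBO Foundry PURL is used as the canonical form.
            return f"http://purl.obolibrary.org/obo/GO_{match.group(1)}"
    return None


variants = [
    "http://bio2rdf.org/go:0016458",
    "http://purl.org/obo/owl/GO#GO_0016458",
    "http://purl.obolibrary.org/obo/GO_0016458",
]
canonical = {canonical_go_uri(u) for u in variants}
print(canonical)
```

A hand-maintained pattern list like this is exactly what a central reconciliation service would make unnecessary, which is the point of the caveat.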