
Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data


Slides from my talk presented at the 26th World Wide Web Conference (WWW 2017), held at Perth from April 3-8, 2017.

Published in: Science


  1. Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data. 26th World Wide Web Conference (WWW), Perth, 4th – 8th April 2017. Maulik R. Kamdar and Mark A. Musen, Stanford Center for Biomedical Informatics Research. maulikrk@stanford.edu
  2. Linked Open Data (LOD) Cloud (Cyganiak, Richard et al. 2014)
  3. Life Sciences Linked Open Data (LSLOD) Cloud …
  4. [Figure-only slide]
  5. Semantic Web: Publishing Data as a Graph. Example statement: "Gleevec (Mol. Wt.: 589.25 g/mol, Half-Life: 18 hours) inhibits PDGFR, involved in signal transduction." Resource Description Framework (RDF) figure labels: Gleevec (DrugBank: DB00619), Gleevec (KEGG: D01441), Inhibits, PDGFR, GO:0007165 (Signal Transduction); values 589.25 and "18 hours"; edge labels mol_weight, half-life, x-ref, target, name, type, process. Uniform Resource Identifiers: http://bio2rdf.org/drugbank:DB00619 and http://bio2rdf.org/kegg:D01441
  6. Semantic Web: Querying the Graph. Question: What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction? SPARQL Query Language figure labels: unknowns ?half-life, ?target and two unlabeled "?" nodes; constraint mol_weight < 1000; GO:0007165 (Signal Transduction); edge labels mol_weight, x-ref, Inhibits, name, type, process.
  7. What this talk is about … Life Sciences Linked Open Data Cloud – query federation • Challenges associated with retrieving information from LSLOD sources • Pattern-based method to rewrite queries across LSLOD sources • An application in mechanism-based pharmacovigilance - PhLeGrA
  8. [Figure-only slide]
  9. Query Federation: Rewriting and executing queries across different sources (Schwarte, et al. ISWC 2012). Question: What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction? Full query pattern: Drug → molecular-weight < 1000 → target → process = “GO:0007165” → half-life. Query federation splits it into two sub-patterns: (1) Drug → molecular-weight < 1000 → target → half-life; (2) Drug → molecular-weight < 1000 → target → process = “GO:0007165”.
  10. Heterogeneity in the LSLOD Cloud. Label Mismatch: different labels for classes, relations and attributes. Example: Gleevec molecular-weight 493.61 versus Gleevec mol_weight 589.25. Figure labels: (clinical features), (biological features).
  11. Heterogeneity in the LSLOD Cloud. Label Mismatch: different labels for classes, relations and attributes. Example: Gleevec molecular-weight 493.61 versus Gleevec mol_weight 589.25. Figure labels: (clinical features), (biological features).
  12. Heterogeneity in the LSLOD Cloud. Model Mismatch: different graph patterns to capture granularity. Example: Gleevec drug-target PDGFR versus Gleevec Inhibits PDGFR, with labels target, name, type and source PubMed: 21152856. Label Mismatch: different labels for classes, relations and attributes. Example: Gleevec molecular-weight 493.61 versus Gleevec mol_weight 589.25. Figure labels: (clinical features), (biological features).
  13. Heterogeneity in the LSLOD Cloud • Inconsistent Meanings • Inconsistent URI labels for classes, relations and attributes • Inconsistent Attribute values for entities • Inconsistent Graph patterns for SPARQL queries • Incomplete Relations between entities
  14. Query Rewriting fails over the LSLOD Cloud. Question: What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction? Original query: ?s a <Drug> . ?s <molecular-weight> ?mw . ?s <target> ?protein . ?s <half-life> ?hl . ?mw < 1000 g/mol . ?protein <hasGO> <GO:0007165>. Query rewriting yields two sub-queries: (1) ?s a <Drug> . {?s <molecular-weight> ?mw} . {?s <half-life> ?hl} . ?mw < 1000 g/mol; (2) ?s a <Drug> . {?s <target> ?protein} . ?protein <hasGO> <GO:0007165>
  15. Using Graph Patterns for Query Rewriting. Mapping Rules: the generic pattern ?Drug hasTarget ?Protein maps to (a) ?Drug DrugBank:drug-target ?Protein and (b) ?Drug KEGG:target ?blank . ?blank KEGG:link ?Protein
  16. Using Graph Patterns for Query Rewriting. Mapping Rules: the generic pattern ?Drug hasTarget ?Protein maps to (a) ?Drug DrugBank:drug-target ?Protein and (b) ?Drug KEGG:target ?blank . ?blank KEGG:link ?Protein. Question: What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction? Generic query: ?s a <Drug> . ?s <hasMolWt> ?mw . ?s <hasTarget> ?protein . ?s <hasHalfLife> ?hl . ?mw < 1000 g/mol . ?protein <hasGO> <GO:0007165>. Query rewriting yields two source-specific queries: (1) ?s a <Drug> . {?s <molecular-weight> ?mw} . ?s <drug-target> ?protein . {?s <half-life> ?hl} . ?mw < 1000 g/mol; (2) ?s a <Drug> . ?s <mol_wt> ?mw . {?s <target> ?protein_blank . ?protein_blank <link> ?protein} . ?protein <hasGO> <GO:0007165>
  17. What this talk is about … Life Sciences Linked Open Data Cloud – query federation • Challenges associated with retrieving information from LSLOD sources • Pattern-based method to rewrite queries across LSLOD sources • An application in mechanism-based pharmacovigilance - PhLeGrA
  18. PhLeGrA – Linked Graph Analytics in Pharmacology. Phlegra is a spider genus of the Salticidae family, commonly termed jumping spiders.
  19. A k-partite network is generated as output
  20. Entities and Relations from 4 different sources are retrieved to create the k-partite Network. This k-partite network is generated in < 1 day
  21. Query Federation overcomes heterogeneous Distribution of Entities and Relations. Figure panels: E1: Drug; R1: Drug hasTarget Protein • Similar and completely unique entities and relations exist between data sources • Necessary to get the complete picture, but also determine sources of noise
  22. Several underlying mechanisms are possible … http://onto-apps.stanford.edu/phlegra
  23. A graph analytics module to rank the mechanisms
  24. Preliminary results using network-based Apriori Algorithm for ranking mechanisms
  25. The story so far … Pattern-based federation methods can retrieve data from multiple sources in the Life Sciences Linked Open Data Cloud, and can enable development of advanced methods for mechanism-based pharmacovigilance. …
  26. Acknowledgments: Musen Lab, Stanford Biomedical Informatics Training Program; Michel Dumontier; US NIH Grant HG004028
  27. PhLeGrA – Linked Graph Analytics in Pharmacology. www.stanford.edu/~maulikrk/research.html www.onto-apps.stanford.edu/phlegra
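
The graph on slide 5 and the question on slide 6 can be illustrated with a toy in-memory triple store. This is only a sketch: the flat predicate names, the single merged "Gleevec" node, and the second drug are assumptions made for illustration, and the matcher is a simplified stand-in for a SPARQL engine, not the system used in the talk.

```python
# Toy triples loosely following the labels on slide 5.
# "HypotheticalDrug" and its values are invented to show the weight filter.
TRIPLES = [
    ("Gleevec", "mol_weight", 589.25),
    ("Gleevec", "half-life", "18 hours"),
    ("Gleevec", "target", "PDGFR"),
    ("PDGFR", "process", "GO:0007165"),
    ("HypotheticalDrug", "mol_weight", 1200.0),
    ("HypotheticalDrug", "half-life", "2 hours"),
    ("HypotheticalDrug", "target", "PDGFR"),
]


def is_var(term):
    """A term is a variable when it is a string starting with '?'."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new


def query(patterns, triples):
    """Evaluate a list of triple patterns as a conjunctive (join) query."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            nb
            for b in bindings
            for t in triples
            if (nb := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


def half_lives(triples, max_weight=1000):
    """Half-lives of drugs under `max_weight` that target signal-transduction proteins."""
    rows = query(
        [
            ("?drug", "mol_weight", "?mw"),
            ("?drug", "target", "?protein"),
            ("?protein", "process", "GO:0007165"),
            ("?drug", "half-life", "?hl"),
        ],
        triples,
    )
    return sorted({(r["?drug"], r["?hl"]) for r in rows if r["?mw"] < max_weight})
```

Calling `half_lives(TRIPLES)` keeps Gleevec and drops the invented heavier drug, mirroring the weight constraint in the slide's question.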
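
The label mismatch and inconsistent attribute values described on slides 10 to 13 can be made concrete with a small check. The sketch below assumes a hand-written label map onto the generic `hasMolWt` name from slide 16; the source names "source A" and "source B" are placeholders, since the slides do not say which source reports which value.

```python
from collections import defaultdict

# Assumed mapping from source-specific labels to one generic label.
LABEL_MAP = {
    "molecular-weight": "hasMolWt",
    "mol_weight": "hasMolWt",
}

# (source, entity, label, value); source names are placeholders.
RECORDS = [
    ("source A", "Gleevec", "molecular-weight", 493.61),
    ("source B", "Gleevec", "mol_weight", 589.25),
]


def conflicting_values(records, label_map=LABEL_MAP):
    """Group values by (entity, generic label) and keep groups that disagree."""
    seen = defaultdict(set)
    for _source, entity, label, value in records:
        seen[(entity, label_map.get(label, label))].add(value)
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}
```

Once both labels map to the same generic attribute, the two molecular weights for Gleevec surface as a conflict that a federated query would have to tolerate or resolve.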
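
The mapping rules on slides 15 and 16 expand one generic triple pattern into a source-specific graph pattern. A minimal sketch of that expansion follows; the rule table layout and the `rewrite` helper are assumptions for illustration, and only the `hasTarget` rule shown on the slides is encoded.

```python
# Generic predicate -> source -> source-specific pattern (from slide 15).
# "?Drug" and "?Protein" stand for the subject and object of the generic pattern.
MAPPING_RULES = {
    "hasTarget": {
        "DrugBank": [("?Drug", "drug-target", "?Protein")],
        "KEGG": [("?Drug", "target", "?blank"), ("?blank", "link", "?Protein")],
    },
}


def rewrite(pattern, source, rules=MAPPING_RULES):
    """Rewrite one generic (s, p, o) pattern into the patterns used by `source`.

    Patterns without a rule for this source are returned unchanged.
    """
    s, p, o = pattern
    if p not in rules or source not in rules[p]:
        return [pattern]

    def substitute(term):
        if term == "?Drug":
            return s
        if term == "?Protein":
            return o
        if term.startswith("?"):
            # Intermediate variable: derive a fresh name from the object term.
            return "?" + o.lstrip("?") + "_" + term[1:]
        return term

    return [tuple(substitute(term) for term in triple) for triple in rules[p][source]]


def rewrite_query(patterns, source, rules=MAPPING_RULES):
    """Rewrite every pattern of a generic query for one source."""
    return [out for pattern in patterns for out in rewrite(pattern, source, rules)]
```

For the KEGG rule the intermediate variable becomes `?protein_blank`, matching the name used in the rewritten query on slide 16.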
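
Slides 19 and 20 describe assembling retrieved entities and relations into a k-partite network. The sketch below shows one plausible way to hold such a network: nodes are grouped by entity type and an edge may only connect nodes of different types. The type names and example relations are assumptions drawn from the Gleevec example earlier in the deck, not the actual PhLeGrA schema.

```python
from collections import defaultdict

# ((type, id), relation, (type, id)); example relations are illustrative only.
RELATIONS = [
    (("Drug", "Gleevec"), "hasTarget", ("Protein", "PDGFR")),
    (("Protein", "PDGFR"), "hasGO", ("Process", "GO:0007165")),
]


def build_k_partite(relations):
    """Return (partitions, edges), rejecting edges inside a single partition."""
    partitions = defaultdict(set)
    edges = set()
    for src, rel, dst in relations:
        if src[0] == dst[0]:
            raise ValueError(f"edge within partition {src[0]!r}: {src} -> {dst}")
        partitions[src[0]].add(src[1])
        partitions[dst[0]].add(dst[1])
        edges.add((src, rel, dst))
    return dict(partitions), edges
```

Here k is simply the number of entity types that appear, so the two example relations yield a 3-partite network.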
