Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Bibliological data science and drug discovery


Published on

Presented at the 2016 ACS Fall Meeting in Philadelpha, session "Effectively Harnessing the World's Literature to Inform Rational Compound Design", on 8/21/16.

Published in: Science
  • Be the first to comment

Bibliological data science and drug discovery

  1. 1. Bibliological data science and drug discovery Knowing the knowns* Effectively Harnessing the World’s Literature To Inform Rational Compound Design - ACS National Meeting, Philadelphia, Aug 21-24, 2016 Jeremy J Yang Translational Informatics Division School of Medicine University of New Mexico Integrative Data Science Lab School of Informatics & Computing Indiana University *phrase borrowed from Edgar Jacoby, Janssen.
  2. 2. In science, luck favors the prepared. - Louis Pasteur The main thing was not to . . . "foul up." - The Right Stuff, by Tom Wolfe, about John Glenn.
  3. 3. Overview of talk ● Formulation of problem ● Resources and examples: TIN-X, Target Importance and Novelty Explorer (&IDG) Chem2Bio2RDF OPDDR, Open Phenotypic Drug Discovery Resource DrugCentral
  4. 4. Formulation of problem ● "World's Literature" redefined by online revolution ● Rational Compound Design = improving our odds ● For given research question, what are the known knowns? ● Connect the dots and weigh the evidence from global knowledge graph.
  5. 5. TIN-X
  6. 6. TIN-X Target Importance & Novelty Explorer ● Bibliometric application developed for Illuminating the Druggable Genome (IDG) project ● Text mining from Novo Nordisk Center for Protein Research (U. Copenhagen) lab of Lars Juhl Jensen. ● Algorithm and client developed at UNM (Cristian Bologa, Daniel Cannon) ● Disease Ontology (DO) classification ● Drug Target Ontology (DTO) protein classification
  7. 7. Illuminating the Druggable Genome (IDG) 7 Knowledge Mgmt Center PI: Tudor Oprea, MD, PhD
  8. 8. TIN-X
  9. 9. TIN-X
  10. 10. TIN-X
  11. 11. Target Novelty: Fk = 1 / Tk ● Tk = # targets in paper (k) ● Fk = fractional score of paper (k) ● for papers where Tk > 0 Ni = 1 / ∑(Fk ) ● Ni = novelty, target (i) ● sum over papers where target (i) mentioned Target-Disease Importance: Fk = 1 / (Tk * Dk ) ● Tk = # targets in paper (k) ● Dk = # diseases in paper (k) ● Fk = fractional score of paper (k) Iij = ∑(Fk ) ● Iij = importance, target (i) for disease (j) ● sum over papers where both mentioned Target Importance and Novelty Explorer (TIN-X), Daniel Cannon, Jeremy Yang, Stephen Mathias, Oleg Ursu, Subramani Mani, Anna Waller, Stephan Schürer, Lars Juhl Jensen, Larry Sklar, Cristian Bologa, and Tudor Oprea (manuscript in preparation). TIN-X
  12. 12. TIN-X Target Importance & Novelty Explorer ● Text mining is a valuable tool for monitoring literature, filtering and ranking, and detecting trends. ● Automation can infer patterns regarding community trends and consensus. ● Interactive visualization tools help navigate big data. ● Good big data text miners care about small data too!
  13. 13. TIN-X Key contributors Cristian Bologa Daniel Cannon Lars Juhl Jensen
  14. 14. Chem2Bio2RDF
  15. 15. ● 24 sources, 52 datasets, 78M triples ● Semantically linked ● Chen, B, et al, BMC Bioinformatics (2010). ● Chen, B et al, PLoS Comp Bio (2012). ● Fu, G et al, BMC Bioinfo (2016). ● Related projects: Bio2RDF, LOD
  16. 16. Classes: biological chemical chemogenomics literature phenotype systems disease pathway polypharmacology PPI side effect BindingDB BindingMOAD IU ChEBI ChEMBL CTD DCDB DIP DrugBank HGNC HPRD KEGG MATADOR OMIM PDBe PDSP PharmGKB PubChem PubMed Reactome SIDER TTD UniProt Sources:
  17. 17. Linked Open Data (LOD)
  18. 18. Chem2Bio2RDF apps: (1) SLAP, (2) Metapaths 2012 2016
  19. 19. ● Data semantics essential for integration of heterogeneous sources ● Strong evidence requires strong semantics ● Semantic Web Technologies common framework enabling -- but not assuring -- community progress ● Chem2Bio2RDF v2.0 to leverage major community advances (esp. Open PHACTS) ● Data ecosystems, coop-tition & prisoner's dilemma
  20. 20. Key contributors Bin Chen Ying Ding David Wild
  21. 21. OPDDR
  22. 22. OPDDR Open Phenotypic Drug Discovery Resource
  23. 23.
  24. 24. OPDDR collaboration
  25. 25. Example: OIDD HeLa cell based assay Integrated RDF bioassay:AID1117350 skos:exactMatch oidd_assay:17 . bioassay:AID1117350 dcterms:source source:ID846 ; dcterms:title "Increased chromatin condensation in HeLa cells-IC50"@en . bioassay:AID1117350 rdf:type bao:BAO_0002786 . bioassay:AID1117350 rdf:type bao:BAO_0000010 . bioassay:AID1117350 rdf:type bao:BAO_0000219 . endpoint:SID170464897_AID1117349 vocabulary:PubChemAssayOutcome vocabulary:active ; sio:has-value "0.0656"^^xsd:float ; a bao:BAO_0000190 ; rdfs:label "IC50"@en . substance:SID170464897 skos:exactMatch chembl_molecule:CHEMBL1483 . chembl_assay:OIDD00017 cco:hasCellLine chembl_cell_line:CHEMBL3308376 .
  26. 26. D2D builds apps, tools and solutions for knowledge discovery powered by fast, scalable network analytics and rigorous semantics. Predictive Phenotypic Profiler (P3) prototype
  27. 27.
  28. 28. OPDDR ● OPDDR phenotypic assays have been linked and integrated via community semantics to both phenotypic (cell lines) and molecular (genomic/protein targets) ● New phenotypic knowledge domain offers additional value in drug discovery and pharmacological informatics ● Open PHACTS excellent, well suited platform
  29. 29. DrugCentral
  30. 30. DrugCentral ● DrugCentral is a free, open, curated resource about approved drugs, designed for research ● Compounds, products, labels, targets, IDs, names ● DrugCentral developed over several years at UNM ● DrugCentral recently released with new interface ● License: CC-BY-SA
  31. 31.
  32. 32.
  33. 33. DrugCentral ● Free, open, accurate, comprehensive drug reference for biomolecular and biomedical informatics research Compounds 4444 Products 84787 Synonyms 20522 Structures 4231 Targets 3651 Bioactivities 15620 MoA 3484 SNOMED 45349
  34. 34. "DrugCentral: online drug compendium", Oleg Ursu, Jayme Holmes, Jeffrey Knockel, Cristian Bologa, Jeremy Yang, Stephen Mathias, Stuart Nelson, Tudor Oprea (manuscript submitted).
  35. 35. In Conclusion ● New resources continue to emerge and evolve, providing opportunities for knowledge driven drug discovery ● Community standards → more intelligent web ● Adapt to new data environment for success ● Private + public data must be integrated to ○ Be prepared (like Pasteur) ○ Not "foul up" (like Glenn)