Opening up pharmacological space, the OPEN PHACTs api


Published on

Description of the Open PHACTs API presented by at the technology track of the ISMB 20123 meeting in Berlin: 13-07-2013.

  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Opening up pharmacological space, the OPEN PHACTs api

  1. 1. Opening up Pharmacological Space: The Open PHACTS API @Chris_Evelo Dept. Bioinformatics - BiGCaT Maastricht University
  2. 2. Fundamental issues: There is a *lot* of science outside your walls It’s a chaotic space Scientists want to find information quickly and easily Often they just “cant get there” (or don’t even know where “there” is) And you have to manage it all (or not)
  3. 3. Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data Literature Genbank Patents PubChem Data Integration Databases Data Analysis Downloads x Repeat @ each company Firewalled Databases Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944
  4. 4. The Innovative Medicines Initiative • EC funded public-private partnership for pharmaceutical research • Focus on key problems – Efficacy, Safety, Education & Training, Knowledge Management The Open PHACTS Project • Create a semantic integration hub (“Open Pharmacological Space”)… • Deliver services to support on-going drug discovery programs in pharma and public domain • Not just another project; leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements • 23 academic partners, 8 pharmaceutical companies, 3 biotech companies • Work split into clusters: • Technical Build (focus here) • Scientific Drive • Community & Sustainability The Project
  5. 5. Major Work Streams Build: OPS service layer and resource integration Drive: Development of exemplar work packages & Applications Sustain: Community engagement and long-term sustainability ‘Consumer’ Firewall Target Dossier Pharmacological Networks Compound Dossier OPS Service Layer Assertion & Meta Data Mgmt Transform / Translate Integrator Std Public Vocabularies Business Rules Supplier Firewall Work Stream 2: Exemplar Drug Discovery Informatics tools Develop exemplar services to test OPS Service Layer Target Dossier (Data Integration) Pharmacological Network Navigator (Data Visualisation) Compound Dossier (Data Analysis) Work Stream 1: Open Pharmacological Space (OPS) Service Layer Standardised software layer to allow public DD resource integration − − Db 2 − Db 4 Corpus 1 Db 3 Corpus 5 Define standards and construct OPS service layer Develop interface (API) for data access, integration and analysis Develop secure access models Existing Drug Discovery (DD) Resource Integration
  6. 6. “Let me compare MW, logP and PSA for known oxidoreductase inhibitors” “What is the selectivity profile of known p38 inhibitors?” ChEMBL ChEBI DrugBank UniProt ConceptWiki Gene Ontology “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” WikiPathways UMLS ChemSpider GeneGo GVKBio TrialTrove TR Integrity
  7. 7. Business Question Driven Approach Number sum Nr of 1 Question 15 12 9 All oxidoreductase inhibitors active <100nM in both human and mouse 18 14 8 Given compound X, what is its predicted secondary pharmacology? What are the on and off,target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound? 24 13 8 Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives. 32 13 8 For a given interaction profile, give me compounds similar to it. 37 13 8 The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X. 38 13 8 41 13 8 44 13 8 46 13 8 59 14 8 Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not). A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature. Give me all active compounds on a given target with the relevant assay data Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease) Identify all known protein-protein interaction inhibitors Paper in DDT
  8. 8. Open PHACTS Scientific Services “Provenance Everywhere” Platform Explorer Apps API Standards
  9. 9. Contrary to popular belief: If you produce RDF from two different data sources and put them in a triple store… Magic does not happen, and it does not automatically become linked data! You need to link: • • • • • Terms to ontologies Ontologies to ontologies Identifiers to identifiers Text to known concepts Chemicals to known structures
  10. 10. Core Platform Apps Identity Resolution Service Identifier Management Service “Adenosine receptor 2a” Linked Data API (RDF/XML, TTL, JSON) P12374 EC2.43.4 CS4532 Domain Specific Services Semantic Workflow Engine Chemistry Registration Normalisation & Q/C Data Cache (Virtuoso Triple Store) Indexing VoID VoID VoID Nanopub Public Ontologies Db Db Public Content VoID Nanopub Db VoID Nanopub Db Commercial User Annotations
  11. 11. Present Content Source Initial Records Triples Properties Chembl 1,149,792 ~1,091,462 cmpds ~8845 targets 146,079,194 17 cmpds 13 targets DrugBank 19,628 ~14,000 drugs ~5000 targets 517,584 74 UniProt 536,789 156,569,764 78 ENZYME 6,187 73,838 2 ChEBI 35,584 905,189 2 GO/GOA 38,137 24,574,774 42 ChemSpider/ACD 1,194,437 161,336,857 22 ACD, 4 CS ConceptWiki 2,828,966 3,739,884 1 WikiPathways Just added
  12. 12. Quantitative Data Challenges STANDARD_TYPE UNIT_COUNT ---------------- ------AC50 7 Activity 421 STANDARD_TYPE EC50 39 -----------------IC50 46 IC50 ID50 42 IC50 Ki 23 IC50 IC50 Log IC50 4 IC50 Log Ki 7 Potency 11 IC50 IC50 log IC50 0 IC50 IC50 IC50 >5000 types Implemented using the Quantities, Dimension, Units, Types Ontology ( STANDARD_UNITS COUNT(*) ------------------ -------nM 829448 ug.mL-1 41000 38521 ug/ml 2038 ug ml-1 509 mg kg-1 295 molar ratio 178 ug 117 % 113 uM well-1 52 ~ 100 units
  13. 13. GB:29384 P12047 X31045 Let the IMS take the strain….
  14. 14. Open PHACTS GUIs Identity Resolution ee2a7b1ed67c-… Camalexin Results Query Expander Identity Mapping (BridgeDb based) CHEMBL239716 552646 ChEMBLRDF ChemSpider
  15. 15. Chemistry Registration • Existing chemistry registration system uses standard ChemSpider deposition system: includes low-level structure validation and manual curation service by RSC staff. • New Registration System in Development • Utilizes ChemSpider Validation and Standardization platform including collapsing tautomers • Utilizes FDA rule set as basis for standardization (GSK lead) • Will generate Open PHACTS identifier (OPS ID) Chemistry Registration Normalisation & Q/C
  16. 16. Its easy to integrate, difficult to integrate well:
  17. 17. What Is Gleevec? Imatinib Mesylate ChemSpider Drugbank PubChem
  18. 18. Dynamic Equality Strict Relaxed Analysing Browsing chemspider:gleevec drugbank:gleevec LinkSet#1 { chemspider:gleevec hasParent imatinib ... drugbank:gleevec exactMatch imatinib ... }
  19. 19.
  20. 20. Example applications Advanced analytics ChemBioNavigator Navigating at the interface of chemical and biological data with sorting and plotting options TargetDossier Interconnecting Open PHACTS with multiple target centric services. Exploring target similarity using diverse criteria PharmaTrek Interactive Polypharmacology space of experimental annotations UTOPIA Semantic enrichment of scientific PDFs Predictions GARFIELD Prediction of target pharmacology based on the Similar Ensemble Approach eTOX connector Automatic extraction of data for building predictive toxicology models in eTOX project
  21. 21. Target dossier (CNIO) Front-end framework to visualize biological data
  22. 22. A Precompetitive Knowledge Framework Pharma Needs Integration Inputs Management / Governance Sustainability Stability Security Mapping & Populating Data Mining Services/Alg orithms Architecture Vocabularies & Identifiers (URIs) Community KD Innovation Interfaces & Services Content Structured & Unstructured
  23. 23. Leveraging Our Community Associated partners MoU Organisations, most will join here Support, information Exchange of ideas, data, technology Opportunities to demo at community webinars Need MoU Development partnerships Influence on API developments Opportunities to demo ideas & use cases to core team Need MoU and annexe Consortium 28 current members Associated partners +Annexe Development partnerships Consortium
  24. 24. Sustaining Impact “Software is free like puppies are free they both need money for maintenance” …and more resource for future development
  25. 25. The Open PHACTS community ecosystem
  26. 26. A UK-based not-for-profit member owned company Publish Maintain and develop the Open PHACTS Platform ‘Research based’ organisation Many different types of member Develop and build the Open PHACTS community Become partner in other consortia Develop and build new valueadded analytical methods Develop and contribute to data standards Innovate Provide stable API services Precompetitive Promote and build interoperable data beyond Open PHACTS
  27. 27. Becoming part of the Open PHACTS Foundation Members A UK-based not-for-profit member owned company membership offers early access to platform updates and releases the opportunity to steer research and development directions receive technical support work with the ecosystem of developers and semantic data integrators around Open PHACTS tiered membership familiar business and governance model
  28. 28. Open PHACTS Project Partners Pfizer Limited – Coordinator Universität Wien – Managing entity Technical University of Denmark University of Hamburg, Center for Bioinformatics BioSolveIT GmBH Consorci Mar Parc de Salut de Barcelona Leiden University Medical Centre Royal Society of Chemistry Vrije Universiteit Amsterdam Spanish National Cancer Research Centre University of Manchester Maastricht University Aqnowledge University of Santiago de Compostela Rheinische Friedrich-Wilhelms-Universität Bonn AstraZeneca GlaxoSmithKline Esteve Novartis Merck Serono H. Lundbeck A/S Eli Lilly Netherlands Bioinformatics Centre Swiss Institute of Bioinformatics ConnectedDiscovery EMBL-European Bioinformatics Institute Janssen OpenLink
  29. 29. Access to a wide range of interconnected data – easily jump between pharmacology, chemistry, disease, pathways and other databases without having to perform complex mapping operations Query by data type, not by data source (“Protein Information” not “Uniprot Information) API queries that seamlessly connect data (for instance the Pharmacology query draws data from Chembl, ChemSpider, ConceptWiki and Drugbank) Strong chemistry representation – all chemicals reprocessed via Open PHACTS chemical registry to ensure consistency across databases Built using open community standards, not an ad-hoc solution. Developed in conjuction with 8 major pharma (so your app will speak their language!) Simple, flexible data-joining (join compound data ignoring salt forms, join protein data ignoring species) Provenance everywhere – every single data point tagged with source, version, author, etc Nanopublication-enabled. Access to a rich dataset of established and emerging biomedical “assertions” Professionally Hosted (Continually Monitored) Developer-friendly JSON/XML methods. Consistent API for multiple services Seamless data upgrades. We manage updates so you don’t have to Community-curation tools to enhance and correct content Access to a rich application network (many different App builders) Toolkits to support many different languages, workflow engines and user applications Private and secure, suitable for confidential analyses Active & still growing through a unique public-private partnership