Finding & integrating relevant information is costly & time consuming Information re-use becomes a problem...
WOMBAT ligand-protein database. Venns (on the "public domain drug discovery" slide): this illustrates the overlap in pharmacology data across different databases, commercial and public. Basically, what it's saying is that there is some overlap but lots of data unique to any one source, so you need integration. From 2006 to 2008 the data grew as expected, but the overlap remained roughly the same. Hence, to get a holistic view of pharmacology, you need to integrate. That integration costs companies: every company is doing the same thing with public data. The public domain needs this too, but industry knowledge (e.g. in judging what makes a good assay vs a bad assay) is not translated back into the public domain. Hence OPS is a really good idea.
Interoperability and federation
Work Stream 1: Open Pharmacological Space (OPS) Service Layer
Standardised software layer to allow public DD resource integration
- Define standards and construct OPS service layer
- Develop interface (API) for data access, integration and analysis
- Develop secure access models
Existing Drug Discovery (DD) Resource Integration
- Modify existing public DD resources to operate with OPS service layer
- IMI funds cost of modifications
- Consider partnership with international resources
WS2: Development of exemplar work packages
Develop exemplar services to test OPS Service Layer
- Proof of concept of the functionality developed in work stream 1
- Services will depend on the expertise of the consortium
Some possible exemplar services:
- Target Dossier (Data Integration): integrate target info from diverse sources, e.g. bioactive molecules, druggability, orthology, drug expression signatures
- Pharmacological Network Navigator (Data Visualisation): profile “chemical space” of screening libraries; visualisation to assist lead hopping
- Compound Dossier (Data Analysis): integrate compound information; algorithms to predict drug action
Services sit on top of semantic fabric
Chem-Bio Navigator: querying and visualization of sets of pharmacologically annotated small molecules, on the basis of chemical substructures, pharmacophores, biological activities
Target Dossier: in silico dossiers about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules. Heavy on the text mining.
Polypharmacology Browser: map coverage of the chemo-biological space, to facilitate the polypharmacological profiling of small molecules. Polypharmacology: a drug hits many targets.
- Main architecture, technical implementation and primary capabilities driven by a set of prioritised research questions
- Based on the main research questions, define prioritised data sources
- Three Exemplars will be developed to demonstrate the capabilities of the OPS System and to define interfaces and input/output standards
- Three Use cases have been defined to benchmark the OPS system against current standard workflows in data retrieval and mining
The Apps must provide answers to relevant research questions: interrogation model, GUI/interactivity, presentation of results
Data retrieval and data/text mining
In biochemistry and pharmacology, a ligand (from the Latin ligandum, binding) is a substance that forms a complex with a biomolecule to serve a biological purpose. In a narrower sense, it is a signal-triggering molecule, binding to a site on a target protein.
Fraunhofer SCAI: SCAIView SPARQL interface to LarKC backend PDSP receptor database
- The grey box is a set of Web Services APIs that provide nicer interfaces for GUI developers. Currently, these did not get implemented. Instead, we are issuing SPARQL queries directly from the various GUIs.
- The set of purple boxes circled in red are a set of facilities for allowing us to track provenance, curate data that has been aggregated in the cache, and do aggregated quality measures. These ended up *not* being in the lashup due to time. All of these allow us to give feedback about the quality of the integrated data.
(b) The red words are correct. I would say we don't do data mapping. We translate or get RDF versions of the data. The mapping is performed at runtime based on mappings.
The updated architecture diagram with lessons learned from the lashup (Slide 17 in the bootstrapping development slides). Maybe you can show before and after?
RE: Scalability (from a paper on the LarKC platform). LarKC aims to be the platform to address these issues, and is built on the following principles:
• Achieve scalability through parallelisation. Different possibilities are offered either through tight integration of parallel processes on cluster-style hardware, or through much looser coupled wide-area distributed computing.
• Achieve scalability through giving up completeness. Partial reasoning results are useful in many domains of application. Significant speedups can be obtained by incompleteness in many stages of the reasoning process, ranging from selection of the axioms to incomplete reasoning over those axioms.
• Do not build a single reasoning engine that is supposed to be suited for all kinds of use cases, but instead build a configurable platform on which different components can be plugged in to obtain different scale/efficiency trade-offs, as required by different use cases.
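The note that mapping is performed at runtime, rather than by rewriting the source data, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the example.org URIs, the predicate names and the helper functions are hypothetical and are not taken from the lashup, LarKC or any OPS component. Each source keeps its own identifiers, and a query expands an identifier to its equivalents only when it is asked.

```python
from collections import defaultdict

# Hypothetical equivalence links between source identifiers (illustrative URIs only).
MAPPINGS = [
    ("http://example.org/sourceA/compound/1", "http://example.org/sourceB/mol/aspirin"),
    ("http://example.org/sourceB/mol/aspirin", "http://example.org/sourceC/entry/42"),
]

# Hypothetical triples (subject, predicate, object), stored under their original source URIs.
TRIPLES = [
    ("http://example.org/sourceA/compound/1", "label", "aspirin"),
    ("http://example.org/sourceB/mol/aspirin", "inhibits", "PTGS1"),
    ("http://example.org/sourceC/entry/42", "molecularWeight", "180.16"),
    ("http://example.org/sourceC/entry/99", "label", "unrelated"),
]


def build_equivalence_index(mappings):
    """Group identifiers into equivalence classes with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in mappings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Map every identifier to the full set of identifiers equivalent to it.
    return {node: members for members in groups.values() for node in members}


def describe(uri, triples, index):
    """Return (predicate, object) pairs stated about uri or any equivalent identifier."""
    equivalents = index.get(uri, {uri})
    return sorted((p, o) for s, p, o in triples if s in equivalents)


INDEX = build_equivalence_index(MAPPINGS)
facts = describe("http://example.org/sourceA/compound/1", TRIPLES, INDEX)
```

Because the stored triples are never rewritten, adding or correcting a mapping changes the answers without reloading any source data, which is the design choice the note describes.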
RE: technologies being used in lashup: LarKC, LSP4All, ChemSpider, but the next step is to incorporate ConceptWiki and BridgeDB technologies. The GUIs in the lashup demo were Utopia, PathVisio, and the generic interface from Lundbeck. RE: how LarKC was populated: loading RDF from source, but we have plans for an automated system that uses semantic sitemap standards.
Search by enzyme family
SMILES: the Simplified Molecular Input Line Entry Specification (SMILES) is a line notation for molecules.
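To make "line notation" concrete, here is a small Python sketch that pulls the atom symbols out of a few well-known SMILES strings. It is deliberately simplified and is not a SMILES parser: it assumes only the common organic-subset symbols, treats each bracketed atom as one token, and ignores bonds, branches and ring closures. The function name and regular expression are illustrative choices, not part of any toolkit.

```python
import re

# Bracket atoms first, then two-letter halogens, then single-letter organic-subset
# atoms (uppercase = aliphatic, lowercase = aromatic). Simplified on purpose.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def atom_tokens(smiles):
    """Return the atom symbols found in a SMILES string, in written order."""
    return ATOM_TOKEN.findall(smiles)


EXAMPLES = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

for name, smiles in EXAMPLES.items():
    print(name, smiles, len(atom_tokens(smiles)))
```

For real substructure or similarity searching a cheminformatics toolkit would be used instead; the sketch only shows that a whole molecule fits in one short string.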
DSD – dataset descriptor, probably in VoID.
A series of workshops is a key vehicle for building the OPS community and encouraging wider engagement. These OPS Workshops will be hosted twice a year, and focus on different aspects of drug discovery, the technology used, data sharing, sustainability, licensing and practical applications.
An Innovative Medicines Initiative to Build a Semantics-Based Open Pharmacology Space for Drug Discovery Twitter: @Open_Phacts Prof Carole Goble FREng FBCS www.openphacts.org BioITWorldExpoEurope – Oct 13 2011
Why is it so hard to…. What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? Competitors? IP?
Syntactic Normalisation Semantic Normalisation KEGG URI ChEBI URI BRENDA URI Open Data Linked Data Linked Knowledge OPS in RDF “triples” Annotated OPS with ontologies
Developers (Builders) End users (Drivers) A use case driven approach Prioritised research questions Prioritised data sources Target dossiers about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules Chem/bio space navigator of sets of pharmacologically annotated small molecules, by chemical substructures, pharmacophores, biological activities Polypharmacology browser map coverage of the chemo-biological space for polypharmacological profiling of small molecules Exemplars
Developers (Builders) End users (Drivers) A use case driven approach Target validation work-bench: in silico target validation studies Fusion/aggregation of data from different domains to improve predictions of drug-transporter interactions Combination of physicochemical data & data from transporter interaction for prediction of blood-brain barrier permeation and tissue distribution Prioritised research questions Prioritised data sources Benchmark Pilots
ChemSpider for cross-check of chemistry
WOMBAT as showcase for integration of commercial db
Science-Driven Data set selection
Database | Webpage | Available for | No. of Ligands | Ligands
TP Search | http://22.214.171.124/tp-search/index.html | Human, Mouse, Rat, Rabbit, Pig | >5,000 | Inhibitor, substrate, inducers
ChEMBL | https://www.ebi.ac.uk/chembldb/ | Human, Mouse, Rat | ~ 5,000 | Inhibitor, substrate, inducers
PharmGKB | http://www.pharmgkb.org | Human | NA | NA
DrugBank | http://www.drugbank.ca/ | Human | NA | NA
HMTDSEngine | http://digibench.net/ | Human | NA | NA
PubchemBioassay | http://www.ncbi.nlm.nih.gov/sites/entrez | Human | 194,393 | ABCB1, G2 Inhibitors, others
NCI Database | http://dtp.nci.nih.gov/ | Human | ~ 30,000 | Substrates/Collateral sensitivity
CancerResource | http://bioinf-data.charite.de/cancerresource/index.php?site=home | Human | NA | Ligands
Does it provide an adequate answer to questions 15 and 30?
Demo for users (drive group) to recalibrate build tasks in order to better respond to user requirements
rdf mapping id mapping concept mapping interface data Sources triple store chemical resolution Chem2Bio2RDF text mining
GUI - User suggestions for workflow Select question (“template” from category) Fill in template variables Via “relation browser” and add filters (IC50 value, dates etc) View results, filter and export dataset Select relevant data sources Execute search Modify query (change concepts and attributes) www.openphacts.org
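The "select a template, fill in the variables, add filters" steps suggested above could be sketched in Python as below. This is a hedged illustration only: the `ex:` prefix, the predicate names and the `fill_template` helper are hypothetical and do not reflect the vocabulary or API actually used in OPS. It shows how a fixed question template plus a few user-supplied values yields a SPARQL query string that a GUI could send to an endpoint.

```python
from string import Template

# Hypothetical question template; the ex: vocabulary is invented for this sketch.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/ops-sketch#>
SELECT ?compound ?value WHERE {
  ?activity ex:compound ?compound ;
            ex:targetFamily "$family" ;
            ex:activityType "$activity_type" ;
            ex:valueNanomolar ?value .
  FILTER(?value < $max_nm)
}""")


def fill_template(family, activity_type, max_nm):
    """Fill the template variables; reject values that would break the string literals."""
    for text in (family, activity_type):
        if '"' in text or "\\" in text:
            raise ValueError(f"unsupported character in template value: {text!r}")
    return QUERY_TEMPLATE.substitute(
        family=family,
        activity_type=activity_type,
        max_nm=float(max_nm),  # forces a numeric filter value
    )


query = fill_template("oxidoreductase", "IC50", 100)
print(query)
```

Keeping the template fixed and validating only the variables means the user modifies concepts and attributes without ever editing query syntax directly.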
LSP4All (Lundbeck) Generic Interface search by enzyme family Q15: All oxidoreductase inhibitors active at <100 nM in both human & mouse Credit: Sune Askjær / Claus Stie Kallesøe (Lundbeck) Pharmacological data Exact and structure search Navigate from compounds to targets
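The logic behind Q15 can be written out as a short Python sketch over in-memory records. The records, field names and compound identifiers are made up for illustration and are not LSP4All data or its data model. The point is the "in both human & mouse" condition: a compound qualifies only if it has a sub-threshold activity in every required species, not just in one of them.

```python
from collections import defaultdict

# Hypothetical activity records; values are invented for the sketch.
ACTIVITIES = [
    {"compound": "cmpd-A", "target_class": "oxidoreductase", "species": "human", "ic50_nm": 12.0},
    {"compound": "cmpd-A", "target_class": "oxidoreductase", "species": "mouse", "ic50_nm": 80.0},
    {"compound": "cmpd-B", "target_class": "oxidoreductase", "species": "human", "ic50_nm": 40.0},
    {"compound": "cmpd-B", "target_class": "oxidoreductase", "species": "mouse", "ic50_nm": 250.0},
    {"compound": "cmpd-C", "target_class": "oxidoreductase", "species": "human", "ic50_nm": 5.0},
    {"compound": "cmpd-D", "target_class": "kinase", "species": "human", "ic50_nm": 1.0},
    {"compound": "cmpd-D", "target_class": "kinase", "species": "mouse", "ic50_nm": 2.0},
]


def active_in_all_species(activities, target_class, max_nm, species=("human", "mouse")):
    """Compounds with an activity below max_nm on target_class in every listed species."""
    seen = defaultdict(set)
    for row in activities:
        if (
            row["target_class"] == target_class
            and row["species"] in species
            and row["ic50_nm"] < max_nm
        ):
            seen[row["compound"]].add(row["species"])
    required = set(species)
    return sorted(c for c, found in seen.items() if found == required)


hits = active_in_all_species(ACTIVITIES, "oxidoreductase", 100.0)
print(hits)
```

In this toy data cmpd-B fails because its mouse value is above the threshold, cmpd-C because it has no mouse measurement, and cmpd-D because it belongs to a different target class.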