Open PHACTS BioIT World Europe CAG 111013
Presentation on the IMI Open PHACTS project, at BioIT World Europe, given by Prof Carole Goble


Usage Rights

© All Rights Reserved

  • Scott Marshall gave a nice introduction
  • Finding & integrating relevant information is costly & time consuming. Information re-use becomes a problem...
  • WOMBAT ligand-protein database. The Venn diagrams (on the "Public Domain Drug Discovery" slide) illustrate the overlap in pharmacology data across different databases, commercial and public. There is some overlap, but a lot of data is unique to any one source, so you need integration. From 2006 to 2008 the data grew as expected, but the overlap remained roughly the same. Hence, to get a holistic view of pharmacology, you need to integrate. That integration costs companies: every company is doing the same thing with public data. The public domain needs this too, but industry knowledge (e.g. in judging what makes a good assay vs a bad assay) is not translated back into the public domain. Hence OPS is a really good idea.
  •   Interoperability and federation
  • Work Stream 1: Open Pharmacological Space (OPS) Service Layer
    • Standardised software layer to allow public DD resource integration
      • Define standards and construct OPS service layer
      • Develop interface (API) for data access, integration and analysis
      • Develop secure access models
    • Existing Drug Discovery (DD) Resource Integration
      • Modify existing public DD resources to operate with OPS service layer
      • IMI funds cost of modifications
      • Consider partnership with international resources
  • WS2: Development of exemplar work packages
    • Develop exemplar services to test OPS Service Layer
      • Proof of concept of the functionality developed in work stream 1
      • Services will depend on the expertise of the consortium
    • Some possible exemplar services:
      • Target Dossier (Data Integration): integrate target info from diverse sources, e.g. bioactive molecules, druggability, orthology, drug expression signatures
      • Pharmacological Network Navigator (Data Visualisation): profile “chemical space” of screening libraries; visualisation to assist lead hopping
      • Compound Dossier (Data Analysis): integrate compound information; algorithms to predict drug action
  • Services sit on top of semantic fabric
  • Exemplars
    • Chem-Bio Navigator: querying and visualization of sets of pharmacologically annotated small molecules, on the basis of chemical substructures, pharmacophores and biological activities
    • Target Dossier: in silico dossiers about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules. Heavy on the text mining.
    • Polypharmacology Browser: map coverage of the chemo-biological space, to facilitate the polypharmacological profiling of small molecules. Polypharmacology: a drug hits many targets.
    • Main architecture, technical implementation and primary capabilities are driven by a set of prioritised research questions
    • Prioritised data sources are defined from the main research questions
    • Three Exemplars will be developed to demonstrate the capabilities of the OPS System and to define interfaces and input/output standards
    • Three Use cases have been defined to benchmark the OPS system against current standard workflows in data retrieval and mining
    • The Apps must provide answers to relevant research questions: interrogation model, GUI/interactivity, presentation of results
  • Use cases: data retrieval and data/text mining. The same drivers apply: prioritised research questions, prioritised data sources, and Apps that answer relevant research questions.
  • (http://bioassayontology.org/)
  • In biochemistry and pharmacology, a ligand (from the Latin ligandum, binding) is a substance that forms a complex with a biomolecule to serve a biological purpose. In a narrower sense, it is a signal triggering molecule, binding to a site on a target protein.
  • Fraunhofer SCAI: SCAIView. SPARQL interface to LarKC backend. PDSP receptor database.
    - The grey box is a set of Web Services APIs that provide nicer interfaces for GUI developers. These did not get implemented. Instead, we are issuing SPARQL queries directly from the various GUIs.
    - The purple boxes circled in red are a set of facilities for tracking provenance, curating data that has been aggregated in the cache, and computing aggregated quality measures. These ended up *not* being in the lash up due to time. All of these allow us to give feedback about the quality of the integrated data.
    (b) The red words are correct. I would say we don't do data mapping. We translate or get RDF versions of the data. The mapping is performed at runtime based on mappings.
    The updated architecture diagram with lessons learned from the lash up (Slide 17 in the bootstrapping development slides). Maybe you can show before and after?
    RE: Scalability (from a paper on the LarKC platform). LarKC aims to be the platform to address these issues, and is built on the following principles:
    • Achieve scalability through parallelisation. Different possibilities are offered, either through tight integration of parallel processes on cluster-style hardware, or through much looser coupled wide-area distributed computing.
    • Achieve scalability through giving up completeness. Partial reasoning results are useful in many domains of application. Significant speedups can be obtained by incompleteness in many stages of the reasoning process, ranging from selection of the axioms to incomplete reasoning over those axioms.
    • Do not build a single reasoning engine that is supposed to be suited for all kinds of use-cases, but instead build a configurable platform on which different components can be plugged in to obtain different scale/efficiency trade-offs, as required by different use-cases.
    RE: technologies being used in the lash up: LarKC, LSP4All, ChemSpider, but the next step is to incorporate ConceptWiki and BridgeDb technologies. The GUIs in the lash up demo were Utopia, PathVisio, and the generic interface from Lundbeck.
    RE: how LarKC was populated: loading RDF from source, but we have plans for an automated system that uses semantic sitemap standards.
  • Search by enzyme family. SMILES: the Simplified Molecular Input Line Entry Specification (SMILES), a line notation for molecules.
  • DSD – dataset descriptor, probably in VoID.
  • A series of workshops is a key vehicle for building the OPS community and encouraging wider engagement. These OPS Workshops will be hosted twice-annually, and focus on different aspects of drug discovery, the technology used, data sharing, sustainability, licensing and practical applications.

Presentation Transcript

  • An Innovative Medicines Initiative to Build a Semantics-Based Open Pharmacology Space for Drug Discovery Twitter: @Open_Phacts Prof Carole Goble FREng FBCS www.openphacts.org BioITWorldExpoEurope – Oct 13 2011
  • Why is it so hard to…? What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known pathways? Working on now? Connections to disease? Expressed in right cell type? Competitors? IP?
    • Information Tombs...
    • Internal and external
    • Built to manage content
    • Built to meet primary use-case
    • Tailored indexes
    • Tailored GUIs
    • Unique language & metadata
    • Poor interoperability/integration
    • Proliferation of PowerPoint, documents, Excel, etc.
    • Many suppliers of systems and content in a single workflow
      • ...
    Literature, Patents, News, Pipeline, SAR, CSRs, Safety, In vivo, etc.
    • Public Domain Chemistry Resources improving
    • NIH Roadmap Initiative
      • Molecular Library Screening Center Network (MLSCN)
      • PubChem (structure, bioassay and bioactivity data).
    • DrugBank, ChEBI, ChemBank and ChEMBL, ChemSpider.
    • Databases supporting biology-based Drug Discovery less apparent
    • Rich public domain resources for biology are not Drug Discovery-centric
    • Pharma Companies spend over $50 billion p.a. on R&D
      • How much of this knowledge/information is in the public domain?
      • How much knowledge is tacit? e.g. druggability?
      • How much is truly competitive?
    Public Domain Drug Discovery (Sorel Muresan, Peter Varkonyi, Chris Southan; figure panels for 2006 and 2008)
  • The Information Supply Challenge
    • Typically 1 semantic framework per source
    • Number of vocabularies ∝ Number of suppliers/sources
    • Number of vocabularies ∝ Investment required to optimally exploit
    Sorana Popa | 16th July 2011, ISMB BioOntologies SIG
    [Figure labels: Vendor 1, Vendor 2, Internal system 3; Compound, Pathway, Cellular Process, Gene/Protein, Anatomy, Disease, Clinical Obs., Adverse Event]
  • [Ian Harrow] SESL Semantic Enrichment of Scientific Literature
    • The Innovative Medicines Initiative
      • EC funded public-private partnership for pharmaceutical research
    • Focus on key problems
      • Efficacy
      • Safety
      • Education & Training
      • Knowledge Management
    www.openphacts.org
    • Open PHACTS: an infrastructure project
    • Develop / apply a set of robust standards …
    • Implementing the standards in a semantic integration platform (“Open Pharmacological Space”)…
    • Delivering services to support on-going drug discovery programs in pharma and public domain
    • Mix the ideal with the pragmatic. Build open infrastructure that can accommodate non-open components in the real world.
    Guiding principle is open access, open usage, open source: key to standards adoption
    • 22 partners, 8 pharmaceutical companies, 3 biotechs, Royal Society of Chemistry
    • 36-month project, currently 6 months in
    • Pfizer – Coordinator
    • (Bryn Williams-Jones)
    • Universität Wien – Managing entity of IMI JU funding (Gerhard Ecker)
    • Technical University of Denmark
    • (Søren Brunak)
    • University of Hamburg, Center for Bioinformatics
    • (Matthias Rarey)
    • BioSolveIT GmbH
    • (Christian Lemmen)
    • Consorci Mar Parc de Salut de Barcelona
    • (Ferran Sanz)
    • Leiden University Medical Centre
    • (Barend Mons)
    • CTO: Lee Harland
    • Royal Society of Chemistry
    • (Richard Kidd, Antony Williams)
    • Vrije Universiteit Amsterdam
    • (Paul Groth, Frank van Harmelen)
    • Spanish National Cancer Research Centre
    • (Alfonso Valencia)
    • University of Manchester
    • (Carole Goble, Steve Pettifer)
    • Maastricht University
    • (Chris Evelo)
    • AQnowledge
    • (Jan Velterop)
    • University of Santiago de Compostela
    • (Mabel Loza)
    • Rheinische Friedrich-Wilhelms-Universität Bonn
    • (Martin Hofmann-Apitius)
    AstraZeneca (Niklas Blomberg), GlaxoSmithKline (Andrew Leach), Esteve (Mabel Loza), Novartis (Edgar Jacoby), Merck Serono (Thomas Grombacher), H. Lundbeck A/S (Sune Askjær), Eli Lilly (Hans Constandt)
    • Major Work Streams
    • Build : OPS service layer and resource integration “commons”
    • Drive : Development of exemplars & applications
    • Sustain : Community engagement and long-term sustainability
  • OPS Services
    • Integrate data on target expression, biological pathways and pharmacology to identify the most productive points for therapeutic intervention
    • Investigate the in vitro pharmacology and mode-of-action of novel targets to help develop screening assays for drug discovery
    • Compare molecular interaction profiles to assess potential off-target effects and safety pharmacology
    • Analyse chemical motifs against biological effects to deconvolute high content biology assays
    OPS Semantic Fabric – Linked Data to Linked Knowledge
    • Publishing information as linked statements that are sufficiently well described that the information can be automatically linked, brokered and processed.
    • Define, index & link across ….
    • Common data identity and entity : IRI and nomenclature/id/entity services
    • Common data structure : RDF and min info models & markup formats
    • Common data meaning: OWL/RDFS/OBO/SKOS and community ontologies
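A minimal sketch of what "linked statements" with a common identity and structure could look like, using plain Python tuples in place of a real RDF library. The IRIs, predicate names and the `match` helper are hypothetical placeholders, not part of the OPS platform.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One linked statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace and identifiers, for illustration only.
EX = "http://example.org/"

triples = {
    Triple(EX + "compound/C1", EX + "vocab/interactsWith", EX + "target/T1"),
    Triple(EX + "compound/C1", EX + "vocab/label", "compound one"),
    Triple(EX + "target/T1", EX + "vocab/foundIn", EX + "species/human"),
}


def match(store, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Everything stated about one compound, regardless of which source said it.
facts_about_c1 = match(triples, s=EX + "compound/C1")
```

Because every statement has the same three-part shape and uses shared identifiers, statements from different sources can be pooled and queried with one pattern-matching function.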
  • Linked Data http://www.linkeddata.org 2011-09-19
    • LinkedLifeData
    • 5 billion triples
    • Chem2Bio2RDF
    • 83 million triples
    • Bio2RDF
    • 30 billion triples
    • SameAs services
    • The Semantic Web
    • & Linked Data
    • Harmonising data sets
    • Id & Syntactic & Semantic
    • Concept vocabulary services
    • Identity resolution services
    • Concept mapping services
    • Mapping identifier services
    [Figure labels: Open Data, Linked Data, Linked Knowledge; Syntactic Normalisation, Semantic Normalisation; KEGG URI, ChEBI URI, BRENDA URI; OPS in RDF “triples”; Annotated OPS with ontologies]
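One way to picture an identity resolution ("SameAs") service is as a set of equivalence classes over identifiers. The sketch below uses a union-find structure, which is an assumption made for illustration; the slides do not say how the OPS mapping services are implemented, and the identifiers are invented placeholders.

```python
class IdentityResolver:
    """Groups identifiers declared equivalent into one class (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, iri):
        # Register unseen identifiers as their own class.
        self._parent.setdefault(iri, iri)
        root = iri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[iri] != root:
            self._parent[iri], iri = root, self._parent[iri]
        return root

    def same_as(self, a, b):
        """Record that two identifiers denote the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalent(self, a, b):
        """True if the two identifiers have been linked, directly or not."""
        return self._find(a) == self._find(b)


# Placeholder identifiers standing in for KEGG, ChEBI and BRENDA URIs.
resolver = IdentityResolver()
resolver.same_as("kegg:X1", "chebi:Y1")
resolver.same_as("chebi:Y1", "brenda:Z1")
```

With these two mappings recorded, `resolver.equivalent("kegg:X1", "brenda:Z1")` holds even though no direct KEGG to BRENDA mapping was ever stated.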
  • A use case driven approach: Developers (Builders) and End users (Drivers)
    • Prioritised research questions
    • Prioritised data sources
    • Exemplars
      • Target dossiers: about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules
      • Chem/bio space navigator: sets of pharmacologically annotated small molecules, by chemical substructures, pharmacophores, biological activities
      • Polypharmacology browser: map coverage of the chemo-biological space for polypharmacological profiling of small molecules
  • A use case driven approach: Developers (Builders) and End users (Drivers)
    • Prioritised research questions
    • Prioritised data sources
    • Benchmark Pilots
      • Target validation work-bench: in silico target validation studies
      • Fusion/aggregation of data from different domains to improve predictions of drug-transporter interactions
      • Combination of physicochemical data & data from transporter interaction for prediction of blood-brain barrier permeation and tissue distribution
    • Example Research questions
    • Give all compounds with IC50 < xxx for target Y in species W and Z plus assay data
    • What substructures are associated with readout X (target, pathway, disease, …)
    • Give all experimental and clinical data for compound X
    • Give all targets for compound X or a compound with a similarity > y%
    • 73 questions identified across consortium
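The first example question ("all compounds with IC50 < xxx for target Y in species W and Z") can be sketched as a filter followed by a set intersection. This is an in-memory illustration only: the `Activity` record, the function name and the sample values are hypothetical, and the real system answers such questions over a triple store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One bioassay measurement (hypothetical record layout)."""
    compound: str
    target: str
    species: str
    ic50_nm: float


def compounds_below(activities, target, species, max_ic50_nm):
    """Compounds with IC50 below the threshold for the target in EVERY listed species."""
    hits_per_species = [
        {
            a.compound
            for a in activities
            if a.target == target and a.species == sp and a.ic50_nm < max_ic50_nm
        }
        for sp in species
    ]
    return set.intersection(*hits_per_species) if hits_per_species else set()


# Placeholder data, not real measurements.
records = [
    Activity("C1", "T1", "human", 50.0),
    Activity("C1", "T1", "mouse", 80.0),
    Activity("C2", "T1", "human", 20.0),
    Activity("C2", "T1", "mouse", 500.0),  # too weak in mouse
    Activity("C3", "T2", "human", 5.0),    # different target
]

result = compounds_below(records, "T1", ["human", "mouse"], 100.0)
```

Only C1 passes, since C2 fails the threshold in mouse and C3 was measured against a different target.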
    • Prioritised Research Questions Analysis
    • Prevalent Concepts
      • Compound
      • Bioassay
      • Target
      • Pathway
      • Disease
    • Prevalent data relationships
      • Compound – target
      • Compound – bioassay
      • Bioassay – target
      • Compound – target – mode of action
      • Target – target classification
      • Target – pathway
      • Target – disease
      • Pathway - disease
    • Required cheminformatics functionality
      • Chemical substructure searching
      • Chemical similarity searching
    • Required bioinformatics functionality
      • Sequence and similarity searching
      • Bioprofile similarity searching
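For the chemical similarity searching requirement, a common measure is the Tanimoto coefficient over structural fingerprints. The slides do not name a measure, so the sketch below assumes Tanimoto and represents each fingerprint as a set of "on" bit positions; the library contents are invented placeholders.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(a & b) / len(union)


def similar_compounds(query_fp, library, threshold):
    """Names of library compounds whose similarity to the query meets the threshold."""
    return sorted(
        name for name, fp in library.items()
        if tanimoto(query_fp, fp) >= threshold
    )


# Placeholder fingerprints (sets of bit positions), not real molecules.
library = {
    "C1": {1, 2, 3, 4},
    "C2": {1, 2, 3, 9},
    "C3": {7, 8, 9},
}

hits = similar_compounds({1, 2, 3, 4}, library, threshold=0.6)
```

Here C1 matches exactly (1.0), C2 shares 3 of 5 distinct bits (0.6), and C3 shares none, which mirrors a "similarity > y%" clause in the research questions.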
    • Selection of prioritised data sources
    • Chemistry
      • ChEMBL
      • DrugBank
      • ChEBI
      • PubChem
      • ChemSpider
      • Human Metabolome DB
      • Wombat (commercial)
    • Ontologies
      • AmiGO (The Gene Ontology)
      • KEGG (Kyoto Encyclopedia of Genes and Genomes)
      • OBI (The Ontology for Biomedical Investigations)
      • Bioassay Ontology
      • EFO (Experimental Factor Ontology)
    • Biology
      • EntrezGene
      • HGNC
      • UniProt
      • InterPro
      • SCOP
      • WikiPathways
      • OMIM
      • IUPHAR
  • Science-Driven Data set selection
    • ChemSpider for cross-check of chemistry
    • Wombat as showcase for integration of commercial db
    Database | Webpage | Available for | No. of ligands | Ligands
    TP Search | http://125.206.112.67/tp-search/index.html | Human, Mouse, Rat, Rabbit, Pig | >5,000 | Inhibitor, substrate, inducers
    ChEMBL | https://www.ebi.ac.uk/chembldb/ | Human, Mouse, Rat | ~5,000 | Inhibitor, substrate, inducers
    PharmGKB | http://www.pharmgkb.org | Human | NA | NA
    DrugBank | http://www.drugbank.ca/ | Human | NA | NA
    HMTDSEngine | http://digibench.net/ | Human | NA | NA
    PubchemBioassay | http://www.ncbi.nlm.nih.gov/sites/entrez | Human | 194,393 | ABCB1, G2 Inhibitors, others
    NCI Database | http://dtp.nci.nih.gov/ | Human | ~30,000 | Substrates/Collateral sensitivity
    CancerResource | http://bioinf-data.charite.de/cancerresource/index.php?site=home | Human | NA | Ligands
    • Produce a working “lash up” system
    • Constrained to technologies in consortium + a few data sources
    • Focused on 2 prioritized research questions (Q15 and Q30)
      • Q 15: All oxidoreductase inhibitors active <100nM in both human and mouse
      • Q 30: For a given compound [clozapine], give me the interaction profile with [human or mouse] targets
    • Minimum requirements: two data sources (one targets, one compounds) and able to produce answers in “manual time”.
      • BRENDA, KEGG, PDSP, ChEMBL, ChEBI, ENZYME DB, Chem2Bio2RDF
    Agile Development: 6 month “lash up”
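Q30 asks for the interaction profile of one compound against targets in chosen species. A rough sketch of that shape of answer, grouping rows by species and ordering by activity value, is shown below. The row layout, the function name and all numbers are assumptions for illustration; the target names and values are placeholders, not real clozapine data.

```python
from collections import defaultdict


def interaction_profile(interactions, compound, species):
    """Group a compound's (target, value) pairs by species, strongest value first."""
    profile = defaultdict(list)
    for comp, target, sp, value_nm in interactions:
        if comp == compound and sp in species:
            profile[sp].append((target, value_nm))
    # Lower values first, i.e. the most potent interactions lead each list.
    return {sp: sorted(rows, key=lambda row: row[1]) for sp, rows in profile.items()}


# (compound, target, species, activity value in nM); placeholder values only.
rows = [
    ("clozapine", "T1", "human", 12.0),
    ("clozapine", "T2", "human", 3.0),
    ("clozapine", "T1", "mouse", 20.0),
    ("clozapine", "T3", "rat", 1.0),      # species not requested
    ("other", "T1", "human", 9.0),        # different compound
]

profile = interaction_profile(rows, "clozapine", {"human", "mouse"})
```

The result is a dictionary keyed by species, which is one plausible structure for a GUI to render as a per-species table.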
  • Build a lash up
    • Outcomes of exercise:
    • Team building
    • Performance / scalability analysis
    • Does it provide an adequate answer to the questions 15 and 30?
    • Demo for users (drive group) to recalibrate build tasks in order to better respond to user requirements
  • [Architecture diagram labels: data sources, RDF mapping, ID mapping, concept mapping, chemical resolution, text mining, Chem2Bio2RDF, triple store, interface]
  • GUI - User suggestions for workflow
    • Select question (“template” from category)
    • Fill in template variables
    • Via “relation browser” and add filters (IC50 value, dates etc)
    • View results, filter and export dataset
    • Select relevant data sources
    • Execute search
    • Modify query (change concepts and attributes)
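The "select a question template, then fill in template variables" step could be sketched with Python's standard `string.Template`. The template id, variable names and wording are hypothetical; only the Q15 phrasing is taken from the slides.

```python
from string import Template

# Hypothetical catalogue of question templates, keyed by an invented id.
QUESTION_TEMPLATES = {
    "potent_inhibitors": Template(
        "All $enzyme_family inhibitors active <$max_nm nM "
        "in both $species_a and $species_b"
    ),
}


def fill_template(template_id, **variables):
    """Fill a question template; raises KeyError if a variable is missing."""
    return QUESTION_TEMPLATES[template_id].substitute(**variables)


question = fill_template(
    "potent_inhibitors",
    enzyme_family="oxidoreductase",
    max_nm=100,
    species_a="human",
    species_b="mouse",
)
```

Using `substitute` rather than `safe_substitute` means an incomplete form fails loudly instead of producing a question with unfilled placeholders.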
    • “Lash up” Facts and Figures
    • Total Number of RDF Triples: 62,054,627
    Data Source | Number of RDF triples | Triple Producer
    BRENDA | 345,935 | In house
    PDSP | 719,991 | In house
    ENZYME DB | 39,540 | UniProt
    KEGG | 247,292 | Chem2Bio2RDF
    ChEMBL | 57,795,793 | Chem2Bio2RDF
    ChEBI | 2,906,076 | Chem2Bio2RDF
    ScaiView (text mined triples) acquired - 4 billion triples
    LinkedLifeData - 5 billion triples
    Chem2Bio2RDF - 83 million triples
    ConceptWiki - 300 million triples
    Bio2RDF - 30 billion triples
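As a quick arithmetic check, the per-source counts quoted on this slide can be summed to confirm the stated total. The dictionary below simply restates the slide's figures.

```python
# Triple counts per data source, as quoted on the slide.
triples_per_source = {
    "BRENDA": 345_935,
    "PDSP": 719_991,
    "ENZYME DB": 39_540,
    "KEGG": 247_292,
    "ChEMBL": 57_795_793,
    "ChEBI": 2_906_076,
}

total = sum(triples_per_source.values())

# Fraction of the lash up store contributed by the largest source.
chembl_share = triples_per_source["ChEMBL"] / total
```

The sum reproduces the stated 62,054,627 triples, with ChEMBL accounting for roughly 93% of them.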
    • Lash Up
    • Demo
    http://www.youtube.com/OpenPHACTS
  • LSP4All (Lundbeck) Generic Interface: search by enzyme family
    • Q15: All oxidoreductase inhibitors active <100 nM in both human & mouse
    • Pharmacological data
    • Exact and structure search
    • Navigate from compounds to targets
    Credit: Sune Askjær / Claus Stie Kallesøe (Lundbeck)
  • UTOPIA Documents (U Manchester)
  • PathVisio (Maastricht U): biological data; gene suggestions for selected protein
    • “Lash Up” Sanity Check
    • Q15: All oxidoreductase inhibitors active <100nM in both human and mouse
    • IC50 values and compounds fully coincident between the automatic and manual search.
    • “Lash up” identified a compound missed in the manual search (Raloxifene); a repeat manual search confirmed that its value was correct.
    • Manual search took 3 days (Mabel Loza’s team @USC)
    • Automated search took milliseconds (OWLIM).
    Representing Text Mining Results for Structured Pharmacological Queries http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/PostersDemos/iswc11pd_submission_19.pdf
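The sanity check amounts to comparing two result sets. A small sketch of that comparison follows; apart from raloxifene, which the slide names as the compound the manual search missed, the compound names are placeholders and the function is hypothetical.

```python
def compare_result_sets(automatic, manual):
    """Split results into those both searches found and those unique to each."""
    automatic, manual = set(automatic), set(manual)
    return {
        "agreed": automatic & manual,
        "only_automatic": automatic - manual,
        "only_manual": manual - automatic,
    }


# Placeholder result lists; only "raloxifene" comes from the slide.
manual_hits = {"compound_A", "compound_B"}
automatic_hits = {"compound_A", "compound_B", "raloxifene"}

report = compare_result_sets(automatic_hits, manual_hits)
```

Anything in `only_automatic` is a candidate for a repeat manual check, which is how the slide describes the raloxifene value being confirmed.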
    • Onwards and Upwards
    • Connection between developers and users
      • Solidify interfaces for exemplar developers
      • Review lash up for technology, content and exemplars
    • Architecture
    • Services: e.g. entity identification and resolution and representing similarity, ORCID, DataCite
    • Models: RDF / Nanopublication model spec and guidelines
    • Tender documents for commercial storage providers
    • Prototype
    • March 2012: Internal Prototype Delivery
    • September 2012: Release 1st Prototype
    • Early community adoption
    • Talking to major partners to take part in the project
  • Prototype Architecture
  • Nanopublications Capturing scientific information in the Triple Store
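A nanopublication is generally described as a small assertion packaged with its provenance and publication information. The sketch below models that grouping with a dataclass; the field contents, key names and helper function are assumptions for illustration, not the OPS model specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Nanopublication:
    """An assertion bundled with where it came from and who published it."""
    assertion: Tuple[str, str, str]  # (subject, predicate, object)
    provenance: Dict[str, str] = field(default_factory=dict)
    publication_info: Dict[str, str] = field(default_factory=dict)


def assertions_from(nanopubs: List[Nanopublication], source: str):
    """Assertions whose provenance names the given source."""
    return [n.assertion for n in nanopubs if n.provenance.get("source") == source]


# Placeholder content; identifiers and metadata keys are invented.
nanopubs = [
    Nanopublication(
        assertion=("compound:C1", "interactsWith", "target:T1"),
        provenance={"source": "ChEMBL"},
        publication_info={"created": "2011-10-13"},
    ),
    Nanopublication(
        assertion=("target:T1", "foundIn", "species:human"),
        provenance={"source": "BRENDA"},
    ),
]

chembl_assertions = assertions_from(nanopubs, "ChEMBL")
```

Keeping provenance next to each assertion is what allows results to be filtered or weighted by source after integration.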
  • OPS Component Stack
    • Open Flavours
    • OPS Open - open access to all
    • OPS Consortia - data sets licensed just to the consortia
    • OPS Academia - fully open to academia.
    • “My OPS”
    • Open Source
    • Open Access Infrastructure. GUI and back-end platform, online at open-phacts.org or download both + data for local setup.
    • Open Services : for example, RSC services.
    • Open Data + Private Data : licensing fun for all the family.
    • Commercial providers: abstract service interface to swap in commercial and open source platforms
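The "abstract service interface to swap in commercial and open source platforms" idea can be sketched as a small protocol that any storage backend satisfies. Both backends below are toy in-memory stand-ins, and all class and method names are hypothetical.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Triple = Tuple[str, str, str]


class TripleBackend(Protocol):
    """The only surface the rest of the platform is allowed to depend on."""

    def add(self, triple: Triple) -> None: ...

    def by_predicate(self, predicate: str) -> List[Triple]: ...


class ListBackend:
    """Stand-in for one provider: a flat list scanned on every query."""

    def __init__(self) -> None:
        self._rows: List[Triple] = []

    def add(self, triple: Triple) -> None:
        self._rows.append(triple)

    def by_predicate(self, predicate: str) -> List[Triple]:
        return [t for t in self._rows if t[1] == predicate]


class IndexedBackend:
    """Stand-in for another provider: triples indexed by predicate."""

    def __init__(self) -> None:
        self._index: Dict[str, List[Triple]] = {}

    def add(self, triple: Triple) -> None:
        self._index.setdefault(triple[1], []).append(triple)

    def by_predicate(self, predicate: str) -> List[Triple]:
        return list(self._index.get(predicate, []))


def load_and_query(backend: TripleBackend, triples: Iterable[Triple],
                   predicate: str) -> List[Triple]:
    """Client code written once against the interface, not a specific backend."""
    for triple in triples:
        backend.add(triple)
    return backend.by_predicate(predicate)


sample = [("C1", "interactsWith", "T1"), ("T1", "foundIn", "human")]
from_list = load_and_query(ListBackend(), sample, "interactsWith")
from_index = load_and_query(IndexedBackend(), sample, "interactsWith")
```

Both calls return the same answer, which is the property that would let a commercial store replace an open source one without changes to the calling code.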
    • Focus on different aspects of drug discovery, the technology used, data sharing, sustainability, licensing and practical applications.
    • 1st: Volendam (near Amsterdam), September 19-20, 2011
      • Joint with GEN2PHEN
    • Solving Bottlenecks in Data Sharing in the Life Sciences
    OPS Community Workshops
    • 2nd: location TBD, April 16-17, 2012
  • Pistoia Alliance
    • Pre-competitive Pharma industry body that works to develop and promote standards for software interoperability
    • Active in many areas of Pharma software pipeline, including “Information Ecosystem”.
    • Pistoia project “Semantic Enrichment of Scientific Literature (SESL)” directly related to Open PHACTS (see Ian Harrow’s talk yesterday) http://www.pistoia-sesl.org
    • Open PHACTS seen as important to the Pistoia mission
    • Academia-Commercial Venture
    • Focus
      • One area - pharmacology
      • “Production Level” software
      • Currency/Updates & Licensing key
      • Semantic Pragmatics: everyday use by scientists not informaticians
    • Future
      • An infrastructure that can be built upon, to provide a stable foundation for further pre-competitive informatics collaboration
      • Sustainability
    Developers (Builders) End users (Drivers)
    • Summary
    • Robust standards and techniques
      • Solid integration between data sources via semantic technologies
      • Development of high quality assertions
      • Workflows and analysis pipelines across resources
    • A semantic integration hub (“Open Pharmacological Space”)
      • Open, public domain infrastructure for drug discovery data integration
      • Open web-services for drug discovery
      • Secure access model to enable queries with proprietary data (pharma, SME, NGO and PPP)
    • Deliver services
      • To support on-going drug discovery programs in pharma and public domain
      • Align development of standards, vocabs and data integration to selected drug discovery issues
    Guiding principle is open access, open usage, open source: key to standards adoption
  • http://www.openphacts.org Twitter: @Open_Phacts