RSC CICAG 28 Nov 2011 Open PHACTS Kidd

Given at "The Future Information Needs of Pharmaceutical & Medicinal Chemistry", Monday 28 November 2011, at The Linnean Society, Burlington House, London; run by the RSC CICAG group.

Usage Rights

CC Attribution License

  • Services sit on top of the semantic fabric
  • Chem-Bio Navigator: querying and visualization of sets of pharmacologically annotated small molecules on the basis of chemical substructures, pharmacophores and biological activities.
    Target Dossier: in silico dossiers about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules. Heavy on the text mining.
    Polypharmacology Browser: maps coverage of the chemo-biological space to facilitate the polypharmacological profiling of small molecules. (Polypharmacology: a drug hits many targets.)
    The main architecture, technical implementation and primary capabilities are driven by a set of prioritised research questions. Based on the main research questions, define prioritised data sources.
    Three Exemplars will be developed to demonstrate the capabilities of the OPS System and to define interfaces and input/output standards.
    Three Use cases have been defined to benchmark the OPS system against current standard workflows in data retrieval and mining.
    The Apps must provide answers to relevant research questions: interrogation model, GUI/interactivity, presentation of results.
  • Data retrieval and data/text mining
  • (http://bioassayontology.org/)
  • Fraunhofer SCAI: SCAIView SPARQL interface to LarKC backend; PDSP receptor database.
    The grey box is a set of Web Services APIs that provide nicer interfaces for GUI developers. These have not yet been implemented; instead, we are issuing SPARQL queries directly from the various GUIs.
    The set of purple boxes circled in red is a set of facilities for tracking provenance, curating data that has been aggregated in the cache, and computing aggregated quality measures. These ended up *not* being in the lash up due to time. All of these allow us to give feedback about the quality of the integrated data.
    (b) The red words are correct. I would say we don't do data mapping: we translate the data or obtain RDF versions of it, and the mapping is performed at runtime based on mappings.
    The updated architecture diagram with lessons learned from the lash up (Slide 17 in the bootstrapping development slides). Maybe you can show before and after?
    RE: Scalability (from a paper on the LarKC platform): LarKC aims to be the platform to address these issues, and is built on the following principles:
      • Achieve scalability through parallelisation. Different possibilities are offered, either through tight integration of parallel processes on cluster-style hardware, or through much more loosely coupled wide-area distributed computing.
      • Achieve scalability through giving up completeness. Partial reasoning results are useful in many domains of application. Significant speedups can be obtained by incompleteness in many stages of the reasoning process, ranging from selection of the axioms to incomplete reasoning over those axioms.
      • Do not build a single reasoning engine that is supposed to be suited for all kinds of use cases, but instead build a configurable platform on which different components can be plugged in to obtain different scale/efficiency trade-offs, as required by different use cases.
    RE: technologies being used in the lash up: LarKC, LSP4All and ChemSpider, but the next step is to incorporate ConceptWiki and BridgeDB technologies. The GUIs in the lash-up demo were Utopia, PathVisio, and the generic interface from Lundbeck.
    RE: how LarKC was populated: by loading RDF from source, but we have plans for an automated system that uses semantic sitemap standards.
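    The notes describe translating sources to RDF and applying identifier mappings at runtime, with SPARQL issued straight from the GUIs. A minimal Python sketch of that idea follows, assuming a simple in-memory mapping table: a compound URI is expanded to its mapped equivalents when the query is built, via a SPARQL `VALUES` clause. The URIs and the `expand`/`build_query` helpers are hypothetical and are not the LarKC or OPS implementation.

```python
# Hypothetical mapping table: each URI lists the URIs judged equivalent to it.
# These example.org URIs are placeholders, not real Open PHACTS identifiers.
MAPPINGS: dict[str, list[str]] = {
    "http://example.org/chembl/compound/1": [
        "http://example.org/chebi/compound/1",
        "http://example.org/drugbank/compound/1",
    ],
}


def expand(uri: str, mappings: dict[str, list[str]]) -> list[str]:
    """Return the URI followed by its mapped equivalents, without duplicates."""
    expanded = [uri]
    for other in mappings.get(uri, []):
        if other not in expanded:
            expanded.append(other)
    return expanded


def build_query(uri: str, mappings: dict[str, list[str]]) -> str:
    """Build a SPARQL query whose VALUES clause covers every equivalent URI,
    so the mapping is applied when the query runs, not baked into the data."""
    values = " ".join(f"<{u}>" for u in expand(uri, mappings))
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  VALUES ?s {{ {values} }}\n"
        "  ?s ?p ?o .\n"
        "}"
    )


print(build_query("http://example.org/chembl/compound/1", MAPPINGS))
```

    Keeping the mappings out of the stored triples means a corrected mapping would take effect on the next query without reloading any data.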
  • Search by enzyme family
    SMILES: the Simplified Molecular Input Line Entry Specification, a line notation for molecules.
  • A series of OPS Community Workshops is a key vehicle for building the OPS community and encouraging wider engagement. These workshops will be held twice a year and will focus on different aspects of drug discovery, the technology used, data sharing, sustainability, licensing and practical applications.

Presentation Transcript

  • Pharmacological information discovery with Open PHACTS www.openphacts.org
    • The Innovative Medicines Initiative
      • EC funded public-private partnership for pharmaceutical research
    • Focus on key problems
      • Efficacy
      • Safety
      • Education & Training
      • Knowledge Management
    • 22 partners, 8 pharmaceutical companies, 3 biotechs, RSC
    • 36-month project, 8 months in
    • Pfizer – Coordinator (Bryn Williams-Jones)
    • Universität Wien – Managing entity of IMI JU funding (Gerhard Ecker)
    • Technical University of Denmark (Søren Brunak)
    • University of Hamburg, Center for Bioinformatics (Matthias Rarey)
    • BioSolveIT GmbH (Christian Lemmen)
    • Consorci Mar Parc de Salut de Barcelona (Ferran Sanz)
    • Leiden University Medical Centre (Barend Mons)
    • CTO: Lee Harland
    • Royal Society of Chemistry (Richard Kidd, Antony Williams)
    • Vrije Universiteit Amsterdam (Paul Groth, Frank van Harmelen)
    • Spanish National Cancer Research Centre (Alfonso Valencia)
    • University of Manchester (Carole Goble, Steve Pettifer)
    • Maastricht University (Chris Evelo)
    • AQnowledge (Jan Velterop)
    • University of Santiago de Compostela (Mabel Loza)
    • Rheinische Friedrich-Wilhelms-Universität Bonn (Martin Hofmann-Apitius)
    • AstraZeneca (Niklas Blomberg)
    • GlaxoSmithKline (Andrew Leach)
    • Esteve (Leo Salgado)
    • Novartis (Edgar Jacoby)
    • Merck Serono (Thomas Grombacher)
    • H. Lundbeck A/S (Sune Askjær)
    • Eli Lilly (Hans Constandt)
    • Information Tombs...
    • Internal and external
    • Built to manage content
    • Built to meet primary use-case
    • Tailored indexes, GUIs
    • Unique language & metadata
    • Poor interoperability/integration
    • Proliferation of PowerPoint, documents, Excel, etc.
    • Many suppliers of systems and content in a single workflow
      • ...
    Diagram labels: Literature, Patents, News, Pipeline, SAR, Safety, In vivo, etc.
  • Pharma are accessing, processing, storing & re-processing Public Domain Drug Discovery Data
    • Open PHACTS: an infrastructure project
    • Develop / apply a set of robust standards …
    • Implementing the standards in a semantic integration platform (“Open Pharmacological Space”)…
    • Delivering services to support on-going drug discovery programs in pharma and public domain
    • Mix the ideal with the pragmatic. Build an open system that can accommodate non-open components in the real world.
    Guiding principle: open access, open usage, open source. Key to standards adoption.
    • Major Work Streams
    • Build: OPS service layer and resource integration “commons”
    • Drive: development of exemplars & applications
    • Sustain: community engagement and long-term sustainability
  • OPS Services
    • Integrate data on target expression, biological pathways and pharmacology to identify the most productive points for therapeutic intervention
    • Investigate the in vitro pharmacology and mode-of-action of novel targets to help develop screening assays for drug discovery
    • Compare molecular interaction profiles to assess potential off-target effects and safety pharmacology
    • Analyse chemical motifs against biological effects to deconvolute high content biology assays
    OPS Semantic Fabric – Linked Data to Linked Knowledge
    • Publishing information as linked statements that are sufficiently well described that the information can be automatically linked, brokered and processed.
    • Define, index & link across …
    • Common data identity and entity, data structure, data meaning
    • The Semantic Web & Linked Data
    • Harmonising data sets
    • Identity, syntactic & semantic
    • Concept vocabulary services
    • Identity resolution services
    • Concept mapping services
    • Mapping identifier services
    Diagram labels: Open Data, Linked Data, Linked Knowledge; syntactic normalisation, semantic normalisation; KEGG URI, ChEBI URI, BRENDA URI; OPS in RDF “triples”; OPS annotated with ontologies.
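    Identity resolution across sources (for example, a KEGG URI and a ChEBI URI for the same compound) can be sketched as grouping identifiers into equivalence sets. The Python below uses union-find, a standard technique chosen here for illustration; it does not reproduce the ConceptWiki or BridgeDB services mentioned in the notes, and the identifiers are hypothetical.

```python
class IdentityResolver:
    """Group identifiers from different sources into equivalence sets (union-find)."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, ident: str) -> str:
        """Return the representative identifier of ident's equivalence set."""
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving: point each visited node at its grandparent.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a: str, b: str) -> None:
        """Record a cross-reference stating that a and b denote the same entity."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def equivalents(self, ident: str) -> set[str]:
        """All known identifiers in the same equivalence set as ident."""
        root = self.find(ident)
        return {other for other in list(self.parent) if self.find(other) == root}


# Hypothetical identifiers standing in for KEGG, ChEBI and BRENDA URIs.
resolver = IdentityResolver()
resolver.link("kegg:compound-A", "chebi:compound-A")
resolver.link("chebi:compound-A", "brenda:ligand-A")
resolver.link("kegg:compound-B", "chebi:compound-B")
print(sorted(resolver.equivalents("brenda:ligand-A")))
```

    Links are transitive here: linking KEGG to ChEBI and ChEBI to BRENDA places all three identifiers in one set, which is the behaviour an identity resolution service needs and also why a single wrong cross-reference can merge two distinct entities.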
  • A use-case-driven approach: Developers (Builders) and End users (Drivers)
    • Prioritised research questions
    • Prioritised data sources
    • Exemplars
      • Target dossiers: information about targets, incorporating related information on sequences, structures, pathways, diseases and small molecules
      • Chem/bio space navigator: sets of pharmacologically annotated small molecules, navigated by chemical substructures, pharmacophores and biological activities
      • Polypharmacology browser: maps coverage of the chemo-biological space for polypharmacological profiling of small molecules
  • A use-case-driven approach: Developers (Builders) and End users (Drivers)
    • Prioritised research questions
    • Prioritised data sources
    • Benchmark pilots
      • Target validation work-bench: in silico target validation studies
      • Fusion/aggregation of data from different domains to improve predictions of drug-transporter interactions
      • Combination of physicochemical data & data from transporter interaction for prediction of blood-brain barrier permeation and tissue distribution
    • Example Research questions
    • Give all compounds with IC50 < xxx for target Y in species W and Z plus assay data
    • What substructures are associated with readout X (target, pathway, disease, …)
    • Give all experimental and clinical data for compound X
    • Give all targets for compound X or a compound with a similarity > y%
    • 73 questions identified across the consortium
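    Questions such as “a compound with a similarity > y%” rely on the chemical similarity searching listed among the required functionality. A common measure is the Tanimoto coefficient on binary fingerprints, |A ∩ B| / |A ∪ B|. The sketch below assumes fingerprints are already available as sets of on-bit indices (in practice a cheminformatics toolkit would generate them); the fingerprints and compound names are made up.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    return len(a & b) / union


def similar_compounds(query: set[int], library: dict[str, set[int]],
                      threshold: float) -> list[str]:
    """Names of library compounds whose similarity to the query exceeds the threshold."""
    return sorted(name for name, fp in library.items() if tanimoto(query, fp) > threshold)


# Hypothetical fingerprints, not computed from real structures.
library = {
    "cpd-1": {1, 2, 3, 4},
    "cpd-2": {1, 2, 3, 5},
    "cpd-3": {7, 8},
}
print(similar_compounds({1, 2, 3, 4}, library, threshold=0.5))  # ['cpd-1', 'cpd-2']
```

    A “similarity > y%” question then maps to `threshold = y / 100`; the targets of every returned compound can be collected in a second step.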
    • Prioritised Research Questions Analysis
    • Prevalent Concepts
      • Compound
      • Bioassay
      • Target
      • Pathway
      • Disease
    • Prevalent data relationships
      • Compound – target
      • Compound – bioassay
      • Bioassay – target
      • Compound – target – mode of action
      • Target – target classification
      • Target – pathway
      • Target – disease
      • Pathway – disease
    • Required cheminformatics functionality
      • Chemical substructure searching
      • Chemical similarity searching
    • Required bioinformatics functionality
      • Sequence and similarity searching
      • Bioprofile similarity searching
    • Selection of prioritised data sources
    • Chemistry
      • ChEMBL
      • DrugBank
      • ChEBI
      • PubChem
      • ChemSpider
      • Human Metabolome DB
      • Wombat (commercial)
    • Ontologies
      • AmiGO (The Gene Ontology)
      • KEGG (Kyoto Encyclopedia of Genes and Genomes)
      • OBI (The Ontology for Biomedical Investigations)
      • Bioassay Ontology
      • EFO (Experimental Factor Ontology)
    • Biology
      • EntrezGene
      • HGNC
      • UniProt
      • InterPro
      • SCOP
      • WikiPathways
      • OMIM
      • IUPHAR
    • Produce a working “lash up” system
    • Constrained to technologies in consortium + a few data sources
    • Focused on 2 prioritised research questions (Q15 and Q30)
      • Q15: All oxidoreductase inhibitors active <100 nM in both human and mouse
      • Q30: For a given compound [clozapine], give me the interaction profile with [human or mouse] targets
    • Minimum requirements: two data sources (one for targets, one for compounds) and the ability to produce answers in “manual time”.
      • BRENDA, KEGG, PDSP, ChEMBL, ChEBI, ENZYME DB, Chem2Bio2RDF
    Agile Development: 6-month “lash up”
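    The two lash-up questions can be phrased as simple filters over activity records. This Python sketch assumes a hypothetical flat record type and treats “oxidoreductase” as any target with an EC number in class 1; it reads Q15 as “IC50 below 100 nM against some oxidoreductase in human and against some oxidoreductase in mouse”, which is one possible interpretation. The records are invented for illustration, not lash-up data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One hypothetical activity record linking a compound to a target."""
    compound: str
    target: str
    ec_number: str  # Enzyme Commission number; oxidoreductases are EC class 1
    species: str
    ic50_nm: float


def q15(activities: list[Activity], threshold_nm: float = 100.0) -> list[str]:
    """Compounds with IC50 below the threshold against an oxidoreductase
    in both human and mouse (any oxidoreductase target in each species)."""
    species_hit: dict[str, set[str]] = {}
    for a in activities:
        if a.ec_number.startswith("1.") and a.ic50_nm < threshold_nm:
            species_hit.setdefault(a.compound, set()).add(a.species)
    return sorted(c for c, hit in species_hit.items() if {"human", "mouse"} <= hit)


def q30(activities: list[Activity], compound: str,
        species: tuple[str, ...] = ("human", "mouse")) -> dict[tuple[str, str], float]:
    """Interaction profile: lowest IC50 per (target, species) for one compound."""
    profile: dict[tuple[str, str], float] = {}
    for a in activities:
        if a.compound == compound and a.species in species:
            key = (a.target, a.species)
            profile[key] = min(a.ic50_nm, profile.get(key, float("inf")))
    return profile


# Hypothetical records, not real measurements.
data = [
    Activity("cpd-A", "enz-1", "1.1.1.1", "human", 50.0),
    Activity("cpd-A", "enz-1", "1.1.1.1", "mouse", 80.0),
    Activity("cpd-B", "enz-1", "1.1.1.1", "human", 20.0),
    Activity("cpd-B", "enz-1", "1.1.1.1", "mouse", 250.0),
    Activity("cpd-C", "kin-1", "2.7.11.1", "human", 5.0),
    Activity("cpd-C", "kin-1", "2.7.11.1", "mouse", 5.0),
]
print(q15(data))           # ['cpd-A']
print(q30(data, "cpd-B"))  # {('enz-1', 'human'): 20.0, ('enz-1', 'mouse'): 250.0}
```

    In the lash up these answers came from integrated RDF rather than a Python list; the sketch only makes the logic of each question explicit.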
  • Build a lash up
    • Outcomes of exercise:
    • Team building
    • Performance / scalability analysis
    • Does it provide an adequate answer to questions 15 and 30?
    • Demo for users (drive group) to recalibrate build tasks in order to better respond to user requirements
  • Architecture diagram labels: data sources, RDF mapping, ID mapping, concept mapping, chemical resolution, Chem2Bio2RDF, text mining, triple store, interface
  • GUI: user suggestions for workflow
    • Select question (“template” from category)
    • Fill in template variables via “relation browser” and add filters (IC50 value, dates, etc.)
    • View results, filter and export dataset
    • Select relevant data sources
    • Execute search
    • Modify query (change concepts and attributes)
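    The suggested workflow starts from a question template whose variables the user fills in. A minimal sketch with Python's `string.Template`, assuming a hypothetical template modelled on Q15; the variable names are illustrative, and filters and data-source selection are left out.

```python
from string import Template

# Hypothetical question template modelled on Q15; the variable names are illustrative.
Q15_TEMPLATE = Template(
    "All $target_class inhibitors active <$threshold nM in both $species_a and $species_b"
)


def fill(template: Template, **variables: str) -> str:
    """Fill in every template variable; Template.substitute raises KeyError if one is missing."""
    return template.substitute(**variables)


question = fill(Q15_TEMPLATE, target_class="oxidoreductase", threshold="100",
                species_a="human", species_b="mouse")
print(question)  # All oxidoreductase inhibitors active <100 nM in both human and mouse
```

    Using `substitute` rather than `safe_substitute` means an incompletely filled template fails loudly instead of sending a query with a literal `$species_b` in it.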
    • “Lash up” Facts and Figures
    • Total number of RDF triples: 62,054,627
    Data source    RDF triples    Triple producer
    BRENDA             345,935    In house
    PDSP               719,991    In house
    ENZYME DB           39,540    UniProt
    KEGG               247,292    Chem2Bio2RDF
    ChEMBL          57,795,793    Chem2Bio2RDF
    ChEBI            2,906,076    Chem2Bio2RDF
    • ScaiView (text-mined triples) acquired: 4 billion triples
    • LinkedLifeData: 5 billion triples
    • Chem2Bio2RDF: 83 million triples
    • ConceptWiki: 300 million triples
    • Bio2RDF: 30 billion triples
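    As a quick consistency check, the per-source counts in the table add up to the stated total, with ChEMBL contributing over 90% of the triples. The Python below simply restates the slide's numbers.

```python
# Per-source triple counts, copied from the lash-up facts-and-figures slide.
LASH_UP_TRIPLES = {
    "BRENDA": 345_935,
    "PDSP": 719_991,
    "ENZYME DB": 39_540,
    "KEGG": 247_292,
    "ChEMBL": 57_795_793,
    "ChEBI": 2_906_076,
}

total = sum(LASH_UP_TRIPLES.values())
chembl_share = LASH_UP_TRIPLES["ChEMBL"] / total
print(f"Total: {total:,}")                  # Total: 62,054,627
print(f"ChEMBL share: {chembl_share:.1%}")  # ChEMBL share: 93.1%
```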
    • Lash Up Demo
    http://www.youtube.com/OpenPHACTS
  • LSP4All (Lundbeck) Generic Interface
    • Search by enzyme family
    • Q15: All oxidoreductase inhibitors active <100 nM in both human & mouse
    • Pharmacological data
    • Exact and structure search
    • Navigate from compounds to targets
    • Credit: Sune Askjær / Claus Stie Kallesøe (Lundbeck)
  • UTOPIA Documents (U Manchester)
  • PathVisio (Maastricht U): biological data; gene suggestions for a selected protein
    • “Lash Up” Sanity Check
    • Q15: All oxidoreductase inhibitors active <100 nM in both human and mouse
    • IC50 values and compounds were fully coincident between the automated and manual searches.
    • The “lash up” identified a compound missed by the manual search (raloxifene); a new manual search confirmed that its value was correct.
    • Manual search took 3 days (Mabel Loza’s team at USC).
    • Automated search took milliseconds.
    Representing Text Mining Results for Structured Pharmacological Queries http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/PostersDemos/iswc11pd_submission_19.pdf
    • Onwards and Upwards
    • Connection between developers and users
      • Solidify interfaces for exemplar developers
      • Review lash up for technology, content and exemplars
    • Architecture
    • Services: e.g. entity identification and resolution and representing similarity, ORCID, DataCite
    • Models: RDF / Nanopublication model spec and guidelines
    • Tender documents for commercial storage providers
    • Prototype
    • March 2012: Internal Prototype Delivery
    • September 2012: Release of 1st prototype
    • Early community adoption
    • Talking to major partners to take part in the project
  • Nanopublications: capturing scientific information in the triple store
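    A nanopublication packages a single assertion together with its provenance and publication information as separate named graphs. The sketch below writes one in TriG using the `np:` vocabulary from the nanopublication schema (`np:hasAssertion`, `np:hasProvenance`, `np:hasPublicationInfo`) and PROV-O; the OPS “RDF / Nanopublication model spec” mentioned in this talk may differ in detail, and all example.org URIs and the `inhibits` predicate are hypothetical.

```python
def nanopub_trig(base: str, subject: str, predicate: str, obj: str,
                 source: str, creator: str) -> str:
    """Serialise one assertion as a nanopublication in TriG: a head graph plus
    named graphs for the assertion, its provenance and publication info."""
    return f"""@prefix : <{base}> .
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

:head {{
  : a np:Nanopublication ;
    np:hasAssertion :assertion ;
    np:hasProvenance :provenance ;
    np:hasPublicationInfo :pubinfo .
}}

:assertion {{
  <{subject}> <{predicate}> <{obj}> .
}}

:provenance {{
  :assertion prov:wasDerivedFrom <{source}> .
}}

:pubinfo {{
  : prov:wasAttributedTo <{creator}> .
}}
"""


# Hypothetical example.org URIs; a real assertion would use shared identifiers.
print(nanopub_trig(
    base="http://example.org/np1#",
    subject="http://example.org/compound/1",
    predicate="http://example.org/vocab/inhibits",
    obj="http://example.org/target/1",
    source="http://example.org/dataset/1",
    creator="http://example.org/people/curator",
))
```

    Because provenance lives in its own graph, a triple store can keep the assertion queryable while still letting users trace and filter it by source.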
  • OPS Component Stack
    • Open Flavours
    • OPS Open - open access to all
    • OPS Consortia - data sets licensed just to the consortia
    • OPS Academia - fully open to academia.
    • “ My OPS”
    • Open Source
    • Open Access Infrastructure: GUI and back-end platform, online at openphacts.org, or download both plus data for a local setup.
    • Open Services: for example, RSC services.
    • Open Data + Private Data: licensing fun for all the family.
    • Commercial providers: abstract service interface to swap in commercial and open source platforms
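    The “abstract service interface” idea can be sketched as an abstract base class that applications code against, with open source and commercial backends as interchangeable implementations. The class and method names below are hypothetical, not an OPS API, and the backends are stubs.

```python
from abc import ABC, abstractmethod


class TripleStore(ABC):
    """Hypothetical abstract service interface; not an actual OPS API."""

    @abstractmethod
    def query(self, sparql: str) -> list[dict[str, str]]:
        """Run a SPARQL query and return rows as variable -> value mappings."""


class InMemoryStore(TripleStore):
    """Toy backend returning canned rows, standing in for an open source platform."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.rows = rows

    def query(self, sparql: str) -> list[dict[str, str]]:
        return list(self.rows)  # ignores the query text; for illustration only


class VendorStore(TripleStore):
    """Placeholder for a commercial backend behind the same interface."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def query(self, sparql: str) -> list[dict[str, str]]:
        raise NotImplementedError("would call the vendor's own client here")


def count_rows(store: TripleStore, sparql: str) -> int:
    """Application code depends only on the interface, so backends can be swapped."""
    return len(store.query(sparql))


store = InMemoryStore([{"s": "cpd-1"}, {"s": "cpd-2"}])
print(count_rows(store, "SELECT ?s WHERE { ?s ?p ?o }"))  # 2
```

    Swapping `InMemoryStore` for a real backend changes only the construction line; `count_rows` and any GUI built on the interface stay untouched.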
    • Focus on different aspects of drug discovery, the technology used, data sharing, sustainability, licensing and practical applications.
    • 1st: Volendam (near Amsterdam), September 19-20, 2011
      • Joint with GEN2PHEN
    • Solving Bottlenecks in Data Sharing in the Life Sciences
    OPS Community Workshops
    • 2nd: location TBD, April 16-17, 2012
  • Pistoia Alliance
    • Pre-competitive Pharma industry body that works to develop and promote standards for software interoperability
    • Active in many areas of Pharma software pipeline, including “Information Ecosystem”.
    • The Pistoia project “Semantic Enrichment of Scientific Literature (SESL)” is directly related to Open PHACTS: http://www.pistoia-sesl.org
    • Open PHACTS seen as important to the Pistoia mission
    • Academia-Commercial Venture
    • Focus
      • One area - pharmacology
      • “Production level” software
      • Currency/Updates & Licensing key
      • Semantic Pragmatics: everyday use by scientists not informaticians
    • Future
      • An infrastructure that can be built upon, to provide a stable foundation for further pre-competitive informatics collaboration
      • Sustainability
    Diagram labels: Developers (Builders), End users (Drivers)
    • Outside interest?
    • Infrastructure
      • Reduce delivery and platform costs
      • RDF delivery to customer
      • Data and database preservation
    • Sustainability
      • By number of users
      • By number of data providers
      • By service providers
      • To maintain after project ends
    • Summary
    • Robust standards and techniques
      • Solid integration between data sources via semantic technologies
      • Development of high quality assertions
      • Workflows and analysis pipelines across resources
    • A semantic integration hub (“Open Pharmacological Space”)
      • Open, public domain infrastructure for drug discovery data integration
      • Open web-services for drug discovery
      • Secure access model to enable queries with proprietary data (pharma, SME, NGO and PPP)
    • Deliver services
      • To support on-going drug discovery programs in pharma and public domain
      • Align development of standards, vocabs and data integration to selected drug discovery issues
    Guiding principle: open access, open usage, open source. Key to standards adoption.
  • http://www.openphacts.org slideshare.com/open_phacts @Open_Phacts Thanks to Carole Goble, Lee Harland, Antony Williams