BDE SC1 Workshop 3 - Open PHACTS Pilot (Kiera McNeice)

Overview of Open PHACTS, the BDE Pilot project in SC1, presented at BDE SC1 Workshop 3, 13 December, 2017.

https://www.big-data-europe.eu/the-final-big-data-europe-workshop/

Published in: Healthcare

  1. Big Data Europe SC1 Pilot: The Open PHACTS Discovery Platform. Kiera McNeice, Open PHACTS Foundation, 13 Dec 2017
  2. Big Data Europe objectives
     • Build foundational Big Data infrastructure that:
        ◦ is open source
        ◦ makes it simple to get started with Big Data
        ◦ supports a variety of use cases
        ◦ embraces emerging Big Data technologies
        ◦ enables simple integration with custom components
  3. The SC1 Pilot: Open PHACTS
  4. Drug discovery using public data (diagram labels: Literature, PubChem, GenBank, Patents, Databases, Downloads, Data Integration, Data Analysis, Firewalled Databases)
  5. The situation in 2010… (companies shown: GSK, Pfizer, AstraZeneca, Roche, Novartis, Merck-Serono, Janssen)
  6. Challenges: identifiers
     • Andy Law’s third law: "The number of unique identifiers assigned to an individual is never less than the number of institutions involved in the study."
     • Example identifiers: P12047, X31045, GB:29384
     • Source: http://bioinformatics.roslin.ac.uk/lawslaws/
  7. Everyone loves standards… …that’s why we have so many of them! (https://xkcd.com/927/)
  8. Semantic linking (RDF)
     • Link and store data as semantic "triples", e.g. [Compound] acts on [Target] (subject, predicate, object)
  9. Focus on researcher needs
     • Data sources: ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity, DisGeNET, neXtProt, ChEMBL Target Class, ENZYME, FDA adverse events, SureChEMBL
     • Example questions:
        ◦ "Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM"
        ◦ "What is the selectivity profile of known p38 inhibitors?"
        ◦ "Let me compare MW, logP and PSA for known oxidoreductase inhibitors"
  10. Ranked research questions (each entry: question number; sum; Nr of 1, as given on the slide)
     • Q15 (sum 12; Nr of 1: 9): All oxidoreductase inhibitors active <100 nM in both human and mouse.
     • Q18 (sum 14; Nr of 1: 8): Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence, and how reliable is that evidence (journal impact factor, KOL), for findings associated with a compound?
     • Q24 (sum 13; Nr of 1: 8): Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine the ADMET profile of actives.
     • Q32 (sum 13; Nr of 1: 8): For a given interaction profile, give me compounds similar to it.
     • Q37 (sum 13; Nr of 1: 8): The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
     • Q38 (sum 13; Nr of 1: 8): Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
     • Q41 (sum 13; Nr of 1: 8): A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? I.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the…
  11. The Platform
  12. Using Open PHACTS
  13. Accessing the data: API (https://dev.openphacts.org/)
  14. Accessing the data: workflow tools
  15. Example workflow
     • Q10: For a given compound, summarise all similar compounds and their activities
     • Compound (SMILES): CC1=C(C(C(=C(N1)C)C(=O)OC)C2=CC=CC=C2[N+](=O)[O-])C(=O)OC
  16. Example workflow: KNIME
  17. Example workflow: heatmap
  18. Benefits of Open PHACTS
     • Efficiency: queries that once took days can now be done in less than an hour
     • Novelty: semantically integrated databases allow for completely new ways of analysing the data
     • Cost: sharing cost and effort in a precompetitive project saved "millions"
     • "Integration of different databases is difficult, costly, and time consuming, and probably would not have been done at this level of quality without Open…
  19. Can it be recreated in the BDI?
  20. Open PHACTS architecture (diagram, annotated "Yes!")
     • Core Platform components: Data Cache (Virtuoso triple store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Chemistry Registration Normalisation & Q/C, Identifier Management Service, Indexing
     • Other labels: Nanopub Db, VoID Db, Public Content, Commercial, Public Ontologies, User Annotations, Apps
     • Example identifiers: P12374, EC2.43.4, CS4532, "Adenosine receptor 2a"
  21. Open PHACTS in BDE
     • Able to exchange the Virtuoso triple store for 4Store
     • 18 of the 21 original research questions answered
        ◦ The remaining 3 required patent data, which is not open
     • IMS implemented as an independent Docker module
     • Local installation runs much faster than the original platform!
  22. Local hardware requirements
     • Hardware:
        ◦ 150 GB of disk space (ideal: 250 GB)
        ◦ 16 GB of RAM (ideal: 128 GB)
        ◦ 4 CPU cores (ideal: 8 cores)
     • Prerequisites:
        ◦ Recent x64 Linux (Ubuntu 14.04 LTS, CentOS 7)
        ◦ Docker and Docker Compose
        ◦ Fast Internet connection
     • https://github.com/openphacts/ops-docker, https://data.openphacts.org/
  23. Advantages of rebuilding with BDI
     • Integration into a wider platform
     • Flexibility, scalability, extensibility
     • Local installation of the entire Open PHACTS infrastructure!
  24. Looking forwards
  25. What’s next?
     • Refresh of all data sources
     • Identify new data sources
        ◦ What’s your big data problem in health?
  26. Thank you! (kiera@openphactsfoundation.org)
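Slide 6 illustrates why one entity ends up with several identifiers (P12047, X31045, GB:29384). One simple way to reason about identity mapping is as equivalence classes: each assertion that two identifiers denote the same thing merges their classes. The sketch below does this with a union-find structure. It is a hypothetical illustration only: the class name IdentifierMap is invented, treating the three slide identifiers as equivalent is an assumption made for the example, and it is not a description of how the Open PHACTS identity mapping service (IMS) works internally.

```python
class IdentifierMap:
    """Hypothetical sketch: group identifiers asserted to denote the same entity.

    Uses union-find (disjoint sets) with path compression.
    """

    def __init__(self):
        self._parent = {}  # identifier -> parent identifier in its class

    def _find(self, ident):
        """Return the representative of ident's class, registering ident if new."""
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: re-point every identifier on the path at the root.
        while ident != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def assert_same(self, a, b):
        """Record that identifiers a and b denote the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, ident):
        """All identifiers known to be equivalent to ident (including itself), sorted."""
        root = self._find(ident)
        return sorted(i for i in list(self._parent) if self._find(i) == root)


if __name__ == "__main__":
    ids = IdentifierMap()
    # Hypothetical mapping assertions, for illustration only.
    ids.assert_same("P12047", "X31045")
    ids.assert_same("X31045", "GB:29384")
    print(ids.equivalents("GB:29384"))  # ['GB:29384', 'P12047', 'X31045']
```

Because the mappings are transitive, GB:29384 is found to be equivalent to P12047 even though no single assertion links them directly; path compression keeps repeated lookups cheap.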
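Slide 8's "[Compound] acts on [Target]" pattern can be made concrete with a few lines of code. The sketch below stores subject, predicate, object triples in a plain Python list and answers a question in the spirit of slide 9 (compounds acting on targets in a given pathway) by joining two triple patterns, which is roughly what a SPARQL query against a triple store does. All identifiers and predicate names (compound:A, actsOn, memberOf, pathway:P) are hypothetical placeholders, not Open PHACTS vocabulary, and a real deployment would use an RDF store rather than a list.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical example data: names are placeholders, not records from any real source.
TRIPLES = [
    Triple("compound:A", "actsOn", "target:T1"),
    Triple("compound:B", "actsOn", "target:T1"),
    Triple("compound:B", "actsOn", "target:T2"),
    Triple("target:T1", "memberOf", "pathway:P"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def compounds_acting_on_pathway(triples, pathway):
    """Join two patterns: (?target memberOf pathway) and (?compound actsOn ?target)."""
    targets = {t.subject for t in match(triples, predicate="memberOf", obj=pathway)}
    return sorted({t.subject for t in match(triples, predicate="actsOn") if t.obj in targets})


if __name__ == "__main__":
    print(match(TRIPLES, predicate="actsOn", obj="target:T1"))
    print(compounds_acting_on_pathway(TRIPLES, "pathway:P"))  # ['compound:A', 'compound:B']
```

Because every fact has the same three-part shape, data from different sources can be appended to one collection and queried together, which is the point of the semantic linking approach.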
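Question 15 on slide 10 ("All oxidoreductase inhibitors active <100 nM in both human and mouse") combines a target-class filter, a potency threshold and a condition that must hold in two species at once. The sketch below expresses that as a set intersection over per-species hits. The Activity record layout, its field names and the sample values are hypothetical, and potencies are assumed to be already normalised to nanomolar (source data may report other units).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical bioactivity record (not the Open PHACTS data model)."""
    compound: str
    target_class: str
    species: str
    potency_nm: float  # assumed already converted to nanomolar


def active_in_all_species(records, target_class, species, max_nm):
    """Compounds with at least one activity below max_nm in every listed species."""
    per_species = []
    for sp in species:
        hits = {
            r.compound
            for r in records
            if r.target_class == target_class and r.species == sp and r.potency_nm < max_nm
        }
        per_species.append(hits)
    # A compound qualifies only if it appears in the hit set for every species.
    return sorted(set.intersection(*per_species)) if per_species else []


if __name__ == "__main__":
    records = [  # hypothetical sample data
        Activity("cmpd_1", "oxidoreductase", "human", 20.0),
        Activity("cmpd_1", "oxidoreductase", "mouse", 75.0),
        Activity("cmpd_2", "oxidoreductase", "human", 50.0),
        Activity("cmpd_2", "oxidoreductase", "mouse", 300.0),
        Activity("cmpd_3", "kinase", "human", 5.0),
    ]
    print(active_in_all_species(records, "oxidoreductase", ["human", "mouse"], 100.0))
```

Here cmpd_2 is excluded because its mouse activity (300 nM) misses the threshold, and cmpd_3 is excluded by target class.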
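The example workflow on slide 15 (Q10) starts from a compound given as a SMILES string and looks for similar compounds. Chemical similarity searches are commonly scored with the Tanimoto coefficient over structural fingerprints. The sketch below assumes fingerprints have already been computed and represents each one as a set of on-bit indices; the compound names and bit sets are hypothetical, and this is not a description of how the Open PHACTS chemistry services compute similarity.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two binary fingerprints given as sets of on-bit indices:
    size of the intersection divided by size of the union."""
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty fingerprints count as identical
    return len(a & b) / len(union)


def similar_compounds(query, library, threshold=0.5):
    """(name, score) pairs scoring at least threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical precomputed fingerprints (on-bit indices), for illustration only.
    query = frozenset({1, 4, 7, 9})
    library = {
        "cmpd_1": frozenset({1, 4, 7, 9}),
        "cmpd_2": frozenset({1, 4, 8}),
        "cmpd_3": frozenset({2, 3}),
    }
    print(similar_compounds(query, library, threshold=0.3))
    # [('cmpd_1', 1.0), ('cmpd_2', 0.4)]
```

To complete Q10, each hit's activities would then be looked up, for example with a join like the triple-pattern or per-species filters used for the other questions.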
