Open innovation contributions from RSC resulting from the Open Phacts project

The Royal Society of Chemistry was pleased to contribute to Open PHACTS, a three-year project funded by the Innovative Medicines Initiative fund from the European Union. Over those three years we developed our existing platforms, created new widgets and data platforms to handle chemistry data, extended existing chemistry ontologies and embraced open semantic web standards. As a result, RSC served as the centralized chemistry data hub for the project. With the conclusion of Open PHACTS, we report on our experiences from participating in the project and give an overview of the tools, capabilities and data that have been released into the community as a result, and how these may influence future projects. This includes the Open PHACTS open chemistry data dump, which provides the chemistry-related data in both chemistry and semantic-web-consumable formats, as well as some of the resulting chemistry software released to the community. The Open PHACTS project resulted in significant contributions to the chemistry community as well as to the supporting pharmaceutical companies and the biomedical community.

Published in: Science
  • Mx/psa, how calculated who did it?
    Mash up. With your data too,
    - top layer join together but need them all
    commercial
  • 10
    Can go get everything
    OPS not a repo of the world, specific sources
  • Transcript of "Open innovation contributions from RSC resulting from the Open Phacts project"

    1. Open innovation and chemistry data management contributions from RSC resulting from the Open PHACTS project. Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe. ACS San Francisco, August 2014
    2. What’s the structure? Are they in our file? What’s similar? What’s the target? Pharmacology data? Known Pathways? Working On Now? Connections to disease? Expressed in right cell type? Competitors? IP?
    3. Fundamental issue: • There is a LOT of science online! • Chaotic, varying quality and very valuable! • Scientists want to find information quickly and easily • Often they just “can’t get there” (or don’t even know where “there” is) • And you have to manage it all (or not)
    4. Pre-competitive Informatics: Pharma are all accessing, processing, storing & re-processing external research data. Literature; PubChem; Genbank; Patents; Databases; Downloads; Data Integration; Data Analysis; Firewalled Databases. Repeat @ each company x. Lowering industry firewalls: pre-competitive informatics in drug discovery. Nature Reviews Drug Discovery (2009) 8, 701-708. doi:10.1038/nrd2944
    5. ChEMBL; DrugBank; Gene Ontology; Wikipathways; UniProt; ChemSpider; UMLS; ConceptWiki; ChEBI; TrialTrove; GVKBio; GeneGo; TR Integrity. “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM” “What is the selectivity profile of known p38 inhibitors?” “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
    6. Business Question Driven Approach
    7. • 3-year Innovative Medicines Initiative project • Integrating chemistry and biology data using semantic web technologies • Open source code, open data and open standards • Academics, Pharmas, Publishers… • To put medicines in the pipeline…
    8. The Open PHACTS community ecosystem
    9. Originally used ChemSpider..
    10. Open PHACTS Deliverables • Many details but overall… • Deliver an Open Source chemical registry service, independent of ChemSpider • Development of Open Source CVSP platform • Deliver widgets and APIs to the project • Deliver high quality, standardized Open Data • Deliver structure data in RDF format
    11. Standardize • Use the SRS as guidance for standardization • Adjust as necessary to our needs
    12. Nitro groups
    13. Salt and Ionic Bonds
    14. Depositions Gateway User Interface
    15. Validate and Standardize
    16. CVSP Filtering
    17. CVSP Filtering of DrugBank
    18. ChEMBL (1.3 million records) • 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973 • 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine • 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
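The kind of check behind the hypervalent-atom counts on slide 18 can be sketched in a few lines. This is a minimal illustration only, not the CVSP implementation: the `MAX_VALENCE` table, the `(element, charge, bond_order_sum)` atom tuples and the `find_hypervalent` function are all hypothetical names and simplifications introduced here.

```python
# Hypothetical lookup: (element, formal charge) -> largest allowed bond-order sum.
# A real validator would cover far more elements and charge states.
MAX_VALENCE = {
    ("C", 0): 4,
    ("N", 0): 3, ("N", 1): 4,
    ("O", 0): 2, ("O", 1): 3, ("O", -1): 1,
    ("B", 0): 3, ("B", -1): 4,
}


def find_hypervalent(atoms):
    """Return indices of atoms whose bond-order sum exceeds the table limit.

    Each atom is an (element, formal_charge, bond_order_sum) tuple.
    Atoms missing from the table are skipped rather than flagged.
    """
    flagged = []
    for index, (element, charge, bond_order_sum) in enumerate(atoms):
        limit = MAX_VALENCE.get((element, charge))
        if limit is not None and bond_order_sum > limit:
            flagged.append(index)
    return flagged


# Toy input: a normal carbon, a neutral four-bonded nitrogen,
# a properly charged ammonium-like nitrogen, and a three-bonded neutral oxygen.
atoms = [("C", 0, 4), ("N", 0, 4), ("N", 1, 4), ("O", 0, 3)]
print(find_hypervalent(atoms))
```

In this toy input the neutral four-bonded nitrogen and the three-bonded neutral oxygen are flagged, while the positively charged nitrogen passes.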
    19. Chemical Registry Service. OPS1; DrugBank ID DB07241; OPS2; OPS3; OPS4; OPS5; OPS6. ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> . ops:OPS2 skos:relatedMatch ops:OPS1 . ops:OPS3 skos:relatedMatch ops:OPS1 . ops:OPS3 skos:closeMatch ops:OPS4 . ops:OPS3 skos:closeMatch ops:OPS5 . ops:OPS4 skos:closeMatch ops:OPS6 . ops:OPS5 skos:closeMatch ops:OPS6 .
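The linkset on slide 19 is small enough to explore with plain Python. The sketch below copies the seven triples from the slide into tuples and builds a lookup of directly linked identifiers. It relies on the fact that the SKOS mapping properties used here are symmetric; the `neighbours` helper and the tuple representation are illustrative choices, not part of the Open PHACTS code.

```python
from collections import defaultdict

DRUGBANK_DB07241 = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241"

# The seven statements from the slide as (subject, predicate, object) tuples.
TRIPLES = [
    ("ops:OPS1", "skos:exactMatch", DRUGBANK_DB07241),
    ("ops:OPS2", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS4"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS5"),
    ("ops:OPS4", "skos:closeMatch", "ops:OPS6"),
    ("ops:OPS5", "skos:closeMatch", "ops:OPS6"),
]


def neighbours(triples):
    """Index every identifier by its directly linked (predicate, other) pairs.

    Links are stored in both directions because exactMatch, closeMatch
    and relatedMatch are symmetric properties.
    """
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[subject].add((predicate, obj))
        index[obj].add((predicate, subject))
    return index


index = neighbours(TRIPLES)
for predicate, other in sorted(index["ops:OPS1"]):
    print(predicate, other)
```

Starting from `ops:OPS1`, this lists the DrugBank record as an exact match and `ops:OPS2` and `ops:OPS3` as related matches. No transitive closure is computed, since closeMatch is not transitive.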
    20. Open Sourcing Data and Code • All Open PHACTS data is licensed as Open Data and available from the Open PHACTS website – ca. 2 million chemicals • The Chemical Registration Service, including the Chemical Validation and Standardization Platform, is being prepared for Open Source release now!
    21. RSC data in Open PHACTS 1. Molecule synonyms and identifiers 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers 3. Molecule–molecule relations (“parent–child”) of interest for drug discovery 4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
    22. Our RDF schema. Two dozen calculated properties, >10^6 molecules • CHEMINF ontology for cheminformatics • QUDT for units and numeric values • ChemSpider IDs for molecules
    23. Synonyms and identifiers. Newly added to the CHEMINF ontology: • Validated ChemSpider synonyms • Unvalidated ChemSpider synonyms • Validated database identifiers • Unvalidated database identifiers • InChI, InChIKey, SMILES • Preferred ChemSpider name
    24. Physicochemical properties • log P • log D (at pH 5.5 and 7.4) • bioconcentration factor KOC (at pH 5.5, at pH 7.4) • index of refraction • polar surface area • molar refractivity • molar volume • polarizability • surface tension • density at STP • flash point at 1 atm • boiling point at 1 atm • enthalpy of vaporization at STP • vapour pressure at STP
    25. RDF exports from CRS
    26. Diagram labels: benzene’s connection table; OPS benzene; calculation result; QUDT dimensionless quantity; “2.17”^^xsd:float; IAO is about; OBI has specified output; OBI has specified input; QUDT has value; QUDT has standard uncertainty; QUDT has unit; CHEMINF calculated log P (rdf:type); CHEMINF connection table (rdf:type); “0.234”^^xsd:float; calculation process; CHEMINF execution of ACD/Labs PhysChem software library version 12.01 (rdf:type). It is actually more complicated..
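One plausible reading of the slide 26 diagram can be written out as a list of statements. The sketch below is an interpretation under stated assumptions: the pairing of labels into triples (for example, that 2.17 is the value and 0.234 the standard uncertainty) is inferred from the flattened labels, the `ex:` node names are hypothetical, and the predicate strings are the human-readable labels from the slide, not real ontology IRIs.

```python
def describe_calculated_property(molecule, property_type, value, uncertainty, unit, software):
    """Build (subject, predicate, object) tuples for one calculated property.

    Node names under "ex:" are placeholders invented for this sketch.
    """
    table = f"ex:{molecule}_connection_table"
    process = f"ex:{molecule}_calculation_process"
    result = f"ex:{molecule}_calculation_result"
    return [
        (table, "rdf:type", "CHEMINF connection table"),
        (table, "IAO is about", f"OPS {molecule}"),
        (process, "rdf:type", software),
        (process, "OBI has specified input", table),
        (process, "OBI has specified output", result),
        (result, "rdf:type", property_type),
        (result, "QUDT has value", value),
        (result, "QUDT has standard uncertainty", uncertainty),
        (result, "QUDT has unit", unit),
    ]


triples = describe_calculated_property(
    molecule="benzene",
    property_type="CHEMINF calculated log P",
    value=2.17,
    uncertainty=0.234,
    unit="QUDT dimensionless quantity",
    software="CHEMINF execution of ACD/Labs PhysChem software library version 12.01",
)

# Look up the numeric value attached to the calculation result.
log_p = next(o for s, p, o in triples if p == "QUDT has value")
print(log_p)
```

Modelling the calculation as a process with an input and an output keeps the provenance (which software produced the number) attached to the result, which is the point the slide appears to make.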
    27. What’s built on top of this?
    28. Important for other projects • Multiple outputs from the project available for reuse to underpin other projects: • Chemical registry service • Chemical validation and standardization • APIs and visualization widgets
    29. New Repository Architecture. doi: 10.1007/s10822-014-9784-5
    30. New Repository Architecture. Data tier: Compounds, Reactions, Spectra, Materials, Documents. Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API. User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets. User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, paid 3rd party integrations (various platforms – SharePoint, Google, etc)
    31. Input data pipeline. Diagram labels: Deposition Gateway; Staging databases; Compounds, Reactions, Spectra, Materials, Articles / CSSP; Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module, … Module; Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; Documents API, FTP, etc; Raw data; Validated data. All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
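The per-slice security model on slide 31 (each data slice is private, public or embargoed) could be modelled roughly as below. This is a sketch under assumptions: the slide does not define what "embargoed" means, so the rule used here (visible only to the owner until a release date, then to everyone) is a guess, and `DataSlice`, `Visibility` and `can_read` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    source: str                           # data source or collection name
    owner: str                            # depositor who controls the slice
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful when embargoed


def can_read(data_slice, user, today):
    """Decide whether `user` may read the slice on the given day."""
    if user == data_slice.owner:
        return True
    if data_slice.visibility is Visibility.PUBLIC:
        return True
    if data_slice.visibility is Visibility.EMBARGOED:
        # Assumed rule: an embargoed slice opens up once its date has passed.
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False


spectra = DataSlice("lab-a-spectra", "alice", Visibility.EMBARGOED, date(2015, 1, 1))
print(can_read(spectra, "bob", date(2014, 8, 10)))
print(can_read(spectra, "bob", date(2015, 6, 1)))
```

Keeping visibility on the slice rather than on individual records matches the slide's statement that databases are "sliced by data sources/data collections".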
    32. Compounds
    33. Reactions
    34. Analytical data
    35. For Deposition of Data • Quality of data at source • ensuring chemicals are correct – VALIDATION • reactions map and balance as appropriate – VALIDATION and STANDARDIZATION • file format handling for analytical data types – binary file formats are proprietary – STANDARDIZATION • valid interpretation of data – VALIDATION and ANNOTATION
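As an illustration of the "reactions map and balance" check listed on slide 35, the sketch below tests whether element counts agree on both sides of a reaction. It covers only the balancing half (no atom mapping), and the input format of `(coefficient, {element: count})` pairs and the function names are assumptions made for this example.

```python
from collections import Counter


def element_totals(side):
    """Sum element counts over one side of a reaction.

    `side` is a list of (stoichiometric coefficient, composition) pairs,
    where composition maps element symbols to atom counts.
    """
    totals = Counter()
    for coefficient, composition in side:
        for element, count in composition.items():
            totals[element] += coefficient * count
    return totals


def is_balanced(reactants, products):
    """A reaction balances when both sides contain the same atoms."""
    return element_totals(reactants) == element_totals(products)


H2 = {"H": 2}
O2 = {"O": 2}
H2O = {"H": 2, "O": 1}

print(is_balanced([(2, H2), (1, O2)], [(2, H2O)]))  # 2 H2 + O2 -> 2 H2O
print(is_balanced([(1, H2), (1, O2)], [(1, H2O)]))  # H2 + O2 -> H2O
```

The first reaction balances and the second does not, since it leaves one oxygen atom unaccounted for. A deposition pipeline would also need to compare charges, which this sketch omits.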
    36. Input data pipeline (same diagram as slide 31)
    37. Deposition of Data
    38. User Interface Approach (same tier diagram as slide 30)
    39. User Interface Approach (same tier diagram as slide 30)
    40. Work in Progress
    41. User Interface Approach (same tier diagram as slide 30)
    42. A Compounds Repository Interface
    43. The PharmaSea Website
    44. The Open PHACTS community ecosystem
    45. Open PHACTS Project Partners: Pfizer Limited – Coordinator; Universität Wien – Managing entity; Technical University of Denmark; University of Hamburg, Center for Bioinformatics; BioSolveIT GmbH; Consorci Mar Parc de Salut de Barcelona; Leiden University Medical Centre; Royal Society of Chemistry; Vrije Universiteit Amsterdam; Spanish National Cancer Research Centre; University of Manchester; Maastricht University; Aqnowledge; University of Santiago de Compostela; Rheinische Friedrich-Wilhelms-Universität Bonn; AstraZeneca; GlaxoSmithKline; Esteve; Novartis; Merck Serono; H. Lundbeck A/S; Eli Lilly; Netherlands Bioinformatics Centre; Swiss Institute of Bioinformatics; ConnectedDiscovery; EMBL-European Bioinformatics Institute; Janssen; OpenLink
    46. Thank you. Email: williamsa@rsc.org; ORCID: 0000-0002-2668-4821; Twitter: @ChemConnector; Personal Blog: www.chemconnector.com; SLIDES: www.slideshare.net/AntonyWilliams