Open innovation contributions from RSC resulting from the Open PHACTS project

US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
Open innovation and chemistry data management contributions from RSC resulting from the Open PHACTS project
Antony Williams, Valery Tkachenko, Ken Karapetyan, Alexey Pshenichnov, Colin Batchelor, Jon Steele & David Sharpe
ACS San Francisco, August 2014
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
Fundamental issue:
• There is a LOT of science online!
• Chaotic, varying quality and very valuable!
• Scientists want to find information quickly and easily
• Often they just “can’t get there” (or don’t even know where “there” is)
• And you have to manage it all (or not)
Pre-competitive Informatics:
Pharma are all accessing, processing, storing & re-processing external research data
[Diagram: Literature, PubChem, Genbank, Patents, Databases and Downloads feed Data Integration and Data Analysis behind Firewalled Databases, repeated at each company]
Lowering industry firewalls: pre-competitive informatics in drug discovery. Nature Reviews Drug Discovery (2009) 8, 701-708, doi:10.1038/nrd2944
Data sources: ChEMBL, DrugBank, Gene Ontology, Wikipathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity
• “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
• “What is the selectivity profile of known p38 inhibitors?”
• “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
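To make the first question concrete, here is a minimal sketch of how it could be expressed as a filter over integrated activity records. Everything in it is an assumption for illustration: the record fields, the compound labels and the potency values are invented, and "assayed in only functional assays" is read as "every record for that compound in the pathway is a functional assay". It does not represent the Open PHACTS API.

```python
from collections import defaultdict

# Hypothetical activity records; every value here is invented for illustration.
RECORDS = [
    {"compound": "cmpd-A", "pathway": "NFkB", "assay_type": "functional", "potency_um": 0.2},
    {"compound": "cmpd-A", "pathway": "NFkB", "assay_type": "functional", "potency_um": 0.8},
    {"compound": "cmpd-B", "pathway": "NFkB", "assay_type": "binding", "potency_um": 0.1},
    {"compound": "cmpd-B", "pathway": "NFkB", "assay_type": "functional", "potency_um": 0.5},
    {"compound": "cmpd-C", "pathway": "NFkB", "assay_type": "functional", "potency_um": 5.0},
    {"compound": "cmpd-D", "pathway": "MAPK", "assay_type": "functional", "potency_um": 0.05},
]


def potent_functional_only(records, pathway, max_potency_um):
    """Compounds in `pathway` seen only in functional assays, with a potency below the cutoff."""
    by_compound = defaultdict(list)
    for rec in records:
        if rec["pathway"] == pathway:
            by_compound[rec["compound"]].append(rec)

    hits = []
    for compound, recs in by_compound.items():
        only_functional = all(r["assay_type"] == "functional" for r in recs)
        potent = any(r["potency_um"] < max_potency_um for r in recs)
        if only_functional and potent:
            hits.append(compound)
    return sorted(hits)


print(potent_functional_only(RECORDS, "NFkB", 1.0))  # ['cmpd-A']
```

The filter itself is trivial; the hard part the project addresses is getting compounds, targets, pathways and assays from different sources into records that can be joined like this.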
Business Question Driven Approach
• 3-year Innovative Medicines Initiative project
• Integrating chemistry and biology data using semantic web technologies
• Open source code, open data and open standards
• Academics, Pharmas, Publishers…
• To put medicines in the pipeline…
The Open PHACTS community ecosystem
Originally used ChemSpider..
Open PHACTS Deliverables
• Many details but overall…
• Deliver an Open Source chemical registry service, independent of ChemSpider
• Development of the Open Source CVSP (Chemical Validation and Standardization Platform)
• Deliver widgets and APIs to the project
• Deliver high quality, standardized Open Data
• Deliver structure data in RDF format
Standardize
• Use the SRS as guidance for standardization
• Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
Depositions Gateway User Interface
Validate and Standardize
CVSP Filtering
CVSP Filtering of DrugBank
ChEMBL (1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g. CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine
• 6,177 records where direction of bond makes no sense, e.g. CHEMBL12760 and CHEMBL34704
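A minimal sketch of the kind of valence rule that could produce counts like those above. It is not the CVSP rule set, which the slides do not show. The atom representation, the valence table and the example record are all assumptions, and the slide does not name the element behind "4 bonds and zero charge"; nitrogen is used here only as a plausible illustration.

```python
# Typical maximum valences for a few neutral atoms (a deliberate simplification).
TYPICAL_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2}


def flag_valence_issues(atoms):
    """Return (index, element, bond_order_sum) for each suspicious neutral atom."""
    issues = []
    for index, atom in enumerate(atoms):
        limit = TYPICAL_VALENCE.get(atom["element"])
        if limit is None or atom["charge"] != 0:
            continue  # unknown element or charged atom: not judged by this toy rule
        total = sum(atom["bond_orders"])
        if total > limit:
            issues.append((index, atom["element"], total))
    return issues


# Hypothetical record: a nitrogen drawn with four single bonds but no formal charge,
# next to a correctly charged four-bond nitrogen and an ordinary carbon.
record = [
    {"element": "N", "charge": 0, "bond_orders": [1, 1, 1, 1]},
    {"element": "N", "charge": 1, "bond_orders": [1, 1, 1, 1]},
    {"element": "C", "charge": 0, "bond_orders": [1, 1, 2]},
]

print(flag_valence_issues(record))  # [(0, 'N', 4)]
```

A production validator would work on a real connection table and cover many more cases (aromaticity, legitimate hypervalent species, stereo bond direction); this only shows the shape of a single rule.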
[Diagram: registry entries OPS1 to OPS6, with OPS1 corresponding to DrugBank ID DB07241]
ops:OPS1 skos:exactMatch <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241> .
ops:OPS2 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:relatedMatch ops:OPS1 .
ops:OPS3 skos:closeMatch ops:OPS4 .
ops:OPS3 skos:closeMatch ops:OPS5 .
ops:OPS4 skos:closeMatch ops:OPS6 .
ops:OPS5 skos:closeMatch ops:OPS6 .
Chemical Registry Service
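The linkset above can be read as a small graph. The sketch below, which is an illustration and not the registry's implementation, stores the seven statements as plain tuples and walks them breadth-first to find which OPS entries are reachable from the DrugBank record under a chosen set of SKOS mapping properties. Links are followed in both directions, since the SKOS mapping properties used here are symmetric.

```python
from collections import deque

DRUGBANK = "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB07241"

# The seven statements from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("ops:OPS1", "skos:exactMatch", DRUGBANK),
    ("ops:OPS2", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:relatedMatch", "ops:OPS1"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS4"),
    ("ops:OPS3", "skos:closeMatch", "ops:OPS5"),
    ("ops:OPS4", "skos:closeMatch", "ops:OPS6"),
    ("ops:OPS5", "skos:closeMatch", "ops:OPS6"),
]


def linked(start, predicates):
    """Entries reachable from `start` using only the given mapping predicates."""
    neighbours = {}
    for subject, predicate, obj in TRIPLES:
        if predicate in predicates:
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)

    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return sorted(seen)


print(linked(DRUGBANK, {"skos:exactMatch"}))
# ['ops:OPS1']
print(linked(DRUGBANK, {"skos:exactMatch", "skos:relatedMatch"}))
# ['ops:OPS1', 'ops:OPS2', 'ops:OPS3']
```

Choosing which predicates to follow is the point of using graded mappings: a strict query stays on exact matches, while a looser one can widen to related and close matches.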
Open Sourcing Data and Code
• All Open PHACTS data is licensed as Open Data and available from the Open PHACTS website – ca. 2 million chemicals
• The Chemical Registration Service, including the Chemical Validation and Standardization Platform, is being prepared as Open Source now!
RSC data in Open PHACTS
1. Molecule synonyms and identifiers
2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers
3. Molecule–molecule relations (“parent–child”) of interest for drug discovery
4. Calculated physicochemical properties for compounds (both molecular and macroscopic)
Our RDF schema
Two dozen calculated properties, >10^6 molecules
•CHEMINF ontology for cheminformatics
•QUDT for units and numeric values
•ChemSpider IDs for molecules
Synonyms and identifiers
Newly added to the CHEMINF ontology:
•Validated ChemSpider synonyms
•Unvalidated ChemSpider synonyms
•Validated database identifiers
•Unvalidated database identifiers
•InChI, InChIKey, SMILES
•Preferred ChemSpider name
Physicochemical properties
• log P
• log D (at pH 5.5 and 7.4)
• bioconcentration factor and KOC (at pH 5.5 and at pH 7.4)
• index of refraction
• polar surface area
• molar refractivity
• molar volume
• polarizability
• surface tension
• density at STP
• flash point at 1 atm
• boiling point at 1 atm
• enthalpy of vaporization at STP
• vapour pressure at STP
RDF exports from CRS
[Diagram: RDF graph for benzene’s calculated log P]
• Nodes: benzene’s connection table (rdf:type CHEMINF connection table); OPS benzene; calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version 12.01); calculation result (rdf:type CHEMINF calculated log P); QUDT dimensionless quantity
• Literals: “2.17”^^xsd:float; “0.234”^^xsd:float
• Edge labels: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit
It is actually more complicated..
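A minimal sketch of how such a graph might be held and queried, using plain tuples and no RDF library. The wiring is an assumption: the flattened diagram does not say which node each edge connects, so the sketch assumes the calculation takes the connection table as input and yields the result as output, that the result is about benzene, and that 2.17 is the value with 0.234 as its standard uncertainty. The `ex:` node names and the shortened property names are hypothetical.

```python
# Hypothetical node names; the property names paraphrase the slide's edge labels.
TRIPLES = [
    ("ex:calculation", "rdf:type", "CHEMINF execution of ACD/Labs PhysChem software library version 12.01"),
    ("ex:calculation", "obi:has_specified_input", "ex:benzene_connection_table"),
    ("ex:calculation", "obi:has_specified_output", "ex:result"),
    ("ex:benzene_connection_table", "rdf:type", "CHEMINF connection table"),
    ("ex:result", "rdf:type", "CHEMINF calculated log P"),
    ("ex:result", "iao:is_about", "ops:benzene"),
    ("ex:result", "qudt:value", 2.17),                 # assumed to be the log P value
    ("ex:result", "qudt:standardUncertainty", 0.234),  # assumed to be its uncertainty
    ("ex:result", "qudt:unit", "QUDT dimensionless quantity"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def describe(result):
    """Summarise one calculation result together with the software that produced it."""
    value = objects(result, "qudt:value")[0]
    uncertainty = objects(result, "qudt:standardUncertainty")[0]
    # Walk back from the result to the process that has it as its specified output.
    process = [s for s, p, o in TRIPLES if p == "obi:has_specified_output" and o == result][0]
    software = objects(process, "rdf:type")[0]
    return f"{value} ± {uncertainty} ({software})"


print(describe("ex:result"))
```

Keeping the process as its own node is what lets a consumer ask not only for the number but also for how, and with which software version, it was calculated.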
What’s built on top of this?
Important for other projects
• Multiple outputs from the project available for reuse to underpin other projects:
• Chemical registry service
• Chemical validation and standardization
• APIs and visualization widgets
New Repository Architecture
doi: 10.1007/s10822-014-9784-5
New Repository Architecture
[Diagram: four-tier architecture]
• Data tier: Compounds, Reactions, Spectra, Materials, Documents
• Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
• User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Materials Widgets, Documents Widgets
• User interface tier (examples): Analytical Laboratory application, Electronic Laboratory Notebook, Chemical Inventory application, Paid 3rd party integrations (various platforms – SharePoint, Google, etc)
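The layering in this architecture can be sketched in a few lines. This is only an illustration of the separation of tiers for one data type; the class and function names are hypothetical and do not come from the RSC code base.

```python
class CompoundStore:
    """Data tier: holds compound rows."""

    def __init__(self):
        self._rows = {}

    def put(self, compound_id, name):
        self._rows[compound_id] = {"id": compound_id, "name": name}

    def get(self, compound_id):
        return self._rows.get(compound_id)


class CompoundsAPI:
    """Data access tier: the only way upper tiers reach the store."""

    def __init__(self, store):
        self._store = store

    def lookup(self, compound_id):
        row = self._store.get(compound_id)
        if row is None:
            raise KeyError(compound_id)
        return dict(row)  # hand out a copy so callers cannot mutate stored data


def compound_widget(api, compound_id):
    """User interface components tier: renders one compound."""
    row = api.lookup(compound_id)
    return f"[{row['id']}] {row['name']}"


def inventory_app(api, compound_ids):
    """User interface tier: an application composed from widgets."""
    return "\n".join(compound_widget(api, cid) for cid in compound_ids)


store = CompoundStore()
store.put("cmpd-1", "benzene")
store.put("cmpd-2", "toluene")
print(inventory_app(CompoundsAPI(store), ["cmpd-1", "cmpd-2"]))
```

Because applications only talk to widgets and widgets only talk to the API, a notebook, an inventory tool and a third-party integration can share the same lower tiers.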
Input data pipeline
[Diagram: Deposition Gateway]
• Inputs: Web UI for unified depositions; DropBox, Google Drive, SkyDrive, etc; LabTrove and other templated data; API, FTP, etc
• Processing modules: Compounds Module, Spectra Module, Reactions Module, Materials Module, Textmining Module, … Module
• Raw data, staging databases, validated data
• Databases: Compounds, Reactions, Spectra, Materials, Articles / CSSP, Documents
• All databases are sliced by data sources/data collections and have a simple security model where each data slice/source is private, public or embargoed
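The three-state security model for data slices could look something like the sketch below. The slides do not define the semantics, so the rules here are assumptions: owners can always read their own slice, public slices are readable by anyone, and an embargoed slice becomes readable to others once its embargo date has passed. The field and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DataSlice:
    source: str
    owner: str
    visibility: str  # "public", "private" or "embargoed"
    embargo_until: Optional[date] = None


def can_read(data_slice, user, today):
    """Decide whether `user` may read the slice on the given day."""
    if data_slice.visibility == "public":
        return True
    if user == data_slice.owner:
        return True
    if data_slice.visibility == "embargoed":
        # Assumed rule: an embargo with no end date never lifts for other users.
        return data_slice.embargo_until is not None and today >= data_slice.embargo_until
    return False  # private (or unrecognised) slices stay owner-only


thesis = DataSlice("thesis-spectra", "alice", "embargoed", date(2015, 1, 1))
print(can_read(thesis, "bob", date(2014, 8, 10)))  # False
print(can_read(thesis, "bob", date(2015, 6, 1)))   # True
```

Attaching the rule to the slice rather than to individual records keeps the model simple: one decision covers every compound, reaction or spectrum deposited from that source.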
Compounds
Reactions
Analytical data
For Deposition of Data
• Quality of data at source
• ensuring chemicals are correct – VALIDATION
• reactions map and balance as appropriate – VALIDATION and STANDARDIZATION
• file format handling for analytical data types – binary file formats are proprietary – STANDARDIZATION
• valid interpretation of data – VALIDATION and ANNOTATION
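As an illustration of the "reactions balance" check in the list above, here is a minimal sketch that compares atom counts on both sides of a reaction. It assumes flat molecular formulas without parentheses, charges or isotopes, and expresses stoichiometric coefficients by repeating a formula; the function names are hypothetical and this is not the platform's reaction validator.

```python
import re
from collections import Counter

# An element symbol (capital letter, optional lowercase letter) and an optional count.
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Count atoms in a flat formula such as 'C2H6O'."""
    counts = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactants, products):
    """True when both sides of the reaction contain the same atoms."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Esterification: acetic acid + ethanol -> ethyl acetate + water
print(is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2", "H2O"]))  # True
# The same reaction deposited without the water by-product
print(is_balanced(["C2H4O2", "C2H6O"], ["C4H8O2"]))         # False
```

An unbalanced result is not necessarily an error, since by-products are often omitted when reactions are drawn, which is presumably why the slide says "as appropriate".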
Deposition of Data
User Interface Approach
[Diagram: the four-tier repository architecture (data tier, data access tier, user interface components tier, user interface tier) shown under New Repository Architecture]
Work in Progress
A Compounds Repository Interface
The PharmaSea Website
The Open PHACTS community ecosystem
Open PHACTS Project Partners
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink
Thank you
Email: williamsa@rsc.org
ORCID: 0000-0002-2668-4821
Twitter: @ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams

Editor's Notes

  1. Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial
  2. 10 Can go get everything OPS not a repo of the world, specific sources