Open PHACTS - Chemistry Platform Update and Learnings
Antony Williams and Valery Tkachenko
ORCID ID: 0000-0002-2668-4821
@gray_alasdair Big Data Integration
OpenPHACTS and CRS Diagram
The Chemical Registration Service
Chemistry processing
•Validation
•Standardization
•Properties generation
•Properties retrieval
Export
•RDF
•SDF
API
•Domain-specific searches
•Chemical visualization
•Properties
•Conversions
Subsystems
• CVSP, the Chemical Validation and Standardization Platform (frontend, backend, database)
• Compounds (frontend, database)
• OpenPHACTS API (frontend, database)
• Datasources registry (frontend, database)
• Processing farm (optional)
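The SDF export step can be illustrated with a minimal sketch that appends property data items to a V2000 molblock. It assumes the standard SDF data-item layout (a `> <FIELD>` header line, the value, then a blank line) and the `$$$$` record terminator; the function name `to_sdf_record` and the field name in the usage are hypothetical and are not taken from the CRS codebase.

```python
def to_sdf_record(molblock: str, properties: dict[str, object]) -> str:
    """Build one SDF record: the molblock, then data items, then the $$$$ terminator."""
    lines = molblock.rstrip("\n").split("\n")
    # Data items may only follow a complete molblock, which ends with "M  END".
    if lines[-1].strip() != "M  END":
        raise ValueError("molblock must end with 'M  END'")
    for name, value in properties.items():
        lines.append(f"> <{name}>")  # data header line
        lines.append(str(value))     # single-line data value
        lines.append("")             # blank line closes the data item
    lines.append("$$$$")             # record separator
    return "\n".join(lines) + "\n"
```

Usage: `to_sdf_record(methane_molblock, {"MolecularFormula": "CH4"})` returns the molblock followed by `> <MolecularFormula>`, `CH4`, a blank line and `$$$$`. Multi-line values and V3000 molblocks are out of scope for this sketch.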
Structure-Based Database linking
• Open PHACTS, and many other projects that require linking structure databases, depend on mappings
• Different databases apply different standardization processes prior to deposition
• Examples: PubChem, EBI databases,
ChemSpider, etc.
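Structure-based linking can be sketched as a join on a key computed after standardization, for example a Standard InChIKey. In this sketch `link_by_structure` is a hypothetical helper and the record IDs and keys are placeholders, not real database entries. It shows why differing standardization processes matter: two records only link when both sources produced the same key for the structure.

```python
from collections import defaultdict


def link_by_structure(
    source_a: dict[str, str], source_b: dict[str, str]
) -> dict[str, list[tuple[str, str]]]:
    """Map each structure key shared by both sources to the (id_a, id_b) pairs it links.

    Each source maps record IDs to a key computed *after* standardization.
    Keys present in only one source produce no link.
    """
    ids_by_key_b: dict[str, list[str]] = defaultdict(list)
    for rec_id, key in source_b.items():
        ids_by_key_b[key].append(rec_id)
    links: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for rec_id_a, key in source_a.items():
        for rec_id_b in ids_by_key_b.get(key, []):
            links[key].append((rec_id_a, rec_id_b))
    return dict(links)
```

With `a = {"A-1": "KEY-X", "A-2": "KEY-Y"}` and `b = {"B-7": "KEY-X", "B-9": "KEY-Z"}`, only `A-1` and `B-7` link; `A-2` finds no partner because source B recorded no structure under `KEY-Y`, as would happen if the two sources standardized the same compound differently.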
DrugBank
• ~60 records can’t be dearomatized unambiguously
• ~40 records where InChIs did not match structure
• 2 records where SMILES, InChI and name did not
match the structure
• 7 records with 2 stereo bonds at chiral atoms
Examples: DB04283, DB04462
Standardizers
• EBI Standardizer:
https://wwwdev.ebi.ac.uk/chembl/extra/francis/sta
/
• PubChem Standardizer: https://pubchem.ncbi.nlm.nih.gov/standardize/standardi
• NCGC Standardizer: https://tripod.nih.gov/?p=61
• The CVSP Standardizer, from the Open PHACTS work: http://cvsp.chemspider.com/
Standardization Rules
• Available from: http://tinyurl.com/hwapem3
• Use the FDA Substance Registration System (SRS) rules as guidance for standardization
• Adjust as necessary for our needs
Nitro groups
Salt and Ionic Bonds
The CVSP System
http://cvsp.chemspider.com
Supports various file formats
CompTox Chemistry Dashboard
Prior to deposition, check a deposition…
>3450 compounds in one SDF
98 Errors, 1571 Warnings
Review Errors
Validation Rule Set
Various Rule Sets Available
CVSP – My own custom rules
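A custom validation rule set can be modeled as a list of named checks, each tagged with a severity, whose findings are tallied into an error/warning summary like the one reported for the SDF deposition above. This is a minimal sketch assuming a toy record format (a dict with hypothetical `name` and `smiles` fields) and two made-up rules; it does not reproduce CVSP's actual rules or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    severity: str                   # "error" or "warning"
    check: Callable[[dict], bool]   # True when the record violates the rule


# Hypothetical custom rule set: add or remove rules to build your own.
CUSTOM_RULES = [
    Rule("empty-structure", "error", lambda rec: not rec.get("smiles")),
    Rule("missing-name", "warning", lambda rec: not rec.get("name")),
]


def validate(
    records: list[dict], rules: list[Rule]
) -> tuple[list[tuple[int, str, str]], Counter]:
    """Run every rule on every record; return the issues and a per-severity tally."""
    issues = []
    for index, rec in enumerate(records):
        for rule in rules:
            if rule.check(rec):
                issues.append((index, rule.severity, rule.name))
    summary = Counter(severity for _, severity, _ in issues)
    return issues, summary
```

Keeping rules as data rather than hard-coded branches is what makes it cheap to swap between predefined rule sets and a personal one.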
ChEMBL Validation Review
(of 1.3 million records)
• 11,020 records with 4 bonds and zero charge, e.g.
CHEMBL501101 or CHEMBL501973
• 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphine
• 6,177 records where direction of bond makes no
sense, e.g. CHEMBL12760 and CHEMBL34704
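Checks like the hypervalence findings above come down to comparing each atom's total bond order with an allowed valence. The sketch below assumes a tiny hand-built graph (atoms with formal charge and implicit H count, bonds as `(i, j, order)` tuples) and a deliberately small valence table; the names are hypothetical, and real validators use richer tables that admit legitimate hypervalent states of P, S, Cl and I. A nitrogen with four bonds and no charge is used here as an illustrative case of the kind of pattern a "4 bonds and zero charge" rule would flag.

```python
from dataclasses import dataclass

# Assumed valences for neutral atoms; a simplification for illustration only.
NEUTRAL_VALENCE = {"B": 3, "C": 4, "N": 3, "O": 2, "Cl": 1, "I": 1}


@dataclass
class Atom:
    element: str
    charge: int = 0
    hydrogens: int = 0  # implicit hydrogen count


def valence_violations(
    atoms: list[Atom], bonds: list[tuple[int, int, int]]
) -> list[tuple[int, str, int, int]]:
    """Return (atom index, element, bond-order sum, allowed) for over-valent atoms.

    Simplification: for N and O the allowed valence is shifted by the formal
    charge (N+ -> 4, O- -> 1); other elements are only checked when neutral.
    """
    used = [atom.hydrogens for atom in atoms]
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    violations = []
    for index, atom in enumerate(atoms):
        allowed = NEUTRAL_VALENCE.get(atom.element)
        if allowed is None:
            continue  # element not covered by this sketch
        if atom.element in ("N", "O"):
            allowed += atom.charge
        elif atom.charge != 0:
            continue  # charged B, C and halogens are out of scope here
        if used[index] > allowed:
            violations.append((index, atom.element, used[index], allowed))
    return violations
```

For example, `valence_violations([Atom("N", 0, 4)], [])` reports the uncharged four-bonded nitrogen, while `Atom("N", 1, 4)` (ammonium drawn with its charge) passes.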
Chemical Validation first…
Standardization Second
• Chemical Validation detects errors –
Standardization FIXES them according to rules
• SMIRKS transformations are based on both
InChI Normalization and FDA SRS rules
Standardization SMIRKS
Examples of InChI normalization
[*;H+:1]>>[*;H:1]
[O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3]
[N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2]
Examples of FDA SRS rules
[n:1]=[O:2]>>[n+:1][O-:2]
[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]
[N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5]
Thiopurine: [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]([H,*:12])[N:9]2)[C:3]1=[S:2]
Examples of Standardization
Double bond with adjacent wiggly single bond
Collapse hydrogen atoms with no stereo bonds
Examples of Standardization
Remove symmetric stereocenters
Turn off chiral flag if no up or down bonds
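The "turn off chiral flag" rule can be sketched directly on a V2000 molfile. This assumes the standard V2000 fixed-column layout: the counts line is the fourth line, with the atom count in columns 1-3, the bond count in 4-6 and the chiral flag in 13-15, and each bond line carries its stereo value in columns 10-12 (1 = up/wedge, 6 = down/hash). The function name is hypothetical and V3000 files are not handled.

```python
def clear_chiral_flag_without_wedges(molblock: str) -> str:
    """Set the V2000 chiral flag to 0 when no bond has an up (1) or down (6) stereo mark."""

    def field(line: str, start: int, end: int) -> int:
        # Fixed-width integer field; blank or missing columns read as 0.
        text = line[start:end].strip()
        return int(text) if text else 0

    lines = molblock.split("\n")
    counts = lines[3].ljust(15)
    n_atoms = field(counts, 0, 3)
    n_bonds = field(counts, 3, 6)
    first_bond = 4 + n_atoms  # header (3 lines) + counts line + atom block
    bond_lines = lines[first_bond:first_bond + n_bonds]
    has_wedge = any(field(bond, 9, 12) in (1, 6) for bond in bond_lines)
    if not has_wedge and field(counts, 12, 15) == 1:
        lines[3] = counts[:12] + "  0" + counts[15:]
    return "\n".join(lines)
```

A molfile for ethanol saved with the chiral flag set comes back with the flag cleared; the same file with a wedge on any bond is returned unchanged.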
Defining a Community Rule Set
• There are multiple standardizers, each with its own rule set
• Can we decide on a default community rule set, like Standard InChI, that could be used by ALL standardizers?
• A joint meeting between the Research Data
Alliance (RDA), IUPAC and ACS Division of
Chemical Information discussed the value
and possibilities of this approach (July 2016)
EPA is investigating CVSP
• EPA is investigating CVSP as a validation
and standardization platform
• Considering the CVSP API for integration with our registration system
• CVSP is a reference implementation and “starting point” for a community rule set
CVSP code is now Open Source
• Open Source CVSP code now released
• Code is hosted on Open PHACTS Github
https://github.com/openphacts/ops-crs
• Valery Tkachenko will offer future support
• Hoping for additional community engagement
and support
• Some details of availability…
Virtual Machines
• OPS_FRONT (all websites and API)
• OPS_BACK (all heavy-lifting)
• OPS_DB (databases)
• VMs are VMware images
• Can be converted to other hypervisors
Thank you
Emails: tony27587@gmail.com and tkachenko.valery@gmail.com
SLIDES: www.slideshare.net/AntonyWilliams
Grade 7 - Lesson 1 - Microscope and Its Functions
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 

Open PHACTS Chemistry Platform Update and Learnings

  • 1. Open PHACTS Chemistry Platform: Update and Learnings. Antony Williams and Valery Tkachenko, ORCID ID: 0000-0002-2668-4821
  • 2. OpenPHACTS and CRS Diagram (slide credit: @gray_alasdair, Big Data Integration)
  • 3. The Chemical Registration Service • Chemistry processing: validation, standardization, properties generation, properties retrieval • Export: RDF, SDF • API: domain-specific searches, chemical visualization, properties, conversions
  • 5. Subsystems • “CVSP” (frontend, backend, database) • Compounds (frontend, database) • OpenPHACTS API (frontend, database) • Datasources registry (frontend, database) • Processing farm (optional)
  • 6. Structure-Based Database Linking • Open PHACTS, like many other projects that link structure databases, depends on mappings between them • Different databases apply different standardization processes prior to deposition • Examples: PubChem, EBI databases, ChemSpider, etc.
  • 7. DrugBank • ~60 records can’t be dearomatized unambiguously • ~40 records where the InChI did not match the structure • 2 records where the SMILES, InChI and name did not match the structure • 7 records with 2 stereo bonds at chiral atoms • Examples: DB04283, DB04462
  • 8. Standardizers • EBI Standardizer: https://wwwdev.ebi.ac.uk/chembl/extra/francis/sta / • PubChem Standardizer: https://pubchem.ncbi.nlm.nih.gov/standardize/standardi • NCGC Standardizer: https://tripod.nih.gov/?p=61 • The CVSP Standardizer developed in Open PHACTS: http://cvsp.chemspider.com/
  • 10. Standardization Rules • Available from: http://tinyurl.com/hwapem3 • Use the FDA SRS (Substance Registration System) rules as guidance for standardization • Adjust them as necessary to our needs
  • 12. Salt and Ionic Bonds
  • 15. CompTox Chemistry Dashboard: prior to deposition, check a deposition…
  • 17. 98 Errors, 1571 Warnings
  • 20. Various Rule Sets Available
  • 21. CVSP – My own custom rules
  • 22. ChEMBL Validation Review (of 1.3 million records) • 11,020 records with a 4-bonded atom carrying zero charge, e.g. CHEMBL501101 or CHEMBL501973 • 271 records with hypervalent oxygen (e.g. CHEMBL2219679), carbon (e.g. 1005895), boron, chlorine, iodine or phosphorus • 6,177 records where the bond direction makes no sense, e.g. CHEMBL12760 and CHEMBL34704
  • 23. Chemical Validation First… Standardization Second • Chemical validation detects errors; standardization FIXES them according to rules • The SMIRKS transformations are based on both InChI normalization and FDA SRS rules
  • 24. Standardization SMIRKS • Examples of InChI normalization: [*;H+:1]>>[*;H:1] [O,S,Se,Te:1]=[O+,S+,Se+,Te+:2][C-;v3:3]>>[O,S,Se,Te:1]=[O,S,Se,Te:2]=[C:3] [N-,P-,As-,Sb-:1]=[C+;v3:2]>>[N,P,As,Sb:1]#[C:2] • Examples of FDA SRS rules: [n:1]=[O:2]>>[n+:1][O-:2] [*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3] [N+0;H3:1].[C:3](=[O:4])[O:5][H:6]>>[N+1;H4:1].[C:3](=[O:4])[O-:5] • Thiopurine: [H:1][S:2][c:3]1[n:8][c:7]([H,*:13])[n:6][c:5]2[c:4]1[n:11][c:10]([H,*:12])[n:9]2>>[H:1][N:8]1[C:7]([H,*:13])=[N:6][C:5]2=[C:4]([N:11]=[C:10]([H,*:12])[N:9]2)[C:3]1=[S:2]
  • 25. Examples of Standardization • Double bond with adjacent wiggly single bond • Collapse hydrogen atoms with no stereo bonds
  • 26. Examples of Standardization • Remove symmetric stereocenters • Turn off the chiral flag if there are no up or down bonds
  • 27. Defining a Community Rule Set • There are multiple standardizers, each with its own rule set • Can we agree on a default community rule set that, like Standard InChI, could be used by ALL standardizers? • A joint meeting of the Research Data Alliance (RDA), IUPAC and the ACS Division of Chemical Information (July 2016) discussed the value and feasibility of this approach
  • 28. EPA is investigating CVSP • EPA is investigating CVSP as a validation and standardization platform • Considering the CVSP API for integration with our registration system • CVSP is a reference implementation and “starting point” for a community rule set
  • 29. CVSP code is now Open Source • Open Source CVSP code now released • Code is hosted on the Open PHACTS GitHub: https://github.com/openphacts/ops-crs • Valery Tkachenko will offer future support • Hoping for additional community engagement and support • Some details of availability…
  • 30. Virtual Machines • OPS_FRONT (all websites and API) • OPS_BACK (all heavy-lifting) • OPS_DB (databases) • VMs are VMware images • Can be converted to other hypervisors
  • 31. Thank you Emails: tony27587@gmail.com and tkachenko.valery@gmail.com SLIDES: www.slideshare.net/AntonyWilliams
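Counts like those on slide 22 come from automated validation checks. The sketch below shows what one such check could look like: a valence check that flags, for example, a nitrogen with four bonds and zero charge. It works on a toy in-memory molecule model. It is not CVSP's code or rule set. The `Atom`/`Molecule` classes, the `validate` function and the deliberately strict `ALLOWED_VALENCE` table are hypothetical assumptions for illustration. A real validator would parse molfiles with a cheminformatics toolkit and use a much richer rule table.

```python
from dataclasses import dataclass, field

# Hypothetical, deliberately strict valence table keyed by (element, formal charge).
# Illustrative only: this is not the CVSP rule set.
ALLOWED_VALENCE = {
    ("C", 0): {4},
    ("N", 0): {3},
    ("N", 1): {4},
    ("O", 0): {2},
    ("O", -1): {1},
    ("B", 0): {3},
    ("Cl", 0): {1},
    ("P", 0): {3, 5},
}


@dataclass
class Atom:
    element: str
    charge: int = 0
    hydrogens: int = 0  # implicit hydrogen count


@dataclass
class Molecule:
    atoms: list
    bonds: list = field(default_factory=list)  # (atom index, atom index, bond order)


def valence(mol: Molecule, idx: int) -> int:
    """Sum of bond orders at atom idx plus its implicit hydrogens."""
    total = mol.atoms[idx].hydrogens
    for i, j, order in mol.bonds:
        if idx in (i, j):
            total += order
    return total


def validate(mol: Molecule) -> list:
    """Return (atom index, message) pairs for atoms whose valence breaks the table.

    Validation only reports problems; it never modifies the molecule.
    """
    issues = []
    for idx, atom in enumerate(mol.atoms):
        allowed = ALLOWED_VALENCE.get((atom.element, atom.charge))
        v = valence(mol, idx)
        if allowed is None:
            issues.append((idx, f"no rule for {atom.element} with charge {atom.charge:+d}"))
        elif v not in allowed:
            issues.append(
                (idx, f"{atom.element} (charge {atom.charge:+d}) has valence {v}, "
                      f"expected one of {sorted(allowed)}")
            )
    return issues


# Tetramethylammonium drawn without its positive charge: N has 4 bonds and charge 0.
bad = Molecule(
    atoms=[Atom("N")] + [Atom("C", hydrogens=3) for _ in range(4)],
    bonds=[(0, k, 1) for k in range(1, 5)],
)
for idx, message in validate(bad):
    print(idx, message)  # 0 N (charge +0) has valence 4, expected one of [3]
```

The repair (here, adding the +1 charge on nitrogen) is deliberately left out of `validate`. This mirrors the "validation first, standardization second" split on slide 23.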
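Slides 23 to 26 treat standardization as a set of rules that repair a structure in place. The sketch below hand-codes two of the listed rules on a toy in-memory molecule model. The first is the FDA SRS N-oxide rule `[n:1]=[O:2]>>[n+:1][O-:2]`. The second is "turn off the chiral flag if there are no up or down bonds". It is an approximation under stated assumptions, not a SMIRKS engine and not the CVSP implementation. The neutral-charge and non-aromatic-oxygen guards are added assumptions, and every class and function name is hypothetical. Bond order 4 for aromatic bonds follows the MDL molfile convention.

```python
from dataclasses import dataclass

# Bond orders follow the MDL molfile convention: 1 single, 2 double, 4 aromatic.
SINGLE, DOUBLE, AROMATIC = 1, 2, 4


@dataclass
class Atom:
    element: str
    charge: int = 0
    aromatic: bool = False


@dataclass
class Bond:
    a: int
    b: int
    order: int
    stereo: str = "none"  # "none", "up" (wedge) or "down" (hash)


@dataclass
class Molecule:
    atoms: list
    bonds: list
    chiral_flag: bool = False


def charge_separate_n_oxides(mol: Molecule) -> int:
    """Hand-coded approximation of the SMIRKS [n:1]=[O:2]>>[n+:1][O-:2].

    A neutral aromatic N double-bonded to a neutral non-aromatic O becomes
    n+ single-bonded to O-. Returns the number of bonds rewritten.
    """
    changed = 0
    for bond in mol.bonds:
        if bond.order != DOUBLE:
            continue
        for n_idx, o_idx in ((bond.a, bond.b), (bond.b, bond.a)):
            n, o = mol.atoms[n_idx], mol.atoms[o_idx]
            if (n.element == "N" and n.aromatic and n.charge == 0
                    and o.element == "O" and not o.aromatic and o.charge == 0):
                bond.order = SINGLE
                n.charge, o.charge = 1, -1
                changed += 1
                break
    return changed


def clear_chiral_flag_without_wedges(mol: Molecule) -> bool:
    """Turn off the chiral flag when no bond carries an up or down stereo mark."""
    if mol.chiral_flag and not any(b.stereo in ("up", "down") for b in mol.bonds):
        mol.chiral_flag = False
        return True
    return False


def standardize(mol: Molecule) -> dict:
    """Apply the rules in a fixed order, in place, and report what each one did."""
    return {
        "n_oxide": charge_separate_n_oxides(mol),
        "chiral_flag": clear_chiral_flag_without_wedges(mol),
    }


# Pyridine N-oxide drawn with a pentavalent N=O (atom 0 is N, atom 6 is O),
# carrying a chiral flag even though no bond has a wedge or hash.
ring = [Atom("N", aromatic=True)] + [Atom("C", aromatic=True) for _ in range(5)]
mol = Molecule(
    atoms=ring + [Atom("O")],
    bonds=[Bond(i, (i + 1) % 6, AROMATIC) for i in range(6)] + [Bond(0, 6, DOUBLE)],
    chiral_flag=True,
)
print(standardize(mol))  # {'n_oxide': 1, 'chiral_flag': True}
```

Running `standardize` a second time on the same molecule changes nothing. This is a useful property to aim for in any shared rule set: an already standardized record should come back from the standardizer unchanged.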

Editor's Notes

  1. Open PHACTS was developed to support the key questions of drug discovery. Business questions have been at the heart of Open PHACTS and have driven the development of the platform. Mx/psa: how was it calculated, and who did it? Mash up, with your data too: the top layer joins them together, but needs them all (commercial). Data were provided by many publishers, originally in many formats: relational, SD files and RDF. We worked closely with the publishers, and data licensing was a major issue. Over 5 billion triples from 14 datasets, and growing. Hosted on beefy hardware, with the data held in memory (the aim) and extensive memcaching. Pose complex queries to extract data.