SlideShare a Scribd company logo
1 of 25
Chemistry Validation and
Standardization Platform
Version 2.0
Valery Tkachenko1, Boris Sattarov1, Antony Williams2, Chris Grulke2
and Rick Zakharov1
1Science Data Software, Rockville, USA
2National Center for Computational Toxicology, US-EPA
ACS Spring 2017
San Francisco, April 1-6th 2017
DIKW workflow
Predictive data models & toolsExperimental Design
Data Analysis
and Modeling
Structured
Data
Repository
Data collection,
curation,
integration, and
structuring
(ontology)
Literature data
Electronic
Databases
:
Processing
Experimental
Data
Disease
Experimental
Validation
3
Effect
Decision support
Karmann Mills and Anthony
Hickey
RTI International, RTP, NC
27709
and
Alex Tropsha
Eshelman School of
Pharmacy,
University of North Carolina at
Chapel Hill, NC 27599
Standards and authorities
We live in a hyperconnected World
Data repositories
Fourches, Muratov, Tropsha. Nat Chem Biol. 2015,11(8):535.
How the problem is being solved now
[Very incomplete] list of common problems
• Violations of chemical and common sense
• Violations of valence bond theory
• Unsupported format and chemical model features
• Information loss during conversion
• Tautomers
• Stereochemical issues
• Mixtures
• Other classes of chemicals (materials, formulations, biologicals, structurally
diverse, etc)
• Equivalence/mapping issues
• Identifiers/names issues
• Etc, etc, etc…
…problems (continued)
• Multiple [historical, proprietary, shortcoming] formats
• ChemDraw, ChemSketch, AccelrysDraw
• MOL, SDF
• SMILES
• Identifiers
• Names and Synonyms
• Multiple toolkits/models
• Open Source (alphabetical)
• CDK
• RDKit
• Indigo
• OpenBabel
• Etc…
• Commercial (alphabetical)
• CACTVS
• ChemAxon
• OpenEye
• Etc…
• Hystorical software
• No [machine-readable] standards
• No authorities No coordinated efforts!!!
Solution
• Agreed and machine-readable (digital) standards
• Open-source (and therefore fully transparent) solution
• Organizational AND community support and involvement
• Accessible solution
• Data triaging at data repositories level
• Real-time validation/standardization (API, library, “docker”, etc)
Big Data Integration 11
OpenPHACTS
Chemical data[base]
OpenPHACTS Chemistry Registry System (CRS)
OpenPHACTS Chemistry Registry System (CRS)
Shortcomings of the original
OpenPHACTS CRS approach
• Platform-dependent
• Toolkit-dependent (potential licensing issues)
• No deployable library
• No [convenient] API
…OpenPHACTS CRS1 - ongoing work
• Microsoft  platform independent
• .NET Core, Python
• Linux
• NoSQL
• Toolkit independent
• Indigo
• RDKit (in progress)
• CDK (planned)
• Docker image
• RESTful API
1 Was open-sourced and now supported by the OpenPHACTS Foundation
CVSP on Jupyter - validation
CVSP on Jupyter - standardization
CVSP on Jupyter – ontologies, chemotypes, etc
The Comptox Chemistry Dashboard data
• 747,000 chemical substances and ca. 727k chemical structures!
• Diverse structures – organics, inorganics, organometallics
• Fully open data available for download, reuse and repurposing:
https://comptox.epa.gov/dashboard/downloads
• Increasingly an integration hub for EPA chemical structure data
• Strong team with cheminformatics experience
• Manual curators to provide feedback on
CVSP results and processing
• Data processing to start shortly…
Example – Standardization of tautomers
What form is preferred – Keto or Enol???
Open Science Data Repository
Mapping and conversion
Meet the Team
Alexandru Korotcov
Data Science
Rick Zakharov
Technology
Valery Tkachenko
Ideas and plumbing
Boris Sattarov
Cheminformatics
Slides: https://www.slideshare.net/valerytkachenko16Contact us: info@scidatasoft.com

More Related Content

What's hot

The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsValery Tkachenko
 
The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...Valery Tkachenko
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataBarry Smith
 
Royal Society of Chemistry open source cheminformatics platforms and libraries
Royal Society of Chemistry open source cheminformatics platforms and librariesRoyal Society of Chemistry open source cheminformatics platforms and libraries
Royal Society of Chemistry open source cheminformatics platforms and librariesValery Tkachenko
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIRDOM
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurghJun Zhao
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)Gregor Hagedorn
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)Carole Goble
 
Annotopia open annotation services platform
Annotopia open annotation services platformAnnotopia open annotation services platform
Annotopia open annotation services platformTim Clark
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsTim Clark
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIMartin Scharm
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseJustin Clark-Casey
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Sean Ekins
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked DataMichel Dumontier
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.FAIRDOM
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentsTim Clark
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsTim Clark
 

What's hot (20)

The application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platformsThe application of cloud computing to royal society of chemistry data platforms
The application of cloud computing to royal society of chemistry data platforms
 
Overview of open resources to support automated structure verification and e...
Overview of open resources to support automated structure verification  and e...Overview of open resources to support automated structure verification  and e...
Overview of open resources to support automated structure verification and e...
 
The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...The royal society of chemistry and its adoption of semantic web technologies ...
The royal society of chemistry and its adoption of semantic web technologies ...
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
Royal Society of Chemistry open source cheminformatics platforms and libraries
Royal Society of Chemistry open source cheminformatics platforms and librariesRoyal Society of Chemistry open source cheminformatics platforms and libraries
Royal Society of Chemistry open source cheminformatics platforms and libraries
 
FAIR data and model management for systems biology.
FAIR data and model management for systems biology.FAIR data and model management for systems biology.
FAIR data and model management for systems biology.
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh
 
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
pro-iBiosphere 2013-05 Linked Open Data (Gregor Hagedorn)
 
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)What is Reproducibility? The R* brouhaha (and how Research Objects can help)
What is Reproducibility? The R* brouhaha (and how Research Objects can help)
 
Annotopia open annotation services platform
Annotopia open annotation services platformAnnotopia open annotation services platform
Annotopia open annotation services platform
 
Fairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology viewsFairport domain specific metadata using w3 c dcat & skos w ontology views
Fairport domain specific metadata using w3 c dcat & skos w ontology views
 
Improving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBIImproving the Management of Computational Models -- Invited talk at the EBI
Improving the Management of Computational Models -- Invited talk at the EBI
 
Sharing chemical structures with peer reviewed publications
Sharing chemical structures with peer reviewed publications Sharing chemical structures with peer reviewed publications
Sharing chemical structures with peer reviewed publications
 
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data WarehouseMaking Linked Data SPARQL with the InterMine Biological Data Warehouse
Making Linked Data SPARQL with the InterMine Biological Data Warehouse
 
Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...Development of a Tool for Systematic Integration of Traditional and New Appro...
Development of a Tool for Systematic Integration of Traditional and New Appro...
 
Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...Acs collaborative computational technologies for biomedical research an enabl...
Acs collaborative computational technologies for biomedical research an enabl...
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
 
Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.Reproducible and citable data and models: an introduction.
Reproducible and citable data and models: an introduction.
 
eXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic ExperimentseXframe: A Semantic Web Platform for Genomic Experiments
eXframe: A Semantic Web Platform for Genomic Experiments
 
exFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics ExperimentsexFrame: a Semantic Web Platform for Genomics Experiments
exFrame: a Semantic Web Platform for Genomics Experiments
 

Similar to Chemistry Validation and Standardization Platform v2.0

ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryDr. Haxel Consult
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Valery Tkachenko
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...Andrew McEachran
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...Ken Karapetyan
 

Similar to Chemistry Validation and Standardization Platform v2.0 (20)

Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...Big data challenges associated with building a national data repository for c...
Big data challenges associated with building a national data repository for c...
 
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of ChemistryICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
ICIC 2013 Conference Proceedings Antony Williams Royal Society of Chemistry
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
Introduction to Cheminformatics: Accessing data through the CompTox Chemicals...
 
New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...New developments in delivering public access to data from the National Center...
New developments in delivering public access to data from the National Center...
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
US-EPA Cheminformatics Support for Delivering Data Related to Chemicals of E...
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Env...
 
Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...Open chemistry registry and mapping platform based on open source cheminforma...
Open chemistry registry and mapping platform based on open source cheminforma...
 
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
Accessing information for Per- & Polyfluoroalkyl Substances using the US EPA ...
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
 
ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...ChemSpider reactions – delivering a free community resource of chemical synth...
ChemSpider reactions – delivering a free community resource of chemical synth...
 
Delivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applicationsDelivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applications
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...Activities at the Royal Society of Chemistry to gather, extract and analyze b...
Activities at the Royal Society of Chemistry to gather, extract and analyze b...
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
Applications of the US EPA’s CompTox chemicals dashboard to support structure...
Applications of the US EPA’s CompTox chemicals dashboard to support structure...Applications of the US EPA’s CompTox chemicals dashboard to support structure...
Applications of the US EPA’s CompTox chemicals dashboard to support structure...
 

More from Valery Tkachenko

Evolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureEvolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureValery Tkachenko
 
In silico design of new functional materials
In silico design of new functional materialsIn silico design of new functional materials
In silico design of new functional materialsValery Tkachenko
 
Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Valery Tkachenko
 
Abstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsAbstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsValery Tkachenko
 
Machine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsMachine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsValery Tkachenko
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionValery Tkachenko
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsValery Tkachenko
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsValery Tkachenko
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Valery Tkachenko
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Valery Tkachenko
 
Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Valery Tkachenko
 
Living in a world of federated knowledge challenges, principles, tools and ...
Living in a world of federated knowledge   challenges, principles, tools and ...Living in a world of federated knowledge   challenges, principles, tools and ...
Living in a world of federated knowledge challenges, principles, tools and ...Valery Tkachenko
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataValery Tkachenko
 
OpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsOpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsValery Tkachenko
 
Evolution of open chemical information
Evolution of open chemical informationEvolution of open chemical information
Evolution of open chemical informationValery Tkachenko
 
OMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesOMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesValery Tkachenko
 
Not just another reaction database
Not just another reaction databaseNot just another reaction database
Not just another reaction databaseValery Tkachenko
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Valery Tkachenko
 
Text mining to produce large chemistry datasets for community access
Text mining to produce large chemistry datasets for community accessText mining to produce large chemistry datasets for community access
Text mining to produce large chemistry datasets for community accessValery Tkachenko
 

More from Valery Tkachenko (20)

Evolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the futureEvolution of public chemistry databases: past and the future
Evolution of public chemistry databases: past and the future
 
In silico design of new functional materials
In silico design of new functional materialsIn silico design of new functional materials
In silico design of new functional materials
 
Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...Metal-organic frameworks: from database to supramolecular effects in complexa...
Metal-organic frameworks: from database to supramolecular effects in complexa...
 
Abstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representationsAbstract recommendation system: beyond word-level representations
Abstract recommendation system: beyond word-level representations
 
Machine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpointsMachine learning methods for chemical properties and toxicity based endpoints
Machine learning methods for chemical properties and toxicity based endpoints
 
Chemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collectionChemical workflows supporting automated research data collection
Chemical workflows supporting automated research data collection
 
Deep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpointsDeep learning methods applied to physicochemical and toxicological endpoints
Deep learning methods applied to physicochemical and toxicological endpoints
 
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictionsDeep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
Deep Learning on nVidia GPUs for QSAR, QSPR and QNAR predictions
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...
 
Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...Need and benefits for structure standardization to facilitate integration and...
Need and benefits for structure standardization to facilitate integration and...
 
Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...Development and comparison of deep learning toolkit with other machine learni...
Development and comparison of deep learning toolkit with other machine learni...
 
Living in a world of federated knowledge challenges, principles, tools and ...
Living in a world of federated knowledge   challenges, principles, tools and ...Living in a world of federated knowledge   challenges, principles, tools and ...
Living in a world of federated knowledge challenges, principles, tools and ...
 
Using the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical dataUsing the structured product labeling format to index versatile chemical data
Using the structured product labeling format to index versatile chemical data
 
OpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and LearningsOpenPHACTS - Chemistry Platform Update and Learnings
OpenPHACTS - Chemistry Platform Update and Learnings
 
Evolution of open chemical information
Evolution of open chemical informationEvolution of open chemical information
Evolution of open chemical information
 
OMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spacesOMPOL – visualisation of large chemical spaces
OMPOL – visualisation of large chemical spaces
 
Not just another reaction database
Not just another reaction databaseNot just another reaction database
Not just another reaction database
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...Building linked data large-scale chemistry platform - challenges, lessons and...
Building linked data large-scale chemistry platform - challenges, lessons and...
 
Text mining to produce large chemistry datasets for community access
Text mining to produce large chemistry datasets for community accessText mining to produce large chemistry datasets for community access
Text mining to produce large chemistry datasets for community access
 

Recently uploaded

Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 

Recently uploaded (20)

Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 

Chemistry Validation and Standardization Platform v2.0

  • 1. Chemistry Validation and Standardization Platform Version 2.0 Valery Tkachenko1, Boris Sattarov1, Antony Williams2, Chris Grulke2 and Rick Zakharov1 1Science Data Software, Rockville, USA 2National Center for Computational Toxicology, US-EPA ACS Spring 2017 San Francisco, April 1-6th 2017
  • 3. Predictive data models & toolsExperimental Design Data Analysis and Modeling Structured Data Repository Data collection, curation, integration, and structuring (ontology) Literature data Electronic Databases : Processing Experimental Data Disease Experimental Validation 3 Effect Decision support Karmann Mills and Anthony Hickey RTI International, RTP, NC 27709 and Alex Tropsha Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC 27599
  • 5. We live in a hyperconnected World
  • 7. Fourches, Muratov, Tropsha. Nat Chem Biol. 2015,11(8):535. How the problem is being solved now
  • 8. [Very incomplete] list of common problems • Violations of chemical and common sense • Violations of valence bond theory • Unsupported format and chemical model features • Information loss during conversion • Tautomers • Stereochemical issues • Mixtures • Other classes of chemicals (materials, formulations, biologicals, structurally diverse, etc) • Equivalence/mapping issues • Identifiers/names issues • Etc, etc, etc…
  • 9. …problems (continued) • Multiple [historical, proprietary, shortcoming] formats • ChemDraw, ChemSketch, AccelrysDraw • MOL, SDF • SMILES • Identifiers • Names and Synonyms • Multiple toolkits/models • Open Source (alphabetical) • CDK • RDKit • Indigo • OpenBabel • Etc… • Commercial (alphabetical) • CACTVS • ChemAxon • OpenEye • Etc… • Hystorical software • No [machine-readable] standards • No authorities No coordinated efforts!!!
  • 10. Solution • Agreed and machine-readable (digital) standards • Open-source (and therefore fully transparent) solution • Organizational AND community support and involvement • Accessible solution • Data triaging at data repositories level • Real-time validation/standardization (API, library, “docker”, etc)
  • 11. Big Data Integration 11 OpenPHACTS
  • 15.
  • 16. Shortcomings of the original OpenPHACTS CRS approach • Platform-dependent • Toolkit-dependent (potential licensing issues) • No deployable library • No [convenient] API
  • 17. …OpenPHACTS CRS1 - ongoing work • Microsoft  platform independent • .NET Core, Python • Linux • NoSQL • Toolkit independent • Indigo • RDKit (in progress) • CDK (planned) • Docker image • RESTful API 1 Was open-sourced and now supported by the OpenPHACTS Foundation
  • 18. CVSP on Jupyter - validation
  • 19. CVSP on Jupyter - standardization
  • 20. CVSP on Jupyter – ontologies, chemotypes, etc
  • 21. The Comptox Chemistry Dashboard data • 747,000 chemical substances and ca. 727k chemical structures! • Diverse structures – organics, inorganics, organometallics • Fully open data available for download, reuse and repurposing: https://comptox.epa.gov/dashboard/downloads • Increasingly an integration hub for EPA chemical structure data • Strong team with cheminformatics experience • Manual curators to provide feedback on CVSP results and processing • Data processing to start shortly…
  • 22. Example – Standardization of tautomers What form is preferred – Keto or Enol???
  • 23. Open Science Data Repository
  • 25. Meet the Team Alexandru Korotcov Data Science Rick Zakharov Technology Valery Tkachenko Ideas and plumbing Boris Sattarov Cheminformatics Slides: https://www.slideshare.net/valerytkachenko16Contact us: info@scidatasoft.com

Editor's Notes

  1. Open PHACTS was developed to support the key questions of drug discovery Business questions have been at the heart of Open PHACTS and have driven the development of the platform Mx/psa, how calculated who did it? Mash up. With your data too, - top layer join together but need them all commercial Data provided by many publishers Originally in many formats: relational, SD files and RDF Worked closely with publishers Data licensing was a major issue Over 5 billion triples – 14 datasets & growing Hosted on beefy hardware; data in memory (aim) Extensive memcaching Pose complex queries to extract data