Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
http://www.orcid.org/0000-0002-2668-4821
Introduction to Cheminformatics: Accessing
data through the CompTox Dashboard
The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA
Antony John Williams
williams.antony@epa.gov
UNC Chapel Hill: September 20th 2021
Who am I?
https://orcid.org/0000-0002-2668-4821
• Computational chemist at the US-EPA – scientist
• Responsibility for cheminformatics projects, internal
& external collaborations, “product marketing”–
cheminformatician
• Work with a team of people developing software
solutions – “product & project manager”
• Scientific publications, books, blogger – author; I
am @ChemConnector – social networker
Learning Objectives
• A very short overview of cheminformatics, focused on:
– Chemical identifiers and some associated challenges
– Molecular fingerprints
– Molecular similarity
– Structure-based modeling (QSAR/QSPR/QSUR)
• An overview of the CompTox Chemicals Dashboard and how it can help to:
– Search, source, visualize and download data for a singleton or thousands of chemicals
– Perform real-time prediction calculations and read-across
– Navigate into dozens of other online resources that contain additional data
Problem: Too Many Chemicals and Too Few Resources
• Fast characterization of human and ecological risk posed by
existing and emerging chemicals is a critical challenge
• Chemistry never stops, but the available data are sparse and distributed…
[Figure: “Data for Environmental Chemicals”. Bar chart of the percent of chemicals (y-axis, 0 to 70) with data for each endpoint: Acute, Gentox, Cancer, Dev Tox, EDSP Tier 1, Repro Tox; one bar is labeled <1%. Modified from Judson et al., EHP 2010.]
CAS REGISTRY® contains more than
171 million unique organic and
inorganic chemical substances,
such as alloys, coordination
compounds, minerals, mixtures,
polymers and salts, and more than
68 million protein and DNA sequences
Solution
• Develop a “first-stop-shop” for environmental chemical data
to support EPA and partner decision making:
– Centralized location for relevant chemical data
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment
Cheminformatics and the
Dashboard
• Cheminformatics is the application of computer science and
informatics-based approaches to:
– Represent chemical structures, substances and reactions
– Store chemistry-related data
– Search for chemistry-related data
– Model data sets to provide predictive capabilities
– Visualize and analyze chemistry-related data
• The US-EPA uses cheminformatics (and bioinformatics) to
manipulate, integrate, store, model and deliver access to our data.
The CompTox Chemicals Dashboard is built on a solid
cheminformatics foundation
Types of Chemical Identifiers
• Structural Identifiers
• The visual depiction
• Multiple electronic formats
• InChI (Key): FMMWHPNWAFZXNH-UHFFFAOYSA-N
• Common Name: Benzo(a)pyrene
• Systematic Name: Benzo[pqr]tetraphene
• CAS Registry Number(s): 50-32-8
• Lots of other “common names and trade
names”
Information Associated with a Chemical Structure?
INTRINSIC PROPERTIES
• Formula: C20H12
• Molecular weight: 252.316 g/mol
• Monoisotopic Mass: 252.093900 g/mol
MEASURED PROPERTIES
• LogKow 6.13
• Melting Pt 177°C
• Boiling Pt 485°C
• ….and many more
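The intrinsic properties listed above can be recomputed from the formula alone. The following is a minimal sketch, assuming a simple formula with no brackets, charges or isotope labels; the small element tables and the function names are illustrative and are not part of the Dashboard.

```python
import re

# Standard (average) atomic weights and masses of the most abundant isotopes.
# Only a few elements are listed; extend the tables as needed.
AVERAGE = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

def parse_formula(formula):
    """Turn a simple formula such as 'C20H12' into {'C': 20, 'H': 12}."""
    counts = {}
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(n) if n else 1)
    return counts

def mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())

average_mw = mass("C20H12", AVERAGE)
mono_mass = mass("C20H12", MONOISOTOPIC)
print(f"Molecular weight:  {average_mw:.3f}")
print(f"Monoisotopic mass: {mono_mass:.4f}")
```

For benzo(a)pyrene this reproduces the values on the slide (252.316 and 252.0939) to the precision shown.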
How to Store a Chemical Structure
• Multiple approaches:
• Names and identifiers
• 2D or 3D structure “molfile”
• SMILES:
• c1cc2c3ccc4cccc5ccc(cc2cc1)c3c45
• C1=CC2=CC3=CC=C4C=CC=C5C=CC(=C2C=C1)C3=C45
• and many other variants….
• InChI=1S/C20H12/c1-2-7-17-15(4-1)12-16-9-8-13-5-3-6-14-10-11-18(17)20(16)19(13)14/h1-12H
• InChIKey: FMMWHPNWAFZXNH-UHFFFAOYSA-N
If We Database Chemical Structures…
• …then we can search the dataset by inherent structural properties
• Formula
• Mass
• Substructure
• Structural similarity
• …we can integrate other info into the database for retrieval
• …available data, both experimental and predicted, is a click away
• …data can be downloaded, distributed and shared
• …linking out to other resources enabled by adopting specific standards
• …structure collections, with associated data, are available for modeling
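Structural similarity searching is commonly scored by comparing molecular fingerprints with the Tanimoto coefficient. The sketch below illustrates that general idea only: the slides do not say which fingerprint or metric the Dashboard uses, and the bit sets and chemical labels are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

# Hypothetical fingerprints: each set holds the indices of bits that are set.
query = {1, 4, 7, 9}
database = {
    "chem_A": {1, 4, 7, 9},
    "chem_B": {1, 4, 8},
    "chem_C": {2, 3},
}

# Rank database entries from most to least similar to the query.
ranked = sorted(database, key=lambda name: tanimoto(query, database[name]), reverse=True)
for name in ranked:
    print(name, round(tanimoto(query, database[name]), 2))
```

A value of 1.0 means the two fingerprints are identical and 0.0 means they share no bits; identical fingerprints do not guarantee identical structures.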
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
SEARCH
TOX DATA
BIOACTIVITY
SIMILARITY
READ-ACROSS
PUBMED
BATCH SEARCH
CompTox Chemicals Dashboard
883k Chemical Substances
BASIC Search
• Type ahead search using Names,
synonyms and CASRNs
• Millions of identifiers
• Substring search
Search for classes of chemicals
• Examples: “perfluoro”
Challenges with Nomenclature
• Be CAREFUL with names! There is a LOT of confusion in the
public domain. CHOOSE sources wisely!
• There are MANY public databases but not many are curated
• All public databases have value but not many curate data
• Example: METHANE on PubChem
https://pubchem.ncbi.nlm.nih.gov/compound/297
CAS Registry Numbers on PubChem
CASRN lookup on the dashboard
Methane is Diamond and Nanotubes?
• These are all Depositor Names for Methane
Detailed Chemical Pages
One more identifier – the DTXSID
• Chemical page: Wikipedia snippet when available, intrinsic
properties, structural identifiers, linked substances
Detailed Chemical Pages
Easy Navigation
From the Chemical Details Page…
all chemicals with same FORMULA
How many chemicals are associated
through LINKED SUBSTANCES?
• Atrazine is a herbicide found in MANY commercial products
• The dashboard has salt forms, isotopically labelled forms,
multicomponent forms
• How do we identify what they are???
A little more about the InChI
• An InChIKey is made up of two blocks…
• Block 1 – “the connectivity” of atoms and bonds
• Block 2 – isotopes, charge, stereo
• The InChIKey is VERY USEFUL
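The block structure described above can be used directly in code. A minimal sketch follows, using the two InChIKeys quoted in these slides. The written key also ends with a single-letter flag after the second hyphen, which is returned separately here. The function names are illustrative.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")

def split_inchikey(key):
    """Return (first block, second block, trailing flag) of an InChIKey."""
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    first, second, flag = key.split("-")
    return first, second, flag

def same_skeleton(key_a, key_b):
    """True when two keys share the first (connectivity) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]

benzo_a_pyrene = "FMMWHPNWAFZXNH-UHFFFAOYSA-N"
cholesterol = "HVYWMOMLDIMFJA-DPAQBDIFSA-N"

print(split_inchikey(benzo_a_pyrene))
print(same_skeleton(benzo_a_pyrene, cholesterol))
```

Matching on the first block alone is one way to find stereoisomers and isotopically labelled forms of the same skeleton. Because the key is a hash, a match is strong evidence of shared connectivity rather than a proof.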
Searching using InChI
• Demo an internet search using InChIs – Cholesterol has the
InChIKey: HVYWMOMLDIMFJA-DPAQBDIFSA-N
• Demo Atrazine – Linked Substances – Skeleton
• More about Linked Substances….
Linked Substances – more interesting
• We map chemicals together
using cheminformatics
approaches
• Desalting, stereo removal
(“destereo”), splitting of
multicomponent substances, etc.
are used to link related forms
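To make the "split multicomponents" idea concrete, here is a deliberately crude sketch that splits a SMILES string on "." and keeps the largest component. This is not the Dashboard's workflow, which the slides do not describe in detail. A real pipeline would use a cheminformatics toolkit, and the atom counter below is a rough assumption that only handles common organic SMILES.

```python
import re

# Rough heavy-atom tokenizer: a bracket atom counts as one atom,
# then Cl/Br, then any single uppercase symbol or aromatic c/n/o/s.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[A-Z]|[cnos]")

def atom_count(fragment):
    """Approximate number of atoms written in one SMILES component."""
    return len(ATOM_PATTERN.findall(fragment))

def split_components(smiles):
    """Components of a multicomponent SMILES are separated by '.'."""
    return smiles.split(".")

def largest_component(smiles):
    """Keep the component with the most atoms (a simple 'desalting' stand-in)."""
    return max(split_components(smiles), key=atom_count)

sodium_acetate = "CC(=O)[O-].[Na+]"
print(split_components(sodium_acetate))
print(largest_component(sodium_acetate))
```

Records that reduce to the same parent component after steps like this one can then be linked to each other, which is the kind of mapping the slide describes.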
Atrazine Linked Substances
A little more about our data quality
• Five full time curators register and curate data to elevate quality
Underneath the Dashboard
[Figure: DSSTox_v2 curation levels, split into Public and Curated data. Quality levels shown: 1. High, 2. Low, 3. High, 4. Med, 5. Low, 6. Untrusted, 7. Incomplete. Bins shown: DSSTox_High (7269), DSSTox_Low, Public_High, Public_Medium, Public_Low, Public_Untrusted. Other counts on the chart: 16K, 33K, 101K, 590K, ~310K pending, ~150K pending. Annotation: ~150K substances in top 4 QC quality bins.]
Distribution of curated data
Now at >1.2 MILLION substances
Navigating data via the Left Hand Tabs
Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data can be downloaded as Excel, TSV
and CSV files
• Predictions: multiple algorithms
• EPI Suite: Estimation Program Interface
• ACD/Labs (commercial)
• TEST: Toxicity Estimation Software Tool
• OPERA: OPEn structure–activity/property Relationship App
Chemical Hazard Data
ToxVal Database
• >50k chemicals
• >770k tox. values
• >30 sources of data
• ~5k journals cited
• ~70k citations
Safety Data
Identifiers Support Searches in other
systems
More About CASRNs
• CASRNs are very useful but still limited
• Not every chemical has a STRUCTURE…substances vs structures
• “Chemical Abstracts Service” – a number doesn’t exist until the
substance has been abstracted and indexed
• Not every chemical on the dashboard necessarily has a CASRN –
how would you find those that don’t??? Hint: Search NOCAS_
• There are ~6000 chemicals without CASRN on the dashboard
• A chemical can also have many deleted CASRNs
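A CASRN carries its own check digit, which gives a quick sanity test before searching. The sketch below validates the format and checksum and flags the NOCAS_ placeholders mentioned above. The function names and the sample NOCAS_ value are hypothetical, and a valid checksum does not prove that the number is assigned to the chemical you expect.

```python
import re

# CASRN layout: 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CASRN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def is_valid_casrn(casrn):
    """Check the format and the check digit of a CAS Registry Number."""
    match = CASRN_PATTERN.match(casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check

def is_nocas_placeholder(identifier):
    """Dashboard records without a CASRN can be found by the NOCAS_ prefix."""
    return identifier.startswith("NOCAS_")

print(is_valid_casrn("50-32-8"))            # benzo(a)pyrene, from the earlier slide
print(is_valid_casrn("50-32-7"))            # same number with a wrong check digit
print(is_nocas_placeholder("NOCAS_12345"))  # hypothetical placeholder value
```

For 50-32-8 the weighted sum is 2×1 + 3×2 + 0×3 + 5×4 = 28, and 28 mod 10 gives the check digit 8.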
Products Searching
What chemicals are in hair care products?
Let’s Talk Exposure
• Types of Exposure Data on the Dashboard
• Consumer product categories and uses
• Products containing the chemical
• Predicted exposure levels from modeling (more in next session)
Sources of Exposure to Chemicals
QSAR modeling
• What do you trust more? Experimental or predicted data?
• Do you trust individual models or consensus models?
• What if there are no experimental data? How good are the predictions?
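One simple way to think about "consensus" is to average the predictions of several models and look at their spread. The sketch below assumes an arithmetic mean, because the slides do not state how any consensus value on the Dashboard is formed. The model labels and predicted values are made up for illustration, and only the experimental LogKow (6.13) comes from the earlier slide.

```python
from statistics import mean, stdev

def consensus(predictions):
    """Return (mean, sample standard deviation) of a dict of model predictions."""
    values = list(predictions.values())
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread

# Hypothetical LogKow predictions from three unnamed models.
predictions = {"model_A": 6.1, "model_B": 6.3, "model_C": 5.9}
experimental_logkow = 6.13  # measured value quoted earlier for benzo(a)pyrene

value, spread = consensus(predictions)
print(f"consensus = {value:.2f} +/- {spread:.2f}")
print(f"absolute error vs experiment = {abs(value - experimental_logkow):.2f}")
```

A small spread shows that the models agree with each other, but models trained on similar data can agree and still be wrong.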
Data Curation Pipelines plus
Manual Curation Processes
Property and Fate and Transport Data
~25 MILLION pre-predicted values
• We have built QSPR models based on tens of thousands of
property data points curated over the past decade
• We push our “QSAR-Ready” chemical structures through
the models to produce property predictions
Access to Predictions
OPERA Reports
Similar reports for TEST predictions
Real-Time Predictions
Toxicity and Properties
What’s the best way to search the
internet for chemical data?
• We know how complex chemical identifiers are…
• CASRN(s)
• Hundreds of names (maybe)
• SMILES
• InChIs
• EINECS, EC numbers
• What can WE do to help you navigate the internet?
Identifiers are used in the app
• Identifiers are used to feed and link into “Literature”
Literature Searching
• Real-time retrieval of data from PubMed (~30
million abstracts and growing)
• Choose from a set of pre-defined queries
• Adjust and fine tune queries based on interests
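A pre-defined query can be pictured as a template that combines a chemical name with topic terms. The sketch below builds such a query and the corresponding URL for the public NCBI E-utilities `esearch` endpoint. The query template and function names are hypothetical, since the Dashboard's own queries are not given in these slides, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(chemical_name, topic_terms):
    """Combine a chemical name with OR-ed topic terms (illustrative template)."""
    topic = " OR ".join(f'"{term}"' for term in topic_terms)
    return f'"{chemical_name}" AND ({topic})'

def esearch_url(term, retmax=20):
    """URL that asks PubMed for up to `retmax` matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

query = build_pubmed_query("atrazine", ["toxicity", "exposure"])
print(query)
print(esearch_url(query))
```

Adjusting the topic terms is the equivalent of fine tuning a query to your own interests.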
Literature Searching
• “Sifting” of results using
multiple terms
• Frequency counting terms
• Color highlighting of terms
• Download list to Excel
• Send list to PubMed for
downloading ref. file
• Direct link via PubMed ID
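The "sifting" idea, counting how often terms of interest occur and ranking abstracts accordingly, can be sketched in a few lines. This is only an illustration of frequency counting and not the Dashboard's implementation. The abstracts and their IDs are invented placeholders.

```python
import re
from collections import Counter

def term_counts(abstract, terms):
    """Count whole-word occurrences of each term, ignoring case."""
    words = Counter(re.findall(r"[a-z]+", abstract.lower()))
    return {term: words[term.lower()] for term in terms}

def sift(abstracts, terms):
    """Return abstract IDs ordered by total term frequency, highest first."""
    scored = [(sum(term_counts(text, terms).values()), key) for key, text in abstracts.items()]
    return [key for score, key in sorted(scored, reverse=True)]

# Invented abstracts keyed by placeholder IDs (not real PubMed IDs).
abstracts = {
    "A1": "Atrazine exposure and atrazine metabolites in drinking water.",
    "A2": "Exposure assessment of herbicides in soil.",
    "A3": "Synthesis of a new catalyst.",
}
terms = ["atrazine", "exposure"]

print(term_counts(abstracts["A1"], terms))
print(sift(abstracts, terms))
```

Single-word terms keep the sketch short, and multi-word phrases would need a different matching step.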
External Links – Also use Identifiers
Names, CASRN, PubChem IDs, InChIs…
External Links
• Links to ~90 websites providing access to
additional data on the chemical of interest
Chemical Lists and
Categories
PFAS lists of Chemicals
Curated List of Pesticides
• Find list of interest
• Select list and
send to batch
Batch Searching
• Singleton searches are great but…
• …we generally want data on LOTS of chemicals!
• Typical questions
• What are the structures for a set of chemical names? Set of CASRNs?
• Can I get chemical lists in Excel files? As a list of SMILES strings?
Can I get an SDF file?
• Can I include predicted properties in the download file? OPERA?
TEST?
• Are “these chemicals” screened in ToxCast?
• I’m a mass spectrometrist and need masses and formulae for a list of
chemicals
Access data en masse for thousands of chemicals….
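Before pasting a long list into a batch search it helps to clean it and check which kind of identifier each entry is. The sketch below removes blanks and duplicates and labels each entry by pattern. The DTXSID pattern (the prefix followed by digits) is an assumption, and anything that matches no pattern is simply treated as a name.

```python
import re

# Patterns are tried in order; the first match wins.
PATTERNS = [
    ("DTXSID", re.compile(r"^DTXSID\d+$")),               # assumed format
    ("CASRN", re.compile(r"^\d{2,7}-\d{2}-\d$")),
    ("InChIKey", re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")),
]

def classify(identifier):
    """Label an identifier by its shape; fall back to 'Name'."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "Name"

def prepare_batch(raw_lines):
    """Strip whitespace, drop blanks and duplicates, keep the input order."""
    seen, batch = set(), []
    for line in raw_lines:
        identifier = line.strip()
        if identifier and identifier not in seen:
            seen.add(identifier)
            batch.append((identifier, classify(identifier)))
    return batch

raw = ["50-32-8", "Benzo(a)pyrene", "FMMWHPNWAFZXNH-UHFFFAOYSA-N", " 50-32-8 ", ""]
for identifier, kind in prepare_batch(raw):
    print(f"{kind:9s} {identifier}")
```

Grouping a mixed list by identifier type makes it easier to see which kinds of input you are submitting in each batch.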
Select Output Format and Content
Batch Search CASRNs
Summary and Conclusion
• CompTox Chemicals Dashboard - a
central hub for environmental data
• ~900k chemical substances
• Integrating property data, hazard data,
exposure data, in vitro bioactivity data
• Interrogation of bioactivity data -
• Multiple types of searches
• Batch search for thousands of chemicals
• Real-time property and toxicity predictions
• Downloadable files – CSV, TSV and Excel
References
• The CompTox Chemistry Dashboard: a community data resource for environmental
chemistry, J. Cheminformatics, 9, 61 (2017)
• EPA’s DSSTox database: History of development of a curated chemistry resource
supporting computational toxicology research, Comp. Tox. 12, 100096 (2019)
• OPERA models for predicting physicochemical properties and environmental fate
endpoints, J. Cheminformatics, 10, 10 (2018)
• Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model,
Environ. Sci. Technol. 49, 8804-8814 (2015)
• ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology, Chem. Res.
Toxicol. 29, 1225-51 (2016)
• Development and Validation of a Computational Model for Androgen Receptor Activity,
Chem. Res. Toxicol. 30, 946-964 (2017)
• CERAPP: Collaborative Estrogen Receptor Activity Prediction Project, Environ. Health
Perspect. 124, 1023 (2016)
• Abstract Sifter: a comprehensive front-end system to PubMed, F1000, 6, 2164 (2017)
You want to know more…
• Lots of resources available
• Presentations: https://tinyurl.com/w5hqs55
• Communities of Practice Videos: https://rb.gy/qsbno1
• Manual: https://rb.gy/4fgydc
• Latest News: https://comptox.epa.gov/dashboard/news_info
Acknowledgments
• Contact: Williams.Antony@epa.gov
• Feedback and follow-up are
welcome! Your questions help
• The dashboard is based on the
efforts of many more team
members than us. Many
collaborators provide data also.
EPA’s Center for Computational Toxicology and Exposure

 
Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Introduction to Cheminformatics: Accessing data through the CompTox Chemicals Dashboard

  • 1. Center for Computational Toxicology and Exposure, US-EPA, RTP, NC http://www.orcid.org/0000-0002-2668-4821 Introduction to Cheminformatics: Accessing data through the CompTox Dashboard The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA Antony John Williams williams.antony@epa.gov UNC Chapel Hill: September 20th 2021
  • 2. Who am I? https://orcid.org/0000-0002-2668-4821 • Computational chemist at the US-EPA – scientist • Responsibility for cheminformatics projects, internal & external collaborations, “product marketing”– cheminformatician • Work with a team of people developing software solutions – “product & project manager” • Scientific publications, books, blogger – author; I am @ChemConnector – social networker
  • 3. Learning Objectives • A very short overview of cheminformatics focused on • Chemical identifiers and some associated challenges • Molecular fingerprints • Molecular similarity • Structure-based modeling (QSAR/QSPR/QSUR) • An overview of the CompTox Chemicals Dashboard and how it can help to: • Search, source, visualize and download data for singleton or thousands of chemicals • Perform real-time prediction calculations and read-across • Navigate into dozens of other online resources that contain additional data
  • 4. Problem: Too Many Chemicals and Too Few Resources • Fast characterization of human and ecological risk posed by existing and emerging chemicals is a critical challenge • Chemistry never stops. But there is sparse and distributed data… [Bar chart, "Data for Environmental Chemicals": percent of chemicals (axis 0 to 70) with data for Acute, Gentox, Cancer, Dev Tox, Repro Tox and EDSP Tier 1, with one bar labeled <1%. Modified from Judson et al., EHP 2010] • CAS REGISTRY® contains more than 171 million unique organic and inorganic chemical substances, such as alloys, coordination compounds, minerals, mixtures, polymers and salts, and more than 68 million protein and DNA sequences
  • 5. Solution • Develop a “first-stop-shop” for environmental chemical data to support EPA and partner decision making: – Centralized location for relevant chemical data – Chemistry, exposure, hazard and dosimetry – Combination of existing data and predictive models – Publicly accessible, periodically updated, curated • Easy access to data improves efficiency and ultimately accelerates chemical risk assessment
  • 6. Cheminformatics and the Dashboard • Cheminformatics is the application of computer science and informatics-based approaches to: • Represent chemical structures, substances and reactions • Store chemistry-related data • Search for chemistry related data • Model data sets to provide predictive capabilities • Visualize and analyse chemistry related data • The US-EPA uses cheminformatics (and bioinformatics) to manipulate, integrate, store, model and deliver access to our data. The CompTox Chemicals Dashboard is built on a solid cheminformatics foundation
  • 7. Types of Chemical Identifiers • Structural Identifiers • The visual depiction • Multiple electronic formats • InChI (Key): FMMWHPNWAFZXNH-UHFFFAOYSA-N • Common Name: Benzo(a)pyrene • Systematic Name: Benzo[pqr]tetraphene • CAS Registry Number(s): 50-32-8 • Lots of other “common names and trade names”
  • 8. Information Associated with a Chemical Structure? INTRINSIC PROPERTIES • Formula: C20H12 • Molecular weight: 252.316 g/mol • Monoisotopic Mass: 252.093900 g/mol MEASURED PROPERTIES • LogKow: 6.13 • Melting Pt: 177°C • Boiling Pt: 485°C • …and many more
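The intrinsic properties on this slide follow directly from the formula. The sketch below is an illustration rather than the dashboard's own code: the function names are hypothetical, and the element tables hold only rounded values for carbon and hydrogen. It reproduces the two masses quoted for C20H12.

```python
import re

# Rounded reference values; only the elements needed here are listed.
AVERAGE_MASS = {"C": 12.011, "H": 1.008}            # standard atomic weights
MONOISOTOPIC_MASS = {"C": 12.000000, "H": 1.007825}  # most abundant isotopes

def parse_formula(formula):
    """Turn a simple formula such as 'C20H12' into {'C': 20, 'H': 12}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts

def mass(formula, table):
    """Sum the per-element masses from the given table."""
    return sum(table[el] * n for el, n in parse_formula(formula).items())

average = mass("C20H12", AVERAGE_MASS)
monoisotopic = mass("C20H12", MONOISOTOPIC_MASS)
print(f"Molecular weight:  {average:.3f}")
print(f"Monoisotopic mass: {monoisotopic:.4f}")
```

The parser handles only flat formulas (no brackets, charges or isotope labels), which is enough for this example.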
  • 9. How to Store a Chemical Structure • Multiple approaches: • Names and identifiers • 2D or 3D structure “molfile” • SMILES: • c1cc2c3ccc4cccc5ccc(cc2cc1)c3c45 • C1=CC2=CC3=CC=C4C=CC=C5C=CC(=C2C=C1)C3=C45 • and many other variants… • InChI=1S/C20H12/c1-2-7-17-15(4-1)12-16-9-8-13-5-3-6-14-10-11-18(17)20(16)19(13)14/h1-12H • InChIKey: FMMWHPNWAFZXNH-UHFFFAOYSA-N
  • 10. If We Database Chemical Structures… • …then we can search the dataset by inherent structural properties • Formula • Mass • Substructure • Structural similarity • …we can integrate other info into the database for retrieval • …available data, both experimental and predicted, is a click away • …data can be downloaded, distributed and shared • …linking out to other resources enabled by adopting specific standards • …structure collections, with associated data, are available for modeling
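Structural similarity searching is commonly scored with the Tanimoto coefficient over molecular fingerprints. The sketch below assumes a fingerprint is simply the set of "on" bit positions. The two fingerprints are made-up illustrations, not real fingerprints of any chemical, and the slides do not state which similarity measure the dashboard uses.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)

# Hypothetical fingerprints, given as the positions of the bits that are set.
fp_query = {1, 4, 7, 9, 15}
fp_hit = {1, 4, 7, 10, 15, 22}

score = tanimoto(fp_query, fp_hit)
print(f"{score:.3f}")  # 4 shared bits out of 7 distinct bits
```

A score of 1.0 means identical fingerprints and 0.0 means no bits in common. A similarity search then keeps the hits above a chosen threshold.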
  • 11. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard SEARCH TOX DATA BIOACTIVITY SIMILARITY READ-ACROSS PUBMED BATCH SEARCH
  • 12. CompTox Chemicals Dashboard 883k Chemical Substances
  • 13. BASIC Search • Type ahead search using Names, synonyms and CASRNs • Millions of identifiers • Substring search
  • 14. Search for classes of chemicals • Examples: “perfluoro”
  • 15. Challenges with Nomenclature • Be CAREFUL with names! There is a LOT of confusion in the public domain. CHOOSE sources wisely! • There are MANY public databases but not many are curated • All public databases have value but not many curate data • Example: METHANE on PubChem https://pubchem.ncbi.nlm.nih.gov/compound/297
  • 16. CAS Registry Numbers on PubChem
  • 17. CASRN lookup on the dashboard
  • 18. Methane is Diamond and Nanotubes? • These are all Depositor Names for Methane 
  • 19. Detailed Chemical Pages One more identifier – the DTXSID • Chemical page: Wikipedia snippet when available, intrinsic properties, structural identifiers, linked substances
  • 20. Detailed Chemical Pages Easy Navigation • Chemical page: Wikipedia snippet when available, intrinsic properties, structural identifiers, linked substances
  • 21. From the Chemical Details Page… all chemicals with same FORMULA
  • 22. How many chemicals are associated through LINKED SUBSTANCES? • Atrazine is a herbicide found in MANY commercial products • The dashboard has salt forms, isotopically labelled forms, multicomponent forms • How do we identify what they are???
  • 23. A little more about the InChI • An InChIKey is made up of two blocks… • Block 1 – “the connectivity” of atoms and bonds • Block 2 – isotopes, charge, stereo • The InChIKey is VERY USEFUL
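Because the first block of an InChIKey encodes only connectivity, grouping keys by that block is one simple way to find records that share a skeleton. The sketch below is an assumption-laden illustration, not the dashboard's linked-substance logic. The function names are hypothetical, and the third key is the stereo-free form of the cholesterol key, included only to show two keys sharing a first block.

```python
from collections import defaultdict

def split_inchikey(key):
    """Split a standard InChIKey into its three hyphen-separated parts."""
    first, second, protonation = key.split("-")
    if (len(first), len(second), len(protonation)) != (14, 10, 1):
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return first, second, protonation

def group_by_skeleton(keys):
    """Group InChIKeys that share the same first (connectivity) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)

keys = [
    "FMMWHPNWAFZXNH-UHFFFAOYSA-N",  # benzo(a)pyrene, from the slides
    "HVYWMOMLDIMFJA-DPAQBDIFSA-N",  # cholesterol, from the slides
    "HVYWMOMLDIMFJA-UHFFFAOYSA-N",  # same skeleton, no stereo layer (illustrative)
]
groups = group_by_skeleton(keys)
for skeleton, members in groups.items():
    print(skeleton, len(members))
```

Searching on the first block alone will therefore return stereoisomers and isotopically labelled forms together, which is what makes it useful for "skeleton" searches.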
  • 24. Searching using InChI • Demo an internet search using InChIs – Cholesterol has the InChIKey: HVYWMOMLDIMFJA-DPAQBDIFSA-N • Demo Atrazine – Linked Substances – Skeleton • More about Linked Substances….
  • 25. Linked Substances – more interesting • We map chemicals together using cheminformatics approaches • These include desalting, removing stereochemistry, splitting multicomponent substances, etc.
  • 27. A little more about our data quality • Five full time curators register and curate data to elevate quality
  • 29. Distribution of curated data (DSSTox_v2, public and curated) • QC bins shown: 1. DSSTox_High, 2. DSSTox_Low, 3. Public_High, 4. Public_Medium, 5. Public_Low, 6. Public_Untrusted, 7. Incomplete • Figure labels: 7269, 16K, 33K, 101K, 590K; ~310K pending; ~150K pending • ~150K validated substances in top 4 QC quality bins • Now at >1.2 MILLION substances
  • 30. A little more about our data quality
  • 31. Navigating data via the Left Hand Tabs
  • 32. Experimental and Predicted Data • Physchem and Fate & Transport experimental and predicted data • Data can be downloaded as Excel, TSV and CSV files • Predictions: multiple algorithms • EPI Suite: Estimation Program Interface • ACD/Labs (commercial) • TEST: Toxicity Estimation Software Tool • OPERA: OPEn structure–activity/property Relationship App
  • 33. Chemical Hazard Data ToxVal Database • >50k chemicals • >770k tox. values • >30 sources of data • ~5k journals cited • ~70k citations
  • 35. Identifiers Support Searches in other systems
  • 36. More About CASRNs • CASRNs are very useful, but still limited • Not every chemical has a STRUCTURE…substances vs structures • “Chemical Abstracts Service” – numbers don’t exist until substances are abstracted and indexed • Not every chemical on the dashboard necessarily has a CASRN – how would you find those that don’t??? Hint: Search NOCAS_ • There are ~6000 chemicals without a CASRN on the dashboard • A chemical can also have many deleted CASRNs
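One practical safeguard when collecting CASRNs from uncurated sources is the built-in check digit: the last digit equals the weighted sum of the preceding digits, counted from the right with weights 1, 2, 3 and so on, modulo 10. The sketch below applies that rule. The function names are hypothetical, and the assumption that dashboard placeholders are written as "NOCAS_" followed by digits is taken only from the search hint on this slide.

```python
import re

def is_valid_casrn(casrn):
    """Check the format and the check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == check

def is_nocas_placeholder(identifier):
    """Assumed placeholder form for substances with no CASRN."""
    return re.fullmatch(r"NOCAS_\d+", identifier) is not None

print(is_valid_casrn("50-32-8"))   # benzo(a)pyrene
print(is_valid_casrn("50-32-9"))   # same number with a wrong check digit
print(is_nocas_placeholder("NOCAS_12345"))  # made-up placeholder value
```

A passing check digit only shows that the number is well formed. It does not show that the number belongs to the chemical it is attached to, which is why curated sources still matter.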
  • 37. Products Searching What chemicals are in hair care products?
  • 38. Let’s Talk Exposure • Types of Exposure Data on the Dashboard • Consumer product categories and uses • Products containing the chemical • Predicted exposure levels from modeling (more in next session)
  • 39. Sources of Exposure to Chemicals
  • 40. QSAR modeling • What do you trust more? Experimental or predicted data? • Do you trust individual models or consensus models • What if there are no experimental data, how good are predictions?
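To make the individual-versus-consensus question concrete, the sketch below combines several model outputs into a simple consensus and compares it with a measured value. The three predictions and the model labels are invented for illustration only. The experimental LogKow of 6.13 is the benzo(a)pyrene value quoted earlier in the slides. The mean, median and range used here are generic statistics, not the method used by any specific tool.

```python
import statistics

# Hypothetical LogKow predictions from three unnamed models (not real output).
predictions = {"model_a": 5.9, "model_b": 6.1, "model_c": 6.4}
experimental = 6.13  # LogKow for benzo(a)pyrene as quoted in the slides

values = list(predictions.values())
consensus_mean = statistics.mean(values)
consensus_median = statistics.median(values)
spread = max(values) - min(values)  # a crude indicator of model disagreement

print(f"mean={consensus_mean:.2f} median={consensus_median:.2f} spread={spread:.2f}")
for name, value in predictions.items():
    print(f"{name}: error vs experiment = {value - experimental:+.2f}")
print(f"consensus: error vs experiment = {consensus_mean - experimental:+.2f}")
```

A wide spread between models is a useful warning sign when no experimental value exists to check against.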
  • 41. Data Curation Pipelines plus Manual Curation Processes
  • 42. Property and Fate and Transport Data ~25 MILLION pre-predicted values • We have built QSPR models based on tens of thousands of property data points curated over the past decade • We push our “QSAR-Ready” chemical structures through predictions to produce property predictions
  • 45. Similar reports for TEST predictions
  • 49. What’s the best way to search the internet for chemical data? • We know how complex chemicals identifiers are… • CASRN(s) • Hundreds of names (maybe) • SMILES • InChIs • EINECS, EC numbers • What can WE do to help you navigate the internet?
  • 50. Identifiers are used in the app • Identifiers are used to feed and link into “Literature”
  • 51. Literature Searching • Real-time retrieval of data from PubMed (~30 million abstracts and growing) • Choose from set of pre-defined queries • Adjust and fine tune queries based on interests
  • 52. Literature Searching • “Sifting” of results using multiple terms • Frequency counting terms • Color highlighting of terms • Download list to Excel • Send list to PubMed for downloading ref. file • Direct link via PubMed ID
  • 53. External Links – Also use Identifiers Names, CASRN, PubChem IDs, InChIs…
  • 54. External Links • Links to ~90 websites providing access to additional data on the chemical of interest
  • 56. PFAS lists of Chemicals
  • 57. Curated List of Pesticides • Find list of interest • Select list and send to batch
  • 58. Batch Searching • Singleton searches are great but… • …we generally want data on LOTS of chemicals! • Typical questions • What are the structures for a set of chemical names? Set of CASRNs? • Can I get chemical lists in Excel files? As a list of SMILES strings? Can I get an SDF file? • Can I include predicted properties in the download file? OPERA? TEST? • Are “these chemicals” screened in ToxCast? • I’m a mass spectrometrist and need masses and formulae for a list of chemicals
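Batch input lists are often a mix of names, CASRNs and other identifiers, so it can help to sort them by type before pasting them into a batch search. The sketch below is a hypothetical pre-processing step, not part of the dashboard. The patterns check format only, and "DTXSID0000000" is a format placeholder rather than a real record.

```python
import re

# Format-only patterns; anything that matches none of them is treated as a name.
PATTERNS = [
    ("dtxsid", re.compile(r"DTXSID\d+")),
    ("casrn", re.compile(r"\d{2,7}-\d{2}-\d")),
    ("inchikey", re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")),
]

def classify(identifier):
    """Return the identifier type suggested by its format."""
    text = identifier.strip()
    for kind, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return kind
    return "name"

def sort_for_batch(identifiers):
    """Group a mixed identifier list by type, preserving input order."""
    batches = {}
    for item in identifiers:
        batches.setdefault(classify(item), []).append(item.strip())
    return batches

mixed = ["50-32-8", "FMMWHPNWAFZXNH-UHFFFAOYSA-N", " Atrazine ", "DTXSID0000000"]
batches = sort_for_batch(mixed)
for kind, items in batches.items():
    print(kind, items)
```

Each group can then be submitted with the matching input type selected, which avoids a CASRN being interpreted as a name or the reverse.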
  • 59. Access data en masse for thousands of chemicals….
  • 60. Select Output Format and Content
  • 62. Summary and Conclusion • CompTox Chemicals Dashboard: a central hub for environmental data • ~900k chemical substances • Integrating property data, hazard data, exposure data, in vitro bioactivity data • Interrogation of bioactivity data • Multiple types of searches • Batch search for thousands of chemicals • Real-time property and toxicity predictions • Downloadable files – CSV, TSV and Excel
  • 63. References • The CompTox Chemistry Dashboard: a community data resource for environmental chemistry, J. Cheminformatics, 9, 61 (2017) • EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research, Comp. Tox. 12, 100096 (2019) • OPERA models for predicting physicochemical properties and environmental fate endpoints, J. Cheminformatics, 10, 10 (2018) • Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model, Environ. Sci. Technol. 49, 8804-8814 (2015) • ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology, Chem. Res. Toxicol. 29, 1225-51 (2016) • Development and Validation of a Computational Model for Androgen Receptor Activity, Chem. Res. Toxicol. 30, 946-964 (2017) • CERAPP: Collaborative Estrogen Receptor Activity Prediction Project, Environ. Health Perspect. 124, 1023 (2016) • Abstract Sifter: a comprehensive front-end system to PubMed, F1000, 6, 2164 (2017)
  • 64. You want to know more… • Lots of resources available • Presentations: https://tinyurl.com/w5hqs55 • Communities of Practice Videos: https://rb.gy/qsbno1 • Manual: https://rb.gy/4fgydc • Latest News: https://comptox.epa.gov/dashboard/news_info
  • 65. Acknowledgments • Contact: Williams.Antony@epa.gov • Feedback and follow-up are welcome! Your questions help • The dashboard is based on the efforts of many more team members than us. Many collaborators also provide data. EPA’s Center for Computational Toxicology and Exposure

Editor's Notes

  1. Quoted: “Two neologisms, ‘chemoinformatics’ and ‘cheminformatics’, presently occur with near-equal frequency, as a search in the database of the Chemical Abstracts Service reveals. In the following the term ‘chemoinformatics’ will be used without providing extensive justification but to show the linguistic relation to bioinformatics.” The definition is a summary of Engel, although this is a very broad topic. Historically, mostly applied in drug design; now increasingly used in toxicology as well. (NB: searching data includes similarity, as we will look at in this presentation.)
  2. https://en.wikipedia.org/wiki/Flutamide. Numbers of atoms. Properties from the EPA dashboard.
  3. Because we can. https://en.wikipedia.org/wiki/Flutamide. NB: we cannot store the chemical structure as it is written. MKXKFYHWDHIYRV-UHFFFAOYSA-N is the InChIKey. Those of us who remember the world before SMILES still appreciate its beauty! Many methods make chemical structures machine readable. Convenience: no need to go through paper files. Sustainability: (in theory at least) permanent storage. Transferability: (in theory at least) anyone in the world can view and retrieve data.
  4. Because we can. https://en.wikipedia.org/wiki/Flutamide. Many methods make chemical structures machine readable. Convenience: no need to go through paper files. Sustainability: (in theory at least) permanent storage. Transferability: (in theory at least) anyone in the world can view and retrieve data.