Center for Computational Toxicology and Exposure, US-EPA, RTP, NC
http://www.orcid.org/0000-0002-2668-4821
How to Place Your Research Question or Results into the Context of Legacy Data
The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA
Antony John Williams
williams.antony@epa.gov
My background…
• Worked in
• Government lab as postdoc
• Academia as NMR Facility Manager
• Fortune 500 Company as NMR Tech Leader
• Small start-up company as product manager, marketing manager,
Chief Science Officer
• Consultant – chemistry informatics industry
• Company owner – created ChemSpider
• Royal Society of Chemistry (bought ChemSpider)
• US-EPA – cheminformatician and “connector”
Who am I today?
https://orcid.org/0000-0002-2668-4821
• Computational chemist at the US-EPA – scientist
• Responsibility for cheminformatics projects, internal & external collaborations, “product marketing” – cheminformatician
• Work with a team of people developing software solutions – “product & project manager”
• Scientific publications, books, blogger – author; I am @ChemConnector – social networker
Abbreviations
• CompTox - Computational Toxicology
• DSSTox - Distributed Structure-Searchable Toxicity Database
• CASRN - Chemical Abstracts Service Registry Number
• InChI - International Chemical Identifier
• QMRF - QSAR Model Reporting Format
• ToxVal - Toxicity Value Database
• OPERA - OPEn structure–activity/property Relationship App
• TEST - Toxicity Estimation Software Tool
• ToxCast - Toxicity Forecaster
• SDF - Structure data file
Learning Objectives
• A very short overview of cheminformatics focused on
• Chemical identifiers and some associated challenges
• Molecular fingerprints
• Molecular similarity
• Structure-based modeling (QSAR/QSPR/QSUR)
• An overview of the CompTox Chemicals Dashboard and how it can help to:
• Search, source, visualize and download data for singleton or thousands of chemicals
• Perform real-time prediction calculations and read-across
• Navigate into dozens of other online resources that contain additional data
Problem: Too Many Chemicals and Too Few Resources
• Fast characterization of human and ecological risk posed by
existing and emerging chemicals is a critical challenge
• Chemistry never stops, but the available data are sparse and distributed…
[Figure: bar chart “Data for Environmental Chemicals”. Y-axis: percent of chemicals (0 to 70). Categories: Acute, Gentox, Cancer, Dev Tox, EDSP Tier 1, Repro Tox; one bar is labeled <1%. Modified from Judson et al., EHP 2010]
CAS REGISTRY® contains more than
171 million unique organic and
inorganic chemical substances,
such as alloys, coordination
compounds, minerals, mixtures,
polymers and salts, and more than
68 million protein and DNA sequences
Solution
• Develop a “first-stop-shop” for environmental chemical data
to support EPA and partner decision making:
– Centralized location for relevant chemical data
– Chemistry, exposure, hazard and dosimetry
– Combination of existing data and predictive models
– Publicly accessible, periodically updated, curated
• Easy access to data improves efficiency and ultimately
accelerates chemical risk assessment
Cheminformatics and the
Dashboard
• Cheminformatics is the application of computer science and
informatics-based approaches to:
• Represent chemical structures, substances and reactions
• Store chemistry-related data
• Search for chemistry-related data
• Model data sets to provide predictive capabilities
• Visualize and analyse chemistry-related data
• The US-EPA uses cheminformatics (and bioinformatics) to
manipulate, integrate, store, model and deliver access to our data.
The CompTox Chemicals Dashboard is built on a solid
cheminformatics foundation
Types of Chemical Identifiers
• Structural Identifiers
• The visual depiction
• Multiple electronic formats
• InChIKey: FMMWHPNWAFZXNH-UHFFFAOYSA-N
• Common Name: Benzo(a)pyrene
• Systematic Name: Benzo[pqr]tetraphene
• CAS Registry Number(s) : 50-32-8
• Lots of other “common names and trade
names”
Information Associated with a Chemical Structure?
INTRINSIC PROPERTIES
• Formula : C20H12
• Molecular weight: 252.316 g/mol
• Monoisotopic Mass: 252.093900 g/mol
MEASURED PROPERTIES
• LogKow 6.13
• Melting Pt 177°C
• Boiling Pt 485°C
• ….and many more
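The intrinsic properties listed above follow directly from the formula. A minimal Python sketch, assuming standard atomic masses for carbon and hydrogen only (the table and function names are illustrative and are not part of the Dashboard):

```python
import re

# Atomic masses in g/mol; only the two elements needed for C20H12 are listed.
AVERAGE_MASS = {"C": 12.011, "H": 1.008}
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503207}


def parse_formula(formula):
    """Return element counts for a simple formula such as 'C20H12'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula, mass_table):
    """Sum the per-element masses for every atom in the formula."""
    return sum(mass_table[el] * n for el, n in parse_formula(formula).items())


if __name__ == "__main__":
    print(round(formula_mass("C20H12", AVERAGE_MASS), 3))
    print(round(formula_mass("C20H12", MONOISOTOPIC_MASS), 6))
```

The two calls reproduce the molecular weight and monoisotopic mass quoted for benzo(a)pyrene. The sketch does not handle parentheses, charges or isotope labels in formulae.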
How to Store a Chemical Structure
• Multiple approaches:
• Names and identifiers
• 2D or 3D structure “molfile”
• SMILES:
• c1cc2c3ccc4cccc5ccc(cc2cc1)c3c45
• C1=CC2=CC3=CC=C4C=CC=C5C=CC(=C2C=C1)C3=C45
• and many other variants….
• InChI=1S/C20H12/c1-2-7-17-15(4-1)12-16-9-8-13-5-3-6-14-10-11-18(17)20(16)19(13)14/h1-12H
• InChIKey: FMMWHPNWAFZXNH-UHFFFAOYSA-N
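An InChI string is layered: a version, then the formula, then layers introduced by a one-letter prefix. A rough Python sketch of splitting the benzo(a)pyrene InChI above into its layers (the function name is hypothetical; it assumes a formula layer is present and that no layer prefix repeats):

```python
def inchi_layers(inchi):
    """Split an InChI string into version, formula and prefixed layers."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    parts = inchi[len("InChI="):].split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        # Each remaining layer starts with a one-letter prefix, e.g. 'c' or 'h'.
        layers[part[0]] = part[1:]
    return layers


if __name__ == "__main__":
    bap = ("InChI=1S/C20H12/c1-2-7-17-15(4-1)12-16-9-8-13-5-3-6-14-10-11-"
           "18(17)20(16)19(13)14/h1-12H")
    for name, value in inchi_layers(bap).items():
        print(name, value)
```

Here the "c" layer carries the atom connectivity and the "h" layer the hydrogen positions, which is why the formula can be read straight out of the identifier.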
If We Database Chemical Structures…
• …then we can search the dataset by inherent structural properties
• Formula
• Mass
• Substructure
• Structural similarity
• …we can integrate other info into the database for retrieval
• …available data, both experimental and predicted, is a click away
• …data can be downloaded, distributed and shared
• …linking out to other resources enabled by adopting specific standards
• …structure collections, with associated data, are available for modeling
CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
SEARCH
TOX DATA
BIOACTIVITY
SIMILARITY
READ-ACROSS
PUBMED
BATCH SEARCH
CompTox Chemicals Dashboard
883k Chemical Substances
BASIC Search
• Type ahead search using Names,
synonyms and CASRNs
• Millions of identifiers
• Substring search
Search for classes of chemicals
• Examples: “perfluoro”
Search for classes of chemicals
Examples:
• PCBs (or polychlorinated biphenyls)
• Polycyclic aromatic hydrocarbons
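The substring search described above can be pictured as a case-insensitive "contains" filter over a name index. A small Python sketch (the name list is a hypothetical stand-in for the Dashboard's millions of identifiers, and the function is not the Dashboard's implementation):

```python
def substring_search(query, names):
    """Return the names that contain the query, ignoring case."""
    needle = query.lower()
    return [name for name in names if needle in name.lower()]


# Hypothetical mini-index of chemical names.
NAMES = [
    "Ketoconazole",
    "Fluconazole",
    "Propiconazole",
    "Atrazine",
    "Perfluorooctanesulfonic acid",
    "Perfluorooctanoic acid",
]

if __name__ == "__main__":
    print(substring_search("conazole", NAMES))
    print(substring_search("perfluoro", NAMES))
```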
QUESTION 1
• How many “conazoles” are in the dashboard?
29 56 125 321
Challenges with Nomenclature
• Be CAREFUL with names! There is a LOT of confusion in the
public domain. CHOOSE sources wisely!
• There are MANY public databases but not many are curated
• All public databases have value but not many curate data
• Example: METHANE on PubChem
https://pubchem.ncbi.nlm.nih.gov/compound/297
CAS Registry Numbers on PubChem
CASRN lookup on the dashboard
Methane is Diamond and Nanotubes?
• These are all Depositor Names for Methane 
Detailed Chemical Pages
One more identifier – the DTXSID
• Chemical page: Wikipedia snippet when available, intrinsic
properties, structural identifiers, linked substances
DTXSID as a substance identifier
• Blessed now by Wikipedia
• In MANY public databases – ChemSpider, PubChem, eChemPortal
• Increasingly used across all EPA databases
• Easy to use/search
• URL linking: https://comptox.epa.gov/dashboard/DTXSID0020737
• ANY database can link directly via DTXSIDs…more later..
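Linking by DTXSID amounts to appending the identifier to the Dashboard base URL shown above. A minimal Python sketch, assuming a DTXSID is the prefix "DTXSID" followed by digits (the validation pattern and function name are assumptions):

```python
import re

DASHBOARD_BASE = "https://comptox.epa.gov/dashboard/"


def dashboard_url(dtxsid):
    """Build a chemical-page URL, assuming 'DTXSID' followed by digits."""
    if not re.fullmatch(r"DTXSID\d+", dtxsid):
        raise ValueError(f"not a DTXSID: {dtxsid!r}")
    return DASHBOARD_BASE + dtxsid


if __name__ == "__main__":
    print(dashboard_url("DTXSID0020737"))
```

Any external database that stores the DTXSID alongside its own record can generate such links without further lookups.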
Detailed Chemical Pages
Easy Navigation
• Chemical page: Wikipedia snippet when available, intrinsic
properties, structural identifiers, linked substances
From the Chemical Details Page…
all chemicals with same FORMULA
How many chemicals are associated
through LINKED SUBSTANCES?
• Atrazine is a herbicide found in MANY commercial products
• The dashboard has salt forms, isotopically labelled forms,
multicomponent forms
• How do we identify what they are???
A little more about the InChI
• An InChIKey is made up of two main blocks (plus a final protonation character)…
• Block 1 – “the connectivity” of atoms and bonds
• Block 2 – isotopes, charge, stereo
• The InChIKey is VERY USEFUL
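One reason the InChIKey is so useful: records sharing the first block share the same connectivity, so they can be grouped with plain string handling. A Python sketch, assuming the usual layout of 14 + 10 + 1 uppercase letters separated by hyphens (function names are illustrative; the third key in the example is included only to show grouping):

```python
import re
from collections import defaultdict

INCHIKEY_PATTERN = re.compile(r"([A-Z]{14})-([A-Z]{10})-([A-Z])")


def split_inchikey(key):
    """Return (connectivity block, second block, final character)."""
    match = INCHIKEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"not an InChIKey: {key!r}")
    return match.groups()


def group_by_skeleton(keys):
    """Group InChIKeys that share the same first (connectivity) block."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


if __name__ == "__main__":
    keys = [
        "FMMWHPNWAFZXNH-UHFFFAOYSA-N",  # benzo(a)pyrene
        "HVYWMOMLDIMFJA-DPAQBDIFSA-N",  # cholesterol
        "HVYWMOMLDIMFJA-UHFFFAOYSA-N",  # illustrative: same skeleton, different second block
    ]
    for skeleton, members in group_by_skeleton(keys).items():
        print(skeleton, len(members))
```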
Searching using InChI
• Demo an internet search using InChIs – Cholesterol has the
InChIKey: HVYWMOMLDIMFJA-DPAQBDIFSA-N
• Demo Atrazine – Linked Substances – Skeleton
• More about Linked Substances….
Linked Substances – more interesting
• We map chemicals together using cheminformatics approaches
• Desalting, removal of stereochemistry, splitting of multicomponent substances, etc. are used to map chemicals together
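As a very rough illustration of splitting multicomponent substances and desalting, the sketch below works on SMILES strings only: components are separated by "." and the one with the most atom tokens is kept. This is a simplified heuristic under stated assumptions, not the procedure used to build Linked Substances; a real workflow would use a cheminformatics toolkit.

```python
import re

# One atom token in a SMILES string: a bracket atom, a two-letter halogen,
# or a one-letter organic-subset atom (aliphatic or aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def split_components(smiles):
    """Split a multicomponent SMILES on the '.' separator."""
    return smiles.split(".")


def atom_token_count(component):
    """Approximate heavy-atom count (an explicit [H] would also be counted)."""
    return len(ATOM_TOKEN.findall(component))


def largest_component(smiles):
    """Crude 'desalting': keep the component with the most atom tokens."""
    return max(split_components(smiles), key=atom_token_count)


if __name__ == "__main__":
    # Sodium acetate: the acetate anion is kept, the sodium counter-ion dropped.
    print(largest_component("CC(=O)[O-].[Na+]"))
```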
Atrazine Linked Substances
QUESTION 2
• How many salts of perfluorooctanesulfonic acid are there?
6 15 18 43
A little more about our data quality
• Five full-time curators register and curate data to elevate quality
Underneath the Dashboard
[Figure: DSSTox_v2, “Distribution of curated data” across Curated and Public QC levels. Level labels: 1. High, 2. Low, 3. High, 4. Med, 5. Low, 6. Untrusted, 7. Incomplete. Bin names: DSSTox_High, DSSTox_Low, Public_High, Public_Medium, Public_Low, Public_Untrusted. Counts shown: 7269; 16K; 33K; 101K; 590K; ~310K pending; ~150K pending. ~150K substances in top 4 QC quality bins. Now at >910k substances]
Record Information Quality Flags
ChemReg Curation
The ALANWOOD Pesticide Set
Navigating data via the Left Hand Tabs
“Executive Summary”
• Overview of toxicity-related info
• Quantitative values
• Info re. toxicology subsets
• Physchem. and Fate & Transport
• Adverse Outcome Pathway links
• In vitro bioactivity summary plot
Experimental and Predicted Data
• Physchem and Fate & Transport
experimental and predicted data
• Data can be downloaded as Excel, TSV
and CSV files
• Predictions: multiple algorithms
• EPI Suite: Estimation Program Interface
• ACD/Labs (commercial)
• TEST: Toxicity Estimation Software Tool
• OPERA: OPEn structure–activity/property Relationship App
Chemical Hazard Data
ToxVal Database
• >50k chemicals
• >770k tox. values
• >30 sources of data
• ~5k journals cited
• ~70k citations
Let’s talk about Export Formats
• Anywhere you see a table you can export
• CSV is great for integration with other applications (plus read into Excel)
• Excel file is generally the best for “viewing” as it can have multiple
worksheets, color flagging of cells and offers all
• If you have cheminformatics tools, SDF files are the best: they concatenate “molfiles”, so structures can be viewed directly
• Some examples… download file(s)
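To make the SDF point concrete: an SDF is a sequence of molfiles, each optionally followed by "> <FIELD>" data items and closed by a "$$$$" line. A minimal Python reader sketch, using a toy two-record file (the field name CASRN and the single-atom records are illustrative assumptions, not an actual Dashboard export):

```python
import re

SAMPLE_SDF = """methane
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <CASRN>
74-82-8

$$$$
water
  hypothetical

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
>  <CASRN>
7732-18-5

$$$$
"""


def parse_sdf(sdf_text):
    """Split SDF text into records, each with a molfile and its data fields."""
    records = []
    for chunk in sdf_text.split("$$$$"):
        if not chunk.strip():
            continue
        if chunk.startswith("\n"):
            chunk = chunk[1:]  # drop the newline that followed the previous '$$$$'
        lines = chunk.splitlines()
        end = lines.index("M  END")  # raises ValueError if the molfile is unterminated
        fields, name = {}, None
        for line in lines[end + 1:]:
            header = re.match(r">.*<(.+)>", line)
            if header:
                name = header.group(1)
                fields[name] = []
            elif line.strip() and name is not None:
                fields[name].append(line)
            else:
                name = None  # a blank line closes the current data item
        records.append({
            "molfile": "\n".join(lines[: end + 1]),
            "fields": {key: "\n".join(value) for key, value in fields.items()},
        })
    return records


if __name__ == "__main__":
    for record in parse_sdf(SAMPLE_SDF):
        print(record["molfile"].splitlines()[0], record["fields"])
```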
Advantages of Open Data
• By exchanging data between databases you get to use other people’s work
• Linkages between systems (more later)
• Calling “APIs and Web Services” on the fly….
• SHOW EXAMPLES
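As one example of calling a web service on the fly, the sketch below builds a PubChem PUG REST property request for the methane record mentioned earlier (compound 297). It illustrates the general idea only and is not a description of how the Dashboard itself exchanges data; the function names are hypothetical, and the fetch function needs network access.

```python
import json
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(cid, properties):
    """Build a PUG REST URL requesting the given properties for one compound."""
    return f"{PUG_REST}/compound/cid/{cid}/property/{','.join(properties)}/JSON"


def fetch_properties(cid, properties, timeout=30):
    """Call the service and return the decoded JSON (requires network access)."""
    with urllib.request.urlopen(property_url(cid, properties), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_properties(...) to retrieve data.
    print(property_url(297, ["MolecularFormula", "InChIKey"]))
```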
Safety Data
Identifiers Support Searches in other
systems
QUESTION 3
• How many CAS Registry Numbers does Atrazine have?
3 5 7 12
More About CASRNs
• CASRNs are very useful, but still limited
• Not every chemical has a STRUCTURE…substances vs structures
• “Chemical Abstracts Service” – numbers don’t exist until chemicals are abstracted and indexed
• Not every chemical on the dashboard necessarily has a CASRN –
how would you find those that didn’t??? Hint: Search NOCAS_
• There are ~6000 chemicals without CASRN on the dashboard
• A chemical can also have many deleted CASRNs
• Where do you look for all identifiers for a chemical???
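One built-in safeguard worth knowing: the last digit of a CASRN is a check digit, computed from the other digits weighted 1, 2, 3, ... from right to left, modulo 10. A short Python sketch (the function name is illustrative):

```python
import re


def is_valid_casrn(casrn):
    """Check the format and check digit of a CAS Registry Number."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


if __name__ == "__main__":
    print(is_valid_casrn("50-32-8"))   # benzo(a)pyrene CASRN from the earlier slide
    print(is_valid_casrn("50-32-7"))   # same digits with a wrong check digit
```

For 50-32-8 the sum is 2×1 + 3×2 + 0×3 + 5×4 = 28, and 28 mod 10 = 8, which matches the final digit. A passing check digit only shows that the number is well formed, not that it belongs to the chemical in question, and placeholder identifiers beginning with NOCAS_ fail the format test by design.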
Products Searching
What chemicals are in hair care products?
Let’s Talk Exposure
• Types of Exposure Data on the Dashboard
• Consumer product categories and uses
• Products containing the chemical
• Predicted exposure levels from modeling (more in next session)
Sources of Exposure to Chemicals
QUESTION 4
• How many consumer products are listed as containing chlorothalonil?
45 67 138 203
About exposure data
• Not every chemical has associated categories and uses
• MODELING data depends on QSAR predictions (structures)
• Data gathering continues unabated and has interesting sources
While we are discussing QSAR modeling
• What do you trust more? Experimental or predicted data?
• Do you trust individual models or consensus models?
• What if there are no experimental data, how good are predictions?
• This will be covered a little in my next presentation and in way
more detail in a later session
QUESTION 5
• What is the LogKow for fluconazole?
~0.2 ~0.3 ~0.5 ~3.4
How do we gather data for models
• We are continuously gathering data…where from?
• How do we validate?
• What can we check?
• Projects underway at present that may be of interest
• Water Solubility dataset
• Mass Spec Amenability
• Eye and Skin Sensitization and Irritation
• PFAS chemicals
QUESTION 6
• What is the brand name for Ketoconazole? (There may be more than
one)
Glyphosate Bisphenol A Cialis Nizoral
What’s the best way to search the
internet for chemical data?
• We know how complex chemical identifiers are…
• CASRN(s)
• Hundreds of names (maybe)
• SMILES
• InChIs
• EINECS, EC numbers
• What can WE do to help you navigate the internet?
Identifiers are used in the app
• Identifiers are used to feed and link into “Literature”
Literature Searching
• Real-time retrieval of data from PubMed (~30 million abstracts and growing)
• Choose from set of pre-defined queries
• Adjust and fine tune queries based on interests
Literature Searching
• “Sifting” of results using
multiple terms
• Frequency counting terms
• Color highlighting of terms
• Download list to Excel
• Send list to PubMed for
downloading ref. file
• Direct link via PubMed ID
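The "sifting" idea above, counting how often chosen terms occur and ranking results accordingly, can be sketched in a few lines of Python. The abstracts and their IDs below are made up for illustration, the functions handle single-word terms only, and this is not the Abstract Sifter's actual code:

```python
import re
from collections import Counter


def term_counts(text, terms):
    """Count whole-word, case-insensitive occurrences of each term."""
    words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {term: words[term.lower()] for term in terms}


def sift(abstracts, terms):
    """Rank (id, text) pairs by total term hits, dropping those with no hits."""
    scored = [(sum(term_counts(text, terms).values()), ref_id)
              for ref_id, text in abstracts]
    ranked = sorted(scored, key=lambda item: -item[0])
    return [ref_id for score, ref_id in ranked if score > 0]


if __name__ == "__main__":
    # Hypothetical abstracts with placeholder IDs (not real PubMed IDs).
    abstracts = [
        ("A", "Atrazine exposure and toxicity in fish"),
        ("B", "Synthesis of polymers"),
        ("C", "Toxicity, toxicity and more atrazine"),
    ]
    print(sift(abstracts, ["atrazine", "toxicity"]))
```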
External Links – Also use Identifiers
Names, CASRN, PubChem IDs, InChIs…
External Links
• Links to ~90 websites providing access to
additional data on the chemical of interest
QUESTION 7
• Searches can be done for CLASSES of Chemicals too
• How many explicit chemicals are included in the class of
chemicals “polychlorinated biphenyls”?
302 211 209 104
Some example classes where we
map data
• Classes of chemicals – what do we have?
• PAHs, PCBs, PBDEs etc…
• Strings and Substrings
• InChI “first block” for skeleton
• Other “classes”
• Similar chemicals
• Related Substances
• Lists of chemicals
Similarity Searching
QUESTION 8
• How many chemicals are “similar” in structure to ketoconazole
with a threshold of >0.8 similarity (based on Tanimoto score)?
80 800 63 8
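The Tanimoto score behind this question compares two molecular fingerprints as the number of shared "on" bits divided by the number of bits set in either. A minimal Python sketch with fingerprints represented as sets of bit positions (the toy fingerprints are hypothetical, and the fingerprint type used by the Dashboard is not specified here):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0
    return len(bits_a & bits_b) / union


def similar(query, library, threshold=0.8):
    """Return names of library entries scoring above the threshold."""
    return [name for name, bits in library.items()
            if tanimoto(query, bits) > threshold]


if __name__ == "__main__":
    query = set(range(1, 11))                 # toy fingerprint: bits 1..10
    library = {
        "close analogue": set(range(1, 10)),  # shares 9 of 10 bits
        "distant chemical": {1, 2},           # shares 2 of 10 bits
    }
    print(similar(query, library))
```

Raising or lowering the threshold trades the number of hits against how closely they resemble the query structure.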
Related Substances
Chemical Lists and
Categories
Example: AEGLs list
PFAS lists of Chemicals
QUESTION 9
• How many chemicals are in the EPA Toxics Release Inventory
based on the associated list available on the dashboard?
666 677 766 777
Curated List of Pesticides
• Find list of interest
• Select list and send to batch
Batch Searching
• Singleton searches are great but…
• …we generally want data on LOTS of chemicals!
• Typical questions
• What are the structures for a set of chemical names? Set of CASRNs?
• Can I get chemical lists in Excel files? As a list of SMILES strings?
Can I get an SDF file?
• Can I include predicted properties in the download file? OPERA?
TEST?
• Are “these chemicals” screened in ToxCast?
• I’m a mass spectrometrist and need masses and formulae for a list of
chemicals
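Before submitting a mixed list to a batch search, it can help to sort the inputs by identifier type. A Python sketch using simple patterns (the patterns are assumptions about typical formats, anything unmatched is treated as a name, and the function names are illustrative):

```python
import re

# Ordered (label, pattern) pairs; the first pattern that matches wins.
PATTERNS = [
    ("DTXSID", re.compile(r"DTXSID\d+")),
    ("InChIKey", re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")),
    ("CASRN", re.compile(r"\d{2,7}-\d{2}-\d")),
]


def classify(identifier):
    """Guess the identifier type; anything unmatched is treated as a name."""
    text = identifier.strip()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "name"


def group_batch(identifiers):
    """Group a mixed input list by guessed identifier type."""
    groups = {}
    for identifier in identifiers:
        groups.setdefault(classify(identifier), []).append(identifier.strip())
    return groups


if __name__ == "__main__":
    batch = ["50-32-8", " DTXSID0020737", "FMMWHPNWAFZXNH-UHFFFAOYSA-N", "atrazine"]
    print(group_batch(batch))
```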
Access data en masse for thousands of chemicals….
Select Output Format and Content
Batch Search CASRNs
Send to batch and select….
• A few seconds to assemble
• ToxCast data - #actives/#assays and % active
• # articles in PubMed
• Links to IRIS or PPRTV reports
• TEST or OPERA predictions
• Exposure data: predictions and CPDat
QUESTION 10
• How many chemicals are in the Tox21 Screening library, based on the
associated dashboard list?
4987 6587 8947 7632
• What is the Tox21 Screening Library?? How is it valuable? NEXT TIME
In the next session we discuss NAMs
Summary and Conclusion
• CompTox Chemicals Dashboard - a
central hub for environmental data
• ~875k chemical substances
• Integrating property data, hazard data,
exposure data, in vitro bioactivity data
• Interrogation of bioactivity data
• Multiple types of searches
• Batch search for thousands of chemicals
• Real-time property and toxicity predictions
• Downloadable files – CSV, TSV and Excel
Summary and Conclusions
• The Dashboard is one of many
applications on the internet from
which you can source data
• It is not difficult to do a Google
search and get some form of
answer
• Always question the data, and
sometimes even “facts”
• We work hard to qualify, curate
and validate data but obtain some
from public sources
If you find an error, or want to comment…
Select text and “Submit Comment”
References
• The CompTox Chemistry Dashboard: a community data resource for environmental
chemistry, J. Cheminformatics, 9, 61 (2017)
• EPA’s DSSTox database: History of development of a curated chemistry resource
supporting computational toxicology research, Comp. Tox. 12, 100096 (2019)
• OPERA models for predicting physicochemical properties and environmental fate
endpoints, J. Cheminformatics, 10, 10 (2018)
• Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model,
Environ. Sci. Technol. 49, 8804-8814 (2015)
• ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology, Chem. Res.
Toxicol. 29, 1225-51 (2016)
• Development and Validation of a Computational Model for Androgen Receptor Activity,
Chem. Res. Toxicol. 30, 946-964 (2017)
• CERAPP: Collaborative Estrogen Receptor Activity Prediction Project, Environ. Health
Perspect. 124, 1023 (2016)
• Abstract Sifter: a comprehensive front-end system to PubMed, F1000, 6, 2164 (2017)
You want to know more…
• Lots of resources available
• Presentations: https://tinyurl.com/w5hqs55
• Communities of Practice Videos: https://rb.gy/qsbno1
• Manual: https://rb.gy/4fgydc
• Latest News: https://comptox.epa.gov/dashboard/news_info
Acknowledgments
• Contact: Williams.Antony@epa.gov
• Feedback and follow-up is
welcomed! Your questions help
• The dashboard is based on the
efforts of many more team
members than us. Many
collaborators provide data also.
EPA’s Center for Computational Toxicology and Exposure
EPA Dashboard Searches Chemicals, Assesses Risks

User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...Universidade Federal de Sergipe - UFS
 

Recently uploaded (20)

User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)User Guide: Orion™ Weather Station (Columbia Weather Systems)
User Guide: Orion™ Weather Station (Columbia Weather Systems)
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)Functional group interconversions(oxidation reduction)
Functional group interconversions(oxidation reduction)
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 GenuineCall Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
Call Girls in Majnu Ka Tilla Delhi 🔝9711014705🔝 Genuine
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
REVISTA DE BIOLOGIA E CIÊNCIAS DA TERRA ISSN 1519-5228 - Artigo_Bioterra_V24_...
 

EPA Dashboard Searches Chemicals, Assesses Risks

  • 1. Center for Computational Toxicology and Exposure, US-EPA, RTP, NC http://www.orcid.org/0000-0002-2668-4821
  • 3. Center for Computational Toxicology and Exposure, US-EPA, RTP, NC http://www.orcid.org/0000-0002-2668-4821 How to Place Your Research Question or Results into the Context of Legacy Data The views expressed in this presentation are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA Antony John Williams williams.antony@epa.gov
  • 4. My background… • Worked in • Government lab as postdoc • Academia as NMR Facility Manager • Fortune 500 Company as NMR Tech Leader • Small start-up company as product manager, marketing manager, Chief Science Officer • Consultant – chemistry informatics industry • Company owner – created ChemSpider • Royal Society of Chemistry (bought ChemSpider) • US-EPA – cheminformatician and “connector”
  • 5. Who am I today? https://orcid.org/0000-0002-2668-4821 • Computational chemist at the US-EPA – scientist • Responsibility for cheminformatics projects, internal & external collaborations, “product marketing”– cheminformatician • Work with a team of people developing software solutions – “product & project manager” • Scientific publications, books, blogger – author; I am @ChemConnector – social networker
  • 6. Abbreviations • CompTox - Computational Toxicology • DSSTox - Distributed Structure-Searchable Toxicity Database • CASRN - Chemical Abstracts Service Registry Number • InChI - International Chemical Identifier • QMRF - QSAR Model Report Format • ToxVal - Toxicity Value Database • OPERA - OPEn structure–activity/property Relationship App • TEST - Toxicity Estimation Software Tool • ToxCast - Toxicity Forecaster • SDF - Structure data file
  • 7. Learning Objectives • A very short overview of cheminformatics focused on • Chemical identifiers and some associated challenges • Molecular fingerprints • Molecular similarity • Structure-based modeling (QSAR/QSPR/QSUR) • An overview of the CompTox Chemicals Dashboard and how it can help to: • Search, source, visualize and download data for singleton or thousands of chemicals • Perform real-time prediction calculations and read-across • Navigate into dozens of other online resources that contain additional data
  • 8. Problem: Too Many Chemicals and Too Few Resources • Fast characterization of human and ecological risk posed by existing and emerging chemicals is a critical challenge • Chemistry never stops. But there is sparse and distributed data… • [Bar chart: Data for Environmental Chemicals. Y-axis: Percent of Chemicals (0 to 70). Categories: Acute, Gentox, Cancer, Dev Tox, EDSP Tier 1, Repro Tox, with a "<1%" annotation. Modified from Judson et al., EHP 2010] • CAS REGISTRY® contains more than 171 million unique organic and inorganic chemical substances, such as alloys, coordination compounds, minerals, mixtures, polymers and salts, and more than 68 million protein and DNA sequences
  • 9. Solution • Develop a “first-stop-shop” for environmental chemical data to support EPA and partner decision making: – Centralized location for relevant chemical data – Chemistry, exposure, hazard and dosimetry – Combination of existing data and predictive models – Publicly accessible, periodically updated, curated • Easy access to data improves efficiency and ultimately accelerates chemical risk assessment
  • 10. Cheminformatics and the Dashboard • Cheminformatics is the application of computer science and informatics-based approaches to: • Represent chemical structures, substances and reactions • Store chemistry-related data • Search for chemistry related data • Model data sets to provide predictive capabilities • Visualize and analyse chemistry related data • The US-EPA uses cheminformatics (and bioinformatics) to manipulate, integrate, store, model and deliver access to our data. The CompTox Chemicals Dashboard is built on a solid cheminformatics foundation
  • 11. Types of Chemical Identifiers • Structural Identifiers • The visual depiction • Multiple electronic formats • InChI (Key): FMMWHPNWAFZXNH-UHFFFAOYSA-N • Common Name: Benzo(a)pyrene • Systematic Name: Benzo[pqr]tetraphene • CAS Registry Number(s): 50-32-8 • Lots of other “common names and trade names”
  • 12. Information Associated with a Chemical Structure? INTRINSIC PROPERTIES • Formula: C20H12 • Molecular weight: 252.316 g/mol • Monoisotopic Mass: 252.093900 g/mol MEASURED PROPERTIES • LogKow 6.13 • Melting Pt 177°C • Boiling Pt 485°C • ….and many more
  • 13. How to Store a Chemical Structure • Multiple approaches: • Names and identifiers • 2D or 3D structure “molfile” • SMILES: • c1cc2c3ccc4cccc5ccc(cc2cc1)c3c45 • C1=CC2=CC3=CC=C4C=CC=C5C=CC(=C2C=C1)C3=C45 • and many other variants…. • InChI=1S/C20H12/c1-2-7-17-15(4-1)12-16-9-8-13-5-3-6-14-10-11-18(17)20(16)19(13)14/h1-12H • InChIKey: FMMWHPNWAFZXNH-UHFFFAOYSA-N
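The intrinsic properties on slide 12 follow directly from the formula. As a minimal sketch (the function names and the small mass tables are illustrative assumptions, not Dashboard code), the average molecular weight and the monoisotopic mass of C20H12 can be recomputed like this:

```python
import re

# Average atomic weights and monoisotopic masses for a few common elements.
# Only a small illustrative table; extend it for other elements.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C20H12'.

    Brackets, charges and isotope labels are not handled in this sketch.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def formula_mass(formula, table):
    """Sum the per-element masses from the chosen table."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


if __name__ == "__main__":
    print(round(formula_mass("C20H12", AVERAGE_MASS), 3))       # molecular weight
    print(round(formula_mass("C20H12", MONOISOTOPIC_MASS), 4))  # monoisotopic mass
```

The two tables give different answers because the molecular weight averages over natural isotope abundance, while the monoisotopic mass uses only the most abundant isotope of each element, which is the value a mass spectrometrist searches on.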
  • 14. If We Database Chemical Structures… • …then we can search the dataset by inherent structural properties • Formula • Mass • Substructure • Structural similarity • …we can integrate other info into the database for retrieval • …available data, both experimental and predicted, is a click away • …data can be downloaded, distributed and shared • …linking out to other resources enabled by adopting specific standards • …structure collections, with associated data, are available for modeling
  • 15. CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard • SEARCH • TOX DATA • BIOACTIVITY • SIMILARITY • READ-ACROSS • PUBMED • BATCH SEARCH
  • 16. CompTox Chemicals Dashboard 883k Chemical Substances
  • 17. BASIC Search • Type ahead search using Names, synonyms and CASRNs • Millions of identifiers • Substring search
  • 18. Search for classes of chemicals • Examples: “perfluoro”
  • 19. Search for classes of chemicals Examples: • PCBs (or polychlorinated biphenyls) • Polycyclic aromatic hydrocarbons
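A substring search of the kind described on slides 17 to 19 can be sketched in a few lines. This is only an illustration over a tiny in-memory list built from names that appear in this talk; the Dashboard itself searches millions of identifiers and its implementation is not shown here.

```python
def substring_search(names, fragment):
    """Return every name containing the fragment, ignoring case."""
    fragment = fragment.lower()
    return [name for name in names if fragment in name.lower()]


# A toy name list, not a Dashboard export.
NAMES = ["Ketoconazole", "Fluconazole", "Atrazine", "Chlorothalonil", "Benzo(a)pyrene"]

if __name__ == "__main__":
    print(substring_search(NAMES, "conazole"))
```

A name fragment such as "conazole" or "perfluoro" only finds chemicals whose registered names happen to contain it, which is one reason curated lists and structure-based classes are also useful.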
  • 20. QUESTION 1 • How many “conazoles” are in the dashboard? 29 56 125 321
  • 21. Challenges with Nomenclature • Be CAREFUL with names! There is a LOT of confusion in the public domain. CHOOSE sources wisely! • There are MANY public databases but not many are curated • All public databases have value but not many curate data • Example: METHANE on PubChem https://pubchem.ncbi.nlm.nih.gov/compound/297
  • 22. CAS Registry Numbers on PubChem
  • 23. CASRN lookup on the dashboard
  • 24. Methane is Diamond and Nanotubes? • These are all Depositor Names for Methane 
  • 25. QUESTION 1 • How many “conazoles” are in the dashboard? 29 56 125 321
  • 26. Detailed Chemical Pages One more identifier – the DTXSID • Chemical page: Wikipedia snippet when available, intrinsic properties, structural identifiers, linked substances
  • 27. DTXSID as a substance identifier • Blessed now by Wikipedia • In MANY public databases – ChemSpider, PubChem, eChemPortal • Increasingly used across all EPA databases • Easy to use/search • URL linking: https://comptox.epa.gov/dashboard/DTXSID0020737 • ANY database can link directly via DTXSIDs…more later..
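Because the link on slide 27 is just a base URL followed by the DTXSID, any database can build it by string concatenation. The sketch below assumes the URL pattern shown on the slide and assumes that a DTXSID is the prefix "DTXSID" followed by digits; the function name is hypothetical and the live site may use a different path.

```python
import re

# Base URL pattern taken from the slide; treat it as an assumption for the live site.
DASHBOARD_BASE = "https://comptox.epa.gov/dashboard/"


def dtxsid_url(dtxsid):
    """Build a Dashboard link from a DTXSID such as 'DTXSID0020737'."""
    if not re.fullmatch(r"DTXSID\d+", dtxsid):
        raise ValueError(f"not a DTXSID: {dtxsid!r}")
    return DASHBOARD_BASE + dtxsid


if __name__ == "__main__":
    print(dtxsid_url("DTXSID0020737"))
```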
  • 28. Detailed Chemical Pages Easy Navigation • Chemical page: Wikipedia snippet when available, intrinsic properties, structural identifiers, linked substances
  • 29. From the Chemical Details Page… all chemicals with same FORMULA
  • 30. How many chemicals are associated through LINKED SUBSTANCES? • Atrazine is a herbicide – in MANY commercial products • The dashboard has salt forms, isotopically labelled forms, multicomponent forms • How do we identify what they are???
  • 31. A little more about the InChI • An InChIKey is made up of two blocks… • Block 1 – “the connectivity” of atoms and bonds • Block 2 – isotopes, charge, stereo • The InChIKey is VERY USEFUL
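The block structure makes the InChIKey convenient for grouping related structures. The sketch below splits a key into the two blocks described on the slide and compares first blocks; the function names are illustrative. In the InChI standard the key also ends in a single trailing letter (a protonation flag), which the pattern checks for but does not return.

```python
import re

# 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Return (connectivity_block, second_block) for a well-formed InChIKey."""
    match = INCHIKEY_PATTERN.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return match.group(1), match.group(2)


def same_skeleton(key_a, key_b):
    """True when two keys share the first (connectivity) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]


if __name__ == "__main__":
    benzo_a_pyrene = "FMMWHPNWAFZXNH-UHFFFAOYSA-N"
    cholesterol = "HVYWMOMLDIMFJA-DPAQBDIFSA-N"
    print(split_inchikey(cholesterol))
    print(same_skeleton(benzo_a_pyrene, cholesterol))
```

Searching on the first block alone retrieves stereoisomers and isotopically labelled forms of the same skeleton, which is the idea behind the "skeleton" search mentioned later in the talk.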
  • 32. Searching using InChI • Demo an internet search using InChIs – Cholesterol has the InChIKey: HVYWMOMLDIMFJA-DPAQBDIFSA-N • Demo Atrazine – Linked Substances – Skeleton • More about Linked Substances….
  • 33. Linked Substances – more interesting • We map chemicals together using cheminformatics approaches • We desalt, remove stereochemistry, split multicomponent substances, etc. to map chemicals together
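To give a feel for what "splitting multicomponents" means, here is a deliberately crude sketch that works on SMILES strings only. In SMILES a "." separates disconnected components, so a salt can be split into its parts. Picking the "parent" by counting letters is a rough stand-in for counting atoms and is an assumption of this example; it is not the Dashboard's method, and real workflows use a cheminformatics toolkit and also neutralize charges.

```python
def split_components(smiles):
    """Split a multicomponent SMILES on the '.' component separator."""
    return [part for part in smiles.split(".") if part]


def largest_component(smiles):
    """Pick the component with the most letters, a crude proxy for the most atoms."""
    return max(split_components(smiles), key=lambda part: sum(ch.isalpha() for ch in part))


if __name__ == "__main__":
    sodium_acetate = "CC(=O)[O-].[Na+]"
    print(split_components(sodium_acetate))
    print(largest_component(sodium_acetate))
```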
  • 35. QUESTION 2 • How many salts of perfluorooctanesulfonic acid are there? 6 15 18 43
  • 36. A little more about our data quality • Five full time curators register and curate data to elevate quality
  • 38. Distribution of curated data • [Figure: DSSTox_v2 curation levels, split into Public and Curated sources. Bins: 1. DSSTox_High, 2. DSSTox_Low, 3. Public_High, 4. Public_Medium, 5. Public_Low, 6. Public_Untrusted, 7. Incomplete. Counts shown: 7269, 16K, 33K, 101K, 590K, ~310K pending, ~150K pending] • ~150K validated substances in top 4 QC quality bins • Now at >910k substances
  • 40. ChemReg Curation The ALANWOOD Pesticide Set
  • 41. A little more about our data quality
  • 42. QUESTION 2 • How many salts of perfluorooctanesulfonic acid are there? 6 15 18 43
  • 43. Navigating data via the Left Hand Tabs
  • 44. “Executive Summary” • Overview of toxicity-related info • Quantitative values • Info re. toxicology subsets • Physchem. and Fate & Transport • Adverse Outcome Pathway links • In vitro bioactivity summary plot
  • 45. Experimental and Predicted Data • Physchem and Fate & Transport experimental and predicted data • Data can be downloaded as Excel, TSV and CSV files • Predictions: multiple algorithms • EPI Suite: Estimation Program Interface • ACD/Labs (commercial) • TEST: Toxicity Estimation Software Tool • OPERA: OPEn structure–activity/ property Relationship App
  • 46. Chemical Hazard Data ToxVal Database • >50k chemicals • >770k tox. values • >30 sources of data • ~5k journals cited • ~70k citations
  • 47. Let’s talk about Export Formats • Anywhere you see a table you can export • CSV is great for integration with other applications (plus read into Excel) • Excel file is generally the best for “viewing” as it can have multiple worksheets, color flagging of cells and offers all • If you have cheminformatics tools SDF files are the best – view structures directly as concatenated “molfiles” • Some examples… download file(s)
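An SDF file is a series of molfile records, each optionally followed by data fields of the form "> <NAME>" and terminated by a line containing "$$$$". The sketch below reads such a file with the standard library only. The toy record, its elided atom block and the field names CASRN and FORMULA are assumptions for illustration; a real Dashboard download will have its own field names.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends at a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def data_fields(record):
    """Collect '> <NAME>' data items; the value is taken from the following line."""
    fields = {}
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and "<" in line:
            start = line.index("<")
            name = line[start + 1:line.index(">", start)]
            fields[name] = lines[i + 1].strip() if i + 1 < len(lines) else ""
    return fields


# A toy record: the atom and bond block is omitted on purpose.
TOY_SDF = (
    "Benzo(a)pyrene\n"
    "  placeholder header line\n"
    "\n"
    "  (atom and bond block omitted in this toy record)\n"
    "M  END\n"
    "> <CASRN>\n"
    "50-32-8\n"
    "\n"
    "> <FORMULA>\n"
    "C20H12\n"
    "\n"
    "$$$$\n"
)

if __name__ == "__main__":
    for record in split_sdf(TOY_SDF):
        print(data_fields(record))
```

For anything beyond inspecting the data fields, a cheminformatics toolkit that understands the molfile block is the better choice.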
  • 48. Advantages of Open Data • By exchanging data between databases you get to use other people's work • Linkages between systems (more later) • Calling “APIs and Web Services” on the fly…. • SHOW EXAMPLES
  • 50. Identifiers Support Searches in other systems
  • 51. QUESTION 3 • How many CAS Registry Numbers does Atrazine have? 3 5 7 12
  • 52. More About CASRNs • CASRNs are very useful, but still limited • Not every chemical has a STRUCTURE…substances vs structures • “Chemical Abstracts Service” – numbers don’t exist until substances are abstracted and indexed • Not every chemical on the dashboard necessarily has a CASRN – how would you find those that don't??? Hint: Search NOCAS_ • There are ~6000 chemicals without a CASRN on the dashboard • A chemical can also have many deleted CASRNs • Where do you look for all identifiers for a chemical???
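One simple quality check on a CASRN is its check digit: the final digit equals the weighted sum of the preceding digits (weights 1, 2, 3… counting from the right) modulo 10. The sketch below applies that rule; the function name is illustrative and the "NOCAS_12345" string is a made-up example of the placeholder style hinted at on the slide. A passing check digit only shows the number is well formed, not that it is the correct or current CASRN for a chemical.

```python
import re


def is_valid_casrn(casrn):
    """Check the format and check digit of a CAS Registry Number such as '50-32-8'."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", casrn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    print(is_valid_casrn("50-32-8"))      # benzo(a)pyrene, from slide 11
    print(is_valid_casrn("50-32-9"))      # same number with a wrong check digit
    print(is_valid_casrn("NOCAS_12345"))  # hypothetical placeholder identifier
```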
  • 53. QUESTION 3 • How many CAS Registry Numbers does Atrazine have? 3 5 7 12
  • 54. Products Searching What chemicals are in hair care products?
  • 55. Let’s Talk Exposure • Types of Exposure Data on the Dashboard • Consumer product categories and uses • Products containing the chemical • Predicted exposure levels from modeling (more in next session)
  • 56. Sources of Exposure to Chemicals
  • 57. QUESTION 4 • How many consumer products are listed as containing chlorothalonil? 45 67 138 203
  • 58. About exposure data • Not every chemical has associated categories and uses • MODELING data depends on QSAR predictions (structures) • Data gathering continues unabated and has interesting sources
  • 59. QUESTION 4 • How many consumer products are listed as containing chlorothalonil? 45 67 138 203
  • 60. While we are discussing QSAR modeling • What do you trust more? Experimental or predicted data? • Do you trust individual models or consensus models? • What if there are no experimental data, how good are predictions? • This will be covered a little in my next presentation and in way more detail in a later session
  • 61. QUESTION 5 • What is the LogKow for fluconazole? ~0.2 ~0.3 ~0.5 ~3.4
  • 62. How do we gather data for models • We are continuously gathering data…where from? • How do we validate? • What can we check? • Projects underway at present that may be of interest • Water Solubility dataset • Mass Spec Amenability • Eye and Skin Sensitization and Irritation • PFAS chemicals
  • 63. QUESTION 5 • What is the LogKow for fluconazole? ~0.2 ~0.3 ~0.5 ~3.4
  • 64. QUESTION 6 • What is the brand name for Ketoconazole? (There may be more than one) Glyphosate Bisphenol A Cialis Nizoral
  • 65. What’s the best way to search the internet for chemical data? • We know how complex chemicals identifiers are… • CASRN(s) • Hundreds of names (maybe) • SMILES • InChIs • EINECS, EC numbers • What can WE do to help you navigate the internet?
  • 66. QUESTION 6 • What is the brand name for Ketoconazole? (There may be more than one) Glyphosate Bisphenol A Cialis Nizoral
  • 67. Identifiers are used in the app • Identifiers are used to feed and link into “Literature”
  • 68. Literature Searching • Real-time retrieval of data from PubMed (~30 million abstracts and growing) • Choose from a set of pre-defined queries • Adjust and fine-tune queries based on interests
  • 69. Literature Searching • “Sifting” of results using multiple terms • Frequency counting terms • Color highlighting of terms • Download list to Excel • Send list to PubMed for downloading ref. file • Direct link via PubMed ID
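The "sifting" idea, counting how often chosen terms occur in each abstract and ranking on the total, can be sketched as follows. This is not the Abstract Sifter's actual algorithm; the function names, the whole-word matching rule and the toy abstracts (invented text keyed by placeholder IDs rather than real PubMed IDs) are all assumptions.

```python
import re
from collections import Counter


def term_counts(abstract, terms):
    """Count whole-word, case-insensitive occurrences of each term in one abstract."""
    words = Counter(re.findall(r"[a-z0-9]+", abstract.lower()))
    return {term: words[term.lower()] for term in terms}


def sift(abstracts, terms):
    """Return abstract IDs ordered by total term frequency, dropping zero-hit abstracts."""
    scored = [(sum(term_counts(text, terms).values()), key) for key, text in abstracts.items()]
    return [key for score, key in sorted(scored, key=lambda item: -item[0]) if score > 0]


# Invented abstracts for illustration only.
TOY_ABSTRACTS = {
    "A": "Atrazine exposure and toxicity in zebrafish; atrazine toxicity was dose dependent.",
    "B": "A general review of analytical instrumentation.",
    "C": "Toxicity of selected herbicides.",
}

if __name__ == "__main__":
    print(sift(TOY_ABSTRACTS, ["atrazine", "toxicity"]))
```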
  • 70. External Links – Also use Identifiers • Names, CASRN, PubChem IDs, InChIs…
  • 71. External Links • Links to ~90 websites providing access to additional data on the chemical of interest
  • 72. QUESTION 7 • Searches can be done for CLASSES of Chemicals too • How many explicit chemicals are included in the class of chemicals “polychlorinated biphenyls” 302 211 209 104
  • 73. Some example classes where we map data • Classes of chemicals – what do we have? • PAHs, PCBs, PBDEs etc… • Strings and Substrings • InChI “first block” for skeleton • Other “classes” • Similar chemicals • Related Substances • Lists of chemicals
  • 74. QUESTION 7 • Searches can be done for CLASSES of Chemicals too • How many explicit chemicals are included in the class of chemicals “polychlorinated biphenyls” 302 211 209 104
  • 76. QUESTION 8 • How many chemicals are “similar” in structure to ketoconazole with a threshold of >0.8 similarity (based on Tanimoto score). 80 800 63 8
  • 78. QUESTION 8 • How many chemicals are “similar” in structure to ketoconazole with a threshold of >0.8 similarity (based on Tanimoto score). 80 800 63 8
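The Tanimoto score used in Question 8 is the number of fingerprint bits two molecules share divided by the number of bits set in either. A minimal sketch, assuming fingerprints are represented as Python sets of "on" bit positions (the bit values below are made up and the fingerprint type the Dashboard uses is not specified here):

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    return len(a & b) / len(a | b)


def similar_to(query_bits, library, threshold=0.8):
    """Names of library entries whose similarity to the query exceeds the threshold."""
    return [name for name, bits in library.items() if tanimoto(query_bits, bits) > threshold]


if __name__ == "__main__":
    query = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    library = {
        "close analogue": {1, 2, 3, 4, 5, 6, 7, 8, 9, 11},  # 9 shared of 11 total bits
        "distant": {1, 2, 20, 21, 22},                      # 2 shared of 13 total bits
    }
    print(similar_to(query, library))
```

The count returned for a given threshold depends on the fingerprint type as well as the cutoff, so a ">0.8" hit list from one tool is not directly comparable with another's.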
  • 81. PFAS lists of Chemicals
  • 82. QUESTION 9 • How many chemicals are in the EPA Toxics Release Inventory based on the associated list available on the dashboard? 666 677 766 777
  • 83. Curated List of Pesticides • Find list of interest • Select list and send to batch
  • 84. Batch Searching • Singleton searches are great but… • …we generally want data on LOTS of chemicals! • Typical questions • What are the structures for a set of chemical names? Set of CASRNs? • Can I get chemical lists in Excel files? As a list of SMILES strings? Can I get an SDF file? • Can I include predicted properties in the download file? OPERA? TEST? • Are “these chemicals” screened in Toxcast? • I’m a mass spectrometrist and need masses and formulae for a list of chemicals
  • 85. QUESTION 9 • How many chemicals are in the EPA Toxics Release Inventory based on the associated list available on the dashboard? 666 677 766 777
  • 86. Access data en masse for thousands of chemicals….
  • 87. Select Output Format and Content
  • 89. Send to batch and select…. • A few seconds to assemble • ToxCast data - #actives/#assays and % active • # articles in PubMed • Links to IRIS or PPRTV reports • TEST or OPERA predictions • Exposure data: predictions and CPDat
  • 90. QUESTION 10 • How many chemicals are in the Tox21 Screening library, based on the associated dashboard list 4987 6587 8947 7632 • What is the Tox21 Screening Library?? How is it valuable? NEXT TIME
  • 91. In the next Session we discuss NAMS
  • 92. QUESTION 10 • How many chemicals are in the Tox21 Screening library, based on the associated dashboard list 4987 6587 8947 7632
  • 93. Summary and Conclusion • CompTox Chemicals Dashboard - a central hub for environmental data • ~875k chemical substances • Integrating property data, hazard data, exposure data, in vitro bioactivity data • Interrogation of bioactivity data • Multiple types of searches • Batch search for thousands of chemicals • Real-time property and toxicity predictions • Downloadable files – CSV, TSV and Excel
  • 94. Summary and Conclusions • The Dashboard is one of many applications on the internet from which you can source data • It is not difficult to do a Google search and get some form of answer • Always question the data, and sometimes even “facts” • We work hard to qualify, curate and validate data but obtain some from public sources
  • 95. If you find an error, or want to comment… Select text and “Submit Comment”
  • 96. References • The CompTox Chemistry Dashboard: a community data resource for environmental chemistry, J. Cheminformatics, 9, 61 (2017) • EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research, Comp. Tox. 12, 100096 (2019) • OPERA models for predicting physicochemical properties and environmental fate endpoints, J. Cheminformatics, 10, 10 (2018) • Screening Chemicals for Estrogen Receptor Bioactivity Using a Computational Model, Environ. Sci. Technol. 49, 8804-8814 (2015) • ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology, Chem. Res. Toxicol. 29, 1225-51 (2016) • Development and Validation of a Computational Model for Androgen Receptor Activity, Chem. Res. Toxicol. 30, 946-964 (2017) • CERAPP: Collaborative Estrogen Receptor Activity Prediction Project, Environ. Health Perspect. 124, 1023 (2016) • Abstract Sifter: a comprehensive front-end system to PubMed, F1000, 6, 2164 (2017)
  • 97. You want to know more… • Lots of resources available • Presentations: https://tinyurl.com/w5hqs55 • Communities of Practice Videos: https://rb.gy/qsbno1 • Manual: https://rb.gy/4fgydc • Latest News: https://comptox.epa.gov/dashboard/news_info
  • 98. Acknowledgments • Contact: Williams.Antony@epa.gov • Feedback and follow-up are welcome! Your questions help • The dashboard is based on the efforts of many more team members than us. Many collaborators provide data also. • EPA’s Center for Computational Toxicology and Exposure