The National Center for Computational Toxicology (NCCT) at the US Environmental Protection Agency has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences, including high-throughput in vitro screening data, legacy in vivo animal data, functional use data, exposure models, and chemical databases with associated properties. The CompTox Chemistry Dashboard provides access to data associated with ~750,000 chemical substances, approximately 20,000 of which are substances of Unknown or Variable Composition, Complex reaction products and Biological materials (UVCB substances). The dashboard is increasingly used to support "non-targeted analysis" (NTA), the identification of chemical substances using analytical science, specifically mass spectrometry (MS). Even when complex mixtures are analyzed, MS approaches identify individual chemical constituents of the mixture, and many of these are ambiguous in terms of their substitution patterns. This talk will review our efforts to use generic and Markush representations, together with enumeration approaches, to map structure candidates identified through NTA to the growing chemistry content within the Dashboard. We will also discuss how enumeration approaches can help profile UVCB chemicals for ranges of physicochemical parameters, and how this information can be of value for hazard and risk assessment. This abstract does not reflect U.S. EPA policy.
Markush enumeration to manage, mesh and manipulate substances of unknown or variable composition
Antony Williams1, Chris Grulke1, Andrew McEachran2 and Emma Schymanski3,4
1. National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2. Oak Ridge Institute for Science and Education (ORISE) Research Participant, Research Triangle Park, NC
3. Eawag: Swiss Federal Institute for Aquatic Science and Technology, Switzerland
4. Luxembourg Centre for Systems Biomedicine (LCSB), Luxembourg
August 2017
ACS Fall Meeting, Washington, DC
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
2. The CompTox Chemistry Dashboard:
• A publicly accessible website delivering access to:
– ~760,000 chemicals and related property data
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– Integration with “biological assay data” for 1000s of chemicals
– Information regarding consumer products containing chemicals
– “Batch searching” for thousands of chemicals
• Chemical structures aren’t easy – but very doable
• Complex mixtures and undefined substances – way
more challenging, but necessary!
12. Chemical “Families”
• Sometimes the simplest of questions are
difficult to answer!
– What is the list of CAS Numbers for all PCBs?
– Can I get an SDF file of all PCBs?
– Do you have predicted properties for all PCBs?
– What toxicity data is available for individual PCBs?
– Have you measured ToxCast data for any PCBs?
– Can I get all PCBs listed in an Excel Spreadsheet?
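For a closed family such as the PCBs, the first two questions can at least be answered for the constitutional isomers. As a minimal sketch (a swapped-in symmetry-reduction approach, not the ChemAxon or MOLGEN workflow), every chlorine pattern on the ten substitutable biphenyl positions is reduced to a canonical form under the eight symmetries of the biphenyl graph (flipping either ring, swapping the rings). This collapses the 1,023 non-empty patterns into the 209 congeners; atropisomers are ignored. The `to_smiles` helper is hypothetical plain string assembly with no toolkit validation; producing an SDF file or InChIKeys would still need a toolkit such as Open Babel.

```python
from collections import Counter
from itertools import product

# Substitutable positions: ring A 2,3,4,5,6 (indices 0-4), ring B 2',3',4',5',6' (indices 5-9).

def symmetry_images(bits):
    """Yield the 8 images of a chlorine pattern under the biphenyl graph symmetries."""
    a, b = bits[:5], bits[5:]
    for ring_a in (a, a[::-1]):      # reversing maps 2<->6 and 3<->5 within a ring
        for ring_b in (b, b[::-1]):
            yield ring_a + ring_b
            yield ring_b + ring_a    # swap the two rings

def canonical(bits):
    """Pick one representative per symmetry class (the lexicographic maximum)."""
    return max(symmetry_images(bits))

def enumerate_pcbs():
    """Return one chlorine pattern per PCB congener (1-10 Cl), sorted by Cl count."""
    classes = {canonical(bits) for bits in product((0, 1), repeat=10) if any(bits)}
    return sorted(classes, key=lambda bits: (sum(bits), bits))

def to_smiles(bits):
    """Assemble an aromatic SMILES string (hypothetical helper, not toolkit-validated)."""
    def atom(chlorinated, ring_digit=""):
        return "c" + ring_digit + ("(Cl)" if chlorinated else "")
    a, b = bits[:5], bits[5:]
    # Ring A: positions 2..6, then the ipso carbon, which closes ring 1.
    ring_a = atom(a[0], "1") + "".join(atom(x) for x in a[1:]) + "c1"
    # Ring B: the ipso carbon opens ring 2; position 6' closes it.
    ring_b = "c2" + "".join(atom(x) for x in b[:4]) + atom(b[4], "2")
    return ring_a + "-" + ring_b

congeners = enumerate_pcbs()
print(len(congeners))  # 209
print(sorted(Counter(sum(c) for c in congeners).items()))
# [(1, 3), (2, 12), (3, 24), (4, 42), (5, 46), (6, 42), (7, 24), (8, 12), (9, 3), (10, 1)]
print(to_smiles(congeners[-1]))  # decachlorobiphenyl
```

Writing one line per congener (chlorine count, SMILES) to CSV would already cover the "Excel spreadsheet" question.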
16. Relationship Mappings
• Various relationship mappings can be
established. So far, all have been created manually.
• Enumerated structures will help, because
creating these mappings manually is a major challenge
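One way to replace manual curation is to key every enumerated structure by its InChIKey and look it up among existing records. The sketch below assumes a simple in-memory registry (a dict from InChIKey to a substance identifier); the function name, the identifiers and the link tuple are illustrative, not the Dashboard's actual data model. The optional `skeleton_only` mode compares only the first, 14-character connectivity block of the InChIKey, so stereo and isotope variants of the same skeleton also match.

```python
def auto_map(parent_id, enumerated_keys, registry, skeleton_only=False):
    """Link enumerated structures to records that already exist in a registry.

    parent_id       -- identifier of the generic/UVCB record (hypothetical)
    enumerated_keys -- InChIKeys produced by a structure enumerator
    registry        -- dict mapping InChIKey -> existing substance identifier
    skeleton_only   -- compare only the first (connectivity) InChIKey block
    """
    def block(key):
        return key.split("-")[0] if skeleton_only else key

    # Index the registry once so each lookup is a single dict access.
    index = {}
    for key, substance_id in registry.items():
        index.setdefault(block(key), []).append(substance_id)

    links, unmatched = [], []
    for key in enumerated_keys:
        hits = index.get(block(key))
        if hits:
            links.extend((parent_id, hit, "related chemical") for hit in hits)
        else:
            unmatched.append(key)
    # Drop duplicate links while keeping first-seen order.
    return list(dict.fromkeys(links)), unmatched
```

Keys left in `unmatched` are enumerated structures with no existing record, i.e. candidates for new registrations rather than mappings.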
22. How we want to create mappings…
• For UVCB chemicals (and chemical families):
– Input a generic structure and enumerate all chemicals
– Check for mappings across the existing structures
– Auto-create the mappings as “related chemicals”
• Two approaches to be tested
– Markush structure enumeration in ChemAxon
– Structure enumeration and flexibility with MOLGEN
– Compare results
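The "input a generic structure and enumerate" step can be sketched with plain string templating, assuming the generic structure is written as a SMILES template whose variable parts are named placeholders. This only illustrates combinatorial R-group expansion; it is not the ChemAxon Markush or MOLGEN machinery, and it leaves de-duplication of equivalent structures (which those tools handle) to a later canonicalization step. The template and fragment lists are hypothetical.

```python
from itertools import product

def enumerate_markush(core, rgroups):
    """Expand a SMILES template with named R-group placeholders.

    core    -- SMILES template containing {R1}-style placeholders
    rgroups -- dict: placeholder name -> list of non-empty SMILES fragments
    Yields one SMILES string per combination. No canonicalization is done,
    so symmetry-equivalent duplicates are NOT removed.
    """
    names = sorted(rgroups)
    for combo in product(*(rgroups[name] for name in names)):
        yield core.format(**dict(zip(names, combo)))

# Hypothetical generic structure: para-substituted benzene with a variable
# heteroatom (R2) and an alkyl chain of 1-3 carbons (R1).
alkyl = ["C" * n for n in range(1, 4)]
members = list(enumerate_markush("{R2}c1ccc({R1})cc1", {"R1": alkyl, "R2": ["O", "N"]}))
print(len(members))  # 6
print(members[0])    # Oc1ccc(C)cc1
```

Converting each member to an InChIKey with a cheminformatics toolkit would provide both the de-duplication and the lookup key for the mapping step.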
28. MOLGEN – Enumerating PCBs
• INPUTS: MOLGEN Command Line
– Fuzzy Formula: C12H0-10Cl1-10
– Total H+Cl=10 (fixes RDB, rings plus double bonds, at 8)
– Substructure definition: biphenyl_aro.mol
– Restrict cycles to 2 (biphenyl only)
– Pipe output to OpenBabel to write out InChIKeys
directly from SDF
• Number of structures generated: 209
• https://comptox.epa.gov/dashboard/dsstoxdb/results?search=PCBs
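The "Total H+Cl=10" constraint is easiest to see through the rings-plus-double-bonds count, RDB = C - (H + X)/2 + N/2 + 1, where X counts halogens. For C12 with H + Cl = 10 it is always 12 - 5 + 1 = 8, matching biphenyl's two rings and six aromatic C=C bonds, whereas a formula outside the constraint such as C12H10Cl gives 7.5 and is not a valid neutral, even-electron formula. A small check, assuming only C, H, N and halogens are present (the helper names are hypothetical):

```python
from fractions import Fraction

def rdb(c, h, n=0, x=0):
    """Rings plus double bonds: C - (H + X)/2 + N/2 + 1 (x = halogen count).

    Divalent atoms such as O or S do not change the value, so they are omitted.
    """
    return Fraction(2 * c + 2 + n - h - x, 2)

def hill_formula(c, h, cl):
    """Hill-order formula string, e.g. C12H9Cl (counts of 0 and 1 handled)."""
    def part(symbol, count):
        return "" if count == 0 else symbol if count == 1 else f"{symbol}{count}"
    return part("C", c) + part("H", h) + part("Cl", cl)

def pcb_formulas(total=10):
    """Expand the fuzzy formula C12H0-10Cl1-10 subject to H + Cl == total."""
    return [(hill_formula(12, h, cl), rdb(12, h, x=cl))
            for h in range(0, 11) for cl in range(1, 11) if h + cl == total]

for formula, value in pcb_formulas():
    print(formula, value)  # every line ends in 8
print(rdb(12, 10, x=1))    # 15/2: C12H10Cl violates the H + Cl = 10 rule
```

Using `Fraction` keeps half-integer RDB values exact, which makes invalid formulas easy to filter.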
29. Other than mappings, what are the real applications?
• How will we use “enumerated structures”?
– Auto-mapping of UVCB chemicals to components
– Distribution of physicochemical properties across
complex mixtures
– Mapping chemicals and “predicted metabolites”
– Structure identification approaches for complex
mixtures – homologues are commonly detected by MS
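Once a UVCB has been enumerated into candidate components, profiling a physicochemical parameter reduces to summarizing the per-component predictions. The sketch assumes such predictions already exist (for example from a property-prediction model) in a dict; the component names and values are hypothetical, and without composition data it reports an unweighted range, not a property of the mixture itself.

```python
from statistics import median

def profile_property(predictions):
    """Summarize a predicted property across the enumerated components of a UVCB.

    predictions -- dict: component identifier -> predicted value, or None when
                   no prediction is available (such components are counted only).
    """
    values = sorted(v for v in predictions.values() if v is not None)
    if not values:
        raise ValueError("no predicted values to profile")
    return {
        "n_components": len(predictions),
        "n_predicted": len(values),
        "min": values[0],
        "median": median(values),
        "max": values[-1],
    }

# Hypothetical predictions for four enumerated components (one missing).
print(profile_property({"comp-1": 4.1, "comp-2": 5.0, "comp-3": 6.3, "comp-4": None}))
```

Reporting how many components lack a prediction alongside the range makes gaps in coverage visible before the range is used for hazard screening.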
32. MS detection of homologues
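[Figure: structures of two homologous series detected by MS, both bearing an S(=O)(=O)OH group, with variable repeat units m and n; one series carries a C9H19 chain]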
M. Loos & H. Singer, 2017, J. Cheminf., DOI: 10.1186/s13321-017-0197-z
Schymanski et al., 2014, ES&T, DOI: 10.1021/es4044374
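A common way to flag candidate homologue series in high-resolution MS data is the Kendrick mass defect (KMD): rescaling masses so that CH2 weighs exactly 14 gives every member of a CH2 series the same defect. The sketch below groups m/z values by KMD; it is a generic technique shown for illustration, not necessarily the algorithm used in the cited works, and real workflows often add further constraints such as retention-time trends. The tolerance, minimum series length and example masses are assumed values.

```python
CH2 = 14.01565  # monoisotopic mass of a CH2 repeat unit (Da)

def kendrick_mass_defect(mz, unit=CH2, nominal=14):
    """KMD on the CH2 scale: round(KM) - KM, with KM = m * 14 / 14.01565."""
    km = mz * nominal / unit
    return round(km) - km

def group_homologues(mzs, tol=0.005, min_size=3):
    """Group m/z values that share a Kendrick mass defect (candidate CH2 series).

    Values are sorted by KMD and split wherever neighbouring KMDs differ by
    more than tol; groups smaller than min_size are discarded.
    """
    ordered = sorted(mzs, key=kendrick_mass_defect)
    groups, current = [], []
    for mz in ordered:
        if current and kendrick_mass_defect(mz) - kendrick_mass_defect(current[-1]) > tol:
            groups.append(current)
            current = []
        current.append(mz)
    if current:
        groups.append(current)
    return [sorted(g) for g in groups if len(g) >= min_size]

# Hypothetical data: a four-member CH2 series plus one unrelated mass.
series = [297.1530 + k * CH2 for k in range(4)]
print([len(g) for g in group_homologues(series + [250.5])])  # [4]
```

Because adding CH2 shifts the Kendrick mass by exactly 14, the defect is unchanged along a series; other repeat units (e.g. C2H4O) can be handled by changing `unit` and `nominal`.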
33. Conclusion
• The CompTox Chemistry Dashboard provides
access to data for ~760,000 chemicals (most with
defined chemical structures)
• MANY substances of interest are “UVCB Chemicals”
• Automated mapping procedures within the
Dashboard will improve navigation dramatically
• Structure enumeration from Markush inputs is
underway – ChemAxon Markush and MOLGEN
• Getting information into databases is vital to support
UVCB detection in real samples
35. Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821