Markush enumeration to manage, mesh
and manipulate substances of unknown
or variable composition
Antony Williams1, Chris Grulke1, Andrew McEachran2
and Emma Schymanski3,4
1. National Center for Computational Toxicology, U.S. Environmental Protection Agency, RTP, NC
2. Oak Ridge Institute of Science and Education (ORISE) Research Participant, Research Triangle Park, NC
3. Eawag: Swiss Federal Institute for Aquatic Science and Technology, Switzerland
4. Luxembourg Centre for Systems Biomedicine (LCSB), Luxembourg
August 2017
ACS Fall Meeting, Washington, DC
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
The CompTox Chemistry Dashboard:
• A publicly accessible website delivering access to:
– ~760,000 chemicals and related property data
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– Integration with “biological assay data” for 1000s of chemicals
– Information regarding consumer products containing chemicals
– “Batch searching” for thousands of chemicals
CompTox Chemistry Dashboard
https://comptox.epa.gov
~760,000 chemicals
>15 years of data
Chemical Page
Toxicity Values
Product Composition Details
Links to Other Resources
Example External Links…
Managing structure relationships
The CompTox Chemistry Dashboard:
• A publicly accessible website delivering access to:
– ~760,000 chemicals and related property data
– Links to other agency websites and public data resources
– “Literature” searches for chemicals using public resources
– Integration with “biological assay data” for 1000s of chemicals
– Information regarding consumer products containing chemicals
– “Batch searching” for thousands of chemicals
• Chemical structures aren’t easy – but very doable
• Complex mixtures and undefined substances – way
more challenging, but necessary!
Chemical “Families”
Chemical “Families”
• Sometimes the simplest of questions are
difficult to answer!
– What is the list of CAS Numbers for all PCBs?
– Can I get an SDF file of all PCBs?
– Do you have predicted properties for all PCBs?
– What toxicity data is available for individual PCBs?
– Have you measured ToxCast data for any PCBs?
– Can I get all PCBs listed in an Excel Spreadsheet?
Downloading Families
One click download
How Did We Do This?
Relationship Mappings
• Various relationship mappings can be
established. To this point all are manual.
• Enumerated structures are going to help
because we have a major challenge
The TSCA Inventory
https://www.epa.gov/tsca-inventory
• The inventory contains about 85,000 chemical substances
Lots of Complex Chemistry
UVCB Chemicals
• UVCB (Unknown or Variable composition, Complex reaction products or Biological materials) chemical examples
– Surfactants with undefined composition
– Petroleum Distillates
– Gelatins, hydrolyzates
– Formaldehyde, reaction products with diethanolamine
– Fatty acids, linseed-oil, compds. with triethylamine
UVCB handling – early progress
Managing UVCB Relationships
CDK Depict
How we want to create mappings…
• For UVCB chemicals (and families):
– Input a generic structure and enumerate all chemicals
– Check for mappings across the existing structures
– Auto-create the mappings as “related chemicals”
• Two approaches to be tested
– Markush structure enumeration in ChemAxon
– Structure enumeration and flexibility with MOLGEN
– Compare results
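The mapping steps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Dashboard's implementation: the function name `auto_map`, the in-memory `registered` lookup and all identifiers are hypothetical, and InChIKeys are assumed to be the matching key.

```python
from typing import Dict, Iterable, List, Tuple

def auto_map(parent_id: str,
             enumerated_keys: Iterable[str],
             registered: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Match enumerated structures against already-registered ones.

    parent_id       -- hypothetical identifier of the UVCB or family record
    enumerated_keys -- InChIKeys of structures enumerated from the generic input
    registered      -- hypothetical lookup: InChIKey -> substance identifier
    Returns (mappings, unmatched): "related chemical" pairs, and keys with no record.
    """
    mappings, unmatched = [], []
    for key in sorted(set(enumerated_keys)):  # de-duplicate, keep a stable order
        if key in registered:
            mappings.append((parent_id, registered[key]))
        else:
            unmatched.append(key)
    return mappings, unmatched

# Toy data: placeholder strings, not real InChIKeys or substance identifiers.
registered = {"KEY-A": "SUBSTANCE-1", "KEY-B": "SUBSTANCE-2"}
mappings, unmatched = auto_map("FAMILY-X", ["KEY-B", "KEY-C", "KEY-A", "KEY-B"], registered)
print(mappings)
print(unmatched)
```

Returning the unmatched keys separately keeps enumerated structures that have no existing record visible, so they can be reviewed rather than silently dropped.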
ChemAxon Markush Technology
Xylenes, a simple example…
Polychlorinated Biphenyls (PCBs)
Markush Input for Enumeration
MOLGEN
http://www.molgen.de/
MOLGEN – Enumerating PCBs
• INPUTS: MOLGEN Command Line
– Fuzzy Formula: C12H0-10Cl1-10
– Total H+Cl=10 (fixes RDB/saturation)
– Substructure definition: biphenyl_aro.mol
– Restrict cycles to 2 (biphenyl only)
– Pipe output to OpenBabel to write out InChIKeys
directly from SDF
• No. structures: 209
• https://comptox.epa.gov/dashboard/dsstoxdb/results?search=PCBs
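The figure of 209 structures can be cross-checked independently of MOLGEN with a small combinatorial sketch. It is not the MOLGEN procedure: it treats the ten substitutable biphenyl positions as slots and counts chlorine patterns that are distinct under flipping either ring and exchanging the two rings (assuming free rotation about the central bond, so atropisomers are ignored). The names `symmetries` and `canonical` are illustrative.

```python
from collections import Counter
from itertools import combinations, product

# Substitutable positions: ring A (2..6) -> indices 0..4, ring B (2'..6') -> indices 5..9.
FLIP = [4, 3, 2, 1, 0]  # flipping a ring swaps 2<->6 and 3<->5 and leaves 4 fixed

def symmetries():
    """Yield the 8 position permutations: optional flip of each ring, optional ring swap."""
    for flip_a, flip_b, swap in product([False, True], repeat=3):
        perm = []
        for pos in range(10):
            ring, idx = divmod(pos, 5)
            if (ring == 0 and flip_a) or (ring == 1 and flip_b):
                idx = FLIP[idx]
            if swap:
                ring = 1 - ring
            perm.append(ring * 5 + idx)
        yield perm

PERMS = list(symmetries())

def canonical(pattern):
    """Smallest equivalent pattern under all symmetries, used as a unique key."""
    return min(tuple(sorted(p[i] for i in pattern)) for p in PERMS)

congeners = set()
for n_cl in range(1, 11):  # 1 to 10 chlorines, matching C12H0-10Cl1-10
    for pattern in combinations(range(10), n_cl):
        congeners.add(canonical(pattern))

per_homologue = Counter(len(c) for c in congeners)
print(len(congeners))
print([per_homologue[n] for n in range(1, 11)])
```

The script prints 209, followed by the per-homologue counts 3, 12, 24, 42, 46, 42, 24, 12, 3, 1 for mono- through decachlorobiphenyl.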
Other than mappings, what
are the real applications?
• How will we use “enumerated structures”?
– Auto-mapping of UVCB chemicals to components
– Distribution of physicochemical properties across
complex mixtures
– Mapping chemicals and “predicted metabolites”
– Structure identification approaches for complex
mixtures – homologues are commonly detected by MS
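As a rough illustration of the last point, members of a homologous series in an MS peak list differ by a fixed repeat-unit mass (for example CH2, about 14.01565 Da). The sketch below chains masses separated by that increment. The peak values, tolerance and function name are made-up assumptions, and this is not the algorithm of any tool cited in the deck.

```python
CH2 = 12.0 + 2 * 1.00782503  # monoisotopic mass of a CH2 repeat unit, ~14.01565 Da

def homologue_series(masses, unit=CH2, tol=0.002, min_len=3):
    """Group masses into chains whose neighbours differ by `unit` within `tol` Da."""
    peaks = sorted(masses)
    used = set()
    series = []
    for start in peaks:
        if start in used:
            continue
        chain = [start]
        while True:
            target = chain[-1] + unit
            candidates = [m for m in peaks if m not in used and abs(m - target) <= tol]
            if not candidates:
                break
            # Take the candidate closest to the expected next mass.
            chain.append(min(candidates, key=lambda m: abs(m - target)))
        if len(chain) >= min_len:  # keep only chains long enough to be convincing
            series.append(chain)
            used.update(chain)
    return series

# Hypothetical neutral masses: four CH2-spaced peaks plus one unrelated peak.
peaks = [200.1000, 214.1157, 228.1313, 242.1470, 305.2000]
series = homologue_series(peaks)
print(series)
```

Swapping `unit` for another repeat-unit mass would look for a different series. Candidate series found this way could then be compared against enumerated structures.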
Structure Identification using
Non-Targeted Analysis
Supporting Mass Spectrometry
MS detection of homologues
[Figure: example structures of homologous series with variable repeat units (m, n)]
M. Loos & H. Singer, 2017, J. Cheminf., DOI: 10.1186/s13321-017-0197-z
Schymanski et al., 2014, ES&T, DOI: 10.1021/es4044374
Conclusion
• The CompTox Chemistry Dashboard provides
access to data for ~760,000 chemicals (most are
structures)
• MANY substances of interest are “UVCB Chemicals”
• Automated mapping procedures within the
Dashboard will improve navigation dramatically
• Structure enumeration from Markush inputs is
underway – ChemAxon Markush and MOLGEN
• Getting information into databases is vital to support
UVCB detection in real samples
Acknowledgments
Contact
Antony Williams
US EPA Office of Research and Development
National Center for Computational Toxicology
Williams.Antony@epa.gov
ORCID: https://orcid.org/0000-0002-2668-4821
