Characterizing information content of
Markush libraries with InChI
Gabriel Sinclair1, Inthirany Thillainadarajah2, Brian Meyer2,
Vicente Samano2, Sakuntala Sivasupramaniam2, Linda Adams2,
Ann Richard3 and Antony Williams3
1 Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency,
2 ORAU Student Services Contractor
3 Senior Environmental Employment (SEE) Program
August 2022: ACS Chicago
http://www.orcid.org/0000-0002-2668-4821
The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
The CompTox Chemicals Dashboard
https://comptox.epa.gov/dashboard
• Integration hub for >906k substances and
associated data (a release with 1.2M is imminent)
• Curated data, with ongoing daily curation
and expansion, focused on environmental
chemistry and impacts on human health
• Aggregated sets of chemical lists of interest
– endocrine disruptors, PFAS, azo dyes,
disinfection by-products, and 100s more
~906k chemicals include explicit
structures and complex substances
• Many chemicals cannot be defined as distinct chemical
structures but have significant complexity
– UVCB chemicals (unknown or variable composition, complex reaction
products and biological substances)
– Some chemicals can be described with Markush structure boundaries
• WELL-DEFINED: Xylenes: 3 chemicals. Polychlorinated biphenyls and
polybrominated diphenyl ethers: 209 chemicals in each class
• WELL-DEFINED: C10-C14 linear alkylbenzenesulfonic acids
• LESS WELL-DEFINED: Linear perfluoroalkylsulfonic acids, which become
well defined once the limits are defined (C1-C100)
• HARDLY DEFINED: Chloroparaffins, Cobalt/Cobalt-related substances
• HARDLY DEFINED: Polymers, which vary in length, linkage
(head-to-head, tail-to-tail, head-to-tail) and capping
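To illustrate how a less well-defined class becomes enumerable once its limits are fixed, here is a minimal Python sketch that writes out SMILES strings for linear perfluoroalkylsulfonic acids over a chosen carbon range. The function names and the SMILES template are assumptions made for this illustration, not the Dashboard's registration code, and a real workflow would pass the strings through a cheminformatics toolkit for validation.

```python
def linear_pfsa_smiles(n_carbons: int) -> str:
    """Return a SMILES string for the linear perfluoroalkylsulfonic acid
    with the given number of carbons (hypothetical helper)."""
    if n_carbons < 1:
        raise ValueError("chain length must be at least 1")
    # CF3 terminus, then (n - 1) CF2 units, then the sulfonic acid head group.
    return "FC(F)(F)" + "C(F)(F)" * (n_carbons - 1) + "S(=O)(=O)O"


def enumerate_linear_pfsas(min_c: int, max_c: int) -> dict[int, str]:
    """Enumerate the class between two chain-length limits, inclusive."""
    return {n: linear_pfsa_smiles(n) for n in range(min_c, max_c + 1)}


# The C1-C100 limits quoted on the slide give exactly 100 members.
library = enumerate_linear_pfsas(1, 100)
```

With these limits, `library[1]` is the one-carbon member and `library[8]` is the eight-carbon member carrying 17 fluorine atoms.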
Well-Defined Complex Substances
PCBs: 209 distinct structures
• 209 “Markush-enumerated” structures are mapped
• Markush-enumerated structures are mapped to
existing chemicals in the database
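The figure of 209 can be reproduced by counting chlorine substitution patterns on biphenyl up to symmetry. The sketch below is a pure-Python illustration written for this purpose, not the enumeration tool used for the Dashboard. It treats the ten substitutable positions as bits, applies the eight topological symmetries of biphenyl (flipping either ring, swapping the rings), and keeps one representative per pattern. It ignores atropisomerism. The same count holds for PBDEs, since diphenyl ether has the same positional symmetry.

```python
from collections import Counter

# Bits 0-4: ring A positions 2,3,4,5,6. Bits 5-9: ring B positions 2',3',4',5',6'.
N_POSITIONS = 10


def flip_a(p: int) -> int:
    """Flip ring A about its long axis: 2<->6 and 3<->5."""
    return 4 - p if p < 5 else p


def flip_b(p: int) -> int:
    """Flip ring B about its long axis: 2'<->6' and 3'<->5'."""
    return 14 - p if p >= 5 else p


def swap_rings(p: int) -> int:
    """Exchange the two rings."""
    return p + 5 if p < 5 else p - 5


def identity(p: int) -> int:
    return p


# All eight symmetry operations as position permutations.
SYMMETRIES = [
    [s(fb(fa(p))) for p in range(N_POSITIONS)]
    for fa in (identity, flip_a)
    for fb in (identity, flip_b)
    for s in (identity, swap_rings)
]


def canonical(mask: int) -> int:
    """Smallest bitmask among all symmetry-equivalent substitution patterns."""
    images = []
    for perm in SYMMETRIES:
        image = 0
        for p in range(N_POSITIONS):
            if mask >> p & 1:
                image |= 1 << perm[p]
        images.append(image)
    return min(images)


# Every non-empty pattern (mask 0 is unsubstituted biphenyl).
congeners = {canonical(mask) for mask in range(1, 1 << N_POSITIONS)}
homologs = Counter(bin(mask).count("1") for mask in congeners)
```

`len(congeners)` comes out as 209, and `homologs` breaks the total down by number of chlorines (3 mono-substituted patterns, 12 di-substituted, and so on up to the single fully substituted pattern).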
Well-Defined Complex Substances
Linear alkylbenzene sulfonates
• Mapping to other generic structures and
explicit structures
Less Well-Defined Substances
PFAS Categories of chemicals (326)
Markush chemicals are now essential…
• Markush structures are essential for:
– Mapping constrained structure datasets (PCBs, PBDEs, etc.)
– Enumerating possible structures and mapping to existing chemicals
– Seeking validation of the structures as we enumerate them
• The benefits of this approach
– Querying external resources for validation and expansion of DSSTox
– Sourcing identifiers for us to register and validate in DSSTox
– Sourcing chemicals to purchase
Markush Enumeration mapping
• When we add Markush structures to our database we
enumerate them and map the related substances
• Complex substances therefore map to individual structures
Sourcing data from external sites
• When Markush structures are enumerated we source data by InChI
• InChIKeys are checked against external resources to test whether the “chemical exists”
– EBI UniChem
– PubChem
– Other sources
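A “chemical exists” check of this kind can be sketched against PubChem's public PUG REST interface, which resolves an InChIKey to compound identifiers (CIDs). The code below is an illustrative assumption about how such a ping might look, not the pipeline used for DSSTox. The helper names are hypothetical, and the example key in the usage note is believed to be atrazine's standard InChIKey but should be treated as illustrative.

```python
import json
import re
import urllib.error
import urllib.request

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter, hyphen-separated.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
PUBCHEM_TEMPLATE = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/inchikey/{key}/cids/JSON"
)


def pubchem_cids_url(inchikey: str) -> str:
    """Build the PUG REST URL that lists CIDs for an InChIKey."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return PUBCHEM_TEMPLATE.format(key=inchikey)


def pubchem_cids(inchikey: str, timeout: float = 10.0) -> list[int]:
    """Return matching CIDs, or an empty list when PubChem has no record."""
    try:
        with urllib.request.urlopen(pubchem_cids_url(inchikey), timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # PubChem answers "not found" with HTTP 404
            return []
        raise
    return payload.get("IdentifierList", {}).get("CID", [])
```

Calling `pubchem_cids("MXWJVTOOROXGIU-UHFFFAOYSA-N")` makes one network request. A non-empty result means the enumerated structure is already known to PubChem. Bulk checks should respect PubChem's request-rate guidance.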
Retrieving data for validation
• When we retrieve data for Markush structures we use it to
assist in curation, which has 5 different curation levels
InChI is our PRIMARY mapping identifier
• Since we can map Markush to explicit structures we end up
with “representative structures” for searching. This helps with:
– Searching public resources for chemical vendors (thanks PubChem!) to source
chemicals for our studies, such as bioactivity, HTTK and metabolism
(the first block of the InChIKey will also find us isotopically labeled chemicals)
– Searching across “literature”: Google Scholar, books and patents
– Sourcing toxicity data for read-across purposes
• All of this is available to the public via “External Links”
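The first-block search mentioned above relies on the InChIKey layout: the first 14 characters hash the molecular skeleton, while isotopic and stereochemical layers are hashed into the second block. The sketch below groups keys by that first block. The keys are placeholder strings invented for the example, not identifiers of real chemicals, and the function names are hypothetical.

```python
from collections import defaultdict


def first_block(inchikey: str) -> str:
    """Return the 14-character skeleton block of an InChIKey."""
    return inchikey.split("-", 1)[0]


def group_by_skeleton(inchikeys: list[str]) -> dict[str, list[str]]:
    """Group InChIKeys that share the same first block."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in inchikeys:
        groups[first_block(key)].append(key)
    return dict(groups)


# Placeholder keys (NOT real chemicals): two share a skeleton, one does not.
keys = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # unlabeled form
    "AAAAAAAAAAAAAA-ZZZZZZZZSA-N",  # same skeleton, different second block
    "BBBBBBBBBBBBBB-UHFFFAOYSA-N",  # unrelated skeleton
]
groups = group_by_skeleton(keys)
```

Here `groups["AAAAAAAAAAAAAA"]` holds both related keys. Note that a first-block match also pulls in stereoisomers, so hits still need review before being treated as labeled analogues.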
External Links
Future Work
• We are desperately waiting for the Markush InChIKey:
we have immediate use cases for it
• We are definitely interested in the Mixture InChI, and its
applications to mixtures and potentially polymers
Polytriazine
There is one Google hit for the combined InChI for polytriazine,
but thousands of hits for one of the components of the mixture (atrazine)
Future Work – adding Markush where we can
• The TSCA non-confidential inventory has >33k substances, and half have no
structures. Some can have Markush added (and mapped)
Conclusions
• Dashboard data is under constant curation and Markush
addition has become a standard part of our work
• Mapped relationships between chemicals are of value for
complex substances and important for us to source data
• Markush structures enumerated to distinct chemicals with associated
InChIs then allow us to harvest data and link across sites
• Chemistry is complex and informatics solutions are more
complete for distinct structures than for complex substances.
There is lots more to do… we await the Markush InChI

More Related Content

Similar to Characterizing information content of Markush libraries with InChI

Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
Andrew McEachran
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Delivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applicationsDelivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applications
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Lipinski in silico drug discovery durham nc 2014
Lipinski in silico drug discovery durham nc 2014Lipinski in silico drug discovery durham nc 2014
Lipinski in silico drug discovery durham nc 2014
Christopher Lipinski
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discovery
Alichy Sowmya
 
Structure standardization approaches for mass spectrometry data integration
Structure standardization approaches for  mass spectrometry data integrationStructure standardization approaches for  mass spectrometry data integration
Structure standardization approaches for mass spectrometry data integration
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 

Similar to Characterizing information content of Markush libraries with InChI (20)

Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...Non-targeted analysis supported by data and cheminformatics delivered via the...
Non-targeted analysis supported by data and cheminformatics delivered via the...
 
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
US EPA CompTox Chemicals Dashboard Data Integration Hub to Support Environmen...
 
The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...The importance of the InChI identifier as a foundation technology for eScienc...
The importance of the InChI identifier as a foundation technology for eScienc...
 
Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...Structure Identification Using High Resolution Mass Spectrometry Data and the...
Structure Identification Using High Resolution Mass Spectrometry Data and the...
 
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
 
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
Accessing Environmental Chemistry Data via Data Dashboards and Applications t...
 
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
US-EPA CompTox Chemicals Dashboard – integrating chemistry and biology data t...
 
Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards Accessing Environmental Chemistry Data via Data Dashboards
Accessing Environmental Chemistry Data via Data Dashboards
 
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
The EPA CompTox Dashboard as a Data Integration Hub for Environmental Chemist...
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 
Delivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applicationsDelivering chemical-associated data via EPA web applications
Delivering chemical-associated data via EPA web applications
 
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...Structure identification approaches using the EPA CompTox Chemicals Dashboard...
Structure identification approaches using the EPA CompTox Chemicals Dashboard...
 
Lipinski in silico drug discovery durham nc 2014
Lipinski in silico drug discovery durham nc 2014Lipinski in silico drug discovery durham nc 2014
Lipinski in silico drug discovery durham nc 2014
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
 
SciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discoverySciFinder and its utility in Drug discovery
SciFinder and its utility in Drug discovery
 
Structure standardization approaches for mass spectrometry data integration
Structure standardization approaches for  mass spectrometry data integrationStructure standardization approaches for  mass spectrometry data integration
Structure standardization approaches for mass spectrometry data integration
 
Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...Accessing information for chemicals in hydraulic fracturing fluids using the ...
Accessing information for chemicals in hydraulic fracturing fluids using the ...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
Structure identification by Mass Spectrometry Non-Targeted Analysis using the...
 
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
 

Recently uploaded

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
aishnasrivastava
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
AADYARAJPANDEY1
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
DiyaBiswas10
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
muralinath2
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
Areesha Ahmad
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
Areesha Ahmad
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
muralinath2
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 

Recently uploaded (20)

Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Structural Classification Of Protein (SCOP)
Structural Classification Of Protein  (SCOP)Structural Classification Of Protein  (SCOP)
Structural Classification Of Protein (SCOP)
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCINGRNA INTERFERENCE: UNRAVELING GENETIC SILENCING
RNA INTERFERENCE: UNRAVELING GENETIC SILENCING
 
extra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdfextra-chromosomal-inheritance[1].pptx.pdfpdf
extra-chromosomal-inheritance[1].pptx.pdfpdf
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Hemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptxHemostasis_importance& clinical significance.pptx
Hemostasis_importance& clinical significance.pptx
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
 
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of LipidsGBSN - Biochemistry (Unit 5) Chemistry of Lipids
GBSN - Biochemistry (Unit 5) Chemistry of Lipids
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
 
Hemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptxHemoglobin metabolism_pathophysiology.pptx
Hemoglobin metabolism_pathophysiology.pptx
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
 

Characterizing information content of Markush libraries with InChI

  • 1. Characterizing information content of Markush libraries with InChI Gabriel Sinclair1, Inthirany Thillainadarajah2, Brian Meyer2, Vicente Samano2, Sakuntala Sivasupramaniam2, Linda Adams2, Ann Richard3 and Antony Williams3 1 Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency, 2 ORAU Student Services Contractor 3 Senior Environmental Employment (SEE) Program August 2022: ACS Chicago http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  • 2. Characterizing information content of Markush libraries with InChI Gabriel Sinclair1, Inthirany Thillainadarajah2, Brian Meyer2, Vicente Samano2, Sakuntala Sivasupramaniam2, Linda Adams2, Ann Richard3 and Antony Williams3 1 Center for Computational Toxicology & Exposure, U.S. Environmental Protection Agency, 2 ORAU Student Services Contractor 3 Senior Environmental Employment (SEE) Program August 2022: ACS Chicago http://www.orcid.org/0000-0002-2668-4821 The views expressed in this presentation are those of the author and do not necessarily reflect the views or policies of the U.S. EPA
  • 3. The CompTox Chemicals Dashboard https://comptox.epa.gov/dashboard • Integration hub for >906k substances and associated data (1.2M are about to release) • Curated data, with ongoing daily curation and expansion, focused on environmental chemistry and impacts on human health • Aggregated sets of chemical lists of interest – endocrine disruptors, PFAS, azo dyes, disinfectant by-products, and 100s more 2
  • 4. ~906k chemicals includes explicit structures and complexity • Many chemicals cannot be defined as distinct chemical structures but have significant complexity – UVCB chemicals (unknown or variable composition, complex reaction products and biological substances) 3
  • 5. ~906k chemicals includes explicit structures and complexity • Many chemicals cannot be defined as distinct chemical structures but have significant complexity – UVCB chemicals (unknown or variable composition, complex reaction products and biological substances) – Some chemicals can be described with Markush structure boundaries • WELL-DEFINED: Xylenes: 3 chemicals. Polychlorinated biphenyls and polybrominated diphenyl ethers: 209 chemicals in each class • WELL-DEFINED: C10-C14 linear alkylbenzenesulfonic acids • LESS WELL-DEFINED: Linear perfluoroalkylsulfonic acids, becomes well defined once the limits are defined (C1-C100) • HARDLY DEFINED: Chloroparaffins, Cobalt/Cobalt-related substances • HARDLY DEFINED: Polymers – length, HH, TT, HT, capped, 4
  • 6. Well-Defined Complex Substances PCBs: 209 distinct structures • 209 “Markush-enumerated” structures are mapped • Markush-enumerated structures are mapped to existing chemicals in the database
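The figure of 209 distinct structures can be checked by brute force. The sketch below is an assumption-laden illustration, not the enumeration method used for the Dashboard: it treats biphenyl as ten substitutable positions, counts chlorination patterns as identical when they differ only by flipping a ring or swapping the two rings, and ignores atropisomerism.

```python
from itertools import product

# Indices 0-4 are positions 2,3,4,5,6 on ring A; 5-9 are 2',3',4',5',6' on ring B.
# A pattern is a tuple of ten 0/1 flags (1 = chlorine at that position).


def flip_ring(ring):
    """Mirror one ring about its 1-4 axis (2<->6, 3<->5)."""
    return ring[::-1]


def canonical(pattern):
    """Smallest equivalent pattern under ring flips and ring swap."""
    a, b = pattern[:5], pattern[5:]
    variants = []
    for ra in (a, flip_ring(a)):
        for rb in (b, flip_ring(b)):
            variants.append(ra + rb)  # rings in original order
            variants.append(rb + ra)  # rings swapped
    return min(variants)


def count_congeners():
    """Count distinct substitution patterns with at least one chlorine."""
    unique = {canonical(p) for p in product((0, 1), repeat=10)}
    unique.discard((0,) * 10)  # unsubstituted biphenyl is not a PCB
    return len(unique)


print(count_congeners())  # 209
```

Diphenyl ether has the same ten positions and the same symmetry, so the same count applies to the PBDE class.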
  • 8. Well-Defined Complex Substances Linear alkylbenzene sulfonates • Mapping to other generic structures and explicit structures
  • 9. Less Well-Defined Substances PFAS Categories of chemicals (326)
  • 10. Less Well-Defined Substances PFAS “Categories” of chemicals
  • 12. Markush chemicals are now essential… • Markush structures are essential for: – Mapping constrained structure datasets (PCBs, PBDEs, etc.) – Enumerating possible structures and mapping to existing chemicals – As we enumerate Markush structures we search for validation • The benefits of this approach – Querying external resources for validation and expansion of DSSTox – Sourcing identifiers for us to register and validate in DSSTox – Sourcing chemicals to purchase
  • 14. Markush Enumeration mapping • When we add Markush structures to our database we enumerate them and map the related substances • Complex substances therefore map to individual structures
  • 17. Sourcing data from external sites • When Markush structures are enumerated we source data by InChI • InChIKeys are pinged against external resources to check whether the chemical exists – EBI UniChem – PubChem – Other sources
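A "chemical exists" check by InChIKey can be sketched against PubChem's public PUG REST service. This is a minimal illustration, not the pipeline described in the talk: the helper names are hypothetical, only PubChem is covered, and the code assumes that an unknown key produces an HTTP 404 response.

```python
import json
import re
import urllib.error
import urllib.request

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_cid_url(inchikey: str) -> str:
    """Build the PUG REST URL that lists compound IDs for an InChIKey."""
    if not INCHIKEY_RE.match(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return f"{PUBCHEM_BASE}/compound/inchikey/{inchikey}/cids/JSON"


def pubchem_cids(inchikey: str, timeout: float = 10.0) -> list[int]:
    """Return PubChem CIDs for the key, or an empty list if none are found."""
    try:
        with urllib.request.urlopen(pubchem_cid_url(inchikey), timeout=timeout) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumed "no such compound" response
            return []
        raise
    return payload.get("IdentifierList", {}).get("CID", [])


# Example key: believed to be atrazine; verify before relying on it.
print(pubchem_cid_url("MXWJVTOOROXGIU-UHFFFAOYSA-N"))
# Network call, left commented out so the sketch runs offline:
# print(bool(pubchem_cids("MXWJVTOOROXGIU-UHFFFAOYSA-N")))
```

In practice each enumerated structure's key would be run through a check like this, with rate limiting, before any hit is used for curation.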
  • 18. Retrieving data for validation • When we retrieve data for Markush structures we use it to assist in curation – 5 different curation levels
  • 19. InChI is our PRIMARY mapping identifier • Since we can map Markush to explicit structures we end up with “representative structures” for searching. This helps: – Search public resources for chemical vendors (thanks PubChem!). Sourcing chemicals for our studies, such as bioactivity, HTTK, and metabolism (the InChIKey first block will find us isotopically labeled chemicals) – Searching across “literature” – Google Scholar, books and patents – Sourcing toxicity data for read-across purposes • All of this is available to the public via “External Links”
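The first-block trick works because the first 14 characters of an InChIKey hash the connectivity, while isotopic and stereo information is hashed into the second block. A minimal sketch of grouping keys by first block follows; the function names are hypothetical and the keys are placeholders made up for illustration, not real compounds.

```python
from collections import defaultdict


def first_block(inchikey: str) -> str:
    """Return the 14-character connectivity block of an InChIKey."""
    return inchikey.split("-", 1)[0]


def group_by_skeleton(inchikeys):
    """Group keys that share a first block (same connectivity)."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[first_block(key)].append(key)
    return dict(groups)


# Placeholder keys, invented for this example only.
keys = [
    "AAAAAAAAAAAAAA-UHFFFAOYSA-N",  # parent structure
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",  # same skeleton, different second block
    "CCCCCCCCCCCCCC-UHFFFAOYSA-N",  # unrelated skeleton
]
groups = group_by_skeleton(keys)
for block, members in groups.items():
    print(block, len(members))
```

Searching a vendor catalog by first block rather than the full key is what surfaces labeled variants alongside the parent compound.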
  • 25. Future Work • We are desperately waiting for the Markush InChIKey; we have immediate use cases for it • We are definitely interested in the Mixture InChI and its applications to mixtures and potentially polymers • Example: Polytriazine. There is one Google hit for the combined InChI for polytriazine, but thousands of hits for one of the components of the mixture (atrazine)
  • 26. Future Work – adding Markush where we can • The TSCA non-confidential inventory has >33k substances, half of which have no structures. Some can have Markush structures added (and mapped)
  • 27. Conclusions • Dashboard data is under constant curation and Markush addition has become a standard part of our work • Mapped relationships between chemicals are of value for complex substances and important for us to source data • Enumerating Markush structures to distinct chemicals with associated InChIs then allows us to harvest data and link across sites • Chemistry is complex and informatics solutions are more complete for distinct structures than complex substances. There is lots more to do… we await the Markush InChI