Alex M. Clark
Mixtures QSAR
modelling collections of chemicals
alex@collaborativedrug.com
February 2021
Real world chemistry is mixtures
✤ Most mixtures stored as text or
custom spreadsheets

✤ Value of upgrading to
cheminformatics is well
established...

✤ ... with the right data structure,
always just one script away from
what you need

✤ If you can represent it, you can
model it
Molfile → Mixfile, InChI → MInChI (1980s → 2020s)
Mixfile/MInChI
✤ Format needs to be:

‣ hierarchical

‣ embed structures when possible

‣ include concentration information

‣ tolerate uncertainty

✤ More verbose ELN-friendly form is Mixfile

✤ Concise form with canonical components is
MInChI (mixtures InChI) notation
MInChI=0.00.1S/C4H8O/c1-2-4-5-3-1/h1-4H2&C6H12/
c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/
h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/
c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C6H14N.Li/c1-5(2)7-6(3)4;/
h5-6H,1-4H3;/q-1;+1/n{6&{1&{3&2&4&5}}}/
g{1mr0&{1vp0&{5:7pp1&1:2pp1&1:5pp0&1:5pp0}7vp0}}
(JSON-serialised)
Journal of Cheminformatics (2019)
10.1186/s13321-019-0357-4
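The four format requirements can be sketched as a nested, JSON-serialisable Python dictionary. This is a minimal illustration, not the actual Mixfile schema: the field names (`name`, `smiles`, `quantity`, `units`, `relation`, `contents`) are assumptions modelled loosely on the published description, and the concentrations are made up. The authoritative definition is the one at github.com/cdd/mixtures.

```python
import json

# Hypothetical mixture; field names and quantities are illustrative assumptions.
mixture = {
    "name": "LDA solution",
    "contents": [
        # structure embedded where one is available
        {"name": "lithium diisopropylamide", "smiles": "CC(C)[N-]C(C)C.[Li+]",
         "quantity": 1.0, "units": "mol/L"},
        {
            "name": "solvent",  # nested branch: the format is hierarchical
            "contents": [
                {"name": "tetrahydrofuran", "smiles": "C1CCOC1"},
                # no single structure, and only a lower bound on the amount is known
                {"name": "hexanes", "quantity": 50, "units": "%", "relation": ">="},
            ],
        },
    ],
}


def iter_leaves(node, path=()):
    """Yield (path_of_names, node) for every component that has no children."""
    here = path + (node.get("name", "?"),)
    children = node.get("contents")
    if not children:
        yield here, node
        return
    for child in children:
        yield from iter_leaves(child, here)


def describe_quantity(node):
    """Render the concentration of a component, or 'unspecified' if absent."""
    if "quantity" not in node:
        return "unspecified"
    relation = node.get("relation", "=")
    return f"{relation} {node['quantity']} {node.get('units', '')}".strip()


for path, leaf in iter_leaves(mixture):
    print(" / ".join(path), "|", describe_quantity(leaf))

# The whole hierarchy survives a JSON round trip unchanged.
serialised = json.dumps(mixture, indent=2)
```

Because every level is an ordinary dictionary with an optional `contents` list, the same recursive walk works for a two-component solution and for a deeply nested formulation.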
Content creation
✤ Draw with editor

github.com/cdd/mixtures

✤ ... or import existing data
Fine chemicals as text
✤ Catalogs & inventories favour short descriptions of mixtures

✤ Machine learning for text-to-mixture works well:
ACS Omega (2021)
10.1021/acsomega.1c03311
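To show the shape of the text-to-mixture problem, here is a deliberately simple regular-expression stand-in. It is not the machine-learning method from the cited paper; it handles only one catalogue-style pattern ("solute, amount units in solvent"), and the pattern, function name and output fields are all assumptions made for illustration.

```python
import re

# Toy pattern: "<solute>[,] <number> <units> [solution] in <solvent>"
PATTERN = re.compile(
    r"^(?P<solute>.+?),?\s+"
    r"(?P<qty>\d+(?:\.\d+)?)\s*(?P<units>M|mM|wt%|%)\s+"
    r"(?:solution\s+)?in\s+(?P<solvent>.+)$",
    re.IGNORECASE,
)


def parse_mixture(text):
    """Return a two-component mixture dict, or None if the text does not fit."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "contents": [
            {
                "name": match["solute"].strip(),
                "quantity": float(match["qty"]),
                "units": match["units"],
            },
            {"name": match["solvent"].strip()},
        ]
    }


print(parse_mixture("n-Butyllithium, 2.5 M in hexanes"))
print(parse_mixture("acetone"))  # plain substance name, no mixture pattern
```

A rule like this breaks quickly on real catalogue text, which is the motivation for a learned approach.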
Mixtures as spreadsheets
✤ Each component defined by a column (or several)...
CDD Vault
✤ Mixture composer

✤ Destination for
each column

✤ Build mixture
hierarchy from rows

✤ Name-to-structure,
lookup databases,
links to ID codes
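The idea of assigning a destination to each column and building one mixture per row can be sketched in a few lines. This is not the CDD Vault importer: the column names, the `COMPONENT_COLUMNS` mapping and the output fields are hypothetical, and the table values are invented for illustration.

```python
import csv
import io

# Hypothetical spreadsheet: one mixture per row, one or more columns per component.
TABLE = """id,solute,solute_conc,solute_units,solvent
M1,sodium chloride,0.9,w/v%,water
M2,n-butyllithium,2.5,mol/L,hexanes
"""

# Destination for each column group: (name column, quantity column, units column).
COMPONENT_COLUMNS = [
    ("solute", "solute_conc", "solute_units"),
    ("solvent", None, None),
]


def compose_mixtures(table_text, component_columns=COMPONENT_COLUMNS):
    """Turn each spreadsheet row into a mixture dict with a list of components."""
    mixtures = []
    for row in csv.DictReader(io.StringIO(table_text)):
        contents = []
        for name_col, qty_col, units_col in component_columns:
            if not row.get(name_col):
                continue  # empty cell: this row has no such component
            component = {"name": row[name_col]}
            if qty_col and row.get(qty_col):
                component["quantity"] = float(row[qty_col])
                component["units"] = row[units_col]
            contents.append(component)
        mixtures.append({"name": row["id"], "contents": contents})
    return mixtures


for mixture in compose_mixtures(TABLE):
    print(mixture)
```

Name-to-structure conversion and database lookup would slot in at the point where each `component` is created.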
High throughput screening
Drug tablet formulation
✤ Represent active ingredient by structure &
concentration

✤ Other materials (excipients):

‣ polymers

‣ organic molecules

‣ salts

‣ amorphous materials
J Pharm Sci (2009)
10.1002/jps.21422
(figure labels: active ingredient, excipients)
Cosmetics
Consumer Products
(examples shown: abrasive hand cleaner, dishwashing liquid)
Hazards
✤ Storage 2-8 °C, flash point -49 °C: volatile, extreme danger, sealed ampule

✤ Storage 15 °C, flash point -4 °C: less dangerous
Mixtures QSAR
✤ Using chemical structure to model properties has a long history

✤ Modelling mixtures adds further layers:

‣ data capture has to be solved first

‣ adapt cheminformatics for multiple structures

‣ factor in concentrations

‣ use other descriptors when structures absent

✤ Several demo examples...
Example 1: Theophylline
✤ https://bit.ly/3Kqb5VA

✤ Solubility is the limiting factor

✤ Sourced from multiple papers
Example 2: Eutectic solubility
✤ https://bit.ly/3IWFKJV

✤ Have solubility of CO2 and SO2 in
various solvent combinations
Plot (Gas Absorption): predicted vs. measured, R² = 0.9412
Polymers: Chi
✤ χ is a measure of entropy of mixing

✤ Values for polymer + [polymer
or solvent or other]

✤ 2-component mixtures

✤ Data taken from Materials
Genome Project...
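For context, in the standard Flory-Huggins lattice model χ is the interaction parameter that multiplies the φ₁φ₂ term in the free energy of mixing, next to the combinatorial entropy terms. The sketch below evaluates that textbook expression for a two-component mixture. It is not taken from the slides; the function and parameter names are illustrative.

```python
import math


def flory_huggins_dg_mix(phi1, chi, n1=1.0, n2=1.0):
    """Free energy of mixing per lattice site, in units of kT.

    phi1   : volume fraction of component 1 (component 2 is 1 - phi1)
    chi    : Flory-Huggins interaction parameter
    n1, n2 : number of lattice segments per molecule (1 for a small solvent)
    """
    if not 0.0 < phi1 < 1.0:
        raise ValueError("phi1 must lie strictly between 0 and 1")
    phi2 = 1.0 - phi1
    # combinatorial entropy terms (always negative, so they favour mixing)
    entropic = (phi1 / n1) * math.log(phi1) + (phi2 / n2) * math.log(phi2)
    # interaction term: a larger chi makes mixing less favourable
    interaction = chi * phi1 * phi2
    return entropic + interaction


# Long chains lose most of the entropic driving force, so chi matters more.
print(flory_huggins_dg_mix(0.5, chi=0.0))
print(flory_huggins_dg_mix(0.5, chi=0.0, n1=100, n2=100))
print(flory_huggins_dg_mix(0.5, chi=2.0))
```

This is why χ is the natural property to model for polymer + polymer and polymer + solvent pairs: for long chains it largely decides whether the free energy of mixing is negative.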
Spreadsheet
✤ Content arrives in tabular form

✤ Expanding into full mixtures - 2 steps:

1. register each named structure

2. use Vault to compose into mixtures
Register structures
Markup into mixtures
✤ CDD Vault importer: Compose Mixtures
Gather data
✤ Select all chemical objects
with a Flory-Huggins Chi
property

✤ 172 unique datapoints

✤ Data gathering is generic:
applies to any mixture-formatted
content with this property
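The generic nature of this step can be illustrated with a small filter that selects every record carrying a named property. The record layout (a `properties` dictionary on each mixture) and the χ values are hypothetical, chosen only to show the idea.

```python
# Hypothetical records; names and values are placeholders, not real data.
records = [
    {"name": "polymer A + solvent X", "properties": {"Flory-Huggins Chi": 0.45}},
    {"name": "polymer B + polymer C", "properties": {"Flory-Huggins Chi": 0.02}},
    {"name": "polymer A + solvent Y", "properties": {"Solubility": 1.3}},
]


def gather(items, property_name):
    """Return (name, value) pairs for every record that has the property."""
    return [
        (item["name"], item["properties"][property_name])
        for item in items
        if property_name in item.get("properties", {})
    ]


print(gather(records, "Flory-Huggins Chi"))
```

Swapping the property name is the only change needed to assemble a training set for a different endpoint.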
Model
✤ For polymer components, do some simple processing:

✤ Generate ECFP4 fingerprints, folded into 2048-bit vectors

✤ Add the vectors for component 1 & component 2: each element is 0, 1 or 2

✤ Model descriptors ➪ χ using LightGBM (tree-based gradient boosting)
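The descriptor construction described above can be sketched without any cheminformatics dependency. The integer feature hashes below are made up; in practice they would come from an ECFP4 generator in a toolkit, and the resulting vectors would be passed to LightGBM, neither of which is shown here. Folding by taking the hash modulo the vector length is an assumption about how the folding is done.

```python
NBITS = 2048


def fold(feature_hashes, nbits=NBITS):
    """Fold a set of integer feature hashes into a 0/1 vector of length nbits."""
    vector = [0] * nbits
    for feature in feature_hashes:
        vector[feature % nbits] = 1  # colliding hashes share one bit
    return vector


def mixture_descriptor(hashes_1, hashes_2, nbits=NBITS):
    """Element-wise sum of the two folded fingerprints: values are 0, 1 or 2."""
    v1 = fold(hashes_1, nbits)
    v2 = fold(hashes_2, nbits)
    return [a + b for a, b in zip(v1, v2)]


# Hypothetical feature hashes for the two components of one mixture.
component_1 = {5, 2053, 70000}  # 5 and 2053 fold onto the same bit
component_2 = {5, 4100}

descriptor = mixture_descriptor(component_1, component_2)
print(len(descriptor), sorted(set(descriptor)))
```

A value of 2 marks a feature present in both components, 1 a feature present in only one. The sum is symmetric, so swapping component 1 and component 2 gives the same descriptor.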
Results
MAE = 0.208

R² = 0.696
✤ Can propose:

‣ arbitrary polymer structure

‣ arbitrary other molecule

‣ predict entropy of mixing (χ)

✤ Dataset is small and diverse...

✤ ... cross validation not great, need
more data

✤ Database of mixtures & χ-values?

✤ Bring into training set
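The two metrics quoted on this slide can be computed as follows. This sketch assumes R² means the coefficient of determination (1 minus residual over total sum of squares), which the slides do not state. The numbers in the example are placeholders, not the χ data.

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Placeholder values, for illustration only.
measured = [0.0, 1.0, 2.0]
predicted = [0.5, 1.0, 1.5]

print(mean_absolute_error(measured, predicted))
print(r_squared(measured, predicted))
```

For a cross-validated estimate, the predicted values should come from models that did not see the corresponding measured points during training.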
The Future
✤ Digitise mixtures in a self-describing form, as is already done for molecules

✤ Put them in repositories

‣ with measured properties

‣ open databases (like PubChem)

‣ or just a huge data lake

✤ Then focus on

‣ queries, analysis, model building - the fun part!

‣ describing components that are not simple molecules
Acknowledgements
✤ Leah McEwen

✤ InChI Trust & IUPAC

✤ NIH Grant 1R43TR002528-01

✤ The CDD Vault team
alex@collaborativedrug.com
(PS: we're hiring)
