SlideShare a Scribd company logo
1 of 19
Download to read offline
Alex M. Clark
Mixtures
as
fi
rst class citizens in the realm of informatics
alex@collaborativedrug.com
February 2021
Premise
✤ Most mixtures stored as text or
custom table layouts

✤ Value of upgrading to
cheminformatics is well
established...

✤ ... with the right datastructure,
always just one script away from
what you need

✤ If you can represent it, you can
model it
2
Mol
fi
le Mix
fi
le
InChI MInChI
1980's 2020's
Mixfile/MInChI ✤ Format needs to be:

‣ hierarchical

‣ embed structures when possible

‣ include concentration information

‣ tolerate uncertainty

✤ More verbose ELN-friendly form is Mix
fi
le

✤ Concise form with canonical components is
MInChI (mixtures InChI)
MInChI=0.00.1S/C4H8O/c1-2-4-5-3-1/h1-4H2&C6H12/
c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/
h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/
c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C6H14N.Li/c1-5(2)7-6(3)4;/
h5-6H,1-4H3;/q-1;+1/n{6&{1&{3&2&4&5}}}/
g{1mr0&{1vp0&{5:7pp1&1:2pp1&1:5pp0&1:5pp0}7vp0}}
3
(JSON-serialised)
Data Creation
4
github.com/cdd/mixtures collaborativedrug.com
Formulation Example
✤ Many consumer products are well described
from a chemical perspective

✤ Some components are more easily de
fi
ned
than others

✤ When structure is not available, can use
external identi
fi
ers

✤ Hierarchy encodes information about the
design of the product

✤ Concentrations can be expressed with
uncertainties
5
Comparisons with Structures
InChI=1S/C7H6O3/c8-6-4-2-1-3-5(6)7(9)10/h1-4,8H,(H,9,10)
≡ substructure of
n n
MW 300-500 MW 400-700
≅
Y1.2Ba0.8CuO4 ≅ YBa2Cu3O7−δ
≅
6
Search Queries
>40%
has
has INCI: COCAMIDE DEA
has not substructure
✤ Looking for a certain subset of
external cleaning surfactants,
phosphate-free
7
Informatics Example
✤ Solubility of theophylline

✤ Often delivered in liquid form with mixed solvents: optimising
proportion of drug is important

✤ Consider a scenario where:

‣ all data was provided in Mixtures InChI form

‣ these data exist in openly available repositories

✤ Query:

‣ check that theophylline is present and has concentration

‣ check that other ingredients are solvents

✤ Consider 4 papers with relevant solubility, published over 20 years...
theophylline

nasal anti-in
fl
ammatory
8
https://tinyurl.com/


y3svytfp


➫ 4:13:00
Papers (x4)
9
n n n n
All Together for QSAR
Solubility
0.699 1
15.19 1
1.04 1
3.142 1
0.784 1
0.91 1
6.3 1
13.7 1
11.6 1
13.58 1
6.73 1
9.3 1
8.20 0.8 0.2
16.38 0.5 0.5
13.60 0.2 0.8
(+8 more similar)
15.39 0.333 0.667
26.6 0.5 0.5
17.97 0.083 0.584 0.333
19.06 0.708 0.292
22.59 0.417 0.25 0.333
26.52 0.283 0.25 0.3 0.167
(+14 more similar)
n
10
Deep Eutectics / Carbon Capture
✤ Mixtures of ionic & neutral solvents can
absorb gases like CO2
11
DOI: 10.1021/ef5028873
✤ Want to model?

‣
fi
nding data in literature is hard

‣ curating is extremely laborious

✤ Each datapoint is a mixture...
Mixturfication
✤ Data entry easier as
tabular content

✤ Mixture hierarchy is
implicit
12
Mixtures
✤ Convert each row into a self-contained Mix
fi
le...

✤ ... have solubility of CO2 and SO2 in various solvent combinations

✤ Could be looked up in a database of mixtures from many di
ff
erent sources
13
Free Wilson-esque
✤ Tabulate components as column headings, with proportions - scriptable

✤ Build a basic regression model (PLS)...
14
Absorption
R² = 0.9523
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Predicted
Measured
Gas Absorption
Cross-populate
✤ Empty cells set to max(Tanimoto ECFP4 × concentration)
15
Absorption
R² = 0.9412
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Predicted
Measured
Gas Absorption
10-most Significant Descriptors
16
R² = 0.7723
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0 0.2 0.4 0.6 0.8 1 1.2 1.4
Predicted
Measured
Gas Absorption
Labelling Caveat
✤ Mixture data not quite free of implicit assumptions...
17
saturated
single

phase,

liquid
✤ Need additional metadata, e.g.

‣ ontologies

‣ IUPAC terminology

✤ Work in progress
Summary
✤ Open protocols, tools & example data available for representing mixtures

✤ New work
fl
ows become practical with cheminformaticisation of mixtures

✤ Much work to be done to catch up with structure databases

✤ Community building is key

✤ Many industries: lab chemistry, drug formulations, consumer products,
agriculture, analytical standards, safety...
18
Questions?
✤ Contact:

Alex M. Clark alex@collaborativedrug.com (Collaborative Drug Discovery)
19
Journal of Cheminformatics (2019
)

10.1186/s13321-019-0357-4
Open Sourc
e

github.com/cdd/mixtures
CDD Vaul
t

collaborativedrug.com
✤ Also: Leah McEwen (Cornell, InChI, IUPAC)

More Related Content

More from Alex Clark

Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAlex Clark
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsAlex Clark
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...Alex Clark
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay ExpressAlex Clark
 
SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?Alex Clark
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsAlex Clark
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsAlex Clark
 
Green chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designGreen chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designAlex Clark
 
ICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookAlex Clark
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Alex Clark
 
Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Alex Clark
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Alex Clark
 
Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealAlex Clark
 
Pistoia Alliance App Strategy
Pistoia Alliance App StrategyPistoia Alliance App Strategy
Pistoia Alliance App StrategyAlex Clark
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Alex Clark
 
Practical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsPractical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsAlex Clark
 
Alex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex Clark
 
Alex M. Clark, Chemical Education, ACS 2012 Philadelphia
Alex M. Clark, Chemical Education, ACS 2012 PhiladelphiaAlex M. Clark, Chemical Education, ACS 2012 Philadelphia
Alex M. Clark, Chemical Education, ACS 2012 PhiladelphiaAlex Clark
 
Building a mobile app ecosystem for chemistry collaboration
Building a mobile app ecosystem for chemistry collaborationBuilding a mobile app ecosystem for chemistry collaboration
Building a mobile app ecosystem for chemistry collaborationAlex Clark
 

More from Alex Clark (20)

Autonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocolsAutonomous model building with a preponderance of well annotated assay protocols
Autonomous model building with a preponderance of well annotated assay protocols
 
Representing molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informaticsRepresenting molecules with minimalism: A solution to the entropy of informatics
Representing molecules with minimalism: A solution to the entropy of informatics
 
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
CDD BioAssay Express: Expanding the target dimension: How to visualize a lot ...
 
BioAssay Express
BioAssay ExpressBioAssay Express
BioAssay Express
 
SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?SLAS2016: Why have one model when you could have thousands?
SLAS2016: Why have one model when you could have thousands?
 
The anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithmsThe anatomy of a chemical reaction: Dissection by machine learning algorithms
The anatomy of a chemical reaction: Dissection by machine learning algorithms
 
Compact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile appsCompact models for compact devices: Visualisation of SAR using mobile apps
Compact models for compact devices: Visualisation of SAR using mobile apps
 
Green chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by designGreen chemistry in chemical reactions: informatics by design
Green chemistry in chemical reactions: informatics by design
 
ICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab NotebookICCE 2014: The Green Lab Notebook
ICCE 2014: The Green Lab Notebook
 
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
Cloud hosted APIs for cheminformatics on mobile devices (ACS Dallas 2014)
 
Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)Building a mobile reaction lab notebook (ACS Dallas 2014)
Building a mobile reaction lab notebook (ACS Dallas 2014)
 
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
Reaction Lab Notebooks for Mobile Devices - Alex M. Clark - GDCh 2013
 
Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013Alex Clark : NETTAB 2013
Alex Clark : NETTAB 2013
 
Open Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health MontrealOpen Drug Discovery Teams @ Hacking Health Montreal
Open Drug Discovery Teams @ Hacking Health Montreal
 
Pistoia Alliance App Strategy
Pistoia Alliance App StrategyPistoia Alliance App Strategy
Pistoia Alliance App Strategy
 
Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?Mobile+Cloud: a viable replacement for desktop cheminformatics?
Mobile+Cloud: a viable replacement for desktop cheminformatics?
 
Practical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile appsPractical cheminformatics workflows with mobile apps
Practical cheminformatics workflows with mobile apps
 
Alex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 PhiladelphiaAlex M. Clark, CINF, ACS 2012 Philadelphia
Alex M. Clark, CINF, ACS 2012 Philadelphia
 
Alex M. Clark, Chemical Education, ACS 2012 Philadelphia
Alex M. Clark, Chemical Education, ACS 2012 PhiladelphiaAlex M. Clark, Chemical Education, ACS 2012 Philadelphia
Alex M. Clark, Chemical Education, ACS 2012 Philadelphia
 
Building a mobile app ecosystem for chemistry collaboration
Building a mobile app ecosystem for chemistry collaborationBuilding a mobile app ecosystem for chemistry collaboration
Building a mobile app ecosystem for chemistry collaboration
 

Recently uploaded

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxBREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxPABOLU TEJASREE
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 

Recently uploaded (20)

Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxBREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

Mixtures as first class citizens in the realm of informatics

  • 1. Alex M. Clark Mixtures as fi rst class citizens in the realm of informatics alex@collaborativedrug.com February 2021
  • 2. Premise ✤ Most mixtures stored as text or custom table layouts ✤ Value of upgrading to cheminformatics is well established... ✤ ... with the right datastructure, always just one script away from what you need ✤ If you can represent it, you can model it 2 Mol fi le Mix fi le InChI MInChI 1980's 2020's
  • 3. Mixfile/MInChI ✤ Format needs to be: ‣ hierarchical ‣ embed structures when possible ‣ include concentration information ‣ tolerate uncertainty ✤ More verbose ELN-friendly form is Mix fi le ✤ Concise form with canonical components is MInChI (mixtures InChI) MInChI=0.00.1S/C4H8O/c1-2-4-5-3-1/h1-4H2&C6H12/ c1-6-4-2-3-5-6/h6H,2-5H2,1H3&C6H14/c1-3-5-6-4-2/ h3-6H2,1-2H3&C6H14/c1-4-5-6(2)3/h6H,4-5H2,1-3H3&C6H14/ c1-4-6(3)5-2/h6H,4-5H2,1-3H3&C6H14N.Li/c1-5(2)7-6(3)4;/ h5-6H,1-4H3;/q-1;+1/n{6&{1&{3&2&4&5}}}/ g{1mr0&{1vp0&{5:7pp1&1:2pp1&1:5pp0&1:5pp0}7vp0}} 3 (JSON-serialised)
  • 5. Formulation Example ✤ Many consumer products are well described from a chemical perspective ✤ Some components are more easily de fi ned than others ✤ When structure is not available, can use external identi fi ers ✤ Hierarchy encodes information about the design of the product ✤ Concentrations can be expressed with uncertainties 5
  • 6. Comparisons with Structures InChI=1S/C7H6O3/c8-6-4-2-1-3-5(6)7(9)10/h1-4,8H,(H,9,10) ≡ substructure of n n MW 300-500 MW 400-700 ≅ Y1.2Ba0.8CuO4 ≅ YBa2Cu3O7−δ ≅ 6
  • 7. Search Queries >40% has has INCI: COCAMIDE DEA has not substructure ✤ Looking for a certain subset of external cleaning surfactants, phosphate-free 7
  • 8. Informatics Example ✤ Solubility of theophylline ✤ Often delivered in liquid form with mixed solvents: optimising proportion of drug is important ✤ Consider a scenario where: ‣ all data was provided in Mixtures InChI form ‣ these data exist in openly available repositories ✤ Query: ‣ check that theophylline is present and has concentration ‣ check that other ingredients are solvents ✤ Consider 4 papers with relevant solubility, published over 20 years... theophylline nasal anti-in fl ammatory 8 https://tinyurl.com/ y3svytfp ➫ 4:13:00
  • 10. All Together for QSAR Solubility 0.699 1 15.19 1 1.04 1 3.142 1 0.784 1 0.91 1 6.3 1 13.7 1 11.6 1 13.58 1 6.73 1 9.3 1 8.20 0.8 0.2 16.38 0.5 0.5 13.60 0.2 0.8 (+8 more similar) 15.39 0.333 0.667 26.6 0.5 0.5 17.97 0.083 0.584 0.333 19.06 0.708 0.292 22.59 0.417 0.25 0.333 26.52 0.283 0.25 0.3 0.167 (+14 more similar) n 10
  • 11. Deep Eutectics / Carbon Capture ✤ Mixtures of ionic & neutral solvents can absorb gases like CO2 11 DOI: 10.1021/ef5028873 ✤ Want to model? ‣ fi nding data in literature is hard ‣ curating is extremely laborious ✤ Each datapoint is a mixture...
  • 12. Mixturfication ✤ Data entry easier as tabular content ✤ Mixture hierarchy is implicit 12
  • 13. Mixtures ✤ Convert each row into a self-contained Mix fi le... ✤ ... have solubility of CO2 and SO2 in various solvent combinations ✤ Could be looked up in a database of mixtures from many di ff erent sources 13
  • 14. Free Wilson-esque ✤ Tabulate components as column headings, with proportions - scriptable ✤ Build a basic regression model (PLS)... 14 Absorption R² = 0.9523 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Predicted Measured Gas Absorption
  • 15. Cross-populate ✤ Empty cells set to max(Tanimoto ECFP4 × concentration) 15 Absorption R² = 0.9412 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Predicted Measured Gas Absorption
  • 16. 10-most Significant Descriptors 16 R² = 0.7723 0 0.2 0.4 0.6 0.8 1 1.2 1.4 0 0.2 0.4 0.6 0.8 1 1.2 1.4 Predicted Measured Gas Absorption
  • 17. Labelling Caveat ✤ Mixture data not quite free of implicit assumptions... 17 saturated single phase, liquid ✤ Need additional metadata, e.g. ‣ ontologies ‣ IUPAC terminology ✤ Work in progress
  • 18. Summary ✤ Open protocols, tools & example data available for representing mixtures ✤ New work fl ows become practical with cheminformaticisation of mixtures ✤ Much work to be done to catch up with structure databases ✤ Community building is key ✤ Many industries: lab chemistry, drug formulations, consumer products, agriculture, analytical standards, safety... 18
  • 19. Questions? ✤ Contact: Alex M. Clark alex@collaborativedrug.com (Collaborative Drug Discovery) 19 Journal of Cheminformatics (2019 ) 10.1186/s13321-019-0357-4 Open Sourc e github.com/cdd/mixtures CDD Vaul t collaborativedrug.com ✤ Also: Leah McEwen (Cornell, InChI, IUPAC)