Creating Solubility Models with Reaxys
Presented by: Dr. Matthew Clark, Elsevier R&D Solutions Services
Date: 19 January 2016
What We Will Learn
• Reaxys has solubility data that can be used to create and study predictive models.
• The Reaxys data appear to be more diverse than the well-studied “Huuskonen” data set.
• The nature and diversity of the training set are very important for predictive models.
• The best reported models have the smallest training sets.
• However, these training sets may not be useful for predicting more diverse compounds.
• Predictions from a model trained on the Huuskonen set are poor on the Reaxys set.
• Reaxys has a diverse set of structures and solubilities.
• Each individual measurement is referenced.
• Reaxys is a good source of data for model making.
Reaxys Property Data
• In addition to the well-known reactions and compounds, Reaxys contains hundreds of different measured properties reported for the compounds.
• Each property is associated with a reference.
• Each property has a “cluster” of values, such as measurement temperature, pressure, and solvents, describing the conditions of the measurement.
• In many cases, multiple measurements of a particular value are reported by different authors at different times.
• A mean, median, and standard deviation can be computed for the value. Each value is associated with a reference.
• One can combine this data with the chemical structures of the compounds to make structure-based predictive models for these properties.
• One can then predict the value for new or proposed compounds from their chemical structures.
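When several measurements exist for one compound, the mean, median, and standard deviation mentioned on this slide can be computed per compound. The following is a minimal sketch using only the Python standard library; the compound identifiers and values are made up for illustration and do not come from Reaxys.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def summarize(measurements):
    """Group (compound_id, value) pairs and summarize each compound's values."""
    by_compound = defaultdict(list)
    for compound_id, value in measurements:
        by_compound[compound_id].append(value)

    summary = {}
    for compound_id, values in by_compound.items():
        summary[compound_id] = {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            # The sample standard deviation needs at least two values.
            "stdev": stdev(values) if len(values) > 1 else None,
        }
    return summary


# Hypothetical solubility values (g/L) for two made-up compound identifiers.
example = [("cmpd-1", 1.0), ("cmpd-1", 2.0), ("cmpd-1", 9.0), ("cmpd-2", 0.5)]
result = summarize(example)
print(result["cmpd-1"])
```

In this made-up example the median (2.0) is much less affected by the outlying value 9.0 than the mean (4.0), which is one reason to inspect both before choosing a value for modeling.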
Reaxys Property Data is Grouped with Conditions
You can select the measurement conditions relevant to your model.
Boiling Point
• Boiling Point, °C (BP.BP)
• Pressure, Torr (BP.P)
Refractive Index
• Refractive Index (RI.RI)
• Wavelength, nm (RI.W)
• Temperature, °C (RI.T)
Dielectric Constant
• Dielectric Constant (DIC.DIC)
• Frequency, Hz (DIC.F)
• Temperature, °C (DIC.T)
Electrical Moment
• Description (EM.KW)
• Moment, D (EM.EM)
• Temperature, °C (EM.T)
• Method (EM.MET)
• Solvent (EM.SOL)
Enthalpy of Formation
• Enthalpy of Formation, J mol⁻¹ (HFOR.HFOR)
• Temperature, °C (HFOR.T)
• Pressure, Torr (HFOR.P)
Solubility (MCS)
• Solubility, g L⁻¹ (SLB.SLB)
• Saturation (SLB.SAT)
• Temperature, °C (SLB.T)
• Solvent (SLB.SOL)
• Ratio of Solvents (SLB.RAT)
Model Making Tools
• There are several ways to access this data
• The API (Application Programming Interface) allows direct access
• Download a tagged SD file from Reaxys after searching
• “Hop in to” links go to the data automatically
• The Reaxys API allows direct access to the data
• XML-based interface
• KNIME and Pipeline Pilot are supported
• Queries need to be based on measurement conditions (temperature, solvent) and the nature of the molecules (organic, single-fragment)
• Form-based query
• “Advanced Query”
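For the SD-file route, the property values travel as tagged data fields after each structure block. Below is a minimal sketch of reading those tags with plain Python. It assumes the standard SD layout ("> <TAG>" header, value lines, blank line, "$$$$" record terminator). The tag names SLB.SLB and SLB.T are borrowed from the Reaxys field codes and are an assumption; the names in an actual Reaxys export may differ.

```python
def read_sd_tags(sd_text):
    """Return one dict of data-field tags per record in an SD file string."""
    records = []
    for block in sd_text.split("$$$$"):
        lines = block.strip("\n").splitlines()
        tags = {}
        i = 0
        while i < len(lines):
            line = lines[i]
            # A data header looks like "> <TAG_NAME>"; values follow until a blank line.
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                name = line[line.index("<") + 1 : line.rindex(">")]
                i += 1
                values = []
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                tags[name] = "\n".join(values)
            i += 1
        if tags:  # records without any data fields are skipped
            records.append(tags)
    return records


# Hypothetical one-record SD file with an empty structure block.
sample = """\
example-1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <SLB.SLB>
0.25

> <SLB.T>
25

$$$$
"""

records = read_sd_tags(sample)
print(records)
```

The sketch ignores the structure block itself; a cheminformatics toolkit would be needed to turn it into a molecule for descriptor calculation.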
Solubility Query To Select Data and Molecules
SLB.SLB > 0                         has a reported solubility
Temperature 19-25                   temperature range of measurement (°C)
Solvent 'H2O                        solubility in water
Number of Fragments = 1             only one contiguous fragment
Elements = 'c'                      contains carbon
NOT Chemical Name = '*radical       not a radical
Molecular Weight > 40 AND < 1000    molecular weight range
Number of Elements < 5              fewer than 5 different elements
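The same selection criteria can be expressed as a filter over already-downloaded records. This is a sketch under stated assumptions, not the Reaxys query syntax: the Record class and its field names are hypothetical, the records are made up, and the wildcard '*radical is read as "name ends with radical".

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical container for one solubility measurement."""
    name: str
    solubility_g_per_l: float
    temperature_c: float
    solvent: str
    n_fragments: int
    elements: frozenset
    mol_weight: float


def passes_query(r):
    """Apply the criteria listed on the slide to one record."""
    return (
        r.solubility_g_per_l > 0                          # has a reported solubility
        and 19 <= r.temperature_c <= 25                   # measured at 19-25 °C
        and r.solvent == "H2O"                            # solubility in water
        and r.n_fragments == 1                            # one contiguous fragment
        and "C" in r.elements                             # contains carbon
        and not r.name.lower().endswith("radical")        # not a radical
        and 40 < r.mol_weight < 1000                      # molecular weight range
        and len(r.elements) < 5                           # fewer than 5 different elements
    )


# Made-up records for illustration only.
records = [
    Record("example acid", 1.2, 25.0, "H2O", 1, frozenset({"C", "H", "O"}), 122.1),
    Record("example acid, hot", 1.2, 40.0, "H2O", 1, frozenset({"C", "H", "O"}), 122.1),
    Record("example radical", 0.3, 20.0, "H2O", 1, frozenset({"C", "H"}), 15.0),
    Record("example ester", 5.0, 22.0, "ethanol", 1, frozenset({"C", "H", "O"}), 88.1),
]

selected = [r.name for r in records if passes_query(r)]
print(selected)
```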
Reviewing Solubility Data in Reaxys
Solubility Sources
Reaxys logS is -3.67
Data Processing in KNIME
• Combines compounds with solubility measured under the desired conditions
• Converts solubility values from g/L to molarity (mol/L) by dividing by the molecular weight (g/mol)
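The unit conversion is a one-line calculation. The sketch below assumes that logS means the base-10 logarithm of the molar solubility (mol/L), which is the usual convention; the function name is hypothetical.

```python
import math


def log_s(solubility_g_per_l, mol_weight_g_per_mol):
    """Convert a mass solubility (g/L) to logS, the log10 of molarity (mol/L)."""
    if solubility_g_per_l <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("solubility and molecular weight must be positive")
    molarity = solubility_g_per_l / mol_weight_g_per_mol
    return math.log10(molarity)


# A made-up compound with molecular weight 100 g/mol and solubility 0.1 g/L.
print(log_s(0.1, 100.0))
```

The positivity check matters in practice because a reported solubility of zero has no logarithm; the query criterion SLB.SLB > 0 excludes such entries.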
Model Making Workflow
• Used with data from Reaxys and from the Huuskonen paper
• Uses “R” and stepwise multiple regression
• Results and error of prediction appear in a spreadsheet
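To illustrate what stepwise multiple regression does, here is a small forward-selection sketch in plain Python. It is not the R workflow used in the presentation: the slides do not say which selection criterion was used, so this sketch adds a descriptor only when it reduces the residual sum of squares by at least a chosen fraction of the total sum of squares. The descriptor matrix and the min_gain threshold are made up.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(X, y, cols):
    """Least-squares fit of y on an intercept plus the chosen descriptor columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))
    return beta, rss


def forward_stepwise(X, y, min_gain=0.01):
    """Add descriptors one at a time while each cuts RSS by >= min_gain * TSS."""
    mean_y = sum(y) / len(y)
    tss = sum((t - mean_y) ** 2 for t in y)
    chosen, remaining = [], list(range(len(X[0])))
    _, current_rss = fit(X, y, chosen)
    while remaining:
        rss, best = min((fit(X, y, chosen + [c])[1], c) for c in remaining)
        if current_rss - rss < min_gain * tss:
            break
        chosen.append(best)
        remaining.remove(best)
        current_rss = rss
    return chosen, fit(X, y, chosen)[0]


# Made-up data: y depends only on the first descriptor (y = 1 + 2 * x0).
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0], [5.0, 7.0], [6.0, 2.0]]
y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
chosen, beta = forward_stepwise(X, y)
print(chosen, beta)
```

The normal-equations solver is adequate for a toy example; with dozens of correlated descriptors a numerically more stable method such as QR decomposition would be the safer choice.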
Initial Model and Prediction Result is OK-ish
• Full compound set, no further filtering
• 3590 compounds
• Standard error of prediction 1.1 log units
• Not spectacular, but useful
• Training set covers a larger range of diversity than used in most models
• r² = 0.56
[Figure: predicted logS vs. experimental logS, both axes from -12 to 4]
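The two quality figures quoted for each model can be computed from paired experimental and predicted values. The sketch below makes two assumptions that the slides do not confirm: the standard error of prediction is taken as the root-mean-square difference, and r² as the squared Pearson correlation. The numbers are made up.

```python
import math


def standard_error(experimental, predicted):
    """Root-mean-square difference between predicted and experimental values."""
    n = len(experimental)
    squared = sum((p - e) ** 2 for e, p in zip(experimental, predicted))
    return math.sqrt(squared / n)


def r_squared(experimental, predicted):
    """Squared Pearson correlation between experimental and predicted values."""
    n = len(experimental)
    mean_e = sum(experimental) / n
    mean_p = sum(predicted) / n
    s_ep = sum((e - mean_e) * (p - mean_p) for e, p in zip(experimental, predicted))
    s_ee = sum((e - mean_e) ** 2 for e in experimental)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    return s_ep ** 2 / (s_ee * s_pp)


# Made-up logS values for four compounds.
experimental = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
se = standard_error(experimental, predicted)
r2 = r_squared(experimental, predicted)
print(se, r2)
```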
Reaxys Solubility Model 2 – Filtering of Source Compounds
Residual standard error: 0.6932 on 2697 degrees of freedom
Multiple R-squared: 0.8099, Adjusted R-squared: 0.8037
F-statistic: 132 on 87 and 2697 DF, p-value: < 2.2e-16
[Figure: predicted logS vs. experimental logS, both axes from -12 to 4]
2785 compounds remain after filtering. [Figure: examples of filtered compounds]
The model is better, but it does not improve prediction of the Huuskonen data set.
Comparison with Other Reports
Clark: fragment-based solubility model, r² 0.73, SE 0.89, using the “PHYSPROP” data set
Matthew Clark, “Generalized Fragment-Substructure Based Property Prediction Method”, J. Chem. Inf. Model., 2005, 45 (1), pp 30–38. DOI: 10.1021/ci049744c
Comparison with Other Data Sets
Huuskonen defined a training set of compounds and solubilities, along with test sets, that have been used in several comparative studies.
Huuskonen Molecule/Data Set Models – (No Reaxys Data)
• Models made with Huuskonen structures and data, using CDK descriptors and an R model
• Using the published training and test sets
• Models are not as good as in the publication; Huuskonen used a different descriptor computation and statistical method. Standard error 0.67 log units.
[Figure: predicted logS vs. experimental logS, three panels, axes from -12 to 4]
Training Set: y = 0.961x, R² = 0.8832
Test Set 1: y = 0.9452x, R² = 0.8598
Test Set 2: y = 0.9912x, R² = 0.7857
Huuskonen Molecule Sets – Predicted with Model Created from Reaxys Data Set
• Same molecule sets – model trained with the Reaxys training set
• Standard error 0.98 log units – not bad
[Figure: predicted logS vs. experimental logS, three panels, axes from -12 to 4]
Panel fits, in slide order: y = 0.8824x, R² = 0.6522; y = 0.8834x, R² = 0.6889; y = 0.8741x, R² = 0.7968
Reaxys Molecule Set Predicted with Model Created from Huuskonen Data Set – Not Very Good
• Standard error 3.5 log units
• The issue is likely that many molecules from Reaxys are “outside” the structural diversity of the Huuskonen data set
• Illustrates a significant issue with modeling: predictions are generally best when the molecules are similar to the training set
[Figure: predicted logS vs. experimental logS, axes from -12 to 4; fit y = 0.6596x - 1.0645, R² = 0.1459]
Does Reaxys Give The Same Solubility Values as Huuskonen Data Set? Yes.
• Only a subset of the solubilities in the Huuskonen set are found in Reaxys.
• Differences are generally due to multiple measurements, including outliers, being reported
[Figure: Reaxys logS vs. Huuskonen logS, axes from -12 to 4; fit y = 1.0082x - 0.0367, R² = 0.9607]
Reaxys Solubility Data Set is Structurally More Diverse
• Similarity matrix of each data set computed using fingerprints and Tanimoto similarity
• Molecules in the Huuskonen set are more similar to each other than those in the Reaxys set
[Figure: normalized fraction of pair-similarity count (0 to 0.2) vs. similarity value, for the Huuskonen and Reaxys sets; normalized for different data set sizes]
Reaxys has a higher proportion of molecules not similar to others in the set.
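The diversity comparison can be sketched as a histogram of pairwise Tanimoto similarities, normalized by the number of pairs so that data sets of different sizes are comparable. In this sketch each fingerprint is represented as a Python set of "on" bit positions. The fingerprints below are made up, and the presentation does not say which fingerprint type or bin width was used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_histogram(fingerprints, bins=10):
    """Fraction of molecule pairs falling into each similarity bin."""
    counts = [0] * bins
    n_pairs = 0
    for a, b in combinations(fingerprints, 2):
        # A similarity of exactly 1.0 goes into the last bin.
        index = min(int(tanimoto(a, b) * bins), bins - 1)
        counts[index] += 1
        n_pairs += 1
    if n_pairs == 0:
        return [0.0] * bins
    return [c / n_pairs for c in counts]


# Three made-up fingerprints: two related molecules and one unrelated.
fingerprints = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
hist = similarity_histogram(fingerprints)
print(hist)
```

A more diverse set puts more of its pairs into the low-similarity bins, which is the pattern the slide reports for Reaxys.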
What We Learned
• Reaxys has solubility data that can be used to create and study predictive models.
• The Reaxys data appear to be more diverse than the well-studied “Huuskonen” data set.
• The nature and diversity of the training set are very important.
• The best reported models have the smallest training sets.
• However, these training sets may not be useful for predicting more diverse compounds.
• Predictions from a model trained on the Huuskonen set are poor on the Reaxys set.
• Generally, good models can predict with a standard error of about 1 log unit for compounds similar to the training set.
• Question: what is the accuracy of measurement?
• $\frac{\partial \log S}{\partial\,(\mathrm{g\,L^{-1}})} = \frac{1}{2.303 \cdot \mathrm{g\,L^{-1}}}$, so logS changes by about 0.4 log units per mg/L at a solubility of 1 mg/L
• Reaxys has a diverse set of structures and solubilities.
• Each individual measurement is referenced.
• Reaxys is a good source of data for model making.
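The sensitivity of logS to measurement error quoted on this slide follows from differentiating the logarithm. A short derivation, writing $c$ for the solubility in mass per litre and $M$ for the molecular weight (both symbols are introduced here for illustration):

```latex
\log S = \log_{10}\!\left(\frac{c}{M}\right) = \frac{\ln c - \ln M}{\ln 10}
\quad\Longrightarrow\quad
\frac{\partial \log S}{\partial c} = \frac{1}{c\,\ln 10} \approx \frac{1}{2.303\,c}
```

At $c = 1$ mg/L this gives $1/2.303 \approx 0.43$ log units per mg/L, so for poorly soluble compounds a small absolute measurement error translates into a large error in logS. The molecular weight drops out because it only shifts logS by a constant.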
Conclusion
• Reaxys is a rich source of data for solubility and other properties.
• One can explore many subsets based on condition, molecule class, etc.
• High diversity of molecules: organic, inorganic, peptides, etc.
• Reaxys is a good source of data for making predictive models.
• It provides not just the value, but also the measurement conditions.
• Selection of “good” measurements is an important factor in making models.
• Reaxys contains hundreds of measured properties!
• Solubility is well studied.
• Not as many models are available for refractive index, magnetic susceptibility, etc.
• Reaxys has only measured solubilities; SciFinder has predicted values.
• This presentation shows the effect of the training set on model quality.
• Reaxys Medicinal Chemistry contains thousands of bioassay results on thousands of targets that can be used for predictive models.

Pharmacology of 5-hydroxytryptamine and Antagonist
Dr. Nikhilkumar Sakle
 

Recently uploaded (20)

Histololgy of Female Reproductive System.pptx
Histololgy of Female Reproductive System.pptxHistololgy of Female Reproductive System.pptx
Histololgy of Female Reproductive System.pptx
 
Cardiac Assessment for B.sc Nursing Student.pdf
Cardiac Assessment for B.sc Nursing Student.pdfCardiac Assessment for B.sc Nursing Student.pdf
Cardiac Assessment for B.sc Nursing Student.pdf
 
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
Demystifying Fallopian Tube Blockage- Grading the Differences and Implication...
 
pathology MCQS introduction to pathology general pathology
pathology MCQS introduction to pathology general pathologypathology MCQS introduction to pathology general pathology
pathology MCQS introduction to pathology general pathology
 
Vestibulocochlear Nerve by Dr. Rabia Inam Gandapore.pptx
Vestibulocochlear Nerve by Dr. Rabia Inam Gandapore.pptxVestibulocochlear Nerve by Dr. Rabia Inam Gandapore.pptx
Vestibulocochlear Nerve by Dr. Rabia Inam Gandapore.pptx
 
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USENARCOTICS- POLICY AND PROCEDURES FOR ITS USE
NARCOTICS- POLICY AND PROCEDURES FOR ITS USE
 
Osteoporosis - Definition , Evaluation and Management .pdf
Osteoporosis - Definition , Evaluation and Management .pdfOsteoporosis - Definition , Evaluation and Management .pdf
Osteoporosis - Definition , Evaluation and Management .pdf
 
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptxREGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
REGULATION FOR COMBINATION PRODUCTS AND MEDICAL DEVICES.pptx
 
Acute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdfAcute Gout Care & Urate Lowering Therapy .pdf
Acute Gout Care & Urate Lowering Therapy .pdf
 
share - Lions, tigers, AI and health misinformation, oh my!.pptx
share - Lions, tigers, AI and health misinformation, oh my!.pptxshare - Lions, tigers, AI and health misinformation, oh my!.pptx
share - Lions, tigers, AI and health misinformation, oh my!.pptx
 
Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)Artificial Intelligence Symposium (THAIS)
Artificial Intelligence Symposium (THAIS)
 
Cosmetology and Trichology Courses at Kosmoderma Academy PRP (Hair), DR Growt...
Cosmetology and Trichology Courses at Kosmoderma Academy PRP (Hair), DR Growt...Cosmetology and Trichology Courses at Kosmoderma Academy PRP (Hair), DR Growt...
Cosmetology and Trichology Courses at Kosmoderma Academy PRP (Hair), DR Growt...
 
Complementary feeding in infant IAP PROTOCOLS
Complementary feeding in infant IAP PROTOCOLSComplementary feeding in infant IAP PROTOCOLS
Complementary feeding in infant IAP PROTOCOLS
 
Cervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptxCervical Disc Arthroplasty ORSI 2024.pptx
Cervical Disc Arthroplasty ORSI 2024.pptx
 
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptxPost-Menstrual Smell- When to Suspect Vaginitis.pptx
Post-Menstrual Smell- When to Suspect Vaginitis.pptx
 
CBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdfCBL Seminar 2024_Preliminary Program.pdf
CBL Seminar 2024_Preliminary Program.pdf
 
Promoting Wellbeing - Applied Social Psychology - Psychology SuperNotes
Promoting Wellbeing - Applied Social Psychology - Psychology SuperNotesPromoting Wellbeing - Applied Social Psychology - Psychology SuperNotes
Promoting Wellbeing - Applied Social Psychology - Psychology SuperNotes
 
The Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of RespirationThe Nervous and Chemical Regulation of Respiration
The Nervous and Chemical Regulation of Respiration
 
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdfMedical Quiz ( Online Quiz for API Meet 2024 ).pdf
Medical Quiz ( Online Quiz for API Meet 2024 ).pdf
 
Pharmacology of 5-hydroxytryptamine and Antagonist
Pharmacology of 5-hydroxytryptamine and AntagonistPharmacology of 5-hydroxytryptamine and Antagonist
Pharmacology of 5-hydroxytryptamine and Antagonist
 

Making solubility models with Reaxys

  • 1. Creating Solubility Models with Reaxys. Presented by Dr. Matthew Clark, Elsevier R&D Solutions Services, 19 January 2016
  • 2. What We Will Learn • Reaxys has solubility data that can be used to create and study predictive models • It appears to have more diverse data than the well-studied “Huuskonen” data set • The nature and diversity of the training set is very important for predictive models • The best reported models have the smallest training sets • However, these training sets may not be useful for predicting more diverse compounds • Predictions on the Reaxys set from a model trained on the Huuskonen set are poor • Reaxys has a diverse set of structures and solubilities • Each individual measurement is referenced • Good source for model making
  • 3. Reaxys Property Data • In addition to the well-known reactions and compounds, Reaxys is filled with hundreds of different measured properties reported for the compounds • Each property is associated with a reference • Each property has a “cluster” of values, such as measurement temperature, pressure, and solvents, describing the conditions of the measurement • In many cases multiple measurements of a particular value are reported by different authors at different times • A mean, median, and standard deviation can be computed for the value, and each value is associated with a reference • One can use this data, combined with the chemical structures of the compounds, to make structure-based predictive models for these properties • One can then predict the value for new or proposed compounds from their chemical structures
  • 4. Reaxys Property Data is Grouped with Conditions. You can select the measurement conditions relevant to your model.
      • Boiling Point: Boiling Point, °C (BP.BP); Pressure, Torr (BP.P)
      • Refractive Index: Refractive Index (RI.RI); Wavelength, nm (RI.W); Temperature, °C (RI.T)
      • Dielectric Constant: Dielectric Constant (DIC.DIC); Frequency, Hz (DIC.F); Temperature, °C (DIC.T)
      • Electrical Moment: Description (EM.KW); Moment, D (EM.EM); Temperature, °C (EM.T); Method (EM.MET); Solvent (EM.SOL)
      • Enthalpy of Formation: Enthalpy of Formation, J mol⁻¹ (HFOR.HFOR); Temperature, °C (HFOR.T); Pressure, Torr (HFOR.P)
      • Solubility (MCS): Solubility, g L⁻¹ (SLB.SLB); Saturation (SLB.SAT); Temperature, °C (SLB.T); Solvent (SLB.SOL); Ratio of Solvents (SLB.RAT)
  • 5. Model Making Tools • There are several ways to access this data • API (Application Programming Interface) allows direct access • Download a tagged SD file from Reaxys after searching • “Hop in to” links go to the data automatically • The Reaxys API allows direct access to the data • XML-based interface • KNIME and Pipeline Pilot are supported • Queries need to be based on measurement conditions (temperature, solvent) and the nature of the molecules (organic, single-fragment) • Form-based query • “Advanced Query”
  • 6. Solubility Query To Select Data and Molecules
      • SLB.SLB > 0: has a reported solubility
      • Temperature 19-25: temperature range of measurement, °C
      • Solvent 'H2O: solubility in water
      • Number of Fragments = 1: only one contiguous fragment
      • Elements = 'c': contains carbon
      • NOT Chemical Name = '*radical: not a radical
      • Molecular Weight > 40 AND < 1000: molecular weight range
      • Number of Elements < 5: fewer than 5 different elements
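The query criteria on slide 6 can be read as a single filter predicate. The following is a minimal Python sketch of that logic applied to records that have already been exported. The `SolubilityRecord` layout, its field names, and the example values are hypothetical and chosen only for illustration; they are not the Reaxys export schema or the Reaxys query syntax.

```python
from dataclasses import dataclass


@dataclass
class SolubilityRecord:
    # Hypothetical record layout for illustration; not the Reaxys schema.
    name: str
    solubility_g_per_l: float
    temperature_c: float
    solvent: str
    n_fragments: int
    elements: frozenset  # element symbols present in the molecule
    mol_weight: float


def passes_query(rec: SolubilityRecord) -> bool:
    """Apply the slide-6 criteria to one exported record."""
    return (
        rec.solubility_g_per_l > 0                        # has a reported solubility
        and 19 <= rec.temperature_c <= 25                 # measured near room temperature
        and rec.solvent == "H2O"                          # solubility in water
        and rec.n_fragments == 1                          # one contiguous fragment
        and "C" in rec.elements                           # contains carbon
        and not rec.name.lower().endswith("radical")      # not a radical
        and 40 < rec.mol_weight < 1000                    # molecular weight range
        and len(rec.elements) < 5                         # fewer than 5 different elements
    )


# Made-up example records, not Reaxys data.
records = [
    SolubilityRecord("compound A", 1.5, 25.0, "H2O", 1, frozenset({"C", "H", "O"}), 122.0),
    SolubilityRecord("compound B", 0.2, 60.0, "H2O", 1, frozenset({"C", "H", "N"}), 180.0),
    SolubilityRecord("compound C radical", 0.9, 20.0, "H2O", 1, frozenset({"C", "H"}), 91.0),
]
selected = [r.name for r in records if passes_query(r)]
```

Expressing the query as one function keeps the selection rules in one place, so a subset (for example a narrower temperature window) can be explored by changing a single condition.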
  • 7. Reviewing Solubility Data in Reaxys
  • 8. Solubility Sources • Reaxys logS is -3.67
  • 9. Data Processing in KNIME • Combines compounds with solubility measured under the desired conditions • Converts values to molarity by dividing by molecular weight
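The unit conversion on slide 9 is simple enough to show directly. The sketch below assumes the reported solubility is in g/L (as in the SLB.SLB field) and that logS means the base-10 logarithm of the molar solubility in mol/L; the function name `log_s` is an illustrative choice, and the deck does its processing in KNIME rather than Python.

```python
import math


def log_s(solubility_g_per_l: float, mol_weight_g_per_mol: float) -> float:
    """Convert a solubility in g/L to logS (log10 of mol/L)."""
    if solubility_g_per_l <= 0 or mol_weight_g_per_mol <= 0:
        raise ValueError("solubility and molecular weight must be positive")
    molarity = solubility_g_per_l / mol_weight_g_per_mol  # mol/L
    return math.log10(molarity)


# A compound with 1.0 g/L solubility and molecular weight 100 g/mol
# has a molarity of 0.01 mol/L.
example = log_s(1.0, 100.0)
```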
  • 10. Model Making Workflow • Used with data from Reaxys and from the Huuskonen paper • Uses “R” and stepwise multiple regression • Results and error of prediction appear in a spreadsheet
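To make “stepwise multiple regression” concrete, here is a small forward-selection sketch in Python with NumPy. It is not the R workflow used in the deck: the AIC-based stopping rule, the function names, and the synthetic data are all assumptions made for illustration. At each step it adds the descriptor column that lowers the AIC most and stops when no addition helps.

```python
import numpy as np


def fit_rss(X, y, cols):
    """Least-squares fit with an intercept on the chosen columns; returns (coef, RSS)."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return coef, float(resid @ resid)


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * np.log(rss / n) + 2 * k


def forward_stepwise(X, y):
    """Greedy forward selection of descriptor columns by AIC."""
    n, p = X.shape
    selected = []
    _, rss = fit_rss(X, y, selected)
    best = aic(rss, n, 1)  # intercept-only model
    while len(selected) < p:
        candidates = []
        for c in range(p):
            if c in selected:
                continue
            _, rss_c = fit_rss(X, y, selected + [c])
            candidates.append((aic(rss_c, n, len(selected) + 2), c))
        score, col = min(candidates)
        if score >= best:  # no candidate improves the model
            break
        selected.append(col)
        best = score
    coef, _ = fit_rss(X, y, selected)
    return selected, coef


# Synthetic descriptors: only columns 0 and 3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
selected, coef = forward_stepwise(X, y)
```

With real descriptors, `X` would hold one column per computed descriptor (for example the CDK descriptors mentioned on slide 15) and `y` the logS values.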
  • 11. Initial Model and Prediction Result is OK-ish • Full compound set, no further filtering • 3590 compounds • Standard error of prediction 1.1 log units • Not spectacular, but useful • Training set covers a larger range of diversity than used in most models • r² 0.56 [Plot: predicted logS vs. experimental logS]
  • 12. Reaxys Solubility Model 2: Filtering of Source Compounds • 2785 compounds remain after filtering • Residual standard error: 0.6932 on 2697 degrees of freedom • Multiple R-squared: 0.8099, Adjusted R-squared: 0.8037 • F-statistic: 132 on 87 and 2697 DF, p-value: < 2.2e-16 • Examples of filtered compounds (structures shown on the slide) • Model is better, but does not improve prediction of the Huuskonen data set [Plot: predicted logS vs. experimental logS]
  • 13. Comparison with Other Reports • Clark: fragment-based solubility model, r² 0.73, SE 0.89, using the “PHYSPROP” data set • Reference: Matthew Clark, “Generalized Fragment-Substructure Based Property Prediction Method”, J. Chem. Inf. Model., 2005, 45 (1), pp 30–38. DOI: 10.1021/ci049744c
  • 14. Comparison with Other Data Sets • Defined a training set of compounds and solubilities, plus test sets, that have been used for several comparative studies
  • 15. Huuskonen Molecule/Data Set Models (No Reaxys Data) • Models made with Huuskonen structures and data, using CDK descriptors and an R model • Using the published training and test sets • Models are not as good as in the publication; the author used a different descriptor computation and statistical method • Standard error 0.67 log units • Panels: Training Set, Test Set 1, Test Set 2 • Fits in slide order: y = 0.961x, R² = 0.8832; y = 0.9452x, R² = 0.8598; y = 0.9912x, R² = 0.7857 [Plots: predicted logS vs. experimental logS]
  • 16. Huuskonen Molecule Sets: Predicted with Model Created from Reaxys Data Set • Same molecule sets, with the model trained on the Reaxys training set • Standard error 0.98 log units, not bad • Fits in slide order: y = 0.8824x, R² = 0.6522; y = 0.8834x, R² = 0.6889; y = 0.8741x, R² = 0.7968 [Plots: predicted logS vs. experimental logS]
  • 17. Reaxys Molecule Set Predicted with Model Created from Huuskonen Data Set: Not Very Good • Standard error 3.5 log units • The issue is likely that many molecules from Reaxys are “outside” the structural diversity of the Huuskonen data set • This illustrates a significant issue with modeling: predictions are generally best when the molecules are similar to the training set • Fit: y = 0.6596x - 1.0645, R² = 0.1459 [Plot: predicted logS vs. experimental logS]
  • 18. Does Reaxys Give The Same Solubility Values as Huuskonen Data Set? Yes. • Only a subset of the solubilities in the Huuskonen set are found in Reaxys • Differences are generally due to multiple measurements being reported, some of them outliers • Fit: y = 1.0082x - 0.0367, R² = 0.9607 [Plot: Reaxys logS vs. Huuskonen logS]
  • 19. Reaxys Solubility Data Set is Structurally More Diverse • Similarity matrix of each data set computed using fingerprints and Tanimoto similarity • Molecules in the Huuskonen set are more similar to each other than those in the Reaxys set • Reaxys has a higher proportion of molecules not similar to others in the set • Normalized for different data set sizes [Plot: normalized fraction of pair-similarity count vs. similarity value, for Huuskonen and Reaxys]
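The diversity comparison on slide 19 can be sketched as follows. This is an illustrative reconstruction, not the code behind the slide: fingerprints are represented as Python sets of “on” bit positions, the bin count is arbitrary, and the toy fingerprints are made up. Dividing by the number of pairs is one way to normalize for different data set sizes.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity: shared on-bits divided by total distinct on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_histogram(fingerprints, n_bins=10):
    """Fraction of molecule pairs falling in each similarity bin over [0, 1]."""
    counts = [0] * n_bins
    n_pairs = 0
    for a, b in combinations(fingerprints, 2):
        s = tanimoto(a, b)
        idx = min(int(s * n_bins), n_bins - 1)  # put s == 1.0 in the last bin
        counts[idx] += 1
        n_pairs += 1
    if n_pairs == 0:
        return [0.0] * n_bins
    return [c / n_pairs for c in counts]


# Toy fingerprints (sets of on-bit indices), not real molecules.
fps = [{1, 2, 3}, {2, 3, 4}, {10, 11}]
hist = similarity_histogram(fps)
```

A data set whose histogram has more weight in the low-similarity bins, as the slide reports for Reaxys, contains more molecules that are unlike the rest of the set.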
  • 20. What We Learned • Reaxys has solubility data that can be used to create and study predictive models • It appears to have more diverse data than the well-studied “Huuskonen” data set • The nature and diversity of the training set is very important • The best reported models have the smallest training sets • However, these training sets may not be useful for predicting more diverse compounds • Predictions on the Reaxys set from a model trained on the Huuskonen set are poor • Generally, good models can predict with a standard error of about 1 log unit for compounds similar to the training set • Question: what is the accuracy of measurement? • $\partial \log S / \partial(\mathrm{g\,L^{-1}}) = 1 / (2.303 \cdot \mathrm{g\,L^{-1}})$, so logS changes by about 0.4 log units per mg/L at a solubility of 1 mg/L • Reaxys has a diverse set of structures and solubilities • Each individual measurement is referenced • Good source for model making
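The sensitivity estimate on slide 20 can be checked numerically. The sketch below assumes the solubility is expressed in g/L and uses the derivative of a base-10 logarithm, 1 / (ln 10 × value), with ln 10 ≈ 2.303; the function name is an illustrative choice.

```python
import math


def dlogs_dconc(conc_g_per_l: float) -> float:
    """Derivative of log10(concentration) with respect to concentration in g/L."""
    return 1.0 / (math.log(10) * conc_g_per_l)


conc = 0.001                        # 1 mg/L expressed in g/L
slope_per_g = dlogs_dconc(conc)     # log units per g/L
slope_per_mg = slope_per_g * 0.001  # log units per mg/L

# Actual shift if the measured value is off by +0.1 mg/L at 1 mg/L.
shift = math.log10(0.0011) - math.log10(0.001)
```

The slope is a local estimate: it describes small measurement errors well, while an error comparable to the solubility itself shifts logS by less than the linear estimate suggests.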
  • 21. Conclusion • Reaxys is a rich source of data for solubility and other properties • One can explore many subsets based on condition, molecule class, etc. • High diversity of molecules: organic, inorganic, peptides, etc. • Reaxys is a good source of data for making predictive models • It provides not just the value, but also the measurement conditions • Selection of “good” measurements is an important factor in making models • Reaxys contains hundreds of measured properties • Solubility is well studied • Fewer models are available for refractive index, magnetic susceptibility, etc. • Reaxys has only measured solubilities; SciFinder has predicted values • This presentation shows the effect of the training set on model quality • Reaxys Medicinal Chemistry contains thousands of bioassay results on thousands of targets that can be used for predictive models