Data-analytic sins in property-based
molecular design
Peter Kenny
pwk.pub.2008@gmail.com | http://fbdd-lit.blogspot.com
Target engagement potential (TEP)
$$\mathrm{TEP} = \frac{[\mathrm{Drug}(\mathbf{X},t)]_{\mathrm{free}}}{K_d}$$
A basis for molecular design?
Property-based design as search for ‘sweet spot’
Correlation
• Strong correlation implies good predictivity
– I have observed a correlation so you must use my rule
• Multivariate data analysis (e.g. PCA) usually involves
transformation to orthogonal basis
• Applying cutoffs (e.g. MW restriction) to data can
distort correlations
• Noise and range limits in data
Quantifying strengths of relationships between
continuous variables
• Correlation measures
– Pearson product-moment correlation coefficient (R)
– Spearman's rank correlation coefficient (ρ)
– Kendall rank correlation coefficient (τ)
• Quality of fit measures
– Coefficient of determination (R²) is the fraction of the
variance in Y that is explained by the model
– Root mean square error (RMSE)
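The measures listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the five data points are invented, Kendall's coefficient is computed as tau-a (no tie correction), R² and RMSE refer to an ordinary least-squares line, and RMSE divides by N rather than by the residual degrees of freedom.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient R."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def r2_and_rmse(x, y):
    """R^2 and RMSE for a least-squares straight line fitted to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)

# Invented example data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r2, rmse = r2_and_rmse(x, y)
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y), r2, rmse)
```

The correlation measures are dimensionless, whereas RMSE is in the units of Y, which is why the two kinds of measure answer different questions.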
Preparation of synthetic data sets
Kenny & Montanari (2013) JCAMD 27:1-13 DOI
Add Gaussian noise
(SD=10) to Y
Correlation inflation by hiding variation
See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI
Leeson & Springthorpe (2007) NRDD 6:881-890 DOI
Data is naturally binned (X is an integer) and mean value of Y is calculated for each
value of X. In some studies, averaged data is only presented graphically and it is left to
the reader to judge the strength of the correlation.
[Figure: six plot panels labeled R = 0.34, R = 0.30 and R = 0.31 (first row) and R = 0.67, R = 0.93 and R = 0.996 (second row)]
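The effect of averaging naturally binned data can be reproduced with a small simulation. The sketch below is not the data from the cited studies: the integer X range (0 to 9), the unit slope, the 200 points per bin and the random seed are all assumptions chosen for illustration, with Gaussian noise of SD = 10 added to Y.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation coefficient R."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # arbitrary seed, used only to make the run repeatable

# Hypothetical synthetic set: integer X in 0..9, Y = X + Gaussian noise (SD = 10).
n_per_bin = 200
xs, ys = [], []
for x in range(10):
    for _ in range(n_per_bin):
        xs.append(x)
        ys.append(x + random.gauss(0.0, 10.0))

# Correlation for the individual data points.
r_raw = pearson(xs, ys)

# Mean Y for each value of X; the within-bin variation is now hidden.
bin_x = sorted(set(xs))
bin_mean_y = [
    sum(y for x, y in zip(xs, ys) if x == b) / n_per_bin for b in bin_x
]
r_means = pearson(bin_x, bin_mean_y)

print(f"R for individual data points: {r_raw:.3f}")
print(f"R for bin means:              {r_means:.3f}")
```

The underlying relationship between X and Y is the same in both calculations; only the amount of variation shown to the reader differs.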
N      R (95% CI)               ρ (P)               τ (P)
1202   0.247 (0.193 | 0.299)    0.215 (< 0.0001)    0.148 (< 0.0001)
8      0.972 (0.846 | 0.995)    0.970 (< 0.0001)    0.909 (0.0018)
Correlation Inflation in Flatland
See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI
Masking variation with standard error
See Gleeson (2008) JMC 51:817-834 DOI
Partition by value of X into 4 bins with equal numbers of data points and display 95%
confidence interval for mean (green) and mean ± SD (blue) for each bin.
[Figure: three plot panels labeled R = 0.12, R = 0.29 and R = 0.28]
N      Bins   Degrees of freedom   F        P
40     4      3                    0.2596   0.8540
400    4      3                    12.855   < 0.0001
4000   4      3                    115.35   < 0.0001
4000   2      1                    270.91   < 0.0001
4000   8      7                    50.075   < 0.0001
“In each plot provided, the width of the errors bars and the difference in the mean
values of the different categories are indicative of the strength of the relationship
between the parameters.” Gleeson (2008) JMC 51:817-834 DOI
The error of standard error
ANOVA for binned data sets
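Why error bars based on the standard error say little about the strength of a relationship can be seen by drawing samples of different size from one and the same distribution. In this sketch the distribution (Gaussian, SD = 10), the sample sizes and the seed are assumptions chosen for illustration: the standard deviation stays close to 10 whatever the sample size, while the standard error of the mean shrinks roughly as 1/sqrt(N).

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, used only to make the run repeatable

def spread_summary(n, sd_true=10.0):
    """Return (sample SD, standard error of the mean) for n Gaussian draws."""
    sample = [random.gauss(0.0, sd_true) for _ in range(n)]
    sd = statistics.stdev(sample)
    sem = sd / math.sqrt(n)
    return sd, sem

# Same underlying spread, three hypothetical sample sizes.
results = {n: spread_summary(n) for n in (40, 400, 4000)}

for n, (sd, sem) in results.items():
    # 1.96 * SEM approximates the half-width of a 95% CI for the mean.
    print(f"N = {n:5d}  SD = {sd:6.2f}  SEM = {sem:5.2f}  95% CI half-width = {1.96 * sem:5.2f}")
```

Narrow error bars here reflect only the number of data points, not any reduction in the variation of Y.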
Know your data
• Assays are typically run in replicate making it possible
to estimate assay variance
• Every assay has a finite dynamic range and it may not
always be obvious what this is for a particular assay
• Dynamic range may have been sacrificed for
throughput but this, by itself, does not make the
assay bad
• We need to be able to analyse in-range and
out-of-range data within a single unified framework
– See Lind (2010) QSAR analysis involving assay results which are only known to
be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
Depicting variation with
percentile plots
This graphical representation of data makes it easy
to visualize variation and can be used with mixed
in-range and out-of-range data. See Colclough et
al (2008) BMCL 16:6611-6616 DOI
Binning continuous data restricts your options for analysis and
places burden of proof on you to show that your conclusions are
independent of the binning scheme. Think before you bin!
Averaging the
binned data was
your idea so don’t
try blaming me this
time!
Correlation inflation: some stuff to think about
• Model continuous data as continuous data
– RMSE is most relevant to prediction but you still need R²
– Fitted parameters may provide insight (e.g. solubility is more sensitive than
potency to lipophilicity)
• When selecting training data think in terms of Design of Experiments
(e.g. evenly spaced values of X)
• Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50)
• Never make statements about the strength of a relationship when
you’ve hidden or masked variation in the data (unless you want a
starring role in Correlation Inflation 2)
• To be meaningful, a measure of the spread of a distribution must be
independent of sample size
• Reviewers/editors, mercilessly purge manuscripts of statements like,
“A negative correlation was observed between X and Y” or “A and B are
correlated/linked”
Ligand efficiency metrics (LEMs) considered harmful
• We use LEMs to normalize activity with respect to risk
factors such as molecular size and lipophilicity
• What do we mean by normalization?
• We make assumptions about underlying relationship
between activity and risk factor(s) when we define an
LEM
• LEM as measure of extent to which activity beats a
trend?
Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
Scale activity/affinity by risk factor
LE = ΔG/HA
Offset activity/affinity by risk factor
LipE = pIC50 − ClogP
Ligand efficiency metrics
No reason that dependence of activity on risk factor should be restricted to
one of these two linear models
Use trend actually observed in data for normalization
rather than some arbitrarily assumed trend
There’s a reason why we say standard free energy
of binding…
DG = DH TDS = RTln(Kd/C0)
• Adoption of 1 M as standard concentration is
arbitrary
• A view of a chemical system that changes with
the choice of standard concentration is
thermodynamically invalid
NHA   Kd/M     C/M    -(1/NHA) log10(Kd/C)
10    10^-3    1      0.30
20    10^-6    1      0.30
30    10^-9    1      0.30
10    10^-3    0.1    0.20
20    10^-6    0.1    0.25
30    10^-9    0.1    0.27
10    10^-3    10     0.40
20    10^-6    10     0.35
30    10^-9    10     0.33
Effect on LE of changing standard concentration
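The tabulated values can be recomputed in a few lines. The sketch assumes the scaled quantity is -(1/NHA) log10(Kd/C), which is the sign convention that reproduces the positive numbers in the table; the function name is hypothetical.

```python
import math

def scaled_affinity(n_heavy, kd_molar, c_std_molar):
    """-(1/NHA) * log10(Kd/C): an LE-like quantity in log units per heavy atom."""
    return -math.log10(kd_molar / c_std_molar) / n_heavy

# (heavy-atom count, Kd in mol/L) for the three compounds in the table.
compounds = [(10, 1e-3), (20, 1e-6), (30, 1e-9)]

for c_std in (1.0, 0.1, 10.0):
    for nha, kd in compounds:
        value = scaled_affinity(nha, kd, c_std)
        print(f"NHA = {nha:2d}  Kd = {kd:.0e} M  C = {c_std:4.1f} M  value = {value:.2f}")
```

With C = 1 M the three compounds appear equally efficient; with C = 0.1 M the largest compound ranks highest, and with C = 10 M the smallest one does, so the ranking depends on an arbitrary choice of unit.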
Scaling transformation of parallel lines by dividing Y by X
(This is how ligand efficiency is calculated)
Size dependency of LE is consequence of non-zero intercept
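The link between a non-zero intercept and size dependency follows from one line of algebra. As a sketch, assume a family of parallel lines with common slope $b$ and intercepts $a_i$ (symbols introduced here for illustration, with $Y$ as affinity and $X$ as molecular size):

```latex
Y_i = a_i + bX
\quad\Longrightarrow\quad
\frac{Y_i}{X} = b + \frac{a_i}{X}
```

The scaled quantity $Y_i/X$ is independent of $X$ only when $a_i = 0$; otherwise it carries a term $a_i/X$ that decays with size, so lines that were parallel before scaling no longer are.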
Affinity plotted against molecular weight for minimal binding
elements against various targets in inhibitor deconstruction
study showing variation in intercept term
Hajduk PJ (2006) J Med Chem 49:6972–6976 DOI
Is it valid to combine results from different assays in LE analysis?
Offsetting transformation of lines with different slope and
common intercept by subtracting X from Y
(This is how lipophilic efficiency is calculated)
Thankfully (hopefully?) nobody has ‘discovered’
lipophilicity-dependent lipophilic efficiency yet
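The offsetting case can be written out in the same way. As a sketch, assume lines with a common intercept $a$ and slopes $b_i$ (symbols introduced here for illustration, with $Y$ as pIC50 and $X$ as the lipophilicity measure):

```latex
Y_i = a + b_i X
\quad\Longrightarrow\quad
Y_i - X = a + (b_i - 1)X
```

The offset quantity $Y_i - X$ is independent of $X$ only when $b_i = 1$; for any other slope it retains a dependence on $X$ with slope $b_i - 1$.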
Linear fit of ΔG for published data set
Mortenson & Murray (2011) JCAMD 25:663-667 DOI
Ligand efficiency, group efficiency and residuals plotted
for published data set
Some more stuff to think about
• Normalize activity using trend actually observed in
data (this means you have to model the data)
• Residuals are invariant with respect to choice in
standard concentration
• Residuals can be used with other functional forms
(e.g. non-linear and multi-linear)
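The invariance of residuals can be checked numerically. In this sketch the five (heavy-atom count, Kd) pairs are invented for illustration, the temperature is assumed to be 298.15 K, LE is taken as -ΔG divided by heavy-atom count, and a least-squares straight line stands in for whatever trend is actually observed in real data.

```python
import math

RT = 8.314462618e-3 * 298.15  # kJ/mol, assuming T = 298.15 K

# Hypothetical (heavy-atom count, Kd in mol/L) pairs, invented for illustration.
DATA = [(10, 1e-3), (15, 2e-5), (20, 5e-6), (25, 1e-7), (30, 3e-9)]

def delta_g(kd, c_std):
    """Free energy of binding, RT ln(Kd/C0), for standard concentration c_std."""
    return RT * math.log(kd / c_std)

def ligand_efficiency(nha, kd, c_std):
    """LE taken here as -dG per heavy atom."""
    return -delta_g(kd, c_std) / nha

def fit_line(x, y):
    """Least-squares slope and intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx

def residuals(data, c_std):
    """Residuals of dG about the fitted line dG = intercept + slope * NHA."""
    x = [nha for nha, _ in data]
    y = [delta_g(kd, c_std) for _, kd in data]
    slope, intercept = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

for c_std in (1.0, 1e-3):
    le = [ligand_efficiency(nha, kd, c_std) for nha, kd in DATA]
    res = residuals(DATA, c_std)
    print(f"C0 = {c_std:g} M")
    print("  LE:       ", [round(v, 3) for v in le])
    print("  residuals:", [round(v, 3) for v in res])
```

Changing the standard concentration shifts every ΔG by the same constant, which the fitted intercept absorbs; the residuals are unchanged while the LE values, and potentially their rank order, are not.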

Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy GirlsCall Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
Call Girls In Andheri East Call 9920874524 Book Hot And Sexy Girls
 
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
Best Rate (Hyderabad) Call Girls Jahanuma ⟟ 8250192130 ⟟ High Class Call Girl...
 
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
Artifacts in Nuclear Medicine with Identifying and resolving artifacts.
 
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
(👑VVIP ISHAAN ) Russian Call Girls Service Navi Mumbai🖕9920874524🖕Independent...
 
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls DelhiRussian Escorts Girls  Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
Russian Escorts Girls Nehru Place ZINATHI 🔝9711199012 ☪ 24/7 Call Girls Delhi
 
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
Call Girls Service Navi Mumbai Samaira 8617697112 Independent Escort Service ...
 

Data-analytic sins in property-based molecular design

  • 1. Data-analytic sins in property-based molecular design Peter Kenny pwk.pub.2008@gmail.com | http://fbdd-lit.blogspot.com
  • 2. Target engagement potential (TEP): $\mathrm{TEP} = [\mathrm{Drug}(X,t)]_{\mathrm{free}} / K_d$. A basis for molecular design?
  • 3. Property-based design as search for ‘sweet spot’
  • 4. Correlation • Strong correlation implies good predictivity – I have observed a correlation so you must use my rule • Multivariate data analysis (e.g. PCA) usually involves transformation to orthogonal basis • Applying cutoffs (e.g. MW restriction) to data can distort correlations • Noise and range limits in data
  • 5. Quantifying strengths of relationships between continuous variables • Correlation measures – Pearson product-moment correlation coefficient (R) – Spearman's rank correlation coefficient (ρ) – Kendall rank correlation coefficient (τ) • Quality of fit measures – Coefficient of determination (R²) is the fraction of the variance in Y that is explained by the model – Root mean square error (RMSE)
  • 6. Preparation of synthetic data sets Kenny & Montanari (2013) JCAMD 27:1-13 DOI Add Gaussian noise (SD=10) to Y
  • 7. Correlation inflation by hiding variation See Hopkins, Mason & Overington (2006) Curr Opin Struct Biol 16:127-136 DOI Leeson & Springthorpe (2007) NRDD 6:881-890 DOI Data is naturally binned (X is an integer) and mean value of Y is calculated for each value of X. In some studies, averaged data is only presented graphically and it is left to the reader to judge the strength of the correlation. R = 0.34 R = 0.30 R = 0.31 R = 0.67 R = 0.93 R = 0.996
  • 8. Correlation Inflation in Flatland. See Lovering, Bikker & Humblet (2009) JMC 52:6752-6756 DOI
        N = 1202: R = 0.247 (95% CI: 0.193 | 0.299); ρ = 0.215 (P < 0.0001); τ = 0.148 (P < 0.0001)
        N = 8: R = 0.972 (95% CI: 0.846 | 0.995); ρ = 0.970 (P < 0.0001); τ = 0.909 (P = 0.0018)
  • 9. Masking variation with standard error See Gleeson (2008) JMC 51:817-834 DOI Partition by value of X into 4 bins with equal numbers of data points and display 95% confidence interval for mean (green) and mean ± SD (blue) for each bin. R = 0.12 R = 0.29 R = 0.28
  • 10. The error of standard error. ANOVA for binned data sets:
        N     Bins  Degrees of Freedom  F       P
        40    4     3                   0.2596  0.8540
        400   4     3                   12.855  < 0.0001
        4000  4     3                   115.35  < 0.0001
        4000  2     1                   270.91  < 0.0001
        4000  8     7                   50.075  < 0.0001
      “In each plot provided, the width of the errors bars and the difference in the mean values of the different categories are indicative of the strength of the relationship between the parameters.” Gleeson (2008) JMC 51:817-834 DOI
  • 11. Know your data • Assays are typically run in replicate, making it possible to estimate assay variance • Every assay has a finite dynamic range and it may not always be obvious what this is for a particular assay • Dynamic range may have been sacrificed for throughput but this, by itself, does not make the assay bad • We need to be able to analyse in-range and out-of-range data within a single unified framework – See Lind (2010) QSAR analysis involving assay results which are only known to be greater than, or less than some cut-off limit. Mol Inf 29:845-852 DOI
  • 12. Depicting variation with percentile plots This graphical representation of data makes it easy to visualize variation and can be used with mixed in-range and out-of-range data. See Colclough et al (2008) BMCL 16:6611-6616 DOI
  • 13. Binning continuous data restricts your options for analysis and places burden of proof on you to show that your conclusions are independent of the binning scheme. Think before you bin! Averaging the binned data was your idea so don’t try blaming me this time!
  • 14. Correlation inflation: some stuff to think about • Model continuous data as continuous data – RMSE is most relevant to prediction but you still need R² – Fitted parameters may provide insight (e.g. solubility is more sensitive than potency to lipophilicity) • When selecting training data think in terms of Design of Experiments (e.g. evenly spaced values of X) • Try to achieve normally distributed Y (e.g. use pIC50 rather than IC50) • Never make statements about the strength of a relationship when you’ve hidden or masked variation in the data (unless you want a starring role in Correlation Inflation 2) • To be meaningful, a measure of the spread of a distribution must be independent of sample size • Reviewers/editors, mercilessly purge manuscripts of statements like, “A negative correlation was observed between X and Y” or “A and B are correlated/linked”
  • 15. Ligand efficiency metrics (LEMs) considered harmful • We use LEMs to normalize activity with respect to risk factors such as molecular size and lipophilicity • What do we mean by normalization? • We make assumptions about underlying relationship between activity and risk factor(s) when we define an LEM • LEM as measure of extent to which activity beats a trend? Kenny, Leitão & Montanari (2014) JCAMD 28:699-710 DOI
  • 16. Ligand efficiency metrics • Scale activity/affinity by risk factor: LE = ΔG/HA • Offset activity/affinity by risk factor: LipE = pIC50 − ClogP • No reason that dependence of activity on risk factor should be restricted to one of these two linear models
  • 17. Use trend actually observed in data for normalization rather than some arbitrarily assumed trend
  • 18. There’s a reason why we say standard free energy of binding… $\Delta G^\circ = \Delta H^\circ - T\Delta S^\circ = RT\ln(K_d/C^\circ)$ • Adoption of 1 M as standard concentration is arbitrary • A view of a chemical system that changes with the choice of standard concentration is thermodynamically invalid
  • 19. NHA Kd/M C/M (1/NHA) log10(Kd/C) 10 10-3 1 0.30 20 10-6 1 0.30 30 10-9 1 0.30 10 10-3 0.1 0.20 20 10-6 0.1 0.25 30 10-9 0.1 0.27 10 10-3 10 0.40 20 10-6 10 0.35 30 10-9 10 0.33 Effect on LE of changing standard concentration
  • 20. Scaling transformation of parallel lines by dividing Y by X (This is how ligand efficiency is calculated) Size dependency of LE is consequence of non-zero intercept
  • 21. Affinity plotted against molecular weight for minimal binding elements against various targets in inhibitor deconstruction study showing variation in intercept term Hajduk PJ (2006) J Med Chem 49:6972–6976 DOI Is it valid to combine results from different assays in LE analysis?
  • 22. Offsetting transformation of lines with different slope and common intercept by subtracting X from Y (This is how lipophilic efficiency is calculated) Thankfully (hopefully?) nobody has ‘discovered’ lipophilicity-dependent lipophilic efficiency yet
  • 23. Linear fit of ΔG for published data set Mortenson & Murray (2011) JCAMD 25:663-667 DOI
  • 24. Ligand efficiency, group efficiency and residuals plotted for published data set
  • 25. Some more stuff to think about • Normalize activity using trend actually observed in data (this means you have to model the data) • Residuals are invariant with respect to choice of standard concentration • Residuals can be used with other functional forms (e.g. non-linear and multi-linear)
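
The correlation inflation described in slides 6 and 7 can be reproduced with a small synthetic data set. The sketch below is illustrative only: the underlying relationship (Y = X), the range of integer X values, the number of points per value and the random seed are all assumptions chosen for the example, and only the Gaussian noise with SD = 10 is taken from the slides. It compares the Pearson R of the raw data with the Pearson R of the per-X mean values.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient for two sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_data(points_per_x=200, x_values=range(11), noise_sd=10.0, seed=0):
    """Integer X with Y = X plus Gaussian noise (hypothetical model, SD as in slide 6)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for x in x_values:
        for _ in range(points_per_x):
            xs.append(x)
            ys.append(x + rng.gauss(0.0, noise_sd))
    return xs, ys


def bin_means(xs, ys):
    """Mean of Y for each distinct value of X (the 'natural binning' of slide 7)."""
    totals = {}
    for x, y in zip(xs, ys):
        total, count = totals.get(x, (0.0, 0))
        totals[x] = (total + y, count + 1)
    keys = sorted(totals)
    return keys, [totals[k][0] / totals[k][1] for k in keys]


xs, ys = make_data()
r_raw = pearson_r(xs, ys)          # correlation with all variation visible
bx, by = bin_means(xs, ys)
r_binned = pearson_r(bx, by)       # correlation after averaging hides variation

print(f"raw data:  N = {len(xs)}, R = {r_raw:.3f}")
print(f"bin means: N = {len(bx)}, R = {r_binned:.3f}")
```

Averaging removes most of the noise from each point but says nothing about how well Y can be predicted for an individual compound, so the second R should not be read as the strength of the relationship in the raw data.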
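
Slides 9, 10 and 14 argue that a measure of spread must not depend on sample size, which is why standard error is a poor way to show variation. A minimal sketch of that point, assuming samples drawn from a single normal distribution with SD = 10 (the sample sizes mirror the N values in slide 10; the distribution and seed are assumptions made for illustration):

```python
import math
import random
import statistics


def spread_summary(n, sd=10.0, seed=0):
    """Return (sample SD, standard error of the mean) for n draws from N(0, sd)."""
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, sd) for _ in range(n)]
    sample_sd = statistics.stdev(sample)
    standard_error = sample_sd / math.sqrt(n)
    return sample_sd, standard_error


# Same underlying distribution, three different sample sizes.
summaries = {n: spread_summary(n) for n in (40, 400, 4000)}

for n, (sample_sd, standard_error) in summaries.items():
    print(f"N = {n:5d}  SD = {sample_sd:6.2f}  SE = {standard_error:6.3f}")
```

The sample SD stays close to the SD of the underlying distribution at every N, whereas the standard error shrinks roughly as 1/sqrt(N). Error bars based on standard error therefore narrow as more data are added, even though the variation in the data is unchanged.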
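
The contrast between slides 19 and 25 can be checked numerically. The sketch below uses the dimensionless form of ligand efficiency from the slide 19 table, −(1/NHA) log10(Kd/C), and compares it with residuals from a straight-line fit of −log10(Kd/C) against NHA. The first three compounds are the ones tabulated in slide 19; the last two are hypothetical values added only so that the fit has non-zero residuals. The helper names are illustrative.

```python
import math

# (number of heavy atoms, Kd in mol/L)
compounds = [
    (10, 1e-3),  # from slide 19
    (20, 1e-6),  # from slide 19
    (30, 1e-9),  # from slide 19
    (15, 1e-5),  # hypothetical, off-trend
    (25, 1e-7),  # hypothetical, off-trend
]


def scaled_affinity(kd, c):
    """-log10(Kd/C) for a chosen standard concentration C (mol/L)."""
    return -math.log10(kd / c)


def ligand_efficiency(nha, kd, c):
    """Affinity scaled by heavy atom count, as in the slide 19 table."""
    return scaled_affinity(kd, c) / nha


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def residuals(c):
    """Residuals of affinity from the trend fitted to the data themselves."""
    xs = [nha for nha, _ in compounds]
    ys = [scaled_affinity(kd, c) for _, kd in compounds]
    slope, intercept = linear_fit(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


for c in (0.1, 1.0, 10.0):
    le = [round(ligand_efficiency(nha, kd, c), 2) for nha, kd in compounds]
    res = [round(r, 2) for r in residuals(c)]
    print(f"C = {c:4} M  LE = {le}  residuals = {res}")
```

Changing C adds the same constant to every affinity value, so only the fitted intercept moves and the residuals are unchanged, while the ranking of compounds by LE depends on the value of C that was chosen.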