Uncertainty in QSAR Predictions –
Bayesian Inference and the Magic of
Bootstrap
Ullrika Sahlin PhD
Centre for Environmental and
Climate Research (CEC)
QSAR integrated assessment
[Diagram: Input 1, Input 2 and Input 3 (QSAR prediction, QSAR prediction, experimental value) feed an assessment model that leads to a decision node]
Uncertainty in hazard assessment –
does it matter?
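[Figure: classification of the 386 compounds at increasing levels of hazard assessment (HA)]
0. No HA: 386 of unknown toxicity (?)
1. QSAR predictions without uncertainty: 281 not toxic*, 105 very toxic
2. Median toxicity: 265 not toxic* (+16 very toxic)
3. Expected toxicity: 262 not toxic* (+3 very toxic)
4. Conservative value of toxicity: 153 not toxic* (+109 very toxic)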
Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions
in Hazard and Risk Assessments. ATLA
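As a rough illustration of why the chosen summary matters, the sketch below derives a median, an expected and a conservative toxicity value from one predictive distribution. It assumes (these are not stated on the slide) that the prediction is Gaussian on the log10 EC50 scale and that "conservative" means the 5th percentile of EC50; `toxicity_summaries` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def toxicity_summaries(mu, sigma, percentile=0.05):
    """Summaries of EC50 when log10(EC50) ~ Normal(mu, sigma) (an assumption)."""
    median = 10 ** mu
    # Mean of a log-normal variable expressed in base 10.
    expected = 10 ** (mu + math.log(10) * sigma ** 2 / 2)
    # A low EC50 means high toxicity, so a low percentile is the cautious choice.
    conservative = 10 ** NormalDist(mu, sigma).inv_cdf(percentile)
    return median, expected, conservative

median, expected, conservative = toxicity_summaries(mu=-6.0, sigma=0.5)
print(median, expected, conservative)
```

The wider the predictive distribution, the further the conservative value falls below the median, which is one way a compound can move from "not toxic" to "very toxic" once uncertainty is considered.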
QSAR integrated hazard assessment
and the applicability domain (AD) problem
[Figure: Predicted No Effect Concentration of 386 triazoles; molecular weight against log min{EC50}. Annotations: relative toxicity potential; low confidence in prediction]
Modes of statistical inference
• Parametric inference
– Explain
– Hypothesis-driven
• Predictive inference
– Predict to support decision making
– Generate hypothesis
• Evidence synthesis
– Consider quality
Geisser. Introduction to predictive inference 1993. Sutton and Abrams 2001. Bayesian
methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research.
To predict…
• is to make a statement of something we have not yet observed
• is always made with uncertainty
• is made using at least one model
How can I…
• Assess uncertainty in a prediction?
• Take my judgement of confidence in the
model into account?
• Validate the assessment?
Principle for QSAR modelling
Principle to judge confidence in predictions
Principle to assess uncertainty
Uncertainty in a prediction
Predictive error: discrepancy between model and reality
Predictive reliability: our confidence in using a model to predict what we want to predict
[Figure: predictive mean against hat value; logEC50 against nC]
Different kinds of errors
[Figure: predicted y against nC]
Predictive reliability
[Figure: prediction against distance from model (logarithmic distance axis)]
Different measures of predictive
reliability
• Similarity to points in the training data set
• Distance from the centre of training data
• Density of training data around the item to be
predicted
• Sensitivity analysis e.g. standard deviation in
perturbed predictions
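Two of the measures listed above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance in an unscaled descriptor space; the function names and the toy data are hypothetical, and real QSAR descriptors would normally be scaled first.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def centroid(X):
    """Column-wise mean of the training descriptors."""
    n = len(X)
    return [sum(row[i] for row in X) / n for i in range(len(X[0]))]

def distance_from_centre(X_train, x_query):
    """Distance from the centre of the training data."""
    return euclidean(centroid(X_train), x_query)

def knn_mean_distance(X_train, x_query, k=3):
    """Mean distance to the k nearest training points (a crude density proxy)."""
    distances = sorted(euclidean(row, x_query) for row in X_train)
    return sum(distances[:k]) / k

X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
inside, outside = [0.5, 0.5], [5.0, 5.0]
print(distance_from_centre(X_train, inside), distance_from_centre(X_train, outside))
print(knn_mean_distance(X_train, inside), knn_mean_distance(X_train, outside))
```

Larger values indicate a query compound further from the training data, and therefore lower predictive reliability under these measures.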
Predictive error of a regression
Predictive distribution
p(Y < y | X, θ)
Use likelihood to compare!
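Comparing assessments by likelihood can be sketched as follows, assuming each prediction comes with a Gaussian predictive distribution (mean and standard deviation). The function names and the numbers are illustrative only, not taken from the talk.

```python
import math

def gaussian_logpdf(y, mean, sd):
    """Log density of Normal(mean, sd) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y - mean) ** 2 / (2 * sd ** 2)

def log_likelihood_score(y_obs, means, sds):
    """Sum of log predictive densities at the observed values (higher is better)."""
    return sum(gaussian_logpdf(y, m, s) for y, m, s in zip(y_obs, means, sds))

y_obs = [0.1, -0.4, 2.5]
means = [0.0, 0.0, 0.0]
score_overconfident = log_likelihood_score(y_obs, means, [0.1, 0.1, 0.1])
score_honest = log_likelihood_score(y_obs, means, [1.0, 1.0, 1.0])
print(score_overconfident, score_honest)
```

The point predictions are identical in both cases; only the stated uncertainty differs. The overconfident assessment is heavily penalised by the one observation that falls far from its prediction.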
Different ways to assess
Assessment of predictive distribution
• Frequentist framework
– Frequentist analytical
– Sampling "external data"
– Re-sampling: jackknifing "without replacement", bootstrapping "with replacement"
• Bayesian framework
– Bayesian analytical
– Bayesian sampling
1. Bayesian modelling
• Model parameters are
uncertain
• Uncertainty is described by
probability
• Prior information is
subjective
• Data enters through
Bayesian updating
[Figure: MCMC sampling; parameter 2 against parameter 1]
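Bayesian updating can be illustrated with the simplest conjugate case: a Gaussian prior on an unknown mean, Gaussian noise with known variance. This is a toy sketch with hypothetical names, not the Bayesian lasso or MCMC sampling used for the QSAR model in the talk.

```python
def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance of an unknown mean (known noise variance)."""
    n = len(data)
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

def predictive_variance(post_var, noise_var):
    """Variance of a new observation: parameter uncertainty plus noise."""
    return post_var + noise_var

post_mean, post_var = update_normal_mean(0.0, 1.0, [1.0, 1.0, 1.0, 1.0], 1.0)
pred_var = predictive_variance(post_var, 1.0)
print(post_mean, post_var, pred_var)
```

With four observations the posterior mean is pulled most of the way from the prior (0) towards the data (1), and the predictive variance remains larger than the noise variance because the parameter is still uncertain.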
1. Bayesian modelling
Pros
• Uncertainty is measured by
probability
• Links to decision theory
• Motivated under small data
Cons
• Treatment of high-
dimensional descriptor
space?
• Limitation to specific
models?
• Re-modelling of QSARs
needed
Validation
Fathead Minnow QSARdata R-package
Park and Casella (2008) Journal of the American Statistical
Association, Gramacy and Pantaleo (2010) Bayesian Analysis.
[Figure: predicted against observed. Training data: R2_Blasso = 0.79. Test data: R2_Blasso = 0.75]
Validation
Empirical coverage
[Figure: hit rate against confidence, for training data and for test data]
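Empirical coverage can be computed as the share of observations that fall inside the central predictive interval at a given confidence level. The sketch below assumes Gaussian predictive distributions; the function name and data are illustrative.

```python
from statistics import NormalDist

def empirical_coverage(y_obs, means, sds, level):
    """Hit rate of central predictive intervals at a confidence level in (0, 1)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = sum(abs(y - m) <= z * s for y, m, s in zip(y_obs, means, sds))
    return hits / len(y_obs)

y_obs = [0.0, 0.5, -1.0, 2.5]
means = [0.0, 0.0, 0.0, 0.0]
sds = [1.0, 1.0, 1.0, 1.0]
for level in (0.5, 0.95):
    print(level, empirical_coverage(y_obs, means, sds, level))
```

Plotting the hit rate against the confidence level gives the coverage curve; an honest assessment lies close to the 1:1 line.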
2. Bootstrap sampling
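A bootstrap assessment of a predictive distribution might look like the sketch below. It assumes a simple one-descriptor linear model, resamples training pairs with replacement to capture parameter uncertainty, and adds a resampled residual to represent noise. All names and data are hypothetical, and this is one of several possible bootstrap schemes rather than the one used in the talk.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def bootstrap_predictive_draws(xs, ys, x_query, n_boot=1000, seed=1):
    """Draws from an approximate predictive distribution at x_query."""
    rng = random.Random(seed)
    n = len(xs)
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    draws = []
    while len(draws) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if len({xs[i] for i in idx}) < 2:
            continue  # a line cannot be fitted to a single distinct x value
        a_b, b_b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        draws.append(a_b + b_b * x_query + rng.choice(residuals))
    return draws

xs = [float(i) for i in range(1, 11)]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 19.9]
draws = sorted(bootstrap_predictive_draws(xs, ys, x_query=5.5))
lower, upper = draws[25], draws[974]  # approximate central 95 % interval
print(lower, upper)
```

The sorted draws can be used directly for intervals, coverage checks or as input to a downstream assessment model.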
3. Assessment considering judgment in
predictive reliability
Inspired by Denham 1997 and Clark 2009
Type of distribution: Gaussian
Mean: Point prediction $y_q$
Variance: Local Predictive Error Sum of Squares divided by denominator
Observed prediction errors: $y_j - \hat{y}_j$
Measure of predictive reliability
Sampling from distribution of modified residuals
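$$\mathrm{PRESS.W}_q = \frac{\sum_{j=1}^{n} w_{q,j}\,(y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} w_{q,j}}$$

$$\mathrm{PRESS.kNN}_q = \sum_{j \in \mathrm{kNN}(w_{q,j})} (y_j - \hat{y}_j)^2$$

$$\mathrm{PRESS} = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$$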
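The local error estimates can be sketched as below: prediction errors from the training data are weighted by how similar each training compound is to the query compound q. The weights here are arbitrary illustrative values, and dividing the kNN sum by k is an assumption about the "denominator" mentioned on the slide.

```python
def press_w(residuals, weights):
    """Weighted local PRESS: sum_j w_j * e_j^2 / sum_j w_j."""
    return sum(w * e * e for w, e in zip(weights, residuals)) / sum(weights)

def press_knn(residuals, weights, k):
    """Squared errors of the k highest-weighted training items, divided by k
    (the divisor k is an assumption, not confirmed by the slides)."""
    nearest = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:k]
    return sum(residuals[j] ** 2 for j in nearest) / k

# Hypothetical prediction errors y_j - yhat_j and similarity weights w_{q,j}.
residuals = [0.1, -0.2, 1.0, 0.3]
weights = [1.0, 1.0, 0.0, 2.0]

var_w = press_w(residuals, weights)
var_knn = press_knn(residuals, weights, k=2)
var_equal = press_w(residuals, [1.0] * len(residuals))
print(var_w, var_knn, var_equal)
```

Either value can serve as the variance of a Gaussian predictive distribution centred on the point prediction. With equal weights the estimate reduces to the ordinary mean squared prediction error, which corresponds to the "equal" baseline in the comparison plots.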
Validate the assessment
[Figure: evaluation on external data; log likelihood score of the assessment of predictive error for: equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage, kNN ADdens]
[Figure: empirical coverage (external data); hit rate against confidence level with 1:1 line, for the same seven assessments]
So – which approach is the best?
[Figure: predicted against observed. Training data: R2_pls = 0.77, R2_boot = 0.83, R2_Blasso = 0.79. Test data: R2_pls = 0.77, R2_boot = 0.78, R2_Blasso = 0.75]
[Figure: hit rate against confidence with 1:1 line. Training data: Blasso, Bootstrap, kNN leverage, equal. Test data: Blasso, Bootstrap, W euclidean, equal]
[Figure: evaluation on training data; log likelihood score of the assessment of predictive error for: Blasso, Bootstrap, kNN leverage, equal]
Take home messages
• A prediction is complete when given with
uncertainty specified by probability
• Assessment of uncertainty needs to be both
theoretically motivated and proved honest in
empirical evaluation of performance measures
• Three useful approaches are to assess uncertainty
through modelling (Bayesian), sampling (e.g.
bootstrapping), or post-modelling of predictive
error
• Use appropriate measures to validate the
assessment of uncertainty
Thank you for your attention
Drive safely in the statistical jungle!

Quantiative uncertainty in QSAR predictions - Bayesian predictive inference and the magic of bootstrap

  • 1. Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap Ullrika Sahlin PhD Centre for Environmental and Climate Research (CEC)
  • 2. QSAR integrated assessment Assessment model Input 1 Input 2 Input 3 Decision node QSAR prediction QSAR prediction Experimental value
  • 3. Uncertainty in hazard assessment – does it matter? 4. Conservative value of toxicity 3. Expected toxicity 2. Median toxicity 1. QSAR predictions without uncertainty 0. No HA ?: 386 Not toxic*: 281 265 262 153 +109 +3 +16 Very toxic: 105 Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions in Hazard and Risk Assessments. ATLA
  • 4. QSAR integrated hazard assessment and the AD domain problem -10 -8 -6 -4 0200400600800 Predicted No Effect Concentration of 386 Triazoles log min{EC50} Molecularweight Relative toxicity potential Low confidence in prediction
  • 5. Modes of statistical inference • Parametric inference – Explain – Hypothesis-driven • Predictive inference – Predict to support decision making – Generate hypothesis • Evidence synthesis – Consider quality Geisser. Introduction to predictive inference 1993. Sutton and Abrams 2001. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research.
  • 6. To predict…  is to make a statement of something we have not yet observed  is always made with uncertainty  is made using at least one model
  • 7. How can I… • Assess uncertainty in a prediction? • Take my judgement of confidence in the model into account? • Validate the assessment? Principle for QSAR modelling Principle to judge confidence in predictions Principle to assess uncertainty
  • 8. Uncertainty in a prediction. Predictive error: discrepancy between model and reality. Predictive reliability: our confidence in using a model to predict what we want to predict. [Figures: predictive mean against hat value; logEC50 against nC.]
  • 9. Different kinds of errors. [Figure: predicted y against nC.]
  • 10. Predictive reliability. [Figure: distance from model (logarithmic axis) against prediction.]
  • 11. Different measures of predictive reliability: similarity to points in the training data set; distance from the centre of the training data; density of training data around the item to be predicted; sensitivity analysis, e.g. standard deviation in perturbed predictions.
  • 12. Predictive error of a regression
  • 13. Predictive error of a regression Predictive distribution p(Y < y |X,θ)
  • 14. Predictive error of a regression Predictive distribution p(Y < y |X,θ)
  • 15. Predictive error of a regression Use likelihood to compare!
  • 16. Different ways to assess. Assessment of predictive distribution: frequentist framework (frequentist analytical; sampling "external data"; re-sampling, either jackknifing "without replacement" or bootstrapping "with replacement") and Bayesian framework (Bayesian analytical; Bayesian sampling).
  • 17. I. Bayesian modelling. [The overview of frequentist and Bayesian ways to assess the predictive distribution is shown again.]
  • 18. I. Bayesian modelling. Model parameters are uncertain. Uncertainty is described by probability. Prior information is subjective. Data enters through Bayesian updating. [Figure: MCMC sampling, parameter 2 against parameter 1.]
  • 19. I. Bayesian modelling. Pros: uncertainty is measured by probability; links to decision theory; motivated under small data. Cons: treatment of high-dimensional descriptor space? Limitation to specific models? Re-modelling of QSARs needed.
  • 20. Validation. Fathead Minnow data, QSARdata R-package. Park and Casella (2008) Journal of the American Statistical Association; Gramacy and Pantaleo (2010) Bayesian Analysis. [Figures: predicted against observed. Training data: R2_Blasso = 0.79. Test data: R2_Blasso = 0.75.]
  • 21. Validation: empirical coverage. [Figures: hit rate against confidence, for training data and for test data.]
  • 22. 2. Bootstrap sampling. [The overview of frequentist and Bayesian ways to assess the predictive distribution is shown again.]
  • 23. 3. Assessment considering judgment in predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: point prediction $y_q$. Variance: local Predictive Error Sum of Squares (PRESS) divided by a denominator.
  • 24. 3. Assessment considering judgment in predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: point prediction $y_q$. Variance: local Predictive Error Sum of Squares (PRESS) divided by a denominator. Inputs: observed prediction errors $y_j - \hat{y}_j$ and a measure of predictive reliability. Sampling from the distribution of modified residuals.
  • 25. 3. Assessment considering judgment in predictive reliability. $PRESS = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$. $PRESS.W_q = \frac{\sum_{j=1}^{n} w_{q,j} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} w_{q,j}}$. $PRESS.kNN_q = \sum_{j \in kNN(w_{q,j})} (y_j - \hat{y}_j)^2$. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: point prediction $y_q$. Variance: local Predictive Error Sum of Squares (PRESS) divided by a denominator.
  • 26. Validate the assessment. Evaluation on external data. [Figure: log likelihood score of the assessment of predictive error for the methods equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage and kNN ADdens.] [Figure: empirical coverage (external data), hit rate against confidence level, with a 1:1 reference line and the same seven methods.]
  • 27. So – which approach is the best? [Figures: predicted against observed. Training data: R2_pls = 0.77, R2_boot = 0.83, R2_Blasso = 0.79. Test data: R2_pls = 0.77, R2_boot = 0.78, R2_Blasso = 0.75.]
  • 28. So – which approach is the best? [Figures: hit rate against confidence with a 1:1 reference line. Training data: Blasso, Bootstrap, kNN leverage, equal. Test data: Blasso, Bootstrap, W euclidean, equal.]
  • 29. So – which approach is the best? [Coverage figures as on slide 28.] Evaluation on training data. [Figure: log likelihood score of the assessment of predictive error for Blasso, Bootstrap, kNN leverage and equal.]
  • 30. Take home messages. A prediction is complete when given with uncertainty specified by probability. Assessment of uncertainty needs to be both theoretically motivated and proved honest in empirical evaluation of performance measures. Three useful approaches are to assess uncertainty through modelling (Bayesian), sampling (e.g. bootstrapping), or post-modelling of predictive error. Use appropriate measures to validate the assessment of uncertainty.
  • 31. Thank you for your attention. Drive safely in the statistical jungle!
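
The local PRESS variants on slides 23 to 25 can be sketched in Python roughly as follows. This is a minimal illustration, not the author's implementation. The function names, the exponential similarity kernel with its `bandwidth`, the choice of k as denominator for the kNN variant, and the toy numbers are all assumptions made for the example; the slides only state that the variance is a local PRESS "divided by a denominator".

```python
import numpy as np


def similarity_weights(reliability, query_reliability, bandwidth=0.1):
    """Hypothetical weights w_{q,j}: larger when item j has a reliability
    measure (e.g. leverage) close to that of the query compound q.
    The exponential kernel is an assumption, not taken from the slides."""
    r = np.asarray(reliability, dtype=float)
    return np.exp(-np.abs(r - query_reliability) / bandwidth)


def weighted_press(residuals, weights):
    """PRESS.W_q: weighted sum of squared prediction residuals divided by
    the sum of the weights."""
    e = np.asarray(residuals, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * e**2) / np.sum(w))


def knn_press(residuals, reliability, query_reliability, k):
    """PRESS.kNN_q: sum of squared residuals over the k assessment items
    whose reliability measure is nearest to that of the query."""
    e = np.asarray(residuals, dtype=float)
    d = np.abs(np.asarray(reliability, dtype=float) - query_reliability)
    nearest = np.argsort(d)[:k]
    return float(np.sum(e[nearest] ** 2))


def gaussian_interval(point_prediction, variance, z=1.96):
    """Symmetric interval of a Gaussian predictive distribution centred on
    the point prediction (z = 1.96 gives roughly 95 %)."""
    sd = np.sqrt(variance)
    return point_prediction - z * sd, point_prediction + z * sd


# Toy assessment set: prediction residuals and a reliability measure per item.
residuals = np.array([0.2, -0.3, 0.1, 1.5, -1.2])
reliability = np.array([0.05, 0.08, 0.10, 0.45, 0.50])
query = 0.07  # reliability measure of the query compound (assumed value)

var_w = weighted_press(residuals, similarity_weights(reliability, query))
var_knn = knn_press(residuals, reliability, query, k=3) / 3  # assumed denominator: k
print(gaussian_interval(-1.0, var_w))
print(gaussian_interval(-1.0, var_knn))
```

With equal weights the weighted variant reduces to the ordinary mean squared prediction error, which corresponds to the "equal" null model in the comparison plots.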

Editor's Notes

  1. The number of BTAZ compounds classified as very toxic or as not toxic (including potentially* toxic) under the different treatments of QSAR uncertainty, both in the input and in the output of the assessment. Uncertainty in QSAR predictions is considered in alternatives 2 to 4.
  2. Here is an example from a case study in the EU project CADASTER, where QSAR predictions were used to inform input parameters to an environmental hazard assessment. Assessments of relative toxicity are shown here for 386 triazoles. The larger the red dot, the more toxic the compound. The compounds are plotted against the minimum toxicity value, which is the alternative if uncertainty in QSAR predictions had not been considered, and against molecular weight, which has a high influence on toxicity. If consideration of uncertainty in QSAR predictions had no effect, the compounds would follow a straight line. They do not, which means that uncertainty has an influence (albeit a small one) on the outcome of the assessment. Some of these compounds fell outside the applicability domain of one or several of the QSAR models used in the assessment. These compounds are assessed with lower confidence and are marked with a blue triangle. Why uncertainty analysis: using point estimates of predictions, e.g. best guesses or expectations (the plug-in approach), does not guarantee that the assessment produces the best guess or expected value of the output.
  3. There is always at least one model behind a prediction. It can be a mental model. It can be a mathematically well-defined model consisting of a set of equations. The modelling may involve a statistical model, which is often assumed to hold under certain assumptions. It can be the process of modelling, which more or less transparently describes how the model has been generated, the parameters estimated and the model's performance validated. A reason to dwell upon which models are included is that this sets the possibilities to assess uncertainty in predictions. This year's plethora of prognosticators comes thanks to Paul the octopus, who correctly predicted the outcomes of all seven of Germany's World Cup matches in 2010 in addition to the final between Spain and Holland.
  4. The title here is uncertainty in a prediction. I would like to emphasize that uncertainty differs from prediction to prediction. I need to specify uncertainty in an individual prediction when I use predictive models such as QSARs to inform decision analysis in one way or another. I have noticed that a qualitative judgement of predictive reliability may lead either to the model prediction not being used (but what is the alternative?) or to the model prediction being used with a flag that it may not be good. This has led to the idea of letting the judgement of predictive reliability influence the quantitative part of the uncertainty in a prediction. Information requirements: first we note that uncertainty in a prediction cannot be reported with a model in the same way as general measures of predictive performance are reported; it depends on what is to be predicted. Later I will show how assessments of uncertainty in predictions can be used to evaluate ways to judge confidence in predictions. Note that different uncertainty may be associated with different individual predictions. Error is not equal for all compounds to which a model is applied. This seems rather obvious, but in practice error is often specified as equal for any prediction, while predictive reliability can be very different. Reliability is a qualitative aspect of uncertainty, related to the question: is this a trustworthy piece of information, can I use this prediction in my risk or decision model (and the follow-up question: if I can't, what is the alternative)? Error, being a quantitative characterization of uncertainty, can be dealt with in the risk or decision analysis; it still provides us with an alternative. There is a need to jointly consider error and predictive reliability. Here is a simple model based on one descriptor. The model predicts a line, and predictive error can be assessed. Here I have used a Bayesian model to quantify error in predictions. Error increases the further out of the scatter of data points we are; also, what can we say about items falling outside the scatter of points?
Bayesian modelling. [Description of a Bayesian regression] [Exemplified by the Bayesian Lasso] The spread of the predictive distribution increases with the distance to the training data set (hat value).
  5. So predictive error is characterized by a probability distribution: the predictive distribution. Note that different uncertainty may be associated with different individual predictions. Error is not equal for all compounds to which a model is applied. This seems rather obvious, but in practice error is often specified as equal for any prediction, while predictive reliability can be very different. Reliability is a qualitative aspect of uncertainty, related to the question: is this a trustworthy piece of information, can I use this prediction in my risk or decision model (and the follow-up question: if I can't, what is the alternative)? Error, being a quantitative characterization of uncertainty, can be dealt with in the risk or decision analysis; it still provides us with an alternative. There is a need to consider error and reliability jointly. Here is a simple model based on one descriptor. The model predicts a line, and predictive error can be assessed. Here I have used a Bayesian model to quantify error in predictions. The dashed lines mark the bounds of prediction intervals with 95% confidence of covering the actual value. Error increases the further out of the scatter of data points we are; also, what can we say about items falling outside the scatter of points? The spread of the predictive distribution increases with the distance to the training data set.
  6. Here is another example of distance from the model versus point prediction. This model has a high-dimensional descriptor space, hence the scatter of black dots (the training data) and red crosses (external predictions). Here we clearly see that some compounds become severe extrapolations from the AD when predicted by this model. As an alternative to disregarding these predictions we could ask: yes, these predictions are bad, but how bad, and does it matter for our decision?
  7. Judging the reliability of using a model to predict is based on several criteria. First one can look at general qualitative criteria: whether the compound fulfils certain characteristics that the QSAR is modelling. When the predefined criteria are met, different measures of a model's domain of applicability can be used to evaluate predictive reliability. Figure 0: two-dimensional distance. Figure 1: distance. Figure 2: density (3 dimensions). Figure 3: show in some way.
  8. Predictive distribution. Uncertainty described by a probability distribution. Describes the error with a probability distribution.
  9. If we believe the assessment of uncertainty to be true, we would expect the true value to fall somewhere under the predictive distribution, and more often close to its centre.
  10. Here is an attempt to show an overview of approaches to assess predictive errors (or the predictive distribution). It does not cover all approaches, only the most common, and I am happy to discuss this more with anyone interested. It has two main branches: the frequentist (or classical) statistical framework and the Bayesian framework. I will now pick and demonstrate one example from each of these two branches.
  11. The first is a Bayesian approach to assess uncertainty. Bayesian modelling is from the beginning designed to model uncertainty in parameters using probabilities and is therefore ideal for assessing predictive errors.
  12. Bayesian modelling can quickly be summarized as the activity of modelling where parameters are assigned uncertainty using probabilities. A model consists of a model structure whose parameters are, to begin with, assigned an uncertainty distribution that expresses our prior (that is, before looking at data) understanding of their values and the character of their uncertainty. Data enter through Bayesian updating, and this so-called likelihood principle can be more or less strict. A Bayesian model is usually fitted by Markov chain Monte Carlo sampling, which means that a simulation algorithm explores the distributions of the parameters, concentrating on the regions that are most plausible when the information in the data is considered. Priors tell us where to look and the data tell us what is a good place to be. In the figure we see a simulation which took us to a good spot for the values of two parameters. When the algorithm seems to stay in the same place, we say that it has converged. We then throw away the values from the beginning of the simulation and use the remaining ones (here red dots) to generate predictions from the model. Since the parameters are uncertain, the predictions will also be uncertain and, voilà, we have a predictive distribution. Bayesian modelling is THE framework to quantify uncertainty. It provides uncertainty with a fairly easy interpretation, i.e. our uncertainty in a value stemming from our expert knowledge and justified by information in empirical observations. At least in theory it is … Gaussian processes can deal with high-dimensional descriptor spaces, but the mechanistic understanding of the model is lost.
  13. The advantages of Bayesian modelling are the following. It results in uncertainty being assessed by a probability distribution. Its interpretation of uncertainty links directly to a decision-theoretic framework, which is useful when optimising testing strategies for experimental design or (as in the applications I have been working with) when QSAR predictions inform input to risk assessment models for chemical regulation. Also, it has a theoretical motivation even under small data sizes (-> Bayesian meta-regression). A problem is that it does not always work in practice. It works best for parametric models, since specifying priors can be difficult if we do not know what is in need of priors. It is not clear how to treat a high-dimensional descriptor space; the selection of descriptors is puzzling me: from where should descriptors be made part of the model? It is limited to Bayesian modelling. Finally, it requires QSARs, if they already exist, to be re-modelled as Bayesian models. Should the original set of descriptors be considered, or the final selection? Parameter values differ: from a point estimate with estimated variance in a frequentist framework to a posterior distribution that depends on the choice of priors in a Bayesian framework. Is it the same QSAR?
  14. Let us look at the overview of methods to assess predictive distributions. From the frequentist branch of the tree I consider re-sampling. Re-sampling, since we often have a limitation of data. Re-sampling with replacement, which means that the same data point can be drawn several times. This can create inbreeding, i.e. some results appear that are an artefact of the particular data, and one has to be cautious under small sample sizes. A recipe for bootstrapping can simply be as follows. Specify a quantity whose uncertainty we are interested in. It can be a test statistic, an estimated parameter value or a predictive error (i.e. the discrepancy between a prediction and reality). Then specify how to derive this quantity from the observations that we have. Then repeatedly sample from the observations and let the quantity be derived many times. This results in a distribution for the quantity which expresses its uncertainty. Bootstrapping occurs when we allow observations to be sampled several times. A classical application is to fit a model to data, generate predictions and derive residuals, sample from the distribution of residuals to generate new data, fit a new model and save the estimated parameters. Repeat this several times. What we get is something similar to the Bayesian model, with uncertainty in the parameters which results in uncertainty in the predictions. The interpretation of uncertainty is different, though. The use of bootstrap solves some of the problems with Bayesian modelling. I will not show any results from bootstrapping here. I will quickly turn to my third approach to assess uncertainty in a prediction, which is the approach that does not refit the underlying QSAR model but uses the notion of predictive reliability in the assessment of predictive errors.
  15. Given are observations of prediction error, i.e. the difference between a model prediction for a compound not part of the training data set and the actual value. For each observation we know the corresponding measure of predictive reliability. Using one of the PRESSes described in the previous slide, we can derive the local PRESS for a certain query compound by comparing its predictive reliability to those of the assessment data set. The general algorithm to assess predictive uncertainty samples from the distribution of so-called modified residuals. A modified residual is found by dividing prediction residuals from an assessment data set, $y_j - \hat{y}_{-j}$, by each item's specific standard error $SDEP_j$. If the standard error is properly estimated, and if we assume observed and not yet observed compounds to be exchangeable, the sample of modified residuals provides input for the predictive distribution of individual predictions of new compounds. In this way we do not have to specify what to divide the PRESS value by for the PRESS to be a variance of the predictive distribution. This assessment is quick to run. What takes time is to derive the measures of predictive reliability and perhaps LOO prediction errors for a training data set (if no external data set is used). It also becomes necessary to ask what measure of predictive reliability to use. In the beginning I mentioned four different kinds: similarity in descriptor space, distance to the centre of the AD, density of the AD close to the predicted compound, and sensitivity analysis, which can be the standard deviation in a prediction when a model is generated several times with a different outcome every time. The nice thing about having a predictive distribution is that we can actually validate how good both the model and the uncertainty in its predictions are. It is very common to compare measures of predictive reliability through the correlation between observed errors and the measure, but we know that errors can be both small and large at the same time; they are drawn from a distribution.
We know that uncertainty in predictions may vary from compound to compound, but since we have assessed individual uncertainty in predictions, we can easily place each prediction under its corresponding predictive distribution.
  17. This approach aims to model the error directly, based on the judgement of predictive reliability. For this I need a model for the predictive distribution. Still working with regressions, the predictive distribution is assigned to be Gaussian (a bell-shaped distribution, symmetric around its mean). The mean value is the point prediction from the QSAR model. Information on predictive error is then contained in the variance of this predictive distribution. I let the variance be assessed by a local PRESS divided by a denominator. A reason for this choice is that it should be easy to apply and, at best, to document with models. The Gaussian distribution is a simplification. Other distribution types, even non-parametric, could have been chosen. Also, the use of both Bayesian and bootstrap methods as shown before requires running code (which is possible but perhaps not always appreciated). PRESS is a commonly reported performance measure of QSARs, so why not use it as a null model for the assessment of predictive uncertainty? This means that the null model states that all predictive errors are equal and can be derived from the PRESS value. We have tried two variants of local PRESS. The first is a weighted PRESS, which weights according to a measure of similarity between the predictive reliability of the query compound and that of the compounds for which I have observed prediction errors. The weight is constructed such that observed errors with relatively more similar predictive reliability are given higher influence on the assessed variance. As a consequence, the variance for a compound that lies in the centre of the AD is mostly based upon errors observed for compounds in the centre, and vice versa. A more direct variant of this theme is to let the PRESS value be more local by summing over the k nearest neighbours, where what is near is judged by similarity in predictive reliability.
A problem with sampling-based approaches is that the error in the outskirts of the AD is less reliably assessed, since by definition there are fewer values there, and we do not, as in the Bayesian case, provide any other information. Thus, the locally assessed predictive error can be seen as a conditional predictive error, i.e. the expected error given a compound's position in the domain of applicability or prior information on uncertainty.
  18. Here are two ways to validate assessments of uncertainty using an external data set (at best not part of the modelling leading to the assessments). First we have summed the log-likelihood values for each point in the external data set. A high score means a better (well-balanced) assessment of uncertainty. It means that most compounds fell inside the predictive distribution and few were very far out. I have noticed that the likelihood score can be a bit tricky sometimes, and I always prefer to also look at the graphical display of empirical coverage. Empirical coverage plots are generated by counting, for different confidence levels, the proportion of compounds in the data set that fell inside their corresponding prediction intervals. A good and well-balanced assessment should generate a straight one-to-one line. It is important to keep in mind that the underlying QSAR model should be properly validated before doing this exercise.
  19. Bayesian vs bootstrap. Log-likelihood and coverage: while the likelihood provides a relative comparison, the empirical coverage provides an evaluation that stands for itself. This is because we use the uncertainty in predictions as a probabilistically formulated hypothesis about the observed value in the external data set.
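
The two validation measures described in the notes, the summed log-likelihood score and the empirical coverage, can be sketched as below. This is a hedged illustration that assumes every prediction comes with a Gaussian predictive distribution (a mean and a standard deviation per compound). The function names and the simulated data are hypothetical and not taken from the talk.

```python
import numpy as np
from statistics import NormalDist


def log_likelihood_score(y_obs, pred_mean, pred_sd):
    """Sum of the Gaussian log densities of the observed values under their
    individual predictive distributions (higher is better)."""
    y, m, s = (np.asarray(a, dtype=float) for a in (y_obs, pred_mean, pred_sd))
    z = (y - m) / s
    return float(np.sum(-0.5 * np.log(2 * np.pi) - np.log(s) - 0.5 * z**2))


def empirical_coverage(y_obs, pred_mean, pred_sd, levels):
    """For each confidence level (0 <= level < 1), the proportion of observed
    values that fall inside their central prediction interval."""
    y, m, s = (np.asarray(a, dtype=float) for a in (y_obs, pred_mean, pred_sd))
    z = np.abs((y - m) / s)
    hit_rates = []
    for level in levels:
        half_width = NormalDist().inv_cdf(0.5 + level / 2)  # in units of sd
        hit_rates.append(float(np.mean(z <= half_width)))
    return np.array(hit_rates)


# Simulated external data set (assumption): errors really are Gaussian with
# the stated standard deviations, so coverage should follow the 1:1 line.
rng = np.random.default_rng(1)
pred_mean = rng.normal(size=500)
pred_sd = rng.uniform(0.3, 1.0, size=500)
y_obs = pred_mean + pred_sd * rng.normal(size=500)

levels = np.linspace(0.1, 0.9, 9)
print(np.round(empirical_coverage(y_obs, pred_mean, pred_sd, levels), 2))
print(log_likelihood_score(y_obs, pred_mean, pred_sd))
# An overconfident assessment (standard deviations halved) scores worse.
print(log_likelihood_score(y_obs, pred_mean, pred_sd / 2))
```

Plotting the hit rates against `levels` gives the coverage plot; points below the 1:1 line indicate intervals that are too narrow, points above it intervals that are too wide.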