
- 1. Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap. Ullrika Sahlin, PhD, Centre for Environmental and Climate Research (CEC)
- 2. QSAR-integrated assessment. [Diagram: an assessment model with Inputs 1–3 feeding a decision node; each input may be a QSAR prediction or an experimental value]
- 3. Uncertainty in hazard assessment – does it matter? Tiers compared: 0. No hazard assessment; 1. QSAR predictions without uncertainty; 2. Median toxicity; 3. Expected toxicity; 4. Conservative value of toxicity. [Chart: ?: 386; Not toxic*: 281, 265, 262, 153; +109, +3, +16; Very toxic: 105]. Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions in Hazard and Risk Assessments. ATLA.
- 4. QSAR-integrated hazard assessment and the applicability domain (AD) problem. [Plot: predicted no-effect concentration of 386 triazoles; log min{EC50} vs molecular weight; relative toxicity potential; low confidence in prediction]
- 5. Modes of statistical inference • Parametric inference – explain; hypothesis-driven • Predictive inference – predict to support decision making; generate hypotheses • Evidence synthesis – consider quality. Geisser 1993. Introduction to Predictive Inference. Sutton and Abrams 2001. Bayesian Methods in Meta-analysis and Evidence Synthesis. Statistical Methods in Medical Research.
- 6. To predict… is to make a statement about something we have not yet observed; a prediction is always made with uncertainty, and it is made using at least one model.
- 7. How can I… • Assess uncertainty in a prediction? • Take my judgement of confidence in the model into account? • Validate the assessment? [Diagram: a principle for QSAR modelling; a principle to judge confidence in predictions; a principle to assess uncertainty]
- 8. Uncertainty in a prediction. Predictive error: the discrepancy between model and reality. Predictive reliability: our confidence in using a model to predict what we want to predict. [Plots: predictive mean vs hat value; logEC50 vs nC]
- 9. Different kinds of errors. [Plot: predicted y vs nC]
- 10. Predictive reliability. [Plot: prediction vs distance from model]
- 11. Different measures of predictive reliability • Similarity to points in the training data set • Distance from the centre of the training data • Density of the training data around the item to be predicted • Sensitivity analysis, e.g. the standard deviation of perturbed predictions
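The distance-, density- and leverage-based measures in the list above can be sketched as follows (an illustrative sketch, not code from the slides; the function name and use of NumPy are my assumptions):

```python
import numpy as np

def reliability_measures(X_train, x_query, k=3):
    """Illustrative sketch (not from the slides): three common
    reliability measures for a query compound's descriptor vector."""
    # Distance from the centre of the training data
    centre = X_train.mean(axis=0)
    dist_centre = float(np.linalg.norm(x_query - centre))
    # Density proxy: mean distance to the k nearest training points
    d = np.linalg.norm(X_train - x_query, axis=1)
    knn_dist = float(np.sort(d)[:k].mean())
    # Leverage (hat value): h_q = x_q' (X'X)^{-1} x_q
    leverage = float(x_query @ np.linalg.inv(X_train.T @ X_train) @ x_query)
    return dist_centre, knn_dist, leverage
```

Large distances, sparse neighbourhoods, or high leverage all flag a prediction as less reliable.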
- 12. Predictive error of a regression
- 13. Predictive error of a regression. Predictive distribution p(Y < y | X, θ)
- 15. Predictive error of a regression. Use the likelihood to compare!
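Comparing assessments by likelihood means scoring each observed test value under the candidate predictive distribution. A minimal Gaussian version (my sketch, not the slides' code):

```python
import math

def gaussian_log_score(y_obs, mean, sd):
    """Log predictive density of one observation under a Gaussian
    predictive distribution; higher means the assessment fits better."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (y_obs - mean) ** 2 / (2 * sd ** 2)

def total_log_score(y_obs, means, sds):
    """Summed log score over a test set, for comparing two assessments."""
    return sum(gaussian_log_score(y, m, s) for y, m, s in zip(y_obs, means, sds))
```

An assessment with too-narrow intervals is punished by the squared-error term; one with too-wide intervals by the log-variance term.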
- 16. Different ways to assess the predictive distribution. Frequentist framework: frequentist analytical; sampling ("external data"); re-sampling – jackknifing ("without replacement") or bootstrapping ("with replacement"). Bayesian framework: Bayesian analytical; Bayesian sampling.
- 17. I. Bayesian modelling (the Bayesian branch of the assessment taxonomy on slide 16)
- 18. I. Bayesian modelling • Model parameters are uncertain • Uncertainty is described by probability • Prior information is subjective • Data enter through Bayesian updating. [Plot: MCMC samples of parameter 1 vs parameter 2]
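Bayesian updating in its simplest conjugate form – a Gaussian prior on a mean with known noise variance – illustrates how data shrink uncertainty (my illustrative sketch, not the slides' QSAR model):

```python
def update_normal_mean(prior_mean, prior_var, data, noise_var):
    """Conjugate Bayesian update of a Gaussian prior on a mean,
    given Gaussian observations with known noise variance."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var
```

The posterior variance shrinks as observations accumulate, which is why the approach is well motivated when data are scarce: the prior carries the assessment until data take over.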
- 19. I. Bayesian modelling. Pros • Uncertainty is measured by probability • Links to decision theory • Well motivated for small data sets. Cons • Treatment of a high-dimensional descriptor space? • Limitation to specific models? • Re-modelling of QSARs needed
- 20. Validation. Fathead Minnow data from the QSARdata R package. [Plots: observed vs predicted; training data R²_Blasso = 0.79; test data R²_Blasso = 0.75]. Park and Casella (2008) Journal of the American Statistical Association; Gramacy and Pantaleo (2010) Bayesian Analysis.
- 21. Validation: empirical coverage. [Plots: hit rate vs confidence for training and test data]
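The hit-rate curves amount to: for each nominal confidence level, count how often the observed value lands inside its prediction interval. A sketch (function and argument names are mine):

```python
def hit_rate(y_obs, lowers, uppers):
    """Fraction of observations falling inside their prediction
    intervals; an honest assessment tracks the nominal confidence."""
    hits = sum(1 for y, lo, hi in zip(y_obs, lowers, uppers) if lo <= y <= hi)
    return hits / len(y_obs)
```

Plotting hit rate against confidence level should follow the 1:1 line; curves below it mean the intervals are too narrow (overconfident), curves above it mean they are too wide.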
- 22. 2. Bootstrap sampling (the bootstrapping branch of the assessment taxonomy on slide 16)
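Bootstrapping a predictive distribution: resample the training set with replacement, refit the model, and collect the predictions. A minimal sketch with a one-descriptor least-squares fit (illustrative only, not the slides' QSAR models):

```python
import random

def ols_fit(xs, ys):
    """Ordinary least squares for one descriptor; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample (all x equal): fall back to the mean
        return lambda x: my
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x

def bootstrap_predictions(xs, ys, x_query, n_boot=200, seed=1):
    """Empirical predictive distribution at x_query from bootstrap refits."""
    rng = random.Random(seed)
    n, preds = len(xs), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        model = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_query))
    return preds
```

The spread of the collected predictions is the bootstrap assessment of predictive uncertainty; quantiles of `preds` give prediction intervals directly.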
- 23. 3. Assessment considering judgment of predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: the point prediction ŷ_q. Variance: the local Predictive Error Sum of Squares (PRESS) divided by a denominator.
- 24. 3. Assessment considering judgment of predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: the point prediction ŷ_q. Variance: the local PRESS divided by a denominator. Ingredients: the observed prediction errors y_j − ŷ_j; a measure of predictive reliability; sampling from the distribution of modified residuals.
- 25. 3. Assessment considering judgment of predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: the point prediction ŷ_q. Variance: the local PRESS divided by a denominator. PRESS = Σ_{j=1..n} (y_j − ŷ_j)²; weighted: W-PRESS_q = Σ_{j=1..n} w_{q,j} (y_j − ŷ_j)² / Σ_{j=1..n} w_{q,j}; k nearest neighbours: kNN-PRESS_q = Σ_{j ∈ kNN(q)} (y_j − ŷ_j)².
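The weighted local PRESS above, used as the variance of the Gaussian predictive distribution, is a one-liner in code (variable names are mine):

```python
def local_press_variance(residuals, weights):
    """Weighted local PRESS: sum_j w_qj * (y_j - yhat_j)^2 / sum_j w_qj.
    With equal weights this reduces to the ordinary PRESS / n."""
    num = sum(w * r ** 2 for w, r in zip(weights, residuals))
    return num / sum(weights)
```

Giving more weight to training compounds judged similar to the query makes the variance reflect local, rather than global, model performance.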
- 26. Validate the assessment. [Plots: log likelihood scores of the predictive-error assessments on external data for the schemes equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage, kNN ADdens; empirical coverage (hit rate vs confidence level, 1:1 reference) on external data]
- 27. So – which approach is the best? [Plots: observed vs predicted; training data R²_pls = 0.77, R²_boot = 0.83, R²_Blasso = 0.79; test data R²_pls = 0.77, R²_boot = 0.78, R²_Blasso = 0.75]
- 28. So – which approach is the best? [Plots: hit rate vs confidence (1:1 reference) for training and test data; methods: Blasso, bootstrap, kNN leverage, equal, W euclidean]
- 29. So – which approach is the best? [Plots: hit rate vs confidence (1:1 reference) for training and test data; log likelihood scores of the predictive-error assessments on training data for Blasso, bootstrap, kNN leverage, equal]
- 30. Take-home messages • A prediction is complete when it is given with its uncertainty specified by probability • An assessment of uncertainty needs both to be theoretically motivated and shown to be honest in empirical evaluation of performance measures • Three useful approaches: assess uncertainty through modelling (Bayesian), through sampling (e.g. bootstrapping), or by post-modelling of the predictive error • Use appropriate measures to validate the assessment of uncertainty
- 31. Thank you for your attention. Drive safely in the statistical jungle!
