Quantitative uncertainty in QSAR predictions - Bayesian predictive inference and the magic of bootstrap

This is the presentation from my talk at the excellent Gordon Research Conference on Computer Aided Drug Design 2013.

Speaker notes
  • The number of BTAZ compounds classified as very toxic or not (including potentially*) toxic under the different treatments of QSAR uncertainty, both in the input and in the output of the assessment. Uncertainty in QSAR predictions is considered in alternatives 2 to 4.
  • Here is an example from a case study in the EU project CADASTER where QSAR predictions were used to inform input parameters to an environmental hazard assessment. Assessments of relative toxicity are shown here for 386 triazoles; the larger the red dot, the more toxic the compound. The compounds are plotted against the minimum toxicity value, which is the alternative if uncertainty in QSAR predictions had not been considered, and against molecular weight, which has a high influence on toxicity. If considering uncertainty in QSAR predictions had no effect, the compounds would follow a straight line. They do not, which means that uncertainty has a (small but real) influence on the outcome of the assessment. Some of these compounds fell outside the applicability domain of one or several of the QSAR models used in the assessment; these compounds are assessed with lower confidence and are marked with a blue triangle. Why uncertainty analysis? Using point estimates of predictions, e.g. best guesses or expectations (the plug-in approach), does not guarantee that the assessment produces the best guess or expected value of the output.
  • There is always at least one model behind a prediction. It can be a mental model. It can be a mathematically well-defined model consisting of a set of equations. The modelling may involve a statistical model which is often assumed to hold under certain assumptions. It can be the process of modelling, which more or less transparently describes how the model has been generated, its parameters estimated and its performance validated. A reason to dwell upon which models are involved is that this sets the possibilities to assess uncertainty in predictions. This year's plethora of prognosticators comes thanks to Paul the octopus, who correctly predicted the outcomes of all seven of Germany's World Cup matches in 2010 in addition to the final between Spain and Holland.
  • The title here is uncertainty in a prediction, and I would like to emphasize that uncertainty differs from prediction to prediction. I need to specify uncertainty in an individual prediction when I use predictive models such as QSARs to inform decision analysis in some way or another. I have noticed that a qualitative judgement of predictive reliability may lead either to the model prediction not being used (but what is the alternative?) or to the model prediction being used with a flag that it may not be good. This has led to the idea of letting the judgement of predictive reliability influence the quantitative part of the uncertainty in a prediction. Information requirements: first we note that uncertainty in a prediction cannot be reported with a model in the same way general measures of predictive performance are reported; it depends on what is to be predicted. Later I will show how assessments of uncertainty in predictions can be used to evaluate ways to judge confidence in predictions. Note that different individual predictions may carry different uncertainty. Error is not equal for all compounds to which a model is applied. This seems rather obvious, but in practice error is often specified as equal for any prediction, while predictive reliability can be very different. Reliability is a qualitative aspect of uncertainty related to the questions: is this a trustworthy piece of information, can I use this prediction in my risk or decision model (and, if not, what is the alternative)? Error, being a quantitative characterization of uncertainty, can be dealt with in the risk or decision analysis; it still provides us with an alternative. There is a need to jointly consider error and predictive reliability. Here is a simple model based on one descriptor. The model predicts a line, and predictive error can be assessed; here I have used a Bayesian model to quantify error in predictions. Error increases the further we move from the scatter of data points we have, and we must also ask what we can say about items falling outside the scatter of points. Bayesian modelling [description of a Bayesian regression, exemplified by the Bayesian Lasso]: the predictive distribution widens with the distance to the training data set (the hat value).
  • So predictive error is characterized by a probability distribution: the predictive distribution. Note that different individual predictions may carry different uncertainty; error is not equal for all compounds to which a model is applied. This seems rather obvious, but in practice error is often specified as equal for any prediction, while predictive reliability can be very different. Reliability is a qualitative aspect of uncertainty related to the questions: is this a trustworthy piece of information, can I use this prediction in my risk or decision model (and, if not, what is the alternative)? Error, being a quantitative characterization of uncertainty, can be dealt with in the risk or decision analysis. There is a need to consider error and reliability jointly. Here is a simple model based on one descriptor. The model predicts a line, and predictive error can be assessed; here I have used a Bayesian model to quantify error in predictions. The dashed lines mark the bounds of prediction intervals with 95% confidence of covering the actual value. Error increases the further we move from the scatter of data points, and the predictive distribution widens with the distance to the training data set.
  • Here is another example of distance from the model versus point prediction. This model has a high-dimensional descriptor space, hence the scatter of black dots (the training data) and red crosses (external predictions). Here we clearly see that some compounds become severe extrapolations from the AD when predicted by this model. As an alternative to disregarding these predictions we could ask: yes, these predictions are bad, but how bad, and does it matter for our decision?
  • Judging the reliability of using a model to predict is made upon several criteria. First, one can look for general qualitative criteria, whether the compound fulfils certain characteristics that the QSAR is modelling. When the predefined criteria are met, different measures of a model's domain of applicability can be used to evaluate predictive reliability. [Figure notes, translated from Swedish: Picture 0: two-dimensional distance. Picture 1: distance. Picture 2: density (3 dimensions). Picture 3: show in some way.]
  • Predictive distribution: uncertainty is described by a probability distribution, that is, the error is described by a probability distribution.
  • If we believe the assessment of uncertainty to be true, we would expect the true value to fall somewhere under the predictive distribution, more often close to the center of the predictive distribution.
  • Here is an attempt to show an overview of approaches to assess predictive errors (or the predictive distribution). This does not cover all approaches, but the most common ones, and I am happy to discuss this further with anyone interested. It has two main branches: the frequentist (or classical) statistical framework and the Bayesian framework. I will now pick and demonstrate one example from each of these two branches.
  • The first is a Bayesian approach to assess uncertainty. Bayesian modelling is from the beginning designed to model uncertainty in parameters using probabilities and is therefore ideal for assessing predictive errors.
  • Bayesian modelling can quickly be summarized as the activity of modelling where parameters are assigned uncertainty using probabilities. A model consists of a model structure whose parameters are, to begin with, assigned uncertainty distributions that express our prior (that is, before looking at data) understanding of their values and the character of their uncertainty. Data enter through Bayesian updating, and this so-called likelihood principle can be followed more or less strictly. A Bayesian model is usually fitted by Markov chain Monte Carlo (MCMC) sampling, which means that a simulation algorithm searches for good regions under the distributions of the parameters when the information in the data is considered. The priors tell us where to look and the data tell us what is a good place to be. In the figure we see a simulation which took us to a good spot for the values of two parameters. When the algorithm seems to stay at the same place, we say that it has converged. We then throw away the values from the beginning of the simulation and use the rest (here red dots) to generate predictions from the model. Also, since the parameters are uncertain, the predictions will also be uncertain and, voilà, we have a predictive distribution (a minimal MCMC sketch is given after this list of notes). Bayesian modelling is THE framework to quantify uncertainty. It provides uncertainty with a fairly easy interpretation, i.e. our uncertainty in a value stemming from our expert knowledge and justified by information in empirical observations. At least in theory it is so. Gaussian processes can deal with high-dimensional descriptor spaces, but the mechanistic understanding of the model is lost.
  • The advantages of Bayesian modelling are that it results in uncertainty being assessed by a probability distribution, and that its interpretation of uncertainty is a direct link to a decision-theoretic framework, which is useful when optimising testing strategies for experimental design or (as in the applications I have been working with) when QSAR predictions inform input to risk assessment models for chemical regulation. Also, it has a theoretical motivation even under small data sizes (-> Bayesian meta-regression). A problem is that it does not always work in practice. It works best for parametric models, since specifying priors can be difficult if we do not know what is in need of priors. It is not clear how to treat a high-dimensional descriptor space; the selection of descriptors is puzzling me: from where should descriptors become part of the model? It is limited to Bayesian modelling. Finally, it requires QSARs, if they already exist, to be re-modelled as Bayesian models. Should the original set of descriptors be considered, or the final selection? Different parameter values: from a point estimate with estimated variance in a frequentist framework to a posterior distribution depending on the choice of priors in a Bayesian framework. Is it the same QSAR?
  • Let us look at the overview of methods to assess predictive distributions. From the frequentist branch of the tree I consider re-sampling. Re-sampling, since we often have a limitation of data. Re-sampling with replacement means that the same data point can be drawn several times. This can create inbreeding, i.e. results that are artefacts of the particular data, and one has to be cautious under small sample sizes. A recipe for bootstrapping can simply be: specify a quantity whose uncertainty we are interested in (it can be a test statistic, an estimated parameter value or a predictive error, i.e. the discrepancy between a prediction and reality); specify how to derive this quantity from the observations we have; then repeatedly sample from the observations and derive the quantity many times. This results in a distribution for the quantity which expresses its uncertainty. Bootstrapping occurs when we allow observations to be sampled several times. A classical application is to fit a model to data, generate predictions and derive residuals, sample from the distribution of residuals to generate new data, fit a new model, save the estimated parameters, and repeat this several times (a minimal residual-bootstrap sketch is given after this list of notes). What we get is something similar to the Bayesian model, with uncertainty in the parameters which results in uncertainty in the predictions; the interpretation of uncertainty is different, though. The use of bootstrap solves some of the problems with the Bayesian modelling. I will not show any results from bootstrapping here. I will quickly turn to my third approach to assess uncertainty in a prediction, which is the approach that does not refit the underlying QSAR model but uses the notion of predictive reliability in the assessment of predictive errors.
  • Given are observations of prediction error, i.e. the difference between a model prediction for a compound not part of the training data set and the actual value. For each observation we know the corresponding measure of predictive reliability. Using one of the PRESSes described in the previous slide we can derive the local PRESS for a certain query compound by comparing its predictive reliability to those of the assessment data set. The general algorithm to assess predictive uncertainty samples from the distribution of so-called modified residuals. A modified residual is found by dividing prediction residuals from an assessment data set, yj - ŷ-j, by each item's specific standard error SDEPj. If the standard error is properly estimated, and if we assume observed and not yet observed compounds to be exchangeable, the sample of modified residuals provides input for the predictive distribution of individual predictions of new compounds. In this way we do not have to specify what to divide the PRESS value by for the PRESS to be a variance of the predictive distribution (a minimal sketch of this sampling scheme is given after this list of notes). The assessment is quick to run; what takes time is deriving the measures of predictive reliability and perhaps the LOO prediction errors for a training data set (if no external data set is used). It also becomes necessary to ask which measure of predictive reliability to use. In the beginning I mentioned four different kinds: similarity in descriptor space, distance to the centre of the AD, density of the AD close to the predicted compound, and sensitivity analysis, which can be the standard deviation in a prediction when a model is generated several times with a different outcome every time. The nice thing with having a predictive distribution is that we can actually validate how good both the model and the uncertainty in its predictions are. It is very common to compare measures of predictive reliability through the correlation between observed errors and the measure, but we know that errors can be both small and large at the same time; they are drawn from a distribution. We know that uncertainty in predictions may vary from compound to compound, but since we have assessed individual uncertainty in predictions, we can easily place each prediction under its corresponding predictive distribution.
  • This approach aims to model the error directly based on the judgement of predictive reliability. For this I need a model for the predictive distribution. Still tampering with regressions, the predictive distribution is assigned to be Gaussian (a bell-shaped distribution, symmetric around its mean). The mean value is the point prediction from the QSAR model. Information on predictive error is then contained in the variance of this predictive distribution. I let the variance be assessed by a local PRESS divided by a denominator. A reason for this choice is that it should be easy to apply and, at best, to document with models. The Gaussian distribution is a simplification; other distribution types, even non-parametric ones, could have been chosen. Also, the use of both Bayesian modelling and bootstrap as shown before requires running code (which is possible but perhaps not always appreciated). PRESS is a commonly reported performance measure of QSARs, so why not use it as a null model for the assessment of predictive uncertainty? This means that the null model states that all predictive errors are equal and can be derived from the PRESS value. We have tried two variants of local PRESS. A weighted PRESS weights according to a measure of similarity in predictive reliability between the query compound and the compounds for which I have observed prediction errors. The weight is constructed such that observed errors with relatively more similar predictive reliability are given higher influence in the assessed variance. As a consequence, the variance for a compound that lies in the centre of the AD is mostly based upon errors observed for compounds in the centre, and vice versa. A more direct variant of this theme is to let the PRESS value be more local by summing over the k nearest neighbours, where what is near is judged based on similarity in predictive reliability. A problem with sampling-based approaches is that the error in the outskirts of the AD is less reliably assessed, since there are by definition fewer values there, and we do not, as in the Bayesian case, provide any other information. Thus, the locally assessed predictive error can be seen as a conditional predictive error, i.e. the expected error given a compound's position in the domain of applicability or prior information on uncertainty.
  • Here are two ways to validate assessments of uncertainty using an external data set (at best not part of the modelling leading to the assessments). First, we have summed the logged likelihood values for each point in the external data set. A high score means a better (well-balanced) assessment of uncertainty: it means that most compounds fell inside their predictive distribution and few were very far out. I have noticed that the likelihood score can be a bit tricky sometimes, and I always prefer to also look at the graphical display of empirical coverages. Empirical coverage plots are generated by, for different confidence levels, counting the proportion of compounds in the data set that fell inside their corresponding prediction intervals. A good and well-balanced assessment should generate a straight one-to-one line (a minimal sketch of both validation measures is given at the end of the slide transcript). It is important to keep in mind that the underlying QSAR model should be properly validated before doing this exercise.
  • Bayesian vs bootstrap, log-likelihood and coverage: while the likelihood provides a relative comparison, the empirical coverage provides an evaluation that stands on its own. This is because we use the uncertainty in predictions as a probabilistically formulated hypothesis for the observed value in the external data set.
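As a companion to the notes above, here are three minimal code sketches, one per approach discussed: Bayesian predictive inference, the residual bootstrap, and sampling of modified residuals scaled by a local PRESS. They are illustrative sketches in Python on invented toy data with a one-descriptor linear model, not the Bayesian Lasso or PLS models used on the slides, and every name in them (x, y, reliab, w_press and so on) is an assumption made for the example.

First, a minimal sketch of a Bayesian predictive distribution, fitted with a plain random-walk Metropolis sampler; the priors and the proposal scale are assumptions chosen only to make the example run.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy training data: one descriptor (think nC) and an activity (think logEC50)
x = rng.uniform(2, 14, size=40)
y = 0.3 * x - 2.0 + rng.normal(0, 0.4, size=40)

def log_post(theta):
    # theta = (intercept, slope, log_sigma); weakly informative normal priors
    a, b, log_s = theta
    s = np.exp(log_s)
    log_lik = np.sum(-0.5 * ((y - (a + b * x)) / s) ** 2 - np.log(s))
    log_prior = -0.5 * (a ** 2 + b ** 2) / 100.0 - 0.5 * log_s ** 2 / 4.0
    return log_lik + log_prior

theta, lp, chain = np.zeros(3), log_post(np.zeros(3)), []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.05, size=3)   # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)
chain = np.array(chain)[5000:]                   # discard burn-in

# predictive distribution at a query descriptor value: parameter uncertainty
# propagated through the model plus observation noise
x_q = 13.5
a, b, s = chain[:, 0], chain[:, 1], np.exp(chain[:, 2])
pred = a + b * x_q + s * rng.normal(size=len(chain))
print("Bayesian 95% prediction interval:", np.percentile(pred, [2.5, 97.5]))
```

Second, a minimal sketch of the residual-bootstrap recipe from the notes: fit, resample residuals with replacement, refit, and collect predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(2, 14, size=40)
y = 0.3 * x - 2.0 + rng.normal(0, 0.4, size=40)

beta = np.polyfit(x, y, deg=1)                   # straight-line QSAR
fitted = np.polyval(beta, x)
resid = y - fitted

x_q = np.array([6.0, 13.5])                      # query compounds
B = 2000
pred = np.empty((B, x_q.size))
for b in range(B):
    # pseudo-data from resampled residuals, refit, predict, add fresh noise
    y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
    beta_star = np.polyfit(x, y_star, deg=1)
    pred[b] = np.polyval(beta_star, x_q) + rng.choice(resid, size=x_q.size)

# the columns of `pred` are bootstrap predictive distributions per query compound
print("bootstrap 95% prediction intervals:")
print(np.percentile(pred, [2.5, 97.5], axis=0))
```

Third, a minimal sketch of the post-modelling approach: a weighted local PRESS over an assessment set and sampling from the distribution of modified residuals r_j / SDEP_j. The Gaussian weight on similarity in predictive reliability is an assumed choice; any reliability measure (similarity, leverage, density, perturbation spread) could be plugged in.

```python
import numpy as np

rng = np.random.default_rng(2)
# assessment set: observed prediction errors and a reliability measure
# (here a leverage-like distance, larger = less reliable), both invented
reliab = rng.uniform(0, 2, size=60)
resid = rng.normal(0, 0.3 + 0.3 * reliab)        # errors grow with the distance

def w_press(q_rel, reliab, resid, bandwidth=0.5):
    # weighted PRESS: errors observed at similar reliability weigh more
    w = np.exp(-0.5 * ((reliab - q_rel) / bandwidth) ** 2)
    return np.sum(w * resid ** 2) / np.sum(w)

# item-specific standard errors SDEP_j and the modified residuals
sdep = np.sqrt([w_press(r, reliab, resid) for r in reliab])
modified = resid / sdep                          # treated as exchangeable

# predictive distribution for a query compound with point prediction y_hat_q
y_hat_q, q_rel = -1.2, 1.6
sdep_q = np.sqrt(w_press(q_rel, reliab, resid))
samples = y_hat_q + sdep_q * rng.choice(modified, size=5000, replace=True)
print("local-PRESS 95% prediction interval:", np.percentile(samples, [2.5, 97.5]))
```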
Slide transcript

    1. 1. Uncertainty in QSAR Predictions – Bayesian Inference and the Magic of Bootstrap Ullrika Sahlin PhD Centre for Environmental and Climate Research (CEC)
    2. 2. QSAR integrated assessment Assessment model Input 1 Input 2 Input 3 Decision node QSAR prediction QSAR prediction Experimental value
    3. 3. Uncertainty in hazard assessment – does it matter? Alternatives: 4. Conservative value of toxicity; 3. Expected toxicity; 2. Median toxicity; 1. QSAR predictions without uncertainty; 0. No HA. ?: 386; Not toxic*: 281 265 262 153; +109 +3 +16; Very toxic: 105. Sahlin et al. 2013. Arguments for Considering Uncertainty in QSAR Predictions in Hazard and Risk Assessments. ATLA
    4. 4. QSAR integrated hazard assessment and the AD domain problem. [Plot: Predicted No Effect Concentration of 386 triazoles; log min{EC50} vs molecular weight; dot size indicates relative toxicity potential; markers flag low confidence in prediction]
    5. 5. Modes of statistical inference • Parametric inference – Explain – Hypothesis-driven • Predictive inference – Predict to support decision making – Generate hypothesis • Evidence synthesis – Consider quality Geisser. Introduction to predictive inference 1993. Sutton and Abrams 2001. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research.
    6. 6. To predict…  is to make a statement of something we have not yet observed  is always made with uncertainty  is made using at least one model
    7. 7. How can I… • Assess uncertainty in a prediction? • Take my judgement of confidence in the model into account? • Validate the assessment? Principle for QSAR modelling Principle to judge confidence in predictions Principle to assess uncertainty
    8. 8. Uncertainty in a prediction. Predictive reliability: our confidence in using a model to predict what we want to predict. Predictive error: discrepancy between model and reality. [Two plots: predictive mean vs hat value, and logEC50 vs nC]
    9. 9. Different kinds of errors [plot: predicted y vs nC]
    10. 10. Predictive reliability [scatter plot: prediction vs distance from model]
    11. 11. Different measures of predictive reliability • Similarity to points in the training data set • Distance from the centre of training data • Density of training data around the item to be predicted • Sensitivity analysis e.g. standard deviation in perturbed predictions
    12. 12. Predictive error of a regression
    13. 13. Predictive error of a regression Predictive distribution p(Y < y |X,θ)
    14. 14. Predictive error of a regression Predictive distribution p(Y < y |X,θ)
    15. 15. Predictive error of a regression Use likelihood to compare!
    16. 16. Assessment of predictive distribution Frequentist framework Frequentist analytical Sampling "external data" Re-sampling Jackknifing "without replacement" Bootstrapping "with replacement" Bayesian framework Bayesian analytical Bayesian sampling Different ways to assess
    17. 17. I. Bayesian modelling Assessment of predictive distribution Frequentist framework Frequentist analytical Sampling "external data" Re-sampling Jackknifing "without replacement" Bootstrapping "with replacement" Bayesian framework Bayesian analytical Bayesian sampling
    18. 18. I. Bayesian modelling • Model parameters are uncertain • Uncertainty is described by probability • Prior information is subjective • Data enters through Bayesian updating 0 50 100 150 200 505560657075 MCMC sampling parameter 1 parameter2
    19. 19. I. Bayesian modelling Pros • Uncertainty is measured by probability • Links to decision theory • Motivated under small data Cons • Treatment of high-dimensional descriptor space? • Limitation to specific models? • Re-modelling of QSARs needed
    20. 20. Validation. Fathead Minnow data set from the QSARdata R-package. Park and Casella (2008) Journal of the American Statistical Association; Gramacy and Pantaleo (2010) Bayesian Analysis. [Observed vs predicted plots: training data, R2_Blasso = 0.79; test data, R2_Blasso = 0.75]
    21. 21. Validation: empirical coverage. [Plots of hit rate vs confidence for training data and test data]
    22. 22. 2. Bootstrap sampling Assessment of predictive distribution Frequentist framework Frequentist analytical Sampling "external data" Re-sampling Jackknifing "without replacement" Bootstrapping "with replacement" Bayesian framework Bayesian analytical Bayesian sampling
    23. 23. 3. Assessment considering judgment in predictive reliability Inspired by Denham 1997 and Clark 2009 Type of distribution: Gaussian Mean: Point prediction yq Variance: Local Predictive Error Sum of Squares divided by denominator
    24. 24. 3. Assessment considering judgment in predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: point prediction yq. Variance: local Predictive Error Sum of Squares divided by a denominator. Observed prediction errors $y_j - \hat{y}_j$, a measure of predictive reliability, and sampling from the distribution of modified residuals.
    25. 25. 3. Assessment considering judgment in predictive reliability. Inspired by Denham 1997 and Clark 2009. Type of distribution: Gaussian. Mean: point prediction yq. Variance: local Predictive Error Sum of Squares divided by a denominator, where $W.PRESS_q = \frac{\sum_{j=1}^{n} w_{q,j}\,(y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} w_{q,j}}$, $kNN.PRESS_q = \sum_{j \in kNN(w_{q,\cdot})} (y_j - \hat{y}_j)^2$ and $PRESS = \sum_{j=1}^{n} (y_j - \hat{y}_j)^2$.
    26. 26. Validate the assessment. [Left: evaluation on external data, log likelihood score of the assessment of predictive error for the variants equal, W euclidean, W leverage, W ADdens, kNN euclidean, kNN leverage and kNN ADdens. Right: empirical coverage on external data, hit rate vs confidence level against the 1:1 line for the same variants]
    27. 27. So – which approach is the best? [Observed vs predicted plots: training data, R2_pls = 0.77, R2_boot = 0.83, R2_Blasso = 0.79; test data, R2_pls = 0.77, R2_boot = 0.78, R2_Blasso = 0.75]
    28. 28. So – which approach is the best? [Empirical coverage, hit rate vs confidence against the 1:1 line; training data: Blasso, Bootstrap, kNN leverage, equal; test data: Blasso, Bootstrap, W euclidean, equal]
    29. 29. So – which approach is the best? [Empirical coverage plots as on the previous slide, together with an evaluation on training data: log likelihood score of the assessment of predictive error for Blasso, Bootstrap, kNN leverage and equal]
    30. 30. Take home messages • A prediction is complete when given with uncertainty specified by probability • Assessments of uncertainty need to be both theoretically motivated and shown to be honest in empirical evaluation of performance measures • Three useful approaches are to assess uncertainty through modelling (Bayesian), sampling (e.g. bootstrapping), or post-modelling of predictive error • Use appropriate measures to validate the assessment of uncertainty
    31. 31. Thank you for your attention Drive safely in the statistical jungle!
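As a companion to the validation slides above (and referenced from the speaker notes), here is a minimal sketch of the two validation measures for assessments that report a Gaussian predictive distribution per compound: the summed log predictive likelihood on an external data set and the empirical coverage curve. The array names and the toy data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

def log_likelihood_score(y_obs, pred_mean, pred_sd):
    # sum of logged predictive densities over the external set;
    # higher means a better balanced assessment of uncertainty
    return np.sum(norm.logpdf(y_obs, loc=pred_mean, scale=pred_sd))

def empirical_coverage(y_obs, pred_mean, pred_sd,
                       levels=np.linspace(0.05, 0.95, 19)):
    # per confidence level, the proportion of observations that fall inside
    # their central prediction interval; a well balanced assessment follows 1:1
    half_width = norm.ppf(0.5 + levels / 2)[:, None] * pred_sd[None, :]
    inside = np.abs(y_obs - pred_mean)[None, :] <= half_width
    return levels, inside.mean(axis=1)

# toy external data set generated from a well calibrated assessment
rng = np.random.default_rng(3)
pred_mean = rng.normal(size=100)
pred_sd = rng.uniform(0.3, 1.0, size=100)
y_obs = pred_mean + pred_sd * rng.normal(size=100)

print("log likelihood score:", log_likelihood_score(y_obs, pred_mean, pred_sd))
levels, hitrate = empirical_coverage(y_obs, pred_mean, pred_sd)
print(np.column_stack([levels, hitrate]))        # should track the 1:1 line
```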
