2018 MUMS Fall Course - Issue Arising in Several Working Groups: Probabilistic (Bayesian) Combination of Evidence (EDIT) - Jim Berger, September 11, 2018
This document discusses Bayesian approaches to combining evidence from multiple data sources or models. It recommends combining data probabilistically using Bayes' theorem rather than averaging. It provides an example of combining three data sources on annual rainfall measurements by treating the data sources as independent measurements and deriving the posterior distribution of rainfall amounts given the data. It also discusses challenges that arise when combining dependent data sources or models, and presents examples of hierarchical modeling approaches.
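To make the recommendation concrete, here is a minimal sketch of the rule (the numbers and the Gaussian measurement-error model are illustrative assumptions, not Berger's actual example): three independent measurements of the same rainfall quantity combine, via Bayes' theorem, into a precision-weighted posterior rather than a simple average.

```python
import numpy as np

# Hypothetical setup: three independent, unbiased measurements of the same
# annual rainfall theta (mm), each with its own known error variance.
means = np.array([850.0, 820.0, 880.0])   # observed values
sds   = np.array([30.0, 50.0, 20.0])      # measurement-error std. devs.

# Under a flat prior on theta, the posterior is Gaussian with precision
# equal to the sum of the data precisions (Bayes' theorem applied to
# independent Gaussian likelihoods).
prec = 1.0 / sds**2
post_var  = 1.0 / prec.sum()
post_mean = (prec * means).sum() * post_var

print(f"posterior mean {post_mean:.1f} mm, sd {np.sqrt(post_var):.1f} mm")
print(f"naive average  {means.mean():.1f} mm")  # ignores differing accuracies
```

The precision weighting is the point: the most accurate source dominates the combination, which a plain average cannot reproduce.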
The document describes a course on model uncertainty taught in the fall of 2018. It covers topics like statistical and mathematical model uncertainty, Bayesian hypothesis testing and model uncertainty, priors for Bayesian model uncertainty, approximations and computation, model inputs and outputs, model calibration, Gaussian processes, surrogate models, sensitivity analysis, and model discrepancy. The course is taught over 12 weeks by two lecturers and includes weekly topics like introduction to uncertainty, Bayesian analysis, representation of inputs/outputs, calibration, Gaussian processes, surrogate models, sampling techniques, and sensitivity analysis.
This lecture introduces Bayesian hypothesis testing. It discusses an example comparing HIV infection rates between a treatment and placebo group. A Bayesian analysis is presented that calculates posterior probabilities for the null and alternative hypotheses using prior probabilities and Bayes factors. The lecture outlines general notation for Bayesian testing and discusses issues like choosing prior distributions and testing precise versus imprecise hypotheses. It also discusses interpreting Bayes factors and relates posterior probabilities to p-values in some cases.
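The arithmetic connecting prior probabilities, Bayes factors, and posterior probabilities is worth seeing once; this small sketch uses made-up numbers, not those of the HIV example.

```python
# Posterior probability of H0 from prior odds and a Bayes factor:
#   posterior odds = prior odds * BF01,  P(H0 | data) = odds / (1 + odds).
def posterior_prob_H0(prior_H0, bf01):
    prior_odds = prior_H0 / (1.0 - prior_H0)
    post_odds = prior_odds * bf01
    return post_odds / (1.0 + post_odds)

# Hypothetical values: equal prior probabilities and BF01 = 0.25
# (the data are 4 times more likely under H1 than under H0).
print(posterior_prob_H0(0.5, 0.25))  # -> 0.2
```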
On the vexing dilemma of hypothesis testing and the predicted demise of the B... (Christian Robert)
The document discusses hypothesis testing from both frequentist and Bayesian perspectives. It introduces the concept of statistical tests as functions that output accept or reject decisions for hypotheses. P-values are presented as a way to quantify uncertainty in these decisions. Bayes' original 1763 paper on Bayesian statistics is summarized, introducing the concept of the posterior distribution. Bayesian hypothesis testing is then discussed, including the optimal Bayes test and the use of Bayes factors to compare hypotheses without requiring prior probabilities on the hypotheses.
This document summarizes a presentation on testing hypotheses as mixture estimation and the challenges of Bayesian testing. The key points are:
1) Bayesian hypothesis testing faces challenges including the dependence on prior distributions, difficulties interpreting Bayes factors, and the inability to use improper priors in most situations.
2) Testing via mixtures is proposed as a paradigm shift that frames hypothesis testing as a model selection problem involving mixture models rather than distinct hypotheses.
3) Traditional Bayesian testing using Bayes factors and posterior probabilities depends strongly on prior distributions and choices that are difficult to justify, while not providing measures of uncertainty around decisions. Alternative approaches are needed to address these issues.
- Approximate Bayesian computation (ABC) is a technique used when the likelihood function is intractable or unavailable. It approximates the Bayesian posterior distribution in a likelihood-free manner.
- ABC works by simulating parameter values from the prior and simulating pseudo-data. Parameter values are accepted if the simulated pseudo-data are "close" to the observed data according to some distance measure and tolerance level (see the sketch after this list).
- ABC originated in population genetics models where genealogies are considered nuisance parameters that cannot be integrated out of the likelihood. It has since been applied to other fields like econometrics for models with complex or undefined likelihoods.
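A minimal sketch of the basic ABC rejection sampler described above, on a toy Gaussian-mean problem where the summary statistic is the sample mean (all concrete choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = rng.normal(2.0, 1.0, size=100)        # "observed" data (toy example)
s_obs = y_obs.mean()                          # summary statistic

def abc_rejection(n_sims=50_000, tol=0.05):
    """Basic ABC: draw theta from the prior, simulate pseudo-data,
    keep theta when the simulated summary is within tol of the observed one."""
    theta = rng.normal(0.0, 10.0, size=n_sims)           # prior draws
    pseudo = rng.normal(theta[:, None], 1.0, (n_sims, 100))
    s_sim = pseudo.mean(axis=1)
    return theta[np.abs(s_sim - s_obs) < tol]            # accepted draws

post = abc_rejection()
print(f"accepted {post.size} draws; posterior mean ~ {post.mean():.2f}")
```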
This document discusses challenges and recent advances in Approximate Bayesian Computation (ABC) methods. ABC methods are used when the likelihood function is intractable or unavailable in closed form. The core ABC algorithm involves simulating parameters from the prior and simulating data, retaining simulations where the simulated and observed data are close according to a distance measure on summary statistics. The document outlines key issues like scalability to large datasets, assessment of uncertainty, and model choice, and discusses advances such as modified proposals, nonparametric methods, and perspectives that include summary construction in the framework. Validation of ABC model choice and selection of summary statistics remains an open challenge.
This document discusses Bayesian hypothesis testing and some of the challenges associated with it. It makes three key points:
1) There is tension between using posterior probabilities from a loss function approach versus Bayes factors, which eliminate prior dependence but have no direct connection to the posterior.
2) Bayesian hypothesis testing relies on choosing prior probabilities for hypotheses and prior distributions for parameters, which can strongly impact results and are often arbitrary.
3) Common Bayesian testing procedures like using Bayes factors can produce paradoxical results, as in Lindley's paradox, where for a fixed p-value the Bayes factor increasingly favors the null hypothesis as the sample size grows, despite the frequentist evidence against it.
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. ABC can be framed as a machine learning problem where simulated datasets are used to learn which model is most appropriate. Random forests are well-suited for this as they can handle many correlated summary statistics without information loss. The random forest predicts the most likely model but not posterior probabilities. Instead, the posterior predictive expected error rate across models is proposed to evaluate model selection performance without unstable probability approximations. An example comparing MA(1) and MA(2) time series models illustrates the approach.
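As a rough sketch of the idea (the simulation design and summaries below are assumptions, not the paper's): simulate series from each model, compute summary statistics, and train a random forest to predict the model index; its out-of-bag error gives a crude error-rate assessment (the paper's posterior predictive error rate is more refined).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def simulate_ma(q, n=200):
    """Simulate an MA(q) series with coefficients drawn from a uniform prior."""
    theta = rng.uniform(-1.0, 1.0, size=q)
    eps = rng.normal(size=n + q)
    return eps[q:] + sum(theta[i] * eps[q - 1 - i : n + q - 1 - i]
                         for i in range(q))

def summaries(x, lags=3):
    """First few sample autocovariances as ABC summary statistics."""
    x = x - x.mean()
    return np.array([np.dot(x[k:], x[: len(x) - k]) / len(x)
                     for k in range(lags)])

# ABC reference table: model index plus summaries of simulated datasets
X = np.array([summaries(simulate_ma(q)) for q in (1, 2) for _ in range(2000)])
y = np.array([0] * 2000 + [1] * 2000)       # 0 = MA(1), 1 = MA(2)

rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                            random_state=0).fit(X, y)
print("out-of-bag error:", 1 - rf.oob_score_)
x_obs = simulate_ma(2)                      # pretend this is the observed series
print("predicted model index:", rf.predict(summaries(x_obs).reshape(1, -1))[0])
```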
Discussion of Persi Diaconis' lecture at ISBA 2016 (Christian Robert)
This document discusses Monte Carlo methods for numerical integration and estimating normalizing constants. It summarizes several approaches: estimating normalizing constants using samples; reverse logistic regression for estimating constants in mixtures; Xiao-Li's maximum likelihood formulation for Monte Carlo integration; and Persi's probabilistic numerics which provide uncertainties for numerical calculations. The document advocates first approximating the distribution of an integrand before estimating its expectation to incorporate non-parametric information and account for multiple estimators.
This document proposes representing hypothesis testing problems as estimating mixture models. Specifically, two competing models are embedded within an encompassing mixture model with a weight parameter between 0 and 1. Inference is then drawn on the mixture representation, treating each observation as coming from the mixture model. This avoids difficulties with traditional Bayesian testing approaches like computing marginal likelihoods. It also allows for a more intuitive interpretation of the weight parameter compared to posterior model probabilities. The weight parameter can be estimated using standard mixture estimation algorithms like Gibbs sampling or Metropolis-Hastings. Several illustrations of the approach are provided, including comparisons of Poisson and geometric distributions.
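A stripped-down sketch of the mixture-testing idea on the Poisson-versus-geometric comparison; for transparency the component parameters are fixed at data-driven values (the paper samples them as well), and a random-walk Metropolis step targets the posterior of the mixture weight.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.poisson(4.0, size=100)              # data: actually Poisson here

# Encompassing mixture: alpha * Poisson(lam) + (1 - alpha) * Geometric(p).
# Component parameters fixed for this sketch only; inference is on alpha.
lam, p = x.mean(), 1.0 / (1.0 + x.mean())   # geometric shifted to {0, 1, ...}

def log_post(alpha):
    if not 0.0 < alpha < 1.0:
        return -np.inf                      # uniform prior on (0, 1)
    dens = (alpha * stats.poisson.pmf(x, lam)
            + (1 - alpha) * stats.geom.pmf(x + 1, p))
    return np.log(dens).sum()

alpha, lp, chain = 0.5, log_post(0.5), []
for _ in range(20_000):                     # random-walk Metropolis on alpha
    prop = alpha + 0.1 * rng.normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        alpha, lp = prop, lp_prop
    chain.append(alpha)

print("posterior mean of the weight:", np.mean(chain[5000:]))  # near 1 here
```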
The document proposes using random forests (RF), a machine learning tool, for approximate Bayesian computation (ABC) model choice rather than estimating model posterior probabilities. RF improves on existing ABC model choice methods by having greater discriminative power among models, being robust to the choice and number of summary statistics, requiring less computation, and providing an error rate to evaluate confidence in the model choice. The authors illustrate the power of the RF-based ABC methodology on controlled experiments and real population genetics datasets.
Approximate Bayesian model choice via random forests (Christian Robert)
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
- The document provides information about statisticshomeworkhelper.com, a service that offers probability and statistics assignment help. It lists their website, email, and phone number for contacting them.
- It then provides an example of a multi-part statistics problem involving hypothesis testing on coin flips and dice data. It asks the reader to conduct various statistical tests and interpret the results.
- Finally, it lists some additional practice problems involving chi-square tests, ANOVA, and other statistical analyses for the reader to work through.
random forests for ABC model choice and parameter estimation (Christian Robert)
The document discusses Approximate Bayesian Computation (ABC). It begins by introducing ABC as a likelihood-free method for Bayesian inference when the likelihood function is unavailable or computationally intractable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data based on a distance measure.
The document then discusses advances in ABC, including modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem, and including measurement error in the framework. It also discusses the consistency of ABC as the number of simulations increases and sample size grows large. Finally, it discusses applications of ABC to model selection by treating the model index as an additional parameter.
The document discusses Hessian matrices in statistics. It begins by introducing the Hessian matrix and describing relevant statistical concepts like maximum likelihood estimation and the likelihood function. It then provides an example of calculating the Hessian matrix for a Gaussian linear regression model estimated using maximum likelihood. The Hessian of the negative log-likelihood is positive definite at the MLE, confirming that the solution maximizes the likelihood function. Larger sample sizes improve MLE estimates: the estimate converges to the true parameter value as the sample size increases.
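As a concrete check of that claim, a small sketch (simulated data, known noise variance; all settings illustrative) that computes the Hessian of the negative log-likelihood at the least-squares solution and verifies it is positive definite:

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + slope
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# The MLE of beta coincides with least squares; the Hessian of the negative
# log-likelihood w.r.t. beta (sigma^2 known) is X'X / sigma^2.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
H = X.T @ X / sigma2

print("beta_hat:", beta_hat)
print("Hessian eigenvalues:", np.linalg.eigvalsh(H))
# All eigenvalues positive => a minimum of the negative log-likelihood,
# i.e. the MLE maximizes the likelihood.
```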
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may suffer from the curse of dimensionality.
The document discusses roundoff errors that occur in numerical algorithms involving real numbers. It provides an overview of floating point number representation, rounding errors, and how roundoff error accumulates over multiple operations. Specifically, it describes how numbers are stored in computers using a floating point system, the IEEE standard for floating point arithmetic, and identifies different sources of growing roundoff errors and how to reduce their effects.
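A small demonstration, not drawn from the document, of accumulating roundoff in float32 and one standard mitigation, Kahan (compensated) summation:

```python
import numpy as np

xs = np.full(100_000, np.float32(0.1), dtype=np.float32)

naive = np.float32(0.0)
for v in xs:                        # naive left-to-right accumulation: each
    naive = naive + v               # add rounds to float32, and the error
                                    # grows as the running total grows

def kahan_sum(values):
    s = c = np.float32(0.0)
    for v in values:
        y = v - c                   # c carries the low-order bits lost so far
        t = s + y
        c = (t - s) - y
        s = t
    return s

print("naive float32 sum:", naive)          # noticeably below 10000
print("Kahan float32 sum:", kahan_sum(xs))  # close to the exact value
print("reference:", 100_000 * 0.1)
```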
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to these challenges are discussed, such as using Bayes factors, which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models, like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
If you are looking for business statistics homework help, Statisticshelpdesk is the right destination. Our experts are capable of solving business statistics homework at all levels with full accuracy and originality, and we charge reasonable rates.
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
This document summarizes approximate Bayesian computation (ABC) methods. It begins with an overview of ABC, which provides a likelihood-free rejection technique for Bayesian inference when the likelihood function is intractable. The ABC algorithm works by simulating parameters and data until the simulated and observed data are close according to some distance measure and tolerance level. The document then discusses the asymptotic properties of ABC, including consistency of ABC posteriors and rates of convergence under certain assumptions. It also notes relationships between ABC and k-nearest neighbor methods. Examples applying ABC to autoregressive time series models are provided.
This document provides an overview of ABC methodology and applications. It begins with examples from population genetics and econometrics that are well-suited for ABC. It then describes the basic ABC algorithm for Bayesian inference using simulation: specifying prior distributions, simulating data under different parameter values, and accepting simulations that best match the observed data. Indirect inference is also discussed as a method for choosing informative summary statistics for ABC. The document traces the origins of ABC to population genetics models from the late 1990s and highlights ongoing contributions from that field to ABC methodology.
An Introduction to Mis-Specification (M-S) Testing (jemille6)
This document provides an introduction to mis-specification (M-S) testing, which is a methodology for validating statistical models. It discusses how statistical misspecification can render statistical inference unreliable by distorting nominal error probabilities. As an example, it shows how violating the independence assumption in a normal model can increase the actual type I error rate and reduce power. It argues that model validation through M-S testing is important for ensuring reliable inference, but is often neglected due to misunderstandings. All statistical methods rely on an underlying statistical model, so any misspecification impacts reliability regardless of the method used.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, mixtures to bridge sampling, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women is presented to illustrate regular importance sampling. Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
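A minimal sketch of the first of those methods, regular importance sampling for a marginal likelihood (the Bayes-factor ingredient), on a toy conjugate Gaussian model where the exact answer is available; the model and proposal here are assumptions of this sketch, not the probit example from the talk.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
y = rng.normal(1.0, 1.0, size=50)        # data: N(theta, 1) with N(0, 1) prior

def log_lik(theta):
    return stats.norm.logpdf(y[:, None], theta, 1.0).sum(axis=0)

# Importance sampling estimate of the marginal likelihood m(y):
#   m(y) = E_q[ prior(theta) * lik(theta) / q(theta) ],  theta ~ q,
# with q a Gaussian roughly matched to the posterior.
q_mean, q_sd = y.mean(), 1.0 / np.sqrt(len(y) + 1.0)
theta = rng.normal(q_mean, q_sd, size=100_000)
log_w = (stats.norm.logpdf(theta, 0.0, 1.0)      # prior
         + log_lik(theta)
         - stats.norm.logpdf(theta, q_mean, q_sd))
log_m = np.logaddexp.reduce(log_w) - np.log(len(theta))

# Exact marginal for this conjugate model: y is jointly N(0, I + 11').
n = len(y)
exact = stats.multivariate_normal.logpdf(
    y, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n)))
print(f"IS log m(y) = {log_m:.3f}, exact = {exact:.3f}")
```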
better together? statistical learning in models made of modules (Christian Robert)
The document discusses statistical models composed of modular components called modules. Each module may be developed independently and represent different data modalities or domains of knowledge. Joint Bayesian updating treats all modules simultaneously but misspecification of one module can impact the others. Alternative approaches are proposed to allow uncertainty propagation between modules while preventing feedback that could lead to misspecification. Candidate distributions for the modules are discussed, along with strategies for choosing among them based on predictive performance.
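One concrete way to "prevent feedback" is the cut distribution, sketched below on a pair of conjugate Gaussian modules (the model is an assumption chosen for transparency, not one from the talk): the first module's posterior for its parameter is never revised by the second module's data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Module 1: y1 ~ N(phi, 1) informs phi.  Module 2: y2 ~ N(phi + theta, 1)
# informs theta given phi. Suppose module 2 is suspected of misspecification.
y1 = rng.normal(1.0, 1.0, size=100)
y2 = rng.normal(4.0, 1.0, size=100)        # implies theta ~ 3 if phi ~ 1
n1, n2 = len(y1), len(y2)

# Cut ("two-stage") sampling with N(0, 1) priors: draw phi from the
# module-1 posterior ONLY, then draw theta | phi from the module-2
# conditional posterior. Feedback from y2 to phi is blocked by construction.
phi = rng.normal(y1.sum() / (n1 + 1), 1 / np.sqrt(n1 + 1), size=10_000)
theta = rng.normal((y2.mean() - phi) * n2 / (n2 + 1), 1 / np.sqrt(n2 + 1))

print("cut posterior: phi mean %.2f, theta mean %.2f"
      % (phi.mean(), theta.mean()))
```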
This document proposes a method for linear regression on symbolic data where each observation is represented by a Gaussian distribution. It derives the likelihood function for such "Gaussian symbols" and shows that it can be maximized using gradient descent. Simulation results demonstrate that the maximum likelihood estimator performs better than a naive least squares regression on the mean of each symbol. The method extends classical linear regression to the symbolic data setting.
Stratified sampling and resampling for approximate Bayesian computation (Umberto Picchini)
Stratified Monte Carlo is proposed as a method to accelerate ABC-MCMC by reducing its computational cost. It involves partitioning the summary statistic space into strata and estimating the ABC likelihood using a stratified Monte Carlo approach based on resampling. This reduces the variance compared to using a single resampled dataset, without introducing significant bias as resampling alone would. The method is tested on a simple Gaussian example where it provides a posterior approximation closer to the true posterior than standard ABC-MCMC.
Similar to 2018 MUMS Fall Course - Issue Arising in Several Working Groups: Probabilistic (Bayesian) Combination of Evidence (EDIT) - Jim Berger, September 11, 2018
Bayesian inference for mixed-effects models driven by SDEs and other stochast... (Umberto Picchini)
An important, and well studied, class of stochastic models is given by stochastic differential equations (SDEs). In this talk, we consider Bayesian inference based on measurements from several individuals, to provide inference at the "population level" using mixed-effects modelling. We consider the case where dynamics are expressed via SDEs or other stochastic (Markovian) models. Stochastic differential equation mixed-effects models (SDEMEMs) are flexible hierarchical models that account for (i) the intrinsic random variability in the latent state dynamics, (ii) the variability between individuals, and (iii) measurement error. This flexibility gives rise to methodological and computational difficulties.
Fully Bayesian inference for nonlinear SDEMEMs is complicated by the typical intractability of the observed data likelihood, which motivates the use of sampling-based approaches such as Markov chain Monte Carlo. A Gibbs sampler is proposed to target the marginal posterior of all parameters of interest. The algorithm is made computationally efficient through careful use of blocking strategies, particle filters (sequential Monte Carlo) and correlated pseudo-marginal approaches. The resulting methodology is flexible, general and able to deal with a large class of nonlinear SDEMEMs [1]. In more recent work [2], we also explored ways to make inference even more scalable to an increasing number of individuals, while also dealing with state-space models driven by stochastic dynamic models other than SDEs, e.g. Markov jump processes and nonlinear solvers typically used in systems biology.
[1] S. Wiqvist, A. Golightly, AT McLean, U. Picchini (2020). Efficient inference for stochastic differential mixed-effects models using correlated particle pseudo-marginal algorithms, CSDA, https://doi.org/10.1016/j.csda.2020.107151
[2] S. Persson, N. Welkenhuysen, S. Shashkova, S. Wiqvist, P. Reith, G. W. Schmidt, U. Picchini, M. Cvijovic (2021). PEPSDI: Scalable and flexible inference framework for stochastic dynamic single-cell models, bioRxiv doi:10.1101/2021.07.01.450748.
This document provides an introduction to Bayesian analysis and Metropolis-Hastings Markov chain Monte Carlo (MCMC). It explains the foundations of Bayesian analysis and how MCMC sampling methods like Metropolis-Hastings can be used to draw samples from posterior distributions that are intractable. The Metropolis-Hastings algorithm works by constructing a Markov chain with the target distribution as its stationary distribution. The document provides an example of using MCMC to perform linear regression in a Bayesian framework.
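A compact sketch in the spirit of that example: random-walk Metropolis-Hastings for the coefficients of a Bayesian linear regression (noise variance fixed for brevity; priors and tuning constants are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
x = rng.uniform(-2, 2, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # true intercept 1, slope 2

def log_post(beta):
    """Log posterior: N(0, 10^2) priors on both coefficients, sigma = 0.5."""
    resid = y - (beta[0] + beta[1] * x)
    return -0.5 * (resid @ resid) / 0.5**2 - 0.5 * (beta @ beta) / 10.0**2

beta, lp, samples = np.zeros(2), log_post(np.zeros(2)), []
for _ in range(20_000):
    prop = beta + 0.05 * rng.normal(size=2)        # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:       # MH accept/reject step
        beta, lp = prop, lp_prop
    samples.append(beta)

samples = np.array(samples[5000:])                 # discard burn-in
print("posterior means:", samples.mean(axis=0))    # ~ [1, 2]
```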
The document discusses Gaussian process emulators for approximating complex computer models or simulators. It describes how Gaussian process emulators work by modeling the simulator output as a Gaussian stochastic process. The emulator can then predict the simulator output at new input values and provide a measure of prediction accuracy. Key steps in building a Gaussian process emulator include designing runs of the simulator, fitting the emulator parameters using Bayesian methods rather than maximum likelihood to avoid issues, and obtaining closed-form expressions for the predictive distribution at new points. Gaussian process emulators have advantages like interpolation and calibrated predictive accuracy but also challenges like computational scaling with data size.
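The core predictive equations of such an emulator fit in a few lines; this sketch uses a zero prior mean and a squared-exponential kernel with fixed hyperparameters (real emulators estimate these, so treat the constants as assumptions).

```python
import numpy as np

def sq_exp_kernel(a, b, ell=0.5, var=1.0):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

# Design points: a handful of (expensive) simulator runs
X = np.linspace(0, 1, 6)
f = np.sin(2 * np.pi * X)                 # stand-in for simulator output

# GP posterior at new inputs: mean = Ks' K^{-1} f,
# variance = Kss - Ks' K^{-1} Ks (elementwise on the diagonal).
Xs = np.linspace(0, 1, 101)
K = sq_exp_kernel(X, X) + 1e-8 * np.eye(len(X))   # jitter for stability
Ks = sq_exp_kernel(X, Xs)
mean = Ks.T @ np.linalg.solve(K, f)
var = (sq_exp_kernel(Xs, Xs).diagonal()
       - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks)))

print("max predictive sd between design points:", np.sqrt(var.max()))
print("emulator interpolates:", np.allclose(mean[::20], f, atol=1e-4))
```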
The document discusses using unusual data sources in insurance. It provides examples of using pictures, text, social media data, telematics, and satellite imagery in insurance. It also discusses challenges in analyzing complex and high-dimensional data from these sources and introduces machine learning tools like PCA, generalized linear models, and evaluating models using loss, risk, and cross-validation.
Predictive Modeling in Insurance in the context of (possibly) big data (Arthur Charpentier)
This document discusses predictive modeling in insurance in the context of big data. It begins with an introduction to the speaker and outlines some key concepts in actuarial science from both American and European perspectives. It then provides examples of common actuarial problems involving ratemaking, pricing, and claims reserving. The document reviews the history of actuarial models and discusses issues around statistical learning, machine learning, and their relationship to statistics. It also covers model evaluation and various loss functions used in modeling.
This document summarizes the Bayesian Additive Regression Trees (BART) model and the Monotone BART (MBART) extension. BART approximates an unknown function using an ensemble of regression trees with a regularization prior. It connects to ideas in Bayesian nonparametrics, dynamic random basis elements, and gradient boosting. The document outlines the BART MCMC algorithm and how it can provide automatic uncertainty quantification and variable selection. It then introduces MBART, which constrains trees to be monotonic, and describes an MCMC algorithm for fitting MBART models. Examples illustrate BART and MBART fits to simulated monotonic and non-monotonic functions.
The document discusses learning graphical models from data. It describes two main tasks: inference, which is computing answers to queries about a probability distribution described by a Bayesian network, and learning, which is estimating a model from data. It provides examples of learning for completely observed models, including maximum likelihood estimation for the parameters of a conditional Gaussian model. It also discusses supervised versus unsupervised learning of hidden Markov models, and techniques for dealing with small training sets like adding pseudocounts to estimates.
Slides by Alexander März:
The language of statistics is of a probabilistic nature. Any model that falls short of providing quantification of the uncertainty attached to its outcome is likely to provide an incomplete and potentially misleading picture. While this is an irrevocable consensus in statistics, machine learning approaches usually lack proper ways of quantifying uncertainty. In fact, a possible distinction between the two modelling cultures can be attributed to the (non-)existence of uncertainty estimates that allow for, e.g., hypothesis testing or the construction of estimation/prediction intervals. However, quantification of uncertainty in general and probabilistic forecasting in particular doesn't just provide an average point forecast; rather, it equips the user with a range of outcomes and the probability of each of those occurring.

In an effort to bring both disciplines closer together, the audience is introduced to a new framework of XGBoost that predicts the entire conditional distribution of a univariate response variable. In particular, XGBoostLSS models all moments of a parametric distribution (i.e., mean, location, scale and shape [LSS]) instead of the conditional mean only. Choosing from a wide range of continuous, discrete and mixed discrete-continuous distributions, modelling and predicting the entire conditional distribution greatly enhances the flexibility of XGBoost, as it allows one to gain additional insight into the data generating process, as well as to create probabilistic forecasts from which prediction intervals and quantiles of interest can be derived. As such, XGBoostLSS contributes to the growing literature on statistical machine learning that aims at weakening the separation between Breiman's "Data Modelling Culture" and "Algorithmic Modelling Culture", so that models designed mainly for prediction can also be used to describe and explain the underlying data generating process of the response of interest.
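XGBoostLSS fits all distribution parameters jointly inside XGBoost; as a rough stand-in for readers without that package, the sketch below conveys the same "distribution, not point forecast" idea using scikit-learn's gradient boosting with quantile losses. This substitutes a different technique and is only illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1 + 0.05 * X[:, 0])  # noise grows with x

# One boosted model per quantile => a predictive interval, not just a mean.
models = {q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                       n_estimators=200).fit(X, y)
          for q in (0.05, 0.5, 0.95)}

x_new = np.array([[2.0], [8.0]])
for q, m in models.items():
    print(f"q={q}: {m.predict(x_new).round(2)}")  # interval widens with x
```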
Extreme value theory (EVT) can model extreme events with low probabilities of occurrence. This paper describes EVT techniques and applies them to model extreme daily returns of IBM, Ford, and Nortel stock prices. Specifically, it shows that the Generalized Pareto Distribution (GPD) can appropriately model extreme daily losses. The GPD is estimated from the data and used to calculate Value-at-Risk and Expected Shortfall risk measures for the stock returns.
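A minimal peaks-over-threshold sketch of that approach, with simulated heavy-tailed returns standing in for the stock series and a simplified threshold choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
losses = stats.t.rvs(df=4, size=5000, random_state=rng) * 0.01  # heavy tails

# Peaks-over-threshold: fit a GPD to exceedances over a high threshold u.
u = np.quantile(losses, 0.95)
exc = losses[losses > u] - u
xi, loc, beta = stats.genpareto.fit(exc, floc=0.0)   # fix location at 0

# VaR and Expected Shortfall at level p from the fitted GPD tail:
#   VaR_p = u + (beta/xi) * (((1-p)/zeta_u)^(-xi) - 1),  zeta_u = P(L > u)
#   ES_p  = (VaR_p + beta - xi*u) / (1 - xi)             (valid for xi < 1)
p, zeta_u = 0.99, np.mean(losses > u)
var_p = u + (beta / xi) * (((1 - p) / zeta_u) ** (-xi) - 1)
es_p = (var_p + beta - xi * u) / (1 - xi)
print(f"xi={xi:.2f}, VaR_99={var_p:.4f}, ES_99={es_p:.4f}")
```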
An investigation of inference of the generalized extreme value distribution b... (Alexander Decker)
This document presents an investigation of parameter estimation for the generalized extreme value distribution based on record values. Maximum likelihood estimation is used to estimate the parameters β (scale parameter) and ξ (shape parameter). Likelihood equations are derived and solved numerically. Bootstrap and Markov chain Monte Carlo methods are proposed to construct confidence intervals for the parameters since intervals based on asymptotic normality may not perform well due to small sample sizes of records. Bayesian estimation of the parameters using MCMC is also investigated. An illustrative example involving simulated records is provided.
Considerate Approaches to ABC Model Selection (Michael Stumpf)
The document discusses using approximate Bayesian computation (ABC) for model selection when directly evaluating likelihoods is computationally intractable. ABC involves simulating data from the candidate models and comparing simulated and observed summary statistics, and constructing minimally sufficient summary statistics is important for accurate ABC model selection.
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel... (jemille6)
Aris Spanos (Wilson Schmidt Professor of Economics, Virginia Tech)
ABSTRACT: The discussion places the two cultures, the model-driven statistical modeling and the algorithm-driven modeling associated with Machine Learning (ML) and Statistical Learning Theory (SLT), in a broader context of paradigm shifts in 20th-century statistics, which includes Fisher's model-based induction of the 1920s and variations/extensions thereof, the Data Science (ML, SLT, etc.) and the Graphical Causal modeling of the 1990s. The primary objective is to compare and contrast the effectiveness of different approaches to statistics in learning from data about phenomena of interest, and to relate that to the current discussions pertaining to the statistics wars and their potential casualties.
Clustering: k-means, expectation-maximization and Gaussian mixture model (jins0618)
This document discusses K-means clustering, Expectation Maximization (EM), and Gaussian mixture models (GMM). It begins with an overview of unsupervised learning and introduces K-means as a simple clustering algorithm. It then describes EM as a general algorithm for maximum likelihood estimation that can be applied to problems like GMM. GMM is presented as a density estimation technique that models data using a weighted sum of Gaussian distributions. EM is described as a method for estimating the parameters of a GMM from data.
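A minimal side-by-side sketch of the two clustering approaches on synthetic data (all settings illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(9)
# Two Gaussian clusters with different spreads
X = np.vstack([rng.normal([0, 0], 0.5, (300, 2)),
               rng.normal([3, 3], 1.5, (300, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # fitted by EM

print("k-means centers:\n", km.cluster_centers_.round(2))
print("GMM means:\n", gmm.means_.round(2))
print("GMM weights:", gmm.weights_.round(2))
# Unlike k-means' hard assignments, the GMM gives soft responsibilities:
print("P(cluster | x=[1.5, 1.5]):", gmm.predict_proba([[1.5, 1.5]]).round(2))
```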
The document discusses various methods for modeling input distributions in simulation models, including trace-driven simulation, empirical distributions, and fitting theoretical distributions to real data. It provides examples of several continuous and discrete probability distributions commonly used in simulation, including the exponential, normal, gamma, Weibull, binomial, and Poisson distributions. Key parameters and properties of each distribution are defined. Methods for selecting an appropriate input distribution based on summary statistics of real data are also presented.
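A small sketch of the fit-and-compare workflow for choosing an input distribution (synthetic data standing in for real observations; the candidate set and goodness-of-fit criterion are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
data = stats.gamma.rvs(a=2.0, scale=3.0, size=500,
                       random_state=rng)    # e.g. observed service times

# Fit several candidate distributions by maximum likelihood and compare
# with the Kolmogorov-Smirnov statistic (smaller = better fit).
for name, dist in [("exponential", stats.expon),
                   ("gamma", stats.gamma),
                   ("weibull", stats.weibull_min)]:
    params = dist.fit(data)
    ks = stats.kstest(data, dist.cdf, args=params).statistic
    print(f"{name:12s} KS = {ks:.3f}")
```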
This document discusses unsupervised learning techniques for clustering data, specifically K-Means clustering and Gaussian Mixture Models. It explains that K-Means clustering groups data by assigning each point to the nearest cluster center and iteratively updating the cluster centers. Gaussian Mixture Models assume the data is generated from a mixture of Gaussian distributions and uses the Expectation-Maximization algorithm to estimate the parameters of the mixture components.
Pattern learning and recognition on statistical manifolds: An information-geo... (Frank Nielsen)
This document provides an overview of Frank Nielsen's talk on pattern learning and recognition using information geometry and statistical manifolds. The talk focuses on departing from vector space representations and dealing with (dis)similarities that do not have Euclidean or metric properties. This poses new theoretical and computational challenges for pattern recognition. The talk describes using exponential family mixture models defined on dually flat statistical manifolds induced by convex functions. On these manifolds, dual coordinate systems and dual affine geodesics allow for computing-friendly representations of divergences and similarities between probabilistic patterns. The techniques aim to achieve statistical invariance and enable algorithmic approaches to problems like Gaussian mixture modeling, shape retrieval, and diffusion tensor imaging analysis.
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes have their advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms they can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
1) The document presents a statistical modeling approach called targeted smooth Bayesian causal forests (tsbcf) to smoothly estimate heterogeneous treatment effects over gestational age using observational data from early medical abortion regimens.
2) The tsbcf method extends Bayesian additive regression trees (BART) to estimate treatment effects that evolve smoothly over gestational age, while allowing for heterogeneous effects across patient subgroups.
3) The tsbcf analysis of early medical abortion regimen data found the simultaneous administration to be similarly effective overall to the interval administration, but identified some patient subgroups where effectiveness may vary more over gestational age.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
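A small simulation sketch of the two estimators the bracketing result compares; the data-generating process is invented for illustration and satisfies ignorability given the lagged outcome, so DiD should overestimate the true positive effect, as the result predicts.

```python
import numpy as np

rng = np.random.default_rng(11)
n, tau, rho = 100_000, 1.0, 0.5               # true effect tau

y0 = rng.normal(size=n)                       # pre-treatment outcome
d = (rng.normal(size=n) > y0).astype(float)   # lower y0 => more likely treated
y1 = tau * d + rho * y0 + rng.normal(size=n)  # ignorability given y0 holds

# Difference-in-differences: compare average changes across groups
did = (y1[d == 1] - y0[d == 1]).mean() - (y1[d == 0] - y0[d == 0]).mean()

# Lagged-dependent-variable regression: OLS of y1 on (1, d, y0)
X = np.column_stack([np.ones(n), d, y0])
ldv = np.linalg.lstsq(X, y1, rcond=None)[0][1]

print(f"true {tau:.2f} | DiD {did:.2f} (overestimates here) | LDV {ldv:.2f}")
```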
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
This document discusses difference-in-differences (DiD) analysis, a quasi-experimental method used to estimate treatment effects. The author notes that while widely applicable, DiD relies on strong assumptions about the counterfactual. She recommends approaches like matching on observed variables between similar populations, thoughtfully specifying regression models to adjust for confounding factors, testing for parallel pre-treatment trends under different assumptions, and considering more complex models that allow for different types of changes over time. The overall message is that DiD requires careful consideration and testing of its underlying assumptions to draw valid causal conclusions.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented information about how causal inference is being used to determine the cost/benefit of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities when treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus on estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
This document summarizes a simulation study evaluating causal inference methods for assessing the effects of opioid and gun policies. The study used real US state-level data to simulate the adoption of policies by some states and estimated the effects using different statistical models. It found that with fewer adopting states, type 1 error rates were too high, and most models lacked power. It recommends using cluster-robust standard errors and lagged outcomes to improve model performance. The study aims to help identify best practices for policy evaluation studies.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference based on observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for the construction of asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure that first uses an ensemble machine learning method, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize all the advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single most powerful ensemble machine learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second step, TMLE maximizes a parametric likelihood along a so-called least favorable parametric model through the super-learner fit of the relevant stochastic relations in the observed data. This second step bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanism is tailored with respect to the performance of the TMLE. We also discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This document discusses various types of academic writing and provides tips for effective academic writing. It outlines common academic writing formats such as journal papers, books, and reports. It also lists writing necessities like having a clear purpose, understanding your audience, using proper grammar and being concise. The document cautions against plagiarism and not proofreading. It provides additional dos and don'ts for writing, such as using simple language and avoiding filler words. Overall, the key message is that academic writing requires selling your ideas effectively to the reader.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most notable technologies of recent years. The first is the foundation of artificial intelligence and big data, and the second has significantly disrupted the financial industry. Both technologies are data-driven, and thus there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review the research on combining blockchain and machine learning technologies and demonstrate that they can collaborate efficiently and effectively. In the end, we point out some future directions and anticipate more research on deeper integration of the two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
2018 MUMS Fall Course - Issue Arising in Several Working Groups: Probabilistic (Bayesian) Combination of Evidence (EDIT) - Jim Berger, September 11, 2018
1. SAMSI Fall, 2018

Issue arising in several Working Groups: probabilistic (Bayesian) combination of evidence

A strong suggestion: Combine data from different sources probabilistically (through Bayes' theorem), not through ad hoc methods such as averaging.
3. SAMSI Fall, 2018

Bayesian combination of the three data sources:
• Let θ be the 12 × 30 matrix of annual rainfalls over the specified grid in the USA.
• For data source i = 1, 2, 3, let Xi (12 × 30) be the corresponding matrix of mean observed annual rainfalls.
• Let $f_i(X_i \mid \theta, \sigma_i^2)$ be the probability density of the mean observations, given θ and the assumed known error variances $\sigma_i^2$ (12 × 30).
  – For instance, if the measurements at each gridpoint (j, k) from each data source were independent and normally distributed,
    $$f_i(X_i \mid \theta, \sigma_i^2) = \prod_{j,k} \frac{1}{\sqrt{2\pi}\,\sigma_{i,j,k}}\, e^{-(x_{i,j,k}-\theta_{j,k})^2/[2\sigma_{i,j,k}^2]}\,.$$
  – But this is not reasonable:
    ∗ Typically, observations close spatially have correlated errors. (Ground station measurements might be an exception.)
    ∗ Then use models with correlated errors, e.g., Gaussian processes.
4. SAMSI Fall, 2018

• Bayes' theorem:
  – If the different data sources arose from independent measurements,
  – and the prior distribution for θ is π(θ),
    ∗ often the objective prior π(θ) = 1 would be used;
    ∗ if the σi were partly unknown, they would also need a prior distribution,
  – then Bayes' theorem yields that the posterior distribution of θ, given the data, is
    $$\pi(\theta \mid X_1, X_2, X_3) \propto \pi(\theta) \prod_{i=1}^{3} f_i(X_i \mid \theta, \sigma_i^2)\,.$$
5. SAMSI Fall, 2018

Things to note: From π(θ | X1, X2, X3), any question about θ can be answered.
• Under the complete independence and normality assumption, the estimate of θ would be the posterior mean, given coordinatewise by
  $$\hat\theta_{j,k} = \int \theta_{j,k}\, \pi(\theta \mid X_1, X_2, X_3)\, d\theta = \sum_{i=1}^{3} \frac{\sigma_{i,j,k}^{-2}}{\sum_{l=1}^{3} \sigma_{l,j,k}^{-2}}\, x_{i,j,k}\,.$$
  – Associated standard errors (the square roots of the posterior variances) are available, as are confidence intervals for the θj,k.
• Correlations and non-normal distributions are handled in the same way; the computations just may need MCMC.
• If the different data sources were given on different grids, there are spatial upscaling and downscaling ways of handling it.
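To make the precision-weighted form of this estimate concrete, here is a minimal Python sketch under the independence, normality, and flat-prior (π(θ) = 1) assumptions above. The grid size matches the slides, but the rainfall field, error variances, and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three data sources on a 12 x 30 grid, with known error variances.
theta_true = rng.normal(40.0, 5.0, size=(12, 30))   # hypothetical rainfall field
sigma2 = np.stack([np.full((12, 30), v) for v in (1.0, 4.0, 9.0)])  # sigma2[i] for source i
X = theta_true + rng.normal(size=(3, 12, 30)) * np.sqrt(sigma2)     # observed mean rainfalls

# Posterior at each gridpoint is normal, with precision-weighted mean:
prec = 1.0 / sigma2                                  # sigma_{i,j,k}^{-2}
theta_hat = (prec * X).sum(axis=0) / prec.sum(axis=0)
post_sd = np.sqrt(1.0 / prec.sum(axis=0))            # posterior standard errors
```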
6. SAMSI Fall, 2018

An Example with different types of densities:
• Data Source 1 observes independent $X_{1i} \sim N(\theta, \sigma^2)$, i = 1, . . . , n.
  – So Data Source 1 has probability density
    $$f_1(x_{11}, x_{12}, \ldots, x_{1n} \mid \theta, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_{1i}-\theta)^2/[2\sigma^2]}\,.$$
• The raw incoming information to Data Source 2 is $X_{2i} \sim N(\theta, \sigma^2)$, i = 1, . . . , m, but it is an old measuring instrument that only records whether $X_{2i} > 0$ (call this $Y_i = 1$) or $X_{2i} < 0$ (call this $Y_i = 0$).
  – $P(Y_i = 1) = P(X_{2i} > 0 \mid \theta) = 1 - \Phi(-\frac{\theta}{\sigma})$, where Φ is the normal cdf.
  – So Data Source 2 has probability density
    $$f_2(y_1, y_2, \ldots, y_m \mid \theta, \sigma^2) = \prod_{i=1}^{m} \left[1 - \Phi\!\left(-\frac{\theta}{\sigma}\right)\right]^{y_i} \left[\Phi\!\left(-\frac{\theta}{\sigma}\right)\right]^{1-y_i}\,.$$
  – With a prior density $\pi(\theta, \sigma^2)$, the posterior density is
    $$\pi(\theta, \sigma^2 \mid x, y) \propto f_1(x_{11}, x_{12}, \ldots, x_{1n} \mid \theta, \sigma^2)\, f_2(y_1, y_2, \ldots, y_m \mid \theta, \sigma^2)\, \pi(\theta, \sigma^2)\,.$$
7. SAMSI Fall, 2018

Combining dependent information sources:

Examples:
• Combining information from four Triangle weather forecasters.
• Combining information from climate models.
• Combining information from market forecasters.
• ...

The problem: If the dependent information sources, i = 1, 2, . . . , m, provide posterior densities $\pi_i^*(\theta)$ for the unknown θ, combining them is hard:
• Averaging them (often called the linear opinion pool) is not probabilistic, and will be bad if they are nearly independent.
• Multiplying the $\pi_i^*(\theta)$ together (often called the independent opinion pool) is wrong if they are quite dependent.
8. SAMSI Fall, 2018

Example: Suppose the four Triangle weather forecasters' posterior distributions, $\pi_i^*(\theta)$, were developed as
• $\pi_i^*(\theta) \propto f_i(x_i \mid \theta)\,\pi(\theta)$, where π(θ) is the weather model forecast and the $f_i(x_i \mid \theta)$ are independent recent local temperature measurements.
• Then the correct posterior distribution is
  $$\pi^*(\theta) \propto \pi(\theta) \prod_{i=1}^{4} f_i(x_i \mid \theta)$$
  (not $\frac{1}{4}\sum_{i=1}^{4} \pi_i^*(\theta)$ or $\prod_{i=1}^{4} \pi_i^*(\theta)$).
• You could find this if you independently knew π(θ), since
  $$\pi^*(\theta) \propto \frac{\prod_{i=1}^{4} \pi_i^*(\theta)}{\pi(\theta)^3}\,.$$
• Usually, however, you cannot untangle dependent information from different sources.
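The untangling identity can be checked numerically when π(θ) is known. The sketch below uses normal densities on a grid; the forecasters' measurements are made-up numbers.

```python
import numpy as np
from scipy.stats import norm

theta = np.linspace(-5, 5, 1000)
prior = norm.pdf(theta, 0.0, 2.0)            # shared weather-model forecast pi(theta)
x = [0.8, 1.1, 0.9, 1.3]                     # each forecaster's local measurement
posts = [norm.pdf(xi, loc=theta, scale=1.0) * prior for xi in x]   # pi*_i up to constants

# Correct combination: divide out the triple-counted prior.
combined = np.prod(posts, axis=0) / prior**3
combined /= np.trapz(combined, theta)

# For contrast, the independent opinion pool (plain product) over-counts the prior:
pool = np.prod(posts, axis=0)
pool /= np.trapz(pool, theta)
```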
10. SAMSI Fall, 2018

Hierarchical Modeling:
• For the jth pyroclastic flow at the ith volcano, model the relationship of the observed basal friction angle $b_{ij}$ to volume $V_{ij}$ by
  $$\log \tan b_{ij} = a_i + c_i \log V_{ij} + \epsilon_{ij}, \qquad \epsilon_{ij} \sim N(0, \sigma_i^2)\,.$$
• Assume that the $c_i$ from different volcanoes arise from a normal distribution $N(\mu_c, \sigma_c^2)$.
• Assign objective priors to the $a_i$, the $\sigma_i^2$, and to $\mu_c, \sigma_c^2$:
  – a constant prior for the $a_i$ and $\mu_c$;
  – the prior $1/\sigma_i^2$ for the $\sigma_i^2$;
  – the prior $1/\sigma_c$ for $\sigma_c^2$.
• Find the posterior distribution of all unknown parameters, given the data $\{b_{ij},\ i = 1, \ldots, 5;\ j = 1, \ldots, n_i\}$. (A sketch of the resulting log posterior appears below.)
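One way to implement this is to code the unnormalized log posterior and hand it to a generic MCMC sampler. The sketch below follows the model and priors above, dropping additive constants; the function signature and data layout are my own choices, not from the slides.

```python
import numpy as np

def log_post(a, c, s2, mu_c, s2_c, b, V):
    """Unnormalized log posterior for the volcano hierarchy.
    a, c, s2 : arrays of per-volcano intercepts a_i, slopes c_i, variances sigma_i^2
    b, V     : lists of arrays; b[i] holds basal friction angles (radians) and
               V[i] the matching flow volumes for volcano i."""
    lp = 0.0
    for i in range(len(b)):
        resid = np.log(np.tan(b[i])) - a[i] - c[i] * np.log(V[i])
        # Normal likelihood for the residuals, constants dropped:
        lp += -0.5 * len(resid) * np.log(s2[i]) - 0.5 * np.sum(resid**2) / s2[i]
        lp += -np.log(s2[i])                                   # prior 1/sigma_i^2
        lp += -0.5 * np.log(s2_c) - 0.5 * (c[i] - mu_c)**2 / s2_c  # c_i ~ N(mu_c, sigma_c^2)
    lp += -0.5 * np.log(s2_c)          # prior 1/sigma_c on sigma_c^2
    return lp                          # flat priors on the a_i and mu_c contribute 0
```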
12. SAMSI Fall, 2018

Incorporation into the probabilistic risk assessment:
• From the posterior, generate a sample of (a, c) for Montserrat, using a version of Markov chain Monte Carlo (MCMC) called Gibbs sampling. This yields the volume/bias curves below.
• Combine each curve with an emulator of TITAN2D to predict probabilistic risk in the Belham valley.
• The average, over the curves, of the risk is the reported risk.

[Figure: 50 replicates of φ vs. V at Montserrat; x-axis: Volume (×10⁶ m³), y-axis: Basal Friction (degrees).]
14. SAMSI Fall, 2018

Outline
• Possible goals for Bayesian model uncertainty analysis
• Formulation and notation for Bayesian model uncertainty
• Model averaging
• Attractions of the Bayesian approach
• Challenges with the Bayesian approach
• Approaches to prior choice for model uncertainty
16. SAMSI Fall, 2018

Goal 1. Selection of the true model from a collection of models that contains the truth.
  – decision-theoretically, often viewed as choice of 0-1 loss in model selection
Goal 2. Selection of a model good for prediction of future observations or phenomena.
  – decision-theoretically, often viewed as choice of squared error loss for predictions, or Kullback-Leibler divergence for predictive distributions
Goal 3. Finding a simple approximate model that is close enough to the true model to be useful. (This is purely a utility issue.)
Goal 4. Finding important covariates, important relationships between covariates, and other features of model structure. (Often, the model is not of primary importance; it is the relationships of covariates in the model, or whether certain covariates should even be in the model, that are of interest.)
17. SAMSI Fall, 2018

Caveat 1: Suppose the models under consideration do not contain the true model, often called the open model scenario.

Positive fact: Bayesian analysis will asymptotically give probability one to the model that is as close as possible to the true model (in Kullback-Leibler divergence), among the models considered, so the Bayesian approach is still viable.

Issues of concern:
• Bayesian internal estimates of accuracy may be misleading (Barron, 2004).
• Model averaging may no longer be optimal.
• There is no obvious reason why models need even be ‘plausible’.

Caveat 2: There are two related but quite different statistical paradigms involving model uncertainty (not discussed here):
• Model criticism or model checking.
• The process of model development.
19. SAMSI Fall, 2018

Models (or hypotheses). Data, x, is assumed to have arisen from one of several models:
  M1: x has density $f_1(x \mid \theta_1)$
  M2: x has density $f_2(x \mid \theta_2)$
  ...
  Mq: x has density $f_q(x \mid \theta_q)$

Assign prior probabilities, $P(M_i)$, to each model. Common is equal probabilities, $P(M_i) = 1/q$, but this is inappropriate in many scenarios, such as variable selection (discussed in Lecture 4).
20. SAMSI Fall, 2018

Under model Mi:
  – Prior density of $\theta_i$: $\theta_i \sim \pi_i(\theta_i)$
  – Marginal density of x:
    $$m_i(x) = \int f_i(x \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i$$
    measures “how likely is x under Mi”.
  – Posterior density of $\theta_i$:
    $$\pi_i(\theta_i \mid x) = \pi(\theta_i \mid x, M_i) = \frac{f_i(x \mid \theta_i)\, \pi_i(\theta_i)}{m_i(x)}$$
    quantifies (posterior) beliefs about $\theta_i$ if the true model were Mi.
21. SAMSI Fall, 2018

Bayes factor of Mj to Mi:
  $$B_{ji} = \frac{m_j(x)}{m_i(x)}$$
Posterior probability of Mi:
  $$P(M_i \mid x) = \frac{P(M_i)\, m_i(x)}{\sum_{j=1}^{q} P(M_j)\, m_j(x)} = \left[\sum_{j=1}^{q} \frac{P(M_j)}{P(M_i)}\, B_{ji}\right]^{-1}$$
Particular case, $P(M_j) = 1/q$:
  $$P(M_i \mid x) = \bar m_i(x) = \frac{m_i(x)}{\sum_{j=1}^{q} m_j(x)} = \frac{1}{\sum_{j=1}^{q} B_{ji}}$$
Reporting: It is useful to separately report $\{\bar m_i(x)\}$ and $\{P(M_i)\}$, along with
  $$P(M_i \mid x) = \frac{P(M_i)\, \bar m_i(x)}{\sum_{j=1}^{q} P(M_j)\, \bar m_j(x)}\,.$$
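These formulas are one log-sum-exp away from code. A small helper, assuming the log marginal likelihoods log mi(x) are available:

```python
import numpy as np

def posterior_model_probs(log_m, prior=None):
    """P(M_i | x) from log marginal likelihoods log m_i(x) and priors P(M_i).
    With prior=None, uses equal probabilities 1/q, giving the m-bar_i(x)."""
    log_m = np.asarray(log_m, dtype=float)
    prior = (np.full(len(log_m), 1.0 / len(log_m)) if prior is None
             else np.asarray(prior, dtype=float))
    w = np.log(prior) + log_m
    w -= w.max()                 # stabilize before exponentiating
    p = np.exp(w)
    return p / p.sum()
```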
22. SAMSI Fall, 2018

Example: location-scale

Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. with density
  $$f(x_i \mid \mu, \sigma) = \frac{1}{\sigma}\, g\!\left(\frac{x_i - \mu}{\sigma}\right)$$
Several models are entertained:
  MN: g is N(0, 1)
  MU: g is Uniform(0, 1)
  MC: g is Cauchy(0, 1)
  ML: g is Left Exponential ($\frac{1}{\sigma} e^{(x-\mu)/\sigma}$, $x \le \mu$)
  MR: g is Right Exponential ($\frac{1}{\sigma} e^{-(x-\mu)/\sigma}$, $x \ge \mu$)
23. SAMSI Fall, 2018

Difficulty: The models are not nested and have no common low-dimensional sufficient statistics; classical model selection would thus typically rely on asymptotics.

Prior distribution: choose, for i = 1, . . . , 5,
  $P(M_i) = \frac{1}{q} = \frac{1}{5}$;
utilize the objective prior $\pi_i(\theta_i) = \pi_i(\mu, \sigma) = \frac{1}{\sigma}$.

Objective improper priors (invariant) are o.k. for the common parameters in situations having the same invariance structure, if the right-Haar prior is used (Berger, Pericchi, and Varshavsky, 1998).

Marginal distributions, $m^N(x \mid M)$, for these models can then be calculated in closed form.
24. SAMSI Fall, 2018

Here
  $$m^N(x \mid M) = \int\!\!\int \prod_{i=1}^{n} \frac{1}{\sigma}\, g\!\left(\frac{x_i - \mu}{\sigma}\right) \frac{1}{\sigma}\, d\mu\, d\sigma\,.$$
For the five models, the marginals are
1. Normal: $m^N(x \mid M_N) = \dfrac{\Gamma((n-1)/2)}{(2\pi)^{(n-1)/2}\,\sqrt{n}\,\left(\sum_i (x_i - \bar x)^2\right)^{(n-1)/2}}$
2. Uniform: $m^N(x \mid M_U) = \dfrac{1}{n\,(n-1)\,(x_{(n)} - x_{(1)})^{n-1}}$
3. Cauchy: $m^N(x \mid M_C)$ is given in Spiegelhalter (1985).
4. Left Exponential: $m^N(x \mid M_L) = \dfrac{(n-2)!}{n^n\,(x_{(n)} - \bar x)^{n-1}}$
5. Right Exponential: $m^N(x \mid M_R) = \dfrac{(n-2)!}{n^n\,(\bar x - x_{(1)})^{n-1}}$
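The four closed-form marginals translate directly into code, working on the log scale to avoid overflow and computing (n−2)! as Γ(n−1). This is my transcription of the formulas above, so treat it as a sketch; the Cauchy marginal is omitted since it requires quadrature.

```python
import numpy as np
from scipy.special import gammaln

def log_marginals(x):
    """Log m^N(x | M) for the normal, uniform, left- and right-exponential
    location-scale models under the objective prior 1/sigma."""
    x = np.sort(np.asarray(x, dtype=float))
    n, xbar = len(x), x.mean()
    S = np.sum((x - xbar) ** 2)       # sum of squared deviations
    R = x[-1] - x[0]                  # sample range x_(n) - x_(1)
    return {
        "normal":    (gammaln((n - 1) / 2) - (n - 1) / 2 * np.log(2 * np.pi)
                      - 0.5 * np.log(n) - (n - 1) / 2 * np.log(S)),
        "uniform":   -np.log(n) - np.log(n - 1) - (n - 1) * np.log(R),
        "left_exp":  gammaln(n - 1) - n * np.log(n) - (n - 1) * np.log(x[-1] - xbar),
        "right_exp": gammaln(n - 1) - n * np.log(n) - (n - 1) * np.log(xbar - x[0]),
    }
```

Feeding these log marginals to the posterior_model_probs helper sketched earlier (with the Cauchy model dropped) yields the corresponding posterior model probabilities.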
25. SAMSI Fall, 2018

Consider four classic data sets:
• Darwin’s data (n = 15)
• Cavendish’s data (n = 29)
• Stigler’s Data Set 9 (n = 20)
• A randomly generated Cauchy sample (n = 14)
29. SAMSI Fall, 2018

Accounting for model uncertainty
• Selecting a single model and using it for inference ignores model uncertainty, resulting in inferior inferences and considerable overstatements of accuracy.
• The Bayesian approach incorporates this uncertainty by model averaging; if, say, inference concerning ξ is desired, it would be based on
  $$\pi(\xi \mid x) = \sum_{i=1}^{q} P(M_i \mid x)\, \pi(\xi \mid x, M_i)\,.$$
  Note: ξ must have the same meaning across models.
• A useful link, with papers, software, and URLs about BMA: http://www.research.att.com/∼volinsky/bma.html
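Operationally, a model-averaged posterior is easy to sample: draw a model index with probability P(Mi | x), then draw ξ from that model's posterior. A generic sketch, where each samplers[i] is assumed to be a user-supplied function returning one posterior draw of ξ under Mi:

```python
import numpy as np

def draws_from_average(post_probs, samplers, n=10_000, seed=0):
    """Draw from pi(xi | x) = sum_i P(M_i | x) pi(xi | x, M_i) by first
    picking a model index, then drawing xi from that model's posterior."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(post_probs), size=n, p=post_probs)
    return np.array([samplers[i](rng) for i in idx])
```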
30. SAMSI Fall, 2018

Example: Vehicle emissions data from McDonald et al. (1995)

Goal: From i.i.d. vehicle emission data $X = (X_1, \ldots, X_n)$, one desires to determine the probability that the vehicle type will meet regulatory standards.

Traditional models for this type of data are Weibull and lognormal distributions given, respectively, by
  $$M_1:\ f_W(x \mid \beta, \gamma) = \frac{\gamma}{\beta}\left(\frac{x}{\beta}\right)^{\gamma-1} \exp\!\left[-\left(\frac{x}{\beta}\right)^{\gamma}\right]$$
  $$M_2:\ f_L(x \mid \mu, \sigma^2) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\!\left[\frac{-(\log x - \mu)^2}{2\sigma^2}\right].$$

Model Averaging Analysis:
• For the studied data set, $P(M_1 \mid x) = .712$. Hence,
  P(meeting regulatory standard) = .712 P(meeting standard | M1) + .288 P(meeting standard | M2).
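As a numerical illustration only: the slides report P(M1 | x) = .712 but not the fitted parameters, so the Weibull and lognormal parameter values and the regulatory cutoff below are invented.

```python
import numpy as np
from scipy.stats import weibull_min, lognorm

cutoff = 3.0                                          # hypothetical regulatory limit
p_w = weibull_min.cdf(cutoff, c=1.5, scale=2.0)       # P(X <= cutoff | M1), made-up params
p_l = lognorm.cdf(cutoff, s=0.8, scale=np.exp(0.4))   # P(X <= cutoff | M2), made-up params
p_meet = 0.712 * p_w + 0.288 * p_l                    # model-averaged probability
print(p_meet)
```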
31. SAMSI Fall, 2018

Model averaging is most used for prediction:

Goal: Predict a future Y, given X.

Optimal prediction is based on model averaging (cf. Draper, 1994). If one of the Mi is indeed the true model (but unknown), the optimal predictor is based on
  $$m(y \mid x) = \sum_{i=1}^{q} P(M_i \mid x)\, m_i(y \mid x, M_i)\,,$$
where
  $$m_i(y \mid x, M_i) = \int f_i(y \mid \theta_i)\, \pi_i(\theta_i \mid x)\, d\theta_i$$
is the predictive distribution when Mi is true.
33. SAMSI Fall, 2018

Reason 1: Ease of interpretation
• Bayes factors ❀ odds; posterior model probabilities ❀ direct interpretation.
• In the location example and Darwin’s data, the posterior model probabilities are $\bar m_N(x) = .390$, $\bar m_U(x) = .056$, $\bar m_C(x) = .430$, $\bar m_L(x) = .124$, $\bar m_R(x) = .0001$; the Bayes factors are $B_{NU} = 6.96$, $B_{NC} = .91$, $B_{NL} = 3.15$, $B_{NR} = 3900$.
• Interpreting five p-values (or 20, if all pairwise tests are done with different nulls) is much harder.
34. SAMSI Fall, 2018

Reason 2: Prior information can be incorporated, if desired
• If the default report is $\bar m_i(x)$, i = 1, . . . , q, then for any prior probabilities,
  $$P(M_i \mid x) = \frac{P(M_i)\, \bar m_i(x)}{\sum_{j=1}^{q} P(M_j)\, \bar m_j(x)}\,.$$
Example: In Darwin’s data, if symmetric models are twice as likely as the non-symmetric, then Pr(MN) = Pr(MU) = Pr(MC) = 1/4 and Pr(ML) = Pr(MR) = 1/8, so that
Pr(MN | x) = .416, Pr(MU | x) = .06, Pr(MC | x) = .458, Pr(ML | x) = .066, Pr(MR | x) = .00005.
35. SAMSI Fall, 2018

Reason 3: Consistency
A. If one of the Mi is true, then $\bar m_i \to 1$ as $n \to \infty$.
B. If some other model, M* (not in the original collection of models), is true, $\bar m_i \to 1$ for that model closest to M* in Kullback-Leibler divergence (Berk, 1966; Dmochowski, 1994).

Reason 4: Ockham’s razor
Bayes factors automatically seek parsimony; no ad hoc penalties for model complexity are needed.
36. SAMSI Fall, 2018

Reason 5: Frequentist interpretation
In many important situations involving testing of M1 versus M2, $B_{12}/(1 + B_{12})$ and $1/(1 + B_{12})$ are ‘optimal’ conditional frequentist error probabilities of Type I and Type II, respectively. Indeed, Pr(H0 | data x) is the frequentist Type I error probability, conditional on observing data of the same “strength of evidence” as the actual data x. (Classical unconditional error probabilities make the mistake of reporting the error averaged over data of very different strengths.)

See Sellke, Bayarri and Berger (2001) for discussion and earlier references.
37. SAMSI Fall, 2018

Reason 6: The ability to account for model uncertainty through model averaging
• For non-Bayesians, it is highly problematic to perform inference from a model based on the same data used to select the model.
  – It is not legitimate to do so in the frequentist paradigm.
  – It is very hard to determine how to ‘correct’ the inferences (and known methods are often extremely conservative).
• Bayesian inference based on model averaging overcomes this problem.
38. SAMSI Fall, 2018

Reason 7: Generality of application
• The Bayesian approach is viable for any number of models, regular or irregular, nested or not, large or small sample sizes.
• The classical approach is most developed for only two models (hypotheses), and typically requires at least one of: (i) nested models, (ii) standard distributions, (iii) regular asymptotics.
• Example 1: In the location-scale example, there were 5 non-nested models, with small sample sizes, irregular asymptotics, and no reduced sufficient statistics.
• Example 2: Suppose the Xi are i.i.d. from the N(θ, 1) density, where θ = “mean effect of T1 − mean effect of T2.”
  Standard testing formulation:
    H0: θ = 0 (no difference in treatments) vs. H1: θ ≠ 0 (a difference exists)
  A more revealing formulation:
    H0: θ = 0 (no difference) vs. H1: θ < 0 (T2 is better) vs. H2: θ > 0 (T1 is better)
39. SAMSI Fall, 2018

Reason 8: Standard Bayesian ‘ease of application’ motivations
1. In sequential scenarios, there is no need to ‘spend α’ for looks at the data; posterior probabilities are not affected by the reason for stopping experimentation (discussed later).
2. The distributions of known censoring variables are not needed for Bayesian analysis (not discussed).
3. Multiple testing problems can be addressed in a straightforward manner (not discussed).
41. SAMSI Fall, 2018

Difficulty 1: Inadequacy of common priors, such as (improper) objective priors, vague proper priors, and conjugate priors.

As in hypothesis testing, improper priors can only be used for “common parameters” in models, and vague proper priors are usually horrible!

The problem with conjugate priors:

Normal Example: Suppose $X_1, \ldots, X_n$ are i.i.d. $N(x \mid \theta, \sigma^2)$ and it is desired to test $H_0: \theta = 0$ versus $H_1: \theta \neq 0$.
• The natural conjugate priors are
  $\pi_1(\sigma^2) = 1/\sigma^2$; $\pi_2(\theta \mid \sigma^2) = N(\theta \mid 0, \sigma^2)$ and $\pi_2(\sigma^2) = 1/\sigma^2$.
• Bayes factor:
  $$B_{12} = \sqrt{n+1}\,\left(\frac{1 + t^2/[(n-1)(n+1)]}{1 + t^2/(n-1)}\right)^{n/2}, \quad \text{where } t = \frac{\sqrt{n}\,\bar x}{s/\sqrt{n-1}}\,.$$
• Silly behavior, called ‘information inconsistency’: as $|t| \to \infty$,
  $$B_{12} \to (n+1)^{-(n-1)/2} > 0\,.$$
  For instance: if n = 3, $B_{12} \to 1/4$; if n = 5, $B_{12} \to 1/36$.
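Information inconsistency is easy to see numerically; the function below transcribes the Bayes factor above and pushes |t| to a large value:

```python
import numpy as np

def B12(t, n):
    """Bayes factor for the conjugate-prior normal test above."""
    return np.sqrt(n + 1) * ((1 + t**2 / ((n - 1) * (n + 1)))
                             / (1 + t**2 / (n - 1))) ** (n / 2)

for n in (3, 5):
    # B12 at |t| = 1e6 is essentially the nonzero limit (n+1)^{-(n-1)/2}:
    print(n, B12(1e6, n), (n + 1) ** (-(n - 1) / 2))
```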
42. SAMSI Fall, 2018

Difficulty 2: Computation

Computation of the needed marginal distributions (or their ratios) can be very challenging in high dimensions. (A useful approximation is discussed later.)

The total number of models under consideration can be enormous (e.g., in variable selection), so good search strategies are needed.

Difficulty 3: Posterior Inference

When thousands (or millions) of models all have similarly high posterior probability (as is common), how does one summarize the information contained therein? Model averaging summaries are needed, but summaries of what quantities? (Lecture 4.)
43. SAMSI Fall, 2018

VI. Approaches to prior choice for model uncertainty
• Use of ‘conventional priors’ (Lecture 4)
• Use of intrinsic priors (discussed for hypothesis testing in Lecture 1)
• Inducing model priors from a single prior
44. SAMSI Fall, 2018

Inducing Model Priors from a Single Prior

The idea:
• Specify a prior $\pi_L(\beta_L)$ for the ‘largest’ model.
• Use this prior to induce priors on the other models. Possibilities include
  – In variable selection, conditioning by setting variables to be removed to zero. (Logically sound if variables are orthogonal.)
  – Marginalizing out the variables to be removed. (Not logically sound.)
  – Matching model parameters for other models with the parameters of the largest model by, say, minimizing Kullback-Leibler divergence between models, and then projecting $\pi_L(\beta_L)$. (Probably best, but hard.)
45. SAMSI Fall, 2018

An example: Suppose the model parameters are probabilities, with the largest model having probabilities $p_1, \ldots, p_m$, with prior $\pi_L(p_1, \ldots, p_m) \sim \text{Dirichlet}(1, \ldots, 1)$ (i.e., the uniform distribution on the simplex such that $\sum_{i=1}^{m} p_i = 1$).

If model Ml has parameters $(p_{i_1}, \ldots, p_{i_l})$,
• conditioning yields $\pi(p_{i_1}, \ldots, p_{i_l}, p^* \mid \text{other } p_j = 0) = \text{Dirichlet}(1, \ldots, 1)$, where $p^* = 1 - \sum_{j=1}^{l} p_{i_j}$. This is sensible.
• marginalizing yields $\pi(p_{i_1}, \ldots, p_{i_l}, p^*) = \text{Dirichlet}(1, \ldots, 1, m - l)$. This is typically much too concentrated near zero when m is large, with the $p_{i_k}$ having means 1/m.
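A quick Monte Carlo check of the two induced priors. Conditioning renormalizes to a uniform Dirichlet on the retained coordinates, which is sampled directly here as a stand-in; the values of m and l are arbitrary.

```python
import numpy as np

m, l = 20, 3
rng = np.random.default_rng(0)

# Marginalizing: (p_1, ..., p_l, 1 - sum) is Dirichlet(1, ..., 1, m - l),
# so each retained p_k has mean 1/m -- far too concentrated near 0 for large m.
p = rng.dirichlet(np.ones(m), size=100_000)
print(p[:, :l].mean(axis=0))        # approximately 1/20 each

# Conditioning (other p_j = 0): a uniform Dirichlet(1, ..., 1) on l+1 coordinates.
q = rng.dirichlet(np.ones(l + 1), size=100_000)
print(q[:, :l].mean(axis=0))        # approximately 1/(l+1) each
```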
46. SAMSI Fall, 2018

Example. Posterior Model Probabilities for Variable Selection in Probit Regression
(from CMU Case Studies VII, Viele et al. case study)

Motivating example: Prosopagnosia (face blindness) is a condition (usually developing after brain trauma) under which the individual cannot easily distinguish between faces. A psychological study was conducted by Tarr, Behrmann and Gauthier to address whether this extended to a difficulty of distinguishing other objects, or was particular to faces.

The study considered 30 control subjects (C) and 2 subjects (S) diagnosed as having prosopagnosia, and had them try to differentiate between similar faces, similar ‘Greebles’ and similar ‘objects’ at varying levels of difficulty and varying comparison time.
47. SAMSI Fall, 2018

Data: Sample size was n = 20,083, the response and covariates being
  C = Subject C
  S = Subject S
  G = Greebles
  O = Object
  D = Difficulty
  B = Brief time
  A = Images match or not
  ⇒ R = Response (answer correct or not).

All variables are binary. All interactions, consisting of all combinations of C, S, G, O, D, B, A (except {C=S=1} and {G=O=1}), are of interest. There are 3 possible subjects (a control and the two patients), 3 possible stimuli (faces, objects, or greebles), and 2 possibilities for each of D, B and A, yielding 3 × 3 × 2 × 2 × 2 = 72 possible interactions.
48. SAMSI Fall, 2018

Statistical modeling: For a specified interaction vector $X_i$, let $y_i$ and $n_i - y_i$ be the numbers of successes and failures among the responses with that interaction vector, with the probability of success $p_i$ assumed to follow the probit regression model
  $$p_i = \Phi\Big(\beta_1 + \sum_{j=2}^{72} X_{ij}\beta_j\Big)\,,$$
with Φ being the normal cdf. The full-model likelihood (up to a fixed proportionality constant) is then
  $$f(y \mid \beta) = \prod_{i=1}^{72} p_i^{y_i} (1 - p_i)^{n_i - y_i}\,.$$

Goal: Select from among the $2^{72}$ submodels which have some of the $\beta_j$ set equal to zero. (Actually, only models with graphical structure were considered, i.e., if an interaction term is in the model, all the lower-order effects must also be there.)
49. SAMSI Fall, 2018

Prior Choice: Conditionally inducing submodel priors from a full-model prior
• A standard noninformative prior for $p = (p_1, p_2, \ldots, p_{72})$ is the uniform prior, usable here since it is proper.
• Changing variables from p to β yields that $\pi_L(\beta)$ is $N_{72}(0, (X'X)^{-1})$, where $X' = (X_1', X_2', \ldots, X_{72}')$. This is the full-model prior.
• For the model with a nonzero subvector $\beta_{(j)}$ of dimension $k_j$, with the remaining $\beta_i = 0$, the induced prior would be
  $$\pi_L(\beta_{(j)} \mid \beta_{(-j)} = 0) = N_{k_j}\big(0, (X_{(j)}'X_{(j)})^{-1}\big)\,,$$
  where $\beta_{(-j)}$ is the subvector of parameters that are zero and $X_{(j)}$ is the corresponding design matrix. (This is a nice conditional property of normal distributions; see the numerical check below.)
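That conditional property — the conditional covariance of a zero-mean normal equals the inverse of the corresponding block of the precision matrix — can be verified numerically. The design matrix below is random, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # hypothetical full-model design matrix
Sigma = np.linalg.inv(X.T @ X)           # full-model prior covariance (X'X)^{-1}

keep, rest = [0, 2], [1, 3, 4]           # nonzero beta_(j) and the betas set to zero
A = Sigma[np.ix_(keep, keep)]
B = Sigma[np.ix_(keep, rest)]
C = Sigma[np.ix_(rest, rest)]
cond_cov = A - B @ np.linalg.inv(C) @ B.T   # Cov(beta_(j) | beta_(-j) = 0)

# Equals (X_(j)' X_(j))^{-1}, as the induced prior above asserts:
print(np.allclose(cond_cov, np.linalg.inv(X[:, keep].T @ X[:, keep])))  # True
```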