This document proposes representing hypothesis testing problems as estimating mixture models. Specifically, two competing models are embedded within an encompassing mixture model with a weight parameter between 0 and 1. Inference is then drawn on the mixture representation, treating each observation as coming from the mixture model. This avoids difficulties with traditional Bayesian testing approaches like computing marginal likelihoods. It also allows for a more intuitive interpretation of the weight parameter compared to posterior model probabilities. The weight parameter can be estimated using standard mixture estimation algorithms like Gibbs sampling or Metropolis-Hastings. Several illustrations of the approach are provided, including comparisons of Poisson and geometric distributions.
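As a concrete, hedged illustration of how the weight parameter can be sampled, here is a minimal Gibbs-sampler sketch for the Poisson-versus-geometric comparison mentioned above; the function name gibbs_mixture_test, the Beta(0.5, 0.5) prior on the weight, and the conjugate Gamma/Beta priors on the component parameters are illustrative assumptions, not taken from the document.

```python
import numpy as np
from scipy import stats

def gibbs_mixture_test(x, n_iter=5000, a0=0.5, b0=0.5, rng=None):
    """Gibbs sampler for the encompassing mixture
         alpha * Poisson(lam) + (1 - alpha) * Geometric(p),
    returning the posterior draws of the weight alpha used for testing."""
    rng = rng or np.random.default_rng()
    n = len(x)
    alpha, lam, p = 0.5, x.mean() + 0.1, 0.5
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # 1) allocate each observation to one of the two components
        f1 = alpha * stats.poisson.pmf(x, lam)
        f2 = (1 - alpha) * stats.geom.pmf(x + 1, p)   # geometric supported on {0, 1, 2, ...}
        z = rng.random(n) < f1 / (f1 + f2)            # True -> Poisson component
        n1, n2 = z.sum(), n - z.sum()
        # 2) update the component parameters from conjugate conditionals
        lam = rng.gamma(1.0 + x[z].sum(), 1.0 / (1.0 + n1))   # Gamma(1, 1) prior on lam
        p = rng.beta(1.0 + n2, 1.0 + x[~z].sum())             # Beta(1, 1) prior on p
        # 3) update the mixture weight from its Beta full conditional
        alpha = rng.beta(a0 + n1, b0 + n2)
        draws[t] = alpha
    return draws

rng = np.random.default_rng(0)
x = rng.poisson(2.0, size=100)          # stand-in data; use the observed counts in practice
alpha_draws = gibbs_mixture_test(x, rng=rng)
print("posterior mean of the weight:", alpha_draws[1000:].mean())
```

The posterior draws of alpha then serve as the testing output: values concentrating near 1 support the Poisson component, values near 0 support the geometric one.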
Discussion of Persi Diaconis' lecture at ISBA 2016 (Christian Robert)
This document discusses Monte Carlo methods for numerical integration and estimating normalizing constants. It summarizes several approaches: estimating normalizing constants using samples; reverse logistic regression for estimating constants in mixtures; Xiao-Li Meng's maximum likelihood formulation of Monte Carlo integration; and Persi Diaconis's probabilistic numerics, which attach uncertainties to numerical calculations. The document advocates first approximating the distribution of an integrand before estimating its expectation, in order to incorporate non-parametric information and account for multiple estimators.
This document summarizes a presentation on testing hypotheses as mixture estimation and the challenges of Bayesian testing. The key points are:
1) Bayesian hypothesis testing faces challenges including the dependence on prior distributions, difficulties interpreting Bayes factors, and the inability to use improper priors in most situations.
2) Testing via mixtures is proposed as a paradigm shift that frames hypothesis testing as a model selection problem involving mixture models rather than distinct hypotheses.
3) Traditional Bayesian testing using Bayes factors and posterior probabilities depends strongly on prior distributions and choices that are difficult to justify, while not providing measures of uncertainty around decisions. Alternative approaches are needed to address these issues.
This document discusses Bayesian hypothesis testing and some of the challenges associated with it. It makes three key points:
1) There is tension between using posterior probabilities derived from a loss-function approach and using Bayes factors, which remove the dependence on prior model probabilities but have no direct connection with the posterior distribution.
2) Bayesian hypothesis testing relies on choosing prior probabilities for hypotheses and prior distributions for parameters, which can strongly impact results and are often arbitrary.
3) Common Bayesian testing procedures based on Bayes factors can produce paradoxical results, such as Lindley's paradox, where the Bayes factor increasingly favors the null hypothesis as the sample size grows even though the usual significance test reports evidence against it (see the numerical sketch below).
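As a small numerical sketch of the paradox in point 3 (a normal-mean setting with illustrative values sigma = tau = 1, not taken from the document), the Bayes factor in favor of the point null grows with the sample size even when the test statistic is held exactly at the 5% significance boundary:

```python
import numpy as np
from scipy import stats

def bayes_factor_01(z, n, sigma=1.0, tau=1.0, theta0=0.0):
    """BF01 for H0: theta = theta0 against H1: theta ~ N(theta0, tau^2),
    when x_i ~ N(theta, sigma^2) and the sample mean sits z standard errors
    away from theta0."""
    xbar = theta0 + z * sigma / np.sqrt(n)
    m0 = stats.norm.pdf(xbar, theta0, sigma / np.sqrt(n))              # marginal under H0
    m1 = stats.norm.pdf(xbar, theta0, np.sqrt(sigma**2 / n + tau**2))  # marginal under H1
    return m0 / m1

for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9}:  BF01 = {bayes_factor_01(1.96, n):.1f}")
```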
On the vexing dilemma of hypothesis testing and the predicted demise of the B... (Christian Robert)
The document discusses hypothesis testing from both frequentist and Bayesian perspectives. It introduces the concept of statistical tests as functions that output accept or reject decisions for hypotheses. P-values are presented as a way to quantify uncertainty in these decisions. Bayes' original 1763 paper on Bayesian statistics is summarized, introducing the concept of the posterior distribution. Bayesian hypothesis testing is then discussed, including the optimal Bayes test and the use of Bayes factors to compare hypotheses without requiring prior probabilities on the hypotheses.
"reflections on the probability space induced by moment conditions with impli...Christian Robert
This document discusses using moment conditions to perform Bayesian inference when the likelihood function is intractable or unknown. It outlines some approaches that have been proposed, including approximating the likelihood using empirical likelihood or pseudo-likelihoods; however, these approaches do not guarantee the same consistency as a true likelihood. Alternative approximate Bayesian methods are also discussed, such as Approximate Bayesian Computation, Integrated Nested Laplace Approximation, and variational Bayes. The empirical likelihood method constructs a likelihood from generalized moment conditions, but its use in Bayesian inference requires further analysis of consistency in each application.
This lecture introduces Bayesian hypothesis testing. It discusses an example comparing HIV infection rates between a treatment and placebo group. A Bayesian analysis is presented that calculates posterior probabilities for the null and alternative hypotheses using prior probabilities and Bayes factors. The lecture outlines general notation for Bayesian testing and discusses issues like choosing prior distributions and testing precise versus imprecise hypotheses. It also discusses interpreting Bayes factors and relates posterior probabilities to p-values in some cases.
The document describes a course on model uncertainty taught in the fall of 2018 over 12 weeks by two lecturers. Weekly topics include statistical and mathematical model uncertainty, Bayesian hypothesis testing and model uncertainty, priors for Bayesian model uncertainty, approximations and computation, representation of model inputs and outputs, model calibration, Gaussian processes, surrogate models, sampling techniques, sensitivity analysis, and model discrepancy.
This document discusses Bayesian approaches to combining evidence from multiple data sources or models. It recommends combining data probabilistically using Bayes' theorem rather than averaging. It provides an example of combining three data sources on annual rainfall measurements by treating the data sources as independent measurements and deriving the posterior distribution of rainfall amounts given the data. It also discusses challenges that arise when combining dependent data sources or models, and presents examples of hierarchical modeling approaches.
This document provides lecture notes on hypothesis testing. It begins with an introduction to hypothesis testing and how it differs from estimation in its hypothetical reasoning approach. It then discusses Fisher's significance testing approach, including defining a test statistic, its sampling distribution under the null hypothesis, and calculating a p-value. It provides examples of applying this approach. Finally, it discusses some weaknesses of Fisher's approach identified by Neyman and Pearson and how their approach improved upon it by introducing the concept of alternative hypotheses and pre-data error probabilities.
An Introduction to Mis-Specification (M-S) Testing (jemille6)
This document provides an introduction to mis-specification (M-S) testing, which is a methodology for validating statistical models. It discusses how statistical misspecification can render statistical inference unreliable by distorting nominal error probabilities. As an example, it shows how violating the independence assumption in a normal model can increase the actual type I error rate and reduce power. It argues that model validation through M-S testing is important for ensuring reliable inference, but is often neglected due to misunderstandings. All statistical methods rely on an underlying statistical model, so any misspecification impacts reliability regardless of the method used.
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
This document summarizes lecture notes on generative learning algorithms from CS229. It begins by explaining the difference between discriminative algorithms like logistic regression that directly model p(y|x), versus generative algorithms that instead model the distributions p(x|y) and p(y). It then introduces Gaussian discriminant analysis (GDA) as the first generative algorithm discussed. GDA models p(x|y=0) and p(x|y=1) as multivariate Gaussian distributions, allowing it to estimate the parameters to build models of each class. To classify new points, it compares how well they match each class model.
This document discusses regression with frailty in survival analysis using the Cox proportional hazards model. It introduces survival analysis concepts like the hazard function and survival function. It then describes how to incorporate frailty, a random effect, into the Cox model to account for clustering in survival times. The Newton-Raphson method is used to estimate model parameters by maximizing the penalized partial likelihood. A simulation study applies this approach to data on infections in kidney patients.
Multiple Imputation of Missing Blood Pressure Covariates in Survival Analysis examines how to handle missing blood pressure (BP) measurements in survival analysis. BP was not measured for about 12.5% of participants. Those with missing BP had higher mortality rates, potentially distorting the influence of BP on survival. The study uses multiple imputation to generate plausible BP values for participants with missing data and pools results from multiple imputed datasets to account for uncertainty in imputed values.
This document discusses orthogonalization methods for fast Bayesian model selection in linear regression. It explores using techniques like principal component analysis (PCA) and a novel decorrelation method (DECO) to orthogonalize the Gram matrix XᵀX. This orthogonalization aims to enable scalable Bayesian model selection and averaging when the number of predictors p is large, by making the computations more efficient. The document considers both small-p and large-p situations.
1. The document discusses categorical data analysis and goodness-of-fit tests. It introduces concepts such as univariate categorical data, expected counts, the chi-square test statistic, and assumptions of the chi-square test.
2. An example analyzes faculty status data from a university using a goodness-of-fit test to determine if the proportions are equal across categories. The test fails to reject the null hypothesis that the proportions are equal.
3. Tests for homogeneity and independence in two-way tables are described. Examples calculate expected counts and perform chi-square tests to compare populations' category proportions.
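A minimal goodness-of-fit sketch along the lines of point 2, using made-up faculty-status counts (the numbers below are illustrative, not those of the document):

```python
import numpy as np
from scipy import stats

# Hypothetical faculty-status counts over four categories (illustrative numbers).
observed = np.array([28, 34, 31, 27])
expected = np.full(4, observed.sum() / 4)        # H0: equal proportions in all categories

chi2 = ((observed - expected) ** 2 / expected).sum()
pval = stats.chi2.sf(chi2, df=len(observed) - 1)
print(f"chi-square = {chi2:.2f}, p-value = {pval:.3f}")   # large p-value: fail to reject H0
# Equivalently: stats.chisquare(observed), which assumes equal expected counts by default.
```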
This document discusses various machine learning techniques including:
1. Tree pruning involves first growing a large tree and then pruning back branches that do not improve the objective function; this avoids the drawbacks of stopping tree growth too early.
2. Boosting uses multiple weak learners sequentially to get an additive model that approximates the regression function. It combines many simple models to create a powerful ensemble model.
3. Unsupervised learning techniques like principal component analysis and clustering are used to find patterns in data without an outcome variable. These include reducing dimensions and partitioning data into subgroups.
Regularization and variable selection via elastic net (KyusonLim)
The document summarizes the Elastic Net regularization method for variable selection in datasets with more predictors than observations (p > n). It describes how the Elastic Net overcomes limitations of LASSO and Ridge Regression by performing automatic variable selection, continuous shrinkage, and selecting groups of correlated predictors. The Naive Elastic Net formulation is presented, along with how it relates to LASSO and Ridge penalties. Computational details of the Elastic Net, including the LARS-EN algorithm and simulations, are discussed.
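A short sketch of elastic-net variable selection in a p > n setting; it uses scikit-learn's coordinate-descent implementation rather than the LARS-EN algorithm discussed in the deck, and all data and penalty settings below are illustrative:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(1)
n, p = 50, 200                                   # the p > n setting the summary refers to
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.normal(size=n)    # a pair of strongly correlated predictors
beta = np.zeros(p)
beta[:3] = [2.0, 2.0, -1.5]                      # only three truly active coefficients
y = X @ beta + rng.normal(size=n)

# l1_ratio mixes the LASSO (l1) and Ridge (l2) penalties; the overall penalty
# strength is chosen by cross-validation.
fit = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5, max_iter=10_000).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print("number of selected predictors:", selected.size)
print("first selected indices:", selected[:10])
```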
BLUP and BLUE: REML of linear mixed model (KyusonLim)
This document discusses linear mixed models and the estimation methods BLUP and BLUE. It provides an introduction to random and fixed effects, as well as the mixed model equations used to derive BLUP and BLUE simultaneously. BLUP provides the best linear unbiased predictions of random effects, while BLUE gives the best linear unbiased estimates of fixed effects. The document also uses the orthodontic growth data set to demonstrate fitting a linear mixed model and estimating variance components with REML.
The document discusses bivariate distribution and correlation. It defines bivariate distribution as a distribution where each individual or unit of a set assumes two values, relating to two different variables. Correlation analyzes the relationship between two variables in a bivariate distribution. There are different types of correlation like positive, negative, no correlation, perfect correlation and weak/strong correlation. The coefficient of correlation, calculated using Pearson's method, measures the degree of association between two related variables. Regression analysis involves predicting the value of one variable based on the known value of another variable if they are significantly correlated.
This document provides an introduction to statistical model selection. It discusses various approaches to model selection including predictive risk, Bayesian methods, information theoretic measures like AIC and MDL, and adaptive methods. The key goals of model selection are to understand the bias-variance tradeoff and select models that offer the best guaranteed predictive performance on new data. Model selection aims to find the right level of complexity to explain patterns in available data while avoiding overfitting.
This document discusses methods for comparing two population or treatment means, including notation, hypothesis tests, and confidence intervals. Key points covered include:
1) Notation for comparing two means includes the sample size, mean, variance, and standard deviation for each population or treatment.
2) Hypothesis tests for comparing two means can use a z-test if the population standard deviations are known, or a two-sample t-test if the standard deviations are unknown.
3) Confidence intervals can be constructed for the difference between two population means using a t-distribution, assuming independent random samples of sufficient size or approximately normal populations.
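A compact sketch of points 2 and 3 for the unknown-standard-deviation case (Welch two-sample t-test plus a t-based confidence interval); the samples below are simulated placeholders:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.normal(10.0, 2.0, size=40)   # placeholder sample from population/treatment 1
x2 = rng.normal(9.0, 2.5, size=35)    # placeholder sample from population/treatment 2

# Welch two-sample t-test: population standard deviations unknown, not assumed equal.
t, p = stats.ttest_ind(x1, x2, equal_var=False)

# 95% confidence interval for mu1 - mu2 using the Welch-Satterthwaite degrees of freedom.
v1, v2 = x1.var(ddof=1) / len(x1), x2.var(ddof=1) / len(x2)
df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
half = stats.t.ppf(0.975, df) * np.sqrt(v1 + v2)
diff = x1.mean() - x2.mean()
print(f"t = {t:.2f}, p = {p:.3f}, 95% CI for mu1 - mu2: ({diff - half:.2f}, {diff + half:.2f})")
```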
A Geometric Note on a Type of Multiple Testing, 07-24-2015 (Junfeng Liu)
This document summarizes new perspectives on false discovery rate (FDR) control procedures for multiple testing. It examines FDR control using linear and quadratic rejection cut-off routes applied to ordered p-values. Key findings include: 1) the FDR is controlled at π0q regardless of where the Ha p-value profile crosses the no-rejection boundary, 2) specificity approaches limits as the Ha mean increases, 3) quadratic cuts control FDR better when Ha means are close to zero. Numerical simulations explore the impact of factors like population size, variation levels, and mean profiles on discovery rates and FDR.
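For orientation only: the standard linear step-up (Benjamini-Hochberg) rule sketched below is the usual baseline for a linear rejection cut-off applied to ordered p-values; it is not the specific geometric procedure analyzed in the deck, and the p-value mix is simulated:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Linear step-up rule: reject the k smallest p-values, where k is the
    largest i such that the i-th ordered p-value is <= i * q / m."""
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = np.arange(1, m + 1) * q / m          # the linear cut-off line
    below = pvals[order] <= thresholds
    k = below.nonzero()[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

rng = np.random.default_rng(3)
# 90 null p-values (uniform) mixed with 10 alternative p-values (stochastically small).
pvals = np.concatenate([rng.uniform(size=90), rng.beta(0.5, 20.0, size=10)])
print("number of rejections:", benjamini_hochberg(pvals).sum())
```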
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. ABC can be framed as a machine learning problem where simulated datasets are used to learn which model is most appropriate. Random forests are well-suited for this as they can handle many correlated summary statistics without information loss. The random forest predicts the most likely model but not posterior probabilities. Instead, the posterior predictive expected error rate across models is proposed to evaluate model selection performance without unstable probability approximations. An example comparing MA(1) and MA(2) time series models illustrates the approach.
Delayed acceptance for Metropolis-Hastings algorithms (Christian Robert)
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
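A minimal sketch of the delayed-acceptance idea for a symmetric random-walk proposal, with the target split into a cheap factor and a costly factor; the toy factors below are stand-ins, not the mixture-prior example of the talk:

```python
import numpy as np

def delayed_acceptance_mh(x0, log_cheap, log_costly, rw_scale, n_iter, rng):
    """Random-walk Metropolis-Hastings with delayed acceptance: the target density
    factorises as pi(x) proportional to exp(log_cheap(x) + log_costly(x)), and the
    costly factor is only evaluated when the cheap screening stage accepts."""
    x = np.asarray(x0, dtype=float)
    lc, ld = log_cheap(x), log_costly(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + rw_scale * rng.normal(size=x.size)
        lc_p = log_cheap(prop)
        if np.log(rng.random()) < lc_p - lc:          # stage 1: cheap screening step
            ld_p = log_costly(prop)                   # costly factor evaluated only here
            if np.log(rng.random()) < ld_p - ld:      # stage 2: costly correction
                x, lc, ld = prop, lc_p, ld_p
        chain[t] = x
    return chain

# Toy example: the "cheap" factor stands in for a likelihood, the "costly" one for
# an expensive prior density.
rng = np.random.default_rng(4)
cheap = lambda x: -0.5 * np.sum(x ** 2)
costly = lambda x: -0.5 * np.sum((x / 3.0) ** 2)
out = delayed_acceptance_mh(np.zeros(2), cheap, costly, 0.8, 5000, rng)
print("posterior mean estimate:", out[1000:].mean(axis=0))
```

Because each stage uses a min(1, ratio)-type acceptance step, the two-stage scheme still satisfies detailed balance for the full target, while the costly factor is skipped whenever the cheap screen rejects.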
This document discusses challenges and recent advances in Approximate Bayesian Computation (ABC) methods. ABC methods are used when the likelihood function is intractable or unavailable in closed form. The core ABC algorithm involves simulating parameters from the prior and simulating data, retaining simulations where the simulated and observed data are close according to a distance measure on summary statistics. The document outlines key issues like scalability to large datasets, assessment of uncertainty, and model choice, and discusses advances such as modified proposals, nonparametric methods, and perspectives that include summary construction in the framework. Validation of ABC model choice and selection of summary statistics remains an open challenge.
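The core ABC rejection step described above can be sketched in a few lines; the toy normal-mean model, the prior, and the tolerance eps = 0.05 are illustrative choices:

```python
import numpy as np

def abc_rejection(obs_summary, prior_sampler, simulator, summary, n_sims, eps, rng):
    """Basic ABC rejection sampler: keep the parameter draws whose simulated
    summary statistics fall within distance eps of the observed summaries."""
    kept = []
    for _ in range(n_sims):
        theta = prior_sampler(rng)
        s = summary(simulator(theta, rng))
        if np.linalg.norm(s - obs_summary) <= eps:
            kept.append(theta)
    return np.array(kept)

# Toy example: infer a normal mean, using the sample mean as the summary statistic.
rng = np.random.default_rng(5)
data = rng.normal(1.5, 1.0, size=50)
draws = abc_rejection(
    obs_summary=np.array([data.mean()]),
    prior_sampler=lambda r: r.normal(0.0, 5.0),              # N(0, 25) prior on the mean
    simulator=lambda th, r: r.normal(th, 1.0, size=50),
    summary=lambda x: np.array([x.mean()]),
    n_sims=20_000, eps=0.05, rng=rng)
print(f"accepted draws: {len(draws)}, ABC posterior mean: {draws.mean():.2f}")
```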
This document discusses nested sampling, a technique for Bayesian computation and evidence evaluation. It begins by introducing Bayesian inference and the evidence integral. It then shows that nested sampling transforms the multidimensional evidence integral into a one-dimensional integral over the prior mass constrained to have likelihood above a given value. The document outlines the nested sampling algorithm and shows that it provides samples from the posterior distribution. It also discusses termination criteria and choices of sample size for the algorithm. Finally, it provides a numerical example of nested sampling applied to a Gaussian model.
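A bare-bones sketch of the nested-sampling loop on a one-dimensional toy problem, where constrained prior draws are obtained by naive rejection (adequate only in low dimension); the Gaussian likelihood and uniform prior are illustrative, chosen so that the exact evidence is known:

```python
import numpy as np

def nested_sampling(loglike, prior_sampler, n_live=200, n_iter=1200, rng=None):
    """Minimal nested-sampling loop: the prior mass shrinks by roughly exp(-1/n_live)
    per iteration, and the evidence accumulates as the sum of L_i * (X_{i-1} - X_i).
    Constrained prior draws are obtained by naive rejection (low-dimensional use only)."""
    rng = rng or np.random.default_rng()
    live = np.array([prior_sampler(rng) for _ in range(n_live)])
    live_ll = np.array([loglike(th) for th in live])
    logZ, X_prev = -np.inf, 1.0
    for i in range(1, n_iter + 1):
        worst = np.argmin(live_ll)
        X_i = np.exp(-i / n_live)
        logZ = np.logaddexp(logZ, live_ll[worst] + np.log(X_prev - X_i))
        X_prev = X_i
        while True:                                   # refresh the discarded live point
            cand = prior_sampler(rng)
            cand_ll = loglike(cand)
            if cand_ll > live_ll[worst]:
                live[worst], live_ll[worst] = cand, cand_ll
                break
    # contribution of the remaining live points
    logZ = np.logaddexp(logZ, np.log(X_prev / n_live) + np.logaddexp.reduce(live_ll))
    return logZ

# Gaussian likelihood with a Uniform(-10, 10) prior: exact evidence is about 1/20.
loglike = lambda th: -0.5 * th ** 2 - 0.5 * np.log(2 * np.pi)
logZ = nested_sampling(loglike, lambda r: r.uniform(-10.0, 10.0),
                       rng=np.random.default_rng(6))
print(f"estimated log-evidence: {logZ:.2f}   exact: {np.log(1 / 20):.2f}")
```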
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
The document discusses simulation as a tool for statistical computation. Simulation allows researchers to reproduce randomness on a computer by using deterministic pseudo-random number generators. It is useful for evaluating complex systems, determining properties of statistical procedures, validating models, approximating integrals, and maximizing functions. Simulation relies on generating uniform random variables between 0 and 1 using algorithms like congruential generators, and is widely used across many fields involving probability and statistics.
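A minimal linear congruential generator of the kind referred to above (the multiplier and increment are the common Numerical Recipes constants, used here purely for illustration):

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2 ** 32):
    """Linear congruential generator x_{k+1} = (a * x_k + c) mod m,
    returning n pseudo-random uniforms on (0, 1)."""
    x, out = seed, []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

print(lcg(seed=42, n=5))
```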
The document discusses computational methods for Bayesian statistics when direct simulation from the target distribution is not possible or efficient. It introduces Markov chain Monte Carlo (MCMC) methods, including the Metropolis-Hastings algorithm and Gibbs sampler, which generate dependent samples that approximate the target distribution. The Metropolis-Hastings algorithm uses a proposal distribution to randomly walk through the parameter space. Approximate Bayesian computation (ABC) is also introduced as a method that approximates the posterior distribution when the likelihood is intractable.
This document discusses the use of approximate Bayesian computation (ABC) to perform Bayesian inference when the likelihood function is intractable. It presents Fearnhead and Prangle's (F&P) ABC method, which uses a summary statistic S(·), an importance proposal g(·), a kernel K(·), and a bandwidth h. The document outlines the approximations and errors involved in F&P's ABC and discusses calibrating and optimizing the choice of summary statistic and bandwidth. It notes that while the optimal summary statistic is the expectation E(θ|y), this is usually unavailable, so F&P suggest estimating it using simulated data from a pilot run.
Omiros' talk on the Bernoulli factory problem (BigMC)
This document summarizes previous work on simulating events of unknown probability using reverse time martingales. It discusses von Neumann's solution to the Bernoulli factory problem where f(p)=1/2. It also summarizes the Keane-O'Brien existence result, the Nacu-Peres Bernstein polynomial approach, and issues with implementing the Nacu-Peres algorithm at large n due to the large number of strings involved. It proposes developing a reverse time martingale approach to address these issues.
This document discusses Bayesian model comparison in cosmology using population Monte Carlo methods. It provides background on key questions in cosmology that can be addressed using cosmic microwave background data from experiments like WMAP and Planck. Population Monte Carlo and adaptive importance sampling methods are introduced to help approximate Bayesian evidence for different cosmological models given the immense computational challenges of working with this cosmological data.
These are the slides prepared for the course given in the Gran Paradiso National Park on July 13-17, based on the original slides for the first three chapters of Bayesian Core. The specific datasets used to illustrate these slides are not publicly available.
This document discusses various importance sampling methods for approximating Bayes factors, which are used for Bayesian model selection. It compares regular importance sampling, bridge sampling, harmonic means, mixtures to bridge sampling, and Chib's solution. An example application to probit modeling of diabetes in Pima Indian women is presented to illustrate regular importance sampling. Markov chain Monte Carlo methods like the Metropolis-Hastings algorithm and Gibbs sampling can be used to sample from the probit models.
The document discusses the history and importance of chocolate in human civilization. It notes that chocolate originated in Mesoamerica over 3000 years ago and was prized by the Aztecs and Mayans for its taste. Cocoa beans were used as currency and their cultivation was tightly regulated. The Spanish brought cocoa to Europe in the 16th century, starting its global spread and the development of the chocolate industry.
This document discusses various importance sampling methods for approximating marginal likelihoods, including regular importance sampling, bridge sampling, and harmonic means. It compares these methods on a probit model example using data on diabetes in Pima Indian women. Regular importance sampling uses the MLE distribution as an importance function. Bridge sampling introduces a pseudo-posterior to handle models with different parameter dimensions. Harmonic means directly uses the posterior sample but requires a proposal distribution with lighter tails than the posterior.
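A hedged sketch of the "regular importance sampling with an importance function centred at the MLE" idea, on a toy conjugate normal model rather than the probit model of the deck, so that the exact marginal likelihood is available as a check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Toy conjugate model (so the exact evidence is known): x_i ~ N(theta, 1), theta ~ N(0, 1).
n = 30
x = rng.normal(0.7, 1.0, size=n)

def log_post_unnorm(theta):
    """log prior + log likelihood, evaluated for a vector of theta values."""
    return stats.norm.logpdf(theta, 0.0, 1.0) + \
           stats.norm.logpdf(x[:, None], theta, 1.0).sum(axis=0)

# Importance function centred at the MLE, slightly wider than the posterior.
mle, sd = x.mean(), 1.0 / np.sqrt(n)
proposal = stats.norm(mle, 1.5 * sd)
theta = proposal.rvs(size=50_000, random_state=7)
logw = log_post_unnorm(theta) - proposal.logpdf(theta)
logZ_hat = np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()

# Exact log evidence: x is jointly N(0, I + 11^T) under this conjugate model.
logZ_exact = stats.multivariate_normal.logpdf(x, mean=np.zeros(n),
                                              cov=np.eye(n) + np.ones((n, n)))
print(f"importance-sampling estimate {logZ_hat:.3f}  vs exact {logZ_exact:.3f}")
```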
The document discusses statistical models and exponential families. It states that for most of the course, data are assumed to be a random sample from a distribution F. Repeated observations increase the information available about F, as formalized by the law of large numbers and the central limit theorem. Exponential families are a class of parametric distributions with convenient analytic properties, in which the density can be written in exponential form as a function of natural parameters. Examples of exponential families include the binomial and normal distributions.
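In the standard notation for exponential families (the symbols below are conventional, not quoted from the document), the density takes the form

\[
f(x \mid \theta) = h(x)\, \exp\!\big( \eta(\theta)^{\top} T(x) - A(\theta) \big),
\]

so that, for instance, the Binomial(n, p) distribution fits this form with natural parameter \(\eta = \log\{p/(1-p)\}\), sufficient statistic \(T(x) = x\), log-normalizer \(n \log(1 + e^{\eta})\), and \(h(x) = \binom{n}{x}\).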
The document proposes using random forests (RF), a machine learning tool, for approximate Bayesian computation (ABC) model choice rather than estimating model posterior probabilities. RF improves on existing ABC model choice methods by having greater discriminative power among models, being robust to the choice and number of summary statistics, requiring less computation, and providing an error rate to evaluate confidence in the model choice. The authors illustrate the power of the RF-based ABC methodology on controlled experiments and real population genetics datasets.
The document discusses computational methods for Bayesian model choice and model comparison. It introduces Bayes factors and the evidence as central quantities for model comparison. It then describes various computational methods for approximating the evidence, including importance sampling solutions like bridge sampling, harmonic mean approximations using posterior samples, and approximating the evidence using mixture representations.
Regression analysis models the relationship between variables, where the dependent variable is modeled as a function of one or more independent variables. Linear regression models take forms such as straight-line, polynomial, Fourier, and interaction models. Multiple linear regression is useful for understanding variable effects, predicting values, and finding relationships between multiple independent and dependent variables. Methods like robust, stepwise, ridge, and partial least squares regression address issues like outliers, multicollinearity, and correlated predictors. Response surface and generalized linear models extend linear regression to nonlinear relationships. Multivariate regression models multiple dependent variables.
Regression analysis models the relationship between variables, including dependent and independent variables. Linear regression models take forms like straight lines, polynomials, trigonometric, and interaction terms. Multiple linear regression is useful for understanding variable effects, predicting values, and dealing with multicollinearity using methods like ridge regression, partial least squares, and stepwise regression. Nonlinear and generalized linear models also describe nonlinear relationships. Multivariate regression involves multiple response variables.
These are some slides I use in my Multivariate Statistics course to teach psychology graduate students the basics of structural equation modeling using the lavaan package in R. Topics are at an introductory level, for someone without prior experience with the topic.
1) Quantum generalized linear models (QGLMs) provide an alternative way to minimize error in statistical models by leveraging the geometry of probability distributions and quantum computing gates.
2) In tests on simulated overdispersed data and a real-world forest fire dataset, QGLMs performed comparably to or better than state-of-the-art machine learning algorithms like random forests and boosted regression.
3) QGLMs have the potential to improve statistical modeling by eliminating the need for link functions and exploiting the underlying geometry of models through quantum computations. This could help extend models to new distributions and problem types.
This document provides an overview of parametric methods in machine learning, including maximum likelihood estimation, evaluating estimators using bias and variance, the Bayes estimator, and parametric classification and regression. Key points covered include:
- Maximum likelihood estimation chooses parameters that maximize the likelihood function to produce the most probable distribution given observed data.
- Bias and variance are used to evaluate estimators, with the goal of minimizing both to improve accuracy. High bias or variance can indicate underfitting or overfitting.
- The Bayes estimator treats unknown parameters as random variables and uses prior distributions and Bayes' rule to estimate their expected values given data.
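A quick Monte Carlo sketch of the bias point above: the maximum-likelihood estimator of a normal variance (dividing by n) is biased downward, while the ddof = 1 version is unbiased; the sample size and variance below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(8)
n, sigma2, reps = 10, 4.0, 20_000

# Monte Carlo check of the bias of the maximum-likelihood variance estimator.
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
var_mle = samples.var(axis=1, ddof=0)     # ML estimator: divides by n
var_unb = samples.var(axis=1, ddof=1)     # unbiased estimator: divides by n - 1

print("E[ML estimator]       ~", var_mle.mean(), " (theory:", (n - 1) / n * sigma2, ")")
print("E[unbiased estimator] ~", var_unb.mean(), " (theory:", sigma2, ")")
```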
This document discusses linear regression models, which are used to model relationships between variables. Linear regression models hypothesize that the relationship between a dependent variable and one or more independent variables is linear. The model is fitted to data using the least squares method, estimating parameters that minimize the sum of squared errors between predicted and actual values. The slope and intercept of the fitted linear model can be interpreted, and the model's explanatory and predictive power are evaluated using metrics like the coefficient of determination (R²) and the mean absolute error.
A Novel Bayes Factor for Inverse Model Selection Problem based on Inverse Ref... (inventionjournals)
The statistical model selection problem can be divided into two broad categories, based on forward and inverse problems. Compared to the wealth of literature available for forward model selection, there are very few methods applicable in the inverse model selection context. In this article we propose a novel Bayes factor for model selection in the Bayesian inverse problem context. The proposed Bayes factor is specially designed for inverse problems with the help of the Inverse Reference Distribution (IRD). We discuss our proposal from a decision-theoretic perspective.
Multivariate and Conditional Distributions (susered887b)
The document discusses key concepts in multivariate analysis including:
1) The multivariate normal distribution plays a fundamental role as both a population model and approximate sampling distribution for many statistics.
2) Multivariate distributions are determined by their mean vectors and covariance matrices.
3) Multivariate analysis involves measuring and analyzing dependence between variables and sets of variables.
4) Many real-world problems fall within the framework of multivariate normal theory.
This document discusses several perspectives and solutions to Bayesian hypothesis testing. It outlines issues with Bayesian testing such as the dependence on prior distributions and difficulties interpreting Bayesian measures like posterior probabilities and Bayes factors. It discusses how Bayesian testing compares models rather than identifying a single true model. Several solutions to challenges are discussed, like using Bayes factors which eliminate the dependence on prior model probabilities but introduce other issues. The document also discusses testing under specific models like comparing a point null hypothesis to alternatives. Overall it presents both Bayesian and frequentist views on hypothesis testing and some of the open controversies in the field.
The document discusses probabilistic and statistical models for outlier detection, specifically focusing on methods for extreme-value analysis of univariate and multivariate data distributions. It introduces some of the earliest and most fundamental statistical methods for outlier detection, including probabilistic tail inequalities like the Markov inequality and Chebychev inequality, which can be used to determine the probability that an extreme value should be considered anomalous. It also discusses how extreme-value analysis methods can be extended from univariate to multivariate data and how mixture models provide a probabilistic approach to identifying both outliers and extreme values.
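For reference, the two tail inequalities mentioned above can be written as

\[
\Pr(X \ge a) \le \frac{\mathbb{E}[X]}{a} \quad (X \ge 0,\; a > 0),
\qquad
\Pr\big(|X - \mu| \ge k\sigma\big) \le \frac{1}{k^{2}} \quad (k > 0),
\]

the first being Markov's inequality and the second Chebyshev's.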
A statistical criterion for reducing indeterminacy in linear causal modeling (Gianluca Bontempi)
This document proposes a new statistical criterion called C to help distinguish between causal patterns in completely connected triplets when inferring causal relationships from observational data. The criterion is based on differences in values of the term S, which is derived from the covariance matrix, between different causal hypotheses. This criterion informs an algorithm called RC that incorporates both relevance and causal measures to iteratively select variables. Experiments on linear and nonlinear networks show RC has higher accuracy than other algorithms at inferring network structure. The criterion C and RC algorithm help address challenges of causal inference from complex data where dependencies are frequent.
Bayesian Variable Selection in Linear Regression and A Comparison (Atilla YARDIMCI)
In this study, Bayesian approaches, such as Zellner, Occam’s Window and Gibbs sampling, have been compared in terms of selecting the correct subset for the variable selection in a linear regression model. The aim of this comparison is to analyze Bayesian variable selection and the behavior of classical criteria by taking into consideration the different values of β and σ and prior expected levels.
This document provides a tutorial on Bayesian model averaging (BMA). BMA accounts for model uncertainty by averaging over multiple models, weighted by their posterior probabilities. Standard statistical practice selects a single best model, ignoring model uncertainty. BMA offers improved predictive performance over any single model. However, implementing BMA presents challenges including an enormous number of terms to average over and difficult integrals. The document discusses methods for managing the summation and computing the necessary integrals, including Occam's window, Markov chain Monte Carlo model composition, and stochastic search variable selection. Examples are also provided to demonstrate the application and benefits of BMA.
BioSHaRE: Analysis of mixed effects models using federated data analysis appr... (Lisette Giepmans)
BioSHaRE conference July 28th, 2015, Milan - Latest tools and services for data sharing
Stream 3: Study application and results
Contact info:
Prof. Edwin van den Heuvel
University of Eindhoven
e.r.v.d.heuvel@tue.nl
key words: biobank, bioshare, cohort, data sharing, epidemiology, harmonisation, statistics
A Regularization Approach to the Reconciliation of Constrained Data Sets (Alkis Vazacopoulos)
This document proposes a new regularization approach to reconcile constrained data sets. The approach assumes unmeasured variables have a finite but equal uncertainty to derive an iterative solution that does not require explicitly computing a projection matrix at each step. This avoids issues when the projection matrix is non-invertible. The method arrives at a minimized solution by reformulating the problem to include an added regularization term for the unmeasured variables. It also provides an alternative way to classify variables without using the projection matrix.
This document proposes the inflated discrete beta regression (IDBR) model to analyze Likert scale and rating scale data from questionnaires. The IDBR addresses four key characteristics of such data: 1) the responses are discrete and bounded, 2) the distributions are often skewed, 3) one response level is frequently inflated, and 4) existing models do not account for all these factors simultaneously. The IDBR models the mean and dispersion of an underlying beta distribution jointly as a function of covariates, and also models the probability of selecting the inflated response level as a separate function of covariates. Simulation studies show the IDBR produces more precise predictions than competing models. The IDBR is illustrated using political orientation data from a European
Using Data Sets to Simulate Evolution within Complex Environments (Poster) (Bruce Edmonds)
An agent-based model of evolution and adaption is presented with complex genomes, sexual and asexual reproduction. However, unlike other models, this has a purposefully complex environment. Individuals are spread over a 2D space upon which is projected a complex data set derived from observations of the real world. Energy extraction for the purposes of life and reproduction is achieved by the match between the genome and the data in its locality. In this model different genomes develop to exploit different parts of the space, with some genomes acquiring a greater generality than others. This model can be used to explore the trade-offs inherent between diversity, specialization and inter-breeding among individuals.
This document provides an overview of statistical concepts for analyzing experimental data, including z-tests, t-tests, and ANOVAs. It discusses developing experimental hypotheses and distinguishing between null and alternative hypotheses. Key concepts explained include p-values, type I and type II errors, and determining statistical significance. Examples are given of applying a t-test and ANOVA to compare brain volume changes before and after childbirth. Limitations of statistical analyses with respect to including entire populations are also noted.
This thesis aims to formulate a simple measurement to evaluate and compare the predictive distributions of out-of-sample forecasts between autoregressive (AR) and vector autoregressive (VAR) models. The author conducts simulation studies to estimate AR and VAR models using Bayesian inference. A measurement is developed that uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution at different horizons. The measurement is found to accurately evaluate single forecasts and calibrate forecast models.
Similar to testing as a mixture estimation problem
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The document discusses various methods for approximating marginal likelihoods and comparing hypotheses using Bayes factors. It outlines the historical development and connections between different approximation techniques.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may incur a dimension curse.
Testing as a mixture estimation problem
1. Testing as a mixture estimation problem
Christian P. Robert
Université Paris-Dauphine and University of Warwick
Joint work with K. Kamary, K. Mengersen, and J. Rousseau
3. Introduction
Hypothesis testing
central problem of statistical inference
dramatically differentiating feature between classical and
Bayesian paradigms
wide open to controversy and divergent opinions, even within
the Bayesian community
non-informative Bayesian testing case mostly unresolved,
witness the Jeffreys–Lindley paradox
[Berger (2003), Mayo & Cox (2006), Gelman (2008)]
4. Testing hypotheses
Bayesian model selection as comparison of k potential
statistical models towards the selection of model that fits the
data “best”
mostly accepted perspective: it does not primarily seek to
identify which model is “true”, but rather to indicate which
fits data better
tools like Bayes factor naturally include a penalisation
addressing model complexity, mimicked by Bayes Information
(BIC) and Deviance Information (DIC) criteria
posterior predictive tools successfully advocated in Gelman et
al. (2013) even though they imply multiple uses of data
5. Bayesian modelling
Standard Bayesian approach to testing: consider two families of
models, one for each of the hypotheses under comparison,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
and associate with each model a prior distribution,
θ1 ∼ π1(θ1) and θ2 ∼ π2(θ2) ,
6. Bayesian modelling
Standard Bayesian approach to testing: consider two families of
models, one for each of the hypotheses under comparison,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
in order to compare the marginal likelihoods
m1(x) = ∫_{Θ1} f1(x|θ1) π1(θ1) dθ1   and   m2(x) = ∫_{Θ2} f2(x|θ2) π2(θ2) dθ2
7. Bayesian modelling
Standard Bayesian approach to testing: consider two families of
models, one for each of the hypotheses under comparison,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
either through Bayes factor or posterior probability, respectively:
B12 = m1(x) / m2(x) ,   P(M1|x) = ω1 m1(x) / {ω1 m1(x) + ω2 m2(x)} ;
the latter depends on the prior weights ωi
8. Bayesian decision
Bayesian decision step
comparing Bayes factor B12 with threshold value of one or
comparing posterior probability P(M1|x) with bound
When comparing more than two models, model with highest
posterior probability is the one selected, but highly dependent on
the prior modelling.
10. Some difficulties
tension between using posterior probabilities justified by a
binary loss function but depending on unnatural prior weights,
and using Bayes factors that eliminate this dependence but
escape direct connection with posterior distribution, unless
prior weights are integrated within the loss
subsequent and delicate interpretation (or calibration) of the
strength of the Bayes factor towards supporting a given
hypothesis or model, because it is not a Bayesian decision rule
similar difficulty with posterior probabilities, with tendency to
interpret them as p-values (rather than the opposite) when
they only report through a marginal likelihood ratio the
respective strengths of fitting the data to both models
11. Some more difficulties
long-lasting impact of the prior modeling, meaning the choice
of the prior distributions on the parameter spaces of both
models under comparison, despite overall consistency proof for
Bayes factor
discontinuity in use of improper priors since they are not
justified in most testing situations, leading to many alternative
if ad hoc solutions, where data is either used twice or split in
artificial ways
binary (accept vs. reject) outcome more suited for immediate
decision (if any) than for model evaluation, in connection with
rudimentary loss function
12. Some additional difficulties
related impossibility to ascertain simultaneous misfit or to
detect presence of outliers;
lack of assessment of uncertainty associated with decision
itself
difficult computation of marginal likelihoods in most settings
with further controversies about which computational solution
to adopt
strong dependence of posterior probabilities on conditioning
statistics, which in turn undermines their validity for model
assessment, as exhibited in ABC
temptation to create pseudo-frequentist equivalents such as
q-values with even less Bayesian justifications
13. Paradigm shift
New proposal of a paradigm shift in Bayesian processing of
hypothesis testing and of model selection
convergent and naturally interpretable solution
more extended use of improper priors
Simple representation of the testing problem as a two-component
mixture estimation problem where the weights are formally equal
to 0 or 1
15. Paradigm shift
Simple representation of the testing problem as a two-component
mixture estimation problem where the weights are formally equal
to 0 or 1
Approach inspired from consistency result of Rousseau and
Mengersen (2011) on estimated overfitting mixtures
Mixture representation not directly equivalent to the use of a
posterior probability
Potential of a better approach to testing, while not expanding
number of parameters
Calibration of the posterior distribution of the weight of a
model, while moving from the artificial notion of the posterior
probability of a model
16. Encompassing mixture model
Idea: Given two statistical models,
M1 : x ∼ f1(x|θ1) , θ1 ∈ Θ1 and M2 : x ∼ f2(x|θ2) , θ2 ∈ Θ2 ,
embed both within an encompassing mixture
Mα : x ∼ αf1(x|θ1) + (1 − α)f2(x|θ2) , 0 ≤ α ≤ 1 (1)
Note: Both models correspond to special cases of (1), one for
α = 1 and one for α = 0
Draw inference on mixture representation (1), as if each
observation was individually and independently produced by the
mixture model
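As a concrete illustration of (1), here is a minimal Python sketch; the helper name mixture_density and the example components are illustrative placeholders, not code from the talk:

import numpy as np
from scipy import stats

def mixture_density(x, alpha, f1, f2):
    # density of the encompassing mixture M_alpha in (1):
    # alpha * f1(x|theta1) + (1 - alpha) * f2(x|theta2)
    return alpha * f1(x) + (1 - alpha) * f2(x)

# e.g. embedding M1 = N(1, 1) and M2 = N(0, 1) with weight alpha = 0.7
values = mixture_density(np.linspace(-3, 3, 7), 0.7,
                         lambda x: stats.norm.pdf(x, loc=1.0, scale=1.0),
                         lambda x: stats.norm.pdf(x, loc=0.0, scale=1.0))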
19. Inferential motivations
Sounds like an approximation to the real model, but several
definitive advantages to this paradigm shift:
Bayes estimate of the weight α replaces posterior probability
of model M1, equally convergent indicator of which model is
“true”, while avoiding artificial prior probabilities on model
indices, ω1 and ω2
interpretation of estimator of α at least as natural as handling
the posterior probability, while avoiding zero-one loss setting
α and its posterior distribution provide measure of proximity
to the models, while being interpretable as data propensity to
stand within one model
further allows for alternative perspectives on testing and
model choice, like predictive tools, cross-validation, and
information indices like WAIC
20. Computational motivations
avoids highly problematic computations of the marginal
likelihoods, since standard algorithms are available for
Bayesian mixture estimation
straightforward extension to a finite collection of models, with
a larger number of components, which considers all models at
once and eliminates least likely models by simulation
eliminates difficulty of “label switching” that plagues both
Bayesian estimation and Bayesian computation, since
components are no longer exchangeable
posterior distribution of α evaluates more thoroughly strength
of support for a given model than the single figure outcome of
a posterior probability
variability of posterior distribution on α allows for a more
thorough assessment of the strength of this support
21. Noninformative motivations
additional feature missing from traditional Bayesian answers:
a mixture model acknowledges possibility that, for a finite
dataset, both models or none could be acceptable
standard (proper and informative) prior modeling can be
reproduced in this setting, but non-informative (improper)
priors also are manageable therein, provided both models are first
reparameterised in terms of shared parameters, e.g. location and
scale parameters
in the special case when all parameters are common
Mα : x ∼ αf1(x|θ) + (1 − α)f2(x|θ) , 0 ≤ α ≤ 1
if θ is a location parameter, a flat prior π(θ) ∝ 1 is available
22. Weakly informative motivations
using the same parameters or some identical parameters on
both components highlights that opposition between the two
components is not an issue of enjoying different parameters
those common parameters are nuisance parameters, to be
integrated out
prior model weights ωi rarely discussed in classical Bayesian
approach, even though linear impact on posterior probabilities.
Here, prior modeling only involves selecting a prior on α, e.g.,
α ∼ B(a0, a0)
while a0 impacts posterior on α, it always leads to mass
accumulation near 1 or 0, i.e. favours most likely model over
the other
sensitivity analysis straightforward to carry out
approach easily calibrated by parametric bootstrap providing
reference posterior of α under each model (see the sketch below)
natural Metropolis–Hastings alternative
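A hedged sketch of that parametric-bootstrap calibration; the callables sampler, simulate_m1 and simulate_m2 are user-supplied placeholders (a posterior sampler for α and simulators of each fitted model), not part of the talk:

import numpy as np

def bootstrap_calibration(sampler, simulate_m1, simulate_m2, n, n_rep=100, rng=None):
    # reference posteriors of alpha (summarised here by their medians) when
    # datasets of size n are generated from each of the two fitted models
    rng = np.random.default_rng(rng)
    medians = {"M1": [], "M2": []}
    for _ in range(n_rep):
        medians["M1"].append(np.median(sampler(simulate_m1(n, rng))))
        medians["M2"].append(np.median(sampler(simulate_m2(n, rng))))
    return {k: np.array(v) for k, v in medians.items()}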
23. Computational issues
Specificities of mixture estimation in such settings.
While likelihood is a regular mixture likelihood, weight α close
to one of the boundaries
Gibbs completion bound to be inefficient when sample size
grows, more than usual!
24. Gibbs sampler
Consider sample x = (x1, x2, . . . , xn) from (1).
Completion by latent component indicators ζi leads to completed
likelihood
ℓ(θ, α | x, ζ) = ∏_{i=1}^{n} α_{ζi} f(xi | θ_{ζi}) = α^{n1} (1 − α)^{n2} ∏_{i=1}^{n} f(xi | θ_{ζi}) ,
where α1 = α, α2 = 1 − α and
(n1, n2) = ( ∑_{i=1}^{n} I_{ζi=1} , ∑_{i=1}^{n} I_{ζi=2} )
under the constraint n = ∑_{j=1}^{2} ∑_{i=1}^{n} I_{ζi=j} .
[Diebolt & Robert, 1990]
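A minimal Gibbs sketch for the weight of the mixture αN(µ, 1) + (1 − α)N(0, 1) used in the next slides, assuming a Be(a0, a0) prior on α and, to keep the sketch proper, a N(0, 10²) prior on µ; function and variable names are illustrative:

import numpy as np

def gibbs_mixture_weight(x, n_iter=10_000, a0=0.5, mu_sd=10.0, rng=None):
    # Gibbs sampler for alpha * N(mu, 1) + (1 - alpha) * N(0, 1),
    # completing each observation with its component indicator zeta_i
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, alpha, mu = len(x), 0.5, 0.0
    alphas = np.empty(n_iter)
    for t in range(n_iter):
        # 1. allocate observations given (alpha, mu)
        w1 = alpha * np.exp(-0.5 * (x - mu) ** 2)
        w2 = (1.0 - alpha) * np.exp(-0.5 * x ** 2)
        zeta1 = rng.random(n) < w1 / (w1 + w2)            # True -> component 1
        n1 = int(zeta1.sum())
        # 2. update mu from its conditional under the N(0, mu_sd^2) prior
        prec = n1 + 1.0 / mu_sd ** 2
        mu = rng.normal(x[zeta1].sum() / prec, 1.0 / np.sqrt(prec))
        # 3. update the weight from its Beta full conditional
        alpha = rng.beta(n1 + a0, n - n1 + a0)
        alphas[t] = alpha
    return alphas

# e.g. posterior draws of alpha for a N(0, 1) sample of size 100
draws = gibbs_mixture_weight(np.random.default_rng(1).normal(size=100))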
25. Local concentration
Gibbs sampling scheme valid from a theoretical point of view but
faces convergence difficulties, due to prior concentration on
boundaries of (0, 1) for the mixture weight α
As sample size n grows, sample of the α’s shows less and less
switches between the vicinity of zero and the vicinity of one
26. Local concentration
Gibbs sequences (αt) on the first component weight for the mixture model αN(µ, 1) + (1 − α)N(0, 1), for a N(0, 1) sample of size N = 5, 10, 50, 100, 500, 10³ (from top to bottom), based on 10⁵ simulations. The y-range for all series is (0, 1).
27. MCMC alternative
Simple Metropolis-Hastings algorithm where
the model parameters θi generated from respective full
posteriors of both models (i.e., based on entire sample) and
mixture weight α generated from a random walk proposal on
(0, 1)
Rare occurrence for mixtures to allow for independent proposals
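A matching Metropolis–Hastings sketch for the same normal mixture, with µ proposed independently from the model-1 full posterior N(x̄, 1/n) (flat prior on µ, an assumption of this sketch) and α from a reflected Gaussian random walk on (0, 1):

import numpy as np
from scipy import stats

def mh_mixture_weight(x, n_iter=10_000, a0=0.5, step=0.1, rng=None):
    # Metropolis-Hastings for alpha * N(mu, 1) + (1 - alpha) * N(0, 1)
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n, xbar, sd = len(x), x.mean(), 1.0 / np.sqrt(len(x))

    def log_post(alpha, mu):
        mix = alpha * stats.norm.pdf(x, mu, 1) + (1 - alpha) * stats.norm.pdf(x, 0, 1)
        return np.log(mix).sum() + stats.beta.logpdf(alpha, a0, a0)   # flat prior on mu

    alpha, mu = 0.5, xbar
    chain = np.empty(n_iter)
    for t in range(n_iter):
        mu_prop = rng.normal(xbar, sd)                    # independent proposal for mu
        a_prop = alpha + step * rng.normal()              # random walk on alpha ...
        a_prop = -a_prop if a_prop < 0 else a_prop        # ... reflected at 0
        a_prop = 2.0 - a_prop if a_prop > 1 else a_prop   # ... and at 1
        log_ratio = (log_post(a_prop, mu_prop) - log_post(alpha, mu)
                     + stats.norm.logpdf(mu, xbar, sd)    # independent-proposal correction
                     - stats.norm.logpdf(mu_prop, xbar, sd))
        if np.log(rng.random()) < log_ratio:
            alpha, mu = a_prop, mu_prop
        chain[t] = alpha
    return chain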
28. MCMC alternative
Metropolis–Hastings sequences (αt) on the first component weight for the mixture model αN(µ, 1) + (1 − α)N(0, 1), for a N(0, 1) sample of size N = 5, 10, 50, 100, 500, 10³ (from top to bottom), based on 10⁵ simulations. The y-range for all series is (0, 1).
30. Poisson/Geometric
choice between Poisson P(λ) and Geometric Geo(p)
distributions
mixture with common parameter λ
Mα : αP(λ) + (1 − α)Geo(1/(1+λ))
Allows for Jeffreys prior since resulting posterior is proper
independent Metropolis–within–Gibbs with proposal
distribution on λ equal to Poisson posterior (with acceptance
rate larger than 75%)
33. Beta prior
Under an α ∼ Be(a0, a0) prior, the full conditional posterior is
α | ζ ∼ Be(n1(ζ) + a0, n2(ζ) + a0)
Exact Bayes factor opposing Poisson and Geometric
B12 = Γ( n + 2 + ∑_{i=1}^{n} xi ) / { n^{n x̄_n} ∏_{i=1}^{n} xi! Γ(n + 2) }
although undefined from a purely mathematical viewpoint
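A sketch of the independent Metropolis-within-Gibbs step described above, assuming a Jeffreys-type λ^(−1/2) prior so that the Poisson-only posterior proposal is Gamma(∑ xi + 1/2, n); this prior choice and all names are assumptions of the sketch, not the authors' code:

import numpy as np
from scipy import stats

def poisson_geometric_mixture(x, n_iter=10_000, a0=0.5, rng=None):
    # sampler for alpha * P(lambda) + (1 - alpha) * Geo(1/(1+lambda)):
    # allocations and alpha are Gibbs steps, lambda uses an independent
    # proposal equal to the Poisson-only posterior Gamma(sum x + 1/2, n)
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=int)
    n, s = len(x), x.sum()

    def log_target(lam, alpha):
        # mixture likelihood times the assumed lambda^(-1/2) prior
        f1 = stats.poisson.pmf(x, lam)
        f2 = stats.geom.pmf(x + 1, 1.0 / (1.0 + lam))     # scipy's geometric starts at 1
        return np.log(alpha * f1 + (1 - alpha) * f2).sum() - 0.5 * np.log(lam)

    lam, alpha = max(x.mean(), 0.1), 0.5
    out = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Gibbs: allocations, then the Beta full conditional on alpha
        w1 = alpha * stats.poisson.pmf(x, lam)
        w2 = (1 - alpha) * stats.geom.pmf(x + 1, 1.0 / (1.0 + lam))
        n1 = int((rng.random(n) < w1 / (w1 + w2)).sum())
        alpha = rng.beta(n1 + a0, n - n1 + a0)
        # independent Metropolis step on lambda
        prop = rng.gamma(s + 0.5, 1.0 / n)
        log_ratio = (log_target(prop, alpha) - log_target(lam, alpha)
                     + stats.gamma.logpdf(lam, s + 0.5, scale=1.0 / n)
                     - stats.gamma.logpdf(prop, s + 0.5, scale=1.0 / n))
        if np.log(rng.random()) < log_ratio:
            lam = prop
        out[t] = lam, alpha
    return out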
34. Parameter estimation
Posterior means of λ and medians of α for 100 Poisson P(4) datasets of size n = 1000, for a0 = .0001, .001, .01, .1, .2, .3, .4, .5 (this figure: λ, y-range 3.90–4.10). Each posterior approximation is based on 10⁴ Metropolis–Hastings iterations.
35. Parameter estimation
Posterior means of λ and medians of α for 100 Poisson P(4) datasets of size n = 1000, for a0 = .0001, .001, .01, .1, .2, .3, .4, .5 (this figure: α, y-range 0.990–1.000). Each posterior approximation is based on 10⁴ Metropolis–Hastings iterations.
36. MCMC convergence
Dataset from a Poisson distribution P(4): estimations of (top) λ and (bottom) α via MH for 5 samples of size n = 5, 50, 100, 500, 10,000.
37. Consistency
Posterior means (sky-blue) and medians (grey-dotted) of α over 100 Poisson P(4) datasets, for sample sizes from 1 to 1000, plotted against log(sample size), with one panel each for a0 = .1, .3, .5.
38. Behaviour of Bayes factor
Comparison between P(M1|x) (red dotted area) and posterior medians of α (grey zone) for 100 Poisson P(4) datasets with sample sizes n between 1 and 1000, plotted against log(sample size), for a0 = .001, .1, .5.
39. Normal-normal comparison
comparison of a normal N(θ1, 1) with a normal N(θ2, 2)
distribution
mixture with identical location parameter θ
αN(θ, 1) + (1 − α)N(θ, 2)
Jeffreys prior π(θ) = 1 can be used, since posterior is proper
Reference (improper) Bayes factor
B12 = 2^{(n−1)/2} exp{ −∑_{i=1}^{n} (xi − x̄)² / 4 }
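For reference, a one-line evaluation of this Bayes factor on the log scale, taking the reconstructed expression above (flat prior on θ) as given; the function name is illustrative:

import numpy as np

def log_reference_bayes_factor(x):
    # log B12 for N(theta, 1) vs N(theta, 2) under a flat prior on theta:
    # log B12 = ((n - 1)/2) log 2 - sum_i (x_i - xbar)^2 / 4
    x = np.asarray(x, dtype=float)
    return 0.5 * (len(x) - 1) * np.log(2.0) - ((x - x.mean()) ** 2).sum() / 4.0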
42. Consistency
Posterior means (wheat) and medians of α (dark wheat), compared with posterior probabilities of M1 (gray; x-axis: p(M1|x)) for a N(0, 1) sample, derived from 100 datasets for sample sizes equal to 15, 50, 100, 500. Each posterior approximation is based on 10⁴ MCMC iterations.
43. Comparison with posterior probability
Plots of ranges of log(n) log(1 − E[α|x]) (gray) and log(1 − p(M1|x)) (red dotted) over 100 N(0, 1) samples as sample size n grows from 1 to 500, where α is the weight of N(0, 1) in the mixture model. The shaded areas indicate the range of the estimations; each plot is based on a Beta prior with a0 = .1, .2, .3, .4, .5, 1 and each posterior approximation is based on 10⁴ iterations.
44. Comments
convergence to one boundary value as sample size n grows
impact of the hyperparameter a0 slowly vanishes as n increases, but
remains present for moderate sample sizes
when simulated sample is neither from N(θ1, 1) nor from
N(θ2, 2), behaviour of posterior varies, depending on which
distribution is closest
46. Another normal-normal comparison
compare N(0, 1) against N(µ, 1) model, hence testing
whether or not µ = 0
embedded case: cannot use improper priors on µ, hence
substitute a µ ∼ N(0, 1) prior
inference on α contrasted between data from the N(0, 1) case and
data from outside the null distribution
47. Two dynamics
Posterior distributions of α under a Beta B(a0, a0) prior, (left) for a0 = 1/n, .1, .2, .3, .4, .5, 1 with 10² N(1, 1) observations and (right) for a0 = .1 with n = 5, 10, 50, 100, 500, 10³ N(1, 1) observations.
48. Logistic or Probit?
binary dataset, R dataset about diabetes in 200 Pima Indian
women with body mass index as explanatory variable
comparison of logit and probit fits could be suitable. We are
thus comparing both fits via our method
M1 : yi | xi, θ1 ∼ B(1, pi) where pi = exp(xi θ1) / {1 + exp(xi θ1)}
M2 : yi | xi, θ2 ∼ B(1, qi) where qi = Φ(xi θ2)
49. Common parameterisation
Local reparameterisation strategy that rescales parameters of the
probit model M2 so that the MLE’s of both models coincide.
[Choudhuty et al., 2007]
Φ(xi θ2) ≈ exp(k xi θ2) / {1 + exp(k xi θ2)}
and use best estimate of k to bring both parameters into coherency:
(k0, k1) = (θ01/θ02, θ11/θ12) ,
reparameterise M1 and M2 as
M1 : yi | xi, θ ∼ B(1, pi) where pi = exp(xi θ) / {1 + exp(xi θ)}
M2 : yi | xi, θ ∼ B(1, qi) where qi = Φ(xi (κ⁻¹θ)) ,
with κ⁻¹θ = (θ0/k0, θ1/k1).
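A sketch of this rescaling under the stated idea, with the maximum-likelihood fits of both models obtained by direct optimisation; the function name and optimiser choice are assumptions of the sketch:

import numpy as np
from scipy import optimize, stats

def rescaling_constants(y, X):
    # MLEs theta_1 (logit) and theta_2 (probit) and the component-wise
    # ratios k_j = theta_{1j} / theta_{2j} used to share a common theta
    def nll_logit(b):
        eta = X @ b
        return -(y * eta - np.logaddexp(0.0, eta)).sum()

    def nll_probit(b):
        q = stats.norm.cdf(X @ b).clip(1e-12, 1 - 1e-12)
        return -(y * np.log(q) + (1 - y) * np.log(1 - q)).sum()

    theta1 = optimize.minimize(nll_logit, np.zeros(X.shape[1]), method="BFGS").x
    theta2 = optimize.minimize(nll_probit, np.zeros(X.shape[1]), method="BFGS").x
    return theta1, theta2, theta1 / theta2      # last entry is k = (k_0, k_1, ...)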
50. Prior modelling
Under default g-prior
θ ∼ N2(0, n(XᵀX)⁻¹)
the full conditional posterior distribution of θ given the allocations is
π(θ | y, X, ζ) ∝ [ exp{ ∑_i I_{ζi=1} yi xi θ } / ∏_{i; ζi=1} {1 + exp(xi θ)} ] exp{ −θᵀ(XᵀX)θ / 2n }
× ∏_{i; ζi=2} Φ(xi (κ⁻¹θ))^{yi} {1 − Φ(xi (κ⁻¹θ))}^{1−yi}
hence the posterior distribution is clearly defined
51. Results
              Logistic              Probit
a0     α      θ0       θ1           θ0/k0     θ1/k1
.1     .352   -4.06    .103         -2.51     .064
.2     .427   -4.03    .103         -2.49     .064
.3     .440   -4.02    .102         -2.49     .063
.4     .456   -4.01    .102         -2.48     .063
.5     .449   -4.05    .103         -2.51     .064
Histograms of posteriors of α in favour of logistic model where a0 = .1, .2, .3,
.4, .5 for (a) Pima dataset, (b) Data from logistic model, (c) Data from probit
model
52. Asymptotic consistency
Posterior consistency for mixture testing procedure under minor
conditions
Two different cases
the two models, M1 and M2, are well separated
model M1 is a submodel of M2.
54. Posterior concentration rate
Let π be the prior and xⁿ = (x1, · · · , xn) a sample with true density f*
Proposition
Assume that, for all c > 0, there exist Θn ⊂ Θ1 × Θ2 and B > 0 such that
π[Θnᶜ] ≤ n⁻ᶜ ,   Θn ⊂ { ‖θ1‖ + ‖θ2‖ ≤ n^B }
and that there exist H ≥ 0 and L, δ > 0 such that, for j = 1, 2,
sup_{θ,θ′∈Θn} ‖f_{j,θj} − f_{j,θ′j}‖₁ ≤ L n^H ‖θj − θ′j‖ ,   θ = (θ1, θ2), θ′ = (θ′1, θ′2) ,
∀ ‖θj − θ*j‖ ≤ δ :   KL(f_{j,θj}, f_{j,θ*j}) ≤ ‖θj − θ*j‖ .
Then, when f* = f_{θ*,α*} with α* ∈ [0, 1], there exists M > 0 such that
π[ (α, θ) ; ‖f_{θ,α} − f*‖₁ > M √(log n / n) | xⁿ ] = op(1) .
55. Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :   P_{θ,α} = P_{θ′,α′}  ⇒  α = α′, θ = θ′
Further
inf_{θ1∈Θ1} inf_{θ2∈Θ2} ‖f_{1,θ1} − f_{2,θ2}‖₁ > 0
and, for θ*j ∈ Θj, if P_{θj} weakly converges to P_{θ*j}, then θj converges in the Euclidean topology to θ*j
56. Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :   P_{θ,α} = P_{θ′,α′}  ⇒  α = α′, θ = θ′
Theorem
Under the above assumptions, for all ε > 0,
π[ |α − α*| > ε | xⁿ ] = op(1)
If, moreover, θj → f_{j,θj} is C² in a neighbourhood of θ*j, j = 1, 2, if
f_{1,θ*1} − f_{2,θ*2} ,  ∇f_{1,θ*1} ,  ∇f_{2,θ*2}
are linearly independent in y, and if there exists δ > 0 such that
∇f_{1,θ*1} ,  ∇f_{2,θ*2} ,  sup_{|θ1−θ*1|<δ} |D² f_{1,θ1}| ,  sup_{|θ2−θ*2|<δ} |D² f_{2,θ2}|  ∈ L¹
then
π[ |α − α*| > M √(log n / n) | xⁿ ] = op(1) .
57. Separated models
Assumption: Models are separated, i.e. there is identifiability:
∀ α, α′ ∈ [0, 1], ∀ θj, θ′j, j = 1, 2 :   P_{θ,α} = P_{θ′,α′}  ⇒  α = α′, θ = θ′
Theorem allows for interpretation of α under the posterior: If data
xn is generated from model M1 then posterior on α concentrates
around α = 1
58. Embedded case
Here M1 is a submodel of M2, i.e.
θ2 = (θ1, ψ) and θ2 = (θ1, ψ0 = 0)
corresponds to f2,θ2 ∈ M1
Same posterior concentration rate
√(log n / n)
for estimating α when α* ∈ (0, 1) and ψ* ≠ 0.
59. Null case
Case where ψ∗ = 0, i.e., f ∗ is in model M1
Two possible paths to approximate f ∗: either α goes to 1 (path 1)
or ψ goes to 0 (path 2)
New identifiability condition: P_{θ,α} = P* only if
α = 1, θ1 = θ*1, θ2 = (θ*1, ψ)   or   α ≤ 1, θ1 = θ*1, θ2 = (θ*1, 0)
Prior
π(α, θ) = πα(α) π1(θ1) πψ(ψ) ,   θ2 = (θ1, ψ)
with common (prior on) θ1
61. Assumptions
[B1] Regularity: Assume that θ1 → f_{1,θ1} and θ2 → f_{2,θ2} are 3 times continuously differentiable and that (with F* denoting expectation under f*)
F*[ f̄³_{1,θ*1} / f̲³_{1,θ*1} ] < +∞ ,   where  f̄_{1,θ*1} = sup_{|θ1−θ*1|<δ} f_{1,θ1} ,  f̲_{1,θ*1} = inf_{|θ1−θ*1|<δ} f_{1,θ1} ,
F*[ sup_{|θ1−θ*1|<δ} |∇f_{1,θ1}|³ / f̲³_{1,θ*1} ] < +∞ ,   F*[ |∇f_{1,θ*1}|⁴ / f̲⁴_{1,θ*1} ] < +∞ ,
F*[ sup_{|θ1−θ*1|<δ} |D² f_{1,θ1}|² / f̲²_{1,θ*1} ] < +∞ ,   F*[ sup_{|θ1−θ*1|<δ} |D³ f_{1,θ1}| / f̲_{1,θ*1} ] < +∞
62. Assumptions
[B2] Integrability: There exists S0 ⊂ S ∩ {|ψ| > δ0}, for some positive δ0 and satisfying Leb(S0) > 0, such that for all ψ ∈ S0,
F*[ sup_{|θ1−θ*1|<δ} f_{2,θ1,ψ} / f̲⁴_{1,θ*1} ] < +∞ ,   F*[ sup_{|θ1−θ*1|<δ} f³_{2,θ1,ψ} / f̲³_{1,θ*1} ] < +∞
63. Assumptions
[B3] Stronger identifiability: Set
∇f_{2,θ*1,ψ*}(x) = ( ∇_{θ1} f_{2,θ*1,ψ*}(x)ᵀ , ∇_ψ f_{2,θ*1,ψ*}(x)ᵀ )ᵀ .
Then for all ψ ∈ S with ψ ≠ 0, if η0 ∈ R and η1 ∈ R^{d1},
η0 ( f_{1,θ*1} − f_{2,θ*1,ψ} ) + η1ᵀ ∇_{θ1} f_{1,θ*1} = 0   ⇔   η0 = 0, η1 = 0
64. Consistency
Theorem
Given the mixture f_{θ1,ψ,α} = α f_{1,θ1} + (1 − α) f_{2,θ1,ψ} and a sample xⁿ = (x1, · · · , xn) issued from f_{1,θ*1}, when assumptions B1–B3 are satisfied, there exists M > 0 such that
π[ (α, θ) ; ‖f_{θ,α} − f*‖₁ > M √(log n / n) | xⁿ ] = op(1) .
If α ∼ B(a1, a2), with a2 < d2, and if the prior π_{θ1,ψ} is absolutely continuous with positive and continuous density at (θ*1, 0), then for all Mn going to ∞
π[ |α − α*| > Mn (log n)^γ / √n | xⁿ ] = op(1) ,   γ = max((d1 + a2)/(d2 − a2), 1)/2
65. Conclusion
many applications of the Bayesian paradigm concentrate on
the comparison of scientific theories and on testing of null
hypotheses
natural tendency to default to Bayes factors
poorly understood sensitivity to prior modeling and posterior
calibration
Time is ripe for a paradigm shift
66. Conclusion
Time is ripe for a paradigm shift
original testing problem replaced with a better controlled
estimation target
allow for posterior variability over the component frequency as
opposed to deterministic Bayes factors
range of acceptance, rejection and indecision conclusions
easily calibrated by simulation
posterior medians quickly settling near the boundary values of
0 and 1
potential derivation of a Bayesian b-value by looking at the
posterior area under the tail of the distribution of the weight
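For instance, given posterior simulations alpha_draws of the weight of M1 (a hypothetical array of MCMC output), one possible tail-area summary of this kind is simply:

import numpy as np

def b_value(alpha_draws, threshold=0.5):
    # posterior mass of the M1 weight falling below a chosen threshold
    # (the threshold 0.5 is an illustrative default, not a prescribed value)
    return float(np.mean(np.asarray(alpha_draws) < threshold))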
67. Prior modelling
Partly common parameterisation always feasible and hence
allows for reference priors
removal of the absolute prohibition of improper priors in
hypothesis testing
prior on the weight α shows sensitivity that naturally vanishes
as the sample size increases
default value of a0 = 0.5 in the Beta prior
68. Computing aspects
proposal that does not induce additional computational strain
when algorithmic solutions exist for both models, they can be
recycled towards estimating the encompassing mixture
easier than in standard mixture problems due to common
parameters that allow for original MCMC samplers to be
turned into proposals
Gibbs sampling completions useful for assessing potential
outliers but not essential to achieve a conclusion about the
overall problem