The document discusses statistical models and exponential families. For most of the course, the data are assumed to form a random sample from a distribution F; by the law of large numbers and the central limit theorem, repeated observations accumulate information about F. Exponential families are a class of parametric distributions with convenient analytic properties, whose densities can be written in exponential form as functions of natural parameters. Examples of exponential families include the binomial and normal distributions.
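As a sketch of the canonical form alluded to (notation is ours, not necessarily the slides'):

```latex
% Canonical exponential family with natural parameter \eta and sufficient statistic T:
f(x \mid \eta) = h(x)\, \exp\!\bigl( \eta^{\top} T(x) - A(\eta) \bigr)
% Binomial(n,p): \eta = \log\tfrac{p}{1-p},\ T(x) = x,\ h(x) = \binom{n}{x},\ A(\eta) = n \log(1 + e^{\eta})
% Normal(\mu,\sigma^2) with \sigma^2 known: \eta = \mu/\sigma^2,\ T(x) = x
```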
Statistics (1): estimation, Chapter 3: likelihood function and likelihood esti... (Christian Robert)
The document discusses likelihood functions and inference. It begins by defining the likelihood function as the joint density of the observed sample viewed as a function of the parameter: the likelihood varies with the parameter for fixed data, whereas the density varies with the data for a fixed parameter. Maximum likelihood estimation chooses the parameter value that maximizes the likelihood. The score function is the gradient of the log-likelihood and has expected value zero at the true parameter value. The Fisher information matrix measures the curvature of the likelihood surface and quantifies the precision attainable by parameter estimates; it explains how the likelihood concentrates around the true parameter value as the sample size increases.
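For concreteness, a standard worked instance of the score and Fisher information (Bernoulli model; our example, not necessarily the slides'):

```latex
% Bernoulli(p) sample x_1, \dots, x_n with s = \sum_i x_i:
\ell(p) = s \log p + (n - s) \log(1 - p)
% Score (gradient of the log-likelihood); expectation zero at the true p_0:
S(p) = \frac{s}{p} - \frac{n - s}{1 - p}, \qquad \mathbb{E}_{p_0}[\, S(p_0) \,] = 0
% Fisher information (expected curvature), governing estimator precision:
I(p) = -\, \mathbb{E}\!\left[ \frac{\partial^2 \ell}{\partial p^2} \right] = \frac{n}{p(1 - p)}
```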
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap (Christian Robert)
The document discusses the bootstrap method and its applications in statistical inference. It introduces the bootstrap as a technique for estimating properties of estimators like variance and distribution when the true sampling distribution is unknown. This is done by treating the observed sample as if it were the population and resampling with replacement to create new simulated samples. The bootstrap then approximates characteristics of the sampling distribution, allowing inferences like confidence intervals to be constructed.
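A minimal sketch of the resampling recipe, assuming a nonparametric bootstrap for a mean with a percentile interval (data and function names are ours, not the document's):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100)  # observed sample; true mean is 2

def bootstrap(sample, stat, n_boot=5000, rng=rng):
    """Resample with replacement, treating the sample as the population."""
    n = len(sample)
    return np.array([stat(sample[rng.integers(0, n, n)]) for _ in range(n_boot)])

reps = bootstrap(x, np.mean)
se = reps.std(ddof=1)                      # bootstrap estimate of the standard error
lo, hi = np.percentile(reps, [2.5, 97.5])  # percentile 95% confidence interval
print(f"mean={x.mean():.3f}  se={se:.3f}  CI=({lo:.3f}, {hi:.3f})")
```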
"reflections on the probability space induced by moment conditions with impli...Christian Robert
This document discusses using moment conditions to perform Bayesian inference when the likelihood function is intractable or unknown. It outlines some approaches that have been proposed, including approximating the likelihood by empirical likelihood or pseudo-likelihoods; these approximations, however, do not guarantee the same consistency as a true likelihood. Alternative approximate Bayesian methods are also discussed, such as Approximate Bayesian Computation, Integrated Nested Laplace Approximation, and variational Bayes. The empirical likelihood method constructs a likelihood from generalized moment conditions, but its use in Bayesian inference requires a case-by-case analysis of consistency.
This document discusses approximate Bayesian computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that match the observed data closely according to some distance measure and tolerance level. The document outlines the basic ABC algorithm and discusses some advances in ABC, including modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem to allow for larger tolerances, and including the tolerance level in the inferential framework. It also provides examples of applying ABC to problems like inferring the number of socks in a drawer from an observation and simulating the outcome of a historical naval battle.
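A minimal ABC rejection sketch under assumed choices (normal model, sample mean as the summary, absolute distance); the slides' sock and naval-battle examples are richer:

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = rng.normal(loc=1.5, scale=1.0, size=50)  # pretend the likelihood is unknown
s_obs = x_obs.mean()                             # summary statistic of the data

def abc_rejection(n_sims=100_000, eps=0.05):
    theta = rng.normal(0, 5, n_sims)             # draw parameters from the prior N(0, 25)
    # summary of a simulated dataset of size 50: the sample mean is N(theta, 1/50)
    s_sim = rng.normal(theta, 1 / np.sqrt(50))
    keep = np.abs(s_sim - s_obs) <= eps          # accept simulations within tolerance
    return theta[keep]

post = abc_rejection()
print(f"accepted {post.size} draws, posterior mean ~ {post.mean():.3f}")
```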
The document discusses curve sketching of functions by analyzing their derivatives. It provides:
1) A checklist for graphing a function which involves finding where the function is positive/negative/zero, its monotonicity from the first derivative, and concavity from the second derivative.
2) An example of graphing the cubic function f(x) = 2x^3 - 3x^2 - 12x by analyzing its derivatives (see the sketch after this list).
3) Explanations of the increasing/decreasing test and concavity test to determine monotonicity and concavity from a function's derivatives.
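For the cubic in item 2, a short symbolic check of the checklist steps (a sketch, assuming sympy is acceptable here):

```python
import sympy as sp

x = sp.symbols("x")
f = 2 * x**3 - 3 * x**2 - 12 * x

f1, f2 = sp.diff(f, x), sp.diff(f, x, 2)
crit = sp.solve(f1, x)           # where f' = 0: candidates for local extrema
infl = sp.solve(f2, x)           # where f'' = 0: candidate inflection points

print("f' =", sp.factor(f1))     # 6*(x - 2)*(x + 1)
print("critical points:", crit)  # [-1, 2]
print("f'' there:", [f2.subs(x, c) for c in crit])  # sign gives concavity: max at -1, min at 2
print("inflection:", infl)       # [1/2]
```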
This document discusses three methods for finding the roots of nonlinear equations:
1) Bisection method, which converges linearly but is guaranteed to find a root whenever the starting interval brackets one (i.e., the function changes sign over it).
2) Newton's method, which converges quadratically (much faster) but may diverge if the starting point is too far from the root.
3) Secant method, which is faster than bisection but slower than Newton's, and also requires starting points close to the root. Newton's and secant methods can be extended to systems of nonlinear equations.
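A minimal sketch contrasting the first two methods on f(x) = x^2 - 2 (example, tolerances, and function names are ours):

```python
def bisection(f, a, b, tol=1e-10):
    """Linear convergence, but guaranteed when f(a) and f(b) differ in sign."""
    while b - a > tol:
        m = (a + b) / 2
        if f(a) * f(m) <= 0:  # root lies in the left half
            b = m
        else:                 # root lies in the right half
            a = m
    return (a + b) / 2

def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Quadratic convergence, but may diverge if x0 is far from the root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: x**2 - 2                    # root at sqrt(2)
print(bisection(f, 1, 2))                 # ~1.4142135623
print(newton(f, lambda x: 2 * x, 1.0))    # same root, far fewer iterations
```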
The document discusses Approximate Bayesian Computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data from the model, accepting simulations that are close to the observed data based on a distance measure and tolerance level. This provides samples from an approximation of the posterior distribution. The document provides examples that motivate ABC and outlines the basic ABC algorithm. It also discusses extensions and improvements to the standard ABC method.
Multiple estimators for Monte Carlo approximations (Christian Robert)
This document discusses multiple estimators that can be used to approximate integrals using Monte Carlo simulations. It begins by introducing concepts like multiple importance sampling, Rao-Blackwellisation, and delayed acceptance that allow combining multiple estimators to improve accuracy. It then discusses approaches like mixtures as proposals, global adaptation, and nonparametric maximum likelihood estimation (NPMLE) that frame Monte Carlo estimation as a statistical estimation problem. The document notes various advantages of the statistical formulation, like the ability to directly estimate simulation error from the Fisher information. Overall, the document presents an overview of different techniques for combining Monte Carlo simulations to obtain more accurate integral approximations.
Approximate Bayesian model choice via random forests (Christian Robert)
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
This document describes a new method called component-wise approximate Bayesian computation (ABCG or ABC-Gibbs) that combines approximate Bayesian computation (ABC) with Gibbs sampling. ABCG aims to more efficiently explore parameter spaces when the number of parameters is large. It works by alternately sampling each parameter from its ABC-approximated conditional distribution given current values of other parameters. The document provides theoretical analysis showing ABCG converges to a stationary distribution under certain conditions. It also presents examples demonstrating ABCG can better separate estimates from the prior compared to simple ABC, especially for hierarchical models.
ABC stands for approximate Bayesian computation. It is a method for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC produces samples from an approximate posterior distribution by simulating parameter and summary statistic values that match the observed summary statistics within a tolerance level. The choice of summary statistics is important but difficult, as there is typically no sufficient statistic. Several strategies have been developed for selecting good summary statistics, including using random forests or the Lasso to evaluate and select from a large set of potential summaries.
The document describes Approximate Bayesian Computation (ABC), a technique for performing Bayesian inference when the likelihood function is intractable or impossible to evaluate directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to a distance measure and tolerance level. ABC provides an approximation to the posterior distribution that improves as the tolerance level decreases and more informative summary statistics are used. The document discusses the ABC algorithm, properties of the exact ABC posterior distribution, and challenges in selecting appropriate summary statistics.
A Geometric Note on a Type of Multiple Testing-07-24-2015 (Junfeng Liu)
This document summarizes new perspectives on false discovery rate (FDR) control procedures for multiple testing. It examines FDR control using linear and quadratic rejection cut-off routes applied to ordered p-values. Key findings include: 1) the FDR is controlled at π0q regardless of where the Ha p-value profile crosses the no-rejection boundary, 2) specificity approaches limits as the Ha mean increases, 3) quadratic cuts control FDR better when Ha means are close to zero. Numerical simulations explore the impact of factors like population size, variation levels, and mean profiles on discovery rates and FDR.
This document provides an overview of statistical inference concepts including:
1. Best unbiased estimators, which attain the smallest variance among all unbiased estimators of a given parameter (for unbiased estimators, minimizing mean squared error amounts to minimizing variance). The best unbiased estimator, if it exists, must be a function of a sufficient statistic.
2. Sufficiency and the Rao-Blackwell theorem, which states that conditioning an unbiased estimator on a sufficient statistic produces an estimator that is uniformly at least as good.
3. The Cramér-Rao lower bound, which bounds from below the variance of unbiased estimators. Examples illustrate key points, including cases where the bound fails to hold because regularity conditions are violated.
4. Examples are worked through to find minimum variance unbiased estimators, maximum likelihood estimators, and confidence intervals for various distributions.
The document describes the binomial experiment, which involves repeating a trial with two possible outcomes (success or failure) a fixed number of times. Each trial has a known probability of success (p) or failure (q=1-p) and trials are independent. The number of successes is then modeled by the binomial distribution. An example calculates the probabilities of different numbers of students being late to class when sampling 5 students, where the probability of an individual student being late is 10%. Key aspects of the binomial experiment and calculating binomial probabilities are summarized.
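The late-students example reduces to evaluating the binomial pmf; a quick check of the numbers it describes:

```python
from math import comb

n, p = 5, 0.10  # sample 5 students; each is late with probability 0.10
for k in range(n + 1):
    prob = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"P(X = {k}) = {prob:.4f}")
# P(X = 0) = 0.5905, P(X = 1) = 0.3281, P(X = 2) = 0.0729, ...
```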
This document discusses various methods for estimating normalizing constants that arise when evaluating integrals numerically. It begins by noting there are many computational methods for approximating normalizing constants across different communities. It then lists the topics that will be covered in the upcoming workshop, including discussions on estimating constants using Monte Carlo methods and Bayesian versus frequentist approaches. The document provides examples of estimating normalizing constants using Monte Carlo integration, reverse logistic regression, and Xiao-Li Meng's maximum likelihood estimation approach. It concludes by discussing some of the challenges in bringing a statistical framework to constant estimation problems.
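As a toy instance of the Monte Carlo route mentioned above, a normalizing constant can be estimated by importance sampling (a sketch with our own example; reverse logistic regression and Meng's estimator are more elaborate):

```python
import numpy as np

rng = np.random.default_rng(2)

q_tilde = lambda x: np.exp(-x**2 / 2)  # unnormalized density; true Z = sqrt(2*pi)
sigma = 2.0                            # wider normal proposal we can sample from
g_pdf = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

x = rng.normal(0.0, sigma, size=200_000)
w = q_tilde(x) / g_pdf(x)              # importance weights: q_tilde / proposal density
print(f"Z_hat = {w.mean():.4f}   (truth {np.sqrt(2 * np.pi):.4f})")
```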
The document discusses Lagrange interpolation, which involves constructing a polynomial that passes through a set of known data points. Specifically, it describes:
- The interpolation problem of predicting an unknown value (fI) at a point (xI) given known values (fi) at nodes (xi)
- How Lagrange interpolation polynomials are defined using basis polynomials (Ln,k) such that each basis polynomial is 1 at its node and 0 at other nodes
- An example of constructing a 3rd degree Lagrange interpolation polynomial to interpolate an unknown value f(3) using 4 known data points
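A minimal sketch of the construction, assuming hypothetical nodes and values (the document's actual data points are not reproduced here):

```python
def lagrange_eval(xs, fs, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, fs) at x."""
    total = 0.0
    for k, (xk, fk) in enumerate(zip(xs, fs)):
        Lk = 1.0  # basis polynomial L_{n,k}: equals 1 at xs[k], 0 at the other nodes
        for j, xj in enumerate(xs):
            if j != k:
                Lk *= (x - xj) / (xk - xj)
        total += fk * Lk
    return total

xs = [1, 2, 4, 5]                # hypothetical nodes
fs = [1, 8, 64, 125]             # values of f(x) = x**3 at the nodes
print(lagrange_eval(xs, fs, 3))  # 27.0: the degree-3 interpolant recovers the cubic exactly
```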
The document discusses approximate Bayesian computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to some distance measure. The document covers the basic ABC algorithm, convergence properties as the tolerance approaches zero, examples of ABC for probit models and MA time series models, and advances such as modifying the proposal distribution to increase efficiency.
This document discusses numerical methods for finding the roots of nonlinear equations. It covers the bisection method, Newton-Raphson method, and their applications. The bisection method uses binary search to bracket the root within intervals that are repeatedly bisected until a solution is found. The Newton-Raphson method approximates the function as a linear equation to rapidly converge on roots. Examples and real-world applications are provided for both methods.
random forests for ABC model choice and parameter estimation (Christian Robert)
The document discusses Approximate Bayesian Computation (ABC). It begins by introducing ABC as a likelihood-free method for Bayesian inference when the likelihood function is unavailable or computationally intractable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data based on a distance measure.
The document then discusses advances in ABC, including modifying the proposal distribution to increase efficiency, viewing it as a conditional density estimation problem, and including measurement error in the framework. It also discusses the consistency of ABC as the number of simulations increases and sample size grows large. Finally, it discusses applications of ABC to model selection by treating the model index as an additional parameter.
This document discusses various methods for polynomial interpolation of data points, including Newton's divided difference interpolating polynomials and Lagrange interpolation polynomials. It provides formulas and examples for linear, quadratic, and general polynomial interpolation using Newton's method. It also covers an example of using multiple linear regression with log-transformed data to develop a model relating fluid flow through a pipe to the pipe's diameter and slope.
The document discusses Lagrange interpolating polynomials, which provide an alternative way to write an nth order polynomial that passes through a given set of n+1 data points. The Lagrange interpolating polynomial is defined as the sum of nth order Lagrange coefficient polynomials multiplied by the y-values at each data point. This allows the interpolating polynomial to be determined directly from the data points using simple formulas, without solving systems of equations. An example demonstrates computing the 4th order Lagrange interpolating polynomial for voltage data from three different batteries over time.
This document discusses random variables and their probability distributions. It defines different types of random variables such as real, complex, discrete, continuous, and mixed. It also defines key concepts such as sample space, cumulative distribution function (CDF), and probability density function (PDF) for both discrete and continuous random variables. Examples are provided to illustrate how to calculate the CDF and PDF for different random variables. Properties of CDFs and PDFs are also covered.
On the vexing dilemma of hypothesis testing and the predicted demise of the B... (Christian Robert)
The document discusses hypothesis testing from both frequentist and Bayesian perspectives. It introduces the concept of statistical tests as functions that output accept or reject decisions for hypotheses. P-values are presented as a way to quantify uncertainty in these decisions. Bayes' original 1763 paper on Bayesian statistics is summarized, introducing the concept of the posterior distribution. Bayesian hypothesis testing is then discussed, including the optimal Bayes test and the use of Bayes factors to compare hypotheses without requiring prior probabilities on the hypotheses.
1. Approximate Bayesian computation (ABC) is a technique used for Bayesian inference when the likelihood function is intractable or unavailable. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to some distance measure.
2. The document outlines ABC and some of its applications, including ABC as an inference machine for estimating posterior distributions, ABC for model choice between different models, and the "genetics" of ABC in terms of algorithmic improvements.
3. A toy example is provided to illustrate ABC, showing how it can approximate the true posterior distribution of a parameter given a simulated dataset when the likelihood function is unavailable in closed form.
The document proposes using random forests (RF), a machine learning tool, for approximate Bayesian computation (ABC) model choice rather than estimating model posterior probabilities. RF improves on existing ABC model choice methods by having greater discriminative power among models, being robust to the choice and number of summary statistics, requiring less computation, and providing an error rate to evaluate confidence in the model choice. The authors illustrate the power of the RF-based ABC methodology on controlled experiments and real population genetics datasets.
This document discusses several computational methods for Bayesian model choice, including importance sampling, cross-model solutions, nested sampling, and approximate Bayesian computation (ABC) model choice. It introduces Bayes factors and marginal likelihoods as key quantities for Bayesian model comparison, and describes how Bayesian model choice involves allocating probabilities to different models and computing the marginal likelihood or evidence for each model.
Approximate Bayesian Computation (ABC) methods allow approximating intractable likelihoods in Bayesian inference. ABC rejection sampling simulates parameters from the prior and keeps those for which the simulated data are close to the observed data. ABC Markov chain Monte Carlo builds a Markov chain over the parameters in which proposed moves are accepted only if the simulated data resemble the observed data. Population Monte Carlo and ABC-MCMC improve on rejection sampling by using sequential importance sampling and MCMC moves, respectively, to propose parameters in high-density regions.
- Approximate Bayesian computation (ABC) is a technique used when the likelihood function is intractable or unavailable. It approximates the Bayesian posterior distribution in a likelihood-free manner.
- ABC works by simulating parameter values from the prior and simulating pseudo-data. Parameter values are accepted if the simulated pseudo-data are "close" to the observed data according to some distance measure and tolerance level.
- ABC originated in population genetics models where genealogies are considered nuisance parameters that cannot be integrated out of the likelihood. It has since been applied to other fields like econometrics for models with complex or undefined likelihoods.
The document discusses computational methods for Bayesian model choice and model comparison. It introduces Bayes factors and the evidence as central quantities for model comparison. It then describes various computational methods for approximating the evidence, including importance sampling solutions like bridge sampling, harmonic mean approximations using posterior samples, and approximating the evidence using mixture representations.
Rao-Blackwellisation schemes for accelerating Metropolis-Hastings algorithms (Christian Robert)
An aggregate of three papers on Rao-Blackwellisation (Casella & Robert, 1996; Douc & Robert, 2010; Banterle et al., 2015), presented during an OxWaSP workshop on MCMC methods, Warwick, Nov 20, 2015.
This document summarizes a talk given by Heiko Strathmann on using partial posterior paths to estimate expectations from large datasets without full posterior simulation. The key ideas are:
1. Construct a path of "partial posteriors" by sequentially adding mini-batches of data and computing expectations over these posteriors.
2. "Debias" the path of expectations to obtain an unbiased estimator of the true posterior expectation using a technique from stochastic optimization literature.
3. This approach allows estimating posterior expectations with sub-linear computational cost in the number of data points, without requiring full posterior simulation or imposing restrictions on the likelihood.
Experiments on synthetic and real-world examples demonstrate competitive performance versus standard MCMC.
no U-turn sampler, a discussion of Hoffman & Gelman's NUTS algorithm (Christian Robert)
The document describes the No-U-Turn Sampler (NUTS), an extension of Hamiltonian Monte Carlo (HMC) that aims to avoid the random-walk behavior and poor mixing that can occur when the trajectory length L is not set appropriately. NUTS augments the model with a slice variable and uses a deterministic procedure to select a set of candidate states C based on the instantaneous distance gain, avoiding the need to tune L manually. It builds up a set of possible states B by doubling a binary tree and checking the distance criterion on subtrees, then samples from the uniform distribution over C to generate proposals. This allows NUTS to determine an appropriate trajectory length automatically and to avoid issues, like periodicity, that can plague HMC with a poorly chosen fixed L.
Last year, I gave an advanced graduate course at CREST on Jeffreys' Theory of Probability. These are the slides. I also wrote a paper, published on arXiv, with two participants in the course.
International Conference on Monte Carlo Techniques, closing conference of the thematic cycle, Paris, July 5-8, 2016, Campus les Cordeliers. Jere Koskela's slides.
This document discusses Bayesian hypothesis testing and some of the challenges associated with it. It makes three key points:
1) There is tension between using posterior probabilities derived from a loss-function approach and Bayes factors, which remove the dependence on prior probabilities of the hypotheses but lose the direct connection to the posterior.
2) Bayesian hypothesis testing relies on choosing prior probabilities for hypotheses and prior distributions for parameters, which can strongly impact results and are often arbitrary.
3) Common Bayesian testing procedures like Bayes factors can produce paradoxical results, such as Lindley's paradox, in which, at a fixed p-value, the Bayes factor increasingly favors the null hypothesis as the sample size grows, even though the frequentist test rejects it.
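A numerical sketch of point 3, under an assumed normal-normal setup (our choice of prior variance and z-value, not the document's): holding the p-value fixed while n grows, the Bayes factor for the point null keeps increasing.

```python
import numpy as np

# Test H0: mu = 0 vs H1: mu ~ N(0, tau^2), with data mean xbar ~ N(mu, 1/n).
# Hold the z-statistic (hence the p-value) fixed and let n grow.
tau2, z = 1.0, 1.96  # z = 1.96 corresponds to a two-sided p-value of about 0.05
for n in [10, 100, 10_000, 1_000_000]:
    r = n * tau2  # ratio of prior variance to sampling variance
    # Bayes factor B01 for the point null (standard normal-normal marginal likelihoods)
    b01 = np.sqrt(1 + r) * np.exp(-0.5 * z**2 * r / (1 + r))
    print(f"n={n:>9}  B01={b01:8.2f}")
# B01 grows without bound in n: at a fixed p-value the Bayes factor ends up favoring H0.
```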
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. ABC can be framed as a machine learning problem where simulated datasets are used to learn which model is most appropriate. Random forests are well-suited for this as they can handle many correlated summary statistics without information loss. The random forest predicts the most likely model but not posterior probabilities. Instead, the posterior predictive expected error rate across models is proposed to evaluate model selection performance without unstable probability approximations. An example comparing MA(1) and MA(2) time series models illustrates the approach.
Delayed acceptance for Metropolis-Hastings algorithms (Christian Robert)
The document proposes a delayed acceptance method for accelerating Metropolis-Hastings algorithms. It begins with a motivating example of non-informative inference for mixture models where computing the prior density is costly. It then introduces the delayed acceptance approach which splits the acceptance probability into pieces that are evaluated sequentially, avoiding computing the full acceptance ratio each time. It validates that the delayed acceptance chain is reversible and provides bounds on its spectral gap and asymptotic variance compared to the original chain. Finally, it discusses optimizing the delayed acceptance approach by considering the expected square jump distance and cost per iteration to maximize efficiency.
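A minimal sketch of the factorised, sequentially evaluated acceptance step (toy one-dimensional target of our own; the document treats the general case and its efficiency analysis):

```python
import numpy as np

rng = np.random.default_rng(3)

# Target pi(x) proportional to f1(x) * f2(x); pretend f2 is the costly factor
# (e.g. an expensive prior density, as in the mixture example).
f1 = lambda x: np.exp(-0.5 * x**2)   # cheap factor
f2 = lambda x: 1.0 / (1.0 + x**4)    # "expensive" factor

def delayed_acceptance_mh(n_iter=50_000, step=1.0):
    x, chain = 0.0, []
    for _ in range(n_iter):
        y = x + step * rng.normal()  # symmetric random-walk proposal
        # Stage 1: cheap ratio; an early rejection never evaluates f2
        if rng.uniform() < min(1.0, f1(y) / f1(x)):
            # Stage 2: costly ratio, only reached for promising proposals
            if rng.uniform() < min(1.0, f2(y) / f2(x)):
                x = y
        chain.append(x)
    return np.array(chain)

chain = delayed_acceptance_mh()
print(f"posterior mean ~ {chain.mean():.3f}  (target is symmetric about 0)")
```

The product of the stage-wise probabilities min(1, f1(y)/f1(x)) * min(1, f2(y)/f2(x)) keeps the chain reversible for pi, since pi(x) times that product is symmetric in x and y for a symmetric proposal.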
This document proposes representing hypothesis testing problems as estimating mixture models. Specifically, two competing models are embedded within an encompassing mixture model with a weight parameter between 0 and 1. Inference is then drawn on the mixture representation, treating each observation as coming from the mixture model. This avoids difficulties with traditional Bayesian testing approaches like computing marginal likelihoods. It also allows for a more intuitive interpretation of the weight parameter compared to posterior model probabilities. The weight parameter can be estimated using standard mixture estimation algorithms like Gibbs sampling or Metropolis-Hastings. Several illustrations of the approach are provided, including comparisons of Poisson and geometric distributions.
This document discusses challenges and recent advances in Approximate Bayesian Computation (ABC) methods. ABC methods are used when the likelihood function is intractable or unavailable in closed form. The core ABC algorithm involves simulating parameters from the prior and simulating data, retaining simulations where the simulated and observed data are close according to a distance measure on summary statistics. The document outlines key issues like scalability to large datasets, assessment of uncertainty, and model choice, and discusses advances such as modified proposals, nonparametric methods, and perspectives that include summary construction in the framework. Validation of ABC model choice and selection of summary statistics remains an open challenge.
The document discusses random variables and probability distributions. It defines a random variable as a function that assigns a numerical value to each outcome in a sample space. Random variables can be discrete or continuous. The probability distribution of a random variable describes its possible values and the probabilities associated with each value. It then discusses the binomial distribution in detail as an example of a theoretical probability distribution. The binomial distribution applies when there are a fixed number of independent yes/no trials, each with the same constant probability of success.
This document provides an overview and agenda for a master's level course on probability and statistics. It covers key topics like statistical models, probability distributions, conditional distributions, convergence theorems, sampling, confidence intervals, decision theory, and testing procedures. Examples of common probability distributions and functions are also presented, including the cumulative distribution function, probability density function, independence, and conditional independence. Additional references for further reading are included.
ISM_Session_5 _ 23rd and 24th December.pptx (ssuser1eba67)
The document discusses random variables and their probability distributions. It defines discrete and continuous random variables and their key characteristics: discrete random variables take countably many values, while continuous random variables can take any value in an interval. Probability distributions describe the probabilities of a random variable taking on different values. The mean and variance are discussed as measures of central tendency and variability. Joint probability distributions are introduced for two random variables. Examples and homework problems are also provided.
This document provides an overview of probabilistic reasoning and uncertainty in knowledge representation. It discusses:
1) Using probability theory to represent uncertainty quantitatively rather than logical rules with certainty factors.
2) Key concepts in probability theory including random variables, probability distributions, joint probabilities, marginal probabilities, and conditional probabilities.
3) Representing a problem domain as a probabilistic model with a sample space of possible variable states.
4) Independence of variables allowing simpler computation of probabilities.
5) The document is an introduction to probabilistic reasoning concepts to be covered in more detail later, including Bayesian networks.
This document provides an overview of some common continuous and discrete probability distributions, including their probability density/mass functions, cumulative distribution functions, expected values, variances, and moment generating functions. It covers standard and rescaled versions of distributions like the uniform, normal, exponential, gamma, binomial, Poisson, geometric, negative binomial, and hypergeometric distributions. Formulas are provided for transforming between standard and rescaled versions.
The document discusses random processes and probability theory concepts relevant to communication systems. It defines key terms like random variables, sample space, events, probability, and distributions. It describes different types of random variables like discrete and continuous, and their probability mass functions and density functions. It also discusses statistical measures like mean, variance, and covariance that are used to characterize random signals and compare signals. Specific random variables discussed include binomial and uniform. The document provides foundations for analyzing random signals in communication systems.
The document discusses cumulative distribution functions (CDFs) and probability density functions (PDFs) for continuous random variables. It provides definitions and properties of CDFs and PDFs. For CDFs, it describes how they give the probability that a random variable is less than or equal to a value. For PDFs, it explains how they provide the probability of a random variable taking on a particular value. The document also gives examples of CDFs and PDFs for exponential and uniform random variables.
This document provides information on probability distributions and related concepts. It defines discrete and continuous probability distributions, explains probability distribution functions for discrete and continuous random variables together with their properties, and discusses mathematical expectation and variance, with examples of calculating these values for random variables.
Basics of probability in statistical simulation and stochastic programming (SSA KPI)
AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 2.
More info at http://summerschool.ssa.org.ua
1) A sufficient statistic contains all the information needed to estimate an unknown parameter θ from a sample.
2) The factorization theorem provides a way to check sufficiency: if the likelihood can be written as the product of two functions, one depending on the data only through the statistic and on θ, and the other free of θ, then the statistic is sufficient (see the worked example after this list).
3) A minimal sufficient statistic is a sufficient statistic that is a function of every other sufficient statistic, representing the ultimate data reduction. The likelihood-ratio criterion (the ratio f(x|θ)/f(y|θ) being free of θ exactly when T(x) = T(y)) can be used to show that a statistic is minimal sufficient.
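As a standard worked instance of the factorization theorem in point 2 (our example, though a very common one):

```latex
% Bernoulli(\theta) sample, t = T(x) = \sum_{i=1}^{n} x_i:
f(x_1, \dots, x_n \mid \theta)
  = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}
  = \underbrace{\theta^{t} (1 - \theta)^{n - t}}_{g(T(x),\, \theta)} \cdot \underbrace{1}_{h(x)}
% g depends on the data only through T and h is free of \theta,
% so T(x) = \sum_i x_i is sufficient for \theta.
```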
WSC 2011, advanced tutorial on simulation in Statistics (Christian Robert)
This document discusses recent advances in simulation methods for statistics. It motivates the use of such methods by explaining how latent variable models can make inference computationally difficult. It introduces Monte Carlo integration and the Metropolis-Hastings algorithm as two important simulation techniques. The document also discusses how Bayesian analysis provides a framework to combine prior information with data, but computing the posterior distribution can be challenging for complex models. Simulation methods are presented as a way to approximate solutions to these computationally difficult statistical problems.
Chapter 3 – Random Variables and Probability Distributions (JasonTagapanGulla)
The document summarizes key concepts about random variables and probability distributions. It defines a random variable as a function that assigns numerical values to outcomes of a statistical experiment. Random variables can be discrete, taking on countable values, or continuous, having values on a continuous scale.
For discrete random variables, the probabilities of each value are specified by the probability mass function (PMF). For continuous random variables, the probability density function (PDF) gives the probability of values in an interval as the area under the curve of the function. The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a value.
An example calculates the PMF of the number of defective computers selected in a random sample.
This material is useful for students studying at the master's level in elect... (BhojRajAdhikari5)
This document discusses concepts related to continuous and discrete random variables including:
- Defining continuous random variables based on their cumulative distribution function (CDF) being continuous.
- Introducing the probability density function (PDF) as the derivative of the CDF, which indicates the probability of a continuous random variable being near a given value.
- Defining joint and conditional probability distribution functions for multiple random variables.
- Discussing statistical independence and functions of random variables.
- Introducing statistical averages like the expected value, variance, and standard deviation of random variables. Formulas for calculating these are provided.
This document contains lecture notes on machine learning and deep learning. It discusses regression, classification, and neural networks. For regression and classification, it presents the optimal functions that minimize error and relates them to conditional expectations. It also provides bounds on the generalization error of functions learned through empirical risk minimization. For neural networks, it discusses their ability to approximate functions and bounds the VC-dimension of neural networks with multiple hidden layers.
The document discusses expectation of discrete random variables. It defines expectation as the weighted average of all possible values a random variable can take, with weights given by each value's probability. Expectation provides a measure of the central tendency of a probability distribution. Several examples are provided to demonstrate calculating expectation for different discrete random variables and distributions like binomial, geometric, and Poisson. Properties of expectation like linearity and independence are also covered.
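A minimal sketch of the weighted-average definition, checked against the binomial closed form E[X] = np (our example):

```python
from math import comb

def expectation(pmf):
    """Weighted average of the values of a discrete random variable."""
    return sum(x * p for x, p in pmf.items())

n, p = 10, 0.3
binom_pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
print(expectation(binom_pmf))  # ~3.0, matching the closed form E[X] = n*p
```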
This document discusses differentially private distributed Bayesian linear regression with Markov chain Monte Carlo (MCMC) methods. It proposes adding noise to the summaries (S) and coefficients (z) of local linear regression models on different devices to provide differential privacy. Gibbs sampling is used to simulate the genuine posterior distribution over the linear model parameters (theta, sigma_y, Sigma_x, z1:J, S1:J) in a distributed manner while maintaining privacy. Alternative approaches like exploiting approximate posteriors from all devices or learning iteratively are also mentioned.
This document discusses mixture models and approximations to computing model evidence. It contains:
1) An overview of mixtures of distributions and common priors used for mixtures.
2) Approximations to computing marginal likelihoods or model evidence using Chib's representation and Rao-Blackwellization. Permutations are used to address label switching issues.
3) Methods for more efficient sampling for computing model evidence, including iterative bridge sampling and dual importance sampling with approximations to reduce the number of permutations considered.
Sequential Monte Carlo is also briefly mentioned as an alternative approach.
This document describes the adaptive restore algorithm, a non-reversible Markov chain Monte Carlo method. It begins with an overview of the restore process, which takes regenerations from an underlying diffusion or jump process to construct a reversible Markov chain with a target distribution. The adaptive restore process enriches this by allowing the regeneration distribution to adapt over time. It converges almost surely to the minimal regeneration distribution. Parameters like the initial regeneration distribution and rates are discussed. Examples are provided for the adaptive Brownian restore algorithm and calibrating the parameters.
This document summarizes techniques for approximating marginal likelihoods and Bayes factors, which are important quantities in Bayesian inference. It discusses Geyer's 1994 logistic regression approach, links to bridge sampling, and how mixtures can be used as importance sampling proposals. Specifically, it shows how optimizing the logistic pseudo-likelihood relates to the bridge sampling optimal estimator. It also discusses non-parametric maximum likelihood estimation based on simulations.
This document discusses Bayesian restricted likelihood methods for situations where the likelihood cannot be fully trusted. It presents several approaches including empirical likelihood, Bayesian empirical likelihood, using insufficient statistics, approximate Bayesian computation (ABC), and MCMC on manifolds. The key ideas are developing Bayesian tools that are robust to model misspecification by questioning the likelihood, prior, and other assumptions.
This document discusses various methods for approximating marginal likelihoods and Bayes factors, including:
1. Geyer's 1994 logistic regression approach for approximating marginal likelihoods using importance sampling.
2. Bridge sampling and its connection to Geyer's approach. Optimal bridge sampling requires knowledge of unknown normalizing constants.
3. Using mixtures of importance distributions and the target distribution as proposals to estimate marginal likelihoods through Rao-Blackwellization. This connects to bridge sampling estimates.
4. The historical development of, and connections between, these different approximation techniques for comparing hypotheses via Bayes factors.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
The document summarizes Approximate Bayesian Computation (ABC). It discusses how ABC provides a way to approximate Bayesian inference when the likelihood function is intractable or too computationally expensive to evaluate directly. ABC works by simulating data under different parameter values and accepting simulations that are close to the observed data according to a distance measure and tolerance level. Key points discussed include:
- ABC provides an approximation to the posterior distribution by sampling from simulations that fall within a tolerance of the observed data.
- Summary statistics are often used to reduce the dimension of the data and improve the signal-to-noise ratio when applying the tolerance criterion.
- Random forests can help select informative summary statistics and provide semi-automated ABC.
The document describes a new method called component-wise approximate Bayesian computation (ABC) that combines ABC with Gibbs sampling. It aims to improve ABC's ability to efficiently explore parameter spaces when the number of parameters is large. The method works by alternating sampling from each parameter's ABC posterior conditional distribution given current values of other parameters and the observed data. The method is proven to converge to a stationary distribution under certain assumptions, especially for hierarchical models where conditional distributions are often simplified. Numerical experiments on toy examples demonstrate the method can provide a better approximation of the true posterior than vanilla ABC.
1) Likelihood-free Bayesian experimental design is discussed as an intractable likelihood optimization problem, where the goal is to find the optimal design d that minimizes expected loss without using the full posterior distribution.
2) Several Bayesian tools are proposed to make the design problem more Bayesian, including Bayesian non-parametrics, annealing algorithms, and placing a posterior on the design d.
3) Gaussian processes are a default modeling choice for complex unknown functions in these problems, but their accuracy is difficult to assess and they may suffer from the curse of dimensionality.
A discussion of Chib, Shin, and Simoni (2017-8) Bayesian moment models, by Christian Robert
This document discusses Bayesian estimation of conditional moment models. It presents several approaches for completing conditional moment models for Bayesian processing, including using non-parametric parts, empirical likelihood Bayesian tools, or maximum entropy alternatives. It also discusses simplistic ABC alternatives and innovative aspects of introducing tolerance parameters for misspecification and cancelling conditional aspects. Unconditional and conditional model comparison using empirical likelihoods and Bayes factors is proposed.
This document discusses using the Wasserstein distance for inference in generative models. It begins with an overview of approximate Bayesian computation (ABC) and how distances between samples are used. It then introduces the Wasserstein distance as an alternative distance that can have lower variance than the Euclidean distance. Computational aspects and asymptotics of using the Wasserstein distance are discussed. The document also covers how transport distances can handle time series data.
Statistics (1): estimation, Chapter 1: Models
Chapter 1: statistical vs. real models
Outline:
- Statistical models
- Quantities of interest
- Exponential families
Statistical models
For most of the course, we assume that the data is a random sample $x_1, \dots, x_n$ and that
$$x_1, \dots, x_n \sim F(x)$$
as i.i.d. variables or as transforms of i.i.d. variables.
Motivation: repetition of observations increases information about $F$, by virtue of probabilistic limit theorems (LLN, CLT).
Warning 1: some aspects of $F$ may ultimately remain unavailable.
Warning 2: the model is always wrong, even though we behave as if...
Limit of averages
Case of an i.i.d. sequence $x_1, \dots, x_n \sim \mathcal{N}(0, 1)$.
[Figure: evolution of the range of $\bar{X}_n$ across 1000 repetitions, along with one random sequence and the theoretical 95% range]
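This experiment can be reproduced with a short simulation. A minimal sketch, assuming numpy is available (the seed and sample sizes are arbitrary choices; plotting is omitted):

import numpy as np

# Running means of N(0, 1) samples: the empirical 95% range of the
# average shrinks like 1.96/sqrt(n), matching the theoretical band.
rng = np.random.default_rng(0)
reps, n = 1000, 1000
samples = rng.standard_normal((reps, n))
running_means = samples.cumsum(axis=1) / np.arange(1, n + 1)
lo, hi = np.percentile(running_means, [2.5, 97.5], axis=0)  # empirical range
theory = 1.96 / np.sqrt(np.arange(1, n + 1))                # theoretical band
one_path = running_means[0]                                 # one random sequence
print(lo[-1], hi[-1], theory[-1])  # all close to +/-0.062 at n = 1000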
Limit theorems
Law of Large Numbers (LLN)
If $X_1, \dots, X_n$ are i.i.d. random variables with a well-defined expectation $\mathbb{E}[X]$, then
$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{prob.}} \mathbb{E}[X]$$
and, in its strong form, the convergence holds almost surely:
$$\frac{X_1 + \cdots + X_n}{n} \xrightarrow{\text{a.s.}} \mathbb{E}[X]$$
[proof: see Terry Tao's What's new, 18 June 2008]
Central Limit Theorem (CLT)
If $X_1, \dots, X_n$ are i.i.d. random variables with a well-defined expectation $\mathbb{E}[X]$ and variance $\sigma^2$, then
$$\sqrt{n}\left(\frac{X_1 + \cdots + X_n}{n} - \mathbb{E}[X]\right) \xrightarrow{\text{dist.}} \mathcal{N}(0, \sigma^2)$$
[proof: see Terry Tao's What's new, 5 January 2010]
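As a quick sanity check of the CLT, a sketch (assuming numpy and scipy) comparing standardized means of Exponential(1) samples, which have mean 1 and variance 1, to the standard normal:

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
reps, n = 2000, 500
x = rng.exponential(scale=1.0, size=(reps, n))
# sqrt(n) * (sample mean - E[X]) / sigma, with E[X] = sigma = 1 here
z = np.sqrt(n) * (x.mean(axis=1) - 1.0)
print(stats.kstest(z, "norm"))  # p-value typically not small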
Continuity Theorem
If $X_n \xrightarrow{\text{dist.}} a$ and $g$ is continuous at $a$, then
$$g(X_n) \xrightarrow{\text{dist.}} g(a)$$
Slutsky's Theorem
If $X_n$, $Y_n$, $Z_n$ converge in distribution to $X$, $a$, and $b$ respectively, where $a$ and $b$ are constants, then
$$X_n Y_n + Z_n \xrightarrow{\text{dist.}} aX + b$$
For instance, this justifies replacing $\sigma$ in the CLT with a consistent estimator $s_n$, since $\sqrt{n}(\bar{X}_n - \mathbb{E}[X])/s_n \xrightarrow{\text{dist.}} \mathcal{N}(0, 1)$.
Delta method
If
$$\sqrt{n}\,\{X_n - \theta\} \xrightarrow{\text{dist.}} \mathcal{N}_p(0, \Sigma)$$
and $g : \mathbb{R}^p \to \mathbb{R}^q$ is a continuously differentiable function on a neighbourhood of $\theta \in \mathbb{R}^p$, with a non-zero gradient $\nabla g(\theta)$, then
$$\sqrt{n}\,\{g(X_n) - g(\theta)\} \xrightarrow{\text{dist.}} \mathcal{N}_q(0, \nabla g(\theta)^{\mathsf{T}}\, \Sigma\, \nabla g(\theta))$$
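A numeric check of the delta method, as a sketch assuming numpy: take $g(x) = x^2$ and Exponential(1) data, so $\theta = 1$, $\sigma^2 = 1$, and the limiting variance is $g'(\theta)^2 \sigma^2 = 4$:

import numpy as np

rng = np.random.default_rng(2)
reps, n = 20000, 500
x = rng.exponential(1.0, size=(reps, n))
g_of_mean = x.mean(axis=1) ** 2      # g(X_n) with g(x) = x^2
z = np.sqrt(n) * (g_of_mean - 1.0)   # sqrt(n){g(X_n) - g(theta)}
print(z.var())                       # should be close to 4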
Example 1: Binomial sample
Case #1: observation of i.i.d. Bernoulli variables
$$x_i \sim \mathcal{B}(p)$$
with unknown parameter $p$ (e.g., opinion poll)
Case #2: observation of independent Bernoulli variables $x_i \sim \mathcal{B}(p_i)$ with unknown and different parameters $p_i$, or of conditionally independent Bernoulli variables $x_i \mid z_i \sim \mathcal{B}(p(z_i))$ with covariate-driven parameters $p(z_i)$ (e.g., opinion poll, flu epidemics)
Both arise as transforms of i.i.d. uniforms $u_1, \dots, u_n$:
$$x_i = \mathbb{I}(u_i \leqslant p_i)$$
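A minimal sketch of this uniform-to-Bernoulli transform (numpy assumed; the probabilities are illustrative):

import numpy as np

rng = np.random.default_rng(3)
p = np.array([0.2, 0.5, 0.5, 0.8])  # per-observation probabilities (case #2)
u = rng.uniform(size=p.shape)       # i.i.d. U(0, 1) variables
x = (u <= p).astype(int)            # indicator transform: x_i ~ B(p_i)
print(x)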
Parametric versus non-parametric
Two classes of statistical models:
Parametric: when $F$ varies within a family of distributions indexed by a parameter $\theta$ that belongs to a finite-dimension space $\Theta$:
$$F \in \{F_\theta,\ \theta \in \Theta\}$$
and to know $F$ is to know which $\theta$ it corresponds to (identifiability);
Non-parametric: all other cases, i.e. when $F$ is not constrained in a parametric way or when only some aspects of $F$ are of interest for inference.
Trivia: machine learning does not draw such a strict distinction between classes.
Non-parametric models
In non-parametric models, there may still be constraints on the range of $F$'s, as for instance
$$\mathbb{E}_F[Y \mid X = x] = \Psi(\beta^{\mathsf{T}} x), \qquad \text{var}_F(Y \mid X = x) = \sigma^2,$$
in which case the statistical inference only deals with estimating or testing the constrained aspects, or with providing predictions.
Note: estimating a density or a regression function like $\Psi(\beta^{\mathsf{T}} x)$ is only of interest in a restricted number of cases.
Parametric models
When $F = F_\theta$, inference usually covers the whole of the parameter $\theta$ and provides
- point estimates of $\theta$, i.e. values substituting for the unknown "true" $\theta$
- confidence intervals (or regions) on $\theta$, as regions likely to contain the "true" $\theta$
- tests of specific features of $\theta$ ("true or not?") or of the whole family (goodness-of-fit)
- predictions of some other variable whose distribution depends on $\theta$, $z_1, \dots, z_m \sim G_\theta(z)$
Inference: all those procedures depend on the sample $(x_1, \dots, x_n)$
Example 1: Binomial experiment again
Model: observation of i.i.d. Bernoulli variables
$$x_i \sim \mathcal{B}(p)$$
with unknown parameter $p$ (e.g., opinion poll)
Questions of interest:
1. likely value of $p$, or range thereof
2. whether or not $p$ exceeds a level $p_0$
3. how many more observations are needed to get an estimate of $p$ precise within two decimals
4. what is the average length of a lucky streak (1's in a row)
Example 2: Normal sample
Model: observation of i.i.d. Normal variates
$$x_i \sim \mathcal{N}(\mu, \sigma^2)$$
with unknown parameters $\mu$ and $\sigma > 0$ (e.g., blood pressure)
Questions of interest:
1. likely value of $\mu$, or range thereof
2. whether or not $\mu$ is above the mean of another sample $y_1, \dots, y_m$
3. percentage of extreme values in the next batch of $m$ $x_i$'s
4. how many more observations are needed to exclude zero from the likely values
5. which of the $x_i$'s are outliers
Quantities of interest
Statistical distributions are (incompletely) characterised by (1-D) moments:
central moments
$$\mu_1 = \mathbb{E}[X] = \int x \, \mathrm{d}F(x), \qquad \mu_k = \mathbb{E}\left[(X - \mu_1)^k\right], \quad k > 1,$$
non-central moments
$$\xi_k = \mathbb{E}\left[X^k\right], \quad k \geqslant 1,$$
quantiles
$$P(X \leqslant q_\alpha) = \alpha,$$
and (2-D) moments
$$\text{cov}(X_i, X_j) = \int (x_i - \mathbb{E}[X_i])(x_j - \mathbb{E}[X_j]) \, \mathrm{d}F(x_i, x_j)$$
Note: for parametric models, those quantities are transforms of the parameter $\theta$
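As a concrete illustration, a sketch computing plug-in (empirical) versions of these quantities from a sample; numpy is assumed, and the Gamma(2, 1) choice, with mean 2, variance 2, and $\mathbb{E}[X^2] = 6$, is arbitrary:

import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(shape=2.0, scale=1.0, size=100_000)
mu1 = x.mean()                  # first moment, E[X] (close to 2)
mu2 = ((x - mu1) ** 2).mean()   # second central moment, var(X) (close to 2)
xi2 = (x ** 2).mean()           # second non-central moment, E[X^2] (close to 6)
q90 = np.quantile(x, 0.90)      # empirical 90% quantile
print(mu1, mu2, xi2, q90)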
Example 1: Binomial experiment again
Model: observation of i.i.d. Bernoulli variables
$$X_i \sim \mathcal{B}(p)$$
Single parameter $p$, with
$$\mathbb{E}[X] = p, \qquad \text{var}(X) = p(1 - p)$$
[somewhat boring...]
Median and mode
For i.i.d. Binomial variables
$$X_i \sim \mathcal{B}(n, p), \qquad P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k},$$
still a single parameter $p$, with
$$\mathbb{E}[X] = np, \qquad \text{var}(X) = np(1 - p)$$
[somewhat less boring!]
Median and mode
Example 2: Normal experiment again
Model: observation of i.i.d. Normal variates
$$x_i \sim \mathcal{N}(\mu, \sigma^2), \qquad i = 1, \dots, n,$$
with unknown parameters $\mu$ and $\sigma > 0$ (e.g., blood pressure)
$$\mu_1 = \mathbb{E}[X] = \mu, \qquad \text{var}(X) = \sigma^2, \qquad \mu_3 = 0, \qquad \mu_4 = 3\sigma^4$$
Median and mode equal to $\mu$
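A quick simulation check of these central moments (numpy assumed; mu and sigma are arbitrary):

import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)
c = x - x.mean()
print((c ** 3).mean())                  # mu_3: close to 0
print((c ** 4).mean(), 3 * sigma ** 4)  # mu_4: close to 3*sigma^4 = 48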
Exponential families
A class of parametric densities with nice analytic properties.
Start from the normal $\mathcal{N}(\theta, 1)$ density:
$$\varphi(x; \theta) = \frac{1}{\sqrt{2\pi}} \exp\left\{\theta x - x^2/2 - \theta^2/2\right\} = \frac{\exp\{-\theta^2/2\}}{\sqrt{2\pi}}\, \exp\{\theta x\}\, \exp\{-x^2/2\}$$
where $\theta$ and $x$ only interact through a single exponential product
Definition
A parametric family of distributions on $\mathcal{X}$ is an exponential family if its density with respect to a measure $\mu$ satisfies
$$f(x \mid \theta) = c(\theta)\, h(x)\, \exp\{T(x)^{\mathsf{T}} \tau(\theta)\}, \qquad \theta \in \Theta,$$
where $T(\cdot)$ and $\tau(\cdot)$ are $k$-dimensional functions and $c(\cdot)$ and $h(\cdot)$ are positive unidimensional functions.
The function $c(\cdot)$ is redundant, being defined by the normalising constraint on the density.
Exponential families (examples)
Example 1: Binomial experiment again
The Binomial variable
$$X \sim \mathcal{B}(n, p), \qquad P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k},$$
can be expressed as
$$P(X = k) = (1 - p)^n \binom{n}{k} \exp\{k \log(p/(1 - p))\},$$
hence
$$c(p) = (1 - p)^n, \qquad h(x) = \binom{n}{x}, \qquad T(x) = x, \qquad \tau(p) = \log(p/(1 - p))$$
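A sketch verifying this factorisation numerically against the Binomial pmf (scipy assumed; n and p are arbitrary):

import numpy as np
from scipy.special import comb
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(n + 1)
# c(p) * h(k) * exp{T(k) * tau(p)} with the components identified above
factorised = (1 - p) ** n * comb(n, k) * np.exp(k * np.log(p / (1 - p)))
print(np.allclose(factorised, binom.pmf(k, n, p)))  # True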
Example 2: Normal experiment again
The Normal variate
$$X \sim \mathcal{N}(\mu, \sigma^2)$$
with parameter $\theta = (\mu, \sigma^2)$ has density
$$f(x \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\{-(x - \mu)^2/2\sigma^2\} = \frac{\exp\{-\mu^2/2\sigma^2\}}{\sqrt{2\pi\sigma^2}} \exp\{-x^2/2\sigma^2 + x\mu/\sigma^2\},$$
hence
$$c(\theta) = \frac{\exp\{-\mu^2/2\sigma^2\}}{\sqrt{2\pi\sigma^2}}, \qquad T(x) = \begin{pmatrix} x^2 \\ x \end{pmatrix}, \qquad \tau(\theta) = \begin{pmatrix} -1/2\sigma^2 \\ \mu/\sigma^2 \end{pmatrix}$$
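A similar numeric check for the Normal factorisation, with $h(x) = 1$ (scipy assumed):

import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.5, 4.0
x = np.linspace(-5.0, 8.0, 101)
c = np.exp(-mu ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
T = np.stack([x ** 2, x])                         # natural statistic (x^2, x)
tau = np.array([-1 / (2 * sigma2), mu / sigma2])  # tau(theta)
factorised = c * np.exp(tau @ T)
print(np.allclose(factorised, norm.pdf(x, mu, np.sqrt(sigma2))))  # True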
Definition
In an exponential family, the natural parameter is $\eta = \tau(\theta)$ and the natural parameter space is
$$\mathcal{N} = \left\{\eta \in \mathbb{R}^k;\ \int_{\mathcal{X}} h(x) \exp\{T(x)^{\mathsf{T}} \eta\}\, \mathrm{d}\mu(x) < \infty\right\}$$
Example: for the $\mathcal{B}(m, p)$ distribution, the natural parameter is
$$\eta = \log\{p/(1 - p)\}$$
and the natural parameter space is $\mathcal{N} = \mathbb{R}$
Regular and minimal exponential families
It is possible to add or delete useless components of $T$:
Definition
A regular exponential family corresponds to the case where $\mathcal{N}$ is an open set. A minimal exponential family corresponds to the case where the $T_i(X)$'s are linearly independent, i.e.
$$P_\theta\left(\lambda^{\mathsf{T}} T(X) = \text{const.}\right) = 0 \qquad \text{for } \lambda \neq 0$$
Also called a non-degenerate exponential family; this is the usual assumption when working with exponential families.
Illustrations
For a Normal $\mathcal{N}(\mu, \sigma^2)$ distribution,
$$f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}} \frac{1}{\sigma} \exp\{-x^2/2\sigma^2 + \mu x/\sigma^2 - \mu^2/2\sigma^2\}$$
means this is a two-dimensional minimal exponential family.
For a fourth-power distribution,
$$f(x \mid \theta) = C_\theta \exp\{-(x - \theta)^4\} \propto e^{-x^4}\, e^{4\theta^3 x - 6\theta^2 x^2 + 4\theta x^3 - \theta^4},$$
the expansion implies this is a three-dimensional exponential family, with natural statistic $T(x) = (x, x^2, x^3)$ [Exercise]
Convexity properties
Highly regular densities:
Theorem
The natural parameter space $\mathcal{N}$ of an exponential family is convex, and the inverse normalising constant $c^{-1}(\eta)$ is a convex function.
Example: for $\mathcal{B}(n, p)$, the natural parameter space is $\mathbb{R}$ and the inverse normalising constant $(1 + \exp(\eta))^n$ is convex.
Analytic properties
Lemma
If the density of $X$ has the minimal representation
$$f(x \mid \eta) = c(\eta)\, h(x)\, \exp\{T(x)^{\mathsf{T}} \eta\},$$
then the natural statistic $T(X)$ is also from an exponential family, and there exists a measure $\mu_T$ such that the density of $T(X)$ against $\mu_T$ is
$$c(\eta) \exp\{t^{\mathsf{T}} \eta\}$$
Theorem
If the density of $T(X)$ against $\mu_T$ is $c(\eta) \exp\{t^{\mathsf{T}} \eta\}$, and if the real-valued function $\varphi$ is measurable with
$$\int |\varphi(t)|\, \exp\{t^{\mathsf{T}} \eta\}\, \mathrm{d}\mu_T(t) < \infty$$
on the interior of $\mathcal{N}$, then
$$f : \eta \longmapsto \int \varphi(t)\, \exp\{t^{\mathsf{T}} \eta\}\, \mathrm{d}\mu_T(t)$$
is an analytic function on the interior of $\mathcal{N}$, and
$$\nabla f(\eta) = \int t\, \varphi(t)\, \exp\{t^{\mathsf{T}} \eta\}\, \mathrm{d}\mu_T(t)$$
i.e. differentiation can be carried out under the integral sign.
Moments of exponential families
The normalising constant $c(\eta)$ induces all the moments:
Proposition
If $T(x) \in \mathbb{R}^d$ and the density of $T(X)$ is $\exp\{T(x)^{\mathsf{T}} \eta - \psi(\eta)\}$, then
$$\mathbb{E}_\eta\left[\exp\{T(x)^{\mathsf{T}} u\}\right] = \exp\{\psi(\eta + u) - \psi(\eta)\},$$
so that $\psi(\eta)$ is the cumulant generating function. In particular,
$$\mathbb{E}_\eta[T_i(x)] = \frac{\partial \psi(\eta)}{\partial \eta_i}, \qquad i = 1, \dots, d,$$
and
$$\text{cov}(T_i(X), T_j(X)) = \frac{\partial^2 \psi(\eta)}{\partial \eta_i \partial \eta_j}, \qquad i, j = 1, \dots, d$$
This amounts to a sort of integration by parts in parameter space.
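For the Binomial family above, $\psi(\eta) = n \log(1 + e^\eta)$, so the proposition can be checked by finite differences; a sketch assuming numpy:

import numpy as np

n, p = 10, 0.3
eta = np.log(p / (1 - p))                  # natural parameter
psi = lambda e: n * np.log(1 + np.exp(e))  # cumulant generating function
h = 1e-5
d1 = (psi(eta + h) - psi(eta - h)) / (2 * h)               # E[T] = n*p = 3.0
d2 = (psi(eta + h) - 2 * psi(eta) + psi(eta - h)) / h**2   # var(T) = n*p*(1-p) = 2.1
print(d1, d2)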
Further examples of exponential families
Example
The chi-square $\chi^2_k$ distribution, corresponding to the distribution of $X_1^2 + \cdots + X_k^2$ when $X_i \sim \mathcal{N}(0, 1)$, with density
$$f_k(z) = \frac{z^{k/2 - 1} \exp\{-z/2\}}{2^{k/2}\, \Gamma(k/2)}, \qquad z \in \mathbb{R}_+$$
Counter-example
The non-central chi-square $\chi^2_k(\lambda)$ distribution, corresponding to the distribution of $X_1^2 + \cdots + X_k^2$ when $X_i \sim \mathcal{N}(\mu, 1)$, with density
$$f_{k,\lambda}(z) = \tfrac{1}{2}\, (z/\lambda)^{k/4 - 1/2} \exp\{-(z + \lambda)/2\}\, I_{k/2 - 1}(\sqrt{\lambda z}), \qquad z \in \mathbb{R}_+,$$
where $\lambda = k\mu^2$ and $I_{k/2 - 1}$ is a modified Bessel function
Counter-example
The Fisher $F_{n,m}$ distribution, corresponding to the ratio
$$Z = \frac{Y_n/n}{Y_m/m}, \qquad Y_n \sim \chi^2_n,\ Y_m \sim \chi^2_m,$$
with density
$$f_{n,m}(z) = \frac{(n/m)^{n/2}}{B(n/2, m/2)}\, z^{n/2 - 1}\, \left(1 + (n/m)\, z\right)^{-(n + m)/2}$$
Example
The Beta $\mathcal{B}e(n/2, m/2)$ distribution, corresponding to the distribution of
$$Z = \frac{nY}{nY + m} \qquad \text{when } Y \sim F_{n,m},$$
has density
$$f_{n,m}(z) = \frac{1}{B(n/2, m/2)}\, z^{n/2 - 1}\, (1 - z)^{m/2 - 1}, \qquad z \in (0, 1)$$
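This Beta representation can be checked by simulation; a sketch assuming scipy:

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, m = 5, 7
y = stats.f.rvs(n, m, size=100_000, random_state=rng)  # Y ~ F(n, m)
z = n * y / (n * y + m)                                 # transform to (0, 1)
print(stats.kstest(z, stats.beta(n / 2, m / 2).cdf))    # no discrepancy expected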