This document summarizes a talk on inference on treatment effects after model selection. It discusses the challenges of inferring treatment effects after refitting a model selected by a procedure such as the lasso: the refitted estimator can be biased through both overfitting and underfitting of the selected model. The talk proposes repeated data splitting to remove the overfitting bias. In each split, one part of the data is used for model selection and the other part for estimating the treatment effect, so the estimation step is free of overfitting bias. This approach reduces bias relative to simply refitting the selected model on the full data.
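As a rough illustration of the splitting idea (not the speaker's exact procedure), the sketch below selects controls with a cross-validated lasso on one half of a simulated dataset, then runs plain OLS for the treatment coefficient on the other half. The data-generating process and all constants are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p = 400, 50
X = rng.normal(size=(n, p))                       # candidate controls
d = X[:, 0] + rng.normal(size=n)                  # treatment, confounded by X[:, 0]
y = 1.5 * d + 2.0 * X[:, 0] + rng.normal(size=n)  # true effect = 1.5

half = np.arange(n) < n // 2                      # boolean mask: selection half
# Stage 1: lasso on the selection half picks which controls matter.
lasso = LassoCV(cv=5).fit(np.column_stack([d, X])[half], y[half])
keep = np.flatnonzero(lasso.coef_[1:] != 0)       # indices of selected controls

# Stage 2: OLS on the held-out half with treatment + selected controls.
# Selection noise is independent of this half, so the usual standard
# error is not contaminated by the selection step.
n2 = int((~half).sum())
Z = np.column_stack([np.ones(n2), d[~half], X[~half][:, keep]])
beta, *_ = np.linalg.lstsq(Z, y[~half], rcond=None)
resid = y[~half] - Z @ beta
sigma2 = resid @ resid / (n2 - Z.shape[1])
se = float(np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]))
print(f"treatment effect ~ {beta[1]:.2f} +/- {1.96 * se:.2f}")
```

The talk's repeated-splitting proposal averages over many such splits; this sketch shows a single one.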
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model (weekendsunny)
This document summarizes the author's approach to predicting short-term stock price movements in the 2010 INFORMS Data Mining Contest. The author began with support vector machines and logistic regression, then tried LASSO (logistic regression with variable selection) and other methods, eventually settling on a two-stage variable-selection method that uses LASSO on lagged data to select variables for a generalized linear model, which achieved 3rd place. The document outlines the basic analysis, the variable-selection methods explored (both traditional approaches and the L1-penalized LASSO), and the results of testing the use of future information against the evaluation criteria.
This document discusses approximate Bayesian computation (ABC) methods for performing Bayesian inference when the likelihood function is intractable. ABC methods approximate the posterior distribution by simulating data under different parameter values and selecting simulations that match the observed data based on summary statistics. The document outlines how ABC originated in population genetics to model complex demographic scenarios and mutation processes. It then describes the basic ABC rejection sampling algorithm and how it provides an approximation of the posterior distribution by sampling from regions of high density defined by the summary statistics.
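A minimal rejection-ABC sketch for a toy normal-mean model may help fix ideas; the prior, summary statistic, and tolerance below are illustrative choices, not ones taken from the document.

```python
import numpy as np

rng = np.random.default_rng(1)
y_obs = rng.normal(loc=2.0, scale=1.0, size=100)      # "observed" data
s_obs = y_obs.mean()                                  # summary statistic

# Plain ABC rejection: draw theta from the prior, simulate a dataset,
# keep theta when the simulated summary lands within eps of the
# observed summary.
n_draws, eps = 50_000, 0.05
theta = rng.normal(0.0, 10.0, size=n_draws)           # prior: N(0, 10^2)
sims = rng.normal(theta[:, None], 1.0, size=(n_draws, 100))
accepted = theta[np.abs(sims.mean(axis=1) - s_obs) < eps]
print(f"{accepted.size} accepted; posterior mean ~ {accepted.mean():.2f}")
```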
Large sample property of the Bayes factor in a spline semiparametric regressi... (Alexander Decker)
This document summarizes a research paper about investigating the large sample property of the Bayes factor for testing the polynomial component of a spline semiparametric regression model against a fully spline alternative model. It considers a semiparametric regression model where the mean function has two parts - a parametric linear component and a nonparametric penalized spline component. By representing the model as a mixed model, it obtains the closed form of the Bayes factor and proves that the Bayes factor is consistent under certain conditions on the prior and design matrix. It establishes that the Bayes factor converges to infinity under the pure polynomial model and converges to zero almost surely under the spline semiparametric alternative model.
This document summarizes several statistical methods for handling non-ignorable nonresponse in data, including maximum likelihood estimation, partial likelihood approaches, generalized method of moments, and exponential tilting methods. It discusses full likelihood-based maximum likelihood estimation using the observed data likelihood and EM algorithm. Partial likelihood approaches like conditional likelihood and pseudo likelihood are presented as alternatives that use a subset of the observed data.
This document discusses Bayesian inference on mixture models. It covers several key topics:
1. Density approximation and consistency results for mixtures as a way to approximate unknown distributions.
2. The "scarcity phenomenon" where the posterior probabilities of most component allocations in mixture models are zero, concentrating on just a few high probability allocations.
3. Challenges with Bayesian inference for mixtures, including identifiability issues, label switching, and complex combinatorial calculations required to integrate over all possible component allocations.
This document discusses predictive mean matching (PMM) imputation in survey sampling. It begins with an outline and overview of the basic setup, assumptions, and PMM imputation method. It then presents three main theorems: 1) the asymptotic normality of the PMM estimator when the regression parameter β* is known, 2) the asymptotic normality when β* is estimated, and 3) the asymptotic properties of nearest neighbor imputation. The document also discusses variance estimation for the PMM estimator using replication methods like the bootstrap or jackknife. In summary, it provides a theoretical analysis of the asymptotic properties of PMM imputation and approaches for estimating the variance.
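For concreteness, here is a bare-bones sketch of PMM with a linear working regression and 1-nearest-neighbor donor matching; the data-generating process and response mechanism are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
respond = rng.random(n) < 0.7          # ~70% respondents (missingness ignorable here)

# Fit the working regression on respondents to get predicted means.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[respond], y[respond], rcond=None)
mu = X @ beta                          # predicted mean for every unit

# PMM: each nonrespondent borrows the *observed* y of the respondent
# whose predicted mean is nearest (1-NN matching on mu).
donors = np.flatnonzero(respond)
y_imp = y.copy()
for i in np.flatnonzero(~respond):
    j = donors[np.argmin(np.abs(mu[donors] - mu[i]))]
    y_imp[i] = y[j]

print(f"imputed mean ~ {y_imp.mean():.2f} vs complete-case {y[respond].mean():.2f}")
```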
The Bayesian paradigm provides a coherent approach for quantifying uncertainty given available data and prior information. Aspects of uncertainty that arise in practice include uncertainty regarding parameters within a model, the choice of model, and propagation of uncertainty in parameters and models for predictions. In this talk I will present Bayesian approaches for addressing model uncertainty given a collection of competing models including model averaging and ensemble methods that potentially use all available models and will highlight computational challenges that arise in implementation of the paradigm.
Some sampling techniques for big data analysis (Jae-kwang Kim)
This document describes different sampling techniques for big data analysis, including reservoir sampling and its variants. It provides an example to illustrate simple random sampling and calculates the expected value and variance of sampling errors. It then discusses probability sampling and its advantages over non-probability sampling. The document also introduces survey sampling and challenges in the era of big data, as well as how sampling techniques can still be useful for handling big data. It outlines reservoir sampling and two methods to improve it: balanced reservoir sampling and stratified reservoir sampling. A simulation study is described to compare the performance of these reservoir sampling methods.
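The variants described above build on the classic one-pass reservoir scheme (Algorithm R), sketched below in plain Python; the stream and sample size are arbitrary.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Algorithm R: a uniform without-replacement sample of size k
    from a stream of unknown length, in one pass and O(k) memory."""
    rng = random.Random(seed)
    reservoir = []
    for t, item in enumerate(stream):      # t is 0-based
        if t < k:
            reservoir.append(item)         # fill the reservoir first
        else:
            j = rng.randint(0, t)          # uniform on {0, ..., t}
            if j < k:                      # replace with probability k/(t+1)
                reservoir[j] = item
    return reservoir

print(reservoir_sample(range(10_000), k=5))
```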
Collaborative filtering using orthogonal nonnegative matrix (AllenWu)
This document summarizes a research paper that proposes using orthogonal nonnegative matrix tri-factorization (ONMTF) to fuse model-based and memory-based collaborative filtering approaches. ONMTF is used to co-cluster users and items to obtain centroids that are then used to select similar users and items for predicting unknown ratings. Experimental results on movie rating datasets show the ONMTF approach improves prediction accuracy over other collaborative filtering methods.
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive functioning. Exercise causes chemical changes in the brain that may help protect against mental illness and improve symptoms.
This document discusses generative and discriminative classifiers. Generative classifiers model the joint distribution of data and labels, while discriminative classifiers directly model the conditional probability of labels given data. Naive Bayes is an example of a generative classifier, while logistic regression is a discriminative classifier that directly models the probability of a label given input features. The document provides mathematical details on naive Bayes, logistic regression, and how logistic regression can be trained to maximize conditional likelihood through gradient descent.
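A compact numpy sketch of the discriminative training described above: logistic regression fit by gradient ascent on the conditional log-likelihood, with a simulated design (all constants are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 1000, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept + features
w_true = np.array([-0.5, 1.0, -2.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ w_true))).astype(float)

# Maximize sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)] by gradient
# ascent; the gradient of the mean log-likelihood is X^T (y - p) / n.
w = np.zeros(X.shape[1])
lr = 0.5
for _ in range(2000):
    pr = 1 / (1 + np.exp(-X @ w))
    w += lr * X.T @ (y - pr) / n
print(np.round(w, 2))   # should approach w_true
```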
This chapter discusses continuous random variables and their probability density functions. It introduces the normal and exponential distributions and how to calculate probabilities and descriptive statistics for continuous random variables. It also shows how to approximate the binomial distribution using the normal distribution. The chapter objectives are to introduce continuous random variables, discuss the normal distribution and standard normal table, and demonstrate the normal approximation to the binomial distribution.
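As a worked instance of the normal approximation, the snippet below compares the exact Binomial(100, 0.5) probability with the continuity-corrected normal value; the numbers are illustrative, not taken from the chapter.

```python
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5      # mean 50, sd 5

exact = binom.cdf(55, n, p)                      # P(X <= 55) exactly
approx = norm.cdf((55 + 0.5 - mu) / sigma)       # continuity correction
print(f"exact {exact:.4f} vs normal approximation {approx:.4f}")
```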
This document summarizes approximate Bayesian computation (ABC) methods. It begins with an overview of ABC, which provides a likelihood-free rejection technique for Bayesian inference when the likelihood function is intractable. The ABC algorithm works by simulating parameters and data until the simulated and observed data are close according to some distance measure and tolerance level. The document then discusses the asymptotic properties of ABC, including consistency of ABC posteriors and rates of convergence under certain assumptions. It also notes relationships between ABC and k-nearest neighbor methods. Examples applying ABC to autoregressive time series models are provided.
This document proposes an approximate Bayesian inference method for estimating propensity scores under nonresponse. It involves treating the estimating equations as random variables and assigning a prior distribution to the transformed parameters. Samples are drawn from the posterior distribution of the parameters given the observed data to make inferences. The method is shown to be asymptotically consistent and confidence regions can be constructed from the posterior samples. Extensions are discussed to incorporate auxiliary variables and perform Bayesian model selection by assigning a spike-and-slab prior over the model parameters.
This document summarizes key concepts from Chapter 2 of a book on statistical methods for handling incomplete data. It introduces the likelihood-based approach and defines key terms like the likelihood function, maximum likelihood estimator, Fisher information, and missing at random. The chapter also provides examples of observed likelihood functions for censored regression and survival analysis models with missing data.
This document discusses approximate Bayesian computation (ABC) for model choice between multiple models. It introduces the ABC algorithm for model choice, which approximates the posterior probabilities of models given the data by simulating parameters from the prior and accepting simulations based on the distance between simulated and observed sufficient statistics. Issues with choosing sufficient statistics that apply to all models are discussed. The document also examines the limiting behavior of the ABC approximation to the Bayes factor as the tolerance approaches 0 and infinity. It notes that discrepancies can arise if sufficient statistics are not cross-model sufficient. An example comparing Poisson and geometric models demonstrates this.
Interpretable Sparse Sliced Inverse Regression for digitized functional data (tuxette)
The document discusses interpretable sparse sliced inverse regression (IS-SIR) for functional data regression. It begins with background on using metamodels as proxies for computationally expensive agronomic models to understand relationships between climate inputs and plant outputs. SIR is presented as a semi-parametric regression technique that identifies relevant subspaces to predict outputs from functional inputs. The proposal involves combining SIR with automatic interval selection to point out interpretable predictor intervals. Simulations are discussed to evaluate the proposed method.
The document discusses approximate Bayesian computation (ABC), a simulation-based method for conducting Bayesian inference when the likelihood function is intractable or impossible to compute directly. ABC works by simulating data under different parameter values, and accepting simulations that are close to the observed data according to some distance measure. The document covers the basic ABC algorithm, convergence properties as the tolerance approaches zero, examples of ABC for probit models and MA time series models, and advances such as modifying the proposal distribution to increase efficiency.
The document discusses estimation of multi-Granger network causal models from time series data. It proposes a joint modeling approach to estimate vector autoregressive (VAR) models for multiple time series datasets simultaneously. The key steps are:
1. Estimate the inverse covariance matrices for each dataset using a factor model approach.
2. Use the estimated inverse covariance matrices in a generalized fused lasso optimization to jointly estimate the VAR coefficient matrices for each dataset.
Simulation results show the joint modeling approach improves estimation of the VAR coefficients and reduces forecasting error compared to estimating the models separately, especially when the number of time points is small. The factor modeling approach also provides a better estimate of the inverse covariance than using the empirical estimate.
ABC with data cloning for MLE in state space models (Umberto Picchini)
An application of the "data cloning" method for parameter estimation via MLE aided by Approximate Bayesian Computation. The relevant paper is http://arxiv.org/abs/1505.06318
Approximate Bayesian model choice via random forests (Christian Robert)
The document describes approximate Bayesian computation (ABC) methods for model choice when likelihoods are intractable. ABC generates parameter-dataset pairs from the prior and retains those where the simulated and observed datasets are similar according to a distance measure on summary statistics. For model choice, ABC approximates posterior model probabilities by the proportion of simulations from each model that are retained. Machine learning techniques can also be used to infer the most likely model directly from the simulated summary statistics.
This document discusses approximate Bayesian computation (ABC) techniques for performing Bayesian inference when the likelihood function is not available in closed form. It covers the basic ABC algorithm and discusses challenges with high-dimensional data. It also summarizes recent advances in ABC that incorporate nonparametric regression, reproducing kernel Hilbert spaces, and neural networks to help address these challenges.
Erica Rutter presents non-parametric techniques for estimating tumor heterogeneity from data. She describes using a Prohorov metric framework to determine the approximate distributions of diffusion (D) and growth (ρ) parameters from data, without assumptions about their distributions. She creates synthetic data from a known ρ distribution and solves the inverse problem to estimate ρ, comparing solutions using delta functions and spline functions with varying numbers of nodes. The Akaike Information Criterion is used to select the optimal number of nodes. Representative results show the estimated ρ distribution matching the true distribution well.
This document provides an overview of Approximate Bayesian Computation (ABC) methods for Bayesian model choice. ABC methods allow Bayesian inference when the likelihood function is intractable or unavailable. The ABC algorithm works by simulating parameters from the prior and accepting simulations where the simulated and observed data are close according to some distance measure and tolerance level. ABC outputs an approximation of the posterior distribution. An example application is presented for choosing a probit model for diabetes risk using data on Pima Indian women.
This document provides an overview of machine learning approaches for sequential data. It discusses Hidden Markov Models (HMMs), which model sequential data as a Markov process but cannot capture long-range dependencies. Window-based approaches consider a window of features but cannot model label dependencies. Maximum entropy models predict labels discriminatively but cannot model sequences. Conditional random fields (CRFs) are discriminative models that can jointly predict whole sequences by modeling dependencies between labels and features. CRFs overcome limitations of previous approaches by capturing long-range patterns in sequential data.
The document discusses using random forests for approximate Bayesian computation (ABC) model choice. It proposes:
1. Using random forests to infer a model from summary statistics, as random forests can handle a large number of statistics and find efficient combinations.
2. Replacing estimates of posterior model probabilities, which are poorly approximated, with posterior predictive expected losses to evaluate models.
3. An example comparing MA(1) and MA(2) time series models using two autocorrelations as summaries, noting that the models are embedded and that random forests perform similarly to other methods on small problems.
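A toy version of this pipeline, with all simulation settings invented for illustration: simulate MA(1) and MA(2) series, summarize each by its first two autocorrelations, and let a random forest pick the model for an "observed" series.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

def ma_series(thetas, T=200):
    """Simulate x_t = e_t + theta_1 e_{t-1} + ... for an MA(q) model."""
    q = len(thetas)
    e = rng.normal(size=T + q)
    x = e[q:].copy()
    for k, th in enumerate(thetas, start=1):
        x += th * e[q - k:q - k + T]
    return x

def summaries(x):
    """Lag-1 and lag-2 autocorrelations."""
    xc = x - x.mean()
    denom = xc @ xc
    return [xc[:-1] @ xc[1:] / denom, xc[:-2] @ xc[2:] / denom]

# Reference table: simulate from each model with parameters drawn from a
# uniform (illustrative) prior; labels are the model indices.
S, labels = [], []
for _ in range(2000):
    S.append(summaries(ma_series([rng.uniform(-1, 1)])))
    labels.append(1)
    S.append(summaries(ma_series([rng.uniform(-1, 1), rng.uniform(-1, 1)])))
    labels.append(2)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(S, labels)
x_obs = ma_series([0.6, 0.3])                 # truth: MA(2)
print("chosen model:", clf.predict([summaries(x_obs)])[0])
```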
Regression is a method used for prediction problems involving continuous or ordered target variables. It models the relationship between predictor variables and a dependent variable. Linear regression finds the best fitting straight line to model this relationship, while nonlinear regression can model more complex relationships. Regularization techniques like ridge and lasso regression can help reduce overfitting. Regression trees and other models extend regression to handle categorical predictors. Evaluation metrics measure the accuracy of numeric predictions versus actual values.
Factors are categorical variables; the distinct values such a variable can take are called levels. In this talk, we consider the variable selection problem where the set of potential predictors contains both factors and numerical variables. Formally, this problem is a particular case of the standard variable selection problem, where factors are coded using dummy variables. As such, the Bayesian solution would seem straightforward and, possibly because of this, the problem has not received much attention in the literature despite its importance. Nevertheless, we show that this perception is illusory and that in fact several inputs, like the assignment of prior probabilities over the model space or the parameterization adopted for factors, may have a large (and difficult to anticipate) impact on the results. We provide a solution to these issues that extends the proposals in the standard variable selection problem and does not depend on how the factors are coded using dummy variables. Our approach is illustrated with a real example concerning a childhood obesity study in Spain.
Authors: Gonzalo Garcia-Donato and Rui Paulo
Bayesian Variable Selection in Linear Regression and A Comparison (Atilla YARDIMCI)
In this study, Bayesian approaches such as Zellner's, Occam's Window, and Gibbs sampling are compared in terms of selecting the correct subset of variables in a linear regression model. The aim of this comparison is to analyze Bayesian variable selection and the behavior of classical criteria under different values of β and σ and different prior expectation levels.
Regression analysis models the relationship between variables, where the dependent variable is modeled as a function of one or more independent variables. Linear regression models take forms such as straight-line, polynomial, Fourier, and interaction models. Multiple linear regression is useful for understanding variable effects, predicting values, and finding relationships between multiple independent and dependent variables. Methods like robust, stepwise, ridge, and partial least squares regression address issues like outliers, multicollinearity, and correlated predictors. Response surface and generalized linear models extend linear regression to nonlinear relationships. Multivariate regression models multiple dependent variables.
Regression analysis models the relationship between variables, including dependent and independent variables. Linear regression models take forms like straight lines, polynomials, trigonometric, and interaction terms. Multiple linear regression is useful for understanding variable effects, predicting values, and dealing with multicollinearity using methods like ridge regression, partial least squares, and stepwise regression. Nonlinear and generalized linear models also describe nonlinear relationships. Multivariate regression involves multiple response variables.
This document discusses multiple linear regression. It begins by explaining linear regression and its applications. It then discusses multiple linear regression, where there is more than one independent variable. As an example, it describes using multiple linear regression to estimate company profits based on various independent variables. The document provides resources for learning more about linear regression in Python.
We approach the screening problem - i.e. detecting which inputs of a computer model significantly impact the output - from a formal Bayesian model selection point of view. That is, we place a Gaussian process prior on the computer model and consider the $2^p$ models that result from assuming that each of the subsets of the $p$ inputs affects the response. The goal is to obtain the posterior probabilities of each of these models. In this talk, we focus on the specification of objective priors on the model-specific parameters and on convenient ways to compute the associated marginal likelihoods. These two problems, which are normally seen as unrelated, have challenging connections, since the priors proposed in the literature are specifically designed to have posterior modes on the boundary of the parameter space, hence precluding the application of approximate integration techniques based on, e.g., Laplace approximations. We explore several ways of circumventing this difficulty, comparing different methodologies with synthetic examples taken from the literature.
Authors: Gonzalo Garcia-Donato (Universidad de Castilla-La Mancha) and Rui Paulo (Universidade de Lisboa)
We describe different approaches for specifying models and prior distributions for estimating heterogeneous treatment effects using Bayesian nonparametric models. We make an affirmative case for direct, informative (or partially informative) prior distributions on heterogeneous treatment effects, especially when treatment effect size and treatment effect variation are small relative to other sources of variability. We also consider how to provide scientifically meaningful summaries of complicated, high-dimensional posterior distributions over heterogeneous treatment effects with appropriate measures of uncertainty.
Maximum likelihood estimation of regularisation parameters in inverse problem... (Valentin De Bortoli)
This document discusses an empirical Bayesian approach for estimating regularization parameters in inverse problems using maximum likelihood estimation. It proposes the Stochastic Optimization with Unadjusted Langevin (SOUL) algorithm, which uses Markov chain sampling to approximate gradients in a stochastic projected gradient descent scheme for optimizing the regularization parameter. The algorithm is shown to converge to the maximum likelihood estimate under certain conditions on the log-likelihood and prior distributions.
This document summarizes a journal article that proposes an alternative approach to variable selection called the KL adaptive lasso. The KL adaptive lasso replaces the squared error loss used in traditional adaptive lasso with Kullback-Leibler divergence loss. The paper shows that the KL adaptive lasso enjoys oracle properties, meaning it performs as well as if the true underlying model was given. Specifically, it consistently selects the true variables and estimates their coefficients at optimal rates. The KL adaptive lasso can also be solved using efficient algorithms like LARS. The approach is extended to generalized linear models, and theoretical properties are discussed.
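The KL-loss variant has no off-the-shelf solver, but the standard adaptive lasso that the paper modifies can be sketched with the usual rescaling trick (pilot weights from an OLS fit; data simulated for illustration):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(5)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = np.array([3.0, -2.0] + [0.0] * (p - 2))   # sparse truth
y = X @ beta + rng.normal(size=n)

# Adaptive lasso via feature rescaling: with weights w_j = 1/|beta_init_j|,
# minimizing ||y - Xb||^2 + lam * sum_j w_j |b_j| is equivalent to a plain
# lasso on the columns X_j * |beta_init_j|, followed by unscaling.
init = LinearRegression().fit(X, y).coef_        # pilot estimate
scale = np.abs(init)
fit = LassoCV(cv=5).fit(X * scale, y)
beta_hat = fit.coef_ * scale
print(np.round(beta_hat, 2))                     # zeros on the noise variables
```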
This document presents a presentation on regression analysis submitted to Dr. Adeel. It includes:
- An introduction to regression analysis and its uses in measuring relationships between variables and making predictions.
- Methods for studying regression including graphically, algebraically using least squares, and deviations from means.
- An example calculating regression equations using data on students' grades and scores through least squares and deviations from means.
- Conclusion that the regression equations match those obtained through other common methods.
1. The document discusses approximate Bayesian computation (ABC), a technique used when the likelihood function is intractable. ABC works by simulating parameters from the prior and simulating data, rejecting simulations that are not close to the observed data based on a tolerance level.
2. Random forests can be used in ABC to select informative summary statistics from a large set of possibilities and estimate parameters. The random forests classify simulations as accepted or rejected based on the summaries, implicitly selecting important summaries.
3. Calibrating the tolerance level in ABC is important but difficult, as it determines how close simulations must be to the observed data. Methods discussed include using quantiles of prior predictive simulations or asymptotic convergence properties.
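A common quantile-based calibration can be sketched on the same kind of toy normal-mean problem (all settings illustrative): simulate a reference table first, then set the tolerance to a small quantile of the realized distances.

```python
import numpy as np

rng = np.random.default_rng(6)
y_obs = rng.normal(2.0, 1.0, size=100)
s_obs = y_obs.mean()

# Build a reference table of (theta, summary) pairs, then calibrate eps
# as a small quantile of the distances instead of fixing it a priori.
theta = rng.normal(0.0, 10.0, size=50_000)            # prior draws
s_sim = rng.normal(theta[:, None], 1.0, size=(50_000, 100)).mean(axis=1)
dist = np.abs(s_sim - s_obs)
eps = np.quantile(dist, 0.001)                        # keep the closest 0.1%
post = theta[dist <= eps]
print(f"eps ~ {eps:.3f}; posterior mean ~ {post.mean():.2f}")
```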
The document describes a method for identifying small inclusions (or point sources) embedded in a medium from multistatic boundary measurements. It proposes using a sampling method combined with the reciprocity gap concept. This allows identifying the locations and properties of small inclusions without needing to compute the full background Green's tensor. Numerical validation is provided by testing the method on multiple small inclusion configurations.
Asymptotic properties of Bayes factor in one-way repeated measurements model (Alexander Decker)
1) The document discusses asymptotic properties of Bayes factors for testing linear models in one-way repeated measurements designs.
2) It considers a linear mixed model with one within-subject factor and one between-subject factor, including random unit effects and error.
3) The authors investigate the consistency of the Bayes factor for testing a fixed effects model against this mixed model alternative. Under certain conditions on priors and design matrices, they derive the analytic form of the Bayes factor and show it is consistent.
A big task often faced by practitioners is deciding on the appropriate model to adopt when fitting count datasets. This paper investigates a suitable model for fitting highly skewed count datasets. Among other models, the COM-Poisson regression model is proposed for fitting count data because of its varying normalizing constant. Several statistical models are investigated alongside the proposed model, including the Poisson, Negative Binomial, Zero-Inflated, Zero-Inflated Poisson, and Quasi-Poisson models. A real-life dataset on visits to the doctor within a given period is also used to test the behavior of the underlying models. Based on the findings, it is recommended that the COM-Poisson regression model be adopted for fitting highly skewed count datasets irrespective of the type of dispersion.
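COM-Poisson fitting needs a dedicated package, but the kind of baseline comparison the paper runs can be sketched with statsmodels on simulated overdispersed counts (the data-generating choices below are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
# Overdispersed counts whose mean depends on x: negative binomial with
# mean mu (numpy's (n, p) parameterization gives mean n*(1-p)/p = mu).
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(2, 2 / (2 + mu))

X = sm.add_constant(x)
pois = sm.GLM(y, X, family=sm.families.Poisson()).fit()
nb = sm.GLM(y, X, family=sm.families.NegativeBinomial()).fit()
print(f"Poisson AIC {pois.aic:.1f} vs NegBin AIC {nb.aic:.1f}")  # NB should win
```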
Control Synthesis by Sum of Squares Optimization (Behzad Samadi)
The document outlines a presentation on control synthesis using sum of squares optimization. It begins with an introduction to convex optimization and sum of squares analysis. It then discusses applications of these techniques to control systems and stability analysis. The document provides examples of using sum of squares to solve global optimization problems and verify stability of nonlinear systems.
The International Journal of Engineering and Science (The IJES) (theijes)
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Similar to MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment Effects after Model Selection, Jingshen Wang, April 30, 2019
Recently, the machine learning community has expressed strong interest in applying latent variable modeling strategies to causal inference problems with unobserved confounding. Here, I discuss one of the big debates that occurred over the past year, and how we can move forward. I will focus specifically on the failure of point identification in this setting, and discuss how this can be used to design flexible sensitivity analyses that cleanly separate identified and unidentified components of the causal model.
I will discuss paradigmatic statistical models of inference and learning from high dimensional data, such as sparse PCA and the perceptron neural network, in the sub-linear sparsity regime. In this limit the underlying hidden signal, i.e., the low-rank matrix in PCA or the neural network weights, has a number of non-zero components that scales sub-linearly with the total dimension of the vector. I will provide explicit low-dimensional variational formulas for the asymptotic mutual information between the signal and the data in suitable sparse limits. In the setting of support recovery these formulas imply sharp 0-1 phase transitions for the asymptotic minimum mean-square-error (or generalization error in the neural network setting). A similar phase transition was analyzed recently in the context of sparse high-dimensional linear regression by Reeves et al.
Many different measurement techniques are used to record neural activity in the brains of different organisms, including fMRI, EEG, MEG, lightsheet microscopy and direct recordings with electrodes. Each of these measurement modes have their advantages and disadvantages concerning the resolution of the data in space and time, the directness of measurement of the neural activity and which organisms they can be applied to. For some of these modes and for some organisms, significant amounts of data are now available in large standardized open-source datasets. I will report on our efforts to apply causal discovery algorithms to, among others, fMRI data from the Human Connectome Project, and to lightsheet microscopy data from zebrafish larvae. In particular, I will focus on the challenges we have faced both in terms of the nature of the data and the computational features of the discovery algorithms, as well as the modeling of experimental interventions.
1) The document presents a statistical modeling approach called targeted smooth Bayesian causal forests (tsbcf) to smoothly estimate heterogeneous treatment effects over gestational age using observational data from early medical abortion regimens.
2) The tsbcf method extends Bayesian additive regression trees (BART) to estimate treatment effects that evolve smoothly over gestational age, while allowing for heterogeneous effects across patient subgroups.
3) The tsbcf analysis of early medical abortion regimen data found the simultaneous administration to be similarly effective overall to the interval administration, but identified some patient subgroups where effectiveness may vary more over gestational age.
Difference-in-differences is a widely used evaluation strategy that draws causal inference from observational panel data. Its causal identification relies on the assumption of parallel trends, which is scale-dependent and may be questionable in some applications. A common alternative is a regression model that adjusts for the lagged dependent variable, which rests on the assumption of ignorability conditional on past outcomes. In the context of linear models, Angrist and Pischke (2009) show that the difference-in-differences and lagged-dependent-variable regression estimates have a bracketing relationship. Namely, for a true positive effect, if ignorability is correct, then mistakenly assuming parallel trends will overestimate the effect; in contrast, if the parallel trends assumption is correct, then mistakenly assuming ignorability will underestimate the effect. We show that the same bracketing relationship holds in general nonparametric (model-free) settings. We also extend the result to semiparametric estimation based on inverse probability weighting.
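A small simulation makes the bracketing direction concrete. The data-generating process below is invented for illustration: ignorability conditional on the lagged outcome holds and treated units start lower, so DiD should overshoot the true effect while the lagged-dependent-variable regression recovers it.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 5000
g = rng.random(n) < 0.5                       # treated-group indicator
y0 = rng.normal(size=n) - 1.0 * g             # pre-period: treated start lower
tau = 1.0                                     # true treatment effect
y1 = 0.8 * y0 + 0.5 + tau * g + rng.normal(size=n)   # post-period outcome

# Difference-in-differences: between-group gap in (post - pre) changes.
did = (y1[g] - y0[g]).mean() - (y1[~g] - y0[~g]).mean()

# Lagged-dependent-variable regression: y1 on (1, g, y0); correctly
# specified here, since ignorability holds conditional on y0.
Z = np.column_stack([np.ones(n), g.astype(float), y0])
beta, *_ = np.linalg.lstsq(Z, y1, rcond=None)
print(f"DiD ~ {did:.2f} (overshoots), LDV ~ {beta[1]:.2f}, truth = {tau}")
```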
We develop sensitivity analyses for weak nulls in matched observational studies while allowing unit-level treatment effects to vary. In contrast to randomized experiments and paired observational studies, we show for general matched designs that over a large class of test statistics, any valid sensitivity analysis for the weak null must be unnecessarily conservative if Fisher's sharp null of no treatment effect for any individual also holds. We present a sensitivity analysis valid for the weak null, and illustrate why it is conservative if the sharp null holds through connections to inverse probability weighted estimators. An alternative procedure is presented that is asymptotically sharp if treatment effects are constant, and is valid for the weak null under additional assumptions which may be deemed reasonable by practitioners. The methods may be applied to matched observational studies constructed using any optimal without-replacement matching algorithm, allowing practitioners to assess robustness to hidden bias while allowing for treatment effect heterogeneity.
This document discusses difference-in-differences (DiD) analysis, a quasi-experimental method used to estimate treatment effects. The author notes that while widely applicable, DiD relies on strong assumptions about the counterfactual. She recommends approaches like matching on observed variables between similar populations, thoughtfully specifying regression models to adjust for confounding factors, testing for parallel pre-treatment trends under different assumptions, and considering more complex models that allow for different types of changes over time. The overall message is that DiD requires careful consideration and testing of its underlying assumptions to draw valid causal conclusions.
We present recent advances and statistical developments for evaluating Dynamic Treatment Regimes (DTR), which allow the treatment to be dynamically tailored according to evolving subject-level data. Identification of an optimal DTR is a key component for precision medicine and personalized health care. Specific topics covered in this talk include several recent projects with robust and flexible methods developed for the above research area. We will first introduce a dynamic statistical learning method, adaptive contrast weighted learning (ACWL), which combines doubly robust semiparametric regression estimators with flexible machine learning methods. We will further develop a tree-based reinforcement learning (T-RL) method, which builds an unsupervised decision tree that maintains the nature of batch-mode reinforcement learning. Unlike ACWL, T-RL handles the optimization problem with multiple treatment comparisons directly through a purity measure constructed with augmented inverse probability weighted estimators. T-RL is robust, efficient and easy to interpret for the identification of optimal DTRs. However, ACWL seems more robust against tree-type misspecification than T-RL when the true optimal DTR is non-tree-type. At the end of this talk, we will also present a new Stochastic-Tree Search method called ST-RL for evaluating optimal DTRs.
A fundamental feature of evaluating causal health effects of air quality regulations is that air pollution moves through space, rendering health outcomes at a particular population location dependent upon regulatory actions taken at multiple, possibly distant, pollution sources. Motivated by studies of the public-health impacts of power plant regulations in the U.S., this talk introduces the novel setting of bipartite causal inference with interference, which arises when 1) treatments are defined on observational units that are distinct from those at which outcomes are measured and 2) there is interference between units in the sense that outcomes for some units depend on the treatments assigned to many other units. Interference in this setting arises due to complex exposure patterns dictated by physical-chemical atmospheric processes of pollution transport, with intervention effects framed as propagating across a bipartite network of power plants and residential zip codes. New causal estimands are introduced for the bipartite setting, along with an estimation approach based on generalized propensity scores for treatments on a network. The new methods are deployed to estimate how emission-reduction technologies implemented at coal-fired power plants causally affect health outcomes among Medicare beneficiaries in the U.S.
Laine Thomas presented on how causal inference is being used to determine the costs and benefits of the two most common surgical treatments for women: hysterectomy and myomectomy.
We provide an overview of some recent developments in machine learning tools for dynamic treatment regime discovery in precision medicine. The first development is a new off-policy reinforcement learning tool for continual learning in mobile health to enable patients with type 1 diabetes to exercise safely. The second development is a new inverse reinforcement learning tool that enables the use of observational data to learn how clinicians balance competing priorities when treating depression and mania in patients with bipolar disorder. Both practical and technical challenges are discussed.
The method of differences-in-differences (DID) is widely used to estimate causal effects. The primary advantage of DID is that it can account for time-invariant bias from unobserved confounders. However, the standard DID estimator will be biased if there is an interaction between history in the after period and the groups. That is, bias will be present if an event besides the treatment occurs at the same time and affects the treated group in a differential fashion. We present a method of bounds based on DID that accounts for an unmeasured confounder that has a differential effect in the post-treatment time period. These DID bracketing bounds are simple to implement and only require partitioning the controls into two separate groups. We also develop two key extensions for DID bracketing bounds. First, we develop a new falsification test to probe the key assumption that is necessary for the bounds estimator to provide consistent estimates of the treatment effect. Next, we develop a method of sensitivity analysis that adjusts the bounds for possible bias based on differences between the treated and control units from the pretreatment period. We apply these DID bracketing bounds and the new methods we develop to an application on the effect of voter identification laws on turnout. Specifically, we focus estimating whether the enactment of voter identification laws in Georgia and Indiana had an effect on voter turnout.
This document summarizes a simulation study evaluating causal inference methods for assessing the effects of opioid and gun policies. The study used real US state-level data to simulate the adoption of policies by some states and estimated the effects using different statistical models. It found that with fewer adopting states, type 1 error rates were too high, and most models lacked power. It recommends using cluster-robust standard errors and lagged outcomes to improve model performance. The study aims to help identify best practices for policy evaluation studies.
We study experimental design in large-scale stochastic systems with substantial uncertainty and structured cross-unit interference. We consider the problem of a platform that seeks to optimize supply-side payments p in a centralized marketplace where different suppliers interact via their effects on the overall supply-demand equilibrium, and propose a class of local experimentation schemes that can be used to optimize these payments without perturbing the overall market equilibrium. We show that, as the system size grows, our scheme can estimate the gradient of the platform’s utility with respect to p while perturbing the overall market equilibrium by only a vanishingly small amount. We can then use these gradient estimates to optimize p via any stochastic first-order optimization method. These results stem from the insight that, while the system involves a large number of interacting units, any interference can only be channeled through a small number of key statistics, and this structure allows us to accurately predict feedback effects that arise from global system changes using only information collected while remaining in equilibrium.
We discuss a general roadmap for generating causal inference from observational studies used to generate real-world evidence. We review targeted minimum loss estimation (TMLE), which provides a general template for constructing asymptotically efficient plug-in estimators of a target estimand for realistic (i.e., infinite-dimensional) statistical models. TMLE is a two-stage procedure whose first stage uses ensemble machine learning, termed super-learning, to estimate the relevant stochastic relations between the treatment, censoring, covariates, and outcome of interest. The super-learner allows one to fully utilize advances in machine learning (in addition to more conventional parametric-model-based estimators) to build a single, most powerful ensemble learning algorithm. We present the Highly Adaptive Lasso as an important machine learning algorithm to include.
In the second stage, TMLE maximizes a parametric likelihood along a so-called least favorable parametric submodel through the super-learner fit of the relevant stochastic relations in the observed data. This second stage bridges the state of the art in machine learning to estimators of target estimands for which statistical inference is available (i.e., confidence intervals, p-values, etc.). We also review recent advances in collaborative TMLE, in which the fit of the treatment and censoring mechanisms is tailored to the performance of the TMLE, and we discuss asymptotically valid bootstrap-based inference. Simulations and data analyses are provided as demonstrations.
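To make the two-stage template concrete, the following is a minimal sketch of TMLE for the average treatment effect with a binary outcome. It substitutes a plain logistic regression for the super-learner in stage one and omits censoring; all names (tmle_ate, W, A, Y) are hypothetical, and this is a sketch of the template rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.special import expit, logit
from scipy.optimize import minimize_scalar

def tmle_ate(W, A, Y):
    """Minimal TMLE sketch for the ATE with binary Y (no censoring).
    Stage 1 uses plain logistic regressions where the talk uses super-learning."""
    # Stage 1: initial outcome regression Q(A, W) and propensity score g(W).
    Q_fit = LogisticRegression(max_iter=1000).fit(np.column_stack([A, W]), Y)
    g = np.clip(LogisticRegression(max_iter=1000).fit(W, A).predict_proba(W)[:, 1],
                0.025, 0.975)
    Q1 = np.clip(Q_fit.predict_proba(np.column_stack([np.ones_like(A), W]))[:, 1],
                 1e-6, 1 - 1e-6)
    Q0 = np.clip(Q_fit.predict_proba(np.column_stack([np.zeros_like(A), W]))[:, 1],
                 1e-6, 1 - 1e-6)
    QA = np.where(A == 1, Q1, Q0)
    # Stage 2: fluctuate along the least favorable parametric submodel,
    # logit Q_eps = logit Q + eps * H, with "clever covariate" H.
    H = A / g - (1 - A) / (1 - g)
    def nll(e):
        p = np.clip(expit(logit(QA) + e * H), 1e-9, 1 - 1e-9)
        return -np.mean(Y * np.log(p) + (1 - Y) * np.log(1 - p))
    eps = minimize_scalar(nll, bounds=(-2, 2), method="bounded").x  # 1-dim MLE
    # Targeted predictions and plug-in ATE.
    Q1_star = expit(logit(Q1) + eps / g)
    Q0_star = expit(logit(Q0) - eps / (1 - g))
    return np.mean(Q1_star - Q0_star)
```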
Climate change mitigation has traditionally been analyzed as some version of a public goods game (PGG) in which a group is most successful if everybody contributes, but players are best off individually by not contributing anything (i.e., “free-riding”)—thereby creating a social dilemma. Analysis of climate change using the PGG and its variants has helped explain why global cooperation on GHG reductions is so difficult, as nations have an incentive to free-ride on the reductions of others. Rather than inspire collective action, it seems that the lack of progress in addressing the climate crisis is driving the search for a “quick fix” technological solution that circumvents the need for cooperation.
This document discusses various types of academic writing and provides tips for effective academic writing. It outlines common academic writing formats such as journal papers, books, and reports. It also lists writing necessities like having a clear purpose, understanding your audience, using proper grammar and being concise. The document cautions against plagiarism and not proofreading. It provides additional dos and don'ts for writing, such as using simple language and avoiding filler words. Overall, the key message is that academic writing requires selling your ideas effectively to the reader.
Machine learning (including deep and reinforcement learning) and blockchain are two of the most prominent technologies of recent years. The first is the foundation of artificial intelligence and big data; the second has significantly disrupted the financial industry. Both technologies are data-driven, so there is rapidly growing interest in integrating them for more secure and efficient data sharing and analysis. In this paper, we review research on combining blockchain and machine learning and demonstrate that they can collaborate efficiently and effectively. We close by pointing out some future directions and anticipating more research on deeper integration of these two promising technologies.
In this talk, we discuss QuTrack, a Blockchain-based approach to track experiment and model changes primarily for AI and ML models. In addition, we discuss how change analytics can be used for process improvement and to enhance the model development and deployment processes.
This talk builds on recent empirical work addressing the extent to which the transaction graph serves as an early-warning indicator for large financial losses. By identifying certain sub-graphs ('chainlets') with a causal effect on price movements, we demonstrate the impact of extreme transaction-graph activity on the intraday volatility of the Bitcoin price series. In particular, we infer the loss distributions conditional on extreme chainlet activity. Armed with this empirical representation, we propose a modeling approach to explore conditions under which the market is stabilized by transaction-graph-aware agents.
8–9. Literature review
Post-selection inference
Uniform inference: Berk et al. (2013), Bachoc et al. (2016), Kuchibhotla et al. (2018)
Data splitting: Rinaldo et al. (2016), Fithian et al. (2014)
Selective (conditional) inference: Lee et al. (2016), Zhao et al. (2017), Tian and Taylor (2018)
Commonality of these different approaches: a data-dependent target β_M̂.
In this talk: the structural parameter α as target.
12. Key points of the talk
Refitting approach
α̂_refit is biased: it has both over-fitting and under-fitting components.
We provide statistical insight into this bias.
We develop a repeated data splitting procedure to remove the bias.
Cross-fitting is not as efficient as repeated data splitting.
14–19. High-dimensional approximately linear model
Model setup
Y = αD + Xβ + R_n + ε,  E(ε | D, X) = 0.
α: parameter of interest
D: treatment or variable of interest
X: high-dimensional covariates (e.g., basis functions for nonparametric regression functions)
ε: noise
β: sparse vector of coefficients, i.e., M₀ = {j : β_j ≠ 0, j = 1, …, p}, |M₀| = s₀ ≪ p
R_n: approximation error
Under the Neyman-Rubin causal model and the unconfoundedness assumption, α is the causal effect.
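For concreteness, here is a minimal data-generating sketch for this model. The design choices are illustrative assumptions of ours, not the talk's, and the approximation error R_n is set to zero; the sketches further below reuse this function.

```python
import numpy as np

def simulate(n=200, p=300, s0=5, alpha=1.0, seed=0):
    """Draw one sample from Y = alpha*D + X beta + eps with a sparse beta.
    Illustrative design; the approximation error R_n is set to zero."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s0] = 1.0                     # support M0 = {1, ..., s0}, s0 << p
    D = 0.3 * X[:, :s0].sum(axis=1) + rng.standard_normal(n)  # D correlated with confounders
    Y = alpha * D + X @ beta + rng.standard_normal(n)
    return Y, D, X, beta
```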
20–22. Common perception and challenges for inference after refitting
A common perception
Inference after refitting is valid, because many model selection methods satisfy the "oracle property" (Fan and Li, 2001):
lim_{n→∞} P(M̂ = M₀) = 1.
Challenges
The "oracle property" requires stringent assumptions.
Perfect model selection does not happen with high probability in finite samples.
42. Percentage of perfect model selection vs. model size
Perfect model selection never happens with high probability.
43–46. Summary: Refitting bias of M̂
α̂_refit − α = e₁ᵀ(Z_M̂ᵀ Z_M̂)⁻¹ Z_M̂ᵀ ε  [over-fitting]  +  (Dᵀ(I − P_M̂)D)⁻¹ Dᵀ(I − P_M̂)Xβ  [under-fitting],
where Z_M̂ = (D, X_M̂) and P_M̂ = X_M̂(X_M̂ᵀ X_M̂)⁻¹ X_M̂ᵀ.
Over-fitting and under-fitting bias
If M̂ ⊂ M₀, α̂_refit has under-fitting bias (omitted-variable bias).
If M₀ ⊂ M̂, α̂_refit has over-fitting bias due to spurious correlation (Fan):
E(α̂_refit − α) = E[ e₁ᵀ(Z_M̂ᵀ Z_M̂)⁻¹ Z_M̂ᵀ E(ε | Z_M̂) ].
Over- and under-fitting bias may occur simultaneously.
Hong et al. (2018) and Chernozhukov et al. (2018) discussed a similar bias issue.
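The decomposition is an exact algebraic identity, which the following sketch verifies numerically on data from the simulate() sketch above, with cross-validated lasso as a stand-in selection step (not necessarily the talk's selector).

```python
import numpy as np
from sklearn.linear_model import LassoCV

alpha_true = 1.0
Y, D, X, beta = simulate(alpha=alpha_true, seed=1)   # sketch above
eps = Y - alpha_true * D - X @ beta                  # noise, known by construction

# Stand-in selection on the full data: lasso on (D, X); D is always kept.
coef = LassoCV(cv=5).fit(np.column_stack([D, X]), Y).coef_
M_hat = np.flatnonzero(coef[1:])
X_M = X[:, M_hat]
Z_M = np.column_stack([D, X_M])

# Over-fitting term: e1' (Z_M' Z_M)^{-1} Z_M' eps.
overfit = np.linalg.solve(Z_M.T @ Z_M, Z_M.T @ eps)[0]
# Under-fitting term: (D'(I - P_M)D)^{-1} D'(I - P_M) X beta, via residualized D.
D_res = D - X_M @ np.linalg.solve(X_M.T @ X_M, X_M.T @ D)
underfit = (D_res @ (X @ beta)) / (D_res @ D)

alpha_refit = np.linalg.solve(Z_M.T @ Z_M, Z_M.T @ Y)[0]
print(alpha_refit - alpha_true, overfit + underfit)  # identical up to rounding
```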
47–51. Removing the over-fitting bias by data splitting
Suppose that M₀ ⊂ M̂. Then the refitted estimator simplifies to
α̂_refit − α = e₁ᵀ(Z_M̂ᵀ Z_M̂)⁻¹ Z_M̂ᵀ ε.
Remove the over-fitting bias by data splitting (Mosteller and Tukey, 1977): split the sample into two parts T₁ and T₂, select the model M̂ on T₁, and refit on T₂ (see the sketch below).
On T₂, the over-fitting bias vanishes since E(ε_T₂ | Z_M̂) = 0.
Data splitting removes the over-fitting bias, but it increases the estimation variability.
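Here is a minimal sketch of one such split, again with cross-validated lasso as a stand-in selector; the names (split_refit, T1, T2) are hypothetical, and an equal split is one choice among many.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def split_refit(Y, D, X, rng):
    """One data split: select the model on T1, refit by OLS on T2.
    The refit on T2 carries no over-fitting bias since E(eps_T2 | Z_M) = 0."""
    n = len(Y)
    idx = rng.permutation(n)
    T1, T2 = idx[: n // 2], idx[n // 2:]
    coef = LassoCV(cv=5).fit(np.column_stack([D[T1], X[T1]]), Y[T1]).coef_
    M_hat = np.flatnonzero(coef[1:])                # selection uses T1 only
    Z2 = np.column_stack([D[T2], X[T2][:, M_hat]])  # refit design on T2
    alpha_hat = np.linalg.solve(Z2.T @ Z2, Z2.T @ Y[T2])[0]
    return alpha_hat, M_hat, T2
```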
56–59. R-Split: Repeated Data Splitting
On each split k, the estimate α̂_k depends on the data and on the random subsample indices.
In theory, B → ∞ and
α̃ = E(α̂_k | Data).
In practice, B is a large number, e.g., B = 1000.
R-Split is similar to bagging (Breiman, 1996).
Sub-samples for both estimation and model selection are random and can overlap across splits.
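A minimal R-Split sketch, aggregating the split_refit() sketch above over B random splits; it also records which observations entered each refit, which the variance-estimation sketch at the end of the document consumes.

```python
import numpy as np

def r_split(Y, D, X, B=1000, seed=0):
    """Average the split estimates over B random splits, approximating
    alpha_tilde = E(alpha_hat_k | Data); also record refit indicators V."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    alphas = np.empty(B)
    V = np.zeros((B, n))        # V[b, j] = 1 if obs. j is refit data in split b
    for b in range(B):
        alphas[b], _, T2 = split_refit(Y, D, X, rng)  # sketch above
        V[b, T2] = 1.0
    return alphas.mean(), alphas, V
```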
64–66. Cross-fitting vs. R-Split
α̂_cv − α = (1/2)( e₁ᵀΣ_M̂₁⁻¹ I_M̂₁ + e₁ᵀΣ_M̂₂⁻¹ I_M̂₂ ) (1/n) Σ_{i=1}^n ε_i Z_i + o_p(1/√n),
where I_M̂ denotes the embedding of the coordinates in M̂ (cf. Assumption 1 below).
Variance decomposition of α̂_cv
Var(α̂_cv − α) = E[ Var( (1/2)( e₁ᵀΣ_M̂₁⁻¹ I_M̂₁ + e₁ᵀΣ_M̂₂⁻¹ I_M̂₂ ) (1/n) Σ_{i=1}^n ε_i Z_i | Data ) ] + Var(α̃ − α) ≥ Var(α̃ − α),
where the second term, the variance of the conditional expectation given the data, is the variance of R-Split.
If M̂₁ = M̂₂ = M₀, then Var(α̂_cv − α) = Var(α̃ − α).
R-Split reduces the variance by aggregating over all possible random models.
67. R-Split: Asymptotic Normality
Theorem (R-Split). Under certain assumptions, the R-Split estimator has the linear representation
α̃ − α = η_nᵀ (1/n) Σ_{i=1}^n ε_i Z_i + o_p(1/√n),
and thus
σ_n⁻¹ √n (α̃ − α) ⇝ N(0, 1),
with σ_n = σ_ε (η_nᵀ Σ_n η_n)^{1/2}, Σ_n = ZᵀZ/n, and Z = (D, X).
68–73. R-Split: Regularity assumptions
Assumption 1 (Characterization of η_n). There exists a random vector η_n ∈ R^{p+1}, independent of ε, satisfying
‖ E[ P(e₁ᵀΣ_M̂⁻¹) | Data ] − η_n ‖₁ = o_p(1/log p),
where P : R^{|M̂|} → R^{p+1} is an embedding that sparsifies a vector.
Special case: suppose M̂ = M₀ for all splits. Then
η_{n,j} = (e₁ᵀΣ_M₀⁻¹)_j if j ∈ M₀, and η_{n,j} = 0 otherwise,
and therefore
α̃ − α = e₁ᵀΣ_M₀⁻¹ (1/n) Σ_{i=1}^n ε_i Z_{i,M₀} + o_p(1/√n).
For the fixed model M₀, α̃ reduces to OLS based on the full sample; our theory generalizes OLS from fixed to random models.
Assumption 2 (Negligible under-fitting bias). The under-fitting bias is negligible after averaging over all splits.
Assumption 3 ("Robust" model selection procedure). The distribution of M̂ remains stable if only one out of n observations changes.
Assumption 4 (Sparsity level). The selected model sizes are of the same order as s₀, and s₀ = o(n).
74–75. Conclusion
Refitting approach
The bias of α̂_refit is composed of two parts: under-fitting and over-fitting.
R-Split (repeated data splitting) removes the over-fitting bias without much sacrifice of efficiency.
R-Split is more efficient than cross-fitting.
Jingshen Wang, Xuming He, and Gongjun Xu. Debiased inference on treatment effect in a high-dimensional model. Journal of the American Statistical Association, 2019.
76. References
François Bachoc, David Preinerstorfer, and Lukas Steinberger. Uniformly valid confidence intervals post-model-selection. arXiv preprint arXiv:1611.01043, 2016.
Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, and Linda Zhao. Valid post-selection inference. The Annals of Statistics, 41(2):802–837, 2013.
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
Sougata Chaudhuri, Abraham Bagherjeiran, and James Liu. Ranking and calibrating click-attributed purchases in performance display advertising. In Proceedings of the ADKDD'17, page 7. ACM, 2017.
Victor Chernozhukov, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney Newey, and James Robins. Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1):C1–C68, 2018.
Bradley Efron. Estimation and accuracy after model selection. Journal of the American Statistical Association, 109(507):991–1007, 2014.
Robert F. Engle, Clive W. J. Granger, John Rice, and Andrew Weiss. Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81(394):310–320, 1986.
Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360, 2001.
William Fithian, Dennis Sun, and Jonathan Taylor. Optimal inference after model selection. arXiv preprint arXiv:1410.2597, 2014.
Liang Hong, Todd A. Kuffner, and Ryan Martin. On overfitting and post-selection uncertainty assessments. Biometrika, 105(1):221–224, 2018.
Arun Kumar Kuchibhotla, Lawrence D. Brown, Andreas Buja, Edward I. George, and Linda Zhao. A model free perspective for linear regression: Uniform-in-model bounds for post selection inference. arXiv preprint arXiv:1802.05801, 2018.
Jason D. Lee, Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor. Exact post-selection inference, with application to the lasso. The Annals of Statistics, 44(3):907–927, 2016.
Frederick Mosteller and John Wilder Tukey. Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley Series in Behavioral Science: Quantitative Methods, 1977.
Max Pashkevich, Sundar Dorai-Raj, Melanie Kellar, and Dan Zigmond. Empowering online advertisements by empowering viewers with the right to choose: The relative effectiveness of skippable video advertisements on YouTube. Journal of Advertising Research, 52(4):451–457, 2012.
Alessandro Rinaldo, Larry Wasserman, Max G'Sell, Jing Lei, and Ryan Tibshirani. Bootstrapping and sample splitting for high-dimensional, assumption-free inference. arXiv preprint arXiv:1611.05401, 2016.
Xiaoying Tian and Jonathan Taylor. Selective inference with a randomized response. The Annals of Statistics, 46(2):679–710, 2018.
Allen J. Wilcox and Ian T. Russell. Birthweight and perinatal mortality: I. On the frequency distribution of birthweight. International Journal of Epidemiology, 12(3):314–318, 1983.
J. Yerushalmy. The relationship of parents' cigarette smoking to outcome of pregnancy: Implications as to the problem of inferring causation from observed associations. American Journal of Epidemiology, 93(6):443–443, 1971.
Qingyuan Zhao, Dylan S. Small, and Ashkan Ertefaie. Selective inference for effect modification via the lasso. arXiv preprint arXiv:1705.08020, 2017.
78. R-Split: estimation of the variance
Estimator of the variance of α̃
By the non-parametric delta method, we have
σ̂_n² = n Σ_{j=1}^n [ ((n − 1)/(n − n₂)) B⁻¹ Σ_{b=1}^B (v_bj − B⁻¹ Σ_{k=1}^B v_kj) α̂_b ]²  (approximation of the squared influence function)
      − (n₂ n / (B²(n − n₂))) Σ_{b=1}^B (α̂_b − α̃)²  (finite-B bias correction),
where
B: the number of repeated data splits;
n₂: the size of the sample used for refitting;
v_bj = 1 if the j-th observation is used for refitting in the b-th sub-sample, and 0 otherwise.
Note: this is a generalization of the nonparametric delta method for bootstrapping in Efron (2014).
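A sketch of this estimator, consuming the alphas and V produced by the r_split() sketch above; this is our reconstruction of the displayed formula, so treat the constants as approximate.

```python
import numpy as np

def r_split_variance(alphas, V, n):
    """Delta-method variance estimate for alpha_tilde (Efron-2014 style)."""
    B = len(alphas)
    n2 = int(V[0].sum())                         # refit subsample size
    # cov_j = B^{-1} sum_b (v_bj - vbar_j) * alpha_hat_b, for each observation j.
    cov_j = ((V - V.mean(axis=0)) * alphas[:, None]).mean(axis=0)
    main = n * np.sum((((n - 1) / (n - n2)) * cov_j) ** 2)   # squared influence part
    bias = n2 * n / (B ** 2 * (n - n2)) * np.sum((alphas - alphas.mean()) ** 2)
    return (main - bias) / n                     # Var(alpha_tilde) ~ sigma_n^2 / n

# Hypothetical usage, combining the sketches above; by the asymptotic normality
# theorem, a 95% confidence interval follows directly:
# alpha_tilde, alphas, V = r_split(Y, D, X, B=1000)
# se = np.sqrt(r_split_variance(alphas, V, len(Y)))
# ci = (alpha_tilde - 1.96 * se, alpha_tilde + 1.96 * se)
```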