The document discusses high-dimensional sparse econometric models where the number of predictors (p) is much larger than the sample size (n). It outlines an approach for estimating regression functions using penalization methods like the LASSO. Specifically, it discusses:
1. Using the LASSO estimator to minimize squared errors while penalizing the l1-norm of coefficients, inducing sparsity.
2. Choosing the optimal penalty level as a function of the error variance and sample size. Variants like the square-root LASSO provide a tuning-free approach.
3. Examples showing how sparse approximations can better capture patterns in population data than traditional low-dimensional approximations.
Econometrics of High-Dimensional Sparse Models
1. Econometrics of High-Dimensional Sparse Models (p much larger than n)
Victor Chernozhukov and Christian Hansen
NBER, July 2013
2. Outline
1. High-Dimensional Sparse Framework
   ◮ The Framework
   ◮ Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
3. Outline for Econometric Theory of "Big Data"
Part I.
1. High-Dimensional Sparse Models (HDSM)
   ◮ Models
   ◮ Motivating Examples
2. Estimation of Regression Functions via Penalization and Selection Methods
   ◮ ℓ1-penalization or LASSO methods
   ◮ post-selection estimators or Post-Lasso methods
Part II.
3. Estimation and Inference in IV regression with Many Instruments
4. Estimation and Inference on Treatment Effects with Many Controls in a Partially Linear Model
5. Generalizations.
4. Materials
1. A semi-review article:
   ◮ Belloni, Chernozhukov, and Hansen, "Inference in High-Dimensional Sparse Econometric Models", 2010, Advances in Economics and Econometrics, 10th World Congress. http://arxiv.org/pdf/1201.0220v1.pdf
2. Research articles listed in the references.
3. Stata and/or Matlab codes are available for most empirical examples via links to be posted at www.mit.edu/~vchern/.
5. Part I.
6. Outline
7-8. High-Dimensional Sparse Econometric Model
HDSM. A response variable y_i obeys
   y_i = x_i'β_0 + ǫ_i,  ǫ_i ∼ (0, σ²),  i = 1, ..., n,
where x_i are p-dimensional; w.l.o.g. we normalize each regressor:
   x_i = (x_ij, j = 1, ..., p)',  (1/n) Σ_{i=1}^n x_ij² = 1.
p is possibly much larger than n.
The key assumption is sparsity: the number of relevant regressors is much smaller than the sample size,
   s := ‖β_0‖_0 = Σ_{j=1}^p 1{β_0j ≠ 0} ≪ n.
This generalizes the traditional parametric framework used in empirical economics by allowing the identity
   T = {j ∈ {1, ..., p} : β_0j ≠ 0}
of the relevant s regressors to be unknown.
9. Motivation for high p
◮ transformations of basic regressors z_i,  x_i = (P_1(z_i), ..., P_p(z_i))',
   ◮ for example, in wage regressions, the P_j's are polynomials or B-splines in education and experience;
◮ and/or simply a very large list of regressors,
   ◮ a list of country characteristics in cross-country growth regressions (Barro & Lee),
   ◮ housing characteristics in hedonic regressions (American Housing Survey),
   ◮ price and product characteristics at the point of purchase (scanner data, TNS),
   ◮ judge characteristics in the analysis of economic impacts of eminent domain.
10. From Sparsity to Approximate Sparsity
◮ The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:
   s := ‖β_0‖_0 = Σ_{j=1}^p 1{β_0j ≠ 0} ≪ n.
◮ The idea is that a low-dimensional (s-dimensional) submodel accurately approximates the full p-dimensional model. Here the approximation error is in fact zero.
◮ The approximately sparse model allows for a non-zero approximation error:
   y_i = x_i'β_0 + r_i + ǫ_i,
   where x_i'β_0 + r_i is the regression function, and the approximation error is not bigger than the size of the estimation error, namely as n → ∞,
   s log p / n → 0  and  √((1/n) Σ_{i=1}^n r_i²) ≲ σ √(s/n) → 0.
11. 
◮ Example:
   y_i = Σ_{j=1}^∞ θ_j x_j + ǫ_i,  |θ_j| ≲ j^{-a},  a > 1/2,
   has s = σ n^{1/(2a)}, because we need only the s regressors with the largest coefficients to have
   √((1/n) Σ_{i=1}^n r_i²) ≲ σ √(s/n).
◮ The approximately sparse model generalizes the exact sparse model by letting in approximation error.
◮ This model also generalizes the traditional series/sieve regression model by letting the identity
   T = {j ∈ {1, ..., p} : β_0j ≠ 0}
   of the most important s series terms be unknown.
◮ All results we present are for the approximately sparse model.
12. Example 1: Series Models of the Wage Function
◮ In this example, we abstract away from estimation questions by using population/census data. In order to visualize the idea of approximate sparsity, consider a contrived example.
◮ Consider a series expansion of the conditional expectation E[y_i | z_i] of wage y_i given education z_i.
◮ A conventional series approximation to the regression function is, for example,
   E[y_i | z_i] = β_1 + β_2 P_1(z_i) + β_3 P_2(z_i) + β_4 P_3(z_i) + r_i,
   where P_1, ..., P_3 are low-order polynomials (or other terms).
13. [Figure: Traditional Approximation of Expected Wage Function using Polynomials; wage (6.0-7.0) plotted against education (8-20).]
◮ In the figure, the true regression function E[y_i | z_i] is computed using U.S. Census data, year 2000, prime-age white men.
14. 
◮ Can we find a much better series approximation, with the same number of parameters?
◮ Yes, if we can capture the oscillatory behavior of E[y_i | z_i] in some regions.
◮ We consider a "very long" expansion
   E[y_i | z_i] = Σ_{j=1}^p β_0j P_j(z_i) + r_i',
   with polynomials and dummy variables, and shop around just for a few terms that capture the "oscillations".
◮ We do this using the LASSO – which finds a parsimonious model by minimizing squared errors while penalizing the size of the model through the sum of absolute values of coefficients. In this example we can also find the "right" terms by "eye-balling".
15. [Figure: Lasso Approximation of Expected Wage Function using Polynomials and Dummies; wage (6.0-7.0) plotted against education (8-20).]
16. [Figure: Traditional vs Lasso Approximation of Expected Wage Functions with Equal Number of Parameters; wage (6.0-7.0) plotted against education (8-20).]
17. Errors of Traditional and Lasso-Based Sparse Approximations

                                      RMSE    Max Error
Conventional Series Approximation     0.135   0.290
Lasso-Based Series Approximation      0.031   0.063

Notes.
1. The conventional approximation relies on a low-order polynomial with 4 parameters.
2. The sparse approximation relies on a combination of polynomials and dummy variables and also has 4 parameters.

Conclusion. The examples show how the new framework nests and expands the traditional parsimonious modelling framework used in empirical economics.
18. Outline
19. 2. Estimation of Regression Functions via L1-Penalization and Selection
◮ When p is large, it is a good idea to do selection or penalization to prevent overfitting. Ideally, we would like to minimize a BIC-type criterion function
   (1/n) Σ_{i=1}^n [y_i − x_i'β]² + λ‖β‖_0,   ‖β‖_0 = Σ_{j=1}^p 1{β_j ≠ 0},
   but this is not computationally feasible – it is NP-hard.
◮ A solution (Frank and Friedman, 1994; Tibshirani, 1996) is to replace the ℓ0 "norm" by the closest convex function – the ℓ1-norm. The LASSO estimator β̂ then minimizes
   (1/n) Σ_{i=1}^n [y_i − x_i'β]² + λ‖β‖_1,   ‖β‖_1 = Σ_{j=1}^p |β_j|.
   This problem is globally convex and computable in polynomial time. The kink in the penalty induces the solution β̂ to have lots of zeroes, so it is often used as a model selection device.
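For illustration only, the LASSO program above can be solved with off-the-shelf tools such as scikit-learn; the authors provide Stata/Matlab code, so this Python sketch and the helper name `lasso_fit` are my own. Note that sklearn's Lasso minimizes (1/(2n))‖y − Xβ‖² + α‖β‖_1, so α = λ/2 under the slide's normalization.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fit(X, y, lam):
    """Minimize (1/n) sum_i (y_i - x_i'b)^2 + lam * ||b||_1 (slide normalization).

    sklearn's Lasso minimizes (1/(2n))||y - Xb||^2 + alpha*||b||_1,
    so pass alpha = lam / 2. Columns are assumed pre-normalized so that
    (1/n) sum_i x_ij^2 = 1, as on the earlier slides.
    """
    return Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y).coef_

# toy usage with a sparse design and p > n (values are illustrative)
rng = np.random.default_rng(0)
n, p, s = 100, 500, 6
X = rng.standard_normal((n, p))
X /= np.sqrt((X ** 2).mean(axis=0))                 # normalize regressors
beta0 = np.r_[np.array([1, 1, 0.5, 1/3, 0.25, 0.2]), np.zeros(p - s)]
y = X @ beta0 + rng.standard_normal(n)
lam = 2 * 1.0 * np.sqrt(2 * np.log(p * n) / n)      # plug-in penalty with sigma = 1
print((np.abs(lasso_fit(X, y, lam)) > 1e-8).sum(), "selected regressors")
```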
20-21. The LASSO
◮ The rate-optimal choice of penalty level is
   λ = σ · 2√(2 log(pn)/n)
   (Bickel, Ritov, Tsybakov, Annals of Statistics, 2009).
◮ This choice relies on knowing σ, which may be a priori hard to estimate when p ≫ n.
◮ One can estimate σ by iterating from a conservative starting value (the standard deviation around the sample mean); see Belloni and Chernozhukov (2009, Bernoulli). Very simple.
◮ Cross-validation is often used as well and performs well in Monte Carlo experiments, but its theoretical validity is an open question in settings with p ≫ n.
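A minimal sketch of the iterative plug-in idea mentioned above (conservative initial σ̂ from the standard deviation around the sample mean, then refit); the function name, fixed iteration count, and use of sklearn are my own assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_plugin_sigma(X, y, n_iter=5):
    """Alternate between estimating sigma and re-solving the LASSO with
    lambda = 2 * sigma_hat * sqrt(2 log(p n) / n), the rate-optimal choice above."""
    n, p = X.shape
    sigma = y.std()                                     # conservative start: sd around the sample mean
    beta = np.zeros(p)
    for _ in range(n_iter):
        lam = 2 * sigma * np.sqrt(2 * np.log(p * n) / n)
        beta = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y).coef_
        sigma = np.sqrt(np.mean((y - X @ beta) ** 2))   # update sigma from residuals
    return beta, sigma
```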
22. The √LASSO
◮ A way around this is the √LASSO estimator, minimizing (Belloni, Chernozhukov, Wang, 2010, Biometrika)
   √((1/n) Σ_{i=1}^n [y_i − x_i'β]²) + λ‖β‖_1.
◮ The rate-optimal penalty level is pivotal – independent of σ:
   λ = √(2 log(pn)/n).
◮ Tuning-free. Globally convex, computable in polynomial time via conic programming.
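For intuition, the √LASSO program is a second-order cone problem and can be written directly in a generic convex solver such as cvxpy; this is a sketch under the slide's normalization and pivotal penalty, not the authors' conic-programming implementation.

```python
import numpy as np
import cvxpy as cp

def sqrt_lasso(X, y):
    """Minimize sqrt((1/n) sum_i (y_i - x_i'b)^2) + lam * ||b||_1
    with the pivotal penalty lam = sqrt(2 log(p n) / n); no sigma is needed."""
    n, p = X.shape
    lam = np.sqrt(2 * np.log(p * n) / n)
    b = cp.Variable(p)
    objective = cp.norm(y - X @ b, 2) / np.sqrt(n) + lam * cp.norm1(b)
    cp.Problem(cp.Minimize(objective)).solve()
    return b.value
```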
23-27. Heuristics via Convex Geometry
A simple case: y_i = x_i'β_0 + ǫ_i with β_0 = 0.
[Figure sequence over β ∈ [−5, 5], with β_0 = 0 marked: the criterion Q(β); its linearization Q(β_0) + ∇Q(β_0)'β; the penalty λ‖β‖_1, first with λ = ‖∇Q(β_0)‖_∞ and then with λ > ‖∇Q(β_0)‖_∞; and finally the penalized criterion Q(β) + λ‖β‖_1.]
◮ Q(β) = (1/n) Σ_{i=1}^n [y_i − x_i'β]² for LASSO.
◮ Q(β) = √((1/n) Σ_{i=1}^n [y_i − x_i'β]²) for √LASSO.
28-29. Heuristics
◮ LASSO (and variants) will successfully "zero out" lots of irrelevant regressors, but it won't be perfect (no procedure can distinguish β_0j = C/√n from 0, and so model selection mistakes are bound to happen).
◮ λ is chosen to dominate the norm of the subgradient:
   P(λ > ‖∇Q(β_0)‖_∞) → 1,
   and the choices of λ mentioned above precisely implement that.
◮ In the case of √LASSO,
   ‖∇Q(β_0)‖_∞ = max_{1≤j≤p} |(1/n) Σ_{i=1}^n ǫ_i x_ij| / √((1/n) Σ_{i=1}^n ǫ_i²)
   does not depend on σ.
◮ Hence for √LASSO, λ does not depend on σ.
30-31. Dealing with Heteroscedasticity*
Heteroscedastic model:
   y_i = x_i'β_0 + r_i + ǫ_i,  ǫ_i ∼ (0, σ_i²).
◮ Heteroscedastic forms of Lasso – Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012). Fully data-driven:
   β̂ ∈ arg min_{β ∈ R^p} (1/n) Σ_{i=1}^n [y_i − x_i'β]² + λ‖Ψβ‖_1,   λ = 2√(2 log(pn)/n),
   Ψ = diag[ ((1/n) Σ_{i=1}^n x_ij² ǫ_i²)^{1/2} + o_p(1), j = 1, ..., p ].
◮ Penalty loadings Ψ are estimated iteratively:
   1. Initialize, e.g., ǫ̂_i = y_i − ȳ and Ψ = diag[ ((1/n) Σ_{i=1}^n x_ij² ǫ̂_i²)^{1/2}, j = 1, ..., p ].
   2. Obtain β̂, then update ǫ̂_i = y_i − x_i'β̂ and Ψ = diag[ ((1/n) Σ_{i=1}^n x_ij² ǫ̂_i²)^{1/2}, j = 1, ..., p ].
   3. Iterate on the previous step.
◮ For heteroscedastic forms of √LASSO, see Belloni, Chernozhukov, Wang (Annals of Statistics, R&R, 2011).
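A rough sketch of the iterative penalty-loading recipe above. It handles the weighted penalty ‖Ψβ‖_1 by rescaling the columns of X by the loadings (which is algebraically equivalent), and reuses sklearn's plain Lasso; the function name, iteration count, and this rescaling trick are my own translation, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def hetero_lasso(X, y, n_iter=3):
    """Heteroscedasticity-robust Lasso with data-driven penalty loadings.

    lam = 2*sqrt(2 log(p n)/n); loadings psi_j = sqrt((1/n) sum_i x_ij^2 e_i^2),
    initialized with e_i = y_i - ybar and updated from Lasso residuals.
    The weighted penalty ||Psi b||_1 is imposed by rescaling columns of X.
    """
    n, p = X.shape
    lam = 2 * np.sqrt(2 * np.log(p * n) / n)
    e = y - y.mean()                                                   # step 1: initial residuals
    beta = np.zeros(p)
    for _ in range(n_iter):
        psi = np.sqrt(np.mean((X ** 2) * (e ** 2)[:, None], axis=0))   # penalty loadings
        Xw = X / psi                                                   # plain L1 on Xw equals ||Psi b||_1 on X
        b = Lasso(alpha=lam / 2, fit_intercept=False).fit(Xw, y).coef_
        beta = b / psi                                                 # map back to the original scale
        e = y - X @ beta                                               # step 2: update residuals, then iterate
    return beta
```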
32. Probabilistic intuition for the latter construction*
The construction makes the "noise" in the Kuhn-Tucker conditions self-normalized, and λ dominates the "noise".
Union bounds and the moderate deviation theory for self-normalized sums (Jing, Shao, Wang, Ann. Prob., 2005) imply that
   P( max_{1≤j≤p} 2|(1/n) Σ_{i=1}^n ǫ_i x_ij| / √((1/n) Σ_{i=1}^n ǫ_i² x_ij²)  [the "max norm of gradient"]  ≤  λ  [the penalty level] ) = 1 − O(1/n),
under the condition that log p = o(n^{1/3}), provided that for all i ≤ n, j ≤ p, E[x_ij³ ǫ_i³] ≤ K.
33. Some properties
◮ Due to the kink in the penalty, LASSO (and variants) will successfully "zero out" lots of irrelevant regressors (but don't expect it to be perfect).
◮ Lasso procedures bias/shrink the non-zero coefficient estimates towards zero.
◮ The latter property motivates the use of least squares after Lasso, or Post-Lasso.
34. Post-Model Selection Estimator, or Post-LASSO
Define the post-selection (e.g., post-LASSO) estimator as follows:
1. In step one, select the model using the LASSO or √LASSO.
2. In step two, apply ordinary least squares to the selected model.
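A minimal sketch of this two-step idea (Lasso to select the support, then OLS on the selected columns), assuming sklearn for step one; the helper name and the zero-threshold are illustrative choices, not from the deck.

```python
import numpy as np
from sklearn.linear_model import Lasso

def post_lasso(X, y, lam):
    """Step 1: Lasso selects the model; Step 2: refit the selected columns by OLS."""
    b_lasso = Lasso(alpha=lam / 2, fit_intercept=False).fit(X, y).coef_
    T_hat = np.flatnonzero(np.abs(b_lasso) > 1e-10)      # selected support
    beta = np.zeros(X.shape[1])
    if T_hat.size:                                        # OLS on the selected regressors only
        beta[T_hat], *_ = np.linalg.lstsq(X[:, T_hat], y, rcond=None)
    return beta, T_hat
```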
35. Regularity Condition on X*
◮ A simple sufficient condition is as follows.
Condition RSE. Take any C > 1. With probability approaching 1, the matrix
   M = (1/n) Σ_{i=1}^n x_i x_i'
obeys
   0 < K ≤ min_{‖δ‖_0 ≤ sC} δ'Mδ / δ'δ ≤ max_{‖δ‖_0 ≤ sC} δ'Mδ / δ'δ ≤ K' < ∞.   (1)
◮ This holds under i.i.d. sampling if E[x_i x_i'] has eigenvalues bounded away from zero and from above, and:
   – x_i has light tails (i.e., log-concave) and s log p = o(n);
   – or bounded regressors max_{ij} |x_ij| ≤ K and s (log p)^5 = o(n).
Ref. Rudelson and Vershynin (2009).
36. Result 1: Rates for LASSO / √LASSO
Theorem (Rates)
Under practical regularity conditions – including errors having 4 + δ bounded moments and log p = o(n^{1/3}) – with probability approaching 1,
   ‖β̂ − β_0‖ ≲ √((1/n) Σ_{i=1}^n [x_i'β̂ − x_i'β_0]²) ≲ σ √(s log(n ∨ p) / n).
◮ The rate is close to the "oracle" rate √(s/n), obtainable when we know the "true" model T; p shows up only through log p.
◮ References.
   – LASSO – Bickel, Ritov, Tsybakov (Annals of Statistics, 2009), Gaussian errors.
   – Heteroscedastic LASSO – Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012), non-Gaussian errors.
   – √LASSO – Belloni, Chernozhukov and Wang (Biometrika, 2010), non-Gaussian errors.
   – Heteroscedastic √LASSO – Belloni, Chernozhukov and Wang (Annals of Statistics, R&R, 2010), non-Gaussian errors.
37. Result 2: Post-Model Selection Estimator
In the rest of the talk, LASSO means all of its variants, especially their heteroscedastic versions.
Recall that the post-LASSO estimator is defined as follows:
1. In step one, select the model using the LASSO.
2. In step two, apply ordinary least squares to the selected model.
◮ Lasso (or any other method) is not perfect at model selection – it might include "junk" and exclude some relevant regressors.
◮ The analysis of all post-selection methods in this lecture accounts for imperfect model selection.
38. Result 2: Post-Selection Estimator
Theorem (Rate for the Post-Selection Estimator)
Under practical conditions, with probability approaching 1,
   ‖β̂_PL − β_0‖ ≲ √((1/n) Σ_{i=1}^n [x_i'β̂_PL − x_i'β_0]²) ≲ σ √((s/n) log(n ∨ p)),
and in some further exceptional cases faster, up to σ √(s/n).
◮ Even though LASSO does not in general perfectly select the relevant regressors, Post-LASSO performs at least as well.
◮ This result was first derived for least squares by Belloni and Chernozhukov (Bernoulli, 2009), and extended to the heteroscedastic, non-Gaussian case by Belloni, Chen, Chernozhukov, Hansen (Econometrica, 2012).
39. Monte Carlo
◮ In this simulation we used s = 6, p = 500, n = 100:
   y_i = x_i'β_0 + ǫ_i,  ǫ_i ∼ N(0, σ²),
   β_0 = (1, 1, 1/2, 1/3, 1/4, 1/5, 0, ..., 0)',
   x_i ∼ N(0, Σ),  Σ_ij = (1/2)^|i−j|,  σ² = 1.
◮ Ideal benchmark: the Oracle estimator, which runs OLS of y_i on x_i1, ..., x_i6. This estimator is not feasible outside the Monte Carlo.
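For concreteness, the design above can be simulated roughly as follows; this is a sketch of the stated DGP only, and any settings of the original experiment beyond those listed on the slide are not reproduced here.

```python
import numpy as np

def simulate_design(n=100, p=500, seed=0):
    """DGP from the slide: s = 6, Sigma_ij = 0.5^|i-j|, sigma^2 = 1,
    beta_0 = (1, 1, 1/2, 1/3, 1/4, 1/5, 0, ..., 0)'."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])         # Toeplitz correlation
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    beta0 = np.zeros(p)
    beta0[:6] = [1, 1, 1/2, 1/3, 1/4, 1/5]
    y = X @ beta0 + rng.standard_normal(n)                     # sigma = 1
    return X, y, beta0
```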
40. Monte Carlo Results: Prediction Error
[Bar chart: estimation risk RMSE = (E[(x_i'(β̂ − β_0))²])^{1/2} for LASSO, Post-LASSO, and the Oracle; n = 100, p = 500; vertical axis from 0 to 0.7.]
Lasso is not perfect at model selection, but does find good models, allowing Lasso and Post-Lasso to perform at the near-Oracle level.
41. Monte Carlo Results: Bias
[Bar chart: norm of the bias ‖E β̂ − β_0‖ for LASSO and Post-LASSO; n = 100, p = 500; vertical axis from 0 to 0.45.]
Post-Lasso often outperforms Lasso due to removal of the shrinkage bias.
42. Part II.
43. Outline
44. Plan 1. High-Dimensional Sparse Framework 2. Estimation of Regression Functions via Penalization and Selection 3. Estimation and Inference wi
3. Estimation and Inference with Many Instruments
Focus the discussion on a simple IV model:

  y_i = d_i\alpha + \epsilon_i,
  d_i = g(z_i) + v_i   (first stage),

  (\epsilon_i, v_i)' \mid z_i \sim \left( 0, \begin{pmatrix} \sigma_\epsilon^2 & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma_v^2 \end{pmatrix} \right).

◮ Can have additional controls w_i entering both equations – assume these have been partialled out; can also have multiple endogenous variables; see references for details.
◮ The main target is \alpha, and g is the unspecified regression function = "optimal instrument".
◮ We have either
  ◮ Many instruments: x_i = z_i, or
  ◮ Many technical instruments: x_i = P(z_i), e.g. polynomials, trigonometric terms,
  where the number of instruments p is large, possibly much larger than n.
3. Inference in the Instrumental Variable Model
◮ Assume approximate sparsity:

  g(z_i) = E[d_i \mid z_i] = \underbrace{x_i'\beta_0}_{\text{sparse approximation}} + \underbrace{r_i}_{\text{approx. error}},

  that is, the optimal instrument is approximated by s (unknown) instruments, such that

  s := \|\beta_0\|_0 \ll n,   \sqrt{ (1/n)\sum_{i=1}^n r_i^2 } \le \sigma_v \sqrt{ s/n }.

◮ We shall find these "effective" instruments amongst x_i by Lasso, and estimate the optimal instrument by Post-Lasso: \hat g(z_i) = x_i'\hat\beta_{PL}.
◮ Estimate \alpha using the estimated optimal instrument via 2SLS.
Example 2: Instrument Selection in Angrist-Krueger Data
◮ yi = wage
◮ di = education (endogenous)
◮ α = returns to schooling
◮ zi = quarter of birth and controls (50 state of birth dummies and 7
year of birth dummies)
◮ xi = P(zi ), includes zi and all interactions
◮ a very large list, p = 1530
Using few instruments (3 quarters of birth) or many instruments (1530) gives large standard errors, so it seems a good idea to use instrument selection to see if we can improve precision.
AK Example
Estimator           Instruments   Schooling Coef   Rob. Std. Error
2SLS (3 IVs)                  3              .10              .020
2SLS (All IVs)             1530              .10              .042
2SLS (LASSO IVs)             12              .10              .014
Notes:
◮ About 12 constructed instruments contain nearly all information.
◮ Fuller’s form of 2SLS is used due to robustness.
◮ The Lasso selection of instruments and the resulting standard errors are fully justified theoretically below.
2SLS with Post-LASSO-estimated Optimal IV
◮ In step one, estimate the optimal instrument g(z_i) = x_i'\beta_0 by the Post-LASSO estimator, obtaining \hat g(z_i) = x_i'\hat\beta_{PL}.
◮ In step two, compute 2SLS using the estimated optimal instrument as the IV:

  \hat\alpha = \Big[ (1/n)\sum_{i=1}^n \hat g(z_i)\, d_i' \Big]^{-1} (1/n)\sum_{i=1}^n \hat g(z_i)\, y_i.
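For illustration, a minimal Python sketch of this two-step procedure (my own code, not from the paper; the function name iv_post_lasso and the cross-validated penalty are assumptions):

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def iv_post_lasso(y, d, X):
    # Step 1: Lasso of d on the technical instruments X, then OLS refit on the
    # selected columns to form g_hat(z_i) = x_i' beta_PL.
    lasso = LassoCV(cv=5, fit_intercept=False).fit(X, d)
    support = np.flatnonzero(lasso.coef_)
    if support.size == 0:
        support = np.array([0])                 # degenerate fallback for the sketch
    first_stage = LinearRegression(fit_intercept=False).fit(X[:, support], d)
    g_hat = first_stage.predict(X[:, support])
    # Step 2: IV/2SLS for alpha using g_hat as the instrument for d.
    return (g_hat @ y) / (g_hat @ d)            # [ (1/n) sum g_hat*d ]^{-1} (1/n) sum g_hat*y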
IV Selection: Theoretical Justification
Theorem (Result 3: 2SLS with LASSO-selected IV)
Under practical regularity conditions, if the optimal instrument is sufficiently sparse, namely s^2\log^2 p = o(n), and is strong, namely |E[d_i g(z_i)]| is bounded away from zero, we have

  \sigma_n^{-1}\sqrt{n}(\hat\alpha - \alpha) \to_d N(0, 1),

where \sigma_n^2 is White's standard robust formula for the variance of 2SLS. The estimator is semi-parametrically efficient under homoscedasticity.
◮ Ref: Belloni, Chen, Chernozhukov, and Hansen (Econometrica, 2012)
for a general statement.
◮ A weak-instrument robust procedure is also available – the sup-score
test; see Ref.
◮ Key point: "selection mistakes" are asymptotically negligible due to the "low-bias" property of the estimating equations, which we shall discuss later.
IV Selection: Monte Carlo Justification
A representative example: everything Gaussian, with

  d_i = \sum_{j=1}^{100} x_{ij}\,\mu^j + v_i,   |\mu| < 1.

This is an approximately sparse model: most of the information is contained in a few instruments.
Case 1. p = 100 < n = 250, first-stage E[F] = 40.

  Estimator                  RMSE   5% Rej. Prob.
  2SLS (All IVs, Fuller's)   0.13   5.6%
  2SLS (LASSO IVs)           0.08   6%
Remark. Fuller's 2SLS is consistent under many instruments and is a state-of-the-art method.
Case 2. p = n = 100, first-stage E[F] = 40.

  Estimator                  RMSE   5% Rej. Prob.
  2SLS (All IVs, Fuller's)   5.05   8%
  2SLS (LASSO IVs)           0.13   6%
◮ Conclusion. Performance of the new method is quite reassuring.
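A minimal sketch of this kind of design (my own illustration; mu = 0.7, alpha = 1, and the error correlation 0.3 are assumed values, not taken from the slides):

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(1)
n, p, alpha, mu = 100, 100, 1.0, 0.7                  # Case 2: p = n = 100
X = rng.standard_normal((n, p))                       # instruments
errs = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=n)   # (eps_i, v_i)
d = X @ (mu ** np.arange(1, p + 1)) + errs[:, 1]      # first stage with coefficients mu^j
y = d * alpha + errs[:, 0]

# Lasso-select instruments, refit the first stage, and use the fitted value as the IV
sel = np.flatnonzero(LassoCV(cv=5, fit_intercept=False).fit(X, d).coef_)
sel = sel if sel.size else np.array([0])              # degenerate fallback for the sketch
g_hat = LinearRegression(fit_intercept=False).fit(X[:, sel], d).predict(X[:, sel])
print("alpha-hat (LASSO IVs):", round(float((g_hat @ y) / (g_hat @ d)), 3))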
Outline
Plan
1. High-Dimensional Sparse Framework
The Framework
Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
4. Estimation & Inference on Treatment Effects in a
Partially Linear Model
Example 3: (Exogenous) Cross-Country Growth Regression.
◮ Relation between the growth rate and initial per capita GDP, conditional on covariates describing institutions and technological factors:

  \underbrace{\text{GrowthRate}_i}_{y_i} = \beta_0 + \underbrace{\alpha}_{\text{ATE}}\,\underbrace{\log(\text{GDP}_i)}_{d_i} + \sum_{j=1}^{p} \beta_j x_{ij} + \epsilon_i,

  where the model is exogenous: E[\epsilon_i \mid d_i, x_i] = 0.
◮ Test the convergence hypothesis, \alpha < 0: poor countries catch up with richer countries, conditional on similar institutions etc. This is a prediction of the classical Solow growth model.
◮ In the Barro-Lee data we have p = 60 covariates and n = 90 observations, so we need to do selection.
How to perform selection?
◮ (Don't do it!) Naive/textbook selection:
  1. Drop all x_{ij}'s that have small coefficients, using model selection devices (classical, such as t-tests, or modern).
  2. Run OLS of y_i on d_i and the selected regressors.
  This does not work because it fails to control the omitted variables bias (Leeb and Pötscher, 2009).
◮ We propose Double Selection approach:
1. Select controls xij ’s that predict yi .
2. Select controls xij ’s that predict di .
3. Run OLS of yi on di and the union of controls selected in steps 1
and 2.
◮ The additional selection step controls the omitted variable bias.
◮ We find that the coefficient on lagged GDP is negative, and the
confidence intervals exclude zero.
Real GDP per capita (log)
Method                            Effect   Std. Err.
Barro-Lee (Economic Reasoning)    −0.02      0.005
All Controls (n = 90, p = 60)     −0.02      0.031
Post-Naive Selection              −0.01      0.004
Post-Double-Selection             −0.03      0.011
◮ Double-Selection finds 8 controls, including trade-openness and
several education variables.
◮ Our findings support the conclusions reached in Barro and Lee
and Barro and Sala-i-Martin.
◮ Using all controls is very imprecise.
◮ Using naive selection gives a biased estimate for the speed of
convergence.
TE in a Partially Linear Model
Partially linear regression model (exogenous)
yi = di α0 + g(zi ) + ζi , E[ζi | zi , di ] = 0,
◮ yi is the outcome variable
◮ di is the policy/treatment variable whose impact is α0
◮ zi represents confounding factors on which we need to condition
For us the auxiliary equation will be important:

  d_i = m(z_i) + v_i,   E[v_i \mid z_i] = 0,

◮ m summarizes the confounding effect and creates the omitted variable bias.
TE in a Partially Linear Model
Use many control terms x_i = P(z_i) \in \mathbb{R}^p to approximate g and m:

  y_i = d_i\alpha_0 + \underbrace{x_i'\beta_{g0} + r_{gi}}_{g(z_i)} + \zeta_i,   d_i = \underbrace{x_i'\beta_{m0} + r_{mi}}_{m(z_i)} + v_i.
◮ Many controls. xi = zi.
◮ Many technical controls. xi = P(zi ), e.g. polynomials,
trigonometric terms.
Key assumption: g and m are approximately sparse
The Inference Problem and Caveats
  y_i = d_i\alpha_0 + x_i'\beta_{g0} + r_i + \zeta_i,   E[\zeta_i \mid z_i, d_i] = 0.

Naive/textbook inference:
1. Select control terms by running Lasso (or variants) of y_i on d_i and x_i.
2. Estimate \alpha_0 by least squares of y_i on d_i and the selected controls; apply standard inference.
However, this naive approach has caveats:
◮ It relies on perfect model selection and exact sparsity, which is extremely unrealistic.
◮ It easily and badly breaks down both theoretically (Leeb and Pötscher, 2009) and practically.
Monte Carlo
◮ In this simulation we used p = 200, n = 100, \alpha_0 = .5:

  y_i = d_i\alpha_0 + x_i'(c_y\theta_0) + \zeta_i,   \zeta_i \sim N(0, 1),
  d_i = x_i'(c_d\theta_0) + v_i,   v_i \sim N(0, 1).

◮ Approximately sparse model: \theta_{0j} = 1/j^2.
◮ c_y and c_d are varied to vary the R^2 in each equation.
◮ The regressors are correlated Gaussians: x \sim N(0, \Sigma), \Sigma_{kj} = (0.5)^{|j-k|}.
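A minimal generator for one draw from this design (my own sketch; the function name, the seed, and c_y = c_d = 0.5 below are illustrative assumptions):

import numpy as np

def simulate_pl(n=100, p=200, alpha0=0.5, c_y=0.5, c_d=0.5, seed=0):
    # One draw from the approximately sparse partially linear design above.
    rng = np.random.default_rng(seed)
    j = np.arange(1, p + 1)
    theta0 = 1.0 / j**2                                    # theta_{0j} = 1/j^2
    Sigma = 0.5 ** np.abs(np.subtract.outer(j, j))         # Sigma_kj = (0.5)^{|j-k|}
    X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
    d = X @ (c_d * theta0) + rng.standard_normal(n)
    y = d * alpha0 + X @ (c_y * theta0) + rng.standard_normal(n)
    return y, d, X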
Distribution of Naive Post Selection Estimator
[Figure: simulated distribution of the naive post-selection estimator for R^2_d = .5 and R^2_y = .5.]

=⇒ badly biased, misleading confidence intervals, as predicted by the theorems in Leeb and Pötscher (2009).
Inference Quality After Model Selection
Look at the rejection probabilities of a true hypothesis in the design

  y_i = d_i\alpha_0 + x_i'(\underbrace{c_y}_{\Rightarrow R^2_y}\theta_0) + \zeta_i,   d_i = x_i'(\underbrace{c_d}_{\Rightarrow R^2_d}\theta_0) + v_i.

[Figure: ideal rejection rate of a test of a true hypothesis, plotted against the first-stage R^2 and the second-stage R^2.]
Inference Quality of Naive Selection
Look at the rejection probabilities of a true hypothesis.
Naive/Textbook Selection vs. Ideal

[Figure: rejection frequencies of a test of a true hypothesis, plotted against the first-stage R^2 and the second-stage R^2. Left panel: naive/textbook selection; right panel: ideal.]

The actual rejection probability (left) is far off the nominal rejection probability (right), consistent with the results of Leeb and Pötscher (2009).
Our Proposal: Post Double Selection Method
To define the method, write the reduced form (substituting out d_i):

  y_i = x_i'\bar\beta_0 + \bar r_i + \bar\zeta_i,
  d_i = x_i'\beta_{m0} + r_{mi} + v_i.

1. (Direct) Let \hat I_1 be the controls selected by Lasso of y_i on x_i.
2. (Indirect) Let \hat I_2 be the controls selected by Lasso of d_i on x_i.
3. (Final) Run least squares of y_i on d_i and the union of the selected controls:

  (\check\alpha, \check\beta) = \arg\min_{\alpha \in \mathbb{R},\, \beta \in \mathbb{R}^p} \Big\{ (1/n)\sum_{i=1}^n (y_i - d_i\alpha - x_i'\beta)^2 : \beta_j = 0\ \forall j \notin \hat I = \hat I_1 \cup \hat I_2 \Big\}.

This is the post-double-selection estimator (a code sketch follows below).
◮ Belloni, Chernozhukov, Hansen (World Congress, 2010).
◮ Belloni, Chernozhukov, Hansen (ReStud, 2011, R&R).
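A minimal Python sketch of the three steps (my own illustration; cross-validated penalties stand in for the plug-in rule, and simulate_pl refers to the design generator sketched earlier):

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

def post_double_selection(y, d, X):
    # Step 1 (Direct): Lasso of y on X.
    I1 = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
    # Step 2 (Indirect): Lasso of d on X.
    I2 = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
    # Step 3 (Final): OLS of y on d and the union of selected controls.
    I = np.union1d(I1, I2)
    Z = np.column_stack([d, X[:, I]]) if I.size else d.reshape(-1, 1)
    return LinearRegression().fit(Z, y).coef_[0]          # coefficient on d = alpha-check

# Example on the simulated partially linear design from the earlier sketch:
# y, d, X = simulate_pl()
# print(post_double_selection(y, d, X))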
Distributions of Post Double Selection Estimator
[Figure: simulated distribution of the post-double-selection estimator for R^2_d = .5 and R^2_y = .5.]
=⇒ low bias, accurate confidence intervals
Belloni, Chernozhukov, Hansen (2011)
Inference Quality After Model Selection
Double Selection vs. Ideal

[Figure: rejection frequencies of a test of a true hypothesis, plotted against the first-stage R^2 and the second-stage R^2. Left panel: post-double-selection; right panel: ideal.]

The left plot shows the rejection frequency of the t-test based on the post-double-selection estimator.
Belloni, Chernozhukov, Hansen (2011)
Intuition
◮ The double selection method is robust to moderate selection mistakes.
◮ The indirect Lasso step, the selection among the controls x_i that predict d_i, creates this robustness. It finds controls whose omission would lead to a "large" omitted variable bias, and includes them in the regression.
◮ In essence the procedure is a selection version of the Frisch-Waugh procedure for estimating a linear regression.
More Intuition
Think about the omitted variables bias in the case with one treatment (d) and one regressor (x):

  y_i = \alpha d_i + \beta x_i + \zeta_i;   d_i = \gamma x_i + v_i.

If we drop x_i, the short regression of y_i on d_i gives

  \sqrt{n}(\hat\alpha - \alpha) = \text{good term} + \underbrace{\sqrt{n}\,(D'D/n)^{-1}(X'X/n)(\gamma\beta)}_{\text{OMVB}}.

◮ The good term is asymptotically normal, and we want \sqrt{n}\,\gamma\beta \to 0.
◮ Naive selection drops x_i if \beta = O(\sqrt{\log n / n}); but then \sqrt{n}\,\gamma\sqrt{\log n / n} \to \infty, so the bias need not vanish.
◮ Double selection drops x_i only if both \beta = O(\sqrt{\log n / n}) and \gamma = O(\sqrt{\log n / n}), that is, only if \sqrt{n}\,\gamma\beta = O((\log n)/\sqrt{n}) \to 0.
Main Result
Theorem (Result 4: Inference on a Coefficient in Regression)
Uniformly within a rich class of models in which g and m admit a sparse approximation with s^2\log^2(p \vee n)/n \to 0, and other practical conditions holding,

  \sigma_n^{-1}\sqrt{n}(\check\alpha - \alpha_0) \to_d N(0, 1),

where \sigma_n^2 is Robinson's formula for the variance of LS in a partially linear model. Under homoscedasticity, the estimator is semi-parametrically efficient.
◮ Model selection mistakes are asymptotically negligible due to
double selection.
Bonus Track: Generalizations.∗
◮ The double selection (DS) procedure identifies \alpha_0 implicitly off the moment condition

  E[M_i(\alpha_0, g_0, m_0)] = 0,   where   M_i(\alpha, g, m) = (y_i - d_i\alpha - g(z_i))(d_i - m(z_i)),

  and g_0 and m_0 are (implicitly) estimated by the post-selection estimators.
◮ The DS procedure works because M_i is "immunized" against perturbations in g_0 and m_0:

  (\partial/\partial g)\, E[M_i(\alpha_0, g, m_0)]\big|_{g = g_0} = 0,   (\partial/\partial m)\, E[M_i(\alpha_0, g_0, m)]\big|_{m = m_0} = 0.

◮ Moderate selection errors translate into moderate estimation errors, which have asymptotically negligible effects on the large-sample distribution of estimators of \alpha_0 based on the sample analogs of the equations above.
Can this be generalized? Yes. Generally, we want to create moment equations such that the target parameter \alpha_0 is identified via the moment condition

  E[M_i(\alpha_0, h_0)] = 0,

where \alpha_0 is the main parameter and h_0 is a nuisance function (e.g. h_0 = (g_0, m_0)), with M_i "immunized" against perturbations in h_0:

  (\partial/\partial h)\, E[M_i(\alpha_0, h)]\big|_{h = h_0} = 0.

◮ This property allows for "non-regular" estimation of h, via post-selection or other regularization methods, with rates slower than 1/\sqrt{n}.
◮ It allows for moderate selection mistakes in estimation.
◮ In the absence of the immunization property, post-selection inference breaks down.
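As a quick check of this immunization (orthogonality) property for the partially linear moment function above (a standard calculation, stated here in my own words rather than taken from the slides), perturb m in a direction \delta:

  (\partial/\partial t)\, E[(y_i - d_i\alpha_0 - g_0(z_i))(d_i - m_0(z_i) - t\,\delta(z_i))]\big|_{t=0} = -E[\zeta_i\,\delta(z_i)] = 0,

since E[\zeta_i \mid z_i, d_i] = 0; similarly, perturbing g in a direction \delta gives -E[\delta(z_i)(d_i - m_0(z_i))] = -E[\delta(z_i)\,v_i] = 0, since E[v_i \mid z_i] = 0. Estimation errors in g and m therefore enter the moment condition only at second order.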
Bonus Track: Generalizations.∗
Examples in this framework:
1. IV model:

  M_i(\alpha, g) = (y_i - d_i\alpha)\, g(z_i)

  has the immunization property (since E[(y_i - d_i\alpha_0)\,\tilde g(z_i)] = 0 for any \tilde g), and this \Rightarrow validity of inference after selection-based estimation of g.
2. Partially linear model:

  M_i(\alpha, g, m) = (y_i - d_i\alpha - g(z_i))(d_i - m(z_i))

  has the immunization property, which \Rightarrow validity of post-selection inference, where we do double selection: controls that explain g and m.
3. Logistic and quantile regression:
  ◮ Belloni, Chernozhukov, Kato (2013, ArXiv)
  ◮ Belloni, Chernozhukov, Wei (2013, ArXiv)
Outline
Plan
1. High-Dimensional Sparse Framework
The Framework
Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
5. Heterogeneous Treatment Effects∗
◮ Here di is binary, indicating the receipt of the treatment,
◮ Drop the partially linear structure; instead assume d_i is fully interacted with all other control variables:

  y_i = \underbrace{d_i\, g(1, z_i) + (1 - d_i)\, g(0, z_i)}_{g(d_i, z_i)} + \zeta_i,   E[\zeta_i \mid d_i, z_i] = 0,
  d_i = m(z_i) + u_i,   E[u_i \mid z_i] = 0   (as before).

◮ Target parameter: the Average Treatment Effect,

  \alpha_0 = E[g(1, z_i) - g(0, z_i)].
◮ Example. di = 401(k) eligibility, zi = characteristics of the
worker/firm, yi= net savings or total wealth, α0 = the average
impact of 401(k) eligibility on savings.
5. Heterogeneous Treatment Effects ∗
An appropriate M_i is given by Hahn's (1998) efficient score:

  M_i(\alpha, g, m) = \frac{d_i\,(y_i - g(1, z_i))}{m(z_i)} - \frac{(1 - d_i)\,(y_i - g(0, z_i))}{1 - m(z_i)} + g(1, z_i) - g(0, z_i) - \alpha,

which is "immunized" against perturbations in g_0 and m_0:

  (\partial/\partial g)\, E[M_i(\alpha_0, g, m_0)]\big|_{g = g_0} = 0,   (\partial/\partial m)\, E[M_i(\alpha_0, g_0, m)]\big|_{m = m_0} = 0.

Hence the post-double-selection estimator for \alpha is

  \check\alpha = (1/n)\sum_{i=1}^{n}\left[ \frac{d_i\,(y_i - \hat g(1, z_i))}{\hat m(z_i)} - \frac{(1 - d_i)\,(y_i - \hat g(0, z_i))}{1 - \hat m(z_i)} + \hat g(1, z_i) - \hat g(0, z_i) \right],

where we estimate g and m via post-selection (Post-Lasso) methods.
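A minimal Python sketch of this estimator (my own illustration, not the authors' code: Post-Lasso within each treatment arm for g, an L1-penalized logistic regression with a cross-validated penalty for m, and propensity-score trimming at 0.01 are all assumptions):

import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression, LogisticRegressionCV

def ate_immunized(y, d, X, clip=0.01):
    def post_lasso_fit(Xa, ya):
        # Lasso selection followed by an OLS refit within one treatment arm.
        sel = np.flatnonzero(LassoCV(cv=5).fit(Xa, ya).coef_)
        cols = sel if sel.size else np.arange(min(5, Xa.shape[1]))   # crude fallback for the sketch
        ols = LinearRegression().fit(Xa[:, cols], ya)
        return lambda Xnew: ols.predict(Xnew[:, cols])

    g1 = post_lasso_fit(X[d == 1], y[d == 1])                        # g-hat(1, z)
    g0 = post_lasso_fit(X[d == 0], y[d == 0])                        # g-hat(0, z)
    m = LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5).fit(X, d)
    mhat = np.clip(m.predict_proba(X)[:, 1], clip, 1 - clip)         # m-hat(z), trimmed

    score = (d * (y - g1(X)) / mhat
             - (1 - d) * (y - g0(X)) / (1 - mhat)
             + g1(X) - g0(X))                                        # Hahn's efficient score, up to -alpha
    alpha_hat = score.mean()
    se = score.std(ddof=1) / np.sqrt(len(y))                         # plug-in for sigma_n / sqrt(n)
    return alpha_hat, se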
Theorem (Result 5: Inference on ATE)
Uniformly within a rich class of models in which g and m admit a sparse approximation with s^2\log^2(p \vee n)/n \to 0, and other practical conditions holding,

  \sigma_n^{-1}\sqrt{n}(\check\alpha - \alpha_0) \to_d N(0, 1),   where   \sigma_n^2 = E[M_i^2(\alpha_0, g_0, m_0)].

Moreover, \check\alpha is semi-parametrically efficient for \alpha_0.
◮ Model selection mistakes are asymptotically negligible due to the
use of ”immunizing” moment equations.
◮ Ref. Belloni, Chernozhukov, Hansen “Inference on TE after selection amongst
high-dimensional controls” (2013 version, available via CEMMAP).
Outline
Plan
1. High-Dimensional Sparse Framework
The Framework
Two Examples
2. Estimation of Regression Functions via Penalization and Selection
3. Estimation and Inference with Many Instruments
4. Estimation & Inference on Treatment Effects in a Partially Linear Model
5. Estimation and Inference on TE in a General Model
Conclusion
Conclusion
◮ Approximately sparse model
◮ Corresponds to the usual "parsimonious" approach, but specification searches are put on a rigorous footing
◮ Useful for predicting regression functions
◮ Useful for selection of instruments
◮ Useful for selection of controls, but avoid naive/textbook
selection
◮ Use double selection that protects against omitted variable bias
◮ Use “immunized” moment equations more generally
References
◮ Bickel, P., Y. Ritov and A. Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector”, Annals of Statistics, 2009.
◮ Candes E. and T. Tao, “The Dantzig selector: statistical estimation when p is much larger than n,” Annals of Statistics, 2007.
◮ Donald S. and W. Newey, “Series estimation of semilinear models,” Journal of Multivariate Analysis, 1994.
◮ Tibshirani, R, “Regression shrinkage and selection via the Lasso,” J. Roy. Statist. Soc. Ser. B, 1996.
◮ Frank, I. E., J. H. Friedman (1993): “A Statistical View of Some Chemometrics Regression Tools,”Technometrics, 35(2), 109–135.
◮ Gautier, E., A. Tsybakov (2011): “High-dimensional Instrumental Variables Regression and Confidence Sets,” arXiv:1105.2454v2
◮ Hahn, J. (1998): “On the role of the propensity score in efficient semiparametric estimation of average treatment effects,”
Econometrica, pp. 315–331.
◮ Heckman, J., R. LaLonde, J. Smith (1999): “The economics and econometrics of active labor market programs,” Handbook of
labor economics, 3, 1865–2097.
◮ Imbens, G. W. (2004): “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review,” The Review of
Economics and Statistics, 86(1), 4–29.
◮ Leeb, H., and B. M. Pötscher (2008): “Can one estimate the unconditional distribution of post-model-selection estimators?,”
Econometric Theory, 24(2), 338–376.
◮ Robinson, P. M. (1988): “Root-N-consistent semiparametric regression,” Econometrica, 56(4), 931–954.
◮ Rudelson, M., R. Vershynin (2008): “On sparse reconstruction from Fourier and Gaussian Measurements”, Comm Pure Appl
Math, 61, 1024-1045.
◮ Jing, B.-Y., Q.-M. Shao, Q. Wang (2003): “Self-normalized Cramer-type large deviations for independent random variables,” Ann.
Probab., 31(4), 2167–2215.
All references below are downloadable via www.arXiv.org:
◮ Belloni, A., V. Chernozhukov, and C. Hansen (2010) “Inference for High-Dimensional Sparse Econometric Models,” Advances in
Economics and Econometrics. 10th World Congress of Econometric Society, Shanghai, 2010. (ArXiv, 2011).
◮ Belloni, Chernozhukov, Hansen (2011) “Inference on TE after selection amongst high-dimensional controls”, CEMMAP working
paper (revised June, 2013). R&R Review of Economic Studies. (ArXiv, 2011)
◮ Belloni, A., D. Chen, V. Chernozhukov, and C. Hansen (2012): “Sparse Models and Methods for Optimal Instruments with an
Application to Eminent Domain,” Econometrica, 80, 2369–2429. (ArXiv, 2010)
◮ Belloni, A., and V. Chernozhukov (2011a): “ℓ1-penalized quantile regression in high-dimensional sparse models,” Ann. Statist.,
39(1), 82–130. (ArXiv, 2009)
◮ Belloni, A., and V. Chernozhukov (2013): “Least Squares After Model Selection in High-dimensional Sparse Models,” Bernoulli,
19(2), 521–547. (ArXiv, 2009)
◮ Belloni, A., V. Chernozhukov, L. Wang (2011): “Square-Root-LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,”
Biometrika, 98(4), 791–806. (ArXiv, 2010).
◮ Belloni, A., V. Chernozhukov, K. Kato (2013): “Uniform Post Selection Inference for LAD Regression Models,” arXiv preprint
arXiv:1304.0282. (ArXiv, 2013)
◮ Belloni, A., V. Chernozhukov, L. Wang (2011): “Square-Root-LASSO: Pivotal Recovery of Nonparametric Regression Functions
via Conic Programming,” (ArXiv, 2011)
◮ Belloni, A., V. Chernozhukov, Y. Wei (2013): “Honest Confidence Regions for Logistic Regression with a Large Number of
Controls,” arXiv preprint arXiv:1304.3969 (ArXiv, 2013).