PLSC 503_spring 2013_lecture 1
Presentation Transcript

• PLSC 503: Quantitative Methods, Week 1. Thad Dunning, Department of Political Science, Yale University. Lecture Notes, Week 1: Causal Inference and Potential Outcomes. (1/41)
• Introduction to 503
Social scientists use many methods, for many different purposes. Quantitative analysis can play several roles:
  - Description, e.g., conceptualization and measurement
  - Causal inference
The latter is perhaps the trickiest. This course introduces causal and statistical models for quantitative analysis, and places emphasis on the importance of strong research design. It is important to master technique; it is even more important to understand core assumptions. (2/41)
• Organization of the course
  - Causal and statistical inference under the potential outcomes model
  - Regression as a descriptive tool (bivariate and multivariate, in scalar and matrix form)
  - Regression models: causal and statistical inference
  - (Spring break)
  - Various topics in the design and analysis of experimental and observational data: e.g., difference-in-differences designs, matching, natural experiments
(3/41)
• One Design for Causal Inference: Does Voter Pressure Shape Turnout?
Why people fail to vote—and why they vote at all—are both puzzles for many social scientists. Maybe people have intrinsic incentives that lead them to vote (a sense of duty?). Or maybe they respond to peer pressure. However, testing hypotheses about what causes people to vote is challenging. Gerber and Green have conducted many experimental studies to assess what factors influence turnout, e.g., phone calls, door-to-door contacts, and social pressure. Before the August 2006 primary election in Michigan, 180,000 households were assigned either to a control group or to receive one of four mailings regarding the election. (4/41)
• Civic Duty mailing [image of the mailing] (5/41)
• “Hawthorne effect” mailing [image of the mailing] (6/41)
• Self (own-record) mailing [image of the mailing] (7/41)
• Neighbors/Social Pressure mailing [image of the mailing] (8/41)
• Estimated Treatment Effects [table of results] (9/41)
• Some points about the analysis
The analysis can be extremely simple: a difference of means may be just the right tool. This is design-based inference: confounding is controlled through ex-ante research design choices—not through ex-post statistical adjustment. This simple analysis rests on a model that is often, though not always, quite credible. This contrasts with many conventional model-based approaches, which instead rely on regression modeling to approximate an experimental ideal: “The power of multiple regression analysis is that it allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed” (Wooldridge 2009: 77). (10/41)
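The difference-of-means analysis mentioned above can be sketched in a few lines. The turnout numbers here are invented for illustration; they are not the actual Gerber and Green results.

```python
# Hypothetical turnout data (NOT the actual Gerber-Green numbers):
# 1 = voted, 0 = did not vote.
control = [0, 1, 0, 0, 1, 0, 1, 0, 0, 1]    # 4 of 10 voted
neighbors = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 of 10 voted

def difference_of_means(treated, untreated):
    """Estimated average effect: mean(treated) minus mean(untreated)."""
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

effect = difference_of_means(neighbors, control)
print(f"estimated effect of the mailing: {effect:+.2f}")  # +0.30
```

With random assignment, this simple comparison is the whole analysis; no regression adjustment is required.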
• Strengths and limitations of strong research design
Strong designs can improve causal inferences in diverse substantive contexts. The statistics can be simple, transparent, and credible. Yet they also have important limitations: external validity issues; also, interventions may or may not be substantively relevant. In practice, the analysis may be more or less design-based. How best to maximize the promise—and minimize the pitfalls—is our focus. Whatever kind of research you do, considering these issues can help you think more clearly about research design and causal inference. (11/41)
• Defining and Measuring Causality
Three types of questions arise in philosophical discussions of causality (Brady):
  1. How do people understand causality when they use the concept? (Psychological/linguistic)
  2. What is causality? (Metaphysical/ontological)
  3. How do we discover when causality is operative? (Epistemological/inferential)
(12/41)
• 1. Counterfactuals
A counterfactual statement contains a false premise, and an assertion of what would have occurred had the premise been true: if an economic stimulus plan had not been adopted, then X . . . (where X might be “the economy would not have recovered” or “the economy would be in worse shape today”). For any cause C, the causal effect of C is the difference between what would happen in two states of the world: e.g., one in which C is present and one in which C is (counterfactually) absent. Counterfactuals play a critical role in causal inference—though they aren’t always sufficient: “If the storm had not occurred, the mercury in the barometer would not have fallen” is a valid counterfactual statement, yet changes in air pressure, not storms, are the cause of falling mercury in barometers. (13/41)
• 2. Manipulation
Causation as forced movement (Lakoff): e.g., children learn about causation by dropping a fork. When combined with counterfactuals, manipulation provides a strong criterion. Does playing basketball make children grow tall? No. Intuitively, that’s because the following statement doesn’t make sense: “If we had intervened to make the children play basketball, they would have grown tall.” (Here is also a place where mechanistic understandings of causality come into play.) (14/41)
• Potential Outcomes
In statistics, an idea that combines counterfactuals and manipulation (Neyman; Rubin; Holland). Imagine, e.g., an experiment with two treatment conditions, say, a treatment and a control group. The potential outcome under treatment, Y_i(1), is the outcome some unit i would experience if assigned to treatment. The potential outcome under control, Y_i(0), is the outcome i would experience if assigned to control. The unit causal effect is the difference between these two outcomes: Y_i(1) - Y_i(0). This parameter is not directly observable, because we see Y_i(1) or Y_i(0) but not both (the “fundamental problem of causal inference”—Holland). (15/41)
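The fundamental problem can be made concrete with a toy potential-outcomes table (the numbers are invented for illustration): unit effects are computable only when we pretend to see both columns.

```python
# A toy potential-outcomes table for four hypothetical units.
Y1 = [1, 1, 0, 1]  # Y_i(1): outcome if assigned to treatment
Y0 = [0, 1, 0, 0]  # Y_i(0): outcome if assigned to control

unit_effects = [y1 - y0 for y1, y0 in zip(Y1, Y0)]
print(unit_effects)  # [1, 0, 0, 1]

# The fundamental problem: once units are assigned, we observe only
# one potential outcome per unit, so unit effects cannot be computed.
T = [1, 0, 1, 0]  # hypothetical treatment assignment
observed = [y1 if t else y0 for y1, y0, t in zip(Y1, Y0, T)]
print(observed)  # [1, 1, 0, 0]
```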
• Average Causal Effects
Attention often focuses instead on average causal effects, e.g., for some units i = 1, ..., N,

  \frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)].

This parameter is the difference between two counterfactuals: the average outcome if all units were assigned to treatment, minus the average if all units were assigned to control. The Neyman model is a causal model: it stipulates how units respond when they are assigned to treatment or control. Such response schedules play a critical role in causal inference. (16/41)
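The average causal effect is just the mean of the unit effects. Continuing the same invented four-unit table:

```python
# Same toy potential-outcomes table as before (hypothetical numbers).
Y1 = [1, 1, 0, 1]  # Y_i(1)
Y0 = [0, 1, 0, 0]  # Y_i(0)
N = len(Y1)

# (1/N) * sum of [Y_i(1) - Y_i(0)]
ate = sum(y1 - y0 for y1, y0 in zip(Y1, Y0)) / N
print(ate)  # 0.5
```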
• A missing data problem
Notice that the fundamental problem of inference applies to average causal effects:

  \frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)].

If we assign all N units to treatment, we don’t see Y_i(0) for any i. Similarly if all N units go to control. Social science often relies on empirical comparisons—e.g., between some set of units for whom we observe Y_i(1) and another set for whom we observe Y_i(0). One group serves as the counterfactual for the other—which may help us overcome this fundamental problem. (17/41)
• Random assignment and the missing data problem
Randomization is one way to solve the missing data problem. The logic of random sampling is key. The question is how we can use sample data—statistics—to estimate parameters—like the average causal effect. To understand the sampling process, we need a box model. (18/41)
• A box model for experiments and natural experiments
[Diagram: a study group in which each unit i carries two tickets, Y_i(1) and Y_i(0); a random draw sends some units to a treatment group, where we observe Y_i(1), and the rest to a control group, where we observe Y_i(0).] (19/41)
• The average causal effect
[Same box model diagram.] The estimand:

  \frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)].

(20/41)
• Estimating the average causal effect
[Same box model diagram.] The estimand:

  \frac{1}{N} \sum_{i=1}^{N} [Y_i(1) - Y_i(0)].

An unbiased estimator:

  \frac{1}{m} \sum_{i=1}^{m} [Y_i \mid T_i = 1] - \frac{1}{N-m} \sum_{i=m+1}^{N} [Y_i \mid T_i = 0],

where T_i is an indicator for treatment assignment. Under this model, a random subset of size m < N units is assigned to treatment. The units assigned to the treatment group are indexed from 1 to m. (21/41)
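The estimator can be sketched directly from the box model: draw a random subset of m units into treatment, observe Y_i(1) for them and Y_i(0) for the rest, and take the difference of group means. The potential outcomes below are invented for illustration.

```python
import random

# Hypothetical potential outcomes for N = 6 units.
Y1 = [3, 5, 2, 4, 6, 1]
Y0 = [1, 4, 2, 2, 5, 0]
N, m = len(Y1), 3

# Simple random assignment of m units to treatment.
treated = set(random.sample(range(N), m))

treat_mean = sum(Y1[i] for i in treated) / m
control_mean = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
estimate = treat_mean - control_mean
print(estimate)
```

Any single draw can overshoot or undershoot the true average causal effect; unbiasedness is a statement about the average over repeated draws.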
    • The expected value, and definition of unbiasedness
      With a simple random sample, the expected value of the sample mean equals the population mean.
      So the expected value of the mean of the Yi(1)s in the assigned-to-treatment group equals the average of the Yi(1)s in the study group (the population). The same holds for the control group: on average, the sample mean of the Yi(0)s in the assigned-to-control group equals the true population mean.
      In any given experiment, the control group mean may be too high or too low (as may the treatment group mean). But across infinitely many (hypothetical) replications of the sampling process, the average of the sample averages equals the true average in the study group.
      Similarly, the expected value of the difference of sample averages equals the average difference in the population: the difference-of-means estimator is unbiased for the average causal effect.
    • Assumptions of the Neyman model
      The Neyman urn model involves several assumptions:
      - As-if random: units are sampled at random from the study group and assigned to treatment or control.
      - Non-interference (SUTVA): each unit's outcome depends only on its own treatment assignment, not on the assignments of other units. (Analogue for regression models: unit i's response depends only on i's covariate values and error term.)
      - Exclusion restriction: treatment assignment affects outcomes only through treatment receipt.
      N.B.: as we will see, these are assumptions of standard regression models as well.
      An important question: how can the validity of these assumptions be probed?
    • A box model for rolling a die
      Let's talk a bit more about random variables and their expectations.
      A six-sided fair die has an equal probability of landing 1, 2, 3, 4, 5, or 6 each time it is rolled. A single roll of the die can thus be modeled as a draw at random from a box of tickets labeled 1, 2, 3, 4, 5, and 6.
    • Random variables and observed values
      A ticket drawn at random from the box is a random variable. Definition: a random variable is a chance procedure for generating a number.
      The value of a particular draw is an observed value (or realization) of this random variable. For example, drawing the ticket 4 from the box {1, 2, 3, 4, 5, 6} yields the observed value X = 4.
    • Another random variable
      Suppose we discard the tickets labeled 5 and 6. Now we have a new random variable: Y, a draw at random from the box {1, 2, 3, 4}.
    • Operations on random variables
      An arithmetic operation on draws from boxes makes a new random variable. For example, define a new random variable Z in terms of X and Y: Z = Y + X, where Y is a draw from the box {1, 2, 3, 4} and X is a draw from the box {1, 2, 3, 4, 5, 6}.
    • Manipulating random variables
      Other operations are fine, too. For example, W = 3Y − X defines another random variable from the same two boxes.
    • Expectations
      The expected value of a random variable is a number. If X is a random draw from a box of numbered tickets, then E(X) is the average of the tickets in the box. The observed value of the random variable will be somewhere around this number, sometimes too high, sometimes too low.
      Consider the previous example, with X a draw from {1, 2, 3, 4, 5, 6} and Y a draw from {1, 2, 3, 4}:
      1. E(X) = ?
      2. E(Y) = ?
      3. E(2X + 3Y) = ?
      4. Does E(2X + 3Y) = 2E(X) + 3E(Y)?
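These questions can be checked directly. A small sketch (Python is my choice, not the slides'; it uses exact fractions and the two boxes defined above) computes each expectation as the average of the tickets and verifies linearity of expectation by brute force over all equally likely pairs:

```python
from fractions import Fraction

# Box models for the two random variables from the slides:
# X is a draw from {1,...,6} (a fair die), Y a draw from {1,...,4}.
X_box = [1, 2, 3, 4, 5, 6]
Y_box = [1, 2, 3, 4]

def expectation(box):
    """E of a draw at random from a box = the average of the tickets."""
    return Fraction(sum(box), len(box))

EX = expectation(X_box)  # 7/2 = 3.5
EY = expectation(Y_box)  # 5/2 = 2.5

# Linearity of expectation: E(2X + 3Y) = 2E(X) + 3E(Y), checked here
# by averaging 2x + 3y over all 24 equally likely (x, y) pairs.
E_2X_3Y = Fraction(sum(2 * x + 3 * y for x in X_box for y in Y_box),
                   len(X_box) * len(Y_box))

print(EX, EY, E_2X_3Y)          # 7/2 5/2 29/2
print(E_2X_3Y == 2 * EX + 3 * EY)  # True
```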
    • Independence and dependence
      Suppose X and Y are random variables. Pretend you know the value of X. Do the chances of Y depend on that value? If so, X and Y are dependent; if not, they are independent.
      Are X and Y dependent or independent? Suppose each ticket in a box carries a pair of numbers, a ticket is drawn at random, and X is the first number on the ticket while Y is the second. The box contains the pairs (2, 2), (2, 4), (2, 9), (4, 2), (4, 4), (4, 9).
    • Drawing with and without replacement
      Suppose we make two draws from the box {1, 2, 3, 4, 5, 6}. If the draws are made with replacement, are they independent? If the draws are made without replacement, are they independent?
    • Conditional versus unconditional probabilities
      Suppose we make two draws without replacement from the box {1, 2, 3, 4}. What is the chance that the second draw is 4? What is the chance that the second draw is 4, given that the first draw is 3?
    • Conditional versus unconditional probabilities
      In answering such questions, it is helpful to map out all possible outcomes of the two draws:

                         second draw
                      1     2     3     4
        first    1   n.a.   12    13    14
        draw     2   21    n.a.   23    24
                 3   31    32    n.a.   34
                 4   41    42    43    n.a.

      In the table, "12" means that 1 is the observed value of the first draw and 2 is the observed value of the second draw.
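The table can also be enumerated in a few lines. This sketch (an illustration, not part of the lecture) lists the 12 equally likely ordered outcomes of two draws without replacement and computes both probabilities asked above:

```python
from itertools import permutations

# All equally likely ordered outcomes of two draws made without
# replacement from the box {1, 2, 3, 4} (the table's 12 cells).
pairs = list(permutations([1, 2, 3, 4], 2))

# Unconditional chance that the second draw is 4.
p_second_4 = sum(1 for a, b in pairs if b == 4) / len(pairs)

# Conditional chance that the second draw is 4, given the first is 3.
given_3 = [(a, b) for a, b in pairs if a == 3]
p_second_4_given_3 = sum(1 for a, b in given_3 if b == 4) / len(given_3)

print(p_second_4)          # 0.25
print(p_second_4_given_3)  # 1/3
```

Conditioning on the first draw removes one ticket from the box, which is why the two answers differ.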
    • More on chance processes
      A data variable is a list of numbers. A random variable is a chance procedure for generating a number. Sometimes, a data variable can be viewed as a list of observed values of random variables.
      Tomorrow morning, you go out and ask the age of the first person you meet. Is this a random variable?
      Random variables involve a model for the process that generated the data. In this section, we started by writing down a box model for rolling a six-sided die. This sort of modeling is basic to statistical inference. Sometimes the models are apt descriptions of the chance procedure; sometimes they are not . . .
    • Statistics and parameters
      For our model of rolling a six-sided die, the expected value of a draw from the box is the box's mean:

        \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5    (1)

      The expected value is a parameter. In this case, we know what it is; in other cases, we might use observed values to estimate the parameter. Drawing inferences from the data back to the box is the focus of statistics. (Reasoning forward from the box to the data is the study of probability.)
    • More on random variables
      If we roll a die n times and add up the total number of spots, we have another random variable:

        \sum_{i=1}^{n} U_i ,    (2)

      where U_i is the number of spots on the i-th roll. This is like drawing n tickets from the box with replacement: the U_i are independent, identically distributed random variables.
      Distributing expectations, we have

        E\left( \sum_{i=1}^{n} U_i \right) = \sum_{i=1}^{n} E(U_i) = n \cdot 3.5
    • Sampling from the box
      Again, throw a die n times and count the number of spots on each roll. The sample mean is

        \bar{U} = \frac{1}{n} \sum_{i=1}^{n} U_i    (3)

      The sample mean is a sum of random variables and is itself a random variable: it will turn out a little different in each different sample of n rolls of the die. When n is large, however,

        \bar{U} \approx E(U_i) = 3.5 ,    (4)

      where \approx means "about equal to."
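A quick simulation illustrates equation (4): with many rolls, the sample mean lands close to the expected value of 3.5. The number of rolls and the seed below are arbitrary choices for the sketch.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility
n = 100_000     # "large" in the sense of the slide
rolls = [random.randint(1, 6) for _ in range(n)]  # n iid draws from the box
sample_mean = sum(rolls) / n

print(sample_mean)  # about 3.5
```

Rerunning with a different seed gives a slightly different sample mean, which is exactly the sense in which the sample mean is itself a random variable.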
    • Estimating parameters
      Thus, we can use repeated observations to estimate parameters such as the expectations of random variables. In this example, we use observed values of independent, identically distributed random variables: n draws from a box of tickets.
      This is a good model for the rolling of dice. In other contexts, things may be more complicated.
    • Hypothetical illustration of potential outcomes for local budgets when village council heads are women or men

                     Yi(0): budget share   Yi(1): budget share   τi: unit
        Village i    if head is male       if head is female     causal effect
        Village 1          10                    15                   5
        Village 2          15                    15                   0
        Village 3          20                    30                  10
        Village 4          20                    15                  -5
        Village 5          10                    20                  10
        Village 6          15                    15                   0
        Village 7          15                    30                  15
        Average            15                    20                   5

      From Gerber and Green (2012), drawing on Chattopadhyay and Duflo (2004).
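The table translates directly into code. This sketch (the numbers are the table's; the code itself is illustrative) recomputes the unit causal effects and the average causal effect from the two columns of potential outcomes:

```python
# Potential outcomes from the table: budget share under a male
# village head (Y0) and under a female village head (Y1).
Y0 = [10, 15, 20, 20, 10, 15, 15]
Y1 = [15, 15, 30, 15, 20, 15, 30]

# Unit causal effects tau_i = Y_i(1) - Y_i(0), and their average.
unit_effects = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
ate = sum(unit_effects) / len(unit_effects)

print(unit_effects)  # [5, 0, 10, -5, 10, 0, 15]
print(ate)           # 5.0
```

Note that we can only do this computation because the table is hypothetical: in a real experiment, one of the two columns is missing for every village.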
    • An experiment where m = 2 villages are assigned to treatment and N − m = 5 go to control
      [Box diagram: the seven villages' potential outcomes, with the two treated villages revealing Yi(1) and the five control villages revealing Yi(0).]
    • An unbiased estimator for the average causal effect
      More generally, denote the units assigned to treatment by i = 1, ..., m and those assigned to control by i = m + 1, ..., N.
      Define \bar{Y}_T = \frac{\sum_{i=1}^{m} Y_i}{m} to be the sample mean of the assigned-to-treatment group, and \bar{Y}_C = \frac{\sum_{i=m+1}^{N} Y_i}{N - m} to be the sample mean of the assigned-to-control group. Then,

        E(\bar{Y}_T - \bar{Y}_C) = E\left( \frac{\sum_{i=1}^{m} Y_i}{m} \right) - E\left( \frac{\sum_{i=m+1}^{N} Y_i}{N - m} \right)    (5)

      On the problem set: show that the difference-of-means estimator is unbiased for the average causal effect.
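The problem-set claim can be checked numerically by brute force. Using the seven villages from the earlier table (the code is a sketch, not the assigned analytical proof), averaging the difference-of-means estimate over every possible assignment of m = 2 villages to treatment recovers the average causal effect exactly:

```python
from itertools import combinations

# Potential outcomes for the seven villages (as in the earlier table).
Y0 = [10, 15, 20, 20, 10, 15, 15]
Y1 = [15, 15, 30, 15, 20, 15, 30]
N, m = 7, 2

# The estimand: the average causal effect over the study group.
ace = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / N  # 5.0

# Average the difference-of-means estimate over all C(7, 2) = 21
# equally likely assignments of m = 2 villages to treatment.
estimates = []
for treated in combinations(range(N), m):
    t_mean = sum(Y1[i] for i in treated) / m
    c_mean = sum(Y0[i] for i in range(N) if i not in treated) / (N - m)
    estimates.append(t_mean - c_mean)

expected_value = sum(estimates) / len(estimates)
print(expected_value)  # 5.0: the estimator is unbiased
```

Individual estimates in the list range well above and below 5, but their average over all assignments equals the estimand, which is the definition of unbiasedness.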