- Categorical variables are variables that are described by words rather than numbers, like "cat person" vs "dog person". Continuous variables take numerical values like income or test scores.
- To analyze the effect of a categorical variable in a regression, it must be converted into a binary variable using 0s and 1s. This allows a comparison of means between the included group and omitted group.
- Difference-in-differences estimation compares the change in outcomes over time between a treatment group and a control group, allowing researchers to account for other factors to isolate the causal effect of a treatment or policy. It requires the presence of a treatment and control group as well as pre- and post-treatment observations.
Topic 4 (binary)
1. DIFFERENCE IN MEANS AND REGRESSIONS WITH BINARY INDEPENDENT VARIABLES
ECON 355 – Regression Analysis
Ryan Herzog, Ph.D.
2. IN THIS TOPIC
• Categorical variables
• Mean-comparison tests
• Difference-in-difference(s) estimation technique
3. CATEGORICAL VARIABLES
• Up until now we have been dealing with continuous variables, e.g. the price of a house or the size of a house
• Categorical variables are different: they are usually described by a word, not a number. We can deal with them by grouping observations into categories.
• For example, if I ask you whether you are a cat or a dog person and record the answers in a spreadsheet, I will be dealing with the words “cat” and “dog”, not numbers
4. CLASS DATA COLLECTION
• Please use the following link to input data about yourself:
- Cat person/dog person
- Coffee consumption
https://docs.google.com/spreadsheets/d/1R7cPm92FeYuaxAQRLINh1_zCLL_VlEZwj_K9CGz1zPA/edit?usp=sharing
- Do cat and dog lovers consume the same amount of caffeine?
5. DIFFERENCE IN MEANS
• To test if there is a difference in means between two groups, we need to find the t-stat:
• Find the means and the difference between them
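• Divide the difference by the standard error: $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where $s_1^2$ and $s_2^2$ are the sample variances and $n_1$ and $n_2$ the sample sizes of the two groups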
• The null hypothesis is that the means of the two groups are equal, e.g. cat and dog lovers consume the same amount of caffeine
• The alternative is that the means of the two groups are different, e.g. cat and dog lovers do not consume the same amount of caffeine
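The t-stat recipe on this slide can be sketched in a few lines of Python. This is only an illustration using the unequal-variance standard error, sqrt(s1^2/n1 + s2^2/n2); the function name `two_sample_t` and the coffee numbers are invented for the example and are not the class data.

```python
import math
import statistics

def two_sample_t(group1, group2):
    """t-statistic for a difference in means, with SE = sqrt(s1^2/n1 + s2^2/n2)."""
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    se = math.sqrt(var1 / len(group1) + var2 / len(group2))
    return (mean1 - mean2) / se

# Hypothetical daily cups of coffee, made up for illustration
cat_people = [1, 2, 3, 2, 2]
dog_people = [3, 4, 2, 4, 2]

t_stat = two_sample_t(cat_people, dog_people)
print(f"t-stat: {t_stat:.3f}")
```

A t-stat that is large in absolute value relative to the appropriate critical value is evidence against the null hypothesis of equal means.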
6. BINARY VARIABLES
• To record categorical variables we will use dummies/binary variables: 1 and 0
• For example, if I have two groups that are mutually exclusive, meaning each observation
can only belong to one group but not both at the same time, I will assign “1” to the first
group and “0” to the other.
• For example, cat lovers can be coded as 1, and dog lovers will then be coded as 0.
• Stata: use TeachingRatings.dta dataset
• Which variables are continuous and which ones are categorical?
7. DIY
• Work in Stata
• We need to find out the difference in means of student-teacher evaluations for male
and female professors
• bys female: summarize course_eval
• Use the formula for t-stat to test if the means between the two groups are equal
8. REGRESSION WITH A BINARY INDEPENDENT VARIABLE
• When we have a binary variable on the right-hand side, we are effectively comparing means between two groups: the included group (coded 1) and the excluded group (coded 0).
Stata: reg course_eval female
• The interpretation of beta changes
• It is no longer “when X increases by 1 unit” but rather “For the included group (what is it?), the dependent variable is on average beta higher (or lower, if beta is negative) than for the excluded group (what is it?)”
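To see why a regression on a binary variable is a comparison of means, here is a minimal sketch that fits the least-squares line by hand. The helper `ols_simple` and the six evaluation scores are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from TeachingRatings.dta.

```python
import statistics

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: female = 1 is the included group, female = 0 the excluded group
female = [0, 0, 0, 1, 1, 1]
course_eval = [4.0, 4.5, 3.5, 3.0, 4.0, 3.5]

intercept, slope = ols_simple(female, course_eval)

# Group means computed directly, for comparison with the regression output
mean_excluded = statistics.mean(y for x, y in zip(female, course_eval) if x == 0)
mean_included = statistics.mean(y for x, y in zip(female, course_eval) if x == 1)

print(intercept, mean_excluded)               # constant = mean of the excluded group
print(slope, mean_included - mean_excluded)   # beta = difference in means
```

In this sketch the constant equals the mean of the excluded group and beta equals the difference between the two group means.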
9. LET’S TRY MORE EXAMPLES
• Regress course evaluations on the following binary variables
• Minority (equal to 1 if the professor represents a teaching minority, 0 otherwise)
• One credit (equal to 1 if the course is a 1-credit course, 0 otherwise)
• nnenglish (equal to 1 if the professor’s native language is not English, 0 otherwise)
• Intro (equal to 1 if the course is introductory, 0 otherwise)
• Are the relationships statistically significant?
• If yes, please interpret them
10. CONDUCTING A MEAN-COMPARISON TEST (T-TEST) IN STATA
• We can also find the same answer by conducting the t-test analysis in Stata
• Statistics=>summaries, tables, and tests=>classical tests of hypotheses=>t-test (mean-comparison test)
1. Run a mean-comparison test of teaching evaluations by gender
2. Run a mean-comparison test of teaching evaluations by any other categorical variable
11. CREATING A BINARY VARIABLE IN STATA
• Please use “binarydata_stata” dataset
• Library – stands for a family member owning a library card when the respondent was 14
• Urban – the respondent lived in an urban area at the time of the 2002 interview
• Government – the respondent works for the government
• To create a binary variable for the “library”
gen libraryd=0
replace libraryd=1 if library=="yes"
• Do earnings of those whose family owned a library card differ from the earnings of
those whose family did not? If yes, by how much?
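The same recoding step can be sketched outside Stata. The Python below mirrors the gen/replace logic with one deliberate difference that is an assumption of this sketch, not something in the slides: blank answers are left as missing (None) instead of being coded 0. The function name `yes_no_to_dummy` and all data values are hypothetical.

```python
import statistics

def yes_no_to_dummy(answers):
    """Map "yes" -> 1, any other non-blank answer -> 0, blank or None -> None."""
    dummies = []
    for answer in answers:
        if answer is None or answer.strip() == "":
            dummies.append(None)  # keep missing answers missing
        else:
            dummies.append(1 if answer == "yes" else 0)
    return dummies

# Hypothetical responses and earnings, made up for illustration
library = ["yes", "no", "yes", "", "no"]
earnings = [30, 20, 40, 25, 30]

libraryd = yes_no_to_dummy(library)

# Difference in mean earnings between the two groups, skipping missing answers
with_card = [e for e, d in zip(earnings, libraryd) if d == 1]
without_card = [e for e, d in zip(earnings, libraryd) if d == 0]
gap = statistics.mean(with_card) - statistics.mean(without_card)
print(libraryd, gap)
```

Whether the gap is statistically different from zero is a separate question, answered by a mean-comparison test or a regression on the dummy.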
12. DIY
• Please convert the government variable into a binary variable
• By conducting a mean-comparison test in Stata please pick the correct interpretation of
the test result
13. DIFFERENCE-IN-DIFFERENCES ESTIMATION TECHNIQUE
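• Allows us to estimate a causal effect, provided the treatment and control groups would otherwise have followed similar trends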
• Needs a treatment and a control group
• We can use difference-in-differences, for example, if there is a new policy implemented
on a local level
• Example: Card and Krueger (2000).
- Control group: fast food stores in Eastern Pennsylvania
- Treatment group: fast food stores in New Jersey
- Treatment: increase in the minimum wage in New Jersey on April 1, 1992
- Compare employment growth in Eastern Pennsylvania and New Jersey before and after treatment
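Written as a formula, the comparison above is a difference of two before-and-after changes. The notation here is an assumption made for illustration and does not come from the slides: $\bar{Y}$ is the average outcome (for example, employment per store), $T$ and $C$ denote the treatment and control groups, and "pre" and "post" denote the periods before and after the treatment.

```latex
\hat{\delta}_{DiD}
  = \left(\bar{Y}_{T,\text{post}} - \bar{Y}_{T,\text{pre}}\right)
  - \left(\bar{Y}_{C,\text{post}} - \bar{Y}_{C,\text{pre}}\right)
```

In the minimum wage example, $T$ would be New Jersey and $C$ Eastern Pennsylvania.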
14. PAIRED T-TEST
• The paired t-test is used to determine whether the mean of a dependent variable (e.g.,
weight, anxiety level, salary, reaction time, etc.) is the same in two related groups (e.g.,
two groups of participants that are measured at two different "time points" or who
undergo two different "conditions").
• Example: to understand whether there was a difference in managers' salaries before and after undertaking a PhD
- Your dependent variable would be "salary", and your two related groups would be the two different "time points"
• Example: to understand whether there was a difference in smokers' daily cigarette consumption 6 weeks after wearing nicotine patches compared with wearing patches that did not contain nicotine, known as a "placebo"
- Your dependent variable would be "daily cigarette consumption", and your two related groups would be the two different "conditions" participants were exposed to; that is, cigarette consumption values after wearing "nicotine patches" (the treatment group) compared to after wearing the "placebo" (the control group).
• Specifically, you use a paired t-test to determine whether the mean of the paired differences is statistically significantly different from zero.
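A paired t-test works on the within-pair differences: take the difference for each pair, then test whether the mean of those differences is zero. The sketch below shows that calculation. The function name `paired_t` and the salary figures are hypothetical, made up for illustration.

```python
import math
import statistics

def paired_t(before, after):
    """t-statistic for the mean of paired differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference: sd of the differences / sqrt(n)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se

# Hypothetical salaries (in thousands) for the same four managers
salary_before = [50, 60, 55, 65]
salary_after = [54, 62, 61, 67]

t_stat = paired_t(salary_before, salary_after)
print(f"paired t-stat: {t_stat:.3f} with {len(salary_before) - 1} degrees of freedom")
```

Unlike the two-sample test, the paired version uses only the variation in the differences, so it applies only when each observation in one group has a natural partner in the other.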
15. DIFF-N-DIFF CONTINUED. ANOTHER EXAMPLE
• Richardson and Troost (2009). Different monetary policies across Federal Reserve districts
• Mississippi is divided between the 6th and 8th Federal Reserve districts
• During the Great Depression, the Atlanta Fed (6th district) increased lending by 30-40% to rescue banks from bankruptcy; the St. Louis Fed (8th district), if anything, cut lending by 10% (laissez faire)
• Treatment group – Mississippi banks in the 6th Federal Reserve district
• Control group – Mississippi banks in the 8th Federal Reserve district
• Treatment – monetary policy during the Great Depression
16. MONETARY POLICY DURING GREAT DEPRESSION CONT’D
• Use banks.xlsx
• The district 6 and district 8 variables give the number of banks in each district at a point in time
• Use the filter in Excel to find the number of banks on the first of July each year
• What is the difference in the number of banks between 1929 and 1933 in the 6th district?
• What is the difference in the number of banks between 1929 and 1933 in the 8th district?
• What is the difference of the two differences?
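The three questions above are the whole difference-in-differences calculation: one change per district, then the gap between the two changes. Here is a minimal sketch of that arithmetic. The bank counts are placeholders invented for illustration, not the values in banks.xlsx, and `diff_in_diff` is a hypothetical helper name.

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """Return (change in treatment group, change in control group, difference of the two)."""
    treat_change = treat_post - treat_pre
    control_change = control_post - control_pre
    return treat_change, control_change, treat_change - control_change

# Placeholder bank counts (NOT the real data): 6th district is treated, 8th is control
banks_6th_1929, banks_6th_1933 = 100, 80
banks_8th_1929, banks_8th_1933 = 120, 70

result = diff_in_diff(banks_6th_1929, banks_6th_1933, banks_8th_1929, banks_8th_1933)
treat_change, control_change, did = result
print(f"6th district change: {treat_change}")
print(f"8th district change: {control_change}")
print(f"Difference-in-differences: {did}")
```

With these placeholder numbers both districts lose banks, but the treated district loses 30 fewer, which is the kind of pattern the exercise asks you to look for in the real data.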
17. DIFF-N-DIFF WATER SUPPLY AND CHOLERA EXAMPLE
• John Snow (1855) described the relationship between water supply and cholera deaths in London over time
• In South London, both the Southwark and Vauxhall Company and the Lambeth Company drew water from the contaminated Thames in central London in 1849
• In 1852, the Lambeth Company started drawing water from an uncontaminated source upstream.
• What would we expect to happen in this case?
• What is the control group? Treatment group? Treatment?
18. DIFF-N-DIFF WATER SUPPLY AND CHOLERA EXAMPLE CONT’D
• Use the Cholera_deaths Excel dataset
• To conduct the diff-n-diff analysis here what should you do? What is the conclusion?
• In Stata
• Stata: statistics=>summaries, tables, and tests=>classical tests of hypotheses=>t-test (mean-comparison test)=>paired test=>by group
• What is the conclusion based on the statistical significance of the test?
19. REVIEW
1. What is the difference between categorical variables and continuous variables? Please give an example
of each
2. To run a regression with a categorical variable as an independent variable what do we need to do?
3. What is the difference in the interpretation of a regression with a continuous independent variable and a
regression with a categorical independent variable?
4. How do we interpret the constant in a regression with a categorical independent variable? Please give
an example
5. What do we mean by “omitted group” when including a categorical variable in a regression? Please give
an example
6. What does conducting a mean-comparison test allow us to do?
7. How do we conduct a mean-comparison test in Stata? When do we conduct two-sample t-test and
when do we conduct a paired t-test?
8. To be able to conduct a difference-in-differences analysis, what do we need to have?
9. Intuitively, how do we conduct a difference-in-differences analysis?