The document discusses dummy variables, which take the values 1 and 0 to represent dichotomous categories such as male/female or urban/rural. Dummy variables let a regression estimate different intercepts for subgroups and, through interaction terms, different slopes. Including a dummy for every category of one qualitative variable in a model that also has an intercept produces perfect multicollinearity (the dummy variable trap), so only m − 1 dummies should be used for a variable with m categories. The document works through examples that estimate separate regression lines for male and female teachers, for education and age groups, for seasons and quarters, and for the periods before and after a structural change.
3. Nature of “dummy” variables:
(1) They take only the values “1” and “0”.
(2) They usually indicate a dichotomy: “presence” or “absence”, “yes” or “no”, etc.
(3) They indicate a “quality” or attribute, such as
“male” or “female”,
“black” or “white”,
“urban” or “non-urban”,
“before” or “after”,
“north” or “south”, “east” or “west”,
etc.
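7. Figure: scatter plot of salary (Y) against years of teaching (X, 0 to 8 years) for male and female teachers, each group with its own fitted line:
$\hat{Y} = \hat\beta_1 + \hat\beta_2 X$ (male)
$\hat{Y} = \hat\beta_1' + \hat\beta_2' X$ (female)
Two separate models:
$Y_m = \beta_1 + \beta_2 X_m + u_m$ (male)
$Y_f = \beta_1' + \beta_2' X_f + u_f$ (female)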
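8. Assuming the two groups have the same slope ($\beta_2' = \beta_2$) but different intercepts, the relationship between $Y_i$ and $X_i$ is
1st model: $Y_i = \beta_1 + \beta_1^* D_i + \beta_2 X_i + u_i$
Assuming the slope as well as the intercept differs between the groups ($\beta_2' \neq \beta_2$):
2nd model: $Y_i = \beta_1 + \beta_1^* D_i + \beta_2 X_i + \beta_2^* D_i X_i + u_i$
where
$Y_i$ = annual salary (each observation)
$X_i$ = years of teaching experience
$D_i$ = 1 if male, 0 otherwise (female, the control category)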
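9. Figure: the same scatter plot of salary (Y) against years of teaching (X) with three fitted lines: the male line $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$, the female line $\hat{Y} = \hat\beta_1' + \hat\beta_2' X$, and a single line for the pooled (whole) sample, $\hat{Y} = \hat\beta_1'' + \hat\beta_2'' X$, which ignores the difference between the groups. The two separate models are $Y_m = \beta_1 + \beta_2 X_m + u_m$ (male) and $Y_f = \beta_1' + \beta_2' X_f + u_f$ (female).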
10. Two dummies for sex, one marking male and one marking female teachers, always satisfy $D_1 + D_2 = 1$, i.e. $D_1 = 1 - D_2$.
Y = annual salary, X = years of teaching.
obs  D2  D1  Y     X
1    0   1   23    1
2    1   0   19.5  1
3    0   1   24    2
4    1   0   21    2
5    0   1   25    3
6    1   0   22    3
7    0   1   26.5  4
8    1   0   23.1  4
9    1   0   25    5
10   0   1   28    5
11   0   1   29.5  6
12   1   0   26    6
13   1   0   27.5  7
14   0   1   31.5  7
15   1   0   29    6
16   0   1   22    5
17   1   0   19    2
18   0   1   18    2
19   1   0   21.7  5
20   1   0   18.5  2
21   0   1   21    4
22   0   1   20.5  4
23   1   0   17    1
24   1   0   17.5  1
25   0   1   21.2  5
Each dummy on its own separates the two categories, but the sum of the two dummies is 1 for every observation, so it cannot tell which teacher is male and which is female.
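11. Caution in the use of dummy variables (the dummy variable trap)
Suppose we introduce two dummy variables in one model to identify the two categories of one qualitative variable:
$Y_i = \beta_1 + \beta_1^* D_{1i} + \beta_1^{**} D_{2i} + \beta_2 X_i + u_i$
where $D_{1i}$ = 1 if female, 0 otherwise, and $D_{2i}$ = 1 if male, 0 otherwise.
This model cannot be estimated because of perfect collinearity between $D_1$ and $D_2$:
$D_1 = 1 - D_2$, or $D_2 = 1 - D_1$, or $D_1 + D_2 = 1$ (perfect collinearity).
Since $D_1 + D_2$ reproduces the column of ones that carries the intercept $\beta_1$, the separate effects $\beta_1$, $\beta_1^*$ and $\beta_1^{**}$ cannot be identified.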
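The trap can be checked numerically. Below is a minimal sketch, assuming NumPy is available: it builds the design matrix from the D1, D2 and X columns of the slide-10 table, once with an intercept plus both dummies and once with an intercept plus a single dummy, and compares the number of columns with the matrix rank. A rank below the column count means $X'X$ is singular, so OLS has no unique solution.

```python
import numpy as np

# D1 and X columns of the slide-10 table (25 teachers); D2 = 1 - D1.
D1 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1])
D2 = 1 - D1
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 6, 5, 2, 2, 5, 2, 4, 4, 1, 1, 5],
             dtype=float)
const = np.ones_like(X)

# Intercept plus BOTH dummies: the constant column equals D1 + D2 (the trap).
X_trap = np.column_stack([const, D1, D2, X])
# Intercept plus m - 1 = 1 dummy (the general rule).
X_ok = np.column_stack([const, D1, X])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("with both dummies:", X_trap.shape[1], "columns, rank", rank_trap)
print("with one dummy:   ", X_ok.shape[1], "columns, rank", rank_ok)
```

Dropping either dummy (or, alternatively, dropping the intercept) restores full column rank.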
12. Using two dummy variables for the two categories of one qualitative variable in a model with an intercept falls into the “trap of perfect multicollinearity”.
General rule: to avoid perfect multicollinearity, if a qualitative variable has m categories, introduce only m − 1 dummy variables.
Figure: a qualitative variable such as age, divided into m categories (1, 10, 20, 30, 40, …), is represented by the dummies $D_1, D_2, D_3, D_4, D_5, \dots, D_{m-1}$.
13. Now consider different intercepts for the two groups:
Model: $Y_i = \beta_1 + \beta_1^* D_{2i} + \beta_2 X_i + u_i$
where $D_{2i}$ = 1 if male, 0 otherwise (i.e. female).
Measuring the estimated results for the two groups:
Male ($D_{2i} = 1$): $\hat{Y}_i = (\hat\beta_1 + \hat\beta_1^*) + \hat\beta_2 X_i$
Female ($D_{2i} = 0$): $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$
When a category is assigned the value zero, it is called the control category (or omitted group).
14. To test whether the relationship differs between the two categories, compare
$\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$
$\hat{Y}_i = (\hat\beta_1 + \hat\beta_1^* D) + \hat\beta_2 X_i$
and check the t-statistic of $\hat\beta_1^*$. If it is significant, the two categories have different intercepts (constant terms). The common $\hat\beta_2$ means that the two categories have the same relationship (slope) between X and Y.
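15. $H_0: \beta_1^* = 0$
$H_1: \beta_1^* > 0$ or $H_1: \beta_1^* \neq 0$
The appropriate test is the t-test on $\hat\beta_1^*$: compare the estimated $t^*$ with the critical value $t_c$.
For $H_1: \beta_1^* \neq 0$, reject $H_0: \beta_1^* = 0$ if $|t^*| > t_c(\alpha/2, n-k)$; for $H_1: \beta_1^* > 0$, reject it if $t^* > t_c(\alpha, n-k)$.
In the model with both an intercept dummy and a slope dummy,
$\hat{Y} = \hat\beta_1 + \hat\beta_1^* D_i + \hat\beta_2 X_i + \hat\beta_2^* D_i X_i$
check the t-statistic of $\hat\beta_1^*$ to test for a difference in intercept, and the t-statistic of $\hat\beta_2^*$ to test whether the slopes of the two categories differ.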
16. Separate regressions for female and male teachers:
The two regression results differ in both slope and intercept. But are the differences statistically significant? We cannot answer this from the two separate regression results alone; we need an F-test ($F^*$).
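17. Setting up the two dummies for the example (Table 15.1 + 15.5):
$D_1$ = 1 for female, 0 otherwise
$D_2$ = 1 for male, 0 otherwise
With $D_1$ (male as the control category):
$\hat{Y}_i = (\hat\beta_1' + \hat\beta_1'' D_1) + \hat\beta_2 X_i = (17.937 - 1.2810 D_1) + 1.561 X$
With $D_2$ (female as the control category):
$\hat{Y}_i = (\hat\beta_1 + \hat\beta_1^* D_2) + \hat\beta_2 X_i = (16.656 + 1.2810 D_2) + 1.561 X$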
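A minimal sketch, assuming NumPy, that fits the common-slope dummy model to the slide-10 table by least squares. Regressing Y on a constant, the table's D1 column and X gives the D1 = 0 group an intercept of about 16.656, a differential intercept of about 1.281 and a common slope of about 1.561, the figures above. In the table as printed, it is the rows with D1 = 1 that receive the higher intercept (about 17.937).

```python
import numpy as np

# Slide-10 data: annual salary Y, years of teaching X, and the table's D1 column.
Y = np.array([23, 19.5, 24, 21, 25, 22, 26.5, 23.1, 25, 28, 29.5, 26, 27.5,
              31.5, 29, 22, 19, 18, 21.7, 18.5, 21, 20.5, 17, 17.5, 21.2])
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 6, 5, 2, 2, 5, 2, 4, 4, 1, 1, 5],
             dtype=float)
D1 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
              dtype=float)

# Common slope, intercept shifted by the dummy: Y = b1 + a*D1 + b2*X + u
design = np.column_stack([np.ones_like(X), D1, X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b1, a, b2 = coef

print(f"D1 = 0 line: {b1:.3f} + {b2:.3f} X")
print(f"D1 = 1 line: {b1 + a:.3f} + {b2:.3f} X")
```

Only one dummy column is used, so the design matrix has full rank and the least-squares solution is unique.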
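20. With $D_2$ = 1 for male (female as the control category), allowing both intercept and slope to differ:
Female: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i = 16.255 + 1.677 X$
Male: $\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_2) + (\hat\beta_2 + \hat\beta_2' D_2) X = 18.689 + 1.373 X$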
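A sketch of the fully interacted version, again assuming NumPy and the slide-10 table. Adding the product D1·X lets both the intercept and the slope shift. The fitted lines come out at about 16.256 + 1.677X for D1 = 0 and 18.689 + 1.374X for D1 = 1, matching the Female and Male lines above up to rounding, and identical to what two separate regressions would give. In the table as printed, it is the D1 = 1 rows that reproduce the Male line.

```python
import numpy as np

Y = np.array([23, 19.5, 24, 21, 25, 22, 26.5, 23.1, 25, 28, 29.5, 26, 27.5,
              31.5, 29, 22, 19, 18, 21.7, 18.5, 21, 20.5, 17, 17.5, 21.2])
X = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 6, 5, 2, 2, 5, 2, 4, 4, 1, 1, 5],
             dtype=float)
D1 = np.array([1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
              dtype=float)

# Y = b1 + a1*D1 + b2*X + a2*(D1*X) + u: intercept dummy plus slope (interaction) dummy.
design = np.column_stack([np.ones_like(X), D1, X, D1 * X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b1, a1, b2, a2 = coef

print(f"D1 = 0: {b1:.3f} + {b2:.3f} X")
print(f"D1 = 1: {b1 + a1:.3f} + {b2 + a2:.3f} X")
```

The t-statistics on a1 and a2 (not computed here) are what test whether the intercept and slope differences are significant.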
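21. One qualitative variable with more than two categories
Health care expenditure (Y) $= \beta_1 + \beta_1' D_2 + \beta_1'' D_3 + \beta_2$ Income (X) $+ u$
where $D_2$ = 1 if high school education, 0 otherwise; $D_3$ = 1 if college education, 0 otherwise. Less than high school education is the control category.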
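22. Figure: health care expenditure against income, with three parallel fitted lines:
Less than high school education: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$ (intercept $\hat\beta_1$)
High school education ($D_2 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_2) + \hat\beta_2 X$ (intercept shifted by $\hat\beta_1'$)
College education ($D_3 = 1$): $\hat{Y} = (\hat\beta_1 + \hat\beta_1'' D_3) + \hat\beta_2 X$ (intercept shifted by $\hat\beta_1''$)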
24. Measuring the estimated results of the different groups:
Less than high school: $\hat{Y}_i = -1.2859 + 0.1722 X_i$
High school: if the t-value of $D_2$ is statistically significant,
$\hat{Y}_i = (-1.2859 - 0.068) + 0.1722 X_i = -1.3539 + 0.1722 X_i$
College: if the t-value of $D_3$ is statistically significant,
$\hat{Y}_i = (-1.2859 + 0.447) + 0.1722 X_i = -0.8389 + 0.1722 X_i$
If the t-test for a group's dummy is not statistically significant, that group shares the control group's line: $\hat{Y}_i = -1.2859 + 0.1722 X_i$.
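The rule above is simple arithmetic on the control intercept. In this sketch, `group_line` is a hypothetical helper that adds a group's differential intercept only when its dummy is flagged as significant. The significance flags are inputs here, because the slide does not report the t-values.

```python
def group_line(control_intercept, slope, differential, significant):
    """Intercept and slope of one group's line in a parallel-shift dummy model.

    Hypothetical helper: the differential intercept is added only when the
    group's dummy is significant; otherwise the group shares the control line.
    """
    intercept = control_intercept + differential if significant else control_intercept
    return round(intercept, 4), slope


# Slide-24 estimates; less than high school is the control group.
B1, B2 = -1.2859, 0.1722
high_school = group_line(B1, B2, -0.068, significant=True)    # D2
college = group_line(B1, B2, 0.447, significant=True)         # D3
not_significant = group_line(B1, B2, 0.447, significant=False)

print(high_school, college, not_significant)
# (-1.3539, 0.1722) (-0.8389, 0.1722) (-1.2859, 0.1722)
```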
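25. One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for three age groups
$Y_i = \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_2 X_i + u_i$
with the t-values of $\hat\beta_1'$ and $\hat\beta_1''$ reported in parentheses,
where $D_1$ = 1 if 25 < age < 55, 0 otherwise; $D_2$ = 1 if age > 55, 0 otherwise.
Figure: the age axis is split at 25 and 55. $D_1 = 1$ on the middle band and $D_2 = 1$ on the upper band, while both are 0 below 25, so $D_1 + D_2 \neq 1$ and there is no perfect collinearity with the intercept.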
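26. The estimated models are:
age below 25: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
25 < age < 55: $\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_1) + \hat\beta_2 X$
age > 55: $\hat{Y} = (\hat\beta_1 + \hat\beta_1'' D_2) + \hat\beta_2 X$
$H_0: \beta_1' = 0$, $\beta_1'' = 0$ (test statistics $t_1^*$, $t_2^*$)
$H_1: \beta_1' \neq 0$, $\beta_1'' \neq 0$
Compare each $|t^*|$ with $t_c(\alpha/2, n-k)$.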
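27. In a scatter diagram of Y against X, the three fitted lines are parallel, with intercepts shifted by $\hat\beta_1'$ and $\hat\beta_1''$ from the base intercept $\hat\beta_1$:
$\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
$\hat{Y} = (\hat\beta_1 + \hat\beta_1') + \hat\beta_2 X$
$\hat{Y} = (\hat\beta_1 + \hat\beta_1'') + \hat\beta_2 X$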
28. One qualitative variable with many categories:
Example: an estimated model of medical care expenditure for four age groups
$Y = \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_1''' D_3 + \beta_2 X + u$
where
$D_1$ = 1 if age > 55, 0 otherwise
$D_2$ = 1 if 35 < age ≤ 55, 0 otherwise
$D_3$ = 1 if 15 < age ≤ 35, 0 otherwise
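A minimal sketch of the m − 1 rule for these four age bands. `age_dummies` is a hypothetical helper that returns (D1, D2, D3), with age ≤ 15 as the control group coded (0, 0, 0). The ≤ signs at the band edges are an assumption, because the original inequality symbols were lost.

```python
def age_dummies(age):
    """Return (D1, D2, D3) for the four age bands; age <= 15 is the control group.

    Band edges (<= 35, <= 55) are assumed, not confirmed by the source.
    """
    d1 = 1 if age > 55 else 0
    d2 = 1 if 35 < age <= 55 else 0
    d3 = 1 if 15 < age <= 35 else 0
    return d1, d2, d3


for age in (10, 15, 16, 35, 40, 55, 70):
    print(age, age_dummies(age))
```

Four categories need only three dummies: at most one of them is 1 for any age, and the control group is identified by all three being 0, so the dummies never sum to the constant column.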
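29. The estimated models are:
age ≤ 15: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
15 < age ≤ 35: $\hat{Y} = (\hat\beta_1 + \hat\beta_1''' D_3) + \hat\beta_2 X$
35 < age ≤ 55: $\hat{Y} = (\hat\beta_1 + \hat\beta_1'' D_2) + \hat\beta_2 X$
age > 55: $\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_1) + \hat\beta_2 X$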
30. Two qualitative variables
Salary (Y) $= \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_2 X + u$
or $Y = \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_2 X + \beta_2' D_1 X + \beta_2'' D_2 X + u'$
where $D_1$ = 1 if male, 0 otherwise (sex), and $D_2$ = 1 if white, 0 otherwise (race).
(1) Mean salary for a non-white female teacher ($D_1 = 0$, $D_2 = 0$):
$\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
(2) Mean salary for a non-white male teacher ($D_1 = 1$, $D_2 = 0$):
$\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_1) + (\hat\beta_2 + \hat\beta_2' D_1) X$
31. (3) Mean salary for a white female teacher ($D_1 = 0$, $D_2 = 1$):
$\hat{Y} = (\hat\beta_1 + \hat\beta_1'' D_2) + (\hat\beta_2 + \hat\beta_2'' D_2) X$
(4) Mean salary for a white male teacher ($D_1 = 1$, $D_2 = 1$):
$\hat{Y} = (\hat\beta_1 + \hat\beta_1' D_1 + \hat\beta_1'' D_2) + (\hat\beta_2 + \hat\beta_2' D_1 + \hat\beta_2'' D_2) X$
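The four cases above come from one formula evaluated at the four (D1, D2) cells. In the sketch below, `cell_line` is a hypothetical helper, and the coefficient values are made up for illustration; they are not estimates from the slides.

```python
def cell_line(coef, d1, d2):
    """Fitted intercept and slope for the cell (D1, D2) of
    Y = b1 + b1p*D1 + b1pp*D2 + b2*X + b2p*D1*X + b2pp*D2*X + u'."""
    intercept = coef["b1"] + coef["b1p"] * d1 + coef["b1pp"] * d2
    slope = coef["b2"] + coef["b2p"] * d1 + coef["b2pp"] * d2
    return intercept, slope


# Made-up coefficient values, for illustration only.
coef = {"b1": 10.0, "b1p": 2.0, "b1pp": 1.0, "b2": 1.5, "b2p": 0.25, "b2pp": 0.5}

for label, d1, d2 in [("non-white female", 0, 0), ("non-white male", 1, 0),
                      ("white female", 0, 1), ("white male", 1, 1)]:
    print(label, cell_line(coef, d1, d2))
```

Because this specification has no D1·D2 term, the white-male line is simply the control line plus both sets of differentials.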
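32. $D$ = 1 for 1970-1981, 0 otherwise (1982-1995)
Different types of dummy regression, all based on
$Y = \beta_1 + \beta_2 X + \beta_1' D + \beta_2' D X$
(1) Coincident regressions: $H_0: \beta_1' = 0$ and $\beta_2' = 0$ (same intercept and same slope)
(2) Concurrent regressions: $H_0: \beta_1' = 0$ (same intercept, different slopes)
(3) Dissimilar regressions: $\beta_1' \neq 0$ and $\beta_2' \neq 0$ (different intercepts and different slopes)
(4) Parallel regressions: $H_0: \beta_2' = 0$ (same slope, different intercepts)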
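34. Figure (Y against X; $A_0$, $B_0$ are the intercepts and $A_1$, $B_1$ the slopes of the two groups' lines):
Concurrent regressions: $A_0 = B_0$, $A_1 \neq B_1$ (common intercept, different slopes)
Dissimilar regressions: $A_0 \neq B_0$, $A_1 \neq B_1$ (different intercepts and different slopes)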
35. Interactive effects between two qualitative variables
Spending (Y) $= \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_2$ income (X) $+ u$
where $D_1$ = 1 if female, 0 otherwise (sex), and $D_2$ = 1 if college graduate, 0 otherwise (education).
Adding the interaction effect:
Spending (Y) $= \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_1''' D_1 D_2 + \beta_2$ income (X) $+ u$
$\beta_1'$ = differential effect of being female
$\beta_1''$ = differential effect of being a college graduate
$\beta_1'''$ = additional differential effect of being a female college graduate, beyond the sum $\beta_1' + \beta_1''$
36. Example: how can we test the hypothesis that gasoline spending differs between a new car and a used car?
Assume that at the starting mileage there is no difference between a used car and a new car (a common intercept).
Figure: gas spending (Y) against miles driven (X), both lines starting from the same intercept $\hat\beta_1$:
New car: $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$
Used car: $\hat{Y} = \hat\beta_1 + (\hat\beta_2 + \hat\beta_2') X$
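37. The estimated relations are:
Used car: $\hat{Y}_i = \hat\beta_1 + (\hat\beta_2 + \hat\beta_2' D) X_i$, where $D = 1$, i.e. $\hat{Y}_i = \hat\beta_1 + \hat\beta_2^* X_i$ with $\hat\beta_2^* = \hat\beta_2 + \hat\beta_2'$
New car: $\hat{Y}_i = \hat\beta_1 + \hat\beta_2 X_i$
If $\hat\beta_2' \neq 0$, the estimated slopes for the two types of car differ.
Let $\beta_2^* = \beta_2 + \beta_2' D$, where $D$ = 1 if used car, 0 otherwise.
Now in one model, with a multiplicative dummy variable:
$Y_i = \beta_1 + (\beta_2 + \beta_2' D) X_i + u_i$
$= \beta_1 + \beta_2 X_i + \beta_2' D X_i + u_i$
$= \beta_1 + \beta_2 X_i + \beta_2' Z_i + u_i$, where $Z_i = D X_i$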
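38. Test whether $\beta_2' = 0$:
(i) Compare two separate models: (a) new cars only, $\hat{Y} = \hat\beta_1 + \hat\beta_2 X$; (b) used cars only, the same specification estimated on the used-car sample.
(ii) Use a t-test on $\hat\beta_2'$ in $\hat{Y} = \hat\beta_1 + \hat\beta_2 X_i + \hat\beta_2' Z_i$:
$H_0: \beta_2' = 0$
$H_1: \beta_2' > 0$ (a used car spends more gasoline per mile): compare $t^*$ with $t_c(\alpha, n-3)$; if $t^* > t_c(\alpha, n-3)$, reject $H_0$.
or $H_1: \beta_2' \neq 0$: if $|t^*| > t_c(\alpha/2, n-3)$, reject $H_0$.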
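A minimal sketch of test (ii), assuming NumPy. The data are synthetic and not from the slides: an assumed common intercept of 5.0, an assumed slope of 0.10 per mile for new cars and an extra 0.05 per mile for used cars. The code builds the multiplicative dummy Z = D·X, fits by OLS, computes the usual standard errors $s^2 (X'X)^{-1}$ and forms $t^*$ for $\hat\beta_2'$. The critical value 1.97 is an approximation to the two-sided 5 percent value with n − 3 = 197 degrees of freedom.

```python
import numpy as np

# Synthetic data (NOT from the slides): common intercept 5.0,
# slope 0.10 per mile for new cars, 0.10 + 0.05 for used cars.
rng = np.random.default_rng(0)
n = 200
miles = rng.uniform(0, 100, n)       # X: miles driven
used = rng.integers(0, 2, n)         # D = 1 if used car, 0 if new
gas = 5.0 + 0.10 * miles + 0.05 * used * miles + rng.normal(0, 1, n)

# Multiplicative dummy Z = D * X; no intercept dummy (common intercept).
Z = used * miles
design = np.column_stack([np.ones(n), miles, Z])
coef, *_ = np.linalg.lstsq(design, gas, rcond=None)

# OLS standard errors: s^2 (X'X)^{-1}, with n - k degrees of freedom.
resid = gas - design @ coef
k = design.shape[1]
s2 = resid @ resid / (n - k)
se = np.sqrt(np.diag(s2 * np.linalg.inv(design.T @ design)))

t_star = coef[2] / se[2]
t_crit = 1.97  # approx. t_c(alpha/2 = 0.025, n - 3 = 197)
print(f"beta2' = {coef[2]:.4f}, t* = {t_star:.2f}, reject H0: {abs(t_star) > t_crit}")
```

For the one-sided alternative $H_1: \beta_2' > 0$, the same $t^*$ would instead be compared with $t_c(\alpha, n-3)$, about 1.65 at the 5 percent level.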
40. Shifts in both intercept and slope
$E = \beta_1 + \beta_2 T + u$
E: electricity consumption
T: temperature
To capture the effect of seasonal factors:
$E = \beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_1''' D_3 + \beta_2 T + u$
where
$D_1$ = 1 if winter, 0 otherwise
$D_2$ = 1 if spring, 0 otherwise
$D_3$ = 1 if summer, 0 otherwise
Fall is the control season.
Seasons and quarters: spring (Q1), summer (Q2), fall (Q3), winter (Q4).
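41. Figure: electricity consumption (E) against temperature (T), with four parallel lines whose intercepts differ from the fall intercept $\hat\beta_1$ by $\hat\beta_1'$ (winter), $\hat\beta_1''$ (spring) and $\hat\beta_1'''$ (summer).
Measuring the basic differences among the four seasonal results:
Fall: $\hat{E} = \hat\beta_1 + \hat\beta_2 T$
Spring: $\hat{E} = (\hat\beta_1 + \hat\beta_1'') + \hat\beta_2 T$
Winter: $\hat{E} = (\hat\beta_1 + \hat\beta_1') + \hat\beta_2 T$
Summer: $\hat{E} = (\hat\beta_1 + \hat\beta_1''') + \hat\beta_2 T$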
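42. Also allow the slope to differ across seasons:
Let $\beta_2^* = \beta_2 + \beta_2' D_1 + \beta_2'' D_2 + \beta_2''' D_3$.
The full general specification is then
$E = [\beta_1 + \beta_1' D_1 + \beta_1'' D_2 + \beta_1''' D_3] + \beta_2 T + \beta_2' D_1 T + \beta_2'' D_2 T + \beta_2''' D_3 T + u$
where $Z_1 = D_1 T$, $Z_2 = D_2 T$ and $Z_3 = D_3 T$ are the slope-dummy regressors.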
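To make the full specification concrete, here is a sketch of one row of its design matrix. `SEASON_DUMMIES` and `design_row` are hypothetical names. Each observation contributes the constant, the three intercept dummies, T, and the three slope dummies $Z_1$, $Z_2$, $Z_3$, with fall as the control season.

```python
# (D1 winter, D2 spring, D3 summer); fall is the control season.
SEASON_DUMMIES = {
    "winter": (1, 0, 0),
    "spring": (0, 1, 0),
    "summer": (0, 0, 1),
    "fall": (0, 0, 0),
}


def design_row(season, temperature):
    """Regressors [1, D1, D2, D3, T, Z1, Z2, Z3] with Zj = Dj * T (hypothetical helper)."""
    d1, d2, d3 = SEASON_DUMMIES[season]
    return [1, d1, d2, d3, temperature,
            d1 * temperature, d2 * temperature, d3 * temperature]


print(design_row("spring", 20.0))  # [1, 0, 1, 0, 20.0, 0.0, 20.0, 0.0]
```

Stacking such rows and regressing E on them by OLS gives the eight coefficients of the full specification.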
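43. Measuring the four seasonal results:
Fall: $\hat{E} = \hat\beta_1 + \hat\beta_2 T$
Spring: $\hat{E} = (\hat\beta_1 + \hat\beta_1'') + (\hat\beta_2 + \hat\beta_2'') T$
Winter: $\hat{E} = (\hat\beta_1 + \hat\beta_1') + (\hat\beta_2 + \hat\beta_2') T$
Summer: $\hat{E} = (\hat\beta_1 + \hat\beta_1''') + (\hat\beta_2 + \hat\beta_2''') T$
Figure: E against T with these four fitted lines. Their intercepts differ from the fall intercept $\hat\beta_1$ by $\hat\beta_1'$, $\hat\beta_1''$ and $\hat\beta_1'''$, and their slopes now differ as well.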
44. Quarterly effects are handled in the same way as seasonal effects:
$D_1$ = 1 in the 1st quarter, 0 otherwise
$D_2$ = 1 in the 2nd quarter, 0 otherwise
$D_3$ = 1 in the 3rd quarter, 0 otherwise
The control quarter is the 4th quarter.
45. 1. Set the seasonal dummy (dummy01) = 1 if the observation falls in the 1st quarter, = 0 otherwise.
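For a quarterly series, the three dummies can be generated from the position of each observation. This is a minimal sketch that assumes the series starts in the 1st quarter. `quarterly_dummies` is a hypothetical helper, and the 4th quarter is the control quarter coded (0, 0, 0).

```python
def quarterly_dummies(quarter):
    """(D1, D2, D3) for quarter 1-4; quarter 4 is the control (all zeros)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))


# Two years of quarterly data starting in Q1: observation t is in quarter (t % 4) + 1.
rows = [quarterly_dummies(t % 4 + 1) for t in range(8)]
for t, row in enumerate(rows):
    print(t, row)
```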
47. Basic model:
$Y_t = \beta_1 + \beta_2 X_t + u_t$
Sample period 1960-1989, with a possible break in 1974.
Define a dummy variable: $D$ = 1 for the period 1974 onward, 0 otherwise.
To test whether the structures of the two periods differ, the specification must assume that
$\beta_1^* = \beta_1 + \beta_1' D$
$\beta_2^* = \beta_2 + \beta_2' D$
Dummy regression:
$Y_t = \beta_1 + \beta_1' D + \beta_2 X_t + \beta_2' D X_t + u_t$   (2)
48. The Chow test on the unemployment rate-capacity utilization rate relationship
Dependent variable: $unempl_t$ (t-values in parentheses)
Sample (n)        Constant       CAP_t           R̄²     F      RSS
1960-1989 (30)    30.0 (12.1)    -0.293 (9.7)    0.761   93.6   17.15 = RSS_R
1960-1973 (14)    19.64 (5.9)    -0.175 (4.4)    0.59    19.7   4.69 = RSS_1
1974-1989 (16)    30.63 (13.1)   -0.296 (10.1)   0.871   102.1  3.29 = RSS_2
49. Restricted F-test procedure:
$H_0$: no structural change
$H_1$: structural change
For the unrestricted model:
$RSS_U = RSS_1 + RSS_2 = 4.69 + 3.29 = 7.98$
$F^* = \dfrac{(RSS_R - RSS_U)/k}{RSS_U/(T - 2k)} = \dfrac{(17.15 - 7.98)/2}{7.98/(30 - 4)} = 14.9$
Critical values: $F_c(0.01;\, k, T-2k) = F_c(0.01;\, 2, 26) = 5.53$ and $F_c(0.05;\, 2, 26) = 3.37$.
$F^* > F_c$, so reject $H_0$.
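The Chow statistic is a single formula in the three residual sums of squares. This sketch uses a hypothetical helper, `chow_f`, and plugs in the slide-48 values with n = T = 30 observations and k = 2 parameters per regression. The 1 percent critical value is the one quoted above, not computed.

```python
def chow_f(rss_r, rss_1, rss_2, n, k):
    """Chow F statistic: restricted RSS from the pooled regression,
    unrestricted RSS = RSS1 + RSS2 from the two sub-period regressions."""
    rss_u = rss_1 + rss_2
    return ((rss_r - rss_u) / k) / (rss_u / (n - 2 * k))


f_star = chow_f(rss_r=17.15, rss_1=4.69, rss_2=3.29, n=30, k=2)
F_CRIT_1PCT = 5.53  # F(0.01; 2, 26), as quoted on slide 49
print(round(f_star, 1), f_star > F_CRIT_1PCT)  # 14.9 True
```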
50. Using the dummy variable to identify the structural change
Sample: 1960-1989
$D_t$ = 1 for 1974 to 1989, 0 prior to 1974
$\widehat{unempl} = 19.6 + 11.0\, D_t - 0.175\, CAP_t - 0.121\, (D_t \cdot CAP_t)$
t-values: (6.7) (2.7) (5.0) (2.5)
$\bar{R}^2 = 0.88$, SEE = 0.554, F = 72.2, n = 30
The estimated relation for 1974-1989:
$\widehat{unempl} = (19.6 + 11.0) - (0.175 + 0.121)\, CAP = 30.6 - 0.296\, CAP$
The estimated relation for 1960-1973:
$\widehat{unempl} = 19.6 - 0.175\, CAP$
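Recovering each period's line from the dummy regression is just addition of the differential coefficients. This sketch reuses the slide-50 estimates. The results agree, up to rounding, with the separate sub-period regressions in the slide-48 table (19.64 − 0.175 CAP and 30.63 − 0.296 CAP).

```python
# Slide-50 estimates: unempl = b1 + a1*D + b2*CAP + a2*(D*CAP)
b1, a1, b2, a2 = 19.6, 11.0, -0.175, -0.121

before = (b1, b2)            # D = 0: 1960-1973
after = (b1 + a1, b2 + a2)   # D = 1: 1974 onward

print("1960-1973:", before)
print("1974 on:  ", tuple(round(v, 3) for v in after))
```

One advantage of the single dummy regression over the two separate fits is that it reports t-values for the differential intercept (2.7) and slope (2.5) directly.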
59. $\ln Y = \beta_1 + \beta_2 X + \beta_1' D + u$
Y = salary, X = years of teaching, D = 1 for male, 0 otherwise.
$\widehat{\ln Y} = 2.9298 + 0.0546 X + 0.1341 D$
t = (481.5) (48.3) (27.2)
$R^2 = 0.995$, DW = 2.51
exp(0.1341) = 1.1435 and exp(0) = 1, so the starting salary of a male teacher is higher than that of a female teacher by 14.35 percent.
The estimated male teacher salary:
$\widehat{\ln Y} = (2.9298 + 0.1341) + 0.0546 X = 3.0639 + 0.0546 X$
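The percentage in a semi-log dummy model comes from exponentiating the coefficient rather than reading 0.1341 directly as 13.41 percent. This short sketch uses only the standard library and the slide-59 estimates to reproduce the 1.1435 ratio, the 14.35 percent premium and the male intercept 3.0639.

```python
import math

b1, b2, a1 = 2.9298, 0.0546, 0.1341   # slide-59 estimates
ratio = math.exp(a1)                   # male / female salary ratio at the same X
pct = 100 * (ratio - 1)                # percentage premium for male teachers
male_intercept = b1 + a1               # intercept of the male line in ln Y
print(round(ratio, 4), round(pct, 2), round(male_intercept, 4))  # 1.1435 14.35 3.0639
```

Because the two lines are parallel in ln Y, the same 14.35 percent gap applies at every level of teaching experience, not only at the start.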