
Methods of Economic Research
Lecture 14
Dummy Variables
Presentation of regression results

3/2/2011 1

$y_i = \beta_0 + \beta_1 x_i + \beta_2 d_{1i} + u_i, \qquad i = 1, \ldots, n$

[Figure: Teacher’s pay against years of teaching experience, x. Two parallel lines, each with slope β1: starting salary for males = β0 + β2; starting salary for females = β0.]

• 2 categories/groups: female and male.
• Female is chosen as the base group (benchmark group): if observation i is female, d1i = 0; if male, d1i = 1.
• The model has 2 intercepts: β0 (female) and β0 + β2 (male).
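To see the two intercepts in action, here is a minimal Python sketch that simulates hypothetical teacher pay data (all numbers, including the assumed "true" values of β0, β1 and β2, are invented for illustration) and fits the model by OLS with NumPy. The fitted intercept is the female (base group) starting salary; adding the dummy coefficient gives the male starting salary; both groups share the slope β1.

```python
import numpy as np

# Hypothetical simulated data (NOT from the lecture): x = years of teaching
# experience, d1 = 1 for male and 0 for female (female is the base group).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 30.0, n)
d1 = rng.integers(0, 2, n)
true_b0, true_b1, true_b2 = 20_000.0, 800.0, 1_500.0   # assumed "true" parameters
y = true_b0 + true_b1 * x + true_b2 * d1 + rng.normal(0.0, 1_000.0, n)

# OLS of y on a constant, x and d1
X = np.column_stack([np.ones(n), x, d1])
(b0, b1, b2), *_ = np.linalg.lstsq(X, y, rcond=None)

female_start = b0          # intercept of the base group
male_start = b0 + b2       # intercept shifted by the dummy coefficient
print(f"slope (both groups): {b1:.1f}")
print(f"starting salary, female: {female_start:.0f}; male: {male_start:.0f}")
```

The estimated b2 should land close to the assumed 1,500, i.e. the vertical gap between the two parallel lines.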

Dummy Variables
(number of categories > 2)
• We may have a group of dummy variables where the number of categories is greater than two, e.g.:
 - 4 seasons: Spring, Summer, Autumn, Winter
 - 4 age groups: 16-25, 26-40, 41-55, 56-64 years
 - different locations: urban, semi-rural, rural

A General Principle for Including Dummies to Indicate Different Groups
• If the model is to have different intercepts for N categories, N - 1 dummy variables are needed.

• The dummy variable coefficient (e.g. β2) for a particular group (male group, d1i = 1) represents the estimated difference in intercepts between that group (male) and the base group (female, d1i = 0).

• The intercept (β0) for the base group is the overall intercept for the model.
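A minimal sketch of the N - 1 rule in plain Python: the hypothetical helper `make_dummies` creates one 0/1 column for every category except the chosen base group, using the location example (urban, semi-rural, rural) from earlier in the lecture. A row of all zeros identifies an observation from the base group.

```python
def make_dummies(labels, base):
    """Return (included_categories, rows) with N - 1 dummy columns.

    The base group gets no column of its own: its rows are all zeros.
    """
    categories = sorted(set(labels))
    if base not in categories:
        raise ValueError(f"base group {base!r} not found in the data")
    included = [c for c in categories if c != base]
    rows = [[1 if lab == c else 0 for c in included] for lab in labels]
    return included, rows


labels = ["urban", "rural", "semi-rural", "urban", "rural"]
names, rows = make_dummies(labels, base="rural")
print(names)                      # ['semi-rural', 'urban']
for lab, row in zip(labels, rows):
    print(lab, row)
```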

An example: Dummy Variables
(number of categories > 2)
Assume we have quarterly data containing the following variables:
Aggregate consumption of beer (y)
Personal disposable income (x)

We want to use the seasons together with
income x to explain the variations in y.


Aggregate consumption of beer (y) on personal disposable income (x), quarterly data

Quarter   Period            y (pints)   x (£)
1         Jan – Mar 2003
2         Apr – Jun 2003
3         Jul – Sep 2003
4         Oct – Dec 2003
1         Jan – Mar 2004
:         :
d1t = 1 if observation t is from the first quarter (JFM), 0 otherwise.
d2t = 1 if observation t is from the second quarter (AMJ), 0 otherwise.
d3t = 1 if observation t is from the third quarter (JAS), 0 otherwise.
d4t = 1 if observation t is from the fourth quarter (OND), 0 otherwise.
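The four definitions can be sketched in Python as follows (the helper name `quarter_dummies` is hypothetical). It assumes the sample starts in quarter 1, as in the table, and builds one row [d1t, d2t, d3t, d4t] per observation.

```python
def quarter_dummies(quarters):
    """Map each quarter number (1-4) to its row [d1t, d2t, d3t, d4t]."""
    if any(q not in (1, 2, 3, 4) for q in quarters):
        raise ValueError("quarters must be 1, 2, 3 or 4")
    return [[1 if q == j else 0 for j in (1, 2, 3, 4)] for q in quarters]


# Sample starting in quarter 1 (Jan - Mar 2003): quarters 1, 2, 3, 4, 1, 2, ...
quarters = [t % 4 + 1 for t in range(10)]
D = quarter_dummies(quarters)
for t, row in enumerate(D, start=1):
    print(t, *row)
```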


Can we use 4 dummies to indicate 4 seasons?
$y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{1t} + \beta_3 d_{2t} + \beta_4 d_{3t} + \beta_5 d_{4t} + u_t$

NO! We cannot estimate this model.

Let’s consider the four dummy variables and
assume that the first observation in the sample is
from the first quarter. The values taken by the four
dummy variables are shown in the following table.


t    d1t  d2t  d3t  d4t  Σi dit
1    1    0    0    0    1
2    0    1    0    0    1
3    0    0    1    0    1
4    0    0    0    1    1
5    1    0    0    0    1
6    0    1    0    0    1
7    0    0    1    0    1
8    0    0    0    1    1
9    1    0    0    0    1
10   0    1    0    0    1
:    :    :    :    :    :

d1t + d2t + d3t + d4t = 1 for every t
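The identity can be checked numerically. This NumPy sketch uses a hypothetical income series over 12 quarters; it stacks a constant, x and all four dummies, and shows that the 6-column matrix has rank 5, because the constant column equals d1t + d2t + d3t + d4t.

```python
import numpy as np

n = 12                                  # three years of quarterly data (hypothetical)
quarter = np.arange(n) % 4              # 0 = Q1, 1 = Q2, 2 = Q3, 3 = Q4
D = np.eye(4)[quarter]                  # columns d1t, d2t, d3t, d4t
x = np.linspace(100.0, 210.0, n)        # hypothetical income series

# Constant + x + all four dummies (the dummy variable trap)
X_trap = np.column_stack([np.ones(n), x, D])

print(np.allclose(D.sum(axis=1), 1.0))                  # each row of dummies sums to 1
print(X_trap.shape[1], np.linalg.matrix_rank(X_trap))   # 6 columns but rank 5
```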

We cannot estimate this model because:
• An assumption of the classical linear regression model is that there is NO EXACT LINEAR RELATIONSHIP among any of the independent variables in the model.
• This assumption is necessary for estimating the parameters of the model.
• If there is an exact linear relationship, it is impossible to disentangle the separate influences of the different explanatory variables. Here, the constant term (a column of ones) is exactly equal to d1t + d2t + d3t + d4t.


The Dummy Variable Trap
• Using N dummies to indicate N groups, in a model that also contains an intercept, introduces perfect multicollinearity.
• This is known as the “dummy variable trap”: too many dummies are used to describe a given number of groups.


• The solution to the problem is to omit one category.
• It does not matter which one is omitted. The omitted category becomes the base group.
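Continuing the same hypothetical setup (redefined here so the block runs on its own), dropping any one of the four dummies gives a design matrix of full column rank, so the model can be estimated whichever category is chosen as the base.

```python
import numpy as np

n = 12                                  # hypothetical quarterly sample
quarter = np.arange(n) % 4              # 0 = Q1, ..., 3 = Q4
D = np.eye(4)[quarter]                  # d1t, d2t, d3t, d4t
x = np.linspace(100.0, 210.0, n)        # hypothetical income series

# Omit d1t: quarter 1 becomes the base group
X_ok = np.column_stack([np.ones(n), x, D[:, 1:]])
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))   # 5 columns, rank 5

# Omitting any other single dummy also gives full column rank
full_rank = []
for omit in range(4):
    keep = [j for j in range(4) if j != omit]
    X_alt = np.column_stack([np.ones(n), x, D[:, keep]])
    full_rank.append(np.linalg.matrix_rank(X_alt) == X_alt.shape[1])
print(full_rank)
```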


• Omit d1t; the model becomes:
$y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$

• Because we have omitted d1t, quarter 1 becomes the “base quarter”.
• β0 in the model is the intercept for quarter 1, i.e. the overall intercept of the model.
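With Q1 as the base quarter, each quarter's intercept is β0 plus that quarter's dummy coefficient, while the slope on income is the same in every quarter. The coefficient values below are invented for illustration only, and `fitted_consumption` is a hypothetical helper.

```python
# Hypothetical estimates, for illustration only (not results from the lecture)
b0, b1, b2, b3, b4 = 120.0, 0.05, 8.0, 15.0, 30.0


def fitted_consumption(x, quarter):
    """Fitted y for income x in a given quarter (1-4); quarter 1 is the base."""
    d2 = 1 if quarter == 2 else 0
    d3 = 1 if quarter == 3 else 0
    d4 = 1 if quarter == 4 else 0
    return b0 + b1 * x + b2 * d2 + b3 * d3 + b4 * d4


# Intercepts: set x = 0 in each quarter
intercepts = {q: fitted_consumption(0.0, q) for q in (1, 2, 3, 4)}
print(intercepts)   # {1: 120.0, 2: 128.0, 3: 135.0, 4: 150.0}
```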


[Figures: aggregate consumption of beer (pints), y, plotted against aggregate personal disposable income, x, for the model $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$. Each quarter has its own parallel regression line with slope β1:
• Quarter 1 (d2t = d3t = d4t = 0): intercept β0, the overall intercept of the model.
• Quarter 2 (d2t = 1, d3t = d4t = 0): intercept β0 + β2.
• Quarter 3 (d3t = 1, d2t = d4t = 0): intercept β0 + β3.
• Quarter 4 (d4t = 1, d2t = d3t = 0): intercept β0 + β4.]

Dummy variables – interpreting the results
Example of how to interpret our results:
The aggregate consumption of beer in the fourth quarter (OND) is estimated to be β̂4 pints higher (if β̂4 > 0) or lower (if β̂4 < 0) than in the first quarter (JFM), ceteris paribus (everything else remaining the same).


Dummy variables – interpreting the results
• It does not matter which of the four
dummies you leave out. This will affect the
parameter estimates, but not the position of
the regression lines. (Try to verify this in
seminar 7)
• You always compare the estimates of the
coefficients on the dummies you include with
the omitted category.
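A sketch for checking the claim on simulated data (hypothetical data-generating process, NumPy least squares; `fit` is a hypothetical helper): the model is estimated twice, once with Q1 as the base and once with Q4 as the base. The coefficients change, but the fitted values, and therefore the regression lines, are the same.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
quarter = np.arange(n) % 4                      # 0 = Q1, ..., 3 = Q4
D = np.eye(4)[quarter]                          # d1t, d2t, d3t, d4t
x = rng.uniform(100.0, 200.0, n)                # hypothetical income
# Hypothetical process: seasonal shifts of 0, 5, 12, 25 relative to Q1
y = 50.0 + 0.4 * x + D @ np.array([0.0, 5.0, 12.0, 25.0]) + rng.normal(0.0, 2.0, n)


def fit(base):
    """OLS of y on a constant, x and the three dummies other than `base`."""
    keep = [j for j in range(4) if j != base]
    X = np.column_stack([np.ones(n), x, D[:, keep]])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, X @ b


b_q1, fitted_q1 = fit(base=0)   # Q1 omitted: [const, x, d2, d3, d4]
b_q4, fitted_q4 = fit(base=3)   # Q4 omitted: [const, x, d1, d2, d3]

print(b_q1.round(3))
print(b_q4.round(3))
print(np.allclose(fitted_q1, fitted_q4))   # same regression lines
```

For instance, the intercept with Q4 as base equals β̂0 + β̂4 from the Q1-base model, and the coefficient on d1t equals -β̂4.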



We can test for the joint significance of the
“season” dummies using an F-test of:
$H_0: \beta_2 = \beta_3 = \beta_4 = 0$

$H_1$: $H_0$ is not true.
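One standard way to compute this F-statistic compares the residual sums of squares of the unrestricted model (with d2t, d3t, d4t) and the restricted model (without them): F = [(RSS_R - RSS_U)/q] / [RSS_U/(n - k - 1)], with q = 3 restrictions and k explanatory variables in the unrestricted model. The sketch below applies it to simulated data (hypothetical parameters); the result is then compared with a critical value from F tables with (q, n - k - 1) degrees of freedom.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)


rng = np.random.default_rng(2)
n = 40
quarter = np.arange(n) % 4
D = np.eye(4)[quarter]
x = rng.uniform(100.0, 200.0, n)                       # hypothetical income
y = 50.0 + 0.4 * x + D @ np.array([0.0, 5.0, 12.0, 25.0]) + rng.normal(0.0, 2.0, n)

X_u = np.column_stack([np.ones(n), x, D[:, 1:]])   # unrestricted: x, d2, d3, d4
X_r = np.column_stack([np.ones(n), x])             # restricted: beta2 = beta3 = beta4 = 0
q = 3                                              # number of restrictions
k = X_u.shape[1] - 1                               # explanatory variables in unrestricted model

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
F = ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))
print(f"F = {F:.2f} with ({q}, {n - k - 1}) degrees of freedom")
```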


Important rule

With a set of dummy variables indicating category
(e.g. season, social class, occupational
class, marital status, region, postcode), always
omit one of them from the model to avoid the
problem of perfect multicollinearity (the “dummy
variable trap”).
The problem of perfect multicollinearity arises because there is a perfect linear relationship between the variables on the right-hand side of the model (including the constant).

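It is incorrect to say that “the dummy variables are perfectly correlated with each other”; this is a common error. No pair of dummies is perfectly correlated; the exact linear relationship involves all of them together with the constant (d1t + d2t + d3t + d4t = 1).
The interpretations of coefficients on the included dummies are made in comparison to the omitted one.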


Presentation of Regression results

When presenting regression results in a
document, there are two possibilities. You can
present a table similar to the results table in
PASW.
Coefficients(a)
                 Unstandardized        Standardized
                 Coefficients          Coefficients                  95% Confidence Interval for B
Model            B         Std. Error  Beta          t       Sig.    Lower Bound    Upper Bound
1  (Constant)    18.500    9.086                     2.036   .088    -3.732         40.732
   X             .140**    .030        .882          4.590   .004    .065           .215
a. Dependent Variable: Y


• Alternatively, you can present an equation, with
standard errors and t-statistics appearing in
brackets underneath the coefficients.
• When there are just a few variables, this second
method is more appropriate.
$$
\begin{array}{lcrcl}
\hat{Y} & = & 18.500 & + & 0.140^{**}\,X \\
(\text{se}) & & (9.086) & & (0.030) \\
(t) & & (2.036) & & (4.590)
\end{array}
$$
** Indicates strong significance (p-value < 0.01)
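The equation layout can also be produced programmatically. A small Python sketch, with hypothetical helpers `stars` and `format_equation` (not part of PASW), places the standard errors and t-statistics in brackets under each coefficient and adds stars using the p-value thresholds of this lecture; the numbers are those from the PASW table.

```python
def stars(p):
    """Stars as in this lecture: ** if p < 0.01, * if p < 0.05, none otherwise."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


def format_equation(dep, terms, label_width=10, col_width=14):
    """Lay out a fitted equation with (se) and (t) rows under each coefficient.

    terms: list of (name, coef, se, t, p); use name "" for the constant.
    """
    rows = [f"{dep}-hat =".ljust(label_width), "(se)".ljust(label_width),
            "(t)".ljust(label_width)]
    for i, (name, b, se, t, p) in enumerate(terms):
        if i == 0:
            head, pad = f"{b:.3f}", ""
        else:
            head = f"{'-' if b < 0 else '+'} {abs(b):.3f}"
            pad = "  "          # align brackets under the number, not the sign
        head = f"{head}{stars(p)} {name}".rstrip()
        rows[0] += head.ljust(col_width)
        rows[1] += f"{pad}({se:.3f})".ljust(col_width)
        rows[2] += f"{pad}({t:.3f})".ljust(col_width)
    return "\n".join(r.rstrip() for r in rows)


# Numbers from the PASW coefficients table
out = format_equation("Y", [("", 18.500, 9.086, 2.036, 0.088),
                            ("X", 0.140, 0.030, 4.590, 0.004)])
print(out)
```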


You can also edit the results table in PASW to add the stars (**) where appropriate.


p-value of a hypothesis test
When we conduct a t-test, we look at the t-statistic (t-ratio) and compare it with a critical value from the t tables with df = n - k - 1 degrees of freedom (n observations, k explanatory variables). This tells us whether or not to reject the null hypothesis.
If we reject H0, it is also useful to know how strong the evidence against H0 is.
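A minimal sketch of this procedure for the constant in the PASW table (H0: β0 = 0). The table does not report n; the sketch ASSUMES n = 8 and k = 1, so df = 6, which is consistent with the reported 95% confidence interval (18.500 ± 2.447 × 9.086 ≈ [-3.73, 40.73]). The critical value 2.447 is the two-sided 5% value of the t distribution with 6 degrees of freedom.

```python
# Constant term from the PASW table: B = 18.500, Std. Error = 9.086
b, se = 18.500, 9.086

# ASSUMPTION: n = 8 observations and k = 1 explanatory variable (X).
# The table does not report n; this choice reproduces its 95% confidence interval.
n, k = 8, 1
df = n - k - 1                  # 6 degrees of freedom

t_ratio = b / se                # H0: beta0 = 0 against H1: beta0 != 0
t_crit = 2.447                  # two-sided 5% critical value of t with 6 df (t tables)

reject = abs(t_ratio) > t_crit
print(f"t = {t_ratio:.3f}, df = {df}, critical value = {t_crit}, reject H0: {reject}")

# Cross-check: the 95% confidence interval b -/+ t_crit * se
print(round(b - t_crit * se, 3), round(b + t_crit * se, 3))
```

Here |t| = 2.036 < 2.447, so H0 is not rejected at the 5% level, in line with the reported Sig. of .088.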


Definition: The p-value of a test is the probability, if H0 is true, of obtaining a test statistic at least as extreme as the one we have actually obtained.
The smaller the p-value, the stronger the evidence against H0.


• If p-value < 0.01, there is strong evidence against H0 (**).
• If 0.01 ≤ p-value < 0.05, there is evidence against H0 (*).
• If 0.05 ≤ p-value < 0.10, there is mild evidence against H0.
• If p-value ≥ 0.10, we do not have evidence to reject H0.
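To connect the definition with the thresholds, this standard-library sketch computes a two-sided p-value for a t-statistic by numerically integrating the Student t density (Simpson's rule) and then applies the evidence categories above. The helper names are hypothetical, and df = 6 is the same assumption as in the t-test example (n = 8, k = 1); the t-ratios are those in the PASW table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=10_000):
    """P(|T| >= |t|) under H0, via Simpson's rule on [0, |t|] (steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    s = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = s * h / 3              # P(0 <= T <= |t|)
    return 1.0 - 2.0 * area       # the t density is symmetric about 0


def evidence(p):
    if p < 0.01:
        return "strong evidence against H0 (**)"
    if p < 0.05:
        return "evidence against H0 (*)"
    if p < 0.10:
        return "mild evidence against H0"
    return "no evidence to reject H0"


for name, t in [("(Constant)", 2.036), ("X", 4.590)]:
    p = two_sided_p_value(t, df=6)
    print(f"{name}: p = {p:.3f} -> {evidence(p)}")
```

If SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` gives the same two-sided p-value directly.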

Conclusions from hypothesis tests – correct wording

Correct                                       Incorrect
X has a significant effect on Y.              β1 has a significant effect on Y.
β̂1 is significantly different from zero.      β1 is significantly different from zero.
β̂1 is significantly positive.                 β1 is positive.

Further point: say “there is evidence that …”
E.g., remember to say:
• There is evidence that food is a normal good.
Don’t say: Food is a normal good.
• When the p-value > 0.10, say: we do not have enough evidence to reject H0; or, we do not reject H0.
Don’t say: We accept H0.
