What is a dummy variable?
A qualitative variable usually indicates the presence or absence of a quality or
attribute, such as male or female, black or white, Democrat or Republican. If a
qualitative variable takes only two values, 0 and 1 (absence or presence), it is
called a dummy variable. Example: suppose the qualitative variable sex indicates
the presence or absence of an attribute such as male or female.
“1” may indicate that a person is male and “0” that a person is female.
Variables that take the values “0” and “1” are called dummy variables.
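As a small illustration of this 0/1 coding, here is a Python sketch that turns sex labels into a dummy (1 = male, 0 = female). The function name `sex_dummy` and the sample labels are hypothetical.

```python
def sex_dummy(sex_labels):
    """Return a 0/1 dummy for each label: 1 if "male", 0 if "female"."""
    dummies = []
    for label in sex_labels:
        if label == "male":
            dummies.append(1)
        elif label == "female":
            dummies.append(0)
        else:
            # Any other label means the coding scheme does not cover it
            raise ValueError(f"unexpected category: {label!r}")
    return dummies

print(sex_dummy(["male", "female", "female", "male"]))  # [1, 0, 0, 1]
```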

Alternative names for a dummy variable
-indicator variable
-binary variable
-qualitative variable
-categorical variable
-dichotomous variable

Explain dummy variables in terms of a regression model (ANOVA model).
Dummy variables can be used in regression models just as easily as quantitative
variables. In fact, a linear regression model may contain explanatory variables
that are exclusively dummy, or qualitative, in nature. Such models are called
analysis of variance (ANOVA) models.
Let us consider the following model:
$$Y_i = \alpha + \beta D_i + \mu_i \qquad \text{(i)}$$
where Yi = annual salary of a college professor,
$$D_i = \begin{cases} 1 & \text{if male professor} \\ 0 & \text{if female professor} \end{cases}$$
Di is called a dummy variable, and µi ~ NID(0, σ²).
Assuming E(µi) = 0, we get from equation (i):
Mean salary of a female professor:
$$E(Y_i \mid D_i = 0) = \alpha$$
Mean salary of a male professor:
$$E(Y_i \mid D_i = 1) = \alpha + \beta$$

Interpretation:
The intercept term α gives the mean salary of female college professors. The
slope coefficient β tells by how much the mean salary of a male professor differs
from the mean salary of his female counterpart, so α + β gives the mean salary of
male college professors.
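A quick numerical check of this interpretation: regressing salary on the dummy alone by OLS returns the female mean as the estimated α and the male-female difference as the estimated β. The salary figures below are made up for illustration, and `ols_simple` is a hypothetical helper using the standard simple-regression formulas.

```python
from statistics import mean

def ols_simple(y, x):
    """OLS estimates (a_hat, b_hat) for y = a + b*x + u."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    return y_bar - b_hat * x_bar, b_hat

# Hypothetical annual salaries (thousands); D = 1 for male, 0 for female
salary = [22, 18, 21, 19, 23, 17, 20, 18, 24, 18]
d      = [1,  0,  1,  0,  1,  0,  1,  0,  1,  0]

alpha_hat, beta_hat = ols_simple(salary, d)
female_mean = mean(s for s, di in zip(salary, d) if di == 0)
male_mean = mean(s for s, di in zip(salary, d) if di == 1)

print(alpha_hat, female_mean)             # alpha_hat equals the female mean
print(beta_hat, male_mean - female_mean)  # beta_hat equals the male-female gap
print(alpha_hat + beta_hat, male_mean)    # alpha_hat + beta_hat equals the male mean
```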

Write down the advantages of dummy variables.
1. Dummy variables are a data-classifying device: they divide a sample into
various subgroups based on qualities or attributes.
2. A caution: if a model has several qualitative variables, each with several
classes, introducing dummy variables can consume a large number of degrees of
freedom (d.f.).
3. Since dummy variables are non-stochastic, they create no special problems in
the application of OLS.

What is the dummy variable trap? How can you avoid the dummy variable trap?
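Let us consider the model
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + \mu_i \qquad \text{(i)}$$
where Yi is the annual salary of a college professor, Xi is the professor's years
of teaching experience, and
$$D_{2i} = \begin{cases} 1 & \text{if male professor} \\ 0 & \text{if female professor} \end{cases} \qquad D_{3i} = \begin{cases} 1 & \text{if female professor} \\ 0 & \text{if male professor} \end{cases}$$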
Model (i) cannot be estimated because of perfect collinearity among D2, D3 and the
intercept. To see this, suppose we have a sample of three male and two female
professors. The data and design matrix are:
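$$\begin{array}{l|c|cccc}
\text{Professor} & Y & \alpha_1 & D_2 & D_3 & X \\ \hline
\text{Male} & Y_1 & 1 & 1 & 0 & X_1 \\
\text{Male} & Y_2 & 1 & 1 & 0 & X_2 \\
\text{Female} & Y_3 & 1 & 0 & 1 & X_3 \\
\text{Male} & Y_4 & 1 & 1 & 0 & X_4 \\
\text{Female} & Y_5 & 1 & 0 & 1 & X_5
\end{array}$$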

The column of 1s corresponds to the common intercept term α1. We see that
D2 = 1 - D3 and D3 = 1 - D2,
so D2 + D3 equals the intercept column: D2, D3 and the intercept are perfectly
collinear, X'X is singular, and OLS cannot estimate all the coefficients. To avoid
perfect collinearity, the general rule is: if a qualitative variable has m
categories, introduce only (m - 1) dummy variables in a model with an intercept.
If this rule is not followed, we fall into the dummy variable trap.
Another way to avoid the dummy variable trap is to keep both dummies and drop the
intercept term α1:
$$Y_i = \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + \mu_i$$
Without the intercept there is no longer perfect collinearity, so we do not fall
into the dummy variable trap. In this form α2 and α3 are the intercepts for male
and female professors, respectively.
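The trap can be checked numerically: with the intercept, D2 and D3 all included, the design matrix has 4 columns but rank 3, so X'X is singular; dropping either D3 (the m - 1 rule) or the intercept restores full column rank. This sketch uses the five-professor layout (male, male, female, male, female) with made-up experience values for X1..X5; `matrix_rank` is a hypothetical helper doing exact Gaussian elimination with `fractions.Fraction`.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a nonzero pivot at or below the current rank row
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

sex = ["male", "male", "female", "male", "female"]
experience = [10, 4, 7, 12, 3]  # hypothetical X1..X5

d2 = [1 if s == "male" else 0 for s in sex]
d3 = [1 if s == "female" else 0 for s in sex]

# Columns: intercept, D2, D3, X -> falls into the dummy variable trap
with_trap = [[1, a, b, x] for a, b, x in zip(d2, d3, experience)]
# m - 1 rule: keep the intercept, drop D3
drop_d3 = [[1, a, x] for a, x in zip(d2, experience)]
# Alternative: keep both dummies, drop the intercept
no_intercept = [[a, b, x] for a, b, x in zip(d2, d3, experience)]

for name, X in [("intercept+D2+D3+X", with_trap),
                ("intercept+D2+X", drop_d3),
                ("D2+D3+X", no_intercept)]:
    print(f"{name}: columns={len(X[0])}, rank={matrix_rank(X)}")
```

Since rank(X'X) = rank(X), a rank below the number of columns is exactly the singularity that makes OLS fail.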

Comparing two regressions: the dummy variable approach
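Let us pool all n1 and n2 observations together and estimate the following
regression:
$$Y_i = \alpha_1 + \alpha_2 D_i + \beta_1 X_i + \beta_2 (D_i X_i) + \mu_i \qquad \text{(i)}$$
where Yi and Xi are savings and income, and
$$D_i = \begin{cases} 1 & \text{for observations in the first (reconstruction) period} \\ 0 & \text{for observations in the next (post-reconstruction) period} \end{cases}$$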

To see the implications of model (i), assume E(µi) = 0. Then
$$E(Y_i \mid D_i = 0, X_i) = \alpha_1 + \beta_1 X_i \qquad \text{(ii)}$$
$$E(Y_i \mid D_i = 1, X_i) = (\alpha_1 + \alpha_2) + (\beta_1 + \beta_2) X_i \qquad \text{(iii)}$$
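Let
$$\alpha_1 = \gamma_1, \quad \beta_1 = \gamma_2, \quad \alpha_1 + \alpha_2 = \lambda_1, \quad \beta_1 + \beta_2 = \lambda_2$$
Then equations (ii) and (iii) become
$$E(Y_i \mid D_i = 0, X_i) = \gamma_1 + \gamma_2 X_i \qquad \text{(iv)}$$
$$E(Y_i \mid D_i = 1, X_i) = \lambda_1 + \lambda_2 X_i \qquad \text{(v)}$$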
Therefore, estimating equation (i) is equivalent to estimating the two individual
regressions, one for the reconstruction period (Di = 1) and one for the
post-reconstruction period (Di = 0). In equation (i), α2 is the differential
intercept and β2 is the differential slope coefficient.
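The equivalence can be verified on a small example: fitting the pooled regression (i) and fitting each period separately give α1 = γ1, β1 = γ2, α1 + α2 = λ1 and β1 + β2 = λ2 exactly. This is a sketch with made-up savings and income figures; `solve` and `ols` are hypothetical helpers that solve the normal equations with exact fractions so the equality holds without rounding error.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination (exact)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        piv = m[col][col]
        m[col] = [v / piv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical income (X) and savings (Y) in the two periods
x_recon, y_recon = [1, 2, 3, 4, 5], [2, 3, 5, 6, 9]    # D = 1
x_post, y_post = [2, 4, 6, 8, 10], [1, 4, 4, 7, 9]     # D = 0

# Separate regressions: (lambda1, lambda2) and (gamma1, gamma2)
lam1, lam2 = ols([[1, x] for x in x_recon], y_recon)
gam1, gam2 = ols([[1, x] for x in x_post], y_post)

# Pooled regression (i): columns 1, D, X, D*X
X_pooled = [[1, 1, x, x] for x in x_recon] + [[1, 0, x, 0] for x in x_post]
a1, a2, b1, b2 = ols(X_pooled, y_recon + y_post)

print("gamma1, gamma2:", float(gam1), float(gam2), "| alpha1, beta1:", float(a1), float(b1))
print("lambda1, lambda2:", float(lam1), float(lam2),
      "| alpha1+alpha2, beta1+beta2:", float(a1 + a2), float(b1 + b2))
```

The pooled form is convenient because α2 and β2 appear directly as coefficients, so their significance can be tested in a single regression.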

Find out whether the aggregate savings-income relationship has changed between
the two periods.
Consider two linear regression models. Reconstruction period:
$$Y_i = \lambda_1 + \lambda_2 X_i + \mu_{1i}, \qquad i = 1, 2, \ldots, n_1 \qquad \text{(i)}$$
Post-reconstruction period:
$$Y_i = \gamma_1 + \gamma_2 X_i + \mu_{2i}, \qquad i = 1, 2, \ldots, n_2 \qquad \text{(ii)}$$
where Yi = savings, Xi = income, and µ1i and µ2i are the disturbance terms in the
two regressions.
Regressions (i) and (ii) present the following four possibilities:
1) If λ1 = γ1 and λ2 = γ2, the two regressions are identical; these are called
coincident regressions.
[Figure (a) Coincident regressions: savings Y against income X, a single line with λ1 = γ1 and λ2 = γ2]

2) If λ1 ≠ γ1 and λ2 = γ2, the two regressions differ only in their location
(intercept); these are called parallel regressions.

[Figure (b) Parallel regressions: Y against X, two lines with the same slope λ2 = γ2 and different intercepts λ1 and γ1]
3) If λ1 = γ1 and λ2 ≠ γ2, the two regressions have the same intercept but
different slopes; these are called concurrent regressions.

[Figure (c) Concurrent regressions: Y against X, two lines with the same intercept λ1 = γ1 and different slopes λ2 and γ2]

4) If λ1 ≠ γ1 and λ2 ≠ γ2, the two regressions are completely different; these are
called dissimilar regressions.
[Figure (d) Dissimilar regressions: Y against X, two lines with different intercepts (λ1, γ1) and different slopes (λ2, γ2)]
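The four cases can be summarised as a simple decision rule on intercepts and slopes. This is only a sketch that compares given parameter values directly (with a small tolerance); in practice the equalities are judged statistically, for example by testing whether the differential intercept α2 = λ1 - γ1 and the differential slope β2 = λ2 - γ2 in the pooled regression are significantly different from zero. The function name `classify_regressions` is hypothetical.

```python
import math

def classify_regressions(lam1, lam2, gam1, gam2, tol=1e-9):
    """Classify the lines Y = lam1 + lam2*X and Y = gam1 + gam2*X.

    In the pooled dummy model, lam1 - gam1 = alpha2 (differential intercept)
    and lam2 - gam2 = beta2 (differential slope).
    """
    same_intercept = math.isclose(lam1, gam1, abs_tol=tol)
    same_slope = math.isclose(lam2, gam2, abs_tol=tol)
    if same_intercept and same_slope:
        return "coincident"
    if same_slope:
        return "parallel"
    if same_intercept:
        return "concurrent"
    return "dissimilar"

print(classify_regressions(1.0, 0.5, 2.0, 0.5))  # parallel
```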

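Question: Suppose the college professor salary regression model is defined as
$$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + \mu_i$$
where Yi = annual salary of a college professor, Xi = years of experience, and
$$D_2 = \begin{cases} 1 & \text{if male professor} \\ 0 & \text{if female professor} \end{cases} \qquad D_3 = \begin{cases} 1 & \text{if white professor} \\ 0 & \text{otherwise} \end{cases}$$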
Explain the terms (i) α2, (ii) α4 and (iii) D2iD3i.
(iv) What is the mean salary of a female, non-white professor?
(v) Find E(Yi | D2 = 1, D3 = 1, Xi = 10) and interpret it.
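Solution:
1. α2 is the differential intercept for being a male professor. Because of the
interaction term, it measures the male-female difference in mean salary among
non-white professors (D3 = 0), holding Xi fixed.
2. α4 is the differential effect of being a male, white professor: the part of the
mean salary of white male professors not captured by the separate male (α2) and
white (α3) effects.
3. D2iD3i is the interaction between the two qualitative variables D2 and D3.
Without it, the model assumes that the differential effect of sex is the same for
white and non-white professors. That assumption may be untenable: for example,
non-white professors may have lower mean salaries whether they are male or
female, but a female non-white professor may earn even less than a male non-white
professor. The interaction term allows for such effects.
4. For a female, non-white professor (the benchmark category):
$$E(Y_i \mid D_{2i} = 0, D_{3i} = 0, X_i) = \alpha_1 + \beta X_i$$
So her mean salary depends only on the intercept α1 and on years of experience
through the coefficient β.
5. $$E(Y_i \mid D_2 = 1, D_3 = 1, X_i = 10) = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 + 10\beta$$
This is the mean salary of a white male professor with 10 years of experience.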
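To make the interaction model concrete, the sketch below evaluates E(Yi | D2, D3, Xi) for all four sex and race groups at Xi = 10 and shows that α4 is the "difference in differences": the male-female gap among white professors minus the same gap among non-white professors. The coefficient values and the function name `mean_salary` are hypothetical, chosen only for illustration.

```python
def mean_salary(a1, a2, a3, a4, beta, d2, d3, x):
    """E(Y | D2, D3, X) = a1 + a2*D2 + a3*D3 + a4*(D2*D3) + beta*X."""
    return a1 + a2 * d2 + a3 * d3 + a4 * (d2 * d3) + beta * x

# Hypothetical coefficient values (thousands), for illustration only
coef = {"a1": 30, "a2": 4, "a3": 3, "a4": 2, "beta": 1}

for d2, d3, label in [(0, 0, "female, non-white"), (1, 0, "male, non-white"),
                      (0, 1, "female, white"), (1, 1, "male, white")]:
    print(f"{label:18s} X=10:", mean_salary(**coef, d2=d2, d3=d3, x=10))

# The interaction coefficient is the difference in male-female gaps across races
gap_white = mean_salary(**coef, d2=1, d3=1, x=10) - mean_salary(**coef, d2=0, d3=1, x=10)
gap_nonwhite = mean_salary(**coef, d2=1, d3=0, x=10) - mean_salary(**coef, d2=0, d3=0, x=10)
print("a4 recovered:", gap_white - gap_nonwhite)  # a4 recovered: 2
```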
