Dummy variable model

Bule Hora University
College of Business and Economics
Department of Economics
Course Title: Econometrics Theory and Application
Course Code: MLSCM 2033
Credit Hr: 3 ECTC: 5
Aschalew Shiferaw

Dummy Variable Model
• In regression analysis, the dependent variable may also be influenced by variables that are qualitative in nature (in addition to quantitative variables).
• Such variables include sex, marital status, job category, region, season, etc.
• We quantify such variables by artificially assigning values to them (for example, assigning 0 and 1 to sex, where 0 indicates male and 1 indicates female) and then including them in the regression equation together with the other independent variables. Such variables are called dummy variables.

• ANOVA model
• This model involves only dummy variables as explanatory variables.
• Example: Consider the following model:
$$Y_i = \alpha_1 + \alpha_2 D_i + u_i \qquad \ldots (1)$$
where $Y_i$ = annual starting salary of an employee and
$$D_i = \begin{cases} 1 & \text{for male} \\ 0 & \text{for female} \end{cases}$$

• Under the usual assumptions of the CLRM, the mean salary for a female employee is:
$$E(Y_i \mid D_i = 0) = E(\alpha_1 + \alpha_2(0) + u_i) = \alpha_1 + E(u_i) = \alpha_1$$
and the mean salary for a male employee is:
$$E(Y_i \mid D_i = 1) = E(\alpha_1 + \alpha_2(1) + u_i) = \alpha_1 + \alpha_2 + E(u_i) = \alpha_1 + \alpha_2$$
since $E(u_i) = 0$.

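As a minimal sketch of the ANOVA model, the snippet below fits a regression on a single dummy by ordinary least squares. The salaries and variable names are made up for illustration and are not data from the course. Under these assumptions, the estimated intercept should equal the mean salary of the base category (female) and the intercept plus the dummy coefficient should equal the mean salary of males.

```python
from statistics import mean

# Hypothetical starting salaries (illustrative values only)
salary = [21.0, 23.0, 25.0, 18.0, 19.0, 20.0]
male = [1, 1, 1, 0, 0, 0]  # D_i = 1 for male, 0 for female


def ols_simple(x, y):
    """OLS intercept and slope for y = a1 + a2 * x + u."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


alpha1_hat, alpha2_hat = ols_simple(male, salary)
mean_female = mean(s for s, d in zip(salary, male) if d == 0)
mean_male = mean(s for s, d in zip(salary, male) if d == 1)

print(alpha1_hat, mean_female)             # intercept vs. female mean
print(alpha1_hat + alpha2_hat, mean_male)  # intercept + dummy coefficient vs. male mean
```

The same logic carries over to any two-category dummy: the coefficient on the dummy measures the difference between the two group means.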
• ANCOVA models
• Regression models in most economic research involve quantitative explanatory variables in addition to dummy variables. Such models are known as analysis of covariance (ANCOVA) models.
• Example: Consider the following model:
$$Y_i = \alpha_1 + \alpha_2 D_i + \alpha_3 X_i + u_i \qquad \ldots (2)$$
• where $Y_i$ = annual starting salary of an employee, $X_i$ = years of work experience, and
$$D_i = \begin{cases} 1 & \text{for male} \\ 0 & \text{for female} \end{cases}$$

• Assuming that $E(u_i) = 0$, we can see that the mean salary of a female employee is:
$$E(Y_i \mid D_i = 0) = E(\alpha_1 + \alpha_2(0) + \alpha_3 X_i + u_i) = \alpha_1 + \alpha_3 X_i + E(u_i) = \alpha_1 + \alpha_3 X_i$$
• And the mean salary for a male employee is:
$$E(Y_i \mid D_i = 1) = E(\alpha_1 + \alpha_2(1) + \alpha_3 X_i + u_i) = (\alpha_1 + \alpha_2) + \alpha_3 X_i + E(u_i) = (\alpha_1 + \alpha_2) + \alpha_3 X_i$$
Remark: If we use two dummy variables (one each for male and female), our model becomes:
$$Y_i = \alpha_1 + \alpha_2 D_{1i} + \alpha_3 D_{2i} + \alpha_4 X_i + u_i \qquad \ldots (3)$$
where
$$D_{1i} = \begin{cases} 1 & \text{for male} \\ 0 & \text{for female} \end{cases} \qquad \text{and} \qquad D_{2i} = \begin{cases} 1 & \text{for female} \\ 0 & \text{for male} \end{cases}$$

Cont…
• In model (3) it can clearly be seen that $D_{1i} + D_{2i} = 1$, that is, there is perfect multicollinearity between $D_{1i}$ and $D_{2i}$ (their sum reproduces the intercept column). Consequently, the model cannot be estimated.
• Thus, the number of dummies should be one less than the number of categories. For example, if a variable has four categories, we should construct only three dummy variables.
• Note
1. The category that is assigned the value 0 is referred to as the base category or the benchmark category, and all comparisons are made with reference to this category.
2. The coefficient attached to the dummy variable (e.g. $\alpha_2$ in model (2)) is referred to as the differential intercept coefficient. It tells us by how much the value of the intercept term of the category that is assigned the value 1 differs from that of the base category.

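The perfect multicollinearity that arises when one dummy is used per category can be checked numerically. The sketch below uses a made-up sample (the values and helper names are assumptions for illustration) and leaves out the quantitative regressor, since the intercept and the two dummies are already enough to make X'X singular.

```python
# Hypothetical sample: 1 = male, 0 = female (illustrative values only)
male = [1, 0, 1, 1, 0, 0]
female = [1 - d for d in male]   # second dummy, D2 = 1 - D1
const = [1] * len(male)          # intercept column

# The two dummies always add up to the intercept column
assert all(d1 + d2 == c for d1, d2, c in zip(male, female, const))


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


cols = [const, male, female]
xtx = [[dot(u, v) for v in cols] for u in cols]  # X'X for (1, D1, D2)
print(det3(xtx))  # zero determinant: X'X cannot be inverted
```

Dropping either dummy (or the intercept) removes the exact linear dependence, which is why only one dummy is kept for a two-category variable.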
Cont…
• Dummy variable models are also used when one has to take care of seasonal factors. For example, if we have quarterly data on consumption (C) and income (Y), we fit the regression model:
$$C_i = \alpha_0 + \alpha_1 D_{1i} + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 Y_i + u_i \qquad \ldots (4)$$
where $D_{1i}$, $D_{2i}$ and $D_{3i}$ are seasonal dummies defined by:
$$D_{1i} = \begin{cases} 1 & \text{for quarter I} \\ 0 & \text{otherwise} \end{cases}, \qquad D_{2i} = \begin{cases} 1 & \text{for quarter II} \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad D_{3i} = \begin{cases} 1 & \text{for quarter III} \\ 0 & \text{otherwise} \end{cases}$$
• In equation (4), the constant term ($\alpha_0$) is the intercept for the base category (quarter IV). The intercept terms for quarters I, II and III are $(\alpha_0 + \alpha_1)$, $(\alpha_0 + \alpha_2)$ and $(\alpha_0 + \alpha_3)$, respectively.

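To make the construction of the seasonal dummies concrete, here is a small sketch that builds three 0/1 columns from quarter labels, with quarter IV as the omitted base category. The function name and the sample labels are hypothetical.

```python
# Hypothetical quarter labels for two years of quarterly data
quarters = [1, 2, 3, 4, 1, 2, 3, 4]


def seasonal_dummies(quarter_labels, base=4):
    """Return one 0/1 dummy list per non-base quarter (the base is omitted)."""
    kept = [q for q in (1, 2, 3, 4) if q != base]
    return {q: [1 if label == q else 0 for label in quarter_labels] for q in kept}


dummies = seasonal_dummies(quarters)
for q, column in dummies.items():
    print(f"D{q}:", column)
```

Observations from the base quarter have zeros in all three columns, so their intercept is the constant term alone.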
Test of Structural Stability
• Suppose we are interested in estimating a simple saving function that relates domestic household savings (S) to gross domestic product (Y) for a certain country. Suppose further that, at a certain point in time, a series of economic reforms was introduced.
• The hypothesis here is that such reforms might have considerably influenced the savings-income relationship, that is, the relationship between savings and income might be different in the post-reform period compared with the pre-reform period.
• If this hypothesis is true, then we say a structural change has happened. How do we check whether this is so?

1. Chow’s test
One approach to testing for the presence of structural change (structural instability) is Chow's test. The steps involved in this procedure are as follows:
a) Estimate the regression equation
$$S_i = \alpha + \beta Y_i + u_i, \quad i = 1, 2, \ldots, n \qquad \ldots (5)$$
for the whole period (pre-reform plus post-reform periods) and find the error sum of squares ($ESS_R$).
b) Estimate equation (5) using the available data in the pre-reform period (say, of size $n_1$), that is, estimate the model:
$$S_i = \alpha_1 + \beta_1 Y_i + u_i, \quad i = 1, 2, \ldots, n_1$$
and find the error sum of squares ($ESS_1$).
c) Estimate equation (5) using the available data in the post-reform period (say, of size $n_2$), that is, estimate the model:
$$S_i = \alpha_2 + \beta_2 Y_i + u_i, \quad i = 1, 2, \ldots, n_2$$
and find the error sum of squares ($ESS_2$).
d) Calculate:
$$ESS_{UR} = ESS_1 + ESS_2$$
e) Calculate the Chow test statistic:
$$F = \frac{(ESS_R - ESS_{UR})/K}{ESS_{UR}/(n_1 + n_2 - 2K)}$$
where $ESS_R$ is the error sum of squares for the whole period and $K$ is the number of estimated regression coefficients.

f) Decision rule: Reject the null hypothesis of identical intercepts and slopes for the pre-reform and post-reform periods, that is,
$$H_0: \alpha_1 = \alpha_2, \quad \beta_1 = \beta_2$$
if
$$F > F_\alpha(K, n_1 + n_2 - 2K)$$
where $F_\alpha(K, n_1 + n_2 - 2K)$ is the critical value from the F-distribution with $K$ (in our case $K = 2$) and $n_1 + n_2 - 2K$ degrees of freedom for a given significance level α.
Note that rejecting $H_0$ (the null hypothesis) means there is a structural change.

Illustrative Example:
1. The following data are on domestic household savings (S) and gross domestic product (Y) for India for the period 1980 to 2002.

Year   Savings     GDP          Year   Savings     GDP
1980   27136       401128       1992   162906      737791.6
1981   31355       425072.8     1993   193621.3    781345
1982   34368       438079.5     1994   251463.4    838031
1983   38587       471742.2     1995   298747.3    899563
1984   46063       492077.3     1996   317260.7    970082
1985   54167       513990       1997   352178      1016595
1986   58951       536256.7     1998   374659      1082748
1987   72908       556777.8     1999   468681      1148368
1988   87913       615098.4     2000   495986      1198592
1989   106979      656331.2     2001   535185      1267833
1990   131340      692871.5     2002   597697      1318321
1991   143908      701863.2

$ESS_1 = 644994361.865$ (error sum of squares in the pre-reform period, $n_1 = 12$)
$ESS_2 = 2736652790.434$ (error sum of squares in the post-reform period, $n_2 = 11$)
$ESS_R = 13937337067.461$ (error sum of squares for the whole period).
We then have:
$$ESS_{UR} = ESS_1 + ESS_2 = 3381647152.299$$

• The test statistic is:
$$F = \frac{(ESS_R - ESS_{UR})/K}{ESS_{UR}/(n_1 + n_2 - 2K)} = \frac{(13937337067.461 - 3381647152.299)/2}{3381647152.299/(12 + 11 - 2(2))} = 29.65$$
• The tabulated value from the F-distribution with 2 and 19 degrees of freedom at the 5% level of significance is 3.52.
• Decision: Since the calculated value of F exceeds the tabulated value, we reject the null hypothesis of identical intercepts and slopes for the pre-reform and post-reform periods at the 5% level of significance. Thus, we can conclude that there is a structural change.

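The arithmetic of the Chow test can be reproduced with a few lines of code. The function below is a sketch (its name and argument names are chosen here for illustration); it takes the three error sums of squares, the two sub-sample sizes and the number of coefficients, and the call uses the figures reported in the illustrative example.

```python
def chow_f(ess_r, ess_1, ess_2, n1, n2, k):
    """Chow F statistic from restricted and sub-period error sums of squares."""
    ess_ur = ess_1 + ess_2                    # unrestricted ESS
    numerator = (ess_r - ess_ur) / k
    denominator = ess_ur / (n1 + n2 - 2 * k)
    return numerator / denominator


# Values reported in the illustrative example
f_stat = chow_f(
    ess_r=13937337067.461,
    ess_1=644994361.865,
    ess_2=2736652790.434,
    n1=12,
    n2=11,
    k=2,
)
critical_5pct = 3.52  # tabulated F(2, 19) at the 5% level, as quoted in the text
print(round(f_stat, 2), f_stat > critical_5pct)
```

The comparison with the tabulated value is hard-coded here; with a statistics library available, the critical value or a p-value could be computed instead.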
• Drawbacks
• Chow’s test does not tell us whether the difference (change) is in the
slope only, in intercept only or in both the intercept and the slope.
2. Using dummy variables
Write the saving function as:
$$S_t = \beta_0 + \beta_1 D_t + \beta_2 Y_t + \beta_3 (D_t Y_t) + u_t, \quad t = 1, 2, \ldots, n \qquad \ldots (6)$$
where $S_t$ is household saving at time $t$, $Y_t$ is GDP at time $t$, and
$$D_t = \begin{cases} 0 & \text{pre-reform period} \\ 1 & \text{post-reform period} \end{cases}$$

• Here $\beta_3$ is the differential slope coefficient, indicating how much the slope coefficient of the post-reform savings function differs from the slope coefficient of the pre-reform savings function. Observe that:
$$E(S_t \mid D_t = 0, Y_t) = \beta_0 + \beta_2 Y_t$$
$$E(S_t \mid D_t = 1, Y_t) = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) Y_t$$
• If $\beta_1$ and $\beta_3$ are both statistically significant as judged by the t-test, then the pre-reform and post-reform regressions differ in both the intercept and the slope.
• However, if only $\beta_1$ is statistically significant, then the pre-reform and post-reform regressions differ only in the intercept (meaning the marginal propensity to save (MPS) is the same for the pre-reform and post-reform periods).

• Similarly, if only β3 is statistically significant, then the two regressions differ only in the slope (MPS).
Estimating equation (6) by OLS using the above data yields the following results.

             Unstandardized coefficients    Standardized coefficients
             B               Std. Error     Beta                         t         Sig.
(Constant)   -133699.0916    21258.997                                   -6.289    0.000
Dt           -228628.947     30775.092      -0.641                       -7.429    0.000
Yt           0.375           0.039          0.596                        9.717     0.000
DtYt         0.339           0.044          1.003                        7.674     0.000

• We can see that both the differential intercept coefficient ($\hat{\beta}_1 = -228628.947$, p-value < 0.001) and the differential slope coefficient ($\hat{\beta}_3 = 0.339$, p-value < 0.001) are statistically significant. Thus the saving-income relationship for the two periods is different.

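As a short worked check based only on the two slope estimates reported in the results table, the implied marginal propensities to save in the two periods are:

$$\widehat{MPS}_{\text{pre}} = \hat{\beta}_2 = 0.375, \qquad \widehat{MPS}_{\text{post}} = \hat{\beta}_2 + \hat{\beta}_3 = 0.375 + 0.339 = 0.714$$

On these estimates, an additional unit of GDP is associated with about 0.375 units of extra saving before the reform and about 0.714 units after it.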
Limited Dependent variable model
• The dependent variable in a regression equation may simply represent a discrete choice, assuming only a limited number of values.
• Models involving dependent variables of this kind are called limited (discrete) dependent variable models (also called qualitative response models).
• Examples of choices:
A) Whether to work or not
B) Whether to attend formal education or not
C) Choice of occupation
D) Which brand of consumer durable goods to purchase

• In the above situations, the variables are discrete valued. In examples (A) and (B) the variables are binary (having only two possible values), whereas the variables in (C) and (D) are multinomial (having more than two but a finite number of distinct values).
• In such cases, instead of standard regression models, we apply different methods of modeling and analyzing discrete data.
• Example
Suppose the choice is whether to work or not. The discrete dependent variable we are working with will assume only two values, 0 and 1:
$$Y_i = \begin{cases} 1 & \text{if the } i\text{th individual is working/seeking work} \\ 0 & \text{if the } i\text{th individual is not working} \end{cases}$$


• There are four approaches to developing a probability
model for a binary response variable:
1. The linear probability model (LPM)
2. The logit model
3. The probit model
4. The tobit model

The Linear Probability Model (LPM)
• To fix ideas, consider the following regression model:
$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad \ldots (*)$$
• where X = family income and Y = 1 if the family owns a house and 0 if it does not own a house.
Model (*) looks like a typical linear regression model, but because the regressand is binary, or dichotomous, it is called a linear probability model (LPM).
• This is because the conditional expectation of Yi given Xi, E(Yi | Xi), can be interpreted as the conditional probability that the event will occur given Xi, that is, Pr(Yi = 1 | Xi).

Thus, in our example, E(Yi | Xi) gives the probability of a family owning a house when its income is the given amount Xi.
The justification of the name LPM for models like Eq. (*) can be seen as follows:
• Assuming E(ui) = 0, as usual (to obtain unbiased estimators), we obtain
$$E(Y_i \mid X_i) = \beta_1 + \beta_2 X_i \qquad \ldots (**)$$
• Now, if Pi = probability that Yi = 1 (that is, the event occurs), and (1 − Pi) = probability that Yi = 0 (that is, the event does not occur), the variable Yi has the following (probability) distribution:
Yi      Probability
0       1 − Pi
1       Pi
Total   1
• That is, Yi follows the Bernoulli probability distribution.

• Now, by the definition of mathematical expectation, we obtain:
E(Yi ) = 0(1 − Pi ) + 1(Pi ) = Pi (***)
• Comparing Eq. (**) with Eq. (***), we can equate
E(Yi | Xi ) = β1 + β2Xi = Pi (****)
• that is, the conditional expectation of the model (*) can, in fact, be interpreted as
the conditional probability of Yi. In general, the expectation of a Bernoulli random
variable is the probability that the random variable equals 1.
• In passing note that if there are n independent trials, each with a probability p of
success and probability (1 − p) of failure, and X of these trials represent the
number of successes, then X is said to follow the binomial distribution. The
mean of the binomial distribution is np and its variance is np(1 − p). The term
success is defined in the context of the problem.
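To illustrate how Eq. (*) is used in practice, the sketch below fits an LPM by OLS on a small made-up data set (the income and ownership values are assumptions for illustration only) and reads the fitted values as estimated probabilities of owning a house. With these particular numbers some fitted values lie outside the interval from 0 to 1, which is a well-known limitation of the LPM.

```python
from statistics import mean

# Hypothetical data: family income (X) and home ownership (Y = 1 if owns)
income = [1, 2, 3, 4, 5, 6, 7, 8]
owns_house = [0, 0, 0, 1, 0, 1, 1, 1]

x_bar, y_bar = mean(income), mean(owns_house)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, owns_house)) / sum(
    (x - x_bar) ** 2 for x in income
)
b1 = y_bar - b2 * x_bar

# Fitted values are read as estimated probabilities P(Y = 1 | X)
fitted = [b1 + b2 * x for x in income]
for x, p in zip(income, fitted):
    print(x, round(p, 3))

outside = [round(p, 3) for p in fitted if p < 0 or p > 1]
print("fitted values outside [0, 1]:", outside)
```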

Logit and Probit
• Logit and probit models are the most widely used models for
estimating the functional relationship between dependent and
independent variables in practice.
• Logit and probit models also belong to the family of generalized linear models (GLM).
• If the latent variable is unobserved or the dependent variable is binary, the model cannot be estimated by ordinary least squares (OLS).
• Instead, maximum likelihood estimation is used, which requires assumptions about the distribution of the errors.
• Often, the choice is between normal errors in the probit model and logistic errors in the logit model.

Con’t…
• Limited dependent variable models are generally divided into two groups: censored and truncated regression models.
• The Tobit model is also known as the censored regression model.
• When the dependent variable is censored, least squares estimates are biased. Therefore, when censoring is applied to the dependent variable, the Tobit model allows us to derive consistent and asymptotically efficient estimators.
• The Tobit model, which covers cases where the dependent variable has a lower or upper limit, was first used by Tobin to analyze household expenditure on durable consumer goods, taking into account the fact that expenditure cannot be negative.

Logit Model
• In the logit regression model, the assumptions involved in linear regression analysis (a linear relationship with the dependent variable, independent variables drawn from a normal distribution, a normally distributed error term, uncorrelated error terms, etc.) are not required.
• Therefore, it provides researchers with considerable flexibility and has become a more preferred method.
• A general linear regression model can be written as in Equation 1, where yi is the dependent variable and the xi are independent variables.
𝑦𝑖=𝛼+𝛽1𝑥1+𝛽2𝑥2+⋯+𝛽𝑘𝑥𝑘+𝜀𝑖 (1)

Logit Model
• In the above model, 𝛼 is the constant term and the 𝛽s are regression coefficients. This model can be estimated by classical OLS when the dependent variable is continuous. However, logit or probit regression methods are used in cases where the dependent variable is discrete.
• The logit model can be used to model the probability of a particular class or event with two states.
• Suppose that 𝑦𝑖∗ is the unobservable (latent) variable underlying the observed variable 𝑦𝑖, taking values between −∞ and +∞.
• Values of 𝑦𝑖∗ greater than a threshold are recorded as 𝑦𝑖=1 and values less than or equal to the threshold are recorded as 𝑦𝑖=0.
• The latent variable 𝑦𝑖∗ is assumed to be linearly dependent on the observed 𝑥𝑖 through the structural model. 𝑦𝑖∗ is connected to the observed binary variable 𝑦𝑖 by the measurement equation in Equation 2:

Logit Model
• Thus, when the dependent variable yi takes the values “0” and “1”, the model is called the binary logit model. The probability that the dependent variable equals “1” is expressed by Equation 3:
• In this model, 𝑃𝑖 expresses the probability that the i-th individual makes a particular choice, given the independent variable 𝑥𝑖. Thus 𝑃𝑖 also takes values between “0” and “1”. The equations given in Equation 4 and Equation 5 can be written here:

Logit Model
• To determine the logit function, the 𝛼 and 𝛽 parameters cannot be estimated directly by OLS, and Equation 6 is used to estimate the model:
• If the ratio of Equation 3 to Equation 6 is taken,

Logit Model
Equation 7 is obtained. It is also known as the odds or odds ratio (OR).
• Variables whose OR values are close to 1 are not factors that have a significant effect on the change of 𝑦.
• For OR values greater than 1, the factor is interpreted as an important risk factor, provided that the coefficient is significant.
• Values close to zero indicate that the factor is an important factor, provided that the coefficient is significant, but that it is a negative factor that causes 𝑦 to take low values.
• Equation 8 can be written by taking the natural logarithm (base e) of this model:

Logit Model
• 𝐿𝑖 is the logarithm of the odds ratio (the log-odds) and is linear with respect to both 𝑥𝑖 and the parameters. Here 𝐿𝑖 is called the “logit”, and this is where the logit model gets its name. The model is a semi-logarithmic function and is one of the best known models among generalized linear models.
• When 𝑃𝑖=1 and 𝑃𝑖=0 are substituted into the logit 𝐿𝑖, the values 𝑙𝑛(1/0) and 𝑙𝑛(0/1) are obtained, which are undefined.
• Therefore, estimates of the parameters in the 𝐿𝑖 function cannot be found by OLS, but these parameters can be estimated by the maximum likelihood (ML) method.
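The link between the probability, the odds and the logit can be checked numerically. The sketch below assumes the standard logistic form P = 1 / (1 + e^(−(α + βx))) for the probability; the coefficient values and the regressor value are hypothetical.

```python
import math


def logistic(z):
    """Logistic CDF: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients and regressor value (for illustration only)
alpha, beta, x = -1.0, 0.5, 3.0
z = alpha + beta * x

p = logistic(z)          # P(y = 1 | x)
odds = p / (1.0 - p)     # odds in favour of the event, equal to exp(z)
logit = math.log(odds)   # log-odds, linear in x: alpha + beta * x

print(p, odds, logit)
```

Because the log-odds is linear in x, a one-unit increase in x multiplies the odds by e^β under this form, which is why odds ratios are often reported alongside the coefficients.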

Logit Model
• However, the following points should be taken into consideration in research using the logit model.
• All appropriate independent variables should be included in the model: Failure to include some variables in the model may cause the error term to grow and the model to be inadequate.
• All unsuitable independent variables should be excluded: Inclusion of causally inappropriate variables in the model can complicate the model.
• Each individual should be observed only once, and there should be no repeated measurements.
• The measurement error in the independent variables must be small, and there should be no missing data. Errors can lead to bias in estimating coefficients and inadequacy of the model.
• There should be no multicollinearity between the independent variables: The independent variables must not be interrelated.
• There should be no extreme values: As with linear regression, extreme values can significantly affect the result.

Logit model
• In the logit model, the coefficients cannot be directly interpreted as the effect of a change in the independent variables on the expected value of the dependent variable.
• For this reason, OR values or marginal effects can be calculated in applications. Furthermore, the sign of a coefficient indicates the direction of the relationship between the independent variable and the probability of occurrence of the event.

Probit Model
• In the linear probability model, which is one of the qualitative response models for dependent variables that can take two values, the most obvious problem is that the predicted probability values can fall outside the range of “0” to “1”.
• One of the models used to solve this problem is the probit model.
• This model is nonlinear in the coefficients and keeps the probabilities between “0” and “1”. When the dependent variable 𝑦𝑖 is binary, 𝑃𝑖 is expressed in Equation 9:
• Here ϕ is the cumulative distribution function of the standard normal distribution and 𝛽 denotes the coefficients estimated by maximum likelihood.

Probit model
• The probit model assumes that the underlying (latent) dependent variable is normally distributed, whereas the logit model assumes that it follows the logistic distribution.
• Therefore, the tails of the logit cumulative distribution function are heavier than those of the probit model.
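The difference in the tails can be seen by evaluating both cumulative distribution functions at a few points. In this sketch the logistic distribution is rescaled to unit variance so that the two curves are comparable; that rescaling and the evaluation points are choices made here for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def probit_cdf(z):
    """Standard normal CDF used by the probit model."""
    return std_normal.cdf(z)


def logit_cdf_unit_variance(z):
    """Logistic CDF rescaled to variance 1 (the raw logistic has variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z * math.pi / math.sqrt(3.0)))


for z in (-4.0, -3.0, -2.0, 0.0):
    print(z, probit_cdf(z), logit_cdf_unit_variance(z))
```

Far from zero, the logistic curve assigns more probability to extreme values than the normal curve does, while both pass through 0.5 at zero.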

Probit model
• Although these two models give similar results, the estimated coefficients of the two models cannot be compared directly.
• A suitable model should keep 𝑃𝑖 within the range “0” to “1” and make the relationship between 𝑃𝑖 and 𝑥𝑖 curvilinear, with increases in 𝑥𝑖 also increasing 𝑃𝑖. A model with these two features is illustrated in Fig. 2:
Figure 2. Logit and probit cumulative distribution

Probit Model
• The probit model uses the cumulative normal distribution function and is also called the “normit model” in the literature. The probit probability model based on the normal cumulative distribution function can be represented by Equation 10:
• Where 𝑥𝑖 is observable but 𝑦𝑖∗ is not observable.
• As in the logit model, 𝑦𝑖=1 if 𝑦𝑖∗>0 and 𝑦𝑖=0 if 𝑦𝑖∗≤0. The threshold value τ used when assigning the value of 𝑦𝑖 is generally taken as “0”, although another value can be used instead of zero.
• Since the latent variable 𝑦𝑖∗ cannot be observed directly, it can be said that the event will occur if 𝑦𝑖∗ exceeds the threshold value and will not occur if it does not (Equation 11).

Probit model
• The probability that 𝑦𝑖∗ does not exceed the threshold is calculated from the standardized cumulative distribution function under the assumption of normality. If the cumulative normal distribution function is defined as ϕ(z)=P(Z≤z) for the standard normal variable Z, then Equation 12 and Equation 13 are expressed as follows:
• The variable Z here is a standardized normal variable with a mean of “0” and a variance of “1”. Thus, the model can be represented by Equation 14:
• In this model, 𝐹−1 is the inverse of the normal cumulative distribution function. The following assumptions can be stated for the probit model:

Cont’d
𝑦𝑖∈{0,1},𝑖=1,2,…,𝑛
𝑃𝑖=𝐸(𝑌=1/𝑥)=ϕ(𝛽𝑥𝑖) (Unit normal cumulative distribution function)
𝑦1,𝑦2 ,…,𝑦𝑛 are statistically independent
• There is no exact linear dependence (perfect multicollinearity) among the 𝑥𝑖's
• Binary probit models can be estimated by the Weighted Least Squares Method (WLSM), the Maximum Likelihood (ML) method, or the iterative minimum chi-square method. In addition, the R2 coefficient in the probit model does not give us any idea as to whether the functional form of the model is well chosen.

Tobit Model
• A sample in which information on the dependent variable is available only for some observations is known as a censored sample.
• This model is also counted among the limited dependent variable models because the dependent variable is limited. When censoring is applied to the dependent variable, the regression model is expressed in Equation 15:
• This model is called the “Tobit model”. 𝑦𝑖∗ is the latent variable and τ is the censoring point. The latent variable is observed for values greater than τ and censored otherwise (Equation 16):
• In the traditional Tobit model, τ=0 in Equation 16, so the censored observations take the value zero. That is, it is expressed as Equation 17:

Tobit Model
• If the dependent variable is censored, having a lower limit and/or an upper limit, then the least squares estimators of the regression parameters are biased and inconsistent.
• We can apply an alternative estimation procedure, which is called Tobit.
• Tobit is a maximum likelihood procedure that recognizes that we have data of two sorts:
1. The limit observations (y = 0)
2. The non-limit observations (y > 0)
• The two types of observations that we observe, the limit observations and those that are positive, are generated by the latent variable y* crossing the zero threshold or not crossing that threshold.
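The way Tobit treats the two sorts of observations can be sketched as a log-likelihood function. The version below assumes censoring from below at zero and a normally distributed latent error; the function name, argument names and the two-observation example are hypothetical.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()


def tobit_loglik(y, xb, sigma):
    """Log-likelihood of a Tobit model censored from below at zero.

    y     : observed outcomes (0 for limit observations)
    xb    : fitted index x_i * beta for each observation
    sigma : standard deviation of the latent error term
    """
    total = 0.0
    for yi, xbi in zip(y, xb):
        if yi > 0:
            # non-limit observation: normal density of the observed value
            total += math.log(std_normal.pdf((yi - xbi) / sigma) / sigma)
        else:
            # limit observation: probability that the latent y* is at or below zero
            total += math.log(std_normal.cdf(-xbi / sigma))
    return total


# Hypothetical data: one limit and one non-limit observation
print(tobit_loglik(y=[0.0, 1.0], xb=[0.0, 1.0], sigma=1.0))
```

Maximum likelihood estimation would search for the coefficients and sigma that make this quantity as large as possible; the search itself is left out of the sketch.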

Difference and similarities among Logit, Probit and
Tobit
• The most commonly used of these qualitative response models are the logit and probit models.
• Logit and probit analyses are very similar, and the probability estimates obtained are close to each other.
• However, while the logit model uses the log-odds, the probit model uses the cumulative normal distribution.
• The structural models of logit, probit and Tobit are similar, but the measurement models are different.
• In the Tobit model, the observed values of the dependent variable are known when 𝑦𝑖∗>τ.
• In the probit and logit models, the value of 𝑦 is “1” only if 𝑦𝑖∗>τ. If the data are below the threshold (τ), they cannot be known and the value of 𝑦 is taken to be zero.
• More information is available in the Tobit model. Therefore, coefficient estimates obtained from the Tobit model are expected to be more efficient than those obtained from the probit model.

Multinomial logit model
• We are often faced with choices involving more than two
alternatives
• These are called multinomial choice situations
• If you are shopping for a laundry detergent, which one
do you choose? OMO, DURU, POPULAR, SPECIAL
BRIGHT, and so on
• If you enroll in the business school, will you major in
economics, marketing, management, finance, or
accounting?

Multinomial logit model
• Multinomial logistic regression is used to predict categorical
placement in or the probability of category membership on a
dependent variable based on multiple independent variables.
• The independent variables can be either dichotomous (i.e., binary) or continuous (i.e., interval or ratio in scale).
• Multinomial logistic regression is a simple extension of binary logistic regression that allows for more than two categories of the dependent or outcome variable.
• Like binary logistic regression, multinomial logistic regression uses maximum likelihood estimation to evaluate the probability of categorical membership.
• The multinomial logistic model assumes that data are case specific; that is, each independent variable has a single value for each case.
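As a minimal sketch of how category probabilities are obtained in a multinomial logit, the function below turns one linear score per non-base alternative into choice probabilities, with the base category's score fixed at zero. The function name and the scores are hypothetical, and estimation of the coefficients is not shown.

```python
import math


def multinomial_logit_probs(scores):
    """Choice probabilities from linear scores; the base category has score 0."""
    all_scores = [0.0] + list(scores)   # prepend the base category
    top = max(all_scores)               # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in all_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores x_i * beta_j for three non-base alternatives
probs = multinomial_logit_probs([0.4, -0.2, 1.1])
print(probs, sum(probs))
```

With only one non-base alternative this reduces to the binary logit probability, which is the sense in which the multinomial model extends the binary one.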
