Transcript of "14 dummy"

  1. Methods of Economic Research, Lecture 14: Dummy Variables; Presentation of regression results (3/2/2011)
  2. $y_i = \beta_0 + \beta_1 x_i + \beta_2 d_{1i} + u_i$, $i = 1, \ldots, n$
     [Figure: teacher's pay against years of teaching experience, x. Two parallel lines with slope $\beta_1$: starting salary for males = $\beta_0 + \beta_2$; starting salary for females = $\beta_0$.]
     • 2 categories/groups: female and male.
     • Female is chosen as the base group (benchmark group): $d_{1i} = 0$ if observation i is female.
     • The model is to have 2 intercepts: $\beta_0$ and $\beta_0 + \beta_2$.
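As a minimal sketch of the two-intercept model above, the snippet below evaluates the fitted line for each group. The coefficient values and the function name `predicted_pay` are made up for illustration; they are not estimates from the lecture.

```python
def predicted_pay(years, male, b0=20000.0, b1=500.0, b2=1500.0):
    """Fitted pay b0 + b1*years + b2*d, where d = 1 for male and 0 for female.

    b0, b1 and b2 are hypothetical coefficient values.
    """
    d = 1 if male else 0
    return b0 + b1 * years + b2 * d


# Intercepts: evaluate each line at zero years of experience.
female_start = predicted_pay(0, male=False)  # base group: b0
male_start = predicted_pay(0, male=True)     # b0 + b2

# The two lines are parallel, so the gap is b2 at any level of experience.
gap_at_10_years = predicted_pay(10, male=True) - predicted_pay(10, male=False)
```

The gap between the two groups equals the dummy coefficient regardless of experience, which is what "same slope, different intercepts" means.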
  3. Dummy Variables (number of categories > 2)
     We may have a group of dummy variables where the number of categories is greater than two, e.g.:
     • 4 seasons: Spring, Summer, Autumn, Winter
     • 4 age groups: 16-25, 26-40, 41-55, 56-64 years
     • different locations: urban, semi-rural, rural
  4. A General Principle for Including Dummies to Indicate Different Groups
     • If the model is to have different intercepts for N categories, N-1 dummy variables are needed.
     • The dummy variable coefficient (e.g. $\beta_2$) for a particular group (the male group, $d_{1i} = 1$) represents the estimated difference in intercepts between that group (male) and the base group (female, $d_{1i} = 0$).
     • The intercept ($\beta_0$) for the base group is the overall intercept for the model.
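The N-1 rule can be sketched in a few lines of plain Python. The helper name `make_dummies` and the sample data are hypothetical; the point is only that one category (the base group) gets no dummy of its own.

```python
def make_dummies(labels, base):
    """Return {category: [0/1, ...]} for every category except the base group."""
    categories = sorted(set(labels))
    if base not in categories:
        raise ValueError(f"base category {base!r} not found in labels")
    return {
        c: [1 if label == c else 0 for label in labels]
        for c in categories
        if c != base
    }


# Three location categories, so two dummies are created.
locations = ["urban", "rural", "semi-rural", "urban", "rural"]
dummies = make_dummies(locations, base="urban")
```

An observation from the base group ("urban" here) is identified by all included dummies being zero.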
  5. An example: Dummy Variables (number of categories > 2)
     Assume we have quarterly data containing the following information:
     • Aggregate consumption of beer (y)
     • Personal disposable income (x)
     We want to use the seasons together with income x to explain the variations in y.
  6. Aggregate consumption of beer (y) on personal disposable income (x), quarterly data

     Quarter             y (pints)   x (£)
     1  Jan-Mar 2003
     2  Apr-Jun 2003
     3  Jul-Sep 2003
     4  Oct-Dec 2003
     1  Jan-Mar 2004
     :

     $d_{1t}$ = 1 if observation t is from the first quarter (JFM), 0 otherwise.
     $d_{2t}$ = 1 if observation t is from the second quarter (AMJ), 0 otherwise.
     $d_{3t}$ = 1 if observation t is from the third quarter (JAS), 0 otherwise.
     $d_{4t}$ = 1 if observation t is from the fourth quarter (OND), 0 otherwise.
  7. Can we use 4 dummies to indicate 4 seasons?
     $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{1t} + \beta_3 d_{2t} + \beta_4 d_{3t} + \beta_5 d_{4t} + u_t$
     NO! We cannot estimate this model.
     Let's consider the four dummy variables and assume that the first observation in the sample is from the first quarter. The values taken by the four dummy variables are shown in the following table.
  8.  t    d1t  d2t  d3t  d4t   Σ_i d_it
      1     1    0    0    0       1
      2     0    1    0    0       1
      3     0    0    1    0       1
      4     0    0    0    1       1
      5     1    0    0    0       1
      6     0    1    0    0       1
      7     0    0    1    0       1
      8     0    0    0    1       1
      9     1    0    0    0       1
     10     0    1    0    0       1
      :     :    :    :    :       :

     $d_{1t} + d_{2t} + d_{3t} + d_{4t} = 1$
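The table can be reproduced with a short sketch. The function name `quarter_dummies` is an assumption made for illustration; it cycles through the quarters starting from the first and checks that each row sums to one.

```python
def quarter_dummies(n_obs, first_quarter=1):
    """Return one row [d1, d2, d3, d4] per observation, cycling through quarters."""
    rows = []
    for t in range(n_obs):
        quarter = (first_quarter - 1 + t) % 4 + 1
        rows.append([1 if quarter == q else 0 for q in (1, 2, 3, 4)])
    return rows


rows = quarter_dummies(10)           # t = 1, ..., 10 as in the table
row_sums = [sum(r) for r in rows]    # d1t + d2t + d3t + d4t for each t
```

Every observation belongs to exactly one quarter, so every entry of `row_sums` is 1.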
  9. We cannot estimate this model because
     • An assumption of the classical linear regression model is that there is NO EXACT LINEAR RELATIONSHIP among the independent variables in the model.
     • This is a necessary assumption for estimation of the parameters of the model.
     • If there is an exact linear relationship, it is impossible to disentangle the separate influences of the different explanatory variables. Here the four dummies always sum to 1, which is exactly the constant regressor multiplying $\beta_0$.
  10. The Dummy Variable Trap
      • When N dummies are used to indicate N groups in a model that also has an intercept, perfect multicollinearity is introduced.
      • This is known as the "dummy variable trap": too many dummies describe a given number of groups.
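One way to see the trap numerically is to check the rank of the regressor matrix. This is a sketch using NumPy with a made-up income series; the variable names are illustrative only.

```python
import numpy as np

n = 12
quarter = np.arange(n) % 4                              # 0,1,2,3,0,1,...
D = (quarter[:, None] == np.arange(4)).astype(float)    # all four dummies
x = np.linspace(100.0, 155.0, n)                        # made-up income series
const = np.ones(n)

X_trap = np.column_stack([const, x, D])         # constant + x + 4 dummies
X_ok = np.column_stack([const, x, D[:, 1:]])    # constant + x + 3 dummies (d1 omitted)

rank_trap = np.linalg.matrix_rank(X_trap)       # fewer than the 6 columns
rank_ok = np.linalg.matrix_rank(X_ok)           # equal to the 5 columns
```

With all four dummies the matrix has six columns but only five linearly independent ones, so the six parameters cannot be estimated separately. Dropping one dummy restores full column rank.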
  11. • The solution to the problem is to omit one category.
      • It does not matter which one is omitted. The omitted category is the base group.
  12. • Omit $d_{1t}$; the model becomes:
        $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      • Because we have omitted $d_{1t}$, quarter 1 becomes the "base quarter".
      • $\beta_0$ is the intercept for quarter 1, which is also the overall intercept of the model.
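A minimal sketch of estimating this model by least squares with NumPy follows. The data are synthetic and the "true" coefficients are hypothetical; the error term is set to zero so that the estimates reproduce them.

```python
import numpy as np

n = 16
quarter = np.arange(n) % 4 + 1                      # 1,2,3,4,1,2,...
x = np.linspace(100.0, 175.0, n)                    # made-up income series
d2, d3, d4 = [(quarter == q).astype(float) for q in (2, 3, 4)]

true_beta = np.array([50.0, 0.8, 5.0, 12.0, 3.0])   # hypothetical beta_0..beta_4
X = np.column_stack([np.ones(n), x, d2, d3, d4])    # d1 omitted: Q1 is the base
y = X @ true_beta                                   # no error term, for clarity

# Ordinary least squares via NumPy's least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With real data the error term is not zero, so `beta_hat` would only approximate the underlying parameters.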
  13. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 1: all these dummies = 0.
      [Figure: axes for aggregate consumption of beer (pints), y, against aggregate personal disposable income, x.]
  15. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 1: all these dummies = 0.
      [Figure: line for Q1 with slope $\beta_1$ and intercept $\beta_0$, the overall intercept of the model.]
  16. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 2: $d_{2t} = 1$ but $d_{3t} = d_{4t} = 0$.
      [Figure: line for Q1 with slope $\beta_1$ and intercept $\beta_0$.]
  17. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 2: $d_{2t} = 1$ but $d_{3t} = d_{4t} = 0$, so the term $\beta_2 d_{2t} = \beta_2$.
      [Figure: line for Q1 with slope $\beta_1$ and intercept $\beta_0$.]
  18. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 2: $d_{2t} = 1$ but $d_{3t} = d_{4t} = 0$, so the term $\beta_2 d_{2t} = \beta_2$.
      [Figure: parallel lines with slope $\beta_1$. Intercept for Q2 = $\beta_0 + \beta_2$; intercept for Q1 = $\beta_0$.]
  19. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 3: $d_{3t} = 1$ but $d_{2t} = d_{4t} = 0$, so the term $\beta_3 d_{3t} = \beta_3$.
      [Figure: parallel lines with slope $\beta_1$. Intercept for Q3 = $\beta_0 + \beta_3$; intercept for Q2 = $\beta_0 + \beta_2$; intercept for Q1 = $\beta_0$.]
  20. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      Quarter 4: $d_{4t} = 1$ but $d_{2t} = d_{3t} = 0$, so the term $\beta_4 d_{4t} = \beta_4$.
      [Figure: four parallel lines with slope $\beta_1$. Intercept for Q4 = $\beta_0 + \beta_4$; intercept for Q3 = $\beta_0 + \beta_3$; intercept for Q2 = $\beta_0 + \beta_2$; intercept for Q1 = $\beta_0$.]
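Collecting the four cases, and assuming the error term has mean zero given the regressors (the usual classical assumption, stated here explicitly), the model implies one line per quarter, all with the same slope:

```latex
\begin{aligned}
\text{Q1: } & E(y_t \mid x_t) = \beta_0 + \beta_1 x_t \\
\text{Q2: } & E(y_t \mid x_t) = (\beta_0 + \beta_2) + \beta_1 x_t \\
\text{Q3: } & E(y_t \mid x_t) = (\beta_0 + \beta_3) + \beta_1 x_t \\
\text{Q4: } & E(y_t \mid x_t) = (\beta_0 + \beta_4) + \beta_1 x_t
\end{aligned}
```

Each dummy coefficient is therefore the vertical distance between that quarter's line and the line for the base quarter.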
  21. Dummy variables: interpreting the results
      Example of how to interpret our results:
      The aggregate consumption of beer in the fourth quarter (OND) is estimated to be $\beta_4$ pints higher (or lower) than in the first quarter (JFM), ceteris paribus (everything else remaining the same).
  22. Dummy variables: interpreting the results
      • It does not matter which of the four dummies you leave out. This will affect the parameter estimates, but not the position of the regression lines. (Try to verify this in seminar 7.)
      • You always compare the estimates of the coefficients on the dummies you include with the omitted category.
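The claim that the choice of omitted dummy changes the estimates but not the regression lines can be checked on synthetic data. This is a sketch with made-up numbers; the helper `fit` and all coefficient values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
quarter = np.arange(n) % 4 + 1
x = np.linspace(100.0, 195.0, n)                     # made-up income series
D = {q: (quarter == q).astype(float) for q in (1, 2, 3, 4)}
# Hypothetical data-generating process with a random error term.
y = 50.0 + 0.8 * x + 5.0 * D[2] + 12.0 * D[3] + 3.0 * D[4] + rng.normal(0.0, 1.0, n)


def fit(omitted):
    """OLS with a constant, x, and the three dummies other than `omitted`."""
    kept = [q for q in (1, 2, 3, 4) if q != omitted]
    X = np.column_stack([np.ones(n), x] + [D[q] for q in kept])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta


beta_q1, fitted_q1 = fit(omitted=1)   # columns: const, x, d2, d3, d4
beta_q4, fitted_q4 = fit(omitted=4)   # columns: const, x, d1, d2, d3

# Intercept of the Q4 line under each parameterisation.
q4_intercept_a = beta_q1[0] + beta_q1[4]   # base intercept + d4 coefficient
q4_intercept_b = beta_q4[0]                # Q4 is the base, so just the constant
```

The dummy coefficients differ between the two fits because they measure differences from different base quarters, yet the fitted values and the implied intercept for each quarter coincide.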
  23. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$
      We can test for the joint significance of the "season" dummies using an F-test of:
      $H_0: \beta_2 = \beta_3 = \beta_4 = 0$
      $H_1$: $H_0$ is not true.
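The slide does not give the test statistic, so the following is a sketch of the standard restricted-versus-unrestricted form of the F-statistic. The function name and the residual sums of squares are made up for illustration.

```python
def f_statistic(rss_restricted, rss_unrestricted, q, n, k):
    """F = [(RSS_R - RSS_U) / q] / [RSS_U / (n - k - 1)].

    rss_restricted:   residual sum of squares with the dummies left out (H0 imposed)
    rss_unrestricted: residual sum of squares with the dummies included
    q: number of restrictions under H0 (3 seasonal dummies here)
    n: number of observations
    k: number of regressors in the unrestricted model, excluding the constant
    """
    numerator = (rss_restricted - rss_unrestricted) / q
    denominator = rss_unrestricted / (n - k - 1)
    return numerator / denominator


# Made-up residual sums of squares, purely for illustration.
F = f_statistic(rss_restricted=500.0, rss_unrestricted=350.0, q=3, n=40, k=4)
```

The result would be compared with a critical value from the F distribution with q and n - k - 1 degrees of freedom; a value above the critical value leads to rejecting H0.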
  24. Important rule
      With a set of dummy variables indicating category (e.g. season, social class, occupational class, marital status, region, postcode), always omit one of them from the model to avoid the problem of perfect multicollinearity (the "dummy variable trap").
      The problem of perfect multicollinearity arises because there is a perfect linear relationship between the variables on the right-hand side of the model.
  25. It is incorrect to say that "the dummy variables are perfectly correlated with each other"; this is a common error.
      The interpretations of coefficients on the included dummies are made in comparison to the omitted one.
  26. Presentation of Regression results
      When presenting regression results in a document, there are two possibilities. You can present a table similar to the results table in PASW.

      Coefficients (a)
                       Unstandardized         Standardized
                       Coefficients           Coefficients                      95% Confidence Interval for B
      Model            B         Std. Error   Beta            t        Sig.     Lower Bound    Upper Bound
      1  (Constant)    18.500    9.086                        2.036    .088     -3.732         40.732
         X             .140**    .030         .882            4.590    .004     .065           .215
      a. Dependent Variable: Y
  27. • Alternatively, you can present an equation, with standard errors and t-statistics appearing in brackets underneath the coefficients.
      • When there are just a few variables, this second method is more appropriate.

      Ŷ    =  18.500  +  0.140 X
      (se)   (9.086)    (0.030)
      (t)    (2.036)    (4.590)**

      ** Indicates strong significance (p-value < 0.01)
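The t-statistics in brackets are each coefficient divided by its standard error. A small sketch using the reported figures follows; the function name `t_ratio` is an assumption.

```python
def t_ratio(coef, se):
    """t-statistic for H0: the coefficient equals zero."""
    return coef / se


t_const = t_ratio(18.500, 9.086)   # constant term
t_slope = t_ratio(0.140, 0.030)    # coefficient on X
```

The constant's ratio matches the reported 2.036 after rounding. The slope's ratio from the rounded figures is about 4.667 rather than the reported 4.590, presumably because the software computes t from unrounded values of the coefficient and standard error.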
  28. You can also edit the results table in PASW, so you can add the stars (**) as appropriate.
  29. p-value of a hypothesis test
      When we conduct a t-test, we look at the t-statistic (t-ratio) and compare it with a critical value from our tables (df = n - k - 1). This tells us whether to reject the null hypothesis or not.
      If we reject H0, it is also useful to know how strong the evidence against H0 is.
  30. p-value of a hypothesis test
      Definition: The p-value of a test is the probability of obtaining a more extreme value than the one we have actually obtained, if H0 is true.
      The smaller the p-value, the stronger the evidence against H0.
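For a two-sided t-test, one common way to write this definition is shown below. The symbols $t_{\text{obs}}$ (the observed t-ratio) and $T$ (a random variable following the t distribution with n - k - 1 degrees of freedom) are notation introduced here for illustration, not taken from the slides.

```latex
p\text{-value} = P\left( |T| > |t_{\text{obs}}| \;\middle|\; H_0 \text{ is true} \right),
\qquad T \sim t_{n-k-1}
```

A large observed t-ratio in absolute value leaves little probability in the tails, which gives a small p-value.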
  31. p-value of a hypothesis test
      • If p-value < 0.01, there is strong evidence against H0 (**).
      • If p-value < 0.05, there is evidence against H0 (*).
      • If p-value < 0.10, there is mild evidence against H0.
      • If p-value > 0.10, we do not have evidence to reject H0.
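These thresholds can be sketched as a small lookup function. The function name is hypothetical, and assigning a p-value of exactly 0.10 to the last category is an assumption, since the slide does not cover that boundary.

```python
def evidence_against_h0(p_value):
    """Map a p-value to the wording used on the slide."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "strong evidence against H0 (**)"
    if p_value < 0.05:
        return "evidence against H0 (*)"
    if p_value < 0.10:
        return "mild evidence against H0"
    # Includes p_value == 0.10 by assumption.
    return "no evidence to reject H0"


# The two "Sig." values from the PASW table earlier in the lecture.
slope_verdict = evidence_against_h0(0.004)
constant_verdict = evidence_against_h0(0.088)
```

Checking the thresholds from smallest to largest means each p-value gets the strongest label it qualifies for.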
  32. Conclusions from hypothesis tests: correct wording

      Correct                                                  Incorrect
      X has a significant effect on Y.                         $\beta_1$ has a significant effect on Y.
      $\hat{\beta}_1$ is significantly different from zero.    $\beta_1$ is significantly different from zero.
      $\hat{\beta}_1$ is significantly positive.               $\beta_1$ is positive.
  33. Further point: "there is evidence that..."
      E.g. remember to say:
      • There is evidence that food is a normal good. Don't say: Food is a normal good.
      • When p-value > 0.10, we do not have enough evidence to reject H0. Or: we do not reject H0. Don't say: We accept H0.