# 14 dummy


1. Methods of Economic Research, Lecture 14: Dummy Variables; Presentation of Regression Results. 3/2/2011
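2. $y_i = \beta_0 + \beta_1 x_i + \beta_2 d_{1i} + u_i, \quad i = 1, \dots, n$

   [Figure: teacher's pay, $y$, against years of teaching experience, $x$. Two parallel lines, each with slope $\beta_1$; starting salary for males $= \beta_0 + \beta_2$, starting salary for females $= \beta_0$.]

   - Two categories (groups): female and male.
   - Female is chosen as the base group (benchmark group): $d_{1i} = 0$ if observation $i$ is female, and $d_{1i} = 1$ if it is male.
   - The model therefore has two intercepts: $\beta_0$ (females) and $\beta_0 + \beta_2$ (males).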
3. Dummy variables (number of categories > 2)

   We may have a group of dummy variables where the number of categories is greater than two, e.g.:
   - 4 seasons: Spring, Summer, Autumn, Winter;
   - 4 age groups: 16-25, 26-40, 41-55, 56-64 years;
   - different locations: urban, semi-rural, rural.
4. A general principle for including dummies to indicate different groups
   - If the model is to have different intercepts for N categories, N - 1 dummy variables are needed.
   - The coefficient on the dummy variable for a particular group (e.g. $\beta_2$ for the male group, $d_{1i} = 1$) represents the estimated difference in intercepts between that group (male) and the base group (female, $d_{1i} = 0$).
   - The intercept for the base group ($\beta_0$) is the overall intercept of the model.
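The N - 1 rule can be sketched in Python. This is a minimal illustration, not a method taken from the lecture: `make_dummies` is a hypothetical helper that takes one category label per observation and a chosen base group, and returns one 0/1 column for each of the other N - 1 categories, so the base group is the row of all zeros.

```python
def make_dummies(labels, base):
    """Build N - 1 dummy columns from category labels.

    labels: one category label per observation.
    base:   the category to omit (the base or benchmark group).
    Returns (names, rows): the included categories, and one row of 0/1
    values per observation (all zeros for the base group).
    """
    names = sorted(set(labels) - {base})  # the N - 1 included categories
    rows = [[1 if lab == name else 0 for name in names] for lab in labels]
    return names, rows


# Two groups with female as the base: a single dummy d1i (1 = male).
names, rows = make_dummies(["female", "male", "male", "female"], base="female")
print(names)  # ['male']
print(rows)   # [[0], [1], [1], [0]]

# Four quarters with quarter 1 as the base: dummies for quarters 2, 3, 4.
q_names, q_rows = make_dummies([1, 2, 3, 4, 1], base=1)
print(q_names)  # [2, 3, 4]
```

Sorting the remaining labels only fixes a stable column order; which category is the base is always the caller's explicit choice.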
5. An example: dummy variables (number of categories > 2)

   Assume we have quarterly data containing the following information:
   - aggregate consumption of beer ($y$);
   - personal disposable income ($x$).

   We want to use the seasons, together with income $x$, to explain the variations in $y$.
6. Aggregate consumption of beer ($y$) on personal disposable income ($x$), quarterly data

   | Quarter | Period | $y$ (pints) | $x$ (£) |
   |---|---|---|---|
   | 1 | Jan-Mar 2003 | | |
   | 2 | Apr-Jun 2003 | | |
   | 3 | Jul-Sep 2003 | | |
   | 4 | Oct-Dec 2003 | | |
   | 1 | Jan-Mar 2004 | | |
   | ⋮ | ⋮ | | |

   - $d_{1t} = 1$ if observation $t$ is from the first quarter (JFM), 0 otherwise;
   - $d_{2t} = 1$ if observation $t$ is from the second quarter (AMJ), 0 otherwise;
   - $d_{3t} = 1$ if observation $t$ is from the third quarter (JAS), 0 otherwise;
   - $d_{4t} = 1$ if observation $t$ is from the fourth quarter (OND), 0 otherwise.
7. Can we use 4 dummies to indicate 4 seasons?

   $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{1t} + \beta_3 d_{2t} + \beta_4 d_{3t} + \beta_5 d_{4t} + u_t$

   No! We cannot estimate this model. Consider the four dummy variables and assume that the first observation in the sample is from the first quarter. The values taken by the four dummy variables are shown in the following table.
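8. Values taken by the four quarterly dummies:

   | $t$ | $d_{1t}$ | $d_{2t}$ | $d_{3t}$ | $d_{4t}$ | $\sum_i d_{it}$ |
   |---|---|---|---|---|---|
   | 1 | 1 | 0 | 0 | 0 | 1 |
   | 2 | 0 | 1 | 0 | 0 | 1 |
   | 3 | 0 | 0 | 1 | 0 | 1 |
   | 4 | 0 | 0 | 0 | 1 | 1 |
   | 5 | 1 | 0 | 0 | 0 | 1 |
   | 6 | 0 | 1 | 0 | 0 | 1 |
   | 7 | 0 | 0 | 1 | 0 | 1 |
   | 8 | 0 | 0 | 0 | 1 | 1 |
   | 9 | 1 | 0 | 0 | 0 | 1 |
   | 10 | 0 | 1 | 0 | 0 | 1 |
   | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |

   For every $t$: $d_{1t} + d_{2t} + d_{3t} + d_{4t} = 1$.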
9. We cannot estimate this model because:
   - an assumption of the classical linear regression model is that there is NO EXACT LINEAR RELATIONSHIP among any of the independent variables in the model;
   - this assumption is necessary for estimating the parameters of the model;
   - if there is an exact linear relationship, it is impossible to disentangle the separate influences of the different explanatory variables;
   - here, $d_{1t} + d_{2t} + d_{3t} + d_{4t} = 1$ for every $t$, which is exactly the constant regressor that multiplies $\beta_0$.
10. The dummy variable trap
    - Using N dummies to indicate N groups, in a model that also has an intercept, introduces perfect multicollinearity.
    - This is known as the "dummy variable trap": too many dummies are used to describe a given number of groups.
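The trap can be checked numerically. The sketch below is a minimal pure-Python illustration, with `matrix_rank` written here for the purpose (it is not a library function). Income $x$ is left out for brevity, since adding it does not remove the dependency. With a constant and all four quarterly dummies the regressor matrix has 5 columns but only rank 4, so the OLS normal equations have no unique solution; dropping $d_{1t}$ restores full column rank.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank, n_cols = 0, len(m[0])
    for col in range(n_cols):
        # Find a row at or below `rank` with a non-zero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Observations t = 1..8 run through quarters 1, 2, 3, 4, 1, 2, 3, 4.
dummies = [[1 if j == t % 4 else 0 for j in range(4)] for t in range(8)]
with_trap = [[1] + d for d in dummies]       # constant, d1t, d2t, d3t, d4t
without_d1 = [[1] + d[1:] for d in dummies]  # constant, d2t, d3t, d4t

print(matrix_rank(with_trap), len(with_trap[0]))    # 4 5  (rank deficient)
print(matrix_rank(without_d1), len(without_d1[0]))  # 4 4  (full column rank)
```

Exact `Fraction` arithmetic is used so that the rank is not blurred by floating-point rounding.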
11. The solution to the problem is to omit one category. It does not matter which one is omitted; the omitted category is the base group.
12. Omit $d_{1t}$; the model becomes:

    $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$

    - Because we have omitted $d_{1t}$, quarter 1 becomes the "base quarter".
    - $\beta_0$ in the model is the intercept for quarter 1, i.e. the overall intercept of the model.
13. [Figure, built up over slides 13-20: aggregate consumption of beer, $y$ (pints), against aggregate personal disposable income, $x$, for $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$. Four parallel lines, each with slope $\beta_1$.]
    - Quarter 1 (all dummies $= 0$): intercept $\beta_0$, the overall intercept of the model.
    - Quarter 2 ($d_{2t} = 1$, $d_{3t} = d_{4t} = 0$): intercept $\beta_0 + \beta_2$, i.e. the line is shifted by $\beta_2$ relative to quarter 1.
    - Quarter 3 ($d_{3t} = 1$, $d_{2t} = d_{4t} = 0$): intercept $\beta_0 + \beta_3$.
    - Quarter 4 ($d_{4t} = 1$, $d_{2t} = d_{3t} = 0$): intercept $\beta_0 + \beta_4$.
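The four lines in the figure can be summarized in a few lines of Python. This is a sketch with hypothetical coefficient values (they are not estimates from any data set): each quarter's line keeps the common slope $\beta_1$, and only the intercept moves, by the coefficient on that quarter's dummy.

```python
def quarter_line(b, quarter):
    """Return (intercept, slope) of the regression line for one quarter.

    b = (b0, b1, b2, b3, b4) are coefficients of
    y_t = b0 + b1*x_t + b2*d2t + b3*d3t + b4*d4t, with quarter 1 as the base.
    """
    b0, b1, b2, b3, b4 = b
    shift = {1: 0.0, 2: b2, 3: b3, 4: b4}[quarter]  # quarter 1 has no shift
    return b0 + shift, b1


b = (10.0, 0.5, 2.0, 3.0, -1.0)  # hypothetical values, for illustration only
for q in (1, 2, 3, 4):
    print(q, quarter_line(b, q))
# 1 (10.0, 0.5)
# 2 (12.0, 0.5)
# 3 (13.0, 0.5)
# 4 (9.0, 0.5)
```

With these made-up numbers the quarter 4 line lies 1 pint below the quarter 1 line at every income level, which is exactly how $\beta_4$ is read.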
21. Dummy variables: interpreting the results

    Example of how to interpret our results: the aggregate consumption of beer in the fourth quarter (OND) is estimated to be $\hat{\beta}_4$ pints higher (if $\hat{\beta}_4 > 0$) or lower (if $\hat{\beta}_4 < 0$) than in the first quarter (JFM), ceteris paribus (everything else, including income $x$, remaining the same).
22. Dummy variables: interpreting the results
    - It does not matter which of the four dummies you leave out. The choice affects the parameter estimates, but not the position of the regression lines. (Try to verify this in seminar 7.)
    - Always compare the estimated coefficients on the dummies you include with the omitted category.
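The claim that the choice of omitted dummy moves the estimates but not the lines can be checked in a pure-Python sketch on made-up data. The income values, seasonal effects and noise pattern below are hypothetical, not from the lecture, and `ols` is a small helper written here that solves the normal equations by Gaussian elimination. The model is fitted once with quarter 1 as the base and once with quarter 4 as the base; the coefficients differ, but the fitted values agree up to rounding error.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        p = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[p] = A[p], A[col]
        c[col], c[p] = c[p], c[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * v for a, v in zip(A[r], A[col])]
            c[r] -= f * c[col]
    b = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Hypothetical quarterly data: 16 observations starting in quarter 1.
quarters = [t % 4 + 1 for t in range(16)]
x = [100.0 + 5.0 * t for t in range(16)]
effect = {1: 0.0, 2: 4.0, 3: 9.0, 4: 2.0}
y = [20.0 + 0.3 * xt + effect[q] + 0.5 * ((7 * t) % 5 - 2)
     for t, (xt, q) in enumerate(zip(x, quarters))]


def design(base):
    """Rows of [1, x_t, dummies for the three quarters other than base]."""
    kept = [q for q in (1, 2, 3, 4) if q != base]
    return [[1.0, xt] + [1.0 if q == j else 0.0 for j in kept]
            for xt, q in zip(x, quarters)]


coef, fitted = {}, {}
for base in (1, 4):
    X = design(base)
    coef[base] = ols(X, y)
    fitted[base] = [sum(bj * v for bj, v in zip(coef[base], row)) for row in X]

gap = max(abs(a - b) for a, b in zip(fitted[1], fitted[4]))
print(gap < 1e-6)  # True: same regression lines, different parametrization
```

In the base-4 fit the quarter 1 intercept is the constant plus the coefficient on $d_{1t}$, and it matches the constant of the base-1 fit.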
23. $y_t = \beta_0 + \beta_1 x_t + \beta_2 d_{2t} + \beta_3 d_{3t} + \beta_4 d_{4t} + u_t$

    We can test the joint significance of the "season" dummies using an F-test of:
    - $H_0: \beta_2 = \beta_3 = \beta_4 = 0$
    - $H_1$: $H_0$ is not true.
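One standard way to compute this F-statistic is from the residual sums of squares (RSS) of the unrestricted model above and of the restricted model $y_t = \beta_0 + \beta_1 x_t + u_t$, which drops the three season dummies. A minimal sketch, assuming this RSS form; the RSS values in the usage line are hypothetical.

```python
def f_statistic(rss_r, rss_u, q, n, k):
    """F-statistic for q linear restrictions, from two residual sums of squares.

    rss_r: RSS of the restricted model (no season dummies)
    rss_u: RSS of the unrestricted model (all three season dummies)
    q:     number of restrictions (3 for beta2 = beta3 = beta4 = 0)
    n, k:  observations, and slope regressors in the unrestricted model
           (k = 4 here: x_t, d2t, d3t, d4t)
    """
    return ((rss_r - rss_u) / q) / (rss_u / (n - k - 1))


# Hypothetical RSS values, for illustration only.
F = f_statistic(rss_r=150.0, rss_u=100.0, q=3, n=40, k=4)
print(round(F, 3))  # 5.833
```

The result is compared with the critical value of the F distribution with $(q, n - k - 1)$ degrees of freedom from the tables; $H_0$ is rejected if $F$ exceeds it.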
24. Important rule

    With a set of dummy variables indicating categories (e.g. season, social class, occupational class, marital status, region, postcode), always omit one of them from the model to avoid the problem of perfect multicollinearity (the "dummy variable trap"). The problem of perfect multicollinearity arises because there is a perfect linear relationship between the variables on the right-hand side of the model.
25. It is incorrect to say that "the dummy variables are perfectly correlated with each other"; this is a common error. The coefficients on the included dummies are interpreted in comparison with the omitted category.
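A small sketch shows why the "perfectly correlated" wording is wrong. With three full years of hypothetical quarterly observations, any two quarterly dummies have a correlation of -1/3, not ±1; the perfect linear relationship involves all four dummies together with the constant ($d_{1t} + d_{2t} + d_{3t} + d_{4t} = 1$). The `corr` helper is written here for illustration.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


quarters = [t % 4 + 1 for t in range(12)]  # three full years of quarters
d = {q: [1 if qt == q else 0 for qt in quarters] for q in (1, 2, 3, 4)}

print(round(corr(d[1], d[2]), 4))  # -0.3333: far from perfect correlation
# ...yet the four dummies always sum to 1, the constant regressor:
print(all(sum(col) == 1 for col in zip(*(d[q] for q in (1, 2, 3, 4)))))  # True
```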
26. Presentation of regression results

    When presenting regression results in a document, there are two possibilities. You can present a table similar to the results table in PASW:

    Coefficients(a)

    | Model 1 | B (unstandardized) | Std. Error (unstandardized) | Beta (standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
    |---|---|---|---|---|---|---|---|
    | (Constant) | 18.500 | 9.086 | | 2.036 | .088 | -3.732 | 40.732 |
    | X | .140** | .030 | .882 | 4.590 | .004 | .065 | .215 |

    a. Dependent Variable: Y
27. Alternatively, you can present an equation, with standard errors and t-statistics appearing in brackets underneath the coefficients. When there are just a few variables, this second method is more appropriate.

    $$
    \begin{array}{lcccc}
    \hat{Y} & = & 18.500 & + & 0.140\,X \\
    (\text{se}) & & (9.086) & & (0.030) \\
    (t) & & (2.036) & & (4.590)^{**}
    \end{array}
    $$

    The stars (**) indicate strong significance (p-value < 0.01).
28. You can also edit the results table in PASW, so you can add the stars (**) as appropriate.
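As an arithmetic check on the PASW coefficients table on slide 26 (a sketch, not PASW's own computation): the t-ratio is B / Std. Error, and the 95% confidence interval is B ± t_crit × Std. Error. The table does not report the degrees of freedom; the critical value 2.447, the 2.5% upper point of the t distribution with 6 degrees of freedom, is an assumption, inferred because it reproduces the reported interval for the constant.

```python
T_CRIT = 2.447  # assumed: t(0.025) with df = 6, inferred from the table


def t_and_ci(b, se, t_crit=T_CRIT):
    """Return the t-ratio and the 95% confidence interval for one coefficient."""
    return b / se, (b - t_crit * se, b + t_crit * se)


t, (lo, hi) = t_and_ci(18.500, 9.086)  # the (Constant) row
print(round(t, 3), round(lo, 2), round(hi, 2))  # 2.036 -3.73 40.73

t_x, (lo_x, hi_x) = t_and_ci(0.140, 0.030)  # the X row, from rounded inputs
```

For X the rounded table entries reproduce the interval only approximately, and .140/.030 gives about 4.67 rather than the reported 4.590, because PASW works with the unrounded B and standard error.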
29. p-value of a hypothesis test

    When we conduct a t-test, we compare the t-statistic (t-ratio) with a critical value from the t tables with df = n - k - 1. This tells us whether or not to reject the null hypothesis. If we reject $H_0$, it is also useful to know how strong the evidence against $H_0$ is.
30. p-value of a hypothesis test

    Definition: the p-value of a test is the probability, if $H_0$ is true, of obtaining a test statistic at least as extreme as the one we have actually obtained. The smaller the p-value, the stronger the evidence against $H_0$.
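The definition can be turned into a computation. A minimal sketch, assuming a two-sided t-test: the p-value is $P(|T| \ge |t|)$ when $T$ follows a t distribution with df degrees of freedom, obtained here by integrating the t density numerically with Simpson's rule. The degrees of freedom (6) are an assumption inferred from the confidence intervals in the table on slide 26; with that value the sketch reproduces the table's Sig. column.

```python
import math


def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|) under H0, via Simpson's rule for P(|T| < |t_stat|)."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability of the region (-|t|, |t|)
    return 1 - central


DF = 6  # assumed degrees of freedom, inferred from the slide 26 table
print(round(two_sided_p_value(2.036, DF), 3))  # 0.088  (Constant)
print(round(two_sided_p_value(4.590, DF), 3))  # 0.004  (X)
```

A larger |t| leaves less probability in the tails, so the p-value falls as the evidence against $H_0$ grows.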
31. p-value of a hypothesis test
    - If p-value < 0.01, there is strong evidence against $H_0$ (**).
    - If p-value < 0.05, there is evidence against $H_0$ (*).
    - If p-value < 0.10, there is mild evidence against $H_0$.
    - If p-value > 0.10, we do not have evidence to reject $H_0$.
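The thresholds above map directly onto a small lookup, which can also supply the stars when editing a results table. This is a sketch: `evidence` is a hypothetical helper, and a p-value of exactly 0.10 is treated as "no evidence" because the slide does not say how to handle that boundary.

```python
def evidence(p):
    """Return (wording, stars) for a p-value, following the slide's thresholds.

    A p-value of exactly 0.10 is treated as "no evidence" (an assumption;
    the slide does not cover the boundary).
    """
    if p < 0.01:
        return "strong evidence against H0", "**"
    if p < 0.05:
        return "evidence against H0", "*"
    if p < 0.10:
        return "mild evidence against H0", ""
    return "no evidence to reject H0", ""


print(evidence(0.004))  # ('strong evidence against H0', '**')
print(evidence(0.088))  # ('mild evidence against H0', '')
```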
32. Conclusions from hypothesis tests: correct wording

    | Correct | Incorrect |
    |---|---|
    | $X$ has a significant effect on $Y$. | $\beta_1$ has a significant effect on $Y$. |
    | $\hat{\beta}_1$ is significantly different from zero. | $\beta_1$ is significantly different from zero. |
    | $\hat{\beta}_1$ is significantly positive. | $\beta_1$ is positive. |
33. Further point: "there is evidence that..."
    - Remember to say: "There is evidence that food is a normal good." Don't say: "Food is a normal good."
    - When p-value > 0.10, say "we do not have enough evidence to reject $H_0$" or "we do not reject $H_0$". Don't say: "We accept $H_0$."