# Econometric in application


1. 1. Econometrics: Assignment on 'Multiple Regression Analysis'. Prepared for: Dr. Md. Kamal Uddin, Professor, Department of International Business, University of Dhaka. Prepared by: Hazera Akter, Roll No. 01, 8th Semester, BBA 1st Batch, Department of International Business, University of Dhaka. Date of submission: 7 April 2012.
2. 2. Assignment topic: 'Multiple Regression Analysis with Tests of Heteroskedasticity, Autocorrelation and Multicollinearity'. Table of contents: Analysis Summary; Data Set.

ANALYSIS SUMMARY
3. 3. In multiple regression analysis, we study the relationship between an explained variable and a number of explanatory variables. In this assignment, the current salary structure is analyzed in terms of several factors that influence salary setting. The purposes of such an analysis include:

• Cause analysis: learn more about the relationship between several independent variables and a dependent variable.
• Impact analysis: assess the impact of a change in an independent variable on the value of the dependent variable.
• Time series analysis: predict values of a time series, using either previous values of that one series or values from other series as well.

The detailed multiple regression analysis leads to the following interpretation:

• Given the R² value of 0.491, we can infer that the model is not strong for overall estimation.
• The model for estimating the salary of employees of the Coca-Cola Company includes almost all collinear variables.
• The model is nevertheless useful, because it shows very little heteroskedasticity and autocorrelation.

So these overall results should help the management of the Coca-Cola Company to set or estimate salaries in a revised decision round.

Data Set
4. 4. A multinational corporation, The Coca-Cola Company, would like to study the salary structure of the employees in its Bangladesh subsidiary by predicting salary from influential factors such as the gender, age and education level of the employees. A random sample of current salary data for 30 employees is drawn to perform a regression analysis. The data set is shown below. In this data set, the dependent variable is Y = Current Salary.

| ID | Current Salary (Tk) | Gender | Job Seniority | Age | Education Level | Work Experience | Minority Class |
|---|---|---|---|---|---|---|---|
| 1 | 16080 | 0 | 81 | 28.50 | 16 | 0.25 | 0 |
| 2 | 41400 | 0 | 73 | 40.33 | 16 | 12.50 | 1 |
| 3 | 21960 | 1 | 83 | 31.08 | 15 | 4.08 | 0 |
| 4 | 19200 | 0 | 93 | 31.17 | 16 | 1.83 | 1 |
| 5 | 28350 | 0 | 83 | 41.92 | 19 | 13.00 | 0 |
| 6 | 27250 | 1 | 80 | 29.50 | 18 | 2.42 | 0 |
| 7 | 16080 | 0 | 79 | 28.00 | 15 | 3.17 | 0 |
| 8 | 14100 | 0 | 67 | 28.75 | 15 | 0.50 | 0 |
| 9 | 12420 | 1 | 96 | 27.42 | 15 | 1.17 | 1 |
| 10 | 12300 | 1 | 77 | 52.92 | 12 | 26.42 | 0 |
| 11 | 15720 | 0 | 84 | 33.50 | 15 | 6.00 | 1 |
| 12 | 8880 | 1 | 88 | 54.33 | 12 | 27.00 | 0 |
| 13 | 22000 | 0 | 93 | 32.33 | 17 | 2.67 | 0 |
| 14 | 22800 | 0 | 98 | 41.17 | 15 | 12.00 | 0 |
| 15 | 19020 | 1 | 64 | 31.92 | 19 | 2.25 | 1 |
| 16 | 12300 | 1 | 94 | 46.25 | 12 | 20.00 | 0 |
| 17 | 22200 | 1 | 81 | 30.75 | 19 | 5.17 | 0 |
| 18 | 10380 | 1 | 72 | 32.67 | 15 | 6.92 | 1 |
| 19 | 8520 | 0 | 70 | 58.50 | 15 | 31.00 | 0 |
| 20 | 27500 | 0 | 89 | 34.17 | 17 | 3.17 | 0 |
| 21 | 11460 | 1 | 79 | 46.58 | 15 | 21.75 | 1 |
| 22 | 20500 | 0 | 83 | 35.17 | 16 | 5.75 | 0 |
| 23 | 27700 | 0 | 85 | 43.25 | 20 | 11.17 | 1 |
| 24 | 28000 | 1 | 65 | 28.00 | 16 | 1.58 | 1 |
| 25 | 22000 | 1 | 65 | 39.75 | 19 | 10.75 | 0 |
| 26 | 27250 | 0 | 78 | 30.08 | 19 | 2.92 | 0 |
| 27 | 27000 | 0 | 83 | 30.17 | 17 | 0.75 | 1 |
| 28 | 9000 | 1 | 70 | 44.50 | 12 | 18.00 | 0 |
| 29 | 31300 | 0 | 91 | 30.17 | 18 | 3.92 | 1 |
| 30 | 11760 | 0 | 70 | 26.83 | 15 | 1.25 | 0 |
5. 5. Independent variables:

• X1 = Sex of Employee
• X2 = Job Seniority
• X3 = Age of Employee
• X4 = Education Level
• X5 = Work Experience
• X6 = Minority Classification

Types of scales used here

The attributes of the measurement objects in this analysis are measured on different types of scales.

Nominal scale:

• X1 = Sex of Employee, where Male = 0 and Female = 1
• X6 = Minority Classification, where White = 0 and Nonwhite = 1

Ratio scale:

• X2 = Job Seniority (years, at Coca-Cola only)
• X3 = Age of Employee (years)
• X4 = Education Level (scores)
• X5 = Work Experience (years, over the whole working life)

All of these ratio-scale variables take numeric values and have an absolute zero. So, on this multivariate data set, we perform a multiple regression analysis to predict the likely current salary of an employee.

NOTE: All of the analysis was performed with the SPSS software. For ease of presentation, the variables are referred to by their full descriptive names.

MULTIPLE REGRESSION ANALYSIS RESULTS
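As a rough cross-check outside SPSS, the same regression can be refit by ordinary least squares. The sketch below is only an illustration: it assumes the data-set table is transcribed correctly, it uses NumPy rather than the SPSS procedure used in this assignment, and the names `DATA` and `fit_ols` are hypothetical.

```python
import numpy as np

# Columns: salary, sex, seniority, age, education, experience, minority
# (values copied from the data-set table of this assignment).
DATA = np.array([
    [16080, 0, 81, 28.50, 16, 0.25, 0], [41400, 0, 73, 40.33, 16, 12.50, 1],
    [21960, 1, 83, 31.08, 15, 4.08, 0], [19200, 0, 93, 31.17, 16, 1.83, 1],
    [28350, 0, 83, 41.92, 19, 13.00, 0], [27250, 1, 80, 29.50, 18, 2.42, 0],
    [16080, 0, 79, 28.00, 15, 3.17, 0], [14100, 0, 67, 28.75, 15, 0.50, 0],
    [12420, 1, 96, 27.42, 15, 1.17, 1], [12300, 1, 77, 52.92, 12, 26.42, 0],
    [15720, 0, 84, 33.50, 15, 6.00, 1], [8880, 1, 88, 54.33, 12, 27.00, 0],
    [22000, 0, 93, 32.33, 17, 2.67, 0], [22800, 0, 98, 41.17, 15, 12.00, 0],
    [19020, 1, 64, 31.92, 19, 2.25, 1], [12300, 1, 94, 46.25, 12, 20.00, 0],
    [22200, 1, 81, 30.75, 19, 5.17, 0], [10380, 1, 72, 32.67, 15, 6.92, 1],
    [8520, 0, 70, 58.50, 15, 31.00, 0], [27500, 0, 89, 34.17, 17, 3.17, 0],
    [11460, 1, 79, 46.58, 15, 21.75, 1], [20500, 0, 83, 35.17, 16, 5.75, 0],
    [27700, 0, 85, 43.25, 20, 11.17, 1], [28000, 1, 65, 28.00, 16, 1.58, 1],
    [22000, 1, 65, 39.75, 19, 10.75, 0], [27250, 0, 78, 30.08, 19, 2.92, 0],
    [27000, 0, 83, 30.17, 17, 0.75, 1], [9000, 1, 70, 44.50, 12, 18.00, 0],
    [31300, 0, 91, 30.17, 18, 3.92, 1], [11760, 0, 70, 26.83, 15, 1.25, 0],
])


def fit_ols(y, X):
    """Least-squares fit of y on X with an intercept; returns (coefficients, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return beta, 1.0 - ss_res / ss_tot


salary, regressors = DATA[:, 0], DATA[:, 1:]
beta, r_squared = fit_ols(salary, regressors)
print("intercept and slopes (X1..X6):", np.round(beta, 3))
print("R squared:", round(r_squared, 3))
```

If the table is transcribed faithfully, the printed coefficients and R² should be close to the SPSS output reported in this assignment; any noticeable gap would point to a transcription difference rather than to a different method.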
6. 6. Variables Entered/Removed

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE (a) | . | Enter |

a. All requested variables entered.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .701 (a) | .491 | .358 | 6458.883 |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE
b. Dependent Variable: CURRENT SALARY

ANOVA (b)

| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 9.246E8 | 6 | 1.541E8 | 3.694 | .010 (a) |
| | Residual | 9.595E8 | 23 | 4.172E7 | | |
| | Total | 1.884E9 | 29 | | | |
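The Model Summary figures follow directly from the ANOVA table. The short sketch below redoes that arithmetic from the rounded sums of squares reported above; the variable names are illustrative, and small differences from the SPSS values are due to rounding in the table.

```python
import math

n, k = 30, 6                    # observations and explanatory variables
ss_regression = 9.246e8         # "Regression" row of the ANOVA table
ss_residual = 9.595e8           # "Residual" row of the ANOVA table
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
ms_regression = ss_regression / k            # mean square, df = 6
ms_residual = ss_residual / (n - k - 1)      # mean square, df = 23
f_statistic = ms_regression / ms_residual
std_error = math.sqrt(ms_residual)           # std. error of the estimate

print(round(r_squared, 3), round(adj_r_squared, 3),
      round(f_statistic, 3), round(std_error, 1))
```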
7. 7. Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -25969.540 | 23234.542 | | -1.118 | .275 |
| SEX OF EMPLOYEE | -2126.081 | 2778.333 | -.133 | -.765 | .452 |
| JOB SENIORITY | 82.398 | 130.286 | .100 | .632 | .533 |
| AGE OF EMPLOYEE | 263.053 | 829.669 | .286 | .317 | .754 |
| EDUCATIONAL LEVEL | 2026.429 | 707.189 | .564 | 2.865 | .009 |
| WORK EXPERIENCE | -298.406 | 870.804 | -.329 | -.343 | .735 |
| MINORITY CLASSIFICATION | 1846.496 | 2528.644 | .112 | .730 | .473 |

a. Dependent Variable: CURRENT SALARY

Thus, the estimated multiple regression equation (regression of Y on X) is

$$Y = -25969.54 - 2126.081\,X_1 + 82.398\,X_2 + 263.053\,X_3 + 2026.429\,X_4 - 298.406\,X_5 + 1846.496\,X_6 + U_i$$

with R² = 0.491 and U_i = errors.

Commentary on the resulting model

This equation suggests that Education Level is far more important than all the other independent variables; it is the only coefficient that is statistically significant at the 5% level (Sig. = .009). The equation says that one more score of educational background, holding all other independent variables constant, results in an increase in salary of Tk 2,026. That is, among persons at the same level in all other respects, the one with one more score of education can be expected to have a salary higher by Tk 2,026.

After Education Level, Minority Classification carries the most weight in the salary structure. Among people at the same level in all other independent variables, a nonwhite employee (X6 = 1) can be expected to have a salary higher by Tk 1,846 than a white employee (X6 = 0).

8. 8. The equation also says that one more year of job seniority, holding all other independent variables constant, results in an increase in salary of Tk 82. That is, among persons at the same level in all other respects, the one with one more year on the job at the Coca-Cola Company can be expected to have a salary higher by Tk 82.

The equation also shows that one more year of age, holding all other independent variables constant, results in an increase in salary of Tk 263. That is, among persons at the same level in all other respects, the one who is one year older can be expected to have a salary higher by Tk 263. This suggests that the age of an employee is more influential than the number of years spent at the company.

Among people at the same level in all other independent variables, a female employee (X1 = 1) can be expected to have a salary lower by Tk 2,126 than a male employee (X1 = 0), which points to a discriminatory salary structure. Of course, all these numbers are subject to uncertainty (Sig. = .452), which suggests that the variable X1 should be dropped completely.

Similarly, among people with the same education level and the same values of all other independent variables, the one with one more year of experience can be expected to have a salary lower by Tk 298. Again, these numbers are subject to uncertainty (Sig. = .735), which suggests that the variable X5 should be dropped completely.

Interpretation of the constant term: the constant (-25,969.54) is the salary predicted when every explanatory variable equals zero, that is, for someone with no qualification in any of the factors and only the minimum quality needed to be recruited by the company. But a negative salary is not possible. So what would the salary be for a person who has just joined the firm?

In conclusion, the sample is not fully representative of all the people working in the company. We cannot extrapolate the results too far outside this sample range.

9. 9. We cannot use the equation to predict what a new entrant would earn. So we infer that this regression equation should not be used to make other generalized decisions about any salary structure either.

Simple regressions for the negatively influencing factors show the following.

Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | SEX OF EMPLOYEE (a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: CURRENT SALARY

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .343 (a) | .118 | .086 | 7705.174 |

a. Predictors: (Constant), SEX OF EMPLOYEE

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 22191.765 | 1868.779 | | 11.875 | .000 |
| SEX OF EMPLOYEE | -5486.380 | 2838.880 | -.343 | -1.933 | .063 |

a. Dependent Variable: CURRENT SALARY

The simple regression of Current Salary on Sex of Employee still shows a negative influence, even without the influence of the other variables. But the initial salary (α) is positive here.
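A simple regression on a 0/1 dummy variable has a direct reading: the constant is the mean salary of the group coded 0 (male) and the slope is the difference between the two group means. The sketch below illustrates this with the salary and sex columns of the data set, using NumPy's `polyfit` instead of SPSS; the array names are illustrative.

```python
import numpy as np

# Current salary (Tk) and sex (0 = male, 1 = female) for employees 1..30,
# copied from the data-set table.
salary = np.array([
    16080, 41400, 21960, 19200, 28350, 27250, 16080, 14100, 12420, 12300,
    15720, 8880, 22000, 22800, 19020, 12300, 22200, 10380, 8520, 27500,
    11460, 20500, 27700, 28000, 22000, 27250, 27000, 9000, 31300, 11760,
], dtype=float)
female = np.array([
    0, 0, 1, 0, 0, 1, 0, 0, 1, 1,
    0, 1, 0, 0, 1, 1, 1, 1, 0, 0,
    1, 0, 0, 1, 1, 0, 0, 1, 0, 0,
])

# Degree-1 least-squares fit: salary = intercept + slope * female
slope, intercept = np.polyfit(female, salary, 1)

male_mean = salary[female == 0].mean()
female_mean = salary[female == 1].mean()

print(round(intercept, 3), round(male_mean, 3))            # both 22191.765
print(round(slope, 3), round(female_mean - male_mean, 3))  # both -5486.38
```

The match with the SPSS constant (22191.765) and slope (-5486.380) shows that the simple regression only restates the raw gap between average male and female salaries in the sample.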
10. 10. Now, for Work Experience:

Variables Entered/Removed (b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | WORK EXPERIENCE (a) | . | Enter |

a. All requested variables entered.
b. Dependent Variable: CURRENT SALARY

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .391 (a) | .153 | .123 | 7549.967 |

a. Predictors: (Constant), WORK EXPERIENCE

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 22884.178 | 1940.377 | | 11.794 | .000 |
| WORK EXPERIENCE | -355.087 | 157.964 | -.391 | -2.248 | .033 |

a. Dependent Variable: CURRENT SALARY

Again, the simple regression of Current Salary on Work Experience still shows a negative influence, even without the influence of the other variables. The initial salary (α) is also positive here.

In the multiple regression equation, which allows for the effects of all the other variables, Sex of Employee and Work Experience also lower salary, just as in the simple regressions. So omitting the other variables turns the initial salary (α) positive but leaves the direction of these two effects unchanged.
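The simple-regression coefficients for Work Experience can be recovered from summary statistics alone, since the slope equals r·s_y/s_x and the fitted line passes through the two means. The sketch below uses the correlation (-.391), means and standard deviations reported in the SPSS output of this assignment; the variable names are illustrative and the results are approximate because the inputs are rounded.

```python
# Summary statistics as reported by SPSS in this assignment.
r_salary_experience = -0.391      # Pearson correlation
sd_salary = 8060.314              # std. deviation of current salary
sd_experience = 8.87542           # std. deviation of work experience
mean_salary = 19814.33
mean_experience = 8.6453

# Simple-regression slope and intercept from the summary statistics.
slope = r_salary_experience * sd_salary / sd_experience
intercept = mean_salary - slope * mean_experience

print(round(slope, 1))       # close to the reported -355.087
print(round(intercept, 1))   # close to the reported 22884.178
```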
11. 11. HETEROSKEDASTICITY IN MULTIPLE REGRESSION

In multiple regression, one of the assumptions made so far is that the errors have a common variance. This is known as the homoskedasticity assumption. If the errors do not have a constant variance, we say they are heteroskedastic. Analyzing our data set through SPSS, we get:

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| CURRENT SALARY | 19814.33 | 8060.314 | 30 |
| SEX OF EMPLOYEE | .43 | .504 | 30 |
| JOB SENIORITY | 80.47 | 9.748 | 30 |
| AGE OF EMPLOYEE | 36.3227 | 8.76549 | 30 |
| EDUCATIONAL LEVEL | 16.00 | 2.244 | 30 |
| WORK EXPERIENCE | 8.6453 | 8.87542 | 30 |
| MINORITY CLASSIFICATION | .37 | .490 | 30 |

Residuals Statistics (a)

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 10342.00 | 29286.66 | 19814.33 | 5313.421 | 30 |
| Residual | -8926.251 | 21585.666 | .000 | 6061.042 | 30 |
| Std. Predicted Value | -1.783 | 1.783 | .000 | 1.000 | 30 |
| Std. Residual | -1.447 | 3.499 | .000 | .983 | 30 |

a. Dependent Variable: CURRENT SALARY
12. 12. As a rule, a trumpet-shaped residuals plot means that the residuals do not have constant variance. The histogram of the residuals is drawn against the dependent variable only, leaving out the independent variables, to make the error variance easier to see. The graph shows that the distribution is not exactly normal; there are some disturbances in this data set. So heteroskedasticity is present here, but at a low level.

Model Summary (b)

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .701 (a) | .491 | .358 | 6458.883 |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE
b. Dependent Variable: CURRENT SALARY

The White and Glejser tests judge heteroskedasticity from the R² of an auxiliary regression of the squared (White) or absolute (Glejser) residuals on the explanatory variables. Taking R² < 0.50 as the criterion here, we do not reject the hypothesis of homoskedasticity.
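The auxiliary-regression idea behind these tests can be sketched in a few lines. This is a simplified illustration, not the SPSS output used above: the function names are hypothetical, the demo residuals are synthetic, and the LM statistic shown regresses the squared residuals on the regressors only, whereas the full White test also adds their squares and cross-products.

```python
import numpy as np


def auxiliary_r2(target, X):
    """R^2 from regressing `target` on X (with an intercept)."""
    target = np.asarray(target, dtype=float)
    design = np.column_stack([np.ones(len(target)), X])
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    ss_res = float(((target - design @ beta) ** 2).sum())
    ss_tot = float(((target - target.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def glejser_r2(residuals, X):
    """Glejser-style check: regress |residuals| on the regressors."""
    return auxiliary_r2(np.abs(residuals), X)


def lm_statistic(residuals, X):
    """n * R^2 from regressing squared residuals on the regressors."""
    residuals = np.asarray(residuals, dtype=float)
    return len(residuals) * auxiliary_r2(residuals ** 2, X)


# Synthetic demo: residuals whose spread grows exactly with x.
x = np.arange(1.0, 11.0)
e = x * np.array([1.0, -1.0] * 5)
print(glejser_r2(e, x))     # essentially 1: |e| is an exact linear function of x
print(lm_statistic(e, x))
```

In practice the LM statistic would be compared with a chi-square critical value whose degrees of freedom equal the number of regressors in the auxiliary regression; a large value is evidence against homoskedasticity.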
13. 13. In this Normal P-P plot, the points lie very close to the fitted straight line, so the residuals are very nearly normal. So here, too, we find only a very low heteroskedasticity problem.
14. 14. Again, plotting the standardized residuals against the standardized predicted values, we find a very low heteroskedasticity problem, because the plot shows no particular trend. Although the heteroskedasticity problem is very low, what remains can be addressed by a possible correction: a log transformation of the dependent variable. The R² of this log-linear form is not comparable with that of the original model, since the variance of the dependent variable is different.
15. 15. AUTOCORRELATION IN MULTIPLE REGRESSION

In multiple regression analysis, correlation between the error terms is called autocorrelation. The Durbin-Watson test is the simplest and most commonly used test for detecting an autocorrelation problem. Here ϕ denotes the correlation between error terms, which is used to test the hypothesis of autocorrelation in the data set.

Model Summary (b)

| Model | Durbin-Watson |
|---|---|
| 1 | 2.168 (a) |

a. Predictors: (Constant), MINORITY CLASSIFICATION, JOB SENIORITY, AGE OF EMPLOYEE, SEX OF EMPLOYEE, EDUCATIONAL LEVEL, WORK EXPERIENCE
b. Dependent Variable: CURRENT SALARY

Coefficients (a): Correlations

| Model 1 | Zero-order | Partial | Part |
|---|---|---|---|
| SEX OF EMPLOYEE | -.343 | -.158 | -.114 |
| JOB SENIORITY | .094 | .131 | .094 |
| AGE OF EMPLOYEE | -.313 | .066 | .047 |
| EDUCATIONAL LEVEL | .659 | .513 | .426 |
| WORK EXPERIENCE | -.391 | -.071 | -.051 |
| MINORITY CLASSIFICATION | .224 | .151 | .109 |

a. Dependent Variable: CURRENT SALARY
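The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals, and it is approximately 2(1 - ϕ). The sketch below shows the computation on a made-up residual series (the residuals of this assignment are not listed in the slides) and then backs out the correlation implied by the reported value of 2.168; the function name is illustrative.

```python
import numpy as np


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


# Made-up residuals that alternate in sign (strong negative autocorrelation).
print(durbin_watson([1, -1, 1, -1]))   # 3.0

# Approximate first-order correlation implied by the reported statistic,
# using DW ~ 2 * (1 - phi).
reported_dw = 2.168
implied_phi = 1 - reported_dw / 2
print(round(implied_phi, 3))           # -0.084
```

An implied correlation this close to zero is consistent with reading a statistic near 2 as little or no first-order autocorrelation.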
16. 16. Residuals Statistics (a)

| | Minimum | Maximum | Mean | Std. Deviation | N |
|---|---|---|---|---|---|
| Predicted Value | 8323.94 | 31453.22 | 19814.33 | 5646.471 | 30 |
| Residual | -7812.773 | 20206.270 | .000 | 5752.046 | 30 |
| Std. Predicted Value | -2.035 | 2.061 | .000 | 1.000 | 30 |
| Std. Residual | -1.210 | 3.128 | .000 | .891 | 30 |

a. Dependent Variable: CURRENT SALARY
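The standard deviations in this table tie back to the ANOVA sums of squares of the six-variable model: with n = 30 and a residual mean of zero, each sum of squares is (n - 1) times the corresponding variance. A small check, assuming the standard deviations were computed with the usual n - 1 divisor (variable names are illustrative):

```python
n = 30
sd_predicted = 5646.471    # std. deviation of the predicted values
sd_residual = 5752.046     # std. deviation of the residuals

ss_regression = (n - 1) * sd_predicted ** 2   # about 9.246E8
ss_residual = (n - 1) * sd_residual ** 2      # about 9.595E8
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"{ss_regression:.4g} {ss_residual:.4g} {r_squared:.3f}")
```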
17. 17. Correlations

Pearson Correlation

| | EDUCATIONAL LEVEL | WORK EXPERIENCE | CURRENT SALARY | SEX OF EMPLOYEE | JOB SENIORITY | AGE OF EMPLOYEE | MINORITY CLASSIFICATION |
|---|---|---|---|---|---|---|---|
| CURRENT SALARY | .659 | -.391 | 1.000 | -.343 | .094 | -.313 | .224 |
| SEX OF EMPLOYEE | -.274 | .271 | -.343 | 1.000 | -.225 | .183 | .033 |
| JOB SENIORITY | -.085 | -.035 | .094 | -.225 | 1.000 | .003 | .000 |
| AGE OF EMPLOYEE | -.411 | .979 | -.313 | .183 | .003 | 1.000 | -.196 |
| EDUCATIONAL LEVEL | 1.000 | -.497 | .659 | -.274 | -.085 | -.411 | .188 |
| WORK EXPERIENCE | -.497 | 1.000 | -.391 | .271 | -.035 | .979 | -.200 |
| MINORITY CLASSIFICATION | .188 | -.200 | .224 | .033 | .000 | -.196 | 1.000 |

Sig. (1-tailed)

| | EDUCATIONAL LEVEL | WORK EXPERIENCE | CURRENT SALARY | SEX OF EMPLOYEE | JOB SENIORITY | AGE OF EMPLOYEE | MINORITY CLASSIFICATION |
|---|---|---|---|---|---|---|---|
| CURRENT SALARY | .000 | .016 | . | .032 | .311 | .046 | .117 |
| SEX OF EMPLOYEE | .071 | .074 | .032 | . | .116 | .166 | .432 |
| JOB SENIORITY | .327 | .428 | .311 | .116 | . | .494 | .498 |
| AGE OF EMPLOYEE | .012 | .000 | .046 | .166 | .494 | . | .150 |
| EDUCATIONAL LEVEL | . | .003 | .000 | .071 | .327 | .012 | .160 |
| WORK EXPERIENCE | .003 | . | .016 | .074 | .428 | .000 | .144 |
| MINORITY CLASSIFICATION | .160 | .144 | .117 | .432 | .498 | .150 | . |
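The one very large entry in this matrix is the correlation between Age of Employee and Work Experience (.979). The sketch below recomputes that correlation from the two data columns with NumPy and shows how much it alone would inflate a coefficient variance, using 1/(1 - r²) as the variance inflation factor for a pair of regressors. The array names are illustrative, and the pairwise figure ignores the other explanatory variables.

```python
import numpy as np

# Age and work experience (years) for employees 1..30, from the data set.
age = np.array([
    28.50, 40.33, 31.08, 31.17, 41.92, 29.50, 28.00, 28.75, 27.42, 52.92,
    33.50, 54.33, 32.33, 41.17, 31.92, 46.25, 30.75, 32.67, 58.50, 34.17,
    46.58, 35.17, 43.25, 28.00, 39.75, 30.08, 30.17, 44.50, 30.17, 26.83,
])
experience = np.array([
    0.25, 12.50, 4.08, 1.83, 13.00, 2.42, 3.17, 0.50, 1.17, 26.42,
    6.00, 27.00, 2.67, 12.00, 2.25, 20.00, 5.17, 6.92, 31.00, 3.17,
    21.75, 5.75, 11.17, 1.58, 10.75, 2.92, 0.75, 18.00, 3.92, 1.25,
])

r = np.corrcoef(age, experience)[0, 1]
pairwise_vif = 1.0 / (1.0 - r ** 2)

print("correlation:", round(r, 3))
print("pairwise VIF:", round(pairwise_vif, 1))
```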
18. 18. Here the D-W statistic is 2.168, which is very near to 2. We know that a D-W statistic of 2 indicates zero correlation (ϕ = 0) between the error terms. So in our data set there is a very low autocorrelation problem. To investigate an autocorrelation problem further, we can apply the LM test, the BKW test, etc.

MULTICOLLINEARITY IN MULTIPLE REGRESSION

One important problem in the application of multiple regression analysis involves the possible collinearity of the explanatory variables. This condition refers to situations in which some of the explanatory variables are highly correlated with each other. One method of measuring multicollinearity uses the Variance Inflation Factor (VIF) for each explanatory variable. The VIF values obtained through SPSS are shown below.

Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| SEX OF EMPLOYEE | .734 | 1.362 |
| JOB SENIORITY | .939 | 1.065 |
| AGE OF EMPLOYEE | .033 | 29.964 |
| WORK EXPERIENCE | .032 | 31.372 |
| MINORITY CLASSIFICATION | .950 | 1.053 |

a. Dependent Variable: EDUCATIONAL LEVEL
19. 19. Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| SEX OF EMPLOYEE | .848 | 1.179 |
| JOB SENIORITY | .924 | 1.082 |
| AGE OF EMPLOYEE | .810 | 1.235 |
| MINORITY CLASSIFICATION | .937 | 1.068 |
| EDUCATIONAL LEVEL | .756 | 1.322 |

a. Dependent Variable: WORK EXPERIENCE
20. 20. Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| JOB SENIORITY | .918 | 1.089 |
| AGE OF EMPLOYEE | .031 | 32.365 |
| MINORITY CLASSIFICATION | .947 | 1.056 |
| EDUCATIONAL LEVEL | .572 | 1.749 |
| WORK EXPERIENCE | .028 | 35.927 |

a. Dependent Variable: SEX OF EMPLOYEE
21. 21. Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| AGE OF EMPLOYEE | .028 | 35.540 |
| MINORITY CLASSIFICATION | .938 | 1.066 |
| EDUCATIONAL LEVEL | .602 | 1.662 |
| WORK EXPERIENCE | .025 | 40.063 |
| SEX OF EMPLOYEE | .755 | 1.324 |

a. Dependent Variable: JOB SENIORITY
22. 22. Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| MINORITY CLASSIFICATION | .938 | 1.066 |
| EDUCATIONAL LEVEL | .721 | 1.388 |
| WORK EXPERIENCE | .718 | 1.392 |
| SEX OF EMPLOYEE | .890 | 1.124 |

a. Dependent Variable: AGE OF EMPLOYEE
23. 23. Coefficients (a): Collinearity Statistics

| Model 1 | Tolerance | VIF |
|---|---|---|
| EDUCATIONAL LEVEL | .610 | 1.641 |
| WORK EXPERIENCE | .025 | 40.044 |
| SEX OF EMPLOYEE | .763 | 1.311 |
| AGE OF EMPLOYEE | .028 | 35.539 |

a. Dependent Variable: MINORITY CLASSIFICATION
24. 24. The tolerance for a variable is (1 - R²) for the regression of that variable on all the other independent variables, ignoring the dependent variable. When tolerance is close to 0, there is high multicollinearity of that variable with the other independent variables, and the coefficients will be unstable. VIF is the variance inflation factor, which is simply the reciprocal of tolerance. Therefore, when VIF is high, there is high multicollinearity and instability of the coefficients.
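The reciprocal relation between tolerance and VIF can be checked against the first collinearity table above (the one with EDUCATIONAL LEVEL as dependent variable). The sketch below recomputes tolerance as 1/VIF and flags variables whose tolerance falls below .20; the dictionary and variable names are illustrative.

```python
# (tolerance, VIF) pairs as reported in the first collinearity table.
reported = {
    "SEX OF EMPLOYEE": (0.734, 1.362),
    "JOB SENIORITY": (0.939, 1.065),
    "AGE OF EMPLOYEE": (0.033, 29.964),
    "WORK EXPERIENCE": (0.032, 31.372),
    "MINORITY CLASSIFICATION": (0.950, 1.053),
}

flagged = []
for name, (tolerance, vif) in reported.items():
    implied_tolerance = 1.0 / vif          # tolerance = 1 / VIF
    if tolerance < 0.20:                   # low tolerance signals collinearity
        flagged.append(name)
    print(f"{name}: reported {tolerance:.3f}, 1/VIF = {implied_tolerance:.3f}")

print("tolerance below .20:", flagged)
```

Only Age of Employee and Work Experience are flagged in this table, which matches the very high correlation between those two variables.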
25. 25. As a rule of thumb, a tolerance of less than .20 indicates a problem with multicollinearity. From the tables above and the VIF results, we can conclude that there is very high collinearity among the independent variables. We can address this problem through:

• Ridge regression
• Principal component regression
• Dropping the most collinear variables
• Using ratios or first differences
• Using extraneous estimates
• Getting more data

Concluding comments: From the multiple regression analysis, and given the R² value of 0.491, we can infer that the model is not strong for overall estimation. We have also found that the model for estimating the salary of employees of the Coca-Cola Company includes almost all collinear variables. But the model is still useful, because it shows very little heteroskedasticity and autocorrelation.