1. Quantitative Analysis for Managers<br />Regression analysis application<br />Instructor: Prof. MINE AYSEN DOYRAN<br />Student: RecepMaz<br />
2. Regression analysis<br />Regression analysis includes any techniques for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables. <br />Regression analysis helps one understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held fixed. <br />Regression analysis estimates the conditional expectation of the dependent variable given the independent variables — that is, the average value of the dependent variable when the independent variables are held fixed. <br />
3. Regression analysis<br />Less commonly, the focus is on a quantile or other location parameter of the conditional distribution of the dependent variable given the independent variables. In all cases, the estimation target is a function of the independent variables called the regression function. <br />In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution.<br />
4. Regression analysis<br />Regression analysis is widely used for prediction and forecasting, where its use has substantial overlap with the field of machine learning. <br />Regression analysis is also used to understand which among the independent variables are related to the dependent variable, and to explore the forms of these relationships. <br />
5. Regression analysis<br />In restricted circumstances, regression analysis can be used to infer causal relationships between the independent and dependent variables.<br />Simple linear regression models have only two variables: one dependent and one independent<br />Multiple regression models have two or more independent variables<br />
6. Regression models involve the following variables<br />The variable to be predicted is called the dependent variable, Y<br /> Sometimes called the response variable<br />The value of this variable depends on the value of the independent variable, X<br />Sometimes called the explanatory or predictor variable, control variable<br />A regression model relates Y to a function of X <br />
7. Introduction to regression models<br />Dependent variable = Independent variable + Independent variable<br />Dependent variable, Y<br />Independent variable, X<br />A regression model relates Y to a function of X<br />
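To make "a regression model relates Y to a function of X" concrete, here is a minimal sketch of a simple linear regression fitted by ordinary least squares. The function name and the data are hypothetical illustrations, not taken from the slides; the straight-line form Y = b0 + b1 * X is the usual simple linear regression assumption.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical illustration data (NOT the data set used in these slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]    # independent variable, X
y = [3.0, 5.0, 7.0, 9.0, 11.0]   # dependent variable, Y

intercept, slope = fit_simple_linear_regression(x, y)
print(f"Y = {intercept:.2f} + {slope:.2f} * X")
```

The slope estimates how much Y changes, on average, when X increases by one unit.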
8. Testing the Model for Significance<br />If the F-statistic is large, the significance level (P-value) will be low, indicating that such a result is unlikely to have occurred by chance<br />If the P-value of the F-statistic (Significance F) is smaller than 0.05 (5%), the regression model is statistically significant at the 5% level. <br />
9. Testing the Model for Significance<br />The best model is a statistically significant model with a high r² and few variables<br />As more variables are added to the model, the r² value usually increases<br />For this reason, the adjusted r² value is often used to determine the usefulness of an additional variable<br />The adjusted r² takes into account the number of independent variables in the model<br />
10. Testing the Model for Significance<br />As the number of variables increases, the adjusted r² gets smaller unless the increase due to the new variable is large enough to offset the change in k (number of independent variables) <br />
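The trade-off described above can be read directly from the standard textbook formula for the adjusted r². The slides define k as the number of independent variables; the symbol n for the number of observations is an assumption introduced here.

```latex
\text{adjusted } r^2 \;=\; 1 - \left(1 - r^2\right)\frac{n - 1}{n - k - 1}
```

Adding a variable raises k, which enlarges the fraction (n - 1)/(n - k - 1), so the adjusted r² rises only if r² increases by enough to compensate.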
11. Testing the Model for Significance<br />In general, if a new variable increases the adjusted r², it should probably be included in the model<br />In some cases, variables contain duplicate information<br />When two independent variables are correlated, they are said to be collinear<br />When more than two independent variables are correlated, multicollinearity exists<br />When multicollinearity is present, hypothesis tests for the individual coefficients are unreliable, but the model may still be useful for prediction<br />
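One simple way to check whether two independent variables carry duplicate information is to compute their correlation and the corresponding variance inflation factor (VIF). The sketch below is an illustration under stated assumptions: the data are hypothetical, and the "VIF above 10" threshold is a common rule of thumb rather than anything stated in the slides.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical independent variables (NOT from the slides); x2 is roughly 2 * x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
# With exactly two independent variables, each one's VIF is 1 / (1 - r^2)
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation = {r:.4f}, VIF = {vif:.1f}")
if vif > 10:  # rule-of-thumb threshold, an assumption
    print("x1 and x2 are strongly collinear")
```

With more than two independent variables, the VIF for each variable would instead come from regressing it on all the others.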
12. Hypothesis statement, dependent variable and independent variable<br />Dependent variable: Total number of white people between 18 and 64 years of age<br />Independent variable: Number of white people between 18 and 64 years of age living below the poverty level<br />Hypothesis statement: As the population of white adults (18 to 64 years) increases over the years, the number of white people between 18 and 64 years living below the poverty level decreases.<br />
13. INTERPRETATION OF REGRESSION OUTPUTS<br />R Square<br />R square = 0.024884311 ≈ 2.5%: about 2.5% of the variation in the total number of white people between 18 and 64 years is explained by the number of white people below the poverty level. This value indicates a weak fit.<br />R square always lies between 0 and 1. A high R square (0.8, 0.9, …) does not by itself indicate multicollinearity: multicollinearity refers to correlation among the independent variables, and this model has only one independent variable, so the problem cannot arise here. <br />
14. INTERPRETATION OF REGRESSION OUTPUTS<br />Adjusted R square<br />Adjusted R square = -0.0834618768434626 ≈ -8.3%. This value indicates a weak fit. A negative value arises when R square is so small that the penalty for the number of independent variables outweighs it.<br />If the number of observations is small, R square can come out higher than the fit deserves, which makes it a misleading indicator of goodness of fit. That is why many researchers use the adjusted R square instead.<br />Adjusted R square can never exceed R square, so comparing the two is not a test for multicollinearity. Here adjusted R square = -8.3% < R square = 2.5%, as expected, and with a single independent variable multicollinearity is not a concern.<br />
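The reported adjusted R square can be reproduced from the reported R square with the standard adjustment formula. The slides do not state the number of observations; the value n = 11 below is an assumption, chosen because it is the sample size that reproduces the reported figure with one independent variable.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Standard adjusted R square: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


r_squared = 0.024884311  # R square reported in the regression output
n_obs = 11               # ASSUMPTION: not stated in the slides
k = 1                    # one independent variable

adj = adjusted_r_squared(r_squared, n_obs, k)
print(f"adjusted R square = {adj:.6f}")  # close to the reported -0.0835
```

Because (n - 1)/(n - k - 1) is greater than 1 whenever k is at least 1, the result is always below R square unless R square equals 1.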
15. INTERPRETATION OF REGRESSION OUTPUTS<br />Significance F<br />The most important indicator when analyzing regression output is Significance F, which measures the statistical significance of the regression model as a whole. It is the P-value of the F-statistic, and the F-statistic compares the variation explained by the regression with the variation left unexplained. The higher the F-statistic, and therefore the lower the Significance F, the stronger the evidence of a linear relationship between our two variables. Significance F values of 5% (0.05) or less are generally considered statistically significant. As with other P-values, the lower the value, the more confident we can be in the overall significance of the regression equation. <br />A Significance F of 0.64 means that, if there were no linear relationship at all, an F-statistic at least this large would still occur about 64% of the time purely by chance. <br />Significance F = 0.643195730271619 ≈ 64% > 5%, which means there is no statistically significant relationship between our two variables.<br />
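For a regression with one independent variable, Significance F can be recovered from R square and the sample size, because the F-statistic is the square of a t-statistic with n - 2 degrees of freedom. The sketch below uses only the Python standard library and integrates the t density numerically. The sample size n = 11 is an assumption (it is not stated in the slides), and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def significance_f_simple(r_squared, n_obs, steps=2000):
    """Return (F-statistic, Significance F) for a one-variable regression."""
    df = n_obs - 2
    f_stat = r_squared / (1 - r_squared) * df
    t = math.sqrt(f_stat)  # F(1, df) is the square of t(df)
    # Simpson's rule for the area under the t density on [0, t]; steps must be even
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # Two-sided tail probability: P(|T| > t) = 1 - 2 * area
    return f_stat, 1 - 2 * area


r_squared = 0.024884311  # R square reported in the regression output
n_obs = 11               # ASSUMPTION: not stated in the slides

f_stat, sig_f = significance_f_simple(r_squared, n_obs)
print(f"F = {f_stat:.4f}, Significance F = {sig_f:.4f}")
```

Under that assumption the result is close to the reported Significance F of about 0.64, far above the 5% threshold.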
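16. INTERPRETATION OF REGRESSION OUTPUTS<br />P value<br />P value = 0.000253490931854696 ≈ 0.025%. On its own this indicates high statistical significance of the coefficient it belongs to, and it shows how confident we can be in that individual coefficient. Note, however, that in a regression with a single independent variable the P value of the slope coefficient equals Significance F, so this P value presumably belongs to the intercept rather than to the independent variable. A P value is statistically significant when it falls below the chosen significance level, commonly one of: <br />5% = 0.05<br />1% = 0.01<br />10% = 0.10<br />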