- 1. The World of Linear Regression
- 2. What is regression analysis? <ul><li>Regression analysis is a technique for measuring the relationship between two (or more) interval- or ratio-level variables. </li></ul><ul><li>The regression framework is at the heart of empirical social and political science research. </li></ul><ul><li>Regression analysis can act as a statistical surrogate for controlled experiments and, under strong assumptions, can support causal inferences. </li></ul>
- 3. Regression models <ul><li>Researchers translate verbal theories, hypotheses, even hunches into models. </li></ul><ul><li>A model shows how and under what conditions two (or more) variables are related. </li></ul><ul><li>A regression model with a dependent variable and one independent variable is known as a bivariate regression model. </li></ul><ul><li>A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multivariate regression model. </li></ul>
- 4. Scatterplots <ul><li>A scatterplot graphs the sample observations by plotting each one as a point on the X,Y plane. </li></ul><ul><li>The X axis generally represents the values of the independent variable, and the Y axis usually represents the values of the dependent variable. </li></ul><ul><li>X is the horizontal axis; Y is the vertical axis. </li></ul>
- 5. Scatterplots <ul><li>Scatterplots let you study the pattern of the dots, that is, the relationship between the two variables. </li></ul><ul><li>Scatterplots allow political scientists to identify: <ul><li>positive or negative relationships</li><li>monotonic or linear relationships</li></ul></li></ul>
- 6. Scatterplot
- 7.
- 8. Regression Equation The linear equation is specified as follows: Y = a + bX, where <ul><li>Y = dependent variable </li></ul><ul><li>X = independent variable </li></ul><ul><li>a = constant (the value of Y when X = 0) </li></ul><ul><li>b = slope of the regression line </li></ul>
- 9. Regression Equation <ul><li>Y = a + bX </li></ul><ul><li>a can be positive or negative. In high school algebra, you may have called a the intercept, because a is the point at which the line crosses the Y axis. </li></ul><ul><li>b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship. </li></ul><ul><li>The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded, and their scale. A larger coefficient may indicate a stronger relationship, but not necessarily: changing the units of X or Y changes the size of b without changing the underlying relationship. </li></ul>
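To illustrate why the size of b depends on the units of measurement, here is a minimal sketch using a hypothetical line (a = 50, b = 2; these values and the helper name `predict` are illustrative assumptions, not from the slides). Re-expressing X in a unit ten times larger multiplies the slope by 10, while every predicted value of Y stays the same.

```python
# Hypothetical line Y = a + bX with a = 50 and b = 2 (illustrative values only).
# Measuring X in a unit ten times larger (x_new = x / 10) multiplies the
# slope by 10, yet every prediction, and so the relationship, is unchanged.

def predict(a: float, b: float, x: float) -> float:
    """Return the value of Y on the line Y = a + bX at the point x."""
    return a + b * x


a, b = 50.0, 2.0        # slope per original unit of X
b_rescaled = b * 10     # slope per new unit (1 new unit = 10 original units)

for x in [0.0, 10.0, 20.0, 30.0]:
    x_rescaled = x / 10  # the same observation expressed in the new unit
    print(f"x={x:4.0f}  Y={predict(a, b, x):5.1f}  "
          f"Y with rescaled X={predict(a, b_rescaled, x_rescaled):5.1f}")
```

The two columns of predictions match, even though one slope is ten times the other.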
- 10. The Regression Model <ul><li>The goal of regression analysis is to find the equation that “best fits” the data. </li></ul><ul><li>In regression, the equation is chosen so that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line. </li></ul>
- 11. <ul><li>d₁ and d₂ represent the vertical distances of observed data points from an estimated regression line. </li></ul><ul><li>Regression analysis uses a mathematical procedure that finds the single line minimizing the sum of the squared distances from the points to the line. </li></ul>
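The least-squares procedure described on this slide has a well-known closed-form solution for a single X: the slope is the sum of cross-deviations of X and Y divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The sketch below assumes hypothetical toy data and helper names (`ols_fit`, `sum_squared_distances`); it fits the line and then checks that nudging it in any direction increases the sum of squared vertical distances.

```python
def ols_fit(xs, ys):
    """Fit Y = a + bX by ordinary least squares and return (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X does not vary, so the slope is undefined")
    b = sxy / sxx            # slope: co-movement of X and Y over spread of X
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data (not from the presentation).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = ols_fit(xs, ys)   # exactly a = 2.2, b = 0.6, up to floating-point rounding
best = sum_squared_distances(xs, ys, a, b)

# Nudging the line in any of these directions increases the sum.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    other = sum_squared_distances(xs, ys, a + da, b + db)
    print(f"a={a + da:.2f}, b={b + db:.2f}: {other:.3f}  (OLS line: {best:.3f})")
```

Each nudged line prints a larger sum than the OLS line, which is what "minimizes the squared distances" means in practice.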
- 12. Regression Equation The standard regression equation is the same as the linear equation with one addition, the error term: Y = α + βX + ε, where <ul><li>Y = dependent variable </li></ul><ul><li>α = constant term </li></ul><ul><li>β = slope or regression coefficient </li></ul><ul><li>X = independent variable </li></ul><ul><li>ε = error term </li></ul>
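- 13. Regression Equation The procedure of choosing α and β to minimize the sum of squared vertical distances is known as ordinary least squares (OLS). <ul><li>α (the constant term) is interpreted as before: the predicted value of Y when X = 0. </li></ul><ul><li>β (the regression coefficient) tells how much Y is predicted to change when X increases by one unit. </li></ul><ul><li>The regression coefficient indicates the direction of the relationship between the two quantitative variables and, given their units of measurement, its size. </li></ul>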
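One way to see what OLS estimates is to simulate data from the model Y = α + βX + ε with known values and check that the fitted coefficients land close to them. This is a sketch under stated assumptions: the "true" parameters (α = 2, β = 0.5), the normally distributed error with standard deviation 1, the sample size, and the seed are all hypothetical choices made for illustration.

```python
import random


def ols_fit(xs, ys):
    """Fit Y = a + bX by ordinary least squares and return (a, b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


random.seed(42)                       # fixed seed so the run is reproducible
ALPHA, BETA, SIGMA = 2.0, 0.5, 1.0    # hypothetical "true" parameters
N = 10_000

xs = [random.uniform(0, 10) for _ in range(N)]
# Each observed Y = systematic part (ALPHA + BETA * x) + random error epsilon.
ys = [ALPHA + BETA * x + random.gauss(0, SIGMA) for x in xs]

alpha_hat, beta_hat = ols_fit(xs, ys)
print(f"alpha_hat = {alpha_hat:.3f} (true {ALPHA}), "
      f"beta_hat = {beta_hat:.3f} (true {BETA})")
```

Because of the random error term, the estimates will not equal the true values exactly, but with this many observations they should be close.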
- 14. Regression Equation The error term (ε) reflects the fact that observed data do not fall neatly along a straight line. An observation's score on Y can be broken into two parts: <ul><li>α + βX is the part due to the independent variable </li></ul><ul><li>ε is the part due to error </li></ul> Observed value = Predicted value (α + βX) + Error (ε)
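- 15. Regression Equation In a fitted regression, the difference between the observed value of Y and the value of Y predicted by the estimated line (observed minus predicted) is known as the residual. The residual is the sample counterpart of the error term.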
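The decomposition "observed value = predicted value + residual" can be checked numerically. The sketch below fits a line to hypothetical toy data by the OLS formulas, then computes each predicted value and residual; the data and variable names are illustrative assumptions.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# OLS estimates of the slope and constant (closed form).
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = y_bar - beta_hat * x_bar

predicted = [alpha_hat + beta_hat * x for x in xs]            # alpha + beta * X
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]    # observed - predicted

for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    print(f"x={x:.0f}  observed={y:.2f} = predicted {y_hat:.2f} + residual {e:+.2f}")
```

For an OLS fit that includes a constant, the residuals always sum to zero: the line over-predicts some points and under-predicts others by exactly offsetting amounts.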
- 16.
- 17.
- 18. Regression Interpretation For the data on the scatterplot: <ul><li>Y (dependent variable) = telephone lines per 1,000 people </li></ul><ul><li>X (independent variable) = infant mortality (deaths per 1,000 live births) </li></ul> We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per 1,000 people) and infant mortality.
- 19. Regression Interpretation In this analysis, the intercept and regression coefficient are as follows: <ul><li>α (the constant) = 121: when X (infant mortality) is 0 deaths per 1,000 live births, the predicted number of phone lines is 121 per 1,000 people. </li></ul><ul><li>β = -1.25: each increase of one death per 1,000 live births is associated with a predicted decrease of 1.25 phone lines per 1,000 people. </li></ul>
- 20. Regression Interpretation
- 21. Regression Interpretation <ul><li>These calculations are valuable because they let you make predictions from the data. </li></ul><ul><li>An increase from 1 to 10 deaths per 1,000 live births is associated with a decline of 119.75 - 108.5 = 11.25 telephone lines per 1,000 people (9 additional deaths × 1.25 lines each). </li></ul><ul><li>Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean? <ul><li>It indicates a negative relationship between infant mortality and phone lines.</li><li>It means that each additional infant death per 1,000 live births is associated with a predicted decrease of 1.25 phone lines per 1,000 people.</li></ul></li></ul><ul><li>This information is useful, but is there a measure that tells us how well the model predicts the observed values? </li></ul>
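The prediction arithmetic on this slide can be reproduced by plugging the estimates reported for the telephone-line example (α = 121, β = -1.25) into the fitted line. The function name `predicted_phone_lines` is a hypothetical helper introduced for this sketch.

```python
ALPHA = 121.0   # constant reported for the telephone-line example
BETA = -1.25    # slope reported for the telephone-line example


def predicted_phone_lines(infant_mortality: float) -> float:
    """Predicted phone lines per 1,000 people for an infant mortality rate
    given in deaths per 1,000 live births, using the reported estimates."""
    return ALPHA + BETA * infant_mortality


low = predicted_phone_lines(1)    # 121 - 1.25 * 1
high = predicted_phone_lines(10)  # 121 - 1.25 * 10
print(low, high, low - high)      # prints: 119.75 108.5 11.25
```

The difference equals the slope times the change in X: 1.25 × 9 = 11.25.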
- 22. Scatterplot
- 23. R-squared <ul><li>Yes, the measure is known as R-squared (or R²). </li></ul><ul><li>The deviations of Y from its mean, squared and summed over all observations, form the total sum of squares (TSS), which is proportional to the variance of Y. Each deviation has two component parts. </li></ul><ul><li>The first is the difference between the predicted value of Y and the mean of Y. This is the explained part of the deviation; its sum of squares is the regression sum of squares. </li></ul><ul><li>The second is the difference between the observed and predicted values of Y, which measures prediction errors. This is the unexplained part of the deviation; its sum of squares is the residual sum of squares. </li></ul>
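The two components can be written out explicitly. The notation below (Ŷᵢ for the predicted value of observation i, Ȳ for the mean of Y, n observations) is introduced here for illustration and does not appear in the slides; the identity for the sums of squares holds for an OLS fit that includes a constant, because the cross-product term then sums to zero.

```latex
% Decomposing one observation's deviation from the mean of Y
Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i),
\qquad \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i

% Squaring and summing over all n observations (OLS with a constant):
\underbrace{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}_{\text{Total SS}}
= \underbrace{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}_{\text{Regression SS}}
+ \underbrace{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}_{\text{Residual SS}}
```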
- 24. R-squared <ul><li>Total SS = Regression SS + Residual SS. In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares. </li></ul><ul><li>R² = Regression SS / Total SS. The more of the variance in Y the regression model explains, the higher the R². For an OLS model with a constant, R² ranges from 0 (no variance explained) to 1 (every observation falls exactly on the line). </li></ul>
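As a sketch of how R² is computed in practice, the code below fits a line to hypothetical toy data by OLS, computes the three sums of squares, and forms R² = Regression SS / Total SS. The data and variable names are illustrative assumptions, not taken from the presentation.

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical toy data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# OLS estimates (closed form), then predicted values.
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))
alpha_hat = y_bar - beta_hat * x_bar
predicted = [alpha_hat + beta_hat * x for x in xs]

total_ss = sum((y - y_bar) ** 2 for y in ys)                             # total
regression_ss = sum((y_hat - y_bar) ** 2 for y_hat in predicted)         # explained
residual_ss = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))   # unexplained
r_squared = regression_ss / total_ss

print(f"TSS = {total_ss:.3f}, Regression SS = {regression_ss:.3f}, "
      f"Residual SS = {residual_ss:.3f}, R^2 = {r_squared:.3f}")
```

For this toy data, TSS = 6, Regression SS = 3.6, and Residual SS = 2.4, so the model explains 60% of the variation in Y (R² = 0.6).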
- 25.
- 26.