Regression Presentation Transcript

  • 1. The World of Linear Regression
  • 2. What is regression analysis?
    • Regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables.
    • The regression framework is at the heart of empirical social and political science research.
    • Regression analysis acts as a statistical surrogate for controlled experiments and, under suitable assumptions, can be used to support causal inferences.
  • 3. Regression models
    • Researchers translate verbal theories, hypotheses, even hunches into models.
    • A model shows how and under what conditions two (or more) variables are related.
    • A regression model with a dependent variable and one independent variable is known as a bivariate regression model.
    • A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multiple regression model (often loosely called a multivariate regression model).
  • 4. Scatterplots
    • A scatterplot graphs the sample observations by plotting each one as a point against the X and Y axes.
    • The X axis generally represents the values of the independent variable, and the Y axis usually represents the values of the dependent variable.
    • X is the horizontal axis; Y is the vertical axis.
  • 5. Scatterplots
    • Scatterplots allow you to study the flow of the dots, or the relationship between the two variables.
    • Scatterplots allow political scientists to identify:
    • -- positive or negative relationships
    • -- monotonic or linear relationships
  • 6. Scatterplot
  • 8. Regression Equation
    • The linear equation is specified as follows: Y = a + bX
    • Where:
    • -- Y = dependent variable
    • -- X = independent variable
    • -- a = constant (value of Y when X = 0)
    • -- b = slope of the regression line
  • 9. Regression Equation
    • Y = a + bX
    • a can be positive or negative. In high school algebra, you may have referred to a as the intercept. This is because a is the point at which the line crosses the Y axis.
    • b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship.
    • The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded and the scale of the variables. Larger coefficients may indicate a stronger relationship, but not necessarily.
  • 10. The Regression Model
    • The goal of regression analysis is to find an equation which “best fits” the data.
    • In regression, the equation is chosen so that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line.
  • 11.
    • d1 and d2 represent the vertical distances of observed data points from an estimated regression line.
    • Regression analysis uses a mathematical procedure that finds the single line that minimizes the sum of the squared distances between the data points and the line.
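The minimization described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the slides: the function names `ols_fit` and `sum_squared_distances` and the five data points are made up for the example, and the fit uses the standard closed-form least-squares formulas for one independent variable.

```python
def ols_fit(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = ols_fit(xs, ys)
best = sum_squared_distances(xs, ys, a, b)

# Any other line gives a larger sum of squared distances.
shifted = sum_squared_distances(xs, ys, a + 0.1, b)
tilted = sum_squared_distances(xs, ys, a, b - 0.1)
print(a, b, best, shifted, tilted)
```

Nudging the intercept or the slope away from the fitted values increases the sum of squared distances, which is what "best fit" means here.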
  • 12. Regression Equation
    • The standard regression equation is the same as the linear equation with one exception: the error term.
    • Y = α + βX + ε
    • Where:
    • -- Y = dependent variable
    • -- α = constant term
    • -- β = slope or regression coefficient
    • -- X = independent variable
    • -- ε = error term
  • 13. Regression Equation
    • This regression procedure is known as ordinary least squares (OLS).
    • α (the constant term) is interpreted the same as before.
    • β (the regression coefficient) tells how much Y changes if X changes by one unit.
    • The regression coefficient indicates the direction and strength of the relationship between the two quantitative variables.
  • 14. Regression Equation
    • The error (ε) indicates that observed data do not follow a neat pattern that can be summarized with a straight line.
    • An observation's score on Y can be broken into two parts:
    • -- α + βX is due to the independent variable
    • -- ε is due to error
    • Observed value = Predicted value (α + βX) + error (ε)
  • 15. Regression Equation
    • The error is the difference between the observed value of Y and the predicted value of Y (observed minus predicted).
    • This difference is known as the residual.
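As a small sketch of the decomposition "observed value = predicted value + error", the Python below computes residuals for a few points. The coefficients `ALPHA` and `BETA` and the observations are hypothetical values chosen for the example, not estimates from any data set in the presentation.

```python
# Hypothetical coefficients for Y = ALPHA + BETA * X (illustration only).
ALPHA = 2.0
BETA = 0.5


def predict(x):
    """Predicted value of Y for a given X."""
    return ALPHA + BETA * x


def residual(x, y):
    """Residual: observed Y minus predicted Y."""
    return y - predict(x)


# Hypothetical (X, observed Y) pairs.
observations = [(2, 3.5), (4, 3.0), (6, 5.0)]

residuals = [residual(x, y) for x, y in observations]

for (x, y), e in zip(observations, residuals):
    # Each observed value is recovered as prediction plus residual.
    print(f"X={x}: observed={y}, predicted={predict(x)}, residual={e}")
```

A positive residual means the point lies above the line, a negative one means it lies below, and zero means the point is on the line.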
  • 18. Regression Interpretation
    • For the data on the scatterplot:
    • -- Y (depvar) = telephone lines per 1,000 people
    • -- X (indvar) = infant mortality
    • We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per 1,000 people) and infant mortality.
  • 19. Regression Interpretation
    • In this analysis, the intercept and regression coefficient are as follows:
    • α (or constant) = 121. This means that when X (infant deaths) is 0, there are a predicted 121 phone lines per 1,000 population.
    • β = -1.25. This means that when X (deaths) increases by 1, there is a predicted or estimated decrease of 1.25 phone lines per 1,000 population.
  • 20. Regression Interpretation
  • 21. Regression Interpretation
    • These calculations are useful because they allow you to make predictions from the data.
    • An increase from 1 to 10 deaths per 1,000 live births is associated with a decline of 119.75 – 108.5 = 11.25 telephone lines.
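The arithmetic behind this prediction can be checked with a short Python sketch. It uses the intercept (121) and regression coefficient (-1.25) reported in the slides; the function name `predicted_phone_lines` is made up for the example.

```python
# Coefficients reported in the presentation.
ALPHA = 121.0   # predicted phone lines per 1,000 people when infant mortality is 0
BETA = -1.25    # predicted change in phone lines per additional infant death


def predicted_phone_lines(infant_mortality):
    """Predicted telephone lines per 1,000 people: ALPHA + BETA * X."""
    return ALPHA + BETA * infant_mortality


at_1 = predicted_phone_lines(1)     # 121 - 1.25 * 1
at_10 = predicted_phone_lines(10)   # 121 - 1.25 * 10
decline = at_1 - at_10

print(at_1, at_10, decline)  # 119.75 108.5 11.25
```

The decline equals nine additional deaths times 1.25 lines each, which is another way to see where 11.25 comes from.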
  • 22.
    • Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean?
    • -- Well, it means a negative relationship between infant mortality and phone lines.
    • -- It means that for every additional infant death there is a predicted decrease of 1.25 phone lines.
    • This information is useful, but is there a measure that tells us how good a job we do predicting the observed values?
  • 23. Scatterplot
  • 24. R-squared
    • Yes, the measure is known as R-squared (or R²).
    • As stated earlier, there are two component parts of the total deviation from the mean, which is usually measured as the total sum of squares (or total variance).
    • The first component is the difference between the mean and the predicted value of Y. This is the explained part of the deviation (Regression Sum of Squares).
    • The second component is the residual sum of squares (Residual Sum of Squares), which measures prediction errors. This is the unexplained part of the deviation.
  • 25. R-squared
    • Total SS = Regression SS + Residual SS. In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares.
    • R² = Regression SS / Total SS. The more variance the regression model explains, the higher the R².
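The decomposition can be illustrated with a minimal Python sketch. The data points and variable names are invented for the example, and the line is fitted with the standard closed-form least-squares formulas for one independent variable; the identity Total SS = Regression SS + Residual SS holds for a least-squares fit that includes a constant term.

```python
# Hypothetical data, invented purely for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares slope and intercept for Y = a + bX.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x
predicted = [a + b * x for x in xs]

# Total SS: observed values around the mean of Y.
total_ss = sum((y - mean_y) ** 2 for y in ys)
# Regression SS: predicted values around the mean (explained part).
regression_ss = sum((p - mean_y) ** 2 for p in predicted)
# Residual SS: observed values around their predictions (unexplained part).
residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = regression_ss / total_ss
print(total_ss, regression_ss, residual_ss, r_squared)
```

For these made-up points the model explains about 60 percent of the variation in Y, so R² is roughly 0.6.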