Regression
Published in: Technology, Economy & Finance
Transcript

  • 1. The World of Linear Regression
  • 2. What is regression analysis?
      • Regression analysis is a technique for measuring the relationship between two interval- or ratio-level variables.
      • The regression framework is at the heart of empirical social and political science research.
      • Regression analysis acts as a statistical surrogate for controlled experiments, and can support causal inferences when the model's assumptions hold.
  • 3. Regression models
      • Researchers translate verbal theories, hypotheses, even hunches into models.
      • A model shows how and under what conditions two (or more) variables are related.
      • A regression model with a dependent variable and one independent variable is known as a bivariate regression model.
      • A regression model with a dependent variable and two or more independent variables and/or control variables is known as a multivariate regression model.
  • 4. Scatterplots
      • A scatterplot graphs the sample observations by plotting each one as a point on the X and Y axes.
      • The X axis generally represents the values of the independent variable, and the Y axis usually represents the values of the dependent variable.
      • X is the horizontal axis; Y is the vertical axis.
  • 5. Scatterplots
      • Scatterplots allow you to study the flow of the dots, that is, the relationship between the two variables.
      • Scatterplots allow political scientists to identify:
          • positive or negative relationships
          • monotonic or linear relationships
  • 6. Scatterplot
  • 7.  
  • 8. Regression Equation
      The linear equation is specified as follows:
      Y = a + bX
      where
      Y = dependent variable
      X = independent variable
      a = constant (value of Y when X = 0)
      b = slope of the regression line
  • 9. Regression Equation
      • Y = a + bX
      • a can be positive or negative. In high school algebra, you may have referred to a as the intercept. This is because a is the point at which the regression line crosses the Y axis.
      • b (the slope coefficient) can be positive or negative. A positive coefficient denotes a positive relationship and a negative coefficient denotes a negative relationship.
      • The substantive interpretation of the slope coefficient depends on the variables involved, how they are coded, and the scale of the variables. Larger coefficients may indicate a stronger relationship, but not necessarily.
  • 10. The Regression Model
      • The goal of regression analysis is to find an equation which “best fits” the data.
      • In regression, the equation is chosen so that its graph is the line that minimizes the sum of the squared vertical distances between the data points and the line.
  • 11.
      • d₁ and d₂ represent the vertical distances of observed data points from an estimated regression line.
      • Regression analysis uses a mathematical procedure that finds the single line that minimizes the sum of the squared distances of the data points from the line.
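The "best fit" idea on the two slides above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own procedure: the function names `fit_line` and `sum_squared_distances` and the five data points are made up for the example, and the slope and intercept use the standard closed-form least-squares formulas.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
best = sum_squared_distances(xs, ys, a, b)
# Any other line, such as one with a shifted intercept, does worse.
worse = sum_squared_distances(xs, ys, a + 0.5, b)
print(a, b, best, worse)
```

Shifting or tilting the fitted line in any direction increases the sum of squared distances, which is what "single line that minimizes" means here.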
  • 12. Regression Equation
      The standard regression equation is the same as the linear equation with one exception: the error term.
      Y = α + βX + ε
      where
      Y = dependent variable
      α = constant term
      β = slope or regression coefficient
      X = independent variable
      ε = error term
  • 13. Regression Equation
      • This regression procedure is known as ordinary least squares (OLS).
      • α (the constant term) is interpreted the same as before.
      • β (the regression coefficient) tells how much Y changes if X changes by one unit.
      • The regression coefficient indicates the direction of the relationship between the two quantitative variables and its size, measured in the units of X and Y.
  • 14. Regression Equation
      • The error (ε) indicates that observed data do not follow a neat pattern that can be summarized with a straight line.
      • An observation's score on Y can be broken into two parts:
          • α + βX is due to the independent variable
          • ε is due to error
      • Observed value = Predicted value (α + βX) + error (ε)
  • 15. Regression Equation
      • The error is the difference between the observed value of Y and the predicted value of Y (observed minus predicted).
      • This difference is known as the residual.
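As a small sketch of the decomposition "Observed value = Predicted value + error", the snippet below computes a residual for one observation. The values of `alpha` and `beta` and the observation itself are hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Hypothetical coefficients, not taken from the slides.
alpha = 2.0   # constant term
beta = 0.5    # regression coefficient


def predict(x):
    """Predicted value of Y for a given X: alpha + beta * X."""
    return alpha + beta * x


def residual(x, observed_y):
    """Residual = observed value of Y minus predicted value of Y."""
    return observed_y - predict(x)


# One hypothetical observation: X = 4, observed Y = 5.
x, observed_y = 4.0, 5.0
predicted_y = predict(x)            # 2.0 + 0.5 * 4.0 = 4.0
error = residual(x, observed_y)     # 5.0 - 4.0 = 1.0
print(predicted_y, error)
```

A positive residual means the observation lies above the regression line; a negative residual means it lies below.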
  • 16.  
  • 17.  
  • 18. Regression Interpretation
      For the data on the scatterplot:
      Y (depvar) = telephone lines per 1,000 people
      X (indvar) = infant mortality (deaths per 1,000 live births)
      We can use regression analysis to examine the relationship between communication capacity (measured here as telephone lines per 1,000 people) and infant mortality.
  • 19. Regression Interpretation
      In this analysis, the intercept and regression coefficient are as follows:
      • α (or constant) = 121. This means that when X (infant deaths) is 0, there are a predicted 121 phone lines per 1,000 population.
      • β = -1.25. This means that when X (infant deaths) increases by 1, there is a predicted or estimated decrease of 1.25 phone lines per 1,000 population.
  • 20. Regression Interpretation
  • 21. Regression Interpretation
      • These calculations are useful because they allow you to make predictions from the data.
      • An increase from 1 to 10 deaths per 1,000 live births is associated with a predicted decline of 119.75 - 108.5 = 11.25 telephone lines per 1,000 people.
      • Interpreting the meaning of a coefficient can be tricky. What does a coefficient of -1.25 mean?
          • It means there is a negative relationship between infant mortality and phone lines.
          • It means that for every additional infant death per 1,000 live births, there is a predicted decrease of 1.25 phone lines per 1,000 people.
      • This information is useful, but is there a measure that tells us how good a job we do predicting the observed values?
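The prediction arithmetic on this slide can be reproduced directly from the coefficients reported earlier (α = 121, β = -1.25). The function name `predicted_phone_lines` is made up for this sketch.

```python
ALPHA = 121.0   # intercept: predicted phone lines per 1,000 when infant deaths = 0
BETA = -1.25    # slope: change in phone lines per additional infant death


def predicted_phone_lines(infant_deaths):
    """Predicted telephone lines per 1,000 people for a given infant mortality rate."""
    return ALPHA + BETA * infant_deaths


at_1 = predicted_phone_lines(1)     # 121 - 1.25 = 119.75
at_10 = predicted_phone_lines(10)   # 121 - 12.5 = 108.5
decline = at_1 - at_10              # 11.25, i.e. 9 additional deaths * 1.25
print(at_1, at_10, decline)
```

The decline equals the slope times the change in X (9 × 1.25), which is why the coefficient alone is enough to compare any two values of X.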
  • 22. Scatterplot
  • 23. R-squared
      • Yes, the measure is known as R-squared (or R²).
      • The total deviation of the observed values from the mean of Y, usually measured as the total sum of squares, has two component parts.
      • The first component is the difference between the mean and the predicted value of Y. This is the explained part of the deviation (the Regression Sum of Squares).
      • The second component is the difference between the observed and predicted values of Y (the Residual Sum of Squares), which measures prediction errors. This is the unexplained part of the deviation.
  • 24. R-squared
      • Total SS = Regression SS + Residual SS
        In other words, the total sum of squares is the sum of the regression sum of squares and the residual sum of squares.
      • R² = Regression SS / Total SS
        The more variance the regression model explains, the higher the R².
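The sum-of-squares decomposition can be checked numerically. The sketch below fits a least-squares line to five hypothetical data points (invented for this example) and computes the three sums of squares and R²; the helper `fit_line` is an illustrative name, not something from the slides.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)
predicted = [a + b * x for x in xs]

# Total SS: observed values around the mean of Y.
total_ss = sum((y - mean_y) ** 2 for y in ys)
# Regression SS: predicted values around the mean of Y (explained part).
regression_ss = sum((p - mean_y) ** 2 for p in predicted)
# Residual SS: observed values around the predicted values (unexplained part).
residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))

r_squared = regression_ss / total_ss
print(f"Total SS = {total_ss:.2f}")
print(f"Regression SS + Residual SS = {regression_ss + residual_ss:.2f}")
print(f"R-squared = {r_squared:.2f}")
```

For these made-up points the model explains about 60 percent of the variation in Y. Note that the identity Total SS = Regression SS + Residual SS holds for a least-squares line that includes an intercept; it is not guaranteed for an arbitrary line.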
  • 25.  
  • 26.  
