Bivariate linear regression


In this presentation, I have tried to compile the key details of bivariate linear regression and correlation. The slides address all the key issues, but anyone who uses them will need to describe the details verbally, at a level suited to the audience. I hope you find it useful.

Published in: Education, Technology, Business

Bivariate linear regression

  1. Linear Regression. Dr Menaal Kaushal, JR II, Department of S P M, S N Medical College, Agra. 22-11-2013
  2. Statistical analysis can be:
     • Univariate: only one variable is studied, e.g. the heights of all grade IV pupils, or the ages of mothers delivering at a DH. (Measures of central tendency, measures of dispersion.)
     • Bivariate: the relationship between two variables is studied, e.g. the relationship between the height and weight of every child in grade IV, or between a mother's age and the birth weight of her baby.
     • Multivariate: the relationship among more than two variables is studied, e.g. the relationship among the height, weight and MAC of every child in grade IV.
  3. Bivariate Regression
     • Linear regression: when the outcome (dependent) variable is continuous
     • Logistic regression: when the outcome (dependent) variable is categorical, e.g. when the research question is answered with either yes or no
  4. Levels (Types) of Data
     • Nominal (categorical) measures: categories are exhaustive and mutually exclusive (e.g. religion, gender).
     • Ordinal measures: all of the above, plus the categories can be rank-ordered (e.g. social class).
     • Interval measures: all of the above, plus equal differences between measurement points (e.g. temperature in ℃ or ℉).
     • Ratio measures: all of the above, plus a true zero point (e.g. weight, absolute temperature in kelvin).
  5. Relationship Between Two Variables
     • Association: any relation between variables
     • Positive association: above-average values of one variable tend to go with above-average values of the other; the scatter slopes up
     • Negative association: above-average values of one variable tend to go with below-average values of the other; the scatter slopes down
     • Linear association: roughly, the scatter diagram is clustered around a straight line. This is correlation.
  9. The “Football” Bivariate Normal Scatter Plot
  10. Can you identify any difference?
  11. How Tightly Clustered Are These Data?
  12. Calculating the Correlation Coefficient
  13. So, How to Calculate r
  14. Formula of Correlation Coefficient. Let's simplify:
     • Convert the data into standard units.
     • Multiply the corresponding standard-unit values of x and y.
     • r is the mean of these products.
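The three steps on this slide can be sketched in a few lines of Python. This is only an illustration: the function names and the height and weight figures are made up for the example, and the SD used is the population SD (dividing by n), which is what makes the mean of the products equal to r.

```python
from statistics import fmean, pstdev

def standard_units(values):
    """Convert a list to standard units: (value - mean) / SD."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD (divides by n)
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """r = mean of the products of the paired standard-unit values."""
    zx = standard_units(x)
    zy = standard_units(y)
    return fmean([a * b for a, b in zip(zx, zy)])

# Hypothetical heights (cm) and weights (kg) of five children
heights = [120, 125, 130, 135, 140]
weights = [22, 25, 24, 30, 29]
r = correlation(heights, weights)
print(round(r, 3))
```

The same number should come out of any standard correlation routine, since standardising and averaging is just another way of writing the usual formula.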
  15. Properties of Correlation Coefficient
     • The calculation uses only standard units, so r is a pure number with no units.
     • -1 ≤ r ≤ 1
     • In the extreme cases, r = -1 when the scatter diagram is a perfect straight line sloping down, and r = 1 when it is a perfect straight line sloping up.
     • Switching the variables x and y does not change r.
  16. Properties of Correlation Coefficient (continued)
     • Adding a constant to one of the lists just slides the scatter diagram, so r stays the same.
     • Multiplying one of the lists by a positive constant does not change the standard units, so r stays the same.
     • Multiplying just one (not both) of the lists by a negative constant switches the signs of the standard units of that variable, so r keeps the same absolute value but its sign is switched.
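These properties are easy to check numerically. The sketch below uses made-up data and a small helper, correlation, written for this example; it is a demonstration on one data set, not a proof.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = correlation(x, y)
r_swapped = correlation(y, x)                    # switch x and y
r_shifted = correlation([v + 10 for v in x], y)  # add a constant
r_scaled = correlation([3 * v for v in x], y)    # positive multiplier
r_negated = correlation([-2 * v for v in x], y)  # negative multiplier

print(r, r_swapped, r_shifted, r_scaled, r_negated)
```

The first four values agree up to rounding error, and the last one has the same size with the opposite sign.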
  17. Heteroscedastic Curve
  18. What can r not tell?
     • Association is not causation. r does not tell “why”.
     • r should only be used for linearly related variables. It measures linear association.
     • This diagram shows a strong relation between x and y, but it is not linear; r for this diagram comes out to be zero.
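A small numerical example makes the last point concrete. The data here are hypothetical, not the data behind the slide's diagram: y is exactly x squared, so y is completely determined by x, yet r is zero because the relation is symmetric rather than linear.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical data: a perfect, but non-linear, relation
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r_curve = correlation(x, y)
print(r_curve)  # zero, up to floating-point rounding
```

A scatter diagram would show the U-shape at once, which is why the plot should be looked at before r is trusted.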
  19. Beware of:
     • Outliers
     • The tendency towards ecological correlations
  20. Deal with the outliers
  21. Can you find the outlier?
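To see why outliers deserve attention, here is a minimal sketch with invented numbers: five points lie exactly on a rising line, and a single added point far from the rest is enough to change the sign of r.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical data: five points on a perfect upward line
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]
r_without = correlation(x, y)

# The same data with one outlier appended at (10, 0)
r_with = correlation(x + [10], y + [0])

print(r_without, r_with)
```

Whether such a point is a recording error or a genuine observation has to be decided from the data source, not from r.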
  22. Avoid “Ecological Correlation”: replacing individual students by group averages can artificially increase the clustering, and so overstate the correlation. This is not desirable.
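The effect can be illustrated with a small invented data set: nine students in three classes. The class names and scores are assumptions made up for this sketch. Within each class the scores are scattered, but the class averages fall exactly on a line.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical (x, y) scores for individual students, grouped by class
classes = {
    "A": [(1, 4), (2, 2), (3, 3)],
    "B": [(3, 6), (4, 4), (5, 5)],
    "C": [(5, 8), (6, 6), (7, 7)],
}

# Correlation computed from the individual students
students = [pair for group in classes.values() for pair in group]
r_individuals = correlation([p[0] for p in students], [p[1] for p in students])

# Correlation computed from one average point per class
avg_x = [fmean([p[0] for p in group]) for group in classes.values()]
avg_y = [fmean([p[1] for p in group]) for group in classes.values()]
r_averages = correlation(avg_x, avg_y)

print(r_individuals, r_averages)
```

Averaging removes the scatter inside each class, so the correlation of the averages is higher than the correlation among the students themselves.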
  23. Regression
     • The technique for estimating the dependent variable “y” for a given value of the variable “x”, when the two are linearly associated and the correlation coefficient “r” is known.
  24. Each estimate is at the center of the vertical strip
  26. The slope of the green line = r
  27. The Equation of Regression
     • Estimate of y (in standard units) = r × given x (in standard units)
     • ⇒ (estimate of y − µy) / SDy = r × (x − µx) / SDx
     • Estimate of y = slope × x + intercept
     • (Here slope = r × SDy / SDx and intercept = µy − slope × µx)
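As a rough sketch, the regression equation can be computed directly from r, the two means and the two SDs. The data and function names below are invented for illustration. The intercept is µy − slope × µx, which is what makes the line pass through the point of averages (µx, µy).

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

def regression_line(x, y):
    """Return (slope, intercept) for estimating y from x."""
    slope = correlation(x, y) * pstdev(y) / pstdev(x)  # r * SDy / SDx
    intercept = fmean(y) - slope * fmean(x)            # mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

slope, intercept = regression_line(x, y)
estimate_at_mean = slope * fmean(x) + intercept

print(slope, intercept, estimate_at_mean)
```

For these numbers the estimate at the average x equals the average y, as the equation in standard units requires.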
  28. Why call it “Regression”?
     • Sir Francis Galton (1822–1911): “The Galton Effect”
     • “Those who have high values in one variable tend to be not as high in the second variable”
     • A eugenicist, who gave the idea of SD and regression
     • “Fathers who are tall tend to have sons who are not quite that tall on average”
     • All data regress towards “mediocrity”, i.e. towards the mean
     • The Regression Fallacy, or Sophomore Slump
  30. [Figure: “Univariate Normal” and “Bivariate Normal” panels; labels: 68%, +1 SD, +1 r.m.s. error, µx, r]
  31. Residual Plot
     Regardless of the shape of the scatter diagram:
     • The average of the residuals is always 0.
     • There is no linear association between the residuals and x.
     The residual plot should not show any trend or linear relation. Good regression: the residual plot should look like a formless blob around the horizontal axis.
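Both residual properties can be verified on a small example. The sketch below uses invented data and fits the line with slope = r × SDy / SDx; the helper names are assumptions for this illustration.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """r as the mean product of standard units (population SD)."""
    mx, my = fmean(x), fmean(y)
    sx, sy = pstdev(x), pstdev(y)
    return fmean([((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)])

# Hypothetical paired measurements
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

# Regression line from r, the means and the SDs
slope = correlation(x, y) * pstdev(y) / pstdev(x)
intercept = fmean(y) - slope * fmean(x)

# Residual = observed y minus the regression estimate of y
residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]

mean_residual = fmean(residuals)            # close to 0
r_residuals_x = correlation(x, residuals)   # close to 0
rms_error = fmean([e ** 2 for e in residuals]) ** 0.5

print(mean_residual, r_residuals_x, rms_error)
```

A zero average and zero correlation with x hold for any least-squares line, so they do not by themselves show a good fit; the shape of the residual plot is what reveals curvature or unequal spread.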
  32. Residual Plot as a Diagnostic Tool
  33. Questions?