Lesson 8 Linear Correlation And Regression


  1. Simple Linear Regression and Correlation (Chapter 8). Teaching Assistant: Zuo Xiaoyu
  2. Outline
     - Discussion part
     - Steps for performing correlation and regression
     - Differences and connections between linear correlation and regression
     - Experiment part (incorporated in the discussion part)
     - Simple linear correlation
     - Simple linear regression
     - Exercise part
  3. Discussion part
  4. Case
     In a study of the relationship between plasma amphetamine levels and amphetamine-induced psychosis, 10 chronic amphetamine abusers with psychosis underwent psychiatric evaluation and were each assigned a psychosis intensity score. At the same time, their plasma amphetamine levels were measured. The results are shown in Table 8-1.
     Data file: discussion.sav
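  5. Table 8-1 Psychosis intensity scores and plasma amphetamine levels for 10 chronic amphetamine abusers

     | Patient | Psychosis intensity score (Y) | Plasma amphetamine, mg/ml (X) |
     |---------|-------------------------------|-------------------------------|
     | 1       | 10                            | 150                           |
     | 2       | 30                            | 300                           |
     | 3       | 20                            | 250                           |
     | 4       | 15                            | 150                           |
     | 5       | 45                            | 450                           |
     | 6       | 35                            | 400                           |
     | 7       | 50                            | 425                           |
     | 8       | 15                            | 200                           |
     | 9       | 40                            | 350                           |
     | 10      | 55                            | 475                           |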
  6. Question 1
     - Is there a visible relationship between plasma amphetamine levels and amphetamine-induced psychosis?
     → Scatter plot diagram (both variables are random)
  7. Procedure
     8.1.2 Data file
     - Variable name: x; variable label: Plasma amphetamine (mg/ml)
     - Variable name: y; variable label: Psychosis intensity score
     8.1.3 Procedure
     (1) Scatter diagram
     - From the menus, choose Graphs > Scatter.
     - In the Scatterplot dialog box, choose "Simple" and click Define.
     - In the Simple Scatterplot dialog box, move y to the "Y Axis" box and x to the "X Axis" box, then click OK.
  8. Procedure
  9. Scatter diagram
  10. Different types of relationship
  11. Question 2
     - How can we quantify the relationship between plasma amphetamine levels and amphetamine-induced psychosis?
     → Correlation coefficient
  12. Procedure
     (2) From the menus, choose Analyze > Correlate > Bivariate to open the "Bivariate Correlations" dialog box. Move y and x to the "Variables" box. Under Correlation Coefficients, keep "Pearson" (the default), or choose "Spearman" if the variables are not normally distributed. Click OK.
  13. Output and interpretation: Pearson correlation; Spearman correlation
  14. Correlation coefficient
     - Pearson correlation coefficient r
     - Spearman's rank correlation coefficient r_s
     - Assumptions for Pearson's r: both X and Y are random, and X and Y follow a bivariate normal distribution
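The Pearson coefficient is $r = l_{XY}/\sqrt{l_{XX}\,l_{YY}}$, where $l_{XX} = \sum (X - \bar X)^2$, $l_{YY} = \sum (Y - \bar Y)^2$ and $l_{XY} = \sum (X - \bar X)(Y - \bar Y)$. As a cross-check on the SPSS output, here is a minimal Python sketch that applies this formula to the Table 8-1 data; the helper names `deviation_sums` and `pearson_r` are illustrative, not part of SPSS or any library.

```python
import math

# Table 8-1 (patients 1-10): plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def deviation_sums(x, y):
    """Return (lxx, lyy, lxy): sums of squared deviations and of cross-products."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    lxx = sum((xi - mx) ** 2 for xi in x)
    lyy = sum((yi - my) ** 2 for yi in y)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return lxx, lyy, lxy


def pearson_r(x, y):
    """Pearson correlation coefficient r = lxy / sqrt(lxx * lyy)."""
    lxx, lyy, lxy = deviation_sums(x, y)
    return lxy / math.sqrt(lxx * lyy)


r = pearson_r(X, Y)
print(f"r = {r:.4f}")  # prints: r = 0.9674
```

For these data $l_{XX} = 136500$, $l_{YY} = 2302.5$ and $l_{XY} = 17150$, so the positive, close-to-1 value of r matches the upward trend in the scatter diagram.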
  15. Spearman's rank correlation coefficient is useful for
     - ranked data, including cases where X or Y is an ordinal variable;
     - measurement data that do not follow a normal distribution, whose distribution is uncertain, or that are not precisely measured.
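A minimal sketch of Spearman's r_s, assuming the usual definition: rank X and Y separately (tied values get the average of their ranks), then compute Pearson's r on the ranks. Table 8-1 has ties (patients 1 and 4 share X = 150; patients 4 and 8 share Y = 15), so the no-ties shortcut $1 - 6\sum d^2 / [n(n^2 - 1)]$ would only be approximate; the sketch works on the ranks directly. Function names are hypothetical.

```python
import math

# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end correspond to ranks start+1..end+1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    lxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    lxx = sum((a - mx) ** 2 for a in x)
    lyy = sum((b - my) ** 2 for b in y)
    return lxy / math.sqrt(lxx * lyy)


def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r applied to the (tie-averaged) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


rs = spearman_rs(X, Y)
print(f"r_s = {rs:.4f}")  # prints: r_s = 0.9665
```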
  16. Understanding r
     - Direction of the correlation: positive or negative (the sign of r)
     - Strength of the correlation: is |r| large enough? (r ranges from -1 to +1)
     - Complete (perfect) correlation: r = +1 or r = -1
  17. Question 3
     - Can we conclude that plasma amphetamine levels are correlated with amphetamine-induced psychosis in the population?
     - What is the actual situation in the population?
     → Hypothesis test on r; interval estimate of ρ
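  18. Hypothesis testing and interval estimation
     - t test (assuming bivariate normality): H0: ρ = 0, H1: ρ ≠ 0, with $t_r = r \big/ \sqrt{(1 - r^2)/(n - 2)}$ and ν = n - 2
     - Interval estimation: transform r with the inverse hyperbolic tangent, $z = \tanh^{-1} r$, compute the interval for z, then transform its limits back with tanh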
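The test and the interval can be sketched in Python as follows. Assumptions: α = 0.05; the two-sided critical value t(0.05/2, 8) = 2.306 is copied from a t table rather than computed; and the interval uses the standard Fisher approximation that $z = \tanh^{-1} r$ has standard error $1/\sqrt{n - 3}$. This is a hand calculation with hypothetical helper names, not a reproduction of SPSS output.

```python
import math
from statistics import NormalDist

# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

T_CRIT = 2.306  # t(0.05/2, 8) from a t table (assumes alpha = 0.05, n = 10)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    lxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    lxx = sum((a - mx) ** 2 for a in x)
    lyy = sum((b - my) ** 2 for b in y)
    return lxy / math.sqrt(lxx * lyy)


def t_for_r(r, n):
    """t statistic for H0: rho = 0, to be compared with t(n - 2)."""
    return r / math.sqrt((1 - r * r) / (n - 2))


def fisher_ci(r, n, level=0.95):
    """Approximate CI for rho: z = atanh(r) has SE about 1 / sqrt(n - 3)."""
    z = math.atanh(r)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


n = len(X)
r = pearson_r(X, Y)
t_r = t_for_r(r, n)
print(f"t_r = {t_r:.2f} with {n - 2} df; reject H0 at 0.05: {abs(t_r) > T_CRIT}")
# prints: t_r = 10.80 with 8 df; reject H0 at 0.05: True
low, high = fisher_ci(r, n)
print(f"95% CI for rho: ({low:.3f}, {high:.3f})")
```

The interval is not symmetric around r because tanh compresses values near ±1; that is the reason for working on the z scale.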
  19. Short summary
     - Scatter plot diagram
     - Compute the correlation coefficient (descriptive)
     - Is the coefficient statistically significant? Hypothesis testing (inference)
     - Interpretation of the correlation coefficient (application)
  20. Question 4
     - Can we predict the psychosis intensity score from the plasma amphetamine level?
     - Example: can we estimate and predict the psychosis intensity score when the plasma amphetamine level is 440 and 460 mg/ml?
     → Linear regression
  21. Procedure
     From the menus, choose Analyze > Regression > Linear to open the "Linear Regression" dialog box. Move y to the "Dependent" box and x to the "Independent(s)" box, then click OK.
  22. (Figure: Y plotted against X)
  23. Output and interpretation: intercept and slope
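A minimal least-squares sketch of the two numbers this output reports: $b = l_{XY}/l_{XX}$ and $a = \bar Y - b\bar X$, followed by point predictions at X = 440 and 460 mg/ml (both inside the observed range of 150 to 475). It gives point estimates only; confidence and prediction intervals are not computed here. `fit_line` is a hypothetical helper.

```python
# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def fit_line(x, y):
    """Least-squares estimates (a, b) for Y-hat = a + b * X."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    lxx = sum((xi - mx) ** 2 for xi in x)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = lxy / lxx          # slope: change in Y per unit of X
    a = my - b * mx        # intercept: the line passes through (mean X, mean Y)
    return a, b


a, b = fit_line(X, Y)
print(f"Y-hat = {a:.4f} + {b:.4f} X")  # prints: Y-hat = -8.0769 + 0.1256 X
for x0 in (440, 460):
    print(f"X = {x0}: predicted score = {a + b * x0:.2f}")
# prints: X = 440: predicted score = 47.21
#         X = 460: predicted score = 49.72
```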
  24. Question 5
     - Does this regression equation hold in the population we are studying?
     - Can we use it to predict the psychosis intensity score when the plasma amphetamine level is 440 and 460 mg/ml, respectively?
     → Hypothesis test on the whole equation: ANOVA
  25. Output and interpretation: ANOVA result
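The ANOVA splits $SS_{\text{total}} = l_{YY}$ into $SS_{\text{regression}} = l_{XY}^2 / l_{XX}$ (1 df) and $SS_{\text{residual}}$ (n - 2 df) and tests $F = MS_{\text{regression}} / MS_{\text{residual}}$. The sketch below assumes α = 0.05 and copies the critical value F(0.05; 1, 8) = 5.32 from an F table; `regression_anova` is an illustrative name.

```python
# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

F_CRIT = 5.32  # F(0.05; 1, 8) from an F table (assumes alpha = 0.05)


def regression_anova(x, y):
    """ANOVA decomposition for the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    lxx = sum((xi - mx) ** 2 for xi in x)
    lyy = sum((yi - my) ** 2 for yi in y)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    ss_total = lyy
    ss_reg = lxy ** 2 / lxx        # df = 1
    ss_res = ss_total - ss_reg     # df = n - 2
    f_stat = (ss_reg / 1) / (ss_res / (n - 2))
    return {"ss_reg": ss_reg, "ss_res": ss_res, "ss_total": ss_total,
            "df_res": n - 2, "F": f_stat}


table = regression_anova(X, Y)
print(f"Regression  SS = {table['ss_reg']:8.2f}  df = 1")
print(f"Residual    SS = {table['ss_res']:8.2f}  df = {table['df_res']}")
print(f"Total       SS = {table['ss_total']:8.2f}  df = {table['df_res'] + 1}")
print(f"F = {table['F']:.2f}, significant at 0.05: {table['F'] > F_CRIT}")
```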
  26. Question 6
     - What proportion of the variation in the psychosis intensity score can be explained by the plasma amphetamine level?
     - Can we regard the plasma amphetamine level as an influencing factor for amphetamine-induced psychosis?
     → R square; hypothesis test on the regression coefficient (t test)
  27. Output and interpretation: R square; hypothesis test on the regression coefficient
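R square is $SS_{\text{regression}}/SS_{\text{total}}$, and the t test of H0: β = 0 uses $t_b = b/s_b$ with $s_b = s_{Y \cdot X}/\sqrt{l_{XX}}$ and $s_{Y \cdot X} = \sqrt{SS_{\text{residual}}/(n - 2)}$, on n - 2 df. A sketch with the Table 8-1 data (the helper name `slope_test` is hypothetical):

```python
import math

# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def slope_test(x, y):
    """R^2 and the t test of H0: beta = 0 for the least-squares slope."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    lxx = sum((xi - mx) ** 2 for xi in x)
    lyy = sum((yi - my) ** 2 for yi in y)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = lxy / lxx
    ss_reg = b * lxy                      # equals lxy**2 / lxx
    ss_res = lyy - ss_reg
    s_yx = math.sqrt(ss_res / (n - 2))    # residual standard deviation
    s_b = s_yx / math.sqrt(lxx)           # standard error of b
    return {"R2": ss_reg / lyy, "b": b, "s_b": s_b, "t_b": b / s_b, "df": n - 2}


res = slope_test(X, Y)
print(f"R^2 = {res['R2']:.4f}")  # prints: R^2 = 0.9358
print(f"b = {res['b']:.4f}, s_b = {res['s_b']:.4f}, "
      f"t_b = {res['t_b']:.2f}, df = {res['df']}")
```

Here R² ≈ 0.94, i.e., about 94% of the variation in the psychosis score is accounted for by its linear relation with the plasma amphetamine level.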
  28. Short summary
     - Scatter plot diagram
     - Compute the sample slope and intercept (descriptive)
     - Is the regression equation significant? ANOVA (inference)
     - Is the regression coefficient significant? t test on b (inference)
     - Interpretation and application of the regression model (application)
  29. Basic assumptions: LINE
     (1) Linear: there is a linear trend between the dependent variable Y and the independent variable X.
     (2) Independent: the individual observations are independent of each other.
     (3) Normal: for any given value of X, the corresponding Y follows a normal distribution.
     (4) Equal variances: the variances of Y are the same for all values of X, denoted σ².
  30. Prerequisites for linear regression: the same four conditions (Linear, Independent, Normal, Equal variances) listed under the LINE assumptions.
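The LINE conditions are usually checked through the residuals $e_i = Y_i - \hat Y_i$: plotting them against the fitted values helps judge linearity and equal variances, and a normal probability plot helps judge normality. The sketch below only computes the residuals for Table 8-1 (plotting is left out to stay within the Python standard library); with 10 observations any such check is rough. `fit_line` is a hypothetical helper.

```python
# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def fit_line(x, y):
    """Least-squares estimates (a, b) for Y-hat = a + b * X."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    lxx = sum((xi - mx) ** 2 for xi in x)
    lxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = lxy / lxx
    return my - b * mx, b


a, b = fit_line(X, Y)
fitted = [a + b * xi for xi in X]
residuals = [yi - fi for yi, fi in zip(Y, fitted)]
for xi, fi, ei in zip(X, fitted, residuals):
    print(f"X = {xi:3d}   fitted Y = {fi:6.2f}   residual = {ei:6.2f}")
# With an intercept in the model, least-squares residuals sum to (numerically) zero.
print(f"sum of residuals: {sum(residuals):.2e}")
```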
  31. Summary of the discussion part
  32. Two types of questions
     - Is there a linear relationship? → Linear correlation
     - How can one variable be predicted from another? → Linear regression
  33. Summary
     - Differences and connections between linear correlation and regression
     - Basic concepts
     - Basic assumptions about the data
     - Correlation coefficient and regression coefficient
  34. Summary
     - Assumptions:
       - Correlation: both X and Y are random
       - Regression: LINE; Y must be random, while X may be random or fixed
     - Correlation coefficient (r)
  35. Summary
     - Linear regression equation and regression coefficient (b)
     - We estimate the population intercept α and slope β from the sample, obtaining a and b and hence the equation $\hat{Y} = a + bX$
  36. Summary
     - Connection (when both X and Y are random):
       1) The correlation coefficient and the regression coefficient have the same sign.
       2) Their t tests are equivalent: $t_r = t_b$.
       3) Coefficient of determination: $R^2 = SS_{\text{regression}} / SS_{\text{total}} = r^2$.
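The three connections can be checked numerically. This sketch computes r, b, $t_r$, $t_b$ and $R^2$ for the Table 8-1 data and compares them with `math.isclose`; it also checks the identity $b = r\sqrt{l_{YY}/l_{XX}}$, which is why r and b always share a sign.

```python
import math

# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n
lxx = sum((x - mx) ** 2 for x in X)
lyy = sum((y - my) ** 2 for y in Y)
lxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))

r = lxy / math.sqrt(lxx * lyy)                 # correlation coefficient
b = lxy / lxx                                  # regression coefficient
ss_reg = lxy ** 2 / lxx
ss_res = lyy - ss_reg
t_r = r / math.sqrt((1 - r ** 2) / (n - 2))    # t test on r
t_b = b / (math.sqrt(ss_res / (n - 2)) / math.sqrt(lxx))  # t test on b
r_squared = ss_reg / lyy                       # coefficient of determination

print("1) same sign:", (r > 0) == (b > 0))
print("2) t_r equals t_b:", math.isclose(t_r, t_b))
print("3) R^2 equals r^2:", math.isclose(r_squared, r ** 2))
print("   b equals r * sqrt(lyy / lxx):", math.isclose(b, r * math.sqrt(lyy / lxx)))
```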
  37. Correlation vs. regression

     |              | Correlation | Regression |
     |--------------|-------------|------------|
     | Implication  | Quantifies the relationship between two or more variables | Investigates how the dependent variable depends on the independent variable(s) |
     | Prerequisite | Bivariate normal distribution | The dependent variable is a normally distributed random variable |
     | Application  | Investigating the quantitative association | 1. Investigating the quantitative dependency between variables; 2. Prediction; 3. Variable selection |

     Connection:
     1. The correlation coefficient has the same sign as the regression coefficient.
     2. The hypothesis tests for the correlation coefficient and the regression coefficient are equivalent.
     3. For bivariate normally distributed variables, regression can be used to interpret correlation: a high coefficient of determination indicates that X is closely correlated with Y.
  38. Discussion: true or false?
     1. Can any two variables be put together for correlation and regression analysis?
        (× They must have some relationship in subject matter.)
     2. Do correlation and regression imply causality?
        (× The relationship may be indirect, or there may be no real relationship at all.)
     3. Does a large value of r mean a large regression coefficient b?
        (×)
     4. Does rejecting H0: ρ = 0 mean that the correlation is strong?
        (× It only means that ρ ≠ 0.)
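Item 3 can be illustrated by changing units. r has no units, but b is measured in score points per unit of X, so re-expressing plasma amphetamine in µg/ml (multiplying X by 1000) shrinks b by a factor of 1000 while r stays the same. A sketch with the Table 8-1 data; the unit conversion and helper names are only for illustration.

```python
import math

# Table 8-1 data: plasma amphetamine X (mg/ml), psychosis score Y
X = [150, 300, 250, 150, 450, 400, 425, 200, 350, 475]
Y = [10, 30, 20, 15, 45, 35, 50, 15, 40, 55]


def sums(x, y):
    """Return (lxx, lyy, lxy) for two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    lxx = sum((a - mx) ** 2 for a in x)
    lyy = sum((b - my) ** 2 for b in y)
    lxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return lxx, lyy, lxy


def pearson_r(x, y):
    lxx, lyy, lxy = sums(x, y)
    return lxy / math.sqrt(lxx * lyy)


def slope(x, y):
    lxx, _, lxy = sums(x, y)
    return lxy / lxx


X_ug = [1000 * x for x in X]  # same levels in micrograms per ml (1 mg = 1000 ug)
print(f"mg/ml: r = {pearson_r(X, Y):.4f}, b = {slope(X, Y):.6f}")
print(f"ug/ml: r = {pearson_r(X_ug, Y):.4f}, b = {slope(X_ug, Y):.6f}")
```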
  39. Discussion: true or false?
     5. If a regression equation is statistically significant, can Y be predicted well from X?
        (× Whether Y can be predicted well depends on the coefficient of determination.)
     6. Can the regression equation be applied beyond the range of the data set?
        (×)
  40. Exercise
     To explore the correlation between the heights of fathers and sons, 20 male graduates were randomly selected from the list of graduates of a high school, and the heights (cm) of each student and his father were measured.
     (1) What is the relationship between the heights of fathers and sons?
     (2) Can we predict a son's height if his father is 166 cm tall?
  41. Heights (cm) of 20 pairs of fathers and sons
  42. About the homework
     - Test for difference: treat as a paired design (McNemar test)
     - Test for association: treat as an independent design
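For a paired 2 × 2 table, the McNemar test uses only the two discordant cell counts, written b and c here (not the regression coefficient b). The standard statistic and its continuity-corrected form, both referred to the χ² distribution with 1 degree of freedom, are shown below; the corrected form is usually recommended when b + c is small (one common rule of thumb is b + c < 40).

```latex
\chi^2 = \frac{(b - c)^2}{b + c}, \qquad
\chi^2_{c} = \frac{\left(|b - c| - 1\right)^2}{b + c}, \qquad \nu = 1
```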
  43. Assignment: page 129, No. 5
  44. Thank you!
