Chapter 08 Correlation



Published in: Business


  1. Chapter 8 Simple Linear Correlation
  2. <ul><li>Up to now, the statistical methods you have learned concern a single variable only, </li></ul><ul><li>such as: </li></ul><ul><li>estimating the average height of high school students; </li></ul><ul><li>comparing the average height of high school students in cities and in the countryside. </li></ul><ul><li>In practice, however, we are often concerned with the relationship between two variables. </li></ul><ul><li>Example: for high school students, </li></ul><ul><li>Height and age: a linear relationship? </li></ul><ul><li>Height and weight: a linear relationship? </li></ul>
  3. <ul><li>In this chapter, we study: </li></ul><ul><li>two variables, and </li></ul><ul><li>the linear relationship between them. </li></ul><ul><li>Two types of questions: </li></ul><ul><li>Is there a linear relationship? </li></ul><ul><li>-- Linear correlation </li></ul><ul><li>How can one variable be predicted from another? </li></ul><ul><li>-- Linear regression </li></ul>
  4. <ul><li>Example 7.1 To explore the correlation between systolic and diastolic blood pressure (mmHg), 665 girls aged 6 to 10 years were measured. </li></ul><ul><li>Two random variables: X and Y </li></ul><ul><li>Sample: 665 girls </li></ul><ul><li>The individuals in the sample should be independent of each other. </li></ul>
  5. <ul><li>Example 7.2 To explore the correlation between the heights of fathers and sons, 20 male graduates were randomly selected from the list of graduates of a high school, and the heights (cm) of the fathers and sons were measured. </li></ul><ul><li>Table 7.1 Heights (cm) of 20 pairs of father and son </li></ul>
  6. 7.1 Linear correlation
  7. Scatter diagram: Fig. 7.1 Scatter diagram of systolic and diastolic blood pressures (mmHg) of 665 girls aged 6 to 10 years
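  8. 7.2 Correlation Coefficient 7.2.1 Population correlation coefficient <ul><li>Pearson’s product-moment linear correlation coefficient: </li></ul><ul><li>the mean of the product of the two standardized variables, \( \rho = E\left[\frac{X-\mu_X}{\sigma_X}\cdot\frac{Y-\mu_Y}{\sigma_Y}\right] = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\sigma_Y} \) </li></ul><ul><li>---- also called the simple correlation coefficient </li></ul>where \( \operatorname{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] \) is the covariance between X and Y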
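  9. 7.2.2 Sample correlation coefficient: Pearson’s product-moment sample correlation coefficient \( r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}} \)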
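The sample coefficient can be computed directly from its definition. The sketch below is a minimal Python illustration, not the textbook's own procedure; the function name `pearson_r` and the toy data are hypothetical (the values of Table 7.1 are not reproduced here).

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment sample correlation coefficient r (hypothetical helper).

    r = sum((x_i - x_bar) * (y_i - y_bar))
        / sqrt(sum((x_i - x_bar)**2) * sum((y_i - y_bar)**2))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and of cross-products about the sample means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: points on an increasing or decreasing line give complete correlation
print(pearson_r([1, 2, 3], [2, 4, 6]))        # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))        # -1.0
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.8
```

Because r is unit-free, the same function would apply to Example 7.3 once the 20 father-son pairs of Table 7.1 are supplied as `x` and `y`.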
  10. A measure of linear relationship, with \( -1 \le r \le 1 \): 1) Is there a correlation? If the correlation coefficient is 0 or not large enough, there is no linear correlation. 2) If the correlation coefficient is large enough: What is the direction of the correlation? -- positive or negative. How strong is the correlation? -- is the absolute value large enough? Complete correlation: r = +1 or -1.
  11. Example 7.3 Calculate the correlation coefficient between the heights of fathers and sons (Table 7.1).
  12. 7.3 Inference on Correlation Coefficient 7.3.1 Hypothesis test <ul><li>r is the sample correlation coefficient; it varies from sample to sample. </li></ul><ul><li>The corresponding population correlation coefficient is denoted by ρ. </li></ul><ul><li>Question: is ρ = 0 or not? </li></ul><ul><li>Assumption: </li></ul><ul><li>X and Y follow a bivariate normal distribution. </li></ul>
  13. <ul><li>\( H_0: \rho = 0 \), \( H_1: \rho \neq 0 \), \( \alpha = 0.05 \) </li></ul><ul><li>(1) Look up the critical value in a special table (Table 8 in Appendix 2). </li></ul><ul><li>\( H_0 \) is rejected </li></ul><ul><li>---- there is a positive correlation between the heights of fathers and sons. </li></ul><ul><li>Questions: </li></ul><ul><li>Since P is very small, can we say that the correlation is very strong? </li></ul><ul><li>Does a small P value mean that the correlation is strong? </li></ul>
  14. Question: If r = 0.90, can you claim that the two variables are correlated with each other? Table 8 Critical values for r
  15. (2) t test (assuming X and Y follow a bivariate normal distribution) \( H_0: \rho = 0 \), \( H_1: \rho \neq 0 \), \( t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \) <ul><li>If the P-value < α, reject \( H_0 \) and conclude that the population correlation coefficient is significantly different from 0. </li></ul><ul><li>\( \nu = n - 2 = 20 - 2 = 18 \) </li></ul><ul><li>Conclusion: the population correlation coefficient is unlikely to be 0. </li></ul>
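The t test of \( H_0: \rho = 0 \) can be sketched with the Python standard library alone. Assumptions: instead of reading a table, the P-value is obtained by numerically integrating Student's t density after the substitution \( x = \sqrt{\nu}\tan\theta \) (which makes the integrand smooth on a finite interval), using Simpson's rule; the helper names `t_two_sided_p`, `corr_t_test` and `critical_r` are hypothetical. `critical_r` inverts the test by bisection, giving the kind of critical value tabulated in Table 8.

```python
import math

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Substituting x = sqrt(df) * tan(theta) turns the t density into
    const * cos(theta)**(df - 1) on the finite interval [0, atan(|t|/sqrt(df))],
    which is integrated with Simpson's rule (steps must be even).
    """
    theta_t = math.atan(abs(t) / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta_t / steps
    total = 1.0 + math.cos(theta_t) ** (df - 1)        # endpoints f(0) + f(theta_t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    area = const * total * h / 3                        # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * area))


def corr_t_test(r, n):
    """t test of H0: rho = 0 for a sample correlation r from n pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df, t_two_sided_p(t, df)


def critical_r(n, alpha=0.05):
    """Smallest |r| significant at level alpha (two-sided), found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if corr_t_test(mid, n)[2] > alpha:
            lo = mid   # not yet significant: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2


t, df, p = corr_t_test(0.444, 20)
print(round(t, 3), df, round(p, 4))
print(round(critical_r(3), 3), round(critical_r(20), 3))
```

With these helpers, `critical_r(3)` comes out near 0.997 and `critical_r(20)` near 0.444, so whether r = 0.90 is significant depends on n. Conversely, a weak hypothetical r = 0.1 with n = 665 (the sample size of Example 7.1) gives a P-value of roughly 0.01: a small P value says ρ is unlikely to be 0, not that the correlation is strong.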
  16. 7.3.2 Interval estimation <ul><li>Assumption: X and Y follow a bivariate normal distribution. </li></ul><ul><li>Background: </li></ul><ul><li>(1) The hyperbolic tangent and its inverse </li></ul><ul><li>Hyperbolic tangent: \( \tanh x = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} \) </li></ul><ul><li>Inverse hyperbolic tangent: \( \tanh^{-1} x = \frac{1}{2}\ln\frac{1+x}{1-x},\ -1 < x < 1 \) </li></ul>
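  17. <ul><li>\( z = \tanh^{-1} r \) approximately follows a normal distribution \( N\left(\tanh^{-1}\rho,\ \frac{1}{n-3}\right) \) </li></ul><ul><li>Confidence interval of \( \tanh^{-1}\rho \), where \( u_{\alpha/2} \) is the two-sided critical value of the standard normal distribution: </li></ul><ul><li>\( \left( z - \frac{u_{\alpha/2}}{\sqrt{n-3}},\ z + \frac{u_{\alpha/2}}{\sqrt{n-3}} \right) \) or \( z \pm \frac{u_{\alpha/2}}{\sqrt{n-3}} \) </li></ul><ul><li>Applying the transformation \( \tanh \) to both limits gives the confidence interval for \( \rho \). </li></ul>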
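The interval above can be sketched in a few lines of Python. This is a minimal illustration assuming Python 3.8+ (for `statistics.NormalDist`); `fisher_ci` is a hypothetical name, and since the r of Example 7.5 is not reproduced here, the usage line uses made-up values of r and n.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, conf=0.95):
    """Approximate confidence interval for rho via Fisher's z transformation.

    z = artanh(r) is treated as normal with mean artanh(rho) and variance 1/(n - 3);
    an interval for artanh(rho) is built on that scale and mapped back with tanh.
    """
    if n < 4 or not -1 < r < 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    u = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 when conf = 0.95
    return math.tanh(z - u * se), math.tanh(z + u * se)


lo, hi = fisher_ci(0.5, 28)   # hypothetical r and n, not Example 7.5
print(round(lo, 3), round(hi, 3))
```

For r = 0.5 and n = 28 this gives approximately (0.156, 0.736). The interval is not symmetric about r on the original scale, because tanh compresses values near ±1.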
  18. Example 7.5 After obtaining r, find a 95% confidence interval for the population correlation coefficient ρ.
  19. 7.4 Rank Correlation 7.4.1 Spearman’s rank correlation coefficient <ul><li>It is useful for: </li></ul><ul><li>ranked data, </li></ul><ul><li>as well as measurement data that </li></ul><ul><li>---- do not follow a normal distribution; </li></ul><ul><li>or whose distribution is unknown; </li></ul><ul><li>or are not precisely measured; </li></ul><ul><li>or when X or Y is an ordinal variable. </li></ul>
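  20. Spearman’s rank correlation coefficient <ul><li>Given n pairs of observations \( (x_1, y_1), \ldots, (x_n, y_n) \): </li></ul><ul><li>sort \( x_1, x_2, \ldots, x_n \) to get the rank \( p_i \) of \( x_i \); </li></ul><ul><li>sort \( y_1, y_2, \ldots, y_n \) to get the rank \( q_i \) of \( y_i \). </li></ul><ul><li>\( r_s \) is Pearson’s correlation coefficient computed from the ranks \( (p_i, q_i) \); when there are no ties it equals \( r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \), where \( d_i = p_i - q_i \). </li></ul>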
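The ranking step and the coefficient can be sketched as follows. This is an illustrative Python version, assuming that tied values receive the average of the ranks they occupy (a common convention that the slides do not spell out); the function names and the data are hypothetical, not those of Example 7.6.

```python
import math

def average_ranks(values):
    """Ranks 1..n (smallest value = rank 1); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s as Pearson's r computed on the ranks p_i and q_i."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    p, q = average_ranks(x), average_ranks(y)
    n = len(p)
    p_bar, q_bar = sum(p) / n, sum(q) / n
    spq = sum((a - p_bar) * (b - q_bar) for a, b in zip(p, q))
    spp = sum((a - p_bar) ** 2 for a in p)
    sqq = sum((b - q_bar) ** 2 for b in q)
    if spp == 0 or sqq == 0:
        raise ValueError("r_s is undefined when all values of a variable are tied")
    return spq / math.sqrt(spp * sqq)


def spearman_rs_no_ties(x, y):
    """Shortcut 1 - 6 * sum(d_i**2) / (n * (n**2 - 1)); valid only without ties."""
    p, q = average_ranks(x), average_ranks(y)
    n = len(p)
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1 - 6 * d2 / (n * (n * n - 1))


x = [10, 20, 30, 40, 50]   # hypothetical data, not Example 7.6
y = [3, 1, 4, 5, 2]
print(spearman_rs(x, y), spearman_rs_no_ties(x, y))
```

With no ties, the two functions agree up to floating-point rounding. `spearman_rs([1, 2, 3, 4, 5], [2, 4, 8, 16, 32])` returns 1.0 because the relationship is perfectly monotone even though it is not linear.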
  21. Example 7.6 An etiological study of liver cancer collected data on the liver-cancer death rate and the relative aflatoxin content of certain foods in 10 counties. The ranks are substituted into the formula for Spearman’s rank correlation coefficient.
  22. 7.4.2 Hypothesis test for \( \rho_s \) (1) Look up the critical value in a special table (Table 9 Critical values for Spearman’s rank correlation coefficient): P < 0.02, so the correlation is significant.
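  23. <ul><li>(2) t test </li></ul><ul><li>Same as the t test for Pearson’s correlation coefficient, with \( r_s \) in place of \( r \): \( t = \frac{r_s\sqrt{n-2}}{\sqrt{1-r_s^2}} \), \( \nu = n - 2 \) </li></ul><ul><li>If the P-value is small (P < α), reject \( H_0 \). </li></ul>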