
Chapter 08correlation


Published in: Business

  1. 1. Chapter 8 Simple Linear Correlation
  2. 2. Up to now, the statistical methods you have learned concern a single variable only, such as: estimating the average height of high school students; comparing the average height of high school students between the city and the countryside. In practice, however, we are often interested in the relationship between two variables. Example: for high school students, is there a linear relationship between height and age? Between height and weight?
  3. 3. In this chapter we study two variables and the linear relationship between them. Two types of questions arise: (1) Is there a linear relationship? This is linear correlation. (2) How can one variable be predicted from another? This is linear regression.
  4. 4. Example 7.1 To explore the correlation between systolic pressure and diastolic pressure (mmHg), 665 girls aged 6 to 10 years were measured. There are two random variables, X and Y. Sample: 665 girls. The individuals in the sample should be independent of each other.
  5. 5. Example 7.2 To explore the correlation between the heights of fathers and sons, 20 male graduates were randomly selected from the list of graduates of a high school, and the heights (cm) of the fathers and sons were measured. Table 7.1 Heights (cm) of 20 pairs of father and son
  6. 6. 7.1 Linear correlation
  7. 7. Scatter diagram: Fig. 7.1 Scatter diagram of systolic and diastolic blood pressures (mmHg) of 665 girls aged 6 to 10 years
  8. 10. 7.2 Correlation Coefficient 7.2.1 Population correlation coefficient. Pearson’s product-moment linear correlation coefficient (also called the simple correlation coefficient) is the mean of the product of the two standardized variables: $\rho = E\left[\frac{X-\mu_X}{\sigma_X}\cdot\frac{Y-\mu_Y}{\sigma_Y}\right] = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$, where $\operatorname{Cov}(X,Y)$ is the covariance between X and Y.
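  9. 11. 7.2.2 Sample correlation coefficient. Pearson’s product-moment sample correlation coefficient r: $r = \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2\,\sum_i (y_i-\bar{y})^2}}$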
  10. 12. A measure of linear relationship. (1) Is there a correlation? If the correlation coefficient is 0, or not large enough, there is no correlation. (2) If the correlation coefficient is large enough: what is the direction of the correlation (positive or negative), and how strong is it (is the absolute value large enough)? Complete correlation: r = +1 or r = −1.
  11. 13. Example 7.3 Calculate the correlation coefficient between the heights of father and son.
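The calculation asked for in Example 7.3 can be sketched in a few lines of Python. This is a minimal illustration, not the slide's own worked solution: the function name `pearson_r` and the five height pairs are assumptions made up for the sketch, because the values of Table 7.1 are not reproduced in this transcript.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation: r = l_xy / sqrt(l_xx * l_yy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    l_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    l_xx = sum((a - mean_x) ** 2 for a in x)
    l_yy = sum((b - mean_y) ** 2 for b in y)
    if l_xx == 0 or l_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return l_xy / math.sqrt(l_xx * l_yy)


# Hypothetical heights (cm); these are NOT the values from Table 7.1.
father = [165.0, 170.0, 172.0, 175.0, 180.0]
son = [168.0, 171.0, 175.0, 174.0, 182.0]

r = pearson_r(father, son)
print(f"r = {r:.3f}")
```

With the real 20 pairs from Table 7.1 in place of the hypothetical lists, the same function would return the r used in the rest of the chapter.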
  12. 14. 7.3 Inference on Correlation Coefficient 7.3.1 Hypothesis test. r is the sample correlation coefficient and changes from sample to sample. There is a population correlation coefficient, denoted by ρ. Question: is ρ = 0 or not? Assumption: X and Y follow a bivariate normal distribution.
  13. 15. H0: ρ = 0, H1: ρ ≠ 0, α = 0.05. (1) Checking a special table (Table 8 in Appendix 2): H0 is rejected, so there is a positive correlation between the heights of father and son. Question: since the P value is very small, can we say the correlation is very strong? Does a small P value mean that the correlation is strong?
  14. 16. Question: If r = 0.90, can you claim that the two variables are correlated with each other? Table 8 Critical values for r
  15. 17. (2) t test (assuming a normal distribution). H0: ρ = 0, H1: ρ ≠ 0. The test statistic is $t = \frac{r}{\sqrt{(1-r^2)/(n-2)}}$ with $\nu = n-2$ degrees of freedom. If the P value < α, reject H0 and conclude that the population correlation coefficient is significantly different from 0. For the example, ν = 20 − 2 = 18. Conclusion: the population correlation coefficient might not be 0.
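The t test on this slide can be sketched as follows. The function name `correlation_t_statistic` and the value r = 0.5 are assumptions for illustration only, since the r obtained in Example 7.3 is not reproduced in this transcript. The critical value 2.101 is the usual two-sided 5% point of the t distribution with 18 degrees of freedom, taken from standard t tables.

```python
import math


def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0, with t = r * sqrt(df / (1 - r^2))."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t, df


# Two-sided critical value of t for alpha = 0.05 and df = 18 (standard t table).
T_CRITICAL_DF18 = 2.101

r_example = 0.5  # hypothetical value, NOT the r from Example 7.3
t, df = correlation_t_statistic(r_example, n=20)
reject_h0 = abs(t) > T_CRITICAL_DF18
print(f"t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Comparing |t| with a tabulated critical value is equivalent to checking whether the P value falls below α; a statistics library would normally be used to obtain the exact P value.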
  16. 18. 7.3.2 Interval estimation. Assumption: X and Y follow a bivariate normal distribution. Prerequisite knowledge: (1) the hyperbolic tangent and its inverse. Hyperbolic tangent: $\tanh z = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$. Inverse hyperbolic tangent: $\tanh^{-1} r = \frac{1}{2}\ln\frac{1+r}{1-r}$.
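  17. 19. Taking the transformation $z = \tanh^{-1} r$, z approximately follows a normal distribution with standard error $1/\sqrt{n-3}$. Confidence interval of $\tanh^{-1}\rho$: $z \pm u_{\alpha/2}/\sqrt{n-3}$, or $\left(z - u_{\alpha/2}/\sqrt{n-3},\ z + u_{\alpha/2}/\sqrt{n-3}\right)$. Taking the hyperbolic tangent of both limits gives the confidence interval of ρ.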
  18. 20. Example 7.5 After obtaining r, find a 95% confidence interval for the population correlation coefficient ρ.
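The interval estimate in Example 7.5 can be sketched with the inverse hyperbolic tangent transformation from the preceding slides. The function name `fisher_confidence_interval` and the inputs r = 0.5, n = 20 are assumptions for illustration; the r value used on the original slide is not reproduced in this transcript.

```python
import math
from statistics import NormalDist


def fisher_confidence_interval(r, n, confidence=0.95):
    """Approximate confidence interval for rho via z = atanh(r)."""
    if n <= 3:
        raise ValueError("need more than 3 pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                      # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)            # approximate standard error of z
    u = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # about 1.96 for 95%
    # Build the interval on the z scale, then transform back with tanh
    return math.tanh(z - u * se), math.tanh(z + u * se)


# Hypothetical inputs, NOT the values from Example 7.5.
lower, upper = fisher_confidence_interval(r=0.5, n=20)
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f})")
```

Because the interval is built on the z scale and transformed back, it is not symmetric around r unless r = 0, and it always stays inside (−1, 1).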
  19. 21. 7.4 Rank Correlation 7.4.1 Spearman’s rank correlation coefficient. It is useful for ranked data, as well as for measurement data that do not follow a normal distribution, whose distribution is uncertain, or that are not precisely measured, and when X or Y is an ordinal variable.
  20. 22. Spearman’s rank correlation coefficient. Given n pairs of observations $(x_1, y_1), \dots, (x_n, y_n)$: sort $(x_1, x_2, \dots, x_n)$ to get the rank $p_i$ of $x_i$, and sort $(y_1, y_2, \dots, y_n)$ to get the rank $q_i$ of $y_i$. The coefficient $r_s$ is Pearson’s correlation coefficient computed from the ranks $(p_i, q_i)$.
  21. 23. Example 7.6 An etiological study of liver cancer collected data on the liver-cancer-specific death rate and the relative content of aflatoxin in certain foods for 10 counties. The ranks are put into the formula for Spearman’s correlation coefficient.
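The ranking steps and the calculation in Example 7.6 can be sketched as below. The helper names `average_ranks` and `spearman_rs`, the choice of average ranks for ties, and the five data pairs are all assumptions for illustration; the county data of Example 7.6 are not reproduced in this transcript.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's r_s: Pearson's r computed on the ranks p_i and q_i."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    p, q = average_ranks(x), average_ranks(y)
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    l_pq = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    l_pp = sum((a - mean_p) ** 2 for a in p)
    l_qq = sum((b - mean_q) ** 2 for b in q)
    if l_pp == 0 or l_qq == 0:
        raise ValueError("r_s is undefined when one variable is constant")
    return l_pq / math.sqrt(l_pp * l_qq)


# Hypothetical data, NOT the county data from Example 7.6.
aflatoxin = [10.0, 20.0, 30.0, 40.0, 50.0]
death_rate = [1.0, 3.0, 2.0, 5.0, 4.0]
rs = spearman_rs(aflatoxin, death_rate)
print(f"r_s = {rs:.3f}")
```

When there are no ties, this agrees with the shortcut 1 − 6Σd²/(n(n² − 1)), where d is the difference between the two ranks of each pair; computing Pearson's r on the ranks also handles ties.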
  22. 24. 7.4.2 Hypothesis test for $\rho_s$. (1) Checking a special table (Table 9, Critical values for Spearman’s rank correlation coefficient): P < 0.02, so the correlation is significant.
  23. 25. (2) t test: the same as the t test for Pearson’s correlation coefficient. If P is small, reject H0.