How to Compute and Interpret Pearson's Product Moment Correlation Coefficient

Objective: Learn how to compute, interpret, and use Pearson's correlation coefficient.

Keywords and Concepts

1. Product moment correlation coefficient
2. Degree of association
3. Karl Pearson
4. X-axis
5. Y-axis
6. Bivariate scatter plot
7. Negative correlation
8. Positive correlation
9. Correlation does not imply causation
10. Property of linearity
11. Nonlinear relationship
12. Coefficient of determination (r²)
13. Percent common variance
14. Statistical significance

The degree of association between two variables (correlation) can be described by a visual representation or by a number (termed a coefficient) indicating the strength of association. The quantitative computation of the correlation was first derived in 1896 by Karl Pearson and is referred to as "Pearson's product moment correlation coefficient."

Visual Description of Quantitative Relationships

A visual description of a correlation appears as a scatter plot, where scores for two variables (from the same or different subjects) are plotted with one variable on the X-axis (horizontal axis) and the other variable on the Y-axis (vertical axis). Figure 1 displays a bivariate scatter plot of the relationship between height (cm) and weight (kg) for a group of male college students enrolled in exercise physiology at the University of Michigan in the fall semester of 2002. Each data point in the graph represents one person's
score on both variables. Note that the pattern of association shows that increasing values of height generally correspond to increasing values of weight.

Figure 1. Plot of height versus weight for a sample of college students. [Scatter plot: Weight, kg (50 to 100) on the vertical axis against Height, cm (170 to 195) on the horizontal axis.]

Figure 2 (next page) displays eight other scatter plot examples. Graphs (a), (b), and (c) depict a pattern of increasing values of Y corresponding to increasing values of X. Proceeding from plot (a) to (c), the dot pattern comes closer to a straight line, suggesting that the relationship between X and Y becomes stronger. The scatter plots in (d), (e), and (f) depict patterns where the Y values decrease as the X values increase. Again, proceeding from graph (d) to (f), the relationship becomes stronger. In contrast to the first six graphs, the scatter plot in (g) shows no pattern (no correlation) between X and Y. Finally, the scatter plot in (h) shows a pattern, but not a straight-line pattern as in the other plots.
Visual inspection of scatter plots permits only a subjective and general description of relationships. To quantify relationships between variables, Pearson's correlation coefficient is used.
Figure 2. Different scatter diagrams.
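The patterns in Figure 2 can be quantified. As a sketch of what the coefficient reports for such patterns, the following Python snippet uses a small helper of our own (`corr`, written in the equivalent deviation-score form of the coefficient) on invented datasets resembling the increasing, decreasing, and nonlinear panels:

```python
from math import sqrt

def corr(xs, ys):
    # Illustrative helper: Pearson's r computed from deviations about the means.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(corr(x, [2, 4, 5, 4, 7]))                  # strong positive, as in (a)-(c)
print(corr(x, [7, 4, 5, 4, 2]))                  # strong negative, as in (d)-(f)
print(corr([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # U-shape, as in (h): exactly 0.0
```

Note the last case: the U-shaped data are perfectly patterned, yet r = 0, foreshadowing the "Property of Linearity" discussion later in this document.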
Pearson's Correlation Coefficient (r)

The several different computational formulas for computing Pearson's r each give the same answer (except for rounding errors). The correlation coefficient should be rounded to three decimal places. Rounding in the middle of a calculation often creates substantial errors; therefore, round off only at the last calculation.

Formula

Pearson's formula for calculating r follows:

           ∑XY/N − (∑X/N)(∑Y/N)
r = ───────────────────────────────────────────        (Eq. 1)
     √[∑X²/N − (∑X/N)²] × √[∑Y²/N − (∑Y/N)²]

where:
N represents the number of pairs of data
∑ denotes the summation of the items indicated
∑X denotes the sum of all X scores
∑X² indicates that each X score should be squared and then those squares summed
(∑X)² indicates that the X scores should be summed and the total squared. [Avoid confusing ∑X² (the sum of the squared X scores) and (∑X)² (the square of the sum of the X scores).]
∑Y denotes the sum of all Y scores
∑Y² indicates that each Y score should be squared and then those squares summed
(∑Y)² indicates that the Y scores should be summed and the total squared
∑XY indicates that each X score should first be multiplied by its corresponding Y score and the products (XY) summed

The numerator in Equation 1 equals the mean of the XY products (∑XY/N) minus the mean of X (X̄) times the mean of Y (Ȳ); the denominator is the standard deviation of X (SDX) times the standard deviation of Y (SDY). [See: How to compute and interpret measures of variability: the range, variance and standard deviation.] Thus, Pearson's formula can be written as:

r = (∑XY/N − X̄ · Ȳ) / (SDX × SDY)

Example

Compute the correlation coefficient (r) for the height-weight data shown in Figure 1. Pertinent calculations are given in Table 1.

Table 1. Height and weight of a sample of college-age males.

Height, cm (X)   Weight, kg (Y)   X·Y       X²       Y²
174              61               10614     30276    3721
175              65               11375     30625    4225
176              67               11792     30976    4489
177              68               12036     31329    4624
178              72               12816     31684    5184
182              74               13468     33124    5476
183              80               14640     33489    6400
186              87               16182     34596    7569
189              92               17388     35721    8464
193              95               18335     37249    9025

∑X = 1813        ∑Y = 761         ∑XY = 138646       ∑X² = 329069       ∑Y² = 59177
∑X/N = 181.3     ∑Y/N = 76.1      ∑XY/N = 13864.6    ∑X²/N = 32906.9    ∑Y²/N = 5917.7
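As a cross-check on the hand calculation that follows, Equation 1 can be applied in code. The sketch below is an illustrative Python function (the name `pearson_r` is ours, not from the text) that accumulates the sums of Table 1 and combines them exactly as the formula prescribes; per the advice above, rounding happens only on the final result.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the raw-score formula (Eq. 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)      # sum of squared X scores
    sum_y2 = sum(y * y for y in ys)      # sum of squared Y scores
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    numerator = sum_xy / n - (sum_x / n) * (sum_y / n)
    denominator = (sqrt(sum_x2 / n - (sum_x / n) ** 2)
                   * sqrt(sum_y2 / n - (sum_y / n) ** 2))
    return numerator / denominator

# Height-weight data from Table 1:
heights = [174, 175, 176, 177, 178, 182, 183, 186, 189, 193]
weights = [61, 65, 67, 68, 72, 74, 80, 87, 92, 95]
print(round(pearson_r(heights, weights), 3))  # 0.986
```

The result matches the worked calculation on the next page.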
r = (∑XY/N − X̄ · Ȳ) / (SDX × SDY)

r = (13864.6 − (181.3)(76.1)) / (√(32906.9 − (181.3)²) × √(5917.7 − (76.1)²))

r = 67.67 / 68.605

r = 0.986

Interpreting Pearson's Correlation Coefficient

The usefulness of a correlation depends on its size and significance. If r reliably differs from 0.00, the r-value is statistically significant (i.e., it does not result from a chance occurrence), implying that if the same variables were measured on another set of similar subjects, a similar r-value would result. If r achieves significance, we conclude that the relationship between the two variables is not due to chance.

How to Evaluate a Correlation

The value of r always falls between −1 and +1, and the value does not change if all values of either variable are converted to a different scale. For example, if the weights of the students in Figure 1 were given in pounds instead of kilograms, the value of r would not change (nor would the shape of the scatter plot). The size of a correlation is generally evaluated as follows:

Correlation Value   Interpretation
≤ 0.50              Very low
0.51 to 0.79        Low
0.80 to 0.89        Moderate
≥ 0.90              High (Good)
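The evaluation scale above translates directly into a lookup. Here is a minimal sketch (the function name `interpret_r` is ours) that returns the qualitative label for an r value; negative correlations are read by absolute value, as the next paragraph explains:

```python
def interpret_r(r_value):
    """Qualitative label for a correlation, per the evaluation scale above.

    Negative correlations carry the same interpretation as positive ones,
    so the absolute value is used.
    """
    r_abs = abs(r_value)
    if r_abs >= 0.90:
        return "High (Good)"
    if r_abs >= 0.80:
        return "Moderate"
    if r_abs > 0.50:
        return "Low"
    return "Very low"

print(interpret_r(0.986))   # High (Good) -- the height-weight example
print(interpret_r(-0.85))   # Moderate
```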
A high (or low) negative correlation has the same interpretation as a high (or low) positive correlation. A negative correlation indicates that high scores on one variable are associated with low scores on the other variable (see Figure 2, graphs d, e, f).

Correlation Does Not Imply Causation

"CORRELATION DOES NOT IMPLY CAUSATION!" Just because one variable relates to another does not mean that changes in one cause changes in the other. Other variables may be acting on one or both of the related variables and affect them in the same direction. Cause and effect may be present, but correlation does not prove cause. For example, the length of a person's pants and the length of their legs are positively correlated: people with longer legs have longer pants; but increasing one's pant length will not lengthen one's legs!

Property of Linearity

The conclusion of no significant linear correlation does not mean that X and Y are not related in any way. The data depicted in Figure 2h result in r = 0, indicating no linear correlation between the two variables. However, close examination shows a definite pattern in the data reflecting a very strong "nonlinear" relationship. Pearson's correlation applies only to linear data.

Coefficient of Determination (r²)

The relationship between two variables can be represented by the overlap of two circles representing each variable (Figure 3). If the circles do not overlap, no relationship exists; if they overlap completely, the correlation equals r = 1.0. If the circles overlap somewhat, as in Figure 3 (next page), the area of overlap represents the amount of variance in the dependent variable (Y) that can be explained by the independent variable (X).
The area of overlap, called the percent common variance, is calculated as r² × 100. For example, if two variables are correlated r = 0.71, they have 50% common variance (0.71² × 100 ≈ 50%), indicating that 50% of the variability in the Y-variable can
be explained by variance in the X-variable. The remaining 50% of the variance in Y remains unexplained. This unexplained variance indicates the error when predicting Y from X. For example, strength and speed are related at about r = 0.80 (r² = 64% common variance), indicating that 64% of the variance in strength and speed comes from common factors and the remaining 36% remains unexplained by the correlation.

Figure 3. Example of the coefficient of determination (percent common variance, r² × 100). [Two overlapping circles: the X-variable (predictor) and the Y-variable (predicted variable); the area of overlap is r² × 100, the percent common variance.]

Statistical Significance of a Correlation

When the number of pairs used to compute r is small, a spuriously high value can occur by chance. For example, suppose the numbers 1, 2, and 3, each written on a separate piece of paper, are placed in a hat. The numbers are then blindly drawn one at a time on two different occasions. The possibility exists that the numbers could be drawn in the same order twice. This would produce r = 1.0 (a perfect correlation), but this value would be a chance occurrence, since no known factor(s) can cause such a relationship. In contrast, the odds of 100 numbers being randomly drawn in the same order twice are very low. Thus, if the r value with Npairs = 100 is high, we conclude that chance cannot be a factor explaining the correlation. Thus, the number of pairs of values (N) determines
the odds that a relationship could happen by chance. If N is small, r must be large to be significant (not caused by chance). When N is large, even a small r-value may be significant. Table 2 is used to determine the significance of r. The left column, df, represents the degrees of freedom: df = Npairs − 2 (the number of pairs of XY scores minus 2). df represents the number of values that are free to vary when the sum of the variable is set; df compensates for small values of N by requiring higher absolute values of r before the correlation is considered significant.

Step 1. Find the degrees of freedom in the left column: df = Npairs of data − 2.

Step 2. Read across the df row and compare the obtained r value with the value listed in one of the columns. The heading at the top of each column indicates the odds of a chance occurrence (the probability of error when declaring r to be significant): p = .10 is the 10% probability level; p = .05 is the 5% probability level; and p = .01 is the 1% probability level. Reading from the table for df = 10, a correlation as high as r = 0.497 occurs 10 times in 100 by chance alone (p = .10); r = 0.576 occurs 5 times in 100 by chance (p = .05); and r = 0.708 occurs 1 time in 100 by chance (p = .01). If r falls between the values in any two columns, use the left of the two columns (greater odds of chance). If r does not equal or exceed the value in the p = .10 column, it is said to be nonsignificant (NS). Negative r's are read using the absolute value of r.

Table 2. Values of the Correlation Coefficient (r)

df     p = .10   p = .05   p = .01
1      .9877     .9969     .9999
2      .900      .950      .990
3      .805      .878      .959
4      .729      .811      .917
5      .669      .754      .875
6      .621      .707      .834
7      .582      .666      .798
8      .549      .632      .765
9      .521      .602      .735
10     .497      .576      .708
11     .476      .553      .684
12     .457      .532      .661
13     .441      .514      .641
14     .426      .497      .623
15     .412      .482      .606
16     .400      .468      .590
17     .389      .456      .575
18     .378      .444      .561
19     .369      .433      .549
20     .360      .423      .537
25     .323      .381      .487
30     .296      .349      .449
35     .275      .325      .418
40     .257      .304      .393
45     .243      .288      .372
50     .231      .273      .354
60     .211      .250      .325
70     .195      .232      .302
80     .183      .217      .283
90     .173      .205      .267
100    .164      .195      .254

From Biometrika Tables for Statisticians (Vol. 1, 3rd ed.) by E.S. Pearson and H.O. Hartley (Eds.), 1966, London: Biometrika Trustees. Copyright 1966 by Biometrika.

http://www.umich.edu/~exphysio/MVS250/PearsonCorr.doc
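Steps 1 and 2 above can be sketched as a table lookup. The Python function below (the name `is_significant_05` and the abbreviated critical-value dictionary are ours) transcribes part of the p = .05 column of Table 2 and applies the two steps:

```python
# Critical values of r at p = .05, transcribed from Table 2 (abbreviated to df 1-15).
CRITICAL_R_05 = {
    1: .9969, 2: .950, 3: .878, 4: .811, 5: .754,
    6: .707, 7: .666, 8: .632, 9: .602, 10: .576,
    11: .553, 12: .532, 13: .514, 14: .497, 15: .482,
}

def is_significant_05(r_value, n_pairs):
    """Step 1: df = Npairs - 2.  Step 2: compare |r| against the tabled value.

    Negative r's are read using the absolute value, per the text.
    """
    df = n_pairs - 2
    critical = CRITICAL_R_05[df]
    return abs(r_value) >= critical

# The height-weight example: r = 0.986 with N = 10 pairs (df = 8, critical .632).
print(is_significant_05(0.986, 10))  # True
```

A fuller implementation would cover every df row in Table 2 (interpolating or taking the next-lower df for values between rows, as the text directs).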
