Simple correlation



  1. 14-12-13. Magdy Ibrahim Mostafa, Prof. Obstetrics & Gynecology, Faculty of Medicine, Cairo University. Director, Research, Biostatistics & IT Units, MEDC, Cairo University. Management member, EBM Unit, MEDC, Cairo University. Scientific Council Member, Egyptian IT Fellowship. Board Member, Egyptian Ob/Gyn Fellowship. Associate Editor, Kasr Al Aini Journal of Obstetrics and Gynecology. Peer Reviewer: Gyn Endocrin J, Gyn Oncol J, Obstet Gynecol Invest Journal. Peer Reviewer: Cairo University Medical Journal, Kasr El Aini Medical Journal, MEFS Journal.
  2. Correlation. In two series of numerical data (for example Age and Height, or Age and BMD), the values in one variable may vary correspondingly with the other one. Correlation is either positive (+ve) or negative (-ve). When the two variables increase and decrease in parallel (same direction), the correlation is positive. When one goes up as the other goes down proportionally (opposite directions), the correlation is negative. Correlation ≠ causation.
  3. Importance of correlation: 1. Facilitates difficult measures. 2. Study of effectors: the dependent variable (outcome) and the independent variable(s) (predictors or effectors). [Scatter diagram: correlation between payment and working hours]
  4. Correlation between payment and working hours. Conclusion: 1. As working hours increase, payment increases: positive correlation (proportionate correlation). 2. The increase in payment is constant in relation to the increase in working hours: linear correlation. [Scatter diagram: correlation between TV watching and school grade]
  5. Correlation between TV watching and school grade. Conclusion: 1. As TV watching hours increase, the final school grade decreases: negative correlation (inverse correlation). 2. The decrease in grade is constant in relation to the increase in TV watching hours: linear correlation. [Scatter diagram: correlation between Age and Height]
  6. Correlation between Age and Height. [Scatter diagrams, four panels: positive linear correlation (Weight (g) vs. Gestational age (weeks)); negative linear correlation (Distance before discomfort (m) vs. Age (years)); no correlation (Height (cm) vs. Age (years)); non-linear correlation (Amniotic fluid volume (ml) vs. Gestational age (weeks))]
  7. Non-linear correlation. [Figure, two panels of Height vs. Age (years): "Linear", a straight line fitted to the points; "Non-Linear", a curve fitted to the same points. The curve is closer to the points.] The correlation coefficient (meaning and magnitude): Examining plots is a good way to determine the nature and strength of the relationship between two variables. However, you need an objective measure to replace subjective descriptions like "strong", "weak", "I can't make up my mind", and "none".
  8. The correlation coefficient (meaning and magnitude): Mathematically, correlation is represented by what is known as the correlation coefficient. In magnitude, the correlation coefficient ranges from 0 (no correlation) to 1 (perfect correlation). The sign gives the direction and is not part of the magnitude. The correlation coefficient (interpretation of "cc"):
     From 0 to 0.25 (0 to -0.25) = little or no relationship
     From 0.25 to 0.50 (-0.25 to -0.50) = fair
     From 0.50 to 0.75 (-0.50 to -0.75) = moderate to good
     Greater than 0.75 (or below -0.75) = very good to excellent
     A strong relation may not be clinically important.
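
The interpretation bands above can be sketched as a small lookup. This is only an illustration: the function name `interpret_cc` is hypothetical, and because the slide's bands share their end points, the sketch assumes a boundary value (0.25, 0.50, 0.75) belongs to the lower band.

```python
def interpret_cc(r: float) -> str:
    """Map a correlation coefficient to the descriptive label used on the slide."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # the sign only gives the direction, so it is dropped here
    if size <= 0.25:   # assumption: boundary values fall in the lower band
        return "little or no relationship"
    if size <= 0.50:
        return "fair"
    if size <= 0.75:
        return "moderate to good"
    return "very good to excellent"


# The three coefficients quoted on the later slides of this deck
for r in (0.012, -0.73, 0.983):
    direction = "negative" if r < 0 else "positive"
    print(f"cc = {r}: {interpret_cc(r)} ({direction})")
```
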
  9. The correlation: Does NOT tell us if Y is a function of X. Does NOT tell us if X is a function of Y. Does NOT tell us if X causes Y. Does NOT tell us if Y causes X. The coefficient does NOT tell us what the scatterplot looks like. Correlation between Age and Height, strength of correlation: [Scatter diagrams, Height (cm) vs. Age (years): cc = 0.012, weak; cc = 0.983, strong]
  10. Correlation between Age and Height, direction of correlation: [Scatter diagrams: Height vs. Age (years), cc = 0.983, positive; Number of cold episodes/year vs. Exposure to sun (h/week), cc = -0.73, negative]. Correlation coefficient (X is the independent variable, Y the dependent variable): +1 means X increases and Y increases; 0 means X changes and Y does not follow; -1 means X increases and Y decreases.
  11. Which test? For linear correlation: with normal data, the Pearson product moment correlation (r); with non-normal data, the Spearman correlation (R). The Pearson correlation is a measure of the strength of the linear correlation between two variables in one sample. "r" indicates: the strength of the relationship (strong, weak, or none), with a magnitude from 0 to 1; and the direction of the relationship, either negative (-ve) or positive (+ve).
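
The slides name the Pearson product moment correlation without writing it out. For reference, the standard textbook definition for n paired observations is given below; the symbols x_i, y_i and the sample means are notation introduced here, not taken from the slides.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```
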
  12. Pearson correlation. Assumptions: 1. Variables are quantitative or ordinal. 2. Normally distributed variables. 3. Linear relationship (monotonic + constant change). The Pearson "r" is: symmetric, since the correlation of x and y is the same as the correlation of y and x; and unaffected by linear transformations, such as adding a constant to all numbers or dividing all numbers by a constant. WARNING: Never compute correlation coefficients for nominal variables, even if they are nicely coded with numbers. A correlation between governorate and income is meaningless.
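
The two properties of "r" stated above (symmetry and invariance under linear transformations) can be checked numerically. The sketch below uses made-up working-hours and payment figures, and `pearson_r` is a plain implementation of the textbook formula written for this illustration. Note that multiplying or dividing one variable by a negative constant keeps the magnitude of r but flips its sign.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
hours = [10, 20, 30, 40, 50]
pay = [120, 210, 330, 390, 510]

r = pearson_r(hours, pay)
r_swapped = pearson_r(pay, hours)                      # symmetry
r_transformed = pearson_r([h + 5 for h in hours],      # add a constant
                          [p / 10 for p in pay])       # divide by a constant
r_negated = pearson_r(hours, [-p for p in pay])        # negative constant

print(isclose(r, r_swapped), isclose(r, r_transformed), isclose(r_negated, -r))
```
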
  13. The Pearson correlation "r" is a measure of LINEAR ASSOCIATION. When "r" = ZERO, this means NO LINEAR CORRELATION; it does NOT mean there is NO CORRELATION.
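
A minimal sketch of this point, using invented data: y is completely determined by x (y = x squared), yet the Pearson coefficient is zero because the relationship is not linear. `pearson_r` is a plain implementation of the textbook formula written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented data: a perfect but non-linear (U-shaped) relationship
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(r)  # zero: no LINEAR correlation, although y depends entirely on x
```
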
  14. [No text on this slide]
  15. [No text on this slide]
  16. Pearson correlation. Pearson "r" is an appropriate summary measure for the first plot only, since the data are near a straight line. In the second plot, the relationship is not linear, so it doesn't make sense to describe how tightly the points cluster around a straight line. In the third plot, the perfect relationship is distorted by an outlier point. In the fourth plot, there appear to be two subgroups of cases in which there is no linear relationship between the two variables.
  17. Pearson correlation. If you don't plot your data, you can't tell whether a correlation coefficient is a good summary of the relationship. The value of a correlation coefficient also depends on the range of values for which observations are taken. Even if there is a linear relationship between two variables, you won't detect it if you consider a small range of values of the variables. For example, height may be a poor predictor of weight if you restrict your range of heights to those over six feet. No extrapolation.
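
The effect of a restricted range can be sketched with synthetic data. Everything below is invented for illustration: heights of 150 to 199 cm, weights built from a straight line plus a deterministic wobble, and a cut-off of 183 cm standing in for "over six feet". The same underlying linear relationship yields a noticeably smaller r once only the tall subjects are kept.

```python
from math import sin, sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Synthetic data (not from the slides): linear trend plus a fixed wobble
heights = [150 + i for i in range(50)]                        # cm
weights = [0.9 * h - 90 + 10 * sin(2.3 * i)                   # kg
           for i, h in enumerate(heights)]

# Keep only subjects of roughly six feet (about 183 cm) and taller
tall = [(h, w) for h, w in zip(heights, weights) if h >= 183]
tall_heights = [h for h, _ in tall]
tall_weights = [w for _, w in tall]

r_all = pearson_r(heights, weights)
r_tall = pearson_r(tall_heights, tall_weights)
print(f"full range: r = {r_all:.2f}; heights >= 183 cm only: r = {r_tall:.2f}")
```
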
  18. Pearson correlation. Limitations: Linearity: it can't describe non-linear relationships (most biological relations). Truncation of range: it underestimates the strength of the relationship if you can't see the full range of x values. No proof of causation. Testing hypothesis: The Pearson correlation coefficient r describes the correlation between the sample observations on two variables in the same way that ρ describes the relationship in a population. Thus we need to know whether we may conclude that ρ ≠ 0. The hypotheses are: H0: ρ = 0 (no correlation in the population); Ha: ρ ≠ 0 (there is correlation in the population).
  19. Testing hypothesis. The test used is the t test (revise the uses of the t test). Statistically significant doesn't mean clinically important or useful. If you are examining many correlation coefficients, you have to use the Bonferroni adjustment. Coefficient of determination: The square of the Pearson cc, r², is the proportion of variation in the values of y that is explained by the regression model with x. It is the amount of variance in y accounted for by x, and the percentage increase in accuracy you gain by using the regression line to make predictions. 0 ≤ r² ≤ 1 (100%). The larger r², the stronger the linear relationship. The closer r² is to 1, the more confident we are in our prediction.
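
The slide names the t test without giving the statistic. A minimal sketch, assuming the usual textbook form t = r·sqrt(n - 2)/sqrt(1 - r²) with n - 2 degrees of freedom, together with r² and the Bonferroni-adjusted significance level (alpha divided by the number of coefficients tested). The function names and the sample size n = 30 are hypothetical; the value r = -0.73 is the coefficient quoted on slide 10.

```python
from math import sqrt


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level when n_tests coefficients are examined."""
    return alpha / n_tests


r = -0.73   # coefficient quoted on slide 10
n = 30      # hypothetical sample size, not given in the slides

t = t_statistic(r, n)
r_squared = r * r                      # coefficient of determination
alpha_each = bonferroni_alpha(0.05, 10)

print(f"t = {t:.2f} on {n - 2} df; r^2 = {r_squared:.4f}; "
      f"per-test alpha for 10 tests = {alpha_each}")
```

The resulting t is compared with the critical value of the t distribution for n - 2 degrees of freedom at the chosen (possibly Bonferroni-adjusted) significance level.
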
  20. Coefficient of determination, example. The topography of adipose tissue (AT) is associated with metabolic complications considered as risk factors for cardiovascular disease. It is therefore important to measure the amount of intraabdominal AT as part of the evaluation of the cardiovascular-disease risk of an individual. However, computed tomography (CT), the only available technique that precisely and reliably measures the amount of deep abdominal AT, is costly, exposes the subject to irradiation, and is not available to many physicians.
  21. Example. Despres and his colleagues conducted a study to develop equations to predict the amount of deep abdominal AT from simple anthropometric measurements. Among the measurements taken on each subject were deep abdominal AT obtained by CT and waist circumference. The question of interest is how well deep abdominal AT correlates with waist circumference. Spearman correlation: It is a measure of the strength and direction of the association that exists between two variables measured on at least an ordinal scale. It is denoted by the symbol rs or R. The test is used either for ordinal variables or for interval/ratio data that have failed the assumptions necessary for conducting the Pearson product-moment correlation. The values of the variables are converted into ranks and then correlated.
  22. Spearman correlation. Assumptions: 1. Variables are measured on an ordinal, interval or ratio scale. 2. Variables need NOT be normally distributed. 3. There is a monotonic relationship (either the variables increase in value together, or as one variable value increases the other variable value decreases), but linearity is not needed. 4. This type of correlation is NOT very sensitive to outliers. SPSS work.
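
The rank-then-correlate procedure can be sketched as follows. The helper names are hypothetical, tied values are given the average of their ranks (a common convention, assumed here), and the data are invented: y grows with x monotonically but not linearly, so the Spearman coefficient is 1 while the Pearson coefficient is below 1.

```python
from math import isclose, sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Invented data: monotonic but clearly non-linear
x = [1, 2, 3, 4, 5, 6]
y = [value ** 3 for value in x]

rs = spearman_rs(x, y)
r = pearson_r(x, y)
print(f"Spearman = {rs:.3f}, Pearson = {r:.3f}")
```
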
  23. Example:
      no.   x (WC)   y (Abd. AT)
      1     74.8     28.8
      2     72.6     25.7
      3     84.0     42.8
      4     74.7     25.9
      5     71.9     21.7
      6     80.9     39.1
      7     83.4     42.6
      Multiple Correlation
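
As a worked illustration, the Pearson coefficient and r² can be computed for the seven pairs listed in the example table. The slide appears to show only the first rows of the study data, so the values obtained here describe these seven pairs only and are not the study's result. `pearson_r` is a plain implementation of the textbook formula written for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# The seven rows shown on the slide: waist circumference and deep abdominal AT
wc = [74.8, 72.6, 84.0, 74.7, 71.9, 80.9, 83.4]
abd_at = [28.8, 25.7, 42.8, 25.9, 21.7, 39.1, 42.6]

r = pearson_r(wc, abd_at)
r_squared = r * r   # share of the variation in AT explained by WC in these rows

print(f"r = {r:.3f}, r^2 = {r_squared:.3f} (seven listed pairs only)")
```
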