
4. correlations



Published in: Education

  1. 1. Steve Saffhill Research Methods in Sport & Exercise Correlations
  2. 2. • We cannot prove anything by science. • This does not detract from the importance of science, nor from its amazing achievements. • If we cannot prove, then what can we do? • We can make statements based on probability. • Probability can be likened to the ‘odds’ of something happening. • ‘Real and not due to chance’
  3. 3. Remember!! We cannot prove, so we need to make statements about how confident we are in what we are saying. How likely is it that the result is due to the intervention (IV) rather than chance? Statistics are used to judge whether the data are consistent with the no-effect statement (called the null hypothesis). If the data would be unlikely under the null hypothesis, we reject it and take this as support for the research hypothesis.
  4. 4. How inferential statistics work • Inferential statistics test a null hypothesis • They produce a probability value (the “p value”) for you to interpret! • The p value is the probability of obtaining an apparent relationship (or difference, in t-tests/ANOVA) between two or more things at least as large as the one observed if only chance were at work, i.e. if the null hypothesis were true!
  5. 5. P Values • If p = .05 then a result at least as extreme as ours would occur in only 5 cases out of 100 if the null hypothesis were true. • If p = .01 then such a result would occur in only 1 case out of 100 if the null were true. • If we reject the null whenever p < .05, we accept a 5% chance of rejecting it when in fact it is true (1% if we use .01). • Rejection of the null hypothesis when it is in fact true is a Type I error
  6. 6. What p value do we use? • SPSS gives us a p value when we run our stats. We also set a threshold at the start of the study to compare it to, usually 0.05 (.05 is the same thing). This threshold is called alpha (α). • So... if SPSS gives us a p value less than the alpha we set at the start of the study (i.e., p<.05) then we say that our results are unlikely to be due to chance! • And... we reject the null hypothesis! • If the p value is more than .05 (i.e., p>.05) then we conclude there is not enough evidence of a relationship or difference • And... we fail to reject (retain) the null hypothesis!
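The decision rule on this slide can be sketched in a few lines of Python. The function name `decide` and its return strings are illustrative assumptions, not SPSS output; the only inputs are the p value reported by the software and the alpha set before the study.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a reported p value with the alpha chosen before the study."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"


print(decide(0.002))              # p < .05
print(decide(0.20))               # p > .05
print(decide(0.03, alpha=0.01))   # a stricter alpha changes the decision
```

Keeping alpha as a parameter makes the point that the threshold is chosen by the researcher in advance, not produced by the software.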
  7. 7. Inferential Test: Correlation (testing for relationships) • Parametric data: Pearson product-moment correlation • Non-parametric data: Spearman rank-order correlation
  8. 8. Parametric Assumptions Reminder: 1. The data must be randomly sampled 2. The data must be high level data (interval/ratio not nominal or ordinal) 3. The data must be normally distributed a) curve & b) z scores!!! 4. The data must be of equal variance
  9. 9. Correlation Correlation = association or ‘going together’ between variables • Expenditure is correlated with income. • Swimming speed is correlated with stroke rate. • High jump performance is correlated with height. Each statement says that more of one variable is associated with more of a second variable.
  10. 10. • However ‘more’ is vague - mathematically we need to quantify what is meant by ‘more’? • The mathematical technique of correlation was devised to specify the extent to which two things (variables) are associated.
  11. 11. SPSS gives us a value for the correlation coefficient = the number used to express the extent of association. THIS IS CALLED THE r value. • Perfect positive association, i.e. more of one variable is always accompanied by proportionally more of the other, gives a correlation coefficient of +1.00 (r = 1). • If there is no linear association between two variables then the correlation coefficient r = 0.00. • Most pairs of variables will have values somewhere between 0 and 1.00 in size (either + or -).
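To make the r value less of a black box, here is a minimal sketch of the Pearson product-moment formula in plain Python. The function name `pearson_r` and the example numbers are made up for illustration; SPSS reports the same quantity.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]            # made-up values
goes_up = [2, 4, 6, 8, 10]         # rises in step with hours
goes_down = [10, 8, 6, 4, 2]       # falls in step with hours

print(pearson_r(hours, goes_up))    # perfect positive association
print(pearson_r(hours, goes_down))  # perfect negative association
```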
  12. 12. Positive correlation ‘Improved physical fitness is related to increased levels of exercise’ • In this case more of one variable (fitness) is accompanied by more of the other (training) • Another way of expressing this is as a direct relationship.
  13. 13. Negative correlation ‘Outside temperature and weight of clothing worn’ • In this case more of one variable (temperature) is accompanied by less of the other (weight of clothing) • Another way of expressing this is as an inverse relationship. • Instead of running between 0.00 and +1.00, a negative correlation coefficient takes values between 0.00 and –1.00
  14. 14. Range of correlation coefficients Values may be interpreted as follows: 0.2 = a tendency to be related 0.5 = moderate relationship 0.9 = strong relationship
  15. 15. Graphical representation of Correlations Such a graph is referred to as a scatter graph or scatter plot
  16. 16. Examples of correlation coefficients (r value) • The statistic for testing a supposed LINEAR association between two variables has the symbol r • The line on the scatter graph around which the points are evenly dispersed is called by various names, e.g. the line of best fit or regression line • The closer r is to 1.00 (+ or -), the more tightly the points cluster around the line of best fit, i.e., the more linear the relationship
  17. 17. Graphical representation Good News. It is quite possible, from inspection of a scatter plot, to do two things: (1) Determine whether there is a linear relationship between the variables, in which case the correlation inferential test is a meaningful statistic to use (2) Fairly accurately estimate what the value of the correlation statistic (r) would be if calculated.
  18. 18. Bad News. The correlation coefficient on its own does not show the shape of the relationship, and it does not show causality. A researcher runs 4 correlation tests and gets an r value of .94 from SPSS for all 4 of them!!! What does it mean? Very different patterns of data can produce the same r. The moral: always work from the scatter plot first and decide if the Pearson correlation is a suitable statistic to use.
  19. 19. SPSS Correlations output for four pairs of variables (Pearson correlation; Sig. 2-tailed; N): • X1 with Y1: r = .816**, Sig. = .002, N = 11 • X1 with Y2: r = .816**, Sig. = .002, N = 11 • X1 with Y3: r = .816**, Sig. = .002, N = 11 • X2 with Y4: r = .817**, Sig. = .002, N = 11 (** Correlation is significant at the 0.01 level, 2-tailed.) [Figure: four scatter plots: Y1 against X1, Y2 against X1, Y3 against X1, Y4 against X2]
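The point of this slide can be reproduced in code. The sketch below assumes the four panels are Anscombe's quartet, a well-known published dataset whose first three panels share the same x values and give r of about .816 with N = 11, matching the output above. That identification is an assumption; the slides do not name the data. Only the first two panels are shown, and `pearson_r` is an illustrative helper name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Assumed data: panels 1 and 2 of Anscombe's quartet (not taken from the slides)
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

r1 = pearson_r(x1, y1)  # roughly linear scatter
r2 = pearson_r(x1, y2)  # smooth curve, clearly not a straight line
print(round(r1, 3), round(r2, 3))
```

Both pairs give almost the same r even though a scatter plot of the second pair is a curve, which is why the plot should be inspected before the coefficient is interpreted.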
  20. 20. Limitations of correlation studies Correlation does not imply causation A correlation between two variables does not mean that one causes the other. • Does anxiety cause a reduction in performance? • Does performance cause anxiety? • Or is it something else unidentified that is leading to an increase?
  21. 21. Causation can only be shown via an experimental study in which an independent variable can be manipulated to bring about an effect. • E.g. does EPO use improve cycling performance?
  22. 22. Experimental Design Example • Does EPO use improve cycling performance? • Two groups of trained cyclists • EPO vs Control groups • Hypothesis: EPO use will improve cycling performance • We’ve now set up a hypothesis to test! • Following testing, EPO users performed better • But… what else might have led to the data we found??
  23. 23. Performance differences may be due to: • Type of bike • Genetics • Day of the week • Time of day • Fatigue • EPO dose • Individual response • Gender • Altitude training
  24. 24. • How do we know that the difference we observed has been caused by our manipulation of the IV (EPO v no EPO) and not one of the other factors? • We can limit the impact of these other factors!!! • Done by randomly allocating people to the conditions of our IV • This reduces the probability that the 2 groups differ systematically on things like training volume etc. and thus makes these much less plausible as causes! • = more confidence in our ability to infer a causal relationship!
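Random allocation is simple to carry out in practice. The following is a minimal sketch, assuming 20 cyclists identified by made-up IDs; the function name `randomly_allocate` and the group labels are illustrative.

```python
import random


def randomly_allocate(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size."""
    rng = random.Random(seed)      # a seed only makes the example repeatable
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"EPO": shuffled[:half], "Control": shuffled[half:]}


cyclists = [f"cyclist_{i:02d}" for i in range(1, 21)]  # hypothetical IDs
groups = randomly_allocate(cyclists, seed=1)
print(len(groups["EPO"]), len(groups["Control"]))
```

Because chance alone decides who goes where, factors such as genetics or training volume should be spread roughly evenly across the two groups on average, although any single allocation can still be unbalanced.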
  25. 25. For example, there is a strong positive correlation between death by drowning and ice cream sales. When many ice creams are sold, more people die by drowning. You could not conclude that ice cream causes drowning, any more than you could conclude that a high incidence of drowning causes people to buy ice cream. Why do you think the two variables are strongly positively correlated? Clue: look for a variable that affects both ice cream sales and drowning in a like fashion.
  26. 26. Interpreting reliability of correlation results • If the study were to be repeated what is the chance of obtaining the same result? • To test the reliability we need a research hypothesis and a null hypothesis • Research hypothesis - a relationship exists • Null hypothesis - no relationship exists
  27. 27. Beware: sample size exerts a considerable effect on reliability. • A weak correlation can be statistically significant (reliable) if the sample size is large. • A strong correlation can be non-significant if the sample size is small. What do you do? Apply your own judgement - statistics will not interpret your results!
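The effect of sample size can be seen by converting r to a t statistic with n - 2 degrees of freedom, which is the standard significance test for Pearson's r. The sketch below is illustrative: the r and n values are made up, and the critical values are two-tailed .05 entries copied from a standard t table.

```python
from math import sqrt


def t_from_r(r, n):
    """t statistic for testing 'no linear relationship', with n - 2 df."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Two-tailed critical t values at alpha = .05, from a standard t table
CRITICAL_T = {6: 2.447, 198: 1.972}

weak_large = t_from_r(0.2, 200)   # weak r, large sample (df = 198)
strong_small = t_from_r(0.6, 8)   # stronger r, small sample (df = 6)

print(round(weak_large, 2), weak_large > CRITICAL_T[198])
print(round(strong_small, 2), strong_small > CRITICAL_T[6])
```

With these numbers the weak correlation passes the threshold and the stronger one does not, which is exactly the trap the slide warns about.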
  28. 28. The meaningfulness of r • Besides asking whether r is reliable, we need to know how meaningful the value is. • A correlation may exist and be reliable (significant), but what does it mean? • What does it tell us about the relationship between the two variables? • The association might be statistically significant but is it of any importance? • Meaningfulness is often interpreted by the coefficient of determination, R²
  29. 29. • In this method, the portion of common association of the factors that influence the two variables is determined. • In other words, the coefficient of determination (R²) indicates that portion of the total variance in one measure that can be explained, or accounted for, by the variance in the other measure. • Standing long jump and vertical jump, for example. • R² = r × r • Shared variance = R² × 100 = ?%
  30. 30. • What is equally interesting is the unexplained variance. • If r = 0.7, then R² = 0.49 (0.7 × 0.7 = .49) • Shared variance = 49% • More than half of the variance in each measure is not shared with the other (51% is explained by something else).
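The arithmetic on this slide is short enough to check directly. A minimal sketch, with the function name `shared_variance_percent` chosen for illustration:

```python
def shared_variance_percent(r):
    """Coefficient of determination (r squared) expressed as a percentage."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r * 100


r = 0.7
shared = shared_variance_percent(r)
unexplained = 100 - shared
print(round(shared), round(unexplained))  # 49 51
```

Because r is squared, a negative correlation of -0.7 gives the same 49% of shared variance.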
  31. 31. What Shared Variance means: The Venn diagram illustrates the meaning of the coefficient of determination - the area of common variance (area of overlap)
  32. 32. • The unexplained variance is due to unique factors applicable to each event; • i.e. factors that affect one variable but not the other and vice versa. • Of course, the study is not designed to give this answer but it often generates some interesting discussion and may point the way for future research.
  33. 33. Types of Correlation Statistics. Pearson’s r: • A parametric statistic – both variables must exhibit parametric properties. • If one of the variables is not parametric then an alternative measure of association is chosen. Spearman’s rho: • May be used for non-parametric data. Chi Square measure of association: • Use for nominal data (gender, position etc.)
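Spearman's rho can be understood as Pearson's r calculated on ranks rather than raw scores. The sketch below follows that definition in plain Python, giving tied scores the average of their ranks; the function names and example numbers are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # rises with x, but not in a straight line
print(spearman_rho(x, y), round(pearson_r(x, y), 3))
```

Here rho is 1 because the ordering of the two variables agrees perfectly, while Pearson's r is lower because the relationship is curved.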
  34. 34. Summary i. Check for parametric properties ii. Always view the scatter graph first iii. Check that relationship appears linear iv. Look for outliers v. Consider carefully the range you have sampled vi. Calculate r² vii. Explain the shared variance and unexplained variance