Steve Saffhill
Research Methods in Sport & Exercise
Correlations
 We cannot prove anything by science.
 This does not detract from the importance of science, nor does it detract
from the amazing achievements of science.
 If we cannot prove then what can we do?
 We can make statements based on probability.
 Probability can be likened to the ‘odds’ of something
happening.
 ‘Real and not due to chance’
Remember!!
We cannot prove – therefore we need to make statements about
how confident we are in saying what we are saying.
What is the probability that the result is due to the
intervention (IV)?
Statistics are used to determine the probability that the ‘no effect’
statement (called the null hypothesis) is not supported.
If we are confident that the null hypothesis is not supported
then we can confidently accept the research hypothesis.
How inferential statistics work
• Inferential statistics test a null hypothesis
• They produce a probability value “p value” for you to interpret!
• This value expresses whether an apparent relationship (or difference, in
t-tests/ANOVA) between two or more things is likely to be down to chance or
not!
P Values
• If p = .05 then in 95 cases out of 100 the result is real
and not due to chance (i.e., there is a 5% chance of
rejecting the null when in fact it may be true).
• If p = .01 then in 99 cases out of 100 the result is real and
not due to chance (i.e., a 1% chance of rejecting the null
when it may be true).
• Rejecting the null hypothesis when it is in fact true is a Type I
error
What p value do we use?
• In addition to the p value SPSS gives us when we run our stats, we also set
a p value to compare it to at the start of any study: 0.05 (.05 is the same
thing), called alpha (α).
• So... if SPSS gives us a p value of less than the one we set at the start of
the study (i.e., p<.05) then we say that our results are real and not due to
chance!
• And...We reject the null hypothesis!
• If the p value is more than .05 (i.e., p>.05) then we conclude there is no
relationship or difference
• And... we accept the null hypothesis!
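The decision rule above can be written as a simple comparison. This is a minimal sketch, not SPSS code; the function name and the example p values are illustrative only, with alpha fixed at .05 as the slides describe:

```python
ALPHA = 0.05  # significance level set at the start of the study

def decide(p_value, alpha=ALPHA):
    # Compare the obtained p value with alpha and state the conclusion.
    if p_value < alpha:
        return "reject the null hypothesis (result real, not due to chance)"
    return "accept the null hypothesis (no relationship or difference)"

print(decide(0.002))  # e.g. SPSS reports p = .002 -> reject the null
print(decide(0.30))   # e.g. SPSS reports p = .30 -> accept the null
```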
Inferential Test: Correlation
Testing for relationships
Parametric data: Pearson product-moment correlation
Non-parametric data: Spearman rank-order correlation
Parametric Assumptions Reminder:
1. The data must be randomly sampled
2. The data must be high level data
(interval/ratio not nominal or ordinal)
3. The data must be normally distributed
a) curve & b) z scores!!!
4. The data must be of equal variance
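One of the normality checks listed above, z scores, can be sketched in a few lines. The data here are hypothetical, and the ±2.5 cut-off is a common small-sample rule of thumb rather than anything from the slides:

```python
import statistics

# Convert raw scores to z scores: how many standard deviations
# each score sits from the mean.
def z_scores(data):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation
    return [(x - mean) / sd for x in data]

# Hypothetical fitness-test scores with one suspiciously large value.
scores = [12, 14, 15, 13, 16, 14, 15, 13, 14, 40]
zs = z_scores(scores)

print([round(z, 2) for z in zs])
# In a small sample, any |z| beyond about 2.5 warrants a closer look.
print(any(abs(z) > 2.5 for z in zs))
```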
Correlation
Correlation = association or ‘going together’ between variables
 Expenditure is correlated with income.
 Swimming speed is correlated with stroke rate.
 High jump performance is correlated with height.
Each statement links more of one variable with more of a second variable.
• However ‘more’ is vague - mathematically we need to quantify
what is meant by ‘more’?
• The mathematical technique of correlation was devised to
specify the extent to which two things (variables) are associated.
SPSS gives us a value for the correlation coefficient = the number used to express the
extent of association
THIS IS CALLED THE (r) value
Perfect association, i.e. where a lot of one variable is always associated with a lot of
another variable, gives a correlation coefficient of +1.00 (r = 1)
If there is no association between two variables then the correlation coefficient (r) =
0.00
Most correlations will have values somewhere between 0 and 1.00 (either +/-)
Positive correlation
‘Improved physical fitness is related to increased levels of exercise’
• In this case more of one variable (fitness) is accompanied by more of the
other (exercise)
• Another way of expressing this is as
a direct relationship.
Negative correlation
‘Outside temperature and weight of clothing worn’
• In this case more of one variable (temperature) is accompanied by less
of the other (weight of clothing)
• Another way of expressing this is as an inverse relationship.
• Instead of running between 0.00 and +1.00, a negative correlation
coefficient takes values between 0.00 and –1.00
Range of correlation coefficients
Values may be interpreted as follows:
0.2 = a tendency to be related
0.5 = moderate relationship
0.9 = strong relationship
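These interpretation bands can be sketched as a small helper function. The cut-offs are the ones given above; how the in-between values are binned here is an assumption for illustration:

```python
# Map an r value onto the interpretation bands from the slides.
def interpret_r(r):
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    strength = abs(r)
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    if strength >= 0.9:
        label = "strong relationship"
    elif strength >= 0.5:
        label = "moderate relationship"
    elif strength >= 0.2:
        label = "a tendency to be related"
    else:
        label = "little or no relationship"
    return f"{direction}, {label}"

print(interpret_r(0.94))   # positive, strong relationship
print(interpret_r(-0.3))   # negative, a tendency to be related
```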
Graphical representation of Correlations
Such a graph is referred to as a scatter graph or scatter plot
Examples of correlation coefficients (r value)
 This statistic, which tests a supposed LINEAR association between two
variables, has the symbol r
 The line on the scatter graph around which the points are evenly
dispersed is called by various names, e.g. the line of best fit or regression
line
 The closer r is to 1.00 (+ or -) the closer the points are dispersed around
the line of best fit.
 E.g., more linear
Graphical representation
Good News.
It is quite possible, from inspection of a scatter
plot, to do two things:
(1) Determine whether there is a linear relationship
between the variables, in which case the
correlation inferential test is a meaningful
statistic to use
(2) Fairly accurately estimate what the value of the
correlation statistic (r) would be if calculated.
Bad News
Correlation does not show the nature of the relationship, or causality
The moral..
Always work from the scatter plot first and decide if the
Pearson correlation is a suitable statistic to use.
A researcher runs 4 correlation tests and gets an r
value of .94 from SPSS for all 4 of them!!!
What does it mean? ............................... BUT
correlation does not show the nature of the
relationship, or causality
All four correlation tests produce near-identical SPSS output
(Pearson correlation, Sig. 2-tailed, N = 11 in each case):

X1 with Y1: r = .816**, Sig. = .002
X1 with Y2: r = .816**, Sig. = .002
X1 with Y3: r = .816**, Sig. = .002
X2 with Y4: r = .817**, Sig. = .002

**. Correlation is significant at the 0.01 level (2-tailed).

[Four scatter plots followed here: Y1 vs X1, Y2 vs X1, Y3 vs X1 and Y4 vs X2.
Despite the matching r values, the plots show very differently shaped data.]
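The X1/Y1–Y4 pattern above mirrors Anscombe's classic quartet: very different data shapes yielding the same r. A minimal pure-Python sketch, using two of Anscombe's published datasets (assumed here to match the slide's data), shows the identical coefficients without needing SPSS:

```python
# Pearson r computed from first principles.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Two of Anscombe's datasets: one roughly linear, one a smooth curve.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

# Both pairs give r of roughly .816, despite very different scatter plots.
print(round(pearson_r(x1, y1), 3))
print(round(pearson_r(x1, y2), 3))
```

This is exactly why the slides insist on inspecting the scatter plot before trusting r.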
Limitations of correlation studies
Correlation does not imply causation
A correlation between two variables does not mean that one causes the
other.
• Does anxiety cause a reduction in performance?
• Does performance cause anxiety?
• Or is it something else, unidentified, that is influencing both?
Causation can only be shown via an experimental
study in which an independent variable can be
manipulated to bring about an effect.
• E.g. does EPO use improve cycling performance?
Experimental Design Example
 Does EPO use improve cycling performance?
 Two groups of trained cyclists
 EPO vs Control groups
 Hypothesis:
 EPO use will improve cycling performance
 We’ve now set up a hypothesis to test!
 Following testing EPO users performed better
 But…what else might have led to the data we found??
Many other factors could explain the performance differences: type of bike,
genetics, day of the week, time of day, fatigue, EPO dose, individual
response, gender, altitude training.
• How do we know that the difference we observed has been caused
by our manipulation of the IV (EPO v no EPO) and not one of the
other factors?
• We can limit the impact of these other factors!!!
• Done by randomly allocating people to the conditions of our IV
• This reduces the probability that the 2 groups differ on things like
training volume etc. and thus eliminates these as possible causes!
• = more confident in our ability to infer a causal relationship!
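The random-allocation step above can be sketched in a few lines. The participant IDs are hypothetical; in a real study you would also seed and document the randomisation:

```python
import random

# Randomly allocate participants to the two conditions of the IV.
cyclists = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

random.shuffle(cyclists)            # randomise the order
half = len(cyclists) // 2
epo_group = cyclists[:half]         # first half -> EPO condition
control_group = cyclists[half:]     # second half -> control condition

print("EPO:", epo_group)
print("Control:", control_group)
```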
For example, there is a strong positive correlation between death
by drowning and ice cream sales. When many ice creams are sold
more people die by drowning.
You could not conclude that ice cream causes drowning, any more
than you could conclude that a high incidence of drowning causes
people to buy ice cream.
Why do you think the two variables are strongly positively correlated?
Clue: look for a variable that affects both ice cream sales and
drowning in a like fashion.
Interpreting reliability of correlation results
 If the study were to be repeated what is the chance of obtaining the same
result?
 To test the reliability we need a research hypothesis and a null hypothesis
Research hypothesis - a relationship exists
Null hypothesis - no relationship exists
Beware
Sample size exerts considerable effect on reliability
A weak correlation will be regarded as significant (reliable) if the
sample size is large, and
a strong correlation will be non-significant if the sample size is
small.
What do you do?
Apply your own judgement - statistics will not
interpret your results!
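The sample-size effect can be made concrete with the usual test statistic for r, t = r·√(n−2)/√(1−r²) (a standard formula, not one given in the slides). A minimal sketch:

```python
import math

# t statistic for testing r against the null hypothesis of no correlation.
def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# A weak r from a large sample out-scores a strong r from a tiny one:
# with n = 100 (df = 98) the critical t is about 1.98, so r = 0.2 is significant;
# with n = 5 (df = 3) the critical t is about 3.18, so even r = 0.8 is not.
print(round(t_for_r(0.2, 100), 2))
print(round(t_for_r(0.8, 5), 2))
```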
The meaningfulness of r
 Since the reliability of r may be in doubt we need to
know how meaningful the value is.
 This implies that although a correlation may exist and it
is reliable (significant) - what does it mean?
 What does it tell us about the relationship between the
two variables?
 The association might be statistically significant but is it
of any importance?
 Meaningfulness is often interpreted via the coefficient of
determination, R2
 In this method, the portion of common association of the factors that
influence the two variables is determined.
 In other words, the coefficient of determination (R2) indicates that
portion of the total variance in one measure that can be explained, or
accounted for, by the variance in the other measure.
 Standing long jump and vertical jump for example.
 R2 = r x r
 Shared variance = R2 x 100 = ?%
 What is equally interesting is the unexplained
variance: if r = 0.7, then R2 = 0.49 (0.7 x
0.7 = 0.49)
 Shared variance = 49%
 More than half of the factors affecting each variable don't
relate to one another (the remaining 51% is explained by
something else).
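The arithmetic above can be sketched directly; r = 0.7 is the example value from the text:

```python
# Coefficient of determination: shared vs unexplained variance.
r = 0.7                          # example correlation from the text
r_squared = round(r * r, 2)      # R2 = r x r = 0.49
shared = round(r * r * 100, 1)   # shared variance as a percentage
unexplained = 100 - shared       # variance due to other, unique factors

print(r_squared, shared, unexplained)  # 0.49 49.0 51.0
```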
What Shared Variance means
A Venn diagram of two overlapping circles illustrates the meaning of the coefficient of
determination: the area of overlap is the amount of common (shared) variance.
 The unexplained variance is due to unique factors
applicable to each event;
 i.e. factors that affect one variable but not the
other and vice versa.
 Of course, the study is not designed to give this
answer but it often generates some interesting
discussion and may point the way for future
research.
Types of Correlation Statistics
Pearson’s r
• A parametric statistic – both variables must exhibit parametric properties.
• If one of the variables is not parametric then an alternative measure of
association is chosen.
Spearman’s rho
• May be used for non-parametric data.
Chi Square measure of association
• Used for nominal data (gender, playing position, etc.)
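The rank-based idea behind Spearman's rho can be illustrated in pure Python: rank each variable, then take the Pearson r of the ranks. This is a minimal sketch that assumes no tied values (real implementations average the ranks of ties):

```python
# Pearson r from first principles.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Spearman's rho: rank each variable (1 = smallest; assumes no ties),
# then correlate the ranks.
def spearman_rho(x, y):
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))

# A perfectly monotonic but non-linear relationship:
# rho = 1.0 even though Pearson r is below 1.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y))  # 1.0
print(round(pearson_r(x, y), 3))
```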
Summary
i. Check for parametric properties
ii. Always view the scatter graph first
iii. Check that relationship appears linear
iv. Look for outliers
v. Consider carefully the range you have sampled
vi. Calculate R2
vii. Explain the shared variance and unexplained variance