Correlation

Overviews non-parametric and parametric approaches to (bivariate) linear correlation. See also: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Correlation

Published in: Education, Technology
  • 7126/6667 Survey Research & Design in Psychology
    Semester 1, 2014, University of Canberra, ACT, Australia
    James T. Neill
    Home page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology
    Lecture page: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Correlation
    Image source: http://commons.wikimedia.org/wiki/File:Gnome-power-statistics.svg
    Image author: GNOME project
    Image license: GPL, http://en.wikipedia.org/wiki/GNU_General_Public_License
    Description: Overviews non-parametric and parametric approaches to (bivariate) linear correlation.
    This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here: http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Tutorials/Correlation
  • This lecture explains:
    The purpose of using parametric and non-parametric correlational analyses (i.e., what question(s) are we trying to answer)
    The nature of covariation – what does it mean if two variables covary or “vary together”
    Correlational analyses
    Types of answers – What are we looking for? What can we conclude?
    Types of correlation – And how to select appropriate correlational measures and graphical techniques based on the variables' level of measurement
    Interpretation – of correlational relations and graphs
    Assumptions / Limitations
    Dealing with several correlations
  • Image source: http://commons.wikimedia.org/wiki/File:Bee_1_by_andy205.jpg
    Image author: http://www.sxc.hu/profile/andy205
    License: Copyright, can be used for any purpose as stated on the image author's profile
    Responses to a single variable will vary (i.e., they will be distributed across a range).
    Responses to two or more variables may covary, i.e., they may share some variation e.g., when the value of one variable is high, the other also tends to be relatively high (or low).
    The world is made of covariation! Look around – look closely - everywhere we look, there are patterns of covariation, i.e., when two or more states tend to co-occur - e.g., higher rainfall tends to be associated with lusher plant growth.
    From the distribution of stars to the behaviour of ants, we can observe predictable co-occurrence of phenomena (they tend to occur together) - e.g., when students study harder, they tend to perform better on assessment tasks.
  • Image source: http://www.flickr.com/photos/notsogoodphotography/420751815/
    By notsogoodphotography: http://www.flickr.com/photos/notsogoodphotography/
    License: CC-by-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
    We observe covariations in the psycho-social world.
    We can measure our observations.
  • Image source: http://www.flickr.com/photos/kimota/146865077/
    By gualtiero: http://www.flickr.com/photos/kimota/
    License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
    The covariation between a set of variables provides the underlying building blocks for all the major types of statistical modelling.
    If you can understand bivariate covariation (the shared variation of two variables), then you are well on your way to being able to understand multivariate analyses and modelling techniques.
    Correlation underlies more advanced methods in this course:
    - internal consistency
    - linear and multiple regression
    - factor analysis
    One of the themes I’ve been trying to convince you of so far in this course is that a solid grasp of the power of supposedly ‘basic’ statistics is extremely valuable for both:
    - reporting descriptive statistics
    - as a basis for understanding advanced statistical methods
    Correlation is one of the most useful underlying basic statistics.
  • This is a basic, fundamental bivariate question. Really, the only other question we can ask is about differences between two variables. However, these two questions are essentially the same, just framed in different ways, a point we will come back to in later lectures on analysing differences.
  • Image source: http://commons.wikimedia.org/wiki/File:Scatterplot_r%3D-.76.png
    Image author: Online Statistics Education: A Multimedia Course of Study Project Leader: David M. Lane, Rice University
    Image license: Public domain
  • See also:
    http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Tutorials/Correlation/Types_of_correlation_and_level_of_measurement
    http://wilderdom.com/courses/surveyresearch/tutorials/2/#Types
    http://ucspace.canberra.edu.au/display/7126/Types+of+correlation+and+level+of+measurement
  • Image source: http://www.health.state.pa.us/hpa/stats/techassist/risknotion.htm
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Available through SPSS Crosstabs
    From the Quick Fun Survey Data – Tutorial 2
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Available through SPSS Crosstabs - Cells
    From the Quick Fun Survey Data – Tutorial 2
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Available through SPSS Crosstabs - Cells
    From the Quick Fun Survey Data – Tutorial 2 - Correlation
  • Image source: http://www.burningcutlery.com/derek/bargraph/
    http://www.burningcutlery.com/derek/bargraph/cluster.png
    Image license: Appears to be GPL, http://www.burningcutlery.com/derek/bargraph/gpl.txt
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Available through SPSS Graphs – Bar Chart - Clustered
    From the Quick Fun Survey Data – Tutorial 2 - Correlation
  • Check minimum expected cell size >= 5
    Image source:
    http://en.wikiversity.org/wiki/Pearson%27s_chi-square_test
    Image license: CC by SA 3.0, Creative Commons Attribution/Share-Alike License, http://creativecommons.org/licenses/by-sa/3.0/
    Previously used: http://www.geography-site.co.uk/pages/skills/fieldwork/statimage/chisqu.gif
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    From the Quick Fun Survey Data – Tutorial 2 - Correlation
  • Image source: Unknown
    Chi-square obtained (1 df) = 10.26 > Chi-square critical value for critical alpha .05 (1 df) = 3.84, therefore the sample data was unlikely to have come from a population in which the two variables are unrelated, i.e., reject the null hypothesis that there is no relationship between the variables.
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Available through SPSS Crosstabs - Statistics
  • Correlations still range -1 to +1
  • Available through SPSS Crosstabs or Correlate – Bivariate
    Reference:
    http://en.wikipedia.org/wiki/Kendall_tau_rank_correlation_coefficient
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    On the x-axis, 0 = believe; 1 = don’t believe
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Is there a relationship between belief in God and the number of countries visited?
    From the Quick Fun Survey data – Tutorial 2
  • Image source: Unknown
    Image author: Unknown
    Image license: Unknown
  • Image source: http://www.flickr.com/photos/brandi666/495013956/
    IV should be treated as X and DV as Y, although this is not always distinct.
  • Image source: Howell (Fundamentals)‏
  • Image source: Howell (Fundamentals)‏
  • Image source: Howell (Fundamentals)‏
  • Image source: Wikiversity
  • Recall that variance is sum of (X-meanX) squared / N-1
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    When would covXY be large and positive?
    When would covXY be large and negative?
  • a. .20 because the SDs are 2 and 3 respectively; standardising the covariance by the product of the SDs (1.20 / (2 × 3)) gives a correlation of .20.
  • Research hypothesis - predict that there will be a negative relationship between weight & self-esteem, such that as weight increases self-esteem decreases.
    Null hypothesis: no relationship between weight & SE.
  • PASW automatically provides sig. test when using Correlate – Bivariate
    Also try this handy calculator: Significance of the Difference Between an Observed Correlation Coefficient and a Hypothetical Value of rho
    http://faculty.vassar.edu/lowry/rpop.html
    Alternatively look up r tables with df = N - 2
  • Image source: Howell (Fundamentals)‏
  • The size of correlation required to be significant decreases as N increases – why?
    To test the hypothesis that the population correlation is zero (ρ = 0), use: t = r√(n − 2) / √(1 − r²)
    where r is the correlation coefficient and n is the sample size.
    Look up the t value in a table of the distribution of t, for (n − 2) degrees of freedom. If the computed t value is as high or higher than the table t value, then conclude that r is significantly different from 0.
    http://www2.chass.ncsu.edu/garson/pa765/correl.htm#signif
  • Image license: Unknown
    A confidence interval can be calculated for a line of best fit (or correlation). If the lower and upper confidence intervals are both above or below 0, then it indicates a significant difference from no correlation (zero).
    For more information, see:
    http://onlinestatbook.com/chapter8/correlation_ci.html
    pp. 260-261, Howell (Methods)‏
    How do I set confidence limits on my correlation coefficients?
    SPSS does not compute confidence limits directly.
    http://www2.chass.ncsu.edu/garson/pa765/correl.htm#limits
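Since SPSS does not compute confidence limits for r directly, one common workaround (not shown in the lecture itself) is the Fisher z transformation. A minimal Python sketch, with the function name my own:

```python
import math

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation, via Fisher's z transformation."""
    z = math.atanh(r)               # Fisher z = arctanh(r)
    se = 1 / math.sqrt(n - 3)       # standard error of z
    lo_z, hi_z = z - z_crit * se, z + z_crit * se
    return math.tanh(lo_z), math.tanh(hi_z)  # back-transform to the r scale

lo, hi = r_confidence_interval(0.50, 50)
# If the whole interval lies above (or below) 0, the correlation
# differs significantly from zero at the corresponding alpha level.
```

As the note says, if the lower and upper limits are both on the same side of 0, the correlation is significantly different from zero.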
  • Image source: http://www.chriscorrea.com/2005/poverty-and-performance-on-the-naep/
  • c. the true correlation between and Age and Performance is not equal to 0
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    References
    Allen & Bennett, 2010, p. 173
    Howell, 2010, p. 344
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Evans, J. D. (1996). Straightforward statistics for the behavioral sciences . Pacific Grove, CA: Brooks/Cole Publishing.
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    This plot is the same as the previous one - the x-axis display has simply been rescaled.
    Also see the effect of linear transformation on r:
    http://davidmlane.com/hyperstat/A69920.html
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Answer: c. 0
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Answer: c. 0 (no variation in X scores)‏
  • Also see end of Howell (Fundamentals) Ch9 for an example write-up
  • Also see Howell (Methods) p. 263
    http://www2.chass.ncsu.edu/garson/pa765/correl.htm
  • Evans, J. D. (1996). Straightforward statistics for the behavioral sciences . Pacific Grove, CA: Brooks/Cole Publishing.
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
    Statistical tests are available for non-linear relationships, but it is best to start with a visual examination of the plotted data.
  • Evans, J. D. (1996). Straightforward statistics for the behavioral sciences . Pacific Grove, CA: Brooks/Cole Publishing.
  • Image source: Unknown
    Image license: Unknown
    Homoscedasticity is implied when the two variables are bivariately normally distributed.
    Heteroscedasticity weakens but does not invalidate correlational analysis.
    http://www.pfc.cfs.nrcan.gc.ca/profiles/wulder/mvstats/homosced_e.html
    http://davidmlane.com/hyperstat/A121947.html
  • Image source: http://davidmlane.com/hyperstat/A68809.html
    Image license: Unknown
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image author: James Neill, 2007
    Image license: Creative Commons Attribution 2.5
  • Image source: http://en.wikipedia.org/wiki/Image:Pchart.jpg
  • Image source: http://en.wikipedia.org/wiki/Image:Pchart.jpg
  • Image source: Unknown
  • Image source: Unknown
  • Image source: This lecture
    Image author: James Neill
    Image license: Creative Commons Attribution 3.0, CC-by-A 3.0
  • Image source: James Neill
    By James Neill
    License: Public domain
  • Transcript

    • 1. Lecture 4 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Correlation Image source: http://commons.wikimedia.org/wiki/File:Gnome-power-statistics.svg, GPL
    • 2. 2 1. Covariation 2. Purpose of correlation 3. Linear correlation 4. Types of correlation 5. Interpreting correlation 6. Assumptions / limitations 7. Dealing with several correlations Overview
    • 3. 3 Howitt & Cramer (2011/2014) ● Ch 6/7: Relationships between two or more variables: Diagrams and tables ● Ch 7/8: Correlation coefficients: Pearson correlation and Spearman’s rho ● Ch 10/11: Statistical significance for the correlation coefficient: A practical introduction to statistical inference ● Ch 14/15: Chi-square: Differences between samples of frequency data ● Note: Howitt and Cramer doesn't cover point bi-serial correlation Readings
    • 4. 4 Covariation
    • 5. 5 The world is made of covariations e.g., pollen and bees e.g., study and grades e.g., nutrients and growth
    • 6. 6 We observe covariations in the psycho- social world. We can measure our observations and analyse the extent to which they co-occur. e.g., depictions of violence in the environment. e.g., psychological states such as stress and depression.
    • 7. 7 Covariations are the basis of more complex models.
    • 8. 8 Purpose of correlation
    • 9. 9 The underlying purpose of correlation is to help address the question: What is the • relationship or • association or • shared variance or • co-relation between two variables? Purpose of correlation
    • 10. 10 Purpose of correlation Other ways of expressing the underlying correlational question include: To what extent do variables • covary? • depend on one another? • explain one another?
    • 11. 11 Linear correlation
    • 12. 12 r = -.76 Linear correlation The extent to which two variables have a simple linear (straight-line) relationship. Image source: http://commons.wikimedia.org/wiki/File:Scatterplot_r%3D-.76.png, Public domain
    • 13. 13 Linear correlation Linear relations between variables are indicated by correlations: • Direction: Correlation sign (+ / -) indicates direction of linear relationship • Strength: Correlation size indicates strength (ranges from -1 to +1) • Statistical significance: p indicates likelihood that the observed relationship could have occurred by chance
    • 14. 14 What is the linear correlation? Types of answers • No relationship (r = 0) (X and Y are independent) • Linear relationship (X and Y are dependent) –As X ↑s, so does Y (r > 0) –As X ↑s, Y ↓s (r < 0) • Non-linear relationship
    • 15. 15 Types of correlation To decide which type of correlation to use, consider the levels of measurement for each variable.
    • 16. 16 Types of correlation • Nominal by nominal: Phi (Φ) / Cramer’s V, Chi-squared • Ordinal by ordinal: Spearman’s rank / Kendall’s Tau b • Dichotomous by interval/ratio: Point bi-serial rpb • Interval/ratio by interval/ratio: Product-moment or Pearson’s r
    • 17. 17 Types of correlation and LOM (by each variable's level of measurement): Nominal × Nominal – clustered bar chart; Chi-square, Phi (φ) or Cramer's V. Nominal (dichotomous) × Int/Ratio – scatterplot or bar chart; point bi-serial correlation (rpb). Ordinal × Ordinal – scatterplot or clustered bar chart; Spearman's rho or Kendall's tau. Int/Ratio × Int/Ratio – scatterplot; product-moment correlation (r). For mixed combinations, recode the variable with the higher level of measurement down (⇐ Recode).
    • 18. 18 Nominal by nominal
    • 19. 19 Nominal by nominal correlational approaches • Contingency (or cross-tab) tables – Observed – Expected – Row and/or column %s – Marginal totals • Clustered bar chart • Chi-square • Phi/Cramer's V
    • 20. 20 Contingency tables ● Bivariate frequency tables ● Cell frequencies (red) ● Marginal totals (blue)
    • 21. 21 Contingency table: Example RED = Contingency cells BLUE = Marginal totals
    • 22. 22 Contingency table: Example Chi-square is based on the differences between the actual and expected cell counts.
    • 23. 23 Example Row and/or column cell percentages may also aid interpretation e.g., ~2/3rds of smokers snore, whereas only ~1/3rd of non-smokers snore.
    • 24. 24 Clustered bar graph Bivariate bar graph of frequencies or percentages. The category axis bars are clustered (by colour or fill pattern) to indicate the second variable’s categories.
    • 25. 25 ~2/3rds of snorers are smokers, whereas only ~1/3rd of non-snorers are smokers.
    • 26. 26 Pearson chi-square test
    • 27. 27 Write-up: χ2 (1, 186) = 10.26, p = .001 Pearson chi-square test: Example
    • 28. 28 Chi-square distribution: Example
    • 29. 29 Phi (φ) & Cramer’s V Phi (φ) • Use for 2x2, 2x3, 3x2 analyses e.g., Gender (2) & Pass/Fail (2) Cramer’s V • Use for 3x3 or greater analyses e.g., Favourite Season (4) x Favourite Sense (5) (non-parametric measures of correlation)
    • 30. 30 Phi (φ) & Cramer’s V: Example χ2 (1, 186) = 10.26, p = .001, ϕ = .24
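As a rough illustration of how the chi-square statistic and phi are computed for a 2×2 table (the cell counts below are invented, not the lecture's survey data), a stdlib-only Python sketch:

```python
import math

# Invented 2x2 smoking-by-snoring counts (not the lecture's actual survey data)
observed = [[60, 30],   # smokers:     snore / don't snore
            [40, 56]]   # non-smokers: snore / don't snore

row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)

# Chi-square: sum of (observed - expected)^2 / expected over all cells,
# where expected = row total * column total / grand total
chi2 = sum((observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
           for i in range(2) for j in range(2))

phi = math.sqrt(chi2 / n)            # effect size for a 2x2 table
p = math.erfc(math.sqrt(chi2 / 2))   # p-value for df = 1
significant = chi2 > 3.84            # critical value at alpha = .05, df = 1
```

This follows the decision rule in the notes: compare the obtained chi-square against the critical value (3.84 for 1 df at α = .05), then report the effect size as φ.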
    • 31. 31 Ordinal by ordinal
    • 32. 32 Ordinal by ordinal correlational approaches • Spearman's rho (rs ) • Kendall tau (τ) • Alternatively, use nominal by nominal techniques (i.e., recode or treat as lower level of measurement)
    • 33. 33 Graphing ordinal by ordinal data • Ordinal by ordinal data is difficult to visualise because it's non-parametric, yet there may be many points. • Consider using: –Non-parametric approaches (e.g., clustered bar chart) –Parametric approaches (e.g., scatterplot with binning)
    • 34. 34 Spearman’s rho (rs ) or Spearman's rank order correlation • For ranked (ordinal) data –e.g. Olympic Placing correlated with World Ranking • Uses product-moment correlation formula • Interpretation is adjusted to consider the underlying ranked scales
    • 35. 35 Kendall’s Tau (τ) • Tau a –Does not take joint ranks into account • Tau b –Takes joint ranks into account –For square tables • Tau c –Takes joint ranks into account –For rectangular tables
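As the slides note, Spearman's rho uses the product-moment formula applied to ranks. A stdlib-only Python sketch (function names are my own, and the example rankings are invented):

```python
def average_ranks(xs):
    """Assign 1-based ranks, averaging over tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover the run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical Olympic placings vs world rankings
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])  # 0.8
```

Interpretation is then adjusted for the underlying ranked scales, as the slide says.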
    • 36. 36 Dichotomous by interval/ratio
    • 37. 37 Point-biserial correlation (rpb ) • One dichotomous & one continuous variable –e.g., belief in god (yes/no) and amount of international travel • Calculate as for Pearson's product-moment r, • Adjust interpretation to consider the underlying scales
    • 38. 38 Point-biserial correlation (rpb ): Example Those who report that they believe in God also report having travelled to slightly fewer countries (rpb = -.10) but this difference could have occurred by chance (p > .05), thus H0 is not rejected. Do not believe Believe
    • 39. 39 Point-biserial correlation (rpb ): Example 0 = No 1 = Yes
    • 40. 40 Interval/ratio by interval/ratio
    • 41. 41 Scatterplot • Plot each pair of observations (X, Y) –x = predictor variable (independent; IV) –y = criterion variable (dependent; DV) • By convention: –IV on the x (horizontal) axis –DV on the y (vertical) axis • Direction of relationship: –+ve = trend from bottom left to top right –-ve = trend from top left to bottom right
    • 42. 42 Scatterplot showing relationship between age & cholesterol with line of best fit
    • 43. 43 • The correlation between 2 variables is a measure of the degree to which pairs of numbers (points) cluster together around a best-fitting straight line • Line of best fit: y = a + bx • Check for: –outliers –linearity Line of best fit
    • 44. 44 What's wrong with this scatterplot? IV should be treated as X and DV as Y, although this is not always distinct.
    • 45. 45 Scatterplot example: Strong positive (.81) Q: Why is infant mortality positively linearly associated with the number of physicians (with the effects of GDP removed)? A: Because more doctors tend to be deployed to areas with higher infant mortality (socio-economic status aside).
    • 46. 46 Scatterplot example: Weak positive (.14)
    • 47. 47 Scatterplot example: Moderately strong negative (-.76)
    • 48. 48 Pearson product-moment correlation (r) ● The product-moment correlation is the standardised covariance.
    • 49. 49 Covariance • Variance shared by 2 variables • Covariance reflects the direction of the relationship: +ve cov indicates +ve relationship -ve cov indicates -ve relationship • CovXY = Σ(X − X̄)(Y − Ȳ) / (N − 1), where the (X − X̄)(Y − Ȳ) terms are the cross-products
    • 50. 50 Covariance: Cross-products. The scatterplot is divided into quadrants at the means of X and Y: points in the bottom-left and top-right quadrants contribute +ve cross-products; points in the top-left and bottom-right quadrants contribute -ve cross-products.
    • 51. 51 Covariance • Depends on the measurement scale → Can’t compare covariance across different scales of measurement (e.g., age by weight in kilos versus age by weight in grams). • Therefore, standardise covariance (divide by the cross-product of the SDs) → correlation • Correlation is an effect size – i.e., standardised measure of strength of linear relationship
    • 52. 52 For a given set of data the covariance between X and Y is 1.20. The SD of X is 2 and the SD of Y is 3. The resulting correlation is: a. .20 b. .30 c. .40 d. 1.20 Covariance, SD, and correlation: Example quiz question Answer: a. 1.20 / (2 × 3) = .20
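The quiz answer can be checked directly: the correlation is the covariance standardised by the product of the standard deviations.

```python
# Correlation = covariance divided by the product of the two SDs
cov_xy = 1.20
sd_x, sd_y = 2, 3
r = cov_xy / (sd_x * sd_y)  # 1.20 / 6 = .20, answer (a)
```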
    • 53. 53 Hypothesis testing Almost all correlations are not 0, therefore the question is: “What is the likelihood that a relationship between variables is a ‘true’ relationship - or could it simply be a result of random sampling variability or ‘chance’?”
    • 54. 54 Significance of correlation • Null hypothesis (H0): ρ = 0: assumes that there is no ‘true’ relationship (in the population) • Alternative hypothesis (H1): ρ ≠ 0: assumes that the relationship is real (in the population) • Initially assume H0 is true, and evaluate whether the data support H1. • ρ (rho) = population product-moment correlation coefficient
    • 55. 55 How to test the null hypothesis • Select a critical value (alpha (α)); commonly .05 • Can use a 1 or 2-tailed test • Calculate correlation and its p value. Compare this to the critical value. • If p < critical value, the correlation is statistically significant, i.e., that there is less than a x% chance that the relationship being tested is due to random sampling variability.
    • 56. 56 Correlation – SPSS output: the correlation between Cigarette Consumption per Adult per Day and CHD Mortality per 10,000 is Pearson r = .713, Sig. (2-tailed) = .000, N = 21. ** Correlation is significant at the 0.01 level (2-tailed).
    • 57. 57 Imprecision in hypothesis testing • Type I error: rejects H0 when it is true • Type II error: Accepts H0 when it is false • Significance test result will depend on the power of study, which is a function of: –Effect size (r) –Sample size (N) –Critical alpha level (αcrit )
    • 58. 58 Significance of correlation: critical values of r (p = .05) by df (N − 2): df 5 → .67; df 10 → .50; df 15 → .41; df 20 → .36; df 25 → .32; df 30 → .30; df 50 → .23; df 200 → .11; df 500 → .07; df 1000 → .05. The size of correlation required to be significant decreases as N increases – why?
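The pattern in these critical values follows from the t-statistic for testing ρ = 0 (given in the lecture notes): for a fixed r, t grows with √(n − 2), so ever-smaller correlations reach significance as the sample gets larger. A sketch in Python:

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The same r = .30 yields a much larger t in a bigger sample,
# so it is far more likely to exceed the critical t value
t_small_n = t_for_r(0.30, 12)
t_large_n = t_for_r(0.30, 1002)
```

The computed t is then compared against the t distribution with n − 2 degrees of freedom.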
    • 59. 59 Scatterplot showing a confidence interval for a line of best fit
    • 60. 60 US States 4th Academic Achievement by SES Scatterplot showing a confidence interval for a line of best fit
    • 61. 61 If the correlation between Age and test Performance is statistically significant, it means that: a. there is an important relationship between Age and test Performance b. the true correlation between Age and Performance in the population is equal to 0 c. the true correlation between Age and Performance in the population is not equal to 0 d. getting older causes you to do poorly on tests Practice quiz question: Significance of correlation
    • 62. 62 Interpreting correlation
    • 63. 63 Coefficient of Determination (r2 ) • CoD = The proportion of variance or change in one variable that can be accounted for by another variable. • e.g., r = .60, r2 = .36
    • 64. 64 Interpreting correlation (Cohen, 1988) • A correlation is an effect size • Rule of thumb Strength r r2 Weak: .1 - .3 1 - 10% Moderate: .3 - .5 10 - 25% Strong: >.5 > 25%
    • 65. 65 Size of correlation (Cohen, 1988) WEAK (.1 - .3) MODERATE (.3-.5) STRONG (>.5)
    • 66. 66 Interpreting correlation (Evans, 1996) Strength r r2 very weak 0 - .19 (0 to 4%) weak .20 - .39 (4 to 16%) moderate .40 - .59 (16 to 36%) strong .60 - .79 (36% to 64%) very strong .80 - 1.00 (64% to 100%)
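The rule-of-thumb interpretation can be expressed as a small helper. The cut-offs below follow Cohen (1988) as on the earlier slide; the label for |r| < .1 is my own addition:

```python
def cohen_strength(r):
    """Rule-of-thumb label for the size of |r| (Cohen, 1988)."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"  # label for |r| < .1 is my own addition

def shared_variance(r):
    """Coefficient of determination (r squared): proportion of shared variance."""
    return r ** 2
```

For example, r = .60 would be labelled strong, with r² = .36, i.e., 36% shared variance.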
    • 67. 67 Correlation of this scatterplot (X1 by Y1) = -.9. Scale has no effect on correlation.
    • 68. 68 Correlation of this scatterplot (X1 by Y1, axes rescaled) = -.9. Scale has no effect on correlation.
    • 69. 69 What do you estimate the correlation of this scatterplot of height and weight to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    • 70. 70 What do you estimate the correlation of this scatterplot to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    • 71. 71 What do you estimate the correlation of this scatterplot to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    • 72. 72 Write-up: Example “Number of children and marital satisfaction were inversely related (r (48) = -.35, p < .05), such that contentment in marriage tended to be lower for couples with more children. Number of children explained approximately 10% of the variance in marital satisfaction, a small-moderate effect (see Figure 1).”
    • 73. 73 Assumptions and limitations (Pearson product-moment linear correlation)
    • 74. 74 Assumptions and limitations 1. Levels of measurement 2. Normality 3. Linearity 1. Effects of outliers 2. Non-linearity 4. Homoscedasticity 5. No range restriction 6. Homogenous samples 7. Correlation is not causation
    • 75. 75 Normality • The X and Y data should be sampled from populations with normal distributions • Do not overly rely on a single indicator of normality; use histograms, skewness and kurtosis (within -1 and +1) • Inferential tests of normality (e.g., Shapiro-Wilks) are overly sensitive when sample is large
    • 76. 76 Effect of outliers • Outliers can disproportionately increase or decrease r. • Options –compute r with & without outliers –get more data for outlying values –recode outliers as having more conservative scores –transformation –recode variable into lower level of measurement
    • 77. 77 Age & self-esteem (r = .63)
    • 78. 78 Age & self-esteem (outliers removed): r = .23
    • 79. 79 Non-linear relationships Check scatterplot Can a linear relationship ‘capture’ the lion’s share of the variance? If so, use r.
    • 80. 80 Non-linear relationships If non-linear, consider • Does a linear relation help? • Transforming variables to ‘create’ linear relationship • Use a non-linear mathematical function to describe the relationship between the variables
    • 81. 81 Scedasticity • Homoscedasticity refers to even spread about a line of best fit • Heteroscedasticity refers to uneven spread about a line of best fit • Assess visually and with Levene's test
    • 82. 82 Scedasticity
    • 83. 83 Range restriction • Range restriction is when the sample contains restricted (or truncated) range of scores –e.g., level of hormones and age < 18 might have linear relationship • If range restriction, be cautious in generalising beyond the range for which data is available –e.g., level of hormones may not continue to increase linearly with age after age 18
    • 84. 84 Range restriction
    • 85. 85 Heterogeneous samples ● Sub-samples (e.g., males & females) may artificially increase or decrease overall r. ● Solution - calculate r separately for sub-samples & overall, look for differences
    • 86. 86 Scatterplot of Same-sex & Opposite-sex Relations by Gender: ♂ r = .67, ♀ r = .52
    • 87. 87 Scatterplot of Weight and Self-esteem by Gender: ♂ r = .50, ♀ r = -.48
    • 88. 88 Correlation is not causation e.g.,: correlation between ice cream consumption and crime, but actual cause is temperature
    • 89. 89 Correlation is not causation e.g.,: Stop global warming: Become a pirate As the number of pirates has reduced, global temperature has increased.
    • 90. 90 Causation may be in the eye of the beholder It's a rather interesting phenomenon. Every time I press this lever, that graduate student breathes a sigh of relief.
    • 91. 91 Dealing with several correlations
    • 92. 92 Scatterplot matrices organise scatterplots and correlations amongst several variables at once. However, they become unwieldy for more than about five variables at a time. Dealing with several correlations Image source: Unknown
    • 93. 93 Correlation matrix: Example of an APA Style Correlation Table
    • 94. 94 Scatterplot matrix
    • 95. 95 Summary
    • 96. 96 Summary: Covariation 1. The world is made of covariations. 2. Covariations are the building blocks of more complex analyses, including 1. factor analysis 2. reliability analysis 3. multiple regression
    • 97. 97 Summary: Purpose of correlation 1. Correlation is a standardised measure of the extent to which two phenomenon co-relate. 2. Correlation does not prove causation – may be opposite causality, co-causal, or due to other variables.
    • 98. 98 Summary: Types of correlation • Nominal by nominal: Phi (Φ) / Cramer’s V, Chi-squared • Ordinal by ordinal: Spearman’s rank / Kendall’s Tau b • Dichotomous by interval/ratio: Point bi-serial rpb • Interval/ratio by interval/ratio: Product-moment or Pearson’s r
    • 99. 99 Summary: Correlation steps 1. Choose measure of correlation and graphs based on levels of measurement. 2. Check graphs (e.g., scatterplot)
    • 100. 100 Summary: Correlation steps 3. Consider –Effect size (e.g., Φ, Cramer's V, r, r2 ) –Direction –Inferential test (p) 4. Interpret/Discuss –Relate back to hypothesis –Size, direction, significance –Limitations e.g., • Heterogeneity (sub-samples) • Range restriction • Causality?
    • 101. 101 Summary: Interpreting correlation • Coefficient of determination –Correlation squared –Indicates % of shared variance Strength r r2 Weak: .1 - .3 1 – 10% Moderate: .3 - .5 10 - 25% Strong: > .5 > 25%
    • 102. 102 Summary: Assumptions & limitations 1. Levels of measurement 2. Normality 3. Linearity 1. Effects of outliers 2. Non-linearity 4. Homoscedasticity 5. No range restriction 6. Homogenous samples 7. Correlation is not causation
    • 103. 103 Summary: Dealing with several correlations • Correlation matrix • Scatterplot matrix
    • 104. 104 References Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Pacific Grove, CA: Brooks/Cole Publishing. Howell, D. C. (2007). Fundamental statistics for the behavioral sciences. Belmont, CA: Wadsworth. Howell, D. C. (2010). Statistical methods for psychology (7th ed.). Belmont, CA: Wadsworth. Howitt, D. & Cramer, D. (2011). Introduction to statistics in psychology (5th ed.). Harlow, UK: Pearson.
    • 105. 105 Open Office Impress ● This presentation was made using Open Office Impress. ● Free and open source software. ● http://www.openoffice.org/product/impress.html
