Correlation

    Notes on slide 1

    7126/6667 Survey Research & Design in Psychology, Semester 1, 2009, University of Canberra, ACT, Australia. James T. Neill.
    Home page: http://ucspace.canberra.edu.au/display/7126
    Lecture page: http://ucspace.canberra.edu.au/display/7126/Lecture+-+Correlations
    Image source: http://www.flickr.com/photos/jurvetson/916142/ by jurvetson (http://www.flickr.com/photos/jurvetson/). License: CC-by-A 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
    Description: Overviews non-parametric and parametric approaches to (bivariate) linear correlation. This lecture is accompanied by a computer-lab based tutorial, the notes for which are available here: http://ucspace.canberra.edu.au/display/7126/Tutorial+-+Correlations
    This is our first, and in many ways, only look at bivariate relations (the relations between two variables). The previous lecture (Descriptives and Graphing) examined the distributional properties of a single variable (univariate). The next lecture will look at analysing the correlational properties of many variables simultaneously.

    Correlation - Presentation Transcript

    1. Lecture 4 Survey Research & Design in Psychology James Neill, 2009 Correlations
      • Purpose of correlation
      • Covariation
      • Correlational analyses
        • Types of answers
        • Types of correlation
        • Interpretation
        • Assumptions / Limitations
      • Dealing with several correlations
      Overview
    2. Purpose of correlation The purpose of correlation is to help address the question: “What is the
      • relationship or
      • degree of association or
      • amount of shared variance
      between two variables?” Key aspects: direction, strength, and statistical significance.
    3. Purpose of correlation Another way of putting the correlational question is: “To what extent are two variables
      • dependent or
      • independent
      of one another?”
    4. The world is made of covariation.
    5. We observe covariations in the psycho-social world. We can measure our observations. Depictions of violence in the environment. Psychological states such as stress and depression.
    6. Correlation example: World population and total carbon emissions
    7. Correlation is not causation, e.g., “Stop global warming: Become a pirate”
    8. Causation may be in the eye of the beholder : It's a rather interesting phenomenon
    9. Covariations are the basis of more complex models.
    10. Correlational analyses Correlational analyses are commonly used to examine the extent to which two variables have a simple linear relationship . Correlations provide the building blocks for:
      • Factor analysis
      • Reliability
      • Regression etc.
    11. Correlational analyses Of interest in examining the linear relationship between 2 variables are the:
      • direction,
      • strength, and
      • statistical significance
      The correlation coefficient ranges from -1 to +1:
      • Sign indicates direction
      • Size indicates strength
    12. Correlational analyses Measures the extent to which:
      • differences in one variable can be predicted from differences in the other variable
      • one variable varies with another variable
    13. Types of answers
      • No relationship (independence)‏
      • Linear relationship:
        • As one variable increases, so does the other (+ve)
        • As one variable increases, the other decreases (-ve)
      • Non-linear
      • Restricted range
      • Heterogeneous samples
    14. Types of correlation To decide which of the following types of correlation to use, we need to consider the levels of measurement for each variable:
      • Phi ( Φ ) / Cramer’s V
      • Spearman’s rank / Kendall’s Tau b
      • Point-biserial (r_pb)
      • Product-moment or Pearson’s r
    15. Types of correlation and LOM (suggested graph and statistic for each pairing of levels of measurement)
      • Nominal by Nominal: clustered bar graph; chi-squared, Phi (φ) or Cramer's V
      • Nominal by Ordinal: clustered bar graph; chi-squared, Phi (φ) or Cramer's V
      • Nominal by Int/Ratio: scatterplot, bar chart or error-bar chart; point-biserial correlation (r_pb)
      • Ordinal by Ordinal: scatterplot or clustered bar chart; Spearman's Rho or Kendall's Tau
      • Ordinal by Int/Ratio: recode; scatterplot; point-biserial or Spearman/Kendall
      • Int/Ratio by Int/Ratio: scatterplot; product-moment correlation (r)
    16. Tufte: Graphics reveal data
          X1      Y1      X2      Y2      X3      Y3      X4      Y4
        10.0    8.04    10.0    9.14    10.0    7.46     8.0    6.58
         8.0    6.95     8.0    8.14     8.0    6.77     8.0    5.76
        13.0    7.58    13.0    8.74    13.0   12.74     8.0    7.71
         9.0    8.81     9.0    8.77     9.0    7.11     8.0    8.84
        11.0    8.33    11.0    9.26    11.0    7.81     8.0    8.47
        14.0    9.96    14.0    8.10    14.0    8.84     8.0    7.04
         6.0    7.24     6.0    6.13     6.0    6.08     8.0    5.25
         4.0    4.26     4.0    3.10     4.0    5.39    19.0   12.50
        12.0   10.84    12.0    9.13    12.0    8.15     8.0    5.56
         7.0    4.82     7.0    7.26     7.0    6.42     8.0    7.91
         5.0    5.68     5.0    4.74     5.0    5.73     8.0    6.89
    17. Tufte: Graphics reveal data.
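The four X/Y pairs listed on slide 16 are the data set usually known as Anscombe's quartet. The sketch below, in plain Python with a hand-written `pearson_r` helper (a name chosen here for illustration, not taken from the lecture), suggests why the graphs matter: the four pairs give almost the same correlation even though their scatterplots look very different.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Data from slide 16 (X1, X2 and X3 share the same values).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

quartet = {
    "set 1": (x123, y1),
    "set 2": (x123, y2),
    "set 3": (x123, y3),
    "set 4": (x4, y4),
}

# Correlation for each set, rounded to two decimals.
results = {name: round(pearson_r(x, y), 2) for name, (x, y) in quartet.items()}
for name, r in results.items():
    print(name, r)
```

Plotting each set would show a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point, which the single number r cannot distinguish.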
    18. Nominal by Nominal
    19. Contingency tables
      • Bivariate frequency tables
      • Marginal totals
      • Can include %
    20. Contingency tables
    21. Contingency table: Example
    22. Contingency table: Example
    23. Example Contingency table: Example
    24. Clustered bar graph
      • Bar graph of frequencies or percentages with the category axis clustered by coloured bars to indicate the two variables’ categories
    25. Clustered bar graph
    26. Chi-squared
    27. Chi-squared: Example
    28. Chi-squared: Example
    29. Phi (Φ) & Cramer’s V Phi (Φ)
      • Two nominal variables, at least one of them dichotomous (2x2, 2x3, 3x2)
      • e.g., Gender & Pass/Fail
      Cramer’s V
      • Two nominal variables, each with three or more categories (3x3 or greater)
      • e.g., Favourite Season x Favourite Sense
    30. Phi (Φ) & Cramer’s V: Example
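As a rough illustration of how chi-squared, Phi and Cramer's V relate, the sketch below computes all three from a contingency table of counts. The function names and both tables are hypothetical (the counts are invented, not taken from the lecture examples), and Phi is returned as an unsigned magnitude.

```python
from math import sqrt

def chi_squared(table):
    """Return (chi-squared, N) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi(table):
    """Phi = sqrt(chi-squared / N)."""
    chi2, n = chi_squared(table)
    return sqrt(chi2 / n)

def cramers_v(table):
    """Cramer's V = sqrt(chi-squared / (N * (smaller table dimension - 1)))."""
    chi2, n = chi_squared(table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))

# Hypothetical 2x2 counts: Gender (rows) by Pass/Fail (columns).
gender_by_result = [[30, 10],
                    [10, 30]]

# Hypothetical 3x3 counts for two three-category nominal variables.
three_by_three = [[20, 5, 5],
                  [5, 20, 5],
                  [5, 5, 20]]

print(phi(gender_by_result))
print(cramers_v(three_by_three))
```

For a table with only two rows or two columns the divisor `k` is 1, so Phi and Cramer's V coincide.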
    31. Ordinal by Ordinal
    32. Spearman’s rho (r_s)
      • For ranked (or recoded to ordinal) data
      • Calculated as the product-moment correlation of the ranks, so interpretation must take the underlying ranked scales into account
        • e.g. Olympic Placing vs. World Ranking
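One way to see Spearman's rho as a product-moment correlation applied to ranks is the minimal sketch below. The helper names are illustrative, ties receive the average of the ranks they span (a common convention, assumed here), and the placing and ranking values are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: Olympic placing and world ranking for six athletes.
olympic_placing = [1, 2, 3, 4, 5, 6]
world_ranking = [2, 1, 4, 3, 6, 5]
print(spearman_rho(olympic_placing, world_ranking))

# A perfectly monotonic but curved relationship still gives rho = 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```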
    33. Kendall’s Tau-b
      • For ordinal/ranked data
      • Takes tied (joint) ranks into account
      • Ranges from -1 to +1, but can reach these limits only for square tables
    34. Dichotomous by Interval/Ratio
    35. Point-biserial correlation (r_pb)
      • One dichotomous & one continuous variable
      • Calculate as for Pearson’s r, but interpret in light of the underlying scales (the dichotomous variable is coded with two arbitrary values, such as 0 and 1)
        • e.g., gender and self-esteem
    36. Point-biserial correlation (r_pb): Example
    37. Point-biserial correlation (r_pb): Example
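The sketch below illustrates that the point-biserial correlation is simply Pearson's r with the dichotomous variable coded 0/1, and checks it against the usual mean-difference formula. The group codes and scores are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def point_biserial_from_means(group, score):
    """r_pb = (M1 - M0) / SD * sqrt(p * q), using the population SD of all scores."""
    n = len(score)
    ones = [s for g, s in zip(group, score) if g == 1]
    zeros = [s for g, s in zip(group, score) if g == 0]
    p = len(ones) / n
    mean_all = sum(score) / n
    sd = sqrt(sum((s - mean_all) ** 2 for s in score) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * sqrt(p * (1 - p))

# Hypothetical data: a 0/1 group code and a continuous score.
group = [0, 0, 0, 1, 1, 1]
score = [1, 2, 3, 4, 5, 6]

r_pb = pearson_r(group, score)
print(r_pb, point_biserial_from_means(group, score))
```

Swapping which category is coded 1 flips the sign of r_pb but not its size, which is why the coding has to be reported alongside the coefficient.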
    38. Product-moment correlation (r)
      • For two interval and/or ratio variables
      • $r = \frac{\mathrm{cov}_{xy}}{s_x \, s_y}$
    39. Interval/Ratio by Interval/Ratio
    40. Scatterplots
      • Plot each pair of observations (X, Y)‏
      • x = predictor variable (independent)‏
      • y = criterion variable (dependent)‏
      • Check for:
        • outliers
        • linearity
      • ‘Line of best fit’: y = a + bx
      • The correlation between 2 variables is a measure of the degree to which pairs of numbers (points) cluster together around a best-fitting straight line
      • By convention, the independent variable (IV) should be plotted on the x (horizontal) axis, and the dependent variable (DV) on the y (vertical) axis.
      Scatterplots
    41. Scatterplot showing relationship between age & blood cholesterol
    42. What's wrong with this scatterplot? The IV should be treated as X and the DV as Y, although the distinction is not always clear.
    43. Scatterplot example: Strong positive (.81) Why is infant mortality positively linearly associated with the number of physicians (with the effects of GDP removed)? Because more doctors tend to be deployed to areas with high infant mortality (socio-economic status aside).
    44. Scatterplot example: Weak positive (.14)‏
    45. Scatterplot example: Moderately strong negative (-.76)‏
    46. Covariance
      • Variance shared by 2 variables
      • Covariance reflects the direction of the relationship:
        • +ve cov indicates a +ve relationship
        • -ve cov indicates a -ve relationship
    47. Covariance: Cross-products [Figure: regions labelled as having +ve or -ve deviation cross-products]
    48. Covariance
      • Covariance is dependent on the scale of measurement used
      • Can’t compare magnitude of cov across different scales of measurement (e.g., age by weight in kilos versus age by weight in grams).
      • Therefore, standardise covariance -> correlation
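The age-by-weight example can be sketched as follows. The ages and weights are invented, and the sample covariance (dividing by N - 1) is an assumption about which formula is intended. Changing the unit from kilos to grams multiplies the covariance by 1000 but leaves the correlation unchanged.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance (divides by N - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """r = cov_xy / (s_x * s_y), where s is the sample standard deviation."""
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data.
age = [20, 30, 40, 50, 60]
weight_kg = [60.0, 72.0, 68.0, 80.0, 75.0]
weight_g = [w * 1000 for w in weight_kg]

cov_kg = covariance(age, weight_kg)
cov_g = covariance(age, weight_g)
r_kg = correlation(age, weight_kg)
r_g = correlation(age, weight_g)

print(cov_kg, cov_g)  # covariance depends on the unit of measurement
print(r_kg, r_g)      # correlation does not
```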
    49. Correlation formulae
    50. Covariance, SD, and correlation: Quiz For a given set of data the covariance between X and Y is .80. The SD of X is 2 and the SD of Y is 3. The resulting correlation is closest to: a. .00 b. .15 c. .80 d. -.30 Answer: b. r = .80 / (2 x 3) = .13, which is closest to .15
    51. The correlation between 2 variables is:
      • an effect size – i.e., standardised measure of amount of covariation.
      Correlation & effect size
    52. Correlation - SPSS
    53. Hypothesis testing Almost all correlations are not 0, therefore: “What is the likelihood that a relationship between variables is a ‘true’ relationship, or could it simply be a result of random sampling variability or ‘chance’?”
    54. Significance of correlation
      • Null hypothesis (H0): assumes that there is no ‘true’ relationship
      • Alternative hypothesis (H1): assumes that the relationship is real
      • We initially assume the null hypothesis, and evaluate whether the data provide enough evidence to reject it in favour of the alternative hypothesis
    55. Significance of correlation
      • Null hypothesis: H0: ρ = 0
      • Alternative hypothesis: H1: ρ ≠ 0
      • ρ (rho) = population product-moment correlation coefficient
    56. How do we test the null hypothesis?
      • Use statistical tests of probability which produce a p value
      • Select a criterion (or critical value) for statistical significance (alpha ( α ) level); commonly .05‏
    57. How do we test the null hypothesis?
      • Calculate a p value and compare it to the alpha level.
      • If p is less than alpha, the result is statistically significant, i.e., if there were no true relationship, a correlation at least this strong would be expected in fewer than X% of random samples.
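To make the logic of a p value concrete, the sketch below uses a permutation test. This is a swapped-in technique, not the t-based test or r tables that statistics packages and the lecture rely on. It shuffles one variable many times to mimic "no true relationship" and counts how often a correlation at least as large as the observed one turns up. The data, function names, number of permutations and seed are all assumptions made for illustration.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-tailed p value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            count += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (count + 1) / (n_permutations + 1)

# Hypothetical data: hours of study and test score for ten students.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [52, 55, 51, 60, 58, 65, 63, 70, 72, 71]

alpha = 0.05
p = permutation_p_value(hours, score)
print(pearson_r(hours, score), p, p < alpha)
```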
    58. Imprecision in hypothesis testing
      • Type I error : rejecting the null hypothesis when it is true
      • Type II error : failing to reject the null hypothesis when it is false
      • Statistical significance is a function of:
        • effect size,
        • sample size
        • alpha level
    59. Significance of correlation
      • A 1- or 2-tailed significance test can be done in an effort to infer to a population
      • Result will depend on the power of the study (i.e., with higher N, α, and r, the result is more likely to be sig.)
      • Alternatively look up r tables with df = N - 2
    60. Scatterplot showing a confidence interval for a line of best fit
    61. US States 4th Academic Achievement by SES
    62. Significance of correlation
      df (N - 2)    critical r (p = .05)
      5             .67
      10            .50
      15            .41
      20            .36
      25            .32
      30            .30
      50            .23
      200           .11
      500           .07
      1000          .05
    63. Significance of correlation If the correlation between Age and test Performance is statistically significant, it means that:
      a. there is an important relationship between Age and test Performance
      b. the true correlation between Age and Performance in the population is equal to 0
      c. the true correlation between Age and Performance in the population is not equal to 0
      d. getting older causes you to do poorly on tests
    64. Descriptive assumptions
      • Levels of measurement ≥ interval
      • Linear relations
      • No outliers
    65. Inferential assumptions
      • Homoscedasticity
      • Similar, normal underlying distributions
      • Correction for attenuation (unreliability)‏
      • Minimal and normally distributed measurement error
    66. Homoscedasticity
    67. Factors that affect correlation
      • Covariation
      • Restricted range
      • Heterogeneous samples
      • Scale has no effect
    68. Coefficient of Determination (r²)
      • CoD = The proportion of variance or change in one variable that can be accounted for by another variable.
      • e.g., r = .60, r² = .36
    69. Interpreting correlation (Cohen, 1988) A correlation is an effect size, so guidelines re strength can be suggested.
      Strength    r           r²
      weak        .1 to .3    (1 to 10%)
      moderate    .3 to .5    (10 to 25%)
      strong      > .5        (> 25%)
    70. Size of correlation (Cohen, 1988) WEAK (.1 - .3) MODERATE (.3 - .5) STRONG (> .5)
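The Cohen (1988) guidelines and the coefficient of determination can be combined in a small helper like the sketch below. The function name, the "negligible" label for values under .1, and the handling of values exactly on a boundary are assumptions made here, not part of the guidelines as stated on the slides.

```python
def describe_correlation(r):
    """Return (strength label, r squared) using the Cohen (1988) bands above.

    Assumptions: the sign is ignored when judging strength, values below .1 are
    labelled 'negligible', and a value on a boundary goes to the higher band.
    """
    size = abs(r)
    if size < 0.1:
        label = "negligible"
    elif size < 0.3:
        label = "weak"
    elif size < 0.5:
        label = "moderate"
    else:
        label = "strong"
    return label, r ** 2

for r in (0.14, -0.35, 0.60, -0.76, 0.81):
    label, r_squared = describe_correlation(r)
    print(f"r = {r:+.2f}: {label}, r squared = {r_squared:.2f}")
```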
    71. Interpreting correlation
      Strength       r             r²
      very weak      0 - .19       (0 to 4%)
      weak           .20 - .39     (4 to 16%)
      moderate       .40 - .59     (16 to 36%)
      strong         .60 - .79     (36% to 64%)
      very strong    .80 - 1.00    (64% to 100%)
    72. Interpreting correlation
    73. Interpreting correlation
    74. Interpreting correlation 4
    75. Correlation of this scatterplot = .9
    76. Correlation of this scatterplot = .9
    77. What do you estimate the correlation of this scatterplot of height and weight to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    78. What do you estimate the correlation of this scatterplot to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    79. What do you estimate the correlation of this scatterplot to be? a. -.5 b. -1 c. 0 d. .5 e. 1
    80. Assumptions / Limitations
      • Linearity & non-linearity
      • Range restriction
      • Heterogeneous samples
      • Types of answers
      • Types of correlation
      • Interpretation
      • Assumptions / Limitations
      Assumptions / Limitations: Overview
    81. Non-linear relationships Check scatterplot. Can a linear relationship ‘capture’ the lion’s share of the variance? If so, use r.
    82. Non-linear relationships
      • If non-linear, consider transforming variables to ‘create’ linear relationship = equivalent to finding a non-linear mathematical function to describe the relationship between the variables
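As a minimal sketch of transforming a variable to ‘create’ a linear relationship, the example below uses made-up data that grow exponentially. Correlating x with the raw y understates the relationship, whereas correlating x with log(y) recovers a perfectly linear one. The choice of a log transform is an assumption that suits this particular curve; other shapes would call for other transformations.

```python
from math import exp, log, sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y grows exponentially with x.
x = list(range(1, 11))
y = [exp(0.5 * v) for v in x]

r_raw = pearson_r(x, y)                    # linear r on the curved relationship
r_log = pearson_r(x, [log(v) for v in y])  # r after log-transforming y

print(r_raw, r_log)
```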
    83. Range restriction
      • Range restriction occurs when a sample contains a restricted (or truncated) range of scores
        • e.g., fluid intelligence and age < 18 might have a linear relationship
      • If there is range restriction, be cautious in generalising beyond the range for which data are available
        • e.g., fluid intelligence does not continue to increase linearly with age after age 18
    84. Range restriction
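The effect of range restriction on r can be sketched with artificial data: a strong linear trend with fixed alternating noise gives a high correlation over the full range of x, but almost none when only a narrow band of x values is sampled. The data and the cut-offs (8 to 11) are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Artificial data: y follows x, plus noise of +3 / -3 on alternating points.
x = list(range(20))
y = [v + (3 if v % 2 == 0 else -3) for v in x]

# Restricted sample: keep only cases with x between 8 and 11.
restricted = [(a, b) for a, b in zip(x, y) if 8 <= a <= 11]
x_restricted = [a for a, _ in restricted]
y_restricted = [b for _, b in restricted]

r_full = pearson_r(x, y)
r_restricted = pearson_r(x_restricted, y_restricted)
print(r_full, r_restricted)
```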
    85. Heterogeneous samples
      • Sub-samples (e.g., males & females) may artificially increase or decrease overall r.
      • Solution - calculate r separately for sub-samples & overall, look for differences
    86. Scatterplot of Same-sex & Opposite-sex Relations by Gender ♂ r = .67 ♀ r = .52
    87. Scatterplot of Weight and Self-esteem by Gender ♂ r = .50 ♀ r = -.48
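The suggestion to calculate r separately for sub-samples and overall can be sketched with a deliberately extreme, invented data set: one sub-sample has a perfect positive relationship and the other a perfect negative one, so the overall correlation is zero and hides both.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented (x, y) pairs for two sub-samples.
group_a = [(1, 1), (2, 2), (3, 3), (4, 4)]   # y rises with x
group_b = [(1, 4), (2, 3), (3, 2), (4, 1)]   # y falls with x

def r_for(pairs):
    """Correlation for a list of (x, y) pairs."""
    return pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

r_a = r_for(group_a)
r_b = r_for(group_b)
r_overall = r_for(group_a + group_b)
print(r_a, r_b, r_overall)
```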
    88. Effect of outliers
      • Outliers can disproportionately increase or decrease r .
      • Options
        • compute r with & without outliers
        • get more data for outlying values
        • recode outliers as having more conservative scores
        • transformation
        • recode variable into lower level of measurement
    89. Age & self-esteem ( r = .63)‏
    90. Age & self-esteem (outliers removed) r = .23
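The first option, computing r with and without outliers, might look like the sketch below. The data are invented, and the rule used to flag the outlier (a hand-picked cut-off on x) is purely illustrative; in practice outliers would be identified from the scatterplot or a formal screening rule.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented (x, y) pairs; the last case is an extreme outlier on both variables.
cases = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 3), (20, 20)]

def r_for(pairs):
    """Correlation for a list of (x, y) pairs."""
    return pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Illustrative rule only: treat any case with x above 10 as an outlier.
without_outliers = [p for p in cases if p[0] <= 10]

r_with = r_for(cases)
r_without = r_for(without_outliers)
print(r_with, r_without)
```

Reporting both values, as on slides 89 and 90, lets the reader judge how much of the relationship rests on a few cases.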
    91. Dealing with Several Correlations
    92. Dealing with several correlations
      • Scatterplot matrices and correlation matrices organise correlations amongst several variables at once.
      • Example (Kliewer et al, 1998)‏
        • 99 young children
        • Measured level of
          • Witnessed violence, Intrusive thoughts, Social support, and Internalizing symptoms
    93. Correlation matrix: Example of an APA Style Correlation Table
    94. Scatterplot matrix
    95. Checklist
      • Graphs & Scatterplots
        • Outliers?
        • Linear?
        • Does each variable have a reasonable range?
        • Are there sub-samples to consider?
      • Choose appropriate measure of association
      • Conduct inferential test (if needed)‏
      • Interpret/Discuss
    96. Reporting
      • Relate back to research hypothesis
      • Describe & interpret correlation
        • direction of relationship
        • size/strength
        • significance
      • If many correlations, report in a table
      • Acknowledge limitations e.g.,
        • Heterogeneity (sub-samples)‏
        • Range restriction
        • Causality?
    97. Writing-up example “Number of children and marital satisfaction were inversely related ( r (48) = -.35, p < .05), such that contentment in marriage tended to be lower for couples with more children. Number of children explained approximately 12% of the variance in marital satisfaction, a small-moderate effect.”
    98. Key points
      • Covariations are the building blocks of more complex analyses, e.g., reliability analysis, factor analysis, multiple regression
      • Correlation does not prove causation – may be in opposite direction, co-causal, or due to other variables.
    99. Key points
      • Check scatterplots to see whether a correlation makes sense
      • Choose measure of correlation based on levels of measurement.
      • Report measures of effect size (e.g., Φ, Cramer's V, r, r²), direction of relationship, and inferential statistical significance.

    University of Canberra, 2 years ago
