SlideShare a Scribd company logo
1 of 9
Download to read offline
Page 14.1 (C:dataStatPrimercorrelation.wpd)
14: Correlation
Introduction | Scatter Plot | The Correlational Coefficient | Hypothesis Test | Assumptions | An Additional Example
Introduction
Correlation quantifies the extent to which two quantitative variables, X and Y, “go together.” When high values of X
are associated with high values of Y, a positive correlation exists. When high values of X are associated with low
values of Y, a negative correlation exists.
Illustrative data set. We use the data set bicycle.sav to illustrate correlational methods. In this cross-sectional
data set, each observation represents a neighborhood. The X variable is socioeconomic status measured as the
percentage of children in a neighborhood receiving free or reduced-fee lunches at school. The Y variable is bicycle
helmet use measured as the percentage of bicycle riders in the neighborhood wearing helmets. Twelve
neighborhoods are considered:
Neighborhood
X
(% receiving reduced-fee lunch)
Y
(% wearing bicycle helmets)
Fair Oaks 50 22.1
Strandwood 11 35.9
Walnut Acres 2 57.9
Discov. Bay 19 22.2
Belshaw 26 42.4
Kennedy 73 5.8
Cassell 81 3.6
Miner 51 21.4
Sedgewick 11 55.2
Sakamoto 2 33.3
Toyon 19 32.4
Lietz 25 38.4
Three are twelve observations (n = 12). Overall, = 30.83 and = 30.883. We want to explore the relation
between socioeconomic status and the use of bicycle helmets.
It should be noted that an outlier (84, 46.6) has been removed from this data set so that we may quantify the linear
relation between X and Y.
Tukey, J. W. (1977). EDA. Reading, Mass.: Addison-Wesley, p. 43.1
Kruskal, W. H. (1959). Some Remarks on Wild Observations. http://www.tufts.edu/~gdallal/out.htm.2
Page 14.2 (C:dataStatPrimercorrelation.wpd)
Figure 2
Scatter Plot
The first step is create a scatter plot of the data. “There is no excuse for failing
to plot and look.”1
In general, scatter plots may reveal a
• positive correlation (high values of X associated with high values of Y)
• negative correlation (high values of X associated with low values of Y)
• no correlation (values of X are not at all predictive of values of Y).
These patterns are demonstrated in the figure to the right.
Illustrative example. A scatter plot of the illustrative data is shown to the
right. The plot reveals that high values of X are associated with low
values of Y. That is to say, as the number of children receiving
reduced-fee meals at school increases, bicycle helmet use rates decrease’
a negative correlation exists.
In addition, there is an aberrant observation (“outlier”) in the upper-right
quadrant. Outliers should not be ignored—it is important to say
something about aberrant observations. What should be said exactly2
depends on what can be learned and what is known. It is possible the
lesson learned from the outlier is more important than the main object of
the study. In the illustrative data, for instance, we have a low SES school
with an envious safety record. What gives?
Page 14.3 (C:dataStatPrimercorrelation.wpd)
Correlation Coefficient
The General Idea
Correlation coefficients (denoted r) are
statistics that quantify the relation between X
and Y in unit-free terms. When all points of a
scatter plot fall directly on a line with an
upward incline, r = +1; When all points fall
directly on a downward incline, r = !1.
Such perfect correlation is seldom
encountered. We still need to measure
correlational strength, –defined as the degree
to which data point adhere to an imaginary trend line passing through the “scatter cloud.” Strong correlations are
associated with scatter clouds that adhere closely to the imaginary trend line. Weak correlations are associated with
scatter clouds that adhere marginally to the trend line.
The closer r is to +1, the stronger the positive correlation. The closer r is to !1, the stronger the negative correlation.
Examples of strong and weak correlations are shown below. Note: Correlational strength can not be quantified
visually. It is too subjective and is easily influenced by axis-scaling. The eye is not a good judge of correlational
strength.
Page 14.4 (C:dataStatPrimercorrelation.wpd)
(1)
(2)
(3)
(4)
Pearson’s Correlation Coefficient
To calculate a correlation coefficient, you normally need three different sums of squares (SS). The sum of squares
for variable X, the sum of square for variable Y, and the sum of the cross-product of XY.
The sum of squares for variable X is:
XXThis statistic keeps track of the spread of variable X. For the illustrative data, = 30.83 and SS = (50!30.83) +2
x(11!30.83) + . . . + (25!30.83) = 7855.67. Since this statistic is the numerator of the variance of X (s ), it can also2 2 2
XX x XXbe calculated as SS = (s )(n!1). Thus, SS = (714.152)(12!1) = 7855.67.2
The sum of squares for variable Y is:
YThis statistic keeps track of the spread of variable Y and is the numerator of the variance of Y (s ). For the2
YYillustrative data = 30.883 and SS = (22.1!30.883) + (35.9!30.883) + . . . + (38.4!30.883) = 3159.68. An2 2 2
YY Y YYalterative way to calculate the sum of squares for variable Y is SS = (s )(n!1). Thus, SS = (287.243)(12!1) =2
3159.68.
XYFinally, the sum of the cross-products (SS ) is:
XYFor the illustrative data, SS = (50!30.83)(22.1!30.883) + (11!30.83)(35.9!30.883) + . . . + (25!30.83)(38.4 !
30.883) = !4231.1333. This statistic is analogous to the other sums of squares except that it is used to quantify the
extent to which the two variables “go together”.
The correlation coefficient (r) is
For the illustrative data, r =
Page 14.5 (C:dataStatPrimercorrelation.wpd)
Interpretation of Pearson’s Correlation Coefficient
The sign of the correlation coefficient determines whether the correlation is positive or negative. The magnitude of
the correlation coefficient determines the strength of the correlation. Although there are no hard and fast rules for
describing correlational strength, I [hesitatingly] offer these guidelines:
0 < |r| < .3 weak correlation
.3 < |r| < .7 moderate correlation
|r| > 0.7 strong correlation
For example, r = -0.849 suggests a strong negative correlation.
SPSS: To calculate correlation coefficients click Analyze > Correlate > Bivariate. Then select
variables for analysis. Several bivariate correlation coefficients can be calculated simultaneously and displayed
as a correlation matrix. Clicking the Options button and checking "Cross-product deviations and covariances”
computes sums of squares (Formulas 17.1 - 17.3).
Coefficient of Determination
The coefficient of determination is the square of the correlation coefficient (r ). For illustrative data,2
r = -0.849 = 0.72. This statistic quantifies the proportion of the variance of one variable “explained” (in a2 2
statistical sense, not a causal sense) by the other. The illustrative coefficient of determination of 0.72 suggests 72%
of the variability in helmet use is explained by socioeconomic status.
Page 14.6 (C:dataStatPrimercorrelation.wpd)
Figure 5
(5)
(6)
Hypothesis Test
The sample correlation coefficient r is the estimator of population correlation
coefficient r (rho). Recall that relations in samples do not necessarily depict the
same in the population. For example, in Figure 6, the population of all dots
demonstrates no correlation. If by chance the encircled points were sampled, an
inverse association would appear. Thus, some samples cannot be relied on.
The null and alternative hypotheses are .
If fixed-level testing is conducted, an " level is selected.
The hypothesis can be tested with a t statistic:
rwhere se represents the standard error the correlation coefficient:
Under the null hypothesis, this t statistic has n ! 2 degrees of freedom. Test results are converted to a p value before
conclusions are drawn.
r statIllustrative example. For the illustrative data, se = = 0.167 and t = = !5.08, df =
012!2 = 10, p = .00048. This provides evidence in support of the rejection of H .
SPSS: The p value — labeled “Sig. (2-tailed)” — is calculated as part of the Analyze > Correlate >
Bivariate procedure and described on the prior page.
Page 14.7 (C:dataStatPrimercorrelation.wpd)
Assumptions
We have in the past considered two types of assumptions:
• validity assumptions
• distributional assumptions
Validity assumptions require valid measurements, a good sample, unconfounded comparisons. There requirements
are consistent throughout your studies.
The only distributional assumption needed to use r descriptively is linearity. Correlation coefficients assume the
relation can be described with a straight line. The most direct way to determine linearity is to look at the scatterplot.
Inferential methods require that the joint distribution of X and Y is bivariate Normal. This means the three-
dimensional distribution of the scatter plot is bell-shaped from all angles:
Always remember that correlation does not equate with causation. The jump from a statistical relation to a causal
relation requires a different framework.
Page 14.8 (C:dataStatPrimercorrelation.wpd)
An Additional Example
The height of a child, of course, increases over time. Since the
pattern of growth varies from child to child, one way to understand
the general growth pattern is by using the average of childrens’
heights by age. Data in the table to the right consider average height
of participants in the Childhood Respiratory Disease study in East
Boston, Massachusetts by age (fev.sav; data source = Rosner,1990,
p. 39).
Data may be plotted to aid analysis. The figure at the bottom of the
page shows average age with error bars based on the standard error
meanof each mean (recall that se = s / /n, and that this is a measure of
the means precision – means based on large samples are more
precise than means based on small samples, according to the “square
root law”). Can the data be described by a straight line? (Not
entirely, but perhaps in the range 4 to 11. Might a curved line
between the ages of 4 and 17 be more useful? Perhaps we should
split the data between boys and girls?)
Page 14.9 (C:dataStatPrimercorrelation.wpd)

More Related Content

What's hot

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionKhalid Aziz
 
Simple linear regression project
Simple linear regression projectSimple linear regression project
Simple linear regression projectJAPAN SHAH
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisShailendra Tomar
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysisNimrita Koul
 
Bivariate linear regression
Bivariate linear regressionBivariate linear regression
Bivariate linear regressionMenaal Kaushal
 
Applied Statistics In Business
Applied Statistics In BusinessApplied Statistics In Business
Applied Statistics In BusinessAshish Nangla
 
Regression analysis
Regression analysisRegression analysis
Regression analysisRavi shankar
 
Regression analysis
Regression analysisRegression analysis
Regression analysissaba khan
 
Regression & correlation
Regression & correlationRegression & correlation
Regression & correlationAtiq Rehman
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVADerek Kane
 
Mutiple linear regression project
Mutiple linear regression projectMutiple linear regression project
Mutiple linear regression projectJAPAN SHAH
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipRithish Kumar
 
Introduction to Regression Analysis and R
Introduction to Regression Analysis and R   Introduction to Regression Analysis and R
Introduction to Regression Analysis and R Rachana Taneja Bhatia
 

What's hot (20)

Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Simple linear regression project
Simple linear regression projectSimple linear regression project
Simple linear regression project
 
9. parametric regression
9. parametric regression9. parametric regression
9. parametric regression
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Simple & Multiple Regression Analysis
Simple & Multiple Regression AnalysisSimple & Multiple Regression Analysis
Simple & Multiple Regression Analysis
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Bivariate linear regression
Bivariate linear regressionBivariate linear regression
Bivariate linear regression
 
Applied Statistics In Business
Applied Statistics In BusinessApplied Statistics In Business
Applied Statistics In Business
 
Correlation
CorrelationCorrelation
Correlation
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression
RegressionRegression
Regression
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression & correlation
Regression & correlationRegression & correlation
Regression & correlation
 
Data Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVAData Science - Part IV - Regression Analysis & ANOVA
Data Science - Part IV - Regression Analysis & ANOVA
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Chapter13
Chapter13Chapter13
Chapter13
 
Mutiple linear regression project
Mutiple linear regression projectMutiple linear regression project
Mutiple linear regression project
 
Applications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationshipApplications of regression analysis - Measurement of validity of relationship
Applications of regression analysis - Measurement of validity of relationship
 
Introduction to Regression Analysis and R
Introduction to Regression Analysis and R   Introduction to Regression Analysis and R
Introduction to Regression Analysis and R
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 

Viewers also liked

Ad fraud & Programmatic
Ad fraud & ProgrammaticAd fraud & Programmatic
Ad fraud & ProgrammaticNeeraj Mishra
 
Como crearse un blog
Como crearse un blogComo crearse un blog
Como crearse un blogronny1b
 
Julie Jacko\'s presentation on ARRA at work in Minnesota
Julie Jacko\'s presentation on ARRA at work in MinnesotaJulie Jacko\'s presentation on ARRA at work in Minnesota
Julie Jacko\'s presentation on ARRA at work in MinnesotaJulie Jacko
 
Emprender en BilboValley. InnovUP
Emprender en BilboValley. InnovUPEmprender en BilboValley. InnovUP
Emprender en BilboValley. InnovUPOcean Data Tracker
 
Omaha Storm Chasers Community Relations Overview
Omaha Storm Chasers Community Relations OverviewOmaha Storm Chasers Community Relations Overview
Omaha Storm Chasers Community Relations Overviewjbentup
 
Omaha Storm Chasers' Community Relations 2012 Review
Omaha Storm Chasers' Community Relations 2012 ReviewOmaha Storm Chasers' Community Relations 2012 Review
Omaha Storm Chasers' Community Relations 2012 Reviewjbentup
 
Science of Drip Marketing
Science of Drip MarketingScience of Drip Marketing
Science of Drip MarketingNeeraj Mishra
 
Programmatic AD Buying - A Tactical Guide
Programmatic AD Buying - A Tactical GuideProgrammatic AD Buying - A Tactical Guide
Programmatic AD Buying - A Tactical GuideNeeraj Mishra
 

Viewers also liked (12)

Ad fraud & Programmatic
Ad fraud & ProgrammaticAd fraud & Programmatic
Ad fraud & Programmatic
 
Como crearse un blog
Como crearse un blogComo crearse un blog
Como crearse un blog
 
Julie Jacko\'s presentation on ARRA at work in Minnesota
Julie Jacko\'s presentation on ARRA at work in MinnesotaJulie Jacko\'s presentation on ARRA at work in Minnesota
Julie Jacko\'s presentation on ARRA at work in Minnesota
 
Emprender en BilboValley. InnovUP
Emprender en BilboValley. InnovUPEmprender en BilboValley. InnovUP
Emprender en BilboValley. InnovUP
 
Entering the Human Age
Entering the Human AgeEntering the Human Age
Entering the Human Age
 
Omaha Storm Chasers Community Relations Overview
Omaha Storm Chasers Community Relations OverviewOmaha Storm Chasers Community Relations Overview
Omaha Storm Chasers Community Relations Overview
 
07 podzemne vode web
07 podzemne vode web07 podzemne vode web
07 podzemne vode web
 
Omaha Storm Chasers' Community Relations 2012 Review
Omaha Storm Chasers' Community Relations 2012 ReviewOmaha Storm Chasers' Community Relations 2012 Review
Omaha Storm Chasers' Community Relations 2012 Review
 
Science of Drip Marketing
Science of Drip MarketingScience of Drip Marketing
Science of Drip Marketing
 
Društvene grupe
Društvene grupeDruštvene grupe
Društvene grupe
 
Programmatic AD Buying - A Tactical Guide
Programmatic AD Buying - A Tactical GuideProgrammatic AD Buying - A Tactical Guide
Programmatic AD Buying - A Tactical Guide
 
Adjectives and adverbs
Adjectives and adverbsAdjectives and adverbs
Adjectives and adverbs
 

Similar to Correlation (20)

Scattergrams
ScattergramsScattergrams
Scattergrams
 
Applied statistics part 4
Applied statistics part  4Applied statistics part  4
Applied statistics part 4
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Regression
RegressionRegression
Regression
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
TTests.ppt
TTests.pptTTests.ppt
TTests.ppt
 
2-20-04.ppt
2-20-04.ppt2-20-04.ppt
2-20-04.ppt
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docx
 
Assessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docxAssessment 2 ContextIn many data analyses, it is desirable.docx
Assessment 2 ContextIn many data analyses, it is desirable.docx
 
Correlation
CorrelationCorrelation
Correlation
 
Statistics
StatisticsStatistics
Statistics
 
Statistics
Statistics Statistics
Statistics
 
ch 13 Correlation and regression.doc
ch 13 Correlation  and regression.docch 13 Correlation  and regression.doc
ch 13 Correlation and regression.doc
 
Correlation and Regression
Correlation and Regression Correlation and Regression
Correlation and Regression
 
Correlation mp
Correlation mpCorrelation mp
Correlation mp
 
Chap04 01
Chap04 01Chap04 01
Chap04 01
 
Math n Statistic
Math n StatisticMath n Statistic
Math n Statistic
 
Chapter05
Chapter05Chapter05
Chapter05
 
MLR Project (Onion)
MLR Project (Onion)MLR Project (Onion)
MLR Project (Onion)
 
Correlation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptxCorrelation Analysis PRESENTED.pptx
Correlation Analysis PRESENTED.pptx
 

Recently uploaded

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Recently uploaded (20)

Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Correlation

  • 1. Page 14.1 (C:dataStatPrimercorrelation.wpd) 14: Correlation Introduction | Scatter Plot | The Correlational Coefficient | Hypothesis Test | Assumptions | An Additional Example Introduction Correlation quantifies the extent to which two quantitative variables, X and Y, “go together.” When high values of X are associated with high values of Y, a positive correlation exists. When high values of X are associated with low values of Y, a negative correlation exists. Illustrative data set. We use the data set bicycle.sav to illustrate correlational methods. In this cross-sectional data set, each observation represents a neighborhood. The X variable is socioeconomic status measured as the percentage of children in a neighborhood receiving free or reduced-fee lunches at school. The Y variable is bicycle helmet use measured as the percentage of bicycle riders in the neighborhood wearing helmets. Twelve neighborhoods are considered: Neighborhood X (% receiving reduced-fee lunch) Y (% wearing bicycle helmets) Fair Oaks 50 22.1 Strandwood 11 35.9 Walnut Acres 2 57.9 Discov. Bay 19 22.2 Belshaw 26 42.4 Kennedy 73 5.8 Cassell 81 3.6 Miner 51 21.4 Sedgewick 11 55.2 Sakamoto 2 33.3 Toyon 19 32.4 Lietz 25 38.4 Three are twelve observations (n = 12). Overall, = 30.83 and = 30.883. We want to explore the relation between socioeconomic status and the use of bicycle helmets. It should be noted that an outlier (84, 46.6) has been removed from this data set so that we may quantify the linear relation between X and Y.
  • 2. Tukey, J. W. (1977). EDA. Reading, Mass.: Addison-Wesley, p. 43.1 Kruskal, W. H. (1959). Some Remarks on Wild Observations. http://www.tufts.edu/~gdallal/out.htm.2 Page 14.2 (C:dataStatPrimercorrelation.wpd) Figure 2 Scatter Plot The first step is create a scatter plot of the data. “There is no excuse for failing to plot and look.”1 In general, scatter plots may reveal a • positive correlation (high values of X associated with high values of Y) • negative correlation (high values of X associated with low values of Y) • no correlation (values of X are not at all predictive of values of Y). These patterns are demonstrated in the figure to the right. Illustrative example. A scatter plot of the illustrative data is shown to the right. The plot reveals that high values of X are associated with low values of Y. That is to say, as the number of children receiving reduced-fee meals at school increases, bicycle helmet use rates decrease’ a negative correlation exists. In addition, there is an aberrant observation (“outlier”) in the upper-right quadrant. Outliers should not be ignored—it is important to say something about aberrant observations. What should be said exactly2 depends on what can be learned and what is known. It is possible the lesson learned from the outlier is more important than the main object of the study. In the illustrative data, for instance, we have a low SES school with an envious safety record. What gives?
  • 3. Page 14.3 (C:dataStatPrimercorrelation.wpd) Correlation Coefficient The General Idea Correlation coefficients (denoted r) are statistics that quantify the relation between X and Y in unit-free terms. When all points of a scatter plot fall directly on a line with an upward incline, r = +1; When all points fall directly on a downward incline, r = !1. Such perfect correlation is seldom encountered. We still need to measure correlational strength, –defined as the degree to which data point adhere to an imaginary trend line passing through the “scatter cloud.” Strong correlations are associated with scatter clouds that adhere closely to the imaginary trend line. Weak correlations are associated with scatter clouds that adhere marginally to the trend line. The closer r is to +1, the stronger the positive correlation. The closer r is to !1, the stronger the negative correlation. Examples of strong and weak correlations are shown below. Note: Correlational strength can not be quantified visually. It is too subjective and is easily influenced by axis-scaling. The eye is not a good judge of correlational strength.
  • 4. Page 14.4 (C:dataStatPrimercorrelation.wpd) (1) (2) (3) (4) Pearson’s Correlation Coefficient To calculate a correlation coefficient, you normally need three different sums of squares (SS). The sum of squares for variable X, the sum of square for variable Y, and the sum of the cross-product of XY. The sum of squares for variable X is: XXThis statistic keeps track of the spread of variable X. For the illustrative data, = 30.83 and SS = (50!30.83) +2 x(11!30.83) + . . . + (25!30.83) = 7855.67. Since this statistic is the numerator of the variance of X (s ), it can also2 2 2 XX x XXbe calculated as SS = (s )(n!1). Thus, SS = (714.152)(12!1) = 7855.67.2 The sum of squares for variable Y is: YThis statistic keeps track of the spread of variable Y and is the numerator of the variance of Y (s ). For the2 YYillustrative data = 30.883 and SS = (22.1!30.883) + (35.9!30.883) + . . . + (38.4!30.883) = 3159.68. An2 2 2 YY Y YYalterative way to calculate the sum of squares for variable Y is SS = (s )(n!1). Thus, SS = (287.243)(12!1) =2 3159.68. XYFinally, the sum of the cross-products (SS ) is: XYFor the illustrative data, SS = (50!30.83)(22.1!30.883) + (11!30.83)(35.9!30.883) + . . . + (25!30.83)(38.4 ! 30.883) = !4231.1333. This statistic is analogous to the other sums of squares except that it is used to quantify the extent to which the two variables “go together”. The correlation coefficient (r) is For the illustrative data, r =
  • 5. Page 14.5 (C:dataStatPrimercorrelation.wpd) Interpretation of Pearson’s Correlation Coefficient The sign of the correlation coefficient determines whether the correlation is positive or negative. The magnitude of the correlation coefficient determines the strength of the correlation. Although there are no hard and fast rules for describing correlational strength, I [hesitatingly] offer these guidelines: 0 < |r| < .3 weak correlation .3 < |r| < .7 moderate correlation |r| > 0.7 strong correlation For example, r = -0.849 suggests a strong negative correlation. SPSS: To calculate correlation coefficients click Analyze > Correlate > Bivariate. Then select variables for analysis. Several bivariate correlation coefficients can be calculated simultaneously and displayed as a correlation matrix. Clicking the Options button and checking "Cross-product deviations and covariances” computes sums of squares (Formulas 17.1 - 17.3). Coefficient of Determination The coefficient of determination is the square of the correlation coefficient (r ). For illustrative data,2 r = -0.849 = 0.72. This statistic quantifies the proportion of the variance of one variable “explained” (in a2 2 statistical sense, not a causal sense) by the other. The illustrative coefficient of determination of 0.72 suggests 72% of the variability in helmet use is explained by socioeconomic status.
  • 6. Page 14.6 (C:dataStatPrimercorrelation.wpd) Figure 5 (5) (6) Hypothesis Test The sample correlation coefficient r is the estimator of population correlation coefficient r (rho). Recall that relations in samples do not necessarily depict the same in the population. For example, in Figure 6, the population of all dots demonstrates no correlation. If by chance the encircled points were sampled, an inverse association would appear. Thus, some samples cannot be relied on. The null and alternative hypotheses are . If fixed-level testing is conducted, an " level is selected. The hypothesis can be tested with a t statistic: rwhere se represents the standard error the correlation coefficient: Under the null hypothesis, this t statistic has n ! 2 degrees of freedom. Test results are converted to a p value before conclusions are drawn. r statIllustrative example. For the illustrative data, se = = 0.167 and t = = !5.08, df = 012!2 = 10, p = .00048. This provides evidence in support of the rejection of H . SPSS: The p value — labeled “Sig. (2-tailed)” — is calculated as part of the Analyze > Correlate > Bivariate procedure and described on the prior page.
  • 7. Page 14.7 (C:dataStatPrimercorrelation.wpd) Assumptions We have in the past considered two types of assumptions: • validity assumptions • distributional assumptions Validity assumptions require valid measurements, a good sample, unconfounded comparisons. There requirements are consistent throughout your studies. The only distributional assumption needed to use r descriptively is linearity. Correlation coefficients assume the relation can be described with a straight line. The most direct way to determine linearity is to look at the scatterplot. Inferential methods require that the joint distribution of X and Y is bivariate Normal. This means the three- dimensional distribution of the scatter plot is bell-shaped from all angles: Always remember that correlation does not equate with causation. The jump from a statistical relation to a causal relation requires a different framework.
  • 8. Page 14.8 (C:dataStatPrimercorrelation.wpd) An Additional Example The height of a child, of course, increases over time. Since the pattern of growth varies from child to child, one way to understand the general growth pattern is by using the average of childrens’ heights by age. Data in the table to the right consider average height of participants in the Childhood Respiratory Disease study in East Boston, Massachusetts by age (fev.sav; data source = Rosner,1990, p. 39). Data may be plotted to aid analysis. The figure at the bottom of the page shows average age with error bars based on the standard error meanof each mean (recall that se = s / /n, and that this is a measure of the means precision – means based on large samples are more precise than means based on small samples, according to the “square root law”). Can the data be described by a straight line? (Not entirely, but perhaps in the range 4 to 11. Might a curved line between the ages of 4 and 17 be more useful? Perhaps we should split the data between boys and girls?)