SlideShare a Scribd company logo
1 of 28
Download to read offline
Correlation
Anthony J. Evans
Professor of Economics, ESCP Europe
www.anthonyjevans.com
(cc) Anthony J. Evans 2019 | http://creativecommons.org/licenses/by-nc-sa/3.0/
Introduction to Correlation
• Correlation is a measure of the “co”-relation between two
or more variables
• The main result of a correlation is called the correlation
coefficient (or "r").
– It ranges from -1.0 to +1.0.
– The closer r is to |1|, the more closely the two
variables are related
• Also known as the Pearson Product Moment correlation
Coefficient
2
Defining correlation
• The correlation measures the direction and strength of the
linear relationship between variables x and y
– Population = “ρ” (rho)
– Sample = "r"
x
i
x
s
xx
z
-
=Where
3€
r =
ZxZy∑
n −1
Correlation
• Notice how correlation is based on standardisation – i.e. the z-
scores for each value
• There is no distinction between explanatory and dependent
variables
– It doesn’t matter what you label as x or y
• Both variables need to be quantitative, not categorical
• r can be strongly influenced by outliers
• -1 <r <1
• If r = -1 there is a perfectly negative correlation
• If r = 0 there is no correlation
• If r = +1 there is a perfectly positive correlation
4
Description
• Measures correlation of
values by their position
in a ordered list
• A ranking
• Ex.: 1,2,3,4,5,..
Spearman
FormulaCoefficient
If our data is in an
ordered list, we have to
use the Spearman
coefficient, which is a
type of Pearson
correlation
Correlation Coefficients: An Alternative
A Caveat
Pearson
• Measures the
correlation between
series of cardinal data
• Actual values
• Ex.: 80, 90, 75, 15,…
5
€
r =
ZxZy∑
n −1
𝑟 𝑠 = 1 −
6Σ𝑑)
𝑛(𝑛) − 1)
Example: Spearman Coefficient
Type of drink Last month This month
Coffee 1 3
Tea 2 4
Orange juice 3 1
Lemon juice 4 2
Whisky 5 6
Red Wine 6 10
White Wine 7 9
Brandy 8 7
Chocolate 9 8
Cider 10 5
Rank of preferences of drinks in
a Market Research
6
A Caveat
Example: Spearman Coefficient
Rs = 1 - (6 x 64) / (10 x (100-1))
Rs = 0.61212
Type of drink Last month This month d d2
Coffee 1 3 2 4
Tea 2 4 2 4
Orange juice 3 1 -2 4
Lemon juice 4 2 -2 4
Whisky 5 6 1 1
Red Wine 6 10 4 16
White Wine 7 9 2 4
Brandy 8 7 -1 1
Chocolate 9 8 -1 1
Cider 10 5 -5 25
64
7
A Caveat
𝑟 𝑠 = 1 −
6Σ𝑑)
𝑛(𝑛) − 1)
Example: Pearson Product Moment Correlation Coefficient
Region Alcohol Tobacco
North 6.47 4.03
Yorkshire 6.13 3.76
North East 6.19 3.77
East Midlands 4.89 3.34
West Midlands 5.63 3.47
East Anglia 4.52 2.92
South East 5.89 3.2
South West 4.79 2.71
Wales 5.27 3.53
Scotland 6.08 4.51
Source: Moore & McCabe p.133
N = 10
Example: Pearson Product Moment Correlation Coefficient
9
1
2
3 4
5
67
Region
Alcohol
(X) Alcohol Z Tobacco (Y) Tobacco Z ZxZy
North 6.47 1.304199908 4.03 0.957459102 1.248718073
Yorkshire 6.13 0.802584559 3.76 0.446561953 0.358403728
North East 6.19 0.891104915 3.77 0.465484069 0.414795142
East Midlands 4.89 -1.026836127 3.34 -0.348166946 0.357510399
West Midlands 5.63 0.064914928 3.47 -0.10217943 -0.00663297
East Anglia 4.52 -1.572711654 2.92 -1.142895845 1.797445615
South East 5.89 0.448503136 3.2 -0.613076579 -0.274966768
South West 4.79 -1.174370053 2.71
-
1.540260295 1.808835564
Wales 5.27
-
0.466207207 3.53 0.01135327
-
0.005292976
Scotland 6.08 0.728817596 4.51 1.865720701 1.359770076
Total= 7.058585881
Mean= 5.586 Mean= 3.524
St Dev= 0.6778102 St Dev= 0.528482103
Correl= 0.78428732 Correl= 0.78428732
Example: Pearson Product Moment Correlation Coefficient
1. Work out the mean
2. Work out the standard deviation
3. Calculate Z scores for X and Y
4. Multiply Z scores together
5. Total Z scores
6. Divide by n-1
7. Verify
r = 0.784
Pretty strong positive relationship between smoking and
alcohol consumption
=AVERAGE(B2:B11)
=STDEV(B2:B11)
=CORREL(B2:B11,D2:D11)
=(B2-$B$14)/$B$15
=C2*E2
=SUM(F2:F11)
=F13/9
11
Example: Pearson Product Moment Correlation Coefficient
Source: http://www.uwsp.edu/psych/stat/7/correlat.htm 12
Example: Correlation Coefficient
Source: http://davidmlane.com/hyperstat/A63407.html
positive relationship
r = 0.63
13
x x
yy y
x
Positive
r = 0.6
Strong positive
r = 0.9
Perfect positive
r = 1
Examples of Positive Correlation: r>1
14
x x
yy y
x
Negative
r = - 0.4
Strong negative
r = - 0.8
Perfect negative
r = - 1
Examples of Negative Correlation: r<1
15
Correlation Pitfalls
• Nonlinearity (as we’ve seen)
• Truncated Range
– i.e. wrongly categorised dataset
• Outliers
• Causation
– Correlation shows that two random variables are
related but doesn’t tell us anything about whether one
causes the other
1. May confuse whether Aà B or B à A
2. There might be a third variable, C that causes both
3. It may just be a coincidence
16
Other Correlation Pitfalls: Confusion between A and B
17
Other Correlation Pitfalls: Variable C
Ice Cream sales don’t
cause Sun Tan sales and
vice versa: both are being
caused by hot weather
19
Other Correlation Pitfalls: Coincidence
Highway fatalities and lemon imports
20Johnson, S.R., 2008 “The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”,
Journal of Chemical Information and Modeling, 48(1):25-26
Chocolate consumption and Nobel laureates
21Messerli, F.H., 2012, “Chocolate Consumption, Cognitive Function, and Nobel Laureates”, The New England
Journal of Medicine, 367:1562-1564
22Source: http://www.tylervigen.com/spurious-correlations
23Source: http://www.tylervigen.com/spurious-correlations
Summary
• You will almost always use cardinal data, and therefore you
should associate correlation with the Pearson Product
Moment Correlation Coefficient
• However retain the technique and intuition behind the
Spearman method in case you use ordered/ranked data
24
Solutions
25
Example: Spearman Coefficient
rs = 1 - (6 x 64) / (10 x (100-1))
rs = 0.61212
Type of drink Last month This month d d2
Coffee 1 3 2 4
Tea 2 4 2 4
Orange juice 3 1 -2 4
Lemon juice 4 2 -2 4
Whisky 5 6 1 1
Red Wine 6 10 4 16
White Wine 7 9 2 4
Brandy 8 7 -1 1
Chocolate 9 8 -1 1
Cider 10 5 -5 25
64
26
A Caveat
𝑟 𝑠 = 1 −
6Σ𝑑)
𝑛(𝑛) − 1)
Example: Correlation Coefficient
Source: http://davidmlane.com/hyperstat/A63407.html
positive relationship
r = 0.63
27
• This presentation forms part of a free, online course
on analytics
• http://econ.anthonyjevans.com/courses/analytics/
28

More Related Content

What's hot

Pearson product moment correlation
Pearson product moment correlationPearson product moment correlation
Pearson product moment correlationSharlaine Ruth
 
Correlation and regression analysis
Correlation and regression analysisCorrelation and regression analysis
Correlation and regression analysis_pem
 
Correlation and Regression ppt
Correlation and Regression pptCorrelation and Regression ppt
Correlation and Regression pptSantosh Bhaskar
 
Correlation & Linear Regression
Correlation & Linear RegressionCorrelation & Linear Regression
Correlation & Linear RegressionAzmi Mohd Tamil
 
Karl pearson's correlation
Karl pearson's correlationKarl pearson's correlation
Karl pearson's correlationfairoos1
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAjendra7846
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAbdelaziz Tayoun
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis pptAnil Mishra
 
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...RekhaChoudhary24
 
Correlation by ramesh kumar
Correlation by ramesh kumarCorrelation by ramesh kumar
Correlation by ramesh kumarKVS
 

What's hot (20)

Correlation
CorrelationCorrelation
Correlation
 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
 
Pearson product moment correlation
Pearson product moment correlationPearson product moment correlation
Pearson product moment correlation
 
Correlation and regression analysis
Correlation and regression analysisCorrelation and regression analysis
Correlation and regression analysis
 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
 
Correlation and Regression ppt
Correlation and Regression pptCorrelation and Regression ppt
Correlation and Regression ppt
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation
CorrelationCorrelation
Correlation
 
Correlation mp
Correlation mpCorrelation mp
Correlation mp
 
Correlation & Linear Regression
Correlation & Linear RegressionCorrelation & Linear Regression
Correlation & Linear Regression
 
Correlation
CorrelationCorrelation
Correlation
 
Karl pearson's correlation
Karl pearson's correlationKarl pearson's correlation
Karl pearson's correlation
 
Regression
RegressionRegression
Regression
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Correlation analysis ppt
Correlation analysis pptCorrelation analysis ppt
Correlation analysis ppt
 
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...
Simple Correlation : Karl Pearson’s Correlation co- efficient and Spearman’s ...
 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
 
Correlation by ramesh kumar
Correlation by ramesh kumarCorrelation by ramesh kumar
Correlation by ramesh kumar
 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
 

Similar to Correlation

Chapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docxChapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docxspoonerneddy
 
Biostatistics lecture notes 7.ppt
Biostatistics lecture notes 7.pptBiostatistics lecture notes 7.ppt
Biostatistics lecture notes 7.pptletayh2016
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencycheweb1
 
STATISTICAL MEASURES.ppt
STATISTICAL MEASURES.pptSTATISTICAL MEASURES.ppt
STATISTICAL MEASURES.pptsadiakhan783184
 
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingOffice for National Statistics
 
Ct lecture 17. introduction to logistic regression
Ct lecture 17. introduction to logistic regressionCt lecture 17. introduction to logistic regression
Ct lecture 17. introduction to logistic regressionHau Pham
 
Lesson03_new
Lesson03_newLesson03_new
Lesson03_newshengvn
 
Lesson03_static11
Lesson03_static11Lesson03_static11
Lesson03_static11thangv
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptxCallplanetsDeveloper
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptxCallplanetsDeveloper
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor AnalysisShailendra Tomar
 
Basic data analyses skills for science research
Basic data analyses skills for science researchBasic data analyses skills for science research
Basic data analyses skills for science researchLaw Law
 
Research Methodology-3
Research Methodology-3Research Methodology-3
Research Methodology-3anupta jana
 
Optimising Risk-Based Screening: The Case of Diabetic Eye Disease
Optimising Risk-Based Screening: The Case of Diabetic Eye DiseaseOptimising Risk-Based Screening: The Case of Diabetic Eye Disease
Optimising Risk-Based Screening: The Case of Diabetic Eye DiseaseOffice of Health Economics
 

Similar to Correlation (20)

Notes Chapter 4.pptx
Notes Chapter 4.pptxNotes Chapter 4.pptx
Notes Chapter 4.pptx
 
Chapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docxChapter 9Multivariable MethodsObjectives• .docx
Chapter 9Multivariable MethodsObjectives• .docx
 
Biostatistics lecture notes 7.ppt
Biostatistics lecture notes 7.pptBiostatistics lecture notes 7.ppt
Biostatistics lecture notes 7.ppt
 
Network meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistencyNetwork meta-analysis & models for inconsistency
Network meta-analysis & models for inconsistency
 
Correlation.pptx
Correlation.pptxCorrelation.pptx
Correlation.pptx
 
Stats Project PowerPoint
Stats Project PowerPointStats Project PowerPoint
Stats Project PowerPoint
 
STATISTICAL MEASURES.ppt
STATISTICAL MEASURES.pptSTATISTICAL MEASURES.ppt
STATISTICAL MEASURES.ppt
 
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher TrainingONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
ONS Guide to Social and Economic Research – Welsh Baccalaureate Teacher Training
 
Cost of living
Cost of livingCost of living
Cost of living
 
Ct lecture 17. introduction to logistic regression
Ct lecture 17. introduction to logistic regressionCt lecture 17. introduction to logistic regression
Ct lecture 17. introduction to logistic regression
 
Lesson03_new
Lesson03_newLesson03_new
Lesson03_new
 
Lesson03_static11
Lesson03_static11Lesson03_static11
Lesson03_static11
 
Correlation.pdf
Correlation.pdfCorrelation.pdf
Correlation.pdf
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
 
Topic 5 Covariance & Correlation.pptx
Topic 5  Covariance & Correlation.pptxTopic 5  Covariance & Correlation.pptx
Topic 5 Covariance & Correlation.pptx
 
Exploratory Factor Analysis
Exploratory Factor AnalysisExploratory Factor Analysis
Exploratory Factor Analysis
 
Basic data analyses skills for science research
Basic data analyses skills for science researchBasic data analyses skills for science research
Basic data analyses skills for science research
 
Measures of Dispersion
Measures of DispersionMeasures of Dispersion
Measures of Dispersion
 
Research Methodology-3
Research Methodology-3Research Methodology-3
Research Methodology-3
 
Optimising Risk-Based Screening: The Case of Diabetic Eye Disease
Optimising Risk-Based Screening: The Case of Diabetic Eye DiseaseOptimising Risk-Based Screening: The Case of Diabetic Eye Disease
Optimising Risk-Based Screening: The Case of Diabetic Eye Disease
 

More from Anthony J. Evans (16)

Time Series
Time SeriesTime Series
Time Series
 
The Suitcase Case
The Suitcase CaseThe Suitcase Case
The Suitcase Case
 
Nonparametric Statistics
Nonparametric StatisticsNonparametric Statistics
Nonparametric Statistics
 
Student's T Test
Student's T TestStudent's T Test
Student's T Test
 
Significance Tests
Significance TestsSignificance Tests
Significance Tests
 
Taxi for Professor Evans
Taxi for Professor EvansTaxi for Professor Evans
Taxi for Professor Evans
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Probability Distributions
Probability Distributions Probability Distributions
Probability Distributions
 
Probability Theory
Probability Theory Probability Theory
Probability Theory
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Statistical Literacy
Statistical Literacy Statistical Literacy
Statistical Literacy
 
Quantitative Methods
Quantitative Methods Quantitative Methods
Quantitative Methods
 
Collecting and Presenting Data
Collecting and Presenting DataCollecting and Presenting Data
Collecting and Presenting Data
 
Numeracy Skills 1
Numeracy Skills 1Numeracy Skills 1
Numeracy Skills 1
 
The Dynamic AD AS Model
The Dynamic AD AS ModelThe Dynamic AD AS Model
The Dynamic AD AS Model
 
Numeracy Skills 2
Numeracy Skills 2Numeracy Skills 2
Numeracy Skills 2
 

Recently uploaded

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 

Recently uploaded (20)

Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 

Correlation

  • 1. Correlation Anthony J. Evans Professor of Economics, ESCP Europe www.anthonyjevans.com (cc) Anthony J. Evans 2019 | http://creativecommons.org/licenses/by-nc-sa/3.0/
  • 2. Introduction to Correlation • Correlation is a measure of the “co”-relation between two or more variables • The main result of a correlation is called the correlation coefficient (or "r"). – It ranges from -1.0 to +1.0. – The closer r is to |1|, the more closely the two variables are related • Also known as the Pearson Product Moment correlation Coefficient 2
  • 3. Defining correlation • The correlation measures the direction and strength of the linear relationship between variables x and y – Population = “ρ” (rho) – Sample = "r" x i x s xx z - =Where 3€ r = ZxZy∑ n −1
  • 4. Correlation • Notice how correlation is based on standardisation – i.e. the z- scores for each value • There is no distinction between explanatory and dependent variables – It doesn’t matter what you label as x or y • Both variables need to be quantitative, not categorical • r can be strongly influenced by outliers • -1 <r <1 • If r = -1 there is a perfectly negative correlation • If r = 0 there is no correlation • If r = +1 there is a perfectly positive correlation 4
  • 5. Description • Measures correlation of values by their position in a ordered list • A ranking • Ex.: 1,2,3,4,5,.. Spearman FormulaCoefficient If our data is in an ordered list, we have to use the Spearman coefficient, which is a type of Pearson correlation Correlation Coefficients: An Alternative A Caveat Pearson • Measures the correlation between series of cardinal data • Actual values • Ex.: 80, 90, 75, 15,… 5 € r = ZxZy∑ n −1 𝑟 𝑠 = 1 − 6Σ𝑑) 𝑛(𝑛) − 1)
  • 6. Example: Spearman Coefficient Type of drink Last month This month Coffee 1 3 Tea 2 4 Orange juice 3 1 Lemon juice 4 2 Whisky 5 6 Red Wine 6 10 White Wine 7 9 Brandy 8 7 Chocolate 9 8 Cider 10 5 Rank of preferences of drinks in a Market Research 6 A Caveat
  • 7. Example: Spearman Coefficient Rs = 1 - (6 x 64) / (10 x (100-1)) Rs = 0.61212 Type of drink Last month This month d d2 Coffee 1 3 2 4 Tea 2 4 2 4 Orange juice 3 1 -2 4 Lemon juice 4 2 -2 4 Whisky 5 6 1 1 Red Wine 6 10 4 16 White Wine 7 9 2 4 Brandy 8 7 -1 1 Chocolate 9 8 -1 1 Cider 10 5 -5 25 64 7 A Caveat 𝑟 𝑠 = 1 − 6Σ𝑑) 𝑛(𝑛) − 1)
  • 8. Example: Pearson Product Moment Correlation Coefficient Region Alcohol Tobacco North 6.47 4.03 Yorkshire 6.13 3.76 North East 6.19 3.77 East Midlands 4.89 3.34 West Midlands 5.63 3.47 East Anglia 4.52 2.92 South East 5.89 3.2 South West 4.79 2.71 Wales 5.27 3.53 Scotland 6.08 4.51 Source: Moore & McCabe p.133 N = 10
  • 9. Example: Pearson Product Moment Correlation Coefficient 9
  • 10. 1 2 3 4 5 67 Region Alcohol (X) Alcohol Z Tobacco (Y) Tobacco Z ZxZy North 6.47 1.304199908 4.03 0.957459102 1.248718073 Yorkshire 6.13 0.802584559 3.76 0.446561953 0.358403728 North East 6.19 0.891104915 3.77 0.465484069 0.414795142 East Midlands 4.89 -1.026836127 3.34 -0.348166946 0.357510399 West Midlands 5.63 0.064914928 3.47 -0.10217943 -0.00663297 East Anglia 4.52 -1.572711654 2.92 -1.142895845 1.797445615 South East 5.89 0.448503136 3.2 -0.613076579 -0.274966768 South West 4.79 -1.174370053 2.71 - 1.540260295 1.808835564 Wales 5.27 - 0.466207207 3.53 0.01135327 - 0.005292976 Scotland 6.08 0.728817596 4.51 1.865720701 1.359770076 Total= 7.058585881 Mean= 5.586 Mean= 3.524 St Dev= 0.6778102 St Dev= 0.528482103 Correl= 0.78428732 Correl= 0.78428732
  • 11. Example: Pearson Product Moment Correlation Coefficient 1. Work out the mean 2. Work out the standard deviation 3. Calculate Z scores for X and Y 4. Multiply Z scores together 5. Total Z scores 6. Divide by n-1 7. Verify r = 0.784 Pretty strong positive relationship between smoking and alcohol consumption =AVERAGE(B2:B11) =STDEV(B2:B11) =CORREL(B2:B11,D2:D11) =(B2-$B$14)/$B$15 =C2*E2 =SUM(F2:F11) =F13/9 11
  • 12. Example: Pearson Product Moment Correlation Coefficient Source: http://www.uwsp.edu/psych/stat/7/correlat.htm 12
  • 13. Example: Correlation Coefficient Source: http://davidmlane.com/hyperstat/A63407.html positive relationship r = 0.63 13
  • 14. x x yy y x Positive r = 0.6 Strong positive r = 0.9 Perfect positive r = 1 Examples of Positive Correlation: r>1 14
  • 15. x x yy y x Negative r = - 0.4 Strong negative r = - 0.8 Perfect negative r = - 1 Examples of Negative Correlation: r<1 15
  • 16. Correlation Pitfalls • Nonlinearity (as we’ve seen) • Truncated Range – i.e. wrongly categorised dataset • Outliers • Causation – Correlation shows that two random variables are related but doesn’t tell us anything about whether one causes the other 1. May confuse whether Aà B or B à A 2. There might be a third variable, C that causes both 3. It may just be a coincidence 16
  • 17. Other Correlation Pitfalls: Confusion between A and B 17
  • 18. Other Correlation Pitfalls: Variable C Ice Cream sales don’t cause Sun Tan sales and vice versa: both are being caused by hot weather
  • 20. Highway fatalities and lemon imports 20Johnson, S.R., 2008 “The Trouble with QSAR (or How I Learned To Stop Worrying and Embrace Fallacy)”, Journal of Chemical Information and Modeling, 48(1):25-26
  • 21. Chocolate consumption and Nobel laureates 21Messerli, F.H., 2012, “Chocolate Consumption, Cognitive Function, and Nobel Laureates”, The New England Journal of Medicine, 367:1562-1564
  • 24. Summary • You will almost always use cardinal data, and therefore you should associate correlation with the Pearson Product Moment Correlation Coefficient • However retain the technique and intuition behind the Spearman method in case you use ordered/ranked data 24
  • 26. Example: Spearman Coefficient rs = 1 - (6 x 64) / (10 x (100-1)) rs = 0.61212 Type of drink Last month This month d d2 Coffee 1 3 2 4 Tea 2 4 2 4 Orange juice 3 1 -2 4 Lemon juice 4 2 -2 4 Whisky 5 6 1 1 Red Wine 6 10 4 16 White Wine 7 9 2 4 Brandy 8 7 -1 1 Chocolate 9 8 -1 1 Cider 10 5 -5 25 64 26 A Caveat 𝑟 𝑠 = 1 − 6Σ𝑑) 𝑛(𝑛) − 1)
  • 27. Example: Correlation Coefficient Source: http://davidmlane.com/hyperstat/A63407.html positive relationship r = 0.63 27
  • 28. • This presentation forms part of a free, online course on analytics • http://econ.anthonyjevans.com/courses/analytics/ 28