SlideShare a Scribd company logo
1 of 31
Download to read offline
Implementation of Correlation
in R
Asst. Prof. Ashwini Mathur
For Example
This is the idea that two things that we measure can be somehow related to one
another.
For example, your personal happiness, which we could try to measure say with a
questionnaire, might be related to other things in your life that we could also
measure, such as number of close friends, yearly salary, how much chocolate you
have in your bedroom, or how many times you have said the word Nintendo in your
life.
Some of the relationships that we can measure are meaningful, and might reflect a
causal relationship where one thing causes a change in another thing. Some of the
relationships are spurious, and do not reflect a causal relationship.
Goals
1. Compute Correlation between two variables using software.
2. Discuss the possible meaning of correlations that you observe.
Note: use data from the World Happiness Report. A .csv of the data can be found here: WHR2018.csv
R
In this session, we use explore to explore correlations between
any two variables, and also show how to do a regression line.
There will be three main parts.
Getting R to compute the correlation, and looking at the data using
scatter plots. We’ll look at some correlations from the World
Happiness Report.
Then correlations using data we collect from ourselves
For Correlation (Command : cor)
R has the cor function for computing Pearson’s r between any two variables. In fact
this same function computes other versions of correlation, but we’ll skip those here.
To use the function you just need two variables with numbers in them like this:
x <- c(1,3,2,5,4,6,5,8,9)
y <- c(6,5,8,7,9,7,8,10,13)
cor(x,y)
## [1] 0.76539
Scatter Plots
Plot the data in a scatter plot using ggplot2, and let’s also return the
correlation and print it on the scatter plot. Remember, ggplot2 wants
the data in a data.frame, so we first put our x and y variables in a data
frame.
Program in R
library(ggplot2)
# create data frame for plotting
my_df <- data.frame(x,y)
# plot it
ggplot(my_df, aes(x=x,y=y))+
geom_point()+
geom_text(aes(label = round(cor(x,y), digits=2), y=12, x=2 ))
Lots of Scatterplots
Before we move on to real data, let’s look at some fake data first. Often we
will have many measures of X and Y, split between a few different
conditions, for example, A, B, C, and D. Let’s make some fake data for X
and Y, for each condition A, B, C, and D, and then use facet_wrapping to
look at four scatter plots all at once
x<-rnorm(40,0,1)
y<-rnorm(40,0,1)
conditions<-rep(c("A","B","C","D"), each=10)
all_df <- data.frame(conditions, x, y)
ggplot(all_df, aes(x=x,y=y))+
geom_point()+
facet_wrap(~conditions)
Let's Start with the Real Data-set
Overview
Let’s take a look at some correlations in real data. We are going to look at
responses to a questionnaire about happiness that was sent around the world,
from the world happiness report
Load the data
library(data.table)
whr_data <- fread('data/WHR2018.csv')
Look at the data
library(summarytools)
view(dfSummary(whr_data))
You should be able to see that there is data for many different countries, across a few different
years. There are lots of different kinds of measures, and each are given a name.
Problem to Solve with the help of
Data-Science
Lets formulate the Problem First
My Question
For the year 2017 only, does a countries measure for
“freedom to make life choices” correlate with that
countries measure for " Confidence in national
government"?
Let’s find out. We calculate the correlation, and then we make the scatter plot.
cor(whr_data$`Freedom to make life choices`,
whr_data$`Confidence in national government`)
## [1] NA
Lets try with scatter plot
ggplot(whr_data, aes(x=`Freedom to make life choices`,
y=`Confidence in national government`))+
geom_point()+
theme_classic()
Interesting, what happened here? We can see some dots, but the
correlation was NA (meaning undefined).
This occurred because there are some missing data points in the data.
We can remove all the rows with missing data first, then do the correlation.
We will do this a couple steps, first creating our own data.frame with only
the numbers we want to analyse. We can select the columns we want to
keep using select. Then we use filter to remove the rows with NAs.
library(dplyr)
smaller_df <- whr_data %>%
select(country,
`Freedom to make life choices`,
`Confidence in national government`) %>%
filter(!is.na(`Freedom to make life choices`),
!is.na(`Confidence in national government`))
cor(smaller_df$`Freedom to make life choices`,
smaller_df$`Confidence in national government`
Answer
## [1] 0.4080963
Now we see the correlation is .408.
Although the scatter plot shows the dots are everywhere, it generally
shows that as Freedom to make life choices increases in a country,
that countries confidence in their national government also increase.
This is a positive correlation.
Let’s do this again and add the best fit line, so the trend is more clear,
we use geom_smooth(method=lm, se=FALSE). I also change the alpha
value of the dots so they blend it bit, and you can see more of them
# Plot the data with best fit line.
ggplot(smaller_df, aes(x=`Freedom to make life choices`,
y=`Confidence in national
government`))+
geom_point(alpha=.5)+
geom_smooth(method=lm, se=FALSE)+
theme_classic()
Example 2.
what is the relationship between positive affect in a country and
negative affect in a country. I wouldn’t be surprised if there was a
negative correlation here: when positive feelings generally go up,
shouldn’t negative feelings generally go down?
select DVs and filter for NAs
smaller_df <- whr_data %>%
select(country,
`Positive affect`,
`Negative affect`) %>%
filter(!is.na(`Positive affect`),
!is.na(`Negative affect`))
# calcualte correlation
cor(smaller_df$`Positive affect`,
smaller_df$`Negative affect`)
Answer
## [1] -0.3841123
# plot the data with best fit line
ggplot(smaller_df, aes(x=`Positive affect`,
y=`Negative affect`))+
geom_point(alpha=.5)+
geom_smooth(method=lm, se=FALSE)+
theme_classic()
Correlation and linear regression

More Related Content

Similar to Correlation and linear regression

Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...Jonathan Zimmermann
 
Basics of Measurements.pdf
Basics of Measurements.pdfBasics of Measurements.pdf
Basics of Measurements.pdfMOAZZAMALISATTI
 
Basics of Measurements.pdf
Basics of Measurements.pdfBasics of Measurements.pdf
Basics of Measurements.pdfMOAZZAMALISATTI
 
Basic statistics by_david_solomon_hadi_-_split_and_reviewed
Basic statistics by_david_solomon_hadi_-_split_and_reviewedBasic statistics by_david_solomon_hadi_-_split_and_reviewed
Basic statistics by_david_solomon_hadi_-_split_and_reviewedbob panic
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2marshalkalra
 
Frequency Tables - Statistics
Frequency Tables - StatisticsFrequency Tables - Statistics
Frequency Tables - Statisticsmscartersmaths
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Vivian S. Zhang
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxoreo10
 
Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Simplilearn
 
Bridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationBridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationNacho Caballero
 
Scatter plot- Complete
Scatter plot- CompleteScatter plot- Complete
Scatter plot- CompleteIrfan Yaqoob
 
Principal components
Principal componentsPrincipal components
Principal componentsHutami Endang
 
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project reportGaurav Sawant
 
The nature of the data
The nature of the dataThe nature of the data
The nature of the dataKen Plummer
 

Similar to Correlation and linear regression (20)

Explore ml day 2
Explore ml day 2Explore ml day 2
Explore ml day 2
 
Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...
Problem set 3 - Statistics and Econometrics - Msc Business Analytics - Imperi...
 
Basics of Measurements.pdf
Basics of Measurements.pdfBasics of Measurements.pdf
Basics of Measurements.pdf
 
Basics of Measurements.pdf
Basics of Measurements.pdfBasics of Measurements.pdf
Basics of Measurements.pdf
 
Basic statistics by_david_solomon_hadi_-_split_and_reviewed
Basic statistics by_david_solomon_hadi_-_split_and_reviewedBasic statistics by_david_solomon_hadi_-_split_and_reviewed
Basic statistics by_david_solomon_hadi_-_split_and_reviewed
 
Statistics
StatisticsStatistics
Statistics
 
B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2B409 W11 Sas Collaborative Stats Guide V4.2
B409 W11 Sas Collaborative Stats Guide V4.2
 
Rclass
RclassRclass
Rclass
 
Frequency Tables - Statistics
Frequency Tables - StatisticsFrequency Tables - Statistics
Frequency Tables - Statistics
 
Correlation
CorrelationCorrelation
Correlation
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSS
 
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...Data Science Academy Student Demo day--Michael blecher,the importance of clea...
Data Science Academy Student Demo day--Michael blecher,the importance of clea...
 
The future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docxThe future is uncertain. Some events do have a very small probabil.docx
The future is uncertain. Some events do have a very small probabil.docx
 
Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...Data Science Interview Questions | Data Science Interview Questions And Answe...
Data Science Interview Questions | Data Science Interview Questions And Answe...
 
Bridging data analysis and interactive visualization
Bridging data analysis and interactive visualizationBridging data analysis and interactive visualization
Bridging data analysis and interactive visualization
 
Scatter plot- Complete
Scatter plot- CompleteScatter plot- Complete
Scatter plot- Complete
 
Principal components
Principal componentsPrincipal components
Principal components
 
Statistics for ess
Statistics for essStatistics for ess
Statistics for ess
 
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report
(Gaurav sawant &amp; dhaval sawlani)bia 678 final project report
 
The nature of the data
The nature of the dataThe nature of the data
The nature of the data
 

More from Ashwini Mathur

S3 classes and s4 classes
S3 classes and s4 classesS3 classes and s4 classes
S3 classes and s4 classesAshwini Mathur
 
Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Ashwini Mathur
 
R programming lab 3 - jupyter notebook
R programming lab   3 - jupyter notebookR programming lab   3 - jupyter notebook
R programming lab 3 - jupyter notebookAshwini Mathur
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebookAshwini Mathur
 
R programming lab 1 - jupyter notebook
R programming lab   1 - jupyter notebookR programming lab   1 - jupyter notebook
R programming lab 1 - jupyter notebookAshwini Mathur
 
R for statistics session 1
R for statistics session 1R for statistics session 1
R for statistics session 1Ashwini Mathur
 
Object oriented programming in r
Object oriented programming in r Object oriented programming in r
Object oriented programming in r Ashwini Mathur
 
Linear programming optimization in r
Linear programming optimization in r Linear programming optimization in r
Linear programming optimization in r Ashwini Mathur
 
Introduction to python along with the comparitive analysis with r
Introduction to python   along with the comparitive analysis with r Introduction to python   along with the comparitive analysis with r
Introduction to python along with the comparitive analysis with r Ashwini Mathur
 
Example 2 summerization notes for descriptive statistics using r
Example   2    summerization notes for descriptive statistics using r Example   2    summerization notes for descriptive statistics using r
Example 2 summerization notes for descriptive statistics using r Ashwini Mathur
 
Descriptive statistics assignment
Descriptive statistics assignment Descriptive statistics assignment
Descriptive statistics assignment Ashwini Mathur
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming languageAshwini Mathur
 
Data analysis for covid 19
Data analysis for covid 19Data analysis for covid 19
Data analysis for covid 19Ashwini Mathur
 
Calling c functions from r programming unit 5
Calling c functions from r programming    unit 5Calling c functions from r programming    unit 5
Calling c functions from r programming unit 5Ashwini Mathur
 
Anova (analysis of variance) test
Anova (analysis of variance) test Anova (analysis of variance) test
Anova (analysis of variance) test Ashwini Mathur
 

More from Ashwini Mathur (19)

S3 classes and s4 classes
S3 classes and s4 classesS3 classes and s4 classes
S3 classes and s4 classes
 
Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r Summerization notes for descriptive statistics using r
Summerization notes for descriptive statistics using r
 
Rpy2 demonstration
Rpy2 demonstrationRpy2 demonstration
Rpy2 demonstration
 
Rpy package
Rpy packageRpy package
Rpy package
 
R programming lab 3 - jupyter notebook
R programming lab   3 - jupyter notebookR programming lab   3 - jupyter notebook
R programming lab 3 - jupyter notebook
 
R programming lab 2 - jupyter notebook
R programming lab   2 - jupyter notebookR programming lab   2 - jupyter notebook
R programming lab 2 - jupyter notebook
 
R programming lab 1 - jupyter notebook
R programming lab   1 - jupyter notebookR programming lab   1 - jupyter notebook
R programming lab 1 - jupyter notebook
 
R for statistics session 1
R for statistics session 1R for statistics session 1
R for statistics session 1
 
R for statistics 2
R for statistics 2R for statistics 2
R for statistics 2
 
Play with matrix in r
Play with matrix in rPlay with matrix in r
Play with matrix in r
 
Object oriented programming in r
Object oriented programming in r Object oriented programming in r
Object oriented programming in r
 
Linear programming optimization in r
Linear programming optimization in r Linear programming optimization in r
Linear programming optimization in r
 
Introduction to python along with the comparitive analysis with r
Introduction to python   along with the comparitive analysis with r Introduction to python   along with the comparitive analysis with r
Introduction to python along with the comparitive analysis with r
 
Example 2 summerization notes for descriptive statistics using r
Example   2    summerization notes for descriptive statistics using r Example   2    summerization notes for descriptive statistics using r
Example 2 summerization notes for descriptive statistics using r
 
Descriptive statistics assignment
Descriptive statistics assignment Descriptive statistics assignment
Descriptive statistics assignment
 
Descriptive analytics in r programming language
Descriptive analytics in r programming languageDescriptive analytics in r programming language
Descriptive analytics in r programming language
 
Data analysis for covid 19
Data analysis for covid 19Data analysis for covid 19
Data analysis for covid 19
 
Calling c functions from r programming unit 5
Calling c functions from r programming    unit 5Calling c functions from r programming    unit 5
Calling c functions from r programming unit 5
 
Anova (analysis of variance) test
Anova (analysis of variance) test Anova (analysis of variance) test
Anova (analysis of variance) test
 

Recently uploaded

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknowmakika9823
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Recently uploaded (20)

dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service LucknowAminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
Aminabad Call Girl Agent 9548273370 , Call Girls Service Lucknow
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

Correlation and linear regression

  • 1. Implementation of Correlation in R Asst. Prof. Ashwini Mathur
  • 2. For Example This is the idea that two things that we measure can be somehow related to one another. For example, your personal happiness, which we could try to measure say with a questionnaire, might be related to other things in your life that we could also measure, such as number of close friends, yearly salary, how much chocolate you have in your bedroom, or how many times you have said the word Nintendo in your life. Some of the relationships that we can measure are meaningful, and might reflect a causal relationship where one thing causes a change in another thing. Some of the relationships are spurious, and do not reflect a causal relationship.
  • 3. Goals 1. Compute Correlation between two variables using software. 2. Discuss the possible meaning of correlations that you observe. Note: use data from the World Happiness Report. A .csv of the data can be found here: WHR2018.csv
  • 4. R In this session, we use explore to explore correlations between any two variables, and also show how to do a regression line. There will be three main parts. Getting R to compute the correlation, and looking at the data using scatter plots. We’ll look at some correlations from the World Happiness Report. Then correlations using data we collect from ourselves
  • 5. For Correlation (Command : cor) R has the cor function for computing Pearson’s r between any two variables. In fact this same function computes other versions of correlation, but we’ll skip those here. To use the function you just need two variables with numbers in them like this: x <- c(1,3,2,5,4,6,5,8,9) y <- c(6,5,8,7,9,7,8,10,13) cor(x,y) ## [1] 0.76539
  • 6. Scatter Plots Plot the data in a scatter plot using ggplot2, and let’s also return the correlation and print it on the scatter plot. Remember, ggplot2 wants the data in a data.frame, so we first put our x and y variables in a data frame.
  • 7. Program in R library(ggplot2) # create data frame for plotting my_df <- data.frame(x,y) # plot it ggplot(my_df, aes(x=x,y=y))+ geom_point()+ geom_text(aes(label = round(cor(x,y), digits=2), y=12, x=2 ))
  • 8.
  • 9. Lots of Scatterplots Before we move on to real data, let’s look at some fake data first. Often we will have many measures of X and Y, split between a few different conditions, for example, A, B, C, and D. Let’s make some fake data for X and Y, for each condition A, B, C, and D, and then use facet_wrapping to look at four scatter plots all at once
  • 10. x<-rnorm(40,0,1) y<-rnorm(40,0,1) conditions<-rep(c("A","B","C","D"), each=10) all_df <- data.frame(conditions, x, y) ggplot(all_df, aes(x=x,y=y))+ geom_point()+ facet_wrap(~conditions)
  • 11.
  • 12. Let's Start with the Real Data-set
  • 13. Overview Let’s take a look at some correlations in real data. We are going to look at responses to a questionnaire about happiness that was sent around the world, from the world happiness report
  • 14. Load the data library(data.table) whr_data <- fread('data/WHR2018.csv')
  • 15. Look at the data library(summarytools) view(dfSummary(whr_data)) You should be able to see that there is data for many different countries, across a few different years. There are lots of different kinds of measures, and each are given a name.
  • 16. Problem to Solve with the help of Data-Science Lets formulate the Problem First
  • 17. My Question For the year 2017 only, does a countries measure for “freedom to make life choices” correlate with that countries measure for " Confidence in national government"?
  • 18. Let’s find out. We calculate the correlation, and then we make the scatter plot. cor(whr_data$`Freedom to make life choices`, whr_data$`Confidence in national government`) ## [1] NA
  • 19. Lets try with scatter plot ggplot(whr_data, aes(x=`Freedom to make life choices`, y=`Confidence in national government`))+ geom_point()+ theme_classic()
  • 20.
  • 21. Interesting, what happened here? We can see some dots, but the correlation was NA (meaning undefined). This occurred because there are some missing data points in the data. We can remove all the rows with missing data first, then do the correlation. We will do this a couple steps, first creating our own data.frame with only the numbers we want to analyse. We can select the columns we want to keep using select. Then we use filter to remove the rows with NAs.
  • 22. library(dplyr) smaller_df <- whr_data %>% select(country, `Freedom to make life choices`, `Confidence in national government`) %>% filter(!is.na(`Freedom to make life choices`), !is.na(`Confidence in national government`)) cor(smaller_df$`Freedom to make life choices`, smaller_df$`Confidence in national government`
  • 23. Answer ## [1] 0.4080963 Now we see the correlation is .408.
  • 24. Although the scatter plot shows the dots are everywhere, it generally shows that as Freedom to make life choices increases in a country, that countries confidence in their national government also increase. This is a positive correlation. Let’s do this again and add the best fit line, so the trend is more clear, we use geom_smooth(method=lm, se=FALSE). I also change the alpha value of the dots so they blend it bit, and you can see more of them
  • 25. # Plot the data with best fit line. ggplot(smaller_df, aes(x=`Freedom to make life choices`, y=`Confidence in national government`))+ geom_point(alpha=.5)+ geom_smooth(method=lm, se=FALSE)+ theme_classic()
  • 26.
  • 27. Example 2. what is the relationship between positive affect in a country and negative affect in a country. I wouldn’t be surprised if there was a negative correlation here: when positive feelings generally go up, shouldn’t negative feelings generally go down?
  • 28. select DVs and filter for NAs smaller_df <- whr_data %>% select(country, `Positive affect`, `Negative affect`) %>% filter(!is.na(`Positive affect`), !is.na(`Negative affect`)) # calcualte correlation cor(smaller_df$`Positive affect`, smaller_df$`Negative affect`)
  • 30. # plot the data with best fit line ggplot(smaller_df, aes(x=`Positive affect`, y=`Negative affect`))+ geom_point(alpha=.5)+ geom_smooth(method=lm, se=FALSE)+ theme_classic()