05/15/15 Slide 1
• Using a combination of tables and plots from SPSS plus
spreadsheets from Excel, we will show the linkage between
correlation and linear regression.
• Correlation and regression provide us with different, but
complementary, information on the relationship between
two quantitative variables.
Slide 2
The goal of this analysis is to study
the relationship between family
size and number of credit cards.
Finding the relationship will help us
predict the number of credit cards
a family typically has relative to the
number of family members. If a
family had fewer than expected,
they would be a good candidate for
us to extend another credit card
offer.
CreditCardData.sav has five
variables for 8 cases.
The data for the 8 cases is
shown in the Data View to the
left. The names and labels for
each of the variables are shown
below in the Variable View.
Slide 3
Creating a histogram of the
dependent variable, ncards, shows a
distribution that is about as normal as
we could expect for only 8 cases. I
have superimposed the red normal
curve and blue mean line on the
histogram.
For any quantitative variable,
our best estimate of the values
for cases in the distribution is
the mean, because it minimizes
the errors or differences
between the estimated value
and the actual score
represented by each of the bars
in the histogram.
Slide 4
To demonstrate that the mean is the best
value to estimate, I created a worksheet in
Excel that compares the error associated
with three different estimates of values for
each case: the mean of 7, an estimate
lower than the mean: 6, and an estimate
higher than the mean: 8.
Error is calculated as the sum of the squared deviations
from the value used as the estimate. Columns C, F, and I
contain the deviations from each of the estimates (7, 6,
and 8). Columns D, G, and J contain the squared
deviations, with the summed total at the base of each
column.
Using the mean of 7 as the estimate, there are 22 units of
error. Using either 6 or 8 results in 30 units of error.
The measure of error is called the Total Sum of Squares.
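The worksheet comparison can be sketched in a few lines of Python. The eight ncards values below are an assumption: the Data View is not reproduced in this transcript, so these are illustrative values chosen to be consistent with the figures quoted on the slides (8 cases, a mean of 7, and 22 units of error around the mean).

```python
# Illustrative ncards values (assumed; the actual Data View is not in the transcript).
ncards = [4, 6, 6, 7, 8, 7, 8, 10]


def sum_squared_deviations(values, estimate):
    """Sum of squared deviations of each value from a single estimate."""
    return sum((v - estimate) ** 2 for v in values)


mean_ncards = sum(ncards) / len(ncards)

# Compare the mean against one estimate below it and one above it.
errors = {est: sum_squared_deviations(ncards, est) for est in (6, mean_ncards, 8)}
for est, err in errors.items():
    print(f"estimate {est}: {err} units of error")
```

Any estimate other than the mean adds n x (estimate - mean)² to the error, which is why 6 and 8 are equally worse than 7.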
Slide 5
The graph for the relationship
between two quantitative
variables is the scatterplot, with
the independent variable Family
Size on the horizontal x-axis, and
the dependent variable Number
of Credit Cards on the vertical
y-axis.
I have superimposed the
blue dotted mean line for
Number of Credit Cards on
the scatterplot. We see
that the scores for two
cases actually fall on the
mean line, while the other
six are at varying distances
from the mean line.
Each dot represents the
combination of scores
for one case. For
example, this dot
represents a family of 5
that had 8 credit cards.
Slide 6
The purple lines are the deviations
– the differences between
individual scores and the mean of
the dependent variable.
If we square the deviations
and sum the squares, we have
the Total Sum of Squares.
The differences are often phrased as
distances, i.e. the vertical distance
between the mean line and the score for
this case on the dependent variable is 3.
Slide 7
I have added the green vertical
dotted line at the mean family
size, 4.25.
The regression line will pass
through the intersection of
the means of both variables,
and will minimize the total
sum of the squared differences
between the individual scores
and the regression line.
Slide 8
One way to think about linear
regression is that we are
rotating a line through the
intersection of the means of the
two variables.
Each time we rotate the
line, we would compute
the sum of squared
differences between the
scores and the line.
We stop when we have
found the line that has
the smallest sum of
squared differences.
There is a direct method for finding
the regression line that does not
require this trial and error strategy.
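The direct method is the closed-form least-squares solution: the slope is the sum of the cross-products of the deviations divided by the sum of squared deviations of the independent variable, and the intercept is set so that the line passes through both means. A minimal sketch follows. The data values are an assumption (the Data View is not included in this transcript); they are illustrative values consistent with the means and sums of squares quoted on the slides.

```python
# Illustrative data (assumed; chosen to match the figures quoted on the slides).
famsize = [2, 2, 4, 4, 5, 5, 6, 6]
ncards = [4, 6, 6, 7, 8, 7, 8, 10]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sum of squared x deviations.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # Forces the line through the intersection of the two means.
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = least_squares_line(famsize, ncards)
print(f"ncards = {intercept:.3f} + {slope:.3f} x famsize")
```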
Slide 9
If there is no relationship, the blue
regression line will be on top of (or
very close to) the dotted blue
mean line for the dependent
variable.
No relationship means that
we cannot reduce the
error or total sum of
squares of the dependent
variable by using the
relationship to the
independent variable.
Slide 10
The points along the regression line
represent the estimated values for all
possible values of the independent
variable.
For example, if we wanted to estimate
the number of cards for a family of 4, we
would draw a vertical line from the 4 on
the horizontal axis up to the regression
line, and from the regression line left to
the vertical axis. The location on the
vertical axis is the estimated number of
cards that a family of 4 would have, i.e.
about 6.8 cards.
Slide 11
The differences between the estimated
value and the actual value for the cases
are deviations that are called residuals
(the light blue lines). They represent
errors in predicting the values of the
dependent variable based on the value
of the independent variable.
We had two cases with a family size of 4.
Our estimated value was overstated for
one of the cases, and understated for
the other case.
Slide 12
The formula for the regression line can be
extracted from the SPSS output.
For this example, the regression equation
is:
ncards = 2.871 + .971 x famsize
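As a quick check, the equation can be written as a small function. This is only a sketch; the function name predict_ncards is hypothetical, and the coefficients are the rounded values quoted from the SPSS output.

```python
# Coefficients as quoted from the SPSS output (rounded to three decimals).
INTERCEPT = 2.871
SLOPE = 0.971


def predict_ncards(famsize):
    """Estimated number of credit cards for a family of the given size."""
    return INTERCEPT + SLOPE * famsize


estimate = predict_ncards(4)
print(f"estimated cards for a family of 4: {estimate:.3f}")
```

For a family of 4 this gives 2.871 + 3.884 = 6.755, which is the "about 6.8 cards" read off the regression line earlier.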
Slide 13
We can plug the regression
equation into Excel and estimate
the number of cards for each
case.
To compute the residuals,
we subtract the actual value
for ncards from the
estimated value for the case.
If we square the residuals, and
sum the squares, we have the
amount of error associated with
using the regression line to
estimate each case, 5.485758.
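The same spreadsheet steps can be sketched in Python. The data values are an assumption (the Data View is not reproduced in this transcript); they are illustrative values consistent with the figures on the slides. The coefficients are the rounded ones from the regression equation, and the residual is taken as estimated minus actual, as described above.

```python
# Illustrative data (assumed; the actual Data View is not in the transcript).
famsize = [2, 2, 4, 4, 5, 5, 6, 6]
ncards = [4, 6, 6, 7, 8, 7, 8, 10]

# Rounded coefficients from the regression equation.
INTERCEPT, SLOPE = 2.871, 0.971

# Estimated number of cards for each case.
estimated = [INTERCEPT + SLOPE * x for x in famsize]

# Residual for each case: estimated value minus actual value.
residuals = [est - actual for est, actual in zip(estimated, ncards)]

# Squaring removes the sign, so the direction of subtraction does not matter here.
sse = sum(r ** 2 for r in residuals)
print(f"sum of squared residuals: {sse:.6f}")
```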
Slide 14
If we plug the total sum of squares and the sum of
squared residuals into an Excel spreadsheet, we
can compute the reduction in the total sum of
squared errors associated with using the
information in the independent variable, as
represented by the regression equation.
If we compute the percentage of total error
reduced by the regression equation, we end
up with the value of R², the percentage of
variance explained by the regression
relationship.
Our calculation for R² agrees with
the value of R Square in the SPSS
output.
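The calculation uses only the two sums of squares already reported, so it can be reproduced directly. The variable names below are illustrative.

```python
total_ss = 22.0         # Total Sum of Squares around the mean of ncards
residual_ss = 5.485758  # sum of squared residuals around the regression line

# Error removed by using the regression line instead of the mean.
reduction = total_ss - residual_ss

# R squared: the reduction expressed as a proportion of the total error.
r_squared = reduction / total_ss
print(f"reduction in error: {reduction:.6f}")
print(f"R squared: {r_squared:.6f}")
```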
Slide 15
R² is often interpreted as the percentage of
variance explained.
We can convert our Sum of Squares column to
Variance by dividing by the number of cases in
the sample minus one (8 – 1).
If we compute the percentages using variances
instead of sums of squares, we end up with exactly
the same value for R², 0.750647.
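The two versions agree because dividing every sum of squares by the same constant (8 – 1) leaves the ratio unchanged. A short sketch using the figures already reported:

```python
n = 8  # number of cases

# Convert each sum of squares to a variance by dividing by n - 1.
total_variance = 22.0 / (n - 1)
residual_variance = 5.485758 / (n - 1)

# Proportion of variance explained, computed from variances.
r_squared_var = (total_variance - residual_variance) / total_variance

# The same proportion computed from the sums of squares.
r_squared_ss = (22.0 - 5.485758) / 22.0

print(f"from variances:      {r_squared_var:.6f}")
print(f"from sums of squares: {r_squared_ss:.6f}")
```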
R² is also interpreted as the proportional reduction in error (a
PRE statistic), which we can also phrase as an increase in
accuracy.
We should remember that no matter whether we interpret R² as
explaining variance or reducing error, the statistic applies to the
total error in the distribution, not to the error in individual cases.
Slide 16
We can also think of regression and correlation as
based on the pattern of deviations for the two variables
across the cases in the distribution.
To present this, we will first compute the standard
scores for each variable. As standard scores, the value
for each case is the deviation from the mean of 0,
which is the mean of the distribution of standard
scores.
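A standard score is the deviation from the mean divided by the standard deviation. The sketch below assumes the sample standard deviation (dividing by the number of cases minus one) and uses illustrative data, since the Data View is not reproduced in this transcript; the values are chosen to be consistent with the figures on the slides.

```python
import statistics

# Illustrative data (assumed; the actual Data View is not in the transcript).
famsize = [2, 2, 4, 4, 5, 5, 6, 6]
ncards = [4, 6, 6, 7, 8, 7, 8, 10]


def z_scores(values):
    """Standard scores using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


z_famsize = z_scores(famsize)
z_ncards = z_scores(ncards)

for zx, zy in zip(z_famsize, z_ncards):
    print(f"{zx:7.3f} {zy:7.3f}")
```

Each column of standard scores has a mean of 0 and a standard deviation of 1, so the deviations for the two variables are on the same scale.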
Slide 17
Plotting the z-scores for
both variables produces
the same pattern in the
scatterplot that we found
with the raw data.
As we would expect for standard
scores, the green dotted line for
the mean z-score for family size is
at zero, as is the dotted blue line
for the standard scores for number
of credit cards.
Slide 18
We add lines for the
deviation from the
means for both variables.
The green deviation lines
represent differences
from the mean z-score
for family size.
The blue deviation
lines represent
differences from the
mean z-score for
number of credit cards.
Slide 19
For some points, the
length of the green
deviation line is similar to
the length of the blue
deviation line.
The strength of the relationship will
depend on the agreement of the
deviations for each case, i.e. the extent
to which the green line deviation for a
case agrees with the blue line deviation.
For other points, the
length of the green
deviation line is shorter
than the length of the blue
deviation line.
Slide 20
Overall, the pattern of the deviations is similar. Green
deviations above the mean are paired with blue
deviations above the mean. Green deviations below the
mean are paired with blue deviations below the mean.
Though the length of the deviations for individual cases
varies, the overall pattern suggests a strong
relationship.
Slide 21
To compute the correlation
coefficient, we multiply the
z-scores, and sum across all
the cases.
To compute Pearson’s r, we divide
the sum of the z-score products
by the number of cases minus
one.
The value for Pearson’s r that
we computed agrees with the
value supplied by SPSS.
Finally, if we square the
value of Pearson’s r, we have
the same value as R Square
in the SPSS regression
output.
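The z-score calculation of Pearson's r can be sketched as follows. The data values are an assumption (the Data View is not reproduced in this transcript); they are illustrative values consistent with the figures on the slides, and the standard scores use the sample standard deviation.

```python
import statistics

# Illustrative data (assumed; the actual Data View is not in the transcript).
famsize = [2, 2, 4, 4, 5, 5, 6, 6]
ncards = [4, 6, 6, 7, 8, 7, 8, 10]


def z_scores(values):
    """Standard scores using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Multiply the paired z-scores and sum across all the cases.
products = [zx * zy for zx, zy in zip(z_scores(famsize), z_scores(ncards))]

# Divide the sum of products by the number of cases minus one.
r = sum(products) / (len(famsize) - 1)
r_squared = r ** 2

print(f"Pearson's r: {r:.3f}")
print(f"r squared:   {r_squared:.3f}")
```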
Slide 22
If we return to the regression results for
the raw data instead of the standard
scores, we can show the link between
Pearson’s r and the slope in the regression
equation.
Recall that the slope of the regression line
represents change in the dependent variable
associated with a one unit change in the
independent variable. Thus, when a family had one
more member, we would predict that they had .971
more credit cards.
Think of the standard deviation as
a measure of average
difference from the mean for all of
the cases for each of the variables.
The standard deviation for number
of cards is 1.773 and the standard
deviation for family size is 1.581.
Slide 23
If the relationship between the two variables were
perfect (one predicted the other without error), we
could compute the slope of the line using the average
amount of differences in each of the distributions –
the standard deviations.
On average, the number of cards would
go up 1.773 cards for a difference of
1.581 members in a family. We can
simplify this by dividing the standard
deviation for number of cards by the
standard deviation for family size:
1.773 ÷ 1.581 = 1.121
Thus, if the relationship were perfect, we
would increase our estimate of the
number of cards in a family by 1.121 for
every additional member of a family.
Slide 24
If the relationship between the two
variables were perfect, Pearson’s r would
be 1.0 (or -1.0 if the relationship were
inverse).
However, we know that Pearson’s r is less
than that; it is actually 0.866.
If the slope of the regression line would be
1.121 when the relationship is
perfect, then we might expect the
slope to be 0.866 x 1.121 when the
relationship is less than perfect.
And in fact, that turns out to be true,
since:
0.866 x 1.121 = 0.971
The slope of the regression line is the
ratio of the standard deviations
multiplied by the correlation
coefficient.
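The link between the slope, Pearson's r, and the two standard deviations can be checked numerically. This sketch uses only the rounded values reported on the slides; the variable names are illustrative.

```python
r = 0.866           # Pearson's r
sd_ncards = 1.773   # standard deviation of number of cards
sd_famsize = 1.581  # standard deviation of family size

# Slope we would expect if the relationship were perfect.
perfect_slope = sd_ncards / sd_famsize

# Actual slope: the perfect slope scaled down by the correlation.
slope = r * perfect_slope

print(f"ratio of standard deviations: {perfect_slope:.3f}")
print(f"slope of the regression line: {slope:.3f}")
```

The result matches the .971 coefficient for famsize in the regression equation.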

Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
Call Girls Navi Mumbai Just Call 9907093804 Top Class Call Girl Service Avail...
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature SetCreating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
Creating Low-Code Loan Applications using the Trisotech Mortgage Feature Set
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
Forklift Operations: Safety through Cartoons
Forklift Operations: Safety through CartoonsForklift Operations: Safety through Cartoons
Forklift Operations: Safety through Cartoons
 
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
 

Correlation and Regression Analysis of Family Size and Credit Cards

  • 1. 05/15/15 Slide 1 • Using a combination of tables and plots from SPSS plus spreadsheets from Excel, we will show the linkage between correlation and linear regression. • Correlation and regression provide us with different, but complementary, information on the relationship between two quantitative variables.
  • 2. Slide 2 The goal of this analysis is to study the relationship between family size and number of credit cards. Finding the relationship will help us predict the number of credit cards a family typically has relative to the number of family members. If a family had fewer than expected, they would be a good candidate for us to extend another credit card offer. CreditCardData.sav has five variables for 8 cases. The data for the 8 cases are shown in the Data View to the left. The names and labels for each of the variables are shown below in the Variable View.
  • 3. Slide 3 Creating a histogram of the dependent variable, ncards, shows a distribution that is about as normal as we could expect for only 8 cases. I have superimposed the red normal curve and blue mean line on the histogram. For any quantitative variable, our best estimate of the values for cases in the distribution is the mean, because it minimizes the errors or differences between the estimated value and the actual score represented by each of the bars in the histogram.
  • 4. Slide 4 To demonstrate that the mean is the best value to estimate, I created a worksheet in Excel that compares the error associated with three different estimates of values for each case: the mean of 7, an estimate lower than the mean (6), and an estimate higher than the mean (8). Error is calculated as the sum of the squared deviations from the value used as the estimate. Columns C, F, and I contain the deviations from each of the estimates (7, 6, and 8). Columns D, G, and J contain the squared deviations, with the summed total at the base of each column. Using the mean of 7 as the estimate, there are 22 units of error. Using either 6 or 8 results in 30 units of error. The measure of error around the mean is called the Total Sum of Squares.
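The worksheet totals follow from a standard identity. As a sketch, write $y_i$ for a case's number of cards, $\bar{y}$ for the mean, $c$ for any candidate estimate, and $n$ for the number of cases (this notation is assumed here, it does not appear on the slides):

```latex
\sum_{i=1}^{n} (y_i - c)^2 \;=\; \sum_{i=1}^{n} (y_i - \bar{y})^2 \;+\; n\,(\bar{y} - c)^2
```

With $n = 8$, $\bar{y} = 7$, and a first term of 22, an estimate of 6 or 8 gives $22 + 8 \cdot 1^2 = 30$, matching the worksheet. The second term is never negative and is zero only when $c = \bar{y}$, so no estimate can produce less than 22 units of error.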
  • 5. Slide 5 The graph for the relationship between two quantitative variables is the scatterplot, with the independent variable Family Size on the horizontal x-axis, and the dependent variable Number of Credit Cards on the vertical y-axis. I have superimposed the blue dotted mean line for Number of Credit Cards on the scatterplot. We see that the scores for two cases actually fall on the mean line, while the other six are at varying distances from the mean line. Each dot represents the combination of scores for one case. For example, this dot represents a family of 5 that had 8 credit cards.
  • 6. Slide 6 The purple lines are the deviations: the differences between individual scores and the mean of the dependent variable. If we square the deviations and sum the squares, we have the Total Sum of Squares. The differences are often phrased as distances, i.e. the vertical distance between the mean line and the score for this case on the dependent variable is 3.
  • 7. Slide 7 I have added the green vertical dotted line at the mean family size, 4.25. The regression line will pass through the intersection of the means of both variables, and will minimize the total sum of the squared differences between the individual scores and the regression line.
  • 8. Slide 8 One way to think about linear regression is that we are rotating a line through the intersection of the means of the two variables. Each time we rotate the line, we would compute the sum of squared deviations of the scores from the line. We stop when we have found the line that has the smallest sum of squares. There is a direct method for finding the regression line that does not require this trial-and-error strategy.
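The slides do not spell out the direct method. For reference, the standard least-squares formulas for the slope $b$ and intercept $a$ are sketched below, with $x_i$ for family size, $y_i$ for number of cards, and $\bar{x}$, $\bar{y}$ for their means (notation assumed, not taken from the slides):

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

The second formula is another way of saying that the fitted line passes through the intersection of the two means.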
  • 9. Slide 9 If there is no relationship, the blue regression line will be on top of (or very close to) the dotted blue mean line for the dependent variable. No relationship means that we cannot reduce the error or total sum of squares of the dependent variable by using the relationship to the independent variable.
  • 10. Slide 10 The points along the regression line represent the estimated values for all possible values of the independent variable. For example, if we wanted to estimate the number of cards for a family of 4, we would draw a vertical line from the 4 on the horizontal axis up to the regression line, and from the regression line left to the vertical axis. The location on the vertical axis is the estimated number of cards that a family of 4 would have, i.e. about 6.8 cards.
  • 11. Slide 11 The differences between the estimated value and the actual value for the cases are deviations that are called residuals (the light blue lines). They represent errors in predicting the values of the dependent variable based on the value of the independent variable. We had two cases with a family size of 4. Our estimated value was overstated for one of the cases, and understated for the other case.
  • 12. Slide 12 The formula for the regression line can be extracted from the SPSS output. For this example, the regression equation is: ncards = 2.871 + 0.971 × famsize
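As a minimal sketch, the regression equation can be written as a small Python function. The coefficients are the rounded values reported on the slide; the function name `predict_ncards` is an illustrative choice, not something from the SPSS output.

```python
# Coefficients as reported on the slide (rounded to three decimals).
INTERCEPT = 2.871
SLOPE = 0.971


def predict_ncards(famsize: float) -> float:
    """Estimated number of credit cards for a family of the given size."""
    return INTERCEPT + SLOPE * famsize


if __name__ == "__main__":
    # A family of 4: 2.871 + 0.971 * 4 = 6.755, i.e. about 6.8 cards.
    print(round(predict_ncards(4), 3))
```

This reproduces the estimate of about 6.8 cards that was read off the regression line for a family of 4.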
  • 13. Slide 13 We can plug the regression equation into Excel and estimate the number of cards for each case. To compute the residuals, we subtract the actual value for ncards from the estimated value for the case. If we square the residuals, and sum the squares, we have the amount of error associated with using the regression line to estimate each case, 5.485758.
  • 14. Slide 14 If we plug the total sum of squares and the sum of squared residuals into an Excel spreadsheet, we can compute the reduction in the total sum of squared errors associated with using the information in the independent variable, as represented by the regression equation. If we compute the percentage of total error reduced by the regression equation, we end up with the value of R², the percentage of variance explained by the regression relationship. Our calculation for R² agrees with the value of R Square in the SPSS output.
  • 15. Slide 15 R² is often interpreted as the percentage of variance explained. We can convert our Sum of Squares column to Variance by dividing by the number of cases in the sample minus one (8 - 1). If we compute the percentages using variances instead of sums of squares, we end up with exactly the same value for R², 0.750647. R² is also interpreted as the proportional reduction in error (a PRE statistic), which we can also phrase as an increase in accuracy. We should remember that no matter whether we interpret R² as explaining variance or reducing error, the statistic applies to the total error in the distribution, not to the error in individual cases.
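The spreadsheet arithmetic can be sketched in Python using only the two sums of squares reported on the slides. The variable and function names are illustrative assumptions.

```python
import math

# Sums of squares reported on the slides.
SS_TOTAL = 22.0          # squared deviations from the mean of ncards
SS_RESIDUAL = 5.485758   # squared residuals around the regression line
N_CASES = 8


def proportion_explained(total: float, residual: float) -> float:
    """Proportional reduction in error: (total - residual) / total."""
    return (total - residual) / total


r2_from_ss = proportion_explained(SS_TOTAL, SS_RESIDUAL)

# Dividing both quantities by n - 1 turns them into variances; the common
# divisor cancels, so the ratio is unchanged.
r2_from_var = proportion_explained(SS_TOTAL / (N_CASES - 1),
                                   SS_RESIDUAL / (N_CASES - 1))

print(round(r2_from_ss, 6))                    # 0.750647
print(math.isclose(r2_from_ss, r2_from_var))   # True
```

Because the same divisor appears in the numerator and the denominator, the sum-of-squares version and the variance version must give the same R².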
  • 16. Slide 16 We can also think of regression and correlation as based on the pattern of deviations for the two variables across the cases in the distribution. To present this, we will first compute the standard scores for each variable. As standard scores, the value for each case is the deviation from the mean of 0, which is the mean of the distribution of standard scores.
  • 17. Slide 17 Plotting the z-scores for both variables produces the same pattern in the scatterplot that we found with the raw data. As we would expect for standard scores, the green dotted line for the mean z-score for family size is at zero, as is the dotted blue line for the standard scores for number of credit cards.
  • 18. Slide 18 We add lines for the deviation from the means for both variables. The green deviation lines represent differences from the mean z-score for family size. The blue deviation lines represent differences from the mean z-score for number of credit cards.
  • 19. Slide 19 For some points, the length of the green deviation line is similar to the length of the blue deviation line. For other points, the length of the green deviation line is shorter than the length of the blue deviation line. The strength of the relationship will depend on the agreement of the deviations for each case, i.e. the extent to which the green line deviation for a case agrees with the blue line deviation.
  • 20. Slide 20 Overall, the pattern of the deviations is similar. Green deviations above the mean are paired with blue deviations above the mean. Green deviations below the mean are paired with blue deviations below the mean. Though the length of the deviations for individual cases varies, the overall pattern suggests a strong relationship.
  • 21. Slide 21 To compute the correlation coefficient, we multiply the z-scores, and sum across all the cases. To compute Pearson’s r, we divide the sum of the z-score products by the number of cases minus one. The value for Pearson’s r that we computed agrees with the value supplied by SPSS. Finally, if we square the value of Pearson’s r, we have the same value as R Square in the SPSS regression output.
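The z-score calculation can be sketched in Python as follows. The eight cases from CreditCardData.sav are not reproduced in this transcript, so the example runs on a small made-up data set; the function names are illustrative as well.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Sum of z-score products divided by the number of cases minus one."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Made-up illustration data, NOT the eight cases from CreditCardData.sav.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4), round(r ** 2, 4))
```

Applied to the family size and credit card columns, the same two functions would carry out the steps described on the slide, and squaring the result would give R Square.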
  • 22. Slide 22 If we return to the regression results for the raw data instead of the standard scores, we can show the link between Pearson’s r and the slope in the regression equation. Recall that the slope of the regression line represents the change in the dependent variable associated with a one-unit change in the independent variable. Thus, when a family had one more member, we would predict that they had 0.971 more credit cards. Think of the standard deviation as a measure of the average difference from the mean for all of the cases for each of the variables. The standard deviation for number of cards is 1.773 and the standard deviation for family size is 1.581.
  • 23. Slide 23 If the relationship between the two variables were perfect (one predicted the other without error), we could compute the slope of the line using the average amount of difference in each of the distributions, that is, the standard deviations. On average, the number of cards would go up 1.773 cards for a difference of 1.581 members in a family. We can simplify this by dividing the standard deviation for number of cards by the standard deviation for family size: 1.773 ÷ 1.581 = 1.121. Thus, if the relationship were perfect, we would increase our estimate of the number of cards in a family by 1.121 for every additional member of a family.
  • 24. Slide 24 If the relationship between the two variables were perfect, Pearson’s r would be 1.0 (or -1.0 if the relationship were inverse). However, we know that Pearson’s r is less than that; it is actually 0.866. If the slope of the regression line would be 1.121 when the relationship is perfect, then we might expect the slope to be 0.866 × 1.121 when the relationship is less than perfect. And in fact, that turns out to be true, since: 0.866 × 1.121 = 0.971. The slope of the regression line is the ratio of the standard deviations multiplied by the correlation coefficient.
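The link between Pearson’s r and the slope can be checked with a few lines of Python. This sketch uses only the rounded values reported on the slides; the variable names are illustrative.

```python
# Values reported on the slides (rounded to three decimals).
R = 0.866            # Pearson's r
SD_NCARDS = 1.773    # standard deviation of number of credit cards
SD_FAMSIZE = 1.581   # standard deviation of family size

slope_if_perfect = SD_NCARDS / SD_FAMSIZE   # slope when r = 1.0
slope = R * slope_if_perfect                # slope scaled by the actual r

print(round(slope_if_perfect, 3))  # 1.121
print(round(slope, 3))             # 0.971
```

The result agrees with the slope of 0.971 in the regression equation from the SPSS output.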