CHAPTER 16
Regression
Regression
The statistical technique for finding the best-fitting straight
line for a set of data
• Allows us to make
predictions based on
correlations
• A linear relationship
between two variables
allows the computation
of an equation that
provides a precise,
mathematical description
of the relationship: $Y = bX + a$
[Figure: regression line]
The Relationship Between
Correlation and Regression
Both examine the relationship/association
between two variables
Both involve an X and Y variable for each
individual (one pair of scores)
Differences in practice
Correlation
Used to determine the
relationship between
two variables
Regression
Used to make
predictions about one
variable based on the
value of another
The Linear Equation:
Expresses a linear relationship between variables X and Y
• X: represents any given score on X
• Y: represents the corresponding score for Y based on X
• a: the Y-intercept
• Determines what the
value of Y equals when X = 0
• Where the line crosses the
Y-axis
• b: the slope constant
• How much the Y variable
will change when X is
increased by one point
• The direction and degree of the line’s tilt
$Y = bX + a$
Prediction using Regression
A local video store charges a
$5/month membership fee
which allows video rentals at
$2 each
• How much will I spend per
month?
• If you never rent a video (X = 0)
• If you rent 3 videos/mo (X = 3)
• If you rent 8 videos/mo (X = 8)
$Y = bX + a$
$Y = 2X + 5$
X = 0: $Y = 2(0) + 5 = 5$
X = 3: $Y = 2(3) + 5 = 11$
X = 8: $Y = 2(8) + 5 = 21$
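The membership example can be checked with a few lines of Python. This is only a minimal sketch: the function name `monthly_cost` and its parameter names are illustrative choices, not something taken from the slides.

```python
def monthly_cost(videos_rented, fee=5, price_per_video=2):
    """Linear equation Y = bX + a with slope b = 2 (price) and intercept a = 5 (fee)."""
    return price_per_video * videos_rented + fee

# The three cases from the slide: X = 0, X = 3, X = 8
for x in (0, 3, 8):
    print(f"X = {x}: Y = {monthly_cost(x)}")
```

The intercept is what you pay when X = 0, and the slope is the extra cost of each additional rental.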
Graphing linear equations
X = 3: $Y = 5(3) + 60 = 75$
X = 0: $Y = 5(0) + 60 = 60$
The intercept (a) is 60
(when X = 0, Y = 60)
The slope (b) is 5
(as we increase one value in X, Y
increases 5 points)
[Figure: graph of the line, with X from 0 to 4 and Y from 0 to 80]
• To graph the line below,
we only need to find two
pairs of scores for X and Y,
and then draw the straight
line that connects them
$Y = 5X + 60$
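Because two points determine a straight line, the slope and intercept can also be recovered from any two (X, Y) pairs. The sketch below assumes a hypothetical helper named `line_through` and uses the two pairs from this slide, (0, 60) and (3, 75).

```python
def line_through(p1, p2):
    """Return (slope b, intercept a) of the straight line through two (X, Y) points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # change in Y for a one-point increase in X
    a = y1 - b * x1            # value of Y when X = 0
    return b, a

b, a = line_through((0, 60), (3, 75))
print(f"slope b = {b}, intercept a = {a}")
```

The helper would fail with a division by zero if both points had the same X value, which corresponds to a vertical line that has no slope of this form.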
The Regression Line
The line through the data points that ‘best fits’ the data
(assuming a linear relationship)
1. Makes the relationship
between two variables
easier to see (and
describe)
2. Identifies the ‘central
tendency’ of the relationship
between the variables
3. Can be used for prediction
• Best fit: the line that minimizes the total squared distance
between the points and the line
[Figure: ‘best fit’ regression line]
Correlation and the regression line
[Figure: plot with X from 0 to 5 and Y from 0 to 8]
• The magnitude of the
correlation coefficient (r ) is
an indicator of how well
the points aggregate
around the regression line
• What would a perfect
correlation look like?
The Distance Between a Point and the Line
Each data point will have its
own distance from the
regression line (a.k.a. error)
$Y$: the actual value of Y shown in
the data for a given X
$\hat{Y}$: the value of Y predicted for a
given X from your linear
equation
$\text{Distance} = Y - \hat{Y}$
How well does the line fit the data?
• How well a set of data points fits a straight line
can be measured by calculating the distance
(error) between the line and each data point
$\text{Error} = Y - \hat{Y}$
$\hat{Y}$ = “Y hat”
How well does the line fit the data?
• Some of the distances will be positive and some
negative, so to find a total value we must square
each distance (remember SS)
Total squared error ($SS_{residual}$): $\sum (Y - \hat{Y})^2$
Remember, this is the sum of the squared distances
The Regression Line
The line through the data points that ‘best fits’ the data
(assuming a linear relationship)
A.k.a. the Least-Squared-Error Solution
• The “best fit” regression line
• Minimizes the total squared distance
of the points from the line
• Gives the best prediction
of Y
• The Least-Squared-Error
Solution
• Results in the smallest possible
value for the total squared error
$\hat{Y} = bX + a$
Solving the regression equation
$\hat{Y} = bX + a$
Remember:
$SP = \sum XY - \frac{\sum X \sum Y}{n}$
$b = \frac{SP}{SS_X} = r\frac{s_Y}{s_X}$
$a = M_Y - bM_X$
$M$ = mean
I interrupt our regularly scheduled
program for a brief announcement….
‘Memba these?
We have spent the semester
using the computational
formulas for all sums of squares:
$SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$
$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$
$SP = \sum XY - \frac{\sum X \sum Y}{n}$
For sanity’s sake, we will now be
using the definitional formulas
for all of them:
$SS_X = \sum (X - M_X)^2$
$SS_Y = \sum (Y - M_Y)^2$
$SP = \sum (X - M_X)(Y - M_Y)$
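The computational and definitional formulas give the same values, which can be confirmed numerically. The sketch below uses the Example 16.1 scores that appear in these slides; the function names are illustrative assumptions.

```python
# Scores from Example 16.1 as listed in these slides
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

def mean(values):
    return sum(values) / len(values)

def sp_computational(x, y):
    """SP = sum(XY) - sum(X) * sum(Y) / n"""
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / len(x)

def sp_definitional(x, y):
    """SP = sum((X - M_X) * (Y - M_Y))"""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

# SS is the special case of SP where a variable is paired with itself
print("SS_X:", sp_computational(X, X), sp_definitional(X, X))
print("SS_Y:", sp_computational(Y, Y), sp_definitional(Y, Y))
print("SP:  ", sp_computational(X, Y), sp_definitional(X, Y))
```

Treating SS as SP of a variable with itself is a design shortcut for this sketch; it keeps a single pair of functions for all three quantities.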
And now back to our regularly
scheduled programming…..
Solving the regression equation
$\hat{Y} = bX + a$
Remember:
$b = \frac{SP}{SS_X} = r\frac{s_Y}{s_X}$
$a = M_Y - bM_X$
$M$ = mean
$SP = \sum (X - M_X)(Y - M_Y)$
Let’s Try One!
(Example 16.1, p.563, using the definitional formula)
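| X | Y | Error: X - MX | Error: Y - MY | Product: (X - MX)(Y - MY) | Squared error: (X - MX)² | Squared error: (Y - MY)² |
|---|---|---|---|---|---|---|
| 2 | 3 | -2 | -5 | 10 | 4 | 25 |
| 6 | 11 | 2 | 3 | 6 | 4 | 9 |
| 0 | 6 | -4 | -2 | 8 | 16 | 4 |
| 4 | 6 | 0 | -2 | 0 | 0 | 4 |
| 7 | 12 | 3 | 4 | 12 | 9 | 16 |
| 5 | 7 | 1 | -1 | -1 | 1 | 1 |
| 5 | 10 | 1 | 2 | 2 | 1 | 4 |
| 3 | 9 | -1 | 1 | -1 | 1 | 1 |
∑X = 32, MX = 4
∑Y = 64, MY = 8
SP = 36
SSX = 36, SSY = 64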
Find b and a in the regression equation
$M_X = 4$; $SS_X = 36$; $M_Y = 8$; $SS_Y = 64$; $SP = 36$
$b = \frac{SP}{SS_X} = \frac{36}{36} = 1$
$a = M_Y - bM_X = 8 - 1(4) = 8 - 4 = 4$
$\hat{Y} = bX + a = 1X + 4 = X + 4$
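The same slope and intercept can be computed directly from the scores with the definitional formulas. This is a minimal sketch using the Example 16.1 data; the variable names are illustrative.

```python
# Scores from Example 16.1
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

M_X = sum(X) / len(X)  # mean of X
M_Y = sum(Y) / len(Y)  # mean of Y

# Definitional formulas: sums of deviation products and squared deviations
SP = sum((x - M_X) * (y - M_Y) for x, y in zip(X, Y))
SS_X = sum((x - M_X) ** 2 for x in X)

b = SP / SS_X        # slope
a = M_Y - b * M_X    # Y-intercept

print(f"Y-hat = {b}X + {a}")
```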
Making Predictions
We use the regression to make predictions.
• For the previous example: $\hat{Y} = X + 4$
• Thus, an individual with a score of X = 3 would be
predicted to have a Y score of: $\hat{Y} = 3 + 4 = 7$
However, keep in mind:
1. The predicted value will not be perfect unless the correlation is
perfect (the data points are not perfectly in line)
• Least error is NOT the absence of error
2. The regression equation should not be used to make predictions for
X values outside the range of the original data
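One way to respect the second caution in code is to refuse predictions outside the observed X range. The sketch below is an assumption about how that guard might look: the function name `predict` and the choice to raise an error are illustrative, and the range 0 to 7 is simply the smallest and largest X in Example 16.1.

```python
def predict(x, b=1.0, a=4.0, x_min=0, x_max=7):
    """Predict Y from the regression equation Y-hat = bX + a.

    x_min and x_max are the smallest and largest X values in the
    original data; predictions outside that range are rejected.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"X = {x} is outside the range of the original data")
    return b * x + a

print(predict(3))
```

Raising an error is a strict choice; returning the prediction together with a warning would be an equally reasonable design.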
Standardizing the Regression Equation
The standardized form of the regression equation
uses z-scores (standardized scores) in place of raw
scores:
$\hat{z}_Y = \beta z_X$
Note:
1. We are now using the z-score for each X value ($z_X$) to predict the
z-score for the corresponding Y value ($z_Y$)
2. The slope constant that was b is now identified as β (“beta”)
• The slope for standardized variables: a change of one standard deviation
in X produces a predicted change of β standard deviations in Y
• For an equation with one predictor (two variables, X and Y), β = Pearson r
3. There is no longer a constant (a) in the equation
because z-scores have a mean of 0, so $a = M_Y - bM_X = 0$
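The claim that β equals Pearson r for a single predictor can be checked numerically. The sketch below standardizes the Example 16.1 scores and fits the slope on the z-scores; using the sample standard deviation (n - 1) is an assumption of this sketch, and the result is the same with the population formula as long as both variables use the same one.

```python
import math

# Scores from Example 16.1
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

def deviations(values):
    m = sum(values) / len(values)
    return [v - m for v in values]

def z_scores(values):
    dev = deviations(values)
    s = math.sqrt(sum(d ** 2 for d in dev) / (len(values) - 1))  # sample SD
    return [d / s for d in dev]

# Pearson r from the raw scores: r = SP / sqrt(SS_X * SS_Y)
dx, dy = deviations(X), deviations(Y)
r = sum(a * b for a, b in zip(dx, dy)) / math.sqrt(
    sum(a * a for a in dx) * sum(b * b for b in dy)
)

# Slope on standardized scores: beta = SP_z / SS_zX
# (z-scores have a mean of 0, so they are already deviation scores)
zx, zy = z_scores(X), z_scores(Y)
beta = sum(a * b for a, b in zip(zx, zy)) / sum(a * a for a in zx)

print(round(r, 4), round(beta, 4))
```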
The Accuracy of the Predictions
• These plots of two different sets of data have the same
regression equation
The regression equation does not
provide any information about the
accuracy of the predictions!
The Standard Error of the Estimate
Provides a measure of the standard distance between a
regression line (the predicted Y values) and the actual data
points (the actual Y values)
• Very similar to the standard deviation
• Answers the question:
How accurately does the regression equation predict the
observed Y values?
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
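The formula above can be sketched in Python as follows. The data are the Example 16.1 scores worked through in these slides, with the regression equation Ŷ = X + 4 found earlier; the function name `standard_error_of_estimate` is an illustrative assumption.

```python
import math

def standard_error_of_estimate(y_actual, y_predicted):
    """s_{Y.X} = sqrt(SS_residual / df), with df = n - 2 for one predictor."""
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_actual, y_predicted))
    df = len(y_actual) - 2
    return math.sqrt(ss_residual / df)

# Example 16.1 scores and the predictions from Y-hat = X + 4
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
predicted = [x + 4 for x in X]

s = standard_error_of_estimate(Y, predicted)
print(round(s, 2))
```

Note that SS_residual is already a sum of squared values, so it is divided by df and square-rooted without being squared again.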
Let’s Compute the Standard Error of
Estimate (Example 16.1, p.563, using the definitional formula)
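| X | Y | Predicted Y: Ŷ = X + 4 | Residual: Y - Ŷ | Squared residual: (Y - Ŷ)² |
|---|---|---|---|---|
| 2 | 3 | 6 | -3 | 9 |
| 6 | 11 | 10 | 1 | 1 |
| 0 | 6 | 4 | 2 | 4 |
| 4 | 6 | 8 | -2 | 4 |
| 5 | 7 | 9 | -2 | 4 |
| 7 | 12 | 11 | 1 | 1 |
| 5 | 10 | 9 | 1 | 1 |
| 3 | 9 | 7 | 2 | 4 |
Sum of residuals = 0
SSresidual = 28
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$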
Relationship Between the Standard
Error of the Estimate and Correlation
• $r^2$ = proportion of predicted variability
• Variability in Y that is predicted by its relationship with X
• $(1 - r^2)$ = proportion of unpredicted variability
So, if r = 0.80, then the predicted variability is $r^2 = 0.64$
• 64% of the total variability for Y scores can be predicted by X
• And the unpredicted variability is the remaining 36% $(1 - r^2)$
predicted variability = $SS_{regression} = r^2 SS_Y$
unpredicted variability = $SS_{residual} = (1 - r^2) SS_Y$
An Easier Way to Compute SSresidual
Instead of computing individual error values:
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
It is easier to simply use the formula for unpredicted
variability for the SSresidual:
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}$
These are the steps we just went through to
compute the Standard Error of Estimate
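| X | Y | Predicted Y: Ŷ = X + 4 | Residual: Y - Ŷ | Squared residual: (Y - Ŷ)² |
|---|---|---|---|---|
| 2 | 3 | 6 | -3 | 9 |
| 6 | 11 | 10 | 1 | 1 |
| 0 | 6 | 4 | 2 | 4 |
| 4 | 6 | 8 | -2 | 4 |
| 5 | 7 | 9 | -2 | 4 |
| 7 | 12 | 11 | 1 | 1 |
| 5 | 10 | 9 | 1 | 1 |
| 3 | 9 | 7 | 2 | 4 |
Sum of residuals = 0
SSresidual = 28
$s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$
Now let’s do it using the easier formula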
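• We know SSX = 36, SSY = 64, and SP = 36 because we
calculated them a few slides back:
| X | Y | Error: X - MX | Error: Y - MY | Product: (X - MX)(Y - MY) | Squared error: (X - MX)² | Squared error: (Y - MY)² |
|---|---|---|---|---|---|---|
| 2 | 3 | -2 | -5 | 10 | 4 | 25 |
| 6 | 11 | 2 | 3 | 6 | 4 | 9 |
| 0 | 6 | -4 | -2 | 8 | 16 | 4 |
| 4 | 6 | 0 | -2 | 0 | 0 | 4 |
| 7 | 12 | 3 | 4 | 12 | 9 | 16 |
| 5 | 7 | 1 | -1 | -1 | 1 | 1 |
| 5 | 10 | 1 | 2 | 2 | 1 | 4 |
| 3 | 9 | -1 | 1 | -1 | 1 | 1 |
∑X = 32, MX = 4
∑Y = 64, MY = 8
SP = 36
SSX = 36, SSY = 64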
Using those figures, we can compute:
$r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{36}{\sqrt{36(64)}} = \frac{36}{\sqrt{2304}} = \frac{36}{48} = 0.75$
• With SSY = 64 and a correlation of 0.75, the predicted
variability from the regression equation is:
$SS_{regression} = r^2 SS_Y = 0.75^2(64) = 0.5625(64) = 36$
• And the unpredicted variability is:
$SS_{residual} = (1 - r^2) SS_Y = (1 - 0.75^2)64 = (1 - 0.5625)64 = (0.4375)64 = 28$
• This is the same value we found working with our table!
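The shortcut can be reproduced in a few lines, starting only from the summary values SSX = 36, SSY = 64, SP = 36 and n = 8. This is a sketch with illustrative variable names.

```python
import math

# Summary values from the Example 16.1 table
SS_X, SS_Y, SP, n = 36, 64, 36, 8

r = SP / math.sqrt(SS_X * SS_Y)        # Pearson correlation
ss_regression = r ** 2 * SS_Y          # predicted variability
ss_residual = (1 - r ** 2) * SS_Y      # unpredicted variability
s_yx = math.sqrt(ss_residual / (n - 2))  # standard error of estimate

print(r, ss_regression, ss_residual, round(s_yx, 2))
```

The two SS values add up to SSY, which is a quick way to check the arithmetic.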
CHAPTER 16.2
Analysis of Regression:
Testing the Significance of the Regression Equation
Analysis of Regression
• Uses an F-ratio to determine whether the variance
predicted by the regression equation is significantly
greater than would be expected if there was no
relationship between X and Y.
$F = \frac{\text{variance in Y predicted by the regression equation}}{\text{unpredicted variance in the Y scores}}$
$F = \frac{\text{systematic changes in Y resulting from changes in X}}{\text{changes in Y that are independent from changes in X}}$
Significance testing
$H_0$: The regression equation does not account for a
significant proportion of variance in the Y scores
$H_1$: The equation does account for a significant
proportion of variance in the Y scores
$MS_{regression} = \frac{SS_{regression}}{df_{regression}}$; $df = 1$
$MS_{residual} = \frac{SS_{residual}}{df_{residual}}$; $df = n - 2$
$F = \frac{MS_{regression}}{MS_{residual}}$
Find and evaluate the critical F-value the same as for
ANOVA (df = # of predictors, n - 2)
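As an illustration, the F-ratio for the Example 16.1 data can be computed from the SS values found earlier (SSregression = 36, SSresidual = 28, n = 8). The sketch below only computes F and its degrees of freedom; the variable names are illustrative.

```python
# Values from the Example 16.1 calculations
ss_regression, ss_residual, n = 36, 28, 8

df_regression = 1       # one predictor
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

F = ms_regression / ms_residual
print(f"F({df_regression}, {df_residual}) = {F:.2f}")
```

The critical value still has to be looked up in an F table. For df = (1, 6) at α = .05 the tabled value is 5.99, so an F of this size would lead to rejecting H0.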
Coming up next…
• Wednesday lab
• Lab #9: Using SPSS for correlation and regression
• HW #9 is due at the beginning of class
• Read the second half of Chapter 16 (pp.572-581)
CHAPTER 16.3
Introduction to Multiple Regression with Two Predictor
Variables
Multiple
Regression
with Two
Predictor
Variables
• 40% of the variance in Academic Performance can be
predicted by IQ scores
• 30% of the variance in academic performance can be
predicted from SAT scores
• IQ and SAT also overlap: SAT contributes only an additional
10% beyond what is already predicted by IQ
Predicting the variance
in academic
performance from IQ
and SAT scores
Multiple Regression
When you have more than one predictor variable
Considering the two-predictor model:
$\hat{Y} = b_1X_1 + b_2X_2 + a$
For standardized scores:
$\hat{z}_Y = \beta_1 z_{X_1} + \beta_2 z_{X_2}$
Calculations for two-predictor
regression coefficients:
$b_1 = \frac{(SP_{X_1Y})(SS_{X_2}) - (SP_{X_1X_2})(SP_{X_2Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}$
$b_2 = \frac{(SP_{X_2Y})(SS_{X_1}) - (SP_{X_1X_2})(SP_{X_1Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1X_2})^2}$
$a = M_Y - b_1M_{X_1} - b_2M_{X_2}$
Where:
• $SS_{X_1}$ = sum of squared
deviations for X1
• $SS_{X_2}$ = sum of squared
deviations for X2
• $SP_{X_1Y}$ = sum of products
of deviations for X1 and Y
• $SP_{X_2Y}$ = sum of products
of deviations for X2 and Y
• $SP_{X_1X_2}$ = sum of products
of deviations for X1 and X2
R²
Percentage of variance accounted for by a
multiple-regression equation
$R^2 = \frac{SS_{regression}}{SS_Y} = \frac{b_1 SP_{X_1Y} + b_2 SP_{X_2Y}}{SS_Y}$
• Proportion of unpredicted variability:
$(1 - R^2) = \frac{SS_{residual}}{SS_Y}$
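The two-predictor coefficient formulas and R² can be sketched together in Python. The five sets of scores below are hypothetical, invented only to keep the arithmetic easy to follow; they do not come from the slides, and the helper names are illustrative.

```python
# Hypothetical scores, invented for illustration (not from the slides)
X1 = [1, 2, 3, 4, 5]
X2 = [2, 1, 4, 3, 5]
Y = [10, 6, 16, 16, 22]

def mean(values):
    return sum(values) / len(values)

def sp(u, v):
    """Sum of products of deviations; sp(v, v) is the sum of squares SS."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

SS_X1, SS_X2, SS_Y = sp(X1, X1), sp(X2, X2), sp(Y, Y)
SP_X1Y, SP_X2Y, SP_X1X2 = sp(X1, Y), sp(X2, Y), sp(X1, X2)

# Both slopes share the same denominator
denominator = SS_X1 * SS_X2 - SP_X1X2 ** 2
b1 = (SP_X1Y * SS_X2 - SP_X1X2 * SP_X2Y) / denominator
b2 = (SP_X2Y * SS_X1 - SP_X1X2 * SP_X1Y) / denominator
a = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

# Proportion of variance in Y predicted by the two-predictor equation
SS_regression = b1 * SP_X1Y + b2 * SP_X2Y
R2 = SS_regression / SS_Y

print(f"Y-hat = {b1}*X1 + {b2}*X2 + {a}, R^2 = {R2:.3f}")
```

The denominator becomes 0 when X1 and X2 are perfectly correlated, in which case the two slopes cannot be separated.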
Standard error of the
estimate
$s_{Y \cdot X_1X_2} = \sqrt{MS_{residual}}$, where $MS_{residual} = \frac{SS_{residual}}{df}$ and $df = n - 3$
Significance testing
(2-predictors)
$MS_{regression} = \frac{SS_{regression}}{2}$
$MS_{residual} = \frac{SS_{residual}}{n - 3}$
$F = \frac{MS_{regression}}{MS_{residual}}$, with $df = (2, df_{residual})$
** With 3+ predictors, df
regression = # predictors
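The two-predictor standard error and F-ratio follow the same pattern as the single-predictor case, with df = 2 and n - 3. The sketch below uses hypothetical summary values (SSregression = 148, SSresidual = 4, n = 5) chosen only for illustration, and the function name is an assumption.

```python
import math

def two_predictor_test(ss_regression, ss_residual, n):
    """Standard error of estimate and F-ratio for a two-predictor regression."""
    df_regression = 2          # number of predictors
    df_residual = n - 3
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "standard_error": math.sqrt(ms_residual),
        "F": ms_regression / ms_residual,
        "df": (df_regression, df_residual),
    }

# Hypothetical summary values, not taken from the slides
result = two_predictor_test(ss_regression=148, ss_residual=4, n=5)
print(result)
```

As with one predictor, the computed F is compared against the tabled critical value for the given degrees of freedom.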
Evaluating the Contribution of Each
Predictor Variable
• With a multiple regression, we can evaluate the
contribution of each predictor variable
• Does variable X1 make a significant contribution
beyond what is already predicted by variable X2?
• Does variable X2 make a significant contribution
beyond what is already predicted by variable X1?
• This is useful if we want to control for a third variable and
any confounding effects

More Related Content

What's hot

Concentration inequality in Machine Learning
Concentration inequality in Machine LearningConcentration inequality in Machine Learning
Concentration inequality in Machine Learning
VARUN KUMAR
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)Harsh Upadhyay
 
Numerical method (curve fitting)
Numerical method (curve fitting)Numerical method (curve fitting)
Numerical method (curve fitting)
Varendra University Rajshahi-bangladesh
 
Linear regression
Linear regressionLinear regression
Linear regression
vermaumeshverma
 
What is chi square test
What  is  chi square testWhat  is  chi square test
What is chi square test
Talent Corner HR Services Pvt Ltd.
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
Mahak Vijayvargiya
 
8.2 Exploring exponential models
8.2 Exploring exponential models8.2 Exploring exponential models
8.2 Exploring exponential modelsswartzje
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inferenceKemal İnciroğlu
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
jasondroesch
 
An Overview of Simple Linear Regression
An Overview of Simple Linear RegressionAn Overview of Simple Linear Regression
An Overview of Simple Linear Regression
Georgian Court University
 
Basics of Integration and Derivatives
Basics of Integration and DerivativesBasics of Integration and Derivatives
Basics of Integration and Derivatives
Faisal Waqar
 
Statistics Reference Book
Statistics Reference BookStatistics Reference Book
Statistics Reference Book
Ram Kumar Shah "Struggler"
 
Chap11 simple regression
Chap11 simple regressionChap11 simple regression
Chap11 simple regression
Judianto Nugroho
 
Probability Distribution
Probability DistributionProbability Distribution
Probability Distribution
Pharmacy Universe
 
Random variable,Discrete and Continuous
Random variable,Discrete and ContinuousRandom variable,Discrete and Continuous
Random variable,Discrete and Continuous
Bharath kumar Karanam
 
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
Grimmett&Stirzaker--Probability and Random Processes  Third Ed(2001).pdfGrimmett&Stirzaker--Probability and Random Processes  Third Ed(2001).pdf
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
Abdirahman Farah Ali
 
Measure of dispersion 10321
Measure of dispersion 10321Measure of dispersion 10321
Measure of dispersion 10321
gopinathannsriramachandraeduin
 
Simple Linear Regression
Simple Linear RegressionSimple Linear Regression
Simple Linear RegressionSharlaine Ruth
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
Maria Theresa
 

What's hot (20)

Concentration inequality in Machine Learning
Concentration inequality in Machine LearningConcentration inequality in Machine Learning
Concentration inequality in Machine Learning
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
 
Multiple Linear Regression
Multiple Linear Regression Multiple Linear Regression
Multiple Linear Regression
 
Numerical method (curve fitting)
Numerical method (curve fitting)Numerical method (curve fitting)
Numerical method (curve fitting)
 
Linear regression
Linear regressionLinear regression
Linear regression
 
What is chi square test
What  is  chi square testWhat  is  chi square test
What is chi square test
 
Basics of Regression analysis
 Basics of Regression analysis Basics of Regression analysis
Basics of Regression analysis
 
8.2 Exploring exponential models
8.2 Exploring exponential models8.2 Exploring exponential models
8.2 Exploring exponential models
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
Correlation and Regression
Correlation and RegressionCorrelation and Regression
Correlation and Regression
 
An Overview of Simple Linear Regression
An Overview of Simple Linear RegressionAn Overview of Simple Linear Regression
An Overview of Simple Linear Regression
 
Basics of Integration and Derivatives
Basics of Integration and DerivativesBasics of Integration and Derivatives
Basics of Integration and Derivatives
 
Statistics Reference Book
Statistics Reference BookStatistics Reference Book
Statistics Reference Book
 
Chap11 simple regression
Chap11 simple regressionChap11 simple regression
Chap11 simple regression
 
Probability Distribution
Probability DistributionProbability Distribution
Probability Distribution
 
Random variable,Discrete and Continuous
Random variable,Discrete and ContinuousRandom variable,Discrete and Continuous
Random variable,Discrete and Continuous
 
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
Grimmett&Stirzaker--Probability and Random Processes  Third Ed(2001).pdfGrimmett&Stirzaker--Probability and Random Processes  Third Ed(2001).pdf
Grimmett&Stirzaker--Probability and Random Processes Third Ed(2001).pdf
 
Measure of dispersion 10321
Measure of dispersion 10321Measure of dispersion 10321
Measure of dispersion 10321
 
Simple Linear Regression
Simple Linear RegressionSimple Linear Regression
Simple Linear Regression
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 

Similar to regression

Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
MuhammadAftab89
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
BAGARAGAZAROMUALD2
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
RidaIrfan10
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
ssuser71ac73
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
HarunorRashid74
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
krunal soni
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
MoinPasha12
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
nuwan udugampala
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
마이캠퍼스
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
Rashi Agarwal
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
Anusuya123
 
Correlations
CorrelationsCorrelations
Regression analysis
Regression analysisRegression analysis
Regression analysis
Awais Salman
 
Regression
Regression  Regression
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
Nimrita Koul
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Neeraj Bhandari
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
ShivankAggatwal
 
Linear regression
Linear regressionLinear regression
Linear regression
Regent University
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
Rohit77460
 

Similar to regression (20)

Corr-and-Regress (1).ppt
Corr-and-Regress (1).pptCorr-and-Regress (1).ppt
Corr-and-Regress (1).ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Cr-and-Regress.ppt
Cr-and-Regress.pptCr-and-Regress.ppt
Cr-and-Regress.ppt
 
Correlation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social ScienceCorrelation & Regression for Statistics Social Science
Correlation & Regression for Statistics Social Science
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr-and-Regress.ppt
Corr-and-Regress.pptCorr-and-Regress.ppt
Corr-and-Regress.ppt
 
Corr And Regress
Corr And RegressCorr And Regress
Corr And Regress
 
Regression and Co-Relation
Regression and Co-RelationRegression and Co-Relation
Regression and Co-Relation
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
regression.pptx
regression.pptxregression.pptx
regression.pptx
 
Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Correlations
CorrelationsCorrelations
Correlations
 
Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Regression
Regression  Regression
Regression
 
Linear regression analysis
Linear regression analysisLinear regression analysis
Linear regression analysis
 
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )Correlation by Neeraj Bhandari ( Surkhet.Nepal )
Correlation by Neeraj Bhandari ( Surkhet.Nepal )
 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
 
Linear regression
Linear regressionLinear regression
Linear regression
 
CORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptxCORRELATION AND REGRESSION.pptx
CORRELATION AND REGRESSION.pptx
 

More from Kaori Kubo Germano, PhD

Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
Kaori Kubo Germano, PhD
 
Probablity
ProbablityProbablity
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
Kaori Kubo Germano, PhD
 
z-scores
z-scoresz-scores
Choosing the right statistics
Choosing the right statisticsChoosing the right statistics
Choosing the right statistics
Kaori Kubo Germano, PhD
 
Chi square
Chi squareChi square
Factorial ANOVA
Factorial ANOVAFactorial ANOVA
Factorial ANOVA
Kaori Kubo Germano, PhD
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
Kaori Kubo Germano, PhD
 
Repeated Measures ANOVA
Repeated Measures ANOVARepeated Measures ANOVA
Repeated Measures ANOVA
Kaori Kubo Germano, PhD
 
Repeated Measures t-test
Repeated Measures t-testRepeated Measures t-test
Repeated Measures t-test
Kaori Kubo Germano, PhD
 
Independent samples t-test
Independent samples t-testIndependent samples t-test
Independent samples t-test
Kaori Kubo Germano, PhD
 
Introduction to the t-test
Introduction to the t-testIntroduction to the t-test
Introduction to the t-test
Kaori Kubo Germano, PhD
 
Central Tendency
Central TendencyCentral Tendency
Central Tendency
Kaori Kubo Germano, PhD
 
Variability
VariabilityVariability
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
Kaori Kubo Germano, PhD
 
Behavioral Statistics Intro lecture
Behavioral Statistics Intro lectureBehavioral Statistics Intro lecture
Behavioral Statistics Intro lecture
Kaori Kubo Germano, PhD
 

More from Kaori Kubo Germano, PhD (16)

Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Probablity
ProbablityProbablity
Probablity
 
Probability & Samples
Probability & SamplesProbability & Samples
Probability & Samples
 
z-scores
z-scoresz-scores
z-scores
 
Choosing the right statistics
Choosing the right statisticsChoosing the right statistics
Choosing the right statistics
 
Chi square
Chi squareChi square
Chi square
 
Factorial ANOVA
Factorial ANOVAFactorial ANOVA
Factorial ANOVA
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Repeated Measures ANOVA
Repeated Measures ANOVARepeated Measures ANOVA
Repeated Measures ANOVA
 
Repeated Measures t-test
Repeated Measures t-testRepeated Measures t-test
Repeated Measures t-test
 
Independent samples t-test
Independent samples t-testIndependent samples t-test
Independent samples t-test
 
Introduction to the t-test
Introduction to the t-testIntroduction to the t-test
Introduction to the t-test
 
Central Tendency
Central TendencyCentral Tendency
Central Tendency
 
Variability
VariabilityVariability
Variability
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
Behavioral Statistics Intro lecture
Behavioral Statistics Intro lectureBehavioral Statistics Intro lecture
Behavioral Statistics Intro lecture
 

Recently uploaded

Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
Balvir Singh
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
Thiyagu K
 
CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama UniversityNatural birth techniques - Mrs.Akanksha Trivedi Rama University
Natural birth techniques - Mrs.Akanksha Trivedi Rama University
Akanksha trivedi rama nursing college kanpur.
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Marketing internship report file for MBA
Marketing internship report file for MBAMarketing internship report file for MBA
Marketing internship report file for MBA
gb193092
 
Best Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDABest Digital Marketing Institute In NOIDA
Best Digital Marketing Institute In NOIDA
deeptiverma2406
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdfMASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
MASS MEDIA STUDIES-835-CLASS XI Resource Material.pdf
goswamiyash170123
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 

Recently uploaded (20)

Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
Operation Blue Star - Saka Neela Tara
Operation Blue Star   -  Saka Neela TaraOperation Blue Star   -  Saka Neela Tara
Operation Blue Star - Saka Neela Tara
 
Unit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdfUnit 8 - Information and Communication Technology (Paper I).pdf
Unit 8 - Information and Communication Technology (Paper I).pdf
 
CACJapan - GROUP Presentation 1- Wk 4.pdf

regression

  • 2. Regression The statistical technique for finding the best-fitting straight line for a set of data • Allows us to make predictions based on correlations • A linear relationship between two variables allows the computation of an equation that provides a precise, mathematical description of the relationship: $Y = bX + a$ (the regression line)
  • 3. The Relationship Between Correlation and Regression Both examine the relationship/association between two variables Both involve an X and Y variable for each individual (one pair of scores) Differences in practice Correlation Used to determine the relationship between two variables Regression Used to make predictions about one variable based on the value of another
  • 4. The Linear Equation: $Y = bX + a$ Expresses a linear relationship between variables X and Y • X: represents any given score on X • Y: represents the corresponding score for Y based on X • a: the Y-intercept • Determines what the value of Y equals when X = 0 • Where the line crosses the Y-axis • b: the slope constant • How much the Y variable will change when X is increased by one point • The direction and degree of the line’s tilt
  • 5. Prediction using Regression A local video store charges a $5/month membership fee which allows video rentals at $2 each • How much will I spend per month? With Y = bX + a, b = 2 and a = 5, so Y = 2X + 5 • If you never rent a video (X = 0): Y = 2(0) + 5 = 5 • If you rent 3 videos/mo (X = 3): Y = 2(3) + 5 = 11 • If you rent 8 videos/mo (X = 8): Y = 2(8) + 5 = 21
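
The video-store arithmetic can be sketched in a few lines of Python. The function name `monthly_cost` and its parameter names are illustrative choices, not something taken from the slides; the slope (2) and intercept (5) come from the example.

```python
def monthly_cost(rentals, fee=5.0, price_per_rental=2.0):
    """Linear equation Y = bX + a: b is the price per rental, a is the monthly fee."""
    return price_per_rental * rentals + fee

# The three cases from the slide: 0, 3 and 8 rentals per month.
for x in (0, 3, 8):
    print(f"X = {x}: Y = {monthly_cost(x)}")
```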
  • 6. Graphing linear equations $Y = 5X + 60$ • To graph the line, we only need to find two pairs of scores for X and Y, and then draw the straight line that connects them: for $X = 0$, $Y = 5(0) + 60 = 60$; for $X = 3$, $Y = 5(3) + 60 = 75$ • The intercept (a) is 60 (when X = 0, Y = 60) • The slope (b) is 5 (each one-point increase in X increases Y by 5 points) [Figure: graph of the line, X axis from 0 to 4, Y axis from 0 to 80]
  • 7. The Regression Line The line through the data points that best fits the data (assuming a linear relationship) 1. Makes the relationship between two variables easier to see (and describe) 2. Identifies the ‘central tendency’ of the relationship between the variables 3. Can be used for prediction • Best fit: the line that minimizes the distance of each point to the line [Figure: scatterplot with the ‘best fit’ regression line]
  • 8. Correlation and the regression line • The magnitude of the correlation coefficient (r) is an indicator of how well the points aggregate around the regression line • What would a perfect correlation look like? [Figure: scatterplot of data points around a regression line]
  • 9. The Distance Between a Point and the Line Each data point will have its own distance from the regression line (a.k.a. error) • $Y$: the actual value of Y shown in the data for a given X • $\hat{Y}$: the value of Y predicted for a given X from your linear equation • Distance $= Y - \hat{Y}$
  • 10. How well does the line fit the data? • How well a set of data points fits a straight line can be measured by calculating the distance (error) between the line and each data point: Error $= Y - \hat{Y}$, where $\hat{Y}$ is read "Y hat"
  • 11. How well does the line fit the data? • Some of the distances will be positive and some negative, so to find a total value we must square each distance (remember SS) • Total squared error (SS residual): $SS_{residual} = \sum (Y - \hat{Y})^2$ • Remember, this is the sum of the squared distances, not the square of their sum
  • 12. The Regression Line $\hat{Y} = bX + a$ The line through the data points that best fits the data (assuming a linear relationship), a.k.a. the Least-Squared-Error Solution • The “best fit” regression line • Minimizes the distance of each point from the line • Gives the best prediction of Y • The Least-Squared-Error Solution • Results in the smallest possible value for the total squared error
  • 13. Solving the regression equation $\hat{Y} = bX + a$ • $b = \frac{SP}{SS_X} = r\,\frac{s_Y}{s_X}$ • $a = M_Y - bM_X$ ($M$ = mean) • Remember: $SP = \sum XY - \frac{\sum X \sum Y}{n}$
  • 14. I interrupt our regularly scheduled program for a brief announcement….
  • 15. ‘Memba these? We have spent the semester utilizing the computational formulas for all sums of squares: $SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$, $SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$, $SP = \sum XY - \frac{\sum X \sum Y}{n}$ • For sanity’s sake, we will now be utilizing the definitional formulas for all: $SS_X = \sum (X - M_X)^2$, $SS_Y = \sum (Y - M_Y)^2$, $SP = \sum (X - M_X)(Y - M_Y)$
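
As a quick check that the computational and definitional formulas agree, here is a minimal Python sketch. It borrows the eight (X, Y) score pairs from Example 16.1, which appears a few slides later in this deck; the function names are illustrative only.

```python
# Scores from Example 16.1 (used later in the deck).
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

def mean(v):
    return sum(v) / len(v)

def ss_definitional(v):
    """SS = sum of squared deviations from the mean."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v)

def ss_computational(v):
    """SS = sum(X^2) - (sum X)^2 / n."""
    return sum(x * x for x in v) - sum(v) ** 2 / len(v)

def sp_definitional(x, y):
    """SP = sum of products of deviations."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def sp_computational(x, y):
    """SP = sum(XY) - (sum X)(sum Y) / n."""
    return sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / len(x)

print(ss_definitional(X), ss_computational(X))      # both forms of SS_X
print(ss_definitional(Y), ss_computational(Y))      # both forms of SS_Y
print(sp_definitional(X, Y), sp_computational(X, Y))  # both forms of SP
```

Both routes give SS_X = 36, SS_Y = 64 and SP = 36 for these scores, matching the values used in the worked example.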
  • 16. And now back to our regularly scheduled programming…..
  • 17. Solving the regression equation $\hat{Y} = bX + a$ • $b = \frac{SP}{SS_X} = r\,\frac{s_Y}{s_X}$ • $a = M_Y - bM_X$ ($M$ = mean) • Remember: $SP = \sum (X - M_X)(Y - M_Y)$
  • 18. Let’s Try One! (Example 16.1, p.563, using the definitional formula)
      Scores      Deviations            Products                 Squared deviations
      X    Y      X − MX    Y − MY      (X − MX)(Y − MY)         (X − MX)²    (Y − MY)²
      2    3       −2        −5          10                        4            25
      6   11        2         3           6                        4             9
      0    6       −4        −2           8                       16             4
      4    6        0        −2           0                        0             4
      7   12        3         4          12                        9            16
      5    7        1        −1          −1                        1             1
      5   10        1         2           2                        1             4
      3    9       −1         1          −1                        1             1
      ∑X = 32, MX = 4 • ∑Y = 64, MY = 8 • SP = 36 • SSX = 36 • SSY = 64
  • 19. Find b and a in the regression equation Given $M_X = 4$, $SS_X = 36$, $M_Y = 8$, $SS_Y = 64$, $SP = 36$: • $b = \frac{SP}{SS_X} = \frac{36}{36} = 1$ • $a = M_Y - bM_X = 8 - 1(4) = 4$ • $\hat{Y} = bX + a = 1X + 4 = X + 4$
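
The slope and intercept for Example 16.1 can be reproduced with a short Python sketch using the definitional formulas. The helper name `fit_line` is an illustrative choice, not part of the original slides.

```python
# Scores from Example 16.1.
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

def fit_line(x, y):
    """Least-squares slope b = SP / SS_X and intercept a = M_Y - b * M_X."""
    m_x = sum(x) / len(x)
    m_y = sum(y) / len(y)
    sp = sum((xi - m_x) * (yi - m_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - m_x) ** 2 for xi in x)
    b = sp / ss_x
    a = m_y - b * m_x
    return b, a

b, a = fit_line(X, Y)
print(f"Y-hat = {b}X + {a}")
```

For these scores the sketch gives b = 1 and a = 4, so the regression equation is Ŷ = X + 4.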
  • 20. Making Predictions We use the regression to make predictions. • For the previous example: $\hat{Y} = X + 4$ • Thus, an individual with a score of X = 3 would be predicted to have a Y score of: $\hat{Y} = 3 + 4 = 7$ However, keep in mind: 1. The predicted value will not be perfect unless the correlation is perfect (the data points are not perfectly in line) • Least error is NOT the absence of error 2. The regression equation should not be used to make predictions for X values outside the range of the original data
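
One way to respect the second caution in code is to refuse predictions outside the observed X range. This is only a sketch: the function name `predict` and the choice to raise an error (rather than warn) are assumptions made for illustration. The range 0 to 7 is the smallest and largest X score in Example 16.1.

```python
def predict(x_new, b=1.0, a=4.0, x_min=0, x_max=7):
    """Predict Y-hat = bX + a, but only inside the range of the original X scores."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"X = {x_new} is outside the observed range [{x_min}, {x_max}]"
        )
    return b * x_new + a

print(predict(3))  # the slide's example: X = 3
```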
  • 21. Standardizing the Regression Equation The standardized form of the regression equation utilizes z-scores (standardized scores) in place of raw scores: $\hat{z}_Y = \beta z_X$ Note: 1. We are now using the z-score for each X value ($z_X$) to predict the z-score for the corresponding Y value ($z_Y$) 2. The slope constant that was b is now identified as β (“beta”) • The slope for standardized variables: a change of one standard deviation in X produces a predicted change of β standard deviations in Y • For an equation with two variables (X and Y), β = Pearson r 3. There is no longer a constant (a) in the equation because z-scores have a mean of 0, so $a = M_Y - bM_X = 0$
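
To see that the standardized slope equals Pearson r and that the intercept vanishes, the following sketch converts the Example 16.1 scores to z-scores and refits the line. It assumes the sample standard deviation (df = n − 1) for the z-scores; using the population formula would give the same β. All names are illustrative.

```python
import math

# Scores from Example 16.1.
X = [2, 6, 0, 4, 7, 5, 5, 3]
Y = [3, 11, 6, 6, 12, 7, 10, 9]

def mean(v):
    return sum(v) / len(v)

def z_scores(v):
    """Standardize scores using the sample standard deviation (df = n - 1)."""
    m = mean(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))
    return [(x - m) / s for x in v]

z_x, z_y = z_scores(X), z_scores(Y)

# Regression of z_Y on z_X: slope = SP / SS_X, computed on the z-scores.
m_zx, m_zy = mean(z_x), mean(z_y)
sp_z = sum((a - m_zx) * (b - m_zy) for a, b in zip(z_x, z_y))
ss_zx = sum((a - m_zx) ** 2 for a in z_x)
beta = sp_z / ss_zx
intercept = m_zy - beta * m_zx

print(round(beta, 4), round(intercept, 4))
```

For these scores β comes out to 0.75, the same value as the Pearson correlation computed later in the deck, and the intercept is 0 up to rounding.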
  • 22. The Accuracy of the Predictions • These plots of two different sets of data have the same regression equation The regression equation does not provide any information about the accuracy of the predictions!
  • 23. The Standard Error of the Estimate Provides a measure of the standard distance between a regression line (the predicted Y values) and the actual data points (the actual Y values) • Very similar to the standard deviation • Answers the question: How accurately does the regression equation predict the observed Y values? $s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$
  • 24. Let’s Compute the Standard Error of Estimate (Example 16.1, p.563, using the definitional formula)
      Data        Predicted Y values       Residual           Squared residual
      X    Y      $\hat{Y} = X + 4$        $Y - \hat{Y}$      $(Y - \hat{Y})^2$
      2    3        6                       −3                 9
      6   11       10                        1                 1
      0    6        4                        2                 4
      4    6        8                       −2                 4
      5    7        9                       −2                 4
      7   12       11                        1                 1
      5   10        9                        1                 1
      3    9        7                        2                 4
      Sum of residuals = 0 • $SS_{residual} = \sum (Y - \hat{Y})^2 = 28$
      $s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$
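
The residual table and the standard error of estimate can be recomputed with a short Python sketch. It takes b = 1 and a = 4 from the regression equation found for Example 16.1; variable names are illustrative.

```python
import math

# Scores from Example 16.1 and the fitted equation Y-hat = 1X + 4.
X = [2, 6, 0, 4, 5, 7, 5, 3]
Y = [3, 11, 6, 6, 7, 12, 10, 9]
b, a = 1.0, 4.0

predicted = [b * x + a for x in X]
residuals = [y - y_hat for y, y_hat in zip(Y, predicted)]

# SS_residual is the sum of the squared residuals; it is already in squared units.
ss_residual = sum(e ** 2 for e in residuals)
df = len(X) - 2
standard_error = math.sqrt(ss_residual / df)

print(residuals, ss_residual, round(standard_error, 2))
```

With SS_residual = 28 and df = 6, the standard error of estimate is the square root of 28/6, about 2.16.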
  • 25. Relationship Between the Standard Error of the Estimate and Correlation • $r^2$ = proportion of predicted variability • Variability in Y that is predicted by its relationship with X • $(1 - r^2)$ = proportion of unpredicted variability So, if r = 0.80, then the predicted variability is $r^2 = 0.64$ • 64% of the total variability for Y scores can be predicted by X • And the unpredicted variability is the remaining 36% $(1 - r^2)$ • predicted variability = $SS_{regression} = r^2 SS_Y$ • unpredicted variability = $SS_{residual} = (1 - r^2) SS_Y$
  • 26. An Easier Way to Compute SSresidual Instead of computing individual error values: $s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$ It is easier to simply use the formula for unpredicted variability for the SSresidual: $s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{(1 - r^2) SS_Y}{n - 2}}$
  • 27. These are the steps we just went through to compute the Standard Error of Estimate
      Data        Predicted Y values       Residual           Squared residual
      X    Y      $\hat{Y} = X + 4$        $Y - \hat{Y}$      $(Y - \hat{Y})^2$
      2    3        6                       −3                 9
      6   11       10                        1                 1
      0    6        4                        2                 4
      4    6        8                       −2                 4
      5    7        9                       −2                 4
      7   12       11                        1                 1
      5   10        9                        1                 1
      3    9        7                        2                 4
      Sum of residuals = 0 • $SS_{residual} = \sum (Y - \hat{Y})^2 = 28$
      $s_{Y \cdot X} = \sqrt{\frac{SS_{residual}}{df}} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{28}{8 - 2}} = \sqrt{\frac{28}{6}} = \sqrt{4.67} = 2.16$
  • 28. Now let’s do it using the easier formula • We know SSX = 36, SSY = 64, and SP = 36 because we calculated them a few slides back:
      Scores      Deviations            Products                 Squared deviations
      X    Y      X − MX    Y − MY      (X − MX)(Y − MY)         (X − MX)²    (Y − MY)²
      2    3       −2        −5          10                        4            25
      6   11        2         3           6                        4             9
      0    6       −4        −2           8                       16             4
      4    6        0        −2           0                        0             4
      7   12        3         4          12                        9            16
      5    7        1        −1          −1                        1             1
      5   10        1         2           2                        1             4
      3    9       −1         1          −1                        1             1
      ∑X = 32, MX = 4 • ∑Y = 64, MY = 8 • SP = 36 • SSX = 36 • SSY = 64
  • 29. Using those figures, we can compute: $r = \frac{SP}{\sqrt{SS_X SS_Y}} = \frac{36}{\sqrt{36(64)}} = \frac{36}{\sqrt{2304}} = \frac{36}{48} = 0.75$ • With $SS_Y = 64$ and a correlation of 0.75, the predicted variability from the regression equation is: $SS_{regression} = r^2 SS_Y = 0.75^2 (64) = 0.5625(64) = 36$ • And the unpredicted variability is: $SS_{residual} = (1 - r^2) SS_Y = (1 - 0.75^2)64 = (1 - 0.5625)64 = (0.4375)64 = 28$ • This is the same value we found working with our table!
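
The shortcut route from SS_X, SS_Y and SP to the correlation, the partition of variability and the standard error can be sketched as follows. The three input values are the ones computed for Example 16.1; variable names are illustrative.

```python
import math

# Summary values from Example 16.1.
ss_x, ss_y, sp, n = 36.0, 64.0, 36.0, 8

r = sp / math.sqrt(ss_x * ss_y)        # Pearson correlation
r_squared = r ** 2

ss_regression = r_squared * ss_y       # predicted variability
ss_residual = (1 - r_squared) * ss_y   # unpredicted variability

# Standard error of estimate without computing individual residuals.
standard_error = math.sqrt(ss_residual / (n - 2))

print(r, ss_regression, ss_residual, round(standard_error, 2))
```

This reproduces r = 0.75, SS_regression = 36 and SS_residual = 28, and the two parts add back up to SS_Y = 64.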
  • 30. CHAPTER 16.2 Analysis of Regression: Testing the Significance of the Regression Equation
  • 31. Analysis of Regression • Uses an F-ratio to determine whether the variance predicted by the regression equation is significantly greater than would be expected if there were no relationship between X and Y. $F = \frac{\text{variance in Y predicted by the regression equation}}{\text{unpredicted variance in the Y scores}} = \frac{\text{systematic changes in Y resulting from changes in X}}{\text{changes in Y that are independent of changes in X}}$
  • 32. Significance testing • $H_0$: The regression equation does not account for a significant proportion of variance in the Y scores • $H_1$: The equation does account for a significant proportion of variance in the Y scores • $MS_{regression} = \frac{SS_{regression}}{df_{regression}}$, with $df_{regression} = 1$ • $MS_{residual} = \frac{SS_{residual}}{df_{residual}}$, with $df_{residual} = n - 2$ • $F = \frac{MS_{regression}}{MS_{residual}}$ • Find and evaluate the critical F-value the same as for ANOVA (df = # of predictors, n − 2)
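
Applying the F-ratio to the single-predictor example gives a concrete test. The sketch below uses SS_regression = 36, SS_residual = 28 and n = 8 from Example 16.1. The critical value 5.99 for df = (1, 6) at α = .05 is taken from a standard F table and is hard-coded here as an assumption rather than computed.

```python
# Values from Example 16.1.
ss_regression, ss_residual, n = 36.0, 28.0, 8

df_regression = 1          # one predictor
df_residual = n - 2

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Assumed table value: critical F for df = (1, 6) at alpha = .05.
F_CRITICAL = 5.99
reject_h0 = f_ratio > F_CRITICAL

print(round(f_ratio, 2), reject_h0)
```

Here F is 36 divided by about 4.67, roughly 7.71, which exceeds the tabled critical value, so under these assumptions the regression equation predicts a significant portion of the variance in Y.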
  • 33. Coming up next… • Wednesday lab • Lab #9: Using SPSS for correlation and regression • HW #9 is due at the beginning of class • Read the second half of Chapter 16 (pp.572-581)
  • 34. CHAPTER 16.3 Introduction to Multiple Regression with Two Predictor Variables
  • 35. Multiple Regression with Two Predictor Variables • 40% of the variance in Academic Performance can be predicted by IQ scores • 30% of the variance in academic performance can be predicted from SAT scores • IQ and SAT also overlap: SAT contributes only an additional 10% beyond what is already predicted by IQ Predicting the variance in academic performance from IQ and SAT scores
  • 36. Multiple Regression When you have more than one predictor variable • Considering the two-predictor model: $\hat{Y} = b_1 X_1 + b_2 X_2 + a$ • For standardized scores: $\hat{z}_Y = \beta_1 z_{X_1} + \beta_2 z_{X_2}$
  • 37. Calculations for two-predictor regression coefficients: • $b_1 = \frac{(SP_{X_1 Y})(SS_{X_2}) - (SP_{X_1 X_2})(SP_{X_2 Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1 X_2})^2}$ • $b_2 = \frac{(SP_{X_2 Y})(SS_{X_1}) - (SP_{X_1 X_2})(SP_{X_1 Y})}{(SS_{X_1})(SS_{X_2}) - (SP_{X_1 X_2})^2}$ • $a = M_Y - b_1 M_{X_1} - b_2 M_{X_2}$ Where: • $SS_{X_1}$ = sum of squared deviations for X1 • $SS_{X_2}$ = sum of squared deviations for X2 • $SP_{X_1 Y}$ = sum of products of deviations for X1 and Y • $SP_{X_2 Y}$ = sum of products of deviations for X2 and Y • $SP_{X_1 X_2}$ = sum of products of deviations for X1 and X2
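
The two-predictor formulas can be checked on data where the answer is known in advance. The scores below are hypothetical, invented for this sketch (they are not from the slides): Y is built exactly as 2·X1 + 3·X2 + 1, so the formulas should recover b1 = 2, b2 = 3 and a = 1. Function and variable names are illustrative.

```python
# Hypothetical data, constructed so that Y = 2*X1 + 3*X2 + 1 exactly.
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [2 * x1 + 3 * x2 + 1 for x1, x2 in zip(X1, X2)]

def mean(v):
    return sum(v) / len(v)

def sp(u, v):
    """Sum of products of deviations; sp(v, v) is the sum of squares of v."""
    m_u, m_v = mean(u), mean(v)
    return sum((a - m_u) * (b - m_v) for a, b in zip(u, v))

ss_x1, ss_x2 = sp(X1, X1), sp(X2, X2)
sp_x1y, sp_x2y, sp_x1x2 = sp(X1, Y), sp(X2, Y), sp(X1, X2)

# Shared denominator: (SS_X1)(SS_X2) - (SP_X1X2)^2.
denominator = ss_x1 * ss_x2 - sp_x1x2 ** 2

b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denominator
b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denominator
a = mean(Y) - b1 * mean(X1) - b2 * mean(X2)

print(round(b1, 6), round(b2, 6), round(a, 6))
```

The denominator is zero when X1 and X2 are perfectly correlated, in which case the two slopes cannot be separated; the hypothetical predictors above were chosen to avoid that.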
  • 38. R² Percentage of variance accounted for by a multiple-regression equation: $R^2 = \frac{SS_{regression}}{SS_Y} = \frac{b_1 SP_{X_1 Y} + b_2 SP_{X_2 Y}}{SS_Y}$ • Proportion of unpredicted variability: $(1 - R^2) = \frac{SS_{residual}}{SS_Y}$
  • 39. Standard error of the estimate: $s_{Y \cdot X_1 X_2} = \sqrt{MS_{residual}}$, where $MS_{residual} = \frac{SS_{residual}}{df_{residual}}$ and $df_{residual} = n - 3$ • Significance testing (2 predictors): $MS_{regression} = \frac{SS_{regression}}{2}$, $MS_{residual} = \frac{SS_{residual}}{n - 3}$, $F = \frac{MS_{regression}}{MS_{residual}}$ with $df = (2, df_{residual})$ ** With 3+ predictors, df regression = # predictors
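
Putting the two-predictor pieces together, the sketch below computes R², the standard error of estimate and the F-ratio. The data are hypothetical and invented purely for illustration (they are not from the slides or the textbook), and all names are illustrative.

```python
import math

# Hypothetical data, invented for illustration only.
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [10, 7, 19, 19, 28, 29]
n = len(Y)

def mean(v):
    return sum(v) / len(v)

def sp(u, v):
    """Sum of products of deviations; sp(v, v) is the sum of squares of v."""
    m_u, m_v = mean(u), mean(v)
    return sum((a - m_u) * (b - m_v) for a, b in zip(u, v))

ss_x1, ss_x2, ss_y = sp(X1, X1), sp(X2, X2), sp(Y, Y)
sp_x1y, sp_x2y, sp_x1x2 = sp(X1, Y), sp(X2, Y), sp(X1, X2)

denominator = ss_x1 * ss_x2 - sp_x1x2 ** 2
b1 = (sp_x1y * ss_x2 - sp_x1x2 * sp_x2y) / denominator
b2 = (sp_x2y * ss_x1 - sp_x1x2 * sp_x1y) / denominator

# R^2 and the partition of SS_Y.
ss_regression = b1 * sp_x1y + b2 * sp_x2y
r_squared = ss_regression / ss_y
ss_residual = (1 - r_squared) * ss_y

# Two predictors: df_regression = 2, df_residual = n - 3.
df_regression, df_residual = 2, n - 3
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

standard_error = math.sqrt(ms_residual)
f_ratio = ms_regression / ms_residual

print(round(r_squared, 4), round(standard_error, 4), round(f_ratio, 2))
```

The resulting F would be compared against a tabled critical value with df = (2, n − 3), in the same way as for the single-predictor test.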
  • 40. Evaluating the Contribution of Each Predictor Variable • With a multiple regression, we can evaluate the contribution of each predictor variable • Does variable X1 make a significant contribution beyond what is already predicted by variable X2? • Does variable X2 make a significant contribution beyond what is already predicted by variable X1? • This is useful if we want to control for a third variable and any confounding effects