Regression Basics
Predicting a DV with a Single IV
Questions
• What are predictors and criteria?
• Write an equation for the linear regression. Describe each term.
• How do changes in the slope and intercept affect (move) the regression line?
• What does it mean to test the significance of the regression sum of squares? R-square?
• What is R-square?
• What does it mean to choose a regression line to satisfy the loss function of least squares?
• How do we find the slope and intercept for the regression line with a single independent variable? (Either formula for the slope is acceptable.)
• Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?
Basic Ideas
• Jargon
– IV = X = Predictor (pl. predictors)
– DV = Y = Criterion (pl. criteria)
– Regression of Y on X e.g., GPA on SAT
• Linear Model = relations between IV
and DV represented by straight line.
• A score on Y has 2 parts – (1) linear
function of X and (2) error.
Y_i = α + βX_i + ε_i   (population values)
Basic Ideas (2)
• Sample value: Y_i = a + bX_i + e_i
• Intercept – place where X=0
• Slope – change in Y if X changes 1 unit. Rise over run.
• If error is removed, we have a predicted value for each person at X (the line): Y′ = a + bX
Suppose on average houses are worth about $75.00 a
square foot. Then the equation relating price to size
would be Y’=0+75X. The predicted price for a 2000
square foot house would be $150,000.
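These slides contain no code, but the arithmetic can be checked with a short illustrative Python sketch of the house-price equation:

```python
# The slide's equation relating price to size: Y' = 0 + 75X,
# where X is square feet and Y' is predicted price in dollars.
a, b = 0, 75  # intercept 0, slope $75 per square foot

def predicted_price(square_feet):
    return a + b * square_feet

print(predicted_price(2000))  # 150000, matching the slide
```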
Linear Transformation
• 1 to 1 mapping of variables via line
• Permissible operations are addition and
multiplication (interval data)
[Figure: two panels plotting Y against X.
Changing the Y Intercept: the lines Y=5+2X, Y=10+2X, and Y=15+2X (add a constant).
Changing the Slope: the lines Y=5+.5X, Y=5+X, and Y=5+2X (multiply by a constant).]
Y′ = a + bX
Linear Transformation (2)
• Centigrade to Fahrenheit
• Note 1 to 1 map
• Intercept?
• Slope?
[Figure: Degrees F plotted against Degrees C; the line passes through (0 degrees C, 32 degrees F) and (100 degrees C, 212 degrees F).]
Intercept is 32. When X (Cent) is 0, Y (Fahr) is 32.
Slope is 1.8. When Cent goes from 0 to 100 (run), Fahr goes from 32 to 212 (rise), and 212 − 32 = 180. Rise over run, 180/100 = 1.8, is the slope. So Y = 32 + 1.8X, i.e., F = 32 + 1.8C.
Y′ = a + bX
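As a quick numeric check of the intercept and slope just derived (an illustrative Python sketch; the slides themselves contain no code):

```python
# Fahrenheit as a linear transformation of Centigrade: F = 32 + 1.8C.
def c_to_f(c):
    return 32 + 1.8 * c

print(c_to_f(0))    # the intercept: F when C = 0
print(c_to_f(100))  # rise of 180 over a run of 100, so slope 1.8
```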
Review
• What are predictors and criteria?
• Write an equation for the linear regression with 1 IV. Describe each term.
• How do changes in the slope and intercept affect (move) the regression line?
Regression of Weight on
Height
Ht Wt
61 105
62 120
63 120
65 160
65 120
68 145
69 175
70 160
72 185
75 210
N=10 N=10
M=67 M=150
SD=4.57 SD=33.99
[Figure: scatterplot of Weight against Height in Inches, with the fitted regression line and a rise-over-run triangle marked. Line: Y = -316.86 + 6.97X.]
Correlation (r) = .94.
Regression equation: Y′ = -316.86 + 6.97X
Y′ = a + bX
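The slope and intercept on this slide can be recomputed from the raw data. A minimal Python sketch (illustrative; not part of the original slides):

```python
# Regression of weight on height, refit from the data table on this slide.
# Slope: b = sum((X - Mx)(Y - My)) / sum((X - Mx)^2); intercept: a = My - b*Mx.
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n  # 67.0, 150.0

b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

print(round(b, 2), round(a, 2))  # 6.97 -316.86, matching Y' = -316.86 + 6.97X
```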
Illustration of the Linear
Model. This concept is vital!
[Figure: scatterplot of Weight against Height with the regression line. For the point (65,120), the deviation from the mean of Y is split at Y′ into a Linear Part and an Error Part (e); the means of X and Y and the deviations from each are marked.]
Y_i = α + βX_i + ε_i
Y_i = a + bX_i + e_i
Consider Y as a deviation from the mean. Part of that deviation can be associated with X (the linear part) and part cannot (the error).
Y′ = a + bX
e_i = Y_i − Y_i′
Predicted Values & Residuals
N Ht Wt Y' Resid
1 61 105 108.19 -3.19
2 62 120 115.16 4.84
3 63 120 122.13 -2.13
4 65 160 136.06 23.94
5 65 120 136.06 -16.06
6 68 145 156.97 -11.97
7 69 175 163.94 11.06
8 70 160 170.91 -10.91
9 72 185 184.84 0.16
10 75 210 205.75 4.25
M 67 150 150.00 0.00
SD 4.57 33.99 31.85 11.89
V 20.89 1155.56 1014.37 141.32
[Figure: the same Weight-on-Height scatterplot, with the linear part and error marked for the point (65,120).]
Numbers for the linear part and the error. Note the mean of Y′ and of the residuals. Note the variance of Y is V(Y′) + V(res).
Y′ = a + bX
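The two "Note" claims on this slide (residuals average zero; the variance of Y splits into V(Y′) + V(res)) can be verified numerically. An illustrative Python sketch, refitting the coefficients from the raw data:

```python
# Verify: mean residual is zero, and V(Y) = V(Y') + V(residuals).
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

pred = [a + b * x for x in ht]              # Y'
resid = [y - p for y, p in zip(wt, pred)]   # Y - Y'

def var(v):  # divide by N-1, consistent with the SDs shown on the slides
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

print(abs(sum(resid)) < 1e-9)   # True: residuals sum to zero
print(round(var(wt), 2))        # 1155.56
print(abs(var(wt) - (var(pred) + var(resid))) < 1e-6)  # True: partition holds
```

(The slide's table values differ in the second decimal place because it rounds Y′ per row; the identity itself is exact.)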
Finding the Regression Line
Need to know the correlation, SDs and means of X and Y.
The correlation is the slope when both X and Y are
expressed as z scores. To translate to raw scores, just bring
back original SDs for both.
r_XY = Σ(z_X z_Y) / N

Slope: b = r_XY (SD_Y / SD_X)   (rise over run)

To find the intercept, use: a = Ȳ − b X̄

Suppose r = .50, SD_X = .5, M_X = 10, SD_Y = 2, M_Y = 5.

Slope: b = .50 (2 / .5) = 2.   Intercept: a = 5 − 2(10) = −15.   Equation: Y′ = −15 + 2X.
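The worked example above reduces to two lines of arithmetic; a small illustrative Python sketch:

```python
# The slide's example, from summary statistics alone:
# r = .50, SD_X = .5, M_X = 10, SD_Y = 2, M_Y = 5.
r, sd_x, m_x, sd_y, m_y = 0.50, 0.5, 10, 2, 5

b = r * sd_y / sd_x  # slope: the z-score slope (r) rescaled to raw units
a = m_y - b * m_x    # intercept

print(b, a)  # 2.0 -15.0, so Y' = -15 + 2X
```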
Line of Least Squares
[Figure: the Weight-on-Height scatterplot with a candidate regression line through the points.]
We have some points. Assume a linear relation is reasonable, so the two variables can be represented by a line. Where should the line go?
Place the line so the errors (residuals) are small. The line we calculate has a sum of errors equal to 0. It has a sum of squared errors that is as small as possible; the line provides the smallest sum of squared errors, or least squares.
Least Squares (2)
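The least-squares property can be checked numerically: nudging the fitted line in any direction only makes the squared errors larger. An illustrative Python sketch using the height/weight data:

```python
# Least squares: the fitted (a, b) minimizes the sum of squared errors (SSE).
ht = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
wt = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]
n = len(ht)
mx, my = sum(ht) / n, sum(wt) / n
b = sum((x - mx) * (y - my) for x, y in zip(ht, wt)) / sum((x - mx) ** 2 for x in ht)
a = my - b * mx

def sse(a_, b_):
    return sum((y - (a_ + b_ * x)) ** 2 for x, y in zip(ht, wt))

best = sse(a, b)
# Perturb the intercept or slope either way: SSE can only increase.
for da, db in [(1, 0), (-1, 0), (0, 0.1), (0, -0.1)]:
    print(sse(a + da, b + db) > best)  # True each time
```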
Review
• What does it mean to choose a regression line
to satisfy the loss function of least squares?
• What are predicted values and residuals?
Suppose r = .25, SDX = 1, MX = 10, SDY = 2, MY = 5.
What is the regression equation (line)?
Partitioning the Sum of
Squares
Definitions:
Y = a + bX + e        Y′ = a + bX
Y = Y′ + e            e = Y − Y′

Y − Ȳ = (Y′ − Ȳ) + (Y − Y′) = y, deviation from mean

Sum of squares: Σ(Y − Ȳ)² = Σ[(Y′ − Ȳ) + (Y − Y′)]²

Σy² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)²   (cross products drop out)

Sum of squared deviations from the mean = Sum of squares due to regression (reg) + Sum of squared residuals (error)

Analog: SS_tot = SS_B + SS_W
Partitioning SS (2)
SSY=SSReg + SSRes Total SS is regression SS plus
residual SS. Can also get
proportions of each. Can get
variance by dividing SS by N if you
want. Proportion of total SS due to
regression = proportion of total
variance due to regression = R2
(R-square).
SS_Y / SS_Y = SS_Reg / SS_Y + SS_Res / SS_Y

1 = R² + (1 − R²)
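The proportions in this identity can be computed from the sums of squares given on the following slides; a short illustrative Python sketch:

```python
# R-square as the proportion of the total SS due to regression.
ss_y, ss_reg, ss_res = 10400, 9129.31, 1271.91

r2 = ss_reg / ss_y
print(round(r2, 2))             # 0.88, the proportion due to regression
print(round(ss_res / ss_y, 2))  # 0.12, the residual proportion
print(round(r2 + ss_res / ss_y, 2))  # 1.0 (within rounding): 1 = R^2 + (1 - R^2)
```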
Partitioning SS (3)
Wt (Y), M=150   (Y-Ȳ)²    Y'        Y'-Ȳ     (Y'-Ȳ)²    Resid (Y-Y')   Resid²
105             2025      108.19    -41.81   1748.076   -3.19          10.1761
120             900       115.16    -34.84   1213.826   4.84           23.4256
120             900       122.13    -27.87   776.7369   -2.13          4.5369
160             100       136.06    -13.94   194.3236   23.94          573.1236
120             900       136.06    -13.94   194.3236   -16.06         257.9236
145             25        156.97    6.97     48.5809    -11.97         143.2809
175             625       163.94    13.94    194.3236   11.06          122.3236
160             100       170.91    20.91    437.2281   -10.91         119.0281
185             1225      184.84    34.84    1213.826   0.16           0.0256
210             3600      205.75    55.75    3108.063   4.25           18.0625
Sum = 1500      10400     1500.01   0.01     9129.307   -0.01          1271.907
Variance        1155.56                      1014.37                   141.32
Partitioning SS (4)
Total Regress Residual
SS 10400 9129.31 1271.91
Variance 1155.56 1014.37 141.32
Proportion of SS:        10400/10400 = 9129.31/10400 + 1271.91/10400  ⇒  1 = .88 + .12

Proportion of Variance:  1155.56/1155.56 = 1014.37/1155.56 + 141.32/1155.56  ⇒  1 = .88 + .12

R² = .88

Note Y′ is a linear function of X, so r_Y′X = 1 and r_YY′ = r_XY = .94.

r²_YY′ = R² = .88;  r_YE = .35, r²_YE = .12;  r_Y′E = 0.
Significance Testing
Testing for the SS due to regression = testing for the variance due to regression = testing the significance of R². All are the same.

H0: R²_population = 0

F = (SS_reg / df_reg) / (SS_res / df_res) = (SS_reg / k) / (SS_res / (N − k − 1))

k = number of IVs (here it's 1) and N is the sample size (# people). F with k and (N − k − 1) df.

F = (SS_reg / df_reg) / (SS_res / df_res) = (9129.31 / 1) / (1271.91 / (10 − 1 − 1)) = 57.42

Equivalent test using R-square instead of SS:

F = (R² / k) / ((1 − R²) / (N − k − 1))

F = (.88 / 1) / ((1 − .88) / (10 − 1 − 1)) = 58.67

Results will be the same within rounding error.
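Both F ratios above can be reproduced directly from the slide's numbers; an illustrative Python sketch:

```python
# The two equivalent F tests; they agree up to rounding in the inputs.
n, k = 10, 1
ss_reg, ss_res = 9129.31, 1271.91
r2 = 0.88

f_from_ss = (ss_reg / k) / (ss_res / (n - k - 1))
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(round(f_from_ss, 2))  # 57.42
print(round(f_from_r2, 2))  # 58.67
```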
Review
• What does it mean to test the significance of the regression sum of squares? R-square?
• What is R-square?
• Why does testing for the regression sum of squares turn out to have the same result as testing for R-square?