Chapter 14 Part I

    ISDS 2001
     Matt Levy
Introduction
Regression is the term used to describe the technique of
modeling and analyzing the relationship between two or more variables.

The focus is on a dependent variable, and one or more
independent variables.

Simple Linear Regression means 1 independent variable.

Regression and other statistical modeling techniques give us
the power to infer or predict future outcomes.

An understanding of regression, and of the techniques used to
validate your models, will provide you with sound methodology
to do just that.
Simple Linear Regression
As previously mentioned, simple linear regression means we
have 1 dependent variable (y), and 1 independent variable (x).

In order to make a prediction about y using x, we need sample
data (from both x and y) in order to generate some additional
terms, namely the parameters (β0 and β1), and an error term
(ε).

The parameters β0 (intercept) and β1 (slope) can be thought of as
capturing the explained variability: the part of y accounted for by x.

The error term (ε) accounts for unexplained variability.

Thus, the simple linear regression model is: y = β0 + β1x + ε
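
To make the roles of β0, β1, and ε concrete, here is a minimal Python sketch that generates data from the model above. The function name simulate_slr, the parameter values, and the noise level sigma are assumptions chosen for illustration; none of them come from the slides.

```python
import random

def simulate_slr(beta0, beta1, xs, sigma, seed=0):
    """Generate y = beta0 + beta1*x + eps, with eps drawn from Normal(0, sigma)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

xs = [2, 6, 8, 8, 12]

# Hypothetical population values: beta0 = 60, beta1 = 5, sigma = 10.
ys = simulate_slr(60.0, 5.0, xs, sigma=10.0)

# With sigma = 0 there is no unexplained variability,
# so every y falls exactly on the line beta0 + beta1*x.
ys_no_noise = simulate_slr(60.0, 5.0, xs, sigma=0.0)
```

Setting sigma to zero removes the error term entirely, which is a quick way to see that ε is the only source of scatter around the line.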
Estimating the Regression Equation
If we were fortunate enough to know the population parameters, we could
use the equation on the previous slide to compute the mean of y for a given x.

Unfortunately for us, we must use sample data to estimate these
parameters, and consequently we use different symbols to denote our
estimated parameters:
   ŷ = b0 + b1x

Note that we place a hat over y (pronounced y-hat) and use
English (Latin) lettering to denote our estimated parameters.

We now have an equation that graphs a "regression line"
   ŷ is the point estimator of E(y), the mean.
   b0 is the y-intercept
   b1 is the slope
The Estimation Process for Simple Linear
Regression
So how do we estimate b0 and b1?

To do this we use a method known as least squares.

In simple linear regression, finding b0 and b1 is relatively straightforward.

Equations 14.6 and 14.7 in your book show the procedure for b0 and
b1, respectively.

Once b0 and b1 are obtained, the estimated simple linear regression equation
will resemble the following:
   ŷ = 60 + 5x

It is important to note that you will have a ŷi for every yi in the sample data-set.

It is up to you to determine if the difference between them is small enough to deem
the equation an accurate predictor.
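
As a rough illustration of the least squares step, the sketch below uses the standard closed-form solution, b1 = ∑(xi - x̄)(yi - y̅) / ∑(xi - x̄)² and b0 = y̅ - b1·x̄, which is assumed here to match the textbook equations cited above. The function name least_squares and the ten-observation sample are illustrative assumptions, not material from the slides; the sample happens to produce the example line ŷ = 60 + 5x.

```python
def least_squares(xs, ys):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1 = least_squares(xs, ys)
y_hat = [b0 + b1 * x for x in xs]   # one fitted value per observation
print(b0, b1)                       # 60.0 5.0
```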
Coefficient of Determination
The Coefficient of Determination (r²) provides us one measure to judge how well our
regression equation (for example: ŷ = 60 + 5x) fits the actual data.

Let's take some time to build r² and learn some important terms along the way:

◆ Remember that we have an estimated dependent variable (ŷi ) and an actual dependent
variable (yi) for each observation.

◆ (yi - ŷi) is known as the ith residual.

◆ When we take (yi - ŷi), square it, and sum the squares, we get the Sum of Squares of the
Error Terms (SSE), hence SSE = ∑(yi - ŷi)².

◆ When we take (yi - y̅), square it, and sum the squares, we get the Total Sum of Squares
(SST), hence SST = ∑(yi - y̅)².

◆ Lastly, when we take (ŷi - y̅), square it, and sum the squares, we get a measure of how
much the estimated values on the regression line deviate from the actual mean.

◆ This is known as the Sum of Squares of the Regression Line: SSR = ∑(ŷi - y̅)²
Coefficient of Determination (con't)
The relationship between SSR, SST, and SSE is one of the most important
facts to know in statistics.

SST = SSR + SSE

Now, if (yi - ŷi) = 0 for every observation i, then SSE = 0, SST = SSR, and we have a perfect
fit of the data. This is almost never the case with real data.

On the flip side, if SSR = 0 (so that SST = SSE), we have the worst possible fit because
everything is in the error term, or the unexplained portion of the equation.

Hence, to measure goodness of fit, we look at the ratio of SSR to SST.

r² = SSR/SST

This yields a value between 0 and 1.

r² can be interpreted as the % of the total sum of squares (SST) that can be
explained by using your estimated regression equation.
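
The three sums of squares and r² can be computed directly from their definitions. The following is a minimal sketch on an illustrative sample (an assumption, not data from the slides) whose least-squares fit is ŷ = 60 + 5x; the variable names are likewise hypothetical.

```python
# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                  # least-squares estimates for this sample

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]

sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained variation
sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation

r2 = ssr / sst
print(sse, ssr, sst)    # 1530.0 14200.0 15730.0
print(round(r2, 4))     # 0.9027
```

For this sample about 90% of the total sum of squares is explained by the estimated regression equation, and SSR + SSE reproduces SST.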
Correlation Coefficient
The correlation coefficient, denoted rxy, is a measure of the strength of the linear association between the
independent (x) and dependent (y) variables.

rxy = (sign of b1) √(r²)

rxy always yields a value between -1 and +1, inclusive.

A value of +1 indicates a perfect positive linear relationship.

A value of -1 indicates a perfect negative linear relationship.

A value of zero indicates no linear relationship.

In practice, this is used much less, as it only provides an accurate
measurement in the case of linear relationships.

r² can be used to measure goodness-of-fit in linear and nonlinear relationships.
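
As a quick check of the sign-and-square-root rule, the sketch below computes rxy from r² and compares it with the usual direct formula for the sample correlation, ∑(xi - x̄)(yi - y̅) / √(∑(xi - x̄)² · ∑(yi - y̅)²). The sample and variable names are assumptions for illustration only.

```python
import math

# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)      # this is SST

b1 = sxy / sxx                               # least-squares slope
r2 = sxy ** 2 / (sxx * syy)                  # equals SSR/SST in simple linear regression

r_xy = math.copysign(math.sqrt(r2), b1)      # (sign of b1) * sqrt(r^2)
r_direct = sxy / math.sqrt(sxx * syy)        # direct sample correlation formula
print(round(r_xy, 4), round(r_direct, 4))
```

Both routes give the same number, which is why the slide can define rxy in terms of r² and the sign of the slope.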
Estimating the Regression Equation
In this model, y can be thought of as having a distribution for
each given value of x.

As we have learned in the past, a distribution has a mean or
expected value.

Thus the regression equation for the mean is as follows:
  E(y) = β0 + β1x

Notice that to obtain the mean, we simply drop the error term (ε), which
accounts for unexplained variance and has an expected value of zero.
Model Assumptions
It is important to understand that r2 is not enough to ensure we have an
appropriate regression equation.

There are numerous other tests and measures we must use.

All of these tests are based on assumptions about the error term (ε):
1. E(ε) = 0.
Implication: E(y) = β0 + β1x

2. The variance of ε, denoted by σ², is the same for all values of x.
Implication: The variance of y equals σ² and is the same for all values of x.

3. The values of ε are independent (uncorrelated).
Implication: The value of y for any x is not related to the value of y for any other x.

4. ε is a normally distributed random variable.
Implication: Because y is a linear function of ε, y is also normally distributed.

Table 14.14 in the text provides a complete explanation.
Testing for Significance
In Simple Linear Regression, the mean or expected value of y is a linear
function of x (E(y) = β0 + β1x )

If the value of β1 = 0, then E(y) = β0 + 0x = β0.

Hence, in this case we can conclude x and y are not linearly related.

In the next couple of slides we offer a few tests: the t-test, an evaluation of the
confidence interval for β1, and the F-test.

Each of these tests is based on the following hypotheses:

  H0: β1 = 0
  Ha: β1 ≠ 0

This starts to tell us more about the appropriateness of our model.
Estimating σ²
As a precursor to running our tests, we need an estimate of σ².

Recall one of our key assumptions: the variance of ε also represents the
variance of y.

Also recall that the deviations of y about the regression line are called residuals.

Hence we can call upon the SSE to calculate the Mean Square Error (MSE) as
an estimate of σ², which we will denote as s².

s² = MSE = SSE/(n-2), where n is the sample size and (n-2) is the
error degrees of freedom (two parameters, b0 and b1, are estimated from the data).

Consequently, the standard error of the estimate is s = √MSE.
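
A minimal sketch of this calculation, using an illustrative sample (an assumption, not data from the slides) whose least-squares fit is ŷ = 60 + 5x:

```python
import math

# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                   # least-squares estimates for this sample

n = len(ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in residuals)

mse = sse / (n - 2)                  # s^2, the estimate of sigma^2
s = math.sqrt(mse)                   # standard error of the estimate
print(mse, round(s, 3))              # 191.25 13.829
```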
t Test
Remember we are testing the following: H0: β1 = 0; Ha: β1 ≠ 0

To do this we need information about the distribution of b1 (see figure 14.17);
specifically, we need the estimated standard deviation of b1, denoted sb1 (see figure 14.18).

Once we have sb1 we can find the test statistic t: t = b1/sb1.

And using the t-table and our well-known rejection rules:

Reject H0 if p-value ≤ α (p-value approach), or
reject H0 if t ≤ -tα/2 or t ≥ tα/2 (critical value approach),

where tα/2 is based on a t-distribution with n-2 degrees of freedom.
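
The test can be sketched in a few lines of Python. This assumes the standard formula sb1 = s / √∑(xi - x̄)² for the estimated standard deviation of b1, an illustrative sample that is not from the slides, and α = 0.05; the critical value 2.306 is read from a t-table for 8 degrees of freedom rather than computed.

```python
import math

# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                   # least-squares estimates for this sample

n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))         # standard error of the estimate

s_b1 = s / math.sqrt(sxx)            # estimated standard deviation of b1
t_stat = b1 / s_b1

T_CRIT = 2.306                       # t-table value for alpha/2 = 0.025, df = 8
reject_h0 = abs(t_stat) >= T_CRIT
print(round(s_b1, 4), round(t_stat, 2), reject_h0)
```

Here the statistic is far beyond the critical value, so H0: β1 = 0 would be rejected at the 0.05 level for this sample.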
Confidence Interval for β1
As an alternative to the t-test, we can check the confidence interval for β1.

We are essentially checking to see if the interval for β1 contains 0.

The form of the confidence interval is as follows:

b1 ± tα/2 · sb1

If this interval contains zero at the designated significance level, we cannot
reject the null hypothesis (H0).
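
A short sketch of the interval check, again on an illustrative sample (an assumption, not data from the slides) with a 95% confidence level; the value 2.306 is taken from a t-table for 8 degrees of freedom, and sb1 is assumed to be s / √∑(xi - x̄)².

```python
import math

# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                   # least-squares estimates for this sample

n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

T_CRIT = 2.306                       # t-table value for alpha/2 = 0.025, df = 8
margin = T_CRIT * s_b1
lower, upper = b1 - margin, b1 + margin

contains_zero = lower <= 0.0 <= upper
print(round(lower, 2), round(upper, 2), contains_zero)
```

Because zero lies outside the interval for this sample, the conclusion agrees with the t-test: H0 is rejected.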
F-Test
Based on the F probability distribution (hence, using our F-table).

In simple linear regression this does the same thing as the t-test.

With more than one independent variable (multiple regression), only the F-test
can be used to test for overall significance.

To arrive at the F-test statistic, we need the Mean Square due to Regression
(MSR).

MSR = SSR / (Number of Independent Variables)

F = MSR/MSE (Just like when we first learned ANOVA)

And using the F-table and our well-known rejection rules:

Reject H0 if p-value ≤ α (p-value approach), or
reject H0 if F ≥ Fα (critical value approach),

where Fα is based on an F-distribution with 1 degree of freedom (for SLR) in
the numerator and (n-2) degrees of freedom in the denominator.
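
The F statistic can be sketched as follows, assuming an illustrative sample that is not from the slides and α = 0.05. The critical value 5.32 is read from an F-table for 1 and 8 degrees of freedom; the last lines check numerically that F equals t² in simple linear regression, which is why the two tests agree.

```python
# Illustrative sample (an assumption, not data from the slides).
xs = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
ys = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
b0, b1 = 60.0, 5.0                   # least-squares estimates for this sample

n = len(xs)
k = 1                                # number of independent variables (SLR)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
y_hat = [b0 + b1 * x for x in xs]

ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))

msr = ssr / k                        # mean square due to regression
mse = sse / (n - k - 1)              # mean square error, df = n - 2 for SLR
f_stat = msr / mse

F_CRIT = 5.32                        # F-table value for alpha = 0.05, df = (1, 8)
reject_h0 = f_stat >= F_CRIT

# In simple linear regression, F is the square of the t statistic for b1.
sxx = sum((x - x_bar) ** 2 for x in xs)
t_squared = b1 ** 2 * sxx / mse
print(round(f_stat, 2), reject_h0)
```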
Caution about the Interpretation of
Significance Testing
Correlation is not causation!

Just because we reject H0 does not guarantee cause-and-effect;
a causal conclusion must be warranted by theoretical justification.

Furthermore, just because we can Reject H0 does not
mean the relationship between x and y is linear.
