Slide-1
SIMPLE LINEAR
REGRESSION
Slide-2
Learning Objectives
• How to use regression analysis to predict the value of a dependent variable based on an independent variable
• The meaning of the regression coefficients b0 and b1
• How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated
• To make inferences about the slope and correlation coefficient
• To estimate mean values and predict individual values
Slide-3
Correlation vs. Regression
• A scatter diagram can be used to show the relationship between two variables
• Correlation analysis is used to measure the strength of the association (linear relationship) between two variables
  ◦ Correlation is only concerned with the strength of the relationship
  ◦ No causal effect is implied with correlation
Slide-4
Introduction to Regression Analysis
• Regression analysis is used to:
  ◦ Predict the value of a dependent variable based on the value of at least one independent variable
  ◦ Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to explain the dependent variable
Slide-5
Simple Linear Regression Model
• Only one independent variable, X
• Relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
Slide-6
Types of Relationships
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
Slide-7
Types of Relationships (continued)
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]
Slide-8
Types of Relationships (continued)
[Figure: scatter plots of Y against X showing no relationship]
Department of Statistics, ITS Surabaya Slide-9
Simple Linear Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
where:
Y_i = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
X_i = independent variable
ε_i = random error term
Linear component: β0 + β1 X_i
Random error component: ε_i
Slide-10
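Simple Linear Regression Model (continued)
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
[Figure: the population regression line plotted as Y against X, with Intercept = β0 and Slope = β1. At a given Xi, the observed value of Y for Xi differs from the predicted value of Y for Xi by εi, the random error for this Xi value.]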
Slide-11
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line
$$\hat{Y}_i = b_0 + b_1 X_i$$
where:
Ŷ_i = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
X_i = value of X for observation i
The individual random error terms e_i have a mean of zero
Slide-12
Simple Linear Regression Example
• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)
• A random sample of 10 houses is selected
  ◦ Dependent variable (Y) = house price in $1000s
  ◦ Independent variable (X) = square feet
Slide-13
Sample Data for House Price Model
House Price in $1000s (Y)  Square Feet (X)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Slide-14
Graphical Presentation
• House price model: scatter plot
[Figure: scatter plot of House Price ($1000s), axis from 0 to 450, against Square Feet, axis from 0 to 3000]
Slide-15
Graphical Presentation
• House price model: scatter plot and regression line
[Figure: scatter plot of House Price ($1000s), axis from 0 to 450, against Square Feet, axis from 0 to 3000, with the fitted regression line]
$$\widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet})$$
Slope = 0.10977
Intercept = 98.248
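The slope and intercept quoted on this slide can be reproduced from the ten sample houses with the usual least-squares formulas, b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1·X̄. The following is a minimal sketch using only the Python standard library; the function and variable names are illustrative and do not come from the slides.

```python
# Sample data from the house price table (price in $1000s, size in square feet).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_simple_regression(x, y):
    """Return (b0, b1) of the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_simple_regression(square_feet, price)
print(f"intercept b0 = {b0:.5f}, slope b1 = {b1:.5f}")
```

Rounded to five decimals, the result should agree with the intercept and slope shown on the slide.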
Slide-16
Interpretation of the Intercept, b0
$$\widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet})$$
• b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)
  ◦ Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
Slide-17
Interpretation of the Slope Coefficient, b1
$$\widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet})$$
• b1 measures the estimated change in the average value of Y as a result of a one-unit change in X
  ◦ Here, b1 = 0.10977 tells us that the estimated mean house price increases by 0.10977 × $1000 = $109.77 for each additional square foot of size
Slide-18
Predictions using Regression Analysis
Predict the price for a house with 2000 square feet:
$$\widehat{\text{house price}} = 98.25 + 0.1098\,(\text{sq.ft.}) = 98.25 + 0.1098(2000) = 317.85$$
The predicted price for a house with 2000 square feet is 317.85 ($1000s) = $317,850
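The prediction on this slide is a single evaluation of the fitted line. A small sketch, using the rounded coefficients 98.25 and 0.1098 from the slide as defaults (the function name `predict_price` is hypothetical):

```python
def predict_price(sq_ft, b0=98.25, b1=0.1098):
    """Predicted house price in $1000s for a given size in square feet."""
    return b0 + b1 * sq_ft


predicted = predict_price(2000)
print(f"{predicted:.2f} ($1000s) = ${predicted * 1000:,.0f}")
```

The sample sizes range from 1100 to 2450 square feet, so 2000 lies inside the observed range; the sketch does nothing to guard against sizes outside it.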
Slide-19
Measures of Variation
• Total variation is made up of two parts:
$$SST = SSR + SSE$$
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)
$$SST = \sum (Y_i - \bar{Y})^2 \qquad SSR = \sum (\hat{Y}_i - \bar{Y})^2 \qquad SSE = \sum (Y_i - \hat{Y}_i)^2$$
where:
Ȳ = average value of the dependent variable
Y_i = observed values of the dependent variable
Ŷ_i = predicted value of Y for the given X_i value
Slide-20
Measures of Variation (continued)
• SST = total sum of squares
  ◦ Measures the variation of the Yi values around their mean Ȳ
• SSR = regression sum of squares
  ◦ Explained variation attributable to the relationship between X and Y
• SSE = error sum of squares
  ◦ Variation attributable to factors other than the relationship between X and Y
Slide-21
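Measures of Variation (continued)
[Figure: regression line of Y against X with the horizontal line Ȳ. At a given Xi, the distance from the observed Yi to Ȳ contributes to SST = Σ(Yi − Ȳ)², the distance from Yi to the fitted value Ŷi contributes to SSE = Σ(Yi − Ŷi)², and the distance from Ŷi to Ȳ contributes to SSR = Σ(Ŷi − Ȳ)².]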
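The three sums of squares can be computed directly for the house price sample, which also confirms that SST = SSR + SSE. This is a sketch under the assumption that the line is fitted by ordinary least squares; variable names are illustrative.

```python
# Sample data from the house price table (price in $1000s, size in square feet).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)               # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))   # unexplained variation
r_squared = ssr / sst                                    # share of variation explained

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}, R^2 = {r_squared:.5f}")
```

The values should match the SS column and the R Square entry of the Excel output for this example.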
Multiple Regression
Multiple Regression
• In general the regression estimates are more reliable if:
i) n is large (large dataset)
ii) the sample variance of the explanatory variable is high
iii) the variance of the error term is small
iv) the explanatory variables are less closely related to each other
Multiple Regression
• The constant and parameters are derived in the same way as with the bi-variate model. It involves minimising the sum of the squared error terms. The equation for the slope parameters (α) contains an expression for the covariance between the explanatory variables.
• When a new variable is added it affects the coefficients of the existing variables
Regression
• In the estimated equation for this example, a unit rise in x produces 0.4 of a unit rise in y, with z held constant.
• Interpretation of the t-statistics remains the same, i.e. (0.4 − 0)/0.4 = 1 (critical value is 2.02), so we fail to reject the null and x is not significant.
• The R-squared statistic indicates 30% of the variance of y is explained
• The DW statistic indicates we are not sure if there is autocorrelation, as the DW statistic lies in the zone of indecision (dL = 1.43, dU = 1.62)
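The two checks described in these bullets, the t-ratio for x and the position of the Durbin-Watson statistic relative to its bounds, can be sketched as follows. The DW value of 1.56 is the one reported with this example's estimated equation; the function names and the wording of the returned labels are illustrative assumptions, and the five-zone classification is the standard textbook reading of the DW bounds rather than something spelled out on the slide.

```python
T_CRITICAL = 2.02  # critical value quoted on the slide


def t_ratio(estimate, std_error, hypothesised=0.0):
    """t-statistic for H0: coefficient equals the hypothesised value."""
    return (estimate - hypothesised) / std_error


def durbin_watson_zone(dw, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the lower and upper bounds."""
    if dw < d_lower:
        return "positive autocorrelation"
    if dw <= d_upper:
        return "zone of indecision"
    if dw < 4 - d_upper:
        return "no evidence of autocorrelation"
    if dw <= 4 - d_lower:
        return "zone of indecision"
    return "negative autocorrelation"


t_x = t_ratio(0.4, 0.4)  # coefficient on x and its standard error
x_is_significant = abs(t_x) > T_CRITICAL
zone = durbin_watson_zone(1.56, d_lower=1.43, d_upper=1.62)
print(f"t = {t_x:.2f}, significant: {x_is_significant}, DW: {zone}")
```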
$$\hat{y}_t = \underset{(0.1)}{0.6} + \underset{(0.4)}{0.4}\,x_t + \underset{(0.3)}{0.9}\,z_t$$
$$R^2 = 0.3, \qquad DW = 1.56$$
(45 observations, standard errors in brackets)
Adjusted R-squared Statistic
• This statistic is used in a multiple regression analysis, because it does not automatically rise when an extra explanatory variable is added.
• Its value depends on the number of explanatory variables
• It is usually written as $\bar{R}^2$ (R-bar squared)
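The slides do not give the formula, so the sketch below assumes the standard definition, adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of explanatory variables. The check uses R Square = 0.58082 with 10 observations and one explanatory variable, taken from the house price Excel output in these slides; the function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_explanatory):
    """Adjusted R-squared: penalises R-squared for each explanatory variable."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_explanatory - 1)


adj = adjusted_r_squared(0.58082, n_obs=10, n_explanatory=1)
print(f"adjusted R-squared = {adj:.5f}")

# Holding R-squared fixed, adding a second explanatory variable lowers the value.
adj_two_vars = adjusted_r_squared(0.58082, n_obs=10, n_explanatory=2)
```

The first value should agree with the Adjusted R Square entry of the Excel output.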
ANOVA, or Analysis of Variance
• It is a statistical method used to compare the means of two or more groups to determine if there are any significant differences between them.
• It is commonly used in research studies to analyze the effects of different variables on a particular outcome
Slide-27
The F-test
• The F-test is an analysis of the variance of a regression
• It can be used to test for the significance of a group of variables or for a restriction
• Its statistic follows a different distribution from that of the t-test, but it can likewise be used to test at different levels of significance
• When determining the F-statistic we need to collect either the residual sum of squares (RSS) or the R-squared statistic
• The formula for the F-test of a group of variables can be expressed in terms of either the residual sum of squares (RSS) or explained sum of squares (ESS)
F-test of explanatory power
• This is the F-test for the goodness of fit of a regression and in effect tests for the joint significance of the explanatory variables.
• It is based on the R-squared statistic
• It is routinely produced by most computer software packages
• It follows the F-distribution, which is quite different from the t-distribution
F-test formula
• The formula for the F-test of the goodness of fit is:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k}$$
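A short sketch of this formula follows. It assumes that k counts all estimated parameters including the constant and that n is the number of observations; the slide does not define k explicitly, and the later F Test for Significance slides instead let k count only the independent variables. The check uses R Square = 0.58082 and n = 10 from the house price Excel output, with k = 2 (constant plus one slope). The function name is hypothetical.

```python
def f_from_r_squared(r_squared, n_obs, n_params):
    """Goodness-of-fit F-statistic; n_params includes the constant."""
    numerator = r_squared / (n_params - 1)
    denominator = (1 - r_squared) / (n_obs - n_params)
    return numerator / denominator


f_stat = f_from_r_squared(0.58082, n_obs=10, n_params=2)
print(f"F = {f_stat:.4f} with {2 - 1} and {10 - 2} degrees of freedom")
```

Up to rounding in R Square, this should reproduce the F value in the ANOVA table for the house price example.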
F-distribution
• To find the critical value of the F-distribution, in general you need to know the number of parameters and the degrees of freedom
• The number of parameters is then read across the top of the table, the degrees of freedom from the side. Where these two values intersect, we find the critical value.
F-statistic
• When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables jointly equal 0.
• If our F-statistic is below the critical value we fail to reject the null and therefore we say the goodness of fit is not significant.
Slide-33
Inference about the Slope: t Test (continued)
House Price in $1000s (y)  Square Feet (x)
245 1400
312 1600
279 1700
308 1875
199 1100
219 1550
405 2350
324 2450
319 1425
255 1700
Simple Linear Regression Equation:
$$\widehat{\text{house price}} = 98.25 + 0.1098\,(\text{sq.ft.})$$
The slope of this model is 0.1098
Does square footage of the house affect its sales price?
Slide-34
Inference about the Slope: t Test
• t test for a population slope
  ◦ Is there a linear relationship between X and Y?
• Null and alternative hypotheses
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic
$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$
where:
b1 = regression slope coefficient
β1 = hypothesized slope
S_{b1} = standard error of the slope
Slide-35
Inferences about the Slope: t Test Example
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
(In the Square Feet row, Coefficients is b1, Standard Error is S_{b1} and t Stat is t.)
$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$$
Slide-36
Inferences about the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
Test Statistic: t = 3.329
d.f. = 10 − 2 = 8, α/2 = .025, critical values ±t_{α/2} = ±2.3060
[Figure: t distribution centred at 0 with rejection regions below −2.3060 and above 2.3060 (α/2 = .025 in each tail); the test statistic 3.329 falls in the upper rejection region]
Decision: Reject H0
Conclusion: There is sufficient evidence that square footage affects house price
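The standard error of the slope and the t statistic in the Excel output can be recomputed from the raw sample. The sketch assumes the standard textbook formulas S_YX = sqrt(SSE/(n − 2)) and S_b1 = S_YX / sqrt(Σ(Xi − X̄)²), which are not written out on the slides; the critical value 2.3060 is taken from the slide rather than computed.

```python
import math

# Sample data from the house price table (price in $1000s, size in square feet).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(price)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate, then standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))
s_yx = math.sqrt(sse / (n - 2))
s_b1 = s_yx / math.sqrt(s_xx)

# Test H0: beta1 = 0 against H1: beta1 != 0 at the 5% level with n - 2 = 8 d.f.
t_stat = (b1 - 0.0) / s_b1
T_CRITICAL = 2.3060  # two-tail critical value quoted on the slide
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"S_b1 = {s_b1:.5f}, t = {t_stat:.5f}, reject H0: {reject_h0}")
```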
Slide-37
Inferences about the Slope: t Test Example (continued)
H0: β1 = 0
H1: β1 ≠ 0
From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039
P-value = 0.01039
This is a two-tail test, so the p-value is P(t > 3.329) + P(t < −3.329) = 0.01039 (for 8 d.f.)
Decision: P-value < α so reject H0
Conclusion: There is sufficient evidence that square footage affects house price
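The two-tail p-value can be approximated without a statistics package by integrating the Student t density numerically. This is only a sketch: it applies Simpson's rule to the density on [0, |t|] and uses the symmetry of the distribution, and the function names are hypothetical. In practice a library routine would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tail_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * central_area


p_value = two_tail_p_value(3.32938, df=8)
print(f"two-tail p-value = {p_value:.5f}")
```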
Slide-38
F Test for Significance
• F Test statistic:
$$F = \frac{MSR}{MSE}$$
where
$$MSR = \frac{SSR}{k}, \qquad MSE = \frac{SSE}{n - k - 1}$$
where F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom
(k = the number of independent variables in the regression model)
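As a sketch of the computation, the function below forms MSR and MSE from the sums of squares and returns their ratio. The example call uses the SSR and SSE values from the house price Excel output (n = 10, k = 1); the function name is illustrative.

```python
def f_statistic(ssr, sse, n_obs, k):
    """F = MSR / MSE, with k the number of independent variables."""
    msr = ssr / k                 # mean square due to regression
    mse = sse / (n_obs - k - 1)   # mean square error
    return msr / mse


f_stat = f_statistic(ssr=18934.9348, sse=13665.5652, n_obs=10, k=1)
print(f"F = {f_stat:.4f} with 1 and 8 degrees of freedom")
```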
Slide-39
Excel Output
Regression Statistics
Multiple R          0.76211
R Square            0.58082
Adjusted R Square   0.52842
Standard Error      41.33032
Observations        10

ANOVA
             df   SS           MS           F         Significance F
Regression   1    18934.9348   18934.9348   11.0848   0.01039
Residual     8    13665.5652   1708.1957
Total        9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

$$F = \frac{MSR}{MSE} = \frac{18934.9348}{1708.1957} = 11.0848$$
With 1 and 8 degrees of freedom. Significance F (0.01039) is the P-value for the F Test.
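Two entries in this output can be cross-checked from the others. The Lower 95% and Upper 95% bounds for the slope follow from b1 ± t_{α/2}·S_b1, here using the critical value 2.3060 for 8 degrees of freedom quoted in the t test slides. With a single independent variable, the F statistic also equals the square of the slope's t statistic. The interval formula is the standard one and is assumed rather than stated in the slides.

```python
# Values read from the Excel output for the Square Feet coefficient.
b1 = 0.10977
s_b1 = 0.03297
t_stat = 3.32938
T_CRITICAL = 2.3060  # t value for alpha/2 = .025 with 8 d.f., quoted in the slides

# 95% confidence interval for the population slope.
margin = T_CRITICAL * s_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")

# In simple linear regression the F statistic is the squared t statistic.
f_from_t = t_stat ** 2
print(f"t squared = {f_from_t:.4f}")
```

The interval excludes zero, which is consistent with rejecting H0: β1 = 0 at the 5% level.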
Slide-40
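F Test for Significance (continued)
H0: β1 = 0
H1: β1 ≠ 0
α = .05
df1 = 1, df2 = 8
Critical Value: F_α = 5.32
Test Statistic:
$$F = \frac{MSR}{MSE} = 11.08$$
[Figure: F distribution starting at 0, with the rejection region (α = .05) to the right of F.05 = 5.32 and the "do not reject H0" region to its left; the test statistic 11.08 falls in the rejection region]
Decision: Reject H0 at α = 0.05
Conclusion: There is sufficient evidence that house size affects selling price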
Linear Regression, Multiple Regression and ANOVA

  • 2. Slide-2 Learning Objectives  How to use regression analysis to predict the value of a dependent variable based on an independent variable  The meaning of the regression coefficients b0 and b1  How to evaluate the assumptions of regression analysis and know what to do if the assumptions are violated  To make inferences about the slope and correlation coefficient  To estimate mean values and predict individual values
  • 3. Slide-3 Correlation vs. Regression  A scatter diagram can be used to show the relationship between two variables  Correlation analysis is used to measure strength of the association (linear relationship) between two variables  Correlation is only concerned with strength of the relationship  No causal effect is implied with correlation
  • 4. Slide-4 Introduction to Regression Analysis  Regression analysis is used to:  Predict the value of a dependent variable based on the value of at least one independent variable  Explain the impact of changes in an independent variable on the dependent variable Dependent variable: the variable we wish to predict or explain Independent variable: the variable used to explain the dependent variable
  • 5. Slide-5 Simple Linear Regression Model  Only one independent variable, X  Relationship between X and Y is described by a linear function  Changes in Y are assumed to be caused by changes in X
  • 6. Slide-6 Types of Relationships Y X Y X Y Y X X Linear relationships Curvilinear relationships
  • 7. Slide-7 Types of Relationships Y X Y X Y Y X X Strong relationships Weak relationships (continued)
  • 8. Slide-8 Types of Relationships Y X Y X No relationship (continued)
  • 9. Department of Statistics, ITS Surabaya Slide-9 i i 1 0 i ε X β β Y    Linear component Simple Linear Regression Model Population Y intercept Population Slope Coefficient Random Error term Dependent Variable Independent Variable Random Error component
  • 10. Slide-10 (continued) Random Error for this Xi value Y X Observed Value of Y for Xi Predicted Value of Y for Xi i i 1 0 i ε X β β Y    Xi Slope = β1 Intercept = β0 εi Simple Linear Regression Model
  • 11. Slide-11 i 1 0 i X b b Ŷ   The simple linear regression equation provides an estimate of the population regression line Simple Linear Regression Equation (Prediction Line) Estimate of the regression intercept Estimate of the regression slope Estimated (or predicted) Y value for observation i Value of X for observation i The individual random error terms ei have a mean of zero
  • 12. Slide-12 Simple Linear Regression Example  A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet)  A random sample of 10 houses is selected  Dependent variable (Y) = house price in $1000s  Independent variable (X) = square feet
  • 13. Slide-13 Sample Data for House Price Model
      House Price in $1000s (Y)   Square Feet (X)
      245                         1400
      312                         1600
      279                         1700
      308                         1875
      199                         1100
      219                         1550
      405                         2350
      324                         2450
      319                         1425
      255                         1700
  • 14. Slide-14 Graphical Presentation  House price model: scatter plot [Figure: House Price ($1000s), axis 0 to 450, plotted against Square Feet, axis 0 to 3000]
  • 15. Slide-15 Graphical Presentation  House price model: scatter plot and regression line: house price = 98.24833 + 0.10977 (square feet); Slope = 0.10977, Intercept = 98.248 [Figure: House Price ($1000s), axis 0 to 450, plotted against Square Feet, axis 0 to 3000, with the fitted line]
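The slope and intercept shown on the regression line can be reproduced from the Slide-13 data. The following is a minimal sketch using the usual least-squares formulas; the function name `fit_line` and the variable names are illustrative and do not come from the slides.

```python
# Least-squares fit of the house price example (data from Slide-13).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s


def fit_line(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # estimated slope
    b0 = y_bar - b1 * x_bar   # estimated intercept
    return b0, b1


b0, b1 = fit_line(sqft, price)
print(round(b0, 5), round(b1, 5))
```

Rounded to five decimals, the result should agree with the intercept and slope quoted on the slide.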
  • 16. Slide-16 Interpretation of the Intercept, b0 house price = 98.24833 + 0.10977 (square feet)  b0 is the estimated average value of Y when the value of X is zero (if X = 0 is in the range of observed X values)  Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet
  • 17. Slide-17 Interpretation of the Slope Coefficient, b1 house price = 98.24833 + 0.10977 (square feet)  b1 measures the estimated change in the average value of Y as a result of a one-unit change in X  Here, b1 = 0.10977 tells us that the estimated average house price increases by 0.10977($1000) = $109.77 for each additional square foot of size
  • 18. Slide-18 Predictions using Regression Analysis Predict the price for a house with 2000 square feet: house price = 98.25 + 0.1098 (sq.ft.) = 98.25 + 0.1098(2000) = 317.85 The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850
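The prediction is a single evaluation of the fitted line. A small sketch, assuming a hypothetical helper `predict_price`, shows that the slide's figure comes from the rounded coefficients (98.25 and 0.1098); the five-decimal coefficients from the earlier slides give a slightly lower value.

```python
def predict_price(square_feet, b0, b1):
    """Predicted house price in $1000s from the fitted line b0 + b1 * X."""
    return b0 + b1 * square_feet


# Rounded coefficients, as used on this slide.
rounded = predict_price(2000, 98.25, 0.1098)
# Five-decimal coefficients, as reported on the regression-line slide.
five_dp = predict_price(2000, 98.24833, 0.10977)

print(round(rounded, 2), round(five_dp, 2))
```

The gap of roughly $60 between the two results is purely a rounding effect, which is worth keeping in mind when comparing hand calculations with software output.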
  • 19. Slide-19 Measures of Variation  Total variation is made up of two parts: $SST = SSR + SSE$ (Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares), with $SST = \sum (Y_i - \bar{Y})^2$, $SSR = \sum (\hat{Y}_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$, where: $\bar{Y}$ = average value of the dependent variable, $Y_i$ = observed values of the dependent variable, $\hat{Y}_i$ = predicted value of Y for the given $X_i$ value
  • 20. Slide-20 Measures of Variation (continued)  SST = total sum of squares  Measures the variation of the $Y_i$ values around their mean $\bar{Y}$  SSR = regression sum of squares  Explained variation attributable to the relationship between X and Y  SSE = error sum of squares  Variation attributable to factors other than the relationship between X and Y
  • 21. Slide-21 Measures of Variation (continued) [Figure: fitted line in the X-Y plane; at $X_i$ the deviations are marked with $SST = \sum (Y_i - \bar{Y})^2$, $SSE = \sum (Y_i - \hat{Y}_i)^2$ and $SSR = \sum (\hat{Y}_i - \bar{Y})^2$]
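The decomposition SST = SSR + SSE can be checked numerically on the house price data. This is a sketch under the definitions given on the Measures of Variation slides; the variable names are illustrative.

```python
# Sums of squares for the house price example (data from Slide-13).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n

# Least-squares slope and intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))
      / sum((x - x_bar) ** 2 for x in sqft))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in sqft]

sst = sum((y - y_bar) ** 2 for y in price)                 # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)                # explained variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))     # unexplained variation
r_squared = ssr / sst

print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r_squared, 5))
```

The three sums should match the SS column of the Excel ANOVA output later in the deck, and the ratio SSR/SST should match its R Square entry.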
  • 23. Multiple Regression  In general the regression estimates are more reliable if: i) n is large (large dataset) ii) the sample variance of the explanatory variable is high iii) the variance of the error term is small iv) the explanatory variables are less closely related to one another
  • 24. Multiple Regression  The constant and parameters are derived in the same way as with the bi-variate model. It involves minimising the sum of the squared error terms. The equation for the slope parameters (α) contains an expression for the covariance between the explanatory variables.  When a new variable is added it affects the coefficients of the existing variables
  • 25. Regression  In the estimated equation below, a unit rise in x produces 0.4 of a unit rise in y, with z held constant.  Interpretation of the t-statistics remains the same, i.e. (0.4 - 0)/0.4 = 1 (critical value is 2.02), so we fail to reject the null and x is not significant.  The R-squared statistic indicates 30% of the variance of y is explained  The DW statistic indicates we are not sure if there is autocorrelation, as the DW statistic lies in the zone of indecision (Dl = 1.43, Du = 1.62) Estimated equation (45 observations, standard errors in brackets): $\hat{y}_t = 0.6 + 0.4 x_t + 0.9 z_t$, with standard errors (0.1), (0.4) and (0.3) respectively; $R^2 = 0.3$, $DW = 1.56$
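The t-ratio on this slide is each coefficient divided by its standard error, compared with the quoted critical value of 2.02. A short sketch of that check for all three estimates follows; the labels `constant`, `x` and `z` are illustrative names for the estimates 0.6, 0.4 and 0.9 with standard errors 0.1, 0.4 and 0.3.

```python
# (coefficient, standard error) pairs read from the estimated equation.
estimates = {"constant": (0.6, 0.1), "x": (0.4, 0.4), "z": (0.9, 0.3)}
critical_value = 2.02  # two-tail 5% critical value quoted on the slide

t_ratios = {}
significant = {}
for name, (coef, std_err) in estimates.items():
    # Test of H0: parameter = 0, so the hypothesised value is 0.
    t_ratios[name] = (coef - 0) / std_err
    significant[name] = abs(t_ratios[name]) > critical_value

for name in estimates:
    print(name, round(t_ratios[name], 2), significant[name])
```

Only x fails to exceed the critical value, which is the slide's conclusion that x is not significant.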
  • 26. Adjusted R-squared Statistic  This statistic is used in a multiple regression analysis, because it does not automatically rise when an extra explanatory variable is added.  Its value depends on the number of explanatory variables  It is usually written as $\bar{R}^2$ (R-bar squared)
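The slide does not give the formula. A commonly used definition is $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, where p is the number of explanatory variables excluding the constant; the symbol p and the function name below are assumptions for this sketch, not notation from the slides.

```python
def adjusted_r_squared(r_squared, n_obs, n_explanatory):
    """Adjusted R-squared; n_explanatory excludes the constant term."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_explanatory - 1)


# Two-variable example from the multiple regression slide: R^2 = 0.3, n = 45.
two_var = adjusted_r_squared(0.3, 45, 2)
# House price example: R Square = 0.58082, n = 10, one explanatory variable.
house = adjusted_r_squared(0.58082, 10, 1)

print(round(two_var, 5), round(house, 5))
```

The house price result can be compared with the Adjusted R Square entry in the Excel output shown later in the deck.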
  • 27. Slide-27 ANOVA, or Analysis of Variance  It is a statistical method used to compare the means of two or more groups to determine if there are any significant differences between them.  It is commonly used in research studies to analyze the effects of different variables on a particular outcome
  • 28. The F-test  The F-test is an analysis of the variance of a regression  It can be used to test for the significance of a group of variables or for a restriction  It has a different distribution to the t-test, but can be used to test at different levels of significance  When determining the F-statistic we need to collect either the residual sum of squares (RSS) or the R-squared statistic  The formula for the F-test of a group of variables can be expressed in terms of either the residual sum of squares (RSS) or explained sum of squares (ESS)
  • 29. F-test of explanatory power  This is the F-test for the goodness of fit of a regression and in effect tests for the joint significance of the explanatory variables.  It is based on the R-squared statistic  It is routinely produced by most computer software packages  It follows the F-distribution, which is quite different to the t-test
  • 30. F-test formula  The formula for the F-test of the goodness of fit is: $F = \dfrac{R^2/(k-1)}{(1-R^2)/(n-k)}$, which follows the $F_{k-1,\,n-k}$ distribution
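As a worked instance, the goodness-of-fit F statistic can be computed directly from R-squared. The sketch below assumes the form F = [R²/(k-1)] / [(1-R²)/(n-k)], with k taken to be the number of estimated parameters including the constant; the inputs are the figures from the earlier two-variable example (R² = 0.3, 45 observations, three parameters).

```python
def f_from_r_squared(r_squared, n_obs, n_params):
    """Goodness-of-fit F statistic; n_params counts the constant as well."""
    numerator = r_squared / (n_params - 1)
    denominator = (1 - r_squared) / (n_obs - n_params)
    return numerator / denominator


f_stat = f_from_r_squared(0.3, 45, 3)
df_numerator, df_denominator = 3 - 1, 45 - 3

print(round(f_stat, 2), df_numerator, df_denominator)
```

The statistic is then compared with the tabulated critical value for 2 and 42 degrees of freedom at the chosen significance level.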
  • 31. F-distribution  To find the critical value of the F-distribution, in general you need to know the number of parameters and the degrees of freedom  The number of parameters is then read across the top of the table, the degrees of freedom from the side. Where these two values intersect, we find the critical value.
  • 32. F-statistic  When testing for the significance of the goodness of fit, our null hypothesis is that the coefficients on the explanatory variables jointly equal 0.  If our F-statistic is below the critical value we fail to reject the null and therefore we say the goodness of fit is not significant.
  • 33. Slide-33 Inference about the Slope: t Test (continued) Simple Linear Regression Equation: house price = 98.25 + 0.1098 (sq.ft.) The slope of this model is 0.1098 Does square footage of the house affect its sales price?
      House Price in $1000s (y)   Square Feet (x)
      245                         1400
      312                         1600
      279                         1700
      308                         1875
      199                         1100
      219                         1550
      405                         2350
      324                         2450
      319                         1425
      255                         1700
  • 34. Slide-34 Inference about the Slope: t Test  t test for a population slope  Is there a linear relationship between X and Y?  Null and alternative hypotheses H0: β1 = 0 (no linear relationship) H1: β1 ≠ 0 (linear relationship does exist)  Test statistic: $t = \dfrac{b_1 - \beta_1}{S_{b_1}}$ with d.f. $= n - 2$, where: $b_1$ = regression slope coefficient, $\beta_1$ = hypothesized slope, $S_{b_1}$ = standard error of the slope
  • 35. Slide-35 Inferences about the Slope: t Test Example H0: β1 = 0 H1: β1 ≠ 0 From Excel output (the Square Feet row gives $b_1$, $S_{b_1}$ and t):
                   Coefficients   Standard Error   t Stat    P-value
      Intercept    98.24833       58.03348         1.69296   0.12892
      Square Feet  0.10977        0.03297          3.32938   0.01039
      $t = \dfrac{b_1 - \beta_1}{S_{b_1}} = \dfrac{0.10977 - 0}{0.03297} = 3.32938$
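The standard error and t statistic in the Excel output can be recomputed from the raw data. This sketch uses the standard formula for the standard error of the slope in simple regression, S_b1 = sqrt(MSE / Σ(X - X̄)²) with MSE = SSE/(n - 2); that formula is not stated on the slides and is included here as an assumption of the example.

```python
import math

# House price data from the sample-data slide.
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(price) / n
sxx = sum((x - x_bar) ** 2 for x in sqft)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, price))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual variance with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(sqft, price))
mse = sse / (n - 2)

s_b1 = math.sqrt(mse / sxx)   # standard error of the slope
t_stat = (b1 - 0) / s_b1      # test of H0: beta1 = 0

print(round(s_b1, 5), round(t_stat, 5))
```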
  • 36. Slide-36 Inferences about the Slope: t Test Example (continued) H0: β1 = 0 H1: β1 ≠ 0 Test Statistic: t = 3.329, d.f. = 10 - 2 = 8 From Excel output (the Square Feet row gives $b_1$, $S_{b_1}$ and t):
                   Coefficients   Standard Error   t Stat    P-value
      Intercept    98.24833       58.03348         1.69296   0.12892
      Square Feet  0.10977        0.03297          3.32938   0.01039
      Decision: Reject H0 Conclusion: There is sufficient evidence that square footage affects house price [Figure: t distribution with α/2 = .025 in each tail; rejection regions below -tα/2 = -2.3060 and above tα/2 = 2.3060, do-not-reject region between them; the test statistic 3.329 falls in the upper rejection region]
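The same critical value, 2.3060 for 8 degrees of freedom, also yields a 95% confidence interval for the slope as b1 ± t × S_b1. The sketch below uses the rounded values quoted on the slide; the variable names are illustrative.

```python
b1 = 0.10977        # estimated slope (Square Feet coefficient)
s_b1 = 0.03297      # standard error of the slope
t_critical = 2.3060 # two-tail 5% critical value, 8 d.f., from the slide

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin

# The interval excludes 0 exactly when |t| exceeds the critical value.
excludes_zero = lower > 0 or upper < 0

print(round(lower, 5), round(upper, 5), excludes_zero)
```

Because the interval lies entirely above zero, it leads to the same decision as the t test: reject H0 at the 5% level.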
  • 37. Slide-37 Inferences about the Slope: t Test Example (continued) H0: β1 = 0 H1: β1 ≠ 0 P-value = 0.01039 From Excel output:
                   Coefficients   Standard Error   t Stat    P-value
      Intercept    98.24833       58.03348         1.69296   0.12892
      Square Feet  0.10977        0.03297          3.32938   0.01039
      This is a two-tail test, so the p-value is P(t > 3.329) + P(t < -3.329) = 0.01039 (for 8 d.f.) Decision: P-value < α so Reject H0 Conclusion: There is sufficient evidence that square footage affects house price
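The two-tail p-value can be approximated without statistical tables by integrating the Student t density numerically. This is a rough sketch using Simpson's rule and the standard t density formula; the function names and the step count are choices made for this example, and a statistics package would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tail_p(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|]; steps must be even."""
    b = abs(t_stat)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t_stat|)
    return 1 - 2 * area           # both tails, by symmetry


p_value = two_tail_p(3.32938, 8)
print(round(p_value, 4))
```

The result can be compared with the P-value column of the Excel output for the Square Feet row.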
  • 38. Slide-38 F Test for Significance  F Test statistic: $F = \dfrac{MSR}{MSE}$, where $MSR = \dfrac{SSR}{k}$ and $MSE = \dfrac{SSE}{n - k - 1}$, and where F follows an F distribution with k numerator and (n - k - 1) denominator degrees of freedom (k = the number of independent variables in the regression model)
  • 39. Slide-39 Excel Output
      Regression Statistics
      Multiple R           0.76211
      R Square             0.58082
      Adjusted R Square    0.52842
      Standard Error       41.33032
      Observations         10
      ANOVA
                   df   SS           MS           F         Significance F
      Regression   1    18934.9348   18934.9348   11.0848   0.01039
      Residual     8    13665.5652   1708.1957
      Total        9    32600.5000
                   Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
      Intercept    98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
      Square Feet  0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
      $F = \dfrac{MSR}{MSE} = \dfrac{18934.9348}{1708.1957} = 11.0848$ With 1 and 8 degrees of freedom; Significance F (0.01039) is the P-value for the F Test
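The F statistic follows directly from the ANOVA sums of squares. The sketch below recomputes it from the SS and df columns and checks the known simple-regression identity F = t², using the t Stat for Square Feet; the variable names are illustrative.

```python
# Values taken from the ANOVA table in the Excel output.
ssr, df_regression = 18934.9348, 1
sse, df_residual = 13665.5652, 8

msr = ssr / df_regression   # mean square due to regression
mse = sse / df_residual     # mean square error
f_stat = msr / mse

# With one independent variable, F equals the square of the slope t statistic.
t_stat = 3.32938
f_critical = 5.32           # F(.05; 1, 8) as quoted on the F-test slide

print(round(f_stat, 4), round(t_stat ** 2, 4), f_stat > f_critical)
```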
  • 40. Slide-40 F Test for Significance (continued) H0: β1 = 0 H1: β1 ≠ 0 α = .05 df1 = 1 df2 = 8 Critical Value: Fα = 5.32 Test Statistic: F = MSR/MSE = 11.08 Decision: Reject H0 at α = 0.05 Conclusion: There is sufficient evidence that house size affects selling price [Figure: F distribution with α = .05 in the upper tail; do-not-reject region below F.05 = 5.32 and rejection region above it; the test statistic 11.08 falls in the rejection region]