©The McGraw-Hill Companies, Inc. 2008 McGraw-Hill/Irwin
Multiple Linear Regression and
Correlation Analysis
Chapter 14
2
GOALS
1. Describe the relationship between several independent
variables and a dependent variable using multiple regression
analysis.
2. Set up, interpret, and apply an ANOVA table.
3. Compute and interpret the multiple standard error of estimate,
the coefficient of multiple determination, and the adjusted
coefficient of multiple determination.
4. Conduct a test of hypothesis to determine whether regression
coefficients differ from zero.
5. Conduct a test of hypothesis on each of the regression
coefficients.
6. Use residual analysis to evaluate the assumptions of multiple
regression analysis.
7. Evaluate the effects of correlated independent variables.
8. Use and understand qualitative independent variables.
3
Multiple Regression Analysis
The general multiple regression equation with k
independent variables is given by:
Y’ = a + b1X1 + b2X2 + ... + bkXk
The least squares criterion is used to develop
this equation. Because determining b1, b2, etc. is
very tedious, a software package such as Excel
or MINITAB is recommended.
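As a rough illustration of what such a package does internally, the sketch below solves the least squares normal equations in plain Python. The function name `fit_least_squares` and the small data set are made up for this example; they are not from the slides, and real software uses more robust numerical methods.

```python
def fit_least_squares(rows, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared residuals.

    rows: list of observations, each a list of k independent-variable values.
    y:    list of dependent-variable values, same length as rows.
    """
    # Design matrix with a leading 1 for the intercept a.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])

    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * c for x, c in zip(A[r], A[col])]

    return [A[i][p] / A[i][i] for i in range(p)]


# Made-up data generated from Y = 2 + 3*X1 - 1*X2 with no noise, so the
# fitted coefficients should come back as roughly [2, 3, -1].
rows = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]
coefficients = fit_least_squares(rows, y)
print(coefficients)
```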
4
Multiple Regression Analysis
For two independent variables, the general form
of the multiple regression equation is:
Y’ = a + b1X1 + b2X2
• X1 and X2 are the independent variables.
• a is the Y-intercept.
• b1 is the net change in Y for each unit change in X1,
holding X2 constant. It is called a partial regression
coefficient, a net regression coefficient, or just a
regression coefficient.
• b2 is the net change in Y for each unit change in X2,
holding X1 constant.
5
Regression Plane for a 2-Independent
Variable Linear Regression Equation
6
Salsberry Realty sells homes along the east
coast of the United States. One of the
questions most frequently asked by
prospective buyers is: If we purchase this
home, how much can we expect to pay to
heat it during the winter? The research
department at Salsberry has been asked to
develop some guidelines regarding heating
costs for single-family homes.
Three variables are thought to relate to the
heating costs: (1) the mean daily outside
temperature, (2) the number of inches of
insulation in the attic, and (3) the age in
years of the furnace.
To investigate, Salsberry’s research department
selected a random sample of 20 recently
sold homes. It determined the cost to heat
each home last January, as well as the values
of the three variables listed above
(data in next slide).
Multiple Linear Regression - Example
7
Multiple Linear Regression - Example
8
Multiple Linear Regression – Minitab
Example
9
Multiple Linear Regression – Excel
Example
10
The Multiple Regression Equation –
Interpreting the Regression Coefficients
The regression coefficient for mean outside temperature
(X1) is -4.583. The coefficient is negative and shows an
inverse relationship between heating cost and
temperature.
As the outside temperature increases, the cost to heat the
home decreases. The numeric value of the regression
coefficient provides more information. If we increase
temperature by 1 degree and hold the other two
independent variables constant, we can estimate a
decrease of $4.583 in monthly heating cost.
11
The Multiple Regression Equation –
Interpreting the Regression Coefficients
The attic insulation variable (X2) also shows an inverse
relationship: the more insulation in the attic, the less
the cost to heat the home. So the negative sign for this
coefficient is logical. For each additional inch of
insulation, we expect the cost to heat the home to
decline $14.83 per month, holding the outside
temperature and the age of the furnace constant.
12
The Multiple Regression Equation –
Interpreting the Regression Coefficients
The age of the furnace variable (X3) shows a direct
relationship. With an older furnace, the cost to heat the
home increases.
Specifically, for each additional year of furnace age,
we expect the cost to increase $6.10 per month, holding
the other two independent variables constant.
13
Applying the Model for Estimation
What is the estimated heating cost for a home if the
mean outside temperature is 30 degrees, there
are 5 inches of insulation in the attic, and the
furnace is 10 years old?
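The estimate is obtained by substituting the three values into the regression equation. A minimal sketch follows, using the coefficients quoted on the earlier slides (-4.583, -14.83, 6.10). The intercept does not appear in the slide text, so the value used here is an assumption made only to keep the arithmetic concrete; the function name is likewise hypothetical.

```python
def estimate_heating_cost(temp, insulation, furnace_age, intercept):
    """Substitute the three values into Y' = a + b1*X1 + b2*X2 + b3*X3."""
    # Coefficients as quoted on the slides: -4.583 per degree,
    # -14.83 per inch of insulation, +6.10 per year of furnace age.
    return intercept - 4.583 * temp - 14.83 * insulation + 6.10 * furnace_age


# ASSUMPTION: the intercept is not given in the slide text. 427.194 is an
# assumed value; replace it with the intercept from the software output.
ASSUMED_INTERCEPT = 427.194

cost = estimate_heating_cost(temp=30, insulation=5, furnace_age=10,
                             intercept=ASSUMED_INTERCEPT)
print(f"Estimated January heating cost: ${cost:.2f}")
```

With a different intercept the estimate shifts by the same amount; the contribution of each independent variable is unchanged.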
14
Multiple Standard Error of Estimate
The multiple standard error of estimate is a measure of the
effectiveness of the regression equation.
• It is measured in the same units as the dependent
variable.
• It is difficult to determine what is a large value and what
is a small value of the standard error.
• The formula is:
s(Y.12...k) = √[ Σ(Y - Y’)² / (n - (k + 1)) ]
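The multiple standard error of estimate is the square root of the residual sum of squares divided by n - (k + 1). As a quick sketch, the residual sum of squares can be taken from the ANOVA figures that appear on a later slide (SS total = 212,916 and SSR = 171,220), with n = 20 homes and k = 3 independent variables. The function name is made up for this example.

```python
import math


def multiple_standard_error(sse, n, k):
    """Square root of SSE divided by its degrees of freedom, n - (k + 1)."""
    return math.sqrt(sse / (n - (k + 1)))


# SSE = SS total - SSR = 212,916 - 171,220 = 41,696.
s = multiple_standard_error(sse=212_916 - 171_220, n=20, k=3)
print(f"Multiple standard error of estimate: {s:.2f}")
```

The result is in dollars, the same units as heating cost.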
15
16
Multiple Regression and
Correlation Assumptions
• The independent variables and the dependent
variable have a linear relationship. The dependent
variable must be continuous and at least interval-scale.
• The variation in the residuals must be the same for all
values of Y. When this is the case, we say the residuals
exhibit homoscedasticity.
• The residuals should follow the normal distribution
with mean 0.
• Successive values of the dependent variable must
be uncorrelated.
17
The ANOVA Table
The ANOVA table reports the variation in the
dependent variable. The variation is divided
into two components.
• The Explained Variation is that accounted for
by the set of independent variables.
• The Unexplained or Random Variation is not
accounted for by the independent variables.
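The split into explained and unexplained variation can be sketched as a small table builder. The sums of squares used below (SSR = 171,220, SS total = 212,916) are the ones quoted later in the deck, with n = 20 and k = 3; the function name and dictionary layout are choices made for this example only.

```python
def anova_table(ss_regression, ss_total, n, k):
    """Split total variation into explained and unexplained parts."""
    ss_error = ss_total - ss_regression          # unexplained variation
    df_regression, df_error = k, n - (k + 1)
    return {
        "Regression": {"df": df_regression, "SS": ss_regression,
                       "MS": ss_regression / df_regression},
        "Residual error": {"df": df_error, "SS": ss_error,
                           "MS": ss_error / df_error},
        "Total": {"df": n - 1, "SS": ss_total},
    }


table = anova_table(ss_regression=171_220, ss_total=212_916, n=20, k=3)
for source, row in table.items():
    print(source, row)
```

The degrees of freedom add up (3 + 16 = 19), as do the sums of squares.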
18
Minitab – the ANOVA Table
19
Coefficient of Multiple Determination (R²)
Characteristics of the coefficient of multiple determination:
1. It is symbolized by a capital R squared. In other words,
it is written as R² because it behaves like the square of a
correlation coefficient.
2. It can range from 0 to 1. A value near 0 indicates little
association between the set of independent variables and
the dependent variable. A value near 1 means a strong
association.
3. It cannot assume negative values. Any number that is
squared or raised to the second power cannot be
negative.
4. It is easy to interpret. Because R² is a value between 0
and 1, it is easy to interpret, compare, and understand.
20
Coefficient of Multiple Determination (R²)
- Formula
R² = SSR / SS total
21
Minitab – the ANOVA Table
R² = SSR / SS total = 171,220 / 212,916 = 0.804
22
Adjusted Coefficient of Determination
• Each independent variable added to a multiple
regression equation makes the coefficient of
determination larger, whether or not the variable is useful.
• If the number of variables, k, and the sample size, n,
are equal, the coefficient of determination is 1.0.
• To balance the effect that the number of
independent variables has on the coefficient of
multiple determination, statistical software packages
use an adjusted coefficient of multiple determination.
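The adjustment divides each sum of squares by its degrees of freedom before forming the ratio. The sketch below applies that to the sums of squares quoted in the deck (SSR = 171,220, SS total = 212,916, n = 20, k = 3); the function names are made up for this example.

```python
def r_squared(ss_regression, ss_total):
    """Proportion of total variation explained by the regression."""
    return ss_regression / ss_total


def adjusted_r_squared(ss_regression, ss_total, n, k):
    """1 - [SSE / (n - (k + 1))] / [SS total / (n - 1)]."""
    ss_error = ss_total - ss_regression
    return 1 - (ss_error / (n - (k + 1))) / (ss_total / (n - 1))


r2 = r_squared(171_220, 212_916)
adj_r2 = adjusted_r_squared(171_220, 212_916, n=20, k=3)
print(f"R-squared: {r2:.3f}, adjusted R-squared: {adj_r2:.3f}")
```

The adjusted value is smaller than the unadjusted one because it charges the model for each independent variable it uses.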
23
Adjusted Coefficient of Determination -
Example
24
Correlation Matrix
A correlation matrix is used to show all
possible simple correlation
coefficients among the variables.
The matrix is useful for locating
correlated independent variables.
It shows how strongly each
independent variable is correlated
with the dependent variable.
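A correlation matrix can be built by computing the simple (Pearson) correlation for every pair of variables. The sketch below does this in plain Python on a small made-up data set; the numbers are not the Salsberry data and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Simple correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map each pair of variable names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Made-up values for illustration only.
data = {
    "cost": [200, 180, 150, 120, 90],
    "temp": [20, 25, 30, 40, 50],
    "insul": [4, 3, 6, 5, 8],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 3) for other, r in row.items()})
```

Entries between two independent variables are the ones to scan for possible multicollinearity; entries in the dependent variable's row show how strongly each independent variable relates to it.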
25
Global Test: Testing the Multiple
Regression Model
The global test is used to investigate
whether any of the independent
variables have significant coefficients.
The hypotheses are:
H0: β1 = β2 = ... = βk = 0
H1: Not all β’s equal 0
26
Global Test continued
The test statistic follows the F
distribution with k (number of
independent variables) and
n-(k+1) degrees of freedom, where
n is the sample size.
Decision Rule:
Reject H0 if F > Fα,k,n-k-1
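The decision rule can be sketched as a comparison between the computed F (mean square regression over mean square error) and the critical value. The sums of squares are the ones quoted in the deck. The critical value 3.24 is not stated in the slide text; it is the 0.05-level entry for 3 and 16 degrees of freedom in a standard F table and is passed in as an argument here.

```python
def global_f_test(ss_regression, ss_total, n, k, critical_f):
    """Return the computed F and whether H0 (all betas are zero) is rejected."""
    ss_error = ss_total - ss_regression
    f = (ss_regression / k) / (ss_error / (n - (k + 1)))
    return f, f > critical_f


# ASSUMPTION: 3.24 is read from an F table (alpha = 0.05, df = 3 and 16).
f_value, reject_h0 = global_f_test(171_220, 212_916, n=20, k=3,
                                   critical_f=3.24)
print(f"F = {f_value:.2f}, reject H0: {reject_h0}")
```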
27
Finding the Critical F
28
Finding the Computed F
29
Interpretation
• The computed F is 21.90, which is in
the rejection region for H0.
• The null hypothesis that all the
multiple regression
coefficients are zero is
therefore rejected.
• Interpretation: some of the
independent variables
(amount of insulation, etc.) do
have the ability to explain the
variation in the dependent
variable (heating cost).
• Logical question: which
ones?
30
Evaluating Individual Regression
Coefficients (βi = 0)
• This test is used to determine which independent variables
have nonzero regression coefficients.
• The variables that have zero regression coefficients are
usually dropped from the analysis.
• The test statistic follows the t distribution with n-(k+1) degrees of
freedom.
• The hypothesis test is as follows:
H0: βi = 0
H1: βi ≠ 0
Reject H0 if t > tα/2,n-k-1 or t < -tα/2,n-k-1
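For each coefficient, the computed t is the coefficient minus zero, divided by its standard error, and it is compared with the two-tailed critical value (2.120 for 16 degrees of freedom, as shown on the critical-value slide). In the sketch below the coefficients come from the slides, but the standard errors are assumed placeholder values, since the slide text does not include them.

```python
def t_test_coefficient(b, standard_error, critical_t):
    """Return computed t and whether H0: beta_i = 0 is rejected (two-tailed)."""
    t = (b - 0) / standard_error
    return t, abs(t) > critical_t


CRITICAL_T = 2.120  # alpha = 0.05, two-tailed, n - (k + 1) = 16 df

# (coefficient from the slides, ASSUMED standard error for illustration)
coefficients = {
    "temperature": (-4.583, 0.772),
    "insulation": (-14.83, 4.754),
    "furnace age": (6.10, 4.012),
}

results = {name: t_test_coefficient(b, se, CRITICAL_T)
           for name, (b, se) in coefficients.items()}
for name, (t, significant) in results.items():
    print(f"{name}: t = {t:.2f}, reject H0: {significant}")
```

Under these assumed standard errors only furnace age fails to reach significance, which is in line with the deck's next step of refitting the model without the age variable.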
31
Critical t-stat for the Slopes
Critical values of t: -2.120 and 2.120
32
Computed t-stat for the Slopes
33
Conclusion on Significance of Slopes
34
New Regression Model without
Variable “Age” – Minitab
35
New Regression Model without Variable
“Age” – Minitab
36
Testing the New Model for Significance
37
Critical t-stat for the New Slopes
Reject H0 if:
t < -tα/2,n-k-1 or t > tα/2,n-k-1
(bi - 0)/sbi < -t.05/2,20-2-1 or (bi - 0)/sbi > t.05/2,20-2-1
(bi - 0)/sbi < -t.025,17 or (bi - 0)/sbi > t.025,17
(bi - 0)/sbi < -2.110 or (bi - 0)/sbi > 2.110
38
Conclusion on Significance of New
Slopes
39
Evaluating the
Assumptions of Multiple Regression
1. There is a linear relationship. That is, there is a straight-line
relationship between the dependent variable and the set of
independent variables.
2. The variation in the residuals is the same for both large and
small values of the estimated Y. To put it another way, the size of the residual
is unrelated to whether the estimated Y is large or small.
3. The residuals follow the normal probability distribution.
4. The independent variables should not be correlated. That is, we
would like to select a set of independent variables that are not
themselves correlated.
5. The residuals are independent. This means that successive
observations of the dependent variable are not correlated. This
assumption is often violated when time is involved with the sampled
observations.
40
Analysis of Residuals
A residual is the difference between the
actual value of Y and the predicted
value of Y. Residuals should be
approximately normally distributed.
Histograms and stem-and-leaf charts
are useful in checking this requirement.
• A plot of the residuals and their
corresponding Y’ values is used for
showing that there are no trends or
patterns in the residuals.
41
Scatter Diagram
42
Residual Plot
43
Distribution of Residuals
Both MINITAB and Excel offer another graph that helps to evaluate the
assumption of normally distributed residuals. It is called a normal
probability plot and is shown to the right of the histogram.
44
Multicollinearity
• Multicollinearity exists when independent
variables (X’s) are correlated.
• Correlated independent variables make it
difficult to make inferences about the
individual regression coefficients (slopes)
and their individual effects on the dependent
variable (Y).
• However, correlated independent variables
do not affect a multiple regression equation’s
ability to predict the dependent variable (Y).
45
Effect of Multicollinearity in
• Not a Problem: Multicollinearity does not
affect a multiple regression equation’s
ability to predict the dependent variable.
• A Problem: Multicollinearity may produce
unexpected results when evaluating the
relationship between each independent
variable and the dependent variable (a.k.a.
partial correlation analysis).
46
Clues Indicating Problems with Multicollinearity
1. An independent variable known to be an
important predictor ends up having a
regression coefficient that is not significant.
2. A regression coefficient that should have a
positive sign turns out to be negative, or
vice versa.
3. When an independent variable is added or
removed, there is a drastic change in the
values of the remaining regression
coefficients.
47
Variance Inflation Factor
• A general rule is that if the correlation between two independent
variables is between -0.70 and 0.70, there likely is not a problem
using both of the independent variables.
• A more precise test is to use the variance inflation factor (VIF).
• The value of VIF is found as follows:
VIF = 1 / (1 - R²j)
• The term R²j refers to the coefficient of determination, where the selected
independent variable is used as a dependent variable and the remaining
independent variables are used as independent variables.
• A VIF greater than 10 is considered unsatisfactory, indicating that the
independent variable should be removed from the analysis.
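The variance inflation factor is 1 divided by (1 minus the coefficient of determination from regressing one independent variable on the others). A small sketch of that relationship, and of its inverse, is shown below; the function names are made up for this example.

```python
def vif(r2_j):
    """Variance inflation factor for a variable whose regression on the
    other independent variables has coefficient of determination r2_j."""
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be at least 0 and less than 1")
    return 1 / (1 - r2_j)


def r2_from_vif(value):
    """Invert the formula: the r2_j that corresponds to a given VIF."""
    return 1 - 1 / value


# The cutoff of 10 corresponds to r2_j = 0.90.
print(round(vif(0.90), 2), round(r2_from_vif(10), 2))

# A VIF of 1.32 (the temperature value reported later in the deck)
# corresponds to an r2_j of roughly 0.24.
print(round(r2_from_vif(1.32), 3))
```

A variable that is uncorrelated with the others has r2_j = 0 and therefore a VIF of exactly 1, the smallest possible value.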
48
Multicollinearity – Example
Refer to the data in the
table, which relates the
heating cost to the
independent variables
outside temperature,
amount of insulation,
and age of furnace.
Develop a correlation
matrix for all the
independent variables.
Does it appear there is a
problem with
multicollinearity?
Find and interpret the
variance inflation factor
for each of the
independent variables.
49
Correlation Matrix - Minitab
50
VIF – Minitab Example
The VIF value of 1.32 is less than the
upper limit of 10. This indicates that
the independent variable temperature
is not strongly correlated with the other
independent variables.
Coefficient of
Determination
51
Independence Assumption
• The fifth assumption about regression and
correlation analysis is that successive
residuals should be independent.
• When successive residuals are correlated we
refer to this condition as autocorrelation.
Autocorrelation frequently occurs when the
data are collected over a period of time.
52
Residual Plot versus Fitted Values
• The graph shows the
residuals plotted on the
vertical axis and the fitted
values on the horizontal
axis.
• Note the run of residuals
above the mean of the
residuals, followed by a
run below the mean. A
scatter plot such as this
would indicate possible
autocorrelation.
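One simple numeric companion to the plot is the correlation between each residual and the one before it. This check is not part of the slides; it is offered only as a sketch, and the residual values below are made up. A run of positive residuals followed by a run of negative ones gives a clearly positive value, while residuals that show no pattern give a value near zero.

```python
def lag1_autocorrelation(residuals):
    """Correlation between successive residuals (lag 1)."""
    mean = sum(residuals) / len(residuals)
    dev = [e - mean for e in residuals]
    numerator = sum(a * b for a, b in zip(dev[1:], dev[:-1]))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Made-up residuals, listed in the order the observations were collected.
run_pattern = [3, 2, 1, -1, -2, -3]    # run above the mean, then below
alternating = [1, -1, 1, -1, 1, -1]    # sign flips every observation

r_run = lag1_autocorrelation(run_pattern)
r_alt = lag1_autocorrelation(alternating)
print(round(r_run, 3), round(r_alt, 3))
```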
53
Qualitative Independent Variables
• Frequently we wish to use nominal-scale
variables (such as gender, whether the
home has a swimming pool, or whether the
sports team was the home or the visiting
team) in our analysis. These are called
qualitative variables.
• To use a qualitative variable in regression
analysis, we use a scheme of dummy
variables in which one of the two possible
conditions is coded 0 and the other 1.
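The 0/1 coding can be sketched in a couple of lines. The labels and the function name below are hypothetical; any two-category variable can be coded the same way.

```python
def encode_dummy(values, coded_as_one):
    """Code one condition as 1 and every other value as 0."""
    return [1 if value == coded_as_one else 0 for value in values]


# Hypothetical labels for four homes.
pool = ["pool", "no pool", "pool", "no pool"]
dummy = encode_dummy(pool, coded_as_one="pool")
print(dummy)  # [1, 0, 1, 0]
```

Which condition is coded 1 is arbitrary; swapping the coding flips the sign of the dummy variable's coefficient and shifts the intercept, but the predictions stay the same.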
54
Qualitative Variable - Example
Suppose in the Salsberry
Realty example that the
independent variable
“garage” is added. For those
homes without an attached
garage, 0 is used; for homes
with an attached garage, a 1
is used. We will refer to the
“garage” variable as X4. The
data shown on the table are
entered into the MINITAB
system.
55
Qualitative Variable - Minitab
56
Using the Model for Estimation
What is the effect of the garage variable? Suppose we have two houses exactly
alike next to each other in Buffalo, New York; one has an attached garage,
and the other does not. Both homes have 3 inches of insulation, and the
mean January temperature in Buffalo is 20 degrees.
For the house without an attached garage, a 0 is substituted for X4 in the regression
equation. The estimated heating cost is $280.90.
For the house with an attached garage, a 1 is substituted for X4 in the regression
equation. The estimated heating cost is $358.30.
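Because the two houses are identical except for the garage, every other term in the regression equation cancels when the two estimates are subtracted, and the difference is the coefficient of the garage variable. The sketch below illustrates this. Only the garage coefficient (77.40, the difference between the two estimates quoted above) comes from the slides; the intercept and the other two coefficients are placeholders chosen for illustration.

```python
def predict(a, b1, b2, b4, temp, insulation, garage):
    """Y' = a + b1*temp + b2*insulation + b4*garage (garage is 0 or 1)."""
    return a + b1 * temp + b2 * insulation + b4 * garage


# ASSUMPTION: a, b1 and b2 are placeholder values, not the fitted ones.
params = dict(a=400.0, b1=-4.0, b2=-11.0, b4=77.40)

without_garage = predict(**params, temp=20, insulation=3, garage=0)
with_garage = predict(**params, temp=20, insulation=3, garage=1)
difference = with_garage - without_garage
print(round(difference, 2))  # 77.4, the garage coefficient
```

Changing the placeholder values changes both estimates but not their difference.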
57
Testing the Model for Significance
• We have shown the difference between the two
types of homes to be $77.40, but is the difference
significant?
• We conduct the following test of hypothesis.
H0: βi = 0
H1: βi ≠ 0
Reject H0 if t > tα/2,n-k-1 or t < -tα/2,n-k-1
58
Reject H0 if:
t < -tα/2,n-k-1 or t > tα/2,n-k-1
(bi - 0)/sbi < -t.05/2,20-3-1 or (bi - 0)/sbi > t.05/2,20-3-1
(bi - 0)/sbi < -t.025,16 or (bi - 0)/sbi > t.025,16
(bi - 0)/sbi < -2.120 or (bi - 0)/sbi > 2.120
Conclusion: The regression coefficient is not zero. The independent
variable garage should be included in the analysis.
59
End of Chapter 14

Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 

Chapter 14

  • 1. © The McGraw-Hill Companies, Inc. 2008, McGraw-Hill/Irwin. Multiple Linear Regression and Correlation Analysis, Chapter 14
  • 2. 2 GOALS 1. Describe the relationship between several independent variables and a dependent variable using multiple regression analysis. 2. Set up, interpret, and apply an ANOVA table. 3. Compute and interpret the multiple standard error of estimate, the coefficient of multiple determination, and the adjusted coefficient of multiple determination. 4. Conduct a test of hypothesis to determine whether regression coefficients differ from zero. 5. Conduct a test of hypothesis on each of the regression coefficients. 6. Use residual analysis to evaluate the assumptions of multiple regression analysis. 7. Evaluate the effects of correlated independent variables. 8. Use and understand qualitative independent variables.
  • 3. 3 Multiple Regression Analysis The general multiple regression equation with k independent variables is given by: Y' = a + b1X1 + b2X2 + ... + bkXk. The least squares criterion is used to develop this equation. Because determining b1, b2, etc. by hand is very tedious, a software package such as Excel or MINITAB is recommended.
  • 4. 4 Multiple Regression Analysis For two independent variables, the general form of the multiple regression equation is: Y' = a + b1X1 + b2X2. • X1 and X2 are the independent variables. • a is the Y-intercept. • b1 is the net change in Y for each unit change in X1, holding X2 constant. It is called a partial regression coefficient, a net regression coefficient, or just a regression coefficient.
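The least squares criterion mentioned on this slide can be sketched without any statistics package. The following is a minimal illustration, not the procedure Excel or MINITAB uses internally: it builds the normal equations and solves them by Gauss-Jordan elimination. The function name `fit_least_squares` and the small data set are hypothetical and chosen only so the result is easy to check.

```python
def fit_least_squares(rows, y):
    """Return [a, b1, ..., bk] minimizing the sum of squared residuals."""
    # Design matrix with a leading 1 in each row for the intercept a.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * c for x, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data generated exactly from Y = 10 + 2*X1 - 3*X2,
# so the fit recovers a = 10, b1 = 2, b2 = -3 up to rounding.
example_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
example_y = [6, 11, 1, 9, -4]
coefficients = fit_least_squares(example_rows, example_y)
```

Solving the normal equations directly is fine for a handful of well-behaved variables; statistical software typically uses more numerically stable decompositions.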
  • 5. 5 Regression Plane for a 2-Independent Variable Linear Regression Equation
  • 6. 6 Salsberry Realty sells homes along the east coast of the United States. One of the questions most frequently asked by prospective buyers is: If we purchase this home, how much can we expect to pay to heat it during the winter? The research department at Salsberry has been asked to develop some guidelines regarding heating costs for single-family homes. Three variables are thought to relate to the heating costs: (1) the mean daily outside temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of the furnace. To investigate, Salsberry’s research department selected a random sample of 20 recently sold homes. It determined the cost to heat each home last January, as well (data in next slide) Multiple Linear Regression - Example
  • 8. 8 Multiple Linear Regression – Minitab Example
  • 9. 9 Multiple Linear Regression – Excel Example
  • 10. 10 The Multiple Regression Equation – Interpreting the Regression Coefficients The regression coefficient for mean outside temperature (X1) is −4.583. The coefficient is negative and shows an inverse relationship between heating cost and temperature. As the outside temperature increases, the cost to heat the home decreases. The numeric value of the regression coefficient provides more information. If we increase temperature by 1 degree and hold the other two independent variables constant, we can estimate a decrease of $4.583 in monthly heating cost.
  • 11. 11 The Multiple Regression Equation – Interpreting the Regression Coefficients The attic insulation variable (X2) also shows an inverse relationship: the more insulation in the attic, the less the cost to heat the home. So the negative sign for this coefficient is logical. For each additional inch of insulation, we expect the cost to heat the home to decline $14.83 per month, regardless of the outside temperature or the age of the furnace.
  • 12. 12 The Multiple Regression Equation – Interpreting the Regression Coefficients The age of the furnace variable (X3) shows a direct relationship. With an older furnace, the cost to heat the home increases. Specifically, for each additional year older the furnace is, we expect the cost to increase $6.10 per month.
  • 13. 13 Applying the Model for Estimation What is the estimated heating cost for a home if the mean outside temperature is 30 degrees, there are 5 inches of insulation in the attic, and the furnace is 10 years old?
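A sketch of how the estimate would be computed, using the three slope coefficients quoted on slides 10 to 12. The intercept does not appear in this transcript, so it is left as an argument the reader must supply from the regression output; the function name is hypothetical.

```python
# Slope coefficients as quoted on slides 10-12.
B_TEMP = -4.583        # dollars per degree of mean outside temperature
B_INSULATION = -14.83  # dollars per inch of attic insulation
B_AGE = 6.10           # dollars per year of furnace age


def estimate_heating_cost(intercept, temp, insulation, age):
    """Estimated monthly heating cost Y' = a + b1*X1 + b2*X2 + b3*X3.

    `intercept` (a) is not given in the transcript and must be taken
    from the regression output.
    """
    return intercept + B_TEMP * temp + B_INSULATION * insulation + B_AGE * age


# Combined effect of the three variables for the home in the question
# (30 degrees, 5 inches, 10 years), before the intercept is added.
slope_part = estimate_heating_cost(0.0, 30, 5, 10)
```

Adding the intercept from the Minitab or Excel output to `slope_part` gives the estimated cost for that home.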
  • 14. 14 Multiple Standard Error of Estimate The multiple standard error of estimate is a measure of the effectiveness of the regression equation. • It is measured in the same units as the dependent variable. • It is difficult to determine what is a large value and what is a small value of the standard error. • The formula is:
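In standard notation, the multiple standard error of estimate for a model with k independent variables fitted to n observations can be written as follows, where Y' is the predicted value and SSE is the residual (error) sum of squares from the ANOVA table. The subscript notation is one common convention and may differ from the slide's own typesetting.

```latex
s_{Y.12\ldots k}
  = \sqrt{\frac{\sum (Y - Y')^{2}}{n - (k + 1)}}
  = \sqrt{\frac{\mathrm{SSE}}{n - (k + 1)}}
```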
  • 15. 15
  • 16. 16 Multiple Regression and Correlation Assumptions • The independent variables and the dependent variable have a linear relationship. The dependent variable must be continuous and at least interval-scale. • The variation in the residuals must be the same for all values of Y'. When this is the case, we say the residuals exhibit homoscedasticity. • The residuals should follow the normal distribution with mean 0. • Successive values of the dependent variable must be uncorrelated.
  • 17. 17 The ANOVA Table The ANOVA table reports the variation in the dependent variable. The variation is divided into two components. • The Explained Variation is that accounted for by the set of independent variables. • The Unexplained or Random Variation is not accounted for by the independent variables.
  • 18. 18 Minitab – the ANOVA Table
  • 19. 19 Coefficient of Multiple Determination (R²) Characteristics of the coefficient of multiple determination: 1. It is symbolized by a capital R squared. In other words, it is written as R² because it behaves like the square of a correlation coefficient. 2. It can range from 0 to 1. A value near 0 indicates little association between the set of independent variables and the dependent variable. A value near 1 means a strong association. 3. It cannot assume negative values. Any number that is squared or raised to the second power cannot be negative. 4. It is easy to interpret. Because R² is a value between 0 and 1, it is easy to interpret, compare, and understand.
  • 20. 20 Coefficient of Multiple Determination (R²) - Formula R² = SSR / SS total
  • 21. 21 Minitab – the ANOVA Table R² = SSR / SS total = 171,220 / 212,916 = 0.804
  • 22. 22 Adjusted Coefficient of Determination • Adding independent variables to a multiple regression equation makes the coefficient of determination larger. • If the number of variables, k, and the sample size, n, are equal, the coefficient of determination is 1.0. • To balance the effect that the number of independent variables has on the coefficient of multiple determination, statistical software packages use an adjusted coefficient of multiple determination.
  • 23. 23 Adjusted Coefficient of Determination - Example
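As a rough check on the figures from slide 21, R² and its adjusted counterpart can be computed from the ANOVA sums of squares. The adjusted formula used here is the standard textbook definition, 1 − [SSE / (n − (k + 1))] / [SS total / (n − 1)]; it is assumed rather than taken from this transcript, and n = 20 and k = 3 come from the Salsberry example.

```python
def r_squared(ssr, ss_total):
    """Coefficient of multiple determination: explained / total variation."""
    return ssr / ss_total


def adjusted_r_squared(ssr, ss_total, n, k):
    """R-squared adjusted for the number of independent variables k."""
    sse = ss_total - ssr  # unexplained (error) variation
    return 1 - (sse / (n - (k + 1))) / (ss_total / (n - 1))


# Sums of squares as shown on slide 21; n and k from the heating-cost example.
r2 = r_squared(171_220, 212_916)
r2_adj = adjusted_r_squared(171_220, 212_916, n=20, k=3)
```

With these inputs `r2` rounds to 0.804, matching slide 21, and the adjusted value comes out somewhat lower, as expected when the penalty for three variables is applied.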
  • 24. 24 Correlation Matrix A correlation matrix is used to show all possible simple correlation coefficients among the variables. The matrix is useful for locating correlated independent variables. It shows how strongly each independent variable is correlated with the dependent variable.
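A correlation matrix of this kind can be sketched in a few lines. This is a minimal illustration with hypothetical column names and values, not the Salsberry data; `pearson_r` and `correlation_matrix` are names introduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Simple (Pearson) correlation coefficient between two lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every pair of column names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical values, only to show the shape of the result.
matrix = correlation_matrix({
    "cost": [250, 360, 165, 43, 92],
    "temp": [35, 29, 36, 60, 65],
    "insul": [3, 4, 7, 6, 5],
})
```

Reading across the `"cost"` row shows how strongly each independent variable is related to the dependent variable; the off-diagonal entries among the independent variables are the ones to inspect for multicollinearity.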
  • 25. 25 Global Test: Testing the Multiple Regression Model The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are: H0: β1 = β2 = ... = βk = 0 H1: Not all βs equal 0
  • 26. 26 Global Test continued The test statistic follows the F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size. Decision Rule: Reject H0 if F > F(α, k, n-k-1)
  • 29. 29 Interpretation • The computed F is 21.90, in the reject H0 region. • The null hypothesis that all the multiple regression coefficients are zero is therefore rejected. • Interpretation: some of the independent variables (amount of insulation, etc.) do have the ability to explain the variation in the dependent variable (heating cost). • Logical question: which ones?
  • 30. 30 Evaluating Individual Regression Coefficients (βi = 0) • This test is used to determine which independent variables have nonzero regression coefficients. • The variables whose regression coefficients are not significantly different from zero are usually dropped from the analysis. • The test statistic follows the t distribution with n-(k+1) degrees of freedom. • The hypothesis test is as follows: H0: βi = 0 H1: βi ≠ 0 Reject H0 if t > t(α/2, n-k-1) or t < -t(α/2, n-k-1)
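The computed F of 21.90 can be reproduced from the ANOVA sums of squares on slide 21. This sketch assumes the usual definition F = (SSR / k) / (SSE / (n − (k + 1))); the critical value is passed in rather than computed, and the 3.24 used below is the 0.05-level value for 3 and 16 degrees of freedom as read from a standard F table, not from this transcript.

```python
def global_f(ssr, ss_total, n, k):
    """F statistic for the global test: mean square regression / mean square error."""
    sse = ss_total - ssr
    msr = ssr / k              # k numerator degrees of freedom
    mse = sse / (n - (k + 1))  # n-(k+1) denominator degrees of freedom
    return msr / mse


f_value = global_f(171_220, 212_916, n=20, k=3)

# Assumed critical value F(0.05, 3, 16) from a standard F table.
F_CRITICAL = 3.24
reject_h0 = f_value > F_CRITICAL
```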
  • 31. 31 Critical t-stat for the Slopes: -2.120 and 2.120
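The decision rule for an individual coefficient can be sketched as below. The slope values are the ones quoted on slides 10 and 12, but the standard errors are hypothetical placeholders, since the Minitab output is not reproduced in this transcript; substitute the standard errors from the actual output.

```python
def t_statistic(b, se_b):
    """t = (b_i - 0) / s_b_i for the test of H0: beta_i = 0."""
    return (b - 0) / se_b


def reject_h0(b, se_b, t_critical):
    """Two-tailed rule: reject when t falls beyond either critical value."""
    return abs(t_statistic(b, se_b)) > t_critical


T_CRITICAL = 2.120  # critical values from slide 31

# Standard errors below are hypothetical, for illustration only.
temp_significant = reject_h0(-4.583, 0.8, T_CRITICAL)
age_significant = reject_h0(6.10, 4.0, T_CRITICAL)
```

A coefficient for which the rule does not reject H0 is a candidate for removal, after which the model is refitted with the remaining variables.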
  • 34. 34 New Regression Model without Variable “Age” – Minitab
  • 35. 35 New Regression Model without Variable “Age” – Minitab
  • 36. 36 Testing the New Model for Significance
  • 37. 37 Critical t-stat for the New Slopes Reject H0 if t > t(α/2, n-k-1) or t < -t(α/2, n-k-1), where t = (bi - 0) / s_bi. With α = 0.05, n = 20, and k = 2, the critical value is t(0.025, 17) = 2.110, so reject H0 if (bi - 0) / s_bi > 2.110 or (bi - 0) / s_bi < -2.110. Critical values: -2.110 and 2.110
  • 39. 39 Evaluating the Assumptions of Multiple Regression 1. There is a linear relationship. That is, there is a straight-line relationship between the dependent variable and the set of independent variables. 2. The variation in the residuals is the same for both large and small values of the estimated Y. To put it another way, the size of the residual is unrelated to whether the estimated Y is large or small. 3. The residuals follow the normal probability distribution. 4. The independent variables should not be correlated. That is, we would like to select a set of independent variables that are not themselves correlated. 5. The residuals are independent. This means that successive observations of the dependent variable are not correlated. This assumption is often violated when time is involved with the sampled observations.
  • 40. 40 Analysis of Residuals A residual is the difference between the actual value of Y and the predicted value of Y. Residuals should be approximately normally distributed. Histograms and stem-and-leaf charts are useful in checking this requirement. • A plot of the residuals against their corresponding Y' values is used to show that there are no trends or patterns in the residuals.
  • 43. 43 Distribution of Residuals Both MINITAB and Excel offer another graph that helps to evaluate the assumption of normally distributed residuals. It is called a normal probability plot and is shown to the right of the histogram.
  • 44. 44 Multicollinearity • Multicollinearity exists when independent variables (X's) are correlated. • Correlated independent variables make it difficult to make inferences about the individual regression coefficients (slopes) and their individual effects on the dependent variable (Y). • However, correlated independent variables do not affect a multiple regression equation's ability to predict the dependent variable (Y).
  • 45. 45 Effect of Multicollinearity in • Not a Problem: Multicollinearity does not affect a multiple regression equation's ability to predict the dependent variable. • A Problem: Multicollinearity may show unexpected results in evaluating the relationship between each independent variable and the dependent variable (a.k.a. partial correlation analysis).
  • 46. 46 Clues Indicating Problems with Multicollinearity 1. An independent variable known to be an important predictor ends up having a regression coefficient that is not significant. 2. A regression coefficient that should have a positive sign turns out to be negative, or vice versa. 3. When an independent variable is added or removed, there is a drastic change in the values of the remaining regression coefficients.
  • 47. 47 Variance Inflation Factor • A general rule is that if the correlation between two independent variables is between -0.70 and 0.70, there is likely no problem in using both of the independent variables. • A more precise test is to use the variance inflation factor (VIF). • The value of VIF is found as follows: VIF = 1 / (1 - R²j) • The term R²j refers to the coefficient of determination, where the selected independent variable is used as a dependent variable and the remaining independent variables are used as independent variables. • A VIF greater than 10 is considered unsatisfactory, indicating that the independent variable should be removed from the analysis.
  • 48. 48 Multicollinearity – Example Refer to the data in the table, which relates the heating cost to the independent variables outside temperature, amount of insulation, and age of furnace. Develop a correlation matrix for all the independent variables. Does it appear there is a problem with multicollinearity? Find and interpret the variance inflation factor for each of the independent variables.
  • 50. 50 VIF – Minitab Example The VIF value of 1.32 is less than the upper limit of 10. This indicates that the independent variable temperature is not strongly correlated with the other independent variables. Coefficient of Determination
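The VIF check can be sketched as a one-line function using the standard definition VIF = 1 / (1 − R²j). The R²j of 0.241 below is an assumed illustrative value, chosen because it reproduces the VIF of 1.32 quoted for temperature; the actual R²j comes from regressing temperature on the other independent variables.

```python
def vif(r2_j):
    """Variance inflation factor for one independent variable.

    r2_j is the coefficient of determination obtained when that variable
    is regressed on the remaining independent variables.
    """
    if not 0 <= r2_j < 1:
        raise ValueError("r2_j must be in [0, 1)")
    return 1 / (1 - r2_j)


VIF_LIMIT = 10  # upper limit stated on the slides

# Assumed R-squared of 0.241 for temperature, for illustration only.
temperature_vif = vif(0.241)
temperature_ok = temperature_vif <= VIF_LIMIT
```

An R²j of 0.90 corresponds to a VIF of 10, so the rule of thumb flags a variable once the other independent variables explain about 90 percent of its variation.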
  • 51. 51 Independence Assumption • The fifth assumption about regression and correlation analysis is that successive residuals should be independent. • When successive residuals are correlated we refer to this condition as autocorrelation. Autocorrelation frequently occurs when the data are collected over a period of time.
  • 52. 52 Residual Plot versus Fitted Values • The graph shows the residuals plotted on the vertical axis and the fitted values on the horizontal axis. • Note the run of residuals above the mean of the residuals, followed by a run below the mean. A scatter plot such as this would indicate possible autocorrelation.
  • 53. 53 Qualitative Independent Variables • Frequently we wish to use nominal-scale variables (such as gender, whether the home has a swimming pool, or whether the sports team was the home or the visiting team) in our analysis. These are called qualitative variables. • To use a qualitative variable in regression analysis, we use a scheme of dummy variables in which one of the two possible conditions is coded 0 and the other 1.
  • 54. 54 Qualitative Variable - Example Suppose in the Salsberry Realty example that the independent variable “garage” is added. For those homes without an attached garage, 0 is used; for homes with an attached garage, a 1 is used. We will refer to the “garage” variable as X4. The data shown on the table are entered into the MINITAB system.
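The 0/1 coding described above can be sketched as follows. The records and the `"garage"` field with `"yes"`/`"no"` labels are hypothetical; only the coding rule (0 without an attached garage, 1 with one) comes from the slide.

```python
def encode_garage(label):
    """Dummy-code the qualitative garage variable X4: 1 if attached, else 0."""
    return 1 if label == "yes" else 0


# Hypothetical home records, for illustration only.
homes = [
    {"temp": 35, "insulation": 3, "garage": "no"},
    {"temp": 29, "insulation": 4, "garage": "yes"},
    {"temp": 36, "insulation": 7, "garage": "no"},
]

# Append the dummy variable as a numeric column named x4.
for home in homes:
    home["x4"] = encode_garage(home["garage"])

x4_column = [home["x4"] for home in homes]
```

Once coded this way, X4 enters the regression like any other numeric column, and its coefficient measures the shift in estimated cost between the two groups.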
  • 56. 56 Using the Model for Estimation What is the effect of the garage variable? Suppose we have two houses exactly alike next to each other in Buffalo, New York; one has an attached garage, and the other does not. Both homes have 3 inches of insulation, and the mean January temperature in Buffalo is 20 degrees. For the house without an attached garage, a 0 is substituted for X4 in the regression equation. The estimated heating cost is $280.90. For the house with an attached garage, a 1 is substituted for X4 in the regression equation. The estimated heating cost is $358.30.
  • 57. 57 Testing the Model for Significance • We have shown the difference between the two types of homes to be $77.40, but is the difference significant? • We conduct the following test of hypothesis. H0: βi = 0 H1: βi ≠ 0 Reject H0 if t > t(α/2, n-k-1) or t < -t(α/2, n-k-1)
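The gap between the two estimates is simply the coefficient on the dummy variable. Assuming the model here contains temperature (X1), insulation (X2), and garage (X4), with the coefficient symbols b1, b2, b4 following the earlier slides, the other terms cancel when the two houses are otherwise identical:

```latex
\begin{aligned}
Y'_{\text{with}} - Y'_{\text{without}}
  &= (a + b_1 X_1 + b_2 X_2 + b_4 \cdot 1) - (a + b_1 X_1 + b_2 X_2 + b_4 \cdot 0) \\
  &= b_4 \\
358.30 - 280.90 &= 77.40
\end{aligned}
```

So an attached garage is associated with an estimated $77.40 higher monthly heating cost, whatever the temperature and insulation values are.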