SlideShare a Scribd company logo
1 of 36
Statistics for Managers
Using Microsoft® Excel
4th Edition
Chapter 14
Multiple Regression Model Building

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.

Chap 14-1
Chapter Goals
After completing this chapter, you should be
able to:
use quadratic terms in a regression model
 use transformed variables in a regression model
 measure the correlation among the independent
variables
 build a regression model using either the stepwise or
best-subsets approach
 explain the pitfalls involved in developing a multiple
regression model
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Chap 14-2
Prentice-Hall, Inc.

Nonlinear Relationships






The relationship between the dependent
variable and an independent variable may
not be linear
Can review the scatter diagram to check for
non-linear relationships
Example: Quadratic model
2
Yi = β0 + β1X1i + β 2 X1i + ε i

The second independent variable is the square
Statistics for the first variable
of Managers Using
Microsoft Excel, 4e © 2004
Chap 14-3
Prentice-Hall, Inc.

Quadratic Regression Model
Model form:

Yi = β0 + β1X1i + β 2 X + ε i
2
1i



where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect on Y
εi = random error in Y for observation i

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-4
Linear vs. Nonlinear Fit
Y

Y

X

Linear fit does not give
Statistics for Managers Using
random 4e © 2004
Microsoft Excel,residuals
Prentice-Hall, Inc.

X

residuals

residuals

X

X



Nonlinear fit gives
random residuals

Chap 14-5
Quadratic Regression Model
2
Yi = β0 + β1X1i + β 2 X1i + ε i

Quadratic models may be considered when the scatter
diagram takes on one of the following shapes:
Y

Y

β1 < 0
β2 > 0

X1

Y

β1 > 0
β2 > 0

X1

Y

β1 < 0
β2 < 0

β1 = the Using
Statistics for Managers coefficient of the linear term
Microsoft Excel, β2 =© 2004
4e the coefficient of the squared term
Prentice-Hall, Inc.

X1

β1 > 0
β2 < 0

Chap 14-6

X1
Testing the Overall
Quadratic Model


Estimate the quadratic model to obtain the
regression equation:
ˆ = b +b X +b X2
Yi
0
1 1i
2 1i



Test for Overall Relationship
H0: β1 = β2 = 0 (no overall relationship between X and Y)
H1: β1 and/or β2 ≠ 0 (there is a relationship between X and Y)
MSR
MSE

F test statistic =
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.


Chap 14-7
Testing for Significance:
Quadratic Effect


Testing the Quadratic Effect


Compare quadratic regression equation
2
Yi = b0 + b1X1i + b 2 X1i

with the linear regression equation

Yi = b0 + b1X1i
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-8
Testing for Significance:
Quadratic Effect


(continued)

Testing the Quadratic Effect


Consider the quadratic regression equaiton
2
Yi = b0 + b1X1i + b 2 X1i

Hypotheses


H0: β2 = 0

(The quadratic term does not improve the model)



H1: β2 ≠ 0

(The quadratic term improves the model)

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-9
Testing for Significance:
Quadratic Effect


(continued)

Testing the Quadratic Effect
Hypotheses


(The quadratic term does not improve the model)





H0: β2 = 0
H1: β2 ≠ 0

(The quadratic term improves the model)

The test statistic is

b2 − β2
t=
Sb 2
Statistics for Managers Using
d.f. =
Microsoft Excel, 4e © 2004n − 3
Prentice-Hall, Inc.

where:
b2 = squared term slope
coefficient
β2 = hypothesized slope (zero)
Sb 2= standard error of the slope

Chap 14-10
Testing for Significance:
Quadratic Effect


(continued)

Testing the Quadratic Effect
Compare r2 from simple regression to
adjusted r2 from the quadratic model



If adj. r2 from the quadratic model is larger
than the r2 from the simple model, then the
quadratic model is a better model

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-11
Example: Quadratic Model
3

1

7

3

15

5

22

7

33

8

40

10

54

12

67

13

70

14

78

Purity increases as filter time
increases:

2

8



15

Purity vs. Time
100
80

Purity

Purity

Filter
Time

60
40
20

85
0
Statistics15 Managers Using0
for
87
16
Microsoft17
Excel, 4e © 2004
99
Prentice-Hall, Inc.

5

10

15

20

Time

Chap 14-12
Example: Quadratic Model


(continued)

Simple regression results:
Y = -11.283 + 5.985 Time
^
Coefficients

Standard
Error

-11.28267

3.46805

-3.25332

0.30966

19.32819

t statistic, F statistic, and r2
are all high, but the
residuals are not random:

0.00691

5.98520

 

2.078E-10

Intercept
Time

t Stat

P-value

Regression Statistics
R Square

0.96888
0.96628

Standard Error

Time Residual Plot

Significance F

373.57904

10

2.0778E-10

6.15997

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

5
Residuals

Adjusted R Square

F

0
-5 0

5

10

15

-10
Time

Chap 14-13

20
Example: Quadratic Model


(continued)

Quadratic regression results:
Y = 1.539 + 1.565 Time + 0.245 (Time)2
^
Coefficients

Standard
Error

Intercept

1.53870

2.24465

0.68550

0.50722

Time

1.56496

0.60179

2.60052

0.02467

Time-squared

0.24516

0.03258

7.52406

1.165E-05

t Stat

Time Residual Plot

P-value

10
Residuals

 

5
0
-5

Regression Statistics
0.99494

Adjusted R Square

0.99402

Standard Error

2.368E-13

The quadratic term is significant and
Statisticsthe model: adj. r2 is higher and
improves for Managers Using
Microsoft Excel, 4e are2004
SYX is lower, residuals © now random

Prentice-Hall, Inc.

10

15

20

Time

2.59513

1080.7330

5

Significance F

Time-squared Residual Plot
10
Residuals

R Square

F

0

5
0
-5

0

100

200

300

Time-squared

Chap 14-14

400
Using Transformations in
Regression Analysis
Idea:
 non-linear models can often be transformed
to a linear form




Can be estimated by least squares if transformed

transform X or Y or both to get a better fit or
to deal with violations of regression
assumptions

Can be based on theory, logic or scatter
Statistics for Managers Using
diagrams
Microsoft Excel, 4e © 2004


Prentice-Hall, Inc.

Chap 14-15
The Square Root Transformation


The square-root transformation

Yi = β0 + β1 X1i + ε i


Used to

overcome violations of the homoscedasticity
assumption
 fit a non-linear relationship
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Chap 14-16
Prentice-Hall, Inc.

The Square Root Transformation
(continued)
Yi = β0 + β1X1i + ε i


Yi = β0 + β1 X1i + ε i

Shape of original relationship

Y



Y

X
Y
Statistics for Managers Using
Microsoft Excel, 4e © 2004
X
Prentice-Hall, Inc.

Relationship when transformed

b1 > 0

X
Y
b1 < 0

X
Chap 14-17
The Log Transformation
The Multiplicative Model:


Original multiplicative model



Transformed multiplicative model

log Yi = log β0 + β1 log X1i + log ε i

β
Yi = β0 X1i1 ε i

The Exponential Model:


Original multiplicative model

Yi = e

β 0 +β1X1i +β 2 X 2i

εi

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.



Transformed exponential model

ln Yi = β0 + β1X1i + β 2 X 2i + ln ε i
Chap 14-18
Interpretation of coefficients
For the multiplicative model:
log Yi = log β0 + β1 log X1i + log ε i


When both dependent and independent
variables are logged:

The coefficient of the independent variable Xk can
be interpreted as : a 1 percent change in Xk leads to
an estimated bk percentage change in the average
value of Y. Using
Statistics for ManagersTherefore bk is the elasticity of Y with
respect to a change in Xk .
Microsoft Excel, 4e © 2004
Chap 14-19
Prentice-Hall, Inc.

Collinearity


Collinearity: High correlation exists among two
or more independent variables



This means the correlated variables contribute
redundant information to the multiple regression
model

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-20
Collinearity
(continued)


Including two highly correlated explanatory
variables can adversely affect the regression
results


No new information provided



Can lead to unstable coefficients (large
standard error and low t-values)

Coefficient signs may not match prior
expectations
Statistics for Managers Using


Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-21
Some Indications of
Strong Collinearity
Incorrect signs on the coefficients
 Large change in the value of a previous
coefficient when a new variable is added to the
model
 A previously significant variable becomes
insignificant when a new independent variable
is added
 The estimate of the standard deviation of the
model increases when a variable is added to
Statistics for Managers Using
the model


Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-22
Detecting Collinearity
(Variance Inflationary Factor)
VIFj is used to measure collinearity:

1
VIFj =
2
1−R j
where R2j is the coefficient of determination
of variable Xj with all other X variables

If VIFj > 5, Xj is highly correlated with
Statistics for Managers Using
the other explanatory variables
Microsoft Excel, 4e © 2004
Chap 14-23
Prentice-Hall, Inc.
Example: Pie Sales
Week

Pie
Sales

Price
($)

Advertising
($100s)

1

350

5.50

3.3

2

460

7.50

3.3

3

350

8.00

3.0

4

430

8.00

4.5

5

350

6.80

3.0

6

380

7.50

4.0

7

430

4.50

3.0

8

470

6.40

3.7

9

450

7.00

3.5

10

490

5.00

4.0

11

340

7.20

3.5

12

300

7.90

3.2

13
440
5.90
4.0
Statistics for Managers Using
14
450
5.00
Microsoft Excel, 4e © 3.5
2004
15
300
2.7
Prentice-Hall, 7.00
Inc.

Recall the multiple regression
equation of chapter 13:

Sales = b0 + b1 (Price)
+ b2 (Advertising)

Chap 14-24
Detect Collinearity in PHStat
PHStat / regression / multiple regression …
Check the “variance inflationary factor (VIF)” box
Regression Analysis

Output for the pie sales example:
Price and all other X
 Since there are only two
Regression Statistics
explanatory variables, only one
Multiple R
0.030438
VIF is reported
R Square
0.000926
 VIF is < 5
Adjusted R
Square
-0.075925
 There is no evidence of
Standard Error
1.21527
Observations
15
Statistics for Managers Using collinearity between Price and
Advertising
VIF
Microsoft Excel,1.0009272004
4e ©
Chap 14-25
Prentice-Hall, Inc.
Model Building


Goal is to develop a model with the best set of
independent variables






Stepwise regression procedure




Easier to interpret if unimportant variables are
removed
Lower probability of collinearity
Provide evaluation of alternative models as variables
are added

Best-subset approach

Try all combinations and select the best using the
Statistics for Managers Using
highest adjusted r2 and lowest standard error
Microsoft Excel, 4e © 2004
Chap 14-26
Prentice-Hall, Inc.

Stepwise Regression


Idea: develop the least squares regression
equation in steps, adding one explanatory
variable at a time and evaluating whether
existing variables should remain or be removed



The coefficient of partial determination is the
measure of the marginal contribution of each
independent variable, given that other
independent variables are in the model

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-27
Best Subsets Regression


Idea: estimate all possible regression equations
using all possible combinations of independent
variables



Choose the best fit by looking for the highest
adjusted r2 and lowest standard error

Stepwise regression and best subsets
regression can be performed using PHStat
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Chap 14-28
Prentice-Hall, Inc.
Alternative Best Subsets
Criterion


Calculate the value Cp for each potential
regression model



Consider models with Cp values close to or
below k + 1

k is the number of independent variables in the
model under consideration
Statistics for Managers Using
Microsoft Excel, 4e © 2004
Chap 14-29
Prentice-Hall, Inc.

Alternative Best Subsets
Criterion


(continued)

The Cp Statistic
2
(1 − Rk )(n − T )
Cp =
− (n − 2(k + 1))
2
1 − RT
Where

k = number of independent variables included in a
particular regression model
T = total number of parameters to be estimated in the
full regression model
2
Rk = coefficient of multiple determination for model with k
independent variables
2
R T = coefficient of
Statistics for Managers Using multiple determination for full model with
Microsoft Excel, 4e © 2004 estimated parameters
all T

Prentice-Hall, Inc.

Chap 14-30
10 Steps in Model Building
1. Choose explanatory variables to include in the
model
2. Estimate full model and check VIFs
3. Check if any VIFs > 5
4.




If no VIF > 5, go to step 5
If one VIF > 5, remove this variable
If more than one, eliminate the variable with the
highest VIF and go back to step 2

5. Perform best subsets regression with

Statistics for Managers Using
remaining variables …
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-31
10 Steps in Model Building
(continued)

6. List all models with Cp close to or less than (k
+ 1)
7. Choose the best model



Consider parsimony
Do extra variable make a significant contribution?

8. Perform complete analysis with chosen model,
including residual analysis
9. Transform the model if necessary to deal with
violations of linearity or other model
assumptions
Statistics for Managers Using
10. Excel, 4e © 2004
Microsoft Use the model for prediction
Prentice-Hall, Inc.

Chap 14-32
Model Building Flowchart
Choose X1,X2,…Xk
Run regression
to find VIFs

Any
VIF>5?
Yes

Remove
variable with
highest
VIF

Yes

More
than one?

No

Run subsets
regression to obtain
“best” models in
terms of Cp
Do complete analysis
Add quadratic term and/or
transform variables as indicated

No

Statistics for Managers Using
Remove
Microsoft Excel, 4e © this X
2004
Prentice-Hall, Inc.

Perform
predictions

Chap 14-33
Pitfalls and Ethical
Considerations
To avoid pitfalls and address ethical considerations:


Understand that interpretation of the
estimated regression coefficients are
performed holding all other independent
variables constant



Evaluate residual plots for each independent
variable



Evaluate interaction terms

Statistics for Managers Using
Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-34
Additional Pitfalls
and Ethical Considerations

(continued)

To avoid pitfalls and address ethical considerations:


Obtain VIFs for each independent variable
before determining which variables should be
included in the model



Examine several alternative models using bestsubsets regression

Use other methods when the assumptions
necessary for least-squares regression have
been seriously violated
Statistics for Managers Using


Microsoft Excel, 4e © 2004
Prentice-Hall, Inc.

Chap 14-35
Chapter Summary



Developed the quadratic regression model
Discussed using transformations in
regression models






The multiplicative model
The exponential model

Described collinearity
Discussed model building



Stepwise regression
Best subsets

 Addressed pitfalls
Statistics for Managers Usingin multiple regression and
ethical 4e © 2004
Microsoft Excel,considerations
Chap 14-36
Prentice-Hall, Inc.

More Related Content

What's hot

What's hot (20)

Chap05 discrete probability distributions
Chap05 discrete probability distributionsChap05 discrete probability distributions
Chap05 discrete probability distributions
 
Chap04 discrete random variables and probability distribution
Chap04 discrete random variables and probability distributionChap04 discrete random variables and probability distribution
Chap04 discrete random variables and probability distribution
 
Correlations using SPSS
Correlations using SPSSCorrelations using SPSS
Correlations using SPSS
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Malhotra17
Malhotra17Malhotra17
Malhotra17
 
Basic business statistics 2
Basic business statistics 2Basic business statistics 2
Basic business statistics 2
 
P value in proportion
P value in proportionP value in proportion
P value in proportion
 
Chap06 sampling and sampling distributions
Chap06 sampling and sampling distributionsChap06 sampling and sampling distributions
Chap06 sampling and sampling distributions
 
Chap08 fundamentals of hypothesis
Chap08 fundamentals of  hypothesisChap08 fundamentals of  hypothesis
Chap08 fundamentals of hypothesis
 
Chebyshev's inequality
Chebyshev's inequalityChebyshev's inequality
Chebyshev's inequality
 
Types Of Index Numbers
Types Of Index NumbersTypes Of Index Numbers
Types Of Index Numbers
 
Mpc 006 - 02-03 partial and multiple correlation
Mpc 006 - 02-03 partial and multiple correlationMpc 006 - 02-03 partial and multiple correlation
Mpc 006 - 02-03 partial and multiple correlation
 
Chap03 numerical descriptive measures
Chap03 numerical descriptive measuresChap03 numerical descriptive measures
Chap03 numerical descriptive measures
 
Unit 4 Tests of Significance
Unit 4 Tests of SignificanceUnit 4 Tests of Significance
Unit 4 Tests of Significance
 
Chap05 continuous random variables and probability distributions
Chap05 continuous random variables and probability distributionsChap05 continuous random variables and probability distributions
Chap05 continuous random variables and probability distributions
 
Presentation On Regression
Presentation On RegressionPresentation On Regression
Presentation On Regression
 
Chap15 time series forecasting & index number
Chap15 time series forecasting & index numberChap15 time series forecasting & index number
Chap15 time series forecasting & index number
 
Chap12 simple regression
Chap12 simple regressionChap12 simple regression
Chap12 simple regression
 
multiple regression
multiple regressionmultiple regression
multiple regression
 
Moment generating function
Moment generating functionMoment generating function
Moment generating function
 

Viewers also liked

Linear regression
Linear regressionLinear regression
Linear regression
Tech_MX
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
Harsh Upadhyay
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Ex
mailund
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
mailund
 
Ch1 2 index number
Ch1 2 index numberCh1 2 index number
Ch1 2 index number
Jay Tanna
 

Viewers also liked (20)

Linear regression
Linear regressionLinear regression
Linear regression
 
Chap17 statistical applications on management
Chap17 statistical applications on managementChap17 statistical applications on management
Chap17 statistical applications on management
 
Chap16 decision making
Chap16 decision makingChap16 decision making
Chap16 decision making
 
Bbs10 ppt ch16
Bbs10 ppt ch16Bbs10 ppt ch16
Bbs10 ppt ch16
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Ex
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Ch1 2 index number
Ch1 2 index numberCh1 2 index number
Ch1 2 index number
 
Advertising
AdvertisingAdvertising
Advertising
 
Clr model
Clr modelClr model
Clr model
 
Chap04 basic probability
Chap04 basic probabilityChap04 basic probability
Chap04 basic probability
 
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
 
Chap01 intro & data collection
Chap01 intro & data collectionChap01 intro & data collection
Chap01 intro & data collection
 
Time series, forecasting, and index numbers
Time series, forecasting, and index numbersTime series, forecasting, and index numbers
Time series, forecasting, and index numbers
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
Experimental Methods in Mass Communication Research
Experimental Methods in Mass Communication ResearchExperimental Methods in Mass Communication Research
Experimental Methods in Mass Communication Research
 
Business Statistics Chapter 9
Business Statistics Chapter 9Business Statistics Chapter 9
Business Statistics Chapter 9
 
Business Statistics Chapter 8
Business Statistics Chapter 8Business Statistics Chapter 8
Business Statistics Chapter 8
 
Confidence interval with Z
Confidence interval with ZConfidence interval with Z
Confidence interval with Z
 

Similar to Chap14 multiple regression model building

Quantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docxQuantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
hildredzr1di
 
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCELUSING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
IslamBarakat18
 

Similar to Chap14 multiple regression model building (20)

multiple regression model building
 multiple regression model building multiple regression model building
multiple regression model building
 
Simple Linear Regression
Simple Linear RegressionSimple Linear Regression
Simple Linear Regression
 
chap12.pptx
chap12.pptxchap12.pptx
chap12.pptx
 
Introduction to Multiple Regression
Introduction to Multiple RegressionIntroduction to Multiple Regression
Introduction to Multiple Regression
 
Chapter 4 power point presentation Regression models
Chapter 4 power point presentation Regression modelsChapter 4 power point presentation Regression models
Chapter 4 power point presentation Regression models
 
Chap10 anova
Chap10 anovaChap10 anova
Chap10 anova
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Statistical Applications in Quality and Productivity Management
Statistical Applications in Quality and Productivity ManagementStatistical Applications in Quality and Productivity Management
Statistical Applications in Quality and Productivity Management
 
Chi square using excel
Chi square using excelChi square using excel
Chi square using excel
 
Chap06 normal distributions & continous
Chap06 normal distributions & continousChap06 normal distributions & continous
Chap06 normal distributions & continous
 
04 regression
04 regression04 regression
04 regression
 
Numerical Descriptive Measures
Numerical Descriptive MeasuresNumerical Descriptive Measures
Numerical Descriptive Measures
 
Newbold_chap14.ppt
Newbold_chap14.pptNewbold_chap14.ppt
Newbold_chap14.ppt
 
Time Series Forecasting and Index Numbers
Time Series Forecasting and Index NumbersTime Series Forecasting and Index Numbers
Time Series Forecasting and Index Numbers
 
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docxQuantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
 
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCELUSING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
 
Fundamentals of Programming Chapter 5
Fundamentals of Programming Chapter 5Fundamentals of Programming Chapter 5
Fundamentals of Programming Chapter 5
 
Rsh qam11 ch04 ge
Rsh qam11 ch04 geRsh qam11 ch04 ge
Rsh qam11 ch04 ge
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
Msb12e ppt ch11
Msb12e ppt ch11Msb12e ppt ch11
Msb12e ppt ch11
 

More from Uni Azza Aunillah

presentation listening and summarize
presentation listening and summarizepresentation listening and summarize
presentation listening and summarize
Uni Azza Aunillah
 
Etika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksiEtika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksi
Uni Azza Aunillah
 

More from Uni Azza Aunillah (13)

Chap07 interval estimation
Chap07 interval estimationChap07 interval estimation
Chap07 interval estimation
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
Equation
EquationEquation
Equation
 
Saluran distribusi
Saluran distribusiSaluran distribusi
Saluran distribusi
 
Mengintegrasikan teori
Mengintegrasikan teoriMengintegrasikan teori
Mengintegrasikan teori
 
Kepemimpinen Dahlan Iskan
Kepemimpinen Dahlan IskanKepemimpinen Dahlan Iskan
Kepemimpinen Dahlan Iskan
 
presentation listening and summarize
presentation listening and summarizepresentation listening and summarize
presentation listening and summarize
 
Perbankan syariah
Perbankan syariahPerbankan syariah
Perbankan syariah
 
Uu bi no 3 tahun 2004
Uu bi no 3 tahun 2004Uu bi no 3 tahun 2004
Uu bi no 3 tahun 2004
 
Modul ekonomi moneter
Modul ekonomi moneterModul ekonomi moneter
Modul ekonomi moneter
 
Teori manajemen klasik
Teori manajemen klasikTeori manajemen klasik
Teori manajemen klasik
 
Proses pengawasan dalam manajemen
Proses pengawasan dalam manajemenProses pengawasan dalam manajemen
Proses pengawasan dalam manajemen
 
Etika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksiEtika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksi
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

The UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, OcadoThe UX of Automation by AJ King, Senior UX Researcher, Ocado
The UX of Automation by AJ King, Senior UX Researcher, Ocado
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
Buy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdfBuy Epson EcoTank L3210 Colour Printer Online.pdf
Buy Epson EcoTank L3210 Colour Printer Online.pdf
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
SOQL 201 for Admins & Developers: Slice & Dice Your Org’s Data With Aggregate...
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1UiPath Test Automation using UiPath Test Suite series, part 1
UiPath Test Automation using UiPath Test Suite series, part 1
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 

Chap14 multiple regression model building

  • 1. Statistics for Managers Using Microsoft® Excel 4th Edition Chapter 14 Multiple Regression Model Building Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1
  • 2. Chapter Goals After completing this chapter, you should be able to: use quadratic terms in a regression model  use transformed variables in a regression model  measure the correlation among the independent variables  build a regression model using either the stepwise or best-subsets approach  explain the pitfalls involved in developing a multiple regression model Statistics for Managers Using Microsoft Excel, 4e © 2004 Chap 14-2 Prentice-Hall, Inc. 
  • 3. Nonlinear Relationships    The relationship between the dependent variable and an independent variable may not be linear Can review the scatter diagram to check for non-linear relationships Example: Quadratic model 2 Yi = β0 + β1X1i + β 2 X1i + ε i The second independent variable is the square Statistics for the first variable of Managers Using Microsoft Excel, 4e © 2004 Chap 14-3 Prentice-Hall, Inc. 
  • 4. Quadratic Regression Model Model form: Yi = β0 + β1X1i + β 2 X + ε i 2 1i  where: β0 = Y intercept β1 = regression coefficient for linear effect of X on Y β2 = regression coefficient for quadratic effect on Y εi = random error in Y for observation i Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-4
  • 5. Linear vs. Nonlinear Fit Y Y X Linear fit does not give Statistics for Managers Using random 4e © 2004 Microsoft Excel,residuals Prentice-Hall, Inc. X residuals residuals X X  Nonlinear fit gives random residuals Chap 14-5
  • 6. Quadratic Regression Model 2 Yi = β0 + β1X1i + β 2 X1i + ε i Quadratic models may be considered when the scatter diagram takes on one of the following shapes: Y Y β1 < 0 β2 > 0 X1 Y β1 > 0 β2 > 0 X1 Y β1 < 0 β2 < 0 β1 = the Using Statistics for Managers coefficient of the linear term Microsoft Excel, β2 =© 2004 4e the coefficient of the squared term Prentice-Hall, Inc. X1 β1 > 0 β2 < 0 Chap 14-6 X1
  • 7. Testing the Overall Quadratic Model  Estimate the quadratic model to obtain the regression equation: ˆ = b +b X +b X2 Yi 0 1 1i 2 1i  Test for Overall Relationship H0: β1 = β2 = 0 (no overall relationship between X and Y) H1: β1 and/or β2 ≠ 0 (there is a relationship between X and Y) MSR MSE F test statistic = Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.  Chap 14-7
  • 8. Testing for Significance: Quadratic Effect  Testing the Quadratic Effect  Compare quadratic regression equation 2 Yi = b0 + b1X1i + b 2 X1i with the linear regression equation Yi = b0 + b1X1i Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-8
  • 9. Testing for Significance: Quadratic Effect  (continued) Testing the Quadratic Effect  Consider the quadratic regression equaiton 2 Yi = b0 + b1X1i + b 2 X1i Hypotheses  H0: β2 = 0 (The quadratic term does not improve the model)  H1: β2 ≠ 0 (The quadratic term improves the model) Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-9
  • 10. Testing for Significance: Quadratic Effect  (continued) Testing the Quadratic Effect Hypotheses  (The quadratic term does not improve the model)   H0: β2 = 0 H1: β2 ≠ 0 (The quadratic term improves the model) The test statistic is b2 − β2 t= Sb 2 Statistics for Managers Using d.f. = Microsoft Excel, 4e © 2004n − 3 Prentice-Hall, Inc. where: b2 = squared term slope coefficient β2 = hypothesized slope (zero) Sb 2= standard error of the slope Chap 14-10
  • 11. Testing for Significance: Quadratic Effect  (continued) Testing the Quadratic Effect Compare r2 from simple regression to adjusted r2 from the quadratic model  If adj. r2 from the quadratic model is larger than the r2 from the simple model, then the quadratic model is a better model Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-11
  • 12. Example: Quadratic Model 3 1 7 3 15 5 22 7 33 8 40 10 54 12 67 13 70 14 78 Purity increases as filter time increases: 2 8  15 Purity vs. Time 100 80 Purity Purity Filter Time 60 40 20 85 0 Statistics15 Managers Using0 for 87 16 Microsoft17 Excel, 4e © 2004 99 Prentice-Hall, Inc. 5 10 15 20 Time Chap 14-12
  • 13. Example: Quadratic Model  (continued) Simple regression results: Y = -11.283 + 5.985 Time ^ Coefficients Standard Error -11.28267 3.46805 -3.25332 0.30966 19.32819 t statistic, F statistic, and r2 are all high, but the residuals are not random: 0.00691 5.98520   2.078E-10 Intercept Time t Stat P-value Regression Statistics R Square 0.96888 0.96628 Standard Error Time Residual Plot Significance F 373.57904 10 2.0778E-10 6.15997 Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. 5 Residuals Adjusted R Square F 0 -5 0 5 10 15 -10 Time Chap 14-13 20
  • 14. Example: Quadratic Model  (continued) Quadratic regression results: Y = 1.539 + 1.565 Time + 0.245 (Time)2 ^ Coefficients Standard Error Intercept 1.53870 2.24465 0.68550 0.50722 Time 1.56496 0.60179 2.60052 0.02467 Time-squared 0.24516 0.03258 7.52406 1.165E-05 t Stat Time Residual Plot P-value 10 Residuals   5 0 -5 Regression Statistics 0.99494 Adjusted R Square 0.99402 Standard Error 2.368E-13 The quadratic term is significant and Statisticsthe model: adj. r2 is higher and improves for Managers Using Microsoft Excel, 4e are2004 SYX is lower, residuals © now random Prentice-Hall, Inc. 10 15 20 Time 2.59513 1080.7330 5 Significance F Time-squared Residual Plot 10 Residuals R Square F 0 5 0 -5 0 100 200 300 Time-squared Chap 14-14 400
  • 15. Using Transformations in Regression Analysis Idea:  non-linear models can often be transformed to a linear form   Can be estimated by least squares if transformed transform X or Y or both to get a better fit or to deal with violations of regression assumptions Can be based on theory, logic or scatter Statistics for Managers Using diagrams Microsoft Excel, 4e © 2004  Prentice-Hall, Inc. Chap 14-15
  • 16. The Square Root Transformation  The square-root transformation Yi = β0 + β1 X1i + ε i  Used to overcome violations of the homoscedasticity assumption  fit a non-linear relationship Statistics for Managers Using Microsoft Excel, 4e © 2004 Chap 14-16 Prentice-Hall, Inc. 
  • 17. The Square Root Transformation (continued) Yi = β0 + β1X1i + ε i  Yi = β0 + β1 X1i + ε i Shape of original relationship Y  Y X Y Statistics for Managers Using Microsoft Excel, 4e © 2004 X Prentice-Hall, Inc. Relationship when transformed b1 > 0 X Y b1 < 0 X Chap 14-17
  • 18. The Log Transformation The Multiplicative Model:  Original multiplicative model  Transformed multiplicative model log Yi = log β0 + β1 log X1i + log ε i β Yi = β0 X1i1 ε i The Exponential Model:  Original multiplicative model Yi = e β 0 +β1X1i +β 2 X 2i εi Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.  Transformed exponential model ln Yi = β0 + β1X1i + β 2 X 2i + ln ε i Chap 14-18
  • 19. Interpretation of coefficients For the multiplicative model: log Yi = log β0 + β1 log X1i + log ε i  When both dependent and independent variables are logged: The coefficient of the independent variable Xk can be interpreted as : a 1 percent change in Xk leads to an estimated bk percentage change in the average value of Y. Using Statistics for ManagersTherefore bk is the elasticity of Y with respect to a change in Xk . Microsoft Excel, 4e © 2004 Chap 14-19 Prentice-Hall, Inc. 
  • 20. Collinearity  Collinearity: High correlation exists among two or more independent variables  This means the correlated variables contribute redundant information to the multiple regression model Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-20
  • 21. Collinearity (continued)  Including two highly correlated explanatory variables can adversely affect the regression results  No new information provided  Can lead to unstable coefficients (large standard error and low t-values) Coefficient signs may not match prior expectations Statistics for Managers Using  Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-21
  • 22. Some Indications of Strong Collinearity Incorrect signs on the coefficients  Large change in the value of a previous coefficient when a new variable is added to the model  A previously significant variable becomes insignificant when a new independent variable is added  The estimate of the standard deviation of the model increases when a variable is added to Statistics for Managers Using the model  Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-22
  • 23. Detecting Collinearity (Variance Inflationary Factor) VIFj is used to measure collinearity: 1 VIFj = 2 1−R j where R2j is the coefficient of determination of variable Xj with all other X variables If VIFj > 5, Xj is highly correlated with Statistics for Managers Using the other explanatory variables Microsoft Excel, 4e © 2004 Chap 14-23 Prentice-Hall, Inc.
  • 24. Example: Pie Sales Week Pie Sales Price ($) Advertising ($100s) 1 350 5.50 3.3 2 460 7.50 3.3 3 350 8.00 3.0 4 430 8.00 4.5 5 350 6.80 3.0 6 380 7.50 4.0 7 430 4.50 3.0 8 470 6.40 3.7 9 450 7.00 3.5 10 490 5.00 4.0 11 340 7.20 3.5 12 300 7.90 3.2 13 440 5.90 4.0 Statistics for Managers Using 14 450 5.00 Microsoft Excel, 4e © 3.5 2004 15 300 2.7 Prentice-Hall, 7.00 Inc. Recall the multiple regression equation of chapter 13: Sales = b0 + b1 (Price) + b2 (Advertising) Chap 14-24
  • 25. Detect Collinearity in PHStat PHStat / regression / multiple regression … Check the “variance inflationary factor (VIF)” box Regression Analysis Output for the pie sales example: Price and all other X  Since there are only two Regression Statistics explanatory variables, only one Multiple R 0.030438 VIF is reported R Square 0.000926  VIF is < 5 Adjusted R Square -0.075925  There is no evidence of Standard Error 1.21527 Observations 15 Statistics for Managers Using collinearity between Price and Advertising VIF Microsoft Excel,1.0009272004 4e © Chap 14-25 Prentice-Hall, Inc.
  • 26. Model Building  Goal is to develop a model with the best set of independent variables    Stepwise regression procedure   Easier to interpret if unimportant variables are removed Lower probability of collinearity Provide evaluation of alternative models as variables are added Best-subset approach Try all combinations and select the best using the Statistics for Managers Using highest adjusted r2 and lowest standard error Microsoft Excel, 4e © 2004 Chap 14-26 Prentice-Hall, Inc. 
  • 27. Stepwise Regression  Idea: develop the least squares regression equation in steps, adding one explanatory variable at a time and evaluating whether existing variables should remain or be removed  The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-27
  • 28. Best Subsets Regression  Idea: estimate all possible regression equations using all possible combinations of independent variables  Choose the best fit by looking for the highest adjusted r2 and lowest standard error Stepwise regression and best subsets regression can be performed using PHStat Statistics for Managers Using Microsoft Excel, 4e © 2004 Chap 14-28 Prentice-Hall, Inc.
  • 29. Alternative Best Subsets Criterion  Calculate the value Cp for each potential regression model  Consider models with Cp values close to or below k + 1 k is the number of independent variables in the model under consideration Statistics for Managers Using Microsoft Excel, 4e © 2004 Chap 14-29 Prentice-Hall, Inc. 
  • 30. Alternative Best Subsets Criterion  (continued) The Cp Statistic 2 (1 − Rk )(n − T ) Cp = − (n − 2(k + 1)) 2 1 − RT Where k = number of independent variables included in a particular regression model T = total number of parameters to be estimated in the full regression model 2 Rk = coefficient of multiple determination for model with k independent variables 2 R T = coefficient of Statistics for Managers Using multiple determination for full model with Microsoft Excel, 4e © 2004 estimated parameters all T Prentice-Hall, Inc. Chap 14-30
  • 31. 10 Steps in Model Building 1. Choose explanatory variables to include in the model 2. Estimate full model and check VIFs 3. Check if any VIFs > 5 4.    If no VIF > 5, go to step 5 If one VIF > 5, remove this variable If more than one, eliminate the variable with the highest VIF and go back to step 2 5. Perform best subsets regression with Statistics for Managers Using remaining variables … Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-31
  • 32. 10 Steps in Model Building (continued) 6. List all models with Cp close to or less than (k + 1) 7. Choose the best model   Consider parsimony Do extra variable make a significant contribution? 8. Perform complete analysis with chosen model, including residual analysis 9. Transform the model if necessary to deal with violations of linearity or other model assumptions Statistics for Managers Using 10. Excel, 4e © 2004 Microsoft Use the model for prediction Prentice-Hall, Inc. Chap 14-32
  • 33. Model Building Flowchart Choose X1,X2,…Xk Run regression to find VIFs Any VIF>5? Yes Remove variable with highest VIF Yes More than one? No Run subsets regression to obtain “best” models in terms of Cp Do complete analysis Add quadratic term and/or transform variables as indicated No Statistics for Managers Using Remove Microsoft Excel, 4e © this X 2004 Prentice-Hall, Inc. Perform predictions Chap 14-33
  • 34. Pitfalls and Ethical Considerations To avoid pitfalls and address ethical considerations:  Understand that interpretation of the estimated regression coefficients are performed holding all other independent variables constant  Evaluate residual plots for each independent variable  Evaluate interaction terms Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-34
  • 35. Additional Pitfalls and Ethical Considerations (continued) To avoid pitfalls and address ethical considerations:  Obtain VIFs for each independent variable before determining which variables should be included in the model  Examine several alternative models using bestsubsets regression Use other methods when the assumptions necessary for least-squares regression have been seriously violated Statistics for Managers Using  Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-35
  • 36. Chapter Summary   Developed the quadratic regression model Discussed using transformations in regression models     The multiplicative model The exponential model Described collinearity Discussed model building   Stepwise regression Best subsets  Addressed pitfalls Statistics for Managers Usingin multiple regression and ethical 4e © 2004 Microsoft Excel,considerations Chap 14-36 Prentice-Hall, Inc.

Editor's Notes

  1. {}