Statistics for Managers
Using Microsoft® Excel
4th Edition
Chapter 14
Multiple Regression Model Building

Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc.

Chapter Goals
After completing this chapter, you should be able to:
• use quadratic terms in a regression model
• use transformed variables in a regression model
• measure the correlation among the independent variables
• build a regression model using either the stepwise or best-subsets approach
• explain the pitfalls involved in developing a multiple regression model

Nonlinear Relationships






• The relationship between the dependent variable and an independent variable may not be linear
• Review the scatter diagram to check for non-linear relationships
• Example: Quadratic model

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$$

• The second independent variable is the square of the first variable

Quadratic Regression Model
Model form:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$$

where:
β0 = Y intercept
β1 = regression coefficient for linear effect of X on Y
β2 = regression coefficient for quadratic effect of X on Y
εi = random error in Y for observation i

Linear vs. Nonlinear Fit
[Figure: two scatter plots of Y against X, each with its residual plot (residuals against X) below it. Left: linear fit does not give random residuals. Right: nonlinear fit gives random residuals.]

Quadratic Regression Model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$$

Quadratic models may be considered when the scatter diagram takes on one of the following shapes:

[Figure: four panels of Y against X1, one for each sign combination: β1 < 0, β2 > 0; β1 > 0, β2 > 0; β1 < 0, β2 < 0; β1 > 0, β2 < 0]

β1 = the coefficient of the linear term
β2 = the coefficient of the squared term

Testing the Overall
Quadratic Model


• Estimate the quadratic model to obtain the regression equation:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$$

• Test for Overall Relationship
H0: β1 = β2 = 0 (no overall relationship between X and Y)
H1: β1 and/or β2 ≠ 0 (there is a relationship between X and Y)

• F test statistic:

$$F = \frac{MSR}{MSE}$$

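As a rough illustration of the overall test, the sketch below computes F = MSR/MSE from the regression and error sums of squares. The function name `overall_f` and the sums of squares in the example are hypothetical and are not taken from the slides. It assumes k slope terms (2 for the quadratic model), so the error degrees of freedom are n − k − 1.

```python
def overall_f(ssr, sse, n, k):
    """Overall F statistic for a regression with k slope terms and n observations."""
    msr = ssr / k              # mean square due to regression
    mse = sse / (n - k - 1)    # mean square error
    return msr / mse


# Hypothetical sums of squares for a quadratic model (k = 2) fitted to n = 13 points
f_stat = overall_f(ssr=90.0, sse=10.0, n=13, k=2)
print(f_stat)  # 45.0
```

The statistic is then compared with the F distribution with k and n − k − 1 degrees of freedom.
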
Testing for Significance:
Quadratic Effect


• Testing the Quadratic Effect

Compare the quadratic regression equation

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$$

with the linear regression equation

$$\hat{Y}_i = b_0 + b_1 X_{1i}$$

Testing for Significance:
Quadratic Effect


(continued)

• Testing the Quadratic Effect

Consider the quadratic regression equation

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$$

Hypotheses
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)

Testing for Significance:
Quadratic Effect


(continued)

• Testing the Quadratic Effect

Hypotheses
H0: β2 = 0 (The quadratic term does not improve the model)
H1: β2 ≠ 0 (The quadratic term improves the model)

The test statistic is

$$t = \frac{b_2 - \beta_2}{S_{b_2}}, \qquad \text{d.f.} = n - 3$$

where:
b2 = squared term slope coefficient
β2 = hypothesized slope (zero)
S_{b2} = standard error of the slope

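A minimal sketch of this t test, using the Time-squared coefficient and standard error reported in the chapter's filter-time example (14 observations). The function name `quadratic_t` is hypothetical, and the critical value 2.201 is the usual two-tail 0.05 t-table entry for 11 degrees of freedom, which is an assumption added here rather than something stated on the slides.

```python
def quadratic_t(b2, s_b2, beta2_h0=0.0):
    """t statistic for H0: beta2 = beta2_h0 (zero by default)."""
    return (b2 - beta2_h0) / s_b2


n = 14                          # observations in the filter-time example
b2, s_b2 = 0.24516, 0.03258     # Time-squared coefficient and its standard error
t_stat = quadratic_t(b2, s_b2)
df = n - 3                      # degrees of freedom for the quadratic model

T_CRIT_05 = 2.201               # assumed: two-tail 0.05 critical t value, 11 d.f.
reject_h0 = abs(t_stat) > T_CRIT_05
print(round(t_stat, 2), df, reject_h0)
```

Because the rounded inputs are used, the result agrees with the t Stat in the regression output only to about two decimal places.
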
Testing for Significance:
Quadratic Effect


(continued)

• Testing the Quadratic Effect

Compare r² from simple regression to adjusted r² from the quadratic model

• If adj. r² from the quadratic model is larger than the r² from the simple model, then the quadratic model is a better model

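The comparison can be sketched numerically. The block below assumes the standard adjusted r² formula, 1 − (1 − r²)(n − 1)/(n − k − 1), which is not written out on these slides, and applies it to the R Square values reported in the chapter's filter-time example (n = 14). The function name `adjusted_r2` is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r-squared for a model with k slope terms fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 14
r2_linear = 0.96888       # R Square, simple regression on Time
r2_quadratic = 0.99494    # R Square, regression on Time and Time-squared

adj_linear = adjusted_r2(r2_linear, n, k=1)
adj_quadratic = adjusted_r2(r2_quadratic, n, k=2)
print(round(adj_linear, 5), round(adj_quadratic, 5))
print(adj_quadratic > r2_linear)   # the slide's criterion
```

The adjustment penalizes the extra term, so the quadratic model has to improve the fit by more than chance to come out ahead.
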
Example: Quadratic Model
Purity increases as filter time increases:

Purity    Filter Time
3         1
7         2
8         3
15        5
22        7
33        8
40        10
54        12
67        13
70        14
78        15
85        15
87        16
99        17

[Figure: scatter plot "Purity vs. Time", with Purity (0 to 100) on the vertical axis and Time (0 to 20) on the horizontal axis]

Example: Quadratic Model


(continued)

Simple regression results:

$$\hat{Y} = -11.283 + 5.985\,\text{Time}$$

             Coefficients   Standard Error   t Stat     P-value
Intercept    -11.28267      3.46805          -3.25332   0.00691
Time         5.98520        0.30966          19.32819   2.078E-10

Regression Statistics
R Square             0.96888
Adjusted R Square    0.96628
Standard Error       6.15997

F                    373.57904
Significance F       2.0778E-10

The t statistic, F statistic, and r² are all high, but the residuals are not random:

[Figure: Time residual plot, residuals (-10 to 10) against Time (0 to 20)]

Example: Quadratic Model


(continued)

Quadratic regression results:

$$\hat{Y} = 1.539 + 1.565\,\text{Time} + 0.245\,(\text{Time})^2$$

               Coefficients   Standard Error   t Stat    P-value
Intercept      1.53870        2.24465          0.68550   0.50722
Time           1.56496        0.60179          2.60052   0.02467
Time-squared   0.24516        0.03258          7.52406   1.165E-05

Regression Statistics
R Square             0.99494
Adjusted R Square    0.99402
Standard Error       2.59513

F                    1080.7330
Significance F       2.368E-13

The quadratic term is significant and improves the model: adj. r² is higher and SYX is lower, and the residuals are now random

[Figure: Time residual plot (residuals against Time, 0 to 20) and Time-squared residual plot (residuals against Time-squared, 0 to 400)]

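The two fits can be reproduced outside Excel. The sketch below is a plain-Python stand-in for the spreadsheet's regression tool, not the chapter's method: it solves the normal equations with a small Gaussian-elimination helper. The helper names (`solve`, `least_squares`, `r_squared`) are hypothetical, and the data are the 14 purity and filter-time pairs from the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def design_rows(columns):
    """One row per observation: a leading 1 for the intercept, then each predictor."""
    return [[1.0] + list(vals) for vals in zip(*columns)]


def least_squares(columns, y):
    """Coefficients [b0, b1, ...] from the normal equations (X'X) b = X'y."""
    rows = design_rows(columns)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(columns, y, coefs):
    """Share of the variation in y explained by the fitted model."""
    fitted = [sum(b * v for b, v in zip(coefs, r)) for r in design_rows(columns)]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


time = [1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17]
purity = [3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99]
time_sq = [t * t for t in time]

linear = least_squares([time], purity)
quadratic = least_squares([time, time_sq], purity)
r2_linear = r_squared([time], purity, linear)
r2_quadratic = r_squared([time, time_sq], purity, quadratic)

print([round(b, 3) for b in linear], round(r2_linear, 5))
print([round(b, 3) for b in quadratic], round(r2_quadratic, 5))
```

Solving the normal equations directly is adequate for a problem this small; for larger or badly conditioned designs a QR-based routine would be the safer choice.
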
Using Transformations in
Regression Analysis
Idea:
• Non-linear models can often be transformed to a linear form
  - Can be estimated by least squares if transformed
• Transform X or Y or both to get a better fit or to deal with violations of regression assumptions
• Can be based on theory, logic or scatter diagrams

The Square Root Transformation


• The square-root transformation

$$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \varepsilon_i$$

• Used to
  - overcome violations of the homoscedasticity assumption
  - fit a non-linear relationship


The Square Root Transformation
(continued)
$$Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$$

$$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \varepsilon_i$$

[Figure: for b1 > 0 and for b1 < 0, a panel of Y against X showing the shape of the original relationship beside a panel showing the relationship when transformed]

The Log Transformation
The Multiplicative Model:

• Original multiplicative model

$$Y_i = \beta_0 X_{1i}^{\beta_1} \varepsilon_i$$

• Transformed multiplicative model

$$\log Y_i = \log \beta_0 + \beta_1 \log X_{1i} + \log \varepsilon_i$$

The Exponential Model:

• Original exponential model

$$Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$$

• Transformed exponential model

$$\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$$

Interpretation of coefficients
For the multiplicative model:

$$\log Y_i = \log \beta_0 + \beta_1 \log X_{1i} + \log \varepsilon_i$$

• When both dependent and independent variables are logged:
  - The coefficient of the independent variable Xk can be interpreted as follows: a 1 percent change in Xk leads to an estimated bk percent change in the average value of Y. Therefore bk is the elasticity of Y with respect to a change in Xk.

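To make the elasticity reading concrete, the sketch below fits the log-log form to hypothetical, noise-free data generated from Y = 3·X², so the slope on log X should come back as 2 and the back-transformed intercept as 3. The data and the helper `simple_fit` are illustrative assumptions. Base-10 logs are used here; the slope does not depend on the base.

```python
import math


def simple_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical multiplicative data: Y = 3 * X**2 with no error term
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 2 for xi in x]

log_b0, b1 = simple_fit([math.log10(v) for v in x], [math.log10(v) for v in y])
b0 = 10 ** log_b0   # undo the log to recover the multiplicative constant

print(round(b1, 6), round(b0, 6))
```

Here b1 = 2 reads as: a 1 percent change in X goes with roughly a 2 percent change in Y.
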

Collinearity


Collinearity: High correlation exists among two
or more independent variables



This means the correlated variables contribute
redundant information to the multiple regression
model

Collinearity
(continued)


• Including two highly correlated explanatory variables can adversely affect the regression results
  - No new information provided
  - Can lead to unstable coefficients (large standard error and low t-values)
  - Coefficient signs may not match prior expectations

Some Indications of
Strong Collinearity
• Incorrect signs on the coefficients
• Large change in the value of a previous coefficient when a new variable is added to the model
• A previously significant variable becomes insignificant when a new independent variable is added
• The estimate of the standard deviation of the model increases when a variable is added to the model

Detecting Collinearity
(Variance Inflationary Factor)
VIFj is used to measure collinearity:

$$VIF_j = \frac{1}{1 - R_j^2}$$

where R_j^2 is the coefficient of determination from regressing variable Xj on all the other X variables

If VIFj > 5, Xj is highly correlated with the other explanatory variables

Example: Pie Sales
Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Recall the multiple regression equation of chapter 13:

Sales = b0 + b1 (Price) + b2 (Advertising)

Detect Collinearity in PHStat
PHStat / regression / multiple regression …
Check the “variance inflationary factor (VIF)” box

Output for the pie sales example:

Regression Analysis: Price and all other X

Regression Statistics
Multiple R           0.030438
R Square             0.000926
Adjusted R Square    -0.075925
Standard Error       1.21527
Observations         15
VIF                  1.000927

• Since there are only two explanatory variables, only one VIF is reported
• VIF is < 5
• There is no evidence of collinearity between Price and Advertising

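The reported VIF can be checked by hand. With only two explanatory variables, the R² from regressing Price on Advertising is the squared correlation between them. The sketch below uses that shortcut on the pie sales data, so it does not generalize to three or more predictors as written. The function names are hypothetical.

```python
def r2_one_predictor(x, z):
    """R-squared from regressing x on the single predictor z (squared correlation)."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    return sxz ** 2 / (sxx * szz)


def vif(r2_j):
    """Variance inflationary factor: 1 / (1 - R_j squared)."""
    return 1.0 / (1.0 - r2_j)


price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
         7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
               3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

r2_price = r2_one_predictor(price, advertising)
vif_price = vif(r2_price)
print(round(r2_price, 6), round(vif_price, 6))
print(vif_price > 5)   # False: no evidence of collinearity
```
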
Model Building


• Goal is to develop a model with the best set of independent variables
  - Easier to interpret if unimportant variables are removed
  - Lower probability of collinearity
• Stepwise regression procedure
  - Provide evaluation of alternative models as variables are added
• Best-subset approach
  - Try all combinations and select the best using the highest adjusted r² and lowest standard error


Stepwise Regression


Idea: develop the least squares regression
equation in steps, adding one explanatory
variable at a time and evaluating whether
existing variables should remain or be removed



The coefficient of partial determination is the
measure of the marginal contribution of each
independent variable, given that other
independent variables are in the model

Best Subsets Regression


Idea: estimate all possible regression equations
using all possible combinations of independent
variables



Choose the best fit by looking for the highest
adjusted r2 and lowest standard error

Stepwise regression and best subsets
regression can be performed using PHStat
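A minimal sketch of the best-subsets idea, applied to the chapter's pie sales data: every combination of Price and Advertising is fitted, and the models are ranked by adjusted r². This is a plain-Python stand-in for PHStat, not its implementation. The helper names (`solve`, `fit_stats`) are hypothetical, and the fit solves the normal equations directly.

```python
from itertools import combinations
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_stats(columns, y):
    """Least-squares fit with intercept; returns r2, adjusted r2 and standard error."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(coefs, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    n, k = len(y), p - 1
    r2 = 1 - sse / sst
    return {"r2": r2,
            "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
            "std_error": sqrt(sse / (n - k - 1))}


sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
candidates = {
    "Price": [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
              7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00],
    "Advertising": [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
                    3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7],
}

# Fit every non-empty subset of the candidate variables
results = []
for size in range(1, len(candidates) + 1):
    for names in combinations(candidates, size):
        stats = fit_stats([candidates[name] for name in names], sales)
        results.append({"vars": names, **stats})

# Highest adjusted r2 first
results.sort(key=lambda r: r["adj_r2"], reverse=True)
for r in results:
    print(r["vars"], round(r["r2"], 4), round(r["adj_r2"], 4), round(r["std_error"], 2))
```

With m candidate variables there are 2^m − 1 subsets to fit, so exhaustive search is only practical for a modest number of candidates.
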
Alternative Best Subsets
Criterion


Calculate the value Cp for each potential
regression model



Consider models with Cp values close to or
below k + 1

k is the number of independent variables in the
model under consideration

Alternative Best Subsets
Criterion


(continued)

The Cp Statistic

$$C_p = \frac{(1 - R_k^2)(n - T)}{1 - R_T^2} - (n - 2(k + 1))$$

Where
k = number of independent variables included in a particular regression model
T = total number of parameters to be estimated in the full regression model
R_k^2 = coefficient of multiple determination for model with k independent variables
R_T^2 = coefficient of multiple determination for full model with all T estimated parameters

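The Cp formula translates directly into code. In the sketch below the function name `cp_statistic` and all numeric inputs are hypothetical. One useful check is that the full model, where k + 1 = T and the two R² values coincide, always gives Cp = k + 1.

```python
def cp_statistic(r2_k, r2_full, n, k, t):
    """Cp for a model with k independent variables, given the full model with t parameters."""
    return (1 - r2_k) * (n - t) / (1 - r2_full) - (n - 2 * (k + 1))


# Hypothetical setting: n = 30 observations, full model with 4 variables (T = 5)
n, t, r2_full = 30, 5, 0.80

cp_full = cp_statistic(r2_k=r2_full, r2_full=r2_full, n=n, k=4, t=t)
cp_reduced = cp_statistic(r2_k=0.78, r2_full=r2_full, n=n, k=2, t=t)

print(round(cp_full, 6))      # equals k + 1 = 5 for the full model
print(round(cp_reduced, 6))   # compare with k + 1 = 3 for the two-variable model
```

In this made-up case the reduced model's Cp of 3.5 is close to k + 1 = 3, so it would stay on the list of models to consider.
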
10 Steps in Model Building
1. Choose explanatory variables to include in the model
2. Estimate full model and check VIFs
3. Check if any VIFs > 5
4.
   - If no VIF > 5, go to step 5
   - If one VIF > 5, remove this variable
   - If more than one, eliminate the variable with the highest VIF and go back to step 2
5. Perform best subsets regression with remaining variables …

10 Steps in Model Building
(continued)

6. List all models with Cp close to or less than (k + 1)
7. Choose the best model
   - Consider parsimony
   - Do extra variables make a significant contribution?
8. Perform complete analysis with chosen model, including residual analysis
9. Transform the model if necessary to deal with violations of linearity or other model assumptions
10. Use the model for prediction

Model Building Flowchart
[Flowchart]
Choose X1, X2, …, Xk
Run regression to find VIFs
Any VIF > 5?
  Yes: More than one?
    Yes: Remove variable with highest VIF
    No: Remove this X
  No: Run subsets regression to obtain “best” models in terms of Cp
Do complete analysis
Add quadratic term and/or transform variables as indicated
Perform predictions

Pitfalls and Ethical
Considerations
To avoid pitfalls and address ethical considerations:

• Understand that the estimated regression coefficients are interpreted holding all other independent variables constant
• Evaluate residual plots for each independent variable
• Evaluate interaction terms

Additional Pitfalls
and Ethical Considerations

(continued)

To avoid pitfalls and address ethical considerations:

• Obtain VIFs for each independent variable before determining which variables should be included in the model
• Examine several alternative models using best-subsets regression
• Use other methods when the assumptions necessary for least-squares regression have been seriously violated

Chapter Summary



• Developed the quadratic regression model
• Discussed using transformations in regression models
  - The multiplicative model
  - The exponential model
• Described collinearity
• Discussed model building
  - Stepwise regression
  - Best subsets
• Addressed pitfalls in multiple regression and ethical considerations

More Related Content

What's hot

Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
Kemal İnciroğlu
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
bmcfad01
 
Simple regression model
Simple regression modelSimple regression model
Simple regression model
MidoTami
 
ECONOMETRICS PROJECT PG2 2015
ECONOMETRICS PROJECT PG2 2015ECONOMETRICS PROJECT PG2 2015
ECONOMETRICS PROJECT PG2 2015
Sayantan Baidya
 

What's hot (20)

Bbs11 ppt ch10
Bbs11 ppt ch10Bbs11 ppt ch10
Bbs11 ppt ch10
 
Hypothesis tests for one and two population variances ppt @ bec doms
Hypothesis tests for one and two population variances ppt @ bec domsHypothesis tests for one and two population variances ppt @ bec doms
Hypothesis tests for one and two population variances ppt @ bec doms
 
Some Important Discrete Probability Distributions
Some Important Discrete Probability DistributionsSome Important Discrete Probability Distributions
Some Important Discrete Probability Distributions
 
Basic Probability
Basic ProbabilityBasic Probability
Basic Probability
 
Correlation
CorrelationCorrelation
Correlation
 
Linear regression
Linear regressionLinear regression
Linear regression
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 
Multiple regression
Multiple regressionMultiple regression
Multiple regression
 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
 
Exponential probability distribution
Exponential probability distributionExponential probability distribution
Exponential probability distribution
 
Econometrics and business forecasting
Econometrics and business forecastingEconometrics and business forecasting
Econometrics and business forecasting
 
Chapter 11
Chapter 11Chapter 11
Chapter 11
 
Chap03 numerical descriptive measures
Chap03 numerical descriptive measuresChap03 numerical descriptive measures
Chap03 numerical descriptive measures
 
Chap15 time series forecasting & index number
Chap15 time series forecasting & index numberChap15 time series forecasting & index number
Chap15 time series forecasting & index number
 
Simple regression model
Simple regression modelSimple regression model
Simple regression model
 
Multiple Regression
Multiple RegressionMultiple Regression
Multiple Regression
 
Chap08 fundamentals of hypothesis
Chap08 fundamentals of  hypothesisChap08 fundamentals of  hypothesis
Chap08 fundamentals of hypothesis
 
Malhotra17
Malhotra17Malhotra17
Malhotra17
 
ECONOMETRICS PROJECT PG2 2015
ECONOMETRICS PROJECT PG2 2015ECONOMETRICS PROJECT PG2 2015
ECONOMETRICS PROJECT PG2 2015
 
Two-sample Hypothesis Tests
Two-sample Hypothesis Tests Two-sample Hypothesis Tests
Two-sample Hypothesis Tests
 

Viewers also liked

Linear regression
Linear regressionLinear regression
Linear regression
Tech_MX
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
Harsh Upadhyay
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Ex
mailund
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
mailund
 
Ch1 2 index number
Ch1 2 index numberCh1 2 index number
Ch1 2 index number
Jay Tanna
 

Viewers also liked (20)

Chap12 simple regression
Chap12 simple regressionChap12 simple regression
Chap12 simple regression
 
Chap13 intro to multiple regression
Chap13 intro to multiple regressionChap13 intro to multiple regression
Chap13 intro to multiple regression
 
Chap17 statistical applications on management
Chap17 statistical applications on managementChap17 statistical applications on management
Chap17 statistical applications on management
 
Chap16 decision making
Chap16 decision makingChap16 decision making
Chap16 decision making
 
Bbs10 ppt ch16
Bbs10 ppt ch16Bbs10 ppt ch16
Bbs10 ppt ch16
 
Linear regression
Linear regressionLinear regression
Linear regression
 
Simple linear regression (final)
Simple linear regression (final)Simple linear regression (final)
Simple linear regression (final)
 
Linear Regression Ex
Linear Regression ExLinear Regression Ex
Linear Regression Ex
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
Ch1 2 index number
Ch1 2 index numberCh1 2 index number
Ch1 2 index number
 
Advertising
AdvertisingAdvertising
Advertising
 
Clr model
Clr modelClr model
Clr model
 
Chap04 basic probability
Chap04 basic probabilityChap04 basic probability
Chap04 basic probability
 
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
CMMI High Maturity Best Practices HMBP 2010: Process Performance Models:Not N...
 
Chap01 intro & data collection
Chap01 intro & data collectionChap01 intro & data collection
Chap01 intro & data collection
 
Time series, forecasting, and index numbers
Time series, forecasting, and index numbersTime series, forecasting, and index numbers
Time series, forecasting, and index numbers
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
Experimental Methods in Mass Communication Research
Experimental Methods in Mass Communication ResearchExperimental Methods in Mass Communication Research
Experimental Methods in Mass Communication Research
 
Business Statistics Chapter 9
Business Statistics Chapter 9Business Statistics Chapter 9
Business Statistics Chapter 9
 
Business Statistics Chapter 8
Business Statistics Chapter 8Business Statistics Chapter 8
Business Statistics Chapter 8
 

Similar to Chap14 multiple regression model building

Quantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docxQuantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
hildredzr1di
 
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCELUSING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
IslamBarakat18
 

Similar to Chap14 multiple regression model building (20)

multiple regression model building
 multiple regression model building multiple regression model building
multiple regression model building
 
chap12.pptx
chap12.pptxchap12.pptx
chap12.pptx
 
Introduction to Multiple Regression
Introduction to Multiple RegressionIntroduction to Multiple Regression
Introduction to Multiple Regression
 
Chapter 4 power point presentation Regression models
Chapter 4 power point presentation Regression modelsChapter 4 power point presentation Regression models
Chapter 4 power point presentation Regression models
 
Chap10 anova
Chap10 anovaChap10 anova
Chap10 anova
 
Analysis of Variance
Analysis of VarianceAnalysis of Variance
Analysis of Variance
 
Statistical Applications in Quality and Productivity Management
Statistical Applications in Quality and Productivity ManagementStatistical Applications in Quality and Productivity Management
Statistical Applications in Quality and Productivity Management
 
Chi square using excel
Chi square using excelChi square using excel
Chi square using excel
 
Chap06 normal distributions & continous
Chap06 normal distributions & continousChap06 normal distributions & continous
Chap06 normal distributions & continous
 
04 regression
04 regression04 regression
04 regression
 
Numerical Descriptive Measures
Numerical Descriptive MeasuresNumerical Descriptive Measures
Numerical Descriptive Measures
 
Newbold_chap14.ppt
Newbold_chap14.pptNewbold_chap14.ppt
Newbold_chap14.ppt
 
Time Series Forecasting and Index Numbers
Time Series Forecasting and Index NumbersTime Series Forecasting and Index Numbers
Time Series Forecasting and Index Numbers
 
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docxQuantitative Analysis for ManagementThirteenth EditionLesson.docx
Quantitative Analysis for ManagementThirteenth EditionLesson.docx
 
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCELUSING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
USING ADVANCED FUNCTIONS AND CONITIONAL FORMATTING EXCEL
 
Fundamentals of Programming Chapter 5
Fundamentals of Programming Chapter 5Fundamentals of Programming Chapter 5
Fundamentals of Programming Chapter 5
 
Rsh qam11 ch04 ge
Rsh qam11 ch04 geRsh qam11 ch04 ge
Rsh qam11 ch04 ge
 
Multiple Regression.ppt
Multiple Regression.pptMultiple Regression.ppt
Multiple Regression.ppt
 
Msb12e ppt ch11
Msb12e ppt ch11Msb12e ppt ch11
Msb12e ppt ch11
 
GlobalLogic Machine Learning Webinar “Statistical learning of linear regressi...
GlobalLogic Machine Learning Webinar “Statistical learning of linear regressi...GlobalLogic Machine Learning Webinar “Statistical learning of linear regressi...
GlobalLogic Machine Learning Webinar “Statistical learning of linear regressi...
 

More from Uni Azza Aunillah (12)

Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
Equation
EquationEquation
Equation
 
Saluran distribusi
Saluran distribusiSaluran distribusi
Saluran distribusi
 
Mengintegrasikan teori
Mengintegrasikan teoriMengintegrasikan teori
Mengintegrasikan teori
 
Kepemimpinen Dahlan Iskan
Kepemimpinen Dahlan IskanKepemimpinen Dahlan Iskan
Kepemimpinen Dahlan Iskan
 
presentation listening and summarize
presentation listening and summarizepresentation listening and summarize
presentation listening and summarize
 
Perbankan syariah
Perbankan syariahPerbankan syariah
Perbankan syariah
 
Uu bi no 3 tahun 2004
Uu bi no 3 tahun 2004Uu bi no 3 tahun 2004
Uu bi no 3 tahun 2004
 
Modul ekonomi moneter
Modul ekonomi moneterModul ekonomi moneter
Modul ekonomi moneter
 
Teori manajemen klasik
Teori manajemen klasikTeori manajemen klasik
Teori manajemen klasik
 
Proses pengawasan dalam manajemen
Proses pengawasan dalam manajemenProses pengawasan dalam manajemen
Proses pengawasan dalam manajemen
 
Etika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksiEtika bisnis dalam lingkungan produksi
Etika bisnis dalam lingkungan produksi
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 

Chap14 multiple regression model building

  • 1. Statistics for Managers Using Microsoft® Excel 4th Edition Chapter 14 Multiple Regression Model Building Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 14-1
  • 2. Chapter Goals After completing this chapter, you should be able to: use quadratic terms in a regression model  use transformed variables in a regression model  measure the correlation among the independent variables  build a regression model using either the stepwise or best-subsets approach  explain the pitfalls involved in developing a multiple regression model Statistics for Managers Using Microsoft Excel, 4e © 2004 Chap 14-2 Prentice-Hall, Inc. 
  • 3. Nonlinear Relationships: The relationship between the dependent variable and an independent variable may not be linear. Review the scatter diagram to check for nonlinear relationships. Example: the quadratic model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$. The second independent variable is the square of the first variable.
  • 4. Quadratic Regression Model. Model form: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$, where: $\beta_0$ = Y intercept; $\beta_1$ = regression coefficient for the linear effect of X on Y; $\beta_2$ = regression coefficient for the quadratic effect on Y; $\varepsilon_i$ = random error in Y for observation i.
  • 5. Linear vs. Nonlinear Fit. [Figure: two scatter plots of Y against X, one with a linear fit and one with a nonlinear fit, each above its residual plot.] The linear fit does not give random residuals; the nonlinear fit gives random residuals.
  • 6. Quadratic Regression Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$. Quadratic models may be considered when the scatter diagram takes on one of the following shapes. [Figure: four plots of Y against X1, one for each sign combination: $\beta_1 < 0, \beta_2 > 0$; $\beta_1 > 0, \beta_2 > 0$; $\beta_1 < 0, \beta_2 < 0$; $\beta_1 > 0, \beta_2 < 0$.] $\beta_1$ = the coefficient of the linear term; $\beta_2$ = the coefficient of the squared term.
  • 7. Testing the Overall Quadratic Model. Estimate the quadratic model to obtain the regression equation: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$. Test for overall relationship: H0: $\beta_1 = \beta_2 = 0$ (no overall relationship between X and Y); H1: $\beta_1$ and/or $\beta_2 \neq 0$ (there is a relationship between X and Y). F test statistic: $F = \mathrm{MSR} / \mathrm{MSE}$.
  • 8. Testing for Significance: Quadratic Effect. Testing the quadratic effect: compare the quadratic regression equation $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$ with the linear regression equation $\hat{Y}_i = b_0 + b_1 X_{1i}$.
  • 9. Testing for Significance: Quadratic Effect (continued). Testing the quadratic effect: consider the quadratic regression equation $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$. Hypotheses: H0: $\beta_2 = 0$ (the quadratic term does not improve the model); H1: $\beta_2 \neq 0$ (the quadratic term improves the model).
  • 10. Testing for Significance: Quadratic Effect (continued). Hypotheses: H0: $\beta_2 = 0$ (the quadratic term does not improve the model); H1: $\beta_2 \neq 0$ (the quadratic term improves the model). The test statistic is $t = \frac{b_2 - \beta_2}{S_{b_2}}$ with d.f. = $n - 3$, where: $b_2$ = squared term slope coefficient; $\beta_2$ = hypothesized slope (zero); $S_{b_2}$ = standard error of the slope.
  • 11. Testing for Significance: Quadratic Effect (continued). Compare $r^2$ from the simple regression to the adjusted $r^2$ from the quadratic model. If the adjusted $r^2$ from the quadratic model is larger than the $r^2$ from the simple model, then the quadratic model is a better model.
  • 12. Example: Quadratic Model. Purity increases as filter time increases:
      Purity   Filter Time
      3        1
      7        2
      8        3
      15       5
      22       7
      33       8
      40       10
      54       12
      67       13
      70       14
      78       15
      85       15
      87       16
      99       17
      [Figure: scatter plot "Purity vs. Time", Purity (0 to 100) against Time (0 to 20).]
  • 13. Example: Quadratic Model (continued). Simple regression results: $\hat{Y} = -11.283 + 5.985\,\text{Time}$
                  Coefficients   Standard Error   t Stat     P-value
      Intercept   -11.28267      3.46805          -3.25332   0.00691
      Time        5.98520        0.30966          19.32819   2.078E-10
      Regression statistics: R Square 0.96888; Adjusted R Square 0.96628; Standard Error 6.15997; F 373.57904; Significance F 2.0778E-10.
      The t statistic, F statistic, and $r^2$ are all high, but the residuals are not random. [Figure: Time residual plot, Residuals against Time.]
  • 14. Example: Quadratic Model (continued). Quadratic regression results: $\hat{Y} = 1.539 + 1.565\,\text{Time} + 0.245\,(\text{Time})^2$
                     Coefficients   Standard Error   t Stat    P-value
      Intercept      1.53870        2.24465          0.68550   0.50722
      Time           1.56496        0.60179          2.60052   0.02467
      Time-squared   0.24516        0.03258          7.52406   1.165E-05
      Regression statistics: R Square 0.99494; Adjusted R Square 0.99402; Standard Error 2.59513; F 1080.7330; Significance F 2.368E-13.
      The quadratic term is significant and improves the model: the adjusted $r^2$ is higher, $S_{YX}$ is lower, and the residuals are now random. [Figures: Time residual plot and Time-squared residual plot.]
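The comparison in slides 12 to 14 can be reproduced outside Excel. The following is a minimal sketch in plain Python, assuming the purity and filter-time values are as read from the slide 12 table; the helper names `solve_linear_system` and `fit_least_squares` are illustrative and not part of the original deck or of PHStat.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(design, y):
    """Return (coefficients, r2, adjusted r2) from the normal equations."""
    n, p = len(y), len(design[0])  # p counts the intercept column too
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return coef, r2, adj_r2


# Data assumed from the slide 12 table (14 observations).
time = [1, 2, 3, 5, 7, 8, 10, 12, 13, 14, 15, 15, 16, 17]
purity = [3, 7, 8, 15, 22, 33, 40, 54, 67, 70, 78, 85, 87, 99]

linear_coef, linear_r2, linear_adj_r2 = fit_least_squares(
    [[1.0, t] for t in time], purity)
quad_coef, quad_r2, quad_adj_r2 = fit_least_squares(
    [[1.0, t, t * t] for t in time], purity)

print("linear:   ", [round(c, 5) for c in linear_coef], round(linear_adj_r2, 5))
print("quadratic:", [round(c, 5) for c in quad_coef], round(quad_adj_r2, 5))
```

Under these assumptions the fitted coefficients should agree with the Excel output on slides 13 and 14, and the adjusted $r^2$ of the quadratic model should exceed that of the linear model.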
  • 15. Using Transformations in Regression Analysis. Idea: nonlinear models can often be transformed to a linear form, and can then be estimated by least squares. Transform X or Y or both to get a better fit or to deal with violations of regression assumptions. The choice of transformation can be based on theory, logic or scatter diagrams.
  • 16. The Square Root Transformation: $Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \varepsilon_i$. Used to overcome violations of the homoscedasticity assumption and to fit a nonlinear relationship.
  • 17. The Square Root Transformation (continued). Original relationship: $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$; relationship when transformed: $Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \varepsilon_i$. [Figure: plots of Y against X showing the shape of the original relationship and of the transformed relationship, for $b_1 > 0$ and $b_1 < 0$.]
  • 18. The Log Transformation. The multiplicative model: original multiplicative model $Y_i = \beta_0 X_{1i}^{\beta_1} \varepsilon_i$; transformed multiplicative model $\log Y_i = \log \beta_0 + \beta_1 \log X_{1i} + \log \varepsilon_i$. The exponential model: original exponential model $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$; transformed exponential model $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$.
  • 19. Interpretation of coefficients. For the multiplicative model $\log Y_i = \log \beta_0 + \beta_1 \log X_{1i} + \log \varepsilon_i$, when both dependent and independent variables are logged, the coefficient of the independent variable $X_k$ can be interpreted as follows: a 1 percent change in $X_k$ leads to an estimated $b_k$ percent change in the average value of Y. Therefore $b_k$ is the elasticity of Y with respect to a change in $X_k$.
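To make the elasticity reading of slides 18 and 19 concrete, here is a small sketch in plain Python. The data are synthetic and purely hypothetical: Y is generated exactly as $3 X^{0.5}$ with no error term, so the log-log regression should recover a slope of 0.5. The helper `simple_regression` is an illustrative name, not something from the deck.

```python
import math


def simple_regression(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data following the multiplicative model Y = 3 * X^0.5 exactly.
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 0.5 for xi in x]

# Regress log Y on log X, as in the transformed multiplicative model.
log_b0, b1 = simple_regression([math.log10(v) for v in x],
                               [math.log10(v) for v in y])
b0 = 10 ** log_b0  # undo the log to recover the multiplicative constant

# Effect on Y of a 1 percent increase in X, in percent.
pct_change = (1.01 ** b1 - 1) * 100

print(round(b0, 6), round(b1, 6), round(pct_change, 4))
```

With an estimated slope of 0.5, a 1 percent increase in X raises Y by roughly 0.5 percent, which is the elasticity interpretation stated on slide 19.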
  • 20. Collinearity: High correlation exists among two or more independent variables. This means the correlated variables contribute redundant information to the multiple regression model.
  • 21. Collinearity (continued): Including two highly correlated explanatory variables can adversely affect the regression results: no new information is provided; coefficients can become unstable (large standard errors and low t-values); coefficient signs may not match prior expectations.
  • 22. Some Indications of Strong Collinearity: incorrect signs on the coefficients; a large change in the value of a previous coefficient when a new variable is added to the model; a previously significant variable becomes insignificant when a new independent variable is added; the estimate of the standard deviation of the model increases when a variable is added to the model.
  • 23. Detecting Collinearity (Variance Inflationary Factor). $\mathrm{VIF}_j$ is used to measure collinearity: $\mathrm{VIF}_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of determination of variable $X_j$ with all other X variables. If $\mathrm{VIF}_j > 5$, $X_j$ is highly correlated with the other explanatory variables.
  • 24. Example: Pie Sales. Recall the multiple regression equation of chapter 13: Sales = $b_0 + b_1$ (Price) + $b_2$ (Advertising)
      Week   Pie Sales   Price ($)   Advertising ($100s)
      1      350         5.50        3.3
      2      460         7.50        3.3
      3      350         8.00        3.0
      4      430         8.00        4.5
      5      350         6.80        3.0
      6      380         7.50        4.0
      7      430         4.50        3.0
      8      470         6.40        3.7
      9      450         7.00        3.5
      10     490         5.00        4.0
      11     340         7.20        3.5
      12     300         7.90        3.2
      13     440         5.90        4.0
      14     450         5.00        3.5
      15     300         7.00        2.7
  • 25. Detect Collinearity in PHStat. PHStat / regression / multiple regression …; check the “variance inflationary factor (VIF)” box. Output for the pie sales example (Regression Analysis, Price and all other X): Multiple R 0.030438; R Square 0.000926; Adjusted R Square -0.075925; Standard Error 1.21527; Observations 15; VIF 1.000927. Since there are only two explanatory variables, only one VIF is reported. The VIF is < 5, so there is no evidence of collinearity between Price and Advertising.
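The VIF reported by PHStat for the pie sales example can be checked by hand. This is a minimal sketch in plain Python, assuming the Price and Advertising columns are as read from the slide 24 table. With only two explanatory variables, $R_j^2$ is simply the squared correlation between them; the function name `r_squared_between` is an illustrative choice.

```python
def r_squared_between(x, z):
    """Squared Pearson correlation, i.e. R^2 from regressing x on z."""
    n = len(x)
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    return sxz ** 2 / (sxx * szz)


# Columns assumed from the slide 24 table (15 weeks).
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40,
         7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7,
               3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]

r2_price = r_squared_between(price, advertising)
vif_price = 1 / (1 - r2_price)  # VIF_j = 1 / (1 - R_j^2)

print(round(r2_price, 6), round(vif_price, 6))
```

Under these assumptions the result should agree with the R Square and VIF values in the slide 25 output, well below the threshold of 5 used in this chapter. With more than two explanatory variables, $R_j^2$ would instead come from a multiple regression of $X_j$ on all the other X variables.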
  • 26. Model Building. The goal is to develop a model with the best set of independent variables: the model is easier to interpret if unimportant variables are removed, and there is a lower probability of collinearity. Stepwise regression procedure: provides evaluation of alternative models as variables are added. Best-subset approach: try all combinations and select the best using the highest adjusted $r^2$ and lowest standard error.
  • 27. Stepwise Regression. Idea: develop the least squares regression equation in steps, adding one explanatory variable at a time and evaluating whether existing variables should remain or be removed. The coefficient of partial determination is the measure of the marginal contribution of each independent variable, given that other independent variables are in the model.
  • 28. Best Subsets Regression. Idea: estimate all possible regression equations using all possible combinations of independent variables. Choose the best fit by looking for the highest adjusted $r^2$ and lowest standard error. Stepwise regression and best subsets regression can be performed using PHStat.
  • 29. Alternative Best Subsets Criterion. Calculate the value $C_p$ for each potential regression model. Consider models with $C_p$ values close to or below $k + 1$, where k is the number of independent variables in the model under consideration.
  • 30. Alternative Best Subsets Criterion (continued). The $C_p$ statistic: $C_p = \frac{(1 - R_k^2)(n - T)}{1 - R_T^2} - (n - 2(k + 1))$, where: k = number of independent variables included in a particular regression model; T = total number of parameters to be estimated in the full regression model; $R_k^2$ = coefficient of multiple determination for the model with k independent variables; $R_T^2$ = coefficient of multiple determination for the full model with all T estimated parameters.
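The $C_p$ formula on slide 30 is easy to evaluate directly. The sketch below is plain Python; the function name `cp_statistic` and all numeric inputs are hypothetical values chosen for illustration, not results from the deck.

```python
def cp_statistic(r2_k, r2_full, n, k, t):
    """C_p = (1 - R_k^2)(n - T) / (1 - R_T^2) - (n - 2(k + 1))."""
    return (1 - r2_k) * (n - t) / (1 - r2_full) - (n - 2 * (k + 1))


# Hypothetical setting: n = 15 observations, full model with two
# independent variables plus an intercept (T = 3) and R_T^2 = 0.52.
n, t, r2_full = 15, 3, 0.52

# Candidate model with one independent variable and R_k^2 = 0.20.
cp_reduced = cp_statistic(0.20, r2_full, n, k=1, t=t)

# The full model itself (k + 1 = T).
cp_full = cp_statistic(r2_full, r2_full, n, k=2, t=t)

print(round(cp_reduced, 6), round(cp_full, 6))
```

For the full model the statistic reduces to $(n - T) - (n - 2T) = T = k + 1$, so it always meets the criterion of slide 29. In this hypothetical case the one-variable model gives a $C_p$ of about 9, far above its own $k + 1 = 2$, so it would not be shortlisted.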
  • 31. 10 Steps in Model Building. 1. Choose explanatory variables to include in the model. 2. Estimate the full model and check VIFs. 3. Check if any VIFs > 5. 4. If no VIF > 5, go to step 5; if one VIF > 5, remove this variable; if more than one, eliminate the variable with the highest VIF and go back to step 2. 5. Perform best subsets regression with the remaining variables …
  • 32. 10 Steps in Model Building (continued). 6. List all models with $C_p$ close to or less than (k + 1). 7. Choose the best model: consider parsimony; do extra variables make a significant contribution? 8. Perform complete analysis with the chosen model, including residual analysis. 9. Transform the model if necessary to deal with violations of linearity or other model assumptions. 10. Use the model for prediction.
  • 33. Model Building Flowchart. Choose X1, X2, …, Xk → run regression to find VIFs → any VIF > 5? If yes: more than one? If yes, remove the variable with the highest VIF; if no, remove this X. If no VIF > 5: run subsets regression to obtain “best” models in terms of $C_p$ → do complete analysis → add quadratic term and/or transform variables as indicated → perform predictions.
  • 34. Pitfalls and Ethical Considerations. To avoid pitfalls and address ethical considerations: understand that interpretation of the estimated regression coefficients is performed holding all other independent variables constant; evaluate residual plots for each independent variable; evaluate interaction terms.
  • 35. Additional Pitfalls and Ethical Considerations (continued). To avoid pitfalls and address ethical considerations: obtain VIFs for each independent variable before determining which variables should be included in the model; examine several alternative models using best-subsets regression; use other methods when the assumptions necessary for least-squares regression have been seriously violated.
  • 36. Chapter Summary. Developed the quadratic regression model. Discussed using transformations in regression models: the multiplicative model and the exponential model. Described collinearity. Discussed model building: stepwise regression and best subsets. Addressed pitfalls in multiple regression and ethical considerations.
