Applied Business Forecasting and Planning
Multiple Regression Analysis
Introduction
 In simple linear regression we studied the
relationship between one explanatory
variable and one response variable.
 Now, we look at situations where several explanatory variables work together to explain the response.
Introduction
 Following our principles of data analysis,
we look first at each variable separately,
then at relationships among the variables.
 We look at the distribution of each variable
to be used in multiple regression to
determine if there are any unusual patterns
that may be important in building our
regression analysis.
Multiple Regression
 Example. In a study of direct operating cost, Y, for 67 branch offices of a consumer finance chain, four independent variables were considered:
 X1: Average size of loan outstanding during the year,
 X2 : Average number of loans outstanding,
 X3 : Total number of new loan applications processed, and
 X4 : Office salary scale index.
 The model for this example is
Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + ε
Formal Statement of the Model
 General regression model
Y = β0 + β1X1 + β2X2 + … + βkXk + ε
• β0, β1, …, βk are parameters
• X1, X2, …, Xk are known constants
• ε, the error terms, are independent N(0, σ²)
Estimating the parameters of the model
 The values of the regression parameters βi are not known. We estimate them from data.
 As in the simple linear regression case, we use the
least-squares method to fit a linear function
to the data.
 The least-squares method chooses the b’s that
make the sum of squares of the residuals as small
as possible.
ŷ = b0 + b1x1 + b2x2 + … + bkxk
Estimating the parameters of the model
 The least-squares estimates are the values that
minimize the quantity
 Since the formulas for the least-squares estimates
are complicated and hand calculation is out of
question, we are content to understand the least-
squares principle and let software do the
computations.
Σ (yi − ŷi)²   (sum over i = 1, …, n)
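As a concrete illustration of the least-squares principle, the sketch below (Python with numpy; the data values are made up for illustration) fits a two-variable model and computes the minimized sum of squared residuals:

```python
import numpy as np

# Hypothetical data: n = 5 observations on k = 2 explanatory variables.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])

# Prepend a column of ones so the intercept b0 is estimated as well.
X1 = np.column_stack([np.ones(len(y)), X])

# Least squares: choose b = (b0, b1, b2) minimizing the sum of squared residuals.
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

y_hat = X1 @ b                      # fitted values
residuals = y - y_hat               # e_i = y_i - y_hat_i
sse = float(residuals @ residuals)  # the minimized quantity above
print(b, sse)
```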
Estimating the parameters of the model
 The estimate of βi is bi, and it indicates the change in the mean response per unit increase in xi when the rest of the independent variables in the model are held constant.
 The parameters βi are frequently called partial regression coefficients because they reflect the partial effect of one independent variable when the rest of the independent variables are included in the model and are held constant.
Estimating the parameters of the model
 The observed variability of the responses about this fitted model is measured by the variance
s² = Σ (yi − ŷi)² / (n − k − 1)   (sum over i = 1, …, n)
and the regression standard error
s = √s²
Estimating the parameters of the model
 In the model, σ² and σ measure the variability of the responses about the population regression equation.
 It is natural to estimate σ² by s² and σ by s.
Analysis of Variance Table
 The basic ideas of the regression ANOVA table are the same in simple and multiple regression.
 The sum of squares decomposition and the
associated degrees of freedom are:
SST = SSR + SSE
Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²
 df: (n − 1) = k + (n − k − 1)
Analysis of Variance Table
Source      Sum of Squares   df          Mean Square             F-test
Regression  SSR              k           MSR = SSR/k             MSR/MSE
Error       SSE              n − k − 1   MSE = SSE/(n − k − 1)
Total       SST              n − 1
F-test for the overall fit of the model
 To test the statistical significance of the regression
relation between the response variable y and the
set of variables x1,…, xk, i.e. to choose between
the alternatives:
H0: β1 = β2 = … = βk = 0
Ha: not all βi (i = 1, …, k) equal zero
 We use the test statistic:
F = MSR / MSE
F-test for the overall fit of the model
 The decision rule at significance level α is:
 Reject H0 if F > F(α; k, n − k − 1)
 Where the critical value F(α; k, n − k − 1) can be found from an F-table.
 The existence of a regression relation by itself does not assure that useful predictions can be made by using it.
 Note that when k = 1, this test reduces to the F-test for testing in simple linear regression whether or not β1 = 0.
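A minimal sketch of this overall F-test, assuming only that the observed responses and the fitted values are available as numpy arrays (scipy supplies the critical value):

```python
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, k, alpha=0.05):
    """Overall F-test of H0: beta_1 = ... = beta_k = 0."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    sse = np.sum((y - y_hat) ** 2)      # error sum of squares
    ssr = sst - sse                     # regression sum of squares
    msr, mse = ssr / k, sse / (n - k - 1)
    f = msr / mse
    f_crit = stats.f.ppf(1 - alpha, k, n - k - 1)  # F(alpha; k, n-k-1)
    return f, f_crit, f > f_crit        # reject H0 when F exceeds the critical value
```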
Interval estimation of βi
 For our regression model, we have:
(bi − βi) / s(bi) has a t-distribution with n − k − 1 degrees of freedom
 Therefore, an interval estimate for βi with 1 − α confidence coefficient is:
bi ± t(α/2; n − k − 1) s(bi)
 Where
s(bi) = √( MSE / Σ (x − x̄)² )
Significance tests for βi
 To test:
H0: βi = 0
Ha: βi ≠ 0
 We may use the test statistic:
t = bi / s(bi)
 Reject H0 if
t > t(α/2; n − k − 1)  or  t < −t(α/2; n − k − 1)
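Both the interval estimate and the t-test for a single βi can be computed from bi, s(bi), n, and k; a sketch (scipy supplies the t quantile, and the function name is illustrative):

```python
from scipy import stats

def coef_inference(b_i, s_bi, n, k, alpha=0.05):
    """Interval estimate and two-sided t-test for one coefficient beta_i."""
    t_crit = stats.t.ppf(1 - alpha / 2, n - k - 1)   # t(alpha/2; n-k-1)
    ci = (b_i - t_crit * s_bi, b_i + t_crit * s_bi)  # b_i +/- t * s(b_i)
    t_stat = b_i / s_bi                              # test statistic
    return ci, t_stat, abs(t_stat) > t_crit          # reject H0 if |t| > t_crit
```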
Multiple Regression Model Building
 Often we have many explanatory variables,
and our goal is to use these to explain the
variation in the response variable.
 A model using just a few of the variables
often predicts about as well as the model
using all the explanatory variables.
Multiple Regression Model Building
 We may find that the reciprocal of a variable is a
better choice than the variable itself, or that
including the square of an explanatory variable
improves prediction.
 We may find that the effect of one explanatory variable depends upon the value of another explanatory variable. We account for this situation by including interaction terms.
Multiple Regression Model Building
 The simplest way to construct an interaction
term is to multiply the two explanatory
variables together.
 How can we find a good model?
Selecting the Best Regression Equation
 After a lengthy list of potentially useful
independent variables has been compiled,
some of the independent variables can be
screened out. An independent variable
 May not be fundamental to the problem
 May be subject to large measurement error
 May effectively duplicate another independent
variable in the list.
Selecting the Best Regression Equation
 Once the investigator has tentatively
decided upon the functional forms of the
regression relations (linear, quadratic, etc.),
the next step is to obtain a subset of the
explanatory variables (x) that “best” explain
the variability in the response variable y.
Selecting the Best Regression Equation
 An automatic search procedure that
develops sequentially the subset of
explanatory variables to be included in the
regression model is called a stepwise procedure.
 It was developed to economize on
computational efforts.
 It will end with the identification of a single
regression model as “best”.
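One common stepwise variant is backward elimination, which is essentially what the sales-forecasting example that follows does by hand. A sketch, assuming statsmodels and a pandas DataFrame X of candidate variables (the function name and the alpha threshold are illustrative, not part of any standard API):

```python
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Drop the least significant variable one at a time until every
    remaining coefficient is significant at level alpha.
    X: pandas DataFrame of candidate explanatory variables; y: response."""
    cols = list(X.columns)
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
        pvals = fit.pvalues.drop("const")   # marginal t-test p-values
        worst = pvals.idxmax()              # least significant variable
        if pvals[worst] <= alpha:
            return fit                      # all remaining terms significant
        cols.remove(worst)                  # drop it and refit
    return None                             # nothing survived screening
```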
Example: Sales Forecasting
 Sales Forecasting
 Multiple regression is a popular technique for predicting
product sales with the help of other variables that are likely to
have a bearing on sales.
 Example
 The growth of cable television has created vast new potential
in the home entertainment business. The following table gives
the values of several variables measured in a random sample of
20 local television stations which offer their programming to
cable subscribers. A TV industry analyst wants to build a
statistical model for predicting the number of subscribers that a
cable station can expect.
Example: Sales Forecasting
 Y = Number of cable subscribers (SUBSCRIB)
 X1 = Advertising rate which the station charges local advertisers for one minute of prime time space (ADRATE)
 X2 = Kilowatt power of the station’s non-cable signal
(KILOWATT)
 X3 = Number of families living in the station’s area of
dominant influence (ADI), a geographical division of
radio and TV audiences (APIPOP)
 X4 = Number of competing stations in the ADI
(COMPETE)
Example: Sales Forecasting
 The sample data are fitted by a multiple regression
model using Excel program.
 The marginal t-test provides a way of choosing the
variables for inclusion in the equation.
 The fitted model is
SUBSCRIBE = b0 + b1 ADRATE + b2 SIGNAL + b3 APIPOP + b4 COMPETE
Example: Sales Forecasting
 Excel Summary output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.884267744
R Square 0.781929444
Adjusted R Square 0.723777295
Standard Error 142.9354188
Observations 20
ANOVA
df SS MS F Significance F
Regression 4 1098857.84 274714.4601 13.44626923 7.52E-05
Residual 15 306458.0092 20430.53395
Total 19 1405315.85
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 51.42007002 98.97458277 0.51952803 0.610973806 -159.539 262.3795
AD_Rate -0.267196347 0.081055107 -3.296477624 0.004894126 -0.43996 -0.09443
Signal -0.020105139 0.045184758 -0.444954014 0.662706578 -0.11641 0.076204
APIPOP 0.440333955 0.135200486 3.256896248 0.005307766 0.152161 0.728507
Compete 16.230071 26.47854322 0.61295181 0.549089662 -40.2076 72.66778
Example: Sales Forecasting
 Do we need all the four variables in the
model?
 Based on the partial t-test, the variables
signal and compete are the least significant
variables in our model.
 Let’s drop the least significant variables one
at a time.
Example: Sales Forecasting
 Excel Summary Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.882638739
R Square 0.779051144
Adjusted R Square 0.737623233
Standard Error 139.3069743
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 1094812.92 364937.64 18.80498277 1.69966E-05
Residual 16 310502.9296 19406.4331
Total 19 1405315.85
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 51.31610447 96.4618242 0.531983558 0.602046756 -153.1737817 255.806
AD_Rate -0.259538026 0.077195983 -3.36206646 0.003965102 -0.423186162 -0.09589
APIPOP 0.433505145 0.130916687 3.311305499 0.004412929 0.15597423 0.711036
Compete 13.92154404 25.30614013 0.550125146 0.589831583 -39.72506442 67.56815
Example: Sales Forecasting
 The variable Compete is the next variable to drop.
Example: Sales Forecasting
 Excel Summary Output
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8802681
R Square 0.774871928
Adjusted R Square 0.748386273
Standard Error 136.4197776
Observations 20
ANOVA
df SS MS F Significance F
Regression 2 1088939.802 544469.901 29.2562866 3.13078E-06
Residual 17 316376.0474 18610.35573
Total 19 1405315.85
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 96.28121395 50.16415506 1.919322948 0.07188916 -9.556049653 202.1184776
AD_Rate -0.254280696 0.075014548 -3.389751739 0.003484198 -0.41254778 -0.096013612
APIPOP 0.495481252 0.065306012 7.587069489 7.45293E-07 0.357697418 0.633265086
Example: Sales Forecasting
• All the variables in the model are statistically significant; therefore our final model is:
Final Model
SUBSCRIBE = 96.28 − 0.25 ADRATE + 0.495 APIPOP
Interpreting the Final Model
 What is the interpretation of the estimated parameters?
 Is the association positive or negative?
 Does this make sense intuitively, based on what the data represent?
 What other variables could be confounders?
 Are there other analyses that you might consider doing? New questions raised?
Multicollinearity
 In multiple regression analysis, one is often
concerned with the nature and significance of the
relations between the explanatory variables and
the response variable.
 Questions that are frequently asked are:
 What is the relative importance of the effects of the
different independent variables?
 What is the magnitude of the effect of a given
independent variable on the dependent variable?
Multicollinearity
 Can any independent variable be dropped from the
model because it has little or no effect on the dependent
variable?
 Should any independent variables not yet included in
the model be considered for possible inclusion?
 Simple answers can be given to these questions if
 The independent variables in the model are
uncorrelated among themselves.
 They are uncorrelated with any other independent
variables that are related to the dependent variable but
omitted from the model.
Multicollinearity
 When the independent variables are correlated among themselves, multicollinearity (or collinearity) among them is said to exist.
 In many non-experimental situations in business,
economics, and the social and biological sciences, the
independent variables tend to be correlated among
themselves.
 For example, in a regression of family food expenditures
on the variables: family income, family savings, and the
age of head of household, the explanatory variables will
be correlated among themselves.
Multicollinearity
 Further, the explanatory variables will also
be correlated with other socioeconomic
variables not included in the model that do
affect family food expenditures, such as
family size.
Multicollinearity
 Some key problems that typically arise when the
explanatory variables being considered for the regression
model are highly correlated among themselves are:
1. Adding or deleting an explanatory variable changes the
regression coefficients.
2. The estimated standard deviations of the regression coefficients
become large when the explanatory variables in the regression
model are highly correlated with each other.
3. The estimated regression coefficients individually may not be
statistically significant even though a definite statistical relation
exists between the response variable and the set of explanatory
variables.
Multicollinearity Diagnostics
 A widely used formal method of detecting the presence of multicollinearity is the Variance Inflation Factor (VIF):
VIFj = 1 / (1 − R²j),   j = 1, 2, …, k
 It measures how much the variances of the estimated regression coefficients are inflated as compared to when the independent variables are not linearly related.
 R²j is the coefficient of determination from the regression of the jth independent variable on the remaining k − 1 independent variables.
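A sketch of this computation, assuming statsmodels and a pandas DataFrame X whose columns are the k independent variables:

```python
import statsmodels.api as sm

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing the
    j-th column of X on the remaining columns."""
    out = {}
    for col in X.columns:
        others = X.drop(columns=col)
        r2 = sm.OLS(X[col], sm.add_constant(others)).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return out
```

(statsmodels also ships a ready-made variance_inflation_factor in statsmodels.stats.outliers_influence that does the same regression per column.)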
Multicollinearity Diagnostics
 A VIF near 1 suggests that multicollinearity is not a problem for the independent variables.
 Its estimated coefficient and associated t value will not change
much as the other independent variables are added or deleted from
the regression equation.
 A VIF much greater than 1 indicates the presence of multicollinearity. A maximum VIF value in excess of 10 is often taken as an indication that multicollinearity may be unduly influencing the least-squares estimates.
 The estimated coefficient attached to the variable is unstable, and its associated t statistic may change considerably as the other independent variables are added or deleted.
Multicollinearity Diagnostics
 The simple correlation coefficient between all
pairs of explanatory variables (i.e., X1, X2, …, Xk
) is helpful in selecting appropriate explanatory
variables for a regression model and is also critical
for examining multicollinearity.
 While it is true that a correlation very close to +1 or −1 does suggest multicollinearity, the converse does not hold: unless there are only two explanatory variables, the absence of high correlations between pairs of explanatory variables does not mean that multicollinearity is absent.
Example: Sales Forecasting
Pearson Correlation Coefficients, N = 20
Prob > |r| under H0: Rho = 0
           SUBSCRIB   ADRATE     KILOWATT   APIPOP     COMPETE
SUBSCRIB    1.00000   -0.02848    0.44762    0.90447    0.79832
 p-value               0.9051     0.0478     <.0001     <.0001
ADRATE     -0.02848    1.00000   -0.01021    0.32512    0.34147
 p-value    0.9051                0.9659     0.1619     0.1406
KILOWATT    0.44762   -0.01021    1.00000    0.45303    0.46895
 p-value    0.0478     0.9659                0.0449     0.0370
APIPOP      0.90447    0.32512    0.45303    1.00000    0.87592
 p-value    <.0001     0.1619     0.0449                <.0001
COMPETE     0.79832    0.34147    0.46895    0.87592    1.00000
 p-value    <.0001     0.1406     0.0370     <.0001
Example: Sales Forecasting
SUBSCRIBE = 96.28 − 0.25 ADRATE + 0.495 APIPOP
SUBSCRIBE = 51.42 − 0.27 ADRATE − 0.02 SIGNAL + 0.44 APIPOP + 16.23 COMPETE
SUBSCRIBE = 51.32 − 0.26 ADRATE + 0.43 APIPOP + 13.92 COMPETE
Example: Sales Forecasting
 VIF calculation:
 Fit the model
APIPOP = β0 + β1 SIGNAL + β2 ADRATE + β3 COMPETE + ε
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.878054
R Square 0.770978
Adjusted R Square 0.728036
Standard Error 264.3027
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3762601 1254200 17.9541 2.25472E-05
Residual 16 1117695 69855.92
Total 19 4880295
Coefficients
Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -472.685 139.7492 -3.38238 0.003799 -768.9402258 -176.43
Compete 159.8413 28.29157 5.649786 3.62E-05 99.86587622 219.8168
ADRATE 0.048173 0.149395 0.322455 0.751283 -0.268529713 0.364876
Signal 0.037937 0.083011 0.457012 0.653806 -0.138038952 0.213913
Example: Sales Forecasting
 Fit the model
COMPETE = β0 + β1 ADRATE + β2 APIPOP + β3 SIGNAL + ε
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.882936
R Square 0.779575
Adjusted R Square 0.738246
Standard Error 1.34954
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 103.0599 34.35329 18.86239 1.66815E-05
Residual 16 29.14013 1.821258
Total 19 132.2
Coefficients
Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 3.10416 0.520589 5.96278 1.99E-05 2.000559786 4.20776
ADRATE 0.000491 0.000755 0.649331 0.525337 -0.001110874 0.002092
Signal 0.000334 0.000418 0.799258 0.435846 -0.000552489 0.001221
APIPOP 0.004167 0.000738 5.649786 3.62E-05 0.002603667 0.005731
Example: Sales Forecasting
 Fit the model
SIGNAL = β0 + β1 ADRATE + β2 APIPOP + β3 COMPETE + ε
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.512244
R Square 0.262394
Adjusted R Square 0.124092
Standard Error 790.8387
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 3559789 1186596 1.897261 0.170774675
Residual 16 10006813 625425.8
Total 19 13566602
Coefficients
Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 5.171093 547.6089 0.009443 0.992582 -1155.707711 1166.05
APIPOP 0.339655 0.743207 0.457012 0.653806 -1.235874129 1.915184
Compete 114.8227 143.6617 0.799258 0.435846 -189.7263711 419.3718
ADRATE -0.38091 0.438238 -0.86919 0.397593 -1.309935875 0.548109
Example: Sales Forecasting
 Fit the model
ADRATE = β0 + β1 Signal + β2 APIPOP + β3 COMPETE + ε
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.399084
R Square 0.159268
Adjusted R Square 0.001631
Standard Error 440.8588
Observations 20
ANOVA
df SS MS F Significance F
Regression 3 589101.7 196367.2 1.010346 0.413876018
Residual 16 3109703 194356.5
Total 19 3698805
Coefficients
Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 253.7304 298.6063 0.849716 0.408018 -379.2865355 886.7474
Signal -0.11837 0.136186 -0.86919 0.397593 -0.407073832 0.170329
APIPOP 0.134029 0.415653 0.322455 0.751283 -0.747116077 1.015175
Compete 52.3446 80.61309 0.649331 0.525337 -118.5474784 223.2367
Example: Sales Forecasting
 VIF calculation results:
 There is no significant multicollinearity.
Variable   R-Squared   VIF
ADRATE     0.159268    1.19
COMPETE    0.779575    4.54
SIGNAL     0.262394    1.36
APIPOP     0.770978    4.36
Qualitative Independent Variables
 Many variables of interest in business, economics,
and social and biological sciences are not
quantitative but are qualitative.
 Examples of qualitative variables are gender
(male, female), purchase status (purchase, no
purchase), and type of firms.
 Qualitative variables can also be used in multiple
regression.
Qualitative Independent Variables
 An economist wished to relate the speed with which a particular insurance innovation is adopted (y) to the size of the insurance firm (x1) and the type of firm. The dependent variable is measured by the number of months elapsed between the time the first firm adopted the innovation and the time the given firm adopted the innovation. The first independent variable, size of the firm, is quantitative, and measured by the amount of total assets of the firm. The second independent variable, type of firm, is qualitative and is composed of two classes: stock companies and mutual companies.
Indicator variables
 Indicator, or dummy variables are used to
determine the relationship between qualitative
independent variables and a dependent variable.
 Indicator variables take on the values 0
and 1.
 For the insurance innovation example, where the
qualitative variable has two classes, we might
define the indicator variable x2 as follows:
x2 = 1 if stock company
x2 = 0 otherwise
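In software this coding is a one-liner; a sketch with pandas (the small data frame is hypothetical, with values borrowed from the example's data set):

```python
import pandas as pd

# Hypothetical firms data: 'type' is a qualitative variable with two classes.
firms = pd.DataFrame({"size": [151, 92, 238, 164],
                      "type": ["Mutual", "Mutual", "Stock", "Stock"]})

# x2 = 1 if stock company, 0 otherwise.
firms["x2"] = (firms["type"] == "Stock").astype(int)
print(firms)
```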
Indicator variables
 A qualitative variable with c classes will be
represented by c-1 indicator variables.
 A regression function with an indicator
variable with two levels (c = 2) will yield
two estimated lines.
Interpretation of Regression Coefficients
 In our insurance innovation example, the regression model is:
y = β0 + β1x1 + β2x2 + ε
 Where:
x1 = size of firm
x2 = 1 if stock company, 0 otherwise
Interpretation of Regression Coefficients
 To understand the meaning of the regression coefficients in this model, consider first the case of a mutual firm. For such a firm, x2 = 0 and we have:
ŷ = b0 + b1x1 + b2(0) = b0 + b1x1          (Mutual firms)
 For a stock firm, x2 = 1 and the response function is:
ŷ = b0 + b1x1 + b2(1) = (b0 + b2) + b1x1   (Stock firms)
Interpretation of Regression Coefficients
 The response function for the mutual firms is a straight line, with y-intercept β0 and slope β1.
 For stock firms, this also is a straight line, with the same slope β1 but with y-intercept β0 + β2.
 With reference to the insurance innovation example, the mean time elapsed before the innovation is adopted is a linear function of size of firm (x1), with the same slope β1 for both types of firms.
Interpretation of Regression Coefficients
 β2 indicates how much lower or higher the response function for stock firms is than the one for mutual firms.
 β2 measures the differential effect of type of firm.
 In general, β2 shows how much higher (lower) the mean response line is for the class coded 1 than the line for the class coded 0, for any level of x1.
Example: Insurance Innovation Adoption
 Here is the data set for the insurance innovation example:
Months Elapsed Size type of firm Type
17 151 0 Mutual
26 92 0 Mutual
21 175 0 Mutual
30 31 0 Mutual
22 104 0 Mutual
0 277 0 Mutual
12 210 0 Mutual
19 120 0 Mutual
4 290 0 Mutual
16 238 1 Stock
28 164 1 Stock
15 272 1 Stock
11 295 1 Stock
38 68 1 Stock
31 85 1 Stock
21 224 1 Stock
20 166 1 Stock
13 305 1 Stock
30 124 1 Stock
14 246 1 Stock
Example: Insurance Innovation Adoption
 Fitting the regression model
y = β0 + β1x1 + β2x2 + ε
 Where:
x1 = size of firm
x2 = 1 if stock company, 0 otherwise
 The fitted response function is:
ŷ = 33.87 − 0.1061 x1 + 8.77 x2
Example: Insurance Innovation Adoption
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.95993655
R Square 0.92147818
Adjusted R Square 0.91224031
Standard Error 2.78630562
Observations 20
ANOVA
df SS MS F Significance F
Regression 2 1548.820517 774.4103 99.75016 4.04966E-10
Residual 17 131.979483 7.763499
Total 19 1680.8
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 33.8698658 1.562588138 21.67549 8E-14 30.57308841 37.16664321
Size -0.10608882 0.007799653 -13.6017 1.45E-10 -0.122544675 -0.089632969
type of firm 8.76797549 1.286421264 6.815789 3.01E-06 6.053860079 11.4820909
Example: Insurance Innovation Adoption
 The fitted response function is:
ŷ = 33.87 − 0.1061 x1 + 8.77 x2
 Stock firms response function is:
ŷ = (33.87 + 8.77) − 0.1061 x1
 Mutual firms response function is:
ŷ = 33.87 − 0.1061 x1
 Interpretation?
Accounting for Seasonality in a Multiple Regression Model
 Seasonal Patterns are not easily accounted for by
the typical causal variables that we use in
regression analysis.
 An indicator variable can be used effectively to
account for seasonality in our time series data.
 The number of seasonal indicator variables to use
depends on the data.
 If we have p periods in our data series, we cannot use more than p − 1 seasonal indicator variables.
Example: Private Housing Starts (PHS)
 Housing starts in the United States measured in
thousands of units. These data are plotted for 1990
Q1 through 1999Q4. There are typically few
housing starts during the first quarter of the year
(January, February, March); there is usually a big
increase in the second quarter (April, May,
June), followed by some decline in the third
quarter (July, August, September), and further
decline in the fourth quarter (October, November,
December).
Example: Private Housing Starts (PHS)
[Figure: Private Housing Starts (PHS) in Thousands of Units, quarterly, Mar-90 through Nov-98; "1" marks the first quarter of each year.]
Example: Private Housing Starts (PHS)
 To account for and measure this seasonality in a regression model, we will use three dummy variables: Q2 for the second quarter, Q3 for the third quarter, and Q4 for the fourth quarter. These will be coded as follows (see the sketch after this list):
 Q2 = 1 for all second quarters and zero otherwise.
 Q3 = 1 for all third quarters and zero otherwise
 Q4 = 1 for all fourth quarters and zero otherwise.
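A sketch of this coding with pandas (the quarterly index is rebuilt from the period labels; the column names follow the slides):

```python
import pandas as pd

# Quarterly periods 1990Q1-1999Q4, as in the data table below.
periods = pd.period_range("1990Q1", "1999Q4", freq="Q")
df = pd.DataFrame({"quarter": periods.quarter}, index=periods)

# Q2, Q3, Q4 = 1 in that quarter and zero otherwise; Q1 is the base quarter.
for j in (2, 3, 4):
    df[f"Q{j}"] = (df["quarter"] == j).astype(int)
print(df.head(8))
```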
Example: Private Housing Starts (PHS)
 Data for private housing starts (PHS), the
mortgage rate (MR), and these seasonal indicator
variables are shown in the following slide.
 Examine the data carefully to verify your
understanding of the coding for Q2, Q3, Q4.
 Since we have assigned dummy variables for the
second, third, and fourth quarters, the first quarter
is the base quarter for our regression model.
 Note that any quarter could be used as the base,
with indicator variables to adjust for differences in
other quarters.
Example: Private Housing Starts (PHS)
PERIOD PHS MR Q2 Q3 Q4
31-Mar-90 217 10.1202 0 0 0
30-Jun-90 271.3 10.3372 1 0 0
30-Sep-90 233 10.1033 0 1 0
31-Dec-90 173.6 9.9547 0 0 1
31-Mar-91 146.7 9.5008 0 0 0
30-Jun-91 254.1 9.5265 1 0 0
30-Sep-91 239.8 9.2755 0 1 0
31-Dec-91 199.8 8.6882 0 0 1
31-Mar-92 218.5 8.7098 0 0 0
30-Jun-92 296.4 8.6782 1 0 0
30-Sep-92 276.4 8.0085 0 1 0
31-Dec-92 238.8 8.2052 0 0 1
31-Mar-93 213.2 7.7332 0 0 0
30-Jun-93 323.7 7.4515 1 0 0
30-Sep-93 309.3 7.0778 0 1 0
31-Dec-93 279.4 7.0537 0 0 1
31-Mar-94 252.6 7.2958 0 0 0
30-Jun-94 354.2 8.4370 1 0 0
30-Sep-94 325.7 8.5882 0 1 0
31-Dec-94 265.9 9.0977 0 0 1
31-Mar-95 214.2 8.8123 0 0 0
30-Jun-95 296.7 7.9470 1 0 0
30-Sep-95 308.2 7.7012 0 1 0
31-Dec-95 257.2 7.3508 0 0 1
31-Mar-96 240 7.2430 0 0 0
30-Jun-96 344.5 8.1050 1 0 0
30-Sep-96 324 8.1590 0 1 0
31-Dec-96 252.4 7.7102 0 0 1
31-Mar-97 237.8 7.7905 0 0 0
30-Jun-97 324.5 7.9255 1 0 0
30-Sep-97 314.6 7.4692 0 1 0
31-Dec-97 256.8 7.1980 0 0 1
31-Mar-98 258.4 7.0547 0 0 0
30-Jun-98 360.4 7.0938 1 0 0
30-Sep-98 348 6.8657 0 1 0
31-Dec-98 304.6 6.7633 0 0 1
31-Mar-99 294.1 6.8805 0 0 0
30-Jun-99 377.1 7.2037 1 0 0
30-Sep-99 355.6 7.7990 0 1 0
31-Dec-99 308.1 7.8338 0 0 1
Example: Private Housing Starts (PHS)
 The regression model for private housing starts (PHS) is:
PHS = β0 + β1(MR) + β2(Q2) + β3(Q3) + β4(Q4) + ε
 In this model we expect b1 to have a negative sign, and we would expect b2, b3, b4 all to have positive signs. Why?
 Regression results for this model are shown in the next slide.
Example: Private Housing Starts (PHS)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.885398221
R Square 0.78393001
Adjusted R Square 0.759236296
Standard Error 26.4498851
Observations 40
ANOVA
df SS MS F Significance F
Regression 4 88837.93624 22209.48406 31.74613731 3.33637E-11
Residual 35 24485.87476 699.5964217
Total 39 113323.811
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 473.0650749 35.54169837 13.31014264 2.93931E-15 400.9115031 545.2186467
MR -30.04838192 4.257226391 -7.058206249 3.21421E-08 -38.69102153 -21.40574231
Q2 95.74106935 11.84748487 8.081130334 1.6292E-09 71.689367 119.7927717
Q3 73.92904763 11.82881519 6.249911462 3.62313E-07 49.91524679 97.94284847
Q4 20.54778131 11.84139803 1.73524961 0.091495355 -3.491564078 44.5871267
Example: Private Housing Starts (PHS)
 Use the prediction equation to make a forecast for the fourth quarter of 1999.
 Prediction equation:
PHS-hat = 473.06 − 30.05 (MR) + 95.74 (Q2) + 73.93 (Q3) + 20.55 (Q4)
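As a worked check, for the fourth quarter of 1999 the data table gives MR = 7.8338 and Q4 = 1 (so Q2 = Q3 = 0):
PHS-hat = 473.06 − 30.05 (7.8338) + 95.74 (0) + 73.93 (0) + 20.55 (1) ≈ 258.2
That is, about 258.2 thousand units, compared with the actual value of 308.1 in the table.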
Example: Private Housing Starts (PHS)
[Figure: Private Housing Starts (PHS) with a Simple Regression Forecast (PHSF1) and a Multiple Regression Forecast (PHSF2), in Thousands of Units.]
Regression Diagnostics and Residual
Analysis
 It is important to check the adequacy of the model before it
becomes part of the decision making process.
 Residual plots can be used to check the model
assumptions.
 It is important to study outlying observations to decide
whether they should be retained or eliminated.
 If retained, we must decide whether their influence should be reduced in the fitting process or the regression function revised.
Time Series Data and the Problem of
Serial Correlation
 In the regression models we assume that the errors εi are independent.
 In business and economics, many regression
applications involve time series data.
 For such data, the assumption of
uncorrelated or independent error terms is
often not appropriate.
Problems of Serial Correlation
 If the error terms in the regression model are
autocorrelated, the use of ordinary least squares
procedures has a number of important
consequences
 MSE underestimates the variance of the error terms
 The confidence intervals and tests using the t and F distributions are no longer strictly applicable.
 The standard errors of the regression coefficients underestimate the variability of the estimated regression coefficients. Spurious regression can result.
First order serial correlation
 The error term in the current period is directly related to the error term in the previous time period.
 Let the subscript t represent time; then the simple linear regression model is:
yt = β0 + β1xt + εt,  with  εt = ρ εt−1 + νt
 Where
 εt = error at time t
 ρ = the parameter that measures correlation between adjacent error terms
 νt = normally distributed error terms with mean zero and variance σ²
Example
 The effect of positive serial correlation in a
simple linear regression model.
 Misleading forecasts of future y values.
 Standard error of the estimate, Sy.x, will underestimate the variability of the y's about the true regression line.
 Strong autocorrelation can make two unrelated
variables appear to be related.
Durbin-Watson Test for Serial
Correlation
 Recall the first-order serial correlation model:
yt = β0 + β1xt + εt,  εt = ρ εt−1 + νt
 The hypotheses to be tested are:
H0: ρ = 0
Ha: ρ > 0
 The alternative hypothesis is ρ > 0 since business and economic time series tend to show positive correlation.
Durbin-Watson Test for Serial
Correlation
 The Durbin-Watson statistic is defined as
DW = Σt=2..n (et − et−1)² / Σt=1..n et²
 Where
et = yt − ŷt = the residual for time period t
et−1 = yt−1 − ŷt−1 = the residual for time period t − 1
Durbin-Watson Test for Serial
Correlation
 The autocorrelation coefficient ρ can be estimated by the lag 1 residual autocorrelation r1(e):
r1(e) = Σt=2..n et et−1 / Σt=1..n et²
 And it can be shown that
DW ≈ 2 (1 − r1(e))
Durbin-Watson Test for Serial
Correlation
 Since –1 < r1(e) < 1 then 0 < DW < 4
 If r1(e) = 0, then DW = 2 (there is no
correlation.)
 If r1(e) > 0, then DW < 2 (positive
correlation)
 If r1(e) < 0, then DW > 2 (negative correlation)
Durbin-Watson Test for Serial
Correlation
 Decision rule:
 If DW > U, do not reject H0.
 If DW < L, reject H0.
 If L ≤ DW ≤ U, the test is inconclusive.
 The critical upper (U) and lower (L) bounds can be found in the Durbin-Watson table of your textbook.
 To use this table you need to know the significance level (α), the number of independent parameters in the model (k), and the sample size (n).
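A sketch of the computation from a residual series, following the two formulas above (numpy only; the function names are illustrative):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def r1(e):
    """Lag-1 residual autocorrelation, so that DW is about 2 * (1 - r1)."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)
```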
Example
 The Blaisdell Company wished to predict
its sales by using industry sales as a
predictor variable. The following table
gives seasonally adjusted quarterly data on
company sales and industry sales for the
period 1983-1987.
Example
Year Quarter t CompSale InduSale
1983 1 1 20.96 127.3
2 2 21.4 130
3 3 21.96 132.7
4 4 21.52 129.4
1984 1 5 22.39 135
2 6 22.76 137.1
3 7 23.48 141.2
4 8 23.66 142.8
1985 1 9 24.1 145.5
2 10 24.01 145.3
3 11 24.54 148.3
4 12 24.3 146.4
1986 1 13 25 150.2
2 14 25.64 153.1
3 15 26.36 157.3
4 16 26.98 160.7
1987 1 17 27.52 164.2
2 18 27.78 165.6
3 19 28.24 168.7
4 20 28.78 171.7
Example
[Figure: scatter plot of Company Sales ($ millions) against Industry Sales ($ millions), Blaisdell Company example.]
Example
 The scatter plot suggests that a linear regression
model is appropriate.
 Least squares method was used to fit a regression
line to the data.
 The residuals were plotted against the fitted
values.
 The plot shows that the residuals are consistently
above or below the fitted value for extended
periods.
Example
[Figure: residuals plotted against the fitted values for the Blaisdell Company example.]
Example
 To confirm this graphic diagnosis we will use the
Durbin-Watson test for:
H0: ρ = 0
Ha: ρ > 0
 The test statistic is:
DW = Σt=2..n (et − et−1)² / Σt=1..n et²
Example
Year Quarter t Company sales(y) Industry sales(x) et et-et-1 (et-et-1)^2 et^2
1983 1 1 20.96 127.3 -0.02605 0.000679
2 2 21.4 130 -0.06202 -0.03596 0.001293 0.003846
3 3 21.96 132.7 0.022021 0.084036 0.007062 0.000485
4 4 21.52 129.4 0.163754 0.141733 0.020088 0.026815
1984 1 5 22.39 135 0.04657 -0.11718 0.013732 0.002169
2 6 22.76 137.1 0.046377 -0.00019 3.76E-08 0.002151
3 7 23.48 141.2 0.043617 -0.00276 7.61E-06 0.001902
4 8 23.66 142.8 -0.05844 -0.10205 0.010415 0.003415
1985 1 9 24.1 145.5 -0.0944 -0.03596 0.001293 0.008911
2 10 24.01 145.3 -0.14914 -0.05474 0.002997 0.022243
3 11 24.54 148.3 -0.14799 0.001152 1.33E-06 0.021901
4 12 24.3 146.4 -0.05305 0.094937 0.009013 0.002815
1986 1 13 25 150.2 -0.02293 0.030125 0.000908 0.000526
2 14 25.64 153.1 0.105852 0.12878 0.016584 0.011205
3 15 26.36 157.3 0.085464 -0.02039 0.000416 0.007304
4 16 26.98 160.7 0.106102 0.020638 0.000426 0.011258
1987 1 17 27.52 164.2 0.029112 -0.07699 0.005927 0.000848
2 18 27.78 165.6 0.042316 0.013204 0.000174 0.001791
3 19 28.24 168.7 -0.04416 -0.08648 0.007478 0.00195
4 20 28.78 171.7 -0.03301 0.011152 0.000124 0.00109
Sum 0.097941 0.133302
Example
 Using the Durbin-Watson table of your textbook, for k = 1, n = 20, and α = .01, we find U = 1.15 and L = .95.
DW = .09794 / .13330 = .735
 Since DW = .735 falls below L = .95, we reject the null hypothesis and conclude that the error terms are positively autocorrelated.
Remedial Measures for Serial Correlation
 Addition of one or more independent
variables to the regression model.
 One major cause of autocorrelated error terms
is the omission from the model of one or more
key variables that have time-ordered effects on
the dependent variable.
 Use transformed variables.
 The regression model is specified in terms of
changes rather than levels.
Extensions of the Multiple Regression
Model
 In some situations, nonlinear terms may be needed
as independent variables in a regression analysis.
 Business or economic logic may suggest that non-
linearity is expected.
 A graphic display of the data may be helpful in
determining whether non-linearity is present.
 One common economic cause for non-linearity is
diminishing returns.
 For example, the effect of advertising on sales may diminish as increased advertising is used.
Extensions of the Multiple Regression
Model
 Some common forms of nonlinear functions are:
Y = β0 + β1(X) + β2(X²) + ε
Y = β0 + β1(X) + β2(X²) + β3(X³) + ε
Y = β0 + β1(1/X) + ε
Y = e^(β0 + β1X)
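Fitting the polynomial forms is still linear least squares: the nonlinear terms simply enter as extra columns. A sketch with numpy and statsmodels (the simulated data are illustrative only):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data where a squared term captures diminishing returns.
x = np.linspace(1, 10, 30)
y = 5 + 4 * x - 0.3 * x**2 + np.random.default_rng(0).normal(0, 0.5, 30)

X = sm.add_constant(np.column_stack([x, x**2]))  # columns: 1, x, x^2
fit = sm.OLS(y, X).fit()
print(fit.params)   # estimates of beta0, beta1, beta2
```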
Extensions of the Multiple Regression
Model
 To illustrate the use and interpretation of a
non-linear term, we return to the problem of
developing a forecasting model for private
housing starts (PHS).
 So far we have looked at the following
model
PHS = β0 + β1(MR) + β2(Q2) + β3(Q3) + β4(Q4) + ε
 Where MR is the mortgage rate and Q2, Q3, and Q4 are indicator variables for quarters 2, 3, and 4.
Example: Private Housing Start
 First we add real disposable personal
income per capita (DPI) as an independent
variable. Our new model for this data set is:
PHS = β0 + β1(MR) + β2(Q2) + β3(Q3) + β4(Q4) + β5(DPI) + ε
 Regression results for this model are shown in the next slide.
Example: Private Housing Start
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.943791346
R Square 0.890742104
Adjusted R Square 0.874187878
Standard Error 19.05542121
Observations 39
ANOVA
df SS MS F Significance F
Regression 5 97690.01942 19538 53.80753 6.51194E-15
Residual 33 11982.59955 363.1091
Total 38 109672.619
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -31.06403714 105.1938477 -0.2953 0.769613 -245.0826992 182.9546249
MR -20.1992545 4.124906847 -4.8969 2.5E-05 -28.59144723 -11.80706176
Q2 97.03478074 8.900711541 10.90191 1.78E-12 78.9261326 115.1434289
Q3 75.40017073 8.827185877 8.541813 7.17E-10 57.44111179 93.35922967
Q4 20.35306822 8.83373887 2.304015 0.027657 2.380677107 38.32545934
DPI 0.022407799 0.004356973 5.142974 1.21E-05 0.013543464 0.031272134
Example: Private Housing Start
 The prediction model is
PHS-hat = −31.06 − 20.19 (MR) + 97.03 (Q2) + 75.40 (Q3) + 20.35 (Q4) + 0.02 (DPI)
 In comparison with the previous model, we see that the R-squared has improved: it has changed from 78% to 89%.
 The standard error of the estimate has decreased from 26.49 for the previous model to 19.05 for the new model.
Example: Private Housing Start
 The value of the DW test has changed from 0.88
for the previous model to 0.78 for the new model.
 At the 5% level the critical values for the DW test, from the Durbin-Watson table, for k = 5 and n = 39 are L = 1.22 and U = 1.79.
 Since the value of the DW test is smaller than L = 1.22, we reject the null hypothesis H0: ρ = 0.
 This implies that there is serial correlation in both models; the assumption of independence of the error terms is not valid.
Example: Private Housing Start
 The plot of PHS against DPI shows a curvilinear relation.
 Next we introduce a nonlinear term into the regression.
 The square of disposable personal income per capita (DPI²) is included in the regression model.
[Figure: Private Housing Starts (PHS) plotted against Disposable Personal Income (DPI).]
Example: Private Housing Start
 We also add the dependent variable, lagged
one quarter, as an independent variable in
order to help reduce serial correlation.
 The third model that we fit to our data set
is:
 Regression results for this model are shown
in the next slide.
PHS = β0 + β1(MR) + β2(Q2) + β3(Q3) + β4(Q4) + β5(DPI) + β6(DPI²) + β7(LPHS) + ε
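Constructing the two new columns is mechanical; a sketch with pandas, assuming a DataFrame holding the columns of the data table below. Note that the lag costs the first observation, which is why the regression output shows 39 rather than 40 observations, and that the slides' table stores DPI SQUARED in rescaled units (rescaling a column only changes the units of its coefficient, not its significance):

```python
import pandas as pd

def add_model3_terms(df: pd.DataFrame) -> pd.DataFrame:
    """Add the squared-income and lagged-dependent-variable terms."""
    out = df.copy()
    out["DPI_SQUARED"] = out["DPI"] ** 2   # nonlinear income term
    out["LPHS"] = out["PHS"].shift(1)      # PHS lagged one quarter
    return out.dropna()                    # first quarter has no lagged value
```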
Example: Private Housing Start
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.97778626
R Square 0.956065971
Adjusted R Square 0.946145384
Standard Error 12.46719572
Observations 39
ANOVA
df SS MS F Significance F
Regression 7 104854.2589 14979.17985 96.37191 3.07085E-19
Residual 31 4818.360042 155.4309691
Total 38 109672.619
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 716.5926532 1017.664989 0.704153784 0.486593 -1358.949934 2792.13524
MR -13.65521724 3.093504134 -4.414158396 0.000114 -19.96446404 -7.345970448
Q2 106.9813297 6.069780998 17.62523718 1.04E-17 94.60192287 119.3607366
Q3 27.72122303 9.111432565 3.042465916 0.004748 9.138323433 46.30412262
Q4 -13.37855186 7.653050858 -1.748133144 0.09034 -28.98706069 2.22995698
DPI -0.060399279 0.104412354 -0.578468704 0.567127 -0.273349798 0.15255124
DPI SQUARED 0.000335974 0.000536397 0.626354647 0.535668 -0.000758014 0.001429963
LPHS 0.655786939 0.097265424 6.742241114 1.51E-07 0.457412689 0.854161189
Example: Private Housing Start
 The inclusion of DPI² and lagged PHS has increased the R-squared to 96%.
 The standard error of the estimate has decreased to 12.47.
 The value of the DW test has increased to 2.32, which is greater than U = 1.79 and rules out positive serial correlation.
 You see that the third model worked best for this
data set.
 The following slide gives the data set.
Example: Private Housing Start
PERIOD PHS MR LPHS Q2 Q3 Q4 DPI DPI SQUARED
30-Jun-90 271.3 10.3372 217 1 0 0 18063 1,631,359.85
30-Sep-90 233 10.1033 271.3 0 1 0 18031 1,625,584.81
31-Dec-90 173.6 9.9547 233 0 0 1 17856 1,594,183.68
31-Mar-91 146.7 9.5008 173.6 0 0 0 17748 1,574,957.52
30-Jun-91 254.1 9.5265 146.7 1 0 0 17861 1,595,076.61
30-Sep-91 239.8 9.2755 254.1 0 1 0 17816 1,587,049.28
31-Dec-91 199.8 8.6882 239.8 0 0 1 17811 1,586,158.61
31-Mar-92 218.5 8.7098 199.8 0 0 0 18000 1,620,000.00
30-Jun-92 296.4 8.6782 218.5 1 0 0 18085 1,635,336.13
30-Sep-92 276.4 8.0085 296.4 0 1 0 18036 1,626,486.48
31-Dec-92 238.8 8.2052 276.4 0 0 1 18330 1,679,944.50
31-Mar-93 213.2 7.7332 238.8 0 0 0 17975 1,615,503.13
30-Jun-93 323.7 7.4515 213.2 1 0 0 18247 1,664,765.05
30-Sep-93 309.3 7.0778 323.7 0 1 0 18246 1,664,582.58
31-Dec-93 279.4 7.0537 309.3 0 0 1 18413 1,695,192.85
31-Mar-94 252.6 7.2958 279.4 0 0 0 18154 1,647,838.58
30-Jun-94 354.2 8.4370 252.6 1 0 0 18409 1,694,456.41
30-Sep-94 325.7 8.5882 354.2 0 1 0 18493 1,709,955.25
31-Dec-94 265.9 9.0977 325.7 0 0 1 18667 1,742,284.45
31-Mar-95 214.2 8.8123 265.9 0 0 0 18834 1,773,597.78
30-Jun-95 296.7 7.9470 214.2 1 0 0 18798 1,766,824.02
30-Sep-95 308.2 7.7012 296.7 0 1 0 18871 1,780,573.21
31-Dec-95 257.2 7.3508 308.2 0 0 1 18942 1,793,996.82
31-Mar-96 240 7.2430 257.2 0 0 0 19071 1,818,515.21
30-Jun-96 344.5 8.1050 240 1 0 0 19081 1,820,422.81
30-Sep-96 324 8.1590 344.5 0 1 0 19161 1,835,719.61
31-Dec-96 252.4 7.7102 324 0 0 1 19152 1,833,995.52
31-Mar-97 237.8 7.7905 252.4 0 0 0 19331 1,868,437.81
30-Jun-97 324.5 7.9255 237.8 1 0 0 19315 1,865,346.13
30-Sep-97 314.6 7.4692 324.5 0 1 0 19385 1,878,891.13
31-Dec-97 256.8 7.1980 314.6 0 0 1 19478 1,896,962.42
31-Mar-98 258.4 7.0547 256.8 0 0 0 19632 1,927,077.12
30-Jun-98 360.4 7.0938 258.4 1 0 0 19719 1,944,194.81
30-Sep-98 348 6.8657 360.4 0 1 0 19905 1,980,963.41
31-Dec-98 304.6 6.7633 348 0 0 1 20194 2,038,980.00
31-Mar-99 294.1 6.8805 304.6 0 0 0 20377 2,076,010.87
30-Jun-99 377.1 7.2037 294.1 1 0 0 20472 2,095,440.74
30-Sep-99 355.6 7.7990 377.1 0 1 0 20756 2,153,982.23
31-Dec-99 308.1 7.8338 355.6 0 0 1 21124 2,231,020.37
 
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...Booking open Available Pune Call Girls Parvati Darshan  6297143586 Call Hot I...
Booking open Available Pune Call Girls Parvati Darshan 6297143586 Call Hot I...
 

Multiple Regression.ppt

the alternatives:

H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
H_a: \text{not all } \beta_i \; (i = 1, \ldots, k) \text{ equal zero}

 We use the test statistic:

F = \frac{MSR}{MSE}
F-test for the overall fit of the model
 The decision rule at significance level \alpha is: reject H_0 if

F > F(\alpha; k, n-k-1)

 where the critical value F(\alpha; k, n-k-1) can be found from an F-table.
 The existence of a regression relation by itself does not assure that useful predictions can be made by using it.
 Note that when k = 1, this test reduces to the F-test of simple linear regression for whether or not \beta_1 = 0.
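A minimal sketch of the overall F-test in Python with statsmodels (not part of the original slides; the data below are simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data: n = 20 observations, k = 2 explanatory variables.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(20, 2)))          # intercept + x1, x2
y = X @ np.array([5.0, 2.0, -1.5]) + rng.normal(scale=0.5, size=20)

fit = sm.OLS(y, X).fit()
print(fit.fvalue)    # F = MSR/MSE with k and n-k-1 degrees of freedom
print(fit.f_pvalue)  # reject H0 (all slopes zero) if this is below alpha
```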
Interval estimation of \beta_i
 For our regression model,

\frac{b_i - \beta_i}{s(b_i)}

 has a t-distribution with n-k-1 degrees of freedom.
 Therefore, an interval estimate for \beta_i with confidence coefficient 1-\alpha is:

b_i \pm t(\alpha/2; n-k-1)\, s(b_i)

 where s(b_i) is the estimated standard error of b_i; in the single-predictor case, s^2(b) = \frac{MSE}{\sum (x - \bar{x})^2}.
Significance tests for \beta_i
 To test

H_0: \beta_i = 0 \quad \text{vs.} \quad H_a: \beta_i \neq 0

 we may use the test statistic:

t = \frac{b_i}{s(b_i)}

 Reject H_0 if

t > t(\alpha/2; n-k-1) \quad \text{or} \quad t < -t(\alpha/2; n-k-1)
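Continuing in the same spirit, a sketch of reading the coefficient t-tests and interval estimates off a fitted model (simulated data; not from the slides):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(20, 2)))
y = X @ np.array([5.0, 2.0, -1.5]) + rng.normal(scale=0.5, size=20)
fit = sm.OLS(y, X).fit()

print(fit.bse)                   # s(b_i), estimated standard errors
print(fit.tvalues)               # t = b_i / s(b_i) on n-k-1 df
print(fit.pvalues)               # two-sided p-values for H0: beta_i = 0
print(fit.conf_int(alpha=0.05))  # b_i +/- t(alpha/2; n-k-1) * s(b_i)
```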
Multiple Regression Model Building
 Often we have many explanatory variables, and our goal is to use these to explain the variation in the response variable.
 A model using just a few of the variables often predicts about as well as the model using all the explanatory variables.

Multiple Regression Model Building
 We may find that the reciprocal of a variable is a better choice than the variable itself, or that including the square of an explanatory variable improves prediction.
 We may find that the effect of one explanatory variable depends upon the value of another explanatory variable. We account for this situation by including interaction terms.

Multiple Regression Model Building
 The simplest way to construct an interaction term is to multiply the two explanatory variables together.
 How can we find a good model?
Selecting the Best Regression Equation
 After a lengthy list of potentially useful independent variables has been compiled, some of the independent variables can be screened out. An independent variable
 may not be fundamental to the problem,
 may be subject to large measurement error, or
 may effectively duplicate another independent variable in the list.

Selecting the Best Regression Equation
 Once the investigator has tentatively decided upon the functional forms of the regression relations (linear, quadratic, etc.), the next step is to obtain a subset of the explanatory variables (x) that "best" explain the variability in the response variable y.

Selecting the Best Regression Equation
 An automatic search procedure that develops the subset of explanatory variables to be included in the regression model sequentially is called a stepwise procedure; a small sketch of the idea follows.
 It was developed to economize on computational effort.
 It ends with the identification of a single regression model as "best".
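A minimal backward-elimination sketch of the stepwise idea, dropping the least significant variable one at a time (simulated data; the 0.05 cut-off and variable names are assumptions, not from the slides):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data frame with a response y and four candidate predictors.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(40, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 3 + 2 * df["x1"] - df["x3"] + rng.normal(scale=0.5, size=40)

predictors = ["x1", "x2", "x3", "x4"]
while predictors:
    fit = sm.OLS(df["y"], sm.add_constant(df[predictors])).fit()
    pvals = fit.pvalues.drop("const")   # marginal t-test p-values
    worst = pvals.idxmax()
    if pvals[worst] < 0.05:             # every remaining term significant
        break
    predictors.remove(worst)            # drop the least significant term

print("selected predictors:", predictors)
```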
Example: Sales Forecasting
 Multiple regression is a popular technique for predicting product sales with the help of other variables that are likely to have a bearing on sales.
 Example: The growth of cable television has created vast new potential in the home entertainment business. The following table gives the values of several variables measured in a random sample of 20 local television stations which offer their programming to cable subscribers. A TV industry analyst wants to build a statistical model for predicting the number of subscribers that a cable station can expect.

Example: Sales Forecasting
 Y = number of cable subscribers (SUBSCRIB)
 X1 = advertising rate which the station charges local advertisers for one minute of prime-time space (ADRATE)
 X2 = kilowatt power of the station's non-cable signal (KILOWATT)
 X3 = number of families living in the station's area of dominant influence (ADI), a geographical division of radio and TV audiences (APIPOP)
 X4 = number of competing stations in the ADI (COMPETE)
Example: Sales Forecasting
 The sample data are fitted by a multiple regression model using Excel.
 The marginal t-test provides a way of choosing the variables for inclusion in the equation.
 The fitted model is:

SUBSCRIBE = \beta_0 + \beta_1 ADRATE + \beta_2 SIGNAL + \beta_3 APIPOP + \beta_4 COMPETE + \varepsilon
Example: Sales Forecasting
 Excel summary output:

Regression Statistics
Multiple R         0.884267744
R Square           0.781929444
Adjusted R Square  0.723777295
Standard Error     142.9354188
Observations       20

ANOVA
            df  SS           MS           F            Significance F
Regression  4   1098857.84   274714.4601  13.44626923  7.52E-05
Residual    15  306458.0092  20430.53395
Total       19  1405315.85

           Coefficients   Standard Error  t Stat        P-value      Lower 95%  Upper 95%
Intercept  51.42007002    98.97458277     0.51952803    0.610973806  -159.539   262.3795
AD_Rate    -0.267196347   0.081055107     -3.296477624  0.004894126  -0.43996   -0.09443
Signal     -0.020105139   0.045184758     -0.444954014  0.662706578  -0.11641   0.076204
APIPOP     0.440333955    0.135200486     3.256896248   0.005307766  0.152161   0.728507
Compete    16.230071      26.47854322     0.61295181    0.549089662  -40.2076   72.66778
Example: Sales Forecasting
 Do we need all four variables in the model?
 Based on the partial t-test, SIGNAL and COMPETE are the least significant variables in our model.
 Let's drop the least significant variables one at a time.
Example: Sales Forecasting
 Excel summary output after dropping SIGNAL:

Regression Statistics
Multiple R         0.882638739
R Square           0.779051144
Adjusted R Square  0.737623233
Standard Error     139.3069743
Observations       20

ANOVA
            df  SS           MS          F            Significance F
Regression  3   1094812.92   364937.64   18.80498277  1.69966E-05
Residual    16  310502.9296  19406.4331
Total       19  1405315.85

           Coefficients   Standard Error  t Stat       P-value      Lower 95%     Upper 95%
Intercept  51.31610447    96.4618242      0.531983558  0.602046756  -153.1737817  255.806
AD_Rate    -0.259538026   0.077195983     -3.36206646  0.003965102  -0.423186162  -0.09589
APIPOP     0.433505145    0.130916687     3.311305499  0.004412929  0.15597423    0.711036
Compete    13.92154404    25.30614013     0.550125146  0.589831583  -39.72506442  67.56815
Example: Sales Forecasting
 The variable COMPETE is the next one to drop.
Example: Sales Forecasting
 Excel summary output after dropping COMPETE:

Regression Statistics
Multiple R         0.8802681
R Square           0.774871928
Adjusted R Square  0.748386273
Standard Error     136.4197776
Observations       20

ANOVA
            df  SS           MS           F           Significance F
Regression  2   1088939.802  544469.901   29.2562866  3.13078E-06
Residual    17  316376.0474  18610.35573
Total       19  1405315.85

           Coefficients   Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept  96.28121395    50.16415506     1.919322948   0.07188916   -9.556049653  202.1184776
AD_Rate    -0.254280696   0.075014548     -3.389751739  0.003484198  -0.41254778   -0.096013612
APIPOP     0.495481252    0.065306012     7.587069489   7.45293E-07  0.357697418   0.633265086
Example: Sales Forecasting
 All the variables in the model are statistically significant, therefore our final model is:

SUBSCRIBE = 96.28 - 0.25\, ADRATE + 0.495\, APIPOP
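A small sketch of using the final fitted equation for a point forecast (the input values below are hypothetical, not from the sample):

```python
# Final fitted model from the example:
# SUBSCRIBE = 96.28 - 0.25 * ADRATE + 0.495 * APIPOP
def predict_subscribers(adrate: float, apipop: float) -> float:
    return 96.28 - 0.25 * adrate + 0.495 * apipop

# Hypothetical station: ad rate 200, ADI population 500 (same units as the data).
print(predict_subscribers(200.0, 500.0))  # 96.28 - 50.0 + 247.5 = 293.78
```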
Interpreting the Final Model
 What is the interpretation of the estimated parameters?
 Is the association positive or negative?
 Does this make sense intuitively, based on what the data represent?
 What other variables could be confounders?
 Are there other analyses that you might consider doing? New questions raised?
Multicollinearity
 In multiple regression analysis, one is often concerned with the nature and significance of the relations between the explanatory variables and the response variable.
 Questions that are frequently asked are:
 What is the relative importance of the effects of the different independent variables?
 What is the magnitude of the effect of a given independent variable on the dependent variable?

Multicollinearity
 Can any independent variable be dropped from the model because it has little or no effect on the dependent variable?
 Should any independent variables not yet included in the model be considered for possible inclusion?
 Simple answers can be given to these questions if
 the independent variables in the model are uncorrelated among themselves, and
 they are uncorrelated with any other independent variables that are related to the dependent variable but omitted from the model.

Multicollinearity
 When the independent variables are correlated among themselves, multicollinearity (or collinearity) among them is said to exist.
 In many non-experimental situations in business, economics, and the social and biological sciences, the independent variables tend to be correlated among themselves.
 For example, in a regression of family food expenditures on the variables family income, family savings, and the age of the head of household, the explanatory variables will be correlated among themselves.

Multicollinearity
 Further, the explanatory variables will also be correlated with other socioeconomic variables not included in the model that do affect family food expenditures, such as family size.

Multicollinearity
 Some key problems that typically arise when the explanatory variables being considered for the regression model are highly correlated among themselves are:
 1. Adding or deleting an explanatory variable changes the regression coefficients.
 2. The estimated standard deviations of the regression coefficients become large when the explanatory variables in the regression model are highly correlated with each other.
 3. The estimated regression coefficients individually may not be statistically significant even though a definite statistical relation exists between the response variable and the set of explanatory variables.
Multicollinearity Diagnostics
 A widely used formal method of detecting multicollinearity is the Variance Inflation Factor (VIF).
 It measures how much the variances of the estimated regression coefficients are inflated compared with when the independent variables are not linearly related:

VIF_j = \frac{1}{1 - R_j^2}, \qquad j = 1, 2, \ldots, k

 where R_j^2 is the coefficient of determination from the regression of the jth independent variable on the remaining k-1 independent variables.

Multicollinearity Diagnostics
 A VIF near 1 suggests that multicollinearity is not a problem for that independent variable: its estimated coefficient and associated t-value will not change much as the other independent variables are added to or deleted from the regression equation.
 A VIF much greater than 1 indicates the presence of multicollinearity. A maximum VIF in excess of 10 is often taken as an indication that multicollinearity may be unduly influencing the least squares estimates: the estimated coefficient attached to the variable is unstable, and its associated t-statistic may change considerably as the other independent variables are added or deleted.
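A minimal sketch of the VIF computation with statsmodels (simulated predictors standing in for ADRATE, SIGNAL, APIPOP, and COMPETE; x3 is built to be collinear with x1):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
X = pd.DataFrame(rng.normal(size=(50, 3)), columns=["x1", "x2", "x3"])
X["x3"] = 0.9 * X["x1"] + rng.normal(scale=0.3, size=50)  # induce collinearity

exog = sm.add_constant(X)
for j, name in enumerate(exog.columns):
    if name != "const":
        # VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
        # on all the other columns (intercept included).
        print(name, variance_inflation_factor(exog.values, j))
```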
Multicollinearity Diagnostics
 The simple correlation coefficients between all pairs of explanatory variables (i.e., X1, X2, ..., Xk) are helpful in selecting appropriate explanatory variables for a regression model and are also useful for examining multicollinearity.
 While it is true that a correlation very close to +1 or -1 does suggest multicollinearity, it is not valid (unless there are only two explanatory variables) to infer that multicollinearity does not exist when there are no high correlations between any pair of explanatory variables.
Example: Sales Forecasting
 Pearson correlation coefficients, N = 20 (second line of each row gives Prob > |r| under H0: Rho = 0):

          SUBSCRIB  ADRATE    KILOWATT  APIPOP   COMPETE
SUBSCRIB  1.00000   -0.02848  0.44762   0.90447  0.79832
                    0.9051    0.0478    <.0001   <.0001
ADRATE    -0.02848  1.00000   -0.01021  0.32512  0.34147
          0.9051              0.9659    0.1619   0.1406
KILOWATT  0.44762   -0.01021  1.00000   0.45303  0.46895
          0.0478    0.9659              0.0449   0.0370
APIPOP    0.90447   0.32512   0.45303   1.00000  0.87592
          <.0001    0.1619    0.0449             <.0001
COMPETE   0.79832   0.34147   0.46895   0.87592  1.00000
          <.0001    0.1406    0.0370    <.0001
Example: Sales Forecasting
 The fitted equations from the three stages of the elimination were:

SUBSCRIBE = 51.42 - 0.27\, ADRATE - 0.02\, SIGNAL + 0.44\, APIPOP + 16.23\, COMPETE
SUBSCRIBE = 51.32 - 0.26\, ADRATE + 0.43\, APIPOP + 13.92\, COMPETE
SUBSCRIBE = 96.28 - 0.25\, ADRATE + 0.495\, APIPOP

 Note how the estimated coefficients change as explanatory variables are deleted, one of the symptoms of multicollinearity listed above.
Example: Sales Forecasting
 VIF calculation: fit the model

APIPOP = \beta_0 + \beta_1 ADRATE + \beta_2 SIGNAL + \beta_3 COMPETE + \varepsilon

Regression Statistics
Multiple R         0.878054
R Square           0.770978
Adjusted R Square  0.728036
Standard Error     264.3027
Observations       20

ANOVA
            df  SS       MS        F        Significance F
Regression  3   3762601  1254200   17.9541  2.25472E-05
Residual    16  1117695  69855.92
Total       19  4880295

           Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept  -472.685      139.7492        -3.38238  0.003799  -768.9402258  -176.43
Compete    159.8413      28.29157        5.649786  3.62E-05  99.86587622   219.8168
ADRATE     0.048173      0.149395        0.322455  0.751283  -0.268529713  0.364876
Signal     0.037937      0.083011        0.457012  0.653806  -0.138038952  0.213913
Example: Sales Forecasting
 Fit the model

COMPETE = \beta_0 + \beta_1 ADRATE + \beta_2 SIGNAL + \beta_3 APIPOP + \varepsilon

Regression Statistics
Multiple R         0.882936
R Square           0.779575
Adjusted R Square  0.738246
Standard Error     1.34954
Observations       20

ANOVA
            df  SS        MS        F         Significance F
Regression  3   103.0599  34.35329  18.86239  1.66815E-05
Residual    16  29.14013  1.821258
Total       19  132.2

           Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept  3.10416       0.520589        5.96278   1.99E-05  2.000559786   4.20776
ADRATE     0.000491      0.000755        0.649331  0.525337  -0.001110874  0.002092
Signal     0.000334      0.000418        0.799258  0.435846  -0.000552489  0.001221
APIPOP     0.004167      0.000738        5.649786  3.62E-05  0.002603667   0.005731
Example: Sales Forecasting
 Fit the model

SIGNAL = \beta_0 + \beta_1 ADRATE + \beta_2 APIPOP + \beta_3 COMPETE + \varepsilon

Regression Statistics
Multiple R         0.512244
R Square           0.262394
Adjusted R Square  0.124092
Standard Error     790.8387
Observations       20

ANOVA
            df  SS        MS        F         Significance F
Regression  3   3559789   1186596   1.897261  0.170774675
Residual    16  10006813  625425.8
Total       19  13566602

           Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept  5.171093      547.6089        0.009443  0.992582  -1155.707711  1166.05
APIPOP     0.339655      0.743207        0.457012  0.653806  -1.235874129  1.915184
Compete    114.8227      143.6617        0.799258  0.435846  -189.7263711  419.3718
ADRATE     -0.38091      0.438238        -0.86919  0.397593  -1.309935875  0.548109
Example: Sales Forecasting
 Fit the model

ADRATE = \beta_0 + \beta_1 SIGNAL + \beta_2 APIPOP + \beta_3 COMPETE + \varepsilon

Regression Statistics
Multiple R         0.399084
R Square           0.159268
Adjusted R Square  0.001631
Standard Error     440.8588
Observations       20

ANOVA
            df  SS        MS        F         Significance F
Regression  3   589101.7  196367.2  1.010346  0.413876018
Residual    16  3109703   194356.5
Total       19  3698805

           Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept  253.7304      298.6063        0.849716  0.408018  -379.2865355  886.7474
Signal     -0.11837      0.136186        -0.86919  0.397593  -0.407073832  0.170329
APIPOP     0.134029      0.415653        0.322455  0.751283  -0.747116077  1.015175
Compete    52.3446       80.61309        0.649331  0.525337  -118.5474784  223.2367
Example: Sales Forecasting
 VIF calculation results:

Variable  R-Squared  VIF
ADRATE    0.159268   1.19
COMPETE   0.779575   4.54
SIGNAL    0.262394   1.36
APIPOP    0.770978   4.36

 Since no VIF exceeds 10, there is no serious multicollinearity.
Qualitative Independent Variables
 Many variables of interest in business, economics, and the social and biological sciences are not quantitative but qualitative.
 Examples of qualitative variables are gender (male, female), purchase status (purchase, no purchase), and type of firm.
 Qualitative variables can also be used in multiple regression.

Qualitative Independent Variables
 An economist wished to relate the speed with which a particular insurance innovation is adopted (y) to the size of the insurance firm (x1) and the type of firm. The dependent variable is measured by the number of months elapsed between the time the first firm adopted the innovation and the time the given firm adopted the innovation. The first independent variable, size of firm, is quantitative, and is measured by the amount of total assets of the firm. The second independent variable, type of firm, is qualitative and is composed of two classes: stock companies and mutual companies.
Indicator Variables
 Indicator, or dummy, variables are used to capture the relationship between qualitative independent variables and a dependent variable.
 Indicator variables take on the values 0 and 1.
 For the insurance innovation example, where the qualitative variable has two classes, we might define the indicator variable x_2 as follows:

x_2 = 1 \text{ if stock company, } 0 \text{ otherwise}

Indicator Variables
 A qualitative variable with c classes will be represented by c-1 indicator variables.
 A regression function with an indicator variable with two levels (c = 2) will yield two estimated lines.

Interpretation of Regression Coefficients
 In our insurance innovation example, the regression model is:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

 where x_1 = size of firm and x_2 = 1 if stock company, 0 otherwise.
Interpretation of Regression Coefficients
 To understand the meaning of the regression coefficients in this model, consider first the case of a mutual firm. For such a firm, x_2 = 0 and we have:

\hat{y} = b_0 + b_1 x_1 + b_2 (0) = b_0 + b_1 x_1 \quad \text{(mutual firms)}

 For a stock firm, x_2 = 1 and the response function is:

\hat{y} = b_0 + b_1 x_1 + b_2 (1) = (b_0 + b_2) + b_1 x_1 \quad \text{(stock firms)}

Interpretation of Regression Coefficients
 The response function for the mutual firms is a straight line, with y-intercept b_0 and slope b_1.
 For stock firms, this is also a straight line, with the same slope b_1 but with y-intercept b_0 + b_2.
 With reference to the insurance innovation example, the mean time elapsed before the innovation is adopted is a linear function of size of firm (x_1), with the same slope \beta_1 for both types of firms.

Interpretation of Regression Coefficients
 \beta_2 indicates how much lower or higher the response function for stock firms is than the one for mutual firms.
 \beta_2 measures the differential effect of type of firm.
 In general, \beta_2 shows how much higher (lower) the mean response line is for the class coded 1 than the line for the class coded 0, for any level of x_1.
Example: Insurance Innovation Adoption
 Here is the data set for the insurance innovation example:

Months Elapsed  Size  Type of Firm  Type
17              151   0             Mutual
26              92    0             Mutual
21              175   0             Mutual
30              31    0             Mutual
22              104   0             Mutual
0               277   0             Mutual
12              210   0             Mutual
19              120   0             Mutual
4               290   0             Mutual
16              238   1             Stock
28              164   1             Stock
15              272   1             Stock
11              295   1             Stock
38              68    1             Stock
31              85    1             Stock
21              224   1             Stock
20              166   1             Stock
13              305   1             Stock
30              124   1             Stock
14              246   1             Stock
Example: Insurance Innovation Adoption
 Fitting the regression model

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

 where x_1 = size of firm and x_2 = 1 if stock company, 0 otherwise, the fitted response function is:

\hat{y} = 33.87 - 0.1061\, x_1 + 8.77\, x_2
Example: Insurance Innovation Adoption

Regression Statistics
Multiple R         0.95993655
R Square           0.92147818
Adjusted R Square  0.91224031
Standard Error     2.78630562
Observations       20

ANOVA
            df  SS           MS        F         Significance F
Regression  2   1548.820517  774.4103  99.75016  4.04966E-10
Residual    17  131.979483   7.763499
Total       19  1680.8

              Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept     33.8698658    1.562588138     21.67549  8E-14     30.57308841   37.16664321
Size          -0.10608882   0.007799653     -13.6017  1.45E-10  -0.122544675  -0.089632969
Type of firm  8.76797549    1.286421264     6.815789  3.01E-06  6.053860079   11.4820909
Example: Insurance Innovation Adoption
 The fitted response function is:

\hat{y} = 33.87 - 0.1061\, x_1 + 8.77\, x_2

 Stock firms: \hat{y} = (33.87 + 8.77) - 0.1061\, x_1
 Mutual firms: \hat{y} = 33.87 - 0.1061\, x_1
 Interpretation?
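A minimal sketch of fitting this dummy-variable model in Python (the slides use Excel, so statsmodels is an assumption; only the first ten rows of the table are used here, so the estimates will differ from the full-sample fit above):

```python
import pandas as pd
import statsmodels.api as sm

# A subset of the insurance innovation data: months elapsed, firm size,
# and x2 = 1 for stock companies, 0 for mutual companies.
df = pd.DataFrame({
    "months": [17, 26, 21, 30, 22, 16, 28, 15, 11, 38],
    "size":   [151, 92, 175, 31, 104, 238, 164, 272, 295, 68],
    "stock":  [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

X = sm.add_constant(df[["size", "stock"]])
fit = sm.OLS(df["months"], X).fit()
print(fit.params)  # b0, b1 (size), b2 (differential effect of stock firms)
```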
Accounting for Seasonality in a Multiple Regression Model
 Seasonal patterns are not easily accounted for by the typical causal variables that we use in regression analysis.
 Indicator variables can be used effectively to account for seasonality in time series data.
 The number of seasonal indicator variables to use depends on the data: if we have p periods in our data series, we can use at most p-1 seasonal indicator variables.
Example: Private Housing Starts (PHS)
 Housing starts in the United States are measured in thousands of units. These data are plotted for 1990Q1 through 1999Q4. There are typically few housing starts during the first quarter of the year (January, February, March); there is usually a big increase in the second quarter (April, May, June), followed by some decline in the third quarter (July, August, September), and a further decline in the fourth quarter (October, November, December).
Example: Private Housing Starts (PHS)
 [Figure: time series plot of private housing starts, 1990-1999, in thousands of units; a "1" marks the first quarter of each year.]
Example: Private Housing Starts (PHS)
 To account for and measure this seasonality in a regression model, we will use three dummy variables: Q2 for the second quarter, Q3 for the third quarter, and Q4 for the fourth quarter, coded as follows:
 Q2 = 1 for all second quarters and zero otherwise.
 Q3 = 1 for all third quarters and zero otherwise.
 Q4 = 1 for all fourth quarters and zero otherwise.

Example: Private Housing Starts (PHS)
 Data for private housing starts (PHS), the mortgage rate (MR), and these seasonal indicator variables are shown in the following slide.
 Examine the data carefully to verify your understanding of the coding for Q2, Q3, and Q4.
 Since we have assigned dummy variables to the second, third, and fourth quarters, the first quarter is the base quarter for our regression model.
 Note that any quarter could be used as the base, with indicator variables to adjust for differences in the other quarters.
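A small sketch of building such quarterly 0/1 indicators from a date index in pandas (the date range matches the example; the rest is an assumption):

```python
import pandas as pd

# Quarterly index covering the sample period 1990Q1-1999Q4.
idx = pd.period_range("1990Q1", "1999Q4", freq="Q")
df = pd.DataFrame(index=idx)

# Q1 is the base quarter; Q2-Q4 get indicator columns.
for q in (2, 3, 4):
    df[f"Q{q}"] = (idx.quarter == q).astype(int)

print(df.head(8))
```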
Example: Private Housing Starts (PHS)

PERIOD     PHS    MR       Q2  Q3  Q4
31-Mar-90  217    10.1202  0   0   0
30-Jun-90  271.3  10.3372  1   0   0
30-Sep-90  233    10.1033  0   1   0
31-Dec-90  173.6  9.9547   0   0   1
31-Mar-91  146.7  9.5008   0   0   0
30-Jun-91  254.1  9.5265   1   0   0
30-Sep-91  239.8  9.2755   0   1   0
31-Dec-91  199.8  8.6882   0   0   1
31-Mar-92  218.5  8.7098   0   0   0
30-Jun-92  296.4  8.6782   1   0   0
30-Sep-92  276.4  8.0085   0   1   0
31-Dec-92  238.8  8.2052   0   0   1
31-Mar-93  213.2  7.7332   0   0   0
30-Jun-93  323.7  7.4515   1   0   0
30-Sep-93  309.3  7.0778   0   1   0
31-Dec-93  279.4  7.0537   0   0   1
31-Mar-94  252.6  7.2958   0   0   0
30-Jun-94  354.2  8.4370   1   0   0
30-Sep-94  325.7  8.5882   0   1   0
31-Dec-94  265.9  9.0977   0   0   1
31-Mar-95  214.2  8.8123   0   0   0
30-Jun-95  296.7  7.9470   1   0   0
30-Sep-95  308.2  7.7012   0   1   0
31-Dec-95  257.2  7.3508   0   0   1
31-Mar-96  240    7.2430   0   0   0
30-Jun-96  344.5  8.1050   1   0   0
30-Sep-96  324    8.1590   0   1   0
31-Dec-96  252.4  7.7102   0   0   1
31-Mar-97  237.8  7.7905   0   0   0
30-Jun-97  324.5  7.9255   1   0   0
30-Sep-97  314.6  7.4692   0   1   0
31-Dec-97  256.8  7.1980   0   0   1
31-Mar-98  258.4  7.0547   0   0   0
30-Jun-98  360.4  7.0938   1   0   0
30-Sep-98  348    6.8657   0   1   0
31-Dec-98  304.6  6.7633   0   0   1
31-Mar-99  294.1  6.8805   0   0   0
30-Jun-99  377.1  7.2037   1   0   0
30-Sep-99  355.6  7.7990   0   1   0
31-Dec-99  308.1  7.8338   0   0   1
Example: Private Housing Starts (PHS)
 The regression model for private housing starts (PHS) is:

PHS = \beta_0 + \beta_1 (MR) + \beta_2 (Q2) + \beta_3 (Q3) + \beta_4 (Q4) + \varepsilon

 In this model we expect b_1 to have a negative sign, and we would expect b_2, b_3, and b_4 all to have positive signs. Why?
 Regression results for this model are shown in the next slide.
Example: Private Housing Starts (PHS)

Regression Statistics
Multiple R         0.885398221
R Square           0.78393001
Adjusted R Square  0.759236296
Standard Error     26.4498851
Observations       40

ANOVA
            df  SS           MS           F            Significance F
Regression  4   88837.93624  22209.48406  31.74613731  3.33637E-11
Residual    35  24485.87476  699.5964217
Total       39  113323.811

           Coefficients   Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept  473.0650749    35.54169837     13.31014264   2.93931E-15  400.9115031   545.2186467
MR         -30.04838192   4.257226391     -7.058206249  3.21421E-08  -38.69102153  -21.40574231
Q2         95.74106935    11.84748487     8.081130334   1.6292E-09   71.689367     119.7927717
Q3         73.92904763    11.82881519     6.249911462   3.62313E-07  49.91524679   97.94284847
Q4         20.54778131    11.84139803     1.73524961    0.091495355  -3.491564078  44.5871267
Example: Private Housing Starts (PHS)
 Use the prediction equation to make a forecast for each of the four quarters of 1999.
 Prediction equation:

\widehat{PHS} = 473.06 - 30.05 (MR) + 95.74 (Q2) + 73.93 (Q3) + 20.55 (Q4)
Example: Private Housing Starts (PHS)
 [Figure: private housing starts (PHS) with a simple regression forecast (PHSF1) and a multiple regression forecast (PHSF2), in thousands of units.]
Regression Diagnostics and Residual Analysis
 It is important to check the adequacy of the model before it becomes part of the decision-making process.
 Residual plots can be used to check the model assumptions.
 It is important to study outlying observations to decide whether they should be retained or eliminated.
 If retained, we must decide whether their influence should be reduced in the fitting process or the regression function revised.
Time Series Data and the Problem of Serial Correlation
 In regression models we assume that the errors \varepsilon_i are independent.
 In business and economics, many regression applications involve time series data.
 For such data, the assumption of uncorrelated or independent error terms is often not appropriate.

Problems of Serial Correlation
 If the error terms in the regression model are autocorrelated, the use of ordinary least squares procedures has a number of important consequences:
 MSE underestimates the variance of the error terms.
 The confidence intervals and tests based on the t and F distributions are no longer strictly applicable.
 The standard errors of the regression coefficients underestimate the variability of the estimated regression coefficients; spurious regression can result.
First-Order Serial Correlation
 The error term in the current period is directly related to the error term in the previous time period.
 Let the subscript t represent time; then the simple linear regression model is:

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad \varepsilon_t = \rho\, \varepsilon_{t-1} + \nu_t

 where
 \varepsilon_t = the error at time t,
 \rho = the parameter that measures the correlation between adjacent error terms, and
 \nu_t = normally distributed error terms with mean zero and variance \sigma_\nu^2.
Example
 Effects of positive serial correlation in a simple linear regression model:
 Misleading forecasts of future y values.
 The standard error of the estimate, s_{y.x}, will underestimate the variability of the y's about the true regression line.
 Strong autocorrelation can make two unrelated variables appear to be related.
Durbin-Watson Test for Serial Correlation
 Recall the first-order serial correlation model:

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad \varepsilon_t = \rho\, \varepsilon_{t-1} + \nu_t

 The hypotheses to be tested are:

H_0: \rho = 0 \quad \text{vs.} \quad H_a: \rho > 0

 The alternative hypothesis is \rho > 0 since business and economic time series tend to show positive correlation.

Durbin-Watson Test for Serial Correlation
 The Durbin-Watson statistic is defined as:

DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

 where
 e_t = y_t - \hat{y}_t = the residual for time period t, and
 e_{t-1} = y_{t-1} - \hat{y}_{t-1} = the residual for time period t-1.

Durbin-Watson Test for Serial Correlation
 The autocorrelation coefficient \rho can be estimated by the lag 1 residual autocorrelation r_1(e):

r_1(e) = \frac{\sum_{t=2}^{n} e_t\, e_{t-1}}{\sum_{t=1}^{n} e_t^2}

 and it can be shown that

DW \approx 2\,(1 - r_1(e))

Durbin-Watson Test for Serial Correlation
 Since -1 < r_1(e) < 1, it follows that 0 < DW < 4:
 If r_1(e) = 0, then DW = 2 (no correlation).
 If r_1(e) > 0, then DW < 2 (positive correlation).
 If r_1(e) < 0, then DW > 2 (negative correlation).
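A minimal sketch of computing the DW statistic from regression residuals (simulated data with first-order positive autocorrelation; statsmodels' durbin_watson implements the sum-of-squared-differences formula above):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Simulated regression with rho = 0.7 first-order error correlation.
rng = np.random.default_rng(3)
x = np.arange(50.0)
eps = np.zeros(50)
for t in range(1, 50):
    eps[t] = 0.7 * eps[t - 1] + rng.normal()
y = 2 + 0.5 * x + eps

fit = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(fit.resid))  # well below 2: positive autocorrelation
# 2*(1 - r1(e)) approximates DW:
print(2 * (1 - np.sum(fit.resid[1:] * fit.resid[:-1]) / np.sum(fit.resid**2)))
```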
Durbin-Watson Test for Serial Correlation
 Decision rule:
 If DW > U, do not reject H_0.
 If DW < L, reject H_0.
 If L \le DW \le U, the test is inconclusive.
 The critical upper (U) and lower (L) bounds can be found in the Durbin-Watson table of your textbook.
 To use this table you need to know the significance level (\alpha), the number of independent parameters in the model (k), and the sample size (n).
Example
 The Blaisdell Company wished to predict its sales by using industry sales as a predictor variable. The following table gives seasonally adjusted quarterly data on company sales and industry sales for the period 1983-1987.
Example

Year  Quarter  t   CompSale  InduSale
1983  1        1   20.96     127.3
1983  2        2   21.4      130
1983  3        3   21.96     132.7
1983  4        4   21.52     129.4
1984  1        5   22.39     135
1984  2        6   22.76     137.1
1984  3        7   23.48     141.2
1984  4        8   23.66     142.8
1985  1        9   24.1      145.5
1985  2        10  24.01     145.3
1985  3        11  24.54     148.3
1985  4        12  24.3      146.4
1986  1        13  25        150.2
1986  2        14  25.64     153.1
1986  3        15  26.36     157.3
1986  4        16  26.98     160.7
1987  1        17  27.52     164.2
1987  2        18  27.78     165.6
1987  3        19  28.24     168.7
1987  4        20  28.78     171.7
Example
 [Figure: scatter plot of Blaisdell Company sales ($ millions) against industry sales ($ millions).]
Example
 The scatter plot suggests that a linear regression model is appropriate.
 The least squares method was used to fit a regression line to the data.
 The residuals were plotted against the fitted values.
 The plot shows that the residuals are consistently above or below the fitted value for extended periods.
Example
 To confirm this graphical diagnosis we will use the Durbin-Watson test of:

H_0: \rho = 0 \quad \text{vs.} \quad H_a: \rho > 0

 The test statistic is:

DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}
Example

Year  Quarter  t   Company sales (y)  Industry sales (x)  e_t       e_t - e_{t-1}  (e_t - e_{t-1})^2  e_t^2
1983  1        1   20.96              127.3               -0.02605                                    0.000679
1983  2        2   21.4               130                 -0.06202  -0.03596       0.001293           0.003846
1983  3        3   21.96              132.7               0.022021  0.084036       0.007062           0.000485
1983  4        4   21.52              129.4               0.163754  0.141733       0.020088           0.026815
1984  1        5   22.39              135                 0.04657   -0.11718       0.013732           0.002169
1984  2        6   22.76              137.1               0.046377  -0.00019       3.76E-08           0.002151
1984  3        7   23.48              141.2               0.043617  -0.00276       7.61E-06           0.001902
1984  4        8   23.66              142.8               -0.05844  -0.10205       0.010415           0.003415
1985  1        9   24.1               145.5               -0.0944   -0.03596       0.001293           0.008911
1985  2        10  24.01              145.3               -0.14914  -0.05474       0.002997           0.022243
1985  3        11  24.54              148.3               -0.14799  0.001152       1.33E-06           0.021901
1985  4        12  24.3               146.4               -0.05305  0.094937       0.009013           0.002815
1986  1        13  25                 150.2               -0.02293  0.030125       0.000908           0.000526
1986  2        14  25.64              153.1               0.105852  0.12878        0.016584           0.011205
1986  3        15  26.36              157.3               0.085464  -0.02039       0.000416           0.007304
1986  4        16  26.98              160.7               0.106102  0.020638       0.000426           0.011258
1987  1        17  27.52              164.2               0.029112  -0.07699       0.005927           0.000848
1987  2        18  27.78              165.6               0.042316  0.013204       0.000174           0.001791
1987  3        19  28.24              168.7               -0.04416  -0.08648       0.007478           0.00195
1987  4        20  28.78              171.7               -0.03301  0.011152       0.000124           0.00109
                                                          Sums:                    0.097941           0.133302
Example

DW = \frac{.097941}{.133302} = .735

 Using the Durbin-Watson table of your textbook, for k = 1, n = 20, and \alpha = .01, we find U = 1.15 and L = .95.
 Since DW = .735 falls below L = .95, we reject the null hypothesis and conclude that the error terms are positively autocorrelated.
Remedial Measures for Serial Correlation
 Addition of one or more independent variables to the regression model.
 One major cause of autocorrelated error terms is the omission from the model of one or more key variables that have time-ordered effects on the dependent variable.
 Use transformed variables.
 The regression model is specified in terms of changes rather than levels.
Extensions of the Multiple Regression Model
 In some situations, nonlinear terms may be needed as independent variables in a regression analysis.
 Business or economic logic may suggest that nonlinearity is expected.
 A graphic display of the data may be helpful in determining whether nonlinearity is present.
 One common economic cause of nonlinearity is diminishing returns.
 For example, the effect of advertising on sales may diminish as increased advertising is used.
Extensions of the Multiple Regression Model
 Some common forms of nonlinear functions are:

Y = \beta_0 + \beta_1 X + \beta_2 X^2
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3
Y = \beta_0 + \beta_1 (1/X)
Y = \beta_0\, e^{\beta_1 X}
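A minimal sketch of fitting the quadratic form above by adding a squared column (simulated diminishing-returns data; the names are hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Simulated diminishing returns: sales flatten as advertising grows.
rng = np.random.default_rng(4)
adv = np.linspace(1.0, 10.0, 40)
sales = 20 + 8 * adv - 0.5 * adv**2 + rng.normal(scale=1.0, size=40)

# Y = b0 + b1*X + b2*X^2: the squared column captures the curvature.
X = sm.add_constant(np.column_stack([adv, adv**2]))
fit = sm.OLS(sales, X).fit()
print(fit.params)  # b2 should come out negative (diminishing returns)
```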
Extensions of the Multiple Regression Model
 To illustrate the use and interpretation of a nonlinear term, we return to the problem of developing a forecasting model for private housing starts (PHS).
 So far we have looked at the following model:

PHS = \beta_0 + \beta_1 (MR) + \beta_2 (Q2) + \beta_3 (Q3) + \beta_4 (Q4) + \varepsilon

 where MR is the mortgage rate and Q2, Q3, and Q4 are indicator variables for quarters 2, 3, and 4.

Example: Private Housing Starts
 First we add real disposable personal income per capita (DPI) as an independent variable. Our new model for this data set is:

PHS = \beta_0 + \beta_1 (MR) + \beta_2 (Q2) + \beta_3 (Q3) + \beta_4 (Q4) + \beta_5 (DPI) + \varepsilon

 Regression results for this model are shown in the next slide.
Example: Private Housing Starts

Regression Statistics
Multiple R         0.943791346
R Square           0.890742104
Adjusted R Square  0.874187878
Standard Error     19.05542121
Observations       39

ANOVA
            df  SS           MS        F         Significance F
Regression  5   97690.01942  19538     53.80753  6.51194E-15
Residual    33  11982.59955  363.1091
Total       38  109672.619

           Coefficients   Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept  -31.06403714   105.1938477     -0.2953   0.769613  -245.0826992  182.9546249
MR         -20.1992545    4.124906847     -4.8969   2.5E-05   -28.59144723  -11.80706176
Q2         97.03478074    8.900711541     10.90191  1.78E-12  78.9261326    115.1434289
Q3         75.40017073    8.827185877     8.541813  7.17E-10  57.44111179   93.35922967
Q4         20.35306822    8.83373887      2.304015  0.027657  2.380677107   38.32545934
DPI        0.022407799    0.004356973     5.142974  1.21E-05  0.013543464   0.031272134
Example: Private Housing Starts
 The prediction model is:

\widehat{PHS} = -31.06 - 20.19 (MR) + 97.03 (Q2) + 75.40 (Q3) + 20.35 (Q4) + 0.02 (DPI)

 In comparison with the previous model, the R-squared has improved from 78% to 89%.
 The standard error of the estimate has decreased from 26.45 for the previous model to 19.06 for the new model.
Example: Private Housing Starts
 The value of the DW statistic has changed from 0.88 for the previous model to 0.78 for the new model.
 At the 5% level the critical values for the DW test, from the Durbin-Watson table, for k = 5 and n = 39 are L = 1.22 and U = 1.79.
 Since the value of the DW statistic is smaller than L = 1.22, we reject the null hypothesis H_0: \rho = 0.
 This implies that there is serial correlation in both models; the assumption of independent error terms is not valid.
Example: Private Housing Starts
 The plot of PHS against DPI shows a curvilinear relation.
 Next we introduce a nonlinear term into the regression: the square of disposable personal income per capita (DPI SQUARED) is included in the regression model.
 [Figure: scatter plot of private housing starts (PHS) against disposable personal income per capita (DPI).]
Example: Private Housing Starts
 We also add the dependent variable, lagged one quarter (LPHS), as an independent variable, in order to help reduce the serial correlation.
 The third model that we fit to our data set is:

PHS = \beta_0 + \beta_1 (MR) + \beta_2 (Q2) + \beta_3 (Q3) + \beta_4 (Q4) + \beta_5 (DPI) + \beta_6 (DPI^2) + \beta_7 (LPHS) + \varepsilon

 Regression results for this model are shown in the next slide.
Example: Private Housing Starts

Regression Statistics
Multiple R         0.97778626
R Square           0.956065971
Adjusted R Square  0.946145384
Standard Error     12.46719572
Observations       39

ANOVA
            df  SS           MS           F         Significance F
Regression  7   104854.2589  14979.17985  96.37191  3.07085E-19
Residual    31  4818.360042  155.4309691
Total       38  109672.619

             Coefficients   Standard Error  t Stat        P-value   Lower 95%     Upper 95%
Intercept    716.5926532    1017.664989     0.704153784   0.486593  -1358.949934  2792.13524
MR           -13.65521724   3.093504134     -4.414158396  0.000114  -19.96446404  -7.345970448
Q2           106.9813297    6.069780998     17.62523718   1.04E-17  94.60192287   119.3607366
Q3           27.72122303    9.111432565     3.042465916   0.004748  9.138323433   46.30412262
Q4           -13.37855186   7.653050858     -1.748133144  0.09034   -28.98706069  2.22995698
DPI          -0.060399279   0.104412354     -0.578468704  0.567127  -0.273349798  0.15255124
DPI SQUARED  0.000335974    0.000536397     0.626354647   0.535668  -0.000758014  0.001429963
LPHS         0.655786939    0.097265424     6.742241114   1.51E-07  0.457412689   0.854161189
Example: Private Housing Starts
 The inclusion of DPI SQUARED and lagged PHS has increased the R-squared to 96%.
 The standard error of the estimate has decreased to 12.47.
 The value of the DW statistic has increased to 2.32, which is greater than U = 1.79; this rules out positive serial correlation.
 The third model works best for this data set.
 The following slide gives the data set.
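A small sketch of constructing the DPI-squared and lagged-PHS columns in pandas, using the first rows of the table on the next slide (statsmodels would then fit the model as in the earlier sketches; note that the table's DPI SQUARED column appears to carry a constant scaling, which only rescales the corresponding coefficient):

```python
import pandas as pd

# First quarters of the data from the next slide (PHS in thousands of units).
df = pd.DataFrame({
    "PHS": [271.3, 233.0, 173.6, 146.7, 254.1, 239.8, 199.8, 218.5],
    "DPI": [18063, 18031, 17856, 17748, 17861, 17816, 17811, 18000],
})

df["DPI_SQUARED"] = df["DPI"] ** 2   # nonlinear income term
df["LPHS"] = df["PHS"].shift(1)      # PHS lagged one quarter; the table also
                                     # carries the pre-sample value 217 for
                                     # the first LPHS, which shift() cannot know
print(df)

# With the full table (and MR, Q2, Q3, Q4), the third model is fit as:
# sm.OLS(df["PHS"], sm.add_constant(df[["MR", "Q2", "Q3", "Q4",
#                                       "DPI", "DPI_SQUARED", "LPHS"]])).fit()
```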
Example: Private Housing Starts

PERIOD     PHS    MR       LPHS   Q2  Q3  Q4  DPI    DPI SQUARED
30-Jun-90  271.3  10.3372  217    1   0   0   18063  1,631,359.85
30-Sep-90  233    10.1033  271.3  0   1   0   18031  1,625,584.81
31-Dec-90  173.6  9.9547   233    0   0   1   17856  1,594,183.68
31-Mar-91  146.7  9.5008   173.6  0   0   0   17748  1,574,957.52
30-Jun-91  254.1  9.5265   146.7  1   0   0   17861  1,595,076.61
30-Sep-91  239.8  9.2755   254.1  0   1   0   17816  1,587,049.28
31-Dec-91  199.8  8.6882   239.8  0   0   1   17811  1,586,158.61
31-Mar-92  218.5  8.7098   199.8  0   0   0   18000  1,620,000.00
30-Jun-92  296.4  8.6782   218.5  1   0   0   18085  1,635,336.13
30-Sep-92  276.4  8.0085   296.4  0   1   0   18036  1,626,486.48
31-Dec-92  238.8  8.2052   276.4  0   0   1   18330  1,679,944.50
31-Mar-93  213.2  7.7332   238.8  0   0   0   17975  1,615,503.13
30-Jun-93  323.7  7.4515   213.2  1   0   0   18247  1,664,765.05
30-Sep-93  309.3  7.0778   323.7  0   1   0   18246  1,664,582.58
31-Dec-93  279.4  7.0537   309.3  0   0   1   18413  1,695,192.85
31-Mar-94  252.6  7.2958   279.4  0   0   0   18154  1,647,838.58
30-Jun-94  354.2  8.4370   252.6  1   0   0   18409  1,694,456.41
30-Sep-94  325.7  8.5882   354.2  0   1   0   18493  1,709,955.25
31-Dec-94  265.9  9.0977   325.7  0   0   1   18667  1,742,284.45
31-Mar-95  214.2  8.8123   265.9  0   0   0   18834  1,773,597.78
30-Jun-95  296.7  7.9470   214.2  1   0   0   18798  1,766,824.02
30-Sep-95  308.2  7.7012   296.7  0   1   0   18871  1,780,573.21
31-Dec-95  257.2  7.3508   308.2  0   0   1   18942  1,793,996.82
31-Mar-96  240    7.2430   257.2  0   0   0   19071  1,818,515.21
30-Jun-96  344.5  8.1050   240    1   0   0   19081  1,820,422.81
30-Sep-96  324    8.1590   344.5  0   1   0   19161  1,835,719.61
31-Dec-96  252.4  7.7102   324    0   0   1   19152  1,833,995.52
31-Mar-97  237.8  7.7905   252.4  0   0   0   19331  1,868,437.81
30-Jun-97  324.5  7.9255   237.8  1   0   0   19315  1,865,346.13
30-Sep-97  314.6  7.4692   324.5  0   1   0   19385  1,878,891.13
31-Dec-97  256.8  7.1980   314.6  0   0   1   19478  1,896,962.42
31-Mar-98  258.4  7.0547   256.8  0   0   0   19632  1,927,077.12
30-Jun-98  360.4  7.0938   258.4  1   0   0   19719  1,944,194.81
30-Sep-98  348    6.8657   360.4  0   1   0   19905  1,980,963.41
31-Dec-98  304.6  6.7633   348    0   0   1   20194  2,038,980.00
31-Mar-99  294.1  6.8805   304.6  0   0   0   20377  2,076,010.87
30-Jun-99  377.1  7.2037   294.1  1   0   0   20472  2,095,440.74
30-Sep-99  355.6  7.7990   377.1  0   1   0   20756  2,153,982.23
31-Dec-99  308.1  7.8338   355.6  0   0   1   21124  2,231,020.37