2. The methods of simple linear regression, discussed in
Chapter 7, apply when we wish to fit a linear model
relating the value of a dependent variable y to the
value of a single independent variable x.
There are many situations in which a single independent
variable is not enough.
In situations like this, there are several independent
variables, x1, x2, …, xp, that are related to a dependent
variable y.
3. Assume that we have a sample of n items
and that on each item we have measured a
dependent variable y and p independent
variables, x1,x2,…,xp.
The ith sampled item gives rise to the
ordered set (yi,x1i,…,xpi).
We can then fit the multiple regression
model yi = β0 + β1x1i +…+ βpxpi + εi.
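As a minimal sketch (the coefficients and sample size are hypothetical, for illustration only), such a model with p = 2 could be simulated as:

```python
# Simulate n items from y_i = beta0 + beta1*x1i + beta2*x2i + eps_i (p = 2).
# All numbers here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)           # first independent variable
x2 = rng.uniform(0, 10, n)           # second independent variable
eps = rng.normal(0, 1.0, n)          # random errors
y = 2.0 + 0.5 * x1 - 1.2 * x2 + eps  # dependent variable
```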
4. Polynomial regression model (the independent variables are
all powers of a single variable)
Quadratic model (a polynomial regression model of degree 2;
there are also models containing powers of several variables)
A variable that is the product of two other variables is called an
interaction.
These models are considered linear models, even though they contain
nonlinear terms in the independent variables. The reason is that they
are linear in the coefficients, βi .
Quadratic model in two variables:
yi = β0 + β1x1i + β2x2i + β3x1ix2i + β4x1i^2 + β5x2i^2 + εi
Polynomial regression model of degree p in one variable:
yi = β0 + β1xi + β2xi^2 + … + βpxi^p + εi
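As a sketch of how such terms are assembled for fitting, here is a hypothetical quadratic model in two variables; this small example is reused in the sketches that follow:

```python
# Build the columns of a quadratic model in two variables.
# The data values are hypothetical, chosen only for illustration.
import numpy as np

x1 = np.array([5., 5., 5., 10., 10., 10., 20., 20., 20., 30.])
x2 = np.array([10., 20., 30., 10., 20., 30., 10., 20., 30., 10.])

# Columns: 1, x1, x2, x1*x2 (interaction), x1^2, x2^2.
# The terms are nonlinear in x1 and x2, but the model is linear in the betas.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])
```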
5. In any multiple regression model, the estimates β̂0, β̂1, …, β̂p
are computed by least-squares, just as in simple linear
regression. The equation
ŷ = β̂0 + β̂1x1 + … + β̂pxp
is called the least-squares equation or fitted
regression equation.
The residuals are the quantities
ei = yi − ŷi
which are the differences between the observed y values
and the y values given by the equation.
We want to compute β̂0, β̂1, …, β̂p so as to minimize the sum of
the squared residuals. This calculation is complicated, and we rely on
computers to perform it.
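Continuing the sketch above, the coefficients and residuals can be computed with a least-squares routine (the response values below are hypothetical):

```python
# Least-squares fit: betahat minimizes the sum of squared residuals.
import numpy as np

y = np.array([95.1, 94.6, 94.7, 90.8, 90.2, 91.3, 72.4, 82.1, 84.9, 63.0])
betahat, *_ = np.linalg.lstsq(X, y, rcond=None)  # betahat_0, ..., betahat_p
yhat = X @ betahat                               # y values given by the equation
e = y - yhat                                     # residuals e_i = y_i - yhat_i
```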
6. Much of the analysis in multiple regression is based on three
fundamental quantities:
the regression sum of squares (SSR),
the error sum of squares (SSE), and
the total sum of squares (SST).
These quantities are defined just as in Chapter 7.
The analysis of variance identity is
SST = SSR + SSE
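Continuing the sketch, the three quantities and the identity (the definitions below are the usual ones, assumed to match Chapter 7):

```python
# The three fundamental sums of squares and the ANOVA identity.
import numpy as np

SST = np.sum((y - y.mean()) ** 2)     # total sum of squares
SSE = np.sum(e ** 2)                  # error sum of squares
SSR = np.sum((yhat - y.mean()) ** 2)  # regression sum of squares
assert np.isclose(SST, SSR + SSE)     # SST = SSR + SSE
```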
7. Recall: Assumptions for Errors in Linear Models:
In the simplest situation, the following assumptions are
satisfied (notice that these are the same as for simple
linear regression.):
1. The errors ε1,…,εn are random and independent. In
particular, the magnitude of any error εi does not influence
the value of the next error εi+1.
2. The errors ε1,…,εn all have mean 0.
3. The errors ε1,…,εn all have the same variance, which we
denote by σ^2.
4. The errors ε1,…,εn are normally distributed.
8. The three statistics most often used in multiple regression
are the estimated error variance s^2, the coefficient of
determination R^2, and the F statistic.
We have to adjust the estimate of the error variance since we
are estimating p + 1 coefficients:
s^2 = Σi=1..n (yi − ŷi)^2 / (n − p − 1) = SSE / (n − p − 1)
The estimated variance of each least-squares coefficient is a
complicated calculation, and we find these using a
computer.
The value of R^2 is calculated in the same way as r^2 in simple
linear regression:
R^2 = SSR / SST = 1 − SSE / SST
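Continuing the sketch:

```python
# Estimated error variance and coefficient of determination.
n, num_coef = X.shape   # num_coef = p + 1 coefficients are estimated
p = num_coef - 1
s2 = SSE / (n - p - 1)  # s^2 = SSE / (n - p - 1)
R2 = 1 - SSE / SST      # R^2 = SSR/SST = 1 - SSE/SST
```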
9. When assumptions 1 through 4 are satisfied, the
quantity
(β̂i − βi) / sβ̂i
where sβ̂i is the estimated standard deviation of β̂i,
has a Student’s t distribution with n − p − 1
degrees of freedom.
The number of degrees of freedom is equal to the
denominator used to compute the estimated error
variance.
This statistic is used to compute confidence
intervals and to perform hypothesis tests, as we did
with simple linear regression.
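Continuing the sketch, confidence intervals for the coefficients; the standard errors below come from the standard formula s^2 (X'X)^(-1), the "complicated calculation" the slides leave to the computer:

```python
# 95% confidence intervals for each least-squares coefficient.
import numpy as np
from scipy import stats

cov = s2 * np.linalg.inv(X.T @ X)      # estimated covariance matrix of betahat
se = np.sqrt(np.diag(cov))             # estimated standard deviations s_betahat_i
tcrit = stats.t.ppf(0.975, n - p - 1)  # t critical value, n - p - 1 dof
ci = np.column_stack([betahat - tcrit * se, betahat + tcrit * se])
```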
10. In simple linear regression, a test of the null hypothesis β1 = 0 is
almost always made. If this hypothesis is not rejected, then the
linear model may not be useful.
The analogous test in multiple linear regression is of H0:
β1 = β2 = … = βp = 0. This is a very strong hypothesis. It says that
none of the independent variables has any linear relationship
with the dependent variable.
The test statistic for this hypothesis is
F = (SSR / p) / (SSE / (n − p − 1))
This is an F statistic and its null distribution is F(p, n−p−1). Note that the
denominator of the F statistic is s^2. The subscripts p and n − p − 1 are
the degrees of freedom for the F statistic.
Slightly different versions of the F statistic can be used to test
milder null hypotheses.
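Continuing the sketch:

```python
# F statistic for H0: beta_1 = ... = beta_p = 0.
from scipy import stats

F = (SSR / p) / (SSE / (n - p - 1))  # denominator is s^2
pval = stats.f.sf(F, p, n - p - 1)   # upper-tail P-value from F(p, n-p-1)
```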
11. The regression equation is
Goodput = 96.0 - 1.82 Speed + 0.565 Pause + 0.0247 Speed*Pause + 0.0140 Speed^2
- 0.0118 Pause^2
Predictor Coef StDev T P
Constant 96.024 3.946 24.34 0.000
Speed -1.8245 0.2376 -7.68 0.000
Pause 0.5652 0.2256 2.51 0.022
Speed*Pa 0.024731 0.003249 7.61 0.000
Speed^2 0.014020 0.004745 2.95 0.008
Pause^2 -0.011793 0.003516 -3.35 0.003
S = 2.942 R-Sq = 93.2% R-Sq(adj) = 91.4%
Analysis of Variance
Source DF SS MS F P
Regression 5 2240.49 448.10 51.77 0.000
Residual Error 19 164.46 8.66
Total 24 2404.95
Predicted Values for New Observations
New
Obs Fit SE Fit 95% CI 95% PI
1 74.272 1.175 (71.812, 76.732) (67.641, 80.903)
Values of Predictors for New Observations
New
Obs Speed Pause Speed*Pause Speed^2 Pause^2
1 25.0 15.0 375 625 225
Speed Pause Goodput
5 10 95.111
5 20 94.577
5 30 94.734
5 40 94.317
5 50 94.644
10 10 90.8
10 20 90.183
10 30 91.341
10 40 91.321
10 50 92.104
20 10 72.422
20 20 82.089
20 30 84.937
20 40 87.8
20 50 89.941
30 10 62.963
30 20 76.126
30 30 84.855
30 40 87.694
30 50 90.556
40 10 55.298
40 20 78.262
40 30 84.624
40 40 87.078
40 50 90.101
12. Use the multiple regression model to predict the
goodput for a network with speed 12 m/s and pause
time 25 s.
For the goodput data, find the residual for the point
Speed = 20, Pause = 30.
Find a 95% confidence interval for the coefficient of
Speed in the multiple regression model.
Test the null hypothesis that the coefficient of Pause
is less than or equal to 0.3.
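A minimal sketch of the first two items, plugging the given values into the fitted equation from the Minitab output above (the last two items use the Coef and StDev columns with the Student’s t distribution on 19 degrees of freedom, as in slide 9):

```python
# Evaluate the fitted goodput equation at given Speed and Pause values.
import numpy as np

b = np.array([96.024, -1.8245, 0.5652, 0.024731, 0.014020, -0.011793])

def goodput_hat(speed, pause):
    x = np.array([1.0, speed, pause, speed * pause, speed ** 2, pause ** 2])
    return b @ x

print(goodput_hat(12, 25))           # predicted goodput at speed 12 m/s, pause 25 s
print(84.937 - goodput_hat(20, 30))  # residual at Speed = 20, Pause = 30
```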
13. It is important in multiple linear regression to test
the validity of the assumptions for errors in the
linear model.
•Errors are random and independent: Residuals vs. run order
•Errors all have mean of zero: Residuals vs. each independent
variable
•Errors all have the same variance: Residuals vs. fitted values
•Errors are normally distributed: Normal or Half-Normal plot of residuals
If the residual plots indicate a violation of assumptions, transformations can
be tried.
14. Fitting separate models to each variable is not
the same as fitting the multivariate model.
Consider the following example: There are 225
gas wells that received “fracture treatment” in
order to increase production. In this treatment,
fracture fluid, which consists of fluid mixed with
sand, is pumped into the well. The sand holds
open the cracks in the rock, thus increasing the
flow of gas.
15. We can use sand to predict production, or fluid to predict production. If we
fit separate simple linear models, both sand and fluid show up as
important predictors.
We might be tempted to conclude that increasing the volume of fluid or the
volume of sand would increase production.
16. There is confounding in this situation. If we increase the volume of fluid,
then we also increase the volume of sand.
If production depends only on the volume of sand, there will still be a
relationship in the data between production and fluid, and vice versa.
17. The following output presents results using only one independent variable (fluid or
sand) in the model. Note that log transformations have been done. Both fluid
and sand have a statistically significant effect.
The regression equation is
ln Prod = - 0.444 + 0.798 ln Fluid
Predictor Coef StDev T P
Constant -0.4442 0.5853 -0.76 0.449
ln Fluid 0.79833 0.08010 9.97 0.000
S = 0.7459 R-Sq = 28.2% R-Sq(adj) = 27.9%
The regression equation is
ln Prod = - 0.778 + 0.748 ln Sand
Predictor Coef StDev T P
Constant -0.7784 0.6912 -1.13 0.261
ln Sand 0.74751 0.08381 8.92 0.000
S = 0.7678 R-Sq = 23.9% R-Sq(adj) = 23.6%
18. This output presents results from multiple linear regression, in
which both fluid and sand are included in the model. In
contrast to the separate simple linear regression results,
only fluid has a statistically significant effect; sand does
not.
The regression equation is
ln Prod = - 0.729 + 0.670 ln Fluid + 0.148 ln Sand
Predictor Coef StDev T P
Constant -0.7288 0.6719 -1.08 0.279
ln Fluid 0.6701 0.1687 3.97 0.000
ln Sand 0.1481 0.1714 0.86 0.389
S = 0.7463 R-Sq = 28.4% R-Sq(adj) = 27.8%
19. When two independent variables are
very strongly correlated, multiple
regression may not be able to
determine which is the important one.
In this case, the variables are said to be
collinear.
The word collinear means to lie on the
same line, and when two variables are
highly correlated, their scatterplot is
approximately a straight line.
•The word multicollinearity is sometimes used as well, meaning that
multiple variables are highly correlated with each other.
•When collinearity is present, the set of independent variables is
sometimes said to be ill-conditioned.
20. There are many situations in which a large number
of independent variables have been measured, and
we need to decide which of them to include in the
model.
This is the problem of model selection, and it is not
an easy one.
Good model selection rests on this basic principle
known as Occam’s razor:
“The best scientific model is the simplest model
that explains the observed data.”
In terms of linear models, Occam’s razor implies
the principle of parsimony:
“A model should contain the smallest number of
variables necessary to fit the data.”
21. 1. A linear model should always contain an
intercept, unless physical theory dictates
otherwise.
2. If a power x^n of a variable is included in the model,
all lower powers x, x^2, …, x^(n−1) should be included as
well, unless physical theory dictates otherwise.
3. If a product xy of two variables is included in a
model, then the variables x and y should be
included separately as well, unless physical
theory dictates otherwise.
22. What is the effect of X on Y?
Draw a smooth curve through the data
showing what you would expect a good model
to look like.
24. First, check whether an entire variable can be eliminated
(including its linear, quadratic, and interaction terms).
Ex: yi = β0 + β1x1 + β2x2 + β3x1^2 + β4x2^2 + β5x1x2
Can all x1 terms (x1, x1^2, x1x2) be dropped as a group?
Next, drop other insignificant terms one at a time,
starting with the term with the highest P-value.
Removing a term will change the coefficients and P-values
of the remaining terms.
This procedure is often called “backward elimination”; a sketch follows below.
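A minimal backward-elimination sketch, assuming (hypothetically) a pandas DataFrame X of candidate terms and a response y; this illustrates the idea, not the textbook’s procedure verbatim:

```python
# Drop the least significant term one at a time until all P-values < alpha_out.
import pandas as pd
import statsmodels.api as sm

def backward_eliminate(X, y, alpha_out=0.15):
    terms = list(X.columns)
    while terms:
        fit = sm.OLS(y, sm.add_constant(X[terms])).fit()
        pvals = fit.pvalues.drop("const")  # always keep the intercept
        worst = pvals.idxmax()             # term with the highest P-value
        if pvals[worst] < alpha_out:
            return fit, terms              # all remaining terms are significant
        terms.remove(worst)
    return None, []                        # every term was eliminated
```

Note that dropping a group of terms at once, as in the x1 example above, is better handled with the F test described on the next slides.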
25. It often happens that one has formed a model
that contains a large number of independent
variables, and one wishes to determine whether
a given subset of them may be dropped from
the model without significantly reducing the
accuracy of the model.
Assume that we know that the model
yi = β0 + β1x1i + … + βkxki + βk+1x(k+1)i + … + βpxpi + εi
is correct. We will call this the full model.
26. We wish to test the null hypothesis H0: βk+1 = … = βp = 0.
If H0 is true, the model will remain correct if we
drop the variables xk+1, …, xp, so we can replace the
full model with the following reduced model:
yi = β0 + β1x1i + … + βkxki + εi
27. To develop a test statistic for H0, we begin by
computing the error sums of squares for
both the full and reduced models.
We call these SSEfull and SSEreduced, respectively.
The number of degrees of freedom:
Full Model: n – p – 1
Reduced Model: n – k – 1.
28. If the full model is correct, then the error variance σ^2 is
well estimated by SSEfull/(n − p − 1).
Under the null hypothesis, the reduced model is also
correct, and the variance can then be estimated by
SSEreduced/(n − k − 1). In other words, approximately,
SSEfull ≈ (n − p − 1)σ^2
SSEreduced ≈ (n − k − 1)σ^2
The difference between these is:
SSEreduced − SSEfull ≈ (p − k)σ^2
So, if the null hypothesis is true, σ^2 can also be estimated
by:
(SSEreduced − SSEfull) / (p − k)
29. The test statistic is
f = [(SSEreduced − SSEfull) / (p − k)] / [SSEfull / (n − p − 1)]
If H0 is true, then f tends to be close to 1. If
H0 is false, then f tends to be larger.
The test statistic can be thought of as the
variance explained by the dropped terms
divided by our best estimate of the variance.
30. You fit the data from a central composite design in 3 factors with a full
quadratic equation.
The sum of the squared errors from the regression was:
SSEfull = 175, with ν = 10 residual degrees of freedom (s^2 = 17.50)
When factor X2 was dropped from the model (4 terms), the sum of squares
increased to:
SSEreduced = 323, with ν = 14 residual degrees of freedom (s^2 = 23.07)
Is X2 needed in the model?
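A worked sketch of this check, using the statistic from slide 29: f = ((323 − 175)/4) / (175/10) = 37/17.5 ≈ 2.11, which is below the 5% critical value of the F(4, 10) distribution (about 3.48), so the data do not demonstrate that X2 is needed.

```python
# F test for dropping the 4 X2 terms from the full quadratic model.
from scipy import stats

sse_full, df_full = 175.0, 10        # full model: SSE and residual dof
sse_reduced, df_reduced = 323.0, 14  # reduced model (X2 terms dropped)

dropped = df_reduced - df_full                                   # p - k = 4 terms
f = ((sse_reduced - sse_full) / dropped) / (sse_full / df_full)  # ≈ 2.11
pval = stats.f.sf(f, dropped, df_full)                           # from F(4, 10)
print(f"f = {f:.2f}, P = {pval:.3f}")
```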
31. This method is very useful for developing
parsimonious models by removing unnecessary
variables. However, the conditions under which it is
formally correct are rarely met.
More often, a large model is fit, some of the variables
are seen to have fairly large P-values, and the F test is
used to decide whether to drop them from the model.
It is often the case that there is no one “correct”
model. There are several models that fit equally well.
32. When there is little or no physical theory to rely on,
many different models will fit the data about
equally well.
The methods for choosing a model involve
statistics, whose values depend on the data.
Therefore, if the experiment is repeated, these
statistics will come out differently, and different
models may appear to be “best.”
Some or all of the independent variables in a
selected model may not really be related to the
dependent variable. Whenever possible,
experiments should be repeated to test these
apparent relationships.
Model selection is an art, not a science.
42. Your book also discusses
Best subsets regression
Stepwise regression
Includes forward selection and backward elimination
We won’t cover these methods in detail in this
class
43. Stepwise regression is the most widely used model selection technique.
Its main advantage over best subsets regression is
that it is less computationally intensive, so it can be
used in situations where there are a very large
number of candidate independent variables and too
many possible subsets for every one of them to be
examined.
The user chooses two threshold P-values, αin and αout,
with αin < αout.
The stepwise regression procedure begins with a step
called a forward selection step, in which the
independent variable with the smallest P-value is
selected, provided that P < αin.
This variable is entered in the model, creating a model
with a single independent variable.
44. In the next step, the remaining variables are examined
one at a time as candidates for the second variable in
the model. The one with the smallest P-value is added
to the model, again provided that P < αin.
Now, it is possible that adding the second variable to
the model increased the P-value of the first variable.
In the next step, called a backward elimination
step, the first variable is dropped from the model if its
P-value has grown to exceed the value αout.
The algorithm continues by alternating forward
selection steps with backward elimination steps.
The algorithm terminates when no variables meet the
criteria for being added to or dropped from the model.
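A minimal sketch of this alternation, assuming (hypothetically) a pandas DataFrame X of candidate variables and a response y; packaged implementations differ in their details:

```python
# Stepwise regression: alternate forward selection and backward elimination.
import pandas as pd
import statsmodels.api as sm

def term_pvalues(terms, X, y):
    """Fit OLS with an intercept; return the P-values of the given terms."""
    fit = sm.OLS(y, sm.add_constant(X[terms])).fit()
    return fit.pvalues.drop("const")

def stepwise(X, y, alpha_in=0.05, alpha_out=0.10):
    selected = []
    while True:
        changed = False
        # Forward selection: add the candidate with the smallest P-value,
        # provided that P < alpha_in.
        candidates = [c for c in X.columns if c not in selected]
        if candidates:
            trial = {c: term_pvalues(selected + [c], X, y)[c] for c in candidates}
            best = min(trial, key=trial.get)
            if trial[best] < alpha_in:
                selected.append(best)
                changed = True
        # Backward elimination: drop a variable whose P-value has grown
        # to exceed alpha_out.
        if selected:
            pvals = term_pvalues(selected, X, y)
            worst = pvals.idxmax()
            if pvals[worst] > alpha_out:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected
```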