Quantitative Research Methods
Lecture 9 Model Building
1. Regression Diagnostics I
2. Regression Diagnostics II: Multicollinearity
3. Regression Diagnostics III: Time Series
4. Polynomial Models
5. Nominal Variables in Multiple Regression
6. Stepwise Multiple Regression
Statistical analyses
• Group differences (nominal variable) on one interval variable:
▫ T-tests (2 groups)
▫ ANOVA (3 or more groups)
– One factor: one-way ANOVA
– Two factors: two-way ANOVA
• The relationship between two nominal variables:
▫ Chi-square test
• The relationship between two interval variables:
▫ Correlation, simple linear regression
• The relationship of multiple interval variables to one interval variable:
▫ Multiple regression
• The relationship of multiple interval variables to one nominal variable (yes/no):
▫ Logistic regression
Regression
• Simple Linear Regression (interval)
▫ one independent, one dependent
• Multiple Regression (all interval)
▫ multiple independent, one dependent
• Logistic Regression
▫ multiple interval independent, one nominal dependent (Yes/No)
▫ Check example: https://youtu.be/H_48AcV0qlY
16.4
Simple Linear Regression Model…
A straight-line model with one independent variable is called a simple linear regression model. It is written as:

y = β₀ + β₁x + ε

where y is the dependent variable, x is the independent variable, β₀ is the y-intercept, β₁ is the slope of the line, and ε is the error variable.
16.5
Simple Linear Regression Modelā€¦
Note that both β₀ and β₁ are population parameters which are usually unknown and hence estimated from the data.
[Figure: the regression line plotted in the (x, y) plane, with β₁ = slope (rise/run) and β₀ = y-intercept]
16.6
Estimating the Coefficientsā€¦
In much the same way we base estimates of µ on x̄, we estimate β₀ using b₀ and β₁ using b₁, the y-intercept and slope (respectively) of the least squares or regression line given by:

ŷ = b₀ + b₁x

(Recall: this is an application of the least squares method, and it produces the straight line that minimizes the sum of the squared differences between the points and the line.)
16.7
Least Squares Lineā€¦
[Figure: data points and the fitted line; the vertical differences between the points and the line are called residuals]
Example 16.1
16.8
Example 16.2ā€¦
Car dealers across North America use the "Red Book" to
help them determine the value of used cars that their
customers trade in when purchasing new cars.
The book, which is published monthly, lists the trade-in
values for all basic models of cars.
It provides alternative values for each car model according
to its condition and optional features.
The values are determined on the basis of the average paid
at recent used-car auctions, the source of supply for many
used-car dealers.
16.9
Example 16.2ā€¦
However, the Red Book does not indicate the value
determined by the odometer reading, despite the fact that a
critical factor for used-car buyers is how far the car has
been driven.
To examine this issue, a used-car dealer randomly selected 100 three-year-old Toyota Camrys that were sold at auction during the past month.
The dealer recorded the price (in $1,000s) and the number of miles (in thousands) on the odometer. (Xm16-02)
The dealer wants to find the regression line.
16.10
Using SPSS: Simple Linear Regression
SPSS steps: Analyze > Regression > Linear
16.11
SPSS output: check three tables
▫ Model Summary: R², the strength of the linear relationship
▫ ANOVA: model significance/fit
▫ Coefficients: b₀ (intercept) and b₁ (slope)
16.12
Example 16.2ā€¦
As you might expect with used cars…
The slope coefficient, b₁, is –0.0669; that is, each additional mile on the odometer decreases the price by $0.0669, or 6.69¢.
The intercept, b₀, is 17.250 (i.e., $17,250). One interpretation would be that when x = 0 (no miles on the car) the selling price is $17,250. However, we have no data for cars with fewer than 19,100 miles on them, so this isn't a correct assessment.
16.13
Testing the Slopeā€¦
If no linear relationship exists between the two
variables, we would expect the regression line to be
horizontal, that is, to have a slope of zero.
We want to see if there is a linear relationship, i.e. we want to see if the slope (β₁) is something other than zero. Our research hypothesis becomes:
H₁: β₁ ≠ 0
Thus the null hypothesis becomes:
H₀: β₁ = 0
16.14
Coefficient of Determinationā€¦
Tests thus far have shown if a linear relationship
exists; it is also useful to measure the strength
of the relationship. This is done by calculating
the coefficient of determination, R².
The coefficient of determination is the square of the coefficient of correlation (r), hence R² = r².
16.15
Coefficient of Determinationā€¦
As we did with analysis of variance, we can partition
the variation in y into two parts:
Variation in y = SSE + SSR
SSE (Sum of Squares Error) measures the amount of variation in y that remains unexplained (i.e. due to error).
SSR (Sum of Squares Regression) measures the amount of variation in y explained by variation in the independent variable x.
Thus R² = SSR / (SSR + SSE) = 1 − SSE / (total variation in y).
16.16
Coefficient of Determination
R² has a value of .6483. This means 64.83% of the variation in the auction selling prices (y) is explained by the variation in the odometer readings (x). The remaining 35.17% is unexplained, i.e. due to error.
Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions.
In general, the higher the value of R², the better the model fits the data.
R² = 1: perfect match between the line and the data points.
R² = 0: there is no linear relationship between x and y.
16.17
Using the Regression Equationā€¦
We could use our regression equation:
ŷ = 17.250 − .0669x
to predict the selling price of a car with 40 (thousand) miles on it:
ŷ = 17.250 − .0669(40) = 14.574
We call this value ($14,574) a point prediction. Chances are though the actual selling price will be different, hence we can estimate the selling price in terms of an interval.
16.18
Prediction Interval
The prediction interval is used when we want to predict one particular value of the dependent variable, given a specific value of the independent variable:

ŷ ± t_{α/2, n−2} · s_ε · √(1 + 1/n + (x_g − x̄)² / ((n−1)s_x²))

(x_g is the given value of x we're interested in)
16.19
Confidence Interval Estimatorā€¦
…of the expected value of y. In this case, we are estimating the mean of y given a value of x:

ŷ ± t_{α/2, n−2} · s_ε · √(1/n + (x_g − x̄)² / ((n−1)s_x²))

(Technically this formula is used for infinitely large populations. However, we can interpret our problem as attempting to determine the average selling price of all Toyota Camrys, all with 40,000 miles on the odometer.)
16.20
What's the Difference?
The two formulas differ only in the extra "1" under the square root of the prediction interval; that extra term makes it wider.
The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value than in predicting an individual value.
Prediction interval: used to estimate one particular value of y (at a given x).
Confidence interval: used to estimate the mean value of y (at a given x).
16.21
Intervals with SPSS
Output
16.23
Regression Diagnosticsā€¦
There are three conditions that are required in order to
perform a regression analysis. These are:
ā€¢ The error variable must be normally distributed,
ā€¢ The error variable must have a constant variance,
ā€¢ The errors must be independent of each other.
How can we diagnose violations of these conditions?
→ Residual analysis: examine the differences between the actual data points and those predicted by the linear equation…
16.24
Nonnormalityā€¦
We can take the residuals and put them into a histogram to visually check for normality…
…we're looking for a bell-shaped histogram with the mean close to zero. ✓
SPSS: Regression > Linear > Save > check Residuals: unstandardized & standardized
SPSS test of normality: Analyze > Descriptive Statistics > Explore > Plots
Example 16.2
16.28
Heteroscedasticityā€¦
When the requirement of a constant variance is violated,
we have a condition of heteroscedasticity.
We can diagnose heteroscedasticity by plotting the
residual against the predicted y.
16.29
Heteroscedasticity…
If the variance of the error variable (σ²ε) is not constant, then we have "heteroscedasticity". Here's the plot of the residual against the predicted value of y:
There doesn't appear to be a change in the spread of the plotted points, therefore no heteroscedasticity. ✓
SPSS: Regression > Linear > Save > check Predicted Values & Residuals
SPSS: Graphs > Scatter > y: Residual; x: Predicted Price
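The equivalent plot in Python, continuing the earlier fit (a funnel shape suggests heteroscedasticity; a patternless horizontal band suggests constant variance):

```python
# Sketch: residuals vs. predicted values, the standard heteroscedasticity check.
import matplotlib.pyplot as plt

plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linewidth=1)
plt.xlabel("Predicted Price"); plt.ylabel("Residual")
plt.show()
```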
16.32
Nonindependence of the Error Variable
If we were to observe the auction price of cars every week
for, say, a year, that would constitute a time series.
When the data are time series, the errors often are
correlated.
Error terms that are correlated over time are said to be
autocorrelated or serially correlated.
We can often detect autocorrelation by graphing the
residuals against the time periods. If a pattern
emerges, it is likely that the independence requirement is
violated.
16.33
Nonindependence of the Error Variable
Patterns in the appearance of the residuals over time indicate that autocorrelation exists:
[Plot 1: note the runs of positive residuals, replaced by runs of negative residuals]
[Plot 2: note the oscillating behavior of the residuals around zero]
The Durbin-Watson test is one way to test for autocorrelation.
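statsmodels exposes the same statistic directly; a sketch continuing the earlier fit (values near 2 suggest no first-order autocorrelation; the formal decision rule appears later in the deck):

```python
# Sketch: Durbin-Watson statistic computed from the residuals.
from statsmodels.stats.stattools import durbin_watson

d = durbin_watson(model.resid)
print(f"Durbin-Watson d = {d:.3f}")
```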
16.34
Outliersā€¦
An outlier is an observation that is unusually
small or unusually large.
E.g. our used car example had odometer readings from 19.1 to 49.2 thousand miles. Suppose we have a value of only 5,000 miles (a car driven by an old person only on Sundays ☺); this point is an outlier.
16.35
Outliersā€¦
Possible reasons for the existence of outliers include:
ā–« There was an error in recording the value
ā–« The point should not have been included in the sample
ā–« Perhaps the observation is indeed valid.
Outliers can be easily identified from a scatter plot.
If the absolute value of the standardized residual is > 2, we suspect the point may be an outlier and investigate further.
They need to be dealt with since they can easily influence the least squares line…
Example 16.2
SPSS: Graphs > Scatter > x: Odometer; y: Price
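The standardized-residual rule can be applied programmatically. A sketch continuing the earlier fit (assumes no rows were dropped, so the residuals align with the data frame):

```python
# Sketch: flag observations whose standardized residual exceeds 2 in magnitude.
influence = model.get_influence()
std_resid = influence.resid_studentized_internal

suspects = abs(std_resid) > 2
print(df[suspects])    # candidate outliers to investigate further
```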
Procedure for Regression Diagnostics
1. Develop a model that has a theoretical basis; that
is, for the dependent variable in question, find an
independent variable that you believe is linearly
related to it.
2. Gather data for the two variables.
3. Draw the scatter diagram to determine whether a
linear model appears to be appropriate. Identify
possible outliers.
4. Determine the regression equation.
5. Calculate the residuals and check the required conditions (see the diagnostics slides).
6. Assess the model's fit (see the SPSS output slides).
7. If the model fits the data, use the regression
equation to predict a particular value of the
dependent variable or estimate its mean (or both)
From simple linear regression to
multiple regression
• Simple linear regression
[Diagram: Education → Income]
17.40
Multiple Regressionā€¦
The simple linear regression model was used to
analyze how one interval variable (the dependent
variable y) is related to one other interval variable (the
independent variable x).
Multiple regression allows for any number of
independent variables.
We expect to develop models that fit the data better
than would a simple linear regression model.
Multiple regression
[Diagram: Variables A, B, and C → Variable D]
Multiple regression
[Diagram: Age, Education, Number of family members earning money, Number of children, Years with current employer, Occupational prestige score, Work hours → Income]
Example: GSS2008
• How is income affected by
▫ Age (AGE)
▫ Education (EDUC)
▫ Work hours (HRS)
▫ Spouse work hours (SPHRS)
▫ Occupational prestige score (PRESTG80)
▫ Number of children (CHILDS)
▫ Number of family members earning money (EARNRS)
▫ Years with current employer (CUREMPYR)
17.44
The Modelā€¦
We now assume we have k independent variables potentially related to the one dependent variable. This relationship is represented in this first-order linear equation:

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + ε

where y is the dependent variable, x₁, …, x_k are the independent variables, β₀, …, β_k are the coefficients, and ε is the error variable.
In the one-variable, two-dimensional case we drew a regression line; here we imagine a response surface.
17.45
Estimating the Coefficientsā€¦
The sample regression equation is expressed as:

ŷ = b₀ + b₁x₁ + b₂x₂ + ⋯ + b_k x_k

We will use computer output to:
Assess the model…
How well does it fit the data?
Is it useful?
Are any required conditions violated?
Employ the model…
Interpreting the coefficients
Predictions using the regression model
17.46
Regression Analysis Stepsā€¦
1. Use a computer and software to generate the coefficients and the statistics used to assess the model.
2. Diagnose violations of required conditions. If there are problems, attempt to remedy them.
3. Assess the model's fit:
coefficient of determination,
F-test of the analysis of variance.
4. If steps 1–3 are OK, use the model for prediction.
17.47
Transformationā€¦
Can we transform this data into a mathematical model that looks like this:
income = f(education, age, years with current employer, …), i.e. a first-order linear equation in the predictors?
17.48
Using SPSS
• Analyze > Regression > Linear
Using SPSS
• Dependent / Independent
Output
The mathematical model:
ŷ = −51785.243 + 460.87x₁ + 4100.9x₂ + ⋯ + 329.771x₈
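The same multiple regression can be fitted with the statsmodels formula API. A sketch using the variable names from the slides (the file name gss2008.csv is an assumption):

```python
# Sketch: the GSS-style multiple regression in Python.
import pandas as pd
import statsmodels.formula.api as smf

gss = pd.read_csv("gss2008.csv")   # hypothetical export of the GSS2008 data
fit = smf.ols(
    "INCOME ~ AGE + EDUC + HRS + SPHRS + PRESTG80 + CHILDS + EARNRS + CUREMPYR",
    data=gss,
).fit()
print(fit.params)                  # b0 ... b8, as in the equation above
```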
17.52
The Modelā€¦
Although we haven't done any assessment of the model yet, at first pass:
ŷ = −51785.243 + 460.87x₁ + 4100.9x₂ + 620x₃ − 862.201x₄ + ⋯ + 329.771x₈
it suggests that increases in AGE, EDUC, HRS, PRESTG80, EARNRS, and CUREMPYR will positively impact income.
Likewise, increases in SPHRS and CHILDS will negatively impact income…
INTERPRET
17.53
Model Assessmentā€¦
We will assess the model in two ways:
Coefficient of determination, and
F-test of the analysis of variance.
17.54
Coefficient of Determinationā€¦
• Again, the coefficient of determination is defined as:

R² = 1 − SSE / Σ(yᵢ − ȳ)²

Here R² = .337: 33.7% of the variation in income is explained by the eight independent variables, but 66.3% remains unexplained.
17.55
Adjusted R2 valueā€¦
The "adjusted" R² is the coefficient of determination adjusted for the number of explanatory variables. It takes into account the sample size n and k, the number of independent variables, and is given by:

Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]
17.56
Testing the Validity of the Modelā€¦
In a multiple regression model (i.e. more than one
independent variable), we utilize an analysis of
variance technique to test the overall validity of the
model. Hereā€™s the idea:
H₀: β₁ = β₂ = ⋯ = β_k = 0
H₁: At least one βᵢ is not equal to zero.
If the null hypothesis is true, none of the independent variables is linearly related to y, and so the model is invalid.
If at least one βᵢ is not equal to 0, the model does have some validity.
17.57
Testing the Validity of the Modelā€¦
ANOVA table for regression analysis…

Source of Variation | degrees of freedom | Sums of Squares | Mean Squares       | F-Statistic
Regression          | k                  | SSR             | MSR = SSR/k        | F = MSR/MSE
Error               | n − k − 1          | SSE             | MSE = SSE/(n−k−1)  |
Total               | n − 1              |                 |                    |
A large value of F indicates that most of the variation in y is explained by
the regression equation and that the model is valid. A small value of F
indicates that most of the variation in y is unexplained.
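The pieces of this table are available from a fitted statsmodels model; a sketch continuing the GSS fit above (note the library's naming: ess is the explained sum of squares, i.e. SSR in the deck's notation, and ssr is the residual sum of squares, i.e. SSE):

```python
# Sketch: ANOVA-table quantities from the fitted model.
print(fit.fvalue, fit.f_pvalue)    # F = MSR/MSE and its p-value
print(fit.df_model, fit.df_resid)  # k and n - k - 1
print(fit.ess, fit.ssr)            # SSR (explained) and SSE (residual)
```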
Testing the Validity of the Modelā€¦
p < .05: at least one βᵢ is not 0.
Reject H₀, accept H₁:
the model is valid.
17.59
Interpreting the Coefficients*
Intercept (b₀ = −51785.243): this is the average income when all of the independent variables are zero. It's meaningless to try to interpret this value, particularly if 0 is outside the range of the values of the independent variables (as is the case here).
Age (b₁ = 460.87): each additional year of age increases income by $460.87.
Education (b₂ = 4100.9): for each additional year of education, annual income increases by $4,100.90.
Hours of work (b₃ = 620): for each additional hour of work per week, annual income increases by $620.
*In each case we assume all other variables are held constant…
17.60
Interpreting the Coefficients*
Spouse hours of work (b₄ = −862.201): for each additional hour the spouse works per week, average annual income decreases by $862.20.
Occupational prestige score (b₅ = 641): for each additional unit of score, average annual income increases by $641.
Number of children (b₆ = −331): for each additional child, average income decreases by $331.
Number of family members earning money (b₇ = 687): for each additional family member earning money, income increases by $687.
Number of years with current employer (b₈ = 330): for each additional year with the current employer, income increases by $330.
*In each case we assume all other variables are held constant…
17.61
Testing the Coefficientsā€¦
For each independent variable, we can test to
determine whether there is enough evidence of a linear
relationship between it and the dependent variable for
the entire populationā€¦
H₀: βᵢ = 0
H₁: βᵢ ≠ 0
(for i = 1, 2, …, k), using:

t = (bᵢ − βᵢ) / s_{bᵢ}

as our test statistic (with n − k − 1 degrees of freedom).
17.62
Testing the Coefficients
We can use SPSS output to quickly test each of the 8 coefficients in our model…
Thus, EDUC, HRS, SPHRS, and PRESTG80 are linearly related to income. There is no evidence to infer that AGE, CHILDS, EARNRS, and CUREMPYR are linearly related to income.
17.63
Using the Regression Equation
Much like we did with simple linear regression, we
can produce a prediction interval for a
particular value of y.
As well, we can produce the confidence interval
estimate of the expected value of y.
17.64
Using the Regression Equation
Exercise GSS2008:
Add one row (our given values for the independent variables) to the bottom of the data set, then produce
▫ the prediction interval
▫ the confidence interval estimate
for the dependent variable y.
17.65
Regression Diagnostics I
Exercise GSS2008
• Calculate the residuals and check the following:
▫ Is the error variable nonnormal? Perform a normality test.
• Is the error variance constant?
▫ Plot the residuals versus the predicted values of y.
• Are the errors independent (time-series data)?
▫ Plot the residuals versus the time periods.
• Are there observations that are inaccurate or do not belong to the target population?
▫ Double-check the accuracy of outliers and influential observations.
17.66
Regression Diagnostics II
• Multiple regression models have a problem that simple regressions do not, namely multicollinearity.
• It happens when the independent variables are highly correlated.
• We'll explore this concept through the following example…
17.67
Example GSS2008
• AGE and CUREMPYR are not significant predictors of INCOME in the multiple regression model, but when we run correlations between AGE and INCOME and between CUREMPYR and INCOME, both are significant.
• How do we account for this apparent contradiction?
• The answer is that AGE and CUREMPYR are correlated with each other and with the other independent variables!
• This is the problem of multicollinearity.
Multiple Regression Output
How to deal with multicollinearity
problem
• Multicollinearity exists in virtually all multiple regression models.
• To minimize its effect:
▫ Try to include independent variables that are independent of each other.
▫ Develop a model that has a theoretical basis and include only the IVs that are necessary.
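A common numeric check for multicollinearity, not shown in the slides, is the variance inflation factor (VIF); values above roughly 5–10 are often taken as warning signs. A sketch continuing the GSS fit above:

```python
# Sketch: variance inflation factors for the GSS predictors.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(gss[["AGE", "EDUC", "HRS", "SPHRS", "PRESTG80",
                         "CHILDS", "EARNRS", "CUREMPYR"]].dropna())
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```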
17.71
Regression Diagnostics III ā€“ Time Series
• The Durbin-Watson test allows us to determine whether there is evidence of first-order autocorrelation: a condition in which a relationship exists between consecutive residuals e_{i−1} and e_i (i is the time period). The statistic for this test is defined as:

d = Σᵢ₌₂ⁿ (eᵢ − eᵢ₋₁)² / Σᵢ₌₁ⁿ eᵢ²

• d has a range of values: 0 ≤ d ≤ 4.
17.72
Durbinā€“Watson (two-tail test)
• To test for first-order autocorrelation:
• If d < dL or d > 4 − dL, first-order autocorrelation exists.
• If d falls between dL and dU, or between 4 − dU and 4 − dL, the test is inconclusive.
• If d falls between dU and 4 − dU, there is no evidence of first-order autocorrelation.

[Number line from 0 to 4, with boundaries at dL, dU, 2, 4 − dU, 4 − dL: exists | inconclusive | doesn't exist | inconclusive | exists]
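The two-tail rule above translates directly into code; a small sketch (dL and dU still come from the Durbin-Watson tables):

```python
# Sketch: the two-tail Durbin-Watson decision rule from the slide.
def dw_decision(d: float, dL: float, dU: float) -> str:
    if d < dL or d > 4 - dL:
        return "first-order autocorrelation exists"
    if dL <= d <= dU or 4 - dU <= d <= 4 - dL:
        return "test is inconclusive"
    return "no evidence of first-order autocorrelation"

print(dw_decision(0.593, dL=1.10, dU=1.54))  # Example 17.1: exists
```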
17.73
Example 17.1 Xm17-01
Can we create a model that will predict lift ticket
sales at a ski hill based on two weather
parameters?
Variables:
y - lift ticket sales during Christmas week,
x1 - total snowfall (inches), and
x2 - average temperature (degrees Fahrenheit)
Our ski hill manager collected 20 years of data.
17.74
Example 17.1
Both the coefficient of determination
and the p-value of the F-test indicate
the model is poorā€¦
Neither variable is linearly related to ticket sales…
17.75
Example 17.1
• The histogram of residuals…
• …reveals the errors may be normally distributed…
17.76
Example 17.1
• In the plot of residuals versus predicted values (testing for heteroscedasticity), the error variance appears to be constant…
17.77
Example 17.1 Durbin-Watson
• Apply the Durbin-Watson statistic to the entire list of residuals.
• Regression > Linear > Statistics > check Durbin-Watson
17.78
Example 17.1
To test for first-order autocorrelation with α = .05, we find in Table 8(a) in Appendix B
dL = 1.10 and dU = 1.54
The null and alternative hypotheses are
H0 : There is no first-order autocorrelation.
H1 : There is first-order autocorrelation.
The rejection region includes d < dL = 1.10. Since d =
.593, we reject the null hypothesis and conclude that
there is enough evidence to infer that first-order
autocorrelation exists.
17.79
Example 17.1
Autocorrelation usually indicates that the model needs to
include an independent variable that has a time-ordered
effect on the dependent variable.
The simplest such independent variable represents the
time periods. We included a third independent variable
that records the number of years since the year the data
were gathered. Thus, x₃ = 1, 2, …, 20. The new model is
y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + ε
17.80
Example 17.1
The fit of the model is high;
the model is valid…
Snowfall and time are linearly related to ticket sales; temperature is not…
(x₃ is our new variable)
dL = 1.10 and dU = 1.54
Since dU < d < 4 − dU, first-order autocorrelation doesn't exist.
17.81
Example 17.1
• The Durbin-Watson statistic computed from the residuals of our new regression analysis is 1.885.
• We can conclude that there is not enough evidence to infer the presence of first-order autocorrelation. (Determining dL is left as an exercise for the reader…)
• Hence, we have improved our model dramatically!
17.82
Example 17.1
Notice that the model is improved dramatically.
The F-test tells us that the model is valid. The t-tests tell us that
both the amount of snowfall and time are significantly linearly
related to the number of lift tickets.
This information could prove useful in advertising for the resort.
For example, if there has been a recent snowfall, the resort could emphasize that in its advertising.
If no new snow has fallen, it may emphasize its snow-making facilities.
18.83
Model Selection
Regression analysis can also be used for:
• non-linear (polynomial) models, and
• models that include nominal independent variables.
18.84
Polynomial Models
Previously we looked at this multiple regression model:

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_k x_k + ε

(It's considered linear or first-order since the exponent on each of the xᵢ's is 1.)
The independent variables may be functions of a smaller number of predictor variables; polynomial models fall into this category. If there is one predictor variable (x) we have:
18.85
Polynomial Models
① y = β₀ + β₁x + β₂x² + ⋯ + β_p x^p + ε
② y = β₀ + β₁x₁ + β₂x₂ + ⋯ + β_p x_p + ε

Technically, equation ② is a multiple regression model with p independent variables (x₁, x₂, …, x_p). Since x₁ = x, x₂ = x², x₃ = x³, …, x_p = x^p, it's based on one predictor variable (x).
p is the order of the equation; we'll focus on equations of order p = 1, 2, and 3.
18.86
First Order Model
When p = 1, we have our simple linear regression model:

y = β₀ + β₁x + ε

That is, we believe there is a straight-line relationship between the dependent and independent variables over the range of the values of x.
18.87
Second Order Model
When p = 2, the polynomial model is a parabola:

y = β₀ + β₁x + β₂x² + ε
18.88
Third Order Model
When p = 3, our third-order model looks like:

y = β₀ + β₁x + β₂x² + β₃x³ + ε
18.89
Polynomial Models: 2 Predictor
Variables
Perhaps we suspect that there are two predictor variables (x₁ & x₂) which influence the dependent variable:
First-order model (no interaction): y = β₀ + β₁x₁ + β₂x₂ + ε
First-order model (with interaction): y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε
18.90
Polynomial Models: 2 Predictor Variables
First order models, 2 predictors, without & with interaction:
18.91
Polynomial Models: 2 Predictor Variables
If we believe that a quadratic relationship exists between y and each of x₁ and x₂, and that the predictor variables interact in their effect on y, we can use this model:
Second-order model (in two variables) WITH interaction:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε
18.92
Polynomial Models: 2 Predictor
Variables
2nd order models, 2 predictors, without & with interaction:
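These models are easy to express with formula syntax. A sketch using statsmodels/patsy, where a data frame with columns y, x1, and x2 is assumed (I() creates squared terms and ":" the interaction term):

```python
# Sketch: second-order models in two predictors, without and with interaction.
import statsmodels.formula.api as smf

no_inter   = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2)", data=df).fit()
with_inter = smf.ols("y ~ x1 + x2 + I(x1**2) + I(x2**2) + x1:x2", data=df).fit()
print(with_inter.summary())
```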
18.93
Selecting a Model
One predictor variable, or two (or more)?
First order? Second order? Higher order?
With interaction? Without?
How do we choose the right model?
Use our knowledge of the variables involved to
build an initial model.
Test that model using statistical techniques.
If required, modify our model and re-testā€¦
18.94
Example 18.1
Weā€™ve been asked to come up with a regression model
for a fast food restaurant. We know our primary
market is middle-income adults and their children,
particularly those between the ages of 5 and 12.
Dependent variable: restaurant revenue (gross or net)
Predictor variables: family income, age of children
Is the relationship first order? Quadratic?…
18.95
Example 18.1
The relationship between the dependent variable (revenue)
and each predictor variable is probably quadratic.
Members of low- or high-income households are less likely to eat at this chain's restaurants, since the restaurants attract mostly middle-income customers.
Neighborhoods where the mean age of children is either quite low or quite high are also less likely to eat there than families with children in the 5-to-12-year range.
Seems reasonable?
18.96
Example 18.1
Should we include the interaction term in our model? When in doubt, it is probably best to include it.
Our model, then, is:

y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε

where y = annual gross sales
x₁ = median annual household income*
x₂ = mean age of children*
*in the neighborhood
18.97
Example 18.2 Xm18-02
Our fast food restaurant research department
selected 25 locations at random and gathered data
on revenues, household income, and ages of
neighborhood children.
[Tables: the collected data (revenue, income, age) and the calculated squared and interaction terms]
18.98
Example 18.2
You can take the original data collected (revenues,
household income, and age) and plot y vs. x1 and y
vs. x2 to get a feel for the data; trend lines were
added for clarityā€¦
18.99
Example 18.2
Checking the regression tool's output…
The model fits the data well and it's valid…
INTERPRET
18.100
Nominal Independent Variables
Thus far in our regression analysis, we've only considered variables that are interval. Often, however, we need to consider nominal data in our analysis.
For example, our earlier example regarding the market for used cars focused only on mileage. Perhaps color is an important factor. How can we model this new variable?
18.101
Indicator Variables
An indicator variable (also called a dummy
variable) is a variable that can assume either one
of only two values (usually 0 and 1).
A value of 1 usually indicates the existence of a certain
condition, while a value of 0 usually indicates that the
condition does not hold.
I₁ = 1 if color is white; 0 if color is not white
I₂ = 1 if color is silver; 0 if color is not silver

Car Color | I₁ | I₂
white     | 1  | 0
silver    | 0  | 1
other     | 0  | 0
two-tone! | 1  | 1

To represent m categories… we need m − 1 indicator variables.
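pandas can build the m − 1 indicator columns automatically. A small sketch with made-up data (drop_first=True drops the alphabetically first category, here "other", which becomes the baseline):

```python
# Sketch: building indicator (dummy) variables with pandas.
import pandas as pd

cars = pd.DataFrame({"Color": ["white", "silver", "other", "white"]})
dummies = pd.get_dummies(cars["Color"], prefix="I", drop_first=True)
print(dummies)   # I_silver and I_white; "other" (dropped) is the baseline
```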
18.102
Interpreting Indicator Variable Coefficients
After performing our regression analysis, we have a regression equation of the form (price in $1,000s):

ŷ = b₀ + b₁x + 0.0911I₁ + 0.3304I₂ (with b₁ < 0)

Thus, the price diminishes with additional mileage (x);
a white car sells for $91.10 more than other colors (I₁);
a silver car fetches $330.40 more than other colors (I₂).
18.103
Graphically
18.104
Testing the Coefficients
To test the coefficient of I1, we use these
hypothesesā€¦
H₀: β₂ = 0
H₁: β₂ ≠ 0
(β₂ is the coefficient of I₁)
There is insufficient evidence to infer that, in the population, 3-year-old white Tauruses with the same odometer reading have a different selling price than Tauruses in the "other" color category…
18.105
Testing the Coefficients
To test the coefficient of I2, we use these
hypothesesā€¦
H₀: β₃ = 0
H₁: β₃ ≠ 0
(β₃ is the coefficient of I₂)
We can conclude that there are differences in auction selling prices between 3-year-old silver-colored Tauruses and the "other" color category with the same odometer readings.
Stepwise Regression
• Stepwise regression is an iterative procedure that adds and deletes one independent variable at a time. The decision to add or delete a variable is made on the basis of whether that variable improves the model.
• It is a procedure that can eliminate correlated independent variables.
Step 1: run a simultaneous regression and rank all the significant variables.
[Output: the significant predictors ranked No. 1 through No. 4]
Step 2
• Analyze →
• Regression →
• Linear →
• Stepwise →
• Dependent variable →
• Independent variables (1st round: the top predictor; 2nd round: the top predictor & the 2nd top predictor… until the nth round; n = number of predictors)
• Statistics →
• R square change & Descriptives
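SPSS automates this; for intuition, here is a sketch of one flavour of the idea, a simple forward-selection loop on p-values (SPSS's stepwise procedure also allows variables to be removed again). The function and threshold are illustrative, not SPSS's exact algorithm:

```python
# Sketch: forward selection, adding the best predictor each round.
import statsmodels.formula.api as smf

def forward_select(data, response, candidates, alpha=0.05):
    candidates = list(candidates)       # copy: we remove chosen predictors
    chosen = []
    while candidates:
        # p-value of each candidate when added to the current model
        pvals = {}
        for c in candidates:
            rhs = " + ".join(chosen + [c])
            fit = smf.ols(f"{response} ~ {rhs}", data=data).fit()
            pvals[c] = fit.pvalues[c]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:        # no remaining variable helps: stop
            break
        chosen.append(best)
        candidates.remove(best)
    return chosen

# e.g. forward_select(gss, "INCOME", ["AGE", "EDUC", "HRS", "SPHRS"])
```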
• Stepwise output
• What to read?
• R², R² change, F of R² change, and the significance level of F of R² change in each round
Stepwise output
• The regression equation
• Simultaneous: ŷ = −51785.243 + 460.87 AGE + 4100.9 EDUC + 620 HRS − 862.201 SPHRS + … + 329.771 CUREMPYR
• Stepwise: ŷ = −44703.12 + 3944.7 EDUC − 617.37 SPHRS + 526.493 PRESTG80 + 956.933 HRS
Multiple regression
• Multiple regression examines the predictability of a set of predictors on a dependent variable (criterion).
• Why don't we just throw in all the predictors and let the MR determine which ones are good predictors, then?
• Reason 1: theoretical considerations
• Reason 2: concern about sample size
Concern about sample size
• The desired level is 20 observations for each independent variable.
• For instance, if you have 6 predictors, you've got to have at least 120 subjects in your data.
• However, if a stepwise procedure is employed, the recommended level increases to 50 to 1.
• That is, you've got to have at least 300 subjects in order to run stepwise MR.
18.114
Model Building
Here is a procedure for building a regression model:
1. Identify the dependent variable; what is it we wish to predict? Don't forget the variable's unit of measure.
2. List potential predictors; how would changes in predictors change the dependent variable? Be selective; go with the fewest independent variables required. Be aware of the effects of multicollinearity.
3. Gather the data; at least six observations for each independent variable used in the equation.
18.115
Model Building
4. Identify several possible models; formulate first- and second-order models with and without interaction. Draw scatter diagrams.
5. Use statistical software to estimate the models.
6. Determine whether the required conditions are satisfied; if not, attempt to correct the problem.
7. Use your judgment and the statistical output to select the best model!

More Related Content

What's hot

Linear Regression | Machine Learning | Data Science
Linear Regression | Machine Learning | Data ScienceLinear Regression | Machine Learning | Data Science
Linear Regression | Machine Learning | Data ScienceSumit Pandey
Ā 
Regression Analysis
Regression AnalysisRegression Analysis
Regression AnalysisMuhammad Fazeel
Ā 
Correlation Statistics
Correlation StatisticsCorrelation Statistics
Correlation Statisticstahmid rashid
Ā 
Regression analysis in excel
Regression analysis in excelRegression analysis in excel
Regression analysis in excelThilina Rathnayaka
Ā 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepDan Wellisch
Ā 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regressiondessybudiyanti
Ā 
Simple regression and correlation
Simple regression and correlationSimple regression and correlation
Simple regression and correlationMary Grace
Ā 
Correlation & Regression
Correlation & RegressionCorrelation & Regression
Correlation & RegressionGrant Heller
Ā 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regressionAbdelaziz Tayoun
Ā 
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierRegression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierAl Arizmendez
Ā 
Chapter 16: Correlation (enhanced by VisualBee)
Chapter 16: Correlation  
(enhanced by VisualBee)Chapter 16: Correlation  
(enhanced by VisualBee)
Chapter 16: Correlation (enhanced by VisualBee)nunngera
Ā 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysisShiela Vinarao
Ā 
Correlation and Regression ppt
Correlation and Regression pptCorrelation and Regression ppt
Correlation and Regression pptSantosh Bhaskar
Ā 
multiple regression
multiple regressionmultiple regression
multiple regressionPriya Sharma
Ā 
Statr session 23 and 24
Statr session 23 and 24Statr session 23 and 24
Statr session 23 and 24Ruru Chowdhury
Ā 

What's hot (20)

Linear Regression | Machine Learning | Data Science
Linear Regression | Machine Learning | Data ScienceLinear Regression | Machine Learning | Data Science
Linear Regression | Machine Learning | Data Science
Ā 
Regression
RegressionRegression
Regression
Ā 
Regression Analysis
Regression AnalysisRegression Analysis
Regression Analysis
Ā 
Correlation Statistics
Correlation StatisticsCorrelation Statistics
Correlation Statistics
Ā 
Regression analysis in excel
Regression analysis in excelRegression analysis in excel
Regression analysis in excel
Ā 
Simple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-StepSimple Linear Regression: Step-By-Step
Simple Linear Regression: Step-By-Step
Ā 
Simple Linier Regression
Simple Linier RegressionSimple Linier Regression
Simple Linier Regression
Ā 
Simple regression and correlation
Simple regression and correlationSimple regression and correlation
Simple regression and correlation
Ā 
Correlation & Regression
Correlation & RegressionCorrelation & Regression
Correlation & Regression
Ā 
Correlation continued
Correlation continuedCorrelation continued
Correlation continued
Ā 
Simple lin regress_inference
Simple lin regress_inferenceSimple lin regress_inference
Simple lin regress_inference
Ā 
Correlation 2
Correlation 2Correlation 2
Correlation 2
Ā 
Correlation and regression
Correlation and regressionCorrelation and regression
Correlation and regression
Ā 
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn LottierRegression Analysis presentation by Al Arizmendez and Cathryn Lottier
Regression Analysis presentation by Al Arizmendez and Cathryn Lottier
Ā 
Chapter 16: Correlation (enhanced by VisualBee)
Chapter 16: Correlation  
(enhanced by VisualBee)Chapter 16: Correlation  
(enhanced by VisualBee)
Chapter 16: Correlation (enhanced by VisualBee)
Ā 
Correlation Analysis
Correlation AnalysisCorrelation Analysis
Correlation Analysis
Ā 
Correlation analysis
Correlation analysisCorrelation analysis
Correlation analysis
Ā 
Correlation and Regression ppt
Correlation and Regression pptCorrelation and Regression ppt
Correlation and Regression ppt
Ā 
multiple regression
multiple regressionmultiple regression
multiple regression
Ā 
Statr session 23 and 24
Statr session 23 and 24Statr session 23 and 24
Statr session 23 and 24
Ā 

Similar to Quantitative Research Methods Lecture Regression Diagnostics

Simple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSimple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSoumyaBansal7
Ā 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdflisow86669
Ā 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regressionnszakir
Ā 
Chap013.ppt
Chap013.pptChap013.ppt
Chap013.pptnajwalyaa
Ā 
Business Analytics Foundation with R Tools Part 1
Business Analytics Foundation with R Tools Part 1Business Analytics Foundation with R Tools Part 1
Business Analytics Foundation with R Tools Part 1Beamsync
Ā 
Chapter05
Chapter05Chapter05
Chapter05rwmiller
Ā 
IPPTCh013.pptx
IPPTCh013.pptxIPPTCh013.pptx
IPPTCh013.pptxManoloTaquire
Ā 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptxarsh260174
Ā 
Regression Analysis Techniques.pptx
Regression Analysis Techniques.pptxRegression Analysis Techniques.pptx
Regression Analysis Techniques.pptxYutaItadori
Ā 
The 10 Algorithms Machine Learning Engineers Need to Know.pptx
The 10 Algorithms Machine Learning Engineers Need to Know.pptxThe 10 Algorithms Machine Learning Engineers Need to Know.pptx
The 10 Algorithms Machine Learning Engineers Need to Know.pptxChode Amarnath
Ā 
Distribution of EstimatesLinear Regression ModelAssume (yt,.docx
Distribution of EstimatesLinear Regression ModelAssume (yt,.docxDistribution of EstimatesLinear Regression ModelAssume (yt,.docx
Distribution of EstimatesLinear Regression ModelAssume (yt,.docxmadlynplamondon
Ā 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxbudbarber38650
Ā 
Econometrics project
Econometrics projectEconometrics project
Econometrics projectShubham Joon
Ā 
10 Nonparamatric statistics
10 Nonparamatric statistics10 Nonparamatric statistics
10 Nonparamatric statisticsPenny Jiang
Ā 
Module 3_ Classification.pptx
Module 3_ Classification.pptxModule 3_ Classification.pptx
Module 3_ Classification.pptxnikshaikh786
Ā 

Similar to Quantitative Research Methods Lecture Regression Diagnostics (20)

Simple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptxSimple Regression Analysis ch12.pptx
Simple Regression Analysis ch12.pptx
Ā 
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdfregression-linearandlogisitics-220524024037-4221a176 (1).pdf
regression-linearandlogisitics-220524024037-4221a176 (1).pdf
Ā 
Linear and Logistics Regression
Linear and Logistics RegressionLinear and Logistics Regression
Linear and Logistics Regression
Ā 
Chapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares RegressionChapter 2 part3-Least-Squares Regression
Chapter 2 part3-Least-Squares Regression
Ā 
Chap013.ppt
Chap013.pptChap013.ppt
Chap013.ppt
Ā 
Business Analytics Foundation with R Tools Part 1
Business Analytics Foundation with R Tools Part 1Business Analytics Foundation with R Tools Part 1
Business Analytics Foundation with R Tools Part 1
Ā 
Chapter05
Chapter05Chapter05
Chapter05
Ā 
IPPTCh013.pptx
IPPTCh013.pptxIPPTCh013.pptx
IPPTCh013.pptx
Ā 
Regression Analysis.pptx
Regression Analysis.pptxRegression Analysis.pptx
Regression Analysis.pptx
Ā 
Regression Analysis Techniques.pptx
Regression Analysis Techniques.pptxRegression Analysis Techniques.pptx
Regression Analysis Techniques.pptx
Ā 
Chap013.ppt
Chap013.pptChap013.ppt
Chap013.ppt
Ā 
LINEAR REGRESSION.pptx
LINEAR REGRESSION.pptxLINEAR REGRESSION.pptx
LINEAR REGRESSION.pptx
Ā 
The 10 Algorithms Machine Learning Engineers Need to Know.pptx
The 10 Algorithms Machine Learning Engineers Need to Know.pptxThe 10 Algorithms Machine Learning Engineers Need to Know.pptx
The 10 Algorithms Machine Learning Engineers Need to Know.pptx
Ā 
Unit 03 - Consolidated.pptx
Unit 03 - Consolidated.pptxUnit 03 - Consolidated.pptx
Unit 03 - Consolidated.pptx
Ā 
Distribution of EstimatesLinear Regression ModelAssume (yt,.docx
Distribution of EstimatesLinear Regression ModelAssume (yt,.docxDistribution of EstimatesLinear Regression ModelAssume (yt,.docx
Distribution of EstimatesLinear Regression ModelAssume (yt,.docx
Ā 
Quantity Demand Analysis
Quantity Demand AnalysisQuantity Demand Analysis
Quantity Demand Analysis
Ā 
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docxFSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
FSE 200AdkinsPage 1 of 10Simple Linear Regression Corr.docx
Ā 
Econometrics project
Econometrics projectEconometrics project
Econometrics project
Ā 
10 Nonparamatric statistics
10 Nonparamatric statistics10 Nonparamatric statistics
10 Nonparamatric statistics
Ā 
Module 3_ Classification.pptx
Module 3_ Classification.pptxModule 3_ Classification.pptx
Module 3_ Classification.pptx
Ā 

More from Penny Jiang

Step 9 Evaluating the Strategic Plan
Step 9 Evaluating the Strategic PlanStep 9 Evaluating the Strategic Plan
Step 9 Evaluating the Strategic PlanPenny Jiang
Ā 
Step 8 Implementing the Strategic Plan
Step 8 Implementing the Strategic PlanStep 8 Implementing the Strategic Plan
Step 8 Implementing the Strategic PlanPenny Jiang
Ā 
Step 7 Selecting Tactics
Step 7 Selecting TacticsStep 7 Selecting Tactics
Step 7 Selecting TacticsPenny Jiang
Ā 
Step 6 Developing the Message Strategy
Step 6 Developing the Message StrategyStep 6 Developing the Message Strategy
Step 6 Developing the Message StrategyPenny Jiang
Ā 
Chinese calligraphy
Chinese calligraphyChinese calligraphy
Chinese calligraphyPenny Jiang
Ā 
7 anova chi square test
 7 anova chi square test 7 anova chi square test
7 anova chi square testPenny Jiang
Ā 
6 estimation hypothesis testing t test
6 estimation hypothesis testing t test6 estimation hypothesis testing t test
6 estimation hypothesis testing t testPenny Jiang
Ā 
5 numerical descriptive statitics
5 numerical descriptive statitics5 numerical descriptive statitics
5 numerical descriptive statiticsPenny Jiang
Ā 
4 sampling
4 sampling4 sampling
4 samplingPenny Jiang
Ā 
3 survey, questionaire, graphic techniques
3 survey, questionaire, graphic techniques3 survey, questionaire, graphic techniques
3 survey, questionaire, graphic techniquesPenny Jiang
Ā 
2 statistics, measurement, graphical techniques
2 statistics, measurement, graphical techniques2 statistics, measurement, graphical techniques
2 statistics, measurement, graphical techniquesPenny Jiang
Ā 
1 introduction
1 introduction1 introduction
1 introductionPenny Jiang
Ā 
2 elements of design line
2 elements of design line2 elements of design line
2 elements of design linePenny Jiang
Ā 

More from Penny Jiang (13)

Step 9 Evaluating the Strategic Plan
Step 9 Evaluating the Strategic PlanStep 9 Evaluating the Strategic Plan
Step 9 Evaluating the Strategic Plan
Ā 
Step 8 Implementing the Strategic Plan
Step 8 Implementing the Strategic PlanStep 8 Implementing the Strategic Plan
Step 8 Implementing the Strategic Plan
Ā 
Step 7 Selecting Tactics
Step 7 Selecting TacticsStep 7 Selecting Tactics
Step 7 Selecting Tactics
Ā 
Step 6 Developing the Message Strategy
Step 6 Developing the Message StrategyStep 6 Developing the Message Strategy
Step 6 Developing the Message Strategy
Ā 
Chinese calligraphy
Chinese calligraphyChinese calligraphy
Chinese calligraphy
Ā 
7 anova chi square test
 7 anova chi square test 7 anova chi square test
7 anova chi square test
Ā 
6 estimation hypothesis testing t test
6 estimation hypothesis testing t test6 estimation hypothesis testing t test
6 estimation hypothesis testing t test
Ā 
5 numerical descriptive statitics
5 numerical descriptive statitics5 numerical descriptive statitics
5 numerical descriptive statitics
Ā 
4 sampling
4 sampling4 sampling
4 sampling
Ā 
3 survey, questionaire, graphic techniques
3 survey, questionaire, graphic techniques3 survey, questionaire, graphic techniques
3 survey, questionaire, graphic techniques
Ā 
2 statistics, measurement, graphical techniques
2 statistics, measurement, graphical techniques2 statistics, measurement, graphical techniques
2 statistics, measurement, graphical techniques
Ā 
1 introduction
1 introduction1 introduction
1 introduction
Ā 
2 elements of design line
2 elements of design line2 elements of design line
2 elements of design line
Ā 

Recently uploaded

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
Ā 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
Ā 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
Ā 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
Ā 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
Ā 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
Ā 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
Ā 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
Ā 
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļøcall girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø9953056974 Low Rate Call Girls In Saket, Delhi NCR
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
Ā 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
Ā 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
Ā 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
Ā 

Recently uploaded (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Ā 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Ā 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
Ā 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
Ā 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
Ā 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
Ā 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Ā 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
Ā 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Ā 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
Ā 
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļøcall girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
call girls in Kamla Market (DELHI) šŸ” >ą¼’9953330565šŸ” genuine Escort Service šŸ”āœ”ļøāœ”ļø
Ā 
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Tilak Nagar Delhi reach out to us at šŸ”9953056974šŸ”
Ā 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
Ā 
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
ā€œOh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
Ā 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
Ā 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
Ā 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
Ā 

Quantitative Research Methods Lecture Regression Diagnostics

  • 1. Quantitative Research Methods Lecture 9 Model Building 1. Regression Diagnostics I 2. Regression Diagnostics II Multicollinearity 3. Regression Diagnostics II Time series 4. Polynomial Models 5. Nominal variable in Multiple Regression 6. Stepwise Multiple Regression
  • 2. Statistical analyses ā€¢ Group differences (nominal variable) on one interval variable: ā–« T-tests (2 groups) ā–« ANOVA (3 or more groups) ļ‚– One factor: one way ANOVA ļ‚– Two factor: two way/factor ANOVA ā€¢ The relationship between two nominal variable: ā–« Chi-square test ā€¢ The relationship between two interval variable: ā–« Correlation, simple linear regression ā€¢ The relationship between multiple interval variable on one interval variable ā–« Multiple regression ā€¢ The relationship between multiple interval variable on one nominal variable (yes/no) ā–« Logistic regression
  • 3. Regression ā€¢ Single Linear Regression (interval) ā–« one independent, one dependent ā€¢ Multiple Regression (all interval) ā–« Multiple independent, one dependent ā€¢ Logistic Regression ā–« Multiple interval independent, one nominal dependent (Yes/No) ā–« Check example: https://youtu.be/H_48AcV0qlY ā–«
  • 4. 16.4 Simple Linear Regression Modelā€¦ A straight line model with one independent variable is called a simple linear regression model. Its is written as: error variable dependent variable independent variable y-intercept slope of the line
  • 5. 16.5 Simple Linear Regression Modelā€¦ Note that both and are population parameters which are usually unknown and hence estimated from the data. y x run rise =slope (=rise/run) =y-intercept
  • 6. 16.6 Estimating the Coefficientsā€¦ In much the same way we base estimates of Āµ on x , we estimate Ī²0 using b0 and Ī²1 using b1, the y-intercept and slope (respectively) of the least squares or regression line given by: (Recall: this is an application of the least squares method and it produces a straight line that minimizes the sum of the squared differences between the points and the line)
  • 7. 16.7 Least Squares Lineā€¦ these differences are called residuals Example 16.1
  • 8. 16.8 Example 16.2ā€¦ Car dealers across North America use the "Red Book" to help them determine the value of used cars that their customers trade in when purchasing new cars. The book, which is published monthly, lists the trade-in values for all basic models of cars. It provides alternative values for each car model according to its condition and optional features. The values are determined on the basis of the average paid at recent used-car auctions, the source of supply for many used-car dealers.
  • 9. 16.9 Example 16.2ā€¦ However, the Red Book does not indicate the value determined by the odometer reading, despite the fact that a critical factor for used-car buyers is how far the car has been driven. To examine this issue, a used-car dealer randomly selected 100 three-year old Toyota Camrys that were sold at auction during the past month. The dealer recorded the price ($1,000) and the number of miles (thousands) on the odometer. (Xm16-02). The dealer wants to find the regression line.
  • 10. 16.10 Using SPSS Analyze > Regression > Linear Simple Linear Regression SPSS Steps: Analyze > Regression > Linear
  • 11. 16.11 SPSS Output: check three tables R2 strength of the linear relationship Model significance /fit b1 b0
  • 12. 16.12 Example 16.2ā€¦ As you might expect with used carsā€¦ The slope coefficient, b1, is ā€“0.0669, that is, each additional mile on the odometer decreases the price by $.0669 or 6.69Ā¢ The intercept, b0, is 17,250. One interpretation would be that when x = 0 (no miles on the car) the selling price is $17,250. However, we have no data for cars with less than 19,100 miles on them so this isnā€™t a correct assessment.
  • 13. 16.13 Testing the Slope… If no linear relationship exists between the two variables, we would expect the regression line to be horizontal, that is, to have a slope of zero. We want to see if there is a linear relationship, i.e. we want to see if the slope (β1) is something other than zero. Our research hypothesis becomes: H1: β1 ≠ 0. Thus the null hypothesis becomes: H0: β1 = 0.
  • 14. 16.14 Coefficient of Determination… Tests thus far have shown if a linear relationship exists; it is also useful to measure the strength of the relationship. This is done by calculating the coefficient of determination, R2. The coefficient of determination is the square of the coefficient of correlation (r), hence R2 = (r)2.
  • 15. 16.15 Coefficient of Determination… As we did with analysis of variance, we can partition the variation in y into two parts: Variation in y = SSE + SSR. SSE (Sum of Squares Error) measures the amount of variation in y that remains unexplained (i.e. due to error). SSR (Sum of Squares Regression) measures the amount of variation in y explained by variation in the independent variable x. Thus R2 = SSR / (SSE + SSR), the proportion of the variation that is explained.
  • 16. 16.16 Coefficient of Determination R2 has a value of .6483. This means 64.83% of the variation in the auction selling prices (y) is explained by the variation in the odometer readings (x). The remaining 35.17% is unexplained, i.e. due to error. Unlike the value of a test statistic, the coefficient of determination does not have a critical value that enables us to draw conclusions. In general, the higher the value of R2, the better the model fits the data. R2 = 1: perfect match between the line and the data points. R2 = 0: there is no linear relationship between x and y.
  • 17. 16.17 Using the Regression Equation… We could use our regression equation: ŷ = 17.250 – .0669x to predict the selling price of a car with 40 (,000) miles on it: ŷ = 17.250 – .0669(40) = 14.574, i.e. a point prediction of $14,574. The actual selling price is likely to differ, so we can instead estimate the selling price in terms of an interval.
  • 18. 16.18 Prediction Interval The prediction interval is used when we want to predict one particular value of the dependent variable, given a specific value of the independent variable: ŷ ± t(α/2, n−2) · sε · √(1 + 1/n + (xg − x̄)² / ((n − 1)sx²)), where xg is the given value of x we're interested in.
  • 19. 16.19 Confidence Interval Estimator… …of the expected value of y. In this case, we are estimating the mean of y given a value of x: ŷ ± t(α/2, n−2) · sε · √(1/n + (xg − x̄)² / ((n − 1)sx²)). (Technically this formula is used for infinitely large populations. However, we can interpret our problem as attempting to determine the average selling price of all Toyota Camrys, all with 40,000 miles on the odometer.)
  • 20. 16.20 What's the Difference? The confidence interval estimate of the expected value of y will be narrower than the prediction interval for the same given value of x and confidence level. This is because there is less error in estimating a mean value as opposed to predicting an individual value. Prediction interval: used to estimate one value of y (at a given x). Confidence interval: used to estimate the mean value of y (at a given x).
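To make the distinction concrete, here is a hedged statsmodels sketch that produces both intervals at xg = 40, using the same placeholder data as above (not the lecture's file):

```python
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({"odometer": [37.4, 44.8, 45.8, 30.9, 31.7, 34.0],
                   "price":    [14.6, 14.1, 14.0, 15.6, 15.6, 14.7]})  # placeholder data
model = sm.OLS(df["price"], sm.add_constant(df["odometer"])).fit()

# Intervals at the given value x_g = 40 (thousand miles).
new = sm.add_constant(pd.DataFrame({"odometer": [40.0]}), has_constant="add")
frame = model.get_prediction(new).summary_frame(alpha=0.05)
print(frame[["mean_ci_lower", "mean_ci_upper"]])  # confidence interval for E(y): narrower
print(frame[["obs_ci_lower", "obs_ci_upper"]])    # prediction interval for one y: wider
```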
  • 23. 16.23 Regression Diagnostics… There are three conditions that are required in order to perform a regression analysis. These are: • the error variable must be normally distributed, • the error variable must have a constant variance, • the errors must be independent of each other. How can we diagnose violations of these conditions? → Residual analysis, that is, examine the differences between the actual data points and those predicted by the linear equation…
  • 24. 16.24 Nonnormality… We can take the residuals and put them into a histogram to visually check for normality… …we're looking for a bell-shaped histogram with the mean close to zero.
  • 25. SPSS: Regression > Linear > Save > check Residuals > Unstandardized & Standardized
  • 26. SPSS: Test of normality: Analyze > Descriptive Statistics > Explore > Plots
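A rough scripted analogue of this normality check (an illustrative sketch: random numbers stand in for the saved residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, 100)  # stand-in for the saved unstandardized residuals

w, p = stats.shapiro(residuals)        # Shapiro-Wilk normality test
print(f"W = {w:.3f}, p = {p:.3f}")     # p > .05: no evidence against normality
```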
  • 28. 16.28 Heteroscedasticity… When the requirement of a constant variance is violated, we have a condition of heteroscedasticity. We can diagnose heteroscedasticity by plotting the residuals against the predicted y.
  • 29. 16.29 Heteroscedasticity… If the variance of the error variable (σε²) is not constant, then we have "heteroscedasticity". Here's the plot of the residuals against the predicted value of y: there doesn't appear to be a change in the spread of the plotted points, therefore no heteroscedasticity.
  • 31. SPSS: Graphs > Scatter; y: Residual, x: Predicted Price
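The same plot can be scripted; a matplotlib sketch with stand-in arrays (in practice, use the saved residuals and predicted values):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
predicted = rng.uniform(14.0, 16.0, 100)  # stand-in predicted prices
residuals = rng.normal(0.0, 0.3, 100)     # stand-in residuals

plt.scatter(predicted, residuals)
plt.axhline(0.0, linestyle="--")  # a widening fan around this line signals heteroscedasticity
plt.xlabel("Predicted price")
plt.ylabel("Residual")
plt.show()
```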
  • 32. 16.32 Nonindependence of the Error Variable If we were to observe the auction price of cars every week for, say, a year, that would constitute a time series. When the data are time series, the errors often are correlated. Error terms that are correlated over time are said to be autocorrelated or serially correlated. We can often detect autocorrelation by graphing the residuals against the time periods. If a pattern emerges, it is likely that the independence requirement is violated.
  • 33. 16.33 Nonindependence of the Error Variable Patterns in the appearance of the residuals over time indicate that autocorrelation exists: note runs of positive residuals replaced by runs of negative residuals, or oscillating behavior of the residuals around zero. The Durbin-Watson test is one way to test for autocorrelation.
  • 34. 16.34 Outliers… An outlier is an observation that is unusually small or unusually large. E.g., our used-car example had odometer readings from 19.1 to 49.2 thousand miles. Suppose we have a value of only 5,000 miles (i.e. a car driven by an old person only on Sundays): this point is an outlier.
  • 35. 16.35 Outliers… Possible reasons for the existence of outliers include: ▫ there was an error in recording the value, ▫ the point should not have been included in the sample, ▫ perhaps the observation is indeed valid. Outliers can be easily identified from a scatter plot. If the absolute value of the standardized residual is > 2, we suspect the point may be an outlier and investigate further. Outliers need to be dealt with, since they can easily influence the least squares line…
  • 38. Procedure for Regression Diagnostics 1. Develop a model that has a theoretical basis; that is, for the dependent variable in question, find an independent variable that you believe is linearly related to it. 2. Gather data for the two variables. 3. Draw the scatter diagram to determine whether a linear model appears to be appropriate. Identify possible outliers. 4. Determine the regression equation. 5. Calculate the residuals and check the required conditions (see the diagnostics slides). 6. Assess the model's fit from the SPSS output. 7. If the model fits the data, use the regression equation to predict a particular value of the dependent variable or estimate its mean (or both).
  • 39. From simple linear regression to multiple regression • Simple linear regression [Diagram: Education → Income]
  • 40. 17.40 Multiple Regression… The simple linear regression model was used to analyze how one interval variable (the dependent variable y) is related to one other interval variable (the independent variable x). Multiple regression allows for any number of independent variables. We expect to develop models that fit the data better than would a simple linear regression model.
  • 41. Multiple regression [Diagram: Variables A, B, and C jointly predicting Variable D]
  • 42. Multiple regression [Diagram: Age, Education, Number of family members earning money, Number of children, Years with current employer, Occupation prestige score, and Work hours jointly predicting Income]
  • 43. Example: GSS2008 • How is income affected by ▫ Age (AGE) ▫ Education (EDUC) ▫ Work hours (HRS) ▫ Spouse work hours (SPHRS) ▫ Occupation prestige score (PRESTG80) ▫ Number of children (CHILDS) ▫ Number of family members earning money (EARNRS) ▫ Years with current employer (CUREMPYR)
  • 44. 17.44 The Model… We now assume we have k independent variables potentially related to the one dependent variable. This relationship is represented in this first-order linear equation: y = β0 + β1x1 + β2x2 + … + βkxk + ε, where y is the dependent variable, x1, …, xk are the independent variables, β0, …, βk are the coefficients, and ε is the error variable. In the one-variable, two-dimensional case we drew a regression line; here we imagine a response surface.
  • 45. 17.45 Estimating the Coefficients… The sample regression equation is expressed as: ŷ = b0 + b1x1 + b2x2 + … + bkxk. We will use computer output to: Assess the model… How well does it fit the data? Is it useful? Are any required conditions violated? Employ the model… Interpreting the coefficients. Predictions using the regression model.
  • 46. 17.46 Regression Analysis Steps… 1. Use a computer and software to generate the coefficients and the statistics used to assess the model. 2. Diagnose violations of required conditions. If there are problems, attempt to remedy them. 3. Assess the model's fit: coefficient of determination, F-test of the analysis of variance. 4. If steps 1–3 are OK, use the model for prediction.
  • 47. 17.47 Transformation… Can we transform this data into a mathematical model that looks like this: [Diagram: age, education, years with current employer, … predicting income]
  • 48. 17.48 Using SPSS • Analyze > Regression > Linear
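An illustrative Python equivalent with statsmodels (the file name is an assumption, the predictor names follow the slide, and INCOME is an assumed name for the income column):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("gss2008.csv")  # hypothetical export of the GSS2008 data set

model = smf.ols("INCOME ~ AGE + EDUC + HRS + SPHRS + PRESTG80"
                " + CHILDS + EARNRS + CUREMPYR", data=df).fit()
print(model.summary())  # coefficients, t tests, R-squared, adjusted R-squared, F test
```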
  • 51. The mathematical model ŷ = -51785.243 + 460.87x1 + 4100.9x2 + … + 329.771x8
  • 52. 17.52 The Model… Although we haven't done any assessment of the model yet, at first pass: ŷ = -51785.243 + 460.87x1 + 4100.9x2 + 620x3 - 862.201x4 … + 329.771x8. It suggests that increases in AGE, EDUC, HRS, PRESTG80, EARNRS, and CUREMPYR will positively impact income, while increases in SPHRS and CHILDS will negatively impact income.
  • 53. 17.53 Model Assessment… We will assess the model in two ways: the coefficient of determination, and the F-test of the analysis of variance.
  • 54. 17.54 Coefficient of Determination… • Again, the coefficient of determination is defined as: R2 = 1 - SSE/SST, where SST = SSR + SSE. Here R2 = .337: this means that 33.7% of the variation in income is explained by the eight independent variables, but 66.3% remains unexplained.
  • 55. 17.55 Adjusted R2 value… The adjusted R2 is the coefficient of determination adjusted for the number of explanatory variables. It takes into account the sample size n and k, the number of independent variables, and is given by: Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - k - 1).
  • 56. 17.56 Testing the Validity of the Model… In a multiple regression model (i.e. more than one independent variable), we utilize an analysis of variance technique to test the overall validity of the model. Here's the idea: H0: β1 = β2 = … = βk = 0; H1: at least one βi is not equal to zero. If the null hypothesis is true, none of the independent variables is linearly related to y, and so the model is invalid. If at least one βi is not equal to 0, the model does have some validity.
  • 57. 17.57 Testing the Validity of the Model… ANOVA table for regression analysis:
Source of Variation | degrees of freedom | Sums of Squares | Mean Squares | F-Statistic
Regression | k | SSR | MSR = SSR/k | F = MSR/MSE
Error | n - k - 1 | SSE | MSE = SSE/(n - k - 1) |
Total | n - 1 | | |
A large value of F indicates that most of the variation in y is explained by the regression equation and that the model is valid. A small value of F indicates that most of the variation in y is unexplained.
  • 58. Testing the Validity of the Model… p < .05, so at least one βi is not 0; we reject H0 and accept H1: the model is valid.
  • 59. 17.59 Interpreting the Coefficients* Intercept (b0 = -51785.243): this is the average income when all of the independent variables are zero. It's meaningless to try to interpret this value, particularly if 0 is outside the range of the values of the independent variables (as is the case here). Age (b1 = 460.87): each one-year increase in age increases annual income by $460.87. Education (b2 = 4100.9): for each additional year of education, annual income increases by $4,100.90. Hours of work (b3 = 620): for each additional hour of work per week, annual income increases by $620. *In each case we assume all other variables are held constant…
  • 60. 17.60 Interpreting the Coefficients* Spouse hours of work (b4 = -862.201): for each additional hour the spouse works per week, average annual income decreases by $862.20. Occupation prestige score (b5 = 641): for each additional unit of score, average annual income increases by $641. Number of children (b6 = -331): for each additional child, average income decreases by $331. Number of family members earning money (b7 = 687): for each additional family member earning money, income increases by $687. Number of years with current job (b8 = 330): for each additional year with the current employer, income increases by $330. *In each case we assume all other variables are held constant…
  • 61. 17.61 Testing the Coefficients… For each independent variable, we can test to determine whether there is enough evidence of a linear relationship between it and the dependent variable for the entire population… H0: βi = 0, H1: βi ≠ 0 (for i = 1, 2, …, k), using t = (bi - βi)/s(bi) as our test statistic (with n - k - 1 degrees of freedom).
  • 62. 17.62 Testing the Coefficients We can use SPSS output to quickly test each of the 8 coefficients in our model… Thus, EDUC, HRS, SPHRS, and PRESTG80 are linearly related to income. There is no evidence to infer that AGE, CHILDS, EARNRS, and CUREMPYR are linearly related to income.
  • 63. 17.63 Using the Regression Equation Much like we did with simple linear regression, we can produce a prediction interval for a particular value of y. As well, we can produce the confidence interval estimate of the expected value of y.
  • 64. 17.64 Using the Regression Equation Exercise GSS2008: we add one row (our given values for the independent variables) to the bottom of our data set; please produce ▫ a prediction interval ▫ a confidence interval estimate for the dependent variable y.
  • 65. 17.65 Regression Diagnostics I Exercise GSS2008 • Calculate the residuals and check the following: ▫ Is the error variable nonnormal? Perform a normality test. • Is the error variance constant? ▫ Plot the residuals versus the predicted values of y. • Are the errors independent (time-series data)? ▫ Plot the residuals versus the time periods. • Are there observations that are inaccurate or do not belong to the target population? ▫ Double-check the accuracy of outliers and influential observations.
  • 66. 17.66 Regression Diagnostics II • Multiple regression models have a problem that simple regressions do not, namely multicollinearity. • It happens when the independent variables are highly correlated. • We'll explore this concept through the following example…
  • 67. 17.67 Example GSS2008 • AGE and CUREMPYR are not significant predictors of INCOME in the multiple regression model, yet when we run correlations between AGE and INCOME and between CUREMPYR and INCOME, both are significantly correlated. • How do we account for this apparent contradiction? • The answer is that AGE and CUREMPYR are correlated with each other and with the other independent variables! • This is the problem of multicollinearity.
  • 70. How to deal with the multicollinearity problem • Multicollinearity exists in virtually all multiple regression models. • To minimize the effect: ▫ try to include independent variables that are independent of each other; ▫ develop a model that has a theoretical basis and include only the IVs that are necessary. A diagnostic sketch follows below.
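One common way to quantify multicollinearity is the variance inflation factor (VIF), which measures how well each IV is predicted by the others. A hedged Python sketch (the file and column names are the same assumptions as before; VIF is not mentioned in the lecture itself):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("gss2008.csv")  # hypothetical file, as before
X = sm.add_constant(df[["AGE", "EDUC", "HRS", "SPHRS",
                        "PRESTG80", "CHILDS", "EARNRS", "CUREMPYR"]])

# VIF for each predictor (the constant column is skipped).
vifs = pd.Series([variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
                 index=X.columns[1:])
print(vifs)  # a common rule of thumb treats VIF > 10 as serious multicollinearity
```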
  • 71. 17.71 Regression Diagnostics III – Time Series • The Durbin-Watson test allows us to determine whether there is evidence of first-order autocorrelation, a condition in which a relationship exists between consecutive residuals, i.e. ei-1 and ei (i is the time period). The statistic for this test is defined as: d = Σ(ei - ei-1)² / Σei², where the numerator sums over i = 2, …, n and the denominator over i = 1, …, n. • d has a range of values: 0 ≤ d ≤ 4.
  • 72. 17.72 Durbin–Watson (two-tail test) • To test for first-order autocorrelation: • If d < dL or d > 4 - dL, first-order autocorrelation exists. • If d falls between dL and dU or between 4 - dU and 4 - dL, the test is inconclusive. • If d falls between dU and 4 - dU, there is no evidence of first-order autocorrelation. [Number line from 0 to 4: exists | inconclusive | doesn't exist (around 2) | inconclusive | exists]
  • 73. 17.73 Example 17.1 Xm17-01 Can we create a model that will predict lift ticket sales at a ski hill based on two weather parameters? Variables: y - lift ticket sales during Christmas week, x1 - total snowfall (inches), and x2 - average temperature (degrees Fahrenheit) Our ski hill manager collected 20 years of data.
  • 74. 17.74 Example 17.1 Both the coefficient of determination and the p-value of the F-test indicate the model is poor… Neither variable is linearly related to ticket sales…
  • 75. 17.75 Example 17.1 • The histogram of residuals… • reveals the errors may be normally distributed…
  • 76. 17.76 Example 17.1 • In the plot of residuals versus predicted values (testing for heteroscedasticity), the error variance appears to be constant…
  • 77. 17.77 Example 17.1 Durbin-Watson • Apply the Durbin-Watson statistic to the entire list of residuals. • SPSS: Regression > Linear > Statistics > check Durbin-Watson
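statsmodels exposes the same statistic. A minimal sketch with simulated time-ordered data (not the ski-hill file; the rows must be in time order for d to be meaningful):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
x = rng.normal(size=20)                  # 20 periods, as in the example
y = 2.0 + 0.5 * x + rng.normal(size=20)  # simulated data with roughly independent errors

model = sm.OLS(y, sm.add_constant(x)).fit()
print(durbin_watson(model.resid))  # near 2: no autocorrelation; near 0 or 4: autocorrelation
```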
  • 78. 17.78 Example 17.1 To test for first-order autocorrelation with α = .05, we find in Table 8(a) in Appendix B dL = 1.10 and dU = 1.54. The null and alternative hypotheses are H0: there is no first-order autocorrelation; H1: there is first-order autocorrelation. The rejection region includes d < dL = 1.10. Since d = .593, we reject the null hypothesis and conclude that there is enough evidence to infer that first-order autocorrelation exists.
  • 79. 17.79 Example 17.1 Autocorrelation usually indicates that the model needs to include an independent variable that has a time-ordered effect on the dependent variable. The simplest such independent variable represents the time periods. We included a third independent variable that records the number of years since the year the data were gathered. Thus, x3 = 1, 2, ..., 20. The new model is y = β0 + β1x1 + β2x2 + β3x3 + ε.
  • 80. 17.80 Example 17.1 The fit of the model is high and the model is valid… Snowfall and time (our new variable) are linearly related to ticket sales; temperature is not… With dL = 1.10 and dU = 1.54, d now falls between dU and 4 - dU, so first-order autocorrelation doesn't exist.
  • 81. 17.81 Example 17.1 • The Durbin-Watson statistic for the residuals from our new regression analysis is equal to 1.885. • We can conclude that there is not enough evidence to infer the presence of first-order autocorrelation. (Determining dL is left as an exercise for the reader…) • Hence, we have improved our model dramatically!
  • 82. 17.82 Example 17.1 Notice that the model is improved dramatically. The F-test tells us that the model is valid. The t-tests tell us that both the amount of snowfall and time are significantly linearly related to the number of lift tickets. This information could prove useful in advertising for the resort. For example, if there has been a recent snowfall, the resort could emphasize that in its advertising. If no new snow has fallen, it may emphasize its snow-making facilities.
  • 83. 18.83 Model Selection Regression analysis can also be used for: • non-linear (polynomial) models, and • models that include nominal independent variables.
  • 84. 18.84 Polynomial Models Previously we looked at the multiple regression model y = β0 + β1x1 + β2x2 + … + βkxk + ε (it's considered linear or first-order since the exponent on each of the xi's is 1). The independent variables may be functions of a smaller number of predictor variables; polynomial models fall into this category. If there is one predictor variable (x) we have: y = β0 + β1x + β2x² + … + βpx^p + ε.
  • 85. 18.85 Polynomial Models Technically, this polynomial equation is a multiple regression model with p independent variables (x1, x2, …, xp). Since x1 = x, x2 = x², x3 = x³, …, xp = x^p, it is based on one predictor variable (x). p is the order of the equation; we'll focus on equations of order p = 1, 2, and 3.
  • 86. 18.86 First Order Model When p = 1, we have our simple linear regression model: y = β0 + β1x + ε. That is, we believe there is a straight-line relationship between the dependent and independent variables over the range of the values of x.
  • 87. 18.87 Second Order Model When p = 2, the polynomial model is a parabola: y = β0 + β1x + β2x² + ε.
  • 88. 18.88 Third Order Model When p = 3, our third order model looks like: y = β0 + β1x + β2x² + β3x³ + ε.
  • 89. 18.89 Polynomial Models: 2 Predictor Variables Perhaps we suspect that there are two predictor variables (x1 & x2) which influence the dependent variable. First order model (no interaction): y = β0 + β1x1 + β2x2 + ε. First order model (with interaction): y = β0 + β1x1 + β2x2 + β3x1x2 + ε.
  • 90. 18.90 Polynomial Models: 2 Predictor Variables First order models, 2 predictors, without & with interaction:
  • 91. 18.91 Polynomial Models: 2 Predictor Variables If we believe that a quadratic relationship exists between y and each of x1 and x2, and that the predictor variables interact in their effect on y, we can use this model. Second order model (in two variables) WITH interaction: y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε.
  • 92. 18.92 Polynomial Models: 2 Predictor Variables 2nd order models, 2 predictors, without & with interaction:
  • 93. 18.93 Selecting a Model One predictor variable, or two (or more)? First order? Second order? Higher order? With interaction? Without? How do we choose the right model? Use our knowledge of the variables involved to build an initial model. Test that model using statistical techniques. If required, modify our model and re-test…
  • 94. 18.94 Example 18.1 We've been asked to come up with a regression model for a fast food restaurant. We know our primary market is middle-income adults and their children, particularly those between the ages of 5 and 12. Dependent variable: restaurant revenue (gross or net). Predictor variables: family income, age of children. Is the relationship first order? Quadratic?…
  • 95. 18.95 Example 18.1 The relationship between the dependent variable (revenue) and each predictor variable is probably quadratic. Members of low- or high-income households are less likely to eat at this chain's restaurants, since the restaurants attract mostly middle-income customers. Neighborhoods where the mean age of children is either quite low or quite high are also less likely to eat there than families with children in the 5-to-12-year range. Seems reasonable?
  • 96. 18.96 Example 18.1 Should we include the interaction term in our model? When in doubt, it is probably best to include it. Our model, then, is: y = β0 + β1x1 + β2x2 + β3x1² + β4x2² + β5x1x2 + ε, where y = annual gross sales, x1 = median annual household income*, x2 = mean age of children*. *In the neighborhood.
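With a formula interface, the squared and interaction terms need not be typed into the data set by hand. An illustrative statsmodels sketch (the file and column names are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("xm18-02.csv")  # hypothetical export of Xm18-02

# Second-order model in two variables with interaction:
# y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2 + b5*x1*x2
model = smf.ols("revenue ~ income + age + I(income ** 2) + I(age ** 2) + income:age",
                data=df).fit()
print(model.summary())
```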
  • 97. 18.97 Example 18.2 Xm18-02 Our fast food restaurant research department selected 25 locations at random and gathered data on revenues, household income, and ages of neighborhood children. [Spreadsheet: the collected data plus the calculated squared and interaction terms]
  • 98. 18.98 Example 18.2 You can take the original data collected (revenues, household income, and age) and plot y vs. x1 and y vs. x2 to get a feel for the data; trend lines were added for clarity…
  • 99. 18.99 Example 18.2 Checking the regression tool's output… The model fits the data well and it's valid…
  • 100. 18.100 Nominal Independent Variables Thus far in our regression analysis, we've only considered variables that are interval. Often, however, we need to consider nominal data in our analysis. For example, our earlier example regarding the market for used cars focused only on mileage. Perhaps color is an important factor. How can we model this new variable?
  • 101. 18.101 Indicator Variables An indicator variable (also called a dummy variable) is a variable that can assume either one of only two values (usually 0 and 1). A value of 1 usually indicates the existence of a certain condition, while a value of 0 usually indicates that the condition does not hold. I1 = 1 if the color is white, 0 if not; I2 = 1 if the color is silver, 0 if not.
Car Color | I1 | I2
white | 1 | 0
silver | 0 | 1
other | 0 | 0
two-tone! | 1 | 1
To represent m categories… we need m - 1 indicator variables.
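In code, indicator variables are a one-liner. A small pandas sketch with made-up rows ('other' is the baseline category with I1 = I2 = 0):

```python
import pandas as pd

df = pd.DataFrame({"odometer": [36.1, 40.2, 29.8, 33.5],
                   "color": ["white", "silver", "other", "white"]})

# m = 3 categories -> m - 1 = 2 indicators; 'other' becomes the baseline.
dummies = pd.get_dummies(df["color"])[["white", "silver"]].astype(int)
X = df[["odometer"]].join(dummies.rename(columns={"white": "I1", "silver": "I2"}))
print(X)
```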
  • 102. 18.102 Interpreting Indicator Variable Coefficients After performing our regression analysis, we have this regression equation (price in $1,000s): ŷ = b0 + b1x + .0911I1 + .3304I2. Thus, the price diminishes with additional mileage (x); a white car sells for $91.10 more than other colors (I1); a silver car fetches $330.40 more than other colors (I2).
  • 104. 18.104 Testing the Coefficients To test the coefficient of I1, we use these hypotheses… H0: β2 = 0, H1: β2 ≠ 0 (where β2 is the coefficient of I1). There is insufficient evidence to infer that, in the population, 3-year-old white Tauruses with the same odometer reading have a different selling price than Tauruses in the "other" color category…
  • 105. 18.105 Testing the Coefficients To test the coefficient of I2, we use these hypotheses… H0: β3 = 0, H1: β3 ≠ 0 (where β3 is the coefficient of I2). We can conclude that there are differences in auction selling prices between 3-year-old silver-colored Tauruses and Tauruses in the "other" color category with the same odometer readings.
  • 106. Stepwise Regression • Stepwise regression is an iterative procedure that adds and deletes one independent variable at a time. The decision to add or delete a variable is made on the basis of whether that variable improves the model. • It is a procedure that can eliminate correlated independent variables.
  • 107. Step 1: run a simultaneous regression and rank all the significant variables. [SPSS output with the significant predictors ranked]
  • 108. Step 2 • Analyze > Regression > Linear > Stepwise > Dependent variable > Independent variables (1st round: the top predictor; 2nd round: the top predictor & the 2nd top predictor… until the nth round; n = number of predictors) • Statistics > R square change & Descriptives
  • 109. ā€¢ Stepwise output ā€¢ What to read? ā€¢ R2 , R2 change, F of R2 change, significance level of F of R2 change in each round
  • 111. ā€¢ The regression equation ā€¢ Simulaneous: Å·= āˆ’51785.243 +460.87 AGE+4100.9 EDUC+ 620 HRSāˆ’862.201 SPHRSā€¦ ā€¦ +329.771 CUREMPRY ā€¢ Stepwise: Å·= -44703.12 +3944.7 EDUS-617.37SPHRS+526.493PRESTG80+956.933HRS
  • 112. Multiple regression • Multiple regression examines the predictability of a set of predictors on a dependent variable (criterion). • Why don't we just throw in all the predictors and let the MR determine which ones are good predictors, then? • Reason 1: theoretical considerations. • Reason 2: concern about sample size.
  • 113. Concern of sample size • The desired level is 20 observations for each independent variable. • For instance, if you have 6 predictors, you've got to have at least 120 subjects in your data. • However, if a stepwise procedure is employed, the recommended level increases to 50 to 1. • That is, you've got to have at least 300 subjects in order to run stepwise MR.
  • 114. 18.114 Model Building Here is a procedure for building a regression model: 1. Identify the dependent variable; what is it we wish to predict? Don't forget the variable's unit of measure. 2. List potential predictors; how would changes in predictors change the dependent variable? Be selective; go with the fewest independent variables required. Be aware of the effects of multicollinearity. 3. Gather the data: at least six observations for each independent variable used in the equation.
  • 115. 18.115 Model Building 4. Identify several possible models; formulate first- and second-order models with and without interaction. Draw scatter diagrams. 5. Use statistical software to estimate the models. 6. Determine whether the required conditions are satisfied; if not, attempt to correct the problem. 7. Use your judgment and the statistical output to select the best model!