This document summarizes four key assumptions that should be tested in multiple regression analysis: normality, linearity, reliability of measurement, and homoscedasticity. It discusses how violating these assumptions can lead to inefficient or biased results. Researchers are encouraged to check for normality of variables, linear relationships between variables, reliability of measurement tools, and equal variance of errors. Techniques like residual plots and transformations are mentioned as ways to test the assumptions. The document emphasizes that while methods exist to address issues like non-normality, they may inadvertently change the data or relationships in problematic ways.
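The assumption checks described above can be sketched in a few lines. This is a minimal illustration on synthetic data: the variable names, the Shapiro-Wilk normality test, and the Spearman-correlation check for heteroscedasticity are illustrative choices, not the only (or necessarily the best) diagnostics.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration: linear and homoscedastic by construction.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)

# Fit a simple linear regression and compute residuals.
slope, intercept = np.polyfit(x, y, 1)
fitted = slope * x + intercept
residuals = y - fitted

# Normality of residuals: Shapiro-Wilk (a large p-value gives no evidence
# of non-normality; it does not prove normality).
w_stat, norm_p = stats.shapiro(residuals)

# Rough homoscedasticity check: |residuals| should be uncorrelated
# with the fitted values if error variance is constant.
rho, het_p = stats.spearmanr(np.abs(residuals), fitted)

print(f"slope={slope:.2f}, Shapiro p={norm_p:.3f}, heteroscedasticity p={het_p:.3f}")
```

In practice one would also plot residuals against fitted values; the numeric checks above complement, rather than replace, visual inspection.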
Controversy Over the Significance Test Controversy
Deborah Mayo (Professor of Philosophy, Virginia Tech, Blacksburg, Virginia) in PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
D. Mayo: Replication Research Under an Error Statistical Philosophy
D. Mayo (Virginia Tech) slides from her talk June 3 at the "Preconference Workshop on Replication in the Sciences" at the 2015 Society for Philosophy and Psychology meeting.
Exploratory Research is More Reliable Than Confirmatory Research
PSA 2016 Symposium:
Philosophy of Statistics in the Age of Big Data and Replication Crises
Presenter: Clark Glymour (Alumni University Professor in Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania)
ABSTRACT: Ioannidis (2005) argued that most published research is false, and that “exploratory” research in which many hypotheses are assessed automatically is especially likely to produce false positive relations. Colquhoun (2014), using simulations, estimates that 30 to 40% of positive results using the conventional .05 cutoff for rejection of a null hypothesis are false. Their explanation is that true relationships in a domain are rare and the selection of hypotheses to test is roughly independent of their truth, so most relationships tested will in fact be false. Conventional use of hypothesis tests, in other words, suffers from a base rate fallacy. I will show that the reverse is true for modern search methods for causal relations because: a. each hypothesis is tested or assessed multiple times; b. the methods are biased against positive results; c. systems in which true relationships are rare are an advantage for these methods. I will substantiate the claim with both empirical data and with simulations of data from systems with a thousand to a million variables that result in fewer than 5% false positive relationships and in which 90% or more of the true relationships are recovered.
D. Mayo: The Science Wars and the Statistics Wars: scientism, popular statist...
I will explore the extent to which concerns about ‘scientism’ – an unwarranted obeisance to scientific over other methods of inquiry – are intertwined with issues in the foundations of the statistical data analyses on which (social, behavioral, medical and physical) science increasingly depends. The rise of big data, machine learning, and high-powered computer programs have extended statistical methods and modeling across the landscape of science, law and evidence-based policy, but this has been accompanied by enormous hand wringing as to the reliability, replicability, and valid use of statistics. Legitimate criticisms of scientism often stem from insufficiently self-critical uses of statistical methodology, broadly construed, i.e., from what might be called “statisticism”, particularly when those methods are applied to matters of controversy.
Statistical skepticism: How to use significance tests effectively
Prof. D. Mayo, presentation Oct. 12, 2017 at the ASA Symposium on Statistical Inference: “A World Beyond p < .05”, in the session “What are the best uses for P-values?”
A. Gelman "50 shades of gray: A research story," presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
Abstract: Mounting failures of replication in the social and biological sciences give a practical spin to statistical foundations in the form of the question: How can we attain reliability when methods make illicit cherry-picking and significance seeking so easy? Researchers, professional societies, and journals are increasingly getting serious about methodological reforms to restore scientific integrity – some are quite welcome (e.g., pre-registration), while others are quite radical. The American Statistical Association convened members from differing tribes of frequentists, Bayesians, and likelihoodists to codify misuses of P-values. Largely overlooked are the philosophical presuppositions of both criticisms and proposed reforms. Paradoxically, alternative replacement methods may enable rather than reveal illicit inferences due to cherry-picking, multiple testing, and other biasing selection effects. Crowd-sourced reproducibility research in psychology is helping to change the reward structure but has its own shortcomings. Focusing on purely statistical considerations, it tends to overlook problems with artificial experiments. Without a better understanding of the philosophical issues, we can expect the latest reforms to fail.
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Gerd Gigerenzer (Director of Max Planck Institute for Human Development, Berlin, Germany) in the PSA 2016 Symposium: Philosophy of Statistics in the Age of Big Data and Replication Crises
Severe Testing: The Key to Error Correction
D. G. Mayo's slides for her presentation given March 17, 2017 at the Boston Colloquium for Philosophy of Science, Alfred I. Taub forum: "Understanding Reproducibility & Error Correction in Science"
Deborah G. Mayo: Is the Philosophy of Probabilism an Obstacle to Statistical Fraud Busting?
Presentation slides for: Revisiting the Foundations of Statistics in the Era of Big Data: Scaling Up to Meet the Challenge[*] at the Boston Colloquium for Philosophy of Science (Feb 21, 2014).
D. G. Mayo (Virginia Tech) "Error Statistical Control: Forfeit at your Peril" presented May 23 at the session on "The Philosophy of Statistics: Bayesianism, Frequentism and the Nature of Inference," 2015 APS Annual Convention in NYC.
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Slides from Rutgers Seminar talk by Deborah G Mayo
December 3, 2014
Rutgers, Department of Statistics and Biostatistics
Abstract: Getting beyond today’s most pressing controversies revolving around statistical methods, I argue, requires scrutinizing their underlying statistical philosophies. Two main philosophies about the roles of probability in statistical inference are probabilism and performance (in the long-run). The first assumes that we need a method of assigning probabilities to hypotheses; the second assumes that the main function of statistical method is to control long-run performance. I offer a third goal: controlling and evaluating the probativeness of methods. An inductive inference, in this conception, takes the form of inferring hypotheses to the extent that they have been well or severely tested. A report of poorly tested claims must also be part of an adequate inference. I develop a statistical philosophy in which error probabilities of methods may be used to evaluate and control the stringency or severity of tests. I then show how the “severe testing” philosophy clarifies and avoids familiar criticisms and abuses of significance tests and cognate methods (e.g., confidence intervals). Severity may be threatened in three main ways: fallacies of statistical tests, unwarranted links between statistical and substantive claims, and violations of model assumptions.
These slides were presented on November 22, 2016 during the Annual Julius Symposium, organised by the Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht.
Only a few months ago, the American Statistical Association authoritatively issued an official statement on significance and p-values (American Statistician, 2016, 70:2, 129-133), claiming that the p-value is “commonly misused and misinterpreted.”
In this presentation I focus on the principles of the ASA statement.
Replication Crises and the Statistics Wars: Hidden Controversies
D. Mayo presentation at the X-Phil conference on "Reproducibility and Replicability in Psychology and Experimental Philosophy", University College London (June 14, 2018)
D. G. Mayo: Your data-driven claims must still be probed severely
In the session "Philosophy of Science and the New Paradigm of Data-Driven Science" at the American Statistical Association Conference on Statistical Learning and Data Science/Nonparametric Statistics
Hypothesis Testing. Inferential Statistics pt. 2
A hypothesis test is a statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and the alternative hypothesis.
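A test of the kind just described can be sketched with a standard two-sample t-test. The data here are synthetic and the group names are illustrative assumptions; the point is only the mechanics of opposing a null hypothesis (equal population means) to an alternative (different means).

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: scores for two groups, drawn (for illustration)
# from populations whose means actually differ.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=100, scale=15, size=50)
group_b = rng.normal(loc=110, scale=15, size=50)

# H0: the two population means are equal; H1: they differ.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # conventional significance level
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```

Note that the decision is about the population, made from sample evidence: a small p-value indicates the observed difference would be unlikely if the null hypothesis were true, not that the alternative is certainly correct.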
Cellular telephone workshop given at NECC 2009 in Washington DC by Vicki Davis, classroom teacher and former General Manager for a cellular telephone market.
SPSS presentation. Topics include general concepts of statistics, basic concepts of SPSS, variables and their types, data and its types, sources of data, the four windows of SPSS, the viewer window, and output results.
Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
Lesson 2: Statistics: Benefits, Risks, and Measurements
Assignments
· See your Course Syllabus for the reading assignments.
· Work through the Lesson 2 online notes that follow.
· Complete the Practice Questions and Lesson 2 Assignment.
Learning Objectives
Chapters 1 and 3
After successfully completing this lesson, you should be able to:
· Identify the three conditions needed to conduct a proper study.
· Apply the seven pitfalls that can be encountered when asking questions in a survey.
· Distinguish between measurement variables and categorical variables.
· Distinguish between continuous variables and discrete variables for those that are measurement variables.
· Distinguish between validity, reliability, and bias.
Terms to Know
From Chapter 1
· statistics
· population
· sample
· observational study
· experiment
· selection bias
· nonresponse bias
From Chapter 3
· data (variable)
· categorical variables
· measurement variables
· measurement (discrete) variables
· measurement (continuous) variables
· validity
· reliability
· bias
2.1 What is Statistics?
Section 2.1. Chapter 1
Overview
What is statistics? If you think statistics is just another math course with many formulas and lifeless numbers, you are not alone. However, this is a myth that hopefully will be debunked as you work through this course. Statistics is about data. More precisely, statistics is a collection of procedures and principles for gaining and processing information from collected data. Knowing these principles and procedures will help you make intelligent decisions in everyday life when faced with uncertainty. The following examples are meant to illuminate the definition of statistics.
Example 2.1. Angry Women
Who are those angry women? (Streitfield, D., 1988 and Wallis, 1987.) In 1987, Shere Hite published a best-selling book called Women and Love: A Cultural Revolution in Progress. This 7-year research project produced a controversial 922-page publication that summarized the results from a survey that was designed to examine how American women feel about their relationships with men. Hite mailed out 100,000 fifteen-page questionnaires to women who were members of a wide variety of organizations across the U.S. These organizations included church, political, volunteer, senior citizen, and counseling groups, among many others. Questionnaires were actually sent to the leader of each organization. The leader was asked to distribute questionnaires to all members. Each questionnaire contained 127 open-ended questions with many parts and follow-ups. Part of Hite’s directions read as follows: “Feel free to skip around and answer only those questions you choose.” Approximately 4500 questionnaires were returned. Below are a few statements from this 1987 publication.
· 84% of women are not emotionally satisfied with their relationships
· 95% of women reported emotional and psychological harassment from their partners
· 70% of women married 5 years or more are having extramarital ...
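Before taking these figures at face value, it is worth computing the survey's response rate from the numbers given in the example; with roughly 95% of recipients never responding, nonresponse bias (a term from Chapter 1 above) is a serious concern.

```python
# Response rate for the Hite survey, using the figures from the text.
mailed = 100_000
returned = 4_500

response_rate = returned / mailed
print(f"response rate: {response_rate:.1%}")  # 4.5%
```

Women who felt strongly enough to complete a 127-question survey may differ systematically from those who did not respond, so the reported percentages may not describe American women in general.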
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS
7: MEDIA LIBRARY
Premium Videos
Core Concepts in Stats Video
· Probability and Hypothesis Testing
Lightboard Lecture Video
· Hypothesis Testing
Difficulty Scale
(don’t plan on going out tonight)
WHAT YOU WILL LEARN IN THIS CHAPTER
· Understanding the difference between a sample and a population
· Understanding the importance of the null and research hypotheses
· Using criteria to judge a good hypothesis
SO YOU WANT TO BE A SCIENTIST
You might have heard the term hypothesis used in other classes. You may even have had to formulate one for a research project you did for another class, or you may have read one or two in a journal article. If so, then you probably have a good idea what a hypothesis is. For those of you who are unfamiliar with this often-used term, a hypothesis is basically “an educated guess.” Its most important role is to reflect the general problem statement or question that was the motivation for asking the research question in the first place.
That’s why taking the care and time to formulate a really precise and clear research question is so important. This research question will guide your creation of a hypothesis, and in turn, the hypothesis will determine the techniques you will use to test it and answer the question that was originally asked.
So, a good hypothesis translates a problem statement or a research question into a format that makes it easier to examine. This format is called a hypothesis. We will talk about what makes a hypothesis a good one later in this chapter. Before that, let’s turn our attention to the difference between a sample and a population. This is an important distinction, because while hypotheses usually describe a population, hypothesis testing deals with a sample and then the results are generalized to the larger population. We also address the two main types of hypotheses (the null hypothesis and the research hypothesis). But first, let’s formally define some simple terms that we have used earlier in Statistics for People Who (Think They) Hate Statistics.
SAMPLES AND POPULATIONS
As a good scientist, you would like to be able to say that if Method A is better than Method B in your study, this is true forever and always and for all people in the universe, right? Indeed. And, if you do enough research on the relative merits of Methods A and B and test enough people, you may someday be able to say that.
But don’t get too excited, because it’s unlikely you will ever be able to speak with such confidence. It takes too much money ($$$) and too much time (all those people!) to do all that research, and besides, it’s not even necessary. Instead, you can just select a representative sample from the population and test your hypothesis about the relative merits of Methods A and B on that sample.
Given the constraints of never enough time and never enough research funds, with which almost all scientists live, the next best strategy is to take a portion of a lar.
BUS 308 Week 2 Lecture 1
Examining Differences - overview
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. The importance of random sampling.
2. The meaning of statistical significance.
3. The basic approach to determining statistical significance.
4. The meaning of the null and alternate hypothesis statements.
5. The hypothesis testing process.
6. The purpose of the F-test and the T-test.
Overview

Last week we collected clues and evidence to help us answer our case question about males and females getting equal pay for equal work. As we looked at the clues presented by the salary and compa-ratio measures of pay, things got a bit confusing, with results that did not seem to be consistent. We found, among other things, that the male and female compa-ratios were fairly close together, with the female mean being slightly larger. The salary analysis showed a different view; here we noticed that the averages were apparently quite different, with the males, on average, earning more. Contradictory findings such as this are not all that uncommon when examining data in the “real world.”

One issue that we could not fully address last week was how meaningful the differences were. That is, would a different sample have results that might be completely different, or can we be fairly sure that the observed differences are real and show up in the population as well? This issue, often referred to as sampling error, deals with the fact that random samples taken from a population will generally be a bit different from the actual population parameters, but will be “close” enough to the actual values to be valuable in decision making.

This week, our journey takes us to ways to explore differences, and how significant these differences are. Just as clues in mysteries are not all equally useful, not all differences are equally important; one of the best things statistics will do for us is tell us which differences we should pay attention to and which we can safely ignore.

Side note: this is a skill that many managers could benefit from. Not all differences in performance from one period to another are caused by intentional employee actions; some are due to random variations that employees have no control over. Knowing which differences to react to would make managers much more effective.

In keeping with our detective theme, this week could be considered the introduction of the crime scene experts who help detectives interpret what the physical evidence means and how it can relate to the crime being looked at. We are getting into the support being offered by experts who interpret details. We need to know how to use these experts to our fullest advantage. 😊
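The sampling-error idea discussed in the overview can be demonstrated with a quick simulation. The "population" of salaries below is entirely synthetic (the size, mean, and spread are assumptions for illustration); the point is that repeated random samples give means that vary, but cluster close to the population mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population of 10,000 salaries (illustrative numbers only).
population = rng.normal(loc=50_000, scale=8_000, size=10_000)
pop_mean = population.mean()

# Draw 500 random samples of 100 employees each and record each sample mean.
sample_means = [rng.choice(population, size=100, replace=False).mean()
                for _ in range(500)]

# Individual sample means differ from the population mean (sampling error),
# but their spread is much smaller than the spread of individual salaries.
spread = float(np.std(sample_means))
print(f"population mean: {pop_mean:.0f}")
print(f"sample means range: {min(sample_means):.0f} to {max(sample_means):.0f}")
print(f"std. dev. of sample means: {spread:.0f}")
```

The standard deviation of the sample means is roughly the population standard deviation divided by the square root of the sample size, which is why larger samples give more trustworthy estimates.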
Differences

In general, differences exist in virtually everything we measure that is man-made or influenced. The underlying issue in statistical analysis is that at times differences are important. When measu.
BUS 308 Week 2 Lecture 1
Examining Differences - overview
Expected Outcomes
After reading this lecture, the student should be familiar with:
1. The importance of random sampling.
2. The meaning of statistical significance.
3. The basic approach to determining statistical significance.
4. The meaning of the null and alternate hypothesis statements.
5. The hypothesis testing process.
6. The purpose of the F-test and the T-test.
Overview
Last week we collected clues and evidence to help us answer our case question about
males and females getting equal pay for equal work. As we looked at the clues presented by the
salary and comp-ratio measures of pay, things got a bit confusing with results that did not see to
be consistent. We found, among other things, that the male and female compa-ratios were fairly
close together with the female mean being slightly larger. The salary analysis showed a different
view; here we noticed that the averages were apparently quite different with the males, on
average, earning more. Contradictory findings such as this are not all that uncommon when
examining data in the “real world.”
One issue that we could not fully address last week was how meaningful the differences
were. That is, would a different sample have results that might be completely different, or
can we be fairly sure that the observed differences are real and show up in the population as
well? This issue, often referred to as sampling error, deals with the fact that random samples
taken from a population will generally be a bit different from the actual population parameters,
but will be “close” enough to the actual values to be valuable in decision making.
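Sampling error is easy to see with a quick simulation. The sketch below uses entirely made-up numbers (a hypothetical "salary" population) to show that each random sample's mean differs a bit from the population mean, yet stays close to it:

```python
# Illustrative sketch (hypothetical numbers): repeated random samples from the
# same population produce sample means that vary around the population mean.
import random

random.seed(42)
population = [random.gauss(50_000, 8_000) for _ in range(10_000)]  # "salaries"
pop_mean = sum(population) / len(population)

sample_means = []
for _ in range(5):
    sample = random.sample(population, 50)  # a random sample of 50 "employees"
    sample_means.append(sum(sample) / len(sample))

print(f"population mean: {pop_mean:,.0f}")
for m in sample_means:
    print(f"sample mean:     {m:,.0f}  (off by {m - pop_mean:+,.0f})")
```

Each sample mean lands near, but not exactly on, the population mean; that gap is sampling error.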
This week, our journey takes us to ways to explore differences, and how significant these
differences are. Just as clues in mysteries are not all equally useful, not all differences are
equally important; and one of the best things statistics will do for us is tell us what differences
we should pay attention to and what we can safely ignore.
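The workhorse tool for deciding which differences deserve attention is a significance test. Here is a minimal sketch of a two-sample t-test on hypothetical salary data (the numbers and the use of SciPy are my own illustration, not part of the course materials):

```python
# Hypothetical two-sample t-test: is the difference between two group means
# bigger than random sampling variation would explain? (Numbers are made up.)
from scipy import stats

group_a = [52, 61, 58, 66, 71, 55, 63, 60, 68, 59]  # salaries in $000s
group_b = [48, 55, 51, 57, 60, 49, 54, 52, 58, 50]

t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 0.05 level.")
else:
    print("Difference could plausibly be due to sampling error.")
```

A small p-value says the observed gap would rarely arise from sampling error alone, so it is a difference worth paying attention to.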
Side note: this is a skill that many managers could benefit from. Not all differences in
performance from one period to another are caused by intentional employee actions; some are
due to random variations that employees have no control over. Knowing which differences to
react to would make managers much more effective.
In keeping with our detective theme, this week could be considered the introduction of
the crime scene experts who help detectives interpret what the physical evidence means and how
it can relate to the crime being looked at. We are getting into the support being offered by
experts who interpret details. We need to know how to use these experts to our fullest
advantage. 😊😊
Differences
In general, differences exist in virtually everything we measure that is man-made or
influenced. The underlying issue in statistical analysis is that at times differences are important.
When measu.
Statistics is a powerful tool for both researchers and decision makers; yet there remain many misuses, misinterpretations, and misrepresentations of statistics. This seminar aims at raising awareness of common misconceptions about statistics in the social sciences and beyond (e.g., in the media and among readers). I do not own the copyrights to the materials in this presentation; the sources are given at the bottom of each slide on which I borrowed figures from other sources.
LEARNING OBJECTIVES
· Explain how researchers use inferential statistics to evaluate sample data.
· Distinguish between the null hypothesis and the research hypothesis.
· Discuss probability in statistical inference, including the meaning of statistical significance.
· Describe the t test and explain the difference between one-tailed and two-tailed tests.
· Describe the F test, including systematic variance and error variance.
· Describe what a confidence interval tells you about your data.
· Distinguish between Type I and Type II errors.
· Discuss the factors that influence the probability of a Type II error.
· Discuss the reasons a researcher may obtain nonsignificant results.
· Define power of a statistical test.
· Describe the criteria for selecting an appropriate statistical test.
IN THE PREVIOUS CHAPTER, WE EXAMINED WAYS OF DESCRIBING THE RESULTS OF A STUDY USING DESCRIPTIVE STATISTICS AND A VARIETY OF GRAPHING TECHNIQUES. In addition to descriptive statistics, researchers use inferential statistics to draw more general conclusions about their data. In short, inferential statistics allow researchers to (a) assess just how confident they are that their results reflect what is true in the larger population and (b) assess the likelihood that their findings would still occur if their study was repeated over and over. In this chapter, we examine methods for doing so.
SAMPLES AND POPULATIONS
Inferential statistics are necessary because the results of a given study are based only on data obtained from a single sample of research participants. Researchers rarely, if ever, study entire populations; their findings are based on sample data. In addition to describing the sample data, we want to make statements about populations. Would the results hold up if the experiment were conducted repeatedly, each time with a new sample?
In the hypothetical experiment described in Chapter 12 (see Table 12.1), mean aggression scores were obtained in model and no-model conditions. These means are different: Children who observe an aggressive model subsequently behave more aggressively than children who do not see the model. Inferential statistics are used to determine whether the results match what would happen if we were to conduct the experiment again and again with multiple samples. In essence, we are asking whether we can infer that the difference in the sample means shown in Table 12.1 reflects a true difference in the population means.
Recall our discussion of this issue in Chapter 7 on the topic of survey data. A sample of people in your state might tell you that 57% prefer the Democratic candidate for an office and that 43% favor the Republican candidate. The report then says that these results are accurate to within 3 percentage points, with a 95% confidence level. This means that the researchers are very (95%) confident that, if they were able to study the entire population rather than a sample, the actual percentage who preferred th ...
What aspects of personality does this tell me about?
There has been much research on how people describe others, and five major dimensions of human personality have been found. They are often referred to as the OCEAN model of personality,
because of the acronym from the names of the five dimensions. Here are your results:
Open-Mindedness
High scorers tend to be original, creative, curious, complex; Low scorers tend to be conventional, down to earth, narrow interests, uncreative.
You typically don't seek out new experiences.
(Your percentile: 54)
Conscientiousness
High scorers tend to be reliable, well-organized, self-disciplined, careful; Low scorers tend to be disorganized, undependable, negligent.
You are very well-organized, and can be relied upon.
(Your percentile: 98)
Extraversion
High scorers tend to be sociable, friendly, fun loving, talkative; Low scorers tend to be introverted, reserved, inhibited, quiet.
You are relatively social and enjoy the company of others.
(Your percentile: 73)
Agreeableness
High scorers tend to be good natured, sympathetic, forgiving, courteous; Low scorers tend to be critical, rude, harsh, callous.
You tend to consider the feelings of others.
(Your percentile: 68)
Negative Emotionality
High scorers tend to be nervous, high-strung, insecure, worrying; Low scorers tend to be calm, relaxed, secure, hardy.
You are generally relaxed.
(Your percentile: 22)
What is the “Big Five”?
Personality psychologists are interested in what differentiates one person from another and why we behave the way that we do. Personality research, like any science, relies on quantifiable concrete data which can be used to examine what people are like. This is where the Big Five plays an important role.
The Big Five was originally derived in the 1970's by two independent research teams -- Paul Costa and Robert McCrae (at the National Institutes of Health), and Warren Norman (at the University of Michigan)/Lewis Goldberg (at the University of Oregon) -- who took slightly different routes in arriving at the same results: most human personality traits can be boiled down to five broad dimensions of personality, regardless of language or culture. These five dimensions were derived by asking thousands of people hundreds of questions and then analyzing the data with a statistical procedure known as factor analysis. It is important to realize that the researchers did not set out to find five dimensions, but that five dimensions emerged from their analyses of the data. In scientific circles, the Big Five is now the most widely accepted and used model of personality.
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docxannettsparrow
Cross-Cultural Psychology
Chapter 2
Methodology of Cross-Cultural Research
A blind man who sees is better than a seeing man who is blind.
Persian Proverb
Never believe on faith, see for yourself! What you yourself don’t learn, you don’t know.
Bertolt Brecht (1898–1956)—
Twentieth-Century German Playwright
Shiraev/Levy Cross-Cultural Psychology 5/e
Goals of Cross-Cultural Research
Imagine a researcher who wants to find similarities and differences between arranged marriages practiced in India and non-arranged marriages in the United States, and how they affect marital stability. What does the psychologist aim to pursue in this particular project?
First, the researcher wants to describe the findings of this research.
Then, when some differences between ethnic groups are found, the researcher tries to explain whether these factors affect stability.
The practical value of the study may be significant if it not only explains but also predicts the factors that should determine successful marital relationships in both studied groups.
Love marriages are like hot soup that cools over time; arranged marriages are like cold soup that warms up.
-Outsourced
“There is never a time or place for true love. It happens accidentally, in a heartbeat, in a single flashing, throbbing moment.”
― Sarah Dessen, The Truth About Forever
Different cultures and even people within these cultures have different perspectives on love and marriage.
Factors that Affect Marital Stability
What we aim to do as cultural psychologists is to describe, explain, and predict behavior.
Two strategies in cross-cultural research
Application-Oriented Strategy
Comparativist Strategy
The application-oriented strategy attempts to establish whether research findings obtained in one country apply to the culture of another. The comparativist strategy tries to find similarities and differences in a sampling of cultures.
Equivalence indicates evidence that the methods selected for the study measure the same phenomenon across the cultures chosen for the study.
Method A is used to study anxiety in France and Italy
Method B is used to study anxiety in India and Pakistan
The results will likely be incomparable due to the equivalence problem.
Consider a study that measures anxiety using a self-report survey in France versus a study that uses observation of a population and measures the number of anxiety-inducing instances in an Indian population. While they may attempt to measure the sa.
IMRaD format
An acronym for Introduction, Method, Results, and Discussion. The IMRaD format is a way of structuring a scientific article. It is often used in health care and the natural sciences. Unlike theses in the social sciences, the IMRaD format does not include a separate theory chapter.
This is a modified version of Master Class that Dr Siobhan O'Dwyer delivered at the Griffith University School of Nursing's Annual Research School for postgraduate students.
FOUR ASSUMPTIONS RESEARCHERS SHOULD TEST
Four Assumptions of Multiple Regression that Researchers Should Always Test
A Reference Paper Review
Jasmine K. Tamanaha
University of North Carolina – Charlotte
Author’s Note
This paper was prepared for Course Project, STAT 4123/5123 Applied Statistics I, taught by Dr. Shaoyu Li.
Abstract
We live in a world where results are key and numbers answer questions
and solidify answers. How many times have you thought to yourself, “show me
the numbers”? Even as a numbers person, I oftentimes find myself asking or
thinking the same thing; however, I also like to dig a little deeper and ask the
follow-up questions that never seem to get asked or answered: “WHERE did you
get your numbers?” and, likewise, “HOW did you come to that conclusion?” This
review of a reference paper speaks to those types of questions: it responds to
how faulty the numbers can be, to the four assumptions that the practicing researcher
needs to take into account (Osborne and Waters, 2002), to how to test these four
assumptions, and to how pertinent this information is to data analysis, more
specifically analysis in the social sciences. “If any of these assumptions is
violated… then the forecasts, confidence intervals, and scientific insights yielded
by a regression model may be (at best) inefficient or (at worst) seriously biased or
misleading” (Roberts, 2014).
“Essentially, all models are wrong, but some are useful” (Box, 1987). This may
be one of the most analyzed and discussed quotes among analysts. The first time I
heard it was in my Applied Statistics I class taught by Dr. Li, and the quote really
resonated with me. I investigated further, and it was not hard to find. After typing
bits and pieces of the quote into Google, it quickly auto-filled, and immediately my
page was flooded. Suddenly I was inundated with information about George E. P. Box,
questions and discussions of “what does this mean,” and much more. Personally, I have
since re-quoted this many times, particularly whenever somebody wants to talk numbers.
The further you progress in your statistical studies, the more you come to realize
that numbers are not as reliable as you were originally taught in grade school.
Osborne and Waters do a remarkable job in Four Assumptions of Multiple Regression
That Researchers Should Always Test (2002) of bringing some issues to light, in
particular highlighting four assumptions that fellow researchers and analysts need to
acknowledge:
1) Normality Assumptions
2) Linearity Assumptions
3) Reliability of Measurement Assumptions
4) Homoscedasticity Assumptions
Awareness and understanding of the importance of checking these assumptions in regression analysis should be, and needs to be, general knowledge.
Regression analysis assumes that the data variables have a normal distribution, but
what about cases of non-normality? Most people know that non-normality exists, and if
the name does not ring a bell, words like “outlier” and “skewed” are most definitely key
buzzwords that everyone has either used or heard. In statistics, substantial outliers and
highly skewed variables can completely change the relationships in the data, as well as
the results of significance tests. In statistics you learn many ways to spot non-normality,
such as normality plots, Q-Q plots, and Kolmogorov-Smirnov tests, to name a few. As a
result of finding non-normality, we are taught about “data cleaning” and using
transformations. However, by removing outliers we may be deleting key information that
may or may not be relevant to the test at hand; by adding more data we can put the data
at high risk of multicollinearity and Type I or Type II errors; and by doing
transformations we may be complicating the interpretation of the results. Basically, we
have learned ways to improve normality, and maybe even accuracy, but at what cost?
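As a rough illustration of the checks named above, here is a minimal sketch on simulated (hypothetical) data, using SciPy's `skew` and `shapiro` functions; a right-skewed variable is flagged while a roughly normal one is not:

```python
# Sketch: two quick normality checks -- a skewness measure and the Shapiro-Wilk
# test (a common companion to Kolmogorov-Smirnov for modest sample sizes).
# The data here are simulated for illustration only.
import random
from scipy import stats

random.seed(1)
normal_ish = [random.gauss(0, 1) for _ in range(200)]
skewed = [random.expovariate(1.0) for _ in range(200)]  # right-skewed

for name, data in [("normal-ish", normal_ish), ("skewed", skewed)]:
    skew = stats.skew(data)
    w, p = stats.shapiro(data)
    # p < 0.05 suggests rejecting the hypothesis that the data are normal
    print(f"{name:>10}: skewness = {skew:+.2f}, Shapiro-Wilk p = {p:.4f}")
```

The skewed sample shows large positive skewness and a tiny p-value, exactly the kind of signal that prompts the "data cleaning" and transformation decisions discussed here.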
Changing data has always been a topic of curiosity for me, because solely for analytical
purposes I have my ideal goals for meeting basic requirements such as p-values, z-tests,
t-tests, adjusted R-squared, the F-statistic, and the list goes on. Conversely, it makes
me want to shout, “YOU ARE STILL CHANGING DATA.” How am I supposed to trust any statistics
regurgitated by news anchors, salesmen, and advertisements without knowing the steps that
were taken to support their “90% Accuracy” or “5.8% Unemployment Drop”?
Overall, we assume that there is even a relationship between the dependent and
independent variables, and multiple regression can only accurately estimate the
relationship between these variables if the relationships are linear in
nature (Osborne & Waters, 2002). This presents the
question, “What about the social sciences?” Non-linear relationships commonly occur in the
social sciences, of which Osborne has in-depth working knowledge, particularly
in psychology and education. In the presence of non-linearity, the results will typically
underestimate the true relationship between the independent and dependent variables. Osborne
and Waters (2002), Pedhazur (1997), Cohen and Cohen (1983), and Berry and Feldman
(1985) discuss or suggest three primary ways to detect non-linearity (Osborne & Waters,
2002). The first is the use of theory, or using past analyses to educate oneself as well as
to supplement the current analysis. The second is examining residual plots, which are easily
and readily accessible. The third is detecting curvilinearity by adding squared or cubed terms.
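The squared-term check can be sketched in a few lines. This illustrative example (simulated data; the variable names and numbers are my own, not from the paper) fits y on x alone, then on x and x², and compares how much variance each model explains:

```python
# Sketch of the "squared term" check for curvilinearity: if adding x² to the
# model soaks up variance that a straight line misses, the relationship is
# probably not linear. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = 2 + 0.5 * x + 1.5 * x**2 + rng.normal(0, 1, 200)  # truly curvilinear

def r_squared(design, y):
    """R-squared of an ordinary least-squares fit on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1 - resid.var() / y.var()

ones = np.ones_like(x)
r2_linear = r_squared(np.column_stack([ones, x]), y)
r2_quad = r_squared(np.column_stack([ones, x, x**2]), y)
print(f"R² linear: {r2_linear:.3f}, R² with x² term: {r2_quad:.3f}")
```

The jump in R² when the squared term enters is the tell-tale sign of curvilinearity that the purely linear model would have underestimated.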
The three “primary” methods for detecting non-linearity are not fail-safe, and they still
pose many concerns, especially for the social sciences. Logically, the social sciences
have many variables that are not directly measurable. How exactly do you measure
your stress and anxiety levels? Unfortunately, humans were not built with gauges displaying
our bodies’ levels, although we have come up with ways to use observable factors to test our
stress levels. Those factors are obviously important to measurement, but there is often a
clear correlation among the factors, which again can lead to underestimation or
overestimation, all based on unreliable measurements. Every statistician’s goal is to
accurately model the “real” relationship; that is where Cronbach’s alpha comes into play,
mainly in the world of social-science analyses. Error estimates and reliability estimates
are just that, estimates, and are oftentimes assumed to be acceptable. There are
accepted methods for dealing with low reliability in both simple and multiple regression.
Analysts, be aware: even small correlations can change your R-squared when correcting for
low reliability; in making adjustments you may also change the magnitude or even the
direction of relationships; and the most dramatic changes occur when the covariate has a
substantial relationship with the other variables.
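Cronbach's alpha itself is straightforward to compute from item scores. The following minimal sketch uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score); the 4-item questionnaire data are hypothetical:

```python
# Minimal Cronbach's alpha sketch: a measure of internal-consistency
# reliability for a multi-item scale. The scores below are invented.
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array-like, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five respondents answering a hypothetical 4-item stress questionnaire (1-5 scale)
scores = [[4, 5, 4, 4],
          [2, 2, 3, 2],
          [5, 4, 5, 5],
          [3, 3, 2, 3],
          [1, 2, 1, 2]]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")
```

Items that move together across respondents, as these do, give a high alpha; weakly related items would drag it down, signaling unreliable measurement.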
Even the simplest of changes can cause a chain reaction of changes, which may even
change what your data was trying to say in the first place. In discussing unreliable
measurements, I also mentioned error estimates. What happens if the variance
of the errors is the same across all levels of the independent variables? This is
called homoscedasticity, and its opposite is heteroscedasticity. When
heteroscedasticity is very obvious, it can lead to serious distortions in your analysis,
which can certainly “weaken” it. Again, in weakening the analysis you will
run into overestimation errors. We can use our handy residual plots to check for it.
Visually, heteroscedasticity may look like a bow tie or even a fan, whereas we
want even randomness around 0 in our residuals. The fan shape can show up in the
Goldfeld-Quandt test, indicating that the error term either increases or decreases consistently
as the value of an explanatory variable increases, while in the Glejser test we recognize the
bow-tie shape from the error term having a small variance centrally and a larger
variance at the extreme points. Transformation may be helpful to reduce
heteroscedasticity.
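The Goldfeld-Quandt idea (compare residual variance at low versus high values of a predictor) can be sketched by hand. This illustrative example simulates fan-shaped errors; all names and numbers are my own, not from the paper:

```python
# Hand-rolled Goldfeld-Quandt-style check (illustrative only): fit a line,
# then compare residual variance in the low vs high segments of x. A large
# ratio suggests heteroscedasticity -- the "fan" shape in a residual plot.
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(1, 10, 300))
y = 2 + 3 * x + rng.normal(0, 1, 300) * x  # error spread grows with x: a fan

# Fit a simple line and compute residuals
beta = np.polyfit(x, y, 1)
resid = y - np.polyval(beta, x)

# Compare residual variance in the lower and upper thirds of x
low, high = resid[:100], resid[-100:]
f_ratio = high.var(ddof=1) / low.var(ddof=1)
print(f"residual variance ratio (high/low): {f_ratio:.1f}")
```

A ratio near 1 would be consistent with homoscedasticity; the large ratio here reflects the consistently growing error term that the Goldfeld-Quandt test is designed to detect.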
As one can see, there is no quick fix or remedy without potential
consequences, but not making alterations may have consequences as well; it is very
much a catch-22, which is when one may decide to go about one’s research and
analysis differently. Osborne and Waters’ main goal in the article was to raise
awareness of the importance of checking assumptions in simple and multiple regression
(2002), and to show that the four assumptions given can be checked and dealt with
relatively easily, which has important benefits. As Osborne and Waters also state in their
introduction, “Most statistical tests rely upon certain assumptions about the variables
used in the analysis.” So it is our duty as researchers and analysts to recognize situations
that cause serious bias, to familiarize ourselves with violations even when they may have
little effect, and to identify when the violations of these four assumptions, and many others,
are essential to meaningful data analysis (Pedhazur, 1997, p. 33). We have a serious
situation: we have a rich literature in education and social science, but we are forced to
call into question the validity of many of its results, conclusions, and assertions, as we
have no idea whether the assumptions of the statistical tests were met (Osborne).
References
Osborne, J. W., & Waters, E. (2002). Four assumptions of multiple regression that researchers should always test. Practical Assessment, Research & Evaluation, 8(2). North Carolina State University and University of Oklahoma.
Box, G. E. P., & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces, p. 424. Wiley. ISBN 0471810339.
Roberts, K. (2014). Global Warming: Utah's Future Threatens Hotter Temps, Longer and More Severe Droughts. Department of Decision Sciences, Duke University: The Fuqua School of Business. Updated 1 Dec. 2014. Web.
Berry, W. D., & Feldman, S. (1985). Multiple Regression in Practice (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-050). Newbury Park, CA: Sage.
Cohen, J., & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Nunnally, J. C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill.
Osborne, J. W. (2001). A new look at outliers and fringeliers: Their effects on statistic accuracy and Type I and Type II error rates. Unpublished manuscript, Department of Educational Research and Leadership and Counselor Education, North Carolina State University.
Osborne, J. W., Christensen, W. R., & Gunter, J. (2001, April). Educational psychology from a statistician's perspective: A review of the power and goodness of educational psychology research. Paper presented at the national meeting of the American Education Research Association (AERA), Seattle, WA.
Pedhazur, E. J. (1997). Multiple Regression in Behavioral Research (3rd ed.). Orlando, FL: Harcourt Brace.
Tabachnick, B. G., & Fidell, L. S. (1996). Using Multivariate Statistics (3rd ed.). New York: HarperCollins College Publishers.
Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.). Needham Heights, MA: Allyn and Bacon.