FOUR ASSUMPTIONS RESEARCHERS SHOULD TEST

Four Assumptions of Multiple Regression That Researchers Should Always Test

A Reference Paper Review
Jasmine K. Tamanaha
University of North Carolina – Charlotte

Author’s Note
This paper was prepared for Course Project, STAT 4123/5123 Applied Statistics
I, taught by Dr. Shaoyu Li.
  
Abstract
We live in a world where results are key and numbers answer questions and solidify
answers. How many times have you thought to yourself, “Show me the numbers”? Even as
a numbers person I often find myself asking or thinking the same thing. However, I
also like to dig a little deeper and ask the follow-up questions that never seem to
get asked or answered: “WHERE did you get your numbers?” and “HOW did you come to
that conclusion?” This review of a reference paper speaks to those types of
questions. It addresses how faulty the numbers can be, the four assumptions that the
practicing researcher (Osborne and Waters, 2002) needs to take into account, how to
test these four assumptions, and how pertinent this information is to data analysis,
and more specifically to analysis in the social sciences. If any of these assumptions
is violated…then the forecasts, confidence intervals, and scientific insights yielded
by a regression model may be (at best) inefficient or (at worst) seriously biased or
misleading (Roberts, 2014).
  
“Essentially, all models are wrong, but some are useful” (Box and Draper, 1987). This
may be one of the most analyzed and discussed quotes among analysts. The first time I
heard it was in my Applied Statistics I class taught by Dr. Li, and it really
resonated with me. I investigated further, and the quote was not hard to find. After
I typed bits and pieces of it into Google, the search box quickly autofilled and my
page was immediately flooded. Suddenly I was inundated with information about
George E. P. Box, questions and discussions of “what does this mean”, and much more.
Personally, I have since re-quoted it many times, particularly whenever somebody
wants to talk numbers. The further you progress in your statistical studies, the more
you come to realize that numbers are not as reliable as you were taught in grade
school. Osborne and Waters do a remarkable job in Four Assumptions of Multiple
Regression That Researchers Should Always Test (2002) of bringing some issues to
light, in particular by highlighting four assumptions that fellow researchers and
analysts need to acknowledge:
1) Normality Assumptions
2) Linearity Assumptions
3) Reliability of Measurement Assumptions
4) Homoscedasticity Assumptions
Awareness and understanding of the importance of checking these assumptions in
regression analysis should be, and needs to be, general knowledge.
	
Regression analysis assumes that variables have normal distributions, but what about
cases of non-normality? Most people know that non-normality exists, and if the name
does not ring a bell, words like “outlier” and “skewed” are buzzwords that nearly
everyone has used or heard. Substantial outliers and highly skewed variables can
completely change the relationships in the data as well as the significance tests. In
statistics you learn many ways to spot non-normality, such as normality plots, Q-Q
plots, and Kolmogorov-Smirnov tests, to name a few. Once we find non-normality, we
are taught about “data cleaning” and using transformations. However, by removing
outliers we may be deleting key information that may or may not be relevant to the
test at hand; by adding more data we can put the analysis at high risk of
multicollinearity and of Type I or Type II errors; and by transforming variables we
may complicate the interpretation of the results. Basically, we have learned ways to
improve normality and maybe even accuracy, but at what cost?
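To make the idea of spotting non-normality concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from Osborne and Waters: the function names and both data sets are hypothetical. It computes a moment-based skewness and a Kolmogorov-Smirnov style distance between the sample and a normal curve fitted to it. Because the mean and standard deviation are estimated from the same sample, the distance is best read as a descriptive number rather than a formal test.

```python
import math
import statistics


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, positive for a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    gap = 0.0
    for i, v in enumerate(sorted(values)):
        cdf = normal_cdf(v, mean, sd)
        # The empirical CDF jumps from i/n to (i+1)/n at each sorted point.
        gap = max(gap, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return gap


# Hypothetical illustration data: one symmetric variable, one with a large outlier.
symmetric = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
with_outlier = [1.0, 1.2, 1.1, 0.9, 1.3, 1.0, 1.1, 0.8, 9.0]

for name, data in (("symmetric", symmetric), ("with_outlier", with_outlier)):
    print(name, round(sample_skewness(data), 3), round(ks_statistic(data), 3))
```

The single outlier is enough to push both numbers well away from the symmetric case, which is the point made above: one extreme value can change the picture of the whole variable.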
  
	
   	
  
	
Changing data has always been a topic of curiosity for me because, purely for
analytical purposes, I have my ideal goals for meeting basic requirements such as
p-values, z-tests, t-tests, adjusted R-squared, the F-statistic, and the list goes on.
Even so, it makes me want to shout “YOU ARE STILL CHANGING DATA”. How am I supposed
to trust any statistics regurgitated by news anchors, salesmen, and advertisements
without knowing the steps that were taken to support their “90% Accuracy” or “5.8%
Unemployment Drop”?
  	
Overall we assume that there is even a relationship between the dependent and
independent variables, and multiple regression can only accurately estimate the
relationship between these variables if the relationships are linear in nature
(Osborne and Waters, 2002). This raises the question, “What about the social
sciences?” Non-linear relationships commonly occur in the social sciences,
particularly psychology and education, of which Osborne has in-depth working
knowledge. When the relationship is non-linear, the results will typically
underestimate the true relationship between the independent and dependent variables.
Pedhazur (1997), Cohen and Cohen (1983), and Berry and Feldman (1985) discuss or
suggest three primary ways to detect non-linearity (Osborne and Waters, 2002). The
first is the use of theory, or of past analyses, to educate oneself and to supplement
the current analysis. The second is to examine residual plots, which are easy to
produce and readily available. The third is to detect curvilinearity by adding
squared or cubed terms to the model.
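The third approach can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the helper names and the inverted-U data are made up, and the fit uses the textbook normal equations rather than any particular statistics package. The idea is to fit the model once with a straight line and once with a squared term added, then compare how much variance each one explains.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns b0..bd."""
    size = degree + 1
    # Build X'X and X'y for a design matrix with columns 1, x, x^2, ...
    a = [[sum(xi ** (r + c) for xi in x) for c in range(size)] for r in range(size)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, size))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def r_squared(x, y, coeffs):
    """Share of the variance in y explained by the fitted polynomial."""
    mean_y = sum(y) / len(y)
    predicted = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical inverted-U data, e.g. performance rising and then falling with arousal.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [4.3, 10.8, 16.1, 18.6, 20.2, 18.9, 16.4, 10.7, 4.0]

r2_linear = r_squared(x, y, fit_polynomial(x, y, 1))
r2_quadratic = r_squared(x, y, fit_polynomial(x, y, 2))
print(f"linear R^2 = {r2_linear:.3f}, with squared term R^2 = {r2_quadratic:.3f}")
```

A large jump in R-squared once the squared term is added suggests the straight-line model was underestimating the relationship, which is exactly the risk described above.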

The three “primary” methods for detecting non-linearity are not fail-safe, and they
still pose many concerns, especially for the social sciences. The social sciences
have many variables that are not readily measurable. How exactly do you measure your
stress and anxiety levels? Unfortunately, humans were not built with charts detailing
our bodies’ levels, although we have come up with ways to use factors to gauge our
stress levels. Those many factors are obviously important to measurement, but there
is a very clear correlation among them, which again can lead to underestimation or
overestimation, all based on unreliable measurements. Every statistician’s goal is to
accurately model the “real” relationship, and that is where Cronbach’s alpha comes
into play, mainly in social science analyses. Error estimates and reliability
estimates are just that, estimates, and they are often simply assumed to be
acceptable. There are accepted methods for dealing with reliability in both simple
and multiple regression. Analysts, be aware: even small correlations can change your
R-squared when you correct for low reliability; in making adjustments you may also
change the magnitude or even the direction of relationships; and the most dramatic
changes occur when the covariate has a substantial relationship with the other
variables.
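Since Cronbach’s alpha carries so much weight here, a small sketch of the standard formula may help. It assumes k items on a scale, with alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The function name and the five respondents’ answers are hypothetical and only serve as an illustration.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])  # number of items on the scale
    item_variances = [statistics.variance([row[i] for row in rows]) for i in range(k)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five respondents to a three-item stress scale (1 to 5).
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 3],
    [4, 4, 5],
    [1, 2, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Items that rise and fall together across respondents push alpha toward 1, while items that behave independently pull it toward 0. It remains an estimate of reliability, not a guarantee.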

Even the simplest of changes can set off a chain reaction of changes, which may even
alter what your data was trying to say in the first place. While discussing
unreliable reliability measurements, I also mentioned error estimates. What happens
if the variance of the errors is the same across all levels of the independent
variables? This is called homoscedasticity, and the opposite is heteroscedasticity.
When heteroscedasticity is marked, it can lead to serious distortions in your results
and can certainly “weaken” the analysis. Again, with a weakened analysis you will run
into overestimation errors. We can use our handy residual plots to check for it.
Visually, heteroscedasticity may look like a bow tie or a fan, whereas what we want
is an even, random scatter of residuals around 0. The fan shape can be tested with
the Goldfeld-Quandt test, which applies when the error term either increases or
decreases consistently as the value of the dependent variable increases; the bow-tie
shape can be tested with the Glejser test, which applies when the error term has a
small variance centrally and a larger variance at the extreme points. Transformation
may help reduce heteroscedasticity.
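The logic of the Goldfeld-Quandt check can be sketched without any statistics package. This is a simplified illustration under my own assumptions: one predictor, made-up data, hypothetical function names, and no p-value, since that would require the F distribution. The observations are sorted by the predictor, the middle ones are dropped, a line is fitted separately to the low and high groups, and the two residual sums of squares are compared.

```python
def residual_sum_of_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt_ratio(x, y, drop_fraction=0.2):
    """Residual variance of the high-x group divided by that of the low-x group."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    dropped = int(n * drop_fraction)  # middle observations left out of both groups
    group = (n - dropped) // 2
    low, high = pairs[:group], pairs[n - group:]
    rss_low = residual_sum_of_squares(*zip(*low))
    rss_high = residual_sum_of_squares(*zip(*high))
    # Both groups have the same size and the same number of fitted parameters,
    # so the degrees of freedom cancel and the ratio of sums of squares suffices.
    return rss_high / rss_low


# Hypothetical data: the same trend with constant errors and with fan-shaped errors.
signs = [1, -1, -1, 1]
x = [float(i) for i in range(1, 21)]
steady = [2 + 0.5 * xi + 0.5 * signs[i % 4] for i, xi in enumerate(x)]
fanned = [2 + 0.5 * xi + 0.1 * xi * signs[i % 4] for i, xi in enumerate(x)]

print("constant spread:", round(goldfeld_quandt_ratio(x, steady), 2))
print("fan-shaped spread:", round(goldfeld_quandt_ratio(x, fanned), 2))
```

A ratio near 1 is what homoscedasticity looks like, while a ratio far above 1 matches the fan shape in a residual plot. In a real analysis the ratio would be compared against an F distribution before drawing a conclusion.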

As one can see, there is no quick fix or remedy without potential consequences, but
not making alterations may have consequences as well. It is very much a catch-22
situation, and it is at this point that one may decide to go about their research and
analysis differently. Osborne and Waters’ main goal in the article was to raise
awareness of the importance of checking assumptions in simple and multiple regression
(2002) and to show that the four assumptions given can be checked and dealt with
easily, which seems to have important benefits. As Osborne and Waters state in their
introduction, “Most statistical tests rely upon certain assumptions about the
variables used in the analysis.” So it is our duty as researchers and analysts to
recognize the situations in which violations of these four assumptions, and many
others, cause serious bias and those in which they have little effect, because that
knowledge is essential to meaningful data analysis (Pedhazur, 1997, p. 33). We have a
serious situation in which we have a rich literature in education and social science,
but we are forced to call into question the validity of many of these results,
conclusions, and assertions, as we have no idea whether the assumptions of the
statistical tests were met (Osborne and Waters, 2002).
	
  
  
References

Osborne, J. W., & Waters, E. (2002). Four assumptions of multiple regression that
researchers should always test. Practical Assessment, Research & Evaluation, 8(2).
North Carolina State University and University of Oklahoma.

Box, G. E. P., & Draper, N. R. (1987). Empirical Model-Building and Response
Surfaces, p. 424. Wiley. ISBN 0471810339.

Roberts, K. Global Warming: Utah's Future Threatens Hotter Temps, Longer and
More Severe Droughts. Department of Decision Sciences. Duke University: The
Fuqua School of Business, Updated 1 Dec. 2014. Web. 2014.

Berry, W. D., & Feldman, S. (1985). Multiple Regression in Practice (Sage
University Paper Series on Quantitative Applications in the Social Sciences, series
no. 07-050). Newbury Park, CA: Sage.

Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis
for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Nunnally, J. C. (1978). Psychometric Theory (2nd ed.). New York: McGraw Hill.

Osborne, J. W. (2001). A new look at outliers and fringeliers: Their effects on
statistic accuracy and Type I and Type II error rates. Unpublished manuscript,
Department of Educational Research and Leadership and Counselor Education,
North Carolina State University.

Osborne, J. W., Christensen, W. R., & Gunter, J. (April, 2001). Educational
Psychology from a Statistician’s Perspective: A Review of the Power and Goodness
of Educational Psychology Research. Paper presented at the national meeting of
the American Education Research Association (AERA), Seattle, WA.

Pedhazur, E. J. (1997). Multiple Regression in Behavioral Research (3rd ed.).
Orlando, FL: Harcourt Brace.

Tabachnick, B. G., & Fidell, L. S. (1996). Using Multivariate Statistics (3rd ed.).
New York: Harper Collins College Publishers.

Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.).
Needham Heights, MA: Allyn and Bacon.
FOUR	
  ASSUMPTIONS	
  RESEARCHERS	
  SHOULD	
  TEST	
   	
  
	
   	
   	
   	
  
8	
  
	
  

More Related Content

What's hot

Mayo & parker spsp 2016 june 16
Mayo & parker   spsp 2016 june 16Mayo & parker   spsp 2016 june 16
Mayo & parker spsp 2016 june 16
jemille6
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
jemille6
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
jemille6
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correction
jemille6
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaper
jemille6
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talk
jemille6
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
jemille6
 
Analysing problems creatively final
Analysing problems creatively finalAnalysing problems creatively final
Analysing problems creatively finalZain Shaikh
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
jemille6
 
Short talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibilityShort talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibility
Dorothy Bishop
 
Chapter 3 part1-Design of Experiments
Chapter 3 part1-Design of ExperimentsChapter 3 part1-Design of Experiments
Chapter 3 part1-Design of Experiments
nszakir
 
On p-values
On p-valuesOn p-values
On p-values
Maarten van Smeden
 
Insights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibilityInsights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibility
Dorothy Bishop
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
jemille6
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayo
jemille6
 
Hypothesis testing, error and bias
Hypothesis testing, error and biasHypothesis testing, error and bias
Hypothesis testing, error and bias
Dr.Jatin Chhaya
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
jemille6
 
Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2
John Labrador
 
Hypothesis types, formulation, and testing
Hypothesis types, formulation, and testingHypothesis types, formulation, and testing
Hypothesis types, formulation, and testing
Aneesa Ch
 

What's hot (20)

Mayo &amp; parker spsp 2016 june 16
Mayo &amp; parker   spsp 2016 june 16Mayo &amp; parker   spsp 2016 june 16
Mayo &amp; parker spsp 2016 june 16
 
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
Surrogate Science: How Fisher, Neyman-Pearson, and Bayes Were Transformed int...
 
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
Yoav Benjamini, "In the world beyond p<.05: When & How to use P<.0499..."
 
Severe Testing: The Key to Error Correction
Severe Testing: The Key to Error CorrectionSevere Testing: The Key to Error Correction
Severe Testing: The Key to Error Correction
 
Feb21 mayobostonpaper
Feb21 mayobostonpaperFeb21 mayobostonpaper
Feb21 mayobostonpaper
 
Final mayo's aps_talk
Final mayo's aps_talkFinal mayo's aps_talk
Final mayo's aps_talk
 
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist PerformanceProbing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance
 
Analysing problems creatively final
Analysing problems creatively finalAnalysing problems creatively final
Analysing problems creatively final
 
Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma Statistical Flukes, the Higgs Discovery, and 5 Sigma
Statistical Flukes, the Higgs Discovery, and 5 Sigma
 
Articulo 50 palabras
Articulo 50 palabras Articulo 50 palabras
Articulo 50 palabras
 
Short talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibilityShort talk on 2 cognitive biases and reproducibility
Short talk on 2 cognitive biases and reproducibility
 
Chapter 3 part1-Design of Experiments
Chapter 3 part1-Design of ExperimentsChapter 3 part1-Design of Experiments
Chapter 3 part1-Design of Experiments
 
On p-values
On p-valuesOn p-values
On p-values
 
Insights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibilityInsights from psychology on lack of reproducibility
Insights from psychology on lack of reproducibility
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
 
April 3 2014 slides mayo
April 3 2014 slides mayoApril 3 2014 slides mayo
April 3 2014 slides mayo
 
Hypothesis testing, error and bias
Hypothesis testing, error and biasHypothesis testing, error and bias
Hypothesis testing, error and bias
 
D. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severelyD. G. Mayo: Your data-driven claims must still be probed severely
D. G. Mayo: Your data-driven claims must still be probed severely
 
Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2Hypothesis Testing. Inferential Statistics pt. 2
Hypothesis Testing. Inferential Statistics pt. 2
 
Hypothesis types, formulation, and testing
Hypothesis types, formulation, and testingHypothesis types, formulation, and testing
Hypothesis types, formulation, and testing
 

Viewers also liked

Cellphones In The Classroom
Cellphones In The ClassroomCellphones In The Classroom
Cellphones In The Classroom
Vicki Davis
 
Students' Choice on Smartphones
Students' Choice on SmartphonesStudents' Choice on Smartphones
Students' Choice on Smartphones
wdanyang
 
Statistics project section c group 6
Statistics project section c group 6Statistics project section c group 6
Statistics project section c group 6Avnika Suri
 
CONSUMER PREFERENCES ON NOKIA MOBILE
CONSUMER PREFERENCES ON NOKIA MOBILE CONSUMER PREFERENCES ON NOKIA MOBILE
CONSUMER PREFERENCES ON NOKIA MOBILE
Saptarshi Chakraborty
 
Spss workshop by riaz
Spss workshop by riazSpss workshop by riaz
Spss workshop by riaz
Mehreen Khan
 
Mobile phone questionnaire
Mobile phone questionnaireMobile phone questionnaire
Mobile phone questionnairegeorgiart298
 

Viewers also liked (7)

Use of smarts phones
Use of smarts phonesUse of smarts phones
Use of smarts phones
 
Cellphones In The Classroom
Cellphones In The ClassroomCellphones In The Classroom
Cellphones In The Classroom
 
Students' Choice on Smartphones
Students' Choice on SmartphonesStudents' Choice on Smartphones
Students' Choice on Smartphones
 
Statistics project section c group 6
Statistics project section c group 6Statistics project section c group 6
Statistics project section c group 6
 
CONSUMER PREFERENCES ON NOKIA MOBILE
CONSUMER PREFERENCES ON NOKIA MOBILE CONSUMER PREFERENCES ON NOKIA MOBILE
CONSUMER PREFERENCES ON NOKIA MOBILE
 
Spss workshop by riaz
Spss workshop by riazSpss workshop by riaz
Spss workshop by riaz
 
Mobile phone questionnaire
Mobile phone questionnaireMobile phone questionnaire
Mobile phone questionnaire
 

Similar to CourseProjectReviewPaper.jktamanaPDF

Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docxLesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
SHIVA101531
 
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
taishao1
 
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
evonnehoggarth79783
 
Inferential include one or more of the inferential statistical procedures.docx
Inferential include one or more of the inferential statistical procedures.docxInferential include one or more of the inferential statistical procedures.docx
Inferential include one or more of the inferential statistical procedures.docx
write4
 
BASIC MATH PROBLEMS IN STATISCTICSS.pptx
BASIC MATH PROBLEMS IN STATISCTICSS.pptxBASIC MATH PROBLEMS IN STATISCTICSS.pptx
BASIC MATH PROBLEMS IN STATISCTICSS.pptx
AngelFaithBactol
 
Sti2018 jws
Sti2018 jwsSti2018 jws
Sti2018 jws
Jesper Schneider
 
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docxBUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
jasoninnes20
 
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docxBUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
curwenmichaela
 
CategoryPoor (Below Average)AverageAbove AverageLength of .docx
CategoryPoor (Below Average)AverageAbove AverageLength of .docxCategoryPoor (Below Average)AverageAbove AverageLength of .docx
CategoryPoor (Below Average)AverageAbove AverageLength of .docx
tidwellveronique
 
Debunk bullshit in statistics QN
Debunk bullshit in statistics QNDebunk bullshit in statistics QN
Debunk bullshit in statistics QN
Quan Nguyen
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
karlhennesey
 
Steps in hypothesis.pptx
Steps in hypothesis.pptxSteps in hypothesis.pptx
Steps in hypothesis.pptx
Yashwanth Rm
 
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docx
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docxChange language English  DeutschEspañolNederlandsYour ResultsClosed.docx
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docx
sleeperharwell
 
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docxCross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
annettsparrow
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
praveen3030
 
IMRAD FORMAT
IMRAD FORMAT IMRAD FORMAT
IMRAD FORMAT
Rhem Rick Corpuz
 
The Art and Science of Survey Research
The Art and Science of Survey ResearchThe Art and Science of Survey Research
The Art and Science of Survey Research
Siobhan O'Dwyer
 
تحليل البيانات وتفسير المعطيات
تحليل البيانات وتفسير المعطياتتحليل البيانات وتفسير المعطيات
تحليل البيانات وتفسير المعطيات
مركز البحوث الأقسام العلمية
 
Determinants
DeterminantsDeterminants
Reasrch methodology for MBA
Reasrch methodology for MBAReasrch methodology for MBA
Reasrch methodology for MBA
Rahul Rajan
 

Similar to CourseProjectReviewPaper.jktamanaPDF (20)

Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docxLesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
Lesson 2 Statistics Benefits, Risks, and MeasurementsAssignmen.docx
 
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
 
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
7 HYPOTHETICALS AND YOU TESTING YOUR QUESTIONS7 MEDIA LIBRARY.docx
 
Inferential include one or more of the inferential statistical procedures.docx
Inferential include one or more of the inferential statistical procedures.docxInferential include one or more of the inferential statistical procedures.docx
Inferential include one or more of the inferential statistical procedures.docx
 
BASIC MATH PROBLEMS IN STATISCTICSS.pptx
BASIC MATH PROBLEMS IN STATISCTICSS.pptxBASIC MATH PROBLEMS IN STATISCTICSS.pptx
BASIC MATH PROBLEMS IN STATISCTICSS.pptx
 
Sti2018 jws
Sti2018 jwsSti2018 jws
Sti2018 jws
 
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docxBUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
 
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docxBUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
BUS 308 Week 2 Lecture 1 Examining Differences - overview .docx
 
CategoryPoor (Below Average)AverageAbove AverageLength of .docx
CategoryPoor (Below Average)AverageAbove AverageLength of .docxCategoryPoor (Below Average)AverageAbove AverageLength of .docx
CategoryPoor (Below Average)AverageAbove AverageLength of .docx
 
Debunk bullshit in statistics QN
Debunk bullshit in statistics QNDebunk bullshit in statistics QN
Debunk bullshit in statistics QN
 
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docxPage 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
Page 266LEARNING OBJECTIVES· Explain how researchers use inf.docx
 
Steps in hypothesis.pptx
Steps in hypothesis.pptxSteps in hypothesis.pptx
Steps in hypothesis.pptx
 
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docx
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docxChange language English  DeutschEspañolNederlandsYour ResultsClosed.docx
Change language English  DeutschEspañolNederlandsYour ResultsClosed.docx
 
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docxCross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
Cross-Cultural PsychologyChapter 2 Methodology of Cross-Cult.docx
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
IMRAD FORMAT
IMRAD FORMAT IMRAD FORMAT
IMRAD FORMAT
 
The Art and Science of Survey Research
The Art and Science of Survey ResearchThe Art and Science of Survey Research
The Art and Science of Survey Research
 
تحليل البيانات وتفسير المعطيات
تحليل البيانات وتفسير المعطياتتحليل البيانات وتفسير المعطيات
تحليل البيانات وتفسير المعطيات
 
Determinants
DeterminantsDeterminants
Determinants
 
Reasrch methodology for MBA
Reasrch methodology for MBAReasrch methodology for MBA
Reasrch methodology for MBA
 

CourseProjectReviewPaper.jktamanaPDF

  • 1. FOUR  ASSUMPTIONS  RESEARCHERS  SHOULD  TEST             1   Four  Assumptions  of  Multiple   Regression  that  Researchers  should   Always  Test         A  Reference  Paper  Review   Jasmine  K.  Tamanaha   University  of  North  Carolina  –  Charlotte               Author’s  Note   This paper was prepared for Course Project, STAT 4123/5123 Applied Statistics I, taught by Dr. Shaoyu Li.
  • 2. FOUR  ASSUMPTIONS  RESEARCHERS  SHOULD  TEST             2   Abstract We live in a world, where results are key and numbers answer questions and solidify answers. How many times have you thought to yourself, “show me the numbers?” Even as a numbers person I often times find myself asking or thinking the same thing, however, I also like to dig a little deeper and ask the follow up questions that never seem to get asked or answered, “WHERE did you get your numbers?” Likewise, “HOW did you come to that conclusion?” This review of a reference paper relative to those types of questions, and responds to how faulty the numbers can be, four assumptions that the practicing researcher (Osborne and Waters, 2002) needs to take into account, how to test these four assumptions, and how pertinent this information is to data analysis and more specifically analysis in the social sciences. If any of these assumptions is violated…then the forecasts, confidence intervals, and scientific insights yielded by a regression model may be (at best) inefficient or (at worst) seriously biased or misleading. (Roberts, 2014)        
  • 3. FOUR  ASSUMPTIONS  RESEARCHERS  SHOULD  TEST             3         “Essentially, all models are wrong, but some are useful”. (Box, 1987) This may be one of the most analyzed and discussed quotes, by analysts alike. The first time I heard this was from my Applied Statistics I class taught by Dr. Li, and after hearing the quote, it really resonated with me. I further investigated and it was not hard to find. After typing in bits and pieces of the quote into Google, it quickly auto filled and immediately my page was flooded. Suddenly I was inundated with information about George E.P. Box, questions and discussions of “what does this mean”, and much more. Personally, I have since re-quoted this many times particularly whenever somebody wants to talk numbers. The further you progress in your statistical studies you come to realize that numbers are not as reliable as you had been originally taught since grade school. Osborne and Waters do a remarkable job in Four Assumptions of Multiple Regression That Researchers Should Always Test (2002), at bringing some issues to light, in particular, highlighting four assumptions that fellow researchers and analysts need to concede: 1) Normality Assumptions 2) Linearity Assumptions 3) Reliability of Measurement Assumptions 4) Homoscedasticity Assumptions Awareness  and  understanding  the  importance  of  checking  these  assumptions  in   regression  analysis,  period,  should  and  needs  to  be  general  knowledge.      
Regression analysis assumes that the variables are normally distributed, but what about cases of non-normality? Most people know that non-normality exists, and if the name does not ring a bell, words like “outlier” and “skewed” are buzzwords that everyone has either used or heard. Substantial outliers and highly skewed variables can completely change the relationships in the data, as well as the significance tests. In statistics you learn many ways to spot non-normality, such as normality plots, Q-Q plots, and Kolmogorov-Smirnov tests, to name a few. As a result of finding non-normality, we are taught about “data cleaning” and using transformations. However, by removing outliers we may be deleting key information that may or may not be relevant to the test at hand; by adding more data we can put the data at high risk of multicollinearity and Type I or Type II errors; and by transforming we may complicate the interpretation of the results. Basically, we have learned ways to improve normality and maybe even accuracy, but at what cost?
Changing data has always been a topic of curiosity for me. Solely for analytical purposes, I have my ideal goals for meeting basic requirements such as p-values, z-tests, t-tests, adjusted R-squared, the F-statistic, and the list goes on. Conversely, it makes me want to shout, “YOU ARE STILL CHANGING DATA.” How am I supposed to trust any statistics regurgitated by news anchors, salesmen, and advertisements without knowing the steps that were taken to support their “90% Accuracy” or “5.8% Unemployment Drop”?
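To make the normality checks named above a little more concrete, here is a minimal sketch in Python using only the standard library. It computes a skewness coefficient, the points that a Q-Q plot would display, and a Kolmogorov-Smirnov style distance from a fitted normal curve. The function names and the two small data sets are hypothetical and are not taken from Osborne and Waters. Because the mean and standard deviation are estimated from the same sample, the distance is best read as a rough indicator rather than compared against standard K-S critical values.

```python
from statistics import NormalDist, mean, pstdev


def skewness(values):
    """Third standardized moment; 0 for perfectly symmetric data."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def ks_distance(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    n = len(values)
    fitted = NormalDist(mean(values), pstdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values), start=1):
        cdf = fitted.cdf(v)
        # The empirical CDF steps from (i - 1)/n up to i/n at each sorted value.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def qq_points(values):
    """(theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    n = len(values)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), v)
        for i, v in enumerate(sorted(values), start=1)
    ]


# Hypothetical samples: one symmetric, one with a single large outlier.
symmetric = [-3, -2, -1, 0, 1, 2, 3]
with_outlier = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print("skewness (symmetric):   ", round(skewness(symmetric), 3))
print("skewness (with outlier):", round(skewness(with_outlier), 3))
print("K-S distance (symmetric):   ", round(ks_distance(symmetric), 3))
print("K-S distance (with outlier):", round(ks_distance(with_outlier), 3))
```

In this toy example the single outlier is enough to push both the skewness and the distance well above the values for the symmetric sample, which is the kind of signal that prompts the “data cleaning” decisions questioned above.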
Overall, we assume that there is even a relationship between the dependent and independent variables, and multiple regression can only accurately estimate the relationship between these variables if the relationships are linear in nature (Osborne and Waters, 2002). This raises the question, “What about the social sciences?” Non-linear relationships commonly occur in the social sciences, of which Osborne has in-depth working knowledge, particularly in psychology and education. When the relationship is non-linear, the results will typically underestimate the true relationship between the independent and dependent variables. Pedhazur (1997), Cohen and Cohen (1983), and Berry and Feldman (1985) discuss or suggest three primary ways to detect non-linearity (Osborne and Waters, 2002). The first is the use of theory, or of past analyses, to educate oneself and to supplement the current analysis. The second is the examination of residual plots, which are easily and readily accessible. The third is detecting curvilinearity by using squared or cubed terms.
The three “primary” methods for detecting non-linearity are not fail-safe and still pose many concerns, especially for the social sciences. The social sciences have many variables that are not easily measurable. How exactly do you measure your stress and anxiety levels? Unfortunately, humans were not built with charts detailing our bodies’ levels, although we have come up with ways to use various factors to gauge our stress levels. Those factors are obviously important to measurement, but there is often a clear correlation among them, which again can lead to underestimation or overestimation, all based on unreliable measurements. Every statistician’s goal is to accurately model the “real” relationship; that is where Cronbach’s alpha comes into play, mainly in social science analyses. Error estimates and reliability estimates are just that, estimates, and are often assumed to be acceptable. There are accepted methods for dealing with reliability in both simple and multiple regression. Analysts beware: when correcting for low reliability, even small correlations can change your R-squared; in making adjustments you may also change the magnitude or even the direction of relationships; and the most dramatic changes occur when the covariate has a substantial relationship with the other variables. Even the simplest of changes can set off a chain of further changes, which may even alter what your data were trying to say in the first place. In discussing unreliable reliability measurements, I also mentioned error estimates.
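As a small illustration of the reliability discussion above, the following Python sketch computes Cronbach’s alpha for a set of scale items and applies the standard correction for attenuation to an observed correlation. The item scores, the function names, and the reliability values of 0.7 and 0.8 are hypothetical and chosen only for illustration; this is a sketch of the textbook formulas, not the procedure of any particular study.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    `items` is a list of k lists; each inner list holds one item's scores,
    one score per respondent, in the same respondent order.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    sum_item_variances = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Estimate the correlation between true scores from an observed correlation."""
    return r_observed / sqrt(reliability_x * reliability_y)


# Hypothetical stress questionnaire: 3 items answered by 5 respondents.
stress_items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]

alpha = cronbach_alpha(stress_items)
print("Cronbach's alpha:", round(alpha, 3))

# Hypothetical observed correlation of 0.30 between two imperfect measures.
corrected = correct_for_attenuation(0.30, 0.7, 0.8)
print("corrected correlation:", round(corrected, 3))
```

Even with fairly respectable reliabilities, the corrected correlation is noticeably larger than the observed one, which shows how much an estimate can move once measurement error is taken into account.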
What happens if the variance of errors is the same across all levels of the independent variables? This is called homoscedasticity, and the opposite is heteroscedasticity. When heteroscedasticity is marked, it can seriously distort the findings and “weaken” the analysis. Again, with a weakened analysis you will run into overestimation errors. We can use our handy residual plots to check for it. Visually, heteroscedasticity may look like a bow-tie or a fan, whereas, as we know, we want the residuals scattered evenly and randomly around 0. The fan shape can be shown with the Goldfeld-Quandt test, which indicates that the error term either increases or decreases consistently as the value of the dependent variable increases, and with the Glejser test we recognize the bow-tie shape, in which the error term has a small variance centrally and a larger variance at the extreme points. Transformation may help reduce heteroscedasticity.
As one can see, there is no quick fix or remedy without potential consequences, but not making alterations may have consequences as well. It is very much a catch-22 situation, which is when one may decide to go about their research and analysis differently. Osborne and Waters’ main goal in the article was to raise awareness of the importance of checking assumptions in simple and multiple regression (2002), and to show that the four assumptions can be checked and dealt with easily, with seemingly important benefits. As Osborne and Waters also state in their introduction, “Most statistical tests rely upon certain assumptions about the variables used in the analysis.” So it is our duty as researchers and analysts to recognize the situations in which violations cause serious bias, to be familiar with those in which they have little effect, and to understand that knowing when these four assumptions, and many others, are violated is essential to meaningful data analysis (Pedhazur, 1997, p. 33). We have a serious situation in which we have a rich literature in education and social science, but we are forced to call into question the validity of many of these results, conclusions, and assertions, as we have no idea whether the assumptions of the statistical tests were met (Osborne).
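As a concrete companion to the heteroscedasticity discussion, here is a simplified sketch of the Goldfeld-Quandt idea in Python, using only the standard library. It assumes a single predictor, sorts the observations by that predictor, drops a middle portion, fits a separate line to the low and high groups, and compares their residual sums of squares. The function names, the 20% drop fraction, and both data sets are hypothetical. A ratio near 1 is consistent with constant error variance, while a ratio far from 1 suggests a fan shape; a full test would compare the ratio against an F distribution, which is not shown here.

```python
def residual_sum_of_squares(x, y):
    """Fit y = a + b*x by ordinary least squares and return the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt_ratio(x, y, drop_fraction=0.2):
    """Ratio of residual variance in the high-x group to that in the low-x group."""
    pairs = sorted(zip(x, y))            # order observations by the predictor
    n = len(pairs)
    n_drop = int(n * drop_fraction)      # middle observations left out
    n_group = (n - n_drop) // 2
    low = pairs[:n_group]
    high = pairs[n - n_group:]
    rss_low = residual_sum_of_squares(*zip(*low))
    rss_high = residual_sum_of_squares(*zip(*high))
    # Both groups have the same size, so the degrees of freedom cancel.
    return rss_high / rss_low


# Hypothetical data: the same trend with constant versus growing error spread.
x = list(range(1, 21))
signs = [1 if xi % 2 == 0 else -1 for xi in x]
y_constant = [2 * xi + s * 1.0 for xi, s in zip(x, signs)]
y_fan = [2 * xi + s * 0.5 * xi for xi, s in zip(x, signs)]

print("ratio, constant spread:", round(goldfeld_quandt_ratio(x, y_constant), 3))
print("ratio, fan-shaped spread:", round(goldfeld_quandt_ratio(x, y_fan), 3))
```

The artificial alternating errors keep the example deterministic; with real data the residuals would be irregular, and the ratio would need to be judged against a critical value rather than by eye.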
References
Osborne, J. W., & Waters, E. (2002). Four Assumptions of Multiple Regression That Researchers Should Always Test. Practical Assessment, Research & Evaluation, 8(2). North Carolina State University and University of Oklahoma.
Box, G. E. P., & Draper, N. R. (1987). Empirical Model-Building and Response Surfaces, p. 424. Wiley. ISBN 0471810339.
Roberts, K. Global Warming: Utah's Future Threatens Hotter Temps, Longer and More Severe Droughts. Department of Decision Sciences. Duke University: The Fuqua School of Business. Updated 1 Dec. 2014. Web. 2014.
Berry, W. D., & Feldman, S. (1985). Multiple Regression in Practice (Sage University Paper Series on Quantitative Applications in the Social Sciences, series no. 07-050). Newbury Park, CA: Sage.
Cohen, J., & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Nunnally, J. C. (1978). Psychometric Theory (2nd ed.). New York: McGraw Hill.
Osborne, J. W. (2001). A new look at outliers and fringeliers: Their effects on statistic accuracy and Type I and Type II error rates. Unpublished manuscript, Department of Educational Research and Leadership and Counselor Education, North Carolina State University.
Osborne, J. W., Christensen, W. R., & Gunter, J. (April, 2001). Educational Psychology from a Statistician’s Perspective: A Review of the Power and Goodness of Educational Psychology Research. Paper presented at the national meeting of the American Education Research Association (AERA), Seattle, WA.
Pedhazur, E. J. (1997). Multiple Regression in Behavioral Research (3rd ed.). Orlando, FL: Harcourt Brace.
Tabachnick, B. G., & Fidell, L. S. (1996). Using Multivariate Statistics (3rd ed.). New York: Harper Collins College Publishers.
Tabachnick, B. G., & Fidell, L. S. (2001). Using Multivariate Statistics (4th ed.). Needham Heights, MA: Allyn and Bacon.