Hypothesis testing

Hypothesis testing and interpretation of data
Testing of Hypothesis
The basic logic of hypothesis testing is to use sample data to decide whether there is enough evidence to reject a claim about a population. When a researcher conducts quantitative research, he or she is attempting to answer a research question or hypothesis that has been formulated. One method of evaluating this research question is a process called hypothesis testing, which is sometimes also referred to as significance testing.
Example:
Two lecturers, Sandy and Mandy, think that they use the best method to teach their students. Each lecturer has 50 statistics students who are studying for a graduate degree in management. In Sandy's class, students have to attend one lecture and one seminar class every week, whilst in Mandy's class students attend only lectures; Mandy believes that lectures are sufficient by themselves and that students can study the material in their own time. This is the first year that Sandy has given seminars, but since they take up a lot of her time, she wants to make sure that she is not wasting her time and that seminars improve the students' performance.
The Research Hypothesis
The first step in hypothesis testing is to set a research hypothesis. In Sandy and Mandy's study, the aim is to examine the effect that two different teaching methods, providing both lectures and seminar classes (Sandy) and providing only lectures (Mandy), have on the performance of the students. More specifically, they want to determine whether performance differs between the two teaching methods. Whilst Mandy is skeptical about the effectiveness of seminars, Sandy clearly believes that students who attend seminars do better than those in Mandy's class. This leads to the following research hypothesis:
Research hypothesis: When students attend seminar classes in addition to lectures, their performance increases.
By taking a hypothesis testing approach, Sandy and Mandy want to generalize their results to a population (all students) rather than just the students in their sample. However, in order to use hypothesis testing, one needs to restate the research hypothesis as a null and an alternative hypothesis.
Null hypothesis: the null hypothesis (H0) is a hypothesis which the researcher tries to disprove, reject or nullify. A null hypothesis is “the hypothesis that there is no relationship between two or more variables”, symbolized as H0.
Alternative hypothesis: the alternative, or research, hypothesis proposes a relationship between two or more variables, symbolized as H1.
Decision errors
Two types of errors can result from a hypothesis test.
Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true. The probability of committing a Type I error is called the significance level. This probability is also called alpha, and is often denoted by α.
Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis which is false. The probability of committing a Type II error is called beta, and is often denoted by β. The probability of not committing a Type II error, 1 − β, is called the power of the test.
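The link between β and power can be made concrete with a short calculation. The sketch below assumes the simplest setting, an upper-tailed one-sample z-test with a known standard deviation σ, where the true mean exceeds the hypothesized mean by δ; the function name `power_one_sided_z` and all the numbers are hypothetical.

```python
from statistics import NormalDist

def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of an upper-tailed one-sample z-test when the true mean exceeds mu0 by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    shift = delta / (sigma / n ** 0.5)       # mean of Z when the true mean is mu0 + delta
    beta = std_normal.cdf(z_crit - shift)    # P(Type II error): Z stays at or below z_crit
    return 1 - beta

# Hypothetical setting: sigma = 10 marks, true improvement of 5 marks, n = 50
power = power_one_sided_z(delta=5, sigma=10, n=50)
print(f"power = {power:.3f}, beta = {1 - power:.3f}")
```

With δ = 0 the null hypothesis is true and the "power" reduces to α, the Type I error rate; a larger sample or a larger true effect lowers β.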

Steps/procedures in Hypothesis Testing
1. Identify the research problem:
The first step is to state the research problem. The research problem needs to identify the population of interest and the variables under investigation.
Example of research problem: To find out the effectiveness of two teaching methods (the lecture-only method and the lecture-cum-seminar method) with reference to the exam marks of the students.
In the above research problem, the population of interest is the students, and the variables include the teaching method and the marks.
This step enables the researcher to define not only what is to be tested but also which variable(s) will be used in sample data collection. The type of variable(s), whether categorical, discrete or continuous, further determines the statistical test that can be performed on the collected data.
2. Specify the null and alternative hypotheses:
The research problem or question is converted into a null hypothesis and an alternative hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the other must be false.
(a) Null Hypothesis: A null hypothesis (H0) is a statement that declares the observed difference is due to “chance”. It is the hypothesis the researcher hopes to reject or disprove.
A null hypothesis states that there is no relationship between two or more variables. Put simply, the null is the opposite of the alternative hypothesis (H1).
Example: “There is no difference between the two methods of teaching (the lecture-only method and the lecture-cum-seminar method) in the marks scored by students.”

(b) Alternative Hypothesis:
The alternative hypothesis proposes a relationship between two or more variables, symbolized as H1.
Example: “The lecture-cum-seminar method improves the marks scored by students compared with the lecture-only method.”
“Note that the two hypotheses we propose to test must be mutually exclusive, i.e., when one is true the other must be false. They must also be exhaustive; that is, they must include all possible occurrences.”
From the above, it is clear that the null hypothesis is a hypothesis of no difference. The central task in testing a hypothesis is to decide whether or not to reject the null hypothesis. The alternative hypothesis specifies a definite relationship between the two variables. Only one alternative hypothesis is tested against the null hypothesis.
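In symbols, and assuming μS and μM denote the population mean marks under Sandy's lecture-cum-seminar method and Mandy's lecture-only method (notation introduced here only for illustration), the hypotheses of the example can be sketched as follows. Because the alternative is directional, the null is written with "≤" so that the pair is both mutually exclusive and exhaustive; the "no difference" case μS = μM is the boundary value used when the test statistic is computed.

```latex
H_0:\ \mu_S \le \mu_M \qquad \text{versus} \qquad H_1:\ \mu_S > \mu_M
```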
3. Significance Level:
After formulating the hypotheses, the researcher must choose a level of significance. The confidence with which a null hypothesis is rejected or not rejected depends on the level of significance.
The most commonly used levels of significance are 5% and 1%:
A significance level of 5% means that, when the null hypothesis is true, the risk of wrongly rejecting it is 5%, i.e., about 5 times out of 100 occasions.
A significance level of 1% means that the risk of wrongly rejecting a true null hypothesis is 1%, i.e., about once out of 100 occasions.
Therefore, a 1% level of significance gives greater protection against rejecting a true null hypothesis than the 5% level. The price is a higher risk of a Type II error: a false null hypothesis is more likely to go unrejected.
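One way to see what a significance level means is to simulate many studies in which the null hypothesis is actually true and count how often a two-tailed z-test rejects it. The sketch below assumes normally distributed data with a known σ; the function name and every setting (sample size, μ0, σ, number of trials, seed) are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist, fmean

def type_one_error_rate(alpha, trials=10_000, n=30, mu0=50.0, sigma=10.0, seed=1):
    """Share of simulated studies that reject a TRUE null hypothesis H0: mu = mu0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    se = sigma / n ** 0.5                          # standard error of the mean (sigma known)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 holds by construction
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: simulated Type I error rate = {type_one_error_rate(alpha):.3f}")
```

The simulated rates should land close to 0.05 and 0.01 respectively; the exact figures vary slightly with the seed.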
4. Test Statistic:
A test statistic is a statistic used to test the null hypothesis. The researcher needs to identify a test statistic that can be used to assess the evidence against the null hypothesis; it is used to decide whether the null hypothesis should be rejected or not.
The test statistic is calculated from the collected data. There are different types of test statistics. For instance, the z statistic compares the observed sample mean with the population mean μ0 expected under the null hypothesis. Large absolute values of the test statistic indicate that the data are far from what is expected, providing evidence against the null hypothesis and in favor of the alternative hypothesis.
Every statistical test works in the same way. Based on the sample data, it gives the probability (p-value) of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. When the p-value is low, the sample data would be unlikely if the null hypothesis were true, which is evidence that the null hypothesis is wrong. When the p-value is high, it suggests that the collected data are within the range expected under the null hypothesis, so the null hypothesis is not rejected.
5. Region of Acceptance and Region of Rejection:
The region of acceptance is a range of values. If the test statistic falls within the region of acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the chance of making a Type I error (rejecting a true null hypothesis) is equal to the significance level α.
The set of values outside the region of acceptance is called the region of rejection. If the test statistic falls within the region of rejection, the null hypothesis is rejected at the α level of significance.
6. Select an Appropriate Test:
A hypothesis test may be one-tailed or two-tailed. Whether the test is one-sided or two-sided depends on the alternative hypothesis and the nature of the problem.
A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test. For example, suppose the null hypothesis states that the mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than 10. The region of rejection would consist of a range of values located on the right side of the sampling distribution; that is, sample means far enough above 10 to exceed the critical value.
In simple words, in a one-tailed test, the values of the test statistic that lead to rejection of the null hypothesis fall on only one side of the sampling distribution curve. In a two-tailed test, for example when the alternative hypothesis is that the mean is not equal to 10, the region of rejection is split between both tails.
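The rejection regions described above can be sketched for a z-test as follows. The sketch assumes the test statistic is standard normal under H0; `rejection_region` and `reject_h0` are hypothetical helper names, and `NormalDist` is Python's standard-library normal distribution.

```python
from statistics import NormalDist

def rejection_region(alpha, tail):
    """Critical z value for a z-test; tail is 'upper', 'lower' or 'two'."""
    std_normal = NormalDist()
    if tail == "upper":
        return std_normal.inv_cdf(1 - alpha)       # reject if z > this value
    if tail == "lower":
        return std_normal.inv_cdf(alpha)           # reject if z < this value
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)   # reject if |z| > this value
    raise ValueError("tail must be 'upper', 'lower' or 'two'")

def reject_h0(z, alpha, tail):
    crit = rejection_region(alpha, tail)
    if tail == "upper":
        return z > crit
    if tail == "lower":
        return z < crit
    return abs(z) > crit

print(round(rejection_region(0.05, "upper"), 3))   # 1.645
print(round(rejection_region(0.05, "two"), 3))     # 1.96
print(reject_h0(1.8, 0.05, "upper"), reject_h0(1.8, 0.05, "two"))   # True False
```

For H0: mean ≤ 10 against H1: mean > 10 the "upper" region applies; a statistic of 1.8 falls in the upper-tailed rejection region at 5% but not in the two-tailed one.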

Significance Level
In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.
The significance level is used in hypothesis testing as follows: First, the difference between the
results of the experiment and the null hypothesis is determined. Then, assuming the null
hypothesis is true, the probability of a difference that large or larger is computed. Finally, this
probability is compared to the significance level. If the probability is less than or equal to the
significance level, then the null hypothesis is rejected and the outcome is said to be statistically
significant. Traditionally, experimenters have used either the 0.05 level (sometimes called the
5% level) or the 0.01 level (1% level), although the choice of levels is largely subjective. The
lower the significance level, the more the data must diverge from the null hypothesis to be
significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter
alpha (α) is sometimes used to indicate the significance level.
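The procedure just described (compute the probability of a difference at least as large under H0, then compare it with the significance level) can be sketched for a two-tailed z-test. The standardized difference z = 2.2 is a hypothetical value; it turns out significant at the 0.05 level but not at the more conservative 0.01 level.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. the two-tailed p-value under H0."""
    return 2 * NormalDist().cdf(-abs(z))

p = two_sided_p_value(2.2)   # hypothetical standardized difference
for alpha in (0.05, 0.01):
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"alpha = {alpha}: p = {p:.4f} -> {decision}")
```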

5) Identify the rejection region
• Is it an upper, lower, or two-tailed test?
• Determine the critical value associated with α, the level of significance of the test.
The next step is to compute the probability value (also known as the p-value). This is the probability of obtaining a sample statistic as different as, or more different from, the parameter specified in the null hypothesis, given that the null hypothesis is true.

PARAMETRIC TESTS
1. Descriptive Statistics: an overview of the attributes of a data set. These include frequency histograms, measures of central tendency (mean, median and mode) and measures of dispersion (range, variance and standard deviation).
2. Inferential Statistics: provide measures of how well the data support a hypothesis and whether the data are generalizable beyond what was tested (significance tests).
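The descriptive measures listed above can be computed with Python's standard `statistics` module; the marks below are hypothetical.

```python
import statistics

marks = [62, 75, 58, 75, 81, 69, 75, 90, 66, 59]   # hypothetical exam marks

summary = {
    # central tendency
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "mode": statistics.mode(marks),
    # dispersion
    "range": max(marks) - min(marks),
    "variance": statistics.variance(marks),   # sample variance (n - 1 in the denominator)
    "std_dev": statistics.stdev(marks),
}
for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```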
Data: observations recorded during research.
Types of data:
1. Nominal data: synonymous with categorical data; names or categories are assigned based on characteristics, without any ranking between categories, e.g. male/female, yes/no, death/survival.
2. Ordinal data: ordered or graded data, expressed as scores or ranks, e.g. pain graded as mild, moderate and severe.
3. Interval data: there is an equal and definite interval between two measurements; the data can be continuous or discrete, e.g. weight expressed as 20, 21, 22, 23, 24, where the interval between 20 and 21 is the same as between 23 and 24.

Parametric hypothesis tests are frequently used to measure the quality of sample estimates of population parameters or to test whether estimates of a given parameter are equal for two samples.
Parametric hypothesis tests set up a null hypothesis against an alternative hypothesis, testing, for instance, whether or not the population mean is equal to a certain value, and then use appropriate statistics to calculate the probability of obtaining data at least as extreme as those observed if the null hypothesis were true. You can then reject or fail to reject the null hypothesis based on the calculated probability.

Z test
The z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. The relevant test statistic, z, is worked out and compared with its probable value (to be read from the table showing the area under the normal curve) at a specified level of significance for judging the significance of the measure concerned. This is the most frequently used test in research studies. The test is used even when the binomial distribution or t-distribution is applicable, on the presumption that such a distribution tends to approximate the normal distribution as ‘n’ becomes larger. The z-test is generally used for comparing the mean of a sample with some hypothesised mean for the population in the case of a large sample, or when the population variance is known. The z-test is also used for judging the significance of the difference between the means of two independent samples in the case of large samples, or when the population variance is known. It is also used for comparing a sample proportion with a theoretical value of the population proportion, or for judging the difference in proportions of two independent samples when n happens to be large. Besides, this test may be used for judging the significance of the median, mode, coefficient of correlation and several other measures.
The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case we use the variance of the sample as an estimate of the population variance). In case the two samples are related, we use the paired t-test (also known as the difference test) for judging the significance of the mean of the differences between the two related samples. It can also be used for judging the significance of the coefficients of simple and partial correlations. The relevant test statistic, t, is calculated from the sample data and then compared with its probable value based on the t-distribution (to be read from the table that gives probable values of t for different levels of significance and different degrees of freedom) at a specified level of significance for the relevant degrees of freedom, for rejecting or not rejecting the null hypothesis. It may be noted that the t-test applies only in the case of small sample(s) when the population variance is unknown.
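As an illustration of the paired t-test described above, the sketch below computes t = d̄ / (s_d/√n) for hypothetical before-and-after marks of the same ten students. The critical value 2.262 (two-tailed, 5%, 9 degrees of freedom) is the value read from a standard t table; Python's standard library has no t-distribution, so it is hard-coded here.

```python
import math
import statistics

# Hypothetical marks of the same ten students before and after a change in teaching (related samples)
before = [60, 55, 70, 48, 80, 67, 58, 74, 51, 65]
after = [62, 59, 73, 53, 81, 70, 62, 76, 54, 68]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)   # sample SD stands in for the unknown population SD
t = mean_d / se_d                                # H0: the mean difference is 0
df = n - 1
T_CRIT = 2.262   # two-tailed 5% critical value for 9 degrees of freedom, from a t table
print(f"t = {t:.3f} on {df} df; reject H0 at 5%: {abs(t) > T_CRIT}")
```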
A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level, the Z-test has a single critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown (and therefore has to be estimated from the sample itself) and the sample size is not large (n < 30), the Student's t-test may be more appropriate.
If T is a statistic that is approximately normally distributed under the null hypothesis, the next step in performing a Z-test is to estimate the expected value θ of T under the null hypothesis, and then obtain an estimate s of the standard deviation of T. After that, the standard score Z = (T − θ) / s is calculated, from which one-tailed and two-tailed p-values can be calculated as Φ(−Z) (for upper-tailed tests), Φ(Z) (for lower-tailed tests) and 2Φ(−|Z|) (for two-tailed tests), where Φ is the standard normal cumulative distribution function.
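The three p-value formulas above translate directly into code. This sketch assumes T, θ and s have already been obtained; the helper name `z_test_p_values` and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf   # standard normal cumulative distribution function

def z_test_p_values(t_stat, theta, s):
    """Standard score Z = (T - theta) / s and its upper-, lower- and two-tailed p-values."""
    z = (t_stat - theta) / s
    return {
        "z": z,
        "upper": PHI(-z),          # upper-tailed test
        "lower": PHI(z),           # lower-tailed test
        "two": 2 * PHI(-abs(z)),   # two-tailed test
    }

result = z_test_p_values(t_stat=103.0, theta=100.0, s=1.5)   # hypothetical values
print(result)
```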

Use in location testing
The term "Z-test" is often used to refer specifically to the one-sample location test comparing the mean of a set of measurements to a given constant. If the observed data $X_1, \ldots, X_n$ are (i) uncorrelated, (ii) have a common mean $\mu$, and (iii) have a common variance $\sigma^2$, then the sample average $\bar{X}$ has mean $\mu$ and variance $\sigma^2/n$. If our null hypothesis is that the mean value of the population is a given number $\mu_0$, we can use $\bar{X} - \mu_0$ as a test statistic, rejecting the null hypothesis if $\bar{X} - \mu_0$ is large.
To calculate the standardized statistic $Z = (\bar{X} - \mu_0)/s$, we need to either know or have an approximate value for $\sigma^2$, from which we can calculate $s^2 = \sigma^2/n$. In some applications $\sigma^2$ is known, but this is uncommon. If the sample size is moderate or large, we can substitute the sample variance for $\sigma^2$, giving a plug-in test. The resulting test will not be an exact Z-test since the uncertainty in the sample variance is not accounted for; however, it will be a good approximation unless the sample size is small. A t-test can be used to account for the uncertainty in the sample variance when the sample size is small and the data are exactly normal. There is no universal constant at which the sample size is generally considered large enough to justify use of the plug-in test. Typical rules of thumb range from 20 to 50 samples. For larger sample sizes, the t-test procedure gives almost identical p-values to the Z-test procedure.
Other location tests that can be performed as Z-tests are the two-sample location test and the paired difference test.
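A minimal sketch of the plug-in one-sample location test, assuming σ is unknown and the sample standard deviation is substituted for it; with n = 40 the example sits inside the 20 to 50 rule-of-thumb range mentioned above. The data and the function name `plug_in_z_test` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def plug_in_z_test(data, mu0):
    """One-sample location Z-test with the sample SD plugged in for the unknown sigma."""
    n = len(data)
    s = statistics.stdev(data) / math.sqrt(n)   # estimated standard deviation of the sample mean
    z = (statistics.fmean(data) - mu0) / s
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided

data = [48, 52, 50, 55, 45] * 8   # hypothetical measurements, n = 40, mean 50
z, p = plug_in_z_test(data, mu0=48)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```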
Conditions
For the Z-test to be applicable, certain conditions must be met.
• Nuisance parameters should be known, or estimated with high accuracy (an example of a nuisance parameter would be the standard deviation in a one-sample location test). Z-tests focus on a single parameter, and treat all other unknown parameters as being fixed at their true values. In practice, due to Slutsky's theorem, "plugging in" consistent estimates of nuisance parameters can be justified. However, if the sample size is not large enough for these estimates to be reasonably accurate, the Z-test may not perform well.
• The test statistic should follow a normal distribution. Generally, one appeals to the central limit theorem to justify assuming that a test statistic varies normally. There is a great deal of statistical research on the question of when a test statistic varies approximately normally. If the variation of the test statistic is strongly non-normal, a Z-test should not be used.
If estimates of nuisance parameters are plugged in as discussed above, it is important to use estimates appropriate for the way the data were sampled. In the special case of Z-tests for the one- or two-sample location problem, the usual sample standard deviation is only appropriate if the data were collected as an independent sample.
In some situations, it is possible to devise a test that properly accounts for the variation in plug-in estimates of nuisance parameters. In the case of one- and two-sample location problems, a t-test does this.
Example
Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean; that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low?
We begin by calculating the standard error of the mean:
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{55}} = \frac{12}{7.42} = 1.62$$
where $\sigma$ is the population standard deviation.
Next we calculate the z-score, which is the distance from the sample mean to the population mean in units of the standard error:
$$z = \frac{\bar{X} - \mu_0}{SE} = \frac{96 - 100}{1.62} = -2.47$$
In this example, we treat the population mean and variance as known, which would be appropriate if all students in the region were tested. When population parameters are unknown, a t-test should be conducted instead.
The classroom mean score is 96, which is −2.47 standard error units from the population mean of 100. Looking up the z-score in a table of the standard normal distribution, we find that the probability of observing a standard normal value below −2.47 is approximately 0.5 − 0.4932 = 0.0068. This is the one-sided p-value for the null hypothesis that the 55 students are comparable to a simple random sample from the population of all test-takers. The two-sided p-value is approximately 0.014 (twice the one-sided p-value).
Another way of stating things is that with probability 1 − 0.014 = 0.986, a simple random sample of 55 students would have a mean test score within 4 units of the population mean. We could also say that the null hypothesis that the 55 test-takers are comparable to a simple random sample from the population of test-takers is rejected at any significance level above about 0.014, for example at the conventional 5% level.
The Z-test tells us that the 55 students of interest have an unusually low mean test score compared with most simple random samples of similar size from the population of test-takers. A deficiency of this analysis is that it does not consider whether the effect size of 4 points is meaningful. If instead of a classroom we considered a subregion containing 900 students whose mean score was 99, nearly the same z-score and p-value would be observed, since z = (99 − 100)/(12/√900) = −2.5. This shows that if the sample size is large enough, very small differences from the null value can be highly statistically significant.
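The reading-test calculation can be reproduced with Python's standard library. Here `NormalDist().cdf` replaces the printed normal table, so the one-sided p-value comes out near 0.0067 rather than the table-rounded 0.0068; the helper name `z_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def z_for_mean(sample_mean, mu, sigma, n):
    se = sigma / math.sqrt(n)                # standard error of the mean
    return (sample_mean - mu) / se

phi = NormalDist().cdf                       # standard normal CDF

z_school = z_for_mean(96, 100, 12, 55)
p_one = phi(z_school)                        # lower tail: are the scores surprisingly low?
p_two = 2 * phi(-abs(z_school))
print(f"school: z = {z_school:.2f}, one-sided p = {p_one:.4f}, two-sided p = {p_two:.4f}")

z_sub = z_for_mean(99, 100, 12, 900)         # subregion: 900 students, only 1 point lower
print(f"subregion: z = {z_sub:.2f}, one-sided p = {phi(z_sub):.4f}")
```

The subregion line shows the point made above: a 1-point gap with 900 students gives almost the same z-score as a 4-point gap with 55 students.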
Z-tests other than location tests
Location tests are the most familiar Z-tests. Another class of Z-tests arises in maximum likelihood estimation of the parameters in a parametric statistical model. Maximum likelihood estimates are approximately normal under certain conditions, and their asymptotic variance can be calculated in terms of the Fisher information. The maximum likelihood estimate divided by its standard error can be used as a test statistic for the null hypothesis that the population value of the parameter equals zero. More generally, if $\hat{\theta}$ is the maximum likelihood estimate of a parameter $\theta$, and $\theta_0$ is the value of $\theta$ under the null hypothesis,
$$\frac{\hat{\theta} - \theta_0}{SE(\hat{\theta})}$$
can be used as a Z-test statistic.
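As a hypothetical instance of this Wald-type Z statistic, take a proportion p estimated from independent Bernoulli trials: the maximum likelihood estimate is p̂ = x/n and, from the Fisher information n / (p(1 − p)), its standard error is approximately √(p̂(1 − p̂)/n). The counts below are invented for illustration.

```python
import math
from statistics import NormalDist

successes, n, p0 = 60, 100, 0.5            # hypothetical counts; H0: p = 0.5
p_hat = successes / n                       # maximum likelihood estimate of p
se = math.sqrt(p_hat * (1 - p_hat) / n)     # SE from the Fisher information n / (p(1 - p)) at p_hat
z = (p_hat - p0) / se
p_value = 2 * NormalDist().cdf(-abs(z))
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```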
When using a Z-test for maximum likelihood estimates, it is important to be aware that the normal approximation may be poor if the sample size is not sufficiently large. Although there is no simple, universal rule stating how large the sample size must be to use a Z-test, simulation can give a good idea as to whether a Z-test is appropriate in a given situation.
Z-tests are employed whenever it can be argued that a test statistic follows a normal distribution under the null hypothesis of interest. Many non-parametric test statistics, such as U statistics, are approximately normal for large enough sample sizes, and hence are often performed as Z-tests.
F test
The F-test is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. It is also used for judging the significance of multiple correlation coefficients. The test statistic, F, is calculated and compared with its probable value (to be seen in the F-ratio tables for different degrees of freedom for greater and smaller variances at a specified level of significance) for rejecting or not rejecting the null hypothesis.
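The variance-ratio form of the F-test can be sketched with two hypothetical independent samples. Following the usual textbook convention assumed here, the larger sample variance goes in the numerator; the resulting F is then compared with the F-ratio table for the listed degrees of freedom at the chosen significance level.

```python
import statistics

# Hypothetical independent samples
sample_a = [4, 6, 8, 10, 12]
sample_b = [7, 8, 9, 10, 11]

var_a = statistics.variance(sample_a)   # sample variance, n - 1 denominator
var_b = statistics.variance(sample_b)

# Larger variance in the numerator; degrees of freedom are (numerator, denominator)
if var_a >= var_b:
    F, df = var_a / var_b, (len(sample_a) - 1, len(sample_b) - 1)
else:
    F, df = var_b / var_a, (len(sample_b) - 1, len(sample_a) - 1)
print(f"F = {F:.2f} with {df} degrees of freedom")
```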
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.[1]
Common examples of F-tests
Common examples of the use of F-tests include the study of the following cases:
• The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).
• The hypothesis that a proposed regression model fits the data well (see lack-of-fit sum of squares).
• The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
In addition, some statistical procedures, such as Scheffé's method for multiple comparisons adjustment in linear models, also use F-tests.
F-test of the equality of two variances
The F-test is sensitive to non-normality.[2][3] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance) as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.[4]
Formula and calculation
Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
Multiple-comparison ANOVA problems
The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others; if the F-test is performed at level α, we cannot state that the treatment pair with the greatest mean difference is significantly different at level α.
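The formula for the one-way ANOVA F-test statistic is
$$F = \frac{\text{explained variance}}{\text{unexplained variance}}$$
or
$$F = \frac{\text{between-group variability}}{\text{within-group variability}}.$$
The "explained variance", or "between-group variability", is
$$\sum_{i=1}^{K} \frac{n_i(\bar{Y}_{i\cdot} - \bar{Y})^2}{K-1}$$
where $\bar{Y}_{i\cdot}$ denotes the sample mean in the $i$th group, $n_i$ is the number of observations in the $i$th group, $\bar{Y}$ denotes the overall mean of the data, and $K$ denotes the number of groups.
The "unexplained variance", or "within-group variability", is
$$\sum_{i=1}^{K}\sum_{j=1}^{n_i} \frac{(Y_{ij} - \bar{Y}_{i\cdot})^2}{N-K}$$
where $Y_{ij}$ is the $j$th observation in the $i$th out of $K$ groups and $N$ is the overall sample size. This F-statistic follows the F-distribution with $K-1$, $N-K$ degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.
Note that when there are only two groups for the one-way ANOVA F-test, $F = t^2$, where $t$ is the Student's t statistic.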
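The identity F = t² for two groups can be checked numerically. The sketch below takes the a1 and a2 columns of the one-way ANOVA example later in this section as two groups, computes the one-way ANOVA F from the between- and within-group formulas above, and compares it with the square of the pooled-variance Student's t; the function names are hypothetical.

```python
import math
import statistics

def one_way_anova_f(groups):
    """Between-group mean square divided by within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([y for g in groups for y in g])
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

def pooled_t(g1, g2):
    """Student's two-sample t statistic using a pooled variance estimate."""
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * statistics.variance(g1)
                  + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return (statistics.fmean(g1) - statistics.fmean(g2)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
f_stat = one_way_anova_f([a1, a2])
t_stat = pooled_t(a1, a2)
print(f"F = {f_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```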
Regression problems
Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the restricted model, and model 2 is the unrestricted one. That is, model 1 has $p_1$ parameters, and model 2 has $p_2$ parameters, where $p_2 > p_1$, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. (We use the convention that any constant parameter in a model is included when counting the parameters. For instance, the simple linear model $y = mx + b$ has $p = 2$ under this convention.) The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F-test.
If there are $n$ data points to estimate parameters of both models from, then one can calculate the F statistic, given by
$$F = \frac{\left(\frac{RSS_1 - RSS_2}{p_2 - p_1}\right)}{\left(\frac{RSS_2}{n - p_2}\right)}$$
where $RSS_i$ is the residual sum of squares of model $i$. If your regression model has been calculated with weights, then replace $RSS_i$ with $\chi^2$, the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, $F$ will have an F distribution, with $(p_2 - p_1, n - p_2)$ degrees of freedom. The null hypothesis is rejected if the $F$ calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
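A minimal sketch of the nested-model F statistic, assuming model 1 is a constant (p1 = 1) and model 2 is the straight line y = mx + b (p2 = 2), both fitted by ordinary least squares to hypothetical data; the function names are illustrative.

```python
def rss_mean_only(y):
    """Model 1 (p1 = 1): y = b, fitted by the sample mean."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def rss_straight_line(x, y):
    """Model 2 (p2 = 2): y = m*x + b, fitted by ordinary least squares."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))

def nested_f(rss1, rss2, p1, p2, n):
    """F = ((RSS1 - RSS2) / (p2 - p1)) / (RSS2 / (n - p2))."""
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # hypothetical data
rss1, rss2 = rss_mean_only(y), rss_straight_line(x, y)
F = nested_f(rss1, rss2, p1=1, p2=2, n=len(y))
print(f"RSS1 = {rss1:.2f}, RSS2 = {rss2:.2f}, F = {F:.2f} on (1, 3) df")
```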
One-way ANOVA example
Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.
a1 a2 a3
6 8 13
8 12 9
4 9 11
5 11 8
3 6 7
4 8 12
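The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:
Step 1: Calculate the mean within each group:
$$\bar{Y}_1 = \tfrac{1}{6}(6 + 8 + 4 + 5 + 3 + 4) = 5, \quad \bar{Y}_2 = \tfrac{1}{6}(8 + 12 + 9 + 11 + 6 + 8) = 9, \quad \bar{Y}_3 = \tfrac{1}{6}(13 + 9 + 11 + 8 + 7 + 12) = 10$$
Step 2: Calculate the overall mean (taking the mean of the group means is valid here because every group has the same number of observations):
$$\bar{Y} = \frac{\sum_i \bar{Y}_i}{a} = \frac{5 + 9 + 10}{3} = 8$$
where $a$ is the number of groups.
Step 3: Calculate the "between-group" sum of squares:
$$S_B = n\sum_i (\bar{Y}_i - \bar{Y})^2 = 6\left[(5-8)^2 + (9-8)^2 + (10-8)^2\right] = 84$$
where $n$ is the number of data values per group.
The between-group degrees of freedom is one less than the number of groups,
$$f_B = a - 1 = 2,$$
so the between-group mean square value is
$$MS_B = S_B / f_B = 84 / 2 = 42.$$
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group: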
a1 a2 a3
6−5=1 8−9=−1 13−10=3
8−5=3 12−9=3 9−10=−1
4−5=−1 9−9=0 11−10=1
5−5=0 11−9=2 8−10=−2
3−5=−2 6−9=−3 7−10=−3
4−5=−1 8−9=−1 12−10=2
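The within-group sum of squares is the sum of squares of all 18 values in this table:
$$S_W = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4) = 68$$
The within-group degrees of freedom is
$$f_W = a(n - 1) = 3(6 - 1) = 15.$$
Thus the within-group mean square value is
$$MS_W = S_W / f_W = 68 / 15 \approx 4.53.$$
Step 5: The F-ratio is
$$F = \frac{MS_B}{MS_W} \approx \frac{42}{4.53} \approx 9.3.$$
The critical value is the number that the test statistic must exceed to reject the null hypothesis. In this case, $F_{crit}(2,15) = 3.68$ at $\alpha = 0.05$. Since $F = 9.3 > 3.68$, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.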
After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is $\sqrt{4.53/6 + 4.53/6} \approx 1.2$. Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
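The five steps and the post-hoc standard error can be reproduced in a few lines of Python as a check on the arithmetic. The sketch assumes the balanced design of the example (6 observations per group), which is what allows the overall mean to be taken as the mean of the group means.

```python
import math
import statistics

groups = {
    "a1": [6, 8, 4, 5, 3, 4],
    "a2": [8, 12, 9, 11, 6, 8],
    "a3": [13, 9, 11, 8, 7, 12],
}
a = len(groups)                   # number of groups
n = len(groups["a1"])             # observations per group (balanced design)
means = {name: statistics.fmean(ys) for name, ys in groups.items()}
grand_mean = statistics.fmean(means.values())   # valid because the group sizes are equal

ss_between = n * sum((m - grand_mean) ** 2 for m in means.values())
ss_within = sum((y - means[name]) ** 2 for name, ys in groups.items() for y in ys)
ms_between = ss_between / (a - 1)
ms_within = ss_within / (a * (n - 1))
F = ms_between / ms_within

# Standard error of the difference between two group means (post-hoc comparison)
se_diff = math.sqrt(ms_within / n + ms_within / n)
print(f"S_B = {ss_between:.0f}, S_W = {ss_within:.0f}, F = {F:.2f}, SE of difference = {se_diff:.2f}")
```

It reports S_B = 84, S_W = 68, F ≈ 9.26 and a standard error of about 1.23, matching the rounded figures in the text.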
Note: $F(x, y)$ denotes an F-distribution cumulative distribution function with $x$ degrees of freedom in the numerator and $y$ degrees of freedom in the denominator.
ANOVA's robustness with respect to Type I errors for departures from population normality
The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance.
It is often stated in popular literature that none of these F-tests are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts.[5] Furthermore, it is also claimed that if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely.[6]
However, this is a misconception, based on work done in the 1950s and earlier. The first comprehensive investigation of the issue by Monte Carlo simulation was Donaldson (1966).[7] He showed that under the usual departures (positive skew, unequal variances) "the F-test is conservative", and so is less likely than it should be to find that a variable is significant. However, as either the sample size or the number of cells increases, "the power curves seem to converge to that based on the normal distribution". More detailed work was done by Tiku (1971).[8] He found that "The non-normal theory power of F is found to differ from the normal theory power by a correction term which decreases sharply with increasing sample size." The problem of non-normality, especially in large samples, is far less serious than popular articles would suggest.
The current view is that "Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research."[9]
For nonparametric alternatives in the factorial layout, see Sawilowsky.[10] For more discussion, see ANOVA on ranks.

References
1. Lomax, Richard G. (2007). Statistical Concepts: A Second Course, p. 10. ISBN 0-8058-5850-4.
2. Box, G. E. P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335. doi:10.1093/biomet/40.3-4.318. JSTOR 2333350.
3. Markowski, Carol A.; Markowski, Edward P. (1990). "Conditions for the Effectiveness of a Preliminary Test of Variance". The American Statistician 44 (4): 322–326. doi:10.2307/2684360. JSTOR 2684360.
4. Sawilowsky, S. (2002). "Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means When σ1² ≠ σ2²". Journal of Modern Applied Statistical Methods, 1(2), 461–472.
5. Blair, R. C. (1981). "A reaction to 'Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance.'" Review of Educational Research, 51, 499–507.
6. Randolf, E. A., & Barcikowski, R. S. (1989, November). "Type I error rate when real study values are used as population parameters in a Monte Carlo study". Paper presented at the 11th annual meeting of the Mid-Western Educational Research Association, Chicago.
7. https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM5072.pdf
8. Tiku, M. L. (1971). "Power Function of the F-Test Under Non-Normal Situations". Journal of the American Statistical Association, Vol. 66, No. 336 (Dec. 1971), p. 913.
9. https://www.statsoft.com/textbook/elementary-statistics-concepts/
10. Sawilowsky, S. (1990). Nonparametric tests of interaction in experimental design. Review of Educational Research, 25(20–59).
