This document discusses hypothesis testing and interpretation of data. It provides an example of two lecturers, Sandy and Mandy, who want to test whether providing seminar classes in addition to lectures improves student performance compared to lectures alone. The document outlines the steps in hypothesis testing: 1) Identify the research problem and variables, 2) Specify the null and alternative hypotheses, 3) Choose a significance level, 4) Identify the test statistic, 5) Determine the rejection region, and 6) Select the appropriate statistical test. It defines type 1 and type 2 errors and explains key concepts like the null hypothesis, alternative hypothesis, and significance level.
Hypothesis Testing. Inferential Statistics pt. 2John Labrador
A hypothesis test is a statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. A hypothesis test examines two opposing hypotheses about a population: the null hypothesis and the alternative hypothesis.
INTRODUCTION
CHARACTERISTICS OF A HYPOTHESIS
CRITERIA FOR HYPOTHESIS CONSTRUCTION
STEPS IN HYPOTHESIS TESTING
SOURCES OF HYPOTHESIS
APPROACHES TO HYPOTHESIS TESTING
THE LOGIC OF HYPOTHESIS TESTING
TYPES OF ERRORS IN HYPOTHESIS
Hypothesis: meaning, need for hypothesis, qualities of a good hypothesis, types of hypothesis, null and alternative hypothesis, sources of hypothesis, formulation of hypothesis, hypothesis testing
A hypothesis is an interpretation of the data we are interested in. Using hypothesis testing, we try to draw inferences about the population from sample data. A hypothesis test assesses two mutually exclusive statements about a population to determine which statement is best supported by the sample data.
Hypothesis testing and estimation are used to reach conclusions about a population by examining a sample of that population.
Hypothesis testing is widely used in medicine, dentistry, health care, biology and other fields as a means to draw conclusions about the nature of populations.
A hypothesis is usually considered the principal instrument in research and quality control. Its main function is to suggest new experiments and observations. In fact, many experiments are carried out with the deliberate object of testing hypotheses. Decision makers often face situations in which they want to test a hypothesis on the basis of available information and then make decisions based on the result. In Six Sigma methodology, hypothesis testing is a substantial tool used in the analysis phase of a Six Sigma project so that improvement proceeds in the right direction.
Hypothesis Testing
Definitions:
A statistical hypothesis is a guess about a population parameter. The guess may or may not be
true.
The null hypothesis, written H0, is a statistical hypothesis that states that there is no
difference between a parameter and a specific value, or that there is no difference between
two parameters.
The alternative hypothesis, written H1 or HA, is a statistical hypothesis that specifies a
specific difference between a parameter and a specific value, or that there is a difference
between two parameters.
Example 1:
A medical researcher is interested in finding out whether a new medication will have
undesirable side effects. She is particularly concerned with the pulse rate of patients who
take the medication. The research question is, will the pulse rate increase, decrease, or
remain the same after a patient takes the medication?
Since the researcher knows that the mean pulse rate for the population under study is 82
beats per minute, the hypotheses for this study are:
H0: µ = 82
HA: µ ≠ 82
The null hypothesis specifies that the mean will remain unchanged and the alternative
hypothesis states that it will be different. This test is called a two-tailed test since the
possible side effects could be to raise or lower the pulse rate. Notice that this is a
non-directional hypothesis. The rejection region lies in both tails. We divide alpha in two
and place half in each tail.
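The split of alpha between the two tails can be sketched in a few lines of Python. This is only an illustration, assuming the test statistic follows a standard normal distribution; the function name two_tailed_critical_values and the choice of alpha = 0.05 are assumptions, not part of the original example.

```python
from statistics import NormalDist


def two_tailed_critical_values(alpha):
    """Return the (lower, upper) z cut-offs that leave alpha/2 in each tail."""
    # The upper cut-off has area 1 - alpha/2 to its left on the standard normal curve.
    upper = NormalDist().inv_cdf(1 - alpha / 2)
    # The standard normal curve is symmetric, so the lower cut-off is the mirror image.
    return -upper, upper


lower, upper = two_tailed_critical_values(0.05)
print(round(lower, 2), round(upper, 2))
```

With alpha = 0.05 each tail holds 0.025, and the null hypothesis would be rejected when the z statistic falls below the lower cut-off or above the upper one.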
Example 2:
An entrepreneur invents an additive to increase the life of an automobile battery. If the
mean lifetime of the automobile battery is 36 months, then his hypotheses are:
H0: µ ≤ 36
HA: µ > 36
Here, the entrepreneur is only interested in increasing the lifetime of the batteries, so his
alternative hypothesis is that the mean is greater than 36 months. The null hypothesis is
that the mean is less than or equal to 36 months. This test is one-tailed since the interest
is only in an increased lifetime. Notice that the direction of the inequality in the alternate
hypothesis points to the right, same as the area of the curve that forms the rejection
region.
Example 3:
A landlord who wants to lower heating bills in a large apartment complex is considering
using a new type of insulation. If the current average of the monthly heating bills is $78,
his hypotheses about heating costs with the new insulation are:
H0: µ ≥ 78
HA: µ < 78
This test is also a one-tailed test since the landlord is interested only in lowering heating
costs. Notice that the direction of the inequality in the alternate hypothesis points to the
left, same as the area of the curve that forms the rejection region.
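For the one-tailed cases in Examples 2 and 3, the whole of alpha sits in a single tail. The sketch below assumes a z statistic with a standard normal distribution; the function name and the 'greater'/'less' labels are illustrative assumptions.

```python
from statistics import NormalDist


def one_tailed_critical_value(alpha, direction):
    """z cut-off for a one-tailed test; direction is 'greater' or 'less'."""
    # All of alpha is placed in one tail, so the cut-off has area 1 - alpha to its left.
    z = NormalDist().inv_cdf(1 - alpha)
    if direction == "greater":   # HA: mu > value, rejection region in the right tail
        return z
    if direction == "less":      # HA: mu < value, rejection region in the left tail
        return -z
    raise ValueError("direction must be 'greater' or 'less'")


# Example 2 (battery life) uses the right tail; Example 3 (heating bills) uses the left tail.
print(round(one_tailed_critical_value(0.05, "greater"), 3))
print(round(one_tailed_critical_value(0.05, "less"), 3))
```

The sign of the cut-off follows the direction of the inequality in the alternative hypothesis, as the examples describe.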
Study Design:
After stating the hypotheses, the researcher's next step is to design the study. In designing
the study, the researcher selects an appropriate statistical test, chooses a level of
significance, and formulates a plan for conducting the study.
1. Page 1 of 52
Hypothesis testing and interpretation of data
Testing Of Hypothesis
The basic logic of hypothesis testing is to test whether the sample data support or contradict
the research question. When a researcher conducts quantitative research, he/she is attempting
to answer a research question or hypothesis that has been formulated. One method of evaluating
this research question is via a process called hypothesis testing, which is sometimes also
referred to as significance testing.
Example :
Two lecturers, Sandy and Mandy, each think that they use the best method to teach their students.
Each lecturer has 50 statistics students who are studying for a graduate degree in management. In
Sandy's class, students have to attend one lecture and one seminar class every week, whilst
Mandy believes that lectures are sufficient by themselves and that students can study in their
own time. This is the first year that Sandy has given seminars, but since they take up a lot of
her time, she wants to make sure that she is not wasting her time and that seminars improve the
students' performance.
The Research Hypothesis
The first step in hypothesis testing is to set a research hypothesis. In Sandy and Mandy's study,
the aim is to examine the effect that two different teaching methods, providing both lectures
and seminar classes (Sandy) and providing lectures by themselves (Mandy), had on the
performance of the students. More specifically, they want to determine whether performance is
different between the two teaching methods. Whilst Mandy is skeptical about the
effectiveness of seminars, Sandy clearly believes that her students do better than those in
Mandy's class. This leads to the following research hypothesis:
Research hypothesis: When students attend seminar classes, in addition to lectures, their
performance increases.
By taking a hypothesis testing approach, Sandy and Mandy want to generalize their result to a
population (all students) rather than just the students in their sample. However, in order to use
hypothesis testing, one needs to re-state the research hypothesis as a null and an alternative
hypothesis.
Null hypothesis: the null hypothesis (H0) is a hypothesis which the researcher tries to disprove,
reject or nullify. A null hypothesis is “the hypothesis that there is no relationship between two or
more variables”, symbolized as H0.
Alternative hypothesis: the alternative, or research, hypothesis proposes a relationship between two
or more variables, symbolized as H1.
Decision errors
Two types of error can result from a hypothesis test.
Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true.
The probability of committing a Type I error is called the significance level. This probability is
also called alpha, and is often denoted by α.
Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis
which is false. The probability of committing a Type II error is called beta, and is often
denoted by β. The probability of not committing a Type II error is called the power of
the test.
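The relationship between α, β and power can be illustrated for a simple case. The sketch below assumes an upper-tailed z-test with a known population standard deviation; the function name and all numbers (mu0, mu1, sigma, n) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def beta_and_power(mu0, mu1, sigma, n, alpha):
    """Type II error rate and power of an upper-tailed z-test (H0: mu = mu0, H1: mu > mu0)."""
    std_normal = NormalDist()
    # Cut-off on the z scale that fixes the Type I error rate at alpha.
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Shift of the z statistic when the true mean is mu1 rather than mu0.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    # Beta: probability that z stays below the cut-off although H0 is false.
    beta = std_normal.cdf(z_alpha - shift)
    return beta, 1 - beta


# Hypothetical numbers: null mean 36, true mean 38, sigma 6, sample of 40, alpha 5%.
beta, power = beta_and_power(mu0=36, mu1=38, sigma=6, n=40, alpha=0.05)
print(round(beta, 3), round(power, 3))
```

Under these assumptions, lowering alpha moves the cut-off outward and raises beta, which is the usual trade-off between the two error types for a fixed sample size.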
Steps/procedures in Hypothesis Testing
1. Identify the research problem:
The first step is to state the research problem. The research problem needs to identify the
population of interest and the variables under investigation.
Example of research problem: To find out the effectiveness of two teaching methods (the
lecture-only method and the lecture-cum-seminar method) with reference to the exam marks of the
students.
In the above research problem, the population of interest refers to the students, and the variables
include the teaching method and the marks.
This step enables the researcher to define not only what is to be tested but also which variable(s)
will be used in sample data collection. The type of variable(s), whether categorical, discrete or
continuous, further defines the statistical test which can be performed on the collected data.
2. Specify the null and alternative hypotheses:
The research problem or question is converted into a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually
exclusive. That is, if one is true, the other must be false.
(a) Null Hypothesis: A null hypothesis (H0) is a statement that declares the observed difference is
due to “chance”. It is the hypothesis the researcher hopes to reject or disprove.
A null hypothesis states that there is no relationship between two or more variables. Put
simply, the null is the opposite of the alternative hypothesis (H1).
Example: “There is no difference between the two methods of teaching (only lecture method,
and lecture-cum-seminar method) on the scoring of marks of students.”
(b) Alternative Hypothesis:
The alternative hypothesis proposes a relationship between two or more variables, symbolized as
H1.
Example: “The lecture-cum-seminar method improves the scoring of marks of students as
compared to the only lecture method.”
“Note that the two hypotheses we propose to test must be mutually exclusive, i.e., when one is
true the other must be false. They must also be exhaustive; they must include all
possible occurrences.”
From the above, it is clear that the null hypothesis is a hypothesis of no difference. The main
task in hypothesis testing is to decide whether or not to reject the null hypothesis. The alternative
hypothesis specifies a definite relationship between the two variables. Only one alternative
hypothesis is tested against the null hypothesis.
3. Significance Level:
After formulating the hypotheses, the researcher must determine a certain level of significance.
The confidence with which a null hypothesis is rejected or not rejected depends on the level of
significance.
Generally, the level of significance is set at 5% or 1%:
A significance level of 5% means the researcher accepts a 5% risk of making a wrong decision by
rejecting a true null hypothesis, that is, about 5 times out of 100 occasions.
A significance level of 1% means the risk of making this wrong decision is 1%. This means the
researcher may wrongly reject a true null hypothesis once out of 100 occasions. Therefore, a 1%
level of significance provides greater confidence in a decision to reject the null
hypothesis than a 5% level of significance.
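What a significance level means in practice can be checked with a small simulation. The sketch below is an assumption-laden illustration: it draws samples from a standard normal population for which the null hypothesis (mean 0) is true, runs a two-tailed z-test on each, and counts how often the true null is rejected. The function name, trial count, sample size and seed are all arbitrary choices.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha, trials=5000, n=20, seed=0):
    """Share of samples from a true-null population that a two-tailed z-test rejects."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean 0, standard deviation 1) is true for every sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)
        if abs(z) > critical:
            rejections += 1
    return rejections / trials


print(type_one_error_rate(0.05))
print(type_one_error_rate(0.01))
```

The observed rejection rates should land close to 5% and 1% respectively, which is the sense in which the 1% level carries a smaller risk of wrongly rejecting a true null hypothesis.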
4. Test Statistic:
The test statistic is a statistic used to test the null hypothesis. The researcher needs to identify
a test statistic that can be used to assess the truth of the null hypothesis. It is used to decide
whether the null hypothesis should be rejected or not.
The test statistic is calculated from the collected data. There are different types of test statistics.
For instance, the z statistic will compare the observed sample mean to an expected population mean
μ0. Large test statistics indicate data are far from expected, providing evidence against the null
hypothesis and in favor of the alternative hypothesis.
Every statistical test works in the same way. Based on the sample data, it gives the probability
(p-value) of observing data at least as extreme as the sample if the null hypothesis is true. When
the p-value is low, the sample data are statistically significant and provide evidence against
the null hypothesis. When the p-value is high, it suggests that the collected
data are within the normal range.
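As an illustration of a test statistic and its p-value, the sketch below computes a one-sample z statistic and a two-sided p-value. It assumes the population standard deviation is known; the marks, the null mean of 60 and sigma = 8 are hypothetical figures, not data from the lecturers' study.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_statistic(sample, mu0, sigma):
    """Distance of the sample mean from mu0, measured in standard errors."""
    return (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))


def two_sided_p_value(z):
    """Probability of a z at least this far from zero, in either direction, under H0."""
    return 2 * NormalDist().cdf(-abs(z))


marks = [62, 71, 58, 66, 75, 69, 64, 73]   # hypothetical exam marks
z = z_statistic(marks, mu0=60, sigma=8)     # assumed null mean and known sigma
p = two_sided_p_value(z)
print(round(z, 2), round(p, 3))
```

A large absolute z gives a small p-value, matching the statement that large test statistics count as evidence against the null hypothesis.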
5. Region of Acceptance and Region of Rejection:
The region of acceptance is a range of values. If the test statistic falls within the region of
acceptance, the null hypothesis is not rejected. The region of acceptance is defined so that the
chance of making a Type I error is equal to the alpha (α) level of significance.
Type I error: a rejection of a true null hypothesis.
The set of values outside the region of acceptance is called the region of rejection. If the test
statistic falls within the region of rejection, the null hypothesis is rejected at the alpha (α) level
of significance.
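The two regions can also be expressed on the scale of the sample mean rather than the z statistic. The following sketch assumes a two-tailed z-test with a known population standard deviation; the function names and the numbers (null mean 100, sigma 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def acceptance_region(mu0, sigma, n, alpha):
    """Range of sample means for which a two-tailed z-test does not reject H0."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


def decide(sample_mean, region):
    """Apply the decision rule: inside the region keeps H0, outside rejects it."""
    low, high = region
    return "fail to reject H0" if low <= sample_mean <= high else "reject H0"


region = acceptance_region(mu0=100, sigma=15, n=36, alpha=0.05)  # hypothetical values
print(tuple(round(bound, 2) for bound in region))
print(decide(103, region))
print(decide(106, region))
```

Everything outside the returned interval is the region of rejection, so the probability of landing there when the null hypothesis is true equals alpha.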
6. Select an Appropriate Test:
A hypothesis test may be one-tailed or two-tailed. Whether the test is one-sided or two-sided
depends on the alternative hypothesis and the nature of the problem.
A test of a statistical hypothesis, where the region of rejection is on only one side of the sampling
distribution, is called a one-tailed test. For example, suppose the null hypothesis states that the
mean is less than or equal to 10. The alternative hypothesis would be that the mean is greater than
10. The region of rejection would consist of a range of numbers located on the right side of the
sampling distribution; that is, a set of numbers greater than 10.
In simple words, in a one-tailed test, the values of the test statistic that lead to rejection of the
null hypothesis fall on only one side of the sampling distribution curve.
Significance Level
In hypothesis testing, the significance level is the criterion used for rejecting the null hypothesis.
The significance level is used in hypothesis testing as follows: First, the difference between the
results of the experiment and the null hypothesis is determined. Then, assuming the null
hypothesis is true, the probability of a difference that large or larger is computed. Finally, this
probability is compared to the significance level. If the probability is less than or equal to the
significance level, then the null hypothesis is rejected and the outcome is said to be statistically
significant. Traditionally, experimenters have used either the 0.05 level (sometimes called the
5% level) or the 0.01 level (1% level), although the choice of levels is largely subjective. The
lower the significance level, the more the data must diverge from the null hypothesis to be
significant. Therefore, the 0.01 level is more conservative than the 0.05 level. The Greek letter
alpha (α) is sometimes used to indicate the significance level.
5) Identify the rejection region
• Is it an upper, lower, or two-tailed test?
• Determine the critical value associated with α, the level of significance of the test
The third step is to compute the probability value (also known as
the p value). This is the probability of obtaining a sample statistic as
different or more different from the parameter specified in the null
hypothesis given that the null hypothesis is true.
PARAMETRIC TESTS
1. Descriptive Statistics – overview of the attributes of a data set. These include measurements
of central tendency (frequency histograms, mean, median, & mode) and dispersion (range,
variance & standard deviation)
2. Inferential Statistics - provide measures of how well data support hypothesis and if data are
generalizable beyond what was tested (significance tests)
Data: Observations recorded during research
Types of data:
1. Nominal data: synonymous with categorical data; assigned names/categories based on
characters, without ranking between categories. Ex. male/female, yes/no, death/survival
2. Ordinal data: ordered or graded data, expressed as scores or ranks.
Ex. pain graded as mild, moderate and severe
3. Interval data: an equal and definite interval between two measurements;
it can be continuous or discrete.
Ex. weight expressed as 20, 21, 22, 23, 24;
the interval between 20 & 21 is the same as between 23 & 24
Parametric hypothesis tests are frequently used to measure the quality of sample parameters or to test
whether estimates of a given parameter are equal for two samples.
Parametric hypothesis tests set up a null hypothesis against an alternative hypothesis, testing, for
instance, whether or not the population mean is equal to a certain value, and then using appropriate
statistics to calculate the probability of observing data at least as extreme as the sample if the null
hypothesis is true. You can then reject or fail to reject the null hypothesis based on the calculated
probability.
Z test
z-test is based on the normal probability distribution and is used for judging the significance of several
statistical measures, particularly the mean. The relevant test statistic, z, is worked out and compared
with its probable value (to be read from the table showing the area under the normal curve) at a specified
level of significance for judging the significance of the measure concerned. This is the most frequently
used test in research studies. This test is used even when the binomial distribution or t-distribution is
applicable, on the presumption that such a distribution tends to approximate the normal distribution as 'n'
becomes larger. z-test is generally used for comparing the mean of a sample to some hypothesised mean for
the population in case of a large sample, or when the population variance is known. z-test is also used for
judging the significance of the difference between the means of two independent samples in case of large
samples, or when the population variance is known. z-test is also used for comparing the sample proportion
to a theoretical value of the population proportion or for judging the difference in proportions of two
independent samples when n happens to be large. Besides, this test may be used for judging the
significance of median, mode, coefficient of correlation and several other measures.
t-test is based on the t-distribution and is considered an appropriate test for judging the significance of
a sample mean or for judging the significance of the difference between the means of two samples in case of
small sample(s) when the population variance is not known (in which case we use the variance of the sample
as an estimate of the population variance). In case two samples are related, we use the paired t-test (or
what is known as the difference test) for judging the significance of the mean of the difference between
the two related samples. It can also be used for judging the significance of the coefficients of simple and
partial correlations. The relevant test statistic, t, is calculated from the sample data and then compared
with its probable value based on the t-distribution (to be read from the table that gives probable values
of t for different levels of significance for different degrees of freedom) at a specified level of
significance for the concerned degrees of freedom, for accepting or rejecting the null hypothesis. It may
be noted that the t-test applies only in case of small sample(s) when the population variance is unknown.
A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis
can be approximated by a normal distribution. Because of the central limit theorem, many test statistics are
approximately normally distributed for large samples. For each significance level, the Z-test has a single
critical value (for example, 1.96 for 5% two-tailed), which makes it more convenient than the Student's
t-test, which has separate critical values for each sample size. Therefore, many statistical tests can be
conveniently performed as approximate Z-tests if the sample size is large or the population variance is
known. If the population variance is unknown (and therefore has to be estimated from the sample itself)
and the sample size is not large (n < 30), the Student's t-test may be more appropriate.
If T is a statistic that is approximately normally distributed under the null hypothesis, the next step in
performing a Z-test is to estimate the expected value θ of T under the null hypothesis, and then obtain
an estimate s of the standard deviation of T. After that the standard score Z = (T − θ) / s is calculated,
from which one-tailed and two-tailed p-values can be calculated as Φ(−Z) (for upper-tailed tests), Φ(Z)
(for lower-tailed tests) and 2Φ(−|Z|) (for two-tailed tests), where Φ is the standard normal cumulative
distribution function.
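The standard score and the three p-value formulas above translate directly into code. This is a minimal sketch using the standard normal CDF from Python's statistics module; the function name z_test_p_values and the example inputs are assumptions made for illustration.

```python
from statistics import NormalDist


def z_test_p_values(t, theta, s):
    """Standard score Z = (T - theta) / s with its upper, lower and two-tailed p-values."""
    z = (t - theta) / s
    phi = NormalDist().cdf              # standard normal cumulative distribution function
    return {
        "z": z,
        "upper_tailed": phi(-z),        # evidence that T lies above theta
        "lower_tailed": phi(z),         # evidence that T lies below theta
        "two_tailed": 2 * phi(-abs(z)), # evidence that T differs from theta either way
    }


# Hypothetical inputs: observed statistic 1.96 standard deviations above its null value.
result = z_test_p_values(t=1.96, theta=0.0, s=1.0)
print({name: round(value, 4) for name, value in result.items()})
```

The upper-tailed and lower-tailed p-values always add up to 1, and the two-tailed value is twice the smaller of the two.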
Use in location testing
The term "Z-test" is often used to refer specifically to the one-sample location test comparing the mean
of a set of measurements to a given constant. If the observed data X1, ..., Xn are (i) uncorrelated, (ii)
have a common mean μ, and (iii) have a common variance σ², then the sample average X̄ has mean μ and
variance σ² / n. If our null hypothesis is that the mean value of the population is a given number μ0, we
can use X̄ − μ0 as a test statistic, rejecting the null hypothesis if X̄ − μ0 is large.
To calculate the standardized statistic Z = (X̄ − μ0) / s, we need to either know or have an approximate
value for σ², from which we can calculate s² = σ² / n. In some applications, σ² is known, but this is
uncommon. If the sample size is moderate or large, we can substitute the sample variance for σ², giving
a plug-in test. The resulting test will not be an exact Z-test since the uncertainty in the sample variance
is not accounted for; however, it will be a good approximation unless the sample size is small. A t-test
can be used to account for the uncertainty in the sample variance when the sample size is small and
the data are exactly normal. There is no universal constant at which the sample size is generally
considered large enough to justify use of the plug-in test. Typical rules of thumb range from 20 to 50
samples. For larger sample sizes, the t-test procedure gives almost identical p-values as the Z-test
procedure.
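The plug-in test described above can be sketched as follows, substituting the sample variance for σ². It is an approximation intended for moderate or large samples, as the text notes; the function name and the sample (the integers 1 to 40, tested against a null mean of 15) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def plug_in_z_test(sample, mu0):
    """Approximate two-tailed Z-test that plugs the sample variance in for sigma squared."""
    n = len(sample)
    # s squared = (sample variance) / n estimates the variance of the sample average.
    s = sqrt(variance(sample) / n)
    z = (fmean(sample) - mu0) / s
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed


measurements = list(range(1, 41))   # hypothetical data: 40 observations
z, p = plug_in_z_test(measurements, mu0=15)
print(round(z, 2), round(p, 4))
```

For a small sample, a t-test would be the more suitable choice because it accounts for the uncertainty in the estimated variance.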
Other location tests that can be performed as Z-tests are the two-sample location test and the paired
difference test.
Conditions
For the Z-test to be applicable, certain conditions must be met.
- Nuisance parameters should be known, or estimated with high accuracy (an example of a nuisance parameter would be the standard deviation in a one-sample location test). Z-tests focus on a single parameter, and treat all other unknown parameters as being fixed at their true values. In practice, due to Slutsky's theorem, "plugging in" consistent estimates of nuisance parameters can be justified. However, if the sample size is not large enough for these estimates to be reasonably accurate, the Z-test may not perform well.
- The test statistic should follow a normal distribution. Generally, one appeals to the central limit theorem to justify assuming that a test statistic varies normally. There is a great deal of statistical research on the question of when a test statistic varies approximately normally. If the variation of the test statistic is strongly non-normal, a Z-test should not be used.
If estimates of nuisance parameters are plugged in as discussed above, it is important to use estimates appropriate for the way the data were sampled. In the special case of Z-tests for the one or two sample location problem, the usual sample standard deviation is only appropriate if the data were collected as an independent sample.
In some situations, it is possible to devise a test that properly accounts for the variation in plug-in estimates of nuisance parameters. In the case of one and two sample location problems, a t-test does this.
Example
Suppose that in a particular geographic region, the mean and standard deviation of scores on a reading test are 100 points and 12 points, respectively. Our interest is in the scores of 55 students in a particular school who received a mean score of 96. We can ask whether this mean score is significantly lower than the regional mean; that is, are the students in this school comparable to a simple random sample of 55 students from the region as a whole, or are their scores surprisingly low?
We begin by calculating the standard error of the mean:
$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{55}} = \frac{12}{7.42} = 1.62$$
where σ is the population standard deviation.
Next we calculate the z-score, which is the distance from the sample mean to the population mean in units of the standard error:
$$z = \frac{\bar{X} - \mu}{\mathrm{SE}} = \frac{96 - 100}{1.62} = -2.47$$
In this example, we treat the population mean and variance as known, which would be appropriate if all students in the region were tested. When population parameters are unknown, a t test should be conducted instead.
The classroom mean score is 96, which is −2.47 standard error units from the population mean of 100. Looking up the z-score in a table of the standard normal distribution, we find that the probability of observing a standard normal value below −2.47 is approximately 0.5 − 0.4932 = 0.0068. This is the one-sided p-value for the null hypothesis that the 55 students are comparable to a simple random sample from the population of all test-takers. The two-sided p-value is approximately 0.014 (twice the one-sided p-value).
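The arithmetic of this example can be reproduced with a few lines of Python. The sketch below uses only the numbers given in the text; the variable names and the helper `phi` are illustrative. Because it keeps the unrounded z-score instead of a table lookup at −2.47, the p-values it prints may differ from the quoted ones in the last digit.

```python
import math


def phi(x: float) -> float:
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


mu, sigma = 100.0, 12.0      # regional mean and standard deviation
n, sample_mean = 55, 96.0    # school sample size and mean score

se = sigma / math.sqrt(n)          # standard error of the mean
z = (sample_mean - mu) / se        # distance from mu in standard-error units
p_one_sided = phi(z)               # probability of a value this low or lower
p_two_sided = 2.0 * phi(-abs(z))

print(f"SE = {se:.2f}, z = {z:.2f}")
print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```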
Another way of stating things is that with probability 1 − 0.014 = 0.986, a simple random sample of 55 students would have a mean test score within 4 units of the population mean. We could also say that with 98.6% confidence we reject the null hypothesis that the 55 test takers are comparable to a simple random sample from the population of test-takers.
The Z-test tells us that the 55 students of interest have an unusually low mean test score compared to most simple random samples of similar size from the population of test-takers. A deficiency of this analysis is that it does not consider whether the effect size of 4 points is meaningful. If instead of a classroom, we considered a subregion containing 900 students whose mean score was 99, nearly the same z-score and p-value would be observed. This shows that if the sample size is large enough, very small differences from the null value can be highly statistically significant. See statistical hypothesis testing for further discussion of this issue.
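The comparison between the classroom and the larger subregion can be checked numerically. The following sketch uses the figures from the text; the function name `z_score` is an illustrative choice.

```python
import math


def z_score(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z-score of a sample mean when the population mean and sd are treated as known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


z_classroom = z_score(96.0, 100.0, 12.0, 55)    # 4-point shortfall, 55 students
z_subregion = z_score(99.0, 100.0, 12.0, 900)   # 1-point shortfall, 900 students

print(f"classroom z = {z_classroom:.2f}, subregion z = {z_subregion:.2f}")
```

The two z-scores are close even though the second effect is a quarter the size of the first, which is the point the paragraph makes about large samples.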
Z-tests other than location tests
Location tests are the most familiar Z-tests. Another class of Z-tests arises in maximum likelihood estimation of the parameters in a parametric statistical model. Maximum likelihood estimates are approximately normal under certain conditions, and their asymptotic variance can be calculated in terms of the Fisher information. The maximum likelihood estimate divided by its standard error can be used as a test statistic for the null hypothesis that the population value of the parameter equals zero. More generally, if θ̂ is the maximum likelihood estimate of a parameter θ, and θ0 is the value of θ under the null hypothesis,
$$\frac{\hat{\theta} - \theta_0}{\mathrm{SE}(\hat{\theta})}$$
can be used as a Z-test statistic.
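As one concrete instance, consider a Bernoulli success probability, which is not discussed in the text and is chosen here purely for illustration. Its maximum likelihood estimate is the sample proportion, and the Fisher information gives the standard error √(p̂(1 − p̂)/n). The function name and the counts below are hypothetical.

```python
import math


def wald_z_bernoulli(successes: int, n: int, p0: float):
    """Z statistic (p_hat - p0) / SE(p_hat) for a Bernoulli success probability.

    SE(p_hat) comes from the Fisher information evaluated at the estimate.
    Not usable when p_hat is exactly 0 or 1, because the standard error is then zero.
    """
    p_hat = successes / n                       # maximum likelihood estimate
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)   # estimated standard error
    z = (p_hat - p0) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * Phi(-|z|)
    return z, p_two_sided


# Hypothetical data: 60 successes in 100 trials, null value p0 = 0.5.
z, p = wald_z_bernoulli(successes=60, n=100, p0=0.5)
print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```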
When using a Z-test for maximum likelihood estimates, it is important to be aware that the normal approximation may be poor if the sample size is not sufficiently large. Although there is no simple, universal rule stating how large the sample size must be to use a Z-test, simulation can give a good idea as to whether a Z-test is appropriate in a given situation.
Z-tests are employed whenever it can be argued that a test statistic follows a normal distribution under the null hypothesis of interest. Many non-parametric test statistics, such as U statistics, are approximately normal for large enough sample sizes, and hence are often performed as Z-tests.
F test
The F-test is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. It is also used for judging the significance of multiple correlation coefficients. The test statistic, F, is calculated and compared with its probable value (to be seen in the F-ratio tables for different degrees of freedom for greater and smaller variances at a specified level of significance) for accepting or rejecting the null hypothesis.
An F-test is any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Sir Ronald A. Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.[
Common examples of F-tests
Common examples of the use of F-tests include the study of the following cases:
- The hypothesis that the means of a given set of normally distributed populations, all having the same standard deviation, are equal. This is perhaps the best-known F-test, and plays an important role in the analysis of variance (ANOVA).
- The hypothesis that a proposed regression model fits the data well. See Lack-of-fit sum of squares.
- The hypothesis that a data set in a regression analysis follows the simpler of two proposed linear models that are nested within each other.
In addition, some statistical procedures, such as Scheffé's method for multiple comparisons adjustment in linear models, also use F-tests.
F-test of the equality of two variances
Main article: F-test of equality of variances
The F-test is sensitive to non-normality.[2][3] In the analysis of variance (ANOVA), alternative tests include Levene's test, Bartlett's test, and the Brown–Forsythe test. However, when any of these tests are conducted to test the underlying assumption of homoscedasticity (i.e. homogeneity of variance), as a preliminary step to testing for mean effects, there is an increase in the experiment-wise Type I error rate.[4]
Formula and calculation
Most F-tests arise by considering a decomposition of the variability in a collection of data in terms of sums of squares. The test statistic in an F-test is the ratio of two scaled sums of squares reflecting different sources of variability. These sums of squares are constructed so that the statistic tends to be greater when the null hypothesis is not true. In order for the statistic to follow the F-distribution under the null hypothesis, the sums of squares should be statistically independent, and each should follow a scaled chi-squared distribution. The latter condition is guaranteed if the data values are independent and normally distributed with a common variance.
Multiple-comparison ANOVA problems
The F-test in one-way analysis of variance is used to assess whether the expected values of a quantitative variable within several pre-defined groups differ from each other. For example, suppose that a medical trial compares four treatments. The ANOVA F-test can be used to assess whether any of the treatments is on average superior, or inferior, to the others versus the null hypothesis that all four treatments yield the same mean response. This is an example of an "omnibus" test, meaning that a single test is performed to detect any of several possible differences. Alternatively, we could carry out pairwise tests among the treatments (for instance, in the medical trial example with four treatments we could carry out six tests among pairs of treatments). The advantage of the ANOVA F-test is that we do not need to pre-specify which treatments are to be compared, and we do not need to adjust for making multiple comparisons. The disadvantage of the ANOVA F-test is that if we reject the null hypothesis, we do not know which treatments can be said to be significantly different from the others: if the F-test is performed at level α we cannot state that the treatment pair with the greatest mean difference is significantly different at level α.
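The formula for the one-way ANOVA F-test statistic is
$$F = \frac{\text{explained variance}}{\text{unexplained variance}},$$
or
$$F = \frac{\text{between-group variability}}{\text{within-group variability}}.$$
The "explained variance", or "between-group variability" is
$$\sum_{i=1}^{K} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2 / (K - 1)$$
where Ȳi· denotes the sample mean in the i-th group, ni is the number of observations in the i-th group, Ȳ denotes the overall mean of the data, and K denotes the number of groups.
The "unexplained variance", or "within-group variability" is
$$\sum_{i=1}^{K} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2 / (N - K)$$
where Yij is the j-th observation in the i-th out of K groups and N is the overall sample size. This F-statistic follows the F-distribution with K−1, N−K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.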
Note that when there are only two groups for the one-way ANOVA F-test, F = t² where t is the Student's t statistic.
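The identity F = t² can be verified numerically. The sketch below assumes the equal-variance (pooled) form of Student's t statistic; the function names and the two small samples are hypothetical.

```python
import math
import statistics


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group variability."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    ) / (k - 1)
    within = sum(
        (y - statistics.fmean(g)) ** 2 for g in groups for y in g
    ) / (n_total - k)
    return between / within


def pooled_t(x, y):
    """Two-sample Student's t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        pooled_var * (1.0 / nx + 1.0 / ny)
    )


# Hypothetical data with unequal group sizes.
group_a = [4.1, 5.0, 6.2, 5.5]
group_b = [6.8, 7.4, 5.9, 8.1, 7.0]

f_stat = one_way_f([group_a, group_b])
t_stat = pooled_t(group_a, group_b)
print(f"F = {f_stat:.6f}, t^2 = {t_stat ** 2:.6f}")
```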
Regression problems
Consider two models, 1 and 2, where model 1 is 'nested' within model 2. Model 1 is the Restricted model, and Model 2 is the Unrestricted one. That is, model 1 has p1 parameters, and model 2 has p2 parameters, where p2 > p1, and for any choice of parameters in model 1, the same regression curve can be achieved by some choice of the parameters of model 2. (We use the convention that any constant parameter in a model is included when counting the parameters. For instance, the simple linear model y = mx + b has p = 2 under this convention.) The model with more parameters will always be able to fit the data at least as well as the model with fewer parameters. Thus typically model 2 will give a better (i.e. lower error) fit to the data than model 1. But one often wants to determine whether model 2 gives a significantly better fit to the data. One approach to this problem is to use an F test.
If there are n data points to estimate parameters of both models from, then one can calculate the F statistic, given by
$$F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2) / (p_2 - p_1)}{\mathrm{RSS}_2 / (n - p_2)}$$
where RSSi is the residual sum of squares of model i. If your regression model has been calculated with weights, then replace RSSi with χ², the weighted sum of squared residuals. Under the null hypothesis that model 2 does not provide a significantly better fit than model 1, F will have an F distribution, with (p2−p1, n−p2) degrees of freedom. The null hypothesis is rejected if the F calculated from the data is greater than the critical value of the F-distribution for some desired false-rejection probability (e.g. 0.05). The F-test is a Wald test.
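A small Python sketch of this comparison follows. It assumes an intercept-only model as model 1 (p1 = 1) and a straight line as model 2 (p2 = 2), fitted by ordinary least squares to made-up data; the function name `nested_f_statistic` and the data are hypothetical. The resulting F would then be compared with a critical value of the F-distribution with (p2−p1, n−p2) degrees of freedom, which is not computed here.

```python
def nested_f_statistic(rss1: float, rss2: float, p1: int, p2: int, n: int) -> float:
    """F statistic for comparing a restricted model (1) with an unrestricted model (2)."""
    if not (p2 > p1 and n > p2):
        raise ValueError("need p2 > p1 and n > p2")
    return ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))


# Hypothetical data, roughly linear in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Model 1 (restricted): y = b, fitted by the sample mean.
rss1 = sum((yi - y_bar) ** 2 for yi in y)

# Model 2 (unrestricted): y = m*x + b, fitted by closed-form least squares.
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
rss2 = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

f_stat = nested_f_statistic(rss1, rss2, p1=1, p2=2, n=n)
print(f"RSS1 = {rss1:.4f}, RSS2 = {rss2:.4f}, F = {f_stat:.2f}")
```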
One-way ANOVA example
Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.
a1 a2 a3
6 8 13
8 12 9
4 9 11
5 11 8
3 6 7
4 8 12
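The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:
Step 1: Calculate the mean within each group:
$$\bar{Y}_1 = \tfrac{1}{6}(6 + 8 + 4 + 5 + 3 + 4) = 5$$
$$\bar{Y}_2 = \tfrac{1}{6}(8 + 12 + 9 + 11 + 6 + 8) = 9$$
$$\bar{Y}_3 = \tfrac{1}{6}(13 + 9 + 11 + 8 + 7 + 12) = 10$$
Step 2: Calculate the overall mean:
$$\bar{Y} = \frac{\sum_i \bar{Y}_i}{a} = \frac{5 + 9 + 10}{3} = 8$$
where a is the number of groups.
Step 3: Calculate the "between-group" sum of squares:
$$S_B = n(\bar{Y}_1 - \bar{Y})^2 + n(\bar{Y}_2 - \bar{Y})^2 + n(\bar{Y}_3 - \bar{Y})^2 = 6(5 - 8)^2 + 6(9 - 8)^2 + 6(10 - 8)^2 = 84$$
where n is the number of data values per group.
The between-group degrees of freedom is one less than the number of groups
$$f_B = 3 - 1 = 2$$
so the between-group mean square value is
$$MS_B = 84 / 2 = 42$$
Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group
a1 a2 a3
6−5=1 8−9=−1 13−10=3
8−5=3 12−9=3 9−10=−1
4−5=−1 9−9=0 11−10=1
5−5=0 11−9=2 8−10=−2
3−5=−2 6−9=−3 7−10=−3
4−5=−1 8−9=−1 12−10=2
The within-group sum of squares is the sum of squares of all 18 values in this table
$$S_W = (1 + 9 + 1 + 0 + 4 + 1) + (1 + 9 + 0 + 4 + 9 + 1) + (9 + 1 + 1 + 4 + 9 + 4) = 68$$
The within-group degrees of freedom is
$$f_W = a(n - 1) = 3(6 - 1) = 15$$
Thus the within-group mean square value is
$$MS_W = S_W / f_W = 68 / 15 \approx 4.5$$
Step 5: The F-ratio is
$$F = \frac{MS_B}{MS_W} \approx 42 / 4.5 \approx 9.3$$
The critical value is the number that the test statistic must exceed to reject the null hypothesis. In this case, Fcrit(2,15) = 3.68 at α = 0.05. Since F = 9.3 > 3.68, the results are significant at the 5% significance level. One would reject the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.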
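The five steps can be reproduced in Python from the data table. This is an illustrative sketch: the function name `one_way_anova` and the dictionary keys are not from the original, and the critical value and p-value are not computed because the standard library has no F-distribution. The F-ratio is calculated from the unrounded mean square 68/15, so it comes out slightly below the rounded 9.3.

```python
import statistics


def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F-ratio for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [statistics.fmean(g) for g in groups]

    # Between-group: squared distance of each group mean from the overall mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group: squared distance of each observation from its own group mean.
    ss_within = sum(
        (y - m) ** 2 for g, m in zip(groups, group_means) for y in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "group_means": group_means,
        "grand_mean": grand_mean,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f_ratio": f_ratio,
    }


a1 = [6, 8, 4, 5, 3, 4]
a2 = [8, 12, 9, 11, 6, 8]
a3 = [13, 9, 11, 8, 7, 12]

result = one_way_anova([a1, a2, a3])
for name, value in result.items():
    print(name, value)
```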
After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is √(4.5/6 + 4.5/6) = 1.2. Thus the first group is strongly different from the other groups, as the mean difference is more than three times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.
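The informal post-hoc comparison can be sketched as below, using the group means and the within-group mean square from the example. The variable names are illustrative, and the ratios are plain differences divided by the standard error, not a formal multiple-comparison procedure.

```python
import itertools
import math

ms_within = 68 / 15      # within-group mean square from the example (about 4.5)
n_per_group = 6
means = {"a1": 5.0, "a2": 9.0, "a3": 10.0}

# Standard error of a difference between two group means.
se_diff = math.sqrt(ms_within / n_per_group + ms_within / n_per_group)

# Absolute mean difference in units of the standard error, for each pair.
ratios = {
    (g, h): abs(means[g] - means[h]) / se_diff
    for g, h in itertools.combinations(means, 2)
}

print(f"SE of a difference = {se_diff:.2f}")
for pair, ratio in ratios.items():
    print(pair, f"{ratio:.2f} standard errors")
```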
Note F(x, y) denotes an F-distribution cumulative distribution function with x degrees of freedom in the numerator and y degrees of freedom in the denominator.
ANOVA's robustness with respect to Type I errors for departures from population normality
The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance.
It is often stated in popular literature that none of these F-tests are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts.[5] Furthermore, it is also claimed that if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely.[6]
However, this is a misconception, based on work done in the 1950s and earlier. The first comprehensive investigation of the issue by Monte Carlo simulation was Donaldson (1966).[7] He showed that under the usual departures (positive skew, unequal variances) "the F-test is conservative" so is less likely than it should be to find that a variable is significant. However, as either the sample size or the number of cells increases, "the power curves seem to converge to that based on the normal distribution". More detailed work was done by Tiku (1971).[8] He found that "The non-normal theory power of F is found to differ from the normal theory power by a correction term which decreases sharply with increasing sample size." The problem of non-normality, especially in large samples, is far less serious than popular articles would suggest.
The current view is that "Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research."[9]
For nonparametric alternatives in the factorial layout, see Sawilowsky.[10] For more discussion see ANOVA on ranks.
References
1. Lomax, Richard G. (2007). Statistical Concepts: A Second Course, p. 10. ISBN 0-8058-5850-4.
2. Box, G. E. P. (1953). "Non-Normality and Tests on Variances". Biometrika 40 (3/4): 318–335. doi:10.1093/biomet/40.3-4.318. JSTOR 2333350.
3. Markowski, Carol A.; Markowski, Edward P. (1990). "Conditions for the Effectiveness of a Preliminary Test of Variance". The American Statistician 44 (4): 322–326. doi:10.2307/2684360. JSTOR 2684360.
4. Sawilowsky, S. (2002). "Fermat, Schubert, Einstein, and Behrens-Fisher: The Probable Difference Between Two Means When σ1² ≠ σ2²". Journal of Modern Applied Statistical Methods, 1(2), 461–472.
5. Blair, R. C. (1981). "A reaction to 'Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance.'" Review of Educational Research, 51, 499–507.
6. Randolf, E. A., & Barcikowski, R. S. (1989, November). "Type I error rate when real study values are used as population parameters in a Monte Carlo study". Paper presented at the 11th annual meeting of the Mid-Western Educational Research Association, Chicago.
7. https://www.rand.org/content/dam/rand/pubs/research_memoranda/2008/RM5072.pdf
8. Tiku, M. L. (1971). "Power Function of the F-Test Under Non-Normal Situations". Journal of the American Statistical Association 66 (336): 913.
9. https://www.statsoft.com/textbook/elementary-statistics-concepts/
10. Sawilowsky, S. (1990). "Nonparametric tests of interaction in experimental design". Review of Educational Research, 25(20–59).