Statistics for Social Workers
J. Timothy Stocks
Statistics refers to a branch of mathematics dealing with the direct description of sample or population characteristics and the analysis of population characteristics by inference from samples. It covers a wide range of content, including the collection, organization, and interpretation of data. It is divided into two broad categories: descriptive statistics and inferential statistics.
Descriptive statistics involves the computation of statistics or parameters to describe a sample or a population. All the data are available and used in computation of these aggregate characteristics. This may involve reports of central tendency or variability of single variables (univariate statistics). It also may involve enumeration of the relationships between or among two or more variables (bivariate or multivariate statistics). Descriptive statistics are used to provide information about a large mass of data in a form that may be easily understood. The defining characteristic of descriptive statistics is that the product is a report, not an inference.
Inferential statistics involves the construction of a probable description of the characteristics of a population based on sample data. We compute statistics from a partial set of the population data (a sample) to estimate the population parameters. These estimates are not exact, but we can make reasonable judgments as to how precise our estimates are. Included within inferential statistics is hypothesis testing, a procedure for using mathematics to provide evidence for the existence of relationships between or among variables. This testing is a form of inferential argument.
Descriptive Statistics
Measures of Central Tendency
Measures of central tendency are individual numbers that typify the total set of scores. The three most frequently used measures of central tendency are the arithmetic mean, the mode, and the median.
Arithmetic Mean. The arithmetic mean usually is simply called the mean. It also is called the average. It is computed by adding up all of a set of scores and dividing by the number of scores in the set. The algebraic representation of this is
$$\mu = \frac{\sum X}{n},$$
where $\mu$ represents the population mean, $X$ represents an individual score, and $n$ is the number of scores being added.
The formula for the sample mean is the same except that the mean is represented by the variable letter with a bar above it:
$$\bar{X} = \frac{\sum X}{n}.$$
Following are the numbers of class periods skipped by 20 seventh-graders during 1 week: {1, 6, 2, 6, 15, 20, 3, 20, 17, 11, 15, 18, 8, 3, 17, 16, 14, 17, 0, 10}. We compute the mean by adding up the class periods missed and dividing by 20:
$$\mu = \frac{\sum X}{n} = \frac{219}{20} = 10.95.$$
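The same arithmetic can be checked in a few lines of Python. This is a minimal sketch; the variable names are illustrative and do not come from the text.

```python
# Class periods skipped by 20 seventh-graders during 1 week (data from the text).
scores = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
          15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

total = sum(scores)   # sum of all scores
n = len(scores)       # number of scores
mean = total / n      # arithmetic mean

print(total, n, mean)  # 219 20 10.95
```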
Mode. The mode is the most frequently appearing score. It really is not so much a measure of centrality as it is a measure of typicalness. It is found by organizing scores into a frequency distribution and determining which score has the greatest frequency. Table 6.1 displays the truancy scores arranged in a frequency distribution.
TABLE 6.1 Truancy Scores
Score   Frequency
20      2
19      0
18      1
17      3
16      1
15      2
14      1
13      0
12      0
11      1
10      1
 9      0
 8      1
 7      0
 6      2
 5      0
 4      0
 3      2
 2      1
 1      1
 0      1
Because 17 is the most frequently appearing number, the mode (or modal number) of class periods skipped is 17.
Unlike the mean or median, a distribution of scores can have more than one mode.
Median. If we take all the scores in a set of scores, place them in order from least to greatest, and count in to the middle, then the score in the middle is the median. This is easy enough if there is an odd number of scores. However, if there is an even number of scores, then there is no single score in the middle. In this case, the two middle scores are selected, and their average is the median.
There are 20 scores in the previous example. The median would be the average of the 10th and 11th scores. We use the frequency table to find these scores, which are 11 and 14. Thus, the median is 12.5.
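The mode and median can be recomputed directly from the raw truancy scores. The sketch below uses only the Python standard library (`multimode` requires Python 3.8 or later); the variable names are illustrative.

```python
from collections import Counter
from statistics import median, multimode

scores = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
          15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

# Frequency distribution of the scores that actually occur, highest first.
freq = Counter(scores)
for score in sorted(freq, reverse=True):
    print(score, freq[score])

# multimode returns every score tied for the greatest frequency.
modes = multimode(scores)

# With an even number of scores, the median averages the two middle scores.
ordered = sorted(scores)
half = len(ordered) // 2
middle_pair = ordered[half - 1:half + 1]   # 10th and 11th scores
med = median(scores)

print(modes, middle_pair, med)  # [17] [11, 14] 12.5
```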
Measures of Variability
Whereas measures of central tendency are used to estimate a typical score in a distribution, measures of variability may be thought of as a way in which to measure departure from typicalness. They provide information on how "spread out" scores in a distribution are.
Range. The range is the easiest measure of variability to calculate. It is simply the distance from the minimum (lowest) score in a distribution to the maximum (highest) score. It is obtained by subtracting the minimum score from the maximum score.
Let us compute the range for the following data set:
{2, 6, 10, 14, 18, 22}.
The minimum is 2, and the maximum is 22:
$$\text{Range} = 22 - 2 = 20.$$
Sum of Squares. The sum of squares is a measure of the total amount of variability in a set of scores. Its name tells how to compute it. Sum of squares is short for sum of squared deviation scores. It is represented by the symbol SS.
The formulas for sample and population sums of squares are the same except for sample and population mean symbols:
$$SS = \sum (X - \mu)^2 \quad \text{(population)}, \qquad SS = \sum (X - \bar{X})^2 \quad \text{(sample)}.$$
Using the data set for the range, the sum of squares would be computed as in Table 6.2.
Variance. Another name for variance is mean square. This is short for mean of squared deviation scores. This is obtained by dividing the sum of squares by the number of scores ($n$). It is a measure of the average amount of variability associated with each score in a set of scores. The population variance formula is
$$\sigma^2 = \frac{SS}{n},$$
where $\sigma^2$ is the symbol for population variance, $SS$ is the symbol for sum of squares, and $n$ stands for the number of scores in the population.
The variance for the example we used to compute sum of squares would be
$$\sigma^2 = \frac{280}{6} = 46.67.$$
TABLE 6.2 Computing the Sum of Squares
$X$     $X - \mu$    $(X - \mu)^2$
 2      -10          100
 6       -6           36
10       -2            4
14       +2            4
18       +6           36
22      +10          100
NOTE: $\sum X = 72$; $n = 6$; $\mu = 12$; $\sum (X - \mu)^2 = 280$.
The SS/n formula, applied to a sample, is not an unbiased estimator of the population variance. If we draw repeated samples from a population and compute their variances using the SS/n formula, then the sample variances will average out smaller than the population variance. For this reason, the sample variance is computed differently from the population variance:
$$s^2 = \frac{SS}{n - 1}.$$
The $n - 1$ is a correction factor for this tendency to underestimate. It is called degrees of freedom. If our example were a sample, then the variance would be
$$s^2 = \frac{280}{6 - 1} = \frac{280}{5} = 56.$$
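The tendency of SS/n to underestimate can be seen by brute force. The sketch below treats the six example scores as a complete population and enumerates every possible sample of size 2 drawn with replacement; the sample size and the with-replacement sampling scheme are assumptions chosen for illustration, not part of the text.

```python
from itertools import product

population = [2, 6, 10, 14, 18, 22]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance, about 46.67


def sum_of_squares(sample):
    """Sum of squared deviations from the sample's own mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = 2  # hypothetical sample size
samples = list(product(population, repeat=n))  # all 36 ordered samples

# Average each variance formula over every possible sample.
avg_ss_over_n = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_ss_over_n_minus_1 = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(round(sigma2, 2), round(avg_ss_over_n, 2), round(avg_ss_over_n_minus_1, 2))
```

Under these assumptions the SS/n average comes out at half the population variance, while the SS/(n - 1) average matches it.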
Standard Deviation. Although the variance is a measure of average variability associated with each score, it is on a different scale from the score itself. The variance measures average squared deviation from the mean. To get a measure of average variability on the same scale as the original scores, we take the square root of the variance. The standard deviation is the square root of the variance. The formulas are
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{SS}{n}} \quad \text{and} \quad s = \sqrt{s^2} = \sqrt{\frac{SS}{n - 1}}.$$
Using the same set of numbers as before, the population standard deviation would be
$$\sigma = \sqrt{46.67} = 6.83,$$
and the sample standard deviation would be
$$s = \sqrt{56} = 7.48.$$
For a normally distributed set of scores, approximately 68% of all scores will be within ±1 standard deviation of the mean.
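The measures of variability described in this section can be computed together for the example data set. This is a minimal sketch with illustrative variable names.

```python
from math import sqrt

data = [2, 6, 10, 14, 18, 22]
n = len(data)

value_range = max(data) - min(data)        # maximum minus minimum
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)    # sum of squared deviation scores

pop_variance = ss / n                      # population formula: SS / n
sample_variance = ss / (n - 1)             # sample formula: SS / (n - 1)
pop_sd = sqrt(pop_variance)
sample_sd = sqrt(sample_variance)

print(value_range, ss, round(pop_variance, 2), sample_variance)
print(round(pop_sd, 2), round(sample_sd, 2))
```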
Measures of Relationship
Table 6.3 shows the relationship between number of stressors experienced by a parent during a week and that parent's frequency of use of corporal punishment during the same week.
One can use regression procedures to derive the line that best fits the data. This line is referred to as a regression line (or line of best fit or prediction line). Such a line has been calculated for the example plot. It has a Y intercept of -3.555 and a slope of +1.279. This gives us the prediction equation of
$$Y_{pred} = -3.555 + 1.279X,$$
where $Y$ is frequency of corporal punishment and $X$ is stressors. This is graphically depicted in Figure 6.1.
Slope is the change in $Y$ for a unit increase in $X$. So, the slope of +1.279 means that an increase in stressors ($X$) of 1 will be accompanied by an increase in predicted frequency of corporal punishment ($Y$) of +1.279 incidents per week. If the slope were a negative number, then an increase in $X$ would be accompanied by a predicted decrease in $Y$.
The equation does not give the actual value of $Y$ (called the obtained or observed score); rather, it gives a prediction of the value of $Y$ for a certain value of $X$. For example, if $X$ were 3, then we would predict that $Y$ would be -3.555 + 1.279(3) = -3.555 + 3.837 = 0.282.
[Figure 6.1. Frequency of Stressors and Use of Corporal Punishment. Scatterplot of punishment against stressors with the regression line $Y_{pred} = -3.555 + 1.279X$.]
TABLE 6.3 Frequency of Stressors and Use of Corporal Punishment
Stressors   Punishment
 3          0
 4          1
 4          2
 5          3
 6          4
 7          5
 8          6
 7          7
 9          8
10          9
The regression line is the line that predicts $Y$ such that the error of prediction is minimized. Error is defined as the difference between the predicted score and the obtained score. The equation for computing error is
$$E = Y - Y_{pred}.$$
When $X = 4$, there are two obtained values of $Y$: 1 and 2. The predicted value of $Y$ is
$$Y_{pred} = -3.555 + 1.279(4) = -3.555 + 5.116 = 1.561.$$
The error of prediction is $E = 1 - 1.561 = -0.561$ for $Y = 1$, and $E = 2 - 1.561 = +0.439$ for $Y = 2$.
If we square each error difference score and sum the squares, then we get a quantity called the error sum of squares, which is represented by
$$SSE = \sum (Y - Y_{pred})^2.$$
The regression line is the one line that gives the smallest value for SSE.
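The intercept and slope quoted above can be reproduced from the Table 6.3 data. The sketch below uses the standard least-squares formulas (slope = SSXY / SSX, intercept = mean of Y minus slope times mean of X), which the text does not spell out, so treat them as an assumption about how the line was obtained; the variable names are illustrative.

```python
# Table 6.3 data: X = stressors, Y = frequency of corporal punishment.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stressors, punishment))

# Least-squares slope and intercept.
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted punishment frequency for a given number of stressors."""
    return intercept + slope * x


# Errors of prediction (obtained minus predicted) and their sum of squares.
errors = [y - predict(x) for x, y in zip(stressors, punishment)]
sse = sum(e ** 2 for e in errors)

print(round(intercept, 3), round(slope, 3))   # -3.555 1.279
print(round(predict(4), 2), round(sse, 3))
```

Predictions from the unrounded coefficients differ slightly in the third decimal place from hand calculations that use the rounded values -3.555 and 1.279.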
The SSE is a measure of the total variability of obtained score values around their predicted values. There are two other sums of squares that are important to understanding correlation and regression.
The total sum of squares (SST) is a measure of the total variability of the obtained score values around the mean of the obtained scores. The SST is represented by
$$SST = \sum (Y - \bar{Y})^2.$$
The remaining sum of squares is called the regression sum of squares (SSR) or the explained sum of squares. If we square each of the differences between predicted scores and the mean and then add them up, we get the SSR, which is represented by
$$SSR = \sum (Y_{pred} - \bar{Y})^2.$$
The SSR is a measure of the total variability of the predicted score values around the mean of the obtained scores.
An important and interesting feature of these three sums of squares is that the sum of the SSR and SSE is equal to the SST:
$$SST = SSR + SSE.$$
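This decomposition can be verified numerically for the stressor data. The sketch below assumes the regression line is the ordinary least-squares line (the identity holds for that line specifically); the function and variable names are illustrative.

```python
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]    # X
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    # Y


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = ss_xy / ss_x
    return my - slope * mx, slope


intercept, slope = least_squares(stressors, punishment)
predicted = [intercept + slope * x for x in stressors]
mean_y = sum(punishment) / len(punishment)

sst = sum((y - mean_y) ** 2 for y in punishment)                 # total
ssr = sum((p - mean_y) ** 2 for p in predicted)                  # explained
sse = sum((y - p) ** 2 for y, p in zip(punishment, predicted))   # error

print(round(sst, 3), round(ssr, 3), round(sse, 3))
print(abs(sst - (ssr + sse)) < 1e-9)  # True
```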
This leads us to three other important statistics: the proportion of variance explained (PVE), the correlation coefficient, and the standard error of estimate.
Proportion of Variance Explained. The PVE is a measure of how well the regression line predicts obtained scores. The values of PVE range from 0 (no predictive value) to 1 (prediction with perfect accuracy). The equation for PVE is
$$PVE = \frac{SSR}{SST}.$$
There also is a computational equation for the PVE, which is
$$PVE = \frac{(SSXY)^2}{SSX \cdot SSY},$$
where
SSXY is the "covariance" sum of squares: $\sum (X - \bar{X})(Y - \bar{Y})$,
SSX is the sum of squares for variable X: $\sum (X - \bar{X})^2$, and
SSY is the sum of squares for variable Y: $\sum (Y - \bar{Y})^2$.
The procedure for computing these sums of squares is outlined in Table 6.4.
The proportion of variance in the frequency of corporal punishment that may be explained by stressors experienced is
$$PVE = \frac{(61.5)^2}{(48.1)(82.5)} = \frac{3782.25}{3968.25} = 0.953.$$
TABLE 6.4 Computation of $r^2$ (PVE)
$Y$    $Y - \bar{Y}$   $(Y - \bar{Y})^2$   $X$   $X - \bar{X}$   $(X - \bar{X})^2$   $(X - \bar{X})(Y - \bar{Y})$
 3     -3.3            10.89               0     -4.5            20.25               +14.85
 4     -2.3             5.29               1     -3.5            12.25                +8.05
 4     -2.3             5.29               2     -2.5             6.25                +5.75
 5     -1.3             1.69               3     -1.5             2.25                +1.95
 6     -0.3             0.09               4     -0.5             0.25                +0.15
 7     +0.7             0.49               5     +0.5             0.25                +0.35
 8     +1.7             2.89               6     +1.5             2.25                +2.55
 7     +0.7             0.49               7     +2.5             6.25                +1.75
 9     +2.7             7.29               8     +3.5            12.25                +9.45
10     +3.7            13.69               9     +4.5            20.25               +16.65
NOTE: $\bar{Y} = 6.3$; $SSY = 48.1$; $\bar{X} = 4.5$; $SSX = 82.5$; $SSXY = 61.5$.
The PVE sometimes is called the coefficient of determination and is represented by the symbol $r^2$.
Correlation Coefficient. A correlation coefficient also is a measure of the strength of relationship between two variables. The correlation coefficient is represented by the letter $r$ and can take on values between -1 and +1 inclusive. The correlation coefficient always has the same sign as the slope. If one squares a correlation coefficient, then one will obtain the PVE. It is computed using the following formula:
$$r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}.$$
For our example data, the correlation coefficient would be
$$r = \frac{+61.5}{\sqrt{(48.1)(82.5)}} = \frac{+61.5}{\sqrt{3968.25}} = \frac{+61.5}{62.994} = +0.976.$$
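The sums of squares in Table 6.4, the PVE, and the correlation coefficient can be recomputed as follows. This is a minimal sketch that follows the column labels of Table 6.4; the variable names are illustrative.

```python
from math import sqrt

# Columns as labeled in Table 6.4.
y = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

ss_x = sum((xi - mean_x) ** 2 for xi in x)                          # SSX
ss_y = sum((yi - mean_y) ** 2 for yi in y)                          # SSY
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # SSXY

pve = ss_xy ** 2 / (ss_x * ss_y)    # proportion of variance explained (r squared)
r = ss_xy / sqrt(ss_x * ss_y)       # correlation coefficient

print(round(ss_x, 2), round(ss_y, 2), round(ss_xy, 2))  # 82.5 48.1 61.5
print(round(pve, 3), round(r, 3))                        # 0.953 0.976
```

Both statistics are symmetric in the two variables, so swapping which column is called X and which Y leaves them unchanged.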
Standard Error of Estimate. The standard error of estimate is the standard deviation of the prediction errors. It is computed like any other standard deviation: the square root of the SSE divided by the degrees of freedom.
The first step is to compute the variance error ($s_E^2$):
$$s_E^2 = \frac{SSE}{n - 2}.$$
Notice that the value for degrees of freedom is $n - 2$ rather than $n - 1$. The reason why we subtract 2 in this instance is that variance error (and standard error of estimate) is a statistic describing characteristics of two variables. Both deal with the error involved in the prediction of $Y$ (one variable) from $X$ (the other variable).
The standard error of estimate is the square root of the variance error:
$$s_E = \sqrt{s_E^2}.$$
The standard error of estimate tells us how spread out scores are with respect to their predicted values. If the error scores ($E = Y - Y_{pred}$) are normally distributed around the prediction line, then about 68% of actual scores will fall between ±1 $s_E$ of their predicted values.
We can calculate the standard error of estimate using the following computing formula:
$$s_E = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)},$$
where
$s_Y$ is the standard deviation of Y,
$r$ is the correlation coefficient for X and Y, and
$n$ is the sample size.
For the example data, this would be
$$s_E = 2.311 \sqrt{(1 - .953)\left(\frac{9}{8}\right)} = 2.311 \sqrt{(0.047)(1.125)} = 2.311 \sqrt{0.053} = (2.311)(0.230) = 0.531.$$
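The defining formula and the computing formula for the standard error of estimate should agree. The sketch below checks this using the Table 6.4 labeling, in which Y is the column with SSY = 48.1 (so $s_Y$ is about 2.311); the least-squares slope formula is an assumption not stated in the text, and the variable names are illustrative.

```python
from math import sqrt

# Columns as labeled in Table 6.4.
y = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
n = len(y)

mean_x = sum(x) / n
mean_y = sum(y) / n
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Route 1: fit the least-squares line, then take sqrt(SSE / (n - 2)).
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
se_direct = sqrt(sse / (n - 2))

# Route 2: computing formula s_Y * sqrt((1 - r^2) * (n - 1) / (n - 2)).
s_y = sqrt(ss_y / (n - 1))                 # sample standard deviation of Y
r_squared = ss_xy ** 2 / (ss_x * ss_y)
se_shortcut = s_y * sqrt((1 - r_squared) * (n - 1) / (n - 2))

print(round(s_y, 3), round(se_direct, 3), round(se_shortcut, 3))
```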
Inferential Statistics: Hypothesis Testing
The Null and Alternative Hypotheses
Classical statistical hypothesis testing is based on the evaluation of two rival hypotheses: the null hypothesis and the alternative hypothesis.
We try to detect relationships by identifying changes that are unlikely to have occurred simply because of random fluctuations of dependent measures. Statistical analysis is the usual procedure for identifying such relationships.
The null hypothesis is the hypothesis that there is no relationship between two variables. This implies that if the null hypothesis is true, then any apparent relationship in samples is the result of random fluctuations in the dependent measure or sampling error.
Statistical hypothesis tests are carried out on samples. For example, in an experimental two-group posttest-only design, there would be a sample whose members received an intervention and a sample whose members did not. Both of these would be probability samples from a larger population. The intervention sample would represent the population of all individuals as if they had received the intervention. The control sample would be representative of the same population of individuals as if they had not received the intervention.
If the intervention had no effect, then the populations would be identical. However, it would be unlikely that two samples from two identical populations would be identical. So, although the sample means would be different, they would not represent any effect of the independent variable. The apparent difference would be due to sampling error.
Statistical hypothesis tests involve evaluating evidence from samples to make inferences about populations. It is for this reason that the null hypothesis is a statement about population parameters. For example, one null hypothesis for the previous design could be stated as
$$H_0: \mu_1 = \mu_2$$
or as
$$H_0: \mu_1 - \mu_2 = 0.$$
$H_0$ stands for the null hypothesis. It is a letter H with a zero subscript. It is a statement that the means of the experimental (Mean 1) and control (Mean 2) populations are equal.
To establish that a relationship exists between the intervention (independent variable) and the outcome (measure of the dependent variable), we must collect evidence that allows us to reject the null hypothesis.
Strictly speaking, we do not make a decision as to whether the null hypothesis is correct. We evaluate the evidence to determine the extent to which it tends to confirm or disconfirm the null hypothesis. If the evidence were such that it is unlikely that an observed relationship would have occurred as the result of sampling error, then we would reject the null hypothesis. If the evidence were more ambiguous, then we would fail to reject the null hypothesis. The terms reject and fail to reject carry the implicit understanding that our decision might be in error. The truth is that we never really know whether our decision is correct.
When we reject the null hypothesis and it is true, we have committed a Type I error. By setting certain statistical criteria beforehand, we can establish the probability that we will commit a Type I error. We decide what proportion of the time we are willing to commit a Type I error. This proportion (probability) is called alpha ($\alpha$). If we are willing to reject the null hypothesis when it is true only 1 in 20 times, then we set our $\alpha$ level at .05. If only 1 in 100 times, then we set it at .01.
The probability that we will fail to reject the null hypothesis when it is true (correct decision) is $1 - \alpha$ (Figure 6.2).
Figure 6.2 The Null Hypothesis and Type I Error
Situation: NULL HYPOTHESIS TRUE
Decision              Result
Reject $H_0$          Type I Error. $\alpha$ = the probability of rejecting the Null Hypothesis when it is true.
Fail to Reject $H_0$  Correct Decision. $1 - \alpha$ = the probability of not rejecting the Null Hypothesis when it is true.
The following hypothesis would be evaluated by comparing the difference between sample means:
$$H_0: \mu_1 - \mu_2 = 0.$$
If we drew multiple pairs of samples from populations with identical means (the null hypothesis was true), then we would find that most of the values for the differences between the sample means would not be 0. Figure 6.3 represents a distribution of the differences between sample means drawn from identical populations.
The mean difference for the total distribution of sample means is 0, and the standard deviation is 5. If the differences are normally distributed, then approximately 68% of these differences will be between -5 ($z = -1$) and +5 ($z = +1$). Fully 95% of the differences in the distribution will fall between -9.8 ($z = -1.96$) and +9.8 ($z = +1.96$). If we drew a random sample from each population, it would not be unusual to find a difference between sample means of as much as 9.8, even though the population means were the same.
On the other hand, we would expect to find a difference more than 9.8 about 1 in 20 times. If we set our criterion for rejecting the null hypothesis such that a mean difference must be greater than +9.8 or less than -9.8, then we would commit a Type I error only 1 in 20 times (.05) on average. Our $\alpha$ level (the probability of committing a Type I error) would be set at .05.
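The cutoffs of ±9.8 follow from the normal distribution of differences described above (mean 0, standard deviation 5). A minimal sketch using `statistics.NormalDist` from the Python standard library (3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# Distribution of differences between sample means when the null hypothesis is true.
diff_dist = NormalDist(mu=0, sigma=5)
alpha = 0.05

# Two-tailed criterion: put alpha / 2 in each tail.
upper = diff_dist.inv_cdf(1 - alpha / 2)
lower = -upper

# Share of differences within one standard deviation of 0.
within_one_sd = diff_dist.cdf(5) - diff_dist.cdf(-5)

# Probability of a difference beyond either cutoff, i.e. the Type I error rate.
type_i_rate = diff_dist.cdf(lower) + (1 - diff_dist.cdf(upper))

print(round(lower, 1), round(upper, 1))                 # -9.8 9.8
print(round(within_one_sd, 2), round(type_i_rate, 2))   # 0.68 0.05
```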
The probability that a relationship or a difference of a certain size would be seen in a sample if the null hypothesis were true is represented by $p$. To reject the null hypothesis, $p$ must be less than or equal to $\alpha$. The probability of getting an effect this large or larger if the null hypothesis were true is less than or equal to the probability of making a Type I error that we have decided is acceptable.
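[Figure 6.3. The Null Hypothesis and $\alpha$ Level. Normal distribution of the differences between sample means, $\bar{X}_1 - \bar{X}_2$, running from -20 to +20 ($z$ from -4 to +4). The central region is labeled $1 - \alpha = .95$ and the two tails together are labeled $\alpha = .05$.]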
Rejecting the $H_0$: We believe that it is likely that the relationship in the sample is generalizable to the population.
Not rejecting the $H_0$: We do not believe that we have sufficient evidence to draw inferences about the population.
For the previous example, let us imagine that we have set $\alpha = .05$. Also, imagine that we obtained a difference between the sample means of 10. The probability that we would obtain a difference at least as extreme as +10 or -10 would be equivalent to the probability of a z score greater than +2.0 plus the probability of a z score less than -2.0, or .0228 + .0228 = .0456. This is our p value; $p = .0456$. Because $p < \alpha$, we would reject the null hypothesis.
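The p value for this example can be computed from the standard normal distribution. This is a minimal sketch with illustrative variable names; the unrounded result is about .0455, and the .0456 in the text comes from adding two tail areas that were each rounded to .0228.

```python
from statistics import NormalDist

observed_diff = 10   # difference between the sample means
sd_diff = 5          # standard deviation of the distribution of differences
alpha = 0.05

z = observed_diff / sd_diff              # 2.0
upper_tail = 1 - NormalDist().cdf(z)     # P(z > +2.0)
p_value = 2 * upper_tail                 # both tails: P(z > +2.0) + P(z < -2.0)

reject_null = p_value <= alpha

print(z, round(upper_tail, 4), round(p_value, 4), reject_null)
```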
Some texts create the impression that the alternative (or research or experimental) hypothesis is simply the opposite of the null hypothesis. In fact, sometimes this naive alternative hypothesis is used. However, it generally is not particularly useful to researchers. Usually, we are interested in detecting an intervention effect of a particular size. On certain measures, we would be interested in small effects (e.g., death rate), whereas on others, only larger effects would be of interest.
When we are interested in an effect of a particular size, we use a specific alternative hypothesis that takes the following form:
$$H_1: \mu_1 - \mu_2 \geq |d|,$$
where $d$ is a difference of a particular size. If the test is a nondirectional test, then the difference in the alternative hypothesis would be expressed as an absolute value, $|d|$, to show that either a positive or negative difference is involved.
It is customary to express the mean difference in an $H_1$ in units of standard deviation. Such scores are called z scores. The difference is called an effect size. Effect sizes frequently are used in meta-analyses of outcome studies to compare the relative efficacy of different types of interventions across studies.
Cohen (1988) groups effect sizes into small, medium, and large categories. The criteria for each are as follows:
Small effect size ($d = .2$): It is approximately the effect size for the average difference in height (i.e., 0.5 inches and $s = 2.1$) between 15- and 16-year-old girls.
Medium effect size ($d = .5$): It is approximately the effect size for the average difference in height (i.e., 1.0 inches and $s = 2.0$) between 14- and 18-year-old girls.
Large effect size ($d = .8$): This is the same effect size ($d = .8$) as the average difference in height for 13- and 18-year-old girls.
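Because an effect size is a mean difference expressed in standard deviation units, the height examples can be checked by simple division. This is a minimal sketch; the function name `effect_size` is illustrative.

```python
def effect_size(mean_difference, sd):
    """Mean difference expressed in units of standard deviation."""
    return mean_difference / sd


# Height examples from the text (differences and standard deviations in inches).
small = effect_size(0.5, 2.1)    # 15- versus 16-year-old girls
medium = effect_size(1.0, 2.0)   # 14- versus 18-year-old girls

print(round(small, 2), medium)   # 0.24 0.5
```

The first example comes out near .24, which is why the text describes it as approximately a small (.2) effect.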
Intuitively, it would seem that we would want to detect even very small effect sizes in our research. However, there is a practical trade-off involved. All other things being equal, the consistent detection of small effect sizes requires very large ($n > 200$) sample sizes. Because very large sample sizes require resources that might not be readily available, they might not be practical for all studies. Furthermore, there are certain outcome variables for which we would not be particularly interested in small effects.
If we reject the null hypothesis, then we implicitly have decided that the evidence supports the alternative hypothesis. If the alternative hypothesis is true and we reject the null hypothesis, then we have made a correct decision. However, if we fail to reject the null hypothesis and the alternative hypothesis is true, then we have committed a Type II error. A Type II error involves the failure to detect an existing effect (Figure 6.4).
Figure 6.4 The Null Hypothesis and Type II Error
Situation: ALTERNATIVE HYPOTHESIS TRUE
Decision              Result
Reject $H_0$          Correct Decision. $1 - \beta$ = the probability of rejecting the Null Hypothesis when the Alternative Hypothesis is true. The power of a test.
Fail to Reject $H_0$  Type II Error. $\beta$ = the probability of not rejecting the Null Hypothesis when the Alternative Hypothesis is true.
Beta ($\beta$) is the probability of committing a Type II error. This probability is established when we set our criterion for rejecting the null hypothesis. The probability of a correct decision ($1 - \beta$) is an important probability. It is so important that it has a name: power. Power refers to the probability that we will detect an effect of the size we have selected.
We should decide on the power ($1 - \beta$) as well as the $\alpha$ level before we carry out a statistical test. Just as with Type I error, we should decide beforehand how often we are willing to make a Type II error (fail to detect a certain effect size). This is our $\beta$ level. The procedure for making such determinations is discussed in Cohen (1988).
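To make the link between the rejection criterion, $\beta$, and power concrete, the sketch below reuses the earlier distribution of differences (standard deviation 5, $\alpha = .05$) and assumes, purely for illustration, that the true difference between population means is 10. That value and the variable names are hypothetical; this is not the procedure from Cohen (1988).

```python
from statistics import NormalDist

sd_diff = 5.0
alpha = 0.05
true_diff = 10.0   # hypothetical true difference (assumption for illustration)

null_dist = NormalDist(0.0, sd_diff)        # differences if the null hypothesis is true
alt_dist = NormalDist(true_diff, sd_diff)   # differences if the alternative is true

# Rejection criterion fixed by alpha under the null hypothesis (about +/- 9.8).
upper = null_dist.inv_cdf(1 - alpha / 2)
lower = -upper

# Power: probability of landing in the rejection region when the alternative is true.
power = alt_dist.cdf(lower) + (1 - alt_dist.cdf(upper))
beta = 1 - power   # probability of a Type II error

print(round(power, 3), round(beta, 3))
```

Under these assumptions the test would detect the effect only about half the time, which illustrates why power should be considered before data are collected.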
Assumptions for Statistical Hypothesis Tests
Although assumptions are different for different tests, all tests of the null hypothesis share two related assumptions: randomness and independence.
The randomness assumption is that sample members must be randomly selected from the population being evaluated. If the sample is being divided into groups (e.g., treatment and control), then assignment to groups also must be random. This is referred to as random selection and random assignment.
The mathematical models that underlie statistical hypothesis testing depend on random sampling. If the samples are not random, then we cannot compute an accurate probability ($p$) that the sample could have resulted if the …