Trochim, W. M. K. (2006). Internal validity.
http://www.socialresearchmethods.net/kb/intval.php
Social Work Research: Chi Square
Molly, an administrator with a regional organization that
advocates for alternatives to long-term prison sentences for
nonviolent offenders, asked a team of researchers to conduct an
outcome evaluation of a new vocational rehabilitation program
for recently paroled prison inmates. The primary goal of the
program is to promote full-time employment among its
participants.
To evaluate the program, the evaluators decided to use a quasi-
experimental research design. The program enrolled 30
individuals to participate in the new program. Additionally,
there was a waiting list of 30 other participants who planned to
enroll after the first group completed the program. After the
first group of 30 participants completed the vocational program
(the “intervention” group), the researchers compared those
participants’ levels of employment with the 30 on the waiting
list (the “comparison” group).
In order to collect data on employment levels, the probation
officers for each of the 60 people in the sample (those in both
the intervention and comparison groups) completed a short
survey on the status of each client in the sample. The survey
contained demographic questions that included an item that
inquired about the employment level of the client. This was
measured through variables identified as none, part-time, or
full-time. A hard copy of the survey was mailed to each
probation officer and a stamped, self-addressed envelope was
provided for return of the survey to the researchers.
After the surveys were returned, the researchers entered the data
into an SPSS program for statistical analysis. Because both the
independent variable (participation in the vocational

rehabilitation program) and dependent variable (employment
outcome) used nominal/categorical measurement, the bivariate
statistic selected to compare the outcome of the two groups was
the Pearson chi-square.
After all of the information was entered into the SPSS program,
the following output charts were generated:
TABLE 1. CASE PROCESSING SUMMARY
                                      Cases
                                      Valid           Missing         Total
                                      N    Percent    N    Percent    N    Percent
Program Participation * Employment    59   98.3%      1    1.7%       60   100.0%
TABLE 2. PROGRAM PARTICIPATION * EMPLOYMENT CROSS TABULATION
                                                        Employment
Program Participation                                   None     Part-Time   Full-Time   Total
Intervention Group   Count                              5        7           18          30
                     % within Program Participation     16.7%    23.3%       60.0%       100.0%
Comparison Group     Count                              16       7           6           29
                     % within Program Participation     55.2%    24.1%       20.7%       100.0%
Total                Count                              21       14          24          59
                     % within Program Participation     35.6%    23.7%       40.7%       100.0%
TABLE 3. CHI-SQUARE TESTS
                                Value        df    Asymp. Sig. (2-sided)
Pearson Chi-Square              11.748(a)    2     .003
Likelihood Ratio                12.321       2     .002
Linear-by-Linear Association    11.548       1     .001
N of Valid Cases                59
a. 0 cells (.0%) have expected count less than 5. The minimum
expected count is 6.88.
The first table, titled Case Processing Summary, provided the
sample size (N = 59). Information was collected for 59 of the 60
participants; data for one participant were not available.
The second table, Program Participation * Employment Cross
Tabulation, provided the frequency table, which showed that
among participants in the intervention group, 18 (60%) were
employed full time, 7 (23%) were employed part time, and 5 (17%)
were unemployed. The corresponding numbers for the comparison
group (parolees who had not yet enrolled in the program but were
on the waiting list for admission) showed that only 6 (21%) were
employed full time, 7 (24%) were employed part time, and 16
(55%) were unemployed.
The third table, which provided the outcome of the Pearson chi-
square test, showed that the difference between the intervention
and comparison groups was statistically significant, with a p
value of .003, which is well below the alpha level of .05 that
most researchers use to establish significance.
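The Pearson chi-square in Table 3 can be reproduced by hand from the counts in Table 2. The following is a minimal sketch, not the SPSS procedure itself: the variable names are illustrative, and the p value uses a closed-form shortcut that is valid only because this table has 2 degrees of freedom.

```python
from math import exp

# Observed counts from Table 2 (columns: none, part-time, full-time).
observed = [
    [5, 7, 18],   # intervention group (n = 30)
    [16, 7, 6],   # comparison group (n = 29)
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_valid = sum(row_totals)

# Expected count for a cell = (row total * column total) / N.
expected = [[r * c / n_valid for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

# Shortcut that holds only when df == 2: the upper-tail probability of a
# chi-square variable with 2 degrees of freedom is exp(-x / 2).
p_value = exp(-chi_square / 2)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```

For tables with other degrees of freedom, a statistics library would be needed for the p value; the expected-count and chi-square steps stay the same.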
These results indicate that the vocational rehabilitation
intervention program may be effective at promoting full-time
employment among recently paroled inmates. However, there
are multiple limitations to this study, including that 1) no
random assignment was used, and 2) it is possible that
differences between the groups were due to preexisting
differences among the participants (such as selection bias).
Potential future studies could include a matched comparison
group or, if possible, a control group. In addition, future studies
should assess not only whether or not a recently paroled
individual obtains employment but also the degree to which he
or she is able to maintain employment, earn a living wage, and
satisfy other conditions of probation.
(Plummer 63-65)
Plummer, Sara-Beth, Sara Makris, Sally Brocksen. Social Work
Case Studies: Concentration Year. Laureate Publishing,
10/21/13. VitalBook file.
Statistics for Social Workers
J. Timothy Stocks

Statistics refers to a branch of mathematics dealing with the
direct description of sample or population characteristics and
the analysis of population characteristics by inference from
samples. It covers a wide range of content, including the
collection, organization, and interpretation of data. It is
divided into two broad categories: descriptive statistics and
inferential statistics.
Descriptive statistics involves the computation of statistics or
parameters to describe a sample or a population. All the data are
available and used in computation of these aggregate
characteristics. This may involve reports of central tendency or
variability of single variables (univariate statistics). It also
may involve enumeration of the relationships between or among two
or more variables (bivariate or multivariate statistics).
Descriptive statistics are used to provide information about a
large mass of data in a form that may be easily understood. The
defining characteristic of descriptive statistics is that the
product is a report, not an inference.
Inferential statistics involves the construction of a probable
description of the characteristics of a population based on
sample data. We compute statistics from a partial set of the
population data (a sample) to estimate the population parameters.
These estimates are not exact, but we can make reasonable
judgments as to how precise our estimates are.
Included within inferential statistics is hypothesis testing, a
procedure for using mathematics to provide evidence for the
existence of relationships between or among variables. This
testing is a form of inferential argument.
Descriptive Statistics
Measures of Central Tendency
Measures of central tendency are individual numbers that typify
the total set of scores. The three most frequently used measures
of central tendency are the arithmetic mean, the mode, and the
median.
Arithmetic Mean. The arithmetic mean usually is simply called the
mean. It also is called the average. It is computed by adding up
all of a set of scores and dividing by the number of scores in
the set. The algebraic representation of this is
PART I • QUANTITATIVE APPROACHES: FOUNDATIONS OF DATA COLLECTION
\[ \mu = \frac{\sum X}{n}, \]
where \(\mu\) represents the population mean, \(X\) represents an
individual score, and \(n\) is the number of scores being added.
The formula for the sample mean is the same except that the mean
is represented by the variable letter with a bar above it:
\[ \bar{X} = \frac{\sum X}{n}. \]
Following are the numbers of class periods skipped by 20
seventh-graders during 1 week: {1, 6, 2, 6, 15, 20, 3, 20, 17,
11, 15, 18, 8, 3, 17, 16, 14, 17, 0, 10}. We compute the mean by
adding up the class periods missed and dividing by 20:
\[ \mu = \frac{\sum X}{n} = \frac{219}{20} = 10.95. \]
Mode. The mode is the most frequently appearing score. It really
is not so much a measure of centrality as it is a measure of
typicalness. It is found by organizing scores into a frequency
distribution and determining which score has the greatest
frequency. Table 6.1 displays the truancy scores arranged in a
frequency distribution.
TABLE 6.1 Truancy Scores
Score    Frequency
20       2
19       0
18       1
17       3
16       1
15       2
14       1
13       0
12       0
11       1
10       1
9        0
8        1
7        0
6        2
5        0
4        0
3        2
2        1
1        1
0        1
Because 17 is the most frequently appearing number, the mode (or
modal number) of class periods skipped is 17.
Unlike the mean or median, a distribution of scores can have more
than one mode.
Median. If we take all the scores in a set of scores, place them
in order from least to greatest, and count in to the middle, then
the score in the middle is the median. This is easy enough if
there is an odd number of scores. However, if there is an even
number of scores, then there is no single score in the middle. In
this case, the two middle scores are selected, and their average
is the median.
There are 20 scores in the previous example. The median would be
the average of the 10th and 11th scores. We use the frequency
table to find these scores, which are 11 and 14. Thus, the median
is 12.5.
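The three measures of central tendency can be checked against the 20 truancy scores with Python's standard `statistics` module. This is a small sketch with illustrative variable names. Computed directly from the listed scores, the 10th and 11th ordered values are 11 and 14, so the median works out to 12.5.

```python
from statistics import mean, median, multimode

# Class periods skipped by the 20 seventh-graders (from the text).
truancy = [1, 6, 2, 6, 15, 20, 3, 20, 17, 11,
           15, 18, 8, 3, 17, 16, 14, 17, 0, 10]

truancy_mean = mean(truancy)          # sum of scores / number of scores
truancy_modes = multimode(truancy)    # a list, since several modes are possible
truancy_median = median(truancy)      # averages the two middle scores when n is even

ordered = sorted(truancy)
middle_pair = (ordered[9], ordered[10])   # the 10th and 11th ordered scores

print("mean:", truancy_mean)
print("mode(s):", truancy_modes)
print("middle scores:", middle_pair, "median:", truancy_median)
```

`multimode` returns every score tied for the highest frequency, which matches the point that a distribution can have more than one mode.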
Measures of Variability
Whereas measures of central tendency are used to estimate a
typical score in a distribution, measures of variability may be
thought of as a way in which to measure departure from
typicalness. They provide information on how "spread out" scores
in a distribution are.
Range. The range is the easiest measure of variability to
calculate. It is simply the distance from the minimum (lowest)
score in a distribution to the maximum (highest) score. It is
obtained by subtracting the minimum score from the maximum score.
Let us compute the range for the following data set:
{2, 6, 10, 14, 18, 22}.
The minimum is 2, and the maximum is 22:
\[ \text{Range} = 22 - 2 = 20. \]
Sum of Squares. The sum of squares is a measure of the total
amount of variability in a set of scores. Its name tells how to
compute it. Sum of squares is short for sum of squared deviation
scores. It is represented by the symbol SS.
The formulas for sample and population sums of squares are the
same except for sample and population mean symbols:
\[ SS = \sum (X - \mu)^2 \quad \text{(population)}, \qquad SS = \sum (X - \bar{X})^2 \quad \text{(sample)}. \]
Using the data set for the range, the sum of squares would be
computed as in Table 6.2.
TABLE 6.2 Computing the Sum of Squares
X      X − μ    (X − μ)²
2      −10      100
6      −6       36
10     −2       4
14     +2       4
18     +6       36
22     +10      100
NOTE: ΣX = 72; n = 6; μ = 12; Σ(X − μ)² = 280.
Variance. Another name for variance is mean square. This is short
for mean of squared deviation scores. This is obtained by
dividing the sum of squares by the number of scores (n). It is a
measure of the average amount of variability associated with each
score in a set of scores. The population variance formula is
\[ \sigma^2 = \frac{SS}{n}, \]
where \(\sigma^2\) is the symbol for population variance, SS is
the symbol for sum of squares, and \(n\) stands for the number of
scores in the population.
The variance for the example we used to compute sum of squares
would be
\[ \sigma^2 = \frac{280}{6} = 46.67. \]
The sample variance computed with the SS/n formula is not an
unbiased estimator of the population variance. If we draw
repeated samples from a population and compute their variances
using the SS/n formula, then the sample variances will average
out smaller than the population variance. For this reason, the
sample variance is computed differently from the population
variance:
\[ s^2 = \frac{SS}{n - 1}. \]
The n − 1 is a correction factor for this tendency to
underestimate. It is called degrees of freedom. If our example
were a sample, then the variance would be
\[ s^2 = \frac{280}{6 - 1} = \frac{280}{5} = 56. \]
Standard Deviation. Although the variance is a measure of average
variability associated with each score, it is on a different
scale from the score itself. The variance measures average
squared deviation from the mean. To get a measure of average
variability on the same scale as the original scores, we take the
square root of the variance. The standard deviation is the square
root of the variance. The formulas are
\[ \sigma = \sqrt{\frac{SS}{n}} \quad \text{and} \quad s = \sqrt{\frac{SS}{n - 1}}. \]
Using the same set of numbers as before, the population standard
deviation would be
\[ \sigma = \sqrt{46.67} = 6.83, \]
and the sample standard deviation would be
\[ s = \sqrt{56} = 7.48. \]
For a normally distributed set of scores, approximately 68% of
all scores will be within ±1 standard deviation of the mean.
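The range, sum of squares, variance, and standard deviation for the example data set {2, 6, 10, 14, 18, 22} can be computed step by step as follows. This is a sketch with illustrative variable names; the last two lines cross-check the hand computation against the standard library's `pvariance` (divides by n) and `variance` (divides by n − 1).

```python
from math import sqrt
from statistics import pvariance, variance

scores = [2, 6, 10, 14, 18, 22]
n = len(scores)

score_range = max(scores) - min(scores)

mu = sum(scores) / n
deviations = [x - mu for x in scores]
ss = sum(d ** 2 for d in deviations)      # sum of squared deviation scores

population_variance = ss / n              # treats the scores as a population
sample_variance = ss / (n - 1)            # n - 1 degrees of freedom

population_sd = sqrt(population_variance)
sample_sd = sqrt(sample_variance)

print(f"range = {score_range}, SS = {ss}")
print(f"population variance = {population_variance:.2f}, sd = {population_sd:.2f}")
print(f"sample variance = {sample_variance:.2f}, sd = {sample_sd:.2f}")

# Cross-check against the standard library.
assert abs(population_variance - pvariance(scores)) < 1e-9
assert abs(sample_variance - variance(scores)) < 1e-9
```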
Measures of Relationship
Table 6.3 shows the relationship between number of stressors
experienced by a parent during a week and that parent's frequency
of use of corporal punishment during the same week.
One can use regression procedures to derive the line that best
fits the data. This line is referred to as a regression line (or
line of best fit or prediction line). Such a line has been
calculated for the example plot. It has a Y intercept of −3.555
and a slope of +1.279. This gives us the prediction equation of
\[ Y_{pred} = -3.555 + 1.279X, \]
where Y is frequency of corporal punishment and X is stressors.
This is graphically depicted in Figure 6.1.
Slope is the change in Y for a unit increase in X. So, the slope
of +1.279 means that an increase in stressors (X) of 1 will be
accompanied by an increase in predicted frequency of corporal
punishment (Y) of +1.279 incidents per week. If the slope were a
negative number, then an increase in X would be accompanied by a
predicted decrease in Y.
The equation does not give the actual value of Y (called the
obtained or observed score); rather, it gives a prediction of the
value of Y for a certain value of X. For example, if X were 3,
then we would predict that Y would be
\[ -3.555 + 1.279(3) = -3.555 + 3.837 = 0.282. \]
[Figure 6.1 Frequency of Stressors and Use of Corporal
Punishment: scatterplot of punishment (vertical axis) against
stressors (horizontal axis), with the regression line
Y_pred = −3.555 + 1.279X.]
TABLE 6.3 Frequency of Stressors and Use of Corporal Punishment
Stressors    Punishment
3            0
4            1
4            2
5            3
6            4
7            5
8            6
7            7
9            8
10           9
The regression line is the line that predicts Y such that the
error of prediction is minimized. Error is defined as the
difference between the predicted score and the obtained score.
The equation for computing error is
\[ E = Y - Y_{pred}. \]
When X = 4, there are two obtained values of Y: 1 and 2. The
predicted value of Y is
\[ Y_{pred} = -3.555 + 1.279(4) = -3.555 + 5.116 = 1.561. \]
The error of prediction is E = 1 − 1.561 = −0.561 for Y = 1, and
E = 2 − 1.561 = +0.439 for Y = 2.
If we square each error difference score and sum the squares,
then we get a quantity called the error sum of squares, which is
represented by
\[ SSE = \sum (Y - Y_{pred})^2. \]
The regression line is the one line that gives the smallest value
for SSE.
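The intercept and slope quoted in the text can be recovered from the Table 6.3 data. The sketch below assumes the standard least-squares formulas (slope = SSXY / SSX, intercept = mean of Y minus slope times mean of X), which the chapter does not state explicitly, and then checks numerically that nudging the line in any direction increases the SSE. Function and variable names are illustrative.

```python
# Table 6.3: X = stressors, Y = frequency of corporal punishment.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stressors, punishment))

# Assumed (standard) least-squares estimates of the regression line.
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x


def predict(x):
    """Predicted Y for a given X on the fitted line."""
    return intercept + slope * x


def error_sum_of_squares(a, b):
    """SSE for an arbitrary line Y = a + b * X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(stressors, punishment))


best_sse = error_sum_of_squares(intercept, slope)

# Any small change to the intercept or the slope should raise the SSE.
nudged = [
    error_sum_of_squares(intercept + 0.1, slope),
    error_sum_of_squares(intercept - 0.1, slope),
    error_sum_of_squares(intercept, slope + 0.05),
    error_sum_of_squares(intercept, slope - 0.05),
]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"errors at X = 4: {1 - predict(4):+.3f} and {2 - predict(4):+.3f}")
print(f"SSE of fitted line = {best_sse:.3f}")
```

The errors printed for X = 4 differ from the text's −0.561 and +0.439 in the third decimal place because the text rounds the coefficients to three decimals before predicting.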
The SSE is a measure of the total variability of obtained score
values around their predicted values. There are two other sums of
squares that are important to understanding correlation and
regression.
The total sum of squares (SST) is a measure of the total
variability of the obtained score values around the mean of the
obtained scores. The SST is represented by
\[ SST = \sum (Y - \bar{Y})^2. \]
The remaining sum of squares is called the regression sum of
squares (SSR) or the explained sum of squares. If we square each
of the differences between predicted scores and the mean and then
add them up, we get the SSR, which is represented by
\[ SSR = \sum (Y_{pred} - \bar{Y})^2. \]
The SSR is a measure of the total variability of the predicted
score values around the mean of the obtained scores.
An important and interesting feature of these three sums of
squares is that the sum of the SSR and SSE is equal to the SST:
\[ SST = SSR + SSE. \]
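The decomposition of the total sum of squares can be verified numerically on the Table 6.3 data. As a hedged sketch, the line is fitted with the standard least-squares formulas (an assumption, since the chapter quotes only the resulting coefficients), and the three sums of squares are then computed directly from their definitions.

```python
# Table 6.3: X = stressors, Y = frequency of corporal punishment.
stressors = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
punishment = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(stressors)
mean_x = sum(stressors) / n
mean_y = sum(punishment) / n

# Assumed least-squares fit (slope = SSXY / SSX).
ss_x = sum((x - mean_x) ** 2 for x in stressors)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(stressors, punishment))
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in stressors]

sst = sum((y - mean_y) ** 2 for y in punishment)                  # total
ssr = sum((yp - mean_y) ** 2 for yp in predicted)                 # explained
sse = sum((y - yp) ** 2 for y, yp in zip(punishment, predicted))  # error

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"SSR + SSE = {ssr + sse:.3f}")
```

The identity holds exactly only for the least-squares line; for any other line the cross-product term does not vanish.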
This leads us to three other important statistics: the proportion
of variance explained (PVE), the correlation coefficient, and the
standard error of estimate.
Proportion of Variance Explained. The PVE is a measure of how
well the regression line predicts obtained scores. The values of
PVE range from 0 (no predictive value) to 1 (prediction with
perfect accuracy). The equation for PVE is
\[ PVE = \frac{SSR}{SST}. \]
There also is a computational equation for the PVE, which is
\[ PVE = \frac{(SSXY)^2}{SSX \cdot SSY}, \]
where
SSXY is the "covariance" sum of squares: \(\sum (X - \bar{X})(Y - \bar{Y})\),
SSX is the sum of squares for variable X: \(\sum (X - \bar{X})^2\), and
SSY is the sum of squares for variable Y: \(\sum (Y - \bar{Y})^2\).
The procedure for computing these sums of squares is outlined in
Table 6.4.
The proportion of variance in the frequency of corporal
punishment that may be explained by stressors experienced is
\[ PVE = \frac{(61.5)^2}{(48.1)(82.5)} = \frac{3782.25}{3968.25} = 0.953. \]
TABLE 6.4 Computation of r² (PVE)
Y     Y − Ȳ    (Y − Ȳ)²   X    X − X̄    (X − X̄)²   (X − X̄)(Y − Ȳ)
3     −3.3     10.89      0    −4.5     20.25      +14.85
4     −2.3     5.29       1    −3.5     12.25      +8.05
4     −2.3     5.29       2    −2.5     6.25       +5.75
5     −1.3     1.69       3    −1.5     2.25       +1.95
6     −0.3     0.09       4    −0.5     0.25       +0.15
7     +0.7     0.49       5    +0.5     0.25       +0.35
8     +1.7     2.89       6    +1.5     2.25       +2.55
7     +0.7     0.49       7    +2.5     6.25       +1.75
9     +2.7     7.29       8    +3.5     12.25      +9.45
10    +3.7     13.69      9    +4.5     20.25      +16.65
NOTE: Ȳ = 6.3; SSY = 48.1; X̄ = 4.5; SSX = 82.5; SSXY = 61.5.

The PVE sometimes is called the coefficient of determination and
is represented by the symbol \(r^2\).
Correlation Coefficient. A correlation coefficient also is a
measure of the strength of relationship between two variables.
The correlation coefficient is represented by the letter r and
can take on values between −1 and +1 inclusive. The correlation
coefficient always has the same sign as the slope. If one squares
a correlation coefficient, then one will obtain the PVE. It is
computed using the following formula:
\[ r = \frac{SSXY}{\sqrt{SSX \cdot SSY}}. \]
For our example data, the correlation coefficient would be
\[ r = \frac{+61.5}{\sqrt{(48.1)(82.5)}} = \frac{+61.5}{\sqrt{3968.25}} = \frac{+61.5}{62.994} = +0.976. \]
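The computational formulas for the PVE and the correlation coefficient can be applied to the two columns of scores in Table 6.4. This is a brief sketch with illustrative names; because both formulas are symmetric in X and Y, it does not matter which variable is treated as the predictor.

```python
from math import sqrt

# The two score columns as labelled in Table 6.4.
y_scores = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(x_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n

ss_x = sum((x - mean_x) ** 2 for x in x_scores)
ss_y = sum((y - mean_y) ** 2 for y in y_scores)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))

pve = ss_xy ** 2 / (ss_x * ss_y)   # proportion of variance explained (r squared)
r = ss_xy / sqrt(ss_x * ss_y)      # correlation coefficient, keeps the sign of SSXY

print(f"SSX = {ss_x:.1f}, SSY = {ss_y:.1f}, SSXY = {ss_xy:.1f}")
print(f"PVE = {pve:.3f}, r = {r:+.3f}")
```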
Standard Error of Estimate. The standard error of estimate is the
standard deviation of the prediction errors. It is computed like
any other standard deviation: the square root of the SSE divided
by the degrees of freedom.
The first step is to compute the variance error (\(s_E^2\)):
\[ s_E^2 = \frac{SSE}{n - 2}. \]
Notice that the value for degrees of freedom is n − 2 rather than
n − 1. The reason why we subtract 2 in this instance is that
variance error (and standard error of estimate) is a statistic
describing characteristics of two variables. They deal with the
error involved in the prediction of Y (one variable) from X (the
other variable).
The standard error of estimate is the square root of the variance
error:
\[ s_E = \sqrt{s_E^2}. \]
The standard error of estimate tells us how spread out scores are
with respect to their predicted values. If the error scores
(\(E = Y - Y_{pred}\)) are normally distributed around the
prediction line, then about 68% of actual scores will fall
between ±1 \(s_E\) of their predicted values.
We can calculate the standard error of estimate using the
following computing formula:
\[ s_E = s_Y \sqrt{(1 - r^2)\left(\frac{n - 1}{n - 2}\right)}, \]
where
\(s_Y\) is the standard deviation of Y,
r is the correlation coefficient for X and Y, and
n is the sample size.
For the example data, this would be
\[ s_E = 2.311\sqrt{(1 - .953)\left(\frac{10 - 1}{10 - 2}\right)} = 2.311\sqrt{(0.047)(1.125)} = 2.311\sqrt{0.053} = (2.311)(0.230) = 0.532. \]
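The definitional route (square root of SSE / (n − 2)) and the computing formula should agree. The sketch below checks this on the Table 6.4 columns, taking Y to be the column whose standard deviation is 2.311. The least-squares fit used to obtain the SSE is an assumption not spelled out in the chapter, and the names are illustrative. Both routes give roughly 0.53 for these data.

```python
from math import sqrt

# Score columns as labelled in Table 6.4 (Y has SSY = 48.1, so s_Y = 2.311).
y_scores = [3, 4, 4, 5, 6, 7, 8, 7, 9, 10]
x_scores = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

n = len(x_scores)
mean_x = sum(x_scores) / n
mean_y = sum(y_scores) / n

ss_x = sum((x - mean_x) ** 2 for x in x_scores)
ss_y = sum((y - mean_y) ** 2 for y in y_scores)
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_scores, y_scores))

# Route 1: fit Y on X by least squares (assumed), then use SSE / (n - 2).
slope = ss_xy / ss_x
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(x_scores, y_scores))
se_from_sse = sqrt(sse / (n - 2))

# Route 2: the computing formula based on s_Y and r squared.
s_y = sqrt(ss_y / (n - 1))
r_squared = ss_xy ** 2 / (ss_x * ss_y)
se_from_formula = s_y * sqrt((1 - r_squared) * (n - 1) / (n - 2))

print(f"s_Y = {s_y:.3f}, r squared = {r_squared:.3f}")
print(f"standard error of estimate: {se_from_sse:.3f} (SSE route), "
      f"{se_from_formula:.3f} (computing formula)")
```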
Inferential Statistics: Hypothesis Testing
The Null and Alternative Hypotheses
Classical statistical hypothesis testing is based on the
evaluation of two rival hypotheses: the null hypothesis and the
alternative hypothesis.
We try to detect relationships by identifying changes that are
unlikely to have occurred simply because of random fluctuations
of dependent measures. Statistical analysis is the usual
procedure for identifying such relationships.
The null hypothesis is the hypothesis that there is no
relationship between two variables. This implies that if the null
hypothesis is true, then any apparent relationship in samples is
the result of random fluctuations in the dependent measure or
sampling error.
Statistical hypothesis tests are carried out on samples. For
example, in an experimental two-group posttest-only design, there
would be a sample whose members received an intervention and a
sample whose members did not. Both of these would be probability
samples from a larger population. The intervention sample would
represent the population of all individuals as if they had
received the intervention. The control sample would be
representative of the same population of individuals as if they
had not received the intervention.
If the intervention had no effect, then the populations would be
identical. However, it would be unlikely that two samples from
two identical populations would be identical. So, although the
sample means would be different, they would not represent any
effect of the independent variable. The apparent difference would
be due to sampling error.
Statistical hypothesis tests involve evaluating evidence from
samples to make inferences about populations. It is for this
reason that the null hypothesis is a statement about population
parameters. For example, one null hypothesis for the previous
design could be stated as
\[ H_0: \mu_1 = \mu_2 \]
or as
\[ H_0: \mu_1 - \mu_2 = 0. \]
\(H_0\) stands for the null hypothesis. It is a letter H with a
zero subscript. It is a statement that the means of the
experimental (Mean 1) and control (Mean 2) populations are equal.
To establish that a relationship exists between the intervention
(independent variable) and the outcome (measure of the dependent
variable), we must collect evidence that allows us to reject the
null hypothesis.
Strictly speaking, we do not make a decision as to whether the
null hypothesis is correct. We evaluate the evidence to determine
the extent to which it tends to confirm or disconfirm the null
hypothesis. If the evidence were such that it is unlikely that an
observed relationship would have occurred as the result of
sampling error, then we would reject the null hypothesis. If the
evidence were more ambiguous, then we would fail to reject the
null hypothesis. The terms reject and fail to reject carry the
implicit understanding that our decision might be in error. The
truth is that we never really know whether our decision is
correct.
When we reject the null hypothesis and it is true, we have
committed a Type I error. By setting certain statistical criteria
beforehand, we can establish the probability that we will commit
a Type I error. We decide what proportion of the time we are
willing to commit a Type I error. This proportion (probability)
is called alpha (\(\alpha\)). If we are willing to reject the
null hypothesis when it is true only 1 in 20 times, then we set
our \(\alpha\) level at .05. If only 1 in 100 times, then we set
it at .01.
The probability that we will fail to reject the null hypothesis
when it is true (correct decision) is \(1 - \alpha\) (Figure 6.2).
Figure 6.2 The Null Hypothesis and Type I Error
Situation: NULL HYPOTHESIS TRUE
Decision             Result
Reject H0            Type I Error
                     α = the probability of rejecting the Null Hypothesis when it is true.
Fail to Reject H0    Correct Decision
                     1 − α = the probability of not rejecting the Null Hypothesis when it is true.
The following hypothesis would be evaluated by comparing the
difference between sample means:
\[ H_0: \mu_1 - \mu_2 = 0. \]
If we drew multiple samples from populations with identical means
(the null hypothesis was true), then we would find that most of
the values for the differences between the sample means would not
be 0. Figure 6.3 represents a distribution of the differences
between sample means drawn from identical populations.
The mean difference for the total distribution of sample means is
0, and the standard deviation is 5. If the differences are
normally distributed, then approximately 68% of these differences
will be between −5 (z = −1) and +5 (z = +1). Fully 95% of the
differences in the distribution will fall between −9.8
(z = −1.96) and +9.8 (z = +1.96). If we drew a random sample from
each population, it would not be unusual to find a difference
between sample means of as much as 9.8, even though the
population means were the same.
On the other hand, we would expect to find a difference more than
9.8 about 1 in 20 times. If we set our criterion for rejecting
the null hypothesis such that a mean difference must be greater
than +9.8 or less than −9.8, then we would commit a Type I error
only 1 in 20 times (.05) on average. Our \(\alpha\) level (the
probability of committing a Type I error) would be set at .05.
The probability that a relationship or a difference of a certain
size would be seen in a sample if the null hypothesis were true
is represented by p. To reject the null hypothesis, p must be
less than or equal to \(\alpha\). The probability of getting an
effect this large or larger if the null hypothesis were true is
less than or equal to the probability of making a Type I error
that we have decided is acceptable.
[Figure 6.3 The Null Hypothesis and α Level: normal distribution
of differences between sample means (X̄1 − X̄2, from −20 to +20;
z from −4 to +4), with central area 1 − α = .95 and tail area
α = .05.]
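The cut-offs quoted for the distribution of mean differences follow from the normal distribution with mean 0 and standard deviation 5. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
sd_of_differences = 5.0     # standard deviation of the mean differences (from the text)

null_distribution = NormalDist(mu=0.0, sigma=sd_of_differences)

# Two-sided critical z: alpha is split equally between the two tails.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
critical_difference = z_critical * sd_of_differences

# Share of differences within one standard deviation of 0.
within_one_sd = null_distribution.cdf(5) - null_distribution.cdf(-5)

# Probability of a difference beyond the critical value in either direction.
type_i_error_rate = 2 * null_distribution.cdf(-critical_difference)

print(f"critical z = ±{z_critical:.2f}")
print(f"critical difference = ±{critical_difference:.1f}")
print(f"within ±5: {within_one_sd:.2f}")
print(f"probability beyond the cut-offs: {type_i_error_rate:.3f}")
```

Changing `alpha` to 0.01 moves the cut-offs outward, which is the sense in which the criterion fixes the Type I error rate in advance.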
Rejecting the \(H_0\): We believe that it is likely that the
relationship in the sample is generalizable to the population.
Not rejecting the \(H_0\): We do not believe that we have
sufficient evidence to draw inferences about the population.
For the previous example, let us imagine that we have set
\(\alpha\) = .05. Also, imagine that we obtained a difference
between the sample means of 10. The probability that we would
obtain a difference of +10 or −10 would be equivalent to the
probability of a z score greater than +2.0 plus the probability
of a z score less than −2.0, or .0228 + .0228 = .0456. This is
our p value; p = .0456. Because p < \(\alpha\), we would reject
the null hypothesis.
Some texts create the impression that the alternative (or
research or experimental) hypothesis is simply the opposite of
the null hypothesis. In fact, sometimes this naive alternative
hypothesis is used. However, it generally is not particularly
useful to researchers. Usually, we are interested in detecting an
intervention effect of a particular size. On certain measures, we
would be interested in small effects (e.g., death rate), whereas
on others, only larger effects would be of interest.
When we are interested in an effect of a particular size, we use
a specific alternative hypothesis that takes the following form:
\[ H_1: \mu_1 - \mu_2 \geq |d|, \]
where d is a difference of a particular size. If the test is a
nondirectional test, then the difference in the alternative
hypothesis would be expressed as an absolute value, |d|, to show
that either a positive or negative difference is involved.
It is customary to express the mean difference in an \(H_1\) in
units of standard deviation. Such scores are called z scores. The
difference is called an effect size. Effect sizes frequently are
used in meta-analyses of outcome studies to compare the relative
efficacy of different types of interventions across studies.
Cohen (1988) groups effect sizes into small, medium, and large
categories. The criteria for each are as follows:
Small effect size (d = .2): It is approximately the effect size
for the average difference in height (i.e., 0.5 inches and
s = 2.1) between 15- and 16-year-old girls.
Medium effect size (d = .5): It is approximately the effect size
for the average difference in height (i.e., 1.0 inches and
s = 2.0) between 14- and 18-year-old girls.
Large effect size (d = .8): This is the same effect size (d = .8)
as the average difference in height for 13- and 18-year-old
girls.
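Expressing a mean difference in standard deviation units amounts to dividing the difference by the standard deviation. The sketch below applies that to the two height examples for which the text gives both numbers; the function name is hypothetical.

```python
def standardized_effect_size(mean_difference, standard_deviation):
    """Mean difference expressed in standard deviation units (d)."""
    return mean_difference / standard_deviation


# Height examples from the text (difference in inches, standard deviation s).
small_example = standardized_effect_size(0.5, 2.1)    # 15- vs 16-year-old girls
medium_example = standardized_effect_size(1.0, 2.0)   # 14- vs 18-year-old girls

print(f"small example:  d = {small_example:.2f}")
print(f"medium example: d = {medium_example:.2f}")
```

The first example comes out near .24, which is why the text describes it as approximately a small (d = .2) effect.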
Intuitively, it would seem that we would want to detect even very
small effect sizes in our research. However, there is a practical
trade-off involved. All other things being equal, the consistent
detection of small effect sizes requires very large (n > 200)
sample sizes. Because very large sample sizes require resources
that might not be readily available, they might not be practical
for all studies. Furthermore, there are certain outcome variables
for which we would not be particularly interested in small
effects.
If we reject the null hypothesis, then we implicitly have decided
that the evidence supports the alternative hypothesis. If the
alternative hypothesis is true and we reject the null hypothesis,
then we have made a correct decision. However, if we fail to
reject the null hypothesis and the alternative hypothesis is
true, then we have committed a Type II error. A Type II error
involves the failure to detect an existing effect (Figure 6.4).
Figure 6.4 The Null Hypothesis and Type II Error
Situation: ALTERNATIVE HYPOTHESIS TRUE
Decision             Result
Reject H0            Correct Decision
                     1 − β = the probability of rejecting the Null Hypothesis when the
                     Alternative Hypothesis is true. The power of a test.
Fail to Reject H0    Type II Error
                     β = the probability of not rejecting the Null Hypothesis when the
                     Alternative Hypothesis is true.
Beta (\(\beta\)) is the probability of committing a Type II
error. This probability is established when we set our criterion
for rejecting the null hypothesis. The probability of a correct
decision (\(1 - \beta\)) is an important probability. It is so
important that it has a name: power. Power refers to the
probability that we will detect an effect of the size we have
selected.
We should decide on the power (\(1 - \beta\)) as well as the
\(\alpha\) level before we carry out a statistical test. Just as
with Type I error, we should decide beforehand how often we are
willing to make a Type II error (fail to detect a certain effect
size). This is our \(\beta\) level. The procedure for making such
determinations is discussed in Cohen (1988).
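To make the link between the rejection criterion, β, and power concrete, the sketch below reuses the earlier distribution of mean differences (standard deviation 5, two-sided α = .05). It is an illustration under a normal approximation, not Cohen's (1988) procedure, and the true difference of 10 is a hypothetical value chosen only for the example.

```python
from statistics import NormalDist


def two_sided_power(true_difference, sd_of_differences, alpha=0.05):
    """Return (power, beta) for a two-sided z test on a mean difference.

    Assumes the sample mean difference is normally distributed around
    true_difference with the given standard deviation.
    """
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    cutoff = z_critical * sd_of_differences

    alternative = NormalDist(mu=true_difference, sigma=sd_of_differences)
    # Beta: the chance the observed difference lands between the cut-offs,
    # so that we fail to reject the null hypothesis.
    beta = alternative.cdf(cutoff) - alternative.cdf(-cutoff)
    return 1 - beta, beta


# Hypothetical true difference of 10 with the standard deviation of 5 from the text.
power, beta = two_sided_power(true_difference=10.0, sd_of_differences=5.0)

# With no true difference, the "power" is just the Type I error rate.
rate_under_null, _ = two_sided_power(true_difference=0.0, sd_of_differences=5.0)

print(f"power = {power:.3f}, beta = {beta:.3f}")
print(f"rejection rate when the null is true = {rate_under_null:.3f}")
```

Under these assumptions the test would detect a true difference of 10 only about half the time, which illustrates why power should be considered before data are collected.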
Assumptions for Statistical Hypothesis Tests
Although assumptions are different for different tests, all tests
of the null hypothesis share two related assumptions: randomness
and independence.
The randomness assumption is that sample members must be randomly
selected from the population being evaluated. If the sample is
being divided into groups (e.g., treatment and control), then
assignment to groups also must be random. This is referred to as
random selection and random assignment.
The mathematical models that underlie statistical hypothesis
testing depend on random sampling. If the samples are not random,
then we cannot compute an accurate probability (p) that the
sample could have resulted if the null hypothesis were true.
The independence assumption is that one member's score will not
influence another member's score. The only common relationship
among group scores should be the intervention. One implication of
this is that members of a group should not have any contact with
each other so as not to affect each other's scores.
Again, the mathematical models are dependent on the independence
of sample scores. If the scores are not independent, then the
probability (p) is, as before, simply a number that has little to
do with the probability of a Type I error.
Parametric and Nonparametric Hypothesis Tests
Traditionally, hypothesis tests are grouped into parametric and
nonparametric tests. The names are misleading given that one class
of test has no more or less to do with population parameters than
the other. The difference between the two tests lies in the
mathematical assumptions used to compute the likelihood of a Type I
error.
Parametric tests are based on the assumption that the populations
from which the samples are drawn are normally distributed.
Nonparametric tests do not have this rigid assumption. Thus, a
nonparametric test can be carried out on a broader range of data
than can a parametric test. Nonparametric tests remain serviceable
even in circumstances where parametric procedures collapse.
CHAPTER 6 • STATISTICS FOR SOCIAL WORKERS 87
When the populations from which we sample are normally distributed,
and when all the other assumptions of the parametric test are met,
parametric tests are slightly more powerful than nonparametric
tests. However, when the parametric assumptions are not met,
nonparametric tests are more powerful.
Specific Hypothesis Tests
We now investigate several frequently used hypothesis tests and
issues surrounding their appropriate use. Where appropriate,
parametric and nonparametric tests are presented together for each
type of design.
Single-Sample Hypothesis Tests
These are tests in which a single sample is drawn. Comparisons are
made between sample values and population parameters to see whether
the sample differs in a statistically significant way from the
parent population. Occasionally, these tests are used to determine
whether a sample differs from some theoretical population.
For example, we might wish to gather evidence as to whether a
particular population was normally distributed. We would take a
random sample from this population and compare the distribution of
scores to an artificially constructed, normally distributed set of
scores. If there were a statistically significant difference, then
we would reject the hypothesis that our sample came from a normally
distributed population (the null hypothesis).
Typically, these tests are not used for experiments. They tend to be
used to demonstrate that certain strata within populations differ
from the population as a whole.
Here, we investigate two single-sample tests:
1. Single-sample t test (interval or ratio scale)
2. χ² (chi-square) goodness-of-fit test (nominal scale)
The Single-Sample t Test. This test usually is used to see whether a
stratum of a population is different on average from the population
as a whole (e.g., are the mean wages received by social workers in
Lansing different from the mean for all social workers in
Michigan?).
The null hypothesis for this test is that the mean wages for a
particular stratum (Lansing social workers) of the population and
the population as a whole (Michigan social workers) will be the
same:
$$H_0: \mu_0 = \mu_1,$$
where $\mu_0$ is the mean wage for the population and $\mu_1$ is the
mean wage for the stratum.
The assumptions of the single-sample t test are as follows:
Randomness: Sample members must be randomly drawn from the
population.
Independence: Sample (X) scores must be independent of each other.
Scaling: The dependent measure (X scores) must be interval or ratio.
Normal distribution: The population of X scores must be normally
distributed.
These assumptions are listed more or less in order of importance.
Violations of the first three assumptions are essentially "fatal"
ones. Even slight violations of the first two assumptions can
introduce major error into the computation of p values.
Violation of the assumption of a normal distribution will introduce
some error into the computation of p values. Unless the population
distribution is markedly different from a normal distribution, the
errors will tend to be slight (e.g., a reported p value of .042
actually will be a p value of .057). This is what is meant when
someone says that the t test is a "robust" test.

The t statistic for the single-sample t test is computed by
subtracting the null hypothesis (population) mean from the sample
mean and dividing by the standard error of the mean.
The formula for $t_{obt}$ (pronounced "t obtained") is
$$t_{obt} = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}.$$
As the absolute value of $t_{obt}$ gets larger, the more unlikely it
is that such a difference could occur if the null hypothesis is
true. At a certain point, the probability (p) of obtaining a t so
large becomes sufficiently small (reaches the α level) that we
reject the null hypothesis.
The critical value of t (the value that $t_{obt}$ must equal or
exceed to reject the null hypothesis) depends on the degrees of
freedom. For a single-sample t test, the degrees of freedom are
df = n − 1, where n is the sample size.
Let us look at how to compute $t_{obt}$.
We know from a statewide survey that the average time taken to
complete an outpatient rehabilitation program for a certain injury,
X, is 46.6 days. We wish to see whether clients seen at our clinic
are taking longer or shorter than the state average.
We randomly sample 16 files from the past year. We review these
cases and determine the length of program for each of the clients in
the sample. The mean number of days to complete rehabilitation at
our clinic is 29.875 days. This is lower than the population mean of
46.6 days. The question is whether this result is statistically
significant. Is it likely that this sample could have been drawn
from a population with a mean of 46.6?
To determine this, we need to calculate $t_{obt}$. The first step in
calculating $t_{obt}$ was carried out when we computed the sample
mean. The next step is to compute the standard error of the mean. We
begin this by computing the standard deviation, which turns out to
be s = 11.888.
The standard error of the mean is calculated by dividing the
standard deviation by the square root of the sample size or
$$s_{\bar{X}} = \frac{s}{\sqrt{n}} = \frac{11.888}{\sqrt{16}} = \frac{11.888}{4} = 2.972.$$
We take the formula for $t_{obt}$ and plug in our numbers to obtain
$$t_{obt} = \frac{29.875 - 46.6}{2.972} = \frac{-16.725}{2.972} = -5.628.$$
We look up the tabled t value ($t_{crit}$) at 15 degrees of freedom.
This turns out to be 2.131 for a nondirectional test at α = .05 (see
a table of the critical values for the t test, nondirectional, found
in most statistics texts). The absolute value of $t_{obt}$ = 5.628.
This is greater than $t_{crit}$ = 2.131, so we reject the null
hypothesis. The evidence suggests that clients in our clinic average
fewer days in rehabilitation than is the case in the statewide
population.
The effect size index for a test of means is d and is computed as
follows for a single-sample t test:
$$d = \frac{\bar{X} - \mu_0}{s}.$$
The effect size for our example would be as follows:
$$d = \frac{29.875 - 46.6}{11.888} = \frac{-16.725}{11.888} = -1.4069,$$
which would be classified as a large effect.
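
The arithmetic of the rehabilitation example can be retraced in a few lines. The following is a minimal sketch that works from the summary figures quoted in the example (sample mean, population mean, standard deviation, and n); the function name single_sample_t is a hypothetical helper, and the critical value is the tabled figure cited in the text rather than something the code derives.

```python
from math import sqrt


def single_sample_t(sample_mean, pop_mean, sd, n):
    """Return (standard error, t_obt, d) from summary statistics."""
    se = sd / sqrt(n)                        # standard error of the mean
    t_obt = (sample_mean - pop_mean) / se    # obtained t
    d = (sample_mean - pop_mean) / sd        # effect size d
    return se, t_obt, d


se, t_obt, d = single_sample_t(29.875, 46.6, 11.888, 16)
T_CRIT = 2.131  # tabled value from the text: df = 15, alpha = .05, nondirectional

print(f"standard error = {se:.3f}")
print(f"t_obt = {t_obt:.3f}")
print(f"d = {d:.4f}")
print("reject H0" if abs(t_obt) >= T_CRIT else "fail to reject H0")
```

Working from summary statistics keeps the sketch tied to the numbers reported above; with the 16 raw case lengths in hand, the mean and standard deviation would be computed first and then passed to the same function.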
The χ² Goodness-of-Fit Test. The χ² goodness-of-fit test is a
single-sample test. It is used in the evaluation of nominal
(categorical) variables. The test involves comparisons between
observed and expected frequencies within strata in a sample.
Expected frequencies are derived from either population values or
theoretical values. Observed frequencies are those derived from the
sample.
The null hypothesis for the χ² test is that the population from
which the sample has been drawn will have the same proportion of
members in each category as the empirical or theoretical null
hypothesis population:
$$H_0: P_{O_k} = P_{E_k},$$
where
$P_{E_k}$ is the proportion of cases within category k in the null
hypothesis population (expected), and
$P_{O_k}$ is the proportion of cases within category k in the
population from which the test sample was drawn (observed).
The assumptions for the χ² goodness-of-fit test are as follows:
• Randomness: Sample members must be randomly drawn from the
population.
• Independence: Sample scores must be independent of each other. One
implication of this is that categories must be mutually exclusive
(no case may appear in more than one category).
• Scaling: The dependent measure (categories) must be nominal.
• Expected frequencies: No expected frequency within a category
should be less than 1, and no more than 20% of the expected
frequencies should be less than 5.
As with all tests of the null hypothesis, the χ² test begins with
the assumptions of randomness and independence. Deriving from these
assumptions is the requirement that the categories in the
cross-tabulation must be mutually exclusive and exhaustive.
Mutually exclusive means that an individual may not be in more than
one category per variable. Exhaustive means that all categories of
interest are covered.
These assumptions are listed more or less in order of importance.
Violations of the first three assumptions are essentially "fatal"
ones. Even slight violations of the first two assumptions can
introduce major errors into the computation of p values.
The χ² goodness-of-fit test is basically a large-sample test. When
the expected frequencies are small (an expected frequency less than
1 or more than 20% of expected frequencies less than 5), the
probabilities associated with the χ² test will be inaccurate.
The usual procedure in this case is either to increase expected
frequencies by collapsing adjacent categories (also called cells) or
to use another test. Following is a concrete example.
The workers at the Interdenominational Social Services Center in St.
Winifred Township wanted to see whether they were serving people of
all faiths (and those of no faith) equally. They had census figures
indicating that religious preferences in the township were as
follows: Christian (64%), Jewish (10%), Muslim (8%), other
religion/no preference (14%), and agnostic/atheist (4%).
The workers randomly sampled 50 clients from those seen during the
previous year. Before they drew the sample, they calculated the
expected frequency for each category. To obtain the expected
frequencies for the sample, they converted the percentage for each
preference to a decimal proportion and multiplied it by 50. Thus,
the expected frequency for Christians was 64% of 50 or
.64 × 50 = 32, the Jewish category was 10% of 50 or .10 × 50 = 5,
and so on. Table 6.5 depicts the expected frequencies.
TABLE 6.5 Expected Frequencies for Religious Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   32         5       4       7                    2
Two (40%) of our expected frequencies (Muslim and agnostic/atheist)
are less than 5. Given that the maximum allowable is 20%, we are
violating a test assumption. We can remedy this by collapsing
categories (merging two or more categories into one) or by
increasing the sample size. However, there is no category that we
could reasonably combine with agnostic/atheist. It would not work to
combine this category with any of the other categories because the
latter are religious individuals, whereas atheists and agnostics are
not religious.
However, we could increase the sample size. To get a sample in which
only one (20%) of the expected frequencies was less than 5, we would
need a sample large enough so that 8% (the percentage of the
population identifying as Muslim) of it would equal 5:
$$0.08 \times n = 5$$
$$n = \frac{5}{0.08} = 62.5 \approx 63.$$
So, our sample size would need to be 63, giving us the expected
frequencies shown in Table 6.6. Only one of five (20%) of the
expected frequencies is less than 5, and none of them is less than
1, so the sample size assumption is met. The results of a random
sample of 63 cases were as found in Table 6.7.
TABLE 6.6 New Expected Frequencies for Religious Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   40.32      6.30    5.04    8.82                 2.52
TABLE 6.7 Observed and Expected Frequencies for Religious
Preferences
                     Christian  Jewish  Muslim  Other/No Preference  Agnostic/Atheist
Expected frequency   40.32      6.30    5.04    8.82                 2.52
Observed frequency   49         2       2       9                    1
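
The expected-frequency rule and the sample-size adjustment described above can be checked with a short script. This is only a sketch: the helper names expected_frequencies and meets_rule are made up for illustration, and the proportions are the census figures quoted in the example.

```python
from math import ceil

# Census proportions quoted in the text for St. Winifred Township.
PROPORTIONS = {
    "Christian": 0.64,
    "Jewish": 0.10,
    "Muslim": 0.08,
    "Other/no preference": 0.14,
    "Agnostic/atheist": 0.04,
}


def expected_frequencies(n):
    """Expected count per category for a sample of size n (2 decimals)."""
    return {name: round(p * n, 2) for name, p in PROPORTIONS.items()}


def meets_rule(expected):
    """No expected frequency below 1 and at most 20% of them below 5."""
    values = list(expected.values())
    share_below_5 = sum(5 > v for v in values) / len(values)
    return min(values) >= 1 and 0.20 >= share_below_5


# Smallest n for which the Muslim category (8%) reaches an expected count of 5.
n_needed = ceil(5 / 0.08)

for n in (50, n_needed):
    freqs = expected_frequencies(n)
    print(n, freqs, "rule met" if meets_rule(freqs) else "rule violated")
```

The same check could be rerun with any other set of category proportions before a sample is drawn.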

The null hypothesis for this example is that the proportion of
people living in St. Winifred Township who identify with each
religious category will be the same as the proportion of people who
have received services at the Interdenominational Services Center in
St. Winifred Township who identify with each religious category.
The null hypothesis expresses the expectation that observed and
expected frequencies will not be different. Notice the similarity
between the null hypothesis and the numerator of the
$\chi^2_{obt}$ test statistic:
$$\chi^2_{obt} = \sum \frac{(f_O - f_E)^2}{f_E}.$$
The formula tells us to subtract the expected score from the
observed score ($f_O - f_E$) and then to square the difference
($[f_O - f_E]^2$) and divide by the expected score
($[f_O - f_E]^2 / f_E$) for each observed and expected score pair.
When we are finished, we add the answers and obtain the
$\chi^2_{obt}$ test statistic (Table 6.8).
The $\chi^2_{obt}$ is evaluated by comparing it to a critical value
($\chi^2_{crit}$) that is obtained from a table of critical values
of the χ² distribution. If $\chi^2_{obt}$ is greater than or equal
to $\chi^2_{crit}$, then we reject the null hypothesis.
For a χ² goodness of fit, the degrees of freedom are equal to the
number of categories (c) minus 1 or df = c − 1. In our case, we have
five categories (Christian, Jewish, Muslim, other/no preference, and
agnostic/atheist), so df = 5 − 1 = 4.
The critical value for χ² at α = .05 and df = 4 is
$\chi^2_{crit}$ = 9.49. We have calculated $\chi^2_{obt}$ as 7.5577
(Table 6.8). Because $\chi^2_{obt}$ is less than $\chi^2_{crit}$, we
fail to reject the null hypothesis. The evidence is not sufficient
to conclude that people of all faiths (and those of no faith) are
being seen out of proportion to their representations in the
township.
TABLE 6.8 Computation of $\chi^2_{obt}$
Observed (f_O)  Expected (f_E)  f_O − f_E  (f_O − f_E)²  (f_O − f_E)²/f_E
49              40.32           +8.68      75.3424       1.8686
2               6.30            −4.30      18.4900       2.9349
2               5.04            −3.04      9.2416        1.8337
9               8.82            +0.18      0.0324        0.0037
1               2.52            −1.52      2.3104        0.9168
NOTE: Σ (f_O − f_E)²/f_E = 1.8686 + 2.9349 + 1.8337 + 0.0037 +
0.9168 = $\chi^2_{obt}$ = 7.5577.
Earlier, we discussed the use of the effect size measure d for the t
test. It is an appropriate measure of effect size for a test of
means. However, the χ² test does not compare means. It compares
frequencies (or proportions). Therefore, a different effect size
index is used for the χ² test: w. This measure of effect size ranges
from 0 to 1. Cohen (1988) classifies these effect sizes into three
categories:
Small effect size: w = .10
Medium effect size: w = .30
Large effect size: w = .50
The effect size coefficient for a χ² goodness-of-fit test is
computed according to the following formula:
$$w = \sqrt{\frac{\chi^2_{obt}}{N}},$$
where N = the total sample size.
For the St. Winifred Township example,
$$w = \sqrt{7.5577/63} = \sqrt{0.1200} = 0.3464,$$
which would be classified as a medium effect.
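
The cell-by-cell computation of the goodness-of-fit statistic and of w is easy to reproduce. The sketch below assumes the observed counts from the St. Winifred sample of 63, with the agnostic/atheist count of 1 inferred from the −1.52 difference in Table 6.8, and the census proportions quoted earlier. The function name chi_square_gof is a hypothetical helper, and the critical value is the tabled figure from the text.

```python
from math import sqrt


def chi_square_gof(observed, proportions):
    """Return (expected, per-cell terms, chi-square, w) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [p * n for p in proportions]
    # (f_O - f_E)^2 / f_E for each observed and expected pair
    cells = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    chi2 = sum(cells)
    w = sqrt(chi2 / n)          # effect size w
    return expected, cells, chi2, w


observed = [49, 2, 2, 9, 1]                    # Christian ... agnostic/atheist
proportions = [0.64, 0.10, 0.08, 0.14, 0.04]   # census figures from the text
CHI2_CRIT = 9.49                               # tabled value: df = 4, alpha = .05

expected, cells, chi2, w = chi_square_gof(observed, proportions)
for o, e, c in zip(observed, expected, cells):
    print(f"f_O = {o:2d}  f_E = {e:6.2f}  term = {c:.4f}")
print(f"chi-square = {chi2:.4f}, df = {len(observed) - 1}, w = {w:.4f}")
print("reject H0" if chi2 >= CHI2_CRIT else "fail to reject H0")
```

Printing each cell's term alongside its observed and expected frequency makes it simple to compare the output against a hand-computed table row by row.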
Hypothesis Tests for Two Related Samples
These are tests in which either a single sample is drawn and
measurements are taken at two times or two samples are drawn and
members of the sample are individually matched on some attribute.
Measurements are taken for each member of the matched groups.
We investigate three examples of two related sample tests in this
section:
1. Dependent (matched, paired, correlated) samples t test (interval
or ratio scale)
2. Wilcoxon matched pairs, signed ranks test (ordinal scale)
3. McNemar change test (nominal scale)
Difference Scores. The dependent t test and the Wilcoxon matched
pairs, signed ranks test evaluate difference scores. These may be
differences between scores from measurements taken at two different
times on the same individual (pretest and posttest) or differences
between scores taken on two different individuals who have been
paired or matched with each other based on their similarity on some
variable or variable cluster (e.g., gender, race/ethnicity,
socioeconomic status). The formula for a difference score is
$$X_2 - X_1 = X_D,$$
where
$X_1$ is the first of a pair of scores,
$X_2$ is the second of a pair of scores, and
$X_D$ is the difference between the two.
The null hypothesis for all these tests is that the samples came
from populations in which the expected differences are zero.
The Dependent Samples t Test. This also is called the correlated,
paired, or matched t test. The null hypothesis for this test is that
the mean of the differences between the paired scores is 0:
$$H_0: \mu_{X_D} = \mu_{D0},$$
where
$\mu_{X_D}$ = the mean difference between the populations from which
the samples were drawn, and
$\mu_{D0}$ = the mean difference between the populations specified
by the null hypothesis.
Because the null hypothesis typically specifies no difference
($\mu_{D0} = 0$), the null hypothesis usually is written as
$$H_0: \mu_{X_D} = 0.$$
The t statistic for the dependent t test is the mean of the sample
differences divided by the standard error of the mean difference or
$$t_{obt} = \frac{\bar{X}_D - \mu_{D0}}{s_{\bar{X}_D}}.$$
As the absolute value of $t_{obt}$ gets larger, the more unlikely it
is that such a difference could occur if the null hypothesis is
true. At a certain point, the probability (p) of obtaining a t so
large becomes sufficiently small (reaches the alpha level) that we
reject the null hypothesis.
The assumptions of the dependent t test are as follows:
Randomness: Sample members must be randomly drawn from the
population.
Independence: $X_D$ scores must be independent of each other.
Scaling: The dependent measure ($X_D$ scores) must be interval or
ratio.
Normal distribution: The population of $X_D$ scores must be normally
distributed.
These assumptions are listed more or less in order of importance.
Violations of the first three assumptions are essentially "death
penalty" violations. Even slight violations of the first two
assumptions can introduce major error into the computation of p
values. Similarly, difference scores computed from two sets of
ordinal data may incorporate major error.
Violation of the assumption of a normal distribution will introduce
some error into the computation of p values. However, unless the
population distribution is markedly different from a normal
distribution, the errors will tend to be slight (e.g., a reported p
value of .042 actually will be a p value of .057). This is what is
meant when someone says that the t test is a "robust" test.
Still, even though the error is slight, the nonparametric Wilcoxon
matched pairs, signed ranks test (discussed in the next section)
probably will yield a more accurate test when there are violations
of this normal distribution assumption.
Let us look at the procedure for computing the dependent groups t
statistic. We use an evaluation of an intervention for individuals
with depression problems. The dependent measure is the Beck
Depression Inventory (BDI), a reliable and well-validated measure of
depression.
Ten clients were randomly selected from clients seen for depression
problems at a community center. They were pretested ($X_1$) with the
BDI, received the treatment, and then were posttested ($X_2$) with
the same instrument. The mean of the difference scores
($\bar{X}_D$) was −1. This means that the average change in BDI
scores from pretest to posttest was a decrease of 1 point. The
standard deviation of the difference scores was 1.33.

The next step is the computation of the standard error of the mean.
We divide the standard deviation by the square root of the sample
size to get the standard error of the mean:
$$s_{\bar{X}_D} = 1.33/\sqrt{10} = 1.33/3.16 = 0.42.$$
We plug the values into the formula for $t_{obt}$:
$$t_{obt} = \frac{\bar{X}_D}{s_{\bar{X}_D}} = \frac{-1}{0.42} = -2.38.$$
For α = .05 and df = n − 1 = 10 − 1 = 9, $t_{crit}$ = 2.262 (see a
table of critical values for the t test, nondirectional, found in
most statistics texts). Because $|t_{obt}|$ = 2.38 is greater than
or equal to the critical value, we reject the null hypothesis at
α = .05.
The effect size index for this test is d and is computed as follows:
$$d = \frac{\bar{X}_D - \mu_{D0}}{s_{X_D}}.$$
For the depression intervention example,
$$d = \frac{-1 - 0}{1.33} = \frac{-1}{1.33} = -0.752,$$
which would be classified as a medium effect.
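
The same steps can be sketched in code using only the summary figures reported for the depression example (mean difference of −1, standard deviation of 1.33, n = 10). The function name dependent_t is a hypothetical helper, and the critical value is the tabled figure cited in the text.

```python
from math import sqrt


def dependent_t(mean_diff, sd_diff, n, null_diff=0.0):
    """Return (standard error, t_obt, d) for a dependent samples t test."""
    se = sd_diff / sqrt(n)                  # standard error of the mean difference
    t_obt = (mean_diff - null_diff) / se    # obtained t
    d = (mean_diff - null_diff) / sd_diff   # effect size d
    return se, t_obt, d


se, t_obt, d = dependent_t(mean_diff=-1.0, sd_diff=1.33, n=10)
T_CRIT = 2.262  # tabled value from the text: df = 9, alpha = .05, nondirectional

print(f"standard error = {se:.2f}")
print(f"t_obt = {t_obt:.2f}")
print(f"d = {d:.3f}")
print("reject H0" if abs(t_obt) >= T_CRIT else "fail to reject H0")
```

Because the code carries the standard error at full precision instead of rounding it to 0.42 first, later decimal places of the t value can differ slightly from a hand calculation.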
Wilcoxon Matched Pairs, Signed Ranks Test. The Wilcoxon matched
pairs, signed ranks test is a nonparametric test for the evaluation
of difference scores. The test involves ranking difference scores as
to how far they are from 0. The difference score closest to 0
receives the rank of 1, the next score receives the rank of 2, and
so on. The ranks for difference scores below 0 are given a negative
sign, whereas those above 0 are given a positive sign.
The null hypothesis is that the sample comes from a population of
difference scores in which the expected difference score is 0.
The assumptions for the Wilcoxon matched pairs, signed ranks test
are as follows:
• Randomness: Sample members must be randomly drawn from the
population.
• Independence: $X_D$ scores must be independent of each other.
• Scaling: The dependent measure ($X_D$ scores) must be ordinal
(interval or ratio differences must be converted to ranks).
Let us look at the procedure for computing the Wilcoxon matched
pairs, signed ranks test statistic. We use the same example as for
the t test. The dependent measure is the BDI, a measure of
depression. Scores on the BDI are not normally distributed, tending
to be positively skewed.
Ten clients were randomly selected from clients seen for depression
problems at a community center. They were pretested with the BDI,
received the treatment, and then were posttested with the same
instrument. We compute the difference scores (post − pre) for each
individual. We assign a rank to each difference score based on its
closeness to 0. Difference scores of 0 do not receive a rank. Tied
ranks receive the average rank for the tie.
So, if we look at Table 6.9, we see that there is one difference
score of 0 that goes unranked. There are five difference scores of
either −1 or +1. These cover the first five ranks (1, 2, 3, 4, 5),
giving an average rank of 3. There are three difference scores of −2
(and none of +2). These cover the next three ranks (6, 7, 8), giving
an average rank of 7. The final score is −3, which is given the rank
of 9.
TABLE 6.9 Computation of the Wilcoxon $T_{obt}$
                                                Signed Ranks
ID Number  Pretest  Posttest  Difference  Rank  Positive  Negative
1          17       16        −1          3               3
2          19       18        −1          3               3
3          18       15        −3          9               9
4          18       17        −1          3               3
5          16       16        0
6          16       17        +1          3     3
7          18       16        −2          7               7
8          21       19        −2          7               7
9          18       19        +1          3     3
10         18       16        −2          7               7
NOTE: Sum of ranks for less frequent sign = 6.
The next step is to "sign" the rank. This means to place the rank in
either the positive or the negative column in the table, depending
on whether the difference score was positive or negative.
We then determine which sign (positive or negative) appeared less
frequently and add up the ranks for this sign. Because the positive
sign appeared only twice (compared to seven times for the negative
sign), we add up the ranks in the positive column and obtain 6. This
is the test statistic value for the Wilcoxon matched pairs, signed
ranks test.
The test statistic is called $T_{obt}$. This is an uppercase T and
is not the same as the statistic used with the (lowercase) t
distribution.
There are two other issues with respect to the Wilcoxon $T_{obt}$
that should be addressed:
1. The Wilcoxon $T_{obt}$ is evaluated according to the number of
nonzero difference scores. So, we should subtract 1 from the
original n for each difference score that is 0 to obtain a corrected
n to use for the critical value table.
2. Unlike most other test statistics, the Wilcoxon $T_{obt}$ must be
less than or equal to the critical value to reject the null
hypothesis.
We consult a table of critical values for the Wilcoxon T (see a
table of critical values for the Wilcoxon T in any general
statistics book) and see whether the result ($T_{obt}$ = 6) was
significant at α = .05. Because there was one difference score equal
to 0, the corrected n = 9. The critical value for the Wilcoxon T at
n = 9 and α = .05 is $T_{crit}$ = 5. $T_{obt}$ = 6 is not less than
or equal to the critical value, so we fail to reject the null
hypothesis at α = .05.
There is no well-accepted post hoc measure of effect size for
ordinal tests of related scores. One possible measure would be
proportion of nonoverlapping scores as a measure of effect. Cohen
(1988) briefly discusses this measure, called U.
The procedure begins with computing the minimum and maximum scores
for each of the two related groups. We choose the least maximum and
the greatest minimum. This establishes the end points for the
overlap range.
We count the number of scores in both groups within this range
(including the end points) and divide by the total number of scores.
This gives a proportion of overlapping scores. Subtract this number
from 1, and we obtain the proportion of nonoverlapping scores. This
index ranges from 0 to 1. Lower proportions are indicative of
smaller effects, and higher ones are indicative of larger effects.
Cohen (1988) calculates equivalents between U and d, which would
imply the following definitions of strength of effect:
Small effect size    d = .2    U = .15
Medium effect size   d = .5    U = .33
Large effect size    d = .8    U = .47
For the example data, the minimum score for the pretest was 16, and
the maximum score was 21. The posttest minimum and maximum scores
were 15 and 19, respectively. The greatest minimum is 16, and the
least maximum is 19.
Of 20 total scores, 18 fall within this overlap range. The
proportion of overlap is 18/20 (.90). The proportion of
nonoverlapping scores is thus 1 − .90 = .10, which would be a small
effect.
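
The overlap count is simple to automate. The sketch below applies the steps described above to the Table 6.9 scores; the function name nonoverlap_proportion is a hypothetical helper and is not Cohen's own computational procedure.

```python
pre = [17, 19, 18, 18, 16, 16, 18, 21, 18, 18]    # Table 6.9 pretest scores
post = [16, 18, 15, 17, 16, 17, 16, 19, 19, 16]   # Table 6.9 posttest scores


def nonoverlap_proportion(group_a, group_b):
    """Proportion of scores falling outside the overlap range of two groups."""
    low = max(min(group_a), min(group_b))     # greatest minimum
    high = min(max(group_a), max(group_b))    # least maximum
    scores = list(group_a) + list(group_b)
    inside = sum(low <= s <= high for s in scores)   # end points included
    return 1 - inside / len(scores)


u = nonoverlap_proportion(pre, post)
print(f"proportion of nonoverlapping scores = {u:.2f}")
```

If the two groups did not overlap at all, no score would fall inside the range and the function would return 1, the upper end of the index.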
McNemar Change Test. The McNemar change test is used for pre- and
postintervention designs where the variables in the analysis are
dichotomously scored (e.g., improved vs. not improved, same vs.
different, increase vs. decrease).
The layout for the McNemar change test is shown in Figure 6.5. Cell
A contains the number of individuals who changed from + to −. Cell B
contains the number of individuals who received + on both
measurements. Cell C contains the number of individuals who received
− on both measurements. Cell D contains the number of individuals
who changed from − to +. The null hypothesis is expressed as
$$H_0: P_A = P_D,$$
where
$P_A$ is the proportion of cases shifting from + to − (decreasing)
in the null hypothesis population, and
$P_D$ is the proportion of cases shifting from − to + (increasing)
in the null hypothesis population.
The assumptions for the McNemar change test are similar to those for
the χ² test:
Randomness: Sample members must be randomly drawn from the
population.
Independence: Within-group sample scores must be independent of each
other (although between-group scores [pre- and posttest scores] will
necessarily be dependent).
Scaling: The dependent measure (categories) must be nominal.
Expected frequencies: No expected frequency within a category should
be less than 5.
A special case of $\chi^2_{obt}$ is the test statistic for the
McNemar change test:
$$\chi^2_{obt} = \frac{(|f_A - f_D| - 1)^2}{f_A + f_D},$$
where
$f_A$ = the frequency in Cell A, and
$f_D$ = the frequency in Cell D.
This is a test statistic with df = 1. For df = 1, we need to include
something called the Yates correction for continuity in the
equation. This is the −1 that appears in the numerator of the test
statistic.
Figure 6.5 McNemar Change Test Layout
              After −    After +
Before +      A          B
Before −      C          D
Let us imagine that we are interested in marijuana use among high
school students. We also are interested in change in marijuana use
over time. Imagine that we collected survey data on a random sample
of ninth-graders in 2007. In 2009, we surveyed the same sample that
had been in ninth grade in 2007. We found that 23 of 65 students
said in 2007 that they used marijuana during the previous year, as
compared to 32 of 65 in 2009. The results are summarized in
Table 6.10.
TABLE 6.10 Observed and Expected Frequencies for the McNemar
Change Test
                          2009
2007          None           Marijuana      Total
Marijuana     2 (Cell A)     21 (Cell B)    23
None          31 (Cell C)    11 (Cell D)    42
Total         33             32             65
Cell A represents those students who had used marijuana in 2007 but
who had not used it in 2009. Cell B shows the number of students who
had used marijuana in both 2007 and 2009. Cell C shows the number of
students who did not use marijuana either in 2007 or in 2009. Cell D
shows the number of students who did not use marijuana in 2007 but
who did use it in 2009.
So, the sum of Cells A and D is the total number of students whose
patterns of marijuana use changed. The null hypothesis for the
McNemar change test is that changing from nonuse to use would be
just as likely as changing from use to nonuse.
In other words, of the 13 individuals who changed their pattern of
marijuana use, we would expect half (6.5) to go from not using to
using and the other half (6.5) to go from using to not using if the
null hypothesis were true.
The calculation of the McNemar change test statistic is shown in
Table 6.11.
For df = 1 and α = .05, $\chi^2_{crit}$ = 3.84 (see a table of
critical values of χ² found in most statistics texts). Because
$\chi^2_{obt}$ = 4.92, we would reject the null hypothesis at
α = .05. We would conclude that there was in fact an increase in
marijuana use between 2007 and 2009.
TABLE 6.11 Computation of the McNemar Change Test Statistic
f_A   f_D   |f_A − f_D| − 1   (|f_A − f_D| − 1)²   (|f_A − f_D| − 1)²/(f_A + f_D)
2     11    8                 64                   4.9231
NOTE: $\chi^2_{obt}$ = 4.923.
The effect size coefficient for a McNemar change test is w and is
computed according to the following formula:
$$w = \sqrt{\frac{\chi^2_{obt}}{N}}.$$
For the high school survey,
$$w = \sqrt{4.923/65} = \sqrt{0.0757} = 0.2752,$$
which would be classified as a medium effect.
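
The McNemar statistic and its effect size depend only on the two change cells and the total sample size, so they can be sketched in a few lines. The function name mcnemar_change is a hypothetical helper; the cell counts come from Table 6.10 and the critical value is the tabled figure cited in the text.

```python
from math import sqrt


def mcnemar_change(f_a, f_d, n_total):
    """Return (chi-square with Yates correction, effect size w)."""
    # Only the cells that changed (A: + to -, D: - to +) enter the statistic.
    chi2 = (abs(f_a - f_d) - 1) ** 2 / (f_a + f_d)
    w = sqrt(chi2 / n_total)
    return chi2, w


chi2, w = mcnemar_change(f_a=2, f_d=11, n_total=65)
CHI2_CRIT = 3.84  # tabled value from the text: df = 1, alpha = .05

print(f"expected count per change cell under H0 = {(2 + 11) / 2}")
print(f"chi-square = {chi2:.3f}, w = {w:.4f}")
print("reject H0" if chi2 >= CHI2_CRIT else "fail to reject H0")
```

Cells B and C, which hold the students whose reported use did not change, play no part in the statistic and enter only through the total N used for w.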
Hypothesis Tests for Two Independent Samples
These are tests in which a sample is randomly drawn and individuals
from the sample are randomly assigned to one of two experimental
conditions.
We investigate three examples of two independent samples tests:
1. Independent samples (group) t test (interval or ratio scale)
2. Wilcoxon/Mann-Whitney (W/M-W) test (ordinal scale)
3. χ² test of independence (2 × k) (nominal scale)
Independent Samples t Test. This sometimes is called the group t
test. It is a test of means whose null hypothesis is formally stated
as follows:
$$H_0: \mu_1 = \mu_2.$$
Following are the assum ptions of t he independent t rest:
Randomness: Sample members m usr be randomly drawn from
the populotion and ran·
dom ly assigned to o ne of the '-"0 groups.
ltrdepe11dence: Scores must be independent of e.1ch or her.

Scalitrg: The dependenr measure musr be inrervlll or ratio.
Normal distribution: T he populations from which tbe
individuals in the samples were
d r,own must be normally distribured.
Homogeneity of variances (a,'- a ,'): ' f he samples must be
drawn from populatious
whose variances are eq ual.
Equality of sample sizes ( "• = n,): ' I he samples m ust be of
the same sir.e.
As before, these assumptions are listed more or less in order
of importance. The first three assumptions are the "fatal"
assumptions.
Violation of the normality assumption will make for less
accurate p values. However, unless the population distribution
is markedly different from a normal distribution, the errors
will tend to be slight. Still, even though the error is slight,
the nonparametric W/M-W test probably will be more accurate
when the normality assumption is violated.
The independent groups t test also is fairly robust with
respect to violation of the homogeneity of variances assumption
and the equal sample size assumption. A problem may arise when
both of these assumptions are violated at the same time.
If the smaller variance is in the smaller sample, then the
probability of a Type II error (not detecting an existing
difference) increases. If the larger variance is in the smaller
sample, then the probability of a Type I error (rejecting the
null hypothesis when it is true) increases.
If there is no association between sample size and variance,
then violation of each of these assumptions is not particularly
problematic. There may be fairly substantial discrepancies
between sample sizes without much effect on the accuracy of
our p estimates. Similarly, if every other assumption is met,
then a slight difference in variances will not have a large
effect on probability estimates.
The t statistic for the independent t test is the difference
between the sample means divided by the standard error of the
differences between means or
$$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}.$$
Because two sample means are computed, 2 degrees of freedom
are lost:
$$df = n_1 + n_2 - 2,$$
where
$n_1$ = number of scores for the first group, and
$n_2$ = number of scores for the second group.
Following is an example of the use of the independent t test
statistic. We wish to see whether there is a difference in
level of social activity in children depending on whether they
are in after-school care or home care. Because more children
attended the after-school program, a proportionate stratified
sample of 16 children in after-school care (Group 1) and 14
children in home care (Group 2) was drawn. The dependent
measure was a score on a social activity scale in which lower
scores represent less social activity and higher scores
represent more social activity.
We evaluate this with an independent t test. The first step in
calculating $t_{obt}$ is to compute the sample mean for each
group. The next step is to compute the standard error of the
mean. However, the procedure for doing this is a little
different from that used before.
As you might recall, the standard error of the mean is the
standard deviation divided by the square root of the sample
size:
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}.$$
This also is equivalent to the square root of the variance
times the inverse of the sample size (1/n).
Unfortunately, we cannot use this formula for the standard
error of the mean. It is the standard error for a single
sample. Because we have two samples in an independent groups
test, the formula has to be altered a bit.
The first difference is in the formula for the variance. The
variance is the sum of squares divided by the degrees of
freedom. It is the same here except that we have two sums of
squares (one for Group 1 and one for Group 2), and our degrees
of freedom are $n_1 + n_2 - 2$. This gives us the following
equation:
$$s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2},$$
where
$s_p^2$ is the pooled estimate of the variance based on two
groups,
$SS_1$ is the sum of squares for Group 1,
$SS_2$ is the sum of squares for Group 2,
$n_1$ is the number of scores in Group 1, and
$n_2$ is the number of scores in Group 2.
Because there are two groups, we do not multiply $s_p^2$ times
(1/n); rather, we multiply it by $(1/n_1 + 1/n_2)$. We take the
square root of this and obtain the pooled standard error of
the mean:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}.$$
The means and sums of squares for our example are presented in
Table 6.12. Now, let us try computing $t_{obt}$.
TABLE 6.12 Group Statistics

Group               Mean     Sum of Squares   n
After-school care   27.88    4330.40          16
Home care           21.36    1707.16          14
First, we compute the pooled standard error of the mean (also
called the standard error of the mean difference). We begin by
calculating the pooled variance:
$$s_p^2 = \frac{SS_1 + SS_2}{n_1 + n_2 - 2} = \frac{4330.40 + 1707.16}{16 + 14 - 2} = \frac{6037.56}{28} = 215.63.$$
From the estimate for the pooled variance, we may calculate
the standard error of the mean difference:
$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = \sqrt{215.63 \left( \frac{1}{16} + \frac{1}{14} \right)} = \sqrt{28.88} = 5.37.$$
We calculate $t_{obt}$ (carrying the unrounded standard error,
5.374, through the division):
$$t_{obt} = \frac{27.88 - 21.36}{5.37} = \frac{6.52}{5.37} = 1.213.$$
For α = .05 and df = $n_1 + n_2 - 2$ = 16 + 14 - 2 = 28,
$t_{crit}$ = 2.048. Because $t_{obt}$ = 1.213 is less than the
critical value, we fail to reject the null hypothesis at
α = .05.
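The pooled-variance calculation can be checked with a short Python sketch that works directly from the group summary statistics in Table 6.12. The function name and return layout are illustrative assumptions, not part of the original text.

```python
import math


def pooled_t(mean1, ss1, n1, mean2, ss2, n2):
    """Independent-samples t from group means, sums of squares, and sizes."""
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df                         # pooled variance
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of mean difference
    t_obt = (mean1 - mean2) / se_diff
    return pooled_var, se_diff, t_obt, df


# After-school care (Group 1) versus home care (Group 2).
pooled_var, se_diff, t_obt, df = pooled_t(27.88, 4330.40, 16, 21.36, 1707.16, 14)
print(round(pooled_var, 2), round(se_diff, 2), round(t_obt, 3), df)
```

Working from sums of squares rather than raw scores keeps the sketch aligned with the hand calculation; the critical value (2.048 for df = 28) still has to come from a t table.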
There are two post hoc effect size measures for an independent
t test. The first of these (d) already has been discussed:
$$d = \frac{\bar{X}_1 - \bar{X}_2}{s_p}.$$
Note that the numerator is the difference between the two
sample means and that the denominator is the pooled estimate
of the standard deviation. The pooled standard deviation is
the square root of the pooled variance that we calculated
earlier:
$$s_p = \sqrt{s_p^2} = \sqrt{215.63} = 14.68.$$
The effect size for the example would be
$$d = \frac{27.88 - 21.36}{14.68} = \frac{6.52}{14.68} = 0.44,$$
which would be classified as a small to medium effect size.
The other measure is $\eta^2$ (eta-square). $\eta^2$ is the
proportion of variance explained (PVE). This is equivalent to
the squared point-biserial correlation coefficient and is
computed by
$$\eta^2 = \frac{t_{obt}^2}{t_{obt}^2 + df}.$$
We were comparing social activity in children in after-school
care versus those in home care. Children in after-school care
scored higher on social activity than did children in home
care. The difference was not statistically significant for our
chosen α = .05.
$t_{obt}$ was 1.213 with df = 28. Putting these numbers in the
formula, we obtain the following:
$$\eta^2 = \frac{(1.213)^2}{(1.213)^2 + 28} = \frac{1.471}{29.471} = 0.0499.$$
So, a little less than 5% of the variability in social activity
among the children was potentially explained by whether they
were in after-school care or home care.
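Both effect size measures follow directly from quantities already computed, as the following Python sketch illustrates. The function names are hypothetical; the inputs are the rounded values from the worked example.

```python
import math


def cohens_d(mean1: float, mean2: float, pooled_var: float) -> float:
    """Mean difference divided by the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(pooled_var)


def eta_squared_from_t(t_obt: float, df: int) -> float:
    """Proportion of variance explained, computed from t and its df."""
    return t_obt ** 2 / (t_obt ** 2 + df)


d = cohens_d(27.88, 21.36, 215.63)
eta_sq = eta_squared_from_t(1.213, 28)
print(round(d, 2), round(eta_sq, 4))
```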
Wilcoxon/Mann-Whitney Test. Statistics texts used to refer to
this test as the Mann-Whitney test. Recently, the name of
Wilcoxon has been added to it. The reason that Wilcoxon's name
has been added is that he developed the test first and
published it first (Wilcoxon, 1945). Unfortunately, more folks
noticed the article published by Mann and Whitney (1947)
2 years later.
The W/M-W test is a nonparametric test that involves initially
treating both samples as one group and ranking scores from
least to most. After this is done, the frequencies of low and
high ranks between groups are compared.
The assumptions of the W/M-W test are as follows:
Randomness: Sample members must be randomly drawn from
the population of interest and randomly assigned to one of the
two groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be ordinal (interval or
ratio scores must be converted to ranks).
When the assumptions of the t test are met, the t test will be
slightly more powerful than the W/M-W test. However, if the
distribution of population scores is even slightly different
from normal, then the W/M-W test may be the more powerful
test.
Let us look at the procedure for computing the W/M-W test
statistic. We use the same example as we did for the
independent t test. We evaluated level of social activity in
children in after-school care and in home care. The dependent
measure was a score on a social activity scale in which lower
scores represent less social activity and higher scores
represent more social activity.
The first step in carrying out the W/M-W test is to assign
ranks to the scores without respect to which group individuals
were in. The rank of 1 goes to the highest score, the rank of
2 to the next highest score, and so on. Tied ranks receive the
average rank. We then sum the ranks within each group. The
summed ranks are called $W_1$ for Group 1 and $W_2$ for
Group 2 and are found in Table 6.13.
TABLE 6.13 Summed Ranks for the Wilcoxon/Mann-Whitney Test

                After-School Care   Home Care
n               n_1 = 16            n_2 = 14
Summed ranks    W_1 = 218           W_2 = 247
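The ranking step can be sketched in Python as follows. The raw scores for the example are not listed in the text, so the two small groups below are hypothetical data chosen only to show how tied scores share an average rank; the function name is likewise an assumption.

```python
def average_ranks_descending(scores):
    """Rank scores so the highest gets rank 1; tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = ((pos + 1) + (end + 1)) / 2
        for j in range(pos, end + 1):
            ranks[order[j]] = avg_rank
        pos = end + 1
    return ranks


# Hypothetical scores (not the study data): two groups ranked as one.
group1 = [12, 9, 7]
group2 = [9, 5]
ranks = average_ranks_descending(group1 + group2)
w1 = sum(ranks[:len(group1)])   # summed ranks for Group 1
w2 = sum(ranks[len(group1):])   # summed ranks for Group 2
print(ranks, w1, w2)
```

A useful check is that the two rank sums always add up to N(N + 1)/2, which is 465 for the N = 30 children in Table 6.13 (218 + 247).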
The test statistic for the W/M-W test is $U_{obt}$. We begin by
calculating U statistics for each group according to the
following equations:
$$U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - W_1$$
$$U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - W_2.$$
For our example,
$$U_1 = (16)(14) + \frac{(16)(16 + 1)}{2} - 218 = 224 + 136 - 218 = 142$$
$$U_2 = (16)(14) + \frac{(14)(14 + 1)}{2} - 247 = 224 + 105 - 247 = 82.$$
As a check, $U_1 + U_2 = 142 + 82 = 224 = n_1 n_2$.
We choose the smaller U as $U_{obt}$. In this instance,
$U_{obt} = U_2 = 82$.
$U_{obt}$ must be less than or equal to the critical value to
reject the null hypothesis.
The critical value for the W/M-W U at $n_1$ = 16 and
$n_2$ = 14, and α = .05 is $U_{crit}$ = 64.
$U_{obt}$ = 82 is not less than or equal to the critical value,
so we fail to reject the null hypothesis at α = .05.
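The U formulas can be applied to the summed ranks in Table 6.13 with a brief Python sketch. The function name is hypothetical; the formulas are the $n_1 n_2 + n(n + 1)/2 - W$ forms stated for the test, and the critical value of 64 is taken from the text.

```python
def u_statistics(n1: int, n2: int, w1: float, w2: float):
    """U statistics for each group from group sizes and summed ranks."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    return u1, u2


u1, u2 = u_statistics(16, 14, 218, 247)
u_obt = min(u1, u2)          # the smaller U is the test statistic
reject_null = u_obt <= 64    # 64 is the tabled critical value at alpha = .05
print(u1, u2, u_obt, reject_null)
```

Because the two rank sums together must equal N(N + 1)/2, the two U values must add up to $n_1 n_2$; checking that identity is a quick way to catch arithmetic slips.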
As before, there is no well-established effect size measure
for the W/M-W test. The U measure of nonoverlap probably would
be the best bet.
For our example data, the minimum and maximum for the
after-school care group were 2 and 55, whereas they were 7 and
40 for the home care group. The greatest minimum is 7, and the
least maximum is 40. All 14 scores in the home care group are
within the overlap range, and 12 of 16 scores in the
after-school care group are in the overlap range. This gives us
a proportion of overlap of 26/30 = .867. The proportion of
nonoverlap is U = 1 - .867 = .133. This would be a small
effect.
$\chi^2$ Test of Independence (2 × k). The assumptions for the
$\chi^2$ test of independence are as follows:
Randomness: Sample members must be randomly drawn from
the population.
Independence: Sample scores must be independent of each
other. One implication of this is that categories must be
mutually exclusive (no case may appear in more than one
category).
Scaling: The dependent measure (categories) must be nominal.
Expected frequencies: No expected frequency within a category
should be less than 1, and no more than 20% of the expected
frequencies should be less than 5.
As with all tests of the null hypothesis, the $\chi^2$ test
begins with the assumptions of randomness and independence.
Deriving from these assumptions is the requirement that the
categories in the cross-tabulation be mutually exclusive and
exhaustive.
Mutually exclusive means that an individual may not be in more
than one category per variable. Exhaustive means that all
possible categories are covered.
Let us imagine that we are interested in marijuana use among
high school students and specifically whether there are any
differences in such use between 9th- and 12th-graders in our
school district. We conduct a proportionate stratified sample
in which we randomly sample sixty-five 9th-graders and
fifty-five 12th-graders from all students in the district. The
students are surveyed on their use of drugs over the past year
under conditions guaranteeing confidentiality of response.
Table 6.14 depicts reported marijuana use for the students in
the sample over the past year.
TABLE 6.14 Marijuana Use

                  Grade
             9th      12th     Total
None         42       33       75
Marijuana    23       22       45
Total        65       55       120
A higher proportion of 12th-graders than 9th-graders in this
sample used marijuana at least once during the past year. The
question we are interested in is whether it is likely that
such a sample could have come from a population in which the
proportions of 9th- and 12th-graders using marijuana were
identical.
The usual test used to evaluate such data is the $\chi^2$ test
of independence. The $\chi^2$ test evaluates the likelihood
that a perceived relationship between proportions in
categories (called being dependent) could have come from a
population in which no such relationship existed (called
independence).
The null hypothesis for this example would be that the same
proportion of 9th-graders as 12th-graders used marijuana
during the past year. The null hypothesis values for this test
are called the expected frequencies. These expected
frequencies for marijuana are calculated so as to be
proportionately equal for both 9th- and 12th-graders.
Because 45 of 120 of the total sample (9th- and 12th-graders)
used marijuana during the past year, the proportion for the
total sample is 45/120 = .375. The expected frequency of
marijuana use for the sixty-five 9th-graders would be
.375(65) = 24.375. The expected marijuana use for the
fifty-five 12th-graders would be .375(55) = 20.625. Table 6.15
shows the expected frequencies in parentheses.
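The expected frequencies can be generated for every cell at once as (row total × column total) / N, which is equivalent to the proportion-based reasoning above. The following Python sketch assumes a nested-dictionary layout for the table; that layout and the function name are illustrative choices only.

```python
# Observed counts from Table 6.14, keyed by use category and grade.
observed = {
    "None": {"9th": 42, "12th": 33},
    "Marijuana": {"9th": 23, "12th": 22},
}


def expected_frequencies(table):
    """Expected count for each cell: row total * column total / N."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    n_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / n_total for col in col_totals}
        for row in table
    }


expected = expected_frequencies(observed)
print(expected)
```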
The $\chi^2$ test evaluates the likelihood of the observed
frequency departing from the expected frequency. The null
hypothesis is
$$H_0: P_{0k} - P_k = 0,$$
where $P_{0k}$ is the proportion of cases within category k in
the null hypothesis population (expected; in this case, this is
the expected proportion of students in each of the two grade
levels [9th and 12th] who fell into one or the other use
category [marijuana use or no marijuana use]); and $P_k$ is the
proportion of cases within category k drawn from the actual
population (observed; in this case, this is the observed [or
obtained] proportion of students in each of the two grade
levels [9th and 12th] who fell into one or the other use
category [marijuana use or no marijuana use]).
The $\chi^2_{obt}$ test statistic is
$$\chi^2_{obt} = \sum \frac{(f_o - f_e)^2}{f_e}.$$
Degrees of freedom for a $\chi^2$ test of independence are
computed by multiplying the number of rows minus 1 times the
number of columns minus 1 or
$$df = (\text{Rows} - 1)(\text{Columns} - 1).$$
TABLE 6.15 Observed and Expected Frequencies for Marijuana Use

                        Grade
             9th            12th           Total
None         42 (40.625)    33 (34.375)    75
Marijuana    23 (24.375)    22 (20.625)    45
Total        65             55             120

NOTE: Expected frequencies are in parentheses.
For our example, this would be
$$df = (2 - 1)(2 - 1) = (1)(1) = 1.$$
Recall from our discussion of the McNemar change test that we
include the Yates correction for continuity in the formula
when df = 1. The equation for the corrected test statistic is
as follows:
$$\chi^2_{obt} = \sum \frac{(|f_o - f_e| - 0.5)^2}{f_e}.$$
The form of the equation tells us to subtract the expected
score from the observed score and take the absolute value of
the difference (make the difference positive). Then, subtract
0.5 from the absolute difference ($|f_o - f_e| - 0.5$) and
square the result. Next, divide by the expected score. This is
repeated for each observed and expected score pair. When we
are finished, we sum the answers and obtain the corrected
$\chi^2_{obt}$ test statistic.

The reader might have noticed that the correction for the
McNemar change test was 1.0, whereas the correction for the
$\chi^2$ test of independence (and the goodness-of-fit test)
was 0.5. I will not go into any detail beyond saying that this
is because the McNemar change test uses only half of the
available cross-tabulation cells (two of four) to compute its
$\chi^2_{obt}$, whereas all cells are used to compute
$\chi^2_{obt}$ in the independence and goodness-of-fit tests.
Table 6.16 shows how to work out the marijuana survey data.
For df = 1 and α = .05, the critical value for $\chi^2$ is
3.84. Our calculated value ($\chi^2_{obt}$) was 0.109. Because
the obtained (calculated) value did not exceed the critical
value, we would not reject the null hypothesis at α = .05.
As before, the effect size measure is w, which is computed as
a post hoc measure by
$$w = \sqrt{\chi^2_{obt} / N}.$$
For a 2 × 2 table, w is equal to the absolute value of
$\varphi$ (phi), which is a true correlation coefficient. If we
square w, then we obtain $\varphi^2$, which is the proportion
of variance explained (PVE).
TABLE 6.16 Computation of $\chi^2_{obt}$

Observed (f_o)   Expected (f_e)   (|f_o - f_e| - 0.5)   (|f_o - f_e| - 0.5)^2   (|f_o - f_e| - 0.5)^2 / f_e
42               40.625           0.875                 0.765625                0.019
33               34.375           0.875                 0.765625                0.022
23               24.375           0.875                 0.765625                0.031
22               20.625           0.875                 0.765625                0.037

NOTE: $\chi^2_{obt}$ = 0.019 + 0.022 + 0.031 + 0.037 = 0.109.
For our example,
$$w = \sqrt{0.109/120} = \sqrt{0.0009083} = 0.0301$$
and
$$w^2 = PVE = .0009.$$
This is an extremely small effect size.
For a 2 × k tabulation, we cannot convert w to PVE.
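The Yates-corrected statistic and its effect size can be reproduced with a short Python sketch. The function name is an assumption for illustration. Note that summing the unrounded terms gives roughly 0.110; the 0.109 reported in the text comes from adding terms that were each rounded to three decimals first.

```python
import math


def yates_chi_square(observed, expected):
    """Chi-square with the 0.5 continuity correction applied to each cell."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Cell order: None/9th, None/12th, Marijuana/9th, Marijuana/12th.
f_o = [42, 33, 23, 22]
f_e = [40.625, 34.375, 24.375, 20.625]

chi_sq_obt = yates_chi_square(f_o, f_e)
w = math.sqrt(chi_sq_obt / 120)   # N = 120 students
print(round(chi_sq_obt, 3), round(w, 3))
```

Either way, the statistic falls far below the critical value of 3.84, so the decision not to reject the null hypothesis is unaffected by the rounding.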
Hypothesis Tests for k > 2 Independent Samples

Imagine that we were interested in ageist attitudes among
social workers. Specifically, we are interested in whether
there are any differences in the magnitudes of ageist attitudes
among (a) hospital social workers, (b) nursing home social
workers, and (c) adult protective services social workers.
We could conduct independent group tests among all possible
pairings: hospital (a) with nursing home (b), hospital (a) with
protective services (c), and nursing home (b) with protective
services (c).
This gives us three tests. When we conduct one test at the
α = .05 level, we have a .05 chance of committing a Type I
error (rejecting the null hypothesis when it is true) and a
.95 chance of making a correct decision (not rejecting the null
hypothesis when it is true). If we conduct three tests at
α = .05, our chance of committing at least one Type I error
increases to about .15 (the precise probability is .142625).
So, we actually are testing at around α = .15.
As the number of comparisons increases, the likelihood of
rejecting the null hypothesis when it is true increases. We are
"capitalizing on chance."
One way of dealing with capitalization on chance would be to
use a stricter alpha level. For three comparisons, we might
conduct our tests at α = .05/3 = .0167. Unfortunately, if we
do this, then we will reduce the power (1 - β) of our test to
detect a possible existing effect.
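The compounding of alpha across several tests is easy to verify numerically. The sketch below assumes the comparisons are independent, which is what the .142625 figure implies; the function names are hypothetical.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def per_comparison_alpha(alpha: float, n_tests: int) -> float:
    """Stricter alpha obtained by dividing by the number of comparisons."""
    return alpha / n_tests


fw_three_tests = familywise_error(0.05, 3)
strict_alpha = per_comparison_alpha(0.05, 3)
fw_with_strict_alpha = familywise_error(strict_alpha, 3)
print(round(fw_three_tests, 6), round(strict_alpha, 4), round(fw_with_strict_alpha, 4))
```

With the stricter per-test level, the chance of at least one Type I error across the three tests drops back to just under .05, at the cost of power on each individual test.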
However, there are tests that allow one to detect whether
there are any differences among groups without compromising
power. This is done by simultaneously evaluating all groups for
any differences. If no differences are detected, then we fail
to reject the null hypothesis and stop. No further tests are
conducted because we already have our answer. The differences
among all groups are not sufficiently large that we can reject
the notion that all of the samples come from the same
population.
If significant differences are detected, then further pair
comparisons are conducted to determine which pairs are
different. The screening tests do not tell us whether only one
pair, two pairs, or all pairs show statistically significant
differences. Screening tests show only that there are some
differences among all possible comparisons.
If we conduct our screening test at α = .05, then we will carry
out the pair comparisons when the null hypothesis is true 1
out of 20 times (commit a Type I error). By conducting the
initial overall screening in a single test, we protect against
the compounding of the alpha level brought on by multiple
comparisons.
We look at three examples of screening tests for k > 2
independent samples:

1. One-way analysis of variance (ANOVA) (interval or ratio
scale)
2. Kruskal-Wallis (K-W) test (ordinal scale)
3. $\chi^2$ test of independence (k × k) (nominal scale)
One-Way Analysis of Variance. The ANOVA is a test of means.
The null hypothesis is
$$H_0: \mu_1 = \mu_2 = \dots = \mu_k,$$
where k is the number of population means being estimated.
If all of the means are equal, then it follows that the
variance of the means is 0 or
$$H_0: \sigma_\mu^2 = 0.$$
The test statistic used in ANOVA is called F and is calculated
as follows:
$$F = \frac{n s_{\bar{X}}^2}{s_p^2},$$
where the numerator is the variance of the sample means
multiplied by the sample size, and the denominator is a pooled
estimate of the score variances within the samples.
The assumptions underlying one-way ANOVA are as follows:
Randomness: Sample members must be randomly drawn from
the population and randomly assigned to one of the k groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be interval or ratio.
Normal distribution: The populations from which the
individuals in the samples were drawn must be normally
distributed.
Homogeneity of variances
($\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$): The samples
must be drawn from populations whose variances are equal.
Equality of sample sizes ($n_1 = n_2 = \dots = n_k$): The
samples must be of the same size.
ANOVA involves taking the variability among scores and
determining which is variability due to membership in a
particular group (variability associated with group means or
between-group variance) and which is variability associated
with unexplained fluctuations (within-group variance).
The total variability of scores is divided into one component
representing the variability of treatment group means around
an overall mean (sometimes called a grand mean) and another
component representing the variability of group scores around
their own individual group means. The variability of group
means around the grand mean is called between-group variance.
The variability of individual scores around their own group
means is called within-group variance. This division is
represented by the following equation:
$$\underbrace{(X - \bar{\bar{X}})}_{\text{Total}} = \underbrace{(X - \bar{X})}_{\text{Within}} + \underbrace{(\bar{X} - \bar{\bar{X}})}_{\text{Between}}.$$
The X with two bars represents the grand mean, which is the
mean of all scores without respect to which group they are in.
X is a particular score, and the X with one bar is the mean of
the group to which that score belongs.
This equation illustrates that the deviation of the particular
score from the grand mean is the sum of the deviation of the
score from its group mean and the deviation of the group mean
from the grand mean. This might be a little clearer if we look
at a simple data set. Let us take the example about ageist
attitudes among hospital social workers (Group 1), nursing
home social workers (Group 2), and adult protective services
social workers (Group 3). The dependent measure quantifies
ageist attitudes (higher scores represent more ageist
sentiment).
There are k = 3 groups, with each containing n = 4 scores. The
total number of scores is N = 12. The group means are 3
(Group 1), 5 (Group 2), and 9 (Group 3), and the grand mean is
5.67.
There are three types of sum of squares calculated in ANOVA.
The formulas for the sums of squares are derived from the
deviation score equations.
$SS_{total}$ is calculated by subtracting the grand mean from
each score, squaring the differences, and adding up (summing)
the squared differences:
$$SS_{total} = \sum (X - \bar{\bar{X}})^2.$$
$SS_{within}$ is calculated by subtracting the group mean from
each score within a group, squaring the differences, and
adding up (summing) the squared differences for each group.
This gives us three sums of squares: $SS_{Group 1}$,
$SS_{Group 2}$, and $SS_{Group 3}$. These are added up to give
us $SS_{within}$:
$$SS_{within} = \sum (X - \bar{X}_1)^2 + \sum (X - \bar{X}_2)^2 + \sum (X - \bar{X}_3)^2.$$
$SS_{between}$ is calculated by subtracting the grand mean from
each group mean, squaring the differences, and adding up
(summing) the squared differences. Then, we multiply the total
by the sample size. This is because this sum of squares needs
to be weighted. Whereas N = 12 scores went to make up
$SS_{total}$, and (k)(n) = (3)(4) = 12 scores went to make up
$SS_{within}$, only the k = 3 group means went to make up
$SS_{between}$. We multiply by n = 4 so that $SS_{between}$
will have the same weight as the other two sums of squares:
$$SS_{between} = n \sum (\bar{X} - \bar{\bar{X}})^2.$$
The sums of squares are as follows:
$$SS_{within} = 20 + 20 + 20 = 60$$
$$SS_{between} = (4)(18.667) = 74.667$$
$$SS_{total} = 134.667.$$
The total sum of squares ($SS_{total}$) is the sum of the
within-group sum of squares ($SS_{within}$) and the
between-group sum of squares ($SS_{between}$):
$$SS_{total} = SS_{within} + SS_{between}$$
or
134.667 = 60.00 + 74.667.
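The partition of the total sum of squares can be demonstrated with a small Python sketch. The individual scores are not given in the text, so the data below are hypothetical values constructed to match the reported summaries (group means of 3, 5, and 9, and a within-group sum of squares of 20 in each group); the variable and function names are likewise illustrative.

```python
# Hypothetical scores chosen to reproduce the reported group means and SS values.
groups = {
    "hospital": [0, 2, 4, 6],                # mean 3, SS = 20
    "nursing_home": [2, 4, 6, 8],            # mean 5, SS = 20
    "protective_services": [6, 8, 10, 12],   # mean 9, SS = 20
}


def mean(values):
    return sum(values) / len(values)


all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Total: every score around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
# Within: every score around its own group mean.
ss_within = sum(sum((x - mean(s)) ** 2 for x in s) for s in groups.values())
# Between: group means around the grand mean, weighted by group size.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

print(round(ss_within, 3), round(ss_between, 3), round(ss_total, 3))
```

Any data set with the same group means and within-group sums of squares would yield the same three totals, which is why the partition can be checked without the original scores.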
Each of these sums of squares is a component of a different
variance. In ANOVA jargon, a variance is called a mean square.
Each particular mean square (variance) has its own degrees of
freedom.
Because the total sum of squares ($SS_{total}$) involves the
variability of all scores around one grand mean, the degrees
of freedom are N - 1. The within-groups sum of squares
($SS_{within}$) involves the variability of all scores within
groups around k group means, where k is the number of groups.
So, the within-groups degrees of freedom are N - k. The
between-groups sum of squares ($SS_{between}$) involves the
variability of k group means around the grand mean. So, the
between-groups degrees of freedom are k - 1.
Because a variance (mean square) is a sum of squares divided
by degrees of freedom, the formula for a mean square would be
MS = SS/df.
Two mean squares are used to calculate the $F_{obt}$
statistic: $MS_{within}$ and $MS_{between}$. Their specific
formulas are as follows:
$$MS_{between} = \frac{SS_{between}}{df_{between}} \quad \text{and} \quad MS_{within} = \frac{SS_{within}}{df_{within}}.$$
There are k = 3 groups, so $df_{between}$ = k - 1 = 3 - 1 = 2.
We may now compute
$$MS_{between} = 74.667/2 = 37.333.$$
There are a total of N = 12 scores within k = 3 groups, so
$df_{within}$ = 12 - 3 = 9 and $MS_{within}$ = 60/9 = 6.667.
These are the two variances used to make up the F ratio
($F_{obt}$): $MS_{between}$ and $MS_{within}$. The formula for
$F_{obt}$ is
$$F_{obt} = \frac{MS_{between}}{MS_{within}}.$$
If we plug in the values from our example, then we obtain
$$F_{obt} = \frac{MS_{between}}{MS_{within}} = \frac{37.333}{6.667} = 5.60.$$
This is a bit confusing when presented in bits and pieces. The
ANOVA summary table is a way of presenting the information
about the sums of squares, degrees of freedom, mean squares,
and F statistics in a more easily understood fashion.
Table 6.17 uses the example data.
TABLE 6.17 ANOVA Summary Table

Source    Sum of Squares   Degrees of Freedom   Mean Square          F_obt
Between   74.667           3 - 1 = 2            74.667/2 = 37.333    37.333/6.667 = 5.60
Within    60.00            12 - 3 = 9           60.00/9 = 6.667
Total     134.667          12 - 1 = 11

Once we have computed the $F_{obt}$, it is compared to a
critical F. Because two variances were used to calculate our
$F_{obt}$, there are two types of degrees of freedom
associated with it: numerator degrees of freedom (between
groups) and denominator degrees of freedom (within groups).
These are used either to look up values in a table of the F
distribution or by computer programs to compute p values.
For our example, the numerator degrees of freedom are df = 2
because 2 degrees of freedom were used in the calculation of
$MS_{between}$. The denominator degrees of freedom are df = 9
because 9 degrees of freedom were used in the calculation of
$MS_{within}$. The critical value for F at 2 and 9 degrees of
freedom is $F_{crit}$ = 4.26. Because $F_{obt}$ = 5.60 is
greater than the critical value, we reject the null hypothesis
at α = .05.
Based on these findings, it is likely that at least one pair of
means come from different populations. Because we already have
screened out other opportunities to commit Type I error,
further testing would not be capitalizing on chance. Thus, we
may carry out the following pair comparisons:
Group 1 versus Group 2
Group 1 versus Group 3
Group 2 versus Group 3
The individual pair comparisons may be carried out using any
of a number of multiple comparison tests. One of the more
frequently used is the least significant difference (LSD) test.
The LSD test is a variant on the t test. However, the standard
error of the mean is calculated from the within-groups mean
square (variance) from the ANOVA:
$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{MS_{within} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)},$$
where
$n_i$ is the number of scores in Group i, and
$n_j$ is the number of scores in Group j.
If the group ns are equal, then this becomes
$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{\frac{2 \, MS_{within}}{n}}.$$
For our example,
$$s_{\bar{X}_i - \bar{X}_j} = \sqrt{(2)(6.667)/4} = \sqrt{3.333} = 1.826.$$
We now may carry out our comparisons, evaluating t at
df = N - k = 12 - 3 = 9 (Figure 6.6). Only the comparison of
Group 1 with Group 3 exceeds the critical value, so we reject
the null hypothesis at α = .05 for that pair and fail to reject
it for the other two.
Figure 6.6 Multiple Comparisons

Comparison                                Test statistic                        Critical value (df = 9, α = .05)   Decision
Hospital (Group 1) vs.                    t_obt = (3 - 5)/1.826 = -1.095        t_crit = 2.262                     Fail to reject H_0
Nursing Home (Group 2)
Hospital (Group 1) vs.                    t_obt = (3 - 9)/1.826 = -3.286        t_crit = 2.262                     Reject H_0
Adult Protective Services (Group 3)
Nursing Home (Group 2) vs.                t_obt = (5 - 9)/1.826 = -2.191        t_crit = 2.262                     Fail to reject H_0
Adult Protective Services (Group 3)
There are a number of measures for effect size for ANOVA. For
the sake of simplicity, we deal with two: Cohen's (1988) f and
$\eta^2$.
The f effect size measure is equal to the standard deviation
of the sample means divided by the pooled within-group
standard deviation. It ranges from a minimum of 0 to an
indefinitely large upper limit. It may be estimated from
$F_{obt}$ by using the following formula:
$$f = \sqrt{F_{obt} / n}.$$
$\eta^2$ was discussed earlier and defined as a proportion of
variance explained. It is calculated by the following formula:
$$\eta^2 = \frac{SS_{between}}{SS_{total}}.$$
It also may be calculated from an $F_{obt}$:
$$\eta^2 = \frac{df_{between} \, F_{obt}}{df_{between} \, F_{obt} + df_{within}}.$$
Cohen (1988) categorizes these effect sizes into small,
medium, and large categories. The criteria for each are as
follows:
Small effect size:   f = .10   $\eta^2$ = .01
Medium effect size:  f = .25   $\eta^2$ = .06
Large effect size:   f = .40   $\eta^2$ = .14
Using the example data, $\eta^2$ is
$$\eta^2 = \frac{SS_{between}}{SS_{total}} = \frac{74.667}{134.667} = 0.554,$$
which is a very large effect.
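The two routes to $\eta^2$ can be compared in a brief Python sketch. The conversion from F uses the standard identity $\eta^2 = df_{between} F / (df_{between} F + df_{within})$, which is stated here as an assumption; the function names are hypothetical.

```python
def eta_squared_from_ss(ss_between: float, ss_total: float) -> float:
    """Proportion of variance explained from the sums of squares."""
    return ss_between / ss_total


def eta_squared_from_f(f_obt: float, df_between: int, df_within: int) -> float:
    """Proportion of variance explained from F and its degrees of freedom."""
    return df_between * f_obt / (df_between * f_obt + df_within)


eta_sq_ss = eta_squared_from_ss(74.667, 134.667)
eta_sq_f = eta_squared_from_f(5.60, 2, 9)
print(round(eta_sq_ss, 3), round(eta_sq_f, 3))
```

Both routes give about 0.554 for the example, well above the .14 threshold for a large effect.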
Kruskal-Wallis Test. The K-W test is the k > 2 groups
equivalent of the W/M-W test. The test involves initially
treating all samples as one group and ranking scores from
least to most. After this is done, the frequencies of low and
high ranks among groups are compared.
The assumptions of the K-W test are as follows:
Randomness: Sample members must be randomly drawn from
the population of interest and randomly assigned to one of the
k groups.
Independence: Scores must be independent of each other.
Scaling: The dependent measure must be ordinal (interval or
ratio scores must be converted to ranks).
When the assumptions of ANOVA are met, the analysis of
variance will be slightly more powerful than the K-W test.
However, if the distribution of population scores is not
normal and/or the population variances are not equal, then the
K-W test might be the more powerful test.
The K-W test is a screening test. If there is no significant
difference found, then we stop testing. If a significant
difference is found, then we proceed to test individual pairs
with the W/M-W test.
Our example involves the evaluation of three intervention
techniques being used with clients who wish to stop making
negative self-statements: (a) self-disputation, (b) thought
stopping, and (c) identifying the source of the negative
statement (insight). A total of 27 clients with this concern
were randomly selected and assigned to one of the three
intervention conditions. On the 28th day of the intervention,
each client counted the number of negative self-statements
that he or she had made.
The procedure for the K-W test is similar to that for the
W/M-W test. We begin by assigning ranks to the scores without
regard to which group individuals were in. We then sum the
ranks within each group. The summed ranks are called $W_1$ for
Group 1, $W_2$ for Group 2, and $W_3$ for Group 3 (Table 6.18).
