This document contains class notes from an empirical research methods course. It outlines key concepts related to sampling, statistics, experimental design, and data analysis techniques including t-tests, analysis of variance (ANOVA), and factorial ANOVA. Examples are provided to illustrate how to conduct statistical tests in SPSS and how to interpret and report results. Key terms are defined throughout to explain assumptions, computations, and interpretations of different statistical analyses.
IS 4800 Research Methods Notes
1. IS 4800 Empirical Research Methods
for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
hafner@ccs.neu.edu Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
2. Outline
• Sampling and statistics (cont.)
• T test for paired samples
• T test for independent means
• Analysis of Variance
• Two way analysis of Variance
4. Relationship Between Population and Samples When a Treatment Had An Effect
[Figure: the control group population (mean $\mu_c$) and its control group sample (mean $M_c$), alongside the treatment group population (mean $\mu_t$) and its treatment group sample (mean $M_t$)]
5. Population
Mean $\mu$? Variance $\sigma^2$?
Sampling
Sample of size N:
$M = \frac{\sum X}{N}$
$SD^2 = \frac{\sum (X - M)^2}{N}$
Mean values from all possible samples of size N, aka “distribution of means”:
$\mu_M = \mu$
$\sigma^2_M = \frac{\sigma^2}{N}$
$Z_M = (M - \mu) / \sigma_M$
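The two properties of the distribution of means can be checked by brute force on a tiny population. The sketch below is only an illustration: the population values and the sample size are made up, and sampling is assumed to be with replacement so that every possible sample of size N can be enumerated.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical toy population (not from the notes)
population = [2, 4, 6, 8]
N = 2  # sample size

mu = mean(population)           # population mean
sigma2 = pvariance(population)  # population variance (divides by the population size)

# Mean of every possible sample of size N, drawn with replacement
sample_means = [mean(sample) for sample in product(population, repeat=N)]

mu_M = mean(sample_means)           # mean of the distribution of means
sigma2_M = pvariance(sample_means)  # variance of the distribution of means

print("mu =", mu, " mu_M =", mu_M)
print("sigma^2 / N =", sigma2 / N, " sigma^2_M =", sigma2_M)
```

The two printed pairs should agree: the distribution of means is centered on the population mean, and its variance is the population variance divided by N.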
6. Z tests and t-tests
t is like Z:
$Z = (M - \mu) / \sigma_M$
$t = (M - \mu) / S_M$, with $\mu = 0$ for paired samples
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance while $\sigma_M$ is based on a known population variance.
$S^2 = \frac{\sum (X - M)^2}{N - 1} = \frac{SS}{N - 1}$
$S^2_M = S^2 / N$
7. T-test with paired samples
• Given info about the population of change scores and the sample size we will be using (N), we can compute the distribution of means
– $\mu = 0$; $\sigma^2$ = ?
– $S^2$ = est. of $\sigma^2$ from sample = SS/df, with df = N-1
– $S^2_M = S^2/N$
• Now, given a particular sample of change scores of size N, we compute its mean and finally determine the probability that this mean occurred by chance
– $t = M / S_M$
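The paired-samples computation can be followed step by step in a few lines of Python. This is a minimal sketch: the change scores are invented for illustration, and the comparison mean is taken to be 0, as stated for paired samples.

```python
from math import sqrt

# Hypothetical change scores (after minus before), one per participant
change_scores = [1, 3, 2, 4, 0, 2]

N = len(change_scores)
M = sum(change_scores) / N                     # sample mean of the change scores
SS = sum((x - M) ** 2 for x in change_scores)  # sum of squared deviations
df = N - 1
S2 = SS / df        # estimate of the population variance
S2_M = S2 / N       # variance of the distribution of means
S_M = sqrt(S2_M)    # standard deviation of the distribution of means
t = (M - 0) / S_M   # population mean of change scores is 0 under the null hypothesis

print(f"M = {M:.3f}, S^2 = {S2:.3f}, S_M = {S_M:.3f}, t({df}) = {t:.3f}")
```

The resulting t is then compared with the cutoff from a t table for df = N - 1 at the chosen significance level and number of tails (for df = 5, two-tailed, .05 level, the cutoff is 2.571).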
8. t test for independent samples
• Given two samples
• Estimate population variances (assume same)
• Estimate variances of distributions of means
• Estimate variance of differences between means (mean = 0)
– This is now your comparison distribution
9. Estimating the Population Variance
$S^2$ is an estimate of $\sigma^2$
$S^2 = SS/(N-1)$ for one sample (take sq root for S)
For two independent samples – “pooled estimate”:
$S^2_{Pooled} = \frac{df_1}{df_{Total}} S^2_1 + \frac{df_2}{df_{Total}} S^2_2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this calculate variance of sample means, needed to compute t statistic:
$S^2_{M_1} = S^2_{Pooled} / N_1$, $S^2_{M_2} = S^2_{Pooled} / N_2$
$S^2_{Difference} = S^2_{Pooled} / N_1 + S^2_{Pooled} / N_2$
10. t test for independent samples, continued
• Distribution of differences between means: this is your comparison distribution
– NOT normal, is a ‘t’ distribution
– Shape changes depending on df
– df = (N1 – 1) + (N2 – 1)
• Compute $t = (M_1 - M_2) / S_{Difference}$
• Determine if beyond cutoff score for test parameters (df, sig, tails) from lookup table.
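The steps for independent samples (pooled variance estimate, variance of the distribution of differences between means, then t) can be sketched as one small function. The function name `independent_t` and the two samples are hypothetical, chosen only to illustrate the arithmetic with unequal sample sizes.

```python
from math import sqrt

def independent_t(sample1, sample2):
    """Return (t, df_total) for two independent samples using the pooled variance estimate."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2

    # Separate estimates of the population variance: S^2 = SS / df
    s2_1 = sum((x - m1) ** 2 for x in sample1) / df1
    s2_2 = sum((x - m2) ** 2 for x in sample2) / df2

    # Pooled estimate, weighting each sample by its share of the degrees of freedom
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2

    # Variance of the distribution of differences between means
    s2_difference = s2_pooled / n1 + s2_pooled / n2

    t = (m1 - m2) / sqrt(s2_difference)
    return t, df_total

# Hypothetical data
t, df = independent_t([5, 7, 9, 7], [2, 4, 6])
print(f"t({df}) = {t:.3f}")
```

The cutoff lookup itself is left out on purpose; it depends on the significance level and on whether the test is one-tailed or two-tailed.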
11. ANOVA: When to use
• Categorical IV, numerical DV (same as t-test)
• HOWEVER:
– There are more than 2 levels of IV so:
– (M1 – M2) / Sm won’t work
13. Basic Logic of ANOVA
• Null hypothesis
– Means of all groups are equal.
• Test: do the means differ more than expected given the null hypothesis?
• Terminology
– Group = Condition = Cell
14. Accompanying Statistics
• Experimental
– Between-subjects
• Single factor, N-level (for N>2)
– One-way Analysis of Variance (ANOVA)
• Two factor, two-level (or more!)
– Factorial Analysis of Variance
– AKA N-way Analysis of Variance (for N IVs)
– AKA N-factor ANOVA
– Within-subjects
• Repeated-measures ANOVA (not discussed)
– AKA within-subjects ANOVA
15. ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
– The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
– The analysis of variance may be used with unequal sample sizes (weighted or unweighted means analysis)
– When there are just 2 groups, ANOVA is equivalent to the t test for independent means
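16. One-Way ANOVA – Assuming Null Hypothesis is True…
• Within-Group Estimate Of Population Variance
– The separate estimates from each group, $\sigma^2_{est,1}$, $\sigma^2_{est,2}$, $\sigma^2_{est,3}$, are combined into $\sigma^2_{est,within}$
• Between-Group Estimate Of Population Variance
– The group means M1, M2, M3 are used to compute $\sigma^2_{est,between}$
• $F = \sigma^2_{est,between} / \sigma^2_{est,within}$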
21. Using the F Statistic
• Use a table for F(BDF, WDF)
– And also α
BDF = between-groups degrees of freedom =
number of groups -1
WDF = within-groups degrees of freedom =
Σ df for all groups = N – number of groups
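The F ratio and its two degrees of freedom can be computed by hand for a small example. The sketch below assumes equal group sizes, so the within-group estimate is the plain average of the group variances and the between-group estimate is the group size times the variance of the group means. The data are hypothetical.

```python
from statistics import mean, variance

# Hypothetical scores for three groups of equal size
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]

k = len(groups)     # number of groups
n = len(groups[0])  # scores per group (equal sizes assumed)
N = k * n           # total number of scores

# Within-group estimate: average of the per-group variance estimates (each SS / (n - 1))
within_estimate = mean([variance(g) for g in groups])

# Between-group estimate: variance of the group means, scaled up by the group size
group_means = [mean(g) for g in groups]
between_estimate = n * variance(group_means)

F = between_estimate / within_estimate
bdf = k - 1  # between-groups degrees of freedom
wdf = N - k  # within-groups degrees of freedom

print(f"F({bdf},{wdf}) = {F:.3f}")
```

With unequal group sizes the two estimates need the weighted form (sums of squares divided by their degrees of freedom), which this sketch does not cover.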
25. SPSS Results…
ANOVA
Dependent variable: Performance
Source            Sum of Squares   df   Mean Square   F       Sig.
Between Groups    24.813           2    12.406        9.442   .001
Within Groups     27.594           21   1.314
Total             52.406           23
F(2,21)=9.442, p<.05
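The mean squares and F in the SPSS output follow directly from the sums of squares and degrees of freedom. As a quick arithmetic check, using only the numbers reported above (variable names are illustrative):

```python
# Values copied from the SPSS ANOVA output for Performance
ss_between, df_between = 24.813, 2
ss_within, df_within = 27.594, 21

ms_between = ss_between / df_between  # between-groups mean square
ms_within = ss_within / df_within     # within-groups mean square
F = ms_between / ms_within

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F({df_between},{df_within}) = {F:.3f}")
```

Small differences in the last digit are expected, because SPSS computes from unrounded values while this check starts from the rounded table entries.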
26. Factorial Designs
• Two or more nominal independent variables,
each with two or more levels, and a numeric
dependent variable.
• Factorial ANOVA teases apart the contribution
of each variable separately.
• For N IVs, aka “N-way” ANOVA
27. Factorial Designs
• Adding a second independent variable to a single-
factor design results in a FACTORIAL DESIGN
• Two components can be assessed
– The MAIN EFFECT of each independent variable
• The separate effect of each independent variable
• Analogous to separate experiments involving those variables
– The INTERACTION between independent variables
• When the effect of one independent variable changes over levels of a
second
• Or– when the effect of one variable depends on the level of the other
variable.
29. Example of An Interaction - Student Center Sign – 2 Genders x 2 Sign Conditions
[Figure: plot of the Value of the Dependent Variable (axis 0 to 12) against the Level of Independent Variable A (Level 1, Level 2). Labels in the figure: F, M (gender); No Sign, Sign (sign condition).]
34. Degrees of Freedom
• df for between-group variance estimates for main
effects
– Number of levels – 1
• df for between-group variance estimates for
interaction effect
– Total num cells – df for both main effects – 1
– e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
– Sum of df for each cell = N – num cells
• Report: “F(bet-group, within-group)=F, Sig.”
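These degrees-of-freedom rules for a two-factor design can be written as a small helper. The function name `factorial_df` and the total sample size in the example are hypothetical; the formulas are the ones listed above.

```python
def factorial_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects factorial ANOVA."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                          # main effect of A: number of levels - 1
    df_b = levels_b - 1                          # main effect of B: number of levels - 1
    df_interaction = cells - (df_a + df_b) - 1   # total cells - df of both main effects - 1
    df_within = n_total - cells                  # sum of per-cell df = N - number of cells
    return {"A": df_a, "B": df_b, "A*B": df_interaction, "within": df_within}

# 2x2 design with a hypothetical total of 40 participants
print(factorial_df(2, 2, 40))
```

Each F is then reported against the within-group df, for example F(1, 36) for either main effect in this hypothetical 2x2 case.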
35. Publication format
Tests of Between-Subjects Effects
Dependent Variable: Performance
Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model          26.507 (a)                5    5.301         3.685     .018
Intercept                210.855                   1    210.855       146.547   .000
TrainingDays             20.728                    2    10.364        7.203     .005
Trainer                  .002                      1    .002          .001      .974
TrainingDays * Trainer   1.680                     2    .840          .584      .568
Error                    25.899                    18   1.439
Total                    401.250                   24
Corrected Total          52.406                    23
a. R Squared = .506 (Adjusted R Squared = .369)
N=24, 2x3=6 cells => df TrainingDays=2,
df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05
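Each F in this table is the effect's mean square divided by the error mean square, and R Squared is the corrected model sum of squares over the corrected total. The sketch below recomputes those values from the reported sums of squares and df only; the variable names are illustrative.

```python
# Values copied from the Tests of Between-Subjects Effects table
ss_error, df_error = 25.899, 18
ss_model, ss_corrected_total = 26.507, 52.406

effects = {
    "TrainingDays": (20.728, 2),
    "Trainer": (0.002, 1),
    "TrainingDays * Trainer": (1.680, 2),
}

ms_error = ss_error / df_error  # within-group (error) mean square

# F for each effect: (SS / df) / MS error
f_values = {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}
r_squared = ss_model / ss_corrected_total

for name, f in f_values.items():
    df_effect = effects[name][1]
    print(f"{name}: F({df_effect},{df_error}) = {f:.3f}")
print(f"R squared = {r_squared:.3f}")
```

Results can differ from the table in the last digit because the inputs here are already rounded.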
36. Reporting rule
• IF you have a significant interaction
• THEN
– If 2x2 study: do not report main effects, even if
significant
– Else: must look at patterns of means in cells to
determine whether to report main effects or not.
42. “Factorial Design”
• Not all cells in your design need to be tested
– But if they are, it is a “full factorial design”, and you do a “full factorial ANOVA”
[Table: 2x2 design with columns Real-Time and Retrospective and rows Agent and Text; one cell is marked X]
43. Higher-Order Factorial Designs
• More than two independent variables are included in a
higher-order factorial design
– As factors are added, the complexity of the experimental
design increases
• The number of possible main effects and interactions increases
• The number of subjects required increases
• The volume of materials and amount of time needed to complete the
experiment increases
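How quickly a design grows can be made concrete by counting effects and cells. In a full factorial design with k factors there are k main effects and one interaction for every subset of two or more factors. The helper name `design_size` and the example designs are hypothetical.

```python
from math import comb, prod

def design_size(levels):
    """Return (main effects, interactions, cells) for a full factorial design.

    `levels` lists the number of levels of each independent variable.
    """
    k = len(levels)
    main_effects = k
    # One interaction per subset of 2 or more factors: 2-way, 3-way, ..., k-way
    interactions = sum(comb(k, order) for order in range(2, k + 1))
    cells = prod(levels)
    return main_effects, interactions, cells

for design in ([2, 3], [2, 3, 2], [2, 2, 2, 2]):
    mains, inters, cells = design_size(design)
    print(design, "->", mains, "main effects,", inters, "interactions,", cells, "cells")
```

Since every cell needs its own group of subjects in a between-subjects design, the cell count is what drives the number of subjects required.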