1. IS 4800 Empirical Research Methods for Information Science
Class Notes March 16, 2012
Instructor: Prof. Carole Hafner, 446 WVH
hafner@ccs.neu.edu Tel: 617-373-5116
Course Web site: www.ccs.neu.edu/course/is4800sp12/
2. Outline
• Sampling and statistics (cont.)
• t test for paired samples
• t test for independent means
• Analysis of variance
• Two-way analysis of variance
4. Relationship Between Population and Samples When a Treatment Had an Effect
[Figure: the control group sample (mean Mc) is drawn from the control group population (mean μc); the treatment group sample (mean Mt) is drawn from the treatment group population (mean μt).]
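5. Population: mean μ? Variance σ²?
Sampling: draw a sample of size N
Sample mean: $M = \sum X / N$
Sample variance: $SD^2 = \sum (X - M)^2 / N$
The mean values from all possible samples of size N form the "distribution of means":
$\mu_M = \mu$
$\sigma^2_M = \sigma^2 / N$
$Z_M = (M - \mu_M) / \sigma_M$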
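A minimal Python sketch of the distribution of means, assuming a small hypothetical population [2, 4, 6, 8] and sampling with replacement. It enumerates every possible sample of size N and checks the two formulas above, μM = μ and σ²M = σ²/N, using exact fractions so no rounding hides the result. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    """Arithmetic mean, kept exact by using Fractions."""
    return Fraction(sum(values), len(values))

def pop_variance(values):
    """Population variance: average squared deviation from the mean (divide by N)."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def distribution_of_means(population, n):
    """Means of every possible ordered sample of size n drawn with replacement."""
    return [mean(sample) for sample in product(population, repeat=n)]

population = [2, 4, 6, 8]   # hypothetical toy population
n = 2
means = distribution_of_means(population, n)

print(mean(population), mean(means))                      # mu and mu_M  → 5 5
print(pop_variance(population) / n, pop_variance(means))  # sigma^2/N and sigma^2_M  → 5/2 5/2
```

With 4 population values and N = 2 there are 16 possible samples; their means average to μ = 5 and their variance is exactly σ²/N = 5/2.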
6. Z tests and t tests
t is like Z:
$Z = (M - \mu) / \sigma_M$
$t = (M - \mu) / S_M$, with μ = 0 for paired samples
We use a stricter criterion (t) instead of Z because $S_M$ is based on an estimate of the population variance, while $\sigma_M$ is based on a known population variance.
$S^2 = \sum (X - M)^2 / (N - 1) = SS / (N - 1)$
$S^2_M = S^2 / N$
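To make the Z/t contrast concrete, here is a hedged sketch (hypothetical function names and data) that computes Z when the population SD σ is known and t when σ must be estimated with S² = SS/(N − 1).

```python
import math

def z_score_of_mean(sample, mu, sigma):
    """Z for a sample mean when the population SD sigma is known."""
    n = len(sample)
    m = sum(sample) / n
    sigma_m = sigma / math.sqrt(n)          # sigma_M = sigma / sqrt(N)
    return (m - mu) / sigma_m

def t_score_of_mean(sample, mu):
    """t for a sample mean when sigma must be estimated from the sample itself."""
    n = len(sample)
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)  # SS = sum of squared deviations
    s2 = ss / (n - 1)                       # S^2 = SS / (N - 1)
    s_m = math.sqrt(s2 / n)                 # S_M = sqrt(S^2 / N)
    return (m - mu) / s_m, n - 1            # t and its degrees of freedom

scores = [2, 4, 3, 5, 1]                    # hypothetical change scores
print(z_score_of_mean(scores, mu=0, sigma=1.5))  # sigma = 1.5 is an assumed "known" value
print(t_score_of_mean(scores, mu=0))
```

For these scores t = 3/√0.5 ≈ 4.24 with df = 4. The stricter criterion shows up in the cutoffs: for a two-tailed test at .05 the Z cutoff is 1.96, while the t cutoff for df = 4 is 2.776.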
7. T-test with paired samples
• Given information about the population of change scores (μ = 0 under the null hypothesis; σ² unknown) and the sample size we will be using (N)
• Now, given a particular sample of change scores of size N, we can compute the distribution of means:
– $S^2$ = estimate of σ² from the sample = SS/df, with df = N − 1
– $S^2_M = S^2 / N$
• We compute its mean and finally determine the probability that this mean occurred by chance:
– $t = (M - 0) / S_M$, compared against a t distribution with df = N − 1
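The paired procedure can be sketched in Python as a one-sample t on the change scores with μ = 0. This is an illustrative sketch; `paired_t` and the before/after scores are hypothetical.

```python
import math

def paired_t(before, after):
    """t test for paired samples: a one-sample t on the change scores, with mu = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    changes = [a - b for b, a in zip(before, after)]  # change score per subject
    n = len(changes)
    m = sum(changes) / n                              # mean change M
    ss = sum((d - m) ** 2 for d in changes)           # SS of the change scores
    df = n - 1
    s2 = ss / df                                      # S^2 = SS / df
    s2_m = s2 / n                                     # S^2_M = S^2 / N
    t = (m - 0) / math.sqrt(s2_m)                     # null hypothesis: mu = 0
    return t, df

before = [10, 12, 9, 11, 13]   # hypothetical pre-test scores
after = [12, 16, 12, 16, 14]   # hypothetical post-test scores
print(paired_t(before, after))
```

The change scores here are 2, 4, 3, 5, 1, giving t ≈ 4.24 with df = 4; the last step is to compare t with the cutoff from a t table.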
8. t test for independent samples
• Given two samples:
– Estimate the population variances (assume they are the same)
– Estimate the variances of the distributions of means
– Estimate the variance of the distribution of differences between means (mean = 0)
• This is now your comparison distribution
9. Estimating the Population Variance
S² is an estimate of σ²
$S^2 = SS/(N-1)$ for one sample (take the square root for S)
For two independent samples – "pooled estimate":
$S^2_{Pooled} = \frac{df_1}{df_{Total}} S^2_1 + \frac{df_2}{df_{Total}} S^2_2$
$df_{Total} = df_1 + df_2 = (N_1 - 1) + (N_2 - 1)$
From this, calculate the variance of each distribution of sample means, $S^2_M = S^2_{Pooled}/N$, needed to compute the t statistic:
$S^2_{Difference} = S^2_{Pooled}/N_1 + S^2_{Pooled}/N_2$
10. t test for independent samples, continued
• The distribution of differences between means is your comparison distribution
– It is NOT normal; it is a t distribution
– Its shape changes depending on df: df = (N1 – 1) + (N2 – 1)
• Compute $t = (M_1 - M_2) / S_{Difference}$
• Determine whether t is beyond the cutoff score for the test parameters (df, significance level, tails) from a lookup table
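The steps of the independent-means t test (pooled variance, variances of the distributions of means, variance of the difference, then t) can be sketched as below. The function name and the two groups are hypothetical, and the cutoff lookup is left to a t table.

```python
import math

def independent_t(group1, group2):
    """t test for independent means using the pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    df1, df2 = n1 - 1, n2 - 1
    df_total = df1 + df2
    s2_1, s2_2 = ss1 / df1, ss2 / df2
    # Pooled estimate: each group's S^2 weighted by its share of the total df
    s2_pooled = (df1 / df_total) * s2_1 + (df2 / df_total) * s2_2
    # Variance of each distribution of means, then of the difference between means
    s2_m1, s2_m2 = s2_pooled / n1, s2_pooled / n2
    s2_difference = s2_m1 + s2_m2
    t = (m1 - m2) / math.sqrt(s2_difference)
    return t, df_total

print(independent_t([3, 4, 5], [6, 7, 8]))  # hypothetical scores
```

For these groups S²Pooled = 1, S²Difference = 2/3, and t = −3/√(2/3) ≈ −3.674 with df = 4.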
11. ANOVA: When to use
• Categorical IV, numerical DV (same as t test)
• HOWEVER:
– There are more than 2 levels of the IV, so
– $(M_1 - M_2) / S_M$ won't work
13. Basic Logic of ANOVA
• Null hypothesis
– Means of all groups are equal.
• Test: do the means differ more than expected, given the null hypothesis?
• Terminology
– Group = Condition = Cell
14. Accompanying Statistics
• Experimental
– Between-subjects
• Single factor, N-level (for N>2)
– One-way Analysis of Variance (ANOVA)
• Two factor, two-level (or more!)
– Factorial Analysis of Variance
– AKA N-way Analysis of Variance (for N IVs)
– AKA N-factor ANOVA
– Within-subjects
• Repeated-measures ANOVA (not discussed)
– AKA within-subjects ANOVA
15. ANOVA: Single factor, N-level (for N>2)
• The Analysis of Variance is used when you have more than two groups in an experiment
– The F-ratio is the statistic computed in an Analysis of Variance and is compared to critical values of F
– The analysis of variance may be used with unequal sample sizes (weighted or unweighted means analysis)
– When there are just 2 groups, ANOVA is equivalent to the t test for independent means (F = t²)
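16. One-Way ANOVA – Assuming the Null Hypothesis Is True…
• Within-group estimate of the population variance: each group's own estimate ($\hat\sigma^2_1$, $\hat\sigma^2_2$, $\hat\sigma^2_3$) is combined into $\hat\sigma^2_{\text{within}}$
• Between-group estimate of the population variance: based on how much the group means $M_1$, $M_2$, $M_3$ vary, giving $\hat\sigma^2_{\text{between}}$
$F = \hat\sigma^2_{\text{between}} / \hat\sigma^2_{\text{within}}$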
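A sketch of the two estimates for the simple case of equal group sizes n, assuming hypothetical data: the within-group estimate averages the groups' S², and the between-group estimate takes the variance of the group means (S²M) and multiplies it by n, since S²M = S²/n. Unequal group sizes need the weighted sums-of-squares form instead.

```python
from statistics import mean, variance

def f_ratio_equal_n(groups):
    """F = between-group / within-group estimate of the population variance.

    Sketch only: assumes every group has the same size n.
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    # Within: average the groups' own estimates (statistics.variance divides by n - 1)
    s2_within = mean(variance(g) for g in groups)
    # Between: variance of the group means is S^2_M; scale by n to estimate S^2
    group_means = [mean(g) for g in groups]
    s2_between = variance(group_means) * n
    return s2_between / s2_within

print(f_ratio_equal_n([[3, 4, 5], [6, 7, 8], [9, 10, 11]]))  # hypothetical data
```

Here both estimates would be near each other if the null hypothesis held; instead F = 27. With only two groups, [3, 4, 5] and [6, 7, 8], the function gives F = 13.5, which equals t² for the t test for independent means on the same data (t = −3/√(2/3)).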
21. Using the F Statistic
• Use a table for F(BDF, WDF)
– And also α
• BDF = between-groups degrees of freedom = number of groups − 1
• WDF = within-groups degrees of freedom = Σ df for all groups = N − number of groups
25. SPSS Results…
ANOVA
Dependent variable: Performance

Source            Sum of Squares   df   Mean Square   F       Sig.
Between Groups    24.813            2   12.406        9.442   .001
Within Groups     27.594           21    1.314
Total             52.406           23

F(2,21)=9.442, p<.05
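As a check on the SPSS output, the mean squares and F can be rebuilt from the sums of squares and the df rules (BDF = groups − 1, WDF = N − groups). This sketch infers N = 24 and 3 groups from the df column (Total df = 23, Between df = 2); `anova_table` is a hypothetical helper.

```python
def anova_table(ss_between, ss_within, n_total, k_groups):
    """Rebuild the mean squares and F of a one-way ANOVA from its sums of squares."""
    df_between = k_groups - 1          # BDF = number of groups - 1
    df_within = n_total - k_groups     # WDF = N - number of groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return df_between, df_within, ms_between, ms_within, ms_between / ms_within

bdf, wdf, msb, msw, f = anova_table(24.813, 27.594, n_total=24, k_groups=3)
print(f"F({bdf},{wdf}) = {f:.3f}")  # → F(2,21) = 9.442
```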
26. Factorial Designs
• Two or more nominal independent variables, each with two or more levels, and a numeric dependent variable.
• Factorial ANOVA teases apart the contribution of each variable separately.
• For N IVs, aka "N-way" ANOVA
27. Factorial Designs
• Adding a second independent variable to a single-factor design results in a FACTORIAL DESIGN
• Two components can be assessed
– The MAIN EFFECT of each independent variable
• The separate effect of each independent variable
• Analogous to separate experiments involving those variables
– The INTERACTION between independent variables
• When the effect of one independent variable changes over levels of a second
• Or, when the effect of one variable depends on the level of the other variable.
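For a 2x2 design, the main effects and the interaction can be read off the cell means. The sketch below uses hypothetical cell means and a hypothetical helper name; it computes each main effect as a difference of marginal means and the interaction as a "difference of differences". These are descriptive contrasts only; whether they are significant is what the factorial ANOVA tests.

```python
def two_by_two_effects(cells):
    """Main effects and interaction contrast from a 2x2 table of cell means.

    cells[a][b] is the mean for level a of factor A and level b of factor B.
    """
    (a1b1, a1b2), (a2b1, a2b2) = cells
    # Main effect of A: difference between A's marginal means (averaged over B)
    main_a = (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2
    # Main effect of B: difference between B's marginal means (averaged over A)
    main_b = (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2
    # Interaction: does the effect of B at level a2 differ from its effect at a1?
    interaction = (a2b2 - a2b1) - (a1b2 - a1b1)
    return main_a, main_b, interaction

print(two_by_two_effects([[4, 4], [4, 10]]))  # B matters only at level a2
print(two_by_two_effects([[2, 5], [6, 9]]))   # parallel lines: no interaction
```

In the first table the effect of B exists only at one level of A, so the interaction contrast is 6; in the second the effect of B is the same (3) at both levels of A, so the contrast is 0.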
29. Example of an Interaction – Student Center Sign – 2 Genders x 2 Sign Conditions
[Figure: line chart of the value of the dependent variable (axis 0 to 12) at Level 1 and Level 2 of independent variable A, with lines for F and M; conditions labeled No Sign and Sign.]
34. Degrees of Freedom
• df for between-group variance estimates for main effects
– Number of levels – 1
• df for between-group variance estimates for the interaction effect
– Total num cells – df for both main effects – 1
– e.g. 2x2 => 4 – (1+1) – 1 = 1
• df for within-group variance estimate
– Sum of df for each cell = N – num cells
• Report: "F(between-group df, within-group df) = F, Sig."
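The df rules for a two-way between-subjects design can be sketched as a small function (hypothetical name). The slide's interaction rule, cells − (df_A + df_B) − 1, always equals (levels_A − 1)(levels_B − 1).

```python
def factorial_dfs(levels_a, levels_b, n_total):
    """Degrees of freedom for a two-way between-subjects factorial ANOVA."""
    cells = levels_a * levels_b
    df_a = levels_a - 1                              # main effect: number of levels - 1
    df_b = levels_b - 1
    df_interaction = cells - (df_a + df_b) - 1       # = (levels_a - 1) * (levels_b - 1)
    df_within = n_total - cells                      # sum of (n - 1) over all cells
    return df_a, df_b, df_interaction, df_within

print(factorial_dfs(2, 2, n_total=20))  # → (1, 1, 1, 16)
print(factorial_dfs(3, 2, n_total=24))  # → (2, 1, 2, 18)
```

The second call matches the 3 (TrainingDays) x 2 (Trainer) example with N = 24.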
35. Publication format
Tests of Between-Subjects Effects
Dependent Variable: Performance

Source                   Type III Sum of Squares   df   Mean Square   F         Sig.
Corrected Model           26.507a                   5     5.301        3.685    .018
Intercept                210.855                    1   210.855      146.547    .000
TrainingDays              20.728                    2    10.364        7.203    .005
Trainer                     .002                    1      .002         .001    .974
TrainingDays * Trainer     1.680                    2      .840         .584    .568
Error                     25.899                   18     1.439
Total                    401.250                   24
Corrected Total           52.406                   23
a. R Squared = .506 (Adjusted R Squared = .369)

N=24, 2x3=6 cells => df TrainingDays=2, df within-group variance=24-6=18
=> F(2,18)=7.20, p<.05
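Each F in the factorial table is that effect's mean square divided by the same Error (within-group) mean square. This sketch recomputes them from the Type III sums of squares and df on this slide; `f_ratios` is a hypothetical helper, and tiny differences from SPSS (e.g. for Trainer) come from the rounded table values.

```python
def f_ratios(effects, ss_error, df_error):
    """F for each effect: its mean square divided by the Error mean square."""
    ms_error = ss_error / df_error                   # MS = SS / df
    return {name: (ss / df) / ms_error for name, (ss, df) in effects.items()}

effects = {  # (Type III SS, df) copied from the SPSS table on this slide
    "TrainingDays": (20.728, 2),
    "Trainer": (0.002, 1),
    "TrainingDays * Trainer": (1.680, 2),
}
for name, f in f_ratios(effects, ss_error=25.899, df_error=18).items():
    print(f"{name}: F({effects[name][1]},18) = {f:.3f}")
```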
36. Reporting rule
• IF you have a significant interaction
• THEN
– If 2x2 study: do not report main effects, even if significant
– Else: must look at patterns of means in cells to determine whether to report main effects or not.
42. “Factorial Design”
• Not all cells in your design need to be tested
– But if they are, it is a "full factorial design", and you do a "full factorial ANOVA"
[Figure: 2x2 design grid crossing Real-Time vs. Retrospective with Agent vs. Text; one cell is marked X.]
43. Higher-Order Factorial Designs
• More than two independent variables are included in a higher-order factorial design
– As factors are added, the complexity of the experimental design increases
• The number of possible main effects and interactions increases
• The number of subjects required increases
• The volume of materials and amount of time needed to complete the experiment increases
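The growth in effects and cells can be made concrete with a short sketch (hypothetical helper names and factors). In a full factorial design with k factors there are 2^k − 1 effects to assess (k main effects plus every interaction), and the number of cells, each needing its own group of subjects in a between-subjects design, is the product of the numbers of levels.

```python
from itertools import combinations
from math import prod

def factorial_effects(factors):
    """List every main effect and interaction in a full factorial design."""
    names = list(factors)
    effects = []
    for order in range(1, len(names) + 1):           # 1 = main effects, 2 = two-way, ...
        for combo in combinations(names, order):
            effects.append(" x ".join(combo))
    return effects

def number_of_cells(factors):
    """Cells in a full factorial design: product of the numbers of levels."""
    return prod(factors.values())

design = {"A": 2, "B": 2, "C": 3}                    # hypothetical 2x2x3 design
print(factorial_effects(design))
print(number_of_cells(design))                       # → 12
```

Adding the third factor here raises the effects from 3 (A, B, A x B) to 7 and the cells from 4 to 12.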