SlideShare a Scribd company logo
1 of 96
Quasi-Experimental Workshop
Tom Cook and Will Shadish
Supported by Institute for
Educational Sciences
Introductions: People
Workshop Staff
Your names and affiliations
Logic for selecting you
Introduction: Purposes
• Briefly describe the sorry, but improving, current
state of causal research in education
• Briefly learn why the randomized experiment is
so preferred when it is feasible
• Learn better quasi-experimental designs and
principles for creating novel designs
• Improve your own causal research projects
through discussion with Will and Tom
• Disseminate better practices when go home
• Have fun, eat well, meet interesting new people
Decorum
No dress code
Please interrupt for clarification
Engage instructors in side conversations at
breaks and meals
No titles, only first names
Schedule by Day
• Morning session, lunch period, afternoon
session, rest or talk 4 to 6, dinner
• Lunch to meet others, let off steam, or
discuss your own projects with Tom or Will
• Rest period to catch up on home, meet
Tom or Will, discuss material with others
• Dinners are to have fun and socialize
And at the End of the Workshop
• We will provide a web site and email
addresses for followup
• Provide you with finalized sets of
powerpoint slides (since we do tend to
change them a little at each workshop).
• The Current State of the Causal Art in
Education Research
Education leads the other Social
Sciences in Methods for:
• Meta-Analysis
• Hierarchical Linear Modeling
• Psychometrics
• Analysis of Individual Change
• Next 5 days do not entail an indictment of
education research methods writ large
• Only of its methods for identifying what
works, identifying causal relationships
State of Practice in Causal
Research in Education
• Theory and research suggest best methods for
causal purposes. Yet
• Low prevalence of randomized experiments--
Ehri; Mosteller
• Low prevalence of Regression-discontinuity
• Low prevalence of interrupted time series
• Low prevalence of studies combining control
groups, pretests and sophisticated matching
• High prevalence of weakest designs: pretest-
posttest only; non-equivalent control group
without a pretest; and even with it.
Forces Impelling Change in
Causal Research Practice
• General dissatisfaction with knowledge of “what
works” in education
• IES’ experimental agenda and its control over
many U.S. research funds
• Role of some foundations, esp. W.T. Grant
• Growth of applied micro-economists in
education research in USA and abroad
• Better causal research practice in early
childhood education and school-based
prevention -- why?
RA in American Educational
Research Today
Heavily promoted at IES, NIH and in some
Foundations
• Normative in pre-school research (ASPE)
and in research on behavior problems in
schools (IES and NIJJ)
• Reality in terms of Funding Decisions
• Growing reality in terms of publication
decisions
IES Causal Programs in Bush
Administration
• National Evaluations mandated by Congress or
some Title and done by contract research firms
• Program Announcements for Field-Initiated
Studies mostly done in universities
• Training and Center Grants to universities
• What Works Clearinghouse
• Regional Labs
• SREE - Society for Research in Educational
Effectiveness
• Unusual degree of focus on a single method
Institutionalizing the Agenda
• Depends on more persons able and willing to
assign at random entering into ed research
• Degree to which the opposition is mobilized to
fight random assignment
• Emerging results show few large effects - shall
we shoot the messenger?
• Not yet totally clear how much the old Bush
priority for RA is being pursued, but there has
been clear change towards RA in ed research
Our Main Purpose
• To raise quality of causal research, we will:
• Do one afternoon on randomized experiments,
though some content also applies to quasi-exps
• A day on Regression-discontinuity
• A half-day on short interrupted time-series, with
some material on value-added analyses
• A day on various sample- and individual case-
matching practices, good and bad
• A day on other causal design principles that are
not predicated on matching cases for
comparability
Terminology to Keep on Track
• Experimentation - deliberate intrusion into an
ongoing process to identify effects of that
intrusion - exogenous shock
• Randomized experiments involve assignment to
treatment and comparison groups based on
chance--examples
• Natural experiment denotes some sudden and
non-researcher controlled intrusion into an
ongoing process--examples
Terminology
• Quasi-experiments involve assignment to
treatment not based on chance--examples
• A non-experiment seeks to draw
conclusions about a causal agent that is
not deliberately manipulated nor suddenly
intrudes into an ongoing process -- e.g.,
cutting into an endogenous process
relating attention to learning gains
Today we will
• Discuss what we mean by causation
• Discuss threats to validity, esp. internal
validity
• Analyze the randomized experiment as the
archetypal causal study
• Discuss the limitations to doing
experiments in real school settings
• Discuss ways of circumventing these
limitations
Some Working Conceptions
of Causation
Activity or Manipulability Theory
from Philosophy of Science
• What is it?
• Some examples from daily life and science
• Why it is important for practice and policy
• How it relates to experimentation
• Illustrating its major limitations through
confrontation with other theories of causation
Mackie’s INUS Conditional
Causal agents as Insufficient but Necessary
Parts of Unnecessary but Sufficient conditions
for an effect
• Example of all the “hidden” factors it takes for a
matchstick to cause fire dependably or for class
size decreases to cause learning
• Experimentation is partial because it teaches us
about few causal contingencies
• Full causal knowledge requires knowing the
causal role of multiple contingency variables
• So the conclusion from any one study may be
unstable - causal heterogeneity.
Cronbach’s UTOS Formulation
• Studies require Units, Treatments, Outcomes
(Observations), Settings -- and also Times
• These condition the results of any one causal
claim from an experiment -- some examples
• Implies 2 things: Unit of progress is review not
single study; and the greater value of identifying
general causal mediating processes over ever
more causal contingency variables
• Both causal explanation and probes of causal
robustness require studies whose causal
conclusions we can trust! Hence this workshop.
Another Way of Saying this (1)
• More than study-specific causal descriptions
from A to B, Science values (a) explanatory
causal knowledge of why A affects B and (b)
causal descriptions that robustly replicate across
multiple, heterogeneous studies
• Aspirations of science should also animate
public policy cos each is more helpful when it
applies stable knowledge
• Experimentation is useful because causal
explanation always contains causal descriptions
that are better if stable. Why explain causal
phenomena that are wrong or weakly replicable?
Another Way of Saying this (2)
• Reviews allow us to establish dependability of a
causal connection IF the UTOS sampling frame
is heterogeneous
• Reviews allow us to identify some specific
moderator and mediator variables
• But reviews require at least some individual
causal conclusions we trust. Why review many
studies if they are biased in same direction?
• Hence this workshop. Good knowledge of
descriptive causal connections facilitates both
explanations and reviews that are dependable
and so less dependent on unknown conditions
• Now we turn to the best explicated theory
of descriptive causal practice for the social
sciences:
Rubin’s Causal Model
Rubin’s Counterfactual Model
• At a conceptual level, this is a counterfactual
model of causation.
– An observed treatment given to a person. The
outcome of that treatment is Y(1)
– The counterfactual is the outcome that would have
happened Y(0) if the person had not received the
treatment.
– An effect is the difference between what did
happen and what would have happened:
Effect = Y(1) – Y(0).
• Unfortunately, it is impossible to observe the
counterfactual, so much of experimental
design is about finding a credible source of
counterfactual inference.
Rubin’s Model:
Potential Outcomes
• Rubin often refers to this model as a
“potential outcomes” model.
• Before an experiment starts, each
participant has two potential outcomes,
– Y(1): Their outcome given treatment
– Y(0): Their outcome without treatment
• This can be diagrammed as follows:
Rubin’s Potential Outcomes Model
Units Potential Outcomes Causal Effects
Treatment Control
1 Y1(1) Y1(0) Y1(1) – Y1(0)
i Yi(1) Yii(0) Yi(1) – Yi(0)
N YN(1) YN(0) YN(1) – YN(0)

 



)
1
(
Y )
0
(
Y
Under this
model, we
can get a
causal effect
for each
person.
)
0
(
)
1
( Y
Y 
And we can get an average causal effect as the difference between group means.
Rubin’s Potential Outcomes Model
Units Potential Outcomes Causal Effects
Treatment Control
1 Y1(1) Y1(0) Y1(1) – Y1(0)
i Yi(1) Yii(0) Yi(1) – Yi(0)
N YN(1) YN(0) YN(1) – YN(0)

 



)
1
(
Y )
0
(
Y
Unfortunately, we can only observe one of the two potential outcomes for each unit.
Rubin proposed that we do so randomly, which we accomplish by random assignment:
)
0
(
)
1
( Y
Y 
Rubin’s Potential Outcomes
Model
Units Potential Outcomes Causal Effects
Treatment Control
1 Y1(1)
i Yii(0)
N YN(1)

 



)
0
(
)
1
( Y
Y 
)
0
(
Y
The cost of doing this is that we can no longer estimate individual causal effects. But we can still
estimate Average Causal Effect (ACE) as the difference between the two group means. This
estimate is unbiased because the potential outcomes are missing completely at random.
)
1
(
Y
Rubin’s Model and Quasi-
Experiments
• The aim is to construct a good source of
counterfactual inference given that we
cannot assign randomly, for example
– Well-matched groups
– Persons as their own controls
• Rubin has also created statistical methods
for helping in this task:
– Propensity scores
– Hidden bias analysis
Is Rubin’s Model Universally
Applicable?
• Natural Sciences invoke causation and they
experiment, but they rarely use comparison
groups for matching purposes
• They pattern-match instead, creating either a
• Very specific hypothesis as a point prediction; or
• Very elaborate hypothesis that is then tested via
re-application and removal of treatment under
experimenter control
• We will later use insights from this notion to
construct a non-matching approach to causal
inference in quasi-experiments to complement
matching approaches
Very Brief Exigesis of Validity
• This goes over some well known ground
• But it forces us to be explicit about the
issues on which we prioritize in this
workshop
Validity
• We do (or read about) a quasi-experiment that
gathered (or reported) data
• Then we make all sorts of inferences from the
data
– About whether the treatment worked
– About whether it might work elsewhere
• The question of validity is the question of the
truth of those inferences.
• Campbell’s validity typology is one way to
organize our thinking about inferences.
Campbell’s Validity Typology
• As developed by Campbell (1957), Campbell & Stanley
(1963), Cook & Campbell (1979), with very minor
changes in Shadish, Cook & Campbell (2002)
– Internal Validity
– Statistical Conclusion Validity
– Construct Validity
– External Validity
• Each of the validity types has prototypical threats to
validity—common reasons why we are often wrong
about each of the four inferences.
Internal Validity
• Internal Validity: The validity of inferences about
whether observed covariation between A (the
presumed treatment) and B (the presumed
outcome) reflects a causal relationship from A to
B, as those variables were manipulated or
measured.
• Or more simply—did the treatment affect the
outcome?
• This will be the main priority in this workshop.
Threats to Internal Validity
1. Ambiguous Temporal Precedence
2. Selection
3. History
4. Maturation
5. Regression
6. Attrition
7. Testing
8. Instrumentation
9. Additive and Interactive Effects of Threats to Internal
Validity
Think of these threats as specific kinds of
counterfactuals—things that might have
happened to the participants if they had not
received treatment.
Statistical Conclusion Validity
• Statistical Conclusion Validity: The validity of
inferences about the correlation (covariation)
between treatment and outcome.
• Closely tied to Internal Validity
– SCV asks if the two variables are correlated
– IV asks if that correlation is due to causation
Threats to Statistical Conclusion
Validity
1. Low Statistical Power (very common)
2. Violated Assumptions of Statistical Tests
(especially problems of nesting—students nested
in classes)
3. Fishing and the Error Rate Problem
4. Unreliability of Measures
5. Restriction of Range
6. Unreliability of Treatment Implementation
7. Extraneous Variance in the Experimental Setting
8. Heterogeneity of Units
9. Inaccurate Effect Size Estimation
Construct Validity
• Construct Validity: The validity of inferences
about the higher-order constructs that represent
sampling particulars.
– We do things in experiments
– We talk about the things we did in our reports
– One way to think about construct validity is that it is
about how accurately our talk matches what we
actually did.
External Validity
• External Validity: The validity of inferences about
whether the cause-effect relationship holds over
variation in persons, settings, treatment
variables, and measurement variables.
• Always the “stepchild” in Campbell’s work, Cook
has developed a theory of causal generalization
addressing both construct and external validity.
• But that is another workshop.
Validity Priorities for This
Workshop
Main Focus is Internal Validity
Statistical Conclusion Validity: Because it is so closely tied
to Internal Validity
Relatively little focus
Construct Validity
External Validity
Randomized Experiments
with Individual Students and
with Clusters of Classrooms or Schools
Randomized Control Trials:
Some Selective Issues
1. Logic of random assignment
2. Clarification of Assumptions of RCTs
3. Recent Advances for Dealing with Partial
and not Full Implementation of Treatment
4. Recent Advances in Dealing with Sample
Size Needs when assigning Schools or
Classrooms rather than Students
What is an Experiment?
• The key feature common to all
experiments is to deliberately manipulate a
cause in order to discover its effects
• Note this differentiates experiments from
– Case control studies, which first identify an
effect, and then try to discover causes, a
much harder task
Random Assignment
• Any procedure that assigns units to
conditions based only on chance, where
each unit has a nonzero probability of
being assigned to a condition
– Coin toss
– Dice roll
– Lottery
– More formal methods (more shortly)
What Random Assignment Is
Not
• Random assignment is not random
sampling
– Random sampling is rarely feasible in
experiments
• Random assignment does not require that
every unit have an equal probability of
being assigned to conditions
– You can assign unequal proportions to
conditions
Equating on Expectation
• Randomization equates groups on expectation for all
observed and unobserved variables, not in each
experiment
– In quasi-experiments matching only equates on observed
variables.
• Expectation: the mean of the distribution of all
possible sample means resulting from all possible
random assignments of units to conditions
– In cards, some get good hands and some don’t (luck of the
draw)
– But over time, you get your share of good hands
Estimates are Unbiased and
Consistent
• Estimates of effect from randomized
experiments are unbiased: the expectation
equals the population parameter.
– So the average of many randomized experiments is a
good estimate of the parameter (e.g., Meta-analysis)
• Estimates from randomized experiments are
consistent: as the sample size increases in an
experiment, the sample estimate approaches
the population parameter.
– So large sample sizes are good
• Quasi-experiments have neither of these
characteristics.
Randomized Experiments and The
Logic of Causal Relationships
• Logic of Causal Relationships
– Cause must precede effect
– Cause must covary with effect
– Must rule out alternative causes
• Randomized Experiments Do All This
– They give treatment, then measure effect
– Can easily measure covariation
– Randomization makes most other causes less likely
• Quasi-experiments are problematic on the third criterion.
• But no method matches this logic perfectly (e.g., attrition
in randomized experiments).
Assumptions on which a
Treatment Main Effect depends
• Posttest group means will differ, but they are
causally interpretable only if:
• The assignment is proper, so that pretest and
other covariate means do not differ on
observables on expectation (and in theory on
unobservables)
• There is no differential attrition, and so the
attrition rate and profile of remaining units is
constant across treatment groups
• There is no contamination across groups, which
is relevant for answering questions about
treatment-on-treated but not about intent to treat.
Advantages of Experiments
• Unbiased estimates of effects
• Relatively few, transparent and testable
assumptions
• More statistical power than alternatives
• Long history of implementation in health,
and in some areas of education
• Credibility in science and policy circles
Disadvantages attributed to
Experiments we must discuss
• Not always feasible for reasons of ethics,
politics, logistics and ignorance
• Experience is limited in education, especially
with higher order units like whole schools
• Limited generality of results - voluntarism and
INUS conditionals revisited
• Danger that the method alone will determine
types of causal questions asked and not asked
and crowd out other types of knowledge
• Asks intent-to-treat questions that have limited
yield for theory and program developers
Analyses Taking Implementation
into Account
• An intent-to-treat analysis (ITT)
• An analysis by amount of treatment
actually received (TOT)
• Need to construct studies that give
unbiased inference about each type of
treatment effect
• We have seen how to do ITT. What about
TOT?
Partial Treatment
Implementation
Intent to Treat
• Participants analyzed in condition to which they
were assigned
• Preserves internal validity
• Yields unbiased estimate about effects of being
assigned to treatment, not of receiving treatment
• May be of policy interest
• But should be complemented by other analyses
Analysis by Treatment Received
• Compare outcomes for those who
received treatment to outcomes for those
who did not
• Estimates effects of treatment receipt
• But is quasi-experimental
• Rarely a good option by itself
Instrumental Variables Analysis
• Angrist, Imbens, Rubin JASA 1996
• In economics, an instrument is a variable
or set of variables is correlated with
outcome only through an effect on other
variables (in this case, on treatment)
• Can use the instrument to obtain an
unbiased estimate of effect
Instrument Treatment Outcome
Instrumental Variables Analysis
• Use random assignment as an instrument for
incomplete treatment implementation
• Yields unbiased estimate of the effects of receipt
of treatment
• Random assignment is certainly related to
treatment, but it is unrelated to outcome except
through the treatment.
Random
Assignment
Treatment Outcome
Example: The Effects of Serving in
the Military on Death
• Lottery randomly assigned people to being
eligible for the draft.
– Intent to treat analysis would assess the effects of
being eligible for the draft on death
• This is a good randomized experiment yielding
an unbiased estimate
Lottery:
Eligible for
Draft or Not
Death
Example continued
• But that is not the question of interest
– Not all those eligible for the draft entered the military (not all
those assigned to treatment received it).
• Some who were draft eligible were never drafted
• Some who were not eligible chose to enlist
– We could compare the death rates for those who actually
entered the military with those who did not (we could
compare those who received treatment to those who did
not):
– But this design is quasi-experimental
Entered
Military or Not
Death
Example: Instrumental
Variable Analysis
• Random assignment is an instrument
because it can only affect the outcome
through the treatment.
• That is, being randomly assigned to being
draft eligible only affects death if the
person actually joins the military.
Lottery: Draft
Eligible or not
Entered
Military or Not
Died
Analysis for binary outcome and
binary treatment implementation
• 35.3% draft eligible served in military
• 19.4% not eligible served in military
• The lottery (random assignment) caused 15.9%
to serve in the military (normal randomized
experiment)
• 2.04% draft eligible died
• 1.95% not eligible died
• Draft eligible caused 0.09% to die
• Causal effect of serving in military on death
among those participating in the lottery is
.0009/.159 = .0058 = .56%
Assumptions of IV Strategy
1. One person’s outcomes do not vary
depending on the treatment someone
else is assigned
2. The causal effects of assignment both on
receipt and on outcome can be
estimated using standard intent-to-treat
analyses
3. Assignment to treatment has a nonzero
effect on receipt of treatment
Assumptions, continued
4. Random assignment (the instrumental variable) affects
outcome only through its effects on receipt of treatment
• a potential draftee’s knowledge that he was now eligible for the
draft might cause him to stay in school to gain a deferment,
which might improve mortality rates through education and
income
5. There are no “oppositional” participants who would
always refuse treatment if assigned to it, but take
treatment if not assigned to it
– A person whose family history would have encouraged him to
volunteer for the military in the absence of being drafted but
who objected to the government draft and so refused to serve
in protest
More on Angrist et al.
• Extensions to
– Variable treatment intensity
– Quasi-experiments of all kinds, but regression-
discontinuity in particular
– Continuous outcomes
• An area rapidly developing
• But still limited to analyses of a single mediator
variable. In many substantive applications, there
are many mediators of a treatment’s effects, as
in a causal or structural equation model.
Issues of Nesting and
Clusters, most of which is also
relevant to Quasi-Experiments
Units and Aggregate Units
• Can randomly assign:
– Units (e.g., children, households)
– Aggregates (e.g., classrooms, neighborhoods)
• Why we use aggregates:
– When the aggregate is of intrinsic interest (e.g.,
effects of whole school reform)
– To avoid treatment contamination effects within
aggregates.
– When treatment cannot be restricted to individual
units (e.g., city wide media campaigns)
The Problem with Aggregates
• Most statistical procedures assume (and require) that observations
(errors) be independent of each other.
• When units are nested within aggregates, units are probably not
independent
– If units are analyzed as if they were independent, Type I error
skyrockets
• E.g., an intraclass correlation of .001 can lead to a Type I error rate of a >
.20!
• Further, degrees of freedom for tests of the treatment effect should
now be based on the number of aggregates, not the number of
persons
• This means test of hypotheses about aggregates can be over-
powered if analyzed wrongly and that the correct analysis might
need “many” classrooms or schools, which is expensive
What Creates Dependence?
• Aggregates create dependence by
– Participants interacting with each other
– Exposure to common influences (e.g,.
Patients nested within physician practices)
• Both these problems are greater the
longer the group members have been
interacting with each other.
Making an Unnecessary
Independence Problem
• Individual treatment provided in groups for
convenience alone creates dependence the
more groups members interact and are exposed
to same influences.
• For instance, Empirically Supported Treatments
or Type I errors?
– About of a third of ESTs provide treatment in groups
– When properly reanalyzed, very few results were still
significant.
Some Myths about Nesting
• Myth: Random assignment to aggregates solves the
problem.
– This does not stop interacting or common influences
• Myth: All is OK if the unit of assignment is the same as
the unit of analysis.
– That is irrelevant if there is nesting.
• Myth: You can test if the ICC = 0, and if so, ignore
aggregates.
– That test is a low power test
• Myth: No problem if randomly assign students to two
groups within one classroom.
– Students are still interacting and exposed to same influences
The Worst Possible Case
• Random assignment of one aggregate
(e.g., a class) per condition
– The problem is that class and condition are
completely confounded, leaving no degrees of
freedom with which to estimate the effect of
the class.
– This is true even if you randomly assign
students to classes first.
What to Do?
• Avoid using one aggregate per condition
• Design to ensure sufficient power--more to come later
– have more aggregates with fewer units per aggregate
– randomly assign from strata
– use covariates or repeated measure
• Analyze correctly
– On aggregate means (but low power, and loses individual data)
– Using multilevel modeling (preferred)
• Increase degrees of freedom for the error term by
borrowing information about ICCs from past studies
An Example: The Empirically
Supported Treatments (EST) list.
• EST’s touted as methodologically strong
– But problem not limited to ESTs
• Includes 33 studies of group-administered
treatment
– Group therapies
– Individual therapies administered in group settings for
convenience
• None took nesting into account in analysis
• We estimated what proper analysis would have
yielded, using various assumptions about ICC.
– Adjust significance tests based on ICCs
– Adjust df based on number of groups not individuals
Table 1
Equations for adjusting Effects
Estimators.
t-test
ICC
m
t
t
o
unadj
adj
)
1
(
1 


F-test for ANOVA Fadj 
Funadj
1 (mo 1)ICC
Chi-Square
ICC
mo
unadj
adj
)
1
(
1
2
2





Note. Adapted from Rooney (1992). m=number of members per group, ICC=Intraclass Correlation.
Results
• After the corrections, only 12.4% to 68.2% of tests that
were originally reported as significant remained
significant
• When we considered all original tests, not just those that
were significant, 7.3% to 40.2% of tests remained
significant after correction
• The problem is even worse, because most of the studies
tested multiple outcome variables without correcting for
alpha inflation
• Of the 33 studies, 6-19 studies no longer had any
significant results after correction, depending on
assumptions
332
332
N =
Corrected
Reported
Degrees
of
Freedom
200
100
0
332
332
332
332
N =
Corrected F ICC=.30
Corrected F ICC=.15
Corrected F ICC=.05
Reported F
F
80
60
40
20
0
119
119
N =
Corrected
Reported
Degrees
of
Freedom
200
100
0
119
119
119
119
N =
Corrected F ICC=.30
Corrected F ICC=.15
Corrected F ICC=.05
Reported F
F
50
40
30
20
10
0
For all N = 332 t- and F-tests
For N = 119 omnibus t- and F-tests with exact information
Other Issues at the Cluster Level
• Sample Size and Power
• Contamination
• Getting Agreement to Participate
Estimating the Needed Sample
Size
• We are not dealing here with statistical power in
general, only at school level
• Question is: How many schools are needed for
ES of .20, with p < .05, power .80, assuming a
balanced design and > 50 students per school.
• Why .20? Why .05, why .80. Why balanced?
What role does the N of students play?
Key Considerations
We estimate the cluster effect via the
unconditional ICC, that part of the total
variation that is between schools
• But sample size needs are driven by the
conditional ICC, the difference between schools
after covariates are used to “explain” some of
the between-school variation
• We want to use 2 examples, one local and one
national, to illustrate how careful use of school-
level covariates can reduce the N of schools
needed
Example 1: Kentucky
• An achievement study
• A school-level question
• A limited budget
• One year of prior achievement data at
both the school and student levels
• Given these data, and traditional power
assumptions, how many schools needed
to detect an effect of .20?
• We use J for schools and N for students
Kentucky Cluster Table
Within-
School
Variance
s2
Between-
School
Variance
t2
Total
Unexplained
Variance
t2s2
Intra-Class
Correlation
(ICC)
t2/(t2s2)
1209 146 1355 0.11
Table 1: Estimates from Unconditional Model
Table 2: Required J for the
Unconditional Model
Unconditional Effect Size Required J
0.20 94
0.25 61
0.30 43
What is the School Level
Covariate like?
For reading, the obtained covariate-outcome
r is .85--the usual range in other studies is
.70 to .95
As corrected in HLM this value is .92
What happens when this pretest school-
level covariate is used in the model?
Table 3: Estimates from
Conditional Model (CTBS as
Level-2 Covariate)
Within
School
Variance
s2
Between
School
Variance
T2
Total
Unexplained
Variance
t2s2
Intra-Class
Correlati
on (ICC)
t2/(t2s2)
1210 21.6 1231.6 0.0175
What has happened?
• The total unexplained variation has shrunk
from 1355 to 1232--why?
• The total between-school variation has
shrunk from 146 to 26--why?
• So how many school are now needed for
the same power?
Table 4: Required J for Two Level
Unconditional and Conditional
Models
Effect
Size
Required J
No Covariate
Required J
With Covariate
0.20 94 22
0.25 61 15
0.30 43 12
How do these Values Compare?
• The work of Hedges and Hallberg with
nationally representative data where m is
his term for sample size at the school level
(not J)
National Estimates from Hedges
Grade Covariates m=10 m=15 m=20 m=25 m=30
1 None 0.67 0.54 0.46 0.41 0.37
Pretest 0.32 0.25 0.22 0.19 0.18
5 None 0.70 0.56 0.48 0.43 0.39
pretest 0.30 0.24 0.21 0.19 0.17
12 None 0.58 0.46 0.40 0.36 0.32
pretest 0.21 0.17 0.15 0.13 0.12
Conclusions about needed
Sample Sizes
• Will vary by type of outcome, local setting and
quality of the covariate structure
• With achievement outcomes, about 20 schools
will often do, 10 per group in a two-group study
• But to protect against attrition, some more might
be added
• Further gains accrue from several prior years of
school-level achievement data, not difficult to get
• Since intervention groups can cost more, an
unbalanced design with more control units will
also help, though gain depends on harmonic n
Contamination Issues with
Cluster-level Assignment
• To reduce contamination one can move to a
higher level of analysis: from student to
classroom to grade level to school to district
• Need to assess/monitor type and level of
contamination--PGC Comer as an example
• How to analyze it: Instrumental Variables for
dichotomously distributed contamination
• More problematic with more complex forms of
contamination
Cluster Level Random
Assignment- Getting Agreement
• High rate of RA in preschool studies of
achievement and in school-based studies of
prevention, but not in school-based studies of
achievement. Why? Culture or Structure?
• Cook’s war stories - PGC; Chicago; Detroit
• Grant Fdn. Resources
• Experiences at Mathematica
• District-level brokers
• IES experience generally positive that RA can be
often achieved (and maintained). But difficult
Summary re RCTs
• Best in theory for certain kind of cause
• Has its own assumptions that need to be tested
• Importance of a marriage of statistical theory
and an ad hoc “theory” of implementation, as
with survey research
• RCTs not usable in all ed research practice
• Limited capacity to explore causal contingencies
• Results from single studies probabilistic rather
than deterministic
• Philosophers of science might say: First rate
method for second rate theory of cause
Summary 2
• Lower level at which assign the better; Higher
order designs can be expensive
• Covariates help reduce sample size needs:
Crucial role of pretest
• Value of description of implementation based on
program theory and quality measurement
• Black box RCTs not a good idea, but ironic that
traditional methods and standards cannot yet
support complex causal explanations of why A
and B are related - only a single mediator but
many more moderators
Remember, though…
• Causal descriptions of an A causes B form are
“the cement of the universe” because:
• Each causal explanation of why A causes B
requires that A causes B
• Every causal model assumes the validity of each
causal link it contains. These links are tested
outside of the model.
• Every review to identify stable causal
knowledge assumes the validity of the causal
knowledge being reviewed.
• So testing A-B links is real important even if it is
rarely the end-goal of a generalizing science.

More Related Content

Similar to day1.week22010.ppt

.Case study and enantioselectivity in drug ADME.pptx
.Case study and enantioselectivity in drug ADME.pptx.Case study and enantioselectivity in drug ADME.pptx
.Case study and enantioselectivity in drug ADME.pptxPurushothamKN1
 
Research and types L1.pptx
Research and types L1.pptxResearch and types L1.pptx
Research and types L1.pptxKshitizBhargava
 
TYPES OF RESEARCHES AND ITS IMPORTANCE IN PHYSIOTHERAPY
TYPES OF RESEARCHESAND ITS IMPORTANCE IN PHYSIOTHERAPYTYPES OF RESEARCHESAND ITS IMPORTANCE IN PHYSIOTHERAPY
TYPES OF RESEARCHES AND ITS IMPORTANCE IN PHYSIOTHERAPYQURATULAIN MUGHAL
 
NMRN research notes.pptx
NMRN research notes.pptxNMRN research notes.pptx
NMRN research notes.pptxJaniceScates
 
What is an experimental research (1)
What is an experimental research (1)What is an experimental research (1)
What is an experimental research (1)M Usama Sehgal
 
Introduction to research
Introduction to researchIntroduction to research
Introduction to researchANCYBS
 
nursing research Chapter 5 for pbn 3rd year pu PU nepal
 nursing research Chapter 5 for pbn 3rd year pu PU nepal nursing research Chapter 5 for pbn 3rd year pu PU nepal
nursing research Chapter 5 for pbn 3rd year pu PU nepalpurnamepurna
 
Research methodology khushbu
Research methodology khushbuResearch methodology khushbu
Research methodology khushbukhushbu mishra
 
Periodontal research khushbu
Periodontal research khushbuPeriodontal research khushbu
Periodontal research khushbukhushbu mishra
 
research problem.pptx
research  problem.pptxresearch  problem.pptx
research problem.pptxNursing
 
Business Research Method - Unit II, AKTU, Lucknow Syllabus
Business Research Method - Unit II, AKTU, Lucknow SyllabusBusiness Research Method - Unit II, AKTU, Lucknow Syllabus
Business Research Method - Unit II, AKTU, Lucknow SyllabusKartikeya Singh
 
2. The scientific method - Chap 2.pptx
2. The scientific method - Chap 2.pptx2. The scientific method - Chap 2.pptx
2. The scientific method - Chap 2.pptxSaleemAbbas42
 
TSL3133 Topic 2 Types of Educatioanal Research
TSL3133 Topic 2 Types of Educatioanal ResearchTSL3133 Topic 2 Types of Educatioanal Research
TSL3133 Topic 2 Types of Educatioanal ResearchYee Bee Choo
 
RSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionRSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionWesam Abuznadah
 

Similar to day1.week22010.ppt (20)

Lesson 2-quanti (1)
Lesson 2-quanti (1)Lesson 2-quanti (1)
Lesson 2-quanti (1)
 
Hnc research methodology
Hnc research methodologyHnc research methodology
Hnc research methodology
 
.Case study and enantioselectivity in drug ADME.pptx
.Case study and enantioselectivity in drug ADME.pptx.Case study and enantioselectivity in drug ADME.pptx
.Case study and enantioselectivity in drug ADME.pptx
 
Research and types L1.pptx
Research and types L1.pptxResearch and types L1.pptx
Research and types L1.pptx
 
Amsale Read.ppt
Amsale Read.pptAmsale Read.ppt
Amsale Read.ppt
 
TYPES OF RESEARCHES AND ITS IMPORTANCE IN PHYSIOTHERAPY
TYPES OF RESEARCHESAND ITS IMPORTANCE IN PHYSIOTHERAPYTYPES OF RESEARCHESAND ITS IMPORTANCE IN PHYSIOTHERAPY
TYPES OF RESEARCHES AND ITS IMPORTANCE IN PHYSIOTHERAPY
 
NMRN research notes.pptx
NMRN research notes.pptxNMRN research notes.pptx
NMRN research notes.pptx
 
What is research
What is researchWhat is research
What is research
 
What is an experimental research (1)
What is an experimental research (1)What is an experimental research (1)
What is an experimental research (1)
 
Introduction to research
Introduction to researchIntroduction to research
Introduction to research
 
Module5_Quantitative_.pptx
Module5_Quantitative_.pptxModule5_Quantitative_.pptx
Module5_Quantitative_.pptx
 
nursing research Chapter 5 for pbn 3rd year pu PU nepal
 nursing research Chapter 5 for pbn 3rd year pu PU nepal nursing research Chapter 5 for pbn 3rd year pu PU nepal
nursing research Chapter 5 for pbn 3rd year pu PU nepal
 
Research methodology khushbu
Research methodology khushbuResearch methodology khushbu
Research methodology khushbu
 
Periodontal research khushbu
Periodontal research khushbuPeriodontal research khushbu
Periodontal research khushbu
 
Research Problem.pptx
Research Problem.pptxResearch Problem.pptx
Research Problem.pptx
 
research problem.pptx
research  problem.pptxresearch  problem.pptx
research problem.pptx
 
Business Research Method - Unit II, AKTU, Lucknow Syllabus
Business Research Method - Unit II, AKTU, Lucknow SyllabusBusiness Research Method - Unit II, AKTU, Lucknow Syllabus
Business Research Method - Unit II, AKTU, Lucknow Syllabus
 
2. The scientific method - Chap 2.pptx
2. The scientific method - Chap 2.pptx2. The scientific method - Chap 2.pptx
2. The scientific method - Chap 2.pptx
 
TSL3133 Topic 2 Types of Educatioanal Research
TSL3133 Topic 2 Types of Educatioanal ResearchTSL3133 Topic 2 Types of Educatioanal Research
TSL3133 Topic 2 Types of Educatioanal Research
 
RSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and QuestionRSS 2012 Developing Research Idea and Question
RSS 2012 Developing Research Idea and Question
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 

day1.week22010.ppt

  • 1. Quasi-Experimental Workshop Tom Cook and Will Shadish Supported by Institute for Educational Sciences
  • 2. Introductions: People Workshop Staff Your names and affiliations Logic for selecting you
  • 3. Introduction: Purposes • Briefly describe the sorry, but improving, current state of causal research in education • Briefly learn why the randomized experiment is so preferred when it is feasible • Learn better quasi-experimental designs and principles for creating novel designs • Improve your own causal research projects through discussion with Will and Tom • Disseminate better practices when go home • Have fun, eat well, meet interesting new people
  • 4. Decorum No dress code Please interrupt for clarification Engage instructors in side conversations at breaks and meals No titles, only first names
  • 5. Schedule by Day • Morning session, lunch period, afternoon session, rest or talk 4 to 6, dinner • Lunch to meet others, let off steam, or discuss your own projects with Tom or Will • Rest period to catch up on home, meet Tom or Will, discuss material with others • Dinners are to have fun and socialize
  • 6. And at the End of the Workshop • We will provide a web site and email addresses for followup • Provide you with finalized sets of powerpoint slides (since we do tend to change them a little at each workshop).
  • 7. • The Current State of the Causal Art in Education Research
  • 8. Education leads the other Social Sciences in Methods for: • Meta-Analysis • Hierarchical Linear Modeling • Psychometrics • Analysis of Individual Change • Next 5 days do not entail an indictment of education research methods writ large • Only of its methods for identifying what works, identifying causal relationships
  • 9. State of Practice in Causal Research in Education • Theory and research suggest best methods for causal purposes. Yet • Low prevalence of randomized experiments-- Ehri; Mosteller • Low prevalence of Regression-discontinuity • Low prevalence of interrupted time series • Low prevalence of studies combining control groups, pretests and sophisticated matching • High prevalence of weakest designs: pretest- posttest only; non-equivalent control group without a pretest; and even with it.
  • 10. Forces Impelling Change in Causal Research Practice • General dissatisfaction with knowledge of “what works” in education • IES’ experimental agenda and its control over many U.S. research funds • Role of some foundations, esp. W.T. Grant • Growth of applied micro-economists in education research in USA and abroad • Better causal research practice in early childhood education and school-based prevention -- why?
  • 11. RA in American Educational Research Today Heavily promoted at IES, NIH and in some Foundations • Normative in pre-school research (ASPE) and in research on behavior problems in schools (IES and NIJJ) • Reality in terms of Funding Decisions • Growing reality in terms of publication decisions
  • 12. IES Causal Programs in Bush Administration • National Evaluations mandated by Congress or some Title and done by contract research firms • Program Announcements for Field-Initiated Studies mostly done in universities • Training and Center Grants to universities • What Works Clearinghouse • Regional Labs • SREE - Society for Research in Educational Effectiveness • Unusual degree of focus on a single method
  • 13. Institutionalizing the Agenda • Depends on more persons able and willing to assign at random entering into ed research • Degree to which the opposition is mobilized to fight random assignment • Emerging results show few large effects - shall we shoot the messenger? • Not yet totally clear how much the old Bush priority for RA is being pursued, but there has been clear change towards RA in ed research
  • 14. Our Main Purpose • To raise quality of causal research, we will: • Do one afternoon on randomized experiments, though some content also applies to quasi-exps • A day on Regression-discontinuity • A half-day on short interrupted time-series, with some material on value-added analyses • A day on various sample- and individual case- matching practices, good and bad • A day on other causal design principles that are not predicated on matching cases for comparability
  • 15. Terminology to Keep on Track • Experimentation - deliberate intrusion into an ongoing process to identify effects of that intrusion - exogenous shock • Randomized experiments involve assignment to treatment and comparison groups based on chance--examples • Natural experiment denotes some sudden and non-researcher controlled intrusion into an ongoing process--examples
  • 16. Terminology • Quasi-experiments involve assignment to treatment not based on chance--examples • A non-experiment seeks to draw conclusions about a causal agent that is not deliberately manipulated nor suddenly intrudes into an ongoing process -- e.g., cutting into an endogenous process relating attention to learning gains
  • 17. Today we will • Discuss what we mean by causation • Discuss threats to validity, esp. internal validity • Analyze the randomized experiment as the archetypal causal study • Discuss the limitations to doing experiments in real school settings • Discuss ways of circumventing these limitations
  • 19. Activity or Manipulability Theory from Philosophy of Science • What is it? • Some examples from daily life and science • Why it is important for practice and policy • How it relates to experimentation • Illustrating its major limitations through confrontation with other theories of causation
  • 20. Mackie’s INUS Conditional Causal agents as Insufficient but Necessary Parts of Unnecessary but Sufficient conditions for an effect • Example of all the “hidden” factors it takes for a matchstick to cause fire dependably or for class size decreases to cause learning • Experimentation is partial because it teaches us about few causal contingencies • Full causal knowledge requires knowing the causal role of multiple contingency variables • So the conclusion from any one study may be unstable - causal heterogeneity.
  • 21. Cronbach’s UTOS Formulation • Studies require Units, Treatments, Outcomes (Observations), Settings -- and also Times • These condition the results of any one causal claim from an experiment -- some examples • Implies 2 things: Unit of progress is review not single study; and the greater value of identifying general causal mediating processes over ever more causal contingency variables • Both causal explanation and probes of causal robustness require studies whose causal conclusions we can trust! Hence this workshop.
  • 22. Another Way of Saying this (1) • More than study-specific causal descriptions from A to B, Science values (a) explanatory causal knowledge of why A affects B and (b) causal descriptions that robustly replicate across multiple, heterogeneous studies • Aspirations of science should also animate public policy cos each is more helpful when it applies stable knowledge • Experimentation is useful because causal explanation always contains causal descriptions that are better if stable. Why explain causal phenomena that are wrong or weakly replicable?
  • 23. Another Way of Saying this (2) • Reviews allow us to establish dependability of a causal connection IF the UTOS sampling frame is heterogeneous • Reviews allow us to identify some specific moderator and mediator variables • But reviews require at least some individual causal conclusions we trust. Why review many studies if they are biased in same direction? • Hence this workshop. Good knowledge of descriptive causal connections facilitates both explanations and reviews that are dependable and so less dependent on unknown conditions
  • 24. • Now we turn to the best explicated theory of descriptive causal practice for the social sciences: Rubin’s Causal Model
  • 25.
  • 26. Rubin’s Counterfactual Model • At a conceptual level, this is a counterfactual model of causation. – An observed treatment given to a person. The outcome of that treatment is Y(1) – The counterfactual is the outcome that would have happened Y(0) if the person had not received the treatment. – An effect is the difference between what did happen and what would have happened: Effect = Y(1) – Y(0). • Unfortunately, it is impossible to observe the counterfactual, so much of experimental design is about finding a credible source of counterfactual inference.
  • 27. Rubin’s Model: Potential Outcomes • Rubin often refers to this model as a “potential outcomes” model. • Before an experiment starts, each participant has two potential outcomes, – Y(1): Their outcome given treatment – Y(0): Their outcome without treatment • This can be diagrammed as follows:
  • 28. Rubin’s Potential Outcomes Model Units Potential Outcomes Causal Effects Treatment Control 1 Y1(1) Y1(0) Y1(1) – Y1(0) i Yi(1) Yii(0) Yi(1) – Yi(0) N YN(1) YN(0) YN(1) – YN(0)       ) 1 ( Y ) 0 ( Y Under this model, we can get a causal effect for each person. ) 0 ( ) 1 ( Y Y  And we can get an average causal effect as the difference between group means.
  • 29. Rubin’s Potential Outcomes Model Units Potential Outcomes Causal Effects Treatment Control 1 Y1(1) Y1(0) Y1(1) – Y1(0) i Yi(1) Yii(0) Yi(1) – Yi(0) N YN(1) YN(0) YN(1) – YN(0)       ) 1 ( Y ) 0 ( Y Unfortunately, we can only observe one of the two potential outcomes for each unit. Rubin proposed that we do so randomly, which we accomplish by random assignment: ) 0 ( ) 1 ( Y Y 
  • 30. Rubin’s Potential Outcomes Model Units Potential Outcomes Causal Effects Treatment Control 1 Y1(1) i Yii(0) N YN(1)       ) 0 ( ) 1 ( Y Y  ) 0 ( Y The cost of doing this is that we can no longer estimate individual causal effects. But we can still estimate Average Causal Effect (ACE) as the difference between the two group means. This estimate is unbiased because the potential outcomes are missing completely at random. ) 1 ( Y
  • 31. Rubin’s Model and Quasi- Experiments • The aim is to construct a good source of counterfactual inference given that we cannot assign randomly, for example – Well-matched groups – Persons as their own controls • Rubin has also created statistical methods for helping in this task: – Propensity scores – Hidden bias analysis
  • 32. Is Rubin’s Model Universally Applicable? • Natural Sciences invoke causation and they experiment, but they rarely use comparison groups for matching purposes • They pattern-match instead, creating either a • Very specific hypothesis as a point prediction; or • Very elaborate hypothesis that is then tested via re-application and removal of treatment under experimenter control • We will later use insights from this notion to construct a non-matching approach to causal inference in quasi-experiments to complement matching approaches
  • 33. Very Brief Exigesis of Validity • This goes over some well known ground • But it forces us to be explicit about the issues on which we prioritize in this workshop
  • 34. Validity • We do (or read about) a quasi-experiment that gathered (or reported) data • Then we make all sorts of inferences from the data – About whether the treatment worked – About whether it might work elsewhere • The question of validity is the question of the truth of those inferences. • Campbell’s validity typology is one way to organize our thinking about inferences.
  • 35. Campbell’s Validity Typology • As developed by Campbell (1957), Campbell & Stanley (1963), Cook & Campbell (1979), with very minor changes in Shadish, Cook & Campbell (2002) – Internal Validity – Statistical Conclusion Validity – Construct Validity – External Validity • Each of the validity types has prototypical threats to validity—common reasons why we are often wrong about each of the four inferences.
  • 36. Internal Validity • Internal Validity: The validity of inferences about whether observed covariation between A (the presumed treatment) and B (the presumed outcome) reflects a causal relationship from A to B, as those variables were manipulated or measured. • Or more simply—did the treatment affect the outcome? • This will be the main priority in this workshop.
  • 37. Threats to Internal Validity 1. Ambiguous Temporal Precedence 2. Selection 3. History 4. Maturation 5. Regression 6. Attrition 7. Testing 8. Instrumentation 9. Additive and Interactive Effects of Threats to Internal Validity Think of these threats as specific kinds of counterfactuals—things that might have happened to the participants if they had not received treatment.
  • 38. Statistical Conclusion Validity • Statistical Conclusion Validity: The validity of inferences about the correlation (covariation) between treatment and outcome. • Closely tied to Internal Validity – SCV asks if the two variables are correlated – IV asks if that correlation is due to causation
  • 39. Threats to Statistical Conclusion Validity 1. Low Statistical Power (very common) 2. Violated Assumptions of Statistical Tests (especially problems of nesting—students nested in classes) 3. Fishing and the Error Rate Problem 4. Unreliability of Measures 5. Restriction of Range 6. Unreliability of Treatment Implementation 7. Extraneous Variance in the Experimental Setting 8. Heterogeneity of Units 9. Inaccurate Effect Size Estimation
  • 40. Construct Validity • Construct Validity: The validity of inferences about the higher-order constructs that represent sampling particulars. – We do things in experiments – We talk about the things we did in our reports – One way to think about construct validity is that it is about how accurately our talk matches what we actually did.
  • 41. External Validity • External Validity: The validity of inferences about whether the cause-effect relationship holds over variation in persons, settings, treatment variables, and measurement variables. • Always the “stepchild” in Campbell’s work, Cook has developed a theory of causal generalization addressing both construct and external validity. • But that is another workshop.
  • 42. Validity Priorities for This Workshop Main Focus is Internal Validity Statistical Conclusion Validity: Because it is so closely tied to Internal Validity Relatively little focus Construct Validity External Validity
  • 43. Randomized Experiments with Individual Students and with Clusters of Classrooms or Schools
  • 44. Randomized Control Trials: Some Selective Issues 1. Logic of random assignment 2. Clarification of Assumptions of RCTs 3. Recent Advances for Dealing with Partial and not Full Implementation of Treatment 4. Recent Advances in Dealing with Sample Size Needs when assigning Schools or Classrooms rather than Students
  • 45. What is an Experiment? • The key feature common to all experiments is to deliberately manipulate a cause in order to discover its effects • Note this differentiates experiments from – Case control studies, which first identify an effect, and then try to discover causes, a much harder task
  • 46. Random Assignment • Any procedure that assigns units to conditions based only on chance, where each unit has a nonzero probability of being assigned to a condition – Coin toss – Dice roll – Lottery – More formal methods (more shortly)
  • 47. What Random Assignment Is Not • Random assignment is not random sampling – Random sampling is rarely feasible in experiments • Random assignment does not require that every unit have an equal probability of being assigned to conditions – You can assign unequal proportions to conditions
  • 48. Equating on Expectation • Randomization equates groups on expectation for all observed and unobserved variables, not in each experiment – In quasi-experiments matching only equates on observed variables. • Expectation: the mean of the distribution of all possible sample means resulting from all possible random assignments of units to conditions – In cards, some get good hands and some don’t (luck of the draw) – But over time, you get your share of good hands
  • 49. Estimates are Unbiased and Consistent • Estimates of effect from randomized experiments are unbiased: the expectation equals the population parameter. – So the average of many randomized experiments is a good estimate of the parameter (e.g., Meta-analysis) • Estimates from randomized experiments are consistent: as the sample size increases in an experiment, the sample estimate approaches the population parameter. – So large sample sizes are good • Quasi-experiments have neither of these characteristics.
  • 50. Randomized Experiments and The Logic of Causal Relationships • Logic of Causal Relationships – Cause must precede effect – Cause must covary with effect – Must rule out alternative causes • Randomized Experiments Do All This – They give treatment, then measure effect – Can easily measure covariation – Randomization makes most other causes less likely • Quasi-experiments are problematic on the third criterion. • But no method matches this logic perfectly (e.g., attrition in randomized experiments).
  • 51. Assumptions on which a Treatment Main Effect depends • Posttest group means will differ, but they are causally interpretable only if: • The assignment is proper, so that pretest and other covariate means do not differ on observables on expectation (and in theory on unobservables) • There is no differential attrition, and so the attrition rate and profile of remaining units is constant across treatment groups • There is no contamination across groups, which is relevant for answering questions about treatment-on-treated but not about intent to treat.
  • 52. Advantages of Experiments • Unbiased estimates of effects • Relatively few, transparent and testable assumptions • More statistical power than alternatives • Long history of implementation in health, and in some areas of education • Credibility in science and policy circles
  • 53. Disadvantages attributed to Experiments we must discuss • Not always feasible for reasons of ethics, politics, logistics and ignorance • Experience is limited in education, especially with higher order units like whole schools • Limited generality of results - voluntarism and INUS conditionals revisited • Danger that the method alone will determine types of causal questions asked and not asked and crowd out other types of knowledge • Asks intent-to-treat questions that have limited yield for theory and program developers
  • 54. Analyses Taking Implementation into Account • An intent-to-treat analysis (ITT) • An analysis by amount of treatment actually received (TOT) • Need to construct studies that give unbiased inference about each type of treatment effect • We have seen how to do ITT. What about TOT?
  • 56. Intent to Treat • Participants analyzed in condition to which they were assigned • Preserves internal validity • Yields unbiased estimate about effects of being assigned to treatment, not of receiving treatment • May be of policy interest • But should be complemented by other analyses
  • 57. Analysis by Treatment Received • Compare outcomes for those who received treatment to outcomes for those who did not • Estimates effects of treatment receipt • But is quasi-experimental • Rarely a good option by itself
  • 58. Instrumental Variables Analysis • Angrist, Imbens, Rubin JASA 1996 • In economics, an instrument is a variable or set of variables is correlated with outcome only through an effect on other variables (in this case, on treatment) • Can use the instrument to obtain an unbiased estimate of effect Instrument Treatment Outcome
  • 59. Instrumental Variables Analysis • Use random assignment as an instrument for incomplete treatment implementation • Yields unbiased estimate of the effects of receipt of treatment • Random assignment is certainly related to treatment, but it is unrelated to outcome except through the treatment. Random Assignment Treatment Outcome
  • 60. Example: The Effects of Serving in the Military on Death • Lottery randomly assigned people to being eligible for the draft. – Intent to treat analysis would assess the effects of being eligible for the draft on death • This is a good randomized experiment yielding an unbiased estimate Lottery: Eligible for Draft or Not Death
  • 61. Example continued • But that is not the question of interest – Not all those eligible for the draft entered the military (not all those assigned to treatment received it). • Some who were draft eligible were never drafted • Some who were not eligible chose to enlist – We could compare the death rates for those who actually entered the military with those who did not (we could compare those who received treatment to those who did not): – But this design is quasi-experimental Entered Military or Not Death
  • 62. Example: Instrumental Variable Analysis • Random assignment is an instrument because it can only affect the outcome through the treatment. • That is, being randomly assigned to being draft eligible only affects death if the person actually joins the military. Lottery: Draft Eligible or not Entered Military or Not Died
  • 63. Analysis for binary outcome and binary treatment implementation • 35.3% draft eligible served in military • 19.4% not eligible served in military • The lottery (random assignment) caused 15.9% to serve in the military (normal randomized experiment) • 2.04% draft eligible died • 1.95% not eligible died • Draft eligible caused 0.09% to die • Causal effect of serving in military on death among those participating in the lottery is .0009/.159 = .0058 = .56%
  • 64. Assumptions of IV Strategy 1. One person’s outcomes do not vary depending on the treatment someone else is assigned 2. The causal effects of assignment both on receipt and on outcome can be estimated using standard intent-to-treat analyses 3. Assignment to treatment has a nonzero effect on receipt of treatment
  • 65. Assumptions, continued 4. Random assignment (the instrumental variable) affects outcome only through its effects on receipt of treatment • a potential draftee’s knowledge that he was now eligible for the draft might cause him to stay in school to gain a deferment, which might improve mortality rates through education and income 5. There are no “oppositional” participants who would always refuse treatment if assigned to it, but take treatment if not assigned to it – A person whose family history would have encouraged him to volunteer for the military in the absence of being drafted but who objected to the government draft and so refused to serve in protest
  • 66. More on Angrist et al. • Extensions to – Variable treatment intensity – Quasi-experiments of all kinds, but regression- discontinuity in particular – Continuous outcomes • An area rapidly developing • But still limited to analyses of a single mediator variable. In many substantive applications, there are many mediators of a treatment’s effects, as in a causal or structural equation model.
  • 67. Issues of Nesting and Clusters, most of which is also relevant to Quasi-Experiments
  • 68. Units and Aggregate Units • Can randomly assign: – Units (e.g., children, households) – Aggregates (e.g., classrooms, neighborhoods) • Why we use aggregates: – When the aggregate is of intrinsic interest (e.g., effects of whole school reform) – To avoid treatment contamination effects within aggregates. – When treatment cannot be restricted to individual units (e.g., city wide media campaigns)
  • 69. The Problem with Aggregates • Most statistical procedures assume (and require) that observations (errors) be independent of each other. • When units are nested within aggregates, units are probably not independent – If units are analyzed as if they were independent, Type I error skyrockets • E.g., an intraclass correlation of .001 can lead to a Type I error rate of a > .20! • Further, degrees of freedom for tests of the treatment effect should now be based on the number of aggregates, not the number of persons • This means test of hypotheses about aggregates can be over- powered if analyzed wrongly and that the correct analysis might need “many” classrooms or schools, which is expensive
  • 70. What Creates Dependence? • Aggregates create dependence by – Participants interacting with each other – Exposure to common influences (e.g,. Patients nested within physician practices) • Both these problems are greater the longer the group members have been interacting with each other.
  • 71. Making an Unnecessary Independence Problem • Individual treatment provided in groups for convenience alone creates dependence the more groups members interact and are exposed to same influences. • For instance, Empirically Supported Treatments or Type I errors? – About of a third of ESTs provide treatment in groups – When properly reanalyzed, very few results were still significant.
  • 72. Some Myths about Nesting • Myth: Random assignment to aggregates solves the problem. – This does not stop interacting or common influences • Myth: All is OK if the unit of assignment is the same as the unit of analysis. – That is irrelevant if there is nesting. • Myth: You can test if the ICC = 0, and if so, ignore aggregates. – That test is a low power test • Myth: No problem if randomly assign students to two groups within one classroom. – Students are still interacting and exposed to same influences
  • 73. The Worst Possible Case • Random assignment of one aggregate (e.g., a class) per condition – The problem is that class and condition are completely confounded, leaving no degrees of freedom with which to estimate the effect of the class. – This is true even if you randomly assign students to classes first.
  • 74. What to Do? • Avoid using one aggregate per condition • Design to ensure sufficient power--more to come later – have more aggregates with fewer units per aggregate – randomly assign from strata – use covariates or repeated measure • Analyze correctly – On aggregate means (but low power, and loses individual data) – Using multilevel modeling (preferred) • Increase degrees of freedom for the error term by borrowing information about ICCs from past studies
  • 75. An Example: The Empirically Supported Treatments (EST) list. • EST’s touted as methodologically strong – But problem not limited to ESTs • Includes 33 studies of group-administered treatment – Group therapies – Individual therapies administered in group settings for convenience • None took nesting into account in analysis • We estimated what proper analysis would have yielded, using various assumptions about ICC. – Adjust significance tests based on ICCs – Adjust df based on number of groups not individuals
  • 76. Table 1 Equations for adjusting Effects Estimators. t-test ICC m t t o unadj adj ) 1 ( 1    F-test for ANOVA Fadj  Funadj 1 (mo 1)ICC Chi-Square ICC mo unadj adj ) 1 ( 1 2 2      Note. Adapted from Rooney (1992). m=number of members per group, ICC=Intraclass Correlation.
  • 77. Results • After the corrections, only 12.4% to 68.2% of tests that were originally reported as significant remained significant • When we considered all original tests, not just those that were significant, 7.3% to 40.2% of tests remained significant after correction • The problem is even worse, because most of the studies tested multiple outcome variables without correcting for alpha inflation • Of the 33 studies, 6-19 studies no longer had any significant results after correction, depending on assumptions
  • 78. 332 332 N = Corrected Reported Degrees of Freedom 200 100 0 332 332 332 332 N = Corrected F ICC=.30 Corrected F ICC=.15 Corrected F ICC=.05 Reported F F 80 60 40 20 0 119 119 N = Corrected Reported Degrees of Freedom 200 100 0 119 119 119 119 N = Corrected F ICC=.30 Corrected F ICC=.15 Corrected F ICC=.05 Reported F F 50 40 30 20 10 0 For all N = 332 t- and F-tests For N = 119 omnibus t- and F-tests with exact information
  • 79. Other Issues at the Cluster Level • Sample Size and Power • Contamination • Getting Agreement to Participate
  • 80. Estimating the Needed Sample Size • We are not dealing here with statistical power in general, only at school level • Question is: How many schools are needed for ES of .20, with p < .05, power .80, assuming a balanced design and > 50 students per school. • Why .20? Why .05, why .80. Why balanced? What role does the N of students play?
  • 81. Key Considerations We estimate the cluster effect via the unconditional ICC, that part of the total variation that is between schools • But sample size needs are driven by the conditional ICC, the difference between schools after covariates are used to “explain” some of the between-school variation • We want to use 2 examples, one local and one national, to illustrate how careful use of school- level covariates can reduce the N of schools needed
  • 82. Example 1: Kentucky • An achievement study • A school-level question • A limited budget • One year of prior achievement data at both the school and student levels • Given these data, and traditional power assumptions, how many schools needed to detect an effect of .20? • We use J for schools and N for students
  • 84. Table 2: Required J for the Unconditional Model Unconditional Effect Size Required J 0.20 94 0.25 61 0.30 43
  • 85. What is the School Level Covariate like? For reading, the obtained covariate-outcome r is .85--the usual range in other studies is .70 to .95 As corrected in HLM this value is .92 What happens when this pretest school- level covariate is used in the model?
  • 86. Table 3: Estimates from Conditional Model (CTBS as Level-2 Covariate) Within School Variance s2 Between School Variance T2 Total Unexplained Variance t2s2 Intra-Class Correlati on (ICC) t2/(t2s2) 1210 21.6 1231.6 0.0175
  • 87. What has happened? • The total unexplained variation has shrunk from 1355 to 1232--why? • The total between-school variation has shrunk from 146 to 26--why? • So how many school are now needed for the same power?
  • 88. Table 4: Required J for Two Level Unconditional and Conditional Models Effect Size Required J No Covariate Required J With Covariate 0.20 94 22 0.25 61 15 0.30 43 12
  • 89. How do these Values Compare? • The work of Hedges and Hallberg with nationally representative data where m is his term for sample size at the school level (not J)
  • 90. National Estimates from Hedges Grade Covariates m=10 m=15 m=20 m=25 m=30 1 None 0.67 0.54 0.46 0.41 0.37 Pretest 0.32 0.25 0.22 0.19 0.18 5 None 0.70 0.56 0.48 0.43 0.39 pretest 0.30 0.24 0.21 0.19 0.17 12 None 0.58 0.46 0.40 0.36 0.32 pretest 0.21 0.17 0.15 0.13 0.12
  • 91. Conclusions about needed Sample Sizes • Will vary by type of outcome, local setting and quality of the covariate structure • With achievement outcomes, about 20 schools will often do, 10 per group in a two-group study • But to protect against attrition, some more might be added • Further gains accrue from several prior years of school-level achievement data, not difficult to get • Since intervention groups can cost more, an unbalanced design with more control units will also help, though gain depends on harmonic n
  • 92. Contamination Issues with Cluster-level Assignment • To reduce contamination one can move to a higher level of analysis: from student to classroom to grade level to school to district • Need to assess/monitor type and level of contamination--PGC Comer as an example • How to analyze it: Instrumental Variables for dichotomously distributed contamination • More problematic with more complex forms of contamination
  • 93. Cluster Level Random Assignment- Getting Agreement • High rate of RA in preschool studies of achievement and in school-based studies of prevention, but not in school-based studies of achievement. Why? Culture or Structure? • Cook’s war stories - PGC; Chicago; Detroit • Grant Fdn. Resources • Experiences at Mathematica • District-level brokers • IES experience generally positive that RA can be often achieved (and maintained). But difficult
  • 94. Summary re RCTs • Best in theory for certain kind of cause • Has its own assumptions that need to be tested • Importance of a marriage of statistical theory and an ad hoc “theory” of implementation, as with survey research • RCTs not usable in all ed research practice • Limited capacity to explore causal contingencies • Results from single studies probabilistic rather than deterministic • Philosophers of science might say: First rate method for second rate theory of cause
  • 95. Summary 2 • Lower level at which assign the better; Higher order designs can be expensive • Covariates help reduce sample size needs: Crucial role of pretest • Value of description of implementation based on program theory and quality measurement • Black box RCTs not a good idea, but ironic that traditional methods and standards cannot yet support complex causal explanations of why A and B are related - only a single mediator but many more moderators
  • 96. Remember, though… • Causal descriptions of an A causes B form are “the cement of the universe” because: • Each causal explanation of why A causes B requires that A causes B • Every causal model assumes the validity of each causal link it contains. These links are tested outside of the model. • Every review to identify stable causal knowledge assumes the validity of the causal knowledge being reviewed. • So testing A-B links is real important even if it is rarely the end-goal of a generalizing science.