the article emphasizes that current knowledge may still be regarded as being at an early stage of development, leaving room for more learning. The
article concludes by pointing to three
topics worthy of future methodological inquiry, including: (1)
examining the connection between
the way that initial evaluation questions are posed and the
selection of the appropriate evaluation
method in an ensuing evaluation, (2) operationally defining the ‘complexity’ of
an intervention, and (3) raising awareness about case study
evaluation methods more generally.
Keywords
analytic generalization, initial evaluation questions,
intervention complexity, logic models, rival
explanations, role of theory, triangulation
Introduction
The classic case study consists of an in-depth inquiry into a
specific and complex phenomenon
(the ‘case’), set within its real-world context. To arrive at a
sound understanding of the case, a
case study should not be limited to the case in isolation but
should examine the likely interaction
between the case and its context. Technically, such an objective
adds to a common problem,
whether doing case study research (Yin, 2014) or case study
evaluation (Yin and Ridde, 2012):
the number of datapoints (each case being a single datapoint)
will be far outstripped by the num-
ber of variables under study − because of the complexity of the
case as well as the embracing of
Corresponding author:
Robert K. Yin, COSMOS Corporation, 3 Bethesda Metro Center, Suite 700, Bethesda, MD 20814, USA.
Email: [email protected]
Evaluation 19(3), 2013. DOI: 10.1177/1356389013497081
the contextual conditions. This situation is nearly impossible to
remedy, even if a modest number
of cases is included as part of the same (multiple-) case study.
As a result, the usual analytic
techniques based on having a large number of datapoints and a
small number of variables
(thereby permitting estimates of means and variances) are likely
to be irrelevant in doing case
study research.
For evaluations, the ability to address the complexity and
contextual conditions nevertheless
establishes case study methods as a viable alternative among the
other methodological choices,
such as survey, experimental, or economic research
(Stufflebeam and Shinkfield, 2007). The con-
ditions appear especially relevant in efforts to evaluate highly
broad and complex initiatives; for
example, systems reforms, service delivery integration,
community and economic development
projects, and international development (e.g. Yin and Davis,
2007). At the same time, doing case
study evaluations with acceptable and rigorous procedures must
rely on a state-of-the-art still in its
formative stages.
The May 2012 workshop on ‘Validity, Generalization, and
Learning’ provided an opportunity
for a variety of scholars to share their working knowledge and
to advance the state-of-the-art. Six
of the presentations became the other articles contained in this
journal issue.1 Together, the six
assembled articles form a basis for briefly reviewing the key
practices regarding validity and gen-
eralization when doing case study evaluations. The present
article tries to reinforce and also to
elaborate upon the six articles. The goal is to stimulate yet
newer contributions on all these impor-
tant methodological practices. Only in this manner will case
study evaluations continue to get
stronger. The article is organized according to a slight
adaptation of the main themes of the original
workshop: Strengthening validity; Seeking to generalize; and
Still more learning.
Strengthening validity
Case study evaluations may limit themselves to descriptive or
even exploratory objectives.
However, the greatest challenge arises when case study
evaluations fill an explanatory role. This
means: (a) documenting (and interpreting) a set of outcomes,
and then (b) trying to explain how
those outcomes came about. When adopting such an explanatory
objective, a case study evaluation
will in effect be examining causal relationships. The evaluation
thus squarely confronts issues of
internal validity.2 To address these issues, the small number of
cases in a case study − frequently
involving only a single case − precludes the use of conventional
experimental designs. These
require the availability of a sufficiently large number of cases
that can in turn be divided into two
(or more) comparison groups. Instead, case study evaluations
must rely on other techniques.
One evaluative approach has been to conduct and document
direct observations of the events
and actions as they actually occur in a local setting as a critical
part of a case study’s data collection
(e.g. Erickson, 2012: 688; Maxwell, 2004, 2012; Miles and
Huberman, 1994: 132). The inquiry
can highlight the contextual role of the local settings and
accommodate if not feature the non-linear
and recursive flows of events (‘feedback loops that occur at
irregular times’ − Betts, 2013: 255) as
well as the possibility of entertaining multiple causes, both
proximal and distal. However, the ensu-
ing analysis remains highly qualitative and may not be very
convincing.
To improve on the precision of such an approach and to boost
confidence in the findings, two
of the six assembled articles (Befani, 2013; Byrne, 2013) offer
insights into a technique known
as qualitative comparative analysis (QCA), developed by
Charles Ragin (1987, 2000, 2009).
This technique captures within-case patterns or configurations
(Byrne, 2013: 224), consisting of
the combination of intervention and outcome conditions for
each particular case being studied.
The cross-case analysis then becomes the systematic
comparison of these within-case
configurations or sets of intervention-outcome conditions.3
When a sufficiently large number of
cases is available, QCA can be ‘strong in testing, refining, and
validating findings’ (Befani,
2013: 280). As examples, Befani’s article discusses two
illustrations, having 17 and 11 cases,
respectively.
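The first step of crisp-set QCA, building a truth table that groups cases by their configuration of binary conditions and measures how consistently each configuration co-occurs with the outcome, can be sketched in a few lines. The condition names and case data below are hypothetical illustrations, not drawn from Befani's or Byrne's examples:

```python
from collections import defaultdict

# Hypothetical cases: each pairs a configuration of binary intervention
# conditions with a binary outcome (1 = success, 0 = failure).
cases = [
    ({"funding": 1, "local_partner": 1, "training": 0}, 1),
    ({"funding": 1, "local_partner": 1, "training": 0}, 1),
    ({"funding": 1, "local_partner": 1, "training": 1}, 1),
    ({"funding": 0, "local_partner": 1, "training": 0}, 0),
    ({"funding": 0, "local_partner": 1, "training": 0}, 1),  # contradicts the row above
    ({"funding": 1, "local_partner": 0, "training": 1}, 0),
]

def truth_table(cases):
    """Group cases by configuration; return each configuration's
    consistency (the share of its cases showing the positive outcome)."""
    rows = defaultdict(list)
    for config, outcome in cases:
        rows[tuple(sorted(config.items()))].append(outcome)
    return {key: sum(v) / len(v) for key, v in rows.items()}

for config, consistency in truth_table(cases).items():
    print(dict(config), "consistency =", consistency)
```

A configuration whose consistency falls well below 1.0 (the contradictory pair above scores 0.5) signals either a measurement problem or a missing condition, and resolving such rows is where QCA's ‘testing and refining’ of findings takes place.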
The more advanced versions of QCA (Ragin, 2000) permit the
handling of 50 to 100 cases
(Befani, 2013: 281–82). However, such a capability, as well as
the QCA procedure more generally,
may in fact focus on cases rather than on the conduct of in-
depth case studies. Except when pre-
existing cases are already archivally available, a study covering
a large number of new in-depth
case studies is likely to be difficult to conduct, because of both
the elapsed time and the resources
needed by the study. QCA’s capability, therefore, may move in
the opposite direction from the
initial challenge of confronting validity with a small number of
cases, including the classic single-
case study. For such situations, the six assembled articles gave
less attention to three known prac-
tices, possibly because the practices remain underdeveloped.
Plausible, rival explanations
The role of examining plausible rival explanations has been
readily recognized in doing evalua-
tions (e.g. Maxwell, 2004: 257−60; Yin, 2000b). Appealing to
such rivals has formed a central part
of nearly all types of research in the social and physical
sciences (e.g. Campbell, 2014: xvii−xviii).
Although experimental designs may control for all rivals (but
without specifying any of them), the
number of plausible rivals competing with the main
hypothesized causal relationships in a case
study may be sufficiently limited that they can be studied
directly. Thus, as part of the same case
study, the procedure calls for a vigorous search for data related
to the rivals, as if trying to find
support for them (Patton, 2002: 553; Rosenbaum, 2002: 8−10).
Given a vigorous search, but finding no such support, more
confidence can be placed in the
main hypothesized relationships. The degree of certainty will be
lower than that associated with an
experimental design but higher than if a case study had not
investigated any plausible rivals. As
noted in the field of education research, ‘the use of qualitative
methods . . . can be particularly help-
ful in ruling out alternative explanations . . . [and] can enable
stronger causal inferences’ (Shavelson
and Towne, 2002: 109). For a case study evaluation, the most
common rivals might be the exist-
ence of: an initiative similar to or overlapping with the
intervention being evaluated; a salient
policy shift not related to the intervention; or some other
identifiable influence in the contextual
environment.
However, beyond being identified as an integral and critical
part of doing an evaluation, the
operational procedure for making comparisons with plausible
rival explanations has received little
attention. Explicit procedures are needed to deal with how and
whether the acceptance or rejection
of rivals meets such benchmarks as being ‘acceptable,’ ‘weak,’
or ‘strong,’ or even how to distin-
guish between a plausible rival and a mere red herring. In
addition, the operational steps involved
in comparing the rival findings with those related to the main
hypothesis may be intricate and may
benefit from being represented as formal designs. To this
extent, the use of plausible rival explana-
tions remains an extremely promising but still underdeveloped
procedure for strengthening the
validity of case study evaluations.
Triangulation
Triangulation presents a similar situation. The principle has
been long understood (e.g. Denzin, 1978;
Jick, 1979), with at least four types of triangulation being
possible: (1) data source triangulation, (2)
analyst triangulation, (3) theory/perspective triangulation, and
(4) methods triangulation (Patton,
2002: 556−63). Of the four, the data source and methods types
in particular are likely to strengthen
the validity of a case study evaluation. Renewed interest in
mixed methods research has highlighted
the ways in which a methods triangulation can provide
increased confidence in the findings from a
study that has combined quantitative with qualitative methods
(e.g. Creswell and Plano Clark, 2007;
Teddlie and Tashakkori, 2009). (Vellema et al. [2013: Table 1]
briefly refer to their use of one of the
other kinds of triangulation: theory/perspective triangulation.)
Many case study evaluations, especially those focusing on broad
or complex interventions, can
involve a combination of two or more methods. When these
methods are purposely designed to
collect some overlapping data, the possibility for triangulation
certainly exists and, if the results are
convergent, greater confidence may be placed in the
evaluation’s overall findings. Similarly, con-
vergence over the examination of causal relationships will
strengthen the evaluation’s internal
validity.
At the same time, operational procedures for carrying out
triangulations have also received little
attention. No benchmarks exist to define when triangulation
might be considered ‘strong’ or ‘weak’
or ‘complete’ or ‘incomplete.’ Similarly, sufficient
triangulation might involve an intricate sequence
of steps that need to be represented as formal designs. The
ultimate goal, as with making compari-
sons with plausible rival explanations, calls for a common
procedure that can be routinely adopted
and used by many if not all case study evaluations.
Logic models
Case study evaluations frequently use logic models, initially to
express the theoretical causal rela-
tionships between an intervention and its outcomes, and then to
guide data collection on these same
topics. The collected data can be analyzed by comparing the
empirical findings with the initially
stipulated theoretical relationships, and a match between the
empirical and the theoretical adds to
the support for explaining how an intervention produced (or
not) its outcomes.
The practice of using logic models in evaluations has again
been understood for a lengthy
period of time (e.g. Wholey, 1979). Nevertheless, although the
practice of using logic models has
become quite common, little has occurred to sharpen their use
and strengthen their role.
For instance, a major shortcoming derives from the
coincidentally graphic similarities between
logic models and flow charts. Both are usually expressed as a
sequence of boxes. In the case of the
logic model, the boxes represent the key steps or events within
an intervention and then between
the intervention and its outcomes. Graphically, the boxes are
then connected by arrows that identify
the links between and among the events. Unfortunately, most
evaluations collect data about the
boxes, but almost no data about the arrows. Yet the arrows represent
the flow of transitional or causal
conditions, showing or explaining how one event (box) might
actually lead to another event (a
second box). One possible reason for such neglect is that
transitional data are irrelevant in flow
charts, which only represent the shifting from one task to
another, but without implying any causal
relationship. For logic models not having any transitional data,
only a correlational analysis can be
conducted, reducing the causal value (and validity) of the entire
exercise. Future studies could
again investigate ways of improving the use of logic models.
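The boxes-versus-arrows shortcoming can be made concrete with a small audit sketch: represent the logic model as boxes and arrows, then list the arrows for which no transitional data were collected. The model, the data-coverage sets, and the audit function below are all hypothetical:

```python
# A logic model as boxes (events) and arrows (hypothesized causal links).
boxes = ["training delivered", "staff skills improve", "service quality improves"]
arrows = [("training delivered", "staff skills improve"),
          ("staff skills improve", "service quality improves")]

# Which elements the evaluation actually collected data on:
data_on_boxes = {"training delivered", "staff skills improve",
                 "service quality improves"}
data_on_arrows = {("training delivered", "staff skills improve")}

def undocumented_arrows(arrows, data_on_arrows):
    """Return the causal links for which no transitional data were collected."""
    return [a for a in arrows if a not in data_on_arrows]

print(undocumented_arrows(arrows, data_on_arrows))
```

In this sketch the second arrow lacks data, so only a correlation between its two boxes can be claimed; an evaluation design could run such an audit before data collection to ensure every hypothesized link will have evidence behind it.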
Summary
Case study evaluations need to continue to confront the
challenge of strengthening validity. Several
known methodological practices accept rather than avoid the
necessary underlying assumption that
the typical case study will only include a small number of cases:
checking for plausible, rival
explanations; triangulating data or methods; and using logic
models. These practices deserve
greater attention than they have attracted in the past. In each
situation, although the practices have
been recognized and used for many years, the preceding
paragraphs have suggested that room for
improvement still exists. Future methodological contributions
could therefore yield desirable
payoffs.
Seeking to generalize
Concerns in doing case study evaluations extend from issues of
validity to issues of generalization.
In international development, the generalizations form the basis
for transferring lessons from one
country to another as well as for ‘scaling-up’ a desirable
intervention within the same country. This
facet of the May 2012 workshop theme led the six assembled
articles to delve, in some cases quite
deeply, into generalization issues.
The widespread assumption, embraced by most of the articles as
well as the prevailing evalua-
tion literature, interprets case study generalization as an effort
to generalize from a small number
of cases to a larger population of cases (e.g. Byrne, 2013;
Ragin, 2009; Seawright and Gerring,
2008; and Woolcock, 2013). The common quest has been, first,
to establish a sufficiently precise
definition of the ‘case’ being studied (if not at the outset of a
case study at least by its conclusion),
and then to (retrospectively) define the broader population of
relevant cases. The process mimics
the conventional sampling procedure but can fail for two
reasons.
First, the difficulties of selecting the initial case(s) usually
mean that the case(s) being studied
do not represent a known, much less random sample from the
larger set of cases. An additional, circular problem also arises: without fully understanding the case, or without sufficient data for selection purposes, the potential population of cases cannot be defined; yet without knowing the population, the nature of the sampled case(s) cannot be fully defined either.
Second, if a study genuinely takes advantage of the case study
method − that is, by probing a
case and its context in-depth − the study will likely only be able
to include a small number of cases.
In fact, the classic case study, as well as many case study
evaluations, is usually limited to only a
single case. The goal of understanding a case and its context,
potentially over a meaningful period
of time, is sufficiently engrossing that, even if thick description
(Geertz, 1973) is not the end result,
a case study will just not be able to cover more than a small
number of cases. The only way of
increasing the number of cases to some substantial level would
mean sacrificing the in-depth and
contextual nature of the insights inherent in using the case study
method in the first place.
Analytic generalization
Instead of pursuing the sample-to-population logic, analytic
generalization can serve as an appro-
priate logic for generalizing the findings from a case study (e.g.
Bromley, 1986: 290–1; Burawoy,
1991: 271–87; Donmoyer, 1990; Gomm et al., 2000; Mitchell,
1983; and Small, 2009).4 By ana-
lytic generalization is meant the extraction of a more abstract
level of ideas from a set of case study
findings − ideas that nevertheless can pertain to newer
situations other than the case(s) in the origi-
nal case study. For case study evaluations, the analytic
generalization should aim to apply to other
concrete situations and not just to contribute to abstract theory
building.
The desired analytic generalization also should go beyond
serving only as a ‘working hypothesis’
(e.g. Cronbach, 1975) − that is, one in need of further study
rather than being ready to be generalized
or applied to new situations. This shortcoming is not easily
overcome. However, carefully linking an
analytic generalization to the related research literature by
identifying overlaps as well as gaps will
help. Replication of the same findings by conducting a second
or third case study (e.g. Yin, 2014:
57−9) can strengthen the generalization even further.
Eventually, the ideal generalization may extend
not only to other ‘like’ cases but also ‘apply to many different
types of cases’ (Bennett, 2004: 50).
This manner of generalizing is not peculiar to doing case
studies but is in fact analogous to
the way that generalizations are made in doing experiments.
Thus, the selection and conduct of
an experiment derives from the goal of developing fresh data
about some initially hypothesized
conditions − or about discovering a totally new condition − but
not from being a sample of some
known, larger population of like experiments.5 Case study
research follows a similar motive
(Yin, 2014: 44).
One of the six assembled articles (Mookherji and LaFond, 2013)
demonstrated the development
of analytic generalizations in considerable detail. The study
examined the ‘initiatives and pro-
cesses [that were] actually “driving” the improvements in
routine immunization [projects in three
African countries]’ (Mookherji and LaFond, 2013: 288). A
critical analytic step occurred after the
data had been collected: the identification of the varied drivers
in each of the case studies, followed
by a cross-case synthesis of how the case-specific drivers fell
into six categories, each representing
one of six (conceptually) common drivers (see Table 1 of their
article).
Based on these and other cross-case findings, Mookherji and
LaFond formulated a comprehen-
sive framework depicting the flow of pre-conditions, contextual
conditions, and drivers (see Figure
4 of their article). The framework, now empirically derived,
explains how and why immunization
projects can succeed. In the authors’ view, it became the basis
for generalizing the results from their
evaluation to other districts in other African countries (p. 22).
(By inspecting the framework
closely, a reader might even speculate that the framework can
pertain to immunization projects
outside of that region − or even to the design of community
health initiatives more broadly.)
Mookherji and LaFond’s example shows how analytic
generalization offers improved ways of
generalizing from case study evaluations. An additional line of
thinking that builds on the impor-
tance of analytic generalization is described next.
The role of ‘theory’ in making analytic generalizations
Mookherji and LaFond rightfully regarded their framework as
expressing a theory of change (p.
23). One way to have further strengthened their framework
would have been to connect it to the
extant literature, which contains a considerable body of work on
the locally decentralized service
delivery conditions and the local partnering arrangements
central to their framework. The authors
might then have been able to discuss how their case study
contributed (or not) to new knowledge
about health interventions, and whether their findings were
limited to immunization projects or
could be applied to community health projects more generally.
In essence, the desired analytic generalization should present an
explanation of how and why
the initiative being evaluated produced results (or not) − or, for
non-evaluation studies, how and
why the studied events occurred (or not). In this latter regard,
two other examples are worth noting.
The first is Graham Allison’s well-known single-case study on
the Cuban missile crisis (Allison,
1971; Allison and Zelikow, 1999). The case study has for over
30 years been a best-seller in the
field of political science because of its analytic generalizations
and implications for a broad array
of international relationships.
The second example (also illustrating how a detailed single-case
study can be published in a
leading academic journal, even given its page-length
limitations) examined how the Croatian gov-
ernment represented the country’s past, present, and future in
the aftermath of the wars of Yugoslav
secession (Rivera, 2008). The wars had left a reputation-
damaging effect, threatening Croatia’s
highly valued tourist industry. The case study showed how, in
response, the government reframed
the country’s past by omitting the war in its representations of
national history, re-positioning the
country as more closely sharing a history and culture with its
Western European neighbors. The
explanation for these findings then drew from a prevailing
theoretical framework, in which the
author innovatively extended Erving Goffman’s well-regarded
work on stigma and the manage-
ment of ‘spoiled identity’ from the individual to the
institutional realm (Goffman, 1963). The
author concluded by claiming that the analytic generalization
had applicability to other situations
of collective memory and cultural sociology.
Summary
The preferred manner of generalizing from case studies and case
study evaluations is likely to take
the form of making an analytic or conceptual generalization,
rather than of reaching for a numeric
one. The desired generalization should present an explanation
for how an evaluated initiative pro-
duces its results (or not). The explanation can be regarded as a
theory of sorts − certainly more than
a set of isolated concepts − and therefore yield a better
understanding of an intervention and its
outcomes. Whether such an explanation is based on a theory
that emerged for the first time from a
case study or had been entertained in hypothetical form prior to
the conduct of the case study,
researchers need to connect the theory to the extant literature,
or alternatively, to use their findings
to explain the gaps and weaknesses in that literature. By doing
so, the generalizations from a single
case study can be interpreted with greater meaning and lead to a
desired cumulative knowledge.
Finally, replications of the original case study also help.
At the same time, the strongest empirical foundation for these
generalizations derives from the
close-up, in-depth study of a specific case in its real-world
context.6 Such a condition usually limits
the number of cases that can be studied. In turn, such a
limitation precludes applying the conven-
tional numeric, or sample-to-population generalizations when
doing case studies. If, in contrast, an
evaluation genuinely has an overarching goal of establishing or
estimating numeric relationships,
doing a case study evaluation might not be the preferred method
to satisfy such a goal.
Still more learning
The present article’s treatment of validity and generalization
suggests ways that case study evalu-
ations can gain from methodological studies yet to be done.
These studies need to focus on case
study practices to strengthen future case study evaluations. In
this sense, there is still more learning
to be done. Discussed next are three topics connected to validity
and generalization that represent
priorities for the desired methodological studies.
Noting carefully the nature of the initial evaluation questions
Perhaps the most important inquiry points to the very start of a
case study evaluation − its evalua-
tion questions. These questions have serious implications for
the remainder of the case study.
However, many case study evaluations may not be attending
carefully to the way that these ques-
tions are posed. How best to pose these questions, therefore,
should be a high priority for future
investigation. Such studies could be quite straightforward, for
example, conducting a meta-analysis
of completed evaluations, deliberately covering a variety of
forms of questions and types of evalu-
ation methods.
The studies might initially assume that the desired questions for
case study evaluations, as with
case study research more generally, should be cast as ‘how’ or
‘why’ questions (Yin, 2014: 10−11).
Such questions implicitly direct attention to events and actions
over time, including but not limited
to causal processes (and therefore not restricted to explanatory
case study evaluations but also
embracing descriptive ones). The strength of the subsequent
case study would be its ability to
examine the relevant events and actions in all their complexity,
even if re-creating a contemporary
period of time retrospectively. ‘How’ and ‘why’ questions, for
instance, highlighted the seven
questions posed in doing each of the three country case studies
in Mookherji and LaFond’s article
(2013: 289).
Unfortunately, many evaluations, including those dealing with
international development,
totally ignore ‘how’ and ‘why’ questions and start with ‘what’
or ‘to what extent’ questions. The
‘what’ questions seek to identify the specific conditions
associated with a successful (or not) inter-
vention. Moreover, these conditions are sometimes expressed as
single ‘present-absent’ variables,
even when a condition, such as decentralization, is entirely too
complex to be treated in this man-
ner. Nevertheless, note that − assuming the availability of
sufficient data − regressions, factor
analyses, and other quantitative models can readily support the
identification process. Furthermore,
the models can more than capably demonstrate the potency of a
targeted condition by controlling
for competing conditions or showing how sets of conditions
interact. Likewise, if properly
addressed, the ‘to what extent’ questions beg for a numeric, not
explanatory or even descriptive
response.
When the initial evaluation questions appear to favor methods
other than case studies, attempts
to conduct case study evaluations in spite of these questions
may lead to tough sledding for the
ensuing case study. First, validity questions may arise about the
sample of cases selected, the avail-
ability of counterfactual conditions, and the metrics used to
assess the ‘extent’ in the phrase ‘to
what “extent”.’ Most commonly, to address the ‘to what extent’
questions, a case study evaluation
will have to resort to the use of Likert scales and then query
respondents or analysts. Yet, such a
maneuver can raise even more uncertainties about the sample
and implicit biases of the respond-
ents or analysts who were queried.
By addressing the less preferred form of questions, however, the
greatest loss may be a case
study’s inability to arrive at any generalizations. For instance,
the ‘what’ questions may lead to
no particular theoretical framework other than a correlative one,
making analytic generaliza-
tions difficult. Depending upon the number of cases, numeric
generalizations about the fre-
quency or combination of the ‘whats’ may be tenuous from any
conventional quantitative
standpoint.
Overall, future inquiries should aim to yield a better
understanding of how an evaluation’s
initial questions can imply certain preferences in selecting the
methods for an evaluation. An
important hypothesis to be entertained is that the form of these
questions dictates whether a case
study (or other evaluation method) should be used in the first
place (e.g. Shavelson and Towne,
2002: 99−108).
Extending this challenge into a slightly more controversial
realm, a somewhat more compli-
cated situation surfaces when evaluations are initially driven by
the ‘realist’ framework of ques-
tions − ‘what works for whom, when, where, and why?’
(Woolcock, 2013: 245; Betts, 2013: 256).
This common framework, appearing in many evaluations and
evaluation programs (international
and otherwise), leads to the impression that a short or at least
manageable list of conditions can
eventually be identified. Moreover, the ‘whom, when, where,
and why’ portion of the framework
leaves the impression that the responses will identify a set of
constraining and enabling conditions
related to generalizing to other situations.
However, the complexity of an intervention and its context may
yield such a large number
of conditions, not to speak of their distinctiveness or
uniqueness, that they cannot be itemized
in any practical way. Even if successfully itemized, the likely
analytic tool may again be a
correlative one, not a case study. Thus, future studies should
deliberately examine the implica-
tions of using the evaluation questions deriving from a realist
framework − at a minimum
examining whether a useful procedure might be for a new study
to speculate about the kind
and length of the likely items before deciding whether to
proceed with a case study or some
alternative method.
Revisiting the ‘complexity’ of interventions
A second priority topic covers the presumed complexity of an
intervention and how it appears to
influence the choice between case study evaluations and other
evaluation methods. Many evalua-
tions, as well as the present article, portray ‘complexity’ as an
important feature justifying the use
of case studies. The usual context for making this choice is a
comparison to experiments, which in
their classic form mainly focus on the relationship between a
single cause and a single effect at a
time (Befani, 2013: 270; Byrne, 2013: 220). However, instead
of relying on a comparison with
experiments, a better justification for proceeding with a case
study evaluation might require a
sharper definition of what makes an intervention complex.
Some interventions may consist of a number of components that
have complex relationships.
These types of interventions and this type of complexity may
nevertheless be highly amenable to
methods other than case studies (e.g. an economic-based study
of a housing intervention). Simply
stipulating that complex interventions warrant the use of the
case study method might appear to be
naive if not offensive to analysts familiar with the alternative
methods, which in fact can cover
certain kinds of complexity quite well (again, regression
models, structural equation models, and
the like come readily to mind).
Instead, the desired future studies should explicitly define the
conditions associated with the
‘complexity’ of the interventions that appear to favor case
studies. Several of the six articles in
this issue have begun to define these conditions, and future
methodological work could usefully
build on this foundation. For instance, an initially relevant
characteristic of complexity can
involve interventions having multiple causes and effects.
Moreover, the intervention may be
‘quite distal from the outcomes and impacts of interest’
(Mookherji and LaFond, 2013: 285).
Complexity also may mean understanding interventions in their
totality, not ‘in terms of their
components’ (Byrne, 2013: 218).
Finally, Woolcock suggests that interventions can vary
according to their causal density:
those having a high causal density might trigger a case study
evaluation (Woolcock, 2013:
237−39). According to Woolcock, density reflects four
conditions: (1) the number of required
person-to-person transactions, (2) the amount of discretion by
front-line implementing agents,
(3) the pressure on the agents to respond to distracting
conditions, and (4) whether the agents’ solutions come from a known menu or need to be innovated. In
contrast, interventions with low
causal densities may be physical development projects having
known technological solutions,
such as building roads, providing proper sanitation and
electricity, building schools, and
administering vaccinations (Andrews et al., 2012) − and for
which other evaluation methods
may be entirely appropriate.
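Woolcock's four conditions amount to a simple screening rubric. The sketch below is purely illustrative: the field names, the equal weighting of the four conditions, and the threshold are this sketch's assumptions, not part of Woolcock's framework.

```python
from dataclasses import dataclass

@dataclass
class Intervention:
    """Hypothetical screening attributes paraphrasing Woolcock's
    four causal-density conditions (Woolcock, 2013: 237-39)."""
    many_person_to_person_transactions: bool
    high_frontline_discretion: bool
    pressure_from_distracting_conditions: bool
    solutions_must_be_innovated: bool  # vs. drawn from a known menu

def causal_density(iv: Intervention) -> int:
    """Count how many of the four conditions hold (0-4)."""
    return sum([iv.many_person_to_person_transactions,
                iv.high_frontline_discretion,
                iv.pressure_from_distracting_conditions,
                iv.solutions_must_be_innovated])

def suggested_approach(iv: Intervention, threshold: int = 3) -> str:
    """Illustrative rule: high density suggests a case study;
    low density leaves other evaluation methods appropriate."""
    return ("consider a case study evaluation"
            if causal_density(iv) >= threshold
            else "other evaluation methods may be appropriate")

# A road-building project with known technological solutions:
road = Intervention(False, False, False, False)
# A frontline service reform requiring discretion and innovation:
reform = Intervention(True, True, True, True)
```

In practice the judgment would be qualitative rather than a count, but the structure shows how the four conditions jointly push an evaluation design toward or away from a case study.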
In summary, future studies should describe the actual features that lead to labeling an intervention ‘complex,’ rather than relying on the label alone.
Making awareness of case study evaluation methods a higher priority
A third priority topic sits at a higher plane than the first two −
and may be more difficult to pursue.
Although case study evaluation methods have advanced over the
years, progress has been slow
(e.g. Yin, 2000a). Some key topics such as triangulation and the
use of rival explanations, as previously discussed in this article, still appear to be underdeveloped
and await further investigation and
elaboration in order to become potent routines.
One possible explanation for the lack of progress is that articles
whose main concerns deal with
case study evaluations paradoxically begin with a fairly
elaborate discussion of non-case study
methods, such as the experimental method. The effect of these lengthy and occasionally apologetic discussions may be to displace a systematic and more thorough canvassing of the potentially relevant case study methods. The desired canvassing would heighten awareness by justifying why some case study practices, but not others, are to be employed in a planned evaluation. As an example, an initial discussion on rival explanations might cite the
relevant literature, show how rivals
had been incorporated (or not) in previous studies, and then
indicate how rivals are to be used (or
not) in the design of the planned evaluation. Rival explanations were mentioned only once in the six assembled articles (see Vellema et al., 2013).
Taking analytic generalization as a second example, the creation
of some typology of analytic
generalizations, along with the operational procedures for
deriving each type, would represent a
greater advance than has been experienced during the past
couple of decades. For example, Halkier
(2011) suggests three forms of analytic generalization and
offers procedures for examining them in
empirical studies: (1) ideal-typologizing, (2) category zooming
(depth on a single point), and (3)
positioning (the reflection of multiple voices and discourse).
Again, if an upcoming case study evaluation were initially to discuss the previous use of analytic generalization, even as a candidate but ultimately rejected practice, the study could still build important methodological lessons.
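Halkier's typology can be made concrete as a small data structure. The sketch below is illustrative only: the glosses merely paraphrase the parentheticals above, and the lookup function is this sketch's invention, not a procedure Halkier proposes.

```python
from typing import NamedTuple

class GeneralizationForm(NamedTuple):
    name: str
    gloss: str  # one-line paraphrase of the form

# Halkier's (2011) three forms as named in the text; the tuple
# structure is only this sketch's way of making the typology
# explicit enough to reference in an evaluation design.
HALKIER_FORMS = (
    GeneralizationForm("ideal-typologizing",
                       "constructing ideal types from empirical patterns"),
    GeneralizationForm("category zooming",
                       "depth on a single point"),
    GeneralizationForm("positioning",
                       "the reflection of multiple voices and discourse"),
)

def classify(planned_use: str) -> GeneralizationForm:
    """Look up a planned analytic generalization by form name."""
    for form in HALKIER_FORMS:
        if form.name == planned_use:
            return form
    raise ValueError(f"not one of Halkier's forms: {planned_use!r}")
```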
In summary, a more systematic canvassing should concentrate
on case study methods. These
could include rivals, analytic generalization, and other practices
not even touched upon in the present article (e.g. case selection, the distinction between
proximal and distal causes, the mixture of
case study and other methods in the same evaluation, yet other ways of generalizing, or parsing contextual conditions rather than leaving them as the amorphous entity they now are). Only in
this way might newer contributions emerge, accelerating
progress in strengthening future case
study evaluations. Now that would be some kind of learning.
Funding
This research received no specific grant from any funding
agency in the public, commercial or not-for-profit
sectors.
Notes
1. The present article is not intended to be a review of any sort
of the assembled articles, nor did the present
author attend the May 2012 workshop.
2. The brevity of this article precludes discussing a related type
of validity − construct validity (e.g. Yin,
2014: 46−7).
3. Whether using QCA or not, the sequence of the within-case
analysis preceding the between-case analysis
− rather than starting an analysis by estimating the cross-case
averages for specific variables − is critical for
preserving the integrity of the individual cases in properly
doing any multiple-case study (Yin, 2014: 164−7).
4. The brevity of this article precludes discussing potentially
related kinds of generalizing, such as case-to-case transferability, whose strength depends on the similarity
of the sending and receiving contexts
(Lincoln and Guba, 1985: 297).
5. Regarding this contrast with a sample-population mode of generalizing from experiments, whether research experiments should admit to involving a well-defined sample of human subjects, and therefore generalize only to the fuller population of similar people rather than standing for ‘the norm for all human beings’ (Prescott, 2002: 38), has been the topic of continuing debate in psychology. The debate started because of the over-reliance on college sophomores serving as subjects in behavioral research, now
augmented by the realization that most subjects have been white
males from industrialized countries
(Henrich et al., 2010).
6. Ethnographic methods are usually associated with the desire
to study phenomena in a real-world, up-close, and in-depth manner (e.g. Emerson, 2001). However, many ethnographies shy away from developing the theoretical insights and ideas needed to make analytic
generalizations. The predilections of this
kind of ethnography should therefore be considered carefully
before adopting the ethnographic method
to do the fieldwork in a case study evaluation.
References
Allison GT (1971) Essence of Decision: Explaining the Cuban
Missile Crisis. Boston, MA: Little, Brown.
Allison GT and Zelikow P (1999) Essence of Decision:
Explaining the Cuban Missile Crisis, 2nd edn. New
York: Addison Wesley Longman.
Andrews M, Pritchett L and Woolcock M (2012) Escaping
capability traps through problem-driven iterative
adaptation (PDIA). Working Paper 299. Washington, DC:
Center for Global Development.
Befani B (2013) Between complexity and generalization:
Addressing evaluation challenges with QCA.
Evaluation 19(3): 269–83.
Bennett A (2004) Testing theories and explaining cases. In:
Ragin CC, Nagel J and White P (eds), Workshop
on Scientific Foundations of Qualitative Research. Arlington,
VA: National Science Foundation, 49−51.
Betts J (2013) Aid effectiveness and governance reforms: applying realist principles to a complex synthesis across varied cases. Evaluation 19(3): 249–68.
Bromley DB (1986) The Case-Study Method in Psychology and
Related Disciplines. Chichester: Wiley.
Burawoy M (1991) The extended case method. In: Burawoy M et al. (eds), Ethnography Unbound: Power
and Resistance in the Modern Metropolis. Berkeley: University
of California Press, 271−87.
Byrne D (2013) Evaluating complex social interventions in a
complex world. Evaluation 19(3): 217–28.
Campbell DT (2014) Foreword. In: Yin RK, Case Study
Research: Design and Methods. Thousand Oaks,
CA: SAGE, xvii−xviii.
Creswell JW and Plano Clark VL (2007) Designing and
Conducting Mixed Methods Research. Thousand
Oaks, CA: SAGE.
Cronbach LJ (1975) Beyond the two disciplines of scientific
psychology. American Psychologist 30: 116–27.
Denzin NK (1978) The Research Act: A Theoretical
Introduction to Sociological Methods, 2nd edn. New
York: McGraw-Hill.
Donmoyer R (1990) Generalizability and the single-case study.
In: Eisner EW and Peshkin A (eds), Qualitative
Inquiry in Education: The Continuing Debate. New York:
Teachers College, 175−200.
Emerson RM (ed.) (2001) Contemporary Field Research:
Perspectives and Formulations, 2nd edn. Prospect
Heights, IL: Waveland Press.
Erickson F (2012) Comments on causality in qualitative inquiry.
Qualitative Inquiry 18: 686−8.
Geertz C (1973) The Interpretation of Cultures. New York:
Basic Books.
Goffman E (1963) Stigma: Notes on the Management of Spoiled
Identity. New York: Prentice-Hall.
Gomm R, Hammersley M and Foster P (2000) Case study and
generalization. In: Gomm R, Hammersley M
and Foster P (eds), Case Study Method. London: SAGE,
98−115.
Halkier B (2011) Methodological practicalities in analytic
generalization. Qualitative Inquiry 17: 787−97.
Henrich J, Heine SJ and Norenzayan A (2010) The weirdest
people in the world? Behavioral and Brain
Sciences 33: 61–83.
Jick TD (1979) Mixing qualitative and quantitative methods:
triangulation in action. Administrative Science
Quarterly 24: 602−11.
Lincoln YS and Guba E (1985) Naturalistic Inquiry. Thousand
Oaks, CA: SAGE.
Maxwell JA (2004) Using qualitative methods for causal
explanation. Field Methods 16: 243−64.
Maxwell JA (2012) The importance of qualitative research for
causal explanation in education. Qualitative
Inquiry 18: 655−61.
Miles M and Huberman M (1994) Qualitative Data Analysis: A
Sourcebook for New Methods. Thousand
Oaks, CA: SAGE.
Mitchell JC (1983) Case and situation analysis. Sociological
Review 31: 187–211.
Mookherji S and LaFond A (2013) Strategies to maximize
generalization from multiple case studies: lessons
from the Africa routine immunization system essentials
(ARISE) project. Evaluation 19(3): 284–303.
Patton M (2002) Qualitative Research and Evaluation Methods,
3rd edn. Thousand Oaks, CA: SAGE.
Prescott HM (2002) Using the student body: college and
university students as research subjects in the United
States during the twentieth century. Journal of the History of
Medicine 57: 3–38.
Ragin CC (1987) The Comparative Method: Moving beyond
Qualitative and Quantitative Strategies.
Berkeley, CA: University of California Press.
Ragin CC (2000) Fuzzy Set Social Science. Chicago: University
of Chicago Press.
Ragin CC (2009) Reflections on casing and case-oriented
research. In: Byrne D and Ragin CC (eds), The Sage
Handbook of Case-based Methods. London: SAGE, 522−34.
Rivera LA (2008) Managing ‘spoiled’ national identity: war,
tourism, and memory in Croatia. American
Sociological Review 73: 613−34.
Rosenbaum PR (2002) Observational Studies, 2nd edn. New
York: Springer.
Seawright J and Gerring J (2008) Case selection techniques in
case study research: a menu of qualitative and
quantitative options. Political Research Quarterly 61: 294−308.
Shavelson RJ and Towne L (eds) (2002) Scientific Research in
Education. Washington, DC: National
Academy Press.
Small ML (2009) ‘How many cases do I need?’ On science and
the logic of case selection in field-based
research. Ethnography 10: 5–38.
Stufflebeam DL and Shinkfield AJ (2007) Evaluation Theory,
Models, and Applications. San Francisco, CA:
Jossey-Bass.
Teddlie C and Tashakkori A (2009) Foundations of Mixed
Methods Research: Integrating Quantitative and
Qualitative Approaches in the Social and Behavioral Sciences.
Thousand Oaks, CA: SAGE.
Vellema S, Ton G, de Roo N and van Wijk J (2013) Value
chains, partnerships and development: using case
studies to refine programme theories. Evaluation 19(3): 304–20.
Wholey J (1979) Evaluation: Performance and Promise.
Washington, DC: The Urban Institute.
Woolcock M (2013) Using case studies to explore the external
validity of ‘complex’ development interventions. Evaluation 19(3): 229–48.
Yin RK (2000a) Case study evaluations: a decade of progress?
In: Stufflebeam DL, Madaus GF and
Kelleghan T (eds), Evaluation Models: Viewpoints on
Educational and Human Services Evaluation, 2nd
edn. Boston, MA: Kluwer, 185–93.
Yin RK (2000b) Rival explanations as an alternative to ‘reforms
as experiments’. In: Bickman L (ed.), Validity
& Social Experimentation: Donald Campbell’s Legacy.
Thousand Oaks, CA: SAGE, 239−66.
Yin RK (2014) Case Study Research: Design and Methods, 5th edn. Thousand Oaks, CA: SAGE.
Yin RK and Davis D (2007) Adding new dimensions to case study evaluations: the case of evaluating comprehensive reforms. New Directions for Program Evaluation: Informing Federal Policies for Evaluation Methodology 113: 75−93.
Yin RK and Ridde V (2012) Théorie et pratiques des études de cas en évaluation de programmes. In: Ridde V and Dagenais C (eds), Approches et pratiques en évaluation de programmes, 2nd edn. Montreal: University of Montreal Press, Chapter 10.
Robert K. Yin is President of the COSMOS Corporation and has
consulted extensively on the use of case
study evaluations for many clients including the United Nations
Development Programme and The World
Bank. He has published extensively on case study methods: the
3rd edition of Applications of Case Study
Research was published in 2012; and the 5th edition of Case
Study Research: Design and Methods has just
been published with a 2014 copyright date.