Evaluation and Program Planning 24 (2001) 129–143
www.elsevier.com/locate/evalprogplan
Assessing the subsequent effect of a formative evaluation on a program
J. Lynne Brown a,*, Nancy Ellen Kiernan b
a Penn State University, Department of Food Science, 203B Borland, University Park, Pennsylvania, PA, USA
b Penn State University, College of Agricultural Sciences, 401 Agricultural Administration Building, University Park, Pennsylvania, PA 814-863-3439, USA
Received 30 June 1999; received in revised form 1 September 2000; accepted 31 October 2000
Abstract
The literature on formative evaluation focuses on its conceptual framework, methodology and use. Permeating this work is a consensus
that a program will be strengthened as a result of a formative evaluation although little empirical evidence exists in the literature to
demonstrate the subsequent effects of a formative evaluation on a program. This study begins to fill that gap. To do this, we outline the
initial program and formative evaluation, present key findings of the formative evaluation, describe how these findings influenced the final
program and summative evaluation, and then compare the findings to those of the formative. The study demonstrates that formative
evaluation can strengthen the implementation and some impacts of a program, i.e. knowledge and some behaviors. The findings also suggest
that when researchers are faced with negative feedback about program components in a formative evaluation, they need to exercise care in
interpreting and using this feedback. © 2001 Elsevier Science Ltd. All rights reserved.
Keywords: Formative evaluation; Summative evaluation; Impact; Assessing feedback
* Corresponding author. Tel.: +1-814-863-3973; fax: +1-814-863-6132.
E-mail address: f9a@psu.edu (J.L. Brown).
0149-7189/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0149-7189(01)00004-0
1. Introduction
Formative evaluation commands a formidable place in the evaluation literature. Highly regarded, the process was used to improve educational films in the 1920's (Cambre, 1981). Academic areas as diverse as agricultural safety (Witte, Peterson, Vallabhan, Stephenson, Plugge, Givens et al., 1992/93) and cardiovascular disease (Jacobs, Luepker, Mittelmark, Folsom, Pirie, Mascioli et al., 1986) draw on the process today, using findings to improve a program; among educators in particular, it is 'almost universally embraced' (Weston, 1986, p. 5). Surprisingly, the subsequent effect of using the findings of formative evaluation has not received systematic attention. This paper addresses that gap.
The literature focuses attention on three aspects of formative evaluation, the first of which is its conceptualization. Over time, researchers clarified the concept. They distinguished it from other forms of evaluation, especially summative, the fundamental difference being the rationale and use of the data (Baker & Alkin, 1973; Markle, 1989; Patton, 1994; Chambers, 1994; Weston, 1986); labeled it formative evaluation (Scriven, 1967) and accepted that designation (Rossi & Freeman, 1982; Patton, 1982; Fitz-Gibbon & Morris, 1978); debated its frequency and timing in the program cycle (Markle, 1979; Thiagarajan, 1991; Russell & Blake, 1988; Chambers, 1994); scrutinized its overlap with process evaluation (Patton, 1982; Stufflebeam, 1983; Scheirer & Rezmovic, 1983; Dehar, Casswell & Duignan, 1993; Scheirer, 1994; Chen, 1996); and expanded its epistemological framework, linking it to developmental programs (Patton, 1996). As the conceptual framework evolved, the perceived value of formative evaluation has only increased.
Second, the literature focuses on methods and design strategies to conduct formative evaluation. That focus appears first in handbooks or articles describing methods and design strategies for either an entire program (Rossi & Freeman, 1982; Patton, 1978; Fitz-Gibbon & Morris, 1978) or a segment of a program such as the materials (Weston, 1986; Bertrand, 1978), instruction (Tessmer, 1993), electronic delivery like television (Baggaley, 1986), or interactive technology (Flagg, 1990; Chen & Brown, 1994). The focus on methods and strategies appears second in case studies which illuminate a particular method or strategy tailored to the exigencies of a particular situation such as a community (Jacobs et al., 1986; Johnson, Osganian, Budman, Lytle, Barrera, Bonura et al., 1994; McGraw, Stone, Osganian, Elder, Johnson, Parcel et al., 1994; McGraw, McKinley, McClements, Lasater, Assaf & Carleton, 1989) or worksite (Kishchuk, Peters, Towers, Sylvestre, Bourgault & Richard,
1994). Over time, the focus on methods and strategies illuminated critical decisions needed to design a valid formative evaluation. The decisions include: (1) who should participate: experts (Geis, 1987), learners from the targeted audience (Weston, 1986; Russell & Blake, 1988), learners with different aptitudes (Wager, 1983), instructors representative of those in the field (Weston, 1987; Peterson & Bickman, 1988), or drop outs from a program (Rossi & Freeman, 1982); (2) how many to include and in what form, one or a group (Wager, 1983; Dick, 1980); (3) type of data to collect, qualitative or quantitative (Dennis, Fetterman & Sechrest, 1994; Peterson & Bickman, 1988; Flay, 1986); (4) data collection techniques (Weston, 1986; Tessmer, 1993) and (5) similarity of pilot sessions relative to actual learning situations (Rossi & Freeman, 1982; Weston, 1986). Not surprisingly, the conviction permeating the literature on methods and strategies is that formative evaluation will lead to a stronger, more effective program.
Third, attention in the literature dwells on the immediate use of formative evaluation findings. Academic areas such as nutrition (Cardinal & Sachs, 1995), cancer prevention for agricultural workers (Parrott, Steiner & Goldenhar, 1996), and child health (Seidel, 1993) have evaluated a program in its formative stage. In case studies such as these, researchers hail the evaluation process, describing the immediate effects of the evaluation, i.e., the problems identified and/or changes to be made in a modified version of the program (Potter et al., 1990; Finnegan, Rooney, Viswanath, Elmer, Graves, Baxter et al., 1992; Kishchuk et al., 1994; Iszler, Crockett, Lytle, Elmer, Finnegan, Luepker et al., 1995). These researchers are not consistent when reporting the immediate effects of a formative evaluation. Some do not include data; some do not outline the problems the process identified; and some do not describe the changes they made. What is consistent, however, is the message from these researchers: formative evaluation led them to make changes that should lead to a stronger program.
In summary, much has been written about formative evaluation: its conceptual framework, its methods, and its use. Throughout this literature, there is strong consensus on the value of formative evaluation, some calling its value 'obvious' (Baggaley, 1986, p. 34) and 'no longer questioned' (Chen & Brown, 1994, p. 192). Many educators contend, however, that formative evaluation is not used enough (Flagg, 1990; Kaufman, 1980; Geis, 1987; Foshee, McLeroy, Sumner & Bibeau, 1986). Indeed, some evaluations reflect no previous attempt at formative evaluation (Foshee et al., 1988; Glanz, Sorensen & Farmer, 1996; Pelletier, 1996; Schneider, Ituarte & Stokols, 1993; Wilkinson, Schuler & Skjolaas, 1993; Hill, May, Coppolo & Jenkins, 1993). Other formative evaluations are limited: using few people, non-representative samples, or selected materials (Tessmer, 1993).
Part of the explanation for limited use of formative evaluation may lie in the lack of empirical evidence in the literature demonstrating its subsequent effect. Few researchers take the next step and demonstrate this by comparing data from the initial program with data from the final program to show whether the changes resulted in an improvement in program implementation and impacts. Reviewing over 60 years of work in formative evaluation, scholars (Flagg, 1990; Dick, 1980; Dick & Carey, 1985; Weston, 1986) found that the 'evidence is supportive but meager' (Geis, 1987, p. 6). Furthermore, most evidence (Baker & Alkin, 1973; Baghdadi, 1981; Kandaswamy, Stolovitch & Thiagarajan, 1976; Nathenson & Henderson, 1980; Scanlon, 1981; Wager, 1983; Montague, Ellis & Wulfeck, 1983; Cambre, 1981) relates to only a component of a program, the educational materials, not to an entire program. Some landmark studies examine the impact of an entire program in its formative stage such as the use of negative income tax strategies as a substitute for welfare (Kershaw & Fair, 1976; Rossi & Lyall, 1976; Robins, Spiegelman, Weiner & Bell, 1980; Hausman & Wise, 1985) and the Department of Labor's LIFE effort to decrease arrest rates among released felons with increased employment (Lenihan, 1976), but only a few, such as those reported by Fairweather and Tornatzky (1977), actually document that the changes made as a result of a formative evaluation resulted in a change in the impact of the final program. Given that researchers hail formative evaluation as important, the lack of evidence about its subsequent effect points to a surprising gap in the literature.
The purpose of this paper is to examine the subsequent effect of a formative evaluation to see whether the changes resulting from it improved the final program sufficiently to distinguish between the impact of two program delivery methods. To do this, we: (a) outline the initial program and its formative evaluation, (b) present the key findings of the formative evaluation, (c) describe how the formative findings influenced the design of the revised program and its evaluation, and then (d) compare the results of the initial and revised program, something rarely done in the formative evaluation literature. In doing this, we provide a comprehensive look at the implementation of both a formative and summative evaluation. In conclusion, we identify issues that evaluators wishing to improve the design of a formative evaluation need to consider. In addition, we identify problems we encountered in attempting to assess the effectiveness of a formative evaluation.
2. Stage one: The initial program
2.1. Background
Combining federal, state and local funding, land grant colleges support educational health promotion programs for individuals and communities offered by county-based co-operative extension family living agents. Prior to our study, agents reported poor attendance at evening and weekend meetings but rarely offered daytime programs at
worksites. Instead, agents used correspondence lessons to reach people unwilling to attend meetings. However, group interaction is more likely to facilitate changes in behavior (Glanz & Seewald-Klein, 1986), possibly through the support offered by sharing experiences. While it was easier for agents to mail correspondence lessons (an impersonal delivery method), we postulated that using a group meeting to motivate the use of each lesson in a series before it was distributed was more likely to promote change in food/health behaviors. To test this hypothesis, we designed a two-stage impact study to evaluate two methods of delivering lessons biweekly at worksites: distribution alone vs distribution in conjunction with a half-hour group meeting.
Agents delivering the program would work with new delivery sites, new clientele, new content, and new delivery methods. Because of this unfamiliarity and because this program had to fit into worksite environments with differing work-shift patterns, lunch patterns, physical settings, personnel departments, and required advertising, we conducted a formative evaluation of the initial program impact and its implementation. We included participants and instructors in the evaluation, using a variety of qualitative and quantitative methods.
2.2. Target health problem and audience
Four print lessons in the initial program addressed prevention of osteoporosis, a recently proclaimed public health problem (National Institutes of Health, 1985) most often affecting white, elderly women. Prevention requires lifelong adequate calcium intake and exercise. According to NHANES II data, 75% of American women fail to consume the recommended daily amount of calcium (Walden, 1989). We targeted working women, ages 21–45, with at least one child at home because these women are building bone mass which peaks at age 35–45. Mothers can also provide nutrition activities (Gillespie & Achterberg, 1989) that teach children how to protect bone health.
2.3. Lesson content and organization
The lessons, based on the Health Belief Model (Janz & Becker, 1984), encouraged participants to eat calcium-rich foods and to walk for exercise by focusing on personal susceptibility, disease severity, benefits of prevention, and overcoming barriers to health-protecting actions. Because many in the target audience disliked drinking fluid milk (based on an initial survey) or could have reactions to milk (Houts, 1988), each lesson introduced a different calcium-rich food (non-fat dry milk, plain yogurt, canned salmon or tofu) and menu ideas. Each also included scientific background on the lifestyle–osteoporosis link, a self-assessment worksheet, a featured food fact sheet, suggestions for involving children in food preparation, and calcium-rich recipes. Rightwriter (1990) analysis indicated a 12th grade reading level.
2.4. Delivery method
We tested two biweekly delivery methods for the lessons. Group-delivery (G), based on the discussion–decision methods of Lewin (1943), was a 30 min motivational session in which participants discussed adopting a behavior suggested in each lesson (i.e. trying recipes, walking for exercise, involving children in food preparation, and eating calcium-rich foods, not supplements). Participants could taste a recipe using the featured calcium-rich food and vote by raised hands on their willingness to individually adopt the suggested behavior. An agent served as facilitator/motivator and distributed a lesson at the end of each session. The other method, impersonal-delivery (I), consisted of either the agent or a company contact person simply distributing the required lesson to participants according to schedule.
2.5. Staff training
To ensure consistency, all agents received guidelines for recruitment of worksites and participants, a program content review, a printed program delivery script and instructions for instrument administration.
2.6. Recruitment
Seven agents representing three rural and four urban/suburban counties interviewed personnel managers at businesses within their county and recruited 48 worksites where women comprised over 30% of the work force. Once worksites were randomly assigned to a delivery method (G or I), agents systematically recruited participants within a month.
3. Stage one: Formative evaluation
We delineate the data collection methods, the evaluation design, and analyses.
3.1. Evaluating program implementation
Our goals were to assess: (a) participant characteristics relative to the prescribed target audience; (b) participant attention to, and use of, the lessons; (c) participant reaction to advertising, lesson content and structure, delivery method, and time between lessons and (d) agent reaction to delivering the program and its content.
To address goal (a), we included demographic questions in the first questionnaire administered. To address (b), we designed a response sheet for each lesson which asked participants how completely they had read the lesson, how easy it was to read, and how useful it was, and whether they completed the worksheet, tried suggestions or recipes, and shared lesson materials. To address (c), we developed focus group questions for participants, and, for (d), questions for agents attending a debriefing.
We conducted four focus groups among participants within a month of the intervention, two each for group-delivery and impersonal-delivery. Each focus group derived from a purposeful sample of thirty participants composed of two-thirds completers and one-third non-completers. The agent telephoned all selected and those who chose to attend became the sample. We held the debriefing with all agents within a month also. Data consisted of tape recordings and written notes.
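The composition of each focus group invitation pool described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the text says the sample was purposeful and does not say how individuals were chosen within each category, so the random draw, the function name `focus_group_pool`, and the participant IDs below are all hypothetical.

```python
import random

def focus_group_pool(completers, non_completers, size=30, seed=0):
    """Build a pool of `size` people: two-thirds completers, one-third non-completers.

    Assumption: people are drawn at random within each category. The paper
    only says the sample was purposeful, not how individuals were picked.
    """
    n_completers = (2 * size) // 3          # 20 of 30
    n_non_completers = size - n_completers  # 10 of 30
    rng = random.Random(seed)               # fixed seed so the draw is reproducible
    return (rng.sample(completers, n_completers)
            + rng.sample(non_completers, n_non_completers))

# Hypothetical participant IDs, for illustration only.
completers = [f"C{i:03d}" for i in range(60)]
non_completers = [f"N{i:03d}" for i in range(40)]
pool = focus_group_pool(completers, non_completers)
```

As in the text, those in the pool who are telephoned and choose to attend would form the actual focus group.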
3.2. Evaluating program impact
Our goal was to examine changes in knowledge, attitudes,
and behaviors (KAB) needed to prevent osteoporosis using
appropriate scales, changes in calcium intake using a food
frequency questionnaire (FFQ), and changes in exercise
pattern using specific questions. Our hypothesis was that persons
in group-delivery would exhibit greater changes in attitude
and behavior scores, calcium intake, and exercise pattern
than those in impersonal-delivery. We anticipated similar
changes in knowledge for both delivery methods because
the same lessons were used; the meeting focused primarily
on motivation.
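The hypothesis above can be restated in terms of change scores. The sketch below is a minimal illustration, not the authors' analysis (Section 3.4 describes the statistical tests they used): it computes each participant's change from T0 to T1 and the difference in mean change between delivery methods. The function names and all scores are hypothetical.

```python
from statistics import mean

def mean_change(pre, post):
    """Mean within-person change (T1 minus T0) for matched participants."""
    if len(pre) != len(post):
        raise ValueError("pre and post scores must be matched one-to-one")
    return mean(t1 - t0 for t0, t1 in zip(pre, post))

def change_difference(group_pre, group_post, impersonal_pre, impersonal_post):
    """Mean change in group-delivery (G) minus mean change in impersonal-delivery (I)."""
    return (mean_change(group_pre, group_post)
            - mean_change(impersonal_pre, impersonal_post))

# Hypothetical attitude scale scores (1-5) for three matched participants per method.
g_pre, g_post = [3.0, 3.5, 4.0], [3.5, 4.0, 4.0]
i_pre, i_post = [3.0, 3.0, 4.0], [3.5, 3.0, 4.0]
diff = change_difference(g_pre, g_post, i_pre, i_post)
```

Under the hypothesis, `diff` would be positive for attitude, behavior, calcium intake and exercise, and close to zero for knowledge. Whether an observed difference is statistically significant is a separate question that this sketch does not address.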
To assess changes, we developed the KAB scales using nutrition expert and target audience reviews and internal consistency testing with 65 of our target audience prior to use in Stage One. The final formative instrument contained a 20 item knowledge scale (KR-20 = 0.80); a 22 item attitude scale (α = 0.78); and a 16 item behavior scale (α = 0.75), all addressing concepts in the lessons.
We used a modified version of the Block food frequency questionnaire (Brown & Griebler, 1993) that included the four foods featured in the osteoporosis lessons to assess calorie and calcium intake. To examine exercise behavior directly, we asked participants if they exercised regularly within the last several months each time they completed the KAB scales; after the lessons we also asked if this exercise pattern was new, and, if new, if it was due to the lessons.
3.3. Formative evaluation design
We employed a pre-test (T0), 8 week intervention, post-test (T1) design to compare group-delivery (G) and impersonal-delivery (I) (Fig. 1). We arranged the 48 worksites in four blocks reflecting business types (white collar, educational/municipal, health care and blue collar) and assigned them randomly to either delivery. Although eleven worksites withdrew prior to the intervention, primarily due to company changes, the proportion of business types in each delivery method was unaffected.
Fig. 1. Model of formative and summative evaluation design.
Participants completed pre KAB and FFQ instruments at a meeting 1 week prior to receiving lesson one; the last lesson included post KAB and FFQ instruments, which participants returned at an optional post program meeting 1 week later or by mail according to Dillman (1978). Question order in each KAB scale differed at each measurement to diminish recall bias. Each lesson included the response sheet that participants returned prior to receiving the next lesson.
3.4. Formative evaluation data analyses
We used χ² analysis to compare categorical and ANOVA to compare continuous implementation data, between lessons and between delivery methods, from response sheets returned. We examined tape recording transcripts and focus group and debriefing notes for repeated themes (Krippendorff, 1980).
Data from those completing both KAB instruments were analyzed and scale scores determined allowing only one missing value. Each individual's knowledge score was the sum of correct answers. Each attitude and behavior statement required a response on a 5-point Likert scale. Each individual's attitude and behavior scale score was the mean of all their responses to those questions.
Data from those completing both FFQs were coded, entered, and analyzed using FFQ software (Health Habits and History Questionnaire, 1989). Because nutrient value distributions were not normal, the data were transformed using logₑ prior to statistical analysis (SAS Proprietary software, 1989).
Non-directional t-tests for independent samples were used to test significance of continuous and ordinal data (mean age, education, KAB scores and calcium) between delivery methods (G vs I) at each time point (T0, T1). Categorical demographic and exercise data were compared using χ² analysis. ANOVA for repeated measures and ANCOVA
were used to test significance of mean KAB scores and calcium intake of matched individuals across time. The covariates of mean income and employment status were used in testing changes in KAB scores. Significance was assumed at ≤0.05.
4. Key findings from the formative evaluation of the initial program
We outline program implementation findings for goals (a)–(d) and impact findings.
4.1. Program implementation
4.1.1. Goal (a): Target audience
Ultimately, 275/489 (56%) women completed post questionnaires that met analysis criteria. Completers and non-completers did not differ in demographic characteristics (data not shown). When comparing delivery methods, completers differed significantly only in two factors: percent employed full time (91.6% in G vs 81.6% in I) and percent of families with incomes over $35,000 (57.7% vs 42.4%).
4.1.2. Goal (b): Participants' attention to and use of lessons
Response sheets returned dropped over the four lessons; method G dropped from 81% of initial registrants for lesson one to 41% for lesson four and method I from 95 to 67%. Otherwise, the two delivery methods did not differ significantly in attention to, and use of, lessons.
Respondents who reported reading all lesson materials fell from 85% for lesson one to 62–64% for lesson four (Fig. 2). Regardless of delivery method, respondents rated all lessons, on a scale of 1–5, fairly easy to read (1.4 ± 0.6 where 1 = easy to read), and useful (hovering at 2.1 ± 0.8 where 1 is very useful). About 70% reported completing worksheet one, 28% worksheet two, 80% worksheet three, and 50% worksheet four.
Fig. 2. Percent reading all the lesson in each evaluation.
The response sheets assessed whether participants tried recipes, involved children in food preparation, and shared lesson materials and revealed no significant differences between delivery methods. Although 37% of method G tried lesson one recipes compared to 20% in method I, thereafter, percentages were lower and similar between delivery groups. Those involving children varied from 11% for lesson one to 2% for lesson four and those sharing recipes with friends between 16 and 22% (Fig. 3).
4.1.3. Goal (c): Participant reactions
Fifty women (27 from G and 23 from I) participated in the focus groups. Participants from both delivery methods were more likely to remember personal contacts and paycheck flyers than other advertisements. They recommended changes in lesson format, recipes, worksheets, and calcium-rich foods featured. Many found the lesson booklet cumbersome, the menus unhelpful, the worksheets in two lessons long, and some featured foods difficult to adopt. Some participants wanted the emphasis on drinking milk. They suggested including menus and microwave instructions in the recipes. With some exceptions, women reported it was difficult to involve children in food preparation or that their children were grown.
However, some feedback was unique to a delivery method. Group-delivery participants wanted more lecture, more question and answer time, and less motivational discussion. They could not recall voting to try a behavior (critical to the discussion–decision method) but liked the food tasting activity. Impersonal-delivery participants also wanted question and answer time and reminders to complete each lesson, but disagreed about the period between lessons.
Participants from both delivery methods revealed that they had limited time to try recipes and had not yet put learned health-promoting actions into practice. They
disliked the long KAB questionnaire and completing the second FFQ, only 2 months after the initial one, when they had not yet initiated changes in eating habits.
Fig. 3. Percent reporting sharing lesson recipe with friends.
4.1.4. Goal (d): Agent reactions
All agents participated in both delivery methods. They reported that the advertising materials did not clearly define the target audience and that in-person appeals and an enthusiastic site contact improved recruitment. Despite managing shifts, they preferred the interaction and participant interest in group-delivery and the opportunity for daytime programs. But agents using group-delivery resisted being motivators and asked to provide lectures, perceiving that participants wanted prescriptive advice. They felt the recipes needed improvement. Agents echoed the lack of emphasis on drinking milk, a political issue in counties with a dairy industry.
4.2. Program impact
As hoped, changes over time for KAB were significant. As expected, the hypothesis that changes in knowledge would not differ by delivery method was supported. Unexpectedly, the hypothesis that those in group-delivery would show greater gains in attitude, behavior, calcium intake, and exercise pattern than those in impersonal-delivery was not supported. For the KAB measures, time by delivery method interaction was not
significant (Figs. 4–6). Group delivery did not affect knowledge, attitude, or behavior scores any more than impersonal delivery. Changes in calcium intake and exercise pattern were not significantly different between delivery groups (data not shown).
Fig. 4. Change in knowledge score over time.
Fig. 5. Change in attitude score over time.
Fig. 6. Change in behavior score over time.
5. Stage two: The revised program and summative evaluation
The changes made in stage two in the program content, recruitment, delivery method, and evaluation design and instruments for the summative evaluation are shown in Table 1. Almost all reflect key findings of the stage one formative evaluation.
Table 1
Major changes in educational program, evaluation design and evaluation instruments prompted by results of the formative evaluation
Type of change | From | To
Program lesson content
Layout of each lesson | Booklet | Folder with pull-out fact sheets
Calcium rich foods | Emphasize four non-traditional foods | Emphasize fluid milk and four non-traditional foods
Worksheets | Lesson 1: 7 day exercise diary | Lesson 1: 3 day exercise diary
 | Lesson 4: long contract to make one behavior change | Lesson 4: short contract to make one behavior change
Fact sheet: food activities for children | Suggestions to involve children in food activities | Retain and give added emphasis in lessons and group meeting
Recipes | Six per lesson with conventional instructions | Keep four most popular, but add microwave instructions and menu suggestions; emphasize testing on weekends
Reading level | 12th grade | 8th grade
Program recruitment
Recruitment of worksites | Work force must have a high percentage of working women | Target blue collar worksites; work force must have a high percentage of working mothers
Advertising for target audience | Print material and in-person recruitment | Emphasis on in-person recruitment; clarify target audience in all recruitment material
Program delivery
Delivery method | Group: motivational discussion about overcoming barriers to suggested behaviors ending with group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements] plus food tasting | Group: lecture stressing 2–3 main points of lesson followed by pep talk about suggested behavior followed by group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements] plus food tasting with revised recipes
 | Impersonal: pass out lessons on schedule | Impersonal: pass out lessons on schedule
Evaluation design
Intervention design | Comparison of two delivery methods | Comparison of two delivery methods with a control
 | Pre–post measures: T0 & T1 (KAB and FFQ) | Pre–post–4 month post measures: T0, T1 & T2 (KAB); T0 & T2 (FFQ)
 | Response sheet in each lesson; no incentive to return | Response sheet in each lesson; provide incentive for return
Evaluation instruments
Impact instrument scales | KAB questionnaire | KAB questionnaire
 | 20 knowledge questions, KR-20 = 0.80 | 14 behavior questions, KR-20 = 0.725
 | 22 attitude questions, α = 0.78 | 16 attitude questions, α = 0.80
 | 16 behavior questions, α = 0.75 | 14 behavior questions, α = 0.80
Response sheets | Try any suggestion for child activity: responses yes, no | For both questions, add the response: no, but plan to
 | Try any recipe: responses yes, no |
5.1. Revised program lesson content and recruitment
We changed the lesson content to address the concerns outlined above. We asked six county agents, representing three rural and three suburban counties, to recruit four worksites each, a total of 24. We clarified the target audience in advertising materials and directed agents toward in-person recruiting. We lowered the lessons' reading level to accommodate participants from more blue collar worksites where mothers, ages 21–45, were a significant part of the work force to ensure enrolling more working women with young children.
5.2. Revised program delivery method
The initial program implementation and impact data indicated the group delivery method did not affect attitudes and behaviors, possibly because agents were uncomfortable and did not conduct the meeting according to directions. To rectify this, using Pelz (1959), six agents designed four new 30 min meeting scripts that included two to three main points, retained the food tasting (with new recipes), and eliminated the motivational discussion. A suggested action was still promoted at the end of the meeting and a group vote taken on adoption. Agents were trained to use these scripts and distributed the lessons biweekly.
5.3. Summative evaluation design
Participants' comments and the poor formative completion rate led us to use a pre (T0), immediate post (T1), and 4 month post (T2) summative evaluation design (Fig. 1). We asked participants to complete the KAB instrument at all time points, but the FFQ only at T0 and T2, a 6 month interval, expecting the T2 measure would detect changes which initial program participants claimed took time to implement.
To improve our ability to detect changes, we compared three intervention groups (two experimental, group-delivery and impersonal-delivery, and one control). The controls received four correspondence lessons addressing cancer prevention, identical in design to the osteoporosis lessons the experimental groups received. The osteoporosis and cancer lessons differed only in diet–disease context, beneficial nutrients and foods, and emphasis on exercise
in the osteoporosis lessons. In sum, those in group-delivery received the modified group meeting and osteoporosis lessons; those in impersonal-delivery only the osteoporosis lessons, and the controls only the cancer lessons.

We divided the 24 worksites into five blocks reflecting relative pay scale and type of worker. These were assigned purposefully to the three intervention groups such that there was an equal representation of all five blocks in the two experimental groups while the controls lacked representation from one of two lower pay blocks. Three companies withdrew prior to recruitment.

All participants completed pre-test instruments at a meeting 1 week prior to receiving the first lesson. The post-test KAB instrument, distributed with the last lesson, was collected at a concluding meeting 2 weeks later. Three months later the final instruments were distributed to all participants by the agent or by mail using a modified Dillman Method.

5.4. Evaluating revised program implementation

To assess demographics, we included questions in the pre-test instrument of all three intervention groups. To assess attention to, and use of, the lessons, we included a response sheet in each lesson for the two experimental groups only. We added a third possible response (no, but I plan to) to questions about children's activities or recipes to capture behavioral intention.

5.5. Evaluating revised program impact

As in the formative evaluation, we hypothesized that those in group-delivery would exhibit greater changes than those in impersonal-delivery in attitude and behavior scores, exercise pattern, and calcium intake. In addition, we hypothesized that: (a) both experimental groups would exhibit greater changes in knowledge than controls and (b) those in impersonal-delivery would exhibit greater changes than controls in attitude and behavior scores, exercise pattern, and calcium intake.

Based on formative participant comments, we shortened the KAB scales to reduce participant burden, hoping to improve completion. Using the formative instruments completed by the 677 women registrants for stage one (mean age 43.34 ± 11.58), we used internal consistency testing to eliminate less discriminating items, producing the scales in Table 1. Items retained in the KAB instrument (76% of the original) represented all content areas in the formative, improving αs for two scales. However, no new questions were added. Neither the question about exercise regularity nor the FFQ was changed for the summative evaluation.

5.6. Summative evaluation data analyses

In contrast to the formative, implementation data of those completing all four response sheets were used, providing a more accurate assessment of the responses of those completing the program.

Impact analysis methods were similar to those used in the formative analyses with these modifications: (a) we used only data of those completing all three KAB or FFQ instruments; (b) we allowed up to two missing answers on the knowledge scale; (c) we tested the significance of continuous and ordinal data among the three delivery groups at three time points (T0, T1, T2) and (d) age served as the covariate for ANOVA and ANCOVA. We determined statistically significant differences among values at time points using pair-wise tests of differences between least-squares means. A Bonferroni adjustment was used to control the overall error rate. Significance was assumed at p ≤ 0.05. Finally, we compared categorical and continuous demographic characteristics (mean age and education) between the formative and the summative evaluation completers using χ² analysis and non-directional t-tests.

6. Summative evaluation findings and comparison with the formative

First, we examine the implementation findings. Then we examine impacts over time comparing the results to the control, looking at differences between the two delivery methods. In each instance we compare the summative findings with those of the formative.

6.1. Program implementation

6.1.1. Goal (a): Target audience

Completion rates were better and participant demographics were closer to those desired in the summative compared to the formative. In the summative, 70% of initial registrants completed all three KAB measures. Almost 90% completed the KAB instruments at T0 and T1, in contrast to 56% in the formative. Eighty percent completed both FFQ instruments in the summative compared to 51% in the formative. Table 2 lists the demographics of completers in both evaluations; they were similar in family income, race, marital status, awareness of relatives with osteoporosis, initial exercise pattern, and calcium intake per 1000 calories. However, those completing the summative evaluation were significantly more likely than those in the formative to be younger, have only a high school education, work full time, and have at least one child at home.

In the summative, the two experimental groups and controls differed significantly only in age (data not shown). Those in group-delivery had a mean age of 39.1 ± 9.95 compared to 40.6 ± 10.10 in impersonal-delivery and 40.9 ± 9.61 in the control.

6.1.2. Goal (b): Participant's attention to, and use of, lessons

In the summative evaluation, 78% of group-delivery and 63% of impersonal-delivery registrants returned all four
response sheets, providing the sample for analysis. However, formative and summative return rates are not comparable because we did not restrict the formative sample to those completing all four.

In both evaluations, we asked participants if they read all, parts of, skimmed, or did not read each lesson. Similar to the formative, those reading the whole lesson declined to 65–70% by lesson four. Unlike the formative, where there was no difference between delivery groups in percent reading the lessons, in the summative, significantly more in group-delivery skimmed and fewer read lessons one and two than in impersonal-delivery (Fig. 2).

Completion rates of the worksheets did not differ between evaluations with one exception. More completed worksheet two, a revised exercise diary, in the summative than in the formative (40 vs 28%, respectively).

In both evaluations, respondents were asked how easy to read and how useful each lesson was. Respondents in both evaluations provided nearly identical ratings of ease of reading regardless of delivery method, suggesting the lower reading level of the summative materials made it easier for the less educated participants. Respondents in both evaluations provided nearly identical ratings of perceived usefulness for lessons one and two; however, in the summative, group-delivery participants rated lessons three and four, highlighting canned salmon and tofu respectively, significantly more useful than did impersonal-delivery participants, whereas in the formative, the ratings were identical.

Questions about materials use revealed that the summative results differed from the formative, regardless of the delivery method, in that:

• 3–10% tried a recipe (data not shown) compared to 10–37% in the formative. Yet, in the summative, two thirds or more in both delivery groups indicated they planned to try a recipe, an option not available in the formative.
• 11–21% involved children in food preparation compared to 2–11% in the formative. Additionally, in contrast to the formative, the summative exposed differences among delivery groups, in that the percent sharing a recipe with friends was significantly greater in group-delivery than impersonal-delivery for lessons two (20 and 6% respectively) and four (24 and 10%) while no significant differences between delivery methods were evident in the formative (Fig. 3). Lessons two and four featured less familiar calcium-rich foods.
• Consistently more in group-delivery shared other lesson materials with friends than in impersonal-delivery (16–22% vs 9–11%). This distinction was not seen in the formative where about 15% in both delivery methods shared materials (data not shown).

Table 2
Demographic characteristics of those completing each evaluation phase. SD = standard deviation; yr. = years

Variable                                      Formative                 Summative
Mean age ± SD (a)                             N = 275, 43.02 ± 11.12    N = 247, 40.15 ± 9.88
Family income                                 N = 255                   N = 232
  ≤ 10–19.9000                                41 (16.1%)                45 (19.4%)
  20–34.9000                                  92 (36.1%)                83 (35.8%)
  35–50,000+                                  122 (47.8%)               104 (44.8%)
Employment status (a)                         N = 272                   N = 245
  % full time                                 235 (86.4%)               232 (94.7%)
Mean educational level ± SD (yr.) (b)         N = 268, 13.40 ± 1.97     N = 246, 12.70 ± 1.54
Race                                          N = 272                   N = 246
  % white                                     260 (95.6%)               239 (97.2%)
Marital status                                N = 272                   N = 246
  Married                                     200 (73.5%)               175 (71.1%)
  Single                                      40 (14.7%)                27 (11.0%)
  Other                                       32 (11.7%)                44 (17.9%)
Percent with at least one child at home (b)   N = 274, 129 (47.1%)      N = 247, 163 (66.0%)
Relatives with osteoporosis                   N = 246                   N = 247
  Yes                                         49 (19.9%)                35 (14.2%)
  No                                          167 (67.9%)               166 (67.2%)
  Don't know                                  30 (12.2%)                46 (18.6%)
Exercise regularly in past 6 months?          N = 269                   N = 246
  Yes                                         94 (34.9%)                90 (36.6%)
  No                                          175 (65.1%)               156 (63.4%)
                                              N = 248                   N = 244
Calories (mean ± SD)                          1623.8 ± 584.4            1798.0 ± 711.2
Calcium in mg                                 805.4 ± 415.5             895.3 ± 549.2
Calcium in mg/1000 calories                   497.1 ± 178.9             490.2 ± 206.3

(a) p < 0.01.
(b) p < 0.001.
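The comparison of formative and summative completers reported in Table 2 (χ² analysis and non-directional t-tests, Section 5.6) can be roughly retraced from the summary statistics printed in the table. The Python sketch below is only an illustration, not the authors' analysis. Its function names are hypothetical, and it rests on three assumptions: the t statistic uses the unequal-variance (Welch) form, the p-value for t uses a large-sample normal approximation, and the χ² statistic omits a continuity correction. The paper does not state which variants were used.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic from summary statistics (unequal variances assumed)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def two_sided_p_normal(statistic):
    """Two-sided p-value, using the standard normal as a large-sample approximation."""
    return math.erfc(abs(statistic) / math.sqrt(2))


def chi2_2x2(count1, n1, count2, n2):
    """Pearson chi-square (1 df, no continuity correction) comparing two proportions."""
    a, b = count1, n1 - count1
    c, d = count2, n2 - count2
    n = n1 + n2
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Summary values copied from Table 2 (formative first, summative second).
t_age = welch_t(43.02, 11.12, 275, 40.15, 9.88, 247)
p_age = two_sided_p_normal(t_age)

t_edu = welch_t(13.40, 1.97, 268, 12.70, 1.54, 246)
p_edu = two_sided_p_normal(t_edu)

chi2_child, p_child = chi2_2x2(129, 274, 163, 247)  # at least one child at home
chi2_work, p_work = chi2_2x2(235, 272, 232, 245)    # employed full time

results = [
    ("Mean age (t)", t_age, p_age),
    ("Mean education (t)", t_edu, p_edu),
    ("Child at home (chi-square)", chi2_child, p_child),
    ("Full-time employment (chi-square)", chi2_work, p_work),
]
for label, stat, p in results:
    print(f"{label}: statistic = {stat:.2f}, p = {p:.4g}")
```

Under these assumptions, the four p-values fall below the thresholds given in the footnotes of Table 2 (p < 0.01 for age and employment status, p < 0.001 for education and children at home). Exact values would shift slightly with a pooled-variance t-test or a continuity-corrected χ².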
6.2. Program impact

6.2.1. Change in knowledge

As expected, our hypothesis that both group-delivery and impersonal-delivery of osteoporosis lessons would lead to greater knowledge gain than in controls was supported (Fig. 4). Indeed, the gain in knowledge in group-delivery was significantly greater than that in impersonal-delivery and both were significantly greater than that of the controls at both T1 and T2. This was not seen in the formative evaluation.

6.2.2. Change in attitude

Our hypothesis that those in the experimental groups would show greater gains in attitude than controls was not supported in the summative evaluation (Fig. 5). Our hypothesis that gains in attitude would differ between delivery methods was not supported either. The initial mean attitude scores in the summative were significantly higher than those in the formative (3.9–4.0 vs 3.0–3.1, respectively), perhaps due to increased media focus on osteoporosis or to differences in participants, and could have limited our ability to improve attitudes and thus to detect significant changes. We did not detect the influence of any previous worksite educational activities, however.

6.2.3. Change in behavior

Our hypothesis that experimental groups would show greater gains in behavior than controls was partially supported in the summative (Fig. 6). Gains in behavior for group-delivery were significantly greater than the control at T2. Administering the behavior scale 4 months after the intervention (T2) identified further changes in participant behavior in group-delivery that were not seen in controls, even when controlling for age, a possible explanation. Our hypothesis that those in group-delivery would show greater gains in food-related behavior than those in the impersonal-delivery was not supported in the summative evaluation.

6.2.4. Changes in calcium intake and exercise pattern

Our hypothesis that those in experimental groups would show greater gains in calcium intake than controls was not supported in the summative. Our hypothesis that those in group-delivery would show greater gains than those in impersonal-delivery was also not supported. The summative delivery methods did not have a greater differential impact than those of the formative.

Our hypothesis that experimental groups would report greater change in exercise patterns than controls was partially supported (data not shown). In the summative evaluation, the number at T2 who reported that exercising regularly was a new pattern was significantly greater in group-delivery than in controls. Our hypothesis that exercise patterns would differ between delivery methods was not supported. The summative delivery methods did not have a greater impact than those of the formative although the difference was in the right direction.

7. Discussion

The purpose of this study was to test whether the changes made in a program as a result of a formative evaluation strengthened the implementation and impact of the program in the summative. We found that implementation improved but only certain measures of impact improved enough to distinguish the effects of each delivery method. As a result we suggest the following for evaluators to consider in designing a formative evaluation and in attempting to assess its effectiveness.

7.1. Improving the design of a formative evaluation

7.1.1. Interpreting (focus group) feedback from participants

As a result of participants' comments in the formative focus groups, we extensively altered the lessons for the summative and this did lead to greater use of a formerly underused worksheet, continued ease of reading with participants with lower educational levels, and more involvement of children in food preparation.

Focus group participants wanted less motivational discussion and more lecture in the meetings. In response, we increased lecture time in the group-delivery method. The influence of the group-delivery method was underscored because group-delivery participants were more likely to skim than to read the lessons. Even with less reliance on reading, group-delivery participants found two lessons significantly more useful and were more likely to share lesson materials with others than those in impersonal-delivery. The agent presentation in group-delivery was clearly critical to sell the lessons. In this case, following participant recommendations appeared to improve impact on knowledge gained, but not in most areas of behavior change.

When the focus group participants indicated they did not like the group-delivery motivational discussion–decision session, they also indicated a preferred alternative. Given the extensive literature on assessing participant perspectives (Basch, 1987), we assumed we needed to alter the delivery method to one they liked (especially since the agents echoed this preference) and ultimately emphasized lecture over motivation in the summative. In hindsight, we should have asked the focus group participants how to change the motivational aspects of the session with its emphasis on behavior modification, rather than abandon this based on their negative feedback.

A researcher using focus groups should not assume that, if participants express dislike of a particular delivery method and suggest another, one should drop the original method without considering the downstream effects. Be prepared to probe to learn why it was disliked and how it might be modified, especially if participants are suggesting a more passive path to learning. By making the assumption
participants were right, we limited focus group inquiry and the type of information collected for program planning in the summative.

We revised the recipes, believing these could be used as the context to show participants how to use calcium-rich foods and as a device to facilitate behavior change. In the focus groups, we listened openly to complaints about the recipes and asked participants how to improve them. Some participants indicated they came to the program for the recipes but disliked those provided. Despite these participant-guided revisions, reported use of recipes did not improve in the summative, although a large number indicated they planned to.

In the formative, we assumed that when a component of the program, i.e., the recipes, was poorly received by participants, it merely needed improvement, and thus we specifically asked participants for suggestions. In hindsight, the question we should have asked first, given the complaints, was one that tested our assumption that participants valued recipes, i.e., should recipes be included in the lessons at all? And, if the answer was no, we should have been ready to discuss with focus group participants alternative devices to motivate behavior change.

Researchers designing formative evaluations need to be alert for such a methodological inconsistency: why did participants give us suggestions to improve the recipes when, in fact, many had not used these (10–30% in the formative)? This might be explained by the inclination of people to provide answers to questions when they are asked. People have a tendency to tell more than they can know (Nisbett & Wilson, 1977).

In summary, researchers using focus groups must be prepared to probe assertions by participants that some component of a program is unsatisfactory. Probing should investigate both how that component might be improved and retained as well as what might be substituted and why. In particular, asking a question about the fundamental usefulness of some component of a program in a formative evaluation may not be easy for researchers as they may have considerable ownership and resources invested in that component. However, when faced with negative feedback about a component of the initial program, researchers should investigate both options to ensure sufficient data for an informed decision about summative activities.

7.1.2. Assessing barriers to changing behaviors

Because participants came to this program on calcium-rich foods, we assumed that they would be open to change and that testing the suggested foods at family meals would be acceptable. Some focus group participants indicated it was difficult to introduce these foods because of family member aversion to change. On hearing this, we assumed that altered recipes would be sufficient to overcome opposition and explored this in all focus groups. By not making a conscious effort to uncover social barriers to changing food choices for families, we limited the information we gained to plan the summative. In hindsight, we should have taken a proactive stance once these unanticipated barriers surfaced in the initial focus group, and posed questions to later focus groups based on the comments offered in previous ones.

7.1.3. Interpreting feedback from program instructors

As a result of the formative, we allowed instructors to change the presentation in the group delivery method, believing they would have greater ownership, and thus impact, if they designed a presentation with which they were more comfortable. The remedy the instructors developed, a lecture rather than a motivational discussion, led to an emphasis on knowledge rather than on behavior change in the presentation. This may partially explain why the two delivery methods did not differ significantly in final behavior scale scores. And it may also explain the significantly greater gain in knowledge in the group- than the impersonal-delivery.

The formative evaluation provided feedback about the presentation. We assumed that when a component of the program, i.e., the presentation, was unacceptable to instructors, it should be changed, and we asked for suggestions. The acceptable change did not lead to greater impact. Although instructor suggestions carry great face validity, researchers need to be wary because instructors may provide suggestions that shift the aims of the program. In hindsight, what we should have done was to propose to the instructors alternative presentation methods that retained an emphasis on behavior change. If none were acceptable, we should have queried our assumption that agents were the appropriate instructors for the group presentation.

7.1.4. Incorporating a control group

We included impact measures in the formative to gain a quantitative estimate of the effects of the initial program but we did not include a control group because we expected the contrast between group- and impersonal-delivery to be robust. Due to instructor difficulties with the stage one group-delivery, this contrast did not materialize. Although we had evidence that participants were learning from the print materials in both delivery methods, we could not demonstrate that the change seen was better than with no intervention. Hence we suggest that evaluators include a control group when testing the impact of a new program offered via different delivery methods in a formative evaluation.

7.1.5. Watching for serendipitous effects of formative evaluation

Including impact measures led to unexpected participant feedback about the instruments and their mode of administration. This feedback proved invaluable in improving program implementation in the summative, underscoring the usefulness of impact measures in formative evaluations. In hindsight, we recommend including these measures in the
formative to assess impact as well as to obtain feedback about the instruments and their implementation.

7.2. Assessing the effectiveness of the formative evaluation

The need exists to demonstrate the subsequent effects of formative evaluation in order to improve formative evaluation design. Our study provides some guidance.

7.2.1. Altering the evaluation design

As a result of the formative, we added a control group and a 3-month post-intervention measurement in the summative. Without either of these, we would not have been able to observe some significant differences between the two delivery methods in the summative. Three months after the intervention, group delivery produced significantly greater changes in behavior scores and exercise pattern than seen in the controls while the impersonal-delivery method did not. This design change also revealed that the group method led to significantly greater knowledge gain than the impersonal method and both gains were greater than that of controls. Clearly our target audience was not likely to adopt behavior changes based on receiving just the printed materials.

However beneficial these changes in evaluation design were in illuminating important findings about the impact of the final program, implementing them only in stage two prevented a rigorous comparison between formative and summative evaluations which might have allowed us to see more clearly the effect of the formative evaluation. Lack of the T2 measure in the formative meant we could not be sure what effect the initial program had 3 months later and lack of a control group meant we could not fully interpret the formative impact data. In future, we recommend that if researchers want to assess the effectiveness of a formative evaluation, the design elements of formative and summative steps be parallel. These design details may seem more appropriate for a summative evaluation but they are necessary to see the effects of the formative.

7.2.2. Altering the evaluation instruments

As a result of the formative, we shortened the KAB questionnaire somewhat to address participant complaints in the summative. We believe this contributed to more participants completing all the evaluation instruments in stage two. Thus implementation improved in the summative. However, this improvement carried a price. Subsequently, we were not able to conduct the most rigorous comparison of the formative KAB results to those of the summative.

8. Conclusions

The literature on formative evaluation focuses on its conceptual framework, its methodology and use. Permeating this work is a consensus that a program will be strengthened as a result of a formative evaluation although little empirical evidence exists in the literature to demonstrate the subsequent effects of formative evaluation on a program. This study begins to fill that gap.

This study demonstrates that formative evaluation can strengthen the implementation and impact of an educational intervention designed to compare the impact of two program delivery methods. The modifications made as a result of the formative appear to have significantly improved knowledge gained but resulted in only modest improvements in behaviors in the final program.

Our retrospective analysis of our experience supports the inclusion of a control group and impact measures at several time points in a formative evaluation of a program implemented in a new environment for agency personnel. The findings also suggest that when researchers are faced with negative feedback about the components of the program in a formative evaluation, they need to exercise care in interpreting feedback and in revising those components.

In addition, the need to gather evidence of the subsequent effect of using data from formative evaluations is critical. Otherwise we cannot begin to examine whether the methods and processes we take for granted in the formative evaluation are valid and appropriate. However, evaluators must carefully plan the evaluations, making sure that instruments and evaluation design are parallel in order to carry out these comparisons. We offer our findings as stimulus to consider such studies.

References

Baggaley, J. (1986). Formative evaluation of educational television. Canadian Journal of Educational Communication, 15 (1), 29–43.
Baghdadi, A. A. (1981). A comparison between two formative evaluation methods. Dissertation Abstracts International, 41 (8), 3387A.
Baker, E. L., & Alkin, M. C. (1973). Formative evaluation of instructional development. AV Communication Review, 21 (4), 389–418.
Basch, C. E. (1987). Focus group interview: An underutilized research technique for improving theory and practice in health education. Health Education Quarterly, 14 (4), 411–448.
Bertrand, J. (1978). Communications pretesting (Media Monograph No. Six). Chicago: University of Chicago, Community and Family Study Center.
Brown, J. L., & Griebler, R. (1993). Reliability of a short and long version of the Block food frequency form for assessing changes in calcium intake. Journal of the American Dietetic Association, 93 (7), 784–789.
Cambre, M. (1981). Historical overview of formative evaluation of instructional media products. Educational Communication & Technology Journal, 29 (1), 3–25.
Cardinal, B. J., & Sachs, M. L. (1995). Prospective analysis of stage-of-exercise movement following mail-delivered, self-instructional exercise packets. American Journal of Health Promotion, 9 (6), 430–432.
Chambers, F. (1994). Removing confusion about formative and summative evaluation: Purpose versus time. Evaluation and Program Planning, 17 (1), 9–12.
Chen, H. T. (1996). A comprehensive typology for program evaluation. Evaluation Practice, 17 (2), 121–130.
Chen, C. H., & Brown, S. W. (1994). The impact of feedback during interactive video instruction. International Journal of Instructional Media, 21 (3), 191–197.
Dehar, M. A., Casswell, S., & Duignan, P. (1993). Formative and process evaluation of health promotion and disease prevention programs. Evaluation Review Journal, 17 (2), 204–220.
Dick, W. (1980). Formative evaluation in instructional development. Journal of Instructional Development, 3 (3), 2–6.
Dick, W., & Carey, L. (1985). The systematic design of instruction (2nd ed), Glenview, IL: Scott, Foresman.
Dillman, D. A. (1978). Mail and telephone surveys: The total design method, New York: John Wiley and Sons.
Dennis, M. L., Fetterman, D. M., & Sechrest, L. (1994). Integrating qualitative and quantitative evaluation methods in substance abuse research. Evaluation and Program Planning, 17 (4), 419–427.
Fairweather, G., & Tornatzky, L. G. (1977). Experimental methods for social policy research, New York: Pergamon.
Finnegan Jr, J. R., Rooney, B., Viswanath, K., Elmer, P., Graves, K., Baxter, J., Hertog, J., Mullis, R., & Potter, J. (1992). Process evaluation of a home-based program to reduce diet-related cancer risk. The 'WIN at Home Series'. Health Education Quarterly, 19 (2), 233–248.
Fitz-Gibbon, C. T., & Morris, L. L. (1978). How to design a program evaluation, Beverly Hills, CA: Sage Publications.
Flagg, B. N. (1990). Formative evaluation for educational technologies, Hillsdale, NJ: Lawrence Erlbaum Associates.
Flay, B. R. (1986). Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventative Medicine, 15, 451–474.
Foshee, V., McLeroy, K. R., Sumner, S. K., & Bibeau, D. L. (1986). Evaluation of worksite weight loss programs: A review of data and issues. Journal of Nutrition Education, 18 (1), S38–S43.
Geis, G. L. (1987). Formative evaluation: Developmental testing and expert review. Performance & Instruction, May/June, 1–8.
Gillespie, A., & Achterberg, C. (1989). Comparison of family interaction patterns related to food and nutrition. Journal of the American Dietetic Association, 89 (4), 509–512.
Glanz, K., & Seewald-Klein, T. (1986). Nutrition at the worksite: An overview. Journal of Nutrition Education, 18 (1), S1–S12.
Glanz, K., Sorensen, G., & Farmer, A. (1996). The health impact of worksite nutrition and cholesterol intervention programs. American Journal of Health Promotion, 10 (6), 453–470.
Hausman, J. A., & Wise, D. A. (1985). Social experimentation, Chicago: The University of Chicago Press.
Hill, M., May, J., Coppolo, D., & Jenkins, P. (1993). Long term effectiveness of a respiratory awareness program for farmers. National Institute for Farm Safety, Inc. NIFS Paper No. 93-3. Columbia, MO. NIFS Summer Meeting, Coeur d'Alene, Idaho.
Health Habits And History Questionnaire: Diet History And Other Risk Factors (1989). Personal computer system packet. Version 2.2. Washington, D.C.: National Cancer Institute, Division of Cancer Prevention and Control, National Institutes of Health.
Houts, S. A. (1988). Lactose intolerance. Food Technology, 42 (3), 110–113.
Iszler, J., Crockett, S., Lytle, L., Elmer, P., Finnegan, J., Luepker, R., & Laing, B. (1995). Formative evaluation for planning a nutrition intervention: Results from focus groups. Journal of Nutrition Education, 27 (3), 127–132.
Jacobs Jr, D. R., Luepker, R. V., Mittelmark, M. B., Folsom, A. R., Pirie, P. L., Mascioli, S. R., Hannan, P. J., Pechacek, T. F., Bracht, N. F., Carlaw, R. W., Kline, F. G., & Blackburn, H. (1986). Community-wide prevention strategies: Evaluation design of the Minnesota heart health program. Journal of Chronic Disease, 39 (10), 775–788.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11 (1), 1–47.
Johnson, C. C., Osganian, S. K., Budman, S. B., Lytle, L. A., Barrera, E. P., Bonura, S. R., Wu, M. C., & Nader, P. R. (1994). CATCH: Family process evaluation in a multicenter trial. Health Education Quarterly, Supplement, 2, S91–S106.
Kandaswamy, S., Stolovitch, H. D., & Thiagarajan, S. (1976). Learner verification and revision: An experimental comparison of two methods. AV Communication Review, 24 (3), 316–328.
Kaufman, R. (1980). A formative evaluation of formative evaluation: The state of the art concept. Journal of Instructional Development, 3 (3), 1–2.
Kershaw, D., & Fair, J. (1976). The New Jersey income maintenance experiment, New York: Academic Press.
Kishchuk, N., Peters, C., Towers, A. M., Sylvestre, M., Bourgault, C., & Richard, L. (1994). Formative and effectiveness evaluation of a worksite program promoting healthy alcohol consumption. American Journal of Health Promotion, 8 (5), 353–362.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology, Beverly Hills, CA: Sage.
Lenihan, K. (1976). Opening the second gate, Washington, DC: U.S. Government Printing Services.
Lewin, K. (1943). Forces behind food habits and methods of change. In The Problem of Changing Food Habits. National Research Council Bulletin 108. (pp. 35–65). Washington, D.C.: National Academy of Sciences.
Markle, S. M. (1979). Evaluating instructional programs: How much is enough? NSPI Journal, Feb, 22–24.
Markle, S. M. (1989). The ancient history of formative evaluation. Performance and Instruction, Aug, 27–29.
McGraw, S. A., McKinley, S. A., McClements, L., Lasater, T. M., Assaf, A., & Carleton, R. A. (1989). Methods in program evaluation: The process evaluation system of the Pawtucket Heart Health Program. Evaluation Review, 13 (5), 459–483.
McGraw, S. A., Stone, E. J., Osganian, S. K., Elder, J. P., Johnson, C. C., Parcel, G. S., Webber, L. S., & Luepker, R. V. (1994). Design of process evaluation within the child and adolescent trial for cardiovascular health (CATCH). Health Education Quarterly, Supplement, 2, S5–S26.
Montague, W. E., Ellis, J. A., & Wulfeck, W. H. (1983). Instructional quality inventory: A formative evaluation tool for instructional development. Performance and Instruction Journal, 22 (5), 11–14.
Nathenson, M. B., & Henderson, E. S. (1980). Using student feedback to improve learning materials, London: Croom Helm.
National Institutes of Health (1985). Surgeon general's report on nutrition and health. U.S. Department of Health and Human Services, Public Health Service (Chapter 7, pp. 311–343). Washington, DC: U.S. Government Printing Service.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84 (3), May.
Parrott, R., Steiner, C., & Godenhar, L. (1996). Georgia's harvesting healthy habits: A formative evaluation. The Journal of Rural Health, 12 (4), 291–300.
Patton, M. Q. (1978). Utilization-focused evaluation, Beverly Hills, CA: Sage.
Patton, M. Q. (1982). Practical evaluation, Beverly Hills, CA: Sage.
Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15 (3), 311–319.
Patton, M. Q. (1996). A world larger than formative and summative. Evaluation Practice, 17 (2), 131–144.
Pelletier, K. R. (1996). A review and analysis of the health and cost-effective outcome studies of comprehensive health promotion and disease prevention programs at the worksite: 1993–1995 update. American Journal of Health Promotion, 10 (5), 380–388.
Pelz, E. B. (1959). Some factors in group decision. In E. E. Macoby, T. M. Newcomb & E. L. Hartley, Readings in social psychology (3rd ed). (pp. 212–219). New York: Holt, Rinehart and Winston, Inc.
Peterson, K. A., & Bickman, L. (1988). Program personnel: The missing ingredient in describing the program environment. In J. Kendon, Conrad Roberts-Gray & Cynthia Roberts-Gray, Evaluating program environments, San Francisco, CA: Jossey-Bass, Inc.
Potter, J. D., Graves, K. L., Finnegan, J. R., Mullis, R. M., Baxter, J. S., Crockett, S., Elmer, P. J., Gloeb, B. D., Hall, N. J., Hertog, J., Pirie, P., Richardson, S. L., Rooney, B., Slavin, J., Snyder, M. P., Splett, P., &
15. J.L. Brown, N.E. Kiernan / Evaluation and Program Planning 24 (2001) 129±143 143
Viswanath, K. (1990). The cancer and diet intervention project: a Seidel, R. E. (1993). Notes from the ®eld in communication for child survi-
community-based intervention to reduce nutrition-related risk of val, Washington, DC: USAID.
cancer. Health Education Research, 5 (4), 489±503. Stuf¯ebeam, D. L. (1983). The CIPP model for program evaluation. In G.
Rightwriter (1990). Version 3.1. Sarasota, FL: RightSoft, Inc. Madaus, M. Scriven & D. Stuf¯ebeam, Evaluation models: Viewpoints
Robins, P. K., Spiegelman, R. G., Weiner, S., & Bell, J. G. (1980). A on educational and human services evaluationBoston: Kluwer-Nijhoff.
guaranteed annual income: Evidence from a social experiment, New Tessmer, M. (1993). Planning and conducting formative evaluations,
York: Academic Press. London: Kogan Page.
Rossi, P. H., & Lyall, K. (1976). Reforming public welfare, New York: Thiagarajan, S. (1991). Formative evaluation in performance technology.
Russell Sage. Performance Improvement Quarterly, 4 (2), 22±34.
Rossi, P. H., & Freeman, H. E. (1982). Evaluation: A systematic approach Wager, J. C. (1983). One-to-one and small group formative evaluation: An
(p. 69). Beverly Hills, CA: Sage Publications. examination of two basic formative evaluation procedures. Perfor-
Russell, J. D., & Blake, B. L. (1988). Formative and summative evaluation mance and Instruction, 22 (5), 5±7.
of instructional products and learners. Educational Technology, 28 (9), Walden, O. (1989). The relationship of dietary and supplemental calcium
22±28. intake to bone loss and osteoporosis. Journal of the American Dietetic
SAS Proprietary Software Release 6.09. (1989). Cary, N.C.: SAS Institute, Association, 89 (3), 397±400.
Inc. Weston, C. B. (1986). Formative evaluation of instructional materials: An
Scanlon, E. (1981). Evaluating the effectiveness of distance learning: A overview of approaches. Canadian Journal of Educational Communi-
case study. In F. Percival & H. Ellington, Aspects of educational tech- cation, 15 (1), 5±17.
nology: Vol. XV: Distance learning and evaluation (pp. 164±171). Weston, C. B. (1987). The importance of involving experts and learners in
London: Kogan Page. formative evaluation. Canadian Journal of Educational Communica-
Scheirer, M. A. (1994). Designing and using process evaluation. In J. S. tions, 16 (1), 45±58.
Wholey, H. Hatry & K. Newcomer, Handbook of practical program Wilkinson, T. L., Schuler, R. T., & Skjolaas, C. A. (1993). The effect of
evaluation (pp. 40±68). San Francisco: Jossey-Bass. safety training and experience of youth tractor operators. National Insti-
Scheirer, M. A., & Rezmovic, E. L. (1983). Measuring the degree of tute for Farm Safety, Inc. NIFS Paper No. 93±6. Columbia, MO. NIFS
program implementation. Evaluation Review, 7 (5), 599±633. Summer Meeting, Coeur d'Alene, Idaho.
Schneider, M. L., Ituarte, P., & Stokols, D. (1993). Evaluation of a commu- Witte, K., Peterson, T. R., Vallabhan, S., Stephenson, M. T., Plugge, C. D.,
nity bicycle helmet promotion campaign: What works and why. Amer- Givens, V. K., Todd, J. D., Bechtold, M. G., Hyde, M. K., & Jarrett, R.
ican Journal of Health Promotion, 7 (4), 281±287. (1992/3). Preventing tractor-related injuries and deaths in rural popula-
Scriven, M. (1967). The methodology of evaluation. In R. Tyler, R. Gagne tions: Using a persuasive health message framework in formative
& M. Scriven, Perspectives of curriculum evaluation (pp. 39±83). evaluation research. International Quarterly of Community Health
Chicago: Rand McNally. Education, 13 (3), 219±251.