Evaluation and Program Planning 24 (2001) 129–143
www.elsevier.com/locate/evalprogplan

Assessing the subsequent effect of a formative evaluation on a program

J. Lynne Brown a,*, Nancy Ellen Kiernan b

a Penn State University, Department of Food Science, 203B Borland, University Park, PA, USA
b Penn State University, College of Agricultural Sciences, 401 Agricultural Administration Building, University Park, PA, USA

Received 30 June 1999; received in revised form 1 September 2000; accepted 31 October 2000

Abstract

The literature on formative evaluation focuses on its conceptual framework, methodology and use. Permeating this work is a consensus that a program will be strengthened as a result of a formative evaluation, although little empirical evidence exists in the literature to demonstrate the subsequent effects of a formative evaluation on a program. This study begins to fill that gap. To do this, we outline the initial program and formative evaluation, present key findings of the formative evaluation, describe how these findings influenced the final program and summative evaluation, and then compare the findings to those of the formative. The study demonstrates that formative evaluation can strengthen the implementation and some impacts of a program, i.e. knowledge and some behaviors. The findings also suggest that when researchers are faced with negative feedback about program components in a formative evaluation, they need to exercise care in interpreting and using this feedback. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Formative evaluation; Summative evaluation; Impact; Assessing feedback

1. Introduction

Formative evaluation commands a formidable place in the evaluation literature. Highly regarded, the process was used to improve educational films in the 1920s (Cambre, 1981). Academic areas as diverse as agricultural safety (Witte, Peterson, Vallabhan, Stephenson, Plugge, Givens et al., 1992/93) and cardiovascular disease (Jacobs, Luepker, Mittelmark, Folsom, Pirie, Mascioli et al., 1986) draw on the process today, using findings to improve a program; among educators in particular, it is 'almost universally embraced' (Weston, 1986, p. 5). Surprisingly, the subsequent effect of using the findings of formative evaluation has not received systematic attention. This paper addresses that gap.

The literature focuses attention on three aspects of formative evaluation, the first of which is its conceptualization. Over time, researchers clarified the concept. They distinguished it from other forms of evaluation, especially summative, the fundamental difference being the rationale and use of the data (Baker & Alkin, 1973; Markle, 1989; Patton, 1994; Chambers, 1994; Weston, 1986); labeled it formative evaluation (Scriven, 1967) and accepted that designation (Rossi & Freeman, 1982; Patton, 1982; Fitz-Gibbon & Morris, 1978); debated its frequency and timing in the program cycle (Markle, 1979; Thiagarajan, 1991; Russell & Blake, 1988; Chambers, 1994); scrutinized its overlap with process evaluation (Patton, 1982; Stufflebeam, 1983; Scheirer & Rezmovic, 1983; Dehar, Casswell & Duignan, 1993; Scheirer, 1994; Chen, 1996); and expanded its epistemological framework, linking it to developmental programs (Patton, 1996). As the conceptual framework evolved, the perceived value of formative evaluation has only increased.

Second, the literature focuses on methods and design strategies to conduct formative evaluation. That focus appears first in handbooks or articles describing methods and design strategies for either an entire program (Rossi & Freeman, 1982; Patton, 1978; Fitz-Gibbon & Morris, 1978) or a segment of a program such as the materials (Weston, 1986; Bertrand, 1978), instruction (Tessmer, 1993), electronic delivery like television (Baggaley, 1986), or interactive technology (Flagg, 1990; Chen & Brown, 1994). The focus on method and strategies appears second in case studies which illuminate a particular method or strategy tailored to the exigencies of a particular situation, such as a community (Jacobs et al., 1986; Johnson, Osganian, Budman, Lytle, Barrera, Bonura et al., 1994; McGraw, Stone, Osganian, Elder, Johnson, Parcel et al., 1994; McGraw, McKinley, McClements, Lasater, Assaf & Carleton, 1989) or worksite (Kishchuk, Peters, Towers, Sylvestre, Bourgault & Richard,

* Corresponding author. Tel.: +1-814-863-3973; fax: +1-814-863-6132.
E-mail address: email@example.com (J.L. Brown).

0149-7189/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved.
PII: S0149-7189(01)00004-0
1994). Over time, the focus on methods and strategies illuminated critical decisions needed to design a valid formative evaluation. The decisions include: (1) who should participate—experts (Geis, 1987), learners from the targeted audience (Weston, 1986; Russell & Blake, 1988), learners with different aptitudes (Wager, 1983), instructors representative of those in the field (Weston, 1987; Peterson & Bickman, 1988), or drop outs from a program (Rossi & Freeman, 1982); (2) how many to include and in what form—one or a group (Wager, 1983; Dick, 1980); (3) type of data to collect—qualitative or quantitative (Dennis, Fetterman & Sechrest, 1994; Peterson & Bickman, 1988; Flay, 1986); (4) data collection techniques (Weston, 1986; Tessmer, 1993) and (5) similarity of pilot sessions relative to actual learning situations (Rossi & Freeman, 1982; Weston, 1986). Not surprisingly, the conviction permeating the literature on methods and strategies is that formative evaluation will lead to a stronger, more effective program.

Third, attention in the literature dwells on the immediate use of formative evaluation findings. Academic areas such as nutrition (Cardinal & Sachs, 1995), cancer prevention for agricultural workers (Parrott, Steiner & Goldenhar, 1996), and child health (Seidel, 1993) have evaluated a program in its formative stage. In case studies such as these, researchers hail the evaluation process, describing the immediate effects of the evaluation, i.e., the problems identified and/or changes to be made in a modified version of the program (Potter et al., 1990; Finnegan, Rooney, Viswanath, Elmer, Graves, Baxter et al., 1992; Kishchuk et al., 1994; Iszler, Crockett, Lytle, Elmer, Finnegan, Luepker et al., 1995). These researchers are not consistent when reporting the immediate effects of a formative evaluation. Some do not include data; some do not outline the problems the process identified; and some do not describe the changes they made. What is consistent, however, is the message from these researchers: formative evaluation led them to make changes that should lead to a stronger program.

In summary, much has been written about formative evaluation—its conceptual framework, its methods, and its use. Throughout this literature, there is strong consensus on the value of formative evaluation, some calling its value 'obvious' (Baggaley, 1986, p. 34) and 'no longer questioned' (Chen & Brown, 1994, p. 192). Many educators contend, however, that formative evaluation is not used enough (Flagg, 1990; Kaufman, 1980; Geis, 1987; Foshee, McLeroy, Sumner & Bibeau, 1986). Indeed, some evaluations reflect no previous attempt at formative evaluation (Foshee et al., 1988; Glanz, Sorensen & Farmer, 1996; Pelletier, 1996; Schneider, Ituarte & Stokols, 1993; Wilkinson, Schuler & Skjolaas, 1993; Hill, May, Coppolo & Jenkins, 1993). Other formative evaluations are limited: using few people, non-representative samples, or selected materials (Tessmer, 1993).

Part of the explanation for limited use of formative evaluation may lie in the lack of empirical evidence in the literature demonstrating its subsequent effect. Few researchers take the next step and demonstrate this by comparing data from the initial program with data from the final program to show whether the changes resulted in an improvement in program implementation and impacts. Reviewing over 60 years of work in formative evaluation, scholars (Flagg, 1990; Dick, 1980; Dick & Carey, 1985; Weston, 1986) found that the 'evidence is supportive but meager' (Geis, 1987, p. 6). Furthermore, most evidence (Baker & Alkin, 1973; Baghdadi, 1981; Kandaswamy, Stolovitch & Thiagarajan, 1976; Nathenson & Henderson, 1980; Scanlon, 1981; Wager, 1983; Montague, Ellis & Wulfeck, 1983; Cambre, 1981) relates to only a component of a program, the educational materials, not to an entire program. Some landmark studies examine the impact of an entire program in its formative stage, such as the use of negative income tax strategies as a substitute for welfare (Kershaw & Fair, 1976; Rossi & Lyall, 1976; Robins, Spiegelman, Weiner & Bell, 1980; Hausman & Wise, 1985) and the Department of Labor's LIFE effort to decrease arrest rates among released felons with increased employment (Lenihan, 1976), but only a few, such as those reported by Fairweather and Tornatzky (1977), actually document that the changes made as a result of a formative evaluation resulted in a change in the impact of the final program. Given that researchers hail formative evaluation as important, the lack of evidence about its subsequent effect points to a surprising gap in the literature.

The purpose of this paper is to examine the subsequent effect of a formative evaluation to see whether the changes resulting from it improved the final program sufficiently to distinguish between the impact of two program delivery methods. To do this, we: (a) outline the initial program and its formative evaluation, (b) present the key findings of the formative evaluation, (c) describe how the formative findings influenced the design of the revised program and its evaluation, and then (d) compare the results of the initial and revised program, something rarely done in the formative evaluation literature. In doing this, we provide a comprehensive look at the implementation of both a formative and summative evaluation. In conclusion, we identify issues that evaluators wishing to improve the design of a formative evaluation need to consider. In addition, we identify problems we encountered in attempting to assess the effectiveness of a formative evaluation.

2. Stage one: The initial program

2.1. Background

Combining federal, state and local funding, land grant colleges support educational health promotion programs for individuals and communities offered by county-based co-operative extension family living agents. Prior to our study, agents reported poor attendance at evening and weekend meetings but rarely offered daytime programs at
worksites. Instead, agents used correspondence lessons to reach people unwilling to attend meetings. However, group interaction is more likely to facilitate changes in behavior (Glanz & Seewald-Klein, 1986), possibly through the support offered by sharing experiences. While it was easier for agents to mail correspondence lessons (an impersonal delivery method), we postulated that using a group meeting to motivate the use of each lesson in a series before it was distributed was more likely to promote change in food/health behaviors. To test this hypothesis, we designed a two-stage impact study to evaluate two methods of delivering lessons biweekly at worksites: distribution alone vs distribution in conjunction with a half-hour group meeting.

Agents delivering the program would work with new delivery sites, new clientele, new content, and new delivery methods. Because of this unfamiliarity, and because this program had to fit into worksite environments with differing work-shift patterns, lunch patterns, physical settings, personnel departments, and required advertising, we conducted a formative evaluation of the initial program impact and its implementation. We included participants and instructors in the evaluation, using a variety of qualitative and quantitative methods.

2.2. Target health problem and audience

Four print lessons in the initial program addressed prevention of osteoporosis, a recently proclaimed public health problem (National Institutes of Health, 1985) most often affecting white, elderly women. Prevention requires life long adequate calcium intake and exercise. According to NHANES II data, 75% of American women fail to consume the recommended daily amount of calcium (Walden, 1989). We targeted working women, ages 21–45, with at least one child at home because these women are building bone mass, which peaks at age 35–45. Mothers can also provide nutrition activities (Gillespie & Achterberg, 1989) that teach children how to protect bone health.

2.3. Lesson content and organization

The lessons, based on the Health Belief Model (Janz & Becker, 1984), encouraged participants to eat calcium-rich foods and to walk for exercise by focusing on personal susceptibility, disease severity, benefits of prevention, and overcoming barriers to health protecting actions. Because many in the target audience disliked drinking fluid milk (based on an initial survey) or could have reactions to milk (Houts, 1988), each lesson introduced a different calcium-rich food (non-fat dry milk, plain yogurt, canned salmon or tofu) and menu ideas. Each also included scientific background on the lifestyle–osteoporosis link, a self assessment worksheet, a featured food fact sheet, suggestions for involving children in food preparation, and calcium-rich recipes. Rightwriter (1990) analysis indicated a 12th grade reading level.

2.4. Delivery method

We tested two bi-weekly delivery methods for the lessons. Group-delivery (G), based on the discussion–decision methods of Lewin (1943), was a 30 min motivational session in which participants discussed adopting a behavior suggested in each lesson (i.e. trying recipes, walking for exercise, involving children in food preparation, and eating calcium-rich foods, not supplements). Participants could taste a recipe using the featured calcium-rich food and vote by raised hands on their willingness to individually adopt the suggested behavior. An agent served as facilitator/motivator and distributed a lesson at the end of each session. The other method, impersonal-delivery (I), consisted of either the agent or a company contact person simply distributing the required lesson to participants according to schedule.

2.5. Staff training

To insure consistency, all agents received guidelines for recruitment of worksites and participants, a program content review, a printed program delivery script and instructions for instrument administration.

2.6. Recruitment

Seven agents representing three rural and four urban/suburban counties interviewed personnel managers at businesses within their county and recruited 48 worksites where women comprised over 30% of the work force. Once worksites were randomly assigned to a delivery method (G or I), agents systematically recruited participants within a month.

3. Stage one: Formative evaluation

We delineate the data collection methods, the evaluation design, and analyses.

3.1. Evaluating program implementation

Our goals were to assess: (a) participant characteristics relative to the prescribed target audience; (b) participant attention to, and use of, the lessons; (c) participant reaction to advertising, lesson content and structure, delivery method, and time between lessons and (d) agent reaction to delivering the program and its content.

To address goal (a), we included demographic questions in the first questionnaire administered. To address (b), we designed a response sheet for each lesson which asked participants how completely they had read the lesson, how easy it was to read, and how useful it was, and whether they completed the worksheet, tried suggestions or recipes, and shared lesson materials. To address (c), we developed focus group questions for participants, and, for (d), questions for agents attending a debriefing.
We conducted four focus groups among participants within a month of the intervention, two each for group-delivery and impersonal-delivery. Each focus group derived from a purposeful sample of thirty participants composed of two-thirds completers and one-third non-completers. The agent telephoned all selected and those who chose to attend became the sample. We held the debriefing with all agents within a month also. Data consisted of tape recordings and written notes.

3.2. Evaluating program impact

Our goal was to examine changes in knowledge, attitudes, and behaviors (KAB) needed to prevent osteoporosis using appropriate scales, changes in calcium intake using a food frequency questionnaire (FFQ), and changes in exercise pattern using specific questions. Our hypothesis was that persons in group-delivery would exhibit greater changes in attitude and behavior scores, calcium intake, and exercise pattern than those in impersonal-delivery. We anticipated similar changes in knowledge for both delivery methods because the same lessons were used; the meeting focused primarily on motivation.

Fig. 1. Model of formative and summative evaluation design.

To assess changes, we developed the KAB scales using nutrition expert and target audience reviews and internal consistency testing with 65 of our target audience prior to use in Stage One. The final formative instrument contained a 20 item knowledge scale (KR-20 = 0.80); a 22 item attitude scale (α = 0.78); and a 16 item behavior scale (α = 0.75), all addressing concepts in the lessons.

We used a modified version of the Block food frequency questionnaire (Brown & Griebler, 1993) that included the four foods featured in the osteoporosis lessons to assess calorie and calcium intake.
To examine exercise behavior directly, we asked participants if they exercised regularly within the last several months each time they completed the KAB scales; after the lessons we also asked if this exercise pattern was new, and, if new, if it was due to the lessons.

3.3. Formative evaluation design

We employed a pre-test (T0), 8 week intervention, post-test (T1) design to compare group-delivery (G) and impersonal-delivery (I) (Fig. 1). We arranged the 48 worksites in four blocks reflecting business types (white collar, educational/municipal, health care and blue collar) and assigned them randomly to either delivery. Although eleven worksites withdrew prior to the intervention, primarily due to company changes, the proportion of business types in each delivery method was unaffected.

Participants completed pre KAB and FFQ instruments at a meeting 1 week prior to receiving lesson one; the last lesson included post KAB and FFQ instruments, which participants returned at an optional post program meeting 1 week later or by mail according to Dillman (1978). Question order in each KAB scale differed at each measurement to diminish recall bias. Each lesson included the response sheet that participants returned prior to receiving the next lesson.

3.4. Formative evaluation data analyses

We used χ2 analysis to compare categorical and ANOVA to compare continuous implementation data, between lessons and between delivery methods, from response sheets returned. We examined tape recording transcripts and focus group and debriefing notes for repeated themes (Krippendorff, 1980).

Data from those completing both KAB instruments were analyzed and scale scores determined allowing only one missing value. Each individual's knowledge score was the sum of correct answers. Each attitude and behavior statement required a response on a 5-point Likert scale. Each individual's attitude and behavior scale score was the mean of all their responses to those questions.

Data from those completing both FFQs were coded, entered, and analyzed using FFQ software (Health Habits and History Questionnaire, 1989). Because nutrient value distributions were not normal, the data were transformed using log e prior to statistical analysis (SAS Proprietary Software, 1989).

Non-directional t-tests for independent samples were used to test significance of continuous and ordinal data (mean age, education, KAB scores and calcium) between delivery methods (G vs I) at each time point (T0, T1). Categorical demographic and exercise data were compared using χ2 analysis.
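The scale-scoring rules in this section (knowledge as a count of correct answers; attitude and behavior as the mean of 5-point Likert responses; at most one missing item tolerated per scale) can be made concrete. A sketch, with an invented answer key and responses:

```python
def knowledge_score(responses, key, max_missing=1):
    """Count of correct answers; None marks a skipped item.
    Returns None (no scale score) when too many items are missing."""
    if sum(r is None for r in responses) > max_missing:
        return None
    return sum(1 for r, k in zip(responses, key) if r == k)

def likert_scale_score(responses, max_missing=1):
    """Mean of the answered 5-point Likert items (1-5 scale)."""
    if sum(r is None for r in responses) > max_missing:
        return None
    answered = [r for r in responses if r is not None]
    return sum(answered) / len(answered)
```

Dropping a respondent's scale score once more than one item is missing, rather than imputing, matches the completers-only analysis described above.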
ANOVA for repeated measures and ANCOVA were used to test significance of mean KAB scores and calcium intake of matched individuals across time. The covariates of mean income and employment status were used in testing changes in KAB scores. Significance was assumed at ≤0.05.

4. Key findings from the formative evaluation of the initial program

We outline program implementation findings for goals (a)–(d) and impact findings.

Fig. 2. Percent reading all the lessons in each evaluation.

4.1. Program implementation

4.1.1. Goal (a): Target audience

Ultimately, 275/489 (56%) women completed post questionnaires that met analysis criteria. Completers and non-completers did not differ in demographic characteristics (data not shown).
When comparing delivery methods, completers differed significantly in only two factors: percent employed full time (91.6% in G vs 81.6% in I) and percent of families with incomes over $35,000 (57.7% vs 42.4%).

4.1.2. Goal (b): Participants' attention to and use of lessons

Response sheets returned dropped over the four lessons; method G dropped from 81% of initial registrants for lesson one to 41% for lesson four and method I from 95 to 67%. Otherwise, the two delivery methods did not differ significantly in attention to, and use of, lessons.

Respondents that reported reading all lesson materials fell from 85% for lesson one to 62–64% for lesson four (Fig. 2). Regardless of delivery method, respondents rated all lessons, on a scale of 1–5, fairly easy to read (1.4 ± 0.6, where 1 = easy to read), and useful (hovering at 2.1 ± 0.8, where 1 is very useful). About 70% reported completing worksheet one, 28% worksheet two, 80% worksheet three, and 50% worksheet four.

The response sheets assessed whether participants tried recipes, involved children in food preparation, and shared lesson materials and revealed no significant differences between delivery methods. Although 37% of method G tried lesson one recipes compared to 20% in method I, thereafter, percentages were lower and similar between delivery groups. Those involving children varied from 11% for lesson one to 2% for lesson four and those sharing recipes with friends between 16 and 22% (Fig. 3).

4.1.3. Goal (c): Participant reactions

Fifty women (27 from G and 23 from I) participated in the focus groups. Participants from both delivery methods were more likely to remember personal contacts and paycheck flyers than other advertisements. They recommended changes in lesson format, recipes, worksheets, and calcium-rich foods featured. Many found the lesson booklet cumbersome, the menus unhelpful, the worksheets in two lessons long, and some featured foods difficult to adopt. Some participants wanted the emphasis on drinking milk. They suggested including menus and microwave instructions in the recipes. With some exceptions, women reported it was difficult to involve children in food preparation or that their children were grown.

However, some feedback was unique to a delivery method. Group-delivery participants wanted more lecture, more question and answer time, and less motivational discussion. They could not recall voting to try a behavior (critical to the discussion–decision method) but liked the food tasting activity. Impersonal-delivery participants also wanted question and answer time and reminders to complete each lesson, but disagreed about the period between lessons.

Participants from both delivery methods revealed that they had limited time to try recipes and had not yet put learned health-promoting actions into practice.
They disliked the long KAB questionnaire and completing the second FFQ, only 2 months after the initial one, when they had not yet initiated changes in eating habits.

Fig. 3. Percent reporting sharing lesson recipes with friends.

4.1.4. Goal (d): Agent reactions

All agents participated in both delivery methods. They reported that the advertising materials did not clearly define the target audience and that in-person appeals and an enthusiastic site contact improved recruitment. Despite managing shifts, they preferred the interaction and participant interest in group-delivery and the opportunity for daytime programs. But agents using group-delivery resisted being motivators and asked to provide lectures, perceiving that participants wanted prescriptive advice. They felt the recipes needed improvement. Agents echoed the lack of emphasis on drinking milk, a political issue in counties with a dairy industry.

4.2. Program impact

Fig. 4. Change in knowledge score over time.

As hoped, changes over time for KAB were significant. As expected, the hypothesis that changes in knowledge would not differ by delivery method was supported. Unexpectedly, the hypothesis that those in group-delivery would show greater gains in attitude, behavior, calcium intake, and exercise pattern than those in impersonal-delivery was not supported. For the KAB measures, time by delivery method interaction was not
significant (Figs. 4–6). Group delivery did not affect knowledge, attitude, or behavior scores any more than impersonal delivery. Changes in calcium intake and exercise pattern were not significantly different between delivery groups (data not shown).

Fig. 5. Change in attitude score over time.

Fig. 6. Change in behavior score over time.

5. Stage two: The revised program and summative evaluation

The changes made in stage two in the program content, recruitment, delivery method, and evaluation design and instruments for the summative evaluation are shown in Table 1. Almost all reflect key findings of the stage one formative evaluation.

5.1. Revised program lesson content and recruitment

We changed the lesson content to address the concerns outlined above. We asked six county agents, representing three rural and three suburban counties, to recruit four worksites each, a total of 24. We clarified the target audience in advertising materials and directed agents toward in-person recruiting. We lowered the lessons' reading level to accommodate participants from more blue collar worksites where mothers, ages 21–45, were a significant part of the work force, to insure enrolling more working women with young children.

5.2. Revised program delivery method

The initial program implementation and impact data
Table 1
Major changes in educational program, evaluation design and evaluation instruments prompted by results of the formative evaluation

Program lesson content
• Layout of each lesson — From: Booklet. To: Folder with pull-out fact sheets.
• Calcium-rich foods — From: Emphasize four non-traditional foods. To: Emphasize fluid milk and four non-traditional foods.
• Worksheets — From: Lesson 1: 7 day exercise diary; Lesson 4: long contract to make one behavior change. To: Lesson 1: 3 day exercise diary; Lesson 4: short contract to make one behavior change.
• Fact sheet: food activities for children — From: Suggestions to involve children in food activities. To: Retain and give added emphasis in lessons and group meeting.
• Recipes — From: Six per lesson with conventional instructions. To: Keep four most popular, but add microwave instructions and menu suggestions; emphasize testing on weekends.
• Reading level — From: 12th grade. To: 8th grade.

Program recruitment
• Recruitment of worksites — From: Work force must have a high percentage of working women. To: Target blue collar worksites; work force must have a high percentage of working mothers.
• Advertising for target audience — From: Print material and in-person recruitment. To: Emphasis on in-person recruitment; clarify target audience in all recruitment material.

Program delivery
• Delivery method — From: Group: motivational discussion about overcoming barriers to suggested behaviors, ending with group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements] plus food tasting. To: Group: lecture stressing 2–3 main points of lesson followed by pep talk about suggested behavior, followed by group vote on trying the behavior [try recipes, start walking program, involve kids in kitchen, use foods not supplements] plus food tasting with revised recipes. Impersonal: pass out lessons on schedule (unchanged).

Evaluation design
• Intervention design — From: Comparison of two delivery methods. To: Comparison of two delivery methods with a control.
• Measures — From: Pre–post measures, T0 & T1: KAB and FFQ. To: Pre, post and 4 month post measures: T0, T1 & T2 for KAB; T0 & T2 for FFQ.
• Response sheets — From: Response sheet in each lesson; no incentive to return. To: Response sheet in each lesson; provide incentive for return.

Evaluation instruments
• Impact instrument scales (KAB questionnaire) — From: 20 knowledge questions (KR-20 = 0.80); 22 attitude questions (α = 0.78); 16 behavior questions (α = 0.75). To: 14 knowledge questions (KR-20 = 0.725); 16 attitude questions (α = 0.80); 14 behavior questions (α = 0.80).
• Response sheets — From: Try any suggestion for child activity: responses yes, no; Try any recipe: responses yes, no. To: For both questions, add the response: no, but plan to.

indicated the group delivery method did not affect attitudes and behaviors, possibly because agents were uncomfortable and did not conduct the meeting according to directions. To rectify this, using Pelz (1959), six agents designed four new 30 min meeting scripts that included two to three main points, retained the food tasting (with new recipes), and eliminated the motivational discussion. A suggested action was still promoted at the end of the meeting and a group vote taken on adoption. Agents were trained to use these scripts and distributed the lessons biweekly.

5.3. Summative evaluation design

Participants' comments and the poor formative completion rate led us to use a pre (T0), immediate post (T1), and 4 month post (T2) summative evaluation design (Fig. 1). We asked participants to complete the KAB instrument at all time points, but the FFQ only at T0 and T2, a 6 month interval, expecting the T2 measure would detect changes which initial program participants claimed took time to implement.

To improve our ability to detect changes, we compared three intervention groups (two experimental—group-delivery and impersonal-delivery—and one control). The controls received four correspondence lessons addressing cancer prevention, identical in design to the osteoporosis lessons the experimental groups received.
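With three groups rather than two, each outcome now involves three pairwise contrasts (G vs I, G vs control, I vs control), and the summative analyses (Section 5.6) control the resulting overall error rate with a Bonferroni adjustment. A sketch of that adjustment logic only; the paper's actual tests were pairwise comparisons of least-squares means in SAS, and the p-values below are invented:

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each pairwise contrast as significant only if its p-value
    clears alpha divided by the number of contrasts tested."""
    cutoff = alpha / len(p_values)
    return {pair: p <= cutoff for pair, p in p_values.items()}

# Hypothetical p-values for the three pairwise group contrasts
p_values = {("G", "I"): 0.030,
            ("G", "control"): 0.004,
            ("I", "control"): 0.200}
decisions = bonferroni(p_values)  # only ("G", "control") clears 0.05 / 3
```

Note that 0.030 would count as significant in a single two-group comparison but not here, which is exactly the protection the adjustment buys.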
The osteoporosis and cancer lessons differed only in diet–disease context, beneficial nutrients and foods, and emphasis on exercise
J.L. Brown, N.E. Kiernan / Evaluation and Program Planning 24 (2001) 129±143 137in the osteoporosis lessons. In sum, those in group-delivery more accurate assessment of the responses of those complet-received the modi®ed group meeting and osteoporosis ing the program.lessons; those in impersonal-delivery only the osteoporosis Impact analysis methods were similar to those used in thelessons, and the controls only the cancer lessons. formative analyses with these modi®cations: (a) we used We divided the 24 worksites into ®ve blocks re¯ecting only data of those completing all three KAB or FFQ instru-relative pay scale and type of worker. These were assigned ments; (b) we allowed up to two missing answers on thepurposefully to the three intervention groups such that there knowledge scale; (c) we tested the signi®cance of continu-was an equal representation of all ®ve blocks in the two ous and ordinal data among the three delivery groups atexperimental groups while the controls lacked representa- three time points (T0, T1, T2) and (d) age served as thetion from one of two lower pay blocks. Three companies covariate for ANOVA and ANCOVA. We determinedwithdrew prior to recruitment. statistically signi®cant differences among values at time All participants completed pre-test instruments at a meet- points using pair-wise tests of differences between least-ing 1 week prior to receiving the ®rst lesson. The post-test squares means. A Bonferoni adjustment was used to controlKAB instrument, distributed with the last lesson, was the overall error rate. Signi®cance was assumed at #0.05.collected at a concluding meeting 2 weeks later. Three Finally, we compared categorical and continuous demo-months later the ®nal instruments were distributed to all graphic characteristics (mean age and education) betweenparticipants by the agent or by mail using a modi®ed Dill- the formative and the summative evaluation completersman Method. using x 2 analysis and non-directional t-tests.5.4. 
5.4. Evaluating revised program implementation

To assess demographics, we included questions in the pre-test instrument of all three intervention groups. To assess attention to, and use of, the lessons, we included a response sheet in each lesson for the two experimental groups only. We added a third possible response ("no, but I plan to") to questions about children's activities or recipes to capture behavioral intention.

5.5. Evaluating revised program impact

As in the formative evaluation, we hypothesized that those in group-delivery would exhibit greater changes than those in impersonal-delivery in attitude and behavior scores, exercise pattern, and calcium intake. In addition, we hypothesized that: (a) both experimental groups would exhibit greater changes in knowledge than controls and (b) those in impersonal-delivery would exhibit greater changes than controls in attitude and behavior scores, exercise pattern, and calcium intake.

Based on formative participant comments, we shortened the KAB scales to reduce participant burden, hoping to improve completion. Using the formative instruments completed by the 677 women registrants for stage one (mean age 43.34 ± 11.58), we used internal consistency testing to eliminate less discriminating items, producing the scales in Table 1. Items retained in the KAB instrument (76% of the original) represented all content areas in the formative, improving α's for two scales. However, no new questions were added. Neither the question about exercise regularity nor the FFQ was changed for the summative evaluation.

5.6. Summative evaluation data analyses

In contrast to the formative, implementation data of those completing all four response sheets were used, providing a more accurate assessment of the responses of those completing the program.

Impact analysis methods were similar to those used in the formative analyses, with these modifications: (a) we used only data of those completing all three KAB or FFQ instruments; (b) we allowed up to two missing answers on the knowledge scale; (c) we tested the significance of continuous and ordinal data among the three delivery groups at three time points (T0, T1, T2); and (d) age served as the covariate for ANOVA and ANCOVA. We determined statistically significant differences among values at time points using pair-wise tests of differences between least-squares means. A Bonferroni adjustment was used to control the overall error rate. Significance was assumed at p ≤ 0.05. Finally, we compared categorical and continuous demographic characteristics (mean age and education) between the formative and the summative evaluation completers using χ² analysis and non-directional t-tests.

6. Summative evaluation findings and comparison with the formative

First, we examine the implementation findings. Then we examine impacts over time, comparing the results to the control and looking at differences between the two delivery methods. In each instance we compare the summative findings with those of the formative.

6.1. Program implementation

6.1.1. Goal (a): Target audience

Completion rates were better and participant demographics were closer to those desired in the summative compared to the formative. In the summative, 70% of initial registrants completed all three KAB measures. Almost 90% completed the KAB instruments at T0 and T1, in contrast to 56% in the formative. Eighty percent completed both FFQ instruments in the summative compared to 51% in the formative. Table 2 lists the demographics of completers in both evaluations, finding them similar in family income, race, marital status, awareness of relatives with osteoporosis, initial exercise pattern, and calcium intake per 1000 calories. Those completing the summative evaluation were significantly more likely than those in the formative, however, to be younger, have only a high school education, work full time, and have at least one child at home.

In the summative, the two experimental groups and controls differed significantly only in age (data not shown). Those in group-delivery had a mean age of 39.1 ± 9.95 compared to 40.6 ± 10.10 in impersonal-delivery and 40.9 ± 9.61 in the control.

Table 2
Demographic characteristics of those completing each evaluation phase (SD = standard deviation; yr = years)

Variable                                      Formative                   Summative
Mean age ± SD (a)                             N = 275, 43.02 ± 11.12      N = 247, 40.15 ± 9.88
Family income                                 N = 255                     N = 232
  $10–19,999 or less                          41 (16.1%)                  45 (19.4%)
  $20–34,999                                  92 (36.1%)                  83 (35.8%)
  $35–50,000 or more                          122 (47.8%)                 104 (44.8%)
Employment status (a)                         N = 272                     N = 245
  % full time                                 235 (86.4%)                 232 (94.7%)
Mean educational level ± SD, yr (b)           N = 268, 13.40 ± 1.97       N = 246, 12.70 ± 1.54
Race                                          N = 272                     N = 246
  % white                                     260 (95.6%)                 239 (97.2%)
Marital status                                N = 272                     N = 246
  Married                                     200 (73.5%)                 175 (71.1%)
  Single                                      40 (14.7%)                  27 (11.0%)
  Other                                       32 (11.7%)                  44 (17.9%)
Percent with at least one child at home (b)   N = 274, 129 (47.1%)        N = 247, 163 (66.0%)
Relatives with osteoporosis                   N = 246                     N = 247
  Yes                                         49 (19.9%)                  35 (14.2%)
  No                                          167 (67.9%)                 166 (67.2%)
  Don't know                                  30 (12.2%)                  46 (18.6%)
Exercise regularly in past 6 months?          N = 269                     N = 246
  Yes                                         94 (34.9%)                  90 (36.6%)
  No                                          175 (65.1%)                 156 (63.4%)
                                              N = 248                     N = 244
  Calories (mean ± SD)                        1623.8 ± 584.4              1798.0 ± 711.2
  Calcium (mg)                                805.4 ± 415.5               895.3 ± 549.2
  Calcium (mg/1000 calories)                  497.1 ± 178.9               490.2 ± 206.3

(a) p < 0.01; (b) p < 0.001.

6.1.2. Goal (b): Participants' attention to, and use of, lessons

In the summative evaluation, 78% of group-delivery and 63% of impersonal-delivery registrants returned all four response sheets, providing the sample for analysis. However, formative and summative return rates are not comparable because we did not restrict that sample to those completing all four.

In both evaluations, we asked participants if they read all of, parts of, skimmed, or did not read each lesson. Similar to the formative, those reading the whole lesson declined to 65–70% by lesson four. Unlike the formative, where there was no difference between delivery groups in the percent reading the lessons, in the summative significantly more in group-delivery skimmed, and fewer read, lessons one and two than in impersonal-delivery (Fig. 2).

Completion rates of the worksheets did not differ between evaluations, with one exception: more completed worksheet two, a revised exercise diary, in the summative than in the formative (40 vs 28%, respectively).

In both evaluations, respondents were asked how easy to read and how useful each lesson was. Respondents in both evaluations provided nearly identical ratings of ease of reading regardless of delivery method, suggesting the lower reading level of the summative materials made it easier for the less educated participants. Respondents in both evaluations provided nearly identical ratings of perceived lesson usefulness for lessons one and two; however, in the summative, group-delivery participants rated lessons three and four, highlighting canned salmon and tofu respectively, significantly more useful than did impersonal-delivery participants, whereas in the formative the ratings were identical.

Questions about materials use revealed that the summative results differed from the formative, regardless of the delivery method, in that:

• 3–10% tried a recipe (data not shown) compared to 10–37% in the formative. Yet, in the summative, two thirds or more in both delivery groups indicated they planned to try a recipe, an option not available in the formative.
• 11–21% involved children in food preparation compared to 2–11% in the formative. Additionally, in contrast to the formative, the summative exposed differences among delivery groups, in that the percent sharing a recipe with friends was significantly greater in group-delivery than impersonal-delivery for lessons two (20 and 6%, respectively) and four (24 and 10%), while no significant differences between delivery methods were evident in the formative (Fig. 3). Lessons two and four featured less familiar calcium-rich foods.
• Consistently more in group-delivery shared other materials with friends than in impersonal-delivery (16–22% vs 9–11%). This distinction was not seen in the formative, where about 15% in both delivery methods shared materials (data not shown).
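Table 2's footnotes can be spot-checked directly from the reported counts. The paper ran its analyses in SAS (Release 6.09); the following is an illustrative scipy sketch of the χ² test on employment status and the non-directional t-test on mean age between formative and summative completers:

```python
from scipy import stats

# Counts taken from Table 2: employment status (full time vs other)
# among completers of the formative vs the summative evaluation.
full_time = [235, 232]
other = [272 - 235, 245 - 232]   # N = 272 and N = 245 respondents

chi2, p, dof, expected = stats.chi2_contingency([full_time, other])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")   # p < 0.01, matching footnote (a)

# Mean age +/- SD from Table 2, compared with a non-directional t-test
# computed from the summary statistics alone.
t, p_age = stats.ttest_ind_from_stats(
    mean1=43.02, std1=11.12, nobs1=275,    # formative completers
    mean2=40.15, std2=9.88, nobs2=247,     # summative completers
)
print(f"t = {t:.2f}, p = {p_age:.4f}")     # p < 0.01, matching footnote (a)
```

Both tests reproduce the p < 0.01 flags on age and employment status, consistent with the conclusion that summative completers were younger and more likely to work full time.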
6.2. Program impact

6.2.1. Change in knowledge

As expected, our hypothesis that both group-delivery and impersonal-delivery of osteoporosis lessons would lead to greater knowledge gain than in controls was supported (Fig. 4). Indeed, the gain in knowledge in group-delivery was significantly greater than that in impersonal-delivery, and both were significantly greater than that of the controls, at both T1 and T2. This was not seen in the formative evaluation.

6.2.2. Change in attitude

Our hypothesis that those in the experimental groups would show greater gains in attitude than controls was not supported in the summative evaluation (Fig. 5). Our hypothesis that gains in attitude would differ between delivery methods was not supported either. The initial mean attitude scores in the summative were significantly higher than those in the formative (3.9–4.0 vs 3.0–3.1, respectively), perhaps due to increased media focus on osteoporosis or to differences in participants, and could have limited our ability to improve attitudes and thus to detect significant changes. We did not detect the influence of any previous worksite educational activities, however.

6.2.3. Change in behavior

Our hypothesis that experimental groups would show greater gains in behavior than controls was partially supported in the summative (Fig. 6). Gains in behavior for group-delivery were significantly greater than the control at T2. Administering the behavior scale 4 months after the intervention (T2) identified further changes in participant behavior in group-delivery that were not seen in controls, even when controlling for age, a possible alternative explanation. Our hypothesis that those in group-delivery would show greater gains in food-related behavior than those in impersonal-delivery was not supported in the summative evaluation.

6.2.4. Changes in calcium intake and exercise pattern

Our hypothesis that those in experimental groups would show greater gains in calcium intake than controls was not supported in the summative. Our hypothesis that those in group-delivery would show greater gains than those in impersonal-delivery was also not supported. The summative delivery methods did not have a greater differential impact than those of the formative.

Our hypothesis that experimental groups would report greater change in exercise patterns than controls was partially supported (data not shown). In the summative evaluation, the number at T2 who reported that exercising regularly was a new pattern was significantly greater in group-delivery than in controls. Our hypothesis that exercise patterns would differ between delivery methods was not supported. The summative delivery methods did not have a greater impact than those of the formative, although the difference was in the right direction.

7. Discussion

The purpose of this study was to test whether the changes made in a program as a result of a formative evaluation strengthened the implementation and impact of the program in the summative. We found that implementation improved, but only certain measures of impact improved enough to distinguish the effects of each delivery method. As a result, we suggest the following for evaluators to consider in designing a formative evaluation and in attempting to assess its effectiveness.

7.1. Improving the design of a formative evaluation

7.1.1. Interpreting (focus group) feedback from participants

As a result of participants' comments in the formative focus groups, we extensively altered the lessons for the summative, and this did lead to greater use of a formerly underused worksheet, continued ease of reading for participants with lower educational levels, and more involvement of children in food preparation.

Focus group participants wanted less motivational discussion and more lecture in the meetings. In response, we increased lecture time in the group-delivery method. The influence of the group-delivery method was underscored because group-delivery participants were more likely to skim than to read the lessons. Even with less reliance on reading, group-delivery participants found two lessons significantly more useful and were more likely to share lesson materials with others than those in impersonal-delivery. The agent presentation in group-delivery was clearly critical to sell the lessons. In this case, following participant recommendations appeared to improve impact on knowledge gained, but not in most areas of behavior change.

When the focus group participants indicated they did not like the group-delivery motivational discussion–decision session, they also indicated a preferred alternative. Given the extensive literature on assessing participant perspectives (Basch, 1987), we assumed we needed to alter the delivery method to one they liked (especially since the agents echoed this preference) and ultimately emphasized lecture over motivation in the summative. In hindsight, we should have asked the focus group participants how to change the motivational aspects of the session, with its emphasis on behavior modification, rather than abandon this based on their negative feedback.

A researcher using focus groups should not assume that, if participants express dislike of a particular delivery method and suggest another, one should drop the original method without considering the downstream effects. Be prepared to probe to learn why it was disliked and how it might be modified, especially if participants are suggesting a more passive path to learning. By making the assumption participants were right, we limited focus group inquiry and the type of information collected for program planning in the summative.

We revised the recipes, believing these could be used as the context to show participants how to use calcium-rich foods and as a device to facilitate behavior change. In the focus groups, we listened openly to complaints about the recipes and asked participants how to improve them. Some participants indicated they came to the program for the recipes but disliked those provided. Despite these participant-guided revisions, reported use of recipes did not improve in the summative, although a large number indicated they planned to.

In the formative, we assumed that when a component of the program, i.e., the recipes, was poorly received by participants, it merely needed improvement, and thus we specifically asked participants for suggestions. In hindsight, the question we should have asked first, given the complaints, was one that tested our assumption that participants valued recipes, i.e., should recipes be included in the lessons at all? And, if the answer was no, we should have been ready to discuss with focus group participants alternative devices to motivate behavior change.

Researchers designing formative evaluations need to be alert for such a methodological inconsistency: why did participants give us suggestions to improve the recipes when, in fact, many had not used them (10–30% in the formative)? This might be explained by the inclination of people to provide answers to questions when they are asked. People have a tendency to tell more than they can know (Nisbett & Wilson, 1977).

In summary, researchers using focus groups must be prepared to probe assertions by participants that some component of a program is unsatisfactory. Probing should investigate both how that component might be improved and retained, as well as what might be substituted and why. In particular, asking a question about the fundamental usefulness of some component of a program in a formative evaluation may not be easy for researchers, as they may have considerable ownership and resources invested in that component. However, when faced with negative feedback about a component of the initial program, researchers should investigate both options to ensure sufficient data for an informed decision about summative activities.

7.1.2. Assessing barriers to changing behaviors

Because participants came to this program on calcium-rich foods, we assumed that they would be open to change and that testing the suggested foods at family meals would be acceptable. Some focus group participants indicated it was difficult to introduce these foods because of family members' aversion to change. On hearing this, we assumed that altered recipes would be sufficient to overcome opposition and explored this in all focus groups. By not making a conscious effort to uncover social barriers to changing food choices for families, we limited the information we gained to plan the summative. In hindsight, we should have taken a proactive stance once these unanticipated barriers surfaced in the initial focus group, and posed questions to later focus groups based on the comments offered in previous ones.

7.1.3. Interpreting feedback from program instructors

As a result of the formative, we allowed instructors to change the presentation in the group-delivery method, believing they would have greater ownership, and thus impact, if they designed a presentation with which they were more comfortable. The remedy the instructors developed, a lecture rather than a motivational discussion, led to an emphasis on knowledge rather than on behavior change in the presentation. This may partially explain why the two delivery methods did not differ significantly in final behavior scale scores. And it may also explain the significantly greater gain in knowledge in the group- than the impersonal-delivery.

The formative evaluation provided feedback about the presentation. We assumed that when a component of the program, i.e., the presentation, was unacceptable to instructors, it should be changed, and we asked for suggestions. The acceptable change did not lead to greater impact. Although instructor suggestions carry great face validity, researchers need to be wary because instructors may provide suggestions that shift the aims of the program. In hindsight, what we should have done was to propose to the instructors alternative presentation methods that retained an emphasis on behavior change. If none were acceptable, we should have queried our assumption that agents were the appropriate instructors for the group presentation.

7.1.4. Incorporating a control group

We included impact measures in the formative to gain a quantitative estimate of the effects of the initial program, but we did not include a control group because we expected the contrast between group- and impersonal-delivery to be robust. Due to instructor difficulties with the stage one group-delivery, this contrast did not materialize. Although we had evidence that participants were learning from the print materials in both delivery methods, we could not demonstrate that the change seen was better than with no intervention. Hence we suggest that evaluators include a control group when testing the impact of a new program offered via different delivery methods in a formative evaluation.

7.1.5. Watching for serendipitous effects of formative evaluation

Including impact measures led to unexpected participant feedback about the instruments and their mode of administration. This feedback proved invaluable in improving program implementation in the summative, underscoring the usefulness of impact measures in formative evaluations. In hindsight, we recommend including these measures in the formative to assess impact as well as to obtain feedback about the instruments and their implementation.
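The pair-wise comparisons with a Bonferroni adjustment described in Section 5.6 can be sketched as follows. The scores here are synthetic stand-ins, and the sketch uses plain t-tests rather than the study's tests of differences between least-squares means from ANCOVA:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative (synthetic) post-test knowledge scores for the three
# delivery groups -- NOT the study's data.
scores = {
    "group": rng.normal(11.0, 2.0, 80),        # group-delivery
    "impersonal": rng.normal(10.0, 2.0, 80),   # impersonal-delivery
    "control": rng.normal(8.5, 2.0, 80),
}

# Omnibus test across the three delivery groups (one-way ANOVA).
f, p_anova = stats.f_oneway(*scores.values())

# Pair-wise comparisons with a Bonferroni adjustment: multiply each raw
# p-value by the number of comparisons and cap it at 1.0, so the overall
# error rate across all three tests stays controlled.
pairs = [("group", "impersonal"), ("group", "control"), ("impersonal", "control")]
adjusted = {}
for a, b in pairs:
    t, p = stats.ttest_ind(scores[a], scores[b])
    adjusted[(a, b)] = min(p * len(pairs), 1.0)

for pair, p_adj in adjusted.items():
    print(pair, "significant" if p_adj <= 0.05 else "n.s.")
```

The Bonferroni cap trades power for simplicity: each of the three pair-wise tests is judged against the same adjusted threshold, which is why a conservative overall significance level of p ≤ 0.05 can still be claimed.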
7.2. Assessing the effectiveness of the formative evaluation

The need exists to demonstrate the subsequent effects of formative evaluation in order to improve formative evaluation design. Our study provides some guidance.

7.2.1. Altering the evaluation design

As a result of the formative, we added a control group and a 3-month post-intervention measurement in the summative. Without either of these, we would not have been able to observe some significant differences between the two delivery methods in the summative. Three months after the intervention, group-delivery produced significantly greater changes in behavior scores and exercise pattern than seen in the controls, while the impersonal-delivery method did not. This design change also revealed that the group method led to significantly greater knowledge gain than the impersonal method, and both gains were greater than that of controls. Clearly our target audience was not likely to adopt behavior changes based on receiving just the printed materials.

However beneficial these changes in evaluation design were in illuminating important findings about the impact of the final program, implementing them only in stage two prevented a rigorous comparison between formative and summative evaluations, which might have allowed us to see more clearly the effect of the formative evaluation. Lack of the T2 measure in the formative meant we could not be sure what effect the initial program had 3 months later, and lack of a control group meant we could not fully interpret the formative impact data. In future, we recommend that if researchers want to assess the effectiveness of a formative evaluation, the design elements of the formative and summative steps be parallel. These design details may seem more appropriate for a summative evaluation, but they are necessary to see the effects of the formative.

7.2.2. Altering the evaluation instruments

As a result of the formative, we shortened the KAB questionnaire somewhat to address participant complaints in the summative. We believe this contributed to more participants completing all the evaluation instruments in stage two. Thus implementation improved in the summative. However, this improvement carried a price. Subsequently, we were not able to conduct the most rigorous comparison of the formative KAB results to those of the summative.

8. Conclusions

The literature on formative evaluation focuses on its conceptual framework, its methodology and use. Permeating this work is a consensus that a program will be strengthened as a result of a formative evaluation, although little empirical evidence exists in the literature to demonstrate the subsequent effects of formative evaluation on a program. This study begins to fill that gap.

This study demonstrates that formative evaluation can strengthen the implementation and impact of an educational intervention designed to compare the impact of two program delivery methods. The modifications made as a result of the formative appear to have significantly improved knowledge gained but resulted in only modest improvements in behaviors in the final program.

Our retrospective analysis of our experience supports the inclusion of a control group and impact measures at several time points in a formative evaluation of a program implemented in a new environment for agency personnel. The findings also suggest that when researchers are faced with negative feedback about the components of the program in a formative evaluation, they need to exercise care in interpreting feedback and in revising those components.

In addition, the need to gather evidence of the subsequent effect of using data from formative evaluations is critical. Otherwise we cannot begin to examine whether the methods and processes we take for granted in the formative evaluation are valid and appropriate. However, evaluators must carefully plan the evaluations, making sure that instruments and evaluation design are parallel in order to carry out these comparisons. We offer our findings as a stimulus to consider such studies.

References

Baggaley, J. (1986). Formative evaluation of educational television. Canadian Journal of Educational Communication, 15 (1), 29–43.
Baghdadi, A. A. (1981). A comparison between two formative evaluation methods. Dissertation Abstracts International, 41 (8), 3387A.
Baker, E. L., & Alkin, M. C. (1973). Formative evaluation of instructional development. AV Communication Review, 21 (4), 389–418.
Basch, C. E. (1987). Focus group interview: An underutilized research technique for improving theory and practice in health education. Health Education Quarterly, 14 (4), 411–448.
Bertrand, J. (1978). Communications pretesting (Media Monograph No. Six). Chicago: University of Chicago, Community and Family Study Center.
Brown, J. L., & Griebler, R. (1993). Reliability of a short and long version of the Block food frequency form for assessing changes in calcium intake. Journal of the American Dietetic Association, 93 (7), 784–789.
Cambre, M. (1981). Historical overview of formative evaluation of instructional media products. Educational Communication & Technology Journal, 29 (1), 3–25.
Cardinal, B. J., & Sachs, M. L. (1995). Prospective analysis of stage-of-exercise movement following mail-delivered, self-instructional exercise packets. American Journal of Health Promotion, 9 (6), 430–432.
Chambers, F. (1994). Removing confusion about formative and summative evaluation: Purpose versus time. Evaluation and Program Planning, 17 (1), 9–12.
Chen, H. T. (1996). A comprehensive typology for program evaluation. Evaluation Practice, 17 (2), 121–130.
Chen, C. H., & Brown, S. W. (1994). The impact of feedback during interactive video instruction. International Journal of Instructional Media, 21 (3), 191–197.
Dehar, M. A., Casswell, S., & Duignan, P. (1993). Formative and process evaluation of health promotion and disease prevention programs. Evaluation Review Journal, 17 (2), 204–220.
Dick, W. (1980). Formative evaluation in instructional development. Journal of Instructional Development, 3 (3), 2–6.
Dick, W., & Carey, L. (1985). The systematic design of instruction (2nd ed.). Glenview, IL: Scott, Foresman.
Dillman, D. A. (1978). Mail and telephone surveys: The total design method. New York: John Wiley and Sons.
Dennis, M. L., Fetterman, D. M., & Sechrest, L. (1994). Integrating qualitative and quantitative evaluation methods in substance abuse research. Evaluation and Program Planning, 17 (4), 419–427.
Fairweather, G., & Tornatzky, L. G. (1977). Experimental methods for social policy research. New York: Pergamon.
Finnegan Jr, J. R., Rooney, B., Viswanath, K., Elmer, P., Graves, K., Baxter, J., Hertog, J., Mullis, R., & Potter, J. (1992). Process evaluation of a home-based program to reduce diet-related cancer risk: The "WIN at Home" Series. Health Education Quarterly, 19 (2), 233–248.
Fitz-Gibbon, C. T., & Morris, L. L. (1978). How to design a program evaluation. Beverly Hills, CA: Sage Publications.
Flagg, B. N. (1990). Formative evaluation for educational technologies. Hillsdale, NJ: Lawrence Erlbaum Associates.
Flay, B. R. (1986). Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventive Medicine, 15, 451–474.
Foshee, V., McLeroy, K. R., Sumner, S. K., & Bibeau, D. L. (1986). Evaluation of worksite weight loss programs: A review of data and issues. Journal of Nutrition Education, 18 (1), S38–S43.
Geis, G. L. (1987). Formative evaluation: Developmental testing and expert review. Performance & Instruction, May/June, 1–8.
Gillespie, A., & Achterberg, C. (1989). Comparison of family interaction patterns related to food and nutrition. Journal of the American Dietetic Association, 89 (4), 509–512.
Glanz, K., & Seewald-Klein, T. (1986). Nutrition at the worksite: An overview. Journal of Nutrition Education, 18 (1), S1–S12.
Glanz, K., Sorensen, G., & Farmer, A. (1996). The health impact of worksite nutrition and cholesterol intervention programs. American Journal of Health Promotion, 10 (6), 453–470.
Hausman, J. A., & Wise, D. A. (1985). Social experimentation. Chicago: The University of Chicago Press.
Hill, M., May, J., Coppolo, D., & Jenkins, P. (1993). Long term effectiveness of a respiratory awareness program for farmers. NIFS Paper No. 93-3. Columbia, MO: National Institute for Farm Safety, Inc. NIFS Summer Meeting, Coeur d'Alene, Idaho.
Health Habits and History Questionnaire: Diet history and other risk factors (1989). Personal computer system packet, Version 2.2. Washington, DC: National Cancer Institute, Division of Cancer Prevention and Control, National Institutes of Health.
Houts, S. A. (1988). Lactose intolerance. Food Technology, 42 (3), 110–113.
Iszler, J., Crockett, S., Lytle, L., Elmer, P., Finnegan, J., Luepker, R., & Laing, B. (1995). Formative evaluation for planning a nutrition intervention: Results from focus groups. Journal of Nutrition Education, 27 (3), 127–132.
Jacobs Jr, D. R., Luepker, R. V., Mittelmark, M. B., Folsom, A. R., Pirie, P. L., Mascioli, S. R., Hannan, P. J., Pechacek, T. F., Bracht, N. F., Carlaw, R. W., Kline, F. G., & Blackburn, H. (1986). Community-wide prevention strategies: Evaluation design of the Minnesota Heart Health Program. Journal of Chronic Disease, 39 (10), 775–788.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11 (1), 1–47.
Johnson, C. C., Osganian, S. K., Budman, S. B., Lytle, L. A., Barrera, E. P., Bonura, S. R., Wu, M. C., & Nader, P. R. (1994). CATCH: Family process evaluation in a multicenter trial. Health Education Quarterly, Supplement 2, S91–S106.
Kandaswamy, S., Stolovitch, H. D., & Thiagarajan, S. (1976). Learner verification and revision: An experimental comparison of two methods. AV Communication Review, 24 (3), 316–328.
Kaufman, R. (1980). A formative evaluation of formative evaluation: The state of the art concept. Journal of Instructional Development, 3 (3), 1–2.
Kershaw, D., & Fair, J. (1976). The New Jersey income maintenance experiment. New York: Academic Press.
Kishchuk, N., Peters, C., Towers, A. M., Sylvestre, M., Bourgault, C., & Richard, L. (1994). Formative and effectiveness evaluation of a worksite program promoting healthy alcohol consumption. American Journal of Health Promotion, 8 (5), 353–362.
Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage.
Lenihan, K. (1976). Opening the second gate. Washington, DC: U.S. Government Printing Services.
Lewin, K. (1943). Forces behind food habits and methods of change. In The problem of changing food habits, National Research Council Bulletin 108 (pp. 35–65). Washington, DC: National Academy of Sciences.
Markle, S. M. (1979). Evaluating instructional programs: How much is enough? NSPI Journal, Feb, 22–24.
Markle, S. M. (1989). The ancient history of formative evaluation. Performance and Instruction, Aug, 27–29.
McGraw, S. A., McKinley, S. A., McClements, L., Lasater, T. M., Assaf, A., & Carleton, R. A. (1989). Methods in program evaluation: The process evaluation system of the Pawtucket Heart Health Program. Evaluation Review, 13 (5), 459–483.
McGraw, S. A., Stone, E. J., Osganian, S. K., Elder, J. P., Johnson, C. C., Parcel, G. S., Webber, L. S., & Luepker, R. V. (1994). Design of process evaluation within the Child and Adolescent Trial for Cardiovascular Health (CATCH). Health Education Quarterly, Supplement 2, S5–S26.
Montague, W. E., Ellis, J. A., & Wulfeck, W. H. (1983). Instructional quality inventory: A formative evaluation tool for instructional development. Performance and Instruction Journal, 22 (5), 11–14.
Nathenson, M. B., & Henderson, E. S. (1980). Using student feedback to improve learning materials. London: Croom Helm.
National Institutes of Health (1985). Surgeon general's report on nutrition and health. U.S. Department of Health and Human Services, Public Health Service (Chapter 7, pp. 311–343). Washington, DC: U.S. Government Printing Service.
Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84 (3), 231–259.
Parrott, R., Steiner, C., & Godenhar, L. (1996). Georgia's Harvesting Healthy Habits: A formative evaluation. The Journal of Rural Health, 12 (4), 291–300.
Patton, M. Q. (1978). Utilization-focused evaluation. Beverly Hills, CA: Sage.
Patton, M. Q. (1982). Practical evaluation. Beverly Hills, CA: Sage.
Patton, M. Q. (1994). Developmental evaluation. Evaluation Practice, 15 (3), 311–319.
Patton, M. Q. (1996). A world larger than formative and summative. Evaluation Practice, 17 (2), 131–144.
Pelletier, K. R. (1996). A review and analysis of the health and cost-effective outcome studies of comprehensive health promotion and disease prevention programs at the worksite: 1993–1995 update. American Journal of Health Promotion, 10 (5), 380–388.
Pelz, E. B. (1959). Some factors in group decision. In E. E. Maccoby, T. M. Newcomb & E. L. Hartley (Eds.), Readings in social psychology (3rd ed., pp. 212–219). New York: Holt, Rinehart and Winston.
Peterson, K. A., & Bickman, L. (1988). Program personnel: The missing ingredient in describing the program environment. In K. J. Conrad & C. Roberts-Gray (Eds.), Evaluating program environments. San Francisco, CA: Jossey-Bass.
Potter, J. D., Graves, K. L., Finnegan, J. R., Mullis, R. M., Baxter, J. S., Crockett, S., Elmer, P. J., Gloeb, B. D., Hall, N. J., Hertog, J., Pirie, P., Richardson, S. L., Rooney, B., Slavin, J., Snyder, M. P., Splett, P., & Viswanath, K. (1990). The Cancer and Diet Intervention Project: A community-based intervention to reduce nutrition-related risk of cancer. Health Education Research, 5 (4), 489–503.
Rightwriter (1990). Version 3.1. Sarasota, FL: RightSoft, Inc.
Robins, P. K., Spiegelman, R. G., Weiner, S., & Bell, J. G. (1980). A guaranteed annual income: Evidence from a social experiment. New York: Academic Press.
Rossi, P. H., & Lyall, K. (1976). Reforming public welfare. New York: Russell Sage.
Rossi, P. H., & Freeman, H. E. (1982). Evaluation: A systematic approach (p. 69). Beverly Hills, CA: Sage Publications.
Russell, J. D., & Blake, B. L. (1988). Formative and summative evaluation of instructional products and learners. Educational Technology, 28 (9), 22–28.
SAS Proprietary Software, Release 6.09 (1989). Cary, NC: SAS Institute, Inc.
Scanlon, E. (1981). Evaluating the effectiveness of distance learning: A …
Seidel, R. E. (1993). Notes from the field in communication for child survival. Washington, DC: USAID.
Stufflebeam, D. L. (1983). The CIPP model for program evaluation. In G. Madaus, M. Scriven & D. Stufflebeam (Eds.), Evaluation models: Viewpoints on educational and human services evaluation. Boston: Kluwer-Nijhoff.
Tessmer, M. (1993). Planning and conducting formative evaluations. London: Kogan Page.
Thiagarajan, S. (1991). Formative evaluation in performance technology. Performance Improvement Quarterly, 4 (2), 22–34.
Wager, J. C. (1983). One-to-one and small group formative evaluation: An examination of two basic formative evaluation procedures. Performance and Instruction, 22 (5), 5–7.
Walden, O. (1989). The relationship of dietary and supplemental calcium intake to bone loss and osteoporosis. Journal of the American Dietetic Association, 89 (3), 397–400.
Weston, C. B. (1986). Formative evaluation of instructional materials: An overview of approaches. …
Canadian Journal of Educational Communi- case study. In F. Percival & H. Ellington, Aspects of educational tech- cation, 15 (1), 5±17. nology: Vol. XV: Distance learning and evaluation (pp. 164±171). Weston, C. B. (1987). The importance of involving experts and learners in London: Kogan Page. formative evaluation. Canadian Journal of Educational Communica-Scheirer, M. A. (1994). Designing and using process evaluation. In J. S. tions, 16 (1), 45±58. Wholey, H. Hatry & K. Newcomer, Handbook of practical program Wilkinson, T. L., Schuler, R. T., & Skjolaas, C. A. (1993). The effect of evaluation (pp. 40±68). San Francisco: Jossey-Bass. safety training and experience of youth tractor operators. National Insti-Scheirer, M. A., & Rezmovic, E. L. (1983). Measuring the degree of tute for Farm Safety, Inc. NIFS Paper No. 93±6. Columbia, MO. NIFS program implementation. Evaluation Review, 7 (5), 599±633. Summer Meeting, Coeur dAlene, Idaho.Schneider, M. L., Ituarte, P., & Stokols, D. (1993). Evaluation of a commu- Witte, K., Peterson, T. R., Vallabhan, S., Stephenson, M. T., Plugge, C. D., nity bicycle helmet promotion campaign: What works and why. Amer- Givens, V. K., Todd, J. D., Bechtold, M. G., Hyde, M. K., & Jarrett, R. ican Journal of Health Promotion, 7 (4), 281±287. (1992/3). Preventing tractor-related injuries and deaths in rural popula-Scriven, M. (1967). The methodology of evaluation. In R. Tyler, R. Gagne tions: Using a persuasive health message framework in formative & M. Scriven, Perspectives of curriculum evaluation (pp. 39±83). evaluation research. International Quarterly of Community Health Chicago: Rand McNally. Education, 13 (3), 219±251.