Washington Evaluators Brown Bag
by Stephanie Shipman and Valerie Caracelli
The desire to improve federal government performance and accountability has led to calls to focus federal funds on approaches found effective through rigorous evaluation. While a randomized experiment is considered a highly rigorous approach for assessing the effectiveness of health and social service interventions, it is not the only rigorous method available and it is not always feasible. Stephanie Shipman and Valerie Caracelli will present the findings of their recent report (GAO-10-30) on what types of interventions are best suited to randomized experiments and what other rigorous approaches are available for assessing program effectiveness.
Dr. Stephanie Shipman is an Assistant Director of the Center for Evaluation Methods and Issues in the Applied Research and Methods Team at the U.S. Government Accountability Office (GAO). Over the past several years she has directed studies of federal agencies' performance measurement and program evaluation activities, and methods for solving analytic challenges in program performance assessment. She will speak to us about the GAO report released last November, Program Evaluation: A Variety of Rigorous Methods Can Help Identify Effective Interventions. Stephanie currently serves on the American Evaluation Association’s Evaluation Policy Task Force, and in 2008, received the Association’s Alva and Gunnar Myrdal Government Award for evaluation work that has been highly influential in government. Dr. Shipman is a founding member of and coordinator for the Federal Evaluators group, an informal network of evaluation officials.
Dr. Valerie Caracelli is a Senior Analyst in the Center for Evaluation Methods and Issues in the Applied Research and Methods Team at the U.S. Government Accountability Office (GAO). Valerie assisted Stephanie on this project; she also consults with GAO content-area teams on evaluation design issues and helps teams examine the quality of evaluation studies. Valerie recently served on the Board of the American Evaluation Association and is on the Board of the Washington Evaluators.
103-Basco Identifying metrics to fully assess the impact of federal R&D inves... (innovationoecd)
The document summarizes a study that used a Delphi method consensus process with panels of US government officials and researchers to identify metrics for measuring the impact of federal R&D investments. The panels rated 58 proposed metrics across academia, government, economy and society in three rounds. 33 metrics met criteria to be considered useful by both panels. 16 metrics were endorsed by both panels, while 7 were endorsed just by government officials and 10 just by researchers. The endorsed metrics covered areas like new products, funding, health outcomes, behavior changes and more.
The document discusses evaluation of health programs. It defines evaluation as the systematic acquisition and assessment of information to provide useful feedback. The main goals of evaluation are to influence decision-making and policy formulation through empirically-driven feedback. Formative evaluation assesses needs and implementation, while summative evaluation determines outcomes, impacts, costs and benefits. Evaluation questions, methods, and frameworks are described to establish program merit, worth and significance based on credible evidence from stakeholders. Standards ensure evaluations are useful, feasible, proper and accurate.
Can systematic reviews help identify what works and why? (Carina van Rooyen)
This document discusses systematic reviews (SRs) as a tool to evaluate the impact of development interventions. It notes calls from funders to demonstrate what works using evidence-based approaches. While randomized controlled trials (RCTs) are often advocated, SRs are presented as a way to overcome some of RCTs' limitations. The document summarizes a SR conducted by the authors on the impact of microfinance in sub-Saharan Africa. It took a pragmatic approach, including a variety of study designs and developing a causal pathway to understand impact. The SR found microfinance has the potential to benefit the poor but also identified challenges, calling for more and better evaluations.
This document discusses monitoring and evaluation concepts for family planning programs. It begins by outlining session objectives related to applying M&E frameworks, indicators, and issues to family planning programs from a post-Cairo perspective. It then provides an overview of topics to be covered, including family planning frameworks, implications of the Cairo agenda, indicators like contraceptive prevalence and unmet need, monitoring quality of care, and linkages between family planning and HIV. The document reviews conceptual frameworks for understanding factors influencing fertility and family planning supply. It discusses applying these frameworks for M&E by examining inputs, outputs, outcomes, and impacts. Specific indicators, data sources, and issues related to monitoring quality of care, contraceptive prevalence, and unmet need are also discussed.
Evolution of Family Planning Impact Evaluation: New contexts and methodologic... (MEASURE Evaluation)
This document discusses the evolution of impact evaluations for family planning programs. It provides historical context on impact evaluations dating back to the 1990s, which primarily used randomized controlled trials and quasi-experimental designs. More recent considerations include theory-based approaches, systems-based approaches, and implementation science to evaluate family planning programs. The document recommends accepting a wide range of evaluation designs that meet, but do not exceed, stakeholder needs.
I gave this talk at a Nigeria Health Summit in March 2016. It was an introduction to impact evaluation: what it is, when it's a good idea, and some possible approaches.
Errors Found in National Evaluation of Upward Bound: Positive Re-Analysis Results (CHEARS)
Presentation to the Council for Opportunity in Education (COE) documents errors in the National Evaluation of Upward Bound reports. Eight major errors are identified, and results are summarized from a re-analysis that corrected for sampling and non-sampling errors and found strong positive impacts for the federal TRIO program.
The role of Monitoring and Evaluation in Improving Public Policies – Challeng... (UNDP Policy Centre)
Presentation by IPC-IG's Research Coordinator, Fábio Veras Soares, at the "International Conference on the Institutionalization of Public Policies Evaluation", held in Rabat on 5 October.
Impact evaluation aims to systematically and objectively assess the causal effects of development programs and policies. It helps determine whether interventions are cost-effective and achieving their intended impacts. Impact evaluations use quantitative and qualitative methods to estimate what would have occurred in the absence of the program by identifying valid counterfactual comparisons. Key challenges include self-selection bias, which can be addressed through experimental designs that introduce randomness, through natural experiments, or through statistical techniques that reduce observable differences between treatment and control groups.
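The logic of self-selection bias versus randomization can be sketched in a few lines of simulation. This is a hypothetical illustration, not drawn from the presentation; the effect sizes and the "motivation" confounder are invented for the sketch:

```python
import random

random.seed(0)

# Hypothetical setup: the true program effect on an outcome score is +5,
# but the outcome also depends on an unobserved trait ("motivation").
TRUE_EFFECT = 5.0

def outcome(motivation, treated):
    return 50 + 10 * motivation + (TRUE_EFFECT if treated else 0) + random.gauss(0, 1)

people = [random.random() for _ in range(10_000)]  # motivation in [0, 1)

# Self-selection: more motivated people opt into the program, so a naive
# treated-vs-untreated comparison confounds motivation with the program.
sel_t = [outcome(m, True) for m in people if m > 0.5]
sel_c = [outcome(m, False) for m in people if m <= 0.5]
naive = sum(sel_t) / len(sel_t) - sum(sel_c) / len(sel_c)  # biased well above 5

# Randomization: a coin flip assigns treatment, so the two groups are
# comparable on motivation and the difference isolates the program effect.
assign = [(m, random.random() < 0.5) for m in people]
rct_t = [outcome(m, True) for m, t in assign if t]
rct_c = [outcome(m, False) for m, t in assign if not t]
rct = sum(rct_t) / len(rct_t) - sum(rct_c) / len(rct_c)  # close to 5

print(f"naive comparison: {naive:.1f}, randomized estimate: {rct:.1f}")
```

The naive comparison roughly doubles the true effect because the treated group is systematically more motivated; random assignment removes that imbalance, which is the counterfactual logic the paragraph above describes.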
This document outlines a model toolkit for conducting impact evaluations. It discusses key concepts in impact evaluation including definitions of impact, theories of change, causal attribution, and mixed methods approaches. The document proposes an ontological framework to guide impact assessment planning, covering aspects like subject area, target groups, research design, sampling, data collection and analysis methods. It describes experimental, quasi-experimental and non-experimental research designs for addressing causal attribution and achieving credible results. The goal is to integrate monitoring, evaluation and research from the beginning to generate a range of evidence and understand both outcomes and impacts of interventions over time.
How Do We Evaluate That? Evaluation in the Uncontrolled World (MEASURE Evaluation)
This document summarizes the main challenges to conducting rigorous impact evaluations of global health programs operating in real-world settings. It discusses several proposed approaches to address these challenges, including adequacy, plausibility and probability assessments, national evaluation platforms, step-wise approaches, and mixed methods. Common themes across different approaches include starting with a program's causal model, measuring progress along the causal pathway including implementation, using multiple existing and new data sources, and focusing on producing actionable results for decision making.
CORE Group Fall Meeting 2010. Family Planning Integration: Overcoming Barriers to NGO Programming. A Presentation of Preliminary Results from the CORE Group CBFP/MCH Integration Survey. - Paige Anderson Bowen, CORE Group Consultant
This document provides an overview of strategic prevention frameworks for substance abuse prevention. It discusses using data-driven strategic planning processes to build prevention systems at the community level. The key aspects covered include assessing community needs through data, setting goals and outcomes to reduce substance abuse, building community capacity, developing strategic plans, implementing programs and policies, and evaluating efforts to achieve goals and guide future work.
Development of a Compendium of Effective Structural Interventions for HIV Pre... (CDC NPIN)
This document discusses the development of a compendium of effective structural HIV prevention interventions. It describes the process of establishing expert criteria for inclusion, selecting 18 interventions, and preparing detailed entries on the rationale, implementation, evaluation, and lessons of each. The interventions addressed policies/laws, resource provision, and social marketing. Key themes included emergence from community needs, evolving implementation, and the difficulty of evaluating complex structural interventions. The compendium aims to facilitate replication and adaptation of evidence-based structural approaches.
Evaluation: Lessons Learned for the Global Health Initiative (MEASURE Evaluation)
This document summarizes lessons learned from evaluations of global health programs. It discusses challenges with evaluation designs and provides examples of evaluations in Kenya, Tanzania, Nigeria, and Bangladesh. Key lessons are the importance of clear program descriptions, considering impact pathways, assessing implementation, combining quantitative and qualitative data, and focusing on using findings to inform programs.
An overview of impact evaluation for organizations based on a program's Theory of Change, highlighting the need for a counterfactual and randomization (when possible) in order to convincingly demonstrate the effect of the program.
The document discusses the use of cost-benefit analysis in justice policymaking. It describes how the Vera Institute of Justice's Cost-Benefit Analysis Unit helps policymakers evaluate the economic costs and benefits of criminal justice programs and policies. It provides examples of cost-benefit studies that have found some evidence-based programs reduce recidivism and generate cost-savings, while incarceration is only cost-effective for serious offenders. The document encourages the use of cost-benefit analysis to inform decision-making and identifies resources for further information.
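The arithmetic behind a basic cost-benefit comparison of this kind is straightforward. As a hedged sketch, all figures below are invented for illustration and are not drawn from the Vera Institute's studies:

```python
# Hypothetical cost-benefit sketch for a recidivism-reduction program.
# Every number here is invented; none come from Vera's actual estimates.
program_cost_per_participant = 4_000.0
baseline_recidivism = 0.40      # re-offense rate without the program
treated_recidivism = 0.32       # re-offense rate with the program
cost_per_reoffense = 60_000.0   # assumed taxpayer and victim costs per re-offense

avoided = baseline_recidivism - treated_recidivism    # 0.08 re-offenses avoided
benefit = avoided * cost_per_reoffense                # expected benefit per participant
net_benefit = benefit - program_cost_per_participant  # positive means cost-saving
bc_ratio = benefit / program_cost_per_participant     # benefit-cost ratio

print(f"benefit per participant: ${benefit:,.0f}, B/C ratio: {bc_ratio:.2f}")
```

With these invented inputs the program returns about $1.20 per dollar spent; a real study would also discount future benefits and quantify uncertainty in the recidivism estimates.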
This document provides guidance for staff at the Botswana Christian AIDS Intervention Program (BOCAIP) on writing proposals to access funding. It defines key terms related to project design and evaluation. It also offers tips on finding donors, including government agencies, international NGOs, corporations, family foundations, and churches. The document provides guidance on conducting needs assessments, developing the components of a funding proposal, including logical frameworks, implementation schedules and budgets. It highlights where to find additional resources and information.
Improving measurement through Operations Research (jehill3)
Improving measurement through Operations Research
Peter Winch, Johns Hopkins Bloomberg School of Public Health
CORE Group Spring Meeting, April 28, 2010
When It Comes from the People: The Effects of Reforming Ballot Initiative Exp... (Robert Richards)
Obstacles to citizens’ understanding of the legal content of ballot measures were thought to lie in inconsistencies between the communicative practices of official summaries of such measures and citizens’ own legal communicative practices. Theory suggested that revising official ballot-measure summaries to include elements of citizens’ legal communicative practices would enhance citizens’ understanding of the legal content of ballot measures, confidence in that understanding, and confidence in their voting decision about such measures, and that intrapersonal reflection would mediate those effects. Those predictions were tested in a controlled experiment. Results showed no evidence of knowledge gains, but subjects exposed to a description of the policy objectives of a ballot measure showed significant increases in voting confidence. A subset of those subjects also experienced significant increases in knowledge confidence. Many findings were consistent with theories of sense-making (Dervin & Frenette, 2001), framing (Scheufele, 1999, 2000), and the theory of reasoned action (Ajzen & Fishbein, 1977), and with the prediction that a “means-end” schema operates in citizens’ minds during deliberations about proposed laws.
BSCHUCH Culminating project, final post-defense (Brittany Schuch)
This document summarizes an evaluation of Ohio's Surface Water Contact Advisory Removal Program. The evaluation was conducted to develop recommendations for a more consistent process to reevaluate and remove contact advisories from contaminated waterways. Through interviews with Ohio Department of Health decision-makers and a survey of other Midwestern state programs, the evaluation aims to determine the key factors considered when deciding to remove an advisory. A review of past Ohio case studies also identifies similarities and differences in how sites were evaluated for remediation. The results will help Ohio agencies streamline the advisory removal process and clarify the information needed to make removal decisions.
Performance Partnership Case Presentation: Evaluation @ EPA (Nick Hart, Ph.D.)
This document summarizes a case study on evaluations of the National Environmental Performance Partnership System (NEPPS) at the U.S. Environmental Protection Agency (EPA). Key points:
- NEPPS gives states flexibility in implementing environmental programs and allocating grant funds, but its goals are unclear and vary by state, which limits evaluations.
- Only three formal evaluations of NEPPS have been conducted in 20 years; the annual "joint evaluations" between states and EPA are compliance reports, not true evaluations.
- Future opportunities for NEPPS evaluation include clarifying design to support evaluation, strengthening joint evaluation protocols, and developing systems to analyze outcomes.
- Both barriers to and facilitators of evaluation capacity exist at EPA.
1) The document discusses citizen engagement and the political-bureaucratic divide in government. It covers topics like client satisfaction, citizen-centered approaches, and measuring user satisfaction to improve government services.
2) A key difference between public and private sectors is that government provides public services that are non-excludable and non-rival, making user satisfaction difficult to measure. Citizen-centered approaches aim to empower citizens and ensure accountability.
3) Maintaining the distinct roles of elected political staff and unelected public servants is important to uphold democratic principles. Public servants must give unbiased advice while respecting the ultimate decision-making of politicians.
Washington Evaluators (WE) is a local affiliate of the American Evaluation Association (AEA). WE was founded over 30 years ago as a professional society devoted to fostering state-of-the-art knowledge and information sharing.
The Utilization of DHHS Program Evaluations: A Preliminary Examination
Washington Evaluators Brown Bag
by Andrew Rock and Lucie Vogel
October 5, 2010
The presentation will describe a study of the utilization of program evaluations in the Department of Health and Human Services (HHS), conducted by the Lewin Group for the Office of the Assistant Secretary for Planning and Evaluation. The study used an online survey of project officers and managers, drawing on a sample of program evaluations selected from the Policy Information Center database. To supplement the survey data, Lewin conducted focus groups with senior staff in six agencies. Key findings addressed direct, conceptual, and indirect use of evaluations, and highlighted the importance of high-quality methods, stakeholder involvement in evaluation design, the presence of a champion, and study findings that were perceived to be important. The study concluded with recommendations for a strengthened internal evaluation group within HHS and for future research using a case study approach for more in-depth examination.
Mr. Andrew Rock conceived the study and served as its Project Officer (COTR). He works in the Office of Planning and Policy Support in the Office of the Assistant Secretary for Planning and Evaluation (ASPE), HHS. He is responsible for the Department's annual comprehensive report to Congress on HHS evaluations, coordinates the HHS legislative development process, represents his office on the Continuity of Operations Workgroup, and has worked on various cross-cutting issues, including homelessness, tribal self-governance, and health reform. In addition to his work in ASPE, he has worked at the Centers for Medicare and Medicaid Services, the Public Health Service, and the Office of the National Coordinator for Health Information Technology.
Ms. Lucie Vogel served as a Stakeholder Committee member for the study. She works in the Division of Planning, Evaluation and Research at the Indian Health Service, where she develops Strategic and Health Service Master Plans, conducts evaluation studies, and reports on agency performance. She previously held evaluation and planning positions at the Food Safety and Inspection Service, the Virginia Department of Rehabilitative Services, the University of Virginia, and the Wisconsin Department of Health and Social Services.
The assessment of the factors that are supporting or impeding the adoption of the evidence-based practice
Several factors are associated with the failure to the successful adoption of EBP. The implementation of EBP for example in healthcare facilities requires the dedication of time. Therefore, lack of adequate time for the training and implementation of the EBP makes it hard to adopt it within the facility. The adoption of evidence-based practice also requires adequate resources. This, therefore, implies that there must be adequate resources to facilitate the effective implementation and the adoption of the EBP. This, therefore, implies that smaller organizations with unstable capital income might not adopt the EBP. Another barrier is the inability to understand the statistical terms or the jargons used in the EBP. This leads to barriers in understanding thus making it hard to implement the EBP (Duncombe, 2018). Therefore, the factors that might support the implementation of the EBP are the availability of resources and adequate time.
References
Duncombe, D. C. (2018). A multi‐institutional study of the perceived barriers and facilitators to implementing evidence‐based practice. Journal of Clinical Nursing,.
This document discusses challenges in evaluating human rights progress and techniques that can help. It notes both benefits and drawbacks to measuring results, and challenges like long timeframes and attribution. A theory-driven approach is recommended to identify pathways and indicators to measure short-term outcomes contributing to long-term goals. Gathering diverse feedback, proxies for data, and transparency are also advised. Ongoing learning approaches focus on understanding program design and connecting activities to intended outcomes.
Highlights from ExL Pharma's Proactive GCP ComplianceExL Pharma
This document summarizes a conference on Good Clinical Practice (GCP) compliance. It discusses the objectives of clinical research and challenges to harmonizing GCP standards internationally. It also outlines current ethical challenges in clinical trials and dimensions of GCP frameworks. The document proposes that further developing national and regional GCP guidance, broadening the scope of GCP, and establishing appropriate platforms can help advance GCP. It provides guidance on investigator responsibilities and comments on adverse event reporting.
This document summarizes a regional workshop on evaluating HIV/AIDS programs. It discusses the objectives and importance of program evaluation, current challenges, and key concepts like monitoring, process evaluation, and outcome/impact evaluation. Different evaluation designs are described based on the type of inference required, from adequacy to probability. Factors to consider in choosing a design include indicators, target audience, and readiness of the program. Experimental and quasi-experimental designs can provide stronger evidence of impact but are more complex.
Similar to A Variety of Rigorous Methods for Assessing Program Effectiveness (20)
The Washington Eval membership survey found:
- Most members joined to learn about evaluation theories/practice and make connections.
- Monthly brown bags, deep dives, and social events are most popular. Preferred times are on-demand, 12-2pm, and 5:30-6pm.
- The weekly digest is most useful for sharing events, jobs, and opportunities.
- Most support increasing dues to $30, offering a two-year option, and auto-renew with opt-in.
- Members are generally satisfied with WE's diversity efforts but want more training and DEI incorporation.
- Many members expressed interest in pro bono, mentoring, and volunteer opportunities.
Are you interested in supporting emerging evaluators and developing the evaluation profession in the Washington, DC area? Are you an emerging evaluator interested in improving your skills and understanding or moving into a different field? This presentation will provide information on ways that Washington Evaluators members can engage in Mentor Minutes.
Mentor Minutes is an initiative that aims to connect current WE members to experienced evaluation professionals in the WE community through short-term mentorship opportunities. The purpose of Mentor Minutes is to pair experienced evaluators (mentors) with aspiring, emerging, or seasoned evaluators (mentees) and establish mutually beneficial professional connections.
George Julnes: Humility in Valuing in the Public Interest - Multiple Methods ...
Roundtable: Contributions of Cost-Effectiveness Studies to Evidence-Based Policymaking
Washington Evaluators and the Bipartisan Policy Center's Evidence-Based Policymaking Initiative are pleased to co-sponsor a roundtable discussion about the contributions of cost-effectiveness studies to informing policy decisions in government. This panel discussion will explore approaches to conducting cost-effectiveness studies, their value and use in government decisions, and practical steps for improving their utility for decision-makers. The distinguished panelists have collectively experienced the generation and use of cost-effectiveness studies from a variety of academic, non-governmental, and governmental positions. We invite you to join us on Tuesday, December 5th at 2 PM for a lively discussion of the implications of cost-effectiveness research on government decision making.
Harry Hatry: Cost-Effectiveness Basics for Evidence-Based Policymaking
The panel discussion will be introduced and chaired by Nick Hart, Director of BPC's Evidence-Based Policymaking Initiative and the 2017 Washington Evaluators President.
Panelists:
Harry Hatry, Distinguished Fellow and Director of the Urban Institute's Public Management Program
George Julnes, Professor in the University of Baltimore's School of Public and International Affairs
Sandy Davis, Senior Advisor to BPC's Evidence-Based Policymaking Initiative
The DC Consortium Student Conference on Evaluation and Policy (SCEP) is a collaboration of universities in the District of Columbia, Northern Virginia and Maryland regions, representing the interests of students aspiring to be evaluators and policy makers. This collaboration aims to provide students with a platform to present their research and engage with evaluation experts in the opportunity-rich region of Washington, D.C., thereby serving as a bridge between students, academia and other evaluation and policy agencies/organizations. In this presentation, students from the Organizing Committee discuss lessons learned from DC SCEP’s inaugural conference. Features of the conference include a keynote address, interdisciplinary panel, and about 30 student presentations. We will highlight lessons learned concerning how the conference served to broker knowledge towards its theme, ‘Advancing Social Justice in Evaluation and Policy Integration’ with Consortium graduate students in the region.
The document summarizes findings from three recent GAO reports on the use of evidence in federal decision making. It discusses the results of GAO's 2017 survey of federal managers which found no significant increase in the use of performance measures or information in decision making. It also summarizes the GAO's assessment of GPRAMA implementation and key findings about quarterly performance reviews and program evaluation from the manager survey. The document concludes with a recommendation that OMB direct each agency to prepare an annual evaluation agenda.
As evaluators, policy makers, and program managers, we want our efforts to solve the problems of the world to be based on the best possible knowledge. Too often, however, that knowledge is not organized in a way that makes it easy to use for decision-making and action. As a result, too many programs fail to meet their potential.
“Causal knowledge mapping” is a technique for integrating and measurably improving knowledge from a broad range of sources. In this webinar, we’ll use real-world examples and interactive conversations to show three kinds of causal knowledge maps that can benefit an evaluation: (1) Collaborative maps to design programs that fit the local situation; (2) Literature maps to identify and improve upon effective practices; (3) Evaluation findings maps for continual improvement.
Partnerships for Transformative Change in Challenging Political Contexts w/ D...
The document summarizes a 4-day course on transformative evaluation held in Santiago, Chile in September 2016. The course was attended by 35 evaluators from several South American countries and focused on how evaluators can contribute to social justice and human rights through their work. It covered the transformative paradigm and questions about incorporating social change into evaluation design. Participants discussed solutions like empowering marginalized communities and forming diverse evaluation teams. The course organizers were flexible in bringing transformative evaluation concepts to different universities and organizations in Chile.
Founded in 1984 with an initial membership of 12 evaluators, the Washington Evaluators (WE) has since grown to a professional and student membership base of more than 200 in the nation's capital. This presentation describes WE's experience in developing and maintaining a community of evaluation practitioners that includes a diverse mix of government, private, and self-employed evaluators as well as prominent evaluators in academia. It discusses the strategies WE uses to foster personal connections and to share information about the evaluation profession with both new and long-time evaluators.
Transitioning from School to Work: Preparing Evaluation Students and New Eval...
Unlike some professions, there is no single path for making the leap from student to new professional to established member of the profession. In large part this is because of the trans-disciplinary nature of the evaluation field and the broad range of professions and sectors (public, non-profit, private) in which evaluation and social science research skills may be useful. This panel will explore the many approaches used by universities in the Washington, DC area to train graduate and undergraduate students in the field of evaluation, and the transition strategies that help students and new evaluators establish themselves in the field. The seven distinguished panelists are all associated with Washington Evaluators and have served in AEA and/or WE leadership positions. Panelists and our Discussant will be asked to address questions such as:
1. In which disciplines/schools at your university would we expect to find courses in evaluation or related to evaluation?
2. What are the components of the evaluation curricula? Do you offer a degree or major field in evaluation?
3. Do you offer hands-on experiences for your students to design and conduct evaluations?
4. Where have your former students worked in the evaluation field, and what kinds of careers have they had?
5. What advice do you have for new evaluators regarding making the shift from school to work in the evaluation field? What types of professional and networking activities would you recommend to further careers in evaluation?
Challenges and Solutions to Conducting High Quality Contract Evaluations for the U.S. Government
Washington Evaluators Brown Bag
July 7, 2015
Presenter: David J. Bernstein
Discussant: Kathryn E. Newcomer
Lessons from World Bank Support for Evidence-Based Policy Making, Presented by Nils Junge on Wednesday, June 17, 2015 from 12 - 1:30 pm in the George Washington University Marvin Center (Room 308).
Since the late 1990s the World Bank has placed greater and greater emphasis on evidence-based policy making, with a specific focus on how the poor and vulnerable are affected. A commonly used approach is ‘Poverty and Social Impact Analysis’ (PSIA), typically undertaken before development projects are approved. PSIAs are implemented with the express purpose of informing public sector reforms in order to mitigate negative distributional impacts. To identify winners and losers of a given policy reform, PSIAs may use or combine various kinds of analysis: statistical, econometric, cost-benefit, social, stakeholder, political economy, etc. Strongly utilization-focused, the evaluation process is often as important as the analytical work itself. After introducing PSIA methods, the presenter will share practical lessons from 12 years conducting PSIAs and some of the challenges inherent in this exciting area of evaluation.
Nils Junge works internationally as an independent evaluator and policy advisor. In addition to advising the World Bank and government counterparts on addressing reform impacts, he has conducted evaluations for over 20 clients in Africa, Eastern Europe and the Middle East/North Africa. Multi-lingual, he has worked in 5 languages. He has an MA from Johns Hopkins – School of Advanced International Studies (SAIS).
This document outlines the current state of monitoring and evaluation (M&E) in Tajikistan. It discusses the country's background, M&E system and players, possibilities and limitations. It also describes Tajikistan's National M&E Network, which was established in 2008 and includes over 100 members. The Network aims to share information, expand partnerships, and build M&E capacity in Tajikistan through activities like attending international conferences and developing local language resources. Overall, the document provides an overview of M&E practice in Tajikistan and the goals of the National M&E Network to further develop the field.
The Kyrgyz Republic established a national monitoring and evaluation (M&E) system beginning in the 2000s. As strategic planning increased the need for M&E and non-governmental organization involvement, the government began including M&E sections in programs and strategies from 2000 onward. A National M&E Network was formed in 2007 by NGOs and individuals to support M&E system development. While M&E practices were adopted, implementation has faced challenges of disconnected data collection across agencies and a lack of public input. The Network works to strengthen professional evaluation through training, publications, and events to help address these challenges and further establish M&E in governance.
Ann K. Emery gave a brown bag presentation on visualizing evaluation results to the Washington Evaluators on September 15, 2014 at George Washington University. The presentation highlighted tips for creating effective data visualizations including using intentional color schemes, ensuring visuals are accessible on websites and social media, and using checklists to guide design. Emery emphasized the importance of visualizing both qualitative and quantitative evaluation findings to tell compelling stories with data.
Influencing Evaluation Policy and Practice: The American Evaluation Association's Evaluation Policy Task Force by Cheryl J. Oros, Ph.D., Consultant to the Evaluation Policy Task Force
A Variety of Rigorous Methods for Assessing Program Effectiveness
1. United States Government Accountability Office
GAO
Report to Congressional Requesters
November 2009
PROGRAM EVALUATION
A Variety of Rigorous Methods Can Help Identify Effective Interventions
GAO-10-30
2. November 2009
PROGRAM EVALUATION
Accountability Integrity Reliability
Highlights of GAO-10-30, a report to congressional requesters
A Variety of Rigorous Methods Can Help Identify Effective Interventions

Why GAO Did This Study
Recent congressional initiatives seek to focus funds for certain federal social programs on interventions for which randomized experiments show sizable, sustained benefits to participants or society. The private, nonprofit Coalition for Evidence-Based Policy undertook the Top Tier Evidence initiative to help federal programs identify interventions that meet this standard.
GAO was asked to examine (1) the validity and transparency of the Coalition’s process, (2) how its process compared to that of six federally supported efforts to identify effective interventions, (3) the types of interventions best suited for assessment with randomized experiments, and (4) alternative rigorous methods used to assess effectiveness. GAO reviewed documents, observed the Coalition’s advisory panel deliberate on interventions meeting its top tier standard, and reviewed other documents describing the processes the federally supported efforts had used. GAO reviewed the literature on evaluation methods and consulted experts on the use of randomized experiments.

What GAO Found
The Coalition’s Top Tier Evidence initiative criteria for assessing evaluation quality conform to general social science research standards, but other features of its overall process differ from common practice for drawing conclusions about intervention effectiveness. The Top Tier initiative clearly describes how it identifies candidate interventions but is not as transparent about how it determines whether an intervention meets the top tier criteria. In the absence of detailed guidance, the panel defined sizable and sustained effects through case discussion. Over time, it increasingly obtained agreement on whether an intervention met the top tier criteria.
The major difference in rating study quality between the Top Tier and the six other initiatives examined is a product of the Top Tier standard as set out in certain legislative provisions: the other efforts accept well-designed, well-conducted, nonrandomized studies as credible evidence. The Top Tier initiative’s choice of broad topics (such as early childhood interventions), emphasis on long-term effects, and use of narrow evidence criteria combine to provide limited information on what is effective in achieving specific outcomes. The panel recommended only 6 of 63 interventions reviewed as providing “sizeable, sustained effects on important outcomes.” The other initiatives acknowledge a continuum of evidence credibility by reporting an intervention’s effectiveness on a scale of high to low confidence.
The program evaluation literature generally agrees that well-conducted randomized experiments are best suited for assessing effectiveness when multiple causal influences create uncertainty about what caused results. However, they are often difficult, and sometimes impossible, to carry out. An evaluation must be able to control exposure to the intervention and ensure that treatment and control groups’ experiences remain separate and distinct throughout the study.
Several rigorous alternatives to randomized experiments are considered appropriate for other situations: quasi-experimental comparison group studies, statistical analyses of observational data, and—in some circumstances—in-depth case studies. The credibility of their estimates of program effects relies on how well the studies’ designs rule out competing causal explanations. Collecting additional data and targeting comparisons can help rule out other explanations.

GAO concludes that
• requiring evidence from randomized studies as sole proof of effectiveness will likely exclude many potentially effective and worthwhile practices;
• reliable assessments of evaluation results require research expertise but can be improved with detailed protocols and training;
• deciding to adopt an intervention involves other considerations in addition to effectiveness, such as cost and suitability to the local community; and
• improved evaluation quality would also help identify effective interventions.

What GAO Recommends
GAO makes no recommendations. The Coalition generally agreed with the findings. The Departments of Education and Health and Human Services provided technical comments on a draft of this report. The Department of Justice provided no comments.

View GAO-10-30 or key components. For more information, contact Nancy Kingsbury at (202) 512-2700 or kingsburyn@gao.gov.
United States Government Accountability Office
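The report's central methodological point, that nonrandomized designs are only as credible as their ability to rule out competing causal explanations, can be illustrated with a small simulation. The sketch below is not from the GAO report; the scenario, effect sizes, and variable names (e.g. `motivation` as an unobserved confounder) are invented for demonstration. It compares a naive observational comparison, in which motivated people self-select into a program, against randomized assignment of the same program.

```python
# Illustrative simulation (not from the GAO report): randomized assignment
# removes the self-selection bias that distorts a naive observational
# comparison. All numbers here are invented for demonstration.
import random

random.seed(42)

TRUE_EFFECT = 2.0  # hypothetical benefit of the intervention
N = 20000          # participants per study

def outcome(treated: bool, motivation: float) -> float:
    # The outcome depends on the intervention AND on motivation,
    # a confounder the evaluator does not observe.
    return TRUE_EFFECT * treated + 3.0 * motivation + random.gauss(0.0, 1.0)

def estimate_effect(randomized: bool) -> float:
    treated_y, control_y = [], []
    for _ in range(N):
        motivation = random.random()
        if randomized:
            treated = random.random() < 0.5         # coin-flip assignment
        else:
            treated = random.random() < motivation  # self-selection
        (treated_y if treated else control_y).append(outcome(treated, motivation))
    mean = lambda xs: sum(xs) / len(xs)
    return mean(treated_y) - mean(control_y)

naive_est = estimate_effect(randomized=False)
rand_est = estimate_effect(randomized=True)
print(f"naive observational estimate: {naive_est:.2f}")  # biased upward
print(f"randomized estimate:          {rand_est:.2f}")   # near TRUE_EFFECT
```

The naive comparison attributes the motivated participants' better outcomes to the program itself, overstating its effect; randomization breaks that link. A quasi-experimental comparison group study tries to close the same gap by measuring and adjusting for confounders such as motivation, which is why, as the report notes, its credibility rests on how completely competing explanations have been ruled out.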
3. Contents
Letter 1
Background 3
Top Tier Initiative’s Process Is Mostly Transparent 8
Top Tier Follows Rigorous Standards but Is Limited for Identifying Effective Interventions 13
Randomized Experiments Can Provide the Most Credible Evidence of Effectiveness under Certain Conditions 20
Rigorous Alternatives to Random Assignment Are Available 26
Concluding Observations 31
Agency and Third-Party Comments 32
Appendix I: Steps Seven Evidence-Based Initiatives Take to Identify Effective Interventions 34
Appendix II: Comments from the Coalition for Evidence-Based Policy 37
Appendix III: GAO Contact and Staff Acknowledgments 40
Bibliography 41
Related GAO Products 44
Page i
GAO-10-30 Effective Interventions
4. Abbreviations
AHRQ
CDC
EPC
GPRA
HHS
MPG
NREPP
OMB
PART
PRS
SAMHSA
SCHIP
WWC
Agency for Healthcare Research and Quality
Centers for Disease Control and Prevention
Evidence-based Practice Centers
Government Performance and Results Act of 1993
Department of Health and Human Services
Model Programs Guide
National Registry of Evidence-based Programs and
Practices
Office of Management and Budget
Program Assessment Rating Tool
HIV/AIDS Prevention Research Synthesis
Substance Abuse and Mental Health Administration
State Children’s Health Insurance Program
What Works Clearinghouse
This is a work of the U.S. government and is not subject to copyright protection in the
United States. The published product may be reproduced and distributed in its entirety
without further permission from GAO. However, because this work may contain
copyrighted images or other material, permission from the copyright holder may be
necessary if you wish to reproduce this material separately.
United States Government Accountability Office
Washington, DC 20548
November 23, 2009
The Honorable Joseph I. Lieberman
Chairman
The Honorable Susan M. Collins
Ranking Member
Committee on Homeland Security and Governmental Affairs
United States Senate
The Honorable Mary L. Landrieu
Chairman
Subcommittee on Disaster Recovery
Committee on Homeland Security and Governmental Affairs
United States Senate
Several recent congressional initiatives seek to focus funds in certain
federal social programs on activities for which the evidence of
effectiveness is rigorous—specifically, well-designed randomized
controlled trials showing sizable, sustained benefits to program
participants or society. To help agencies, grantees, and others implement
the relevant legislative provisions effectively, the private, nonprofit
Coalition for Evidence-Based Policy launched the Top Tier Evidence
initiative in 2008 to identify and validate social interventions meeting the
standard of evidence set out in these provisions. In requesting this report,
you expressed interest in knowing whether limiting the search for
effective interventions to those that had been tested against these
particular criteria might exclude from consideration other important
interventions. To learn whether the Coalition’s approach could be valuable
in helping federal agencies implement such funding requirements, you
asked GAO to independently assess the Coalition’s approach. GAO’s
review focused on the following questions.
1. How valid and transparent is the process the Coalition used—
searching, selecting, reviewing, and synthesizing procedures and
criteria—to identify social interventions that meet the standard of
“well-designed randomized controlled trials showing sizable, sustained
effects on important outcomes”?
2. How do the Coalition’s choices of procedures and criteria compare to
(a) generally accepted design and analysis techniques for identifying
effective interventions and (b) similar standards and processes other
federal agencies use to evaluate similar efforts?
3. What types of interventions do randomized controlled experiments
appear to be best suited to assessing effectiveness?
4. For intervention types for which randomized controlled experiments
appear not to be well suited, what alternative forms of evaluation are
used to successfully assess effectiveness?
To assess the Coalition’s Top Tier initiative, we reviewed documents,
conducted interviews, and observed the deliberations of its advisory panel,
who determined which interventions met the “top tier” evidence
standard—well-designed, randomized controlled trials showing sizable,
sustained benefits to program participants or society. We evaluated the
transparency of the initiative’s process against its own publicly stated
procedures and criteria, including the top tier evidence standard. To
assess the validity of the Coalition’s approach, we compared its
procedures and criteria to those recommended in program evaluation
textbooks and related publications, as well as to the processes actually
used by six federally supported initiatives with a similar purpose to the
Coalition. Through interviews and database searches, we identified six
initiatives supported by the U.S. Department of Education, Department of
Health and Human Services (HHS), and Department of Justice that also
conduct systematic reviews of evaluation evidence to identify effective
interventions. 1 We ascertained the procedures and criteria these federally
supported efforts used from interviews and document reviews.
We identified the types of interventions for which randomized controlled
experiments—the Coalition’s primary evidence criterion—are best suited
and alternative methods for assessing effectiveness by reviewing the
program evaluation methodology literature and by having our summaries
of that literature reviewed by a diverse set of experts in the field. We
obtained reviews from seven experts who had published on evaluation
methodology, held leadership positions in the field, and had experience in
diverse subject areas and methodologies.
We conducted this performance audit from May 2008 through November
2009 in accordance with generally accepted government auditing
standards. Those standards require that we plan and perform the audit to
1 In addition, the federal Interagency Working Group on Youth Programs Web site www.findyouthinfo.gov provides interactive tools and other resources to help youth-serving organizations assess community assets, identify local and federal resources, and search for evidence-based youth programs.
obtain sufficient, appropriate evidence to provide a reasonable basis for
our findings and conclusions based on our audit objectives. We believe
that the evidence obtained provides a reasonable basis for our findings
and conclusions based on our audit objectives.
Background
Over the past two decades, several efforts have been launched to improve
federal government accountability and results, such as the strategic plans
and annual performance reports required under the Government
Performance and Results Act of 1993 (GPRA). The act was designed to
provide executive and congressional decision makers with objective
information on the relative effectiveness and efficiency of federal
programs and spending. In 2002, the Office of Management and Budget
(OMB) introduced the Program Assessment Rating Tool (PART) as a key
element of the budget and performance integration initiative under
President George W. Bush’s governmentwide Management Agenda. PART
is a standard set of questions meant to serve as a diagnostic tool, drawing
on available program performance and evaluation information to form
conclusions about program benefits and recommend adjustments that may
improve results.
The success of these efforts has been constrained by lack of access to
credible evidence on program results. We previously reported that the
PART review process has stimulated agencies to increase their evaluation
capacity and available information on program results. 2 After 4 years of
PART reviews, however, OMB rated 17 percent of 1,015 programs "results
not demonstrated"—that is, found that they did not have acceptable performance
goals or performance data. Many federal programs, while tending to have limited
evaluation resources, require program evaluation studies, rather than
performance measures, in order to distinguish a program’s effects from
those of other influences on outcomes.
Program evaluations are systematic studies that assess how well a
program is working, and they are individually tailored to address the
client’s research question. Process (or implementation) evaluations assess
the extent to which a program is operating as intended. Outcome
evaluations assess the extent to which a program is achieving its outcome-
2 GAO, Program Evaluation: OMB's PART Reviews Increased Agencies' Attention to Improving Evidence of Program Results, GAO-06-67 (Washington, D.C.: October 28, 2005), p. 28.
oriented objectives but may also examine program processes to
understand how outcomes are produced. When external factors such as
economic or environmental conditions are known to influence a program’s
outcomes, an impact evaluation may be used in an attempt to measure a
program’s net effect by comparing outcomes with an estimate of what
would have occurred in the absence of the program intervention. A
number of methodologies are available to estimate program impact,
including experimental and nonexperimental designs.
Concern about the quality of social program evaluation has led to calls for
greater use of randomized experiments—a method used more widely in
evaluations of medical than social science interventions. Randomized
controlled trials (or randomized experiments) compare the outcomes for
groups that were randomly assigned either to the treatment or to a
nonparticipating control group before the intervention, in an effort to
control for any systematic difference between the groups that could
account for a difference in their outcomes. A difference in these groups’
outcomes is believed to represent the program’s impact. While random
assignment is considered a highly rigorous approach in assessing program
effectiveness, it is not the only rigorous research design available and is
not always feasible.
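The comparison logic described above can be sketched in a brief simulation. This is an illustrative sketch only: the sample size, outcome scale, and effect size below are arbitrary assumptions, not figures from the report.

```python
import random
import statistics

def simulate_rct(n=1000, true_effect=0.30, seed=7):
    """Sketch of a randomized experiment: random assignment balances
    unobserved differences between groups, so the difference in group
    means estimates the intervention's impact. All numbers are
    illustrative assumptions."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n):
        baseline = rng.gauss(0, 1)      # participant's underlying outcome level
        if rng.random() < 0.5:          # coin-flip assignment to a group
            treatment.append(baseline + true_effect)
        else:
            control.append(baseline)
    # the groups differ systematically only in treatment receipt,
    # so their mean difference is attributed to the intervention
    return statistics.mean(treatment) - statistics.mean(control)

print(round(simulate_rct(), 2))  # close to the true effect, up to sampling noise
```

With a large enough sample, the estimate converges on the true effect; with small samples or differential attrition, the groups' equivalence (and hence the attribution) breaks down, which is the feasibility concern the report raises.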
The Coalition for Evidence-Based Policy is a private, nonprofit
organization that was sponsored by the Council for Excellence in
Government from 2001 until the Council closed in 2009. The Coalition
aims to improve the effectiveness of social programs by encouraging
federal agencies to fund rigorous studies—particularly randomized
controlled trials—to identify effective interventions and to provide strong
incentives and assistance for federal funding recipients to adopt such
interventions. 3 Coalition staff have advised OMB and federal agencies on
how to identify rigorous evaluations of program effectiveness, and they
manage a Web site called “Social Programs That Work” that provides
examples of evidence-based programs to “provide policymakers and
practitioners with clear, actionable information on what works, as
demonstrated in scientifically-valid studies. . . .” 4
3 See Coalition for Evidence-Based Policy, www.coalition4evidence.org.
4 See Coalition for Evidence-Based Policy, Social Programs That Work, www.evidencebasedprograms.org.
In 2008, the Coalition launched a similar but more formal effort, the Top
Tier Evidence initiative, to identify only interventions that have been
shown in “well-designed and implemented randomized controlled trials,
preferably conducted in typical community settings, to produce sizeable,
sustained benefits to participants and/or society.” 5 At the same time, it
introduced an advisory panel of evaluation researchers and former
government officials to make the final determination. The Coalition has
promoted the adoption of this criterion in legislation to direct federal
funds toward strategies supported by rigorous evidence. By identifying
interventions meeting this criterion, the Top Tier Evidence initiative aims
to assist agencies, grantees, and others in implementing such provisions
effectively.
Federally Supported Initiatives to Identify Effective Interventions
Because of the flexibility provided to recipients of many federal grants,
achieving these federal programs’ goals relies heavily on agencies’ ability
to influence their state and local program partners’ choice of activities. In
the past decade, several public and private efforts have been patterned
after the evidence-based practice model in medicine to summarize
available effectiveness research on social interventions to help managers
and policymakers identify and adopt effective practices. The Department
of Education, HHS, and Department of Justice support six initiatives
similar to the Coalition’s to identify effective social interventions. These
initiatives conduct systematic searches for and review the quality of
evaluations of intervention effectiveness in a given field and have been
operating for several years.
We examined the processes used by these six ongoing federally supported
efforts to identify effective interventions in order to provide insight into
the choices of procedures and criteria that other independent
organizations made in attempting to achieve an outcome similar to that of the Top
Tier initiative: identifying interventions with rigorous evidence of
effectiveness. The Top Tier initiative, however, aims to identify not all
effective interventions but only those supported by the most definitive
evidence of effectiveness. The processes each of these initiatives
(including Top Tier) uses to identify effective interventions are
summarized in appendix I.
5 See Coalition for Evidence-Based Policy, Top Tier Evidence, http://toptierevidence.org. The criterion is also sometimes phrased more simply as interventions that have been shown in well-designed randomized controlled trials to produce sizable, sustained effects on important outcomes.
Evidence-Based Practice Centers
In 1997, the Agency for Healthcare Research and Quality (AHRQ)
established the Evidence-based Practice Centers (EPC) (there are
currently 14) to provide evidence on the relative benefits and risks of a
wide variety of health care interventions to inform health care decisions. 6
EPCs perform comprehensive reviews and synthesize scientific evidence
to compare health treatments, including pharmaceuticals, devices, and
other types of interventions. The reviews, with a priority on topics that
impose high costs on the Medicare, Medicaid, or State Children’s Health
Insurance (SCHIP) programs, provide evidence about effectiveness and
harms and point out gaps in research. The reviews are intended to help
clinicians and patients choose the best tests and treatments and to help
policy makers make informed decisions about health care services and
quality improvement. 7
The Guide to Community Preventive Services
HHS established the Guide to Community Preventive Services (the
Community Guide) in 1996 to provide evidence-based recommendations
and findings about public health interventions and policies to improve
health and promote safety. With the support of the Centers for Disease
Control and Prevention (CDC), the Community Guide synthesizes the
scientific literature to identify the effectiveness, economic efficiency, and
feasibility of program and policy interventions to promote community
health and prevent disease. The Task Force on Community Preventive
Services, an independent, nonfederal, volunteer body of public health and
prevention experts, guides the selection of review topics and uses the
evidence gathered to develop recommendations to change risk behaviors,
address environmental and ecosystem challenges, and reduce disease,
injury, and impairment. Intended users include public health professionals,
legislators and policy makers, community-based organizations, health care
service providers, researchers, employers, and others who purchase health
care services. 8
HIV/AIDS Prevention Research Synthesis
CDC established the HIV/AIDS Prevention Research Synthesis (PRS) in
1996 to review and summarize HIV behavioral prevention research
literature. PRS conducts systematic reviews to identify evidence-based
HIV behavioral interventions with proven efficacy in preventing the
6 AHRQ was formerly called the Agency for Health Care Policy and Research.
7 See Agency for Healthcare Research and Quality, Effective Health Care, www.effectivehealthcare.ahrq.gov.
8 See Guide to Community Preventive Services, www.thecommunityguide.org/index.html.
acquisition or transmission of HIV infection (reducing HIV-related risk
behaviors, sexually transmitted diseases, HIV incidence, or promoting
protective behaviors). These reviews are intended to translate scientific
research into practice by providing a compendium of evidence-based
interventions to HIV prevention planners and providers and state and local
health departments for help with selecting interventions best suited to the
needs of the community. 9
Model Programs Guide
The Office of Juvenile Justice and Delinquency Prevention established the
Model Programs Guide (MPG) in 2000 to identify effective programs to
prevent and reduce juvenile delinquency and related risk factors such as
substance abuse. MPG conducts reviews to identify effective intervention
and prevention programs on the following topics: delinquency; violence;
youth gang involvement; alcohol, tobacco, and drug use; academic
difficulties; family functioning; trauma exposure or sexual activity and
exploitation; and accompanying mental health issues. MPG produces a
database of intervention and prevention programs intended for juvenile
justice practitioners, program administrators, and researchers. 10
National Registry of Evidence-based Programs and Practices
The Substance Abuse and Mental Health Services Administration
(SAMHSA) established the National Registry of Evidence-based Programs
and Practices (NREPP) in 1997 to provide the public with information
about the scientific basis and practicality of interventions that prevent or
treat mental health and substance abuse disorders. 11 NREPP reviews
interventions to identify those that promote mental health and prevent or
treat mental illness, substance use, or co-occurring disorders among
individuals, communities, or populations. NREPP produces a database of
interventions that can help practitioners and community-based
organizations identify and select interventions that may address their
particular needs and match their specific capacities and resources. 12
9 See Centers for Disease Control and Prevention, HIV/AIDS Prevention Research Synthesis Project, www.cdc.gov/hiv/topics/research/prs.
10 See Office of Juvenile Justice and Delinquency Prevention Programs, OJJDP Model Programs Guide, www2.dsgonline.com/mpg.
11 It was established as the National Registry of Effective Prevention Programs; it was expanded in 2004 to include mental health and renamed the National Registry of Evidence-based Programs and Practices.
12 See NREPP, SAMHSA's National Registry of Evidence-based Programs and Practices, www.nrepp.samhsa.gov.
What Works Clearinghouse
The Institute of Education Sciences established the What Works
Clearinghouse (WWC) in 2002 to provide educators, policymakers,
researchers, and the public with a central source of scientific evidence on
what improves student outcomes. WWC reviews research on the
effectiveness of replicable educational interventions (programs, products,
practices, and policies) to improve student achievement in areas such as
mathematics, reading, early childhood education, English language, and
dropout prevention. The WWC Web site reports information on the
effectiveness of interventions through a searchable database and summary
reports on the scientific evidence. 13
Top Tier Initiative's Process Is Mostly Transparent
The Coalition provides a clear public description on its Web site of the
first two phases of its process—search and selection to identify candidate
interventions. It primarily searches other evidence-based practice Web
sites and solicits nominations from experts and the public. Staff post their
selection criteria and a list of the interventions and studies reviewed on
their Web site. However, its public materials have been less
transparent about the criteria and process used in the second two phases—
review and synthesis of study results to determine whether
an intervention met the Top Tier criteria. Although the Coalition provides
brief examples of the panel’s reasoning in making Top Tier selections, it
has not fully reported the panel’s discussion of how to define sizable and
sustained effects in the absence of detailed guidance or the variation in
members’ overall assessments of the interventions.
The Top Tier Initiative Clearly Described Its Process for Identifying Interventions
Through its Web site and e-mailed announcements, the Coalition has
clearly described how it identified interventions by searching the strongest
evidence category of 15 federal, state, and private Web sites profiling
evidence-based practices and by soliciting nominations from federal
agencies, researchers, and the general public. Its Web site posting clearly
indicated the initiative’s search and selection criteria: (1) early childhood
interventions (for ages 0–6) in the first phase of the initiative and
interventions for children and youths (ages 7–18) in the second phase
(starting in February 2009) and (2) interventions showing positive results
in well-designed and implemented randomized experiments. Coalition
staff then searched electronic databases and consulted with researchers to
identify any additional randomized studies of the interventions selected
13 See IES What Works Clearinghouse, http://ies.ed.gov/ncee/wwc.
for review. The July 2008 announcement of the initiative included its
August 2007 “Checklist for Reviewing a Randomized Controlled Trial of a
Social Program or Project, to Assess Whether It Produced Valid Evidence.”
The Checklist describes the defining features of a well-designed and
implemented randomized experiment: equivalence of treatment and
control groups throughout the study, valid measurement and analysis, and
full reporting of outcomes. It also defines a strong body of evidence as
consisting of two or more randomized experiments or one large multisite
study.
In the initial phase (July 2008 through February 2009), Coalition staff
screened studies of 46 early childhood interventions for design or
implementation flaws and provided the advisory panel with brief
summaries of the interventions and their results and reasons why they
screened out candidates they believed clearly did not meet the Top Tier
standard. Reasons for exclusion included small sample sizes, high sample
attrition (both during and after the intervention), follow-up periods of less
than 1 year, questionable outcome measures (for example, teachers’
reports of their students’ behavior), and positive effects that faded in later
follow-up. Staff also excluded interventions that lacked confirmation of
effects in a well-implemented randomized study. Coalition staff
recommended three candidate interventions from their screening review;
advisory panel members added two more for consideration after reviewing
the staff summaries (neither of which was accepted as top tier by the full
panel). While the Top Tier Initiative explains each of its screening
decisions to program developers privately, on its Web site it simply posts a
list of the interventions and studies reviewed, along with full descriptions
of interventions accepted as top tier and a brief discussion of a few
examples of the panel’s reasoning. 14
Reviewers Defined the Top Tier Criteria through Case Discussion
The Top Tier initiative’s public materials are less transparent about the
process and criteria used to determine whether an intervention met the
Top Tier standard than about candidate selection. One panel member, the
lead reviewer, explicitly rates the quality of the evidence on each
candidate intervention using the Checklist and rating form. Coalition staff
members also use the Checklist to review the available evidence and
prepare detailed study reviews that identify any significant limitations. The
full advisory panel then discusses the available evidence on the
14 See http://toptierevidence.org.
recommended candidates and holds a secret ballot on whether an
intervention meets the Top Tier standard, drawing on the published
research articles, the staff review, and the lead reviewer’s quality rating
and Top Tier recommendation.
The advisory panel discussions did not generally dispute the lead
reviewer’s study quality ratings (on quality of overall design, group
equivalence, outcome measures, and analysis reporting) but, instead,
focused on whether the body of evidence met the Top Tier standard (for
sizable, sustained effects on important outcomes in typical community
settings). The Checklist also includes two criteria or issues that were not
explicit in the initial statement of the Top Tier standard—whether the
body of evidence showed evidence of effects in more than one site
(replication) and provided no strong countervailing evidence. Because
neither the Checklist nor the rating form provides definitions of how large
a sizable effect should be, how long a sustained effect should last, or what
constituted an important outcome, the panel had to rely on its professional
judgment in making these assessments.
Although a sizable effect was usually defined as one passing tests of
statistical significance at the 0.05 level, panel members raised questions
about whether particular effects were sufficiently large to have practical
importance. The panel often turned to members with subject matter
expertise for advice on these matters. One member cautioned against
relying too heavily on the reported results of statistical tests, because
some studies, by conducting a very large number of comparisons,
appeared to violate the assumptions of those tests and, thus, probably
identified some differences between experimental groups as statistically
significant simply by chance.
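The panel member's caution about many comparisons can be illustrated with a short simulation. The number of outcomes and trial count below are arbitrary assumptions chosen only to show how the chance of at least one false positive grows when many outcomes are each tested at the 0.05 level.

```python
import random

def familywise_false_positive_rate(n_outcomes=20, alpha=0.05, trials=2000, seed=11):
    """Simulates a study with no true effects: each outcome's p-value is
    uniform on [0, 1] under the null, so any p < alpha is a chance finding.
    Returns the fraction of simulated studies with at least one such
    'significant' difference. All parameters are illustrative assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(n_outcomes)):
            hits += 1
    return hits / trials

print(familywise_false_positive_rate())  # roughly 1 - 0.95**20, about 0.64
```

In other words, a study reporting 20 independent comparisons at the 0.05 level has well over a 50 percent chance of flagging some group difference as significant purely by chance, which is why the panel discounted isolated significant results from studies with many comparisons.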
The Checklist originally indicated a preference for data on long-term
outcomes obtained a year after the intervention ended, preferably longer,
noting that “longer-term effects . . . are of greatest policy and practical
importance.” 15 Panel members disagreed over whether effects measured
no later than the end of the second grade—at the end of the intervention—
were sufficiently sustained and important to qualify as top tier, especially
in the context of other studies that tracked outcomes to age 15 or older.
15 Coalition for Evidence-Based Policy, "Checklist for Reviewing a Randomized Controlled Trial of a Social Program or Project, to Assess Whether It Produced Valid Evidence," August 2007, p. 5. http://toptierevidence.org
One panel member questioned whether it was realistic to expect the
effects of early childhood programs to persist through high school,
especially for low-cost interventions; others noted that the study design
did not meet the standard because it did not collect data a year after the
intervention ended. In the end, a majority (but not all) of the panel
accepted this intervention as top tier because the study found that effects
persisted over all 3 program years, and they agreed to revise the language
in the Checklist accordingly.
Panel members disagreed on what constituted an important outcome. Two
noted a pattern of effects in one study on cognitive and academic tests
across ages 3, 5, 8, and 18. Another member did not consider cognitive
tests an important enough outcome and pointed out that the effects
diminished over time and did not lead to effects on other school-related
behavioral outcomes such as special education placement or school dropout. Another member thought it was unreasonable to expect programs for
very young children (ages 1–3) to show an effect on a child at age 18, given
all their other experiences in the intervening years.
A concern related to judging importance was whether and how to
incorporate the cost of the intervention into the intervention assessment.
On one hand, there was no mention of cost in the Checklist or intervention
rating form. On the other hand, panel members frequently raised the issue
when considering whether they were comfortable recommending the
intervention to others. One aspect of this was proportionality: they might
accept an outcome of less policy importance if the intervention was
relatively inexpensive but would not if it was expensive. Additionally, one
panel member feared that an expensive intervention that required a lot of
training and monitoring to produce results might be too difficult to
successfully replicate in more ordinary settings. In the February 2009
meeting, it was decided that program cost should not be a criterion for
Top Tier status but should be considered and reported with the
recommendation, if deemed relevant.
The panel discussed whether a large multisite experiment should qualify
as evidence meeting the replication standard. One classroom-based
intervention was tested by randomly assigning 41 schools nationwide.
Because the unit of analysis was the school, results at individual schools
were not analyzed or reported separately but were aggregated to form one
experimental–control group comparison per outcome measure. Some
panel members considered this study a single randomized experiment;
others accepted it as serving the purpose of a replication, because effects
were observed over a large number of different settings. In this case,
limitations in the original study report added to their uncertainty. Some
panel members stated that if they had learned that positive effects had
been found in several schools rather than in only a few odd cases, they
would have been more comfortable ruling this multisite experiment a
replication.
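The school-level aggregation at issue in that discussion can be sketched as follows. Only the count of 41 randomized schools comes from the report; the pupils-per-school figure, variance components, and effect size are invented for illustration.

```python
import random
import statistics

def school_level_impact(n_schools=41, pupils=60, effect=0.25, seed=3):
    """Sketch of a cluster-randomized design: schools, not pupils, are
    randomly assigned, so pupil outcomes are aggregated to one mean per
    school and the impact estimate is a single treatment-control
    comparison of school means. Parameter values are illustrative."""
    rng = random.Random(seed)
    ids = list(range(n_schools))
    rng.shuffle(ids)                    # randomize at the school level
    treated = set(ids[: n_schools // 2])
    treat_means, control_means = [], []
    for s in range(n_schools):
        shift = rng.gauss(0, 0.3)       # variation shared by a school's pupils
        outcomes = [rng.gauss(shift, 1) + (effect if s in treated else 0)
                    for _ in range(pupils)]
        (treat_means if s in treated else control_means).append(
            statistics.mean(outcomes))
    # one aggregated comparison per outcome measure, as in the report
    return statistics.mean(treat_means) - statistics.mean(control_means)
```

Because the analysis collapses each school to a single mean, nothing in the reported comparison reveals whether effects appeared in most schools or only a few, which is the information the panel members said would have settled the replication question.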
Reviewers Initially Disagreed in Assessing Top Tier Status
Because detailed guidance was lacking, panel members, relying on
individual judgment, arrived at split decisions (4–3 and 3–5) on two of the
first four early childhood interventions reviewed, and only one
intervention received a unanimous vote. Panel members expressed
concern that because some criteria were not specifically defined, they had
to use their professional judgment yet found that they interpreted the
terms somewhat differently. This problem may have been aggravated by
the fact that, as one member noted, they had not had a “perfect winner”
that met all the top tier criteria. Indeed, a couple of members expressed
their desire for a second category, like “promising,” to allow them to
communicate their belief in an intervention’s high quality, despite the fact
that its evidence did not meet all their criteria. In a discussion of their
narrow (4–3) vote at their next meeting (February 2009), members
suggested that they take more time to discuss their decisions, set a
requirement for a two-thirds majority agreement, or ask for votes from
members who did not attend the meeting. The latter suggestion was
countered with concern that absent members would not be aware of their
discussion, and the issue was deferred to see whether these differences
might be resolved with time and discussion of other interventions.
Disagreement over Top Tier status was less of a problem in later reviews,
held in February and July 2009, when none of the votes on Top Tier status
were split decisions and three of seven votes were unanimous.
The Coalition reports that it plans to supplement guidance over time by
accumulating case decisions rather than developing more detailed
guidance on what constitutes sizable and sustained effects. The December
2008 and May 2009 public releases of the results of the Top Tier Evidence
review of early childhood interventions provided brief discussion of
examples of the panel’s reasoning for accepting or not accepting specific
interventions. In May 2009, the Coalition also published a revised version
of the Checklist that removed the preference for outcomes measured a
year after the intervention ended, replacing it with a less specific
reference: "over a long enough period to determine whether the
intervention’s effects lasted at least a year, hopefully longer.” 16
At the February 2009 meeting, Coalition staff stated that they had received
a suggestion from external parties to consider introducing a second
category of “promising” interventions that did not meet the top tier
standard. Panel members agreed to discuss the idea further but noted the
need to provide clear criteria for this category as well. For example, they
said it was important to distinguish interventions that lacked good quality
evaluations (and thus had unknown effectiveness) from those that simply
lacked replication of sizable effects in a second randomized study. It was
noted that broadening the criteria to include studies (and interventions)
that the staff had previously screened out may require additional staff
effort and, thus, resources beyond those of the current project.
Top Tier Follows Rigorous Standards but Is Limited for Identifying Effective Interventions
The Top Tier initiative’s criteria for assessing evaluation quality conform
to general social science research standards, but other features of the
overall process differ from common practice for drawing conclusions
about intervention effectiveness from a body of research. The initiative’s
choice of a broad topic fails to focus the review on how to achieve a
specific outcome. Its narrow evidence criteria yield few recommendations
and limited information on what works to inform policy and practice
decisions.
Review Initiatives Share Criteria for Assessing Research Quality
The Top Tier and all six of the agency-supported review initiatives we
examined assess evaluation quality on standard dimensions to determine
whether a study provides credible evidence on effectiveness. These
dimensions include the quality of research design and execution, the
equivalence of treatment and comparison groups (as appropriate),
adequacy of samples, the validity and reliability of outcome measures, and
appropriateness of statistical analyses and reporting. Some initiatives
included additional criteria or gave greater emphasis to some issues than
others. The six agency-supported initiatives also employed several
features to ensure the reliability of their quality assessments.
In general, assessing the quality of an impact evaluation’s study design and
execution involves considering how well the selected comparison protects
16 Coalition, 2007, p. 5.
against the risk of bias in estimating the intervention’s impact. For random
assignment designs, this primarily consists of examining whether the
assignment process was truly random, the experimental groups were
equivalent before the intervention, and the groups remained separate and
otherwise equivalent throughout the study. For other designs, the reviewer
must examine the assignment process even more closely to detect whether
a potential source of bias (such as higher motivation among volunteers)
may have been introduced that could account for any differences observed
in outcomes between the treatment and comparison groups. In addition to
confirming the equivalence of the experimental groups at baseline, several
review initiatives examine the extent of crossover or “contamination”
between experimental groups throughout the study because this could
blur the study’s view of the intervention’s true effects.
All seven review initiatives we examined assess whether a study’s sample
size was large enough to detect effects of a meaningful size. They also
assess whether any sample attrition (or loss) over the course of the study
was severe enough to question how well the remaining members
represented the original sample or whether differential attrition may have
created significant new differences between the experimental groups.
Most review forms ask whether tests for statistical significance of group
differences accounted for key study design features (for example, random
assignment of groups rather than individuals), as well as for any deviations
from initial group assignment (intention-to-treat analysis). 17
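Intention-to-treat analysis can be illustrated with a small simulation, offered here only as a hedged sketch: the participation rate, outcome, and effect sizes below are invented for illustration and are not drawn from the report or from any initiative's data.

```python
import random

random.seed(0)

# Hypothetical trial: outcome = 1 if a participant is employed a year later.
# The program raises employment probability from 0.40 to 0.55, but 30% of
# the treatment group never completes the program (invented values).
def simulate_participant(assigned_to_treatment):
    completed = assigned_to_treatment and random.random() > 0.30
    p = 0.55 if completed else 0.40
    outcome = 1 if random.random() < p else 0
    return assigned_to_treatment, completed, outcome

data = [simulate_participant(random.random() < 0.5) for _ in range(20000)]

def mean_outcome(rows):
    return sum(r[2] for r in rows) / len(rows)

# Intention-to-treat: compare groups as originally assigned, regardless of
# whether treatment-group members actually completed the program.
itt = (mean_outcome([r for r in data if r[0]])
       - mean_outcome([r for r in data if not r[0]]))

# Per-protocol: compare completers only to the control group. This looks
# closer to the true effect here, but in real studies it risks bias because
# completers may differ systematically from dropouts.
pp = (mean_outcome([r for r in data if r[1]])
      - mean_outcome([r for r in data if not r[0]]))

print(f"ITT estimate: {itt:.3f}")           # diluted by non-completion
print(f"Per-protocol estimate: {pp:.3f}")
```

The ITT estimate lands near 0.7 × 0.15 ≈ 0.105 rather than the full 0.15-point effect, which is exactly the trade the analysis makes: a conservative but unbiased estimate of the effect of offering the intervention.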
The rating forms vary in structure and detail across the initiatives. For
example, “appropriateness of statistical analyses” can be found under the
category “reporting of the intervention’s effects” on one form and in a
category by itself on another form. In the Model Programs Guide rating
form, “internal validity”—or the degree to which observed changes can be
attributed to the intervention—is assessed through how well both the
research design and the measurement of program activities and outcomes
controlled for nine specific threats to validity. 18 The EPC rating form notes
whether study participants were blind to the experimental groups they
17 In intention-to-treat analysis, members of the treatment and control groups are retained in
the group to which they were originally assigned, even if some treatment group members
failed to participate in or complete the intervention or some control group members later
gained access to the intervention. See Checklist, p. 4.
18 These factors were initially outlined in the classic research design book by Donald T.
Campbell and Julian C. Stanley, Experimental and Quasi-Experimental Designs for
Research (Chicago: Rand McNally, 1963).
belonged to—standard practice in studies for medical treatments but not
as common in studies of social interventions, while the PRS form does not
directly address study blinding in assessing extent of bias in forming study
groups.
The major difference in rating study quality between the Top Tier initiative
and the six other initiatives is a product of the top tier standard as set out
in certain legislative provisions: the other initiatives accept well-designed,
well-conducted quasi-experimental studies as credible evidence. Most of
the federally supported initiatives recognize well-conducted randomized
experiments as providing the most credible evidence of effectiveness by
assigning them their highest rating for quality of research design, but three
do not require them for interventions to receive their highest evidence
rating: EPC, the Community Guide, and the National Registry of Evidence-based Programs and Practices (NREPP). The Coalition has, since its
inception, promoted randomized experiments as the highest-quality,
unbiased method for assessing an intervention’s true impact. Federal
officials provided a number of reasons for including well-conducted quasi-experimental studies: (1) random assignment is not feasible for many of
the interventions they studied, (2) study credibility is determined not by a
particular research design but by its execution, (3) evidence from carefully
controlled experimental settings may not reflect the benefits and harms
observed in everyday practice, and (4) too few high-quality, relevant
random assignment studies were available.
The Top Tier initiative states a preference for studies that test
interventions in typical community settings over those run under ideal
conditions but does not explicitly assess the quality (or fidelity) of
program implementation. The requirement that results be shown in two or
more randomized studies is an effort to demonstrate the applicability of
intervention effects to other settings. However, four other review
initiatives do explicitly assess intervention fidelity—the Community Guide,
MPG, NREPP, and PRS—through either describing in detail the
intervention’s components or measuring participants’ level of exposure.
Poor implementation fidelity can weaken a study’s ability to detect an
intervention’s potential effect and thus lessen confidence in the study as a
true test of the intervention model. EPC and the Community Guide assess
how well a study’s selection of population and setting matched those in
which it is likely to be applied; any notable differences in conditions would
undermine the relevance or generalizability of study results to what can be
expected in future applications.
All seven initiatives have experienced researchers with methodological
and subject matter expertise rate the studies and use written guidance or
codebooks to help ensure ratings consistency. Codebooks varied but most
were more detailed than the Top Tier Checklist. Most of the initiatives also
provided training to ensure consistency of ratings across reviewers. In
each initiative, two or more reviewers rate the studies independently and
then reach consensus on their ratings in consultation with other experts
(such as consultants to or supervisors of the review). After the Top Tier
initiative’s staff screening review, staff and one advisory panel member
independently review the quality of experimental evidence available on an
intervention, before the panel as a group discusses and votes on whether it meets the top tier standard. However, because the panel members did not
independently rate study quality or the body of evidence, it is unknown
how much of the variation in their overall assessment of the interventions
reflected differences in their application of the criteria making up the Top
Tier standard.
Broad Scope Fails to Focus on Effectiveness in Achieving Specific Outcomes
The Top Tier initiative’s topic selection, emphasis on long-term effects,
and narrow evidence criteria combine to provide limited information on
the effectiveness of approaches for achieving specific outcomes. It is
standard practice in research and evaluation syntheses to pose a clearly
defined research question—such as, Which interventions have been found
effective in achieving specific outcomes of interest for a specific
population?—and then assemble and summarize the credible, relevant
studies available to answer that question. 19 A well-specified research
question clarifies the objective of the research and guides the selection of
eligibility criteria for including studies in a systematic evidence review. In
addition, some critics of systematic reviews in health care recommend
using the intervention’s theoretical framework or logic model to guide
analyses toward answering questions about how and why an intervention
works when it does. 20 Evaluators often construct a logic model—a diagram
19 GAO, The Evaluation Synthesis, GAO/PEMD-10.1.2 (Washington, D.C.: March 1992);
Institute of Medicine, Knowing What Works in Health Care (Washington, D.C.: National
Academies Press, 2008); Iain Chalmers, “Trying to Do More Good Than Harm in Policy and
Practice: The Role of Rigorous, Transparent, Up-to-Date Evaluations,” The Annals of the
American Academy of Political and Social Science (Thousand Oaks, Calif.: Sage, 2003);
Agency for Healthcare Research and Quality, Systems to Rate the Strength of Scientific
Evidence (Rockville, Md.: 2002).
20 Institute of Medicine, Knowing What Works; N. Jackson and E. Waters, “Criteria for the
Systematic Review of Health Promotion and Public Health Interventions,” Health
Promotion International (2005): 367–74.
showing the links between key intervention components and desired
results—to explain the strategy or logic by which it is expected to achieve
its goals. 21 The Top Tier initiative’s approach focuses on critically
appraising and summarizing the evidence without having first formulated a
precise, unambiguous research question and the chain of logic underlying
the interventions’ hypothesized effects on the outcomes of interest.
Neither of the Top Tier initiative’s topic selections—interventions for children ages 0–6 or youths ages 7–18—identifies either a particular type of
intervention, such as preschool or parent education, or a desired outcome,
such as healthy cognitive and social development or prevention of
substance abuse, that can frame and focus a review as in the other
effectiveness reviews. The other initiatives have a clear purpose and focus:
learning what has been effective in achieving a specific outcome or set of
outcomes (for example, reducing youth involvement in criminal activity).
Moreover, recognizing that an intervention might be successful on one
outcome but not another, EPC, NREPP, and WWC rate the effectiveness of
an intervention by each outcome. Even EPC, whose scope is the broadest
of the initiatives we reviewed, focuses individual reviews by selecting a
specific healthcare topic through a formal process of soliciting and
reviewing nominations from key stakeholders, program partners, and the
public. Their criteria for selecting review topics include disease burden for
the general population or a priority population (such as children),
controversy or uncertainty over the topic, costs associated with the
condition, potential impact for improving health outcomes or reducing
costs, relevance to federal health care programs, and availability of
evidence and reasonably well-defined patient populations, interventions,
and outcome measures.
The Top Tier initiative’s emphasis on identifying interventions with long-term effects—up to 15 years later for some early childhood
interventions—also leads away from focusing on how to achieve a specific
outcome and could lead to capitalizing on chance results. A search for
interventions with “sustained effects on important life outcomes,”
regardless of the content area, means assembling results on whatever
outcomes—special education placement, high school graduation, teenage
pregnancy, employment, or criminal arrest—the studies happen to have
measured. This is of concern because it is often not clear why some long-
21
GAO, Program Evaluation: Strategies for Assessing How Information Dissemination
Contributes to Agency Goals, GAO-02-923 (Washington, D.C.: Sept. 30, 2002).
Page 17
GAO-10-30 Effective Interventions
22. term outcomes were studied for some interventions and not others.
Moreover, focusing on the achievement of long-term outcomes, without
regard to the achievement of logically related short-term outcomes, raises
questions about the meaning and reliability of those purported long-term
program effects. For example, without a logic model or hypothesis linking
preschool activities to improving children’s self-control or some other
intermediate outcome, it is unclear why one would expect to see effects
on their delinquent behavior as adolescents. Indeed, one advisory panel
member raised questions about the mechanism behind long-term effects
measured on involvement in crime when effects on more conventional (for
example, academic) outcomes disappeared after a few years. Later, he
suggested that the panel should consider only outcomes the researcher
identified as primary. Coalition staff said that reporting chance results is
unlikely because the Top Tier criteria require the replication of results in
multiple (or multi-site) studies, and they report any nonreplicated findings
as needing confirmation in another study.
Unlike efforts to synthesize evaluation results in some systematic evidence
reviews, the Top Tier initiative examines evidence on each intervention
independently, without reference to similar interventions or, alternatively,
to different interventions aimed at the same goal. Indeed, of the initiatives
we reviewed, only EPC and the Community Guide directly compare the
results of several similar interventions to gain insight into the conditions
under which an approach may be successful. (WWC topic reports display
effectiveness ratings by outcome for all interventions they reviewed in a
given content area, such as early reading, but do not directly compare
their approaches.) These two initiatives explicitly aim to build knowledge
about what works in an area by developing logic models in advance to
structure their evaluation review by defining the specific populations and
outcome measures of interest. A third, MPG, considers the availability of a
logic model and the quality of an intervention’s research base in rating the
quality of its evidence. Where appropriate evidence is available, EPCs
conduct comparative effectiveness studies that directly compare the
effectiveness, appropriateness, and safety of alternative approaches (such
as drugs or medical procedures) to achieving the same health outcome.
Officials at the other initiatives explained that they did not compare or
combine results from different interventions because they did not find
them similar enough to treat as replications of the same approach.
However, most initiatives post the results of their reviews on their Web
sites by key characteristics of the intervention (for example, activities or
setting), outcomes measured, and population, so that viewers can search
for particular types of interventions or compare their results.
Narrow Evidence Criteria Yield Limited Guidance for Practitioners
The Top Tier initiative’s narrow primary criterion for study design quality—randomized experiments only—diverges from the other initiatives and limits the types of interventions it can consider. In addition, the exclusivity of its top tier standard diverges from the more common approach of rating the credibility of study findings along a continuum and resulted in the panel’s recommending only 6 of the 63 interventions reviewed for ages 0–18 as providing “sizable, sustained effects on important life outcomes.” Thus, although practitioners are not its primary audience, the Top Tier initiative provides them with limited guidance on what works.
Two basic dimensions are assessed in effectiveness reviews: (1) the
credibility of the evidence on program impact provided by an individual
study or body of evidence, based on research quality and risk of bias in the
individual studies, and (2) the size and consistency of effects observed in
those studies. The six other evidence reviews report the credibility of the
evidence on the interventions’ effectiveness in terms of their level of
confidence in the findings—either with a numerical score (0 to 4, NREPP)
or on a scale (high, moderate, low, or insufficient, EPC). Scales permit an
initiative to communicate intermediate levels of confidence in an
intervention’s results and to distinguish approaches with “promising”
evidence from those with clearly inadequate evidence. Federal officials
from initiatives using this more inclusive approach indicated that they
believed that it provides more useful information and a broader range of
choices for practitioners and policy makers who must decide which
intervention is most appropriate and feasible for their local setting and
available resources. To provide additional guidance to practitioners
looking for an intervention to adopt, NREPP explicitly rates the
interventions’ readiness for dissemination by assessing the quality and
availability of implementation materials, resources for training and
ongoing support, and the quality assurance procedures the program
developer provides.
Some initiatives, like Top Tier, provide a single rating of the effectiveness
of an intervention by combining ratings of the credibility and size (and
consistency, if available) of intervention effects. However, combining
scores creates ambiguity in an intermediate strength of evidence rating—it
could mean that reviewers found strong evidence of modest effects or
weak evidence of strong effects. Other initiatives report on the credibility
of results and the effect sizes separately. For example, WWC reports three
summary ratings for an intervention’s result on each outcome measured:
an improvement index, providing a measure of the size of the
intervention’s effect; a rating of effectiveness, summarizing both study
quality and the size and consistency of effects; and an extent of evidence
rating, reflecting the number and size of effectiveness studies reviewed.
Thus, the viewer can scan and compare ratings on all three indexes in a
list of interventions rank-ordered by the improvement index before
examining more detailed information about each intervention and its
evidence of effectiveness.
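As a rough illustration of how an improvement index of this kind can be derived from a standardized effect size, the sketch below uses the common normal-approximation approach; it is an assumption for illustration, not taken from WWC's own procedures, and the effect size shown is invented.

```python
from statistics import NormalDist

# Converts a standardized effect size (e.g., Cohen's d) into the expected
# percentile-point gain for an average control-group member, assuming
# normally distributed outcomes.
def improvement_index(effect_size):
    return (NormalDist().cdf(effect_size) - 0.5) * 100

# An effect size of 0.25 standard deviations (illustrative value):
idx = improvement_index(0.25)
print(round(idx, 1))  # → 9.9
```

Read this way, a moderate standardized effect moves the average untreated individual roughly ten percentile points up the outcome distribution.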
Randomized Experiments Can Provide the Most Credible Evidence of Effectiveness under Certain Conditions
In our review of the literature on program evaluation methods, we found
general agreement that well-conducted randomized experiments are best
suited for assessing intervention effectiveness where multiple causal
influences lead to uncertainty about what has caused observed results but,
also, that they are often difficult to carry out. Randomized experiments are
considered best suited for interventions in which exposure to the
intervention can be controlled and the treatment and control groups’
experiences remain separate, intact, and distinct throughout the study.
The evaluation methods literature also describes a variety of issues to
consider in planning an evaluation of a program or of an intervention’s
effectiveness, including the expected use of the evaluation, the nature and
implementation of program activities, and the resources available for the
evaluation. Selecting a methodology follows, first, a determination that an
effectiveness evaluation is warranted. It then requires balancing the need
for sufficient rigor to draw firm conclusions with practical considerations
of resources and the cooperation and protection of participants. Several
other research designs are generally considered good alternatives to
randomized experiments, especially when accompanied by specific
features that help strengthen conclusions by ruling out plausible
alternative explanations.
Conditions Necessary for Conducting Effectiveness Evaluations
In reviewing the literature on evaluation research methods, we found that
randomized experiments are considered appropriate for assessing
intervention effectiveness only after an intervention has met minimal
requirements for an effectiveness evaluation—that the intervention is
important, clearly defined, and well-implemented and the evaluation itself
is adequately resourced. Conducting an impact evaluation of a social
intervention often requires the expenditure of significant resources to both
collect and analyze data on program results and estimate what would have
happened in the absence of the program. Thus, impact evaluations need not be conducted for all interventions but should be reserved for cases in which the effort and cost appear warranted. There may be more interest in an impact
evaluation when the intervention addresses an important problem, there is
interest in adopting the intervention elsewhere, and preliminary evidence
suggests its effects may be positive, if uncertain. Of course, if the
intervention’s effectiveness were known, then there would be no need for
an evaluation. And if the intervention was known or believed to be
ineffective or harmful, then it would seem wasteful as well as perhaps
unethical to subject people to such a test. In addition to federal regulations
concerning the protection of human research subjects, the ethical
principles of relevant professional organizations require evaluators to try
to avoid subjecting study participants to unreasonable risk, harm, or
burden. This includes obtaining their fully informed consent. 22
An impact evaluation is more likely to provide useful information about
what works when the intervention consists of clearly defined activities
and goals and has been well implemented. Having clarity about the nature
of intended activities and evidence that critical intervention components
were delivered to the intended targets helps strengthen confidence that
those activities caused the observed results; it also improves the ability to
replicate the results in another study. Confirming that the intervention was
carried out as designed helps rule out a common explanation for why
programs do not achieve their goals; when done before collecting
expensive outcome data, it can also avoid wasting resources. Obtaining
agreement with stakeholders on which outcomes to consider in defining
success also helps ensure that the evaluation’s results will be credible and
useful to its intended audience. While not required, having a well-articulated logic model can help ensure shared expectations among
stakeholders and define measures of a program’s progress toward its
ultimate goals.
Regardless of the evaluation approach, an impact evaluation may not be
worth the effort unless the study is adequately staffed and funded to
ensure the study is carried out rigorously. If, for example, an intervention’s
desired outcome consists of participants’ actions back on the job after
receiving training, then it is critical that all reasonable efforts are made to
ensure that high-quality data on those actions are collected from as many
participants as possible. Significant amounts of missing data raise the possibility that the persons reached are different from those who were not reached (perhaps more cooperative) and thus weaken confidence that
the observed results reflect the true effect of the intervention. Similarly, it
is important to invest in valid and reliable measures of desired outcomes
22 See 45 C.F.R. Part 46 (2005) and, for example, the American Evaluation Association’s
Guiding Principles for Evaluators, revised in 2004.
www.eval.org/Publications/GuidingPrinciples.asp
to avoid introducing error and imprecision that could blur the view of the
intervention’s effect.
Interventions Where Random Assignment Is Well Suited
We found in our review of the literature on evaluation research methods
that randomized experiments are considered best suited for assessing
intervention effectiveness where multiple causal influences lead to
uncertainty about program effects and it is possible, ethical, and practical
to conduct and maintain random assignment to minimize the effect of
those influences.
When Random Assignment Is Needed
As noted earlier, when factors other than the intervention are expected to
influence change in the desired outcome, the evaluator cannot be certain
how much of any observed change reflects the effect of the intervention,
as opposed to what would have occurred anyway without it. In contrast,
controlled experiments are usually not needed to assess the effects of
simple, comparatively self-contained processes like processing income tax
returns. The volume and accuracy of tax returns processed simply reflect
the characteristics of the returns filed and the agency’s application of its
rules and procedures. Thus, any change in the accuracy of processed
returns is likely to result from change in the characteristics of either the
returns or the agency’s processes. In contrast, an evaluation assessing the
impact of job training on participants’ employment and earnings would
need to control for other major influences on those outcomes—features of
the local job market and the applicant pool. In this case, randomly
assigning job training applicants (within a local job market) to either
participate in the program (forming the treatment group) or not
participate (forming the control group) helps ensure that the treatment
and control groups will be equally affected.
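The logic of random assignment—that chance, not applicant characteristics, forms the groups—can be sketched as follows. The sample size and the `schooling` covariate are invented for illustration; the point is only that a coin flip cannot correlate with any baseline trait.

```python
import random

random.seed(1)

# Hypothetical applicant pool: "schooling" stands in for a baseline
# characteristic that also influences later earnings (invented values).
applicants = [{"schooling": random.gauss(12, 2)} for _ in range(10000)]

# Random assignment: each applicant's group depends only on a coin flip,
# never on schooling or any other characteristic.
for a in applicants:
    a["treated"] = random.random() < 0.5

treat = [a for a in applicants if a["treated"]]
control = [a for a in applicants if not a["treated"]]

def avg(group):
    return sum(a["schooling"] for a in group) / len(group)

# Up to sampling noise, the groups match on the baseline trait, so it
# cannot masquerade as a treatment effect.
print(round(avg(treat), 2), round(avg(control), 2))
```

Self-selected groups, by contrast, would differ on traits like motivation in ways no after-the-fact adjustment can fully repair, which is the bias random assignment is designed to rule out.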
When Random Assignment Is Possible, Ethical, and Practical
Random assignment is, of course, suited only to interventions in which the
evaluator or program manager can control whether a person, group, or
other entity is enrolled in or exposed to the intervention. Control over
program exposure rules out the possibility that the process by which
experimental groups are formed (especially, self-selection) may reflect
preexisting differences between them that might also affect the outcome
variable and, thus, obscure the treatment effect. For example, tobacco
smokers who volunteer for a program to quit smoking are likely to be
more highly motivated than tobacco smokers who do not volunteer. Thus,
smoking cessation programs should randomly assign volunteers to receive
services and compare them to other volunteers who do not receive
services to avoid confounding the effects of the services with the effects of
volunteers’ greater motivation.
Random assignment is well suited for programs that are not universally
available to the entire eligible population, so that some people will be
denied access to the intervention in any case. This addresses one concern
about whether a control group experiment is ethical. In fact, in many field
settings, assignment by lottery has often been considered the most
equitable way to assign individuals to participate in programs with limits
on enrollment. Randomized experiments are especially well suited to
demonstration programs for which a new approach is tested in a limited
way before committing to apply it more broadly. Another ethical concern
is that the control group should not be harmed by withholding needed
services, but this can be averted by providing the control group with
whatever services are considered standard practice. In this case, however,
the evaluation will no longer be testing whether a new approach is
effective at all; it will test whether it is more effective than standard
practice.
Random assignment is also best suited for interventions in which the
treatment and control groups’ experiences remain separate, intact, and
distinct throughout the life of the study so that any differences in
outcomes can be confidently attributed to the intervention. It is important
that control group participants not access comparable treatment in the
community on their own (referred to as contamination). Their doing so
could blur the distinction between the two groups’ experiences. It is also
preferred that control group and treatment group members not
communicate, because knowing that they are being treated differently
might influence their perceptions of their experience and, thus, their
behavior. Sometimes people selected for an experimental treatment are
motivated by the extra attention they receive; sometimes those not
selected are motivated to work harder to compete with their peers. Thus,
random assignment works best when participants have no strong beliefs
about the advantage of the intervention being tested and information
about their experimental status is not publicly known. For example, in
comparing alternative reading curriculums in kindergarten classrooms, an
evaluator needs to ensure that the teachers are equally well trained and do
not have preexisting conceptions about the “better” curriculum.
Sometimes this is best achieved by assigning whole schools—rather than
individuals or classes—to the treatment and control groups, but this can
become very expensive, since appropriate statistical analyses now require
about as many schools to participate in a study as the number of classes
participating in the simpler design.
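The cost of assigning whole schools rather than individuals can be quantified with the standard design-effect formula from survey sampling; the cluster size and intra-class correlation below are invented illustrative values, not figures from the report.

```python
# Standard design-effect formula: outcomes of pupils in the same school
# are correlated (intra-class correlation, ICC), so a clustered sample
# carries less information than an independently assigned one.
def design_effect(cluster_size, icc):
    return 1 + (cluster_size - 1) * icc

# Invented illustrative values: 40 schools of 25 pupils, modest ICC.
deff = design_effect(25, 0.15)
effective_n = (40 * 25) / deff  # what the 1,000 pupils are "worth"

print(round(deff, 2), round(effective_n))  # → 4.6 217
```

A thousand pupils assigned by school thus carry roughly the statistical information of a couple of hundred individually assigned pupils, which is why the number of schools, not the number of pupils, drives the cost of a sound cluster-randomized study.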
Interventions are well suited for random assignment if the desired
outcomes occur often enough to be observed with a reasonable sample
size or study length. Studies of infrequent but not rare outcomes—for
example, those occurring about 5 percent of the time—may require
moderately large samples (several hundred) to allow the detection of a
difference between the experimental and control groups. Because of the
practical difficulties of maintaining intact experimental groups over time,
randomized experiments are also best suited for assessing outcomes that
occur within 1 to 2 years after the intervention, depending on the
circumstances. Although an intervention’s key desired outcome may be a
social, health, or environmental benefit that takes 10 or more years to fully
develop, it may be prohibitively costly to follow a large enough proportion
of both experimental groups over that time to ensure reliable results.
Evaluators may then rely on intermediate outcomes, such as high-school
graduation, as an adequate outcome measure rather than accepting the
costs of directly measuring long-term effects on adult employment and
earnings.
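The sample-size point can be checked with a conventional two-proportion power calculation. This is a sketch using the normal approximation; the outcome rates are illustrative, not taken from the report.

```python
from math import sqrt
from statistics import NormalDist

def n_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a difference between
    two proportions (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treatment * (1 - p_treatment))) ** 2
    return numerator / (p_control - p_treatment) ** 2

# An outcome seen about 5 percent of the time in the control group: even
# detecting a doubling to 10 percent takes several hundred per group.
n = n_per_group(0.05, 0.10)
print(round(n))
```

Smaller or less frequent effects push the required samples higher still, which is one practical reason long-term, low-base-rate outcomes strain a randomized design.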
Interventions for Which Random Assignment Is Not Well Suited
Random assignment is not appropriate for a range of programs in which
one cannot meet the requirements that make this strategy effective. They
include entitlement programs or policies that apply to everyone,
interventions that involve exposure to negative events, or interventions for
which the evaluator cannot be sure about the nature of differences
between the treatment and control groups’ experiences.
Random Assignment Is Not Possible
For a few types of programs, random assignment to the intervention is not
possible. One is when all eligible individuals are exposed to the
intervention and legal restrictions do not permit excluding some people in
order to form a comparison group. This includes entitlement programs
such as veterans’ benefits, Social Security, and Medicare, as well as
programs operating under laws and regulations that explicitly prohibit (or
require) a particular practice.
A second type of intervention for which random assignment is precluded
is broadcast media communication where the individual—rather than the
researcher—controls his or her exposure (consciously or not). This is true
of radio, television, billboard, and Internet programming, in which the
individual chooses whether and how long to hear or view a message or
communication. To evaluate the effect of advertising or public service
announcements in broadcast media, the evaluator is often limited to
simply measuring the audience’s exposure to it. However, sometimes it is
possible to randomly assign advertisements to distinct local media
markets and then compare their effects to other similar but distinct local
markets.
A third type of program for which random assignment is generally not
possible is comprehensive social reforms consisting of collective,
coordinated actions by various parties in a community—whether school,
organization, or neighborhood. In these highly interactive initiatives, it can
be difficult to distinguish the activities and changes from the settings in
which they take place. For example, some community development
partnerships rely on increasing citizen involvement or changing the
relationships between public and private organizations in order to foster
conditions that are expected to improve services. Although one might
randomly assign communities to receive community development support
or not, the evaluator does not control who becomes involved or what
activities take place, so it is difficult to trace the process that led to any
observed effects.
Random assignment is often not accepted for testing interventions that
prevent or mitigate harm because it is considered unethical to impose
negative events or elevated risks of harm to test a remedy’s effectiveness.
Thus, one must wait for a hurricane or flood, for example, to learn if
efforts to strengthen buildings prevented serious damage. Whether the
evaluator can randomly assign different approaches to strengthening
buildings may depend on whether, before the test, the approaches appear
equally likely to succeed. In some cases, the possibility that the
intervention will fail may itself be considered an unacceptable risk.
When evaluating alternative treatments for criminal offenders, local law
enforcement officers may be unwilling to assign the offenders they
consider to be the most dangerous to the less restrictive treatments.
As implied by the previous discussion of when random assignment is well
suited, it may simply not be practical in a variety of circumstances. It may
not be possible to convince program staff to form control groups by
simple random assignment if it would deny services to some of the
neediest individuals while providing service to some of the less needy. For
example, individual tutoring in reading would usually be provided only to
students with the lowest reading scores. In other cases, the desired
outcome may be so rare or take so long to develop that the required
sample sizes or prospective tracking of cases over time would be
prohibitively expensive.
Finally, the evaluation literature cautions that as social interventions
become more complex, representing a diverse set of local applications of a
broad policy rather than a common set of activities, randomized
experiments may become less informative. When how much of the
intervention is actually delivered, or how it is expected to work, is
influenced by characteristics of the population or setting, one cannot be
sure about the nature of the difference between the treatment and control
group experiences or which factors influenced their outcomes. Diversity
in the nature of the intervention can occur at the individual level, as when
counselors draw on their experience to select the approach they believe is
most appropriate for each patient. Or it can occur at a group level, as
when grantees of federal flexible grant programs focus on different
subpopulations as they address the needs of their local communities. In
these cases, aggregating results over substantial variability in what the
intervention entails may end up providing little guidance on what, exactly,
works.
Rigorous Alternatives to Random Assignment Are Available
In our review of the literature on evaluation research methods, we
identified several alternative methods for assessing intervention
effectiveness when random assignment is not considered appropriate—
quasi-experimental comparison group studies, statistical analyses of
observational data, and in-depth case studies. Although experts differed in
their opinion of how useful case studies are for estimating program
impacts, several other research designs are generally considered good
alternatives to randomized experiments, especially when accompanied by
specific features that help strengthen conclusions by ruling out plausible
alternative explanations.
Quasi-Experimental Comparison Groups
Quasi-experimental comparison group designs resemble randomized
experiments in comparing the outcomes for treatment and control groups,
except that individuals are not assigned to those groups randomly.
Instead, unserved members of the targeted population are selected to
serve as a control group that resembles the treatment group as much as
possible on variables related to the desired outcome. This evaluation
design is used with partial coverage programs for which random
assignment is not possible, ethical, or practical. It is most successful in
providing credible estimates of program effectiveness when the groups are
formed in parallel ways and not based on self-selection—for example, by
having been turned away from an oversubscribed service or living in a
similar neighborhood where the intervention is not available. This
approach requires statistical analyses to establish groups’ equivalence at
baseline.
Regression discontinuity analysis compares outcomes for a treatment and
control group that are formed by having scores above or below a cut-point
on a quantitative selection variable rather than through random
assignment. When experimental groups are formed strictly on a cut-point
and group outcomes are analyzed for individuals close to the cut-point, the
groups are left otherwise comparable except for the intervention. This
technique is used where those considered most “deserving” are assigned to
treatment, in order to address ethical concerns about denying services to
those in need—for example, when additional tutoring is provided only to
children with the lowest reading scores. The technique requires a
quantitative assignment variable that users believe is a credible selection
criterion, careful control over assignment to ensure that a strict cut-point
is achieved, large sample sizes, and sophisticated statistical analysis.
Statistical Analyses of Observational Data
Interrupted time-series analysis compares trends in repeated measures of
an outcome for a group before and after an intervention or policy is
introduced, to learn if the desired change in outcome has occurred. Long
data series are used to smooth out the effects of random fluctuations over
time. Statistical modeling of simultaneous changes in important external
factors helps control for their influence on the outcome and, thus, helps
isolate the impact of the intervention. This approach is used for full-coverage programs in which it may not be possible to form or find an
untreated comparison group, such as for change in state laws defining
alcohol impairment of motor vehicle drivers (“blood alcohol
concentration” laws). But because the technique relies on the availability
of comparable information about the past—before a policy changed—it
may be limited to use near the time of the policy change. The need for
lengthy data series means it is typically used where the evaluator has
access to long-term, detailed government statistical series or institutional
records.
Observational or cross-sectional studies first measure the target
population’s level of exposure to the intervention, rather than controlling
it, and then compare the outcomes of individuals receiving
different levels of the intervention. Statistical analysis is used to control
for other plausible influences. Level of exposure to the intervention can be
measured by whether one was enrolled or how often one participated or
heard the program message. This approach is used with full-coverage
programs, for which it is impossible to directly form treatment and control
groups; nonuniform programs, in which individuals receive different levels
of exposure (such as to broadcast media); and interventions in which
outcomes are observed too infrequently to make a prospective study
practical. For example, an individual’s annual risk of being in a car crash is
so low that it would be impractical to randomly assign (and monitor)
thousands of individuals to use (or not use) their seat belts in order to
assess belts’ effectiveness in preventing injuries during car crashes.
Because there is no evaluator control over assignment to the intervention,
this approach requires sophisticated statistical analyses to limit the
influence of any concurrent events or preexisting differences that may be
associated with why people had different exposure to the intervention.
In-depth Case Studies
Case studies have been recommended for assessing the effectiveness of
complex interventions in limited circumstances when other designs are
not available. In program evaluation, in-depth case studies are typically
used to provide descriptive information on how an intervention operates
and produces outcomes and, thus, may help generate hypotheses about
program effects. Case studies may also be used to test a theory of change,
as when the evaluator specifies in advance the expected processes and
outcomes, based on the program theory or logic model, and then collects
detailed observations carefully designed to confirm or refute that model.
This approach has been recommended for assessing comprehensive
reforms that are so deeply integrated with the context (for example, the
community) that no truly adequate comparison case can be found. 23 To
support credible conclusions about program effects, the evaluator must
make specific, refutable predictions of program effects and introduce
controls for, or provide strong arguments against, other plausible
explanations for observed effects. However, because a single case study
most likely cannot provide credible information on what would have
happened in the absence of the program, our experts noted that the
evaluator cannot use this design to reliably estimate the magnitude of a
program’s effect.
Features That Can Strengthen Any Effectiveness Evaluation
Reviewing the literature and consulting with evaluation experts, we
identified additional measurement and design features that can help
strengthen conclusions about an intervention’s impact from both
randomized and nonrandomized designs. In general, they involve
collecting additional data and targeting comparisons to help rule out
plausible alternative explanations of the observed results. Since all
evaluation methods have limitations, our confidence in concluding that an
intervention is effective is strengthened when the conclusion is supported
by multiple forms of evidence.

23 See Karen Fulbright-Anderson, Anne S. Kubisch, and James P. Connell, eds., New Approaches to Evaluating Community Initiatives, vol. 2, Theory, Measurement, and Analysis (Washington, D.C.: Aspen Institute, 1998), and Patricia Auspos and Anne S. Kubisch, Building Knowledge about Community Change: Moving Beyond Evaluations (Washington, D.C.: Aspen Institute, 2004).
Collecting Additional Data
Although collecting baseline data is an integral component of the
statistical approaches to assessing effectiveness discussed above, both
experiments and quasi-experiments would benefit from including pretest
measures on program outcomes as well as other key variables. First, by
chance, random assignment may not produce groups that are equivalent
on several important variables known to correlate with program
outcomes, so their baseline equivalence should always be checked.
Second, in the absence of random assignment, ensuring the equivalence of
the treatment and control groups on measures related to the desired
outcome is critical. The effects of potential self-selection bias or other
preexisting differences between the treatment and control groups can be
minimized through selection modeling or “propensity score analysis.”
Essentially, one first develops a statistical model of the baseline
differences between the individuals in the treatment and comparison
groups on a number of important variables and then adjusts the observed
outcomes for the initial differences between the groups to identify the net
effect of the intervention.
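The adjustment idea can be sketched with synthetic data. Subclassifying on a single baseline covariate stands in here for full propensity-score modeling, and every number is invented for illustration.

```python
import random
import statistics

random.seed(2)

# Synthetic data: needier individuals (low baseline) are more likely to enroll,
# and baseline also predicts the outcome, so the raw comparison is biased.
people = []
for _ in range(20000):
    baseline = random.uniform(0, 1)
    treated = random.random() < (0.8 - 0.6 * baseline)  # self-selection on baseline
    outcome = baseline + (0.3 if treated else 0.0) + random.gauss(0, 0.1)
    people.append((baseline, treated, outcome))

def mean_outcome(group):
    return statistics.mean(y for _, _, y in group)

# Naive difference in means, ignoring who selected into treatment.
raw = (mean_outcome([p for p in people if p[1]])
       - mean_outcome([p for p in people if not p[1]]))

# Adjust by stratifying on the baseline covariate and averaging the
# within-stratum treatment-control differences, weighted by stratum size.
strata = 10
diffs, weights = [], []
for k in range(strata):
    lo, hi = k / strata, (k + 1) / strata
    cell = [p for p in people if lo <= p[0] < hi]
    t = [p for p in cell if p[1]]
    c = [p for p in cell if not p[1]]
    if t and c:
        diffs.append(mean_outcome(t) - mean_outcome(c))
        weights.append(len(cell))
adjusted = sum(d * w for d, w in zip(diffs, weights)) / sum(weights)

print(round(raw, 2), round(adjusted, 2))  # raw is biased low; adjusted is near 0.3
```

Because the treated group starts out needier, the raw comparison understates the true effect of 0.3; comparing like with like within strata largely removes that preexisting difference.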
Extending data collection either before or after the intervention can help
rule out the influence of unrelated historical trends on the outcomes of
interest. This is in principle similar to interrupted time-series analysis,
yielding more observations to allow analysis of trends in outcomes over
time in relation to the timing of program activities. For example, one could
examine whether the outcome measure began to change before the
intervention could plausibly have affected it, in which case the change was
probably influenced by some other factor.
Another way to attempt to rule out plausible alternative explanations for
observed results is to measure additional outcomes that are or are not
expected to be influenced by the treatment, based on program theory. If
one can predict a relatively unique pattern of expected outcomes for the
intervention, in contrast to an alternative explanation, and if the study
confirms that pattern, then the alternative explanation becomes less
plausible.
Targeting Comparisons
In comparison group studies, the nature of the effect one detects is
defined by the nature of the differences between the experiences of the
treatment and control groups. For example, if the comparison group
receives no assistance at all in gaining employment, then the evaluation
can detect the full effect of all the employment assistance (including child
care) the treatment group receives. But if the comparison group also
receives child care, then the evaluation can detect only the effect, or value
added, of employment assistance above and beyond the effect of child
care. Thus, one can carefully design comparisons to target specific
questions or hypotheses about what is responsible for the observed results
and control for specific threats to validity. For example, in evaluating the
effects of providing new parents of infants with health consultation and
parent training at home, the evaluator might compare them to another
group of parents receiving only routine health check-ups to control for the
level of attention the first group received and test the value added by the
parent training.
Sometimes the evaluator can capitalize on natural variations in exposure
to the intervention and analyze the patterns of effects to learn more about
what is producing change. For example, little or no change in outcomes
for dropouts—participants who left the program—might reflect either the
dropouts’ lower levels of motivation compared to other participants or
their reduced exposure to the intervention. But if differences in outcomes
are associated with different levels of exposure for administrative reasons
(such as scheduling difficulties at one site), then those differences may be
more likely to result from the intervention itself.
Gathering a Diverse Body of Evidence
As reflected in all the review initiatives we identified for this report,
conclusions drawn from findings across multiple studies are generally
considered more convincing than those based on a single study. The two
basic reasons for this are that (1) each study is just one example of many
potential experiences with an intervention, which may or may not
represent that broader experience, and (2) each study employs one
particular set of methods to measure an intervention’s effect, which may
be more or less likely than other methods to detect an effect. Thus, an
analysis that carefully considers the results of diverse studies of an
intervention is more likely to accurately identify when and for whom an
intervention is effective.
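One common way results from multiple studies are combined, fixed-effect inverse-variance pooling, can be sketched as follows; the effect estimates and standard errors are invented for illustration, not drawn from any review discussed in this report.

```python
# Each study contributes an (effect estimate, standard error); weighting by
# inverse variance gives more precise studies more influence on the pooled result.
studies = [(0.25, 0.10), (0.40, 0.15), (0.10, 0.08), (0.30, 0.12)]

weights = [1 / se ** 2 for _, se in studies]
pooled = sum(est * w for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5  # pooled estimate is more precise than any one study

print(round(pooled, 3), round(pooled_se, 3))  # 0.214 0.052
```

The pooled standard error is smaller than any single study's, which is the statistical counterpart of the report's point that conclusions supported by multiple studies are more convincing than those based on one.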
A recurring theme in the evaluation literature is the tradeoffs made in
constructing studies to rigorously identify program impact by reducing the
influence of external factors. Studies of interventions tested in carefully
controlled settings, a homogenous group of volunteer participants, and a
comparison group that receives no services at all may not accurately
portray the results that can be expected in more typical operations. To
obtain a comprehensive, realistic picture of intervention effectiveness,
reviewing the results of several studies conducted in different settings and
populations, or large multisite studies, may help ensure that the results
observed are likely to be found, or replicated, elsewhere. This is
particularly important when the characteristics of settings, such as
different state laws, are expected to influence the effectiveness of a policy
or practice applied nationally. For example, states set limits on how much
income a family may have while receiving financial assistance, and these
limits—which vary considerably from state to state—strongly influence
the proportion of a state’s assistance recipients who are currently
employed. Thus, any federal policy regarding the employment of recipients
is likely to affect one state’s caseload quite differently from that of
another.
Because every research method has inherent limitations, it is often
advantageous to combine multiple measures or two or more designs in a
study or group of studies to obtain a more comprehensive picture of an
intervention. In addition to choosing whether to measure intermediate or
long-term outcomes, evaluators may choose to collect, for example,
student self-reports of violent behavior, teacher ratings of student
disruptive behavior, or records of school disciplinary actions or referrals
to the criminal justice system, which might yield different results. While
randomized experiments are considered best-suited for assessing
intervention impact, blended study designs can provide supplemental
information on other important considerations of policy makers. For
example, an in-depth case study of an intervention could be added to
develop a deeper understanding of its costs and implementation
requirements or to track participants’ experiences to better understand the
intervention’s logic model. Alternatively, a cross-sectional survey of an
intervention’s participants and activities can help in assessing the extent of
its reach to important subpopulations.
Concluding Observations
The Coalition provides a valuable service in encouraging government
adoption of interventions with evidence of effectiveness and in drawing
attention to the importance of evaluation quality in assessing that
evidence. Reliable assessments of the credibility of evaluation results
require expertise in research design and measurement, but their reliability
can be improved by providing detailed guidance and training. The Top Tier
initiative provides another useful model in that it engages experienced
evaluation experts to make these quality assessments.
Requiring evidence from randomized experiments as sole proof of an
intervention’s effectiveness is likely to exclude many potentially effective
and worthwhile practices for which random assignment is not practical.
The broad range of studies assessed by the six federally supported
initiatives we examined demonstrates that other research designs can
provide rigorous evidence of effectiveness if designed well and
implemented with a thorough understanding of their vulnerability to
potential sources of bias.
Assessing the importance of an intervention’s outcomes entails drawing a
judgment from subject matter expertise—the evaluator must understand
the nature of the intervention, its expected effects, and the context in
which it operates. Defining the outcome measures of interest in advance,
in consultation with program stakeholders and other interested audiences,
may help ensure the credibility and usefulness of a review’s results.
Deciding to adopt an intervention involves additional considerations—
cost, ease of use, suitability to the local community, and available
resources. Thus, practitioners will probably want information on these
factors and on effectiveness when choosing an approach.
A comprehensive understanding of which practices or interventions are
most effective for achieving specific outcomes requires a synthesis of
credible evaluations that compares the costs and benefits of alternative
practices across populations and settings. The ability to identify effective
interventions would benefit from (1) better designed and implemented
evaluations, (2) more detailed reporting on both the interventions and
their evaluations, and (3) more evaluations that directly compare
alternative interventions.
Agency and Third-Party Comments
The Coalition for Evidence-Based Policy provided written comments on a
draft of this report, reprinted in appendix II. The Coalition stated it was
pleased with the report’s key findings on the transparency of its process
and its adherence to rigorous standards in assessing research quality.
While acknowledging the complementary value of well-conducted
nonrandomized studies as part of a research agenda, the Coalition believes
the report somewhat overstates the confidence one can place in such
studies alone. The Coalition and the Departments of Education and Health
and Human Services provided technical comments that were incorporated
as appropriate throughout the text. The Department of Justice had no
comments.
We are sending copies of this report to the Secretaries of Education,
Justice, and Health and Human Services; the Director of the Office of
Management and Budget; and appropriate congressional committees. The
report is also available at no charge on the GAO Web site at
http://www.gao.gov.
If you have questions about this report, please contact me at (202) 512-2700
or kingsburyn@gao.gov. Contacts for our offices of Congressional Relations
and Public Affairs are on the last page. Key contributors are listed in
appendix III.
Nancy Kingsbury, Ph.D.
Managing Director
Applied Research and Methods
Appendix I: Steps Seven Evidence-Based Initiatives Take to Identify Effective Interventions
For each initiative, the table describes four steps: the search topic, how studies are selected, how studies’ quality is reviewed, and how the evidence is synthesized.

1. Evidence-Based Practice Centers at the Agency for Healthcare Research and Quality
Search topic: Selected topics in health care services, pharmaceuticals, and medical devices, searched through electronic databases, major journals, conference proceedings, and consultation with experts.
Select studies: Randomized and quasi-experimental studies; observational studies (e.g., cohort, case control).
Review studies’ quality: A technical panel of expert physicians, content and methods experts, and other partners rates studies by outcome on study design and execution; validity and reliability of outcome measures; data analysis and reporting; equivalence of comparison groups; and assessment of harm.
Synthesize evidence: The body of evidence on each outcome is scored on four domains (risk of bias, consistency, directness, and precision of effects), and the strength of evidence for each outcome is classified as high, moderate, low, or insufficient.
2. Guide to Community Preventive Services at the Centers for Disease Control and Prevention
Search topic: Selected population-based policies, programs, and health care system interventions to improve health and promote safety, searched through electronic databases, major journals, conference proceedings, and consultation with experts.
Select studies: Randomized and quasi-experimental studies; observational studies (e.g., time series, case control).
Review studies’ quality: In consultation with method and subject matter experts, two trained reviewers independently rate studies using standardized forms on study design and execution; validity and reliability of outcome measures; data analysis and reporting; intervention fidelity; and selection of population and setting.
Synthesize evidence: The body of evidence is assessed on number of studies, study quality, and size and consistency of effects to classify the evidence of effectiveness as strong, sufficient, or insufficient.
3. HIV Prevention Research Synthesis at the Centers for Disease Control and Prevention
Search topic: Interventions that prevent new HIV/AIDS infections or behaviors that increase the risk of infection, searched through electronic databases, major journals, conference proceedings, consultation with experts, and nominations solicited from the public.
Select studies: Randomized and quasi-experimental studies with one or more positive outcomes.
Review studies’ quality: Pairs of trained reviewers (Ph.D.s or M.A.s in behavioral science and health-related areas) independently rate studies using standardized forms and a codebook on study design and execution; validity and reliability of outcome measures; data analysis and reporting; equivalence of comparison groups; and assessment of harm.
Synthesize evidence: Ratings of study quality and strength of findings are combined to classify interventions as best evidence or promising evidence.
4. Model Programs Guide at the Office of Juvenile Justice and Delinquency Prevention
Search topic: Prevention and intervention programs to reduce problem behaviors (juvenile delinquency, violence, substance abuse) in at-risk juvenile populations, searched through electronic databases and nominations solicited from the public.
Select studies: Randomized and quasi-experimental studies with one or more positive outcomes and documentation of program implementation (fidelity).
Review studies’ quality: A 3-person panel including 2 external Ph.D. content area experts, working with a codebook and by consensual agreement, independently rates studies on study design and execution; validity and reliability of outcome measures; data analysis and reporting; equivalence of comparison groups; intervention fidelity; and conceptual framework (logic and research base).
Synthesize evidence: Ratings are combined across review criteria, including consistency of evidence, to classify interventions as exemplary, effective, or promising.
5. National Registry of Evidence-Based Programs and Practices at the Substance Abuse and Mental Health Services Administration
Search topic: Mental health promotion, mental health treatment, substance abuse prevention, substance abuse treatment, and co-occurring disorders, searched through electronic databases, major journals, and nominations solicited from the public.
Select studies: Randomized and quasi-experimental studies with one or more positive outcomes.
Review studies’ quality: Pairs of Ph.D. content specialists independently rate studies on study design and execution; validity and reliability of outcome measures; data analysis and reporting; and intervention fidelity. Pairs of providers and implementation experts independently rate readiness for dissemination on implementation materials; training and support resources; and quality assurance procedures.
Synthesize evidence: Summary research quality ratings (0–4) are provided for statistically significant outcomes; interventions themselves are not rated. Scores on intervention readiness are averaged to provide a score of 0–4.