THE QUALITY OF DFID'S EVALUATION REPORTS AND ASSURANCE SYSTEMS

A REPORT FOR IACDI BASED ON THE QUALITY REVIEW UNDERTAKEN BY CONSULTANTS BURT PERRIN AND RICHARD MANNING

Roger C. Riddell*

September 2009

* Roger Riddell is a member of IACDI. He managed the Quality Consultancy work on behalf of IACDI.
Executive Summary

To assist it in fulfilling its role of monitoring evaluation quality in DFID, in early 2009 IACDI commissioned a review to assess the quality of DFID's evaluation reports and assurance systems. The review was undertaken by Burt Perrin and Richard Manning and completed in August (¶1-8). This paper has two purposes. Firstly, it summarises the methods used in the review, the main findings and recommendations (Sections 2, 3 and 4.1). Secondly, against this backdrop, it lays out 11 recommendations for enhancing evaluation quality (Sec. 4.2) and discusses whether the review provides IACDI with sufficient information on evaluation quality to fulfil its TORs (Sec. 5).

As there is no agreed way of assessing evaluation quality, the consultants began by developing a methodology for the review. This included the context within which evaluations are carried out and the use to which evaluations are put. In this review the assessment of quality is viewed as a delicate balancing act of different factors whose importance changes from evaluation to evaluation (¶13-9). The methodological framework constructed was used to assess 14 recent DFID evaluations and twelve evaluations of other agencies (¶9-10), and to compare DFID's management systems and processes with those of seven comparator agencies (¶11-12). This review is amongst the most in-depth and extensive to have been attempted: it holds DFID up to new and probably higher standards than the quality assessments of many other, and especially bilateral, official aid agencies (¶20-3).

The review paints a mixed picture of evaluation quality. Most of the DFID and non-DFID evaluations are assessed as "good", and one of DFID's as "excellent". However, a significant proportion is rated only "acceptable" and a minority (less than 15 percent in the case of DFID) as "borderline". The quality of DFID's evaluation reports and assurance systems is judged to be broadly on a par with that of the comparator agencies: in some areas DFID's management and assurance systems were assessed as stronger, in other areas weaker, than those of the comparator agencies (¶25, 30-40).

The review draws attention to both strengths and weaknesses in evaluation quality (¶41-58). Some weaknesses can be addressed through action taken by EvD, but many are systemic in nature and need to be addressed by DFID's top management, requiring a significant change in culture (¶38-57). A key overarching problem identified is an unduly defensive attitude to evaluation, notwithstanding the real advances that have been made, including, most notably, the efforts of EvD and a growing number of managers, including senior managers, who increasingly champion and recognise the value of rigorous, independent evaluation (¶60, 102-3).
The review makes many recommendations to DFID to improve evaluation quality (¶60-99). This report draws on these to identify the most important ones, which are reproduced below (¶102-13).

The report ends by discussing whether the quality review IACDI commissioned has provided a sufficient basis upon which to advise the Secretary of State on evaluation quality in DFID. It judges that while the review has not been able to provide answers to all the questions in depth, it provides sufficient evidence upon which to comment on many of the central issues. It notes that a number of gaps in knowledge would be difficult to fill, even if more work were undertaken (¶114-20).

Recommendations to improve evaluation quality in DFID

Recommendations for DFID's top management

I. An overarching recommendation of the quality review (¶102-3)

The review has found evidence of an unduly defensive attitude to evaluation in DFID. An overarching recommendation is that:

(1) DFID top management needs to address this head-on, as it has committed to do in its new evaluation policy, Building the Evidence to Reduce Poverty. It needs to do this:

• by acknowledging current weaknesses;

• by taking a clear lead in changing the culture of evaluation to one in which independent, high quality evaluation is championed and becomes commonplace. This will require changes in systems and practices across the whole planning and project cycle, with "evaluation" built into all new initiatives as they are being developed, as well as in staff appraisal systems.

This change of attitude provides the necessary context for the successful implementation of many of the other recommendations which follow, though it is directly linked to Recommendations 3 and 7. It should thus be addressed as a priority.

Recommendations for DFID's senior management and IACDI

II. Establishing a closer integration of evaluation with strategic planning (¶104)

While ensuring the independence of evaluation and consistent with DFID's commitment to the Paris Declaration,
(2) It is recommended that IACDI and senior management work together to seek to ensure that decisions about what to evaluate and when are more closely integrated with strategic planning and policy-making at the highest level corporately, with strategic decision-making at the country level, and with DFID's policy and research work, in order that a far better and more integrated process of lesson-learning is established. IACDI needs to interact with senior management to ensure this happens.

Recommendations for DFID's senior management

III. Mainstreaming evaluation more (¶105)

DFID is commended for the work it is already doing in prioritising evaluation across the whole Department.

(3) As part of its ongoing work in this area it is recommended that DFID senior management change the way they understand and respond to evaluations. Inter alia, senior management should:

• Establish a (formalised) system for all Evaluation Department (EvD) and key decentralised evaluations to be discussed by a senior department committee and the Management Board.

• Require relevant managers to provide a succinct written response to evaluations, including explaining clearly how they intend to respond to recommendations or, in cases where they disagree, why.

• Develop systems to ensure that "evaluability" issues (including the selection of impact indicators) are considered at the planning stage of all operations.

• Require operational departments to commit themselves to strengthening the evidence-base upon which quality evaluations so crucially depend.

• Ensure all key staff understand the importance of evaluation and are sufficiently trained to undertake key tasks.

• Create robust systems to ensure that lesson-learning from evaluations (both centralised and decentralised, DFID and non-DFID) becomes institutionalised across the Department.

Recommendations for DFID senior management and for the Evaluation Department (EvD)

IV. Communicating better the results of evaluation and DFID's impact (¶106)

In the changing international environment in which it is operating, and especially as DFID works more closely with other donors and recipients, DFID needs to develop a different, more systematic and pro-active way of communicating the findings of evaluations to key audiences.
(4a) It is recommended that DFID reviews senior management's expectations of what it conveys to the public about what evaluations can firmly say about the specific contribution that the UK's official aid makes to development outcomes, given the parallel and complementary contributions by other donors and recipient countries. Management should encourage the use of more appropriate theories-of-change models and intermediate outcomes to which DFID is contributing as ways to better assess the (often significant) contribution that DFID has been making to development outcomes and processes.

(4b) It is also recommended that EvD develops its own communications strategy and interacts more closely with the communications managers (and the Press Office) to ensure consistency of message with the findings from evaluations.

(4c) It is further recommended that EvD develops a more effective and imaginative way of packaging evaluation findings and engaging more pro-actively with key stakeholders, including Parliament, development CSOs and the wider public.

V. Reviewing skills and roles within the Evaluation Department (¶107)

(5) It is recommended that DFID management undertakes a review of EvD staff skill profiles, turnover rates and current approaches to evaluation to assess the extent to which possible changes might assist its commitment to improving evaluation quality. This should include consideration of EvD making systematic use of senior independent evaluation advisers to attest to the quality of major evaluation reports and to report publicly on their findings.

Recommendations for EvD

VI. Managing evaluations better (¶108)

The review draws attention to a number of weaknesses in the approach to evaluation of both DFID and other agencies. Many are familiar to EvD, which is to be commended for the different initiatives already underway to sharpen the focus of, and improve the management of, evaluation.

(6) It is recommended that EvD's ongoing work on the way it approaches and manages evaluation is informed and shaped in particular by the following proposals made in the review:

• That evaluations be more streamlined in order to be fit-for-purpose. In particular, most evaluation reports would benefit from being shorter and more succinct. There should be far fewer complex evaluations, with fewer and sharper TORs and recommendations.

• That the management of evaluations be more "light-touch".

• That there should be more formative evaluations and, possibly, fewer large evaluations.

• That other changes should include "smarter" planning and a greater delegation of decisions to those undertaking the evaluations in areas such as methodology.

• That there should be more direct involvement of all key stakeholders.
• That evaluation summaries systematically capture all the main conclusions and recommendations.

• That management responses to evaluations be formalised, with clear report-back time-lines.

• That the relevant OECD/DAC guidelines be used as they were originally intended: more to inform and shape than to rigidly determine TORs and methodologies.

VII. Making better use of the skills of evaluators (¶109)

(7) It is recommended that a "culture-change" takes place so that EvD encourages evaluators to use their expertise to provide honest assessments of what they find, drawing, wherever possible, on robust evidence, but not limiting their conclusions to evidence that can be objectively determined - as the review confirmed, more often than not, such evidence is simply not available.

VIII. Operationalising DFID's ethical standards on evaluation (¶110)

DFID is committed to high ethical standards and transparency in evaluation, but this commitment needs to be operationalised.

(8) It is recommended that DFID develops and then rolls out (ideally with other like-minded donors) a code or codes of practice for itself (and other donors) which spell out clearly the respective roles of managers and evaluators, and of evaluation advisory groups and committees, ensuring that these can also be applied to large and complex evaluations.

IX. Doing more to achieve the goals of the Paris Declaration (¶111)

Notwithstanding the commitment to undertake more joint evaluations, and recognising that some important advances have been made, a strong conclusion of the review is that DFID's approach to evaluation remains insufficiently shaped and informed by the new thinking on aid and specifically by the UK's commitments to the Paris Declaration, notably in the areas of alignment, mutual accountability and the assistance given to help build capacities in major recipient countries. Against this background,

(9a) Especially in recipient countries and working with other donors, it is recommended that decisions by DFID on what to evaluate and when are not merely informed, but are increasingly driven, by recipient needs and priorities, and are based on a far more integrated system of communication between in-country and headquarters DFID staff, other development agencies and recipient country stakeholders.

(9b) It is also recommended that all country assistance plans give priority to capacity development, including in the areas of monitoring and evaluation, with recipient countries taking the lead wherever possible.
X. Continuing the review of country programme evaluations (¶112)

The review contains a range of comments on DFID's current approach to country programme evaluations. Changes are recommended and a range of ideas are put forward, building in part on the experiences of comparator agencies. Against this backdrop,

(10) It is recommended that EvD feed the findings of the review on the core purpose, strengths, weaknesses and challenges of country programme evaluations into its ongoing review of country programme evaluations, building on the experiences of other countries and its Paris Declaration commitments.

XI. Improving EvD's knowledge of decentralised evaluations (¶113)

The review assessed the quality of some decentralised evaluations. However, it provided insufficient evidence and information upon which to shed much light on the overall quality of decentralised evaluations.

(11) Before more substantial assessments of decentralised evaluations are made, it is recommended that DFID/EvD continue its work on capturing more data and information on the number and type of decentralised evaluations and linked studies that have been carried out. This analysis would benefit from being extended to analyse and understand more clearly the reasons why decentralised evaluations and studies are commissioned, and what value different DFID managers, other donors and recipient country stakeholders see in them.
Abbreviations

BP       Burt Perrin
CSO      Civil Society Organisation
DAC      Development Assistance Committee
Danida   Danish International Development Agency
DFID     Department for International Development (United Kingdom)
EC       European Commission
EQS      Evaluation Quality Standards
EvD      Evaluation Department (of DFID)
GBS      General Budget Support
IACDI    Independent Advisory Committee on Development Impact
IAD      Internal Audit Department
IEG      Independent Evaluation Group (at the World Bank)
IOB      Inspectie Ontwikkelingssamenwerking en Beleidsevaluatie (Policy and Operations Evaluation Department, the Netherlands)
NAO      National Audit Office
NORAD    Norwegian Agency for International Development Co-operation
NGO      Non-Governmental Organisation
ODA      Official Development Assistance
OECD     Organisation for Economic Cooperation and Development
OPM      Oxford Policy Management
RM       Richard Manning
SADEV    Swedish Agency for Development Evaluation
Sida     Swedish International Development Co-operation Agency
TOR      Terms of Reference
UTV      Sekretariatet för utvärdering (Department for Evaluation, Sida)
Table of Contents

Executive Summary
Recommendations to improve evaluation quality in DFID
Abbreviations

1. Introduction and background
2. The quality of DFID's evaluation reports and assurance systems: the methods and approaches used
   2.1 The three work streams
   2.2 Methods and approaches used
   2.3 Head-line reflections on methodologies for assessing evaluation quality
3. Overview of the quality of DFID evaluations and assurance systems
   3.1 Overview of the main findings and headline issues raised in the review
   3.2 Quantitative assessment of the quality of the evaluations reviewed
   3.3 A summary of the assessment of evaluation quality and DFID's quality assurance systems
   3.4 Other quality concerns raised in the review
4. Recommendations for improving evaluation quality in DFID and in other evaluations in which DFID is involved
   4.1 Recommendations made in or drawn from the quality review
   4.2 Narrowing down the recommendations: the eleven most important
5. Has the review covered the ground adequately?
Tables

Table 1   Aggregate ratings – DFID evaluations
Table 2   Aggregate ratings – non-DFID evaluations
1 Introduction and Background

1. The Independent Advisory Committee on Development Impact (IACDI) was established to help strengthen evaluation in the Department for International Development (DFID), focusing on independence and quality. IACDI is required periodically to review the outputs from DFID's evaluations to assess, inter alia, "whether high quality evaluation standards, in line with DAC standards, are being applied and how effective is the Evaluation Department's quality assurance system" (IACDI TORs 2007): http://iacdi.independent.gov.uk/about/terms-of-reference/.

2. In its first year, the Committee focussed its work on independence and the policy and institutional arrangements for DFID evaluations. Its work on quality was limited. At its June 2008 meeting, Committee members commented on the quality of recent evaluation reports, and, at its October meeting, the Evaluation Department (EvD) reported back on measures that had been taken or were planned to address some of the Committee's concerns.

3. However, against the backdrop of DFID's acknowledgement in its policy document Building the Evidence to Reduce Poverty that the quality of evaluation needed to be strengthened, and the agreement by IACDI members that the Committee needed a more comprehensive assessment of the quality of DFID's evaluations and assurance systems, it was agreed that more work on quality issues was needed.

4. At his meeting with DFID's Secretary of State and members of his Management Board in November 2008, IACDI's Chair confirmed that the quality of DFID's evaluation would be a major theme of the Committee's work in 2009. At its January meeting, the Committee agreed to commission an in-depth review of evaluation quality in DFID. The review would assess the quality of both DFID's evaluation reports and the assurance and management systems around which decisions are made, from the commissioning of studies to the use made of their findings, in part by comparing DFID's reports, practices and funding of evaluations with those of a small selection of other (like-minded) official aid agencies. Where gaps and weaknesses are identified, the review would make recommendations for strengthening systems and processes.

5. Terms of Reference for this work were drawn up and in February, after a limited tendering process, two consultants, Burt Perrin (BP) and Richard Manning (RM), were appointed to undertake this review, which was managed by Committee member Roger Riddell. An Inception Report was prepared and discussed at a meeting in mid-March by three IACDI members (Roger Riddell, Tony Killick and Clive Smee) and two EvD staff members, John Murray and Helen Wedgwood. Its main purpose was to agree the list of evaluations to be reviewed and the method to be used to assess quality. Following this meeting, the consultants began work, and their draft reports were completed by early June and circulated to IACDI members and to EvD together with a first draft of this report.

6. The consultants attended the IACDI meeting held at the end of June where the three draft reports were discussed. The Committee broadly endorsed the draft conclusions but made a number of suggestions, including the sharpening of some recommendations. Further details are contained in the Minutes of the meeting, available from the IACDI web-site at http://iacdi.independent.gov.uk/2009/07/minutes-of-6th-meeting-23rd-june-2009/.
Following the meeting, the reports were revised, and it was agreed that all three reports should be available on the IACDI web-site, http://iacdi.independent.gov.uk/.

7. The purpose of this Report is to provide a summary of the reports of the two consultants and, on the basis of the consultants' work, to lay out key recommendations to DFID for improving the quality of DFID's evaluation reports and assurance systems. The conclusions of the review will be discussed further by IACDI in the light of planned discussions with outside experts and stakeholders at its October meeting, and the recommendations of the consultants and those contained in this Report, together with further discussion by IACDI, will then provide a key input into the Chair's Annual Letter to the Secretary of State.

8. The rest of the Report is divided into five sections. Section 2 summarises the methods used in the review and the headline messages contained in the consultants' reports. Based on the consultants' findings, Section 3 provides an overview of the quality of DFID's evaluations and assurance systems. Section 4 focuses on recommendations for improving evaluation quality in DFID, and it is divided into two parts: Section 4.1 draws together and lists the recommendations made by the consultants, while Section 4.2 provides a shorter list of recommendations that DFID should consider to improve the quality of evaluation across the Department. Section 5 discusses whether the quality review provides IACDI with sufficient material to advise the Secretary of State on evaluation quality in DFID.
2 The quality of DFID's evaluation reports and assurance systems: the methods and approaches used

2.1 The three work-streams

9. Based on the Inception Report and decisions taken at the Inception Report meeting, the review comprised three distinct, though linked, work-streams.

10. Firstly, using the agreed quality assessment framework, 14 DFID evaluations were assessed for their quality by Burt Perrin (BP).

• Six were thematic evaluations: (1) General Budget Support; (2) Citizens' Voice and Accountability; (3) The applicability of the Paris Declaration in fragile situations; (4) DFID's policy of gender equality and women's empowerment; (5) 'Taking Action': the UK Government's strategy for tackling HIV and AIDS in the developing world; and (6) DFID's private sector infrastructure investment facilities.

• Four were country evaluations: (7) Zambia; (8) Sierra Leone; (9) Nepal; and (10) Pakistan.

• The final four were "decentralised" evaluations - those not formally commissioned or under the direct watch of DFID's Evaluation Department (EvD). These were: (11) The Uganda Poverty Eradication Action Plan; (12) Multi-donor budget support to Ghana; (13) DFID's Pakistan country programme; and (14) The Anti-Corruption Commission Enhanced Support Project in Zambia.

11. Secondly, a further twelve evaluations commissioned and/or undertaken by the official development agencies of Denmark, the European Commission (EC), Ireland, the Netherlands, Norway and Sweden were assessed for their quality by Richard Manning (RM), using the same framework, and compared with the DFID evaluations. Efforts were made to find studies of countries or themes similar to those selected for DFID, but this proved challenging.

• Four of these were country evaluations: (1) Tanzania (EC); (2) Uganda (Irish Aid); (3) Zambia (Norway/NORAD); and (4) Uganda (Denmark/Danida).

• Eight were thematic evaluations. These were: (5) Swedish (Sida) and (6) Norwegian (NORAD) gender evaluations; (7) Swedish (Sida) and (8) Norwegian (NORAD) studies on HIV/AIDS; (9) Dutch (IOB) and (10) Swedish (SADEV) studies of private sector / mixed credit initiatives; and (11) Swedish (Sida) and (12) Dutch (IOB) studies of what were loosely classified as voice and accountability projects and programmes.

12. Thirdly, a comparative assessment was made by RM of the management systems and processes used by DFID's EvD for ensuring the quality of evaluations undertaken and commissioned by DFID, placing these alongside the management systems and processes of other (like-minded) official aid agencies. The comparator agencies were the European Commission (EC), IOB (Netherlands), Irish Aid (Ireland), NORAD (Norway), and, for Sweden, the Department for Evaluation (Sekretariatet för utvärdering, UTV) and SADEV,
established in 2006 as an independent state-funded institute to undertake development cooperation evaluations. The choice of comparator agencies was not scientifically based. Rather, it was based on a mix of factors: severe time constraints, which biased the choice towards agencies geographically nearby; an attempt to choose agencies similar to DFID in terms of size and development outlook; and agencies whose senior managers were interested in the review and were willing both to share documentation and to give time to answering the questionnaire and providing additional data as requested. While the International Financial Institutions clearly have plenty to tell us about evaluation quality, their structure, remit and level of resourcing are too far removed from DFID's to provide a good point of comparison, and it was decided not to include them in this particular review.

2.2 Methods and approaches used

13. There is no universally agreed method of assessing evaluation quality. However, it is increasingly recognised that assessments of evaluation quality need not only to examine the quality of the evaluation reports produced. They also need to include assessments of the wider context within which these reports are generated - in brief, what happens before evaluations are commissioned (how and why choices are made), what happens during the evaluation process, and what happens after evaluation reports are produced (whether and how recommendations are followed up). This wider understanding of quality is explicitly acknowledged in the title of the TORs for the Quality Review, The Quality of DFID's Evaluation Reports and Assurance Systems, and in the decision to include in the assessment a comparison of DFID with other agencies.

14. Few, if any, attempts have been made to undertake an in-depth assessment of evaluation quality of any official aid agency which includes both an assessment of the quality of evaluation reports and a wider (and comparative) assessment of the setting, processes and (management) systems within which decisions to commission and undertake evaluations are located. Hence the first task of the consultants was to develop an approach to assessing evaluation quality which they could use to judge the quality of the evaluation reports they needed to review and the wider assurance systems. They developed a two-part approach.

15. Firstly, a "quality assessment framework" against which the selected evaluation reports (the outputs) were to be judged was constructed. This was shaped and informed by the emerging DAC Evaluation Quality Standards (EQS), by a range of other agreed and emerging standards and statements of standards, and by the wider literature on evaluation quality.

16. The framework differs from many others in two important ways. Firstly, the use to which evaluations are put - the "utility" of evaluations - is included throughout the assessment of the report's quality. Secondly, the assessment deliberately looks well beyond the quality of the final report to the quality of the "setting" within which decisions about the evaluation are made. Indeed, for many of the DFID evaluations, BP was able to access and review a large number of documents (inception reports, background papers, interim reports and emails) as well as conduct interviews with evaluators, EvD managers and key DFID personnel involved in the subject being evaluated.

17. The quality framework entailed an eight-stage process of assessment.
Each evaluation was assessed in relation to each of the following seven criteria, and then, on the basis of this performance, an overall assessment of the entire evaluation was made:
(1) Planning and Conception
(2) Evaluation Design
(3) Implementation
(4) Analysis
(5) Reporting
(6) Utility and Use
(7) Overarching Considerations

18. For each of the seven stages, assessments were made on the basis of between six and eight specific criteria, and, in all, 54 questions were asked. Based on the individual scores achieved, each of the evaluations reviewed was given a rating for each of the seven stages of the assessment, under the following classification: "excellent", "good", "acceptable", "borderline" or "unacceptable". Finally, an overall rating of each evaluation was given, using this same ranking of excellent, good, acceptable, borderline and unacceptable (an illustrative sketch of this scoring structure is given at the end of this sub-section). As noted above, this quality assessment framework was used to assess both the DFID evaluations and those selected for review from other agencies.

19. The second part of the quality assessment comprised making comparisons between DFID and the selected comparator agencies. It consisted of a series of interviews with officials from both DFID and the comparator agencies, based on eight questions aimed at assessing the quality of the management systems and processes for ensuring the quality of evaluations. The questions were adapted from the questionnaire used by the Peer Review and Evaluation Division of the OECD/DAC and comprised the following:

(1) Is the operating environment favourable to high-quality evaluations?
(2) Is the way in which the topics for evaluations are decided likely to encourage relevance and buy-in?
(3) Is the approach to involving partners and other donors in evaluations logical and reasoned?
(4) Are evaluations managed with a view to a quality product?
(5) Are findings disseminated in a way likely to secure impact?
(6) Are recommendations followed up systematically?
(7) How are evaluation products perceived by outside stakeholders?
(8) What role does the evaluation department play in relation to evaluative material commissioned by other parts of the agency?
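The review does not publish the exact rules by which the 54 question scores were aggregated into stage and overall ratings. A minimal sketch of the two-level structure described in paragraphs 17-18, assuming each question is scored on the same five-point scale and that ratings are simple rounded averages, might look as follows; the scoring rule and the example scores are assumptions for illustration, not taken from the review.

```python
# Illustrative sketch only: the 1-5 question scale, the rounded-average rule
# and the example scores below are assumptions; the review does not specify
# its aggregation method.
from statistics import mean

RATINGS = ["unacceptable", "borderline", "acceptable", "good", "excellent"]  # 1..5

def stage_rating(question_scores):
    """Collapse the six to eight question scores for one stage into a rating label."""
    return RATINGS[round(mean(question_scores)) - 1]

def overall_rating(stages):
    """Collapse all question scores across the seven stages into one overall label."""
    all_scores = [score for scores in stages.values() for score in scores]
    return RATINGS[round(mean(all_scores)) - 1]

# Hypothetical scores for one evaluation (only two of the seven stages shown).
example = {
    "Planning and Conception": [4, 3, 4, 4, 3, 4],
    "Evaluation Design": [3, 3, 4, 3, 4, 3, 3],
}
print({stage: stage_rating(scores) for stage, scores in example.items()})
print("overall:", overall_rating(example))
```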
2.3 Head-line reflections on methodologies for assessing evaluation quality

20. Three over-arching views about evaluation quality are either made directly in or are strongly implied in the review.

21. Firstly, assessing evaluation quality is more an art than a science. There is no off-the-shelf template against which to judge clearly the quality of an evaluation study or an assurance system, or to rank the quality of one agency's evaluations and assurance against those of others. This is in part because judging quality involves the assessment of different factors, the importance of which is likely to vary in relation to the nature and purpose of the evaluation and the agency concerned. There is no objective and a priori way of knowing which of the different factors that contribute to evaluation quality are to be considered more, or less, important. It is for this reason that BP's report refers to the assessment of evaluation quality as a "delicate balancing act" rather than something with clear "rights" and "wrongs". What matters most to one agency, or to the general public of the country in which it is located, may not matter as much to another agency or to its parliament and electorate. This makes it especially difficult to compare evaluation quality across agencies and between different cultural settings.

22. Secondly, and relatedly, evaluation quality is a fast-moving and evolving concept. Thus, few today would challenge the view that an assessment of evaluation quality needs to include an assessment of its use and usefulness, whereas five or so years ago it was widely assumed that it was possible to assess the quality of an evaluation merely by examining the evaluation report produced, in isolation, without reference to the context and how it is used.

23. Thirdly, this review of evaluation quality is probably amongst the most in-depth and extensive to have been attempted of any official aid agency. Consequently, the assessment holds DFID up to new, different and probably higher standards than the assessments that have been conducted of other official aid agencies, which have not only focused predominantly and more narrowly on the quality of evaluation reports but have also made their assessments against a smaller number of criteria. It is partly for this reason that this review has elicited considerable interest among other donor agencies.
3 Overview of the quality of DFID evaluations and assurance systems

24. Against this backdrop, what do the findings from this review tell us about evaluation quality within DFID, and how does DFID's performance rank in relation to the comparator agencies? This chapter provides an overview of the results of the review commissioned by IACDI. It is divided into two parts. It starts by presenting a summary of the main "headline" findings on evaluation quality within DFID and in comparison with the comparator agencies. It then looks in more detail at some of the specific findings emerging from the review. Where appropriate, reference is made to the individual reports of the two consultants, referred to as either BP (for Burt Perrin's report) or RM (for Richard Manning's report). While IACDI broadly accepts the analysis and findings of the review, it does not necessarily agree with each and every assessment made or conclusion drawn.

3.1 Overview of the main findings and headline issues raised in the review

25. The findings of the quality review provide a mixed picture of evaluation quality both within DFID and beyond. Most of the DFID and non-DFID evaluations are assessed as "good", and one of DFID's as "excellent". However, a significant proportion is rated only "acceptable" and a minority (less than 15 percent in the case of DFID) as "borderline". The review suggests that the quality of DFID's evaluation reports and evaluation assurance systems is probably on a par with that of the comparator agencies: some DFID evaluations achieved higher ratings than those of other agencies, some did not. Likewise, in some areas DFID's management and assurance systems were assessed as stronger than those of the comparator agencies; in other areas they appear weaker.

26. However, as the consultants themselves stress, extreme caution needs to be exercised in interpreting these "raw scores" for the following reasons.

• Firstly, the DFID and non-DFID assessments were made by two different consultants, one of whom is a professional evaluator and one of whom is not.

• Secondly, the assessment of the non-DFID evaluations was based largely on the stand-alone evaluation reports and did not involve surveying other, supplementary documentation or interviews with key stakeholders, as was possible for the majority of the DFID evaluations reviewed.

• Thirdly, a particular difficulty arose in the attempt to compare DFID evaluations with those of the comparator agencies selected because, as the consultants emphasise in their reports, it proved extremely difficult to find any evaluations of other agencies that could be considered "similar" to any DFID evaluations in relation to their core purpose and detailed TORs.

• Fourthly, the evaluations assessed and the agencies selected against which to compare evaluation quality were not chosen on the basis of a scientific selection or random sampling process. Hence, it is not known how representative the sample of evaluations
reviewed is of the wider population of evaluations, and thus whether the results of the assessments reflect the overall quality of DFID evaluations, or how DFID's systems and processes for assuring evaluation quality rank in relation to all other official development co-operation agencies - though, because the sample of DFID's centralised evaluations was a fairly large one, this cluster of evaluations is likely to have been "fairly representative".

27. Hence, any comparisons made on the basis of these ratings should only be viewed as giving a "broad-brush" assessment. The raw figures presented in the consultants' reports and reproduced here constitute an attempt to provide quantitative ratings of performance, much of which is complex and qualitative in nature and cannot easily be captured by crude numerical scores. Thus the ratings given should be seen as indicative rather than definitive; to treat them otherwise would be misleading.

28. The main contribution of the quality review lies less in the numerical ratings given, and far more in the wealth of data and information on the different dimensions of evaluation quality and in the analysis of different elements of the management and assurance systems that it provides. In that context, the review draws attention to a large number of weaknesses in evaluation quality and assurance systems. One of its main messages is that, notwithstanding a range of initiatives taken in recent years to enhance the quality of evaluation (see para 29), there remains a significant gap between current levels of evaluation quality and what could - and should - be achieved, and plenty of room for improvement (BP, paras S2 and 9.3). A second is that far more needs to be done, and can be done, to enhance quality if evaluation is seen more as a tool for learning and enhancing the impact of aid, in addition to its function as a tool for accountability (BP, para 6.19, Recommendation 14). However, the efforts of both evaluators and agencies to improve evaluation quality are still often constrained by the lack of key base-line and monitoring data and by insufficiently clear initial objectives of the aid projects, programmes and policies whose performance and progress are the subject of evaluation. While the review was path-breaking in trying to incorporate a number of systemic issues into the overall assessment of quality, it was clearly not possible to incorporate all such wider, systemic issues into the assessment matrix used. To have done so would almost certainly have resulted in an even harsher assessment. All these different weaknesses need to be addressed, some urgently. The review presents a large number of specific recommendations and more general ideas to help achieve the overall objective of improving evaluation quality.

29. However, the review also commends DFID for recent advances made (see, for instance, BP, para 1.5). In common with other agencies, DFID in general and EvD in particular are committed to trying to improve their assurance systems - and advances have already been made and others are expected - even if the review does not go into much detail on the changes that have taken place. It is important, too, to locate this snap-shot assessment of evaluation quality in DFID today within a wider and historical perspective.
Over the last few years, DFID has given growing priority and allocated far more resources to evaluation. This has included, inter alia, a number of initiatives that directly or indirectly have aimed to enhance evaluation quality: recognition of evaluation as a DFID specialism; expanding the number of professional staff in the Evaluation Department; rolling out a new evaluation policy, Building the Evidence to Reduce Poverty, one of whose (four) pillars is aimed at driving up the quality of evaluation, including the development of a new set of standards for evaluation aimed at enhancing evaluation quality
which include criterion measures; and the establishment of IACDI, one of whose core functions is to help strengthen evaluation quality in DFID.¹

¹ DFID's new evaluation policy can be found at http://www.dfid.gov.uk/Documents/publications/evaluation/evaluation-policy.pdf.

3.2 Quantitative assessment of the quality of the evaluations reviewed

30. Table 1 summarises the assessments given for the 14 DFID evaluations reviewed. The overall assessment rates the quality of almost 60 percent of the sample (8 out of 14) as either "excellent" (1) or "good" (7). Four evaluations (almost a third) are rated as "acceptable", and 2 out of 14 (14 percent) are considered "borderline". None are rated as "unacceptable" (BP, Annex 4). The average (mean) score of the 14 evaluations would be half-way between "good" and "acceptable". If the fivefold classification of excellent, good, acceptable, borderline and unacceptable were to be replaced with a grading system of A, B, C, D and E, then the overall mean score for the 14 DFID evaluations assessed would be B-/C+ (see the illustrative calculation after Table 1).

Table 1. Aggregate ratings - DFID evaluations

Assessment category                     Excellent   Good   Acceptable   Borderline   Unacceptable
Overall assessment of the evaluation        1         7         4            2             -
Planning and Conception                     -         4         6            4             -
Evaluation Design                           1         4         7            2             -
Implementation                              1         8         3            1             -
Analysis                                    5         2         5            2             -
Reporting                                   1         5         4            4             -
Utility and Use                             2         3         5            3             -
Overarching Considerations                  2         7         2            2             -
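As a rough cross-check on the "half-way between good and acceptable" (B-/C+) figure quoted in paragraph 30, the overall assessment row of Table 1 can be converted to numbers and averaged. The 1-5 numeric mapping below is an illustrative assumption rather than a scale used in the review.

```python
# Illustrative cross-check of paragraph 30: the 1-5 mapping is an assumption,
# not a scale defined in the review.
SCALE = {"excellent": 5, "good": 4, "acceptable": 3, "borderline": 2, "unacceptable": 1}

# Overall assessment of the 14 DFID evaluations (Table 1, top row).
overall = {"excellent": 1, "good": 7, "acceptable": 4, "borderline": 2, "unacceptable": 0}

total = sum(overall.values())                                    # 14 evaluations
mean_score = sum(SCALE[r] * n for r, n in overall.items()) / total

# Prints 3.50: exactly half-way between "good" (4) and "acceptable" (3),
# i.e. roughly a B-/C+ if the five labels are read as grades A to E.
print(f"{mean_score:.2f}")
```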
31. In relation to each of the seven evaluation stages the results were more mixed. It was not possible to provide an individual rating for all 14 evaluations, but the ratings indicate a significant number (36 percent) judged to have been "excellent" in terms of Analysis, while for two other categories - Implementation and Overarching Considerations - almost 70 percent of evaluations achieved either "excellent" or "good" ratings. In contrast, in aggregate, only half the evaluations assessed achieved either "excellent" or "good" ratings for Analysis, while for the remaining four categories, less than half of the evaluations assessed achieved "excellent" or "good" ratings (Planning and Conception, 28 percent; Evaluation Design, 36 percent; Reporting, 43 percent; and Utility and Use, 38 percent).

32. Table 2 summarises the assessments given for the non-DFID evaluations. The overall assessment rates the quality of five of the 12 evaluations (41 percent) as "good" and none as "excellent". Six evaluations (50 percent) are rated as "acceptable", and 1 out of 12 (8 percent) is considered "borderline". None are rated as "unacceptable".

33. In relation to each of the seven evaluation stages, the results of the assessments of the non-DFID evaluations were more mixed, though with most (over 90 percent) clustered round the middle scores of "good" and "acceptable". For the seven criteria, the non-DFID evaluations achieved only 2 scores of "excellent" (2.5 percent of all raw scores, both for Planning and Conception), compared with 12 "excellent" scores (14 percent of all raw scores) for the DFID evaluations. More than half the non-DFID evaluations scored better (either "good" or "excellent" rather than "acceptable" or "borderline") in terms of Planning and Conception, Evaluation Design and Utility and Use, and they were weakest in terms of Implementation. As can be seen by comparing Tables 1 and 2, this was almost the reverse of the scores for the DFID evaluations.

Table 2. Aggregate ratings - non-DFID evaluations

Assessment category                     Excellent   Good   Acceptable   Borderline   Unacceptable
Overall assessment of the evaluation        0         5         6            1             -
Planning and Conception                     2         7         2            1             -
Evaluation Design                           0         7         5            0             -
Implementation                              0         3         7            -             -
Analysis                                    0         6         5            1             -
Reporting                                   0         6         5            1             -
Utility and Use                             0         7         3            1             -
Overarching Considerations                  0         2         2            1             -

34. The ratings reveal quite a wide variation in quality across both evaluations and agencies and even, in some cases, in relation to the different components of particular evaluations. Thus, there is little clear and strong uniformity in performance: only a minority of evaluations score consistently high (or low) marks across all the different categories. This finding is developed in both reports, which argue that there is an identifiable and quite large gap between the current quality of evaluations and what could be achieved. In other words, both reports suggest that there is significant scope for improving the current quality of DFID evaluations.

35. Perhaps the overall message of the ratings is that, when judged against the seven dimensions of evaluation quality assessed, most evaluations from the sample - 75 percent in the case of
DFID and 90 percent in the case of non-DFID evaluations - score in the middle range of "good" to "acceptable", though more DFID than non-DFID evaluations fell outside this range, with slightly more (16) receiving lower ("borderline") scores than those (12) receiving scores of "excellent" - though these differences could well be attributable, at least in part, to the fact that the DFID and non-DFID evaluations were assessed by different consultants.

36. Did DFID receive value for money for the resources allocated to these evaluations? This was assessed as part of the "Overarching Considerations", in answer to the question "did the evaluation make the best use of its resources with reasonable explanation for modification to the original timeframe or budget". In short, it was an attempt to compare the costs outlaid with the benefits achieved. The findings reveal a mixed picture. Four evaluations (30 percent) were rated "excellent" on this criterion, four (30 percent) "satisfactory", and two (15 percent) "unacceptable". However, a high number, four (30 percent), could not be rated because of insufficient information. The discussion in BP's report draws attention to the high cost of many of the evaluations, at the extreme running into hundreds of thousands of dollars, but this is usually reflective of the complex nature of the evaluations and the time needed to cover the TORs adequately. Indeed, the consultants contracted often spent more days undertaking their work than they were paid for. More worrying was the observation that the evaluations were generally far more complex than they needed to be, with more complex (and costly) evaluations usually not resulting in higher quality reports or deeper insights into the issues examined (BP, paras 6.18-7.5 and page 60).

37. Finally, the BP report on the assessment of DFID evaluations highlights the relationship between evaluation quality and independence, commenting not only that a lack of independence can compromise the quality of the evaluation produced but that in at least two cases it did do so, and in others it threatened to do so (BP, paras 1.5, 3.6-3.8 and 3.14-3.20).

3.3 A summary of the assessment of evaluation quality and of DFID's quality assurance systems

38. Both reports devote considerable space to the assessment of DFID's assurance systems. There are quite a large number of headline messages.

39. Overall, DFID's quality controls and assurance systems are broadly on a par with the systems of the other agencies, exhibiting both strengths and weaknesses. In common with other agencies, EvD is committed to trying to improve its assurance systems - and advances have already been made and others are expected. Likewise, DFID is commended for the methodological approaches used in the evaluations reviewed (BP, para 4.1). But, like other agencies, DFID is perceived to be poorly informed about the evaluation work undertaken outside EvD.

40. However, a number of differences between DFID and other agencies are highlighted in the review. Though the sample of studies selected was very small - and one may well not be comparing like with like - the review suggests the following:

• The quality of DFID's decentralised evaluations does not appear to be markedly different from that of those emanating from EvD, whereas in the comparator agencies the quality of decentralised evaluations is judged to be of a poorer standard.
• One weakness of DFID in comparison with most other agencies examined is the dominance in EvD of staff who are managers, and the paucity of staff with a background in evaluation and research.

• Additionally, in relation to its size, DFID's budget for commissioning evaluations is "about average" compared with most comparator agencies, though DFID managers oversee on average among the highest numbers of studies per year. However, DFID has comparatively fewer senior evaluators, and the average in-house resource used per evaluation appears to be low in the case of DFID (RM, Annex 11).

41. The review is critical of DFID evaluations for their excessive length and lack of focus. These problems, in turn, are often attributable to complex TORs and the sheer number of issues they are expected to address. This is clearly a problem that EvD recognises. It is exacerbated by the (commendable) practice of managers consulting widely prior to the launch of a major evaluation. What seems to be missing is an effective process of pinpointing and selecting the (few) key issues that need to be addressed at the conception phase, with senior managers taking a more active role in ensuring that the tasks outlined in the TORs are "do-able".

42. Another weakness of DFID reports highlighted in the review is that they are often seen as too "bland" (BP, paras S9 and 5.5), a problem arising in part in response to stakeholder feedback and in part in response to DFID requirements that all the conclusions drawn in evaluation reports be evidence-based. Historically, EvD/DFID has taken a very active approach to managing external consultants, probably more so than the other agencies examined. While this can contribute positively to the quality of evaluations, the consultants judged that on occasion DFID managers have "leant" too heavily on evaluators in ways which have compromised their independence, with evaluation steering and reference groups, especially, over-extending their role by "suggesting" what evaluation reports should say. For complex multi-donor joint evaluations, it has not been uncommon for different agencies to demand membership of evaluation steering groups, and for those on such groups to be as concerned with defending their own agency's performance as with helping the consultants to produce a high quality evaluation.

43. A related problem identified is that DFID senior managers have tended to take an overly defensive attitude to evaluations and to any critical comments made in reports. Indeed, DFID's sensitivity to criticism seems to have led them, on occasion, to try, with some success, to "manage" the conclusions of evaluations. The Review identified two cases where the independence of evaluation was compromised and others where it was threatened (BP, para S5). This has two linked consequences. The first is that the reports produced often appear to lack "bite" (BP, para S9). Secondly, this attitude to evaluation is also seen to be at least a contributory cause of another problem identified in the review, namely weak management follow-up to evaluations: the Review notes that most management responses, where they exist, are very vague and general in nature (BP, para S10).
The quality of recommendations is uneven, and most management responses, where they exist, are vague and general in nature (BP, paras 5.1-5.7), even though it is recognised that this is an area where changes have been introduced, for instance through the following up of management responses to evaluations undertaken by DFID's Internal Audit Department. The lack of information held centrally or known to EvD, and accessible on DFID's web-site or intra-net,
on decentralised evaluations - a problem DFID shares with most of the comparator agencies - is seen as another symptom of the lack of senior management "buy-in" to lesson-learning.

44. If evaluations were seen less as a threat and more as a "learning tool" to assist managers to deploy their resources more effectively, then, on the one hand, evaluators would be encouraged to write less cautious and more useful reports, while, on the other hand, managers would be likely to take more interest in engaging more deeply with the recommendations made and their follow-up. Since the demise of the former Projects and Evaluation Committee, no senior DFID committee automatically discusses the findings and recommendations of evaluations (RM, page xii). Too loose a link between evaluation and research, and between the information that evaluations can provide and senior management's pursuit of its strategic objectives, helps to explain why evaluation findings are not seen as being as important to decision-makers as they are in other agencies, notably the Netherlands (see RM, paras 3.19-23).

45. However, these general comments about a defensive attitude to evaluation, and one which does not encourage evaluation to be used as a lesson-learning tool for managers, though a theme of the Review, need to be placed in perspective. The consultants were also impressed by EvD's interest in how the Review could be used to aid learning - indeed, they were encouraged to explore "challenging issues, where they thought that learning and ideas about future improvement" in evaluation quality might be gained (BP, para 1.5). Additionally, in the discussion of country programme evaluations, the Review drew attention to the fact that "senior management clearly values an independent assessment of the effectiveness, or otherwise, of country strategies, particularly as new strategies are produced" (RM, para 2.11). Overall, it judged that DFID sets an environment for high quality evaluation that "is broadly similar to those of the comparator agencies" (RM, para 3.5).

46. Another serious weakness identified in the review concerns what is seen as the paucity of up-front evaluation planning when strategies, plans and programmes are being developed or modified, a problem which cannot readily be addressed once the time has come to undertake an evaluation. Unless objectives are clear at the outset and consideration has been given to how progress will be identified and monitored, some would argue, high quality evaluation is impossible (BP, paras 4.4-4.5). This, too, is a problem that EvD recognises and is trying to address, for example in relation to its assessment of a small number of what are termed "flagship policies" and the work it is doing in creating a more robust approach to log-frames.

47. The uses to which evaluations are put include the dissemination of the findings. The review highlighted some good practices here but, overall, its view was that far more needed to be done and could be done: by providing more readable summaries, using less technical language, by using forms of communication other than the written word, and by targeting particular audiences more directly. Though the investigations on this issue were quite limited, the review suggests that DFID evaluation reports probably rate as average (in practice this means quite low) in terms of generating parliamentary interest, and below average in terms of interest from civil society organisations (RM, paras 3.60-1).
48. DFID is moving (quickly) in the direction of itself commissioning joint evaluations or participating in evaluations led by other agencies; indeed, the review found that DFID is leading more joint evaluations than the other comparator agencies. This move is welcomed in the review. However, the review also draws attention to what it sees as a significant gap between the positive and supportive statements made about the Paris Declaration, confirmed in the Accra Agenda for Action, and the way that DFID is approaching evaluation. In its view, too little has changed in practice, especially in relation to the alignment dimension of the Paris Declaration (RM, pages x - xii). For instance, although DFID has extended its consultation process to include stakeholders outside the Department, the final decision on what EvD chooses to evaluate, and when, is still made by DFID and IACDI. Similarly, developing country stakeholders are rarely, if ever, involved in decisions about what to evaluate and when; the results of centralised evaluations are rarely shared with them in any systematic way; and though DFID is a leader among agencies committed to and undertaking more joint evaluations, there are still very few joint evaluations conducted with, and especially led by, developing country governments or institutions. Additionally, far too little is being done to help build the capacity of aid-recipient countries to enable them to undertake evaluations, especially to take more of a leadership role.

49. The review considers the ability of evaluations to shed light on the particular contribution that DFID’s own (aid) resources can make to key development outcomes. It reports quite a sharp disconnect between expectations and what evaluations are able to deliver, not because evaluations have tended to provide an inaccurate analysis, but, at least in part, because DFID is seen to have made unrealistic claims about attribution. It is suggested that DFID needs to be more honest and explicit in acknowledging how little it is possible to say about DFID’s overall contribution to development outcomes when, as is usually the case, DFID’s share of ODA is well under one fifth of all the aid provided, and when, in the future, DFID is committing itself to work even more cooperatively with other agencies. The review makes a number of suggestions for how DFID might address the dilemma of wanting to find evidence to show that aid, and especially additional UK aid, is needed while finding it difficult to point to the specific contribution that UK aid makes in particular countries. It discusses making use of alternative theory-of-change models and identifying intermediate outcomes to which DFID contributes as ways of helping to address these problems. It suggests that DFID should focus its attention far more on trying to assess and report on the overall impact of aid and the contribution that aid makes to aggregate development, in harmony with resources from recipient countries and from other donors, rather than continuing to use one-off evaluations to try to quantify the causal relationship between DFID inputs and aggregate development outcomes. These issues are linked to country programme evaluations (BP, paras 4.12 - 4.20; RM, pages x - xii).

50. Concerning country programme evaluations, RM’s report draws attention to the fact that different agencies assess the effectiveness of their country strategies in different ways: there is no “right” one-size-fits-all model and no watertight case for doing country programme evaluations, as the Swedish case makes clear (RM, para 2.6 - 2.27).
DFID’s “light touch” approach is viewed as one approach among many, its main drawback being that it may well be too “light” to uncover much that is not already known. BP suggests that country programme evaluations should be redesigned to make them more relevant and meaningful (BP, Annex 5). RM cautions against doing more than experimenting with one or two better-resourced country programme evaluations, as the benefits are not likely to offset the increase in costs. He records, though does not necessarily agree with, the view of one interviewee who suggested that a quick visit by an experienced person could well provide as many insights to a country manager, at a far lower cost, as is achieved with these light-touch evaluations. His report also challenges the assumption that because agency-specific country programme assessments (of whatever form) have been helpful in the past, they should necessarily be continued. In his view, consideration should in future be given to developing new models, informed by the Paris Declaration commitments, assessing the impact of joint donor assistance strategies, with an explicit and prominent local input, perhaps forming part of the process of mutual accountability (RM, para 2.27).

51. The merits, strengths and weaknesses of country evaluations are discussed in a number of places in the review (see BP, para 6.4 - 6.5 and Annex 5). One issue highlighted is that the generic TORs, with their quite rigid format, risk eclipsing or side-stepping (sensitive) political issues and underplaying some of the more crucial country-specific issues that need to be grappled with, and hence reduce the relevance of what is analysed and what is recommended for country managers. Another issue raised concerns the core purpose of undertaking country evaluations which, according to the review, remains unclear (BP, para 6.4). Indeed, the review suggests that there is a lack of unanimity within the agency about country evaluations: on the one hand, it is argued that they are used for accountability purposes and valued as an input into new country strategies; on the other, they are seen to have little value for learning. The use to which country programme evaluations have been put remains limited in three different ways.

• Firstly, the country evaluations have (rightly) been cautious in drawing firm conclusions about the overall contribution of DFID to aggregate development outcomes, not least because of the paucity of firm data (base-line and monitoring) upon which to draw conclusions, as well as a lack of clarity about the core objectives of the aid intervention.

• Secondly, there is only limited evidence to suggest that senior country managers have either been much engaged in these evaluations or have welcomed the analysis undertaken or the recommendations made. Country evaluations have not, generally, been perceived as major inputs into key policy and decision-making. In some cases, country evaluations have been seen as an externally-imposed burden on managers and have not been seen as useful in informing managers on how to use their resources more efficiently and effectively.

• Thirdly, only in rare cases have recipient-country stakeholders engaged with the consultants in a manner that has been perceived as beneficial (to them).

52. Although the review of thematic evaluations highlights their added value and the rich amount of information uncovered (though some were assessed as of poor quality), it cautions against agencies continuing to commission more thematic evaluations on the basis of these past benefits. Two main reasons are given (RM, paras 2.29ff). Firstly, agencies are (strongly) criticised for commissioning very similar thematic evaluations, often drawing very similar conclusions, many of which are also found in the research literature. Even more single-agency initiated studies are simply not needed. Secondly, agencies are criticised because of the choice of consultants and their use of desk-studies.
As those selected, and especially those leading these evaluations, tend not to include senior recipient-country evaluators, and because field visits are often not factored in, the depth and quality of the analyses are severely constrained, and opportunities for local capacity-building are repeatedly lost.

53. Clearly, management and assurance systems play an important role in determining the quality of evaluation in an agency. But how important are they? Not surprisingly, the review does not provide a definitive answer, as it is an extremely difficult issue to assess. However, it does draw attention to one concern which is shared with other agencies: the variable quality of some of the consultants selected to undertake evaluations (RM, paras 3.40ff), although in BP’s sample of DFID evaluations only one clear instance of this problem was found. DFID’s desire to undertake more evaluations is only likely to increase the risk of contracting people with insufficient skills and experience; unless attention is given to addressing this problem, it will undermine DFID’s commitment to improve the quality of evaluations.

3.4 Other quality concerns raised in the review

54. Besides the “headline issues” summarised above, the following paragraphs briefly touch on other quality issues highlighted in the review.

55. Large, long-lasting and complex evaluations are widely thought to provide more rigorous and in-depth analysis. In contrast, the review found that even in these evaluations the time allocated to, and spent on, trawling through the basic data is often insufficient. When these constraints are set alongside poor quality baseline and monitoring data and, not infrequently, rather vague initial objectives, the conclusions drawn are often based on a less than robust analysis (BP, para 2.12). It is the view of both consultants that the list of issues that consultants are required to address - in both DFID evaluations and joint evaluations that DFID leads or plays a prominent role in - is more often than not simply too long. A growing number of agencies have decided strictly to limit the number of questions asked, narrowing them down, in the case of the EC, to no more than six.

56. One weakness of DFID’s approach identified in the review is the absence of any consideration of future review or evaluation being built in when key policies and strategies are being developed and (eventually) launched. The result is that all evaluations were ex-post in nature.

57. The review discusses the merits of commissioning consultants to undertake evaluations (current EvD practice) rather than undertaking evaluations using in-house expertise, or a mixture of both. Irish Aid, NORAD, Sida and the EC follow DFID practice, whereas SADEV initially used its own staff but now, like IOB Netherlands, uses the mixed approach (RM, pages xi - x). Though it sees no absolute advantage in any system, the review warms in particular to the Dutch “mixed” system because the presence of skilled, in-house evaluators can have positive knock-on effects across the department in terms of deepening knowledge about evaluation quality.

58. The review’s assessment of cross-cutting issues concluded that attention was generally given to gender equality issues. However, it was not entirely clear whether other issues, such as HIV/AIDS, the environment or socially-excluded groups, ought or ought not to be included in evaluations, and if they were, quite how they should be assessed (BP, para 8.1 - 8.2).
4. Recommendations for improving evaluation quality in DFID and in other evaluations in which DFID is involved

59. Against the backdrop of the weaknesses in evaluation quality in DFID summarised in Section 3, the review makes a number of specific recommendations for improving evaluation quality. The consultants also draw attention to areas where they believe consideration should be given to changing current practices, without drawing these out into explicit recommendations. This section of the Report is divided into two sub-sections. The first presents the full list of specific recommendations made by the consultants for improving both the quality of evaluation reports and studies and the quality of DFID’s assurance systems, to which have been added a number of additional recommendations for improving evaluation quality drawn directly from weaknesses identified and discussed in the consultants’ report. This list is long and includes a mix of both minor and major recommendations. To help focus discussion, the second sub-section draws out and presents what IACDI considers the eleven most important clusters of recommendations which it believes DFID needs most urgently to address in order to narrow the gap between the current quality of evaluations and what needs to be achieved.

4.1 Recommendations made in or drawn from the quality review

Overarching recommendations, planning and conception

60. The prevailing attitude of managers to evaluation is still predominantly one of defensiveness. It is partly for this reason that managers sometimes overstep the mark in trying to unduly influence evaluation reports. This both damages the independence and credibility of the evaluation process and reduces the potential of evaluations to influence DFID’s strategies and policies. This culture needs to change. But it will only change if top management accepts the need for change, if it “champions” independent evaluation, and if it initiates a process of engagement with senior and middle-level managers to develop an attitude and approach to evaluation which encourages and welcomes the production of more challenging evaluations that are perceived as key parts of the tool-kit of methods and approaches aimed at aiding DFID to achieve its strategic objectives more effectively and efficiently. This will also require DFID to ensure evaluation is more effectively “mainstreamed” into the rhythm of its ongoing work.

61. Relatedly, and notwithstanding the advances already made, evaluation needs to be integrated far more closely and centrally with (high-level) strategic planning, country-level planning and with policy and research. Decisions about what to evaluate and when to evaluate need to be more closely linked to top-level strategic planning decisions and integrated more closely with DFID’s research programme. DFID management should commission a study of how research and evaluation programmes might best work together to produce rigorous and evidenced lessons for policymakers.

62. However good the quality of evaluations, DFID needs to be more honest in the expectations it conveys to the public of what evaluations on their own can tell us. In particular, DFID needs to acknowledge more explicitly the limits of one-off discrete evaluations in providing robust evidence of the specific contribution that DFID on its own makes to the achievement of broad development outcomes, given the parallel and complementary contributions being made by other donors and recipient countries.

63. The review recommends that DFID (and other agencies) experiment in order to develop new models and approaches for evaluation which are more suitable for the complex development strategies which are, and will remain, the subject of most DFID evaluations - strategies that, by their very nature, are amorphous, need to be flexible enough to be adapted in response to changing needs and situations, and are undertaken in combination with the actions and inputs of others. EvD, preferably involving developing country and civil society representatives as well as other applicable donors, should spell out as specifically as possible the implications of the Paris Declaration and the Accra Agenda for Action for future evaluation activities.

64. Evaluation needs to be used and understood more clearly as a management tool to support learning and to provide guidance for future directions and for making improvements. This has implications for evaluation processes and outputs: for example, more direct (and more readable) evaluation reports, along with more meaningful management responses providing for intelligent rather than mechanical use of evaluation, including the laying out of reasons for disagreeing with the conclusions and recommendations of evaluators. Such an approach has implications for the reward structure of managers, with managers both being encouraged to identify challenges they are facing and also being rewarded for responding to the challenges raised by evaluations. If this is to happen, it will require more attention to be paid to knowledge management and greater support for creating a more effective learning culture in and across the whole of DFID.

65. Planning for evaluation needs to take place at the beginning, when new strategies, plans, programmes and other forms of initiatives are being developed or modified. Otherwise, meaningful, high quality evaluation is extremely difficult. Indeed, some might say that it is impossible.

66. Evaluations should be focused on a limited number of key evaluation priorities and issues, using the DAC criteria intelligently rather than mechanistically to help in identifying a small number of evaluation questions that are realistic to address, taking into account the context, data availability, and the scope and resources provided. EvD’s prioritisation of evaluation pre-planning should be continued, with greater use of design studies and formative evaluations. EvD should take an intelligent rather than a ‘template’ approach to evaluation.

67. DFID’s Evaluation Department should consider the EC approach of limiting the number of evaluation questions in order to avoid reports that are too long and discursive.

68. DFID management should consider a system whereby all major objectives are covered by evaluation material on a multi-year cycle (on a planned basis over a specified number of years), thus giving greater ownership to the senior manager accountable for each objective.

Project management

69. EvD should maintain its present active project management approach, including its emphasis on quality assurance, while respecting that the main responsibility for methodology and reporting rests with the independent evaluation team.
70. DFID, perhaps with the encouragement of IACDI, should explore ways of minimising some of the perverse effects of the current contracting process, as discussed in the text.

71. While the active involvement of key stakeholders throughout the evaluation process should be maintained, it should be clarified that their main responsibility is to advise on strategic guidance and use of the evaluation, and that final decisions on all aspects of the evaluation, in particular the methodological approach and the contents of all reports, rest with the evaluation team in consultation with EvD.

72. EvD, in conjunction with other partners, should develop a more appropriate management model for future joint evaluations.

73. The quality and independence of evaluation risk being compromised when managers and those with the power to influence evaluation processes “lean” on evaluators in inappropriate ways. One way that DFID could help address this problem and, at the same time, reassure the public of the transparency of DFID’s evaluation systems and provide concrete evidence of its wish to follow ethical good practice, would be to develop a “code of practice” for evaluations which, inter alia, spelt out clearly the role of managers and advisory groups.

74. Relatedly, and as joint evaluations become more widespread, DFID should give priority to working with other donors to develop a similar “code of practice”, especially for large cross-country, multi-donor evaluations. Such a code needs to lay out clearly how the members of evaluation advisory groups are to be chosen and what role these groups should have, to ensure that the quality and independence of the evaluation is not compromised and distorted by the narrow sectional interests of individual donors. It should also give prominence to the role of representatives from aid-recipient countries.

EvD personnel and resources

75. DFID management should ensure that Evaluation Department is able to develop a mix of staff with DFID operational experience and staff (including some with no DFID experience) who have particular expertise in evaluation and/or research methodology.

76. DFID management should devise means to ensure that senior evaluation managers with DFID operational backgrounds stay on average for four years in post, without sacrificing career enhancement.

77. To aid the improvement of the quality of evaluation, DFID should consider expanding the range of skills within EvD to increase the number of senior staff able to undertake evaluations themselves and with research experience. EvD should also ensure that it has sufficient interchange with DFID’s Research Department and country programmes, other donor agencies and the wider research community to ensure that decisions about what DFID chooses to evaluate are likely to complement what others are doing and add value to our knowledge of what works in development.

78. For the present, the model of contracting out evaluations seems appropriate. Once DFID Evaluation Department has an appropriate staff complement with appropriate skills in place, it could consider experimenting with one or two evaluations led by DFID senior evaluators on the Dutch model.
79. IACDI has repeatedly advocated for more resources for evaluation. In this connection, it is recommended that EvD undertake further work on budgetary (income and expenditure) comparisons, ideally through the DAC Evaluation Network, and report to IACDI.

Methodological considerations

80. All new and revised strategies, policies and programmes in DFID should include a preliminary evaluation plan of some form. EvD should provide overall guidance and support and also comment on the adequacy of these plans.

81. EvD should undertake exploratory work, perhaps in conjunction with some other partners, to identify appropriate theory-of-change models for complex development strategies. It should also explore means and techniques, including contribution analysis and perhaps econometric modelling, for estimating the effects of DFID’s contributions in complex situations.

Reporting and communications

82. EvD should strive towards shorter, more focused reports, organised in a way that makes it easy for the reader to identify the main points and implications.

83. DFID management should ensure that all evaluations are discussed by a senior departmental committee, and that the Management Board sees an overview of key lessons from evaluation once a year.

84. EvD should develop a plan to provide for greater communication about evaluation findings, implications, and how they are being used, and also to increase interest in evaluation. This should include, inter alia, greater use of alternative forms of communication and dissemination such as short fact sheets written in user friendly language, interactive presentations of various types, and better use of the internet and new technologies.

85. DFID should ensure that information on action taken on all evaluation reports is readily available on its website. One way this might be done would be for all agencies to have an ‘evaluation’ button on their home page, leading directly to all published studies (with EC-style quality scores), management responses to each study, and a list of studies in process or planned – the DAC Evaluation Network could perhaps set some general standards in this area.

86. DFID Information Department and Evaluation Department should examine what can be done to achieve greater public profile for evaluation findings. This should include discussions with civil society and other potential stakeholders.

87. More widely, DFID Evaluation Department should encourage a collectively financed and organised effort, perhaps under DAC/EVALNET or 3ie auspices, to select and publicise strong evaluation lessons in major areas of common interest on a regular basis, in order both to improve lesson-learning and to reduce the temptation for each donor to evaluate the same topics.
Evaluation utility and use

88. EvD should build upon existing good practice in involving all relevant stakeholders, including from aid-recipient countries, during the evaluation, recognising the importance of evaluation process use, and should consider other ways in which this could be increased.

89. EvD should provide guidelines and quality assurance to improve the appropriateness of evaluation recommendations.

90. A different approach should be taken to management responses. Management should be required to develop and publish an action plan in response to an evaluation, which may very well identify reasons for disagreeing with the conclusions and recommendations of the evaluation report. Management responses should be reviewed, approved, and monitored by an appropriate senior management committee.

91. DFID senior management needs to (re-)establish a mechanism for the systematic consideration of all evaluation reports. It should provide support for the development of a learning-oriented culture that, inter alia, would lead to greater recognition of evaluation as a management tool to aid in planning and programme improvement, and also include more attention to knowledge management and knowledge sharing.

92. The review indicated that DFID/EvD has not been good at pro-actively communicating the results of evaluations to the public, including to development NGOs and CSOs and to Parliament. To address these important weaknesses, it is recommended that DFID review its methods of communicating evaluation findings to its different “publics” and that a far closer relationship be developed between EvD and DFID’s Communications Division, including the press department.

93. The DAC Evaluation Network agreed in June to a study of the systems and resources of the evaluation operations of its members. This should provide a more comprehensive basis of comparison and, hopefully, data whose comparability is better assured than the consultants were able to achieve in the time available for this study. It would be useful if IACDI could examine the results of this work at a forthcoming meeting. To help improve the efficiency of resource allocations, it is recommended that DFID’s Evaluation Department report to IACDI on the results of the DAC Evaluation Network study, as a basis for further consideration of resourcing issues.

Country evaluations

94. The review argued that the purpose of and approach to country programme evaluations should be redesigned (BP, para 6.19), and various specific suggestions are made. As discussed in paragraph 101, below, some changes are already taking place.

• In consultation with DFID’s geographical divisions and country managers, senior management should clarify the core purpose of country programme evaluations vis-à-vis audit, accountability and impact, and strategic planning and lesson-learning.

• DFID management, in collaboration with Evaluation Department, should consider replacing its present one-size-fits-all system for country programme evaluations with some deliberate experiments, including one or two better-resourced CPEs, some simpler reviews and (see bullet 3 below) more host-country-led approaches. Additionally, DFID
