
Incentivizing Textbooks for Self-Study: Experimental Evidence from the Democratic Republic of Congo*

Jean-Benoit Falisse˥, Marieke Huysentruyt†‡, Anders Olofsgård‡⁋

Date of this version: February 28, 2020

Abstract

Using a randomized field experiment in South Kivu, we study the impact of a simple “textbooks for self-study” incentive scheme targeting primary school students on student achievement. Students in the treatment schools scored 0.26σ higher in French but did no better in math. They were more likely to take the high-stakes end-of-6th-grade national exam, and those who passed the test obtained higher scores. The largest positive impact was found in schools with lower teaching efficacy and for lower-ability students. Our results demonstrate that incentives-only programs designed to intensify and diversify students’ use of existing school resources can sharply improve student achievement.

JEL codes: C93, I21, J24, O15

* We are deeply grateful for the support from Cordaid to facilitate this research. We are particularly indebted to Adolphe Malanga, Emmanuel Muzi, Liliane Rwezangabo Milimbo, Kamala Kaghoma and Patrice Kiziba Yafali for excellent project management, supervision of surveys and data gathering, and to Alinda Bosch, Mamadou Sylla, Christian Bahizi and Manuela Jansen for continuous support and facilitation. We also thank Bluesquare (especially Julie Vanhamme and Antoine Legrand) for their assistance with the RBF data and the surveys. We are also grateful to Abhijeet Singh, Pamela Campa, and Dagmara Celik Katreniak, and to seminar participants at the S&O 2019 Research Day at HEC Paris, the ASWEDE 2019 Research Conference at Uppsala University, the 2019 NOVAFRICA Conference in Lisbon, the 2019 CSAE Conference in Oxford, and the Stockholm School of Economics for helpful comments. Excellent research assistance was provided by Edvard Von Sydow. Finally, this research was made possible by generous financial support from the World Bank, through the Results in Education for All Children (REACH) program. The analysis follows the pre-analysis plan published on May 8, 2018, at the AEA RCT Registry. The RCT ID is AEARCTR-0001845.
˥ University of Edinburgh, Edinburgh Futures Institute & Centre of African Studies.
† HEC Paris, Strategy and Business Policy.
‡ Stockholm Institute of Transition Economics (SITE).
⁋ Stockholm School of Economics.
I. Introduction

Over the past decades, numerous programs and billions of dollars in national taxpayer and donor funds have been focused on making textbooks widely available in primary schools, including in the world’s poorest and most fragile settings, such as the Democratic Republic of Congo (DRC) (Fredriksen, Brar, and Trucano 2015; Read 2015; World Bank 2018). Yet the evidence of the impact of textbooks on student achievement is at best equivocal, when not disappointing (Glewwe, Kremer, and Moulin 2009; Sabarwal, Evans, and Marshak 2014; Glewwe and Muralidharan 2016). One potential problem is that these textbook distribution programs rarely foresee incentives for teachers or students to actually make use of the textbooks in schools, let alone to intensify or diversify their use, say by leveraging textbooks as a support for student self-study at home.1 This raises an important policy question, or opportunity: Can simple incentives designed to increase the use of textbooks improve students’ learning?

Inspired by insights from recent field experiments finding strong complementarities between school inputs and incentives (Gilligan et al. 2018; Mbiti et al. 2019), this paper studies the impact of a novel “textbooks for self-study” incentive scheme designed to encourage primary school students to regularly take home textbooks for self-study. It is set in the DRC, a context where the government and the donor community have repeatedly distributed textbooks in primary schools, though without much evidence that they are effectively used.2 The scheme rewards student effort through primarily non-financial

1 Self-study is defined as study outside the classroom and without direct supervision.
2 In the aftermath of Africa’s “First Congo War,” which led to the loss of some five million lives between 1994 and 2003, the education sector, like much else, was in desperate need of reconstruction. The DRC government and the donor community, led by the World Bank, have invested substantial amounts in rebuilding some of that capacity, including a large textbook distribution program that has allocated around 40 million textbooks nationwide since 2008.
incentives and requires minimal input from teachers. Our evaluation was carried out on a sample of 90 primary schools, involving 1,498 5th/6th-grade primary school students, in the province of South Kivu. Thanks to rich longitudinal data that we collected on students, teachers, and school head teachers, we are able to demonstrate the added value of the incentive scheme – notably improved test scores, both on our own tests and on the national exam, a greater likelihood of sitting the national exam, and more ambitious attitudes toward studying and careers.

To conduct the field experiment, we collaborated with the Ministry of Education of South Kivu and Cordaid (Caritas Netherlands), an international non-governmental organization that has been working in the education sector in South Kivu over the past ten years. The “textbooks for self-study” incentive scheme was designed to intensify and diversify the use of textbooks. More specifically, the scheme encouraged 5th/6th-grade students to regularly take home textbooks to study for weekly quizzes through a mix of incentives: (i) a public display of stars attributed to each student who took math and French textbooks home, brought them back in good condition, and participated in an end-of-week quiz [image incentives based on individual student effort]; (ii) in-kind rewards such as pens and pencils for all students in classes where at least 75 percent of the students regularly participated in the routine [in-kind incentives based on overall class effort]; and (iii) a small flat-rate compensation to schools for participating in the intervention and to compensate for a potential increase in the number of lost and damaged books [school-level financial incentive based on overall compliance]. Financial and material incentives were given after the end-line survey to avoid income effects.

We randomly assigned 45 primary schools to the incentive scheme and 45 comparable primary schools to a control group. The incentive scheme was implemented over an 18-month period.
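The class-level rule in (ii) is mechanical enough to sketch in code. The following is a minimal illustration only; the function and variable names are ours, not the paper's, and it assumes one possible star per student per week:

```python
# Sketch of the class-level in-kind reward rule: a class qualifies if the
# ratio of stars actually earned to stars possible is at least 75 percent.
# Names and the example numbers are illustrative, not taken from the study.
def class_qualifies(stars_earned: int, n_students: int, n_weeks: int,
                    threshold: float = 0.75) -> bool:
    possible_stars = n_students * n_weeks  # one star per student per week
    return stars_earned / possible_stars >= threshold

# A hypothetical class of 20 students over an 8-week stretch needs at
# least 120 of the 160 possible stars to qualify.
print(class_qualifies(stars_earned=120, n_students=20, n_weeks=8))  # True
print(class_qualifies(stars_earned=100, n_students=20, n_weeks=8))  # False
```

Because the threshold is defined on the class-wide ratio rather than on individual performance, a few non-compliant students do not automatically disqualify the whole class.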
We emphasize three main sets of results. First, we find that students in the treatment schools scored 0.26 to 0.27σ higher on the French language test relative to students in the control group schools. Performance was better on both the sentence-completion tasks and the word-retrieval tasks. However, the intervention had no significant impact on student test scores in math in aggregate, or on any of the three main math competency domains tested. We offer several explanations for why our incentive scheme had a positive effect on language skills but failed to raise performance in math.

Second, we find that students in the treatment schools were not only 13 percent more likely to sit the high-stakes national exam (TENAFEP; the participation rate rose from 69 to 78 percent), but those who passed also did so with better scores (0.27σ higher in the main specification). This corroborates the first finding that the “textbooks for self-study” routine significantly raised student achievement.

Third, our findings suggest that the intervention improved students’ perceptions of the usefulness of textbooks as well as their educational and career aspirations. These effects suggest plausible mechanisms for how our incentive scheme produced better learning outcomes and led more students to sit the TENAFEP. By affecting students’ perceptions and aspirations, our incentive scheme may well have long-lasting consequences. Furthermore, we provide suggestive evidence that weaker students benefited most from the “textbooks for self-study” routine, as did students in classrooms with weaker teachers.

Relative to prior research on education, incentives, and learning outcomes, this paper shifts focus in four important ways. First, we consider how existing inputs to education production (Hanushek 1979) can be used more efficiently, rather than evaluating the impact of new resources such as new technology-aided instruction programs (Muralidharan, Singh, and Ganimian 2019), new contract teachers (Duflo, Dupas, and Kremer 2011, 2015), or new
grants to purchase classroom materials (Blimpo and Evans 2011). Second, we examine the potential role of textbooks in fostering self-study outside the classroom.3 Third, we focus on incentives that reward student effort rather than learning outcomes, using primarily non-financial incentives.4 To the best of our knowledge, our study provides the first causal evidence of the positive effects of self-study on student achievement in a fragile and conflict-affected environment. Fourth, we test our novel incentives-only scheme in a severely resource-constrained and fragile setting. Rigorous randomized experiments are rarely seen in fragile or conflict-affected states (Burde et al. 2017): of the 118 evaluations of education interventions reviewed by Glewwe and Muralidharan (2016), only two were conducted in one of the 15 countries deemed most fragile according to the Fragile States Index produced by the think tank Fund for Peace.5 One explanation for the scarcity of prior RCTs in fragile or conflict-affected settings is the implementation challenges involved, which can threaten the internal validity of findings through higher-than-usual attrition. We undertake various measures to address this concern. The immediate economic and social repercussions of the “learning crisis” are particularly evident in these fragile settings (World Bank 2018). Yet we know surprisingly little about how student achievement can be effectively raised under multiple and severe

3 One of the main challenges of student self-study is improving self-control and delaying instant gratification (Beland and Murphy 2016). In our setting, the textbook may also have served as a useful reminder or commitment device for students to follow through on their intentions to self-study. Our paper thus ties into the literature on student motivation, self-efficacy, and self-learning.
4 These incentives are primarily non-financial, comprising image and in-kind rewards linked to individual and class effort, respectively. Even the small monetary compensations directed at the schools to motivate overall compliance with the scheme are conditional on overall effort, not student performance. By not linking incentives to student performance, we lower the probability of unintentionally raising teacher incentives to cheat or “teach to the test” (Jacob and Levitt 2003; Behrman et al. 2015; Martinelli et al. 2018).
5 Burde and Linden (2013) evaluated the impact of building schools in rural Afghanistan on school enrollment and student achievement. They found large effects on average student test scores (specifically, an increase of 0.4σ for boys and 0.6σ for girls). However, though the authors do not provide an analysis of the project’s cost-effectiveness, the costs of this intervention are likely also high. Orkin (2013) assessed the impact of an increase in instructional time on student learning outcomes in Ethiopia and found that this increase had very small effects on student achievement. Again, no data on the implementation costs of the intervention were provided.
(resource) constraints, including particularly weak teacher capacity. Our study gives evidence of a “low-hanging fruit”-type intervention: it does not require any new inputs but instead leverages an existing, widespread yet underutilized educational resource to encourage self-study.

Implemented on a small scale, the intervention compares favorably in terms of cost-effectiveness with interventions that increase teaching resources or focus on improved pedagogy (Kremer et al. 2013). Based on per-student expenditures and the most conservatively estimated ITT effect of 0.26σ, we estimate that US$65 is necessary to achieve a 1σ improvement in French test scores. It must be noted that the intervention partly piggy-backed on an existing results-based financing (RBF) program.6 Further gains in cost-effectiveness could possibly be achieved in such a context. It also appears that the provisions for missing or damaged books were too generous, since very few books went missing or were severely damaged.

6 We discuss the implications of running the intervention in schools where an RBF program exists below.

The rest of this paper is organized as follows. Section II describes the context and the intervention. Section III describes our data and study design. Section IV presents our main results. Section V discusses some additional outcomes and the intervention’s cost-effectiveness. Section VI concludes.

II. Context and Intervention

II.A. Context
The Democratic Republic of Congo is one of the poorest and most conflict-ridden countries in the world (IMF 2015; World Bank 2018; Marivoet and De Herdt 2015). It was the battleground of “Africa’s First World War”, which led to the loss of an estimated five million lives between 1994 and 2003. The province of South Kivu remains marked by tension, and outbursts of violence carried out by various armed groups, including the regular army, are not infrequent (Kivu Security Tracker 2018).

Despite recent efforts to improve budget allocation, public financing for primary education, and for education in general, remains low compared with most other countries in the region. Only 10.9 percent of the public budget is allocated to education, and executed spending on education amounts to about 1.8 percent of GDP. Furthermore, public spending on education is uneven and biased toward the rich, particularly in preschool, secondary, and higher education (World Bank 2015). Even though South Kivu accounts for 15 percent of the country’s primary-level students, the province receives only 7 percent of the teachers’ budget (De Herdt and Titeca 2016).

Primary school education lasts six years and is compulsory for 6- to 11-year-old children. Net primary school enrollment increased from 51 percent to 79 percent between 2005 and 2014, and it keeps increasing, especially owing to progress in the enrollment of girls and children living in rural areas. Yet important challenges remain. Dropout and repetition rates are high, with the overall repetition rate hovering above 10 percent (World Bank 2015). Schooling is frequently interrupted, not just by outbursts of violence but also because many families struggle to pay school fees. The quality of education also lags behind.
According to an assessment carried out in 2014 by PASEC, a joint program of African francophone countries to evaluate the education sector, most grade 6 students in the DRC were not sufficiently competent in reading or mathematics (World Bank 2018). Many primary school teachers lack the requisite qualifications or training, and frequently also lack strong motivation. Teachers’ wages are very low, and their disbursement is highly erratic – the result of commonly delayed payments by the government and parents’ irregular payment of prohibitive school fees.

As part of a US$150 million World Bank grant aimed at improving public service delivery, the DRC government launched a nationwide initiative in 2008 to distribute 18 million textbooks in primary schools. This initiative was followed by another US$100 million grant, between 2012 and 2017, under which more than 22 million textbooks were distributed. An internal review by the World Bank (2017) reported that 93 percent of these books actually reached schools, but also highlighted that additional effort is needed to ensure the efficient and continuous use of the delivered textbooks. In 2018, the DRC government, again with the support of the World Bank, launched a new five-year program called ‘Projet d’Amélioration de la Qualité de l’Education’ (Project for Education Quality Improvement, PAQUE), specifically focused on tackling the learning crisis in primary schools in 12 of its 25 provinces. As part of this, the DRC government began to scale up results-based financing to an additional 1,350 primary schools in 2019 and plans to introduce RBF in all primary schools across the focal 12 provinces by 2020. Taken together, recent strategic priorities in the DRC’s education sector render the findings of our study of immediate policy interest.

II.B. The Intervention

The “textbooks for self-study” routine was implemented during five consecutive trimesters, from the second trimester of school year 2016–2017 through the end of school year 2017–2018, in 45 randomly selected primary schools in the districts of Shabunda and Walungu, in South Kivu. Together with the 45 schools in the control group, these schools represent the full population of primary schools where Cordaid had introduced an RBF
program starting in 2008. The RBF program ties financing to school performance, measured using a long list of quantitative and qualitative indicators across primarily three domains. School enrollment carries the biggest weight in the overall computation of points, with extra weight on girls and, in particular, on students classified as vulnerable.7 The second and third most important domains are administrative and financial management and supervision, respectively. There are no direct financial incentives for improvements in student performance, based on, say, test results. Nor were there, before our routine was introduced, any incentives tied to the usage of textbooks.

The benefits of testing the impacts of the “textbooks for self-study” routine at these schools were twofold. First, thanks to the RBF program, there was already a monitoring system in place (with school visits on a quarterly basis), which we could extend to include monitoring of compliance with our own intervention. Second, we were able to integrate our extrinsic monetary incentives with the financial transfers paid to schools as part of the RBF program. This allowed us to economize on implementation costs, a point we discuss further in the subsection on cost-effectiveness.
A potential drawback of running the experiment only in RBF schools is questionable external validity, as these schools may not be representative of all primary schools.8 However, using administrative data for 2015 (see Table A5), we find that even after, on average, six to eight years of RBF implementation, the 90 RBF schools do not systematically outperform non-RBF schools in the same districts. They are better off along some dimensions (they had more teachers and a higher proportion of girls) but not along others (e.g., fewer qualified teachers per pupil and fewer seats per pupil). Importantly, the performance of the RBF and non-RBF schools in terms of both TENAFEP pass rates and TENAFEP mean scores was not significantly different. Together, these findings strengthen the external validity of the present evaluation. Finally, as mentioned above, given that RBF is now being rolled out across all primary schools in the 12 select provinces, testing our intervention at RBF schools is in fact most informative from a policy-making perspective.

The “textbooks for self-study” routine was developed with the input of multiple staff members at Cordaid, based both at headquarters in The Hague and locally in Bukavu; primary school teachers (through a series of focus group discussions); and officials of SECOPE (Service for the Control of the Teacher Payroll), the provincial Head of Education, and the Ministry of Education. Based on these diverse inputs, the routine was designed to leverage an already-existing, widely available resource (textbooks) and to fit with existing practices of homework (children attend half days in school) and in-class evaluations, using a bundle of incentives. In the past, many development innovations or initiatives – take the US$100 million initiative to distribute textbooks across schools in the DRC – have failed to realize their intended impacts in part because they overlooked or underinvested in incentives to take up or utilize the goods provided. In 41 percent of the schools in our sample, students were not allowed to take books home, and only a third of the students reported having a textbook allocated to them that they would regularly use.

7 Vulnerable children comprise: (1) orphans; (2) children with a physical or mental handicap; (3) children born out of sexual violence who are not accepted by their family and are without any other assistance; (4) abandoned children; (5) children born out of wedlock; (6) war-displaced children without any other assistance; (7) children affected by HIV/AIDS or whose parents are affected by HIV/AIDS; (8) children aged 6–18 who have worked in artisanal mining; and (9) children aged 6–13 who are being raised by vulnerable or elderly people.
8 RBF schools were selected based on two criteria: (i) the school needed to have a minimum ‘viable’ size (computed as 26 pupils multiplied by the number of classes in the school); and (ii) the school needed to be accessible by car. Furthermore, Cordaid sought to ensure that the proportion of different types of schools (Catholic, Protestant, or non-religious public schools) in the sample reflected the true proportions in the population at large in each district.
At each treatment school, a common set of initiatives was staged to help ensure clarity on the goals and practical details of the intervention among all relevant stakeholders.
For instance, a team of facilitators, whom we recruited and trained ourselves, held introductory meetings with the head teacher and 5th-grade teachers, with the parent committee, and in class with 5th-grade students, during which children made a poster together about the initiative (which was then hung up in the classroom as a useful reminder). They also conducted refresher meetings at the start of the following school year, this time with 6th-grade teachers and students, to remind everyone of the scheme and renew the posters where necessary. Further, a handful of questions were added to the standard survey that RBF auditors conduct on a quarterly basis, specifically to assess the implementation of the new routine.

Given the challenging environment, we opted for a “multi-level, multi-motive” incentive scheme to strongly encourage students, teachers, and schools to stick with the intervention.9 For students, the reward comprised two elements. The first was an intrinsic, non-material individual reward based on a publicly announced (via posters in the classroom) star system. A student earned a star each week that she/he had (i) taken home and returned in good condition her/his math and French textbooks according to the assignment; and (ii) taken part in the weekly quiz. Similar public star systems have been found effective in other contexts (Ashraf, Bandiera, and Jack 2014). Note that the individual stars are not associated with any material benefits: the incentive works only through mechanisms of social status. The second reward element was an extrinsic, material group reward consisting of handouts of school materials such as notebooks, pens, and pencils. This reward was meant to be given to classes in which the ratio of actual stars to possible

9 We thus combine different incentives to increase manipulation strength. The disadvantage of this approach is that we cannot disentangle the relative importance of the different incentives, as the limited sample size did not allow us to implement multiple treatment arms.
stars (if all class students earned stars every week) was at least 75 percent.10 We thus attempted to tap multiple motives, intrinsic (reputational or social image) as well as extrinsic (in-kind), on the assumption that these motives are complementary.

For schools, a small flat-rate compensation, US$120 per year, was given for taking part in the experiment. This sought to address the schools’ concern about missing and damaged books, providing them with resources to buy new books to replace lost and damaged ones. If money was left after compensating for the missing books, the school could use it to cover general expenses. Schools (head teachers and teachers) were thus encouraged to help their students take care of the books and minimize any loss or damage to them, while keeping the program on track. Note that the material incentives were relatively weak. To mitigate any income effect produced by providing in-kind or financial rewards for student learning, all transfers were made only at the end of the intervention period.

Further, contrary to most incentive systems evaluated in the literature, our intervention focuses on student effort, not teachers’ or students’ performance. This is natural, as the intervention emphasized self-study, but it also has the advantage of not creating (perverse) incentives to cheat, teach only what is being directly measured, and/or lower the bar of required student achievement.

10 In practice, however, the material benefit was in the end delivered to all classes irrespective of achievement. As this took place ex post and should have been a surprise to non-performing classes, it should not have affected the perceived incentives.

III. Data and Study Design

III.A. Data
Our sample is composed of 90 primary schools in two districts of South Kivu. We stratified the sample by district and then randomly assigned schools in each stratum to the treatment or control group. We gathered detailed information about these schools, and especially their 5th- and 6th-grade students, using our own baseline and end-line surveys combined with three primary sources of quantitative data. By combining different data sources, we can mitigate concerns about common method bias, triangulate our data, and validate our main results.

(1) Own Surveys and Tests

We conducted multiple surveys at each of the 90 schools, both at baseline, before the intervention, and at end-line, after 18 months of the intervention. In each school, we surveyed the head teacher, the teachers in the 5th (baseline) and 6th (end-line) grade, and the students in the 5th (baseline) and 6th (end-line) grade. We asked the head teachers about their socioeconomic status, professional experiences, human capital, school climate, leadership style, and the nature of after-school homework support. For teachers, we specifically enquired about their teaching experience and teaching efficacy. These survey questions were mostly drawn from established survey instruments, in particular from the Harvard Graduate School of Education and PASEC.11 All questionnaires were in French and the two most commonly used local languages, Swahili and Mashi.

At baseline, all 5th-grade students who were present completed a survey in class. This survey included questions about the student’s household situation, school-related attitudes and habits, non-school-related work, and aspirations for the future. Our enumerators also instructed students to fill in an in-class test, which covered both

11 The Harvard Graduate School of Education survey instruments are documented online. PASEC is an international organization with the mission to evaluate education in the French-speaking parts of the developing world.
mathematics and French, mostly based on questions from PASEC (2015). The test (provided in the Supplementary Appendix) covered basic grammar and reading comprehension in the French part, and basic numeracy (sequencing, basic arithmetic, fractions) in the math part.12 Questions were multiple choice, and there were eight questions in each subject overall, with a few sub-questions for each. Students were given the time they needed to complete the test. The results provide us with a useful proxy measure of knowledge at baseline and help us control for potential non-observable heterogeneity among the students.

We repeated all the surveys at end-line. Given the overall poor student performance on the test at baseline, we added a few simpler questions to the end-line test. We also administered the same test to the teachers. At treatment schools, some extra survey questions explicitly focused on the intervention and its implementation were added.

(2) National End-of-6th-Grade Exam Data Published by the Provincial Inspection of Education

We hand-collected the available performance outcomes on the national exam, the ‘Test National de fin d’Etudes Primaires’ (National End of Primary Studies Test, TENAFEP). This high-stakes test is taken at the end of 6th grade, certifies completion of primary education, and is a requirement for students wishing to pursue secondary education. The TENAFEP comprises three subjects, all administered in French: French language, general knowledge, and mathematics. For each school, the Provincial Inspection of Education publishes how many students took the test and how many of those failed (disaggregated by gender), and then the names and overall scores only for those students who passed the exam.

12 The French score was computed as the overall score without weighting between the different sections: the ‘completing a sentence’ section accounted for 12 points and the ‘retrieving a word’ section for 4 points. Similarly, the math score was the sum of at most 3 points for decimals, 7 points for ordering, and 2 points for the word-based problem.
We matched these student names with the names from our student database, using a level of tolerance for incorrect or alternative spellings of names, and double-checked this matching manually.13

(3) Provincial Division of Education

We hand-collected several basic statistics about each school from the archival records maintained by the Provincial Division of Education. This allowed us to gather information about school size and students’ demographic composition.

(4) RBF Audit Data

We extracted several additional basic statistics about each school from the audit reports produced every quarter in the context of the RBF program present at all schools in our sample. These audit reports include information about basic school infrastructure, the number of pupils who took the TENAFEP, the quality and continuity of the teaching programs, and overall teacher and team performance. They also contain information about the proportion of RBF funding in the school’s overall revenues.

III.B. Randomization Check and Attrition

While stratifying allowed us to ensure balance between the treatment and control groups in terms of location (school district), we still confirmed, using baseline survey data, that balanced randomization was achieved on other key observable characteristics. Also, given our setting, the problem of attrition was especially serious. To systematically analyze both balance and attrition within the same framework, we regressed each observable variable at baseline (Y) for each student (i) in each school (s) on

13 Using the matchit package in Stata, and systematically checking all names with a score between 0.6 and 0.99 (which allows for typos but also for first and last names being inverted).
  16. 16. 16 the intervention dummy variable (T), the attrition dummy (A), and the interaction between the two: 𝑌𝑖,𝑠 = 𝛼 + 𝛽𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡 𝑠 + 𝜋𝐴𝑡𝑡𝑟𝑖𝑡𝑖𝑜𝑛𝑖,𝑠 + 𝛾𝑇𝑠 𝐴𝑖,𝑠 + 𝜖𝑖,𝑠 (1) The constant α gives the mean value for students identified at both baseline and end-line in control schools and β measures the difference for the similarly identified students in treated schools. The parameter π captures the difference in the average value of the variable of interest within the group of students who could not be re-identified. Finally, γ, capturing the interaction effect between treatment and attrition, indicates whether the drivers of attrition are different between the control and treatment groups. We look at both individual and school-level characteristics. Looking first at balance between treatment and control schools, as shown in Table 1 in column 2, four variables – namely students’ age, gender, hours worked after school, and frequency of eating breakfast – are significantly different in the treatment versus control group, though the imbalance is slight.14 Further, it is a priori unclear what their joint influence on learning would be: Some variables suggest more vulnerable students (girls, less well-fed students) are in the treatment group while others, such as the number of hours spent doing work (child labor) suggest the opposite. Overall, our randomization checks fairly reassuringly indicate that randomization was effective at generating samples that were balanced on schools’ and students’ observable characteristics (measured at baseline), and we also present analysis with controls for those four variables that were slightly imbalanced in our regression tables. 14 These results are confirmed when regressing all observable variables jointly against the intervention dummy. When excluding the four variables that were found to be significantly different, an F-test of joint significance fails to reject the null of no difference between the treatment and control samples. 
Including the four variables, the F-test rejects the null hypothesis.
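Because Equation (1) is a saturated dummy regression, OLS simply reproduces the four cell means (control/treated by re-identified/attrited). A minimal sketch of the estimation on synthetic data (without the school-level clustering of standard errors used in the paper) might look like:

```python
import numpy as np

# Sketch of the balance/attrition regression in Equation (1):
# Y = a + b*T + p*A + g*(T*A) + e, estimated by OLS on synthetic data.
rng = np.random.default_rng(0)
n = 400
T = rng.integers(0, 2, n)          # treatment-school indicator
A = rng.integers(0, 2, n)          # attrition: not re-identified at end-line
Y = 10 + 0.5 * T + 1.0 * A + 0.2 * T * A + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), T, A, T * A])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
alpha, beta, pi, gamma = coef
# With a saturated dummy design, OLS recovers the cell means exactly:
# alpha         = mean(Y | T=0, A=0)
# alpha + beta  = mean(Y | T=1, A=0)
# alpha + pi    = mean(Y | T=0, A=1)
```

The coefficients therefore have the interpretation given in the text: β compares re-identified students across arms, π compares attrited to re-identified students, and γ flags differential attrition.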
From the baseline survey, the attrition rate is 46.85 percent: that is, out of 2,818 students surveyed at baseline, 1,498 students were matched at end-line. This high level of attrition reflects in large part the notoriously high dropout and repetition rates in rural DRC (World Bank 2015): using data from the year before the intervention in the two districts of interest, we found an average combined drop-out and repeat rate of 28.53 percent (standard deviation of 40.29) in the schools under study (by which we mean that grade 6 classes are 28.35% smaller than grade 5 classes). What is more, day-to-day student absenteeism is very high in our context, meaning that there was a non-negligible risk that some students would not be present at school when we returned. Absenteeism stems from issues typical of low-income countries (physical access to school, child labor, etc.) but also, in the context of the DRC, from the fact that many parents face difficulty paying the tuition fees and their children are expelled from class as a result. While it was very likely that we would lose a substantial part of our sample due to these contextual factors, the deterioration of the security situation over the course of the study further worsened attrition. The repeated postponement of the national elections led to severe economic difficulties and violence in the field, which affected our sample attrition in two ways. Firstly, it aggravated the typical drop-out issues, with more families struggling to make ends meet. Secondly, the end-line survey could only be conducted months later than expected, near the very end of the school year. Cordaid assesses the security situation in the districts daily, and enumerators are only allowed in the field when the situation is judged as reasonably stable.
At the time the end-line surveys were planned to take place, the security situation deteriorated for an extended period.15 15 The insecurity and remote locations, in particular in the district of Shabunda, made it prohibitively costly to try to track students outside of schools to conduct the end-line surveys and tests.
In theory, attrition is problematic for two reasons. Firstly, it may affect the overall representativeness of the sample. In our case, Column 3 of Table 1 shows that older students are, overall, more likely to be missing at end-line. We cannot know for sure, but being older is a sign of past struggle at school, and these students tend to be more likely to drop out anyway, suggesting that our sample is still representative of students who pass from grade 5 to grade 6. Another, non-exclusive, possibility is that older students are less likely to be present in class at end-line, in which case our sample is slightly biased towards younger students. Secondly, attrition is of particular concern if its correlates differ between the treatment and control groups. In this respect, Column 4 of Table 1 is reassuring, as no statistical difference between the two groups can be found. Furthermore, the difference in attrition rate between treatment and control schools is not statistically significant.16 Nevertheless, given the high level of attrition, we also present results with inverse probability weights to correct for attrition based on observables (see, e.g., Wooldridge 2007; Muralidharan, Singh, and Ganimian 2019).17

III.C. Compliance

Thanks to the quarterly field visits by the RBF verification agents of the Agence d’Achat de Performance (AAP), we were able to generate fine-grained measures of schools’ compliance with the new routine.18 Compliance was high overall, though weaker in seven

16 Using a logit model with school clusters, the difference in log-likelihood is -0.165 (z = -1.15).
17 The specification we present uses all the variables summarized in Table 1 to predict the weight’s denominator.
The covariates of attrition (age), as well as the variables that differ between treatment and control groups (age, gender, hours worked after school, and frequency of eating breakfast) and the French and math baseline scores, which have a high variance, are taken out to calculate the numerator. We tested alternative specifications of the IPW (e.g. leaving French and math in for the numerator and not including any of the school covariates) and noticed no substantial changes in the results.
18 Specifically, the following six dimensions were carefully evaluated: whether (i) textbooks were returned in good state; (ii) a “take home” log with student names and dates was maintained; (iii) the star system was correctly used; (iv) posters were hanging in the classroom; (v) textbooks were taken home on a weekly basis; and (vi) all project documentation was neatly organized. In addition, RBF verification agents also noted the number of weekly classroom tests and verified whether at least 75 percent of all students had received a
schools in the first six months of the intervention (see Table A2). When we repeated our main analysis considering the compliant schools only (Imbens and Angrist 1994), the effects were somewhat larger in magnitude but qualitatively the same. This finding supports the internal validity of our results. In addition, we also sought to assess compliance post hoc using questions that we added to the end-line student and teacher surveys. More specifically, we asked about students’ use of textbooks at home. We find that 81 percent of surveyed students in treatment schools reported having taken home a textbook in the past month, versus 39 percent of surveyed students in control schools. These data allow us to triangulate the compliance/verification data collected by AAP. Further, they lend support to the notion that the new routine effectively led students in treatment schools to make greater use of textbooks at home.

IV. Primary Results

In this section, we first explore how the “textbook for self-study” routine affected the students’ results on the French and math tests that we designed. We then turn to students’ national exam results, and the sit and pass rates at the national exam (TENAFEP). In the Appendix (Figure A1), we present the frequency distribution of student test scores, separately for the treatment and control schools and at baseline and end-line. Student achievement in both control and treatment schools improved substantially between 5th and 6th grade, in particular at the lower end of the distribution. The 6th grade is a key year, as at its end students face the national exam required to continue to secondary school, so the improvement is expected. The distributions of baseline results look very similar in the control and treatment schools, though, and there is no statistically significant difference in unconditional average baseline test scores across the two groups. For end-line test results, the distributions look more different for French, and the unconditional average score in the treatment group is significantly higher than in the control group in this subject. There is a strong positive correlation between baseline and end-line test scores at the individual level, which suggests that the baseline tests were meaningful despite lower-than-expected results.

IV.A. French and Math Test Results

Following our pre-analysis plan, we estimate the effects of the “textbook for self-study” routine using the following linear model19:

Y_{i,j,d,s,t+1} = α + β·Treatment_j + π·Y_{i,j,d,s,t} + γ·φ_{i,d,t} + ε_{i,j,s,t+1},   (2)

where Y_{i,j,d,s,t} is student i’s standardized test score20 in school j in district d in subject s at time t; Treatment is the indicator variable for being in a treatment school; and φ is a vector of district and enumerator fixed effects.21 Each enumerator was assigned both treatment and control schools and ran over 100 student tests on average, allowing us to remove enumerator fixed effects from all empirical specifications. This helps to address the concern that enumerators may have induced differential measurement error, for instance by helping

star on a regular basis. Taken together, these verification data allowed us to gain a detailed understanding of overall compliance and to detect early (and attempt to remediate) schools that were experiencing difficulties.
19 The analysis follows the pre-analysis plan published on May 8, 2018, at the AEA RCT Registry. The RCT ID is AEARCTR-0001845.
20 Using the mean and standard deviation of the control group (all pupils, not only those matched). Given the rather big difference between baseline and end-line scores, we use the baseline control group for baseline data and the end-line control group for end-line data.
Using the baseline control group for end-line leads to inflated effect sizes. 21 In principle, randomization should take care of the well-known challenge that knowledge is a cumulative process that depends on the full history of a subject’s exposure to educational input, not just a time-limited recent intervention. Including baseline test results as sufficient statistics for prior input into learning should thus not be as essential in this case (Todd and Wolpin 2003). Having baseline results, though, helps us reduce noise from unobserved heterogeneity within our sample.
students comprehend questions on the test or influencing the way the intervention was grounded from the start (Crossley et al. 2017; West and Blom 2017). For every specification, we confirmed that the coefficients on the enumerator dummies were indeed significantly different from zero. Error terms are clustered at the school level to take intra-cluster correlations into consideration. In an alternative specification, we additionally include the four variables that were slightly unbalanced in the baseline data. As shown in Table 2, we find that students who were in the treatment schools scored 0.260σ to 0.274σ higher in French compared with students in the control group schools (Columns 1–3), but scored only marginally, 0.094σ to 0.106σ, and not significantly, higher in math (Columns 4–6). For both French and math, we report first the base model regression results, then the estimates when controlling for the four variables found to differ between treatment and control schools at baseline, and finally the estimates when additionally including the inverse probability weights to adjust for potential attrition bias. Our major findings hold across all three empirical specifications. Estimated coefficients from the different models are also illustrated in Figure 1. In addition to presenting impacts on standardized total scores, we also present impacts on different domains of subject-level competencies, in Figure 2, using the model with inverse probability weights. The effects are positive and significant across the two domains covered by our test of French language competencies (sentence completion and retrieving explicitly provided information). The effects across the three math-specific domains (decimals, sequencing, and word-based math problems) are non-significant. Drawing on prior research, we highlight several plausible explanations for why the textbook routine produced a positive impact on students’ achievement in French but not in
math. First, this discrepancy may well reflect known differences in the knowledge accumulation process between these two subjects (de Jong 2015). In math, the knowledge elements build upon each other, whereas in French the knowledge elements are more parallel. Therefore, when the overwhelming majority of students do not pass a proficiency threshold in math (World Bank 2018) or lack understanding of very basic math concepts, as is the case in our setting, the marginal benefit of using a grade-appropriate math textbook may be especially small. Second, for students to make measurable progress in math, the textbook routine may have been insufficient. Indeed, prior work has argued that more help and supervision in school and/or at home is typically needed for students to progress in math (Lee and Barro 2001; Marcotte 2007). Third, this discrepancy may well reflect differences in the textbooks’ readability. Language textbooks (French in the case of DRC) tend to be written at a more rudimentary level than subject textbooks (Chimombo 1989; Milligan, Clegg, and Tikly 2016), and thus also at a more appropriate level for students who lag behind. Since the math textbooks were written in the French language, there was also a double treatment of French language exposure. Note that, since our intervention obliged all students to take home both a math and a French textbook at least once per week, we can rule out the possibility that differential exposure to these textbooks drives the discrepancy.

IV.B. TENAFEP

Turning to the high-stakes national exam, we estimate an equation of similar form to Equation (2) to evaluate the effects of the intervention on students’ exam results. Table 3 shows the effects of the intervention on these results for students who passed the TENAFEP (the data do not reveal test scores for students who failed the test). We find that the intervention increased the average exam score by around 0.27σ.
This corroborates the earlier finding that the textbook for self-study intervention effectively improved student
learning. In Columns 1 and 2, we only look at the students who were re-identified and for whom we have baseline test scores in French and math. In Columns 3 and 4, we look at all the students who passed the TENAFEP in the schools for which we have data. The results of Columns 3 and 4 do not include individual controls beyond gender (the only piece of information available from the TENAFEP registers) and likely include students who were not exposed to the full course of the intervention because they only joined in 6th grade. Furthermore, in three schools there were two 6th grade classes, and there is a risk that the students taking the test were not in the class that benefited from the intervention. On the other hand, attrition is not a concern in Columns 3 and 4. For all these reasons, it is reassuring to see that the results of Columns 3 and 4 are generally in line with the results of Columns 1 and 2 (albeit with somewhat larger standard errors). To investigate the effects of the intervention on the likelihood that students take the TENAFEP at all, we use school-level data. This is for two reasons. Firstly, there is a high risk that by using student surveys we would miss many of the students who did not take the test; indeed, those tend to drop out or stop showing up in class around the time of the TENAFEP, which was before our end-line survey. Secondly, the TENAFEP data that we have do not indicate whether students who are not on the lists did not take the test or failed it, and our own survey question proved unreliable.22 To compute the school-level sit and pass rate indicators, we used data on enrollment in semester two, a few months before the TENAFEP. Results, presented in Table 4, show that the intervention significantly raised the share of students taking the exam (Columns 1 and 2) but had no significant impact on the pass rate.
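The school-level indicators just described reduce to two simple ratios; a minimal sketch (school names and figures are hypothetical; enrollment is measured in semester two, before the TENAFEP):

```python
# Sketch: school-level sit and pass rates for the TENAFEP,
# computed from semester-two enrollment (hypothetical figures).
schools = [
    # (school_id, enrolled_sem2, took_tenafep, passed_tenafep)
    ("A", 40, 30, 24),
    ("B", 25, 10, 8),
]

def sit_rate(enrolled, took):
    # Share of enrolled students who sat the exam.
    return took / enrolled

def pass_rate(took, passed):
    # Share of test takers who passed.
    return passed / took

rates = {sid: (sit_rate(n, t), pass_rate(t, p)) for sid, n, t, p in schools}
```

Note that an intervention can raise the sit rate without lowering the pass rate only if additional, likely weaker, test takers also pass, which is the pattern reported in Table 4.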
Put differently, even though the intervention led more students to take the exam, the average TENAFEP pass rate did not drop; rather, it remained the same. This makes the effect of the intervention on the TENAFEP even more encouraging, as the increase in the share of students sitting the exam likely draws students from the lower end of the achievement distribution. Despite this, more students passed the exam, and among those that did, the average score was higher than in the control schools.

V. Additional Results

V.A. Habits, Attitudes, and Aspirations

In this section we provide some additional evidence to shed light on the impact of the “textbook for self-study” routine on students’ study habits and their attitudes with regard to further study and career aspirations, drawing on our own survey data. In the student questionnaires, we first enquired about students’ textbook usage and homework routines, and thereafter about their aspirations for the future and attitudes towards school. We estimate an equation similar to Equation (2) with the set of four control variables used above and inverse probability weights to adjust for attrition bias, and we control for baseline values. We also provide standard errors adjusted for multiple hypothesis testing. Table 5 reveals that students in the treatment schools say they spend slightly less time on homework, thereby ruling out the possibility that the impact of the new routine simply came from a higher homework load. On the other hand, students in treatment schools found the textbooks significantly more useful for learning but were not more likely to use them in the classroom. With respect to future aspirations, students in treatment schools were almost 14 percent more likely to aspire to a non-manual job, and they were more motivated to go to school. They were also more likely to say that they wanted to continue to secondary school, though this effect was not statistically significant. Our hypothesis is that the

22 We only have TENAFEP data on the students who passed it, and our survey question on whether the student took the TENAFEP led to confusion: we used the French verb ‘passer’, which should be understood as taking the test but is also often understood as passing it.
intervention succeeded in not only raising student achievement but also encouraging more students to take the TENAFEP in part because it generated a more positive attitude toward textbooks and more ambitious study and job aspirations (the TENAFEP being a prerequisite for most non-manual jobs and for secondary school). We also checked whether the intervention was associated with teachers using the textbooks more, or more efficiently, but nothing suggested this was the case.

V.B. Effect Size and Cost-Effectiveness

The ITT effect of 0.26 to 0.27σ that we find for French language falls in the middle of the distribution of estimated effect sizes presented in the overview by Kremer, Brannen, and Glennerster (2013). Glewwe, Kremer, and Moulin (2009) and Sabarwal, Evans, and Marshak (2014) evaluate textbook allocation projects in Kenya and Sierra Leone, respectively, but neither finds a significant impact of textbooks on average student learning. For stronger students, though, Glewwe, Kremer, and Moulin (2009) estimate an impact of 0.22σ for the fifth quintile and 0.14σ for the fourth quintile after one year of exposure.23 It should be noted that our intervention measures what can be thought of as the “intensive margin” of school inputs (making more use of existing textbooks), not the extensive margin (the impact of providing more books). We have not been able to identify directly comparable impact evaluations in the literature.

23 Other studies have analyzed alternative pedagogical tools, such as flip charts (Glewwe et al. 2004) and multilevel learning materials (Tan, Lane, and Lassibille 1999). While the former finds no significant impact, the latter finds a very high impact on English in a Filipino context of a combination of multilevel learning materials and enforced parent–teacher partnerships (between 0.75σ and 1.05σ). Recently, technology-driven interventions geared toward teaching at the right level have shown potential in poor but stable environments.
Banerjee et al. (2007) found that a computer-aided learning program that provided two hours per week of math instruction improved test scores by 0.48σ after two years. Muralidharan, Singh, and Ganimian (2019) estimate ITT effects of 0.37σ in math and 0.23σ in Hindi over a 4.5-month period from a personalized technology-aided after-school instruction program in urban India. Implementing something similar in a fragile rural setting such as Eastern DRC would be very challenging.
Cost-effectiveness, and even low absolute costs, are particularly relevant in very poor and fragile settings such as DRC. Based on operating expenditures, we estimate the cost per student at US$17 in our treatment subsample. This includes the direct costs of the incentives to schools (roughly US$9 per student), i.e. school material shared with students over the school year and a flat compensation of US$120 per school. It also includes the costs of setting up the intervention and monitoring compliance, primarily costs of human resources and project management but also some small expenditures on survey material. These costs were in addition to the costs of the RBF program itself. Based on our most conservative estimate of impact (0.26σ), this suggests that US$100 yields a 1.53σ improvement in test scores, or alternatively that US$65 would achieve an improvement of 1σ. This compares very favorably with the 30 randomized controlled trials (RCTs) evaluated for cost efficiency in Kremer, Brannen, and Glennerster (2013). A significant part of the costs of this intervention is associated with the need to monitor schools for compliance and with distributing financial compensation. The planned scaling up of RBF to 12 provinces covering 1,350 primary schools may be helpful in this respect, as the textbook routine could be embedded in that broader system of monitoring and incentives, and it should be possible to reduce costs per school. This being a bundled intervention, it is difficult to separate the roles of the financial incentives versus the non-pecuniary incentives through the star system, but the direct costs of the financial compensation to the schools were slightly more than half of total costs. If scaled up, reducing financial compensation and stressing non-pecuniary incentives more, together with reduced monitoring costs through the RBF system, could potentially almost double cost efficiency.
This assumes, though, that the impact carries over when scaled up, which of course cannot be guaranteed (see e.g. Bold et al. 2018).
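The cost-effectiveness arithmetic above reduces to a simple per-dollar conversion; a minimal sketch using the figures from the text:

```python
# Sketch: cost-effectiveness arithmetic from the text.
cost_per_student = 17.0    # US$ operating cost per student
effect_sigma = 0.26        # most conservative ITT estimate (French)

# Improvement bought per US$100 spent, and US$ needed per 1 sigma.
sigma_per_100_usd = effect_sigma / cost_per_student * 100
usd_per_sigma = cost_per_student / effect_sigma
```

This reproduces the figures quoted in the text: roughly 1.53σ per US$100, or about US$65 per 1σ improvement.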
The compliance data also suggest that initial concerns about books disappearing were not well founded. The average (median) number of textbooks that disappeared or were damaged over the course of the intervention was, respectively, 6.71 (3) in the first semester of the intervention and 5.3 (3.5) in the second semester.

V.C. Heterogeneous Treatment Effects

The main results in Section IV.A address the impact of the intervention on average test-score results. In this section we turn to analyzing the impact across different groups of students and teachers. This analysis should be interpreted with some care, as we are restricted in sample size and therefore statistical power. Differences in impact would have to be quite substantial to be deemed statistically significant, so we cannot rule out differences of smaller size. All tables are presented in Appendix A. In Table A3 we investigate whether impact varied with the status of the students. It is possible, for instance, that academically stronger students, students from households of higher socioeconomic status, and those with fewer work responsibilities at home had greater opportunities to benefit from the intervention. On the other hand, it is also possible that weaker students stand to gain the most from being trusted with a book and given an opportunity to catch up in learning with stronger students. We therefore study variation by gender, vulnerability status, age (older students have typically repeated grades), and baseline test scores (a good summary statistic of prior inputs into education), using a linear interaction in the model with inverse probability weights. The interaction effects are statistically insignificant across the models, with the exception of the best students, who are less likely to see their test scores in both French and math increase as a result of the intervention.
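The linear-interaction specification used for the heterogeneity analysis can be sketched as follows (synthetic data; the paper additionally applies inverse probability weights and clusters standard errors at the school level):

```python
import numpy as np

# Sketch: heterogeneous treatment effects via a linear interaction,
# Y = a + b*T + d*X + g*(T*X) + e, where X is e.g. the baseline score.
rng = np.random.default_rng(1)
n = 500
T = rng.integers(0, 2, n)              # treatment indicator
X = rng.normal(0, 1, n)                # baseline test score (standardized)
# Simulate a smaller treatment effect for stronger students (g < 0):
Y = 0.1 + 0.30 * T + 0.6 * X - 0.15 * T * X + rng.normal(0, 0.1, n)

D = np.column_stack([np.ones(n), T, X, T * X])
coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
# coef[1] is the treatment effect at X = 0;
# coef[3] tells how the effect varies with the moderator X.
treatment_effect_at = lambda x: coef[1] + coef[3] * x
```

A negative interaction coefficient, as simulated here, corresponds to the pattern reported in Table A3: the strongest students gain the least.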
Viewed from an equity standpoint, these results suggest that the intervention did not unintentionally reinforce any gender or age discrimination or biases against the academically weakest students in class; if anything, it benefited most those who are academically relatively weaker. Next, we investigate whether the impact varied with teacher characteristics, notably teachers’ teaching efficacy, years of teaching experience, and teacher competence in math and French as measured by their scores on the student math and French tests. Teacher competence has been shown to be low across Sub-Saharan Africa and to negatively impact student learning (Bold et al. 2017). Results in Table A4 tentatively suggest that the new routine, if anything, had a larger impact on French and math test scores in schools with comparatively less efficient teachers. This should be interpreted with some caution, but the results are consistent with the notion that textbooks can somewhat compensate for weak in-class learning in the French language. In sum, these results tentatively suggest that students generally benefited in French language skills from the intervention, but the academically weaker students somewhat more so. The results also tentatively suggest that textbooks at home and self-learning may be particularly beneficial for students in schools with weaker teachers.

VI. Conclusion

In this paper, we have investigated whether a simple incentives-only scheme designed to encourage students to regularly take home textbooks for self-study can increase student learning, as well as their study and career aspirations. Using a randomized field experiment implemented in South Kivu in DRC, we find evidence that a “textbooks for self-study” scheme led to significant test-score gains in French, though not in math, as well as in the end-of-primary-school national exam. It also led more students to take, and pass, the exam, which is a prerequisite for continuing to secondary education. Furthermore, the scheme led
students to perceive textbooks as more useful, and to express more ambitious study and career aspirations. Lastly, while the average impact of the scheme was strong and robust to various specifications, such as the inclusion of inverse probability weights to adjust for a high level of attrition, the scheme also seems to have had a somewhat greater impact on the learning outcomes of the relatively weaker students and of those in classrooms with relatively lower-performing teachers. Though tentative given the restricted size of the available sample, these results lend support to the notion that such a “textbooks for self-study” incentive scheme can compensate for weaker teacher quality and can especially help students who are comparatively lagging behind. Our paper expands the set of strategies available to donors and developing-country governments to increase student learning outcomes and effectively address the serious “learning crisis” (World Bank 2018). Our findings suggest that student-incentive-only programs can produce a measurable impact on students’ achievement and future aspirations. They are consistent with a set of recent papers that have emphasized the complementarity between incentives to stimulate effort and accountability, and school inputs such as textbooks (e.g. Gilligan et al. 2018; Mbiti et al. 2019), arguing that the combination of the two is of great importance in generating significant learning gains. The fact that the intervention relies mostly on non-monetary incentives is attractive, as this renders the scheme more feasible in severely resource-constrained and fragile environments. Furthermore, in focusing on student incentives, it intentionally circumvents many of the deep structural problems, notably the weak capacity of primary school teachers, that characterize the education sector in DRC as in many other fragile states.
Even at the fairly small scale at which the “textbook for self-study” scheme was implemented, the intervention seems to be cost efficient. Based on the most conservatively estimated effect of 0.26σ, we estimate that US$100 yields a 1.53σ improvement in test scores. If implemented at a larger scale, some fixed costs could be reduced, and the low number of lost books suggests that the compensation to schools could be lowered. Raising student learning in rural DRC and other similarly fragile settings to levels reflecting the ambitions and goals set in the official curriculum and global targets necessitates substantial improvements in most aspects relevant for knowledge production. This includes household-level inputs at an early age, teacher quality, school management, and accountable and coordinated education policies (Brandt 2017). Such an effort requires substantial resources, both financial and human, and depends strongly on the ability to generate stability and inclusive growth. Identifying cost-efficient interventions that rely largely on clearer incentives and more efficient use of existing resources can, though, be a first step towards making a difference in the lives of students. For scaling up, more experimentation can help the scheme reach its full potential and identify the most cost-efficient implementation. Training teachers in the use of textbooks, both in the classroom and for self-study, could potentially increase the impact of the routine. This could include using textbooks from different grades as a diagnostic tool to teach students at their individual level, thus avoiding some of the problems associated with overambitious curricula (e.g. Beatty and Pritchett 2012). Finally, limited resources prevented us from properly analyzing the relative roles of financial versus non-financial incentives for students and schools. For cost effectiveness, understanding what primarily drives the change in behavior, and how to secure compliance through regular monitoring or other means, becomes essential when scaling up this intervention. Our study raises intriguing questions for future research.
What are other approaches (peer-to-peer learning, for example) that sidestep the challenge of poor teacher quality and
weak parental support, and how can they be leveraged to improve learning in primary schools? How might the impact of a textbook intervention differ if one considered a younger cohort? How might a change in incentives impact outcomes? Can our main findings be replicated in other settings? We defer these questions to further study.
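As a back-of-the-envelope check on the cost-effectiveness figure quoted in the conclusion (0.26σ per student, 1.53σ per US$100 spent), the implied per-student cost can be recovered directly from those two numbers. The sketch below is purely illustrative; the per-student cost is derived from the reported figures rather than taken from the project's budget data.

```python
# Back-of-the-envelope cost-effectiveness arithmetic (illustrative).
# The paper reports an ITT effect of 0.26 sigma in French and a
# cost-effectiveness of 1.53 sigma per US$100 spent; the per-student
# cost below is IMPLIED by those two figures, not a budget line item.
effect_sigma = 0.26        # ITT effect on French test scores
sigma_per_100usd = 1.53    # reported sigma gain per US$100

# sigma_per_100usd = 100 * effect_sigma / cost_per_student
implied_cost_per_student = 100 * effect_sigma / sigma_per_100usd
print(round(implied_cost_per_student, 2))  # ~17 USD per student
```

Dividing the effect size by the implied per-student cost and rescaling to US$100 recovers the reported 1.53σ figure, which is how such cost-effectiveness comparisons across interventions are typically constructed.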
References

Ashraf, Nava, Oriana Bandiera, and Kelsey B. Jack. 2014. "No Margin, No Mission? A Field Experiment on Incentives for Public Service Delivery." Journal of Public Economics 120: 1–17.

Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2007. "Remedying Education: Evidence from Two Randomized Experiments in India." Quarterly Journal of Economics 122 (3): 1235–64.

Beatty, Amanda, and Lant Pritchett. 2012. "The Negative Consequences of Overambitious Curricula in Developing Countries." Harvard Kennedy School of Government, Scholarly Articles 9403174.

Behrman, Jere R., Susan W. Parker, Petra E. Todd, and Kenneth I. Wolpin. 2015. "Aligning Learning Incentives of Students and Teachers: Results from a Social Experiment in Mexican High Schools." Journal of Political Economy 123 (2): 325–64.

Beland, Louis-Philippe, and Richard Murphy. 2016. "Ill Communication: Technology, Distraction & Student Performance." Labour Economics 41: 61–76.

Blimpo, M. P., and D. K. Evans. 2011. "School-Based Management and Educational Outcomes: Lessons from a Randomized Field Experiment." Unpublished.

Bold, Tessa, Deon Filmer, Gayle Martin, Ezequiel Molina, Brian Stacy, Christophe Rockmore, and Jakob Svensson. 2017. "Enrollment without Learning: Teacher Effort, Knowledge, and Skill in Primary Schools in Africa." Journal of Economic Perspectives 31: 185–204.
Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng'ang'a, and Justin Sandefur. 2018. "Experimental Evidence on Scaling Up Education Reforms in Kenya." Journal of Public Economics 168: 1–20.

Brandt, Cyril Owen. 2016. Review of Teacher Remuneration in the DRC. Consultancy report for the ACCELERE! project. Cambridge: Cambridge Education.

Brandt, Cyril Owen. 2017. "Ambivalent Outcomes of State-building: Multiplication of Brokers and Educational Expansion in the DR Congo (2004–2013)." Review of African Political Economy 44 (154): 624–42.

Brandt, Cyril Owen. 2018. "Illegibility as a State Effect: The Limits of Governing Teacher Identification in the Democratic Republic of Congo." PhD thesis, unpublished, University of Amsterdam.

Burde, Dana, and Leigh L. Linden. 2013. "Bringing Education to Afghan Girls: A Randomized Controlled Trial of Village-Based Schools." American Economic Journal: Applied Economics 5 (3): 27–40.

Burde, Dana, Amy Kapit, Rachel L. Wahl, Ozen Guven, and Margot Igland Skarpeteig. 2017. "Education in Emergencies: A Review of Theory and Research." Review of Educational Research 87 (3): 619–58.

Chimombo, Moira. 1989. "Readability of Subject Texts: Implications for ESL Teaching in Africa." English for Specific Purposes 8 (3): 255–64.

Crossley, Thomas, Tobias Schmidt, Panagiota Tzamourani, and Joachim K. Winter. 2017. "Interviewer Effects and the Measurement of Financial Literacy." University of Essex ISER Working Paper Series 2017-06.
De Herdt, Tom, and Kristof Titeca. 2016. "Governance with Empty Pockets: The Education Sector in the Democratic Republic of Congo." Development and Change 47 (3): 472–94.

de Jong, John H.A.L. 2015. "Why Learning English Differs from Learning Math and Science." Pearson blog, October 20, from-learning-maths-science-gse.

Dobbie, Will, and Roland G. Fryer Jr. 2013. "Getting beneath the Veil of Effective Schools: Evidence from New York City." American Economic Journal: Applied Economics 5 (4): 28–60.

Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2011. "Peer Effects, Teacher Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in Kenya." American Economic Review 101: 1739–74.

Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2015. "School Governance, Teacher Incentives, and Pupil–Teacher Ratios: Experimental Evidence from Kenyan Primary Schools." Journal of Public Economics 123: 92–110.

Fredriksen, Birger, Sukhdeep Brar, and Michael Trucano. 2015. Getting Textbooks to Every Child in Sub-Saharan Africa: Strategies for Addressing the High Cost and Low Availability Problem. Washington, DC: World Bank.

Fryer, Roland G., Jr. 2011. "Financial Incentives and Student Achievement: Evidence from Randomized Trials." Quarterly Journal of Economics 126 (4): 1755–98.

Gilligan, Daniel O., Naureen Karachiwalla, Ibrahim Kasirye, Adrienne M. Lucas, and Derek Neal. 2018. "Educator Incentives and Educational Triage in Rural Primary Schools." National Bureau of Economic Research Working Paper 24911.
Glewwe, Paul, and Karthik Muralidharan. 2016. "Improving Education Outcomes in Developing Countries: Evidence, Knowledge Gaps, and Policy Implications." In Handbook of the Economics of Education, Volume 5, edited by Eric A. Hanushek, Stephen Machin, and Ludger Woessmann, 653–743. Amsterdam: Elsevier.

Glewwe, Paul, Michael Kremer, and Sylvie Moulin. 2009. "Many Children Left Behind? Textbooks and Test Scores in Kenya." American Economic Journal: Applied Economics 1 (1): 112–35.

Glewwe, Paul, Michael Kremer, Sylvie Moulin, and Eric Zitzewitz. 2004. "Retrospective vs. Prospective Analyses of School Inputs: The Case of Flip Charts in Kenya." Journal of Development Economics 74 (1): 251–68.

Hanushek, Eric A. 1979. "Conceptual and Empirical Issues in the Estimation of Educational Production Functions." Journal of Human Resources 14 (3): 351–88.

IMF (International Monetary Fund). 2015. "Democratic Republic of Congo: Selected Issues." IMF Country Report 15/281.

Jacob, Brian A., and Steven D. Levitt. 2003. "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating." Quarterly Journal of Economics 118 (3): 843–77.

Kivu Security Tracker. 2018. "Monthly Report: June 2018." Accessed June 25, 2018.

Kremer, Michael, Conner Brannen, and Rachel Glennerster. 2013. "The Challenge of Education and Learning in the Developing World." Science 340: 297–300.
Lee, Jong-Wha, and Robert J. Barro. 2001. "Schooling Quality in a Cross-Section of Countries." Economica 68: 465–88.

Marcotte, Dave E. 2007. "Schooling and Test Scores: A Mother-Natural Experiment." Economics of Education Review 26: 629–40.

Marivoet, Wim, and Tom De Herdt. 2015. "Poverty Lines as Context Deflators: A Method to Account for Regional Diversity with Application to the Democratic Republic of Congo." Review of Income and Wealth 61 (2): 329–52.

Martinelli, César, Susan W. Parker, Ana Cristina Pérez-Gea, and Rodimiro Rodrigo. 2018. "Cheating and Incentives: Learning from a Policy Experiment." American Economic Journal: Economic Policy 10 (1): 298–325.

Mbiti, Isaac, Karthik Muralidharan, Mauricio Romero, Youdi Schipper, Constantine Manda, and Rakesh Rajani. 2019. "Inputs, Incentives, and Complementarities in Education: Experimental Evidence from Tanzania." National Bureau of Economic Research Working Paper 24876.

Milligan, Lizzi O., John Clegg, and Leon Tikly. 2016. "Exploring the Potential for Language Supportive Learning in English Medium Instruction: A Rwandan Case Study." Comparative Education 52 (3): 328–42.

Milligan, Lizzi O., Leon Tikly, Timothy Williams, Jean-Marie Vianney, and Alphonse Uworwabayeho. 2017. "Textbook Availability and Use in Rwandan Basic Education: A Mixed-Methods Study." International Journal of Educational Development 54: 1–7.
Muralidharan, Karthik, Abhijeet Singh, and Alejandro J. Ganimian. 2019. "Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India." American Economic Review 109: 1426–60.

OECD (Organisation for Economic Co-operation and Development). 2015. The ABC of Gender Equality in Education: Aptitude, Behaviour, Confidence. Paris: OECD Publishing.

Orkin, Kate. 2013. "The Effect of Lengthening the School Day on Children's Achievement in Ethiopia." Young Lives Working Paper 119.

Panorama Education. 2015. User Guide: Panorama Teacher Survey. Boston: Panorama Education.

PASEC. 2015. PASEC2014. Education System Performance in Francophone Sub-Saharan Africa: Competencies and Learning Factors in Primary Education. Dakar: PASEC.

Read, Tony. 2015. Where Have All the Textbooks Gone? Toward Sustainable Provision of Teaching and Learning Materials in Sub-Saharan Africa. Washington, DC: World Bank Group.

Sabarwal, Shwetlena, David K. Evans, and Anastasia Marshak. 2014. "The Permanent Input Hypothesis: The Case of Textbooks and (No) Student Learning in Sierra Leone." World Bank Policy Research Working Paper WPS 7021.

Tan, Jee-Peng, Julia Lane, and Gerard Lassibille. 1999. "Student Outcomes in Philippine Elementary Schools: An Evaluation of Four Experiments." World Bank Economic Review 13 (3): 493–508.
Todd, Petra, and Kenneth I. Wolpin. 2003. "On the Specification and Estimation of the Production Function for Cognitive Achievement." The Economic Journal 113: 3–33.

UNESCO. 2014. Education for All Global Monitoring Report. Paris: UNESCO.

West, Brady T., and Annelies G. Blom. 2017. "Explaining Interviewer Effects: A Research Synthesis." Journal of Survey Statistics and Methodology 5: 175–211.

Wooldridge, Jeffrey M. 2007. "Inverse Probability Weighted Estimation for General Missing Data Problems." Journal of Econometrics 141: 1281–301.

World Bank. 2015. Public Expenditure Review of the Education Sector in the Democratic Republic of Congo: An Efficiency, Effectiveness and Equity Analysis. Washington, DC: World Bank.

World Bank. 2017. "Implementation Completion and Results Report (TF-14253, TF-14358)." Report No. ICR00004233. Washington, DC: World Bank.

World Bank. 2018. World Development Report 2018: Learning to Realize Education's Promise. Washington, DC: World Bank.
Figures and Tables

Figure 1: ITT Effect on Student Achievement for Different Model Specifications

Figure 2: ITT Effects by Specific Competence Assessed (Model Is Cluster + Controls + IPW)
TABLE 1--SAMPLE STATISTICS, BALANCE ON OBSERVABLES, AND ATTRITION

Columns: (1) Control group α | (2) Difference treatment group β | (3) Attrition π | (4) Attrition × treatment γ

Age (years): 11.860 (0.096) | 0.234 (0.115) | 0.615 (0.116) | -0.225 (0.139)
Female: 0.476 (0.023) | 0.053 (0.027) | -0.019 (0.038) | -0.019 (0.035)
Literate mother: 0.644 (0.037) | 0.024 (0.044) | 0.027 (0.045) | 0.003 (0.040)
Literate father: 0.839 (0.025) | -0.009 (0.033) | 0.015 (0.032) | 0.010 (0.031)
Hours of (non-school) work per week: 5.887 (0.341) | -0.843 (0.422) | 0.592 (0.373) | -0.108 (0.311)
Help with homework at home: 0.273 (0.044) | 0.010 (0.058) | -0.027 (0.043) | -0.014 (0.043)
Minutes spent on homework per day: 41.143 (1.934) | 1.850 (2.507) | 1.947 (2.934) | 1.724 (3.579)
Frequency of eating breakfast a: 1.983 (0.115) | -0.287 (0.163) | 0.009 (0.167) | 0.041 (0.133)
Concern about violence at school b: 1.960 (0.139) | 0.280 (0.178) | 0.077 (0.132) | -0.186 (0.133)

School-level
Organized study time at school after school hours: 0.040 (0.031) | 0.017 (0.044) | -0.076 (0.050) | 0.005 (0.024)
Teacher–student ratio: 0.004 (0.001) | 0.001 (0.002) | -0.002 (0.002) | 0.000 (0.001)
Size of grade 5 class: 59.451 (7.825) | -0.487 (9.520) | 2.906 (6.541) | 2.194 (5.929)
Overall share of vulnerable students: 0.089 (0.006) | 0.006 (0.007) | -0.002 (0.006) | -0.005 (0.005)
Overall ratio of female over male students: 1.050 (0.025) | -0.010 (0.039) | -0.005 (0.037) | 0.009 (0.023)
Share of revenues from RBF: 3.127 (0.174) | 0.104 (0.263) | -0.196 (0.239) | -0.101 (0.174)
Number of classrooms with brick walls: 9.888 (0.725) | -0.333 (0.959) | -0.219 (0.805) | 0.034 (0.636)
Regular meetings with parents: 0.647 (0.087) | 0.046 (0.108) | 0.061 (0.089) | -0.067 (0.055)
Overall teacher and team performance: 26.915 (1.686) | 0.859 (2.848) | 2.425 (2.625) | -0.986 (2.080)
Overall quality and continuity of teaching programs: 7.565 (0.799) | -0.905 (0.910) | 0.157 (0.699) | -0.595 (0.696)
Quality of school leadership: 3.765 (0.047) | 0.010 (0.060) | 0.025 (0.055) | -0.068 (0.053)
School climate: 3.765 (0.054) | 0.027 (0.058) | 0.020 (0.057) | -0.027 (0.048)
Teacher's years of experience: 13.338 (1.782) | 0.427 (2.459) | -2.220 (2.861) | 0.137 (1.329)

Outcome variables
French (/16): 3.015 (1.093) | 0.071 (0.391) | 0.062 (0.379) | -0.691 (0.437)
Math (/12): 1.970 (0.726) | -0.069 (0.281) | 0.330 (0.271) | -0.399 (0.274)

Notes: α, β, π, and γ are the coefficients indicated in Equation (1). a Breakfast: Respondents indicate on a four-point scale how frequently they have breakfast, with 0 indicating "almost never," 1 "once per week," 2 "at least 2–3 days a week," and 3 "every morning"; b Concern about violence: Respondents indicate on a five-point scale how frequently they are concerned about violence at school, with 1 indicating "almost never," 2 "once in a while," 3 "sometimes," 4 "frequently," and 5 "almost all the time."

TABLE 2--INTENT-TO-TREAT EFFECTS IN AN OLS REGRESSION FRAMEWORK

Dependent variable: Standardized test scores (end-line)
Columns: (1) French | (2) French | (3) French | (4) Math | (5) Math | (6) Math

Intervention (ITT): 0.274** (0.125) | 0.270** (0.124) | 0.258** (0.125) | 0.103 (0.099) | 0.106 (0.100) | 0.093 (0.098)
Baseline z-score: 0.148*** (0.047) | 0.147*** (0.047) | 0.154*** (0.047) | 0.236*** (0.043) | 0.229*** (0.043) | 0.234*** (0.043)
Pupil controls: No | Yes | Yes | No | Yes | Yes
Enum. FE: Yes | Yes | Yes | Yes | Yes | Yes
IPW: No | No | Yes | No | No | Yes
Observations: 1498 | 1498 | 1498 | 1498 | 1498 | 1498
Adj. R-squared: 0.255 | 0.261 | 0.267 | 0.212 | 0.221 | 0.226

Notes: Robust standard errors clustered by school in parentheses. Treatment is a dummy variable indicating whether the student was attending a treatment school. Tests in French and math were designed to cover wide ranges of achievement and to be linked between baseline and end-line assessments using common items.
Scores were standardized to have a mean of 0 and standard deviation of 1 in the baseline. ***, **, and * signify statistical significance at 1, 5 and 10 percent level respectively.
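The standardization described in these notes (scores expressed in baseline standard deviations, so that an ITT coefficient of 0.26 reads as 0.26σ) can be sketched as follows. The raw scores and sample are hypothetical; in the study the mean and standard deviation come from the actual baseline tests.

```python
# Standardize scores using the BASELINE mean and standard deviation, so
# that a treatment effect of 0.26 reads as 0.26 baseline SDs.
# All numbers below are hypothetical, for illustration only.
def standardize(scores, baseline_mean, baseline_sd):
    return [(s - baseline_mean) / baseline_sd for s in scores]

baseline = [2, 3, 4, 2, 5, 3, 2, 3]   # hypothetical raw French scores (/16)
mu = sum(baseline) / len(baseline)
sd = (sum((s - mu) ** 2 for s in baseline) / len(baseline)) ** 0.5

endline_z = standardize([4, 5, 3, 6], mu, sd)
print(endline_z)  # [1.0, 2.0, 0.0, 3.0]
```

Using baseline rather than end-line moments keeps the scale fixed across survey rounds, so gains between baseline and end-line are not absorbed into the normalization.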
TABLE 3--EFFECTS ON STUDENT PERFORMANCE IN THE NATIONAL END-OF-6TH-GRADE EXAM (TENAFEP)

Dependent variable: TENAFEP z-score
Columns (1)–(2): restricted sample (re-identified students only); columns (3)–(4): all students in the school.

Intervention (ITT): 0.274** (0.137) | 0.270** (0.133) | 0.214 (0.159) | 0.230* (0.133)
Control baseline test (French & math): Yes | Yes | N/A | N/A
Pupil controls (age, breakfast, hours of work): No | Yes | N/A | N/A
Gender control: No | Yes | No | Yes
Enumerator controls (baseline only): Yes | Yes | Yes | Yes
School controls (pass rate y-1, mean score y-1, PBF subsidy y-1): No | Yes | No | Yes
District control: Yes | Yes | Yes | Yes
School clustered SE: Yes | Yes | Yes | Yes
Observations: 806 | 806 | 2575 | 2500
R-squared: 0.202 | 0.230 | 0.072 | 0.132

Notes: Standard errors in parentheses. a For ease of interpretation we use a linear probability model. District controls are included to account for the fact that TENAFEP scores are weighted at district level (giving a bonus to students from Shabunda for its remoteness). ***, **, and * signify statistical significance at the 1, 5 and 10 percent level respectively.

TABLE 4--EFFECTS ON NATIONAL END-OF-6TH-GRADE EXAM (TENAFEP) OUTCOMES AT SCHOOL LEVEL

Dependent variable (OLS): (1) Sit rate | (2) Sit rate | (3) Pass rate | (4) Pass rate

Intervention: 0.094* (0.046) | 0.090* (0.046) | −0.001 (0.022) | −0.004 (0.022)
Control baseline value: Yes | Yes | Yes | Yes
School-level control (1): No | Yes | No | Yes
District control: Yes | Yes | Yes | Yes
Observations: 83+ | 83 | 83 | 83
Adjusted R-squared: 0.032 | 0.024 | 0.084 | 0.077

Notes: Standard errors in parentheses. + No end-line TENAFEP data were available for seven schools (five in the treatment group, two in the control group). See Table 3 for an explanation of the inclusion of a district control. ***, **, and * signify statistical significance at the 1, 5 and 10 percent level respectively.
TABLE 5--HABITS, ATTITUDES, AND ASPIRATIONS

Dependent variable (OLS): (1) Time spent on homework | (2) Finds school textbooks useful for learning | (3) Uses school textbooks in class | (4) Aspires to non-manual jobs a | (5) Wishes to go to secondary school | (6) Motivation to go to class

Intervention: -0.204* (0.119) | 0.226* (0.118) | 0.048 (0.133) | 0.118* (0.068) | 0.034 (0.023) | 0.230** (0.102)
Mean baseline value: 1.397 (0.041) | 3.830 (0.068) | 0.454 (0.009) | n/a | 0.910 (0.013) | 3.910 (0.075)
Bonferroni p-value: 0.336 | 0.298 | 0.720 | 0.336 | 0.336 | 0.157
Control baseline value: Yes | Yes | Yes | No b | Yes | Yes
Pupil controls (4): Yes | Yes | Yes | Yes | Yes | Yes
Enumerator fixed effects: Yes | Yes | Yes | Yes | Yes | Yes
IPW: Yes | Yes | Yes | Yes | Yes | Yes
Observations: 1497 | 1389 | 1391 | 1497 | 1498 | 1497
Adjusted R-squared: 0.103 | 0.143 | 0.132 | 0.381 | 0.044 | 0.113

Notes: Standard errors are clustered by school. a A non-manual job is defined as a job belonging to one of the following categories: artist, lawyer, politician, faith minister, and civil servant, including teachers but excluding military and police. b We do not have a baseline for pupils' professional aspirations, but given that randomization largely created balanced treatment and control groups this should not be a concern. ***, **, and * signify statistical significance at the 1, 5 and 10 percent level respectively.
Appendix A: Complementary Tables and Results

TABLE A1--BALANCE ON SCHOOL-LEVEL CHARACTERISTICS

Columns: (1) Control group | (2) Difference treatment group [1]

Teacher supervision during after-school study time: 0.053 (0.027) | -0.005 (0.036)
Teacher–student ratio: 0.005 (0.001) | -0.000 (0.002)
Size of grade 5 class: 52.556 (5.350) | 0.893 (7.657)
Overall share of vulnerable students: 0.095 (0.006) | 0.003 (0.009)
Overall ratio of female over male students: 1.037 (0.028) | -0.011 (0.037)
Share of revenues from RBF: 3.263 (0.166) | 0.091 (0.276)
Number of classrooms with brick walls: 9.573 (0.584) | -0.166 (0.873)
Regular meetings with parents: 0.720 (0.068) | 0.004 (0.096)
Shabunda district: 1.000 | 0.000
Overall teacher and team performance [2]: 26.517 (1.848) | 0.345 (2.840)
Overall quality and continuity of teaching programs [2]: 7.390 (0.786) | -0.847 (0.913)
Quality of school leadership [3]: 3.834 (0.052) | -0.048 (0.068)
School climate [3]: 3.804 (0.058) | -0.003 (0.067)

Teacher-level characteristics
Years of work experience of teacher [3]: 13.676 (1.704) | 1.114 (2.473)
Teacher's score on own math test: 18.250 (0.388) | -0.417 (0.629)
Teacher's score on own French test: 23.222 (0.387) | 0.238 (0.492)

Notes: [1] Using a simple OLS model where the outcome is regressed on the treatment dummy. [2] As evaluated through the RBF system (independent of this study). [3] Based on a survey with the principal; composite indicator (means of 5-item Likert scales, from 1 "terrible" to 5 "excellent").
TABLE A2--COMPLIANCE WITH THE INTERVENTION

Columns: (1) Semester 2 2017 | (2) Semester 3 2017

Books returned in good state: 0.754 (0.277) | 0.678 (0.308)
Registers are used [core]: 0.667 (0.477) | 0.622 (0.490)
Stars system is used [core]: 0.663 (0.301) | 0.63 (0.311)
Pictures displayed in the classroom [core]: 0.8 (0.405) | 0.778 (0.420)
Proportion of book withdrawals: 0.75 (0.316) | 0.726 (0.334)
Project documents are filed [core]: 0.778 (0.420) | 0.8 (0.405)
Stars obtained: 0.681 (0.284) | 0.629 (0.310)
Weekly test: 0.64 (0.283) | 0.53 (0.257)
Observations: 45 | 45

Note: Standard deviations in parentheses.
TABLE A3--HETEROGENEITY IN TREATMENT EFFECT BY STUDENTS' GENDER, SOCIOECONOMIC STATUS, AGE, AND BASELINE TEST SCORE

Dependent variable: Standardized test scores (end-line)
Covariates: Vulnerable, columns (1) French and (2) Math | Female, columns (3) French and (4) Math | Best half of the students, columns (5) French and (6) Math | Older pupils, columns (7) French and (8) Math

Treatment: 0.284** (0.120) | 0.105 (0.095) | 0.252* (0.150) | 0.100 (0.117) | 0.401*** (0.144) | 0.170 (0.106) | 0.209 (0.132) | 0.064 (0.103)
Covariate: 0.143 (0.183) | 0.086 (0.150) | 0.009 (0.091) | 0.156** (0.077) | 0.221* (0.119) | 0.242*** (0.075) | 0.035 (0.108) | 0.092 (0.113)
Interaction: -0.313 (0.228) | -0.218 (0.204) | -0.000 (0.116) | -0.034 (0.102) | -0.324** (0.148) | -0.191* (0.108) | 0.174 (0.121) | 0.075 (0.125)
Controls: yes in all columns
Treatment plus Interaction: ** ***
Observations: 1498 in all columns
Adjusted R-squared: 0.268 | 0.230 | 0.266 | 0.229 | 0.272 | 0.231 | 0.267 | 0.229

Notes: Robust standard errors clustered by school in parentheses. All regressions include enumerator fixed effects, the four controls mentioned earlier, and inverse probability weighting to account for attrition issues. We also control for the dependent variables of the table (female, vulnerable, older pupils, and/or best half of students) that are not the covariate of interest. There is no correlation between being a weaker student and being vulnerable. Vulnerable students are more likely to be male and younger. ***, **, and * signify statistical significance at the 1, 5 and 10 percent level respectively.
TABLE A4--HETEROGENEITY IN TREATMENT EFFECT BY TEACHER CHARACTERISTICS

Dependent variable: Standardized test scores (end-line)
Covariates: Teaching efficacy, columns (1) French and (2) Math | Teaching experience, columns (3) French and (4) Math | Teacher competence in French, columns (5) French and (6) Math | Teacher competence in math, columns (7) French and (8) Math

Treatment: 0.378** (0.165) | 0.189 (0.137) | 0.454* (0.231) | -0.190 (0.189) | 0.422* (0.245) | 0.195 (0.175) | 0.397 (0.273) | 0.288 (0.204)
Covariate: 0.070 (0.245) | 0.020 (0.191) | 0.010 (0.267) | -0.208 (0.208) | 0.460 (0.325) | 0.251 (0.226) | 0.015 (0.296) | 0.182 (0.204)
Interaction: -0.615* (0.314) | -0.527* (0.276) | -0.351 (0.367) | 0.519* (0.297) | -0.348 (0.426) | -0.058 (0.296) | -0.300 (0.455) | -0.303 (0.334)
Treatment plus Interaction: *
Observations: 1498 | 1498 | 1467 | 1467 | 1305 | 1305 | 1305 | 1305
Adjusted R-squared: 0.287 | 0.243 | 0.275 | 0.237 | 0.273 | 0.245 | 0.267 | 0.244

Note: The covariates are all binary variables and take the value of 1 for an above-the-median score. Robust standard errors clustered by school in parentheses. All regressions include enumerator fixed effects, the four controls mentioned earlier, and inverse probability weighting to account for attrition issues. ***, **, and * signify statistical significance at the 1, 5 and 10 percent level respectively.
TABLE A5--HETEROGENEITY BETWEEN RBF AND NON-RBF SCHOOLS IN 2015

Columns: Non-RBF | RBF | t-test difference

Number of qualified teachers: 8.241 (3.288) | 9.667 (4.227) | 1.322**
Teachers at lowest level of qualification: 0.0296 (0.170) | 0.0556 (0.313) | 0.026
Teachers per pupil: 0.0290 (0.0130) | 0.0246 (0.00845) | -0.004**
Support staff per pupil: 0.00579 (0.00716) | 0.00459 (0.00267) | -0.001
Girls to boys (RBF indicator): 0.908 (0.208) | 1.007 (0.201) | 0.099***
Classrooms: 8.305 (3.631) | 9.167 (4.144) | 0.870
Classrooms in a good state: 0.0936 (0.642) | 0.0333 (0.181) | -0.082
Seats: 352.5 (196.3) | 360.8 (218.0) | -0.462
Seats per pupil: 1.201 (0.738) | 0.877 (0.423) | -0.323***
Female head of school: 0.0690 (0.254) | 0.0556 (0.230) | -0.022
'Payé and Mécanisée' (RBF selection criteria): 0.773 (0.420) | 0.900 (0.302) | 0.127**
TENAFEP fail rate: 0.273 (0.215) | 0.238 (0.218) | -0.027
TENAFEP mean score: 59.30 (4.924) | 58.61 (4.575) | -1.149
TENAFEP mean standard deviation: 3.661 (1.669) | 3.504 (1.405) | -0.063
N: 201 | 90

Note: Standard deviations in parentheses. Significance level * p<0.05 ** p<0.01 *** p<0.001. For TENAFEP data we only have 158 schools in total, of which 51 are RBF.
FIGURE A1--DISTRIBUTION OF TEST SCORES AT BASELINE AND END-LINE

Kolmogorov–Smirnov test (equality of distributions): 0.0406 (p = 0.571)
Kolmogorov–Smirnov test (equality of distributions): 0.0664 (p = 0.075)
Kolmogorov–Smirnov test (equality of distributions): 0.0567 (p = 0.182)
Kolmogorov–Smirnov test (equality of distributions): 0.0530 (p = 0.245)
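The Kolmogorov–Smirnov statistics reported under Figure A1 measure the largest gap between the empirical score distributions of the treatment and control groups. A minimal pure-Python version of the two-sample KS statistic might look like the sketch below; the sample data are made up and purely illustrative (in practice one would use `scipy.stats.ks_2samp`, which also returns the p-value).

```python
import bisect

# Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
# between the two empirical CDFs, evaluated at every observed value.
def ks_statistic(sample_a, sample_b):
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Share of observations <= x (right-continuous empirical CDF).
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in sorted(set(a) | set(b)))

# Identical samples give 0; fully separated samples give 1.
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # 0.0
print(ks_statistic([1, 2, 3], [7, 8, 9]))  # 1.0
```

Small statistics with large p-values, as in the baseline panels above, indicate that the two score distributions are statistically indistinguishable.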