Incentivizing Textbooks for Self-Study:
Experimental Evidence from the Democratic Republic of Congo
Jean-Benoit Falisse˥ Marieke Huysentruyt†‡ Anders Olofsgård ‡⁋
Date of this version: February 28, 2020
Using a randomized field experiment in South Kivu, we study the impact of a simple “textbooks for
self-study” incentive scheme targeting primary school students on student achievement. Students in
the treatment schools scored 0.26σ higher in French but did no better in math. They were more likely
to take the high-stakes end-of-6th-grade national exam and those who passed the test obtained higher
scores. The largest positive impact was found in schools with lower teaching-efficacy and for lower-
ability students. Our results demonstrate that incentives-only programs designed to intensify and
diversify students’ use of existing school resources can sharply improve student achievement.
JEL codes: C93, I21, J24, O15
* We are deeply grateful for the support from Cordaid to facilitate this research. We are particularly indebted
to Adolphe Malanga, Emmanuel Muzi, Liliane Rwezangabo Milimbo, Kamala Kaghoma and Patrice Kiziba
Yafali for excellent project management, supervision of surveys and data gathering, and to Alinda Bosch,
Mamadou Sylla, Christian Bahizi and Manuela Jansen for continuous support and facilitation. We also thank
Bluesquare (and especially Julie Vanhamme and Antoine Legrand) for their assistance with the RBF data and
the surveys. We are also grateful to Abhijeet Singh, Pamela Campa, and Dagmara Celik Katreniak, and
seminar participants at the S&O 2019 Research Day at HEC Paris, the ASWEDE 2019 Research Conference
at Uppsala University, the 2019 NOVAFRICA Conference in Lisbon, the 2019 CSAE Conference in Oxford,
and Stockholm School of Economics for helpful comments. Excellent research assistance was provided by
Edvard Von Sydow. Finally, this research was made possible by the generous financial support from the World
Bank, through the Results in Education for All Children (REACH) program. The analysis follows the pre-
analysis plan published on May 8, 2018, at the AEA RCT Registry. The RCT ID is AEARCTR-0001845.
˥ University of Edinburgh, Edinburgh Futures Institute & Centre of African Studies (firstname.lastname@example.org).
† HEC Paris, Strategy and Business Policy (email@example.com).
‡ Stockholm Institute of Transition Economics (SITE).
⁋ Stockholm School of Economics (firstname.lastname@example.org).
Over the past decades, numerous programs and billions of dollars in national taxpayer and
donor funds have been focused on making textbooks widely available in primary schools,
including in the world’s poorest and most fragile settings, such as the Democratic Republic
of Congo (DRC) (Fredriksen, Brar, and Trucano 2015; Read 2015; World Bank 2018). Yet
the evidence of the impact of textbooks on student achievement is at best equivocal, when
not disappointing (Glewwe, Kremer, and Moulin 2009; Sabarwal, Evans, and Marshak
2014; Glewwe and Muralidharan 2016). One potential problem is that these textbook
distribution programs rarely foresee incentives for teachers or students to actually make use
of the textbooks in schools, let alone to intensify or diversify their use, say by leveraging
textbooks as a support for student self-study at home.1
This raises an important policy
question, or opportunity: Can simple incentives designed to increase the use of textbooks
improve students’ learning?
Inspired by insights from recent field experiments finding strong complementarities
between school inputs and incentives (Gilligan et al. 2018; Mbiti et al. 2019), this paper
studies the impact of a novel “textbooks for self-study” incentive scheme designed to
encourage primary school students to regularly take home textbooks for self-study. It is set
in the DRC, a context where the government and the donor community have been repeatedly
distributing textbooks in primary schools, though without much evidence that they are
effectively used.2 The scheme rewards student effort through primarily non-financial
1 Self-study is defined as study outside the classroom and without direct supervision.
2 In the aftermath of Africa’s “First Congo War,” which led to the loss of some five million lives between
1994 and 2003, the education sector, like much else, was in desperate need of reconstruction. The DRC
government and the donor community, led by the World Bank, have invested substantial amounts in rebuilding
some of that capacity, including a large textbook distribution program that has allocated around 40 million
textbooks nationwide since 2008.
incentives and requires minimal input from teachers. Our evaluation was carried out on a
sample of 90 primary schools, involving 1,498 5th/6th-grade primary school students, in the
province of South Kivu. Thanks to rich longitudinal data that we collected on students,
teachers, and school head teachers, we are able to demonstrate the added value of the
incentive scheme – notably in terms of improved test scores, both on our own tests and the
national exam, greater likelihood of sitting the national exam, and more ambitious attitudes
toward studying and careers.
To conduct the field experiment, we collaborated with the Ministry of Education of
South Kivu and Cordaid (Caritas Netherlands), an international non-governmental
organization that has been working in the education sector in the South Kivu over the past
ten years. The “textbook for self-study” incentive scheme was designed to intensify and
diversify the use of textbooks. More specifically, the scheme encouraged 5th/6th grade-
students to regularly take home textbooks to study for weekly quizzes through a mix of
incentives: (i) a public display of stars attributed to each student who took math and French
textbooks home, brought them back in good condition and participated in an end-of-week
quiz [image incentives based on individual student effort]; (ii) in-kind rewards such as pens
and pencils for all students in classes where at least 75 percent of the students regularly
participated in the routine [in-kind incentives based on overall class effort]; and (iii) a small
flat-rate compensation to schools for participating in the intervention and to compensate for
a potential increase in the number of lost and damaged books [school-level financial
incentive based on overall compliance]. Financial and material incentives were given after
the end line survey to avoid income effects. We randomly assigned 45 primary schools to
the incentive scheme and 45 comparable primary schools to a control group. The incentive
scheme was implemented over an 18-month period.
We emphasize three main sets of results. First, we find that the students in the
treatment schools scored 0.26 to 0.27σ higher on the French language test relative to
students in the control group schools. We find better performance on both the sentence-
completion tasks and the word-retrieval tasks. However, the intervention had no
significant impact on student test scores in math in aggregate, or on any of the three main
competency domains in math tested. We offer several explanations for why our incentive
scheme had a positive effect on language skills but failed to raise performance in math.
Second, we find that students in the treatment schools were not only 13 percent
more likely to sit the high-stakes national exam (TENAFEP; from 69 to 78 percent), but
that those who passed also did so with better scores (0.27σ higher scores in the main
specification). This corroborates the first finding that the “textbook for self-study” routine
significantly raised student achievement.
Third, our findings suggest that the intervention improved students’ perceptions
about the usefulness of textbooks and students’ educational and career aspirations. These
effects suggest plausible mechanisms for how our incentive scheme produced better
learning outcomes and led more people to sit the TENAFEP. By affecting students’
perceptions and aspirations, our incentive scheme may well have long-lasting consequences.
Furthermore, we provide suggestive evidence that the weaker students benefited most from
the “textbook for self-study” routine, as well as students in classrooms with weaker teachers.
Relative to prior research on education, incentives and learning outcomes, this paper
shifts focus in four important ways. First, we consider how existing inputs to education
production (Hanushek 1979) can be used more efficiently, rather than evaluating the impact
of new resources such as new technology-aided instruction programs (Muralidharan, Singh,
and Ganimian 2019), new contract teachers (Duflo, Dupas, and Kremer 2011, 2015), or new
grants to purchase classroom materials (Blimpo and Evans 2011). Second, we examine the
potential role of textbooks to foster self-study outside the classroom.3
Third, we focus on
incentives that reward student effort rather than learning outcomes, primarily through
non-financial rewards.4 To the best of our knowledge, our study provides the first causal
evidence of the positive effects of self-study on student achievement in a fragile and
conflict-affected environment. Fourth, we test our novel incentives-only scheme in a
severely resource-constrained and fragile setting. Rigorous randomized experiments are
rarely seen in fragile or conflict-affected states (Burde et al. 2017): of the 118 evaluations
of education interventions reviewed by Glewwe and Muralidharan (2016), only two were
conducted in one of the 15 countries deemed most fragile according to the Fragile States
Index produced by the think tank Fund for Peace.5
One explanation for the scarcity of RCTs in
fragile or conflict-affected settings is that the challenges involved, in particular higher-
than-usual levels of attrition, can undermine the internal validity of findings. We undertake various
measures to address this concern.
The immediate economic and social repercussions of the “learning crisis” are
particularly evident in these fragile settings (World Bank 2018). Yet we know surprisingly
little about how student achievement can be effectively raised under multiple and severe
3 One of the main challenges students face in self-study is to exercise self-control and delay instant
gratification (Beland and Murphy 2016). In our setting, the textbook may also have served as a useful
reminder or commitment device for students to follow through on their intentions to self-study. Our paper
thus ties into the literature on student motivation, self-efficacy, and self-learning.
4 These incentives are primarily non-financial, comprising image and in-kind rewards linked to individual
and class effort, respectively. Even the small monetary compensations directed at the schools to motivate
overall compliance with the scheme are conditional on overall effort, not student performance. By not
linking incentives to student performance, we lower the probability of unintentionally raising teacher
incentives to cheat or “teach to the test” (Jacob and Levitt 2003; Behrman et al. 2015; Martinelli et al. 2018).
5 Burde and Linden (2013) evaluated the impact of building schools in rural Afghanistan on school enrollment
and student achievement. They found large effects on average student test scores (specifically, an increase of
0.4σ for boys and 0.6σ for girls). However, though the authors do not provide an analysis of the project’s cost-
effectiveness, the costs of this intervention are likely also high. Orkin (2013) assessed the impact of an increase
in instructional time on student learning outcomes in Ethiopia and found that this increase had very small
effects on student achievement. Again, no data on the implementation costs of the intervention were provided.
(resource) constraints, including particularly weak teacher capacity. Our study gives
evidence of a “low-hanging fruit”-type intervention: The intervention does not require any
new inputs but instead leverages an existing widespread yet underutilized educational
resource to encourage self-study.
Implemented on a small scale, the intervention compares favorably in terms of cost-
effectiveness with interventions that increase teaching resources or focus on improved
pedagogy (Kremer et al. 2013). Based on per-student expenditures and the most
conservatively estimated ITT effect of 0.26σ, we estimate that US$65 is necessary to
achieve a 1σ improvement in French test scores. It must be noted that the intervention partly
piggy-backed on an existing results-based financing (RBF) program.6
Further gains in terms
of cost-effectiveness could possibly be achieved in such a context. It also appears that the
provisions for missing or damaged books were too high since very few books went missing
or were severely damaged.
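As a back-of-the-envelope check (the per-student cost is not stated here, so we back it out from the two figures given, assuming cost scales linearly with the effect size):

```latex
% Implied per-student cost from the ITT effect (0.26 sigma) and the
% cost per 1 sigma improvement (US$65):
\frac{\text{per-student cost}}{0.26\,\sigma} = \frac{\$65}{1\,\sigma}
\quad\Longrightarrow\quad
\text{per-student cost} \approx 0.26 \times \$65 \approx \$17.
```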
The rest of this paper is organized as follows. Section II describes the context and
the intervention. Section III describes our data and study design. Section IV presents our
main results. Section V discusses some additional outcomes and the intervention’s cost-
effectiveness. Section VI concludes.
II. Context and Intervention
6 We discuss the implications of running the intervention in schools where an RBF program already exists below.
The Democratic Republic of Congo is one of the poorest and most conflict-ridden
countries in the world (IMF 2015; World Bank 2018; Marivoet and De Herdt 2015). It was
the battleground of “Africa’s First World War”, which led to the loss of an estimated five
million lives between 1994 and 2003. The province of South Kivu remains marked by
tension, and outbursts of violence carried out by various armed groups, including the regular
army, are not infrequent (Kivu Security Tracker 2018). Despite recent efforts to improve
the budget allocation, public finance for primary education, and education in general,
remains low compared with most other countries in the region: only 10.9 percent of the
public budget is allocated to education, and executed education spending amounts to
about 1.8 percent of GDP. Furthermore, public spending on education is uneven, biased
toward the rich, particularly in preschool, secondary, and higher education (World Bank
2015). Even though South Kivu accounts for 15 percent of the country’s primary-level students,
the province receives only 7 percent of the teachers’ budget (De Herdt and Titeca 2016).
Primary school education lasts six years and is compulsory for 6- to 11-year-old
children. Net primary school enrolment increased from 51 percent to 79 percent between
2005 and 2014 and continues to rise, especially owing to progress in the enrolment of girls
and children living in rural areas. Yet important challenges remain. Dropout and repetition
rates are high, with the overall repetition rate hovering at over 10 percent (World Bank
2015). Schooling is frequently interrupted not just due to outbursts of violence, but also
because many families struggle to pay the school fees. Quality of education is also lagging
behind. According to an assessment carried out in 2014 by PASEC, a joint program of
African francophone countries to evaluate the education sector, most grade 6 students in
DRC were not sufficiently competent in reading or mathematics (World Bank 2018). Many
primary school teachers lack the requisite qualifications or training, and many also lack
strong motivation. Teachers’ wages are very low, and their disbursement is highly
erratic – the result of commonly delayed payments by the government and parents’ irregular
payment of prohibitive school fees.
As part of a US$150 million World Bank grant aimed at improving public service
delivery, the DRC government launched a nationwide initiative in 2008 to distribute 18
million textbooks in primary schools. This initiative was followed by another US$100
million grant, between 2012 and 2017, in which more than 22 million textbooks were
distributed. An internal review from the World Bank (2017) reported that 93 percent of
these books actually reached schools, but also highlighted that additional effort is needed in
the future in order to ensure the efficient and continuous use of the delivered textbooks.
In 2018, the DRC government, again with the support of the World Bank, launched
a new five-year program called ‘Projet d’Amélioration de la Qualité de l’Education’ (Project
for Education Quality Improvement, PAQUE), specifically focused on tackling the learning
crisis in primary schools in 12 of its 25 provinces. As part of this, the DRC government
began to scale up results-based financing to an additional 1,350 primary schools in 2019
and plans to introduce RBF in all primary schools across the focal 12 provinces by 2020.
Taken together, recent strategic priorities in DRC’s education sector render the findings of
our study of immediate policy interest.
II.B. The Intervention
The “textbooks for self-study” routine was implemented during five consecutive
trimesters, from second trimester in school year 2016–2017 through the end of school year
2017–2018, in 45 randomly selected primary schools in the districts of Shabunda and
Walungu, in South Kivu. Together with 45 schools in the control group, these schools
represent the full population of primary schools where Cordaid had introduced an RBF
program starting in 2008. The RBF program ties financing to school performance, measured
using a long list of quantitative and qualitative indicators over primarily three domains.
School enrollment carries the biggest weight in the overall computation of points, with extra
weight on girls and in particular students classified as vulnerable.7
The second and third
most important domains are administrative and financial management and supervision,
respectively. There are no direct financial incentives for improvements in student
performance based on, say, test results. Nor were there, before our routine was introduced,
any incentives tied to the use of textbooks.
The benefits of testing the impacts of the “textbooks for self-study” routine at these
schools were twofold. First, thanks to the RBF program, there was already a monitoring
system in place (with school visits on a quarterly basis), which we could extend to include
monitoring of compliance with our own intervention. Second, we were able to integrate our
extrinsic monetary incentives with the financial transfers paid to schools as part of the RBF
program. This allowed us to economize on implementation costs, a point which we further
discuss in the subsection on cost-effectiveness.
A potential drawback of running the experiment only in RBF schools is questionable
external validity, as these schools may not be representative of all primary schools.8
However, using administrative data for 2015 (see Table A5), we find that even after on
7 Vulnerable children comprise: (1) orphans; (2) children with a physical or mental handicap; (3) children
born out of sexual violence and who are not accepted by their family and without any other assistance; (4)
abandoned children; (5) children born out of wedlock; war-displaced children without any other assistance;
(6) children affected by HIV/AIDS; children whose parents are affected by HIV/AIDS; (7) children 6-18
who have worked in artisanal mining; (8) children 6-13 who are being raised by vulnerable people or
8 RBF schools were selected based on two criteria: (i) the school needed to have a minimum ‘viable’ size
(computed as 26 pupils multiplied by the number of classes in the school); and (ii) the school needed to be
accessible by car. Furthermore, Cordaid sought to ensure that the proportion of different types of schools
(Catholic, Protestant, or non-religious public schools) in the sample reflected the true proportions in the
population at large in each district.
average six to eight years of RBF implementation, the 90 RBF schools do not systematically
outperform non-RBF schools in the same districts. They are better off along some
dimensions (they had more teachers, a higher proportion of girls) but not along others (e.g.
fewer qualified teachers per pupil and fewer seats per pupil). Importantly, the performance
of the RBF and non-RBF schools in terms of both TENAFEP pass rates and TENAFEP
mean scores were not significantly different. Together, these findings indeed strengthen the
external validity of the present evaluation. Finally, as mentioned above, given that RBF is
now being rolled out across all primary schools in the 12 select provinces, testing our
intervention at RBF schools is in fact most informative from a policy-making perspective.
The “textbooks for self-study” routine was developed with the input of multiple staff
members at Cordaid based both at headquarters in The Hague and locally in Bukavu;
primary school teachers (through a series of focus group discussions); and officials of
SECOPE (Service for the Control of the Teacher Payroll), the provincial Head of Education,
and the Ministry of Education. Based on these diverse inputs, the routine was designed to
leverage an already-existing, widely available resource (textbooks), and to fit with existing
practices of homework (children attend half days in school) and in-class evaluations, using
a bundle of incentives. In the past, many development innovations or initiatives – take the
US$100 million initiative to distribute textbooks across schools in DRC – have failed to
realize their intended impacts in part because they overlooked or underinvested in incentives
to take up or utilize the goods provided. In 41 percent of the schools in our sample,
students were not allowed to take books home, and only a third of the students reported
having a textbook allocated to them that they would regularly use.
At each treatment school, a common set of initiatives was staged to help ensure
clarity on the goals and practical details of the intervention among all relevant stakeholders.
For instance, a team of facilitators, whom we recruited and trained ourselves, held
introductory meetings with the head teacher and 5th-grade teachers, with the parent
committee, and in class with 5th-grade students during which children made a poster
together about the initiative (which then was hung up in the classroom as a useful reminder).
They also conducted refresher meetings at the start of the following school year, this time
with 6th-grade teachers and students, to remind everyone of the scheme and renew the
posters where necessary. Further, a handful of questions were added to the standard survey
that RBF-auditors conduct on a quarterly basis, specifically to assess the implementation of
the new routine.
Given the challenging environment, we opted for a “multi-level, multi-motive”
incentive scheme to strongly encourage students, teachers, and schools to stick with
the routine.9 For students, the reward comprised two elements. The first was an
intrinsic, non-material individual reward based on a publicly announced (via posters in the
classroom) star system. A student earned a star each week that she/he had (i) taken home
and returned in good condition her/his math and French textbooks according to the
assignment; and (ii) taken part in the weekly quiz. Similar public star systems have been
found effective in other contexts (Ashraf, Bandiera, and Jack 2014). Note that the individual
stars are not associated with any material benefits: The incentive works only through
mechanisms of social status. The second reward element was an extrinsic, material group
reward consisting of handouts of school material such as notebooks and pens and pencils.
This reward was meant to be given to classes in which the ratio of actual stars to possible
9 We thus combine different incentives to increase manipulation strength. The disadvantage of this approach
is that we cannot disentangle the relative importance of the different incentives, as the limited sample size
did not allow us to implement multiple treatment arms.
stars (if all class students got stars every week) was at least 75 percent.10
We thus attempted
to tap multiple motives, intrinsic (reputational or social image) as well as extrinsic (in-kind),
on the assumption that these motives are complementary.
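The two reward rules described above can be sketched as follows (a hypothetical illustration; the function and variable names are ours, not the program's actual records):

```python
def weekly_star(took_books_home: bool, returned_in_good_condition: bool,
                took_quiz: bool) -> int:
    """A student earns one star for a week only if all three conditions hold."""
    return int(took_books_home and returned_in_good_condition and took_quiz)


def class_earns_reward(stars_earned: int, n_students: int, n_weeks: int,
                       threshold: float = 0.75) -> bool:
    """In-kind class reward: ratio of actual to possible stars is at least 75 percent."""
    possible_stars = n_students * n_weeks  # one possible star per student per week
    return stars_earned / possible_stars >= threshold


# Example: a class of 30 students over 10 weeks has 300 possible stars,
# so 225 earned stars sits exactly at the 75 percent threshold.
print(class_earns_reward(225, 30, 10))  # True
print(class_earns_reward(224, 30, 10))  # False
```

The flat-rate school compensation is unconditional on student performance and so is not modeled here.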
For schools, a small flat-rate compensation, US$120 per year, was given for taking
part in the experiment. This sought to address the schools’ concern about missing and
damaged books, providing them with resources to buy new books to compensate for lost
and damaged books. If there was money left after having compensated for the missing
books, then the school could use that money to cover its general expenses. Schools
(headteachers and teachers) were thus encouraged to help their students take care of the
books and minimize any loss or damage to them, while keeping the program on track. Note
that the material incentives were relatively weak. To mitigate any income effect produced
by providing in-kind or financial rewards for student learning, all transfers were made only
at the end of the intervention period. Further, contrary to most incentive systems evaluated
in the literature, our intervention focuses on student effort, not the teachers’ or students’
performance. This is natural, as the intervention emphasized self-study, but it also has the
advantage of not creating (perverse) incentives to cheat or teach only what is being directly
measured, and/or lower the bar of required student achievement.
III. Data and Study Design
10 In practice, however, the material benefit was in the end delivered to all classes irrespective of
achievement. As this took place ex post and should have been a surprise to non-performing classes, it should
not have impacted the perceived incentives.
Our sample is composed of 90 primary schools in two districts of South Kivu. We
stratified the sample by district, and then randomly assigned schools in each stratum to
treatment or control group. We gathered detailed information about these schools and
especially their 5th- and 6th-grade students using our own base- and end-line surveys
combined with three primary sources of quantitative data. By combining different data
sources, we can mitigate concerns about common method bias, triangulate our data, and
validate our main results.
(1) Own Surveys and Tests
We held multiple surveys at each of the 90 schools both at baseline, before the intervention,
and at end-line, after 18 months’ implementation of the intervention. In each school, we
surveyed the head teacher, the teachers in the 5th (baseline) and 6th (end-line) grade, and
the students in the 5th (baseline) and 6th (end-line) grade. We asked the head teachers about
their socioeconomic status, professional experiences, human capital, school climate,
leadership style, and the nature of after-school homework support. For teachers, we
specifically enquired about their teaching experience and teaching efficacy. These survey
questions were mostly drawn from established survey instruments, in particular from the
Harvard Graduate School of Education and PASEC.11
All questionnaires were in French
and the two most commonly used local languages, Swahili and Mashi.
At baseline, all 5th-grade students who were present completed a survey in class.
This survey included questions about the student’s household situation, school-related
attitudes and habits, non-school-related work, and aspirations for the future. Our
enumerators also instructed students to fill in an in-class test, which covered both
11 The Harvard Graduate School of Education survey instruments are explained here:
www.panoramaed.com/panorama-student-survey. PASEC is an international organization with the mission
to evaluate education in the French-speaking parts of the developing world.
mathematics and French, mostly based on questions from PASEC (2015). The test (provided
in the Supplementary Appendix) covered basic grammar and reading comprehension in the
French part, and basic numeracy (sequencing, basic arithmetic, fractions) in the math part.12
Questions were multiple choice, and there were overall eight questions in each subject, with
a few sub-questions for each. Students were given the time they needed to complete the test.
The results provide us with a useful proxy measure of knowledge at baseline and help us
control for potential non-observable heterogeneity among the students.
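The σ units used for effect sizes throughout presumably come from standardizing raw scores; a common convention (assumed here, not stated in the text) is to standardize against the control group's mean and standard deviation:

```python
def standardize(scores, control_scores):
    """Express scores in standard-deviation units of the control group."""
    n = len(control_scores)
    mu = sum(control_scores) / n
    sd = (sum((x - mu) ** 2 for x in control_scores) / (n - 1)) ** 0.5
    return [(x - mu) / sd for x in scores]


# Made-up raw test scores, purely for illustration:
control = [4, 5, 6, 5, 4, 6]
treated = [6, 7, 5, 8]
print(standardize(treated, control))  # treated scores in sigma units
```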
We repeated all the surveys at end-line. Given the overall poor student performance
on the test at baseline, we added a few simpler questions to the end-line test. We also
administered the same test to the teachers. At treatment schools, some extra survey questions
explicitly focused on the intervention and its implementation were added.
(2) National End-of-6th-Grade Exam Data Published by the Provincial Inspection of Education
We hand-collected the available performance outcomes on the national exam, the ‘Test
National de fin d’Etudes Primaires’ (National End of Primary Studies Test, TENAFEP).
This high-stakes test is taken at the end of 6th grade and certifies completion of primary
education, and is a requirement for students wishing to pursue secondary education. The
TENAFEP comprises three subjects, all administered in French: French language, general
knowledge, and mathematics. For each school, the Provincial Inspection of Education
publishes how many students took the test and how many of those failed (disaggregated by
gender), and then the names and overall score only for those students who passed the exam.
12 The French score was computed as the overall score without weighting between the different sections: the
‘completing a sentence’ section accounted for 12 points and the ‘retrieving a word’ section for 4 points.
Similarly, the math score was the sum of at most 3 points for decimals, 7 points for ordering, and 2
points for the word-based problem.
We matched these student names with the names from our student database using a level of
tolerance for incorrect or alternative spellings of names and double-checked this matching.13
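The matching step (done with Stata's matchit package; see the footnote) can be approximated in Python as follows. The similarity measure and names here are illustrative; only the manual-review band between 0.6 and 0.99 mirrors the text:

```python
from difflib import SequenceMatcher


def name_score(a: str, b: str) -> float:
    """Similarity in [0, 1]; also tries the inverted name order,
    since first and last names were sometimes swapped."""
    def sim(x: str, y: str) -> float:
        return SequenceMatcher(None, x.lower(), y.lower()).ratio()

    inverted = " ".join(reversed(b.split()))
    return max(sim(a, b), sim(a, inverted))


def classify(score: float) -> str:
    if score >= 0.99:
        return "match"
    if score >= 0.6:
        return "review"  # checked by hand, per the matching procedure
    return "no match"


# Swapped first/last names still score 1.0 thanks to the inverted comparison:
print(classify(name_score("Amani Byamungu", "Byamungu Amani")))  # match
```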
(3) Provincial Division of Education
We hand-collected several basic statistics about each school from the archival records
maintained by the Provincial Division of Education. This allowed us to gather information
about school size and students’ demographic composition.
(4) RBF Audit Data
We extracted several additional basic statistics about each school from the audit reports
produced every quarter in the context of the RBF program present at all schools in our
sample. These audit reports include information about basic school infrastructure, number
of pupils who took the TENAFEP, quality and continuity of the teaching programs, and
overall teacher and team performance. They also contain information about the proportion
of RBF funding in the school’s overall revenues.
III.B. Randomization Check and Attrition
While stratifying allowed us to ensure balance between the treatment and control
groups in terms of location (school district), we still confirmed, using baseline survey data,
that balanced randomization was successfully achieved on other key observable
characteristics. Also, given our setting, the problem of attrition was especially serious. To
systematically analyze both issues of balance and attrition within the same framework, we
regressed each observable variable at baseline (Y) for each student (i) in each school (s) on
the intervention dummy variable (T), the attrition dummy (A), and the interaction between
the two:

Y_{i,s} = α + β·Treatment_s + π·Attrition_{i,s} + γ·(T_s × A_{i,s}) + ε_{i,s}    (1)

(For the name matching, we used the matchit package of Stata and systematically checked all
names with a score between 0.6 and 0.99, which allows for typos as well as inverted first
and last names.)
The constant α gives the mean value for students identified at both baseline and end-line in
control schools and β measures the difference for the similarly identified students in treated
schools. The parameter π captures the difference in the average value of the variable of
interest within the group of students who could not be re-identified. Finally, γ, capturing the
interaction effect between treatment and attrition, indicates whether the drivers of attrition
are different between the control and treatment groups. We look at both individual- and
school-level characteristics.
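Equation (1) can be estimated with a simple OLS fit. The sketch below assumes numpy is available and uses simulated data (no true treatment or interaction effect, with attriters older on average), so the coefficients are purely illustrative, not the study's estimates.

```python
# Sketch of the balance-and-attrition regression in equation (1),
# fit by ordinary least squares on simulated data.
import numpy as np

def fit_balance_attrition(y, treat, attrit):
    """OLS of a baseline variable on treatment, attrition, and their
    interaction, returning (alpha, beta, pi, gamma) as in equation (1)."""
    X = np.column_stack([np.ones_like(y), treat, attrit, treat * attrit])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated illustration: balanced treatment, attriters older by 0.5 years.
rng = np.random.default_rng(0)
n = 2000
treat = rng.integers(0, 2, n).astype(float)
attrit = rng.integers(0, 2, n).astype(float)
age = 12.0 + 0.5 * attrit + rng.normal(0.0, 1.0, n)
alpha, beta, pi, gamma = fit_balance_attrition(age, treat, attrit)
```

Here α recovers the control-group mean among re-identified students, π the age gap of attriters, and β and γ are close to zero, matching the interpretation given in the text.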
Looking first at balance between treatment and control schools, column 2 of Table 1 shows
that four variables – namely students' age, gender, hours worked after school, and frequency
of eating breakfast – are significantly different between the treatment and control groups,
though the imbalance is slight.
Further, it is a priori unclear what their joint influence on learning would be: some
variables suggest that more vulnerable students (girls, less well-fed students) are in the
treatment group, while others, such as the number of hours spent working (child labor),
suggest the opposite. Overall, our randomization checks indicate, fairly reassuringly, that
randomization was effective at generating samples balanced on schools' and students'
observable characteristics (measured at baseline), and we also present analyses controlling
for the four slightly imbalanced variables in our regression tables.
These results are confirmed when regressing all observable variables jointly against the intervention
dummy. When excluding the four variables that were found to be significantly different, an F-test of joint
significance fails to reject the null of no difference between the treatment and control samples. Including the
four variables, the F-test rejects the null hypothesis.
From the baseline survey, the attrition rate is 46.85 percent; that is, out of 2,818
students surveyed at baseline, 1,498 students were matched at end-line. This high level of
attrition reflects in large part the notoriously high dropout and repetition rates in rural DRC
(World Bank 2015): using data from the year before the intervention in the two districts of
interest, we found an average combined drop-out and repeat rate of 28.53 percent (standard
deviation of 40.29) in the schools under study (by which we mean that grade 6 classes are
28.53 percent smaller than grade 5 classes). What is more, day-to-day student absenteeism is very
high in our context, meaning that there was a non-negligible risk that some students would
not be present at school when we returned there. Absenteeism stems from issues typical of
low-income countries (physical access to school, child labor, etc.) but also, in the context
of the DRC, from the fact that many parents struggle to pay the tuition fees and their
children are expelled from class as a result. While it was very likely that we would lose a
substantial part of our sample due to these contextual factors, the deterioration of the
security situation over the course of the study further worsened attrition. The repeated
postponement of the national elections led to severe economic difficulties and violence in
the field, which affected our sample attrition in two different ways. Firstly, it aggravated the
typical drop-out issues, with more families struggling to make ends meet. Secondly, the
end-line survey could only be conducted months later than expected, near the very end of
the school year. Cordaid assesses the security situation in the districts daily, and
enumerators are only allowed in the field when the situation is judged reasonably stable;
at the time the end-line surveys were planned to take place, the security situation
deteriorated for an extended period.
The insecurity and remote locations, in particular in the district of Shabunda, made it prohibitively costly
to try to track students outside of schools to conduct the end-line surveys and tests.
In theory, attrition is problematic for two reasons. Firstly, it may affect the overall
representativeness of the sample. In our case, Column 3 of Table 1 shows that older students
are, overall, more likely to be missing at end-line. We cannot know for sure, but being older
is a sign of past struggles at school, and such students tend to be more likely to drop out
anyway, suggesting that our sample remains representative of students who pass from grade
5 to grade 6. Another, non-exclusive, possibility is that older students are less likely to be
present in class at end-line, in which case our sample is slightly biased towards younger
students. Secondly, attrition is of particular concern if it is correlated with different variables
between treatment and control groups. In this respect, Column 4 of Table 1 is reassuring as
no statistical difference between the two groups can be found. Furthermore, the difference
in attrition rates between treatment and control schools is not statistically significant.
Nevertheless, given the high level of attrition, we also present results with inverse
probability weights to correct for attrition based on observables (see, e.g., Wooldridge
2007; Muralidharan, Singh, and Ganimian 2019).
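The inverse-probability-weighting correction can be sketched as follows, assuming numpy is available. The single covariate ("age") and the attrition process are simulated stand-ins for the study's richer covariate set, and the logistic fits use a simple Newton iteration rather than a packaged estimator.

```python
# Sketch of stabilized inverse probability weights for attrition,
# on simulated data. All variables here are illustrative stand-ins.
import numpy as np

def logit_fit(X, y, iters=50):
    """Fit a logistic regression by Newton's method; X must include an intercept."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        w = p * (1.0 - p)
        hess = X.T @ (X * w[:, None])
        b += np.linalg.solve(hess + 1e-8 * np.eye(X.shape[1]), X.T @ (y - p))
    return b

def stabilized_ipw(X_denom, X_num, observed):
    """Stabilized weight: P(observed | numerator set) / P(observed | full set)."""
    p_den = 1.0 / (1.0 + np.exp(-X_denom @ logit_fit(X_denom, observed)))
    p_num = 1.0 / (1.0 + np.exp(-X_num @ logit_fit(X_num, observed)))
    return p_num / p_den

# Simulated illustration: older students are more likely to attrit.
rng = np.random.default_rng(1)
n = 1000
age = rng.normal(0.0, 1.0, n)                     # standardized age (stand-in)
X_denom = np.column_stack([np.ones(n), age])      # full covariate set
X_num = np.ones((n, 1))                           # numerator: intercept only
p_obs = 1.0 / (1.0 + np.exp(-(0.2 - 0.5 * age)))  # true retention probability
observed = (rng.random(n) < p_obs).astype(float)
weights = stabilized_ipw(X_denom, X_num, observed)
```

In the paper's specification the denominator uses all the variables summarized in Table 1 and the numerator drops the attrition covariates and the noisiest baseline scores; here both sets are reduced to a single simulated covariate. Weights are applied to the non-attriting sample, where they average close to one.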
Thanks to the quarterly field visits by the RBF verification agents of the Agence
d'Achat de Performance (AAP), we were able to generate fine-grained measures of schools'
compliance with the new routine. Compliance was high overall, though weaker in seven
schools in the first six months of the intervention (see Table A2). When we repeated our
main analysis considering only the compliant schools (Imbens and Angrist 1994), the effects
were somewhat larger in magnitude but qualitatively the same. This finding supports the
internal validity of our results.
Using a logit model with school clusters, the difference in log-likelihood is -0.165 (z = -1.15).
The specification we present uses all the variables summarized in Table 1 to predict the
weight's denominator. To calculate the numerator, we exclude the covariate of attrition
(age), the variables that differ between treatment and control groups (age, gender, hours
worked after school, and frequency of eating breakfast), and the French and math baseline
scores, which have high variance. We tested alternative specifications of the IPW (e.g.,
keeping French and math in the numerator and not including any of the school covariates)
and noticed no substantial changes in the results.
Specifically, the following six dimensions were carefully evaluated: whether (i) textbooks
were returned in good state; (ii) a "take home" log with student names and dates was
maintained; (iii) the star system was correctly used; (iv) posters were hanging in the
classroom; (v) textbooks were taken home on a weekly basis; (vi) all project documentation
was neatly organized. In addition, RBF verification agents also noted the number of weekly
classroom tests and verified whether at least 75 percent of all students had received a star
on a regular basis. Taken together, these verification data allowed us to gain a detailed
understanding of overall compliance and to detect early (and attempt to remediate) schools
that were experiencing difficulties.
In addition, we sought to assess compliance post hoc using questions that we added to the
end-line student and teacher surveys. More specifically, we asked about
students’ use of textbooks at home. We find that 81 percent of surveyed students in
treatment schools reported having taken home a textbook in the past month, versus
39 percent of surveyed students in control schools. These data allow us to triangulate the
compliance/verification data collected by AAP. Further, they lend support to the notion that
the new routine effectively led students in treatment schools to make greater use of
textbooks at home.
IV. Primary Results
In this section, we first explore how the “textbook for self-study” routine affected the
students’ results on the French and math tests that we designed. We then turn to students’
national exam results, and the sit and pass rates at the national exam (TENAFEP).
In the Appendix (Figure A1), we present the frequency distribution of student test
scores, separately for the treatment and control schools and at baseline and end-line. Student
achievement in both control and treatment schools improved substantially between 5th and
6th grade, in particular at the lower end of the distribution. The 6th grade is a key year, as
at the end of this year there is the national exam required to continue to secondary school,
so the improvement is expected. The distributions of baseline results look very similar in
the control and treatment schools, though, and there is no statistically significant difference
in unconditional average baseline test scores across the two groups. For end-line test results,
the distributions look more different for French, and the unconditional average score in the
treatment group is significantly higher than in the control group in this subject. There is a
strong positive correlation between baseline and end-line test scores at the individual level,
which suggests that the baseline tests were meaningful despite lower-than-expected results.
IV.A. French and Math Test Results
Following our pre-analysis plan, we estimate the effects of the "textbook for self-
study" routine using the following linear model:

Y_{i,j,d,s,t+1} = α + β·Treatment_j + π·Y_{i,j,d,s,t} + γ·φ_{i,d,t} + ε_{i,j,s,t+1} ,    (2)

where Y_{i,j,d,s,t} is student i's standardized test score in school j in district d in
subject s at time t; Treatment is the indicator variable for being in a treatment school;
and φ is a vector of district and enumerator fixed effects.
Each enumerator was assigned both treatment and
control schools and ran over 100 student tests on average, allowing us to remove enumerator
fixed effects from all empirical specifications. This helps to address the concern that
enumerators may have induced differential measurement error, for instance by helping
The analysis follows the pre-analysis plan published on May 8, 2018, at the AEA RCT Registry. The RCT
ID is AEARCTR-0001845.
Using the mean and standard deviation of the control group (all pupils, not only those matched). Given the
rather big difference between baseline and end-line scores, we use the baseline control group for baseline
data and the end-line control group for end-line data. Using the baseline control group for end-line leads to
inflated effect sizes.
In principle, randomization should take care of the well-known challenge that knowledge is a cumulative
process that depends on the full history of a subject’s exposure to educational input, not just a time-limited
recent intervention. Including baseline test results as sufficient statistics for prior input into learning should
thus not be as essential in this case (Todd and Wolpin 2003). Having baseline results, though, helps us reduce
noise from unobserved heterogeneity within our sample.
students comprehend questions on the test or influencing the way the intervention was
grounded from the start (Crossley et al. 2017; West and Blom 2017). For every specification,
we confirmed that the coefficients on the enumerator dummies were indeed significantly
different from zero. Error terms are clustered at the school level to account for
intra-cluster correlation. In an alternative specification, we additionally include the four
variables that were slightly unbalanced in the baseline data.
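The school-level clustering of standard errors can be illustrated with a small cluster-robust OLS sketch, assuming numpy; the data are simulated, the fixed effects are omitted for brevity, and the school identifiers and effect sizes are illustrative only, so this shows the clustering mechanics rather than the paper's full specification.

```python
# Sketch of OLS with cluster-robust (CR0) standard errors at the school level.
import numpy as np

def ols_cluster(X, y, clusters):
    """OLS point estimates with cluster-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        sg = X[clusters == g].T @ resid[clusters == g]  # cluster score
        meat += np.outer(sg, sg)
    V = XtX_inv @ meat @ XtX_inv  # sandwich variance
    return beta, np.sqrt(np.diag(V))

# Simulated example: 100 schools of 15 students, true effect 0.26 sigma,
# with a common within-school shock generating intra-cluster correlation.
rng = np.random.default_rng(2)
schools = np.repeat(np.arange(100), 15)
treat = (schools % 2).astype(float)              # half the schools treated
school_shock = rng.normal(0.0, 0.3, 100)[schools]
baseline = rng.normal(0.0, 1.0, schools.size)
y = 0.26 * treat + 0.5 * baseline + school_shock + rng.normal(0.0, 1.0, schools.size)
X = np.column_stack([np.ones(schools.size), treat, baseline])
beta, se = ols_cluster(X, y, schools)
```

Clustering matters here because treatment is assigned at the school level: ignoring the common school shock would understate the standard error on the treatment coefficient.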
As shown in Table 2, we find that students who were in the treatment schools scored
0.260σ to 0.274σ higher in French compared with students in the control group schools
(Columns 1–3), but scored only marginally, 0.094σ to 0.106σ, and not significantly higher
in math (Columns 4–6). For both French and math, we first report the base model regression
results, then the estimates when controlling for the four variables found to differ between
treatment and control schools at baseline, and finally additionally when including the
inverse probability weights to adjust for potential attrition bias. Our major findings hold
across all three empirical specifications. Estimated coefficients in different models are also
illustrated in Figure 1.
In addition to presenting impacts on standardized total scores, we also present
impacts on different domains of subject-level competencies, in Figure 2, using the model
with inverse probability weights. The effects are positive and significant across the two
domains covered by our test of French language competencies (sentence completion and
retrieving explicitly provided information). The effects across the three math-specific
domains (decimals, sequencing, and word-based math problems) are not significant.
Drawing on prior research, we highlight several plausible explanations for why the
textbook routine produced a positive impact on students’ achievement in French but not in
math. First, this discrepancy may well reflect known differences in the knowledge
accumulation process between these two subjects (de Jong 2015). In math the knowledge
elements build upon each other, whereas in French the knowledge elements are more
parallel. Therefore, when the overwhelming majority of students do not pass a proficiency
threshold in math (World Bank 2018) or lack understanding of very basic math concepts, as
is the case in our setting, the marginal benefit of using a grade-appropriate math textbook
may be especially small. Second, for students to make measurable progress in math, the
textbook routine may have been insufficient. Indeed, prior work has argued that more help
and supervision in school and/or at home is typically needed for students to progress in math
(Lee and Barro 2001; Marcotte 2007). Third, this discrepancy may well reflect differences
in the textbooks’ readability. Language textbooks (French in the case of DRC) tend to be
written at a more rudimentary level than subject textbooks (Chimombo 1989; Milligan,
Clegg, and Tikly 2016), and thus also at a more appropriate level for students who lag
behind. Since the math textbooks were written in the French language, there was also a
double treatment of French language exposure. Note that, since our intervention obliged all
students to take home both a math and a French textbook at least once per week, we can
rule out differential exposure to these textbooks as the driver of the discrepancy.
Turning to the high-stakes national exam, we estimate an equation of similar form
to Equation (2) to evaluate the effects of the intervention on students’ exam results. Table 3
shows the effects of the intervention on these results for students who passed the TENAFEP
(the data do not reveal test scores for students failing the test). We find that
the intervention increased the average exam score by around 0.27σ. This corroborates the
earlier finding that the textbook for self-study intervention effectively improved student
learning. In Columns 1 and 2, we only look at the students who were re-identified and for
whom we have baseline test scores in French and math. In Columns 3 and 4, we look at all
the students who passed TENAFEP in the schools for which we have data. The results of
Columns 3 and 4 do not include individual controls beyond gender (the only piece of
information available from the TENAFEP registers) and likely include students who were
not exposed to the full course of the intervention because they only joined in 6th grade.
Furthermore, in three schools there were two 6th grade classes, and there is a risk that the
students taking the test were not in the class that benefited from the intervention. On the
other hand, attrition is not a concern in Columns 3 and 4. For all these reasons, it is
reassuring to see that the results of Columns 3 and 4 are generally in line with the results of
Columns 1 and 2 (albeit with somewhat larger standard errors).
To investigate the effects of the intervention on the likelihood that students take
the TENAFEP at all, we use school-level data, for two reasons. Firstly, there is a high
risk that student surveys would miss many of the students who did not take the test;
those students tend to drop out or stop showing up in class around the time of TENAFEP,
which was before our end-line survey. Secondly, the TENAFEP data that we have do not
indicate whether students who are not on the lists did not take the test or failed it, and
our own survey question proved unreliable. To compute the school-level sit and pass rate
indicators, we used data on enrollment in semester two, a few months before TENAFEP.
Results, presented in Table 4, show that the intervention significantly raised the share of
students taking the exam (Columns 1 and 2) but had no significant impact on the pass rate.
Put differently, even though the intervention led more students to take the exam, the
average TENAFEP pass rate did not drop; rather, it remained the same. (We only have
TENAFEP data on the students who passed it, and our survey question on whether a student
took the TENAFEP led to confusion: we used the French verb 'passer', which should be
understood as taking the test but is also often understood as passing it.) This makes the
effect on
TENAFEP from the intervention even more encouraging, as the increase in the share of
students sitting the exam likely draws from students from the lower end of the achievement
distribution. Despite this, more students passed the exam, and among those that did, the
average score was higher than in the control schools.
V. Additional Results
V.A. Habits, Attitudes, and Aspirations
In this section we provide some additional evidence to shed light on the impact of
the "textbook for self-study" routine on students' study habits and their attitudes with
regard to further study and career aspirations, drawing on our own survey data. In the student
questionnaires, we first enquired about students’ textbook usage and homework routines,
thereafter about their aspirations for the future and attitudes towards school. We estimate an
equation similar to equation (2) with the set of four control variables used above and inverse
probability weights to adjust for attrition bias, and we control for baseline values. We also
provide standard errors adjusted for multiple hypothesis testing.
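The adjustment procedure is not detailed in the text; as a generic illustration of multiple-hypothesis-testing corrections (not necessarily the authors' exact method), a Holm–Bonferroni step-down adjustment of p-values can be sketched as:

```python
# Illustrative Holm-Bonferroni step-down adjustment of a family of p-values.
def holm_adjust(pvals):
    """Return Holm-Bonferroni adjusted p-values, in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p-value by (m - k + 1), enforcing monotonicity.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

The step-down structure makes Holm uniformly less conservative than a plain Bonferroni correction while still controlling the family-wise error rate.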
Table 5 reveals that students in the treatment schools say they spend slightly less
time on homework, thereby ruling out the possibility that the impact of the new routine
simply came from a higher homework load. On the other hand, students in treatment schools
found the textbooks significantly more useful for learning but were not more likely to use
them in the classroom. With respect to future aspirations, students in treatment schools were
almost 14 percent more likely to aspire to a non-manual job and they were more motivated
to go to school. They were also more likely to say that they wanted to continue to secondary
school, though this effect was not statistically significant. Our hypothesis is that the
intervention succeeded in not only raising student achievement but also encouraging more
students to take the TENAFEP in part because it generated a more positive attitude toward
textbooks and more ambitious study and job aspirations (TENAFEP being a prerequisite for
most non-manual jobs and secondary school). We also checked whether the intervention
was associated with teachers using the textbooks more or more efficiently, but nothing
suggested this was the case.
V.B. Effect Size and Cost-Effectiveness
The ITT effect of 0.26 to 0.27σ that we find for French language falls in the middle
of the distribution of estimated effect sizes presented in the overview by Kremer, Brannen,
and Glennerster (2013). Glewwe, Kremer, and Moulin (2009) and Sabarwal, Evans, and
Marshak (2014) evaluate textbook allocation projects in Kenya and Sierra Leone,
respectively, but neither finds a significant impact of textbooks on average student learning.
For stronger students, though, Glewwe, Kremer, and Moulin (2009) estimate an impact of
0.22σ for the fifth quintile and 0.14σ for the fourth quintile after one year of exposure. It
should be noted that our intervention measures what can be thought of as the "intensive
margin” of school inputs, making more use of existing textbooks, not the extensive margin
– the impact of providing more books. We have not been able to identify directly
comparable impact evaluations in the literature.
Other studies have analyzed alternative pedagogical tools, such as flip charts (Glewwe et al. 2004) and
multilevel learning materials (Tan, Lane, and Lassibille 1999). While the former finds no significant impact,
the latter finds a very high impact on English in a Filipino context of a combination of multilevel learning
materials and enforced parent–teacher partnerships (between 0.75σ and 1.05σ). Recently, technology-driven
interventions geared toward teaching at the right level have shown potential in poor but stable environments.
Banerjee et al. (2007) found that a computer-aided learning program that provided two hours per week of math
instruction improved test scores by 0.48σ after two years. Muralidharan, Singh, and Ganimian (2019) estimate
ITT effects of 0.37σ in math and 0.23σ in Hindi over a 4.5-month period from a personalized technology-aided
after-school instruction program in urban India. Implementing something similar in a fragile rural setting such
as Eastern DRC would be very challenging.
Cost-effectiveness, and even low absolute costs, are particularly relevant in very
poor and fragile settings such as DRC. Based on operating expenditures, we estimate the
cost per student at US$17 in our treatment subsample. This includes the direct costs for the
incentives to schools (roughly US$9 per student), i.e. school material shared with students
over the school year and a flat compensation of US$120 per school. It also includes the costs
of setting up the intervention and monitoring compliance, including primarily costs of
human resources and project management but also some small expenditures on survey
material. These costs were in addition to the costs of the RBF program itself.
Based on our most conservative estimate of impact (0.26σ), this suggests that
US$100 yields a 1.53σ improvement in test scores, or, alternatively, that US$65 would
achieve an improvement of 1σ. This compares very favorably with the 30 randomized controlled
trials (RCTs) evaluated for cost efficiency in Kremer, Brannen, and Glennerster (2013). A
significant part of the costs of this intervention are associated with the need to monitor
schools for compliance and with distributing financial compensation. The planned scaling
up of RBF to 12 provinces covering 1,350 primary schools may be helpful in this case, as
the textbook routine could be embedded in that broader system of monitoring and
incentives, and it should be possible to reduce costs per school. This being a bundled
intervention, it is difficult to separate the roles of the financial incentives versus the non-
pecuniary incentives through the star system, but the direct costs of compensation to the
schools for the financial incentives were slightly more than half of total costs. If scaled up,
reducing financial compensation and stressing more non-pecuniary incentives, together with
reduced monitoring costs through the RBF system, could potentially almost double cost
efficiency. This is assuming, though, that the impact carries over also if scaled up, which of
course cannot be guaranteed (see e.g. Bold et al. 2018).
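The back-of-envelope conversion from cost per student to sigma gains per dollar can be checked directly; the inputs are the figures quoted in the text.

```python
# Back-of-envelope check of the cost-effectiveness figures quoted above.
effect_sigma = 0.26       # most conservative ITT estimate (French)
cost_per_student = 17.0   # estimated US$ per student in the treatment subsample

sigma_per_100_usd = 100.0 * effect_sigma / cost_per_student  # about 1.53 sigma
usd_per_one_sigma = cost_per_student / effect_sigma          # about US$65
```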
The compliance data also suggest that initial concerns about books disappearing
were not well founded. The average (median) number of textbooks that disappeared or were
damaged over the course of the intervention was, respectively, 6.71 (3) in the first semester
and 5.3 (3.5) in the second semester.
V.C. Heterogeneous Treatment Effects
The main results in Section IV address the impact of the intervention on average test-score
results. In this section we turn to analyzing the impact across different groups of
students and teachers. This analysis should be interpreted with some care, as we are
restricted in sample size and therefore statistical power. Differences in impact would have
to be quite substantial to be deemed statistically significant, so we cannot rule out
differences of smaller size. All tables are presented in Appendix A.
In Table A3 we investigate whether impact varied with the status of the students. It
is possible, for instance, that academically stronger students, students from households of
higher socioeconomic status, and those with fewer work responsibilities at home had greater
opportunities to benefit from the intervention. On the other hand, it is also possible that
weaker students may stand the most to gain from being trusted with a book and given an
opportunity to catch up in learning with stronger students. We therefore study variation by
gender, vulnerability status, age (older students have typically repeated grades), and
baseline test scores, a good summary statistic of prior inputs into education, using a
linear interaction in the model with inverse probability weights. The interaction effects
are statistically insignificant across the models, except for the strongest students, who
are less likely to see their test scores in French and math increase as a result of the
intervention. Viewed from an equity standpoint, these results suggest that the intervention
did not unintentionally reinforce gender or age discrimination or biases against the
academically weakest students in class; if anything, it benefited most those who are
academically relatively weaker.
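The heterogeneity analysis can be sketched as a weighted least-squares fit with a treatment-by-baseline interaction, assuming numpy; the data are simulated, the uniform weights stand in for the paper's estimated inverse probability weights, and the effect sizes are illustrative only.

```python
# Sketch of the linear-interaction heterogeneity model: does the treatment
# effect shrink for students with higher baseline scores?
import numpy as np

def wls(X, y, w):
    """Weighted least squares: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(3)
n = 3000
treat = rng.integers(0, 2, n).astype(float)
base = rng.normal(0.0, 1.0, n)     # standardized baseline score
w = np.ones(n)                     # placeholder for estimated IPW weights
# Simulated truth: weaker students (low baseline) gain more from treatment.
y = 0.1 + treat * (0.26 - 0.10 * base) + 0.5 * base + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), treat, base, treat * base])
b = wls(X, y, w)   # b[1]: effect at the mean; b[3]: interaction coefficient
```

A negative interaction coefficient, as in this simulated example, corresponds to the pattern described above: the strongest students gain the least from the intervention.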
Next, we investigate whether the impact varied with teacher characteristics, notably
teachers’ teaching efficacy, years of teaching experience, and teacher competence in math
and French as measured by their score on the student math and French tests. Teacher
competence has been shown to be low across Sub-Saharan Africa and to negatively impact
student learning (Bold et al. 2017). Results in Table A4 tentatively suggest that the new
routine, if anything, had a larger impact on French and math test scores in schools with
comparatively lower teaching efficacy. This should be interpreted with some caution, but the
results are consistent with the notion that textbooks can somewhat compensate for weak in-
class learning in the French language.
In sum, these results tentatively suggest that students generally benefited in French
language skills from the intervention, but the academically weaker students somewhat more
so. The results also tentatively suggest that textbooks at home and self-learning may be
particularly beneficial for students in schools with weaker teachers.
VI. Conclusion
In this paper, we have investigated whether a simple incentives-only scheme designed to
encourage students to regularly take home textbooks for self-study can increase student
learning, as well as their study and career aspirations. Using a randomized field experiment
implemented in South Kivu in DRC, we find evidence that a “textbooks for self-study”
scheme led to significant test-score gains in French, though not in math, as well as gains
in the end-of-primary-school national exam. It also led more students to successfully take
the exam, which is a prerequisite for continuing to secondary education. Furthermore, the
scheme led students to perceive textbooks as more useful and to express more ambitious
study and career
aspirations. Lastly, while the average impact of the scheme was strong and robust to various
specifications, such as the inclusion of inverse probability weights to adjust for a high level
of attrition, the scheme also seems to have had a somewhat greater impact on the learning
outcomes of the relatively weaker students and those in classrooms with relatively lower-
performing teachers. Though tentative given the restricted size of the available sample, these
results lend support to the notion that such a “textbooks for self-study” incentive scheme
can compensate for weaker teacher quality and can especially help students who are
comparatively lagging behind.
Our paper expands the set of strategies available to donors and developing-country
governments to increase student learning outcomes and effectively address the serious
“learning crisis” (World Bank 2018). Our findings suggest that student-incentive-only
programs can produce a measurable impact on students’ achievement and future aspirations.
They are consistent with a set of recent papers that have emphasized the complementarity
between incentives to stimulate effort and accountability and school inputs such as
textbooks (e.g. Gilligan et al. 2018; Mbiti et al. 2019), arguing that the combination of the
two is of great importance in generating significant learning gains. The fact that the
intervention relies mostly on non-monetary incentives is attractive, as this renders the
scheme more feasible also in severely resource-constrained and fragile environments.
Furthermore, in focusing on student incentives, it intentionally circumvents many of the
deep structural problems – notably the weak capacity of primary school teachers – that
characterize the education sector in DRC as in many other fragile states.
Even at the fairly small scale at which the “textbook for self-study” scheme was
implemented, the intervention seems to be cost efficient. Based on the most conservative
estimated effect of 0.26σ, we estimate that US$100 yields a 1.53σ improvement in test
scores. If implemented at a larger scale, some fixed costs could be reduced, and the low
number of lost books suggests that the compensation to schools could be lowered.
Raising student learning in rural DRC and other similar fragile settings to levels
reflecting the ambitions and goals set in the official curriculum and global targets
necessitates substantial improvements in most aspects relevant for knowledge production.
This includes household-level inputs at early age, teacher quality, school management and
accountable and coordinated education policies (Brandt 2017). Such an effort requires
substantial resources, both financial and human, and depends strongly on the ability to
generate stability and inclusive growth. To start identifying cost-efficient interventions that
largely rely on clearer incentives and more efficient use of existing resources can be a first
step, though, towards making a difference in the lives of students.
For scaling up, more experimentation can help the scheme to reach its full potential
and design the most cost-efficient implementation. Training teachers in the use of textbooks
both in the classroom and for self-study could potentially increase the impact of the routine.
This could include using textbooks from different grades as a diagnostic tool to teach
students at their individual level, thus avoiding some of the problems associated with
overambitious curricula (e.g. Beatty and Pritchett 2012). Finally, limited resources
prevented us from properly analyzing the relative roles of financial versus non-financial
incentives for students and schools. For cost effectiveness, understanding what primarily
drives change in behavior, and how to secure compliance through regular monitoring or
other means, becomes essential when scaling up this intervention.
Our study raises intriguing questions for future research. What are other approaches
(peer-to-peer learning, for example) that sidestep the challenge of poor teacher quality and
weak parental support, and how can they be leveraged to improve learning in primary
schools? How might the impact of a textbook intervention differ if one considered a younger
cohort? How might a change in incentives impact outcomes? Can our main findings be
replicated in other settings? We defer these questions to further study.
References
Ashraf, Nava, Oriana Bandiera, and Kelsey B. Jack. 2014. “No Margin, No Mission?
A Field Experiment on Incentives for Public Service Delivery.” Journal of Public
Economics 120: 1–17.
Banerjee, Abhijit, Shawn Cole, Esther Duflo, and Leigh Linden. 2007. “Remedying
Education: Evidence from Two Randomized Experiments in India.” Quarterly Journal
of Economics 122 (3): 1235–64.
Beatty, Amanda, and Lant Pritchett. 2012. “The Negative Consequences of
Overambitious Curricula in Developing Countries.” Harvard Kennedy School of
Government, Scholarly Articles 9403174.
Behrman, Jere R., Susan W. Parker, Petra E. Todd, and Kenneth I. Wolpin. 2015.
“Aligning Learning Incentives of Students and Teachers: Results from a Social
Experiment in Mexican High Schools.” Journal of Political Economy 123 (2): 325–64.
Beland, Louis-Philippe, and Richard Murphy. 2016. “Ill Communication: Technology,
Distraction & Student Performance.” Labour Economics 41: 61–76.
Blimpo, M. P., and D.K. Evans. 2011. “School-Based Management and Educational
Outcomes: Lessons from a Randomized Field Experiment.” Unpublished.
Bold, Tessa, Deon Filmer, Gayle Martin, Ezequiel Molina, Brian Stacy, Christophe
Rockmore, and Jakob Svensson. 2017. “Enrollment without Learning: Teacher
Effort, Knowledge, and Skill in Primary Schools in Africa.” Journal of Economic
Perspectives 31: 185–204.
Bold, Tessa, Mwangi Kimenyi, Germano Mwabu, Alice Ng’ang’a, and Justin
Sandefur. 2018. “Experimental Evidence on Scaling Up Education Reforms in
Kenya.” Journal of Public Economics 168: 1–20.
Brandt, Cyril Owen. 2016. Review of Teacher Remuneration in the DRC. Consultancy
report for the ACCELERE! project. Cambridge: Cambridge Education.
Brandt, Cyril Owen. 2017. “Ambivalent Outcomes of State building: Multiplication of
Brokers and Educational Expansion in the DR Congo (2004–2013).” Review of African
Political Economy 44 (154): 624–42.
Brandt, Cyril Owen. 2018. “Illegibility as a State Effect: The Limits of Governing
Teacher Identification in the Democratic Republic of Congo.” PhD thesis, unpublished,
University of Amsterdam.
Burde, Dana, and Leigh L. Linden. 2013. “Bringing Education to Afghan Girls: A
Randomized Controlled Trial of Village-Based Schools.” American Economic Journal:
Applied Economics 5 (3): 27–40.
Burde, Dana, Amy Kapit, Rachel L. Wahl, Ozen Guven, and Margot Igland
Skarpeteig. 2017. “Education in Emergencies: A Review of Theory and Research.”
Review of Educational Research 87 (3): 619–58.
Chimombo, Moira. 1989. “Readability of Subject Texts: Implications for ESL Teaching
in Africa.” English for Specific Purposes 8 (3): 255–64.
Crossley, Thomas, Tobias Schmidt, Panagiota Tzamourani, and Joachim K. Winter.
2017. “Interviewer Effects and the Measurement of Financial Literacy.” University of
Essex ISER Working Paper Series 2017-06.
De Herdt, Tom, and Kristof Titeca. 2016. “Governance with Empty Pockets: The
Education Sector in the Democratic Republic of Congo.” Development and Change 47
de Jong, John H.A.L. 2015. “Why Learning English Differs from Learning Math and
Science.” Pearson blog, October 20, www.english.com/blog/learning-english-differs-
Dobbie, Will, and Roland G. Fryer Jr. 2013. “Getting Beneath the Veil of Effective
Schools: Evidence from New York City.” American Economic Journal: Applied
Economics 5 (4): 28–60.
Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2011. “Peer Effects, Teacher
Incentives, and the Impact of Tracking: Evidence from a Randomized Evaluation in
Kenya.” The American Economic Review 101: 1739–74.
Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2015. “School Governance,
Teacher Incentives, and Pupil–Teacher Ratios: Experimental Evidence from Kenyan
Primary Schools.” Journal of Public Economics 123: 92–110.
Fredriksen, Birger, Sukhdeep Brar, and Michael Trucano. 2015. “Getting Textbooks
to Every Child in Sub-Saharan Africa: Strategies for Addressing the High Cost and
Low Availability Problem.” Washington, DC: World Bank.
Fryer, Roland G., Jr. 2011. “Financial Incentives and Student Achievement: Evidence
from Randomized Trials.” Quarterly Journal of Economics 126 (4): 1755–98.
Gilligan, Daniel O., Naureen Karachiwalla, Ibrahim Kasirye, Adrienne M. Lucas,
and Derek Neal. 2018. “Educator Incentives and Educational Triage in Rural Primary
Schools.” National Bureau of Economic Research Working Paper 24911.
Glewwe, Paul, and Karthik Muralidharan. 2016. “Improving Education Outcomes in
Developing Countries: Evidence, Knowledge Gaps, and Policy Implications,” in
Handbook of the Economics of Education, Volume 5, edited by Eric A. Hanushek,
Stephen Machin, and Ludger Woessmann, 653–743. Amsterdam: Elsevier.
Glewwe, Paul, Michael Kremer, and Sylvie Moulin. 2009. “Many Children Left
Behind? Textbooks and Test Scores in Kenya.” American Economic Journal: Applied
Economics 1 (1): 112–35.
Glewwe, Paul, Michael Kremer, Sylvie Moulin, and Eric Zitzewitz. 2004.
“Retrospective vs. Prospective Analyses of School Inputs: The Case of Flip Charts in
Kenya.” Journal of Development Economics 74 (1): 251–68.
Hanushek, Eric A. 1979. “Conceptual and Empirical Issues in the Estimation of
Educational Production Functions.” Journal of Human Resources 14 (3): 351–88.
IMF (International Monetary Fund). 2015. “Democratic Republic of Congo: Selected
Issues.” IMF Country Report 15/281.
Jacob, Brian A., and Steven D. Levitt. 2003. “Rotten Apples: An Investigation of the
Prevalence and Predictors of Teacher Cheating.” Quarterly Journal of Economics 118
Kivu Security Tracker. 2018. “Monthly Report: June 2018.” Accessed June 25, 2018.
Kremer, Michael, Conner Brannen, and Rachel Glennerster. 2013. “The Challenge of
Education and Learning in the Developing World.” Science 340: 297–300.
Lee, Jong-Wha, and Robert J. Barro. 2001. “Schooling Quality in a Cross-Section of
Countries.” Economica 68: 465–88.
Marcotte, Dave E. 2007. “Schooling and Test Scores: A Mother-Natural Experiment.”
Economics of Education Review 26: 629–40.
Marivoet, Wim, and Tom De Herdt. 2015. “Poverty Lines as Context Deflators: A
Method to Account for Regional Diversity with Application to the Democratic
Republic of Congo.” Review of Income and Wealth 61 (2): 329–52.
Martinelli, César, Susan W. Parker, Ana Cristina Pérez-Gea, and Rodimiro Rodrigo.
2018. “Cheating and Incentives: Learning from a Policy Experiment.” American
Economic Journal: Economic Policy 10 (1): 298–325.
Mbiti, Isaac, Karthik Muralidharan, Mauricio Romero, Youdi Schipper,
Constantine Manda, and Rakesh Rajani. 2019. “Inputs, Incentives, and
Complementarities in Education: Experimental Evidence from Tanzania.” National
Bureau of Economic Research Working Paper 24876.
Milligan, Lizzi O., John Clegg, and Leon Tikly. 2016. “Exploring the Potential for
Language Supportive Learning in English Medium Instruction: A Rwandan Case
Study.” Comparative Education 52 (3): 328–42.
Milligan, Lizzi O., Leon Tikly, Timothy Williams, Jean-Marie Vianney, and
Alphonse Uworwabayeho. 2017. “Textbook Availability and Use in Rwandan Basic
Education: A Mixed-Methods Study.” International Journal of Educational
Development 54: 1–7.
Muralidharan, Karthik, Abhijeet Singh, and Alejandro J. Ganimian. 2019.
“Disrupting Education? Experimental Evidence on Technology-Aided Instruction in
India.” American Economic Review 109 (4): 1426–60.
OECD (Organisation for Economic Co-operation and Development). 2015. The ABC
of Gender Equality in Education: Aptitude, Behaviour, Confidence. Paris: OECD
Publishing.
Orkin, Kate. 2013. “The Effect of Lengthening the School Day on Children’s
Achievement in Ethiopia.” Young Lives Working Paper 119.
Panorama Education. 2015. User Guide: Panorama Teacher Survey. Boston: Panorama
Education.
PASEC. 2015. PASEC2014. Education System Performance in Francophone Sub-
Saharan Africa: Competencies and Learning Factors in Primary Education. Dakar:
PASEC.
Read, Tony. 2015. Where Have All the Textbooks Gone? Toward Sustainable Provision
of Teaching and Learning Materials in Sub-Saharan Africa. Washington, DC: World
Bank.
Sabarwal, Shwetlena, David K. Evans, and Anastasia Marshak. 2014. “The Permanent
Input Hypothesis: The Case of Textbooks and (No) Student Learning in Sierra Leone.”
World Bank Policy Research Working Paper WPS 7021.
Tan, Jee-Peng, Julia Lane, and Gerard Lassibille. 1999. “Student Outcomes in
Philippine Elementary Schools: An Evaluation of Four Experiments.” World Bank
Economic Review 13 (3): 493–508.
Todd, Petra, and Kenneth I. Wolpin. 2003. “On the Specification and Estimation of the
Production Function for Cognitive Achievement.” The Economic Journal 113: 3–33.
UNESCO. 2014. Education for All Global Monitoring Report. Paris: UNESCO.
West, Brady T., and Annelies G. Blom. 2017. “Explaining Interviewer Effects: A
Research Synthesis.” Journal of Survey Statistics and Methodology 5: 175–211.
Wooldridge, Jeffrey M. 2007. “Inverse Probability Weighted Estimation for General
Missing Data Problems.” Journal of Econometrics 141: 1281–301.
World Bank. 2015. Public Expenditure Review of the Education Sector in the Democratic
Republic of Congo: An Efficiency, Effectiveness and Equity Analysis. Washington, DC:
World Bank.
World Bank. 2017. “Implementation Completion and Results Report (TF-14253, TF-
14358).” Report No ICR00004233. Washington, DC: World Bank.
World Bank. 2018. World Development Report 2018: Learning to Realize Education’s
Promise. Washington, DC: World Bank.
Figures and Tables
Figure 1: ITT Effect on Student Achievement
for Different Model Specifications
Figure 2: ITT Effects by Specific Competence Assessed (Model Is Cluster + Controls + IPW)
TABLE 1--SAMPLE STATISTICS, BALANCE ON OBSERVABLES, AND ATTRITION
(1) (2) (3) (4)
Age (years) 11.860
Literate mother 0.644
Literate father 0.839
Hours of (non-school) work per
Help with homework at home 0.273
Minutes spent on homework per
Frequency of eating breakfast a
Concern about violence at school
Organized study time at school after school hours
Teacher–student ratio 0.004
Size of grade 5 class 59.451
Overall share of vulnerable students
Overall ratio of female over male students
Share of revenues from RBF 3.127
Number of classrooms with brick walls
Regular meetings with parents 0.647
Overall teacher and team performance
Overall quality and continuity of teaching programs
Quality of school leadership 3.765
School climate 3.765
Teacher’s years of experience 13.338
French (/16) 3.015
Math (/12) 1.970
Notes: α, β, π, and γ are the coefficients indicated in Equation (1). Breakfast: Respondents
indicate on a four-point scale how frequently they have breakfast, with 0 indicating “almost
never,” 1 “once per week,” 2 “at least 2–3 days a week,” and 3 “every morning.” Concern
about violence: Respondents indicate on a five-point scale how frequently they are concerned
about violence at school, with 1 indicating “almost never,” 2 “once in a while,” 3 “sometimes,”
4 “frequently,” and 5 “almost all the time.”
TABLE 2--INTENT-TO-TREAT EFFECTS IN AN OLS REGRESSION FRAMEWORK
Dependent variable: Standardized test scores (end-line)
(1) (2) (3) (4) (5) (6)
French French French Math Math Math
Treatment 0.274** 0.270** 0.258** 0.103 0.106 0.093
(0.125) (0.124) (0.125) (0.099) (0.100) (0.098)
Baseline test score 0.148*** 0.147*** 0.154*** 0.236*** 0.229*** 0.234***
(0.047) (0.047) (0.047) (0.043) (0.043) (0.043)
Pupil controls No Yes Yes No Yes Yes
Enum. FE Yes Yes Yes Yes Yes Yes
IPW No No Yes No No Yes
Observations 1498 1498 1498 1498 1498 1498
R-squared 0.255 0.261 0.267 0.212 0.221 0.226
Notes: Robust standard errors clustered by school in parentheses. Treatment is a dummy variable
indicating whether student was attending a treatment school. Tests in French and math were
designed to cover wide ranges of achievement and to be linked between baseline and end-line
assessments using common items. Scores were standardized to have a mean of 0 and standard
deviation of 1 in the baseline. ***, **, and * signify statistical significance at 1, 5 and 10 percent
level, respectively.
TABLE 3--EFFECTS ON STUDENT PERFORMANCE IN THE NATIONAL END-OF-6TH-GRADE EXAM
Dependent variable: TENAFEP z-score
Re-identified students only All students in the school
(1) (2) (3) (4)
Intervention (ITT) 0.274** 0.270** 0.214 0.230*
(0.137) (0.133) (0.159) (0.133)
Control baseline test
(French & math)
Yes Yes N/A N/A
Pupil controls (age,
breakfast, hours of work)
No Yes N/A N/A
Gender control No Yes No Yes
Yes Yes Yes Yes
School controls (pass rate
y-1, mean score y-1, PBF
No Yes No Yes
District control Yes Yes Yes Yes
School clustered SE Yes Yes Yes Yes
Observations 806 806 2575 2500
R-squared 0.202 0.230 0.072 0.132
Notes: Standard errors in parentheses. a For ease of interpretation we use a linear probability
model. District controls are included to account for the fact that TENAFEP scores are weighted
at district level (giving a bonus to students from Shabunda for its remoteness). ***, **, and *
signify statistical significance at 1, 5 and 10 percent level, respectively.
TABLE 4--EFFECTS ON NATIONAL END-OF-6TH-GRADE EXAM (TENAFEP) OUTCOMES AT
THE SCHOOL LEVEL
Dependent variable (OLS)
(1) (2) (3) (4)
Sit rate Sit rate Pass rate Pass rate
Intervention 0.094* 0.090* −0.001 −0.004
(0.046) (0.046) (0.022) (0.022)
Control baseline value Yes Yes Yes Yes
School-level control (1) No Yes No Yes
District control Yes Yes Yes Yes
Observations 83 83 83
Adjusted R-squared 0.032 0.024 0.084 0.077
Notes: Standard errors in parentheses. + No end-line TENAFEP data were available for seven
schools (five in the treatment group, two in the control group). See Table 3 for an explanation of
the inclusion of a district control. ***, **, and * signify statistical significance at 1, 5 and 10
percent level, respectively.
TABLE 5 -- HABITS, ATTITUDES, AND ASPIRATIONS
Dependent variable (OLS)
(1) (2) (3) (4) (5) (6)
useful for learning
textbooks in class
non-manual jobs a Wishes to go to
Motivation to go
Intervention -0.204* 0.226* 0.048 0.118* 0.034 0.230**
(0.119) (0.118) (0.133) (0.068) (0.023) (0.102)
Mean baseline value 1.397 3.830 0.454 0.910 3.910
(0.041) (0.068) (0.009) (0.013) (0.075)
Bonferroni p-value 0.336 0.298 0.720 0.336 0.336 0.157
Control baseline value Yes Yes Yes No b
Pupil controls (4) Yes Yes Yes Yes Yes Yes
Enumerator fixed effects Yes Yes Yes Yes Yes Yes
IPW Yes Yes Yes Yes Yes Yes
Observations 1497 1389 1391 1497 1498 1497
Adjusted R-squared 0.103 0.143 0.132 0.381 0.044 0.113
Notes: Standard errors are clustered by school. a A non-manual job is defined as a job belonging
to one of the following categories: artist, lawyer, politician, faith minister, and civil servant,
including teachers but excluding military and police. b We do not have a baseline for pupils’
professional aspirations, but given that randomization largely created balanced treatment and
control groups, this should not be a concern. ***, **, and * signify statistical significance at 1, 5
and 10 percent level, respectively.
Appendix A: Complementary Tables and Results
TABLE A1--BALANCE ON SCHOOL-LEVEL CHARACTERISTICS
Teacher supervision during after-school
Size of grade 5 class
Overall share of vulnerable students
Overall ratio of female over male students
Share of revenues from RBF
Number of classrooms with brick walls
Regular meetings with parents
Shabunda district 1.000 0.000
Overall teacher and team performance2 26.517
Overall quality and continuity of teaching2
Quality of school leadership3 3.834
School climate3 3.804
Teacher-level characteristics
Years of work experience of teacher3
Teacher’s score on own math test
Teacher’s score on own French test
Notes: 1. Using a simple OLS model where the outcome is regressed on the treatment dummy |
2. As evaluated through the RBF system (independent of this study) | 3. Based on a survey with
the principal; composite indicator (means of 5-item Likert scales from 1 “terrible” to 5 “excellent”)
TABLE A2--COMPLIANCE WITH THE INTERVENTION
Books returned in good state 0.754 0.678
Registers are used [core] 0.667 0.622
Stars system is used [core] 0.663 0.63
Pictures displayed in the classroom [core] 0.8 0.778
Proportion of book withdrawals 0.75 0.726
Project documents are filed [core] 0.778 0.8
Stars obtained 0.681 0.629
Weekly test 0.64 0.53
Observations 45 45
Note: Standard deviations in parentheses.
TABLE A3--HETEROGENEITY IN TREATMENT EFFECT BY STUDENTS’ GENDER, SOCIOECONOMIC
STATUS, AGE, AND BASELINE TEST SCORE
Dependent variable: Standardized test scores (end-line)
(1) (2) (3) (4) (5) (6) (7) (8)
Covariates Vulnerable Female Best half of the
French Math French Math French Math French Math
* 0.105 0.252* 0.100
* 0.170 0.209 0.064
(0.120) (0.095) (0.150) (0.117) (0.144) (0.106) (0.132) (0.103)
0.143 0.086 0.009 0.156** 0.221*
** 0.035 0.092
(0.183) (0.150) (0.091) (0.077) (0.119) (0.075) (0.108) (0.113)
-0.313 -0.218 -0.000 -0.034
0.324** -0.191* 0.174 0.075
(0.228) (0.204) (0.116) (0.102) (0.148) (0.108) (0.121) (0.125)
yes yes yes yes yes yes yes yes
Observations 1498 1498 1498 1498 1498 1498 1498 1498
R-squared 0.268 0.230 0.266 0.229 0.272 0.231 0.267 0.229
Notes: Robust standard errors clustered by school in parentheses. All regressions include enumerator
fixed effects, the four controls mentioned earlier, and inverse probability weighting to account for
attrition. We also control for the dimensions of heterogeneity examined in the table (female,
vulnerable, older pupils, and/or best half of students) that are not the covariate of interest. There is
no correlation between being a weaker student and being vulnerable; vulnerable students are more
likely to be male and younger. ***, **, and * signify statistical significance at 1, 5 and 10 percent
level, respectively.
TABLE A4--HETEROGENEITY IN TREATMENT EFFECT BY TEACHER CHARACTERISTICS
Dependent variable: Standardized test scores (end-line)
(1) (2) (3) (4) (5) (6) (7) (8)
Covariates Teaching efficacy Teaching
French Math French Math French Math French Math
Treatment 0.378** 0.189 0.454* -0.190 0.422* 0.195 0.397 0.288
(0.165) (0.137) (0.231) (0.189) (0.245) (0.175) (0.273) (0.204)
Covariate 0.070 0.020 0.010 -0.208 0.460 0.251 0.015 0.182
(0.245) (0.191) (0.267) (0.208) (0.325) (0.226) (0.296) (0.204)
Interaction -0.615* -0.527* -0.351 0.519* -0.348 -0.058 -0.300 -0.303
(0.314) (0.276) (0.367) (0.297) (0.426) (0.296) (0.455) (0.334)
Observations 1498 1498 1467 1467 1305 1305 1305 1305
R-squared 0.287 0.243 0.275 0.237 0.273 0.245 0.267 0.244
Note: The covariates are all binary variables and take the value of 1 for an above-the-median score.
Robust standard errors clustered by school in parentheses. All regressions include enumerator fixed
effects, the four controls mentioned earlier, and inverse probability weighting to account for
attrition. ***, **, and * signify statistical significance at 1, 5 and 10 percent level, respectively.
Table A5: Heterogeneity between RBF and non-RBF schools in 2015
Non-RBF RBF t-test difference
Number of qualified teachers 8.241 9.667 1.322**
Teachers at lowest level of qualification 0.0296 0.0556 0.026
Teachers per pupil 0.0290 0.0246 -0.004**
Support staff per pupil 0.00579 0.00459 -0.001
Girls to boys (RBF indicator) 0.908 1.007 0.099***
Classrooms 8.305 9.167 0.870
Classrooms in a good state 0.0936 0.0333 -0.082
Seats 352.5 360.8 -0.462
Seats per pupil 1.201 0.877 -0.323***
Female head of school 0.0690 0.0556 -0.022
‘Payé and Mécanisée’ (RBF selection criteria) 0.773 0.900 0.127**
TENAFEP fail rate 0.273 0.238 -0.027
TENAFEP mean score 59.30 58.61 -1.149
TENAFEP mean standard deviation 3.661 3.504 -0.063
N 201 90
Note: Standard deviations in parentheses. Significance levels: * p<0.05, ** p<0.01, *** p<0.001.
For TENAFEP data we only have 158 schools in total, of which 51 are RBF.
FIGURE A1--DISTRIBUTION OF TEST SCORES AT BASELINE AND END-LINE
Kolmogorov–Smirnov test (equality of distributions): 0.0406 (p=0.571)
Kolmogorov–Smirnov test (equality of distributions): 0.0664 (p=0.075)
Kolmogorov–Smirnov test (equality of distributions): 0.0567 (p=0.182)
Kolmogorov–Smirnov test (equality of distributions): 0.0530 (p=0.245)