1. RESEARCH ON SCHOOL
INSPECTIONS: WHAT DO WE
KNOW?
ResearchEd
9 September 2023
Dr Christian Bokhove
Southampton Education School
University of Southampton
3. General
• Not a political talk.
• Our position: accountability is necessary, but of course it
needs to be reliable – hence, research needed.
• However, the lack of research worries us.
• Furthermore, research should not be captured by
ideology: if you start with ‘we need to scrap them’, then
that seems like starting from a foregone conclusion. Likewise,
given the high stakes, we must be critical.
• Cue this project (and previous/associated work).
4. Previous research on Ofsted
• Very little data -> very little research.
• Hussain (2014): ~4-percentile test score gains following an
‘Inadequate’ judgement.
• Eyles & Machin (2017): +59% head turnover after academy
conversion.
• Ofsted (2017): 92% agreement in short inspections.
• Data scraping creates new opportunities.
• Started experimenting in 2014/2015.
6. Bokhove, C., & Sims, S. (2020). Demonstrating the potential of text mining for analyzing school inspection
reports: a sentiment analysis of 17,000 Ofsted documents. International Journal of Research and Method in
Education. https://doi.org/10.1080/1743727X.2020.1819228
Open Access
7. Boxplot showing the distribution of sentiment scores by inspection grade. N=3,155.
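The sentiment-analysis step behind this figure can be illustrated with a minimal lexicon-based sketch. The word lists and toy reports below are invented for illustration; the published study used its own text-mining pipeline rather than this exact code:

```python
# Illustrative sketch of lexicon-based sentiment scoring of inspection
# reports, grouped by grade. Word lists and "reports" are made up for
# illustration; the study's actual pipeline differed.
from statistics import mean

POSITIVE = {"good", "effective", "strong", "excellent", "improving"}
NEGATIVE = {"weak", "poor", "inadequate", "ineffective", "declining"}

def sentiment_score(text: str) -> float:
    """(positive hits - negative hits) / total words, in [-1, 1]."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

# Toy corpus: (overall grade, report snippet)
reports = [
    ("Outstanding", "teaching is excellent and leadership is strong"),
    ("Outstanding", "pupils make good progress in an effective school"),
    ("Inadequate", "teaching is weak and outcomes are poor"),
]

by_grade: dict[str, list[float]] = {}
for grade, text in reports:
    by_grade.setdefault(grade, []).append(sentiment_score(text))

for grade, scores in by_grade.items():
    print(grade, round(mean(scores), 3))
```

Grouping scores by grade in this way is what allows a boxplot of sentiment per judgement category, as in the figure.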
8. Bokhove, C., Jerrim, J., & Sims, S. (2023). How Useful are Ofsted Inspection judgements for Informing
Secondary School Choice?. Journal of School Choice, 17(1), 35-61.
https://doi.org/10.1080/15582159.2023.2169813
Open Access
10. This is John’s family…
Tom (eldest)
Loves football (and trains).
In reception year.
Joe (youngest)
Loves food (and trains).
Will enter reception in Sep 2024.
The most recent Ofsted report prior to
John’s school choice was published 15 years
before their eldest started attending,
and 20 years before the youngest will
leave!
11. This is Christian’s family…
Came to the UK in 2012.
Eldest needed a secondary
school, younger children a
primary school.
The primary school has infant and
junior sections.
Ofsted reports for Christian’s family were quite
current when the school choice was made.
But still: from choice to the last child leaving,
7 years for primary (and quite stable), 12 years
for secondary (and changes).
12. So, limitations of Ofsted judgements to
inform school choice…
1. Time-lags.
2. Siblings.
a) Almost certainly go to same school (particularly primary).
b) Choice is made on the basis of the eldest; the youngest then just follows.
c) Makes the issue of time-lags worse!
3. Ofsted mechanism for change
a) Don’t want RI / Inadequate schools to be predictive of future.
b) Want them to improve!
4. Comparability of information (framework changes)
5. Choices actually available to parents
a) May not have an “Outstanding” school near them!
b) Rural areas
6. Added value over other information.
13. School choice for a hypothetical family
- Two kids born 2 years apart
- Selecting secondary school for eldest in 2013
- Eldest will attend Sep 2014 – June 2019
- Youngest will attend Sep 2016 – June 2021
We ask:
(a) How long is the gap between report available at point of school
choice and when eldest child will start at the school?
(b) When does the headteacher in post at this inspection (used in
school choice) leave the school?
(c) To what extent does the judgement from this inspection (used
in the school choice of the parent) predict outcomes when the
eldest attends?
14. The lag between inspection and school
choice for the eldest child……….
Judgement at time of choice    Mean lag (days)
Outstanding                    1,546
Good                           965
RI                             736
Inadequate                     622
15. The “survival” of headteachers by Ofsted
rating….
Entry
By the time the eldest child
enters the school, half will
be led by a different
headteacher
….particularly in
“Inadequate” schools.
Exit
By the time the eldest child
leaves the school, around
80% of headteachers will
have moved on.
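The headteacher “survival” figures above can be tabulated with a simple discrete calculation. The tenures below are invented for illustration, not the study’s data:

```python
# Illustrative sketch of a headteacher "survival" tabulation: the share
# of schools still led by the inspection-time head at two milestones.
# All figures are invented for illustration, not the study's data.

# Per school: years the inspection-time head stayed on after the
# inspection that informed school choice.
tenure_after_inspection = [0.5, 1, 1, 1, 1, 2, 3, 5, 8, 10]

def still_in_post(years_elapsed: float) -> float:
    """Share of heads whose tenure exceeds the elapsed time."""
    n = len(tenure_after_inspection)
    return sum(t > years_elapsed for t in tenure_after_inspection) / n

print(f"at entry (~1.5y after choice): {still_in_post(1.5):.0%} still in post")
print(f"at exit  (~6.5y after choice): {still_in_post(6.5):.0%} still in post")
```

With these made-up tenures, half of the heads have left by entry and 80% by exit, mirroring the pattern described on the slide.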
16. Large raw differences in GCSE
outcomes……….
Figures refer to average
GCSE percentile rank by
Ofsted inspection
judgement at the time of
school choice for our
hypothetical family
17. …that largely disappear
once we control for intake
Figures refer to average
GCSE percentile rank by
Ofsted inspection
judgement at the time of
school choice for our
hypothetical family….
…conditional upon
differences in schools’
intakes (e.g. KS2 scores,
%FSM, %SEN).
18. Similar results for other outcomes
Conditional upon intake, we find little association between Ofsted
judgement at the time of school choice and:
- GCSE outcomes
- Parental satisfaction with the school
- School leadership (including satisfaction with leadership)
- Absence rates
The exception to the rule:
- Recently made Outstanding judgements (within last 2 years) had some
association with future outcomes.
20. Bokhove, C., Jerrim, J., & Sims, S. (2023). Are some school inspectors more lenient than others?. School
Effectiveness and School Improvement, 1-23. https://doi.org/10.1080/09243453.2023.2240318
Open Access
21. Our research
• Motivation: the human aspect of inspection is a feature, not a
bug, but… the higher the stakes, the greater the level of
reliability required.
• This paper: do judgements of similar schools vary based
on inspector characteristics?
• HMI vs OI
• Lead inspector gender
• Experience
• Team size
• …
22. Data
• Inspections between Sep 2011 and Aug 2019 (old
framework)
• Primary and secondary schools
(focus on primary, larger N)
• Names of lead inspectors extracted from reports
(Watchsted / scraping)
• 1,376 different inspectors, 35,751 school inspections
25. Lead inspector gender
Male and female lead inspectors are not assigned to schools
differently, yet outcomes differ. The effect is very small, though,
and sits at the Inadequate / RI distinction (also in short inspections).
27. Current work
[Timeline: 2000, 2009, 2014, 2017 (short inspections)]
• Sequence effects
• Textmining reports
• Contributed to consultation
28. Overall conclusions
• Inspection reports not very useful for school choice.
• Differences in judgement, especially by contract type (HMI/OI);
see also the HMCI commentary.
However…..
• Limitations of course apply.
• Not a lot of research on Ofsted.
• To gain more insight we need
more data and transparency (e.g.
inspector assignment).
• Role of Ofsted: accountability and/or
improvement?
https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/596708/Reliability_study_-_final.pdf
The backbone of the presentation is two papers we published: one working paper (for more info, with links to news articles and blogs, see https://is.gd/nuffieldofsted) and one journal article (see https://doi.org/10.1080/15582159.2023.2169813). Both sources also cite a variety of other sources.
On inspection judgements we found:
- Evidence of a (modest) gender difference in primary inspection outcomes.
- Evidence of a difference in inspection outcomes between HMIs and OIs.
- Differences by inspection team size.
On the usefulness of Ofsted judgements for school choice we found, conditional upon intake, little association between Ofsted judgement at the time of school choice and GCSE outcomes, parental satisfaction with the school, school leadership (including satisfaction with leadership), and absence rates. The exception to the rule concerns recently made Outstanding judgements (within last 2 years), which had some association with future outcomes.
Figure 1 presents the distribution of Overall Effectiveness judgements awarded to schools when the inspection was led by an HMI (pink) compared to an OI (green).
There is a clear, sizeable difference for primary schools. HMIs are less likely than OIs to judge a school to be Good (47% versus 60%), and more likely to award a Requires Improvement or Inadequate grade.
The magnitude of the difference is smaller for secondary schools, though a similar pattern emerges. Most notably, HMIs are 1.5 times more likely to judge a school to be Inadequate than OIs (12% versus 8%) and less likely to rate a school as Good.
But are HMIs and OIs deployed to inspect schools with different characteristics?
Yes they are! And we can see this in our data.
Table 2 presents evidence on this matter for primary schools. HMIs disproportionately lead inspections of schools with lower levels of performance in national examinations and that were judged to be inadequate in their last inspection. In our academic paper, we also show how HMIs are more likely to conduct inspections of “exempt” schools (explaining the higher percentage of HMIs inspecting previously Outstanding schools).
Between 2012 and 2021, schools previously judged as Outstanding were exempt from routine inspection. They could however be inspected where complaints about the school were received or if Ofsted’s risk assessment process identified concerns.
So can this explain what we are seeing in Figure 1?
Only partially, as far as we can tell.
After controlling for differences across HMIs and OIs in the types of inspections they conduct, the difference between them is reduced. There is, however, still a sizeable gap that our controls for the background characteristics of schools (including school performance measures, inspection type, prior inspection judgement and absence rates) cannot explain.
This can be seen in Figure 2 – the gap between the pink (OI) and green (HMI) bars has reduced in comparison to Figure 1 but remains non-trivial and statistically significant at conventional levels.
Interestingly, in the analogous analysis for secondary schools, the only remaining point of difference between HMIs and OIs after adding these controls is in the tendency to award Good (43% versus 47%) compared to Inadequate (12% versus 8%) grades.
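The logic of conditioning on deployment differences can be illustrated with a stratified comparison on synthetic data. The paper used regression controls rather than this simple stratification, and every number below is made up:

```python
# Illustrative sketch: why raw HMI/OI gaps shrink once you condition on
# the kinds of schools each group inspects. Data are synthetic; the
# paper used regression controls rather than this stratification.
from statistics import mean

# (inspector_type, prior_attainment_band, judged_inadequate)
inspections = [
    # HMIs mostly sent to low-attainment schools
    ("HMI", "low", 1), ("HMI", "low", 1), ("HMI", "low", 1),
    ("HMI", "low", 0), ("HMI", "high", 0),
    # OIs mostly sent to high-attainment schools
    ("OI", "low", 1), ("OI", "low", 0),
    ("OI", "high", 0), ("OI", "high", 0), ("OI", "high", 0),
]

def inadequate_rate(rows):
    return mean(r[2] for r in rows)

hmi = [r for r in inspections if r[0] == "HMI"]
oi = [r for r in inspections if r[0] == "OI"]
raw_gap = inadequate_rate(hmi) - inadequate_rate(oi)

# Condition on attainment band: compare within strata, average the gaps.
gaps = []
for band in ("low", "high"):
    h = [r for r in hmi if r[1] == band]
    o = [r for r in oi if r[1] == band]
    gaps.append(inadequate_rate(h) - inadequate_rate(o))
adjusted_gap = mean(gaps)

print(f"raw gap: {raw_gap:.2f}, adjusted gap: {adjusted_gap:.2f}")
```

In this toy example the gap shrinks but does not vanish once deployment is held fixed, mirroring the qualitative pattern described for Figure 2.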
What is the take-home message?
Now, in our analysis we have only been able to control for a limited set of background factors. Ofsted may select HMIs and OIs to lead inspections in ways that we cannot observe – and thus are unable to control for in our analysis.
Our hunch is therefore that the differences reported in Figure 2 are probably still a bit too high. At the same time, we believe it unlikely that further differences in HMI and OI deployment would explain all the remaining gap.
Ofsted may wish to dig into this issue some more, particularly with respect to the inadequate grade, where the relative difference between HMIs and OIs is greatest.
Conclusions
It should be noted that the results presented in Table 1 represent quite an extreme example. We are comparing two hypothetical inspectors at either end of the distribution (the most “lenient” and the “harshest”).
And, as our previous blogs have shown, some characteristics of lead inspectors and their teams are more strongly associated with inspection outcomes than others. For the two hypothetical inspectors considered in Table 1, the HMI versus OI distinction will be responsible for a fair chunk of the difference observed.
Yet, in reality, schools will indeed receive inspections led by very different inspectors. Not only in terms of the characteristics we can observe in Table 1, but also many other potentially important unobservable characteristics as well (e.g. personality types, experiences of leading challenging schools).
We hence hope that the results presented in Table 1 are at least a useful thought experiment for readers, in terms of how much inspection outcomes may differ in the extreme across rather different lead inspectors.