
What are metrics good for? Reflections on REF and TEF


Public lecture organised by Council for Defence of British Universities
Royal Holloway, University of London
Tuesday 20th November 2018


  1. What are metrics good for? Reflections on the Research Excellence Framework (REF) and Teaching Excellence Framework (TEF). Dorothy Bishop, Wellcome Trust Principal Research Fellow, Department of Experimental Psychology, University of Oxford.
  2. The Council for the Defence of British Universities, started in 2012: 'Universities in the UK are under threat. A series of reforms has made universities more like businesses, subject to market forces. We believe that these changes to the way universities operate and are governed pose a risk to a university's central function, which is to gather knowledge, free from interference, and to educate people in the skills they need to think critically and independently.'
  3. Structure of talk: the history and nature of REF; the history and nature of TEF; with a focus for both on: • What are we measuring? • Why are we measuring it? Concluding with suggestions that we: • Ditch league tables; foster diversity of institutions • Radically rethink and simplify REF • Get rid of TEF • Give academics time and security to do the job well.
  4. Two sources of public money for funding research: the 'dual support' funding model, in place since 1965. Government money reaches universities as (1) a block grant, used for staff salaries, long-term research investments and infrastructure, distributed by the University Grants Committee, a 'buffer' between government and universities; and (2) research council funding, covering equipment and salaries for specific projects, evaluated by peer review.
  5. 1981: huge cuts in recurrent funding. • Thatcher government: concerns about the efficiency and accountability of the sector • Budget cut to the Department for Education and Science • The University Grants Committee had to decide between 'equal misery' and 'selectivity', and went for selectivity. The elite old universities were largely protected; Salford, Aston and Bradford took bad hits, and some departments closed.
  6. Research Assessment Exercise (RAE): the question of how to allocate a shrinking research budget fairly and transparently became the top priority. 1989: the 'selectivity exercise' becomes the 'Research Assessment Exercise', with an increase in the proportion of the block grant awarded by rankings. But then a new challenge, the Further and Higher Education Act 1992: • Former polytechnics awarded university status • The number of universities goes up from 46 to 84.
  7. Massive funding shift over time. (Chart comparing block grant and research council funding.)
  8. 1994 onwards: the RAE process is refined, eventually becoming the Research Excellence Framework (REF). • Each university submits information about research activity in given areas over the period since the last assessment, including: outputs (mostly publications); information on the research environment; and research impact (since 2014). • The information is assessed by discipline-based panels of academics in terms of 'originality, significance and rigour', on a star scale: 4* world-leading; 3* internationally excellent, but falls short of the highest standards of excellence; 2* recognised internationally; 1* recognised nationally.
  9. Advantages of RAE/REF: • The system is transparent • Fairness: rising stars rise quicker; the system has been adapted to allow for career breaks, etc.; and it provides a mechanism for getting rid of 'dead wood'. (The system is admired in many European countries where appointments and promotions are determined by nepotism or political considerations.)
  10. (The same slide, adding a cartoon of Professor Farnsworth from Futurama: 'I teach mathematics of quantum neutrino fields. I made up the title so no student would dare take it.')
  11. A further claimed advantage: REF has led to better research. 'Significant improvement was found in the quality of research since the last assessment exercise. On average across all submissions, 22% of outputs were judged world-leading (4*), up from 14% in the previous exercise in 2008. A further 50% were judged internationally excellent (3*), up from 37%.' Try Googling 'replication crisis'.
  12. Disadvantage: the cost in time and money to tell us what we already know. 'The main criticism of RAEs concerns their cost-effectiveness. The HEFCs estimated that the 1992 RAE cost them £4 million. The cost of each RAE to the universities, especially in the time of academics and administrators, is enormously greater. After the four RAEs that have already taken place, the changes in ranking that now occur from Exercise to Exercise are generally small in magnitude and in number. In other words, huge effort and cost are being invested to discover less and less information.' Colin Blakemore (1998).
  13. REF 2014: 191,150 outputs assessed by 898 panel members, i.e. 213 outputs per panel member (not allowing for double marking). Some outputs are books.
  14. Could it be simplified? There is much debate over metrics vs. peer review. • Metrics: easy to compute, but superficial; they differ by discipline and sub-discipline; and they can be gamed. • Peer review: more acceptable to academics, but it is impossible for panels to do the kind of peer review that normally operates with grants and publications; there is far too much to evaluate in too short a time frame; and panels may not have adequate expertise.
  15. (The same slide, adding:) Though note: REF is designed to reduce reliance on quantity over quality of outputs, and has vetoed use of the journal impact factor.
  16. Disadvantage: arbitrariness. Tweaking the funding formula can have big effects. This is reminiscent of John Kay's comment on the United Nations index of human development, which uses a formula that is a weighted sum of indices of longevity, educational standards and gross domestic product, putting Iceland at the top and Sierra Leone at the bottom: 'Most people would agree that Iceland scores higher for human development than Sierra Leone. But if the index didn't give us that ranking we would change the index, not our view of Iceland or Sierra Leone.' John Kay (2010), Obliquity: Why Our Goals Are Best Achieved Indirectly. Profile Books, London.
  17. Disadvantage: arbitrariness, continued. Any formula that did not put Oxford, London and Cambridge at the top would be unacceptable, and the weighting given to 4*, 3* etc. will affect the size of their advantage. There are also cumulative effects over time: if income in round N of REF affects the ability to do research for round N+1, the formula can engineer whether the discrepancy between top and bottom gets larger. There has been no discussion of whether concentration of funding in a small elite is the optimal approach.
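The point about weighting can be made concrete with a toy calculation. This sketch (the institutional profiles below are invented for illustration, not real REF data) computes each institution's share of a funding pot under different 4*/3* weightings; changing the weights changes the size of the gap between a high-4* and a mid-4* institution without any change in the underlying research.

```python
def funding_share(profiles, w4, w3):
    """Weighted research volume per institution, normalised to shares of the pot."""
    weighted = {name: staff * (w4 * p4 + w3 * p3)
                for name, (staff, p4, p3) in profiles.items()}
    total = sum(weighted.values())
    return {name: v / total for name, v in weighted.items()}

# (staff FTE, proportion of outputs rated 4*, proportion rated 3*) -- invented numbers
profiles = {
    "Elite U": (100, 0.40, 0.45),
    "Mid U":   (100, 0.20, 0.50),
}

for w4, w3 in [(4, 1), (9, 1), (1, 1)]:
    shares = funding_share(profiles, w4, w3)
    print(f"4* weight {w4}, 3* weight {w3}:",
          {k: round(v, 3) for k, v in shares.items()})
```

With the 4:1 weighting "Elite U" takes about 61% of the pot; raising the 4* weight to 9 widens its advantage, and equal weights narrow it. The ranking never changes here, only the size of the gap, which is exactly the kind of formula tweak the slide describes.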
  18. Disadvantages, continued: negative impact on research culture. • Hiring decisions made on the basis of 'refability' • Emphasis on grants and 'high impact' publications: speed rather than careful scholarship.
  19. This is counterproductive because: • The amount of funding needed to do research is not a proxy for the value of that research • Some activities are intrinsically more expensive, so why disfavour research areas that cost less? • Many great discoveries involve slow, careful work over years: how many Nobel laureates would not be deemed 'refable'? (Pictured: Peter Higgs, Albert Einstein, Donna Strickland, Daniel Kahneman.)
  20. Furthermore: • The desperate scramble for research funds leads to researchers being overcommitted, which leads to poorly conducted studies • A ridiculous amount of waste results from the 'academic backlog'.
  21. Time for a rethink: what are we trying to do with REF? • It has acquired a major role as a status indicator via the 'grade point average' (GPA): the mean REF star rating per staff member entered.
  22. REF as status indicator: the GPA score, i.e. the mean star score per FTE staff member entered. The GPA score does not determine funding but is used to compile league tables.
  23. Time for a rethink: what are we trying to do with REF? • The original goal was a means of distributing funds transparently, and it is still used for this purpose: the REF 'power' score is based on the number of staff entered into REF, weighted by the 4* and 3* ratings of outputs, environment and impact, with a 4*:3* weighting ratio of 4:1. Funds are then calculated in relation to types of discipline (medicine/technical/other).
  24. REF as a basis for distributing funds: the power score, i.e. the number of FTE staff entered, weighted by 4* and 3* ratings. QR funding (total: £1.58 billion) is determined in relation to the power score. (Chart of power scores, led by UCL, Oxford, Cambridge, Edinburgh, Manchester, Nottingham, KCL and Imperial.)
  25. Suppose we return to the original goal: a means of distributing funds transparently. • We could dispense with the review of quality and obtain a very similar funding outcome by allocating funding at institutional level in proportion to the number of research-active staff, i.e. the 'power' measure. • This would need a defence against gaming: e.g. a requirement of a minimal standard for 'research active', and restriction of submissions to those employed at the institution for at least 3 years.
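The 'very similar funding outcome' claim can be sanity-checked with a toy comparison. In this sketch the staff numbers and star profiles are invented; only the £1.58 billion QR total comes from the slides. It contrasts a QR-style allocation driven by the quality-weighted power score (4*:3* weighted 4:1) with a simple allocation proportional to research-active headcount.

```python
def allocate(pot, volumes):
    """Split a funding pot in proportion to each institution's volume measure."""
    total = sum(volumes.values())
    return {k: pot * v / total for k, v in volumes.items()}

staff = {"A": 900, "B": 600, "C": 300}  # research-active FTE (invented)
quality = {"A": (0.35, 0.50), "B": (0.30, 0.50), "C": (0.25, 0.50)}  # (4*, 3*) proportions (invented)

# Power score: staff volume weighted by star profile at the 4:1 ratio
power = {k: staff[k] * (4 * q4 + 1 * q3) for k, (q4, q3) in quality.items()}

pot = 1580  # £ million: the QR total quoted on the slides
by_power = allocate(pot, power)
by_headcount = allocate(pot, staff)

for k in staff:
    print(f"{k}: power-based £{by_power[k]:.0f}m, headcount-based £{by_headcount[k]:.0f}m")
```

With broadly similar quality profiles the two allocations land close together (institution A gets about £850m by power score versus £790m by pure headcount), which is the sense in which dropping the quality review could leave the funding outcome largely intact.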
  26. What would we lose? The role of REF in performance management, which can influence research culture. James Wilsdon describes REF as a 'driver of a serious shift towards embracing and valuing impact through the system. As the HEFCE consultation confirmed, the next cycle is likely to be used to nudge change through the system in other areas, through its requirements over open access and (perhaps) the use of unique identifiers like ORCID numbers. Given the reach of the REF into all corners of university research, it's arguably the most efficient and effective way of introducing changes of this kind.'
  27. My plea: we need to take a long hard look at the costs and benefits of REF and determine what we want it to achieve. Mission creep over the years means it has morphed from a simple system for allocating funding fairly into a monstrous and complex exercise, with each new iteration adding features designed to counteract the disadvantages of the previous version.
  28. This section of the talk is a shortened version of a lecture given at the University of Southampton.
  29. Teaching Excellence Framework (TEF): introduced by Jo Johnson in 2015 as part of the Higher Education and Research Bill, with various justifications given in a Green Paper.
  30. Justification: teaching is neglected relative to research. 'There are weak incentives on Higher Education Providers to increase the standard of teaching in the higher education sector.' 'Because many universities see their reputation, their standing in prestigious international league tables and their marginal funding as being principally determined by scholarly output, teaching has regrettably been allowed to become something of a poor cousin to research in parts of our system.' My solution: get rid of the distorting incentive structure provided by REF. Jo Johnson's solution: more league tables, this time for teaching.
  31. Claim 1: teaching is 'patchy' and 'lamentable'. Jo Johnson, Minister for Universities and Science, 9th September 2015: 'I hear this when I talk to worried parents, such as the physics teacher whose son dropped out at the start of year two of a humanities programme at a prestigious London university, having barely set eyes on his tutor. Her other son, by contrast, studying engineering at Bristol, saw the system at its best: he was worked off his feet, with plenty of support and mostly excellent teaching. This patchiness in the student experience within and between institutions cannot continue. There is extraordinary teaching that deserves greater recognition. And there is lamentable teaching that must be driven out of our system.'
  32. Is teaching 'patchy' and 'lamentable'? Challenged by the Select Committee to give evidence of 'lamentable' teaching, Jo Johnson offered: 'In the NSS 2015 survey, two thirds of providers are performing well below their peers on at least one aspect of the student experience.' (Chart: distribution of responses to item 22, 'Overall I am satisfied with the quality of the course'.)
  33. Claim 2: students are dissatisfied. Introduction: the transparency challenge, point 15: 'Students are also concerned about value for money, with one third of undergraduates paying higher fees in England believing their course represents very poor or poor value for money.' Cites this report.
  34. Are students dissatisfied? 'Perceptions of value for money have diverged as a result of the increase in the full-time undergraduate fee cap to £9,000 in 2012 for students from England … Only 7% of students from England on the higher fees feel they receive "very good" value for money; the figure for students from Scotland who remain there to study is five times higher (35%).' (Chart comparing England and Scotland.)
  35. Are students dissatisfied? Some problems were noted, e.g.: 'One third (33%) of students describe the information they received before starting their course as "accurate", which has been a consistent finding over the last three years.'
  36. But information for students is available through Unistats.
  37. Proposed TEF metrics. Key: NSS = National Student Survey; ILR = Individualised Learner Record (FE); DLHE = Destinations of Leavers from Higher Education survey. • Some changes have been made since the Green Paper • The metrics are to be interpreted in the light of qualitative information about 'context' to give a gold/silver/bronze ranking.
  38. Validity of proposed TEF metrics.
  39. The NSS scales. Teaching on my course: (1) staff are good at explaining things; (2) staff have made the subject interesting; (3) staff are enthusiastic about what they are teaching; (4) the course is intellectually stimulating. Assessment and feedback: (5) the criteria used in marking have been clear in advance; (6) assessment arrangements and marking have been fair; (7) feedback on my work has been prompt; (8) I have received detailed comments on my work; (9) feedback on my work has helped me clarify things I did not understand. Academic support: (10) I have received sufficient advice and support with my studies; (11) I have been able to contact staff when I needed to; (12) good advice was available when I needed to make study choices. Benchmark factors: subject of study, age on entry, ethnicity, sex, disability.
  40. Problems with NSS as a metric: 1. It doesn't measure teaching quality.
  41. Response by Royal Statistical Society.
  42. Response by Royal Statistical Society: 'Anecdotally, we have heard of institutions explicitly "dumbing down" programmes so as to result in higher NSS scores. A new TEF needs to recognise this and mitigate against it. One goal of higher education is to produce highly educated people of use to the society of the future and the NSS inadvertently encourages the opposite.'
  43. Problems with NSS as a metric: 1. It doesn't measure teaching quality. 2. It doesn't discriminate between institutions.
  44. Response by Royal Statistical Society: 'It is not clear that it is possible to discriminate the vast majority of HE institutions on the overall NSS satisfaction scores, let alone when they are broken down into smaller subgroups.' Cites the ONS report 'Teaching Excellence Framework: Review of Data Sources, Interim Report'. (Chart of rank-ordered HEIs; the line shows the mean score, and the vertical bar a confidence interval.)
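The point about overlapping confidence intervals can be illustrated with a small simulation. This is a sketch using invented score distributions, not real NSS data: fifty institutions with similar underlying satisfaction each get a sample of 300 responses, and we count how many have a 95% confidence interval that covers the overall mean, i.e. cannot be distinguished from the pack.

```python
import random
import statistics

random.seed(1)
institutions = []
for _ in range(50):
    true_mean = random.uniform(4.0, 4.3)  # similar 'true' satisfaction on a 1-5 scale
    sample = [min(5.0, max(1.0, random.gauss(true_mean, 0.8))) for _ in range(300)]
    m = statistics.mean(sample)
    half = 1.96 * statistics.stdev(sample) / len(sample) ** 0.5  # 95% CI half-width
    institutions.append((m, m - half, m + half))

grand = statistics.mean(m for m, _, _ in institutions)
indistinguishable = sum(lo <= grand <= hi for _, lo, hi in institutions)
print(f"{indistinguishable} of 50 institutions have a CI covering the grand mean")
```

When the true differences between institutions are small relative to sampling error, a large fraction of confidence intervals straddle the overall mean, which is the RSS's point: the overall satisfaction score cannot separate most providers.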
  45. Further problems with validity were noted by the chair of the TEF panel himself. Chris Husbands, citing US research: 'Student satisfaction seems to be driven by the physical attractiveness of academics rather than anything else.' He added that the TEF panel would 'not draw policy from a single data point' and that 'all data' are 'flawed' in some respect. However, he went on, the challenge was to recognise and understand the flaws, and to learn how the information could be used effectively.
  46. So we will have a set of incommensurate and invalid metrics, which will then be combined to give a 3-point scale, with potential for damage to the reputation of those awarded less than Gold.
  47. Cost-benefit analysis of TEF: costs to universities/HEIs are estimated at £53K per institution on average.
  48. Cost-benefit analysis of TEF: benefits. Table 14 gives benefits to HEIs in real terms. • The projections are crucially dependent on correctly predicting student numbers and the inflation rate • The projected benefit of TEF versus no TEF is purely down to the fact that fees would be fixed if there were no TEF.
  49. What about the costs to academic staff? REF demands: bring in substantial grant income; publish in 'high impact' journals; demonstrate impact beyond academia; manage a research group. TEF demands: be available for students at all times; provide detailed and prompt feedback; give interesting lectures with enthusiasm; ensure students go on to get good jobs. Can one person do all of these, as well as being a good 'academic citizen'?
  50. An alternative approach to restore the balance of teaching and research: scrap pointless metrics and take a hard look at casualisation.
  51. REF and TEF: the relevance of obliquity.
  52. Example from Chapter 5 of Obliquity (John Kay): three stonemasons working on a medieval cathedral are asked 'What are you doing?' A: 'I'm cutting this stone to shape.' B: 'I'm building a great cathedral.' C: 'I'm working for the glory of God.' Their answers correspond to basic actions, intermediate goals and high-level objectives. '… decision making cannot proceed by defining objectives, analyzing them into goals and subsequently breaking them down into actions. No priest or politician, counsellor or manager, has the capacity to do this – and those who claim to … have an immense capacity to damage the complex systems they attempt to plan. In that imperfectly understood world, high-level objectives are best achieved by constantly balancing their incompatible and incommensurable components – through obliquity.' (p. 44)
  53. Recommendations: • Ditch league tables and foster diversity of institutions: academics have sacrificed their higher-level objectives by accepting that research and teaching should be treated as a competition. Stop trying to design perfect, comprehensive evaluation systems; this is not realistic, and the real question is one of balancing the benefits and costs of an inevitably incomplete and imperfect system. • Radically rethink and simplify REF: get rid of 'quality evaluation', which wastes time and generates bad incentives, and consider allocating block funding at the level of the institution in relation to research volume. • Get rid of TEF: it is a solution to a non-problem; revert to a system that deals with the rare failures (the QAA) and direct students to Unistats for basic information. • Give academics time and security to do the job well: attempts to create 'incentive structures' are superfluous, since most academics are already motivated by higher-level objectives.
  54. Dorothy V. M. Bishop, Professor of Developmental Neuropsychology, University of Oxford. @deevybee @cdbuni. Most points covered here are discussed in more detail on my blog: Google 'bishopblog catalogue' to find the relevant posts.