RESEARCH EVALUATION: chasing
indicators of excellence and impact
Evidence Thomson Reuters
UHMLG Preparing for research assessment, Royal Society of Medicine
JONATHAN ADAMS, Director Research Evaluation
07 MARCH 2011
WE HAVE TO RESPOND TO GLOBAL
CHALLENGES
[Chart: annual volume of research papers compared to 1981 (indexed), 1981–2007, for China, the World, the EU and the USA]
2
RESEARCH ASSESSMENT PROVIDES US
WITH INFORMATION TO DO THAT
• Global challenges and dynamism
• Economic turbulence and threats to public resourcing in all
sectors
• Scarce resources
– must be distributed selectively
– in a manner that is equitable
– and maintains academic confidence
• But what are our criteria?
– What is research quality?
– What is excellence?
– What is impact?
3
WE CANNOT DIRECTLY ASSESS WHAT WE
WANT TO KNOW
4
Conventionally, this problem is addressed by expert and experienced peer
review.
Peer review is not without its problems.
Peer review of academic research tends to focus on academic impact, so
other forms of impact require merit review.
Expert review may be opaque to other stakeholders.
Objectivity is addressed by introducing quantitative indicators.
[Diagram: research quality sits inside the ‘research black box’ – it is what we want to know but cannot observe directly]
INDICATORS, NOT METRICS
It’s like taking bearings from your yacht
5
A single indicator is not enough.
Good combinations of indicators take distinctive bearings, or differing
perspectives across the research landscape.
They are unlikely to agree completely, which gives us an estimate of our
uncertainty.
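To make the ‘bearings’ idea concrete, here is a minimal sketch (not from the slides; the indicator names and values are invented) of how several already-normalised indicators for one unit of assessment might be combined, with their spread read as a rough estimate of uncertainty:

```python
# Minimal illustration (hypothetical indicators and values): several normalised
# indicators for the same unit of assessment, each a different "bearing" on
# performance. Their spread is one crude way to express uncertainty.
from statistics import mean, stdev

indicators = {
    "normalised_citation_impact": 1.4,    # hypothetical; world average = 1.0
    "income_per_researcher_index": 1.1,
    "peer_esteem_index": 1.7,
}

values = list(indicators.values())
print(f"central estimate: {mean(values):.2f}")
print(f"spread across bearings: ±{stdev(values):.2f}")
```

The point is not the particular statistic but that disagreement between well-chosen indicators is informative rather than a nuisance.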
PRINCIPLES OF QUANTITATIVE RESEARCH
EVALUATION
• First, note that there are no absolutes; it’s all relative
• Impact may be local, national or international; we need
benchmarks to make any sense of a number
• Are the proposed data relevant to the question?
• Can the available data address the question?
• What data do we have that we can use ... ?
6
RESEARCH PERFORMANCE INDICATORS
COME FROM THE RESEARCH PROCESS
7
[Diagram: INPUTS → research black box (research quality – what we want to know) → OUTPUTS]
WE CAN EXTEND THIS OVER THE WHOLE
CYCLE (activities are then not synchronous)
8
[Diagram: the model extended over the whole cycle – INPUTS → research black box (research quality – what we want to know) → OUTPUTS → OUTCOMES, with ideas (proposals, applications and partnerships) feeding back into the inputs]
WE HAVE A WIDE RANGE OF DATA AND
POTENTIAL INDICATORS
9
[Diagram: INPUTS, OUTPUTS and OUTCOMES arranged around the research black box (research quality is what we want to know; the surrounding data are what we have to use). Data points include: numbers of researchers, facilities and collaboration; ideas (proposals, applications and partnerships); research awards; journals and proceedings; reports and grey literature; citation and address links; citations; trained people; skilled employment; patents; licences and spin-outs; industrial contracts; deals and revenue; social policy change]
Note that all these data points are characterised by:
Location – where the activity took place;
Time – when the activity took place;
Discipline – the subject matter of the activity.
All these should be taken into account in evaluation.
PRINCIPLES OF QUANTITATIVE RESEARCH
EVALUATION
• Are the proposed data relevant to the question?
• Can the available data address the question?
• Are we comparing ‘like-with-like’?
• Can we test outcomes by using multiple indicators?
• Have we ‘normalised’ our data?
– Consider relative values, not absolute values
• Do we understand the characteristics of the data?
• Are there artefacts in the data that require editing?
• Do the results appear reasonable?
10
HOW CAN WE JUDGE POSSIBLE
INDICATORS?
• Relevant and appropriate
– Are indicators correlated with other performance estimates?
– Do indicators really distinguish ‘excellence’ as we see it?
– Are these the indicators the researchers would use?
• Cost effective
– Data accessibility, coverage, cost and validation
• Transparent, equitable and stable
– Are the characteristics and dynamics of the indicators clear?
– Are all institutions, staff and subjects treated equitably?
– How do people respond? Can they manipulate indicator outcomes?
• “Once an indicator is made a target for policy, it starts to lose the
information content that initially qualified it to play such a role”
(Goodhart’s Law)
11
COMMUNITY BEHAVIOUR HAS
RESPONDED TO EVALUATION
RAE1996
                          Science          Engineering      Social sciences   Humanities and arts
                          Outputs     %    Outputs     %    Outputs     %     Outputs     %
Books and chapters          5,013   5.8      2,405   8.1     16,185  35.1      22,635  44.4
Conference proceedings      2,657   3.1      9,117  30.8      3,202   6.9       2,133   4.2
Journal articles           77,037  89.8     16,951  57.3     22,575  49.0      15,135  29.7
Other                       1,104   1.3      1,122   3.8      4,154   9.0      11,128  21.8

RAE2001 (columns as above)
Books and chapters          1,953   2.5      1,438   5.4     12,972  28.6      25,217  46.5
Conference proceedings        751   0.9      3,944  14.9        857   1.9       1,619   3.0
Journal articles           76,182  95.8     20,657  78.1     29,449  65.0      17,074  31.5
Other                         618   0.8        408   1.5      2,008   4.4      10,345  19.1

RAE2008 (columns as above)
Books and chapters          1,048   1.2        216   1.2     12,632  19.0      21,579  47.6
Conference proceedings      2,164   2.5        326   1.8        614   0.9         897   2.0
Journal articles           80,203  93.8     17,451  95.4     50,163  75.5      14,543  32.1
Other                       2,125   2.5        301   1.6      3,018   4.5       8,287  18.3
WHY BIBLIOMETRICS ARE A POPULAR
SOURCE OF RESEARCH INDICATORS
• Publication is a universal characteristic of academic research
and provides a standard ‘currency’
• Citations are a natural part of academic behaviour
• Citation counts are associated with academic ‘impact’
– Impact is arguably a proxy for quality
• Data are accessible, affordable and increasingly international
– though there is subject imbalance
• Data characteristics are well understood and widely explored
– Citation counts grow over time
– Citation behaviour is a cultural characteristic, which varies
between fields
– Citation behaviour may vary between countries
13
CITATION COUNTS GROW OVER TIME
AND RATES VARY BETWEEN FIELDS
[Chart: citations per paper by publication year, 1981–2006, for Biology & Biochemistry, Physics and Engineering]
14
PAPERS ARE MORE LIKELY TO BE CITED
OVER TIME
[Chart: percentage of papers remaining uncited in 2008, by publication year 1981–2006, for Australia, Belgium and Chile]
15
RAW CITATION COUNTS MUST BE
ADJUSTED USING A BENCHMARK
• First, we need to separate articles and reviews
• Then ‘normalise’ the raw count by using a global reference
benchmark (see the sketch after this slide)
• Take year of publication into account
• Take field into account
• But how do we define field?
– Projects funded by a Research Council
– Departments which host a group of researchers
– Journal set linked by citations
– Granularity
• Physiology – Life science – Natural sciences
16
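As forward-referenced above, here is a minimal sketch of the normalisation step, assuming a global benchmark table of expected citations per paper by field and publication year (all values below are invented for illustration):

```python
# Minimal sketch of field- and year-normalised citation impact.
# Benchmark values are invented; in practice they come from a global reference
# set (world average citations per paper for that field and publication year,
# with articles and reviews counted separately).

# expected citations per paper: (field, publication year) -> world average
benchmark = {
    ("Physiology", 2004): 18.2,
    ("Physiology", 2006): 12.5,
    ("Engineering", 2004): 6.1,
}

papers = [
    {"field": "Physiology", "year": 2004, "citations": 25},
    {"field": "Physiology", "year": 2006, "citations": 10},
    {"field": "Engineering", "year": 2004, "citations": 9},
]

# Each paper's impact relative to its own field-year benchmark (1.0 = world average)
relative = [p["citations"] / benchmark[(p["field"], p["year"])] for p in papers]
normalised_impact = sum(relative) / len(relative)
print(f"normalised citation impact: {normalised_impact:.2f}")  # ~1.22
```

A value of 1.0 means citation impact at the world average for the same field and year; the chosen granularity of ‘field’ (physiology vs. life science vs. natural sciences) changes the benchmark and hence the result.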
NORMALISED CITATION IMPACT
CORRELATES WITH PEER REVIEW (Chemistry data)
[Scatter plot: RAE2001 mapped articles (RBI) against HEI five-year average RBI, 1996–2000, with institutions grouped by RAE grade (3a, 3b, 4, 5, 5*). Spearman r = 0.57, P < 0.001; ratio mapped/NSI = 1.93]
17
Methodology affects the detail but
not the sense of the outcome
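To illustrate how such a correlation could be checked, the sketch below rank-correlates invented institutional data (not the Chemistry dataset behind the slide): RAE2001 grades mapped to an ordinal scale against each institution’s five-year average normalised impact (RBI).

```python
# Illustrative only: hypothetical institutions, not the Chemistry data on the slide.
# RAE grades are mapped to an ordinal scale so they can be rank-correlated with
# each institution's five-year average normalised citation impact (RBI).
from scipy.stats import spearmanr

grade_rank = {"3b": 1, "3a": 2, "4": 3, "5": 4, "5*": 5}

institutions = [
    ("HEI-A", "3b", 0.6),   # (name, RAE2001 grade, 5-yr average RBI) – invented
    ("HEI-B", "3a", 0.8),
    ("HEI-C", "4", 1.1),
    ("HEI-D", "4", 0.9),
    ("HEI-E", "5", 1.3),
    ("HEI-F", "5*", 1.8),
]

grades = [grade_rank[g] for _, g, _ in institutions]
impacts = [rbi for _, _, rbi in institutions]

rho, p_value = spearmanr(grades, impacts)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```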
THIS IS MOSTLY ABOUT EXCELLENCE:
WHAT IS IMPACT?
• Research excellence might be termed ‘academic impact’
• Other forms of impact for which we legitimately may seek
evaluation are
– Economic impact
– Social impact
• Quantitative research evaluation traces its origins back to the
1980s
• The DTI spent much money in the 1990s failing to index
economic impact
• It is difficult to track many research innovations through to a new
product or process, or vice versa
– Links are many-to-many and time-delayed
• Social impact is difficult to define or capture
18
CHASING IMPACT
• Eugene Garfield originally talked about citation counts as an
index of ‘impact’, fifty years ago
• Current focus on economic and social impact should be seen
as a serious engagement with other modes of recognising
and demonstrating the value of original and applied research
• Of course
– The objectives are undefined, which undermines any evaluation
– It is easier to do this in some subjects than others
– Much of the current material is anecdotal
– It is difficult to validate without indicators
• But a start has been made
– The principles should follow those of research evaluation
– There must be ownership by the disciplinary communities
19
RESEARCH EVALUATION: chasing
indicators of excellence and impact
Evidence Thomson Reuters
UHMLG Preparing for research assessment, Royal Society of Medicine
JONATHAN ADAMS, Director Research Evaluation
07 MARCH 2011
