Research Evaluation: When you measure a system, you change the system

Talk by Giorgio Sirilli at the workshop "Recent Developments in Evaluation Systems of Research", Naples, 13 May 2015

  2. ROARS. Start: 2011. Members of the editorial board: 14. Collaborators: 250. Contacts: 10.6 million (November 2011 – May 2015). Average daily contacts: 500 (November 2011); 8,000 (2014). Articles published: 2,000. Comments by readers: 30,000. ROARS is ranked 8th among the top national cultural blogs. ROARS, a genuine expression of democracy and participation, has become a very important player in the policy debate and in policy making.
  3. Evaluation. Evaluation may be defined as an objective process aimed at the critical analysis of the relevance, efficiency, and effectiveness of policies, programmes, projects, institutions, groups and individual researchers in the pursuance of the stated objectives. Evaluation consists of a set of coordinated activities of comparative nature, based on formalised methods and techniques through codified procedures aimed at formulating an assessment of intentional interventions with reference to their implementation and to their effectiveness. Internal/external evaluation.
  4. The first evaluation (Genesis). In the beginning God created the heaven and the earth. And God saw everything that He had made. “Behold”, God said, “it is very good”. And the evening and morning were the sixth day. And on the seventh day God rested from all His work. His Archangel came then unto Him asking, “God, how do you know that what You have created is ‘very good’? What are Your criteria? On what data do You base Your judgement? Aren’t You a little close to the situation to make a fair and unbiased evaluation?” God thought about these questions all that day and His rest was greatly disturbed. On the eighth day, God said, “Lucifer, go to hell!” (From Halcom’s “The Real Story of Paradise Lost”)
  5. A brief history of evaluation. Research Assessment Exercise (RAE). Research Excellence Framework (REF) (impact). “The REF will over time doubtless become more sophisticated and burdensome. In short we are creating a Frankenstein monster” (Ben Martin). Italy, a latecomer. Evaluation in Italy: yes or no? Yes, but … good evaluation.
  6. What do we evaluate?
  7. The value of science. William Gladstone, then British Chancellor of the Exchequer (minister of finance), asked Michael Faraday about the practical value of electricity. Gladstone’s only comment was: “But, after all, what use is it?” Faraday replied: “Why, sir, there is every probability that you will soon be able to tax it.”
  8. The case of physicists: Bruno Maksimovič Pontekorvo
  9. The case of physicists. “Physics is a single discipline, but unfortunately physicists today are divided into two categories: theoreticians and experimentalists. If a theoretician does not possess extraordinary ability, his work makes no sense … In experimental work, by contrast, even a person of average ability can do useful work.” (Enrico Fermi, 1931)
  10. The case of graphene. Graphene is an allotrope of carbon in the form of a two-dimensional, atomic-scale, hexagonal lattice. Graphene has many extraordinary properties: it is about 100 times stronger than steel by weight, conducts heat and electricity with great efficiency, and is nearly transparent. It was first measurably produced and isolated in the lab in 2003. Andre Geim and Konstantin Novoselov at the University of Manchester won the Nobel Prize in Physics in 2010 "for groundbreaking experiments regarding graphene." The global market for graphene is reported to have reached $9 million by 2014, with most sales in the semiconductor, electronics, battery energy and composites industries.
  11. The case of graphene. The famous paper by Andre Geim and Konstantin Novoselov was published in 2004, and by 2007 it was indeed famous and widely cited. The point is whether the committee would have selected his project and awarded him an ERC Starting Grant in 2004. Judging by his citation and publication record in 2004, it is very unlikely that he would have been considered among the top 10%.
  12. The case of graphene (2004)
  13. The knowledge bundle
  14. The knowledge institutions. University: teaching, research, “third mission”. Research agencies: research, problem solving, management.
  15. The neo-conservative wave of the 1980s
  16. The new catchwords: new public management, value for money, accountability, relevance, excellence.
  17. The neo-conservative wave in Italy. Letizia Moratti, Italian minister of education and research: “First show that you use public money efficiently and effectively, then we will loosen the purse strings.” Never happened!
  18. A firm’s management model based on the principles of competitiveness and customer satisfaction (the market). The catchwords: competitiveness, excellence, meritocracy. The “evaluative state” as the “minimum state”, in which the government gives up its political responsibility, avoids the democratic debate in search of consensus, and relies on the “automatic pilot” of techno-administrative control. (Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università)
  19. Contro l’ideologia della valutazione. L’ANVUR e l’arte della rottamazione dell’università. “ANVUR is much more than an administrative branch. It is the outcome of a cultural and political project aimed at reducing the range of alternatives and hampering pluralism.” Sergio Benedetto
  20. Changes in university life. The university is now at the mercy of: increasing bibliometric measurement; quality standards; blind refereeing (someone sees you but you do not see him); bibliometric medians; journal classifications (A, B, C, …); opportunistic citing; academic tourism; administrative burden; …
  21. The epistemic consequences of bibliometrics-based evaluation. Interviews with Italian researchers (40-65 years old). Main results: a drastic change in researchers’ attitudes due to the introduction of bibliometrics-based evaluation; bibliometrics-based evaluation has an extremely strong normative function on scientific practices, which deeply affects the epistemic status of the disciplines. (T. Castellani, E. Pontecorvo, A. Valente, Epistemological consequences of bibliometrics: Insights from the scientific community, Social Epistemology Review and Reply Collective vol. 3 no. 11, 2014)
  22. The epistemic consequences of bibliometrics-based evaluation. Results: 1. Bibliometrics-based evaluation criteria changed the way in which scientists choose their research topic: choosing a fashionable theme; placing the article in the wake of an important discovery (bandwagon effect); choosing short empirical papers. 2. The hurry. 3. Interdisciplinary topics are hindered; bibliometric evaluation systems encourage researchers not to change topic during their career. 4. Repetition of experiments is discouraged; only new results are considered interesting. (T. Castellani, E. Pontecorvo, A. Valente, Epistemological consequences of bibliometrics: Insights from the scientific community, Social Epistemology Review and Reply Collective vol. 3 no. 11, 2014)
  23. Excellence: CNR Statute 2011, CNR Statute 2015
  24. Research evaluation. Indicators used: bibliometrics; R&D; peer review; students; graduates; patents; spin-offs; contracts and other funding; other.
  25. Some indicators: number of publications, number of citations, impact factor, h-index.
  26. Use of publications for decision making. The case of China (SCI). The case of Russia.
  27. The h-index (Jorge Eduardo Hirsch). In 2005, the physicist Jorge Hirsch suggested a new index to measure the broad impact of an individual scientist’s work, the h-index. A scientist has index h if h of his or her Np papers have at least h citations each and the other (Np − h) papers have ≤ h citations each. In plain terms, a researcher has an h-index of 20 if he or she has published 20 articles receiving at least 20 citations each.
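The definition on the h-index slide can be sketched in a few lines of Python. This is a minimal illustration, not Hirsch's own code: the function name `h_index` and the citation counts in the example are hypothetical and do not come from the slides.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break  # every later paper has even fewer citations
    return h


# Hypothetical citation counts for one researcher (illustrative only).
example = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example))  # prints 3
```

Note that a single paper with 1,000 citations still gives h = 1, which is part of why the index can look low for authors of a few landmark papers.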
  28. Impact factor (Eugene Garfield). The impact factor (IF) of an academic journal is a measure reflecting the average number of citations to recent articles published in that journal. It is frequently used as a proxy for the relative importance of a journal within its field. In any given year, the impact factor of a journal is the average number of citations received per paper published in that journal during the two preceding years. For example, if a journal has an impact factor of 3 in 2008, then its papers published in 2006 and 2007 received 3 citations each on average in 2008. ("Citable items" for this calculation are usually articles, reviews, proceedings, or notes; not editorials or letters to the editor.)
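The two-year calculation described on the impact factor slide can be written out as a short Python sketch. The function name `impact_factor` and the journal figures below are assumptions chosen to reproduce the slide's "impact factor of 3 in 2008" example; they are not real journal data.

```python
def impact_factor(citations_in_year, citable_items):
    """Citations received in year Y to items from years Y-1 and Y-2,
    divided by the number of citable items published in Y-1 and Y-2."""
    if citable_items <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_in_year / citable_items


# Hypothetical journal: 100 citable items in 2006 and 100 in 2007,
# which together received 600 citations during 2008.
items_2006_2007 = 100 + 100
citations_2008 = 600
print(impact_factor(citations_2008, items_2006_2007))  # prints 3.0
```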
  29. Nobel laureates and bibliometrics (Boson in 2013). Peter Ware Higgs: 13 works, mostly in “minor” journals, h-index = 6. François Englert: 89 works, both in prestigious and minor journals, h-index = 10. W. S. Boyle: h-index = 7. G. E. Smith: h-index = 5. C. K. Kao: h-index = 1. T. Maskawa: h-index = 1. Y. Nambu: h-index = 17.
  30. Science and ideology: the impact on citations. [Chart: number of citations (NRCITES) to Marx and Lenin by citation year, 1980-2006, with the fall of the Berlin Wall (November 1989) marked.]
  31. San Francisco Declaration on Research Assessment. The Journal Impact Factor, as calculated by Thomson Reuters, was originally created as a tool to help librarians identify journals to purchase, not as a measure of the scientific quality of research in an article. With that in mind, it is critical to understand that the Journal Impact Factor has a number of well-documented deficiencies as a tool for research assessment. These limitations include: A) citation distributions within journals are highly skewed; B) the properties of the Journal Impact Factor are field-specific: it is a composite of multiple, highly diverse article types, including primary research papers and reviews; C) Journal Impact Factors can be manipulated (or “gamed”) by editorial policy; and D) data used to calculate the Journal Impact Factors are neither transparent nor openly available to the public.
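Limitation A (skewed citation distributions) can be illustrated with a small Python sketch. The per-paper citation counts are invented for illustration, chosen so that the journal-level average is 3; they are not taken from any real journal.

```python
from statistics import mean, median

# Hypothetical citations received by ten papers in one journal (illustrative only).
citations_per_paper = [0, 0, 0, 1, 1, 1, 2, 2, 3, 20]

mean_citations = mean(citations_per_paper)      # what a journal-level average reports
median_citations = median(citations_per_paper)  # what a typical paper receives
papers_below_mean = sum(c < mean_citations for c in citations_per_paper)

print(mean_citations, median_citations, papers_below_mean)
```

In this made-up sample a single highly cited paper pulls the average up to 3, while the median paper has 1 citation and 8 of the 10 papers sit below the average. A journal-level average therefore says little about any individual article.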
  32. San Francisco Declaration on Research Assessment. General Recommendation: Do not use journal-based metrics, such as Journal Impact Factors, as a surrogate measure of the quality of individual research articles, to assess an individual scientist’s contributions, or in hiring, promotion, or funding decisions.
  33. The Leiden manifesto on bibliometrics
  34. The Leiden Manifesto. Bibliometrics: The Leiden Manifesto for research metrics. “Data are increasingly used to govern science. Research evaluations that were once bespoke and performed by peers are now routine and reliant on metrics. The problem is that evaluation is now led by the data rather than by judgement. Metrics have proliferated: usually well intentioned, not always well informed, often ill applied. We risk damaging the system with the very tools designed to improve it, as evaluation is increasingly implemented by organizations without knowledge of, or advice on, good practice and interpretation.”
  35. The Leiden Manifesto – Ten principles. 1) Quantitative evaluation should support qualitative, expert assessment. 2) Measure performance against the research missions of the institution, group or researcher. 3) Protect excellence in locally relevant research. 4) Keep data collection and analytical processes open, transparent and simple. 5) Allow those evaluated to verify data and analysis.
  36. The Leiden Manifesto – Ten principles. 6) Account for variation by field in publication and citation practices. 7) Base assessment of individual researchers on a qualitative judgment of their portfolio. 8) Avoid misplaced concreteness and false precision. 9) Recognize the systemic effects of assessment and indicators. 10) Scrutinize indicators regularly and update them.
  37. Ranking universities and research agencies: CNR, Fraunhofer, CNRS
  38. Ranking universities and research agencies. Evaluating: difficult and even dangerous …
  39. Ranking of universities. Four major sources of ranking: ARWU Shanghai (Shanghai Jiao Tong University); QS World University Ranking; THE University Ranking (Times Higher Education); U.S. News & World Report (Best Global Universities).
  40. Criteria selected as the key pillars of what makes a world-class university: research; teaching; employability; internationalisation; facilities; social responsibility; innovation; arts & culture; inclusiveness; specialist criteria. (Source: TopUniversities, worldwide university rankings, guides & events)
  41. Global rankings cover less than 3-5% of the world’s universities. [Chart: performance against number of universities, showing the top 20, the top 500, the next 500, and the other 16,500 universities.]
  42. Ranking of universities: the case of Italy. ARWU Shanghai (Shanghai Jiao Tong University); QS World University Ranking; THE University Ranking (Times Higher Education); U.S. News & World Report (Best Global Universities). ARWU Shanghai: Bologna 173, Milano 186, Padova 188, Pisa 190, Sapienza 191. QS World University Ranking: Bologna 182, Sapienza 202, Politecnico Milano 229. World University Ranking SA: Sapienza 95, Bologna 99, Pisa 184, Milano 193. U.S. News & World Report: Sapienza 139, Bologna 146, Padova 146, Milano 155.
  43. The rank-ism (De Nicolao)
  44. The rank-ism (De Nicolao). The vice-rector of the University of Pavia declared: “There are various rankings in the world: in each of them the University of Pavia ranks in the first 1%.” But it is not true. According to three agencies, Pavia is in the following positions: 371 (QS World University Rankings); 251-275 (Times Higher Education); 401-500 (Shanghai Ranking, ARWU).
  45. Evaluation is an expensive exercise. Rule of thumb: less than 1% of the R&D budget devoted to its evaluation. Evaluation of the Quality of Research (VQR): 300 million euro (ROARS); 182 million euro (Geuna). Research Assessment Exercise (RAE): 540 million euro. Research Excellence Framework (REF): 1 million pounds (500 million).
  46. Evaluation is an expensive exercise. National Scientific Habilitation: 126 million euro. Cost per application: 2,300 euro. Cost per job assigned: 32,000 euro.
  47. Cost of evaluation: the saturation effect. Source: Geuna and Martin
  48. Cost of evaluation: a systematic loss. Source: Geuna and Martin
  49. Evaluation of the Quality of Research by ANVUR. Researchers’ products to be evaluated: journal articles; books and book chapters; patents; designs, exhibitions, software, manufactured items, prototypes, etc. University teachers: 3 “products” over the period 2004-2010. Public research agency researchers: 6 “products” over the period 2004-2010. Scores: from 1 (excellent) to -1 (missing).
  50. Evaluation of the Quality of Research by ANVUR. Indicators linked to research: quality (0.5); ability to attract resources (0.1); mobility (0.1); internationalisation (0.1); high-level education (0.1); own resources (0.05); improvement (0.05). Slide callout: “Attention basically here!”
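To see how much the quality indicator dominates, the weights on this slide can be combined in a small Python sketch. The slide lists only the weights; treating them as coefficients of a weighted sum is an assumption made here for illustration, and the dictionary keys, the function name `composite_score`, and the indicator values are all hypothetical.

```python
# Weights as listed on the slide (decimal commas converted to points).
VQR_WEIGHTS = {
    "quality": 0.5,
    "ability_to_attract_resources": 0.1,
    "mobility": 0.1,
    "internationalisation": 0.1,
    "high_level_education": 0.1,
    "own_resources": 0.05,
    "improvement": 0.05,
}


def composite_score(indicator_values, weights=VQR_WEIGHTS):
    """Weighted sum of indicator values (assumed aggregation, see lead-in)."""
    missing = set(weights) - set(indicator_values)
    if missing:
        raise KeyError(f"missing indicators: {sorted(missing)}")
    return sum(weights[name] * indicator_values[name] for name in weights)


# Hypothetical institution: strong on quality, weak on everything else.
values = {name: 0.2 for name in VQR_WEIGHTS}
values["quality"] = 0.8
score = composite_score(values)
quality_share = VQR_WEIGHTS["quality"] * values["quality"] / score
print(round(score, 3), round(quality_share, 3))
```

With these invented values, the quality indicator alone accounts for about four fifths of the composite score, even though it is one of seven indicators.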
  51. Evaluation of the Quality of Research by ANVUR. Indicators of the “third mission”: fund raising (0.2); patents (0.1); spin-offs (0.1); incubators (0.1); consortia (0.1); archaeological sites (0.1); museums (0.1); other activities (0.2).
  52. Call for Papers for Philosophy and Technology’s special issue: Toward a Philosophy of Impact. There was a time when serendipity played a central role in knowledge policy. Scientific advancement was viewed as essential for social progress, but this was paired with the assumption that it was generally impossible to steer research directly toward desired outcomes. Attempts to guide the course of research or predict its societal impacts were seen as impeding the advancement of science and thus of social welfare. Driven in part by budgetary constraints, and in part by ideology, the age of serendipity is being eclipsed by the age of accountability. Society increasingly requires academics to give an account of the value of their research. The ‘audit culture’ now permeates the university from STEM (science, technology, engineering, and math) through HASS (humanities, arts, and social sciences). Academics are being asked to consider not just how their work influences their disciplines, but also other disciplines and society more generally.
  53. A warning. “Science today is riven with perverse incentives: Researchers judge one another not by the quality of their science – who has time to read all that? – but by the pedigree of their journal publications. High-profile journals pursue flashy results, many of which won’t pan out on further scrutiny. Universities reward researchers on those publication records. Financing agencies, reliant on peer review, direct their grant money back toward those same winners. Graduate students, dependent on their advisers and neglected by their universities, receive minimal, ad hoc training on proper experimental design, believing the system of rewards is how it always has been and how it always will be.” Paul Voosen, “Amid a Sea of False Findings, the NIH Tries Reform”, The Chronicle of Higher Education (March 16, 2015)
  54. Lessons from Research Evaluation. Evaluation in Italy is here to stay. The system has been measured and has changed. Awareness of the limitations of metrics. The challenge: prevent evaluation from becoming a Frankenstein monster. Main problems: league tables; competition vs cooperation among scientists; peer review vs bibliometrics; NSE vs SSH; opportunistic behaviour; the split of the academic community (the good and the bad guys); the balance among teaching, research and third mission; bureaucratisation; the use of evaluation for policy purposes.
  55. Research Evaluation. Thank you for your attention.