Bias against Novelty in Science:
A Cautionary Tale for Users of
Bibliometric Indicators
OECD Blue Sky Forum
September 19, 2016
Jian Wang (KU Leuven)
Reinhilde Veugelers (KU Leuven, Bruegel & CEPR)
Paula Stephan (Georgia State University & NBER)
In a nutshell
• Develop a bibliometric measure of combinatorial novelty.
• Study the impact profile of novel research:
o High risk: higher variance in citations.
o High gain: highly cited, and inspire follow-on highly-cited papers.
o Transdisciplinary impact: broader impact, highly cited in foreign but
not home fields.
o Delayed recognition: not highly cited in the short run.
o Published in low Impact Factor journals.
• Implication:
o Bias against novelty in standard bibliometric indicators.
o Appreciation of novel research comes from foreign fields.
Why do we care?
• Novel research → “High risk/high gain” → public support.
• Funding agencies are increasingly risk-averse.
o Roger Kornberg, Nobel Laureate, “If the work that you propose to
do isn’t virtually certain of success, then it won’t be funded.”
• Bibliometrics is increasingly used in funding decisions.
o Performance-based research funding systems.
• Research Question:
o What is the relationship between novelty and citation impact?
o Are there potential biases in standard bibliometric indicators against
novelty?
Conceptualizing novelty
The creation of any sort of novelty in art, science, or practical
life – consists to a substantial extent of a recombination of
conceptual and physical materials that were previously in
existence.
-- Nelson and Winter (1982)
• Combinatorial novelty: combining existing scientific
components in an unprecedented fashion.
o Economists (Schumpeter, 1939; Nelson & Winter, 1982); psychologists
(Mednick, 1962; Simonton, 2004); sociologists (Latour & Woolgar, 1986).
• Combinatorial novelty is just one dimension of novelty.
Measuring novelty
• For each paper, retrieve its co-cited journal pairs.
• Identify new pairs.
• Check how distant the combined journals are, by
comparing their co-cited journal profiles.
o Cosine similarity (COSi,j) between their journal co-citation profiles in
the preceding three years.
• $\mathit{Novelty} = \sum_{J_i\text{-}J_j \text{ pair is new}} \left(1 - COS_{i,j}\right)$
• To avoid trivial combinations:
o Exclude 50% least cited journals (in the preceding 3 years).
o Require new pairs to be reused in the next 3 years.
o Results robust when relaxing these constraints.
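The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary-based journal profiles, and the toy data are hypothetical, and the two filters against trivial combinations (dropping the least cited journals, requiring reuse) are left out.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse profiles given as {journal: count} dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def novelty_score(cited_journals, known_pairs, profiles):
    """Sum of (1 - cosine) over co-cited journal pairs that were never seen before.

    known_pairs holds alphabetically sorted (journal, journal) tuples from earlier years.
    """
    score = 0.0
    for a, b in combinations(sorted(set(cited_journals)), 2):
        if (a, b) in known_pairs:
            continue  # pair already co-cited in the prior literature
        score += 1.0 - cosine(profiles[a], profiles[b])
    return score


# Toy, made-up profiles (not taken from the slides).
profiles = {
    "A": {"X": 3, "Y": 4},
    "B": {"X": 3, "Y": 4},
    "C": {"Z": 5},
}
known_pairs = {("A", "B")}
score = novelty_score(["A", "B", "C"], known_pairs, profiles)
print(score)  # two new pairs (A-C, B-C), each with cosine 0
```

Here journals A and B have already been combined, so only the pairs involving C contribute to the score.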
Measuring novelty: An example
Denk & Horstmann (2004) Serial block-face scanning electron microscopy to
reconstruct three-dimensional tissue nanostructure. PLoS biology, 2(11), e329.
o cites 19 WoS-indexed journals, and 9 (out of 171) pairs are new.
• Subject categories of Nature Materials: Chemistry, Physical; Materials
Science, Multidisciplinary; Physics, Applied; and Physics, Condensed Matter.
• Subject categories of the other journals: Neurosciences; Cell Biology; and Physiology.
    Journal 1          Journal 2
1   NATURE MATERIALS   CURRENT OPINION IN NEUROBIOLOGY
2   NATURE MATERIALS   DEVELOPMENTAL DYNAMICS
3   NATURE MATERIALS   PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY OF LONDON SERIES B-BIOLOGICAL SCIENCES
4   NATURE MATERIALS   EUROPEAN JOURNAL OF NEUROSCIENCE
5   NATURE MATERIALS   JOURNAL OF HISTOTECHNOLOGY
6   NATURE MATERIALS   SCANNING
7   NATURE MATERIALS   BRAIN RESEARCH REVIEWS
8   NATURE MATERIALS   ANNUAL REVIEW OF BIOPHYSICS AND BIOENGINEERING
9   NATURE MATERIALS   PFLUGERS ARCHIV-EUROPEAN JOURNAL OF PHYSIOLOGY
Measuring novelty: An example
o How distant are NATURE MATERIALS and CURRENT OPINION
IN NEUROBIOLOGY?
o Journal co-citation matrix (2001-2003)
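o $COS_{1,2} = \dfrac{J_1 \cdot J_2}{\lVert J_1 \rVert \, \lVert J_2 \rVert} = \dfrac{331 \times 9691 + 110 \times 0 + 0 \times 9959 + \cdots}{\sqrt{0^2 + 331^2 + 110^2 + 0^2 + \cdots} \times \sqrt{0^2 + 9691^2 + 0^2 + 9959^2 + \cdots}} = 0.31$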
                         J1     J2     J3     J4     J5     …     JN
J1 NATURE MATERIALS      /      0      331    110    0      …     …
J2 CURRENT OPINION       0      /      9691   0      9959   …     …
J3 SCIENCE               331    9691   /      …      …      …     …
J4 NANO LETTERS          110    0      …      /      …      …     …
J5 J. OF NEUROSCIENCE    0      9959   …      …      /      …     …
…                        …      …      …      …      …      /     …
JN                       …      …      …      …      …      …     /
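As a rough check of the mechanics, the cosine can be computed from only the entries visible on the slide (columns J3 to J5). This is a sketch under that restriction: the full profiles run over all N journals, so the partial value below is not expected to match the reported 0.31. The variable names are hypothetical.

```python
from math import sqrt

# Visible co-citation counts with J3 (SCIENCE), J4 (NANO LETTERS), J5 (J. OF NEUROSCIENCE).
j1 = [331, 110, 0]      # NATURE MATERIALS
j2 = [9691, 0, 9959]    # CURRENT OPINION IN NEUROBIOLOGY

dot = sum(a * b for a, b in zip(j1, j2))
norm1 = sqrt(sum(a * a for a in j1))
norm2 = sqrt(sum(b * b for b in j2))
cos_partial = dot / (norm1 * norm2)

print(round(cos_partial, 2))  # partial value from three columns only
```

The truncated vectors give a higher similarity than the full ones because the shared link through SCIENCE dominates when the many journals that only one of the two is co-cited with are left out.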
Measuring novelty: An example
    Journal 1          Journal 2                              novelty
1   NATURE MATERIALS   CURRENT OPINION IN NEUROBIOLOGY        0.69
2   NATURE MATERIALS   DEVELOPMENTAL DYNAMICS                 0.72
3   NATURE MATERIALS   PHILOSOPHICAL TRANSACTIONS …           0.56
4   NATURE MATERIALS   EUROPEAN JOURNAL OF NEUROSCIENCE       0.74
5   NATURE MATERIALS   JOURNAL OF HISTOTECHNOLOGY             0.73
6   NATURE MATERIALS   SCANNING                               0.36
7   NATURE MATERIALS   BRAIN RESEARCH REVIEWS                 0.76
8   NATURE MATERIALS   ANNUAL REVIEW OF BIOPHYSICS …          0.50
9   NATURE MATERIALS   PFLUGERS …                             0.74
o Novelty of the paper = 5.79, top 1% highly novel in its subject
categories.
o This paper was NOT among the top 1% highly cited papers until
2012/2013.
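The arithmetic of this example can be retraced as follows. The sketch sums the nine pair-level values as printed on the slide; because those are rounded to two decimals, the total comes out at 5.80 rather than the reported 5.79, which presumably sums unrounded values.

```python
from math import comb, fsum

# Number of distinct journal pairs among the 19 cited journals.
n_pairs = comb(19, 2)

# Pair-level novelty (1 - cosine) for the nine new pairs, as printed on the slide.
pair_novelty = [0.69, 0.72, 0.56, 0.74, 0.73, 0.36, 0.76, 0.50, 0.74]
total = fsum(pair_novelty)

print(n_pairs)          # 171
print(round(total, 2))  # 5.8
```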
Measuring novelty
• Novelty scores are highly skewed.
• Categorical measure: NOV CAT:
1. non-novel, if a paper has no new journal combinations;
2. moderately novel, if a paper makes new combinations but has a
novelty score lower than the top 1% of its subject category;
3. highly novel, if a paper has a novelty score among the top 1%.
• 661,643 unique pubs, 1,038,238 obs. in 2001.
              % of all papers   avg # new pairs   median # new pairs   Avg(avg cos)   Avg(min cos)
Non-novel     89%               /                 /                    /              /
Moderately    10%               1.76              1.00                 0.22           0.19
Highly        1%                8.39              7.00                 0.13           0.06
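The three-way classification could be sketched as below. This is an illustration, not the authors' code: it assumes that a score of 0 means no new journal pair, that the top 1% is taken over all papers in the subject category, and that ties at the cut-off are all counted as highly novel. The function name and the toy data are hypothetical.

```python
from math import ceil


def nov_cat(scores, top_percent=1):
    """Classify papers of one subject category from {paper_id: novelty score}."""
    n_top = max(1, ceil(len(scores) * top_percent / 100))
    ranked = sorted(scores.values(), reverse=True)
    threshold = ranked[n_top - 1]  # lowest score still inside the top share
    labels = {}
    for pid, s in scores.items():
        if s <= 0:
            labels[pid] = "non-novel"          # no new journal combinations
        elif s >= threshold:
            labels[pid] = "highly novel"       # within the top share of the category
        else:
            labels[pid] = "moderately novel"   # new combinations, below the cut-off
    return labels


# Toy category of 200 papers (made-up scores).
scores = {f"p{i}": 0.0 for i in range(180)}
scores.update({f"p{i}": 0.5 for i in range(180, 198)})
scores.update({"p198": 5.0, "p199": 6.0})

labels = nov_cat(scores)
counts = {c: sum(1 for v in labels.values() if v == c) for c in set(labels.values())}
print(counts)
```

Checking for a zero score before applying the threshold keeps non-novel papers out of the top group even in a category with very few novel papers.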
Novelty and impact
• Data:
o 661,643 unique articles in 2001 in WoS.
o 1,038,238 obs.
o Papers with multiple subject categories are counted multiple times.
• Dependent variables:
o Various aspects of impact.
• Independent variable:
o Categorical novelty measure: NOV CAT
• Control:
o Number of references and authors, whether internationally
coauthored, subject category dummies.
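The gap between unique articles and observations follows from counting a paper once per subject category. A minimal sketch of that expansion, with made-up paper identifiers and category assignments:

```python
# Hypothetical papers with their subject categories.
papers = {
    "p1": ["Neurosciences", "Cell Biology"],
    "p2": ["Physics, Applied"],
}

# One observation per (paper, subject category) combination.
observations = [(pid, cat) for pid, cats in papers.items() for cat in cats]
print(len(papers), len(observations))  # 2 unique papers, 3 observations

# With the counts on the slide, a paper has on average about 1.57 subject categories.
avg_categories = 1_038_238 / 661_643
print(round(avg_categories, 2))
```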
High risk of novel research
15-year citations (GNB)
                    Mean        Dispersion
Moderately novel    0.032***    -0.001
Highly novel        0.146***    0.162***
…
Citation classes, 15y (Multi-logit)
                    top10% vs mid80%    low10% vs mid80%
Moderately novel    0.056***            -0.054**
Highly novel        0.162***            0.137**
…
*** p<.001, ** p<.01, * p<.05, + p<.10.
Control for international co-authorship, number of authors (ln), number of
references (ln), and scientific field fixed effects.
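The citation classes used as the outcome of the multi-logit model could be built along these lines. This is a sketch under assumptions the slides do not spell out: classes are assigned within one field, each tail holds one tenth of the papers (rounded down), and ties are broken by sort order. The function name and data are hypothetical.

```python
def citation_classes(cites):
    """Assign low10% / mid80% / top10% from {paper_id: 15-year citation count}."""
    ranked = sorted(cites, key=cites.get)  # ascending by citation count
    n = len(ranked)
    k = n // 10                            # size of each tail
    classes = {pid: "mid80%" for pid in ranked}
    for pid in ranked[:k]:
        classes[pid] = "low10%"
    for pid in ranked[n - k:]:
        classes[pid] = "top10%"
    return classes


# Toy field of 20 papers with 0..19 citations (made-up).
cites = {f"p{i}": i for i in range(20)}
classes = citation_classes(cites)
print(classes["p0"], classes["p10"], classes["p19"])
```

A paper type that is over-represented in both tails relative to the middle, as the highly novel coefficients suggest, is what the slide summarizes as high risk.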
High gain from novel research
                    Top 1% cited (15y)    Cited by big hits (10y)
                    logit                 logit
Moderately novel    0.122***              0.055***
Highly novel        0.451***              0.229***
Control: 10y citations (ln) 1.669***
• Novel papers are more likely to become big hits, i.e., top
1% highly cited in the field.
• Novel papers are more likely to be cited by papers which
themselves become big hits.
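To read the logit coefficients above, it can help to convert them to odds ratios by exponentiating, which is the standard interpretation of a logit model. The sketch below does this for the "Top 1% cited (15y)" column; the variable names are hypothetical.

```python
from math import exp

# Logit coefficients from the "Top 1% cited (15y)" column.
coefs = {"Moderately novel": 0.122, "Highly novel": 0.451}

# exp(coefficient) gives the odds ratio relative to non-novel papers.
odds_ratios = {name: exp(b) for name, b in coefs.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

Under this reading, the odds of becoming a big hit are about 57% higher for highly novel papers than for comparable non-novel papers, holding the controls fixed.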
Transdisciplinary impact
Dependent variable (15y)           Model      Moderately novel    Highly novel
# citing fields                    Poisson    0.100***            0.177***
Ratio foreign field citations      OLS        0.050***            0.083***
Max dist.: citing-home field       OLS        0.016***            0.030***
Top 1% cited, home field           logit      -0.102**            0.010
Top 1% cited, foreign field        logit      0.318***            0.669***
Control coefficients: 15y cites (ln) 0.494***, 0.002***; 15y foreign cites (ln) 0.052***.
• Novel papers are cited in more fields and fields further
away from their home field.
• Novel papers are highly cited in foreign fields but not in
their home field.
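Two of the outcome measures above, the number of citing fields and the share of citations from foreign fields, could be computed as in the following sketch. It assumes each citation carries a single field label and counts a citation as foreign when that label is not among the focal paper's own fields; the function name and data are hypothetical.

```python
def field_spread(home_fields, citing_fields):
    """Return (# distinct citing fields, share of citations from foreign fields)."""
    n_fields = len(set(citing_fields))
    foreign = sum(1 for f in citing_fields if f not in home_fields)
    ratio = foreign / len(citing_fields) if citing_fields else 0.0
    return n_fields, ratio


# Made-up example: one home field and four citations.
home = {"Neurosciences"}
citing = ["Neurosciences", "Physics, Applied", "Physics, Applied", "Cell Biology"]
n_fields, foreign_ratio = field_spread(home, citing)
print(n_fields, foreign_ratio)
```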
Delayed recognition
• Novel papers are more likely to be top cited in the long run,
but not in the short run.
• Delayed recognition.
o Ahead of its time.
o Resistance from incumbent scientific paradigms.
                    Top 1% cited (3y)    Top 1% cited (15y)
                    logit                logit
Moderately novel    -0.102**             0.122***
Highly novel        -0.031               0.451***
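The contrast between the 3-year and 15-year windows rests on counting citations within a fixed window after publication. A minimal sketch, assuming the publication year counts as the first year of the window (the slides do not state the exact convention) and using made-up citing years:

```python
def citations_within(pub_year, citing_years, window):
    """Count citations received in the first `window` years, publication year included."""
    return sum(1 for y in citing_years if pub_year <= y < pub_year + window)


# Made-up citation history for a paper published in 2004.
citing_years = [2005, 2006, 2006, 2010, 2012, 2013, 2013, 2018, 2020]
short_run = citations_within(2004, citing_years, 3)
long_run = citations_within(2004, citing_years, 15)
print(short_run, long_run)
```

A paper with a late-rising history like this one can sit outside the top of its field on the short window and inside it on the long one.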
Bias against novelty
• Novel papers are less likely to be published in journals
with high Impact Factors.
                    JIF          JIF          JIF
                    Poisson      Poisson      Poisson
Moderately novel    -0.103***    -0.101***    -0.079***
Highly novel        -0.182***    -0.180***    -0.136***
Journal-age controls: Journal age < 4, -0.398***; Journal age (ln), 0.250***.
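The Poisson coefficients can be translated into approximate percentage differences in Journal Impact Factor, assuming the usual log link for a Poisson model (the slides do not state the link). The sketch uses the first JIF column; the variable names are hypothetical.

```python
from math import exp

# Poisson coefficients from the first JIF column.
coefs = {"Moderately novel": -0.103, "Highly novel": -0.182}

# With a log link, exp(coefficient) - 1 is the relative change in the expected JIF.
pct_change = {name: (exp(b) - 1) * 100 for name, b in coefs.items()}
for name, pct in pct_change.items():
    print(f"{name}: {pct:.1f}% expected JIF relative to non-novel papers")
```

On this reading, highly novel papers appear in journals with an expected Impact Factor roughly 17% lower, and moderately novel papers roughly 10% lower, than comparable non-novel papers.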
Summary
Implications
• Potential bias against novel research in science policy
using journal impact factor or short-term citations.
• Over-reliance on such measures
o Directly, discourage novel research that might be of great value.
o Indirectly, miss follow-on breakthroughs built on novel research.
• The monodisciplinary approach in peer review may fail to
recognize the full value of novel research.
Caveats
• Combinatorial novelty only; there are other dimensions of novelty.
• Not all breakthrough research is “novel”.
• Data are truncated.
• “Gaming” the system could become a concern if review bodies
focused on a “novelty” indicator.
• Note: it is important for public agencies to have a portfolio that
includes risk; not all research funded should be risky. There is a real
role for “ditch diggers”.
Thanks for your attention!
Questions, comments?
