The Strange World of Bibliometric Numbers: Implications for Professional Practice
1. The strange world of bibliometric numbers:
Implications for professional practice
Dr Ian Rowlands
David Wilson Library
Manchester Metropolitan University
27 June 2016
2. Three themes
Look at the underlying data
• don’t take indicators at face value
Think how your data will be used
• put your numbers in context
Accept that bibliometrics is driven by rare events
• put measurements around that uncertainty
3. Content
The importance of context: Interpreting the h-index
The journal impact factor: A case study in extremes
Working with ‘difficult’ numbers
7. Fly body length (mm)
Statistic             Value
Mean                  45.5
Median                45.5
Mode                  45
Range                 36 – 55
Standard deviation    3.9
8. Citation frequencies: Nature 2008
Citations to present for 975 Nature articles and review papers published in 2008
9. Nature citations
Statistic               Value
Mean cites per paper    275.1
Median                  164
Mode                    1
Range                   0 – 4,735
Standard deviation      366.6
Citations to present for 975 Nature articles and review papers published in 2008
What’s the average??
The data range over three
orders of magnitude!!
11. A thought experiment
What if flies’ body lengths followed the same distribution as
citations?
• most typically, a fly would not even exist (often, mode=0)
• 85 per cent of flies would have bodies shorter than average
for the whole population, and most would be hors d’oeuvres
for the top 15 per cent
• some flies would measure a giant 30 inches
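The thought experiment can be tried on a small scale. The sketch below uses an invented sample of 20 citation counts (an assumption for illustration, not the Nature data) and reports the mean, median and mode, plus the share of papers that fall below the mean.

```python
from statistics import mean, median, mode

# Hypothetical, deliberately skewed citation counts (not real data).
cites = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 40, 120, 900]

avg = mean(cites)
# Fraction of papers cited less often than the arithmetic mean.
below = sum(1 for c in cites if c < avg) / len(cites)

print(f"mean={avg:.1f} median={median(cites)} mode={mode(cites)}")
print(f"share below the mean: {below:.0%}")
```

In this invented sample, 18 of the 20 papers sit below the mean, because two heavily cited papers pull the mean far above the median.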
12. Lesson
‘Average’ is a problematic concept in bibliometrics
This has serious implications for
• methodology
• interpretation
• application
16. Interpreting the h-index
Harry
– 60 papers
– 6,000 citations
– 100 citations per paper
Tom
– 60 papers
– 6,000 citations
– 100 citations per paper
Harry: h-index = 20
Tom: h-index = 40
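To see how two records with identical totals can produce different h values, here is a minimal sketch. The two citation lists are hypothetical (chosen only so that each has 60 papers and 6,000 citations); the function `h_index` is an illustrative helper, not taken from any citation database.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation records: both have 60 papers and 6,000 citations.
harry = [290] * 20 + [5] * 40    # a few very highly cited papers
tom = [140] * 40 + [20] * 20     # citations spread more evenly

for name, record in (("Harry", harry), ("Tom", tom)):
    print(name, len(record), sum(record), h_index(record))
```

The concentrated record gives h = 20 and the evenly spread one gives h = 40, even though papers, citations and citations per paper are the same.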
20. On the h-index and its variants
“These are often breathtakingly naïve attempts to capture a
complex citation record with a single number. Indeed the
primary advantage of these new indices over simple
histograms of citation counts is that the indices discard almost
all of the detail … and this makes it possible to rank any two
scientists … Surely understanding ought to be the goal when
assessing research, not ensuring that any two people are
comparable.”
International Mathematical Union, Citation Statistics, June 2008, p.14
http://www.mathunion.org/fileadmin/IMU/Report/CitationStatistics.pdf
21. Practical tips
The accuracy of h depends on not missing any relevant papers in
the core as well as avoiding false drops
Present h with a health warning that pushes responsibility for
curating their online identity back on the client (e.g. ORCID,
active management of their ResearcherID)
Source coverage (particularly Scopus vs Web of Science) is a
seriously overlooked issue and may yield very different h values
Since h throws away information about important highly cited
papers (papers with citations > h) it does many researchers a
disservice
24. Journal impact factor 2015 calculation
JIF 2015 = citations accrued during 2015 ÷ (papers published in 2013 + papers published in 2014)
Numerator = ALL citations
Denominator = articles and reviews only
25. Bibliometric ratios can be very unstable
The journal impact factor is a simple ratio:
JIF = citations / papers
Citations can throw up surprises, and these will be amplified if
the sample is small.
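A rough sketch of that amplification, using two hypothetical journals that both start at 2 citations per paper. All figures are invented for illustration, and `jif` is just the simple ratio given here.

```python
def jif(citations, papers):
    """Journal impact factor as a simple ratio of citations to papers."""
    return citations / papers

# Hypothetical journals, both averaging 2 citations per paper.
small = {"citations": 200, "papers": 100}
large = {"citations": 4000, "papers": 2000}
blockbuster = 5000  # hypothetical citations to one extra, heavily cited paper

for name, j in (("small", small), ("large", large)):
    before = jif(j["citations"], j["papers"])
    after = jif(j["citations"] + blockbuster, j["papers"] + 1)
    print(f"{name}: {before:.2f} -> {after:.2f}")
```

The same single paper moves the small journal's ratio far more than the large journal's, because the denominator is so much smaller.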
28. Acta Crystallographica Section A
Citations received in                                  2008    2009    2010
The whole journal                                      3,628   6,068   7,325
George Sheldrick, A short history of SHELX
(2008) 64(1) pp 112-122.                               3,542   5,897   7,029
Helen Berman, The Protein Data Bank: A
historical perspective (2008) 64(1) pp 88-95.              4       7      23
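A quick way to read these figures is as a share: what fraction of the journal's citations in each year went to the SHELX paper alone? The sketch below uses only the numbers on this slide; the variable names are illustrative.

```python
# Citations received per year, as given on this slide.
journal = {2008: 3628, 2009: 6068, 2010: 7325}
shelx = {2008: 3542, 2009: 5897, 2010: 7029}

# Share of the whole journal's citations that went to the SHELX paper.
shares = {year: shelx[year] / journal[year] for year in journal}
for year, share in shares.items():
    print(f"{year}: {share:.1%} of the journal's citations went to one paper")
```

In every year shown, a single paper accounts for more than 95% of the citations to the whole journal.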
29. A short history of SHELX
Abstract
“An account is given of the development of the SHELX system of
computer programs from SHELX-76 to the present day …This
paper could serve as a general literature citation when one or
more of the open-source SHELX programs … are employed in the
course of a crystal-structure determination.”
George M Sheldrick, A short history of SHELX, Acta Crystallographica Section A
(2008) 64(1): 112-122.
30. Top 10% of articles generate 40% of all citations …
… 82% of articles are ‘below average’
Bill Gates gets on the train … and, on average, everyone on board is a millionaire (at least until he gets off)
31. Lessons
The example of Acta Crystallographica A’s 2009 JIF is a
salutary reminder that rare events do happen. The issue is
compounded in this case because the denominator is small
(127 papers).
How could the journal impact factor (and other bibliometric
indicators) be better presented?
• in principle, the mode and median are far more appropriate and
informative than the mean when dealing with highly skewed
distributions
• but in reality, the mode and median for many indicators will simply
be 0 or 1
• so this is not a terribly realistic strategy!
35. Advantages
By using a logarithmic rather than a linear scale, the mode,
median and mean converge and we have a much better
sense of the central tendency.
This has three practical benefits:
• Suddenly ‘average’ becomes meaningful
• You can now use a whole range of statistical tests that
assume a normal distribution (e.g. Student’s t-test,
ANOVA)
• You can now put 95% confidence intervals around the
mean, which aids interpretation
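A minimal sketch of the log-scale approach, under stated assumptions: the citation counts are invented, log(c + 1) is used so that uncited papers can be included, and the interval uses a normal approximation (for a sample this small, a t-based interval would be somewhat wider).

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical citation counts (not real data).
cites = [0, 1, 1, 2, 3, 3, 5, 8, 8, 13, 21, 34, 55, 89, 400]

# log(c + 1) is an assumed choice so that uncited papers (c = 0) can be included.
logs = [math.log(c + 1) for c in cites]
m = mean(logs)
se = stdev(logs) / math.sqrt(len(logs))
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

def back(x):
    """Undo the log(c + 1) transform."""
    return math.exp(x) - 1

centre, low, high = back(m), back(m - z * se), back(m + z * se)
print(f"arithmetic mean: {mean(cites):.1f}")
print(f"back-transformed log mean: {centre:.1f} (95% CI {low:.1f} to {high:.1f})")
```

The back-transformed log mean sits well below the arithmetic mean, which is dominated by the single paper with 400 citations, and it comes with bounds that can be reported alongside it.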
36. Health warning
YOU MUST LOOK AT THE DATA
Fairly mature citation distributions are often approximately
lognormal but this is not always the case.
Try other transforms (e.g. square root, reciprocal) to see if
they offer a better solution.
If you want to be squeaky clean, consider a Box-Cox test to
find the optimal transform.
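One simple way to compare candidate transforms is to check how much skewness each leaves behind. The sketch below is an illustration only: the data are invented, the `skewness` helper is written out by hand, and the +1 offsets for the log and reciprocal are an assumed way of handling uncited papers. It is not a Box-Cox procedure.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: 0 for a symmetric distribution."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical citation counts (not real data).
cites = [0, 1, 1, 2, 3, 3, 5, 8, 8, 13, 21, 34, 55, 89, 400]

# The +1 offsets are an assumed way of handling uncited papers.
transforms = {
    "raw": lambda c: c,
    "square root": lambda c: math.sqrt(c),
    "log": lambda c: math.log(c + 1),
    "reciprocal": lambda c: 1 / (c + 1),
}

skews = {name: skewness([f(c) for c in cites]) for name, f in transforms.items()}
for name, value in skews.items():
    print(f"{name:12s} skewness = {value:+.2f}")
```

A skewness close to zero suggests the transformed values are roughly symmetric. On this invented sample the log transform leaves far less skew than the raw counts, but real data should always be inspected before a transform is chosen.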
39. Final conclusions
Always look at the raw data, not the cooked indicator, and
think about context
‘Rare events’ can make a huge difference
Bibliometric indicators are unstable and this can lead to
poor decision-making
You have a responsibility to present meaningful averages
and to put bounds around data uncertainty
Editor's Notes
Who has the greater impact?
Modelled on the real-life example of Harry Kroto, Nobel chemistry laureate for the discovery of buckminsterfullerene. Huge lasting impact, but on the basis of a small number of papers. You don’t get any extra credit in h for citations > h.
The JIF is simply the ratio of the two: the AVERAGE NUMBER OF RECENT CITATIONS PER PAPER to a particular journal.
It couldn’t be simpler. However big or small a journal, we now have a common currency for comparison.