1. Awash in eJournal Data:
What It Is, Where It Is, and What Can Be
Done with It
(Is it “too much” or “not enough”?)
November 7, 2013
2. Who we are…
David Brennan, MLS, Assistant Librarian, Collection Development/Digital
Resources Management | George T. Harrell Health Sciences Library | Penn State
Hershey – Milton S. Hershey Medical Center | Penn State College of Medicine
Nancy J. Butkovich, MLS, Associate Librarian and Head, Physical and
Mathematical Sciences Library | The Pennsylvania State University, University Park
dbrennan@hmc.psu.edu
njb2@psu.edu
3. Who contributed:
Lisa German, Associate Dean for Collections, Information and Access Services
Linda Musser, Distinguished Librarian and Head, Earth and Mineral Sciences Library
Robert Alan, Head, Serials and Acquisitions Services
Jaime Jamison, Electronic Resources Specialist
Barbara Coopey, Assistant Head of Access Services
Ann Thompson, Information Resources and Services Supervisor-Manager
Alan Shay, Data Analyst, Assessment
Serials Department staff (for cleaning the raw JR-1 files)
Dana Roth, Caltech Library
4. Outline
Introduction
Inspiration – the Elsevier Study Group and its charge
What resulted from the study group, and extending the model
Inputs: the universe of usage data and some uses for it
– David Brennan
Outputs: PSU Authors, Author Citations, and editorial contributions
– Nan Butkovich
Conclusions (if any!) / Discussion
5. Inspiration: the Elsevier Evaluation Team
David Brennan, Penn State Hershey
Nan Butkovich, Physical & Mathematical Sciences Library
Linda Musser, Earth & Mineral Sciences Library
The team was charged by Lisa German, Associate Dean for Collections,
Information and Access Services, and the University Libraries’ Collections
Services Advisory Group (CSAG) Collection Assessment Team to gather data
to inform decisions related to Elsevier products. The team’s primary focus
was the ScienceDirect journal package: its use, cost, and impact. Work
began in January 2013, and the team submitted its final report in June 2013.
6. What resulted from the study group:
Usage data (JR1) – More than 5 million full-text requests over 5 years; roughly
80% of use came from 20–23% of titles.
Cost-per-use data – Cost per use ranged from $2.00 to $3.00, based on a simple
calculation dividing the contract cost in a given year by the aggregate use from
the JR1.
ILL data – Obtaining the raw data on borrowing requests was straightforward;
however, determining which requests were for Elsevier titles required manual
title matching. The intent of our analysis was not only to identify the extent of
borrowing from Elsevier titles overall, but also to identify any titles for which we
were exceeding the number of free requests allowed under CONTU guidelines, as
a basis for determining whether subscribing to those titles would be a better
economic value than ILL. Based on list price, only two titles were.
Publishing and citation data (Web of Science) – 18.8% of the papers published by
Penn State authors in 2011 were in Elsevier titles. PSU authors cited 1,355 of
1,908 titles in 2011 (71%). Of the 249,187 cited references to items from all
publishers, Penn State authors cited items from works currently published by
Elsevier a total of 37,006 times in 2011 (14.9%).
Data on PSU editors and editorial-board members of Elsevier journals – Gathered
by searching “Penn State Elsevier editorial board” or “Pennsylvania State Elsevier
Editorial” in Google. 108 separate titles were found to have some level of Penn
State involvement. 119 faculty members served in some capacity: editors-in-chief
(9), associate editors (11), board members (86), advisory committee members (2),
and one each of book review editor, journal management committee member,
senior editor, and advisory editor. (Data thanks to Lisa German)
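The cost-per-use figure above reduces to a single division once the JR-1 file has been cleaned. A minimal sketch in Python; the file name, column names, and contract figure are hypothetical stand-ins, not our actual data:

```python
import csv

CONTRACT_COST = 1_000_000.00   # hypothetical annual contract cost
JR1_FILE = "jr1_cleaned.csv"   # hypothetical cleaned JR-1 export:
                               # one row per title, "Journal" and "Total" columns

per_title = {}
with open(JR1_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        per_title[row["Journal"]] = int(row["Total"])

total_use = sum(per_title.values())
# The "simple calculation" described above: contract cost over aggregate JR-1 use.
print(f"Package cost per use: ${CONTRACT_COST / total_use:.2f}")
```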
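Likewise, once ILL requests have been matched to titles (the manual step noted above), flagging titles that exceed the CONTU “rule of five” (five fills per title per year from the most recent five years of a journal) is mechanical. The request list here is invented for illustration:

```python
from collections import Counter

# Hypothetical (title, year) pairs produced by the manual title-matching step.
requests = [("Journal of Hypothetical Studies", 2012)] * 7 + \
           [("Another Example Journal", 2012)] * 3

CONTU_LIMIT = 5  # CONTU "rule of five"
fills = Counter(title for title, year in requests if year == 2012)
over_limit = {t: n for t, n in fills.items() if n > CONTU_LIMIT}
print(over_limit)  # candidates for a subscribe-vs.-borrow cost comparison
```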
7. Leveraging the techniques we learned from
this study – going beyond Elsevier
What kinds of data sources are there and what are their limitations?
Can we use these data (even though imperfect) to make collection
decisions?
Can we use these data (even though imperfect) to show library impact and
value?
8. Sucked into the universe of usage data…
National Geographic Society/Red Vision/C4 Studios/Pioneer Productions Retrieved Oct. 28, 2013, from http://newswatch.nationalgeographic.com/2008/12/05/to_the_edge_of_the_universe_an/
9. Why collect all this data? (1)
Collection Development in one slide:
The Universe of Stuff
Have it? Don’t have it?
Keep it? Get it?
Data = (maybe ≠…) how do we decide?
10. Why collect all this data? (2)
“Library Value”:
The Universe of Stuff
Have it? Don’t have it?
Keep it? Get it?
Promote stuff and services related to it.
11. Aaberg, Jason. (2006, March 25). Bullet. Stock Xchng. Retrieved Sept 28, 2013, from http://www.sxc.hu/photo/495767
12. Every bit of data has *some* impact (and some drawback)
[Diagram: factors surrounding “An important journal title!” – Impact Factor (or
other measure); Outputs; Program/Curriculum/Subject needs; Licensing; Purchase
requests from users; Use (JR1, 1a); Budget/Cost; Potential use (Turnaways, JR2);
ILL activity]
13. Two examples of data mining and use
Impact factors - top holdings related to liaison assignments
JR1/1a usage and issues – the dilemmas of too much information
Audience and purpose!
14. Impact factors - top holdings related to liaison assignments
• Publishing and citation data (and, by extension, impact factors; Haddow, 2007)
can also be used to influence collection decisions – again, only inasmuch as
titles can be swapped in and out of packages, and given the limitations of the
IF (e.g., EASE, 2013).
• The IF is a known quantity that is more familiar to library users than other
measures, and it is commonly touted by vendors and publishers; many journal
landing pages prominently display their IF.
• Libraries do have a role in showing how to use the IF and other bibliometric
data appropriately (e.g., the Emory LibGuide).
• Even with large packages, there are still high impact titles that are outliers –
recognizing these gaps and demonstrating current coverage is part of
showing library value and meeting the needs of end users.
• As this analysis extends to all of the liaison areas, a clearer picture will
emerge of collection strengths and needs.
1. Haddow, G. Evidence Summary: Level 1 COUNTER Compliant Vendor Statistics Are a Reliable Measure of Journal Usage. Evidence Based
Library and Information Practice, 2007, 2(2).
2. European Association of Science Editors. The EASE Statement on Inappropriate Use of Impact Factors. 2013. Retrieved from
http://www.ease.org.uk/publications/impact-factor-statement
3. Emory Libraries & Information Technology, Robert W. Woodruff Library. Impact Factors and Citation Analysis [LibGuide]. 2013. Retrieved from
http://guides.main.library.emory.edu/citationanalysis
16. JR1/1a usage and issues – the dilemmas of too much information
• Use metrics have value, but only inasmuch as there is the ability to swap titles
in and out of a package when the use data dictate. (We’ll talk about CPU data
later.)
• The proper analysis of “use” is of greater concern, with the well-known issues
of the COUNTER standards and their implementation affecting how useful these
data can be (Welker, 2012). What is “use”? (Nicolson-Guest & Macdonald, 2013)
And what is “cost per use”? (Harrington & Stovall, 2011)
• PDF vs. HTML counts and platform design (Bucknell, 2012)
• Use data are still an easily demonstrable measure, with some manipulation.
• Long tail: 20% of titles accounting for 80% of use leaves 80% of titles with
diminishing returns, even if none of them are truly “zero use” (see the sketch
after the references).
• Backfile confusion (JR1 vs. 1a): if the point is to use data to influence
subscription decisions, then owned backfile use might not be part of the
equation – though this is debatable (Bucknell, 2012).
1. Welker, J. Counting on COUNTER: The Current State of E-Resource Usage Data in Libraries. Computers in Libraries, 2012, 32(9). Retrieved from
http://www.infotoday.com/cilmag/nov12/Welker--Counting-on-COUNTER.shtml
2. Nicolson-Guest, B.; Macdonald, D. Are We Comparing Bananas and Gorillas? Interpreting Usage Statistics for Cost Benefit and Reporting. ALIA Information Online
2013 Conference Proceedings, 2013. Retrieved from http://www.information-online.com.au/pdf/Thursday_Concurrent_15_1115_Nicolson_Guest.pdf
3. Harrington, M.; Stovall, C. Contextualizing and Interpreting Cost per Use for Electronic Journals. Proceedings of the Charleston Library Conference, 2011.
http://dx.doi.org/10.5703/1288284314928
4. Bucknell, T. Garbage In, Gospel Out: Twelve Reasons Why Librarians Should Not Accept Cost-per-Download Figures at Face Value. The Serials Librarian, 2012,
63(2), 192-212. http://dx.doi.org/10.1080/0361526X.2012.680687
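A note on the long-tail bullet above: the 80/20 split is easy to check from any per-title use list. A minimal sketch with invented counts; our real JR-1 data put roughly 80% of use in 20–23% of titles:

```python
def titles_for_share(per_title_use, share=0.80):
    """Fraction of titles (highest-use first) needed to reach `share` of total use."""
    uses = sorted(per_title_use.values(), reverse=True)
    total = sum(uses)
    running = 0
    for n, u in enumerate(uses, start=1):
        running += u
        if running >= share * total:
            return n / len(uses)

# Invented per-title counts, for illustration only.
per_title_use = {"J. Example A": 9000, "J. Example B": 700,
                 "J. Example C": 250, "J. Example D": 50}
print(f"{titles_for_share(per_title_use):.0%} of titles account for 80% of use")
```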
17. Example: Elsevier
JR1
“Number of Successful Full-Text Article Requests [SFTARs] by Month and
Journal (COUNTER Required and Compliant) – current year”*
- Aggregate usage (subscribed titles and backfiles)
- Includes nulls (zero-use entries)
- Reports the past 5 years of use (so title
changes and adds/drops are reflected in the data)
*Description of the ScienceDirect Customer Usage Reports
http://usagereports.elsevier.com/Report_Descriptions/sd_report_description.pdf
18. Example: Elsevier
JR1a
“Number of Successful Full-Text Article Requests [SFTARs] from an Archive by
Month and Journal (COUNTER Required and Compliant)”*
- Backfile usage only
*Description of the ScienceDirect Customer Usage Reports
http://usagereports.elsevier.com/Report_Descriptions/sd_report_description.pdf
19. Example: Elsevier
Biochemical systematics and ecology (0305-1978)
-from 1973 to 1994 in ScienceDirect Agricultural & Biological Sciences Backfile
and ScienceDirect Environmental Science Backfile
-from 1974 to 1994 in ScienceDirect Biochemistry, Genetics & Molecular
Biology Backfile
-from 06/28/1974 to 2009 in ScienceDirect Journals
Aggregate use (JR1): [chart]
Backfile use (JR1a): [chart]
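One way to act on the backfile question from slide 16 is to subtract the JR1a backfile counts from the JR1 aggregate counts, title by title, to estimate use of currently subscribed years. A minimal sketch with invented numbers; note that overlapping backfile and current coverage, as in the example above, blurs the split:

```python
# Hypothetical annual SFTAR totals keyed by ISSN.
jr1_aggregate = {"0305-1978": 412}   # JR1: subscribed + backfile use
jr1a_backfile = {"0305-1978": 187}   # JR1a: backfile use only

# Estimated use of currently subscribed content.
current_use = {issn: total - jr1a_backfile.get(issn, 0)
               for issn, total in jr1_aggregate.items()}
print(current_use)  # {'0305-1978': 225}
```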
21. Publishers by the Numbers:
Where PSU Authors Publish and
What They Cite
Nan Butkovich, Associate Librarian and Head,
Physical & Mathematical Sciences Library
The Pennsylvania State University
University Park, PA
njb2@psu.edu
24. Where do PSU authors publish?
Searched Web of Science (Source Searches)
Arts & Humanities Citation Index, Social Science Citation Index, Science
Citation Index
“Penn State” in the address field
Searched for articles published in 2011
Looked at Publisher (PU) field
6,928 articles in 2011
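A sketch of the tallying step, assuming the Web of Science records were exported in tab-delimited format with a PU (publisher) column; the file name is hypothetical, and real PU values need normalization of subsidiary imprints before publishers can be compared:

```python
import csv
from collections import Counter

publishers = Counter()
# "savedrecs.txt": hypothetical tab-delimited Web of Science export.
with open("savedrecs.txt", newline="", encoding="utf-8-sig") as f:
    for record in csv.DictReader(f, delimiter="\t"):
        pub = record.get("PU", "").strip().upper()
        if pub:
            publishers[pub] += 1

for pub, count in publishers.most_common(5):
    print(pub, count)
```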
25. And the #1 publisher choice for PSU authors is…
#1
Elsevier (and its various subsidiaries)
2011: 1,290 of 6,928 articles (18.6%)
26. What about the competition?
#2: Wiley (and subsidiaries)
2011: 774 articles (11.2%)
#3: Springer (and subsidiaries)
2011: 453 articles (6.5%)
#4: American Chemical Society
2011: 401 articles (5.8%)
27. The “take away”…
Elsevier published almost as many articles by PSU
authors as the #2, #3, and #4 publishers… COMBINED!
And together, the four publishers accounted for 42.1% of
all articles published by PSU authors in 2011.
WOW!
29. Citation analysis:
Where do the numbers come from?
Citation analysis has a long and rich history in collection development and
management – the first such study was done in 1927!¹
Cited references were from papers with PSU authors that were published in
2011 and indexed in Web of Science
Citation data for each of the 3 main citation indexes examined separately
JR1 files (from Serials Solutions) provided the lists of titles for the four
publishers included in this part of the study
1 Gross, P. L. K.; Gross, E. M. College Libraries and Chemical Education. Science 1927, 66 (1713), 385-389.
30. Caveat: These data are biased
The four publishers that were selected for this phase of the study are all
heavily oriented to the STEM disciplines
The Web of Science database is also biased toward STEM disciplines
It indexes journal articles, and STEM researchers publish in journals more often than do
researchers in non-STEM disciplines
While Web of Science now publishes book citation indexes, they are separate databases
and were not included in this study
Penn State is a land-grant institution with a heavy STEM focus
31. Challenges
Web of Science data required significant cleaning before use
Manual extraction of citation data from records
Lack of consistency in journal abbreviations (one title can have several abbreviations)
JR1 files
No easy way to compare the list of full titles to the list of cited journal
abbreviations (one crude first-pass filter is sketched below)
Had to be manually cleaned to be useful
Confidentiality clauses in licenses
Had to aggregate the data from the four publishers rather than show data for
individual publishers (sorry about that)
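As a rough illustration of the abbreviation-matching problem flagged above, here is one crude first-pass filter; it is a sketch only, and every candidate match still required manual review:

```python
import re

STOPWORDS = {"THE", "OF", "AND", "FOR", "IN", "ON"}

def normalize(title: str) -> list[str]:
    """Uppercase, strip punctuation, drop common stopwords."""
    words = re.sub(r"[^\w\s]", " ", title.upper()).split()
    return [w for w in words if w not in STOPWORDS]

def could_match(abbrev: str, full_title: str) -> bool:
    """First-pass filter: each abbreviated word must be a prefix of the
    corresponding full-title word. Not a definitive match."""
    a, t = normalize(abbrev), normalize(full_title)
    return len(a) == len(t) and all(f.startswith(w) for w, f in zip(a, t))

print(could_match("BIOCHEM SYST ECOL", "Biochemical Systematics and Ecology"))  # True
```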
32. Most of the clean-up and calculations were done the “old-fashioned” way…
33. Terminology: Cited titles
These are specific titles that were cited
The counting unit is the journal itself
Each journal is counted once
34. Journals by the numbers
American Chemical Society – 51 titles
Elsevier (and subsidiaries) – 1,777 titles
Springer (and subsidiaries) – 1,534 titles
Wiley (and subsidiaries) – 1,471 titles
Total number of titles used in this study (n) – 4,833
35. Cited titles – How many were cited?
Our collection: n = 4,833 titles
Of these,
3,169 were cited in the 2011 Science Citation Index (65.6%)
1,430 were cited in the 2011 Social Science Citation Index (29.6%)
205 were cited in the 2011 Arts & Humanities Citation Index (4.2%)
36. Cited titles –
How many were unique to one citation index?
n = 4,833 titles
1,789 were cited only in Science Citation Index (37.0%)
298 were cited only in Social Science Citation Index (6.2%)
20 were cited only in Arts & Humanities Citation Index (0.4%)
1,286 were not cited at all (26.6%)
37. The “take-away”…
In 2011…
PSU authors cited 3,547 titles out of 4,833 currently subscribed titles
from these four publishers
That’s 73.4%!
Most of these were cited in STEM publications
38. Terminology: Cited references
These are the references that were cited by PSU authors in their
publications
The counting unit is the publication that was cited
Publications are counted each time that they are cited
39. Cited references: What do PSU authors cite?
Data from Web of Science
2011 data for all publishers, all publications indexed in Web of Science
Science Citation Index
183,393 publications cited by PSU authors
Social Science Citation Index
58,802 publications cited by PSU authors
Arts & Humanities Citation Index
6,992 publications cited by PSU authors
40. Cited references: The Big Four
Science Citation Index
63,572 citations (the Big Four) out of 183,393 citations
34.7%
Social Science Citation Index
14,364 citations out of 58,802 citations
24.4%
Arts & Humanities Citation Index
568 citations out of 6,992 citations
8.1%
41. The “take-away”…
78,504 out of 249,187 publications cited by PSU authors in 2011 were
published by one of these four publishers…
That’s 31.5%!
Most citations were of STEM publications (no surprise there)
42. Deeper questions
So far, these citation data highlight the importance of these four publishers
in STEM publishing
They can be useful for collection development purposes, but what else can
you do with the citation data?
Things that I’ve wondered about…
What is the cost/use of these publications?
How many articles does a researcher read for every article that he or she cites?
These can both be calculated using JR1 data and citation data (see the sketch below)
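Both numbers reduce to simple divisions; a minimal sketch using the totals reported on the slides that follow:

```python
# Totals from the following slides (Big Four publishers, 2011 data).
expenditure = 4_749_866.67   # combined expenditure
views = 1_790_333            # total JR1 views
cited_refs = 78_504          # Big Four items cited by PSU authors

print(f"Cost per view:      ${expenditure / views:.2f}")   # $2.65
print(f"Views per citation: {views / cited_refs:.1f}")     # 22.8
```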
43. Cost per use
This measure has been around since long before the e-journal was a gleam
in some publisher’s eye
Now it’s usually cost/view, but the goal is the same: Determine how much
the institution pays each time a patron opens a document
At some magic number it becomes cheaper to cancel a publication and get
the desired articles through ILL or document delivery rather than subscribe
to the publication
It works for e-journal packages too…
44. Penn State’s cost/view
Combined expenditure: $4,749,866.67
Total number of views (the Big Four): 1,790,333
Cost/view (use) = $4,749,866.67 ÷ 1,790,333 = $2.65
45. Articles viewed per citation
Citations are just the end product… how many are viewed to get the one
citation?
There are a number of papers that have examined the number of articles
read by researchers – most notably those of Tenopir & King¹
Information on reads (views) per citation is sparse²
Although the method works, the specific values will vary according to what
is being measured
1 For example, King, D. W.; Tenopir, C.; Clarke, M. Measuring Total Reading of Journal Articles. D-Lib Magazine
2006, 12 (10), http://www.dlib.org/dlib/october06/king/10king.html.
2 Kurtz, M. J.; Eichhorn, G.; Accomazzi, A.; Grant, C.; Demleitner, M.; Murray, S. S.; Martimbeau, N.; Elwell, B. The
Bibliometric Properties of Article Readership Information. Journal of the American Society for Information Science
and Technology 2005, 56 (2), 111-128.
46. Penn State’s articles viewed/citation
Number of items viewed (for Penn State’s Big Four publishers): 1,790,333
Number of cited references (to Big Four journals by Penn Staters): 78,504
Views/citation = 1,790,333 ÷ 78,504 = 22.8
47. Thoughts about the process
The results of citation studies like this will vary depending on the sources of
the data used in the study
Looking at citation data at the publisher level can be useful now that so many
publishers bundle their journals into packages
It is easier to identify top-tier publishers than bottom-tier ones
Models for evaluating adds/drops will be highly variable, depending on local
needs and the weights assigned to each of the variables, but the simple
decisions have already been made. New models and approaches will be
needed, and gathering the requisite data will require careful thought about
workflow and the design of analysis tools.