1. Awash in eJournal Data:
What It Is, Where It Is, and What Can Be
Done with It
(Is it “too much” or “not enough”?)
November 7, 2013
2. Who we are…
David Brennan, MLS, Assistant Librarian, Collection Development/Digital
Resources Management | George T. Harrell Health Sciences Library | Penn State
Hershey – Milton S. Hershey Medical Center | Penn State College of Medicine
Nancy J. Butkovich, MLS, Associate Librarian and Head, Physical and
Mathematical Sciences Library | The Pennsylvania State University, University Park
dbrennan@hmc.psu.edu
njb2@psu.edu
3. Who contributed:
Lisa German, Associate Dean for Collections, Information and Access Services
Linda Musser, Distinguished Librarian and Head, Earth and Mineral Sciences Library
Robert Alan, Head, Serials and Acquisitions Services
Jaime Jamison, Electronic Resources Specialist
Barbara Coopey, Assistant Head of Access Services
Ann Thompson, Information Resources and Services Supervisor-Manager
Alan Shay, Data Analyst, Assessment
Serials Department staff (for cleaning the raw JR-1 files)
Dana Roth, Caltech Library
4. Outline
Introduction
Inspiration – the Elsevier Study Group and its charge
What resulted from the study group, and extending the model
Inputs: the universe of usage data and some uses for it
– David Brennan
Outputs: PSU Authors, Author Citations, and editorial contributions
– Nan Butkovich
Conclusions (if any!) / Discussion
5. Inspiration: the Elsevier Evaluation Team
David Brennan, Penn State Hershey
Nan Butkovich, Physical & Mathematical Sciences Library
Linda Musser, Earth & Mineral Sciences Library
The team was charged by Lisa German, Associate Dean for Collections,
Information and Access Services, and the University Libraries’ Collections
Services Advisory Group (CSAG) Collection Assessment Team to gather data
to inform decisions related to Elsevier products. The team’s primary focus
was the ScienceDirect journal package: its use, cost, and impact. Work
began in January 2013, and the team submitted its final report in June 2013.
6. What resulted from the study group:
Usage data (JR1) – More than 5 million full-text requests over 5 years; roughly
80% of use came from 20–23% of titles.
Cost-per-use data – Cost per use ranged from $2.00 to $3.00, based on a simple
calculation dividing the contract cost in a given year by the aggregate use from
the JR1.
ILL data – Obtaining the raw data on borrowing requests was straightforward;
however, determining which requests were for Elsevier titles required manual
title matching. The intent of our analysis was not only to identify the extent of
borrowing from Elsevier titles overall, but also to identify any titles for which we
were exceeding the number of free requests allowed under CONTU guidelines, as
a basis for determining whether subscribing to those titles would be a better
economic value than ILL. Based on list price, only two titles were.
Publishing and citation data (Web of Science) – 18.8% of the papers published by
Penn State authors in 2011 were in Elsevier titles. PSU authors cited 1,355 of
1,908 titles in 2011 (71%). Of the 249,187 cited references to items from all
publishers, Penn State authors cited items from works currently published by
Elsevier a total of 37,006 times in 2011 (14.9%).
Data on PSU editors and editorial-board members of Elsevier journals – Gathered
by searching “Penn State Elsevier editorial board” or “Pennsylvania State Elsevier
Editorial” in Google. 108 separate titles were found to have some level of Penn
State involvement. 119 faculty members served in some capacity: editors-in-chief
(9), associate editors (11), board members (86), advisory committee members (2),
and one each of book review editor, journal management committee member,
senior editor, and advisory editor. (Data thanks to Lisa German)
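The cost-per-use figure above reduces to a single division once the JR-1 file has been cleaned. A minimal sketch in Python; the file name, column names, and contract figure are hypothetical stand-ins, not our actual data:

```python
import csv

CONTRACT_COST = 1_000_000.00   # hypothetical annual contract cost
JR1_FILE = "jr1_cleaned.csv"   # hypothetical cleaned JR-1 export:
                               # one row per title, "Journal" and "Total" columns

per_title = {}
with open(JR1_FILE, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        per_title[row["Journal"]] = int(row["Total"])

total_use = sum(per_title.values())
# The "simple calculation" described above: contract cost over aggregate JR-1 use.
print(f"Package cost per use: ${CONTRACT_COST / total_use:.2f}")
```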
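Likewise, once ILL requests have been matched to titles (the manual step noted above), flagging titles that exceed the CONTU “rule of five” (five fills per title per year from the most recent five years of a journal) is mechanical. The request list here is invented for illustration:

```python
from collections import Counter

# Hypothetical (title, year) pairs produced by the manual title-matching step.
requests = [("Journal of Hypothetical Studies", 2012)] * 7 + \
           [("Another Example Journal", 2012)] * 3

CONTU_LIMIT = 5  # CONTU "rule of five"
fills = Counter(title for title, year in requests if year == 2012)
over_limit = {t: n for t, n in fills.items() if n > CONTU_LIMIT}
print(over_limit)  # candidates for a subscribe-vs.-borrow cost comparison
```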
7. Leveraging the techniques we learned from
this study – going beyond Elsevier
What kinds of data sources are there and what are their limitations?
Can we use these data (even though imperfect) to make collection
decisions?
Can we use these data (even though imperfect) to show library impact and
value?
8. Sucked into the universe of usage data…
National Geographic Society/Red Vision/C4 Studios/Pioneer Productions Retrieved Oct. 28, 2013, from http://newswatch.nationalgeographic.com/2008/12/05/to_the_edge_of_the_universe_an/
9. Why collect all this data? (1)
Collection Development in one slide:
The Universe of Stuff
Have it? Don’t have it?
Keep it? Get it?
Data = (maybe ≠…) how do we decide?
10. Why collect all this data? (2)
“Library Value”:
The Universe of Stuff
Have it? Don’t have it?
Keep it? Get it?
Promote stuff and services related to it.
11. Aaberg, Jason. (2006, March 25). Bullet. Stock Xchng. Retrieved Sept 28, 2013, from http://www.sxc.hu/photo/495767
12. Every bit of data has *some* impact (and some drawback)
[Diagram: factors surrounding “An important journal title!” – Impact Factor (or
other measure); Outputs; Program/Curriculum/Subject needs; Licensing; Purchase
requests from users; Use (JR1, 1a); Budget/Cost; Potential use (Turnaways, JR2);
ILL activity]
13. Two examples of data mining and use
Impact factors - top holdings related to liaison assignments
JR1/1a usage and issues – the dilemmas of too much information
Audience and purpose!
14. Impact factors - top holdings related to liaison assignments
• Publishing and citation data (and, by extension, impact factors; Haddow, 2007)
can also be used to influence collection decisions – again, only inasmuch as
titles can be swapped in and out of packages, and given the limitations of the
IF (e.g., EASE, 2013).
• The IF is a known quantity that is more familiar to library users than other
measures, and it is commonly touted by vendors and publishers; many journal
landing pages prominently display their IF.
• Libraries do have a role in showing how to use the IF and other bibliometric
data appropriately (e.g., the Emory LibGuide).
• Even with large packages, there are still high impact titles that are outliers –
recognizing these gaps and demonstrating current coverage is part of
showing library value and meeting the needs of end users.
• As this analysis extends to all of the liaison areas, a clearer picture will
emerge of collection strengths and needs.
1. Haddow, G. Evidence Summary: Level 1 COUNTER Compliant Vendor Statistics Are a Reliable Measure of Journal Usage. Evidence Based
Library and Information Practice, 2007, 2(2).
2. European Association of Science Editors. The EASE Statement on Inappropriate Use of Impact Factors. 2013. Retrieved from
http://www.ease.org.uk/publications/impact-factor-statement
3. Emory Libraries & Information Technology, Robert W. Woodruff Library. Impact Factors and Citation Analysis [LibGuide]. 2013. Retrieved from
http://guides.main.library.emory.edu/citationanalysis
16. JR1/1a usage and issues – the dilemmas of too much information
• Use metrics have value, but only inasmuch as there is the ability to swap titles
in and out of a package when the use data dictate. (We’ll talk about CPU data
later.)
• The proper analysis of “use” is of greater concern, with the well-known issues
of the COUNTER standards and their implementation affecting how useful these
data can be (Welker, 2012). What is “use”? (Nicolson-Guest & Macdonald, 2013)
And what is “cost per use”? (Harrington & Stovall, 2011)
• PDF vs. HTML counts and platform design (Bucknell, 2012)
• Use data are still an easily demonstrable measure, with some manipulation.
• Long tail: 20% of titles accounting for 80% of use leaves 80% of titles with
diminishing returns, even if none of them are truly “zero use” (see the sketch
after the references).
• Backfile confusion (JR1 vs. 1a): if the point is to use data to influence
subscription decisions, then owned backfile use might not be part of the
equation – though this is debatable (Bucknell, 2012).
1. Welker, J. Counting on COUNTER: The Current State of E-Resource Usage Data in Libraries. Computers in Libraries, 2012, 32(9). Retrieved from
http://www.infotoday.com/cilmag/nov12/Welker--Counting-on-COUNTER.shtml
2. Nicolson-Guest, B.; Macdonald, D. Are We Comparing Bananas and Gorillas? Interpreting Usage Statistics for Cost Benefit and Reporting. ALIA Information Online
2013 Conference Proceedings, 2013. Retrieved from http://www.information-online.com.au/pdf/Thursday_Concurrent_15_1115_Nicolson_Guest.pdf
3. Harrington, M.; Stovall, C. Contextualizing and Interpreting Cost per Use for Electronic Journals. Proceedings of the Charleston Library Conference, 2011.
http://dx.doi.org/10.5703/1288284314928
4. Bucknell, T. Garbage In, Gospel Out: Twelve Reasons Why Librarians Should Not Accept Cost-per-Download Figures at Face Value. The Serials Librarian, 2012,
63(2), 192-212. http://dx.doi.org/10.1080/0361526X.2012.680687
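A note on the long-tail bullet above: the 80/20 split is easy to check from any per-title use list. A minimal sketch with invented counts; our real JR-1 data put roughly 80% of use in 20–23% of titles:

```python
def titles_for_share(per_title_use, share=0.80):
    """Fraction of titles (highest-use first) needed to reach `share` of total use."""
    uses = sorted(per_title_use.values(), reverse=True)
    total = sum(uses)
    running = 0
    for n, u in enumerate(uses, start=1):
        running += u
        if running >= share * total:
            return n / len(uses)

# Invented per-title counts, for illustration only.
per_title_use = {"J. Example A": 9000, "J. Example B": 700,
                 "J. Example C": 250, "J. Example D": 50}
print(f"{titles_for_share(per_title_use):.0%} of titles account for 80% of use")
```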
17. Example: Elsevier
JR1
“Number of Successful Full-Text Article Requests [SFTARs] by Month and
Journal (COUNTER Required and Compliant) – current year”*
- Aggregate usage (subscribed titles and backfiles)
- Includes nulls (zero-use entries)
- Reports the past 5 years of use (so title
changes and adds/drops are reflected in the data)
*Description of the ScienceDirect Customer Usage Reports
http://usagereports.elsevier.com/Report_Descriptions/sd_report_description.pdf
18. Example: Elsevier
JR1a
“Number of Successful Full-Text Article Requests [SFTARs] from an Archive by
Month and Journal (COUNTER Required and Compliant)”*
- Backfile usage only
*Description of the ScienceDirect Customer Usage Reports
http://usagereports.elsevier.com/Report_Descriptions/sd_report_description.pdf
19. Example: Elsevier
Biochemical systematics and ecology (0305-1978)
-from 1973 to 1994 in ScienceDirect Agricultural & Biological Sciences Backfile
and ScienceDirect Environmental Science Backfile
-from 1974 to 1994 in ScienceDirect Biochemistry, Genetics & Molecular
Biology Backfile
-from 06/28/1974 to 2009 in ScienceDirect Journals
Aggregate use (JR1): [chart]
Backfile use (JR1a): [chart]
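One way to act on the backfile question from slide 16 is to subtract the JR1a backfile counts from the JR1 aggregate counts, title by title, to estimate use of currently subscribed years. A minimal sketch with invented numbers; note that overlapping backfile and current coverage, as in the example above, blurs the split:

```python
# Hypothetical annual SFTAR totals keyed by ISSN.
jr1_aggregate = {"0305-1978": 412}   # JR1: subscribed + backfile use
jr1a_backfile = {"0305-1978": 187}   # JR1a: backfile use only

# Estimated use of currently subscribed content.
current_use = {issn: total - jr1a_backfile.get(issn, 0)
               for issn, total in jr1_aggregate.items()}
print(current_use)  # {'0305-1978': 225}
```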
21. Publishers by the Numbers:
Where PSU Authors Publish and
What They Cite
Nan Butkovich, Associate Librarian and Head,
Physical & Mathematical Sciences Library
The Pennsylvania State University
University Park, PA
njb2@psu.edu
24. Where do PSU authors publish?
Searched Web of Science (Source Searches)
Arts & Humanities Citation Index, Social Science Citation Index, Science
Citation Index
“Penn State” in the address field
Searched for articles published in 2011
Looked at Publisher (PU) field
6,928 articles in 2011
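A sketch of the tallying step, assuming the Web of Science records were exported in tab-delimited format with a PU (publisher) column; the file name is hypothetical, and real PU values need normalization of subsidiary imprints before publishers can be compared:

```python
import csv
from collections import Counter

publishers = Counter()
# "savedrecs.txt": hypothetical tab-delimited Web of Science export.
with open("savedrecs.txt", newline="", encoding="utf-8-sig") as f:
    for record in csv.DictReader(f, delimiter="\t"):
        pub = record.get("PU", "").strip().upper()
        if pub:
            publishers[pub] += 1

for pub, count in publishers.most_common(5):
    print(pub, count)
```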
25. And the #1 publisher choice for PSU authors is…
#1
Elsevier (and its various subsidiaries)
2011: 1,290 of 6,928 articles (18.6%)
26. What about the competition?
#2: Wiley (and subsidiaries)
2011: 774 articles (11.2%)
#3: Springer (and subsidiaries)
2011: 453 articles (6.5%)
#4: American Chemical Society
2011: 401 articles (5.8%)
27. The “take away”…
Elsevier published almost as many articles by PSU
authors as the #2, #3, and #4 publishers… COMBINED!
And together, the four publishers accounted for 42.1% of
all articles published by PSU authors in 2011.
WOW!
29. Citation analysis:
Where do the numbers come from?
Citation analysis has a long and rich history in collection development and
management – the first such study was done in 1927!¹
Cited references were from papers with PSU authors that were published in
2011 and indexed in Web of Science
Citation data for each of the 3 main citation indexes examined separately
JR1 files (from Serials Solutions) provided the lists of titles for the four
publishers included in this part of the study
1 Gross, P. L. K.; Gross, E. M. College Libraries and Chemical Education. Science 1927, 66 (1713), 385-389.
30. Caveat: These data are biased
The four publishers that were selected for this phase of the study are all
heavily oriented to the STEM disciplines
The Web of Science database is also biased toward STEM disciplines
It indexes journal articles, and STEM researchers publish in journals more often than do
researchers in non-STEM disciplines
While Web of Science now publishes book citation indexes, they are separate databases
and were not included in this study
Penn State is a land-grant institution with a heavy STEM focus
31. Challenges
Web of Science data required significant cleaning before use
Manual extraction of citation data from records
Lack of consistency in journal abbreviations (one title can have several abbreviations)
JR1 files
No easy way to compare the list of full titles to the list of cited journal
abbreviations (one crude first-pass filter is sketched below)
Had to be manually cleaned to be useful
Confidentiality clauses in licenses
Had to aggregate the data from the four publishers rather than show data for
individual publishers (sorry about that)
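As a rough illustration of the abbreviation-matching problem flagged above, here is one crude first-pass filter; it is a sketch only, and every candidate match still required manual review:

```python
import re

STOPWORDS = {"THE", "OF", "AND", "FOR", "IN", "ON"}

def normalize(title: str) -> list[str]:
    """Uppercase, strip punctuation, drop common stopwords."""
    words = re.sub(r"[^\w\s]", " ", title.upper()).split()
    return [w for w in words if w not in STOPWORDS]

def could_match(abbrev: str, full_title: str) -> bool:
    """First-pass filter: each abbreviated word must be a prefix of the
    corresponding full-title word. Not a definitive match."""
    a, t = normalize(abbrev), normalize(full_title)
    return len(a) == len(t) and all(f.startswith(w) for w, f in zip(a, t))

print(could_match("BIOCHEM SYST ECOL", "Biochemical Systematics and Ecology"))  # True
```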
32. Most of the clean-up and calculations were done the “old-fashioned” way…
33. Terminology: Cited titles
These are specific titles that were cited
The counting unit is the journal itself
Each journal is counted once
34. Journals by the numbers
American Chemical Society – 51 titles
Elsevier (and subsidiaries) – 1,777 titles
Springer (and subsidiaries) – 1,534 titles
Wiley (and subsidiaries) – 1,471 titles
Total number of titles used in this study (n) – 4,833
35. Cited titles – How many were cited?
Our collection: n = 4,833 titles
Of these,
3,169 were cited in the 2011 Science Citation Index (65.6%)
1,430 were cited in the 2011 Social Science Citation Index (29.6%)
205 were cited in the 2011 Arts & Humanities Citation Index (4.2%)
36. Cited titles –
How many were unique to one citation index?
n = 4,833 titles
1,789 were cited only in Science Citation Index (37.0%)
298 were cited only in Social Science Citation Index (6.2%)
20 were cited only in Arts & Humanities Citation Index (0.4%)
1,286 were not cited at all (26.6%)
37. The “take-away”…
In 2011…
PSU authors cited 3,547 titles out of 4,833 currently subscribed titles
from these four publishers
That’s 73.4%!
Most of these were cited in STEM publications
38. Terminology: Cited references
These are the references that were cited by PSU authors in their
publications
The counting unit is the publication that was cited
Publications are counted each time that they are cited
39. Cited references: What do PSU authors cite?
Data from Web of Science
2011 data for all publishers, all publications indexed in Web of Science
Science Citation Index
183,393 publications cited by PSU authors
Social Science Citation Index
58,802 publications cited by PSU authors
Arts & Humanities Citation Index
6,992 publications cited by PSU authors
40. Cited references: The Big Four
Science Citation Index
63,572 citations (the Big Four) out of 183,393 citations
34.7%
Social Science Citation Index
14,364 citations out of 58,802 citations
24.4%
Arts & Humanities Citation Index
568 citations out of 6,992 citations
8.1%
41. The “take-away”…
78,504 out of 249,187 publications cited by PSU authors in 2011 were
published by one of these four publishers…
That’s 31.5%!
Most citations were of STEM publications (no surprise there)
42. Deeper questions
So far, these citation data highlight the importance of these four publishers
in STEM publishing
They can be useful for collection development purposes, but what else can
you do with the citation data?
Things that I’ve wondered about…
What is the cost/use of these publications?
How many articles does a researcher read for every article that he or she cites?
These can both be calculated using JR1 data and citation data (see the sketch below)
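Both numbers reduce to simple divisions; a minimal sketch using the totals reported on the slides that follow:

```python
# Totals from the following slides (Big Four publishers, 2011 data).
expenditure = 4_749_866.67   # combined expenditure
views = 1_790_333            # total JR1 views
cited_refs = 78_504          # Big Four items cited by PSU authors

print(f"Cost per view:      ${expenditure / views:.2f}")   # $2.65
print(f"Views per citation: {views / cited_refs:.1f}")     # 22.8
```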
43. Cost per use
This measure has been around since long before the e-journal was a gleam
in some publisher’s eye
Now it’s usually cost/view, but the goal is the same: Determine how much
the institution pays each time a patron opens a document
At some magic number it becomes cheaper to cancel a publication and get
the desired articles through ILL or document delivery rather than subscribe
to the publication
It works for e-journal packages too…
44. Penn State’s cost/view
Combined expenditure: $4,749,866.67
Total number of views (the Big Four): 1,790,333
Cost/view (use) = $4,749,866.67 ÷ 1,790,333 = $2.65
45. Articles viewed per citation
Citations are just the end product… how many are viewed to get the one
citation?
There are a number of papers that have examined the number of articles
read by researchers – most notably those of Tenopir & King¹
Information on reads (views) per citation is sparse²
Although the method works, the specific values will vary according to what
is being measured
1 For example, King, D. W.; Tenopir, C.; Clarke, M. Measuring Total Reading of Journal Articles. D-Lib Magazine
2006, 12 (10), http://www.dlib.org/dlib/october06/king/10king.html.
2 Kurtz, M. J.; Eichhorn, G.; Accomazzi, A.; Grant, C.; Demleitner, M.; Murray, S. S.; Martimbeau, N.; Elwell, B. The
Bibliometric Properties of Article Readership Information. Journal of the American Society for Information Science
and Technology 2005, 56 (2), 111-128.
46. Penn State’s articles viewed/citation
Number of items viewed (for Penn State’s Big Four publishers): 1,790,333
Number of cited references (to Big Four journals by Penn Staters): 78,504
Views/citation = 1,790,333 ÷ 78,504 = 22.8
47. Thoughts about the process
The results of citation studies like this will vary depending on the sources of
the data used in the study
Looking at citation data at the publisher level can be useful now that so many
publishers bundle their journals into packages
It is easier to identify top-tier publishers than bottom-tier ones
Models for evaluating adds/drops will be highly variable, depending on local
needs and the weights assigned to each of the variables, but the simple
decisions have already been made. New models and approaches will be
needed, and gathering the requisite data will require careful thought about
workflow and the design of analysis tools.