Awash in eJournal Data: What It Is, Where It Is, and What Can Be Done With It.


David P. Brennan (speaker), Nancy J. Butkovich (speaker)

Published in: Education, Technology

  1. 1. Awash in eJournal data: What it is, where it is, and what can be done with it. (is it “Too Much” or “Not Enough?”) November 7, 2013
  2. 2. Who we are…. David Brennan, MLS, Assistant Librarian, Collection Development/Digital Resources Management | George T. Harrell Health Sciences Library | Penn State Hershey - Milton S. Hershey Medical Center | Penn State College of Medicine Nancy J. Butkovich, MLS, Associate Librarian and Head, Physical and Mathematical Sciences Library, The Pennsylvania State University, University Park
  3. 3. Who contributed:  Lisa German, Associate Dean for Collections, Information and Access Services  Linda Musser, Distinguished Librarian and Head, Earth and Mineral Sciences Library  Robert Alan, Head, Serials and Acquisitions Services  Jaime Jamison, Electronic Resources Specialist  Barbara Coopey, Assistant Head of Access Services  Ann Thompson, Information Resources and Services Supervisor-Manager  Alan Shay, Data Analyst, Assessment  Serials Department staff (for cleaning the raw JR-1 files)  Dana Roth, Caltech Library
  4. 4. Outline  Introduction  Inspiration – the Elsevier Study Group and its charge  What resulted from the study group, and extending the model  Inputs: the universe of usage data and some uses for it – David Brennan  Outputs: PSU Authors, Author Citations, and editorial contributions – Nan Butkovich  Conclusions (if any!) / Discussion
  5. 5. Inspiration: the Elsevier Evaluation Team  David Brennan, Penn State Hershey  Nan Butkovich, Physical & Mathematical Sciences Library  Linda Musser, Earth & Mineral Sciences Library The team was charged by Lisa German, Associate Dean for Collections, Information and Access Services and the University Libraries’ Collections Services Advisory Group (CSAG), Collection Assessment Team to gather data to inform decisions related to Elsevier products. The team’s primary focus was on the ScienceDirect journal package – its use, cost, and impact. Work began in January, 2013. The team submitted its final report in June, 2013.
  6. 6. What resulted from the study group:  Usage data (JR1) – 5M+ hits over 5 years; 80% of use came from 20-23% of titles.  Cost-per-use data – Cost per use ranged from $2.00 to $3.00, based on a simple calculation dividing the contract cost in a given year by the aggregate use from the JR1.  ILL data – Obtaining the raw data on borrowing requests was straightforward; however, determining which requests were for Elsevier titles required manual title matching. The intent of our analysis was not only to identify the extent of borrowing from Elsevier titles overall but also to identify any titles for which we were exceeding the number of free requests allowed under CONTU guidelines, as a basis for determining whether subscribing to these titles was a better economic value than ILL. Based on list price, only 2 titles qualified.  Publishing and citation data (Web of Science) – 18.8% of the papers published by Penn State authors in 2011 were in Elsevier titles. PSU authors cited 1,355 of 1,908 titles in 2011 (71%). Of the 249,187 cited references to items from all publishers, Penn State authors cited items from works currently published by Elsevier a total of 37,006 times in 2011 (14.9%).  Data on PSU editors and editorial-board members of Elsevier journals – This data was gathered by searching “Penn State Elsevier editorial board” or “Pennsylvania State Elsevier Editorial” in Google. 108 separate titles were determined to have some level of Penn State involvement. 119 faculty members served in some capacity: editors-in-chief (9), associate editors (11), board members (86), advisory committee members (2), and one each of book review editor, journal management committee member, senior editor, and advisory editor. (Data thanks to Lisa German)
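The 80/20 usage concentration reported above can be checked directly against any JR1 export. A minimal sketch, using hypothetical per-title totals (the function and data are illustrative, not the team's actual workflow):

```python
# Sketch: what share of titles accounts for a given share of total use,
# computed from a list of annual per-title JR1 full-text request totals.

def usage_concentration(title_uses, share=0.80):
    """Return the fraction of titles needed to reach `share` of total use."""
    totals = sorted(title_uses, reverse=True)   # heaviest-used titles first
    target = share * sum(totals)
    running = 0
    for i, n in enumerate(totals, start=1):
        running += n
        if running >= target:
            return i / len(totals)
    return 1.0

# Hypothetical JR1 annual totals per title:
uses = [5000, 3000, 1200, 400, 200, 100, 50, 30, 15, 5]
print(f"{usage_concentration(uses):.0%} of titles cover 80% of use")
```

Run against a real JR1 file, this is the calculation behind statements like "80% of use came from 20-23% of titles."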
  7. 7. Leveraging the techniques we learned from this study – going beyond Elsevier  What kinds of data sources are there and what are their limitations?  Can we use these data (even though imperfect) to make collection decisions?  Can we use these data (even though imperfect) to show library impact and value?
  8. 8. Sucked into the universe of usage data… National Geographic Society/Red Vision/C4 Studios/Pioneer Productions Retrieved Oct. 28, 2013, from
  9. 9. Why collect all this data(1)? Collection Development in one slide: The Universe of Stuff Have it? Don’t have it? Keep it? Get it? Data= (maybe ≠…) how do we decide?
  10. 10. Why collect all this data(2)? “Library Value”: The Universe of Stuff Have it? Don’t have it? Keep it? Get it? Promote stuff and services related to it.
  11. 11. Aaberg, Jason. (2006, March 25). Bullet. Stock Xchng. Retrieved Sept 28, 2013, from
  12. 12. Every bit of data has *some* impact (and some drawback). Factors bearing on "an important journal title!":  Impact Factor (or other measure)  Outputs  Program/curriculum/subject needs  Licensing  Purchase requests from users  Use (JR1, 1a)  Budget / cost  Potential use? (turnaways, JR2)  ILL activity
  13. 13. Two examples of data mining and use  Impact factors - top holdings related to liaison assignments  JR1/1a usage and issues – the dilemmas of too much information Audience and purpose!
  14. 14. Impact factors - top holdings related to liaison assignments • Publishing and citation data (and by extension, the use of impact factors) (Haddow, 2007) can also be used to influence collection decisions (again inasmuch as there is the ability to swap titles in and out of packages, and given the limitations of IF (e.g. EASE, 2012)). • IF is a known quantity that is more familiar to library users than other measures, and is commonly touted by vendors and publishers. Many journal landing pages prominently show their IF. • Libraries do have a role in showing how to appropriately use the IF and other bibliometric data (e.g. Emory libguide) • Even with large packages, there are still high impact titles that are outliers – recognizing these gaps and demonstrating current coverage is part of showing library value and meeting the needs of end users. • As this analysis extends to all of the liaison areas, a clearer picture will emerge of collection strengths and needs. 1. Haddow G. Evidence Summary: Level 1 COUNTER Compliant Vendor Statistics are a Reliable Measure of Journal Usage. Evidence Based Library and Information Practice , 2007, 2(2). 2. European Association of Science Editors. The EASE Statement on Inappropriate Use of Impact Factors. 2013. Retrieved from 3. Emory Libraries & Information Technology, Robert W. Woodruff Library. Impact Factors and Citation Analysis [Libguide] 2013. Retrieved from
  15. 15. JR1/1a usage and issues – the dilemmas of too much information • Use metrics have value, but only inasmuch as there is the ability to swap titles in and out of a package when use data dictates. We will talk about CPU data later. • The proper analysis of “use” is of greater concern, with the well-known issues of the COUNTER standards and their implementation having an impact on how useful this data can be. (Welker, 2012) - What is “use”? (Nicolson-Guest & Macdonald, 2013) and what is “cost per use”? (Harrington & Stovall, 2011) • pdf v. html and platform design (Bucknell, 2012) • Use data is still an easily demonstrated measure, with some manipulation. • Long tail (20% of titles accounting for 80% of use leaves 80% of titles with diminishing returns, even if none of them are truly “zero use”) • Backfile confusion (JR1 vs. 1a) – if the point is to use data to influence subscription decisions, then owned backfile use might not be part of the equation – this is debatable (Bucknell, 2012) 1. Welker J. Counting on COUNTER: The Current State of E-Resource Usage Data in Libraries. Computers in Libraries, 2012, 32(9). Retrieved from 2. Nicolson-Guest B.; Macdonald D. Are We Comparing Bananas and Gorillas? Interpreting Usage Statistics for Cost Benefit and Reporting. ALIA Information Online 2013 Conference Proceedings, 2013. Retrieved from 3. Harrington M.; Stovall C. Contextualizing and Interpreting Cost per Use for Electronic Journals. Proceedings of the Charleston Library Conference, 2011. 4. Bucknell T. Garbage In, Gospel Out: Twelve Reasons Why Librarians Should Not Accept Cost-per-Download Figures at Face Value. The Serials Librarian, 2012, 63(2), 192-212.
  16. 16. Example: Elsevier  JR1 “Number of Successful Full-Text Article Requests [SFTARs] by Month and Journal (COUNTER Required and Compliant) – current year”* - Aggregate usage (subscribed and backfiles) - Includes nulls - Reporting past 5 years of use (means title changes and add/drops reflected in data) *Description of the ScienceDirect Customer Usage Reports
  17. 17. Example: Elsevier  JR1a “Number of Successful Full-Text Article Requests [SFTARs] from an Archive by Month and Journal (COUNTER Required and Compliant)”* - Backfile usage only *Description of the ScienceDirect Customer Usage Reports
  18. 18. Example: Elsevier Biochemical systematics and ecology (0305-1978) -from 1973 to 1994 in ScienceDirect Agricultural & Biological Sciences Backfile and ScienceDirect Environmental Science Backfile -from 1974 to 1994 in ScienceDirect Biochemistry, Genetics & Molecular Biology Backfile -from 06/28/1974 to 2009 in ScienceDirect Journals Aggregate use (JR1): Backfile use (JR1a):
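Because JR1 aggregates subscribed and backfile use while JR1a reports backfile use only, a per-title subtraction gives a rough estimate of current-subscription use (relevant when backfile access is owned outright, per the "backfile confusion" point above). A minimal sketch with hypothetical counts:

```python
# Sketch: estimate current-subscription use per title as JR1 minus JR1a.
# Title names and counts below are hypothetical illustrations.

jr1 = {"Biochemical Systematics and Ecology": 420,   # aggregate use (JR1)
       "Hypothetical Journal A": 150}
jr1a = {"Biochemical Systematics and Ecology": 130}  # backfile-only use (JR1a)

# Titles absent from JR1a had no backfile use reported.
subscribed_use = {title: total - jr1a.get(title, 0)
                  for title, total in jr1.items()}
print(subscribed_use)
```

Whether to exclude backfile use from subscription decisions is, as the slide notes, debatable; the subtraction just makes the two report scopes explicit.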
  20. 20. Publishers by the Numbers: Where PSU Authors Publish and What They Cite Nan Butkovich, Associate Librarian and Head, Physical & Mathematical Sciences Library, The Pennsylvania State University, University Park, PA
  22. 22. SOURCE DATA
  23. 23. Where do PSU authors publish?  Searched Web of Science (Source Searches)  Arts & Humanities Citation Index, Social Science Citation Index, Science Citation Index  “Penn State” in the address field  Searched for articles published in 2011  Looked at Publisher (PU) field  6,928 articles in 2011
  24. 24. And the #1 publisher choice for PSU authors is… #1 Elsevier (and its various subsidiaries) 2011: 1,290 of 6,928 articles (18.6%)
  25. 25. What about the competition? #2: Wiley (and subsidiaries)  2011: 774 articles (11.2%) #3: Springer (and subsidiaries)  2011: 453 articles (6.5%) #4: American Chemical Society  2011: 401 articles (5.8%)
  26. 26. The “take away”…  Elsevier published almost as many articles by PSU authors as the #2, #3, and #4 publishers… COMBINED!  And together, the four publishers accounted for 42.1% of all articles published by PSU authors in 2011. WOW!
  28. 28. Citation analysis: Where do the numbers come from?  Citation analysis has a long and rich history in collection development and management – the first study was done in 1927!1  Cited references were from papers with PSU authors that were published in 2011 and indexed in Web of Science  Citation data for each of the 3 main citation indexes examined separately  JR1 files (from Serials Solutions) provided the lists of titles for the four publishers included in this part of the study 1 Gross, P. L. K.; Gross, E. M. College Libraries and Chemical Education. Science 1927, 66 (1713), 385-389.
  29. 29. Caveat: These data are biased  The four publishers that were selected for this phase of the study are all heavily oriented to the STEM disciplines  The Web of Science database is also biased toward STEM disciplines  It indexes journal articles, and STEM researchers publish in journals more often than do researchers in non-STEM disciplines  While Web of Science now publishes book citation indexes, they are separate databases and were not included in this study  Penn State is a land-grant institution with a heavy STEM focus
  30. 30. Challenges  Web of Science data required significant cleaning before use  Manual extraction of citation data from records  Lack of consistency in journal abbreviations (one title can have several abbreviations)  JR1 files  No easy way to compare list of full titles to the list of cited journal abbreviations  Had to be manually cleaned to be useful  Confidentiality clauses in licenses  Had to aggregate the data from the four publishers rather than show data for individual publishers (sorry about that)
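The abbreviation-matching challenge above (comparing JR1 full titles against Web of Science journal abbreviations) can be partially automated before manual review. A minimal sketch, assuming a simple word-by-word prefix heuristic (the titles, stop-word list, and cutoff are illustrative; the study's actual clean-up was manual):

```python
# Sketch: match cited journal abbreviations against full JR1 titles by
# treating each abbreviation word as a prefix of the corresponding title word.

STOP_WORDS = {"of", "the", "and", "for"}  # assumed stop-word list

def matches(abbrev, title):
    """True if each abbreviation word is a prefix of a content word in the title."""
    t_words = [w.lower() for w in title.split() if w.lower() not in STOP_WORDS]
    a_words = [w.lower().rstrip(".") for w in abbrev.split()]
    return len(a_words) == len(t_words) and all(
        t.startswith(a) for a, t in zip(a_words, t_words))

def best_match(abbrev, titles):
    """Return the first full title matching the abbreviation, else None."""
    hits = [t for t in titles if matches(abbrev, t)]
    return hits[0] if hits else None

full_titles = ["Journal of the American Chemical Society",
               "Biochemical Systematics and Ecology"]
for a in ["J AM CHEM SOC", "BIOCHEM SYST ECOL"]:
    print(a, "->", best_match(a, full_titles))
```

A heuristic like this only shrinks the pile; ambiguous abbreviations (one title with several abbreviations, as noted above) still need eyes on them.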
  31. 31. Most of the clean-up and calculations were done the “old-fashioned” way…
  32. 32. Terminology: Cited titles  These are specific titles that were cited  The counting unit is the journal itself  Each journal is counted once
  33. 33. Journals by the numbers  American Chemical Society – 51 titles  Elsevier (and subsidiaries) – 1,777 titles  Springer (and subsidiaries) – 1,534 titles  Wiley (and subsidiaries) – 1,471 titles  Total number of titles used in this study (n) – 4,833
  34. 34. Cited titles – How many were cited?  Our collection n=4,833 titles  Of these,  3,169 were cited in the 2011 Science Citation Index (65.6%)  1,430 were cited in the 2011 Social Science Citation Index (29.6%)  205 were cited in the Arts & Humanities Citation Index (4.2%)
  35. 35. Cited titles – How many were unique to one citation index?  n = 4,833 titles  1,789 were cited only in Science Citation Index (37.0%)  298 were cited only in Social Science Citation Index (6.2%)  20 were cited only in Arts & Humanities Citation Index (0.4%)  1,286 were not cited at all (26.6%)
  36. 36. The “take-away”…  In 2011…  PSU authors cited 3,547 titles out of 4,833 currently subscribed titles from these four publishers  That’s 73.4%!  Most of these were cited in STEM publications
  37. 37. Terminology: Cited references  These are the references that were cited by PSU authors in their publications  The counting unit is the publication that was cited  Publications are counted each time that they are cited
  38. 38. Cited references: What do PSU authors cite?  Data from Web of Science  2011 data for all publishers, all publications indexed in Web of Science  Science Citation Index  183,393 publications cited by PSU authors  Social Science Citation Index  58,802 publications cited by PSU authors  Arts & Humanities Citation Index  6,992 publications cited by PSU authors
  39. 39. Cited references: The Big Four Science Citation Index  63,572 citations (the Big Four) out of 183,393 citations  34.7% Social Science Citation Index  14,364 citations out of 58,802 citations  24.4% Arts & Humanities Citation Index  568 citations out of 6,992 citations  8.1%
  40. 40. The “take-away”…  78,504 out of 249,187 publications cited by PSU authors in 2011 were published by one of these four publishers…  That’s 31.5%!  Most citations were of STEM publications (no surprise there)
  41. 41. Deeper questions  So far, these citation data highlight the importance of these four publishers in STEM publishing  They can be useful for collection development purposes, but what else can you do with the citation data?  Things that I’ve wondered about…  What is the cost/use of these publications?  How many articles does a researcher read for every article that he or she cites?  Both can be calculated using JR1 data and citation data
  42. 42. Cost per use  This measure has been around since long before the e-journal was a gleam in some publisher’s eye  Now it’s usually cost/view, but the goal is the same: Determine how much the institution pays each time a patron opens a document  At some magic number it becomes cheaper to cancel a publication and get the desired articles through ILL or document delivery rather than subscribe to the publication  It works for e-journal packages too…
  43. 43. Penn State’s cost/view  Combined expenditure: $4,749,866.67  Total number of views (the Big Four): 1,790,333  Cost/view (use) = $4,749,866.67 ÷ 1,790,333 = $2.65
  44. 44. Articles viewed per citation  Citations are just the end product… how many articles are viewed to get the one citation?  There are a number of papers that have examined the number of articles read by researchers… most notably those of Tenopir & King 1  Information on reads (views) per citation is sparse 2  Although the method works, the specific values will vary according to what is being measured 1 For example, King, D. W.; Tenopir, C.; Clarke, M. Measuring Total Reading of Journal Articles. D-Lib Magazine 2006, 12 (10), 2 Kurtz, M. J.; Eichhorn, G.; Accomazzi, A.; Grant, C.; Demleitner, M.; Murray, S. S.; Martimbeau, N.; Elwell, B. The Bibliometric Properties of Article Readership Information. Journal of the American Society for Information Science and Technology 2005, 56 (2), 111-128.
  45. 45. Penn State’s articles viewed/citation  Number of items viewed (for Penn State’s Big Four publishers): 1,790,333  Number of cited references (to Big Four journals by Penn Staters): 78,504  Views/citation = 1,790,333 ÷ 78,504 = 22.8
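And the views-per-citation ratio, again using the slide's own numbers:

```python
# Views per citation = total JR1 views / cited references to Big Four journals
views = 1_790_333    # items viewed, Big Four publishers (from the slide)
citations = 78_504   # cited references to Big Four journals (from the slide)
print(f"views per citation = {views / citations:.1f}")  # → views per citation = 22.8
```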
  46. 46. Thoughts about the process  The results of citation studies like this will vary depending on the sources of the data used in the study  Looking at publisher-level citation data can be useful, given that so many publishers bundle journals into packages  It is easier to identify top-level publishers than those in the bottom tier  Models for evaluating adds/drops will be highly variable, depending on local needs and the weights assigned to each of the variables, but the simple decisions have already been made. New models and approaches will be needed, and gathering the requisite data will require careful thought about workflow and the design of analysis tools.
  47. 47. Thank you!