Good afternoon everyone, or good morning to those of us on the West Coast. Thanks to Peter and Peter for great presentations, which make my job that much harder. I’d like to start by giving everyone a little background on me: I’m the library director at the Claremont Colleges in California, and I’ve focused most of my professional research on understanding academic information usage behaviors through statistical and quantitative analysis. I’ve studied online journal usage through OpenURL logs and am currently working on understanding print and ebook usage in an academic setting. Today I’m not presenting my own research, since I’ve already presented my most recent projects elsewhere. Instead, I’d like to talk about the importance of doing usage data research and the applicability of that research in practice.
As evidenced by the participation here today, there’s currently intense interest in studying usage data. We now have the ability to measure information usage actions, including downloads and citations of articles, and to measure the relationships between resources, for example between a database search and the objects used in the result sets. In addition, the average librarian now has the computing power to analyze these datasets, and thanks to efforts like COUNTER, there are mechanisms for delivering usage data to us and standards for how that information is provided and distributed. And finally, we as librarians are now very interested in our return on investment as we seek to rationalize our decisions and justify our value to our organizations.
I feel that while we’re still at a nascent stage of usage data research, it’s an exciting time because there are endless possibilities for what data is collected and how to craft a research project around specific data sets. We’ve had ISI citation data for a long time, but now we can match that with things like COUNTER reports of journal, database, or ebook usage. Many publishers provide robust and interesting reports that may not be COUNTER compliant but are useful data for analysis. We also have web server logs, proxy logs, openURL logs at our disposal, and even have third party non-library software that we can use to understand our users like Google Analytics.
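To make this concrete, here is a minimal sketch of working with report data like this: it totals monthly downloads per journal from a simplified, JR1-style CSV. The layout and journal names are invented for illustration; real COUNTER reports carry additional header rows and metadata columns.

```python
import csv
import io

# A simplified, hypothetical JR1-style report: journal title plus
# monthly full-text download counts. Real COUNTER reports include
# more header rows and metadata columns than shown here.
report = io.StringIO(
    "Journal,Jan,Feb,Mar\n"
    "Journal of Examples,120,95,143\n"
    "Annals of Samples,40,55,38\n"
)

totals = {}
for row in csv.DictReader(report):
    title = row.pop("Journal")
    totals[title] = sum(int(v) for v in row.values())

# Rank titles by total downloads, highest first.
for title, total in sorted(totals.items(), key=lambda t: -t[1]):
    print(title, total)
```

The same accumulation pattern applies whether the rows come from a vendor report, a proxy log summary, or an OpenURL resolver export.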
There are basically two types of usage data research that are important for librarians to understand and/or perform. There are the more theoretical, bibliometric-based studies, and then there are practitioner-based studies that help us put theory into practice and address specific issues within our institutions. Three theoretical studies I’d like to highlight today are Bollen’s study of journal centrality measures, Rosvall & Bergstrom’s journal maps based on citation data, and Phil Davis’s studies on open access citation rates.
Bollen and his colleagues have been working on advanced statistical studies of citation and usage. This study tested 39 journal measures of ‘impact’, some very standard, like the ISI Impact Factor, and others developed in-house from the OpenURL server logs at their institution. What they found that is important for the practitioner is that usage-based measures do not correlate with citation-based measures, indicating that usage and citation are different events.
This figure is from that study, and you’ll see that the usage-based measures are clustered closely together but far away from the citation-based measures, while the citation-based measures are not clustered closely even among themselves. This study is also interesting for the statistical test used: the Spearman rank-order correlation, which accommodates the non-linear relationships in the datasets involved.
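The Spearman approach can be illustrated with a small pure-Python sketch: rank each series, then compute Pearson’s r on the ranks. The journal measure values below are invented, not values from the Bollen study, and a statistics package would normally provide this function.

```python
# Spearman rank-order correlation, sketched in pure Python.
def ranks(xs):
    # 1-based ranks, with tied values given their average rank.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def spearman(a, b):
    # Spearman's rho is Pearson's r applied to the ranks.
    return pearson(ranks(a), ranks(b))

impact_factor = [3.2, 1.1, 5.6, 0.8, 2.4]    # hypothetical citation-based measure
downloads     = [900, 1500, 400, 1200, 700]  # hypothetical usage-based measure
print(round(spearman(impact_factor, downloads), 3))  # → -0.8
```

Because it works on ranks rather than raw values, Spearman’s rho only assumes a monotonic relationship, which is why it suits skewed usage distributions better than Pearson’s r on the raw counts.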
Now turning to another study by Rosvall & Bergstrom that used citation data to draw scientific maps of relationships. Their study found that most basic science fields have bidirectional relationship with each other, but many applied fields cite journals in the basic sciences without the same reciprocal relationship.
Here’s a map from their article illustrating this relationship. The fields of Medicine and Molecular & Cell Biology cite each other heavily and equally, while the applied field of Ecology & Evolution cites Molecular & Cell Biology heavily but is not cited by that field to the same degree. I find this fascinating and would like to see this study repeated with usage data, at either the global or the local level.
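The asymmetry they describe can be sketched with a toy field-to-field citation matrix; the field pairs echo the examples above, but the counts are invented and are not taken from the Rosvall & Bergstrom data.

```python
# Hypothetical counts of citations from one field's journals to another's.
citations = {
    ("Medicine", "Molecular & Cell Biology"): 950,
    ("Molecular & Cell Biology", "Medicine"): 900,
    ("Ecology & Evolution", "Molecular & Cell Biology"): 800,
    ("Molecular & Cell Biology", "Ecology & Evolution"): 90,
}

def reciprocity(a, b):
    """Ratio of citations a->b versus b->a; near 1.0 means bidirectional."""
    return citations.get((a, b), 0) / max(citations.get((b, a), 0), 1)

# Basic-science pair: roughly balanced. Applied-to-basic pair: one-sided.
print(reciprocity("Medicine", "Molecular & Cell Biology"))
print(reciprocity("Ecology & Evolution", "Molecular & Cell Biology"))
```

Swapping citation counts for download or link-resolver counts in the same matrix is exactly the usage-data replication suggested above.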
Davis studied 11 journals in which open access status was randomly assigned to articles. He found no evidence that OA articles accumulated more citations than paid-access articles.
And again, this study is interesting because we could use usage data to repeat it and find out whether citers and readers exhibit different behaviors based on the cost of an article. The linear regression model that he developed is also interesting, as you can see the complexity of the relationships among the independent variables that were studied.
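Davis’s actual model is multivariate and considerably more complex; purely to show the mechanics of regressing a citation count on a usage measure, here is a minimal one-predictor ordinary-least-squares sketch on invented data.

```python
# Closed-form simple linear regression (one predictor), pure Python.
def ols(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

article_downloads = [100, 200, 300, 400]  # hypothetical usage measure
citation_counts   = [2, 5, 7, 10]         # hypothetical citation counts
slope, intercept = ols(article_downloads, citation_counts)
print(slope, intercept)
```

A real replication would add the other independent variables Davis controlled for (a multivariate model), but the fitting principle, minimizing squared residuals, is the same.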
Now on to the practitioner articles. The following articles all come from the most recent issue of the Journal of Electronic Resources Librarianship, which I edited. Betty used Google Analytics to study online tutorials, Grigson studied usage data from different ebook vendors to determine the best business model for their institution, and Kinman used a novel application of Edward Tufte’s sparklines to graphically display usage data.
Betty’s study involved installing Google Analytics to track the usage of his department’s web-based tutorials for library instruction. The study is fascinating both because there isn’t yet a large corpus of research on the use of Google Analytics in the library environment and because of the results he found.
Those results showed that for one tutorial, a significant portion of the hits came from an unintended audience, giving him the opportunity to look into ways to mitigate that in the future. He also found high hit rates at the beginning and end of tutorials, suggesting that content in the middle sections was being skipped.
NISO Webinar on Usage Data: An Overview of Recent Usage Data Research
<ul><li>NISO Webinar on Usage Data </li></ul><ul><li>An Overview of Recent Usage Data Research </li></ul><ul><li>John McDonald </li></ul><ul><li>Libraries, Claremont University Consortium </li></ul><ul><li>May 13, 2009 </li></ul>
Increased Interest in Usage Data <ul><li>Ability to measure actions </li></ul><ul><ul><li>Usage </li></ul></ul><ul><ul><li>Citation </li></ul></ul><ul><ul><li>Relationships between resources </li></ul></ul><ul><li>Ability to analyze large datasets </li></ul><ul><ul><li>Computational power </li></ul></ul><ul><ul><li>Data provided directly to librarians </li></ul></ul><ul><ul><li>Standards for data and distribution </li></ul></ul><ul><li>Ability to demonstrate return on investment </li></ul><ul><ul><li>Management data </li></ul></ul><ul><ul><li>Collections data </li></ul></ul>
New Ways to Collect Usage Data <ul><ul><li>ISI Citation Data </li></ul></ul><ul><ul><li>COUNTER reports </li></ul></ul><ul><ul><li>Publisher provided data </li></ul></ul><ul><ul><li>Web server logs </li></ul></ul><ul><ul><li>Proxy server logs </li></ul></ul><ul><ul><li>OpenURL resolver logs </li></ul></ul><ul><ul><li>Google Analytics </li></ul></ul>
Theoretical Analysis of Usage Data <ul><ul><li>Bollen’s Centrality Measures </li></ul></ul><ul><ul><li>Rosvall & Bergstrom’s Scientific Communication Maps </li></ul></ul><ul><ul><li>Davis’ Open Access studies </li></ul></ul>
Citation and Usage Data Measures Bollen, Van de Sompel, Hagberg, Chute (2009). A principal component analysis of 39 scientific impact measures. arXiv . Available: http://arxiv.org/PS_cache/arxiv/pdf/0902/0902.2183v1.pdf <ul><li>A study of 39 journal measures, both standard bibliographic measures derived from citation and other measures derived from usage. </li></ul><ul><li>Outcomes included that citation and usage are distinctly different events and measures based on them do not correlate closely. </li></ul>
Figure 2 from Bollen et al. (2009): usage-based measures vs. citation-based measures.
Illustration of Citation Networks Rosvall & Bergstrom (2008). Maps of random walks on complex networks reveal community structure. PNAS Available: http://octavia.zoology.washington.edu/publications/RosvallAndBergstrom08.pdf <ul><li>A scientific map of the citation relationships between 6000+ ISI-indexed journals. </li></ul><ul><li>Outcomes indicate that many basic science fields have bidirectional relationships with other fields, while most applied fields have uni-directional relationships with the basic science fields. </li></ul>
Analysis of Open Access citations Davis (2008). Author-choice open access publishing in the biological and medical literature: a citation analysis. arXiv . Available: http://arxiv.org/PS_cache/arxiv/pdf/0808/0808.2428v3.pdf <ul><li>A study of 11 journals where open access status was assigned randomly to articles to determine the citation advantage for OA articles. </li></ul><ul><li>Outcomes included that OA articles were not more likely to accumulate citations than paid access articles. </li></ul>
Evidence Based Analysis of Usage Data <ul><li>Betty’s Google Analytics of Local Content </li></ul><ul><li>Grigson’s Analysis of eBook Models </li></ul><ul><li>Kinman’s Use of Sparklines </li></ul>
Analyzing Local Usage Data <ul><li>Betty (2009). Assessing Homegrown Library Collections: Using Google Analytics to Track Use of Screencasts and Flash-Based Learning Objects . Journal of Electronic Resources Librarianship. Volume 21:1, 75 – 92. </li></ul><ul><li>A study utilizing Google Analytics to track the use of web-based tutorials for library instruction. </li></ul><ul><li>Outcomes included information about the total hits to each tutorial, usage throughout a tutorial, connection speed, browser software components </li></ul>
Betty’s Tutorials Usage Results <ul><li>One tutorial had 23% of its hits recorded by an unintended audience </li></ul><ul><ul><li>Possible action: Better marketing/description of the content </li></ul></ul><ul><li>High hits for beginning & end of tutorials </li></ul><ul><ul><li>Possible action: Shorten or revise content in areas being skipped </li></ul></ul><ul><li>Most users had necessary software to view files </li></ul><ul><ul><li>Possible action: None needed </li></ul></ul><ul><li>A significant minority of users had dial-up access </li></ul><ul><ul><li>Possible action: Produce multiple versions </li></ul></ul>
Evaluating eBook Usage Data <ul><li>Grigson (2009). Evaluating Business Models for E-Books Through Usage Data Analysis: A Case Study from the University of Westminster . Journal of Electronic Resources Librarianship. Volume 21:1, 62-74. </li></ul><ul><li>A study comparing usage of ebook packages provided by vendors with different acquisitions models (simultaneous users v. annual usage) </li></ul><ul><li>Outcomes resulted in a clearly preferred model focusing on annual usage to accommodate the high peaks of usage during academic semesters. </li></ul>
Table 1 from Grigson’s eBook usage study indicates peak-use periods of high demand for portions of the collections
Evaluating Ebook Usage Data <ul><li>Kinman (2008). Putting the Trees Back in the Forest: E-Resource Usage Statistics and Library Assessment. ER&L, March 18-21, 2008, Atlanta, GA. https://smartech.gatech.edu/bitstream/1853/20665/1/forest_trees_kinman.pdf </li></ul><ul><li>A description of a 5-year study of library services and resource usage, including a novel application of Tufte’s Sparklines: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR </li></ul>
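Kinman’s sparkline idea can be approximated in a few lines. This sketch maps a hypothetical series of monthly usage counts onto Unicode block characters, one common way to render text sparklines so many titles can be scanned in a compact table; it is an illustration of the technique, not Kinman’s exact implementation.

```python
# Word-sized graphic in the spirit of Tufte's sparklines.
BARS = "▁▂▃▄▅▆▇█"

def sparkline(counts):
    # Scale each count into the eight available bar heights.
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero for flat series
    return "".join(BARS[(c - lo) * (len(BARS) - 1) // span] for c in counts)

monthly_usage = [3, 8, 14, 30, 22, 9, 4, 2, 6, 18, 25, 12]  # hypothetical
print(sparkline(monthly_usage))  # one 12-character line per title
```

Printed next to each title in a list of e-resources, a line like this shows a year’s usage shape, including the semester peaks Grigson’s data highlights, without a full chart per title.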
Future Directions for Usage Data Analysis <ul><li>Auditing or compliance with standards </li></ul><ul><li>Non-text media (eBooks, podcasts, etc.) </li></ul><ul><li>Non-text subjects (i.e. Museums, Art) </li></ul><ul><li>More robust database analysis </li></ul><ul><li>Development of user-centered statistical standards </li></ul><ul><ul><li>Develop standard measures and standard tests to help in evaluation </li></ul></ul>
Thank you! Comments: John McDonald Libraries, Claremont University Consortium [email_address]