
COUNTER Standards for Open Access: the Value of Measuring/the Measuring of Value



Presentation given by Joseph Greene, Research Repository Librarian at University College Dublin Library, at the LIBER 2017 Conference in Patras, Greece, on July 6, 2017.

Published in: Education


  1. UCD Library, University College Dublin, Belfield, Dublin 4, Ireland
     Leabharlann UCD, An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Eire
     COUNTER standards for Open Access: The value of measuring/the measuring of value
     LIBER 2017, Patras, 6 July
     Joseph Greene, Research Repository Librarian, University College Dublin
  2. Introduction: Defining success, defining value
  3. Call: define success
     “…[we] too often conflate several rather different objectives for transforming scholarly communications...”
  4. First principles: BOAI 2002
     “…the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it...”
  5. Measure distribution
     Tipping point: in 2014, more than 50% of recent papers (2011-2013) were found to be Open Access.
     Archambault, E. et al. (2014). Proportion of Open Access Papers Published in Peer-Reviewed Journals at the European and World Levels: 1996–2013 (41p.). Produced for the European Commission DG Research & Innovation.
  6. Measuring ‘free and unrestricted access’: Defining value, measuring value
  7. OA Citation advantage
     • At least 40 separate studies show that Open Access increases citations [1,2]
     • Wide variations between disciplines
     • 35% increase in mathematics [2]
     • 500% increase in citations in physics/astronomy [2]
     • Most recent study: 3.3 million papers [3]
     • Average: OA = 50% more citations
     • (Green is overall the better strategy)
     [1] Wagner, B. (2010) ‘Open Access Citation Advantage: An Annotated Bibliography’. DOI: 10.5062/F4Q81B0W
     [2] Swan, A. (2010) ‘The Open Access citation advantage: Studies and results to date’.
     [3] Archambault, E. (2016) ‘Research impact of paywalled versus open access papers’.
  8. OACitationAdvantage {
       if (papers_are_OA) {
         papers_are_accessible = true;
         citationAdvantage();
       }
     }
     citationAdvantage {
       if (papers_are_accessible) {
         ++papers_read;
         ++chance_of_citation;
       }
     }
  9. Measuring access: Usage data as metric
  10. BOAI15
      'Means should exist that will permit having some idea of the value and quality of each document, for example, a number of metrics having to do with views, downloads, comments, corrections'
      Guédon, Jean-Claude (2017-02). Open Access: Toward the Internet of the Mind. access-toward-the-internet-of-the-mind
  11. European Commission
      'Usage metrics are highly relevant for open-science'
      Recommend 'making better use of existing metrics for open science' including usage metrics
      Directorate-General for Research and Innovation (2017-03). Next-generation metrics: Responsible metrics and evaluation for open science. DOI: 10.2777/337729
  12. Coalition for Networked Information
      'Researchers and librarians at several universities are working to make analytics on use of items in IRs more reliable'
      But 'statistics generated by the systems are poor and do not demonstrate impact'
      CNI Executive Roundtable (2017-04). Rethinking Institutional Repository Strategies. institutional-repository-strategies
  13. OA usage statistics
  14. Usage data are not perfect
      • Up to 85% of OA repository downloads come from non-human agents [1]
      • At least 40% of OA journal downloads are not human [2]
      • Even with robot detection, there is room for improvement [3]
      • DSpace stats: 62% human
      • EPrints stats: 55% human
      • U. Minho DSpace stats: 59-73% human
      [1] Greene, J. (2016) 'Web robot detection in scholarly Open Access institutional repositories'. Library Hi Tech, 34 (3):500-520
      [2] Huntington, P., Nicholas, D., & Jamali, H. R. (2008). Web robot detection in the scholarly information environment. Journal of Information Science, 34(5), 726-741
      [3] Greene, J. (2016) 'How Accurate are IR Usage Statistics?'. Open Repositories (OR2016) Dublin, 13-16 June 2016
  15. Creating standards: Raw data to empirical knowledge
  16. Problems
      • Many ways to do robot detection
        (At least 23 in the literature, not to mention combinations)
      • Nothing resembling a standard available
      • Cross-platform comparison and aggregation impossible
  17. Addressing the problem
      • COUNTER Robots Working Group
      • Joseph Greene, UCD, RIAN (chair)
      • Lorraine Estelle, Project COUNTER
      • Paul Needham, IRUS-UK/COUNTER
      • Representatives from EBSCO, Elsevier, Wiley, ScholarlyIQ, DSpace, EPrints, DigitalCommons, OpenAIRE, Base Bielefeld and Open Journal Systems
      “…to devise ‘adaptive filtering systems’ that will allow publishers/repositories/services to follow a common set of rules to dynamically identify and filter out unusual usage and robot activity”
  18. Usage data sources
      • Source: Bielefeld/OJS (x3). Lines: 233,000
      • Source: IRUS-UK (97 IRs). Lines: 1.9 million
      • Source: Wiley. Lines: several million
      • File formats: .csv, .csv, .txt
      • PostgreSQL database: several million rows
      • Period: 3-9 October 2016
  19. Robot detection
      • Simple random sample taken
      • 202-204 downloads for each dataset
      • 95% certainty
      • 12 syntactic variables from SQL queries or added manually
        E.g. IP address, agent, IP owner
      • 12-13 behavioural variables added using SQL queries or API calls
        E.g. number of downloads by user, number of items downloaded, dates/times seen
  20. Self-declared robots
      • Mozilla/5.0 (compatible; Baiduspider/2.0; +
      • Mozilla/5.0 (compatible; Googlebot/2.1; +
      • Mozilla/5.0 (compatible; spbot/5.0.3; + )
      • Sogou web spider/4.0(+
      • gsa-crawler(Enterprise; T4-BLNCV2FADUSTW;
      • Mozilla/5.0 (compatible; YandexBot/3.0; +
      • RePEc link checker (
      • Jakarta Commons-HttpClient/3.0.1
      • Betsie
  21. Undeclared but obvious behaviour
  22. Testing filters
      • Test existing COUNTER robots list
      • Test existing COUNTER double-click filter
      • Rate of requests
      • Volume of requests
      • User agents per IP address
      • Requests where requested item = referring URL
  23. Testing filters
      • Simulate a set of filters on the datasets
      • Assign true/false positives, true/false negatives compared with manual determination
      • Calculate:
        Recall, precision (excluded stats)
        Inverse recall, inverse precision (reported stats)
      • Find best combination of filters, balance of practicality and accuracy
  24. Results: COUNTER Code of Practice Release 5, 2017
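
The filter evaluation outlined on slides 22 and 23 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the working group's actual method: the `Download` record, the `ROBOT_TOKENS` substring filter, and the sample data (including the made-up `ExampleBotanyReader` agent) are all hypothetical. A robot flagged by the filter counts as a positive, so recall and precision describe the excluded downloads, while inverse recall and inverse precision describe the downloads that remain in the reported statistics.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Download:
    user_agent: str
    is_robot: bool  # manual determination, treated here as ground truth


# Hypothetical filter: flag any user agent containing one of these tokens.
ROBOT_TOKENS = ("bot", "spider", "crawler")


def token_filter(download: Download) -> bool:
    ua = download.user_agent.lower()
    return any(token in ua for token in ROBOT_TOKENS)


def confusion_counts(downloads: Iterable[Download],
                     is_flagged: Callable[[Download], bool]):
    """Return (tp, fp, tn, fn), with 'positive' meaning 'flagged as robot'."""
    tp = fp = tn = fn = 0
    for d in downloads:
        flagged = is_flagged(d)
        if flagged and d.is_robot:
            tp += 1
        elif flagged:
            fp += 1
        elif d.is_robot:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def ratio(num: int, den: int) -> Optional[float]:
    # None when the denominator is zero (metric undefined for this sample).
    return num / den if den else None


def evaluate(downloads: Iterable[Download],
             is_flagged: Callable[[Download], bool]) -> dict:
    tp, fp, tn, fn = confusion_counts(downloads, is_flagged)
    return {
        "recall": ratio(tp, tp + fn),             # robots caught / all robots
        "precision": ratio(tp, tp + fp),          # robots caught / all flagged
        "inverse_recall": ratio(tn, tn + fp),     # humans kept / all humans
        "inverse_precision": ratio(tn, tn + fn),  # humans kept / all kept
    }


# Hypothetical, simplified sample; real logs carry far more variables.
SAMPLE = [
    Download("Mozilla/5.0 (compatible; Googlebot/2.1)", True),
    Download("Mozilla/5.0 (compatible; Baiduspider/2.0)", True),
    Download("Jakarta Commons-HttpClient/3.0.1", True),  # missed by the filter
    Download("ExampleBotanyReader/1.0", False),          # wrongly flagged
    Download("Mozilla/5.0 (Windows NT 10.0; rv:50.0) Gecko/20100101 Firefox/50.0", False),
    Download("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) Safari/602.1", False),
    Download("Mozilla/5.0 (X11; Linux x86_64) Chrome/54.0", False),
]

if __name__ == "__main__":
    for name, value in evaluate(SAMPLE, token_filter).items():
        print(name, value)
```

Comparing these four figures across candidate filters, or combinations of them, is one way to weigh accuracy against practicality: a filter with high recall but low inverse recall removes many robots at the cost of discarding genuine human downloads.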