A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers
1. A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers Adam Chandler Cornell University Library Cornell University Library, Metadata Working Group Forum 16 October 2009
3. OpenURL model cont. incoming OpenURL http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/ in our knowledge base? title: Library hi tech issn: 0737-8831 start date: 19970101 end date: link-to syntax for Emerald http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
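The resolution step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the link resolver's actual implementation: the OpenURL is abbreviated from the one above, and the mapping from `#@NAME#` placeholders to OpenURL keys (`PLACEHOLDER_TO_KEY`) is an assumption made for the example.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Abbreviated version of the incoming OpenURL shown on the slide.
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "url_ver=z39.88-2004&rft.genre=article&rft.issn=0737-8831"
    "&rft.date=2009&rft.volume=27&rft.issue=1&rft.spage=151&rft.epage=162"
    "&rft.aulast=merk&rft.au=scholze,+f&rft.au=windisch,+n"
)

# Link-to syntax for Emerald, copied from the slide.
LINK_TO = (
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker"
    "&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#"
)

# ASSUMPTION: which OpenURL key feeds each placeholder. The real
# resolver's substitution rules are not documented in the slides.
PLACEHOLDER_TO_KEY = {
    "ISSN-HYPHEN": "rft.issn",
    "DATE": "rft.date",
    "VOLUME": "rft.volume",
    "ISSUE": "rft.issue",
    "SPAGE": "rft.spage",
}


def parse_openurl(url):
    """Return the key/encoded-value pairs as a dict of lists (keys may repeat)."""
    return parse_qs(urlsplit(url).query)


def fill_link_to(template, fields):
    """Replace each #@NAME# placeholder with the first matching OpenURL value."""
    def substitute(match):
        key = PLACEHOLDER_TO_KEY[match.group(1)]
        return fields[key][0]
    return re.sub(r"#@([A-Z-]+)#", substitute, template)


fields = parse_openurl(OPENURL)
target = fill_link_to(LINK_TO, fields)
print(fields["rft.au"])  # repeated keys are kept as a list
print(target)
```

If any element the template needs is missing from the incoming OpenURL, `fill_link_to` raises a `KeyError`, which is one concrete way metadata gaps at the origin turn into failed links.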
4. OpenURL is pervasive Cornell link resolver alone, July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests. 402,000 × 123 (ARL libraries) ≈ 49 million
5. Cornell’s top 10 OpenURL sources: Web of Knowledge; WorldCat Local; Google Scholar; Webfeat (our “Find Articles” service); EBSCOHost; OCLC FirstSearch; SilverPlatter; Weill Cornell Medical Center; SciFinder Scholar; PubMed
9. … but quality of experience is difficult to benchmark: wrong start or end date in the local library's holdings knowledge base (see NISO KBART); semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example); wrong link-to syntax in link resolver; fragile handling of incoming links by content provider
10. … but quality of experience is difficult to benchmark: inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles); subscription errors (especially with the start of a new calendar year); syntactically incorrect or missing metadata from the OpenURL origin
11. Literature review I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.
12. Wakimoto, Walker, and Dabbour (2006) Main finding: Users just expect full-text. When they do not get it they are disappointed. Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136
13. Wakimoto, Walker, and Dabbour (2006) "Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134) Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136
14. Blake and Knudson (2002) “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.” Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
15. Blake and Knudson (2002) “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.” Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
16. Blake and Knudson (2002) “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.” (NISO KBART role) Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
17. Blake and Knudson (2002) “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.” Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
18. Hughes (2004) Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community. Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
19. Hughes (2004) Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation. Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
20. Hughes (2004) Goals: establish a baseline against which future instances can be compared; provide assistance to data providers; evaluate a set of domain-grounded controlled vocabularies. Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
21. Hughes’ approach Each metadata record is scored from 0 to 10. There are two parts, a "Code Existence Score and an Element Absence Penalty," with weighting. The Code Existence Score is specific to the OLAC community's use of Dublin Core extensions. The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields. The absence of a core element results in a negative 0.2 penalty. Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
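To make the Element Absence Penalty concrete, here is a rough Python sketch of that one component only. The 0.2 figure comes from the slide; the list of core elements and the sample record are made up for illustration, and the Code Existence Score and the weighting that combine into the 0 to 10 score are not modeled because the slides do not spell them out.

```python
# HYPOTHETICAL core element list, chosen only for illustration; the
# elements that count as "core" in Hughes' scheme are not listed here.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")

# Penalty per missing core element, as quoted on the slide.
PENALTY_PER_MISSING = 0.2


def missing_core_elements(record):
    """Return the core elements that are absent or empty in the record."""
    return [name for name in CORE_ELEMENTS if not record.get(name)]


def element_absence_penalty(record):
    """Return the (negative) penalty for the record's missing core elements."""
    return -PENALTY_PER_MISSING * len(missing_core_elements(record))


# Made-up record for demonstration purposes.
record = {"title": "Field recordings", "date": "2004", "identifier": "example-1"}
print(missing_core_elements(record))
print(element_absence_penalty(record))
```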
22. Hughes’ approach From this simple approach, an array of metrics is derived: archive diversity; metadata quality; core elements per record; core element usage; code usage; code and element usage; star rating. From these metrics a score is computed for each metadata record, each archive, and the community as a whole. Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
23. Mellon-funded planning grant for L'Année philologique 1. Canonical Citation Linking: http://cwkb.org, in collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library. 2. OpenURL Quality: Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?
24. Key findings from 2008 Mellon OpenURL quality investigation Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.
25. Constant: Core elements used by content providers in their link-to targets: title - 64%; spage - 64%; volume - 61%; issue - 60%; date - 48%; aulast - 47%; issn - 35%; atitle - 35%; DOI - 14%; ISBN - 5%. Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
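One possible way to put these percentages to work is to weight each element by how often link-to targets rely on it and ask how much of that weight an incoming OpenURL covers. The sketch below is an illustration of that idea, not the scoring model from the talk; the function names and the use of the percentages as weights are assumptions.

```python
# Share of link-to targets using each element, from the slide (percent).
LINK_TO_USAGE = {
    "title": 64, "spage": 64, "volume": 61, "issue": 60, "date": 48,
    "aulast": 47, "issn": 35, "atitle": 35, "doi": 14, "isbn": 5,
}


def weighted_coverage(present):
    """Fraction of total link-to weight covered by the elements present.

    ASSUMPTION: using the usage percentages as weights is one simple
    choice among many; it is not taken from the source.
    """
    total = sum(LINK_TO_USAGE.values())
    covered = sum(w for name, w in LINK_TO_USAGE.items() if name in present)
    return covered / total


def missing_elements(present):
    """Core elements absent from the OpenURL, most-used first."""
    return [name for name in LINK_TO_USAGE if name not in present]


# Elements supplied by a journal-article OpenURL that carries a DOI but no ISBN.
article = {"title", "spage", "volume", "issue", "date",
           "aulast", "issn", "atitle", "doi"}
print(missing_elements(article))
print(round(weighted_coverage(article), 3))
```

A weighting like this penalizes a missing `spage` far more than a missing `isbn`, which matches the intuition that some omissions break many more targets than others.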
27. aulast First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith"
29. aulast_other examples: Ryan S Miller, Louise D Bryant, DAVID J MCKENZIE, %C4%90okovi%C4%87, Indu B Ahluwalia, Carreras-Sangr%c3%a0, Bautista-Casta%C3%B1o, O%27Shea, Melissa Ventura Marra, Guan XueYing%3B Yu Nan%3B ShangguanXiaoXia
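Values like these can be inspected after percent-decoding. The sketch below decodes a raw `aulast` value and flags two patterns visible in the examples: several names joined by semicolons, and a bare initial that suggests a full name rather than a family name. Both checks are rough heuristics assumed for illustration, not the classification used in the talk, and a full name without an initial (such as "Melissa Ventura Marra") passes unflagged because a family name may legitimately be more than one word.

```python
from urllib.parse import unquote


def decode_aulast(raw):
    """Percent-decode a raw aulast value (UTF-8 is assumed)."""
    return unquote(raw)


def aulast_warnings(raw):
    """Return heuristic warnings for a raw aulast value (may be empty)."""
    value = decode_aulast(raw)
    warnings = []
    if ";" in value:
        warnings.append("multiple names")
    # A one-letter token (optionally followed by a period) looks like an initial.
    if any(len(token.strip(".")) == 1 for token in value.split()):
        warnings.append("contains an initial")
    return warnings


for raw in ["%C4%90okovi%C4%87", "O%27Shea", "Ryan S Miller",
            "Guan XueYing%3B Yu Nan%3B ShangguanXiaoXia"]:
    print(decode_aulast(raw), aulast_warnings(raw))
```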
30. spage First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.
33. date The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.
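A syntactic check for this definition is straightforward to sketch. The function below accepts only the YYYY-MM-DD "Complete date" form and relies on Python's `datetime.date` to reject impossible days, including February 29 in non-leap years. The function name is hypothetical, and the check deliberately ignores any shorter date forms a resolver might also accept.

```python
import re
from datetime import date

# Four-digit year, two-digit month, two-digit day, separated by hyphens.
COMPLETE_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_complete_date(value):
    """Return True if value is a valid calendar date written as YYYY-MM-DD."""
    match = COMPLETE_DATE.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return False
    return True


for value in ["2009-10-16", "2009", "20091016", "2009-02-29", "2008-02-29"]:
    print(value, is_complete_date(value))
```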
42. Next steps: create a NISO structure to wrap around the metrics (“NISO OpenURL Quality Index”); add non-Cornell data from libraries and link resolver vendors (model is agnostic to source); confirm and publicize key elements used by target syntaxes; can the quality of the global OpenURL network be modeled mathematically?
43. How to stay in the loop http://openurlquality.blogspot.com/ Adam Chandler, Database Management and Electronic Resources Research Librarian, Central Library Operations, Cornell University Library. tel: 607-255-5760, email: alc28@cornell.edu