Presentation on creating a method for benchmarking metadata consistency in OpenURL links. Delivered at the July 2009 American Library Association conference in Chicago.
1. Towards OpenURL Quality Metrics: Initial Findings
Adam Chandler
Cornell University Library
2009 American Library Association Annual Conference, Chicago
3. OpenURL model cont.
incoming OpenURL
http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
in our knowledge base?
title: Library hi tech issn: 0737-8831 start date: 19970101 end date:
link-to syntax for Emerald
http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
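The three steps on this slide (parse the incoming OpenURL, check the knowledge base, fill the link-to syntax) can be sketched roughly as follows. This is an illustration only, not the resolver's actual code: the function names are hypothetical, the OpenURL is shortened, and the meaning assigned to each `#@...#` placeholder is an assumption read off the placeholder names.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened version of the incoming OpenURL from the slide (same keys, fewer of them).
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "&url_ver=z39.88-2004&rft.date=2009&rft.genre=article"
    "&rft.issn=0737-8831&rft.issue=1&rft.spage=151&rft.volume=27"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
)

# Link-to syntax for Emerald, copied from the slide.
LINK_TO = (
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx="
    "#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#"
)


def parse_openurl(url):
    """Return the first value of every key in the key/value query string."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


def in_coverage(year, start="19970101", end=""):
    """Assumed rule: an empty end date means the holdings are still open."""
    first = int(start[:4])
    last = int(end[:4]) if end else None
    return first <= year and (last is None or year <= last)


def fill_link_to(template, meta):
    """Substitute OpenURL values into the template (placeholder mapping is assumed)."""
    values = {
        "ISSN-HYPHEN": meta["rft.issn"],
        "DATE": meta["rft.date"],
        "VOLUME": meta["rft.volume"],
        "ISSUE": meta["rft.issue"],
        "SPAGE": meta["rft.spage"],
    }
    for name, value in values.items():
        template = template.replace(f"#@{name}#", value)
    return template


meta = parse_openurl(OPENURL)
if in_coverage(int(meta["rft.date"])):
    print(fill_link_to(LINK_TO, meta))
```

If the source omits any element the template needs, `fill_link_to` raises a `KeyError`, which is the kind of failure the quality metrics in this talk are meant to surface.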
4. OpenURL is pervasive
Cornell link resolver alone:
July 1, 2008 – June 30, 2009:
402,000 OpenURL service
requests.
Estimate: 402,000 * 123(ARL libraries) =
49 million
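The estimate is a back-of-the-envelope multiplication. A minimal sketch of the arithmetic, assuming (as the slide implicitly does) that every ARL library handles about the same request volume as Cornell:

```python
cornell_requests = 402_000  # July 1, 2008 to June 30, 2009
arl_libraries = 123

# Assumption: each ARL library sees roughly Cornell's volume.
estimate = cornell_requests * arl_libraries
print(f"{estimate:,}")  # 49,446,000, i.e. about 49 million
```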
5. Cornell’s top 10 OpenURL sources
1. Web of Knowledge
2. Google Scholar
3. Webfeat (our “Find Articles” service)
4. EBSCOHost
5. OCLC FirstSearch
6. SilverPlatter
7. Weill Cornell Medical Center
8. SciFinder Scholar
9. PubMed
10. Refworks
9. Literature review
Since the OpenURL standard was
introduced some ten years ago, I can
identify no systematic study designed and
carried out to benchmark the quality of
linking.
10. Wakimoto, Walker, and Dabbour (2006)
Main finding: Users just expect full-text.
When they do not get it they are
disappointed.
Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The
Myths and Realities of SFX in Academic Libraries." The Journal of Academic
Librarianship 32 (2): 127–136
11. Wakimoto, Walker, and Dabbour (2006)
"Where does SFX start and where does it end? If
an SFX request does not result in a full-text link,
does the problem lie with the source database’s
metadata, the construction of the OpenURL
request, the SFX KnowledgeBase, the SFX
software, the resulting target resource, or even
the local library’s collection development plan?"
(p. 134)
12. … but finding the cause of the problem is hard
• Wrong start or end date in the local library's holdings
knowledge base (see KBART)
• Semantically inaccurate metadata from the OpenURL origin
(wrong ISSN, for example)
• Wrong link-to syntax in link resolver
• Fragile handling of incoming links by content provider
• Inaccurate or missing Crossref DOI URL (sometimes the DOI
registration process is out of sync with the mounting of
articles)
• Subscription errors (especially with the start of a new
calendar year)
• Syntactically incorrect metadata from the OpenURL origin
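The difference between syntactically incorrect and semantically inaccurate metadata can be illustrated with the ISSN. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2, modulo 11); the function name is hypothetical. It can catch a malformed ISSN, but it cannot tell whether a well-formed ISSN belongs to the journal actually cited.

```python
import re


def issn_is_well_formed(issn):
    """Check the shape and check digit of an ISSN such as 0737-8831."""
    match = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the first seven digits 8, 7, ..., 2 and sum them.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3).upper() == expected


print(issn_is_well_formed("0737-8831"))  # Library Hi Tech, from the example OpenURL
print(issn_is_well_formed("0737-8832"))  # same ISSN with a wrong check digit
```

A syntactic check like this is cheap to run on every incoming OpenURL; the semantic case (a valid ISSN for the wrong journal) needs a comparison against trusted data.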
13. Blake and Knudson (2002)
• “Increased communication between primary
publishers and secondary publishers.
Metadata corrections and updates need to be
better coordinated.”
See: Culling, James (2007). "Link Resolvers and the Serials Supply
Chain." UKSG. <http://www.uksg.org/projects/linkfinal> and NISO/UKSG
KBART
Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference
Linking." Library Collections, Acquisitions & Technical Services 26 (3),
(2002): 219-230.
14. Blake and Knudson (2002)
• “Increased awareness of bibliographic/citation
standards by authors. Increased submission of
publications with bibliographical references
reflecting the accepted standards.”
15. Blake and Knudson (2002)
• “Increased outreach by librarians to authors
emphasizing and promoting the importance of
citation standards for electronic document
retrieval.”
16. Blake and Knudson (2002)
• “Increased consistency in metadata within a
single database and across databases. This
would result in a higher success rate of linking
and would allow the algorithms to be simpler.
Simpler algorithms are easier to maintain and
modify.”
Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference
Linking." Library Collections, Acquisitions & Technical Services 26 (3),
(2002): 230.
17. Hughes (2004)
• Hughes describes an initiative of the Open
Language Archives Community (OLAC), a
consortium of linguistic data archives, to
create an infrastructure to support metadata
quality assessment within a specialized Open
Archives Initiative (OAI) community.
Baden Hughes, Metadata Quality Evaluation: Experience from the Open
Language Archives Community. 7th International Conference on Asian
Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004.
Proceedings, pp. 320-329.
18. Hughes (2004)
• Metadata quality should be evaluated on a
per record and per collection basis and
assessed against the baseline of broader
community practice. Metadata quality
requires both structural and semantic
validation.
19. Hughes (2004)
• Goals:
– establish a baseline against which future
instances can be compared;
– provide assistance to data providers;
– evaluate a set of domain-grounded controlled
vocabularies.
20. Hughes’ approach
• Each metadata record is scored from 0 to 10.
• There are two parts, a "Code Existence Score" and an "Element
Absence Penalty," with weighting.
• The Code Existence Score is specific to the OLAC community's use
of Dublin Core extensions.
• The Element Absence Penalty is based on the premise that the
usefulness of a given metadata record decreases in the absence of core
metadata fields.
• The absence of a core element incurs a penalty of 0.2.
Baden Hughes, Metadata Quality Evaluation: Experience from the Open
Language Archives Community. 7th International Conference on Asian
Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004.
Proceedings, pp. 320-329.
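The Element Absence Penalty lends itself to a short sketch. Only the 0.2 figure comes from the slide; the list of core elements and the function name below are hypothetical placeholders, and the weighting that combines the penalty with the Code Existence Score into the 0 to 10 score is not reproduced here.

```python
# Hypothetical core element list, for illustration only;
# the actual list is defined in Hughes (2004).
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")

PENALTY_PER_MISSING_ELEMENT = 0.2  # figure given on the slide


def element_absence_penalty(record):
    """Return the total penalty and the names of the missing core elements."""
    missing = [name for name in CORE_ELEMENTS if not record.get(name)]
    penalty = round(PENALTY_PER_MISSING_ELEMENT * len(missing), 1)
    return penalty, missing


record = {"title": "Library hi tech", "date": "2009"}
print(element_absence_penalty(record))
```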
21. Hughes’ approach
• From this simple approach, an array of metrics is derived:
– archive diversity;
– metadata quality;
– core elements per record;
– core element usage;
– code usage;
– code and element usage;
– star rating.
• From these metrics a score is computed for each metadata
record, each archive, and the community as a whole.
22. Mellon-funded planning grant for L'Année philologique
1. Canonical Citation Linking: http://cwkb.org
In collaboration with Eric Rebillard, Professor, Classics
and History, and David Ruddy, Cornell University
Library
2. OpenURL Quality
Is it possible to build a system for evaluating the quality of
OpenURLs coming from a content provider?
23. Key findings from 2008 Mellon OpenURL quality investigation
Hughes’ approach to metadata evaluation is
excellent scaffolding to help build a model
for OpenURL metadata evaluation, but it
does not match the problem exactly.
24. Constant 1: Key elements used by content providers in their link-to targets
Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
title - 64%
spage - 64%
volume - 61%
issue - 60%
date - 48%
aulast - 47%
issn - 35%
atitle - 35%
DOI - 14%
ISBN - 5%
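Percentages like these can be produced by counting how many link-to templates use each placeholder. The sketch below shows one way to do that, assuming every template uses the `#@NAME#` placeholder style seen in the Emerald example; the second template and the function name are made up for illustration, and this is not necessarily how the figures on the slide were computed.

```python
import re
from collections import Counter

PLACEHOLDER = re.compile(r"#@([A-Z-]+)#")


def element_usage(templates):
    """Percentage of templates in which each placeholder appears at least once."""
    counts = Counter()
    for template in templates:
        counts.update(set(PLACEHOLDER.findall(template)))
    return {name: round(100 * n / len(templates)) for name, n in counts.items()}


templates = [
    # Emerald link-to syntax from the earlier slide.
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx="
    "#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#",
    # Hypothetical second target, not a real provider.
    "http://example.org/toc?issn=#@ISSN-HYPHEN#&vol=#@VOLUME#",
]
for name, percent in sorted(element_usage(templates).items()):
    print(f"{name} - {percent}%")
```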
29. aulast_other examples
Ryan S Miller
Louise D Bryant
DAVID J MCKENZIE
%C4%90okovi%C4%87
Indu B Ahluwalia
Carreras-Sangr%c3%a0
Bautista-Casta%C3%B1o
O%27Shea
Melissa Ventura Marra
Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
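Several of these values only become readable after percent-decoding. The sketch below decodes each value and attaches a few descriptive flags; the flag names and rules are hypothetical and are not the classification used in this study.

```python
from urllib.parse import unquote


def describe_aulast(raw):
    """Percent-decode an aulast value and flag why it is not a bare surname."""
    value = unquote(raw)  # decodes %XX escapes as UTF-8 by default
    flags = []
    if ";" in value:
        flags.append("multiple names")
    elif " " in value:
        flags.append("contains spaces")
    if not value.isascii():
        flags.append("non-ASCII")
    if "'" in value:
        flags.append("apostrophe")
    return value, flags


for raw in ["Ryan S Miller", "%C4%90okovi%C4%87", "O%27Shea",
            "Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia"]:
    print(describe_aulast(raw))
```

Note that a non-ASCII surname or an apostrophe is perfectly valid data; the flags only explain why a simple pattern match would not classify the value as a plain last name.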
44. Next steps
• add non-Cornell data, from libraries or link
resolver vendors (model is agnostic to source)
• confirm and publicize key elements used by
target syntaxes
• outreach to content providers
• refine and expand metrics
• more reports
– longitudinal by source
– compare frequency of an element’s use across
sources
– compare frequency of an element pattern across
sources
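A report comparing the frequency of an element's use across sources could start from something like the sketch below. It assumes the source is identified by the `rfr_id` key, as in the example OpenURL earlier in the talk; the function name and the sample query strings are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import parse_qs


def element_frequency_by_source(queries):
    """For each rfr_id, the share of OpenURLs that carry each rft.* element."""
    seen = defaultdict(Counter)
    totals = Counter()
    for query in queries:
        fields = parse_qs(query)
        source = fields.get("rfr_id", ["unknown"])[0]
        totals[source] += 1
        seen[source].update(key for key in fields if key.startswith("rft."))
    return {
        source: {key: n / totals[source] for key, n in counter.items()}
        for source, counter in seen.items()
    }


# Made-up query strings from two hypothetical sources.
sample = [
    "rfr_id=info:sid/a&rft.issn=0737-8831&rft.spage=151",
    "rfr_id=info:sid/a&rft.issn=0737-8831",
    "rfr_id=info:sid/b&rft.spage=1",
]
print(element_frequency_by_source(sample))
```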
45. How to stay in the loop
http://openurlquality.blogspot.com/
Adam Chandler
Database Management and Electronic
Resources Librarian
Library Technical Services
Cornell University Library
tel: 607-255-5760
email: alc28@cornell.edu