Towards OpenURL Quality Metrics: Initial Findings

Presentation on a method for benchmarking metadata consistency in OpenURL links. Delivered at the July 2009 American Library Association conference in Chicago.

  1. Towards OpenURL Quality Metrics: Initial Findings
     Adam Chandler, Cornell University Library
     2009 American Library Association Annual Conference, Chicago
  2. OpenURL model
  3. OpenURL model cont.
     Incoming OpenURL:
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
     In our knowledge base?
       title: Library hi tech
       issn: 0737-8831
       start date: 19970101
       end date: (blank)
     Link-to syntax for Emerald:
     http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
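The three pieces on this slide (incoming metadata, a knowledge base entry, and a link-to template) can be tied together in a few lines. The following is a minimal Python sketch, not the resolver's actual code: the `#@NAME#` substitution semantics, the `PLACEHOLDER_TO_KEY` mapping, and the helper names `is_held` and `build_link` are all assumptions made for illustration.

```python
import re

# Link-to template as shown on the slide. How the resolver really expands
# "#@NAME#" placeholders is not documented here; simple substitution is assumed.
EMERALD_TEMPLATE = (
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx="
    "#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#"
)

# Hypothetical mapping from template placeholders to OpenURL keys.
PLACEHOLDER_TO_KEY = {
    "ISSN-HYPHEN": "rft.issn",
    "DATE": "rft.date",
    "VOLUME": "rft.volume",
    "ISSUE": "rft.issue",
    "SPAGE": "rft.spage",
}

# Knowledge base entry from the slide; an empty end date means open-ended.
KB_ENTRY = {"title": "Library hi tech", "issn": "0737-8831",
            "start": "19970101", "end": ""}


def is_held(entry, issn, year):
    """Return True if the ISSN matches and the year falls inside the holdings."""
    start_year = int(entry["start"][:4])
    end_year = int(entry["end"][:4]) if entry["end"] else None
    if entry["issn"] != issn or year < start_year:
        return False
    return end_year is None or year <= end_year


def build_link(template, metadata):
    """Fill each #@NAME# placeholder with the matching OpenURL value."""
    def substitute(match):
        # Raises KeyError when the source did not send a required element.
        return metadata[PLACEHOLDER_TO_KEY[match.group(1)]]
    return re.sub(r"#@([A-Z-]+)#", substitute, template)


metadata = {"rft.issn": "0737-8831", "rft.date": "2009", "rft.volume": "27",
            "rft.issue": "1", "rft.spage": "151"}

if is_held(KB_ENTRY, metadata["rft.issn"], int(metadata["rft.date"])):
    print(build_link(EMERALD_TEMPLATE, metadata))
```

A missing or malformed element (for example an `rft.spage` of `1033 (6 pages)`) would flow straight into the target URL, which is why the quality of each incoming element matters.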
  4. OpenURL is pervasive
     Cornell link resolver alone, July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests.
     Estimate: 402,000 × 123 (ARL libraries) ≈ 49 million requests.
  5. Cornell’s top 10 OpenURL sources
     1. Web of Knowledge
     2. Google Scholar
     3. Webfeat (our “Find Articles” service)
     4. EBSCOHost
     5. OCLC FirstSearch
     6. SilverPlatter
     7. Weill Cornell Medical Center
     8. SciFinder Scholar
     9. PubMed
     10. Refworks
  6. example OpenURL
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
  7. example OpenURL (1)
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004
     &url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
     &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
     &rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange
     &rft.auinit=c
     &rft.aulast=merk
     &rft.date=2009
     &rft.epage=162
     &rft.genre=article
     &rft.issn=0737-8831
  8. example OpenURL (2)
     &rft.issue=1
     &rft.place=bingley
     &rft.pub=emerald+group+publishing+limited
     &rft.spage=151
     &rft.stitle=libr+hi+tech
     &rft.title=library+hi+tech
     &rft.volume=27
     &rfr_id=info:sid/www.isinet.com:wok:wos
     &rft.au=scholze,+f
     &rft.au=windisch,+n
     &rft_id=info:doi/10.1108%2f07378830910942991/
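The key/value breakdown on the two slides above can be reproduced with Python's standard library. This is an illustrative sketch rather than anything from the talk; `parse_openurl` is a hypothetical helper name. It shows that keys such as `rft.au` may repeat and that values arrive percent-encoded.

```python
from urllib.parse import parse_qs, urlsplit

# The example OpenURL from the slides, with line-wrap spaces removed.
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx"
    "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
    "&rft.atitle=item-level+usage+statistics+a+review+of+current+practices"
    "+and+recommendations+for+normalization+and+exchange"
    "&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162"
    "&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley"
    "&rft.pub=emerald+group+publishing+limited&rft.spage=151"
    "&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft.au=scholze,+f&rft.au=windisch,+n"
    "&rft_id=info:doi/10.1108%2f07378830910942991/"
)


def parse_openurl(url):
    """Map each key to the list of its decoded values (keys may repeat)."""
    return parse_qs(urlsplit(url).query)


context = parse_openurl(OPENURL)
for key in ("rfr_id", "rft.aulast", "rft.au", "rft.issn", "rft_id"):
    print(key, context[key])
```

Note that `parse_qs` decodes `+` and `%xx` sequences. A quality analysis that wants to see values exactly as the source sent them, as in the pattern examples later in the deck, would need to split the query string without decoding.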
  9. Literature review
     In the roughly ten years since the OpenURL standard was introduced, I can identify no systematic study designed and carried out to benchmark the quality of linking.
  10. Wakimoto, Walker, and Dabbour (2006)
      Main finding: Users just expect full text. When they do not get it, they are disappointed.
      Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136.
  11. Wakimoto, Walker, and Dabbour (2006)
      "Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)
  12. … but finding the cause of the problem is hard
      • Wrong start or end date in the local library's holdings knowledge base (see KBART)
      • Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
      • Wrong link-to syntax in the link resolver
      • Fragile handling of incoming links by the content provider
      • Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
      • Subscription errors (especially at the start of a new calendar year)
      • Syntactically incorrect metadata from the OpenURL origin
  13. Blake and Knudson (2002)
      • “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”
      See: Culling, James (2007). "Link Resolvers and the Serials Supply Chain." UKSG. <http://www.uksg.org/projects/linkfinal> and NISO/UKSG KBART.
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 219-230.
  14. Blake and Knudson (2002)
      • “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.”
  15. Blake and Knudson (2002)
      • “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”
  16. Blake and Knudson (2002)
      • “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.” (p. 230)
  17. Hughes (2004)
      • Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
  18. Hughes (2004)
      • Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation.
  19. Hughes (2004)
      • Goals:
        – establish a baseline against which future instances can be compared;
        – provide assistance to data providers;
        – evaluate a set of domain-grounded controlled vocabularies.
  20. Hughes’ approach
      • Each metadata record is scored from 0 to 10.
      • There are two parts, a "Code Existence Score" and an "Element Absence Penalty," with weighting.
      • The Code Existence Score is specific to the OLAC community's use of Dublin Core extensions.
      • The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields.
      • The absence of a core element results in a penalty of 0.2.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
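The Element Absence Penalty is simple enough to sketch. The Python below is an illustration only: the slide gives the 0.2 figure but not the list of core elements or the weighting that combines the penalty with the Code Existence Score, so `CORE_ELEMENTS` is a made-up placeholder and no overall 0 to 10 score is computed.

```python
# Penalty per missing core element, as stated on the slide.
ABSENCE_PENALTY = 0.2

# Hypothetical core element list; the slide does not say which fields are core.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")


def element_absence_penalty(record, core=CORE_ELEMENTS):
    """Return (total penalty, missing elements) for one metadata record.

    An element counts as absent when it is missing or has an empty value.
    """
    missing = [name for name in core if not record.get(name)]
    return ABSENCE_PENALTY * len(missing), missing


record = {"title": "Library hi tech", "date": "2009", "subject": ""}
penalty, missing = element_absence_penalty(record)
print(f"missing={missing} penalty={penalty:.1f}")
```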
  21. Hughes’ approach
      • From this simple approach, an array of metrics is derived:
        – archive diversity;
        – metadata quality;
        – core elements per record;
        – core element usage;
        – code usage;
        – code and element usage;
        – star rating.
      • From these metrics a score is computed for each metadata record, each archive, and the community as a whole.
  22. Mellon-funded planning grant for L'Année philologique
      1. Canonical Citation Linking: http://cwkb.org
         In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library
      2. OpenURL Quality
         Is it possible to build a system for evaluating OpenURL quality from a content provider?
  23. Key findings from 2008 Mellon OpenURL quality investigation
      Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.
  24. Constant 1: Key elements used by content providers in their link-to targets
      title   64%
      spage   64%
      volume  61%
      issue   60%
      date    48%
      aulast  47%
      issn    35%
      atitle  35%
      DOI     14%
      ISBN     5%
      Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
  25. Constant 2: Frequency of element string patterns for all sources
  26. Relational model
  27. aulast
      if ($element =~ /aulast/) {
          # FirstSearch: skip elements named rft.aulast
          if ($sid =~ /firstsearch/) { if ($element =~ /rft.aulast/) { next; } }
          # Count the element for all sources and for this source
          $patterns{allsids}{$genre}{"aulast"}++;
          $patterns{$sid}{$genre}{"aulast"}++;
          # Classify the value by its string pattern
          if ($value =~ /^[A-Za-z]+$/) { $patterns{$sid}{$genre}{"aulast_simple"}++; }
          elsif ($value =~ /^[A-Za-z]+, .+$/) { $patterns{$sid}{$genre}{"aulast_comma"}++; }
          elsif ($value =~ /^[A-Z][a-z]+( [A-Z].)+$/) { $patterns{$sid}{$genre}{"aulast_simpleplusinitial"}++; }
          else { $patterns{$sid}{$genre}{"aulast_other"}++; }
      }
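For readers who do not use Perl, the classification step above can be restated as a small Python function. This is a port written for illustration, not the author's code. It keeps the three regular expressions exactly as they appear on the slide, including the unescaped `.` in the third one, and it omits the FirstSearch check and the counters.

```python
import re

# Patterns copied from the slide, in the order they are tested.
AULAST_SIMPLE = re.compile(r"^[A-Za-z]+$")
AULAST_COMMA = re.compile(r"^[A-Za-z]+, .+$")
# The "." after [A-Z] is kept as printed on the slide (it matches any character).
AULAST_PLUS_INITIAL = re.compile(r"^[A-Z][a-z]+( [A-Z].)+$")


def classify_aulast(value):
    """Return the pattern label for one raw (still percent-encoded) aulast value."""
    if AULAST_SIMPLE.search(value):
        return "aulast_simple"
    if AULAST_COMMA.search(value):
        return "aulast_comma"
    if AULAST_PLUS_INITIAL.search(value):
        return "aulast_simpleplusinitial"
    return "aulast_other"


for sample in ("merk", "Scholze, F", "Smith J.", "Ryan S Miller", "O%27Shea"):
    print(f"{sample!r}: {classify_aulast(sample)}")
```

The order of the tests matters: the first pattern that matches wins, and anything that matches none of them is counted as `aulast_other`.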
  28. Simple flat structure
  29. aulast_other examples
      • Ryan S Miller
      • Louise D Bryant
      • DAVID J MCKENZIE
      • %C4%90okovi%C4%87
      • Indu B Ahluwalia
      • Carreras-Sangr%c3%a0
      • Bautista-Casta%C3%B1o
      • O%27Shea
      • Melissa Ventura Marra
      • Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
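Several of these values are ordinary surnames that land in `aulast_other` only because they are percent-encoded or contain non-ASCII letters, hyphens, or apostrophes. The Python sketch below decodes them and tries a broader pattern. The `UNICODE_SURNAME` pattern is a hypothetical refinement, not one of the metrics from the talk.

```python
import re
from urllib.parse import unquote

# Hypothetical broader pattern: Unicode letters, optionally joined by "-" or "'".
UNICODE_SURNAME = re.compile(r"^[^\W\d_]+(?:[-'][^\W\d_]+)*$")

EXAMPLES = [
    "%C4%90okovi%C4%87",
    "Carreras-Sangr%c3%a0",
    "Bautista-Casta%C3%B1o",
    "O%27Shea",
    "Ryan S Miller",
    "Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia",
]


def decode_and_check(raw):
    """Percent-decode a value and report whether it looks like a single surname."""
    decoded = unquote(raw)
    return decoded, bool(UNICODE_SURNAME.search(decoded))


for raw in EXAMPLES:
    decoded, is_surname = decode_and_check(raw)
    print(f"{raw!r} -> {decoded!r} single surname: {is_surname}")
```

Values such as full names or several authors packed into one field still fail after decoding, so they point to a different kind of problem at the source than an encoding mismatch.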
  30. spage
      if ($element =~ /spage/) {
          # FirstSearch: skip elements named rft.spage
          if ($sid =~ /firstsearch/) { if ($element =~ /rft.spage/) { next; } }
          $patterns{allsids}{$genre}{"spage"}++;
          $patterns{$sid}{$genre}{"spage"}++;
          # \d restored: the backslashes were lost when the slide text was extracted
          if ($value =~ /^\d+$/) { $patterns{$sid}{$genre}{"spage_number"}++; }
          elsif ($value =~ /^\d+-\d+$/) { $patterns{$sid}{$genre}{"spage_number_number"}++; }
          elsif ($value =~ /[A-Za-z].+\d/) { $patterns{$sid}{$genre}{"spage_string_w_number"}++; }
          else { $patterns{$sid}{$genre}{"spage_other"}++; }
      }
  31. spage_other examples
      • 1033 (6 pages)
      • 85(19)
      • 575 (11 pages)
      • 283...290
      • PHYS
      • GLRM
      • 58,+VI
  32. date
      if ($element =~ /date/) {
          # FirstSearch: skip elements named rft.date
          if ($sid =~ /firstsearch/) { if ($element =~ /rft.date/) { next; } }
          $patterns{allsids}{$genre}{"date"}++;
          $patterns{$sid}{$genre}{"date"}++;
          # \d restored: the backslashes were lost when the slide text was extracted
          if ($value =~ /^\d{4}$/) { $patterns{$sid}{$genre}{"date_dddd"}++; }
          elsif ($value =~ /^\d{4}-\d{2}$/) { $patterns{$sid}{$genre}{"date_dddd-dd"}++; }
          elsif ($value =~ /^\d{4}-\d{2}-\d{2}$/) { $patterns{$sid}{$genre}{"date_dddd-dd-dd"}++; }
          elsif ($value =~ /^\d{4}-\d{4}$/) { $patterns{$sid}{$genre}{"date_dddd-dddd"}++; }
          elsif ($value =~ /^\d{8}$/) { $patterns{$sid}{$genre}{"date_dddddddd"}++; }
          else { $patterns{$sid}{$genre}{"date_dateother"}++; }
      }
  33. date_other examples
      • 1956 July
      • %7E1994
      • June 5%2C 2002
      • JUN 30 05
      • 2006%282007%29
      • 1922,+April+25th
      • %5B%5B1943-06-19%5D%5D
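Because every element follows the same "first matching pattern wins" shape, the rules can also be written as an ordered table rather than an if/elsif chain. The Python sketch below does this for the date patterns, assuming the `d` tokens on the slide were originally `\d`. The generic `classify` helper is an illustrative design choice, not part of the presented tool.

```python
import re

# Ordered (label, pattern) pairs for date; labels follow the slide.
# re.ASCII limits \d to 0-9.
DATE_PATTERNS = [
    ("date_dddd", re.compile(r"^\d{4}$", re.ASCII)),
    ("date_dddd-dd", re.compile(r"^\d{4}-\d{2}$", re.ASCII)),
    ("date_dddd-dd-dd", re.compile(r"^\d{4}-\d{2}-\d{2}$", re.ASCII)),
    ("date_dddd-dddd", re.compile(r"^\d{4}-\d{4}$", re.ASCII)),
    ("date_dddddddd", re.compile(r"^\d{8}$", re.ASCII)),
]


def classify(value, patterns, fallback):
    """Return the label of the first pattern that matches, else the fallback."""
    for label, regex in patterns:
        if regex.search(value):
            return label
    return fallback


def classify_date(value):
    return classify(value, DATE_PATTERNS, "date_dateother")


SLIDE_EXAMPLES = ["1956 July", "%7E1994", "June 5%2C 2002", "JUN 30 05",
                  "2006%282007%29", "1922,+April+25th", "%5B%5B1943-06-19%5D%5D"]

for value in ["2009", "2009-01", "20090101"] + SLIDE_EXAMPLES:
    print(f"{value!r}: {classify_date(value)}")
```

Adding a new recognized pattern then means adding one row to the table, and the same `classify` function could serve `spage` or `issn` with their own tables.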
  34. issn_other
      if ($element =~ /issn/) {
          # FirstSearch: skip elements named rft.issn
          if ($sid =~ /firstsearch/) { if ($element =~ /rft.issn/) { next; } }
          $patterns{allsids}{$genre}{"issn"}++;
          $patterns{$sid}{$genre}{"issn"}++;
          # \d and \n restored: the backslashes were lost when the slide text was extracted
          if ($value =~ /^\d+-\d+$/) { $patterns{$sid}{$genre}{"issn_number_number"}++; }
          elsif ($value =~ /^\d+$/) { $patterns{$sid}{$genre}{"issn_number"}++; }
          elsif ($value =~ /^\d+X$/) { $patterns{$sid}{$genre}{"issn_numberX"}++; }
          elsif ($value =~ /^\d+-\d+X$/) { $patterns{$sid}{$genre}{"issn_number_numberX"}++; }
          else { $patterns{$sid}{$genre}{"issn_other"}++; print "$value\n"; }
      }
  35. issn_other examples
      • 0065-2598%28print%29
      • 0018-5345+%28ISSN+print%29
      • ISSN ISBN 0-9525091-5-6.
      • 0021-8375%28print%29%7C1439-0361%28electronic%29
      • 1471-2164+%28ISSN+online%29
      • 0191-8699%3B0191-8699
      • 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
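Most of these values contain a usable ISSN wrapped in extra text. As one possible follow-up (an assumption for illustration, not something the presented tool does), a link resolver or analyst could decode the value and pull out every ISSN-shaped substring. `extract_issns` and its pattern are hypothetical.

```python
import re
from urllib.parse import unquote_plus

# Four digits, a hyphen, three digits, then a digit or X check character.
ISSN_SHAPE = re.compile(r"\b\d{4}-\d{3}[\dXx]\b", re.ASCII)


def extract_issns(raw):
    """Decode a raw issn value and return the distinct ISSN-shaped strings in it."""
    decoded = unquote_plus(raw)
    # dict.fromkeys removes repeats while keeping first-seen order.
    return list(dict.fromkeys(ISSN_SHAPE.findall(decoded)))


EXAMPLES = [
    "0065-2598%28print%29",
    "0018-5345+%28ISSN+print%29",
    "ISSN ISBN 0-9525091-5-6.",
    "0021-8375%28print%29%7C1439-0361%28electronic%29",
    "0191-8699%3B0191-8699",
    "0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311",
]

for raw in EXAMPLES:
    print(f"{raw!r} -> {extract_issns(raw)}")
```

This checks shape only. It does not verify the ISSN check digit or confirm that the ISSN belongs to the cited journal, which is the semantic side of validation mentioned earlier in the deck.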
  36. How often?
      metric          frequency in July-Sep 2008 sample
      au_last_other   5476
      spage_other     772
      date_other      591
      issn_other      200
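Counts like these fall out of the nested `$patterns{$sid}{$genre}{label}` structure used in the Perl snippets. Below is a minimal Python sketch of the same bookkeeping. As in the slides, element totals are kept under both `allsids` and the source, while pattern labels are kept per source, so an overall figure is a sum across sources. The source identifiers and counts in the demo are invented.

```python
from collections import Counter, defaultdict


def new_tally():
    """tally[sid][genre][label] -> count, created on first use."""
    return defaultdict(lambda: defaultdict(Counter))


def record(tally, sid, genre, element, label):
    """Count one element occurrence and the pattern label it was assigned."""
    tally["allsids"][genre][element] += 1
    tally[sid][genre][element] += 1
    tally[sid][genre][label] += 1


def total_for_label(tally, label):
    """Sum a pattern label across all sources and genres."""
    return sum(
        counter[label]
        for sid, genres in tally.items()
        if sid != "allsids"
        for counter in genres.values()
    )


# Invented observations: (source id, genre, element, pattern label).
observations = [
    ("source-a", "article", "spage", "spage_number"),
    ("source-a", "article", "spage", "spage_other"),
    ("source-b", "article", "spage", "spage_other"),
    ("source-b", "book", "date", "date_dddd"),
]

tally = new_tally()
for sid, genre, element, label in observations:
    record(tally, sid, genre, element, label)

print("spage_other, all sources:", total_for_label(tally, "spage_other"))
print("spage elements seen:", tally["allsids"]["article"]["spage"])
```

Dividing a label's count by the matching element total gives a per-source rate, which is the kind of comparison across sources listed under next steps.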
  37. Demo of OQ UI
  38. Element report
  39. Element report
  40. Pattern report
  41. Pattern report
  42. Pattern report
  43. Next steps
      • add non-Cornell data, from libraries or link resolver vendors (model is agnostic to source)
      • confirm and publicize key elements used by target syntaxes
      • outreach to content providers
      • refine and expand metrics
      • more reports
        – longitudinal by source
        – compare frequency of an element’s use across sources
        – compare frequency of an element pattern across sources
  44. How to stay in the loop
      http://openurlquality.blogspot.com/
      Adam Chandler
      Database Management and Electronic Resources Librarian
      Library Technical Services, Cornell University Library
      tel: 607-255-5760
      email: alc28@cornell.edu
