A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers

  1. A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers
     Adam Chandler
     Cornell University Library
     Cornell University Library, Metadata Working Group Forum
     16 October 2009
  2. OpenURL model
  3. OpenURL model cont.
     incoming OpenURL
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
     in our knowledge base?
     title: Library hi tech issn: 0737-8831 start date: 19970101 end date:
     link-to syntax for Emerald
     http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
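A rough sketch, in Python rather than the Perl used later in the deck, of how a resolver might fill the Emerald link-to syntax from the incoming OpenURL. The placeholder meanings (for example, `#@ISSN-HYPHEN#` taken to be the hyphenated ISSN), the `PLACEHOLDER_TO_KEY` mapping, and the helper name `fill_link_to` are assumptions for illustration; this is not the link resolver's actual implementation.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Shortened form of the incoming OpenURL from the slide
# (only the keys the template needs).
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "&url_ver=z39.88-2004&rft.date=2009&rft.issn=0737-8831"
    "&rft.issue=1&rft.spage=151&rft.volume=27"
)

# Link-to syntax for Emerald, copied from the slide.
TEMPLATE = (
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker"
    "&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#"
)

# Assumed mapping from template placeholders to OpenURL keys.
PLACEHOLDER_TO_KEY = {
    "ISSN-HYPHEN": "rft.issn",
    "DATE": "rft.date",
    "VOLUME": "rft.volume",
    "ISSUE": "rft.issue",
    "SPAGE": "rft.spage",
}


def fill_link_to(openurl, template):
    """Substitute each #@NAME# placeholder with the matching OpenURL value."""
    params = parse_qs(urlsplit(openurl).query)

    def substitute(match):
        key = PLACEHOLDER_TO_KEY[match.group(1)]
        # A missing element becomes an empty string in this sketch; a real
        # resolver would more likely decline to build the link.
        return params.get(key, [""])[0]

    return re.sub(r"#@([A-Z-]+)#", substitute, template)


link = fill_link_to(OPENURL, TEMPLATE)
print(link)
```

Every placeholder in the template depends on one metadata element arriving intact, which is why a single malformed value in the incoming OpenURL can break the outgoing link.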
  4. OpenURL is pervasive
     Cornell link resolver alone:
     July 1, 2008 – June 30, 2009:
     402,000 OpenURL service requests.
     402,000 * 123 (ARL libraries) ≈ 49 million
  5. Cornell’s top 10 OpenURL sources
     Web of Knowledge
     WorldCat Local
     Google Scholar
     Webfeat (our “Find Articles” service)
     EBSCOHost
     OCLC FirstSearch
     SilverPlatter
     Weill Cornell Medical Center
     SciFinder Scholar
     PubMed
  6. example OpenURL
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
  7. example OpenURL (1)
     http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004
     &url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
     &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
     &rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange
     &rft.auinit=c
     &rft.aulast=merk
     &rft.date=2009
     &rft.epage=162
     &rft.genre=article
     &rft.issn=0737-8831
  8. example OpenURL (2)
     &rft.issue=1
     &rft.place=bingley
     &rft.pub=emerald+group+publishing+limited
     &rft.spage=151
     &rft.stitle=libr+hi+tech
     &rft.title=library+hi+tech
     &rft.volume=27
     &rfr_id=info:sid/www.isinet.com:wok:wos
     &rft.au=scholze,+f
     &rft.au=windisch,+n
     &rft_id=info:doi/10.1108%2f07378830910942991/
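The key/value pairs above can be pulled apart with a standard query-string parser. A minimal Python sketch follows; the URL is abbreviated from the slides and `parse_openurl` is a hypothetical helper name. Repeated keys such as `rft.au` come back as lists, and `+` and percent escapes are decoded.

```python
from urllib.parse import parse_qs, urlsplit

# Abbreviated from the example OpenURL on the slides.
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "&url_ver=z39.88-2004"
    "&rft.aulast=merk&rft.date=2009&rft.issn=0737-8831"
    "&rft.stitle=libr+hi+tech"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft.au=scholze,+f&rft.au=windisch,+n"
    "&rft_id=info:doi/10.1108%2f07378830910942991/"
)


def parse_openurl(openurl):
    """Return a dict mapping each key to the list of its decoded values."""
    return parse_qs(urlsplit(openurl).query)


elements = parse_openurl(OPENURL)
for key, values in elements.items():
    print(key, values)
```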
  9. … but quality of experience is difficult to benchmark
     Wrong start or end date in the local library's holdings knowledge base (see NISO KBART)
     Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
     Wrong link-to syntax in link resolver
     Fragile handling of incoming links by content provider
  10. … but quality of experience is difficult to benchmark
      Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
      Subscription errors (especially with the start of a new calendar year)
      Syntactically incorrect or missing metadata from the OpenURL origin
  11. Literature review
      I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.
  12. Wakimoto, Walker, and Dabbour (2006)
      Main finding: Users just expect full-text. When they do not get it they are disappointed.
      Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136.
  13. Wakimoto, Walker, and Dabbour (2006)
      "Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)
      Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136.
  14. Blake and Knudson (2002)
      “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
  15. Blake and Knudson (2002)
      “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
  16. Blake and Knudson (2002)
      “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”
      (NISO KBART role)
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
  17. Blake and Knudson (2002)
      “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
  18. Hughes (2004)
      Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
  19. Hughes (2004)
      Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
  20. Hughes (2004)
      Goals:
      establish a baseline against which future instances can be compared;
      provide assistance to data providers;
      evaluate a set of domain-grounded controlled vocabularies.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
  21. Hughes’ approach
      Each metadata record is scored from 0 to 10.
      There are two weighted parts: a "Code Existence Score and an Element Absence Penalty."
      The Code Existence Score is specific to the OLAC community's use of Dublin Core extensions.
      The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields.
      The absence of a core element results in a penalty of 0.2.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
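To make the penalty concrete, here is a small Python sketch of only the Element Absence Penalty as summarized on this slide: 0.2 for each missing core element. The core element list and the sample record below are placeholders, not OLAC's actual definitions, and the way the penalty combines with the Code Existence Score is left out because the slide does not specify it.

```python
# Hypothetical core element list, for illustration only; Hughes (2004)
# defines the real set used for OLAC records.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")

PENALTY_PER_MISSING_ELEMENT = 0.2  # value given on the slide


def element_absence_penalty(record):
    """Return the total penalty for core elements that are absent or empty."""
    missing = [name for name in CORE_ELEMENTS if not record.get(name)]
    return PENALTY_PER_MISSING_ELEMENT * len(missing)


# Made-up record that lacks "subject" and "description".
record = {"title": "Field recordings", "date": "2004", "identifier": "oai:example:1"}
print(round(element_absence_penalty(record), 1))
```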
  22. Hughes’ approach
      From this simple approach, an array of metrics is derived:
      archive diversity;
      metadata quality;
      core elements per record;
      core element usage;
      code usage;
      code and element usage;
      star rating.
      From these metrics a score is computed for each metadata record, each archive, and the community as a whole.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp. 320-329.
  23. Mellon-funded planning grant for L'Année philologique
      1. Canonical Citation Linking: http://cwkb.org
      In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library
      2. OpenURL Quality
      Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?
  24. Key findings from 2008 Mellon OpenURL quality investigation
      Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.
  25. Constant: Core elements used by content providers in their link-to targets
      title - 64%
      spage - 64%
      volume - 61%
      issue - 60%
      date - 48%
      aulast - 47%
      issn - 35%
      atitle - 35%
      DOI - 14%
      ISBN - 5%
      Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
  26. Variable: Frequency of element string patterns for all sources
  27. aulast
      First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith".
  28. aulast
          if ($e =~ /aulast/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^[A-Za-z]+$/) { $patterns{$neworigin}{$newsid}{"aulast_simple"}++; }
              elsif ($elementhash{$e} =~ /^[A-Za-z]+, .+$/) { $patterns{$neworigin}{$newsid}{"aulast_comma"}++; }
              elsif ($elementhash{$e} =~ /^[A-Z][a-z]+( [A-Z].)+$/) { $patterns{$neworigin}{$newsid}{"aulast_simpleplusinitial"}++; }
              else { $patterns{$neworigin}{$newsid}{"aulast_other"}++; }
          }
  29. aulast_other examples
      Ryan S Miller
      Louise D Bryant
      DAVID J MCKENZIE
      %C4%90okovi%C4%87
      Indu B Ahluwalia
      Carreras-Sangr%c3%a0
      Bautista-Casta%C3%B1o
      O%27Shea
      Melissa Ventura Marra
      Guan XueYing%3B Yu Nan%3B ShangguanXiaoXia
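For readers who want to try the aulast patterns against the examples above, here is a sketch in Python of the same first-match-wins branch shown in the Perl on the previous slide. The function name `classify_aulast` is an assumption; the regular expressions are copied as they appear in the slide transcript.

```python
import re

# Patterns in the order the slide tests them; the first match wins.
AULAST_PATTERNS = [
    ("aulast_simple", re.compile(r"^[A-Za-z]+$")),
    ("aulast_comma", re.compile(r"^[A-Za-z]+, .+$")),
    ("aulast_simpleplusinitial", re.compile(r"^[A-Z][a-z]+( [A-Z].)+$")),
]


def classify_aulast(value):
    """Return the label of the first matching pattern, else aulast_other."""
    for label, pattern in AULAST_PATTERNS:
        if pattern.search(value):
            return label
    return "aulast_other"


for value in ["merk", "Smith, Fred James", "Ryan S Miller", "O%27Shea"]:
    print(value, "->", classify_aulast(value))
```

Full names such as "Ryan S Miller" and percent-encoded values such as "O%27Shea" match none of the three patterns, which is consistent with their appearing in the aulast_other list.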
  30. spage
      First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.
  31. spage
          if ($e =~ /spage/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d+$/) { $patterns{$neworigin}{$newsid}{"spage_number"}++; }
              elsif ($elementhash{$e} =~ /^\d+-\d+$/) { $patterns{$neworigin}{$newsid}{"spage_number_number"}++; }
              elsif ($elementhash{$e} =~ /[A-Za-z].+\d/) { $patterns{$neworigin}{$newsid}{"spage_string_w_number"}++; }
              else { $patterns{$neworigin}{$newsid}{"spage_other"}++; }
          }
  32. spage_other examples
      1033 (6 pages)
      85(19)
      575 (11 pages)
      283...290
      PHYS
      GLRM
      58,+VI
  33. date
      The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.
  34. date
          if ($e =~ /date/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d{4}$/) { $patterns{$neworigin}{$newsid}{"date_dddd"}++; }
              elsif ($elementhash{$e} =~ /^\d{4}-\d{2}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dd"}++; }
              elsif ($elementhash{$e} =~ /^\d{4}-\d{2}-\d{2}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dd-dd"}++; }
              elsif ($elementhash{$e} =~ /^\d{4}-\d{4}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dddd"}++; }
              elsif ($elementhash{$e} =~ /^\d{8}$/) { $patterns{$neworigin}{$newsid}{"date_dddddddd"}++; }
              else { $patterns{$neworigin}{$newsid}{"date_dateother"}++; }
          }
  35. date_other examples
      1956 July
      %7E1994
      June 5%2C 2002
      JUN 30 05
      2006%282007%29
      1922,+April+25th
      %5B%5B1943-06-19%5D%5D
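Several of the examples above are percent-encoded, which hides what the source actually sent. The Python sketch below decodes them with the standard library and then applies the date patterns, with the `\d` escapes written out explicitly. Decoding before classifying is an assumption made here for readability, not something the slides say the original script does, and `classify_date` is a hypothetical name.

```python
import re
from urllib.parse import unquote_plus

DATE_PATTERNS = [
    ("date_dddd", re.compile(r"^\d{4}$")),
    ("date_dddd-dd", re.compile(r"^\d{4}-\d{2}$")),
    ("date_dddd-dd-dd", re.compile(r"^\d{4}-\d{2}-\d{2}$")),
    ("date_dddd-dddd", re.compile(r"^\d{4}-\d{4}$")),
    ("date_dddddddd", re.compile(r"^\d{8}$")),
]


def classify_date(value):
    """Return the label of the first matching pattern, else date_dateother."""
    for label, pattern in DATE_PATTERNS:
        if pattern.search(value):
            return label
    return "date_dateother"


# Raw values taken from the date_other examples on the slide.
raw_examples = [
    "%7E1994",
    "June 5%2C 2002",
    "2006%282007%29",
    "1922,+April+25th",
    "%5B%5B1943-06-19%5D%5D",
]
for raw in raw_examples:
    decoded = unquote_plus(raw)
    print(f"{raw!r:28} {decoded!r:22} {classify_date(decoded)}")
```

Even after decoding, none of these values fits one of the expected date shapes, although a few (such as the bracketed 1943-06-19) contain a well-formed date inside extra punctuation.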
  36. issn
      International Standard Serials Number (ISSN). The issn may contain a hyphen, e.g. "1041-5653"
  37. issn
          if ($e =~ /issn/) {
              $patterns{$neworigin}{$newsid}{$e}++;
              if ($elementhash{$e} =~ /^\d{4}-\d{3}./) { $patterns{$neworigin}{$newsid}{"issn_number_number"}++; }
              elsif ($elementhash{$e} =~ /^\d{7}./) { $patterns{$neworigin}{$newsid}{"issn_number"}++; }
              else { $patterns{$neworigin}{$newsid}{"issn_other"}++; }
          }
  38. issn_other examples
      0065-2598%28print%29
      0018-5345+%28ISSN+print%29
      ISSN ISBN 0-9525091-5-6.
      0021-8375%28print%29%7C1439-0361%28electronic%29
      1471-2164+%28ISSN+online%29
      0191-8699%3B0191-8699
      0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
  39. How often out of 402,000 Cornell OpenURLs?
  40. flat file output
      logsource  year  quarter  origin  sid                metric                    count
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle                    154
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle_colon              101
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle_other              53
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast                    159
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast_other              4
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast_simple             155
      cornell    2009  Q1       csa     csa:commabs-set-c  date                      159
      cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd                 110
      cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd-dd              49
      cornell    2009  Q1       csa     csa:commabs-set-c  isbn                      6
      cornell    2009  Q1       csa     csa:commabs-set-c  isbn_10                   6
      cornell    2009  Q1       csa     csa:commabs-set-c  issn                      135
      cornell    2009  Q1       csa     csa:commabs-set-c  issn_number-number        135
      cornell    2009  Q1       csa     csa:commabs-set-c  issue                     136
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_number              132
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_number_dash_number  2
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_other               2
      cornell    2009  Q1       csa     csa:commabs-set-c  spage                     153
      cornell    2009  Q1       csa     csa:commabs-set-c  spage_number              153
      cornell    2009  Q1       csa     csa:commabs-set-c  title                     160
      cornell    2009  Q1       csa     csa:commabs-set-c  total                     160
      cornell    2009  Q1       csa     csa:commabs-set-c  volume                    139
      cornell    2009  Q1       csa     csa:commabs-set-c  volume_number             139
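The nested `$patterns{$neworigin}{$newsid}{...}++` counters in the Perl snippets map naturally onto rows like these. Below is a minimal Python sketch of that aggregation step. The tab-separated layout, the reading of the source as origin `csa` with sid `csa:commabs-set-c`, the sample tallies, and the helper names are all assumptions for illustration.

```python
from collections import Counter


def tally(counts, origin, sid, metrics):
    """Increment one counter per metric label observed for a single OpenURL."""
    for metric in metrics:
        counts[(origin, sid, metric)] += 1


def flat_file_rows(counts, logsource, year, quarter):
    """Yield tab-separated rows: logsource, year, quarter, origin, sid, metric, count."""
    for (origin, sid, metric), count in sorted(counts.items()):
        yield "\t".join([logsource, str(year), quarter, origin, sid, metric, str(count)])


counts = Counter()
# Two made-up requests from the same source, already classified.
tally(counts, "csa", "csa:commabs-set-c", ["total", "date", "date_dddd"])
tally(counts, "csa", "csa:commabs-set-c", ["total", "date", "date_dddd-dd"])

rows = list(flat_file_rows(counts, "cornell", 2009, "Q1"))
print("\n".join(rows))
```

Keeping one row per (origin, sid, metric) is what lets the pattern counts be compared against the element total for the same source, for instance 110 of 159 dates in the `date_dddd` form.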
  41. Demonstration
      http://openurlquality.blogspot.com/
  42. Next steps
      create a NISO structure to wrap around the metrics: “NISO OpenURL Quality Index”
      add non-Cornell data from libraries and link resolver vendors (model is agnostic to source)
      confirm and publicize key elements used by target syntaxes
      can the quality of the global OpenURL network be modeled mathematically?
  43. How to stay in the loop
      http://openurlquality.blogspot.com/
      Adam Chandler
      Database Management and Electronic Resources Research Librarian
      Central Library Operations
      Cornell University Library
      tel: 607-255-5760
      email: alc28@cornell.edu
