A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers

    • A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers
      Adam Chandler
      Cornell University Library
      Cornell University Library, Metadata Working Group Forum
      16 October 2009
    • OpenURL model
    • OpenURL model cont.
      incoming OpenURL
      http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
      in our knowledge base?
      title: Library hi tech issn: 0737-8831 start date: 19970101 end date:
      link-to syntax for Emerald
      http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
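A minimal sketch of the three steps on this slide, written in Python rather than the Perl used later in the deck: parse the incoming OpenURL, check it against the knowledge base entry, and fill the link-to template. The meaning of the `#@...#` tokens, the holdings check, and all function names are assumptions for illustration; the slide shows the template but does not define how the link resolver expands it.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the incoming OpenURL from the slide (only the keys used below).
OPENURL = (
    "http://linkresolver.library.cornell.edu:4550/resserv?"
    "url_ver=z39.88-2004&rft.date=2009&rft.issn=0737-8831"
    "&rft.issue=1&rft.spage=151&rft.volume=27"
)

# Knowledge base entry from the slide; an empty end date is read as "ongoing".
HOLDING = {"issn": "0737-8831", "start": "19970101", "end": ""}

# Link-to syntax from the slide.
TEMPLATE = (
    "http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx="
    "#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#"
)

def parse_openurl(url):
    """Return the first value of each key in the query string."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}

def in_holdings(fields, holding):
    """Assumed check: same ISSN and publication year within the held range."""
    if fields.get("rft.issn") != holding["issn"]:
        return False
    year = fields.get("rft.date", "")[:4]
    start_ok = year >= holding["start"][:4]
    end_ok = not holding["end"] or year <= holding["end"][:4]
    return start_ok and end_ok

def fill_template(template, fields):
    """Assumed token mapping; the real resolver's rules may differ."""
    tokens = {
        "#@ISSN-HYPHEN#": fields["rft.issn"],
        "#@DATE#": fields["rft.date"],
        "#@VOLUME#": fields["rft.volume"],
        "#@ISSUE#": fields["rft.issue"],
        "#@SPAGE#": fields["rft.spage"],
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template

fields = parse_openurl(OPENURL)
target = fill_template(TEMPLATE, fields) if in_holdings(fields, HOLDING) else None
print(target)
```

Each step is a separate place where a link can fail: a missing or malformed `rft.issn` breaks the knowledge base lookup, and a malformed `rft.spage` produces a target URL the content provider may not resolve.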
    • OpenURL is pervasive
      Cornell link resolver alone:
      July 1, 2008 – June 30, 2009:
      402,000 OpenURL service requests.
      402,000 * 123 (ARL libraries) ≈ 49 million requests per year, if each ARL library matched Cornell's volume
    • Cornell’s top 10 OpenURL sources
      Web of Knowledge
      WorldCat Local
      Google Scholar
      Webfeat (our “Find Articles” service)
      EBSCOHost
      OCLC FirstSearch
      SilverPlatter
      Weill Cornell Medical Center
      SciFinder Scholar
      PubMed
    • example OpenURL
      http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
    • example OpenURL (1)
      http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004
      &url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
      &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
      &rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange
      &rft.auinit=c
      &rft.aulast=merk
      &rft.date=2009
      &rft.epage=162
      &rft.genre=article
      &rft.issn=0737-8831
    • example OpenURL (2)
      &rft.issue=1
      &rft.place=bingley
      &rft.pub=emerald+group+publishing+limited
      &rft.spage=151
      &rft.stitle=libr+hi+tech
      &rft.title=library+hi+tech
      &rft.volume=27
      &rfr_id=info:sid/www.isinet.com:wok:wos
      &rft.au=scholze,+f
      &rft.au=windisch,+n
      &rft_id=info:doi/10.1108%2f07378830910942991/
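The key/value breakdown above can be reproduced with a standard query-string parser. The sketch below (Python, standard library only) uses a subset of the keys from the slides and keeps every value for repeated keys such as `rft.au`; the function name is an illustrative assumption.

```python
from collections import defaultdict
from urllib.parse import parse_qsl

# Part of the query string from the slides; repeated keys (rft.au) are legal.
QUERY = (
    "url_ver=z39.88-2004"
    "&rft.aulast=merk"
    "&rft.stitle=libr+hi+tech"
    "&rfr_id=info:sid/www.isinet.com:wok:wos"
    "&rft.au=scholze,+f"
    "&rft.au=windisch,+n"
    "&rft_id=info:doi/10.1108%2f07378830910942991/"
)

def group_pairs(query):
    """Decode '+' and %XX escapes and collect every value seen for each key."""
    grouped = defaultdict(list)
    for key, value in parse_qsl(query, keep_blank_values=True):
        grouped[key].append(value)
    return dict(grouped)

pairs = group_pairs(QUERY)
for key, values in pairs.items():
    print(key, values)
```

Note that `rfr_id` identifies the source of the OpenURL (here Web of Knowledge), which is the natural key for grouping quality counts by content provider.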
    • … but quality of experience is difficult to benchmark
      Wrong start or end date in the local library's holdings knowledge base (see NISO KBART)
      Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)
      Wrong link-to syntax in link resolver
      Fragile handling of incoming links by content provider
    • … but quality of experience is difficult to benchmark
      Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)
      Subscription errors (especially with the start of a new calendar year)
      Syntactically incorrect or missing metadata from the OpenURL origin
    • Literature review
      I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.
    • Wakimoto, Walker, and Dabbour (2006)
      Main finding: Users just expect full-text. When they do not get it they are disappointed.
      Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136
    • Wakimoto, Walker, and Dabbour (2006)
      "Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)
      Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136
    • Blake and Knudson (2002)
      “Increased awareness of bibliographic/citation standards by authors. Increased submission of publications with bibliographical references reflecting the accepted standards.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
    • Blake and Knudson (2002)
      “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
    • Blake and Knudson (2002)
      “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”
      (NISO KBART role)
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
    • Blake and Knudson (2002)
      “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”
      Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.
    • Hughes (2004)
      Hughes describes an initiative of the Open Language Archives Community (OLAC), a consortium of linguistic data archives, to create an infrastructure to support metadata quality assessment within a specialized Open Archives Initiative (OAI) community.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
    • Hughes (2004)
      Metadata quality should be evaluated on a per record and per collection basis and assessed against the baseline of broader community practice. Metadata quality requires both structural and semantic validation.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
    • Hughes (2004)
      Goals:
      establish a baseline against which future instances can be compared;
      provide assistance to data providers;
      evaluate a set of domain-grounded controlled vocabularies.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
    • Hughes’ approach
      Each metadata record is scored from 0 to 10.
      The score has two weighted parts: a "Code Existence Score" and an "Element Absence Penalty."
      The Code Existence Score is specific to the OLAC community's use of Dublin Core extensions.
      The Element Absence Penalty is based on the premise that the usefulness of a given metadata record decreases in the absence of core metadata fields.
      Each absent core element incurs a penalty of 0.2.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
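A small sketch of the Element Absence Penalty as described on this slide, in Python. Only the 0.2 penalty per missing core element comes from the slide; the list of core elements below is hypothetical, and the sketch stops at the penalty because the slides do not show how it is weighted against the Code Existence Score.

```python
# Hypothetical core element list; the slide does not enumerate OLAC's core fields.
CORE_ELEMENTS = ("title", "date", "subject", "description", "identifier")
PENALTY_PER_MISSING = 0.2  # value stated on the slide

def element_absence_penalty(record, core=CORE_ELEMENTS):
    """Return (total penalty, missing elements) for absent or empty core fields."""
    missing = [name for name in core if not record.get(name)]
    return round(PENALTY_PER_MISSING * len(missing), 1), missing

record = {"title": "Field recordings", "date": "2003", "identifier": ""}
penalty, missing = element_absence_penalty(record)
print(penalty, missing)
```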
    • Hughes’ approach
      From this simple approach, an array of metrics is derived:
      archive diversity;
      metadata quality;
      core elements per record;
      core element usage;
      code usage;
      code and element usage;
      star rating.
      From these metrics a score is computed for each metadata record, each archive, and the community as a whole.
      Baden Hughes, Metadata Quality Evaluation: Experience from the Open Language Archives Community. 7th International Conference on Asian Digital Libraries, ICADL 2004, Shanghai, China, December 13-17, 2004. Proceedings, pp 320-329.
    • Mellon-funded planning grant for L'Année philologique
      1. Canonical Citation Linking: http://cwkb.org
      In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library
      2. OpenURL Quality
      Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?
    • Key findings from 2008 Mellon OpenURL quality investigation
      Hughes’ approach to metadata evaluation is excellent scaffolding to help build a model for OpenURL metadata evaluation, but it does not match the problem exactly.
    • Constant: Core elements used by content providers in their link-to targets
      title - 64%
      spage - 64%
      volume - 61%
      issue - 60%
      date - 48%
      aulast - 47%
      issn - 35%
      atitle - 35%
      DOI - 14%
      ISBN - 5%
      Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.
    • Variable: Frequency of element string patterns for all sources
    • aulast
      First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith"
    • aulast
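      # Count how each source (origin, sid) formats aulast.
      if ($e =~ /aulast/) {
          $patterns{$neworigin}{$newsid}{$e}++;  # element is present
          # letters only, e.g. "smith"
          if ($elementhash{$e} =~ /^[A-Za-z]+$/) { $patterns{$neworigin}{$newsid}{"aulast_simple"}++; }
          # letters, a comma and a space, then anything, e.g. "Smith, Fred James"
          elsif ($elementhash{$e} =~ /^[A-Za-z]+, .+$/) { $patterns{$neworigin}{$newsid}{"aulast_comma"}++; }
          # capitalized name followed by one or more initials
          elsif ($elementhash{$e} =~ /^[A-Z][a-z]+( [A-Z].)+$/) { $patterns{$neworigin}{$newsid}{"aulast_simpleplusinitial"}++; }
          # everything else
          else { $patterns{$neworigin}{$newsid}{"aulast_other"}++; }
      }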
    • aulast_other examples
      Ryan S Miller
      Louise D Bryant
      DAVID J MCKENZIE
      %C4%90okovi%C4%87
      Indu B Ahluwalia
      Carreras-Sangr%c3%a0
      Bautista-Casta%C3%B1o
      O%27Shea
      Melissa Ventura Marra
      Guan XueYing%3B Yu Nan%3B ShangguanXiaoXia
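Several of these values are percent-encoded UTF-8 rather than malformed names. The Python sketch below decodes four of them; the slides do not say whether values were decoded before pattern matching, so this is an illustration, not a description of the script.

```python
from urllib.parse import unquote

# Values copied from the aulast_other slide.
RAW = [
    "%C4%90okovi%C4%87",
    "Carreras-Sangr%c3%a0",
    "Bautista-Casta%C3%B1o",
    "O%27Shea",
]

# unquote() turns each %XX escape back into a byte and decodes the result as UTF-8.
decoded = [unquote(value) for value in RAW]
for raw, name in zip(RAW, decoded):
    print(f"{raw} -> {name}")
```

Even after decoding, these names contain accented letters, hyphens, or apostrophes, so an ASCII-only pattern such as `[A-Za-z]+` would still count them as "other".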
    • spage
      First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.
    • spage
      # Count how each source (origin, sid) formats spage.
      if ($e =~ /spage/) {
          $patterns{$neworigin}{$newsid}{$e}++;  # element is present
          # digits only, e.g. "151"
          if ($elementhash{$e} =~ /^\d+$/) { $patterns{$neworigin}{$newsid}{"spage_number"}++; }
          # page range, e.g. "151-162"
          elsif ($elementhash{$e} =~ /^\d+-\d+$/) { $patterns{$neworigin}{$newsid}{"spage_number_number"}++; }
          # a letter followed later by a digit
          elsif ($elementhash{$e} =~ /[A-Za-z].+\d/) { $patterns{$neworigin}{$newsid}{"spage_string_w_number"}++; }
          # everything else
          else { $patterns{$neworigin}{$newsid}{"spage_other"}++; }
      }
    • spage_other examples
      1033 (6 pages)
      85(19)
      575 (11 pages)
      283...290
      PHYS
      GLRM
      58,+VI
    • date
      The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.
    • date
      # Count how each source (origin, sid) formats date.
      if ($e =~ /date/) {
          $patterns{$neworigin}{$newsid}{$e}++;  # element is present
          # YYYY
          if ($elementhash{$e} =~ /^\d{4}$/) { $patterns{$neworigin}{$newsid}{"date_dddd"}++; }
          # YYYY-MM
          elsif ($elementhash{$e} =~ /^\d{4}-\d{2}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dd"}++; }
          # YYYY-MM-DD
          elsif ($elementhash{$e} =~ /^\d{4}-\d{2}-\d{2}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dd-dd"}++; }
          # four digits, hyphen, four digits
          elsif ($elementhash{$e} =~ /^\d{4}-\d{4}$/) { $patterns{$neworigin}{$newsid}{"date_dddd-dddd"}++; }
          # eight digits, no separators
          elsif ($elementhash{$e} =~ /^\d{8}$/) { $patterns{$neworigin}{$newsid}{"date_dddddddd"}++; }
          # everything else
          else { $patterns{$neworigin}{$newsid}{"date_dateother"}++; }
      }
    • date_other examples
      1956 July
      %7E1994
      June 5%2C 2002
      JUN 30 05
      2006%282007%29
      1922,+April+25th
      %5B%5B1943-06-19%5D%5D
    • issn
      International Standard Serial Number (ISSN). The issn may contain a hyphen, e.g. "1041-5653"
    • issn
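      # Count how each source (origin, sid) formats issn.
      if ($e =~ /issn/) {
          $patterns{$neworigin}{$newsid}{$e}++;  # element is present
          # starts with four digits, a hyphen, three digits and one more character
          if ($elementhash{$e} =~ /^\d{4}-\d{3}./) { $patterns{$neworigin}{$newsid}{"issn_number_number"}++; }
          # starts with seven digits and one more character, no hyphen
          elsif ($elementhash{$e} =~ /^\d{7}./) { $patterns{$neworigin}{$newsid}{"issn_number"}++; }
          # everything else
          else { $patterns{$neworigin}{$newsid}{"issn_other"}++; }
      }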
    • issn_other examples
      0065-2598%28print%29
      0018-5345+%28ISSN+print%29
      ISSN ISBN 0-9525091-5-6.
      0021-8375%28print%29%7C1439-0361%28electronic%29
      1471-2164+%28ISSN+online%29
      0191-8699%3B0191-8699
      0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
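The four Perl snippets share one shape: an ordered list of patterns per element, where the first match wins and anything unmatched is counted as "other". A table-driven sketch of that shape in Python follows, with digits written as `\d`. Two details are assumptions and differ from the Perl: the two issn patterns are given an end anchor so that values with trailing text fall into `issn_other`, as on the examples slide, and the fallback label is always `<element>_other` (the Perl uses `date_dateother` for dates).

```python
import re

# Ordered (label, pattern) pairs per element, transcribed from the Perl slides.
# The end anchors on the two issn patterns are an assumption (see lead-in).
RULES = {
    "aulast": [
        ("aulast_simple", r"^[A-Za-z]+$"),
        ("aulast_comma", r"^[A-Za-z]+, .+$"),
        ("aulast_simpleplusinitial", r"^[A-Z][a-z]+( [A-Z].)+$"),
    ],
    "spage": [
        ("spage_number", r"^\d+$"),
        ("spage_number_number", r"^\d+-\d+$"),
        ("spage_string_w_number", r"[A-Za-z].+\d"),
    ],
    "date": [
        ("date_dddd", r"^\d{4}$"),
        ("date_dddd-dd", r"^\d{4}-\d{2}$"),
        ("date_dddd-dd-dd", r"^\d{4}-\d{2}-\d{2}$"),
        ("date_dddd-dddd", r"^\d{4}-\d{4}$"),
        ("date_dddddddd", r"^\d{8}$"),
    ],
    "issn": [
        ("issn_number_number", r"^\d{4}-\d{3}.$"),
        ("issn_number", r"^\d{7}.$"),
    ],
}

def classify(element, value):
    """Return the label of the first matching pattern, else '<element>_other'."""
    for label, pattern in RULES[element]:
        if re.search(pattern, value):
            return label
    return f"{element}_other"

# Values taken from the example OpenURL and the "other" slides.
for element, value in [
    ("aulast", "merk"),
    ("aulast", "Ryan S Miller"),
    ("spage", "151"),
    ("spage", "283...290"),
    ("date", "2009"),
    ("date", "1956 July"),
    ("issn", "0737-8831"),
    ("issn", "0065-2598%28print%29"),
]:
    print(element, repr(value), classify(element, value))
```

Keeping the patterns in a table makes it straightforward to add an element or reorder patterns without touching the counting logic.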
    • How often out of 402,000 Cornell OpenURLs?
    • flat file output
      logsource  year  quarter  origin  sid                metric                    count
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle                    154
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle_colon              101
      cornell    2009  Q1       csa     csa:commabs-set-c  atitle_other              53
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast                    159
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast_other              4
      cornell    2009  Q1       csa     csa:commabs-set-c  aulast_simple             155
      cornell    2009  Q1       csa     csa:commabs-set-c  date                      159
      cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd                 110
      cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd-dd              49
      cornell    2009  Q1       csa     csa:commabs-set-c  isbn                      6
      cornell    2009  Q1       csa     csa:commabs-set-c  isbn_10                   6
      cornell    2009  Q1       csa     csa:commabs-set-c  issn                      135
      cornell    2009  Q1       csa     csa:commabs-set-c  issn_number-number        135
      cornell    2009  Q1       csa     csa:commabs-set-c  issue                     136
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_number              132
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_number_dash_number  2
      cornell    2009  Q1       csa     csa:commabs-set-c  issue_other               2
      cornell    2009  Q1       csa     csa:commabs-set-c  spage                     153
      cornell    2009  Q1       csa     csa:commabs-set-c  spage_number              153
      cornell    2009  Q1       csa     csa:commabs-set-c  title                     160
      cornell    2009  Q1       csa     csa:commabs-set-c  total                     160
      cornell    2009  Q1       csa     csa:commabs-set-c  volume                    139
      cornell    2009  Q1       csa     csa:commabs-set-c  volume_number             139
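One way to read these counts, sketched in Python: divide each pattern count by its element count to see how a source formats an element, and divide each element count by `total` to see how often the source sends the element at all. The counts are copied from the rows above; the function names and the choice of these two ratios are assumptions, not metrics defined in the slides.

```python
# Counts copied from the flat file on the slide (metric -> count).
COUNTS = {
    "total": 160,
    "aulast": 159, "aulast_simple": 155, "aulast_other": 4,
    "date": 159, "date_dddd": 110, "date_dddd-dd": 49,
    "issn": 135, "issn_number-number": 135,
    "spage": 153, "spage_number": 153,
}

def pattern_shares(counts):
    """Express every pattern count as a share of its element's count."""
    shares = {}
    for metric, count in counts.items():
        element, sep, _pattern = metric.partition("_")
        if not sep or element not in counts:
            continue  # element totals and "total" carry no pattern suffix
        shares[metric] = count / counts[element]
    return shares

def presence(counts):
    """Share of all requests from this source that carried each element."""
    total = counts["total"]
    return {m: c / total for m, c in counts.items() if "_" not in m and m != "total"}

shares = pattern_shares(COUNTS)
present = presence(COUNTS)
for metric, share in shares.items():
    print(f"{metric:22s} {share:.1%}")
for element, share in present.items():
    print(f"{element:22s} present in {share:.1%} of requests")
```

For this source, for example, every ISSN that is sent is well formed, but an ISSN is sent in only 135 of 160 requests.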
    • Demonstration
      http://openurlquality.blogspot.com/
    • Next steps
      create a NISO structure to wrap around the metrics: “NISO OpenURL Quality Index”
      add non-Cornell data from libraries and link resolver vendors (model is agnostic to source)
      confirm and publicize key elements used by target syntaxes
      can the quality of the global OpenURL network be modeled mathematically?
    • How to stay in the loop
      http://openurlquality.blogspot.com/
      Adam Chandler
      Database Management and Electronic Resources Research Librarian
      Central Library Operations
      Cornell University Library
      tel: 607-255-5760
      email: alc28@cornell.edu