Let's run through a quick example. This table shows the core elements for an article link. To keep the example simple, we will assume all elements are equally important, so each gets a weight of 1; a perfect OpenURL will collect the maximum of 8 points.
Now let's look at some OpenURL elements. In this OpenURL we have…<CLICK>Date, so we add one point<CLICK>ISSN, add another point<CLICK>Volume, another point<CLICK>Issue, another<CLICK>And Article Title, and another point<CLICK>The result is a total of 5 points.<CLICK>The calculation is the sum of the weights for this OpenURL divided by the total of all weights<CLICK>Which is 5 divided by 8<CLICK>Or .625
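The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration under the equal-weight assumption: the function name `completeness_score`, the lowercase key spellings, and the use of `urllib.parse` are choices made for this sketch, and the query string is the sample OpenURL shown on the slides.

```python
from urllib.parse import parse_qs

# Core elements for an article link, each with an assumed weight of 1.
EQUAL_WEIGHTS = {
    "atitle": 1, "aulast": 1, "date": 1, "issn": 1,
    "issue": 1, "spage": 1, "title": 1, "volume": 1,
}

def completeness_score(query, weights):
    """Sum of weights for the elements present, divided by the sum of all weights."""
    present = {key.lower() for key in parse_qs(query.lstrip("?"))}
    earned = sum(w for element, w in weights.items() if element in present)
    return earned / sum(weights.values())

sample = "?date=2/4/2008&issn=1083-3013&volume=13&issue=20&atitle=the+casualties+of+war"
print(completeness_score(sample, EQUAL_WEIGHTS))  # 5 / 8 = 0.625
```

Passing the weights in as a dictionary means the same function can be reused once the equal weights are replaced with measured ones.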
We needed a better way of determining the element weights, so we sought help from Phil Davis, a researcher with experience in statistical modeling. Phil’s suggestion was to perform stepwise regression to see the effect of individual elements on a sample of OpenURLs, and that is what we did. We started with a set of “perfect” OpenURLs: ones that not only included all core data elements but also resolved to a full text target on both LinkSource and 360 Link. We used a set of 1500.<CLICK>We then ran several series of tests, sending the OpenURLs through the link resolver with a different element removed for each series.<CLICK>We recorded the success (or rather failure) rates associated with each element. Elements with higher failure rates are more important to the success of the OpenURL than those with lower failure rates.<CLICK>We then used the failure rates as the basis for the weights.<CLICK>Finally, we used the weights to re-run our 15,000-OpenURL sample test.
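The remove-one-element-at-a-time test could be sketched roughly as follows. This is not the working group's actual test harness: `resolves` is a hypothetical stand-in for a call to a real link resolver, the "perfect" OpenURL values are made up for illustration, and the toy resolver exists only so the sketch runs.

```python
from urllib.parse import parse_qsl, urlencode

CORE_ELEMENTS = ["atitle", "aulast", "date", "issn", "issue", "spage", "title", "volume"]

def drop_element(query, element):
    """Return the query string with one element removed."""
    pairs = parse_qsl(query.lstrip("?"))
    return urlencode([(k, v) for k, v in pairs if k.lower() != element])

def failure_rates(perfect_queries, resolves):
    """Fraction of OpenURLs that stop resolving when each element is removed.

    `resolves` is a hypothetical callable returning True when the link
    resolver still reaches the full text target for the given query string.
    """
    rates = {}
    for element in CORE_ELEMENTS:
        failures = sum(
            1 for query in perfect_queries
            if not resolves(drop_element(query, element))
        )
        rates[element] = failures / len(perfect_queries)
    return rates

# Made-up "perfect" OpenURL and a toy resolver that only needs the volume.
perfect = ["atitle=example&aulast=doe&date=2008&issn=1083-3013"
           "&issue=20&spage=1&title=journal&volume=13"]
toy_resolver = lambda query: "volume=" in query
rates = failure_rates(perfect, toy_resolver)
print(rates["volume"], rates["aulast"])  # 1.0 0.0
```

In a real run, `perfect_queries` would hold the full set of perfect OpenURLs and `resolves` would wrap the link resolver under test.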
So how did it turn out? Again, here are the numbers for LinkSource.<Click>You can see Volume was a key element, with 74% of OpenURLs failing when it was removed.<Click>Author last name was not very important, with a failure rate of less than a tenth of a percent.<Click>Date was surprisingly low too. This could be for a few reasons: forgiveness in the holdings matching logic (e.g. treating no date as “any date”); the link resolver’s ability to discover the date by looking up the article citation in the knowledge base using volume, issue, and start page; and the fact that many full text providers don’t use the date explicitly in their outbound links.
We created the weights for article OpenURLs. <Click>Rather than use the raw failure rates, we used the base-10 logarithm of each failure rate, expressed as the number of failures per 10,000 OpenURLs.
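As a rough check of that calculation, the sketch below applies log10 to the failures per 10,000 using the failure percentages shown on the failure-rate slide. The function name `element_weight` is made up for this example. Because the slide percentages are themselves rounded, the results agree with the published weights only to within about 0.02.

```python
import math

# Failure percentages from the 1500-OpenURL test, as shown on the slides.
FAILURE_PERCENT = {
    "atitle": 0.74, "aulast": 0.07, "date": 0.4, "issn": 22.02,
    "issue": 20.27, "spage": 33.27, "title": 0.61, "volume": 74.14,
}

def element_weight(failure_percent):
    """log10 of the number of failures per 10,000 OpenURLs."""
    failures_per_10k = failure_percent * 100  # 1% is 100 per 10,000
    return math.log10(failures_per_10k)

weights = {e: round(element_weight(p), 2) for e, p in FAILURE_PERCENT.items()}
print(weights["volume"])  # 3.87
```

Taking the logarithm compresses the range: Volume fails roughly a thousand times more often than author last name, yet its weight is only a few times larger.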
Then we ran our 15,000-record sample again. You can see from the graph that the average completeness score and the average success score for the OpenURL providers align very closely, and the correlation coefficient of these two values across all 15,000 test OpenURLs is .80, which indicates a strong correlation. Good news for the test. This tells us that the Completeness Index can be used as a predictor of OpenURL success for a particular content provider: a low Completeness Index is a good indicator that there is a problem.
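For readers who want to see how such a correlation coefficient is computed, here is a minimal Pearson correlation sketch. It assumes success is recorded as 1 or 0 per OpenURL, which the notes do not state, and the values below are made up for illustration; they are not data from the IOTA test.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up values for illustration only.
completeness = [0.2, 0.4, 0.6, 0.8, 1.0]
success = [0.0, 0.0, 1.0, 1.0, 1.0]
print(round(pearson(completeness, success), 2))
```

A value near 1 means high completeness scores tend to go with successful links; a value near 0 would mean the score says little about success.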
Meeting the Challenge / NISO update
Meeting the Challenge: Successful Electronic Resources Management in the Absence of a Perfect System
NISO Update on IOTA and SUSHI
Oliver Pesch, Chief Strategist, E-Resources, EBSCO Information Services
SUSHI: WHAT IT IS ...
◦ An ANSI/NISO Standard (NISO Z39.93-2007)
◦ Defines automated request and response model for harvesting e-resource usage data
◦ Designed to work with COUNTER, the most frequently retrieved usage reports
SUSHI: HOW TO USE IT …
◦ Works behind-the-scenes
◦ It is a client-server technology used by usage consolidation solutions (e.g. ERM systems) and content providers
◦ Content providers develop a SUSHI Server to deliver COUNTER statistics
◦ Usage consolidation solutions include a SUSHI client to automatically retrieve usage on a scheduled basis or on demand
SUSHI: WHY YOU SHOULD USE IT …
◦ It replaces the time-consuming user-mediated collection of usage data reports
◦ The protocol is generalized and extensible, meaning it can be used to retrieve a variety of usage reports
SUSHI: CURRENT STATUS…
◦ Many resources available on SUSHI web site: http://www.niso.org/workrooms/sushi
◦ 40+ content providers support SUSHI (SUSHI Server Registry: https://sites.google.com/site/sushiserverregistry)
◦ Works with all COUNTER reports
◦ Ready for COUNTER Release 4
◦ SUSHI support is an enforced requirement for COUNTER compliance with Release 4
SUSHI: THE COMMITTEE…
◦ Bob McQuillan, Innovative Interfaces Inc. (Co-chair)
◦ Oliver Pesch, EBSCO Information Services (Co-chair)
◦ Marie Kennedy, Loyola Marymount University
◦ Chan Li, California Digital Library
◦ John Milligan, ScholarlyIQ
◦ Paul Needham, Cranfield University
◦ James Van Mil, University of Cincinnati Libraries
SUSHI: CURRENT ACTIVITIES…
◦ Continued education and awareness
◦ Renovating the web site
◦ Exploring “SUSHI Lite”, a protocol that would be based on JSON
IOTA: WHAT IS IT…
◦ A working group focused on OpenURL quality…
◦ Using analytics to provide a quantitative measure of the quality of OpenURLs provided by “Sources”
◦ Created the Completeness Index as a measure of quality
◦ Developed an interactive online tool to provide analysis and reporting on real OpenURL log files
◦ Producing a Technical Report and Recommended Practice related to OpenURL quality
IOTA: COMPLETENESS INDEX…
◦ Based on the premise that the success of a link can be affected by the data provided in the OpenURL
◦ Identify the required metadata elements
◦ Determine a “weight” for each element to reflect its importance
◦ Score an OpenURL by adding the weights for all elements provided and dividing by the total that would result if all elements appeared
IOTA: Simple example assuming equal element weights
Element   Description           Weight   This OpenURL
ATitle    Article title         1
AuLast    Author’s last name    1
Date      Date of publication   1
ISSN      ISSN                  1
Issue     Issue number          1
SPage     Start page            1
Title     Journal Title         1
Volume    Volume number         1
TOTAL                           8
IOTA: SAMPLE OPEN URL DATA
?date=2/4/2008
&issn=1083-3013
&volume=13
&issue=20
&atitle=the+casualties+of+war
Simple example assuming equal element weights
Element   Description           Weight   This OpenURL
ATitle    Article title         1        1
AuLast    Author’s last name    1
Date      Date of publication   1        1
ISSN      ISSN                  1        1
Issue     Issue number          1        1
SPage     Start page            1
Title     Journal Title         1
Volume    Volume number         1        1
TOTAL                           8        5
Completeness Score = (Total for This OpenURL) / (Total Weights) = 5 / 8 = .625
IOTA: RECOMMENDED PRACTICE…
◦ Defines a technique for determining element weights
◦ Tested with real link resolvers and real OpenURLs
◦ Based on research which looked for a correlation between data elements on the OpenURL and “success” of the OpenURL
A Statistical Approach to Determining Element Weights
◦ Select a set of “perfect” OpenURLs: include all key data elements and resolve to full text
◦ Perform step-wise regression: test failure rates for each element by removing that element
◦ Use failure rates as basis for weights
◦ Use weights to calculate Completeness Scores and to test for correlation between weights and success for a larger sample
Failure Rates from 1500 OpenURL test sample
Element removed
from the OpenURL   Description                               Failure Percentage
ATitle             Article title                             .74%
AuLast             Author’s last name                        .07%
Date               Date of publication                       .4%
ISSN               ISSN (either online or print ISSN)        22.02%
Issue              Issue number                              20.27%
SPage              Start page                                33.27%
Title              Journal Title (either Title or Jtitle)    .61%
Volume             Volume number                             74.14%
Callouts: Author’s last name is least important. Date is surprisingly low. Volume is most critical.
Calculated Element Weights
Element   Description                               Weight*
ATitle    Article title                             1.87
AuLast    Author’s last name                        0.83
Date      Date of publication                       1.61
ISSN      ISSN (either online or print ISSN)        3.34
Issue     Issue number                              3.31
SPage     Start page                                3.52
Title     Journal Title (either Title or Jtitle)    1.78
Volume    Volume number                             3.87
*Element weight calculation: log10 (failure-rate-per-10,000 OpenURLs)
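To see what these weights do to a score, the sketch below re-scores the sample OpenURL from the earlier slide using the calculated weights in place of equal ones. It is an illustration only: the function name `weighted_completeness` and the handling of `jtitle` as an alias for `title` are assumptions of this sketch, based on the note that either Title or Jtitle counts.

```python
from urllib.parse import parse_qs

# Element weights as published on the Calculated Element Weights slide.
WEIGHTS = {
    "atitle": 1.87, "aulast": 0.83, "date": 1.61, "issn": 3.34,
    "issue": 3.31, "spage": 3.52, "title": 1.78, "volume": 3.87,
}

def weighted_completeness(query):
    """Weights of the elements present divided by the total of all weights."""
    present = {key.lower() for key in parse_qs(query.lstrip("?"))}
    if "jtitle" in present:  # either Title or Jtitle counts as the journal title
        present.add("title")
    earned = sum(w for element, w in WEIGHTS.items() if element in present)
    return earned / sum(WEIGHTS.values())

sample = "?date=2/4/2008&issn=1083-3013&volume=13&issue=20&atitle=the+casualties+of+war"
print(round(weighted_completeness(sample), 2))  # 14.00 / 20.13, about 0.70
```

The sample scores higher here than the .625 it earned with equal weights because most of its missing elements (author last name and journal title) carry low weights.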
Results
[Chart: Average of Completeness Score and Average of Success Score by OpenURL provider]
Correlation Coefficient .80
Tests conducted on sample of 15,000 OpenURLs randomly pulled from IOTA database
IOTA: INTERACTIVE ONLINE TOOL…
◦ 23.3+ million OpenURLs processed
◦ Reporting interface
  ◦ Analyze data elements (metrics) across vendors or database (Source)
  ◦ Analyze (Source) for all data elements
IOTA: HOW TO USE IT…
◦ The Technical Report provides suggestions for improving OpenURLs
◦ The interactive tool offers a means to pinpoint irregularities in data provided on OpenURLs
◦ The Recommended Practice describes how to create a Completeness Index
◦ The Completeness Index allows OpenURL quality problems to be quantified
IOTA: WHY YOU SHOULD USE IT…
◦ Link resolver vendors can implement the Completeness Index in their products to help identify problematic OpenURL sources
◦ Librarians can use suggestions and Completeness Index to more effectively communicate quality problems to content providers
◦ Content providers can use the online interactive tool to identify problems with the data they provide
IOTA: THE WORKING GROUP…
◦ Adam Chandler (Chair), Database Management and E-Resources Librarian, Cornell University Library
◦ Rafal Kasprowski, Electronic Resources Librarian, Rice University
◦ Susan Marcin, Licensed Electronic Resources Librarian, Continuing & Electronic Resources Management Division, Butler Library, Columbia University
◦ Oliver Pesch, Chief Strategist, E-Resource Access and Management Services, EBSCO Information Services
◦ Clara Ruttenberg, Electronic Resources Librarian, University of Maryland
◦ Elizabeth Winter, Electronic Resources Coordinator, Georgia Tech Library, Collection Acquisitions & Management Department
◦ Jim Wismer, Manager, Software Engineering, Thomson Reuters
◦ Aron Wolf, Data Program Analyst, Serials Solutions
IOTA: CURRENT STATUS…
◦ Technical Report in final draft
◦ Recommended Practice has been submitted to NISO
◦ Interactive Online Tool remains available
Active NISO Initiatives
◦ DAISY Standards
◦ Demand-Driven Acquisition (DDA) of Monographs
◦ Digital Bookmarking and Annotation
◦ E-book Special Interest Group (SIG)
◦ IOTA: OpenURL Quality Metrics
◦ I2 (Institutional Identifiers)
◦ ISO Project 25964
◦ JATS: Journal Article Tag Suite (Also known as Standardized Markup for Journal Articles)
◦ KBART (Knowledge Base and Related Tools) (NISO/UKSG)
◦ NCIP (NISO Circulation Interchange Protocol) Standing Committee
◦ Open Discovery Initiative
◦ PIE-J (Presentation & Identification of E-Journals)
◦ ResourceSync
◦ SERU Standing Committee
◦ Standard Interchange Protocol (SIP)
◦ Supplemental Journal Article Materials (NISO/NFAIS)
◦ SUSHI Standing Committee and SUSHI Servers
◦ Z39.7 (Data Dictionary) Standing Committee
References
◦ Active NISO Groups: http://www.niso.org/workrooms/#active
◦ SUSHI Web Site: http://www.niso.org/workrooms/sushi
◦ IOTA Web Site: http://www.niso.org/workrooms/openurlquality
◦ SUSHI Server Registry: https://sites.google.com/site/sushiserverregistry
Have an idea for a standard or recommended practice? Email… Nettie Lagace, Associate Director for Programs, NISO firstname.lastname@example.org THANK YOU!