How Libraries Use Publisher Metadata


Steven Shadle (speaker)

Published in: Education, Technology
  • It’s not just about the catalog. Libraries provide access to resources through additional systems.
  • ANIMATION. CLICK FOR EACH SECTION. It’s not possible to provide this amount of service without the systems we’ll talk about today.
  • In 2009, UW Libraries had a small group work on developing user ‘personas’ which serve as surrogate and help us with designing user services. We’ll discuss two of these personas (Brooke and Richard) who account for nearly 90% of our users.
  • Descriptions provide more detail than needed for this presentation but provide context for user behavior. When Brooke first started at UW last year, she felt overwhelmed by all the activities and classes going on at once, and all the decisions she had to make. She’s learned to cope by tuning out most of the ‘extras’ and just focusing on what she needs to do immediately. She relies on MyUW to access all of her school information in one place, and keeps track of all her friends and social activities on Facebook. Brooke comes to the library at least once a week to study and use the computer labs – she doesn’t like to carry her laptop around campus all day because it’s too heavy. For class assignments, she mostly uses printed course materials and websites like Google and Wikipedia. She sometimes tries the Libraries website but she only uses the main search box on the home page because she’s not sure what the other links are. What she needs to do: get assigned readings for class; find academic articles to cite in a paper. Pain points: intimidated by all the different choices; doesn’t understand terminology, like the difference between articles, journals, and databases.
  • He focuses on journals because that’s where the important work in the area is published, and he uses EndNote to save a list of all the articles he comes across. Richard’s faculty advisor refers him to specific authors and articles, and fellow students actively share advice about other resources to try (such as Web of Science, which lets him see who has cited a certain article). He also makes heavy use of internet sites like Google Scholar and public databases on government sites. Richard usually works from his lab, where he has several different networked computers to run simulations. He also visits the Engineering and Business Libraries and Suzzallo Library to retrieve materials from different disciplines that relate to his project. Richard is very comfortable with technology. Generally, if he has trouble online he will try to figure it out for himself or search for online instructions rather than ask someone for help. What he needs to do: look up full-text articles (already has citations); refer to facts in authoritative reference materials; organize research materials. Pain points: getting from a citation to full text; only comfortable using his one ‘go-to’ database.
  • Richard has been told by one of his advisors that he should review some of the recent proceedings from this conference to bring him up to date on the topic. His advisor provides him the conference website as a starting point. The full-text is not available from the conference website. Richard is an experienced researcher who knows the library frequently has conference proceedings, so he searches the library catalog to see if the library can provide him with any recent proceedings.
  • Richard enters the name of the conference in a Keyword search. Note that “Keyword” is the default but there are many other search options available.
  • The first two results appear to be the conference proceedings from the 2010 and 2011 conferences. Perfect! Exactly what Richard was looking for. Recent proceedings so he can read about current research on the topic. Richard clicks on the link for the second record (the 2011 proceedings).
  • Richard is presented with the catalog record for the 2011 proceedings. The keywords from Richard’s search appear in red. The record was retrieved because the name of the conference appeared as the “author” (and also appeared as part of the “title”). This record provides a link to the full text, a note about the Springer package, as well as information about the resource, including series, subjects, editor, ISBN and OCLC record #.
  • Clicking on the “SpringerLink Connect to this title online” link gets Richard to the full-text of the proceedings on the Springer website. From here he can review the table of contents and read any individual papers he is interested in. He is also able to save a PDF of the proceedings that he can use later.
  • The library catalog was able to provide access to the conference proceedings because Springer made a catalog record available for the library to download and add to the catalog. The library can directly download MARC records from this Springer website and load them into their local catalog.
  • Here’s the MARC catalog record that Springer makes available for download. This is the record that was originally downloaded into our library catalog. Even though the use of numeric MARC field labels may be different, the elements in this record should be familiar to many of you. Later changes were made to that record (i.e., conference author, subject terms) to provide better, more specific access. A revised version of the record (which included these changes) was later downloaded from OCLC and overlaid the Springer record. The record currently in the catalog is a revised (better) version of the record that was originally downloaded.
  • The citation is often referred to as the “source” and the individual services are referred to as “targets.”
  • Not to be confused with the computer science/expert system definition of a knowledge base. A knowledge base is typically presented in the form of a spreadsheet that lists information about all of the electronic resources a library licenses on behalf of its customers. It generally includes: title (either journal or book); book author; identifiers (ISBN, ISSN); journal coverage; resource provider (i.e., online publisher or content provider); URL.
  • HEAVY ANIMATION. The user starts in Web of Science and identifies an article they’re interested in. In this case, the user finds an article with the title “Improving Group Attention,” which is published in the Sept 2011 issue of “Group Decision and Negotiation” (a Springer journal). The user clicks on the purple “Check for Full-Text” button. This sends an OpenURL from Web of Science to the link resolver (resolver.lib). This is part of the OpenURL that is sent by WoS. Notice the citation elements in the URL (including date, ISSN, article DOI). We will look at a complete OpenURL in a minute. The resolver parses the citation elements, checks the ISSN and date against the KB and identifies the Springer version as having the full-text. The data about Springer journal titles are provided by Springer to Serials Solutions (actually, Serials Solutions grabs the metadata (title/ISSN/dates/links) from the Springer download site on a monthly basis; see note below). UW Libraries staff then profile their specific subscriptions against the Serials Solutions KB. The link resolver knows the rules to create a target URL and sends the user session to the full-text of that article. NOTE FROM LEE DEITESFELD/SERSOL: For journals, we go to the SpringerLINK platform (, and download a CSV file. For ebooks/book series, we go to the Springer “downloader tool” (, and download an Excel file. And, in the rare case with SpringerOpen (, we actually have software that scrapes the SpringerOpen website for title-level metadata.
  • Overview of the link resolver steps we just walked through
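The knowledge base test in those steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not how Serials Solutions or any other vendor implements a resolver: the `KbEntry` class, `parse_citation_date`, and `find_targets` are hypothetical names, the two sample rows are simplified from the Group Decision and Negotiation coverage shown on the OpenURL Linking slide, and the `rft.date` key is assumed to carry the citation date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional
from urllib.parse import parse_qs


@dataclass
class KbEntry:
    """One knowledge base row: a journal's coverage at one provider (hypothetical)."""
    issn: str
    resource: str
    start: date
    end: Optional[date]  # None means coverage runs to the present


# Sample rows, simplified from the coverage table on the OpenURL Linking slide.
KB = [
    KbEntry("0926-2644", "SpringerLINK Archive", date(1992, 4, 1), date(1996, 12, 31)),
    KbEntry("0926-2644", "SpringerLink Contemporary", date(1997, 1, 1), None),
]


def parse_citation_date(value: str) -> date:
    """Accept YYYY, YYYY-MM or YYYY-MM-DD; missing parts default to 1."""
    parts = [int(p) for p in value.split("-")]
    parts += [1] * (3 - len(parts))
    return date(parts[0], parts[1], parts[2])


def find_targets(openurl_query: str, kb: List[KbEntry]) -> List[str]:
    """Parse citation elements from an OpenURL query string and test them against the KB."""
    params = parse_qs(openurl_query)
    issn = params.get("rft.issn", [""])[0]
    raw_date = params.get("rft.date", [""])[0]
    if not issn or not raw_date:
        return []  # not enough citation data to test coverage
    cited = parse_citation_date(raw_date)
    return [
        entry.resource
        for entry in kb
        if entry.issn == issn
        and entry.start <= cited
        and (entry.end is None or cited <= entry.end)
    ]
```

The sketch also shows why accurate source metadata matters: a wrong ISSN or date in the incoming citation means no coverage row matches, and the user is offered no full-text target.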
  • ANIMATION. This is the OpenURL sent by Web of Science (note sid/ like a lot of complication until you format it to see that the OpenURL essentially consists of two elements: Link resolver address (in this case, elements. In order for link resolvers to function properly, citation metadata coming from a source must be accurate. In most cases, this data is originally coming from you, the publisher.
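To make the two-part structure concrete, here is a minimal sketch that splits an OpenURL into the resolver address and the citation elements using Python's standard library. The resolver address `https://resolver.example.edu/openurl` is a placeholder assumption (the real address is not reproduced here), the function name `split_openurl` is hypothetical, and only some of the citation keys from the slide are included.

```python
from urllib.parse import parse_qs, urlsplit


def split_openurl(openurl: str):
    """Return (resolver_address, citation_elements) for an OpenURL."""
    parts = urlsplit(openurl)
    resolver = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # parse_qs percent-decodes values; keep the first value for each key.
    citation = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return resolver, citation


# Placeholder resolver address plus citation elements taken from the slide.
EXAMPLE = (
    "https://resolver.example.edu/openurl?"
    "rft.atitle=Improving%20Group%20Attention%3A%20An%20Experiment"
    "%20with%20Synchronous%20Brainstorming"
    "&rft.aulast=Ferreira&rft.genre=article&rft.issn=0926-2644"
    "&rft.volume=20&rft.issue=5&rft.spage=643"
    "&rft_id=info:doi/10.1007%2Fs10726-011-9233-y"
)

resolver, citation = split_openurl(EXAMPLE)
for key in sorted(citation):
    print(f"{key} = {citation[key]}")
```

Once decoded, the long URL reduces to a short list of key/value pairs, which is all the resolver has to work with when it looks for a matching copy.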
  • The next three slides are examples illustrating why libraries use link resolvers. In general they provide better or additional access to a resource than what the library is able to supply through the traditional library catalog or from its physical collection. ANIMATION. Without the link resolver, the user has to drill down through the library catalog and then through the journal website to get to the article full-text. Click through the drill-down process. SEVEN clicks to get to the full-text article through the library catalog. On a side note, there are still some library staff who attempt to show users this method for getting to a full-text article. In cases where link resolution fails, this may be what the user has to do.
  • ANIMATION. The library may get access to electronic full-text through a source other than the publisher. In this example, the user goes directly to the Springer website and can’t get to the article because the library doesn’t subscribe to the content. Integrative Physiological & Behavioral Science is a Springer journal and this specific article is available from the Springer website, but the UW doesn’t subscribe to it from Springer. Instead, the KB indicates that the issue in question is available from an EBSCOHost database. The link resolver works with the KB to identify which source is the most appropriate source for the full text.
  • ANIMATION. Get the user to a service that will get the content if the library doesn’t license it. Integrative Physiological & Behavioral Science is a Springer journal and this specific article is available from the Springer website, but the UW doesn’t subscribe to it. However, the UW may have the journal in print or may have a reciprocal borrowing relationship with another library for print. So if the link resolver can’t find an online version, the user is given options to find another version (or to fill out a request to have it scanned). Every library customizes the services offered depending on its own borrowing arrangements, library catalog, etc. Some libraries provide links to related full-text databases (with the idea that if the library doesn’t subscribe to *this* article, there may be other articles in XYZ database which would serve the user’s purpose).
  • The final access method we will talk about is the library discovery service. From the user perspective, it is similar to the Google search experience. These are some of the reasons that libraries use a discovery service. Discovery services differ from federated search applications in that discovery services don’t search live sources. By searching pre-indexed data, discovery services return search results very quickly. Discovery services are touted as an evolution beyond federated search, and in some ways they are. Some discovery services either provide integration with federated search or provide an API for others to do the integration.
  • Each library discovery service has a different mix of content and can often be customized to include local content. However, most library discovery services consist primarily of the content that has been historically available from libraries (books, journals, articles). DON’T MENTION OTHER OCLC FUNCTIONALITIES/COMPONENTS. EUROPEANS GENERALLY DON’T USE AS CATALOGING SOURCE/RESOURCE SHARING/ETC.
  • HEAVY ANIMATION. In Brooke’s ENGL 210 (English Medieval and Early Modern Literature), she learns about the Anglo-Saxon literary practice of Opus geminatum (twinned work; a work consisting of a pair of texts, one in prose and one in verse). Her professor also mentions that paraphrase was often used as a literary device in this time period. Brooke is required to write a research paper on a topic of her choice and, because of other commitments, she wants to get as much done as she can early on. Brooke doesn’t take note of the Latin phrase, but she does remember the concept, so she inputs the words “paraphrase anglosaxon literature” into UW WorldCat (the UW Libraries’ discovery system). Result #3 includes the Latin phrase in the article title (which she remembers as soon as she sees it). Because it has that phrase and because it’s held by the UW, she clicks on the link for that citation. Clicking the first “View full text” link gets Brooke to the article on the Springer website. Brooke gets from the discovery service to the full-text through link resolution (just like the Web of Science example we looked at earlier...the purple button). But unlike a traditional A&I database, a discovery service includes metadata from dozens of sources describing millions of resources (and not just journal articles). Note that most of the information in the WorldCat entry is the same data that appears in the Springer website. Also note that the “Database” referenced in the WorldCat entry is “ArticleFirst.” The Springer indexer describes the citation for Springer’s system. That data is sent to OCLC (and to many other database providers, I assume) where it is used as the basis for resource discovery in any number of systems. The metadata provided by Springer staff (and in some cases originally created by authors) is recycled into any number of databases and systems that libraries use. So an error in the original metadata can be propagated across any number of systems and services.
  • Now that Brooke remembers the phrase “Opus Geminatum,” she searches in Google using that phrase. The results are useful, including a Wikipedia entry for a specific instance of an opus geminatum, “Candidus of Fulda,” which also uses the phrase “twinned work” to describe the literature style. There are other resources (mostly articles and books) that might be useful for additional research. But note the second entry is for the Springer-published article that she found earlier. Clicking on the Google link gets her to the full-text just as it does when she’s going from a citation database or the discovery service using link resolution. This works because the library has profiled its IP ranges with Google, so that Google can pass along IP information to the link target. As long as Brooke is on a campus workstation or has proxied her session, Google will recognize her as a University of Washington user. Google has the article metadata necessary to create the link to the full-text (likely an article DOI). So if you create article or journal metadata, it will most likely appear in Google and be used to support Google Search.
  • One of the benefits of a library discovery system is the integration of previously silo’d discovery systems/services. A good discovery system will be scoped to provide access to a majority of the resources that can be provided by the library. As our head of reference put it, WorldCat Local is a good place to start research and should provide for the needs of most undergraduates. Brooke’s search for “paraphrase anglosaxon literature” in WorldCat Local resulted in 28 resources. Of those, the eBooks, CDs and computer files may or may not have been in the traditional library catalog. The peer-reviewed articles definitely would not have been in the library catalog (so Brooke would need to search a separate database to find journal articles). Here is a more illustrative case of the potential breadth of a discovery service. A search for Jesus Christ Superstar brings up video, soundtracks (CD/LP), musical score, books about the opera and on related topics, etc. Looking at the format facet, one sees the huge variety of resource types indexed in WorldCat (including 36 book chapters, 26 encyclopedia articles and even a toy!). The metadata for these resources is coming from various sources following different encoding and content standards. It’s the job of the discovery service provider to manage, integrate and provide quality control over this wealth of metadata.
  • WorldCat Local article citations come from dozens of sources; MARC records come from thousands of libraries. Because the discovery service indexes these all in one central index, there must be an underlying set of data elements that all the incoming data must be mapped to. These data elements must be rich enough to provide for robust search (so more than just keywords; users must be able to search by author and title as well as limit by format, etc.). If incoming data is miscoded or (more likely) the mapping between the datasets is incorrect, bad results will occur. We’ll take a look at some of those in a minute.
  • ANIMATIONS (ONE FOR EACH EXAMPLE) - Here’s the list of formats from the format facet from the Jesus Christ Superstar search. Included are examples of where the metadata is miscoded (or mapped incorrectly) and the resource appears under the incorrect format facet. Microform Book: This is a microfilm, but it is a set of office files from the NAACP office (and thus should appear under the “Archival Material” facet). University Publications of America generally does microfilm books and had their default coding set up to provide book metadata elements. Article Chapter: All 36 entries under this category are sound files provided by Alexander Street Press. These should be appearing under the eMusic facet, but somewhere along the way, a coding detail got lost and they were coded as “Article Chapters.” Computer File: Technically it is, but this is actually a master’s thesis and should have appeared under the “Thesis/Dissertation” facet. In WorldCat, “Computer file” is reserved for resources inherently computer-like, such as software. This record was created by OAI harvesting from Bowling Green State University’s electronic thesis and dissertation collection. The element that describes this as a thesis was not included in the template used to create these records and thus was not included in the records that were loaded into WorldCat. In these examples, non-MARC metadata was used to create a record which then was converted into a MARC record which was loaded into WorldCat. Problems like the ones illustrated here can result from differences in encoding standards or content standards, or from coding errors. In a discovery service, it is the responsibility of those contributing and managing metadata (and metadata crosswalks) to confirm that the metadata being contributed conforms to the discovery service standards.
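The thesis example can be mimicked with a toy crosswalk. This is a deliberately simplified sketch and not WorldCat's actual mapping logic: the record keys (`types`, `medium`), the function name `to_format_facet`, and the fallback rules are all assumptions made for illustration.

```python
def to_format_facet(record: dict) -> str:
    """Map a harvested (non-MARC) record to one format facet value (hypothetical rules)."""
    types = [t.lower() for t in record.get("types", [])]
    if "thesis" in types or "dissertation" in types:
        return "Thesis/Dissertation"
    if record.get("medium") == "electronic":
        # With no more specific type element, an online file can only be
        # described generically.
        return "Computer File"
    return "Book"


# The same thesis, harvested with and without the element that identifies it as a thesis.
complete = {"title": "A master's thesis", "medium": "electronic", "types": ["thesis"]}
from_template = {"title": "A master's thesis", "medium": "electronic"}

print(to_format_facet(complete))       # Thesis/Dissertation
print(to_format_facet(from_template))  # Computer File
```

Nothing in the second record is wrong as transcribed; the facet is wrong because one element never made it into the template, which is the kind of failure a crosswalk review needs to catch.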
  • Working in cooperative systems, library catalogers could do it all when the only records we created were for books (not articles) physically held in the library. Those days are long gone.
    1. 1. How Libraries Use Publisher Metadata Steve Shadle Serials Access Librarian University of Washington Libraries
    2. 2. Purpose Provide an overview of how libraries provide access to publisher content using publisher-provided metadata • Library Catalog • OpenURL Link Resolver • Library Discovery System
    3. 3. The University of Washington Oldest public university on the West Coast (founded 1861) Largest public university recipient of federal research funding (nearly $800 million) Total enrollment: 48,022 students 16 colleges offering more than 1,800 undergraduate courses each quarter across 3 campuses Ranked 16th among the world’s top 500 universities (ARWU) 54 programs ranked Top 10 in the U.S. (US News & World Report)
    4. 4. University of Washington Libraries Digital Library • 500,000 licensed electronic books and 100,000 online journals • 600,000 locally digitized items in 300 collections • 6 million licensed journal articles downloaded • 9 million separate sessions on Libraries’ websites Physical Library • 16 libraries on three campuses with 5.2 million visits last year • 7 million print volumes, 6 million microforms, 20K print serials • 1.8 million check-outs Reference Services • 15,000 reference questions answered online • 50,000 reference questions answered in person
    5. 5. Who Are UW Library Users? • Brooke the Beginner • Richard the Researcher • Sharon the Scholar • Paul the Professional • April the Alumna
    6. 6. Meet Brooke “I’d rather use an online article that ‘kinda works’ than go to the hassle of finding a book in the library.” • New to the research process and academia • Working on several assignments in different humanities disciplines, but not an expert in any of them • Will take the first thing that’s good enough Brooke is a 19 year old undergraduate who hasn’t yet declared a major. Right now she’s taking classes in English, History and Biology. She hasn’t used the library website much yet, but will need to do research for many different class papers and projects over the next couple of years. How Brooke Uses the Libraries Website • Finds class materials by looking up the class in MyUW and following the link to Course Reserves. • Checks the hours at Suzzallo Library to see how late they are open before she goes there to study • For class papers, if she can’t find enough articles using Google, she will visit the libraries’ website. She looks for basic academic articles across several different topic areas by entering key words in the main search box, then using filters to refine the results.
    7. 7. Meet Richard “Accessing full-text articles online is my primary use of the library and is central to my research…but I still go to the library for some reference materials that aren’t online.” • Dedicated full time student with significant knowledge in his area of study • Working on a long term, in-depth project • Will pursue all avenues to obtain materials related to his research Richard is a 29 year old doctoral student in the College of Built Environments. He’s working on a dissertation about public transportation utilization and incentives, which he models with computer simulations. Richard has already completed a Master’s in Civil Engineering and has used academic libraries for research at both his undergraduate and master’s institutions. To earn his doctorate, Richard needs to do original research, which means reviewing everything published on transportation modeling. How He Uses the Libraries Website • Searches the catalog for specific texts that he’s seen referenced in other works or heard about from colleagues • Uses Web of Science to find out which other researchers have cited articles important to his project • Searches for the full text of citations that he’s found through Google Scholar
    8. 8. Richard Looks for Conference Proceedings
    9. 9. Library Catalog • Historically, the catalog was a record of what the library physically “held” • Beginning in the mid-90s, libraries started including online licensed resources in the library catalog • Does not include journal articles • Most library catalogs are still MARC-based
    10. 10. Catalog Search Transport system telematics
    11. 11. Search Results
    12. 12. Catalog Record for 2011 Proceedings
    13. 13. Full-Text Success
    14. 14. Catalog Record Source
    15. 15. Springer MARC Catalog Record
    16. 16. OpenURL Link Resolver Service that takes a citation formatted as an OpenURL and provides the user with library services related to that citation. These services can include: • Accessing the online full-text • Placing an ILL (InterLibrary Loan) request • Searching a library catalog • Finding related resources
    17. 17. OpenURL Knowledge Base A database containing information about electronic resources such as electronic journals or eBooks and their availability and accessibility. Using the knowledge base, an OpenURL link resolver can determine if an item (article, book, etc.) is available electronically and identify the appropriate copy for a user.
    18. 18. OpenURL Linking
        Title | ISSN | eISSN | StartDate | EndDate | Resource | URL
        Group decision and negotiation | 0926-2644 | 1572-9907 | 4/1/1992 | 12/31/1996 | SpringerLINK Archive - Humanities, Social Sciences and Law | l.asp?genre=journal&issn=0926-2644
        Group decision and negotiation | 0926-2644 | 1572-9907 | 1/1/1997 | | SpringerLink Contemporary - Orbis Cascade Alliance | l.asp?genre=journal&issn=0926-2644
        Group decision and negotiation | 0926-2644 | 1572-9907 | 1/1/1997 | 12/31/2009 | SpringerLink Contemporary - Orbis Cascade Alliance (Perpetual Access) | l.asp?genre=journal&issn=0926-2644
        Group decision and negotiation | 0926-2644 | 1572-9907 | 11/1/2004 | 1 year ago | Business Source Complete | sp?db=bth&jid=OFY&scope=site
        Resolver: &id=doi:10.1007/s10726-011-9233-y&…
        article&id=doi:10.1007/s10726-011-9233-y
    19. 19. Link Resolver • Parses the citation elements from the source OpenURL • Tests those elements against a library’s knowledge base • Identifies targets based on test results • Creates and offers links based on linking logic
    20. 20. OpenURL and Citation Data
        Ferreira, Antonio, Pedro Antunes, and Valeria Herskovic. 2011. "Improving Group Attention: An Experiment with Synchronous Brainstorming". Group Decision and Negotiation. 20 (5): 643-666.
        url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx
        &rft_val_fmt=info:ofi/fmt:kev:mtx:journal
        &rft.atitle=Improving%20Group%20Attention%3A%20An%20Experiment%20with%20Synchronous%20Brainstorming
        &rft.aufirst=Antonio
        &rft.aulast=Ferreira
        &rft.spage=643
        &rft.epage=666
        &rft.genre=article
        &rft.issn=0926-2644
        &rft.issue=5
        &rft.jtitle=GROUP%20DECISION%20AND%20NEGOTIATION
        &rft.pages=643-666
        &rft.part=SI
        &rft.stitle=GROUP%20DECIS%20NEGOT
        &rft.volume=20
        &rfr_id=info:sid/
        &rft_id=info:doi/10.1007%2Fs10726-011-9233-y
    21. 21. Why Use a Link Resolver? • Navigating library systems is time consuming Regent, D., et al. 1999. "Place actuelle de l'imagerie radiologique dans l'exploration des MICI". Acta Endoscopica. 29 (3): 189-202.
    22. 22. Why Use a Link Resolver? • Gets the User to the Appropriate Copy Ryerson, D. (January 01, 2010). Postgay drama: sexuality, narration and history in the plays of Mark Ravenhill. Textual Practice, 24, 5, 863-882.
    23. 23. Why Use a Link Resolver? • Provides alternative services if full-text not licensed Saucier, D., and N. Gaudette. 2000. "Actual Memory Ability Significantly Predicts Self-Evaluations of Memory". Expert Evidence. 8: 3-14.
    24. 24. Library Discovery Service A search interface to pre-indexed metadata and/or full text documents made available by a library. – Simple Search – Comprehensive (Good Starting Point) – Fast Response Time – Can Include Local Collections in Addition to Licensed Resources – Supports “Get” as well as “Find”
    25. 25. Library Discovery Service 13M eBooks 30M digital items Google Books Hathi Trust OAIster 681M article citations OCLC WorldCat 225M books
    26. 26. Brooke Starts Her Research
    27. 27. Brooke Continues Her Research in Google
    28. 28. One-Stop Shop
    29. 29. Library Discovery Metadata • Typically comes from many sources • Must be mapped to an underlying set of data elements in order to be indexed • Data element set must be rich enough to provide robust search • Data must be accurate!!
    30. 30. The Most Common Fail
    31. 31. You Can’t Rewrite History
        1989- : American Journal of Reproductive Immunology, ISSN 1046-7408
        1985-1988: American Journal of Reproductive Immunology and Microbiology, ISSN 8755-8920
        1980-1984: American Journal of Reproductive Immunology, ISSN 0271-6352
        American Journal of Reproductive Immunology, ISSN 1046-7408, 1980-
    32. 32. When Standards Collide
    33. 33. Summary • Libraries use more than just MARC records in providing access to publisher content • Libraries use more than just the library catalog in providing access to publisher content • Metadata created by publishers is distributed to various systems, not just to libraries • Any source that supports OpenURL can potentially provide access to publisher content • Metadata accuracy is about more than correct transcription
    34. 34. Support for the Publisher KBART (Knowledge Bases and Related Tools) • UKSG & NISO joint initiative • Recommends best practices for formatting and distributing title lists • Phase 1 (2010) focused on eJournals • Phase 2 (in process) includes eBooks and Conference Proceedings • Phase 2 also addressing other issues such as Open Access content and consortial access
    35. 35. Publisher Involvement 1. Everything can be found at 2. Review the requirements (data samples available) 3. Format your title lists accordingly 4. Self-check to ensure they conform to recommended practice 5. Ensure that you have a process in place for regular data updates 6. Register your organization on the KBART registry website:
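A self-check of a title list (step 4) might start with something like the sketch below. It is an assumption-laden illustration, not the KBART validation procedure: it covers journals only, reads a tab-delimited list, checks just three KBART column names (`publication_title`, `print_identifier`, `online_identifier`), and the function names `issn_is_valid` and `check_title_list` are hypothetical. The ISSN test is the standard modulus-11 check digit calculation.

```python
import csv
import io
from typing import List


def issn_is_valid(issn: str) -> bool:
    """Verify the ISSN check digit (weights 8..2, modulus 11, 'X' for 10)."""
    digits = issn.replace("-", "").upper()
    if len(digits) != 8 or not digits[:7].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def check_title_list(tsv_text: str) -> List[str]:
    """Return a list of problems found in a tab-delimited journal title list."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Data rows start on line 2, after the header row.
    for line_no, row in enumerate(reader, start=2):
        if not (row.get("publication_title") or "").strip():
            problems.append(f"line {line_no}: missing publication_title")
        for field in ("print_identifier", "online_identifier"):
            value = (row.get(field) or "").strip()
            if value and not issn_is_valid(value):
                problems.append(f"line {line_no}: bad ISSN in {field}: {value}")
    return problems


SAMPLE = (
    "publication_title\tprint_identifier\tonline_identifier\n"
    "Group decision and negotiation\t0926-2644\t1572-9907\n"
    "Group decision and negotiation\t0926-2645\t1572-9907\n"
)

for problem in check_title_list(SAMPLE):
    print(problem)
```

A check like this catches transcription slips before the list is distributed, but it cannot tell whether a valid ISSN is attached to the right title or date range; that still requires review against the publication history.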
    36. 36. Serials Solutions KnowledgeWorks • Includes a certification program for content providers Project Transfer • UKSG Code of Practice • Helps publishers ensure that journal content remains easily accessible by librarians and readers when there is a transfer between parties, and ensure that the transfer process occurs with minimum disruption • Addresses responsibilities for both transferring and receiving publisher • Includes guidelines addressing title access, digital files, subscription lists, URL changes, communication and DOI ownership PIE-J (Presentation and Identification of E-Journals) • Primarily discusses presentation but also has recommendations regarding metadata
    37. 37. MARC Metadata MARC Record Guide for Monograph Aggregator Vendors • Developed by the Program for Cooperative Cataloging • Provides detailed MARC data specifications for eBooks in packages and aggregations MarcEdit • An open source MARC editing utility developed by Terry Reese • Easy-to-use tool that can crosswalk data between MARC and other formats
    38. 38. Final Word Library catalogers can’t do it all!
    39. 39. And finally.... Downloaded under a Creative Commons license