Inside Out and Outside In: A Holistic Approach to Metadata Assessment for an Off-site Storage Collection


With demands on library space increasing while research collections continue to grow, off-site storage is becoming a reality for academic research libraries across North America. In 2005, the University of Toronto Libraries (UTL) established a high-density storage and preservation facility to preserve and maintain print serials and low-use monographic resources. Aptly named Keep@Downsview, the facility now holds over 3 million volumes and has evolved into a collaborative partnership with four other Ontario universities. The continued expansion of our off-site collections presents unique challenges for UTL in facilitating resource discovery and access. Since library users cannot physically browse the Keep@Downsview collections, the only way to discover these resources is through the metadata contained in the library’s discovery systems. To ensure that these resources remain accessible to the scholarly community, the metadata for these collections must be optimized for search and discovery.

With the goal of improving access to our off-site collections, we investigated the state of the metadata for the print serial collection held at the Keep@Downsview facility. In this assessment, we analyzed metadata elements against the following metrics: completeness, accuracy, consistency and coherence, conformance to expectations, timeliness, and accessibility. Additionally, we conducted a qualitative investigation into how our metadata is perceived by researchers, librarians, and our Keep@Downsview partners. By approaching metadata assessment from both an ‘Inside-Out’ and an ‘Outside-In’ perspective, we aimed to obtain a holistic view of the quality and effectiveness of our metadata and to explore strategies for improving the discovery of our remotely held collections.

  1. Inside Out and Outside In: A Holistic Approach to Metadata Assessment for an Off-site Storage Collection
    Marlene van Ballegooie and Juliya Borie, University of Toronto Libraries
    NASIG Conference 2019
  2. Agenda
    ● Our Context
    ● Why metadata assessment for off-site storage?
    ● Study Methodology
    ● Assessment Metrics
    ● Observations
    ● Next Steps
  3. Our Context
    ● University of Toronto - Canada’s largest university
      ○ 70,890 undergraduate students
      ○ 19,187 graduate students
    ● Decentralized library system with 44 libraries on three campuses
    ● Over 13 million physical items and approximately 2.8 million electronic resources
  4. What is Downsview?
    ● A purpose-built, high-density preservation facility designed to provide a secure, environmentally controlled space that is optimal for long-term preservation
    ● Built in 2005
    ● Approximately 3 million volumes, with capacity for further expansion to five million items
  5. About the Downsview Collection
    Current collection composition:
      ○ 2,287,810 monographs
      ○ 533,632 serial volumes (23,834 titles)
      ○ 102,467 other (including music, audio, video, maps, etc.)
    Collection grew organically out of space needs:
      ○ Duplicate monographs
      ○ Low-use resources from Robarts stacks
      ○ JSTOR journals and other cancelled journal runs
      ○ Pre-1923 holdings digitized by Internet Archive
  6. Keep@Downsview as a Partnership
    ● Keep@Downsview is a shared print initiative between five Ontario universities
    ● At the Downsview facility, partners can retain low-demand materials at lower cost, without crowding storage spaces in the region with multiple copies of lesser-used items
    ● Partner libraries retain ownership of resources regardless of whether or not a physical transfer actually takes place
  7. Why Metadata Assessment for Off-Site Storage?
  8. Challenges for Remote Collections
    ● No open shelves or public access to collections
    ● No ability to physically browse collections
    ● Difficult to replicate serendipitous discovery in an online environment
    ● Total reliance on metadata for discovery
  9. Improve Services
    ● Better metadata = improved discoverability = better service
    ● Recognition among staff that Downsview services could be better
    ● High-quality services require high-quality metadata
    ● Metadata cleanup is an investment and will enable future possibilities
  10. Facilitate Comparison Across Collections
    ● As Downsview becomes a shared off-site storage facility, there is an increasing need to compare collections across libraries
    ● For the Keep@Downsview Partnership
      ○ Quality metadata is critical to meeting project goals
    ● For national and international print preservation initiatives
      ○ Canadian Collective Print Preservation Strategy
      ○ HathiTrust
  11. Upcoming Plans for System Migration
    ● UTL is planning a transition to a new library services platform (LSP)
    ● The prospective system migration is prompting a review of all metadata
    ● Opportunity to clean up and address legacy data issues
    ● Necessity to rethink established workflows
  12. Methodology
  13. Methodology
    Utilize multiple techniques to obtain different views of our serials metadata:
    ● Local records vs. community-managed records
    ● Local records vs. CONSER records
    ● Perceptions of staff and library partners
    ● Perceptions of users
    Different perspectives provide a holistic view of the quality and effectiveness of our serials metadata
  14. Local Records vs. Community-Managed Records
    ● Downsview collection contains 23,560 print serial records
    ● Matched 10,161 to OCLC master records on key identifiers such as ISSN and OCLC number
    ● Combined dataset was loaded into MySQL for analysis
    ● SQL queries to determine convergence/divergence between local data elements and OCLC records
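The matching and comparison steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the queries used in the study: Python's built-in sqlite3 stands in for MySQL, and the two-table schema, column names, and sample rows are hypothetical. A local record is paired with an OCLC record when either the OCLC number or the ISSN agrees, and one element (Date 2) is then compared across each pair.

```python
import sqlite3

# Hypothetical, simplified schema: one row per record, with the identifiers used for matching.
# sqlite3 (Python standard library) stands in for the MySQL database used in the study.
SCHEMA = """
CREATE TABLE local_records (local_id TEXT, issn TEXT, oclc_no TEXT, date2 TEXT);
CREATE TABLE oclc_records  (oclc_no TEXT, issn TEXT, date2 TEXT);
"""

# Pair local and OCLC records when either key identifier agrees, then compare one element.
COMPARE_DATE2 = """
SELECT l.local_id, o.oclc_no, l.date2 = o.date2 AS date2_agrees
FROM local_records AS l
JOIN oclc_records AS o
  ON l.oclc_no = o.oclc_no OR l.issn = o.issn
ORDER BY l.local_id
"""

# Local records with no OCLC counterpart (NULL identifiers never compare equal in SQL).
COUNT_UNMATCHED = """
SELECT COUNT(*) FROM local_records AS l
WHERE NOT EXISTS (
  SELECT 1 FROM oclc_records AS o
  WHERE o.oclc_no = l.oclc_no OR o.issn = l.issn
)
"""

def build_demo_db():
    """Load a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO local_records VALUES (?, ?, ?, ?)", [
        ("L1", "0000-0001", "111", "9999"),  # matches on OCLC number; Date 2 agrees
        ("L2", "0000-0002", None, "9999"),   # matches on ISSN only; Date 2 diverges
        ("L3", None, None, "9999"),          # no identifiers, so it cannot be matched
    ])
    conn.executemany("INSERT INTO oclc_records VALUES (?, ?, ?)", [
        ("111", "0000-0001", "9999"),
        ("222", "0000-0002", "2010"),
    ])
    return conn

conn = build_demo_db()
print(conn.execute(COMPARE_DATE2).fetchall())       # [('L1', '111', 1), ('L2', '222', 0)]
print(conn.execute(COUNT_UNMATCHED).fetchone()[0])  # 1
```

Because the join accepts a match on either identifier, one local record can pair with several OCLC records (for example, when the same ISSN appears on more than one master record), so a real analysis would also need a rule for resolving multiple matches.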
  15. Local Records vs. CONSER Standard
    ● Representative sample of 400 journal records that could not be matched with OCLC
    ● Assessed the quality of our records against CONSER standard records
    ● Assessed holdings records
    ● Focus on foreign-language titles, as 40% of the collection is in languages other than English
  16. Perceptions of Staff and Library Partners
    Aimed to gain insight into metadata quality through the eyes of staff members engaged in user services and Keep@Downsview partners:
    ● Conducted three rounds of focus groups with user services librarians
    ● Survey sent to Keep@Downsview partners
  17. Perception of Library Users
    Aimed to gain insight into the library user's experience in navigating serials metadata:
    ● Survey sent to faculty and graduate students in three departments
    ● Conducted a focus group with graduate students
  18. Assessment Metrics
  19. Bruce and Hillmann on Metadata Quality
    ● Define general characteristics of metadata quality
    ● Quality measurements and metrics
      ○ Completeness
      ○ Accuracy
      ○ Provenance
      ○ Conformance to expectations
      ○ Logical consistency and coherence
      ○ Timeliness
      ○ Accessibility
  20. Completeness
  21. Completeness
    According to Bruce and Hillmann: "The element set used should describe the target objects as completely as economically feasible...The element set should be applied to the target object population as completely as possible."
  22. Does the element set completely describe the objects?
    Serials are complex constructs that combine whole/part relationships and aggregation relationships:
    ● they have a whole/part relationship to individual issues published over time
    ● each individual issue is an aggregate of articles (IFLA LRM)
  23. Data Elements Participants Use to Identify a Serial
    ● Title
    ● Place of publication
    ● Previous and subsequent title
    ● Holdings statements
    ● ISSN
    ● Editor (in cases of prominent editors)
    ● Series
    ● Ceased date
    ● Author/issuing body
    ● Subject headings (sometimes)
    Image: "Forest" by Jean Jullien, Flickr
  24. CONSER Levels of Description
    ● Full level records contain a full complement of elements that are applicable to the serial, and all elements contained are fully authoritative.
    ● Core level records contain those elements essential to the description and access of the serial, and all elements contained are fully authoritative.
    ● Minimal level records contain the essential (i.e., core) elements for description, but subject elements may not be present and one or more headings may not be authoritative.
    [Chart: Quality of UTL records according to CONSER Levels of Description]
  25. Faculty see title changes as part of the same journal family
    From faculty: "If I am not mistaken, records are usually separate for each title, so it may take some investigation efforts to ensure this is the same journal."
    "Sometimes it looked like I found the correct record for the succeeding title, but that record had little information to ensure it was the correct one (e.g., ISSN, place of publication or similar, perhaps also an indication that the previous title was such and such) - so can't always be sure it is the correct one."
  26. Accuracy
  27. Accuracy
    According to Bruce and Hillmann: "Metadata should be accurate in the way it describes objects...Minimally, the information provided in the values needs to be correct and factual."
    • Elimination of typographical errors
    • Use of standard abbreviations
    • Conformance to standard expressions of personal names and place names
  28. Use of Standard Abbreviations
    In the local record/OCLC comparison, standard abbreviations for journal titles were analyzed:
    • 4,647 journal title abbreviations in OCLC records
    • 2,308 local records had matching abbreviations
    • 2,213 local records did not contain a journal title abbreviation
    • 126 local records had abbreviations that did not match OCLC
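The figures above amount to classifying each matched pair of records, and the three local categories sum to the OCLC total: 2,308 + 2,213 + 126 = 4,647. A minimal sketch of that classification follows. It assumes the abbreviation has already been extracted from each record (for example from the MARC 210 Abbreviated Title field); the normalization rule and the sample abbreviations are hypothetical.

```python
from collections import Counter

def normalize(abbr):
    """Compare abbreviations case-insensitively and ignore differences in spacing."""
    return " ".join(abbr.split()).casefold()

def classify(local_abbr, oclc_abbr):
    """Classify one matched local/OCLC pair by its journal title abbreviation."""
    if oclc_abbr is None:
        return "no OCLC abbreviation"   # outside the comparison described above
    if local_abbr is None:
        return "missing locally"
    if normalize(local_abbr) == normalize(oclc_abbr):
        return "match"
    return "mismatch"

# Made-up (local abbreviation, OCLC abbreviation) pairs for illustration.
pairs = [
    ("J. Ex. Stud.", "J. ex. stud."),   # same apart from capitalization
    (None, "Ex. Rev."),                 # local record lacks an abbreviation
    ("Ex. Rep.", "Ex. Reports"),        # genuine disagreement
    (None, None),                       # neither record has one
]
tally = Counter(classify(local, oclc) for local, oclc in pairs)
print(tally)
```

Whether capitalization differences should count as a match is a policy choice; a stricter comparison would simply drop the `casefold()` call.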
  29. Are controlled vocabularies updated when relevant?
    20% of records in the sample used an unauthorized form (a cross-reference from the authority record) as the main heading:
      ○ American Society for Testing and Materials$bCommittee E-9 on Fatigue instead of ASTM Committee E-9 on Fatigue
    Earlier established forms of name used:
      ○ Wales. National Library, Aberystwyth instead of the current heading National Library of Wales
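One way to surface headings like these is to compare each heading against a lookup that maps variant or superseded forms to the current authorized form. The sketch below hard-codes the two examples from the slide as that lookup; in practice the table would be generated from authority records, and the function name is hypothetical.

```python
# Hypothetical lookup: variant or superseded heading -> current authorized heading.
# The two entries are the examples above; a real table would come from authority records.
AUTHORIZED_FORM = {
    "American Society for Testing and Materials$bCommittee E-9 on Fatigue":
        "ASTM Committee E-9 on Fatigue",
    "Wales. National Library, Aberystwyth": "National Library of Wales",
}

def flag_outdated_headings(headings):
    """Return (heading used, authorized heading) for each heading that is not the authorized form."""
    return [(h, AUTHORIZED_FORM[h]) for h in headings if h in AUTHORIZED_FORM]

sample = ["National Library of Wales", "Wales. National Library, Aberystwyth"]
for used, authorized in flag_outdated_headings(sample):
    print(f"{used!r} should be {authorized!r}")
```

Exact string lookup is brittle, because subfield punctuation and trailing periods vary between records, so some normalization before the lookup would likely be needed.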
  30. Foreign Language Materials
    From a Downsview partner: "Diacritics are always an issue, especially with older records. If matching on ISBN or ISSN it shouldn't be an issue too much though."
    From a researcher: "In my field, it’s important to have parallel fields in non-Latin scripts because I think most students in our Department even are not aware of Library of Congress transliteration. Otherwise, there are many other ways to transliterate, for example, for Cyrillic titles, there are National tables of transliteration. So, when I search, it’s good to search in the original language and not to think what transliteration they used to transcribe."
  31. Differences in Transliteration can affect discoverability
    From a faculty member: "Transliteration, spelling are a challenge, so it is important to be attentive."
    Sample problem:
    UTL record: $6880-01$aTaipingyang dao guo yan jiu =$bResearch on Pacific Island countries /$cChen Dezheng zhu bian; Lü Guixia, Qu Shengfu zhu bian
    OCLC record: $6880-01$aTai ping yang dao guo yan jiu =$bResearch on Pacific Island countries /$cChen Dezheng zhu bian ; Lü Guixia, Qu Shengfu zhu bian
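When comparing records, differences like these (word division, and potentially diacritics) can be neutralized with a normalized match key. The sketch below is an assumption rather than a method from the study: it removes MARC subfield codes, strips diacritics through Unicode decomposition, and drops spaces and punctuation. A key like this is only suitable for flagging candidate matches for review, because it would also conflate genuinely different titles, and it does nothing to supply the non-Latin parallel fields the researcher asks for.

```python
import re
import unicodedata

SUBFIELD_CODE = re.compile(r"\$[0-9a-z]")   # e.g. "$a", "$b", "$6"

def match_key(field):
    """Build a loose comparison key for a transliterated title field (illustrative only)."""
    text = SUBFIELD_CODE.sub(" ", field)
    # Decompose accented letters (NFD) and drop the combining marks, so "Lü" -> "Lu".
    text = unicodedata.normalize("NFD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Drop spaces and punctuation so "Taipingyang" and "Tai ping yang" agree.
    return "".join(ch for ch in text.casefold() if ch.isalnum())

utl = ("$6880-01$aTaipingyang dao guo yan jiu =$bResearch on Pacific Island countries /"
       "$cChen Dezheng zhu bian; Lü Guixia, Qu Shengfu zhu bian")
oclc = ("$6880-01$aTai ping yang dao guo yan jiu =$bResearch on Pacific Island countries /"
        "$cChen Dezheng zhu bian ; Lü Guixia, Qu Shengfu zhu bian")
print(utl == oclc, match_key(utl) == match_key(oclc))   # False True
```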
  32. Consistency and Coherence
  33. Consistency and Coherence
    According to Bruce and Hillmann: "Elements are conceived in a way that is consistent with standard definitions and concepts used in the subject or related domains and presented to the user in consistent ways."
    ● Use of standard data structures (e.g., MARC)
    ● Ability to search collections of similar objects using similar criteria
  34. Sparse Records
    From a user services librarian: "Sometimes it’s a lot of information, and sometimes it’s nothing. I mean sometimes it’s so sparse that I’m like, how am I even supposed to figure out what this is, there’s no information. So the contrast is a little stark sometimes."
  35. Successive vs. Latest Entry
    "Title changes were on the same record, but now that the volumes are in Downsview they're on separate records. Having different titles on one record made it very confusing to match volumes, and to see what, if anything, was already in Downsview."
  36. Variance in Use of Parallel Titles
    OCLC record: $aAdministrative law reports.$nFourth series =$bRecueil de jurisprudence en droit administratif. Quatrième série
    UTL record: $aAdministrative law reports.$nFourth series
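A check for this kind of divergence can be sketched by extracting the parallel title from each title field and flagging pairs where only the OCLC record has one. The sketch assumes the punctuation convention visible in the OCLC example, where a parallel title follows " =" and is coded in subfield $b; the function names are hypothetical.

```python
import re

# A parallel title: "=" followed by subfield $b, up to the next subfield delimiter.
PARALLEL_TITLE = re.compile(r"=\s*\$b([^$]*)")

def parallel_title(field):
    """Return the parallel title in a title field, or None if there is none."""
    match = PARALLEL_TITLE.search(field)
    return match.group(1).strip() if match else None

def lacks_parallel_title(local_field, oclc_field):
    """True when the OCLC record has a parallel title that the local record omits."""
    return parallel_title(oclc_field) is not None and parallel_title(local_field) is None

oclc = ("$aAdministrative law reports.$nFourth series =$bRecueil de jurisprudence "
        "en droit administratif. Quatrième série")
local = "$aAdministrative law reports.$nFourth series"
print(parallel_title(oclc))
print(lacks_parallel_title(local, oclc))   # True
```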
  37. Is data in elements consistent throughout?
    From a user services librarian: "I find it frustrating … that the volumes are described inconsistently. But I know that that comes from ... someone entering the information, and, maybe we don’t really have a clear or communicated practice, or criteria, for how individual items are reflected through the creation of individual [item records]."
    "With Downsview, the Keep@Downsview with other institutions adding stuff in, and also with places like OISE and the Inforum [UTL libraries that use Dewey Decimal classification] ... the call numbers don’t always match so therefore things are not in order anymore. So it makes it harder to find."
  38. Is data in elements consistent throughout?
    From a user services librarian: "It’s sometimes confusing because in some records the electronic is mixed in with the print and in others they are pulled apart as separate entities. I think consistency around that would be very useful."
  39. Conformance to Expectations
  40. Conformance to Expectations
    According to Bruce and Hillmann: "Element sets...contain those elements that the community would reasonably expect to find." "It is important that community expectations be solicited, considered, and managed realistically."
  41. Is the metadata in line with community expectations?
    Librarian and researcher expectations of how serial title changes should be presented most effectively
    From a user services librarian: "I wish there was a way that you can ... bring them all together … every serial or every volume or every edition that with all the title variations, it can be seen in one complete place. I would love to see that. Wouldn’t it be great to be able to just flip a switch and see it all come together and then send it back out to its [original records]".
  42. Is the metadata in line with community expectations?
    Researchers expected serial metadata to mimic the combined display that is common on many vendor websites:
    "[I prefer] one display that guides through title changes in the journal’s history. If the metadata allows for such accommodation."
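One way to approximate the combined display described above without merging records is to chain successive-entry records through their preceding and succeeding title links (MARC 780 and 785, whose $w subfield can carry the linked record's control number). The sketch below assumes those links have already been resolved to local record identifiers; the record structure and sample titles are hypothetical.

```python
def title_history(records, start_id):
    """Return a serial's titles, earliest first, by following preceding/succeeding links.

    `records` maps a record id to {"title": ..., "preceding": id or None, "succeeding": id or None}.
    Links to ids that are not in `records` are treated as dead ends.
    """
    # Walk back to the earliest title, stopping on missing or circular links.
    current, seen = start_id, {start_id}
    while True:
        previous = records[current].get("preceding")
        if previous not in records or previous in seen:
            break
        current = previous
        seen.add(current)
    # Walk forward, collecting titles in publication order.
    titles, seen = [], set()
    while current in records and current not in seen:
        seen.add(current)
        titles.append(records[current]["title"])
        current = records[current].get("succeeding")
    return titles

# Hypothetical successive-entry records for one journal family.
records = {
    "r1": {"title": "Journal of Examples", "preceding": None, "succeeding": "r2"},
    "r2": {"title": "Journal of Examples. New series", "preceding": "r1", "succeeding": "r3"},
    "r3": {"title": "Examples Quarterly", "preceding": "r2", "succeeding": None},
}
print(" -> ".join(title_history(records, "r2")))
```

Starting from any record in the family yields the same ordered history, which is the behaviour a "bring it all together" display would need; the separate records stay intact underneath.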
  43. Metadata Inconsistencies Between Systems
    Lack of metadata consistency between local catalogues and WorldCat was baffling to researchers:
    "I’ve also had the experience of searching something in RACER, and RACER didn’t show up ... but on Worldcat it came up, which is confusing for me. How is it possible for something to be on Worldcat and be part of what we should be able to find in RACER, but it was only after finding it in Worldcat and then copying and pasting words in RACER that it showed up in RACER."
  44. Collaborative Catalogues
    Researchers expressed interest in contributing their knowledge of a serial resource to the library's catalogue records.
    Can we balance quality control with user expectations for a more open and collaborative catalogue?
    "Is there a possibility of people – kind of like in a Wikipedia style – adding notes to the journal?"
  45. Timeliness
  46. Timeliness
    Bruce and Hillmann describe two different aspects of metadata timeliness: currency and lag.
    • Currency - "When the target object changes but the metadata does not"
    • Lag - "When the target object is disseminated before some or all metadata is knowable or available."
  47. Is the metadata regularly updated as the resources change?
    Examples of problems with currency:
    ● Control field 008/06 – Type of date/Publication status
      ○ 606 local records coded as 'c' (Continuing resource currently published) where OCLC records code the same position as 'd' (Continuing resource ceased publication)
    ● Control field 008/07-14 – Date 1 and Date 2
      ○ 1,329 local records have Date 2 set to 9999 while OCLC records have a terminal date
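Both checks can be run directly on the fixed-length 008 field, where (for continuing resources) position 06 holds the type of date/publication status, positions 07-10 Date 1, and positions 11-14 Date 2. The following sketch compares a local and an OCLC 008 for the two currency problems listed above; the function names and sample field values are made up for illustration.

```python
def dates_from_008(field_008):
    """Pull publication status and dates from a MARC 008 field (positions 06, 07-10, 11-14)."""
    if len(field_008) < 15:
        raise ValueError("008 field too short")
    return {"status": field_008[6], "date1": field_008[7:11], "date2": field_008[11:15]}

def currency_problems(local_008, oclc_008):
    """List the ways a local 008 lags behind the matching OCLC 008."""
    local, oclc = dates_from_008(local_008), dates_from_008(oclc_008)
    problems = []
    if local["status"] == "c" and oclc["status"] == "d":
        problems.append("status: local 'c' (currently published) vs OCLC 'd' (ceased)")
    if local["date2"] == "9999" and oclc["date2"] != "9999":
        problems.append(f"Date 2: local 9999 (open) vs OCLC terminal date {oclc['date2']}")
    return problems

# Made-up 40-character 008 values: date entered, status, Date 1, Date 2, then padding.
local_008 = "850101c19509999" + " " * 25
oclc_008 = "850101d19502008" + " " * 25
for problem in currency_problems(local_008, oclc_008):
    print(problem)
```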
  48. Obsolete MARC Coding Practices
    Records were not updated as MARC coding rules changed:
    ● Obsolete or incorrect language codes in fixed fields
      ○ Croatian – use of 'scr' instead of current code 'hrv'
      ○ Serbian – use of 'scc' instead of current code 'srp'
      ○ Some codes were simply incorrect – 'cro' and 'ser'
    ● Obsolete practice of recording multiple language codes in a single 041 subfield
      ○ $afregerita instead of $afre$ager$aita
      ○ $aengger$bczerumrus instead of $aeng$ager$bcze$brum$brus
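Both fixes are mechanical enough to script, though not blindly. The sketch below splits a concatenated 041 subfield into three-letter codes, repeats the subfield code for each, and swaps in the current codes named on the slide. The incorrect codes 'cro' and 'ser' are only flagged, since the right replacement should be confirmed against the item. The function names are hypothetical; the same lookup could also be applied to the language code in 008/35-37.

```python
# From the slide: obsolete MARC language codes and their current replacements.
OBSOLETE_CODES = {"scr": "hrv", "scc": "srp"}
# From the slide: codes that were simply incorrect; flagged for manual review, not replaced.
INCORRECT_CODES = {"cro", "ser"}

def split_codes(value):
    """Split a concatenated value such as 'fregerita' into ['fre', 'ger', 'ita']."""
    if len(value) % 3:
        raise ValueError(f"not a run of three-letter codes: {value!r}")
    return [value[i:i + 3] for i in range(0, len(value), 3)]

def rebuild_041(subfields):
    """Rewrite (code, value) subfield pairs with one language code per subfield."""
    parts = []
    for code, value in subfields:
        for lang in split_codes(value):
            parts.append(f"${code}{OBSOLETE_CODES.get(lang, lang)}")
    return "".join(parts)

def codes_to_review(subfields):
    """Return language codes known to be incorrect, in the order they appear."""
    return [lang for _, value in subfields for lang in split_codes(value) if lang in INCORRECT_CODES]

print(rebuild_041([("a", "fregerita")]))                   # $afre$ager$aita
print(rebuild_041([("a", "engger"), ("b", "czerumrus")]))  # $aeng$ager$bcze$brum$brus
```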
  49. Changes in Publisher Not Updated
    UTL record:
      =245 00$aInternational journal.
      =260 01$aToronto,$bCanadian Institute of International Affairs.
    OCLC record:
      =245 10$aInternational journal.
      =246 1$iVolumes for <spring 2005-> also have title:$aIJ
      =260 $a[Toronto] :$bCanadian Institute of International Affairs,$c[1946?]-
      =260 3$3<Winter 2008/09-> :$aToronto :$bCanadian International Council
      =264 31$3<Mar. 2014-> :$aLondon ;$aThousand Oaks, CA :$bSage Publications
  50. Outdated Links
    ● Evidence of "link rot" in some local records
    ● Target webpage no longer exists or no longer represents the original resource
    =245 04$aThe lichenologist.
    =856 41$zAlso available online:$3Table of contents and abstracts $u$2http
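A first pass at detecting link rot can be automated: pull each URL from subfield $u of the 856 field, reject values that are not well-formed web addresses (like the empty $u above), and request the rest. This sketch uses only Python's standard library; the `opener` parameter is a hypothetical hook so the function can be exercised without network access. A successful response does not show that the page still represents the original resource, so those results still need human review, and some servers reject HEAD requests.

```python
import re
import urllib.error
import urllib.parse
import urllib.request

def urls_from_856(field):
    """Return the value of every $u subfield in an 856 field (possibly empty strings)."""
    return [value.strip() for value in re.findall(r"\$u([^$]*)", field)]

def link_status(url, opener=urllib.request.urlopen, timeout=10):
    """Classify a URL as 'malformed', 'ok', 'broken' (HTTP error), or 'unreachable'."""
    parts = urllib.parse.urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return "malformed"
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return "ok" if response.status < 400 else "broken"
    except urllib.error.HTTPError:  # 4xx/5xx responses; caught before its parent classes
        return "broken"
    except OSError:                 # URLError, timeouts, refused or reset connections
        return "unreachable"

field = "=856 41$zAlso available online:$3Table of contents and abstracts $u$2http"
for url in urls_from_856(field):
    print(repr(url), link_status(url))   # '' malformed
```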
  51. Accessibility
  52. Accessibility
    According to Bruce and Hillmann: "Metadata that cannot be read or understood by users has no value." "There is a need to offer different views or arrangements of metadata to meet the expectations and needs of diverse audiences."
  53. Holdings are confusing
    "The main thing I have with holdings is that there’s too much, they are hard to follow and they are not accurate."
    "It's just so fiddly, you know … I'd rather see a long list and browse numerically than to see the ranges … provided that it's in order."
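The second librarian's preference, a long ordered list rather than ranges, can be generated from a summary statement. The sketch below handles only a hypothetical simplified form such as "v.1-3, 5, 8-9"; real holdings statements (with issues, years, gaps, and notes) are far messier, so anything it cannot parse raises an error for manual review rather than being guessed at.

```python
import re

# One part of a simplified statement: "v.4", "4", "v.1-3", or "1-3" (hypothetical syntax).
PART = re.compile(r"(?:v\.)?\s*(\d+)(?:\s*-\s*(\d+))?")

def expand_holdings(statement):
    """Expand a simplified volume statement into a sorted list of volume numbers."""
    volumes = set()
    for part in statement.split(","):
        match = PART.fullmatch(part.strip())
        if not match:
            raise ValueError(f"cannot parse holdings part: {part.strip()!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        if end < start:
            raise ValueError(f"range runs backwards: {part.strip()!r}")
        volumes.update(range(start, end + 1))
    return sorted(volumes)

print(expand_holdings("v.1-3, 5, 8-9"))   # [1, 2, 3, 5, 8, 9]
print(expand_holdings("v.10-12, v.4"))    # [4, 10, 11, 12]
```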
  54. Holdings are confusing
    "I think it’s challenging for users that it will say Robarts, but then it’s at Downsview ... Sometimes the holding statements are hard to read in the summary. There’s a lot of stuff there. So then you’re trying to go through all the stuff that’s in Downsview to see if we have the one. But the lists are insanely long. [Then] you go in the classic catalogue and your mind spins…"
  55. Displays are confusing
    "In this record ... all the descriptive information is below the holdings ... [T]hat’s not great either because you don’t know it’s there. But I feel from a user experience perspective if something appears right at the top, this is crucial information. And I’m not sure if 25 centimeters [is something] users need to know now."
  56. Publication Information vs. Holdings
    "The other thing that’s confusing for users though is that they often look at the publication date range at the top of the record and they get confused because they think that’s our holdings."
  57. Observations
  58. Serials Metadata is Dynamic
    ● Keeping up with serials metadata changes is challenging
    ● Fluidity of serials requires systematic review of metadata
    ● Standards change - it is difficult to go back and make changes to all our records
  59. Indexing is important
    Prior to sending serials to off-site storage, it is helpful to know if and where they are indexed.
    Fewer than 20% of the sampled journals were indexed (13% with full indexing and 5.5% with partial indexing).
    From a user services librarian: "A lot of journals we’ve sent offsite are not indexed anywhere. So there is no way to request an article, because it’s not indexed ... Or we sent the indexes together with the journal, that happens."
  60. Metadata and Systems are Intertwined
    Even if the metadata conforms to quality metrics, users perceive it through a system interface. System interface design can significantly impact discovery.
    "One of the things that I think would be very, very helpful if the links to the previous and the subsequent [titles] actually linked to the record for the previous and the subsequent. Because what happens is you click on the link thinking it’s going to take you to that record and it just takes you to the same record that you are in."
  61. Content Over Format
    "I don’t care what the format of the periodical is, whether it is in print or in microfilm or whether it’s in the library or at Downsview … But if I do a journal title search and it’s a microform then it won’t come up and that’s not ideal."
  62. Flexible Metadata
    ● Successive entry vs. latest entry - can it be both?
    ● Rethinking the notion of the "record"
    ● Is the MARC data structure appropriate for dynamic resources?
    ● BIBFRAME has potential as a more appropriate data structure for serials.
    "You have the metadata separate, but you have the system bring it all together. Sounds like a dream!"
  63. Workflow Affects Discoverability
    ● Decentralized system = decentralized workflow = lack of consistency
    ● Cost-effectiveness of addressing metadata issues before sending to off-site storage (Laskowski, 2016)
    "Those technical reports, no one will ever be able to use them. They are so complicated!… On these large-processing projects, things like this are done to save time, but it creates, what I think, is a very confusing [display]. So it’s a solution for system problem … But maybe not the ideal solution."
  64. Next Steps
  65. Improving Metadata Quality
    In consultation with campus libraries, devise a strategy for improving the discoverability of serials:
      ○ Investigate programmatic upgrades to bibliographic records (full overlays and targeted enhancements)
      ○ Clean up and reformat call numbers for listing
      ○ Investigate methods for cleaning up holdings records and improving their display in the discovery system
    Once methods for cleanup are established, expand to non-Downsview titles
  66. Building Assessment into the Process
    To ensure that metadata improvements are not short-lived, it is essential to build assessment into all metadata creation and maintenance activities.
    ● Define key points of review for serials metadata
      ○ One-time events: projects in preparation for system migration
      ○ Ongoing review: renewal/cancellation time, transition to off-site storage
    ● Incorporate user feedback to prioritize and guide cleanup efforts
  67. Advocate for Metadata Quality
    ● Technical services need to play an active role in advocating for metadata quality
    ● Since metadata is the only way to access these resources, we need to ensure that it is optimized for our users
    ● The involvement of technical services in preparing metadata for off-site storage is essential
  68. Assessment is not just measurement, it's a conversation
    Assessment is more than simply measuring against a set of quality criteria.
    It's a conversation – getting to know our metadata better:
    ● We ask questions of our metadata
    ● We ask questions of our users
    We may not always like the feedback we get... but the conversation provides guidance and direction on where we need to go to move forward.
  69. Last words...
    "I’m really hoping that this can lead to some changes. Because, you know, you get a sense that there’s a lot of problems, and we can recognize that. I think this is an area where a little bit of work can drastically improve a lot things for everyone."
    There are a lot of things that are good too. But there’s always room for improvement. Hopefully, we can work together to do that.
  70. Marlene van Ballegooie, Juliya Borie
  71. References
    Bruce, T.R., and Hillmann, D. 2004. "The Continuum of Metadata Quality: Defining, Expressing, Exploiting." In Metadata in Practice, 238–256. Chicago: ALA Editions.
    Laskowski, Mary S. 2016. "When Good Enough Is Not Good Enough: Resolving Cataloging Issues for High Density Storage." Cataloging & Classification Quarterly 54 (3): 147–158.
    Riva, P., Le Bœuf, P., and Žumer, M. 2017. "IFLA Library Reference Model: A Conceptual Model for Bibliographic Information." Revised after World-Wide Review. Consolidation Editorial Group of the IFLA FRBR Review Group, 94.