The Core Metadata Project at the University of Wisconsin-Milwaukee aimed to harmonize legacy metadata across digital collections for improved discovery and interoperability. The project involved evaluating existing metadata practices, documenting schema mappings, identifying fields for standardization, remediating collection records, and creating documentation. The end goal was to produce metadata that better conforms to external guidelines and enables records to be more shareable through channels like the Digital Public Library of America.
Core Metadata Project: Harmonizing legacy metadata for the future
NATHAN HUMPAL, CATALOG AND METADATA LIBRARIAN
ANN HANLON, DIGITAL COLLECTIONS AND INITIATIVES
UNIVERSITY OF WISCONSIN-MILWAUKEE
Core Metadata Project
Pre-planning and evaluation
◦What Dublin Core fields did we use?
◦What field names did we use?
◦What field names were mapped to what Dublin Core elements?
Digital Collections: The Libraries at UWM have been creating digital projects since 2002 - there are now fifty-four digital collections available with more than 130,000 digital objects, and more on the way.
Scaling Up: When I arrived in 2012, we were gearing up for some major additions to our digital collections, and focusing on comprehensive, or nearly so, digitization of entire collections. We had one project underway and another in the planning stages. Both of those collections – which ultimately comprised over 85,000 digital objects combined – were image-based photography collections. In the years that followed, we also worked with Archives in a major push to digitize their oral histories and selections from their WTMJ newsfilm collection; we added a collection of Yiddish Posters, a collection focused on Latino Activism at UWM, and a collection of Chinese scrolls from our Special Collections – all collections where we created bilingual metadata records; and we added two newspaper collections – the UWM Post and an underground newspaper from the late 1960s, the Kaleidoscope. So our collection building not only scaled up, but was wide-ranging in terms of format, subject matter, and audience.
Documentation: While we had some documentation for our metadata creation, it was collection-specific and didn’t take into account the myriad kinds of collections we were creating.
Our best documentation was focused on creating geographic subject headings for the images we were digitizing from our American Geographical Society Library. This was necessary given the nature of those images, as well as the difficulty of locating and assigning accurate geographic headings. But other than that, we really were just sticking to a set of controlled vocabularies for subject headings – primarily the LOC’s Thesaurus for Graphic Materials – and reusing basic metadata templates to ensure some consistency in the way we described the original repository and location. But things like date, format, type, etc. were really all over the place across collections, even if they were consistent within collections.
Discovery Layers: One major driver toward creating more consistent documentation, and harmonizing our metadata across collections, is the proliferation of alternative discovery layers for our digital collections. For years we’ve been contributing collections to Recollection Wisconsin, which has been pretty forgiving in terms of metadata consistency. Two more recent developments have prompted us to examine our metadata practices more closely: the adoption of Ex Libris’ Primo/Alma for our ILS, with the opportunity to make our digital collections discoverable in the Primo interface, and the addition of our materials to the Digital Public Library of America. Both put new weight on fields that are useful for faceting, like Type.
Organizational Structure: Finally, as our collections grew, so did our department’s goals and mission. We updated our landing pages and we’re creating additional context for our collections, in partnership with Archives, AGSL, and Special Collections. We have begun to emphasize our external outreach as well, working through our Digital Humanities Lab to integrate use of our digital collections into the classroom and to discover new ways to use those materials for research. And we’ve also started a project simply to clean up the files created over the past fifteen years in order to better organize our documentation for the purposes of digital preservation as well as to set us up for more efficient project planning and training in the future.
Platforms: We’re also keeping an eye on developments with regard to digital asset management systems, data models, linked data, image exchange protocols, and data exchange protocols, among other things. We’re currently using CONTENTdm to host our digital collections and expect to for the near-term, but we are invested in following development of the Hyku platform – an open-source platform based on a Fedora/Hydra stack that is being developed as an out-of-the-box system by DPLA, Stanford, and DuraSpace.
Problem(s) (Ann): So that’s where we’re coming from. Conceptually, I was really focused on three issues that needed attention in order for us to continue working at scale with our digital collections and to make sure we were in shape for possible future migrations and to accommodate new avenues for discovery, not to mention new staff, new students, and other unexpected twists and turns. So my concerns were (and are):
Working with a lot of legacy metadata: fifteen years of metadata and metadata experiments, all created to serve the purposes of each collection without a consistent focus on the collections as a whole, or on how that metadata functioned in external environments
Ensuring we have consistent workflows for digital collection building and for developing metadata
And updating – or in some cases, creating – adequate documentation to ensure consistency not just within collections, but across collections; and to help with training, too.
Before starting, I wanted to get a sense of the metadata across all of the collections. I wanted to answer three basic questions:
What Dublin Core fields did we use and how often?
What field names did we use and how often?
What field names were mapped to what Dublin Core elements?
Because CONTENTdm is structured as multiple collections, it doesn't offer a very good way of answering these questions within its client. Each collection is essentially siloed from the others, so cross-collection analysis isn't really an option. Instead I had to extract the data and do a little bit of massaging to get some answers.
I built a spreadsheet with columns for collection name, field name, and Dublin Core mapping. Then I created several pivot tables to get a sense of what I was working with. I was able to determine what DC elements were used in what collections. This helped in figuring out which fields were important (for instance 'Contributors') and which were probably unimportant (for instance 'Audience').
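The pivot-table analysis described above can be sketched in plain Python. This is only an illustration of the approach, not our actual script; the collection and field names here are made up, and the real work was done in a spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows from the field-mapping spreadsheet:
# (collection name, local field name, Dublin Core mapping).
rows = [
    ("Collection A", "Title",       "Title"),
    ("Collection A", "Type (DCMI)", "Type"),
    ("Collection B", "Title",       "Title"),
    ("Collection B", "Image Type",  "Type"),
    ("Collection B", "Audience",    "Audience"),
]

# Pivot 1: which collections use each DC element?
element_use = defaultdict(set)
# Pivot 2: which local field names were mapped to each DC element?
element_labels = defaultdict(set)

for collection, field_name, dc_element in rows:
    element_use[dc_element].add(collection)
    element_labels[dc_element].add(field_name)

for element in sorted(element_labels):
    print(f"{element}: used in {len(element_use[element])} collection(s) "
          f"as {sorted(element_labels[element])}")
```

A report like this surfaces exactly the two findings mentioned above: elements used in nearly every collection (important) versus elements used once or twice (probably unimportant), and DC elements that hide several different local field labels.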
We also decided to come up with some broad categories of the types of materials that a collection might have to help us organize how we treated those different material types.
What I found while looking through our past metadata was a lot of confusion between Dublin Core Type, Format, and Medium. We decided to ensure that there would be more consistency in how these elements were used and what vocabulary was used within them.
As a result I focused on organizing a lot of the subsequent documentation around DCMI Type, though we had to augment those categories with some RDA content types. We’ll get a little more into how this played out when I talk about documentation.
We wanted to stick to external guidelines as much as possible, so that we could be on the same page as other institutions.
DPLA (Recollection Wisconsin, Madison pipeline). The Recollection Wisconsin pipeline has certain requirements so that it can consistently transform data to adhere to DPLA requirements.
RDA. Looking towards RDA for guidelines for the data allows us to more closely adhere to how our physical material is being cataloged.
Dublin Core. We looked at Dublin Core’s documentation to make sure that we were meeting their expectations for what the different fields were designed for.
ContentDM. Of course, ContentDM itself creates restrictions on how we can adhere to all of the above guidelines. For instance, we can’t create particularly strong relationships within the data, especially across collections with ContentDM’s architecture. And, of course, it doesn’t allow for any linked open data content.
*Click*
That being said, we did try as hard as possible to set up an environment that would make a transition to a linked open data environment as painless as possible. We sought vocabularies that were published with LOD in mind, and kept track of the URIs for the terms we used.
Problems with maps in Primo: One of the biggest impacts that we could identify was collocating our digital maps and our physical maps in Primo, our discovery layer. In order to do that we needed a consistent way to identify maps in CONTENTdm and make sure that Primo knew about it so it could facet that material in the same way that it faceted physical maps. This was one of the big reasons that we refined DCMI types to RDA content types in some cases.
Inconsistent Type in CONTENTdm: And as mentioned earlier, we noticed a lot of inconsistency in how Type and Medium were being used. Since resolving the map issue required dealing with Type, we decided that focusing on this inconsistency was probably the most important.
For our documentation I kind of came up with a three tiered approach: General metadata guidelines about what Dublin Core terms were required, more specific requirements by type of material, and then specific application profiles for each collection. Within there, we realized that there could be a broader category before specific collections if we knew that several collections were going to have similar elements: for instance, oral histories might have generalized guidelines.
Core metadata fields.
*Open up Core Metadata document and navigate through it*
The core metadata fields document lists required fields, required-if-applicable fields, optional fields, and fields that we shouldn’t use (along with alternatives to those fields).
Required fields broken down by type.
*Open Type Document*
The required-fields-by-type document was then created so that, when you are assessing the metadata needs of a new collection, you can identify the different types of material and then create an application profile that uses the required fields for each type of material in the collection.
For instance, if we had a collection with Still Images and Text in it we’d need
Date field
Identifier field
Rights field
At least one Subject field
Title field
Type field labeled Type (DCMI), with either Still Image or Text in its contents
Description field with ‘Color/B&W’ as its label
And a Description field with ‘Extent’ as its label.
Note that these are only the required fields. So depending on the content, we might want a Creator field, a Medium field, or a Language field.
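The required-fields check for a Still Image/Text collection could be sketched like this. The field labels follow the list above; the rules themselves are an illustrative simplification of the actual documentation.

```python
# Required fields for a hypothetical Still Image + Text collection,
# following the example list above. Labels and rules are illustrative.
REQUIRED_FIELDS = [
    "Date", "Identifier", "Rights", "Subject", "Title",
    "Type (DCMI)", "Color/B&W", "Extent",
]
ALLOWED_TYPES = {"Still Image", "Text"}

def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

def valid_type(record):
    """Type (DCMI) must hold one of the allowed DCMI Type terms."""
    return record.get("Type (DCMI)") in ALLOWED_TYPES

# A sample record (values are made up):
record = {
    "Date": "1968", "Identifier": "uwm-0001", "Rights": "In Copyright",
    "Subject": "Student protests", "Title": "Rally on campus",
    "Type (DCMI)": "Still Image", "Color/B&W": "B&W", "Extent": "1 photograph",
}
print(missing_fields(record))  # an empty list means the record passes
print(valid_type(record))
```

Optional fields like Creator, Medium, or Language would simply be absent from the required list, so their presence or absence never fails the check.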
Before we started the actual remediation of data, we first needed to make sure that the fields in the different collections adhered to our guidelines. So Ann and I went through each collection’s field mappings in the Administration module and changed things around. We needed to be cognizant, of course, of what we were changing and whether that data needed to be remediated by a cataloger. So this field remediation served not just as an initial step in remediating the collections, but also as triage for how to start remediating them.
Assigning catalogers: One of the great things about this project has been the chance to really work across departments – in this case, training the original catalogers to work with our digital collections. We divided up the collections according to size so that each cataloger got approximately the same number of records to work with, though some collections are more homogeneous than other collections, so workloads inevitably varied despite our best efforts.
Example (working within ContentDM client): Everyone needed to work with the CONTENTdm desktop client to do batch updates, or item-level updates. It’s an offline client so they could make changes and Nathan and I would approve the changes before they went live.
Easy stuff included assigning a Type DCMI of Still Image to photographs – we have loads of them and it’s a pretty non-controversial designation
Hard (granular vocabulary, compound objects): Harder stuff included assigning different types to multiple parts of a compound object; and assigning format descriptions could become more complicated than our vocabulary list indicated. But it was a group effort and we tried to avoid going too far down any rabbit hole.
Type, Medium, URIs, etc.: Vocabulary lists were important, as there are many legitimate sources of controlled vocabulary, particularly for the different kinds of media we hold. For consistency we needed to agree on which term we preferred and why. That list is organized around DCMI Type to keep terms grouped logically. We initially built out the terms that we knew dominate the collections, things like “nitrate negatives” and “manuscripts”. Where we had far more variation – or growth of variety – was in the Type: Genre field.
Because of our running Vocabulary list, and the documentation we created, quality control was fairly straightforward.
First we went through the field mapping to make sure that it conformed to what we had created in the documentation.
Then we ensured that the vocabulary conformed to the vocabulary list we had created. That was pretty easy: since we had maintained a list as we went, we could create a controlled vocabulary list in CONTENTdm and check whether the field conformed to that list in each collection. It often didn’t, but that’s why we did quality control.
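The conformance pass amounts to a set difference between the values found in a field and the running vocabulary list. A minimal sketch, with made-up vocabulary terms and field values:

```python
# Illustrative slice of the running vocabulary list (organized under
# DCMI Type in the real documentation); terms here are examples only.
VOCABULARY = {"nitrate negatives", "manuscripts", "gelatin silver prints"}

def nonconforming(values, vocabulary=VOCABULARY):
    """Return the distinct field values that are not on the list."""
    return sorted(set(values) - vocabulary)

# Medium values as they might appear in one collection:
medium_values = ["nitrate negatives", "Nitrate Negative", "manuscripts",
                 "silver gelatin print"]
print(nonconforming(medium_values))
# -> ['Nitrate Negative', 'silver gelatin print']
```

Anything in the output is either a typo or capitalization variant to remediate, or a candidate term to discuss and add to the list.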
Further Remediation: So we still have more remediation to do, of course. Having finished up Medium and Type is great – and gets us pretty far, really, as these are fields that need consistency across collections and consistency with external collections as well.
Date is our next frontier. Because we’re working primarily with archival collections, we have lots of items that have unknown dates, circa dates, and date ranges. Dates have been entered inconsistently over the last fifteen years and it’s another important field for sorting, faceting, discovery, and administration. We’re struggling here with some of the limitations of CONTENTdm and with the proper ISO form for date – Nathan can talk a bit more about that, but it’s the next step in our actual remediation efforts.
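To give a flavor of what that date work involves, here is a rough sketch of normalizing a few common legacy patterns toward EDTF (ISO 8601-2) forms. The patterns and target forms are simplified assumptions for illustration, not our finished rules.

```python
import re

def normalize_date(raw):
    """Map a few common legacy date strings to EDTF-style forms.

    Illustrative only: real legacy data has far more variants,
    and unmatched strings should be flagged for human review.
    """
    s = raw.strip()
    # "circa 1920" / "ca. 1920" -> approximate year ("1920~" in EDTF)
    m = re.fullmatch(r"(?:circa|ca\.?)\s*(\d{4})", s, re.IGNORECASE)
    if m:
        return m.group(1) + "~"
    # "1920-1929" -> date interval ("1920/1929")
    m = re.fullmatch(r"(\d{4})\s*-\s*(\d{4})", s)
    if m:
        return f"{m.group(1)}/{m.group(2)}"
    # unknown/undated -> unspecified year
    if s.lower() in {"unknown", "n.d.", "undated", ""}:
        return "XXXX"
    return s  # already usable, or left as-is for review

for raw in ["circa 1920", "1920-1929", "n.d.", "1968"]:
    print(raw, "->", normalize_date(raw))
```

Even a small script like this runs into the CONTENTdm limitations mentioned above, since the system itself doesn't validate or interpret EDTF values.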
Application profiles for more specific genre types – such as oral histories, newspapers, photograph collections, etc – are another step that we’re eager to implement. So this will be another frontier in our ongoing quest to create useful and necessary documentation based on what we’ve been able to do in terms of remediation and harmonization.
We’ve had the good fortune to have some very talented students working for us over the past few years, and one of them, Charles Hosale – who has since moved on to a position at WGBH – created an application profile template for oral history collections that we will adapt for other genres. We think genre is the most logical way to create application profiles, as most of the really significant differences in the way we structure and describe an object are based on its genre type. This is an example of the application profile Charles created…
And a little closer look to see that what we’re doing is indicating a field type, what it means, whether it’s required and repeatable and where to find the values that are allowed in this field, as well as where to map it for Dublin Core. So making our collections as predictable and consistent as possible.
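One row of a profile like that could be represented as a simple data structure carrying the attributes just listed. This is a hypothetical rendering, not Charles’s actual template; the field values are illustrative.

```python
# One illustrative application-profile entry: field label, definition,
# obligation, repeatability, source of allowed values, and DC mapping.
profile_entry = {
    "field_label": "Type (DCMI)",
    "definition": "The nature or genre of the resource.",
    "required": True,
    "repeatable": True,
    "values_from": "DCMI Type Vocabulary (local preferred-term list)",
    "dc_mapping": "Type",
}

def describe(entry):
    """Render one profile row as a human-readable summary line."""
    req = "required" if entry["required"] else "optional"
    rep = "repeatable" if entry["repeatable"] else "not repeatable"
    return (f'{entry["field_label"]} ({req}, {rep}) -> '
            f'DC {entry["dc_mapping"]}; values from {entry["values_from"]}')

print(describe(profile_entry))
```

Structuring profiles as data rather than prose is what makes the collections predictable: the same entries can drive validation, documentation, and training.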
Final thoughts: The core metadata project is ongoing. We’ve really only kicked it off, though with a lot of thought and systematic identification of the most impactful categories of data to remediate first – and that’s why we’ve focused on Type and Format initially. The focus has been on creating consistency, coherence, and conformance to standards, and applying that to fifteen years’ worth of collections. We’re looking for two main outcomes: improved metadata for our digital collections that, especially, makes them function well not only in their native system, but in other portals and platforms, and alongside other materials as well; and better, more complete documentation to ensure consistency going forward.
Questions?