VRA 2012, Cataloging Case Studies, Metadata Magic

Presented by Mary Alexander at the Annual Conference of the Visual Resources Association, April 18th - April 21st, 2012, in Albuquerque, New Mexico.

The Cataloguing Case Studies session will explore metadata migration, workflows, cloud computing, and tagging, and how they can be applied to digital collections. Mary Alexander of the University of Alabama will present on the second of two migrations that have taken place at the University of Alabama Libraries and the importance of metadata schemas and workflows in that process. Joshua Polansky of the University of Washington will describe his automated workflow, which uses optical character recognition (OCR), Apple Automator, and Microsoft Excel to speed the collection of metadata for 75,000 digital assets. Elizabeth Berenz of ARTstor will look at the advantages of cloud-based software for image management, using Shared Shelf as a working example. Finally, Ian McDermott of ARTstor will demonstrate the advantages of expert tagging and annotation in improving metadata. His presentation will focus on two ARTstor collections that could benefit from the knowledge of the larger ARTstor community: the Gernsheim Photographic Corpus of Drawings and the Larry Qualls Archive of contemporary art exhibitions.

MODERATOR:
Jeannine Keefer, University of Richmond, VA

PRESENTERS:
Mary Alexander, University of Alabama
Elizabeth Berenz, ARTstor
Ian McDermott, ARTstor
Joshua Polansky, University of Washington

  • Mary Alexander is one of three metadata librarians at the University of Alabama Libraries, where she has been employed since 1993. In 2003, she transitioned to a new position working with emerging metadata schemas and their related standards.
  • Across the nation, digitization efforts were already in production when the University of Alabama Libraries and two other large state institutions began to write their first grant to digitize archival collections. We had experienced personnel in archives, cataloging, and systems. We were rich in archival collections but limited in staffing and resources.
  • The grant was awarded in 2003. A metadata group representing the three institutions collaborated on a best practice document. The Dublin Core metadata schema was chosen because of its use in large collaborative projects. While the group was discussing essential schema elements, the institutions proceeded individually with their own efforts. UA Libraries already owned a digital management system, Endeavor’s Encompass. We began.
  • Within a short time after the project started, Endeavor was acquired by a competing company, and Encompass would no longer be supported. OCLC’s CONTENTdm was recommended as the next system for UA Libraries Digital Collections. A consultant was hired to get data out of Encompass. The exported data was delivered in a tab-delimited file. Those involved with the state-wide grant also chose CONTENTdm, based on its connectivity with other sites. It provided a way for state collections to be discovered through one web site. The state-wide group discussed standards for digitization and metadata. The Dublin Core Metadata Best Practices by the CDP Metadata Working Group of the Collaborative Digitization Program was adopted.
  • In CONTENTdm, there was a digital collection for every physical collection. The Qualified Dublin Core schema had been implemented in all collections. Each collection had fields tailored to that collection’s needs. The number of fields ranged from 17 to 27. Production was driven by a workflow based on spreadsheets, and each collection had a corresponding spreadsheet. Many of a collection’s fields were repeated elements. Each creator element had a unique display label that reflected the role of the person or corporation. Author, composer, artist, and photographer are some of the roles/display labels. These were used as the column headers in the spreadsheet.
  • Over a five-year period, CONTENTdm grew to 28 collections, each with a slightly different set of data fields. There were approximately 6,972 digital objects and records in CONTENTdm. A license plateau was quickly approaching.
  • The Cataloging and Metadata Services Department and the Digital Services Unit decided that a master list of fields should be adopted across collections. This would allow CONTENTdm collections to be combined, resulting in fewer collections. Field usage was analyzed by frequency and definition. Common fields were kept, and the least used fields were discarded. A master list was developed. New fields were added for creating finding aids or EAD records for special collections. The display labels for field names were applied to a master spreadsheet. Existing CONTENTdm collections were modified to reflect the master data dictionary. When this process was completed, the 28 collections had been reduced to 12. CONTENTdm collections were called containers locally and were renamed using time spans significant to the holdings of the William Stanley Hoole Special Collections Library. One additional collection was added for University of Alabama digital objects.
  • The UA digital planning group looked ahead to building a digital program. Institutional repositories and the born-digital resources that would populate them were hot topics! Emerging schemas (MODS, a descriptive metadata schema; METS, a structural metadata schema; and EAD, a descriptive metadata schema for archival collections) were receiving a lot of attention. The Association of Research Libraries continued to emphasize exposing hidden collections. Mass digitization was being practiced. With these factors and more, UA Libraries wanted to be in a position to implement these possibilities. A Digital Services Unit was created to digitize and manage these collections.
  • Combining collections in CONTENTdm delayed reaching the next license plateau, but the prospect of digital objects filling CONTENTdm was real. The Digital Services Unit and the Library Office of Technology decided to move away from CONTENTdm and Dublin Core. Plans were under way for digital objects and their metadata records to reside in a directory. A display and retrieval system for users would be built over a SQL database. The metadata workflow was changed. Now the master spreadsheet began with the archivists. After titles, names, dates, and other descriptive information were added, the spreadsheets would move to the Digital Services Unit, where file names were added as a step in the digitization process. The spreadsheets would then be transformed to MODS records, producing a preliminary record. This temporary record provided some access to resources while the Metadata Unit reviewed the spreadsheet for quality control, establishing names, adding abstracts, and assigning subject and genre terms.
  • The master spreadsheet fields were mapped to MODS using Excel for a visual crosswalk. The next step was to create a style sheet (xls) for the spreadsheet conversion to MODS records.
  • This is the template used for the conversion of files to MODS records.
  • Archivist Utility (AU) uses a style sheet template and tab-delimited files derived from the spreadsheets to create raw MODS records. AU was created by a programmer working with the Digital Services Unit leader. Data can be viewed as text or as a MODS record. These preliminary views provide an opportunity for quality control. The error log reports unused or missing columns from the tab-delimited files. Errors are corrected in the spreadsheet, which is then loaded into the utility again. When the librarian is satisfied with the records, they are saved to a folder for the next step. [This is the icon for the Archivist Utility. The University’s mascot is Big Al, an elephant.]
  • The raw MODS records only contained data in the top-level MODS elements. Very simple edits were performed before loading to the server.
  • A second metadata librarian was hired. She wrote scripts to pull data for name and subject files, to replace data, and to transform rawMODS data into robust MODS records.
  • Names were collected from the “name” columns of the spreadsheet into a tab-delimited file using a Python script. The processing included searching for each name in viaf.org. If a name was found, the VIAF and LCCN columns would be populated. A note stating the search result was recorded for each name. After reviewing this file, a metadata librarian would run a script to place the names and numbers into a MADS file.
  • Subject processing uses a Windows PowerShell script. It pulls the subjects into a master list for tagging names, topics, geographic locations, events, genres, occupations, and other headings. After tagging, another script is run to replace the subject headings in the tab-delimited file derived from the spreadsheet with their tagged versions.
  • PURLs, Persistent Uniform Resource Locators, are needed for each digital object and each digital collection. A master list of collection-level PURLs was compiled monthly and passed to the Metadata Services Unit. A Python script is used to pull identifiers from the spreadsheets in order to gather the PURLs for digital objects from the server. The results are provided as an XML list.
  • A transformation puts the pieces together. The rawMODS records are in a project folder used in XMLSpy. An XSLT transformation populates the rawMODS records from the files generated earlier. The authority attribute used with the name tag is populated from the MADS files. From the XML list, PURLs are added to their appropriate identifier tags. The subject tags (#c, #p, #x, etc.) are recognized so that subelements for name, topic, geographic, and other headings are assigned. The final step is validation through Schematron using the item-level PURLs. When the Schematron validation is free of errors, the final MODS records are loaded to the server, replacing the preliminary MODS records.
  • By moving to a directory structure, we are able to implement OAI, ETDs, and EADs. This structure allows flexibility to adopt other schemas. It is important to know the standards. Knowing how to manipulate large amounts of data is invaluable!
  • Please visit Acumen. Thank you!
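
The template step described in the notes above (a template with {{Column}} placeholders, filled from tab-delimited rows, plus an error log of unused or missing columns) can be sketched in Python. This is a minimal illustration and not the Archivist Utility itself: the function names, the shortened template, and the sample data are assumptions made for the example.

```python
import csv
import io
import re
from xml.sax.saxutils import escape

# Matches {{Column Name}} placeholders like those on the "Creating MODS" slide.
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")

# Hypothetical, shortened template; the real one covers many more fields.
TEMPLATE = """<titleInfo>
  <title>{{Title}}</title>
  <subTitle>{{Subtitle}}</subTitle>
</titleInfo>"""


def fill_template(template, row):
    """Replace each placeholder with the XML-escaped value from one row."""
    def substitute(match):
        value = row.get(match.group(1).strip()) or ""
        return escape(value)
    return PLACEHOLDER.sub(substitute, template)


def column_report(template, columns):
    """List template fields with no column, and columns the template ignores."""
    wanted = {name.strip() for name in PLACEHOLDER.findall(template)}
    have = set(columns)
    return {"missing": sorted(wanted - have), "unused": sorted(have - wanted)}


def records_from_tsv(template, tsv_text):
    """Return one filled template per data row, plus the column report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    report = column_report(template, reader.fieldnames or [])
    records = [fill_template(template, row) for row in reader]
    return records, report


if __name__ == "__main__":
    sample = "Title\tCreator\nCotton lint\tSmith, Marjorie L.\n"
    records, report = records_from_tsv(TEMPLATE, sample)
    print(records[0])
    print(report)
```

In this sketch a placeholder with no matching column is left as an empty element and reported as missing, which mirrors the correct-and-reload loop described for the error log.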

Transcript

  • 1. Metadata Magic. By Mary S. Alexander, Metadata Librarian, The University of Alabama Libraries. Given by Jeannine Keefer, Moderator. VRA Conference, Case Studies IV, March 21, 2012
  • 2. The beginning of digitization
  • 3. Dublin Core
      • 15 elements
      • element qualifiers
      • alpha tags
      • broad definitions
      • easy to use
      • optional elements
      • repeatable elements
  • 4. Moving from Endeavor to CONTENTdm
  • 5. Data in CONTENTdm
      • <dc.title>
      • <dc.creator>
      • <dc.creator> displayLabel=“Artist”
      • <dc.creator> displayLabel=“Author”
      • <dc.creator> displayLabel=“Sender”
      • <dc.date>
      • <dc.publisher>
      • <dc.format>
      • <dc.coverage>
      • <dc.contributor>
  • 6. Production growth
  • 7. Display labels for master spreadsheet
      • Title
      • Other title
      • Cover title
      • First Line of Text
      • First Line of Chorus
      • Masthead
      • Series Title
      • Special Issue
      • Title from plate
      • Subject(s)
      • And more
  • 8. Catalysts for change
      • Institutional repositories
          Born-digital resources
          Electronic theses and dissertations
      • More metadata schemas
          Metadata Object Description Schema (MODS)
          Metadata Encoding and Transmission Standard (METS)
          Encoded Archival Description (EAD)
      • Emphasis on discovery of hidden collections
      • Mass digitization
  • 9. Local changes
      Local development of a web-based search and retrieval system was favored. It is now known as Acumen.
      Mass digitization and processing workflows would be implemented.
      MODS would be used as the descriptive metadata schema.
  • 10. DC to MODS crosswalk
      displayLabel   DC element          MODS element
      Title          Title               <title>
      Other Title    Title-alternative   <title type="alternative">
      Subject(s)     Subject             <subject authority="lcsh">
      Description    Description         <description>
      Creator(s)     Creator             <name type="personal">
      Author(s)      Creator             <name type="personal">
      Editor         Contributor         <name type="personal">
  • 11. Creating MODS
      <?xml version="1.0" encoding="UTF-8" ?>
      <mods xmlns="http://www.loc.gov/mods/v3"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-4.xsd"
            xmlns:xlink="http://www.w3.org/1999/xlink" version="3.4">
        <titleInfo>
          <title>{{Title}}</title>
          <subTitle>{{Subtitle}}</subTitle>
          <partNumber>{{Part Number}}</partNumber>
          <partName>{{Part Name}}</partName>
      ….
  • 12. Archivist Utility
  • 13. Editing rawMODS
      <modsCollection xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
        <mods>
          <titleInfo displayLabel="title">
            <title>Cotton lint</title>
          </titleInfo>
          <name type="personal">
            <namePart>Smith, Marjorie L.</namePart>
            <role>
              <roleTerm authority="marcRelator" type="text">Photographer</roleTerm>
            </role>
          </name>
  • 14. Real magic, scripting
  • 15. Name processing
      Name   Role   Type   Authority   VIAF_ID   LCCN   Source   History   Comments
      Coffman, J. Henry   Sender(s)   personal   local   no matches
      King, Helen   Sender(s)   personal   local   check VIAF again
      King, Margaret   Recipient(s)   personal   local   1+ matches--check VIAF again
      King, Robert S.   Sender(s)   Personal   naf   76561235   nb2004005427
  • 16. Subject processing
      Subject Master list                                  Tagging key
      Incorporation#x--West Virginia#z--Ohio County#z      #x topics, #z geographic location
      Ingram, T.#p--Finance, Personal#x                    #p personal name
      Jefferson County (Ala.). Tax collector#c             #c corporate name
  • 17. PURLs
      <Root>
        <Row>
          <identifier>u0003_0000520_0000001</identifier>
          <purl>http://purl.lib.ua.edu/20394</purl>
        </Row>
        <Row>
          <identifier>u0003_0000520_0000002</identifier>
          <purl>http://purl.lib.ua.edu/20424</purl>
        </Row>
        <Row>
          <identifier>u0003_0000520_0000003</identifier>
          <purl>http://purl.lib.ua.edu/20425</purl>
        </Row>
      </Root>
  • 18. More magic
      <name type="personal" authority="naf">
        <namePart>King, Robert S.</namePart>
      </name>
      <subject authority="lcsh">
        <topic>Debtor and creditor</topic>
      </subject>
      <identifier type="local">u0003_0002865_0000001</identifier>
      <identifier type="uri">http://purl.lib.ua.edu/35547</identifier>
  • 19. Practical lessons learned
      • By moving to the directory structure, we are able to implement OAI, ETDs, and EADs.
      • This structure allows flexibility to adopt other schemas.
      • It is important to know the standards. Knowing how to manipulate large amounts of data is invaluable!
      Current records and digital objects available:
      Items available: 74197
      Scans available: 282091
  • 20. acumen.lib.ua.edu
      Mary S. Alexander
      Metadata Librarian
      Cataloging and Metadata Services Dept.
      University of Alabama
      Tuscaloosa, AL 35487
      malexand@ua.edu
      voice: 205-348-1490
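
Two of the pieces that the final transformation puts together, the tagged subject headings from slide 16 and the PURL list from slide 17, can be illustrated with a short Python sketch. The presentation describes doing this work with PowerShell scripts and an XSLT transformation in XMLSpy, so the code below is a swapped-in illustration only. The function names are hypothetical, and both the mapping of each tag letter to a MODS subelement and the default "lcsh" authority are assumptions based on the tagging key and the slide 18 output.

```python
import xml.etree.ElementTree as ET

# Tag letters come from the slide 16 tagging key. The MODS element chosen
# for each letter is an assumption made for this sketch.
TAG_KEY = {
    "x": ("topic", None),
    "z": ("geographic", None),
    "p": ("name", "personal"),
    "c": ("name", "corporate"),
}


def parse_heading(heading):
    """Split 'Text#x--Text#z' into a list of (text, tag letter) pairs."""
    parts = []
    for chunk in heading.split("--"):
        text, sep, tag = chunk.strip().rpartition("#")
        if not sep or tag not in TAG_KEY:
            raise ValueError(f"untagged or unknown component: {chunk!r}")
        parts.append((text.strip(), tag))
    return parts


def build_subject(heading, authority="lcsh"):
    """Build a <subject> element with one subelement per tagged component."""
    subject = ET.Element("subject", {"authority": authority})
    for text, tag in parse_heading(heading):
        element, name_type = TAG_KEY[tag]
        if element == "name":
            name = ET.SubElement(subject, "name", {"type": name_type})
            ET.SubElement(name, "namePart").text = text
        else:
            ET.SubElement(subject, element).text = text
    return subject


def load_purls(xml_text):
    """Map each <identifier> to its <purl> from a Root/Row list."""
    root = ET.fromstring(xml_text)
    return {
        row.findtext("identifier"): row.findtext("purl")
        for row in root.findall("Row")
    }


if __name__ == "__main__":
    subject = build_subject("Incorporation#x--West Virginia#z--Ohio County#z")
    print(ET.tostring(subject, encoding="unicode"))
```

Raising an error on an untagged component is a design choice for the sketch: it surfaces headings that were missed during tagging instead of silently dropping them.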