Data Mapping: OAI Sheet
Music Harvester
Jenn Riley
Digital Media Specialist
Indiana University Digital Library
Program
Why was data mapping required?







OAI requires unqualified Dublin Core
Contributed data only needed to
support res...
Mapping inconsistencies
Dublin Core fields









Title
Creator
Subject
Description
Publisher
Contributor
Date
Type









Form...
Limitations to Dublin Core







Heavily slanted towards electronic
resources
No content standards enforced
Without q...
Existing metadata formats





MARC
Encoded Archival Description (EAD)
Dublin Core (DC)
Local custom formats
MARC


Library of Congress






some records in AACR2 MARC
many records in non-AACR2 MARC
already had data mapped “b...
EAD


Duke – item level finding aid






records weren’t contributed for phase 1
very robust and specific
conversion...
Dublin Core


UCLA – 4 types of DC records


songs


sheet music






covers et al

recordings

mapping basically o...
Local custom formats (1)









Johns Hopkins - Simple DTD
publication (location,
publisher, date)
subject
call ...
Local custom formats (2)









Indiana – simple database
title
composer
lyricist
place of publication
publisher...
Data inconsistencies




Different depths of description
Different levels of authority control
No common subject vocabu...
Some mapping issues









Field formatting important, not just contents
Choices heavily influenced by LC practic...
Outstanding issues







Authority control for names
Date formats
Data clean-up: what can be done at
harvester end a...
More information






Harvester site (still in development):
http://digital.library.ucla.edu/sheetmusic/
Jenn Riley, I...
Upcoming SlideShare
Loading in …5
×

Open Archives Initiative for Sheet Music: Data Mapping

325 views

Published on

Riley, Jenn. "Open Archives Initiative for Sheet Music: Data Mapping." TRUE/FALSE: Facsimiles, Fakes, Forgeries and Issues of Authenticity in Special Collections: The 44th Annual RBMS Preconference, Toronto, Canada, June 17-20, 2003.

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
325
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide
  • Unqualified DC required, but more robust formats also allowed. More on this later.
    Since the purpose of the harvester is discovery of resources, and a user is taken out of the harvester to view items at individual institutions, it was not necessary to force all of the information from each institution’s records into DC. We needed to define what was required for discovery only, not figure out how to squeeze every marc field into dc.
    The name “Dublin Core” reveals something about its purpose. It was designed to be a core set of metadata elements applicable to all types of resources. Thus it’s meant to be flexible, with a low entry barrier. This means the definitions of fields are open to wide interpretations. We needed to develop a single interpretation that all contributors followed to make searching and browsing more effective.
  • Search on Michigan’s OAIster for Einstein and format=image. Note rec. 1, 2 forms of name in author/creator, subject is a description. Note rec. 2, type=image, but very little indicates it’s a photo, weird note text, subjects stuck together in a single string
  • These are the 15 Dublin Core fields. None are required, all are repeatable.
    You’ll notice they’re pretty basic. That’s because they’re supposed to be “core.” Many of them are obvious inclusions – title, creator, description, publisher. Others are not so obvious from their names. “Type” is meant to be used to indicate a general category to which the resource belongs. Some suggested types are image, physical object, text, collection, and event. “Format” is for describing the physical or digital manifestation of a resource. You would record things like dimensions, duration, software needed in “format.”
    You’ll also notice that these terms are extremely generic – “creator” for example. Again, they must be this generic because Dublin Core is meant to be useable to describe any type of resource. You may look at this list and think these elements are TOO generic to be really useful. To try and change that, Dublin Core is defined in two types: qualified and unqualified. Unqualified is just using this list of fields, exactly as they are written here. Qualified Dublin Core allows the specification of a refinement of the meaning of the field or the encoding scheme used for the field. The first of these, refining the meaning of the field, would be used, for example, with “creator.” A qualifier could specify what role that creator had in the development of the resource. The second, specifying the encoding scheme, could be used, for example, with the “subject” field, to indicate the name of the controlled vocabulary from which a subject was taken. Unfortunately, OAI requires unqualified Dublin Core to be used, so the harvester couldn’t take advantage (at least in phase 1) of the greater specificity of description provided by qualifiers.
  • Even though unqualified DC is required for OAI, there are some reasons why it’s not the best metadata format to use for describing and searching sheet music collections.
    Although it doesn’t specifically limit its scope to online materials, DC was designed in the networked world. Field definitions tend to work better for networked materials. For example, the “format” element description suggests using internet mime types.
    Many of the DC fields have suggestions for controlled vocabularies to use. Internet mime types for format is one example. For subject, “recommended best practice” is to use terms from a CV, but no specific ones are identified. But DC itself does not require field values to conform to these suggestions.
    As we saw earlier, using unqualified DC doesn’t offer a great deal of specificity for describing resources. Sheet music has specific needs that DC can’t meet. For example, being able to distinguish between composers, lyricists, and arrangers is important for users of sheet music collections.
    Despite our efforts to clarify DC field definitions for use with the harvester, it is inevitable that the fields would be used differently by different institutions.
  • Duke (rare materials emphasis) and IU (little authority control) had records in MARC too, but these weren’t contributed in phase 1 of the project.
  • Very specific info, for example: subject type (LCSH, AAT, TGM), dedicatee, recordings available
  • Very specific for sheet music, not good even for other types of printed music.
  • No authority control
  • MARC has more fields than custom DBs, but custom DBs have more applicable fields. Many MARC records don’t have relator codes, so we don’t know who’s a composer and who’s a lyricist.
    Even if each individual collection was under name authority, they still might not interoperate. But the problem was much worse – there wasn’t even agreement on name order!
    Local subject vocabs in use, so even a complex mapping between LCSH, AAT, TGM wouldn’t solve the problem.
  • Open Archives Initiative for Sheet Music: Data Mapping

    1. 1. Data Mapping: OAI Sheet Music Harvester Jenn Riley Digital Media Specialist Indiana University Digital Library Program
    2. 2. Why was data mapping required?     OAI requires unqualified Dublin Core Contributed data only needed to support resource discovery Dublin Core field definitions need interpretation For efficient searching, data from different institutions must be consistent
    3. 3. Mapping inconsistencies
    4. 4. Dublin Core fields         Title Creator Subject Description Publisher Contributor Date Type        Format Identifier Source Language Relation Coverage Rights
    5. 5. Limitations to Dublin Core     Heavily slanted towards electronic resources No content standards enforced Without qualifiers, fields not granular enough for sheet music needs Field definitions open to interpretation
    6. 6. Existing metadata formats     MARC Encoded Archival Description (EAD) Dublin Core (DC) Local custom formats
    7. 7. MARC  Library of Congress     some records in AACR2 MARC many records in non-AACR2 MARC already had data mapped “based on” MARC to Dublin Core crosswalk not able to alter their mapping for participation in sheet music project
    8. 8. EAD  Duke – item level finding aid     records weren’t contributed for phase 1 very robust and specific conversion was relatively simple because data was converted to EAD from collection-specific database included virtually all information in EAD documents to DC records
    9. 9. Dublin Core  UCLA – 4 types of DC records  songs  sheet music    covers et al recordings mapping basically only required inheritance of songs and sheet music data elements down to the covers level
    10. 10. Local custom formats (1)         Johns Hopkins - Simple DTD publication (location, publisher, date) subject call num (box, item) title composer/lyricist/ arranger form of composition instrumentation         first line first line of chorus performer dedicatee engraver/lithographer/ artist advertisement plate num duplication
    11. 11. Local custom formats (2)         Indiana – simple database title composer lyricist place of publication publisher copyright first line       first line of chorus subject form of composition performance medium copies call #
    12. 12. Data inconsistencies    Different depths of description Different levels of authority control No common subject vocabulary between collections
    13. 13. Some mapping issues        Field formatting important, not just contents Choices heavily influenced by LC practice Can’t force institutions to comply Sheet music has many alternative titles Creator vs. contributor Plate numbers: they’re important, where to put and how to label? Uncertain dates and date ranges
    14. 14. Outstanding issues      Authority control for names Date formats Data clean-up: what can be done at harvester end and what must we ask data providers to do? What will more robust data format look like? How do we make it easier for more institutions to participate?
    15. 15. More information    Harvester site (still in development): http://digital.library.ucla.edu/sheetmusic/ Jenn Riley, Indiana University Digital Library Program: jenlrile@indiana.edu These presentation slides: http://www.dlib.indiana.edu/~jenlrile/rbms2003/

    ×