Presentation for a workshop on cataloging medieval manuscripts with Debra Cashion, Sheila Bair, and Sue Steuer, held at the Rare Book and Manuscript Section (RBMS) of the Association of College and Research Libraries (ACRL) in Minneapolis, MN, on June 27, 2013.
3. Planning metadata: collaboration and responsibilities
“Metadata creation is an incremental process that should be a shared
responsibility among various parts of an institution.”
A Framework of Guidance for Building Good Digital Collections
• Existing metadata?
– Finding aids? Codicological description? Provenance?
• Subject specialists
– Description
• Technicians
– File format, extent, color management
• Administrators
– Access rights
• Users?
– Reviews, comments, tags
4. Planning metadata: documentation
• Documentation
–Best practices
–Local decisions, application profile
–“Data dictionary”
• Plan for the future
–Preservation and migration
–Maintenance
5. Planning metadata: what is “good” metadata?
• Appropriate to objects in collection
• Appropriate to users and use
• Appropriate to system and resources
• Use of standards
• Interoperable and shareable
6. Good metadata: appropriate to objects
Format(s) & file type(s)
– Images? Text?
– JPEG, XML files, MP3, MPEG, PDF
– More than one format in collection?
Images courtesy of Western Michigan University Libraries
7. Good metadata: appropriate to objects
• Genre(s)
– Manuscripts? Maps? Cultural objects? Music?
– More than one genre in collection?
• Subject matter
Images courtesy of ArtStor
8. Good metadata: appropriate to users
• Who are your primary users?
– Medieval scholars? Undergraduate students?
• How will they expect to search?
– Searching skills?
• What will they be looking for?
• What “language” do they speak?
– Community of practice? Vocabulary?
9. Communities of Practice & Metadata
• Library community
– Mission: access, description, organization
– Shared records using shared standards
• Museum community
– Mission: outreach, education, interpretation
– Records created primarily for internal use
• Archives community
– Mission: archive, preservation
– Collection-level records, finding aids
• Research and education community
– Mission: research and collaboration
– Shared records using a variety of standards
10. Good metadata: appropriate to intended use
• How do people use it now?
• Education or research?
• What are their expectations?
• What is their interest in the material?
– Botany, hagiography, art, language, music?
• What are other ways it may be used in the future?
11. How will it be used?
• Example of pre-printing press, handmade book
• Study of artwork, pigments, symbolism
• Study of paleography
• Study of the text – grammar, words & word usage
• Study of the text – people, places, subjects
• Comparison to other manuscripts – for textual variants, relationships between copies, identifying scribes or artists
Image courtesy of Western Michigan University Libraries
12. Good metadata: appropriate to system & resources
• System
– CONTENTdm
– Luna Insight
– DLXS
– DSpace
• Resources
– One-time grant money vs. budget line-item
– Knowledge, skill, time of people
– Availability of existing metadata
13. Metadata for images vs. text
• Image
– Metadata is everything
• Text
– Transcription and markup
• Text as image
– Image of the manuscript page
– Full-text in metadata
Image courtesy of ArtStor
15. Describing images
Aboutness
• What is the meaning of the work?
• What is expressed by the work?
• What do the objects, events, etc., depicted in the work symbolize?
• How may the image be interpreted?
• What was the intention of the work’s creator?
• How has the work been interpreted historically?
Image courtesy of ArtStor
17. Schemas or Element sets
• Dublin Core
• VRA Core (Visual Resources Association)
• TEI (Text Encoding Initiative)
• EAD (Encoded Archival Description)
• MARC (MAchine-Readable Cataloging)
18. Simple Dublin Core
– Title
– Creator
– Subject
– Description
– Publisher
– Contributor
– Date
– Type
– Format
– Identifier
– Source
– Language
– Relation
– Coverage
– Rights
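As a rough illustration of how the fifteen Simple Dublin Core elements become a record, the sketch below serializes a few of them as XML with Python's standard library. The namespace URI is the published Dublin Core Element Set namespace. The function name `build_dc_record`, the wrapping `record` element, and the sample values (the title and subject are borrowed from the example record on slide 44; the type value is invented) are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Published namespace for the Dublin Core Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen Simple Dublin Core elements, in lowercase XML form.
SIMPLE_DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]

def build_dc_record(fields):
    """Return an XML string with one dc:* child per supplied value.

    `fields` maps an element name to a list of values, because every
    Simple Dublin Core element is optional and repeatable.
    """
    root = ET.Element("record")  # hypothetical wrapper element
    for name, values in fields.items():
        if name not in SIMPLE_DC_ELEMENTS:
            raise ValueError(f"not a Simple Dublin Core element: {name}")
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")

# Sample values are illustrative only.
sample = build_dc_record({
    "title": ["Hymnal for the Sanctoral Cycle and Common of Saints"],
    "subject": ["Antiphonaries"],
    "type": ["Image"],
})
print(sample)
```

Rejecting unknown element names is a design choice of this sketch: it shows where Simple Dublin Core stops and where a qualified element set or an application profile has to take over.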
19. Expanded/Qualified Dublin Core
• Accrual Method
• Accrual Policy
• Accrual Periodicity
• Audience
• Instructional Method
• Provenance
• Rights Holder
• Description
– Abstract
• Identifier
– Bibliographic citation
• Relation
– Is Part Of
– Is Referenced By
• Title
– Alternative title
20. What is an application profile?
“There is no ‘one-size-fits-all’ metadata schema” Tony Gill, et al.
• Draw elements from more than one set
• Tailor set of elements to serve your user requirements
• Document decisions, provide guidelines for use
21. Premodern Manuscript Application Profile
• Adds medieval manuscript description fields from ENRICH
• Can be used as teaching tool
– http://web.library.wmich.edu/DIGI/reference/PMAP_Data_DictionaryTOC.pdf
• Audience
– Catalogers who are not medievalists
– Researchers who are not technicians
• Easy to use with CMS like CONTENTdm
– Will be included in 6.5 release
22. PMAP Elements
• Manuscript Identifier (R)
• Title (R)
• Incipit (O)
• Author (M)
• Origin Date (M)
• Origin Location (M)
• Description (R)
• Provenance (M)
• Manuscript Parts (O)
• Explicit (O)
• Secundo Folio (O)
• Extent (O)
• Subject (O)
• Dimensions (O)
• Material (O)
• Collation (O)
• Foliation (O)
• Binding (O)
• Decoration Description (O)
• Contributor (O)
• Description of Hands (O)
• Musical Notation (O)
• Additions and Marginalia (O)
• Relation-Is Part Of (RA)
• Publisher (R)
• Date-Issued (R)
• Type (R)
• Format (R)
• Format-Extent (RA)
• Identifier (R)
• Relation-Is Referenced By (O)
• Rights (RA)
23. Required Elements
• Manuscript Identifier
• Title
• Description
• Publisher
• Date-Issued
• Type
• Format
• Identifier
Image courtesy of WMU
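One way to put the required-element list to work is a small completeness check run before records are loaded. The sketch below is a minimal example, assuming each record is a plain Python dictionary keyed by the PMAP element labels listed above. The function name `missing_required` and the sample draft record are hypothetical.

```python
# Required PMAP elements, as listed on the slide above.
REQUIRED_ELEMENTS = [
    "Manuscript Identifier", "Title", "Description", "Publisher",
    "Date-Issued", "Type", "Format", "Identifier",
]

def missing_required(record):
    """Return the required element names that are absent or blank."""
    missing = []
    for name in REQUIRED_ELEMENTS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing

# Hypothetical draft record with several gaps and one blank value.
draft = {
    "Title": "Hymnal for the Sanctoral Cycle and Common of Saints, f. 150 r.",
    "Type": "Image",
    "Publisher": "  ",
}
print(missing_required(draft))
```

A whitespace-only value counts as missing here, so an empty field carried over from a spreadsheet is caught as well.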
31. Where can you find metadata?
• Catalog entries
• Seller’s descriptions
• Provenance
– Schoenberg Database of Manuscripts
http://dla.library.upenn.edu/dla/schoenberg/index.html
32. Controlled vocabularies
• Library of Congress Authorities
• Art & Architecture Thesaurus (AAT)
• Union List of Artist Names (ULAN)
• ICONCLASS
• Library of Congress Thesaurus for Graphic Materials (TGM)
• DCMI Type Vocabulary
33. Why use controlled vocabularies?
“Do it once, do it right (consistent schemas, controlled vocabularies), and
you can re-purpose metadata in a wide variety of ways.” Murtha Baca
• Improve search retrieval
– Precision – what share of the retrieved records are relevant?
– Recall – what share of the relevant records are retrieved?
• Database organization
– Allow for preset searches, lists of categories
• Name disambiguation
– People, places, organizations
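Precision and recall can be made concrete with a little arithmetic. The sketch below assumes the retrieved and relevant records are available as sets of identifiers. The identifiers and the function name `precision_and_recall` are invented for illustration.

```python
def precision_and_recall(retrieved, relevant):
    """Compute precision and recall for one search.

    precision = relevant retrieved records / all retrieved records
    recall    = relevant retrieved records / all relevant records
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: four records come back, two of them relevant,
# and one relevant record (ms5) is missed entirely.
retrieved = {"ms1", "ms2", "ms3", "ms4"}
relevant = {"ms2", "ms3", "ms5"}
p, r = precision_and_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Inconsistent terms tend to lower recall, because a search on one form of a name or subject misses the records that use another form.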
34. Differences in vocabularies-meaning
“You keep using that word. I do not think it means what you think it means.” – The Princess Bride
• Initials – RBMS
– Provenance evidence
• Initials – AAT, TGM & LCSH
– Layout feature
Image courtesy of Artstor
36. Differences in vocabularies-Interoperability
ULAN: Buonarroti, Michelangelo (Italian sculptor, painter, and architect, 1475-1564)
LCNAF: Michelangelo Buonarroti, 1475-1564
AAT: Illuminations
LCSH: Illumination of books and manuscripts
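When two systems use different vocabularies for the same concept, a crosswalk table is one common way to bridge them. The sketch below is a minimal example built only from the two pairs of headings shown above, with the ULAN heading shortened to its name portion. The table layout and the function name `translate` are assumptions, and a production crosswalk would normally match on identifiers, not on label strings.

```python
# Each row groups headings that name the same entity or concept
# in different vocabularies (pairs taken from the slide above).
EQUIVALENTS = [
    {"AAT": "Illuminations",
     "LCSH": "Illumination of books and manuscripts"},
    {"ULAN": "Buonarroti, Michelangelo",
     "LCNAF": "Michelangelo Buonarroti, 1475-1564"},
]

def translate(term, source, target):
    """Return the `target` vocabulary heading for `term`, or None."""
    for row in EQUIVALENTS:
        if source in row and target in row:
            # Case-insensitive comparison of the source heading.
            if row[source].casefold() == term.casefold():
                return row[target]
    return None

print(translate("illuminations", "AAT", "LCSH"))
print(translate("Buonarroti, Michelangelo", "ULAN", "LCNAF"))
```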
37. Content standards & Best Practices
• Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts (AMREMM)
• Cataloging Cultural Objects (CCO)
• Descriptive Cataloging of Rare Books
• CDP Dublin Core Metadata Best Practices
• TEI Guidelines for Electronic Text Encoding and Interchange
• Best Practices for CONTENTdm and Other OAI-PMH Compliant Repositories: Creating Sharable Metadata, Version 3.0
38. Why are standards important?
• Interoperability
– “The goal of interoperability is to help users find and access information objects that are distributed across domains and institutions.” NISO
• Agreed-upon terminology
– antiphoner, antiphonal, antiphonies, antiphonary
• Easier to share data
– OAI harvesting
– Digital Scriptorium
– Medieval Electronic Scholarly Alliance
– Semantic Web/Linked Data
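To show what OAI harvesting involves at the request level, the sketch below builds an OAI-PMH `ListRecords` URL. `verb`, `metadataPrefix`, and `set` are standard OAI-PMH request parameters, and `oai_dc` is the Dublin Core format every OAI-PMH repository must support. The base URL, the set name, and the function name `list_records_url` are hypothetical. The code builds the URL only and does not contact any server.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `set_spec` optionally limits the harvest to one set (collection).
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"

# Hypothetical repository endpoint and set name.
url = list_records_url("https://repository.example.org/oai",
                       set_spec="manuscripts")
print(url)
```

A harvester would fetch this URL and read the Dublin Core records in the XML response, which is why consistent, standards-based metadata matters once records leave the local system.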
39. Linked Data – Tim Berners-Lee (2006)
• Use URIs as names for things.
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs so that they can discover more things.
40. Planning for the Semantic Web
http://lod-cloud.net/versions/2011-09-19/lod-cloud.html
43. Planning for the future: Use standard vocabularies
“In order to make it easier for applications to understand Linked Data, data providers should use terms from widely deployed vocabularies to represent data wherever possible.”
Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space
44. Record for digital object
• Title: Hymnal for the Sanctoral Cycle and Common of Saints, f. 150 r.
• Origin Location: Abbazia di Morimondo
• Contributor: Reoldus, Bertramus, 13th century-14th century (scribe)
• Subject: Antiphonaries
45. Linked data: breaking record into data
• Record → has contributor → Reoldus, Bertramus, 13th century-14th century
• Record → has subject → Antiphonaries
• Record → has location → Abbazia di Morimondo
51. Questions?
Sheila Bair, Metadata & Cataloging
Librarian
sheila.bair@wmich.edu
Susan Steuer, Head of Special Collections
susan.steuer@wmich.edu
Western Michigan University Libraries
Image courtesy of WMU Libraries
Each triple represents a statement of a relationship
In selecting vocabularies for reuse, the following criteria should be applied:
• Usage and uptake – is the vocabulary in widespread usage? Will using this vocabulary make a data set more or less accessible to existing Linked Data applications?
• Maintenance and governance – is the vocabulary actively maintained according to a clear governance process? When, and on what basis, are updates made?
• Coverage – does the vocabulary cover enough of the data set to justify adopting its terms and ontological commitments?
• Expressivity – is the degree of expressivity in the vocabulary appropriate to the data set and application scenario? Is it too expressive, or not expressive enough?
Breaks up the lump of data that is the record into data components that can be reused. One way it does this is by using Resource Description Framework (RDF) triples, which include a Subject, a Predicate, and an Object. RDF = a language for representing information about resources in the World Wide Web.
Machine-readable identifiers/URIs
Transforming the Subject, Predicate, and Object into unambiguous identifiers
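The step from record to triples to URIs can be sketched in a few lines. The example below expresses the three statements from the hymnal record as subject-predicate-object tuples and prints them in N-Triples syntax. The `http://purl.org/dc/terms/` predicates are real Dublin Core terms. Every `http://example.org/` URI, including the origin-location predicate, is a hypothetical placeholder; in practice these would be identifiers from authorities such as those listed under controlled vocabularies.

```python
DCTERMS = "http://purl.org/dc/terms/"  # real Dublin Core terms namespace
EX = "http://example.org/"             # hypothetical placeholder namespace

# Hypothetical URI standing in for the digital object (the record).
record_uri = EX + "manuscript/hymnal-f150r"

# Each tuple is one statement: (subject, predicate, object).
triples = [
    (record_uri, DCTERMS + "contributor", EX + "person/reoldus-bertramus"),
    (record_uri, DCTERMS + "subject", EX + "subject/antiphonaries"),
    (record_uri, EX + "vocab/originLocation", EX + "place/abbazia-di-morimondo"),
]

def to_ntriples(statements):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)

print(to_ntriples(triples))
```

Because each part of each statement is a URI, another application can combine these statements with its own data about the same scribe, subject, or place without having to match text strings.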