This document discusses tools and techniques for creating, maintaining, and distributing shareable metadata. It emphasizes that metadata should be structured to be understandable outside of local contexts and useful for other institutions. Key aspects of shareable metadata include using appropriate content and vocabularies, ensuring records are coherent, providing useful context, and conforming to standards. The document also outlines example workflows and considerations for making metadata shareable.
Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata
1. Tools and Techniques for Creating, Maintaining, and Distributing Shareable Metadata
Jenn Riley
Metadata Librarian
Indiana University Digital Library Program
2. What does this record describe?
<dc:identifier>http://museum.university.edu/unique identifier</dc:identifier>
<dc:publisher>State University Museum of Ichthyology, Fish Field Notes</dc:publisher>
<dc:format>jpeg</dc:format>
<dc:rights>These pages may be freely searched and displayed. Permission must be received for
subsequent distribution in print or electronically. Please go to http://museum.university.edu/ for more
information.</dc:rights>
<dc:type>image</dc:type>
<dc:description>1926; 0070; 06; Little S. Br. Pere Marquette R.; THL26-68; 71300; 71301; 71302;
71303; 71304; 71305; 71306; 71307; 71308; 71309; 07; 1926/07/06; R12W; S09; Second collector
Moody; T16N</dc:description>
<dc:subject>Cottus bairdi; Esox lucius; Cottus cognatus; Etheostoma nigrum; Salmo trutta;
Oncorhynchus mykiss; Catostomus commersoni; Pimephales notatus; Margariscus margarita;
Rhinichthys atratulus; mottled sculpin; northern pike; slimy sculpin; johnny darter; brown trout;
rainbow trout; white sucker; bluntnose minnow; pearl dace; blacknose dace; bairdi; lucius; cognatus;
nigrum; trutta; mykiss; commersoni; notatus; margarita; atratulus; Cottus; Esox; Cottus; Etheostoma;
Salmo; Oncorhynchus; Catostomus; Pimephales; Margariscus; Rhinichthys; 1926-07-06; ; Boleosoma;
Salmo; Hyborhynchus; Semotilus; ; fario; gairdneri--irideus; atronasus--obtusus--meleagris</dc:subject>
<dc:language>UND</dc:language>
<dc:source>Michigan 1926 Langlois, v. 1 1926--1926; </dc:source>
Record harvested via OAI-PMH, 2-27-2007
5. Why we should care
Library/archive/museum data is useful
◦ Even when objects aren’t digitized
It’s our mission to distribute information
We should be leaders in the networked
information environment
We have good ideas, but others do too
We should therefore make it easier for
our data to be used by others
6. Shareable Metadata…
Is quality metadata
Promotes search interoperability - “the
ability to perform a search over diverse sets
of metadata records and obtain meaningful
results” (Priscilla Caplan)
Is human understandable outside of its local
context
Is useful outside of its local context
Is preferably machine processable
7. Shareable Metadata as a View
Metadata is not monolithic
Metadata should be a view projected from
a single information object
Create multiple views appropriate for
groups of important sharing venues
Depends on:
◦ Use
◦ Audience
8. The 6 Cs & Lots of Ss of Shareable Metadata
Content
Coherence
Context
Communication
Consistency
Conformance to Standards
9. Content
How element values are structured affects
whether the record is shareable
Choose the following to suit your institution,
the resource, and the defined audience:
◦ Vocabularies
◦ Content standards
◦ Granularity of description
◦ Version of the resource to describe
◦ Elements to use
Don’t include empty elements in shared
records
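As a minimal sketch of the last point, the Python below removes empty or whitespace-only elements from a Dublin Core record before it is shared. It assumes a simplified record wrapped in a hypothetical `record` element rather than a full OAI-PMH response, and it only inspects the direct children of that wrapper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar dc: prefix


def drop_empty_elements(record_xml: str) -> str:
    """Return the record with empty or whitespace-only child elements removed.

    Assumed shape: a simplified record wrapped in a hypothetical <record> element.
    """
    root = ET.fromstring(record_xml)
    for child in list(root):  # copy the list so removal doesn't disturb iteration
        # "Empty" here means no child elements and no non-blank text.
        if len(child) == 0 and not (child.text or "").strip():
            root.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample record with one real value and two empty elements.
sample = (
    f'<record xmlns:dc="{DC_NS}">'
    "<dc:title>Fish field notes, 1926</dc:title>"
    "<dc:language></dc:language>"
    "<dc:coverage>   </dc:coverage>"
    "</record>"
)
cleaned = drop_empty_elements(sample)
```

Running this on the sample leaves only the dc:title element in the serialized record.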
10. Coherence
A shareable metadata record should make sense
on its own, outside of the local institutional
context and without access to the resource
itself
Place values in appropriate elements
Repeat elements instead of “packing” multiple
values into one field
Avoid local jargon, abbreviations and codes
Ensure mappings from local to shared metadata
formats result in coherent records
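Splitting a packed value into repeated elements might look like the minimal Python sketch below. It assumes values are separated by semicolons, as in the dc:subject field of the record at the start of this presentation; that separator, and the choice to drop exact duplicates, are assumptions to adjust per collection. Splitting alone does not make a record coherent: species names, common names, and dates would still need to go into appropriate elements.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def unpack_field(packed: str, element: str, sep: str = ";") -> list[ET.Element]:
    """Split one packed value into repeated Dublin Core elements.

    Assumes `sep` separates the packed values. Blank segments (e.g. "; ;")
    and exact duplicates are skipped.
    """
    elements = []
    seen = set()
    for part in packed.split(sep):
        value = part.strip()
        if not value or value in seen:
            continue
        seen.add(value)
        el = ET.Element(f"{{{DC_NS}}}{element}")  # e.g. {http://purl.org/...}subject
        el.text = value
        elements.append(el)
    return elements


subjects = unpack_field("Cottus bairdi; Esox lucius; ; Cottus bairdi", "subject")
```

Each returned element can then be appended to the shared record, giving one dc:subject per value.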
11. Context
Appropriate context allows a user to
understand a resource based on the
metadata record alone
Shareable metadata records should:
◦ Include information not used locally
◦ Exclude information only used locally
Collection level records can help, but
don’t rely on them
12. Communication
Information supplementing your metadata
records can be useful to an aggregator
◦ Intended audiences
◦ Record creation methods
◦ Controlled vocabularies used
◦ Content standards used
◦ Accrual practices
◦ Existence of analytical or supplementary materials
◦ Provenance of materials
Can be within or external to a sharing protocol
13. Consistency
Consistency allows aggregators to apply the same
indexing or enhancement logic to an entire
group of records
Can be affected by change in policy or
personnel over time
Pay special attention to consistency of:
◦ How metadata elements are used
◦ How (and which) vocabularies are used for a
particular element
◦ Syntax encoding schemes
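Dates are a common place for syntax encoding schemes to drift: the record at the start of this presentation gives the same date as both 1926/07/06 and 1926-07-06. A minimal Python sketch, assuming the collection's dates arrive in only the three patterns listed, normalizes them to the YYYY-MM-DD form used by the W3CDTF profile of ISO 8601.

```python
from datetime import datetime
from typing import Optional

# Assumed input patterns for illustration; list the ones your data actually uses.
# "%m-%d-%Y" assumes month-first, so 07-06-1926 would be read as July 6.
KNOWN_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%m-%d-%Y")


def normalize_date(value: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD (W3CDTF), or None if no known pattern matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next pattern
    return None
```

Values that come back as None can be routed to a person for review rather than guessed at.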
14. Conformance to Standards
Technical conformance to all types of standards
is essential. Without it, processing tools and
routines simply break.
◦ Sharing protocols (e.g. OAI-PMH)
◦ Metadata structure standards
◦ Controlled vocabularies and syntax encoding
schemes
◦ Content standards
◦ Technical standards (e.g. XML, character encoding)
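Technical conformance is the easiest kind to test automatically. As a sketch, the Python below reports two basic failures for a serialized record: bytes that are not valid UTF-8, and XML that is not well-formed. It assumes records are meant to be UTF-8, and it does not validate against an XML Schema (Python's standard library has no schema validator), so a schema validator and protocol conformance tools would still be needed.

```python
import xml.etree.ElementTree as ET


def conformance_problems(raw: bytes) -> list[str]:
    """List basic technical conformance problems for one serialized record.

    Assumes records are supposed to be UTF-8 encoded.
    """
    problems = []
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (first bad byte at offset {exc.start})")
    try:
        ET.fromstring(raw)  # well-formedness only, no schema validation
    except ET.ParseError as exc:
        problems.append(f"not well-formed XML: {exc}")
    return problems
```

An empty list means the record passed these two checks, not that it conforms to every applicable standard.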
15. Generic high-level workflow
Plan
◦ Write metadata creation guidelines
◦ Choose standards for native metadata
◦ Who to share with?
◦ Choose shared metadata formats
Create
◦ Create metadata (thinking about shareability)
Transform
◦ Perform conceptual mapping
◦ Perform technical mapping
◦ Validate transformed metadata
◦ Test shared metadata with protocol conformance tools
Share
◦ Implement sharing protocol
Assess
◦ Communicate with aggregators
◦ See who is collecting your metadata
◦ Review your metadata in aggregations
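The conceptual mapping step in this workflow amounts to a crosswalk: a table stating where each local field goes in the shared format. A minimal Python sketch follows. The local field names are hypothetical, the target is unqualified Dublin Core, and raising an error on unmapped fields (rather than silently dropping them) is one design option among several. The IU workflow described later performs the technical mapping with XSLT; Python is used here only to keep the sketch short.

```python
# Hypothetical local field names mapped to unqualified Dublin Core elements.
# None marks fields that only matter locally and stay out of the shared view.
CROSSWALK = {
    "title": "title",
    "collector": "creator",
    "collection_date": "date",
    "species": "subject",
    "common_name": "subject",
    "shelf_location": None,
}


def map_record(local: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Apply the crosswalk, returning (dc_element, value) pairs.

    Raises KeyError for any local field the crosswalk does not cover, so gaps
    in the conceptual mapping surface immediately instead of losing data.
    """
    shared = []
    for field, values in local.items():
        if field not in CROSSWALK:
            raise KeyError(f"no mapping defined for local field {field!r}")
        target = CROSSWALK[field]
        if target is None:
            continue  # local-only information is excluded from the shared record
        shared.extend((target, v) for v in values if v.strip())  # skip empty values
    return shared
```

Keeping the crosswalk as data, separate from the code that applies it, makes it easy to review with metadata specialists and to reuse across collections in the same category of material.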
16. No single “right” workflow exists for all situations
Our tools sometimes dictate parts of our
workflow
◦ Be careful not to let them do this too much; tools
serve us, not vice versa
Start workflow design from well-defined goals
(not processes)
Fundamental principles to follow
◦ Put the right information in front of the right person
at the right time
◦ Ensure shareability is a common theme underlying it
all
◦ Generate multiple views from a single master
17. Choose the best tools for the job
Important every step of the way
◦ Programming languages
◦ Commercial or open-source software
packages
◦ Repository solutions
◦ Metadata creation interfaces
Promotes both efficiency and quality
Define needed functionality, and
negotiate (compromise) from there
18. Thinking big picture
Must find a reasonable balance between
the perfect solution for a single set of
materials and fully streamlined processes
that treat everything the same way
One approach - define categories of
material and design reusable workflows
for each
19. Defining categories of material
By resource type
◦ Text
◦ Documentary images
◦ Art images
◦ Musical audio recordings
◦ etc. (including more specific categories)
By managing institution?
◦ May create barriers for our users; see
Elings/Waibel: “Metadata for All” article in First
Monday, 2007
◦ But institutional mission is a factor in determining the
appropriate views of a resource to share
20. Reusable parts of workflow
Decisions on metadata structure
standards, content standards, controlled
vocabularies, etc.
Metadata creation tools
Automated processing techniques
XSLT stylesheets and other data
management code
SIP/AIP/DIP architecture
Delivery systems
21. Generalization is worth the effort
You will have to go back and do it again at some
point
◦ Fixing typos, errors, etc.
◦ Adding new content over time
◦ Adding new metadata format or sharing mechanism
◦ Migration to another system
Need both workflow tools and documentation to
be accessible
Generalization will allow you to minimize the
effort spent redoing work and focus more on
new material
22. Make the most of automation
Automate the repetitive tasks as much as
feasible, but only where it makes sense
For example:
◦ Create as much technical metadata as possible from
the file itself
◦ Derive basic structural metadata from filenaming
conventions
◦ Develop automated processes that are triggered
when an XML file is placed in a “drop box” or
submitted via a specialized tool
◦ Develop easy-to-use tools to apply the same
metadata to a defined group of records
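Deriving structural metadata from file names only works if the naming convention is strict enough to parse. The Python below is a sketch under an assumed convention, `<collection>_v<volume>_p<page>.tif` or `.jpg` (hypothetical, not a documented IU practice), that recovers volume and page numbers and puts scanned files into reading order. Files that don't match raise an error instead of being guessed at.

```python
import re
from typing import NamedTuple

# Hypothetical naming convention, e.g. "langlois_v01_p0003.tif".
FILENAME_PATTERN = re.compile(
    r"^(?P<collection>[a-z0-9]+)_v(?P<volume>\d+)_p(?P<page>\d+)\.(?:tif|jpg)$"
)


class PageInfo(NamedTuple):
    collection: str
    volume: int
    page: int


def structure_from_filename(name: str) -> PageInfo:
    """Parse collection, volume, and page number out of one file name."""
    m = FILENAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    return PageInfo(m["collection"], int(m["volume"]), int(m["page"]))


def order_pages(names: list[str]) -> list[str]:
    """Sort files into reading order by (volume, page), compared as numbers."""
    return sorted(names, key=lambda n: structure_from_filename(n)[1:])
```

Comparing volume and page as integers keeps the ordering correct even if zero-padding is applied inconsistently.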
23. Basic workflow at IU (1)
Metadata standards chosen
Metadata creation guidelines written and
tools developed/adapted
Fedora content model developed or
existing appropriate one identified
Metadata/markup created (and perhaps
digitization performed)
◦ Sometimes in phases by different people
24. Basic workflow at IU (2)
Metadata transformed via XSLT (one per
category of material, with some tweaking for
each collection) into all desired formats, and
loaded into Fedora
Metadata for sharing loaded into OAI-PMH data
provider
Appropriate staff alerted for parallel metadata
creation for OPAC (generally collection level)
Note several opportunities for greater
efficiency
25. One step at a time
Implementing shareable metadata practices
likely will be done incrementally
We’re still learning how to best achieve
effective shareability
Best practices grow and change over time
Must be positioned to respond quickly to
new metadata standards and technologies as
they evolve
26. Shareable metadata isn’t just about OAI-PMH
Some other options:
◦ Lightweight APIs (e.g., OpenLibrary)
◦ Google SiteMaps
◦ OpenURL
◦ SRU
◦ OAI-ORE
◦ Linked data
Jim Michalko, RLG: library data sharing
mechanisms are “high value and low
participation”
Notice Z39.50 isn’t on this list.
27. Promoting new uses
Aggregations of metadata (and/or content) built by academic
institutions seem to have plateaued
◦ See Ricky Erway RLG report “Seeking Sustainability”
We must provide a variety of options for
accessing our data, to support a variety of uses
We shouldn’t necessarily stop collaboration and
aggregation, but we should allow others to do
this too, with our metadata (and maybe even our
content)
28. Terminology services
Sharing our authority data is potentially even
more useful than sharing our descriptive data
RLG/OCLC doing some work in this area
◦ Moving terminologies to the “network level”
Some possible uses
◦ Give me more information on this
concept/person/etc.
◦ What are this term’s broader, narrower, related
terms?
◦ What are all the synonyms for this term?
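The broader, narrower, and synonym questions above map onto simple lookups. The sketch below answers them against a tiny in-memory vocabulary built from species in the sample record; a real terminology service would answer over the network from a shared source, and these names and data structures are hypothetical. Narrower terms are derived from the broader links so the two directions cannot disagree.

```python
from collections import defaultdict
from typing import Optional

# Tiny hypothetical vocabulary fragment drawn from the sample record's subjects.
BROADER = {"Cottus bairdi": "Cottus", "Cottus cognatus": "Cottus"}
SYNONYMS = {
    "Cottus bairdi": {"mottled sculpin"},
    "Cottus cognatus": {"slimy sculpin"},
}

# Derive narrower links from broader ones so both directions stay consistent.
NARROWER = defaultdict(set)
for term, parent in BROADER.items():
    NARROWER[parent].add(term)


def broader(term: str) -> Optional[str]:
    return BROADER.get(term)


def narrower(term: str) -> list[str]:
    return sorted(NARROWER.get(term, set()))  # .get avoids adding empty keys


def synonyms(term: str) -> list[str]:
    return sorted(SYNONYMS.get(term, set()))
```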
29. Tools supporting the creation of shareable metadata
Our existing metadata creation tools are
embarrassingly bad
Current technologies provide many
opportunities for improvement
Good tools make it easy to do the right
thing and hard to do the wrong thing
Can operate when metadata is first
created or in a later review step
Here are some ideas…
30. Directly in XML
Generally only a good idea for markup
languages, rather than metadata structure
standards
◦ And often not even then
Some supplemental tools can help
◦ Validation to Schema/DTD (of course)
◦ “Preview” function
◦ “Report card” function, e.g., with Schematron
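Schematron expresses a report card as assertions over XML. The same pattern can be sketched in Python as a list of rule functions, each returning a message when a record fails. This is a stand-in to show the idea, not Schematron itself, and the two rules (a required dc:title, and a flag on values that look packed) are hypothetical examples.

```python
import xml.etree.ElementTree as ET
from typing import Optional

DC = "{http://purl.org/dc/elements/1.1/}"


# Hypothetical example rules: each returns a failure message, or None if the record passes.
def has_title(root: ET.Element) -> Optional[str]:
    return None if root.find(f"{DC}title") is not None else "missing dc:title"


def no_packed_values(root: ET.Element, limit: int = 3) -> Optional[str]:
    # More than `limit` semicolons in one value suggests several values packed together.
    for el in root:
        if (el.text or "").count(";") > limit:
            return f"{el.tag.replace(DC, 'dc:')} looks packed with multiple values"
    return None


RULES = (has_title, no_packed_values)


def report_card(record_xml: str) -> list[str]:
    """Run every rule and collect the failure messages (empty list = passes)."""
    root = ET.fromstring(record_xml)
    return [msg for rule in RULES if (msg := rule(root)) is not None]
```

New checks are added by writing another rule function and listing it in RULES.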
31. Modularize
All metadata for a resource doesn’t have to be
created at once
◦ Transcription vs. authority work vs. subject analysis
◦ Descriptive vs. technical vs. structural
◦ Us vs. users!
Provide optimized views for each metadata
creation function
◦ Perhaps even different systems
◦ But always provide metadata creators with a way to
see how the metadata will be used
32. Abandon the record-centric approach
Patterns (and outliers) emerge from data
in the aggregate
Reporting capabilities
◦ Sortable, deduplicated lists of values from a
given field or set of fields
◦ How many occurrences of this field per record
◦ How many distinct values used in this field
◦ Data overlap between fields
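Each report in the list above takes only a few lines once records are loaded. The sketch assumes each record is a Python dict from field name to a list of values, a hypothetical in-memory shape rather than any particular system's export format. The number of distinct values in a field is simply the length of `value_list`.

```python
from collections import Counter

# Assumed in-memory shape: each record maps a field name to its repeated values.
Record = dict[str, list[str]]


def value_list(records: list[Record], field: str) -> list[tuple[str, int]]:
    """Sorted, deduplicated values of one field, each with its frequency."""
    counts = Counter(v for r in records for v in r.get(field, []))
    return sorted(counts.items())


def occurrences_per_record(records: list[Record], field: str) -> Counter:
    """Distribution of how many times the field occurs per record."""
    return Counter(len(r.get(field, [])) for r in records)


def field_overlap(records: list[Record], a: str, b: str) -> list[str]:
    """Values found in both fields, often a sign of values landing in the wrong element."""
    values_a = {v for r in records for v in r.get(a, [])}
    values_b = {v for r in records for v in r.get(b, [])}
    return sorted(values_a & values_b)
```

Outliers such as a record with zero subjects, or a common name appearing in both subject and description, stand out in these aggregate views in a way they never do record by record.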
33. Useful features
Data type validation (while entering data
in that field!)
Auto-complete
Record-level validation
Spell check
Integration of metadata creation
guidelines into software tools
34. Integration of controlled vocabularies
Should be seamless
Provide access to entire authority record rather
than just the heading
For short vocabularies, provide a combo box
For longer vocabularies
◦ Auto-complete
◦ Ajax-y interactions with hierarchical and alphabetical
views
Similar features could be used to perform
maintenance of vocabularies
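Auto-complete over a controlled vocabulary can be as simple as a prefix search on a sorted list. The Python sketch below uses binary search from the standard `bisect` module so lookups stay fast as the vocabulary grows; the term list is a hypothetical sample, and case-insensitive matching is an assumption.

```python
import bisect
from typing import Callable, Iterable


def make_completer(terms: Iterable[str]) -> Callable[..., list[str]]:
    """Build a prefix lookup over a controlled vocabulary.

    Matching is case-insensitive by assumption; adjust if your vocabulary
    distinguishes terms by case.
    """
    entries = sorted((t.casefold(), t) for t in terms)
    keys = [k for k, _ in entries]

    def complete(prefix: str, limit: int = 10) -> list[str]:
        p = prefix.casefold()
        i = bisect.bisect_left(keys, p)  # first key >= prefix; matches are contiguous from here
        matches = []
        while i < len(keys) and keys[i].startswith(p) and len(matches) < limit:
            matches.append(entries[i][1])
            i += 1
        return matches

    return complete


# Hypothetical sample vocabulary of common names from the sample record.
complete = make_completer(
    ["northern pike", "pearl dace", "blacknose dace", "bluntnose minnow", "brown trout"]
)
```

To offer the entire authority record rather than just the heading, the stored term string could be replaced by a record object keyed on the same normalized heading.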
35. Working around system limitations
Many digital asset management systems don’t
support a second shareable copy of records
Do your best to split the difference with
system records
Use creative interface design for your local
system
Use extra-protocol documentation for
communicating with aggregators
Lobby your vendor!
36. Good practice requires collaboration
One person can’t do it all
Implementing shareable metadata
requires a primary advocate to ensure
shareability is a consideration at all steps
of the workflow
Many people will need to be involved
37. Role of metadata specialists
Often serve as the shareable metadata
advocate
Choose standards and sharing protocols
Write metadata creation guidelines
Be prepared to compromise!
38. Role of technical staff
Evaluate feasibility of technical plans
Help with prioritization of options
Locate and evaluate existing code to
minimize duplication of effort
Abstract specific processes for general
use
40. Final thoughts about sharing
Shareable metadata represents a fundamental
shift in thinking
◦ Your metadata is no longer a destination, it is
information that will serve as building blocks for
other services
◦ Your metadata must operate effectively in an
increasingly decontextualized environment
Creating shareable metadata
◦ Will require more work on your part
◦ Will require our software to support (more)
standards
◦ Is no longer an option, it’s a requirement
41. Yes, this is hard…
…and we’re just starting to learn how to
do it effectively and efficiently
There’s plenty of room for leadership in this
area.
Editor's Notes
90 mins total. Will probably start late. Leave ample time for questions. Talk ca. 45 mins.
As libraries, archives, and museums streamline metadata creation and simultaneously improve its breadth, depth, and quality, we also seek methods to support new and innovative user services by making this metadata more widely available to networked processes. “Shareable metadata” is metadata optimized for use in multiple aggregated environments instead of a single tool. As such, it is a critical component of the re-imagining of cultural heritage institutions.
In her presentation, Ms. Riley will introduce the concept of shareable metadata, describe the role of various classes of staff members in its planning and creation, demonstrate the types of tools that can aid in its creation and dissemination, and discuss workflows that can be used throughout the entire metadata lifecycle to speed its creation and improve its quality and accessibility.