2. Tutorial Roadmap
Principles of Metadata
Dublin Core Metadata Basics
The Dublin Core Abstract Model
Syntax Alternatives for DC Metadata
Mixing and Matching Metadata
History and workings of the Dublin Core Metadata
Initiative
Acknowledgements: I have borrowed liberally from tutorial
slide sets by Tom Baker, Diane Hillmann, Andy Powell,
and Marty Kurth, available at dublincore.org
3. Basic Principles of Metadata
The Web as an information system
The Internet Commons
Interoperability is key
MARC lives
The varieties of metadata
Modularity
Some Challenges
4. State of the Web as an Information System
Search systems are motivated by business
models, not functionality
Index coverage is broad, but unpredictable
Too much recall, too little precision
Index spam abounds
Resources (and their names) are volatile
What about versions, editions, back issues?
Archiving is presently unsolved
Authority and quality of service are spotty
Managing Intellectual Property Rights is difficult
5. Metadata:
Part of a Solution
Structured data about other data
• helps to impose order on chaos
• enables automated discovery/manipulation
Full Text Web indexing is the dominant idiom for
search
Metadata is more useful in structured collections,
used in combination with applications designed to
take advantage of structured descriptions
6. Internet Commons includes Multiple Communities
[Diagram: communities surrounding the Internet Commons]
• Scientific Data
• Home Pages
• Geo
• Library
• Museums
• Commerce
• Whatever...
7. Interoperability
requires conventions about:
Semantics
• The meaning of the elements
Structure
• human-readable
• machine-parseable
Syntax
• grammars to convey semantics and
structure
8. Haven’t we done metadata already?
The MARC family of standards is
the single most successful resource
description standard in the world
9. MARC Cataloging…
Is really MARC-AACR2 cataloging
• MARC is the communications format
• AACR2 (Anglo-American Cataloging Rules)
defines the cataloging rules (semantics)
MARC and AACR2 are evolving
• Closer alignment with XML as a syntax option
• RDA is an effort to modernize AACR2 and
align it with networked environments
RDA and Dublin Core are cooperating on
alignment of a common underlying data model.
10. What’s wrong with
this model on the Web?
Expensive
• Complex
• Professional Catalogers required
Bias towards bibliographic artifacts
• Fixed resources
• Incomplete handling of resource evolution and
other resource relationships
Anglo-centric
• MARC 21 accounts for ¾ of MARC records, but
there are many other varieties
11. Metadata Takes Many Forms
• resource discovery
• document administration
• rights management
• content rating
• security and authentication
• archival status
• products and services
• database schemas
• process control or description
12. Warwick Framework:
Modular Metadata
Conceptual Architecture for metadata from the
Warwick Metadata Workshop (DC-2)
Conceptual architecture to support the
specification, collection, encoding, and exchange
of modular metadata
Provide context for metadata efforts (including
Dublin Core)
• avoids the “black-hole” of comprehensive
element sets
• focuses interoperability issues at package level
A conceptual framework, NOT an application
13. Modularity and Extensibility:
the Lego metaphor
DC is a beginning, not an end
An architecture for modular, extensible
metadata
The simplest common denominator
• Add stuff you need for
• Local requirements
• Domain specific functionality
• Other dimensions of description
• E.g., cloud cover… management… structural
metadata…
14. Descriptive Metadata Standards
IEEE LOM (Learning Object Metadata)
• Descriptive and structural metadata to support
instructional systems
ONIX (Online Information Exchange) – bookseller
metadata
FGDC – Federal Geographic Data Committee: rich
descriptive and structural metadata for GIS applications
Encoded Archival Description – description of archival
collections
MPEG Multimedia Metadata – large, complicated, still in
progress – descriptive, structural, rights management
Dublin Core – core descriptive metadata
15. Metadata Creation
Metadata is expensive and error prone
• Creating one MARC record at the Library of Congress
costs about $100 USD
• Competes with indexing at… $ 00.001 ???
Capture it as close to point of creation as possible
Capture as much automatically as possible
Should be designed with close attention to the
functional requirements it serves
Re-use existing standards whenever possible
Always tension between completeness of
description, intended purpose, and cost
16. Metadata Challenges
Accommodate multiple varieties of metadata
Tension: functionality and simplicity
Tension: extensibility and interoperability
Human and machine creation and use
Community-specific functionality, creation,
administration, access work at cross purposes to
global interoperability
18. Dublin Core Basics
Design Philosophy – useful metaphors
Language and pidgins
Characteristics of DC metadata
The simple bucket (properties)
Resource Types
Metadata grammar
Dublin Core Principles
One-to-one
Dumb-down rule
Context appropriate values
Translations
19. Dublin Core:
Starting Assumptions and Essential Features
Simple
• true to a point: the elements are simple, the
underlying model is not
Consensus-based
• Crucial to early success, both in attracting
expertise and deployment. Bottom up
Based on the experience of practitioners, but
hard to capture and capitalize on lessons learned
Cross-disciplinary and International
• Central success factor
20. Essential Features (continued)
The Web is the strategic application
• On the mark
International
• Also central success factor, but hard (20
languages in the Registry)
Lego-like modularity & extensibility
• Partially realized promise
• Application Profiles are the means
Syntax independence
• An ongoing nightmare (HTML…XML…RDF/XML)
Authors will describe their own works
• Laughably naïve
21. A Pidgin for Digital Tourists
Metadata is language
Dublin Core is a small and simple language -- a
pidgin -- for finding resources across domains
Speakers of different languages naturally
"pidginize" to communicate
• E.g., tourists using simple phrases to order
beer ("zwei Bier bitte" "dva pivo" "biru o san
bai"...)
We are all "tourists" on the Internet.
22. A Grammar of
Dublin Core
By design not as rich as mother tongues, but
easy to learn and useful in practice
Pidgins: small vocabularies (Dublin Core:
fifteen special nouns and lots of optional
adjectives)
Simple grammars: sentences (statements)
follow a simple fixed pattern...
http://www.dlib.org/dlib/october00/baker/10baker.html
23. Basic Structures in Dublin Core Metadata
The basic unit of metadata is a statement:
• Statements consist of a property (a metadata element)
and a value
• Metadata statements describe resources
• More about the Dublin Core Abstract model later
[Diagram: a statement about a resource pairs a property with a value]
24. What are the properties and values in the
following metadata statements?
245 00 $a Amores perros $h [videorecording]
<title> Nueve reinas </title>
<type> MovingImage </type>
• Different models for conveying related information
• Dublin Core syntax fits in more naturally with the structure
of the Web
26. The fifteen elements (properties)
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier
27. Varieties of qualifiers:
Element Refinements
Make the meaning of an element narrower or
more specific.
• a Date Created versus a Date Modified
• an IsReplacedBy Relation versus a Replaces
Relation
If your software does not understand the
qualifier, you can safely ignore it.
28. Varieties of Qualifiers:
Value Encoding Schemes
Says that the value is
• a term from a controlled vocabulary (e.g.,
Library of Congress Subject Headings)
• a string formatted in a standard way (e.g.,
"2001-05-02" means May 2, not February 5)
Even if a scheme is not known by software, the
value should be "appropriate" and usable for
resource discovery.
29. Resource has Date "2000-06-13"
Resource has Subject "Languages -- Grammar"
30. Dumb-Down Principle for Qualifiers
Simple DC does not use element refinements or
encoding schemes – statements contain only
value strings
Qualified DC uses features of the DCMI Abstract
Model, including element refinements and
encoding schemes
Dumbing-down is translating Qualified DC to
simple DC
Qualifiers refine meaning (but may be harder to
understand)
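A minimal sketch of dumbing-down, assuming a hypothetical REFINES table that maps each element refinement to its parent element. The tuple layout and the name dumb_down are illustrative only and are not taken from any DCMI specification.

```python
# Hypothetical lookup table: element refinement -> the Simple DC element it refines.
REFINES = {
    "created": "date",
    "modified": "date",
    "isReplacedBy": "relation",
    "replaces": "relation",
}

def dumb_down(statements):
    """Translate qualified statements to Simple DC.

    Each input statement is (property, value_string, encoding_scheme).
    A refinement falls back to its parent element and the encoding
    scheme is dropped, leaving only (element, value_string) pairs.
    """
    simple = []
    for prop, value, _scheme in statements:
        element = REFINES.get(prop, prop)  # names not in the table pass through
        simple.append((element, value))
    return simple

qualified = [
    ("created", "2000-06-13", "W3CDTF"),
    ("subject", "Languages -- Grammar", "LCSH"),
]
print(dumb_down(qualified))
```

The value strings survive unchanged, which is why they should remain usable for discovery even when the qualifiers are ignored.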
31. The One to One Principle
Each resource should have one metadata
description
• For example, do not describe a digital image of
the Mona Lisa as if it were the original painting
Group related descriptions into description sets
• Describe an artist and his or her work
separately, not in a single description
32. Appropriate Values
There are generally tradeoffs between local
requirements and global requirements
Use elements and qualifiers to meet the needs of
your local context, but…
Keep in mind that machines and people use and
interpret metadata, so…
Consider whether the values used will help
discovery outside your local context
33. Dublin Core as a multilingual metadata
language
Dublin Core has been translated into 20+
languages
• machine-readable tokens are shared by all
• human-readable labels are defined in different
languages
• translations are distributed, maintained in
many countries
• eventually linked in DCMI registry
35. One token –
labels in many languages
dc:creator (the shared machine-readable token)
• label “Verfasser” [Server in Germany]
• label “Creator” [DCMI Server]
• label “Pencipta” [Server in Jakarta]
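The one-token, many-labels idea can be sketched as a lookup table. This is an illustration only: the LABELS dictionary, the language codes chosen for each label, and the function label_for are assumptions, not the DCMI registry's actual interface.

```python
# Hypothetical in-memory stand-in for labels that a registry would serve.
LABELS = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}

def label_for(token, lang, fallback="en"):
    """Return the human-readable label for a token in the requested language."""
    labels = LABELS[token]
    # The token never changes; only the display label depends on language.
    return labels.get(lang, labels[fallback])

print(label_for("dc:creator", "de"))
print(label_for("dc:creator", "fi"))  # no Finnish label here, so the fallback is used
```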
36. Metadata languages are "multilingual"
Metadata is not a spoken language
The words of metadata -- "elements" --
are symbols that stand for concepts
expressible in multiple natural languages
Standards may have dozens of
translations
Are concepts like "title", "author", or
"subject" used the same way in English,
Finnish, and Korean?
37. DCMI Open Metadata Registry
Managing vocabularies defined by the DCMI
• Languages
• Versioning
• Controlled vocabularies
Foundation for modular, incremental
integration and evolution
The Registry working group is a Dublin Core
Community with participants around the world
38. The Dublin Core Abstract Model
Terminology
Simple versus Qualified DC
Resources
Descriptions
Description sets
Value Strings
Element refinements
Encoding Schemes
Graphical representation of the Abstract Model
Summary of general ideas
39. Important DCMI Documents concerning
the Abstract Model and Syntax alternatives
DCMI Abstract Model
http://dublincore.org/documents/abstract-model/
Expressing Dublin Core in HTML/XHTML meta and
link elements
http://dublincore.org/documents/dcq-html/
Expressing Dublin Core metadata using the Resource
Description Framework (RDF)
http://dublincore.org/documents/dc-rdf/
Expressing Dublin Core metadata using XML
http://dublincore.org/documents/dc-xml/
40. Simple versus Qualified DC
Simple DC supports single descriptions using the
15 base elements and value strings
Qualified DC supports the richer features of the
Abstract Model, and allows the use of all DCMI
terms as well as other, non-DCMI terms.
An application profile is used to specify a
metadata application that includes DCMI terms in
combination with non-DCMI terms (mix & match
metadata).
41. The DCMI Abstract Model
A data model for Dublin Core
Agreed upon underlying structure for metadata
statements
Many years in the making -- long term contention
Describes the structure of statements about
resources that we make in our metadata
language:
[Diagram: a statement about a resource pairs a property with a value]
42. What is a resource?
W3C definition:
• “anything that has identity… electronic document,
an image, a service”
• “not all resources are network retrievable; e.g.
human beings, corporations, and bound books can
also be considered resources”
In other words, a resource is anything we can
identify:
• Physical things (books, people, airplanes….)
• Digital things (Images, web pages, services….)
• Concepts (colors, subjects, eras, places)
In the DC context, the DCMI Type list describes the
stuff we describe with DC metadata
43. Resource types for which DC is often used
DCMI TYPE Vocabulary:
• Collection
• Dataset
• Event
• Image
• Interactive Resource
• Moving Image
• Physical Object
• Service
• Software
• Sound
• Still Image
• Text
44. Abstract Model: Descriptions
A description is composed of:
• One or more statements about a single
resource
• Optionally, the URI of the resource being
described
Each statement is made up of
• A property URI (that identifies a property)
• A value URI (that identifies a value) and/or
one or more representations of the value (a
value string)
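One way to picture the structure above is as a pair of small record types. This is a sketch under assumptions: the class and field names are invented for illustration, the example resource URI is hypothetical, and the Abstract Model itself defines no programming interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Statement:
    property_uri: str
    value_uri: Optional[str] = None
    value_strings: List[str] = field(default_factory=list)

    def __post_init__(self):
        # A statement carries a value URI and/or at least one value string.
        if self.value_uri is None and not self.value_strings:
            raise ValueError("statement has no value")

@dataclass
class Description:
    statements: List[Statement]
    resource_uri: Optional[str] = None  # the resource URI is optional

    def __post_init__(self):
        if not self.statements:
            raise ValueError("a description needs one or more statements")

desc = Description(
    resource_uri="http://example.org/tutorial",  # hypothetical resource
    statements=[
        Statement(
            "http://purl.org/dc/elements/1.1/title",
            value_strings=["DC Metadata Tutorial"],
        ),
    ],
)
print(desc.statements[0].property_uri)
```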
45. Terminology: Value Strings
A value string is a human-readable string that
represents the value of the property
Each value string may have an associated value
string language that is an ISO language tag (e.g.,
pt-BR)
46. Terminology: Element Refinements
Elements are the same as properties
Element refinements are the same as sub-
properties
An element refinement is a special case of an
element that shares the meaning of its ‘parent’,
but has narrower semantics
Paulo is illustrator of a book, therefore he is also
a contributor to the book
Illustrator is an element refinement of
contributor
47. Terminology: Encoding Schemes
Values and value strings can be ‘qualified’ by
encoding schemes in order to clarify their
meaning
• A Vocabulary Encoding Scheme is used to
indicate a terminology set from which a value
is taken:
Stem cells—Research is a value from LCSH
616.02774 is a value from DDC-22
• A syntax encoding scheme is used to indicate
the structure of a value string
2004-10-12 is structured according to the
W3CDTF rules for date encoding
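A syntax encoding scheme tells software how to read a value string. The sketch below handles only the complete-date form (YYYY-MM-DD); W3CDTF also permits other granularities that this code does not attempt. The function name parse_w3cdtf_date is an assumption made for illustration.

```python
from datetime import date

def parse_w3cdtf_date(value_string):
    """Parse the complete-date form (YYYY-MM-DD) of a W3CDTF value string."""
    return date.fromisoformat(value_string)

d = parse_w3cdtf_date("2004-10-12")
# The fields are read in the order year, month, day.
print(d.year, d.month, d.day)
```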
48. Terminology: Description Sets
The 1:1 principle dictates that each description
describes one, and only one, resource
We often need to describe grouped sets of
descriptions, which are known in the abstract
model as description sets
• An article and its authors
• A painting and its artist
When description sets are exchanged between
software applications, they are generally encoded
according to a particular syntax in a metadata
record
49. Abstract Model summary (after Andy Powell)
[Diagram: a record (encoded as HTML, XML, or RDF/XML) contains a
description set; the description set contains descriptions, each with an
optional resource URI; each description contains statements; each
statement has a property (URI), a value URI with a vocabulary encoding
scheme, and a value string with a language (e.g., pt-BR) and a syntax
encoding scheme]
50. General Ideas
DC is not just the 15 elements, though they
comprise the foundation for simple DC
50+ properties (elements) have been approved
by DCMI
The model supports local declarations of
additional properties
The model supports application profiles (mixing
DC elements with those of other sets)
The model allows the grouping of descriptions to
create more complex description entities
52. Syntax Alternatives
HTML… XML… RDF/XML
Three Web-based models for deploying metadata
Each has advantages and disadvantages
What is ‘best’ depends on local constraints
• What is the objective of the system? How do
these syntax alternatives support local
functional requirements?
• Are there services and software to ‘consume’
the metadata created?
• Are trained practitioners available to create
and support the systems?
53. Syntax Alternatives: HTML
Advantages:
• Simple – META tags embedded in content
• Widely deployed tools and knowledge
• Resource carries its metadata around with it
• Metadata is openly harvestable
54. Syntax Alternatives: HTML (continued)
Disadvantages
• Limited structural richness (does not support
hierarchical, tree-structured data)
• Management of metadata is less reliable (the
metadata is out in the wild)
Describe one thing (the HTML document) and no
more!
55. Dublin Core in HTML (example)
<head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
<meta name="DC.title"
content="DC Metadata Tutorial">
<meta name="DC.creator"
content="Stuart L. Weibel">
<meta name="DC.subject" lang="en-US"
content="Metadata">
<meta name="DC.date" scheme="DCTERMS.W3CDTF"
content="2007-07-08">
<meta name="DCTERMS.audience"
content="technical librarians">
</head>
<body>
… [ rest of html document ]
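Because embedded metadata is openly harvestable, a consumer can pull the statements back out with a standard HTML parser. The following is a minimal sketch using Python's html.parser; the class name DCMetaParser and the SAMPLE page are made up for illustration, and a real harvester would also need to handle the scheme and language attributes.

```python
from html.parser import HTMLParser

class DCMetaParser(HTMLParser):
    """Collect (name, content) pairs from DC and DCTERMS meta tags."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        # Keep only names that use the DC. or DCTERMS. prefixes.
        if name.upper().startswith(("DC.", "DCTERMS.")):
            self.statements.append((name, a.get("content")))

SAMPLE = """<head>
<meta name="DC.title" content="DC Metadata Tutorial">
<meta name="DC.creator" content="Stuart L. Weibel">
<meta name="viewport" content="width=device-width">
</head>"""

parser = DCMetaParser()
parser.feed(SAMPLE)
print(parser.statements)
```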
56. The namespaces for HTML encoding
All DCMI terms (elements, element refinements,
and encoding schemes) are found in:
DCMI Metadata Terms
http://dublincore.org/documents/dcmi-terms/
The namespaces are a result of historical
developments
• DC: [original elements]
• DCTERMS: [later elements]
57. Syntax Alternatives: XML
XML = eXtensible Markup Language
The standard for networked text and data
Wide-spread tool support
• Parsers are widely available
• Extensibility (XML namespaces)
• Type definitions (XML Schema)
• Transformation and Rendering (XSLT)
• Rich linking semantics (XLINK)
58. XML Schema
Rich XML-based language for expressing data-
type semantics
Replaces arcane and limited DTD (origin in SGML)
Facilities:
• Data typing (both complex and primitive)
• Constraints (ranges, cardinality…)
• Defaults (specify defaults for certain
properties)
59. Dublin Core fragment in XML
<metadata
xmlns:dc="http://www.openarchives.org/OAI/dc.xsd">
<dc:creator>Carl Lagoze</dc:creator>
<dc:title>Accommodating Simplicity and
Complexity in Metadata</dc:title>
<dc:date>2000-07-01</dc:date>
<dc:publisher>Cornell University,
Computer Science</dc:publisher>
</metadata>
Where is the rest of the stuff? In the schema!
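Reading such a fragment takes only a namespace-aware parser. Here is a sketch with Python's xml.etree.ElementTree; the namespace URI is copied from the slide, while the variable names and the shortened FRAGMENT are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://www.openarchives.org/OAI/dc.xsd"  # namespace URI as given on the slide

FRAGMENT = f"""<metadata xmlns:dc="{DC_NS}">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
</metadata>"""

root = ET.fromstring(FRAGMENT)
statements = []
for child in root:
    # ElementTree reports qualified names as "{namespace}localname".
    ns, _, local = child.tag[1:].partition("}")
    if ns == DC_NS:
        statements.append((local, child.text.strip()))
print(statements)
```

Each child element becomes one property/value pair, which matches the statement structure described earlier.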
60. Case Study: OAI-PMH
OAI Protocol for Metadata Harvesting
Open Archives Initiative
http://www.openarchives.org
• Simple Protocol for sharing metadata records
Based on HTTP, XML, XML Schema, and XML
namespaces
Allows a harvester to query a remote repository
for some or all of its metadata records
DC is the default native metadata format in the
OAI protocol
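A harvest request is an ordinary HTTP GET with a few query parameters. The sketch below only builds the URL for a ListRecords request asking for Dublin Core (metadataPrefix oai_dc); the repository address is hypothetical, and the helper name list_records_url is an assumption. It does not send the request or handle resumption tokens.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: harvest only one set
    return base_url + "?" + urlencode(params)

# Hypothetical repository base URL.
url = list_records_url("http://repository.example.org/oai")
print(url)
```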
61. Syntax Alternatives: RDF
RDF (Resource Description Framework)
Syntax expressed in XML
W3C recommendation for encoding metadata (a
semantic Web technology)
Enabling technology for richly-structured metadata
Rich data model (the DC Abstract Model is a
constrained version of RDF)
Metadata can be shared easily among independent
applications that understand RDF
W3C – Resource Description Framework (RDF)
http://www.w3.org/RDF/
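To show the triple structure underneath RDF, the sketch below writes simple DC statements as N-Triples lines rather than RDF/XML, since the line-based form is easier to read. The resource URI is hypothetical, the function name to_ntriples is an assumption, and the escaping covers only backslashes and double quotes, so it is not a complete serializer.

```python
DC = "http://purl.org/dc/elements/1.1/"

def to_ntriples(resource_uri, statements):
    """Serialize (element, value_string) pairs as N-Triples lines."""
    lines = []
    for element, value in statements:
        # Minimal escaping only: backslash first, then double quote.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{resource_uri}> <{DC}{element}> "{literal}" .')
    return "\n".join(lines)

triples = to_ntriples(
    "http://example.org/tutorial",  # hypothetical resource
    [("title", "DC Metadata Tutorial"), ("creator", "Stuart L. Weibel")],
)
print(triples)
```

Each line is one statement: the resource, the property URI, and the value.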
62. Summary: Syntax alternatives
Choices should be driven by local requirements
and objectives
• Available expertise
• Costs of Deployment
• Objectives and functional requirements
63. Association Models
Where do we keep the metadata?
Embedded
• HTML META tags, XML, or RDF/XML can be embedded
in the resource, and hence travel with the resource
• Simple, but limited in structural richness
Loosely coupled
• Shadow Files (like Adobe’s XMP Sidecar files)
• Requires a system to manage them and ensure that they stay
in sync
• RDF or XML descriptions
Third Party Metadata
• Stored in repositories such as library catalogs
• Easier to manage and maintain, and provide service
• Library catalogs, for example
65. Application Profiles:
Mixing and Matching Metadata
What is an Application Profile?
Why bother?
Creating new properties
Documenting and declaring new
properties
Some examples
66. Application Profiles: Mixing and Matching
Metadata
• The mixing and matching of elements
(properties) from separate metadata sets
• An expression of metadata modularity
• Implementers can benefit from peer applications
• Communities can harmonize their metadata,
picking complementary properties
• Promotes convergence over time
• For application profiles to work, there must be
public declarations of properties that conform to
a common data model (or nearly so)
67. Application Profile: Definition
Declaration of metadata properties used in a
given organization or application or community
Documentation of encodings, constraints, and
creation guidelines
Implies formal schemas (xml schemas or RDF
schemas)
Should promote both human understanding and
machine interoperability
The concept of application profiles applies to any
metadata community of practice, not just DC
DC has promoted their use and leads by example
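A very small application profile can be pictured as a table of declared properties plus constraints. This sketch is an assumption-laden illustration: the PROFILE layout, the "required" rule, the example.org property, and the check function are all invented here, and real profiles are documented with formal schemas and guidelines.

```python
# Declared properties: two Dublin Core elements plus one local term.
PROFILE = {
    "http://purl.org/dc/elements/1.1/title": {"required": True},
    "http://purl.org/dc/elements/1.1/creator": {"required": False},
    "http://example.org/terms/cloudCover": {"required": False},  # hypothetical local property
}

def check(record, profile=PROFILE):
    """Return a list of problems found when checking a record against a profile."""
    problems = []
    for prop in record:
        if prop not in profile:
            problems.append(f"undeclared property: {prop}")
    for prop, rule in profile.items():
        if rule["required"] and prop not in record:
            problems.append(f"missing required property: {prop}")
    return problems

record = {
    "http://purl.org/dc/elements/1.1/title": ["DC Metadata Tutorial"],
    "http://example.org/terms/cloudCover": ["10%"],
}
print(check(record))
```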
68. Why bother?
One-size-fits-all metadata results in bloated,
unmanageable specifications and applications
APs allow tailoring a given metadata application
to match the element set to specific functional
requirements based on local or community
needs, while retaining interoperability with a
larger metadata community
69. Creating an Application Profile
Find out what others have done… don’t re-invent
wheels!
Develop community consensus
Define the name, label, definition, and relationships of each
property (see the DCMI Usage Board guidelines)
Determine an appropriate URI (a home on the
Web)
Dublin Core Application Profile Guidelines
http://dublincore.org/usage/documents/profile-guidelines/
70. Document New Properties
At the very least: a Web page with relevant
information
Better: a web page with a public schema using
new terms in an application profile
Better still: all properties available as part of a
metadata registry
72. Some History of the Dublin Core
and
How the Initiative Works
• The Beginnings
• Landmarks
• Workshops and Conference series
• What the initiative does
• Standardization
• Some example applications
73. Dublin Core: The Beginning
A casual discussion at WWW-2 in Chicago,
October of 1994
• How to make things on the Web easier to find?
OCLC & NCSA co-sponsored an invitational
workshop in March of 1995
The workshop became a workshop series, and
eventually a conference series
DCMI: Dublin Core Metadata Initiative
• Governance and process evolved over time
• De facto standards maintenance body
74. Dublin Core Landmarks
1994: Simple tags to describe Web pages
1995: The Dublin Core is one of many
vocabularies needed ("Warwick
Framework")
1996: The Dublin Core: 13 elements
expanded to 15 - appropriate for Text and
Images
1997: WF needs formal expression in a
Resource Description Framework (RDF)
75. Dublin Core Landmarks (continued)
2000: Dublin Core Metadata Initiative
recommends qualifiers, broadens its
organizational scope beyond the Core
2001: Workshop Series becomes a
conference series
DCMI Affiliates and a board of trustees
2005: Abstract Model (Finally)
76. The Dublin Core Workshop Series
Workshop Venues:
US DC 1, 3, 6
UK DC 2
Australia DC 4
Finland DC 5
Germany DC 7
Canada DC 8
Conferences
Tokyo (2001) China (2004)
Florence (2002) Spain (2005)
Seattle (2003) Mexico (2006)
78. DCMI Activities
Standards development and maintenance
Metadata registry and infrastructure
Technical working groups and periodic
workshops
Tutorial materials and user guides
Education and training
Open source software
Liaisons with other standards or user
communities
79. Governance of DCMI
DCMI has a Board of Trustees that oversees the
operation and goals of the initiative
Managing Director
• Makx Dekkers
Director of Specifications and Documentation
• Tom Baker
An Advisory Board of metadata experts provides
guidance on metadata issues
80. The DCMI Usage Board
The Usage Board is an editorial committee that
evaluates proposals for new elements or revisions
International selection of metadata experts
Meet twice yearly
Documents decisions and updates DCTERMS
document
81. Affiliate Program
DCMI has National Affiliates which support the
Initiative and are represented on the Board of
Trustees
• Finland
• UK
• Singapore
• New Zealand
• Korea
OCLC has been the Host from the start
82. The Three I’s
Independent: DCMI is not controlled by specific
commercial or other interests and is not biased
towards specific domains nor does it mandate
specific technical solutions
International: DCMI encourages participation
from organizations anywhere in the world,
respecting linguistic and cultural differences
Influenceable: DCMI is an open organization
aiming at building consensus among the
participating organizations; there are no
prerequisites for participation
83. The Work gets done by Communities and
task groups
Accessibility Community
Collection Description Community
Education Community
Environment Community
Global Corporate Circle
Government Community
Kernel Community
Libraries Community
Localization and Internationalization Community
Preservation Community
Registry Community
Social Tagging Community
Standards Community
Tools Community
84. Standardization of the Dublin Core
IETF RFC 2413
• http://www.ietf.org/rfc/rfc2413.txt
CEN Workshop Agreement (Europe)
• endorse Dublin Core elements as
CWA13874
NISO Z39.85
• National Information Standards
Organization, an ANSI affiliate
ISO 15836
85. Metadata Applications - examples
Governments
• 7 governments have adopted DC metadata
Adobe products
• XMP – Adobe’s variant of RDF
• Dublin Core is a base schema
IPTC – International Press
Telecommunications Council
• Dublin Core based standard for journalism
Knowledge Management systems commonly use
DC metadata
Visual materials require metadata for findability
Library Systems (mostly MARC cataloging, but
increasingly other metadata as well)
86. Metadata applications (continued)
Search Systems
• Full text indexing is enormously useful
• Structured metadata improves search
• The Amazoogles are all aggressively courting
metadata aggregators
Cameras
• Automatically create metadata for each image
• Some even include GPS data
Commerce systems require metadata
Social Software applications are largely about
enriching resource information with tags,
reviews, and automated linking
87. To Sum Up…
Many purpose-built metadata standards
Few have explicit data models
Few interoperate
Some will survive, others will not
The Web demands convergence
• Break down silos between domains and
communities of practice
• RDF should help promote convergence, but we
are not there yet
Expect more metadata standards, but hope for
fewer
88. How to Participate
Join the DC-General mailing list
Join a working group
Information on lists and working groups
is available at http://dublincore.org
89. Stuart L. Weibel
Visit me at: http://weibel-lines.typepad.com
Contact me at: Weibel@oclc.org
Thank you for your attention