Dublin Core
Metadata Tutorial
July 9, 2007
Stuart Weibel
Senior Research Scientist
OCLC Programs and Research
Tutorial Roadmap
 Principles of Metadata
 Dublin Core Metadata Basics
 The Dublin Core Abstract Model
 Syntax Alternatives for DC Metadata
 Mixing and Matching Metadata
 History and workings of the Dublin Core Metadata
Initiative
 Acknowledgements: I have borrowed liberally from tutorial
slide sets by Tom Baker, Diane Hillmann, Andy Powell,
and Marty Kurth, available at dublincore.org
Basic Principles of Metadata
The Web as an information system
The Internet Commons
Interoperability is key
MARC lives
The varieties of metadata
Modularity
Some Challenges
State of the Web as an Information System
 Search systems are motivated by business
models, not functionality
 Index coverage is broad, but unpredictable
 Too much recall, too little precision
 Index spam abounds
 Resources (and their names) are volatile
 What about versions, editions, back issues?
 Archiving is presently unsolved
 Authority and quality of service are spotty
 Managing Intellectual Property Rights is difficult
Metadata:
Part of a Solution
 Structured data about other data
• helps to impose order on chaos
• enables automated discovery/manipulation
 Full Text Web indexing is the dominant idiom for
search
 Metadata is more useful in structured collections,
used in combination with applications designed to
take advantage of structured descriptions
Internet Commons includes Multiple Communities
[Diagram: the Internet Commons includes multiple communities: Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]
Interoperability
requires conventions about:
 Semantics
• The meaning of the elements
 Structure
• human-readable
• machine-parseable
 Syntax
• grammars to convey semantics and
structure
Haven’t we done metadata already?
The MARC family of standards is
the single most successful resource
description standard in the world
MARC Cataloging…
 Is really MARC-AACR2 cataloging
• MARC is the communications format
• AACR2 (Anglo-American Cataloguing Rules)
defines the cataloging rules (semantics)
 MARC and AACR2 are evolving
• Closer alignment with XML as a syntax option
• RDA is an effort to modernize AACR2 and
align it with networked environments
 RDA and Dublin Core are cooperating on
alignment with a common underlying data model.
What’s wrong with
this model on the Web?
 Expensive
• Complex
• Professional Catalogers required
 Bias towards bibliographic artifacts
• Fixed resources
• Incomplete handling of resource evolution and
other resource relationships
 Anglo-centric
• MARC 21 accounts for ¾ of MARC records, but
there are many other varieties
Metadata Takes Many Forms
[Diagram: metadata forms include resource discovery, document administration, rights management, content rating, security and authentication, archival status, products and services, database schemas, and process control or description]
Warwick Framework:
Modular Metadata
 Conceptual Architecture for metadata from the
Warwick Metadata Workshop (DC-2)
 Conceptual architecture to support the
specification, collection, encoding, and exchange
of modular metadata
 Provides context for metadata efforts (including
Dublin Core)
• avoids the “black-hole” of comprehensive
element sets
• focuses interoperability issues at package level
 A conceptual framework, NOT an application
Modularity and Extensibility:
the Lego metaphor
 DC is a beginning, not an end
 An architecture for modular, extensible
metadata
 The simplest common denominator
• Add stuff you need for
• Local requirements
• Domain-specific functionality
• Other dimensions of description
• E.g., cloud cover… management… structural
metadata…
Descriptive Metadata Standards
 IEEE LOM (Learning Object Metadata)
• Descriptive and structural metadata to support
instructional systems
 ONIX (Online Information Exchange) – bookseller
metadata
 FGDC – Federal Geographic Data Committee: rich
descriptive and structural metadata for GIS applications
 Encoded Archival Description – description of archival
collections
 MPEG Multimedia Metadata – large, complicated, still in
progress – descriptive, structural, rights management
 Dublin Core – core descriptive metadata
Metadata Creation
 Metadata is expensive and error prone
• A MARC record costs about $100 USD to
create at the Library of Congress
• Competes with indexing at… $0.001 ???
 Capture it as close to point of creation as possible
 Capture as much automatically as possible
 Should be designed with close attention to the
functional requirements it serves
 Re-use existing standards whenever possible
 There is always tension between completeness of
description, intended purpose, and cost
Metadata Challenges
 Accommodate multiple varieties of metadata
 Tension: functionality and simplicity
 Tension: extensibility and interoperability
 Human and machine creation and use
 Community-specific functionality, creation,
administration, and access work at cross-purposes
with global interoperability
Interoperability barriers cost time and money
A common data model helps avoid this
Dublin Core Basics
 Design Philosophy – useful metaphors
Language and pidgins
 Characteristics of DC metadata
 The simple bucket (properties)
 Resource Types
 Metadata grammar
 Dublin Core Principles
One-to-one
Dumb-down rule
Context appropriate values
 Translations
Dublin Core:
Starting Assumptions and Essential Features
 Simple
• true to a point: the elements are simple, the
underlying model is not
 Consensus-based
• Crucial to early success in attracting both
expertise and deployment; bottom-up
 Based on the experience of practitioners, but
hard to capture and capitalize on lessons learned
 Cross-disciplinary and International
• Central success factor
Essential Features (continued)
 The Web is the strategic application
• On the mark
 International
• Also a central success factor, but hard (20
languages in the Registry)
 Lego-like modularity & extensibility
• Partially realized promise
• Application Profiles are the means
 Syntax independence
• An ongoing nightmare (HTML…XML…RDF/XML)
 Authors will describe their own works
• Laughably naïve
A Pidgin for Digital Tourists
 Metadata is language
 Dublin Core is a small and simple language -- a
pidgin -- for finding resources across domains
 Speakers of different languages naturally
"pidginize" to communicate
• E.g., tourists using simple phrases to order
beer ("zwei Bier bitte" "dva pivo" "biru o san
bai"...)
 We are all "tourists" on the Internet.
A Grammar of
Dublin Core
 By design not as rich as mother tongues, but
easy to learn and useful in practice
 Pidgins: small vocabularies (Dublin Core:
fifteen special nouns and lots of optional
adjectives)
 Simple grammars: sentences (statements)
follow a simple fixed pattern...
 http://www.dlib.org/dlib/october00/baker/10baker.html
Basic Structures in Dublin Core Metadata
 The basic unit of metadata is a statement:
• Statements consist of a property (a metadata element)
and a value
• Metadata statements describe resources
• More about the Dublin Core Abstract model later
[Diagram: a resource is described by a statement, which pairs a property with a value]
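To make the statement structure concrete, here is a minimal Python sketch, assuming a statement can be modeled as a (resource, property, value) triple with a plain-literal value. The `Statement` type, the `as_sentence` helper, and the `urn:example:film-1` identifier are hypothetical names for illustration, not part of any DCMI specification.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One DC metadata statement: a resource has a property with a value."""
    resource: str  # identifier of the thing being described (hypothetical here)
    prop: str      # the property (metadata element), e.g. "title"
    value: str     # the value, here a plain literal


# Hypothetical example: the resource identifier is made up for illustration.
statements = [
    Statement("urn:example:film-1", "title", "Nueve reinas"),
    Statement("urn:example:film-1", "type", "MovingImage"),
]


def as_sentence(s: Statement) -> str:
    """Render a statement in the 'Resource has property value' pattern."""
    return f'{s.resource} has {s.prop} "{s.value}"'


for s in statements:
    print(as_sentence(s))
```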
What are the properties and values in the
following metadata statements?
245 00 $a Amores perros $h [videorecording]
<title> Nueve reinas </title>
<type> MovingImage </type>
• Different models for conveying related information
• Dublin Core syntax fits in more naturally with the structure
of the Web
Resource has property X
[Diagram: "Resource" is the implied subject and "has" the implied verb; the property is one of the 15 properties (DC:Creator, DC:Title, DC:Subject, DC:Date...); X is the property value (an appropriate literal); qualifiers act as adjectives]
The fifteen elements (properties)
Creator Title Subject
Contributor Date Description
Publisher Type Format
Coverage Rights Relation
Source Language Identifier
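The fifteen elements are shared machine-readable tokens in a single namespace. A minimal sketch, assuming the DCMI Metadata Element Set namespace `http://purl.org/dc/elements/1.1/`; the helper `element_uri` is a hypothetical name:

```python
# Namespace URI of the fifteen original DC elements (DCMI Metadata Element Set 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"

FIFTEEN_ELEMENTS = [
    "creator", "title", "subject",
    "contributor", "date", "description",
    "publisher", "type", "format",
    "coverage", "rights", "relation",
    "source", "language", "identifier",
]


def element_uri(name: str) -> str:
    """Return the property URI for one of the fifteen elements, or raise KeyError."""
    token = name.lower()  # tokens are lowercase; "Creator" is only a label
    if token not in FIFTEEN_ELEMENTS:
        raise KeyError(f"not one of the fifteen DC elements: {name}")
    return DC_NS + token


print(element_uri("Creator"))  # http://purl.org/dc/elements/1.1/creator
```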
Varieties of qualifiers:
Element Refinements
 Make the meaning of an element narrower or
more specific.
• a Date Created versus a Date Modified
• an IsReplacedBy Relation versus a Replaces
Relation
 If your software does not understand the
qualifier, you can safely ignore it.
Varieties of Qualifiers:
Value Encoding Schemes
 Says that the value is
• a term from a controlled vocabulary (e.g.,
Library of Congress Subject Headings)
• a string formatted in a standard way (e.g.,
"2001-05-02" means May 2, not February 5)
 Even if a scheme is not known by software, the
value should be "appropriate" and usable for
resource discovery.
Resource has Date "2000-06-13"
Resource has Subject "Languages -- Grammar"
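A short sketch of why a syntax encoding scheme matters: an ISO 8601 style date has one reading, while a slash-separated date depends on local convention. It uses only the Python standard library and is illustrative, not a DCMI tool:

```python
from datetime import date, datetime

# ISO 8601 (and its W3CDTF profile) orders dates year-month-day: one reading only.
iso = date.fromisoformat("2001-05-02")

# A slash-separated date has two plausible readings, depending on local convention.
us_reading = datetime.strptime("05/02/2001", "%m/%d/%Y").date()  # month first
eu_reading = datetime.strptime("05/02/2001", "%d/%m/%Y").date()  # day first

print(iso, us_reading, eu_reading)  # 2001-05-02 2001-05-02 2001-02-05
```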
Dumb-Down Principle for Qualifiers
 Simple DC does not use element refinements or
encoding schemes – statements contain only
value strings
 Qualified DC uses features of the DCMI Abstract
Model, including element refinements and
encoding schemes
 Dumbing-down is translating Qualified DC to
simple DC
 Qualifiers refine meaning (but may be harder to
understand)
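Dumbing-down can be sketched as a mapping from each refinement to its parent element, dropping encoding schemes along the way. This is a minimal sketch under stated assumptions: only a handful of real DCMI refinements are listed, `QualifiedStatement` and `dumb_down` are hypothetical names, and statements with no Simple DC equivalent are dropped:

```python
from typing import NamedTuple, Optional, Tuple

FIFTEEN = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

# A few DCMI element refinements and the element each refines (not the full list).
REFINES = {
    "created": "date",
    "modified": "date",
    "issued": "date",
    "alternative": "title",
    "abstract": "description",
    "replaces": "relation",
    "isReplacedBy": "relation",
}


class QualifiedStatement(NamedTuple):
    prop: str                     # an element or an element refinement
    value: str                    # the value string
    scheme: Optional[str] = None  # optional encoding scheme, e.g. "W3CDTF"


def dumb_down(stmt: QualifiedStatement) -> Optional[Tuple[str, str]]:
    """Translate one Qualified DC statement into a Simple DC (element, value) pair."""
    if stmt.prop in FIFTEEN:
        element = stmt.prop
    elif stmt.prop in REFINES:
        element = REFINES[stmt.prop]  # fall back to the parent element
    else:
        return None                   # no Simple DC equivalent: drop it
    return (element, stmt.value)      # the encoding scheme is simply dropped


print(dumb_down(QualifiedStatement("created", "2000-06-13", "W3CDTF")))
```

Dropping an unrecognized statement, rather than guessing its parent, is one reasonable policy; it mirrors the rule that software may safely ignore qualifiers it does not understand.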
The One to One Principle
 Each resource should have one metadata
description
• For example, do not describe a digital image of
the Mona Lisa as if it were the original painting
 Group related descriptions into description sets
• Describe an artist and his or her work
separately, not in a single description
Appropriate Values
 There are generally tradeoffs between local
requirements and global requirements
 Use elements and qualifiers to meet the needs of
your local context, but…
 Keep in mind that machines and people use and
interpret metadata, so…
 Consider whether the values used will help
discovery outside your local context
Dublin Core as a multilingual metadata
language
 Dublin Core has been translated into 20+
languages
• machine-readable tokens are shared by all
• human-readable labels are defined in different
languages
• translations are distributed, maintained in
many countries
• eventually linked in DCMI registry
One token –
labels in many languages
[Diagram: the single token dc:creator carries the label "Creator" on the DCMI server, "Verfasser" on a server in Germany, and "Pencipta" on a server in Jakarta]
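The token/label split can be sketched as a lookup table keyed by the shared token. This is a minimal sketch: the language tags `de` and `id` (Indonesian) are our assumption, the fallback to English is a design choice, and `LABELS` and `label_for` are hypothetical names rather than a registry API:

```python
# One machine-readable token, many human-readable labels. The German and
# Indonesian labels come from the slide; the language tags are our assumption.
LABELS = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}


def label_for(token: str, lang: str, fallback: str = "en") -> str:
    """Return the label for a token in the requested language, else the fallback."""
    labels = LABELS[token]
    primary = lang.split("-")[0].lower()  # "de-AT" -> "de"
    return labels.get(primary, labels[fallback])


print(label_for("dc:creator", "de-AT"))  # Verfasser
```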
Metadata languages are "multilingual"
 Metadata is not a spoken language
 The words of metadata -- "elements" --
are symbols that stand for concepts
expressible in multiple natural languages
 Standards may have dozens of
translations
 Are concepts like "title", "author", or
"subject" used the same way in English,
Finnish, and Korean?
DCMI Open Metadata Registry
 Managing vocabularies defined by the DCMI
• Languages
• Versioning
• Controlled vocabularies
 Foundation for modular, incremental
integration and evolution
 The Registry working group is a Dublin Core
Community with participants around the world
The Dublin Core Abstract Model
Terminology
Simple versus Qualified DC
Resources
Descriptions
Description sets
Value Strings
Element refinements
Encoding Schemes
Graphical representation of the Abstract Model
Summary of general ideas
Important DCMI documents concerning
the Abstract Model and syntax alternatives
 DCMI Abstract Model
http://dublincore.org/documents/abstract-model/
 Expressing Dublin Core in HTML/XHTML meta and
link elements
http://dublincore.org/documents/dcq-html/
 Expressing Dublin Core metadata using the Resource
Description Framework (RDF)
http://dublincore.org/documents/dc-rdf/
 Expressing Dublin Core metadata using XML
http://dublincore.org/documents/dc-xml/
Simple versus Qualified DC
 Simple DC supports single descriptions using the
15 base elements and value strings
 Qualified DC supports the richer features of the
Abstract Model, and allows the use of all DCMI
terms as well as other, non-DCMI terms.
 An application profile is used to specify a
metadata application that includes DCMI terms in
combination with non-DCMI terms (mix & match
metadata).
The DCMI Abstract Model
 A data model for Dublin Core
 Agreed upon underlying structure for metadata
statements
 Many years in the making: a long-term point of contention
 Describes the structure of statements about
resources that we make in our metadata
language:
[Diagram: a resource is described by a statement, which pairs a property with a value]
What is a resource?
 W3C definition:
• “anything that has identity… electronic document,
an image, a service”
• “not all resources are network retrievable; e.g.
human beings, corporations, and bound books can
also be considered resources”
 In other words, a resource is anything we can
identify:
• Physical things (books, people, airplanes….)
• Digital things (Images, web pages, services….)
• Concepts (colors, subjects, eras, places)
 In the DC context, the DCMI Type list describes the
stuff we describe with DC metadata
Resource types for which DC is often used
[Figure: DCMI Type Vocabulary] Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, Text
Abstract Model: Descriptions
 A description is composed of:
• One or more statements about a single
resource
• Optionally, the URI of the resource being
described
 Each statement is made up of
• A property URI (that identifies a property)
• A value URI (that identifies a value) and/or
one or more representations of the value (a
value string)
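A description can be sketched with Python dataclasses that enforce the two constraints listed above: at least one statement per description, and a value URI and/or value string per statement. The class names and example URIs are hypothetical; this is not the DCMI Abstract Model's own notation:

```python
from dataclasses import dataclass, field
from typing import List, Optional

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


@dataclass
class ValueString:
    text: str
    language: Optional[str] = None  # value string language, e.g. "pt-BR"


@dataclass
class Statement:
    property_uri: str
    value_uri: Optional[str] = None
    value_strings: List[ValueString] = field(default_factory=list)

    def __post_init__(self):
        # A statement needs a value URI and/or at least one value string.
        if self.value_uri is None and not self.value_strings:
            raise ValueError("statement needs a value URI and/or a value string")


@dataclass
class Description:
    statements: List[Statement]         # one or more statements ...
    resource_uri: Optional[str] = None  # ... optionally naming the resource

    def __post_init__(self):
        if not self.statements:
            raise ValueError("a description needs at least one statement")


# Hypothetical example; "Relatório anual" is Portuguese for "Annual report".
desc = Description(
    statements=[Statement(DC_TITLE, value_strings=[ValueString("Relatório anual", "pt-BR")])],
    resource_uri="http://example.org/report/2007",
)
```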
Terminology: Value Strings
 A value string is a human-readable string that
represents the value of the property
 Each value string may have an associated value
string language that is an ISO language tag (e.g.,
pt-BR)
Terminology: Element Refinements
 Elements are the same as properties
 Element refinements are the same as sub-
properties
 An element refinement is a special case of an
element that shares the meaning of its ‘parent’,
but has narrower semantics
 Paulo is the illustrator of a book; therefore he is also
a contributor to the book
Illustrator is an element refinement of
contributor
Terminology: Encoding Schemes
 Values and value strings can be ‘qualified’ by
encoding schemes in order to clarify their
meaning
• A Vocabulary Encoding Scheme is used to
indicate a terminology set from which a value
is taken:
Stem cells—Research is a value from LCSH
616.02774 is a value from DDC-22
• A syntax encoding scheme is used to indicate
the structure of a value string
2004-10-12 is structured according to the
W3CDTF rules for date encoding
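The difference between the two kinds of encoding scheme can be sketched as follows: a syntax encoding scheme can be checked against the value string alone, while a vocabulary encoding scheme needs the vocabulary itself. A minimal sketch, assuming only the date-only W3CDTF forms (YYYY, YYYY-MM, YYYY-MM-DD); `check` and the scheme tables are hypothetical names:

```python
import re
from typing import Optional

# Date-only granularities of W3CDTF: YYYY, YYYY-MM, or YYYY-MM-DD.
# Simplification: times are not handled and day 31 is not checked per month.
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

# Syntax encoding schemes: the structure of the value string can be checked.
SYNTAX_SCHEMES = {"W3CDTF": lambda s: W3CDTF_DATE.fullmatch(s) is not None}

# Vocabulary encoding schemes: checking a value requires the vocabulary itself.
VOCABULARY_SCHEMES = {"LCSH", "DDC"}


def check(value: str, scheme: str) -> Optional[bool]:
    """True/False for a syntax scheme; None when membership cannot be checked here."""
    if scheme in SYNTAX_SCHEMES:
        return SYNTAX_SCHEMES[scheme](value)
    if scheme in VOCABULARY_SCHEMES:
        return None  # would require looking the term up in LCSH, DDC, ...
    raise ValueError(f"unknown encoding scheme: {scheme}")


print(check("2004-10-12", "W3CDTF"))  # True
```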
Terminology: Description Sets
 The 1:1 principle dictates that each description
describes one, and only one, resource
 We often need to describe grouped sets of
descriptions, which are known in the abstract
model as description sets
• An article and its authors
• A painting and its artist
 When description sets are exchanged between
software applications, they are generally encoded
according to a particular syntax in a metadata
record
[Diagram: a record (encoded as HTML, XML, or RDF/XML) contains a description set; the description set contains resource descriptions, each identified by a URI; each description contains statements; value strings may carry a language tag (e.g., pt-BR)]
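Description sets can be sketched as a list of descriptions that refer to one another by URI, so each description still covers exactly one resource. A minimal sketch under stated assumptions: all `example.org` URIs and the `http://example.org/terms/name` property are hypothetical, `creators_of` is an illustrative helper, and JSON only stands in for the HTML, XML, or RDF/XML record syntaxes (it is not a DCMI encoding):

```python
import json

DC = "http://purl.org/dc/elements/1.1/"

# Two descriptions, one per resource (the 1:1 principle). The article's creator
# statement points at the author's description by URI instead of embedding it.
description_set = [
    {
        "resource": "http://example.org/article/42",
        "statements": [
            {"property": DC + "title",
             "value_string": "Accommodating Simplicity and Complexity in Metadata"},
            {"property": DC + "creator", "value_uri": "http://example.org/person/lagoze"},
        ],
    },
    {
        "resource": "http://example.org/person/lagoze",
        "statements": [
            {"property": "http://example.org/terms/name", "value_string": "Carl Lagoze"},
        ],
    },
]


def creators_of(desc_set, resource):
    """Follow creator value URIs from one description to the linked descriptions."""
    by_uri = {d["resource"]: d for d in desc_set}
    linked = []
    for st in by_uri[resource]["statements"]:
        if st["property"] == DC + "creator" and "value_uri" in st:
            linked.append(by_uri.get(st["value_uri"]))  # None if not in the set
    return linked


# Encode the whole description set as one record (JSON as a stand-in syntax).
record = json.dumps(description_set, indent=2)
```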
Abstract Model summary (after Andy Powell)
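[Diagram: each statement has a property (URI) and a value given by a value URI and/or a value string; a vocabulary encoding scheme may qualify the value, and a syntax encoding scheme may qualify the value string]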
General Ideas
 DC is not just the 15 elements, though they
comprise the foundation for simple DC
 50+ properties (elements) have been approved
by DCMI
 The model supports local declarations of
additional properties
 The model supports application profiles (mixing
DC elements with those of other sets)
 The model allows the grouping of descriptions to
create more complex description entities
Syntax Alternatives
 Choosing among alternatives
 HTML
 XML
 RDF/XML
Syntax Alternatives
HTML… XML… RDF/XML
 Three Web-based models for deploying metadata
 Each has advantages and disadvantages
 What is ‘best’ depends on local constraints
• What is the objective of the system? How do
these syntax alternatives support local
functional requirements?
• Are there services and software to ‘consume’
the metadata created?
• Are trained practitioners available to create
and support the systems?
Syntax Alternatives: HTML
 Advantages:
• Simple – META tags embedded in content
• Widely deployed tools and knowledge
• Resource carries its metadata around with it
• Metadata is openly harvestable
Syntax Alternatives: HTML (continued)
 Disadvantages
• Limited structural richness (does not support
hierarchical, tree-structured data)
• Management of metadata is less reliable (the
metadata is out in the wild)
 Describe one thing (the HTML document) and no
more!
Dublin Core in HTML (example)
<head>
<!-- every prefix used in a meta name (DC., DCTERMS.) is declared by a schema link -->
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
<meta name="DC.title"
content="DC Metadata Tutorial">
<meta name="DC.creator"
content="Stuart L. Weibel">
<meta name="DC.subject" xml:lang="en-US"
content="Metadata">
<meta name="DC.date" scheme="DCTERMS.W3CDTF"
content="2007-07-08">
<meta name="DCTERMS.audience"
content="technical librarians">
</head>
<body>
… [ rest of html document ]
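Because embedded metadata is openly harvestable, a harvester only needs an HTML parser. A minimal sketch using Python's standard `html.parser`, assuming the `DC.` and `DCTERMS.` prefixes are bound by schema links as in the example above; a fuller harvester would read the `schema.*` links rather than hard-code the prefixes. `DCMetaHarvester` is a hypothetical name:

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collect <meta> elements whose name starts with DC. or DCTERMS."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (name, content, scheme) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name = a.get("name") or ""
        if name.lower().startswith(("dc.", "dcterms.")):
            self.found.append((name, a.get("content"), a.get("scheme")))


page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="DC Metadata Tutorial">
<meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2007-07-08">
<meta name="keywords" content="ignored">
</head><body></body></html>"""

h = DCMetaHarvester()
h.feed(page)
print(h.found)
```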
The namespaces for HTML encoding
 All DCMI terms (elements, element refinements,
and encoding schemes) are found in:
DCMI Metadata Terms
http://dublincore.org/documents/dcmi-terms/
 The namespaces are a result of historical
developments
• DC: [original elements] http://purl.org/dc/elements/1.1/
• DCTERMS: [later elements] http://purl.org/dc/terms/
Syntax Alternatives: XML
 XML = eXtensible Markup Language
 The standard for networked text and data
 Wide-spread tool support
• Parsers are widely available
• Extensibility (XML namespaces)
• Type definitions (XML Schema)
• Transformation and Rendering (XSLT)
• Rich linking semantics (XLink)
XML Schema
 Rich XML-based language for expressing data-
type semantics
 Replaces the arcane and limited DTD (origin in SGML)
 Facilities:
• Data typing (both complex and primitive)
• Constraints (ranges, cardinality…)
• Defaults (specify defaults for certain
properties)
Dublin Core fragment in XML
<metadata
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:creator>Carl Lagoze</dc:creator>
<dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
<dc:date>2000-07-01</dc:date>
<dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>
Where is the rest of the stuff? In the schema!
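Reading the fragment back needs only a namespace-aware XML parser. A minimal sketch with Python's `xml.etree.ElementTree`, assuming the `dc` prefix is bound to `http://purl.org/dc/elements/1.1/`; values are collected in lists because DC elements are repeatable:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

fragment = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>"""

root = ET.fromstring(fragment)
record = {}
for child in root:
    # ElementTree reports namespaced tags as "{namespace-uri}localname".
    if child.tag.startswith("{" + DC_NS + "}"):
        local = child.tag[len(DC_NS) + 2:]
        record.setdefault(local, []).append((child.text or "").strip())

print(record["title"])
```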
Case Study: OAI-PMH
OAI Protocol for Metadata Harvesting
 Open Archives Initiative
http://www.openarchives.org
• Simple Protocol for sharing metadata records
 Based on HTTP, XML, XML Schema, and XML
namespaces
 Allows a harvester to query a remote repository
for some or all of its metadata records
 DC is the default native metadata format in the
OAI protocol: every repository must support unqualified DC (oai_dc)
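An OAI-PMH harvest is an ordinary HTTP GET with the request in query parameters. A minimal sketch that builds a `ListRecords` request for `oai_dc` records; the base URL is hypothetical, `list_records_url` is an illustrative helper, and paging with `resumptionToken` is not shown:

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     until: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (HTTP GET)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments; "from" is a Python keyword,
    # hence the from_date parameter name.
    if from_date:
        params["from"] = from_date
    if until:
        params["until"] = until
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical repository base URL, for illustration only.
print(list_records_url("http://repository.example.org/oai", from_date="2007-01-01"))
```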
Syntax Alternatives: RDF
 RDF (Resource Description Framework)
 Commonly expressed in XML (RDF/XML)
 W3C recommendation for encoding metadata (a
semantic Web technology)
 Enabling technology for richly-structured metadata
 Rich data model (the DC Abstract Model is a
constrained version of RDF)
 Metadata can be shared easily among independent
applications that understand RDF
 W3C – Resource Description Framework (RDF)
http://www.w3.org/RDF/
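RDF reduces every statement to a subject-predicate-object triple. To make that visible, this sketch writes DC statements as N-Triples, a simple line-based RDF syntax used here instead of RDF/XML purely for readability; the subject URI and the `ntriples` helper are hypothetical:

```python
from typing import Iterable, Optional, Tuple

DC = "http://purl.org/dc/elements/1.1/"


def ntriples(subject_uri: str,
             statements: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Serialize (property URI, literal, language tag or None) as N-Triples lines."""
    lines = []
    for prop, literal, lang in statements:
        # Escape the characters N-Triples does not allow raw inside a literal.
        escaped = (literal.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
        suffix = f"@{lang}" if lang else ""
        lines.append(f'<{subject_uri}> <{prop}> "{escaped}"{suffix} .')
    return "\n".join(lines)


# The subject URI is a hypothetical example.
out = ntriples("http://example.org/tutorial", [
    (DC + "title", "DC Metadata Tutorial", "en"),
    (DC + "creator", "Stuart L. Weibel", None),
])
print(out)
```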
Summary: Syntax alternatives
 Choices should be driven by local requirements
and objectives
• Available expertise
• Costs of Deployment
• Objectives and functional requirements
Association Models
Where do we keep the metadata?
 Embedded
• HTML META tags, XML, or RDF/XML can be embedded
in the resource, and hence the metadata travels with the resource
• Simple, but limited in structural richness
 Loosely coupled
• Shadow files (like Adobe’s XMP sidecar files)
• Requires a system to manage them and ensure that they stay
in sync
• RDF or XML descriptions
 Third Party Metadata
• Stored in repositories such as library catalogs
• Easier to manage and maintain, and to provide services from
Questions about syntax alternatives?
Application Profiles:
Mixing and Matching Metadata
 What is an Application Profile?
 Why bother?
 Creating new properties
 Documenting and declaring new
properties
 Some examples
Application Profiles: Mixing and Matching
Metadata
• The mixing and matching of elements
(properties) from separate metadata sets
• An expression of metadata modularity
• Implementers can benefit from peer applications
• Communities can harmonize their metadata,
picking complementary properties
• Promotes convergence over time
• For application profiles to work, there must be
public declarations of properties that conform to
a common data model (or nearly so)
Application Profile: Definition
 Declaration of metadata properties used in a
given organization or application or community
 Documentation of encodings, constraints, and
creation guidelines
 Implies formal schemas (XML Schemas or RDF
Schemas)
 Should promote both human understanding and
machine interoperability
 The concept of application profiles applies to any
metadata community of practice, not just DC
 DC has promoted their use and leads by example
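The constraint-documenting role of an application profile can be sketched as a table of allowed properties with occurrence limits, mixing DC, DCTERMS, and a local namespace. Everything here is hypothetical (the `example.org` namespace, the `courseCode` property, the limits, and `validate`); real profiles are documented in prose and formal schemas, as described above:

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/terms/"  # hypothetical local namespace

# Hypothetical profile: property URI -> (min, max) occurrences; None = unbounded.
PROFILE = {
    DC + "title": (1, 1),
    DC + "creator": (1, None),
    DCTERMS + "audience": (0, None),
    LOCAL + "courseCode": (0, 1),
}


def validate(record, profile=PROFILE):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    counts = {}
    for prop, _value in record:
        counts[prop] = counts.get(prop, 0) + 1
    for prop in counts:
        if prop not in profile:
            problems.append(f"property not in profile: {prop}")
    for prop, (lo, hi) in profile.items():
        n = counts.get(prop, 0)
        if n < lo:
            problems.append(f"too few {prop}: {n} < {lo}")
        if hi is not None and n > hi:
            problems.append(f"too many {prop}: {n} > {hi}")
    return problems


print(validate([(DC + "title", "DC Metadata Tutorial"), (DC + "creator", "Stuart L. Weibel")]))
```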
Why bother?
 One-size-fits-all metadata results in bloated,
unmanageable specifications and applications
 APs allow tailoring a given metadata application
to match the element set to specific functional
requirements based on local or community
needs, while retaining interoperability with a
larger metadata community
Creating an Application Profile
 Find out what others have done… don’t re-invent
wheels!
 Develop community consensus
 Define names, labels, definitions, and relationships (see
the DCMI Usage Board guidelines)
 Determine an appropriate URI (a home on the
Web)
 Dublin Core Application Profile Guidelines
http://dublincore.org/usage/documents/profile-guidelines/
Document New Properties
 At very least: a Web page with relevant
information
 Better: a web page with a public schema using
new terms in an application profile
 Better still: all properties available as part of a
metadata registry
Example Application Profiles
 DC-Library AP
 DC-Collection Description AP
 DC-Government AP
 DC-Education AP
Some History of the Dublin Core
and
How the Initiative Works
• The Beginnings
• Landmarks
• Workshops and Conference series
• What the initiative does
• Standardization
• Some example applications
Dublin Core: The Beginning
 A casual discussion at WWW-2 in Chicago,
October of 1994
• How to make things on the Web easier to find?
 OCLC & NCSA co-sponsored an invitational
workshop in March of 1995
 The workshop became a workshop series, and
eventually a conference series
 DCMI: Dublin Core Metadata Initiative
• Governance and process evolved over time
• De facto standards maintenance body
Dublin Core Landmarks
 1994: Simple tags to describe Web pages
 1995: The Dublin Core is one of many
vocabularies needed ("Warwick
Framework")
 1996: The Dublin Core: 13 elements
expanded to 15 - appropriate for Text and
Images
 1997: WF needs formal expression in a
Resource Description Framework (RDF)
Dublin Core Landmarks (continued)
 2000: Dublin Core Metadata Initiative
recommends qualifiers, broadens its
organizational scope beyond the Core
 2001: Workshop Series becomes a
conference series
 DCMI Affiliates and a board of trustees
 2005: Abstract Model (Finally)
The Dublin Core Workshop Series
 Workshop Venues:
US DC 1, 3, 6
UK DC 2
Australia DC 4
Finland DC 5
Germany DC 7
Canada DC 8
 Conferences
Tokyo (2001)
Florence (2002)
Seattle (2003)
China (2004)
Spain (2005)
Mexico (2006)
DCMI Activities
 Standards development and maintenance
 Metadata registry and infrastructure
 Technical working groups and periodic
workshops
 Tutorial materials and user guides
 Education and training
 Open source software
 Liaisons with other standards or user
communities
Governance of DCMI
 DCMI has a Board of Trustees that oversees the
operation and goals of the initiative
 Managing Director
• Makx Dekkers
 Director of Specifications and Documentation
• Tom Baker
 An Advisory Board of metadata experts provides
guidance on metadata issues
The DCMI Usage Board
 The Usage Board is an editorial committee that
evaluates proposals for new elements or revisions
 International selection of metadata experts
 Meets twice yearly
 Documents decisions and updates the DCMI Metadata Terms
document
Affiliate Program
 DCMI has National Affiliates which support the
Initiative and are represented on the Board of
Trustees
• Finland
• UK
• Singapore
• New Zealand
• Korea
 OCLC has been the Host from the start
The Three I’s
 Independent: DCMI is not controlled by specific
commercial or other interests and is not biased
towards specific domains nor does it mandate
specific technical solutions
 International: DCMI encourages participation
from organizations anywhere in the world,
respecting linguistic and cultural differences
 Influenceable: DCMI is an open organization
aiming at building consensus among the
participating organizations; there are no
prerequisites for participation
The Work gets done by Communities and
task groups
 Accessibility Community
 Collection Description Community
 Education Community
 Environment Community
 Global Corporate Circle
 Government Community
 Kernel Community
 Libraries Community
 Localization and Internationalization Community
 Preservation Community
 Registry Community
 Social Tagging Community
 Standards Community
 Tools Community
Standardization of the Dublin Core
 IETF RFC 2413
• http://www.ietf.org/rfc/rfc2413.txt
 CEN Workshop Agreement (Europe)
• endorsed the Dublin Core elements as
CWA 13874
 NISO Z39.85
• National Information Standards
Organization, an ANSI affiliate
 ISO 15836
Metadata Applications - examples
 Governments
• 7 governments have adopted DC metadata
 Adobe products
• XMP – Adobe’s RDF-based metadata framework
• Dublin Core is a base schema
 IPTC – International Press Telecommunications Council
• Dublin Core-based standard for journalism
 Knowledge Management systems commonly use
DC metadata
 Visual materials require metadata for findability
 Library Systems (mostly MARC cataloging, but
increasingly other metadata as well)
Metadata applications (continued)
 Search Systems
• Full text indexing is enormously useful
• Structured metadata improves search
• The Amazoogles are all aggressively courting
metadata aggregators
 Cameras
• Automatically create metadata for each image
• Some even include GPS data
 Commerce systems require metadata
 Social Software applications are largely about
enriching resource information with tags,
reviews, and automated linking
To Sum Up…
 Many purpose-built metadata standards
 Few have explicit data models
 Few interoperate
 Some will survive, others will not
 The Web demands convergence
• Break down silos between domains and
communities of practice
• RDF should help promote convergence, but we
are not there yet
 Expect more metadata standards, but hope for
fewer
How to Participate
 Join the
DC-General
mailing list
 Join a working
group
 Information
on lists and
working groups
is available at http://dublincore.org
Stuart L. Weibel
Visit me at: http://weibel-lines.typepad.com
Contact me at: Weibel@oclc.org
Thank you for your
attention

 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Dublin Core Metadata Tutorial.ppt

  • 1. Dublin Core Metadata Tutorial July 9, 2007 Stuart Weibel Senior Research Scientist OCLC Programs and Research
  • 2. Tutorial Roadmap
    - Principles of Metadata
    - Dublin Core Metadata Basics
    - The Dublin Core Abstract Model
    - Syntax Alternatives for DC Metadata
    - Mixing and Matching Metadata
    - History and workings of the Dublin Core Metadata Initiative
    - Acknowledgements: I have borrowed liberally from tutorial slide sets by Tom Baker, Diane Hillmann, Andy Powell, and Marty Kurth, available at dublincore.org
  • 3. Basic Principles of Metadata
    - The Web as an information system
    - The Internet Commons
    - Interoperability is key
    - MARC lives
    - The varieties of metadata
    - Modularity
    - Some challenges
  • 4. State of the Web as an Information System
    - Search systems are motivated by business models, not functionality
    - Index coverage is broad, but unpredictable
    - Too much recall, too little precision
    - Index spam abounds
    - Resources (and their names) are volatile
    - What about versions, editions, back issues?
    - Archiving is presently unsolved
    - Authority and quality of service are spotty
    - Managing intellectual property rights is difficult
  • 5. Metadata: Part of a Solution
    - Structured data about other data
      • helps to impose order on chaos
      • enables automated discovery/manipulation
    - Full-text Web indexing is the dominant idiom for search
    - Metadata is more useful in structured collections, used in combination with applications designed to take advantage of structured descriptions
  • 6. Internet Commons includes Multiple Communities
    - [Diagram: the Internet Commons at the center, surrounded by Scientific Data, Home Pages, Geo, Library, Museums, Commerce, Whatever...]
  • 7. Interoperability requires conventions about:
    - Semantics
      • the meaning of the elements
    - Structure
      • human-readable
      • machine-parseable
    - Syntax
      • grammars to convey semantics and structure
  • 8. Haven't we done metadata already?
    - The MARC family of standards is the single most successful resource description standard in the world
  • 9. MARC Cataloging…
    - Is really MARC-AACR2 cataloging
      • MARC is the communications format
      • AACR2 (Anglo-American Cataloguing Rules) defines the cataloging rules (semantics)
    - MARC and AACR2 are evolving
      • Closer alignment with XML as a syntax option
      • RDA is an effort to modernize AACR2 and align it with networked environments
    - RDA and Dublin Core are cooperating on alignment of a common underlying data model
  • 10. What's wrong with this model on the Web?
    - Expensive
      • Complex
      • Professional catalogers required
    - Bias towards bibliographic artifacts
      • Fixed resources
      • Incomplete handling of resource evolution and other resource relationships
    - Anglo-centric
      • MARC 21 accounts for ¾ of MARC records, but there are many other varieties
  • 11. Metadata Takes Many Forms
    - resource discovery
    - document administration
    - rights management
    - content rating
    - security and authentication
    - archival status
    - products and services
    - database schemas
    - process control or description
  • 12. Warwick Framework: Modular Metadata
    - Conceptual architecture for metadata from the Warwick Metadata Workshop (DC-2)
    - Supports the specification, collection, encoding, and exchange of modular metadata
    - Provides context for metadata efforts (including Dublin Core)
      • avoids the "black hole" of comprehensive element sets
      • focuses interoperability issues at the package level
    - A conceptual framework, NOT an application
  • 13. Modularity and Extensibility: the Lego metaphor
    - DC is a beginning, not an end
    - An architecture for modular, extensible metadata
    - The simplest common denominator; add what you need for
      • local requirements
      • domain-specific functionality
      • other dimensions of description, e.g., cloud cover, management, structural metadata
  • 14. Descriptive Metadata Standards
    - IEEE LOM (Learning Object Metadata)
      • Descriptive and structural metadata to support instructional systems
    - ONIX (Online Information Exchange): bookseller metadata
    - FGDC (Federal Geographic Data Committee): rich descriptive and structural metadata for GIS applications
    - Encoded Archival Description: description of archival collections
    - MPEG multimedia metadata: large, complicated, still in progress; descriptive, structural, rights management
    - Dublin Core: core descriptive metadata
  • 15. Metadata Creation
    - Metadata is expensive and error-prone
      • A MARC record costs about $100 USD to create at the Library of Congress
      • Competes with indexing at… $0.001 ???
    - Capture it as close to the point of creation as possible
    - Capture as much automatically as possible
    - Design it with close attention to the functional requirements it serves
    - Reuse existing standards whenever possible
    - There is always tension between completeness of description, intended purpose, and cost
  • 16. Metadata Challenges
    - Accommodate multiple varieties of metadata
    - Tension: functionality and simplicity
    - Tension: extensibility and interoperability
    - Human and machine creation and use
    - Community-specific functionality, creation, administration, and access work at cross purposes to global interoperability
  • 17. Interoperability barriers cost time and money
    - A common data model helps avoid this
  • 18. Dublin Core Basics
    - Design philosophy: useful metaphors (language and pidgins)
    - Characteristics of DC metadata
    - The simple bucket (properties)
    - Resource types
    - Metadata grammar
    - Dublin Core principles
      • One-to-one
      • Dumb-down rule
      • Context-appropriate values
    - Translations
  • 19. Dublin Core: Starting Assumptions and Essential Features
    - Simple
      • true to a point: the elements are simple, the underlying model is not
    - Consensus-based
      • Crucial to early success, both in attracting expertise and in deployment
    - Bottom up
      • Based on the experience of practitioners, but it is hard to capture and capitalize on lessons learned
    - Cross-disciplinary and international
      • A central success factor
  • 20. Essential Features (continued)
    - The Web is the strategic application
      • On the mark
    - International
      • Also a central success factor, but hard (20 languages in the Registry)
    - Lego-like modularity and extensibility
      • Partially realized promise
      • Application profiles are the means
    - Syntax independence
      • An ongoing nightmare (HTML…XML…RDF/XML)
    - Authors will describe their own works
      • Laughably naïve
  • 21. A Pidgin for Digital Tourists
    - Metadata is language
    - Dublin Core is a small and simple language (a pidgin) for finding resources across domains
    - Speakers of different languages naturally "pidginize" to communicate
      • E.g., tourists using simple phrases to order beer ("zwei Bier bitte", "dva pivo", "biru o san bai"...)
    - We are all "tourists" on the Internet
  • 22. A Grammar of Dublin Core
    - By design not as rich as mother tongues, but easy to learn and useful in practice
    - Pidgins have small vocabularies (Dublin Core: fifteen special nouns and lots of optional adjectives)
    - Simple grammars: sentences (statements) follow a simple, fixed pattern...
    - http://www.dlib.org/dlib/october00/baker/10baker.html
  • 23. Basic Structures in Dublin Core Metadata
    - The basic unit of metadata is a statement
      • Statements consist of a property (a metadata element) and a value
      • Metadata statements describe resources
      • More about the Dublin Core Abstract Model later
    - [Diagram: a resource is described by a statement made up of a property and a value]
  • 24. What are the properties and values in the following metadata statements?
    - 245 00 $a Amores perros $h [videorecording]
    - <title> Nueve reinas </title> <type> MovingImage </type>
      • Different models for conveying related information
      • Dublin Core syntax fits more naturally with the structure of the Web
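The tag-style example above reads naturally as two property/value statements about one resource. A minimal Python sketch of that reading follows; the `Statement` and `Record` classes and the example.org identifier are hypothetical illustrations, not part of any DCMI specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One metadata statement: a property (element name) and its value."""
    prop: str   # element name, e.g. "title" (hypothetical plain names)
    value: str  # a literal value string


@dataclass
class Record:
    """Statements about a single described resource (hypothetical model)."""
    resource: str
    statements: list = field(default_factory=list)

    def add(self, prop: str, value: str) -> None:
        self.statements.append(Statement(prop, value))

    def values(self, prop: str) -> list:
        """Return every value recorded for one property, in insertion order."""
        return [s.value for s in self.statements if s.prop == prop]


film = Record("http://example.org/films/nueve-reinas")  # hypothetical URI
film.add("title", "Nueve reinas")
film.add("type", "MovingImage")
print(film.values("title"))
print(film.values("creator"))
```

An absent property simply yields an empty list, which mirrors the fact that every DC element is optional.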
  • 25. [Diagram: "Resource has property value". The resource (X) is the implied subject and "has" the implied verb; the property is one of the 15 elements (DC:Creator, DC:Title, DC:Subject, DC:Date...); the value is an appropriate literal; qualifiers act as adjectives]
  • 26. The fifteen elements (properties)
    - Creator, Title, Subject, Contributor, Date, Description, Publisher, Type, Format, Coverage, Rights, Relation, Source, Language, Identifier
  • 27. Varieties of Qualifiers: Element Refinements
    - Make the meaning of an element narrower or more specific
      • a Date Created versus a Date Modified
      • an IsReplacedBy Relation versus a Replaces Relation
    - If your software does not understand the qualifier, you can safely ignore it
  • 28. Varieties of Qualifiers: Value Encoding Schemes
    - Says that the value is
      • a term from a controlled vocabulary (e.g., Library of Congress Subject Headings)
      • a string formatted in a standard way (e.g., "2001-05-02" means May 2, not February 5)
    - Even if a scheme is not known by software, the value should be "appropriate" and usable for resource discovery
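Why a standard date format matters can be shown with Python's standard `datetime` module: the ISO-style string has exactly one reading, while a slash-separated date depends on which local convention the reader assumes. This is an illustrative sketch only; the two slash formats stand in for typical US and European conventions.

```python
from datetime import date, datetime

# ISO 8601 / W3CDTF order is always year, month, day.
iso = date.fromisoformat("2001-05-02")

# The same day written with slashes is ambiguous without a stated convention.
us = datetime.strptime("05/02/2001", "%m/%d/%Y").date()  # month first
eu = datetime.strptime("05/02/2001", "%d/%m/%Y").date()  # day first

print(iso.isoformat(), us.isoformat(), eu.isoformat())
```

Declaring a syntax encoding scheme removes exactly this guesswork for software that consumes the value.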
  • 29. Resource has Date "2000-06-13"
    Resource has Subject "Languages -- Grammar"
  • 30. Dumb-Down Principle for Qualifiers
    - Simple DC does not use element refinements or encoding schemes; statements contain only value strings
    - Qualified DC uses features of the DCMI Abstract Model, including element refinements and encoding schemes
    - Dumbing down is translating Qualified DC to Simple DC
    - Qualifiers refine meaning (but may be harder to understand)
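Dumbing down can be sketched as a small translation function: each refinement is replaced by the simple DC element it refines, and encoding schemes are dropped. The `REFINES` table below is a hand-picked subset of DCMI refinements, not the full term list, and discarding properties that have no simple-DC parent (such as `audience`) is one possible design choice, assumed here for illustration.

```python
# Hand-picked subset of DCMI element refinements -> the simple DC element refined.
REFINES = {
    "created": "date",
    "modified": "date",
    "alternative": "title",
    "abstract": "description",
    "isReplacedBy": "relation",
    "replaces": "relation",
    "spatial": "coverage",
}

SIMPLE_DC = {"title", "creator", "subject", "description", "publisher",
             "contributor", "date", "type", "format", "identifier",
             "source", "language", "relation", "coverage", "rights"}


def dumb_down(statements):
    """Translate qualified (property, value, scheme) tuples to simple DC pairs.

    Refinements collapse to their parent element, encoding schemes are
    discarded, and properties with no simple-DC parent are dropped.
    """
    simple = []
    for prop, value, _scheme in statements:
        parent = REFINES.get(prop, prop)
        if parent in SIMPLE_DC:
            simple.append((parent, value))
    return simple


qualified = [
    ("title", "DC Metadata Tutorial", None),
    ("created", "2007-07-08", "W3CDTF"),
    ("isReplacedBy", "http://example.org/tutorial-v2", "URI"),  # hypothetical URI
    ("audience", "technical librarians", None),
]
print(dumb_down(qualified))
```

The value strings survive unchanged, which is why a dumbed-down record should still be usable for discovery.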
  • 31. The One-to-One Principle
    - Each resource should have one metadata description
      • For example, do not describe a digital image of the Mona Lisa as if it were the original painting
    - Group related descriptions into description sets
      • Describe an artist and his or her work separately, not in a single description
  • 32. Appropriate Values
    - There are generally tradeoffs between local requirements and global requirements
    - Use elements and qualifiers to meet the needs of your local context, but…
    - Keep in mind that machines and people use and interpret metadata, so…
    - Consider whether the values used will help discovery outside your local context
  • 33. Dublin Core as a Multilingual Metadata Language
    - Dublin Core has been translated into 20+ languages
      • machine-readable tokens are shared by all
      • human-readable labels are defined in different languages
      • translations are distributed and maintained in many countries
      • eventually linked in the DCMI registry
  • 34.
  • 35. One token, labels in many languages
    - [Diagram: the token dc:creator carries the label "Verfasser" on a server in Germany, "Pencipta" on a server in Jakarta, and "Creator" on the DCMI server]
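The token/label split can be sketched as a lookup table: records store only the shared token, and each interface chooses a label for its reader's language. The `LABELS` table and the English fallback are hypothetical; in practice the labels would come from a registry.

```python
# Hypothetical label table: one machine-readable token, labels per language.
LABELS = {
    "dc:creator": {"en": "Creator", "de": "Verfasser", "id": "Pencipta"},
}


def label_for(token: str, lang: str, fallback: str = "en") -> str:
    """Return the human-readable label for a token, falling back to English."""
    labels = LABELS[token]
    return labels.get(lang, labels[fallback])


for lang in ("de", "id", "fi"):
    print(lang, label_for("dc:creator", lang))
```

The stored metadata never changes; only the presentation layer varies by language.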
  • 36. Metadata languages are "multilingual"
    - Metadata is not a spoken language
    - The words of metadata ("elements") are symbols that stand for concepts expressible in multiple natural languages
    - Standards may have dozens of translations
    - Are concepts like "title", "author", or "subject" used the same way in English, Finnish, and Korean?
  • 37. DCMI Open Metadata Registry
    - Manages vocabularies defined by DCMI
      • Languages
      • Versioning
      • Controlled vocabularies
    - Foundation for modular, incremental integration and evolution
    - The Registry working group is a Dublin Core community with participants around the world
  • 38. The Dublin Core Abstract Model
    - Terminology
    - Simple versus Qualified DC
    - Resources
    - Descriptions
    - Description sets
    - Value strings
    - Element refinements
    - Encoding schemes
    - Graphical representation of the Abstract Model
    - Summary of general ideas
  • 39. Important DCMI Documents on the Abstract Model and Syntax Alternatives
    - DCMI Abstract Model: http://dublincore.org/documents/abstract-model/
    - Expressing Dublin Core in HTML/XHTML meta and link elements: http://dublincore.org/documents/dcq-html/
    - Expressing Dublin Core metadata using the Resource Description Framework (RDF): http://dublincore.org/documents/dc-rdf/
    - Expressing Dublin Core metadata using XML: http://dublincore.org/documents/dc-xml/
  • 40. Simple versus Qualified DC
    - Simple DC supports single descriptions using the 15 base elements and value strings
    - Qualified DC supports the richer features of the Abstract Model and allows the use of all DCMI terms, as well as other, non-DCMI terms
    - An application profile specifies a metadata application that combines DCMI terms with non-DCMI terms (mix-and-match metadata)
  • 41. The DCMI Abstract Model
    - A data model for Dublin Core
    - An agreed-upon underlying structure for metadata statements
    - Many years in the making (long-term contention)
    - Describes the structure of the statements about resources that we make in our metadata language
    - [Diagram: a resource is described by a statement made up of a property and a value]
  • 42. What is a resource?
    - W3C definition:
      • "anything that has identity… electronic document, an image, a service"
      • "not all resources are network retrievable; e.g. human beings, corporations, and bound books can also be considered resources"
    - In other words, a resource is anything we can identify:
      • Physical things (books, people, airplanes…)
      • Digital things (images, web pages, services…)
      • Concepts (colors, subjects, eras, places)
    - In the DC context, the DCMI Type Vocabulary lists the kinds of resources we describe with DC metadata
  • 43. Resource types for which DC is often used (DCMI Type Vocabulary)
    - Collection, Dataset, Event, Image, Interactive Resource, Moving Image, Physical Object, Service, Software, Sound, Still Image, Text
  • 44. Abstract Model: Descriptions
    - A description is composed of:
      • one or more statements about a single resource
      • optionally, the URI of the resource being described
    - Each statement is made up of:
      • a property URI (that identifies a property)
      • a value URI (that identifies a value) and/or one or more representations of the value (value strings)
  • 45. Terminology: Value Strings
    - A value string is a human-readable string that represents the value of the property
    - Each value string may have an associated value string language that is an ISO language tag (e.g., pt-BR)
  • 46. Terminology: Element Refinements
    - Elements are the same as properties
    - Element refinements are the same as sub-properties
    - An element refinement is a special case of an element that shares the meaning of its 'parent' but has narrower semantics
      • Paulo is the illustrator of a book; therefore he is also a contributor to the book
      • Illustrator is an element refinement of Contributor
  • 47. Terminology: Encoding Schemes
    - Values and value strings can be 'qualified' by encoding schemes to clarify their meaning
      • A vocabulary encoding scheme indicates the terminology set from which a value is taken: "Stem cells--Research" is a value from LCSH; "616.02774" is a value from DDC-22
      • A syntax encoding scheme indicates the structure of a value string: "2004-10-12" is structured according to the W3CDTF rules for date encoding
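Software that knows a syntax encoding scheme can check values against it. The sketch below validates only the three date-only granularities of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD); the time-of-day forms are deliberately left out, so treat it as a partial, illustrative checker rather than a full W3CDTF implementation.

```python
import re
from datetime import date

# Date-only W3CDTF forms: YYYY, YYYY-MM, YYYY-MM-DD.
# Time-of-day forms (e.g. 1997-07-16T19:20+01:00) are omitted in this sketch.
W3CDTF_DATE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_w3cdtf_date(value: str) -> bool:
    """Return True if value is a calendar-valid, date-only W3CDTF string."""
    m = W3CDTF_DATE.fullmatch(value)
    if m is None:
        return False
    year = int(m.group(1))
    month = int(m.group(2) or "1")
    day = int(m.group(3) or "1")
    try:
        date(year, month, day)  # rejects month 13, February 30, year 0000, ...
    except ValueError:
        return False
    return True


for v in ("2004-10-12", "2004-10", "2004", "2004-13-01", "2004-02-30", "12/10/2004"):
    print(v, is_w3cdtf_date(v))
```

The regular expression checks the shape; delegating to `datetime.date` checks that the shape names a real calendar day.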
  • 48. Terminology: Description Sets
    - The 1:1 principle dictates that each description describes one, and only one, resource
    - We often need to describe grouped sets of descriptions, which are known in the abstract model as description sets
      • An article and its authors
      • A painting and its artist
    - When description sets are exchanged between software applications, they are generally encoded according to a particular syntax in a metadata record
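Descriptions, statements, and description sets can be sketched as plain data classes. In this hypothetical model the article and its author get separate descriptions (the 1:1 principle), linked through the author's value URI. All example.org URIs and the article title are invented for illustration, `foaf:name` is borrowed from the FOAF vocabulary, and encoding schemes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DC = "http://purl.org/dc/elements/1.1/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"  # borrowed from FOAF for illustration


@dataclass
class Statement:
    property_uri: str
    value_uri: Optional[str] = None     # identifies the value, if it has a URI
    value_string: Optional[str] = None  # human-readable representation
    language: Optional[str] = None      # value string language, e.g. "pt-BR"


@dataclass
class Description:
    resource_uri: Optional[str]  # the described resource's URI is optional
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    descriptions: List[Description] = field(default_factory=list)

    def find(self, uri: str) -> Optional[Description]:
        """Return the description whose resource URI matches, if any."""
        return next((d for d in self.descriptions if d.resource_uri == uri), None)


# One description per resource, linked by the author's URI (all hypothetical).
article = Description("http://example.org/articles/42", [
    Statement(DC + "title", value_string="Metadata for Everyone", language="en"),
    Statement(DC + "creator", value_uri="http://example.org/people/paulo"),
])
author = Description("http://example.org/people/paulo", [
    Statement(FOAF_NAME, value_string="Paulo"),
])
dset = DescriptionSet([article, author])

creator_uri = next(s.value_uri for s in article.statements
                   if s.property_uri == DC + "creator")
print(dset.find(creator_uri).statements[0].value_string)
```

Keeping the author in a separate description lets other articles point at the same person without repeating the details.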
  • 49. Abstract Model summary (after Andy Powell)
    - [Diagram: a record (encoded as HTML, XML, or RDF/XML) contains a description set; the description set contains descriptions, each about a resource identified by a URI; each description contains statements; each statement has a property (URI) and a value URI and/or value strings; a value string may carry a language (e.g., pt-BR) and a syntax encoding scheme; a value may carry a vocabulary encoding scheme]
  • 50. General Ideas
    - DC is not just the 15 elements, though they form the foundation for Simple DC
    - 50+ properties (elements) have been approved by DCMI
    - The model supports local declarations of additional properties
    - The model supports application profiles (mixing DC elements with those of other sets)
    - The model allows descriptions to be grouped into more complex description entities
  • 51. Syntax Alternatives
    - Choosing among alternatives
    - HTML
    - XML
    - RDF/XML
  • 52. Syntax Alternatives: HTML… XML… RDF/XML
    - Three Web-based models for deploying metadata
    - Each has advantages and disadvantages
    - What is 'best' depends on local constraints
      • What is the objective of the system? How do these syntax alternatives support local functional requirements?
      • Are there services and software to 'consume' the metadata created?
      • Are trained practitioners available to create and support the systems?
  • 53. Syntax Alternatives: HTML
    - Advantages:
      • Simple: META tags embedded in content
      • Widely deployed tools and knowledge
      • The resource carries its metadata around with it
      • Metadata is openly harvestable
  • 54. Syntax Alternatives: HTML (continued)
    - Disadvantages:
      • Limited structural richness (does not support hierarchical, tree-structured data)
      • Management of metadata is less reliable (the metadata is out in the wild)
      • Describes one thing (the HTML document) and no more!
  • 55. Dublin Core in HTML (example)
    <head>
      <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
      <link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">
      <meta name="DC.title" content="DC Metadata Tutorial">
      <meta name="DC.creator" content="Stuart L. Weibel">
      <meta name="DC.subject" xml:lang="en-US" content="Metadata">
      <meta name="DC.date" scheme="DCTERMS.W3CDTF" content="2007-07-08">
      <meta name="DCTERMS.audience" content="technical librarians">
    </head>
    <body>
    … [rest of HTML document]
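Embedded HTML metadata like the example above is often generated rather than hand-typed, which avoids the unbalanced quotes and unclosed tags that creep into manual markup. A minimal Python sketch follows, assuming statements arrive as hypothetical `(name, content, extra_attributes)` tuples; the schema links follow the DC-HTML convention shown above, and every attribute value is escaped with `html.escape`.

```python
from html import escape

# Schema links declaring the DC and DCTERMS prefixes used in meta names.
SCHEMA_LINKS = [
    '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">',
    '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">',
]


def dc_meta_tags(statements):
    """Render (name, content, extra_attrs) tuples as <meta> elements.

    `name` is a prefixed term such as "DC.title"; `extra_attrs` holds optional
    attributes such as {"scheme": "DCTERMS.W3CDTF"}. All values are escaped.
    """
    lines = list(SCHEMA_LINKS)
    for name, content, extra in statements:
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in extra.items())
        lines.append(f'<meta name="{escape(name)}"{attrs} content="{escape(content)}">')
    return "\n".join(lines)


head = dc_meta_tags([
    ("DC.title", "DC Metadata Tutorial", {}),
    ("DC.date", "2007-07-08", {"scheme": "DCTERMS.W3CDTF"}),
    ("DC.publisher", "Programs & Research", {}),  # hypothetical value with "&"
])
print(head)
```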
  • 56. The namespaces for HTML encoding
    - All DCMI terms (elements, element refinements, and encoding schemes) are found in DCMI Metadata Terms: http://dublincore.org/documents/dcmi-terms/
    - The namespaces are a result of historical developments
      • DC: [original elements]
      • DCTERMS: [later elements]
  • 57. Syntax Alternatives: XML
    - XML = eXtensible Markup Language
    - The standard for networked text and data
    - Widespread tool support
      • Parsers are widely available
      • Extensibility (XML namespaces)
      • Type definitions (XML Schema)
      • Transformation and rendering (XSLT)
      • Rich linking semantics (XLink)
  • 58. XML Schema
    - Rich XML-based language for expressing data-type semantics
    - Replaces the arcane and limited DTD (which originated in SGML)
    - Facilities:
      • Data typing (both complex and primitive)
      • Constraints (ranges, cardinality…)
      • Defaults (specify defaults for certain properties)
  • 59. Dublin Core fragment in XML
    <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Carl Lagoze</dc:creator>
      <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
      <dc:date>2000-07-01</dc:date>
      <dc:publisher>Cornell University, Computer Science</dc:publisher>
    </metadata>
    Where is the rest of the stuff? In the schema!
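Because DC elements in XML are identified by their namespace rather than by the `dc:` prefix, a consumer should match on the namespace URI. A short sketch with Python's standard `xml.etree.ElementTree`, which spells namespaced tags as `{uri}localname`, reads the fragment above into a dictionary of repeatable values.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

xml_text = """<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Carl Lagoze</dc:creator>
  <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
  <dc:date>2000-07-01</dc:date>
  <dc:publisher>Cornell University, Computer Science</dc:publisher>
</metadata>"""


def dc_elements(xml_string: str) -> dict:
    """Collect the text of every DC-namespaced child, keyed by local name."""
    root = ET.fromstring(xml_string)
    prefix = "{" + DC_NS + "}"  # ElementTree's spelling of the namespace
    record = {}
    for child in root:
        if child.tag.startswith(prefix):
            # Lists, because every DC element may repeat.
            record.setdefault(child.tag[len(prefix):], []).append(child.text)
    return record


print(dc_elements(xml_text))
```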
  • 60. Case Study: OAI-PMH (OAI Protocol for Metadata Harvesting)
    - Open Archives Initiative: http://www.openarchives.org
      • Simple protocol for sharing metadata records
    - Based on HTTP, XML, XML Schema, and XML namespaces
    - Allows a harvester to query a remote repository for some or all of its metadata records
    - Simple DC (the oai_dc format) is the baseline metadata format that every OAI-PMH repository must support
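A harvester's core loop can be sketched in a few lines: request `ListRecords` with `metadataPrefix=oai_dc`, extract the records, and follow the `resumptionToken` (which, when present, replaces the other arguments) until none is returned. To stay self-contained, this sketch makes no network calls; the endpoint is hypothetical and the response is a heavily trimmed, hand-written sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (titles, resumption token or None) from one ListRecords response."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iter(DC + "title")]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Trimmed sample response; real responses also carry headers, datestamps, etc.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Accommodating Simplicity and Complexity in Metadata</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

base = "http://example.org/oai"  # hypothetical repository endpoint
print(list_records_url(base))
titles, token = parse_page(SAMPLE)
print(titles, token)
print(list_records_url(base, token))
```

A real harvester would fetch each URL (for example with `urllib.request.urlopen`) and stop when `parse_page` returns no token.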
  • 61. Syntax Alternatives: RDF
    - RDF (Resource Description Framework)
    - Syntax expressed in XML (RDF/XML)
    - W3C recommendation for encoding metadata (a Semantic Web technology)
    - Enabling technology for richly structured metadata
    - Rich data model (the DC Abstract Model is a constrained version of RDF)
    - Metadata can be shared easily among independent applications that understand RDF
    - W3C Resource Description Framework (RDF): http://www.w3.org/RDF/
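To make the RDF data model concrete without an RDF library, the sketch below writes DC statements as N-Triples, a simple line-based W3C serialization of RDF chosen here instead of RDF/XML because it shows each subject-property-value triple on its own line. The resource URI is hypothetical, and only the literal escapes this example needs are handled.

```python
DC = "http://purl.org/dc/elements/1.1/"


def nt_literal(text, lang=None):
    """Quote a string as an N-Triples literal, with an optional language tag."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(subject_uri, statements):
    """Render (element, value, lang) tuples about one resource as N-Triples lines."""
    return "\n".join(
        f"<{subject_uri}> <{DC}{element}> {nt_literal(value, lang)} ."
        for element, value, lang in statements
    )


doc = to_ntriples("http://example.org/tutorial", [  # hypothetical resource URI
    ("title", "DC Metadata Tutorial", "en"),
    ("creator", "Stuart L. Weibel", None),
])
print(doc)
```

Each line is one statement from the abstract model: the described resource, a property URI, and a value.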
  • 62. Summary: Syntax Alternatives
    - Choices should be driven by local requirements and objectives
      • Available expertise
      • Costs of deployment
      • Objectives and functional requirements
  • 63. Association Models: Where do we keep the metadata?
    - Embedded
      • HTML META tags, XML, or RDF/XML can be embedded in the resource, and hence travel with it
      • Simple, but limited in structural richness
    - Loosely coupled
      • Shadow files (like Adobe's XMP sidecar files)
      • Requires a system to manage them and ensure that they stay in sync
      • RDF or XML descriptions
    - Third-party metadata
      • Stored in repositories such as library catalogs
      • Easier to manage and maintain, and to provide services from
  • 64. Questions about syntax alternatives?
  • 65. Application Profiles: Mixing and Matching Metadata
    - What is an application profile?
    - Why bother?
    - Creating new properties
    - Documenting and declaring new properties
    - Some examples
  • 66. Application Profiles: Mixing and Matching Metadata
    - The mixing and matching of elements (properties) from separate metadata sets
    - An expression of metadata modularity
    - Implementers can benefit from peer applications
    - Communities can harmonize their metadata, picking complementary properties
    - Promotes convergence over time
    - For application profiles to work, there must be public declarations of properties that conform to a common data model (or nearly so)
  • 67. Application Profile: Definition
    - Declaration of the metadata properties used in a given organization, application, or community
    - Documentation of encodings, constraints, and creation guidelines
    - Implies formal schemas (XML Schemas or RDF Schemas)
    - Should promote both human understanding and machine interoperability
    - The concept of application profiles applies to any metadata community of practice, not just DC
    - DC has promoted their use and leads by example
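One way to see an application profile's declarations and constraints at work is as machine-checkable rules. This hypothetical sketch mixes DC, DCTERMS, and a local namespace (its `cloudCover` property echoes the cloud-cover example from slide 13) and checks records for unknown, missing, or over-repeated properties. Real profiles are documented as described above, often with XML or RDF schemas; this dictionary format is invented for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
LOCAL = "http://example.org/terms/"  # hypothetical local namespace

# Hypothetical profile: property URI -> (required, repeatable).
PROFILE = {
    DC + "title": (True, False),
    DC + "creator": (False, True),
    DCTERMS + "audience": (False, True),
    LOCAL + "cloudCover": (False, False),
}


def validate(record, profile=PROFILE):
    """Check a record (property URI -> list of values) against a profile.

    Returns human-readable problems; an empty list means the record conforms.
    """
    problems = [f"not in profile: {p}" for p in record if p not in profile]
    for prop, (required, repeatable) in profile.items():
        count = len(record.get(prop, []))
        if required and count == 0:
            problems.append(f"missing required: {prop}")
        if not repeatable and count > 1:
            problems.append(f"not repeatable: {prop}")
    return problems


good = {DC + "title": ["DC Metadata Tutorial"],
        DCTERMS + "audience": ["technical librarians"]}
bad = {DC + "title": ["A", "B"], LOCAL + "unknown": ["x"]}
print(validate(good))
print(validate(bad))
```

Because the profile names properties by full URI, terms from different vocabularies can sit side by side without colliding.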
  • 68. Why bother?
    - One-size-fits-all metadata results in bloated, unmanageable specifications and applications
    - APs allow implementers to tailor a metadata application, matching the element set to specific functional requirements based on local or community needs, while retaining interoperability with a larger metadata community
  • 69. Creating an Application Profile
    - Find out what others have done… don't reinvent wheels!
    - Develop community consensus
    - Define names, labels, definitions, and relationships (see the DCMI Usage Board guidelines)
    - Determine an appropriate URI (a home on the Web)
    - Dublin Core Application Profile Guidelines: http://dublincore.org/usage/documents/profile-guidelines/
  • 70. Document New Properties
    - At the very least: a Web page with relevant information
    - Better: a Web page with a public schema using the new terms in an application profile
    - Better still: all properties available as part of a metadata registry
  • 71. Example Application Profiles
    - DC-Library AP
    - DC-Collection Description AP
    - DC-Government AP
    - DC-Education AP
  • 72. Some History of the Dublin Core and How the Initiative Works
    - The beginnings
    - Landmarks
    - Workshop and conference series
    - What the initiative does
    - Standardization
    - Some example applications
  • 73. Dublin Core: The Beginning
    - A casual discussion at WWW-2 in Chicago, October 1994
      • How to make things on the Web easier to find?
    - OCLC and NCSA co-sponsored an invitational workshop in March 1995
    - The workshop became a workshop series, and eventually a conference series
    - DCMI: Dublin Core Metadata Initiative
      • Governance and process evolved over time
      • De facto standards maintenance body
  • 74. Dublin Core Landmarks
    - 1994: Simple tags to describe Web pages
    - 1995: The Dublin Core is one of many vocabularies needed (the "Warwick Framework")
    - 1996: The Dublin Core expands from 13 to 15 elements, appropriate for text and images
    - 1997: The Warwick Framework needs formal expression in a Resource Description Framework (RDF)
  • 75. Dublin Core Landmarks (continued)
    - 2000: The Dublin Core Metadata Initiative recommends qualifiers and broadens its organizational scope beyond the Core
    - 2001: The workshop series becomes a conference series
      • DCMI Affiliates and a Board of Trustees
    - 2005: Abstract Model (finally)
  • 76. The Dublin Core Workshop Series
    - Workshop venues:
      • US: DC-1, DC-3, DC-6
      • UK: DC-2
      • Australia: DC-4
      • Finland: DC-5
      • Germany: DC-7
      • Canada: DC-8
    - Conferences:
      • Tokyo (2001)
      • Florence (2002)
      • Seattle (2003)
      • China (2004)
      • Spain (2005)
      • Mexico (2006)
  • 77.
  • 78. DCMI Activities
    - Standards development and maintenance
    - Metadata registry and infrastructure
    - Technical working groups and periodic workshops
    - Tutorial materials and user guides
    - Education and training
    - Open-source software
    - Liaisons with other standards or user communities
  • 79. Governance of DCMI
 DCMI has a Board of Trustees that oversees the operation and goals of the initiative
 Managing Director: Makx Dekkers
 Director of Specifications and Documentation: Tom Baker
 An Advisory Board of metadata experts provides guidance on metadata issues
  • 80. The DCMI Usage Board
 The Usage Board is an editorial committee that evaluates proposals for new elements or for revisions to existing ones
 An international selection of metadata experts
 Meets twice a year
 Documents its decisions and updates the DCTERMS document
  • 81. Affiliate Program
 DCMI has National Affiliates, which support the Initiative and are represented on the Board of Trustees
• Finland
• UK
• Singapore
• New Zealand
• Korea
 OCLC has been the host from the start
  • 82. The Three I’s
 Independent: DCMI is not controlled by specific commercial or other interests, is not biased towards specific domains, and does not mandate specific technical solutions
 International: DCMI encourages participation from organizations anywhere in the world, respecting linguistic and cultural differences
 Influenceable: DCMI is an open organization that aims to build consensus among the participating organizations; there are no prerequisites for participation
  • 83. The Work Gets Done by Communities and Task Groups
 Accessibility Community
 Collection Description Community
 Education Community
 Environment Community
 Global Corporate Circle
 Government Community
 Kernel Community
 Libraries Community
 Localization and Internationalization Community
 Preservation Community
 Registry Community
 Social Tagging Community
 Standards Community
 Tools Community
  • 84. Standardization of the Dublin Core
 IETF RFC 2413
• http://www.ietf.org/rfc/rfc2413.txt
 CEN Workshop Agreement (Europe)
• Endorses the Dublin Core elements as CWA 13874
 NISO Z39.85
• National Information Standards Organization, an ANSI affiliate
 ISO 15836
  • 85. Metadata Applications: Examples
 Governments
• 7 governments have adopted DC metadata
 Adobe products
• XMP: Adobe’s variant of RDF
• Dublin Core is a base schema
 IPTC (International Press Telecommunications Council)
• Dublin Core-based standard for journalism
 Knowledge management systems commonly use DC metadata
 Visual materials require metadata for findability
 Library systems (mostly MARC cataloging, but increasingly other metadata as well)
  • 86. Metadata Applications (continued)
 Search systems
• Full-text indexing is enormously useful
• Structured metadata improves search
• The "Amazoogles" are all aggressively courting metadata aggregators
 Cameras
• Automatically create metadata for each image
• Some even include GPS data
 Commerce systems require metadata
 Social software applications are largely about enriching resource information with tags, reviews, and automated linking
  • 87. To Sum Up…
 Many purpose-built metadata standards
 Few have explicit data models
 Few interoperate
 Some will survive, others will not
 The Web demands convergence
• Break down silos between domains and communities of practice
• RDF should help promote convergence, but we are not there yet
 Expect more metadata standards, but hope for fewer
  • 88. How to Participate
 Join the DC-General mailing list
 Join a working group
 Information on lists and working groups is available at http://dublincore.org
  • 89. Stuart L. Weibel
Visit me at: http://weibel-lines.typepad.com
Contact me at: Weibel@oclc.org
Thank you for your attention
