Everything you wanted to know about Dublin Core metadata
Tutorial for Eduserv staff, Bath, Thursday 24 April 2008

Usage Rights

CC Attribution License

Everything you wanted to know about Dublin Core metadata Presentation Transcript

  • 1. Everything you wanted to know about Dublin Core metadata but… Tutorial for Eduserv Staff, Bath
  • 2. Dublin Core, the DCMI Abstract Model & DC Application Profiles
    • Title slide photo “Orange flavour” by Flickr user kinderu. See http://www.flickr.com/photos/kinderu/2328911735/ (made available under CC Attribution-NonCommercial-Share-Alike 2.0 license)
  • 3. Show of Hands
    • Designing/developing metadata applications?
    • Designing/developing Dublin Core metadata applications?
    • Heard of the DCMI Abstract Model?
    • Read the DCMI Abstract Model?
    • Familiar with RDF?
    • Designing/developing RDF applications?
  • 4. Everything you wanted to know about Dublin Core metadata but...
    • “Dublin Core” in c.2003
    • The DCMI Abstract Model
      • DCAM & RDF
    • Syntax: “Encoding” Dublin Core metadata
    • Context & Constraints: “DC Application Profiles” & the Singapore Framework
    • Example: The Scholarly Works (ePrints) DC Application Profile (SWAP)
    • “Dublin Core” in 2008
  • 5. “Dublin Core” in c.2003
    • Metadata vocabularies
      • … but what is a DC “element”?
    • Syntax independence & encoding guidelines
      • … but what are we “encoding”?
    • “Simple” and “Qualified” DC
      • … vocabularies?
      • … formats? (e.g. oai_dc)
      • … constraints on use of vocabularies? On which vocabularies?
    • DC application profiles
      • … “(re)using” terms? But what “terms” can we “(re)use”?
    • Grammatical Principles
    • DC & the Resource Description Framework
    • Absence of domain model(s)
      • the “checklist” approach to DC implementation
  • 6. The DCMI Abstract Model
  • 7. The DCMI Abstract Model
    • Work on DCAM by DCMI Architecture Community from mid-2003, initiated by Andy Powell
    • Initial Version, DCMI Recommendation, 2005-03-07
    • Second Version, DCMI Recommendation, 2007-06-04
      • http://dublincore.org/documents/2007/06/04/abstract-model/
  • 8. DCAM and Resources
    • DCAM concerned with description of resources
    • DCAM adopts Web Architecture/RFC3986 definition of resource
      • the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP to SMS gateway), a collection of other resources, and so on.
      • A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources.
      • Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).
      • RFC3986 URI Syntax
  • 9. The DCMI Abstract Model
    • DCAM describes
      • Components and constructs that make up an information structure (“DC description set”)
      • How that information structure is to be interpreted
    • Made up of three related “information models”
      • Resource model
      • Description set model
      • Vocabulary model
  • 10. The DCMI Abstract Model
    • DCAM describes an information structure called a “description set”…
    • … but does not describe how to represent DC description set in concrete form
      • DCMI-defined “Encoding guidelines”
      • Formats defined by others, e.g. Eprints DC-XML
    • DCAM describes various types of metadata term…
    • … but does not specify the use of any fixed set of terms
      • DCMI-owned metadata vocabularies
      • Vocabularies owned/defined by other agencies
  • 11. DCAM Resource Model
  • 12. DCAM Resource Model
    • The “view of the world” on which DC metadata is based
    • a described resource is described using one or more property-value pairs
    • a property-value pair is made up of
      • exactly one property and
      • exactly one value
    • a value is a resource
    • a value is either a literal value or a non-literal value
    • i.e. similar to RDF model of binary relations between resources; simplified entity-relational model
  • 13. DCAM Description Set Model
  • 14. DCAM Description Set Model
    • The structure of “DC metadata”
    • Uses URIs to refer to resources described & to metadata terms (like RDF)
    • a description set is made up of one or more descriptions , each of which describes one resource
    • a description is made up of
      • zero or one described resource URI
        • identifies described resource
      • one or more statements
    • a statement is made up of
      • exactly one property URI
        • identifies property
      • exactly one value surrogate
    • a value surrogate is either a literal value surrogate or a non-literal value surrogate
  • 15. [Diagram: a description set contains descriptions; each description has a resource URI and one or more statements; each statement has a property URI and either a literal value surrogate or a non-literal value surrogate]
  • 16. DCAM Description Set Model
    • a literal value surrogate is made up of
      • exactly one value string
        • encodes value
    • a non-literal value surrogate is made up of
      • zero or one value URIs
        • identifies value
      • zero or one vocabulary encoding scheme URI
        • identifies a set of which the value is a member
      • zero or more value strings
        • represents value
    • a value string is either a plain value string or a typed value string
      • a plain value string may have an associated value string language
      • a typed value string is associated with a syntax encoding scheme URI
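The description set model on the two slides above can be sketched as a handful of Python dataclasses. This is only an illustration of the cardinalities listed in the bullets, not a DCMI-defined API: all class and field names are assumptions made for this sketch, and the sample values are taken from the example on slide 24.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class ValueString:
    text: str
    language: Optional[str] = None                    # plain value strings only
    syntax_encoding_scheme_uri: Optional[str] = None  # typed value strings only

    def __post_init__(self) -> None:
        # A value string is either plain (optional language) or typed, never both.
        if self.language and self.syntax_encoding_scheme_uri:
            raise ValueError("a value string is either plain or typed")


@dataclass
class LiteralValueSurrogate:
    value_string: ValueString  # exactly one value string


@dataclass
class NonLiteralValueSurrogate:
    value_uri: Optional[str] = None                       # zero or one
    vocabulary_encoding_scheme_uri: Optional[str] = None  # zero or one
    value_strings: List[ValueString] = field(default_factory=list)  # zero or more


@dataclass
class Statement:
    property_uri: str  # exactly one
    value_surrogate: Union[LiteralValueSurrogate, NonLiteralValueSurrogate]


@dataclass
class Description:
    statements: List[Statement]         # one or more
    resource_uri: Optional[str] = None  # zero or one

    def __post_init__(self) -> None:
        if not self.statements:
            raise ValueError("a description needs at least one statement")


@dataclass
class DescriptionSet:
    descriptions: List[Description]  # one or more

    def __post_init__(self) -> None:
        if not self.descriptions:
            raise ValueError("a description set needs at least one description")


# The dcterms:subject statement from the example on slide 24.
subject = Statement(
    property_uri="http://purl.org/dc/terms/subject",
    value_surrogate=NonLiteralValueSurrogate(
        value_uri="http://example.org/mySH/h123",
        vocabulary_encoding_scheme_uri="http://example.org/terms/mySH",
        value_strings=[
            ValueString("Metadata", language="en"),
            ValueString("Métadonnées", language="fr"),
        ],
    ),
)
example = DescriptionSet([
    Description([subject],
                resource_uri="http://dublincore.org/documents/abstract-model/"),
])
```

The constructors enforce only the "one or more" and "plain or typed" rules; everything else is expressed through the field types.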
  • 17. [Diagram: the same structure with value surrogate detail added: a non-literal value surrogate carries a value URI, a vocabulary encoding scheme URI and value strings; each value string carries either a language or a syntax encoding scheme URI]
  • 18. DCAM Description Set Model
    • a value may be described by another description
  • 19. [Diagram: the description set from slide 17, in which the value URI of a non-literal value surrogate in one description matches the resource URI of another description, so that the second description describes the value]
  • 20. [Diagram: variant of the previous description set in which one description has no resource URI and one non-literal value surrogate has no value URI]
  • 21. Description sets & RDBMS
    • Not a perfect analogy, but…
      • URIs as keys to tables
      • Each description = (roughly) a row in a table
      • Each statement = (roughly) a field name and field content
      • Each literal value surrogate = field content (possibly typed)
      • Each non-literal value surrogate = secondary key (value URI) and/or data from secondary table
      • Description set = some set of rows which it is useful to bundle together
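The analogy can be tried out with an in-memory SQLite database. This is a rough sketch only: the table and column names are invented for the illustration, and it stores one row per statement (rather than one row per description) to keep the schema short. The value URI of the publisher statement acts as the secondary key into the description of the publisher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE statement (
        resource_uri  TEXT NOT NULL,  -- key: the described resource
        property_uri  TEXT NOT NULL,  -- roughly the field name
        literal_value TEXT,           -- literal value surrogate (field content)
        value_uri     TEXT            -- non-literal value surrogate (secondary key)
    )
    """
)

rows = [
    ("http://dublincore.org/documents/abstract-model/",
     "http://purl.org/dc/terms/publisher",
     None,
     "http://example.org/org/DCMI"),
    ("http://example.org/org/DCMI",
     "http://xmlns.com/foaf/0.1/name",
     "Dublin Core Metadata Initiative",
     None),
]
conn.executemany("INSERT INTO statement VALUES (?, ?, ?, ?)", rows)

# Follow the value URI of the publisher statement to the description of the
# publisher and read its name from there.
publisher_names = conn.execute(
    """
    SELECT name.literal_value
    FROM statement AS pub
    JOIN statement AS name ON name.resource_uri = pub.value_uri
    WHERE pub.resource_uri = ?
      AND pub.property_uri = ?
      AND name.property_uri = ?
    """,
    (
        "http://dublincore.org/documents/abstract-model/",
        "http://purl.org/dc/terms/publisher",
        "http://xmlns.com/foaf/0.1/name",
    ),
).fetchall()
```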
  • 22. [Diagram] Example: description set with two descriptions, statements with non-literal value surrogates & literal value surrogates
  • 23. [Diagram] Example: description set with two descriptions, statements with non-literal value surrogates & literal value surrogates, populated with the property URIs, value URIs, vocabulary encoding scheme URI and value strings written out in DC-Text on slide 24
  • 24. @prefix dcterms: <http://purl.org/dc/terms/> .
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        DescriptionSet (
          Description (
            ResourceURI ( <http://dublincore.org/documents/abstract-model/> )
            Statement (
              PropertyURI ( dcterms:publisher )
              ValueURI ( <http://example.org/org/DCMI> )
            )
            Statement (
              PropertyURI ( dcterms:subject )
              ValueURI ( <http://example.org/mySH/h123> )
              VocabEncSchemeURI ( <http://example.org/terms/mySH> )
              ValueString ( "Metadata" Language ( en ) )
              ValueString ( "Métadonnées" Language ( fr ) )
            )
          )
          Description (
            ResourceURI ( <http://example.org/org/DCMI> )
            Statement (
              PropertyURI ( foaf:name )
              LiteralValueString ( "Dublin Core Metadata Initiative" Language ( en ) )
            )
          )
        )
  • 25. DCAM & RDF
  • 26. DCAM & RDF
    • A history of co-evolution
    • DCAM grounded in concepts of RDF
      • i.e. assertions of binary relationships between resources
      • (rather informally!) shares RDF Semantics
      • basis for merging, inferencing
      • DCAM Vocabulary Model is RDF Schema
    • Doesn’t explicitly use “description model” of RDF (triple, graph)
    • Mapping from DCAM description model to RDF graph provided by “Expressing DC metadata using RDF”, DCMI Recommendation, 2008-01-14
    • (Tentative) plans to revise DCAM to base more formally on RDF model
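To make the DCAM-to-RDF mapping concrete, the following sketch turns the statements of one description into RDF triples written as N-Triples-style strings. It is an approximation under stated assumptions: it assumes the mapping uses dcam:memberOf for the vocabulary encoding scheme and rdf:value for value strings, it requires the description to have a resource URI, and it leaves out typed value strings. The dict layout and the function names are invented for this example.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
DCAM_MEMBER_OF = "http://purl.org/dc/dcam/memberOf"


def uri(value):
    """Write a URI reference in N-Triples form."""
    return f"<{value}>"


def literal(text, language=None):
    """Write a plain literal, with an optional language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{language}" if language else "")


def description_to_triples(resource_uri, statements):
    """Map the statements of one description to (subject, predicate, object)."""
    triples = []
    subject = uri(resource_uri)
    for index, statement in enumerate(statements):
        predicate = uri(statement["property"])
        if "literal" in statement:
            # Literal value surrogate: the value string becomes the object.
            text, language = statement["literal"]
            triples.append((subject, predicate, literal(text, language)))
            continue
        # Non-literal value surrogate: the value is a node of its own,
        # a blank node when no value URI is given.
        value_uri = statement.get("value_uri")
        value = uri(value_uri) if value_uri else f"_:value{index}"
        triples.append((subject, predicate, value))
        scheme = statement.get("vocab_enc_scheme")
        if scheme:
            triples.append((value, uri(DCAM_MEMBER_OF), uri(scheme)))
        for text, language in statement.get("value_strings", ()):
            triples.append((value, uri(RDF_VALUE), literal(text, language)))
    return triples


example_triples = description_to_triples(
    "http://dublincore.org/documents/abstract-model/",
    [
        {
            "property": "http://purl.org/dc/terms/subject",
            "value_uri": "http://example.org/mySH/h123",
            "vocab_enc_scheme": "http://example.org/terms/mySH",
            "value_strings": [("Metadata", "en"), ("Métadonnées", "fr")],
        }
    ],
)
```

The single subject statement yields four triples: one linking the document to the value, one giving the value's membership of the scheme, and one per value string.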
  • 27. Further references
    • Powell, Nilsson, Naeve, Johnston, Baker. DCMI Abstract Model http://dublincore.org/documents/2007/06/04/abstract-model/
    • Klyne, Carroll. RDF Concepts and Abstract Syntax http://www.w3.org/TR/rdf-concepts/
    • Nilsson, Johnston, Naeve, Powell. “Towards an Interoperability Framework for Metadata Standards”. DC-2006 http://www.dublincore.go.kr/dcpapers/pdf/2006/Paper39.pdf
    • Nilsson (ed), Harmonization of Metadata Standards. ProLEARN Project http://ariadne.cs.kuleuven.be/lomi/images/5/52/D4.7-prolearn.pdf
  • 28. Syntax: “Encoding” Dublin Core metadata
  • 29. “Encoding” Dublin Core metadata
    • DCAM description model is syntax-independent
    • For transfer between applications, descriptions must be encoded as digital objects (records)
    • “Encoding Guidelines” describe
      • how abstract information structure is serialised/encoded using a metadata format
      • how instances of a metadata format are decoded/interpreted in terms of abstract information structure
    • Provider and consumer need shared rules for encoding/decoding
    • DCAM description set as “interface”; concrete syntax as implementation
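The "interface versus implementation" point can be illustrated with a toy round trip. The JSON layout used here is purely hypothetical (it is not a DCMI format) and stands in for whatever concrete syntax provider and consumer have agreed on; the key names are assumptions made for the sketch.

```python
import json


def encode(description_set):
    """Provider side: serialise the abstract structure as a record."""
    return json.dumps({"descriptionSet": description_set},
                      ensure_ascii=False, sort_keys=True)


def decode(record):
    """Consumer side: interpret a record in terms of the abstract structure."""
    document = json.loads(record)
    if "descriptionSet" not in document:
        raise ValueError("record does not follow the agreed format")
    return document["descriptionSet"]


description_set = [
    {
        "resourceURI": "http://dublincore.org/documents/abstract-model/",
        "statements": [
            {
                "propertyURI": "http://purl.org/dc/terms/publisher",
                "valueURI": "http://example.org/org/DCMI",
            }
        ],
    }
]

record = encode(description_set)  # what travels between the two systems
round_tripped = decode(record)    # what the consumer reconstructs
```

Both sides agree only on the abstract structure and on the encoding rules; swapping JSON for another syntax would change `encode` and `decode` but not the description set itself.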
  • 30. [Diagram: System A constructs a DC description set using DCAM & DSP, then encodes it using a binding as a DC-XML instance (an XML document with a dcx:descriptionSet element); System B decodes the DC-XML instance using the binding into a DC description set, then interprets it using DCAM]
  • 31. “Encoding” Dublin Core metadata
    • Multiple syntaxes available
      • Defined by DCMI
      • Defined by other parties
    • Different syntaxes may be appropriate for different contexts
    • “Encoding guidelines” specify
      • what subset of DCAM description model supported
      • how each supported feature of DCAM encoded as syntactic constructs
      • how syntactic constructs interpreted as DCAM features
  • 32. “Encoding” Dublin Core metadata
    • Warning!
    • Some of current DCMI “Encoding Guidelines” specs
      • Pre-date development of DCAM
      • Use earlier, simpler “DC abstract models”
      • Not fully compatible with DCAM description set model
    • Updating of specs in progress (2008)
    • Meanwhile, some formats defined outside of DCMI
      • e.g. Eprints DC-XML
  • 33. DC-RDF
    • “Expressing DC metadata using RDF”, DCMI Recommendation, 2008-01-14
      • http://dublincore.org/documents/2008/01/14/dc-rdf/
      • Uses RDF abstract syntax
      • Supports full DCAM description model
      • Multiple concrete syntaxes available for RDF
        • RDF/XML, N3, Turtle, RDFa etc
      • Stable, complete
      • Example RDF/XML instance
  • 34. DC-XML-Full
    • “Expressing DC metadata using XML (DC-XML-Full)”, Working Draft, 2007-06-19
      • http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFGuidelines/2007-06-19
      • Supports full DCAM description model
      • Verbose, but easily processable
      • GRDDL Namespace Transformation to generate RDF/XML
      • To be moved forward (as Proposed Recommendation?), err, some time soon….
      • Example instance documents http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLFInstances/2007-06-19
  • 35. DC-XML-Min
    • “Expressing DC metadata using XML (DC-XML-Min)”, Working Draft, 2007-06-19
      • http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLMGuidelines/2007-06-19
      • Supports subset of DCAM description model
      • More compact, but more options to handle
      • Use of QNames for URIs makes it slightly difficult to use with W3C XML Schema
      • GRDDL Namespace Transformation to generate RDF/XML
      • To be revised, need clearer requirements
      • Example instance document http://dublincore.org/architecturewiki/DCXMLRevision/DCXMLMInstances/2007-06-19
  • 36. DC-HTML
    • “Expressing DC metadata using HTML/XHTML meta and link elements”, Proposed Recommendation, 2007-11-05
      • http://dublincore.org/documents/2007/11/05/dc-html/
      • Supports subset of DCAM description model
      • DC metadata in HTML document describes that document
        • or at least document of which HTML page is representation
      • An HTML meta-data profile
      • GRDDL Profile Transformation to generate RDF/XML
      • Minor amendments required
      • To be moved forward as Proposed Recommendation soon….
      • Example instance document http://dublincore.org/documents/2007/11/05/dc-html/ex34/
      • Triples
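As a rough illustration of what DC-HTML markup looks like, the sketch below generates the link and meta elements for a small description. It assumes the usual convention of declaring a prefix with a `schema.PREFIX` link element, using meta elements for literal values and link elements for values given as URIs; it omits the profile declaration on the head element. The function name and the sample title are made up for the example.

```python
from html import escape


def dc_html_elements(prefixes, statements):
    """Return DC-HTML link/meta elements as a list of strings.

    prefixes:   mapping of prefix to namespace URI
    statements: (name, value, value_is_uri) tuples, name in PREFIX.term form
    """
    lines = [
        f'<link rel="schema.{prefix}" href="{escape(namespace)}" />'
        for prefix, namespace in prefixes.items()
    ]
    for name, value, value_is_uri in statements:
        if value_is_uri:
            # Value identified by a URI: link element.
            lines.append(f'<link rel="{name}" href="{escape(value)}" />')
        else:
            # Literal value: meta element.
            lines.append(f'<meta name="{name}" content="{escape(value)}" />')
    return lines


elements = dc_html_elements(
    {
        "DC": "http://purl.org/dc/elements/1.1/",
        "DCTERMS": "http://purl.org/dc/terms/",
    },
    [
        ("DC.title", 'The "DCMI Abstract Model"', False),
        ("DCTERMS.publisher", "http://example.org/org/DCMI", True),
    ],
)
print("\n".join(elements))
```

Attribute values are escaped with `html.escape`, so quotation marks inside a title do not break the markup.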
  • 37. DC-Text
    • “Expressing DC metadata using DC-Text”, DCMI Recommended Resource, 2007-12-03
      • http://dublincore.org/documents/2007/12/03/dc-text/
      • Supports full DCAM description model
      • Intended for human-readability rather than machine-processing
      • Used in documentation etc
      • Stable, complete
      • (Example instance earlier in presentation)
  • 38. Further references
    • Nilsson. Basic Syntaxes. Tutorial at DC-2007 http://dc2007.sg/T2-BasicSyntaxes.pdf
    • DCMI Architecture Community Jiscmail list http://www.jiscmail.ac.uk/lists/DC-ARCHITECTURE.html
  • 39. Context & Constraints: The DCAM & “DC application profiles”
  • 40. “DC application profiles”
    • Notion of “DC application profile” widely used within DCMI and by DC implementers
      • Customisation to meet needs of community/domain
      • Typically annotated lists of selected terms to be used in DC metadata
      • Terms defined by DCMI or by other agencies
    • But…
      • Absence of “domain model”
        • Tendency towards a model of a world of homogeneous objects
      • Focus exclusively on vocabulary
        • A “checklist” approach to DCAPs
      • Over-emphasis on 15 properties of DCMES/“Simple DC”
        • Tendency to try to bend/extend 15 properties beyond their intended use….
        • … or reject DC as not useful
  • 41. The DCAM & DC Application Profiles
    • Specification of how to construct description sets (descriptions, statements) to serve some purpose
    • At core, a profile of a “description set”
      • a set of constraints
      • based on E-R model of problem space
    • A DC Application Profile is a “packet of documentation” which consists of:
      • Functional requirements (desirable)
      • Domain model (mandatory)
      • Description Set Profile (DSP) (mandatory)
      • Usage guidelines (optional)
      • Encoding syntax guidelines (optional)
  • 42. DCMI Description Set Profile (DSP)
    • A way of describing structural constraints on a description set
      • the resources that may be described by descriptions in the description set
      • the properties that may be referenced in statements
      • the ways a value surrogate may be given
    • Description templates, statement templates
    • Model & XML Syntax for DSP
      • Working draft by Mikael Nilsson (Royal Institute of Technology, Sweden)
      • http://dublincore.org/documents/2008/03/31/dc-dsp/
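The kind of structural check a Description Set Profile makes possible can be sketched as follows. This is not the DSP model or its XML syntax: the `StatementTemplate` class, its field names and the `validate` function are hypothetical, and the sketch covers only three constraints (allowed properties, literal versus non-literal values, and occurrence counts) for a single description.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StatementTemplate:
    property_uri: str
    value_type: str                    # "literal" or "nonliteral"
    min_occurs: int = 0
    max_occurs: Optional[int] = None   # None means unbounded


def validate(statements: List[Tuple[str, str]],
             templates: List[StatementTemplate]) -> List[str]:
    """Check (property URI, value type) pairs against the statement templates."""
    errors = []
    by_property = {t.property_uri: t for t in templates}
    counts = Counter(prop for prop, _ in statements)

    for prop, value_type in statements:
        template = by_property.get(prop)
        if template is None:
            errors.append(f"{prop}: property not allowed by the profile")
        elif template.value_type != value_type:
            errors.append(f"{prop}: expected {template.value_type} value, "
                          f"got {value_type}")

    for template in templates:
        seen = counts[template.property_uri]
        if seen < template.min_occurs:
            errors.append(f"{template.property_uri}: occurs {seen} times, "
                          f"at least {template.min_occurs} required")
        if template.max_occurs is not None and seen > template.max_occurs:
            errors.append(f"{template.property_uri}: occurs {seen} times, "
                          f"at most {template.max_occurs} allowed")
    return errors


# A hypothetical description template: exactly one title, any number of subjects.
profile = [
    StatementTemplate("http://purl.org/dc/terms/title", "literal", 1, 1),
    StatementTemplate("http://purl.org/dc/terms/subject", "nonliteral"),
]

ok = validate(
    [("http://purl.org/dc/terms/title", "literal"),
     ("http://purl.org/dc/terms/subject", "nonliteral")],
    profile,
)
problems = validate([("http://purl.org/dc/terms/subject", "literal")], profile)
```

Here `ok` is empty, while `problems` reports the wrongly typed subject value and the missing title.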
  • 43. The “Singapore Framework” [Diagram: layers labelled Application Profile, Domain standards and Foundation standards]
  • 44. Further references
    • Nilsson, Baker, Johnston. The Singapore Framework for Dublin Core Application Profiles http://dublincore.org/documents/2008/01/14/singapore-framework/
    • Nilsson. Description Set Profiles: A constraint language for Dublin Core Application Profiles http://dublincore.org/documents/2008/03/31/dc-dsp/
  • 45. Example: The Scholarly Works (ePrints) DC Application Profile (SWAP)
  • 46. Background to the Eprints DCAP
    • Eprints AP development funded by JISC, Summer 2006
    • Co-ordinated by Julie Allinson (UKOLN) & Andy Powell (Eduserv Foundation)
    • “eprint”:
      • a “scientific or scholarly research text” (Budapest Open Access Initiative)
      • e.g. peer-reviewed journal article, preprint, working paper, thesis, book chapter, report, etc.
    • Specification for using DC metadata for eprints that overcomes limitations of “Simple DC”
      • especially relationships between “versions”
      • “what is being described?”
  • 47. Components
    • Functional requirements specification
    • Domain model
      • Based on subset of FRBR
    • The “eprints DCAP”
      • a “Description Set Profile”
      • plus human-readable commentary, usage guidelines
    • New vocabularies of metadata terms
      • With URIs like http://purl.org/eprint/terms/xyz
    • Eprints DC-XML XML format
      • Based on DC-XML-Full, Version 2006-09-18
  • 48. Functional Requirements for Bibliographic Records (FRBR)
    • Report of IFLA Study Group, 1998
    • Entity-Relational model for the “world” that bibliographic records describe
    • FRBR models the world using 4 key entities (Group 1 Entities):
      • a work is a distinct intellectual or artistic creation. A work is an abstract entity
      • an expression is the intellectual or artistic realization of a work
      • a manifestation is the physical embodiment of an expression of a work
      • an item is a single exemplar of a manifestation. The entity defined as item is a concrete entity
    • Primary relationships
      • Work -- is realized through --> Expression
      • Expression -- is embodied in --> Manifestation
      • Manifestation -- is exemplified by --> Item
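The chain of primary relationships between the four Group 1 entities can be sketched with nested dataclasses. The class layout, the field names and the sample paper (with example.org locations) are all assumptions made for this illustration; FRBR itself does not prescribe any such representation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str  # a single concrete exemplar


@dataclass
class Manifestation:
    form: str                                        # physical embodiment
    items: List[Item] = field(default_factory=list)  # is exemplified by


@dataclass
class Expression:
    label: str                                       # a realization of the work
    manifestations: List[Manifestation] = field(default_factory=list)  # is embodied in


@dataclass
class Work:
    title: str                                       # abstract creation
    expressions: List[Expression] = field(default_factory=list)  # is realized through


def items_of(work: Work) -> List[Item]:
    """Follow Work -> Expression -> Manifestation -> Item."""
    return [
        item
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# Hypothetical paper with two versions; the published version exists in two formats.
paper = Work("An example paper", [
    Expression("preprint", [
        Manifestation("PDF", [Item("http://example.org/repo/1/preprint.pdf")]),
    ]),
    Expression("published version", [
        Manifestation("PDF", [Item("http://example.org/journal/paper.pdf")]),
        Manifestation("HTML", [Item("http://example.org/journal/paper.html")]),
    ]),
])

all_items = items_of(paper)
```

Walking down from the work collects three items, one per manifestation in this example.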
  • 49. FRBR Group 1 Entities [Diagram: Work isRealisedThrough Expression (1..∞); Expression isEmbodiedIn Manifestation (∞..∞); Manifestation isExemplifiedBy Copy (1..∞)]
  • 50. Functional Requirements for Bibliographic Records (FRBR)
    • Work-Work Relationships
      • Successor, Supplement, Adaptation etc
      • Whole-Part
    • Expression-Expression Relationships
      • Abridgement, Revision, Translation etc
      • Whole-Part
    • Manifestation-Manifestation Relationships
      • Reproduction, Alternate
      • Whole-Part
    • Item-Item Relationships
      • Reconfiguration, Reproduction
      • Whole-Part
  • 51. Functional Requirements for Bibliographic Records (FRBR)
    • Group 2 Entities: Person, Corporate body
      • Responsibility relationships
        • Work is-Created-By Person/CB
        • Expression is-Realised-By Person/CB
        • Manifestation is-Produced-By Person/CB
        • Item is-Owned-By Person/CB
    • Group 3 Entities: Concept, Object, Event and Place
      • Subject relationships
        • Work has-as-Subject Work/Expression/Manifestation/Item
        • Work has-as-Subject Person/CB
        • Work has-as-Subject Concept/Object/Event/Place
  • 52. The eprints DCAP Domain Model [Diagram: entities ScholarlyWork, Expression, Manifestation, Copy, Agent and AffiliatedInstitution; relationships isExpressedAs, isManifestedAs, isAvailableAs, isCreatedBy, isPublishedBy, isEditedBy, isFundedBy and isSupervisedBy, with cardinalities of 0..∞]
  • 53. The eprints DCAP Domain Model [Diagram: a ScholarlyWork with several Expressions (isExpressedAs), each Expression with Manifestations (isManifestedAs) and each Manifestation with Copies (isAvailableAs); further relationships hasVersion, hasTranslation and hasAdaptation]
  • 54. ePrints DCAP attributes/properties (sample)
    • ScholarlyWork: title, subject, abstract, affiliated institution, identifier
    • Agent: name, type of agent, date of birth, mailbox, homepage, identifier
    • Expression: title, date available, status, version number, language, genre / type, copyright holder, bibliographic citation, identifier
    • Manifestation: format, date modified
    • Copy: date available, access rights, licence, identifier
  • 55. The eprints DCAP as DSP
    • Developed initially using “traditional” “tabular” DCAP presentation
    • Retrospectively used as test case for DSP model
    • Document divided into five sections/tables, one for description of each entity type
      • -> DSP Description Template
    • Each section/table divided into rows, one for each statement type within description
      • -> DSP Statement Template
    • For statement referencing Literal Value
      • -> DSP Literal Value Constraint
    • For statement referencing Non-Literal Value
      • -> DSP Non-Literal Value Constraint
    http://www.ukoln.ac.uk/repositories/digirep/index/EPrints_Application_Profile
  • 56. Thoughts on the Approach (Julie Allinson)
    • Driven by the functional requirements identified
    • Makes it easier to rationalise ‘traditional’ and ‘modern’ citations
      • traditional citations tend to be made between eprint ‘expressions’
      • hypertext links tend to be made between eprint ‘copies’ (or ‘items’ in FRBR terms)
    • A complex underlying model may be manifest in relatively simple cataloguer and/or end-user interfaces
    • Existing eprint systems may well capture this level of detail currently – but emphasis on simple DC stops them exposing it to others!
  • 57. Further references
    • Allinson, Powell. Eprints Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
    • Allinson, Johnston, Powell. “A Dublin Core Application Profile for Scholarly Works” Ariadne 50 (Jan 2007) http://www.ariadne.ac.uk/issue50/allinson-et-al/
    • IFLA. Functional Requirements for Bibliographic Records. 1998, 2007 http://www.ifla.org/VII/s13/frbr/
  • 58. Dublin Core in 2008
  • 59. Dublin Core in 2008
    • a framework (the DCAM)
      • which describes how to use certain types of terms
      • ... to make statements ...
      • ... that form descriptions (of resources)
      • … that can be grouped together as description sets
    • a set of specifications for encoding description sets using various formats
    • a managed vocabulary of widely useful terms
      • which can be referenced in statements
    • a vocabulary model & support for defining additional vocabularies of terms
      • which can be referenced in statements
    • a profile model & support for defining DC application profiles
      • which describe how to construct description sets for some particular set of requirements
    • extensibility, modularity, structural validation, compatibility with Semantic Web
  • 60. Dublin Core in 2008
    • Current DCMI work in progress
      • Generally, shift from focus on vocabularies to framework and its use in DCAPs
      • Updating “encoding guidelines” for DC-XML, DC-HTML
      • Finalising Description Set Profile model & syntax
      • Developing guidelines for creating DCAPs (including DSPs)
      • Working with IEEE LOM community on expressing LOM metadata using DCAM
      • Working with library community on revision of core library metadata standards in RDA initiative (using FRBR & DCAM)
    • Future?
      • Sort out messy documentation!
      • Clarify formal relationship between DCAM and RDF?
      • New “encoding guidelines” for e.g. RDFa?
      • DC metadata as Linked Data?
      • Relationship to “informal metadata” (tagging etc)?
  • 61. Everything you wanted to know about Dublin Core metadata but… Tutorial for Eduserv Staff, Bath