Principle Violations
Revisiting the Dublin Core 1:1 Principle
Richard J. Urban rjurban@illinois.edu http://www.richardurban.net
The Problem
Although Dublin Core (DC) metadata emerged
from the need to describe "document-like objects"
on the World Wide Web in the mid-1990s, libraries,
archives and museums soon adopted it to share
information about hidden cultural heritage
collections. In response to concerns from this
community about distinguishing between records
describing "originals" and records describing
"reproductions," DCMI introduced the 1:1 Principle:
"each resource should have a discrete
metadata description and each description
should include elements describing a single
resource" (Weibel and Hakala, 1998).
However, metadata creators indicate that the 1:1
Principle causes "a great deal of confusion" in
practice (Park & Childress, 2009). Even when the
Principle is understood, software for metadata
creation lacks affordances for creating compliant
records (Miller, 2010). Studies find that records
frequently describe both physical and digital
resources and are "particularly problematic" in
large-scale metadata aggregations (Shreeves et
al., 2005; Han et al., 2009; Hutt & Riley, 2005).
Multiple accounts of the Principle, such as the
description provided by Using Dublin Core (below),
add to the confusion about what the Principle
requires.
The 1:1 Principle
In general, Dublin Core metadata describes one
manifestation or version of a resource, rather
than assuming that manifestations stand in for
one another. For instance, a jpeg image of the
Mona Lisa has much in common with the
original painting, but it is not the same as the
painting. As such, the digital image should be
described as itself, most likely with the creator
of the original image included as a Creator or
Contributor rather than just the painter of the
original Mona Lisa. The relationship between
the metadata for the original and the
reproduction is part of the metadata description,
and assists the user in determining whether
his/her need can be met by a reproduction
(Hillmann, 2003).
Pilot Study
While these accounts of the 1:1 Principle may provide
guidance for metadata creators, additional rules are
needed to understand how particular records "violate"
the Principle. This pilot study explores techniques to
identify records that describe different classes of
resources.
Data Collection
IMLS Digital Collections & Content Project
25 collections
55,000 item-level OAI-PMH XML records
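A per-property overview of value frequencies for records of this kind might be tallied roughly as follows. This is a minimal sketch using only the Python standard library, not the SIMILE Gadget tool used in the study; the sample records, the `records` wrapper element, and the function name `tally_values` are hypothetical and stand in for harvested oai_dc data.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Namespace of the simple Dublin Core elements used in oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical sample standing in for harvested item-level records.
SAMPLE = """<records xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <oai_dc:dc>
    <dc:title>Main Street, looking north</dc:title>
    <dc:format>image/jpeg</dc:format>
    <dc:format>glass plate negative</dc:format>
  </oai_dc:dc>
  <oai_dc:dc>
    <dc:title>County courthouse</dc:title>
    <dc:format>image/jpeg</dc:format>
  </oai_dc:dc>
</records>"""


def tally_values(xml_text):
    """Count how often each value occurs for each Dublin Core property."""
    counts = defaultdict(Counter)
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree spells namespaced tags as "{namespace}localname".
        if elem.tag.startswith("{" + DC_NS + "}"):
            prop = elem.tag.split("}", 1)[1]
            value = (elem.text or "").strip()
            if value:
                counts[prop][value] += 1
    return counts


overview = tally_values(SAMPLE)
for prop, values in sorted(overview.items()):
    print(prop, values.most_common())
```

Counting distinct values per property keeps the overview small enough to review by hand, which is what makes a later statement-by-statement classification feasible.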
Data Analysis
Using the SIMILE Gadget (http://simile.mit.edu/wiki/Gadget)
XML data explorer, overviews of Dublin Core
properties and the frequency of unique values were
generated for each collection. Each statement was
assigned to a class of resources (PhysicalResources
or DigitalResources).
Using the statement classifications, each collection
was classified according to three categories:
Non-violating collections: records conformed to
the 1:1 Principle.
Violating collections: records included statements
about both physical and digital resources.
Non-violating violations: records described
physical resources, but identified digital resources.
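The three categories above could be expressed as rules along the following lines. This is a sketch under stated assumptions, not the procedure used in the study: it assumes that "identified digital resources" means a dc:identifier statement classed as digital, and that a collection takes the most severe category found among its records. The function names and sample records are hypothetical.

```python
PHYSICAL = "PhysicalResource"
DIGITAL = "DigitalResource"

# Ordered from least to most severe; used to roll records up to collections.
CATEGORIES = ["non-violating", "non-violating violation", "violating"]


def classify_record(statements):
    """Classify one record given (property, resource_class) pairs.

    Assumption: only dc:identifier statements "identify" a resource;
    every other statement counts as description.
    """
    described = {cls for prop, cls in statements if prop != "identifier"}
    identified = {cls for prop, cls in statements if prop == "identifier"}
    if PHYSICAL in described and DIGITAL in described:
        return "violating"
    if described == {PHYSICAL} and DIGITAL in identified:
        return "non-violating violation"
    return "non-violating"


def classify_collection(records):
    """Assumption: a collection takes its most severe record category."""
    found = {classify_record(record) for record in records}
    return max(found, key=CATEGORIES.index, default=None)


# Hypothetical records, already reduced to classified statements.
mixed = [("format", DIGITAL), ("format", PHYSICAL), ("identifier", DIGITAL)]
surrogate = [("format", PHYSICAL), ("identifier", DIGITAL)]
clean = [("format", DIGITAL), ("identifier", DIGITAL)]

print(classify_record(mixed))
print(classify_record(surrogate))
print(classify_record(clean))
print(classify_collection([clean, surrogate]))
```

Rolling up by severity is only one possible choice; a threshold on the share of violating records would be an equally plausible collection-level rule.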
Results
n=25
[Figure: results chart not recoverable; surviving labels: Digital Resource, Physical Resource]
Acknowledgments
Portions of this research were supported by a 2007 IMLS National Leadership
Research and Demonstration Grant (LG-06-07-0020-07) hosted by the
GSLIS Center for Informatics Research in Science and Scholarship (CIRSS),
Dr. Carole L. Palmer, Principal Investigator.
Resource classes assigned to statements:
PhysicalResources: resources
described by format values for
physical media and extents.
DigitalResources: resources
described by format values about
file formats and extents.
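One way such format values might be sorted into the two classes is sketched below. The cue lists and patterns are hypothetical illustrations, since the poster does not give the actual rules; a real rule set would need a much larger vocabulary of media terms and extent units.

```python
import re

# Hypothetical cues; none of these lists come from the study itself.
MIME_PATTERN = re.compile(
    r"^(image|audio|video|text|application)/[\w.+-]+$", re.I
)
FILE_EXTENT = re.compile(r"\b\d+(\.\d+)?\s*(bytes|kb|mb|gb)\b", re.I)
PHYSICAL_EXTENT = re.compile(
    r"\b\d+(\.\d+)?\s*(cm|mm|in|inches|pages|leaves)\b", re.I
)
PHYSICAL_MEDIA = ("photograph", "negative", "postcard", "manuscript")


def classify_format(value):
    """Return a resource class for a dc:format value, or None if unclear."""
    text = value.strip()
    # File formats and file-size extents point to a digital resource.
    if MIME_PATTERN.match(text) or FILE_EXTENT.search(text):
        return "DigitalResource"
    # Physical media terms and dimensional extents point to a physical one.
    lowered = text.lower()
    if PHYSICAL_EXTENT.search(text) or any(t in lowered for t in PHYSICAL_MEDIA):
        return "PhysicalResource"
    return None


for sample in ["image/jpeg", "1.2 MB", "glass plate negative", "20 x 25 cm"]:
    print(sample, "->", classify_format(sample))
```

Returning None for unrecognized values, rather than guessing, leaves ambiguous statements visible for manual review.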
What is the 1:1 Principle, really?
Ongoing Research
n:1 Principle, DCAM & OAI-PMH XML
Although the Dublin Core Abstract Model
(DCAM) embodies the 1:1 Principle and may
help prevent errors, it does not directly help
identify violations in legacy OAI-PMH XML
that may include implicit description sets.
Nor does DCAM's generalized resources
("anything that can be identified") help
systematically recognize records that
describe more than one resource.
1:1 Principle & Bibliographic Relationships
If the concern of cultural heritage institutions
is about "originals", "reproductions" or
"surrogates," different kinds of bibliographic
relationships need to be considered. For
example, the museum community would not
classify the relationship between a jpeg
and the Mona Lisa as an Equivalence
Relationship that involves related FRBR
Manifestations. Rather, surrogate resources
may stand in Derivative or Descriptive
relationships involving FRBR Expressions or
FRBR Works. Unfortunately, "the problem of
defining reproductions in relationship to
originals has proven elusive through all of the
cataloging codes of the 20th Century"
(Knowlton, 2009).
Ongoing work will provide a conceptual
definition of a 1:1 Principle that reflects the
concerns of cultural heritage repositories and
is grounded in contemporary theories of the
bibliographic universe.
Identifying 1:1 Principle Violations
A conceptual definition will inform the
development of rules and techniques that
identify records that violate the 1:1 Principle.
Ongoing work will adapt the Getty Art &
Architecture Thesaurus to identify distinct
manifestation classes. Additional violation
categories based on other relationships or FRBR
Group 1 Entities will also be explored (e.g., is it
possible to identify DC records that describe
more than one FRBR Expression or FRBR
Work?).
Violation identification techniques will be applied
to 148,000 item-level OAI-PMH records from the
IMLS DCC Opening History aggregation in order
to identify patterns of 1:1 Principle violations
(http://imlsdcc.grainger.illinois.edu/history).
Bibliography
Hillmann, D. (2003, August 26). Using Dublin Core. Dublin Core Metadata Initiative. Retrieved from http://dublincore.org/documents/2003/08/26/usageguide/
Hutt, A., & Riley, J. (2005). Semantics and syntax of Dublin Core usage in Open Archives Initiative data providers of cultural heritage materials. In Proceedings
of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (p. 270).
Knowlton, S. A. (2009). How the current draft of RDA addresses the cataloging of reproductions, facsimiles, and microforms. Library Resources and
Technical Services, 53(3), 159–165.
Miller, S. (2010). The One-To-One Principle: Challenges in Current Practice. International Conference On Dublin Core And Metadata Applications. Retrieved
October 23, 2010, from http://dcpapers.dublincore.org/ojs/pubs/article/view/1043
Park, J., & Childress, E. (2009). Dublin Core metadata semantics: An analysis of the perspectives of information professionals. Journal of Information
Science, XX(X), 1-13.
Powell, A., Nilsson, M., Naeve, A., Johnston, P., & Baker, T. (2007). DCMI Abstract Model. Dublin Core Metadata Initiative. Retrieved from
http://dublincore.org/documents/abstract-model/
Shreeves, S. L., Knutson, E. M., Stvilia, B., Palmer, C. L., Twidale, M. B., & Cole, T. W. (2005). Is “Quality” Metadata “Shareable” Metadata? The Implications
of Local Metadata Practices for Federated Collections. In Currents and convergence: navigating the rivers of change: proceedings of the Twelfth National
Conference of the Association of College and Research Libraries April 7-10, 2005, Minneapolis, Minnesota (p. 223).
Tillett, B. (2001). Bibliographic Relationships. In C. Bean & R. Green (Eds.), Relationships in the organization of knowledge. Boston: Kluwer Academic
Publishers.
Weibel, S., & Hakala, J. (1998, February). DC-5: The Helsinki Metadata Workshop; A Report on the Workshop and Subsequent Developments. D-Lib
Magazine. Retrieved from http://www.dlib.org/dlib/february98/02weibel.html
