This document summarizes a presentation given at the EDUG 2012 Symposium on April 26, 2012 in Boston Spa, UK. The presentation focused on metadata for Dewey Decimal Classification (DDC) numbers and editions. It proposed additions to MARC bibliographic and classification fields to capture provenance of machine-generated DDC assignments and relationships between DDC translations and editions. It also suggested enhancing DDC numbers in metadata to allow for faceted retrieval by number components through new indexes. The goal is to improve management and discovery of DDC data resources through standardized metadata.
1. DDC metadata
EDUG 2012 Symposium
26 April 2012, Boston Spa, UK
Michael Panzer
Assistant Editor, DDC
OCLC
panzerm@oclc.org
2. Types of DDC data
- Usually, Dewey numbers provide metadata for describing
other resources
- DDC as value vocabulary for metadata element sets
- Instead, the following focuses on cases where Dewey
numbers and DDC editions are the resources described
- Two levels of DDC metadata
- Number-level metadata (focus on bibliographic records)
- Edition-level metadata (focus on classification records)
3. DDC metadata
Metadata about
- Dewey numbers (082, 083, 085 fields in MARC
Bibliographic)
- Provenance of machine-generated classification data
- Dewey number components in linked 085 fields
- Dewey editions (084, 686 fields in MARC Classification)
- Interplay between class- and edition-level metadata rendered
in MARC Classification format
4. Agenda
Scenario - Context
1. Provenance of machine-generated data - Proposal for MARBI; (metadata) provenance initiatives at W3C / DCMI
2. Edition-level metadata - Relationship between translations and other "versions"
3. Metadata about Dewey number components - Enhancing Dewey numbers for retrieval
5. MARBI proposal
- Drafted over the last two months in cooperation with colleagues
from DNB and LC
- To be presented at MARBI meeting at ALA Annual Conference
2012
- Two options
- Option 1: Addresses the immediate needs of documenting
information about machine generation of classification data
- Defines additional subfields in 082, 083, 084
- Option 2: Proposes a more general way of dealing with metadata
provenance
- Applicable to all MARC variable fields (in principle)
- Heeds the distinction between provenance in general and metadata
provenance in particular
6. Option 1
Defined for 082, 083, 084
$i - Method of assignment designator
Fully machine-generated (m)
Not fully machine-generated (x)
$u - Process of assignment
May contain a URI, a process name, or some other
description of process designated in $i
$1 - Confidence value
Confidence of the assigning agency in relation to the
process described in $u. Contains value from the interval
[0,1]
$q – Assigning agency (already defined)
7. Examples
DDC 23 number assigned by LC using AutoDewey. The
AutoDewey process involves machine assistance followed
by intellectual review:
082 00 $a829/.3$223$ix$uautodewey$11
Fictitious example of DDC 22 number assigned by OCLC in a
fully automated way using information in Classify:
082 04 $a394.12$222$im$uclassify$10.5$qOCoLC
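The two example fields above can be unpacked with a few lines of Python. This is a minimal sketch, assuming the compact "$" + code + value notation used on the slide; the function names `parse_subfields` and `describe_assignment` are hypothetical, and the subfields $i, $u and $1 are the proposed ones from Option 1, not established MARC definitions.

```python
from typing import Dict, List, Optional, Tuple

# Values of the proposed $i designator, as listed under Option 1.
METHODS = {"m": "fully machine-generated", "x": "not fully machine-generated"}


def parse_subfields(data: str) -> List[Tuple[str, str]]:
    """Split '$a829/.3$223' into [('a', '829/.3'), ('2', '23')].

    Assumes '$' never occurs inside a subfield value.
    """
    pairs = []
    for chunk in data.split("$")[1:]:
        if chunk:
            pairs.append((chunk[0], chunk[1:].strip()))
    return pairs


def describe_assignment(data: str) -> Dict[str, Optional[object]]:
    """Summarize the provenance subfields of one 082/083/084 field."""
    sub = dict(parse_subfields(data))  # repeated subfields collapse; fine for this sketch
    confidence = float(sub["1"]) if "1" in sub else None
    if confidence is not None and not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence value must lie in the interval [0,1]")
    return {
        "number": sub.get("a"),
        "edition": sub.get("2"),
        "method": METHODS.get(sub.get("i", ""), "unknown"),
        "process": sub.get("u"),
        "confidence": confidence,
        "agency": sub.get("q"),
    }


lc = describe_assignment("$a829/.3$223$ix$uautodewey$11")
oclc = describe_assignment("$a394.12$222$im$uclassify$10.5$qOCoLC")
```

Here `lc` describes the AutoDewey example and `oclc` the fictitious Classify example; a confidence value outside [0,1] is rejected.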
8. Option 2
883 - Data provenance (R)
First Indicator: Method of assignment
# - No information provided
0 – Fully machine-generated
1 – Not fully machine-generated
$d - Date on which the linked field was generated
$u - Process used to generate linked field
$q - Agency using the process/activity to generate the linked field
$1 - Confidence value
$x - Ending date of validity
$0 - Authority record control number or standard number
$8 - Field link and sequence number (with new field link type "p - Data provenance")
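How an 883 field would attach to the field it describes can be sketched as follows. This is an illustration only: the record is hypothetical (the fictitious Classify example recast for Option 2), the helper names are invented, and the $8 layout (linking number, optional ".sequence", backslash, link type) is assumed to follow the general MARC $8 convention.

```python
from typing import Dict, List, Tuple

Subfields = List[Tuple[str, str]]
Field = Tuple[str, str, Subfields]  # (tag, indicators, subfields)


def parse_link(value: str) -> Tuple[str, str]:
    """Split a $8 value such as '1\\p' into (linking number, link type)."""
    head, _, link_type = value.partition("\\")
    return head.split(".")[0], link_type


def provenance_links(fields: List[Field]) -> List[Tuple[str, Dict[str, str]]]:
    """Pair each field carrying a 'p' link with the 883 data that describes it."""
    provenance: Dict[str, Dict[str, str]] = {}
    for tag, indicators, subfields in fields:
        if tag != "883":
            continue
        # First indicator: method of assignment (#, 0 or 1).
        details = {"method": indicators[0]}
        details.update({c: v for c, v in subfields if c != "8"})
        for code, value in subfields:
            if code == "8":
                number, link_type = parse_link(value)
                if link_type == "p":
                    provenance[number] = details
    linked = []
    for tag, _indicators, subfields in fields:
        if tag == "883":
            continue
        for code, value in subfields:
            if code == "8":
                number, link_type = parse_link(value)
                if link_type == "p" and number in provenance:
                    linked.append((tag, provenance[number]))
    return linked


# Hypothetical record: an 082 field and the 883 field linked to it.
RECORD: List[Field] = [
    ("082", "04", [("8", "1\\p"), ("a", "394.12"), ("2", "22")]),
    ("883", "0#", [("8", "1\\p"), ("u", "classify"), ("1", "0.5"), ("q", "OCoLC")]),
]

links = provenance_links(RECORD)
```

Because the provenance lives in its own field, the same lookup would work for any variable field that carries a matching $8, which is the point of Option 2.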
12. Edition-level metadata
- Edition registry: capturing information about editions and
translations in a centralized manner outside of MARC
records
- Storing additional metadata about editions/translations in
MARC records
- Better management of translation data and other versions
- MARC does not offer edition-level records
- Edition-level information has to be carried in individual records, even when
it applies to the whole edition
- Relevant fields: 084 - Classification Scheme and Edition
686 - Relationship to Source Note
13. DDC translations: Anatomy of an edition
[Figure: diagram of DDC translations and related versions. Edition labels recoverable from the diagram: DDC 22, DDC Summaries, DDC Sachgruppen (German), Guide (French), 200 Religion Class, A14. Language labels: Afrikaans, Arabic, Chinese, English, French, German, Hebrew, Italian, Norwegian, Portuguese, Rhaeto-Romansch, Russian, Scots Gaelic, Spanish, Swedish, Vietnamese, Mixed.]
14. Types of editions
- Related to an edition, with relationships not captured at
record level
Examples: sdnb, DDC Summaries, Guide
versus
- Related to an edition, with relationships captured at
record level
Examples: 200 Religion, translations, A15engind
15. Tracking edition-to-edition relationships
Translation of standard edition
084 1# $a ddc $c 15 $e ind
Source edition
084 1# $a ddc $c 15 $e eng
Authorized derivative version of standard edition
084 8# $a ddc $c 22sdnb $d 22 $e ger
Source edition
084 0# $a ddc $c 22 $e eng
- Not explicitly full or abridged; "8" is used as the value of the first indicator
- $n should be automatically populated with relevant information
about the changes regarding the source edition.
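The edition-to-edition examples above suggest a simple lookup from a translated or derived edition to its source edition. The sketch below is one reading of those examples, not a documented rule: it assumes that $d, when present, names the source edition number, that $c does so otherwise, and that the source language is English. The function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_084(line: str) -> Dict[str, str]:
    """Parse a line such as '084 1# $a ddc $c 15 $e ind' into a dict."""
    tag, indicators, rest = line.split(None, 2)
    field = {"tag": tag, "indicators": indicators}
    for chunk in rest.split("$")[1:]:
        field[chunk[0]] = chunk[1:].strip()
    return field


def source_edition(line: str, source_language: str = "eng") -> Tuple[str, str, str]:
    """Return (scheme, edition, language) of the assumed source edition.

    Assumption: $d carries the source edition number when it differs from $c.
    """
    field = parse_084(line)
    return field["a"], field.get("d", field["c"]), source_language


translation = source_edition("084 1# $a ddc $c 15 $e ind")
derivative = source_edition("084 8# $a ddc $c 22sdnb $d 22 $e ger")
```

Under these assumptions both results match the "Source edition" fields shown on the slide ($c 15 $e eng and $c 22 $e eng).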
16. Tracking record-to-record relationships
1. Record has been modified
Translation of standard edition
084 1# $a ddc $c 15 $e ind
686 3# $i modified
Source record
084 1# $a ddc $c 15 $e eng
17. Tracking record-to-record relationships (2)
2. Record was created for translation
Translation of standard edition
084 1# $a ddc $c 15 $e ind
686 1# $b 305.899
Source record
[does not exist]
18. Tracking record-to-record relationships (3)
3. Unmodified record from different source edition
Translation of standard edition
084 1# $a ddc $c 15 $e ind
686 0# $2 23
Source record
084 0# $a ddc $c 23 $e eng
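The three record-to-record cases can be told apart by the first indicator of the 686 field. The mapping below is read off the three example slides only; the slides do not give the formal indicator definitions, so the table and the function name `relationship` are assumptions.

```python
# First indicator of 686 -> case label, as inferred from the three examples.
RELATIONSHIPS = {
    "3": "record has been modified",
    "1": "record was created for translation",
    "0": "unmodified record from different source edition",
}


def relationship(line: str) -> str:
    """Classify a line such as '686 3# $i modified' by its first indicator."""
    tag, indicators, _rest = line.split(None, 2)
    if tag != "686":
        raise ValueError("expected a 686 field")
    return RELATIONSHIPS.get(indicators[0], "unknown")


cases = [
    relationship("686 3# $i modified"),
    relationship("686 1# $b 305.899"),
    relationship("686 0# $2 23"),
]
```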
20. 085 - Synthesized Classification Number
Components
- 085 fields provide information about components of Dewey
numbers in linked 082 or 083 fields
- Mirror 765 fields in MARC Classification format
- Vital for faceted retrieval driven by Dewey numbers
- Further enhancements possible by utilizing mappings of
Dewey numbers that occur prominently as components, e.g.,
geographic data, time periods
- Definition of new indexes is a requirement for retrieval
use for WorldCat data
22. Proposed new indexes (083 fields)
"Dewey additional" index
da index: Add $z and $c ($y) to elements already in dd
index
Pattern: [z--]a[-c][:a[-c]]
23. Proposed new indexes (085 fields)
"Dewey components" index
dc index: Index $s and $t concatenated with full
address
Pattern: [z--]rs|w[-c][:t]
"Dewey synthesized" index
ds index: Index all components
Pattern: [z--]a|b|rs|u|w[-c][:a|b|t|u|v[-c]]
24. Proposed new indexes (082/083/085 fields)
"Dewey general" index
dg index: Index all elements in Dewey numbers
Pattern: Combine dd, da, and ds indexes
25. Example: History of Cologne during WWII
Built number: 943.55140864
9 History & geography
+ T2—435514 Cologne
+ 943.0864 Period of World War II, 1939-1945
082 00 $8 1x $a 943/.55140864 $2 22
085 0# $8 1x $b 9 $a 930 $c 990 $z 2 $s 435514 $u 943.5514
085 0# $8 1x $b 943.5514 $a 930 $c 990 $v 01 $c 09 $f 0 $r 943.0 $s 864
$u 943.55140864
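The two 085 fields above can be turned into component terms of the kind a "Dewey components" index might hold. The sketch below implements a simplified reading of the dc pattern, covering only the table ($z + $s) and root ($r + $s) cases; the term format, with "T" and a double hyphen for table notation, and the function names are assumptions.

```python
from typing import List, Optional, Tuple


def parse_field(line: str) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Parse '085 0# $8 1x $b 9 ...' into (tag, indicators, subfields)."""
    tag, indicators, rest = line.split(None, 2)
    subfields = [(c[0], c[1:].strip()) for c in rest.split("$")[1:]]
    return tag, indicators, subfields


def component_term(subfields: List[Tuple[str, str]]) -> Optional[str]:
    """Build one index term from the digits added in $s.

    Simplified: table number ($z) plus $s, else root number ($r) plus $s.
    """
    first = {}
    for code, value in subfields:
        first.setdefault(code, value)  # keep the first occurrence of repeated codes
    if "s" not in first:
        return None
    if "z" in first:
        return f"T{first['z']}--{first['s']}"
    if "r" in first:
        return first["r"] + first["s"]
    return None


FIELDS = [
    "085 0# $8 1x $b 9 $a 930 $c 990 $z 2 $s 435514 $u 943.5514",
    "085 0# $8 1x $b 943.5514 $a 930 $c 990 $v 01 $c 09 $f 0 $r 943.0 $s 864 "
    "$u 943.55140864",
]

terms = [component_term(parse_field(line)[2]) for line in FIELDS]
```

For this example the two terms correspond to the two added components listed on the slide: the Table 2 notation for Cologne and the period number for World War II.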
27. Scenarios / Use cases
- Components / facets can be varied independently of each
other
- Allows for expanding, but also "morphing" the query by
changing individual components
- Integration of mapped vocabularies into Dewey-driven
discovery process
- Using terms that have been mapped to any number
components
- Usage of local hierarchies of number components instead
of just the hierarchical relationships of the base number
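Varying one component while leaving the others fixed can be sketched as an operation on a query made of component terms. This is a toy illustration: the facet names, the term format and both helper functions are hypothetical, and dropping the last digit is only a notational approximation of "one level up" that may yield notation not present in the tables.

```python
from typing import Dict


def broaden(term: str, min_digits: int = 1) -> str:
    """Drop the last digit of a table term such as 'T2--435514'."""
    prefix, sep, digits = term.rpartition("--")
    if not sep:
        raise ValueError("expected a table term containing '--'")
    if len(digits) <= min_digits:
        return term
    return f"{prefix}{sep}{digits[:-1]}"


def morph(query: Dict[str, str], facet: str, new_term: str) -> Dict[str, str]:
    """Return a copy of the query with one facet replaced."""
    changed = dict(query)
    changed[facet] = new_term
    return changed


query = {"place": "T2--435514", "period": "943.0864"}
expanded = morph(query, "place", broaden(query["place"]))
```

The period facet is untouched while the place facet moves up its own local hierarchy, independent of the hierarchy of the base number.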
30. Some useful links
DDC 23 http://www.oclc.org/us/en/dewey/versions/print/default.htm
Abridged Edition 15 http://www.oclc.org/us/en/dewey/versions/abridged/default.htm
WebDewey 2.0 http://dewey.org/webdewey
dewey.info http://dewey.info
Dewey webinars & presentations http://www.oclc.org/us/en/dewey/news/conferences/default.htm
025.431: The Dewey blog http://ddc.typepad.com
Classify http://classify.oclc.org/classify2/
Questions? dewey@loc.gov (Dewey Editorial Office)
dewey@oclc.org (Licensing, group purchases, LIS program)
Editor's Notes
1. How to provide provenance metadata for machine-generated Dewey numbers and other pieces of classification information, and how to effectively express data provenance in the context of a MARC record.
2. How metadata about Dewey editions can be used to establish relationships between translations and other versions of the classification.
3. How metadata about Dewey number components provided by 085 fields can enhance the use of Dewey numbers in information discovery.
Note: $n should be automatically populated with relevant information about the changes regarding the source edition. A possible place to store $n on an edition level is the edition registry.
Why go through all this trouble indexing subfields of number components?