Extending models for controlled vocabularies to classification systems: modelling DDC with FRSAD
 

Mitchell, Joan S., Marcia Lei Zeng, and Maja Zumer. Presented at the International UDC Seminar 2011, Classification & Ontology, The Hague, The Netherlands, Sept. 19-20, 2011.

  • MZ note: I found a line in white font on the slide: Thema = “any entity used as a subject of a work”. Joan: is this what you would like to bring into the slide? I put an indirect question here. I understand we may not need to show it here. We can answer later. Indirect question: Will a general conceptual model be useful to model classification data?
  • This slide places the development of knowledge organization systems on a timeline. It shows that standards and models were not developed ahead of the systems; rather, they were initiated much later. Ref: http://catalogingandclassificationquarterly.com/ccq29nr1-2ed.htm Alva Stone, The LCSH Century: A Brief History of the Library of Congress Subject Headings, and Introduction to the Centennial Essays. Cataloging & Classification Quarterly, Volume 29, Number 1-2, 2000: "Technically speaking, LC’s publication of its subject headings list did not really begin in 1898. That was instead the year in which the Library of Congress converted from an author- plus a classed-catalog to a dictionary catalog, which incorporated author, title, and subject entries into a single file. The first actual printing of Subject Headings Used in the Dictionary Catalogues of the Library of Congress (later to be titled Library of Congress Subject Headings) began in the summer of 1909, …" UDC was published 1905-1907; according to sources, work on UDC started in 1896 (as suggested by Aida). ISO 2788 (1974) Guidelines for the Establishment and Development of Monolingual Thesauri, published by the International Organization for Standardization. ISO 5964 (1985) Guidelines for the Establishment and Development of Multilingual Thesauri, published by the International Organization for Standardization.
  • Another view of this chart. Although standards were developed, they were specifically for thesauri. Thesauri usually comply with international standards such as ISO 2788 and ISO 5964. Over the last two decades, subject heading schemes such as LCSH have also adopted the basic structure of the thesaurus. Classification systems have implemented different practices and are usually constructed according to specific conventions and examples. SKOS and FRSAD also focused more on controlled vocabularies than on classifications. Although they aimed at KOS in general, they were heavily influenced by thesaurus standards and practices, simply because no standards for classifications exist yet.
  • An overview of the FRBR family conceptual models. Ref: Zumer, Zeng, and Salaba, 2010. FRBR: A Generalized Approach to Dublin Core Application Profiles. Proc. Int’l Conf. on Dublin Core and Metadata Applications 2010
  • This model is generated within the FRBR framework. See next slide.
  • Three major entities are still visible: work, thema, and nomen. Here the original FRBR entities and relationships are used to show the root of FRSAD development. In this context, thema is defined as ‘any entity used as a subject of a work’.
  • http://www.cas.org/ASSETS/93FD034A0502489FA2474DFC0EE03284/usan.pdf
  • Other hierarchical relationships include Polyhierarchical, Faceted, and Perspective Hierarchical Structures
  • This is why nomen (in general) has to be an entity, not an attribute of thema. An instance of a nomen may have parts: a personal name is a combination of first name and last name (‘Albert Einstein’ has parts ‘Albert’ and ‘Einstein’); a subject heading string is a combination of terms (“Universities and colleges--Employees--Labor unions--Germany”). In such cases a whole-part relationship (partitive relationship) exists between the nomen and its components. In a particular knowledge organization system, rules are established to govern the creation of complex nomens from such components.
  • Case-by-case analysis may be needed to align FRSAD elements with macro-structures that inherit certain conventions.
  • Dewey Breakfast/Update June 23, 2007 ALA Annual Conference 2007

Presentation Transcript

  • Extending Models for Controlled Vocabularies to Classification Systems: Modelling DDC with FRSAD
    • Joan S. Mitchell, OCLC, Inc.
    • Marcia Lei Zeng, Kent State University
    • Maja Žumer, University of Ljubljana, Slovenia
  • The big question: Can the FRSAD conceptual model be extended beyond subject authority data (its original focus) to model classification data?
  • Outline
    • From Knowledge Organisation Systems (KOS) to data and conceptual models
    • FRSAD conceptual model
    • FRSAD model for classification systems
    • DDC case study
    • Findings and limitations
    • Future work
  • 1. From Knowledge Organisation Systems to Data and Conceptual Models: Timeline
    • 1876: DDC
    • 1898: LCSH
    • 1905: UDC
    • 1967: TEST (Thesaurus of engineering and scientific terms)
    • 1974: ISO 2788, Guidelines for the Establishment and Development of Monolingual Thesauri
    • 1985: ISO 5964, Guidelines for the Establishment and Development of Multilingual Thesauri
    • 1998: FRBR
    • 2004-2009: SKOS, OWL
    • 2009: FRAD
    • 2010: FRSAD
  • From Knowledge Organisation Systems to Data and Conceptual Models: Modelling efforts
    • [Figure: the same timeline (1876 to 2010), with each item labelled by the kind of system it addresses: classification, subject headings, thesauri, KOS, ontology]
    • Thesauri: mostly comply with ISO 2788 and ISO 5964.
    • Subject heading schemes: adopted the basic structure of the thesaurus since the 1990s.
    • Classification systems: implemented different practices and are usually constructed according to specific conventions and examples.
  • The “FRBR family”
    • FRBR: the original framework
      • All entities, focusing on Group 1 entities: work, expression, manifestation, item
      • Published 1998
    • FRAD: Functional Requirements for Authority Data
      • Focusing on Group 2 entities: person, corporate body, family
      • Published 2009
    • FRSAD: Functional Requirements for Subject Authority Data
      • Focusing on Group 3 entities
      • FRSAR WG established in 2005
      • Published 2010
  • The FRBR family models: main entities and relationships FRBR FRAD FRSAD
  • 2. FRSAD Conceptual Model 2.1 The core of the FRSAD conceptual model
  • FRSAD – generalisation of FRBR
  • The core of the FRSAD conceptual model
    • FRSAD Part 1: WORK has as subject THEMA / THEMA is subject of WORK
    • FRSAD Part 2: THEMA has appellation NOMEN / NOMEN is appellation of THEMA
    • NOMEN = any sign or sequence of signs (alphanumeric characters, symbols, sound, etc.) that a thema is known by, referred to or addressed as
  • The ‘has appellation’ relationship between thema and nomen in a controlled vocabulary
    • Note: in a given controlled vocabulary and within a domain, a nomen should be an appellation of only one thema.
    • NOMEN = any sign or sequence of signs (alphanumeric characters, symbols, sound, etc.) that a thema is known by, referred to or addressed as.
    • An example of nomens in an authority record for a chemical compound (labelled Nomen 1-8 and Nomen 9 in the figure)
    • Source: STN Database Summary Sheet: USAN (The USP Dictionary of U.S. Adopted Names and International Drug Names)
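The note that a nomen should be an appellation of only one thema within a controlled vocabulary can be checked mechanically. The following is a minimal sketch, not part of the FRSAD report: the `Nomen` class, the `ambiguous_nomens` function, the thema identifiers and the scheme name `toy-scheme` are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nomen:
    """A sign or sequence of signs, scoped to one controlled vocabulary (hypothetical model)."""
    value: str
    scheme: str


def ambiguous_nomens(appellations):
    """Return the nomens that name more than one thema.

    `appellations` is an iterable of (thema_id, Nomen) pairs, one pair per
    'has appellation' link. Because a Nomen carries its scheme, the check is
    made per controlled vocabulary.
    """
    themas_by_nomen = defaultdict(set)
    for thema_id, nomen in appellations:
        themas_by_nomen[nomen].add(thema_id)
    return {n: ids for n, ids in themas_by_nomen.items() if len(ids) > 1}


# Illustrative data only; the thema identifiers are made up.
links = [
    ("thema:1", Nomen("Databases", "toy-scheme")),
    ("thema:1", Nomen("Databanks", "toy-scheme")),  # two nomens, one thema: allowed
    ("thema:2", Nomen("Mercury", "toy-scheme")),
    ("thema:3", Nomen("Mercury", "toy-scheme")),    # one nomen, two themas: flagged
]
conflicts = ambiguous_nomens(links)
```

A thema may have many nomens, so only the reverse direction (one nomen pointing at several themas) is treated as a violation here.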
  • Nomens in different types of KOS: themas represented by
    • thesauri: terms (preferred & non-preferred)
    • classification schemes: notations
    • subject heading systems: terms of pre-coordinated strings
    • taxonomies: category labels (with or without notations)
    • controlled lists: terms or identifiers
    • … …
  • 2.2 Relationships (1) Thema-to-thema relationships
    • Hierarchical
      • The generic relationship
      • The hierarchical whole-part relationship
      • The instance relationship
      • Other hierarchical relationships
    • Associative
      • [most commonly considered categories are listed in the report]
    • Other thema-to-thema relationships are domain- or implementation-dependent
  • 2.2 Relationships (2) Nomen-to-nomen relationships
    • Equivalence
      • Two nomens are considered equivalent only if they are appellations of the same thema in a controlled vocabulary.
    • Partitive
      • An instance of a nomen may have parts.
      • A whole-part relationship may exist between a nomen and its components.
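The two nomen-to-nomen relationships can be sketched in a few lines. This is an illustration under stated assumptions, not an FRSAD implementation: the function names, the `thema_of` lookup table and the thema identifiers are hypothetical, and the `--` separator is assumed from the subject heading string quoted in the speaker notes.

```python
def are_equivalent(nomen_a, nomen_b, thema_of):
    """Equivalence: both nomens must be appellations of the same thema."""
    thema_a = thema_of.get(nomen_a)
    thema_b = thema_of.get(nomen_b)
    return thema_a is not None and thema_a == thema_b


def nomen_parts(nomen, separator="--"):
    """Partitive: split a complex nomen into its component parts."""
    return [part.strip() for part in nomen.split(separator) if part.strip()]


# Hypothetical lookup from nomen to thema identifier.
thema_of = {
    "025.04": "class-025.04",
    "http://dewey.info/class/025.04/": "class-025.04",
    "Albert Einstein": "person-einstein",
}

same = are_equivalent("025.04", "http://dewey.info/class/025.04/", thema_of)
different = are_equivalent("025.04", "Albert Einstein", thema_of)
parts = nomen_parts("Universities and colleges--Employees--Labor unions--Germany")
```

Equivalence is derived from the thema rather than stored as a nomen-to-nomen link, which follows the slide's wording that two nomens are equivalent only if they name the same thema.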
  • 2.3 Attributes
    • Some general attributes of thema and nomen are proposed
      • (1) thema attributes:
          • type of thema: in an implementation themas can be organized based on category, kind, or type
          • scope note
          • In an implementation additional attributes may be defined/recorded
      • (2) nomen attributes: see next slide
  • Nomen attributes include but are not limited to:
      • Type of nomen (identifier, controlled name, …)
      • Scheme (LCSH, DDC, UDC, ULAN, ISO 8601…)
      • Reference source of nomen (Encyclopaedia Britannica…)
      • Representation of nomen (alphanumeric, sound, visual,...)
      • Language of nomen (English, Japanese, Slovenian,…)
      • Script of nomen (Cyrillic, Thai, Chinese-simplified,…)
      • Script conversion (Pinyin, ISO 3601, Romanisation of Japanese…)
      • Form of nomen (full name, abbreviation, formula…)
      • Time of validity of nomen (until xxxx, after xxxx, from… to …)
      • Audience (English-speaking users, scientists, children …)
      • Status of nomen (provisional, accepted, official,...)
      • In an implementation additional attributes may be defined
      • Note: examples of attribute values in parentheses
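One way to picture the attribute list is as optional fields on a nomen record. The sketch below is an assumption-laden illustration: `NomenRecord`, its field names and the `select` helper are hypothetical, only a subset of the listed attributes is modelled, and the attribute values assigned to the two sample nomens are illustrative rather than taken from DDC data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NomenRecord:
    """A nomen with a few of the proposed attributes; every attribute is optional."""
    value: str
    nomen_type: Optional[str] = None   # e.g. identifier, controlled name
    scheme: Optional[str] = None       # e.g. LCSH, DDC, UDC
    language: Optional[str] = None     # e.g. English, Japanese, Slovenian
    script: Optional[str] = None       # e.g. Cyrillic, Thai
    form: Optional[str] = None         # e.g. full name, abbreviation
    status: Optional[str] = None       # e.g. provisional, accepted


def select(nomens, **criteria):
    """Return the nomens whose attributes match every given criterion."""
    return [
        n for n in nomens
        if all(getattr(n, name) == wanted for name, wanted in criteria.items())
    ]


# Illustrative values only.
nomens = [
    NomenRecord("025.04", nomen_type="identifier", scheme="DDC"),
    NomenRecord(
        "Information storage and retrieval systems",
        nomen_type="controlled name",
        scheme="DDC",
        language="English",
    ),
]
english_ddc = select(nomens, scheme="DDC", language="English")
```

Keeping the attributes optional mirrors the slide's point that an implementation may record some attributes, omit others, or define additional ones.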
  • 2.4 The importance of the THEMA-NOMEN model to the subject authority data
    • Separating what are usually called concepts (or topics, subjects, classes [of concepts]) from what they are known by, referred to, or addressed as
    • A general abstract model, not limited to any particular domain or implementation
    • Potential for interoperability within the library field and beyond
  • 3. FRSAD model for classification systems
    • Each class corresponds to a thema
    • Notation associated with the class is the nomen
    • Thema is the full category description of the class
    • Nomen is the symbol (or surrogate) used to represent the full category description
  • 4. DDC case study
  • Thema: Class 025.04
  • Nomens: DDC number, Full caption, URI
    • DDC number: 025.04
    • Full caption: Computer science, information & general works/Library & information sciences/Operations of libraries, archives, information centers/Information storage and retrieval systems
    • URI: http://dewey.info/class/025.04/
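The full caption shown for class 025.04 reads as the captions of the class and its ancestors joined with slashes. A minimal sketch of that derivation follows. It rests on assumptions: the ancestor notations (000, 020, 025) and the `parent` table are supplied here for illustration, the function names are hypothetical, and the URI pattern is generalised from the single URI on the slide.

```python
# Captions as they appear in the slide's full caption; the notations of the
# three ancestor classes are assumptions added for this example.
captions = {
    "000": "Computer science, information & general works",
    "020": "Library & information sciences",
    "025": "Operations of libraries, archives, information centers",
    "025.04": "Information storage and retrieval systems",
}
parent = {"025.04": "025", "025": "020", "020": "000", "000": None}


def full_caption(notation):
    """Join the captions from the top of the hierarchy down to the class."""
    chain = []
    current = notation
    while current is not None:
        chain.append(captions[current])
        current = parent[current]
    return "/".join(reversed(chain))


def class_uri(notation, base="http://dewey.info/class/"):
    """Build a URI nomen; the pattern is assumed from the one URI shown."""
    return f"{base}{notation}/"


nomens_for_class = {
    "DDC number": "025.04",
    "Full caption": full_caption("025.04"),
    "URI": class_uri("025.04"),
}
```

All three values are nomens of the same thema (the class), so under the model they stand in an equivalence relationship to one another.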
  • Thema: any topic co-extensive with the full meaning of the class; topics that are functionally equivalent to the class
  • Scope note: text describing or defining the thema or specifying its scope within a particular system (scope note ≠ thema/class)
  • Thema-to-thema relationships: associative relationship; (poly)hierarchical relationship
  • Alternative nomens: Relative Index terms with equivalence relationship to class
  • [Figure: Relative Index terms for the class, each marked by its relationship to the class. Legend: equivalence relationship; SN = scope note; ? = unknown relationship]
  • Derived alternative nomens
    • 150 ## $a Databanks
    • 260 ## $i see also $a Databases
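The two field lines above use a MARC-style display form: a tag, two indicator positions, then subfields introduced by `$` and a one-character code. The following sketch parses that display form only. `parse_field` is a hypothetical helper, not a function from any MARC library, and it assumes that subfield values never contain a `$` character.

```python
def parse_field(line):
    """Parse 'TAG IND $c value $c value' into (tag, indicators, subfields)."""
    tag, indicators, rest = line.split(None, 2)
    subfields = []
    # Text before the first '$' is empty, so skip the first chunk.
    for chunk in rest.split("$")[1:]:
        code, _, value = chunk.partition(" ")
        subfields.append((code, value.strip()))
    return tag, indicators, subfields


heading = parse_field("150 ## $a Databanks")
reference = parse_field("260 ## $i see also $a Databases")
```

Once parsed, the `$a` value of each field can be treated as a candidate nomen, and the `$i` text as the instruction phrase that links one to the other.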
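  • [Figure: Relative Index terms for the class, including derived nomens, each marked by its relationship to the class. Legend: equivalence relationship; SN = scope note; ? = unknown relationship; Derived]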
  • 5. Findings and limitations
    • FRSAD conceptual model appears to accommodate DDC data at a broad level
    • Topic-to-topic relationships require further study
    • The study did not consider the usefulness of classification data modelled using FRSAD in real-world applications
  • 6. Future work
    • Specify all relationships between Relative Index terms and classes (see earlier work by Green, Mitchell)
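  • [Figure: Relative Index terms for the class, including derived nomens, each marked by its relationship to the class. Legend: equivalence relationship; SN = scope note; ? = unknown relationship; Derived]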
  • 6. Future work
    • Specify all relationships between Relative Index terms and classes (see earlier work by Green, Mitchell)
    • Investigate DDC translations and mappings in context of model
  • French DDC 22; German DDC 22; Italian DDC 22; Swedish Mixed DDC 22; Italian A14; Vietnamese A14; French A14; Spanish A14; Hebrew A14; 200 Religion Class Guide (French); DDC 22; A14; DDC Sach-Gruppen (German); DDC Summaries; English, French, Italian, Rhaeto-Romansch; Afrikaans, Arabic, Chinese, French, German, Norwegian, Portuguese, Russian, Scots Gaelic, Spanish, Swedish
  • Mappings and crosswalks: DDC, LCSH, MeSH, SWD, RAMEAU, SAB, BISAC, SEARS, CSH, UDC, LCC, SAO, Nuovo Soggettario
  • Thema-to-thema relationships across languages: Class 025.04 (22/swe) = Class 025.04 (22)
  • Thema-to-thema relationships (complex case): T2—43414 (22) = T2—43414 (22/ger), but . . .
    • DDC 22
      • T2—43414 Giessen district (Giessen Regierungsbezirk)
        • Including *Lahn River (not equivalent to thema/class T2—43414)
    • DDC 22/ger
      • T2—43414 Regierungsbezirk Gießen
      • T2—434147 Lahn-Dill-Kreis
        • Hier auch: der Fluss *Lahn ("Here also: the *Lahn river"; functionally equivalent to thema/class T2—434147)
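The complex case can be made concrete by recording, per edition, where a topic is placed and under which kind of note. The sketch below is an interpretation, not DDC data processing: the `placements` table, the note labels `including` and `class-here`, and the `compare_topic` function are hypothetical, and it assumes that a topic in an "Including" note is not equivalent to its class while a topic in a "Hier auch" note is functionally equivalent to its class.

```python
# edition -> topic -> (class notation, kind of note the topic appears in)
placements = {
    "22": {"Lahn River": ("T2—43414", "including")},
    "22/ger": {"Lahn River": ("T2—434147", "class-here")},
}


def compare_topic(topic, edition_a, edition_b):
    """Compare where two editions place the same topic."""
    class_a, note_a = placements[edition_a][topic]
    class_b, note_b = placements[edition_b][topic]
    return {
        "same_class": class_a == class_b,
        # Assumption: only class-here topics are functionally equivalent to the class.
        "equivalent_to_class": {
            edition_a: note_a == "class-here",
            edition_b: note_b == "class-here",
        },
    }


result = compare_topic("Lahn River", "22", "22/ger")
```

The comparison shows why matching notations across translations is not sufficient: class T2—43414 corresponds in both editions, yet the same topic sits in different classes with a different relationship to its class.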
  • 6. Future work
    • Specify all relationships between Relative Index terms and classes (see earlier work by Green, Mitchell)
    • Investigate DDC translations and mappings in context of model
    • Investigate modelling the Relative Index as a separate controlled vocabulary to provide a topic-centered view
    • Experiment with modelling other classification schemes
    • Investigate usefulness of classification data modelled using FRSAD