Controlled Vocabulary & Knowledge Org. Systems
LIS 653
Starr Hoffman
How it All Fits Together
• Record (representation of bibliographic information)
• Code (rules: AACR2, RDA)
• Encoding (MARC, XML)
• Authority control (standardized author name: J. R. R. Tolkien)
• Structure (ISBD, XML)
• Subject headings (LCSH, Sears, AAT)
• Classification (shelving: LC, Dewey)
• Model (FRBR, traditional model)
• Format (MARC, Dublin Core)
• The record is displayed in an OPAC, online database, etc.
Core Concepts of Controlled Vocabularies
• Organized list of terms
• Words and/or phrases
• Used to label content
• Used to find content
• Terms may be:
  - Hierarchically structured
  - Relational
  - Something in-between (BT, NT, RT)
CVs attempt to resolve problems…
• User language doesn’t match the language of the document they’re seeking
• User wants to find all resources on a given topic
• User wants to find out what language a system/organization uses to describe a given concept
Approaches…
• Pre-coordinate: subdivided terms create compound, complex concepts (LCSH)
• Post-coordinate: single concepts, basic terms (combined later, during the search process)
• Multiple subject terms can be assigned with either method
• Hierarchical vs. more complex/subtle relationships
• Finite, limited (thesaurus) vs. evolving, broad (LCSH)
• Controlled vs. uncontrolled (tags, concept maps)
Types of Controlled Vocabularies
• Thesaurus: specific subject domain (Getty’s AAT)
• Subject heading list: library context; describes the “aboutness” of items in catalog records (LCSH, Sears)
• Classification scheme: library or other context (taxonomy); descriptor used to group like items together (LCC, DDC)
• Authority file: list of names of individuals or geographic places (LC Authority Files)
• Keyword list, tag cloud: often user-defined/community-defined; used to group similar content in social media or other websites
• Website categories (aka taxonomy): tree-like hierarchical structure, similar to classification (w/o notation / call #)
• Concept map (aka ontology): visual representation of relationships between concepts; conveys meaning (semantics)
Arrangement of CVs/KOSs
| CV type | Structure | Pre- or post-coordinate | About |
|---|---|---|---|
| Thesaurus | Alphabetical & systematic (2 parts) | Post-coordinate | Narrow scope: one subject domain; cross-references, related terms, synonyms, etc. |
| Subject heading list | Alphabetical and/or systematic | Both pre-coordinate & post-coordinate | Broad, general focus; cross-references, related terms, synonyms, etc. |
| Classification scheme | Systematic | Pre-coordinate | Often hierarchical; expressed in notation (code) rather than in words |
| Authority file | Alphabetical | Post-coordinate | Lists of names: geographic places, individuals |
| Keyword list, tag cloud | Alphabetical (tag cloud: also systematic, by size) | Usually post-coordinate; may be either | User-created; usually does not identify synonyms, hierarchy, or relationships between terms |
| Website categories (taxonomy) | Alphabetical & systematic | | Hierarchical (tree structure) |
| Concept map (ontology) | Visual arrangement of relationships between concepts (systematic) | Does not apply (no preferred terms) | Graphic representation of concepts in a subject domain; often complex relationships (not just hierarchical) |
Some Issues When Constructing CVs…
Word form (plural vs. singular)
• Cat vs. Cats
Sequence & form for phrases
• Energy conservation vs. Conservation of energy resources
Homographs & homophones
• Mercury: planet, metal, Roman god, car (homographs)
• Fowl vs. foul (homophones)
Qualifiers
• Mercury (planet)
• Mercury (Roman deity)
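Qualifiers keep homographs apart as separate headings. As a minimal sketch, assuming a hypothetical heading list written in the "Term (qualifier)" form above (not taken from LCSH or any real vocabulary file), a lookup on the bare word can return every qualified sense so the searcher picks the intended one:

```python
from __future__ import annotations

import re

# Hypothetical headings in "Term (qualifier)" form; illustrative only.
HEADINGS = ["Mercury (planet)", "Mercury (Roman deity)", "Mars (planet)", "Cats"]


def split_heading(heading: str) -> tuple[str, str | None]:
    """Split 'Term (qualifier)' into ('Term', 'qualifier'); the qualifier is None if absent."""
    match = re.fullmatch(r"(.+?)\s*\((.+)\)", heading)
    if match:
        return match.group(1), match.group(2)
    return heading, None


def senses(term: str) -> list[str]:
    """Return every heading whose base term matches the user's unqualified term (case-insensitive)."""
    return [h for h in HEADINGS if split_heading(h)[0].lower() == term.lower()]


print(senses("mercury"))  # ['Mercury (planet)', 'Mercury (Roman deity)']
```

Without the qualifiers, all senses of "Mercury" would collapse into one heading and a search could not separate the planet from the deity.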
Some Issues When Constructing CVs…
Abbreviations, acronyms
• AIDS vs. Acquired Immune Deficiency Syndrome
Popular vs. technical terms
• Cancer vs. Neoplasms
Pre-coordinate (subdivision) vs. post-coordinate
• Pre-coordinate: Merchant marine--Officers
• Post-coordinate: Merchant marine; Officers (two separate terms)
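The difference shows up at search time. A minimal sketch, assuming a hypothetical three-record catalog indexed both ways: the pre-coordinate search matches the compound heading as one string built at indexing time, while the post-coordinate search combines single-concept terms with Boolean AND (set containment).

```python
# Hypothetical catalog: each record carries pre-coordinated headings and post-coordinate single terms.
RECORDS = {
    "rec1": {"pre": ["Merchant marine--Officers"], "post": {"Merchant marine", "Officers"}},
    "rec2": {"pre": ["Merchant marine--History"], "post": {"Merchant marine", "History"}},
    "rec3": {"pre": ["Navies--Officers"], "post": {"Navies", "Officers"}},
}


def pre_coordinate_search(heading: str) -> list[str]:
    """Match the compound heading exactly as it was built at indexing time."""
    return sorted(rec for rec, subj in RECORDS.items() if heading in subj["pre"])


def post_coordinate_search(*terms: str) -> list[str]:
    """Combine single-concept terms at search time with Boolean AND."""
    wanted = set(terms)
    return sorted(rec for rec, subj in RECORDS.items() if wanted <= subj["post"])


print(pre_coordinate_search("Merchant marine--Officers"))     # ['rec1']
print(post_coordinate_search("Merchant marine", "Officers"))  # ['rec1']
print(post_coordinate_search("Officers"))                     # ['rec1', 'rec3']
```

Post-coordination lets the searcher build any combination on the fly, but the terms carry no relationship to each other, so a record indexed with both terms for unrelated reasons would still match (a "false drop"); the pre-coordinated string preserves that relationship.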
Thesaurus Abbreviations
• TT: top term
  - broadest term in the hierarchy
• BT: broader term
• NT: narrower term
• RT: related term
• USE: use X
  - points to a preferred term (X)
• UF: “used for”
  - appears under the preferred term and lists the non-preferred terms it replaces
• SN: scope note
  - describes the meaning (scope) of the term
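These relationships map naturally onto a small data structure. A sketch, assuming a hypothetical four-term thesaurus (terms and links are illustrative, not drawn from AAT or any published vocabulary): USE references are derived from each entry's UF list, NT links are followed to expand a search, and BT links are climbed to reach the top term (TT).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """A preferred term and its thesaurus relationships."""
    term: str
    bt: list[str] = field(default_factory=list)  # BT: broader terms
    nt: list[str] = field(default_factory=list)  # NT: narrower terms
    rt: list[str] = field(default_factory=list)  # RT: related terms
    uf: list[str] = field(default_factory=list)  # UF: non-preferred terms this entry is used for
    sn: str = ""                                  # SN: scope note


# Hypothetical mini-thesaurus; terms and links are illustrative only.
THESAURUS = {
    "Machines": Entry("Machines", nt=["Robots"]),
    "Robots": Entry("Robots", bt=["Machines"], nt=["Androids"],
                    rt=["Artificial intelligence"], uf=["Automata"]),
    "Androids": Entry("Androids", bt=["Robots"], sn="Robots built to resemble humans."),
    "Artificial intelligence": Entry("Artificial intelligence", rt=["Robots"], uf=["AI"]),
}

# Case-insensitive lookup: preferred terms map to themselves; USE references come from each UF list.
LOOKUP = {term.lower(): term for term in THESAURUS}
LOOKUP.update({alias.lower(): e.term for e in THESAURUS.values() for alias in e.uf})


def preferred(term: str) -> str:
    """Follow a USE reference to the preferred term; unknown terms are returned unchanged."""
    return LOOKUP.get(term.lower(), term)


def expand(term: str) -> list[str]:
    """Return the preferred term plus all of its narrower terms, recursively (assumes no NT cycles)."""
    pref = preferred(term)
    result = [pref]
    entry = THESAURUS.get(pref)
    for narrower in (entry.nt if entry else []):
        result.extend(expand(narrower))
    return result


def top_term(term: str) -> str:
    """Climb BT links to the broadest term (TT); assumes one BT per term and no cycles."""
    pref = preferred(term)
    entry = THESAURUS.get(pref)
    while entry and entry.bt:
        pref = entry.bt[0]
        entry = THESAURUS.get(pref)
    return pref


print(preferred("AI"))       # Artificial intelligence
print(expand("Machines"))    # ['Machines', 'Robots', 'Androids']
print(top_term("Androids"))  # Machines
```

Deriving USE from UF (rather than storing it separately) keeps the two reciprocal: every non-preferred term listed under a preferred term automatically points back to it.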
Keywords
• Often chosen by authors of works (uncontrolled; may miss synonyms or related concepts)
• Information retrieval systems (e.g., databases) may combine keywords with analysis of a document’s full text
• Usually not connected to synonyms or a hierarchy of terms
Tag Clouds, Folksonomies
Tag Cloud Issues…
science fiction
ScienceFiction
scifi
sci-fi
SF
Hard SF
time travel
books I read in high school
Dystopian futures with strong female leads
YA distopia
cyborgs
YA dystopia
robots
androids
AI
Artificial intelligence
Scifi-lite
stuff
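Some of these problems (capitalization, spacing and punctuation, known variants, misspellings) can be reduced mechanically. A minimal sketch, assuming a hypothetical hand-maintained synonym table: each tag is lowercased and stripped of spaces and punctuation, then mapped to a preferred form before counting.

```python
import re
from collections import Counter

# Hypothetical synonym table: collapsed variant -> preferred tag (hand-maintained, illustrative only).
SYNONYMS = {
    "sciencefiction": "science fiction",
    "scifi": "science fiction",
    "sf": "science fiction",
    "yadistopia": "ya dystopia",  # misspelling mapped to the correct form
    "yadystopia": "ya dystopia",
    "ai": "artificial intelligence",
    "artificialintelligence": "artificial intelligence",
}


def normalize(tag: str) -> str:
    """Collapse case, spaces, and punctuation, then map known variants to a preferred form."""
    key = re.sub(r"[^a-z0-9]", "", tag.lower())
    return SYNONYMS.get(key, tag.strip().lower())


TAGS = ["science fiction", "ScienceFiction", "scifi", "sci-fi", "SF", "Hard SF",
        "YA distopia", "YA dystopia", "AI", "Artificial intelligence", "robots"]

counts = Counter(normalize(t) for t in TAGS)
print(counts)
```

Normalization collapses the five spellings of science fiction, but it cannot decide that robots, androids, and cyborgs are related rather than synonymous, or give meaning to personal and vague tags such as "books I read in high school" or "stuff"; those still need human judgment and explicit relationships (e.g., RT).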
Website Categories
Concept Map
Concept Map
Literati database…
http://clio.columbia.edu/catalog/9356779
More Examples
Linked lists of controlled vocabularies:
http://geekyartistlibrarian.pbworks.com/w/page/88627766/Pratt
LIS653_ControlledVocabularies

LIS 653, Session 10: Controlled Vocabulary

  • 1.
  • 2.
    How it AllFits Together Record (representation of bibliographic information) Code (rules: AACR2, RDA) Encoding (MARC, XML) Authority control (standardized author name: J. R. R. Tolkien) Structure (ISBD, XML) Subject headings (LCSH, Sears, AAT) Classification (shelving: LC, Dewey) Model (FRBR, trad. model) Record is displayed in an OPAC, online database, etc. Format (MARC, Dublin Core)
  • 3.
    Core Concepts ofControlled Vocabularies  Organized list of terms  Words and/or phrases  Used to label content  Used to find content  Terms may be:  Hierarchically structured  Relational  Something in-between (BT, NT, RT)
  • 4.
    CVs attempt toresolve problems…  User language doesn’t match language of the document they’re seeking  User wants to find all resources on a given topic  User wants to find out what language a system/organization uses to describe a given concept
  • 5.
    Approaches…  Pre-coordinate: subdividedterms create compounded, complex concepts (LCSH)  Post-coordinate: single concepts, basic terms (that can be combined later during the search process)  Able to assign multiple subject terms regardless of the above method used  Hierarchical vs. more complex/subtle relationships  Finite, limited (thesaurus) vs. evolving, broad (LCSH)  Controlled vs. uncontrolled (tags, concept maps)
  • 6.
    Types of ControlledVocabularies  Thesaurus: specific subject domain (Getty’s AAT)  Subject heading list: library context, describe “aboutness” of items in catalog records (LCSH, Sears)  Classification scheme: library or other context (taxonomy), descriptor to group like items together (LCC, DDC)  Authority file: list of names of individuals or geographic places (LC Authority Files)  Keyword list, Tag cloud: often user-defined/community- defined, used to group similar content in social media or other websites  Website categories (aka taxonomy): tree-like hierarchical structure, similar to classification (w/o notation / call #)  Concept Map (aka ontology): visual representation of relationship between concepts, relays meaning (semantics)
  • 7.
    Arrangement of CVs/KOSs CVtype Structure Pre- or post- coordinate About Thesaurus Alphabetical & Systematic (2 parts) Post-coordinate Narrow scope: one subject domain; cross-references, related terms, synonyms, etc. Subject heading list Alphabetical &/or Systematic Both pre-coordinate & post-coordinate Broad general focus; cross- references, related terms, synonyms, etc. Classification scheme Systematic Pre-coordinate Often hierarchical, expressed in notation (code) rather than in words Authority file Alphabetical Post-coordinate Lists of names: geographies, individuals Keyword list, Tag cloud Alphabetical (& tag cloud as size- systematic) Usually post- coordinate, may be either User-created; usually does not identify synonyms, hierarchy, relationships between terms Website categories (taxonomy) Alphabetical & Systematic Hierarchical (tree structure) Concept map (ontology) Visual arrangement of relationships between concepts (Systematic) (no preferred terms, so does not apply) Graphic representation of concepts in a subject domain; often complex relationships (not just hierarchical)
  • 8.
    Some Issues WhenConstructing CVs… Word form (plural vs. singular)  Cat vs. Cats Sequence & form for phrases  Energy conservation vs….  Conservation of energy resources Homographs & homophones  Mercury: planet, metal, Roman god, car  Fowl vs. foul Qualifiers  Mercury (planet)  Mercury (Roman deity)
  • 9.
    Some Issues WhenConstructing CVs… Abbreviations, acronyms  AIDS vs. Acquired Immune Deficiency Syndrome Popular vs. technical terms  Cancer vs. Neoplasms Precoordinate (subdivision) vs. Postcoordinate  Merchant marine—officers Versus...  Merchant marine  Officers
  • 10.
    Thesaurus Abbreviations  TT:top term  broadest term in hierarchy  BT: broader term  NT: narrower term  RT: related term  USE: use X  points to a preferred term (X)  UF: “use for”  the preferred term  SN: scope note  describes meaning of the term meaning
  • 11.
    Keywords  Often chosenby authors of works (uncontrolled, miss synonyms or related concepts)  Info retrieval system may use combination of keywords & assessing document’s full text (databases)  Usually not connected to synonyms or hierarchy of terms
  • 12.
  • 13.
    Tag Cloud Issues… sciencefiction ScienceFiction scifi sci-fi SF Hard SF time travel books I read in high school Dystopian futures with strong female leads YA distopia cyborgs YA dystopia robots androids AI Artificial intelligence Scifi-lite stuff
  • 14.
  • 15.
  • 16.
  • 17.
    More Examples Linked listsof controlled vocabularies: http://geekyartistlibrarian.pbworks.com/w/page/88627766/Pratt LIS653_ControlledVocabularies

Editor's Notes

  • #3 (How it All Fits Together)
      - Bibliographic record = record with descriptive information about the work (information object) it represents
      - Catalog = a group of records; a list of the objects in a collection
      - Model = determines how records are organized and created in a catalog (FRBR)
      - Cataloging codes = rules for creating catalog records (AACR2, RDA)
      - Structure = rules that determine the order of elements and the punctuation/spacing in a record (ISBD, XML)
      - Format = format in which a record is created and/or encoded (MARC, Dublin Core)
      - Subject heading = term from a topical controlled vocabulary that describes the content of a work (LCSH, Sears, AAT)
      - Authority control = using a term from a controlled vocabulary (an authorized heading) to uniquely identify an author, corporation, book title, or series, so that all records for one author can be accessed easily
      - Classification scheme = assigns a descriptor (notation) to a work, usually based on subject; often used to indicate a work’s physical location (LC, Dewey)
  • #4 (Core Concepts of Controlled Vocabularies) Uses for controlled vocabularies:
      - Subject description (subject headings, classification schemes, tags)
      - Website navigation/structure
      - Organizing content for retrieval
      - Concept map of a website or system
      - Facilitating user/organizational communication
      - Determining what concepts & terms (jargon!) an organization/system uses
  • #5 (CVs attempt to resolve problems…)
      - User language doesn’t match the language of the document they’re seeking
      - User wants to find all resources on a given topic
      - User wants to find out what language a system/organization uses to describe a given concept
  • #13 (Tag Clouds, Folksonomies)
      - Visually organized
      - User-created
          - not necessarily a controlled vocabulary; may have duplicate concepts, plural vs. singular forms, etc.
          - may evolve into a community-controlled vocabulary, or folksonomy (consistency, consensus, stability)
      - Advantages:
          - Natural language: language of/by/for users
          - Users tag according to how they want to find things later
          - Identifies the parts of an item that USERS care about and find most relevant, which may differ from how a cataloger views the same item
      - Disadvantages:
          - Lack of precision: may have duplicate concepts, plural vs. singular forms, etc.
          - If synonyms aren’t linked, also a lack of recall (a search misses items in the system that apply to the user’s query)
      - The tag cloud itself: a visual representation of tag use, arranged by size (larger = term used more often)
          - Can give a good idea of popular terms, perhaps also popular concepts
          - Similar to a concept map in its visual layout
  • #14 (Tag Cloud Issues…)
      - Synonyms, inconsistency of terms (science fiction, sciencefiction, scifi, sci-fi…)
      - Related terms, confusion of relationships (androids, robots, cyborgs)
      - Different forms of words (robot, robots)
      - Misspelled words
      - Different syntax, capitalization
      - Different levels of specificity
      - Different types of categorization, unclear relationships
      - Very personal tags, with little meaning for others
      - Vague tags (“STUFF”)
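The sizing rule in the #13 note (larger = term used more often) can be expressed as a simple scale. A sketch, assuming linear scaling between hypothetical minimum and maximum font sizes in pixels (some implementations use a logarithmic scale instead, so a few very popular tags don't dwarf the rest) and the alphabetical arrangement listed for tag clouds in the arrangement table:

```python
from __future__ import annotations


def font_sizes(counts: dict[str, int], min_px: int = 10, max_px: int = 32) -> dict[str, int]:
    """Map usage counts linearly onto font sizes; tags are returned in alphabetical order.

    Assumes at least one tag. min_px and max_px are hypothetical display bounds.
    """
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # all counts equal: avoid dividing by zero (every tag gets min_px)
    return {
        tag: round(min_px + (n - lo) * (max_px - min_px) / span)
        for tag, n in sorted(counts.items())
    }


print(font_sizes({"science fiction": 5, "time travel": 3, "robots": 1}))
# {'robots': 10, 'science fiction': 32, 'time travel': 21}
```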