Murtha Baca



Using Controlled Vocabularies to Enhance Access to Cultural Information





Usage Rights

© All Rights Reserved


Murtha Baca Presentation Transcript

  • “Seek, and ye shall find”: Using Controlled Vocabularies to Enhance Access to Cultural Information (SLA Seattle, June 16, 2008)
  • Controlled Vocabularies: An Overview (Murtha Baca, Head, Vocabulary Program, Getty Research Institute; SLA Seattle, June 2008)
    • Data structure standards (metadata element sets):
      • MARC, EAD, Dublin Core, CDWA, VRA Core
    • Data content standards (cataloging rules):
      • AACR (to be replaced by RDA), ISBD, CCO, DA:CS
    • Data value standards (vocabularies):
    • Data format standards (standards expressed in machine-readable form):
      • MARC, MARCXML, EAD, CDWA Lite XML, Dublin Core Simple XML schema, DC Qualified XML schema, VRA Core XML schema
  • What are vocabularies?
    • Maps to guide people to information
      • creating / filling
      • searching / researching
      • organizing / classifying / thinking
    • Collections of terminology where relationships between terms are represented
    • Data value standards (i.e. what is used to “fill” metadata elements/categories or “containers” of information)
  • What are vocabularies? “Knowledge bases”: bodies of knowledge represented by language (glossaries, dictionaries, thesauri, word lists)
  • Types of terms in vocabularies
    • personal names: Collate, Charles B.
    • geographic names: Campbeltown (Argyll and Bute, Scotland, UK)
    • object names: clack valve
    • corporate names: Cambrian Railways
    • iconographic subjects and themes: The Legend of John Henry
    • genre terms: political cartoons, fish stories
    • multilingual equivalents: flat car (English) = Schienenwagen (German) = platforma (atklata) (Latvian)
  • What is a controlled vocabulary?
    • A tool for consistency in the language used in the recording and retrieval of information
  • What is a controlled vocabulary?
    • An organized arrangement of words and phrases that are used to index content and/or to retrieve content through navigation or a search
    • Typically a vocabulary that includes preferred terms and has a limited scope or describes a specific domain
  • Types of Controlled Vocabularies
  • Controlled Lists
    • Simple lists of terms used to control terminology
    • In a well-constructed controlled list:
      • Each term must be unique (no homographs).
      • Terms should all be members of the same class.
      • Terms should not overlap in meaning.
      • Terms should be equal in granularity or specificity.
      • Terms are arranged alphabetically or in another logical order.
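Two of the rules above (unique terms, alphabetical order) can be checked mechanically; the others (same class, no overlap, equal granularity) still need an editor's judgment. A minimal Python sketch, assuming the list is held as plain strings; the function name `check_controlled_list` is hypothetical, and the sample terms are taken from the Classification pick list example in this presentation.

```python
def check_controlled_list(terms: list[str]) -> list[str]:
    """Return the problems found in a controlled list (empty list if none).

    Covers only the mechanically checkable rules: each term is unique
    (compared case-insensitively) and terms are in alphabetical order.
    """
    problems = []
    seen: dict[str, str] = {}
    for term in terms:
        key = term.casefold()  # "Paintings" and "paintings" count as the same term
        if key in seen:
            problems.append(f"duplicate term: {term!r} (also {seen[key]!r})")
        else:
            seen[key] = term
    keys = [t.casefold() for t in terms]
    if keys != sorted(keys):
        problems.append("terms are not in alphabetical order")
    return problems


classification = ["manuscripts", "miscellaneous", "paintings", "photographs",
                  "sculpture", "texts", "vessels"]
print(check_controlled_list(classification))  # []
print(check_controlled_list(["paintings", "Paintings", "drawings"]))
```

The second call reports both a duplicate and an ordering problem.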
  • Controlled Lists cont.
    • May include terms from other controlled vocabulary resources (especially standard published vocabularies)
    • For some elements or fields in a database, a controlled list may be sufficient to control terminology, particularly where the terminology for that field is limited and unlikely to have synonyms or ancillary information. (Example: artists’ roles in ULAN, place types in TGN).
    • Controlled list: A simple list of terms used to control terminology
    Example of a controlled pick list for Classification: manuscripts, miscellaneous, paintings, photographs, sculpture, site installation, texts, vessels (Patricia Harpring, 2008, © J. Paul Getty Trust)
    • A list comprising sets of terms that are considered equivalent
    • No preferred term
    • Generally used for search and retrieval, providing access to content that is represented in natural, uncontrolled language
    Synonym ring list: Felis domesticus, domestic cat, cat, Felis catus, house cat. Images: Jean-Baptiste Perroneau, Portrait of Magdaleine Pinceloup, © J. Paul Getty Museum; Chat Noir, Théophile-Alexandre Steinlen, © Santa Barbara Museum of Art; Egyptian Cat, © Metropolitan Museum; Cat and Kittens, © National Gallery of Art; Maneki Neko, Japanese, © private collection. © J. Paul Getty Trust; Patricia Harpring 2008
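Because a synonym ring has no preferred term, a retrieval system typically uses it to expand a query: any member of the ring retrieves content described with any other member. A minimal sketch, assuming rings are stored as Python sets; `SYNONYM_RINGS` and `expand_query` are hypothetical names.

```python
# Each ring is a set of equivalent terms; no member is marked as preferred.
SYNONYM_RINGS = [
    {"cat", "domestic cat", "house cat", "Felis catus", "Felis domesticus"},
]


def expand_query(term: str) -> set[str]:
    """Return every term in the same ring as `term`, or just `term` itself."""
    needle = term.casefold()
    for ring in SYNONYM_RINGS:
        if needle in {t.casefold() for t in ring}:
            return set(ring)
    return {term}


# A search for "house cat" can now also match text that says "Felis catus".
print(sorted(expand_query("house cat")))
```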
  • Subject Headings: compilations, usually in alphabetical order, that combine separate concepts into a “string,” as in the Library of Congress Subject Headings (LCSH). Examples: Commercial fishing--Japanese competition; Salmon fisheries--law and legislation--California
  • Subject Headings cont.: Pre-coordination of terminology is a characteristic of subject headings; subject headings typically combine several unique concepts together. Examples: Subject headings--Pictures; Pictures--Computer network resources; World Wide Web--Subject access
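Because subject headings are pre-coordinated, a system that wants to offer access by each individual concept has to split the string back into its components. A minimal sketch, assuming components are separated by a double hyphen as in the examples above; `split_heading` and `index_components` are hypothetical helpers, not part of any LCSH tooling.

```python
from collections import defaultdict


def split_heading(heading: str) -> list[str]:
    """Split a pre-coordinated heading such as 'A--B--C' into its concepts."""
    return [part.strip() for part in heading.split("--") if part.strip()]


def index_components(headings: list[str]) -> dict[str, set[str]]:
    """Map each component concept (case-insensitive) to the headings using it."""
    index: dict[str, set[str]] = defaultdict(set)
    for heading in headings:
        for part in split_heading(heading):
            index[part.casefold()].add(heading)
    return index


headings = [
    "Subject headings--Pictures",
    "Pictures--Computer network resources",
    "World Wide Web--Subject access",
]
index = index_components(headings)
print(sorted(index["pictures"]))
# ['Pictures--Computer network resources', 'Subject headings--Pictures']
```

The concept "Pictures" is now reachable whether it appears first or last in the string.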
  • Taxonomies/Classifications: vocabularies that organize a body of knowledge for a defined domain into conceptual categories, e.g., Nomenclature for Museum Cataloging, ICONCLASS. Example: The Greek heroic legends > Story of Hercules (Heracles) > Labors of Hercules > Hercules chokes the Nemean lion; Hercules kills the Hydra of Lerna; Hercules captures the Ceryneian hind; Hercules captures the Cretan bull
  • Thesauri: compilations of terms representing single concepts. Thesauri explicitly express relationships among terms via a semantic structure. Example: <visual works by form>: dioramas, diptychs, medals, medallions (medals), polyptychs, triptychs
  • Authority Files
    • Compilations of authorized terms or headings used by a single information system, organization, or consortium for cataloging, indexing, and documentation.
    • Main purpose is to regulate usage.
    • Include synonyms (“See” references) and related or associated terms (“See also” references).
    • Examples include Library of Congress Name Authority File (LCNAF), local authorities for names, subjects, etc.
    • Authority files may take the form of thesauri, word lists, etc.—in other words, any kind of vocabulary can be used as an authority.
  • More on Thesauri
  • Thesauri
    • Terms in a thesaurus may have the following three types of relationships:
      • Equivalence
      • Hierarchical
      • Associative
  • Thesaural Relationships
    • Equivalence
      • synonyms, spelling variations, language variations
    • Hierarchical
      • broader to narrower
        • whole/part
        • genus/species
    • Associative
      • related concepts
  • Equivalence Relationship: terms/names denote the same thing; a preferred name is used for display
    • Bulgarini, Bartolomeo
    • (Sienese painter, circa 1337-1378)
    • Lorenzetti, Ugolino
    • Master of the Ovile Madonna
    • Ovile Master
    Example from ULAN
  • Equivalence Relationship
    • still lifes
    • still life
    • still-lifes
    • still lives
    • nature morte
    • natura morta
    • stilleven
    • Stilleben
    • vie coye
    • ontbijtje
    • banketje
  • Whole/Part Relationship: “children” or narrower terms are part of the parent or broader term
    • España (nation)
      • Andalucía (region)
        • Almería (province)
        • Cádiz (province)
        • Córdoba (province)
        • Granada (province)
        • Huelva (province)
        • Málaga (province)
        • Sevilla (province)
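A whole/part hierarchy like this can be stored as child-to-parent links. Walking up the links gives a qualified display name in the style of the TGN example “Campbeltown (Argyll and Bute, Scotland, UK)” earlier in this presentation; walking down gives the parts of a place. A minimal sketch; the `PARENT` dictionary and the `qualified_name` and `narrower` functions are hypothetical.

```python
# Child -> parent links for the whole/part hierarchy above.
PARENT = {
    "Andalucía": "España",
    "Almería": "Andalucía",
    "Cádiz": "Andalucía",
    "Córdoba": "Andalucía",
    "Granada": "Andalucía",
    "Huelva": "Andalucía",
    "Málaga": "Andalucía",
    "Sevilla": "Andalucía",
}


def qualified_name(place: str) -> str:
    """Return the place followed by its broader places, nearest first."""
    chain = []
    node = PARENT.get(place)
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return f"{place} ({', '.join(chain)})" if chain else place


def narrower(place: str) -> list[str]:
    """Return the immediate parts of a place, sorted by code point."""
    return sorted(child for child, parent in PARENT.items() if parent == place)


print(qualified_name("Granada"))  # Granada (Andalucía, España)
```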
  • Genus/Species Relationship: “children” represent types of the “parent” or broader term
    • funerary sculpture
      • brasses
      • effigies
      • gisants...
      • haniwa
      • tomb slabs
      • ushabti
  • Associative Relationship: terms are related conceptually, but not necessarily hierarchically
    • Descriptor: charterhouses
    • Hierarchy: Built Complexes and Districts
    • Scope note: Carthusian monasteries.
    • Alternate Forms of Speech {ALT}: charterhouse
    • Synonyms and spelling variants {UF}: certose; charter houses; chartreuses
    • Related concepts: Carthusian (Religions hierarchy)
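The three thesaural relationships can be modeled as a small data structure: each concept carries its preferred term and equivalents (equivalence), a link to a broader concept (hierarchical), and a set of related concepts (associative). A minimal sketch using the charterhouses record above; the `Concept` class and its field names are illustrative assumptions, not the AAT data format, and treating the hierarchy name “Built Complexes and Districts” as the parent concept is a simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Concept:
    """One thesaurus concept (an illustrative model, not the AAT data format)."""
    preferred: str                                   # preferred term (descriptor)
    variants: set[str] = field(default_factory=set)  # equivalence: ALT/UF terms
    broader: Optional[Concept] = None                # hierarchical: parent concept
    related: set[str] = field(default_factory=set)   # associative: related concepts

    def all_terms(self) -> set[str]:
        """Every term that should retrieve this concept."""
        return {self.preferred, *self.variants}

    def ancestors(self) -> list[str]:
        """Preferred terms of all broader concepts, nearest first."""
        chain, node = [], self.broader
        while node is not None:
            chain.append(node.preferred)
            node = node.broader
        return chain


built_complexes = Concept("Built Complexes and Districts")
charterhouses = Concept(
    "charterhouses",
    variants={"charterhouse", "certose", "charter houses", "chartreuses"},
    broader=built_complexes,
    related={"Carthusian"},
)
print(charterhouses.ancestors())  # ['Built Complexes and Districts']
```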
  • indexer thesaurus: A thesaurus designed to control terminology and guide indexers in the choice of terms. See also end-user thesaurus.
  • indexing: Also called human indexing. The process of evaluating information and designating indexing terms by using controlled vocabulary that will aid in finding and accessing the cultural work record. Refers to indexing done by human labor, not to the automatic parsing of data into a database index, which is used by a system to speed up search and retrieval.
  • end-user thesaurus: A thesaurus designed for direct access by searchers rather than for use by indexers. Instead of controlling the terminology, the purpose of an end-user thesaurus is to help searchers find useful terminology for improving, narrowing, and broadening their queries.
  • satellite vocabulary: A vocabulary constructed with the goal of being interoperable with an existing vocabulary, e.g., a specialty vocabulary such as a conservation thesaurus that is intended to be linked to the superstructure of a larger vocabulary, such as the AAT.
  • Vocabularies provide
    • intellectual “paths” that can improve access to information
    • Harlem Renaissance
    • Negro Renaissance
    • New Negro Movement
    • Renaissance, Harlem
    • Renaissance, Negro
    Jacob Lawrence, Tombstones, 1942. Example from the AAT
  • Why do we need vocabularies?
    • Because of national and regional differences: lorries vs. trucks, lifts vs. elevators, Tom Thumb golf courses vs. miniature golf courses
    • Because of historical vs. contemporary names: Iran vs. Persia vs. Islamic Republic of Iran
    • Because of political and social changes: KhoiKhoi vs. Hottentot
    • Because of linguistic differences: Titian vs. Tiziano vs. Titien; pottery vs. keramik vs. céramique
    • To disambiguate homographs: sinopia (pigment -- Materials hierarchy) vs. sinopia (preliminary drawing -- Visual Works hierarchy)
  • Why do we need vocabularies?
    • Thesaural relationships provide greater research/searching capabilities:
    • drawings
    • <drawings by function>
    • preliminary drawings
    • underdrawings
    • sinopie
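One way a system can exploit these relationships is to expand a search for a broad term to include all of its narrower terms, so a query for “drawings” also retrieves records indexed only as “sinopie.” A minimal sketch; `NARROWER` and `expand_down` are hypothetical names, and the list above is treated as a single chain from broadest to narrowest (an assumption; a real thesaurus has many siblings at each level). Guide terms in angle brackets organize the hierarchy but are not used for indexing, so the sketch traverses them without returning them.

```python
# Narrower-term links, treating the list above as one chain (assumption).
NARROWER = {
    "drawings": ["<drawings by function>"],
    "<drawings by function>": ["preliminary drawings"],
    "preliminary drawings": ["underdrawings"],
    "underdrawings": ["sinopie"],
}


def expand_down(term: str) -> list[str]:
    """Return `term` plus all of its descendants, depth-first.

    Guide terms (in angle brackets) are traversed but left out of the
    result, since they are not indexing terms.
    """
    result = []
    stack = [term]
    while stack:
        current = stack.pop()
        if not current.startswith("<"):
            result.append(current)
        # reversed() keeps siblings in their stored order when popped
        stack.extend(reversed(NARROWER.get(current, [])))
    return result


print(expand_down("drawings"))
# ['drawings', 'preliminary drawings', 'underdrawings', 'sinopie']
```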
  • Issues in vocabulary-enhanced searching
    • User interfaces are problematic
    • Optimally, controlled vocabularies should be used both on the “back end” and on the “front end” to be most effective
    • Economics: consistent implementation of controlled vocabularies is time- and labor-intensive
    • Vocabulary control is almost non-existent on the Web at present
  • Search “Ares” against the Getty Web site
  • “Ares” did not match any pages
  • Improve recall by ORing equivalent names (Ares, Mars)
  • “Ares OR Mars” now retrieves 37 pages
  • Search “Ares” against Google (returns 1,250,000 pages; none of the first 6 pages are relevant)
  • Increase precision by ANDing the broader/parent term of Ares, “Major Gods”
  • “Ares AND Major Gods” now narrows the results to 506 hits (all of the first 7 pages are relevant)
  • Recall and Precision
    • Note that when searching “Ares” against the Getty site, it retrieves nothing. So we need to include synonyms/equivalents (OR “Mars”) to improve recall. When performing the same search against Google, however, it returns too many hits. So we need to combine the broader term (AND “Major Gods”) to improve precision. This illustrates how important it is for a retrieval system to be flexible and let the user decide how to refine the search according to specific situations.
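The two refinements described above, ORing equivalents to improve recall and ANDing a broader term to improve precision, can be generated from vocabulary data instead of being typed by hand. A minimal sketch: the `EQUIVALENTS` and `BROADER` tables are hypothetical stand-ins for a thesaurus lookup, and the output is a generic Boolean query string whose exact syntax would depend on the search engine.

```python
EQUIVALENTS = {"Ares": ["Mars"]}  # equivalence relationships
BROADER = {"Ares": "Major Gods"}  # hierarchical (parent) relationships


def quote(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_query(term: str, widen: bool = False, narrow: bool = False) -> str:
    """Build a Boolean query for `term`.

    widen:  OR in equivalent names to improve recall.
    narrow: AND in the broader term to improve precision.
    """
    query = quote(term)
    if widen and EQUIVALENTS.get(term):
        names = [term, *EQUIVALENTS[term]]
        query = "(" + " OR ".join(quote(n) for n in names) + ")"
    if narrow and term in BROADER:
        query = f"{query} AND {quote(BROADER[term])}"
    return query


print(build_query("Ares", widen=True))   # (Ares OR Mars)
print(build_query("Ares", narrow=True))  # Ares AND "Major Gods"
```

Keeping `widen` and `narrow` as separate switches reflects the point above: the right refinement depends on whether the target system returns too little or too much.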
    • Examples of standards for data values:
    • The Getty Vocabularies
    • Library of Congress Name Authority File (LCNAF)
    • Library of Congress Subject Headings (LCSH)
  • The Getty Vocabularies
  • The Getty Vocabularies
    • Compiled and maintained by the Getty Vocabulary Program
      • Union List of Artist Names ® (ULAN)
        • 117,600 ‘records’; 257,241 names
      • Art & Architecture Thesaurus ® (AAT)
        • 33,150 ‘records’; 128,075 terms
      • Getty Thesaurus of Geographic Names ® (TGN)
        • 911,300 ‘records’; 1,102,200 names
    • Focus on the visual arts, architecture, & material culture
    • Are compiled resources (not comprehensive)
    • Grow through contributions
    • May be licensed (vendors of collection management systems, others)
  • Elements of a ULAN record: artist 500014514; names: Gaudí, Antoni; Gaudí y Cornet, Antonio; Cornet, Antoni Gauí; Gaudí i Cornet, Antoni. Note: the focus of each vocabulary record is a concept, not a “term.” Images: Sagrada Familia, Barcelona, Spain, 1882-1926, image ©; portrait © Encyclopedia Britannica online
  • Elements of a ULAN record (artist 500014514)
    • notes: Gaudí was influenced by Catalonia's medieval history and architecture. His works display a respect for craftsmanship and structural logic. He was also inspired by forms in nature, using them in structure and ornament, creating a highly personal, organic style. His work is characterized by sculptural plasticity, the manipulation of light, and the use of mosaics and polychromy. His later style is classified as Catalan Modernisme, a style related to Art Nouveau.
    • roles: architect, landscape architect, furniture designer
    • geographic location: Reus (Spain); Barcelona (Spain)
    • nationalities: Catalan, Spanish
    • bibliography: Contemporary Architects (1987); Enciclopedia universal ilustrada (1978-1983); Encyclopedia of world art (1959-1987); Grove Dictionary of Art online (1999-); LC Name Authority Headings [online] (2002-)
    • related people: studied with Juan Martorell Montells
    • life dates: Birth Date: 1852; Death Date: 1926
    • names: Gaudí, Antoni; Gaudí y Cornet, Antonio; Cornet, Antoni Gauí; Gaudí i Cornet, Antoni
  • Equivalence Relationships in ULAN
    • all names refer to same person
    • used for retrieval
    • one is “preferred”
    Names: Le Corbusier; Corbusier, Le; Corbu; Charles Edouard Jeanneret; Jeanneret, Charles Edouard; Jeanneret, Charles-Edouard; Jeanneret-Gris, Charles-Edouard. Images: portrait photo from Encyclopedia Britannica Online, Le Corbusier, photograph by Yousuf Karsh, 1954, © Karsh--Woodfin Camp and Associates; Convent of La Tourette, by Le Corbusier, at Eveux-sur-Arbresle, near Lyon, France, 1957 to 1960, photo by Donald Corner and Jenny Young, CD.2260.1012.1841.051, © Donald Corner and Jenny Young; Charles-Edouard Jeanneret, “La caída de Barcelona” (The Fall of Barcelona), 1939, oil on canvas, 81 x 99,5 cm, © Museo Nacional, Sofia; image from
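Because all of these names point to one person, a system can index every name against the record's ID and always display the preferred name, whatever the user typed. A minimal sketch using the Gaudí names and ULAN ID 500014514 from the record shown earlier; treating “Gaudí, Antoni” as the preferred name is an assumption based on its being listed first, and the `RECORDS` layout, `NAME_INDEX`, and `lookup` are hypothetical, not the ULAN data format.

```python
from typing import Optional

# One record per person; every name is a variant of one identity.
RECORDS = {
    500014514: {
        "preferred": "Gaudí, Antoni",  # assumed preferred form
        "variants": ["Gaudí y Cornet, Antonio", "Gaudí i Cornet, Antoni"],
    },
}

# Index every name (preferred and variant) to its record ID, case-insensitively.
NAME_INDEX: dict[str, int] = {}
for record_id, record in RECORDS.items():
    for name in [record["preferred"], *record["variants"]]:
        NAME_INDEX[name.casefold()] = record_id


def lookup(name: str) -> Optional[str]:
    """Return the preferred name for any known name, or None if unknown."""
    record_id = NAME_INDEX.get(name.casefold())
    if record_id is None:
        return None
    return RECORDS[record_id]["preferred"]


print(lookup("Gaudí y Cornet, Antonio"))  # Gaudí, Antoni
```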
  • Former names, “incorrect” names
    • Names for Sienese painter, active by 1337, died Sept. 4, 1378
    • include spelling variations, former names
    Names: Bulgarini, Bartolomeo; Bartolomeo Bolgarini; Bartolomeo Bolghini; Bartolomeo Bulgarini; Bartolommeo Bulgarini da Siena; Maestro d'Ovile; Master of the Ovile Madonna; Ovile Master; Lorenzetti, Ugolino; Ugolino Lorenzetti. Images: Assumption of the Virgin, Pinacoteca Nazionale, Siena, image from Carli, Enzo, Sienese Painting, Harper & Row, 1983; St. Catherine of Alexandria, National Gallery of Art, Washington DC, 1943.4.20, image from
    • published misspellings provide additional access points
    Names: O'Keeffe, Georgia; Georgia O'Keeffe; O'Keefe, Georgia (common misspelling); Stieglitz, Alfred, Mrs. (married name). Image: Georgia O'Keeffe, Ram's Skull With Brown Leaves, Roswell Museum and Art Center, Roswell, New Mexico; from: /
  • Various transliterations, diacritics
    • variant transliterations provide additional access points
    • diacritics are recorded in code-extended ASCII (e.g., $07) in the data and mapped to Unicode
    Iwan Schischkin (1831-1898), In the Russian Forest (Im Russischen Wald), 1896; oil on canvas, 139 x 95 cm; image from
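A common way to let names that differ only in diacritics match each other at search time is to compare a normalized key. A minimal sketch using Python's standard `unicodedata` module: decompose each name (NFKD), drop the combining marks, and casefold. This is a generic Unicode technique, not the code-extended ASCII mapping used in the Getty data. Letters with no decomposition (such as ø) are left unchanged, and different transliterations (Schischkin vs. Shishkin) still do not match, which is why they are recorded as variant names.

```python
import unicodedata


def match_key(name: str) -> str:
    """Return a diacritic- and case-insensitive key for comparing names."""
    decomposed = unicodedata.normalize("NFKD", name)  # "í" -> "i" + combining acute
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(match_key("Gaudí i Cornet, Antoni") == match_key("Gaudi i Cornet, Antoni"))  # True
print(match_key("Schischkin") == match_key("Shishkin"))  # False
```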
  • Translations
    • NAMES:
    • Mato Wanartaka
    • Kicking Bear
    • common translations are also important variants
    Battle of Little Big Horn, ca. 1898; watercolor on muslin; 2 ft. 11 in. x 5 ft. 10 in. (frame included); The Southwest Museum (Los Angeles, California)
  • Associative Relationships in ULAN: RELATED PERSON: son of Albrecht Dürer the elder; RELATED PERSON: student of Michael Wolgemut, from 1486 through 1490
    • student/teacher relationships
    • familial relationships if parent is also an artist
    • dates of relationship
    Albrecht Dürer; German, 1471 - 1528; Knight, Death and Devil, 1513; engraving on laid paper, sheet: 24.8 x 19 cm (9 3/4 x 7 1/2 in.); © National Gallery of Art, Washington, DC. Gift of W.G. Russell Allen, 1941.1.20 for Albrecht Dürer
  • Controlled vocabularies: Why bother?
  • Αγία Σοφία; Ayasofya; Church of the Holy Wisdom; Hagia Sophia; Haghia Sophia; Saint Sophia; Sancta Sophia; St. Sophia
  • Constantinople; Constantinopolis; Costantinopoli; Estambul; Istanbul; Konstantinopel; New Rome; Mikligard; Tsargrad; Tsarigrad (names from the Getty Thesaurus of Geographic Names (TGN))
    • deposit slip/deposit ticket =
    • paying-in slip
    • confirmation chit =
    • receipt, deposit receipt
  • = cargo shorts = board shorts
  • desk? cartonnier? chest? cabinet?
  • dolls? figurines? statuettes? idols? carvings? sculptures?
    • Giambologna?
    • Giovanni da Bologna?
    • Jean de Boulogne?
    • Users may call the same artist by various names
    • Items have been cataloged using different names for the same artist
    • published misspellings provide access points
    Common misspellings. Names: O’Keeffe, Georgia; Georgia O’Keeffe; O’Keefe, Georgia; Stieglitz, Alfred, Mrs. Image: Georgia O'Keeffe, Ram's Skull With Brown Leaves, Roswell Museum and Art Center, Roswell, New Mexico; from:
  • Anonymous artist, later named
    • former appellations
    • name is now known
    • NAMES:
    • Bulgarini, Bartolomeo
    • Bartolomeo Bolgarini
    • Bulgarini da Siena, Bartolommeo
    • Lorenzetti, Ugolino
    • Master of the Ovile Madonna
    • Ovile Master
    The Crucifixion, mid 1300s, tempera on wood, The Hermitage (St. Petersburg, Russia) image from ~arthp/html/l/lorenzet/ugolino/index.html
  • Database issues
    • repeating vs. non-repeating fields
    • vocabulary-controlled vs. free-text fields (for indexing vs. display)
    • “built-in thesauri”; vocabulary-assisted searching OR
    • addition of broader terms, variants, at record level
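The second option, adding broader terms and variants at the record level, means enriching each record's index entries when it is saved, so that even a plain keyword search with no built-in thesaurus can find it. A minimal sketch; the `THESAURUS` table and `index_terms` function are hypothetical, and the data reuses the still lifes and funerary sculpture examples from earlier in this presentation.

```python
# Per-term vocabulary data: variants (equivalence) and broader terms (hierarchy).
THESAURUS = {
    "still lifes": {"variants": ["still life", "nature morte", "Stilleben"],
                    "broader": []},
    "tomb slabs": {"variants": [], "broader": ["funerary sculpture"]},
}


def index_terms(assigned: list[str]) -> set[str]:
    """Expand a record's assigned terms with their variants and broader terms."""
    expanded = set()
    for term in assigned:
        entry = THESAURUS.get(term, {"variants": [], "broader": []})
        expanded.add(term)
        expanded.update(entry["variants"])
        expanded.update(entry["broader"])
    return expanded


record = {"title": "Example record", "subjects": ["tomb slabs"]}  # hypothetical record
record["index"] = index_terms(record["subjects"])
print(sorted(record["index"]))  # ['funerary sculpture', 'tomb slabs']
```

The trade-off, in general terms: record-level enrichment works with any keyword index but has to be redone when the vocabulary changes, while query-time expansion avoids that but requires vocabulary support in the search system.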
  • If we use terms from a standard source such as LCSH or the AAT, why do we need our own “local” authority file(s)?
  • Why do we need local authorities?
    • Local authorities can provide terms not found in published authorities, including non-expert and even “wrong” terms and names.
    • An authority record can remind the cataloger/indexer/abstractor of policies regarding local usage of the term.
    • An authority record can contain relevant/appropriate variant names for the term and identify the one that is preferred and used by the project or institution.
  • What about social tagging and folksonomies?
  • tagging: In the context of the Web, the act of associating terms (called “tags”) with an information object (e.g., a Web page, an image, a streaming video clip), thus describing the item and enabling keyword-based classification and retrieval. Tags (a form of user-generated metadata) from communities of users can be aggregated and analyzed, providing useful information about the collection of objects with which the tags have been associated.
  • social tagging: The decentralized practice and method by which individuals and groups create, manage, and share terms, names, etc. (called “tags”) to annotate and categorize digital resources in an online “social” environment. A folksonomy is the result of social tagging. Also referred to as collaborative tagging, social classification, social indexing, mob indexing, folk categorization.
  • taxonomy: An orderly classification that explicitly expresses the relationships, usually hierarchical (e.g., genus/species, whole/part, class/instance), between and among the things being classified.
  • folksonomy: An assemblage of concepts, represented by terms and names (called “tags”), that is the result of social tagging. A folksonomy is not a taxonomy.
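The aggregation mentioned in the tagging definition can be sketched in a few lines: count the tags users applied to each object, then map free tags onto preferred terms where a mapping exists, as one illustration of how a folksonomy and a controlled vocabulary might complement each other. All sample data here (users, object IDs, and the `PREFERRED` mapping) is invented for the example; `tag_profile` and `normalized_profile` are hypothetical names.

```python
from collections import Counter

# Hypothetical (user, object, tag) triples as a tagging site might collect them.
TAGS = [
    ("u1", "img-1", "cat"),
    ("u2", "img-1", "kitty"),
    ("u3", "img-1", "Cat"),
    ("u3", "img-1", "black"),
    ("u1", "img-2", "hagia sophia"),
    ("u2", "img-2", "Ayasofya"),
]

# Hypothetical mapping from free tags (case-folded) to preferred terms.
PREFERRED = {"kitty": "cat", "ayasofya": "hagia sophia"}


def tag_profile(object_id: str) -> Counter:
    """Count the case-folded tags users applied to one object."""
    return Counter(tag.casefold() for _, obj, tag in TAGS if obj == object_id)


def normalized_profile(object_id: str) -> Counter:
    """Same counts, with tags mapped to preferred terms where possible."""
    return Counter(PREFERRED.get(tag, tag) for tag in tag_profile(object_id).elements())


print(tag_profile("img-1").most_common(1))         # [('cat', 2)]
print(normalized_profile("img-2").most_common(1))  # [('hagia sophia', 2)]
```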
  • Vocabularies in the Corporate World
  • Disney Titles (preferred forms)
    • One Hundred and One Dalmatians
    • 101 Dalmatians
    • 101 Dalmatians II: Patch’s London Adventure
    • 101 Dalmatians, Disney’s
    • 101 Dalmatians: Escape from De Vil Manor
    • Sing Along Songs: Disney’s: 101 Dalmatians – Pongo & Perdita
    • 101 Dalmatians Holiday Art
  • Disney Variants
    • One Hundred and One Dalmatians
        • One Hundred and One Dalmations
        • One Hundred and One Dalmatians (animated)
        • 101 Dalmations (animated feature film)
        • 101 Dalmations
    • 101 Dalmatians
        • One Hundred and One Dalmatians (live action)
        • One Hundred and One Dalmations (live action)
        • One Hundred and One Dalmations (live-action feature)
        • 101 Dalmations
  • “What’s in a name? That which we call a rose / By any other name would smell as sweet.” Shakespeare, Romeo and Juliet, Act II, scene ii
  • Murtha Baca Head, Vocabulary Program Getty Research Institute [email_address]