Thesauri, Controlled Vocabularies, and Metadata Information Architecture
Why? They provide a way to view the network of relationships between the IA systems, and they are the glue that holds those systems together.
Metadata: “Data about data.” Metadata provides information about, or documentation of, other data managed within an application or environment. For example: data about elements or attributes (name, size, data type, etc.).
Metadata: Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purpose of improving navigation and retrieval. Example: <META name="keywords" content="information architecture, content management, knowledge management, user experience">
Metadata: Metadata-driven web sites take advantage of content management software and a controlled vocabulary. We only need to describe the documents; the software and the vocabulary take care of the rest.
Controlled Vocabulary
Controlled Vocabularies: A controlled vocabulary is a list of equivalent terms in the form of a synonym ring, or a list of preferred terms in the form of an authority file. It is a subset of natural language.
Types of Controlled Vocabularies
Vocabularies, from simple to complex: Synonym Rings, Authority Files, Classification Schemes, Thesauri.
Relationships, from simple to complex: Equivalence, Hierarchical, Associative.
Controlled Vocabularies: Synonym Rings. A synonym ring connects a set of words that are defined as equivalent for the purpose of retrieval. Example ring: Cuisinart, Food processor, blender, Kitchen aid, Kitchenaid, Cuizinart.
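A synonym ring can be sketched as simple query expansion. The snippet below is a minimal illustration, not how any particular search engine does it: the names `SYNONYM_RINGS`, `expand_query`, and `search` are hypothetical, and matching is naive case-insensitive substring matching.

```python
# Hypothetical synonym ring built from the slide's example terms.
SYNONYM_RINGS = [
    {"cuisinart", "cuizinart", "food processor", "blender",
     "kitchen aid", "kitchenaid"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return the term plus every term that shares a synonym ring with it."""
    key = term.strip().lower()
    expanded = {key}
    for ring in rings:
        if key in ring:
            expanded |= ring
    return expanded


def search(term, documents, rings=SYNONYM_RINGS):
    """Return documents containing the term or any equivalent term."""
    terms = expand_query(term, rings)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


docs = ["A Kitchenaid mixer review", "Garden hoses on sale"]
matches = search("blender", docs)  # the first document matches via the ring
```

Note that the matching document never contains the word "blender", which is exactly the effect listed under the cons of synonym rings.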
Synonym Rings
Pros: Help users locate information using different terms. Can be implemented easily using standard capabilities of search engines. Increase recall.
Cons: Users can be confused by results that do not actually include their keywords. May reduce precision.
Synonym Rings: The recall/precision trade-off
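The trade-off can be made concrete with the standard definitions: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that are retrieved. The sketch below uses made-up document IDs; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example: four documents are actually relevant to the user's need.
relevant = {"d1", "d2", "d3", "d4"}

# Exact-keyword search finds few documents, all of them relevant.
exact = precision_recall({"d1", "d2"}, relevant)

# Synonym-expanded search finds more relevant documents, plus some noise.
expanded = precision_recall({"d1", "d2", "d3", "d5", "d6"}, relevant)
```

In this invented case the expanded query raises recall from 0.5 to 0.75 while precision drops from 1.0 to 0.6.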
Authority Files: An authority file is a list of preferred terms or acceptable values. It may include variants or synonyms. Authority files are synonym rings in which one term has been defined as the preferred term.
Authority Files: Example: A list of U.S. states
AL :: Alabama
AK :: Alaska
AZ :: Arizona
AR :: Arkansas
. . .
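An authority file like the one above can be sketched as a lookup that maps every accepted form to one preferred value. This is only an illustration: it assumes the two-letter code is the preferred term and the full state name is a variant, and the names `AUTHORITY_FILE`, `build_lookup`, and `normalize` are hypothetical.

```python
# Assumption: the postal code is the preferred term, the full name a variant.
AUTHORITY_FILE = {
    "AL": ["Alabama"],
    "AK": ["Alaska"],
    "AZ": ["Arizona"],
    "AR": ["Arkansas"],
}


def build_lookup(authority):
    """Map every preferred term and variant (lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in authority.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def normalize(value):
    """Return the preferred term, or None if the value is not in the file."""
    return LOOKUP.get(value.strip().lower())
```

Returning `None` for unknown values lets an indexing tool reject or flag entries that fall outside the controlled list.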
Authority Files
Pros: Can be a tool for improving consistency among content authors and indexers. Can be used to “educate” users. Preferred terms are useful for labeling and navigation.
Cons: If equivalent terms begin with different letters, preferred terms must be complemented with links from the other terms. Example: Bayer, see Aspirin.
Classification Schemes: A classification scheme, or taxonomy, is a hierarchical arrangement of preferred terms. Examples: Dewey Decimal Classification (DDC); the Yahoo! hierarchy of categories.
Controlled Vocabulary: Thesauri. “A thesaurus is a controlled vocabulary in which equivalence, hierarchical, and associative relationships are identified for purposes of improved retrieval.”
Controlled Vocabulary: [Diagram: a Preferred Term linked to a Broader Term and a Narrower Term (hierarchical relationships), to two Variant Terms (equivalence relationships), and to two Related Terms (associative relationships).]
Controlled Vocabulary: Technical Lingo: Preferred Term (PT), Variant Term (VT), Broader Term (BT), Narrower Term (NT), Related Term (RT), Use (U), Used For (UF), Scope Note (SN).
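The abbreviations above map naturally onto a small record type. The following is a rough sketch under stated assumptions, not a standard thesaurus format: the `Term` and `Thesaurus` classes are hypothetical, and the broader term "Analgesics" and the scope note are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term (PT) and its relationships."""
    name: str
    scope_note: str = ""                           # SN
    used_for: list = field(default_factory=list)   # UF: variant terms
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


class Thesaurus:
    def __init__(self):
        self.terms = {}   # lowercased preferred name -> Term
        self.use = {}     # lowercased variant -> preferred name (U)

    def add(self, term):
        self.terms[term.name.lower()] = term
        for variant in term.used_for:
            self.use[variant.lower()] = term.name

    def resolve(self, word):
        """Return the preferred term for a word, or None if unknown."""
        key = word.strip().lower()
        if key in self.terms:
            return self.terms[key].name
        return self.use.get(key)


thesaurus = Thesaurus()
thesaurus.add(Term(
    name="Aspirin",
    scope_note="Illustrative scope note only.",   # hypothetical SN
    used_for=["Acetylsalicylic Acid", "ASA", "Bayer"],
    broader=["Analgesics"],                       # hypothetical BT
))
```

Here "Aspirin UF Bayer" is stored on the term, and the reciprocal "Bayer U Aspirin" is derived automatically when the term is added.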
Controlled Vocabulary: Example of a thesaurus in web design: PubMed (National Library of Medicine) http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
Types of Thesauri (by whether the thesaurus is used in indexing and in searching)
No Thesaurus: natural language search.
Searching Thesaurus: no tagging of content; can enrich queries.
Indexing Thesaurus: enables browsable indexes; value untapped by search.
Classic Thesaurus: high-end, full-function tool.
Semantic Relationships: Equivalence (A = B)
Connects preferred terms and their variants.
Example: Preferred Term: Aspirin. Variant Terms: Acetysal, Acetylsalicylic Acid, ASA, Bayer, Polopirin.
Semantic Relationships: Hierarchical
Divides up the information space into categories and subcategories.
Subtypes: Generic, Whole-part, Instance.
 
Semantic Relationships: Associative
Strongly implied semantic connections that aren’t captured within equivalence or hierarchical relationships.
Examples:
Field and object of study: Cardiology RT Heart
Process and its agent: Termite Control RT Pesticides
Concepts and properties: Poison RT Toxicity
Action and product: Eating RT Indigestion
Causal dependency: Celebration RT New Year’s Eve
Preferred Terms: Term Form
Grammatical form: usually nouns.
Spelling: the most common spelling used by users.
Singular and plural form: count nouns in the plural (e.g., cars, roads, maps); conceptual nouns in the singular (e.g., math).
Abbreviations and acronyms: default to popular use.
Preferred Terms: Term Selection. Term selection should be guided by your goals and by how the thesaurus will integrate with your web site.
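Preferred Terms: Term Definition. Aim for extreme specificity, because the goal is to control the vocabulary. Parenthetical qualifiers distinguish terms that share a spelling. Examples: Cells (biology), Cells (electric), Cells (prison).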
Preferred Terms: Term Specificity. Decide whether or not to use pre-coordination of terms. For example: “Knowledge Management Software” as a single term, or “Knowledge Management” and “Software” as two separate terms. The decision depends on your context.
Polyhierarchy: Polyhierarchy allows multiple parents for a single node. Example: under Diseases, Viral Pneumonia belongs to both Respiratory Tract Infections and Virus Diseases.
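A polyhierarchy can be represented by storing a list of parents per node instead of a single parent. The sketch below assumes the slide's disease example, with Viral Pneumonia placed under both Respiratory Tract Infections and Virus Diseases; `PARENTS` and `ancestors` are hypothetical names.

```python
# Each node maps to a list of parents, so a node may sit in several branches.
PARENTS = {
    "Diseases": [],
    "Respiratory Tract Infections": ["Diseases"],
    "Virus Diseases": ["Diseases"],
    "Viral Pneumonia": ["Respiratory Tract Infections", "Virus Diseases"],
}


def ancestors(node, parents=PARENTS):
    """Return every broader term reachable from the node through any parent."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


viral_pneumonia_ancestors = ancestors("Viral Pneumonia")
```

Because the ancestors are collected into a set, the shared grandparent "Diseases" is reported once even though it is reachable through both parents.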
Faceted Classification: Invented by Shiyali R. Ranganathan in the 1930s. Main principle: documents and objects have multiple dimensions, or facets.
Faceted Classification: Faceted classification uses multiple taxonomies, each focusing on a different dimension of the content.
Faceted Classification: Ranganathan’s universal facets: Personality, Matter, Energy, Space, Time.
Faceted Classification: Most common facets used in the business world: Topic, Product, Document Type, Audience, Geography, Price.
Faceted Classification: Example of a faceted classification in a web site: wine.com
Facet :: Sample controlled vocabulary values
Type :: Red, White, Sparkling, Pink, Dessert
Region (origin) :: Australian, Californian, French, Italian
Winery (manufacturer) :: Blackstone, Clos du Bois, Cakebread
Year :: 1969, 1990, 1999, 2000, 2001, 2002
Price :: $3.99, $29.99, <$199, Cheap, Moderate, Expensive
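Faceted browsing over a table like this one can be sketched as filtering records on any combination of facet values. The records below are invented for illustration and do not describe real wine.com products; `WINES`, `filter_by_facets`, and `facet_counts` are hypothetical names.

```python
from collections import Counter

# Invented records; facet values are drawn from the sample vocabularies above.
WINES = [
    {"name": "Wine A", "type": "Red", "region": "French", "year": 2001},
    {"name": "Wine B", "type": "Red", "region": "Californian", "year": 2002},
    {"name": "Wine C", "type": "White", "region": "French", "year": 2001},
    {"name": "Wine D", "type": "Sparkling", "region": "Italian", "year": 2000},
]


def filter_by_facets(items, **selected):
    """Keep items whose value matches every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet (for navigation menus)."""
    return Counter(item[facet] for item in items)


red_french = filter_by_facets(WINES, type="Red", region="French")
type_counts = facet_counts(WINES, "type")
```

Each facet is independent, so users can combine them in any order, which is the main practical difference from a single fixed hierarchy.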
Faceted Classification: More information about faceted classification:
KMconnection: http://www.kmconnection.com/DOC100100.htm
Presentation of Faceted Classification: http://www.asis.org/Conferences/Summit2002/Gruenberg.ppt
Innovation in classification: http://www.peterme.com/archives/00000063.html
