Making Decisions in
   Creating Taxonomies
                           Heather Hedden
Information Taxonomist, Viziant Corporation




                                                              November 8, 2007




     Copyright © 2007 Viziant Corporation. All Rights Reserved. Proprietary & Confidential.
Background

• Heather Hedden’s taxonomy development experience
   – controlled vocabularies for periodical index databases (Gale)
   – matching of controlled vocabulary to keywords for consumer
     products/services directories (various “yellow pages” clients)
   – enterprise taxonomies for corporate web sites and intranets (Earley
     & Associates)
   – base and custom taxonomies integrated within a knowledge
     discovery and data mining product (Viziant)


• Viziant Corporation
   – A provider of information access and intelligence systems for
     enterprises and government



Decisions for the Taxonomist

• Decisions of the taxonomy owner
   –   Approximate number of top-level nodes and number of levels
   –   Structure: primarily facets or tree
   –   Interface design: number and layout of displayed nodes
   –   Presence of polyhierarchies
   –   Automated search & retrieval or human indexing/tagging


• Decisions often left to the taxonomist
   –   Exact/final number of levels, nodes per level
   –   Arrangement of the node hierarchy, placement within facets
   –   Degree of term pre- or post-coordination
   –   Extent of use of variants/cross-references


Number of levels, nodes per level

• 3 levels and 6-8 nodes per level is a nice ideal
   – Web site/intranet menu navigation
      • Menu is confined to bar across top or margin to the side
      • Menus pull down or topic trees expand in place


• More levels and nodes per level are often needed
   – Content management/document retrieval for large content
     repositories
       • industries, products, fields of science, diseases, geographies,
         named entities


• Decision: Make more levels or make more nodes per level


Number of levels, nodes per level: Examples

Deep: Many levels

Geographies
- North America
-- United States
--- New England
---- Massachusetts
----- Boston
------ North End
------- Old North Church
- South America
- Europe
- Asia
-- Central Asia
-- Middle East
-- South Asia
-- Southeast Asia
- Africa
- Oceania

Broad: Many nodes per level

Geographies
- U.S. cities
-- Albuquerque
- U.S. states
-- Alabama
- Countries
- World cities
- Continents
- Landmarks




Number of levels, nodes per level: Examples

Deep: Many levels (SIC, NAICS style with 10-20 upper level nodes)

Industries
- Transportation services
-- Air transportation
--- Scheduled air transportation services
---- Scheduled air freight transportation services

Broad: Many nodes per level (job search sites, 50 - 80 nodes per level)

Industries (second levels at select nodes only: Healthcare, Sales)
- Accounting/Auditing
- Administrative Support Services
- Advertising/Marketing/Public Relations
- Aerospace/Aviation/Defense
- Agriculture, Forestry, & Fishing
- Airlines
  etc.


Number of levels, nodes per level

• Decision Factors
   – Display interface/horizontal and vertical real estate
   – Speed of displaying deeper levels
   – User market, needs, and expectations
      • Industry experts, internal employees, general public, students,
        etc.


• Need to balance how much can be easily skimmed in one
  view vs. how many levels the user has the patience to
  click down through
• More levels lead to less consistency across levels.
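
To compare candidate structures such as the two Geographies examples, depth and breadth can be measured directly. The following is a minimal sketch, assuming a nested-dictionary representation; the names `deep`, `broad`, `depth`, and `max_breadth` are hypothetical, and the node lists are only the samples shown on the slides.

```python
# Assumed representation: each key is a node label, each value a dict of its child nodes.
deep = {
    "North America": {
        "United States": {
            "New England": {
                "Massachusetts": {
                    "Boston": {"North End": {"Old North Church": {}}}
                }
            }
        }
    },
    "South America": {},
    "Europe": {},
    "Asia": {
        "Central Asia": {},
        "Middle East": {},
        "South Asia": {},
        "Southeast Asia": {},
    },
    "Africa": {},
    "Oceania": {},
}

# Only the sample nodes from the slide; a real "broad" taxonomy would list many more.
broad = {
    "U.S. cities": {"Albuquerque": {}},
    "U.S. states": {"Alabama": {}},
    "Countries": {},
    "World cities": {},
    "Continents": {},
    "Landmarks": {},
}


def depth(tree):
    """Number of levels in the tree (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Largest number of sibling nodes found under any single parent."""
    if not tree:
        return 0
    return max(len(tree), max(max_breadth(children) for children in tree.values()))


print(depth(deep), max_breadth(deep))
print(depth(broad), max_breadth(broad))
```

Numbers like these only describe the structure; whether a given depth or breadth is acceptable still depends on the interface and the users.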


Arrangement of node hierarchy

• Decision: What’s the best method to handle different
  means of classification within the same hierarchy?
   – Industries by traditional SIC/NAICS classification or by vertical
     market
   – Products by manufacturing technology or by end-use
   – Places by physical geographic location or by type
   – Organizations by goals/objectives or by political/religious affiliation
   – Government agencies by type or by country/state of affiliation


• Even within facets, there often are hierarchies.
• Even allowing polyhierarchies, a top-level classification is
  needed, and too many polyhierarchies can be confusing.
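
A polyhierarchy can be sketched as a node that has more than one broader term. The example below is illustrative only: the `broader` map and both functions are hypothetical, and the placement of "U.S. legislative branch" under two parents is an assumption made for the example, not a recommendation from the slides.

```python
# Assumed representation: node label -> set of parent (broader) labels.
# A node with more than one parent is polyhierarchical.
broader = {
    "Governmental bodies & agencies": set(),
    "U.S. governmental bodies & agencies": {"Governmental bodies & agencies"},
    "Legislative bodies": {"Governmental bodies & agencies"},
    "U.S. legislative branch": {
        "U.S. governmental bodies & agencies",
        "Legislative bodies",
    },
}


def paths_to_top(node):
    """Return every path from a top-level node down to the given node."""
    parents = broader[node]
    if not parents:
        return [[node]]
    return [
        path + [node]
        for parent in sorted(parents)
        for path in paths_to_top(parent)
    ]


def polyhierarchical_nodes():
    """List the nodes that sit under more than one parent."""
    return sorted(node for node, parents in broader.items() if len(parents) > 1)


for path in paths_to_top("U.S. legislative branch"):
    print(" > ".join(path))
print(polyhierarchical_nodes())
```

Every path still ends at a single top-level node, and counting the polyhierarchical nodes is one rough way to notice when there are too many of them.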

Arrangement of node hierarchy: Examples
1. Governmental bodies & agencies
   - U.S. governmental bodies & agencies
   -- U.S. Courts
   -- U.S. executive branch agencies
   -- U.S. legislative branch
   -- State bodies & agencies
   - Foreign governmental bodies & agencies
   -- Foreign courts
   -- Foreign legislatures
   -- Foreign national agencies
   -- Foreign state & provincial government agencies

2. Governmental bodies & agencies
   -- Foreign legislatures (+ instances)
   -- U.S. legislatures (+ US federal and state instances)

3. Governmental bodies & agencies
   - Legislative bodies
   -- National legislatures (+ instances, both foreign and US)
   -- State & provincial legislatures (+ all instances alphabetical for US and foreign)

4. Governmental bodies & agencies
   - Legislative bodies (+ all instances, US and foreign, in one alphabetical list)


Arrangement of node hierarchy

•   Decision: If linking named entities to topical subjects, should they each
    link at the lowest node level possible, or group all of them together at a
    higher level?

•   Example: Link specific churches at the broader term, Churches
    (denominations), the appropriate narrower term, or both

     Churches (denominations)
     - Catholic churches
     - Orthodox churches
     - Protestant churches


     Does the user know where to look for the Maronite Church or the
       Assyrian Church of the East?



Arrangement of node hierarchy

• Decision factors:
   –   Knowledge of users as to where to categorize an entity
   –   Likelihood of users to browse rather than search for entities
   –   Existence of entities that don’t belong in a subcategory
   –   Purpose to teach users (students) where entities belong


• Linking entities at both the specific and broader levels makes
  them easier to find, but clutters up the taxonomy, slows
  down performance, and may not seem logical at first to the
  user
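
The trade-off between the two linking policies can be seen in a small sketch. The entity placements here are assumptions for illustration (the slides only ask where users would look), and `NARROWER`, `SPECIFIC_LINKS`, and `link_entities` are hypothetical names.

```python
# Narrower terms under the broader term, as listed on the earlier slide.
NARROWER = {
    "Churches (denominations)": [
        "Catholic churches",
        "Orthodox churches",
        "Protestant churches",
    ],
}

# Illustrative placements (assumptions): None means the entity fits none
# of the narrower terms and can only be linked at the broader term.
SPECIFIC_LINKS = {
    "Maronite Church": "Catholic churches",
    "Assyrian Church of the East": None,
}


def link_entities(broad_term, also_link_broadly):
    """Build {term: [entities]} under the chosen linking policy."""
    index = {term: [] for term in [broad_term, *NARROWER[broad_term]]}
    for entity, specific in SPECIFIC_LINKS.items():
        if specific is not None:
            index[specific].append(entity)
        if specific is None or also_link_broadly:
            index[broad_term].append(entity)
    return index


specific_only = link_entities("Churches (denominations)", also_link_broadly=False)
both_levels = link_entities("Churches (denominations)", also_link_broadly=True)

# Total number of entity links each policy creates.
print(sum(len(v) for v in specific_only.values()))
print(sum(len(v) for v in both_levels.values()))
```

Linking at both levels puts every entity in the broader term's list, which helps a user who does not know the subcategory, at the cost of more links to maintain and display.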




Arrangement of node hierarchy

• Decision Factors
   – User market, needs, and expectations
      • How the users classify the subject matter
      • Whether a topic is even likely to be browsed for in the taxonomy
         or rather entered in the search box
   – Support for polyhierarchies
   – Permissibility of nodes as category labels, not linked to content, at
     various intermediate levels within the hierarchy
      • e.g. Foreign legislatures


• Need to consider
   – Whether to create nodes difficult to distinguish in indexing
      • e.g. both Legislative bodies and National legislatures


Placement within facets

• Facets may be determined by taxonomy owner, but
  placement of certain nodes may not be obvious
   – Institutions could be Places or Organizations
       • Places of worship, educational institutions, museums, libraries
   – Business activities could be Actions or Topics
       • Acquisitions, Contracts, Joint ventures, Sales


• Decisions:
   – In which facet to put these nodes
   – Whether two (parenthetically modified) nodes for the concept
     should be created, one for each facet, e.g. Hotels (buildings) and
     Hotels (companies)
   – Or whether a node can be in more than one facet

Placement within facets

• Decision factors
   – System support for two occurrences of the same-named node
   – Automated or manual indexing
      • Automated indexing may not distinguish between different facet-
        meanings of a term: action or topic, place or organization, etc.
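
Why automated indexing struggles with same-named nodes can be shown with a naive matcher. This is a sketch under stated assumptions: the facet assignments in `NODES` and the functions `strip_qualifier` and `candidates` are hypothetical, and real auto-indexers use far richer rules than a bare word match.

```python
# Assumed storage: (facet, node label) pairs, with parenthetical qualifiers
# distinguishing two nodes for the same word.
NODES = {
    ("Places", "Hotels (buildings)"),
    ("Organizations", "Hotels (companies)"),
    ("Topics", "Contracts"),
}


def strip_qualifier(label):
    """Drop a trailing parenthetical qualifier: 'Hotels (buildings)' -> 'Hotels'."""
    return label.split(" (")[0]


def candidates(word):
    """Return every (facet, label) whose unqualified label matches the word."""
    return sorted(
        (facet, label)
        for facet, label in NODES
        if strip_qualifier(label).lower() == word.lower()
    )


print(candidates("hotels"))     # two candidates: the match is ambiguous
print(candidates("contracts"))  # one candidate: the match is unambiguous
```

A human indexer can pick between the two "Hotels" nodes from context; a matcher like this one cannot, which is one reason the indexing method affects the placement decision.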




Term pre-coordination or post-coordination

• Hierarchical trees or thesauri serve pre-coordination
   – User browses for most specific concept

• Facets serve post-coordination
   – User chooses combination of concepts from multiple facets (e.g.
     place, product type, usage issue, customer type)

• But topic trees/thesauri may be used within a UI supporting
  multiple search terms (narrow a search)
• But hierarchies can exist within facets

• Decisions:
   – In a topic tree/thesaurus, whether to expect post-coordination
   – In a faceted taxonomy, whether and how much to have pre-
     coordination

Term pre-coordination or post-coordination

• Place and Topic facets
   – Russian foreign policy or Russia and Foreign policy
   – French embassies or France and Embassies
   – United States-Canadian relations
• Ethnicity and Occupation facets
   – Hispanic writers or Hispanics and Writers
• Body part and Disease facets
   – Ovarian cancer or Ovaries and Cancer
• Business action and Product facets
   – Drug trials or Product testing and Drugs
   – CRM software or Customer relationship management and
     Software/Marketing software
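
The difference between the two approaches can be sketched as follows. Everything here is hypothetical: the document IDs, the tag assignments, the term "Russian embassies", and the functions `search_post` and `search_pre` are assumptions for illustration.

```python
# Post-coordination: each document carries separate values per facet,
# and the user combines them at search time.
post_tags = {
    "doc1": {"Place": {"Russia"}, "Topic": {"Foreign policy"}},
    "doc2": {"Place": {"Russia"}, "Topic": {"Embassies"}},
    "doc3": {"Place": {"France"}, "Topic": {"Embassies"}},
}

# Pre-coordination: the combination is built into a single taxonomy term.
pre_tags = {
    "doc1": {"Russian foreign policy"},
    "doc2": {"Russian embassies"},  # hypothetical term, not on the slide
    "doc3": {"French embassies"},
}


def search_post(**facets):
    """Return documents matching every requested facet value."""
    return sorted(
        doc
        for doc, tags in post_tags.items()
        if all(value in tags.get(facet, set()) for facet, value in facets.items())
    )


def search_pre(term):
    """Return documents indexed with the single pre-coordinated term."""
    return sorted(doc for doc, terms in pre_tags.items() if term in terms)


print(search_post(Place="Russia", Topic="Foreign policy"))
print(search_post(Topic="Embassies"))
print(search_pre("French embassies"))
```

With post-coordination, two small facets cover every Place and Topic combination, but the user has to combine them. With pre-coordination, each combination worth retrieving needs its own node, and the user finds it in one step.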


Term pre-coordination or post-coordination

• Decision Factors
   – Human or automated indexing/tagging
      • If human indexing, all could be post-coordinated
   – Keyword searching or taxonomy browse
      • If keyword searching, pre-coordinated is preferred
   – Nature and volume of content
      • Specific content serves narrower pre-coordinated subjects
   – Scope of the content
      • Wide range of articles is better served by pre-coordination




Term pre-coordination or post-coordination

• Advantages to pre-coordinated terms
   – Provide more precise retrieval results, if used correctly
   – Better suited for specific, custom taxonomies
   – Better suited for phrase (search-string) searching
• Disadvantages to pre-coordinated terms
   – Narrower nodes might be overlooked by the user.
   – More complex to correctly index.


• Flexibility in degree of pre- or post-coordination is OK, but
  consistency of application makes the taxonomy more
  usable.


Variants and cross-references

• Variants, Nonpreferred terms, Nonpostable terms,
  Equivalent terms, See references, Cross-references,
  Keywords

• First, take into consideration:
   –   Human or automated indexing/tagging
   –   Automated stemming
   –   Taxonomy browse, search, or both. If both, which is dominant
   –   Content from divergent sources, countries
   –   System/UI support for a variant pointing to more than one node




Variants and cross-references

• Decision: whether a concept should be a node or its variant
  (when they are not synonyms)
   – Create a more specific/narrower node, or use it as a variant
      • Hydroelectric plants USE Electric power plants
      • Factories USE Plants & factories
   – Differentiate closely related terms, or use one as a variant
      • Foreign policy vs. International relations
      • Colleges & universities vs. Higher education
   – Differentiate topics from actions, or use one as a variant
      • Contracts vs. Contracting
      • Investments vs. Investing
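
A USE reference can be sketched as a lookup from a variant to its preferred node. The two mappings come from the examples above; the data structures and the `resolve` function are hypothetical, and the case-insensitive matching is an assumption.

```python
# Preferred (postable) node labels.
PREFERRED = {"Electric power plants", "Plants & factories"}

# Variant -> preferred node, stored lowercase for case-insensitive lookup.
USE = {
    "hydroelectric plants": "Electric power plants",
    "factories": "Plants & factories",
}


def resolve(entry):
    """Map a user's entry to a preferred node, or None if nothing matches."""
    key = entry.strip().lower()
    for term in PREFERRED:
        if term.lower() == key:
            return term  # the entry already names a preferred node
    return USE.get(key)


print(resolve("Factories"))
print(resolve("hydroelectric plants"))
print(resolve("Mills"))
```

Making "Hydroelectric plants" a variant means a search on it retrieves everything indexed under "Electric power plants"; making it a narrower node instead would let content be indexed and retrieved at the more specific level.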




Variants and cross-references

• Decision: whether a term should be a node or its variant
  (when synonyms)
   –   Plural vs. singular
   –   Acronym vs. spelled out form
   –   Technical/academic vs. popular term
   –   Synonyms also for a word within a phrase-term
        • administration vs. management
        • oil vs. petroleum
        • communications vs. telecommunications
        • health vs. medical




Variants and cross-references

• Decision Factors: for the number of variants per node
   – Users as monolithic or diverse
   – Size of taxonomy (if browsable)
      • If small and easily learned, a large number of variants is
        unnecessary
   – Human or automated indexing/tagging
      • Automated indexing needs many more variants
   – Keyword searching or taxonomy browse
      • Keyword searching needs more variants
   – Nature and volume of content
      • Broad/general content needs more variants
   – Display of Cross-references
      • Limit the number of variants if they display in the UI

Variants and cross-references

• Decision Factors: for the choice of term as node or variant
   – User background, level of expertise, expectations
   – Political correctness, instructiveness to users
   – Number of characters in display width



• The more stakeholders involved, the more complex the
  decision in choosing the preferred name of the node




Conclusions

•   Taxonomy creation is a decision-making task
•   Different decisions are based on different factors
•   Each taxonomy project is unique
•   Creators/editors of the taxonomy need to know:
    –   Who are the users and what are their needs
    –   What is the nature of the content
    –   What the user interface will look like
    –   What the system supports (faceted search, multiple cross-refs)
    –   How the content will be indexed/tagged




Questions?



  Heather Hedden
  Information Taxonomist
  Viziant Corporation
  Two International Place, Suite 410
  Boston, MA 02110
  www.viziantcorp.com

  Heather.hedden@viziantcorp.com
  617-517-0075 ext. 104
  978-467-5195 (cell)




Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyyActive Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
Active Inference is a veryyyyyyyyyyyyyyyyyyyyyyyy
 
How to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptxHow to Build a Profitable IoT Product.pptx
How to Build a Profitable IoT Product.pptx
 
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and DisadvantagesBLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
BLOCKCHAIN TECHNOLOGY - Advantages and Disadvantages
 
Pigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending PlantPigging Unit Lubricant Oil Blending Plant
Pigging Unit Lubricant Oil Blending Plant
 
Vulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive OverviewVulnerability Management: A Comprehensive Overview
Vulnerability Management: A Comprehensive Overview
 
Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024Vertex AI Agent Builder - GDG Alicante - Julio 2024
Vertex AI Agent Builder - GDG Alicante - Julio 2024
 
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptxIntroduction-to-the-IAM-Platform-Implementation-Plan.pptx
Introduction-to-the-IAM-Platform-Implementation-Plan.pptx
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Best Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdfBest Practices for Effectively Running dbt in Airflow.pdf
Best Practices for Effectively Running dbt in Airflow.pdf
 
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptxRPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
RPA In Healthcare Benefits, Use Case, Trend And Challenges 2024.pptx
 
Uncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in LibrariesUncharted Together- Navigating AI's New Frontiers in Libraries
Uncharted Together- Navigating AI's New Frontiers in Libraries
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
Premium Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service ...
 
Feature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptxFeature sql server terbaru performance.pptx
Feature sql server terbaru performance.pptx
 
Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
 
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSECHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
CHAPTER-8 COMPONENTS OF COMPUTER SYSTEM CLASS 9 CBSE
 
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdfAcumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
Acumatica vs. Sage Intacct vs. NetSuite _ NOW CFO.pdf
 

Making Decisions in Creating Taxonomies

  • 1. Making Decisions in Creating Taxonomies
      Heather Hedden, Information Taxonomist, Viziant Corporation
      November 8, 2007
      Copyright © 2007 Viziant Corporation. All Rights Reserved. Proprietary & Confidential.
  • 2. Background
      • Heather Hedden’s taxonomy development experience
         – controlled vocabularies for periodical index databases (Gale)
         – matching of controlled vocabulary to keywords for consumer products/services directories (various “yellow pages” clients)
         – enterprise taxonomies for corporate web sites and intranets (Earley & Associates)
         – base and custom taxonomies integrated within a knowledge discovery and data mining product (Viziant)
      • Viziant Corporation
         – A provider of information access and intelligence systems for enterprises and government
  • 3. Decisions for the Taxonomist
      • Decisions of the taxonomy owner
         – Approximate number of top-level nodes and number of levels
         – Structure: primarily facets or tree
         – Interface design: number and layout of displayed nodes
         – Presence of polyhierarchies
         – Automated search & retrieval or human indexing/tagging
      • Decisions often left to the taxonomist
         – Exact/final number of levels, nodes per level
         – Arrangement of the node hierarchy, placement within facets
         – Degree of term pre- or post-coordination
         – Extent of use of variants/cross-references
  • 4. Number of levels, nodes per level
      • 3 levels and 6-8 nodes per level is a nice ideal
         – Web site/intranet menu navigation
            • Menu is confined to bar across top or margin to the side
            • Menus pull-down or topic trees expand in place
      • More levels and nodes per level are often needed
         – Content management/document retrieval for large content repositories
            • industries, products, fields of science, diseases, geographies, named entities
      • Decision: Make more levels or make more nodes per level
  • 5. Number of levels, nodes per level: Examples
      Deep: Many levels
        Geographies
        - North America
        -- United States
        --- New England
        ---- Massachusetts
        ----- Boston
        ------ North End
        ------- Old North Church
        - South America
        - Europe
        - Asia
        -- Central Asia
        -- Middle East
        -- South Asia
        -- Southeast Asia
        - Africa
        - Oceania
      Broad: Many nodes per level
        Geographies
        - U.S. cities
        -- Albuquerque
        - U.S. States
        -- Alabama
        - Countries
        - World cities
        - Continents
        - Landmarks
  • 6. Number of levels, nodes per level: Examples
      Deep: Many levels (SIC, NAICS style with 10-20 upper level nodes)
        Industries
        - Transportation services
        -- Air transportation
        --- Scheduled air transportation services
        ---- Scheduled air freight transportation services
      Broad: Many nodes per level (job search sites, 50-80 nodes per level)
        Industries (second levels at select nodes only: Healthcare, Sales)
        - Accounting/Auditing
        - Administrative Support Services
        - Advertising/Marketing/Public Relations
        - Aerospace/Aviation/Defense
        - Agriculture, Forestry, & Fishing
        - Airlines
        etc.
  • 7. Number of levels, nodes per level
      • Decision Factors
         – Display interface/horizontal and vertical real estate
         – Speed of displaying deeper levels
         – User market, needs, and expectations
            • Industry experts, internal employees, general public, students, etc.
      • Balance how much can be easily skimmed in one view against how many levels the user has the patience to click down through
      • More levels lead to less consistency across levels.
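The deep-versus-broad trade-off in slides 4-7 can be made concrete by measuring a taxonomy's depth and its widest sibling group. The sketch below is only an illustration: the nested-dict representation and the function names `depth` and `max_breadth` are assumptions, and the data is the truncated Geographies example from slide 5, not a complete taxonomy.

```python
# Hypothetical representation: each node label maps to a dict of its child nodes.
deep = {
    "Geographies": {
        "North America": {
            "United States": {
                "New England": {
                    "Massachusetts": {
                        "Boston": {"North End": {"Old North Church": {}}}
                    }
                }
            }
        },
        "South America": {},
        "Europe": {},
        "Asia": {
            "Central Asia": {},
            "Middle East": {},
            "South Asia": {},
            "Southeast Asia": {},
        },
        "Africa": {},
        "Oceania": {},
    }
}

broad = {
    "Geographies": {
        "U.S. cities": {"Albuquerque": {}},
        "U.S. States": {"Alabama": {}},
        "Countries": {},
        "World cities": {},
        "Continents": {},
        "Landmarks": {},
    }
}


def depth(tree):
    """Number of levels in the mapping (an empty mapping has depth 0)."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Largest number of sibling nodes found under any single parent."""
    if not tree:
        return 0
    return max(len(tree), *(max_breadth(children) for children in tree.values()))


# Levels below the root "Geographies" node, i.e. clicks needed to reach the deepest node.
print(depth(deep["Geographies"]))   # 7
print(depth(broad["Geographies"]))  # 2
print(max_breadth(deep))            # 6
```

Running both measures on a draft taxonomy gives a quick check against the "3 levels and 6-8 nodes per level" ideal before deciding whether to add levels or add nodes per level.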
  • 8. Arrangement of node hierarchy
      • Decision: What’s the best method to handle different means of classification within the same hierarchy?
         – Industries by traditional SIC/NAICS classification or by vertical market
         – Products by manufacturing technology or by end-use
         – Places by physical geographic location or by type
         – Organizations by goals/objectives or by political/religious affiliation
         – Government agencies by type or by country/state of affiliation
      • Even within facets, there often are hierarchies.
      • Even allowing polyhierarchies, a top-level classification is needed, and too many polyhierarchies can be confusing.
  • 9. Arrangement of node hierarchy: Examples
      1. Governmental bodies & agencies
         - U.S. governmental bodies & agencies
         -- U.S. Courts
         -- U.S. executive branch agencies
         -- U.S. legislative branch
         -- State bodies & agencies
         - Foreign governmental bodies & agencies
         -- Foreign courts
         -- Foreign legislatures
         -- Foreign national agencies
         -- Foreign state & provincial government agencies
      2. Governmental bodies & agencies
         -- Foreign legislatures (+ instances)
         -- U.S. legislatures (+ US federal and state instances)
      3. Governmental bodies & agencies
         - Legislative bodies
         -- National legislatures (+ instances, both foreign and US)
         -- State & provincial legislatures (+ all instances alphabetical for US and foreign)
      4. Governmental bodies & agencies
         - Legislative bodies (+ all instances, US and foreign, in one alphabetical list)
  • 10. Arrangement of node hierarchy
      • Decision: If linking named entities to topical subjects, should they each link at the lowest node level possible, or group all of them together at a higher level?
      • Example: Link specific churches at the broader term, Churches (denominations), the appropriate narrower term, or both
         Churches (denomination)
         - Catholic churches
         - Orthodox churches
         - Protestant churches
         Does the user know where to look for the Maronite Church or the Assyrian Church of the East?
  • 11. Arrangement of node hierarchy
      • Decision factors:
         – Knowledge of users as to where to categorize an entity
         – Likelihood of users to browse rather than search for entities
         – Existence of entities that don’t belong in a subcategory
         – Purpose to teach users (students) where entities belong
      • Linking entities at both the specific and the broader level makes them easier to find, but clutters up the taxonomy, slows down performance, and may not seem logical at first to the user
  • 12. Arrangement of node hierarchy
      • Decision Factors
         – User market, needs, and expectations
            • How the users classify the subject matter
            • Whether a topic is even likely to be browsed for in the taxonomy or rather entered in the search box
         – Support for polyhierarchies
         – Permissibility of nodes as category labels, not linked to content, at various intermediate levels within the hierarchy
            • e.g. Foreign legislatures
      • Need to consider
         – Whether to create nodes difficult to distinguish in indexing
            • e.g. both Legislative bodies and National legislatures
  • 13. Placement within facets
      • Facets may be determined by taxonomy owner, but placement of certain nodes may not be obvious
         – Institutions could be Places or Organizations
            • Places of worship, educational institutions, museums, libraries
         – Business activities could be Actions or Topics
            • Acquisitions, Contracts, Joint ventures, Sales
      • Decisions:
         – In which facet to put these nodes
         – Whether two (parenthetically modified) nodes for the concept should be created, one for each facet, e.g. Hotels (buildings) and Hotels (companies)
         – Or whether a node can be in more than one facet
  • 14. Placement within facets
      • Decision factors
         – System support for two occurrences of the same-named node
         – Automated or manual indexing
            • Automated indexing may not distinguish between different facet-meanings of a term: action or topic, place or organization, etc.
  • 15. Term pre-coordination or post-coordination
      • Hierarchical trees or thesauri serve pre-coordination
         – User browses for most specific concept
      • Facets serve post-coordination
         – User chooses combination of concepts from multiple facets (e.g. place, product type, usage issue, customer type)
      • But topic trees/thesauri may be used within a UI supporting multiple search terms (narrow a search)
      • But hierarchies can exist within facets
      • Decisions:
         – In a topic tree/thesaurus, whether to expect post-coordination
         – In a faceted taxonomy, whether and how much to have pre-coordination
  • 16. Term pre-coordination or post-coordination
      • Place and Topic facets
         – Russian foreign policy or Russia and Foreign policy
         – French embassies or France and Embassies
         – United States-Canadian relations
      • Ethnicity and Occupation facets
         – Hispanic writers or Hispanics and Writers
      • Body part and Disease facets
         – Ovarian cancer or Ovaries and Cancer
      • Business action and Product facets
         – Drug trials or Product testing and Drugs
         – CRM Software or Customer Relationship Management and Software/Marketing software
  • 17. Term pre-coordination or post-coordination
      • Decision Factors
         – Human or automated indexing/tagging
            • If human indexing, all could be post-coordinated
         – Keyword searching or taxonomy browse
            • If keyword searching, pre-coordination is preferred
         – Nature and volume of content
            • Specific content serves narrower pre-coordinated subjects
         – Scope of the content
            • Wide range of articles is better served by pre-coordination
  • 18. Term pre-coordination or post-coordination
      • Advantages to pre-coordinated terms
         – Provide more precise retrieval results, if used correctly
         – Better suited for specific, custom taxonomies
         – Better suited for phrase search string searching
      • Disadvantages to pre-coordinated terms
         – Narrower nodes might be overlooked by the user.
         – More complex to correctly index.
      • Flexibility in degree of pre- or post-coordination is OK, but consistency of application makes the taxonomy more usable.
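The precision claim for pre-coordinated terms in slides 15-18 can be illustrated with a toy index. This is a minimal sketch under stated assumptions: the `postings` mapping, the document ids, and the function `post_coordinated` are all hypothetical, and post-coordination is modelled simply as a Boolean AND over separately assigned terms.

```python
# Hypothetical index: term -> ids of the documents tagged with that term.
# Document 3 is imagined as an article on another country's foreign policy
# that also discusses Russia, so it carries both single-concept terms.
postings = {
    "Russia": {1, 2, 3},
    "Foreign policy": {2, 3, 4},
    "Russian foreign policy": {2},  # pre-coordinated term, assigned at indexing time
}


def post_coordinated(*terms):
    """Combine single-concept terms at search time (Boolean AND)."""
    return set.intersection(*(postings[term] for term in terms))


pre = postings["Russian foreign policy"]
post = post_coordinated("Russia", "Foreign policy")

print(sorted(pre))         # [2]
print(sorted(post))        # [2, 3]
print(sorted(post - pre))  # [3]
```

The extra document returned by the post-coordinated query shows the loss of precision, while the pre-coordinated term depends on the indexer having applied it correctly, which is the indexing complexity noted under the disadvantages.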
  • 19. Variants and cross-references
      • Variants, Nonpreferred terms, Nonpostable terms, Equivalent terms, See references, Cross-references, Keywords
      • First, take into consideration:
         – Human or automated indexing/tagging
         – Automated stemming
         – Taxonomy browse, search, or both. If both, which is dominant
         – Content from divergent sources, countries
         – System/UI support for a variant pointing to more than one node
  • 20. Variants and cross-references
      • Decision: whether a concept should be a node or its variant (when they are not synonyms)
         – Create a more specific/narrower node, or use it as a variant
            • Hydroelectric plants USE Electric power plants
            • Factories USE Plants & factories
         – Differentiate closely related terms, or use one as a variant
            • Foreign policy vs. International relations
            • Colleges & universities vs. Higher education
         – Differentiate topics from actions, or use one as a variant
            • Contracts vs. Contracting
            • Investments vs. Investing
  • 21. Variants and cross-references
      • Decision: whether a term should be a node or its variant (when synonyms)
         – Plural vs. singular
         – Acronym vs. spelled out form
         – Technical/academic vs. popular term
         – Synonyms also for a word within a phrase-term
            • administration vs. management
            • oil vs. petroleum
            • communications vs. telecommunications
            • health vs. medical
  • 22. Variants and cross-references
      • Decision Factors: for the number of variants per node
         – Users as monolithic or diverse
         – Size of taxonomy (if browsable)
            • If small and easily learned, then a large number of variants is unnecessary
         – Human or automated indexing/tagging
            • Automated indexing needs many more variants
         – Keyword searching or taxonomy browse
            • Keyword searching needs more variants
         – Nature and volume of content
            • Broad/general content needs more variants
         – Display of Cross-references
            • Limit the number of variants if they display in the UI
  • 23. Variants and cross-references
      • Decision Factors: for the choice of term as node or variant
         – User background, level of expertise, expectations
         – Political correctness, instructiveness to users
         – Number of characters in display width
      • The more stakeholders involved, the more complex the decision in choosing the preferred name of the node
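The USE references in slides 19-23 amount to a lookup from a variant to one or more preferred nodes. The sketch below assumes a plain dictionary as the data structure; `use_refs`, `preferred_nodes`, and `resolve` are hypothetical names. The first two entries come from slide 20, while the "hotels" entry is an assumed illustration of a variant pointing to more than one node, built from the two Hotels nodes on slide 13.

```python
# Preferred (postable) node labels.
preferred_nodes = {
    "Electric power plants",
    "Plants & factories",
    "Hotels (buildings)",
    "Hotels (companies)",
}

# Variant (nonpreferred term, lower-cased) -> preferred node(s) it points to.
use_refs = {
    "hydroelectric plants": ["Electric power plants"],  # from slide 20
    "factories": ["Plants & factories"],                # from slide 20
    # Assumed example: only valid if the system supports multi-node variants.
    "hotels": ["Hotels (buildings)", "Hotels (companies)"],
}


def resolve(term):
    """Return the preferred node(s) for a user-entered term, or an empty list."""
    key = term.strip().lower()
    for node in preferred_nodes:
        if node.lower() == key:
            return [node]  # the term is already a preferred node
    return use_refs.get(key, [])


print(resolve("Hydroelectric plants"))  # ['Electric power plants']
print(resolve("hotels"))                # ['Hotels (buildings)', 'Hotels (companies)']
print(resolve("Windmills"))             # []
```

A lookup of this kind only normalizes case and whitespace. Plural/singular handling and stemming, listed among the considerations on slide 19, would need either more variants or a stemming step in front of it.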
  • 24. Conclusions
      • Taxonomy creation is a decision-making task
      • Different decisions are based on different factors
      • Each taxonomy project is unique
      • Creators/editors of the taxonomy need to know:
         – Who are the users and what are their needs
         – What is the nature of the content
         – What the user interface will look like
         – What the system supports (faceted search, multiple cross-refs)
         – How the content will be indexed/tagged
  • 25. Questions?
      Heather Hedden
      Information Taxonomist
      Viziant Corporation
      Two International Place, Suite 410
      Boston, MA 02110
      www.viziantcorp.com
      Heather.hedden@viziantcorp.com
      617-517-0075 ext. 104
      978-467-5195 (cell)