Making Decisions in Creating Taxonomies

Taxonomy Boot Camp conference presentation 2007


  1. 1. Making Decisions in Creating Taxonomies
     Heather Hedden, Information Taxonomist, Viziant Corporation
     November 8, 2007
     Copyright © 2007 Viziant Corporation. All Rights Reserved. Proprietary & Confidential.
  2. 2. Background
     • Heather Hedden’s taxonomy development experience
       – Controlled vocabularies for periodical index databases (Gale)
       – Matching of controlled vocabulary to keywords for consumer products/services directories (various “yellow pages” clients)
       – Enterprise taxonomies for corporate web sites and intranets (Earley & Associates)
       – Base and custom taxonomies integrated within a knowledge discovery and data mining product (Viziant)
     • Viziant Corporation
       – A provider of information access and intelligence systems for enterprises and government
  3. 3. Decisions for the Taxonomist
     • Decisions of the taxonomy owner
       – Approximate number of top-level nodes and number of levels
       – Structure: primarily facets or tree
       – Interface design: number and layout of displayed nodes
       – Presence of polyhierarchies
       – Automated search & retrieval or human indexing/tagging
     • Decisions often left to the taxonomist
       – Exact/final number of levels, nodes per level
       – Arrangement of the node hierarchy, placement within facets
       – Degree of term pre- or post-coordination
       – Extent of use of variants/cross-references
  4. 4. Number of levels, nodes per level
     • Three levels with 6-8 nodes per level is a good ideal
       – Web site/intranet menu navigation
         • Menu is confined to a bar across the top or a margin to the side
         • Menus pull down or topic trees expand in place
     • More levels and nodes per level are often needed
       – Content management/document retrieval for large content repositories
         • Industries, products, fields of science, diseases, geographies, named entities
     • Decision: make more levels or make more nodes per level
  5. 5. Number of levels, nodes per level: Examples
     Deep: Many levels
       Geographies
       - North America
       -- United States
       --- New England
       ---- Massachusetts
       ----- Boston
       ------ North End
       ------- Old North Church
       - South America
       - Europe
       - Asia
       -- Central Asia
       -- Middle East
       -- South Asia
       -- Southeast Asia
       - Africa
       - Oceania
     Broad: Many nodes per level
       Geographies
       - U.S. cities
       -- Albuquerque
       - U.S. states
       -- Alabama
       - Countries
       - World cities
       - Continents
       - Landmarks
  6. 6. Number of levels, nodes per level: Examples
     Deep: Many levels (SIC, NAICS style with 10-20 upper-level nodes)
       Industries
       - Transportation services
       -- Air transportation
       --- Scheduled air transportation services
       ---- Scheduled air freight transportation services
     Broad: Many nodes per level (job search sites, 50-80 nodes per level)
       Industries (second levels at select nodes only: Healthcare, Sales)
       - Accounting/Auditing
       - Administrative Support Services
       - Advertising/Marketing/Public Relations
       - Aerospace/Aviation/Defense
       - Agriculture, Forestry, & Fishing
       - Airlines
       etc.
  7. 7. Number of levels, nodes per level
     • Decision factors
       – Display interface/horizontal and vertical real estate
       – Speed of displaying deeper levels
       – User market, needs, and expectations
         • Industry experts, internal employees, general public, students, etc.
     • Need to balance how much can be easily skimmed in one view against how many levels the user has the patience to click down through
     • More levels lead to less consistency across levels.
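The depth-versus-breadth trade-off in slides 4-7 can be made concrete by measuring a taxonomy. The following is a minimal Python sketch, not part of the original presentation: the nested-dict representation and the function names `depth` and `max_breadth` are assumptions chosen for illustration, and the two sample trees are abbreviated from the Geographies examples above.

```python
# Hypothetical representation: each key is a node label and each value
# is a dict of that node's children (an empty dict marks a leaf).
deep = {
    "Geographies": {
        "North America": {
            "United States": {
                "New England": {
                    "Massachusetts": {"Boston": {}},
                },
            },
        },
        "Asia": {"Central Asia": {}, "Middle East": {}},
    }
}

broad = {
    "Geographies": {
        "U.S. cities": {"Albuquerque": {}},
        "U.S. states": {"Alabama": {}},
        "Countries": {},
        "World cities": {},
        "Continents": {},
        "Landmarks": {},
    }
}


def depth(tree):
    """Count the levels in the tree, including the top level."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


def max_breadth(tree):
    """Return the largest number of sibling nodes under any one parent."""
    if not tree:
        return 0
    widest_below = max(max_breadth(children) for children in tree.values())
    return max(len(tree), widest_below)


for name, tree in (("deep", deep), ("broad", broad)):
    print(name, "levels:", depth(tree), "widest level:", max_breadth(tree))
```

Comparing the two numbers against the interface's display space and the users' patience for clicking is one way to apply the decision factors listed in slide 7.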
  8. 8. Arrangement of node hierarchy
     • Decision: What’s the best method to handle different means of classification within the same hierarchy?
       – Industries by traditional SIC/NAICS classification or by vertical market
       – Products by manufacturing technology or by end use
       – Places by physical geographic location or by type
       – Organizations by goals/objectives or by political/religious affiliation
       – Government agencies by type or by country/state of affiliation
     • Even within facets, there often are hierarchies.
     • Even when polyhierarchies are allowed, a top-level classification is needed, and too many polyhierarchies can be confusing.
  9. 9. Arrangement of node hierarchy: Examples
     1. Governmental bodies & agencies
        - U.S. governmental bodies & agencies
        -- U.S. courts
        -- U.S. executive branch agencies
        -- U.S. legislative branch
        -- State bodies & agencies
        - Foreign governmental bodies & agencies
        -- Foreign courts
        -- Foreign legislatures
        -- Foreign national agencies
        -- Foreign state & provincial government agencies
     2. Governmental bodies & agencies
        -- Foreign legislatures (+ instances)
        -- U.S. legislatures (+ U.S. federal and state instances)
     3. Governmental bodies & agencies
        - Legislative bodies
        -- National legislatures (+ instances, both foreign and U.S.)
        -- State & provincial legislatures (+ all instances, alphabetical, for U.S. and foreign)
     4. Governmental bodies & agencies
        - Legislative bodies (+ all instances, U.S. and foreign, in one alphabetical list)
  10. 10. Arrangement of node hierarchy
     • Decision: When linking named entities to topical subjects, should each entity link at the lowest possible node level, or should all of them be grouped together at a higher level?
     • Example: Link specific churches at the broader term, Churches (denominations), at the appropriate narrower term, or at both
         Churches (denominations)
         - Catholic churches
         - Orthodox churches
         - Protestant churches
       Does the user know where to look for the Maronite Church or the Assyrian Church of the East?
  11. 11. Arrangement of node hierarchy
     • Decision factors:
       – Users’ knowledge of where to categorize an entity
       – Likelihood that users will browse rather than search for entities
       – Existence of entities that don’t belong in a subcategory
       – Purpose of teaching users (students) where entities belong
     • Linking entities at both the specific and the broader level makes them easier to find, but it clutters the taxonomy, slows performance, and may not seem logical to the user at first
  12. 12. Arrangement of node hierarchy
     • Decision factors
       – User market, needs, and expectations
         • How the users classify the subject matter
         • Whether a topic is likely to be browsed for in the taxonomy at all, rather than entered in the search box
       – Support for polyhierarchies
       – Permissibility of nodes that serve only as category labels, not linked to content, at various intermediate levels within the hierarchy
         • e.g. Foreign legislatures
     • Need to consider
       – Whether to create nodes that are difficult to distinguish in indexing
         • e.g. both Legislative bodies and National legislatures
  13. 13. Placement within facets
     • Facets may be determined by the taxonomy owner, but placement of certain nodes may not be obvious
       – Institutions could be Places or Organizations
         • Places of worship, educational institutions, museums, libraries
       – Business activities could be Actions or Topics
         • Acquisitions, Contracts, Joint ventures, Sales
     • Decisions:
       – In which facet to put these nodes
       – Whether two (parenthetically modified) nodes for the concept should be created, one for each facet, e.g. Hotels (buildings) and Hotels (companies)
       – Or whether a node can be in more than one facet
  14. 14. Placement within facets
     • Decision factors
       – System support for two occurrences of the same-named node
       – Automated or manual indexing
         • Automated indexing may not distinguish between different facet-meanings of a term: action or topic, place or organization, etc.
  15. 15. Term pre-coordination or post-coordination
     • Hierarchical trees or thesauri serve pre-coordination
       – User browses for the most specific concept
     • Facets serve post-coordination
       – User chooses a combination of concepts from multiple facets (e.g. place, product type, usage issue, customer type)
     • But topic trees/thesauri may be used within a UI supporting multiple search terms (to narrow a search)
     • And hierarchies can exist within facets
     • Decisions:
       – In a topic tree/thesaurus, whether to expect post-coordination
       – In a faceted taxonomy, whether and how much to have pre-coordination
  16. 16. Term pre-coordination or post-coordination
     • Place and Topic facets
       – Russian foreign policy or Russia and Foreign policy
       – French embassies or France and Embassies
       – United States-Canadian relations
     • Ethnicity and Occupation facets
       – Hispanic writers or Hispanics and Writers
     • Body part and Disease facets
       – Ovarian cancer or Ovaries and Cancer
     • Business action and Product facets
       – Drug trials or Product testing and Drugs
       – CRM Software or Customer Relations Management and Software/Marketing software
  17. 17. Term pre-coordination or post-coordination
     • Decision factors
       – Human or automated indexing/tagging
         • If human indexing, all could be post-coordinated
       – Keyword searching or taxonomy browse
         • If keyword searching, pre-coordination is preferred
       – Nature and volume of content
         • Specific content serves narrower pre-coordinated subjects
       – Scope of the content
         • A wide range of articles is better served by pre-coordination
  18. 18. Term pre-coordination or post-coordination
     • Advantages of pre-coordinated terms
       – Provide more precise retrieval results, if used correctly
       – Better suited for specific, custom taxonomies
       – Better suited for searching on phrase search strings
     • Disadvantages of pre-coordinated terms
       – Narrower nodes might be overlooked by the user.
       – More complex to index correctly.
     • Flexibility in the degree of pre- or post-coordination is OK, but consistency of application makes the taxonomy more usable.
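The difference between the two approaches in slides 15-18 can be sketched as a lookup versus a set intersection. This is a minimal Python illustration, not something from the original presentation: the `index` mapping, the document IDs, and the function names are all hypothetical.

```python
# Hypothetical index: node label -> set of IDs of documents tagged with it.
index = {
    "Russia": {1, 2, 3},
    "Foreign policy": {2, 3, 4},
    "Russian foreign policy": {2},  # a pre-coordinated node
}


def pre_coordinated(index, term):
    """Retrieve documents tagged with one pre-coordinated node."""
    return set(index.get(term, set()))


def post_coordinated(index, *terms):
    """Retrieve documents tagged with every one of the given simple terms."""
    doc_sets = [index.get(term, set()) for term in terms]
    return set.intersection(*doc_sets) if doc_sets else set()


print(pre_coordinated(index, "Russian foreign policy"))
print(post_coordinated(index, "Russia", "Foreign policy"))
```

In this made-up data, document 3 carries both simple tags without being about Russian foreign policy as such, so the post-coordinated query returns it while the pre-coordinated node does not. That is one way to read the claim that pre-coordinated terms give more precise results when used correctly.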
  19. 19. Variants and cross-references
     • Also called: Variants, Nonpreferred terms, Nonpostable terms, Equivalent terms, See references, Cross-references, Keywords
     • First, take into consideration:
       – Human or automated indexing/tagging
       – Automated stemming
       – Taxonomy browse, search, or both; if both, which is dominant
       – Content from divergent sources, countries
       – System/UI support for a variant pointing to more than one node
  20. 20. Variants and cross-references
     • Decision: whether a concept should be a node or a variant of another node (when they are not synonyms)
       – Create a more specific/narrower node, or use it as a variant
         • Hydroelectric plants USE Electric power plants
         • Factories USE Plants & factories
       – Differentiate closely related terms, or use one as a variant
         • Foreign policy vs. International relations
         • Colleges & universities vs. Higher education
       – Differentiate topics from actions, or use one as a variant
         • Contracts vs. Contracting
         • Investments vs. Investing
  21. 21. Variants and cross-references
     • Decision: whether a term should be a node or a variant (when the terms are synonyms)
       – Plural vs. singular
       – Acronym vs. spelled-out form
       – Technical/academic vs. popular term
       – Synonyms also for a word within a phrase-term
         • administration vs. management
         • oil vs. petroleum
         • communications vs. telecommunications
         • health vs. medical
  22. 22. Variants and cross-references
     • Decision factors for the number of variants per node
       – Whether the user base is monolithic or diverse
       – Size of taxonomy (if browsable)
         • If it is small and easily learned, a large number of variants is unnecessary
       – Human or automated indexing/tagging
         • Automated indexing needs many more variants
       – Keyword searching or taxonomy browse
         • Keyword searching needs more variants
       – Nature and volume of content
         • Broad/general content needs more variants
       – Display of cross-references
         • Limit the number of variants if they display in the UI
  23. 23. Variants and cross-references
     • Decision factors for the choice of term as node or variant
       – User background, level of expertise, expectations
       – Political correctness, instructiveness to users
       – Number of characters in display width
     • The more stakeholders involved, the more complex the decision in choosing the preferred name of the node
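The USE references in slides 19-23 amount to a lookup from a variant to its preferred node or nodes. Below is a minimal Python sketch under stated assumptions: the `NODES` and `USE` structures and the `resolve` function are hypothetical names, the first two USE entries come from slide 20, and the multi-target entry for "plants" is an invented example of a variant pointing to more than one node, which slide 19 notes the system may or may not support.

```python
# Hypothetical set of preferred node labels.
NODES = {"Electric power plants", "Plants & factories"}

# Hypothetical USE table: lowercased variant -> list of preferred nodes.
USE = {
    "hydroelectric plants": ["Electric power plants"],  # from slide 20
    "factories": ["Plants & factories"],                # from slide 20
    # Invented example of one variant pointing to two nodes.
    "plants": ["Electric power plants", "Plants & factories"],
}


def resolve(term):
    """Map a user's term to preferred node(s), ignoring case and outer spaces."""
    key = term.strip().lower()
    for node in NODES:
        if node.lower() == key:
            return [node]          # the term is already a preferred node
    return list(USE.get(key, []))  # empty list when the term is unknown


print(resolve("Hydroelectric plants"))
print(resolve("plants"))
```

A table like this grows with the factors listed in slide 22: automated indexing and keyword searching would typically need many more entries than a small, browsable taxonomy.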
  24. 24. Conclusions
     • Taxonomy creation is a decision-making task
     • Different decisions are based on different factors
     • Each taxonomy project is unique
     • Creators/editors of the taxonomy need to know:
       – Who the users are and what their needs are
       – What the nature of the content is
       – What the user interface will look like
       – What the system supports (faceted search, multiple cross-refs)
       – How the content will be indexed/tagged
  25. 25. Questions?
     Heather Hedden
     Information Taxonomist
     Viziant Corporation
     Two International Place, Suite 410
     Boston, MA 02110
     www.viziantcorp.com
     Heather.hedden@viziantcorp.com
     617-517-0075 ext. 104
     978-467-5195 (cell)
