Implementing a Taxonomy in a Content Management Portal

On the uses and implementation of taxonomy on the Web, with a particular focus on the taxonomy as part of an enterprise information environment. Presented by Marjorie M.K. Hlava during Content Week 2005 in Miami, Florida.

Published in: Technology, Education

Implementing a Taxonomy in a Content Management Portal

  1. 1. Implementing a Taxonomy in a Content Management Portal • Content Week 2005, Miami, Florida • Monday, January 31, 2005 • Workshop H, 2:45 pm – 4:45 pm • Marjorie M.K. Hlava, Access Innovations, Inc. • 505-998-0800 • mhlava@accessinn.com • www.accessinn.com
  2. 2. Introductions • Name • Project • Expectations for these two short hours • Please fill in the sign up sheet • Would you like – 1. Copy of this presentation? – 2. Sample software? – 3. Other information?
  3. 3. Copyright © 2005 Access Innovations, Inc. What will we talk about this afternoon? • 1. Definitions • 2. Where taxonomy fits in the Information Circle • 3. Where to use a taxonomy • 4. Taxonomies for Communities of Practice • 5. Surrounding theories and applications • 6. How to build and maintain a taxonomy • 7. How taxonomies are used in enterprise information
  4. 4. Implementing a Taxonomy in a Content Management Portal [Diagram labels: Data Feed; Thesaurus Master; MAI to add Metadata; Database Management System; Add Metadata using MAI; Inverted File]
  5. 5. 1. Definitions
  6. 6. What is a taxonomy? • A hierarchical thesaurus with authority terms applied at the final node • A browse-able web interface • A Linnaean system • A browse-able list with the term instance at the final leaf
  7. 7. Types of Taxonomies • Naming and organizing things into groups that share similar characteristics • 1. Flat – just a list • 2. Hierarchical – Taxonomic view • 3. Faceted – Sorted by a single characteristic – Metadata – Dublin Core – COSATI – GILS • 4. Thesaurus – Term records – Database backend – Easier to modify and maintain
  8. 8. Taxonomy in meta data • Definition – A taxonomy is a thesaurus in its hierarchical view with the authority files applied at the final nodes – It enables the browse-able front end to a portal – It provides keyword and name access to the content in the portal
  9. 9. Taxonomy definition • A taxonomy is a thesaurus in hierarchical view with authority file terms added at the final nodes • Thesaurus • Authority file • Hierarchical form • Final nodes
  10. 10. Thesaurus • Concepts • Methods • Procedures • Cognitive approach • The knowledge capture piece • The topics or subjects
  11. 11. Authority file • People • Places • Things • The tangible approach • Concrete Entities
  12. 12. Hierarchical view • Gives the Portal view • The view of all the preferred terms in categorized order • An outline of the thesaurus
  13. 13. Final Nodes • The last position on the hierarchical tree – Taxonomy • concept – narrower terms » final node - people, place or thing term » document instance » Letter to George Wiesman Dec 12, 2003 » Technical report number TR-1039 » Museum artifact 1706 wooden wagon wheel
  14. 14. Term Records – the Database Part • Associative terms – Related terms • Equivalence terms – Preferred and non preferred – Use and used for – Synonyms • Hierarchical terms – Broader narrower terms – Parent Child
  15. 15. Other term record fields • Scope notes • Cross references • History • Term Status • Category • User defined
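
The term record fields listed on the two slides above map naturally onto a small data structure. The following is a minimal sketch in Python, not the record format of any particular thesaurus product; the class name `TermRecord`, its field names, the helper `link_broader`, and the sample terms are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TermRecord:
    """One thesaurus term record (illustrative field names, not a vendor schema)."""
    term: str                                           # preferred term
    broader: list[str] = field(default_factory=list)    # hierarchical: parent terms
    narrower: list[str] = field(default_factory=list)   # hierarchical: child terms
    related: list[str] = field(default_factory=list)    # associative terms
    used_for: list[str] = field(default_factory=list)   # equivalence: non-preferred synonyms
    scope_note: str = ""
    status: str = "candidate"                           # term status
    history: list[str] = field(default_factory=list)


def link_broader(records: dict[str, TermRecord], child: str, parent: str) -> None:
    """Record a broader/narrower pair on both records so the hierarchy stays reciprocal."""
    if parent not in records[child].broader:
        records[child].broader.append(parent)
    if child not in records[parent].narrower:
        records[parent].narrower.append(child)


# Hypothetical sample terms.
records = {
    "Vehicles": TermRecord("Vehicles", status="approved"),
    "Automobiles": TermRecord("Automobiles", used_for=["Cars"], status="approved"),
}
link_broader(records, "Automobiles", "Vehicles")
print(records["Vehicles"].narrower)
```

Keeping both directions of each hierarchical link in the record is one possible design choice; it makes the hierarchical view cheap to produce at the cost of having to update two records per change.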
  16. 16. 2. Where does a taxonomy fit in the information circle?
  17. 17. Information Circle - Overview • Taxonomy • User • Content • Output
  18. 18. Content • Web Pages • White Papers • Research Reports • Licensed Data Feeds • Intranet • Internal Reports • Lotus Notes files • Databases • Public Relations Documents/Press Releases • Market Research Reports • Customer Relationship Management (CRM) • HR Files • Accounting/Financial Records • Legal Documents • Patents • Museum artifacts
  19. 19. Content – cont’d • Content creation – HTML: Meta name / Keywords – DB: Field / Meta tag / Element – XML: Entity table for valid values
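
For the HTML case named on the slide above, taxonomy terms can be written into the page's keywords meta element at content creation time. This is a minimal sketch using only the Python standard library; the function name `keywords_meta_tag` and the sample terms are assumptions, not part of the presentation.

```python
from html import escape


def keywords_meta_tag(terms: list[str]) -> str:
    """Render taxonomy terms as an HTML <meta name="keywords"> element."""
    content = ", ".join(terms)
    # Escape &, <, > and quotes so a term cannot break out of the attribute.
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


# Hypothetical terms assigned to one page.
tag = keywords_meta_tag(["Automobiles", "Research & development"])
print(tag)
# <meta name="keywords" content="Automobiles, Research &amp; development">
```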
  20. 20. Taxonomy • Taxonomy is applied to new and existing content • Meta Tags • Thesaurus Terms • Authority Terms • Date • Author • Description • etc. • Rule Base • Taxonomy
  21. 21. Taxonomy – cont’d • Index data – Manually – Automatically • Suggest new candidate terms • Review
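
Automatic indexing against a rule base can be pictured, in its simplest form, as matching text patterns and assigning the preferred term each pattern points to. The sketch below is a toy illustration of that idea and not how MAI or any other product works; the `RULES` table, the function name `suggest_terms`, and the sample sentence are assumptions.

```python
import re

# Hypothetical rule base: each text pattern (preferred or non-preferred form)
# points to the preferred term that should be assigned.
RULES = {
    "automobiles": "Automobiles",
    "automobile": "Automobiles",
    "cars": "Automobiles",
    "car": "Automobiles",
    "personal computer": "Microcomputer",
    "microcomputer": "Microcomputer",
}


def suggest_terms(text: str, rules: dict[str, str] = RULES) -> list[str]:
    """Return the preferred terms whose patterns occur in the text as whole words."""
    found = set()
    for pattern, preferred in rules.items():
        if re.search(rf"\b{re.escape(pattern)}\b", text, flags=re.IGNORECASE):
            found.add(preferred)
    return sorted(found)


print(suggest_terms("She sold her car and bought a personal computer."))
# ['Automobiles', 'Microcomputer']
```

The whole-word match keeps "carpet" from triggering the "car" rule; a human reviewer would still confirm the suggestions, as the Review step on the slide indicates.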
  22. 22. Output • Searchable Data – Internal Data – External Data
  23. 23. User • Web Browsing/Searching • Database Browsing/Searching • Query Resolution
  24. 24. User – cont’d • User Input – Suggested Candidate Terms – New Documents • Reports Based on User Search – Search Logs – Null Hits (These will also suggest new candidate terms)
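
The point about null hits can be made concrete: queries that return nothing and are not already in the vocabulary are natural candidate terms. A minimal sketch follows, assuming the search log is available as (query, hit count) pairs; that log format, the function name `candidate_terms`, and the `min_count` threshold are assumptions.

```python
from collections import Counter


def candidate_terms(search_log, known_terms, min_count=2):
    """Rank null-hit queries that are not already in the vocabulary.

    search_log is an iterable of (query, hit_count) pairs.
    """
    known = {t.lower() for t in known_terms}
    null_hits = Counter(
        query.strip().lower()
        for query, hits in search_log
        if hits == 0 and query.strip().lower() not in known
    )
    return [(q, n) for q, n in null_hits.most_common() if n >= min_count]


# Hypothetical log entries.
log = [("hybrid cars", 0), ("Hybrid Cars", 0), ("automobiles", 12),
       ("segway", 0), ("hybrid cars", 0)]
print(candidate_terms(log, known_terms=["Automobiles"]))
# [('hybrid cars', 3)]
```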
  25. 25. New Content • The cycle begins again
  26. 26. Information Circle - Overview • Taxonomy • User • Content • Output
  27. 27. 3. Where to use a taxonomy • Link the Taxonomy and Indexing • Always in sync with the industry • Keep up to date with terminology • Automatically index the old data • Filter newsfeeds • Search using the Taxonomy • File using the taxonomy • Spell check using the taxonomy • Link to translation system • Catalog using the taxonomy • Index a book
  31. 31. Thesaurus Master
  33. 33. Database Management System - Add Metadata using MAI • Inverted File: Aardvark, Alligator, Apple, Advantage, …., Zebra • Record locator: Accessinn.com/12345/demofile/recid15 • Database records, each with many elements • Portal Searching
  34. 34. Inverted File: Aardvark, Alligator, Apple, Advantage, …., Zebra • Record locator: Accessinn.com/12345/demofile/recid15 • Database records, each with many elements • Portal Searching • Many databases can be reached
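
The inverted file on these two slides pairs each index term with the locators of the records that carry it. The sketch below shows the general technique only; the function name `build_inverted_file` and the shortened record locators are hypothetical.

```python
from collections import defaultdict


def build_inverted_file(records: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each index term (lowercased) to the sorted record locators that carry it."""
    inverted = defaultdict(set)
    for locator, terms in records.items():
        for term in terms:
            inverted[term.lower()].add(locator)
    return {term: sorted(locs) for term, locs in sorted(inverted.items())}


# Hypothetical record locators and index terms.
records = {
    "demofile/recid15": ["Apple", "Zebra"],
    "demofile/recid16": ["Alligator", "Apple"],
}
inverted = build_inverted_file(records)
print(inverted["apple"])
# ['demofile/recid15', 'demofile/recid16']
```

A portal search then becomes a lookup in this table followed by a fetch of the located records, which is why several databases can sit behind one inverted file.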
  35. 35. 4. Taxonomies for Communities of Practice
  36. 36. Taxonomies in a Community of Practice • Nature of Communities of Practice (CoP) • Taxonomies in context • Value of taxonomies • Creating a taxonomy • Applying the taxonomy
  37. 37. Nature of CoPs • Free flowing, loosely structured • Simple, ad hoc categorization • Active CoPs need organization • Search tends to be hit-or-miss Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  38. 38. Taxonomies in Context A taxonomy aspires to be: • a correlation of the different functional, regional and (possibly) national languages used by a community of practice • a support mechanism for navigation • a support tool for search engines and knowledge maps • an authority for tagging documents and other information objects • a knowledge base in its own right Reference: “Taxonomies: the vital tool of information architecture”, www.tfpl.com
  39. 39. Value of Taxonomies • Improves organization & structure • Facilitates navigation • Facilitates knowledge discovery • Reduces effort • Saves time “Taxonomies are better created by professional indexers or librarians than by domain experts.” Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  40. 40. Naval Postgraduate School’s Homeland Security Taxonomy (1)
  41. 41. Naval Postgraduate School’s Homeland Security Taxonomy (2)
  42. 42. IBM Insight graphical view
  43. 43. Applying a Taxonomy (1) Manually • Add terms into meta data fields • Design navigation & site indexes with taxonomy hierarchy Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  44. 44. Incorporating Hierarchical Classification from a Taxonomy Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  45. 45. Applying a Taxonomy (2) System integration • Search & retrieval systems • Auto-assignment of metadata • Categorization systems Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  46. 46. Applying the Taxonomy to a Digital Library [Diagram labels: Web portal; Meta-Search Tool; Automated categorization; Filtered content; Search engines; Spiders; Locally held documents; Library catalogs; Public repositories; Commercial data sources; Agency data sources; Internet (public)] Courtesy of Lillian Gassie, Naval Postgraduate School, Monterey, CA
  47. 47. 5. Surrounding theories and applications
  48. 48. Other Vocabulary types • Uncontrolled lists • Classification System • Subject headings • Controlled vocabulary – usually synonyms and spelling • Authority files • Thesaurus • Taxonomy
  49. 49. Uncontrolled list - defined • Add terms as they occur • No cross reference • Simple flat structure
  50. 50. Controlled term lists - defined • State the preferred terms • Provide allowed term entry • Heavily cross referenced • Not generally hierarchical • Popular • Easy to create
  51. 51. Controlled term list - format • Cars – use Automobiles • Personal Computer – use Microcomputer
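
The "use" format above is, in effect, a lookup from an entry term to its preferred term. A minimal sketch built from the slide's two examples follows; the table name `USE` and the function names `preferred` and `used_for` are assumptions.

```python
# Non-preferred entry term -> preferred term, taken from the slide's two examples.
USE = {
    "cars": "Automobiles",
    "personal computer": "Microcomputer",
}


def preferred(term: str, use: dict[str, str] = USE) -> str:
    """Return the preferred term, or the term itself if no 'use' reference exists."""
    return use.get(term.strip().lower(), term.strip())


def used_for(use: dict[str, str] = USE) -> dict[str, list[str]]:
    """Build the reciprocal 'used for' listing from the 'use' references."""
    reciprocal: dict[str, list[str]] = {}
    for entry, pref in use.items():
        reciprocal.setdefault(pref, []).append(entry)
    return reciprocal


print(preferred("Cars"))
# Automobiles
print(used_for())
# {'Automobiles': ['cars'], 'Microcomputer': ['personal computer']}
```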
  52. 52. Classification vs Subject Headings • Classification – single spot or placement – browse physical list – often a numbering system – clear hierarchy – no or few cross references
  53. 53. Classification vs Subject Headings • Subject headings – generic search – hidden classification system – related terms and cross references in heavy use – Usually the inverted form • cells, electric – Alphabetic access
  54. 54. Authority systems - defined • Lists of terms in the preferred format for use • Frequently have cross references • Widely available • Frequently coded lists • Brand names
  55. 55. Authority lists - examples • ISO Country Name and Code – International Organization for Standardization • ISO Language list • NAICS (SIC) – Standard Industrial Classification Code (SIC) – Replaced by – North American Industry Classification System (NAICS)
  56. 56. What is a thesaurus? • Jessica L. Milstead. All Rights Reserved • “For writers, it is a tool like Roget’s – one with words grouped and classified to help select the best word to convey a specific nuance of meaning. • For indexers and searchers, it is an information storage and retrieval tool: a listing of words and phrases authorized for use in an indexing system, together with relationships, variants and synonyms, and aids to navigation through the thesaurus” • www.jelem.com
  57. 57. Thesaurus - defined • For information retrieval since the 1960s – indexing either intellectual or automatic – in searching – searching but not indexing – indexing but not searching – hierarchical view for searching
  58. 58. Thesaurus - defined • Monolingual - standard – British – English - ISO 2788 – American – English – ANSI/NISO Z39.19 • Multilingual – standard ISO 5964 – concept mapping – Eurovoc • Discipline or Mission based - ad hoc
  59. 59. Thesaurus - standard format • Main Entries • Top Terms - TT • Broader Terms - BT • Narrower Terms - NT • Related Terms - RT • Scope Notes - SN • History - HI • Date term added/changed - DA
  60. 60. Standards • Monolingual – ANSI/NISO Z39.19 – ISO 2788 • Multilingual – ISO 5964
  61. 61. ISO Standards • Set up already - easy to adopt • Multiple broader terms • The standards outline procedures – ISO -better for implementation – NISO much better reading
  62. 62. Why do we index ? • Improve precision – define scope of terms • Improve recall – different terms for same concept • Guide to a field of expertise • Learning tool • Richer expression
  63. 63. Uses ? • Indexing* – …process by which subject terms or classification symbols are assigned to concepts in documents – A thesaurus is also known as an indexing language – * not the building of the inverted file in computer sense of indexing
  64. 64. What are we controlling ? • Synonyms – different terms same concept • Polysemes or Homonyms – same word different meanings – Lead – Reading
  65. 65. How ? • Meaning – delineation of scope of a term • Term equivalence – linking of synonyms • Disambiguation of homonyms – lead (metal) – lead (element) – lead (management)
  66. 66. Precision options • Language specificity • Coordination • Compound terms - level of precoordination • Homographs and scope notes • Word distance indication
  67. 67. Precision options • Structural relationships • Links and roles • Treatment and aspect codes • Weighting
  68. 68. Disambiguation • Bill – Invoice • Bill – Legislative • Bill – Sport • Bill – Person
  69. 69. Disambiguation • Bills – Invoices • Bills – Legislation • Bill – Animal • Bill – Person [Diagram relationship labels: PT, NT, BT, RT]
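
One simple way to choose among qualified forms of a homonym such as "Bill" is to look at the words around it. The sketch below is a toy illustration under that assumption, not the method used by any indexing product; the `SENSES` table, its cue words, and the function name `disambiguate` are invented for the example.

```python
import re
from typing import Optional

# Hypothetical context cues for three qualified forms of the homonym "Bill".
SENSES = {
    "Bill (invoice)": {"invoice", "payment", "due", "amount"},
    "Bill (legislation)": {"senate", "congress", "vote", "law"},
    "Bill (animal)": {"duck", "beak", "bird"},
}


def disambiguate(text: str, senses: dict[str, set[str]] = SENSES) -> Optional[str]:
    """Pick the qualified term whose cue words overlap most with the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {term: len(cues & words) for term, cues in senses.items()}
    best = max(scores, key=scores.get)
    # No cue word present: leave the choice to a human indexer.
    return best if scores[best] > 0 else None


print(disambiguate("The Senate will vote on the bill next week"))
# Bill (legislation)
```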
  70. 70. 6. How to build and maintain a taxonomy
  71. 71. How to build a taxonomy • Collect the terms • Pull out authority terms • Organize into arrays • Choose top terms • Organize hierarchically • Flesh out term records • Test, review, and edit
  72. 72. Or said another way … • Define scope • Collect terms and relationships • Identify existing taxonomies • Identify resources • Create & refine taxonomy • Apply taxonomy • Review and update
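
The "choose top terms" and "organize hierarchically" steps can be sketched as turning a list of term and broader-term pairs into an indented outline. This is a minimal illustration that assumes each term has a single broader term, which is a simplification of real thesauri; the function name `outline` and the sample terms are hypothetical.

```python
from typing import Optional


def outline(broader: dict[str, Optional[str]]) -> list[str]:
    """Return the hierarchy as indented lines, top terms first, siblings alphabetized."""
    children: dict[Optional[str], list[str]] = {}
    for term, parent in broader.items():
        children.setdefault(parent, []).append(term)

    lines: list[str] = []

    def walk(parent: Optional[str], depth: int) -> None:
        for term in sorted(children.get(parent, [])):
            lines.append("  " * depth + term)
            walk(term, depth + 1)

    walk(None, 0)  # top terms are those with no broader term
    return lines


# Hypothetical collected terms, each with its broader term (None marks a top term).
broader = {
    "Vehicles": None,
    "Automobiles": "Vehicles",
    "Wagons": "Vehicles",
    "Wagon wheels": "Wagons",
    "Documents": None,
    "Technical reports": "Documents",
}
print("\n".join(outline(broader)))
```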
  73. 73. Maintain • Steady stream of terms – Web logs – Null sets – New announcements – Indexing team – Library – Records managers – Etc. • Candidate terms • Out of date is nearly useless
  74. 74. Best Results Measures • Accuracy • Productivity • Hits, Misses and Noise • Precision (Recall) • Relevance • Ease of set up • Time to production
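
The slide does not define hits, misses, and noise. The sketch below assumes a common reading: a hit is a term the system assigned that a reference indexer also chose, a miss is a reference term the system failed to assign, and noise is a term the system assigned that the reference indexer did not choose. Under those assumed definitions, precision and recall follow directly; the function name `precision_recall` is hypothetical.

```python
def precision_recall(hits: int, misses: int, noise: int) -> tuple[float, float]:
    """Compute precision and recall from indexing counts.

    hits:   assigned terms that the reference indexer also chose
    misses: reference terms the system failed to assign
    noise:  assigned terms the reference indexer did not choose
    """
    assigned = hits + noise   # everything the system assigned
    expected = hits + misses  # everything it should have assigned
    precision = hits / assigned if assigned else 0.0
    recall = hits / expected if expected else 0.0
    return precision, recall


print(precision_recall(hits=8, misses=2, noise=2))
# (0.8, 0.8)
```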
  75. 75. Integration • Thesaurus – full featured – multiple views – multiple versions – multiple languages • Automatic indexing – filtering – assisted • Data Harmony MAI and Thesaurus Master
  76. 76. Visual Taxonomy • Ways to look – Hierarchical – Alphabetic – by term – Ring diagrams – Topic maps – Related terms
  77. 77. API to Many Systems for CMS
  78. 78. Apply to the meta data • Automatic application? • Spider setting internally • External web crawls – use all aliases • Filter data • Enhance search experience
  79. 79. Meta data • The fields • The elements – Class codes – Title – Author – Plaintiff – Product – subject / topic • Meta Name Keywords in HTML
  81. 81. 7. How Taxonomies are used in Enterprise Information
  82. 82. Brand is repeated in several spots and tied to search as well
  83. 83. Another way of listing brands
  84. 84. Category list from taxonomy is tied to brand list and product list
  85. 85. Category code from the taxonomy is tied to the brand list and the product list
  86. 86. Enterprise Taxonomy Management • Consistent application across entire site • Synonyms are used interchangeably • User doesn’t need to know the taxonomy • Pop up view is helpful • Site map for construction and browsing • Allows hidden sections for internal use
  87. 87. Taxonomies • Form the basis for knowledge sharing • Add value to discussion • Allow deeper retrieval • Are straightforward to create • Require on-going maintenance
  88. 88. Your Taxonomy • There is too much information to pile it on the floor. • It fits in many places in the information flow
  90. 90. Implementing a Taxonomy in a Content Management Portal [Diagram labels: Data Feed; Thesaurus Master; MAI to add Metadata; Database Management System; Add Metadata using MAI; Inverted File]
  91. 91. Thank you for your time! Questions? • Marjorie M.K. Hlava, Access Innovations, Inc. • 505-998-0800 • mhlava@accessinn.com • www.accessinn.com