Taxonomy 101
  • Why are we here?
  • (MS centerpiece, Time Obama, RIM industry name TK, banking?) This is a list; everything else we will talk about today has some kind of relationship. Donna: CVs provide the data to populate metadata fields; offer consistency in the language used to describe content; act as an intermediary between the input of the user and a database of terms by interpreting the meaning of the words; provide agreement in the (semantic) meaning of terms used; facilitate retrieval; enable search input to better represent the original intention of the user; provide consistent and clear hierarchies for navigation.
  • You’ll notice facets are both a CV and a taxonomy as they can have a hierarchical structure as well as a (Pets: Animal > Mammal > Dog > Poodle; Food: MI > Veg > Peppers > Chili peppers; RIM example TK)
  • Pros: Easy. Cons: Little context.
  • Pros: Only need one relationship of term to synonym. Cons: No preferred terms.
  • Pros: Preferred terms can be used for browse and display. Cons: Little context other than synonym or official term.
  • Pros: Easy to implement. Cons: 1 parent/1 child.
  • Can be hierarchical, ackack
  • Pro: Multiple levels of hierarchy, allowing for multiple parent/child relationships. Con: Can spin wildly out of control as you attempt to classify the universe.
  • Pro: Rock-solid industry standard. Con: Limited relationships.
  • Part of the semantic web (both big and little S). Pro: Allows complex relationships between things to be expressed. Con: Can spin out of control; can be difficult for systems to retrieve and make use of the relationships.
  • Microformats reuse existing HTML/XML tags to convey metadata. Pro: Highly extendable. Con:
  • And may just not be appropriate for your company
  • How does this work out for us in web world? Clickthroughs and return site visits, pure and simple.
  • All based on use cases
  • Breads: Bruschetta is a bread dish but not a main ingredient (MI)
  • Retention, document storage, discovery
  • ISO/S 23081-1, ISO 23081-2

Transcript

  • 1. Taxonomy 101: Controlled Vocabularies and Beyond. Barbara McGlamery, Marthastewart.com
  • 2. About Me
      • 9+ years Time Inc.
          • Entertainment Weekly
          • This Old House
          • Time
          • People
          • InStyle
          • Recipe Finder
      • 1+ years Martha Stewart
          • Martha Stewart Living
          • Martha Stewart Weddings
          • Whole Living
  • 3. Agenda
      • Basics of taxonomy and controlled vocabularies
      • Developing a taxonomy
      • Taxonomy software and tagging tools
      • Records management and taxonomy
  • 4. What is a controlled vocabulary?
      • Predefined, authorized terms that can be consistently applied to content
      • Types:
          • Lists
          • Synonym rings
          • Authority files
          • Facets
  • 5. What is a taxonomy?
      • Classification of a controlled vocabulary in a hierarchical list
      • Types:
          • Taxonomy
          • Thesaurus
          • Ontology
  • 6. Controlled Vocabulary
      • Predefined, authorized terms that can be consistently applied to content
      • Relationship is between the list value and class
  • 7. Controlled Vocabulary
      • Units of Measure
          • Cup
          • Tablespoon
          • Teaspoon
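A minimal sketch of how a flat controlled vocabulary like the Units of Measure list could be enforced, assuming exact, case-sensitive matching. The names `UNITS_OF_MEASURE` and `is_authorized` are illustrative and not from the deck.

```python
# Hypothetical controlled vocabulary built from the slide's Units of Measure list.
UNITS_OF_MEASURE = frozenset({"Cup", "Tablespoon", "Teaspoon"})


def is_authorized(term, vocabulary=UNITS_OF_MEASURE):
    """Return True only if the term is one of the predefined, authorized values."""
    return term in vocabulary


print(is_authorized("Cup"))    # an authorized term
print(is_authorized("Pinch"))  # not in the vocabulary
```

A plain list gives no help with variants such as "cup" or "c", which is the "little context" limitation that synonym rings address.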
  • 8. Synonym Ring
      • Extends a CV by adding synonyms as equivalent terms
      • Relationship is between list value and its synonyms
  • 9. Synonym Ring
      • Units of Measure
          • Cup = C = c
          • Tablespoon = Tbl = T
          • Teaspoon = tsp = t
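One way to picture a synonym ring is as a set of equivalent terms with no preferred member. The sketch below assumes case-sensitive lookup, since the slide distinguishes "T" (tablespoon) from "t" (teaspoon); `SYNONYM_RINGS` and `equivalents` are illustrative names.

```python
# Each ring is a set of mutually equivalent terms (from the slide's example).
SYNONYM_RINGS = [
    {"Cup", "C", "c"},
    {"Tablespoon", "Tbl", "T"},
    {"Teaspoon", "tsp", "t"},
]


def equivalents(term):
    """Return every term in the same ring as `term`, or an empty set if unknown."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return set()


# A search for "tsp" can be expanded to all equivalent forms.
print(sorted(equivalents("tsp")))
```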
  • 10. Authority File
      • Extends CVs and synonym rings further by assigning one term as the preferred term, which all other synonyms point to
      • Relationship assigns a property (Preferred Term) to one term and treats all others as synonyms
  • 11. Authority File
      • Units of Measure
          • (Preferred Term) Cup
              • Syn: C, c
          • (PT) Tablespoon
              • Syn: Tbl, T
          • (PT) Teaspoon
              • Syn: tsp, t
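An authority file can be sketched as a mapping from each preferred term to its synonyms, inverted into a lookup so that any variant resolves to the preferred term. This is an illustration under the slide's example data; `AUTHORITY_FILE`, `build_lookup` and `preferred_term` are assumed names.

```python
# Preferred term -> synonyms, as on the slide.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}


def build_lookup(authority_file):
    """Invert the authority file so every variant points at its preferred term."""
    lookup = {}
    for preferred, synonyms in authority_file.items():
        lookup[preferred] = preferred
        for synonym in synonyms:
            lookup[synonym] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def preferred_term(term):
    """Return the preferred term for a variant, or None if the term is unknown."""
    return LOOKUP.get(term)


print(preferred_term("tsp"))
```

The preferred term is what would be shown for browse and display, while the synonyms only serve matching.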
  • 12. Facets
      • Terms are broken down individually by unique properties, allowing a mix-and-match approach to search and retrieval
      • Relationship is between one facet node and multiple values
  • 13. Facets
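The mix-and-match idea behind facets can be sketched as filtering items on any combination of facet values. The recipe records and the "Course" facet below are invented for illustration; only "Main Ingredient" appears in the deck.

```python
# Hypothetical content items, each tagged with one value per facet.
RECIPES = [
    {"title": "Tomato soup", "facets": {"Main Ingredient": "Tomatoes", "Course": "Starter"}},
    {"title": "Tomato tart", "facets": {"Main Ingredient": "Tomatoes", "Course": "Main"}},
    {"title": "Pepper stew", "facets": {"Main Ingredient": "Peppers", "Course": "Main"}},
]


def filter_by_facets(items, selected):
    """Return titles of items that match every selected facet value."""
    return [
        item["title"]
        for item in items
        if all(item["facets"].get(facet) == value for facet, value in selected.items())
    ]


# Narrow by one facet, then by two.
print(filter_by_facets(RECIPES, {"Main Ingredient": "Tomatoes"}))
print(filter_by_facets(RECIPES, {"Main Ingredient": "Tomatoes", "Course": "Main"}))
```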
  • 14. Taxonomy
      • Classification of a controlled vocabulary in a hierarchical list
      • Relationship is in assigning a hierarchy to list values
  • 15. Taxonomy
      • Food
          • Main Ingredient
              • Vegetables (ahem…fruit)
                  • Tomatoes
                      • Beefsteak tomatoes
                      • Cherry tomatoes
                      • Sundried tomatoes
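A simple taxonomy can be sketched as a child-to-parent map, which is enough to recover the full path for any term. The nesting below is an assumed reading of the slide (Food > Main Ingredient > Vegetables > Tomatoes > varieties), and `PARENT`, `path_from_root` and `narrower_terms` are illustrative names.

```python
# Each term points to its single parent; the root ("Food") has no entry.
PARENT = {
    "Main Ingredient": "Food",
    "Vegetables": "Main Ingredient",
    "Tomatoes": "Vegetables",
    "Beefsteak tomatoes": "Tomatoes",
    "Cherry tomatoes": "Tomatoes",
    "Sundried tomatoes": "Tomatoes",
}


def path_from_root(term):
    """Walk up the parent links and return the path from the root down to `term`."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def narrower_terms(term):
    """Return the direct children of `term`, sorted alphabetically."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


print(" > ".join(path_from_root("Cherry tomatoes")))
print(narrower_terms("Tomatoes"))
```

Content tagged "Cherry tomatoes" can then be rolled up to any ancestor, which is what makes aggregation by "Tomatoes" or "Vegetables" possible.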
  • 16. Thesaurus
      • CVs in a hierarchical structure with predefined relationships between terms (Broader Term, Narrower Term, Preferred Term, etc.)
      • Relationship is in assigning standardized properties to list values
  • 17. Thesaurus
      • Food
          • (BT) Main Ingredient
              • (BT) Vegetables (ahem…fruit)
                  • (BT) Tomatoes
                      • (NT) Beefsteak tomato
                      • (NT) (PT) Cherry tomato
                          • (RT) Roma tomato
                      • (NT) Sundried tomato
                          • (RT) Tomato sauce
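The standardized thesaurus relationships can be sketched with three structures: broader term (BT), its inverse narrower term (NT), and a symmetric related term (RT). Which RT belongs to which NT is an assumed reading of the flattened slide, and the `Thesaurus` class is illustrative rather than any particular tool's API.

```python
from collections import defaultdict


class Thesaurus:
    """Stores BT/NT as inverse links and RT as a symmetric link."""

    def __init__(self):
        self.broader = {}                  # term -> its broader term (BT)
        self.narrower = defaultdict(set)   # term -> its narrower terms (NT)
        self.related = defaultdict(set)    # term -> its related terms (RT)

    def add_broader(self, term, broader_term):
        # Recording a BT automatically records the inverse NT.
        self.broader[term] = broader_term
        self.narrower[broader_term].add(term)

    def add_related(self, term_a, term_b):
        # RT holds in both directions.
        self.related[term_a].add(term_b)
        self.related[term_b].add(term_a)


thesaurus = Thesaurus()
thesaurus.add_broader("Tomatoes", "Vegetables")
for variety in ("Beefsteak tomato", "Cherry tomato", "Sundried tomato"):
    thesaurus.add_broader(variety, "Tomatoes")
thesaurus.add_related("Cherry tomato", "Roma tomato")
thesaurus.add_related("Sundried tomato", "Tomato sauce")

print(sorted(thesaurus.narrower["Tomatoes"]))
print(sorted(thesaurus.related["Roma tomato"]))
```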
  • 18. Ontology
      • CVs in a hierarchical structure with complex relationships defined
      • Relationship is in assigning predetermined standardized and freeform properties to list values
  • 19. Ontology
      • Beefsteak tomatoes (isMainIngredient) Tomato sauce
      • Will Smith (isLeadActor) Men in Black 3
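The two ontology statements on the slide have a subject, a named relationship, and an object, so they can be sketched as triples. This is a plain-Python illustration, not RDF tooling; `TRIPLES` and `query` are assumed names.

```python
# (subject, predicate, object) statements taken from the slide.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]


def query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching every part that is given; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Who is the lead actor of what?"
print(query(TRIPLES, predicate="isLeadActor"))
```

Because the predicate names are freeform, each new relationship type has to be understood by whatever system consumes it, which is where the "difficult for systems to make use of" concern comes from.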
  • 20. Semantic (semantic) Web
      • Big S
          • Initiative from W3C to create a web of machine-readable data by marking up content with consistently applied, standardized and freeform properties
          • RDF/OWL
          • Proprietary
      • Little s
          • Various standards that mark up content with agreed-upon and freeform properties
          • Microformats
          • Microdata
          • Proprietary
  • 21. Pros and Cons of CVs and taxonomy
      • Benefits
          • Greater precision in search and retrieval
          • Allows for faceted browsing
          • Facilitates aggregation of content
          • Clearly defines relationships between things
      • Limitations
          • Initial costs
          • Upkeep
          • Can spiral out of control
          • May be too complex for some organizations
  • 22. What is taxonomy used for in web world?
      • Search and retrieval
      • Faceted browsing
      • Aggregation of content
      • Internal organization of assets
  • 23. Developing a taxonomy
      • Strategy and planning
      • Choosing style and method
      • Determine classes and relationships
      • Gather terms and organize
      • Add terms and relationships
      • Review and approval
  • 24. Strategy and Planning
      • Identify business case
          • ROI
              • Money saved
              • Money earned
      • Scope
          • Use cases
              • Front-end
              • Back-end
      • Approval
          • Wireframes and functional specification
  • 25. Choose Style and Method
      • Method
          • Top down
          • Bottom up
      • Styles
          • CV
          • Synonym ring
          • Authority file
          • Facets
          • Taxonomy
          • Thesaurus
          • Ontology
  • 26. Determine Classes and Relationships
      • Classes
          • As few as necessary
      • Relationships between terms
          • As few as necessary
          • With a taxonomy, determine nature of hierarchy
              • Type of
          • With a thesaurus, use predefined, but you may not want to use all
          • With ontology, determine complex relationships
  • 27. Gather Terms and Organize
      • Research
          • Competitive analysis
          • Identify existing outside CVs that might be utilized (SIC codes)
      • Meet with stakeholders
          • Get as much input as possible
          • Stick to business case (spiraling problem)
      • You are the final decision maker
          • Must conform to the structure decided upon, otherwise mass chaos
          • Always keep use cases in mind
  • 28. Add Terms and Relationships
      • Things to keep in mind:
          • Synonyms, misspellings, special characters
          • Homonyms
              • Different database identifiers or different names
              • Shower (baby and bathroom)
          • Duplicates
              • Technical considerations if different children
              • Breads as a main ingredient or as a dish
              • Bruschetta (dish, but not main ingredient)
          • Descriptions
              • Identifying duplicates or notes regarding the application to content
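The homonym and duplicate checks above can be sketched as a scan for labels that appear more than once, with each term carrying its own identifier so the two "Shower" entries stay distinct. The identifiers and the parent names "Parties" and "Bathroom" are hypothetical; `find_shared_labels` is an illustrative name.

```python
from collections import defaultdict

# (identifier, label, parent) rows; identifiers and some parents are hypothetical.
TERMS = [
    ("t1", "Shower", "Parties"),
    ("t2", "Shower", "Bathroom"),
    ("t3", "Breads", "Main Ingredient"),
    ("t4", "Breads", "Dish"),
    ("t5", "Bruschetta", "Dish"),
]


def find_shared_labels(terms):
    """Group terms by case-insensitive label and keep labels used more than once."""
    by_label = defaultdict(list)
    for term_id, label, parent in terms:
        by_label[label.casefold()].append((term_id, parent))
    return {label: entries for label, entries in by_label.items() if len(entries) > 1}


for label, entries in find_shared_labels(TERMS).items():
    print(label, entries)
```

Each flagged label then needs a human decision: a true homonym keeps separate identifiers (and ideally a description), while an accidental duplicate is merged.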
  • 29. Review and Approval
      • Thorough review by all stakeholders
          • This can take several sessions if the taxonomy is big
      • Final approval and sign-off
          • Critical for buy-in
  • 30. Taxonomy and Tagging Tools
      • Relational databases
          • Filemaker Pro
          • Microsoft Access
          • MySQL
      • Content management software
          • Drupal
          • Sharepoint
          • Proprietary applications
      • Thesaurus and taxonomy tools
          • Open source
              • Protégé
          • Commercial
              • SchemaLogic (Thesaurus)
              • TopBraid Composer (Ontologies), Pro
      • Auto categorization and text mining
          • Data Harmony MAIstro
          • Nstein
  • 31. Tagging the Content
      • Manual
          • Good for small, controlled sets of documents
          • Highly accurate
          • Time consuming
      • Automated
          • Good for large, unwieldy sets of documents
          • Fast and getting more accurate daily
          • Expensive, 3rd party apps
      • Hybrid
          • Manual: content or document creators insert valuable metadata
          • Automated: other data extracted and matched to taxonomy
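The "extracted and matched to taxonomy" step can be sketched as a naive string match of a document's text against an authority file, returning preferred terms as tags. This is a toy illustration, not how commercial auto-categorization tools work; the term and its synonyms are borrowed from the deck's mergers and acquisitions example, and `auto_tag` is an assumed name.

```python
import re

# Preferred term -> synonyms, borrowed from the deck's M&A example.
AUTHORITY_FILE = {
    "Mergers and acquisitions": ["M and A", "M&A"],
}


def auto_tag(text, authority_file=AUTHORITY_FILE):
    """Return sorted preferred terms whose name or synonyms occur in the text."""
    tags = set()
    for preferred, synonyms in authority_file.items():
        for variant in [preferred, *synonyms]:
            # Match the variant only when it is not embedded in a longer word.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.add(preferred)
                break
    return sorted(tags)


print(auto_tag("The M&A team closed the deal."))
```

In a hybrid workflow, tags produced this way would sit alongside metadata that document creators enter by hand.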
  • 32. Real World Application of Taxonomy for Records Management
      • Classifying
      • Storing and retrieving
      • Securing
      • Archiving or destroying
  • 33. Real World Applications
      • CV
          • List of Departments (HR, IT, Marketing)
      • Synonym rings
          • Mergers and acquisitions = M and A = M&A
      • Authority File
          • (PT) Mergers and acquisitions
              • Syn: M and A, M&A
      • Facets
          • Authors, Departments, Security Level
      • Taxonomy/Thesaurus
          • Organizational chart
              • Investment Bank Director
              • SVP Investments
              • EVP Investments
              • Investment Analyst
      • Ontology
          • Relationships between affiliations and departments/industries
          • ARMA (isProfessionalAssn) for Records Managers
  • 34. What could it be used for in your world? http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/
  • 35. Industry standards
      • Taxonomy specific
          • Dublin Core (DC)
          • Thesaurus construction
              • ANSI/NISO Z39.19
              • ISO 2788; 5964
          • Ontology development
              • W3C
                  • Resource Description Framework (RDF)
                  • Web Ontology Language (OWL)
      • Records Management specific
          • Metadata management
              • ISO/S 23081-1
              • ISO 23081-2
  • 36. Questions?
  • 37. My contact info
      • Barbara McGlamery, Taxonomist, Martha Stewart Living Omnimedia
      • (212) 827-8817
      • bmcglamery@marthastewart.com