Taxonomy Fundamentals Workshop


Opening presentation for Track 1 of the 2012 Taxonomy Boot Camp, October 16, 2012.

Presented by Marjorie M.K. Hlava of Access Innovations and Heather Hedden of Hedden Information Management.

  • e.g. Account Information Usage could go into 2, but we keep in just Disclosures & Notifications
  • Thanks to Helen Atkins of AACR for this illustration. The real power of this is that the links can go in all directions, so we take advantage of having the user’s attention regardless of how they step into our “web”.
  • Taxonomy Fundamentals Workshop

    1. 1. Taxonomy Fundamentals Workshop
        Taxonomy Boot Camp, October 16, 2012, Washington, DC
        Marjorie Hlava, President, Access Innovations, Inc.
        Heather Hedden, Hedden Information Management
    2. 2. Introductions
        Marjorie Hlava: President, Access Innovations, Inc.
        Heather Hedden: Taxonomy Consultant, Hedden Information Management; Author, The Accidental Taxonomist
    3. 3. Outline
        • The basics – 30 minutes
        • More details: Polyhierarchies and Facets – 30 minutes (including exercises)
        • “Taxonomatch” – 15 minutes
        • Implementation and applications – 15 minutes
        • Q&A
    4. 4. The Basics – 30 minutes
        • What is a taxonomy?
        • What are the parts of a taxonomy?
        • How do you build one?
        • Guidelines for the terms
        • Subject Matter Experts (SMEs)
        • 40 slides
        © 2012. Access Innovations, Inc. All Rights Reserved.
    5. 5. What is a Taxonomy?
        ANSI/NISO Z39.19-2005: “A collection of controlled vocabulary terms organized into a hierarchical structure.”
        Slide callouts: “controlled”, “Yes!”
        Missing: equivalence, associative relationships, and notes
    6. 6. The Semantic Road Map: Knowledge Organization Systems
        Most complex end: • Complex • Semantic network • Linked entities • High value • Contextual specificity
          – Ontology
          – Thesaurus
          – Taxonomy
          – Controlled vocabulary
          – Synonym set/ring
          – Name authority file
          – Uncontrolled list
        Simplest end: • Unrelated entities • Simple • Low value • Ambiguity
        Highest Cost over Time!
    7. 7. Basic features - The term record
        • Main Term (MT) = subject term, heading, node, category, descriptor, class
        • Top Term (TT)
        • Broader Terms (BT)
        • Narrower Terms (NT)
        • Related Terms (RT) – See also (SA)
        • Non-Preferred Term (NP) – Used for (UF), See (S) – Synonyms
        • Scope Note (SN)
        • History (H)
        (Side labels on the slide: TAXONOMY, ONTOLOGY, THESAURUS)
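The term record fields above map naturally onto a small data structure. The following is a minimal sketch in Python, not anything shown in the workshop: the `TermRecord` class and its field names are assumptions made for illustration, and the example values reuse the Spiders / Arachnids pair from the equivalence slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TermRecord:
    """One term record; class and field names are illustrative assumptions."""
    main_term: str                                           # MT
    top_term: Optional[str] = None                           # TT
    broader: List[str] = field(default_factory=list)         # BT
    narrower: List[str] = field(default_factory=list)        # NT
    related: List[str] = field(default_factory=list)         # RT
    non_preferred: List[str] = field(default_factory=list)   # UF
    scope_note: str = ""                                     # SN
    history: str = ""                                        # H

    def display(self) -> str:
        """Render the record in conventional thesaurus notation."""
        lines = [self.main_term]
        if self.top_term:
            lines.append(f"  TT {self.top_term}")
        for label, values in (("BT", self.broader), ("NT", self.narrower),
                              ("RT", self.related), ("UF", self.non_preferred)):
            lines.extend(f"  {label} {value}" for value in values)
        if self.scope_note:
            lines.append(f"  SN {self.scope_note}")
        return "\n".join(lines)


spiders = TermRecord(main_term="Spiders", non_preferred=["Arachnids"])
print(spiders.display())
```

A plain taxonomy would typically populate only the hierarchical fields (TT, BT, NT); a thesaurus would also fill in RT, UF, and SN.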
    8. 8. Taxonomy? Thesaurus?
        • Often used interchangeably
        • Thesaurus is a taxonomy with extras
          – Related Terms
          – Non-preferred Terms (USE/Used for)
          – Scope Notes
          – More
        • Taxonomies often have the actual information object at the final node.
        • CMS and SharePoint tend to support only the hierarchical view, a definition, and USE references
    9. 9. Taxonomy
        [Screenshots: Thesaurus view; Term Record view. Copyright © 2005 Access Innovations, Inc.]
    10. 10. How do you build a taxonomy?
        • Define subject field
        • Collect terms
        • Organize terms
        • Fill in gaps
        • Flesh out and interrelate terms
        • Apply to your data
        You’re done!
    11. 11. Define subject field
        • Review representative collection of content
        • Determine:
          – Core areas
          – Peripheral topics
        • Scope can be modified later
        (Example fields shown on the slide: Sociology, Psychology, Education, Law)
    12. 12. Build, buy, augment?
        • Survey existing thesaurus/taxonomy resources for your domain
        • Test for
          – Scope
          – Depth
          – Make-or-break terms
          – Cost
        • Adoption of existing taxonomies
          – Term registries
          – Taxobank
          – Taxonomy Warehouse
          – Other resources
        Don’t reinvent the wheel!
    13. 13. Foundations
        • Start with what is known
        • Build from there
        • Use the literature, your data
        • Use internal lists
        • Built-in continuous review throughout the process, and beyond
        • Who is involved?
          – Taxonomists
          – Subject matter experts
          – Project management
          – Users
    14. 14. Collect terms
        • Your documents and databases
        • Departmental terminology
        • Textbooks and their indexes
        • Book tables of contents and indexes
        • Journal quarterly indexes
        • Encyclopedias
        • Lexicons, glossaries on the topic
        • Web resources
        • Users and experts
        • Search logs
    15. 15. Gather terms from search logs
        • Top 100 search terms from search logs
        • Terms used more than 50 times
        • Match to website with appropriate answer
        • Basis for favorites or best bets, presented at the top of results list
        • Behavior-based taxonomy
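The first two bullets above (top 100 queries, used more than 50 times) amount to a frequency count over the search log. A minimal sketch, assuming the log has already been reduced to a list of query strings; the function name `candidate_terms`, combining both thresholds in one filter, and the sample log are all illustrative assumptions:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def candidate_terms(queries: Iterable[str], top_n: int = 100,
                    min_uses: int = 51) -> List[Tuple[str, int]]:
    """Return up to top_n queries that were used more than 50 times."""
    # Normalize case and surrounding whitespace so variants count together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    return [(term, uses) for term, uses in counts.most_common(top_n)
            if uses >= min_uses]


# Invented sample log for illustration only.
log = ["taxonomy"] * 60 + ["Thesaurus "] * 51 + ["ontology"] * 50
print(candidate_terms(log))
```

Each surviving query is then a candidate term (or non-preferred entry term) to match against the page that best answers it.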
    16. 16. How do you choose terms?
        • Importance in the subject area
        • Use in the literature, by the organization or community
        • Necessary degree of specificity or detail
        • Relationship with other controlled vocabularies
        • Single concept = single term
    17. 17. One term / one concept
        • Terms represent a simple or unitary concept
        • A unit of thought
        • May be a single-word term
        • May be a multiword term if required to represent the concept
        • Three main categories
          – Concrete entities
          – Abstract concepts
          – Proper nouns
        “A unit of thought, formed by mentally combining some or all of the characteristics of a concrete or abstract, real or imaginary object. Concepts exist in the mind as abstract entities independent of terms used to express them.”
    18. 18. Concrete entities as terms
        • Things and their physical parts
          – Birds
            • Feathers
          – Buildings
            • Floors
        • Materials
          – Cement
          – Wood
          – Lead
          – Cards and Chips
    19. 19. Abstract concepts as terms
        • Actions and events – evolution, skating, management, ceremonies
        • Abstract entities – law, theory
        • Properties of things, materials, and actions – strength, efficiency
        • Disciplines and sciences – physics, meteorology, mathematics
        • Units of measurement – pounds, kilograms, miles, meters, nanoseconds
    20. 20. Proper nouns as terms
        • Individual entities – “classes of one” – expressed as proper nouns
          – San Francisco, Lake Michigan
        Thesaurus standards exclude proper names, persons, and trade names → authority files.
        Taxonomies include them as final nodes.
    21. 21. Organize terms – roughly
        • Sort terms into several major categories
          – Logical groups of similar concepts as Top Terms
          – Identify core areas and peripheral topics
          – 10 – 20 to start
          – Consider moving proper names to authority files
        • Result: loose collection of terms under several main headings
          – Rough and tentative – see how it fits as you go
          – Initial gap analysis
          – Add / modify / delete as needed
    22. 22. How do terms relate?
        • Hierarchical relationships (TAXONOMY)
          – Parents and their children
        • Equivalence relationships (THESAURUS)
          – Aliases
        • Associative relationships (THESAURUS)
          – Cousins
          – See also’s
    23. 23. Hierarchical relationships
        • Broader Term represents the class, whole, or genus
        • Narrower Term is a member, part, or species
          – Generic relationship
          – Whole-part relationship
          – Instance relationship
        • NTs inherit all the BT characteristics
        • BTs/NTs have a reciprocal relationship
    24. 24. Broader to narrower terms
        Politics
          Elections
            Presidential elections
            Gubernatorial elections
            Mayoral elections
    25. 25. Hierarchy – Whole-part relationship
        • Four general types
          – Body systems and organs
            • Ear → Middle ear
          – Geographical locations
            • Bernalillo County → Albuquerque
          – Fields of study
            • Geology → Physical geology
          – Hierarchical social structures
            • Ontario → Manitoulin District
    26. 26. Hierarchy – Instance relationship
        • General category (common noun) as BT, with individual example (proper noun) as NTI (Narrower Term Instance)
          – Seas: Baltic Sea, Caspian Sea, Mediterranean Sea
          – French cathedrals: Chartres Cathedral, Rheims Cathedral, Rouen Cathedral
        Essentially identical to “final node” in taxonomies
    27. 27. Polyhierarchical relationship
        • Term can logically fit under more than one Broader Term – can have Multiple Broader Terms (MBT)
        • Part of ISO standards, new to ANSI/NISO
        Examples:
          – Nurse administrators under both Nurses and Health administrators
          – Accounting under both Finance and Careers
        Copyright © 2009 Access Innovations, Inc.
    28. 28. Generic relationship test – 1
        • Both terms in same fundamental category
        • “All-and-some” test
          – Rodents / Squirrels: SOME rodents are squirrels; ALL squirrels are rodents
          – Pests / Squirrels: SOME pests are squirrels; NOT ALL squirrels are pests
        Inheritance or inclusion – what’s true of the parent (BT) is true for all children (NTs)
    29. 29. Generic relationship test – 2
        (Diagram: Rodents, Squirrels, Pests)
        ✓ ALL squirrels are rodents
        ✗ NOT ALL squirrels are pests
        ✗ NOT ALL pests are rodents
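The “all-and-some” test can be read as a proper-subset check: ALL members of the narrower class belong to the broader class, while only SOME members of the broader class belong to the narrower one. A minimal sketch using Python sets; the function name and the animal lists are invented for illustration and are not from the slides:

```python
from typing import Iterable


def passes_all_and_some(narrower: Iterable[str], broader: Iterable[str]) -> bool:
    """True when narrower is a non-empty proper subset of broader."""
    narrower_set, broader_set = set(narrower), set(broader)
    return bool(narrower_set) and narrower_set < broader_set


# Invented class memberships, for illustration only.
rodents = {"gray squirrel", "red squirrel", "house mouse", "brown rat"}
squirrels = {"gray squirrel", "red squirrel"}
pests = {"house mouse", "brown rat", "cockroach", "gray squirrel"}

print(passes_all_and_some(squirrels, rodents))  # Squirrels BT Rodents?
print(passes_all_and_some(squirrels, pests))    # Squirrels BT Pests?
print(passes_all_and_some(pests, rodents))      # Pests BT Rodents?
```

In practice the test is applied by judgment about the concepts rather than by enumerating members, so the sets here only make the logic concrete.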
    30. 30. Equivalence relationship
        • Preferred Term
          – Thesaurus term and valid for indexing
          – Thesaurus notation: USE
        • Non-Preferred Term
          – Not valid for indexing
          – An alias or imposter
          – Entry point, directs user to Preferred Term
          – Thesaurus notation: UF or NPT
        Examples:
          – Spiders UF Arachnids
          – Plant pathology USE Phytopathology
    31. 31. Equivalence – when to use
        • Synonyms, slang, quasi-synonyms
        • Scientific and trade names
          – Ibuprofen UF Motrin™
        • Lexical variants
          – Fiber optics UF Fibre optics
          – Mouse UF Mice
        • Upward posting of narrow concepts not specified in taxonomy or thesaurus
          – Social class UF Elite, Middle class, Working class
        Get equivalent terms from search logs, brainstorming…
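Equivalence relationships are, in effect, a lookup from any entry term to its preferred term. A minimal sketch built from the UF/USE pairs on the two slides above; the dictionary layout, the function names, and the case-insensitive matching are assumptions for illustration:

```python
from typing import Dict, List, Optional

# Preferred term -> non-preferred (UF) entry terms, taken from the slides.
USED_FOR: Dict[str, List[str]] = {
    "Spiders": ["Arachnids"],
    "Phytopathology": ["Plant pathology"],
    "Ibuprofen": ["Motrin"],
    "Fiber optics": ["Fibre optics"],
    "Mouse": ["Mice"],
    "Social class": ["Elite", "Middle class", "Working class"],
}


def build_use_index(used_for: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every entry term (preferred or not) to its preferred term."""
    index: Dict[str, str] = {}
    for preferred, variants in used_for.items():
        index[preferred.casefold()] = preferred
        for variant in variants:
            index[variant.casefold()] = preferred
    return index


def resolve(entry: str, index: Dict[str, str]) -> Optional[str]:
    """Return the preferred term for an entry term, or None if unknown."""
    return index.get(entry.strip().casefold())


use_index = build_use_index(USED_FOR)
print(resolve("mice", use_index))
print(resolve("Working class", use_index))
```

The same index can sit behind a search box so that a query on a non-preferred term retrieves content tagged with the preferred one.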
    32. 32. Associative relationship
        • Related Terms (RTs) – cousins
        • “…terms related conceptually but not hierarchically, and are not part of an equivalence set” (i.e. not synonyms)
        • Both valid for indexing
        • Reciprocal relationship with each other
        • Expands user’s awareness, reflects thesaurus coverage of unanticipated areas
        • Main basis for the ontology
        • 14 main options offered in Z39.19
    33. 33. Scope Notes (SN)
        • Indicate meaning of the term in the context of this thesaurus, for this audience
          – Stress – Mental, Psychological, Physiological
        • Could be the definition or glossary
        • Indicate any restriction in meaning
        • Indicate range of topics covered
        • Provide direction for indexers; for terms often confused, may suggest an alternative term
        • Use as needed – may not be for every term
        • Use a style guide
        • Be concise
    34. 34. Stating the terms
        • Term format
        • Grammatical issues
        • Singular and plural forms
        • Spelling
        • Abbreviations and acronyms
        • Capitalization
        • Other punctuation
        • Consistency
    35. 35. Term format
        • KISS – Keep it short and simple
          – 1-2-3 words
          – Effect on search
          – Pre- and Post-Coordination
        • Establish a policy – follow Chicago Manual of Style
        • Grammatical issues
          – Nouns and noun phrases
          – Verbs → Gerunds
          – Adjectives – no
          – Adverbs – no
          – Initial articles – no
    36. 36. Compound terms – nope!
        • “Terms in a thesaurus should represent simple or unitary concepts…” (ISO standard)
        • “Compound terms should be factored (split) into simple elements…” (ANSI/NISO standard)
        • Term phrases are okay (bigrams)
          – Adjective-Noun
          – American history
        • Two concepts combined are not
          – Aromatherapy for bloating
    37. 37. Pre- and post-coordinated terms
        • Pre-coordinated – two concepts
          – Subject headings – Library of Congress
            • American history – Civil War
          – Back of the book
          – Put together in advance by the publisher
        • Post-coordinated
          – Taxonomy terms
          – Single concept
          – Put together by the user / searcher
    38. 38. So far you’ve got
        • Hierarchy
          – Broader and Narrower Terms
          – Polyhierarchies when needed
        • Preferred/Non-Preferred Terms
          – Equivalence relationships
        • Related Terms
          – Associative relationships
        • Scope Notes
        • Complete term records
          – Correct term format
    39. 39. Review, edit, test, edit, use, edit, and maintain, i.e. edit
        • Review
          – Users
          – Expert reviewers
        • Test
          – Index 500+ documents (more for variable writing style; fewer for strict style)
          – Monitor search log
        • Edit and maintain
          – Add term
          – Change existing term
          – Change term status
          – Delete term
          – Add term relationship
          – Delete term relationship
          – Add/modify Scope Note
          – Change overall structure
        Consider automated / assisted indexing software
    40. 40. Subject Matter Experts
        • Work first from the literature
        • Establish literary warrant for terms
        • Have someone else do the clerical work
        • Differentiate the lexicography work from the subject matter expert work
        • Let SMEs do the review and tailoring
        • Expert review ensures the proper term use and application
        • Advisory Board…advisable!
    41. 41. More Details
        • Polyhierarchies
        • Facets
        © 2012 Hedden Information Management
    42. 42. Polyhierarchies
        [Diagram: two trees of a Term with Child Terms 1–2 and Grandchildren, one labeled Hierarchy and one labeled Polyhierarchy]
    43. 43. Polyhierarchies
        • A term has a polyhierarchy if it has more than one broader term.
        • Polyhierarchy is permitted if the hierarchical relationship is valid in both/all cases
        • Remember “All-and-Some” test for each generic hierarchical relationship
    44. 44. Polyhierarchies
        Based on generic relationship
          – Professions > Musicians and Educators; Music teachers sits under both Musicians and Educators
          – Motor vehicles > Cars and Trucks; Light trucks sits under both Cars and Trucks
    45. 45. Polyhierarchies
        Based on different kinds of hierarchical relationships / different means of categorizing (less common)
          – Bodies of Water > Lakes > Great Salt Lake
          – United States > Utah > Great Salt Lake
    46. 46. Polyhierarchy - Pluses
        Polyhierarchy is useful when…
        • It is obviously logical for select terms (cross-overs/hybrids, e.g. Music teachers or Light Trucks)
        • It is indicated by different stakeholder views
        • Indexers/taggers browse the taxonomy hierarchically
        • End-user testing/input (e.g. card-sorting) indicates users are split as to where in the hierarchy an item belongs
    47. 47. Polyhierarchy - Pluses
        Retail website case study example:
          – Health & Fitness › Portable Fitness Electronics › Fitness GPS Watches
          – Car, Marine & GPS › GPS Navigation › Handheld GPS › Fitness GPS Watches
        Sports taxonomy case study example:
          – Back Exercises › Dead Lifts
          – Hamstring Exercises › Dead Lifts
    48. 48. Polyhierarchy - Minuses
        Polyhierarchy is not so good when…
        • It violates hierarchical relationship standards
        • It becomes excessive, perhaps more common than mono-hierarchies
        • It is the result of different kinds of categorization, and the presence of different kinds of categorization is confusing
        • It is a small taxonomy and the user doesn’t need or expect polyhierarchy
    49. 49. Polyhierarchy - Minuses
        Problems with excessive polyhierarchies:
        • Familiar tree structure is lost.
        • Users cannot see the logical hierarchy.
        • Users spend too much time clicking through categories.
    50. 50. Polyhierarchy - Minuses
        Logical polyhierarchies, if done consistently, could become extensive.
        Example: creating polyhierarchies for products based on different classifications
          – Wine Glasses under both Glass Products and Tableware
          – Soccer Balls under both Balls and Soccer Equipment
    51. 51. Polyhierarchy - Minuses
        Multiple, potentially confusing categorizations:
        • Place names in hierarchies for both geographic location and for place type
        • Products in hierarchies for both material and for use
        • Physical exercises in hierarchies for both body part and purpose/type (strength, endurance, etc.)
        “It’s OK, we can have polyhierarchies”: this is not always the best solution. Maybe facets should be used instead.
    52. 52. Polyhierarchies - Cases
        Violating hierarchical relationship standards
        • Might be OK in some cases in some taxonomies
        • But avoid overuse in polyhierarchies
        Case study example:
        • Accessories as a narrower term to a product category
        • Services as a narrower term to a product category
        Computers & Tablets
          – Laptop & Netbook Computers
          – Tablets, iPads & E-Readers
          – Desktop & All-in-One Computers
          – Monitors
          – Mice & Keyboards
          – Printers
          – Hard Drives & Storage
          – Computer Memory
          – Video Cards & PC Components
          – Networking & Wireless
          – Software
          – Computer Accessories
          – Computer Setup & Services
    53. 53. Polyhierarchies - Cases
        Violating hierarchical relationship standards within limits
        Computers & Tablets
          – Laptop & Netbook Computers
            • PC Laptops
            • MacBooks
            • Chromebooks
            • Netbooks
              – All Netbooks
              – Netbook Cases
              – Computer Setup & Services (Not OK)
            • Laptop Accessories
            • Computer Setup & Services (OK)
          – Desktop & All-in-One Computers
            • All-in-One Computers
            • Towers Only
            • Desktop Packages
            • Computer Setup & Services (OK)
    54. 54. Polyhierarchies - Cases
        Do not create a polyhierarchy to both a “parent” and a “grandparent.”
        Cameras (grandparent of Digital SLR Cameras)
          Digital Cameras (parent of Digital SLR Cameras)
            Digital SLR Cameras
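Both the definition of a polyhierarchy (more than one broader term) and the parent-and-grandparent rule can be checked mechanically once the broader-term links are stored. A minimal sketch, assuming a plain dictionary from each term to its broader terms; the function names are illustrative, and the Digital SLR Cameras entry deliberately encodes the “do not” case from the slide above:

```python
from typing import Dict, List, Set

# term -> its broader terms (BTs). Example terms come from the slides.
BROADER: Dict[str, List[str]] = {
    "Music teachers": ["Musicians", "Educators"],
    "Musicians": ["Professions"],
    "Educators": ["Professions"],
    "Digital SLR Cameras": ["Digital Cameras", "Cameras"],  # the "do not" case
    "Digital Cameras": ["Cameras"],
}


def ancestors(term: str, broader: Dict[str, List[str]]) -> Set[str]:
    """All terms reachable by following BT links upward from term."""
    seen: Set[str] = set()
    stack = list(broader.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(broader.get(current, []))
    return seen


def polyhierarchical_terms(broader: Dict[str, List[str]]) -> List[str]:
    """Terms that have more than one broader term."""
    return sorted(term for term, bts in broader.items() if len(bts) > 1)


def redundant_broader_terms(term: str, broader: Dict[str, List[str]]) -> List[str]:
    """BTs of term that are already an ancestor of another BT of term."""
    bts = broader.get(term, [])
    return sorted(bt for bt in bts
                  if any(bt in ancestors(other, broader)
                         for other in bts if other != bt))


print(polyhierarchical_terms(BROADER))
print(redundant_broader_terms("Digital SLR Cameras", BROADER))
print(redundant_broader_terms("Music teachers", BROADER))
```

A check like this flags only structural problems; whether each link passes the all-and-some test remains an editorial judgment.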
    55. 55. Polyhierarchies - Cases
        It might be better not to have polyhierarchies when the taxonomy is small and there are few top-level categories.
        Case study: Client management documents of a financial services company have 114 topical terms categorized with just five broader terms:
          – Account Information
          – Client Information
          – Client Status
          – Disclosures & Notifications
          – Approvals/Guidance
        Decided against polyhierarchies. Reason: Repeat users can memorize the small hierarchy. They don’t expect polyhierarchy here.
    56. 56. Polyhierarchies - Conclusions
        Some is good. More isn’t necessarily better.
        • Polyhierarchies are best for isolated terms that can fall into two categories.
        • Polyhierarchies can become too numerous when two different categorization methods are overlaid for many terms. (Facets may be better.)
        • Polyhierarchies are useful, no matter how extensive, in term-focused thesauri
        • Polyhierarchies should be more limited in fully displayed taxonomies
    57. 57. Polyhierarchies - Exercise
        Propose two broader terms for each:
          – Hotel managers
          – Printers
          – Fish
          – Egypt
          – Bill Gates
    58. 58. Facets
        • For serving faceted classification, which allows the assignment of multiple classifications to an object
        • A “dimension” of a query; a type of concept
        • Intended for searching with multiple terms in combination (post-coordination), one from each facet
        • Can be for topics or for named entities, but generally not both
        • Reflect the domain of content
        • A subset of metadata fields
    59. 59. Facets
        Faceted Classification
        Mathematician/librarian S.R. Ranganathan (1920s) developed “Colon Classification” as an alternative to the Dewey Decimal System for books:
        1. Personality – topic or orientation
        2. Matter – things or materials
        3. Energy – actions
        4. Space – places or locations
        5. Time – times or time periods
    60. 60. Facets
        Facets are suitable for:
        • Structured data with discernible metadata fields or database records
        • Homogeneous data with similar types of characteristics (e.g. products in an e-commerce site)
        Example types of facets:
        • For products → category, brand, size, color, price range, features
        • For people → name, job title, gender, birth year, location, department
        • For reports → author, subject, audience, document type, language
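Post-coordinated faceted search can be pictured as filtering records on one selected value per facet, with counts shown for the remaining choices. A minimal sketch over invented product records; the facet names follow the product example above, while the product names, brands, and function names are assumptions for illustration:

```python
from collections import Counter
from typing import Dict, List

Product = Dict[str, str]

# Invented records; each key other than "name" is treated as a facet.
PRODUCTS: List[Product] = [
    {"name": "Trail Watch", "category": "Fitness GPS Watches", "brand": "Brand A", "color": "black"},
    {"name": "Road Watch", "category": "Fitness GPS Watches", "brand": "Brand B", "color": "red"},
    {"name": "Dash Nav", "category": "GPS Navigation", "brand": "Brand A", "color": "black"},
]


def refine(items: List[Product], **selected: str) -> List[Product]:
    """Keep items that match every selected facet value (one per facet)."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(items: List[Product], facet: str) -> Counter:
    """Count how many items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


matches = refine(PRODUCTS, brand="Brand A", color="black")
print([product["name"] for product in matches])
print(facet_counts(matches, "category"))
```

Because each selection is independent, users can apply or remove facets in any order, which is the refinement control the facet slides describe.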
    61. 61. Facets
        For enterprise taxonomies: Patrick Lambe, Organising Knowledge
          – People and organizations
          – Things and parts of things
          – Activity cycles
          – Locations
        For Web sites: Rosenfeld and Morville, Information Architecture
          – Topic
          – Product
          – Document type
          – Audience
          – Geography
          – Price
    62. 62. Facet Examples
        1. - advanced search
        2. My Recipes
        3. Microbial Life Educational Resources
    63. 63. © 2012 Hedden Information Management
    64. 64. My Recipes© 2012 Hedden Information Management
    65. 65. © 2012 Hedden Information Management
    66. 66. Facets & Hierarchies
        Combining Facets and Hierarchies
        1. Have hierarchies within facets
        2. Start with hierarchical categories and then limit further with facets
    67. 67. Facets & Hierarchies
        1. Hierarchies within facets: indented display
        World Bank documents advanced search
    68. 68. Facets & Hierarchies
        2. Hierarchies of topics, then facets to narrow results:
          – ThomasNet business directory
          – Buzzillions product reviews
          – books browse
    69. 69. Taxonomy Structures: Hierarchies
        One level per web page
        Yahoo directory browse
    70. 70. © 2012 Hedden Information Management
    71. 71. Buzzillions© 2012 Hedden Information Management
    72. 72. Amazon > Books
    73. 73. Facets - Conclusions
        Advantages
        • Supports more complex search queries by users
        • Allows users to control the search refinement, narrowing or broadening in any manner or order
        Disadvantages
        • Only suitable for somewhat structured, unified content that shares the same multiple facets
        • Might not support multiple terms selected at once from the same facet
        • Often hidden from users under “Advanced Search”
        • Requires investment in thorough (multifaceted) indexing/tagging
    74. 74. Facets - Conclusions
        Facet Design Tips
        • Number of facets: 4-8, with 5-6 as ideal
        • Facets listed in logical, not alphabetical order
        • Number of terms per facet: 2-25
          – Ideally not much more than can be viewed in a scroll box
          – If the list is obvious (US states), then more is OK.
          – Exception can be made for hierarchical “Topics” facet
        • If <12 terms, then a logical display order
        • If >12 terms, then alphabetical
        • A two-level hierarchy (indented) within a facet is possible
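The design tips above are concrete enough to encode as simple checks. A minimal sketch: the function names are illustrative, and because the slide does not say what to do with exactly 12 terms, treating 12 as alphabetical is an assumption made here.

```python
from typing import Dict, List


def display_order(terms: List[str]) -> List[str]:
    """Fewer than 12 terms: keep the given (logical) order; otherwise sort."""
    if len(terms) < 12:
        return list(terms)
    # Assumption: exactly 12 terms is treated like ">12" (alphabetical).
    return sorted(terms, key=str.casefold)


def design_warnings(facets: Dict[str, List[str]]) -> List[str]:
    """Flag departures from the suggested 4-8 facets and 2-25 terms per facet."""
    warnings: List[str] = []
    if not 4 <= len(facets) <= 8:
        warnings.append(f"{len(facets)} facets; the tips suggest 4-8")
    for name, terms in facets.items():
        if not 2 <= len(terms) <= 25:
            warnings.append(f"facet '{name}' has {len(terms)} terms; the tips suggest 2-25")
    return warnings


print(display_order(["Small", "Medium", "Large"]))
print(design_warnings({"Size": ["Small", "Medium", "Large"], "Color": ["Red"]}))
```

The warnings are advisory only, since the tips themselves allow exceptions such as an obvious long list (US states) or a hierarchical “Topics” facet.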
    75. 75. Facets - Exercise
        Designate a set of 4-7 facets for a tour operator web site selling vacation packages.
    76. 76.
        • Designed to enhance understanding and retention of the vocabulary concepts necessary for creating a taxonomy, ontology, thesaurus, or controlled vocabulary.
        • Game supplies:
          – 1 Deck of Orange Question and Challenge Cards
          – 1 Deck of Green Answer Cards
        • Game setup:
          – Shuffle the deck of Green Answer cards.
          – Deal the entire deck to the players.
          – Shuffle the deck of Orange Question and Challenge cards.
          – Place them facedown in a pile in the middle of the table so that all players can reach the pile.
        • Reinforce what you just heard!
        • Have fun!
    77. 77.
        1. Play moves to the left of the dealer.
        2. Draw a card from the top of the Orange cards. Read it aloud to all of the players.
        3. The player who read the card says out loud what they think the answer is.
        4. Each player looks at the Green Answer cards in their hand.
           1. If they have the correct answer to the Question or Challenge, they show their card to everyone at the table.
           2. If everyone agrees that the answer is correct, the player holding the correct answer card gives it to the player who read the Question or Challenge card.
        5. The player places their associated pair of cards – one Orange Question and Challenge card and one Green Answer card – face up on the table in front of them.
        6. Play passes to the person who held the correct Green Answer card in their hand. Play continues as in step 2 above.
        7. Discussion among the players to arrive at the correct answer is permissible and encouraged!
        8. If players do not arrive at a consensus regarding the correct answer, the Orange Question and Challenge card may be returned to the bottom of the pile, and play passes to the person to the left of the player who drew the previous card.
        9. When all of the Orange Question and Challenge cards have been drawn, read aloud, and matched with their Green Answer cards, the game ends.
        10. If there are any Orange Question and Challenge cards remaining to which players cannot agree on an answer, players may consult their notes or ask the session speaker.
    78. 78. Implementation and applications
        • Adding the terms to the information objects
        • Search and other applications
        • Taxonomy use cases – implementation
        • Opportunities and Obstacles
        • 30 minutes
    79. 79. Parts of the puzzle
        • The taxonomy
          – The words to use
          – In the order you want the users to browse
        • Applications
          – Search, CMS, SharePoint, etc.
        • Implementation / actions
          – Making the links
          – Adding terms to information objects
        • Most people confuse the parts and they act very differently
    80. 80. The Workflow
        Fully integrated with MOSS
        [Workflow diagram. Stages: Gather source data → Tag and create metadata → Put in database with tags → Build inverted index → Create search user interface. Components shown: Client Data (HTML, PDF, Data Feeds, etc.); Machine Aided Indexer (M.A.I.™), Automatic Summarization, Inline Tagging, Metadata and Entity Extractor (“Increases accuracy”); Client Taxonomy and Thesaurus Master; Database Repository; Search Software with Full Text Search; Presentation Layer with Browse by Subject, Auto-completion, Broader Terms, Narrower Terms, Related Terms]
    81. 81. Adding terms to information objects
        • Part of the record
          – XML
          – MARC
        • A relational table pointing the terms to a record ID number (Secondary key)
        • Adding data to the HTML
          – META NAME KEYWORD Element
        • Many other options
    82. 82. Part of the record - XML
        • Added as an element in the XML record
        • Need an element to put the data in
          – e.g. <TaxonomyTerm> (XML element names cannot contain spaces)
        • Capture the terms when creating the records
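Adding terms as elements in an XML record can be sketched with Python's standard `xml.etree.ElementTree` module. The record layout and the element name `TaxonomyTerm` are assumptions for illustration (a name with a space, as written on the original slide, would not be well-formed XML); the title and terms echo the AACR example later in the deck.

```python
import xml.etree.ElementTree as ET
from typing import List


def add_taxonomy_terms(record_xml: str, terms: List[str]) -> str:
    """Append one <TaxonomyTerm> child per term to the record's root element."""
    root = ET.fromstring(record_xml)
    for term in terms:
        # "TaxonomyTerm" is a hypothetical element name chosen for this sketch.
        ET.SubElement(root, "TaxonomyTerm").text = term
    return ET.tostring(root, encoding="unicode")


record = "<record><title>Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer</title></record>"
tagged = add_taxonomy_terms(record, ["Breast cancer", "Folate"])
print(tagged)
```

One element per term, rather than a single delimited string, keeps each term individually addressable for indexing and faceted search.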
    83. 83. Editorial Workflow Integration
        Author Submission Module
        • The author fills in the data to the document template, attaching images and graphs as necessary
        • An API calls Data Harmony and generates a list of indexing terms based on the content
    84. 84. Editorial Workflow Integration
        Author Submission Module
        • Authors review the indexing and may change it
        • Content is stored into a data repository as HTML, XML, etc.
    85. 85. In the HTML record
        • Makes it crawlable for the Internet
        • Used in CMS applications
          – Content Management Systems
        • Add to the HTML
          – Manually
          – In Dreamweaver
          – In your CMS like Extron
        • Author Submissions Example
        • Do the same with SharePoint
    86. 86. META NAME “KEYWORDS”
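Carrying taxonomy terms in the HTML head comes down to emitting one keywords `meta` element. A minimal sketch; the function name is illustrative, and the terms are joined with commas and HTML-escaped so that characters such as `&` or quotes cannot break the attribute:

```python
from html import escape
from typing import List


def keywords_meta(terms: List[str]) -> str:
    """Build a <meta name="keywords"> element from a list of taxonomy terms."""
    content = ", ".join(terms)
    return f'<meta name="keywords" content="{escape(content, quote=True)}">'


print(keywords_meta(["Fiber optics", "Disclosures & Notifications"]))
```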
    87. 87. In Relational Database Table
        • Primary key – the record
        • Secondary key – all the metadata
          – Like taxonomy terms
          – Like author
          – Like publication date
        • Used in Oracle, SQL, etc.
          – Need a field to put the taxonomy data in
        • Supports “Faceted Search”
          – Each item in a separate field or element or table
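The relational approach can be sketched with a records table plus a linking table that points each term at a record ID. This uses Python's built-in `sqlite3` with an in-memory database; the table names, column names, and sample rows are assumptions for illustration, not the schema from the slide's diagram.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (
        record_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- One row per (record, term) pair: the term points at the record ID.
    CREATE TABLE record_terms (
        record_id INTEGER NOT NULL REFERENCES records(record_id),
        term      TEXT NOT NULL,
        PRIMARY KEY (record_id, term)
    );
""")
# Invented sample rows for illustration.
conn.executemany("INSERT INTO records VALUES (?, ?)",
                 [(1, "Report A"), (2, "Report B")])
conn.executemany("INSERT INTO record_terms VALUES (?, ?)",
                 [(1, "Elections"), (1, "Politics"), (2, "Elections")])


def titles_for_term(connection: sqlite3.Connection, term: str) -> List[str]:
    """Return titles of all records tagged with the given taxonomy term."""
    rows = connection.execute(
        "SELECT r.title FROM records AS r "
        "JOIN record_terms AS rt ON rt.record_id = r.record_id "
        "WHERE rt.term = ? ORDER BY r.record_id",
        (term,),
    )
    return [title for (title,) in rows]


print(titles_for_term(conn, "Elections"))
```

Keeping terms in their own table, one row per assignment, is what lets each metadata value be queried separately for faceted search.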
    88. 88. Relational database diagram
    89. 89. Using taxonomies in applications
        • Improve search
        • Subject browsing
        • Mobile intelligence
        • Targeted resources based on subject or user role
        • Link to society resources
        • Author submission module
        • Author authority database
        • Expert reviewer identification
        • Member profiles
        • Data visualization
        • More like this
        • In “indexing” or categorizing, as subject metadata
        • In content management systems
        • In SharePoint
        • In mashups
        • In social networking sites
        • In author tagging
        • In filtering data – e.g., spam filters and RSS feeds
        • In web crawlers
        • Social media - community
    90. 90. Why does search fail?
        • Most large organizations have five search software packages
          – All disappointing and on the shelf
        • Inconsistent results
        • Unclear path to results
        • Lack of a single, unified, clear, consistent vocabulary
        • Not tied to data governance
          – Taxonomy
          – Other metadata
    91. 91. Parts of Search
        • Search software
          – Inverted Index
          – Search algorithms
        • Presentation layer
          – Search box
          – Autocompletion
          – Related and narrower terms
          – Hierarchical display
    92. 92. Creating an Inverted File Index
        Sample DOCUMENT
          Outline of Presentation
          1 Define key terminology
          2 Thesaurus tools
            – Features
            – Functions
          3 Costs
            – Thesaurus construction
            – Thesaurus tools
          4 Why & when?
    93. 93. Simple inverted file index
        The terms from the “outline”:
        &, 1, 2, 3, 4, construction, costs, define, features, functions, key, of, outline, presentation, terminology, thesaurus, tools, when, why
    94. 94. Complex inverted file index
        Term - Placement location
          & - Stop
          1 - Stop
          2 - Stop
          3 - Stop
          4 - Stop
          construction - L7, P2, SH
          costs - L6, P1, H
          define - L2, P1, H
          features - L4, P1, SH
          functions - L5, P1, SH
          key - L2, P2, H
          of - Stop
          outline - L1, P1, T
          presentation - L1, P3, T
          terminology - L2, P3, H
          thesaurus - (1) L3, P1, H; (2) L7, P1, SH; (3) L8, P1, SH
          tools - (1) L3, P2, H; (2) L8, P2, SH
          when - L9, P3, H
          why - L9, P1, H
    95. 95. Improve search
        • Auto-completion using the taxonomy
        • Guide the user
        • Navigate the full taxonomy “tree”
        • BROWSE
    96. 96. Subject browsing
    97. 97. Targeted resources based on subject or user role
    98. 98. Linked data
        [Diagram of linked items:]
          – CME Activity on Topic A
          – Upcoming Conference on Topic A
          – Other Journal Articles on Topic A
          – Job Posting for Expert on Topic A
          – Journal Article on Topic A
          – Grant Available for Researchers Working on Topic A
          – Podcast Interview with Researcher Working on Topic A
          – Author Networks
          – Social Networking
    99. 99. Link to society resources• Article: Cancer Epidemiology Biomarkers & Prevention, Vol. 12, 161-164, February 2003. © 2003 American Association for Cancer Research. Short Communications. "Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer in the American Cancer Society Cancer Prevention Study II Nutrition Cohort." Heather Spencer Feigelson, Carolyn R. Jonas, Andreas S. Robertson, Marjorie L. McCullough, Michael J. Thun and Eugenia E. Calle. Department of Epidemiology and Surveillance Research, American Cancer Society, National Home Office, Atlanta, Georgia 30329-4251. Recent studies suggest that the increased risk of breast cancer associated with alcohol consumption may be reduced by adequate folate intake. We examined this question among 66,561 postmenopausal women in the American Cancer Society Cancer Prevention Study II Nutrition Cohort.• Related Press Releases: How What and How Much We Eat (And Drink) Affects Our Risk of Cancer; Novel COX-2 Combination Treatment May Reduce Colon Cancer Risk: Combination Regimen of COX-2 Inhibitor and Fish Oil Causes Cell Death; COX-2 Levels Are Elevated in Smokers• Related AACR Workshops and Conferences: Frontiers in Cancer Prevention Research; Continuing Medical Education (CME); Molecular Targets and Cancer Therapeutics• Related Meeting Abstracts: Association between dietary folate intake, alcohol intake, and methylenetetrahydrofolate reductase C677T and A1298C polymorphisms and subsequent breast; Folate, folate cofactor, and alcohol intakes and risk for colorectal adenoma; Dietary folate intake and risk of prostate cancer in a large prospective cohort study• Related Education Book Content / Think Tank Report Content / Webcasts: Oral Contraceptives, Postmenopausal Hormones, and Breast Cancer; Physical Activity and Cancer; Hormonal Interventions: From Adjuvant Therapy to Breast Cancer Prevention• Related Working Groups: Finance; Charter; Molecular Epidemiology• Related Awards: AACR-GlaxoSmithKline Clinical Cancer Research Scholar Awards; ACS Award; Weinstein Distinguished Lecture
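The cross-linking on the AACR example slide can be approximated by indexing every resource under the taxonomy terms it is tagged with, then looking up other resources that share a term. The following is only a minimal sketch under stated assumptions: the record layout, the function names, and the tags assigned to each title are hypothetical illustrations, not the presenters' system or data.

```python
from collections import defaultdict

# Titles come from the slide; the "terms" tags are hypothetical, for illustration only.
RESOURCES = [
    {"type": "Journal Article",
     "title": "Alcohol, Folate, Methionine, and Risk of Incident Breast Cancer",
     "terms": {"Folate", "Breast cancer", "Alcohol"}},
    {"type": "Meeting Abstract",
     "title": "Dietary folate intake and risk of prostate cancer in a large prospective cohort study",
     "terms": {"Folate", "Prostate cancer"}},
    {"type": "Press Release",
     "title": "COX-2 Levels Are Elevated in Smokers",
     "terms": {"COX-2", "Smoking"}},
    {"type": "Education Book",
     "title": "Physical Activity and Cancer",
     "terms": {"Physical activity"}},
]


def build_term_index(resources):
    """Map each taxonomy term to the positions of the resources tagged with it."""
    index = defaultdict(list)
    for position, resource in enumerate(resources):
        for term in resource["terms"]:
            index[term].append(position)
    return index


def related_resources(resources, index, position):
    """Group the titles of resources sharing at least one term with the given one, by type."""
    seen = set()
    grouped = defaultdict(list)
    for term in sorted(resources[position]["terms"]):
        for other in index[term]:
            if other != position and other not in seen:
                seen.add(other)
                grouped[resources[other]["type"]].append(resources[other]["title"])
    return dict(grouped)


term_index = build_term_index(RESOURCES)
related_to_article = related_resources(RESOURCES, term_index, 0)
```

Because the index is keyed by term rather than by resource type, the same lookup works from any starting point: an abstract can lead to an article just as an article leads to an abstract.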
    100. 100. Authors at a place
    101. 101. Member profile tagging• User pastes or uploads CV• Button to auto-extract taxonomy attributes
    102. 102. Adding terms to SharePoint• User uploads a document to SharePoint space• Before uploading to SharePoint server, the EventHandler sends the document to Data Harmony• Data Harmony automatically attaches the indexing terms before uploading to MOSS• Diagram labels: Microsoft SharePoint Server 2010; TaxoTerm Server; Data Harmony (M.A.I.) returns subject metadata
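The flow on this slide (intercept the document before it is stored, send it for indexing, attach the returned terms) can be sketched generically. This is a hedged illustration only: `suggest_terms`, `on_before_upload`, and the tiny `TAXONOMY` table are hypothetical stand-ins and do not represent the SharePoint event handler API or the Data Harmony interfaces.

```python
import re

# Hypothetical mini-taxonomy: preferred term -> lowercase trigger phrases.
TAXONOMY = {
    "Breast cancer": ["breast cancer"],
    "Folate": ["folate", "folic acid"],
    "Alcohol consumption": ["alcohol consumption", "alcohol intake"],
}


def suggest_terms(text):
    """Stand-in for the indexing service: return preferred terms whose phrases occur in the text."""
    lowered = text.lower()
    hits = []
    for preferred, phrases in TAXONOMY.items():
        # Whole-phrase match so that, for example, "folate" does not fire inside a longer word.
        if any(re.search(r"\b" + re.escape(phrase) + r"\b", lowered) for phrase in phrases):
            hits.append(preferred)
    return sorted(hits)


def on_before_upload(document):
    """Hypothetical pre-upload hook: return a copy of the document with subject metadata attached."""
    tagged = dict(document)
    tagged["subject_terms"] = suggest_terms(document["body"])
    return tagged


incoming = {
    "name": "abstract.txt",
    "body": "Adequate folate intake may reduce the breast cancer risk linked to alcohol consumption.",
}
stored = on_before_upload(incoming)
```

Tagging before storage means every document reaches the repository with subject metadata already in place, so search and navigation never see an untagged item.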
    103. 103. SharePoint 2010 only shows 10 lines of the taxonomy• This add-on makes it all viewable
    104. 104. Taxonomies added in search example• Core Architectural Components [architecture diagram; recoverable labels follow]• Content sources: Web Content; Files, Documents; Databases; Email, Groupware; Custom Applications• Connectors: Web Crawler; File Traverser; Database Connector; Email Connector; Custom Connector; Content Push• Processing: Content API; Document Processor; Pipeline; Index DB; Search Server; Query Processor; Query API; Filter Server; Agent DB• Outputs: Query; Results; Alerts; Vertical Applications; Portals; Custom Front-Ends; Mobile Devices• Administration: Administrator’s Dashboard; FAST Management API• Data Harmony: Search harmony; MAIstro; Data Harmony Governance API• Annotation: Use taxonomy terms here
    105. 105. Autosuggestion of taxonomy terms• Populate Keywords, Descriptors, Indexing terms, etc.• Allow for manual review of auto-tagging for quality assurance.
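One simple way to autosuggest taxonomy terms is prefix matching against both preferred terms and their synonyms, always returning the preferred term so the indexer tags with the controlled form. The sketch below assumes that approach; the sample terms and the names `build_lookup` and `suggest` are hypothetical, and the suggestions are meant to be offered to a person for review, not applied automatically.

```python
# Hypothetical entries: preferred term -> non-preferred (synonym) labels.
TERMS = {
    "Neoplasms": ["Cancer", "Tumors"],
    "Nutrition": ["Diet"],
    "Nursing": [],
}


def build_lookup(terms):
    """Return sorted (lowercase label, preferred term) pairs covering preferred terms and synonyms."""
    lookup = []
    for preferred, synonyms in terms.items():
        lookup.append((preferred.lower(), preferred))
        for synonym in synonyms:
            lookup.append((synonym.lower(), preferred))
    return sorted(lookup)


def suggest(lookup, typed, limit=5):
    """Suggest preferred terms for labels starting with the typed text, without duplicates."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    suggestions = []
    for label, preferred in lookup:
        if label.startswith(prefix) and preferred not in suggestions:
            suggestions.append(preferred)
    return suggestions[:limit]


lookup = build_lookup(TERMS)
for_review = suggest(lookup, "can")  # typing a synonym leads to the preferred term "Neoplasms"
```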
    106. 106. More Innovations• Link topic to article to author to event• Make visual links within domain• Enable authors to submit and categorize conference submissions• Create author authority database linking to co-authors, topics, locations, etc.• Create expert reviewer database• Create member profiles with alternate names, publications, tagged by topic• Visualize data and domain distribution• Display interest connections in social network• Deliver accurate targeted information through mobile applications• Etc.
    107. 107. Taxonomy standards• Z39.19 (2005) Controlled Vocabularies• BS 8723 Parts 1 – 5• ISO 25964 Parts 1 – 2• TAG 37 and 46 standards• SKOS - Simple Knowledge Organization System• OWL - Web Ontology Language• AND more!
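As an illustration of what one of these standards looks like in practice, the sketch below writes a few concepts as SKOS in Turtle syntax, using the SKOS properties `skos:prefLabel`, `skos:altLabel`, `skos:broader`, and `skos:related`. The `to_skos_turtle` function, the input dictionary layout, and the `http://example.org/taxonomy/` namespace are assumptions made for this example; it also assumes concept ids are valid Turtle local names and that labels contain no double quotes.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def to_skos_turtle(concepts, base="http://example.org/taxonomy/"):
    """Serialize {id: {"pref", "alt", "broader", "related"}} as SKOS Turtle (English labels)."""
    lines = [
        "@prefix skos: <%s> ." % SKOS_NS,
        "@prefix ex: <%s> ." % base,
        "",
    ]
    for concept_id, data in concepts.items():
        statements = ["a skos:Concept", 'skos:prefLabel "%s"@en' % data["pref"]]
        for alt in data.get("alt", []):
            statements.append('skos:altLabel "%s"@en' % alt)  # non-preferred term
        for broader in data.get("broader", []):
            statements.append("skos:broader ex:%s" % broader)  # hierarchical relationship
        for related in data.get("related", []):
            statements.append("skos:related ex:%s" % related)  # associative relationship
        lines.append("ex:%s %s ." % (concept_id, " ;\n    ".join(statements)))
    return "\n".join(lines) + "\n"


SAMPLE = {
    "vitamins": {"pref": "Vitamins"},
    "folate": {"pref": "Folate", "alt": ["Folic acid"], "broader": ["vitamins"]},
}
turtle_text = to_skos_turtle(SAMPLE)
```

The three relationship types carried here (equivalence via alternative labels, hierarchy via broader, association via related) are the same ones covered by the controlled vocabulary standards listed on the slide.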
    108. 108. IT is often Fire, Ready, Aim!• Choose the hardware• Choose the software• Decide on the format• Convert the data• Fix the data• Tack on a taxonomy• Ignore the standards
    109. 109. Change to Ready, Aim, Fire!• Follow the data• Look at the data, format and content• Design taxonomy for data• Leverage the standards• Use taxonomy to tag data• Choose search and repository software for data• Load the data into the system• Keep your eye on the target
    110. 110. Summary• We covered the basics• We talked about the implementation• Application of the terms to your content• We reinforced the learning with activities• Now go hear the case studies of the next two days!
    111. 111. Questions?• Heather Hedden, Taxonomy Consultant, Hedden Information Management, 978-467-5195• Marjorie M.K. Hlava, President, Access Innovations, mhlava@accessinn.com, 505-998-0800