
Taxonomy Development and Digital Projects



Presentation from the ALA Midwinter 2009 (American Library Association) meeting, given as part of the Networked Resources and Metadata Interest Group (NRMIG). A discussion on taxonomy development led by Laura Dorricott, a Taxonomy Project Delivery Manager with Dow Jones Taxonomy Services, on Sunday, January 25, 2009.

Corresponding Blog post with notes from session by Laura available here:

Published in: Technology


  1. 1. Taxonomy Development and Digital Projects Laura Dorricott Project Delivery Manager, Taxonomy Services Dow Jones Client Solutions January 25, 2009 Networked Resources and Metadata Interest Group ALA Midwinter 2009
  2. 2. Introduction <ul><li>Laura Dorricott, Project Delivery Manager, Taxonomy Services, Dow Jones Client Solutions </li></ul><ul><li>IHS, Inc. – Indexer and Lexicographer </li></ul><ul><li>Synapse – 1995-2005 </li></ul><ul><li> Taxonomist and Operations Director </li></ul><ul><li>Dow Jones – 2005 – Project Delivery Manager </li></ul>
  3. 3. Information management needs – What do we do with this??? <ul><li>American </li></ul>Theo LeSieg; Theodor Seuss Geisel; Children’s writer; March 2, 1904; Springfield, MA; Articles about “Dr. Seuss”; Dr. Seuss
  4. 4. Taxonomy’s Evolutionary Path © 2007, Dow Jones Dictionaries & Flat Lists Hierarchical Taxonomies Controlled Vocabulary Thesauri Ontologies Structured Authority Files Taxonomies are the building blocks for ontologies and ontologies are semantic representations of the real world in all its rich diversity. Taxonomy is evolving organically…
  5. 5. Definitions of Controlled Vocabularies <ul><li>List: </li></ul><ul><li>“Sometimes called a pick list, a limited set of terms arranged as a simple alphabetical list or in some other logically evident way.” </li></ul><ul><li>Synonym ring: </li></ul><ul><li>“A group of terms that are considered equivalent for the purposes of retrieval.” </li></ul><ul><li>Taxonomy: </li></ul><ul><li>“A collection of controlled vocabulary terms organized into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy.” </li></ul><ul><li>Thesaurus: </li></ul><ul><li>“A controlled vocabulary arranged in a known order and structured so that the various relationships among terms are displayed clearly and identified by standardized relationship indicators. Relationship indicators should be employed reciprocally.” </li></ul>
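The synonym ring defined above can be illustrated with a minimal sketch. It assumes a ring is stored as a plain set of lowercase strings; the `SYNONYM_RINGS` data and the `expand_query` helper are hypothetical names chosen for illustration, not part of the presentation.

```python
# Hypothetical synonym rings: each set holds terms treated as
# equivalent for the purposes of retrieval.
SYNONYM_RINGS = [
    {"salinity", "saltiness"},
    {"dr. seuss", "theo lesieg"},
]


def expand_query(term: str) -> set[str]:
    """Return the term plus every term that shares a synonym ring with it."""
    key = term.strip().lower()
    expanded = {key}
    for ring in SYNONYM_RINGS:
        if key in ring:
            expanded |= ring
    return expanded
```

A search for any member of a ring would then retrieve documents indexed under every member; a term that belongs to no ring is returned unchanged.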
  6. 6. Next Generation <ul><li>Ontology: </li></ul><ul><li>“ A controlled vocabulary developed to bridge the gap between the real world and the information world, by striving to exactly model and control all the fundamentals of information concepts with the goal of building a new class of intelligent technologies and knowledge systems. ” </li></ul>
  7. 7. Purposes of Controlled Vocabularies <ul><li>Translation </li></ul><ul><li>Consistency </li></ul><ul><ul><li>Provide a framework of concepts that accurately represents the real world.* </li></ul></ul><ul><li>Indication of semantic relationships </li></ul><ul><li>Hierarchical arrangement to assist browsing </li></ul><ul><li>Search and retrieval </li></ul><ul><ul><li>Improve precision and recall </li></ul></ul><ul><ul><li>Reduce search time </li></ul></ul>* Real world includes physical objects, databases, digital content, and abstract domains of knowledge
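Precision and recall, named above as the search measures a controlled vocabulary should improve, can be computed as in the sketch below. It uses the standard definitions (precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved); the function name and the document identifiers are assumptions for illustration.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one search result set."""
    hits = len(retrieved & relevant)
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents returned, three actually relevant.
retrieved_docs = {"d1", "d2", "d3", "d4"}
relevant_docs = {"d1", "d2", "d5"}
precision, recall = precision_recall(retrieved_docs, relevant_docs)
```

In this made-up case two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall of two thirds).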
  8. 8. SEARCH
  9. 9. Keyword Search <ul><li>Keyword searching is insufficient </li></ul><ul><li>People do not always know what they want </li></ul><ul><li>People all have different “keywords” </li></ul><ul><li>People don’t perform complex keyword searches </li></ul><ul><li>One word can have many meanings </li></ul><ul><li>Two or more words can share the same meaning </li></ul>
  10. 10. one thing can have many different names Dr. Peter Roget one word can mean very different things “the elasticity of language”
  11. 12. Taxonomy helps people filter out the noise and discover the relevant things regardless of what they are called.
  12. 13. NAVIGATE
  13. 14. Search and Navigation are not alternative solutions, they are complementary solutions Users expect both
  14. 15. Points of view…
  15. 16. one point of view…
  16. 17. another point of view…
  17. 18. Different audiences will have different views and good navigation will serve all of them.
  18. 19. Building a Taxonomy or Controlled Vocabulary <ul><li>Now that we know what taxonomies and controlled vocabularies are and can see some of the reasons we need them – what do we do next??? </li></ul>
  19. 20. Building a Taxonomy or Controlled Vocabulary <ul><ul><ul><li>Basic issues and principles </li></ul></ul></ul><ul><ul><ul><li> </li></ul></ul></ul><ul><ul><ul><ul><li>One word can have multiple meanings (ambiguity) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Two words can share the same meaning (synonymy) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Semantic relationships </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Facets </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Warrant </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Structures </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Metadata </li></ul></ul></ul></ul>
  20. 21. Ambiguity <ul><ul><ul><li>Polysemes (homonyms, homographs) </li></ul></ul></ul><ul><ul><ul><ul><li>cranes (birds) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>cranes (equipment) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Mercury (planet) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Mercury (god) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Mercury (car) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Mercury (metal) </li></ul></ul></ul></ul>Ambiguity
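The parenthetical qualifiers on the slide above suggest one way to handle homographs: store each sense as a separate qualified term and look senses up by the bare word. The sketch below assumes terms follow the `word (qualifier)` pattern shown on the slide; `build_homograph_index` and `senses` are hypothetical helper names.

```python
from collections import defaultdict

# Qualified terms taken from the slide above.
QUALIFIED_TERMS = [
    "cranes (birds)",
    "cranes (equipment)",
    "Mercury (planet)",
    "Mercury (god)",
    "Mercury (car)",
    "Mercury (metal)",
]


def build_homograph_index(terms: list[str]) -> dict[str, list[str]]:
    """Map each bare word (lowercased) to the qualifiers that disambiguate it."""
    index: dict[str, list[str]] = defaultdict(list)
    for term in terms:
        base, _, qualifier = term.partition(" (")
        index[base.lower()].append(qualifier.rstrip(")"))
    return index


HOMOGRAPH_INDEX = build_homograph_index(QUALIFIED_TERMS)


def senses(word: str) -> list[str]:
    """Return the known senses of a word, or an empty list if it is unambiguous."""
    return HOMOGRAPH_INDEX.get(word.lower(), [])
```

A search interface could use such an index to ask the user which sense of "Mercury" is meant before running the query.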
  21. 22. Synonymy <ul><ul><ul><li>Two words with the same or similar meaning </li></ul></ul></ul><ul><ul><ul><ul><li>Popular vs. scientific names </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Generic vs. trade names </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Slang vs. traditional terms </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Dialectical variants </li></ul></ul></ul></ul><ul><ul><ul><li>Near-synonyms </li></ul></ul></ul><ul><ul><ul><li>Lexical variants </li></ul></ul></ul><ul><ul><ul><li>Generic postings </li></ul></ul></ul>Synonymy
  22. 23. Semantic Relationships <ul><ul><ul><li>Basic Types: </li></ul></ul></ul><ul><ul><ul><ul><li>Equivalence (USE/UF) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Hierarchical (BT/NT) </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Associative (RT/RT) </li></ul></ul></ul></ul><ul><ul><ul><li>Represented by standard codes/symbols </li></ul></ul></ul><ul><ul><ul><li>Reciprocity </li></ul></ul></ul>Semantic Relationships
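Reciprocity, as listed above, means that recording a relationship in one direction implies its counterpart in the other. The sketch below is one possible way to enforce that, assuming relationships are stored in nested dictionaries; the `Thesaurus` class and its methods are hypothetical, and the choice of "salinity" as the preferred term is an assumption made only for the example.

```python
from collections import defaultdict

# Each relationship code paired with its reciprocal code.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    """Toy thesaurus that stores every relationship together with its reciprocal."""

    def __init__(self) -> None:
        # term -> relationship code -> set of target terms
        self.relations: dict[str, dict[str, set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def relate(self, source: str, code: str, target: str) -> None:
        """Record 'source CODE target' and the reciprocal 'target CODE' source'."""
        self.relations[source][code].add(target)
        self.relations[target][RECIPROCAL[code]].add(source)

    def related(self, term: str, code: str) -> list[str]:
        """Return the terms linked to `term` by `code`, sorted alphabetically."""
        return sorted(self.relations[term][code])


thesaurus = Thesaurus()
thesaurus.relate("succulent plants", "NT", "cacti")
thesaurus.relate("saltiness", "USE", "salinity")  # preferred term is assumed
```

After these two calls, "cacti" carries a BT link back to "succulent plants" and "salinity" carries a UF link back to "saltiness" without either being entered separately.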
  23. 24. Hierarchical Relationships <ul><li>Allow for browsable structures </li></ul><ul><li>Information discovery </li></ul><ul><li>Search expansion </li></ul><ul><li>Three types: </li></ul><ul><ul><li>Generic </li></ul></ul><ul><ul><li>Instance </li></ul></ul><ul><ul><li>Whole-part </li></ul></ul>
  24. 25. Hierarchical Relationships <ul><li>Between a class and its members </li></ul><ul><li>“ IsA” relationship </li></ul><ul><ul><li>A cactus IsA succulent plant, therefore: </li></ul></ul><ul><ul><ul><li>succulent plants NT cacti </li></ul></ul></ul>Generic Hierarchical Relationships
  25. 26. Hierarchical Relationships <ul><li>Between a general category of things or events and an individual instance of that category </li></ul><ul><li>Instance is often a proper noun </li></ul><ul><li>Also an “IsA” relationship type </li></ul><ul><li>Example: </li></ul><ul><ul><li>mountains NT Rocky Mountains </li></ul></ul>Instance Hierarchical Relationship
  26. 27. Hierarchical Relationships <ul><li>One concept inherently included in another </li></ul><ul><li>Examples: </li></ul><ul><ul><li>Systems and organs of the body </li></ul></ul><ul><ul><li>Geographic locations </li></ul></ul><ul><ul><li>Corporate, social, or political structures </li></ul></ul>Whole Part Hierarchical Relationships
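The search expansion that hierarchical relationships allow can be sketched as follows: a query on a broader term is widened to include all of its narrower terms. The `NARROWER` mapping reuses the NT examples from the slides above; the `expand_with_narrower` function is a hypothetical illustration, not a description of any particular search product.

```python
# Narrower-term (NT) links taken from the slide examples.
NARROWER = {
    "succulent plants": ["cacti"],
    "mountains": ["Rocky Mountains"],
}


def expand_with_narrower(term: str) -> list[str]:
    """Return the term followed by all of its narrower terms, breadth first."""
    result = [term]
    queue = [term]
    while queue:
        current = queue.pop(0)
        for child in NARROWER.get(current, []):
            if child not in result:
                result.append(child)
                queue.append(child)
    return result
```

A search on "mountains" would then also match content indexed under "Rocky Mountains", while a term with no narrower terms is searched as entered.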
  27. 28. Polyhierarchy <ul><li>Concept logically fits into two different hierarchical structures </li></ul><ul><li>Advantage of electronic structures, allows for different viewpoints </li></ul><ul><li>Example: </li></ul><ul><ul><li>Biochemistry </li></ul></ul><ul><ul><ul><li>BT biology </li></ul></ul></ul><ul><ul><ul><li>BT chemistry </li></ul></ul></ul>
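A polyhierarchy can be represented by letting a term carry more than one broader term, as in this minimal sketch. The biochemistry links come from the slide; the "science" parent is a hypothetical addition used only to show that the traversal handles shared ancestors, and `ancestors` is an illustrative helper name.

```python
# Broader-term (BT) links. A term may list several parents.
BROADER = {
    "biochemistry": ["biology", "chemistry"],  # from the slide
    "biology": ["science"],                    # hypothetical parent
    "chemistry": ["science"],                  # hypothetical parent
}


def ancestors(term: str) -> set[str]:
    """Collect every broader term reachable from `term`, visiting each once."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        for parent in BROADER.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Because "biochemistry" sits under both parents, a user browsing from either "biology" or "chemistry" would reach it, which is the different-viewpoints advantage the slide describes.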
  28. 29. Associative Relationships <ul><ul><ul><li>May suggest additional terms for indexing or searching </li></ul></ul></ul><ul><ul><ul><li>Between terms in the same hierarchy </li></ul></ul></ul><ul><ul><ul><ul><li>Overlapping sibling terms </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Derivational relationships </li></ul></ul></ul></ul><ul><ul><ul><li>Between terms in different hierarchies </li></ul></ul></ul><ul><ul><ul><ul><li>Many types </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Examples: Process/agent; Action/property; Cause/effect </li></ul></ul></ul></ul>
  29. 30. Form of Terms <ul><li>Single word or compound terms </li></ul><ul><li>Grammatical forms: </li></ul><ul><ul><li>Nouns and noun phrases </li></ul></ul><ul><ul><li>Singular / plural </li></ul></ul><ul><li>Capitalization </li></ul><ul><ul><li>Predominantly lowercase characters, except for proper names, acronyms, trade names, etc. </li></ul></ul><ul><li>Punctuation </li></ul>
  30. 31. 2007 Factiva, Inc. All Rights Reserved. Standards <ul><ul><ul><li>“Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies,” ANSI/NISO Z39.19-2005 </li></ul></ul></ul><ul><ul><ul><li>“Z39.50: A Primer on the Protocol,” ANSI/NISO Z39.50 </li></ul></ul></ul><ul><ul><ul><li>“Structured Vocabularies for Information Retrieval. Guide. Definitions, Symbols and Abbreviations,” BS 8723-1:2005 </li></ul></ul></ul><ul><ul><ul><li>“Structured Vocabularies for Information Retrieval. Guide. Thesauri,” BS 8723-2:2005 </li></ul></ul></ul><ul><ul><ul><li>“Guidelines for the Establishment and Development of Multilingual Thesauri,” ISO 5964-1985 </li></ul></ul></ul><ul><ul><ul><li>“Guidelines for the Establishment and Development of Monolingual Thesauri,” ISO 2788-1986 </li></ul></ul></ul><ul><ul><ul><li>Web Ontology Language (OWL) Overview </li></ul></ul></ul>Standards
  31. 32. Value Proposition <ul><ul><li>“ 40% of corporate users…cannot find the information they need to do their jobs on their intranets.” </li></ul></ul><ul><ul><li>Susan Feldman, “The High Cost of Not Finding Information,” KMWorld, March 2004 </li></ul></ul>Value Proposition, or “So what?”
  32. 33. Low productivity High frustration Little leverage of information assets Too many search results Too many irrelevant hits The more precise I get the more I miss End-user search illiteracy Multilingual content Ambiguous results Information retrieval issues within companies
  33. 34. The controlled vocabulary value proposition <ul><li>Unlock the value of internal and external content to: </li></ul><ul><ul><li>Improve productivity </li></ul></ul><ul><ul><ul><li>“ Stop searching, start finding” </li></ul></ul></ul><ul><ul><li>Reduce cost </li></ul></ul><ul><ul><ul><li>Make existing content actionable, not dormant </li></ul></ul></ul><ul><ul><ul><li>Avoid reinventing wheels </li></ul></ul></ul><ul><ul><li>Gain competitive advantage </li></ul></ul><ul><ul><ul><li>Be better informed, act quicker </li></ul></ul></ul>
  34. 35. Controlled vocabulary’s role in portal success <ul><li>Drive usage </li></ul><ul><ul><li>Improve user experience, leverage portal investment </li></ul></ul><ul><li>Drive cultural change </li></ul><ul><ul><li>Help develop a common language </li></ul></ul><ul><ul><li>Support information exchange/reuse </li></ul></ul><ul><li>Leverage information management skills </li></ul><ul><ul><li>Turn information officers into information architects </li></ul></ul>
  35. 36. Value Proposition <ul><ul><li>Taxonomies make it easier to find information so people are more likely to use intranets and extranets. This results in better return on the time and effort already invested in these intranets and extranets. </li></ul></ul><ul><ul><li>Taxonomies improve “hit” rates - people find what they need </li></ul></ul><ul><ul><ul><li>Everyone has experienced irrelevant results from internet search engines because </li></ul></ul></ul><ul><ul><ul><ul><li>• Two or more words or terms can be used to represent a single concept </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>salinity/saltiness </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><li>• Two or more words that have the same spelling can represent different concepts </li></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Mercury (planet) </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Mercury (metal) </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>Mercury (automobile) </li></ul></ul></ul></ul></ul><ul><ul><ul><li>Taxonomies eliminate much of this problem </li></ul></ul></ul><ul><ul><li>People spend less time searching and more time finding </li></ul></ul><ul><ul><li>With a common taxonomy across the organization, knowledge can be more readily shared, reused and repurposed </li></ul></ul>
  36. 37. Controlled vocabulary can help reduce costs and increase revenue <ul><ul><li>Taxonomies can help organizations save money </li></ul></ul><ul><ul><ul><li>Reduce the number of hours spent seeking information. Hierarchical relationships allow users to easily narrow or broaden searches as well as look for related information. </li></ul></ul></ul><ul><ul><ul><li>Improve productivity by reusing and repurposing content </li></ul></ul></ul><ul><ul><li>A taxonomy can help increase revenue </li></ul></ul><ul><ul><ul><li>Increase customer satisfaction by improving </li></ul></ul></ul><ul><ul><ul><ul><li>search efficiency </li></ul></ul></ul></ul><ul><ul><ul><ul><li>findability </li></ul></ul></ul></ul><ul><ul><ul><ul><li>relevance </li></ul></ul></ul></ul><ul><ul><ul><li>Provide timely information with up-to-date terminology </li></ul></ul></ul><ul><ul><ul><li>Provide more precise information retrieval </li></ul></ul></ul>
  37. 38. <ul><li>THANK YOU! </li></ul><ul><li>Laura Dorricott </li></ul><ul><li>[email_address] </li></ul>