Taxonomies And Search Aiim Mn

Seth Earley

Published in: Business, Technology

  1. 1. Improving Search with Taxonomy AIIM Minnesota Chapter Control: The Content Management Expo January 26th, 2009 Seth Earley 781-444-0287 [email_address]
  2. 2. Seth Earley, Founder, Earley & Associates, Inc. <ul><li>20-person consulting firm working with enterprises to develop knowledge and content management systems and taxonomy, metadata and search strategies </li></ul><ul><li>Co-author of Practical Knowledge Management from IBM Press </li></ul><ul><li>14 years' experience building taxonomies for content and knowledge management systems, 20+ years' experience in technology </li></ul><ul><li>Founder of the Boston Knowledge Management Forum </li></ul><ul><li>Former adjunct professor at Northeastern University </li></ul><ul><li>Founder of Search Community of Practice: http://tech.groups.yahoo.com/group/SearchCoP </li></ul><ul><li>Founder of Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP </li></ul><ul><li>Hosts monthly conference calls of case studies on search and taxonomy </li></ul><ul><li>Recently acquired taxonomy management tool company (www.wordmap.com) </li></ul>Precise access to information, enabled by consistent organisation
  3. 3. Agenda <ul><li>Search and the hype cycle, search as a utility </li></ul><ul><li>Basic premises </li></ul><ul><li>The challenge of search </li></ul><ul><li>Taxonomy, metadata & content management </li></ul><ul><li>5 taxonomy & search strategies you should know </li></ul><ul><ul><li>Tuned search </li></ul></ul><ul><ul><li>Metadata & tagging </li></ul></ul><ul><ul><li>Faceted search </li></ul></ul><ul><ul><li>Disambiguation </li></ul></ul><ul><ul><li>Clustering </li></ul></ul><ul><li>Conclusion </li></ul>What is it? How does taxonomy help? When do I use it? How do I implement it?
  4. 4. Search as Utility <ul><li>“search as a utility has become deeply ingrained into people's everyday lives.” – Study by Nielsen/Net Ratings </li></ul><ul><li>“search software, hardware, and support bundle or search appliance has become very popular since being introduced in early 2002” – Goebel Group </li></ul>These are misleading concepts. Search is used as a utility, but contexts vary so widely that “plugging search in” does not always produce satisfactory results.
  5. 5. Search and the Hype Cycle <ul><li>Different ‘flavors’ of Search are at various levels of maturity </li></ul><ul><li>1. On the Rise </li></ul><ul><li>Corporate Semantic Web </li></ul><ul><li>Desktop Portals </li></ul><ul><li>Content-Process Fusion </li></ul><ul><li>Desktop Search </li></ul><ul><li>Personal Knowledge Networks </li></ul><ul><li>Information Extraction </li></ul><ul><li>2. At the Peak </li></ul><ul><li>Enterprise IM </li></ul><ul><li>Information Retrieval and Search – Advanced </li></ul><ul><li>Smart Enterprise Suites </li></ul><ul><li>Wikis </li></ul><ul><li>Content Integration </li></ul><ul><li>Taxonomy </li></ul><ul><li>Corporate Blogging </li></ul><ul><li>3. Sliding Into the Trough </li></ul><ul><li>Public Semantic Web </li></ul><ul><li>Automated Text Categorization </li></ul><ul><li>Expertise Location and Management </li></ul><ul><li>Folksonomies </li></ul><ul><li>E-Learning Suites </li></ul><ul><li>Shared Workspaces </li></ul><ul><li>Records Management </li></ul><ul><li>4. Climbing the Slope </li></ul><ul><li>Web Conferencing </li></ul><ul><li>MMS </li></ul><ul><li>Enterprise Content Management </li></ul><ul><li>Presence </li></ul><ul><li>5. Entering the Plateau </li></ul><ul><li>Virtual Workplace </li></ul><ul><li>Knowledge Management </li></ul>Source: http://www.gartner.com
  6. 6. Basic Premises <ul><li>Premise 1 – All of search is about metadata </li></ul><ul><ul><li>Need to understand the relationship of taxonomy and metadata </li></ul></ul><ul><li>Premise 2 – The line between search and navigation is blurring </li></ul><ul><ul><li>Faceted search looks like navigation, guided navigation is search </li></ul></ul><ul><li>Premise 3 – Search needs to be designed as an application, not an appliance </li></ul><ul><ul><li>Design of any application requires attention to user context </li></ul></ul><ul><li>Premise 4 – Search needs to be integrated into processes, not added on </li></ul><ul><ul><li>Relevant search is context specific, context depends on process </li></ul></ul>
  7. 7. Basic Premises <ul><li>Premise 5 – We need to understand work processes, user tasks and user context in order to make search effective </li></ul><ul><ul><li>Users search for information in order to accomplish a goal </li></ul></ul><ul><li>Premise 6 – Taxonomy, metadata and information architecture are all aspects of search </li></ul><ul><ul><li>These are all attempts to surface information for users in the context of their objectives </li></ul></ul><ul><li>Premise 7 – Search algorithms, no matter how sophisticated, intelligent and complex, will never obviate the need for some level of structured tagging </li></ul><ul><li>Premise 8 – Taxonomy strategy needs to be tightly linked to search strategy (and to content strategy) </li></ul>
  8. 8. Basic Premises <ul><li>Premise 9 – Metadata is either implicit in content or explicitly applied to content </li></ul><ul><ul><li>Implicit metadata can take many forms – inherent structure of a piece of content or even the source or context of content </li></ul></ul><ul><li>Premise 10 – Search is messy </li></ul><ul><ul><li>Relevant results are in the eye of the beholder, language is imprecise, meaning is vague </li></ul></ul>
  9. 9. “… search terms are short, ambiguous and an approximation of the searcher’s real information need…” <ul><ul><li>Source: http://research.microsoft.com/~ryenw/papers/WhiteCONTEXT2002.pdf </li></ul></ul><ul><ul><li>Ryen W. White, Joemon M. Jose and Ian Ruthven </li></ul></ul>
  10. 10. The Challenge of Search <ul><li>Search seems to be a ‘given’ – we expect it to be there </li></ul><ul><li>Most enterprise search is less than optimal – too many results, irrelevant results, missing results </li></ul><ul><li>It was not so long ago that organizations were starved for information </li></ul><ul><li>A puzzling fact: as information environments have grown more complex, users’ expectations that search should be simpler have also grown </li></ul>
  11. 11. Search is complex Enterprise search is diverse – need to access multiple applications and contexts – both structured and unstructured Business Intelligence/Analytics Customer Relationship Mgt Document repositories Custom databases and applications Intranets/web pages
  12. 12. Search is Heterogeneous Search/Tagging/Taxonomy Integration Framework Data Sources Search Mechanisms Appliances Federated Search Auto categorization/ Clustering Entity Extraction Faceted Search Semantic Search Business Intelligence Customer Relationship Mgt Document repositories Custom databases and applications Intranets/web pages
  13. 13. Change is constant <ul><li>Snap shot versus movie </li></ul><ul><li>Business changes faster than IT can support </li></ul><ul><li>Systems and tools grow up to solve specific problems without a view toward integration </li></ul><ul><li>Integration efforts lead to dis-integration </li></ul><ul><li>Need common frameworks for access and search, not discrete search applications </li></ul>
  14. 14. Contributing factors <ul><li>Proliferation of information sources </li></ul><ul><li>‘Personal’ organizing perspectives </li></ul><ul><li>Trade-off between chaos and control, creation and reuse </li></ul><ul><li>Interrupt-driven world </li></ul><ul><li>No processes to organize, no business imperative or process </li></ul>
  15. 15. What is the right balance? <ul><li>Content can be created in structured or unstructured contexts </li></ul><ul><li>Its value can vary depending on audience, context or process </li></ul><ul><li>Some content is extremely nuanced and requires more precise access (according to audience or task, solution, etc.) </li></ul><ul><li>Search can be based on inherent structure and content of a document (implicit metadata) or on information applied to that content (explicit metadata) </li></ul>
  16. 16. Different tools create different content structures © 2007 More Structured Email Instant Messages Wikis Blogs Discussions Collaborative Workspaces Online Learning Instructor Led Courses Content Mgt Workflow systems Doc Mgt Systems Records Mgt Systems Knowledge Creation Knowledge Access/Reuse Chaotic Processes Controlled Processes Less Structured
  17. 17. Relative value Lower Cost Higher Cost Message text External News Example deliverables Discussion postings Interim deliverables Content Repositories Success Stories Benchmarks Approved Methods Best Practices Unfiltered Reviewed/Vetted/Approved Lower Value Higher Value Formal Tagging/Organizing Processes (More difficult to access) (Easier to access) Social tagging (“folksonomy”) Structured tagging (taxonomy)
  18. 18. Taxonomy, Metadata and Content Foundational concepts for effective search
  19. 19. Taxonomy is a foundation… <ul><li>It is a system for classification </li></ul><ul><li>It allows for a means to organize documents and web content </li></ul><ul><li>Helps us fine tune search tools and mechanisms </li></ul><ul><li>Creates a common language for sharing concepts </li></ul><ul><li>Allows for a coherent approach to integrate information sources </li></ul><ul><li>It is a common language for business processes </li></ul>
  20. 20. Goals of a taxonomy <ul><li>Allow for knowledge discovery </li></ul><ul><li>Improve usability of applications as well as learnability of applications </li></ul><ul><li>Reduce the cost of delivering services, developing products and conducting operations </li></ul><ul><li>Improve operational efficiencies by allowing for reuse of information rather than recreation </li></ul><ul><li>Improve search results and applicability (both precision and recall) </li></ul>
  21. 21. What is metadata? <ul><li>It is the “is-ness” of a piece of content </li></ul><ul><li>And the “about-ness” of a piece of content </li></ul><ul><li>This is a Product Description </li></ul><ul><li>It is about the Motorola Razr </li></ul>Taxonomies are the organizing principle behind metadata and the values that populate metadata fields
  22. 22. What is a content model? <ul><li>Content is structured with body information and a wrapper that formats and tags that information </li></ul><ul><li>Also called a “content object model”* </li></ul>Title Description Simple content object model *“Content model” refers to the overall framework; “content object model” refers to a specific model for a set of document types, i.e., an overall “Content Model” includes multiple “Content Object Models”
  23. 23. Metadata for a product page in a content management system Title Date Author Features Product_Name Category Doc_ID Doc_Type “is-ness” “about-ness” FAQ Product Press release Specification Promotion
  24. 24. Content modeling – Policy example Title Date Author Subject Doc_ID Content_ID Date Content_ID Date Content_ID Date Standard Header Policy content type Customer Service content type Claims processing content type
  25. 25. Deriving a content model <ul><li>How are elements assembled? </li></ul><ul><li>What are the decision points where users need to get to specific content items? </li></ul><ul><li>Where else can an element be used? </li></ul><ul><li>How granular will access need to be? </li></ul>
  28. 28. Why the metadata tutorial? One word: faceted search
  29. 29. Faceted Search/Guided Navigation Or – It’s a dessert, it’s a floor wax, it’s both!
  30. 30. Navigational taxonomy Taxonomy can be a hierarchical grouping of navigational nodes on a web site Challenge is there is no “one way” to navigate that is correct. Is this the “correct” way?
  31. 31. Navigational taxonomy Or is this one “correct”? Or is this one?
  32. 32. Motorola.com => United States => Government => Portable Radios Motorola.com => Portable Radios => United States => Government Motorola.com => Government => Portable Radios => United States
  33. 33. Navigating with “facets” <ul><li>Two way radios </li></ul><ul><ul><li>Portable </li></ul></ul><ul><ul><li>Fixed </li></ul></ul><ul><ul><li>Mobile </li></ul></ul><ul><ul><li>Motorcycle </li></ul></ul><ul><li>Vertical market </li></ul><ul><ul><li>Government </li></ul></ul><ul><ul><li>Manufacturing </li></ul></ul><ul><ul><li>Wholesale retail </li></ul></ul><ul><li>Country </li></ul><ul><ul><li>Canada </li></ul></ul><ul><ul><li>United Kingdom </li></ul></ul><ul><ul><li>United States </li></ul></ul>A “facet” is a top-level category in the taxonomy (here: Product type, Vertical market, Geographic region). Just three facets with 5 terms each could have 5 to the 3rd power (125) possible combinations. Target document: P = Portable radio, G = United States, V = Government
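The facet idea on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the `matches` helper are assumptions, not part of the original deck. It counts the one-term-per-facet combinations for the terms actually listed here and checks that the target document is reached whatever order the facets are chosen in.

```python
from itertools import permutations, product

# Facets and terms as listed on the slide.
FACETS = {
    "Product type": ["Portable", "Fixed", "Mobile", "Motorcycle"],
    "Vertical market": ["Government", "Manufacturing", "Wholesale retail"],
    "Country": ["Canada", "United Kingdom", "United States"],
}

# Target document from the slide, tagged with one term per facet.
TARGET = {
    "Product type": "Portable",
    "Vertical market": "Government",
    "Country": "United States",
}


def count_combinations(facets):
    """Number of ways to pick exactly one term from every facet."""
    return sum(1 for _ in product(*facets.values()))


def matches(document, selections):
    """True if the document carries every selected (facet, term) pair."""
    return all(document.get(facet) == term for facet, term in selections)


combination_count = count_combinations(FACETS)

# Every ordering of the same three selections reaches the same document,
# which is why no single navigation hierarchy has to be the "correct" one.
orderings = list(permutations(TARGET.items()))
all_orders_match = all(matches(TARGET, order) for order in orderings)

print(combination_count, len(orderings), all_orders_match)
```

The six orderings correspond to the alternative Motorola.com navigation paths shown on the previous slide.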
  34. 34. Faceted search implies tagged content with nice structured metadata… What if we don’t have a lot of existing metadata? Does that mean hiring a bunch of people to enter it? Manual tagging is rarely practical with large amounts of lower value content. Instead, we need to derive implicit metadata from content
  35. 35. All search leverages metadata… <ul><li>… but not all metadata is explicit </li></ul><ul><li>Full text search derives metadata about documents </li></ul><ul><li>Creates an index of terms that occur in a document collection </li></ul><ul><li>Associates documents with those index entries </li></ul>
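As a rough illustration of how full text search derives metadata, the sketch below builds a tiny inverted index: a map from each term to the documents that contain it. The sample documents and function names are invented for the example, and real engines add weighting, positions and much more.

```python
import re
from collections import defaultdict

# Hypothetical document collection.
DOCS = {
    "doc1": "Portable radios for government agencies",
    "doc2": "Mobile radios and portable accessories",
    "doc3": "Government purchasing policy",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Create an index of terms and associate documents with each entry."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index(DOCS)
print(sorted(search(index, "portable radios")))
```

Nothing was tagged by hand here; the index entries are metadata that the engine derived from the content itself.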
  36. 36. All search leverages metadata… <ul><li>Occurrence of certain words in a document and the relative value of those occurrences, including: </li></ul><ul><ul><li>Weighting </li></ul></ul><ul><ul><li>Relative positioning </li></ul></ul><ul><ul><li>Semantic relationships… </li></ul></ul><ul><li>… becomes information about the document that is cached in the index and served by the search engine </li></ul><ul><li>Search algorithms vary in how metadata is derived and exposed to users. </li></ul>Relevance ranking, for example, is additional metadata for a result that is ‘implied’ or derived based on incoming connections to a piece of content.
  37. 37. Context as metadata <ul><li>Metadata can be explicit or implicit </li></ul><ul><li>Implicit: implied though not directly expressed; inherent in the nature of something, implied by context </li></ul><ul><li>Explicit: precisely and clearly expressed or readily observable; leaving nothing to implication </li></ul>
  38. 38. Examples of implicit metadata: <ul><li>‘Structure’ and format of content – a piece of content may be ‘unstructured’ and not contain metadata, but it is well organized. </li></ul><ul><ul><li>Example: Newspaper story contains a headline, sub head, and first paragraph with who, what, where, when, etc. </li></ul></ul><ul><ul><li>Clear editorial standards </li></ul></ul><ul><li>Context of content – Where did the content come from? If from a particular web site, file share, data source or intranet location, the domain of knowledge provides context. </li></ul><ul><ul><li>How can we disambiguate the term “diamond”? </li></ul></ul><ul><ul><ul><li>Sports site – baseball diamond </li></ul></ul></ul><ul><ul><ul><li>Commerce site – diamond ring </li></ul></ul></ul><ul><ul><li>Sales context for ‘feature’ versus engineering context for ‘feature’ </li></ul></ul><ul><ul><li>“Adapter” – power cord </li></ul></ul><ul><ul><li>“Adapter” – Bluetooth headset </li></ul></ul>
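One way to picture context acting as metadata is a lookup from the content's source to the sense it implies. The sketch below is a minimal version under stated assumptions: the site names and both mapping tables are hypothetical, and only the “diamond” example comes from the slide.

```python
# Hypothetical mapping from a content source to its knowledge domain.
SOURCE_DOMAINS = {
    "sports.example.com": "sports",
    "shop.example.com": "commerce",
}

# Senses of an ambiguous term, keyed by domain (taken from the slide).
SENSES = {
    "diamond": {
        "sports": "baseball diamond",
        "commerce": "diamond ring",
    },
}


def resolve_sense(term, source):
    """Pick the sense of a term implied by where the content came from.

    Returns None when the source or the term is not known.
    """
    domain = SOURCE_DOMAINS.get(source)
    return SENSES.get(term.lower(), {}).get(domain)


print(resolve_sense("diamond", "sports.example.com"))
print(resolve_sense("Diamond", "shop.example.com"))
```

The source of each document is never typed in as a tag, yet it narrows the meaning just as an explicit field would.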
  39. 39. Context as metadata <ul><li>If we maintain context of a piece of information in our search results, this is equivalent to having additional metadata on that content </li></ul>Search results organized by repository This is a form of “federated” search – a single search term fed to multiple repositories Example courtesy of Morrison and Foerster
  40. 40. Structure as metadata <ul><li>Some content has excellent implicit metadata </li></ul><ul><ul><li>News story for example </li></ul></ul><ul><ul><ul><li>Has a main topic </li></ul></ul></ul><ul><ul><ul><li>Usually a summary of important points at the beginning </li></ul></ul></ul><ul><ul><ul><li>Mentions people, places and things that can be ‘extracted’ as entities </li></ul></ul></ul><ul><ul><ul><li>Complies with editorial standards, usually contains a narrow theme </li></ul></ul></ul><ul><ul><li>Will get good results from auto categorization and entity extraction </li></ul></ul><ul><li>Some content has poor implicit metadata </li></ul><ul><ul><li>Email for example </li></ul></ul><ul><ul><ul><li>Usually contains lots of topics </li></ul></ul></ul><ul><ul><ul><li>Does not have a theme </li></ul></ul></ul><ul><ul><ul><li>Does not comply with editorial standards, can be rambling, poorly written </li></ul></ul></ul><ul><ul><li>Will not get good results from auto categorization and entity extraction </li></ul></ul>
  41. 41. <ul><li>“We should get Google”… </li></ul>
  42. 42. Why you will not “just get Google” <ul><li>Google leverages linkages on the web that are not typically duplicated internally in the organization </li></ul><ul><li>Search engines cannot infer intent or know what is important to you in the context of your work task </li></ul><ul><li>Information relevance is dependent on who you are and your level of expertise as well as what you are trying to accomplish </li></ul><ul><li>Not all content is equal – Google is fine for broad search results or less precise information, but may not work as well when large numbers of documents differ only at a fine level of granularity </li></ul>
  43. 43. Google’s search appliance is leveraging taxonomy values <ul><li>The new “one box” feature allows querying of structured content via specific keywords </li></ul><ul><ul><li>East Coast Sales </li></ul></ul><ul><ul><li>Contact: Wick </li></ul></ul><ul><ul><li>PO </li></ul></ul><ul><ul><li>Revenue by age </li></ul></ul><ul><ul><li>Weather </li></ul></ul>
  46. 46. Configuration process <ul><li>“Define trigger” </li></ul><ul><li>“Choose provider” </li></ul><ul><li>“Format results” </li></ul><ul><li>What does this really mean? </li></ul>Need to consider taxonomy, metadata and thesaurus entries; for example, a trigger may include equivalent terms: “lax airport conditions”, “SFO airport delays”, “newark airport status”. See: http://code.google.com/enterprise/documentation/oneboxguide.html
  47. 47. We still have a context problem “ Revenue” is an ambiguous term
  48. 48. Why doesn’t Google just use Google?
  49. 49. Why you will not “just get Google”
  50. 50. Applying a taxonomy to search <ul><li>We need a mechanism to improve search </li></ul><ul><li>A Taxonomy can be used to </li></ul><ul><ul><li>Define search terms and map those terms to specific locations of information (need to integrate with a search engine) </li></ul></ul><ul><ul><li>Apply terms to a document so that relevant and consistent search results are returned (need to integrate with a content management system) </li></ul></ul><ul><li>A Thesaurus can be used to define term synonyms and related terms in order to improve the recall of information. </li></ul><ul><ul><li>We may define “proposal” and “statement of work” and “SOW” as meaning the same thing. If I enter SOW, I can pull back documents that are labeled with (or contain) the other terms. This is referred to as “term expansion” </li></ul></ul>
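Term expansion as described above can be sketched with a small synonym ring. The ring uses the slide's own example (“proposal”, “statement of work”, “SOW”); the sample documents and function names are made up for illustration, and a real deployment would hook the thesaurus into the search engine rather than scan text like this.

```python
import re

# Synonym ring from the slide: these terms are treated as equivalent.
SYNONYM_RINGS = [
    {"proposal", "statement of work", "sow"},
]

# Hypothetical documents.
DOCS = {
    "d1": "Draft proposal for the records project",
    "d2": "Signed statement of work, phase two",
    "d3": "SOW template",
    "d4": "Meeting notes",
}


def expand(term):
    """Return the term plus every equivalent term from its synonym ring."""
    term = term.lower()
    for ring in SYNONYM_RINGS:
        if term in ring:
            return set(ring)
    return {term}


def search(docs, query):
    """Return ids of documents containing the query or any equivalent term."""
    terms = expand(query)
    hits = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        # Whole-word match so that "sow" does not match inside other words.
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms):
            hits.append(doc_id)
    return sorted(hits)


print(search(DOCS, "SOW"))
```

Without the expansion step, a search on “SOW” would find only the template; with it, recall improves to cover the documents that use the other two terms.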
  51. 51. Taxonomy & search strategies <ul><li>Five strategies you should know about </li></ul><ul><ul><li>Tuned search </li></ul></ul><ul><ul><li>Faceted search </li></ul></ul><ul><ul><li>Tagging </li></ul></ul><ul><ul><li>Clustering </li></ul></ul><ul><ul><li>Disambiguation </li></ul></ul>
  52. 52. Taxonomy and Search Strategies <ul><li>Pre-search processing </li></ul><ul><ul><li>Search engine incorporates user search terms to narrow search before retrieving results </li></ul></ul><ul><ul><ul><li>Tuned search “Best Bets” </li></ul></ul></ul><ul><ul><ul><li>Faceted search </li></ul></ul></ul><ul><ul><ul><li>Metadata & tagging </li></ul></ul></ul><ul><li>Post Search Processing </li></ul><ul><ul><li>Search results are narrowed after they are retrieved </li></ul></ul><ul><ul><ul><li>Disambiguation </li></ul></ul></ul><ul><ul><ul><li>Clustering </li></ul></ul></ul>
  53. 53. <ul><li>Tuned Search, or “Best Bets” </li></ul>
  54. 54. Tuned Search <ul><li>What is Tuned Search? </li></ul><ul><li>Search terms are defined in a taxonomy and mapped back to specific locations of information (i.e., specific web pages). </li></ul><ul><li>E.g., a user searching on a broad term like cell phones would first be pointed to a landing page (a “best bet”), or presented with a box of hand-picked links above the regular search results. </li></ul>
  55. 55. Best Bets Example – Best Buy
  56. 56. Tuned Search “Best Bets” <ul><li>The same search using just keyword matching could have retrieved a list of pages with the words “phone” or “cell”, e.g. </li></ul><ul><ul><ul><ul><li>Home phones </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Cordless phones </li></ul></ul></ul></ul><ul><ul><ul><ul><li>12 cell batteries </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Etc. </li></ul></ul></ul></ul><ul><li>Reading through pages of possible matches is time consuming and frustrating </li></ul>
  57. 57. Best Bets Example – SAP.com <ul><li>Search on “CRM” or “Customer Relationship Management” </li></ul>
  58. 58. Tuned Search “Best Bets” <ul><li>How Does a Taxonomy Help? </li></ul><ul><li>Using the taxonomy categories as landing pages ensures that users are strategically directed to the content that is most important. </li></ul><ul><li>Best bets are done in conjunction with a taxonomy/thesaurus, not just a list of search terms… </li></ul><ul><ul><li>E.g., Circuit City </li></ul></ul>
  59. 59. Circuit City Example <ul><li>Search on “Cell phone”: </li></ul>
  60. 60. Circuit City Example <ul><li>Search on “Mobile phone”: </li></ul>
  61. 61. Circuit City Example <ul><li>What do these things have to do with mobile phones? </li></ul>
  62. 62. Tuned search – “Best Bets” <ul><li>When do I use it? </li></ul><ul><li>As a portal or website grows, the number of pages with matching keywords increases. </li></ul><ul><li>This increases the likelihood of a search query returning high numbers of results. </li></ul><ul><li>Tuned search helps when keyword searching brings back too many results, and you want to map common searches to specific, commonly viewed pages of information. </li></ul>
  63. 63. Tuned Search – “Best Bets” <ul><li>How is it implemented? </li></ul><ul><li>Create a small database of search terms and then map these terms to landing pages or specific links </li></ul><ul><ul><li>Common search terms may be extracted from search logs </li></ul></ul><ul><li>Search engine must be configured to display the best bets link box or redirect to the landing page </li></ul><ul><ul><li>Few search engines provide this capability out of the box… </li></ul></ul>
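The implementation steps above might look roughly like this in Python. It is a sketch, not any search engine's actual best-bets feature: the search log, the thesaurus entries and the landing-page paths are all hypothetical.

```python
from collections import Counter

# Hypothetical search log; in practice this comes from the search engine.
SEARCH_LOG = [
    "cell phones", "Cell Phones", "mobile phone",
    "tv", "cell phones", "crm",
]

# Thesaurus: query variants mapped to a preferred taxonomy term.
PREFERRED_TERM = {
    "cell phone": "Cell Phones",
    "cell phones": "Cell Phones",
    "mobile phone": "Cell Phones",
    "mobile phones": "Cell Phones",
}

# The small "database" of best bets: taxonomy term -> landing page.
BEST_BETS = {
    "Cell Phones": "/category/cell-phones",
}


def top_queries(log, n):
    """Most common normalized queries, as candidates for best bets."""
    return Counter(q.strip().lower() for q in log).most_common(n)


def best_bet(query):
    """Landing page for a query, or None if no best bet is defined."""
    preferred = PREFERRED_TERM.get(query.strip().lower())
    return BEST_BETS.get(preferred)


print(top_queries(SEARCH_LOG, 1))
print(best_bet("Mobile phone"), best_bet("cell phone"))
```

Routing variants through the thesaurus first is what keeps “cell phone” and “mobile phone” from landing on different pages, the problem shown in the Circuit City slides.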
  64. 64. <ul><li>Leveraging taxonomy terms as metadata – Standard search </li></ul>
  65. 65. Leverage the taxonomy terms as metadata – standard search <ul><li>All search leverages metadata </li></ul><ul><li>Metadata is either implied/derived from content or specifically applied to content </li></ul><ul><li>Apply taxonomy terms as metadata to a document so that relevant and consistent search results are returned when users enter query terms </li></ul><ul><ul><li>I.e., taxonomy drives content tagging; the search engine leverages tags for more precise results </li></ul></ul>
  66. 66. Not all content is created equal (ly?) <ul><li>Content can be created in structured or unstructured contexts </li></ul><ul><li>Its value can vary depending on audience, context or process </li></ul><ul><li>Some content is extremely nuanced and requires more precise access (according to audience or task, solution, etc.) </li></ul><ul><li>Search can be based on inherent structure and content of a document or on information applied to that content </li></ul>
  67. 67. Ranking and Relevance <ul><li>Web search leverages linkages on the web that are not typically duplicated internally in the organization </li></ul><ul><li>Search engines cannot infer intent or know what is important to you in the context of your work task </li></ul><ul><li>Information relevance is dependent on who you are and your level of expertise as well as what you are trying to accomplish </li></ul><ul><li>Not all content is equal – full text search is fine for broad results or less precise information, but may not work as well when large numbers of documents differ only at a fine level of granularity </li></ul>
  68. 68. Standard search
  69. 69. <ul><li>Leveraging taxonomy terms as metadata – Faceted search </li></ul>
  70. 70. Leverage the taxonomy terms as metadata - faceted search <ul><li>What is Faceted Search? </li></ul><ul><li>Attribute based search (guided navigation) approach to create precise, targeted search results. Each parameter narrows the search result to the most appropriate content. </li></ul><ul><ul><li>Also commonly referred to as “advanced searching” or “parametric searching” </li></ul></ul><ul><li>Users think they are browsing, but they are actually searching </li></ul><ul><li>Allows for multiple navigation schemes based on taxonomy </li></ul>
  71. 71. Faceted Search Example - Epicurious © 2007 Facets: Course Cuisine Season Type of dish Prep method Source
  72. 72. Faceted search – PC Connection Each parameter narrows the search result to the most appropriate content.
  73. 73. © 2007 Facets Taxo term values
  74. 74. Faceted search <ul><li>How does taxonomy help? </li></ul><ul><li>Specific metadata fields are combined to bring back highly relevant search results. </li></ul>Course Cuisine <ul><li>Course </li></ul><ul><ul><li>Main course </li></ul></ul><ul><ul><li>Dessert </li></ul></ul><ul><ul><li>Hors d’Oeuvres </li></ul></ul><ul><ul><li>Breakfast </li></ul></ul><ul><li>Cuisine </li></ul><ul><ul><li>Japanese </li></ul></ul><ul><ul><li>Mexican </li></ul></ul><ul><ul><li>Chinese </li></ul></ul><ul><li>Prep method </li></ul><ul><ul><li>Steam </li></ul></ul><ul><ul><li>Stir fry </li></ul></ul><ul><ul><li>Barbecue </li></ul></ul>Prep method Target document: Course = Main course Cuisine = Japanese Prep = Steam
  75. 75. Faceted search <ul><li>How do I Implement It? </li></ul><ul><li>Requires structured data – parameters/facets driven by metadata and taxonomy </li></ul><ul><li>Can also derive metadata using “entity extraction” and simulate attributes and facets </li></ul>
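A minimal sketch of faceted narrowing over structured metadata, using the Course / Cuisine / Prep method facets from the recipe example. The recipe records and helper names are invented for illustration; a production system would query a search engine or database instead of filtering a list.

```python
from collections import Counter

# Hypothetical recipes tagged with taxonomy values for three facets.
RECIPES = [
    {"title": "Steamed fish", "Course": "Main course",
     "Cuisine": "Japanese", "Prep method": "Steam"},
    {"title": "Stir-fried beef", "Course": "Main course",
     "Cuisine": "Chinese", "Prep method": "Stir fry"},
    {"title": "Steamed dumplings", "Course": "Hors d'Oeuvres",
     "Cuisine": "Chinese", "Prep method": "Steam"},
    {"title": "Barbecued corn", "Course": "Main course",
     "Cuisine": "Mexican", "Prep method": "Barbecue"},
]


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count remaining items per value, as shown next to each facet link."""
    return Counter(item[facet] for item in items)


main_courses = refine(RECIPES, {"Course": "Main course"})
cuisine_counts = facet_counts(main_courses, "Cuisine")
target = refine(RECIPES, {"Course": "Main course",
                          "Cuisine": "Japanese",
                          "Prep method": "Steam"})

print(cuisine_counts)
print([recipe["title"] for recipe in target])
```

Each click adds one entry to `selections`, so the user feels they are browsing while the system is running a progressively narrower search.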
  76. 76. <ul><li>Entity extraction: Deriving metadata based on semantic rules that identify term context (for example, recognizing the word “Washington” as a place in one context and a person in another) </li></ul>Washington? Entities: people, businesses, locations, dates, etc., or entities specific to a particular knowledge domain (chemical entities, for example)
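To make the “Washington” example concrete, here is a deliberately naive rule-based sketch. The cue words in both patterns are assumptions chosen for the example, and commercial entity extractors use far richer linguistic rules and statistics than two regular expressions.

```python
import re

# Hypothetical context cues; the first matching rule wins.
PERSON_CUES = re.compile(r"\b(?:President|General|Mr\.|Mrs\.|George)\s+Washington\b")
PLACE_CUES = re.compile(
    r"\b(?:in|to|from|near)\s+Washington\b|\bWashington,\s*D\.?C\.?"
)


def classify_washington(sentence):
    """Label a mention of "Washington" as a person, a place, or unknown."""
    if PERSON_CUES.search(sentence):
        return "person"
    if PLACE_CUES.search(sentence):
        return "place"
    return "unknown"


print(classify_washington("President Washington signed the bill."))
print(classify_washington("The conference is in Washington next week."))
print(classify_washington("Washington was mentioned twice."))
```

The label produced for each mention becomes derived metadata that can then populate a facet such as People or Locations.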
  77. 77. Taxonomy and Search <ul><li>Post Search Processing- Search results are narrowed after they are retrieved </li></ul><ul><ul><li>Clustering </li></ul></ul><ul><ul><li>Disambiguation </li></ul></ul>
  78. 78. <ul><li>Categorization </li></ul>
  79. 79. Categorize search results using the taxonomy <ul><li>What is Categorization? </li></ul><ul><li>Automated classification tool creates groups of documents based on word patterns and places those documents into taxonomic categories </li></ul><ul><li>Some clustering algorithms simply group results based on word occurrences </li></ul><ul><li>This is not true categorization of results correlated with a taxonomy </li></ul>
  80. 80. Categorize search results using taxonomy “clustering” <ul><li>When Do I Use Clustering? </li></ul><ul><li>In an e-commerce scenario it can be used to break down large categories of products according to their attributes </li></ul><ul><ul><ul><li>Amazon </li></ul></ul></ul><ul><ul><ul><ul><li>http://amazon.com/ </li></ul></ul></ul></ul><ul><ul><ul><li>Wal Mart </li></ul></ul></ul><ul><ul><ul><ul><li>http://walmart.com/ </li></ul></ul></ul></ul>
  81. 81. Clustering Example – Circuit City
  82. 82. Clustering <ul><li>When do I use Clustering? </li></ul><ul><li>Remember our example of why we won’t just use Google? </li></ul>
  83. 83. Why you will not “just get Google”
  84. 84. Clustering <ul><li>Imagine if the same results for pesticides were clustered for the user, e.g. </li></ul><ul><ul><ul><ul><li>Pesticides (Laws and Regulations) 147 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Pesticides (Chemical Formula Names) 159 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Pesticides (Distributors) 89 </li></ul></ul></ul></ul><ul><ul><ul><ul><li>Etc. </li></ul></ul></ul></ul>
  85. 85. Clustering <ul><li>How do I implement Clustering? </li></ul><ul><li>Build out your taxonomy, then extract entities from content and categorize based on derived metadata (facets) </li></ul>
  86. 86. Categorizing content Statistical/linguistic: These documents look similar due to an analysis of word patterns – let’s put them into the same group. Rules-based: These documents look similar based on some rule that we have created (they contain marketing plans and are about the newest widget) – let’s put them into the same group.
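The rules-based approach can be sketched as keyword rules attached to taxonomy categories, producing the kind of grouped counts shown in the pesticides example. The rules and result titles below are hypothetical, and the matching is plain substring search, which a real categorizer would replace with proper linguistic analysis.

```python
from collections import Counter

# Hypothetical rules: taxonomy category -> keywords that signal it.
RULES = {
    "Laws and Regulations": ("law", "regulation"),
    "Chemical Formula Names": ("formula", "compound"),
    "Distributors": ("distributor", "supplier"),
}

# Hypothetical result titles for a search on "pesticides".
RESULTS = [
    "State regulation of pesticide use near schools",
    "Pesticide distributor directory for the Midwest",
    "Chemical formula for a common pesticide",
    "New federal law on pesticide labeling",
    "Pesticide safety tips",
]


def categorize(text):
    """Return every category whose rule fires, or 'Uncategorized'."""
    lowered = text.lower()
    matched = [
        category for category, keywords in RULES.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    return matched or ["Uncategorized"]


def cluster_counts(results):
    """Count results per taxonomy category for display beside the results."""
    counts = Counter()
    for text in results:
        counts.update(categorize(text))
    return counts


counts = cluster_counts(RESULTS)
for category, count in counts.items():
    print(f"Pesticides ({category}) {count}")
```

A statistical/linguistic categorizer would replace `RULES` with groupings learned from word patterns; the grouping and counting step stays the same.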
  87. 87. Building a Taxonomy Slice from a set of Search Results: Courtesy of Raritan Technologies www.raritantechnologies.com
  88. 88. <ul><li>Disambiguation </li></ul>
  89. 89. Disambiguation of search results <ul><li>What is Disambiguation? </li></ul><ul><li>If a user enters a broad term (like “mobile”) the taxonomy can return terms that help the user select a more precise term </li></ul><ul><li>Includes multiple approaches: </li></ul><ul><ul><li>Term expansion </li></ul></ul><ul><ul><li>Complex lookups </li></ul></ul>
  90. 90. Disambiguation methods <ul><li>Show related search terms with check boxes in the search results page. </li></ul><ul><li>Show additional search terms as links, perhaps with a prompt – “You might also be interested in:” </li></ul><ul><li>Expand the query and show the expanded words in the search box </li></ul><ul><li>Expand the query invisibly </li></ul>
  91. 91. Disambiguation of search results Mobile data terminals Handheld computers Network Infrastructure Mobile switches Phones Fixed mobile car phones Mobile phones Software applications Mobile applications Two way radios Mobile radios Intelligent video solutions Mobile video enforcer Mobile video sharing MESH Solutions Multi-radio mobile broadband Mobile Computing Mobile application Presenting term in multiple contexts mobile
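Presenting a term in multiple contexts amounts to looking the term up in the taxonomy and returning each match together with its parent. In the sketch below the parent/child pairing is an assumption reconstructed from the flattened slide, and only a subset of the terms is used.

```python
# Assumed parent -> children grouping; the original diagram's exact
# structure is not recoverable from the slide text.
TAXONOMY = {
    "Network Infrastructure": ["Mobile switches"],
    "Phones": ["Fixed mobile car phones", "Mobile phones"],
    "Software applications": ["Mobile applications"],
    "Two way radios": ["Mobile radios", "Portable radios"],
}


def contexts_for(term, taxonomy):
    """List (parent, child) pairs for every child term containing the query."""
    term = term.lower()
    return [
        (parent, child)
        for parent, children in taxonomy.items()
        for child in children
        if term in child.lower()
    ]


suggestions = contexts_for("mobile", TAXONOMY)
for parent, child in suggestions:
    print(f"{parent} > {child}")
```

The resulting list is what a results page could show as check boxes or “You might also be interested in” links.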
  92. 92. From Associative Relationships
  93. 93. Disambiguation of search results <ul><li>How does Taxonomy Help? </li></ul><ul><li>When content is tagged and placed in a taxonomy, a thesaurus can be used to suggest related, broader or narrower terms, allowing users to search in the areas most appropriate to their needs </li></ul><ul><li>Taxonomy helps users discover related content or concept relationships they were not aware of </li></ul>
  94. 94. Disambiguation of search results <ul><li>When do I use it? </li></ul><ul><li>Similar to clustering, disambiguation is very helpful for separating large numbers of documents into specific categories, based on the categories of the taxonomy </li></ul>
  95. 95. Disambiguation of search results <ul><li>How Do I Implement Disambiguation Methods? </li></ul><ul><li>Need to integrate thesaurus with search engine </li></ul><ul><li>Can be accomplished through custom frameworks, web services, API calls </li></ul><ul><li>Thesaurus values can live inside of search engine, in taxonomy management tool, in spreadsheets or databases or in public sources </li></ul>
  96. 96. <ul><li>Federated Search </li></ul>
  97. 97. Federated Search <ul><li>Content is contained in multiple applications </li></ul><ul><li>Primarily a means to unify content sources and integrate repositories </li></ul><ul><li>Users typically have to enter searches in more than one location or use one result to drive another search </li></ul><ul><li>In some cases, legacy applications might use non preferred terms in metadata fields </li></ul>
  98. 98. Term Expansion and Federation Search: (1) Clinical trial leader searches compound data and enters the US Trade Name of the compound. (2) Generic Chemical Registry returns generic and chemical names. (3) Clinical Trials Registry returns names of Clinical Trials associated with compounds. (4) Allows Federated Search engine to search associated data sources. (5) All results are presented.
  99. 99. Federated search <ul><li>If we maintain context of a piece of information in our search results, this is equivalent to having additional metadata on that content </li></ul>Search results organized by repository. This is a form of “federated” search: a single search term fed to multiple repositories. Example courtesy of Morrison and Foerster
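A minimal sketch of a single search term fed to multiple repositories, with results kept organized by repository so that the source acts as extra context on each hit. The repository names, documents and the `make_repository` / `federated_search` helpers are all hypothetical; a real deployment would call each application's own search API instead of scanning in-memory lists.

```python
from typing import Callable, Dict, List

# A repository is modelled as any function that takes a term and returns hits.
Repository = Callable[[str], List[str]]


def make_repository(documents: List[str]) -> Repository:
    """Build a toy repository that does a case-insensitive substring match."""
    def search(term: str) -> List[str]:
        needle = term.lower()
        return [doc for doc in documents if needle in doc.lower()]
    return search


def federated_search(term: str, repositories: Dict[str, Repository]) -> Dict[str, List[str]]:
    """Send one term to every repository and keep the hits grouped by source."""
    return {name: search(term) for name, search in repositories.items()}


# Hypothetical content sources, for illustration only.
REPOSITORIES = {
    "document_management": make_repository(["Taxonomy project plan", "Travel policy"]),
    "intranet": make_repository(["Intro to taxonomy and metadata"]),
}

if __name__ == "__main__":
    for source, hits in federated_search("taxonomy", REPOSITORIES).items():
        print(source, hits)
```

Keeping the repository name as the key preserves where each hit came from, which is the "additional metadata" the slide refers to.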
  100. 100. Example Solution: Term expansion Term expansion using public source of terms (CRISP – “Computer Retrieval of Information on Scientific Projects”) Courtesy of Raritan Technologies www.raritantechnologies.com Search on – “Hypertension”
  101. 101. Example Solution: Term expansion Term expansion using public source of terms (CRISP – “Computer Retrieval of Information on Scientific Projects”) Courtesy of Raritan Technologies www.raritantechnologies.com Search on – “Hypertension” Returns back – “ High blood pressure” – “ Congenital High Blood pressure” – “ Systolic hypertension” – etc…
  102. 102. Example Solution: Term expansion <ul><li>Term expansion using public source of terms </li></ul><ul><ul><li>CRISP (“Computer Retrieval of Information on Scientific Projects”) </li></ul></ul>Courtesy of Raritan Technologies www.raritantechnologies.com. Can be implemented with any search engine and any thesaurus source (spreadsheet, relational database, public source). Search on – “Hypertension”. Returns – “High blood pressure” – “Congenital high blood pressure” – “Systolic hypertension” – etc. Then executes search to return broader result set.
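A minimal sketch of term expansion, assuming the thesaurus has been loaded into a dictionary. The entries reuse the "Hypertension" example from the slide; they are not pulled from CRISP itself, and `expand_term`, `expanded_search` and the sample documents are hypothetical.

```python
# Illustrative thesaurus: search term -> equivalent or narrower terms.
THESAURUS = {
    "hypertension": [
        "high blood pressure",
        "congenital high blood pressure",
        "systolic hypertension",
    ],
}


def expand_term(term, thesaurus=THESAURUS):
    """Return the original term followed by any thesaurus expansions."""
    return [term] + thesaurus.get(term.lower(), [])


def expanded_search(term, documents, thesaurus=THESAURUS):
    """Return documents matching the term or any of its expansions."""
    terms = [t.lower() for t in expand_term(term, thesaurus)]
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


if __name__ == "__main__":
    docs = [
        "Managing high blood pressure in adults",
        "Hypertension guidelines",
        "Diabetes overview",
    ]
    # The first document is found only because of the expansion.
    print(expanded_search("Hypertension", docs))
```

Because the expansion step is separate from the search step, the same thesaurus lookup can sit in front of any search engine.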
  103. 103. Why you will not just “use a folksonomy” <ul><li>Not all content is equal </li></ul><ul><li>Higher value content requires more rigor </li></ul><ul><li>Social tagging is still immature </li></ul><ul><li>May be appropriate for some kinds of content </li></ul><ul><li>On systems open to large user groups, esoteric tags which are understood by only a minority of users tend to proliferate </li></ul><ul><ul><li>burdens users </li></ul></ul><ul><ul><li>decreases system efficiency </li></ul></ul><ul><li>Core to folksonomies are the flaws that formal classification systems are designed to eliminate, such as redundancy, misspelling, etc. </li></ul><ul><li>Taxonomists/ontologists argue that an agreed-to set of tags enables more efficient indexing and searching of content </li></ul>
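A minimal sketch of how an agreed-to set of tags can absorb the redundancy and misspelling found in free tagging. It uses fuzzy matching from Python's standard `difflib` module as one possible approach; the preferred-term list, the 0.8 cutoff and the `normalize_tag` helper are assumptions for illustration, not a recommended configuration.

```python
import difflib

# Hypothetical controlled vocabulary of agreed-to tags.
PREFERRED_TERMS = ["taxonomy", "metadata", "search"]


def normalize_tag(tag, preferred=PREFERRED_TERMS, cutoff=0.8):
    """Map a free-form tag to the closest preferred term, or None if nothing is close."""
    matches = difflib.get_close_matches(tag.strip().lower(), preferred, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["Taxanomy", "meta-data", "zebra"]:
        print(raw, "->", normalize_tag(raw))
```

Tags that return `None` are candidates for review: they may be noise, or they may be new concepts the taxonomy should add.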
  104. 104. earley earley & associates earley & associates inc earley & associates needham, massachusets earley & associates taxonomy earley & associates, inc earley & associates, inc. earley & earley associates earley and associates earley and associates inc earley and associates seth earley and associates taxonomy earley assoc earley associates earley associates address earley associates boston earley associates wordmap earley financial earley jumpstart earley taxonomy earley taxonomy & metadata jumpstart call: managing structured metadata and taxonomies earley.com early & associates early and associates taxanomic classification of the freycinetia taxonimic classification of humans taxonomic and dichotomus taxonomic classification taxonomic classification human taxonomic genus of king cobra taxonomic implementation taxonomies of knowledge taxonomies project roadmap taxonomist job description taxonomy metadata taxonomy & metadata jumpstart - 2007 taxonomy and false drops taxonomy and classifiation examples of animals taxonomy and metadata taxonomy and metadata jumpstart taxonomy c taxonomy classification taxonomy classification charts taxonomy community of practice taxonomy consulting taxonomy creation taxonomy creation management taxonomy defined taxonomy deployment taxonomy development process taxonomy implementation taxonomy iqpc taxonomy job description taxonomy maintenance taxonomy management taxonomy management job title taxonomy management tools taxonomy metadata taxonomy models for project management taxonomy of global executives taxonomy of man taxonomy search taxonomy seth early taxonomy structure business organisation taxonomy training taxonomy validation taxonomy(2007) taxonomy, mlis taxonomy/classification.online
  105. 105. Conclusions <ul><li>Search engines, no matter how sophisticated, do not obviate the need for taxonomies </li></ul><ul><li>Content value in the context of a work process will determine the level of required structure </li></ul><ul><li>There is no “one size fits all” </li></ul><ul><li>Search should be treated as an integral part of your applications and systems </li></ul><ul><li>Google doesn’t always get it right… </li></ul>
  106. 106. Earley & Associates: #1 on Google for Silver Mining Tools
  107. 107. Taxonomy Community of Practice Calls <ul><li>Yahoo Group url: http://finance.groups.yahoo.com/group/TaxoCoP </li></ul><ul><li>Upcoming call topics: </li></ul><ul><li>Taxonomies & the Semantic Web </li></ul><ul><li>Taxonomy Validation </li></ul><ul><li>Blending Folksonomies & Taxonomies </li></ul><ul><li>Proving the ROI </li></ul><ul><li>Facets and Taxonomies </li></ul><ul><li>Multi-lingual Taxonomies </li></ul><ul><li>Getting Management Buy-In </li></ul><ul><li>Auto-categorization: Tools and Implementation </li></ul><ul><li>Beyond Auto-Categorization: Next Steps </li></ul><ul><li>Taxonomy Tools & Software: Beyond Excel </li></ul><ul><li>Taxonomy/Categorization Included </li></ul><ul><li>Taxonomy Project Deliverables: What to Promise and When </li></ul><ul><li>Taxonomy CoP Wiki at http://taxocop.wikispaces.com/ </li></ul>
  108. 108. Research Reports and White Papers <ul><li>Aligning Business Technology Goals </li></ul><ul><li>Deriving a Taxonomy: Assembling Terms for a Consistent Point-of-View </li></ul><ul><li>Indexing & Taxonomies: Finding the Best Way to Organize Online Content </li></ul><ul><li>Knowledge Mapping - A Fast Way to the Heart of the Organization </li></ul><ul><li>Making the Business Case for Enterprise Taxonomy </li></ul><ul><li>Managing Multiple Facets & Polyhierarchy </li></ul><ul><li>Measuring the Success of a Taxonomy Project: Tuning Content Categories for Continuous Improvement </li></ul><ul><li>Retrospective Indexing: Strategies for Cataloging Legacy Content </li></ul><ul><li>Taxonomy Metadata & Search </li></ul><ul><li>Text Mining: Search's Silver Lining </li></ul>
  109. 109. Questions? Seth Earley [email_address] www.earley.com 781-444-0287
  110. 110. Community of Practice Calls <ul><li>Taxonomy Group url: http://finance.groups.yahoo.com/group/TaxoCoP </li></ul><ul><li>Search Group url: http://tech.groups.yahoo.com/group/SearchCoP </li></ul><ul><li>Upcoming calls: </li></ul><ul><ul><ul><li>February - Best Practices for Faceted Search </li></ul></ul></ul><ul><ul><ul><li>March - Auto Tagging Requirements & Advances </li></ul></ul></ul><ul><ul><ul><li>April - Content Migration </li></ul></ul></ul><ul><ul><ul><li>May - Global Taxonomy Management </li></ul></ul></ul><ul><ul><ul><li>June - Taxonomy for Portals </li></ul></ul></ul><ul><ul><ul><li>July - Conducting a Search Audit </li></ul></ul></ul><ul><ul><ul><li>August - Heuristic Evaluation of Taxonomies </li></ul></ul></ul><ul><ul><ul><li>September - Taxonomy Usability Testing </li></ul></ul></ul><ul><ul><ul><li>October - Developing an Ontology </li></ul></ul></ul><ul><ul><ul><li>November - Applications for Topic Maps </li></ul></ul></ul><ul><ul><ul><li>December - Taxonomy Management </li></ul></ul></ul>
  111. 111. Research Reports and White Papers <ul><li>Go to http://www.earley.com/Articles.asp </li></ul><ul><li>Aligning Business Technology Goals </li></ul><ul><li>Deriving a Taxonomy: Assembling Terms for a Consistent Point-of-View </li></ul><ul><li>Indexing & Taxonomies: Finding the Best Way to Organize Online Content </li></ul><ul><li>Knowledge Mapping - A Fast Way to the Heart of the Organization </li></ul><ul><li>Making the Business Case for Enterprise Taxonomy </li></ul><ul><li>Managing Multiple Facets & Polyhierarchy </li></ul><ul><li>Measuring the Success of a Taxonomy Project: Tuning Content Categories for Continuous Improvement </li></ul><ul><li>Retrospective Indexing: Strategies for Cataloging Legacy Content </li></ul><ul><li>Taxonomy Metadata & Search </li></ul><ul><li>Text Mining: Search's Silver Lining </li></ul>
  112. 112. Questions? Seth Earley [email_address] www.earley.com 781-820-8080
