Best Practices: Making the Most of Search Navigators

Contents:
1. Introduction
2. The role of search navigators
3. Their place in the ecosystem
4. Getting the most from search navigators
5. Further reading

1. Introduction

Search Technologies has provided more than 20,000 consultant-days of search implementation services during the last 4 years, working with a variety of leading search products. Our engagements range from corporate intranets and knowledge management systems to search applications for content-rich websites, classifieds, and e-commerce.

Search navigators[1] are now commonplace within non-trivial search applications. This brief paper explores the reasons for their success, positions search navigators in relation to other common approaches to search, and discusses how to maximize their effectiveness.

For those unfamiliar with the concept of search navigators, two examples of their use follow. Both of these applications serve navigators in the left-side column:

Classifieds: http://shop.ebay.com/?_from=R40&_trksid=p3907.m38.l1313&_nkw=sony&_sacat=See-All-Categories
Government: http://www.gpo.gov/fdsys/search/search.action?na=_accodenav&se=_CRECfalse&sm=&flr=&ercode=&dateBrowse=&st=freedom+of+information&=freedom+of+information&psh=&sbh=&tfh=&originalSearch=freedom+of+information&sb=re&ps=10&sb=re&ps=10

These are public-facing search applications, but the approach they illustrate is just as relevant behind the firewall.

[1] Search navigators are also called guided navigation or faceted search.
2. The role of search navigators

The search process can be viewed as consisting of two simple steps:
a. The formation of a search clue
b. The browsing of results

An iterative process of search clue improvement is often necessary; this has always been the case. Twenty years ago, a large search system would initially reply to a search request with a number (of documents matching the criteria) rather than a results list, and invite the user to provide additional search terms to reduce that number to a manageable quantity, which could then be displayed and browsed. This often resulted in long search clues containing a mixture of full-text and fielded terms. A typical “advanced search page” provides a helpful UI for achieving the same thing, the building of a great search clue, but without the need to know specific syntax.

Enabled by modern search architectures and fast servers, search navigators play this important role today. The role is this:

Search navigators help users to quickly reduce the search scope through single clicks.

Put another way, search navigators are the most efficient mechanism yet implemented to help the user build a great search clue.

Added value

Well-constructed search navigators go beyond being efficient mechanisms. They also provide feedback and insight that guide the user through the process of search scope reduction. This is particularly helpful to new users who, as a by-product of search activity, can quickly learn about the structure and distribution of content. For regular users, well-structured navigators provide a continuing, non-intrusive education in the make-up of the dataset. Over time, this leads to more sophisticated use of both the search facility and the content resources as a whole. Making better use of existing resources is a key goal for most intranet and knowledge management initiatives.
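To make the single-click mechanism and its feedback concrete, here is a minimal Python sketch of what a navigator does over a small in-memory result set. The `Doc` class, the `category` and `condition` fields, the helper names and the sample data are all hypothetical; production search engines compute these counts inside the index rather than in application code. `facet_counts` produces the per-value counts a navigator displays beside each option, which is the insight into how content is distributed, and `refine` models one click that narrows the search scope.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    title: str
    meta: dict  # hypothetical metadata, e.g. created by tagging during data preparation


def facet_counts(docs, facet):
    """Count the documents in the current scope for each value of one field.

    These are the numbers a navigator shows beside each option, telling the
    user how the remaining content is distributed.
    """
    return Counter(d.meta[facet] for d in docs if facet in d.meta)


def refine(docs, facet, value):
    """Model one navigator click: keep only documents with the chosen value."""
    return [d for d in docs if d.meta.get(facet) == value]


# Hypothetical results for a keyword search such as "sony".
results = [
    Doc("Sony camera A", {"category": "Cameras", "condition": "New"}),
    Doc("Sony camera B", {"category": "Cameras", "condition": "Used"}),
    Doc("Sony headphones", {"category": "Audio", "condition": "New"}),
    Doc("Sony TV", {"category": "Televisions", "condition": "Used"}),
    Doc("Sony camera C", {"category": "Cameras", "condition": "New"}),
]

print(facet_counts(results, "category"))  # navigator options before any click

cameras = refine(results, "category", "Cameras")   # first click
print(facet_counts(cameras, "condition"))          # options offered next
new_cameras = refine(cameras, "condition", "New")  # second click
print([d.title for d in new_cameras])
```

Each call to `refine` adds one fielded constraint to the search clue, which is the work that long hand-typed clues and advanced search pages used to do, while the counts shown at every step tell the user what each click will leave in scope.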
It is the added value of providing actionable insight and a continuous education about the available content that truly separates search navigators from earlier approaches.

3. Navigators’ place in the search ecosystem

For many years the search software industry has been technology-led, with the various vendors evangelizing their favoured algorithms and approaches. It may be useful to briefly position search navigators relative to some of these.

Earlier, it was suggested that search can be seen as a simple two-step process. Of course, most modern search applications present both the search results and opportunities to further refine the search within the same page. However, in positioning the various technological approaches, it is useful to keep the two steps separate. Let’s expand this theme and look at both in more detail:

a. Formation of a search clue: The objective of this step is to reduce the search scope to a point where the desired information can be conveniently found during results browsing.
b. Browsing of results: The interactive inspection of a hit list to identify the desired information.

The two steps must obviously work together, and in some applications one might dominate.

a. Formation of a search clue

The role of search navigators lies firmly within this part of the search process, supporting human decision-making and efficient search scope reduction. Other technologies with something to contribute to this part of the process include:

Tagging: Category taggers, entity extraction and other parsing methods that create additional metadata to populate search navigators
Query parsing: Enriching queries with synonyms and other related terms and, where the search engine provides an appropriate query language, optionally customizing relevancy calculations
Clustering techniques: These compare the contents of documents as a whole and use statistical techniques to sort search results into groups of similar documents. These groupings are often presented as a type of search navigator.

b. Browsing results

In this part of the search process, the user is presented with an ordered listing of what remains within the search scope. The primary issue is the order of presentation. Technologies and methods that can contribute include:

Basic sorting: Using fielded information from the search index, such as ordering by date, price or distance
Generic relevance: Fifteen years ago, keyword density and the ability to favor rare (assumed to be more important) keywords were mainstream approaches to ordering search results by relevance. Many other factors have since been added to relevancy calculations, including word proximity, contextual evidence (a semantically based technique in which the presence of related words supports the relevance of keywords) and favoring specific areas of documents, such as titles or section headings. Such methods are present, to some extent, in most contemporary search applications, forming a baseline for relevance judgement.
Off-page criteria: Factors other than document content, such as adjusting relevance based on the document’s original location, or on incoming links in a hyperlinked environment
Popularity: Based on the historical behavior or contributions of the community as a whole, this class of relevancy measurement can be used in an absolute way to order results, or as an influencer of relevance. Factors include:
  What people previously bought or viewed
  Ratings and opinions actively provided by other users
  Automatically derived measures based on the observation of visitor behaviour on a website as a whole
Personalization: Ranking based on personally identifiable information has implications and issues for some communities, and is generally blended subtly into relevancy calculations rather than being used for explicit results ordering. Google’s main web search offering currently does this. The main methods are:
  Influencing results ordering based on pre-defined criteria that have been volunteered by the user
  Influencing results ordering based on the observed previous behaviour of the individual user

An important reason for the widespread adoption of search navigators in sophisticated search systems is that they are complementary, rather than antagonistic, to all of these other popular approaches.

4. Getting the most from search navigators

Great search navigators exhibit two primary properties:

Accuracy: Users need to be able to trust search navigators to provide accurate information.
Contextual relevance: The most useful navigators are those that have been built specifically for the application. Users searching for an automobile will value a completely different set of navigators from users looking for stock market investment ideas.

The key to delivering accuracy and contextual relevance is data preparation prior to indexing.

Data preparation for search

There is a wide range of techniques available for preparing data for search. Each application must deal with its own unique combination of data and users, and to get the best from search navigators, every application should be approached on its own merits. Specific technologies can often be helpful, especially in established niche applications, but in general, technology should be the assistant rather than the focus of the project. In our experience, the most important success factors are staff experience, well-practiced methodologies and a pragmatic approach. Knowing which of the many available extraction or matching techniques suits an application is key to a successful outcome.

The importance of data preparation goes beyond the accurate extraction of information to drive search navigators. Data cleansing, merging, splitting and enriching also improve the efficiency of the search experience as a whole.
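As an illustration of one simple cleansing step, the sketch below normalizes inconsistent spellings of a metadata value before indexing, so that a navigator shows one accurate entry instead of several near-duplicates. It is a minimal Python example under stated assumptions, not a prescribed methodology: the `manufacturer` field, the `CANONICAL` alias table, the helper names and the sample records are all hypothetical, and real projects choose among many extraction and matching techniques to suit their data.

```python
from collections import Counter

# Hypothetical alias table maintained during data preparation; in a real
# project it would be derived from the actual data, not hard-coded.
CANONICAL = {
    "sony": "Sony",
    "sony corp": "Sony",
    "sony corporation": "Sony",
    "panasonic": "Panasonic",
}


def normalize_value(raw):
    """Cleanse one value: trim, lower-case, drop a trailing period and collapse
    internal whitespace, then map known aliases to one canonical label.
    Unknown values pass through with only surrounding whitespace removed."""
    key = " ".join(raw.strip().lower().rstrip(".").split())
    return CANONICAL.get(key, raw.strip())


def prepare(records, field):
    """Return copies of the records with one metadata field normalized."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy so the raw input is left untouched
        if field in rec:
            rec[field] = normalize_value(rec[field])
        cleaned.append(rec)
    return cleaned


raw = [
    {"title": "Camera A", "manufacturer": "Sony"},
    {"title": "Camera B", "manufacturer": "SONY "},
    {"title": "Camera C", "manufacturer": "Sony Corp."},
    {"title": "TV D", "manufacturer": "sony corporation"},
    {"title": "TV E", "manufacturer": "Panasonic"},
]

# Navigator entries a "manufacturer" facet would show before and after cleansing.
before = Counter(r["manufacturer"] for r in raw)
after = Counter(r["manufacturer"] for r in prepare(raw, "manufacturer"))
print(len(before), "navigator entries before,", len(after), "after")
print(after)
```

Without this step, a manufacturer navigator over the sample records would offer five entries, four of which refer to the same company, and the counts shown to the user would be misleading.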
In struggling search applications, criticized by users for poor relevancy or accuracy, the search engine is often not the problem; rather, it is the poor quality of the data being fed to the search engine that is causing issues. Search engine vendors only have themselves to blame for this: the industry has a history of over-selling the ability of technology to automatically overcome basic issues such as poor data quality.

The good news for today’s buyer of enterprise search technology is this: search is now a mature market, and the leading products have all of the capabilities necessary to support most search applications. Comparison with the (even more) mature database market is insightful. Today, there are very few use cases where it is necessary to worry whether Oracle, DB2, SQL Server or MySQL can provide the necessary functions or throughput. For the majority of structured data processing needs, it is the application layer rather than the choice of database that makes the difference. Search engines are reaching this point too.

At Search Technologies, we work with a range of leading search software vendors, and we value our independence. Whatever your search engine of choice, proprietary or open source, if you need to provide an important search application to your users, we can help you to arrange clean, accurate and contextual data to feed search navigators and help your search engine provide a great service to users.
Although diligent data preparation is not the only thing you’ll need to do, it is the foundation on which many successful search applications are built.

5. Further reading

Best Practices: A Document Processing Methodology for Search
Case Study: United States Government Printing Office
A short glossary of data preparation tasks

--------------------------

Search Technologies Corporation
590 Herndon Parkway, Suite 375
Herndon, VA 20170
T: +1 703 953 2791
jback@searchtechnologies.com

Search Technologies Limited
Kingswick House
Sunninghill, Berkshire
T: +44 1344 292 292
gcharlesworth@searchtechnologies.com

www.searchtechnologies.com