
The Searchmaster's Toolbox - David Hawking, Funnelback Search



David Hawking, pre-eminent information retrieval researcher and Funnelback's Chief Scientist, gave this talk on the need for a SearchMaster in all but the smallest organisations at a Funnelback seminar in London on March 31st, 2010. Even if no one holds that specific job title, responsibility for maintaining, improving and monitoring search needs to be prioritised and clearly assigned. David's presentation covers why search is so important and the tools that can improve search results.

Published in: Technology


  1. The SearchMaster's Toolbox. Westminster Abbey, 31 March 2010. David Hawking
  2. Funnelback Overview
      - Hosted (SaaS) search
        - Web sites
        - Portals
      - Enterprise search
        - Intranets/Web
        - Databases
        - File shares
        - Corporate repositories (e.g. Lotus, TRIM, SharePoint and others)
      - Search Applications
        - Geospatial
        - Plagiarism detection
        - E-commerce
      - Contracted Development & Professional Services
  3. UK Customers
      - From pre-2009: Staffordshire University, Scottish Care Commission
      - From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown
      - From 2010: London School of Economics and Political Science, Incisive Media, British Medical Journal, East Ayrshire Council, ...
  4. First UK Customer - 2004
  5. How we deliver search
      - Funnelback Cloud: instant scalability and redundancy. A fully managed search service that’s instantly scalable and redundant through our specialised search cloud environment. Funnelback Cloud can be deployed in days, not months. For web sites and online catalogues, complete control over the look and feel allows a seamless experience for your users.
      - Funnelback Hosted: dedicated and fully managed. A fully managed search service hosted on dedicated hardware that’s specially architected to meet the most challenging search environments. For web sites and online catalogues, complete control over the look and feel allows a seamless experience for your users.
      - Funnelback Enterprise: self managed. Complete control over the Funnelback search software on a self-managed Windows or Linux platform. Our search experts can help your organisation search across web sites, intranets, databases, file shares, SharePoint sites, TRIM, staff directories and other repositories in a single query.
  6. “We have been extremely happy with the product and the overall service. We have significant search volume and Funnelback has stood up to the test.” Mary Ramsay, National Manager, Interactive Services
  7. Funnelback Enterprise
      - XML
      - Relational databases (e.g. Oracle, MS SQL)
      - Web sites
      - Shared drives (G:)
      - Intranet
      - Images
      - Corporate repositories (e.g. Lotus, TRIM, SharePoint)
  8. “Funnelback has taken our complex internal information environment and made it accessible.” Mike Swanson, Oxfam Knowledge and Information Services Team Leader
  9. “Search is life”
  10. Costs of poor search
      - Butler Group: Up to 10% of salary costs wasted through ineffective search
      - IDC: A company with 1000 information workers can expect to waste more than $5M p.a. due to poor search
      - Accenture: A survey of 1000 middle managers found that they spend as long as 2 hrs/day searching for information.
  11. Who's the SearchMaster in your organisation?
  12. Stakeholders expect every SearchMaster to do her duty!
      - To make external website search work
        - Sales conversions
        - Information dissemination
        - Reduced inquiry handling load
      - To provide effective search of corporate information
        - Happy, productive employees (plus students and other stakeholders)
  13. Give them the tools and they will do the job!
      - Searchmaster
      - End-user
      - Simple
      - Powerful
  14. Tool 1: The basic search tool
      - Coverage and scale
      - Quick and easy to install
      - Good out-of-the-box performance
        - It can find the answers people want
      - Simple to configure
        - Avoid features which are too complex to use or set up.
  15. Time to set up Funnelback
      - As little as a couple of minutes to install the product.
      - As little as a couple of minutes to create a collection and start updating
      - A couple of hours to customise look and feel
      - A few minutes to activate contextual navigation, faceting and featured pages.
  16. Perceived Funnelback advantages
      - Tried and tested algorithm (20 years of development)
      - Control from admin interface
      - Customisability of business logic / open APIs
      - Flexible result presentation
      - Professional services model
      - Openly published price list
  17. Tool 2: FineTuning
      - Every search deployment is different
        - Web, database, fileshare, Lotus
      - The weighting of ranking features must be adapted to these differences
      - Manual tweaking is fraught with danger
        - Fix one query, break a dozen
      - Make a test file and use Funnelback FineTune
  18. Spreadsheet testfile
  19. Testfile Desiderata
      - Representative of real workload
        - Unbiased sample
        - Good estimate of actual performance
        - Best tuning
      - Many queries (e.g. > 100)
      - Multiple weighted answers (where applicable)
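The desiderata above can be made concrete with a small sketch. It assumes a hypothetical CSV testfile with one row per acceptable answer (`query,url,weight`), so a query may carry several weighted answers, and scores a ranked result list with a measure chosen only for illustration (the best answer weight divided by its rank) plus a failure rate (no acceptable answer in the top `k`). The column names, the measure and `k=10` are assumptions, not Funnelback's testfile format or FineTune's metric.

```python
import csv
import io
from collections import defaultdict


def load_testfile(text):
    """Parse hypothetical 'query,url,weight' CSV rows into {query: {url: weight}}."""
    answers = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        query = row["query"].strip().lower()
        answers[query][row["url"].strip()] = float(row["weight"])
    return dict(answers)


def evaluate(testfile, results, k=10):
    """Score ranked results against the testfile; returns (mean score, failure rate).

    A query's score is the best weight/rank over the acceptable answers found
    in its top k results; a query fails if none of its answers appears there.
    """
    total, failures = 0.0, 0
    for query, wanted in testfile.items():
        ranked = results.get(query, [])[:k]
        found = [wanted[url] / rank
                 for rank, url in enumerate(ranked, start=1) if url in wanted]
        if found:
            total += max(found)
        else:
            failures += 1
    n = len(testfile)
    return total / n, failures / n


# Hypothetical testfile and result lists, for illustration only.
TESTFILE = """query,url,weight
library opening hours,/library/hours,1.0
library opening hours,/library,0.5
fees,/fees,1.0
"""
RESULTS = {"library opening hours": ["/library", "/library/hours"], "fees": ["/news"]}
print(evaluate(load_testfile(TESTFILE), RESULTS))  # (0.25, 0.5)
```

Allowing several weighted answers per query means a partially useful page still earns some credit, while the failure rate tracks the queries for which nothing useful was returned at all.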
  20. Break to Demo of FineTuning (based on popular/important queries and data from LSE)
  22. Other LSE Testfiles
      - Click logs – Right answers are those that users clicked
        - Possible to tune to 0% failure rate (unrealistic)
      - From keymatches file created for previous search technology (it had 709 rules)
        - Failure rate is higher (unrealistic)
      - Random sample is the preferred approach
  23. FineTuning Summary
      - Tuning 38 dimensions
      - Millions of query executions
      - Achieves substantial gains
      - Contact [email_address]
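The slides do not say how FineTune searches its 38-dimensional space, so the sketch below substitutes plain coordinate ascent over a small grid of weights, with a toy linear ranking function over two made-up features and mean reciprocal rank as the score. What it does illustrate is the principle behind "fix one query, break a dozen": a weight change is kept only if it improves the average over the whole testfile, not a single query. All names, the grid and the features are hypothetical.

```python
def rank(candidates, weights):
    """Order candidate URLs by a linear combination of their ranking features."""
    def score(item):
        _, features = item
        return sum(w * f for w, f in zip(weights, features))
    # sorted() is stable, so tied candidates keep their input order.
    return [url for url, _ in sorted(candidates, key=score, reverse=True)]


def mean_reciprocal_rank(testset, weights):
    """Average of 1/rank of each query's right answer (0 if it is not returned)."""
    total = 0.0
    for candidates, answer in testset:
        ranked = rank(candidates, weights)
        if answer in ranked:
            total += 1.0 / (ranked.index(answer) + 1)
    return total / len(testset)


def coordinate_ascent(testset, n_dims, grid=(0.0, 0.25, 0.5, 1.0, 2.0), passes=3):
    """Tune one weight at a time, keeping a change only if the whole testset improves."""
    weights = [1.0] * n_dims
    best = mean_reciprocal_rank(testset, weights)
    for _ in range(passes):
        improved = False
        for d in range(n_dims):
            for value in grid:
                trial = weights.copy()
                trial[d] = value
                score = mean_reciprocal_rank(testset, trial)
                if score > best:
                    best, weights, improved = score, trial, True
        if not improved:
            break
    return weights, best


# Toy testset: each query has (url, (feature_1, feature_2)) candidates and one right answer.
TESTSET = [
    ([("/a", (1.0, 0.0)), ("/b", (0.0, 2.0))], "/a"),
    ([("/c", (0.0, 3.0)), ("/d", (1.0, 0.0))], "/d"),
    ([("/e", (1.0, 1.0)), ("/f", (0.0, 0.5))], "/e"),
]
print(coordinate_ascent(TESTSET, n_dims=2))  # ([2.0, 0.0], 1.0)
```

Each pass of even this naive scheme runs `n_dims × len(grid)` evaluations of the whole testfile, each costing one search per query, so tuning dozens of dimensions against a testfile of more than 100 queries quickly adds up to very large numbers of query executions.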
  24. But why do some queries still fail?
      - Misspelled
        - Westminister Abbie
      - Query words don't match language
        - “door” or “MOPEM” v. “manually operated personnel egress mechanism”
      - There is no answer to that question.
        - Maybe there should be
        - Scope issue?
      - How information is published.
  25. Need more tools!
  26. Tool 3: Spelling suggestion tools
      - Suggestions may be useful even if words are correctly spelled:
        - Carlton furball club -> Carlton football club
      - Suggestions based on whole query, not word-by-word
      - Don't suggest queries which make no sense in the collection being searched
      - Autocompletion: Guide users to the best query
      - Context is king
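Two principles on this slide, judging a suggestion by the whole query and only suggesting queries that make sense in the collection, can be illustrated with a toy suggester. It is not Funnelback's algorithm: it assumes candidate replacements are collection terms within a fixed Levenshtein distance (`max_edits=3`, an arbitrary choice) and ranks whole-query candidates by how many documents contain all their terms, preferring fewer edits on ties. Because "furball" is a correctly spelled word, a word-by-word checker would leave it alone, while the whole-query view still finds a better query.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def suggest(query, documents, max_edits=3):
    """Return a whole-query suggestion that matches more documents, or None."""
    docs = [set(d.lower().split()) for d in documents]
    vocab = set().union(*docs)
    words = query.lower().split()

    def hits(terms):
        # Number of documents containing every term of the candidate query.
        return sum(1 for d in docs if all(t in d for t in terms))

    # Per word: the word itself plus collection terms within max_edits edits.
    options = [sorted({v for v in vocab if edit_distance(w, v) <= max_edits} | {w})
               for w in words]

    best, best_key = words, (hits(words), 0)
    for combo in product(*options):
        cost = sum(edit_distance(w, c) for w, c in zip(words, combo))
        key = (hits(combo), -cost)  # prefer more matches, then fewer edits
        if key > best_key:
            best, best_key = list(combo), key
    return " ".join(best) if best != words else None


# Hypothetical three-document collection.
DOCS = ["Carlton football club fixtures", "Carlton football club membership",
        "Cat furball remedies"]
print(suggest("Carlton furball club", DOCS))  # carlton football club
```

Enumerating every combination grows quickly with vocabulary size and query length, so a practical suggester would need an index for fuzzy term lookup and would likely draw on query logs for context.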
  27. Tool 4: Query expansion tools
      - Manual rules:
        - Rego -> [registration rego]
        - MOPEM -> [“manually operated personnel egress mechanism door”]
      - Related queries (automatic)
        - Based on co-clicking
      - Contextual navigation (on-the-fly)
        - Finding superphrases in a deep result set
      - Faceting (semi-automatic)
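The manual rules on this slide can be sketched as a lookup table that replaces a trigger term with a group of alternatives. The rule table, the function names and the reading of the bracket notation as an OR group are assumptions; the slide does not show Funnelback's actual rule syntax. The placement of the quotation marks in the MOPEM rule is ambiguous, so the phrase-plus-`door` reading below is also an assumption.

```python
# Hypothetical rule table based on the slide's examples: each trigger term maps
# to the alternatives to search for instead (treated here as an OR group).
RULES = {
    "rego": ["registration", "rego"],
    "mopem": ['"manually operated personnel egress mechanism"', "door"],
}


def expand(query, rules=RULES):
    """Replace each trigger term with its group of alternatives."""
    return [rules.get(term.lower(), [term]) for term in query.split()]


def render(groups):
    """Write groups in the slide's bracket notation, e.g. [registration rego]."""
    return " ".join(g[0] if len(g) == 1 else "[" + " ".join(g) + "]" for g in groups)


print(render(expand("rego renewal")))  # [registration rego] renewal
print(render(expand("MOPEM broken")))  # ["manually operated personnel egress mechanism" door] broken
```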
  28. Related queries
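Related queries based on co-clicking can be approximated by linking queries whose users clicked the same URLs. This sketch assumes a hypothetical `(query, clicked_url)` log and scores relatedness by the number of distinct URLs two queries share; the content of this slide is not included in the transcript, so this is an illustration rather than a description of Funnelback's method.

```python
from collections import defaultdict


def related_queries(click_log, min_shared=1):
    """Map each query to other queries whose users clicked the same URLs.

    Relatedness is the number of distinct URLs clicked for both queries;
    results are ordered by that count, then alphabetically.
    """
    urls_by_query = defaultdict(set)
    for query, url in click_log:
        urls_by_query[query.strip().lower()].add(url)
    related = {}
    for q, urls in urls_by_query.items():
        scored = []
        for other, other_urls in urls_by_query.items():
            if other == q:
                continue
            shared = len(urls & other_urls)
            if shared >= min_shared:
                scored.append((shared, other))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        related[q] = [other for _, other in scored]
    return related


# Hypothetical click log: (query, URL clicked for it).
LOG = [("car rego", "/rego"), ("registration", "/rego"), ("registration", "/renew"),
       ("renew licence", "/renew"), ("parking", "/parking")]
print(related_queries(LOG)["registration"])  # ['car rego', 'renew licence']
```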
  30. Tools to tell you when you need to add, change or reorganize content.
  31. Tool 5: Reporting Tools
      - Queries that produced no results
      - Click patterns that suggest user wasn't happy
      - Pattern analysis – query spike
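The first report on this list, queries that produced no results, amounts to counting zero-hit queries in the query log. The sketch assumes a hypothetical log of `(query, result_count)` pairs and normalises queries by case and surrounding whitespace before counting.

```python
from collections import Counter


def zero_result_report(query_log, top=10):
    """Most frequent queries that returned no results, with their counts.

    query_log: iterable of (query, result_count) pairs, a hypothetical log format.
    """
    counts = Counter(q.strip().lower() for q, n in query_log if n == 0)
    return counts.most_common(top)


LOG = [("Parking permit", 0), ("fees", 12), ("parking permit", 0), ("mopem", 0)]
print(zero_result_report(LOG))  # [('parking permit', 2), ('mopem', 1)]
```

Frequent zero-result queries point either at missing content or at vocabulary mismatches that a query expansion rule could fix.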
  32. Reporting
  33. Query Spike - LPG
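The content of this slide is not included in the transcript. As an illustration of pattern analysis for query spikes, the sketch below flags days on which a query's volume exceeds a multiple of its average over the preceding days. The input layout, `window=7`, `factor=3.0` and `min_count=10` are arbitrary assumptions, not Funnelback's pattern analyser settings.

```python
from collections import defaultdict


def find_spikes(daily_counts, window=7, factor=3.0, min_count=10):
    """Flag days on which a query's volume jumps well above its recent average.

    daily_counts: {query: [count_day0, count_day1, ...]} (hypothetical input).
    A day is a spike if it has at least min_count searches and exceeds
    factor times the mean of the preceding `window` days.
    """
    spikes = defaultdict(list)
    for query, counts in daily_counts.items():
        for day in range(window, len(counts)):
            baseline = sum(counts[day - window:day]) / window
            if counts[day] >= min_count and counts[day] > factor * baseline:
                spikes[query].append(day)
    return dict(spikes)


# Hypothetical daily query counts.
DAILY = {"lpg": [2, 3, 1, 2, 2, 3, 2, 40, 35, 5], "fees": [20] * 10}
print(find_spikes(DAILY))  # {'lpg': [7, 8]}
```

The `min_count` floor keeps rare queries from being flagged when a handful of searches happens to dwarf a near-zero baseline.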
  34. Pattern Analyser
  35. Pattern Analyser - Timeplot
  36. Conclusions
      - Search is important
      - Organisations benefit when someone takes responsibility for effective search – the SearchMaster.
      - A good search tool provides a full kit of simple, effective tools to help the SearchMaster achieve just that.