David Hawking - The Search Master's Toolbox


David Hawking, Funnelback's Chief Scientist, presented "The Search Master's Toolbox" at Online Information 2010 in London.

The talk offered website and marketing managers advice on the search solutions used in their organisations. It explained why search is vital to the overall success of a website and described the tools needed to deliver and optimise an effective search solution.

Published in: Technology
  1. Online Information, London, 30 Nov 2010: The Search Master's Toolbox. David Hawking, Funnelback / Squiz
  2. Funnelback’s UK Customers
     • From 2004/5: Staffordshire University, Scottish Care Commission
     • From 2009: The Electoral Commission, Digital UK, Hargreaves Lansdown
     • From 2010: LSE, The Electoral Commission, Incisive Media, British Medical Journal, East Ayrshire Council, Skype international, UCL, ...
  3. “Search is life”
  4. Costs of poor search
     • Butler Group: Up to 10% of salary costs wasted through ineffective search
     • IDC: A company with 1000 information workers wastes more than $5M p.a. due to poor search
     • Accenture: Survey of 1000 middle managers shows they spend up to 2 hours/day searching
     • Econsultancy: Only 41% of companies are satisfied that their site search is delivering on business objectives
     • ABC Shop: 24% increase in online sales after upgrade in search effectiveness
     Search is a critical part of the web experience.
  5. Who's the SearchMaster in your organisation?
  6. Stakeholders expect every SearchMaster to do her duty!
     • To make external website search work
         • Sales conversions
         • Information dissemination
         • Reduced inquiry handling load
     • To provide effective search of corporate information
         • Happy, productive employees (plus students and other stakeholders)
  7. Give them the tools and they will do the job!
     • Searchmaster
     • End-user
     • Simple
     • Powerful
  8. 1. The basic search tool
     • Should:
         • Have good performance out of the box, without weeks of implementation.
         • Be simple to configure
         • Avoid features which are too complex to use or set up.
         • Be able to cover your content and scale to the necessary level
  9. 2. FineTuner
     • Every search deployment is different
         • Web, database, fileshare, Lotus
     • The weighting of ranking features must accommodate these differences
     • Manual tweaking is fraught with danger
         • Fix one query, break a dozen
     • Make a test file and use a tuning tool to learn feature weightings
  10. Testfile Desiderata
      • Representative of real workload
          • Need an unbiased sample
      • Many queries (typically >> 100)
      • Identify the best answer(s)
      • Equivalent answers
      • See es.csiro.au/C-TEST/
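The desiderata above suggest a test file that maps each query to a set of equally acceptable best answers. The following is a minimal sketch of how such a file could be scored, not the C-TEST format or Funnelback's tooling. The `TestFile` type, the `success_at_k` name, the URLs and the `toy_search` stand-in for a real search engine are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical test file layout: query -> set of equivalent best answers.
TestFile = Dict[str, Set[str]]


def success_at_k(testfile: TestFile,
                 search: Callable[[str], List[str]],
                 k: int = 1) -> float:
    """Fraction of test queries with an acceptable answer in the top k results."""
    if not testfile:
        raise ValueError("test file is empty")
    hits = 0
    for query, answers in testfile.items():
        top_results = search(query)[:k]
        # Any one of the equivalent answers counts as a success.
        if any(url in answers for url in top_results):
            hits += 1
    return hits / len(testfile)


# Illustrative data only; a real test file would hold far more than 100 queries.
testfile: TestFile = {
    "library opening hours": {"/library/hours", "/library/opening-times"},
    "term dates": {"/calendar/term-dates"},
    "rego": {"/students/registration"},
}


def toy_search(query: str) -> List[str]:
    """Stand-in for a real search engine: returns canned ranked results."""
    canned = {
        "library opening hours": ["/library/opening-times", "/library"],
        "term dates": ["/news", "/calendar/term-dates"],
        "rego": [],
    }
    return canned.get(query, [])
```

With the toy data, one of the three queries succeeds at rank 1 and two succeed within the top 2, so a single number summarises how well a given configuration serves the whole workload.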
  11. Spreadsheet testfile
  12. Sources of testfiles at LSE
      • A-Z Sitemap (>500 entries)
          • Biased toward anchortext
      • Keymatches file (>500 entries)
          • Pessimistic
      • Click data (>250 queries with > t clicks)
          • Biased toward clicks – 100% success!
      • Popular/critical queries (134 manually judged)
      All biased – Use a sampling tool!
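One way to read "use a sampling tool" is to draw query submissions uniformly at random from the raw query log, so that each query appears in the sample in proportion to how often people really ask it. The sketch below takes that reading; the function name `sample_workload` and the log format (one string per submission) are assumptions, not a description of any particular product.

```python
import random
from collections import Counter
from typing import List


def sample_workload(query_log: List[str], n: int, seed: int = 0) -> Counter:
    """Draw n submissions uniformly at random (without replacement) from the log.

    Every submission has the same chance of selection, so frequent queries
    are represented in proportion to their share of the real workload.
    """
    if n > len(query_log):
        raise ValueError("sample size exceeds log size")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    # Normalise case and whitespace so trivial variants count as one query.
    normalised = [" ".join(q.lower().split()) for q in query_log]
    return Counter(rng.sample(normalised, n))
```

The sampled queries still need their best answers judged by hand, which is where the manual effort goes.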
  13. Dimension-at-a-time tuning [Figure: tuning steps 1, 2, 3 along axes dim1 and dim2]
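Dimension-at-a-time tuning can be sketched as a simple coordinate search: vary one ranking weight while holding the others fixed, keep the best value, move to the next dimension, and repeat. This is a generic illustration under that assumption, not Funnelback's FineTune algorithm. The dimension names and the toy `score` function are hypothetical; in practice `score` would run every test-file query against the engine, which is why tuning can involve a very large number of query executions.

```python
from typing import Callable, Dict, Sequence


def tune_one_dimension_at_a_time(
    weights: Dict[str, float],
    candidates: Sequence[float],
    score: Callable[[Dict[str, float]], float],
    max_passes: int = 3,
) -> Dict[str, float]:
    """Coordinate search: improve one weight at a time, others held fixed."""
    best = dict(weights)
    best_score = score(best)
    dims = list(weights)
    for _ in range(max_passes):
        improved = False
        for dim in dims:
            for value in candidates:
                trial = dict(best)
                trial[dim] = value
                trial_score = score(trial)
                if trial_score > best_score:
                    best, best_score, improved = trial, trial_score, True
        if not improved:
            break  # a full pass changed nothing, so stop early
    return best


def toy_score(w: Dict[str, float]) -> float:
    """Stand-in for test-file accuracy; peaks at anchor_text=0.6, content=0.2."""
    return -((w["anchor_text"] - 0.6) ** 2) - ((w["content"] - 0.2) ** 2)


candidate_values = [i / 5 for i in range(6)]  # 0.0, 0.2, ..., 1.0
tuned = tune_one_dimension_at_a_time(
    {"anchor_text": 0.0, "content": 0.0}, candidate_values, toy_score
)
```

Because the whole test file is re-scored for every trial value, a change that fixes one query but breaks a dozen others is rejected automatically.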
  14. Popular/Critical Set
  15. Fine Tuning Summary
      • Tuning a large number of dimensions (Funnelback FineTune covers 38)
      • Millions of query executions
      • Achieves substantial gains
  16. But why do queries still fail?
      • Misspelled
          • Onlion Imformation
      • Query words don't match document
          • “door” or “MOPEM” v. “manually operated personnel egress mechanism”
      • There is no answer to that question.
          • Maybe there should be
          • Scope issues.
  17. Need more tools!
  18. 3. Spelling suggestion tools
      • Suggestions may be useful even if words are correctly spelled:
          • Manchester Untied -> Chelsea
      • Suggestions based on whole query, not word-by-word
      • Don't suggest queries which make no sense in the collection being searched
      • Autocompletion: Guide users to the best query
      • Context is king
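Two of the points above, matching on the whole query and only suggesting queries that make sense in the collection, can be illustrated by comparing the user's query with queries already known to work in that collection. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in similarity measure; the function name `suggest_query`, the `known_good_queries` list and the cutoff of 0.75 are assumptions, and a production tool would use a more refined model.

```python
import difflib
from typing import Iterable, Optional


def suggest_query(query: str,
                  known_good_queries: Iterable[str],
                  cutoff: float = 0.75) -> Optional[str]:
    """Suggest the closest known-good query, or None if no suggestion applies.

    The comparison is made on the whole query string, and candidates come
    only from queries that are known to succeed in this collection.
    """
    normalised = " ".join(query.lower().split())
    known = sorted({" ".join(q.lower().split()) for q in known_good_queries})
    if normalised in known:
        return None  # already a sensible query for this collection
    matches = difflib.get_close_matches(normalised, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative list; in practice it might come from successful logged queries.
known_queries = ["online information", "manchester united", "term dates"]
suggestion = suggest_query("Onlion Imformation", known_queries)
```

Here `suggestion` is `"online information"`, while a query with no close neighbour in the list yields `None` rather than a suggestion that makes no sense for the collection.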
  19. 4. Query expansion tools
      • Manual rules:
          • Rego -> [registration rego]
          • MOPEM -> [“manually operated personnel egress mechanism door”]
      • Related queries (automatic)
          • Based on co-clicking
      • Contextual navigation (on-the-fly)
          • Finding superphrases in a deep result set
      • Faceting (semi-automatic)
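The first two techniques above can be sketched in a few lines. The rule table reuses the two examples from the slide, but its dictionary format is an assumption, since the slide does not show the syntax of any real rule file. The co-click function treats two queries as related when users clicked the same documents for both; the `(query, clicked_document)` log format and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Manual rules from the slide; the dict format itself is an assumption.
EXPANSION_RULES: Dict[str, str] = {
    "rego": "registration rego",
    "mopem": "manually operated personnel egress mechanism door",
}


def expand_query(query: str, rules: Dict[str, str]) -> str:
    """Replace each query term that has a manual rule with its expansion."""
    return " ".join(rules.get(term, term) for term in query.lower().split())


def related_queries(click_log: Iterable[Tuple[str, str]],
                    query: str) -> List[str]:
    """Queries whose clicked documents overlap with those clicked for `query`.

    Results are ordered by number of shared documents, then alphabetically.
    """
    docs_by_query: Dict[str, Set[str]] = defaultdict(set)
    for logged_query, doc in click_log:
        docs_by_query[logged_query].add(doc)
    target = docs_by_query.get(query, set())
    scored = [
        (len(target & docs), other)
        for other, docs in docs_by_query.items()
        if other != query and target & docs
    ]
    return [other for _, other in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Illustrative click log: (query, clicked document).
clicks = [
    ("rego", "/reg"), ("rego", "/fees"),
    ("registration", "/reg"), ("registration", "/fees"),
    ("car rego", "/reg"),
    ("term dates", "/cal"),
]
```

With this data, `expand_query("Rego renewal", EXPANSION_RULES)` gives `"registration rego renewal"`, and `related_queries(clicks, "rego")` gives `["registration", "car rego"]`.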
  20. 5. Reporting and alerting tools
      • Reporting on Queries which:
          • Produced no results
          • Logged behaviour suggestive of unfulfilment
      • Alerting when:
          • Submissions of a query (or group of related queries) sharply increase in frequency
      • For:
          • business intelligence
          • Triggering creation or changes to content
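Both ideas above reduce to simple counting over the query log. The sketch below reports the most frequent zero-result queries and flags queries whose count today is well above their recent daily average. The log formats, the function names and the thresholds (`factor=3.0`, `min_count=20`) are assumptions chosen for illustration, not settings from any real product.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def zero_result_report(log: Sequence[Tuple[str, int]],
                       top: int = 10) -> List[Tuple[str, int]]:
    """Most frequent queries that returned no results.

    `log` holds (query, number_of_results) pairs, one per submission.
    """
    counts = Counter(query for query, n_results in log if n_results == 0)
    return counts.most_common(top)


def spiking_queries(history: Dict[str, List[int]],
                    today: Dict[str, int],
                    factor: float = 3.0,
                    min_count: int = 20) -> List[str]:
    """Queries whose count today exceeds `factor` times their past daily mean.

    `history` maps a query to its daily counts on previous days. Queries
    with fewer than `min_count` submissions today are ignored as noise.
    """
    alerts = []
    for query, count in today.items():
        past = history.get(query, [])
        baseline = sum(past) / len(past) if past else 0.0
        if count >= min_count and count > factor * baseline:
            alerts.append(query)
    return sorted(alerts)


# Illustrative data only.
search_log = [("mopem", 0), ("mopem", 0), ("term dates", 12), ("rego", 0)]
past_counts = {"exam results": [2, 3, 1, 2], "term dates": [40, 38, 45, 41]}
todays_counts = {"exam results": 60, "term dates": 44, "new query": 5}
```

With this data the report lists `mopem` first with two failures, and only `exam results` is flagged as a spike; either signal can prompt someone to create or change content.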
  21. Query spike alerting
  22. Conclusions
      • Search is important
      • Organisations benefit when someone takes responsibility for effective search – the SearchMaster.
      • The core search tool must be effective, and able to be adapted to your organisation's publishing and searching characteristics.
      • Further tools are needed to overcome poor queries and missing content.
      Thanks to Mike Swanson of Oxfam Australia for the Ned Kelly line.