
Search Analytics What? Why? How?

Published in: Technology


  1. Search Analytics What? Why? How? Otis Gospodnetić – Sematext International. Copyright 2011 Sematext Int'l. All rights reserved.
  2. About Otis Gospodnetić <ul><li>Member: Apache Lucene, Solr, Nutch, Mahout </li></ul><ul><li>Author: Lucene in Action 1 & 2 </li></ul><ul><li>Entrepreneur: Sematext, Simpy </li></ul>
  3. About Sematext <ul><li>Consulting, development, support: </li></ul><ul><li>Search (Lucene, Solr, Elastic Search...) </li></ul><ul><li>Big Data (Hadoop, HBase, Voldemort...) </li></ul><ul><li>Web Crawling (Nutch) </li></ul><ul><li>Machine Learning (Mahout) </li></ul>
  4. Agenda <ul><li>Intro: Otis & Sematext - DONE </li></ul><ul><li>What </li></ul><ul><li>Why </li></ul><ul><li>Reports </li></ul>
  5. What is Search Analytics? <ul><li>Input: queries and clicks </li></ul><ul><li>Output: reports – over time </li></ul><ul><li>Next: actions </li></ul><ul><li>The means, not the goal </li></ul><ul><li>Ongoing, not one-off </li></ul>
  6. Search Analytics and SEO <ul><li>Not the same </li></ul><ul><li>SA can help with SEO </li></ul>
  7. Search vs. Web Analytics <ul><li>User intent and information needs vs. inferring </li></ul><ul><li>Hand in hand </li></ul><ul><li>Ideally you can relate data from both or even unify it </li></ul>
  8. Why Search Analytics? <ul><li>Measure and monitor everything </li></ul><ul><li>Supports (re)design, navigation choices </li></ul><ul><li>Helps with content acquisition & enhancement </li></ul><ul><li>Improve search experience </li></ul><ul><li>Mula </li></ul>
  9. Report Groups <ul><li>Failures vs. non-failures </li></ul><ul><li>Actionable vs. non-actionable </li></ul>
  10. Failures <ul><li>Be aware of failures, but don't be one. </li></ul><ul><li>Zero hits </li></ul><ul><li>Low query CTR </li></ul><ul><li>High search exit rate </li></ul><ul><li>Irrelevant results </li></ul><ul><li>Over N refinements </li></ul>
  11. Report: Zero Hit Queries <ul><li>Popular queries; percentage, not raw count </li></ul><ul><li>Misspellings? </li></ul><ul><li>Synonyms? </li></ul><ul><li>No matching content? </li></ul><ul><li>Need (different) tagging? </li></ul><ul><li>Bad analysis? </li></ul><ul><li>Multilingual issue? </li></ul>
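One possible reading of "percentage, not raw count" is a per-query zero-hit rate, restricted to queries that are searched often enough to matter. The sketch below assumes a hypothetical log of `(query, hit_count)` pairs; the function name and the `min_searches` cutoff are illustrative and not from the slides.

```python
from collections import Counter


def zero_hit_report(log, min_searches=2):
    """Return (query, searches, zero_hit_rate) rows for queries that failed at least once.

    `log` is an iterable of (query, hit_count) pairs (assumed log format).
    Queries searched fewer than `min_searches` times are skipped as unpopular.
    """
    total = Counter()
    zero = Counter()
    for query, hits in log:
        total[query] += 1
        if hits == 0:
            zero[query] += 1
    rows = [
        (q, total[q], zero[q] / total[q])
        for q in total
        if zero[q] > 0 and total[q] >= min_searches
    ]
    # Most popular failing queries first, then highest failure rate.
    return sorted(rows, key=lambda r: (-r[1], -r[2]))


log = [
    ("solr", 12), ("lucene", 0), ("lucene", 0),
    ("lucen", 0), ("hadoop", 3), ("lucene", 5),
]
report = zero_hit_report(log)
for query, searches, rate in report:
    print(f"{query}: {searches} searches, {rate:.0%} with zero hits")
```

The surviving rows are the ones worth checking against the questions on the slide (misspelling, synonym, missing content, and so on).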
  12. Report: Zero Hit Queries (cont.) <ul><li>Use Query Spellchecker (aka DYM) </li></ul><ul><li>Using AutoComplete </li></ul><ul><li>Using DYM ReSearcher </li></ul><ul><li>Designing No Results page </li></ul>
  13. Report: High Exit Rate Queries <ul><li>Disappointed, frustrated users </li></ul><ul><li>Major revenue loss, no second chance </li></ul><ul><li>Marriage with Web Analytics </li></ul><ul><li>Relevance bad? </li></ul><ul><li>Default ordering bad? </li></ul><ul><li>Titles of hits need adjusting? </li></ul><ul><li>Search terms highlighting looking bad? </li></ul><ul><li>Bad thumbnails? Need thumbnails? </li></ul>
  14. Report: Irrelevant Result Queries <ul><li>Manual: top N hits of top N queries </li></ul><ul><li>Judge relevance of each hit and assign score </li></ul><ul><li>Per-query score: sum scores of top N hits </li></ul><ul><li>Cumulative top N query score: sum per-query scores </li></ul><ul><li>Automated: Mean Reciprocal Rank (MRR) </li></ul>
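The two scoring approaches on this slide can be sketched as follows. The manual scores use a made-up 0 to 2 scale, and feeding MRR with the rank of the first clicked hit is an assumption about how the "automated" variant gets its input; MRR itself is the standard mean of 1/rank of the first relevant result, with 0 for queries that have none.

```python
def per_query_score(hit_scores):
    """Sum of the manually assigned relevance scores for one query's top N hits."""
    return sum(hit_scores)


def cumulative_score(judgments):
    """Sum of per-query scores over the judged top N queries."""
    return sum(per_query_score(scores) for scores in judgments.values())


def mean_reciprocal_rank(first_relevant_ranks):
    """Mean of 1/rank over queries; rank is 1-based, None means no relevant hit."""
    if not first_relevant_ranks:
        return 0.0
    reciprocal_sum = sum(1.0 / rank for rank in first_relevant_ranks if rank)
    return reciprocal_sum / len(first_relevant_ranks)


# Hypothetical manual judgments: query -> scores of its top 3 hits (0 = bad, 2 = good).
judgments = {"solr": [2, 1, 0], "lucene": [2, 2, 1]}
total = cumulative_score(judgments)

# Hypothetical ranks of the first clicked hit for four searches.
ranks = [1, 2, None, 4]
mrr = mean_reciprocal_rank(ranks)
print(total, mrr)
```

Tracking either number over time is what makes it a report: a drop after a relevance or indexing change is the signal to investigate.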
  15. Report: Total Queries <ul><li>Search vs. navigation/browsing </li></ul><ul><li>Search vs. overall site usage </li></ul><ul><li>Related report: % of visits with search </li></ul><ul><li>Segment: new users vs. return users, etc. </li></ul><ul><li>Questions: do you count paging? Facet selection? Re-sorting? </li></ul>
  16. Report: Total Distinct Queries <ul><li>What's distinct? Car vs. Cars </li></ul><ul><li># Total Queries / # Distinct Queries = Avg. # </li></ul><ul><li>Tied to performance and query cache utilization </li></ul><ul><li>Extension: Total distinct words in queries </li></ul>
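What counts as "distinct" depends on how queries are normalized before counting. The sketch below assumes lowercasing, whitespace collapsing, and a deliberately naive trailing-"s" strip so that "Car" and "Cars" collapse together; a real setup would more likely reuse the search engine's own analysis chain. The ratio at the end is total queries divided by distinct queries.

```python
def normalize(query):
    """Lowercase, collapse whitespace, and strip a trailing 's' from longer words.

    The plural handling is intentionally crude (it would also turn 'glass'
    into 'glas'); it only illustrates that the choice changes the count.
    """
    words = query.lower().split()
    return " ".join(
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in words
    )


queries = ["Car", "cars", "used cars", "Used  Car", "solr"]
normalized = [normalize(q) for q in queries]

total_queries = len(normalized)
distinct_queries = len(set(normalized))
avg_per_distinct = total_queries / distinct_queries
print(total_queries, distinct_queries, round(avg_per_distinct, 2))
```

A higher ratio means more repetition, which is the link to query cache utilization mentioned on the slide.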
  17. Report: Words Per Query <ul><li>Informative, slowly changing, not terribly actionable </li></ul><ul><li>Can affect search box size </li></ul><ul><li>Use AutoComplete if queries are long </li></ul>
  18. Report: Top Queries <ul><li>User intent and information needs </li></ul><ul><li>Ensure good results </li></ul><ul><li>Calculate MRR for top N queries </li></ul><ul><li>Calculate MRR for each top N query </li></ul><ul><li>Compare to global MRR </li></ul>
  19. Report: Top Queries (cont.) <ul><li>New top queries – new trend? New demand? </li></ul><ul><li>Best Bets (aka Query Elevation in Solr) </li></ul><ul><li>Expose before search is needed </li></ul><ul><li>Seasonality – hour of day, day of the week, etc. </li></ul><ul><ul><li>Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal) </li></ul></ul><ul><ul><li>Anticipate demand in the next cycle </li></ul></ul>
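Seasonality can be spotted by bucketing query timestamps. A minimal sketch, assuming each logged query carries a `datetime`; the sample timestamps are made up, and weekdays use Python's convention of 0 for Monday through 6 for Sunday.

```python
from collections import Counter
from datetime import datetime


def seasonality(timestamps):
    """Count queries per hour of day and per weekday (0 = Monday)."""
    by_hour = Counter(ts.hour for ts in timestamps)
    by_weekday = Counter(ts.weekday() for ts in timestamps)
    return by_hour, by_weekday


# Hypothetical query timestamps: three on a Monday, one on a Saturday.
timestamps = [
    datetime(2011, 5, 2, 9, 15),
    datetime(2011, 5, 2, 9, 40),
    datetime(2011, 5, 2, 14, 0),
    datetime(2011, 5, 7, 9, 5),
]
by_hour, by_weekday = seasonality(timestamps)
print(by_hour.most_common(1), by_weekday.most_common(1))
```

Running the same bucketing per query, rather than over all queries, would show which top queries are weekday or weekend specific.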
  20. Clickstream Analysis <ul><li>Query analysis is not a complete story: </li></ul><ul><ul><li>Queries </li></ul></ul><ul><ul><li>Clicks </li></ul></ul><ul><ul><li>(Trans)action </li></ul></ul>
  21. Query and Hit Valuation <ul><li>Query: by popularity (count) </li></ul><ul><li>Query: by CTR </li></ul><ul><li>Query: by subsequent (trans)action count/pct. </li></ul><ul><li>Hit: by click count </li></ul><ul><li>Hit: by subsequent (trans)action count/pct. </li></ul>
  22. Query and Hit Valuation (cont.) <ul><li>Maximize: pop(query) + ctr(query) + action(query); clicks(hit) + action(hit) </li></ul><ul><li>Failures: high pop(q), yet low ctr(q) </li></ul><ul><li>Failures: high pop(q), high ctr(q), yet low action(q) </li></ul><ul><li>Integration with backend required </li></ul>
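The two failure patterns on this slide can be expressed as a small classifier. This is a sketch under several assumptions: CTR is taken as the fraction of searches followed by at least one click, the action rate as (trans)actions per clicked search, and all thresholds and field names are placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    searches: int          # pop(q): how often the query was issued
    clicked_searches: int  # searches followed by at least one click
    actions: int           # (trans)actions attributed to the query

    @property
    def ctr(self):
        return self.clicked_searches / self.searches if self.searches else 0.0

    @property
    def action_rate(self):
        return self.actions / self.clicked_searches if self.clicked_searches else 0.0


def classify(stats, min_searches=100, low_ctr=0.2, low_action=0.05):
    """Label a query using hypothetical thresholds for 'high' and 'low'."""
    if stats.searches < min_searches:
        return "unpopular"
    if stats.ctr < low_ctr:
        return "failure: popular, low CTR"
    if stats.action_rate < low_action:
        return "failure: popular, clicked, low action"
    return "ok"


samples = [
    QueryStats("laptop", 500, 50, 5),
    QueryStats("phone", 400, 300, 6),
    QueryStats("tablet", 300, 200, 40),
    QueryStats("rare", 10, 1, 0),
]
labels = {s.query: classify(s) for s in samples}
for query, label in labels.items():
    print(query, "->", label)
```

The `actions` field is where the backend integration mentioned on the slide comes in, since purchases or sign-ups are usually recorded outside the search log.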
  23. Report: Low CTR Queries <ul><li>Popular queries; percentage, not raw count </li></ul><ul><li>Relevance bad? </li></ul><ul><li>Default ordering bad? </li></ul><ul><li>Titles of hits need adjusting? </li></ul><ul><li>Search terms highlighting looking bad? </li></ul><ul><li>Bad thumbnails? Need thumbnails? </li></ul>
  24. Report: Queries with Most Clicks <ul><li>i.e. Queries with Highest CTR </li></ul><ul><li>Informative? Yes </li></ul><ul><li>Actionable? Somewhat: expose relevant content outside of search </li></ul>
  25. Search Session <ul><li>Search activity aimed at satisfying a specific information need in some limited amount of time. </li></ul><ul><li>i.e. it's very fuzzy </li></ul>
  26. Interesting Search Sessions <ul><li>More than N queries in M minutes </li></ul><ul><li>Sessions that end in a failure </li></ul><ul><li>Sessions for specific type of info (e.g. person name, product name, event) </li></ul>
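The "more than N queries in M minutes" rule can be checked with a sliding window over each session's query timestamps. The sketch assumes that queries have already been assigned a session id, which is itself the fuzzy part; the function name and the defaults for `n` and `m_minutes` are illustrative.

```python
from datetime import datetime, timedelta


def busy_sessions(events, n=3, m_minutes=5):
    """Return ids of sessions with more than `n` queries inside any `m_minutes` window.

    `events` is an iterable of (session_id, timestamp) pairs, one per query.
    """
    window = timedelta(minutes=m_minutes)
    by_session = {}
    for session_id, ts in events:
        by_session.setdefault(session_id, []).append(ts)

    flagged = set()
    for session_id, stamps in by_session.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most m_minutes.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 > n:
                flagged.add(session_id)
                break
    return flagged


t0 = datetime(2011, 5, 2, 9, 0)
events = (
    [("a", t0 + timedelta(minutes=i)) for i in range(4)]         # 4 queries in 3 minutes
    + [("b", t0 + timedelta(minutes=10 * i)) for i in range(4)]  # 4 queries, 10 minutes apart
    + [("c", t0 + timedelta(seconds=20 * i)) for i in range(3)]  # only 3 queries
)
flagged = busy_sessions(events)
print(sorted(flagged))
```

Flagged sessions are candidates for manual review, since many rapid refinements often mean the user is not finding what they need.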
  27. Segmentation <ul><li>Searches that resulted in conversion vs. not </li></ul><ul><li>Search metrics for </li></ul><ul><li>New vs. returning visitors </li></ul><ul><li>English vs. French vs. Spanish vs. … </li></ul><ul><li>Chrome vs. IE </li></ul><ul><li>... </li></ul>
  28. More SA Reports/Questions <ul><li>% of queries from DYM vs. AC vs. typed </li></ul><ul><li>Most common queries per clicked hit </li></ul><ul><li>Which hits are generally popular? </li></ul><ul><li>Which hits are trending up? </li></ul><ul><li>Are there docs that are never ever clicked on? </li></ul><ul><li>Average number of queries per session </li></ul><ul><li>Breakdown of queries by number of hits </li></ul>
  29. More SA Reports/Questions (cont.) <ul><li>Breakdown of queries by latency </li></ul><ul><li>Frequently used facets or sort criteria </li></ul><ul><li>Avg number of clicks per query </li></ul><ul><li>Time spent on site before/after searching </li></ul><ul><li>Search initiation pages </li></ul><ul><li>How deep into SERPs are people drilling? </li></ul><ul><li>Are too many clicks on pages other than 1st? </li></ul><ul><li>... </li></ul>
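The questions about SERP depth and clicks beyond the first page can be answered from the rank of each clicked hit. A minimal sketch, assuming 1-based ranks across the whole result list and a fixed page size of 10; both the input format and the page size are assumptions.

```python
from collections import Counter


def clicks_by_page(click_ranks, page_size=10):
    """Return {result_page: share_of_clicks}, with pages numbered from 1."""
    counts = Counter((rank - 1) // page_size + 1 for rank in click_ranks)
    total = sum(counts.values())
    return {page: counts[page] / total for page in sorted(counts)}


# Hypothetical ranks of clicked hits.
click_ranks = [1, 3, 10, 11, 25]
shares = clicks_by_page(click_ranks)
beyond_first = 1.0 - shares.get(1, 0.0)
print(shares, f"{beyond_first:.0%} of clicks beyond page 1")
```

A large share of clicks beyond the first page suggests that relevant hits are being ranked too low.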
  30. Data Collection
  31. Contact <ul><li>sematext.com </li></ul><ul><li>blog.sematext.com </li></ul><ul><li>@sematext </li></ul><ul><li>@otisg </li></ul><ul><li>[email_address] </li></ul>
