Search analytics what why how - By Otis Gospodnetic

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

Search Analytics: What? Why? How?
Otis Gospodnetić – Sematext International

About Otis Gospodnetić
  • Member: Apache Lucene, Solr, Nutch, Mahout
  • Author: Lucene in Action 1 & 2
  • Entrepreneur: Sematext, Simpy

About Sematext
Consulting, development, support:
  • Search (Lucene, Solr, Elastic Search...)
  • Big Data (Hadoop, HBase, Voldemort...)
  • Web Crawling (Nutch)
  • Machine Learning (Mahout)

Agenda
  • Intro: Otis & Sematext - DONE
  • What
  • Why
  • Reports

What Is Search Analytics?
  • Input: queries and clicks
  • Output: reports over time
  • Next: actions
  • The means, not the goal
  • Ongoing, not one-off

Search Analytics and SEO
  • Not the same thing
  • SA can help with SEO

Search vs. Web Analytics
  • Search analytics sees user intent and information needs directly in queries; web analytics has to infer them
  • The two go hand in hand
  • Ideally you can relate data from both, or even unify it

Why Search Analytics?
  • Measure and monitor everything
  • Support (re)design and navigation choices
  • Help with content acquisition & enhancement
  • Improve the search experience
  • Mula (i.e., revenue)

Report Groups
  • Failures vs. non-failures
  • Actionable vs. non-actionable

Failures
Be aware of failures, but don't be one.
  • Zero hits
  • Low query CTR
  • High search exit rate
  • Irrelevant results
  • Over N refinements

Report: Zero Hit Queries
  • Popular queries; percentage, not raw count
  • Misspellings?
  • Synonyms?
  • No matching content?
  • Need (different) tagging?
  • Bad analysis?
  • Multilingual issue?
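
The zero-hit report can be sketched in Python as follows. This is a minimal illustration, not Sematext's implementation: it assumes the search log is available as a list of (query, num_hits) pairs, one per search, and the function name and the min_count popularity cutoff are hypothetical. Following the slide, it ranks popular queries by the percentage of their searches that returned nothing rather than by raw count.

```python
from collections import Counter


def zero_hit_report(log, min_count=2):
    """Return (query, zero-hit %) for popular queries, worst first.

    `log` is an assumed list of (query, num_hits) pairs, one per search.
    Only queries issued at least `min_count` times are reported.
    """
    total = Counter()
    zero = Counter()
    for query, num_hits in log:
        total[query] += 1
        if num_hits == 0:
            zero[query] += 1
    report = [
        (query, 100.0 * zero[query] / total[query])
        for query in total
        if total[query] >= min_count and zero[query] > 0
    ]
    # Highest zero-hit percentage first; ties go to the more popular query.
    report.sort(key=lambda row: (-row[1], -total[row[0]]))
    return report


log = [("ipod", 12), ("ipod", 0), ("iphnoe", 0), ("iphnoe", 0),
       ("ipad", 5), ("ipad", 7), ("zune", 0)]
print(zero_hit_report(log))  # [('iphnoe', 100.0), ('ipod', 50.0)]
```

A query such as "iphnoe" that fails every time is a likely misspelling, which is where a spellchecker or synonyms come in.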

Report: Zero Hit Queries (cont.)
  • Use a query spellchecker (aka DYM, "Did You Mean")
  • Use AutoComplete
  • Use DYM ReSearcher
  • Design the No Results page

Report: High Exit Rate Queries
  • Disappointed, frustrated users
  • Major revenue loss, no second chance
  • Requires marriage with Web Analytics data
  • Relevance bad?
  • Default ordering bad?
  • Titles of hits need adjusting?
  • Search term highlighting looks bad?
  • Bad thumbnails? Need thumbnails?

Report: Irrelevant Result Queries
  • Manual: take the top N hits of the top N queries
      • Judge the relevance of each hit and assign it a score
      • Per-query score: sum the scores of its top N hits
      • Cumulative top N query score: sum the per-query scores
  • Automated: Mean Reciprocal Rank (MRR)
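
A minimal Python sketch of both variants of this report, under stated assumptions: the manual score uses human judgments on a hypothetical 0 to 2 scale, and the automated MRR uses the rank of the first clicked hit as a stand-in for the first relevant hit, since clicks are only a proxy for relevance. Function names are illustrative.

```python
def cumulative_score(judgments, n=3):
    """Manual report: per-query and cumulative scores over the top n hits.

    `judgments` maps each query to human relevance scores for its hits in
    ranked order (hypothetical scale: 0 = irrelevant ... 2 = highly relevant).
    """
    per_query = {q: sum(scores[:n]) for q, scores in judgments.items()}
    return per_query, sum(per_query.values())


def mean_reciprocal_rank(first_click_ranks):
    """Automated report: MRR over a set of searches.

    Each entry is the 1-based rank of the first clicked hit of one search,
    or None if nothing was clicked (which contributes 0 to the mean).
    """
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / r for r in first_click_ranks if r is not None)
    return total / len(first_click_ranks)


per_query, cumulative = cumulative_score({"solr": [2, 2, 1, 0], "lucene": [1, 0, 0]})
print(per_query, cumulative)                  # {'solr': 5, 'lucene': 1} 6
print(mean_reciprocal_rank([1, 2, None, 4]))  # 0.4375
```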

Report: Total Queries
  • Search vs. navigation/browsing
  • Search vs. overall site usage
  • Related report: % of visits with search
  • Segment: new users vs. returning users, etc.
  • Questions: do you count paging? Facet selection? Re-sorting?
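
The counting questions on this slide can be made concrete with a small sketch. It assumes a hypothetical event log of (visit_id, kind) pairs, where kind is something like "search", "page", "facet" or "sort"; the event names and both kind sets are illustrative. Whether paging, facet selection and re-sorting count as queries is a policy choice, so it is passed in as a parameter.

```python
# Hypothetical event kinds; which of them count as "queries" is a policy choice.
NEW_QUERIES_ONLY = {"search"}
EVERY_REQUEST = {"search", "page", "facet", "sort"}


def total_queries(events, counted_kinds=NEW_QUERIES_ONLY):
    """Count events whose kind is in `counted_kinds`.

    `events` is an assumed list of (visit_id, kind) pairs.
    """
    return sum(1 for _, kind in events if kind in counted_kinds)


def pct_visits_with_search(events):
    """Percentage of visits that contain at least one new search."""
    visits = {visit for visit, _ in events}
    searched = {visit for visit, kind in events if kind == "search"}
    return 100.0 * len(searched) / len(visits) if visits else 0.0


events = [("v1", "search"), ("v1", "page"), ("v2", "pageview"),
          ("v3", "search"), ("v3", "facet"), ("v4", "pageview")]
print(total_queries(events))                 # 2
print(total_queries(events, EVERY_REQUEST))  # 4
print(pct_visits_with_search(events))        # 50.0
```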

Report: Total Distinct Queries
  • What counts as distinct? "car" vs. "cars"
  • # Total Queries / # Distinct Queries = avg. # of times each distinct query is issued
  • Tied to performance and query cache utilization: the more often queries repeat, the more of them a query cache can serve
  • Extension: total distinct words in queries
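
A sketch of the distinct-query numbers, assuming queries are plain strings. The normalize function is a deliberately naive, hypothetical stand-in for deciding that "car" and "cars" are the same query; in practice you would more likely reuse the analysis your search index applies.

```python
def normalize(query):
    """Naive normalization: lowercase, collapse whitespace, drop a plural 's'.

    A crude stand-in for the index's own analysis chain; it will mangle
    words like "glass".
    """
    words = query.lower().split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w
                    for w in words)


def distinct_query_stats(queries, normalizer=normalize):
    """Return (total, distinct, avg. repeats per distinct query, distinct words)."""
    distinct = {normalizer(q) for q in queries}
    words = {w for q in distinct for w in q.split()}
    avg_repeats = len(queries) / len(distinct) if distinct else 0.0
    return len(queries), len(distinct), avg_repeats, len(words)


print(distinct_query_stats(["Car", "cars", "red car", "Red  Cars", "bus"]))
# (5, 3, 1.6666666666666667, 3)
```

Here each distinct query was issued about 1.7 times on average; the higher that ratio, the more repeat traffic a query results cache has to work with.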

Report: Words Per Query
  • Informative, slowly changing, not terribly actionable
  • Can affect search box size
  • Use AutoComplete if queries are long
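
Words per query is a simple histogram plus an average. A sketch assuming whitespace-separated queries; the function name is hypothetical.

```python
from collections import Counter


def words_per_query(queries):
    """Return ({word count: number of queries}, average words per query)."""
    histogram = Counter(len(q.split()) for q in queries)
    total = sum(histogram.values())
    average = sum(n * c for n, c in histogram.items()) / total if total else 0.0
    return dict(sorted(histogram.items())), average


print(words_per_query(["solr", "solr faceting", "lucene in action", "hadoop"]))
# ({1: 2, 2: 1, 3: 1}, 1.75)
```

If the histogram drifts toward longer queries, that is the cue for offering AutoComplete.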

Report: Top Queries
  • User intent and information needs
  • Ensure good results
  • Calculate MRR for the top N queries as a group
  • Calculate MRR for each of the top N queries
  • Compare to global MRR
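
A sketch of the three MRR numbers on this slide (the top N queries as a group, each top query individually, and the global figure), assuming a log of (query, rank of first clicked hit) pairs with None for searches without a click. Clicks stand in for relevance judgments here, which is an assumption, and the function name and top_n default are hypothetical.

```python
from collections import defaultdict


def mrr_report(searches, top_n=10):
    """Return (global MRR, top-N group MRR, {query: MRR} for the top N).

    `searches` is an assumed list of (query, rank of first clicked hit),
    with rank None when nothing was clicked.
    """
    reciprocal = defaultdict(list)
    for query, rank in searches:
        reciprocal[query].append(1.0 / rank if rank is not None else 0.0)
    everything = [rr for rrs in reciprocal.values() for rr in rrs]
    global_mrr = sum(everything) / len(everything) if everything else 0.0
    # Most frequent queries first.
    top = sorted(reciprocal, key=lambda q: len(reciprocal[q]), reverse=True)[:top_n]
    top_rrs = [rr for q in top for rr in reciprocal[q]]
    top_mrr = sum(top_rrs) / len(top_rrs) if top_rrs else 0.0
    per_query = {q: sum(reciprocal[q]) / len(reciprocal[q]) for q in top}
    return global_mrr, top_mrr, per_query


searches = [("solr", 1), ("solr", 1), ("solr", 2),
            ("nutch", None), ("nutch", 3), ("mahout", 1)]
global_mrr, top_mrr, per_query = mrr_report(searches, top_n=2)
# Top queries that do worse than the site as a whole deserve attention first.
print([q for q, mrr in per_query.items() if mrr < global_mrr])  # ['nutch']
```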

Report: Top Queries (cont.)
  • New top queries: a new trend? New demand?
  • Best Bets (aka Query Elevation in Solr)
  • Expose before search is needed
  • Seasonality: hour of day, day of the week, etc.
      • Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal)
      • Anticipate demand in the next cycle
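
Seasonality can be spotted by bucketing a query's occurrences by weekday and hour of day. A sketch assuming a hypothetical log of (ISO-8601 timestamp, query) pairs already in the site's local time; datetime.fromisoformat needs Python 3.7 or later.

```python
from collections import Counter
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def seasonality(log, query):
    """Count a query's occurrences per weekday and per hour of day.

    `log` is an assumed list of (ISO-8601 timestamp, query) pairs, with
    timestamps already in the site's local time zone.
    """
    by_day, by_hour = Counter(), Counter()
    for timestamp, q in log:
        if q != query:
            continue
        t = datetime.fromisoformat(timestamp)
        by_day[DAYS[t.weekday()]] += 1  # weekday(): Monday == 0
        by_hour[t.hour] += 1
    return by_day, by_hour


log = [("2011-05-09T09:15:00", "tax forms"), ("2011-05-09T09:45:00", "tax forms"),
       ("2011-05-14T21:00:00", "tax forms"), ("2011-05-14T21:05:00", "movies")]
by_day, by_hour = seasonality(log, "tax forms")
print(dict(by_day), dict(by_hour))  # {'Mon': 2, 'Sat': 1} {9: 2, 21: 1}
```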

Clickstream Analysis
  • Query analysis is not the complete story:
      • Queries
      • Clicks
      • (Trans)action

Query and Hit Valuation
  • Query: by popularity (count)
  • Query: by CTR
  • Query: by subsequent (trans)action count/pct.
  • Hit: by click count
  • Hit: by subsequent (trans)action count/pct.

Query and Hit Valuation (cont.)
  • Maximize: pop(query) + ctr(query) + action(query), and clicks(hit) + action(hit)
  • Failures:
      • high pop(q), yet low ctr(q)
      • high pop(q), high ctr(q), yet low action(q)
  • Integration with backend required
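
The two failure patterns can be checked mechanically once clicks and backend (trans)action data are joined per query. A sketch assuming hypothetical per-query counts (searches, clicked searches, searches followed by a (trans)action); the thresholds are made-up defaults. It flags problems rather than computing the "maximize" sum, since popularity, CTR and action rates live on different scales and would need normalizing before they could be added.

```python
def valuation_failures(stats, pop_min=100, ctr_min=0.3, action_min=0.05):
    """Flag popular queries matching either failure pattern.

    `stats` maps query -> (searches, clicked searches, searches followed by
    a (trans)action). All three thresholds are hypothetical, site-specific.
    """
    failures = {}
    for query, (searches, clicked, actions) in stats.items():
        if searches < pop_min:
            continue  # not popular enough to judge
        if clicked / searches < ctr_min:
            failures[query] = "high pop, low CTR"
        elif actions / searches < action_min:
            failures[query] = "high pop, high CTR, low action"
    return failures


stats = {"ipod": (500, 300, 60), "iphone case": (400, 40, 4),
         "zune": (300, 200, 3), "rare query": (5, 0, 0)}
print(valuation_failures(stats))
# {'iphone case': 'high pop, low CTR', 'zune': 'high pop, high CTR, low action'}
```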

Report: Low CTR Queries
  • Popular queries; percentage, not raw count
  • Relevance bad?
  • Default ordering bad?
  • Titles of hits need adjusting?
  • Search term highlighting looks bad?
  • Bad thumbnails? Need thumbnails?

Report: Queries with Most Clicks
  • i.e. queries with the highest CTR
  • Informative? Yes
  • Actionable? Somewhat: expose relevant content outside of search

Search Session
  • Search activity aimed at satisfying a specific information need within some limited amount of time
  • i.e. it's very fuzzy

Interesting Search Sessions
  • More than N queries in M minutes
  • Sessions that end in a failure
  • Sessions for a specific type of info (e.g. person name, product name, event)
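
Because a search session is fuzzy, any code has to pick a heuristic. A sketch assuming one user's events as (datetime, query) pairs, a 30-minute inactivity gap to separate sessions, and hypothetical values N = 5 and M = 2 minutes for the "more than N queries in M minutes" check; function names are illustrative.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Split one user's (timestamp, query) events into sessions.

    A new session starts after more than `gap` of inactivity. This is just
    one common heuristic for a fuzzy concept; 30 minutes is an assumption.
    """
    sessions, last = [], None
    for ts, query in sorted(events):
        if last is None or ts - last > gap:
            sessions.append([])
        sessions[-1].append((ts, query))
        last = ts
    return sessions


def more_than_n_in_m(session, n=5, m=timedelta(minutes=2)):
    """True if the session has more than n queries inside some m-long window."""
    times = [ts for ts, _ in session]
    start = 0
    for end in range(len(times)):
        # Slide the window start forward until the window spans at most m.
        while times[end] - times[start] > m:
            start += 1
        if end - start + 1 > n:
            return True
    return False


t0 = datetime(2011, 5, 9, 10, 0)
events = [(t0 + timedelta(seconds=10 * i), f"query {i}") for i in range(7)]
events.append((t0 + timedelta(hours=1), "solr faceting"))
sessions = sessionize(events)
print(len(sessions), len(events) / len(sessions))  # 2 4.0
print([more_than_n_in_m(s) for s in sessions])     # [True, False]
```

The second print line also gives the "average number of queries per session" figure for this user.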

Segmentation
  • Searches that resulted in conversion vs. not
  • Search metrics for:
      • New vs. returning visitors
      • English vs. French vs. Spanish vs. …
      • Chrome vs. IE
      • ...
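
Segmentation amounts to computing the same metric per group. A sketch that computes CTR per segment, assuming a hypothetical list of search records (dicts) with a boolean "clicked" field and attributes such as "visitor" or "lang"; the schema and function name are illustrative.

```python
from collections import defaultdict


def ctr_by_segment(searches, segment_key):
    """CTR (share of searches with at least one click) per segment value.

    `searches` is an assumed list of dicts with a boolean "clicked" field
    plus segment attributes such as "visitor" or "lang".
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [searches, clicked]
    for search in searches:
        bucket = counts[search[segment_key]]
        bucket[0] += 1
        bucket[1] += int(search["clicked"])
    return {seg: clicked / total for seg, (total, clicked) in counts.items()}


searches = [
    {"visitor": "new", "lang": "en", "clicked": True},
    {"visitor": "new", "lang": "fr", "clicked": False},
    {"visitor": "returning", "lang": "en", "clicked": True},
    {"visitor": "returning", "lang": "en", "clicked": True},
]
print(ctr_by_segment(searches, "visitor"))  # {'new': 0.5, 'returning': 1.0}
print(ctr_by_segment(searches, "lang"))     # {'en': 1.0, 'fr': 0.0}
```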

More SA Reports/Questions
  • % of queries from DYM vs. AutoComplete (AC) vs. typed
  • Most common queries per clicked hit
  • Which hits are generally popular?
  • Which hits are trending up?
  • Are there docs that are never ever clicked on?
  • Average number of queries per session
  • Breakdown of queries by number of hits
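
The breakdown of queries by number of hits is a bucketed histogram. A sketch with hypothetical bucket boundaries and function names; the "0" bucket doubles as the overall zero-hit rate.

```python
from collections import Counter

# Hypothetical bucket boundaries: (low, high, label), both ends inclusive.
BUCKETS = [(0, 0, "0"), (1, 10, "1-10"), (11, 100, "11-100"),
           (101, float("inf"), ">100")]


def bucket_for(num_hits):
    """Return the label of the bucket containing `num_hits`."""
    for low, high, label in BUCKETS:
        if low <= num_hits <= high:
            return label
    raise ValueError(f"negative hit count: {num_hits}")


def hits_breakdown(hit_counts):
    """Percentage of queries falling into each hit-count bucket."""
    if not hit_counts:
        return {}
    counts = Counter(bucket_for(n) for n in hit_counts)
    return {label: 100.0 * counts[label] / len(hit_counts)
            for _, _, label in BUCKETS}


print(hits_breakdown([0, 3, 7, 42, 5000]))
# {'0': 20.0, '1-10': 40.0, '11-100': 20.0, '>100': 20.0}
```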

More SA Reports/Questions (cont.)
  • Breakdown of queries by latency
  • Frequently used facets or sort criteria
  • Avg. number of clicks per query
  • Time spent on site before/after searching
  • Search initiation pages
  • How deep into SERPs are people drilling?
  • Are too many clicks on pages other than the 1st?
  • ...
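
How deep users drill into the SERPs can be estimated from the positions of clicked hits. A sketch assuming 1-based click positions across all searches and a hypothetical page size of 10; the function name is illustrative.

```python
from collections import Counter


def serp_depth(click_positions, page_size=10):
    """Clicks per SERP page and % of clicks beyond page 1.

    `click_positions` are assumed 1-based ranks of clicked hits; `page_size`
    must match the site's results-per-page setting.
    """
    pages = Counter((pos - 1) // page_size + 1 for pos in click_positions)
    total = sum(pages.values())
    beyond_first = total - pages[1]  # Counter returns 0 for a missing page
    pct = 100.0 * beyond_first / total if total else 0.0
    return dict(sorted(pages.items())), pct


print(serp_depth([1, 3, 10, 11, 25]))  # ({1: 3, 2: 1, 3: 1}, 40.0)
```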

Data Collection

Contact
  • sematext.com
  • blog.sematext.com
  • @sematext
  • @otisg
  • [email_address]
