Search analytics what why how - By Otis Gospodnetic

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

Published in: Technology

  • 10 days of data (5K/min)
  • Search analytics what why how - By Otis Gospodnetic

    Search Analytics: What? Why? How?
    Otis Gospodnetić, Sematext International

    About Otis Gospodnetić
      • Member: Apache Lucene, Solr, Nutch, Mahout
      • Author: Lucene in Action 1 & 2
      • Entrepreneur: Sematext, Simpy

    About Sematext
    Consulting, development, support:
      • Search (Lucene, Solr, Elasticsearch...)
      • Big Data (Hadoop, HBase, Voldemort...)
      • Web Crawling (Nutch)
      • Machine Learning (Mahout)

    Agenda
      • Intro: Otis & Sematext (done)
      • What
      • Why
      • Reports

    What is Search Analytics?
      • Input: queries and clicks
      • Output: reports, over time
      • Next: actions
      • The means, not the goal
      • Ongoing, not one-off

    Search Analytics and SEO
      • Not the same
      • SA can help with SEO

    Search vs. Web Analytics
      • Search analytics sees user intent and information needs stated directly; web analytics has to infer them
      • The two go hand in hand
      • Ideally you can relate data from both, or even unify it

    Why Search Analytics?
      • Measure and monitor everything
      • Supports (re)design and navigation choices
      • Helps with content acquisition & enhancement
      • Improves the search experience
      • Moola (revenue)

    Report Groups
      • Failures vs. non-failures
      • Actionable vs. non-actionable

    Failures
    Be aware of failures, but don't be one.
      • Zero hits
      • Low query CTR
      • High search exit rate
      • Irrelevant results
      • Over N refinements

    Report: Zero Hit Queries
      • Popular queries; percentage, not raw count
      • Misspellings?
      • Synonyms?
      • No matching content?
      • Need (different) tagging?
      • Bad analysis?
      • Multilingual issue?
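
The "percentage, not raw count" point can be sketched in a few lines. This is only an illustration: the `(query, num_hits)` record shape, the `min_count` cutoff, and the sample data are assumptions, not something the slides prescribe.

```python
from collections import Counter


def zero_hit_report(log, min_count=2):
    """Rank zero-hit queries by their share of ALL searches.

    `log` is an iterable of (query, num_hits) pairs (assumed shape).
    Returns a list of (query, share_of_all_searches, zero_hit_count).
    """
    log = list(log)
    total = len(log)
    # Light normalization so "iPod" and "ipod " count as the same query.
    zero = Counter(q.strip().lower() for q, hits in log if hits == 0)
    rows = [(q, n / total, n) for q, n in zero.items() if n >= min_count]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical sample log.
sample = [
    ("ipod", 12), ("ipdo", 0), ("ipdo", 0), ("sofa", 0),
    ("couch", 7), ("ipdo", 0), ("sofa", 0), ("tv", 3),
]
report = zero_hit_report(sample)
for query, share, count in report:
    print(f"{query}: {share:.1%} of all searches ({count} zero-hit searches)")
```

In this made-up log, "ipdo" looks like a misspelling candidate and "sofa" like a synonym candidate (the content is indexed under "couch"), which are the first two questions on the slide.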

    Report: Zero Hit Queries (cont.)
      • Use a query spellchecker (aka Did You Mean, DYM)
      • Use AutoComplete
      • Use DYM ReSearcher
      • Design a useful No Results page

    Report: High Exit Rate Queries
      • Disappointed, frustrated users
      • Major revenue loss, no second chance
      • Marriage with Web Analytics
      • Relevance bad?
      • Default ordering bad?
      • Titles of hits need adjusting?
      • Search term highlighting looks bad?
      • Bad thumbnails? Need thumbnails?
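
A minimal sketch of a per-query exit rate, assuming the web analytics side can tell you whether a search was the visitor's last page view. The `(query, exited)` record shape and the `min_searches` threshold are assumptions made for illustration.

```python
from collections import defaultdict


def exit_rates(searches, min_searches=3):
    """Return [(query, exit_rate)] sorted from worst to best.

    `searches` is an iterable of (query, exited) pairs, where `exited`
    is True if the visitor left the site right after that search.
    """
    counts = defaultdict(lambda: [0, 0])  # query -> [searches, exits]
    for query, exited in searches:
        counts[query][0] += 1
        counts[query][1] += int(exited)
    # Ignore rarely issued queries so one unlucky search is not a "failure".
    rates = {
        q: exits / n for q, (n, exits) in counts.items() if n >= min_searches
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data.
searches = (
    [("red shoes", True)] * 3 + [("red shoes", False)]
    + [("laptop", False)] * 3 + [("laptop", True)]
    + [("rare query", True)]
)
worst_first = exit_rates(searches)
print(worst_first)
```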

    Report: Irrelevant Result Queries
      • Manual: top N hits of top N queries
      • Judge relevance of each hit and assign score
      • Per-query score: sum scores of top N hits
      • Cumulative top N query score: sum per-query scores
      • Automated: Mean Reciprocal Rank (MRR)
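
Both approaches on this slide are simple arithmetic, sketched below. Two assumptions to flag: the 0 to 2 judgment scale is invented for the example, and the automated MRR uses the rank of the first clicked hit as a stand-in for the first relevant hit, which is a common proxy rather than something the slides specify.

```python
def per_query_score(hit_scores, n=3):
    """Sum the manual relevance scores of the top n hits for one query."""
    return sum(hit_scores[:n])


def cumulative_score(judgments, n=3):
    """Sum per-query scores across all judged queries."""
    return sum(per_query_score(scores, n) for scores in judgments.values())


def mean_reciprocal_rank(first_click_ranks):
    """MRR over searches; ranks are 1-based, None means no click at all.

    A search with no click contributes 0 to the mean.
    """
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / rank if rank else 0.0 for rank in first_click_ranks)
    return total / len(first_click_ranks)


# Hypothetical manual judgments: 0 = irrelevant, 1 = partial, 2 = relevant.
judgments = {"lucene": [2, 1, 0, 2], "solr": [1, 1, 1]}
manual_total = cumulative_score(judgments, n=3)

# Hypothetical click log: rank of the first clicked hit per search.
mrr = mean_reciprocal_rank([1, 2, None, 4])
print(manual_total, mrr)
```

Tracking either number over time matters more than its absolute value: a drop after a relevance change is the signal to act on.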

    Report: Total Queries
      • Search vs. navigation/browsing
      • Search vs. overall site usage
      • Related report: % of visits with search
      • Segment: new users vs. returning users, etc.
      • Questions: do you count paging? Facet selection? Re-sorting?

    Report: Total Distinct Queries
      • What's distinct? Car vs. Cars
      • # Total Queries / # Distinct Queries = avg. # of searches per distinct query
      • Tied to performance and query cache utilization
      • Extension: Total distinct words in queries
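
The "what's distinct?" question comes down to how queries are normalized before counting. Below is a rough sketch; the `normalize` rules (lowercasing, whitespace collapsing, and a deliberately crude plural strip) are assumptions, and a real system would more likely reuse the search engine's own analysis chain.

```python
import re


def normalize(query):
    """Lowercase, collapse whitespace, and crudely strip a plural 's'."""
    words = re.sub(r"\s+", " ", query.strip().lower()).split(" ")
    # Crude stemming: only words longer than 3 characters lose a final "s".
    return " ".join(
        w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words
    )


def distinct_query_stats(queries):
    """Return (total, distinct, avg searches per distinct query, distinct words)."""
    normalized = [normalize(q) for q in queries]
    distinct = set(normalized)
    words = {w for q in distinct for w in q.split(" ") if w}
    avg = len(normalized) / len(distinct) if distinct else 0.0
    return len(normalized), len(distinct), avg, len(words)


# Hypothetical sample queries.
queries = ["Car", "cars", "  CARS ", "used cars", "bus"]
total, distinct, avg_per_query, distinct_words = distinct_query_stats(queries)
print(total, distinct, round(avg_per_query, 2), distinct_words)
```

A higher average means more repeated queries, which is the link to query cache utilization mentioned on the slide.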

    Report: Words Per Query
      • Informative, slowly changing, not terribly actionable
      • Can affect search box size
      • Use AutoComplete if queries are long

    Report: Top Queries
      • User intent and information needs
      • Ensure good results
      • Calculate MRR for top N queries
      • Calculate MRR for each top N query
      • Compare to global MRR

    Report: Top Queries (cont.)
      • New top queries: new trend? New demand?
      • Best Bets (aka Query Elevation in Solr)
      • Expose before search is needed
      • Seasonality: hour of day, day of the week, etc.
          • Adjust content presentation and availability (e.g. weekday vs. weekend, business vs. personal)
          • Anticipate demand in the next cycle
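
Seasonality reports are mostly a matter of bucketing query timestamps. A small sketch, assuming each log event is an `(iso_timestamp, query)` pair in local time; the event shape and sample data are illustrative only.

```python
from collections import Counter
from datetime import datetime


def seasonality(events):
    """Count queries per hour of day and per weekday/weekend bucket.

    `events` is an iterable of (iso_timestamp, query) pairs.
    Returns (by_hour, by_day_type) Counters keyed by (query, bucket).
    """
    by_hour = Counter()
    by_day_type = Counter()
    for timestamp, query in events:
        t = datetime.fromisoformat(timestamp)
        by_hour[(query, t.hour)] += 1
        # datetime.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        day_type = "weekend" if t.weekday() >= 5 else "weekday"
        by_day_type[(query, day_type)] += 1
    return by_hour, by_day_type


# Hypothetical sample events.
events = [
    ("2011-05-25T09:15:00", "tax forms"),  # Wednesday
    ("2011-05-25T09:40:00", "tax forms"),  # Wednesday
    ("2011-05-26T10:00:00", "movies"),     # Thursday
    ("2011-05-28T21:05:00", "movies"),     # Saturday
    ("2011-05-29T20:30:00", "movies"),     # Sunday
]
by_hour, by_day_type = seasonality(events)
print(by_day_type.most_common())
```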

    Clickstream Analysis
      • Query analysis is not a complete story:
          • Queries
          • Clicks
          • (Trans)action

    Query and Hit Valuation
      • Query: by popularity (count)
      • Query: by CTR
      • Query: by subsequent (trans)action count/pct.
      • Hit: by click count
      • Hit: by subsequent (trans)action count/pct.

    Query and Hit Valuation (cont.)
      • Maximize:
          • pop(query) + ctr(query) + action(query)
          • clicks(hit) + action(hit)
      • Failures:
          • high pop(q), yet low ctr(q)
          • high pop(q), high ctr(q), yet low action(q)
      • Integration with backend required
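
The two failure patterns above can be flagged mechanically once queries, clicks, and (trans)actions are joined per query. A sketch under stated assumptions: the `QueryStats` fields and all three thresholds are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    searches: int   # times the query was issued
    clicked: int    # searches followed by at least one click
    converted: int  # searches followed by a (trans)action

    @property
    def ctr(self):
        return self.clicked / self.searches if self.searches else 0.0

    @property
    def action_rate(self):
        return self.converted / self.searches if self.searches else 0.0


def classify(stats, total_searches, pop_min=0.10, ctr_min=0.30, action_min=0.05):
    """Label a query using pop(q), ctr(q), and action(q) thresholds."""
    popularity = stats.searches / total_searches
    if popularity < pop_min:
        return "low volume"
    if stats.ctr < ctr_min:
        return "failure: high pop, low ctr"
    if stats.action_rate < action_min:
        return "failure: high pop, high ctr, low action"
    return "healthy"


# Hypothetical per-query aggregates.
all_stats = [
    QueryStats("hdtv", 400, 60, 5),
    QueryStats("laptop", 350, 210, 7),
    QueryStats("usb cable", 200, 120, 30),
    QueryStats("zune", 50, 10, 1),
]
total = sum(s.searches for s in all_stats)
labels = {s.query: classify(s, total) for s in all_stats}
print(labels)
```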

    Report: Low CTR Queries
      • Popular queries; percentage, not raw count
      • Relevance bad?
      • Default ordering bad?
      • Titles of hits need adjusting?
      • Search term highlighting looks bad?
      • Bad thumbnails? Need thumbnails?

    Report: Queries with Most Clicks
      • i.e. Queries with Highest CTR
      • Informative? Yes
      • Actionable? Somewhat: expose relevant content outside of search

    Search Session
      • Search activity aimed at satisfying a specific information need in some limited amount of time.
      • i.e. it's very fuzzy

    Interesting Search Sessions
      • More than N queries in M minutes
      • Sessions that end in a failure
      • Sessions for specific type of info (e.g. person name, product name, event)
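
Because a session is fuzzy by definition, any implementation has to pick a rule. The sketch below uses one common heuristic, an inactivity gap per user, and then flags sessions with more than N queries in M minutes. The gap length, the `(user_id, timestamp, query)` event shape, and the sample data are all assumptions.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=10)):
    """Split (user_id, timestamp, query) events into per-user sessions.

    A new session starts when a user is idle for longer than `gap`.
    """
    sessions = []
    open_session = {}  # user_id -> that user's most recent session
    for user, ts, query in sorted(events, key=lambda e: (e[0], e[1])):
        current = open_session.get(user)
        if current is None or ts - current[-1][1] > gap:
            current = []
            sessions.append(current)
            open_session[user] = current
        current.append((user, ts, query))
    return sessions


def busy_sessions(sessions, n=3, m=timedelta(minutes=5)):
    """Sessions containing more than n queries within a window of length m."""
    busy = []
    for session in sessions:
        times = [ts for _, ts, _ in session]
        # Any n + 1 consecutive queries that fit inside the window qualify.
        if any(times[i + n] - times[i] <= m for i in range(len(times) - n)):
            busy.append(session)
    return busy


# Hypothetical sample events.
day = datetime(2011, 5, 25)
events = [
    ("u1", day.replace(hour=10, minute=0), "solr facet"),
    ("u1", day.replace(hour=10, minute=1), "solr facets"),
    ("u1", day.replace(hour=10, minute=2), "solr faceting"),
    ("u1", day.replace(hour=10, minute=3), "solr facet.field"),
    ("u1", day.replace(hour=11, minute=0), "hbase"),
    ("u2", day.replace(hour=10, minute=0), "nutch"),
    ("u2", day.replace(hour=10, minute=8), "nutch crawl"),
]
sessions = sessionize(events)
busy = busy_sessions(sessions, n=3)
print(len(sessions), len(busy))
```

A burst of near-identical reformulations, like the first session here, is the kind of pattern the "more than N queries in M minutes" report is meant to surface.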

    Segmentation
      • Searches that resulted in conversion vs. not
      • Search metrics for
          • New vs. returning visitors
          • English vs. French vs. Spanish vs. …
          • Chrome vs. IE
          • ...
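
Segmentation is the same metrics computed per group. A small sketch, assuming each search record is a dict that already carries the segment attribute plus click and conversion flags; the field names are hypothetical.

```python
from collections import defaultdict


def metrics_by_segment(searches, segment_key):
    """Compute search count, CTR, and conversion rate per segment value."""
    totals = defaultdict(lambda: {"searches": 0, "clicked": 0, "converted": 0})
    for record in searches:
        t = totals[record[segment_key]]
        t["searches"] += 1
        t["clicked"] += int(record["clicked"])
        t["converted"] += int(record["converted"])
    return {
        segment: {
            "searches": t["searches"],
            "ctr": t["clicked"] / t["searches"],
            "conversion_rate": t["converted"] / t["searches"],
        }
        for segment, t in totals.items()
    }


# Hypothetical sample records.
searches = [
    {"visitor": "new", "clicked": True, "converted": False},
    {"visitor": "new", "clicked": False, "converted": False},
    {"visitor": "returning", "clicked": True, "converted": True},
    {"visitor": "returning", "clicked": True, "converted": False},
    {"visitor": "new", "clicked": False, "converted": False},
    {"visitor": "new", "clicked": True, "converted": True},
]
segments = metrics_by_segment(searches, "visitor")
print(segments)
```

The same function works for language or browser by passing a different `segment_key`, provided the records carry that field.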

    More SA Reports/Questions
      • % of queries from DYM vs. AC vs. typed
      • Most common queries per clicked hit
      • Which hits are generally popular?
      • Which hits are trending up?
      • Are there docs that are never ever clicked on?
      • Average number of queries per session
      • Breakdown of queries by number of hits

    More SA Reports/Questions (cont.)
      • Breakdown of queries by latency
      • Frequently used facets or sort criteria
      • Avg number of clicks per query
      • Time spent on site before/after searching
      • Search initiation pages
      • How deep into SERPs are people drilling?
      • Are too many clicks landing on pages other than the 1st?
      • ...
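
The SERP depth questions reduce to mapping click ranks onto result pages. A minimal sketch, assuming 1-based click ranks and 10 results per page; both are assumptions about the logging setup.

```python
from collections import Counter


def clicks_by_page(click_ranks, page_size=10):
    """Map 1-based click ranks to SERP page numbers and count clicks per page."""
    pages = Counter((rank - 1) // page_size + 1 for rank in click_ranks)
    return dict(sorted(pages.items()))


def share_beyond_first_page(click_ranks, page_size=10):
    """Fraction of clicks that landed on page 2 or deeper."""
    if not click_ranks:
        return 0.0
    deep = sum(1 for rank in click_ranks if rank > page_size)
    return deep / len(click_ranks)


# Hypothetical click ranks pulled from a click log.
ranks = [1, 1, 2, 3, 10, 11, 14, 25]
per_page = clicks_by_page(ranks)
deep_share = share_beyond_first_page(ranks)
print(per_page, deep_share)
```

A large share of clicks beyond the first page suggests the default ordering is burying what people want.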

    Data Collection

    Contact
      • sematext.com
      • blog.sematext.com
      • @sematext
      • @otisg
      • [email_address]