Search Analytics: What? Why? How? - By Otis Gospodnetic

See conference video - http://www.lucidimagination.com/devzone/events/conferences/revolution/2011

You’ve indexed your data and people are searching it. But how do you know if they are happy with
the results? How do you know if they are finding what they need? With search increasingly
becoming the primary information access mechanism, knowing how your search is doing is not just
a matter of mere curiosity, but often has direct business impact.

Published in: Technology

  • 10 days of data (5K/min)

    1. Search Analytics: What? Why? How? Otis Gospodnetić - Sematext International
    2. About Otis Gospodnetić
        • Member: Apache Lucene, Solr, Nutch, Mahout
        • Author: Lucene in Action 1 & 2
        • Entrepreneur: Sematext, Simpy
    3. About Sematext
        • Consulting, development, support:
            • Search (Lucene, Solr, Elastic Search...)
            • Big Data (Hadoop, HBase, Voldemort...)
            • Web Crawling (Nutch)
            • Machine Learning (Mahout)
    4. Agenda
        • Intro: Otis & Sematext - DONE
        • What
        • Why
        • Reports
    5. What is Search Analytics?
        • Input: queries and clicks
        • Output: reports - over time
        • Next: actions
        • The means, not the goal
        • Ongoing, not one-off
    6. Search Analytics and SEO
        • Not the same
        • SA can help with SEO
    7. Search vs. Web Analytics
        • Search analytics sees stated user intent and information needs; web analytics has to infer them
        • The two go hand in hand
        • Ideally you can relate data from both, or even unify it
    8. Why Search Analytics?
        • Measure and monitor everything
        • Supports (re)design, navigation choices
        • Helps with content acquisition & enhancement
        • Improve search experience
        • Moola (i.e. money)
    9. Report Groups
        • Failures vs. non-failures
        • Actionable vs. non-actionable
    10. Failures
        • Be aware of failures, but don't be one.
        • Zero hits
        • Low query CTR
        • High search exit rate
        • Irrelevant results
        • Over N refinements
    11. Report: Zero Hit Queries
        • Popular queries; percentage, not raw count
        • Misspellings?
        • Synonyms?
        • No matching content?
        • Need (different) tagging?
        • Bad analysis?
        • Multilingual issue?
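The zero-hit report on slide 11 could be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: it assumes a hypothetical query log of `(query, hit_count)` pairs, and the function name `zero_hit_report` and the `top_n` parameter are made up for the example. It reports zero-hit queries as a share of all searches rather than as raw counts.

```python
from collections import Counter


def zero_hit_report(log, top_n=10):
    """Summarize zero-hit queries.

    log: iterable of (query, hit_count) pairs (a hypothetical log format).
    Returns (overall_zero_hit_rate, rows), where each row is
    (query, zero_hit_count, share_of_all_searches).
    """
    total = 0
    zero = Counter()
    for query, hits in log:
        total += 1
        if hits == 0:
            # Fold case and surrounding whitespace so variants are counted together.
            zero[query.strip().lower()] += 1
    if total == 0:
        return 0.0, []
    overall = sum(zero.values()) / total
    rows = [(q, n, n / total) for q, n in zero.most_common(top_n)]
    return overall, rows


# Illustrative data only.
sample_log = [("ipod", 12), ("ipdo", 0), ("ipdo", 0), ("laptop", 40), ("sofa", 0)]
overall_rate, top_zero_hits = zero_hit_report(sample_log)
```

The most frequent rows are the ones to inspect first for the causes listed on the slide (misspellings, synonyms, missing content, and so on).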
    12. Report: Zero Hit Queries (cont.)
        • Use a query spellchecker (aka DYM, "Did You Mean")
        • Use AutoComplete
        • Use DYM ReSearcher
        • Design a useful No Results page
    13. Report: High Exit Rate Queries
        • Disappointed, frustrated users
        • Major revenue loss, no second chance
        • Marriage with Web Analytics
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    14. Report: Irrelevant Result Queries
        • Manual: top N hits of top N queries
        • Judge relevance of each hit and assign score
        • Per-query score: sum scores of top N hits
        • Cumulative top N query score: sum per-query scores
        • Automated: Mean Reciprocal Rank (MRR)
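Both scoring approaches on slide 14 are simple arithmetic, sketched below. The manual part sums judged relevance scores as the slide describes. The automated part is an assumption on my side: it uses the 1-based rank of the first clicked hit as a stand-in for the rank of the first relevant hit, which is a common way to compute MRR from click logs but is not spelled out on the slide. All names and sample values are illustrative.

```python
def per_query_score(hit_scores, n=10):
    """Sum the manually assigned relevance scores of the top n hits of one query."""
    return sum(hit_scores[:n])


def cumulative_score(judgments, n=10):
    """Sum per-query scores over all judged queries.

    judgments: dict mapping query -> list of scores for its hits, in rank order.
    """
    return sum(per_query_score(scores, n) for scores in judgments.values())


def mean_reciprocal_rank(first_click_ranks):
    """Mean of 1/rank over searches.

    first_click_ranks: 1-based rank of the first clicked hit per search,
    or None when nothing was clicked (counted as 0). Assumption: a click
    is treated as a proxy for relevance.
    """
    if not first_click_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_click_ranks if rank)
    return total / len(first_click_ranks)


# Illustrative data only.
judged = {"solr": [2, 1, 0], "lucene": [2, 2, 1]}
manual_total = cumulative_score(judged, n=3)
mrr = mean_reciprocal_rank([1, 2, None, 4])
```

Tracking either number over time matters more than its absolute value, since the scoring scale is arbitrary.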
    15. Report: Total Queries
        • Search vs. navigation/browsing
        • Search vs. overall site usage
        • Related report: % of visits with search
        • Segment: new users vs. return users, etc.
        • Questions: do you count paging? Facet selection? Re-sorting?
    16. Report: Total Distinct Queries
        • What's distinct? Car vs. Cars
        • # Total Queries / # Distinct Queries = Avg. # (of occurrences per distinct query)
        • Tied to performance and query cache utilization
        • Extension: Total distinct words in queries
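The ratio on slide 16 depends entirely on what counts as "distinct". The sketch below makes one possible choice explicit: it lowercases, collapses whitespace, and strips a trailing "s" from longer words. That plural folding is a deliberately naive stand-in for a real stemmer or analyzer, and the function names are hypothetical.

```python
def normalize(query):
    """Map a raw query to a key used to decide whether two queries are 'the same'.

    Naive assumption: drop a trailing 's' from words longer than 3 characters
    (but not 'ss'), so 'cars' and 'car' collapse. A real system would use its
    search engine's analysis chain instead.
    """
    words = []
    for word in query.lower().split():
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(words)


def distinct_query_stats(queries):
    """Return (total, distinct, average occurrences per distinct query)."""
    keys = [normalize(q) for q in queries]
    total = len(keys)
    distinct = len(set(keys))
    average = total / distinct if distinct else 0.0
    return total, distinct, average


# Illustrative data only.
total, distinct, average = distinct_query_stats(
    ["Car", "cars", "  car ", "used cars", "boat"]
)
```

A higher average means more repetition among queries, which is why the slide ties this number to query cache utilization.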
    17. Report: Words Per Query
        • Informative, slowly changing, not terribly actionable
        • Can affect search box size
        • Use AutoComplete if queries are long
    18. Report: Top Queries
        • User intent and information needs
        • Ensure good results
        • Calculate MRR for top N queries
        • Calculate MRR for each top N query
        • Compare to global MRR
    19. Report: Top Queries (cont.)
        • New top queries - new trend? New demand?
        • Best Bets (aka Query Elevation in Solr)
        • Expose before search is needed
        • Seasonality - hour of day, day of the week, etc.
            • Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal)
            • Anticipate demand in the next cycle
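The seasonality bullet on slide 19 boils down to bucketing query timestamps. A minimal sketch, assuming timestamps are Unix epoch seconds and that UTC is an acceptable time zone for the report (both are assumptions, as is the function name):

```python
from collections import Counter
from datetime import datetime, timezone

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def seasonality(timestamps):
    """Count queries per hour of day and per day of week.

    timestamps: iterable of Unix epoch seconds, interpreted as UTC here.
    Returns (by_hour, by_weekday) as Counters.
    """
    by_hour = Counter()
    by_weekday = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        by_hour[moment.hour] += 1
        by_weekday[WEEKDAYS[moment.weekday()]] += 1
    return by_hour, by_weekday


# Illustrative data only: 1970-01-01 00:00 and 01:00 (a Thursday), then the next midnight.
by_hour, by_weekday = seasonality([0, 3600, 86400])
```

Running the same bucketing per top query, rather than over all queries, shows which queries drive the weekday versus weekend differences.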
    20. Clickstream Analysis
        • Query analysis is not a complete story:
            • Queries
            • Clicks
            • (Trans)action
    21. Query and Hit Valuation
        • Query: by popularity (count)
        • Query: by CTR
        • Query: by subsequent (trans)action count/pct.
        • Hit: by click count
        • Hit: by subsequent (trans)action count/pct.
    22. Query and Hit Valuation (cont.)
        • Maximize:
            • pop(query) + ctr(query) + action(query)
            • clicks(hit) + action(hit)
        • Failures:
            • high pop(q), yet low ctr(q)
            • high pop(q), high ctr(q), yet low action(q)
        • Integration with backend required
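The two failure patterns on slide 22 can be expressed as a small classifier over per-query counts. This is a sketch under stated assumptions: the `QueryStats` record, the definition of CTR and action rate as fractions of searches, and all thresholds are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    query: str
    searches: int  # pop(q): how often the query was issued
    clicked: int   # searches followed by at least one click on a hit
    actions: int   # searches followed by a (trans)action

    @property
    def ctr(self):
        return self.clicked / self.searches if self.searches else 0.0

    @property
    def action_rate(self):
        return self.actions / self.searches if self.searches else 0.0


def classify(stats, min_searches=100, low_ctr=0.2, low_action=0.05):
    """Label a query using the slide's failure patterns (thresholds are made up)."""
    if stats.searches < min_searches:
        return "unpopular"
    if stats.ctr < low_ctr:
        return "failure: high pop, low ctr"
    if stats.action_rate < low_action:
        return "failure: high pop, high ctr, low action"
    return "ok"


# Illustrative data only.
labels = {
    s.query: classify(s)
    for s in [
        QueryStats("tv", 1000, 100, 5),
        QueryStats("sofa", 500, 300, 10),
        QueryStats("lamp", 500, 300, 50),
        QueryStats("rare item", 10, 1, 0),
    ]
}
```

The `actions` count is the part that needs the backend integration the slide mentions, since search logs alone do not record purchases or other transactions.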
    23. Report: Low CTR Queries
        • Popular queries; percentage, not raw count
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    24. Report: Queries with Most Clicks
        • i.e. Queries with Highest CTR
        • Informative? Yes
        • Actionable? Somewhat: expose relevant content outside of search
    25. Search Session
        • Search activity aimed at satisfying a specific information need in some limited amount of time.
        • i.e. it's very fuzzy
    26. Interesting Search Sessions
        • More than N queries in M minutes
        • Sessions that end in a failure
        • Sessions for specific type of info (e.g. person name, product name, event)
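The "more than N queries in M minutes" rule from slide 26 can be checked with a sliding time window per user. Since the talk itself calls the session definition fuzzy, this sketch sidesteps it: it assumes a hypothetical event format of `(user_id, timestamp_seconds, query)` and simply treats one user's queries within the window as a session proxy.

```python
from collections import defaultdict, deque


def busy_searchers(events, n=3, m_minutes=5):
    """Return the set of user ids with more than n queries in any m-minute window.

    events: iterable of (user_id, timestamp_seconds, query), in any order
    (a hypothetical log format).
    """
    window = m_minutes * 60
    stamps_by_user = defaultdict(list)
    for user, ts, _query in events:
        stamps_by_user[user].append(ts)

    flagged = set()
    for user, stamps in stamps_by_user.items():
        stamps.sort()
        recent = deque()  # timestamps inside the current window
        for ts in stamps:
            recent.append(ts)
            # Drop queries that fell out of the window ending at ts.
            while ts - recent[0] > window:
                recent.popleft()
            if len(recent) > n:
                flagged.add(user)
                break
    return flagged


# Illustrative data only: u1 searches four times in three minutes, u2 is spread out.
sample_events = [
    ("u1", 0, "solr"), ("u1", 60, "solr faceting"),
    ("u1", 120, "solr facet"), ("u1", 180, "solr facets example"),
    ("u2", 0, "hbase"), ("u2", 400, "hadoop"),
    ("u2", 800, "nutch"), ("u2", 1200, "mahout"),
]
flagged_users = busy_searchers(sample_events)
```

Flagged users are candidates for a closer look at their query refinements, which often point at the failure reports from the earlier slides.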
    27. Segmentation
        • Searches that resulted in conversion vs. not
        • Search metrics for:
            • New vs. returning visitors
            • English vs. French vs. Spanish vs. …
            • Chrome vs. IE
            • ...
    28. More SA Reports/Questions
        • % of queries from DYM vs. AC vs. typed
        • Most common queries per clicked hit
        • Which hits are generally popular?
        • Which hits are trending up?
        • Are there docs that are never ever clicked on?
        • Average number of queries per session
        • Breakdown of queries by number of hits
    29. More SA Reports/Questions
        • Breakdown of queries by latency
        • Frequently used facets or sort criteria
        • Avg number of clicks per query
        • Time spent on site before/after searching
        • Search initiation pages
        • How deep into SERPs are people drilling?
        • Are too many clicks on pages other than the 1st?
        • ...
    30. Data Collection
    31. Contact
        • sematext.com
        • blog.sematext.com
        • @sematext
        • @otisg
        • [email_address]
