Search Analytics - What? Why? How?

This presentation describes what Search Analytics is, why it is valuable, and how it can be used to improve the search experience.

Published in: Technology

  • 10 days of data (5K/min)

    1. Search Analytics - What? Why? How?
        • Otis Gospodnetić, Sematext International
        • @otisg ◦ @sematext ◦ sematext.com
        • http://sematext.com/search-analytics/index.html
    2. About Otis Gospodnetić
        • Member: Apache Lucene, Solr, Nutch, Mahout
        • Author: Lucene in Action 1 & 2
        • Entrepreneur: Sematext, Simpy
    3. About Sematext
        • Products & Services
        • Consulting, Development, Tech Support:
            • Search (Lucene, Solr, Elastic Search...)
            • Big Data (Hadoop, HBase, Voldemort...)
            • Web Crawling (Nutch, Droids)
            • Machine Learning (Mahout)
    4. Agenda
        • Intro: Otis & Sematext - DONE
        • What
        • Why
        • Specific Reports & their value
    5. What is Search Analytics?
        • Input: queries and clicks
        • Subsequent: actions / transactions / conversions
        • Output: reports, over time
        • The means, not the goal
        • Ongoing, not one-off
    6. Search Analytics and SEO
        • Not the same
        • SA can help with SEO
    7. Search vs. Web Analytics
        • Explicit user intent and information needs (search) vs. inferring them (web)
        • Hand in hand
        • Ideally you can relate data from both or even unify it
    8. Why Search Analytics?
        • Measure and monitor everything. Introspection.
        • Supports (re)design, navigation choices
        • Helps with content acquisition & enhancement
        • Improve search experience
        • Moolah (i.e. money)
    9. Report Groups
        • Failures vs. non-failures
        • Actionable vs. non-actionable
    10. Failures
        • Be aware of failures, but don't be one.
        • Zero hits
        • Low query CTR
        • High search exit rate
        • Irrelevant results
        • Over N refinements
    11. Report: Zero Hit Queries
        • Overall pct. (not raw count) vs. popular queries
        • Misspellings?
        • Synonyms?
        • No matching content?
        • Need (different) tagging?
        • Bad analysis?
        • Multilingual issue?
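The zero-hit report above can be sketched in a few lines of Python. This is only an illustration: the log format (a list of `(query, hit_count)` pairs) and the function name `zero_hit_report` are assumptions, not something the slides specify.

```python
from collections import Counter

def zero_hit_report(log, top_n=3):
    """Return (overall zero-hit percentage, most frequent zero-hit queries).

    `log` is a hypothetical iterable of (query, hit_count) pairs,
    one entry per executed search.
    """
    log = list(log)
    zero = [query for query, hits in log if hits == 0]
    # Report a percentage rather than a raw count, as the slide suggests.
    overall_pct = 100.0 * len(zero) / len(log) if log else 0.0
    # The most popular zero-hit queries are the ones worth fixing first.
    return overall_pct, Counter(zero).most_common(top_n)

sample_log = [("ipod", 12), ("ipd", 0), ("ipd", 0), ("laptop", 40), ("notebok", 0)]
pct, top_zero = zero_hit_report(sample_log)
print(pct, top_zero)
```

In this made-up sample, both zero-hit queries look like misspellings, which is the kind of pattern the questions on the slide are meant to surface.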
    13. Report: Zero Hit Queries (cont.)
        • Use Query Spellchecker (aka DYM)
        • Using AutoComplete - $MM improvement
        • Using DYM ReSearcher
        • Designing No Results page
    16. Report: High Exit Rate Queries
        • Disappointed, frustrated users
        • Major revenue loss, no second chance
        • Marriage with Web Analytics
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search-term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    17. Report: Irrelevant Result Queries
        • Manual: top N hits of top N queries
        • Judge relevance of each hit and assign score
        • Per-query score: sum scores of top N hits
        • Cumulative top N query score: sum per-query scores
        • Automated: Mean Reciprocal Rank (MRR)
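Both scoring approaches from the slide above can be sketched as follows. The slide does not say how MRR is derived from logs; this sketch assumes the rank of the highest clicked hit stands in for the first relevant result, and all function names and sample data are hypothetical.

```python
def per_query_score(judgments, n):
    """Manual approach: sum the human-assigned scores of the top n hits."""
    return sum(judgments[:n])

def cumulative_score(judged_queries, n):
    """Manual approach: sum the per-query scores over the judged queries."""
    return sum(per_query_score(j, n) for j in judged_queries.values())

def mean_reciprocal_rank(clicked_ranks_per_query):
    """Automated approach: average of 1/rank of the best clicked hit.

    Assumption: a click is treated as a relevance signal, and a query
    with no click contributes 0.
    """
    rr = [1.0 / min(ranks) if ranks else 0.0 for ranks in clicked_ranks_per_query]
    return sum(rr) / len(rr) if rr else 0.0

# Hypothetical judgments: one relevance score per hit, in rank order.
judged = {"ipod": [2, 2, 1], "laptop bag": [0, 1, 0]}
manual_total = cumulative_score(judged, n=2)

# Hypothetical click logs: 1-based ranks clicked for three query executions.
mrr = mean_reciprocal_rank([[1], [3, 5], []])
print(manual_total, round(mrr, 3))
```

An MRR close to 1 means users usually click the first hit; a low value suggests the relevant result sits further down the list or is missing.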
    18. Report: Total Queries
        • Search vs. navigation/browsing
        • Search vs. overall site usage
        • Related report: % of visits with search
        • Segment: new users vs. returning users, etc.
        • Questions: do you count paging? Facet selection? Re-sorting?
    19. Report: Total Distinct Queries
        • What's distinct? Car vs. Cars
        • # Total Queries / # Distinct Queries = Avg. # of times each distinct query is issued
        • Tied to performance and query cache utilization
        • Extension: Total distinct words in queries
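The ratio on the slide above depends on what counts as "distinct". The sketch below assumes a deliberately naive normalization (lowercasing, whitespace collapsing, and stripping a trailing "s") purely to illustrate the "Car vs. Cars" question; a real system would more likely reuse the search engine's own analysis chain. The function names are hypothetical.

```python
def normalize(query):
    """Naive normalization, for illustration only."""
    q = " ".join(query.lower().split())
    # Crude plural stripping so that "Cars" and "car" collapse together.
    return q[:-1] if q.endswith("s") and len(q) > 3 else q

def avg_executions_per_distinct_query(queries):
    """Total queries divided by distinct (normalized) queries."""
    queries = list(queries)
    distinct = {normalize(q) for q in queries}
    return len(queries) / len(distinct) if distinct else 0.0

sample = ["Car", "cars", " car ", "boat"]
ratio = avg_executions_per_distinct_query(sample)
print(ratio)
```

A higher ratio means more repeated queries, which is why the slide ties this number to query cache utilization.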
    20. Report: Words Per Query
        • Informative, slowly changing, not terribly actionable
        • Can affect search box size
        • Use AutoComplete if queries are long
    21. Report: Top Queries
        • User intent and information needs
        • Ensure good results
        • Calculate MRR for top N queries
        • Calculate MRR for each top N query
        • Compare to global MRR
    22. Report: Top Queries (cont.)
        • New top queries: new trend? New demand?
        • Best Bets (aka Query Elevation in Solr)
        • Expose before search is needed
        • Seasonality: hour of day, day of the week, etc.
            • Adjust content presentation and availability (e.g. weekday vs. weekend, business vs. personal)
            • Anticipate demand in the next cycle
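The seasonality point above can be illustrated with a small sketch that buckets queries by hour of day and by day of week. The event format (`(query, datetime)` pairs) and the function name are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime

def seasonality(events):
    """Count queries per hour of day and per weekday (0 = Monday).

    `events` is a hypothetical iterable of (query, timestamp) pairs.
    """
    by_hour = Counter()
    by_weekday = Counter()
    for _query, ts in events:
        by_hour[ts.hour] += 1
        by_weekday[ts.weekday()] += 1
    return by_hour, by_weekday

sample_events = [
    ("flights", datetime(2011, 5, 2, 9, 15)),
    ("flights", datetime(2011, 5, 2, 9, 40)),
    ("movies", datetime(2011, 5, 7, 20, 5)),
]
by_hour, by_weekday = seasonality(sample_events)
print(by_hour.most_common(1), by_weekday.most_common(1))
```

Running the same counts per query, rather than over all queries, would show which top queries peak at which times.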
    24. Clickstream Analysis
        • Query analysis is not a complete story:
            • Queries
            • Clicks
            • Actions / Transactions / Conversions
    25. Query and Hit Valuation
        • Query: by popularity (count)
        • Query: by CTR
        • Query: by subsequent (trans)action count/pct.
        • Hit: by click count
        • Hit: by subsequent (trans)action count/pct.
    26. Query and Hit Valuation (cont.)
        • Maximize:
            • pop(query) + ctr(query) + action(query)
            • clicks(hit) + action(hit)
        • Failures:
            • high pop(q), yet low ctr(q)
            • high pop(q), high ctr(q), yet low action(q)
        • Integration with backend required
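The two failure patterns above can be sketched as a simple classifier over per-query counts. Everything specific here is an assumption: the event format, the thresholds, and the choice to measure the action rate per click rather than per search.

```python
from collections import defaultdict

def query_stats(events):
    """Aggregate pop/clicks/actions per query.

    `events` is a hypothetical iterable of (query, clicked, converted)
    tuples, one per executed search.
    """
    stats = defaultdict(lambda: {"pop": 0, "clicks": 0, "actions": 0})
    for query, clicked, converted in events:
        s = stats[query]
        s["pop"] += 1
        s["clicks"] += int(clicked)
        s["actions"] += int(converted)
    return stats

def find_failures(stats, min_pop=3, low_ctr=0.2, low_action=0.2):
    """Flag popular queries with low CTR, or good CTR but few actions.

    The thresholds are illustrative defaults, not recommendations.
    """
    flagged = {}
    for query, s in stats.items():
        if s["pop"] < min_pop:
            continue  # not popular enough to matter
        ctr = s["clicks"] / s["pop"]
        action_rate = s["actions"] / s["clicks"] if s["clicks"] else 0.0
        if ctr < low_ctr:
            flagged[query] = "high pop, low CTR"
        elif action_rate < low_action:
            flagged[query] = "high pop, high CTR, low action"
    return flagged

events = (
    [("tv", False, False)] * 4
    + [("phone", True, False)] * 3 + [("phone", False, False)]
    + [("laptop", True, True)] * 2 + [("laptop", True, False)]
    + [("rare", False, False)]
)
failures = find_failures(query_stats(events))
print(failures)
```

The `converted` flag is the part that needs backend integration, since conversions are usually recorded outside the search system.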
    27. Report: Low CTR Queries
        • Percentage (not raw count) vs. popular queries
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search-term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    29. Report: Queries with Most Clicks
        • i.e. Queries with Highest CTR
        • Informative? Yes
        • Actionable? Somewhat: expose relevant content outside of search
    31. Search Session
        • Search activity aimed at satisfying a specific information need in some limited amount of time.
        • i.e. it's very fuzzy
    32. Interesting Search Sessions
        • More than N queries in M minutes
        • Sessions that end in a failure
        • Sessions for specific type of info (e.g. person name, product name, event)
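The "more than N queries in M minutes" criterion above can be sketched with a sliding window over each user's query timestamps. The event format (`(user_id, timestamp)` pairs), the function name, and the default values of N and M are assumptions for illustration.

```python
from datetime import datetime, timedelta

def busy_searchers(events, n=3, minutes=5):
    """Return user ids with more than `n` queries inside any `minutes` window.

    `events` is a hypothetical iterable of (user_id, timestamp) pairs.
    """
    by_user = {}
    for user, ts in events:
        by_user.setdefault(user, []).append(ts)

    window = timedelta(minutes=minutes)
    flagged = set()
    for user, stamps in by_user.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans at most `minutes`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 > n:
                flagged.add(user)
                break
    return flagged

base = datetime(2011, 1, 1, 12, 0)
sample = [("a", base + timedelta(minutes=m)) for m in (0, 1, 2, 3)]
sample += [("b", base + timedelta(minutes=m)) for m in (0, 10, 20, 30)]
flagged_users = busy_searchers(sample)
print(flagged_users)
```

Since the slides describe a search session as "very fuzzy", the window size is a judgment call; the flagged users are candidates for manual review rather than confirmed failures.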
    33. Segmentation
        • Searches that resulted in conversion vs. not
        • Search metrics for:
            • New vs. returning visitors
            • English vs. French vs. Spanish vs. …
            • Chrome vs. IE
            • ...
    34. More SA Reports/Questions
        • % of queries from DYM vs. AC vs. typed
        • Most common queries per clicked hit
        • Which hits are generally popular?
        • Which hits are trending up?
        • Are there docs that are never ever clicked on?
        • Average number of queries per session
        • Breakdown of queries by number of hits
    35. More SA Reports/Questions
        • Breakdown of queries by latency
        • Frequently used facets or sort criteria
        • Avg number of clicks per query
        • Time spent on site before/after searching
        • Search initiation pages
        • How deep into SERPs are people drilling?
        • Are too many clicks on pages other than 1st?
        • ...
    36. Data Collection
        • Details in "Search Analytics with Flume and HBase" at http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
    37. Sematext's Search Analytics
        • Built with Flume, HBase, Hadoop, etc.
        • Resulted in 2 open-source projects:
            • https://github.com/sematext/HBaseWD
            • https://github.com/sematext/HBaseHUT
        • See http://sematext.com/open-source/index.html
        • Resulted in patches for Flume and HBase
    38. We're Hiring
        • Dig Search?
        • Dig Analytics?
        • Dig Big Data?
        • Dig Performance?
        • Dig working with and in open-source?
        • We're hiring world-wide!
        • http://sematext.com/about/jobs.html
    39. Contact
        • sematext.com
        • blog.sematext.com
        • @sematext
        • @otisg
        • [email_address]
        • Want SA? Grab me or go to: http://sematext.com/search-analytics/index.html