Search Analytics What? Why? How?

  • 10 days of data (5K/min)
  • Transcript of "Search Analytics What? Why? How?"

    1. Search Analytics What? Why? How?
        Otis Gospodnetić – Sematext International
        Copyright 2011 Sematext Int'l. All rights reserved.
    2. About Otis Gospodnetić
        • Member: Apache Lucene, Solr, Nutch, Mahout
        • Author: Lucene in Action 1 & 2
        • Entrepreneur: Sematext, Simpy
    3. About Sematext
        • Consulting, development, support:
            • Search (Lucene, Solr, Elastic Search...)
            • Big Data (Hadoop, HBase, Voldemort...)
            • Web Crawling (Nutch)
            • Machine Learning (Mahout)
    4. Agenda
        • Intro: Otis & Sematext - DONE
        • What
        • Why
        • Reports
    5. What is Search Analytics?
        • Input: queries and clicks
        • Output: reports – over time
        • Next: actions
        • The means, not the goal
        • Ongoing, not one-off
    6. Search Analytics and SEO
        • Not the same
        • SA can help with SEO
    7. Search vs. Web Analytics
        • Explicit user intent and information needs (queries) vs. inferring them
        • Hand in hand
        • Ideally you can relate data from both, or even unify it
    8. Why Search Analytics?
        • Measure and monitor everything
        • Supports (re)design and navigation choices
        • Helps with content acquisition & enhancement
        • Improves the search experience
        • Mula (i.e. money)
    9. Report Groups
        • Failures vs. non-failures
        • Actionable vs. non-actionable
    10. Failures
        • Be aware of failures, but don't be one.
        • Zero hits
        • Low query CTR (click-through rate)
        • High search exit rate
        • Irrelevant results
        • Over N refinements
    11. Report: Zero Hit Queries
        • Popular queries; percentage, not raw count
        • Misspellings?
        • Synonyms?
        • No matching content?
        • Need (different) tagging?
        • Bad analysis?
        • Multilingual issue?
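To make "percentage, not raw count" concrete, here is a minimal Python sketch that ranks popular queries by the share of their searches that returned zero hits. It assumes a hypothetical log of `(query, num_hits)` pairs and a deliberately naive normalization (trim and lowercase); `zero_hit_report` and `min_count` are illustrative names, not part of any Sematext product.

```python
from collections import Counter


def zero_hit_report(log, min_count=2):
    """Zero-hit percentage for popular queries, worst first.

    `log` is assumed to be an iterable of (query, num_hits) pairs, one per
    search; this record shape is hypothetical.
    """
    totals = Counter()
    zeros = Counter()
    for query, num_hits in log:
        q = query.strip().lower()  # naive normalization
        totals[q] += 1
        if num_hits == 0:
            zeros[q] += 1
    # Keep queries seen at least `min_count` times that failed at least once
    report = [
        (q, 100.0 * zeros[q] / totals[q])
        for q in totals
        if totals[q] >= min_count and zeros[q] > 0
    ]
    # Highest zero-hit percentage first; ties broken by popularity
    report.sort(key=lambda item: (-item[1], -totals[item[0]]))
    return report


log = [("solr", 10), ("slor", 0), ("Slor ", 0), ("lucene", 0), ("lucene", 5), ("hbase", 0)]
print(zero_hit_report(log))  # [('slor', 100.0), ('lucene', 50.0)]
```

Rare queries ("hbase" here) are filtered out by `min_count`, so the report focuses on failures that affect many users.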
    12. Report: Zero Hit Queries (cont.)
        • Use a query spellchecker (aka DYM, "Did You Mean")
        • Use AutoComplete
        • Use DYM ReSearcher
        • Design the No Results page
    13. Report: High Exit Rate Queries
        • Disappointed, frustrated users
        • Major revenue loss, no second chance
        • Marriage with Web Analytics
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    14. Report: Irrelevant Result Queries
        • Manual: top N hits of top N queries
            • Judge the relevance of each hit and assign it a score
            • Per-query score: sum the scores of the top N hits
            • Cumulative top N query score: sum the per-query scores
        • Automated: Mean Reciprocal Rank (MRR)
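The manual scoring procedure above reduces to two sums. The sketch below assumes a hypothetical 0 (irrelevant) to 3 (perfect) grading scale and a dict of judged queries; the helper names are illustrative. Re-judging the same top queries after each relevance change and comparing cumulative scores shows whether the change helped.

```python
def per_query_score(grades, n):
    """Sum of the relevance grades assigned to the top-n hits of one query."""
    return sum(grades[:n])


def cumulative_score(judgments, n):
    """Sum of per-query scores over all judged top queries."""
    return sum(per_query_score(grades, n) for grades in judgments.values())


# Hypothetical judgments: query -> grades of its hits in rank order,
# on an assumed 0 (irrelevant) to 3 (perfect) scale.
judgments = {
    "solr replication": [3, 2, 0, 1],
    "lucene scoring": [1, 0, 0],
}
print({q: per_query_score(g, 3) for q, g in judgments.items()})
# {'solr replication': 5, 'lucene scoring': 1}
print(cumulative_score(judgments, 3))  # 6
```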
    15. Report: Total Queries
        • Search vs. navigation/browsing
        • Search vs. overall site usage
        • Related report: % of visits with search
        • Segment: new users vs. returning users, etc.
        • Questions: do you count paging? Facet selection? Re-sorting?
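Whether paging, facet selection, and re-sorting count as queries changes both the total and the "% of visits with search" figure. This sketch assumes hypothetical per-visit lists of request kinds ("new", "page", "facet", "sort") and lets the caller choose which kinds to count.

```python
def search_usage(visits, counted_kinds=("new",)):
    """Total counted search requests and % of visits containing one.

    `visits` maps a visit id to the kinds of search requests made during
    that visit; the kind labels are hypothetical.
    """
    total = sum(1 for kinds in visits.values() for k in kinds if k in counted_kinds)
    with_search = sum(
        1 for kinds in visits.values() if any(k in counted_kinds for k in kinds)
    )
    pct = 100.0 * with_search / len(visits) if visits else 0.0
    return total, pct


visits = {"v1": ["new", "page", "new"], "v2": [], "v3": ["facet"], "v4": ["new", "sort"]}
print(search_usage(visits))                                    # (3, 50.0)
print(search_usage(visits, ("new", "page", "facet", "sort")))  # (6, 75.0)
```

The same visits yield 3 or 6 queries depending on the counting rule, which is why the rule should be fixed before trends are compared.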
    16. Report: Total Distinct Queries
        • What's distinct? Car vs. Cars
        • # Total Queries / # Distinct Queries = Avg. # of times each distinct query is repeated
        • Tied to performance and query cache utilization
        • Extension: total distinct words in queries
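A sketch of the distinct-query numbers, assuming a crude, hypothetical normalization that folds case, whitespace, and a trailing plural "s" so that "Car" and "Cars" count as one query. A real system would more likely reuse the search engine's own analysis chain (e.g. its stemmer), so the report matches how queries are actually matched.

```python
import re
from collections import Counter


def normalize(query):
    """Hypothetical normalization: lowercase, collapse whitespace, and strip
    a plural "s" from words longer than three letters ("Cars" -> "car")."""
    words = re.sub(r"\s+", " ", query.strip().lower()).split(" ")
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)


def distinct_query_stats(queries):
    counts = Counter(normalize(q) for q in queries)
    total = sum(counts.values())
    distinct = len(counts)
    distinct_words = {w for q in counts for w in q.split()}
    return {
        "total": total,
        "distinct": distinct,
        # Average number of times each distinct query is repeated
        "avg_repeats": total / distinct if distinct else 0.0,
        "distinct_words": len(distinct_words),
    }


print(distinct_query_stats(["Car", "cars", " CAR ", "red truck", "Red  Trucks"]))
# {'total': 5, 'distinct': 2, 'avg_repeats': 2.5, 'distinct_words': 3}
```

A higher `avg_repeats` means the same queries recur often, which generally favors a query result cache.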
    17. Report: Words Per Query
        • Informative, slowly changing, not terribly actionable
        • Can affect search box size
        • Use AutoComplete if queries are long
    18. Report: Top Queries
        • User intent and information needs
        • Ensure good results
        • Calculate MRR for the top N queries combined
        • Calculate MRR for each top N query
        • Compare to global MRR
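The following sketch computes global MRR, MRR for the top N queries combined, and MRR for each top query. Classic MRR uses the rank of the first relevant result; here the rank of the first clicked hit stands in for it, which is an assumption, since clicks are only a noisy relevance signal. The `(query, clicked_positions)` event shape is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank(clicked_positions):
    """1 / rank of the first clicked hit (ranks are 1-based); 0.0 if no click."""
    return 1.0 / min(clicked_positions) if clicked_positions else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def mrr_report(events, top_n=10):
    """events: iterable of (query, clicked_positions), one entry per search."""
    rr_by_query = defaultdict(list)
    for query, positions in events:
        rr_by_query[query].append(reciprocal_rank(positions))
    # Top N queries by popularity (number of searches)
    top = sorted(rr_by_query, key=lambda q: len(rr_by_query[q]), reverse=True)[:top_n]
    global_mrr = mean([rr for rrs in rr_by_query.values() for rr in rrs])
    top_mrr = mean([rr for q in top for rr in rr_by_query[q]])
    per_query = {q: mean(rr_by_query[q]) for q in top}
    return global_mrr, top_mrr, per_query


events = [("solr", [1]), ("solr", [2, 3]), ("solr", []), ("hbase", [4]), ("nutch", [1])]
print(mrr_report(events, top_n=1))
```

A top query whose MRR sits well below the global MRR is a good candidate for relevance tuning or a Best Bet.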
    19. Report: Top Queries (cont.)
        • New top queries – new trend? New demand?
        • Best Bets (aka Query Elevation in Solr)
        • Expose matching content before search is needed
        • Seasonality – hour of day, day of the week, etc.
            • Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal)
            • Anticipate demand in the next cycle
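For seasonality, a first step is bucketing a query's occurrences by hour of day and by weekday vs. weekend. The `(timestamp, query)` event shape below is hypothetical; the code uses only the standard `datetime` module, where `weekday()` returns 0 for Monday through 6 for Sunday.

```python
from collections import Counter
from datetime import datetime


def seasonality(events, query):
    """Hour-of-day and weekday/weekend counts for one query.

    events: iterable of (timestamp, query); the shape is hypothetical.
    """
    by_hour = Counter()
    by_day_type = Counter()
    for ts, q in events:
        if q != query:
            continue
        by_hour[ts.hour] += 1
        # datetime.weekday(): Monday == 0 ... Sunday == 6
        by_day_type["weekend" if ts.weekday() >= 5 else "weekday"] += 1
    return by_hour, by_day_type


events = [
    (datetime(2011, 1, 1, 10), "grill"),  # Saturday
    (datetime(2011, 1, 2, 10), "grill"),  # Sunday
    (datetime(2011, 1, 3, 9), "grill"),   # Monday
    (datetime(2011, 1, 3, 9), "laptop"),
]
by_hour, by_day_type = seasonality(events, "grill")
print(dict(by_hour), dict(by_day_type))
```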
    20. Clickstream Analysis
        • Query analysis alone is not the complete story, which is:
            • Queries
            • Clicks
            • (Trans)action
    21. Query and Hit Valuation
        • Query: by popularity (count)
        • Query: by CTR
        • Query: by subsequent (trans)action count/pct.
        • Hit: by click count
        • Hit: by subsequent (trans)action count/pct.
    22. Query and Hit Valuation (cont.)
        • Maximize: pop(query) + ctr(query) + action(query) + clicks(hit) + action(hit)
        • Failures:
            • high pop(q), yet low ctr(q)
            • high pop(q), high ctr(q), yet low action(q)
        • Integration with backend required
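The two failure patterns above can be flagged mechanically once each search is joined with click and backend (trans)action data. In this sketch the event shape `(query, clicked, converted)` and the thresholds `min_pop`, `low_ctr`, and `low_action` are hypothetical; CTR is taken as the fraction of searches with at least one click, and "high CTR" is approximated as "not below `low_ctr`".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryStats:
    searches: int = 0
    clicks: int = 0   # searches with at least one click
    actions: int = 0  # searches followed by a (trans)action

    @property
    def ctr(self):
        return self.clicks / self.searches if self.searches else 0.0

    @property
    def action_rate(self):
        return self.actions / self.searches if self.searches else 0.0


def value_queries(events, min_pop=100, low_ctr=0.2, low_action=0.02):
    stats = defaultdict(QueryStats)
    for query, clicked, converted in events:
        s = stats[query]
        s.searches += 1
        s.clicks += bool(clicked)
        s.actions += bool(converted)
    failures = {}
    for q, s in stats.items():
        if s.searches < min_pop:
            continue  # only popular queries are judged
        if s.ctr < low_ctr:
            failures[q] = "popular, low CTR"
        elif s.action_rate < low_action:
            failures[q] = "popular, good CTR, low action"
    return stats, failures


events = (
    [("tv", False, False)] * 4
    + [("phone", True, False)] * 4
    + [("camera", True, True), ("camera", True, False), ("camera", False, False), ("camera", False, False)]
    + [("rare", False, False)]
)
stats, failures = value_queries(events, min_pop=3)
print(failures)
```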
    23. Report: Low CTR Queries
        • Popular queries; percentage, not raw count
        • Relevance bad?
        • Default ordering bad?
        • Titles of hits need adjusting?
        • Search term highlighting looks bad?
        • Bad thumbnails? Need thumbnails?
    24. Report: Queries with Most Clicks
        • i.e. Queries with Highest CTR
        • Informative? Yes
        • Actionable? Somewhat: expose relevant content outside of search
    25. Search Session
        • Search activity aimed at satisfying a specific information need in some limited amount of time.
        • i.e. it's very fuzzy
    26. Interesting Search Sessions
        • More than N queries in M minutes
        • Sessions that end in a failure
        • Sessions for a specific type of info (e.g. person name, product name, event)
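Because a session is fuzzy, any implementation has to pick a definition. This sketch assumes a common heuristic (a new session after 30 minutes of inactivity) and a hypothetical `(user_id, timestamp, query, num_hits)` event shape. It flags two of the interesting cases: more than N queries in M minutes, and sessions whose last search returned zero hits.

```python
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Group searches into per-user sessions, splitting on inactivity.

    events: iterable of (user_id, timestamp, query, num_hits).
    The 30-minute gap is an assumed heuristic, not a standard.
    """
    sessions, current, last_seen = [], {}, {}
    for user, ts, query, hits in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in current or ts - last_seen[user] > gap:
            current[user] = []
            sessions.append(current[user])
        current[user].append((ts, query, hits))
        last_seen[user] = ts
    return sessions


def interesting_sessions(sessions, n=5, m=timedelta(minutes=2)):
    """Flag sessions with more than n queries within m, or ending in zero hits."""
    flagged = []
    for session in sessions:
        times = [ts for ts, _, _ in session]
        # n + 1 queries falling inside one window of length m
        burst = any(times[i + n] - times[i] <= m for i in range(len(times) - n))
        ends_in_failure = session[-1][2] == 0
        if burst or ends_in_failure:
            flagged.append({"session": session, "burst": burst, "failure": ends_in_failure})
    return flagged


t0 = datetime(2011, 6, 1, 12, 0)
events = [
    ("a", t0, "solr", 10),
    ("a", t0 + timedelta(seconds=10), "solr cloud", 3),
    ("a", t0 + timedelta(seconds=20), "solrcloud", 2),
    ("a", t0 + timedelta(seconds=30), "solr cloud setup", 4),
    ("b", t0, "hbase", 5),
    ("b", t0 + timedelta(hours=1), "hbase coprocessor", 0),
]
sessions = sessionize(events)
flagged = interesting_sessions(sessions, n=3)
print(len(sessions), [(f["burst"], f["failure"]) for f in flagged])
```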
    27. Segmentation
        • Searches that resulted in conversion vs. not
        • Search metrics for:
            • New vs. returning visitors
            • English vs. French vs. Spanish vs. …
            • Chrome vs. IE
            • ...
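Segmentation amounts to computing the same metric per segment and comparing the results. The sketch below assumes hypothetical search records with `visitor`, `hits`, and `converted` fields; any segment key (language, browser) or per-search metric can be plugged in.

```python
from collections import defaultdict


def metric_by_segment(searches, segment_of, metric):
    """Average a per-search metric within each segment.

    segment_of and metric are caller-supplied functions; the record fields
    used in the example ("visitor", "hits", "converted") are hypothetical.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in searches:
        seg = segment_of(s)
        totals[seg] += metric(s)
        counts[seg] += 1
    return {seg: totals[seg] / counts[seg] for seg in counts}


def by_visitor(s):
    return s["visitor"]


searches = [
    {"visitor": "new", "hits": 0, "converted": False},
    {"visitor": "new", "hits": 3, "converted": True},
    {"visitor": "returning", "hits": 5, "converted": True},
    {"visitor": "returning", "hits": 2, "converted": True},
]
print(metric_by_segment(searches, by_visitor, lambda s: s["hits"] == 0))  # {'new': 0.5, 'returning': 0.0}
print(metric_by_segment(searches, by_visitor, lambda s: s["converted"]))  # {'new': 0.5, 'returning': 1.0}
```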
    28. More SA Reports/Questions
        • % of queries from DYM vs. AC (AutoComplete) vs. typed
        • Most common queries per clicked hit
        • Which hits are generally popular?
        • Which hits are trending up?
        • Are there docs that are never ever clicked on?
        • Average number of queries per session
        • Breakdown of queries by number of hits
    29. More SA Reports/Questions
        • Breakdown of queries by latency
        • Frequently used facets or sort criteria
        • Avg. number of clicks per query
        • Time spent on site before/after searching
        • Search initiation pages
        • How deep into SERPs (search results pages) are people drilling?
        • Are too many clicks on pages other than the 1st?
        • ...
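Two of these questions, click depth into SERPs and the latency breakdown, are simple bucketing exercises. The sketch assumes 1-based hit positions, a hypothetical page size of 10, and arbitrary latency bounds in milliseconds; all names are illustrative.

```python
from bisect import bisect_right
from collections import Counter


def serp_page(position, page_size=10):
    """1-based SERP page number of a 1-based hit position."""
    return (position - 1) // page_size + 1


def click_depth(click_positions, page_size=10):
    """Clicks per SERP page, plus the % of clicks landing beyond page 1."""
    pages = Counter(serp_page(p, page_size) for p in click_positions)
    total = sum(pages.values())
    beyond_first = 100.0 * (total - pages[1]) / total if total else 0.0
    return pages, beyond_first


def latency_breakdown(latencies_ms, bounds=(100, 500, 1000)):
    """Count queries per latency bucket; bounds are exclusive upper limits in ms."""
    labels = [f"<{b}ms" for b in bounds] + [f">={bounds[-1]}ms"]
    return Counter(labels[bisect_right(bounds, ms)] for ms in latencies_ms)


pages, beyond_first = click_depth([1, 3, 10, 11, 25])
print(dict(pages), beyond_first)  # {1: 3, 2: 1, 3: 1} 40.0
print(dict(latency_breakdown([50, 100, 999, 1000, 2500])))
```

A large share of clicks beyond page 1 suggests the best hits are not being ranked high enough.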
    30. Data Collection
    31. Contact
        • sematext.com
        • blog.sematext.com
        • @sematext
        • @otisg
        • [email_address]