Search Analytics - What? Why? How?

This presentation describes what Search Analytics is, why it is valuable, and how it can be used to improve the search experience.

Published in: Technology
  • 10 days of data (5K/min)
  • Transcript

    • 1. Search Analytics: What? Why? How? Otis Gospodnetić – Sematext International @otisg ◦ @sematext ◦ sematext.com http://sematext.com/search-analytics/index.html
    • 2. About Otis Gospodnetić
      • Member: Apache Lucene, Solr, Nutch, Mahout
      • Author: Lucene in Action 1 & 2
      • Entrepreneur: Sematext, Simpy
    • 3. About Sematext
      • Products & Services
      • Consulting, Development, Tech Support:
      • Search (Lucene, Solr, Elastic Search...)
      • Big Data (Hadoop, HBase, Voldemort...)
      • Web Crawling (Nutch, Droids)
      • Machine Learning (Mahout)
    • 4. Agenda
      • Intro: Otis & Sematext - DONE
      • What
      • Why
      • Specific Reports & their value
    • 5. What is Search Analytics?
      • Input: queries and clicks
      • Subsequent: actions / transactions / conversions
      • Output: reports – over time
      • The means, not the goal
      • Ongoing, not one-off
    • 6. Search Analytics and SEO
      • Not the same
      • SA can help with SEO
    • 7. Search vs. Web Analytics
      • User intent and information needs vs. inferring
      • Hand in hand
      • Ideally you can relate data from both or even unify it
    • 8. Why Search Analytics?
      • Measure and monitor everything. Introspection.
      • Supports (re)design, navigation choices
      • Helps with content acquisition & enhancement
      • Improve search experience
      • Moola (money)
    • 9. Report Groups
      • Failures vs. non-failures
      • Actionable vs. non-actionable
    • 10. Failures
      • Be aware of failures, but don't be one.
      • Zero hits
      • Low query CTR
      • High search exit rate
      • Irrelevant results
      • Over N refinements
    • 11. Report: Zero Hit Queries
      • Overall pct. (not raw count) vs. popular queries
      • Misspellings?
      • Synonyms?
      • No matching content?
      • Need (different) tagging?
      • Bad analysis?
      • Multilingual issue?
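
The zero-hit report above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the log format (pairs of query string and hit count) and the function name `zero_hit_report` are assumptions made for the example.

```python
from collections import Counter

def zero_hit_report(log, top_n=10):
    """Return (zero-hit percentage, most frequent zero-hit queries).

    log: iterable of (query, hit_count) pairs. This log format is a
    hypothetical one chosen for illustration.
    """
    total = 0
    zero = Counter()
    for query, hits in log:
        total += 1
        if hits == 0:
            # Light normalization so "Ipohne" and "ipohne " count together.
            zero[query.strip().lower()] += 1
    zero_total = sum(zero.values())
    pct = 100.0 * zero_total / total if total else 0.0
    return pct, zero.most_common(top_n)

# Made-up sample data.
sample_log = [("iphone", 120), ("ipohne", 0), ("Ipohne", 0), ("sofa", 0), ("tv", 45)]
pct, worst = zero_hit_report(sample_log)
```

Reporting the percentage alongside the most frequent offenders reflects the slide's point: the overall rate shows the trend, while the popular zero-hit queries show where to start fixing (spelling, synonyms, missing content).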
    • 13. Report: Zero Hit Queries (cont.)
      • Use Query Spellchecker (aka DYM)
      • Using AutoComplete - $MM improvement
      • Using DYM ReSearcher
      • Designing No Results page
    • 16. Report: High Exit Rate Queries
      • Disappointed, frustrated users
      • Major revenue loss, no second chance
      • Marriage with Web Analytics
      • Relevance bad?
      • Default ordering bad?
      • Titles of hits need adjusting?
      • Search-term highlighting looks bad?
      • Bad thumbnails? Need thumbnails?
    • 17. Report: Irrelevant Result Queries
      • Manual: top N hits of top N queries
      • Judge relevance of each hit and assign score
      • Per-query score: sum scores of top N hits
      • Cumulative top N query score: sum per-query scores
      • Automated: Mean Reciprocal Rank (MRR)
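
Both scoring approaches on this slide can be sketched in a few lines. The manual part sums judged scores as described above. The automated part computes Mean Reciprocal Rank, here using the rank of the first clicked hit as a proxy for the first relevant hit; that proxy, the data shapes, and the function names are assumptions for illustration, since the slides do not specify them.

```python
def per_query_score(hit_scores):
    """Sum of the manually assigned scores for one query's top N hits."""
    return sum(hit_scores)

def cumulative_score(judgments):
    """judgments: dict mapping query -> list of manual scores for its top N hits."""
    return sum(per_query_score(scores) for scores in judgments.values())

def mean_reciprocal_rank(first_click_ranks):
    """first_click_ranks: one entry per query, the 1-based rank of the first
    clicked hit, or None when nothing was clicked (contributes 0).
    Treating a click as "relevant" is an assumption of this sketch."""
    if not first_click_ranks:
        return 0.0
    reciprocal = [1.0 / rank if rank else 0.0 for rank in first_click_ranks]
    return sum(reciprocal) / len(reciprocal)

# Made-up sample data.
manual = {"car": [2, 1, 0], "sofa": [2, 2, 1]}
total_score = cumulative_score(manual)
mrr = mean_reciprocal_rank([1, 2, None, 4])
```

An MRR close to 1 means users typically find something worth clicking at the top of the list; a falling MRR over time is a signal to review relevance.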
    • 18. Report: Total Queries
      • Search vs. navigation/browsing
      • Search vs. overall site usage
      • Related report: % of visits with search
      • Segment: new users vs. returning users, etc.
      • Questions: do you count paging? Facet selection? Re-sorting?
    • 19. Report: Total Distinct Queries
      • What's distinct? Car vs. Cars
      • # Total Queries / # Distinct Queries = Avg. # of occurrences per distinct query
      • Tied to performance and query cache utilization
      • Extension: Total distinct words in queries
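
How "distinct" is defined changes the numbers, as the Car vs. Cars example suggests. The sketch below uses a deliberately naive normalization (lowercasing, whitespace collapsing, stripping a trailing "s"); that rule is a stand-in assumption, and a real report would more likely reuse the search engine's own analysis chain.

```python
def normalize(query):
    """Hypothetical normalization: lowercase, collapse whitespace, and strip
    a trailing "s" from words longer than three characters."""
    words = query.lower().split()
    return " ".join(w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words)

def distinct_query_stats(queries):
    """Return (total queries, distinct queries, average occurrences per distinct query)."""
    total = len(queries)
    distinct = len({normalize(q) for q in queries})
    avg = total / distinct if distinct else 0.0
    return total, distinct, avg

# Made-up sample data.
total, distinct, avg = distinct_query_stats(["Car", "cars", "car ", "used cars", "sofa"])
```

A high average means many repeated queries, which is the case where a query cache pays off.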
    • 20. Report: Words Per Query
      • Informative, slowly changing, not terribly actionable
      • Can affect search box size
      • Use AutoComplete if queries are long
    • 21. Report: Top Queries
      • User intent and information needs
      • Ensure good results
      • Calculate MRR for top N queries
      • Calculate MRR for each top N query
      • Compare to global MRR
    • 22. Report: Top Queries (cont.)
      • New top queries – new trend? New demand?
      • Best Bets (aka Query Elevation in Solr)
      • Expose before search is needed
      • Seasonality – hour of day, day of the week, etc.
        • Adjust content presentation and availability (e.g. week vs. weekend, business vs. personal)
        • Anticipate demand in the next cycle
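
Seasonality can be explored by bucketing query timestamps. The following is a minimal sketch assuming the query log yields Python `datetime` objects; the function name and the choice of buckets (hour of day, day of week) are illustrative.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def seasonality(timestamps):
    """Count queries per hour of day and per day of week.

    timestamps: iterable of datetime objects (hypothetical log input).
    """
    by_hour = Counter()
    by_weekday = Counter()
    for ts in timestamps:
        by_hour[ts.hour] += 1
        by_weekday[WEEKDAYS[ts.weekday()]] += 1  # weekday(): Monday is 0
    return by_hour, by_weekday

# Made-up sample data.
sample = [
    datetime(2011, 5, 2, 9, 15),
    datetime(2011, 5, 2, 9, 40),
    datetime(2011, 5, 7, 21, 5),
]
by_hour, by_weekday = seasonality(sample)
```

Running the same counts per query rather than over all queries would show which top queries are tied to a particular hour or day.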
    • 24. Clickstream Analysis
      • Query analysis is not a complete story:
        • Queries
        • Clicks
        • Actions / Transactions / Conversions
    • 25. Query and Hit Valuation
      • Query: by popularity (count)
      • Query: by CTR
      • Query: by subsequent (trans)action count/pct.
      • Hit: by click count
      • Hit: by subsequent (trans)action count/pct.
    • 26. Query and Hit Valuation (cont.)
      • Maximize:
        • pop(query) + ctr(query) + action(query)
        • clicks(hit) + action(hit)
      • Failures:
        • high pop(q), yet low ctr(q)
        • high pop(q), high ctr(q), yet low action(q)
      • Integration with backend required
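
The two failure patterns on this slide can be flagged mechanically once popularity, CTR, and action data are joined. This is a sketch under stated assumptions: the per-query tuple layout and all three thresholds are hypothetical values picked for illustration, not figures from the presentation.

```python
def classify_queries(stats, min_pop=100, min_ctr=0.2, min_action_rate=0.05):
    """Flag popular queries that fail on CTR or on follow-up actions.

    stats: dict mapping query -> (searches, searches_with_click,
    searches_with_action). Layout and thresholds are assumptions.
    """
    failures = {}
    for query, (searches, clicked, acted) in stats.items():
        if searches < max(min_pop, 1):
            continue  # not popular enough to judge (also avoids dividing by zero)
        ctr = clicked / searches
        action_rate = acted / searches
        if ctr < min_ctr:
            failures[query] = "high pop, low ctr"
        elif action_rate < min_action_rate:
            failures[query] = "high pop, high ctr, low action"
    return failures

# Made-up sample data.
sample_stats = {
    "laptop": (500, 50, 5),
    "tv": (400, 200, 4),
    "sofa": (300, 150, 60),
    "rare thing": (3, 0, 0),
}
flagged = classify_queries(sample_stats)
```

The action counts are the part that needs the backend integration the slide mentions, because clicks alone do not show whether a search led to a transaction.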
    • 27. Report: Low CTR Queries
      • Percentage (not raw count) vs. popular queries
      • Relevance bad?
      • Default ordering bad?
      • Titles of hits need adjusting?
      • Search-term highlighting looks bad?
      • Bad thumbnails? Need thumbnails?
    • 29. Report: Queries with Most Clicks
      • i.e. Queries with Highest CTR
      • Informative? Yes
      • Actionable? Somewhat: expose relevant content outside of search
    • 31. Search Session
      • Search activity aimed at satisfying a specific information need in some limited amount of time.
      • i.e. it's very fuzzy
    • 32. Interesting Search Sessions
      • More than N queries in M minutes
      • Sessions that end in a failure
      • Sessions for specific type of info (e.g. person name, product name, event)
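
The "more than N queries in M minutes" rule can be sketched with a sliding window over each session's query timestamps. The event format (session id plus `datetime`) and the default values of N and M are assumptions for illustration; how a session id is assigned in the first place is outside this sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def busy_sessions(events, n=3, minutes=5):
    """Return ids of sessions with more than n queries inside any window of
    `minutes` minutes.

    events: iterable of (session_id, datetime) pairs (hypothetical format).
    """
    window = timedelta(minutes=minutes)
    by_session = defaultdict(list)
    for session_id, ts in events:
        by_session[session_id].append(ts)

    flagged = set()
    for session_id, times in by_session.items():
        times.sort()
        start = 0
        for end, ts in enumerate(times):
            # Shrink the window from the left until it spans at most `window`.
            while ts - times[start] > window:
                start += 1
            if end - start + 1 > n:
                flagged.add(session_id)
                break
    return flagged

# Made-up sample data: session "a" searches every minute, "b" every ten minutes.
base = datetime(2011, 5, 2, 9, 0)
sample_events = [("a", base + timedelta(minutes=m)) for m in range(4)]
sample_events += [("b", base + timedelta(minutes=10 * m)) for m in range(4)]
busy = busy_sessions(sample_events)
```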
    • 33. Segmentation
      • Searches that resulted in conversion vs. not
      • Search metrics for:
        • New vs. returning visitors
        • English vs. French vs. Spanish vs. …
        • Chrome vs. IE
        • ...
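
Segmentation amounts to computing the same metric per group. The sketch below computes a conversion rate per segment; the record fields (`visitor`, `language`, `converted`) are hypothetical names chosen for the example.

```python
from collections import defaultdict

def conversion_by_segment(searches, key):
    """Return conversion rate per segment.

    searches: list of dicts with a boolean "converted" field plus segment
    fields; key: the field to segment on. Field names are assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [searches, conversions]
    for search in searches:
        bucket = counts[search[key]]
        bucket[0] += 1
        if search["converted"]:
            bucket[1] += 1
    return {segment: conv / n for segment, (n, conv) in counts.items()}

# Made-up sample data.
sample_searches = [
    {"visitor": "new", "language": "en", "converted": False},
    {"visitor": "new", "language": "fr", "converted": True},
    {"visitor": "returning", "language": "en", "converted": True},
    {"visitor": "returning", "language": "en", "converted": True},
]
rates = conversion_by_segment(sample_searches, "visitor")
```

Passing `"language"` as the key instead gives the per-language breakdown from the same records.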
    • 34. More SA Reports/Questions
      • % of queries from DYM vs. AC vs. typed
      • Most common queries per clicked hit
      • Which hits are generally popular?
      • Which hits are trending up?
      • Are there docs that are never ever clicked on?
      • Average number of queries per session
      • Breakdown of queries by number of hits
    • 35. More SA Reports/Questions (cont.)
      • Breakdown of queries by latency
      • Frequently used facets or sort criteria
      • Avg number of clicks per query
      • Time spent on site before/after searching
      • Search initiation pages
      • How deep into SERPs are people drilling?
      • Are too many clicks on pages other than the 1st?
      • ...
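
The question of how deep into SERPs people drill can be answered from click ranks alone. This is a minimal sketch assuming a log of 1-based ranks of clicked hits and a fixed page size of 10; both are assumptions for illustration.

```python
from collections import Counter

def clicks_by_page(click_ranks, page_size=10):
    """Return (clicks per result page, share of clicks beyond page 1).

    click_ranks: iterable of 1-based ranks of clicked hits (hypothetical input).
    """
    pages = Counter((rank - 1) // page_size + 1 for rank in click_ranks)
    total = sum(pages.values())
    beyond_first = (total - pages[1]) / total if total else 0.0
    return pages, beyond_first

# Made-up sample data.
pages, beyond_first = clicks_by_page([1, 3, 10, 11, 25])
```

A large share of clicks beyond the first page suggests that relevant hits are ranked too low for the queries involved.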
    • 36. Data Collection
      • Details in Search Analytics with Flume and HBase on http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
    • 37. Sematext's Search Analytics
      • Built with Flume, HBase, Hadoop, etc.
      • Resulted in 2 open-source projects:
      • https://github.com/sematext/HBaseWD
      • https://github.com/sematext/HBaseHUT
      • See http://sematext.com/open-source/index.html
      • Resulted in patches for Flume and HBase
    • 38. We're Hiring
      • Dig Search?
      • Dig Analytics?
      • Dig Big Data?
      • Dig Performance?
      • Dig working with and in open-source?
      • We're hiring world-wide!
      • http://sematext.com/about/jobs.html
    • 39. Contact
      • sematext.com
      • blog.sematext.com
      • @sematext
      • @otisg
      • [email_address]
      • Want SA? Grab me or go to:
      • http://sematext.com/search-analytics/index.html