Search Analytics at Enterprise Search Summit Fall 2011

This presentation describes what Search Analytics is, what value it brings, how it can be used, and what additional functionality and value can be built with search data.

  • 10 days of data (5K/min)

    1. Search Analytics: What? Why? How? Otis Gospodnetić, Sematext International. @otisg ◦ @sematext ◦ sematext.com ◦ sematext.com/search-analytics
    2. About Otis Gospodnetić
       • ASF Member: Lucene, Solr, Nutch, Mahout
       • Author: Lucene in Action 1 & 2
       • Entrepreneur: Sematext, Simpy
    3. Sematext Metrics
       • 100% organic: no GMO, no VC
       • 4 years old
       • < 10 people
       • 7 countries
       • 3 timezones
       • 2 continents
       • > 100 customers
    4. About Sematext
       • Products & Services
       • Consulting, Development, Tech Support:
       • Search (Lucene, Solr, ElasticSearch...)
       • Big Data (Hadoop, HBase, Voldemort...)
       • Web Crawling (Nutch, Droids)
       • Machine Learning (Mahout)
    5. Agenda
       • What is Search Analytics and why it matters
       • Example reports and their value
       • Optional: Search Analytics in the Cloud
    6. Communication
       • twitter.com/sematext
       • twitter.com/otisg
       • Hash tags: #stsa or #stanalytics
       • http://sematext.com/search-analytics/index.html
       • Raise your hand!
       • otis@sematext.com
    7. Why: search users, search providers, search experience
    8. Why Oh Why
       • Search users: "This search sucks! It takes 17 tries to find anything here! F!?@#$%^&?!?"
       • Search providers: "Cool, the latest search tweaks made our site really sticky! Awesome!"
       • Search experience
    9. Fill in the Missing Piece: Search Analytics, Performance Monitoring, Quality Assurance, Tuning, UI
    10. Blind Leading the Blind
    11. Analytics as Compass
       • Search logs are your Map
       • Search Analytics is your Compass
    12. The Bottom Line Why
       • Measure and monitor everything.
       • Supports (re)design, navigation choices
       • Helps with content acquisition & enhancement
       • Improve search experience
       • Mula
    13. The Moment of Truth
       • Question for the audience #1: What do you use for Search Analytics?
       • a) Home grown stuff b) Google Analytics c) Omniture d) Webtrends e) Other f) Nothing
    14. Search Analytics Basics (remember this)
       • Collect: queries & clicks & interactions & ...
       • Analyze: actions / xactions / conversions
       • Output: reports, over time
       • Output++: feedback loop
       • The means, not the goal
       • Ongoing, not one-off
    15. Search vs. Web Analytics
       • User intent and information needs vs. inferring
       • Hand in hand
       • Ideally you can relate data from both or even unify it
    16. Report Types
       • Failures vs. non-failures
       • Actionable vs. non-actionable
       • Trends vs. summaries
    17. Failures vs. Non-Failures
       • Zero hits
       • Low CTR
       • Low MRR
       • High bounce rate
       • Low conversion rate
       • Deep paging
       • Deep clicking
       • High latency
       • Query rate
       • Query volume
       • Top seen & clicked docs
       • Top queries
       • Terms per query
       • Search sessions
       • Search users
       • Distinct queries
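
Three of the failure signals listed above, zero hits, low CTR (click-through rate) and low MRR (mean reciprocal rank), can be computed directly from logged queries and clicks. The sketch below is illustrative only: the `SearchEvent` record and its fields are assumptions made for this example, not the schema of any Sematext product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """One logged search (hypothetical schema for illustration)."""
    query: str
    hits: int                    # number of results returned
    clicked_rank: Optional[int]  # 1-based rank of the first click, None if no click


def zero_hit_rate(events: List[SearchEvent]) -> float:
    """Fraction of searches that returned no results."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.hits == 0) / len(events)


def click_through_rate(events: List[SearchEvent]) -> float:
    """Fraction of searches followed by at least one click."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.clicked_rank is not None) / len(events)


def mean_reciprocal_rank(events: List[SearchEvent]) -> float:
    """Average of 1/rank of the first click; searches without a click count as 0."""
    if not events:
        return 0.0
    total = sum(1.0 / e.clicked_rank if e.clicked_rank else 0.0 for e in events)
    return total / len(events)


# Made-up sample data, only to exercise the functions.
events = [
    SearchEvent("solr faceting", 42, 1),
    SearchEvent("hbase row key", 17, 4),
    SearchEvent("flumee", 0, None),
    SearchEvent("lucene scoring", 30, None),
]

print(zero_hit_rate(events), click_through_rate(events), mean_reciprocal_rank(events))
```

Tracked over time rather than as a single number, each of these values shows whether a relevance or content change helped or hurt.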
    18. Value of Failure Fixes
       • Zero hits
       • Low CTR
       • Low MRR
       • High bounce rate
       • Low conversion rate
       • Deep paging
       • Deep clicking
       • High latency
       Re-search · Findability · Relevance Tuning · Performance Tuning
    19. Measure, then Fix
       • If you can't measure it, you can't fix it!
    20. Relevance A/B Testing
    21. Tracking Zero Hits
    22. Watching Latency
    23. Search Analytics & Measuring: If you can't measure it, you can't fix it! You can't measure it if you don't have Analytics
    24. Actionable vs. Non-Actionable
       • Zero hits
       • Low CTR
       • Low MRR
       • High bounce rate
       • Low conversion rate
       • Deep paging
       • Deep clicking
       • High latency
       • Query rate
       • Query volume
       • Top seen & clicked docs
       • Top queries
       • Terms per query
       • Search sessions
       • Search users
       • Distinct queries
    25. More Fixin'
       • Query rate
       • Query volume
       • Search sessions
       • Search users
       • Top seen & clicked docs
       • Top queries
       • Terms per query
       • Distinct queries
       Navigation & Design · Results Shuffling · Diversification · Recommendations · AutoComplete · Search box size
    26. Output++: Data is Power
       • AutoComplete: $MM improvement
       • Better DYM Spellchecker
       • Related Searches
       • Recommendations
       • Relevance Feedback
       • ...
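
As one illustration of the feedback loop, logged queries can feed an AutoComplete suggester. The sketch below is a deliberately simple, assumed approach (rank past queries that share the typed prefix by frequency); the names `build_suggester` and `suggest` are invented for this example, and the slides do not describe how any real AutoComplete product is implemented.

```python
from collections import Counter
from typing import Callable, Iterable, List


def build_suggester(queries: Iterable[str]) -> Callable[..., List[str]]:
    """Build a prefix suggester from raw logged queries (illustrative only)."""
    # Normalize case and whitespace so "Solr" and "solr" count as one query.
    counts = Counter(q.strip().lower() for q in queries if q.strip())

    def suggest(prefix: str, limit: int = 5) -> List[str]:
        prefix = prefix.strip().lower()
        matches = [(q, n) for q, n in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda item: (-item[1], item[0]))
        return [q for q, _ in matches[:limit]]

    return suggest


# Made-up query log.
log = ["solr", "solr faceting", "Solr", "solr faceting",
       "solr faceting", "sorting", "hbase"]
suggest = build_suggester(log)
print(suggest("so"))
```

A linear scan over distinct queries is fine for a sketch; at real query volumes a trie or a search index over the query log would be the more likely choice.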
    27. Closing the Loop: search users, search providers, search experience
    28. Resources
       • Search Analytics for Your Site, Louis Rosenfeld: http://rosenfeldmedia.com/books/searchanalytics/
       • Search Analytics What? Why? How?
       • Search Analytics with Flume and HBase
       • Search Analytics Business Value & NoSQL Backend
       • http://blog.sematext.com/tag/analytics/
    29. Key Take-aways
       • Without Analytics you are blind
       • If you can't measure it, you can't fix it
       • Use Search Analytics to understand, measure and improve search
       • Using Search Analytics means having a competitive advantage
    30. Behind the Scenes
       • Time permitting: Behind the scenes of Sematext Search Analytics
    31. Contact
       • sematext.com
       • blog.sematext.com
       • @sematext
       • @otisg
       • [email_address]
       • Want SA? Grab me or go to: sematext.com/search-analytics
       • Hash tags: #stsa or #stanalytics
    32. What We've Built
       • Search Analytics SaaS
          • Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
          • Trending over time
          • Comparisons of time periods
          • Top N reports
          • Filter, slice and dice
    33. Sematext Search Analytics
    34. Big Dreams
       • SaaS
       • Multitenant
       • Large Scale: Massive Data
       • Cloud
    35. Storage Choices
       • RDBMS: MySQL, PostgreSQL
       • HDFS
       • Hive
       • HBase
       • Cassandra
    36. SaaS vs. In-House
       • Question for the audience #2
       • SaaS vs. in-house Search Analytics? a) SaaS b) in-house
    37. Sematext Search Analytics
    38. Sematext Search Analytics
    39. Sematext Search Analytics
    40. Sematext Search Analytics
    41. Data Flow
       • See Search Analytics with Flume and HBase: http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
    42. Data Collection
       • See Search Analytics with Flume and HBase: http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
    43. Core Tech
       • JavaScript Beacons
       • Metric Capture Web App aka Receiver
       • Flume Agents, Collectors, Sinks
       • HBase
       • MapReduce Aggregations
       • Search Analytics Reporting Web App
    44. What is Flume
       • Distributed data/log collection service
       • Scalable, configurable, extensible
       • Centrally manageable, open source
       • Agents get data from app, Collectors save it
       • Abstractions: Source -> Decorator(s) -> Sink
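
The Source -> Decorator(s) -> Sink abstraction can be pictured as a chain of stream transformations. The toy pipeline below is plain Python, not the Flume API: every function name (`log_source`, `normalize_query`, `drop_empty`, `list_sink`, `run_pipeline`) is invented for illustration, and the JSON event format is an assumption.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, object]
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def log_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turn raw JSON log lines into events, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def normalize_query(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: lower-case and trim the query field."""
    for event in events:
        yield {**event, "query": str(event.get("query", "")).strip().lower()}


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: discard events whose query is empty after normalization."""
    for event in events:
        if event["query"]:
            yield event


def list_sink(events: Iterator[Event], store: List[Event]) -> int:
    """Sink: append events to an in-memory list and return how many were written."""
    written = 0
    for event in events:
        store.append(event)
        written += 1
    return written


def run_pipeline(lines: Iterable[str], decorators: List[Decorator],
                 store: List[Event]) -> int:
    """Wire source -> decorators (in order) -> sink."""
    stream = log_source(lines)
    for decorate in decorators:
        stream = decorate(stream)
    return list_sink(stream, store)


# Made-up input: one valid search, one blank line, one whitespace-only query.
raw = ['{"query": " Solr ", "hits": 3}', "", '{"query": "  ", "hits": 0}']
store: List[Event] = []
count = run_pipeline(raw, [normalize_query, drop_empty], store)
print(count, store)
```

Because each stage only consumes and yields events, stages can be added, removed or reordered without touching the others, which is the appeal of the abstraction.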
    45. What is HBase
       • Scalable, reliable, distributed, column-oriented DB
       • On top of HDFS
       • MapReducable
    46. Data Flow, Detailed
    47. Why Flume
       • Reliable delivery
          • e.g. queue msgs locally if destination unreachable
       • Easy, centralized management via Web UI or console
       • Good community, good progress, now @ASF
       • But: more complex, more moving parts
       • On Flume: slideshare.net/cloudera/inside-flume
       • Alternatives: Kafka, Scribe...
    48. Why HBase
       • Scalable raw & aggregate data storage
       • MapReduce data input
       • Fast scans for time ranges, fast key lookups
       • Easy storage and compute power expansion
       • Good looking roadmap, community, progress
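
Fast time-range scans and key lookups both depend on HBase keeping rows sorted by row key. The sketch below imitates that with an in-memory sorted list; it uses no HBase client API, and the key layout (`tenant|YYYYMMDDHHMM|query`) is an assumption made for illustration, not the schema described in the talk.

```python
from bisect import bisect_left, insort
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def row_key(tenant: str, ts: datetime, query: str) -> str:
    """Hypothetical composite key: sorts by tenant, then minute, then query."""
    return f"{tenant}|{ts:%Y%m%d%H%M}|{query}"


class SortedTable:
    """Tiny stand-in for a store that keeps rows ordered by key."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        if key not in self._rows:
            insort(self._keys, key)  # keep keys sorted on insert
        self._rows[key] = value

    def get(self, key: str) -> Optional[int]:
        """Point lookup by exact key."""
        return self._rows.get(key)

    def scan(self, start: str, stop: str) -> List[Tuple[str, int]]:
        """Return rows with start <= key < stop without touching other rows."""
        lo = bisect_left(self._keys, start)
        hi = bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def scan_time_range(table: SortedTable, tenant: str,
                    start: datetime, stop: datetime) -> List[Tuple[str, int]]:
    """All rows for one tenant with start <= minute < stop."""
    return table.scan(f"{tenant}|{start:%Y%m%d%H%M}", f"{tenant}|{stop:%Y%m%d%H%M}")


# Made-up per-minute query counts for two tenants.
table = SortedTable()
table.put(row_key("acme", datetime(2011, 11, 1, 10, 0), "solr"), 3)
table.put(row_key("acme", datetime(2011, 11, 1, 10, 5), "hbase"), 1)
table.put(row_key("acme", datetime(2011, 11, 2, 9, 0), "flume"), 2)
table.put(row_key("other", datetime(2011, 11, 1, 10, 0), "solr"), 9)

day_one = scan_time_range(table, "acme",
                          datetime(2011, 11, 1), datetime(2011, 11, 2))
print(day_one)
```

One known trade-off of strictly time-ordered keys is that new writes all land in the same key range; spreading them across buckets is a common mitigation.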
    49. Open Sourcing
       • 2 open-source projects:
       • github.com/sematext/HBaseWD
       • github.com/sematext/HBaseHUT
       • See sematext.com/open-source/index.html
       • Patches for Flume and HBase: blog.sematext.com/tag/flume/
    50. Challenges
       • Data size. Solutions:
          • Compression (4-5x smaller with lzo)
          • Data pruning (variable levels)
       • Query string distribution: very long-tail
          • Lots of data to process, update, aggregate
       • Young tools: Flume, HBase
       • Poor IO on EC2
       • Hadoop distributions
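
The long-tail shape of query strings is easy to check on any query log. The sketch below measures how much traffic the most frequent queries cover and what fraction of distinct queries occur only once; the function names and the sample log are invented for illustration and do not reflect Sematext data.

```python
from collections import Counter
from typing import Iterable


def head_share(queries: Iterable[str], top_n: int) -> float:
    """Fraction of total query volume covered by the top_n most frequent queries."""
    counts = Counter(queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(n for _, n in counts.most_common(top_n))
    return head / total


def singleton_fraction(queries: Iterable[str]) -> float:
    """Fraction of distinct queries that were issued exactly once."""
    counts = Counter(queries)
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Made-up log: two popular queries and two one-off queries.
log = ["solr"] * 5 + ["hbase"] * 3 + ["flume sink", "lzo codec"]

print(head_share(log, 1), head_share(log, 2), singleton_fraction(log))
```

A high singleton fraction means most distinct rows are rarely updated after the first write, which is one reason pruning the tail at variable levels can shrink storage without hurting the top-N reports.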
    51. Contact
       • sematext.com
       • blog.sematext.com
       • @sematext
       • @otisg
       • [email_address]
       • Want SA? Grab me or go to: sematext.com/search-analytics
       • Hash tags: #stsa or #stanalytics
