Otis Gospodnetić: Search Analytics, Lucene Eurocon 2011

1,256 views

See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011

Published in: Technology

  1. Search Analytics: Business Value & NoSQL Backend
     Otis Gospodnetić, Sematext International
     @otisg ◦ @sematext ◦ sematext.com ◦ sematext.com/search-analytics
  2. About Otis Gospodnetić
     • ASF Member: Lucene, Solr, Nutch, Mahout
     • Author: Lucene in Action 1 & 2
     • Entrepreneur: Sematext, Simpy
  3. Sematext Metrics
     • 100% organic: no GMO, no VC
     • 4 years old
     • < 10 people
     • 7 countries
     • 3 timezones
     • 2 continents
     • > 100 customers
  4. About Sematext
     • Products & Services
     • Consulting, Development, Tech Support:
       • Search (Lucene, Solr, ElasticSearch...)
       • Big Data (Hadoop, HBase, Voldemort...)
       • Web Crawling (Nutch, Droids)
       • Machine Learning (Mahout)
  5. Agenda
     • What is Search Analytics and why it matters
     • Example reports and their value
     • What we built, why, and how
  6. Communication
     • twitter.com/sematext
     • twitter.com/otisg
     • Hash tags: #stsa or #stanalytics
     • http://sematext.com/search-analytics/index.html
     • Raise your hand!
     • otis@sematext.com
  7. The Compass
     • Search logs are your Map
     • Search Analytics is your Compass
  8. High Level Why
     • Diagram: search users, search providers, search experience
  9. High Level Why
     • Diagram: search users, search providers, search experience
     • "This search sucks! It takes 17 tries to find anything here! F!?@#$%^&?!?"
     • "Cool, the latest search tweaks made our site really sticky! Awesome!"
  10. Don't Be Like This Dude
  11. Got Clue? Search Analytics Performance Monitoring Quality Assurance Tuning UI
  12. More Concrete Why
     • Measure and monitor everything. Introspection.
     • Supports (re)design, navigation choices
     • Helps with content acquisition & enhancement
     • Improve search experience
     • Mula
  13. The Moment of Truth
     • Question for the audience #1: What do you use for Search Analytics?
     • a) Home grown stuff b) Google Analytics c) Omniture d) Webtrends e) Other f) Nothing
  14. Search Analytics Outline
     • Collect: queries & clicks & interactions & ...
     • Analyze: actions / xactions / conversions
     • Output: reports – over time
     • Output++: feedback loop
     • The means, not the goal
     • Ongoing, not one-off
     (callout: remember this)
  15. Search vs. Web Analytics
     • User intent and information needs vs. inferring
     • Hand in hand
     • Ideally you can relate data from both or even unify it
  16. Example Core Reports
     • Rate & Volume, Latency (mean, avg, 90%)
     • Click Through Rate, Mean Reciprocal Rank
     • Top Queries by count, clicks, 0 hits...
     • Query Trending
     • Top Seen Docs, Top Clicked Docs (msft)
     • Page & Click Depth
     • Facet & Sort Usage
     • ...
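A few of the core reports named on slide 16 can be sketched in Python. This is a minimal illustration, not Sematext's implementation: the record layout (`latency_ms`, `clicked_ranks`), the sample data, and all function names are assumptions made for the example. Mean Reciprocal Rank is computed here with searches that got no click counting as 0, which is one common convention, and the percentile uses the nearest-rank method.

```python
import math
from statistics import mean

# Hypothetical search log records: one dict per executed search.
# "clicked_ranks" holds the 1-based result positions the user clicked.
SAMPLE_SEARCHES = [
    {"query": "solr", "latency_ms": 40, "clicked_ranks": [1]},
    {"query": "hbase", "latency_ms": 120, "clicked_ranks": [2, 5]},
    {"query": "flume", "latency_ms": 60, "clicked_ranks": []},
    {"query": "nutch", "latency_ms": 80, "clicked_ranks": [4]},
]


def click_through_rate(searches):
    """Fraction of searches that received at least one click."""
    if not searches:
        return 0.0
    clicked = sum(1 for s in searches if s["clicked_ranks"])
    return clicked / len(searches)


def mean_reciprocal_rank(searches):
    """Mean of 1/rank of the best-ranked click; no click counts as 0."""
    if not searches:
        return 0.0
    return mean(
        1.0 / min(s["clicked_ranks"]) if s["clicked_ranks"] else 0.0
        for s in searches
    )


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    latencies = [s["latency_ms"] for s in SAMPLE_SEARCHES]
    print("CTR:", click_through_rate(SAMPLE_SEARCHES))
    print("MRR:", mean_reciprocal_rank(SAMPLE_SEARCHES))
    print("mean latency:", mean(latencies))
    print("90% latency:", percentile(latencies, 90))
```

In a real deployment these numbers would be computed per time bucket so they can be trended, as the "Output: reports – over time" bullet on slide 14 suggests.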
  17. More Reports in More Detail
     • See "Search Analytics What? Why? How?": http://blog.sematext.com/tag/analytics/
  18. Part Dos
     • Switching gears... Juno digs NoSQL
  19. What We've Built
     • Search Analytics SaaS
       • Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
       • Trending over time
       • Comparisons of time periods
       • Top N reports
       • Filter, slice and dice
  20. Who Needs a Compass?
     • We need it
       • search-hadoop.com & search-lucene.com
     • Our customers need it!
     • You?
  21. Sematext Search Analytics
  22. Big Dreams
     • SaaS
     • Multitenant
     • Large Scale – Massive Data
     • Cloud
  23. Storage Choices
     • RDBMS: MySQL, PostgreSQL
     • HDFS
     • Hive
     • HBase
     • Cassandra
  24. SaaS vs. In-House
     • Question for the audience #2
     • SaaS vs in-house Search Analytics? a) SaaS b) in-house
  25. Sematext Search Analytics
  26. Sematext Search Analytics
  27. Sematext Search Analytics
  28. Sematext Search Analytics
  29. Data Flow
     • See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  30. Data Collection
     • See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  31. Core Tech
     • JavaScript Beacons
     • Metric Capture Web App aka Receiver
     • Flume Agents, Collectors, Sinks
     • HBase
     • MapReduce Aggregations
     • Search Analytics Reporting Web App
  32. What is Flume
     • Distributed data/log collection service
     • Scalable, configurable, extensible
     • Centrally manageable, open source
     • Agents get data from app, Collectors save it
     • Abstractions: Source -> Decorator(s) -> Sink
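The "Source -> Decorator(s) -> Sink" abstraction on slide 32 can be illustrated with a small Python sketch. This is not Flume's API or configuration syntax; every name below (`list_source`, `add_field`, `drop_empty`, `ListSink`, `run_pipeline`) is hypothetical and only models the idea of events flowing from a source, through zero or more decorators, into a sink.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def list_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turns raw lines (e.g. beacon hits) into events."""
    for line in lines:
        yield {"body": line}


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filters out events with a blank body."""
    for event in events:
        if event["body"].strip():
            yield event


def add_field(key: str, value: str) -> Decorator:
    """Decorator factory: tags every event with a fixed field."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, key: value}
    return decorate


class ListSink:
    """Sink: collects events in memory; a real sink would write to storage."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def append(self, event: Event) -> None:
        self.events.append(event)


def run_pipeline(source: Iterator[Event],
                 decorators: List[Decorator],
                 sink: ListSink) -> int:
    """Chains source -> decorators -> sink and returns the delivered count."""
    stream = source
    for decorate in decorators:
        stream = decorate(stream)
    delivered = 0
    for event in stream:
        sink.append(event)
        delivered += 1
    return delivered


if __name__ == "__main__":
    sink = ListSink()
    count = run_pipeline(
        list_source(["q=solr", "", "q=hbase"]),
        [drop_empty, add_field("tenant", "demo")],
        sink,
    )
    print(count, sink.events)
```

Because each decorator only wraps an iterator, stages can be added or reordered without touching the source or the sink, which is the point of the abstraction.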
  33. What is HBase
     • Scalable, reliable, distributed, column-oriented DB
     • On top of HDFS
     • MapReducable
  34. Data Flow, Detailed
  35. Why Flume
     • Reliable delivery
       • e.g. queue msgs locally if destination unreachable
     • Easy, centralized management via Web UI or console
     • Good community, good progress, now @ASF
     • But: more complex, more moving parts
     • On Flume: slideshare.net/cloudera/inside-flume
     • Alternatives: Kafka, Scribe...
  36. Why HBase
     • Scalable raw & aggregate data storage
     • MapReduce data input
     • Fast scans for time ranges, fast key lookups
     • Easy storage and compute power expansion
     • Good looking roadmap, community, progress
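"Fast scans for time ranges, fast key lookups" on slide 36 follows from HBase keeping rows sorted by row key bytes. The sketch below shows the idea in plain Python with an in-memory sorted store; it does not use the HBase client API, and the key layout (tenant name, a NUL separator, then a big-endian timestamp) is an assumption for illustration, not the schema used in the talk.

```python
import bisect
import struct
from typing import Dict, List, Tuple


def row_key(tenant: str, ts_ms: int) -> bytes:
    """Composite key: tenant, NUL separator, big-endian timestamp.

    Big-endian packing makes byte order match numeric order, so keys for
    one tenant sort by time. Assumes tenant names contain no NUL byte.
    """
    return tenant.encode("utf-8") + b"\x00" + struct.pack(">Q", ts_ms)


class SortedStore:
    """Toy stand-in for a table whose rows are sorted by key bytes."""

    def __init__(self) -> None:
        self._keys: List[bytes] = []
        self._values: Dict[bytes, str] = {}

    def put(self, key: bytes, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes) -> str:
        """Point lookup by exact key."""
        return self._values[key]

    def scan(self, start: bytes, stop: bytes) -> List[Tuple[bytes, str]]:
        """Returns rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedStore()
    store.put(row_key("acme", 1000), "q1")
    store.put(row_key("zeta", 1500), "other tenant")
    store.put(row_key("acme", 3000), "q3")
    store.put(row_key("acme", 2000), "q2")
    rows = store.scan(row_key("acme", 1000), row_key("acme", 3000))
    print([value for _, value in rows])
```

One trade-off of purely time-ordered keys is that new writes all land at the end of a tenant's key range. Spreading sequential writes across the key space is the kind of problem a key-distribution scheme addresses, at the cost of needing several scans per time range.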
  37. Open Sourcing
     • 2 open-source projects:
       • github.com/sematext/HBaseWD
       • github.com/sematext/HBaseHUT
     • See sematext.com/open-source/index.html
     • Patches for Flume and HBase: blog.sematext.com/tag/flume/
  38. Challenges
     • Data size. Solutions:
       • Compression (4-5x smaller with lzo)
       • Data pruning (variable levels)
     • Query string distribution: very long-tail
       • Lots of data to process, update, aggregate
     • Young tools: Flume, HBase
     • Poor IO on EC2
     • Hadoop distributions
  39. Output++
     • AutoComplete - $MM improvement
     • Better DYM Spellchecker
     • Related Searches
     • Recommendations
     • Relevance Feedback
     • ...
  40. Closing the Loop
     • Diagram: search users, search providers, search experience
  41. Resource
     • Search Analytics for Your Site, Louis Rosenfeld: http://rosenfeldmedia.com/books/searchanalytics/
  42. We're Hiring
     • Dig Search?
     • Dig Analytics?
     • Dig Big Data?
     • Dig Performance?
     • Dig working with and in open-source?
     • We're hiring world-wide!
     • http://sematext.com/about/jobs.html
  43. Contact
     • sematext.com
     • blog.sematext.com
     • @sematext
     • @otisg
     • [email_address]
     • Want SA? Grab me or go to: sematext.com/search-analytics
     • Hash tags: #stsa or #stanalytics
