Search Analytics Business Value & NoSQL Backend
Published in: Technology

Transcript

  • 1. Search Analytics Business Value & NoSQL Backend
      Otis Gospodnetić – Sematext International
      @otisg ◦ @sematext ◦ sematext.com
      sematext.com/search-analytics
  • 2. About Otis Gospodnetić
      ● ASF Member: Lucene, Solr, Nutch, Mahout
      ● Author: Lucene in Action 1 & 2
      ● Entrepreneur: Sematext, Simpy
      Copyright 2011 Sematext Intl. All rights reserved.
  • 3. Sematext Metrics
      ● 100% organic: no GMO, no VC
      ● 4 years old
      ● < 10 people
      ● 7 countries
      ● 3 timezones
      ● 2 continents
      ● > 100 customers
  • 4. About Sematext Products & Services
      Consulting, Development, Tech Support:
      ● Search (Lucene, Solr, ElasticSearch...)
      ● Big Data (Hadoop, HBase, Voldemort...)
      ● Web Crawling (Nutch, Droids)
      ● Machine Learning (Mahout)
  • 5. Agenda
      ● What is Search Analytics and why it matters
      ● Example reports and their value
      ● What we built, why, and how
  • 6. Communication
      ● twitter.com/sematext
      ● twitter.com/otisg
      ● hash tags: #stsa or #stanalytics
      ● http://sematext.com/search-analytics/index.html
      ● Raise your hand!
      ● otis@sematext.com
  • 7. The Compass
      ● Search logs are your Map
      ● Search Analytics is your Compass
  • 8. High Level Why
      ● search users
      ● search experience
      ● search providers
  • 9. High Level Why
      ● Search users: "This search sucks! It takes 17 tries to find anything here! F!?@#$%^&?!?"
      ● Search providers: "Cool, the latest search tweaks made our site really sticky! Awesome!"
      ● search users, search experience, search providers
  • 10. Don't Be Like This Dude
  • 11. Got Clue?
      Performance Monitoring Tuning Search Analytics UI Quality Assurance
  • 12. More Concrete Why
      ● Measure and monitor everything. Introspection.
      ● Supports (re)design, navigation choices
      ● Helps with content acquisition & enhancement
      ● Improve search experience
      ● Mula
  • 13. The Moment of Truth
      Question for the audience #1: What do you use for Search Analytics?
      a) Home grown stuff
      b) Google Analytics
      c) Omniture
      d) Webtrends
      e) Other
      f) Nothing
  • 14. Search Analytics Outline
      ● Collect: queries & clicks & interactions & ...
      ● Analyze: actions / xactions / conversions
      ● Output: reports – over time
      ● Output++: feedback loop (remember this)
      ● The means, not the goal
      ● Ongoing, not one-off
  • 15. Search vs. Web Analytics
      ● User intent and information needs vs. inferring
      ● Hand in hand
      ● Ideally you can relate data from both or even unify it
  • 16. Example Core Reports
      ● Rate & Volume, Latency (mean, avg, 90%)
      ● Click Through Rate, Mean Reciprocal Rank
      ● Top Queries by count, clicks, 0 hits...
      ● Query Trending
      ● Top Seen Docs, Top Clicked Docs (msft)
      ● Page & Click Depth
      ● Facet & Sort Usage
      ● ...
  • 17. More Reports in More Detail
      ● See "Search Analytics What? Why? How?": http://blog.sematext.com/tag/analytics/
  • 18. Part Dos
      Switching gears...
      Juno digs NoSQL
  • 19. What We've Built
      ● Search Analytics SaaS
          ● Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
          ● Trending over time
          ● Comparisons of time periods
          ● Top N reports
          ● Filter, slice and dice
  • 20. Who Needs a Compass?
      ● We need it
          ● search-hadoop.com & search-lucene.com
      ● Our customers need it!
      ● You?
  • 21. Sematext Search Analytics
  • 22. Big Dreams
      ● SaaS
      ● Multitenant
      ● Large Scale – Massive Data
      ● Cloud
  • 23. Storage Choices
      ● RDBMS: MySQL, PostgreSQL
      ● HDFS
      ● Hive
      ● HBase
      ● Cassandra
  • 24. SaaS vs. In-House
      Question for the audience #2: SaaS vs in-house Search Analytics?
      a) SaaS
      b) in-house
  • 25. Sematext Search Analytics
  • 26. Sematext Search Analytics
  • 27. Sematext Search Analytics
  • 28. Sematext Search Analytics
  • 29. Data Flow
      ● See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  • 30. Data Collection
      ● See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  • 31. Core Tech
      ● JavaScript Beacons
      ● Metric Capture Web App aka Receiver
      ● Flume Agents, Collectors, Sinks
      ● HBase
      ● MapReduce Aggregations
      ● Search Analytics Reporting Web App
  • 32. What is Flume
      ● Distributed data/log collection service
      ● Scalable, configurable, extensible
      ● Centrally manageable, open source
      ● Agents get data from app, Collectors save it
      ● Abstractions: Source → Decorator(s) → Sink
  • 33. What is HBase
      ● Scalable, reliable, distributed, column-oriented DB
      ● On top of HDFS
      ● MapReducable
  • 34. Data Flow, Detailed
  • 35. Why Flume
      ● Reliable delivery
          ● e.g. queue msgs locally if destination unreachable
      ● Easy, centralized management via Web UI or console
      ● Good community, good progress, now @ASF
      ● But: more complex, more moving parts
      ● On Flume: slideshare.net/cloudera/inside-flume
      ● Alternatives: Kafka, Scribe...
  • 36. Why HBase
      ● Scalable raw & aggregate data storage
      ● MapReduce data input
      ● Fast scans for time ranges, fast key lookups
      ● Easy storage and compute power expansion
      ● Good looking roadmap, community, progress
  • 37. Open Sourcing
      ● 2 open-source projects:
          ● github.com/sematext/HBaseWD
          ● github.com/sematext/HBaseHUT
      ● See sematext.com/open-source/index.html
      ● Patches for Flume and HBase: blog.sematext.com/tag/flume/
  • 38. Challenges
      ● Data size. Solutions:
          ● Compression (4-5x smaller with lzo)
          ● Data pruning (variable levels)
      ● Query string distribution: very long-tail
          ● Lots of data to process, update, aggregate
      ● Young tools: Flume, HBase
      ● Poor IO on EC2
      ● Hadoop distributions
  • 39. Output++
      ● AutoComplete - $MM improvement
      ● Better DYM Spellchecker
      ● Related Searches
      ● Recommendations
      ● Relevance Feedback
      ● ...
  • 40. Closing the Loop
      ● search users
      ● search experience
      ● search providers
  • 41. Resource
      ● "Search Analytics for Your Site", Louis Rosenfeld: http://rosenfeldmedia.com/books/searchanalytics/
  • 42. We're Hiring
      Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open-source?
      We're hiring world-wide! http://sematext.com/about/jobs.html
  • 43. Contact
      ● sematext.com
      ● blog.sematext.com
      ● @sematext
      ● @otisg
      ● otis@sematext.com
      ● Want SA? Grab me or go to: sematext.com/search-analytics
      ● Hash tags: #stsa or #stanalytics
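
Slide 16 lists Click Through Rate, Mean Reciprocal Rank, and top queries with 0 hits among the core reports, but the deck does not show how they are computed. The sketch below uses the common textbook definitions, not necessarily the ones used in Sematext's product. The `SearchEvent` record, its field names, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SearchEvent:
    """One logged search (hypothetical record layout, not from the slides)."""
    query: str
    hits: int                        # number of results returned
    first_click_rank: Optional[int]  # 1-based rank of first click, None if no click


def click_through_rate(events: Sequence[SearchEvent]) -> float:
    """Fraction of searches that led to at least one click."""
    if not events:
        return 0.0
    clicked = sum(1 for e in events if e.first_click_rank is not None)
    return clicked / len(events)


def mean_reciprocal_rank(events: Sequence[SearchEvent]) -> float:
    """Mean of 1/rank of the first click; searches without a click count as 0."""
    if not events:
        return 0.0
    total = sum(1.0 / e.first_click_rank
                for e in events if e.first_click_rank is not None)
    return total / len(events)


def zero_hit_queries(events: Sequence[SearchEvent]) -> Counter:
    """Count how often each query returned no results."""
    return Counter(e.query for e in events if e.hits == 0)


# Made-up sample log for illustration only.
events = [
    SearchEvent("hbase", 10, 1),
    SearchEvent("flume sink", 5, 2),
    SearchEvent("flumee", 0, None),
    SearchEvent("solr facet", 8, 4),
]

if __name__ == "__main__":
    print("CTR:", click_through_rate(events))
    print("MRR:", mean_reciprocal_rank(events))
    print("Zero-hit queries:", zero_hit_queries(events).most_common())
```

At the scale described in the slides these aggregations would run as MapReduce jobs over HBase rather than in memory. The arithmetic per time bucket stays the same, which is what makes the "reports over time" on slide 14 possible.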