Webinar: Search and Recommenders

Presented by Grant Ingersoll and Jake Mannix, Lucidworks

  1. 2016 OCTOBER 11-14
  2. Search and Recommenders
     • Grant Ingersoll (@gsingers), CTO, Lucidworks
     • Jake Mannix (@pbrane), Lead Data Engineer, Lucidworks
  3. Agenda
     • Vision, motivations and definitions
     • Use cases for ecommerce, compliance, fraud and customer support
     • Fusion and the evolution of recommenders
     • Demo
     • Future directions
  4. Search-Driven Everything
     • Customer Service
     • Customer Insights
     • Fraud Surveillance
     • Research Portal
     • Online Retail
     • Digital Content
  5. Three Sides of the Same Coin
     • Many companies treat search, recommendations/discovery and analytics as different beasts, yet:
       • The same inputs that make search better can also drive recommendations and better analytics
     • Engagement analytics is the key:
       • Your users give you engagement signals about the content that is relevant to them
       • Over time, patterns emerge in similarities of behavior (the simplest possible pattern is just “popularity”)
       • These signals are often the biggest factor in both search relevance AND recommendations
     • In the enterprise, this is still the case, but the types of signals are often different (email, IM)
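
The "popularity" pattern mentioned on this slide can be sketched in a few lines. This is only an illustration: the `signals` tuples, their field order, and the `popularity` helper are hypothetical and do not reflect Fusion's actual signal schema.

```python
from collections import Counter

# Hypothetical engagement signals as (user_id, doc_id, signal_type) tuples.
# The field layout is an assumption made for this sketch.
signals = [
    ("u1", "doc_a", "click"),
    ("u1", "doc_b", "click"),
    ("u2", "doc_a", "click"),
    ("u2", "doc_b", "click"),
    ("u3", "doc_a", "click"),
    ("u3", "doc_c", "click"),
]


def popularity(signals):
    """Count engagement signals per document (the simplest pattern)."""
    return Counter(doc_id for _user_id, doc_id, _signal_type in signals)


# The two most-engaged documents, most popular first.
top_docs = popularity(signals).most_common(2)
print(top_docs)
```

The same counts could feed either a search relevance boost or a "most popular" recommendation list, which is the point the slide makes about shared inputs.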
  6. Defining Moments
     • Content: documents which are textually similar are often good “similar items” to recommend
     • Collaborative: documents which have been engaged with by the same people (and/or in the same search context) are also similar, in a more subtle but often more powerful way
     • Multi-Modal: why choose one? Try a smooth interpolation between a content-based similarity metric and an engagement-based one!
  7. Search-Driven Online Retail
     Increase conversions with a personalized shopping experience with best-in-class reliability and performance.
     Diagram labels: CATALOG; DYNAMIC NAVIGATION AND LANDING PAGES; INSTANT INSIGHTS AND ANALYTICS; PERSONALIZED SHOPPING EXPERIENCE; PROMOTIONS; USER HISTORY; Data Acquisition; Data Processing; Smart Access API
  8. Search-Driven Compliance and Surveillance
     Detect and investigate activity for regulatory compliance, from one unified view.
     Diagram labels: DATABASE; ACCURATE REAL-TIME INFORMATION; CONTEXTUALLY-ENRICHED INFORMATION; MESSAGES; LOGS; DATA EXPLORATION AND VISUALIZATION; Data Acquisition; Indexing & Streaming; Smart Access API
  9. Search-Driven Customer Service
     Resolve customer issues quickly with immediate access to relevant answers.
     Diagram labels: CUSTOMER
  10. Fusion and Recommenders
  11. Lucidworks Fusion Is Search-Driven Everything
      • Drive next-generation relevance via Content, Collaboration and Context
      • Harness best-in-class open source: Apache Solr + Spark
      • Simplify application development and reduce ongoing maintenance
      Access data from anywhere to build intelligent, data-driven applications.
      Diagram labels: CATALOG; DYNAMIC NAVIGATION AND LANDING PAGES; INSTANT INSIGHTS AND ANALYTICS; PERSONALIZED SHOPPING EXPERIENCE; PROMOTIONS; USER HISTORY; Data Acquisition; Indexing & Streaming; Smart Access API; Recommendations & Alerts; Analytics & Insights; Extreme Relevancy
  12. Fusion Architecture
      • REST API
      • Apache Spark: Worker, Worker, Cluster Mgr.
      • Apache Solr: Shards, Shards
      • HDFS (Optional)
      • Apache Zookeeper (ZK 1 to ZK N): Shared Config Mgmt, Leader Election, Load Balancing
      • Data sources: DATABASE, WEB, FILE, LOGS, HADOOP, CLOUD
      • Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
      • Admin UI
      • SECURITY BUILT-IN
      • Lucidworks View
  13. Key Platform Tech
      • Fusion:
        • Recommenders API
        • Machine Learning pipeline stages
        • Scheduling
      • Solr:
        • More Like This + Signals
      • Spark:
        • MLlib, Mahout, custom
  14. Content-focused
      • Solr ships with a built-in query parser, MoreLikeThis, which takes a given document and:
        • Extracts nontrivial terms from specified fields in it
        • Builds an “OR” query to search for the closest matches (like a cosine similarity computation)
        • Has many knobs for “data-cleaning” non-useful terms out of the query
      • TF-IDF is great, but other metrics are possible: LSI, LDA, W2V
      Example query: {!mlt qf=body,suggest,subject,title mintf=2 mindf=5 minwl=3}<DOC_ID>
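
As a rough sketch, the MoreLikeThis query string from this slide could be assembled programmatically as shown here. The `build_mlt_query` helper and the document id `doc-123` are hypothetical; only the `{!mlt ...}` local-params syntax and the `qf`, `mintf`, `mindf` and `minwl` parameters come from the slide. The code builds request parameters and does not contact a Solr server.

```python
def build_mlt_query(doc_id, fields, mintf=2, mindf=5, minwl=3):
    """Build a Solr MoreLikeThis query-parser string for one document.

    mintf: minimum term frequency in the source document
    mindf: minimum document frequency across the index
    minwl: minimum word length for a term to be considered
    """
    local_params = "qf={} mintf={} mindf={} minwl={}".format(
        ",".join(fields), mintf, mindf, minwl
    )
    return "{!mlt " + local_params + "}" + doc_id


# Hypothetical document id; the field names are the ones shown on the slide.
query = build_mlt_query("doc-123", ["body", "suggest", "subject", "title"])

# Standard Solr request parameters that would accompany the query.
params = {"q": query, "fl": "id,score", "rows": 10}
print(params["q"])
```

Raising `mintf` or `mindf` is one of the "knobs" the slide refers to: it drops rare or noisy terms before the OR query is built.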
  15. Collaborative Filtering
      “People who bought X also bought Y” / “Movies recommended for you”
      Diagram labels: Search User/Item Index; Top K users who’ve interacted with this Item; Search and Rollup on User/Item Index; Top Y docs; Current Doc; Filter by context; Profit; User/Item Index; Offline Tasks; User/Item Signals; Math!
  16. Collaborative Filtering: step by step in Fusion
      • Fusion CF-based “documents like this” pipeline stages:
        • Sub-query: search the aggregated signals index for the current doc_id, extracting the top-K pairs of (user_id, weight)
        • Sub-query: search that index again with a weighted OR query: (user_id:user_id_1^weight_1 OR user_id:user_id_2^weight_2 OR … )
        • Roll-up: topN(sum(score_i * weight_i))
        • Sub-query: fetch the documents for these top N doc_ids from the primary Solr index
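
The four stages above can be imitated in memory to make the arithmetic concrete. This is a simplified sketch, not Fusion's pipeline code: the `aggregated_signals` rows, the `primary_index` dictionary and all function names are hypothetical, and the stored aggregate weight stands in for the Solr score that the weighted OR query would produce.

```python
import heapq
from collections import defaultdict

# Hypothetical aggregated signals: (user_id, doc_id, weight).
aggregated_signals = [
    ("u1", "doc_a", 3.0),
    ("u1", "doc_b", 2.0),
    ("u2", "doc_a", 1.0),
    ("u2", "doc_c", 4.0),
    ("u3", "doc_b", 5.0),
    ("u3", "doc_c", 1.0),
]

# Hypothetical stand-in for the primary Solr index.
primary_index = {
    "doc_a": {"id": "doc_a", "title": "A"},
    "doc_b": {"id": "doc_b", "title": "B"},
    "doc_c": {"id": "doc_c", "title": "C"},
}


def top_users_for_doc(doc_id, k):
    """Stage 1: top-K (user_id, weight) pairs for the current document."""
    pairs = [(u, w) for u, d, w in aggregated_signals if d == doc_id]
    return heapq.nlargest(k, pairs, key=lambda pair: pair[1])


def recommend(doc_id, k=10, n=5):
    """Stages 2 and 3: weighted lookup by user, then roll up per document."""
    user_weights = dict(top_users_for_doc(doc_id, k))
    totals = defaultdict(float)
    for user_id, other_doc, score in aggregated_signals:
        if user_id in user_weights and other_doc != doc_id:
            # sum(score_i * weight_i), using the stored weight as the score
            totals[other_doc] += score * user_weights[user_id]
    return heapq.nlargest(n, totals.items(), key=lambda pair: pair[1])


def fetch_documents(scored_ids):
    """Stage 4: look up the full documents for the top-N doc ids."""
    return [primary_index[doc_id] for doc_id, _score in scored_ids]


recommendations = recommend("doc_a")
print(recommendations)
print(fetch_documents(recommendations))
```

Users `u1` and `u2` engaged with `doc_a`, so only their other documents are scored; `u3` contributes nothing because it never touched the current document.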
  17. Multi-modal
      • Both content-based and CF recommenders use features of the documents to generate a similarity metric:
        • Content uses the tokens in the document
        • CF uses the ids of users who have engaged with it
      • The two metrics can be combined as a weighted sum, allowing a “slider” between them
      • Fancy similarity techniques that can be applied to a (doc, token) matrix can often be applied to a (doc, userId) matrix, or even to a joint (doc, (token or userId)) concatenated matrix
      • Such techniques come at a cost: they are harder to maintain, and variations are harder to A/B test
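
A minimal sketch of the "slider" idea: blend two per-document score maps with a single weight. The `blend` function, the `alpha` parameter and the example scores are hypothetical, and the sketch assumes both score sets have already been normalized to a comparable range (for example 0 to 1).

```python
def blend(content_scores, cf_scores, alpha):
    """Weighted sum of content-based and CF similarity scores.

    alpha = 0.0 uses content similarity only; alpha = 1.0 uses CF only.
    Documents missing from one score map contribute 0.0 for that side.
    Returns (doc_id, score) pairs, highest score first.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    doc_ids = set(content_scores) | set(cf_scores)
    blended = {
        doc_id: (1.0 - alpha) * content_scores.get(doc_id, 0.0)
        + alpha * cf_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    # Sort by descending score, breaking ties by doc id for stable output.
    return sorted(blended.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical normalized scores for documents similar to some current doc.
content_scores = {"doc_b": 0.8, "doc_c": 0.2}
cf_scores = {"doc_c": 1.0, "doc_d": 0.5}

for alpha in (0.0, 0.5, 1.0):
    print(alpha, [doc_id for doc_id, _score in blend(content_scores, cf_scores, alpha)])
```

Keeping the blend as one scalar makes A/B testing simpler than the joint-matrix techniques the slide mentions, since each variant differs only in `alpha`.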
  18. Demo
      • Basics:
        • 26 Apache Projects registered so far plus LW web properties
        • 93 datasources* including email, GitHub, JIRA*, Website and Wiki
        • Fusion 2.4
        • Signals everywhere
        • UI based on Lucidworks View
      • ASF Mail archives mirrored at:
  19. Implementation Details
      • Branch: GH-28-doc-view
      • Key Source Code:
        • UI: Angular Directives: perdocument recommendations
        • Offline Tasks: Spark Jobs: mail_thread_signal_creation_job.json, SimpleTwoHopRecommender.scala
        • Fusion Pipelines: Query: lucidfind-recommendations, cf-similar-items-batch-rec, cf-similar-items-rec
  20. Future Work
      • Ensemble and click-based approaches
      • Deploy live
      • User registrations
  21. Resources
      • Fusion:
      • Search Hub:
      • Company:
      • Our blog:
      • Twitter: @gsingers, @pbrane