The Universal Recommender



How to create a cutting edge recommender that is fast, scalable, can use almost any applicable data, and is extremely flexible for use in many different contexts. Uses Spark, Mahout, and a search engine.

Published in: Software
1. The Big Idea: Universal Recommender
2. RECOMMENDATIONS REQUIRED
3. A LITTLE HISTORY: MOTIVATION
   • Cooccurrence: Mahout 2012
   • Factorized ALS: Mahout, then Spark’s MLlib
   • Experience with then-current recommender tech:
     • Evaluation and experiments
     • Could only use “purchase” data; threw out the 100x larger volume of view data
     • No “realtime”
     • Too many edge cases: users who had no recommendations
     • Didn’t adapt to metadata/content of items
   • Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me)
   • Cooccurrence and cross-cooccurrence led to many innovations
4. ANATOMY OF A RECOMMENDATION: PERSONALIZED
   r = recommendations
   hp = a user’s history of some action (purchase, for instance)
   P = the history of all users’ primary action; rows are users, columns are items
   (PtP) = compares column to column using a log-likelihood-based correlation test
   r = (PtP)hp
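The algebra on this slide can be sketched in a few lines (an illustration with invented toy data, not the Universal Recommender’s code; assumes numpy):

```python
import numpy as np

# Toy interaction matrix P: one row per user, one column per item,
# with a 1 wherever the user performed the primary action (e.g. purchase).
P = np.array([
    [1, 0, 1, 0],  # user0 purchased item0 and item2
    [1, 1, 0, 0],  # user1 purchased item0 and item1
    [0, 1, 1, 1],  # user2 purchased item1, item2, and item3
])

PtP = P.T @ P                # item-to-item cooccurrence counts
hp = np.array([1, 0, 0, 0])  # a user whose history is just item0

r = PtP @ hp                 # score every item against the user's history
print(r.tolist())            # [2, 1, 1, 0]
```

In practice item0 (already in the history) would be filtered out, leaving item1 and item2 as the recommendations.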
5. COOCCURRENCE WITH LLR
   • Let’s call (PtP) an indicator matrix for some primary action like purchase
   • Rows = items, columns = items, element = similarity/correlation score
   • The score is row compared to column using a “similarity” or “correlation” metric
   • Log-Likelihood Ratio (LLR) finds important/correlating cooccurrences and filters out the rest—a major improvement in quality over simple cooccurrence or other similarity metrics
   • Experiments on real-world data show LLR is significantly better than other similarity metrics*
   * http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
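For concreteness, here is a minimal sketch of Dunning’s LLR score for one 2x2 cooccurrence table (my own implementation for illustration, not Mahout’s code):

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 cooccurrence table.

    k11 = users who did both A and B, k12 = A only, k21 = B only,
    k22 = neither. A large score means the cooccurrence is unlikely
    to be chance; near zero means the events look independent.
    """
    n = k11 + k12 + k21 + k22
    cells = [
        (k11, k11 + k12, k11 + k21),
        (k12, k11 + k12, k12 + k22),
        (k21, k21 + k22, k11 + k21),
        (k22, k21 + k22, k12 + k22),
    ]
    # 2 * N * mutual information, written out cell by cell
    return 2.0 * sum(k * math.log(k * n / (row * col))
                     for k, row, col in cells if k > 0)

print(round(llr(10, 0, 0, 10), 2))  # 27.73 — strongly correlated
print(round(llr(5, 5, 5, 5), 2))    # 0.0 — independent, filtered out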
6. LLR AND SIMILARITY METRICS: PRECISION (MAP@K)
   [Chart: Mean Average Precision at k = 1…10 (higher is better) for the Mahout Cooccurrence Recommender on e-commerce data, comparing the Cosine, Tanimoto, and Log-likelihood similarity metrics]
7. FROM COOCCURRENCE TO RECOMMENDATION
   • This actually means taking the user’s history hp and comparing it to the rows of the cooccurrence matrix (PtP)
   • TF-IDF weighting of cooccurrence would be nice, to mitigate the undue influence of popular items
   • Find the items nearest to the user’s history
   • Sort these by similarity strength and keep only the highest—you have recommendations
   • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF?
   r = (PtP)hp
   Example: hp for user1 = [item2, item3]
   Rows of (PtP): item1: [item2, item3]; item2: [item1, item3, item95]; item3: […]
   The row that most closely matches the user’s history is item1!
8. FROM COOCCURRENCE TO RECOMMENDATION (continued)
   • Repeats slide 7 and adds the punchline: that’s exactly what a search engine does!
9. USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS
   • The final calculation uses hp as the query on the cooccurrence matrix (PtP) and returns a ranked set of items
   • The query is a “similarity” query, not a relational or key-based fetch
   • Uses the search engine as a cosine-based k-nearest-neighbor (KNN) engine with norms and TF-IDF weighting
   • Search engines are highly optimized for serving these queries in realtime
   • Several (Solr, Elasticsearch) have High Availability and massively scalable clustered auto-sharding features, like the best of the NoSQL DBs
   r = (PtP)hp
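A minimal sketch of hp-as-search-query (field name and item IDs are invented for illustration; the full production-shaped query appears in the appendix):

```python
# Hypothetical sketch: the user's primary-action history, hp, rendered as an
# Elasticsearch-style "terms" clause against the "purchase" indicator field
# of the item index. The engine ranks items by similarity, not by key lookup.
hp = ["item2", "item3"]  # the user's history of the primary action

query = {
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {"terms": {"purchase": hp}},  # similarity match, TF-IDF weighted
            ]
        }
    }
}
print(query["query"]["bool"]["should"][0])
```

Each item document stores its (PtP) row in the `purchase` field, so this one query performs the KNN comparison the slide describes.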
10. THE UNIVERSAL RECOMMENDER: THE BREAKTHROUGH IDEA
   • Virtually all existing collaborative-filtering recommenders use only one indicator of preference
   • The theory doesn’t stop there!
   • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category-preference, location-preference, device-preference, gender…
   r = (PtP)hp
   r = (PtP)hp + (PtV)hv + (PtC)hc + …
11. THE UNIVERSAL RECOMMENDER: CORRELATED CROSS-OCCURRENCE
   • Repeats slide 10, labeling the added terms as cross-occurrence:
   r = (PtP)hp + (PtV)hv + (PtC)hc + …
12. CORRELATED CROSS-OCCURRENCE: SO WHAT?
   • Comparing the history of the primary action to other actions finds the actions that lead to the one you want to recommend
   • Given strong data about user preferences in a general population, we can also use:
     • items clicked
     • terms searched
     • categories viewed
     • items shared
     • people followed
     • items disliked (yes, dislikes may predict likes)
     • location
     • device preference
     • gender
     • age bracket
   • Virtually anything we know about the population can be tested for correlation and used to predict a particular user’s preferences
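Continuing the toy algebra, a hedged sketch of how a second action matrix enters the sum (invented data; the real system LLR-filters each matrix rather than summing raw counts):

```python
import numpy as np

# P: purchases (3 users x 2 purchasable items)
# V: views (3 users x 3 viewable items) — a secondary action
P = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
])
V = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
])

PtP = P.T @ P  # purchase cooccurrence
PtV = P.T @ V  # cross-occurrence: which views correlate with which purchases

hp = np.array([0, 1])     # the user purchased purchase-item1
hv = np.array([0, 0, 1])  # the user viewed view-item2

r = PtP @ hp + PtV @ hv   # r = (PtP)hp + (PtV)hv
print(r.tolist())         # [2, 4]
```

Views the user made now contribute evidence toward purchasable items, which is exactly what lets the recommender serve users who have views but no purchases yet.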
13. CORRELATED CROSS-OCCURRENCE: ADDING CONTENT MODELS
   • Collaborative Topic Filtering
     • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content
     • Calculate based on Word2Vec-style word vectors instead of bag-of-words analysis to boost quality
     • Create cross-occurrence indicators from topics the user has preferred
     • Repeat periodically
   • Entity Preferences
     • Use a Named Entity Recognition (NER) system to find entities in textual content
     • Create cross-occurrence indicators for these entities
   • Entities and topics are long-lived and richly describe user interests; these are very good for use in the Universal Recommender
14. THE UNIVERSAL RECOMMENDER: ADDING CONTENT-BASED RECS
   Indicators can also be based on content similarity.
   (TTt) is a calculation that compares every two documents to each other and finds the most similar—based upon content alone.
   r = (TTt)ht + l*L …
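One way to picture (TTt) is plain cosine similarity over term vectors (a toy sketch with invented data; the deck builds this with Mahout’s rowSimilarity, not this code):

```python
import numpy as np

# T: one row per item, one column per term (toy term counts)
T = np.array([
    [1.0, 1.0, 0.0],  # item0 mentions terms A and B
    [1.0, 0.0, 1.0],  # item1 mentions terms A and C
    [0.0, 1.0, 1.0],  # item2 mentions terms B and C
])

Tn = T / np.linalg.norm(T, axis=1, keepdims=True)  # L2-normalize each row
TTt = Tn @ Tn.T  # cosine similarity of every item's text to every other's

print(round(TTt[0, 1], 3))  # 0.5 — item0 and item1 share one of two terms
```

Each row of (TTt) then becomes a content-similarity indicator field on the item, queried with the user’s content history ht just like the action indicators.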
15. INDICATOR TYPES
   • Cooccurrence
     • Find the best indicator of a user preference for the item type to be recommended: examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”
   • Cross-occurrence
     • Item metadata as “user” preference, for example: treat an item’s category as a user category-preference
     • Calculated from user actions on any data that may give information about the user—category-preferences, search terms, gender, location
     • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
   • Content or metadata
     • Content text, tags, categories, description text, anything describing an item
     • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
   • Intrinsic
     • Popularity rank, geo-location, anything describing an item
     • Some may be derived from usage data, like popularity rank or hotness
     • A known or specially calculated property of the item
16. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA
   “Universal” means one query on all indicators at once.
   Unified query:
     purchase-correlator: users-history-of-purchases
     view-correlator: users-history-of-views
     category-correlator: users-history-of-categories-viewed
     tags-correlator: users-history-of-purchases
     geo-location-correlator: users-location
     …
   r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
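The unified query might be assembled like this (hypothetical field names and item IDs, following the shape of the query in the appendix):

```python
# Hypothetical sketch of the single "universal" query: one "should" clause
# per correlator field, each fed by the matching slice of the user's history.
user_history = {  # invented example data
    "purchase": ["item3", "item44"],
    "view": ["item3", "item7", "item92"],
    "category": ["electronics"],
}

query = {
    "size": 20,
    "query": {
        "bool": {
            "should": [
                {"terms": {field: terms}}
                for field, terms in user_history.items()
            ]
        }
    }
}
print(len(query["query"]["bool"]["should"]))  # 3
```

The search engine sums the per-field scores, so the whole r = (PtP)hp + (PtV)hv + … equation resolves in one round trip.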
17. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA (continued)
   “Universal” means one query on all correlators at once.
   Once indicators are indexed as search fields, this entire equation is a single query. Fast!
   r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
18. THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE
   • Any number of user actions—the entire user clickstream
   • Metadata—from user profiles or items
   • Context—on-site, time, location
   • Content—unstructured text or semi-structured categorical data
   • Mixes any number of “indicators” to increase quality or tune to a specific context
   • A solution to the “cold-start” problem—items with too short a lifespan, or new users with no history
   • Can recommend to new users using realtime history
   • Can use new interaction data from any user in realtime
   • 95% implemented in Universal Recommender v0.3.0, the most current release
   [Diagram: the Universal Recommender covers far more of “All Users” than ALS or single-action recommenders do]
19. POLISH THE APPLE
   • Dithering for auto-optimization via explore/exploit: randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
   • Visibility control:
     • Don’t show dups; blacklist items already shown
     • Filter out items the user has already seen
   • Zero-downtime deployment: deploy the prediction server once, then hot-swap in the new index when ready
   • Generate some intrinsic indicators like hot and popular—helps solve the “cold-start” problem
   • Asymmetric train vs. query—query with the most recent user data, train on all historical data
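The dithering bullet can be sketched as rank-based noise (my illustration; the log-noise model and the epsilon value are assumptions, not from the deck):

```python
import math
import random

def dither(recs, epsilon=1.5, rng=random):
    """Reorder recommendations with log-scaled noise: top items usually
    stay near the top, while lower-ranked items occasionally surface so
    they can earn clicks and enter the training data (explore/exploit).
    epsilon = 1.0 means no noise; larger means more exploration."""
    noisy = [(-math.log(rank + 1) + rng.gauss(0.0, math.log(epsilon)), item)
             for rank, item in enumerate(recs)]
    return [item for _, item in sorted(noisy, reverse=True)]

random.seed(7)
print(dither(["a", "b", "c", "d", "e"]))  # a permutation biased toward the original order
```

Because the noise scale follows log(rank), the head of the list is stable while the tail gets shuffled, which is where fresh training signal is most valuable.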
20. Architecture Based on PredictionIO: Universal Recommender
21. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE (shown on slides 21-23 with the background and realtime paths highlighted in turn)
   [Diagram: the Application sends events & item metadata through the PredictionIO SDK or REST to the PredictionIO EventServer (DATA IN), backed by HBase user history and item properties; Spark plus Spark-Mahout’s correlation engine perform background MODEL CREATION and MODEL UPDATE into Elasticsearch; the Universal Recommender Engine answers application queries with recommendations in realtime via the PredictionIO REST Serving Component (RECOMMENDATION SERVING)]
24. Appendix
25. TECH STACK
   • HBase 1.x
     • Postgres, MySQL, or other JDBC stores also possible
   • Spark 1.6.x
     • Fast, massively scalable, seems like the “winner”
   • HDFS 2.6—Hadoop Distributed File System
     • Reliable, massively scalable, the de facto standard
   • Spray
     • Supplies REST endpoints, multi-threaded via Akka actors
   • Elasticsearch 1.7.x or 2.x
     • Reliable, massively scalable, fast
   • Scala & Java 8
     • Fit the functional and OOP programming styles for productivity
   • Stable, scalable, High Availability, well supported
26. An example Elasticsearch JSON query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster:
   {
     "size": 20,
     "query": {
       "bool": {
         "should": [
           { "terms": { "rate": ["0", "67", "4"] } },
           { "terms": { "buy": ["0", "32"], "boost": 2 } },
           { "terms": { "category": ["cat1"], "boost": 1.05 } },  // categorical boosts
           { "constant_score": {                                  // orders popular items for backfill
               "filter": { "match_all": {} },
               "boost": 0.000001 } }                              // must be at least a small number to be boostable
         ],
         "must": [
           { "terms": { "category": ["cat1"], "boost": 0 } },     // categorical filters
           { "constant_score": {                                  // date in the query must fall between an item’s available and expire dates
               "filter": { "range": { "availabledate": { "lte": "2015-08-30T12:24:41-07:00" } } },
               "boost": 0 } },
           { "constant_score": {                                  // date range in the query must fall between these item property values
               "filter": { "range": { "expiredate": {
                   "gte": "2015-08-15T11:28:45.114-07:00",
                   "lt": "2015-08-20T11:28:45.114-07:00" } } },
               "boost": 0 } }
         ],
         "must_not": [
           { "ids": { "values": ["items-id1", "item-id2", ...] } }  // blacklisted items
         ]
       }
     }
   }
