The Universal Recommender



How to create a cutting edge recommender that is fast, scalable, can use almost any applicable data, and is extremely flexible for use in many different contexts. Uses Spark, Mahout, and a search engine.

Published in: Software
1. The Big Idea: Universal Recommender
2. RECOMMENDATIONS REQUIRED
3. A LITTLE HISTORY: MOTIVATION
   • Cooccurrence: Mahout 2012
   • Factorized ALS: Mahout, then Spark’s MLlib
   • Experience with then-current recommender tech:
     • Evaluation and experiments
     • Could only use “purchase” data; threw out the 100x larger volume of view data
     • No “realtime”
     • Too many edge cases: users who had no recommendations
     • Didn’t adapt to metadata/content of items
   • Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me)
   • Cooccurrence and cross-cooccurrence led to many innovations
4. ANATOMY OF A RECOMMENDATION: PERSONALIZED
   r = recommendations
   hp = a user’s history of some action (purchase, for instance)
   P = the history of all users’ primary action; rows are users, columns are items
   (PtP) = compares column to column using a log-likelihood-based correlation test
   r = (PtP)hp
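The algebra on this slide can be sketched in a few lines (an illustration with invented toy data, not the Universal Recommender’s code; assumes numpy):

```python
import numpy as np

# Toy interaction matrix P: one row per user, one column per item,
# with a 1 wherever the user performed the primary action (e.g. purchase).
P = np.array([
    [1, 0, 1, 0],  # user0 purchased item0 and item2
    [1, 1, 0, 0],  # user1 purchased item0 and item1
    [0, 1, 1, 1],  # user2 purchased item1, item2, and item3
])

PtP = P.T @ P                # item-to-item cooccurrence counts
hp = np.array([1, 0, 0, 0])  # a user whose history is just item0

r = PtP @ hp                 # score every item against the user's history
print(r.tolist())            # [2, 1, 1, 0]
```

In practice item0 (already in the history) would be filtered out, leaving item1 and item2 as the recommendations.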
5. COOCCURRENCE WITH LLR
   • Let’s call (PtP) an indicator matrix for some primary action like purchase
   • Rows = items, columns = items, element = similarity/correlation score
   • The score is row compared to column using a “similarity” or “correlation” metric
   • Log-Likelihood Ratio (LLR) finds important/correlating cooccurrences and filters out the rest—a major improvement in quality over simple cooccurrence or other similarity metrics
   • Experiments on real-world data show LLR is significantly better than other similarity metrics*
   * http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
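For concreteness, here is a minimal sketch of Dunning’s LLR score for one 2x2 cooccurrence table (my own implementation for illustration, not Mahout’s code):

```python
import math

def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 cooccurrence table.

    k11 = users who did both A and B, k12 = A only, k21 = B only,
    k22 = neither. A large score means the cooccurrence is unlikely
    to be chance; near zero means the events look independent.
    """
    n = k11 + k12 + k21 + k22
    cells = [
        (k11, k11 + k12, k11 + k21),
        (k12, k11 + k12, k12 + k22),
        (k21, k21 + k22, k11 + k21),
        (k22, k21 + k22, k12 + k22),
    ]
    # 2 * N * mutual information, written out cell by cell
    return 2.0 * sum(k * math.log(k * n / (row * col))
                     for k, row, col in cells if k > 0)

print(round(llr(10, 0, 0, 10), 2))  # 27.73 — strongly correlated
print(round(llr(5, 5, 5, 5), 2))    # 0.0 — independent, filtered out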
6. LLR AND SIMILARITY METRICS: PRECISION (MAP@K)
   [Chart: Mean Average Precision at k = 1…10 (higher is better) for the Mahout Cooccurrence Recommender on e-commerce data, comparing the Cosine, Tanimoto, and Log-likelihood similarity metrics]
7. FROM COOCCURRENCE TO RECOMMENDATION
   • This actually means taking the user’s history hp and comparing it to the rows of the cooccurrence matrix (PtP)
   • TF-IDF weighting of cooccurrence would be nice, to mitigate the undue influence of popular items
   • Find the items nearest to the user’s history
   • Sort these by similarity strength and keep only the highest—you have recommendations
   • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF?
   r = (PtP)hp
   Example: hp for user1 = [item2, item3]
   Rows of (PtP): item1: [item2, item3]; item2: [item1, item3, item95]; item3: […]
   The row that most closely matches the user’s history is item1!
8. FROM COOCCURRENCE TO RECOMMENDATION (continued)
   • Repeats slide 7 and adds the punchline: that’s exactly what a search engine does!
9. USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS
   • The final calculation uses hp as the query on the cooccurrence matrix (PtP) and returns a ranked set of items
   • The query is a “similarity” query, not a relational or key-based fetch
   • Uses the search engine as a cosine-based k-nearest-neighbor (KNN) engine with norms and TF-IDF weighting
   • Search engines are highly optimized for serving these queries in realtime
   • Several (Solr, Elasticsearch) have High Availability and massively scalable clustered auto-sharding features, like the best of the NoSQL DBs
   r = (PtP)hp
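A minimal sketch of hp-as-search-query (field name and item IDs are invented for illustration; the full production-shaped query appears in the appendix):

```python
# Hypothetical sketch: the user's primary-action history, hp, rendered as an
# Elasticsearch-style "terms" clause against the "purchase" indicator field
# of the item index. The engine ranks items by similarity, not by key lookup.
hp = ["item2", "item3"]  # the user's history of the primary action

query = {
    "size": 10,
    "query": {
        "bool": {
            "should": [
                {"terms": {"purchase": hp}},  # similarity match, TF-IDF weighted
            ]
        }
    }
}
print(query["query"]["bool"]["should"][0])
```

Each item document stores its (PtP) row in the `purchase` field, so this one query performs the KNN comparison the slide describes.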
10. THE UNIVERSAL RECOMMENDER: THE BREAKTHROUGH IDEA
   • Virtually all existing collaborative-filtering recommenders use only one indicator of preference
   • The theory doesn’t stop there!
   • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category-preference, location-preference, device-preference, gender…
   r = (PtP)hp
   r = (PtP)hp + (PtV)hv + (PtC)hc + …
11. THE UNIVERSAL RECOMMENDER: CORRELATED CROSS-OCCURRENCE
   • Repeats slide 10, labeling the added terms as cross-occurrence:
   r = (PtP)hp + (PtV)hv + (PtC)hc + …
12. CORRELATED CROSS-OCCURRENCE: SO WHAT?
   • Comparing the history of the primary action to other actions finds the actions that lead to the one you want to recommend
   • Given strong data about user preferences in a general population, we can also use:
     • items clicked
     • terms searched
     • categories viewed
     • items shared
     • people followed
     • items disliked (yes, dislikes may predict likes)
     • location
     • device preference
     • gender
     • age bracket
   • Virtually anything we know about the population can be tested for correlation and used to predict a particular user’s preferences
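Continuing the toy algebra, a hedged sketch of how a second action matrix enters the sum (invented data; the real system LLR-filters each matrix rather than summing raw counts):

```python
import numpy as np

# P: purchases (3 users x 2 purchasable items)
# V: views (3 users x 3 viewable items) — a secondary action
P = np.array([
    [1, 0],
    [0, 1],
    [1, 1],
])
V = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
])

PtP = P.T @ P  # purchase cooccurrence
PtV = P.T @ V  # cross-occurrence: which views correlate with which purchases

hp = np.array([0, 1])     # the user purchased purchase-item1
hv = np.array([0, 0, 1])  # the user viewed view-item2

r = PtP @ hp + PtV @ hv   # r = (PtP)hp + (PtV)hv
print(r.tolist())         # [2, 4]
```

Views the user made now contribute evidence toward purchasable items, which is exactly what lets the recommender serve users who have views but no purchases yet.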
13. CORRELATED CROSS-OCCURRENCE: ADDING CONTENT MODELS
   • Collaborative Topic Filtering
     • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content
     • Calculate based on Word2Vec-style word vectors instead of bag-of-words analysis to boost quality
     • Create cross-occurrence indicators from topics the user has preferred
     • Repeat periodically
   • Entity Preferences
     • Use a Named Entity Recognition (NER) system to find entities in textual content
     • Create cross-occurrence indicators for these entities
   • Entities and topics are long-lived and richly describe user interests; these are very good for use in the Universal Recommender
14. THE UNIVERSAL RECOMMENDER: ADDING CONTENT-BASED RECS
   Indicators can also be based on content similarity.
   (TTt) is a calculation that compares every two documents to each other and finds the most similar—based upon content alone.
   r = (TTt)ht + l*L …
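One way to picture (TTt) is plain cosine similarity over term vectors (a toy sketch with invented data; the deck builds this with Mahout’s rowSimilarity, not this code):

```python
import numpy as np

# T: one row per item, one column per term (toy term counts)
T = np.array([
    [1.0, 1.0, 0.0],  # item0 mentions terms A and B
    [1.0, 0.0, 1.0],  # item1 mentions terms A and C
    [0.0, 1.0, 1.0],  # item2 mentions terms B and C
])

Tn = T / np.linalg.norm(T, axis=1, keepdims=True)  # L2-normalize each row
TTt = Tn @ Tn.T  # cosine similarity of every item's text to every other's

print(round(TTt[0, 1], 3))  # 0.5 — item0 and item1 share one of two terms
```

Each row of (TTt) then becomes a content-similarity indicator field on the item, queried with the user’s content history ht just like the action indicators.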
15. INDICATOR TYPES
   • Cooccurrence
     • Find the best indicator of a user preference for the item type to be recommended: examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”
   • Cross-occurrence
     • Item metadata as “user” preference, for example: treat an item’s category as a user category-preference
     • Calculated from user actions on any data that may give information about the user—category-preferences, search terms, gender, location
     • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
   • Content or metadata
     • Content text, tags, categories, description text, anything describing an item
     • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
   • Intrinsic
     • Popularity rank, geo-location, anything describing an item
     • Some may be derived from usage data, like popularity rank or hotness
     • A known or specially calculated property of the item
16. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA
   “Universal” means one query on all indicators at once.
   Unified query:
     purchase-correlator: users-history-of-purchases
     view-correlator: users-history-of-views
     category-correlator: users-history-of-categories-viewed
     tags-correlator: users-history-of-purchases
     geo-location-correlator: users-location
     …
   r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
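The unified query might be assembled like this (hypothetical field names and item IDs, following the shape of the query in the appendix):

```python
# Hypothetical sketch of the single "universal" query: one "should" clause
# per correlator field, each fed by the matching slice of the user's history.
user_history = {  # invented example data
    "purchase": ["item3", "item44"],
    "view": ["item3", "item7", "item92"],
    "category": ["electronics"],
}

query = {
    "size": 20,
    "query": {
        "bool": {
            "should": [
                {"terms": {field: terms}}
                for field, terms in user_history.items()
            ]
        }
    }
}
print(len(query["query"]["bool"]["should"]))  # 3
```

The search engine sums the per-field scores, so the whole r = (PtP)hp + (PtV)hv + … equation resolves in one round trip.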
17. THE UNIVERSAL RECOMMENDER, AKA THE WHOLE ENCHILADA (continued)
   “Universal” means one query on all correlators at once.
   Once indicators are indexed as search fields, this entire equation is a single query. Fast!
   r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
18. THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE
   • Any number of user actions—the entire user clickstream
   • Metadata—from user profiles or items
   • Context—on-site, time, location
   • Content—unstructured text or semi-structured categorical data
   • Mixes any number of “indicators” to increase quality or tune to a specific context
   • A solution to the “cold-start” problem—items with too short a lifespan, or new users with no history
   • Can recommend to new users using realtime history
   • Can use new interaction data from any user in realtime
   • 95% implemented in Universal Recommender v0.3.0, the most current release
   [Diagram: the Universal Recommender covers far more of “All Users” than ALS or single-action recommenders do]
19. POLISH THE APPLE
   • Dithering for auto-optimization via explore/exploit: randomize some returned recs; if they are acted upon, they become part of the new training data and are more likely to be recommended in the future
   • Visibility control:
     • Don’t show dups; blacklist items already shown
     • Filter out items the user has already seen
   • Zero-downtime deployment: deploy the prediction server once, then hot-swap in the new index when ready
   • Generate some intrinsic indicators like hot and popular—helps solve the “cold-start” problem
   • Asymmetric train vs. query—query with the most recent user data, train on all historical data
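The dithering bullet can be sketched as rank-based noise (my illustration; the log-noise model and the epsilon value are assumptions, not from the deck):

```python
import math
import random

def dither(recs, epsilon=1.5, rng=random):
    """Reorder recommendations with log-scaled noise: top items usually
    stay near the top, while lower-ranked items occasionally surface so
    they can earn clicks and enter the training data (explore/exploit).
    epsilon = 1.0 means no noise; larger means more exploration."""
    noisy = [(-math.log(rank + 1) + rng.gauss(0.0, math.log(epsilon)), item)
             for rank, item in enumerate(recs)]
    return [item for _, item in sorted(noisy, reverse=True)]

random.seed(7)
print(dither(["a", "b", "c", "d", "e"]))  # a permutation biased toward the original order
```

Because the noise scale follows log(rank), the head of the list is stable while the tail gets shuffled, which is where fresh training signal is most valuable.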
20. Architecture Based on PredictionIO: Universal Recommender
21. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE (shown on slides 21-23 with the background and realtime paths highlighted in turn)
   [Diagram: the Application sends events & item metadata through the PredictionIO SDK or REST to the PredictionIO EventServer (DATA IN), backed by HBase user history and item properties; Spark plus Spark-Mahout’s correlation engine perform background MODEL CREATION and MODEL UPDATE into Elasticsearch; the Universal Recommender Engine answers application queries with recommendations in realtime via the PredictionIO REST Serving Component (RECOMMENDATION SERVING)]
24. Appendix
25. TECH STACK
   • HBase 1.x
     • Postgres, MySQL, or other JDBC stores also possible
   • Spark 1.6.x
     • Fast, massively scalable, seems like the “winner”
   • HDFS 2.6—Hadoop Distributed File System
     • Reliable, massively scalable, the de facto standard
   • Spray
     • Supplies REST endpoints, multi-threaded via Akka actors
   • Elasticsearch 1.7.x or 2.x
     • Reliable, massively scalable, fast
   • Scala & Java 8
     • Fit the functional and OOP programming styles for productivity
   • Stable, scalable, High Availability, well supported
26. An example Elasticsearch JSON query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster:
   {
     "size": 20,
     "query": {
       "bool": {
         "should": [
           { "terms": { "rate": ["0", "67", "4"] } },
           { "terms": { "buy": ["0", "32"], "boost": 2 } },
           { "terms": { "category": ["cat1"], "boost": 1.05 } },  // categorical boosts
           { "constant_score": {                                  // orders popular items for backfill
               "filter": { "match_all": {} },
               "boost": 0.000001 } }                              // must be at least a small number to be boostable
         ],
         "must": [
           { "terms": { "category": ["cat1"], "boost": 0 } },     // categorical filters
           { "constant_score": {                                  // date in the query must fall between an item’s available and expire dates
               "filter": { "range": { "availabledate": { "lte": "2015-08-30T12:24:41-07:00" } } },
               "boost": 0 } },
           { "constant_score": {                                  // date range in the query must fall between these item property values
               "filter": { "range": { "expiredate": {
                   "gte": "2015-08-15T11:28:45.114-07:00",
                   "lt": "2015-08-20T11:28:45.114-07:00" } } },
               "boost": 0 } }
         ],
         "must_not": [
           { "ids": { "values": ["items-id1", "item-id2", ...] } }  // blacklisted items
         ]
       }
     }
   }
