The Big Idea
Universal Recommender
RECOMMENDATIONS
REQUIRED
A LITTLE HISTORY:
MOTIVATION
• Cooccurrence: Mahout 2012
• Factorized ALS: Mahout, then Spark’s MLlib
• Experience with then-current recommender tech
• Evaluation and experiments
• Could only use “purchase” data; threw out 100x as much view data
• No “realtime”
• Too many edge cases: users that had no recommendations
• Didn’t adapt to metadata/content of items
• Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me)
• Cooccurrence and cross-cooccurrence led to many innovations
ANATOMY OF A RECOMMENDATION
PERSONALIZED
r = recommendations
hp = a user’s history of some action
(purchase for instance)
P = the history of all users’ primary action
rows are users, columns are items
(PtP) = compares column to column using
log-likelihood based correlation test
r = (PtP)hp
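A minimal sketch of r = (PtP)hp on toy data, in plain Python. The matrix values are made up, and raw cooccurrence counts stand in for the log-likelihood scores the slide describes, so this shows only the shape of the calculation.

```python
# Toy purchase history P: rows are users, columns are items (1 = purchased).
# All values are hypothetical.
P = [
    [1, 1, 0],  # user 0 bought items 0 and 1
    [1, 1, 1],  # user 1 bought items 0, 1 and 2
    [0, 1, 1],  # user 2 bought items 1 and 2
]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Raw cooccurrence: entry [i][j] = number of users who bought both item i and item j.
PtP = matmul(transpose(P), P)

# Zero the diagonal so an item is not recommended on the strength of itself.
for i in range(len(PtP)):
    PtP[i][i] = 0

h_p = [1, 0, 0]        # a user who has bought only item 0
r = matvec(PtP, h_p)   # one score per item
print(r)
```

Item 1 gets the highest score because two users bought it together with item 0; item 2 cooccurs with item 0 only once.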
COOCCURRENCE WITH LLR
• Let’s call (PtP) an indicator matrix for some primary action like
purchase
• Rows = items, columns = items, element =
similarity/correlation score
• The score is row compared to column using a “similarity” or
“correlation” metric
• Log-Likelihood Ratio (LLR) finds important/correlating
cooccurrences and filters out the rest—a major improvement
in quality over simple cooccurrence or other similarity metrics.
• Experiments on real-world data show LLR is significantly
better than other similarity metrics
* http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
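The LLR score for one pair of items can be sketched as below. This follows the entropy formulation of Dunning's log-likelihood ratio test, which to my understanding is the form Mahout uses; the function names and the 2x2 count layout are assumptions made for illustration.

```python
from math import log

def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users who interacted with both item A and item B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))

print(llr(10, 10, 10, 10))  # independent items: score near zero
print(llr(10, 0, 0, 10))    # items that always occur together: high score
```

Pairs whose score falls below a chosen threshold would be dropped from the indicator matrix, which is how LLR "filters out the rest".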
LLR AND SIMILARITY METRICS
PRECISION (MAP@K)
[Chart: Mean Average Precision at MAP@1 through MAP@10 for the Mahout Cooccurrence Recommender with e-commerce data, comparing the Cosine, Tanimoto, and Log-likelihood similarity metrics. Higher is better.]
FROM COOCCURRENCE TO
RECOMMENDATION
• This actually means to take the user’s
history hp and compare it to rows of the
cooccurrence matrix (PtP)
• TF-IDF weighting of cooccurrence would
be nice to mitigate the undue influence
of popular items
• Find items nearest to the user’s history
• Sort these by similarity strength and
keep only the highest
—you have recommendations
• Sound familiar? Find the k-nearest
neighbors using cosine and TF-IDF?
• That’s exactly what a search engine
does!
r = (PtP)hp
hp
user1: [item2, item3]
(PtP)
item1: [item2, item3]
item2: [item1, item3, item95]
item3: […]
find item that most closely
matches the user’s history
item1 !
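The slide's example can be sketched as a tiny nearest-neighbor search. This is a simplification under stated assumptions: plain cosine similarity on sets, no TF-IDF weighting and no norms, which a real search engine would add. The row for item3 is elided on the slide, so it is left out here.

```python
from math import sqrt

# Indicator rows copied from the slide; item3's row is elided there and omitted here.
indicators = {
    "item1": ["item2", "item3"],
    "item2": ["item1", "item3", "item95"],
}
history = ["item2", "item3"]  # user1's history h_p

def cosine(a, b):
    """Cosine similarity between two item lists treated as binary vectors."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(history, indicators, k=10):
    """Rank indicator rows by similarity to the user's history, best first."""
    scored = [(item, cosine(history, row)) for item, row in indicators.items()]
    scored = [(item, s) for item, s in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

ranked = recommend(history, indicators)
print(ranked)
```

item1 ranks first because its row matches the history exactly. A production query would also filter out items the user already has, such as item2.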
USER HISTORY + COOCCURRENCES
+ SEARCH = RECOMMENDATIONS
• The final calculation uses hp as the query on the Cooccurrence
Matrix (PtP), returns a ranked set of items
• Query is a “similarity” query, not relational or key based fetch
• Uses Search Engine as Cosine-based K-Nearest Neighbor
(KNN) Engine with norms and TF-IDF weighting
• Highly optimized for serving these queries in realtime
• Several (Solr, Elasticsearch) have High Availability, massively
scalable clustered auto-sharding features like the best of
NoSQL DBs.
r = (PtP)hp
THE UNIVERSAL RECOMMENDER:
THE BREAKTHROUGH IDEA
• Virtually all existing collaborative filtering type recommenders
use only one indicator of preference
• The theory doesn’t stop there!
• Virtually anything we know about the user can be used to
improve recommendations—purchase, view, category-
preference, location-preference, device-preference, gender…
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
THE UNIVERSAL RECOMMENDER:
CORRELATED CROSS-OCCURRENCE
CROSS-OCCURRENCE
r = (PtP)hp
r = (PtP)hp + (PtV)hv + (PtC)hc + …
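A rough sketch of the two-term sum r = (PtP)hp + (PtV)hv in plain Python. The users, items, and helper names are hypothetical, and raw counts are used where the real system would keep only LLR-correlated pairs.

```python
# Hypothetical toy data: which users purchased and viewed which items.
purchases = {"u1": {"phone"}, "u2": {"phone", "case"}, "u3": {"case"}}
views = {"u1": {"phone", "charger"}, "u2": {"charger", "case"}, "u3": {"case"}}

def cross_occurrence(primary, secondary):
    """counts[i][j] = users who did the primary action on i and the secondary action on j."""
    counts = {}
    for user, primary_items in primary.items():
        for i in primary_items:
            row = counts.setdefault(i, {})
            for j in secondary.get(user, ()):
                row[j] = row.get(j, 0) + 1
    return counts

def score(indicator, history):
    """One term of the sum, e.g. (PtV)hv: add up the counts that match the history."""
    return {item: sum(c for j, c in row.items() if j in history)
            for item, row in indicator.items()}

PtP = cross_occurrence(purchases, purchases)  # cooccurrence
PtV = cross_occurrence(purchases, views)      # cross-occurrence

h_p = set()         # a new user with no purchases yet
h_v = {"charger"}   # who has viewed a charger

r_p = score(PtP, h_p)
r_v = score(PtV, h_v)
r = {item: r_p.get(item, 0) + r_v.get(item, 0) for item in set(r_p) | set(r_v)}
print(r)
```

The purchase term contributes nothing for this user, yet the view term still ranks "phone" above "case": the secondary action carries the recommendation.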
CORRELATED CROSS-OCCURRENCE:
SO WHAT?
• Comparing the history of the primary action to other actions finds
actions that lead to the one you want to recommend
• Given strong data about user preferences on a general population
we can also use
• items clicked
• terms searched
• categories viewed
• items shared
• people followed
• items disliked (yes, dislikes may predict likes)
• location
• device preference
• gender
• age bracket
• Virtually anything we know about the population can be
tested for correlation and used to predict a particular user’s
preferences
CORRELATED CROSS-OCCURRENCE:
ADDING CONTENT MODELS
• Collaborative Topic Filtering
• Use Latent Dirichlet Allocation (LDA) to model topics directly from the
textual content
• Calculate based on Word2Vec-type word vectors instead of
bag-of-words analysis to boost quality
• Create cross-occurrence indicators from topics the user has preferred
• Repeat periodically
• Entity Preferences
• Use a Named Entity Recognition (NER) system to find entities in
textual content
• Create cross-occurrence indicators for these entities
• Entities and topics are long-lived and richly describe user
interests, so they work very well in the Universal
Recommender.
THE UNIVERSAL RECOMMENDER
ADDING CONTENT-BASED RECS
Indicators can also be based on content
similarity
(TTt) is a calculation that compares every 2
documents to each other and finds the most
similar—based upon content alone
r = (TTt)ht + l*L …
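To illustrate what (TTt) computes, here is a small sketch that compares item descriptions with cosine similarity over bag-of-words counts. The item texts are hypothetical, and a real pipeline would likely add TF-IDF weighting and keep only the strongest matches per item.

```python
from collections import Counter
from math import sqrt

# Hypothetical item descriptions.
docs = {
    "item1": "red cotton shirt",
    "item2": "blue cotton shirt",
    "item3": "steel water bottle",
}

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vectors = {item: Counter(text.split()) for item, text in docs.items()}

def most_similar(item, k=1):
    """Return the k items whose content is closest to the given item."""
    others = [(other, cosine(vectors[item], vectors[other]))
              for other in vectors if other != item]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]

print(most_similar("item1"))
```

The resulting list of similar items per item plays the same role as a row of an indicator matrix, so it can be queried with the user's history in the same way.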
INDICATOR TYPES
• Cooccurrence
• Find the best indicator of a user preference for the item type to be recommended: examples are “buy”,
“read”, “video_watch”, “share”, “follow”, “like”.
• Cross-occurrence
• Item metadata as “user” preference, for example: treat an item’s category as a user category-preference
• Calculated from user actions on any data that may give information about the user: category-preferences,
search terms, gender, location
• Create with Mahout-Samsara SimilarityAnalysis.cooccurrence
• Content or metadata
• Content text, tags, categories, description text, anything describing an item
• Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity
• Intrinsic
• Popularity rank, geo-location, anything describing an item
• Some may be derived from usage data like popularity rank, or hotness
• Is a known or specially calculated property of the item
THE UNIVERSAL RECOMMENDER
AKA THE WHOLE ENCHILADA
“Universal” means one query on all indicators at once
Unified query:
purchase-correlator: users-history-of-purchases
view-correlator: users-history-of-views
category-correlator: users-history-of-categories-viewed
tags-correlator: users-history-of-purchases
geo-location-correlator: users-location
…
r = (PtP)hp + (PtV)hv + (PtC)hc + … + (TTt)ht + l*L …
Once indicators are indexed as search fields this entire
equation is a single query
Fast!
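One way to picture the unified query is a function that turns each piece of user history into one "terms" clause of an Elasticsearch bool query. This is only a sketch: the field names, the `build_query` helper, and its parameters are hypothetical, and the clause shape copies the example query in the appendix of this deck.

```python
import json

def build_query(history_by_field, boosts=None, size=20):
    """Build one bool/should query with a terms clause per indicator field.

    history_by_field: maps an indexed indicator field to the user's history
    for that indicator. boosts: optional per-field boost values.
    """
    boosts = boosts or {}
    should = []
    for field, item_ids in history_by_field.items():
        if not item_ids:
            continue  # no history for this indicator, so it adds no clause
        clause = {"terms": {field: list(item_ids)}}
        if field in boosts:
            clause["terms"]["boost"] = boosts[field]
        should.append(clause)
    return {"size": size, "query": {"bool": {"should": should}}}

query = build_query(
    {"purchase": ["item2", "item3"], "view": ["item7"], "category": []},
    boosts={"purchase": 2},
)
print(json.dumps(query, indent=2))
```

Each term of the equation becomes one clause, and the search engine sums the clause scores, so the whole sum is evaluated in a single request.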
THE UNIVERSAL RECOMMENDER:
BETTER USER COVERAGE
• Any number of user actions—entire user clickstream
• Metadata—from user profile or items
• Context—on-site, time, location
• Content—unstructured text or semi-structured
categorical
• Mixes any number of “indicators” to increase quality
or tune to specific context
• Solution to the “cold-start” problem—items with too
short a lifespan or new users with no history
• Can recommend to new users using
realtime history
• Can use new interaction data from
any user in realtime
• 95% implemented in Universal Recommender
v0.3.0—most current release
[Diagram comparing user coverage; labels: All Users, Universal Recommender, ALS or 1-action Recommenders]
POLISH THE APPLE
• Dithering for auto-optimize via explore-exploit:
Randomize some returned recs, if they are acted upon they become
part of the new training data and are more likely to be recommended
in the future
• Visibility control:
• Don’t show dups, blacklist items already shown
• Filter items the user has already seen
• Zero-downtime Deployment: deploy prediction server
once then hot-swap new index when ready.
• Generate some intrinsic indicators like hot, popular—
helps solve the “cold-start” problem
• Asymmetric train vs query—query with most recent user
data, train on all historical data
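The dithering bullet above can be sketched as follows. The slides do not say how the randomization is done; this example assumes one commonly described scheme, adding Gaussian noise to the log of each item's rank and re-sorting. The function name and the `noise_sd` parameter are hypothetical.

```python
import math
import random

def dither(recs, noise_sd=0.4, rng=None):
    """Reorder a ranked list by adding Gaussian noise to log(rank).

    Top items mostly stay near the top, while lower-ranked items
    occasionally surface so they can collect feedback.
    """
    rng = rng or random.Random()
    noisy = [(math.log(rank) + rng.gauss(0.0, noise_sd), item)
             for rank, item in enumerate(recs, start=1)]
    return [item for _, item in sorted(noisy, key=lambda pair: pair[0])]

recs = ["item1", "item2", "item3", "item4", "item5"]
dithered = dither(recs, rng=random.Random(42))
print(dithered)
```

With `noise_sd=0.0` the original order is returned unchanged; larger values trade more exploration for less exploitation.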
Architecture Based on
PredictionIO
Universal Recommender
UNIVERSAL RECOMMENDER
LAMBDA ARCHITECTURE
[Diagram: Lambda architecture of the Universal Recommender on PredictionIO. Labeled parts:
DATA IN: Application, PredictionIO SDK or REST, PredictionIO EventServer, HBase (events and item metadata)
MODEL CREATION and MODEL UPDATE (background): Spark-Mahout’s Correlation Engine, Spark, Elasticsearch
RECOMMENDATION SERVING (realtime): Universal Recommender Engine, PredictionIO REST Serving Component, user history, itemProperties; query and recommendations exchanged with the Application]
Appendix
TECH STACK
• HBase 1.X
• Postgres, MySQL, or other JDBC possible
• Spark 1.6.X
• Fast, massively scalable, seems like the “winner”
• HDFS 2.6: Hadoop Distributed File System
• Reliable, massively scalable, the de facto standard
• Spray
• Supplies REST endpoints, multi-threaded via Akka actors
• Elasticsearch 1.7.X or 2.X
• Reliable, massively scalable, fast
• Scala & Java 8
• Fits functional and OOP programming styles for productivity
• Stable, Scalable, High Availability, Well Supported
The ES JSON query looks like this:
{
  "size": 20,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "rate": ["0", "67", "4"]
          }
        },
        {
          "terms": {
            "buy": ["0", "32"],
            "boost": 2
          }
        },
        { // categorical boosts
          "terms": {
            "category": ["cat1"],
            "boost": 1.05
          }
        },
        {
          "constant_score": { // this orders popular items for backfill
            "filter": {
              "match_all": {}
            },
            "boost": 0.000001 // must have at least a small number to be boostable
          }
        }
      ],
      "must": [ // categorical filters
        {
          "terms": {
            "category": ["cat1"],
            "boost": 0
          }
        },
        {
          "constant_score": { // date in query must fall between the expire and available dates of an item
            "filter": {
              "range": {
                "availabledate": {
                  "lte": "2015-08-30T12:24:41-07:00"
                }
              }
            },
            "boost": 0
          }
        },
        {
          "constant_score": { // date range filter in query must be between these item property values
            "filter": {
              "range": {
                "expiredate": {
                  "gte": "2015-08-15T11:28:45.114-07:00",
                  "lt": "2015-08-20T11:28:45.114-07:00"
                }
              }
            },
            "boost": 0
          }
        }
      ],
      "must_not": [ // blacklisted items
        {
          "ids": {
            "values": ["items-id1", "item-id2", ...]
          }
        }
      ]
    }
  }
}
An example Elasticsearch query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster.

  • 3. A LITTLE HISTORY: MOTIVATION • Coocurrence: Mahout 2012 • Factorized ALS: Mahout then Spark’s MLlib • Experience with then current Recommender Tech • Evaluation and Experiments • Could only use “purchase” data threw out 100x view data • No “realtime” • too many edge cases, users that had no recommendations • didn’t adapt to metadata/content of items • Lots of discussions with Ted Dunning, Sean Owen, Sebastian Schelter, Pat Ferrel (me) • Cooccurrence and cross-cooccurrence led to many innovations
  • 4. ANATOMY OF A RECOMMENDATION PERSONALIZED r = recommendations hp = a user’s history of some action (purchase for instance) P = the history of all users’ primary action rows are users, columns are items (PtP) = compares column to column using log-likelihood based correlation test r = (PtP)hp
  • 5. COOCCURRENCE WITH LLR • Let’s call (PtP) an indicator matrix for some primary action like purchase • Rows = items, columns = items, element = similarity/correlation score • The score is row compared to column using a “similarity” or “correlation” metric • Log-Likelihood Ratio (LLR) finds important/correlating cooccurrences and filters out the rest—a major improvement in quality over simple cooccurrence or other similarity metrics. • Experiments on real-world data show LLR is significantly better than other similarity metrics * http://ssc.io/wp-content/uploads/2011/12/rec11-schelter.pdf
  • 6. LLR AND SIMILARITY METRICS PRECISION (MAP@K) Higher is better MAP@1 MAP@2 MAP@3 MAP@4 MAP@5 MAP@6 MAP@7 MAP@8 MAP@9 MAP@10 Similarity Metrics Mean Average Precision Mahout Cooccurrence Recommender with E-Commerce Data Cosine Tanimoto Log-likelihood
  • 7. FROM COOCCURRENCE TO RECOMMENDATION • This actually means to take the user’s history hp and compare it to rows of the cooccurrence matrix (PtP) • TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items • Find items nearest to the user’s history • Sort these by similarity strength and keep only the highest —you have recommendations • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF? r = (PtP)hp hp user1: [item2, item3] (PtP) item1: [item2, item3] item2: [item1, item3, item95] item3: […] find item that most closely matches the user’s history item1 !
  • 8. FROM COOCCURRENCE TO RECOMMENDATION • This actually means to take the user’s history hp and compare it to rows of the cooccurrence matrix (PtP) • TF-IDF weighting of cooccurrence would be nice to mitigate the undue influence of popular items • Find items nearest to the user’s history • Sort these by similarity strength and keep only the highest —you have recommendations • Sound familiar? Find the k-nearest neighbors using cosine and TF-IDF? • That’s exactly what a search engine does! r = (PtP)hp hp user1: [item2, item3] (PtP) item1: [item2, item3] item2: [item1, item3, item95] item3: […] find item that most closely matches the user’s history item1 !
  • 9. USER HISTORY + COOCCURRENCES + SEARCH = RECOMMENDATIONS • The final calculation uses hp as the query on the Cooccurrence Matrix (PtP), returns a ranked set of items • Query is a “similarity” query, not relational or key based fetch • Uses Search Engine as Cosine-based K-Nearest Neighbor (KNN) Engine with norms and TF-IDF weighting • Highly optimized for serving these queries in realtime • Several (Solr, Elasticsearch) have High Availability, massively scalable clustered auto-sharding features like the best of NoSQL DBs. r = (PtP)hp
  • 10. THE UNIVERSAL RECOMMENDER: THE BREAKTHROUGH IDEA • Virtually all existing collaborative filtering type recommenders use only one indicator of preference • The theory doesn’t stop there! • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category- preference, location-preference, device-preference, gender… r = (PtP)hp r = (PtP)hp + (PtV)hv + (PtC)hc + …
  • 11. THE UNIVERSAL RECOMMENDER: CORRELATED CROSS-OCCURRENCE • Virtually all existing collaborative filtering type recommenders use only one indicator of preference • The theory doesn’t stop there! • Virtually anything we know about the user can be used to improve recommendations—purchase, view, category- preference, location-preference, device-preference, gender… CROSS-OCCURRENCE r = (PtP)hp r = (PtP)hp + (PtV)hv + (PtC)hc + …
  • 12. • Comparing the history of the primary action to other actions finds actions that lead to the one you want to recommend • Given strong data about user preferences on a general population we can also use • items clicked • terms searched • categories viewed • items shared • people followed • items disliked (yes dislikes may predict likes) • location • device preference • gender • age bracket • Virtually any anything we know about the population can be tested for correlation and used to predict a particular users preferences CORRELATED CROSS-OCCURRENCE: SO WHAT?
  • 13. CORRELATED CROSS-OCCURRENCE; ADDING CONTENT MODELS • Collaborative Topic Filtering • Use Latent Dirichlet Allocation (LDA) to model topics directly from the textual content • Calculate based on Word2Vec type word vectors instead of bag-of- words analysis to boost quality • Create cross-occurrence indicators from topics the user has preferred • Repeat periodically • Entity Preferences: • Use a Named Entity Recognition (NER) system to find entities in textual content • Create cross-occurrence indicators for these entities • Entities and Topics are long lived and richly describe user interests, these are very good for use in the Universal Recommender.
  • 14. THE UNIVERSAL RECOMMENDER: ADDING CONTENT-BASED RECS • Indicators can also be based on content similarity • (TTt) is a calculation that compares every two documents to each other and finds the most similar, based upon content alone • r = (TTt)ht + l*L …
  • 15. INDICATOR TYPES • Cooccurrence • Find the best indicator of a user preference for the item type to be recommended: examples are “buy”, “read”, “video_watch”, “share”, “follow”, “like”. • Cross-occurrence • Item metadata as “user” preference, for example: treat an item’s category as a user category-preference • Calculated from user actions on any data that may give information about the user: category-preferences, search terms, gender, location • Create with Mahout-Samsara SimilarityAnalysis.cooccurrence • Content or metadata • Content text, tags, categories, description text, anything describing an item • Create with Mahout-Samsara SimilarityAnalysis.rowSimilarity • Intrinsic • Popularity rank, geo-location, anything describing an item • Some may be derived from usage data like popularity rank, or hotness • Is a known or specially calculated property of the item
  • 16. THE UNIVERSAL RECOMMENDER AKA THE WHOLE ENCHILADA • “Universal” means one query on all indicators at once • Unified query: • purchase-correlator: users-history-of-purchases • view-correlator: users-history-of-views • category-correlator: users-history-of-categories-viewed • tags-correlator: users-history-of-purchases • geo-location-correlator: users-location • … • r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
  • 17. THE UNIVERSAL RECOMMENDER AKA THE WHOLE ENCHILADA • “Universal” means one query on all correlators at once • Once indicators are indexed as search fields, this entire equation is a single query • Fast! • r = (PtP)hp + (PtV)hv + (PtC)hc + … (TTt)ht + l*L …
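To make the "single query" idea concrete, here is a rough sketch that turns a user's per-action history into one search request body. The field names (`buy`, `view`, `category`), the boosts, and the helper `build_query` are hypothetical; the shape of each clause simply mirrors the example Elasticsearch query shown on the last slide of this deck.

```python
import json


def build_query(history, boosts=None, size=20):
    """Build one bool/should query with a terms clause per indicator field.

    history: dict mapping an indicator field name to the ids in the user's history.
    boosts:  optional dict mapping a field name to its weight (defaults to 1.0).
    """
    boosts = boosts or {}
    should = []
    for field, ids in history.items():
        if not ids:                      # skip actions the user has never taken
            continue
        should.append({"terms": {field: list(ids), "boost": boosts.get(field, 1.0)}})
    return {"size": size, "query": {"bool": {"should": should}}}


# Hypothetical user history: one list of ids per indicator field.
user_history = {
    "buy": ["0", "32"],
    "view": ["0", "67", "4"],
    "category": [],
}

query = build_query(user_history, boosts={"buy": 2})
print(json.dumps(query, indent=2))
```

Every term of the equation becomes one clause, so adding a new indicator means adding a field to the index and a key to the history, not a new model.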
  • 18. THE UNIVERSAL RECOMMENDER: BETTER USER COVERAGE • Any number of user actions: the entire user clickstream • Metadata: from user profile or items • Context: on-site, time, location • Content: unstructured text or semi-structured categorical • Mixes any number of “indicators” to increase quality or tune to a specific context • Solution to the “cold-start” problem: items with too short a lifespan or new users with no history • Can recommend to new users using realtime history • Can use new interaction data from any user in realtime • 95% implemented in Universal Recommender v0.3.0, the most current release • [Figure: user coverage. Labels: All Users; Universal Recommender; ALS or 1-action Recommenders]
  • 19. POLISH THE APPLE • Dithering for auto-optimization via explore-exploit: randomize some returned recs; if they are acted upon they become part of the new training data and are more likely to be recommended in the future • Visibility control: • Don’t show duplicates, blacklist items already shown • Filter items the user has already seen • Zero-downtime deployment: deploy the prediction server once, then hot-swap in the new index when ready • Generate some intrinsic indicators like hot, popular; this helps solve the “cold-start” problem • Asymmetric train vs. query: query with the most recent user data, train on all historical data
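Dithering can be sketched as adding noise to the log of each item's rank and re-sorting. This is one common formulation and an assumption here, not necessarily how the Universal Recommender implements it; `dither`, `epsilon`, and the item names are illustrative.

```python
import math
import random


def dither(ranked_items, epsilon=2.0, rng=None):
    """Re-order a ranked list by sorting on log(rank) + Gaussian noise.

    epsilon = 1.0 leaves the order unchanged; larger values shuffle more.
    Top items mostly stay near the top, while deeper items occasionally surface.
    """
    rng = rng or random.Random()
    sigma = math.sqrt(math.log(epsilon)) if epsilon > 1.0 else 0.0
    noisy = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(ranked_items, start=1)
    ]
    noisy.sort(key=lambda pair: pair[0])
    return [item for _, item in noisy]


recs = ["item-%d" % i for i in range(1, 11)]   # hypothetical ranked recommendations
print(dither(recs, epsilon=2.0, rng=random.Random(42)))
```

Items promoted this way that get clicked or purchased show up in the next training run, which is the explore-exploit loop the slide describes.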
  • 21. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE • [Diagram, MODEL CREATION (background) highlighted. Labels: Application; events & item metadata; query and recommendations; DATA IN; PredictionIO SDK or REST; PredictionIO EventServer; HBase; Universal Recommender Engine; Spark-Mahout’s Correlation Engine; Spark; MODEL UPDATE; Elasticsearch; user history; itemProperties; PredictionIO REST Serving Component; RECOMMENDATION SERVING (realtime)]
  • 22. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE • [Same diagram, with the background and REALTIME paths marked]
  • 23. UNIVERSAL RECOMMENDER LAMBDA ARCHITECTURE • [Same diagram, RECOMMENDATION SERVING highlighted, with the BACKGROUND and REALTIME paths marked]
  • 25. TECH STACK • HBase 1.X • Postgres, MySQL, or other JDBC possible • Spark 1.6.X • Fast, massively scalable, seems like the “winner” • HDFS 2.6 (Hadoop Distributed File System) • Reliable, massively scalable, the de facto standard • Spray • Supplies REST endpoints, multi-threaded via Akka actors • Elasticsearch 1.7.X or 2.X • Reliable, massively scalable, fast • Scala & Java 8 • Fits functional and OOP programming styles for productivity • Stable, Scalable, High Availability, Well Supported
  • 26. The ES JSON query looks like this:

    {
      "size": 20,
      "query": {
        "bool": {
          "should": [
            {
              "terms": {
                "rate": ["0", "67", "4"]
              }
            },
            {
              "terms": {
                "buy": ["0", "32"],
                "boost": 2
              }
            },
            { // categorical boosts
              "terms": {
                "category": ["cat1"],
                "boost": 1.05
              }
            },
            {
              "constant_score": { // this orders popular items for backfill
                "filter": {
                  "match_all": {}
                },
                "boost": 0.000001 // must have at least a small number to be boostable
              }
            }
          ],
          "must": [ // categorical filters
            {
              "terms": {
                "category": ["cat1"],
                "boost": 0
              }
            },
            {
              "constant_score": { // date in query must fall between the expire and available dates of an item
                "filter": {
                  "range": {
                    "availabledate": {
                      "lte": "2015-08-30T12:24:41-07:00"
                    }
                  }
                },
                "boost": 0
              }
            },
            {
              "constant_score": { // date range filter in query must be between these item property values
                "filter": {
                  "range": {
                    "expiredate": {
                      "gte": "2015-08-15T11:28:45.114-07:00",
                      "lt": "2015-08-20T11:28:45.114-07:00"
                    }
                  }
                },
                "boost": 0
              }
            }
          ],
          "must_not": [ // blacklisted items
            {
              "ids": {
                "values": ["items-id1", "item-id2", ...]
              }
            }
          ]
        }
      }
    }

    An example Elasticsearch query on a multi-field index created from the output of the CCO engine. The index includes about 90% of the data in the “whole enchilada” equation. This executes in 50ms on a non-cached cluster and ~26ms on an unoptimized cluster.