Overview of Recommender System
STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68
http://www.linkedin.com/in/stanley-wang-a2b143b
Recommender System
What is a Recommender System?
Feedback to a Recommender System
Which Areas Can a Recommender System Benefit?
Typical Architecture of a Recommender System
Recommender System Types
• Collaborative Filtering System – aggregates consumers'
preferences and recommends items to a user based on
similarity in behavioral patterns with other users;
• Content-based System – supervised machine learning is used
to induce a classifier that discriminates between interesting and
uninteresting items for the user;
• Knowledge-based System – knowledge about users and
products is used to reason about what meets the user's
requirements, using discrimination trees, decision support
tools, and case-based reasoning;
Paradigms of Recommender: Collaborative
Filtering
Collaborative: "Tell me what's
popular among my peers"
Paradigms of Recommender: Content Based
Content-based: "Show me more of
what I've liked"
Paradigms of Recommender: Knowledge Based
Knowledge-based: "Tell me what fits
based on my needs"
Paradigms of Recommender: Hybrid
Hybrid: combinations of various inputs
and/or compositions of different
mechanisms
Technology Evolution of Recommender
People who liked this also liked …
How Does Collaborative Filtering Work?
User-to-User
 Recommendations are made by finding users
with similar tastes. Jane and Tim both liked
Item 2 and disliked Item 3; they seem to
have similar taste, which suggests that, in
general, Jane agrees with Tim. This makes
Item 1 a good recommendation for Tim.
This approach does not scale well to millions
of users.
Item-to-Item
 Recommendations are made by finding items
that have similar appeal to many users.
Tom and Sandra are two users who liked both
Item 1 and Item 4. That suggests that, in
general, people who liked Item 4 will also like
Item 1, so Item 1 will be recommended to Tim,
who liked Item 4. This approach scales to
millions of users and millions of items.
Collaborative Filtering
• The most prominent approach to generate
recommendations
o used by large, commercial e-commerce sites
o well-understood, various algorithms and variations exist
o applicable in many domains (books, movies, songs, …)
• Approach
o use the "wisdom of the crowd" to recommend items
• Basic assumption and idea
o Users give ratings to catalog items (implicitly or explicitly)
o Customers who had similar tastes in the past will have
similar tastes in the future
Collaborative Filtering Toolkit
• Implemented Big Graph ML Algorithms, including:
o Alternating Least Squares (ALS)
o Sparse-ALS
o SVD++
o LibFM (factorization machines)
o GenSGD
o Item-similarity based methods
User-based Nearest-Neighbor CF
• The basic technique:
o Given an "active user" (Alice) and an item I not yet seen by
Alice
o The goal is to estimate Alice's rating for this item, e.g., by
• find a set of users (peers) who liked the same items as
Alice in the past and who have rated item I
• use, e.g., the average of their ratings to predict whether
Alice will like item I
• do this for all items Alice has not seen and
recommend the best-rated
User-based Nearest-Neighbor CF
• Some first questions
o How do we measure similarity?
o How many neighbors should we consider?
o How do we generate a prediction from the neighbors' ratings?
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Commonly Used Similarity Measure
k-Nearest Neighbour (kNN) Methods
• an unseen item needs to be
classified
• positively rated items
• negatively rated items
• k = 3: classified negative
• k = 5: classified positive
A user-based kNN collaborative filtering
method consists of two primary phases:
• the neighborhood formation phase
• the recommendation phase
Measuring user similarity
• A popular similarity measure in user-based CF: Pearson correlation

$$\mathrm{sim}(a,b) = \frac{\sum_{p \in P} (r_{a,p} - \bar{r}_a)(r_{b,p} - \bar{r}_b)}{\sqrt{\sum_{p \in P} (r_{a,p} - \bar{r}_a)^2}\,\sqrt{\sum_{p \in P} (r_{b,p} - \bar{r}_b)^2}}$$

a, b : users
r_{a,p} : rating of user a for item p
P : set of items rated by both a and b
r̄_a, r̄_b : the users' average ratings
Possible similarity values are between -1 and 1

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3      sim = 0.85
User2     4      3      4      3      5      sim = 0.70
User3     3      3      1      5      4
User4     1      5      5      2      1      sim = -0.79
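The similarity values in the table can be reproduced with a short sketch (Python; means are taken over the co-rated items P, which matches the values shown):

```python
import math

# Ratings table from the slide; Alice has not rated Item5
ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def pearson(a, b):
    """Pearson correlation of users a and b over the items P rated by both."""
    P = set(ratings[a]) & set(ratings[b])
    mean_a = sum(ratings[a][p] for p in P) / len(P)
    mean_b = sum(ratings[b][p] for p in P) / len(P)
    num = sum((ratings[a][p] - mean_a) * (ratings[b][p] - mean_b) for p in P)
    den = (math.sqrt(sum((ratings[a][p] - mean_a) ** 2 for p in P))
           * math.sqrt(sum((ratings[b][p] - mean_b) ** 2 for p in P)))
    return num / den if den else 0.0

print(round(pearson("Alice", "User1"), 2))  # 0.85
print(round(pearson("Alice", "User4"), 2))  # -0.79
```

Alice/User2 comes out as ≈ 0.707 (the slide truncates this to 0.70), and Alice/User3 is exactly 0.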
Making predictions
• A common prediction function:

$$\mathrm{pred}(a,p) = \bar{r}_a + \frac{\sum_{b \in N} \mathrm{sim}(a,b)\,(r_{b,p} - \bar{r}_b)}{\sum_{b \in N} \mathrm{sim}(a,b)}$$

where N is the set of neighbors
• Calculate whether the neighbors' ratings for the unseen item i are
higher or lower than their average
• Combine the rating differences, using the similarity as a weight
• Add/subtract the neighbors' bias from the active user's average and use
this as the prediction
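A sketch of this prediction function on the running example, assuming Alice's two nearest neighbors User1 and User2, with each neighbor's bias measured against the mean of all of that neighbor's ratings:

```python
import math

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
}

def pearson(a, b):
    """Pearson correlation over the items rated by both users."""
    P = set(ratings[a]) & set(ratings[b])
    ma = sum(ratings[a][p] for p in P) / len(P)
    mb = sum(ratings[b][p] for p in P) / len(P)
    num = sum((ratings[a][p] - ma) * (ratings[b][p] - mb) for p in P)
    den = (math.sqrt(sum((ratings[a][p] - ma) ** 2 for p in P))
           * math.sqrt(sum((ratings[b][p] - mb) ** 2 for p in P)))
    return num / den if den else 0.0

def predict(a, item, neighbors):
    """pred(a,i) = mean(a) + sum_b sim(a,b)*(r_bi - mean(b)) / sum_b sim(a,b)"""
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    num = sum(pearson(a, b)
              * (ratings[b][item] - sum(ratings[b].values()) / len(ratings[b]))
              for b in neighbors)
    den = sum(pearson(a, b) for b in neighbors)
    return mean_a + num / den

print(round(predict("Alice", "Item5", ["User1", "User2"]), 2))  # 4.87
```

Both neighbors rated Item5 above their own average, so the prediction (4.87) lands well above Alice's average of 4.0.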
Item-based Collaborative Filtering
• Basic idea:
o Use the similarity between items (and not users) to make predictions
• Example:
o Look for items that are similar to Item5
o Take Alice's ratings for these items to predict the rating for Item5
        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
Pre-processing for Item-based CF
• Item-based filtering does not solve the scalability problem itself
• Pre-processing approach by Amazon.com in 2003
o Calculate all pairwise item similarities in advance
o The neighborhood used at run-time is typically rather small,
because only items the user has rated are taken into account
o Item similarities are assumed to be more stable than user similarities
• Memory requirements
o Up to N² pairwise similarities to be stored (N = number of items)
in theory
o In practice, significantly fewer (items with no co-ratings)
o Further reductions possible
• Minimum threshold for co-ratings (only items that are rated by at
least n users)
• Limit the size of the neighborhood (might affect recommendation
accuracy)
Similarity Measure for Item-based CF
• Cosine similarity produces better results in item-to-item filtering
o for some datasets; no consistent picture in the literature
• Ratings are seen as vectors in an n-dimensional space
• Similarity is calculated based on the angle between the vectors
• Adjusted cosine similarity
o takes the average user ratings into account by transforming the
original ratings

$$\mathrm{sim}(a,b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_u)(r_{u,b} - \bar{r}_u)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_u)^2}\,\sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_u)^2}}$$

o U: set of users who have rated both items a and b
Recommendation for Item-based CF
• After computing the similarity between items we select a set
of k most similar items to the target item and generate a
predicted value of user u’s rating
$$p(u,i) = \frac{\sum_{j \in J} r_{u,j}\,\mathrm{sim}(i,j)}{\sum_{j \in J} \mathrm{sim}(i,j)}$$

where J is the set of the k most similar items
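A sketch of item-based prediction on the running example, combining the adjusted cosine similarity from the previous slide with the weighted sum above (k = 2; user means are taken over all of a user's ratings):

```python
import math

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}

def adjusted_cosine(i, j):
    """Adjusted cosine: subtract each co-rating user's mean rating first."""
    U = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    mean = {u: sum(ratings[u].values()) / len(ratings[u]) for u in U}
    num = sum((ratings[u][i] - mean[u]) * (ratings[u][j] - mean[u]) for u in U)
    den = (math.sqrt(sum((ratings[u][i] - mean[u]) ** 2 for u in U))
           * math.sqrt(sum((ratings[u][j] - mean[u]) ** 2 for u in U)))
    return num / den if den else 0.0

def predict(u, i, k=2):
    """p(u,i) = sum_{j in J} r_uj * sim(i,j) / sum_{j in J} sim(i,j)."""
    # J = the k items rated by u that are most similar to i
    J = sorted(ratings[u], key=lambda j: adjusted_cosine(i, j), reverse=True)[:k]
    num = sum(ratings[u][j] * adjusted_cosine(i, j) for j in J)
    den = sum(adjusted_cosine(i, j) for j in J)
    return num / den

print(round(predict("Alice", "Item5"), 2))  # 4.65
```

Item1 and Item4 turn out to be the items most similar to Item5 here, so Alice's high ratings for them drive the prediction.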
What is a Latent Factor Model?
Latent variables are introduced to account for the underlying reasons of a
user's choice. Once the connections between the latent variables and the
observed variables (user, product, rating, etc.) have been estimated during
training, recommendations can be made by computing a user's likely
interaction with each product through the latent variables;
Matrix Factorization Approach
How Does the Latent Factor Model Work?
Latent Factor Model Algorithm
LFM Algorithm: Alternating Least Squares
LFM Algorithm: Stochastic Gradient Descent
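The SGD variant can be sketched as follows: factor the rating matrix into user factors P and item factors Q so that r_ui ≈ μ + p_u · q_i, and move both factor vectors along the gradient of the squared error. The factor count, learning rate, and regularization below are illustrative choices, not values from the slides:

```python
import random

ratings = {
    "Alice": {"Item1": 5, "Item2": 3, "Item3": 4, "Item4": 4},
    "User1": {"Item1": 3, "Item2": 1, "Item3": 2, "Item4": 3, "Item5": 3},
    "User2": {"Item1": 4, "Item2": 3, "Item3": 4, "Item4": 3, "Item5": 5},
    "User3": {"Item1": 3, "Item2": 3, "Item3": 1, "Item4": 5, "Item5": 4},
    "User4": {"Item1": 1, "Item2": 5, "Item3": 5, "Item4": 2, "Item5": 1},
}
data = [(u, i, r) for u, items in ratings.items() for i, r in items.items()]

random.seed(0)
k, lr, reg = 2, 0.01, 0.02            # latent factors, learning rate, regularization
mu = sum(r for _, _, r in data) / len(data)   # global mean rating
P = {u: [random.gauss(0, 0.1) for _ in range(k)] for u, _, _ in data}
Q = {i: [random.gauss(0, 0.1) for _ in range(k)] for _, i, _ in data}

for epoch in range(2000):
    for u, i, r in data:
        err = r - (mu + sum(pf * qf for pf, qf in zip(P[u], Q[i])))
        for f in range(k):
            pf, qf = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qf - reg * pf)   # gradient step on user factor
            Q[i][f] += lr * (err * pf - reg * qf)   # gradient step on item factor

rmse = (sum((r - (mu + sum(pf * qf for pf, qf in zip(P[u], Q[i])))) ** 2
            for u, i, r in data) / len(data)) ** 0.5
print(round(rmse, 2))  # training RMSE after fitting
```

Alice's score for the unseen Item5 is then `mu + sum(pf * qf for pf, qf in zip(P["Alice"], Q["Item5"]))`; ALS differs only in how P and Q are updated (solving for one while holding the other fixed).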
Context-Based Recommender Systems
Overview
 The recommender system uses additional data about the context in
which an item is consumed;
 For example, for a restaurant, the time or the location may be used
to improve the recommendation compared to what could be done
without this additional source of information;
 A restaurant recommendation for a Saturday evening when you go
out with your spouse should be different from a restaurant
recommendation for a workday afternoon when you go with
co-workers;
Motivating Examples
 Recommend a vacation
 Winter vs. summer
 Recommend a purchase (e-retailer)
 A gift vs. for yourself
 Recommend a movie
 To a student who wants to watch it on a Saturday
night with his girlfriend in a movie theater
 Recommend music
 The music we like to hear is greatly affected by context, which
can be thought of as a mixture of our feelings (mood) and the
situation or location (the theme) we associate it with
 Listening to Bruce Springsteen's "Born in the U.S.A." while driving
along the 101
 Listening to Mozart's Magic Flute while walking in Salzburg
What do simple recommendation techniques ignore?
 What is the user doing when asking for a recommendation?
 Where (and when) is the user?
 What does the user want (e.g., to improve his knowledge or really
buy a product)?
 Is the user alone or with others?
 Are there many products to choose from or only a few?
Plain recommendation technologies fail to take
the user context into account.
Major obstacles for contextual computing
 Obtaining sufficient and reliable data describing the user context
 Selecting the right information, i.e., what is relevant for a particular
personalization task
 Understanding the impact of contextual dimensions on the
personalization process
 Computationally modeling the contextual dimensions in a more
classical recommendation technology
 For instance: how to extend Collaborative Filtering to include
contextual dimensions?
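The simplest such extension is contextual pre-filtering: keep only the ratings whose context matches the target context and feed the resulting plain user-item data to any standard CF method. A minimal sketch with hypothetical restaurant ratings (the names and context tags are made up for illustration):

```python
# (user, item, context, rating) -- hypothetical data
ratings = [
    ("Jane", "RestaurantA", "weekend", 5),
    ("Jane", "RestaurantA", "weekday", 2),
    ("Tim",  "RestaurantA", "weekend", 4),
    ("Tim",  "RestaurantA", "weekday", 3),
]

def prefilter(context):
    """Contextual pre-filtering: reduce to a plain (user, item) -> rating matrix."""
    return {(u, i): r for u, i, c, r in ratings if c == context}

def item_mean(item, context):
    """Any 2D technique can run on the filtered matrix; here, a simple item mean."""
    rs = [r for (u, i), r in prefilter(context).items() if i == item]
    return sum(rs) / len(rs)

print(item_mean("RestaurantA", "weekend"))  # 4.5
print(item_mean("RestaurantA", "weekday"))  # 2.5
```

The same restaurant scores very differently in the two contexts, which is exactly the signal a context-free recommender would average away.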
Item Split – Intuition and Approach
 Each item in the database is a candidate for splitting
 Context defines all possible splits of an item's ratings vector
 We test all the possible splits – we do not have many contextual features
 We choose the one split (using a single contextual feature) that maximizes
an impurity measure and whose impurity is higher than a threshold
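The split search above can be sketched as follows. The ratings and the impurity measure (here a simple gap between the two groups' mean ratings; the literature typically uses a statistical criterion such as a t-test) are illustrative assumptions:

```python
# Ratings of a single item, each tagged with binary contextual features (hypothetical)
item_ratings = [
    (5, {"weekend": True,  "with_partner": True}),
    (4, {"weekend": True,  "with_partner": False}),
    (2, {"weekend": False, "with_partner": True}),
    (1, {"weekend": False, "with_partner": False}),
    (5, {"weekend": True,  "with_partner": True}),
    (2, {"weekend": False, "with_partner": False}),
]

def impurity(a, b):
    """Simplified impurity: gap between the two groups' mean ratings."""
    mean = lambda xs: sum(xs) / len(xs)
    return abs(mean(a) - mean(b))

def best_split(rows, features, threshold=1.0):
    """Try every single-feature split; keep the best one above the threshold."""
    best = None
    for f in features:
        a = [r for r, ctx in rows if ctx[f]]
        b = [r for r, ctx in rows if not ctx[f]]
        if a and b:
            imp = impurity(a, b)
            if imp > threshold and (best is None or imp > best[1]):
                best = (f, imp)
    return best  # None: the item is not split

print(best_split(item_ratings, ["weekend", "with_partner"]))  # the "weekend" split wins
```

If a split wins, the item is replaced by two context-specific "fictional" items, and ordinary 2D CF runs on the transformed data.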
Context-Aware Splitting Approaches
Types of Context
Different Views of Context
Model of Context-Based Recommender Systems
Context-Based Pre-Filtering Model
Context-Based Post-Filtering Model
Context-Based Contextual Modeling
Trust-based Collaborative Filtering
Users tend to receive advice from people they trust, such as trusted
friends, who can be defined explicitly by the users or inferred from
social networks.
[Figure: a rating prediction for the active user is computed from the
ratings of the active user's trusted friends]
Metrics of Trust-based Recommenders
• Global Metrics: compute a single global trust value for every
single user (reputation on the network)
• Pros:
o Based on the whole community's opinion
• Cons:
o Trust is subjective (controversial users)
• Local Metrics: predict (different) trust scores that are
personalized from the point of view of every single user
• Pros:
o More accurate
o Attack resistance
• Cons:
o Ignores the "wisdom of the crowd"
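A local-metric prediction can be sketched as a trust-weighted average of the trusted friends' ratings; the trust scores and ratings below are made-up values for illustration:

```python
# Hypothetical local trust scores from the active user to friends (0..1)
trust = {"b": 0.9, "c": 0.4, "d": 0.1}
# The friends' ratings for the target item
friend_ratings = {"b": 5, "c": 3, "d": 1}

def trust_weighted_prediction(trust, friend_ratings):
    """Weight each trusted friend's rating by the active user's trust in them."""
    num = sum(trust[f] * r for f, r in friend_ratings.items())
    den = sum(trust[f] for f in friend_ratings)
    return num / den

print(round(trust_weighted_prediction(trust, friend_ratings), 2))  # 4.14
```

The structure is the same as neighborhood CF, with trust scores replacing rating-based similarity as the weights.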
Content-Based Recommender System
• In content-based recommendation, the system tries to
recommend items that match the User Profile;
• The profile is based on items the user has liked in the past or
on explicit interests that the user defines;
• A content-based recommender system matches the profile of
the item against the user profile to decide on its relevance to
the user;
Example of a Content-Based Recommender
[Figure: new books are matched against the user profile; the
recommender system reads and updates the profile and produces
recommendations]
What is the “Content”?
• The genre is actually not part of the content of a book
• Most CB-recommendation methods originate from
Information Retrieval (IR) field:
o The item descriptions are usually automatically
extracted (important words)
o Goal is to find and rank interesting text documents
(news articles, web pages)
• Typical characteristics:
o Classical IR-based methods based on keywords
o No expert recommendation knowledge involved
o User profiles (preferences) are learned rather than explicitly
elicited
Content Representation
• Items stored in a database table
Content Representation
• Structured data
 Small number of attributes
 Each item is described by the same set of attributes
 Known set of values that the attributes may have
• Straightforward to work with
 The user's profile contains positive ratings for 1001, 1002, 1003
 Would the user be interested in, say, Oscar, French cuisine, or table
service?
• Unstructured data
 No attribute names with well-defined values
 Need to impose structure on free text before it can be used
 Natural language complexity
 Same word with different meanings
 Different words with same meaning
Term Frequency – Inverse Document Frequency
• Simple keyword representation has its problems
o In particular when keywords are automatically extracted, because:
• Not every word has similar importance
• Longer documents have a higher chance of overlapping with the
user profile
• Standard measure: TF-IDF
o Encodes text documents as weighted term vector
o TF: Measures how often a term appears (density in a document)
• Assuming that important terms appear more often
• Normalization has to be done in order to take document length into
account
o IDF: Aims to reduce the weight of terms that appear in all documents
TF-IDF Weighting
• Term frequency of a term t in a document d:

$$tf_{t,d} = \frac{n_{t,d}}{\sum_k n_{k,d}}$$

where n_{t,d} is the number of occurrences of term t in document d

• Inverse document frequency of a term t:

$$idf_t = \log\frac{N}{df_t}$$

where N is the total number of documents and df_t is the number of
documents containing t

• TF*IDF weighting:

$$w_{t,d} = tf_{t,d} \times idf_t$$
Example TF-IDF Representation
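A minimal sketch of the weighting on three toy documents (the documents are made up for illustration):

```python
import math

docs = {
    "d1": "recommender systems recommend items to users".split(),
    "d2": "users rate items and items get recommended".split(),
    "d3": "content based systems use item descriptions".split(),
}

def tf(t, d):
    """tf_{t,d}: occurrences of t in d, normalized by document length."""
    return docs[d].count(t) / len(docs[d])

def idf(t):
    """idf_t = log(N / df_t): rarer terms get a higher weight."""
    df = sum(1 for d in docs if t in docs[d])
    return math.log(len(docs) / df)

def tfidf(t, d):
    return tf(t, d) * idf(t)

# "items" occurs in two of the three documents, so its weight in d1 is
# lower than that of the rarer term "recommender"
print(tfidf("recommender", "d1") > tfidf("items", "d1"))  # True
```

Each document is then represented as the vector of these weights over the vocabulary, and document-profile similarity is typically measured with the cosine of those vectors.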
User Profiles
• User profile consists of two main types of information
 A model of the user’s preferences. e.g., a function that for any item
predicts the likelihood that the user is interested in that item
 User’s interaction history. e.g., items viewed by a user, items
purchased by a user, search queries, etc.
• “Manual” recommending approaches
 Provide a “check box” interface that lets users construct their own
profiles of interests
 A simple database matching process is used to find items that meet
the specified criteria and recommend these to users.
• Rule-based Recommendation
 The system has rules to recommend other products based on user
history
 e.g., a rule to recommend the sequel to a book or movie to customers
who purchased the previous item in the series
 Can capture common reasons for making recommendations