Recommender systems are software agents that analyze a user's preferences through transactions and provide personalized recommendations accordingly. There are several recommendation paradigms including non-personalized rules, personalized rules based on user data, and transaction-based collaborative filtering that learns from user interactions. Context-based recommender systems also consider additional information like time, location, or device to provide adaptive recommendations. Common techniques used in recommender systems include content-based filtering that recommends similar items, collaborative filtering that finds users with similar tastes, and demographic-based recommendations.
Recommender Systems represent one of the most widespread and impactful applications of predictive machine learning models.
Amazon, YouTube, Netflix, Facebook and many other companies generate a significant fraction of their revenue from their ability to model and accurately predict users' ratings and preferences.
In this presentation we cover the following points:
→ introduction to recommender systems
→ working with explicit vs implicit feedback
→ content-based vs collaborative filtering approaches
→ user-based and item-item methods
→ machine learning and deep learning models
→ pros & cons of the methods: scalability, accuracy, explainability
Overview of recommender (recommendation) systems: RFM concepts in brief; item- and user-based collaborative filtering; content-based recommendation; product-association recommender systems; stereotype recommendation with its advantages and limitations; customer lifetime; and the recommender-system analysis and solving cycle.
In this lecture, I will first cover recent advances in neural recommender systems, such as autoencoder-based and MLP-based recommender systems. Then, I will introduce recent achievements in automatic playlist continuation for music recommendation.
What really are recommendations engines nowadays?
This presentation introduces the foundations of recommendation algorithms and covers common approaches as well as some of the most advanced techniques. Although the focus is on efficiency rather than theoretical properties, basics of matrix algebra and optimization-based machine learning are used throughout the presentation.
Table of Contents:
1. Collaborative Filtering
1.1 User-User
1.2 Item-Item
1.3 User-Item
* Matrix Factorization
* Stochastic Gradient Descent (SGD)
* Truncated Singular Value Decomposition (SVD)
* Alternating Least Squares (ALS)
* Deep Learning
2. Content Extraction
* Item-Item Similarities
* Deep Content Extraction: NLP, CNN, LSTM
3. Hybrid Models
4. In Production
4.1 Problems
4.2 Solutions
4.3 Tools
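The "Matrix Factorization" and "Stochastic Gradient Descent (SGD)" entries above can be illustrated with a minimal sketch; the hyperparameters and toy ratings are invented for the example, not taken from any of the decks:

```python
import numpy as np

def mf_sgd(ratings, n_factors=8, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Factorize a sparse (user, item, rating) list into user factors P and item factors Q."""
    rng = np.random.default_rng(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = 0.1 * rng.standard_normal((n_users, n_factors))
    Q = 0.1 * rng.standard_normal((n_items, n_factors))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                   # error on this known rating
            P[u] += lr * (err * Q[i] - reg * P[u])  # gradient step with L2 regularization
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Toy ratings: (user index, item index, rating)
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.5), (1, 2, 2.0), (2, 1, 1.5)]
P, Q = mf_sgd(ratings)
predicted = P[0] @ Q[2]   # predicted rating of user 0 for unseen item 2
```

The missing entries of the rating matrix (like user 0's score for item 2) are filled by the dot product of the learned factor vectors.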
Recommender systems are software tools and techniques providing suggestions for items to be of interest to a user. Recommender systems have proved in recent years to be a valuable means of helping Web users by providing useful and effective recommendations or suggestions.
The goal of a recommender system is to predict the degree to which a user will like or dislike a set of items, such as movies or TV shows.
Most recommender systems use a combination of different approaches, but broadly speaking there are three different methods that can be used: Content analysis, Social recommendations and Collaborative filtering.
Recommendation systems, also known as recommendation engines, are a type of information system whose purpose is to suggest, or recommend items or actions to users.
The recommendations may consist of:
-> retail items (movies, books, etc.) or
-> actions, such as following other users in a social network.
It can be said that recommendation engines are essentially an automated form of the “shop counter guy”: you ask him for a product, and he shows you not only that product but also related ones you could buy. He is well trained in cross-selling and up-selling. So are our recommendation engines.
We have built an online movie recommender system based on the analysis of users' rating history for several movies and their demographic information. We used data from the MovieLens website. Collaborative filtering and matrix factorization techniques were used for the implementation. The end result is a web application where a user is recommended their top 20 movies.
Codebase: http://goo.gl/nM7RMy
Demo Video: http://goo.gl/VgZ2uI
Recommendations are everywhere: music, movies, books, social media, e-commerce websites… The Web is leaving the era of search and entering one of discovery. This quick introduction will help you understand this vast topic and why you should use it.
The Hive Think Tank: Machine Learning at Pinterest, by Jure Leskovec (The Hive)
Machine learning is at the core of Pinterest. Pinterest personalizes and ranks 1B+ pins, 700+ million boards for 100M+ users all over the world, using data gathered from collaborative filtering, user curation, web crawling, and more. At Pinterest we model relationships between pins, handle cold-start problems and deal with real-time recommendations.
In this presentation Jure gave an overview of the problems and effective solutions developed at Pinterest. He focused on the systems and engineering choices made to enable productive machine learning development and to let multiple engineers effectively develop, test, and deploy machine-learned models.
Modern Perspectives on Recommender Systems and their Applications in Mendeley (Kris Jack)
Presentation given for one of Pearson's Data Research teams. It motivates the use of recommender systems, describes common approaches to building and evaluating them and gives examples of how they are used in Mendeley. Thanks to Maya Hristakeva for creating some of the slides.
Recommender Systems meet Finance - A literature review (David Zibriczky)
The present work overviews the application of recommender systems in various financial domains. The relevant literature is investigated along two directions. First, a domain-based categorization is discussed, focusing on those recommendation problems where the existing literature is significant. Second, the application of various recommendation algorithms and data mining techniques is summarized. The purpose of this paper is to provide a basis for recommender-system and financial experts to work out further scientific contributions in this field.
EPG content recommendation in large scale: a case study on interactive TV platform (David Zibriczky) -- ICMLA 2013 - Machine Learning with Multimedia Data (7th December 2013, Miami, FL)
Personalized recommendation of linear content on interactive TV platforms (David Zibriczky) -- International Workshop on TV and multimedia personalization (July 16th 2012, Montreal, Canada)
About ImpressTV
• Netflix Prize (2006-2009)
• The Ensemble: Runner-up from 40K teams
• Gravity team: Members of The Ensemble
• Gravity R&D
› Hungarian start-up, launched in 2007, B2B business
› On-site personalization solution provider company for e-commerce, IPTV, OTT and classified media
› 100M+ recommendations per day
• ImpressTV Limited
› The IPTV and OTT business was acquired from Gravity R&D by British investors in July 2014
› HQ in Budapest, international corporate clients
• About me
› Joined Gravity R&D in January 2010, transferred to ImpressTV Limited in July 2014
› Current position: Head of Data Science at ImpressTV Limited
• Too many existing items, too many options
› YouTube videos, Amazon books, Netflix movies, Pandora music, …
• High item publishing intensity, hard to follow the flow of information
› Google News, Facebook posts, Twitter tweets, …
• Users are not qualified enough
› They know what they want to watch/buy, but they don’t know whether a given item would satisfy their needs
› E.g.: Grandma would like a new computer and knows what she wants to do with it, but has no idea of the computer parts and what they do
• One may not even know what they would like
› First time in a Chinese restaurant, no experience with Chinese food.
• How to improve user satisfaction?
Information overload I. – Consumers
• Challenge in handling the ever increasing number of contents and corresponding data
• Challenge in handling transactional data
• Increasing amount of inner information
• What is useful?
• How to determine the usefulness of data?
• Competition in the business market
› keeping and converting consumers is essential
› keeping the market advantage
› increasing revenue
• How to extract and use the collected information to improve business success?
Information overload II. – Content Providers
“Recommender Systems (RS) are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly. They have the potential to support and improve the quality of the decisions consumers make while searching for and selecting products online.” 1
What are Recommender Systems?
1 Xiao, Bo, and Izak Benbasat, 2007, E-commerce product recommendation agents: Use, characteristics, and impact, MIS Quarterly 31, 137-209.
Recommender Systems
Recommender Systems as software agents
(Diagram: Users and Items are the inputs to the Recommender System, which outputs “Recommend item X to user A”.)
• News (What to read now?)
• Dating sites (Which girl to contact?)
• Coupon portals (Which coupon is good for me?)
• Restaurants (Where to eat?)
• Vacations (Where to travel and what to see?)
• Retailers (How many products to supply?)
• Financial assets (loan, portfolio)
• Advertisements
• …etc
Other Examples
• For the consumers (users)
› Helping the users to find useful contents to satisfy their needs
› Reducing the time of content searching
› Providing relevant information from the massive information flow
› Exploring new preferences, trust in recommender system
• For the business
› Improving business success indicators (or key performance indicators, KPI)
• Increasing revenue, CTR, watching/listening duration
• Increasing conversion rate and user engagement, reducing churn
• Cross-selling, upselling, advertisement
› Reducing popularity effect, less popular contents are also consumed
› Promotions, targeting, campaign
Goals and benefits of Recommender Systems
• Goals for consumers are not necessarily equal to those of the business!
• Simplified example 1: YouTube (free contents)
› Goal of the business: Receive more income from advertisements
How: More videos watched, more advertisements seen
› Goal of the users: Having good/useful time by watching/listening videos
How: Clicking on recommendations, using search engine
› The goals align: more videos watched is good for both.
• Simplified example 2: Netflix (DVD/Blu-ray or Video On Demand rental)
› Goal of the business: Increase income
How: More expensive contents, improving user engagement
› Goal of the users: Buying interesting movies, spending less time searching
How: Using the recommendation engine to make it easier
› The goals differ. Netflix wants the users to pay more; the users basically don’t want to spend more, unless they find it worth it.
Difference between the goals of consumers and business
• Items: Entities that are recommended (movies, music, books, news, coupons, restaurants, etc...)
• Item data: Descriptive information about the items (e.g. genre, category, price)
• Users: People to whom we recommend (Who is the user? Member, cookie, unidentified?)
• Paradigm: Recommendations are calculated by predefined non-personalized rules
• How: Setting item data based rules (recommending the latest movies for all users)
• Properties:
› Non-personalized static recommendations
› Requires manual work (e.g. editorial pick)
Recommendation paradigms / Non-personalized rules
(Diagram: Item data and Items feed the Recommender System, which produces recommendations for Users.)
• User data: Descriptive information about the users (e.g. age, gender, location)
• Paradigm: Recommendations are calculated by predefined user-data-based rules
• How: Setting item- and user-data-based rules (men between 45-55 → expensive cars)
• Properties:
› Semi-personalized
› Requires manual work (e.g. rule construction)
› Interpretable
Recommendation paradigms / Personalized rules
(Diagram: Item data, Items and User data feed the Recommender System, which produces recommendations for Users.)
• Transactions: Interactions between users and items
• Transaction types: Numerical ratings, ordinal ratings, binary ratings, unary ratings, textual reviews
• Explicit feedback: The user quantifies his preference for an item (rating)
• Implicit feedback: Events that indicate but do not quantify the user's preference for an item
› Positive: buy, watch, like, add to favourites
› Negative: dislike, remove from favourites
• In practice, implicit feedback is less valuable than explicit feedback, but significantly more of it is available
Recommendation paradigms / Transaction based personalization I.
(Diagram: Item data, Items, User data and Transactions feed the Recommender System, which produces recommendations for Users.)
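A minimal sketch of how the feedback types above might be collected into one sparse user-item matrix; the event names and weights are invented assumptions, not the deck's actual scheme:

```python
# Hypothetical event log: (user, item, event). Event names and weights are
# invented for the sketch.
events = [
    ("alice", "matrix", "watch"),
    ("alice", "matrix2", "like"),
    ("bob", "twilight", "buy"),
    ("bob", "matrix", "dislike"),
]
IMPLICIT_WEIGHT = {"watch": 1.0, "buy": 1.0, "like": 1.0, "dislike": -1.0}

explicit = {("alice", "titanic"): 4.0}   # explicit ratings enter the matrix as-is
implicit = {}                            # implicit events are mapped to weights
for user, item, event in events:
    implicit[(user, item)] = implicit.get((user, item), 0.0) + IMPLICIT_WEIGHT[event]

# One sparse user-item feedback "matrix" (explicit entries win on conflicts).
feedback = {**implicit, **explicit}
```

The sketch shows the asymmetry from the slide: explicit ratings quantify preference directly, while implicit events only indicate it and must be mapped to a scale.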
• Paradigm: Recommendations are calculated based on the users' interactions
• How: Learning on interactional data (collaborative filtering)
• Properties:
› Personalized
› Adaptive
› Less interpretable
Recommendation paradigms / Transaction based personalization II.
(Diagram: Item data, Items, User data and Transactions feed the Recommender System, which produces recommendations for Users.)
• Context: Information that can be observed at the time of recommendation
• Types:
› Time-based temporal information (Recos of sports equipment should differ in summer and in winter)
› Mood (Different TV programs should be recommended to the same person based on her current mood)
› Device (Different content types based on device type, e.g. movie trailers in TV, music in phone)
› Location
› Event sequence information (If the user browses the TV category of a webshop, it is reasonable to recommend DVD players, but not if the user browses the laptop category)
Recommendation paradigms / Context based personalization I.
(Diagram: Item data, Items, User data, Context and Transactions feed the Recommender System, which produces recommendations for Users.)
• Paradigm: Recommendations are calculated based on the users' interactions and current context
• How: Learning on contextual data (e.g. the user usually watches news in the morning, action at night)
• Properties:
› Context-sensitive, fully adaptive
› More interpretable
› More complex
Recommendation paradigms / Context based personalization II.
(Diagram: Item data, Items, User data, Context and Transactions feed the Recommender System, which produces recommendations for Users.)
Cross-domain recommendation

Task          | Goal
Multi-domain  | Cross-selling, Diversity, Serendipity
Linked-domain | Accuracy
Cross-domain  | Cold-start (new users, new items)

• Multi-domain: You watched movies and bought books, we recommend movies or books
• Linked-domain: You bought books only, we recommend books using both movie and book consumption patterns
• Cross-domain: You watched movies only, we recommend books, based on movie-book consumption relationships

Domain    | Example              | Ratio
Attribute | Comedy → Thriller    | 12%
Type      | Movies → Books       | 9%
Item      | Movies → Restaurants | 55%
System    | Netflix → MovieLens  | 24%
• Item to User
› Conventional recommendation task: „You may like these items”.
• Item to Item
› Recommending items that are somewhat similar to the item currently viewed by the user
› Personalized or non-personalized similarity
• User to User
› Recommending other users to the user based on metadata or activity
› Social recommendations (who to follow, who to connect) or similar users
• User to Item
› Promoting items that the seller wants to sell to the most probable buyers (e.g. newsletters, notifications)
› „Who would buy this item?”
• Group recommendations
› Recommending groups of items or users
Recommendation types
• Content-based Filtering (CBF)
› Recommend items that are similar to the ones that the user liked in the past.
› Similarity is based on the metadata of the items.
› E.g.: If the user likes romantic movies, recommend her more of the same
• Collaborative Filtering (CF)
› Recommend items that are liked by users who have a similar taste to the current user
› Similarity between users is calculated from the transaction history of users
› Only uses the transaction data → domain independent
• Demographic
› Recommendations made based on the demographic profile.
› Realizes a simple personalization.
› E.g.: Recommend computer parts to young people who study informatics.
Recommendation techniques I.
• Knowledge-based
› Uses extensive domain-specific knowledge to generate recs.
› User requirements collected (problem), items as possible solutions to the specific problems
› Example: user would like to find a digital camera, she provides her needs and the level of her skills, etc.
› Pure knowledge-based systems (without learning) tend to perform better at the beginning of deployment, but they fall behind later.
• Community-based
› Recommendations based on the preferences of the user’s friends.
› People tend to accept recommendations of their friends.
› Use of social networks → social recommender systems.
• Hybrid recommendation systems
› Combination of the techniques above.
› Trying to use the advantages and fix the disadvantages of the different techniques.
Recommendation techniques II.
1. Setting the recommendation type (e.g. item2user recommendation)
2. Requesting a recommendation
3. Selecting the recommendable items, filtering (e.g. new movies)
4. Selecting the algorithm for scoring these items (e.g. a collaborative filtering algorithm)
5. The algorithm provides a score for each item
6. Ordering the list by the scores
7. Post-processing the item list (e.g. randomizing the top N, selecting one episode per series, etc…)
8. Selecting the first N items to send back as the response to the recommendation request
Recommendation data flow
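The eight steps above can be sketched as a generic item2user pipeline; `is_recommendable`, `score` and the toy data are placeholders, not the deck's actual engine:

```python
import random

def recommend(user, items, is_recommendable, score, n=5, shuffle_top=False, seed=42):
    """Generic item2user flow: filter -> score -> order -> post-process -> top N."""
    candidates = [i for i in items if is_recommendable(user, i)]  # step 3: filtering
    scored = [(score(user, i), i) for i in candidates]            # steps 4-5: scoring
    scored.sort(reverse=True)                                     # step 6: ordering
    ranked = [item for _, item in scored]
    if shuffle_top:                                               # step 7: post-processing
        top = ranked[:n]
        random.Random(seed).shuffle(top)
        ranked[:n] = top
    return ranked[:n]                                             # step 8: top N response

# Toy request: recommend even-numbered items, scored by their id.
top = recommend("u1", range(1, 11),
                is_recommendable=lambda u, i: i % 2 == 0,
                score=lambda u, i: i)
```

Steps 1-2 correspond to choosing which filter/score functions to plug in and calling `recommend`; any scoring backend (rules, CF, CBF) fits the same flow.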
• Key features of a good recommender system that should be considered
› Accuracy (efficiency of modeling the user preference)
› Adaptation, context awareness (ability to detect changes in user behavior)
› Diversity, coverage (avoiding monotone recommendations and preference cannibalization)
› Novelty, serendipity (improving the surprise factor, exploitation vs. exploration)
› Trust, explanation (the users should understand and trust the recommender system)
› Scalability, responsivity, availability (recommendations should be provided in reasonable time)
• Tradeoffs in recommender systems
› Accuracy vs. Diversity
› Discovery vs. Continuation
› Depth vs. Coverage
› Freshness vs. Stability
› Recommendations vs. Tasks
Properties of recommender system
• Method
› Splitting the data set into disjoint train and test sets
› By time, at random, by users, or by user history
› Training on train set, measuring on test set
• Evaluation metrics
› Accuracy: Rating (RMSE), TopN (nDCG@N, Recall@N, Precision@N)
› Coverage: Ratio of the recommended items
› Diversity: Entropy, Gini-index
› Novelty: Ratio of the long tail items
› Serendipity: Ratio of the less
› Training time
› Accuracy is the typical primary metric, others are secondary
Offline Evaluation
(Diagram: the data set is split into train and test sets.)
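The TopN accuracy metrics listed above (Precision@N and Recall@N) can be computed directly; the toy recommendation and test lists are invented:

```python
def precision_recall_at_n(recommended, relevant, n):
    """Top-N accuracy: how many of the first n recommendations are relevant."""
    hits = len(set(recommended[:n]) & set(relevant))
    return hits / n, hits / len(relevant)

# Toy example: 2 of the top-4 recommendations appear in the user's held-out test set.
p, r = precision_recall_at_n(["a", "b", "c", "d"], relevant=["b", "d", "e"], n=4)
```

Averaging these per-user values over the test set gives the offline Precision@N / Recall@N reported for an algorithm.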
• Method
› The effects of recommender systems are measured
› A/B testing: Splitting the user base into equally valuable subsets, with different recommendations for the subsets
› A/B tests avoid
› Online improvement can be measured
• Evaluation metrics
› Some effects cannot be evaluated before the recommender system is set live
› Click Through Rate (CTR, number of clicks per recommendation)
› Average Revenue Per User (ARPU), Total Revenue Increase
› Page impression (number of page views)
› Conversion Rate, Churn Rate
Online Evaluation
1. Understanding the business needs
› Business goals, Key Performance Indicators for the live recommender system
› Recommendation scenarios, placeholders
› Data understanding
2. Integration (for vendors only)
› Developing the data integration method (the way the customer provides its data)
› Setting up the recommendation request/response interface (how the customer requests a recommendation and the vendor provides the response, e.g. a JSON or REST API)
3. Data preparation
› Data enrichment from external sources (e.g. crawling additional item metadata)
› Data transformation (shall we handle a complete series as one item, or as a group of episodes?)
Workflow of Recommender System integration
4. Data Mining and Offline Experiments
Workflow of Recommender System integration
5. Online Evaluation / Deployment
› Setting the recommendation engine live (starting to provide recommendations for real end users)
› Measuring online performance metrics (e.g. CTR, ARPU or page impressions)
› Measuring response times (is the algorithm fast enough in the live service?)
› Analyzing the correlation between the offline and online metrics (what to optimize in offline experiments)
6. Optimization
› Implementing additional algorithms (Step 4)
› A/B testing (is the new algorithm better than the original one?)
› Statistical significance based A/B selection (e.g. a t-test between the performance of algorithms A and B)
7. Reporting and follow-up
› Reporting performance (weekly, monthly)
› Monitoring response times and availability (if the algorithm is not robust enough, it may result in an outage)
› Monitoring changes in user behavior and the size of the user base
› Adapting to new features (e.g. new placeholders on the website)
Workflow of Recommender System integration
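Step 6's significance-based A/B selection can be sketched with a two-proportion z-test on CTR (used here instead of the t-test named on the slide, since CTR is a proportion; all counts are invented):

```python
from math import sqrt

def ctr_z_test(clicks_a, recs_a, clicks_b, recs_b):
    """Two-proportion z-test: is the CTR of algorithm B different from A's?"""
    p_a, p_b = clicks_a / recs_a, clicks_b / recs_b
    p_pool = (clicks_a + clicks_b) / (recs_a + recs_b)   # pooled click rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / recs_a + 1 / recs_b))
    return (p_b - p_a) / se   # |z| > 1.96 -> significant at the 5% level

# Invented A/B counts: B's CTR (1.6%) vs A's (1.2%), 10k recommendations each.
z = ctr_z_test(clicks_a=120, recs_a=10_000, clicks_b=160, recs_b=10_000)
```

With these counts the lift is significant, so algorithm B would be selected; a per-user t-test on revenue metrics like ARPU would follow the same pattern.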
• Content-based Filtering (CBF)
› Recommend items that are similar to the ones that the user liked in the past
› Similarity based on the metadata of the items
› E.g.: If the user likes horror movies, recommend her horror movies
• Collaborative Filtering (CF)
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommendation systems
Recommendation Techniques
Recommending an item to a user based upon a description of the item and a profile of the user's interests
• Items are represented by their metadata (e.g. genre or description)
• Users are represented by their transaction history (e.g. items A and B were rated 5)
• We generate user profiles from the user transaction data and the metadata of the items in the user's transactions
• The user profiles are compared to the representations of the items
• The items similar to the user profile are recommended to the user
CBF Method
(Example figure: the user rated The Matrix and The Matrix 2 with 5; ratings for The Matrix 3 and Twilight are unknown; genres shown: Sci-Fi, Romance, Adventure.)
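A minimal sketch of the CBF method above, using the slide's movie example (the titles are from the slide; the binary genre vectors and ratings are invented for illustration):

```python
# Items represented by their metadata (binary genre vectors).
items = {
    "The Matrix":   {"sci-fi": 1},
    "The Matrix 2": {"sci-fi": 1},
    "The Matrix 3": {"sci-fi": 1},
    "Twilight":     {"romance": 1, "adventure": 1},
}
ratings = {"The Matrix": 5, "The Matrix 2": 5}   # the user's transaction history

# User profile: rating-weighted sum of the metadata of the rated items.
profile = {}
for title, r in ratings.items():
    for genre, w in items[title].items():
        profile[genre] = profile.get(genre, 0) + r * w

def match(title):
    """Dot product between the user profile and an item's metadata vector."""
    return sum(profile.get(g, 0) * w for g, w in items[title].items())

scores = {t: match(t) for t in items if t not in ratings}
```

The sci-fi-heavy profile makes "The Matrix 3" outscore "Twilight", which is exactly the behavior the slide's rating matrix illustrates.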
Content Analyzer
• Preprocessing module
• Input: Item metadata
• Methods:
› Text mining
› Semantic Analysis
› Natural Language Processing
› Feature extraction
› Metadata enrichment
› Auto tagging
• Output: Item models
• We create terms from the item metadata
› Standard text mining preprocessing steps
• Filtering stopwords
• Filtering too rare/common terms
• Stemming
• Items are represented by a (sparse) vector of terms
› The vector of each item (=document) contains weights for each term.
› Weighting is often done by the TF-IDF scheme
• Rare terms are not less relevant than frequent terms (IDF).
• Multiple occurrences in a document are not less relevant than a single occurrence (TF).
• Invariant to the length of the document (normalization).
• Similarity measurement
› Most common: cosine similarity
• Scalar product of the L2-normalized vectors
Content Analyzer / Vector space model (VSM)
sim(i1, i2) = Σ_{k=1..T} w_{k,i1} · w_{k,i2} / ( √(Σ_{k=1..T} w_{k,i1}²) · √(Σ_{k=1..T} w_{k,i2}²) )
where T is the number of terms and w_{k,i} is the weight of term k in item i.
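The TF-IDF weighting and cosine similarity described in the bullets above can be sketched in a few lines; the toy item metadata below is purely illustrative:

```python
import math
from collections import Counter

# Toy item metadata (illustrative); each item is a "document" of terms.
docs = {
    "The Matrix":   ["sci-fi", "action", "machines"],
    "The Matrix 2": ["sci-fi", "action", "machines", "sequel"],
    "Twilight":     ["romance", "vampires"],
}

n_docs = len(docs)
df = Counter(t for terms in docs.values() for t in set(terms))  # document frequency

def tfidf(terms):
    # TF * IDF weight for each term of one item
    tf = Counter(terms)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf}

def cosine(a, b):
    # scalar product of the L2 normalized term vectors
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

vecs = {name: tfidf(terms) for name, terms in docs.items()}
# The two Matrix films share terms; Twilight shares none with them.
```

Because the vectors are sparse, the cosine only iterates over the terms that actually occur in both items.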
47. 47
• Semantic analysis in order to extract more information about the items
• Domain-specific knowledge and taxonomy is used
Content Analyzer / Semantic analysis, ontologies
48. 48
• The meta data provided by clients is usually not enough
• Meta data enrichment: Crawling additional data about the items from public web
• Publicly available knowledge sources
› Open Directory Project
› Yahoo! Web Directory
› Wikipedia
› Freebase
› Tribune
› Etc…
• Goals:
› Fixing missing or wrong data
› Better characterization of items
› Improving the accuracy of CBF algorithms
Content Analyzer / Meta data enrichment
49. 49
Profile Learner
• User preference modeling
• Inputs:
› Item models
› User transactional data
› User meta data
• Methods:
› Meta data weighting
› Machine learning
• Output: User profile
51. 51
• Negative and positive examples collected
› Explicit case: Items rated under 3 counts negative, above positive (variants: above overall average, user average…)
› Implicit case: Items viewed counts positive, others negative
• Approach 1
› User profile: weighted average of item vectors (Rocchio’s algorithm)
› Similarity between user profile and item vectors as relevance score
• Approach 2
› Negative and positive user profiles
› Items that match negative profile are filtered
› Similarity between the positive profile and the item’s vector as relevance score
• Approach 3
› Rule based classifiers (decision trees, decision lists) learning on the examples
› New items are judged by the classifier
• Approach 4
› Nearest Neighbor methods
Profile Learner / User profile in VSM model
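Approach 1 (the Rocchio-style weighted average) can be sketched as follows; the item vectors and ratings below are illustrative, not real data:

```python
import math

# Toy item term vectors and a user's ratings (illustrative).
item_vecs = {
    "The Matrix":   {"sci-fi": 1.0, "action": 0.8},
    "The Matrix 2": {"sci-fi": 1.0, "action": 0.9},
    "Twilight":     {"romance": 1.0, "vampires": 0.7},
}
ratings = {"The Matrix": 5, "The Matrix 2": 5}  # user's transactions

def build_profile(ratings, item_vecs):
    # Rating-weighted average of the rated items' term vectors
    profile, total = {}, sum(ratings.values())
    for item, r in ratings.items():
        for term, w in item_vecs[item].items():
            profile[term] = profile.get(term, 0.0) + r * w / total
    return profile

def score(profile, vec):
    # Cosine between the user profile and an item vector as relevance score
    dot = sum(profile.get(t, 0.0) * w for t, w in vec.items())
    n = math.sqrt(sum(v * v for v in profile.values())) * math.sqrt(sum(v * v for v in vec.values()))
    return dot / n if n else 0.0

profile = build_profile(ratings, item_vecs)
# Matrix-like items score near 1, Twilight scores 0 for this user.
```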
52. 52
• The terms in the item metadata are handled independently
• Semantic information is lost in the process, we have no idea of the meaning of the terms
• It is hard to handle expressions containing more than one word
• Data sparsity
• Needs diverse user feedback to improve accuracy
Profile Learner / Problems with the VSM model
55. 55
Architecture / Filtering Component
• Recommendation list filtering
• Inputs:
› User profile model
› Item models
› Recommendable items
• Methods:
› Filtering non-relevant
items
› Similarity based ranking
• Output: Recommended items
56. 56
• User independence
› Profiles of the users are built independently
› A user cannot influence the recommendations for other users (no attacks on the recommender system)
• Interpretable
› Recommendations can be easily explained (we recommend this comedy because you watched other comedies)
› Explanations can be easily built; user profiles are easy to understand (user keyword model)
• Item cold start
› Solves the item cold start problem
› Capable of recommending items that were never rated/viewed by anybody
Advantages of Content-based Filtering
57. 57
• Limited content analysis
› Natural limit in the number and type of features that are associated with the recommended items
› Domain knowledge is often needed
• Over-specialization
› Hard to improve serendipity (recommending something unexpected)
› Recommendations mirror the user history
› Monotone recommendations: „The winner takes it all” effect
› Exploitation over exploration
• Meaning
› word meaning disambiguation (apple = company or the fruit?)
• User cold start
› Needs some ratings before the profile learner can build an accurate user model
› There will be no reliable recommendations when only a few ratings available
Disadvantages of Content-based Filtering
59. 59
• Deep learning (Sentiment Analysis, Paragraph Vector Model)
• User generated contents (Tagging, Folksonomies)
• Serendipity problem
› Using randomness or genetic algorithms
› Filtering too similar items
› Balance between exploration and exploitation
› Using poor similarity measures to produce anomalies and exceptions
• Centralized knowledgebase for meta data enrichment
Research topics in Content-Based Filtering
61. 61
• Content-based Filtering (CBF)
• Collaborative Filtering (CF)
› Recommend items that are liked by users that have similar taste as the
current user
› Similarity between users is calculated by the transaction history of users
› Only uses the transaction data → domain independent
• Demographic
• Knowledge-based
• Community-based
• Hybrid recommendation systems
Recommendation Techniques
63. 63
[Illustration: a user's rating row over The Matrix, The Matrix 2, Twilight, The Matrix 3 — one rating known (5), the others unknown (?)]
• Classic item recommendation (Netflix) on explicit feedback (ratings)
• Rating problem: the goal is to predict how the user would rate the items
• Accuracy metrics:
› Root Mean Squared Error (RMSE)
› Mean Absolute Error (MAE)
RMSE = √( Σ_{(u,i,r)∈R_test} ( r̂_{u,i} − r_{u,i} )² / |R_test| )
MAE = Σ_{(u,i,r)∈R_test} | r̂_{u,i} − r_{u,i} | / |R_test|
Rating prediction problem
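Both error metrics can be computed directly; a minimal sketch on a toy test set:

```python
import math

# Toy test set of (user, item, rating) triples and model predictions (illustrative).
test = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0)]
pred = {("u1", "i1"): 4.5, ("u1", "i2"): 3.5, ("u2", "i1"): 4.0}

errors = [pred[(u, i)] - r for u, i, r in test]
rmse = math.sqrt(sum(e * e for e in errors) / len(test))  # penalizes large errors more
mae = sum(abs(e) for e in errors) / len(test)             # linear penalty
```

RMSE squares the errors before averaging, so a single large miss hurts it more than it hurts MAE.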
64. 64
• Global average (average of all ratings)
• User average (average of user ratings)
• Item average (average of ratings given to the item)
Baseline Methods
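The three baselines can be sketched in a few lines, assuming toy (user, item, rating) triples:

```python
# Toy (user, item, rating) triples; names are illustrative.
ratings = [("John", "The Matrix", 5), ("John", "Twilight", 1),
           ("Paul", "The Matrix", 4), ("Suzy", "Twilight", 2)]

global_avg = sum(r for _, _, r in ratings) / len(ratings)  # average of all ratings

def user_avg(u):
    # Average of the user's own ratings, falling back to the global average
    rs = [r for user, _, r in ratings if user == u]
    return sum(rs) / len(rs) if rs else global_avg

def item_avg(i):
    # Average of the ratings given to the item, same fallback
    rs = [r for _, item, r in ratings if item == i]
    return sum(rs) / len(rs) if rs else global_avg
```

The fallback to the global average is one simple way to handle unseen users and items.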
68. 68
• User based neighbor methods
› 1. Find users who have rated item i
› 2. Select k from these users that are the most similar to u
› 3. Calculate R(u,i) from their ratings on i
• Item based neighbor methods
› 1. Find items that have been rated by user u
› 2. Select k from these items that are the most similar to i
› 3. Calculate R(u,i) from their ratings by u
Explicit CF / Neighbor methods
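The three user-based steps can be sketched directly; the ratings and the plain cosine similarity below are illustrative:

```python
import math

# user -> {item: rating} (illustrative data)
R = {
    "u1": {"m1": 5, "m2": 4},
    "u2": {"m1": 4, "m2": 5, "m3": 2},
    "u3": {"m1": 1, "m3": 5},
}

def sim(u, v):
    # Cosine similarity between two users' rating vectors
    common = set(R[u]) & set(R[v])
    dot = sum(R[u][i] * R[v][i] for i in common)
    nu = math.sqrt(sum(r * r for r in R[u].values()))
    nv = math.sqrt(sum(r * r for r in R[v].values()))
    return dot / (nu * nv) if common else 0.0

def predict(u, i, k=2):
    # 1. users who rated i, 2. k most similar to u, 3. similarity-weighted average
    raters = [v for v in R if v != u and i in R[v]]
    top = sorted(raters, key=lambda v: sim(u, v), reverse=True)[:k]
    num = sum(sim(u, v) * R[v][i] for v in top)
    den = sum(sim(u, v) for v in top)
    return num / den if den else None

# predict("u1", "m3") lands between u2's rating (2) and u3's (5),
# pulled toward the more similar neighbor u2.
```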
69. 69
• Similarity between
› Rows of the preference matrix (user based)
› Columns of the preference matrix (item based)
› Dimension reduction on user/item preference vectors → feature vectors
• Similarity properties:
› S(a,a) >= S(a,b) (often S(a,a) = 1 required)
› S(a,b) = S(b,a)
• Similarity measures
› Cosine similarity (CS)
› Pearson correlation (PC)
› Adjusted cosine similarity (ACS)
› Euclidean distance (EUC)
Explicit CF / Neighbor methods / Similarity setup
CS(u, v) = Σ_{i∈I_uv} r_{u,i} · r_{v,i} / ( √(Σ_{i∈I_u} r_{u,i}²) · √(Σ_{j∈I_v} r_{v,j}²) )
PC(u, v) = Σ_{i∈I_uv} (r_{u,i} − r̄_u)(r_{v,i} − r̄_v) / ( √(Σ_{i∈I_uv} (r_{u,i} − r̄_u)²) · √(Σ_{i∈I_uv} (r_{v,i} − r̄_v)²) )
ACS(u, v) = Σ_{i∈I_uv} (r_{u,i} − r̄_i)(r_{v,i} − r̄_i) / ( √(Σ_{i∈I_uv} (r_{u,i} − r̄_i)²) · √(Σ_{i∈I_uv} (r_{v,i} − r̄_i)²) )
EUC(u, v) = √( Σ_{i∈I_uv} (r_{u,i} − r_{v,i})² )
where I_uv is the set of items rated by both u and v, r̄_u is user u's mean rating and r̄_i is item i's mean rating.
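As a worked example of one of the similarity measures listed above, a Pearson correlation restricted to commonly rated items; the rating vectors are illustrative:

```python
import math

def pearson(ru, rv):
    # Pearson correlation over the items rated by both users
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * \
          math.sqrt(sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0

u = {"m1": 5, "m2": 3, "m3": 1}
v = {"m1": 4, "m2": 3, "m3": 2}  # same taste with a smaller spread
w = {"m1": 1, "m2": 3, "m3": 5}  # exactly opposite taste
```

Unlike plain cosine, Pearson centers each user's ratings, so users with the same taste but different rating scales still correlate perfectly.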
70. 70
• Advantages
› Simplicity (easy to set up, only a few parameters)
› Justifiability (recommendations can be explained)
› Stability (not very sensitive to new items/users/ratings)
› Good for item2item and user2user recommendation
• Disadvantages
› Computationally expensive in the recommending phase
• Similarities have to be computed at recommendation time
• The similarity matrix may be precomputed before recommendation, but that makes the training phase costly
• The similarity matrix might not fit in memory
› Less accurate than model based methods for personalized recommendation
Advantages and disadvantages of neighbor methods
73. 73
• ALS (Alternating Least Squares)
› Updating P and Q matrices by multivariate linear regression based solver
• BRISMF (Biased Regularized Simultaneous Matrix Factorization)
› Stochastic gradient descent based matrix factorization
› Minimizing the prediction error by iterating on transactions and modifying factors
› Using bias for user and item model
› Prediction:
• SVD (Singular Value Decomposition)
› Decomposes matrix R to three matrices
› S is a diagonal matrix, containing singular values of matrix R
• NSVD1
› Decomposes matrix R to three matrices
› W is a weight matrix
Explicit CF / Matrix Factorization
r̂_{u,i} = b_u + c_i + Σ_{k=1..K} p_{u,k} · q_{k,i}   (BRISMF prediction with user bias b_u and item bias c_i)
R = P · S · Q^T   (SVD)
R ≈ P · W · Q^T   (NSVD1)
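A minimal BRISMF-style sketch: stochastic gradient descent over the transactions with user/item biases. The learning rate, regularization and factor count are illustrative choices, not tuned values:

```python
import random

random.seed(0)
# (user, item, rating) transactions, illustrative toy data.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 4.0)]
n_users, n_items, K = 3, 3, 2
lr, reg = 0.05, 0.02  # learning rate and regularization (illustrative)

p = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_users)]  # user factors
q = [[random.gauss(0, 0.1) for _ in range(K)] for _ in range(n_items)]  # item factors
bu = [0.0] * n_users  # user biases
bi = [0.0] * n_items  # item biases

def predict(u, i):
    return bu[u] + bi[i] + sum(p[u][k] * q[i][k] for k in range(K))

# Iterate on the transactions, modifying biases and factors to reduce the error.
for epoch in range(300):
    for u, i, r in ratings:
        e = r - predict(u, i)
        bu[u] += lr * (e - reg * bu[u])
        bi[i] += lr * (e - reg * bi[i])
        for k in range(K):
            pu, qi = p[u][k], q[i][k]
            p[u][k] += lr * (e * qi - reg * pu)
            q[i][k] += lr * (e * pu - reg * qi)
```

After training, predictions for the observed transactions approach the true ratings; unobserved (u, i) pairs get scores from the learned biases and factors.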
75. 75
Explicit CF / Matrix Factorization / Model Visualization
P
John 1.1 0.5
Paul 0.6 0.9
Suzy 1.0 0.9
QT
Item 1 Item 2 Item 3 Item 4 Item 5
1.0 -0.9 0.5 -0.5 1.3
-0.3 1.6 0.7 1.6 -1.1
[Illustration: users (John, Paul, Suzy) and items 1–5 plotted in the two-dimensional latent factor space]
76. 76
Explicit CF / Matrix Factorization / Clustering
P
John 1.1 0.5
Paul 0.6 0.9
Suzy 1.0 0.9
QT
Item 1 Item 2 Item 3 Item 4 Item 5
1.0 -0.9 0.5 -0.5 1.3
-0.3 1.6 0.7 1.6 -1.1
[Illustration: the same latent factor space with users and items grouped into clusters A, B and C]
77. 77
• Memory based algorithms: neighbor methods
• Model based algorithms: matrix factorization
• Both families have explicit feedback based and implicit feedback based variants
Hierarchy of Collaborative Filtering methods
79. 79
[Illustration: a user's implicit feedback row over The Matrix, The Matrix 2, Twilight, The Matrix 3 — all preferences unknown (?)]
• Classic item recommendation on implicit feedback (view, buy, like, add to favourite)
• Preference problem: The goal is to predict the probability that the user would choose that item
• Accuracy metrics:
› Ranking error instead of prediction error
› Recall@N
› Precision@N
Preference prediction problem
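Recall@N and Precision@N for a single user can be sketched as follows; the lists are illustrative:

```python
# Precision@N and Recall@N for one user; the lists below are illustrative.
def precision_recall_at_n(recommended, relevant, n):
    top_n = recommended[:n]
    hits = len(set(top_n) & set(relevant))
    return hits / n, hits / len(relevant)

recommended = ["m1", "m4", "m2", "m5"]  # ranked recommendation list
relevant = {"m1", "m2", "m3"}           # items the user actually chose

prec, rec = precision_recall_at_n(recommended, relevant, 3)
# 2 of the top-3 are hits -> precision 2/3; 2 of 3 relevant items found -> recall 2/3
```

In an evaluation these values would be averaged over all test users.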
80. 80
• Most popular items
• Most recent items
• Most popular items in the user’s favourite category
Baseline Methods
83. 83
• Zero value for unknown preference (zero example). In practice: many 0s, few 1s.
• c_ui: confidence for known feedback (constant, or a function of the context of the event)
• Zero examples are less important, but still carry information.
R Item1 Item2 Item3 Item4
User1 1 0 0 0
User2 0 0 1 0
User3 1 1 0 0
User4 0 1 0 1
C Item1 Item2 Item3 Item4
User1 c11 1 1 1
User2 1 1 c23 1
User3 c31 c32 1 1
User4 1 c42 1 c44
Implicit CF / Confidence Matrix
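A common way to build C, following Hu, Koren and Volinsky's implicit-feedback formulation, is c_ui = 1 + α·r_ui; the α value below is illustrative:

```python
# Implicit feedback matrix R from the slide; 1 = observed event, 0 = no event.
R = [[1, 0, 0, 0],
     [0, 0, 1, 0],
     [1, 1, 0, 0],
     [0, 1, 0, 1]]

alpha = 40.0  # illustrative confidence weight for observed events

# Binarized preference P and confidence C: zero examples keep the minimal weight 1.
P = [[1 if r > 0 else 0 for r in row] for row in R]
C = [[1.0 + alpha * r for r in row] for row in R]
```

With event counts instead of binary flags, repeated interactions automatically raise the confidence of that cell.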
85. 85
• User based neighbor methods
› 1. Find users who have interacted with item i
› 2. Select k from these users that are the most similar to u
› 3. Calculate R(u,i) from their feedback on i
• Item based neighbor methods
› 1. Find items that user u has interacted with
› 2. Select k from these items that are the most similar to i
› 3. Calculate R(u,i) from u's feedback on them
Implicit CF / Neighbor methods
86. 86
• Co-occurrence similarity algorithm (most often used in business)
• The algorithm:
1. For each item i:
1. Count the number of users that interacted with item i: supp(i)
2. For each user u that interacted with item i:
1. For each other item j that user u interacted with:
1. Increment the co-occurrence counter of items i and j: supp(i, j)
2. The similarity between item i and item j:
S(i, j) = supp(i, j) / ( (supp(i) + γ)^(1−α) · (supp(j) + γ)^α )
• α: popularity factor
• γ: regularization factor
3. The prediction for user u and item i:
r̂(u, i) = Σ_{j: c_uj > 0} c_uj · S(i, j) / Σ_{j: c_uj > 0} c_uj
Implicit CF / Item based neighbor methods
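The counting steps can be sketched as follows; the events and the α, γ values are illustrative:

```python
from collections import Counter
from itertools import combinations

# user -> set of items interacted with (illustrative events)
events = {"u1": {"i1", "i2"}, "u2": {"i1", "i2", "i3"}, "u3": {"i1", "i3"}}
alpha, gamma = 0.5, 1.0  # popularity and regularization factors (illustrative)

supp = Counter()  # supp(i): number of users who interacted with item i
co = Counter()    # supp(i, j): co-occurrence counts
for items in events.values():
    supp.update(items)
    for i, j in combinations(sorted(items), 2):
        co[(i, j)] += 1
        co[(j, i)] += 1

def sim(i, j):
    # Penalize popular items via alpha; gamma damps similarities of rare items.
    return co[(i, j)] / ((supp[i] + gamma) ** (1 - alpha) * (supp[j] + gamma) ** alpha)
```

With α = 0.5 the normalization is symmetric; pushing α toward 1 penalizes the popularity of j more strongly.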
95. 95
[Illustration: slice R1 of the user × item × time preference tensor]
• Tensor Factorization
• Different preferences during the day
• Context: time period
• Time period 1: 06:00-14:00
96. 96
[Illustration: slices R1 and R2 of the user × item × time preference tensor]
• Tensor Factorization
• Different preferences during the day
• Context: time period
• Time period 2: 14:00-22:00
97. 97
[Illustration: slices R1, R2 and R3 of the user × item × time preference tensor]
• Tensor Factorization
• Different preferences during the day
• Context: time period
• Time period 3: 22:00-06:00
99. 99
• Relies on feedback
› Explicit feedback is more reliable than implicit feedback
› But explicit feedback is often not provided, only implicit feedback
› Does not need heterogeneous data sources
• Extracts latent information from consumption behavior
• More accurate than content based filtering
• Domain independent
Advantages of Collaborative Filtering
100. 100
• Cold start problem
› Items without feedback cannot be recommended
› CF algorithms cannot provide personalized recommendations for users without feedback
• Warmup period
› Requires numerous events to characterize the user properly
› Inaccurate for users with little feedback
› Inaccurate for domains with weak collaboration
• Vulnerable to attacks
• Harder to explain the recommendations
• Unable to provide cross-domain recommendations without overlapping data
Disadvantages of Collaborative Filtering
102. 102
Other applications / Hybrid Filtering
• Combination of collaborative and content-based filtering
• Advantages
› Transforms latent behavioral knowledge to meta data level
› Solves cold start problem
› Used for weighting meta data, able to improve the efficiency of CBF
› Behavioral knowledge can be interpreted by meta data
• Disadvantages
› Requires both meta data and transactional data
› More complex than CF and CBF
› Less developed than CF and CBF, hot topic
› Challenging to provide mixed recommendations with new and old items
› Heterogeneous data issues
103. 103
Other applications / Explanation
• Approaches
› Item explanation: explaining why those items were recommended
› User explanation: Description about the user profile
• Types of the item explanations
› Non-personalized: „… because this content is trending”
› Explicit features: „… because you bought horror movies”
› Explicit user-to-user links: „… because your friends love this movie”
› Explicit user relations: „… because some users similar to you like this movie”
› Implicit features: „… because actor X is similar to your preferred actor Y”
• Types of user explanations
› User tag cloud: „the user consumes 80% horror and 20% comedy”
› User similarity: „the user is similar to another user who is a fan of Star Wars”
• Goal: Increasing trust in recommender system
• Difficulties: Hard to optimize how the recommendations should be explained
104. 104
Other applications / Recommender strategies
• The most common way to recommend from prediction scores is to order items by them
• How to avoid monotony and „the winner takes it all” effect?
• How to avoid preference cannibalization?
• Recommender strategies: How to select items from the scored list?
› Best match
› One episode per series
› At most N items per category
› Different categories should follow each other
• Exploration vs. Exploitation with Multi-Armed Bandits
• Entropy maximization per recommendation box for new users
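One of the strategies above, „at most N items per category”, can be sketched as a simple re-ranking pass over the scored list; the items and scores are illustrative:

```python
# Re-rank a scored list so no category exceeds its quota; data is illustrative.
def rerank(scored_items, max_per_category=2):
    # scored_items: (item, category, score) tuples
    taken, out = {}, []
    for item, cat, _ in sorted(scored_items, key=lambda x: -x[2]):
        if taken.get(cat, 0) < max_per_category:
            out.append(item)
            taken[cat] = taken.get(cat, 0) + 1
    return out

scored = [("m1", "horror", 0.9), ("m2", "horror", 0.8),
          ("m3", "horror", 0.7), ("m4", "comedy", 0.6)]
# m3 is skipped once the horror quota is full, so m4 gets exposure.
```

This is the simplest form of diversification; the other strategies (one episode per series, alternating categories) fit the same filter-while-ranking pattern.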