Recommender Systems, Matrices and Graphs

Roelof Pieters
roelof@vionlabs.com
14 May 2014 @ KTH

About me
Interests in:
• IR, RecSys, Big Data, ML, NLP, SNA,
Graphs, CV, Data Visualization, Discourse
Analysis
History:
• 2002-2006: almost-BA Computer Science @
Amsterdam Tech Uni (dropped out in 2006)
• 2006-2010: BA Cultural Anthropology @
Leiden & Amsterdam Uni’s
• 2010-2012: MA Social Anthropology @
Stockholm Uni
• 2011-Current: Working @ Vionlabs
se.linkedin.com/in/roelofpieters/
roelof@vionlabs.com

Say Hello!
S:t Eriksgatan 63
112 33 Stockholm - Sweden
Email: hello@vionlabs.com
Tech company here in Stockholm with Geeks
and Movie lovers…
Since 2009:
• Digital ecosystems for network operators,
cable TV companies, and film distributors
such as Tele2/Comviq, Cyberia, and
Warner Bros
• Various software and hardware hacks for
different companies: Webbstory, Excito,
Spotify, Samsung
Focus since 2012:
• Movie and TV recommendation  
service
FoorSee

Outline
•Recommender Systems
•Algorithms*
•Graphs
(* math magicians better pay attention here)

Outline
•Taxonomy
•History
•Evaluating Recommenders
•Algorithms*
•Graphs

Information Retrieval
• Recommender
Systems as part of
Information Retrieval
(diagram: USER → Query → Retrieval → Document(s))
• Information Retrieval is
the activity of obtaining
information resources
relevant to an
information need from a
collection of information
resources.

IR: Measure Success
• Recall: the fraction of all relevant documents that were retrieved
• Precision: the fraction of retrieved documents that are relevant
• Given a set of query terms and a set of document terms,
select only the relevant documents (precision), and
preferably all the relevant ones (recall)
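The two measures above can be sketched in a few lines. This is a minimal illustration, assuming documents are identified by hypothetical ids (`d1`, `d2`, …) that do not come from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids, for illustration only.
p, r = precision_recall(retrieved=["d1", "d2", "d3", "d4"],
                        relevant=["d2", "d4", "d7"])
```

Here two of the four retrieved documents are relevant, and two of the three relevant documents were found.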

“generate meaningful recommendations to a
(collection of) user(s) for items or products that
might interest them”
Recommender Systems

Where can RS be found?
• Movie recommendation (Netﬂix)
• Related product recommendation (Amazon)
• Web page ranking (Google)
• Social recommendation (Facebook)
• News content recommendation (Yahoo)
• Priority inbox & spam ﬁltering (Google)
• Online dating (OK Cupid)
• Computational Advertising (Yahoo)

Taxonomy of RS
• Collaborative Filtering (CF)
• Content Based Filtering (CBF)
• Knowledge Based Filtering (KBF)
• Hybrid

Taxonomy of RS
• Collaborative Filtering (CF)
• Hybrid

Collaborative Filtering:
• relies on past user behavior
• Implicit feedback
• Explicit feedback
• requires no gathering of external data
• sparse data
• domain free
• cold start problem

Collaborative
(Dietmar et al., IJCAI 2011)
User based Collaborative Filtering


Taxonomy of RS
• Content Based Filtering (CBF)
• Hybrid

Content Filtering
• creates proﬁle for user/movie
• requires gathering external data
• dense data
• domain-bounded
• no cold start problem

Content based
Item based Collaborative Filtering


Taxonomy of RS
• Knowledge Based Filtering (KBF)
• Hybrid

Knowledge based
Knowledge based Content Filtering


Hybrid

History
• 1992-1995: Manual Collaborative Filtering
• 1994-2000: Automatic Collaborative Filtering +
Content
• 2000+: Commercialization…

TQL:
Tapestry (1992)
(Goldberg et al. 1992)

Grouplens (1994)
(Resnick et al. 1994)

2000+: Commercial CF’s
• 2000: Pandora starts the Music Genome Project, where each song “is analyzed using up to 450
distinct musical characteristics by a trained music analyst.”
(http://www.pandora.com/about/mgp)
• 2001: Amazon starts using item-based collaborative
filtering (patent filed in 1998)
• 2006-2009: Netflix Prize contest: two of the many algorithms put
into use by Netflix to replace “Cinematch” are Matrix
Factorization (SVD) and Restricted Boltzmann
Machines (RBM)
(http://www.netflixprize.com)

Annual Conferences
• RecSys (since 2007) http://recsys.acm.org
• SIGIR (since 1978) http://sigir.org/
• KDD (ofﬁcial since 1998) http://www.kdd.org/
• KDD Cup

Ongoing Discussion
• Evaluation
• Scalability
• Similarity versus Diversity
• Cold start (items + users)
• Fraud
• Imbalanced dataset or Sparsity
• Personalization
• Filter Bubbles
• Privacy
• Data Collection

Evaluating Recommenders
• Least mean squares prediction error
• RMSE:
$\mathrm{rmse}(S) = \sqrt{|S|^{-1} \sum_{(i,u) \in S} (\hat{r}_{ui} - r_{ui})^2}$
• Similarity measure enough?
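A minimal sketch of the RMSE computation, assuming the known ratings in S are supplied as (predicted, actual) pairs. The sample values are made up for illustration.

```python
import math

def rmse(pairs):
    """Root mean squared error over (predicted, actual) rating pairs."""
    pairs = list(pairs)
    squared_error = sum((predicted - actual) ** 2 for predicted, actual in pairs)
    return math.sqrt(squared_error / len(pairs))

# Hypothetical predictions versus actual ratings.
example = [(4.5, 5.0), (3.0, 2.0), (1.5, 1.0)]
score = rmse(example)
```

Because the errors are squared before averaging, one large miss weighs more than several small ones.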

Outline
•Algorithms*
•Content based Algorithms *
•Collaborative Algorithms *
•Classiﬁcation
•Rating/Ranking *
•Graphs

Content-based Filtering
• content is exploited (item to item filtering)
• content model:
• keywords (e.g. TF-IDF)
• similarity/distance measures:
• Euclidean distance:
• L1 and L2-norm
• Jaccard distance
• (adjusted) Cosine distance
• Edit distance
• Hamming distance
• Euclidean distance
• Jaccard distance
• Cosine distance
i.e.: x = [1, 2, −1] and y = [2, 1, 1]
dot product x·y is
1 × 2 + 2 × 1 + (−1) × 1 = 3
L2-norm of x =
√(1² + 2² + (−1)²) = √6, and the L2-norm of y = √(2² + 1² + 1²) = √6
cosine of angle:
3/(√6 √6) = 1/2
a cosine of 1/2 corresponds to an angle of
60 degrees
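The three measures can be written out directly. The sketch below reuses the vectors x = [1, 2, −1] and y = [2, 1, 1] from the slide. The Jaccard helper works on sets and assumes at least one of them is non-empty.

```python
import math

def euclidean(x, y):
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity(x, y):
    """Dot product divided by the product of the L2-norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def jaccard_distance(a, b):
    """1 minus |intersection| / |union|; assumes the union is not empty."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

x, y = [1, 2, -1], [2, 1, 1]
cos_xy = cosine_similarity(x, y)
angle = math.degrees(math.acos(cos_xy))
```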

Examples
• Item to Query
• Item to Item
• Item to User

Examples
• Item to Query
• Item to Item
• Item to User

Example: Item to Query
Title | Price | Genre | Rating
The Avengers | 5 | Action | 3.7
Spiderman II | 10 | Action | 4.5
user query q:
“price (6) AND genre (Adventure) AND rating (4)”
weights of features: price 0.22, genre 0.33, rating 0.45
Weighted Sum:
Sim(q, “The Avengers”) =
0.22 x (1 - 1/25) + 0.33 x 0 + 0.45 x (1 - 0.3/5) = 0.6342
(price: diff of 1 on the 1-25 price range; genre: no match; rating: diff of 0.3 on the 0-5 rating range)
Sim(q, “Spiderman II”) = 0.5898
(0.6348 if we count rating 4.5 > 4 as match)
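The weighted sum can be reproduced as follows. This is a sketch under the assumptions read off the slide: weights of 0.22 (price), 0.33 (genre) and 0.45 (rating), a price range of 25 and a rating range of 5. The dictionary layout is hypothetical.

```python
WEIGHTS = {"price": 0.22, "genre": 0.33, "rating": 0.45}
PRICE_RANGE = 25.0    # prices span 1-25
RATING_RANGE = 5.0    # ratings span 0-5

def sim_to_query(item, query):
    """Weighted sum of per-feature similarities between an item and a query."""
    price_sim = 1 - abs(item["price"] - query["price"]) / PRICE_RANGE
    genre_sim = 1.0 if item["genre"] == query["genre"] else 0.0
    rating_sim = 1 - abs(item["rating"] - query["rating"]) / RATING_RANGE
    return (WEIGHTS["price"] * price_sim
            + WEIGHTS["genre"] * genre_sim
            + WEIGHTS["rating"] * rating_sim)

query = {"price": 6, "genre": "Adventure", "rating": 4}
avengers = {"price": 5, "genre": "Action", "rating": 3.7}
spiderman = {"price": 10, "genre": "Action", "rating": 4.5}

sim_avengers = sim_to_query(avengers, query)
sim_spiderman = sim_to_query(spiderman, query)
```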

Examples
• Item to Query
• Item to Item
• Item to User

Example: Item to Item Similarity
Title | ReleaseTime | Genres | Actors | Rating
TA | 90s, start 90s, 1993 | Action, Comedy, Romance | X, Y, Z | 3.7
S2 | 90s, start 90s, 1991 | Action | W, X, Z | 4.5
Feature types: ReleaseTime is a set of hierarchical related symbols, Genres and Actors are arrays of Booleans, Rating is numeric
Sim(X,Y) = 1 - d(X,Y)
or
Sim(X,Y) = exp(- d(X,Y))
where d(X,Y) combines the per-feature distances di(Xi,Yi) with weights wi, 0 ≤ wi ≤ 1, and i=1..n (number of features).

TA:
X1 = (90s, S90s, 1993)
X2 = (1,1,1)
X3 = (0,1,1,1)
X4 = 3.7
S2:
X1 = (90s, S90s, 1991)
X2 = (1,0,0)
X3 = (1,1,0,1)
X4 = 4.5
W = (0.5, 0.3, 0.2)
weights of features all the same
weights of categories within “Release
time” different (W)

Sim(TA, S2) =
exp(-(1/√4) √(d1(X1,Y1)² + … + d4(X4,Y4)²)) =
exp(-(1/√4) √((1-(0.3+0.5))² + (1-1/3)² + (1-2/4)² + (0.8/5)²)) =
exp(-(1/√4) √0.7600) = exp(-0.436) = 0.647
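A sketch of the item-to-item computation for TA and S2. It assumes every per-feature term is a distance in [0, 1]: release time is one minus the summed weights of matching hierarchy levels, Boolean arrays are one minus the share of positions set in both items, and the rating is the absolute difference divided by the 0-5 range. The helper names are hypothetical.

```python
import math

def release_distance(a, b, weights=(0.5, 0.3, 0.2)):
    """1 minus the summed weights of the hierarchy levels that match."""
    matched = sum(w for w, u, v in zip(weights, a, b) if u == v)
    return 1 - matched

def boolean_distance(a, b):
    """1 minus the fraction of positions that are set in both arrays."""
    both = sum(1 for u, v in zip(a, b) if u == 1 and v == 1)
    return 1 - both / len(a)

def rating_distance(a, b, rating_range=5.0):
    """Absolute rating difference, normalised by the rating range."""
    return abs(a - b) / rating_range

def similarity(distances):
    """exp(-d), with d the equally weighted Euclidean combination of the distances."""
    d = math.sqrt(sum(x * x for x in distances)) / math.sqrt(len(distances))
    return math.exp(-d)

ta = (("90s", "S90s", 1993), (1, 1, 1), (0, 1, 1, 1), 3.7)
s2 = (("90s", "S90s", 1991), (1, 0, 0), (1, 1, 0, 1), 4.5)

distances = [
    release_distance(ta[0], s2[0]),
    boolean_distance(ta[1], s2[1]),
    boolean_distance(ta[2], s2[2]),
    rating_distance(ta[3], s2[3]),
]
sim = similarity(distances)
```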

Example: Item to User
Title | Roelof θ(1) | Klas θ(2) | Mo θ(3) | Max θ(4) | x1 (Action) | x2
The Avengers | 5 | 1 | 2 | 5 | 0.8 | 0.1
Spiderman II | ? | 2 | 1 | ? | 0.9 | 0.2
American Pie | 2 | 5 | ? | 1 | 0.05 | 0.9
Each movie i has a feature vector x(i) with a leading 1 as intercept, e.g. x(1) = [1, 0.8, 0.1] for The Avengers.
For each user u, learn a parameter θ(u) ∈ R^(n+1).
Predict user u as rating movie i with (θ(u))^T x(i)
Predict the rating of Mo (θ(3)) for American Pie (x(3)):
x(3) = [1, 0.05, 0.9], temp θ(3) = [0, 0, 5]
dot product: (θ(3))^T x(3) = 0 × 1 + 0 × 0.05 + 5 × 0.9 ≈ 4.5
Filling in the remaining unknowns the same way gives ≈4 for both Roelof and Max on Spiderman II.
How do we learn these user factor parameters?
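The prediction step is a plain dot product. The sketch below uses the feature vector for American Pie and the temporary parameter vector for Mo given on the slide. `predict` is a hypothetical helper name.

```python
def predict(theta, x):
    """Predicted rating: dot product of user parameters and movie features."""
    return sum(t * f for t, f in zip(theta, x))

x_american_pie = [1, 0.05, 0.9]   # intercept term, then the two movie features
theta_mo = [0, 0, 5]              # temporary parameter vector for Mo

rating = predict(theta_mo, x_american_pie)
```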

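Example: Item to User
problem formulation:
• r(i,u) = 1 if user u has rated movie i, otherwise 0
• y(i,u) = rating by user u on movie i (if defined)
• θ(u) = parameter vector for user u
• x(i) = feature vector for movie i
• For user u, movie i, predicted rating: (θ(u))^T x(i)
• m(u) = # of movies rated by user u
• learning θ(u):
$\min_{\theta^{(u)}} \frac{1}{2m^{(u)}} \sum_{i:\,r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2m^{(u)}} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$
• dropping the constant m(u), which does not change the minimizer:
$\min_{\theta^{(u)}} \frac{1}{2} \sum_{i:\,r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$
(first sum: squared error term, predicted minus actual; second sum: regularization term)
• learning θ(1), θ(2), …, θ(n_u), i.e. learn for “all” users:
$\min_{\theta^{(1)},\dots,\theta^{(n_u)}} \frac{1}{2} \sum_{u=1}^{n_u} \sum_{i:\,r(i,u)=1} \left( (\theta^{(u)})^T x^{(i)} - y^{(i,u)} \right)^2 + \frac{\lambda}{2} \sum_{u=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(u)} \right)^2$
Say what? Remember:
y = rating
θ = parameter vector for a user
x = feature vector for a movie
(A. Ng. 2013)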
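One way to minimize the single-user objective is batch gradient descent. The sketch below is an illustration, not the method used in the talk. It assumes the 1/2 and λ/2 form of the cost, leaves the intercept unregularized, and uses hypothetical values for the learning rate `alpha`, `lam` and the step count. The data are Klas' three known ratings from the example table.

```python
def predict(theta, x):
    return sum(t * f for t, f in zip(theta, x))

def cost(theta, movies, ratings, lam):
    """Squared error term plus regularization term (intercept not regularized)."""
    err = sum((predict(theta, x) - y) ** 2 for x, y in zip(movies, ratings))
    reg = sum(t * t for t in theta[1:])
    return 0.5 * err + 0.5 * lam * reg

def learn_theta(movies, ratings, lam=0.1, alpha=0.05, steps=5000):
    """Batch gradient descent on the regularized squared error for one user."""
    theta = [0.0] * len(movies[0])
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for x, y in zip(movies, ratings):
            err = predict(theta, x) - y
            for k, f in enumerate(x):
                grad[k] += err * f
        for k in range(1, len(theta)):
            grad[k] += lam * theta[k]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

# Feature vectors (with intercept) for the movies Klas rated, and his ratings.
movies = [[1, 0.8, 0.1], [1, 0.9, 0.2], [1, 0.05, 0.9]]
klas_ratings = [1, 2, 5]

theta_klas = learn_theta(movies, klas_ratings)
```

Looping the same routine over every user gives the “all users” formulation, since the per-user problems do not share parameters.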

• User-based approach
• Find a set Si of users who rated item j and are most similar to
ui
• Compute the predicted score Vij as a function of the ratings of item j
given by Si (usually a weighted linear combination)
• Item-based approach
• Find a set Sj of items that are most similar to item j and were
rated by ui
• Compute the predicted score Vij as a function of ui's ratings for Sj
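The user-based approach can be sketched as below. It assumes cosine similarity over co-rated items and a similarity-weighted average of the k nearest neighbours' ratings. Both choices are illustrative, and the ratings are the known entries of the item-to-user example table.

```python
import math

ratings = {
    "Roelof": {"The Avengers": 5, "American Pie": 2},
    "Klas":   {"The Avengers": 1, "Spiderman II": 2, "American Pie": 5},
    "Mo":     {"The Avengers": 2, "Spiderman II": 1},
    "Max":    {"The Avengers": 5, "American Pie": 1},
}

def cosine(u, v):
    """Cosine similarity of two users over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    norm_u = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(user, item, k=2):
    """Similarity-weighted average of the item's ratings by the k most similar users."""
    neighbours = [(cosine(user, v), v) for v in ratings
                  if v != user and item in ratings[v]]
    neighbours = sorted(neighbours, reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None
    return sum(s * ratings[v][item] for s, v in neighbours) / total

prediction = predict_user_based("Roelof", "Spiderman II")
```

With a single co-rated item the cosine is always 1, which is one reason practical systems require a minimum overlap or correct for rating bias.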

• Two primary models:
• Neighborhood models
• focus on relationships between movies or users
• Latent Factor models
• focus on factors inferred from (rating) patterns
• computerized alternative to naive content creation
• predicts rating by dot product of user and movie locations
on known dimensions
(Sarwar, B. et al. 2001)

Neighborhood (user oriented)
(pic from Koren et al. 2009)

Neighbourhood Methods
• Problems:
• Ratings biased per user
• Ratings biased towards certain items
• Ratings change over time
• Ratings can rapidly change through real time
events (Oscar nomination, etc)
• Bias correction needed

Latent Factors
• latent factor models map users and items into a
latent feature space
• user's feature vector denotes the user's afﬁnity to
each of the features
• item's feature vector represents how much the
item itself is related to the features.
• rating is approximated by the dot product of the
user feature vector and the item feature vector.

Latent Factors (users+movies)
(pic from Koren et al. 2009)

Latent Factors (x+y)
(http://xkcd.com/388/)

Latent Factor models
• Matrix Factorization:
• characterizes items + users by vectors of
factors inferred from (ratings or other user-
item related) patterns
• Given a list of users and items, and user-item
interactions, predict user behavior
• can deal with sparse data (matrix)
• can incorporate additional information

Matrix Factorization
• Dimensionality reduction
• Principal Components Analysis, PCA
• Singular Value Decomposition, SVD
• Non Negative Matrix Factorization, NNMF

Matrix Factorization: SVD
SVD, Singular Value Decomposition
• transforms correlated variables into a set of
uncorrelated ones that better expose the various
relationships among the original data items.
• identiﬁes and orders the dimensions along
which data points exhibit the most variation.
• allowing us to ﬁnd the best approximation of the
original data points using fewer dimensions.

SVD: Matrix Decomposition
A = U Λ Vᵀ, where:
U: document-to-concept similarity matrix
V: term-to-concept similarity matrix
Λ: its diagonal elements give the ‘strength’ of each concept
(pic by Xavier Amatriain 2013)

SVD for
Collaborative Filtering
each item i is associated with a vector $q_i \in \mathbb{R}^f$
each user u is associated with a vector $p_u \in \mathbb{R}^f$
$q_i$ measures the extent to which the item possesses the factors
$p_u$ measures the extent of the user's interest in items that score high on the corresponding factors
user-item interactions are modeled as dot products within
the factor space, measured by $q_i^T p_u$
user u's rating of item i is approximated by: $\hat{r}_{ui} = q_i^T p_u$

SVD for
Collaborative Filtering
• compute u,i mappings: $q_i, p_u \in \mathbb{R}^f$
• factor user, item matrix
• imputation (Sarwar et al. 2000)
• model only observed ratings + regularization (Funk 2006; Koren
2008)
• learn factor vectors $q_i$ and $p_u$ by minimizing the (regularized) squared
error on the set κ of known ratings, where $\hat{r}_{ui} = q_i^T p_u$ approximates user u's rating of item i,
leading to the learning objective:
$\min_{q,p} \sum_{(u,i) \in \kappa} (r_{ui} - q_i^T p_u)^2 + \lambda (\|q_i\|^2 + \|p_u\|^2)$

SVD Visualized
regression line reducing two dimensional
space into one dimensional one

SVD Visualized
reducing three dimensional (multidimensional)
space into two dimensional plane

SVD: Code Away!
<Coding Time>

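Stochastic Gradient Descent
• optimizable by Stochastic Gradient Descent
(SGD) (Funk 2006)
• incremental learning
• loops through the ratings and computes the prediction
error for the predicted rating on $r_{ui}$:
$e_{ui} = r_{ui} - q_i^T p_u$
• modify parameters by a magnitude proportional
to γ in the opposite direction of the gradient, giving the
learning rule:
$q_i \leftarrow q_i + \gamma (e_{ui} p_u - \lambda q_i)$
and
$p_u \leftarrow p_u + \gamma (e_{ui} q_i - \lambda p_u)$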
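A compact sketch of SGD matrix factorization in the style described above: loop over the known ratings, compute the prediction error, and nudge the item and user factors against the gradient. The learning rate `gamma`, the regularization weight `lam`, the number of factors and the epoch count are hypothetical settings. The ratings are the 4 × 4 example table used later in the talk.

```python
import math
import random

def sgd_factorize(ratings, n_factors=2, gamma=0.01, lam=0.05, epochs=2000, seed=0):
    """Learn user factors p and item factors q from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            error = r - sum(a * b for a, b in zip(q[i], p[u]))
            for f in range(n_factors):
                qf, pf = q[i][f], p[u][f]
                q[i][f] += gamma * (error * pf - lam * qf)
                p[u][f] += gamma * (error * qf - lam * pf)
    return p, q

def train_rmse(ratings, p, q):
    """RMSE of the factor model on the ratings it was trained on."""
    se = sum((r - sum(a * b for a, b in zip(q[i], p[u]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

table = {
    "The Avengers": {"Roelof": 5, "Klas": 1, "Mo": 1, "Max": 4},
    "Spiderman II": {"Roelof": 4, "Klas": 2, "Mo": 1, "Max": 5},
    "American Pie": {"Roelof": 3, "Klas": 5, "Mo": 4, "Max": 1},
    "Shrek":        {"Roelof": 1, "Klas": 4, "Mo": 5, "Max": 2},
}
ratings = [(u, i, r) for i, row in table.items() for u, r in row.items()]

p, q = sgd_factorize(ratings)
fit = train_rmse(ratings, p, q)
```

Training error alone says little about generalization, so a held-out set of ratings would normally be scored as well.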

Gradient Descent
<Coding Time>

Alternating Least Squares
• optimizable by Alternating Least Squares (ALS) (2006)
• both $q_i$ and $p_u$ unknown: the objective function is not convex
-> it cannot be solved directly for a global minimum
• ALS rotates between fixing the $q_i$'s and the $p_u$'s
• Fixing $q_i$ or $p_u$ makes the optimization problem quadratic
-> the one that is not fixed can now be solved optimally
• each $q_i$ and $p_u$ is computed independently of the other item/user factors:
parallelization
• Best for implicit data (dense matrix)

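Alternating Least Squares
• rotates between fixing the $q_i$'s and the $p_u$'s
• when all $p_u$'s are fixed, recompute the $q_i$'s by solving a least
squares problem:
• Fix matrix P, so that the minimization problem becomes: $\min_{Q} \sum_{(u,i) \in \kappa} (r_{ui} - q_i^T p_u)^2 + \lambda \sum_i \|q_i\|^2$
• or fix Q similarly as: $\min_{P} \sum_{(u,i) \in \kappa} (r_{ui} - q_i^T p_u)^2 + \lambda \sum_u \|p_u\|^2$
• Learning Rule: $q_i = \left( \sum_{u} p_u p_u^T + \lambda I \right)^{-1} \sum_{u} r_{ui} p_u$
and
$p_u = \left( \sum_{i} q_i q_i^T + \lambda I \right)^{-1} \sum_{i} r_{ui} q_i$
where each sum runs over the known ratings of item i or of user u respectively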

Develop Further…
• Add Biases: $\hat{r}_{ui} = \mu + b_i + b_u + q_i^T p_u$
• Add Input Sources: Implicit
Feedback:
$p_u$ in $\hat{r}_{ui}$ becomes ($p_u$ + … + (…))
• Add Temporal Aspect / time-varying
parameters
• Vary Confidence Levels of Inputs
(Salakhutdinov &
Mnih 2008; Koren 2010)
pic: Lei Guo 2012

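Develop Further…
• Final Algorithm:
$\min_{p,q,b} \sum_{(u,i) \in \kappa} c_{ui} (r_{ui} - \mu - b_u - b_i - q_i^T p_u)^2 + \lambda (\|p_u\|^2 + \|q_i\|^2 + b_u^2 + b_i^2)$
($c_{ui}$: confidence; $\mu$, $b_u$, $b_i$: bias terms; $\lambda(\dots)$: regularization)
(Paterek, A. 2007)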

Develop Further…
• Final Algorithm with Temporal dimensions:
$\hat{r}_{ui}(t) = \mu + b_i(t) + b_u(t) + q_i^T p_u(t)$

• So what if we don’t have any content factors
known?
• Probabilistic Matrix Factorization to the rescue!
• describe each user and each movie by a
small set of attributes

Probabilistic Matrix
Factorization
• Imagine we have the following rating data:
Title | Roelof | Klas | Mo | Max
The Avengers | 5 | 1 | 1 | 4
Spiderman II | 4 | 2 | 1 | 5
American Pie | 3 | 5 | 4 | 1
Shrek | 1 | 4 | 5 | 2
we could say that Roelof and Max like Action
movies but don’t like comedies, while it's the
opposite for Klas and Mo

Probabilistic Matrix Factorization
• This could be represented by the PMF model by using two
dimensional vectors to describe each user and each movie.
• example latent vectors:
• AV: [0, 0.3]
• SPII: [1, 0.3]
• AP: [1, 0.3]
• SH: [1, 0.3]
• Roelof: [0, 3]
• Klas: [8, 3]
• Mo: [10, 3]
• Max: [10, 3]
• predict rating by dot product
of user vector with the item
vector
• So predicting Klas’ rating for
Spiderman II = 8*1 + 3*0.3 = 8.9
• But descriptions of users
and movies are not known
ahead of time.
• PMF discovers such latent
characteristics

Probabilistic Matrix Factorization
<CODE TIME>

Classiﬁcation
• k-Nearest Neighbors (KNN)
• Decision Trees
• Rule-Based
• Bayesian
• Artiﬁcial Neural Networks
• Support Vector Machines

Classiﬁcation
• k-Nearest Neighbors (KNN)
• Decision Trees
• Rule-Based
• Bayesian
• Artiﬁcial Neural Networks
• Support Vector Machines

k-Nearest Neighbors
• non parametric lazy learning algorithm
• data as feature space
• simple and fast
• k-nn classiﬁcation
• k-nn regression: density estimation

kNN: Classiﬁcation
• Classify
• several Xi used to classify Y
• compare $(X_1^p, X_2^p)$ and $(X_1^q, X_2^q)$ by Squared
Euclidean distance:
$d^2_{pq} = (X_1^p - X_1^q)^2 + (X_2^p - X_2^q)^2$
• find k-Nearest Neighbors
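The steps above can be sketched as a small classifier: compute the squared Euclidean distance to every labelled point, keep the k nearest, and take a majority vote. The (anger, love) scores and the genre labels below are hypothetical and are not the 561-movie data set from the talk.

```python
from collections import Counter

def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_classify(query, points, labels, k=3):
    """Majority label among the k points closest to the query."""
    order = sorted(range(len(points)),
                   key=lambda j: squared_distance(query, points[j]))
    nearest = order[:k]
    return Counter(labels[j] for j in nearest).most_common(1)[0][0]

# Hypothetical (anger, love) scores with made-up labels.
points = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.2, 0.9), (0.1, 0.8)]
labels = ["thriller", "thriller", "thriller", "romance", "romance"]

label = knn_classify((0.75, 0.25), points, labels, k=3)
```

Being a lazy learner, kNN does no training: all the work happens at query time, which is why it scales poorly without an index.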

kNN: Classiﬁcation
• input: content extracted emotional values of 561
movies. thanks: Johannes Östling :)
i.e.: dimensions of movie “Hamlet”:

k-Nearest Neighbors
emotional
dimension “Anger”
vs “Love”

k-Nearest Neighbors
Negative: afraid, confused, helpless, hurt,
sad, angry, depressed
Positive: good, interested, love, positive,
strong
aggregate of
positive and
negative
emotions

Rating predictions:
• Pos — Neg
• Average
• Bayesian (Weighted) Estimates
• Lower bound of Wilson score conﬁdence interval
for a Bernoulli parameter

Rating predictions:
• Pos — Neg
• Average

P — N
• (Positive ratings) - (Negative ratings)
• Problematic: 
 
 
 
 
 
(http://www.urbandictionary.com/deﬁne.php?term=movies)

Rating predictions:
• Pos — Neg
• Average

Average
• (Positive ratings) / (Total ratings)
• Problematic: 
 
 
 
 
 
(http://www.amazon.co.uk/gp/bestsellers/electronics/)

Rating predictions:
• Pos — Neg
• Average
• Bayesian (Weighted) Estimates

Ratings
• Top Ranking at IMDB (gives Bayesian estimate):
• Weighted Rating (WR) =
(v / (v+m)) × R + (m / (v+m)) × C
• Where:
R = average for the movie (mean) = (Rating) 
v = number of votes for the movie = (votes) 
m = minimum votes required to be listed in the Top 250
(currently 25000) 
C = the mean vote across the whole report (currently 7.0)
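The weighted rating is easy to check numerically. This sketch uses the values quoted on the slide as defaults (m = 25000, C = 7.0). The example movie and its vote count are made up.

```python
def weighted_rating(R, v, m=25000, C=7.0):
    """IMDB-style Bayesian estimate.

    R: mean rating of the movie, v: number of votes for the movie,
    m: minimum votes required to be listed, C: mean vote across all movies.
    """
    return (v / (v + m)) * R + (m / (v + m)) * C

# A hypothetical movie rated 9.0 by exactly m voters lands halfway between R and C.
wr = weighted_rating(R=9.0, v=25000)
```

With few votes the estimate stays close to the global mean C, and it approaches the movie's own mean R as votes accumulate.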

Bayesian (Weighted)
Estimates
• weighted average on a
per-item basis:
(source(s): http://www.imdb.com/title/tt0368891/ratings)

Bayesian (Weighted)
Estimates @ IMDB
(chart: Bayesian Weights for m = 1250; horizontal axis from 0 to 5000, vertical axis from 0 to 1; series: specific, global)
• speciﬁc part for
individual items
• global part is
constant over all
items
• can be
precalculated

Rating predictions:
• Pos — Neg
• Average
• Lower bound of Wilson score conﬁdence
interval for a Bernoulli parameter

Wilson Score interval
• 1927 by Edwin B. Wilson 
 
 
• Given the ratings I have, there is a 95% chance
that the "real" fraction of positive ratings is at
least what?

• used by Reddit for comments ranking
• “rank the best comments highest  
regardless of their submission time”
• algorithm introduced to Reddit by  
Randall Munroe (the author of xkcd).
• treats the vote count as a statistical sampling of a
hypothetical full vote by everyone, much as in an
opinion poll.

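• Endpoints for Wilson Score interval:
$\dfrac{\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}$
• Reddit’s comment ranking function (the lower endpoint):
(phat+z*z/(2*n) - z*sqrt((phat*(1-phat) + z*z/(4*n))/n))/(1+z*z/n)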
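The ranking expression translates directly into a function. In this sketch `phat` is the observed fraction of positive ratings and `n` the total number of ratings. The default z = 1.96 is an assumption (the usual value for a 95% two-sided interval), and returning 0 for items without ratings is a hypothetical convention.

```python
import math

def wilson_lower_bound(positive, total, z=1.96):
    """Lower endpoint of the Wilson score interval for a Bernoulli parameter."""
    if total == 0:
        return 0.0   # assumed convention: no ratings, no evidence
    n = total
    phat = positive / n
    return ((phat + z * z / (2 * n)
             - z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

one_of_one = wilson_lower_bound(1, 1)
ninety_of_hundred = wilson_lower_bound(90, 100)
```

A single positive rating scores lower than 90 positives out of 100, even though its raw average is higher, which is the behaviour a plain average lacks.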

Bernoulli anyone?
*as the trial (N) = 2 (2
throws of dice) it's actually
not a real Bernoulli
distribution

Graph Based Approaches
• What's a Graph?
• Why Graphs?
• Who uses Graphs?
• Talking with Graphs
• Graph example: Recommendations
• Graph example: Data Analysis

What’s a Graph?
Vertices (Nodes): Movie, Genre, Actor, Director, User, Comment, …
Edges (Relations): has_genre, features_actor, directed_by, likes, watches, rates, likes_user, friends, follows, comments_movie, likes_comment, likes_actor, …
has_X etcetera: locations, time, moods, keywords, …

• What's a Graph?
• Why Graphs?

Why Graphs?
It's the nature of today's Data, which is getting:
• more complex (social networks…)
• more connected (wikis, pingbacks, rdf, collaborative
tagging)
• more semi-structured (wikis, rss)
• more decentralized: democratization of content production
(blogs, twitter*, social media*)
and just: MORE

Data Trend
“Every 2 days we
create as much
information as we did
up to 2003” 
— Eric Schmidt, Google
Why Graphs?

Graphs vs Relational
(diagram panels: relational, graph)
(pic by Michael Hunger, neo4j)
Why Graphs?
It's Fast!
Matrix based Calculations:
run-time grows with the product of all dimensions
(items x users x factors x …)

Graphs vs Relational
(diagram panels: relational, graph)
Why Graphs?
It's Fast!
Graph based Calculations:
Linear/Constant run-time
(item of interest x relations)

It's
White-Board
Friendly!
Why Graphs?

• What's a Graph?
• Why Graphs?
• Who uses Graphs?

Who uses Graphs?
• Facebook: Open Graph (https://developers.facebook.com/docs/opengraph)
• Google: Knowledge Graph (http://www.google.com/insidesearch/features/search/knowledge.html)
• Twitter: FlockDB (https://github.com/twitter/flockdb)
• Mozilla: Pancake (https://wiki.mozilla.org/Pancake)
• (…)


• What's a Graph?
• Why Graphs?
• Talking with Graphs

Talking with Graphs
• Graphs can be queried
• no unions for comparison, but traversals
• many different graph traversal patterns
(xkcd)

graph traversal patterns
• traversals can be seen as a diffusion
process over a graph
• “Energy” moves over a graph and spreads
out through the network
• energy:
(Ghahramani 2012)

Energy Diffusion
(pic by Marko A. Rodriguez, 2011)

Energy Diffusion
energy = 4

Energy Diffusion
energy = 3

Energy Diffusion
energy = 2

Energy Diffusion
energy = 1

• What's a Graph?
• Why Graphs?
• Graph example: Recommendations

Diffusion Example:
Recommendations
• Energy diffusion is an easy algorithm for
making recommendations
• different paths make different
recommendations
• different paths for different problems can
be solved on the same graph/domain
• recommendation = “jumps” through the
data

Friend
Recommendation
• Who are my friends’ friends
G.V('me').outE[knows].inV.outE.inV
• Who are my friends’ friends that are not
me or my friends
G.V('me').outE[knows].inV.aggregate(x).outE.inV{!x.contains(it)}
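The same two-hop traversal can be sketched without a graph database, using a plain adjacency dictionary. This is an illustration of the pattern, not the Gremlin implementation, and the names in the toy `knows` graph are hypothetical.

```python
# Hypothetical undirected "knows" graph as an adjacency dictionary.
knows = {
    "me":    {"anna", "bob"},
    "anna":  {"me", "bob", "carla"},
    "bob":   {"me", "anna", "dave"},
    "carla": {"anna"},
    "dave":  {"bob"},
}

def recommend_friends(graph, person):
    """Friends of friends who are neither the person nor already a friend."""
    friends = graph.get(person, set())
    candidates = set()
    for friend in friends:                    # first hop: my friends
        candidates |= graph.get(friend, set())  # second hop: their friends
    return candidates - friends - {person}

suggestions = recommend_friends(knows, "me")
```

The set difference at the end plays the role of the `aggregate(x)` / `{!x.contains(it)}` filter in the traversal.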

Product
Recommendation
• Who likes what I like
G.V('me').outE[likes].inV.inE[likes].outV
• of these things, what do they like
G.V('me').outE[likes].inV.inE[likes].outV.outE[likes].inV
• which I don't already like
G.V('me').outE[likes].inV.aggregate(x).inE[likes].outV.outE[likes].inV{!x.contains(it)}
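A dictionary-based sketch of the same three-step traversal: find the people who like something I like, collect what they like, and drop what I already like. Counting how many paths reach each item is an added assumption that gives a simple ranking. The `likes` data are hypothetical.

```python
from collections import Counter

# Hypothetical "likes" edges from people to movies.
likes = {
    "me":   {"The Avengers", "Spiderman II"},
    "klas": {"The Avengers", "American Pie"},
    "mo":   {"Spiderman II", "Shrek", "American Pie"},
    "max":  {"Shrek"},
}

def recommend_items(likes, person):
    """Items liked by people who share a like with `person`, ranked by path count."""
    mine = likes[person]
    scores = Counter()
    for other, theirs in likes.items():
        if other == person or not (mine & theirs):
            continue                      # skip myself and people with no shared like
        for item in theirs - mine:        # only items I do not already like
            scores[item] += 1
    return scores.most_common()

recommendations = recommend_items(likes, "me")
```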

Recommendations at 
with FoorSee

• What's a Graph?
• Why Graphs?

Graphs: Conclusion
• Fast
• Scalable
• Diversification
• No Cold Start
• Sparsity/Density not applicable

Graphs: Conclusion
• Natural / Visualizable
• Feedback / Understandable
• Connectable to the “web” / semantic web
• Social Network Analysis
• Real Time Updates / Recommendations

WARNING
Graphs  
are  
Addictive!

References
• J. Dietmar, G. Friedrich and M. Zanker (2011) “Recommender Systems”, International Joint Conference on Artificial Intelligence, Barcelona
• S. Funk (2006) “Netflix Update: Try This at Home”, sifter.org/~simon/journal/20061211.html
• Z. Ghahramani (2012) “Graph-based Semi-supervised Learning”, MLSS, La Palma
• D. Goldberg, D. Nichols, B.M. Oki and D. Terry (1992) “Using collaborative filtering to weave an information tapestry”, Communications of the ACM 35 (12)
• M. Hunger (2013) “Data Modeling with Neo4j”, http://www.slideshare.net/neo4j/data-modeling-with-neo4j-25767444
• Y. Koren (2008) “Factorization meets the Neighborhood: A Multifaceted Collaborative Filtering Model”, SIGKDD, http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf
• Y. Koren & C. Bell (2007) “Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights”
• Y. Koren (2010) “Collaborative filtering with temporal dynamics”
• A. Ng (2013) Machine Learning, ml-004 @ Coursera
• A. Paterek (2007) “Improving Regularized Singular Value Decomposition for Collaborative Filtering”, KDD
• P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994) “GroupLens: An Open Architecture for Collaborative Filtering of Netnews”, Proceedings of ACM
• R. Salakhutdinov & A. Mnih (2008) “Probabilistic Matrix Factorization”
• B.R. Sarwar et al. (2000) “Application of Dimensionality Reduction in Recommender System - A Case Study”, WebKDD
• B. Sarwar, G. Karypis, J. Konstan, J. Riedl (2001) “Item-Based Collaborative Filtering Recommendation Algorithms”
• xkcd.com

Take Away Points
• Focus on the best Question, not just the Answer…
• Best Match (most similar) vs Most Popular
• Personalized vs Global Factors
• Less is More?
• What is relevant?

Thanks for listening!
(xkcd)

Say What?
• So what other stuff do we do at Vionlabs?

• Some examples of data extraction which is fed into
our BAG (Big Ass Graph)…
