WhoToFollow Recommendations
Rohan Agrawal
Fall 2013 Internship @ Spotify
User Recommendation Problem
● First step: candidate set generation
● Second step: rank candidates using a supervised ML model

Problem? Need to generate training data for the ML model
● Generate candidates (2-hop) for users in an old social graph, say from 1 month before.
● Look at the current social graph: if a link was established between the user and a candidate in the current graph, treat the edge as a positive class.
● If a link was not established, treat the edge as a negative class.
● Not the best way to get training data, since the edges actually formed depend on the previous recommendation algorithm, but a good start (sketched below).
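The labelling step can be sketched roughly as follows. This is a minimal illustration that assumes each graph snapshot is an in-memory dict of userID → set of followed userIDs (the real pipeline used an on-disk store, see the Optimizations slide); names like old_graph and new_graph are illustrative, not the actual internal code.

```python
def two_hop_candidates(graph, user):
    """Users reachable in exactly 2 hops whom the user does not already follow."""
    first_hop = graph.get(user, set())
    candidates = set()
    for middle in first_hop:
        candidates |= graph.get(middle, set())
    return candidates - first_hop - {user}

def build_training_examples(old_graph, new_graph):
    """Label (user, candidate) pairs: positive iff the edge exists a month later."""
    examples = []
    for user in old_graph:
        for candidate in two_hop_candidates(old_graph, user):
            label = 1 if candidate in new_graph.get(user, set()) else 0
            examples.append((user, candidate, label))
    return examples
```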
Candidate Set Generation
Which users do you want to consider for WTF recs?
● Simple approach: all users at 2 hops are candidates, ranked by the total number of 2-hop paths; just take the top 200 (see the sketch after this list).
● Complex approaches:
    ● Use personalized PageRank or SALSA to find candidates for each user.
    ● Use user interactions to build a weighted social graph, then apply the above techniques.

Many users (around 50%) do not have a 2-hop neighborhood:
● Use Facebook friends as candidates (only 16% of users don't have FB candidates, and 5% of users have neither FB candidates nor 2-hop neighbors).
● Use Approximate Nearest Neighbors.
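The simple approach can be sketched as below: count the length-2 paths reaching each candidate and keep the top 200. It reuses the dict-of-sets graph representation assumed earlier; simple_candidates is an illustrative name, not the production code.

```python
from collections import Counter

def simple_candidates(graph, user, top_n=200):
    """Rank 2-hop neighbours by the number of length-2 paths reaching them."""
    first_hop = graph.get(user, set())
    path_counts = Counter()
    for middle in first_hop:
        for candidate in graph.get(middle, set()):
            if candidate != user and candidate not in first_hop:
                path_counts[candidate] += 1   # one more length-2 path to candidate
    return [candidate for candidate, _ in path_counts.most_common(top_n)]
```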
Extracting Features
● hops: number of paths of length 2 between user1 and user2
● hopslog: hops / log(# of subscribers user2 has)
● common: number of common neighbors shared by user1 and user2
● jaccard: common / (size of the union of the neighbors of user1 and user2)
● cosine: cosine similarity of the user vectors of user1 and user2
● adamic: sum over the neighbors of user1 of 1 / log(# of subscribers of the neighbor)
● indegree: in-degree of user2
● fraction_n2: for two users i and j, the fraction of i's subscriptions that follow j
● fraction_n1: for two users i and j, the fraction of j's subscriptions that i follows
● pref_attachment: number of subscriptions of i * number of followers of j
● reverse_edge: for i, j, equals 1 if j follows i
● Label: positive or negative class, as described in slide 2.
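A hedged sketch of a few of these pairwise features is below. It assumes that for each user we have the set of accounts they subscribe to (subs) and a follower count (followers); the dict-based interface and the function name are illustrative only, and features such as hops, adamic and fraction_n1 are omitted for brevity.

```python
import math

def pair_features(subs, followers, i, j):
    """Compute a subset of the pairwise features for users i and j."""
    subs_i, subs_j = subs.get(i, set()), subs.get(j, set())
    common = len(subs_i & subs_j)                  # shared neighbours
    union = len(subs_i | subs_j)
    denom = math.sqrt(len(subs_i) * len(subs_j))
    return {
        "common": common,
        "jaccard": common / union if union else 0.0,
        "cosine": common / denom if denom else 0.0,    # binary user vectors
        # fraction of i's subscriptions that themselves follow j
        "fraction_n2": (sum(1 for m in subs_i if j in subs.get(m, set()))
                        / len(subs_i)) if subs_i else 0.0,
        "indegree": followers.get(j, 0),
        "pref_attachment": len(subs_i) * followers.get(j, 0),
        "reverse_edge": 1 if i in subs_j else 0,
    }
```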
Ranking Features by Importance
● 0.185521009562 hops
● 0.151976624315 fraction_n2
● 0.126571252655 fraction_n1
● 0.126321244854 cosine
● 0.0828860325682 pref_attachment
● 0.0709010797719 indegree_j
● 0.0660478462424 hopslog
● 0.0649419577136 adamic
● 0.0531705297389 common
● 0.0372079185808 jaccard
● 0.0344545039974 reverse_edge

As given by Gradient Boosted Regression Trees. This ranking should be treated only as an indication, because features such as fraction_n2, fraction_n1, and jaccard are dependent on each other, while features such as cosine similarity don't depend on the other features.
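For context, this kind of importance ranking can be obtained from scikit-learn's gradient-boosted trees as sketched here; the synthetic X and y only stand in for the real labelled feature rows from the previous slides.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

feature_names = ["hops", "hopslog", "common", "jaccard", "cosine", "adamic",
                 "indegree_j", "fraction_n2", "fraction_n1", "pref_attachment",
                 "reverse_edge"]

# Stand-in data; in the real pipeline each row is one (user, candidate) pair.
rng = np.random.RandomState(0)
X = rng.rand(1000, len(feature_names))
y = rng.randint(0, 2, size=1000)

model = GradientBoostingClassifier(n_estimators=100).fit(X, y)
for importance, name in sorted(zip(model.feature_importances_, feature_names),
                               reverse=True):
    print(round(importance, 4), name)
```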
Extracting Features
● More features that could be considered in the future:
    ● Facebook friend Boolean, PageRank score, geographic distance, age difference, …
Machine Learning Models
● Tried Logistic Regression, SVMs, and Random Forests; in the end, Gradient Boosted Decision Trees gave the best performance (68-69%).
● However, the model learned depends on the current module that is serving WTF recs.
● Once pushed to production, the model can learn from a better training set.
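A minimal sketch of that comparison, using scikit-learn equivalents of the classifiers named above. The data here is synthetic and stands in for the labelled (user, candidate) feature rows, so the printed scores only demonstrate the procedure, not the 68-69% result.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(2000, 11)            # stand-in for the 11 features per candidate pair
y = rng.randint(0, 2, size=2000)  # stand-in for the positive/negative labels

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "gbdt": GradientBoostingClassifier(n_estimators=100),
}
for name, model in models.items():
    accuracy = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(name, round(accuracy, 3))
```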
Results from Testing with Spotify Employees
● Total records: 1251
● Yes / Total = 22.14%
● "Yes, and I know the recommendation" / Total responses where users knew their recommendation = 61.11%
● "Yes, and I like the person's musical taste" / Total responses where users liked their recommendation's taste = 61.36%
● "Yes, I like and know the recommended user" / Total people who liked and knew their recommendation = 78.57%
● "Yes, I like the user's taste but I don't know the user" / Total people who liked the taste but didn't know their recommendation = 35.7%
● "Yes, I know the user but dislike the user's taste" / Total people who disliked the taste but knew their recommendation = 17.8%
Optimizations:
● First, I converted each userID into an integer, loaded the entire dataset into memory, and then did the computation.
● This was very difficult to convert to multiprocessing code (each process tried to make a copy of the graph, which was not possible, and creating a shared object was very slow).
● The best option was to use a database, because only retrieval needed to be done (sketched after this list).
● Sparkey was preferred over Tokyo Cabinet because the time to construct the index was much lower.
● 1 process: very, very slow, 10 users per second
    ● Bound by the call to the Open Graph API for Spotify users' FB friends
● 100 processes: 92.6 users per second, 1 million users in 180 minutes
● 150 processes: 116.7 users per second, 1.8 million users in 257 minutes
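A rough sketch of the final setup, with the standard-library dbm module standing in for Sparkey and GRAPH_DB / process_user as purely illustrative names: each worker opens its own read-only handle to the on-disk graph instead of copying an in-memory graph.

```python
import dbm
import json
from multiprocessing import Pool

GRAPH_DB = "graph.db"   # stand-in on-disk store: userID -> JSON list of followees
_reader = None          # per-process read-only handle, opened in the initializer

def init_worker():
    global _reader
    _reader = dbm.open(GRAPH_DB, "r")   # read-only, so every process can open it

def followees(user_id):
    key = str(user_id).encode()
    return set(json.loads(_reader[key])) if key in _reader else set()

def process_user(user_id):
    # placeholder for the real per-user work (candidate generation + ranking)
    return user_id, len(followees(user_id))

if __name__ == "__main__":
    # Build a tiny stand-in store so the sketch runs end to end.
    with dbm.open(GRAPH_DB, "c") as db:
        for uid in range(1000):
            db[str(uid).encode()] = json.dumps([(uid + k) % 1000 for k in (1, 2, 3)]).encode()
    with Pool(processes=8, initializer=init_worker) as pool:   # 100-150 in practice
        for user_id, n in pool.imap_unordered(process_user, range(1000), chunksize=50):
            pass   # collect or write out candidate scores here
```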
Resources
● Seminal paper by Kleinberg: http://www.cs.cornell.edu/home/kleinber/link-pred.pdf
● Supervised learning: http://www3.nd.edu/~dial/papers/KDD10.pdf
● Twitter: http://www.stanford.edu/~rezab/papers/wtf_overview.pdf
    ● Twitter's WTF problem is pretty similar to ours: asymmetric follows

Future:
● Supervised Random Walks: http://cs.stanford.edu/people/jure/pubs/linkpredwsdm11.pdf
● Large-scale Twitter: http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
● Fast PageRank: http://arxiv.org/abs/1006.2880
Thank YOU!
Questions?
