1. An Homophily-based Approach for Fast Post
Recommendation in Microblogging Systems
Quentin Grossetti (1,2)
PhD supervised by Cédric du Mouza (2),
Camelia Constantin (1) and Nicolas Travers (2)
(1) LIP6 - Université Pierre et Marie Curie - Paris, France
(2) CEDRIC Laboratory - CNAM - Paris, France
RecSys 8th - 2018
An Homophily-based Approach for Fast Post Recommendation in Microblogging Systems RecSys 8th - 2018 1 / 25
2. Introduction
Context
Growth of microblogging platforms since 2000
700 million messages/day in 2017
300 million messages/day in 2017
70 million publications/day in 2017
70 million pictures/day in 2017
4. Problem
How can we connect users to relevant messages on these platforms?
Can we use traditional models?
9. State of the art
State of the art
Random-walk models (GraphJet)
Method | Pros | Cons
Content-based | No need for interactions | Tweets are hard to describe
Collaborative filtering | Simple model and good results | Matrix too large
Matrix factorization | Efficient against sparsity | Matrix grows fast
Social systems | Increase user engagement | Edges carry little meaning
Random-walk models | Very cheap | Low memory
10. Data Analysis
Data Analysis
Dataset
Updated connected component of the graph from [Kwak (2009)].
Characteristic | Value
Number of nodes | 2,182,867
Number of edges | 325,451,980
Number of tweets | 2,571,173,369
Avg. out-degree | 57.8
Avg. in-degree | 69.4
Max. out-degree | 348,595
Max. in-degree | 185,401
Diameter | 15
Average shortest path | 3.7
Table – Twitter dataset characteristics
11. Data Analysis Retweets
Data Analysis
Retweets
Figure – Distribution of the number of retweets per tweet (x-axis: number of retweets, bucketed 0, 1, 2-5, 6-50, 51-200, 201-500, 500+; y-axis: number of tweets, log scale from 10^3 to 10^10)
92% of tweets are never retweeted
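A minimal sketch of how such a distribution and the share of never-retweeted tweets could be computed, assuming a hypothetical list `retweet_counts` holding the number of retweets of each tweet. The bucket bounds follow the figure's x-axis; since the figure shows both "201-500" and "500+", the sketch assumes 500 itself falls in "201-500".

```python
from collections import Counter

# Buckets from the figure's x-axis: (label, low, high), bounds inclusive.
# Assumption: "500+" starts at 501, so 500 belongs to "201-500".
BUCKETS = [("0", 0, 0), ("1", 1, 1), ("2-5", 2, 5), ("6-50", 6, 50),
           ("51-200", 51, 200), ("201-500", 201, 500), ("500+", 501, None)]

def bucket(count):
    """Return the label of the bucket containing a retweet count."""
    for label, low, high in BUCKETS:
        if count >= low and (high is None or count <= high):
            return label
    raise ValueError(f"negative retweet count: {count}")

def retweet_distribution(retweet_counts):
    """Histogram of tweets per bucket, plus the fraction never retweeted."""
    histogram = Counter(bucket(c) for c in retweet_counts)
    never = histogram["0"] / len(retweet_counts) if retweet_counts else 0.0
    return histogram, never

# Hypothetical toy data, not the dataset described above.
hist, never = retweet_distribution([0, 0, 0, 1, 3, 7, 600])
print(hist["0"], round(never, 2))  # prints: 3 0.43
```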
12. Data Analysis Retweets
Data Analysis
Lifespan
Figure – Lifespan of a message (x-axis: lifespan in hours, from 10 to 1,000; y-axis: number of messages, log scale from 10^2 to 10^7)
< 1 hour: 40%
< 3 days: 90%
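The slide does not define lifespan. The sketch below assumes it is the time between a message's publication and its last retweet, skips messages that were never retweeted (also an assumption), and computes the share of messages below the 1-hour and 3-day marks. The `messages` structure and function names are hypothetical.

```python
def lifespan_hours(published_at, retweet_times):
    """Assumed definition: time from publication to the last retweet, in hours."""
    return max(retweet_times) - published_at

def share_below(messages, threshold_hours):
    """Fraction of retweeted messages whose lifespan is below the threshold.

    messages: list of (published_at, [retweet times]), all times in hours.
    Messages with no retweet are skipped (assumption).
    """
    spans = [lifespan_hours(pub, rts) for pub, rts in messages if rts]
    if not spans:
        return 0.0
    return sum(span < threshold_hours for span in spans) / len(spans)

# Hypothetical toy data, not the dataset described above.
messages = [(0.0, [0.2, 0.5]), (10.0, [30.0]), (5.0, [5.5, 100.0]), (2.0, [])]
print(share_below(messages, 1), share_below(messages, 72))  # 1 hour, 3 days
```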
13. Data Analysis Homophily
Data Analysis
Homophily
Distance | Number of users | Percentage | Average similarity
1 | 19,163 | 5.96% | 0.0056
2 | 121,857 | 37.91% | 0.0021
3 | 166,633 | 51.84% | 0.0017
4 | 12,070 | 3.76% | 0.0018
5 | 297 | 0.09% | 0.0016
6 | 6 | 0.01% | 0.0019
Unreachable (no path) | 1,396 | 0.43% | 0.0023
Table – Evolution of the similarity score with distance in the network
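$$sim(u, v) = \frac{\sum_{i \in L_u \cap L_v} \frac{1}{\log(1 + pop(i))}}{|L_u \cup L_v|} \qquad (1)$$
where L_u is the set of messages shared by u and pop(i) is the popularity (number of shares) of message i.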
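Equation (1) translates directly into code. A minimal sketch, assuming the sets L_u and L_v are given as Python sets of message ids and pop(i) as a dict; the natural logarithm is an assumption, since the slide does not give the base.

```python
import math

def sim(shared_u, shared_v, pop):
    """Similarity of Eq. (1).

    shared_u, shared_v: sets of message ids shared by u and v (L_u, L_v).
    pop: dict message id -> popularity (number of shares).
    Natural log assumed; returns 0.0 when neither user shared anything.
    """
    union = shared_u | shared_v
    if not union:
        return 0.0
    common = shared_u & shared_v
    # Rare messages shared in common weigh more than popular ones.
    return sum(1.0 / math.log(1 + pop[i]) for i in common) / len(union)

pop = {"a": 1, "b": 2, "c": 100, "d": 1}
print(sim({"a", "b", "c"}, {"b", "c", "d"}, pop))
```

The inverse-log weighting is in the spirit of the Adamic-Adar index. Under the natural-log assumption, and if pop(i) counts shares, any message shared by both u and v has pop(i) ≥ 2, so each term is at most 1/ln 3 ≈ 0.91 and sim(u, v) < 1; with a base-10 logarithm that bound would not hold.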
14. Data Analysis Homophily
Data Analysis
Homophily
Figure – Average similarity score by position in the ranking (x-axis: position in the ranking, 0 to 25; y-axis: average score, ×10^-2)
Rank | Average distance | Distance 1 (%) | Distance 2 (%) | Distance 3 (%) | Distance 4 (%)
1 | 1.65 | 53.30 | 28.20 | 16.65 | 1.45
2 | 1.78 | 43.70 | 34.50 | 20.50 | 1.05
3 | 1.88 | 37.99 | 36.04 | 24.37 | 1.35
4 | 1.97 | 33.18 | 36.99 | 27.68 | 1.70
5 | 1.99 | 32.01 | 37.93 | 28.20 | 1.56
Table – Link between distance in the network and position in the Top-N ranking
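A sketch of how a table like the one above could be produced: for each user, rank the other users by similarity, then record the hop distance, in the follow graph, of the user found at each rank. The adjacency-list input, the precomputed `scores` dict, and all function names are assumptions for illustration.

```python
from collections import Counter, defaultdict, deque

def bfs_distances(graph, source):
    """Hop distance from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def distance_by_rank(graph, scores, top_n=5):
    """Per rank r (1..top_n), count the network distance of the r-th most
    similar user. scores: dict user -> dict other_user -> similarity.
    Unreachable users are counted under the key None."""
    per_rank = defaultdict(Counter)
    for user, sims in scores.items():
        dist = bfs_distances(graph, user)
        ranked = sorted(sims, key=sims.get, reverse=True)[:top_n]
        for rank, other in enumerate(ranked, start=1):
            per_rank[rank][dist.get(other)] += 1
    return per_rank

def average_distance(counter):
    """Mean distance over reachable users only (None keys ignored)."""
    reachable = {d: c for d, c in counter.items() if d is not None}
    total = sum(reachable.values())
    return sum(d * c for d, c in reachable.items()) / total if total else float("nan")

# Hypothetical toy follow graph and similarity scores.
graph = {"u": ["a"], "a": ["b"], "b": ["c"]}
scores = {"u": {"a": 0.002, "b": 0.005, "c": 0.001, "z": 0.003}}
ranks = distance_by_rank(graph, scores, top_n=4)
```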
15. Data Analysis Homophily
Data Analysis
Conclusions
Several conclusions can be drawn from this analysis:
Freshness is crucial (messages die very fast)
⇒ real-time recommendation
Few users have a high similarity
⇒ use transitivity
The distance-2 neighborhood successfully gathers important users
⇒ rely on this homophily
16. Approach Similarity graph
Similarity Graph
Building process
Figure – Twitter Graph (example with users U, V, W, X, Y, Z and Z1 to Z4)
20. Approach Similarity graph
Similarity Graph
Building process
Figure – Similarity Graph (U is linked to V, Y and Z1 by edges weighted sim(u, v), sim(u, y) and sim(u, z1))
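One possible way to build such a graph, sketched under assumptions rather than as the method of the slides: candidates for each user are restricted to users at most 2 hops away in the follow graph (following the homophily analysis), and only the k most similar candidates are kept as weighted edges. The parameters `k` and `max_hops` and all function names are hypothetical; the similarity of Eq. (1) is re-implemented (natural log assumed) so the block runs on its own.

```python
import heapq
import math

def sim(shared_u, shared_v, pop):
    """Eq. (1), natural log assumed."""
    union = shared_u | shared_v
    if not union:
        return 0.0
    return sum(1.0 / math.log(1 + pop[i]) for i in shared_u & shared_v) / len(union)

def neighborhood(follows, user, max_hops=2):
    """Users reachable from `user` in 1..max_hops hops of the follow graph."""
    seen, frontier = {user}, {user}
    for _ in range(max_hops):
        frontier = {n for f in frontier for n in follows.get(f, ())} - seen
        seen |= frontier
    return seen - {user}

def build_sim_graph(follows, shares, pop, k=10, max_hops=2):
    """Return dict user -> [(neighbor, similarity)], top-k by similarity.

    follows: dict user -> iterable of followed users
    shares: dict user -> set of shared message ids (L_u)
    pop: dict message id -> number of shares
    """
    graph = {}
    for user in follows:
        scored = []
        for other in neighborhood(follows, user, max_hops):
            s = sim(shares.get(user, set()), shares.get(other, set()), pop)
            if s > 0:  # no shared message: no edge
                scored.append((s, other))
        graph[user] = [(other, s) for s, other in heapq.nlargest(k, scored)]
    return graph

# Hypothetical toy data.
follows = {"u": ["v"], "v": ["y"], "y": []}
shares = {"u": {"m1", "m2"}, "v": {"m1"}, "y": {"m2"}}
pop = {"m1": 2, "m2": 3}
print(build_sim_graph(follows, shares, pop, k=1))
```

Keeping only the top-k neighbors bounds the out-degree of the similarity graph, which keeps later propagation cheap.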
21. Approach Similarity graph
Propagation Model
In a nutshell
$$p(u, t) = \frac{\sum_{v \in F_u} p(u \leftarrow v, t)}{|F_u|} \qquad (2)$$
where F_u is the set of users influential to u, and p(u ← v, t) is an estimate, derived from the behavior of user v, of the probability that u likes t:
$$p(u \leftarrow v, t) = p(v, t) \times sim(u, v) \qquad (3)$$
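Equations (2) and (3) can be read as one update for a single user. A minimal sketch, assuming a hypothetical dict `influencers` mapping each user u to {v: sim(u, v)} for v in F_u, and current estimates in `p`. The toy graph reproduces the worked example on the following slides; the users "other1"/"other2" and their weights are placeholders for the second influencer whose p is 0.

```python
def estimate(user, influencers, p):
    """p(u, t) from Eq. (2), with p(u <- v, t) = p(v, t) * sim(u, v) (Eq. 3).

    influencers: dict user -> {v: sim(u, v)} over F_u
    p: dict user -> current p(v, t); users missing from p count as 0
    """
    f_u = influencers.get(user, {})
    if not f_u:
        return 0.0
    return sum(p.get(v, 0.0) * s for v, s in f_u.items()) / len(f_u)

# W is influenced by X (sim 0.5) and by a placeholder user with p = 0;
# U is influenced by W (sim 0.5) and by another placeholder with p = 0.
influencers = {
    "w": {"x": 0.5, "other1": 0.3},
    "u": {"w": 0.5, "other2": 0.1},
}
p = {"x": 1.0}                        # X shares/likes t1
p["w"] = estimate("w", influencers, p)
p["u"] = estimate("u", influencers, p)
print(p["w"], p["u"])  # prints: 0.25 0.0625
```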
22. Approach Similarity graph
Propagation Model
Example
Figure – Propagation example (users U, V, W, X, Y; similarity edges weighted 0.3, 0.5, 0.1, 0.5, 0.4 and 0.8)
23. Approach Propagation Model
Propagation Model
Example
Figure – Propagation example - a tweet t1 is published
24. Approach Propagation Model
Propagation Model
Example
Figure – Propagation example - X shares/likes t1
$$p(x, t_1) = 1$$
25. Approach Propagation Model
Propagation Model
Example
Figure – Propagation example - Propagation
$$p(w, t_1) = \frac{\sum_{v \in F_w} p(w \leftarrow v, t_1)}{|F_w|} = \frac{0 + 1 \times 0.5}{2} = 0.25$$
26. Approach Propagation Model
Propagation Model
Example
Figure – Propagation example - Propagation
$$p(u, t_1) = \frac{0.25 \times 0.5}{2} = 0.0625$$
27. Approach Propagation Model
Propagation Model
Convergence
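Let u_1, u_2, ..., u_n be the n users:
$$\begin{cases} a_{11} p(u_1) + a_{12} p(u_2) + \dots + a_{1n} p(u_n) = b_1 \\ a_{21} p(u_1) + a_{22} p(u_2) + \dots + a_{2n} p(u_n) = b_2 \\ \quad \vdots \\ a_{n1} p(u_1) + a_{n2} p(u_2) + \dots + a_{nn} p(u_n) = b_n \end{cases}$$
This can also be written as Ap = b, where the rows and columns of A are indexed by u_1, ..., u_n:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}, \quad p = \begin{pmatrix} p(u_1) \\ p(u_2) \\ \vdots \\ p(u_n) \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$
Because sim(u, v) ≤ 1 for all u, v, we have |a_{ii}| ≥ \sum_{j \neq i} |a_{ij}| for every i: the matrix A is diagonally dominant.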
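One way to see where A and b come from, sketched under the assumption that users who shared t have their probability fixed to 1, every other user follows Eqs. (2) and (3), and no user belongs to its own F_u:

```latex
% Non-sharer u_i: move the terms of Eqs. (2)-(3) to the left-hand side
p(u_i) - \sum_{u_j \in F_{u_i}} \frac{sim(u_i, u_j)}{|F_{u_i}|}\, p(u_j) = 0
\;\Rightarrow\;
a_{ii} = 1, \quad
a_{ij} = \begin{cases}
  -\dfrac{sim(u_i, u_j)}{|F_{u_i}|} & \text{if } u_j \in F_{u_i} \\
  0 & \text{otherwise}
\end{cases}, \quad
b_i = 0

% Sharer u_i: p(u_i) = 1, i.e. a_{ii} = 1, a_{ij} = 0 for j \neq i, b_i = 1

% Row sums, using sim(u_i, u_j) \le 1:
\sum_{j \neq i} |a_{ij}|
  = \frac{1}{|F_{u_i}|} \sum_{u_j \in F_{u_i}} sim(u_i, u_j)
  \le 1 = |a_{ii}|
```

With sim ≤ 1 this is only weak dominance. The classical convergence guarantees for Jacobi and Gauss-Seidel iterations assume strict (or irreducible) diagonal dominance, which holds here, for example, as soon as every similarity is strictly below 1.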
29. Approach Propagation Model
Propagation Model
Optimizations
Speed up the convergence
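Let $$\Delta(u, t) = p^{(k+1)}(u, t) - p^{(k)}(u, t)$$ be the change of p(u, t) between iterations k and k + 1
If ∆(u, t) < β, we stop the propagation
Limitation of popular messages
If p(u, t) < f(t), there is no need to propagate, with
$$f(t) = 1 - \frac{k^p}{k^p + pop(t)^p}$$
where k and p are parameters of f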
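The two optimizations can be combined into a queue-based propagation. A sketch under stated assumptions: the update is Eqs. (2)-(3); a user's new estimate is recorded only when it grows by at least β; a recorded estimate below f(t) is kept but not propagated further. The parameter values, the names `propagate` and `f`, and `p_exp` (the exponent p of f(t), renamed to avoid a clash with the probability p(u, t)) are hypothetical.

```python
from collections import defaultdict, deque

def f(pop_t, k=10.0, p_exp=2.0):
    """Popularity threshold f(t) = 1 - k^p / (k^p + pop(t)^p)."""
    return 1.0 - k ** p_exp / (k ** p_exp + pop_t ** p_exp)

def propagate(influencers, sharers, pop_t, beta=1e-3, k=10.0, p_exp=2.0):
    """Estimate p(u, t) for users reachable from the sharers of t.

    influencers: dict u -> {v: sim(u, v)} for v in F_u (sim assumed <= 1)
    sharers: users who shared/liked t; their p(u, t) is fixed to 1
    """
    influenced = defaultdict(list)       # reversed edges: v -> users it influences
    for u, f_u in influencers.items():
        for v in f_u:
            influenced[v].append(u)

    threshold = f(pop_t, k, p_exp)
    p = {s: 1.0 for s in sharers}
    queue = deque(u for s in sharers for u in influenced[s])
    while queue:
        u = queue.popleft()
        if u in sharers:
            continue
        f_u = influencers[u]
        new = sum(p.get(v, 0.0) * s for v, s in f_u.items()) / len(f_u)
        # Estimates only grow from 0 (non-negative weights), so this is Delta(u, t).
        if new - p.get(u, 0.0) < beta:
            continue                     # Delta(u, t) < beta: stop here
        p[u] = new
        if new >= threshold:             # below f(t): keep, but do not propagate
            queue.extend(influenced[u])
    return p

influencers = {"w": {"x": 0.5, "other1": 0.3}, "u": {"w": 0.5, "other2": 0.1}}
print(propagate(influencers, {"x"}, pop_t=1))
```

Because each accepted update raises some estimate by at least β and estimates stay at most 1 when sim ≤ 1, the loop performs finitely many updates, even on graphs with cycles.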
30. Experiments Protocol
Experiments
Protocol
34 million messages shared at least twice (130M retweet actions)
Split the ranked set into 90% (training) and 10% (test)
Try to predict this 10% for 1,500 random users
Comparison with:
CF: naive collaborative filtering
Bayes: probabilistic model
GraphJet: the solution used at Twitter
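A sketch of the evaluation protocol under assumptions the slide does not spell out: retweet actions are (user, message, timestamp) tuples, the 90%/10% split is chronological, and a hit is a recommended message that the user actually retweets in the test part. The recommender is a hypothetical callable `recommend(user, n)`.

```python
def split_actions(actions, train_ratio=0.9):
    """Sort (user, message, timestamp) actions by time and split them."""
    ordered = sorted(actions, key=lambda a: a[2])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]

def count_hits(recommend, test_actions, users, n):
    """Total hits over `users`: recommended messages retweeted in the test part.

    recommend(user, n) is assumed to return up to n message ids.
    """
    relevant = {}
    for user, message, _ in test_actions:
        relevant.setdefault(user, set()).add(message)
    return sum(len(set(recommend(user, n)) & relevant.get(user, set()))
               for user in users)

# Hypothetical toy data: one user retweets m0..m9 at times 0..9.
actions = [("u1", f"m{i}", i) for i in range(10)][::-1]
train, test = split_actions(actions)
print(len(train), test)
```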
31. Experiments Results
Experiments
Hits
Figure – Hits for 1,500 users (x-axis: number of daily recommendations per user, 20 to 200; y-axis: number of hits, ×10^4; methods: Bayes, CF, GraphJet, SimGraph)
Linear growth of CF
Fast growth for SimGraph
GraphJet plateaus around 5,000 hits
32. Experiments Results
Experiments
Hits accuracy
Figure – Hits popularity (x-axis: number of daily recommendations per user, 20 to 200; y-axis: average number of shares, log scale; methods: Bayes, CF, GraphJet, SimGraph)
Bayes targets close messages
GraphJet targets popular messages
CF and SimGraph mix both popular and close messages
33. Experiments Results
Experiments
F1 scores
Figure – F1 scores (x-axis: number of daily recommendations per user, 20 to 200; y-axis: F1 score, ×10^-2; methods: Bayes, CF, GraphJet, SimGraph)
Small values
Peak around 20 recommendations per user
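For reference, a sketch of the F1 score computed from hit counts, assuming precision = hits / number of recommendations made and recall = hits / number of test retweets; the slide does not state its exact definitions.

```python
def f1_score(hits, n_recommended, n_relevant):
    """Harmonic mean of precision and recall (assumed definitions):
    precision = hits / n_recommended, recall = hits / n_relevant."""
    if hits == 0:
        return 0.0
    precision = hits / n_recommended
    recall = hits / n_relevant
    return 2 * precision * recall / (precision + recall)

print(f1_score(5, 20, 30))
```

Algebraically, F1 = 2 · hits / (n_recommended + n_relevant): once extra recommendations bring few new hits, the denominator grows faster than the numerator and F1 decreases.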
34. Experiments Results
Experiments
Running time
Bayes, CF and SimGraph (recommendation cost per message; 13,238,941 tweets over the trial period):
Method | Init. (per user) | Init. total (1,149,374 users) | Time per message | Reco. total (70 cores in parallel) | Total (init + recos)
Bayes | 10 ms | 0.04 h | 975 ms | 51.22 h | 51.26 h
CF | 8,583 ms | 39.40 h | 0.5 ms | 0.02 h | 41.01 h
SimGraph | 311 ms | 1.41 h | 38 ms | 2.00 h | 3.41 h
GraphJet (recommendation cost per user; 1,149,374 users × 66 days over the trial period):
Method | Init. (per user) | Init. total | Time per user | Reco. total (70 cores in parallel) | Total (init + recos)
GraphJet | 0 ms | 0 h | 14 ms | 4.2 h | 4.2 h
Table – Initialization and recommendation time (per-user and per-message times in ms, totals in hours)
35. Experiments Updating strategies
Experiments
Updating strategies
How should SimGraph be updated?
Split the last 10% into two halves
Evaluate the impact of four strategies on hit prediction for the remaining 5%:
do nothing
recompute everything
update only weights
crossfold
36. Experiments Updating strategies
Experiments
Updating strategies
Figure – Hits / updating strategies (x-axis: number of daily recommendations per user, 20 to 200; y-axis: number of hits, 0 to 6,000; strategies: recompute everything, do nothing, crossfold, update weights)
Doing nothing gives the same results as updating weights
Crossfold (very cheap) works very well
37. Experiments Updating strategies
Experiments
Convergence property of the SimGraph
Iteration | Number of edges
1 | 4,950,417
2 | 7,519,031
3 | 10,836,129
4 | 11,496,445
5 | 11,678,747
Table – Evolution of the number of edges through iterations
38. Conclusion
Conclusion
Method relying on homophily to find nearest neighbors at low cost
Use transitivity to fight high sparsity
Our model outperforms state-of-the-art solutions
Low-cost updates
39. Conclusion
Thanks for your attention!
41. Annexes
Annexes
Characteristics
Characteristic | Twitter network | Similarity graph
Number of nodes | 2,182,867 | 1,149,374
Number of edges | 325,451,980 | 4,950,417
Avg. similarity score | – | 0.008
Mean out-degree | 57.8 | 5.9
Table – Similarity Graph Characteristics
42. Annexes
Annexes
Topology
Figure – Twitter smallest paths distribution (x-axis: shortest path length, 1 to 15; y-axis: number of paths, log scale from 10^0 to 10^10)
Small world with an average distance of 3.7
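Such a distribution can be estimated with breadth-first searches from a random sample of source nodes, since running a BFS from every one of 2 million nodes is costly. A sketch; the sampling approach, the adjacency-list input, and the function names are assumptions. On a directed graph it counts directed shortest paths, and the largest distance observed from sampled sources is only a lower bound on the diameter.

```python
import random
from collections import Counter, deque

def shortest_path_histogram(graph, sources):
    """Count shortest-path lengths from each source to every reachable node."""
    hist = Counter()
    for src in sources:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        hist.update(d for d in dist.values() if d > 0)
    return hist

def sampled_stats(graph, n_sources, seed=0):
    """Average distance and largest observed distance (a lower bound on the
    diameter) from BFS over a random sample of source nodes."""
    rng = random.Random(seed)
    nodes = list(graph)
    sources = rng.sample(nodes, min(n_sources, len(nodes)))
    hist = shortest_path_histogram(graph, sources)
    total = sum(hist.values())
    avg = sum(d * c for d, c in hist.items()) / total if total else 0.0
    return avg, max(hist, default=0), hist

# Hypothetical toy graph: a -> b -> c.
graph = {"a": ["b"], "b": ["c"], "c": []}
avg, longest, hist = sampled_stats(graph, n_sources=3)
print(avg, longest)
```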
43. Annexes
Annexes
Lifespan and popularity
Figure – Correlation between lifespan and popularity (x-axis: average lifespan in hours, log scale from 10^0 to 10^4; y-axis: average number of retweets, log scale from 10^0 to 10^4)
Strong correlation up to 10^3 hours
After a month, the correlation fades
44. Annexes
Annexes
Topology
Figure – Smallest path distribution for the similarity graph (x-axis: shortest distance; y-axis: number of paths, log scale from 10^0 to 10^8)
Diameter of 21, with an average path length of 7.5
45. Annexes
Annexes
Similarities
Figure – Evolution of the similarity score (x-axis: position in the ranking, 0 to 25; y-axis: average score, ×10^-2)
Very low scores
The curve breaks after the fifth most similar user
46. Annexes
Annexes
Hits according to user profiles
Figure – Hits for 500 small users, 500 medium users and 500 big users (three panels; x-axis: number of daily recommendations per user, 20 to 200; y-axis: number of hits; methods: Bayes, CF, GraphJet, SimGraph)
Small: < 50; medium: < 1,000; big: > 1,000
Trends are very stable regardless of the user profile
47. Annexes
Annexes
Intersections
Figure – Parts of hits included in SimGraph (x-axis: number of daily recommendations per user, 20 to 200; y-axis: ratio of hits in common with SimGraph, 0 to 1; methods: Bayes, CF, GraphJet, SimGraph)
SimGraph merges all the methods
48. Annexes
Annexes
Number of recommendations
Figure – Recall capacity (x-axis: number of daily recommendations per user, 20 to 200; y-axis: number of actual recommendations, 0 to 140; methods: Bayes, CF, GraphJet, SimGraph)
CF is less limited
Other methods are bunched together
Threshold effect for SimGraph and Bayes