An Homophily-based Approach for Fast Post Recommendation in Microblogging Systems - Quentin Grossetti - LIP6

RecSysFR 8th meetup. See video and more details at http://recommenders.fr/index.php/2018/02/12/recsys-fr-meetup-8th-edition-videos-slides-available/

Published in: Science

  1. 1. An Homophily-based Approach for Fast Post Recommendation in Microblogging Systems. Quentin Grossetti (1,2). PhD supervised by Cédric du Mouza (2), Camelia Constantin (1) and Nicolas Travers (2). (1) LIP6 - Université Pierre Marie Curie - Paris, France. (2) CEDRIC Laboratory - CNAM - Paris, France. RecSys 8th - 2018.
  2. 2. Introduction: Context. Growth of microblogging platforms since 2000. In 2017: 700 million messages/day; 300 million messages/day; 70 million publications/day; 70 million pictures/day.
  3. 3. Problem. How to connect users to relevant messages on those platforms? Can we use traditional models?
  5. 5. State of the art.

      Method                          Pros                            Cons
      Content-based                   No need for interactions        Tweets are hard to describe
      Collaborative filtering         Simple model and good results   Matrix too large
      Matrix factorization            Efficient against sparsity      Matrix grows fast
      Social systems                  Increase user engagement        Low meaning on edges
      Random walk models (GraphJet)   Very cheap                      Low memory
  10. 10. Data Analysis: Dataset. Updated connected component from the graph found in [Kwak (2009)].

      Characteristic          Value
      No. of nodes            2,182,867
      No. of edges            325,451,980
      No. of tweets           2,571,173,369
      Avg. out-degree         57.8
      Avg. in-degree          69.4
      Max out-degree          348,595
      Max in-degree           185,401
      Diameter                15
      Average shortest path   3.7

      Table – Twitter dataset characteristics
  11. 11. Data Analysis: Retweets. [Figure – Distribution of the number of retweets per tweet. x-axis: number of retweets (bins 0, 1, 2-5, 6-50, 51-200, 201-500, 500+); y-axis: number of tweets (log scale, 10^3 to 10^10).] 92% of tweets are never retweeted.
  12. 12. Data Analysis: Lifespan. [Figure – Lifespan of a message. x-axis: lifespan in hours (10 to 1,000); y-axis: number of messages (log scale, 10^2 to 10^7).] Lifespan under 1 hour: 40% of messages; under 3 days: 90%.
  13. 13. Data Analysis: Homophily.

      Distance     Nb of users   Perc.     Average similarity
      1                 19,163    5.96%    0.0056
      2                121,857   37.91%    0.0021
      3                166,633   51.84%    0.0017
      4                 12,070    3.76%    0.0018
      5                    297    0.09%    0.0016
      6                      6    0.01%    0.0019
      Impossible         1,396    0.43%    0.0023

      Table – Evolution of the similarity score through distance in the network

      $$\mathrm{sim}(u, v) = \frac{\sum_{i \in L_u \cap L_v} \frac{1}{\log(1 + \mathrm{pop}(i))}}{|L_u \cup L_v|} \quad (1)$$
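Equation (1) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The slide does not define its symbols, so the code assumes that L_u is the set of messages user u has shared or liked, that pop(i) is the number of users who shared message i, and that the logarithm is natural. The function name and the toy data are hypothetical.

```python
import math


def similarity(items_u, items_v, popularity):
    """Equation (1): popularity-weighted overlap of two users' message sets.

    items_u, items_v: sets of message ids (assumed meaning of L_u and L_v).
    popularity: dict message id -> pop(i), assumed to be a share count >= 1.
    """
    union = items_u | items_v
    if not union:
        return 0.0
    # A common message counts for less when it is popular.
    weighted_overlap = sum(
        1.0 / math.log(1 + popularity[i]) for i in items_u & items_v
    )
    return weighted_overlap / len(union)


# Hypothetical toy data: two users with one message in common out of three.
pop = {"t1": 2, "t2": 1000, "t3": 5}
alice = {"t1", "t2"}
bob = {"t1", "t3"}
score = similarity(alice, bob, pop)
```

Under these assumptions, two users who both shared a rare message end up more similar than two users who both shared a very popular one, which is the effect the logarithmic weight is meant to produce.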
  14. 14. Data Analysis: Homophily. [Figure: x-axis: position in the ranking (0 to 25); y-axis: average score (×10^-2).]

                                Distances distribution (%)
      Rank   Average distance   1       2       3       4
      1      1.65               53.30   28.20   16.65   1.45
      2      1.78               43.70   34.50   20.50   1.05
      3      1.88               37.99   36.04   24.37   1.35
      4      1.97               33.18   36.99   27.68   1.70
      5      1.99               32.01   37.93   28.20   1.56

      Table – Link between distance in the network and position in the Top-N ranking
  15. 15. Data Analysis: Conclusions. Several conclusions follow from this analysis:
      • Freshness is crucial (messages die very fast) ⇒ real-time recommendation.
      • Few users have high similarity ⇒ use transitivity.
      • Distance 2 successfully gathers important users ⇒ rely on this homophily.
  16. 16. Approach: Similarity graph. Building process. [Figure – Twitter Graph, with nodes U, W, Z, V, Y, X, Z1, Z2, Z3, Z4.]
  20. 20. Approach: Similarity graph. Building process. [Figure – Similarity Graph, with nodes U, Y, V, Z1 and edges weighted sim(u, v), sim(u, y), sim(u, z1).]
  21. 21. Approach: Similarity graph. Propagation Model, in a nutshell.

      $$p(u, t) = \frac{\sum_{v \in F_u} p(u \leftarrow v, t)}{|F_u|} \quad (2)$$

      With $F_u$ the set of users influential to $u$, and $p(u \leftarrow v, t)$ an estimate of the probability that $u$ likes $t$, determined by the behavior of user $v$:

      $$p(u \leftarrow v, t) = p(v, t) \times \mathrm{sim}(u, v) \quad (3)$$
  22. 22. Approach: Propagation Model. Example. [Figure – Propagation example: graph over users U, W, V, Y, X with edge weights 0.3, 0.5, 0.1, 0.5, 0.4, 0.8.]
      • A tweet $t_1$ is published.
      • X shares/likes $t_1$: $p(x, t_1) = 1$.
      • Propagation to W: $p(w, t_1) = \frac{\sum_{v \in F_w} p(w \leftarrow v, t_1)}{|F_w|} = \frac{0 + 1 \times 0.5}{2} = 0.25$.
      • Propagation to U: $p(u, t_1) = \frac{0.25 \times 0.5}{2} = 0.0625$.
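The two propagation steps of this example can be reproduced with a short Python sketch of the rule p(u, t) = (sum over v in F_u of p(v, t) × sim(u, v)) / |F_u|. The figure itself is not recoverable from the transcript, so the assignment of weights to edges below is an assumption chosen to match the arithmetic on the slide (W is influenced by X with weight 0.5, U by W with weight 0.5); the names `propagate`, `influencers` and `sim` are hypothetical.

```python
def propagate(user, influencers, sim, p):
    """Average, over the influencers F_u of `user`, of p(v, t) * sim(u, v)."""
    f_u = influencers[user]
    return sum(p.get(v, 0.0) * sim[(user, v)] for v in f_u) / len(f_u)


# Assumed edge assignment; only the 0.5 weights are confirmed by the slide's arithmetic.
influencers = {"w": ["x", "y"], "u": ["w", "v"]}
sim = {("w", "x"): 0.5, ("w", "y"): 0.4, ("u", "w"): 0.5, ("u", "v"): 0.3}

p = {"x": 1.0}  # X shares/likes t1, every other user starts at 0
p["w"] = propagate("w", influencers, sim, p)
p["u"] = propagate("u", influencers, sim, p)
print(p["w"], p["u"])  # 0.25 0.0625
```

Users with no known score contribute 0, which is why only the path from X through W reaches U in this example.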
  27. 27. Approach: Propagation Model. Convergence. Let there be $n$ users $(u_1, u_2, \dots, u_n)$:

      $$\begin{cases} a_{11} p_{u_1} + a_{12} p_{u_2} + \dots + a_{1n} p_{u_n} = b_1 \\ a_{21} p_{u_1} + a_{22} p_{u_2} + \dots + a_{2n} p_{u_n} = b_2 \\ \quad \vdots \\ a_{n1} p_{u_1} + a_{n2} p_{u_2} + \dots + a_{nn} p_{u_n} = b_n \end{cases}$$

      This can also be written as $Ap = b$, where the rows and columns of $A$ are indexed by $u_1, \dots, u_n$:

      $$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{pmatrix}, \quad p = \begin{pmatrix} p(u_1) \\ p(u_2) \\ \vdots \\ p(u_n) \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}$$

      Because $\mathrm{sim}(u, v) \le 1$ for all $u, v$, we have $|a_{ii}| \ge \sum_{j \ne i} |a_{ij}|$ for every $i$: the matrix $A$ is diagonally dominant.
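Diagonal dominance matters because iterative solvers such as the Jacobi method are guaranteed to converge when the dominance is strict. The slide does not name a solver or spell out the coefficients, so the sketch below is an illustration under assumptions: the diagonal entries are 1, each off-diagonal entry is -sim(u_i, u_j) / |F_{u_i}|, and b collects the contributions of users already known to have shared the message. The toy system encodes a two-user chain in which W receives 0.25 from a sharer and U is influenced by W with weight 0.5 over two influencers.

```python
def is_diagonally_dominant(A):
    """True if |a_ii| >= sum over j != i of |a_ij| for every row i."""
    return all(
        abs(row[i]) >= sum(abs(x) for j, x in enumerate(row) if j != i)
        for i, row in enumerate(A)
    )


def jacobi(A, b, tol=1e-12, max_iter=1000):
    """Solve A p = b by Jacobi iteration, starting from p = 0."""
    n = len(b)
    p = [0.0] * n
    for _ in range(max_iter):
        new_p = [
            (b[i] - sum(A[i][j] * p[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once no component moved by more than tol.
        if max(abs(x - y) for x, y in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Unknowns: p(w), p(u). Assumed coefficients, see the paragraph above.
A = [[1.0, 0.0],
     [-0.25, 1.0]]   # -sim(u, w) / |F_u| = -0.5 / 2
b = [0.25, 0.0]      # W receives 1 * 0.5 / 2 from the user who shared
solution = jacobi(A, b)
```

Each Jacobi sweep recomputes every user's score from the neighbours' scores of the previous sweep, which mirrors one round of propagation in the graph.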
  28. 28. Approach: Propagation Model. Optimizations.
      • Speed up the convergence: let $\Delta(u, t) = p(u, t)^{k+1} - p(u, t)^{k}$, where the superscript denotes the iteration. If $\Delta(u, t) < \beta$, we stop the propagation.
      • Limitation of popular messages: if $p(u, t) < f(t)$, there is no need to propagate, with $f(t) = 1 - \frac{k^p}{k^p + \mathrm{pop}(t)^p}$.
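Both optimizations are simple threshold tests, sketched below in Python. This is an illustration under assumptions: the flattened formula on the slide is read as f(t) = 1 - k^p / (k^p + pop(t)^p), and the values of k, the exponent and beta are hypothetical since the slides give none. The function names are not from the source.

```python
def popularity_threshold(pop_t, k=100.0, exponent=2.0):
    """Assumed reading of the slide: f(t) = 1 - k^p / (k^p + pop(t)^p)."""
    return 1.0 - k ** exponent / (k ** exponent + pop_t ** exponent)


def should_propagate(p_ut, pop_t, k=100.0, exponent=2.0):
    """Keep propagating t from u only if p(u, t) reaches the threshold f(t)."""
    return p_ut >= popularity_threshold(pop_t, k, exponent)


def has_converged(p_next, p_prev, beta=1e-3):
    """Stop when the score gained in one iteration falls below beta."""
    return (p_next - p_prev) < beta


# The threshold rises with popularity: 0 for an unseen message, 0.5 at pop(t) = k.
low = popularity_threshold(0)
mid = popularity_threshold(100)
```

With this reading, a message that is already popular needs a higher score before it keeps spreading, which limits how much popular messages dominate the recommendations.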
  30. 30. Experiments: Protocol.
      • 34 million messages shared at least twice (130M retweet actions).
      • Split the ranked set 90% - 10%.
      • Try to predict this 10% for 1500 random users.
      • Comparison with: CF (naive collaborative filtering); Bayes (probabilistic model); GraphJet (the solution used by Twitter).
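The protocol can be sketched as a split followed by a hit count. The slides do not say how the set is ranked, so the code assumes a chronological order on retweet actions; the tuple layout, function names and sample data are hypothetical.

```python
def split_train_test(actions, train_ratio=0.9):
    """Split (timestamp, user, message) actions: oldest 90% to train, rest to test."""
    ordered = sorted(actions, key=lambda a: a[0])
    cut = int(len(ordered) * train_ratio)
    return ordered[:cut], ordered[cut:]


def count_hits(recommended, test_actions):
    """A hit is a recommended message that the user actually shared in the test set.

    recommended: dict user -> set of recommended message ids.
    """
    actual = {}
    for _, user, message in test_actions:
        actual.setdefault(user, set()).add(message)
    return sum(
        len(recs & actual.get(user, set())) for user, recs in recommended.items()
    )


# Hypothetical toy log of 10 retweet actions by 3 users.
actions = [(t, "u%d" % (t % 3), "m%d" % t) for t in range(10)]
train, test = split_train_test(actions)
hits = count_hits({"u0": {"m9", "m1"}, "u1": {"m2"}}, test)
```

Only the recommendation of `m9` to `u0` counts as a hit here, because it is the only recommended message that its user shares in the held-out part.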
  31. 31. Experiments: Results. Hits. [Figure – Hits for 1500 users. x-axis: number of daily recommendations per user (20 to 200); y-axis: number of hits (×10^4, 0 to 2.5); curves: Bayes, CF, GraphJet, SimGraph.]
      • Linear growth of CF.
      • Fast growth for SimGraph.
      • GraphJet stuck around 5000 hits.
  32. 32. Experiments: Results. Hits accuracy. [Figure – Hits popularity. x-axis: number of daily recommendations per user (20 to 200); y-axis: average number of shares (log scale, 10^1 to 10^2); curves: Bayes, CF, GraphJet, SimGraph.]
      • Bayes targets close messages.
      • GraphJet targets popular messages.
      • CF and SimGraph mix both popular and close messages.
  33. 33. Experiments: Results. F1 scores. [Figure – F1 Scores. x-axis: number of daily recommendations per user (20 to 200); y-axis: F1 score (×10^-2, 0 to 1); curves: Bayes, CF, GraphJet, SimGraph.]
      • Small values.
      • Peak around 20 recommendations.
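The slides do not spell out how F1 is computed; the sketch below uses the standard definition (harmonic mean of precision and recall), with hits as true positives. The function name and the numbers in the example are hypothetical.

```python
def f1_score(hits, n_recommended, n_relevant):
    """Standard F1: harmonic mean of precision and recall.

    hits: recommended messages the user actually shared.
    n_recommended: number of messages recommended.
    n_relevant: number of messages the user shared in the test set.
    """
    if hits == 0:
        return 0.0
    precision = hits / n_recommended
    recall = hits / n_relevant
    return 2 * precision * recall / (precision + recall)


# Hypothetical user: 5 hits out of 20 recommendations, 10 messages to find.
example = f1_score(5, 20, 10)
```

Under this definition, recommending more messages raises recall but dilutes precision, which is one way to read a peak at a small number of daily recommendations.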
  34. 34. Experiments: Results. Running time.

      Method     Init.       Init. total   Time per   Reco. total   Total
                 per user                  message                  (init + recos)
      Bayes      10 ms       0.04 h        975 ms     51.22 h       51.26 h
      CF         8,583 ms    39.40 h       0.5 ms     0.02 h        41.01 h
      SimGraph   311 ms      1.41 h        38 ms      2.00 h        3.41 h

      Init. total is over 1,149,374 users; reco. total is over 13,238,941 tweets (trial period) on 70 cores in parallel.

      Method     Init.       Init. total   Time per   Reco. total   Total
                 per user                  user                     (init + recos)
      GraphJet   0 ms        0 h           14 ms      4.2 h         4.2 h

      For GraphJet, reco. total is over 1,149,374 users * 66 days (trial period) on 70 cores in parallel.

      Table – Initialization and recommendation time (per-item times in ms, totals in hours)
  35. 35. Experiments: Updating strategies. How to update SimGraph? Split the last 10% in 2 and evaluate the impact on hits prediction for the remaining 5% with each strategy:
      • do nothing
      • recompute everything
      • update only weights
      • crossfold
  36. 36. Experiments: Updating strategies. [Figure – Hits / updating strategies. x-axis: number of daily recommendations per user (20 to 200); y-axis: number of hits (0 to 6,000); curves: recompute everything, do nothing, crossfold, update weights.]
      • Doing nothing is the same as updating weights.
      • Crossfold (very cheap) works very well.
  37. 37. Experiments: Updating strategies. Convergence property of the SimGraph.

      Iteration   Number of edges
      1            4,950,417
      2            7,519,031
      3           10,836,129
      4           11,496,445
      5           11,678,747

      Table – Number of edges evolution through iterations
  38. 38. Conclusion.
      • Method relying on homophily to find nearest neighbors at low cost.
      • Use transitivity to fight high sparsity.
      • Our model outperforms state-of-the-art solutions.
      • Low-cost updates.
  39. 39. Conclusion. Thanks for your attention!
  40. 40. Annexes.
  41. 41. Annexes: Characteristics.

      Characteristic          Twitter Network   Similarity Graph
      No. of nodes            2,182,867         1,149,374
      No. of edges            325,451,980       4,950,417
      Avg. similarity score   -                 0.008
      Mean out-degree         57.8              5.9

      Table – Similarity Graph Characteristics
  42. 42. Annexes: Topology. [Figure – Twitter smallest paths distribution. x-axis: smallest path (1 to 15); y-axis: number of paths (log scale, 10^0 to 10^10).] Small world with an average distance of 3.7.
  43. 43. Annexes: Lifespan and popularity. [Figure – Correlation between lifespan and popularity. x-axis: average lifespan in hours (log scale, 10^0 to 10^4); y-axis: average number of retweets (log scale, 10^0 to 10^4).]
      • Strong correlation up to 10^3 hours.
      • After a month, the correlation fades.
  44. 44. Annexes: Topology. [Figure – Smallest path distribution for the similarity graph. x-axis: shortest distance (0 to 20); y-axis: number of paths (log scale, 10^0 to 10^8).] Diameter of 21 for an average path of 7.5.
  45. 45. Annexes: Similarities. [Figure – Score similarity evolution. x-axis: position in the ranking (0 to 25); y-axis: average score (×10^-2).]
      • Really weak scores.
      • Breaks after the fifth most similar user.
  46. 46. Annexes: Hits according to user profiles. [Three figures, one per user profile: "500 small", "500 medium", "500 big users". x-axis: number of daily recommendations per user (20 to 200); y-axis: number of hits (up to 800, 6,000 and 1.5×10^4 respectively); curves: Bayes, CF, GraphJet, SimGraph.] Profiles: small < 50; medium < 1000; big > 1000. Trends are very stable regardless of the user profile.
  47. 47. Annexes: Intersections. [Figure – Share of hits included in SimGraph. x-axis: number of daily recommendations per user (20 to 200); y-axis: ratio of hits in common with SimGraph (0 to 1); curves: Bayes, CF, GraphJet, SimGraph.] SimGraph merges all the methods.
  48. 48. Annexes: Number of recommendations. [Figure – Recall capacity. x-axis: number of daily recommendations per user (20 to 200); y-axis: number of actual recommendations (0 to 140); curves: Bayes, CF, GraphJet, SimGraph.]
      • CF is less limited.
      • Other methods are bunched together.
      • Threshold effect for SimGraph and Bayes.
