Who Will Follow You                    Back? Reciprocal                    Relationship11                2                ...
reciprocal                                                       parasocialMotivation                                     ...
Example : real friend relationship   On Twitter : Who Will Follow You Back?                                     30%       ...
Several key challenges   How to model the formation of    two-way relationships?                                        y...
Outline   Previous works   Our approach   Experimental results   Conclusion & future works
Link prediction   Unsupervised link prediction        Scores & intution, such as preferential attachment [N01].   Super...
Social behavior analysis   Existing works on social behavior analysis:      The difference of the social influence on di...
Twitter study   The twitter network.        The topological and geographical properties. [J07]        Twittersphere and...
Outline   Previous works   Our approach   Experimental results   Conclusion & future works
Factor graph model   Problem definition        Given a network at time t, i.e., Gt = (Vt, Et, Xt, Yt)        Variables ...
Random Field                              Given an undirected graph G=(V,     Ya                  Yb      E) such that Y={...
Conditional random field(CRF)   Model them in a Markov random field, by the Hammersley-Clifford theorem,      P(xi|yi) =...
Maximize likelihood      Objective function         O(θ) = log Pθ(Y | X, G) = ΣiΣjαj fj (xij, yi) + ΣΣμk hk(Yc) – log Z ...
Learning algorithm           θ*    Goal : = argmax O(θ)                     L                                    (k )    ...
Learning algorithm   One challenge : how to calculate the marginal distribution Pμ (yc|x, G).                            ...
Belief propagation   Sum-product     The algorithm works by passing real values functions called “messages” along      t...
Belief propagation   A “message” from a variable node v to a factor node u is    μv->u(xv) = Πu*∈N(v){u}μu*->v(xv)   A “...
Learning algorithm(TriFG model)Input : network Gt, learning rateηOutput : estimated parametersθInitalize θ= 0;Repeat     P...
Prediction features   Geographic distance        Global vs Local   Homophily        Link homophily        Status homo...
Our approach : TriFG   TriFG model     Features based on observations     Partially labeled     Conditional random fie...
Outline   Previous works   Our approach   Experimental results   Conclusion & future works
Data collection   Huge sub-network of twitter      13,442,659 users and 56,893,234 following links.      Extracted 35,7...
Prediction performance   Baseline algorithms         SVM & LRC & CRF   Accurately infer 90% of reciprocal relationships...
Convergence property   BLP-based learning algorithm can converge in less than 10 iterations.   After only 7 learning ite...
Effect of Time Span   Distribution of time span in which follow back action is performed.      60% of follow-backs are p...
Factor contribution analysis   We consider five different factor functions        Geographic distance(G)        Link ho...
Outline   Previous works   Our approach   Experimental results   Conclusion & future works
Conclusion   Reciprocal relationship prediction in social network   Incorporates social theories into prediction model....
Future works   Other social theories for reciprocal relationship prediction.   User feedback.   Incorporating user inte...
   Thanks!   Q&A
Reference    [BL11] L.Backstrom and J.Leskovec. Supervised random walks :     predicting and recommending links in social...
Reference    [J07]A. Java, X.Song, T.Finin, and B.L. Tseng. Why we twitter : An     analysis of a microblogging community...
Upcoming SlideShare
Loading in …5
×

Who will-follow-you-back-reciprocal-relationship-prediction

276 views

Published on

Published in: Technology, Health & Medicine
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
276
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Who will-follow-you-back-reciprocal-relationship-prediction

  1. 1. Who Will Follow You Back? Reciprocal Relationship11 2 Prediction* 3John Hopcroft, Tiancheng Lou, Jie TangDepartment of Computer Science, Cornell University,2Institutefor Interdisciplinary Information Sciences, TsinghuaUniversity3Department of Computer Science, Tsinghua University
  2. 2. reciprocal parasocialMotivation v2 v3 v1 Two kinds of relationships in social network, v4 v5  one-way(called parasocial) relationship and,  two-way(called reciprocal) relationship v6 Two-way(reciprocal) relationship  usually developed from a one-way relationship  more trustful. after 3 days prediction Try to understand(predict) the formation of two- way relationships  micro-level dynamics of the social network. v2 v3  underlying community structure? v1  how users influence each other? v4 v5 v6
  3. 3. Example : real friend relationship On Twitter : Who Will Follow You Back? 30% ? 100% ? 60% ? JimmyQiao Ladygaga 1% ? Shiteng Obama Huwei
  4. 4. Several key challenges How to model the formation of two-way relationships? y2= ? y2= ? y2 y2  SVM & LRC y4 = 0 4 How to combine many social y4 4 theories into the prediction y1= 1 y1 y1= 1 y1 model? y3 y3 y3 = ? y3 = ? v2 v3 v1 v4 v5 (v1,, v2) (v1 v2) (v3, v4) 3 4 v6 (v1,, v5) (v1 v5) (v2,, v4) (v2 v4)
  5. 5. Outline Previous works Our approach Experimental results Conclusion & future works
  6. 6. Link prediction Unsupervised link prediction  Scores & intution, such as preferential attachment [N01]. Supervised link prediction  supervised random walks [BL11].  logistic regression model to predict positive and negative links [L10]. Main differences:  We predict a directed link instead of only handles undirected social networks.  Our model is dynamic and learned from the evolution of the Twitter network.
  7. 7. Social behavior analysis Existing works on social behavior analysis:  The difference of the social influence on difference topics and to model the topic- level social influence in social networks. [T09]  How social actions evolve in a dynamic social network? [T10] Main differences:  The proposed methods in previous work can be used here  but the problem is fundamentally different.
  8. 8. Twitter study The twitter network.  The topological and geographical properties. [J07]  Twittersphere and some notable properties, such as a non-power-law follower distribution, and low reciprocity. [K10] The twitter users.  Influential users.  Tweeting behaviors of users. The tweets.  Utilize the real-time nature to detect a target event. [S10]  TwitterMonitor, to detect emerging topics. [M10]
  9. 9. Outline Previous works Our approach Experimental results Conclusion & future works
  10. 10. Factor graph model Problem definition  Given a network at time t, i.e., Gt = (Vt, Et, Xt, Yt)  Variables y are partially labeled.  Goal : infer unknown variables. Factor graph model  P(Y | X, G) = P(X, G|Y) P(Y) / P(X, G) = C0 P(X | Y) P(Y | G)  In P(X | Y), assuming that the generative probability is conditionally independent,  P(Y | X, G) = C0P(Y | G)ΠP(xi|yi)  How to deal with P(Y | G) ?
  11. 11. Random Field Given an undirected graph G=(V, Ya Yb E) such that Y={Yv|v∈V}, if the probability of Yv given X and Yc those random variables corresponding to nodes neighboring v in G. Then (X, Y) Ye is a conditional random field. Yd undirected graphical model Yf globally conditioned on X
  12. 12. Conditional random field(CRF) Model them in a Markov random field, by the Hammersley-Clifford theorem,  P(xi|yi) = 1/Z1 * exp {Σαj fj (xij, yi)}  P(Y|G) = 1/Z2 * exp {ΣcΣkμkhk(Yc)}  Z1 and Z2 are normalization factors. So, the likelihood probability is :  P(Y | X, G) = 1/Z0 * exp {Σαj fj (xij, yi) + ΣcΣkμkhk(Yc)}  Z0 is normalization factors.
  13. 13. Maximize likelihood  Objective function  O(θ) = log Pθ(Y | X, G) = ΣiΣjαj fj (xij, yi) + ΣΣμk hk(Yc) – log Z  Learning the model to  estimate a parameter configuration θ= {α, μ} to maximize the objective function :  that is, the goal is to compute θ* = argmax O(θ)
  14. 14. Learning algorithm θ* Goal : = argmax O(θ) L (k ) (k) ( Z ( x ( k ) )) Fj ( y ,x ) j k Z ( x(k ) ) Newton methold. The gradient of each μk with Z ( x(k ) ) exp F ( y, x ( k ) ) y regard to the objective exp( F ( y, x ( k ) )) F j ( y, x ( k ) ) function. (Z ( x (k ) )) y  dθ/ dμk= E[hk(Yc)] – EPμ (Yc|X, k Z (x (k ) ) exp F ( y, x ( k ) ) G)[hk(Yc)] y exp( F ( y, x ( k ) ) (k ) ) F j ( y, x ( k ) ) y exp F ( y, x ) y p ( y | x ( k ) ) F j ( y, x ( k ) ) y E p (Y | x( k ) ) F j (Y , x ( k ) )
  15. 15. Learning algorithm One challenge : how to calculate the marginal distribution Pμ (yc|x, G). k  Intractable to directly calculate the marginal distribution.  Use approximation algorithm : Belief propagation.
  16. 16. Belief propagation Sum-product  The algorithm works by passing real values functions called “messages” along the edges between the nodes.  Extend edges to triangles
  17. 17. Belief propagation A “message” from a variable node v to a factor node u is μv->u(xv) = Πu*∈N(v){u}μu*->v(xv) A “message” from a factor node u to a variable node v is μu->v(xu) = Σx’ux’v=xvΠu*∈N(u){v}μv*->u(x’u)
  18. 18. Learning algorithm(TriFG model)Input : network Gt, learning rateηOutput : estimated parametersθInitalize θ= 0;Repeat Perform LBP to calculate marginal distribution of unknown variables P(yi|xi, G); Perform LBP to calculate marginal distribution of triad c, i.e. P(yc|Xc, G); Calculate the gradient of μk according to : dθ/ dμk= E[hk(Yc)] – EPμ (Yc|X, G)[hk(Yc)] k Update parameter θ with the learning rate η: θ new = θold + ηd θUntil Convergence;
  19. 19. Prediction features Geographic distance  Global vs Local Homophily  Link homophily  Status homophily Implicit structure  Retweet or reply (A) and (B) are balanced, but (C) and (D) are not.  Retweeting seems to be Users who share Elite users have a more helpful Global common links will Local much stronger Structural balance have a tendency to tendency to  Two-way relationships are follow each other. follow each other balanced (88%),  But, one-way relationships are not (only 29%).
  20. 20. Our approach : TriFG TriFG model  Features based on observations  Partially labeled  Conditional random field  Triad correlation factors
  21. 21. Outline Previous works Our approach Experimental results Conclusion & future works
  22. 22. Data collection Huge sub-network of twitter  13,442,659 users and 56,893,234 following links.  Extracted 35,746,366 tweets. Dynamic networks  With an average of 728,509 new links per day.  Averagely 3,337 new follow-back links per day.  13 time stamps by viewing every four days as a time stamp
  23. 23. Prediction performance Baseline algorithms  SVM & LRC & CRF Accurately infer 90% of reciprocal relationships in twitter. Data Algotithm Precision Recall F1Measure Accuracy SVM 0.6908 0.6129 0.6495 0.9590 Test LRC 0.6957 0.2581 0.3765 0.9510 Case CRF 1.0000 0.6290 0.7723 0.9770 1 TriFG 1.0000 0.8548 0.9217 0.9910 SVM 0.7323 0.6212 0.6722 0.9534 Test LRC 0.8333 0.3030 0.4444 0.9417 Case CRF 1.0000 0.6333 0.7755 0.9717 2 TriFG 1.0000 0.8788 0.9355 0.9907
  24. 24. Convergence property BLP-based learning algorithm can converge in less than 10 iterations. After only 7 learning iterations, the prediction performance of TriFG becomes stable.
  25. 25. Effect of Time Span Distribution of time span in which follow back action is performed.  60% of follow-backs are performed in the next time stamp,  37% of follow-backs would be still performed in the following 3 time stamps. How different settings of the time span.  The prediction performance of TriFG drops sharply when setting the time span as two or less.  The performance is acceptable, when setting it as three time stamps.
  26. 26. Factor contribution analysis We consider five different factor functions  Geographic distance(G)  Link homophily(L)  Status homophily(S)  Implicit network correlation(I)  Structural balance correlation(B)
  27. 27. Outline Previous works Our approach Experimental results Conclusion & future works
  28. 28. Conclusion Reciprocal relationship prediction in social network Incorporates social theories into prediction model. Several interesting phenomena.  Elite users tend to follow each other.  Two-way relationships on Twitter are balanced, but one-way relationships are not.  Social networks are going global, but also stay local.
  29. 29. Future works Other social theories for reciprocal relationship prediction. User feedback. Incorporating user interactions. Building a theory for different kinds of networks.
  30. 30.  Thanks! Q&A
  31. 31. Reference  [BL11] L.Backstrom and J.Leskovec. Supervised random walks : predicting and recommending links in social networks. In WSDM’11  [C10] D.J.Crandall, L.Backstrom, D. Cosley, S.Suri, D.Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, Dec. 2010  [W10] C.Wang, J. Han, Y.Jia, J.Tang, D.Zhang, Y. Yu and J.Guo. Mining advisor-advisee relationships from research publication networks. In KDD’10.  [N01]M.E.J. Newman. Clustering and preferential attachment in growing networks. Phys. Rev. E, 2001  [L10] J.Leskovec, D.Huttenlocher, and J.Kleinberg. Predicting positive and negative links in online social networks. In WWW10.  [T10] C.Tan, J. Tang, J. Sun, Q.Lin, and F.Wang. Social action tracking via noise tolerant time-varying factor graphs. In KDD10  [T09] J. Tang, J. Sun, C. Wang, and Z. Yang. Social influence analysis in large-scale networks. In KDD09.
  32. 32. Reference  [J07]A. Java, X.Song, T.Finin, and B.L. Tseng. Why we twitter : An analysis of a microblogging community. In KDD2007.  [K10]H. Kwak, C.Lee, H.Park, and S.B. Moon. What is twitter, a social network or a news media? In WWW2010.  [M10]M.Mathioudakis and N.Koudas. Twittermonitor : trend detection over the twitter stream. In SIGMOD10.  [S10]T. Sakaki, M. Okazaki, and Y.Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In WWW10.

×