All Friends are Not Equal:
      Using Weights in Social Graphs to
      Improve Search

Sudheendra Hangal, Diana MacLean, Monica S. Lam, Jeffrey Heer

       Computer Science Department, Stanford University
Outline

  Problem: social search in a global network
     Most contemporary approaches optimize for path length
     Tie strength not considered
     Success can be highly path dependent

  Question: Could a longer path be better?
     More likely to get an introduction through people who like me.
     If not, is there a best shortest path?

  Contributions:
     Influence as a model of tie strength in directed & undirected networks
     Best path = most influential path
     Study of influence and optimal paths in 2 networks (Twitter RT & DBLP)

  Results
     The shortest path is not always the best path!
Social Search Scenario
  <Graphic of a social graph connecting John, John’s College Roommate, the HR Manager’s Brother, a Google Recruiter, and the Google HR Manager>

  John would like to apply for a job at Google. What is the best path to the HR
   manager?
  James thinks Mary is cute. Who is the best person to ask for an introduction?
  …
  <Graphic of LinkedIn showing hundreds of paths>
Assigning Tie Strengths

  A social tie may be both weighted and asymmetric

  Infer automatically
     Most users would not enter tie strengths manually in any case

  Based on interaction frequency
     Latently captured in many social networks (emails, co-authorships…)
     Involves cost investment from user, so good proxy for tie strength

  We assume a global view of the data
Influence

  X’s influence on Y is proportional to Y’s investment in X

  Assume each node has equal, fixed resources to invest

  Influence of an edge:

      $$\mathrm{Influence}(A, B) = \frac{\mathrm{Invests}(B, A)}{\sum_{X} \mathrm{Invests}(B, X)}$$

  Influence of a person:

      $$\mathrm{Influence}(A) = \sum_{X} \mathrm{Influence}(A, X)$$

  Influence is both asymmetric and weighted
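
  The edge and node definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout, and the toy investment counts are all assumptions made for the example.

```python
from collections import defaultdict

def edge_influence(invests):
    """Map invests[(b, a)] (how much b invests in a) to influence[(a, b)].

    Influence(A, B) = Invests(B, A) / sum over X of Invests(B, X).
    """
    totals = defaultdict(float)
    for (investor, _target), amount in invests.items():
        totals[investor] += amount
    return {
        (target, investor): amount / totals[investor]
        for (investor, target), amount in invests.items()
        if totals[investor] > 0
    }

def node_influence(influence):
    """Influence(A) = sum over X of Influence(A, X)."""
    result = defaultdict(float)
    for (a, _b), value in influence.items():
        result[a] += value
    return dict(result)

# Hypothetical toy data: (investor, target) -> number of interactions.
invests = {("bob", "alice"): 3, ("bob", "carol"): 1, ("alice", "bob"): 2}
influence = edge_influence(invests)
per_node = node_influence(influence)
```

  In this toy data bob spends three quarters of his investment on alice, so alice's influence on bob is 0.75, while bob's influence on alice is 1.0 because he is her only target. The two directions differ, which is the asymmetry the slide refers to.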
Influence
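  <Graphic of an example graph shown twice: edges labeled with co-authorship counts, and the same edges labeled with the derived influence values>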
Influence of a Path

  Influence of a path:

      $$S(P) = D^{|P|} \prod_{e_i \in P} \mathrm{Influence}(e_i)$$

  Decay factor D damps influence as path length increases

  Many other models
     This one is simple

  Strongest path = most influential
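
  As a rough illustration of the path score, the sketch below evaluates S(P) for two paths between the same endpoints. The function name, the edge keys (one entry per traversed edge), the influence values, and the decay value 0.9 are assumptions chosen for the example; |P| is taken to be the number of edges on the path.

```python
import math

def path_influence(path, influence, decay):
    """S(P) = decay**|P| * product of the edge influences along the path."""
    edges = list(zip(path, path[1:]))
    product = math.prod(influence[edge] for edge in edges)
    return decay ** len(edges) * product

# Hypothetical influence values for each traversed edge.
influence = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.1}

direct = path_influence(["a", "c"], influence, decay=0.9)        # about 0.09
two_hop = path_influence(["a", "b", "c"], influence, decay=0.9)  # about 0.2025
```

  Here the two-hop path scores higher than the direct edge even after the decay penalty, which is the situation the deck is interested in.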
Computing the Strongest Path

  Adaptation of Dijkstra’s shortest path algorithm.

  Maximizing S(P) is equivalent to maximizing log S(P):

      $$S(P) = \prod_{e_i \in P} \bigl(D \times \mathrm{Influence}(e_i)\bigr)$$

      $$\log S(P) = \sum_{e_i \in P} \bigl(\log D + \log \mathrm{Influence}(e_i)\bigr)$$

  Thus minimizing:

      $$-\sum_{e_i \in P} \bigl(\log D + \log \mathrm{Influence}(e_i)\bigr) = \sum_{e_i \in P} \Bigl(\log\frac{1}{D} + \log\frac{1}{\mathrm{Influence}(e_i)}\Bigr)$$

  We provide log(1/D) + log(1/Influence(e_i)) as the edge weights to the
   shortest path algorithm.
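
  The transformation above can be sketched with a standard heap-based Dijkstra search. This is one possible implementation under stated assumptions, not the authors' code: the adjacency-dictionary layout, the function name, and the toy graph are hypothetical. It assumes 0 < D <= 1 and every influence value lies in (0, 1], so all transformed weights are non-negative, as Dijkstra's algorithm requires.

```python
import heapq
import math

def strongest_path(graph, source, target, decay):
    """Return (path, S(P)) for the most influential path, or (None, 0.0).

    graph[u][v] is the influence of the edge traversed from u to v.
    """
    best = {source: 0.0}   # node -> smallest transformed cost found so far
    parent = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, math.inf):
            continue       # stale heap entry
        if node == target:
            break
        for nxt, inf in graph.get(node, {}).items():
            # Edge weight from the slide: log(1/D) + log(1/Influence(e)).
            new_cost = cost + math.log(1 / decay) + math.log(1 / inf)
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    # Undo the log transform to recover S(P).
    return path, math.exp(-best[target])

# Hypothetical toy graph.
graph = {"a": {"b": 0.5, "c": 0.1}, "b": {"c": 0.5}}
path, strength = strongest_path(graph, "a", "c", decay=0.9)
```

  On this toy graph the search prefers a, b, c over the direct edge, with a strength of roughly 0.2025.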
Networks Studied

  DBLP
    Investment: co-authorship
    ~600K nodes, ~4M edges (giant component only)
    Example of influence relationship: earlier slide



  Twitter RT
    Investment: re-tweeting someone’s tweet
    1 month’s worth of tweets
    ~2.4M nodes, ~8.85M edges (giant component only)
    Example of influence relationship: Obama and Joe the Plumber
Obama and Joe the Plumber

  <Graphic of the influence relationship between Obama and Joe the Plumber>
Experiment

  Pick 500 random node pairs

  Compute:
    Strongest path
    Shortest path

  Questions
    Do stronger paths tend to be longer? Equivalent to shortest path?
    What proportion of stronger paths are longer?
    How is influence distributed across nodes?
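
  The comparison described above might be set up roughly as follows. This is a sketch under assumptions, not the experimental code: the helper names, the seed, the decay value, and the toy graph are hypothetical, and the real study sampled 500 pairs from much larger networks.

```python
import heapq
import math
import random
from collections import deque

def shortest_hops(graph, source, target):
    """Breadth-first search: fewest edges from source to target, or None."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return hops[node]
        for nxt in graph.get(node, {}):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return None

def strongest_hops(graph, source, target, decay):
    """Hop count of the strongest path (Dijkstra on log-transformed weights)."""
    best = {source: (0.0, 0)}          # node -> (cost, hops)
    heap = [(0.0, 0, source)]
    while heap:
        cost, hops, node = heapq.heappop(heap)
        if (cost, hops) > best.get(node, (math.inf, 0)):
            continue                   # stale heap entry
        if node == target:
            return hops
        for nxt, inf in graph.get(node, {}).items():
            cand = (cost + math.log(1 / decay) + math.log(1 / inf), hops + 1)
            if cand < best.get(nxt, (math.inf, 0)):
                best[nxt] = cand
                heapq.heappush(heap, (cand[0], cand[1], nxt))
    return None

def sample_pairs(nodes, count, seed=0):
    """Draw `count` random ordered pairs of distinct nodes."""
    rng = random.Random(seed)
    return [tuple(rng.sample(nodes, 2)) for _ in range(count)]

def compare(graph, pairs, decay):
    """Count connected pairs whose strongest path is longer than / equal to the shortest."""
    longer = equal = 0
    for source, target in pairs:
        short = shortest_hops(graph, source, target)
        strong = strongest_hops(graph, source, target, decay)
        if short is None or strong is None:
            continue                   # skip unreachable pairs
        if strong > short:
            longer += 1
        else:
            equal += 1
    return longer, equal

# Hypothetical toy graph: graph[u][v] is the influence of the edge u -> v.
graph = {"a": {"b": 0.5, "c": 0.1}, "b": {"c": 0.5}, "c": {}}
result = compare(graph, [("a", "c"), ("a", "b")], decay=0.9)
```

  With the two explicit pairs, one strongest path is longer than the shortest path and one has the same length. For a randomized run, `sample_pairs(sorted(graph), 500)` could supply the pairs instead.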
Results

  Node influence distributions
Results

  Short vs. Strong paths

  DBLP              All     |Pstrong| > |Pshort|   |Pshort| = |Pstrong|
  Node Pairs        500     215 (43.0%)            285 (57.0%)
  Avg. |Pshort|     6.5     6.6                    6.5
  Avg. |Pstrong|    7.0     7.8                    6.5


  TWITTER           All     |Pstrong| > |Pshort|   |Pshort| = |Pstrong|
  Node Pairs        500     339 (67.8%)            161 (32.2%)
  Avg. |Pshort|     7.7     7.9                    7.3
  Avg. |Pstrong|    9.2     10.1                   7.3
Discussion (1)

  Influence metric
     Captures asymmetry at the node level – most have influence < 1

  Differences between Twitter and DBLP datasets
     Twitter outliers vs. DBLP outliers
     Twitter is sparser than DBLP
     Twitter, driven by popularity & hype, lends itself to influence?
Discussion (2)

  Stronger path longer than shortest path
     43% in DBLP (~1 extra hop compared with shortest path)
     68% in Twitter (~2 extra hops compared with shortest path)
     More worthwhile to pick the stronger path

  Strongest path length equal to shortest path length
     Still better to pick the strongest, shortest path

  Future work:
     Explore alternate models of influence
     Consider paths between n-degree connected pairs.
Related Work
  Global social search
     Aardvark [Horowitz & Kamvar, WWW ’09]
     Facebook (and other OSN companies)

  Local social search
     [Dodds et al., Science, August ’03]
     [Adamic & Adar, Social Networks, July ’05]
     [Watts et al., Science, May ’02]

  Inferring tie strengths from social graphs
     [Gilbert & Karahalios, CHI ’09], [Xiang et al., WWW ’09],
     [Leskovec et al., CHI ’10], [Onnela et al., NJP, June ’07]
Conclusions

  Longer paths are often better than shortest paths
     Cost of 1-2 extra “hops” seems small for tasks that are highly path dependent

  Even when the better path is not longer
     it is still better than picking randomly from the set of shortest paths

  In general, we need to develop more graph analysis methods for
   weighted graphs
     Binary ties are often arbitrary
     Weights can be easily inferred
     Weights encode a wealth of social information

  Influence metric
     Simple
     Applicable to any graph encoding social interactions
Thank you!

  Questions?

  http://prpl.stanford.edu/influence
