Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks

Presentation from the International Semantic Web Conference 2012 in Boston, US. Co-authored paper with Milan Stankovic and Harith Alani

Presentation Transcript

  • Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks
    International Semantic Web Conference 2012, Boston, US
    Matthew Rowe (1), Milan Stankovic (2, 3), Harith Alani (4)
    1 School of Computing and Communications, Lancaster University, Lancaster, UK
    2 Hypios Research, 187 rue du Temple, 75003 Paris, France
    3 Université Paris-Sorbonne, 28 rue Serpente, 75006 Paris
    4 Knowledge Media Institute, The Open University, Milton Keynes, UK
    @mrowebot | m.rowe@lancaster.ac.uk
    http://www.matthew-rowe.com | http://www.lancs.ac.uk/staff/rowem/
  • Background Problem Formulation Approach Experiments Summary
    Attention Information Networks
    The intersection of information and social networks [Yin et al., 2011]:
    • Users can follow other users: u subscribes to v (u = follower, v = followee)
    • User u is paying attention to the content from user v
    Users become ’Information Hubs’ [Romero and Kleinberg, 2010]:
    • Tune in to get real-time event information, e.g. #Sandy, #Arabspring, #Londonriots
    • People become social sensors
    Attention is paid to the information that users publish
    Who will follow whom? Exploiting Semantics for Link Prediction 2 / 22
  • Attention Economics
    Large uptake/adoption of Attention-Information Networks: 31.9% increase in Twitter users in 2011
    Attention becomes a limited commodity: “What counts now is what is most scarce now, namely attention.” [Goldhaber, 1997]
    Users must consider who they wish to subscribe to:
    • Whose content do I wish to receive? Who interests me?
    If we can understand who will follow whom, and follower decisions, we can:
    • Predict social capital based on expected network growth
    • Facilitate audience building
    • Of interest to digital marketing firms, i.e. boosting a client’s presence
  • Outline
    Problem Formulation
    • Related Work
    • Follower-Decision Hypotheses
    • Formulating the Problem
    Approach
    • Features
    • Concept Disambiguation with User Contexts
    Experiments
    • Dataset
    • Experimental Setup
    • Results: Prediction Accuracy
    • Results: Follower-Decision Patterns
  • Related Work
    Network-topology approaches [Golder and Yardi, 2010, Yin et al., 2011, Backstrom and Leskovec, 2011]:
    • Path structures, common followers and common friends
    Local metadata approaches [Schifanella et al., 2010, Leroy et al., 2010, Brzozowski and Romero, 2011]:
    • Common tags (Flickr, YouTube), group information (on Flickr)
    Local metadata approaches use tags or group memberships, but no concepts
    No examination of follower-decision behaviour patterns
    No exploration of divergent follower-decision behaviour
  • Follower-Decision Hypotheses
    H1. A user follows another user when there is a topical affinity between the follower and the followee
    • [Schifanella et al., 2010] found social and topical homophily to be correlated on Flickr
    H2. Users who do not focus on specific topics base their follower-decisions on social factors rather than on topical information
    • Unfocussed users show divergent decision behaviour
    H3. Users who are more socially connected are driven by social rather than topical factors
    • High-degree users are driven by social network effects
  • Formulating the Problem
    A directed social network is a graph G = ⟨V, E⟩, where:
    • V denotes the set of users (nodes)
    • E is the set of edges between nodes; ⟨u, v⟩ ∈ E means that u follows v
    An egocentric social network (egonet) of u is denoted by Γ(u):
    • Γ−(u) denotes the follower network (incoming edges)
    • Γ+(u) denotes the followee network (outgoing edges)
    A given user u is provided with a set of recommended users R(u), where R(u) ∩ Γ+(u) = ∅
    Goal: induce a function between users and recommendations, f : V × R → {0, 1}
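The notation on the slide above can be made concrete with a small sketch. It assumes the follow graph is held as a set of (follower, followee) pairs; the helper names `followers`, `followees` and `labelled_pairs`, and the toy users, are illustrative and do not come from the presentation.

```python
def followers(edges, u):
    """Gamma-(u): users with an edge pointing at u (incoming edges)."""
    return {a for a, b in edges if b == u}


def followees(edges, u):
    """Gamma+(u): users that u points at (outgoing edges)."""
    return {b for a, b in edges if a == u}


def labelled_pairs(edges, u, recommended, accepted):
    """Build the (u, v) -> {0, 1} training pairs for one user.

    The label is 1 when u went on to follow the recommended user v.
    Raises ValueError if R(u) overlaps Gamma+(u), which the problem
    statement rules out.
    """
    if not followees(edges, u).isdisjoint(recommended):
        raise ValueError("R(u) must not overlap the followee network of u")
    return [((u, v), int(v in accepted)) for v in sorted(recommended)]


# Toy graph (hypothetical): u follows a; b and c follow u.
EDGES = {("u", "a"), ("b", "u"), ("c", "u")}
PAIRS = labelled_pairs(EDGES, "u", {"v", "w"}, accepted={"v"})
```

Each labelled pair would then be turned into a feature vector for the binary classifier f.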
  • Predicting Links in Attention-Information Networks
    Given our problem setting we want to:
    1. Identify the best performing general model
    2. Explore follower-decision behaviour and how it differs between users
    The problem is a binary classification task: pairwise features are computed between u and each of their recommendations (v ∈ R(u))
    To enable accurate prediction and explore the different factors behind link creation we implement:
    • Social features: based on the network structure
    • Topical features: based on content published by u and v
    • Visibility features: based on the user noticing a prospective followee
    We now explain the various features which are computed between u and v ∈ R(u)
  • Social
    Social features account for the topology of the network and the edges present within the network prior to recommendations
    • Mutual Followers Count: measures the overlap of the follower sets (i.e. the sets of users connecting into a given user) of u and v
    • Mutual Followees Count: measures the overlap of the followee sets
    • Mutual Friends Count: measures the overlap of the friend sets, where friendship is denoted by a bi-directional edge between nodes
    • Mutual Neighbours: measures the overlap of the ego-centric networks of u and v whilst ignoring the directions of the links in the network
    [Zhou et al., 2009, Yin et al., 2011, Backstrom and Leskovec, 2011]
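A minimal sketch of the four social features, assuming "overlap" means the size of the set intersection (which matches the integer output domains listed in the features summary). The function name `social_features` and the toy edges are illustrative only.

```python
def social_features(edges, u, v):
    """Compute the four mutual-set counts between users u and v.

    edges is a set of (follower, followee) pairs.
    """
    def followers(x):
        return {a for a, b in edges if b == x}

    def followees(x):
        return {b for a, b in edges if a == x}

    def friends(x):
        # Friendship: a bi-directional edge between x and the other node.
        return followers(x) & followees(x)

    def neighbours(x):
        # Ego-centric network with edge direction ignored.
        return followers(x) | followees(x)

    return {
        "mutual_followers": len(followers(u) & followers(v)),
        "mutual_followees": len(followees(u) & followees(v)),
        "mutual_friends": len(friends(u) & friends(v)),
        "mutual_neighbours": len(neighbours(u) & neighbours(v)),
    }


# Toy graph (hypothetical users a, b, c, d around u and v).
EDGES = {
    ("a", "u"), ("a", "v"), ("b", "u"),
    ("u", "c"), ("v", "c"),
    ("u", "d"), ("d", "u"), ("v", "d"), ("d", "v"),
}
FEATURES = social_features(EDGES, "u", "v")
```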
  • Topical (I)
    In attention-information networks users pay attention to the content of other users
    Topical features use: a) tags, b) concept bags, c) concept graphs
    Tag Vectors: examining tag/keyword overlap between u and v [Schifanella et al., 2010]
    • Cosine Similarity: between the tag vectors of u and v
    Concept Bags: examining overlap of concepts
    • Return concepts from user content, then derive the concept bag vector
    • Cosine Similarity: similarity between the concept bag vectors of u and v
    • Jensen-Shannon Divergence: probability distribution divergence between the concept bag vectors of u and v; greater divergence means greater dissimilarity between topics
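The two vector measures named above can be sketched as follows, assuming each tag vector or concept bag is a mapping from term to count. The base-2 logarithm (which bounds the divergence to [0, 1]) is an assumption of this sketch; the slides do not state which base was used.

```python
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two term -> count mappings."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = (math.sqrt(sum(x * x for x in p.values()))
            * math.sqrt(sum(x * x for x in q.values())))
    return dot / norm if norm else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence between two non-empty bags (log base 2)."""
    total_p, total_q = sum(p.values()), sum(q.values())
    if total_p == 0 or total_q == 0:
        raise ValueError("both bags must be non-empty")
    keys = p.keys() | q.keys()
    # Normalise counts into probability distributions over the shared keys.
    dist_p = {k: p.get(k, 0) / total_p for k in keys}
    dist_q = {k: q.get(k, 0) / total_q for k in keys}
    mid = {k: 0.5 * (dist_p[k] + dist_q[k]) for k in keys}

    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(a[k] * math.log2(a[k] / b[k]) for k in keys if a[k] > 0)

    return 0.5 * kl(dist_p, mid) + 0.5 * kl(dist_q, mid)


SAME = js_divergence(Counter(sport=2, music=2), Counter(sport=1, music=1))
DISJOINT = js_divergence(Counter(sport=3), Counter(music=5))
```

Identical distributions give a divergence of 0 and disjoint ones give the maximum, in line with the slide's remark that greater divergence means greater topical dissimilarity.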
  • Topical (II)
    Concept Graphs: semantic relatedness of users using graph-based metrics
    Measure distances between concepts from the tags of u and v: d(ci, cj)
    Distance measures have two varieties, based on the input tags:
    1. Tag Intersection: intersection of the tag sets of u and v
    2. All Tags: all tags from the tag sets of u and v
    Three distance measures for d(ci, cj) were computed using the above sets:
    • Shortest Path: least number of steps from ci to cj (Bellman-Ford algorithm)
    • Hitting Time: expected number of steps for a random walker to leave ci and reach cj [Fouss et al., 2007]
    • Commute Time: expected number of steps for a random walker to leave ci, reach cj, and then return to ci
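A hedged sketch of the three distances d(ci, cj), assuming an undirected, connected, unweighted concept graph given as a list of edges. The hitting time is approximated here by repeatedly applying h(n) = 1 + mean of h over the neighbours of n, with h(target) = 0; this iteration is a choice made for the sketch, not necessarily how the authors computed it. How the pairwise distances are aggregated into a user-level feature is not stated in the slides and is left out.

```python
from collections import defaultdict


def bellman_ford(edges, source):
    """Hop counts from source using Bellman-Ford with unit edge weights."""
    nodes = {n for edge in edges for n in edge}
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0
    for _ in range(len(nodes) - 1):
        changed = False
        for a, b in edges:
            # Undirected graph: relax the edge in both directions.
            for x, y in ((a, b), (b, a)):
                if dist[x] + 1 < dist[y]:
                    dist[y] = dist[x] + 1
                    changed = True
        if not changed:
            break
    return dist


def hitting_time(edges, source, target, sweeps=10_000, tol=1e-12):
    """Expected steps for a uniform random walker from source to target."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    h = {n: 0.0 for n in adj}
    for _ in range(sweeps):
        delta = 0.0
        for n in adj:
            if n == target:
                continue  # the walk stops at the target, so h stays 0
            new = 1.0 + sum(h[m] for m in adj[n]) / len(adj[n])
            delta = max(delta, abs(new - h[n]))
            h[n] = new
        if delta < tol:
            break
    return h[source]


def commute_time(edges, a, b):
    """Hitting time from a to b plus hitting time from b back to a."""
    return hitting_time(edges, a, b) + hitting_time(edges, b, a)


# Toy concept graph (hypothetical): a path c1 - c2 - c3.
EDGES = [("c1", "c2"), ("c2", "c3")]
```

On the three-node path the shortest path from c1 to c3 is 2 steps, while the random walker needs 4 steps on average because it can step back towards c1.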
  • Visibility
    The presence of information published by a prospective followee could influence users in their follower-decisions
    • Retweet Count: total number of times a given user (v) has been retweeted by members of the followee network belonging to u
    • Mention Count: total number of times a given user (v) has been mentioned by members of the followee network belonging to u
    • Comment Count: total number of times a given user (v) has had their content commented on by members of the followee network belonging to u
    • Weighted Counts: weight each count by reply-frequency with the ego-network member
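The unweighted visibility counts can be sketched as below. The event format, a list of (actor, kind, target) triples, is an assumption made for illustration. The weighted variants are omitted because the slides do not define the reply-frequency weighting precisely.

```python
def visibility_counts(followees_of_u, v, events):
    """Count how often members of u's followee network surfaced user v.

    events is an iterable of (actor, kind, target) triples where kind is
    one of "retweet", "mention" or "comment" (a hypothetical log format).
    """
    counts = {"retweet": 0, "mention": 0, "comment": 0}
    for actor, kind, target in events:
        # Only actions by users that u already follows are visible to u.
        if actor in followees_of_u and target == v and kind in counts:
            counts[kind] += 1
    return counts


EVENTS = [
    ("a", "retweet", "v"),
    ("a", "mention", "v"),
    ("b", "retweet", "v"),
    ("z", "retweet", "v"),   # z is not followed by u, so it is ignored
    ("a", "retweet", "w"),   # about a different user, so it is ignored
    ("b", "comment", "v"),
]
COUNTS = visibility_counts({"a", "b"}, "v", EVENTS)
```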
  • Features Summary
    Type        Feature Name                            Output Domain
    Social      Mutual Followers Count                  {0} ∪ N+
                Mutual Followees Count                  {0} ∪ N+
                Mutual Friends Count                    {0} ∪ N+
                Mutual Neighbours Count                 {0} ∪ N+
    Topical     Tag Vectors - Cosine                    [0, 1]
                Concept Bags - Cosine                   [0, 1]
                Concept Bags - JS-Divergence            R+
                Concept Graphs - Int - Shortest Path    N+
                Concept Graphs - All - Shortest Path    N+
                Concept Graphs - Int - Hitting Time     R+
                Concept Graphs - All - Hitting Time     R+
                Concept Graphs - Int - Commute Time     R+
                Concept Graphs - All - Commute Time     R+
    Visibility  Retweet Count                           {0} ∪ N+
                Mention Count                           {0} ∪ N+
                Comment Count                           {0} ∪ N+
                Weighted Retweet Count                  {0} ∪ R+
                Weighted Mention Count                  {0} ∪ R+
                Weighted Comment Count                  {0} ∪ R+
  • Concept Disambiguation with User Contexts
    Distances across the concept graph capture semantic relatedness
    Distance metrics require a mapping between a tag and a concept
    Polysemy problem: one tag can be mapped to multiple concepts
    [Cantador et al., 2011] propose ‘distributional aggregation’ to choose the most representative tag for a web resource:
    • Voting mechanism: tag usage frequency amongst a collection of users
    Our voting mechanism: concept frequency given the user
    • For a given tag: count the frequency of each candidate concept in the concept bag CTu and choose the most frequent
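The user-context voting step might look like the sketch below. It assumes the user's concept bag is already available as a concept -> count mapping and that the tag-to-concept candidates come from a lookup table; the names `disambiguate` and `CANDIDATES`, the example concepts, and the tie-breaking rule (first listed candidate wins) are all illustrative assumptions.

```python
from collections import Counter


def disambiguate(tag, candidates, concept_bag):
    """Pick the candidate concept for tag that is most frequent for this user.

    candidates maps a tag to its list of possible concepts.
    Returns None when the tag has no candidate concepts.
    Ties go to the candidate listed first (an assumption of this sketch).
    """
    options = candidates.get(tag, [])
    if not options:
        return None
    return max(options, key=lambda concept: concept_bag.get(concept, 0))


# Hypothetical polysemous tags and one user's concept bag.
CANDIDATES = {
    "apple": ["Fruit", "Technology"],
    "jaguar": ["Animal", "Cars"],
}
USER_BAG = Counter({"Technology": 3, "Fruit": 1, "Cars": 2})
```

For this user, "apple" resolves to "Technology" because that concept dominates their bag, whereas a user who mostly discusses food would resolve the same tag to "Fruit".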
  • Dataset
    Knowledge Discovery and Data Mining (KDD) Cup 2012, Follower Prediction task
    Chinese microblogging platform Tencent Weibo
    • Users, recommendations, and outcomes
    • Follow-graph of users
    • Set of tags found within each user’s content
    • Tag-categorisation data and category graph
    [Figure: two frequency plots. (a) Categories per Tag: frequency c(n) against categories (n). (b) Recommendations per User: frequency c(n) against recommendations (n).]
  • Experimental Setup
    1. General Follower Prediction: seeking a general follower model
      • Randomly selected 10% of users and built pairwise feature vectors
    2. Binned Follower Prediction: seeking behaviour-specific models
      • Divided users into 10 bins based on: a) concept-bag entropy, b) out-degree
      • Selected all the users from the low and high bins and built feature vectors
    Divided each dataset into an 80:20 split for training and testing
    For each experiment: 1. Model Selection, 2. Pattern Analysis
    Evaluation measures:
    1. Area under the receiver operating characteristic curve (AUC)
    2. Matthews correlation coefficient (MCC)
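As a rough illustration of the quantities used in this setup, the sketch below computes the entropy of a concept bag (in bits, an assumed unit) and the two evaluation measures from their textbook definitions. How the ten bins were delimited is not stated in the slides, so no binning rule is shown; all function names are illustrative.

```python
import math


def entropy(bag):
    """Shannon entropy (bits) of a concept -> count mapping."""
    total = sum(bag.values())
    return -sum((n / total) * math.log2(n / total)
                for n in bag.values() if n > 0)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


LABELS = [1, 1, 0, 0]
FOCUSED = entropy({"Football": 8})
UNFOCUSED = entropy({"Football": 2, "Music": 2, "Politics": 2, "Film": 2})
```

A user whose content maps to a single concept has entropy 0 (focused), while one spread evenly over four concepts has entropy 2 bits (unfocussed).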
  • Results: Prediction Accuracy
    General Follower Prediction Model
    • Topical features perform significantly better
    • Models significantly outperform the random model
    Binned Follower Prediction Models
    • Concept entropy: low - topical features; high - social features
    • Degree: low and high - topical features
    Visibility features have little effect on predictions (the majority are zero)
    [Figure: bar charts of (c) AUC and (d) MCC for the Social, Topical, Visibility and All feature sets on the Full, Entropy - Low, Entropy - High, Degree - Low and Degree - High datasets.]
  • Results: Follower-Decision Patterns
    Connections are formed...
    In the General Follower Prediction Model when:
    • users share neighbours
    • users are closer in terms of the subjects they discuss
    In the Binned Follower Prediction Model:
    • for low entropy and low degree users: same feature pattern as the general model
    • for high entropy users when:
      • users have an overlap of subscribers
      • tags differ, but concepts are similar
    • for high degree users when:
      • users listen to the same people
      • users share a topical affinity, with the same pattern as the general model
  • Findings
    General behaviour pattern: topical homophily
    • [Schifanella et al., 2010] found socially close users to have high tag cosine
    • Our approach detects latent patterns based on concept graphs
    On common followers:
    • [Golder and Yardi, 2010, Brzozowski and Romero, 2011] found mutual audience to correlate with link creation
    • We find that mutual followers should be reduced in the general model
    On common neighbours:
    • [Leroy et al., 2010] found an increase in mutual neighbours to correlate with link creation
    • Similar effect in our findings
    Divergent behaviour for high entropy users: suggests a need for bespoke models
  • Conclusions
    Our approach for link prediction outperforms: a) a random baseline, b) existing network-structure approaches
    The general follower-decision model identified topical homophily effects
    Accounting for behaviour uncovered different follower-decisions:
    • Unfocussed users follow users with whom they have conceptual affinity
    Concept graphs allowed latent effects to be identified
    • Applicable over the linked data graph
    Recommendations can be improved by accounting for behaviour and building bespoke models:
    • Growing the platform’s network and increasing social capital
    • Understanding who will follow whom, and audience growth
  • Future Work
    Apply our approach over Twitter and YouTube: are the findings consistent?
    • Extract concepts from content, measure distances across the Linked Data graph
    Inclusion of more nuanced user behaviour
    • Conjecture: performance is conditioned on time-sensitive user behaviour
    User churn: detecting the complement of link creation
    • 25 days of Twitter logs show this (red): ∆(u) = |Γ−(u)| − |Γ−(u)| (1), the change in the size of the follower network Γ−(u) of u between two time points t
    [Figure: histogram of the frequency c(∆) against ∆, with ∆ ranging from −40 to 40.]
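The churn measure can be illustrated with a short sketch. It assumes two snapshots of each user's follower set and takes ∆(u) as the later follower count minus the earlier one; both the snapshot format and that sign convention are assumptions, since the exact time indices are not recoverable from the slide.

```python
from collections import Counter


def follower_deltas(before, after):
    """Change in follower-set size per user between two snapshots.

    before and after map a user to the set of their followers.
    A negative value means the user lost followers (churn).
    """
    users = before.keys() | after.keys()
    return {u: len(after.get(u, set())) - len(before.get(u, set()))
            for u in users}


def delta_histogram(deltas):
    """c(delta): how many users show each change value."""
    return Counter(deltas.values())


# Hypothetical snapshots: u loses a follower, v gains two, w is new.
BEFORE = {"u": {"a", "b"}, "v": {"a"}}
AFTER = {"u": {"a"}, "v": {"a", "b", "c"}, "w": {"a"}}
DELTAS = follower_deltas(BEFORE, AFTER)
```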
  • Questions
    Twitter: @mrowebot
    Email: m.rowe@lancaster.ac.uk
    WWW: http://www.matthew-rowe.com
    WWW: http://www.lancs.ac.uk/staff/rowem/