WWW 2013 presentation slides for our paper.
Abstract:
Online social networks have become important for networking, communication, sharing, and discovery. A considerable challenge these networks face is that an online social network is only partially observed: two individuals might know each other but may not have established a connection on the site. Therefore, link prediction and recommendations are important tasks for any online social network. In this paper, we address the problem of computing edge affinity between two users on a social network, based on the users belonging to organizations such as companies, schools, and online groups. We present experimental insights from social network data on organizational overlap, a novel mathematical model to compute the probability of connection between two people based on organizational overlap, and experimental validation of this model based on real social network data. We also present novel ways in which the organizational overlap model can be applied to link prediction and community detection, which in turn could be useful for recommending entities to follow and generating a personalized news feed.
Organizational Overlap on Social Networks and its Applications
1. Organizational Overlap on Social Networks and its Applications
Mitul Tiwari
Joint work with Cho-Jui Hsieh, Deepak Agarwal,
Xinyi (Lisa) Huang, and Sam Shah
LinkedIn
Wednesday, May 15, 13
3. Outline
• Motivation
• Organizational Overlap Model
• Problem Definition
• Data Analysis
• Mathematical Formulation
• Experimental Validation
• Applications
• Link Prediction
• Community Detection
4. Motivation
• Social networks are important for
• Sharing and Discovery
• Communication
• Networking
• Online Social Networks are partially observed
• Link prediction and entity recommendation are important
10. Motivation
• Member profile contains various types of organizations
• Company, Schools, Groups, ...
• Can we compute edge affinity based on this organizational
information?
• Useful for many applications:
• Recommending members to connect (link prediction)
• Recommending other entities from the same community (community
detection)
11. Outline
• Motivation
• Organizational Overlap Model
• Problem Definition
• Data Analysis
• Mathematical Formulation
• Experimental Validation
• Applications
• Link Prediction
• Community Detection
12. Organizational Overlap Problem
• Goal: compute the probability of connection based on the
organizational time overlap
• For a pair of members (A, B) who belonged to the same
organization and overlapped in time, we have organizational
time overlap: T(A, B, O)
• Probability that A and B are connected: P(A, B)
• Assume (A, B) share only one common organization O: P(A, B) = f(T(A, B, O), O)
• f is a function of the time overlap in organization O and of the properties of
organization O
• In short, P(t) = f(t, O), where t=T(A,B,O)
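The time overlap T(A, B, O) defined above can be sketched as the intersection of two tenure intervals at the same organization. This is a minimal illustration only: the function name `overlap_days`, the use of days as the unit, and the example dates are assumptions, not taken from the slides.

```python
from datetime import date


def overlap_days(start_a, end_a, start_b, end_b):
    """Number of days during which both tenures at one organization were active.

    Returns 0 when the two intervals do not intersect.
    """
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max(0, (earliest_end - latest_start).days)


# Hypothetical tenures of members A and B at a single organization O.
t = overlap_days(date(2008, 1, 1), date(2010, 1, 1),
                 date(2009, 1, 1), date(2012, 1, 1))

# Tenures that never intersect give zero overlap.
no_overlap = overlap_days(date(2005, 1, 1), date(2006, 1, 1),
                          date(2007, 1, 1), date(2008, 1, 1))
```

The value `t` would then be the first argument of f(t, O); the form of f itself is not specified in these bullets, so it is left out here.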
13. Organizational Overlap Data Analysis
• Insight 1: Connection density increases with organizational
time overlap
14. Organizational Overlap Data Analysis
• Insight 2: Connection density decreases with the size of
the organization
18. Organizational Overlap Model:
Estimating !
• !: organization dependent
parameter
• Members of a smaller
organization are more likely to
know each other
• Empirical and MLE estimates
for log(!) ~ -0.8 log(|S|)
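A log-log relationship like the one above, between the organization-dependent parameter and the organization size |S|, can be checked with an ordinary least-squares line fit in log space. The sketch below is not the authors' estimation procedure; the helper `loglog_slope` is hypothetical, and the input values are synthetic numbers generated from an exact power law purely to show that the fit recovers the exponent.

```python
import math


def loglog_slope(sizes, values):
    """Least-squares fit of log(value) = slope * log(size) + intercept."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (assumption): parameter values following size ** -0.8 exactly.
sizes = [10, 100, 1000, 10000]
params = [s ** -0.8 for s in sizes]

slope, intercept = loglog_slope(sizes, params)
```

On real per-organization estimates the points would scatter around the line, and the fitted slope would only approximate the reported exponent.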
19. Outline
• Motivation
• Organizational Overlap Model
• Problem Definition
• Data Analysis
• Mathematical Formulation
• Experimental Validation
• Applications
• Link Prediction
• Community Detection
20. Application: Link Prediction
• Warm start: existing edges
• 2 features: org. overlap time
and size
• Common Neighbors (CN)
• Adamic-Adar (AA)
• Data Sets: LinkedIn, Enron
emails, Wiki talk
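For reference, the two neighborhood-based baselines named above can be sketched on a toy graph. Common Neighbors counts shared neighbors, and Adamic-Adar sums 1 / log(degree) over them. The adjacency dictionary and node names below are made up for illustration.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree(w)) over the shared neighbors w of u and v."""
    # A shared neighbor is adjacent to both u and v, so its degree is at least 2
    # and log(degree) is strictly positive.
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


# Toy undirected graph (assumption), stored as node -> set of neighbors.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

cn_score = common_neighbors(adj, "a", "b")
aa_score = adamic_adar(adj, "a", "b")
```

Both scores rely on existing edges around the pair, which is why they fit the warm-start setting rather than the cold-start one.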
21. Application: Link Prediction
• Cold start: no or sparse
edges
• All features:
• time overlap, company size,
company propensity, node
propensity, ...
• logistic regression model
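A cold-start model of this kind can be sketched as a small logistic regression over pairwise features. The example below uses only two hypothetical features (years of time overlap and log10 of company size), a handful of invented training pairs, and plain batch gradient descent; it is not the feature set or training setup from the talk.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(w, b, x):
    """Predicted probability that the pair described by x is connected."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented pairs: [overlap in years, log10(company size)], label 1 = connected.
rows = [[5.0, 1.0], [4.0, 2.0], [3.0, 1.0], [0.5, 4.0], [1.0, 5.0], [0.2, 3.0]]
labels = [1, 1, 1, 0, 0, 0]

w, b = fit_logistic(rows, labels)
p_long_small = predict(w, b, [4.0, 1.0])    # long overlap, small company
p_short_large = predict(w, b, [0.35, 3.5])  # short overlap, large company
```

Because the features describe the pair's organizations rather than its existing edges, such a model can score pairs even when the graph around them is empty or sparse.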
22. Application: Community Detection
• Good for candidate generation in entity recommendation
systems, such as companies to follow
• Graph Clustering algorithm (Graclus)
• Members as nodes and an edge between any pair of nodes with overlap
• Organizational overlap model for computing edge weight
• Graclus: minimizes the total weight of the cuts
• Evaluation using
• Virality of company follow within communities
• Virality of article updates
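The graph setup above can be illustrated with a weighted edge list and the cut weight of a candidate partition, which is the quantity the slide says the clustering step minimizes. Graclus itself is not reproduced here; the function `cut_weight`, the member names, and the edge weights (standing in for overlap-model scores) are assumptions for illustration.

```python
def cut_weight(edges, community_of):
    """Total weight of edges whose endpoints fall in different communities."""
    return sum(weight for (u, v), weight in edges.items()
               if community_of[u] != community_of[v])


# Hypothetical members with edge weights standing in for overlap-model scores.
edges = {
    ("a", "b"): 0.9,
    ("b", "c"): 0.8,
    ("c", "d"): 0.1,
    ("d", "e"): 0.7,
}

# Two candidate partitions of the same five members.
partition_good = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
partition_bad = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}

good_cut = cut_weight(edges, partition_good)
bad_cut = cut_weight(edges, partition_bad)
```

The first partition severs only the weak c-d edge, so it has the smaller cut weight and would be preferred by a cut-minimizing objective.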
23. Community Detection Evaluation
• Compared 3 methods
• Organizational overlap based
• Using social connections graph
• Random: randomly partition the nodes in the
same company
• Using Spread of company follow
• Spread: avg # of companies
followed within d days of the
first follow event
• Propagation rate: normalized spread
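One possible reading of the spread metric above is sketched below: for each (community, company) pair, count the follow events that occur within d days of the first follow in that community, excluding the first event itself, and average over pairs. The slide does not pin down the exact definition, so the grouping, the exclusion of the first event, and the event tuples are all assumptions.

```python
from collections import defaultdict


def spread(follow_events, d):
    """Average number of follow-up follows within d days of the first follow.

    follow_events: iterable of (community, company, day) tuples, where day is
    an integer day index. The first follow in each group is not counted.
    """
    days_by_group = defaultdict(list)
    for community, company, day in follow_events:
        days_by_group[(community, company)].append(day)

    counts = []
    for days in days_by_group.values():
        first = min(days)
        within_window = sum(1 for t in days if t - first <= d)
        counts.append(within_window - 1)  # drop the first follow itself
    return sum(counts) / len(counts) if counts else 0.0


# Invented follow events for one company across two communities.
events = [
    ("c1", "acme", 0), ("c1", "acme", 2), ("c1", "acme", 10),
    ("c2", "acme", 5), ("c2", "acme", 6),
]

spread_3 = spread(events, d=3)
spread_10 = spread(events, d=10)
```

A propagation rate could then be obtained by dividing this value by a normalizer such as community size, though the normalization used in the talk is not stated.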
24. Community Detection Evaluation
• Virality of article updates within communities
[Figure: two panels, Avg degree: 4-6 and Avg degree: 12-14]
26. Summary
• Motivation
• Organizational Overlap Model
• Problem Definition
• Data Analysis
• Mathematical Formulation
• Experimental Validation
• Applications and Evaluation
• Link Prediction: cold and warm start
• Community Detection