Organizational Overlap on Social Networks and its Applications

WWW 2013 paper presentation slides. Paper can be found here: http://mitultiwari.net/docs/papers/www13_overlap.pdf

  • Hi, I am Mitul Tiwari. Today I am going to present our paper, “Organizational Overlap on Social Networks and its Applications”.
    This is joint work with Cho-Jui, Deepak, Lisa, and Sam.
  • Here is the outline of the rest of my talk.
  • LinkedIn is the second largest social network for professionals with more than 225 million members.
  • Members can create profiles with their education and employment details
  • Members can connect with each other and maintain their professional network on LinkedIn.
    TODO: replace screenshot
  • People You May Know (PYMK) is a large-scale recommendation system that helps you connect with others.
    Basically, PYMK is a link prediction problem, where we analyze billions of edges to recommend possible connections to you.
    A big big-data problem!
  • Companies can create pages and members can follow companies.
    TODO: replace screenshot
  • LinkedIn’s homepage is powered by recommendation engines: News, Connections, Jobs, Groups, Companies.
    Also ads and relevant updates.
  • A rich recommender systems ecosystem at LinkedIn: connections, news, skills, jobs, companies, groups, search queries, talent, similar profiles, ...
  • Here is the outline of the rest of my talk.
  • For a company A, this graph shows connection density: the ratio of the number of connected pairs with time overlap t within Company A to the total number of pairs with time overlap t within Company A.
    We observe that connection density increases with time overlap t.
    We see similar behavior across many companies, groups, and schools.
    This led to the insight that connection density increases with organizational time overlap.
  • We sampled companies of different sizes.
    We calculated connection density with respect to company size.
    We observed that connection density decreases as the size of the organization increases.
    This makes sense, since people in a smaller organization are more likely to know each other.
  • 1. The Community-Affiliation Graph Model (AGM) proposes P(O1, O2) = 1 - (1 - P(O1))(1 - P(O2)).
    Using that, we arrive at Assumption 1.
    2. P(t) is a probability, so we can safely assume that it lies between 0 and 1, and that P(t) = 0 iff t = 0, that is, when there is no overlap.
  • 1. Assumption 1 can be used to further decompose a time interval t into m smaller intervals, which gives Lemma 1.
    2. Letting δt → 0, P(δt) → 0 by Assumption 2, so Assumption 1 gives P(t - δt) = P(t) = P(t + δt) in the limit.
    3. From Lemma 1 and Lemma 2 we can derive: 1 - P(t) = Q(1)^t.
  • Empirical connection density values fit our model well.
    In large companies, P(t) cannot reach 1 even for large t.
    We observe an upper bound μ on the probability.
  • MLE: maximize the log-likelihood, that is: Sum_i [ X_i log(P(t_i)) + (1 - X_i) log(1 - P(t_i)) ]
  • Here is the outline of the rest of my talk.
  • Warm-start setting, where we have existing edges.
    Enron emails:
    Wiki talk: conversations and discussions between editors. Edits on the same page imply a conversation.
  • Here is the outline of the rest of my talk.
  • questions, details, hiring
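
The notes above give the model relation 1 - P(t) = Q(1)^t and the log-likelihood used for estimation. A minimal Python sketch of such a fit follows. It assumes Q(1) = 1 - P(1), so that the model can be written P(t) = 1 - exp(-λt) with λ = -log Q(1). That parameterization, the function names, the search bounds, and the toy data are assumptions made for illustration, not the authors' implementation, and the upper bound μ mentioned in the notes is left out.

```python
import math

def connection_prob(t, lam):
    """Assumed model form: P(t) = 1 - exp(-lam * t)."""
    return 1.0 - math.exp(-lam * t)

def log_likelihood(lam, overlaps, connected):
    """Sum_i [ X_i log P(t_i) + (1 - X_i) log(1 - P(t_i)) ]."""
    eps = 1e-12  # keep log() away from 0
    total = 0.0
    for t, x in zip(overlaps, connected):
        p = min(max(connection_prob(t, lam), eps), 1.0 - eps)
        total += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return total

def fit_lambda(overlaps, connected, lo=1e-6, hi=10.0, iters=200):
    """Ternary search for the lam in [lo, hi] maximizing the log-likelihood.

    Under the assumed model the log-likelihood is concave in lam,
    so a ternary search converges to the maximizer.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, overlaps, connected) < log_likelihood(m2, overlaps, connected):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Hypothetical toy data: four pairs with one unit of overlap, three connected.
overlaps = [1.0, 1.0, 1.0, 1.0]
connected = [1, 1, 1, 0]
lam_hat = fit_lambda(overlaps, connected)
print(round(lam_hat, 3))  # 1.386, i.e. ln 4, since the fitted P(1) equals 3/4
```

With all overlaps equal to 1, the maximizer has a closed form (P(1) equals the observed connected fraction), which makes the toy case easy to check by hand.
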
  • Organizational Overlap on Social Networks and its Applications

    1. Organizational Overlap on Social Networks and its Applications Mitul Tiwari Joint work with Cho-Jui Hsieh, Deepak Agarwal, Xinyi (Lisa) Huang, and Sam Shah LinkedIn
    2. Who am I
    3. Outline • Motivation • Organizational Overlap Model • Problem Definition • Data Analysis • Mathematical Formulation • Experimental Validation • Applications • Link Prediction • Community Detection
    4. Motivation • Social Networks : important for • Sharing and Discovery • Communication • Networking • Online Social Networks are partially observed • Link Prediction and Recommending entities are important
    5. Motivation: Rich Member Profile
    6. Motivation: Network is Important
    7. Motivation: People You May Know
    8. Motivation: Other Entities
    9. Motivation: Recommender Ecosystem Similar Profiles Connections News Skill Endorsements
    10. Motivation • Member profile contains various types of organizations • Company, Schools, Groups, ... • Can we compute edge affinity based on these organization information? • Useful for many applications: • Recommending members to connect (link prediction) • Recommending other entities from the same community (community detection)
    11. Outline • Motivation • Organizational Overlap Model • Problem Definition • Data Analysis • Mathematical Formulation • Experimental Validation • Applications • Link Prediction • Community Detection
    12. Organizational Overlap Problem • Goal: compute the probability of connection based on the organizational time overlap • Organizational time overlap between two members A and B, who belonged to the same organization O : T(A, B, O) • Probability that A and B are connected: P(A, B) • P(A, B) = f(T(A, B, O), O), over all organizations O • A function of time overlapped in the organization O • Properties of the organization O
    13. Organizational Overlap Data Analysis • Insight 1: Connection density increases with organizational time overlap
    14. Organizational Overlap Data Analysis • Insight 2: Connection density decreases with the size of the organization
    15. Organizational Overlap Model
    16. Organizational Overlap Model
    17. Organizational Overlap Model Validation • Empirical connection density fits our model
    18. Organizational Overlap Model: Estimating λ • λ: organization-dependent parameter • Members of smaller organizations are more likely to know each other • Empirical and MLE estimates for log(λ) ~ -0.8 log(|S|)
    19. Outline • Motivation • Organizational Overlap Model • Problem Definition • Data Analysis • Mathematical Formulation • Experimental Validation • Applications • Link Prediction • Community Detection
    20. Application: Link Prediction • Warm start: existing edges • 2 features: org. overlap time and size • Common Neighbors (CN) • Adamic-Adar (AA) • Data Sets: LinkedIn, Enron emails, Wiki talk
    21. Application: Link Prediction • Cold start: no or sparse edges • All features: • time overlap, company size, company propensity, node propensity, ...
    22. Application: Community Detection • Good for candidate generation for entity recommendation systems, such as companies to follow • Graph Clustering algorithm (Graclus) • Members as nodes and an edge between any pair of nodes with overlap • Organizational overlap model for computing edge weight • Graclus: minimizes the total weight of the cuts • Evaluation using • Virality of company follow within communities • Virality of article updates
    23. Community Detection Evaluation • Using Spread of company follow • Compared 3 methods • Organizational overlap based • Using social connections graph • Random: partition the nodes in the same company • Spread: avg # of companies followed within d days of the first follow event • Propagation rate: norm. spread
    24. Community Detection Evaluation • Virality of article updates within communities Avg degree: 4-6 Avg degree: 12-14
    25. Related Work
    26. Summary • Motivation • Organizational Overlap Model • Problem Definition • Data Analysis • Mathematical Formulation • Experimental Validation • Applications • Link Prediction • Community Detection
    27. Acknowledgement • http://data.linkedin.com • We are hiring! • Contact: mtiwari[at]linkedin.com • Follow: @mitultiwari on Twitter
    28. Questions?
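
The warm-start link prediction slide lists Common Neighbors (CN) and Adamic-Adar (AA) alongside the two organizational overlap features. Below is a minimal Python sketch of those two standard neighborhood scores, assuming an undirected graph stored as a dict that maps each node to its set of neighbors. The graph representation, function names, and toy graph are hypothetical, and the slides do not say how these scores are combined with the overlap features.

```python
import math

def common_neighbors(adj, u, v):
    """Number of nodes adjacent to both u and v."""
    return len(adj[u] & adj[v])

def adamic_adar(adj, u, v):
    """Sum of 1 / log(degree(z)) over the common neighbors z of u and v.

    Low-degree common neighbors contribute more than hubs.
    """
    score = 0.0
    for z in adj[u] & adj[v]:
        degree = len(adj[z])
        if degree > 1:  # log(1) = 0 would divide by zero
            score += 1.0 / math.log(degree)
    return score

# Hypothetical toy graph with edges a-c, b-c, a-d, b-d, d-e.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"d"},
}

cn_score = common_neighbors(adj, "a", "b")  # c and d are shared
aa_score = adamic_adar(adj, "a", "b")       # 1/log(2) + 1/log(3)
```

In this toy graph, c (degree 2) contributes more to the AA score for the pair (a, b) than d (degree 3), which is the down-weighting of well-connected neighbors that distinguishes AA from CN.
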
