Social Network Analysis at LinkedIn

4,353 views
4,175 views

Published on

Social Network Analysis talk at UT Austin

Published in: Technology, Business
1 Comment
14 Likes
Statistics
Notes
No Downloads
Views
Total views
4,353
On SlideShare
0
From Embeds
0
Number of Embeds
4
Actions
Shares
0
Downloads
0
Comments
1
Likes
14
Embeds 0
No embeds

No notes for slide

Social Network Analysis at LinkedIn

  1. 1. Social Network Analysis at Linkedin Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn 1
  2. 2. Who am I? 2
  3. 3. OutlineAbout MeAbout LinkedInAnalytics products at LinkedInFun Facts: Sequencing the Startup DNA 3
  4. 4. LinkedIn by the numbers120,000,000+ users (August 4, 2011)2+ new user registrations per second81+ Million monthly unique users* (Comscore)2 Billion People Searches in 20107.1 Billion page views in Q2 20102+ Million companies with LinkedIn Company Pages2+ M LinkedIn Groups 4
  5. 5. Broad Range of Products 5
  6. 6. User Profile 6
  7. 7. Connections 7
  8. 8. CommunicationsCommunications 10 8
  9. 9. Hiring Solutions 9
  10. 10. Job Search 10
  11. 11. Company Pages 11
  12. 12. OutlineAbout MeAbout LinkedInAnalytics products at LinkedInFun Facts: Sequencing the Startup DNA 12
  13. 13. What do I mean by Analytics Products? 13
  14. 14. People You May Know 14
  15. 15. Profile Stats: WVMP 15
  16. 16. Viewers of this profile also ... 16
  17. 17. Skills 17
  18. 18. InMaps 18
  19. 19. Related SearchesMillions of Searches everydayHelp users to explore and refine their queries 19
  20. 20. Analytics Products: Key IdeasRecommendations People You May Know, Related Searches, Viewers of this profile ...Insight Profile Stats: Who Viewed My Profile, SkillsVisualization InMaps 20
  21. 21. Analytics Products: Challenges LinkedIn: largest professional network 120+ million members on LinkedIn Billions of pageviews Terabytes of data to process 21
  22. 22. OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 22
  23. 23. Connection StrengthLet’s build “Connection Strength”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 23
  24. 24. Connection StrengthHow well do people know each other?Connection StrengthApplication: reorder updates to show updatesfrom close connections 24
  25. 25. Connection Strength How well do Alicepeople know each other? Bob Carol 25
  26. 26. Connection Strength How well do Alicepeople know each other? Bob Carol 26
  27. 27. Connection Strength How well do Alicepeople know each other? Bob Carol Triangle closing 27
  28. 28. Connection Strength How well do Alicepeople know each other? Bob Carol Triangle closingProb(Bob knows Carol) ~ the # of common connections 28
  29. 29. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 29
  30. 30. Pig OverviewLoad: load data, specify formatStore: store data, specify formatForeach, Generate: Projections, similar to selectGroup by: group by column(s)Join, Filter, Limit, Order, ...User Defined Functions (UDFs) 30
  31. 31. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 31
  32. 32. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 32
  33. 33. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 33
  34. 34. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 34
  35. 35. Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 35
  36. 36. Triangle Closing Example Alice Bob Carol connections = LOAD `connections` USING1.(A,B),(B,A),(A,C),(C,A) PigStorage();2.(A,{B,C}),(B,{A}),(C,{A})3.(A,{B,C}),(A,{C,B})4.(B,C,1), (C,B,1) 36
  37. 37. Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A) group_conn = GROUP connections BY2.(A,{B,C}),(B,{A}),(C,{A}) source_id;3.(A,{B,C}),(A,{C,B})4.(B,C,1), (C,B,1) 37
  38. 38. Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A)2.(A,{B,C}),(B,{A}),(C,{A}) pairs = FOREACH group_conn GENERATE3.(A,{B,C}),(A,{C,B}) generatePair(connections.dest_id) as (id1, id2);4.(B,C,1), (C,B,1) 38
  39. 39. Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A)2.(A,{B,C}),(B,{A}),(C,{A}) common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn3.(A,{B,C}),(A,{C,B}) GENERATE flatten(group) as (source_id, dest_id),4.(B,C,1), (C,B,1) COUNT(pairs) as common_connections; 39
  40. 40. Connection StrengthAgeCompany OverlapSchool Overlap 40
  41. 41. Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 41
  42. 42. Systems and ToolsKafka (LinkedIn)Hadoop (Apache)Azkaban (LinkedIn)Voldemort (LinkedIn) 42
  43. 43. Data Cycle 43
  44. 44. Systems and ToolsKafka publish-subscribe messaging system transfer data from production to HDFSHadoopAzkabanVoldemort 44
  45. 45. Systems and ToolsKafkaHadoop Java MapReduce and Pig process dataAzkabanVoldemort 45
  46. 46. Systems and ToolsKafkaHadoopAzkaban Hadoop workflow management tool to manage hundreds of Hadoop jobsVoldemort 46
  47. 47. Systems and ToolsKafkaHadoopAzkabanVoldemort Key-value store store output of Hadoop jobs and serve in production 47
  48. 48. Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 48
  49. 49. Hadoop MapReduceLarge-scale distributed data processingMap phase: partition the problem in smallerstepsReduce phase: aggregate the output of smallerstepsAllows parallelism and fault-tolerance 49
  50. 50. Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 50
  51. 51. Our Workflow TriangleAge School Overlap Closing Union push-to-prod 51
  52. 52. Our Workflow Triangle Age School Overlap Closing Unionpush-to-qa push-to-prod 52
  53. 53. Sample Workflow 53
  54. 54. Azkaban Workflow Management Dependency management Regular Scheduling Monitoring Diverse jobs: Java, Pig, Clojure Configuration/Parameters Resource control/locking Restart/Stop/Retry Visualization History Logs 54
  55. 55. Our Workflow TriangleAge School Overlap Closing Union push-to-prod 55
  56. 56. Our Workflow TriangleAge School Overlap Closing Union push-to-prod 56
  57. 57. Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 57
  58. 58. VoldemortLarge amount of data/ScalableQuick lookup/low latencyVersioning and RollbackFault tolerance through replicationRead onlyOffline index building 58
  59. 59. Voldemort RO Store 59
  60. 60. OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 60
  61. 61. Related SearchesMillions of Searches everydayHelp users to explore and refine their queries 61
  62. 62. How to build Related Searches? Collaborative Filtering: searches done in the same session Q1 Q2 Q3 Q4 Time 62
  63. 63. How to build Related Searches? Searches correlated by results clicks Q1 R1 Qn Rm 63
  64. 64. How to build Related Searches? Searches with overlapping terms Q1 Jeff LinkedIn Q2 LinkedIn CEO 64
  65. 65. OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 65
  66. 66. Sequencing the Startup DNA Done by Monica Rogati at LinkedIn 66
  67. 67. Sequencing the Startup DNA•Top majors: Entrepreneurship, Computer Engineering•Bottom: Nursing, Administration 67
  68. 68. Sequencing the Startup DNA•Most founders age at first startup between 20 and 40 68
  69. 69. Sequencing the Startup DNA•Top regions: San Francisco, NYC 69
  70. 70. Sequencing the Startup DNA•Top Business Schools:Stanford, Harvard, MIT Sloan 70
  71. 71. Things CoveredAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 71
  72. 72. SNA TeamThanks to SNA Team at LinkedInhttp://sna-projects.comWe are hiring! 72
  73. 73. Questions? 73

×