• Save
Social Network Analysis at LinkedIn
Upcoming SlideShare
Loading in...5
×
 

Social Network Analysis at LinkedIn

on

  • 3,266 views

Social Network Analysis talk at UT Austin

Social Network Analysis talk at UT Austin

Statistics

Views

Total Views
3,266
Views on SlideShare
3,162
Embed Views
104

Actions

Likes
9
Downloads
0
Comments
0

4 Embeds 104

http://www.linkedin.com 59
http://paper.li 28
https://www.linkedin.com 16
http://yali-ld1.linkedin.biz 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Social Network Analysis at LinkedIn Social Network Analysis at LinkedIn Presentation Transcript

  • Social Network Analysis at Linkedin Mitul Tiwari Search, Network, and Analytics (SNA) LinkedIn 1
  • Who am I? 2
  • OutlineAbout MeAbout LinkedInAnalytics products at LinkedInFun Facts: Sequencing the Startup DNA 3
  • LinkedIn by the numbers120,000,000+ users (August 4, 2011)2+ new user registrations per second81+ Million monthly unique users* (Comscore)2 Billion People Searches in 20107.1 Billion page views in Q2 20102+ Million companies with LinkedIn Company Pages2+ M LinkedIn Groups 4
  • Broad Range of Products 5
  • User Profile 6
  • Connections 7
  • CommunicationsCommunications 10 8
  • Hiring Solutions 9
  • Job Search 10
  • Company Pages 11
  • OutlineAbout MeAbout LinkedInAnalytics products at LinkedInFun Facts: Sequencing the Startup DNA 12
  • What do I mean by Analytics Products? 13
  • People You May Know 14
  • Profile Stats: WVMP 15
  • Viewers of this profile also ... 16
  • Skills 17
  • InMaps 18
  • Related SearchesMillions of Searches everydayHelp users to explore and refine their queries 19
  • Analytics Products: Key IdeasRecommendations People You May Know, Related Searches, Viewers of this profile ...Insight Profile Stats: Who Viewed My Profile, SkillsVisualization InMaps 20
  • Analytics Products: Challenges LinkedIn: largest professional network 120+ million members on LinkedIn Billions of pageviews Terabytes of data to process 21
  • OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 22
  • Connection StrengthLet’s build “Connection Strength”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 23
  • Connection StrengthHow well do people know each other?Connection StrengthApplication: reorder updates to show updatesfrom close connections 24
  • Connection Strength How well do Alicepeople know each other? Bob Carol 25
  • Connection Strength How well do Alicepeople know each other? Bob Carol 26
  • Connection Strength How well do Alicepeople know each other? Bob Carol Triangle closing 27
  • Connection Strength How well do Alicepeople know each other? Bob Carol Triangle closingProb(Bob knows Carol) ~ the # of common connections 28
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 29
  • Pig OverviewLoad: load data, specify formatStore: store data, specify formatForeach, Generate: Projections, similar to selectGroup by: group by column(s)Join, Filter, Limit, Order, ...User Defined Functions (UDFs) 30
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 31
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 32
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 33
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 34
  • Triangle Closing in Pig-- connections in (source_id, dest_id) format in both directionsconnections = LOAD `connections` USING PigStorage();group_conn = GROUP connections BY source_id;pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) as (id1, id2);common_conn = GROUP pairs BY (id1, id2);common_conn = FOREACH common_conn GENERATE flatten(group) as (source_id, dest_id), COUNT(pairs) as common_connections;STORE common_conn INTO `common_conn` USING PigStorage(); 35
  • Triangle Closing Example Alice Bob Carol connections = LOAD `connections` USING1.(A,B),(B,A),(A,C),(C,A) PigStorage();2.(A,{B,C}),(B,{A}),(C,{A})3.(A,{B,C}),(A,{C,B})4.(B,C,1), (C,B,1) 36
  • Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A) group_conn = GROUP connections BY2.(A,{B,C}),(B,{A}),(C,{A}) source_id;3.(A,{B,C}),(A,{C,B})4.(B,C,1), (C,B,1) 37
  • Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A)2.(A,{B,C}),(B,{A}),(C,{A}) pairs = FOREACH group_conn GENERATE3.(A,{B,C}),(A,{C,B}) generatePair(connections.dest_id) as (id1, id2);4.(B,C,1), (C,B,1) 38
  • Triangle Closing Example Alice Bob Carol1.(A,B),(B,A),(A,C),(C,A)2.(A,{B,C}),(B,{A}),(C,{A}) common_conn = GROUP pairs BY (id1, id2); common_conn = FOREACH common_conn3.(A,{B,C}),(A,{C,B}) GENERATE flatten(group) as (source_id, dest_id),4.(B,C,1), (C,B,1) COUNT(pairs) as common_connections; 39
  • Connection StrengthAgeCompany OverlapSchool Overlap 40
  • Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 41
  • Systems and ToolsKafka (LinkedIn)Hadoop (Apache)Azkaban (LinkedIn)Voldemort (LinkedIn) 42
  • Data Cycle 43
  • Systems and ToolsKafka publish-subscribe messaging system transfer data from production to HDFSHadoopAzkabanVoldemort 44
  • Systems and ToolsKafkaHadoop Java MapReduce and Pig process dataAzkabanVoldemort 45
  • Systems and ToolsKafkaHadoopAzkaban Hadoop workflow management tool to manage hundreds of Hadoop jobsVoldemort 46
  • Systems and ToolsKafkaHadoopAzkabanVoldemort Key-value store store output of Hadoop jobs and serve in production 47
  • Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 48
  • Hadoop MapReduceLarge-scale distributed data processingMap phase: partition the problem in smallerstepsReduce phase: aggregate the output of smallerstepsAllows parallelism and fault-tolerance 49
  • Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 50
  • Our Workflow TriangleAge School Overlap Closing Union push-to-prod 51
  • Our Workflow Triangle Age School Overlap Closing Unionpush-to-qa push-to-prod 52
  • Sample Workflow 53
  • Azkaban Workflow Management Dependency management Regular Scheduling Monitoring Diverse jobs: Java, Pig, Clojure Configuration/Parameters Resource control/locking Restart/Stop/Retry Visualization History Logs 54
  • Our Workflow TriangleAge School Overlap Closing Union push-to-prod 55
  • Our Workflow TriangleAge School Overlap Closing Union push-to-prod 56
  • Related SearchesLet’s build “Related Searches”Systems and Tools we useHadoop MapReduceManaging workflowServing data in production 57
  • VoldemortLarge amount of data/ScalableQuick lookup/low latencyVersioning and RollbackFault tolerance through replicationRead onlyOffline index building 58
  • Voldemort RO Store 59
  • OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 60
  • Related SearchesMillions of Searches everydayHelp users to explore and refine their queries 61
  • How to build Related Searches? Collaborative Filtering: searches done in the same session Q1 Q2 Q3 Q4 Time 62
  • How to build Related Searches? Searches correlated by results clicks Q1 R1 Qn Rm 63
  • How to build Related Searches? Searches with overlapping terms Q1 Jeff LinkedIn Q2 LinkedIn CEO 64
  • OutlineAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 65
  • Sequencing the Startup DNA Done by Monica Rogati at LinkedIn 66
  • Sequencing the Startup DNA•Top majors: Entrepreneurship, Computer Engineering•Bottom: Nursing, Administration 67
  • Sequencing the Startup DNA•Most founders age at first startup between 20 and 40 68
  • Sequencing the Startup DNA•Top regions: San Francisco, NYC 69
  • Sequencing the Startup DNA•Top Business Schools:Stanford, Harvard, MIT Sloan 70
  • Things CoveredAnalytics products at LinkedInDeep Dive - Connection StrengthDeep Dive - Related SearchesFun Facts: Sequencing the Startup DNA 71
  • SNA TeamThanks to SNA Team at LinkedInhttp://sna-projects.comWe are hiring! 72
  • Questions? 73