Recommender Systems using Graph Inferences at LinkedIn
Souvik Ghosh
Staff Research Scientist
LinkedIn Confidential ©2014 All Rights Reserved
Some Popular Products at LinkedIn
People You May Know
Jobs You May Be Interested In
Data Flow
Online services
-  Get online features
-  Modify scores
-  Rank and serve
-  Track
Member activity
-  New connections formed
-  New members joining
-  Profile updates
-  Group activities
Offline processing (Hadoop)
-  Data analysis and modeling
-  Scoring
-  Pushing to a key-value store
-  Spark
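As a rough illustration of the data flow on this slide, the sketch below uses a plain Python dict in place of the key-value store. The function names (`offline_push`, `online_rank`) and the "already invited" filter used as the online score modification are hypothetical, chosen only for illustration; the slides do not describe the actual services.

```python
from typing import Dict, List, Set

# Stand-in for the key-value store: member id -> {candidate id: offline score}.
KeyValueStore = Dict[str, Dict[str, float]]


def offline_push(store: KeyValueStore, member: str, scores: Dict[str, float]) -> None:
    """Offline step: write the batch-computed candidate scores for one member."""
    store[member] = dict(scores)


def online_rank(store: KeyValueStore, member: str,
                already_invited: Set[str], top_k: int) -> List[str]:
    """Online step: fetch offline scores, modify them with an online signal, rank.

    The online signal here (drop candidates the member has already invited)
    is an assumed example of "modify scores", not the documented behaviour.
    """
    offline_scores = store.get(member, {})
    adjusted = {v: s for v, s in offline_scores.items() if v not in already_invited}
    ranked = sorted(adjusted, key=lambda v: adjusted[v], reverse=True)
    return ranked[:top_k]


store: KeyValueStore = {}
offline_push(store, "u", {"a": 0.9, "b": 0.5, "c": 0.7})
recommendations = online_rank(store, "u", already_invited={"a"}, top_k=2)
```

Tracking, the last online step, would log which of `recommendations` were shown and acted on, feeding the member-activity data back into the offline jobs.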
Connections Matter
§  Members
§  A well-connected member is much more likely to be able to use the LinkedIn ecosystem for her professional development and growth
§  LinkedIn
§  Well-connected members are
–  More likely to engage with LinkedIn products
–  More likely to drive viral growth and connectivity
§  A large number of connections at LinkedIn are formed via people recommendations
People You May Know (PYMK)
§  Recommend that members connect with other people (current members and guests)
§  Scale:
–  350M+ members
–  200K+ new members every day
–  Many billions of existing edges
–  Many millions of new edges added daily
§  Viral Loop
–  New connections affect the network of a member
–  That affects future recommendations
§  Multi-level and delayed response
–  Member sends out an invitation to connect
–  Invitation acceptance is delayed
§  Multiple Channels:
–  Desktop, mobile, dedicated apps and pages, LinkedIn feed
PYMK: Objectives
Optimize:
§  Value to the member
§  Number of connections formed
§  Member growth
§  Connections that are likely to drive more interactions
§  Connections that increase diversity (different communities,
companies etc.)
§  Networks of new members
§  Members with small networks
Recommendation
Rank by score:
§  For a member u, candidate connections v are ranked by
$$ \mathrm{Score}(u, v) = P[u, v] \times \sum_{o \in O} w(u, o) \, V(u, v, o) $$
§  P[u, v] = Probability that member u invites v and v accepts
§  O = Set of objectives
§  V(u, v, o) = Value for user u on objective o given a connection between u and v
§  w(u, o) = Weight on objective o for user u
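The ranking rule on this slide (connection probability times the weighted sum of objective values) can be sketched as follows. The objective names and all numbers are made up for illustration; only the form of the score comes from the slide.

```python
from typing import Dict, List, Tuple


def score(p_uv: float, weights: Dict[str, float], values: Dict[str, float]) -> float:
    """Score(u, v) = P[u, v] * sum over objectives o of w(u, o) * V(u, v, o)."""
    return p_uv * sum(weights[o] * values[o] for o in weights)


def rank_candidates(
    weights: Dict[str, float],
    candidates: Dict[str, Tuple[float, Dict[str, float]]],
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs for one member, highest score first."""
    scored = [(v, score(p, weights, vals)) for v, (p, vals) in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights w(u, o) for member u over two assumed objectives.
weights_u = {"interactions": 1.0, "diversity": 0.5}

# Hypothetical candidates: v -> (P[u, v], {o: V(u, v, o)}).
candidates_u = {
    "v1": (0.30, {"interactions": 2.0, "diversity": 0.0}),
    "v2": (0.10, {"interactions": 4.0, "diversity": 6.0}),
    "v3": (0.50, {"interactions": 0.5, "diversity": 1.0}),
}

ranking = rank_candidates(weights_u, candidates_u)
```

In this toy data the candidate with the lowest connection probability (`v2`) still ranks first because its expected value across objectives is high, which is the trade-off the score is meant to capture.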
Link Prediction
Predict P[u, v]: (See Liben-Nowell and Kleinberg, CIKM 2003)
The probability that a connection forms is related to:
§  Number of common friends
§  Number of common close friends
§  Same school and major
§  Same company, job function, duration
§  Similar skill set
§  More
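The first signal in the list above, the number of common friends, is the simplest to compute. A minimal sketch on a toy adjacency map (member ids and edges are invented for illustration):

```python
from typing import Dict, Set


def common_connections(graph: Dict[str, Set[str]], u: str, v: str) -> int:
    """Count members connected to both u and v (shared first-degree connections)."""
    return len(graph.get(u, set()) & graph.get(v, set()))


# Toy undirected graph stored as symmetric adjacency sets.
graph: Dict[str, Set[str]] = {
    "u": {"a", "b", "c"},
    "v": {"b", "c", "d"},
    "w": {"d"},
    "a": {"u"},
    "b": {"u", "v"},
    "c": {"u", "v"},
    "d": {"v", "w"},
}

n_uv = common_connections(graph, "u", "v")  # shared: b and c
n_uw = common_connections(graph, "u", "w")  # no shared connections
n_vw = common_connections(graph, "v", "w")  # shared: d
```

A "close friends" variant would restrict the intersection to edges above some tie-strength threshold; how closeness is defined is not stated on the slide.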
Link Prediction
§  Use a regression model:
$$ P[u, v] = \alpha^T f_u + \beta^T f_v + f_u^T A f_v + \gamma^T f_{uv} $$
§  $f_u$, $f_v$ are member features
§  $f_u^T A f_v$ models interactions between individual features
§  $f_{uv}$ are edge features
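A minimal sketch of evaluating this model for one pair (u, v), in plain Python. The coefficient names `alpha`, `beta`, `gamma` are labels assumed here for the three coefficient vectors, and passing the linear predictor through a logistic function to obtain a probability is also an assumption: the slide says only "regression model". All numbers are invented.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def bilinear(f_u: Sequence[float], A: Sequence[Sequence[float]],
             f_v: Sequence[float]) -> float:
    """f_u^T A f_v: pairwise interaction between features of u and features of v."""
    return sum(f_u[i] * A[i][j] * f_v[j]
               for i in range(len(f_u)) for j in range(len(f_v)))


def linear_predictor(f_u, f_v, f_uv, alpha, beta, A, gamma) -> float:
    """alpha^T f_u + beta^T f_v + f_u^T A f_v + gamma^T f_uv."""
    return dot(alpha, f_u) + dot(beta, f_v) + bilinear(f_u, A, f_v) + dot(gamma, f_uv)


def sigmoid(z: float) -> float:
    """Logistic link (assumed) mapping the linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical features and coefficients.
f_u, f_v, f_uv = [1.0, 0.0], [0.0, 1.0], [2.0]   # f_uv: e.g. common-connection count
alpha, beta, gamma = [0.1, 0.2], [0.3, -0.1], [0.2]
A = [[0.0, 0.5],
     [0.0, 0.0]]                                  # only u-feature 0 x v-feature 1 interacts

z = linear_predictor(f_u, f_v, f_uv, alpha, beta, A, gamma)
p_uv = sigmoid(z)
```

The matrix `A` is what lets the model express effects such as "same company" or "same school", which cannot be written as a sum of separate per-member terms.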
Value from Connection
§  Interactions:
§  Members have many activities on LinkedIn, for example:
–  Post an article, join a professional group, connect to another
member, like/share another post, update profile etc.
§  These result in interactions with the member’s connections
(restrictions apply)
§  V(u, v, o) = Expected marginal number of interactions per
week if a connection between u and v is formed
§  V(u, v, o) is predicted using GLMs:
$$ V(u, v, o) = F(f_u, f_v, f_{uv}, \mathrm{Net}_u, \mathrm{Act}_v) $$
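One way such a GLM prediction could look, as a sketch only. The slide does not name the GLM family or link, so a Poisson-style log link (a natural choice for a count of interactions) is assumed here. Treating Net_u and Act_v as single numeric summaries of u's network and v's activity is likewise an assumption, and the coefficients are invented rather than fitted.

```python
import math
from typing import Sequence


def predicted_value(f_u: Sequence[float], f_v: Sequence[float],
                    f_uv: Sequence[float], net_u: float, act_v: float,
                    coef: Sequence[float], intercept: float = 0.0) -> float:
    """Expected interactions per week under an assumed log-link GLM.

    The inputs are concatenated into one feature vector x, and the
    prediction is exp(intercept + coef . x).
    """
    x = list(f_u) + list(f_v) + list(f_uv) + [net_u, act_v]
    if len(x) != len(coef):
        raise ValueError("coef must have one entry per feature")
    eta = intercept + sum(c * xi for c, xi in zip(coef, x))
    return math.exp(eta)


# Hypothetical coefficients, ordered as [f_u, f_v, f_uv, Net_u, Act_v].
coef = [0.1, 0.2, 0.05, -0.1, 0.3]

value_low_activity = predicted_value([1.0], [0.0], [3.0], net_u=2.0, act_v=1.0,
                                     coef=coef, intercept=-1.0)
value_high_activity = predicted_value([1.0], [0.0], [3.0], net_u=2.0, act_v=3.0,
                                      coef=coef, intercept=-1.0)
```

With a positive coefficient on Act_v, a more active candidate v yields a larger predicted value for u, matching the intuition that v's activities are what generate interactions for u.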
Value from Guest Invitation
§  Members inviting others to join the network is crucial for
growth of the network
§  Anderson et al. (WWW 2015)
§  LinkedIn’s cascade trees are very viral
§  The growth of trees is almost linear in time
Network A/B Testing and Measurement
§  Random treatment assignments
§  Treatment recommendations affect control group as well
§  Experiment effects are often biased
§  Gui, Xu et al. (WWW 2015)
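A toy calculation shows why interference biases a naive comparison. The outcome model below (a baseline, a direct effect of treatment, and a spillover proportional to the fraction of treated neighbours) is entirely hypothetical and is not the estimator from the cited paper.

```python
from typing import Dict, Set

# Four members on a ring; each member has two neighbours.
neighbours: Dict[int, Set[int]] = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}

BASE, DIRECT, SPILLOVER = 1.0, 2.0, 1.0  # assumed outcome-model parameters


def outcome(member: int, treated: Set[int]) -> float:
    """Outcome = base + direct effect if treated + spillover from treated neighbours."""
    frac_treated = len(neighbours[member] & treated) / len(neighbours[member])
    return BASE + DIRECT * (member in treated) + SPILLOVER * frac_treated


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


everyone = set(neighbours)
treated = {0, 1}
control = everyone - treated

# Naive A/B estimate: treated mean minus control mean within one experiment.
naive_effect = (mean(outcome(m, treated) for m in treated)
                - mean(outcome(m, treated) for m in control))

# Effect of interest: everyone treated versus no one treated.
true_effect = (mean(outcome(m, everyone) for m in everyone)
               - mean(outcome(m, set()) for m in everyone))
```

Here the control members also benefit from their treated neighbours, so the naive difference understates the effect of launching to everyone.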
© 2014 LinkedIn Corp. All rights reserved.
