Large scale social recommender systems and their evaluation

This talk gives an overview of some of the large-scale recommender systems at LinkedIn, such as People You May Know (PYMK) and Suggested Skills Endorsements. It also addresses how we formulate machine learning modeling problems to build these recommender systems and how we evaluate our models. Modeling for these systems involves careful feature engineering and incorporating user feedback, both explicit and implicit. The talk describes our feature engineering through an example: modeling organizational overlap between people for link prediction and community detection over the social graph. It also describes how we incorporate user feedback through impression discounting, which discounts recommended results that users have ignored. Careful evaluation of modeling changes, both offline and online (A/B testing), is an inherent part of measuring the effectiveness of our recommender systems. We have built a sophisticated end-to-end A/B testing and evaluation platform at LinkedIn called XLNT, and the talk covers how we use XLNT for power analysis, A/B testing, and measuring confidence in the results.

Published in: Data & Analytics

  1. Large-Scale Social Recommender Systems and their Evaluation • Mitul Tiwari, Search, Network, and Analytics (SNA), LinkedIn
  2. Who am I
  3. Outline • Social Recommender Systems at LinkedIn • People You May Know • Feature Engineering - Organization Overlap Model • User Feedback - Impression Discounting • Skills Endorsement Suggestions • Evaluation • Offline • Online A/B Testing
  4. LinkedIn by the numbers • 313M members • 2 new members/sec
  5. Broad Range of Products
  6. LinkedIn Homepage • Powered by recommendations • News, Connections, Jobs, Groups, Companies • Relevant Updates and ads
  7. Rich Recommender Ecosystem • Similar Profiles • Connections • News • Skill Endorsements
  8. Outline • Social Recommender Systems at LinkedIn • People You May Know (PYMK) • Feature Engineering - Organization Overlap Model • User Feedback - Impression Discounting • Skills Endorsement Suggestions • Evaluation • Offline • Online A/B Testing
  9. Network is Important at LinkedIn • Largest Professional Network • Network helps people: Discovering Opportunities, Getting Updates, Keeping in touch, …
  10. PYMK: Link Prediction over the Social Graph • People You May Know (PYMK) • Link Prediction System • Recommends other members to connect with
  11. People You May Know • PYMK accounts for more than 50% of total connections and invitations • That is, more than 50% of LinkedIn’s social graph is formed through PYMK • Challenges • Feature Engineering • Machine Learning • Scaling to process 100s of TBs
  12. People You May Know: Feature Engineering • How do people know each other? • Example graph: Alice, Bob, Carol
  13. People You May Know: Feature Engineering • How do people know each other? • Example graph: Alice, Bob, Carol • One good signal is whether there are common connections
  14. People You May Know: Feature Engineering • How do people know each other? • Example graph: Alice, Bob, Carol • Triangle closing
  15. People You May Know: Feature Engineering • How do people know each other? • Example graph: Alice, Bob, Carol • Triangle closing • Prob(Bob knows Carol) ~ the number of common connections
  16. Triangle Closing in Pig
      -- connections in (source_id, dest_id) format in both directions
      connections = LOAD 'connections' USING PigStorage() AS (source_id, dest_id);
      -- collect each member's connections
      group_conn = GROUP connections BY source_id;
      -- generatePair is a UDF that is not shown on the slide; assumed to emit each pair of a member's connections
      pairs = FOREACH group_conn GENERATE generatePair(connections.dest_id) AS (id1, id2);
      -- count how many members each pair has in common
      common_conn = GROUP pairs BY (id1, id2);
      common_conn = FOREACH common_conn GENERATE FLATTEN(group) AS (source_id, dest_id), COUNT(pairs) AS common_connections;
      STORE common_conn INTO 'common_conn' USING PigStorage();
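The triangle-closing count on this slide can also be sketched in plain Python for small in-memory graphs. This is only an illustration of the same idea, not the production job; the function name `common_connection_counts` and the sample edges are assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def common_connection_counts(edges):
    """Count common connections for every pair of members that share a neighbor.

    edges: iterable of (source_id, dest_id) pairs, listed in both directions,
    mirroring the input format described on the slide.
    """
    # Group each member's connections (the GROUP BY source_id step).
    neighbors = defaultdict(set)
    for source_id, dest_id in edges:
        neighbors[source_id].add(dest_id)

    # Every pair of a member's connections shares that member as a common
    # connection (the pair generation and counting steps).
    counts = Counter()
    for connections in neighbors.values():
        for id1, id2 in combinations(sorted(connections), 2):
            counts[(id1, id2)] += 1
    return counts


# Hypothetical sample graph: Alice and Dave each know both Bob and Carol.
sample_edges = [
    ("alice", "bob"), ("bob", "alice"),
    ("alice", "carol"), ("carol", "alice"),
    ("dave", "bob"), ("bob", "dave"),
    ("dave", "carol"), ("carol", "dave"),
]
sample_counts = common_connection_counts(sample_edges)
```

Like the Pig script, this sketch counts pairs regardless of whether they are already connected; a recommender would presumably filter existing connections out afterwards.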
  17. People You May Know: Feature Engineering • A member profile contains various types of organizations • Companies, Schools, Groups, ... • Can we compute edge affinity based on this organization information? • Useful for many applications: • Recommending members to connect (link prediction) • Recommending other entities from the same community (community detection)
  18. Organizational Overlap: Feature Engineering • Insight 1: Connection density increases with organizational time overlap • Hsieh, Tiwari, et al., WWW’13
  19. Organizational Overlap: Feature Engineering • Insight 2: Connection density decreases with the size of the organization • Hsieh, Tiwari, et al., WWW’13
  20. Organizational Overlap Model • Independence assumption - Community-affiliation model (AGM) • P(O1, O2) = 1 - (1-P(O1))(1-P(O2)) (Leskovec et al. 2012) • Hsieh, Tiwari, et al., WWW’13
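The independence formula on this slide can be written as a small function. The sketch below extends the two-organization formula to any number of shared organizations under the same independence assumption; that extension, the function name `overlap_probability`, and the sample probabilities are assumptions for illustration.

```python
def overlap_probability(org_probs):
    """Combine per-organization connection probabilities under independence.

    For two organizations this is P(O1, O2) = 1 - (1 - P(O1)) * (1 - P(O2)):
    two people stay unconnected only if no shared organization connects them.
    """
    not_connected = 1.0
    for p in org_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        not_connected *= 1.0 - p
    return 1.0 - not_connected


# Hypothetical example: a shared company (0.2) and a shared school (0.5).
combined = overlap_probability([0.2, 0.5])
```

With these sample values the combined probability is 1 - 0.8 * 0.5 = 0.6, higher than either organization alone.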
  21. Organizational Overlap Model • Using Assumption 1 to further decompose the time intervals • Hsieh, Tiwari, et al., WWW’13
  22. Organizational Overlap Model • Empirical connection density fits our model • Observe an upper bound • Why an upper bound? • In large companies, P(t) cannot reach 1 even for a large overlap t • Hsieh, Tiwari, et al., WWW’13
  23. Outline • Social Recommender Systems at LinkedIn • People You May Know • Feature Engineering - Organization Overlap Model • User Feedback - Impression Discounting • Skills Endorsement Suggestions • Evaluation • Offline • Online A/B Testing
  24. Impression Discounting Example: PYMK • Li, Lakshmanan, Tiwari, Shah, KDD’14
  25. Impression Discounting Example: Endorsement • If you are likely to endorse, we will show a skills suggestion again • Otherwise, we should leave this space for other skills suggestions • Li, Lakshmanan, Tiwari, Shah, KDD’14
  26. Impression Discounting Problem Definition • Impressions: recommendations shown to the user • Conversion: positive action - invite, endorse, etc. • In natural language: • We have already shown an item several times with no conversion. How much should we discount that item? • More formally: • For a user u and item i, given an impression history (T1, T2, …, Tn) between u and i, can we predict the conversion rate for u on i? • Li, Lakshmanan, Tiwari, Shah, KDD’14
  27. Impression Discounting Framework • Impression discounting as a plugin or as a feature in the model • Li, Lakshmanan, Tiwari, Shah, KDD’14
  28. Data Analysis: Conversion Rate Changes with Impression Count • If we show an item to a user repeatedly, the conversion rate decreases • Decrease in conversion rate varies • Li, Lakshmanan, Tiwari, Shah, KDD’14
  29. Fitting Conversion Rate with Impression Count
  30. Conversion Rate with Last Seen (Recency)
  31. Impression Discounting Models • Linear Aggregation • Multiplicative Aggregation • f(Xi) is one of the discounting functions given earlier • Li, Lakshmanan, Tiwari, Shah, KDD’14
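The formulas from the slides did not survive in this transcript, so the following is only a sketch of what linear and multiplicative aggregation of discounting functions could look like. The exponential form of f, the rates, and the weights are all assumptions made for illustration and are not taken from the talk.

```python
import math


def exponential_discount(x, rate):
    """Hypothetical discounting function f(x) = exp(-rate * x)."""
    return math.exp(-rate * x)


def linear_aggregation(discounts, weights):
    """Weighted sum of per-feature discounts f(X_i)."""
    if len(discounts) != len(weights):
        raise ValueError("need one weight per discount")
    return sum(w * d for w, d in zip(weights, discounts))


def multiplicative_aggregation(discounts):
    """Product of per-feature discounts f(X_i)."""
    return math.prod(discounts)


def discounted_score(score, discount):
    """Apply an aggregated discount factor to a recommendation score."""
    return score * discount


# Hypothetical behavior features for one (user, item) pair.
impression_count = 3
f_count = exponential_discount(impression_count, rate=0.5)
f_other = exponential_discount(1, rate=0.1)

linear = linear_aggregation([f_count, f_other], weights=[0.7, 0.3])
multiplicative = multiplicative_aggregation([f_count, f_other])
```

In the plugin setting described on the framework slide, the aggregated factor would rescale an existing score via `discounted_score`; as a model feature, the factor itself would be fed to the model instead.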
  32. How does PYMK work? • Combine features using a Machine Learning model
  33. Outline • Social Recommender Systems at LinkedIn • People You May Know • Feature Engineering - Organization Overlap Model • User Feedback - Impression Discounting • Skills Endorsement Suggestions • Evaluation • Offline • Online A/B Testing
  34. Suggested Skills Endorsement • Suggestions to endorse your connections for a particular skill
  35. Skills Endorsements: Example
  36. Viral Growth: 3B Skills Endorsements in a year • One of the fastest growing products in LinkedIn’s history • How did we get to Skills Endorsements?
  37. Skills Tagging • What is tagging? • Extracting potential skills from a profile using a skills taxonomy • Standardize skill phrase variants: deduplication • Diagram labels: Profile, Tokenize, Skills Tagger, Phrases, Skills
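A minimal sketch of the tagging idea on this slide: tokenize profile text, match phrases against a taxonomy, and map phrase variants to one standardized skill. The taxonomy contents, the function names, and the longest-match strategy are assumptions for illustration; the actual tagger is not described in the slides.

```python
import re

# Hypothetical taxonomy mapping phrase variants to a standardized skill name.
SKILL_TAXONOMY = {
    "machine learning": "Machine Learning",
    "ml": "Machine Learning",
    "hadoop": "Hadoop",
    "apache hadoop": "Hadoop",
    "pig": "Pig",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tag_skills(profile_text, taxonomy=SKILL_TAXONOMY, max_phrase_len=3):
    """Return standardized skills found in the text, deduplicated, in order."""
    tokens = tokenize(profile_text)
    skills = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in taxonomy:
                skill = taxonomy[phrase]
                if skill not in skills:  # deduplicate phrase variants
                    skills.append(skill)
                i += n
                break
        else:
            i += 1  # no phrase starts here; move to the next token
    return skills


tagged = tag_skills(
    "Built ML pipelines on Apache Hadoop and Pig; machine learning research"
)
```

Here "ML" and "machine learning" collapse to a single standardized skill, which is the deduplication step the slide mentions.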
  38. Skills Inference and Recommendation • What is Inference? • Predict a skill even if not present in the profile • Based on likelihood of member having a skill • Features: company, industry, skills, ... • Diagram labels: Profile, Tokenize, Skills Tagger, Phrases, Skills, Skills Classifier, Profile features, Recommended Skills
  39. Suggested Skill Endorsements • Prompt your connections to validate your skill and expertise • How does it work?
  40. Suggested Skill Endorsements • Binary Classification • Features • Company overlap, School overlap, Industrial and functional area similarity, Title similarity, Site interactions, Co-interactions, ... • Diagram labels: Candidate generation, Classifier, Features (Company, Title, Industry, ...), Suggested Endorsements
  41. Social Recommendation and Tagging • Skill Tagging • Skill Recommendation • Suggested Skill Endorsements
  42. Skills Important for Data Scientists?
  43. Find influencers in Venture Capital?
  44. Outline • Social Recommender Systems at LinkedIn • People You May Know • Feature Engineering - Organization Overlap Model • User Feedback - Impression Discounting • Skills Endorsement Suggestions • Evaluation • Offline • Online A/B Testing
  45. Offline Evaluation • AUC - Area under the ROC curve • Precision at top k • Precision improvement for different behavior sets. For example, Impression Discounting
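The two offline metrics named on this slide can be computed as follows. This is a small reference sketch for in-memory lists; the function names and the sample scores are assumptions for illustration, and a pairwise AUC like this one is only practical for small evaluation sets.

```python
def precision_at_k(ranked_labels, k):
    """Fraction of the top-k ranked items that are relevant (label 1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_labels[:k]) / k


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive item is scored above a
    random negative item, counting ties as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores, already sorted from highest to lowest,
# with 1 marking a conversion (for example, an accepted invitation).
sample_scores = [0.9, 0.8, 0.7, 0.6]
sample_labels = [1, 0, 1, 0]

sample_auc = auc(sample_scores, sample_labels)
sample_p_at_2 = precision_at_k(sample_labels, 2)
```

For the sample data, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75, and one of the top two items is relevant, so precision at 2 is 0.5.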
  46. Online Evaluation: A/B testing on PYMK • On LinkedIn PYMK online system: Impression Discounting impact
  47. Growth of A/B Test Experiments @LinkedIn • Daily • More than 200 experiments • 800+ Metrics • Billions of experiment events • How did we scale to so many experiments?
  48. XLNT: Scalable A/B testing Platform • Experimental management and design • Online Infrastructure and Logging • Offline Analysis
  49. XLNT: Experimental management and design • Self-serve • 40+ built-in targeting attributes • Customizable
  50. XLNT: Online Infrastructure • Trigger-based A/B testing • Trigger logging for analysis • Log when a request hits an experiment: control vs treatment
  51. XLNT: Offline Analysis • Metrics design and management • Decentralized ownership • Centralized management • Statistical Analysis • Significance testing (p-value, confidence interval) • Deep-dive: slicing and dicing • Dashboard: Monitoring and Alerting
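The slides do not say which statistical tests XLNT runs. As one common way to obtain a p-value and a confidence interval for a conversion-rate metric, the sketch below uses a two-proportion z-test; the choice of test, the function name, and the sample counts are assumptions for illustration.

```python
import math


def two_proportion_z_test(conv_control, n_control, conv_treatment, n_treatment,
                          z_crit=1.96):
    """Compare conversion rates of control and treatment buckets.

    Returns (z, p_value, ci), where p_value is two-sided and ci is an
    approximate 95% confidence interval (for z_crit=1.96) on the difference
    treatment_rate - control_rate.
    """
    p_c = conv_control / n_control
    p_t = conv_treatment / n_treatment
    diff = p_t - p_c

    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (conv_control + conv_treatment) / (n_control + n_treatment)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_treatment))
    if se_pooled == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = diff / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

    # Unpooled standard error for the confidence interval on the difference.
    se_diff = math.sqrt(p_c * (1 - p_c) / n_control + p_t * (1 - p_t) / n_treatment)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return z, p_value, ci


# Hypothetical experiment: 100/1000 conversions in control, 150/1000 in treatment.
z, p_value, ci = two_proportion_z_test(100, 1000, 150, 1000)
```

A small p-value together with a confidence interval that excludes zero suggests the treatment changed the conversion rate; with hundreds of metrics per experiment, a platform would typically also need to account for multiple comparisons.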
  52. Network Challenges • Network Effect • PYMK inviter and invitee may be in different buckets • Treatment effects leak into Control • Experimental effects are underestimated • Network Bucket Testing
  53. References
      • Modeling Impression Discounting in Large-scale Recommender Systems, Pei Li, Laks V.S. Lakshmanan, Mitul Tiwari, Sam Shah. In Proceedings of the 20th ACM Conference on Knowledge Discovery and Data Mining, August 2014.
      • Organizational Overlap on Social Networks and its Applications, Cho-Jui Hsieh, Mitul Tiwari, Deepak Agarwal, Xinyi (Lisa) Huang, and Sam Shah. In Proceedings of the 22nd International World Wide Web Conference (WWW), May 2013.
      • Structural Diversity in Social Recommender Systems, Xinyi Huang, Mitul Tiwari, and Sam Shah. In Proceedings of the 5th ACM RecSys Workshop on Recommender Systems and the Social Web, October 2013.
      • The Browsemaps: Collaborative Filtering at LinkedIn, Lili Wu, Sam Shah, Sean Choi, Mitul Tiwari, Christian Posse. In Proceedings of the 6th ACM RecSys Workshop on Recommender Systems and the Social Web, October 2014.
      • Metaphor: a system for related search recommendations, Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah. In Proceedings of the 21st International Conference on Information and Knowledge Management (CIKM), October 2012.
      • LinkedIn blog post on XLNT: http://engineering.linkedin.com/ab-testing/xlnt-platform-driving-ab-testing-linkedin
  54. Acknowledgement • Thanks to Data Team at LinkedIn: http://data.linkedin.com • We are hiring! • Contact: mtiwari[at]linkedin.com • Follow: @mitultiwari on Twitter
  55. Questions?