Cold-Start Recommendations to Users With Rich Profiles

Presentation to the RecSys NYC Meetup

  1. Cold-Start Recommendations to Users With Rich Profiles
     Harlan D. Harris, PhD, Director of Data Science at WayUp
     RecSys NYC Meetup, September 2018
  2. After This Meetup!
     • Go to The Storehouse!
     • Meet other RecSys peeps!
  4. Why Build a RecSys?
     • College students may not know what they want, so we must show them options
     • Promote our customers’ job listings
     • Drive ongoing engagement with content (blog posts, guides) through recommendations
  5. RecSys UX Categories
     [Diagram: recommendations driven by Who You Are vs. What You’ve Done, presented Feed-Like vs. Catalog-Like; one category is marked (Feed).]
  6. The Problem with Collaborative Filters…
     • Collaborative filtering relies on a user’s interaction history, which brand-new (cold-start) users do not have yet
  7. Leverage the Profile
     • Structured & Unstructured Data
     • Natural Language Processing
     • Learning to Rank
     • Domain Knowledge & Feature Engineering
  8. Architecture
     [Diagram: request flow among the following components, shown with speech bubbles]
     • User & Front End: “Hey, show me jobs!”
     • Main App: “That’s hard! But I know who you are!”
     • Microservice: “Got you. Feature Engineering your Profile…”
     • Two databases (DB) and Offline Machine Learning
     Labels on the data stores and arrows: Profile, Interaction History; User ID, Params; User Details; Listing IDs; Listing Details; Ranked Listings & Details
  9. What Do You Mean by… Similar?
     • Graphic Designer: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re a great artist.”
     • Risk Manager: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re OK at math.”
     • Visual Brand Lead: “Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you’ll be on television!”
     • Meetup, next week!
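The slide's point can be made concrete with naive word-overlap similarity. The sketch below is an illustration, not the talk's method: it scores the three listings with bag-of-words cosine similarity and compares the result with a hand-assigned job category. The `CATEGORY` mapping and all function names are hypothetical.

```python
import math
import re
from collections import Counter

LISTINGS = {
    "Graphic Designer": "Lehman Brothers is the leading firm in highly leveraged mortgages! "
                        "We have a ping pong table! You're a great artist.",
    "Risk Manager": "Lehman Brothers is the leading firm in highly leveraged mortgages! "
                    "We have a ping pong table! You're OK at math.",
    "Visual Brand Lead": "Can you draw? Dunder Mifflin seeks a talented person to help bring "
                         "our office paper business to the next level. And you'll be on television!",
}

# Hypothetical domain-knowledge tags (an assumption, not from the talk).
CATEGORY = {"Graphic Designer": "design", "Risk Manager": "finance", "Visual Brand Lead": "design"}

def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def text_similarity(title_a: str, title_b: str) -> float:
    return cosine(bag_of_words(LISTINGS[title_a]), bag_of_words(LISTINGS[title_b]))

def same_category(title_a: str, title_b: str) -> bool:
    return CATEGORY[title_a] == CATEGORY[title_b]

print(round(text_similarity("Graphic Designer", "Risk Manager"), 2))       # shared company boilerplate
print(round(text_similarity("Graphic Designer", "Visual Brand Lead"), 2))  # almost no shared words
```

Text similarity rates the Graphic Designer job as far closer to the Risk Manager job (same company boilerplate) than to the Visual Brand Lead job, while a domain-knowledge category groups the two design roles together.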
  10. How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System
  12. Divide & Conquer
      [Diagram: a user profile, {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games}, is sent to several rankers (Ranker #1, …); each returns its own ranked list of content IDs (e.g., 17, 4, 12, 97, 11, 3), and an Aggregator merges the lists into one final ranking (e.g., 4, 12, 7, 17, 2, 3).]
      1. Popular: nightly update; absolute or relative?
      2. Relevant to Career Status: needs content tagging/taxonomy
      3. Relevant to Major (Category): needs content tagging
      4. Recent: e.g., “10 great internships you can apply to now!”
      5. Collaborative: people with profiles like yours read content with tags like this
      6. Sponsored: why wouldn’t we…?
      7. Random!
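Each ranker in the Divide & Conquer design can be an independent function from (profile, catalog) to an ordered list of content IDs. This is a minimal sketch of four of the seven (Popular, Relevant to Major, Recent, Random), assuming a hypothetical `Content` record with a view count, publish date, and major tags; the record, function names, and sample catalog are illustrative only.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List

@dataclass
class Content:
    id: int
    views: int                                # popularity statistic (assumed refreshed nightly)
    published: date
    majors: set = field(default_factory=set)  # hypothetical content tags

Profile = Dict[str, str]
Ranker = Callable[[Profile, List[Content]], List[int]]

def popular(profile: Profile, catalog: List[Content]) -> List[int]:
    """Same list for everyone: most-viewed first."""
    return [c.id for c in sorted(catalog, key=lambda c: (-c.views, c.id))]

def relevant_to_major(profile: Profile, catalog: List[Content]) -> List[int]:
    """Content tagged with the user's major first; ties broken by views."""
    major = profile.get("major")
    return [c.id for c in sorted(catalog, key=lambda c: (major not in c.majors, -c.views, c.id))]

def recent(profile: Profile, catalog: List[Content]) -> List[int]:
    """Newest content first."""
    return [c.id for c in sorted(catalog, key=lambda c: (c.published, c.id), reverse=True)]

def random_order(profile: Profile, catalog: List[Content], seed=None) -> List[int]:
    """Exploration: a shuffled list."""
    ids = [c.id for c in catalog]
    random.Random(seed).shuffle(ids)
    return ids

RANKERS: Dict[str, Ranker] = {
    "popular": popular,
    "major": relevant_to_major,
    "recent": recent,
    "random": random_order,
}

def run_rankers(profile: Profile, catalog: List[Content]) -> Dict[str, List[int]]:
    """Run every ranker on the same profile; an aggregator combines the results."""
    return {name: rank(profile, catalog) for name, rank in RANKERS.items()}

# Hypothetical sample data; the profile is the one shown on the slide.
profile = {"major": "Math", "grad_date": "2018/05/15", "college": "Yale", "skills": "video games"}
catalog = [
    Content(17, views=900, published=date(2018, 9, 1), majors={"Math"}),
    Content(4, views=1200, published=date(2018, 8, 1), majors={"Art"}),
    Content(12, views=300, published=date(2018, 9, 10), majors={"Math", "Economics"}),
    Content(97, views=50, published=date(2018, 7, 1)),
]
ranked = run_rankers(profile, catalog)
print(ranked["popular"])  # [4, 17, 12, 97]
print(ranked["major"])    # [17, 12, 4, 97]
print(ranked["recent"])   # [12, 17, 4, 97]
```

Because every ranker shares one signature, adding a new one (for example a collaborative or sponsored ranker) is a matter of registering another function in `RANKERS`.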
  13. The More the Better
      Example with two rankers, Recent (weight ×1.0) and Major (weight ×1.5):

      id  recent rank  major rank  log(recent)  log(major)  total  final rank  why
      1   1            4           0            1.4         2.1    3           rec
      2   2            2           0.7          0.7         1.7    2           maj
      3   4            3           1.4          1.1         3.0    4           maj
      4   3            1           1.1          0           1.1    1           maj

      (total = 1.0 × log(recent) + 1.5 × log(major), using natural logs; the lowest total ranks first)

      • Sum weighted log ranks (not scores)
      • Tune the weights with A/B tests (or reinforcement learning)
      • A plausible “why” could be exposed to the user
      • Mix of general and personalized rankers
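The table can be reproduced with a short weighted-log-rank aggregator, total(i) = Σ over rankers r of w_r · log(rank_r(i)). Working backward from the totals, the weights appear to be 1.0 for the recent ranker and 1.5 for the major ranker, and the "why" column appears to name the ranker in which the item ranked best, with the tie for item 2 going to the more heavily weighted ranker. Those readings, the missing-item rule, and the function name are assumptions in this sketch.

```python
import math
from typing import Dict, List, Tuple

def aggregate(rankings: Dict[str, Dict[int, int]],
              weights: Dict[str, float]) -> List[Tuple[int, float, str]]:
    """Combine rankers by summing weighted log ranks (rank 1 = best, and log(1) = 0).

    Returns (item_id, total, why) sorted best-first; a lower total is better.
    An item missing from a ranker gets that ranker's worst rank + 1 (an assumption).
    """
    items = sorted({item for ranks in rankings.values() for item in ranks})
    results = []
    for item in items:
        total = 0.0
        best = None  # (rank, -weight, name): best raw rank, ties go to the heavier ranker
        for name, ranks in rankings.items():
            rank = ranks.get(item, len(ranks) + 1)
            total += weights[name] * math.log(rank)
            candidate = (rank, -weights[name], name)
            if best is None or candidate < best:
                best = candidate
        results.append((item, total, best[2]))
    return sorted(results, key=lambda r: r[1])

# The slide's example: per-item ranks from the "recent" and "major" rankers.
rankings = {
    "recent": {1: 1, 2: 2, 3: 4, 4: 3},
    "major": {1: 4, 2: 2, 3: 3, 4: 1},
}
weights = {"recent": 1.0, "major": 1.5}

for item, total, why in aggregate(rankings, weights):
    print(item, round(total, 1), why)
# 4 1.1 major
# 2 1.7 major
# 1 2.1 recent
# 3 3.0 major
```

Summing log ranks rather than raw scores keeps rankers with very different score scales comparable, and the log compresses differences far down each list, so a top placement in any one ranker counts for a lot.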
  15. Separation of Concerns
      Main App
      • Built by software engineers, not data scientists
      • Knows about the user immediately
      • Sends a JSON profile with no feature engineering
      Recommender microservice
      • Knows about content, not users
      • Updated nightly with new content & statistics
      • Parses, engineers features, ranks
      • Returns ranked IDs
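A minimal sketch of this split from the microservice's side, assuming the Main App sends the raw profile as a JSON string and the microservice holds a catalog of content IDs with career-status tags loaded by the nightly job. The career-status rule (graduated, graduating within 12 months, or earlier) and all names here are hypothetical.

```python
import json
from datetime import date
from typing import List, Set, Tuple

def career_status(grad: date, today: date) -> str:
    """Hypothetical rule turning a graduation date into a career-status feature."""
    months_left = (grad.year - today.year) * 12 + (grad.month - today.month)
    if months_left < 0:
        return "graduate"
    if months_left <= 12:
        return "senior"
    return "underclass"

def engineer_features(profile_json: str, today: date) -> dict:
    """Parse the Main App's raw JSON profile; all feature engineering lives here."""
    raw = json.loads(profile_json)
    grad = date.fromisoformat(raw["grad_date"].replace("/", "-"))  # "2018/05/15" -> 2018-05-15
    return {
        "major": raw.get("major", "").strip().lower(),
        "career_status": career_status(grad, today),
    }

def recommend(profile_json: str, catalog: List[Tuple[int, Set[str]]], today: date) -> List[int]:
    """Return ranked content IDs only."""
    features = engineer_features(profile_json, today)
    # Items tagged for this career status first; sorted() is stable, so catalog order breaks ties.
    ranked = sorted(catalog, key=lambda item: features["career_status"] not in item[1])
    return [content_id for content_id, _tags in ranked]

profile_json = json.dumps({"major": "Math", "grad_date": "2018/05/15",
                           "college": "Yale", "skills": "video games"})
catalog = [(17, {"senior"}), (4, {"graduate"}), (12, {"graduate", "senior"}), (97, set())]
print(recommend(profile_json, catalog, today=date(2018, 9, 20)))  # [4, 12, 17, 97]
```

Keeping parsing and feature engineering inside the microservice means the Main App never needs to change when a ranker needs a new feature.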
  16. Metrics & Tuning
      • Need to store: User X was recommended Content A, B, C on Page Y, then read B
      • Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
      • Avoid hurting top KPIs!
      • Offline debugging tool is very handy
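Given a log of the kind the slide describes (user, page, what was recommended in display order, what was then read), both metrics take only a few lines. This sketch uses a hypothetical `RecEvent` record. CTR here is counted per recommended item shown, and MRR uses the position of the first item read, scoring 0 when nothing was read; both are common conventions, assumed rather than taken from the talk.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class RecEvent:
    user: str
    page: str
    recommended: List[str]  # content IDs in display order
    read: Set[str]          # content the user went on to read

def click_through_rate(events: List[RecEvent]) -> float:
    """Reads of recommended items divided by recommended items shown."""
    shown = sum(len(e.recommended) for e in events)
    clicked = sum(len(e.read & set(e.recommended)) for e in events)
    return clicked / shown if shown else 0.0

def mean_reciprocal_rank(events: List[RecEvent]) -> float:
    """Average of 1/position of the first recommended item read (0 if none was read)."""
    if not events:
        return 0.0
    total = 0.0
    for e in events:
        for position, item in enumerate(e.recommended, start=1):
            if item in e.read:
                total += 1.0 / position
                break
    return total / len(events)

events = [
    RecEvent("X", "Y", ["A", "B", "C"], {"B"}),     # the slide's example: read B at position 2
    RecEvent("Z", "home", ["D", "E", "F"], set()),  # hypothetical: nothing read
    RecEvent("W", "home", ["G", "H", "I"], {"G"}),  # hypothetical: top item read
]
print(round(click_through_rate(events), 3))  # 0.222
print(mean_reciprocal_rank(events))          # 0.5
```

CTR rewards any useful suggestion on the page, while MRR rewards putting the useful one first, which is why the two can move in different directions in an A/B test.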
  17. Pros & Cons
      • Incredibly fast to prototype offline; fairly fast to build in production
      • Amenable to explanations
      • Easy to extend once interaction history is available (matrix factorization or learning-to-rank subrankers)
      • Easy to incorporate business priorities
      • Works with new users and new-ish content
      • Doesn’t scale to a very large number of items; requires tuning
  18. Thank You!
      Harlan Harris
      harlan@wayup.com
      @harlanh on Twitter, Medium, GitHub
      http://harlan.harris.name
  19. What Happens When?
      Real time
      • Ranking
      Nightly
      • Update content
      • Compute popularity
      • Refit collaborative ranker
      Periodically
      • Tuning parameters
      • Exploring new rankers
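The nightly "compute popularity" step also has to settle the question the Divide & Conquer slide leaves open for the Popular ranker: absolute or relative? This is a sketch assuming the job sees one record per recommendation impression, as (content ID, whether it was read). "Absolute" ranks by total reads. "Relative" is interpreted here, as an assumption, as reads per impression, with a hypothetical `min_shown` cutoff so rarely shown items are not ranked on too little evidence.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def nightly_popularity(impressions: Iterable[Tuple[int, bool]],
                       min_shown: int = 1) -> Tuple[List[int], List[int]]:
    """Rebuild the Popular ranker's lists from the previous day's impressions.

    impressions: (content_id, was_read) pairs, one per time an item was shown.
    Returns (absolute, relative): ordered by total reads, and by reads per
    impression (items shown fewer than min_shown times are left out of the latter).
    """
    shown: Counter = Counter()
    reads: Counter = Counter()
    for content_id, was_read in impressions:
        shown[content_id] += 1
        if was_read:
            reads[content_id] += 1
    absolute = sorted(shown, key=lambda c: (-reads[c], c))
    relative = sorted((c for c in shown if shown[c] >= min_shown),
                      key=lambda c: (-reads[c] / shown[c], c))
    return absolute, relative

impressions = (
    [(17, i < 5) for i in range(10)]    # shown 10 times, read 5
    + [(4, True)] * 2                   # shown 2 times, read 2
    + [(12, i == 0) for i in range(4)]  # shown 4 times, read 1
)
absolute, relative = nightly_popularity(impressions)
print(absolute)  # [17, 4, 12]
print(relative)  # [4, 17, 12]
```

The two orderings disagree on purpose: item 17 has the most reads overall, while item 4 converts every time it is shown, which is exactly the trade-off the "absolute or relative?" question raises.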
