Cold-Start Recommendations to Users With Rich Profiles
Harlan D. Harris, PhD
Director of Data Science at WayUp
September 2018, RecSys NYC Meetup
Presentation to the RecSys NYC Meetup

  1. Cold-Start Recommendations to Users With Rich Profiles
     Harlan D. Harris, PhD, Director of Data Science at WayUp
     September 2018, RecSys NYC Meetup
  2. After This Meetup!
     • Go to The Storehouse!
     • Meet other RecSys peeps!
  4. Why Build a RecSys?
     • College students may not know what they want, so we must show options
     • Promote customer jobs
     • Ongoing engagement with content (blog, guide) recs
  9. RecSys UX Categories
     [Diagram labels: Who You Are, What You’ve Done; Feed-Like, Catalog-Like; (Feed)]
  11. The problem with collaborative filters…
  12. Leverage the Profile
     • Structured & Unstructured Data
     • Natural Language Processing
     • Learning to Rank
     • Domain Knowledge & Feature Engineering
  13. Architecture
     • User & Front End: “Hey, show me jobs!”
     • Main App (with DB): “That’s hard! But I know who you are!”
     • Microservice (with DB): “Got you. Feature Engineering your Profile…”
     • Offline Machine Learning
     [Diagram labels on the connections: Profile, Interaction History; Listing IDs; Listing Details; User Details; User ID, Params; Ranked Listings & Details]
  14. What do you mean by… Similar?
     • Graphic Designer: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re a great artist.”
     • Risk Manager: “Lehman Brothers is the leading firm in highly leveraged mortgages! We have a ping pong table! You’re OK at math.”
     • Visual Brand Lead: “Can you draw? Dunder Mifflin seeks a talented person to help bring our office paper business to the next level. And you’ll be on television! Meetup, next week!”
  15. How to Build a Multi-Factor, Profile-Based, Cold-Start Content Recommendation System
  24. Divide & Conquer
     [Diagram: a profile {major: Math, grad_date: 2018/05/15, college: Yale, skills: video games} feeds the rankers (labeled “Ranker #1”), which emit lists of content IDs such as 17, 4, 12, 97, 11, 3; an Aggregator emits a single list: 4, 12, 7, 17, 2, 3.]
     Rankers:
     1. Popular: nightly update, absolute or relative?
     2. Relevant to Career Status: needs content tagging/taxonomy
     3. Relevant to Major (Category): needs content tagging
     4. Recent: e.g., “10 great internships you can apply to now!”
     5. Collaborative: people with profiles like yours read content with tags like this
     6. Sponsored: why wouldn’t we…?
     7. Random!
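The divide-and-conquer idea can be sketched as a set of small, independent ranker functions that each take the raw profile and return an ordered list of content IDs. This is only an illustration under assumptions: the `Content` fields, the toy `CATALOG`, and the three ranker functions are hypothetical and are not taken from the talk, which lists the rankers but does not show their implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Content:
    id: int
    views: int           # hypothetical popularity statistic
    published: date      # hypothetical publication date
    majors: frozenset    # hypothetical major/category tags


# Toy catalog; IDs echo the slide, everything else is made up.
CATALOG = [
    Content(17, 900, date(2018, 9, 1), frozenset({"Math"})),
    Content(4, 500, date(2018, 9, 10), frozenset({"Art"})),
    Content(12, 700, date(2018, 8, 1), frozenset({"Math", "Physics"})),
    Content(97, 100, date(2018, 9, 12), frozenset()),
]


def rank_popular(profile, catalog):
    """General ranker: most viewed first; ignores the profile."""
    return [c.id for c in sorted(catalog, key=lambda c: -c.views)]


def rank_recent(profile, catalog):
    """General ranker: newest first; ignores the profile."""
    return [c.id for c in sorted(catalog, key=lambda c: c.published, reverse=True)]


def rank_major(profile, catalog):
    """Personalized ranker: content tagged with the user's major first."""
    major = profile.get("major")
    return [c.id for c in sorted(catalog, key=lambda c: (major not in c.majors, -c.views))]


RANKERS = {"popular": rank_popular, "recent": rank_recent, "major": rank_major}


def run_rankers(profile, catalog=CATALOG):
    """Run every ranker independently and collect its ordered ID list."""
    return {name: ranker(profile, catalog) for name, ranker in RANKERS.items()}


profile = {"major": "Math", "grad_date": "2018/05/15",
           "college": "Yale", "skills": ["video games"]}
rankings = run_rankers(profile)
```

Because each ranker only needs the profile and the catalog, a new one (sponsored, random, collaborative) could be added by registering another function in `RANKERS`, with no change to the others.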
  26. The More the Better
     • Sum Weighted Log Rank (not Score)
     • Tune with A/B tests (or reinforcement learning)
     • Plausible “why” could be exposed to user
     • Mix of general and personalized rankers

     id  recent  major  log rec  log maj  tot  rank  why
     1   1       4      0        1.4      2.1  3     rec
     2   2       2      0.7      0.7      1.7  2     maj
     3   4       3      1.4      1.1      3.0  4     maj
     4   3       1      1.1      0        1.1  1     maj

     Weights: log rec × 1.0, log maj × 1.5 (tot is the weighted sum; lowest tot ranks first)
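The table on this slide can be reproduced with a few lines of arithmetic. The sketch below sums weighted natural-log ranks per item and sorts ascending; the weights (recent × 1.0, major × 1.5) are the ones that make the slide's "tot" column add up. The rule used for the "why" column (best raw rank, ties going to the more heavily weighted ranker) is an assumption that happens to match the four rows shown, not something the slide states. The function name `aggregate` is hypothetical, and every ranker is assumed to rank every item.

```python
import math


def aggregate(ranks_by_ranker, weights):
    """Combine per-ranker ranks (1 = best) into one list.

    Returns tuples of (item_id, final_rank, weighted_log_rank_total, why).
    """
    items = next(iter(ranks_by_ranker.values())).keys()
    rows = []
    for item in items:
        # Weighted sum of log ranks: rank 1 contributes log(1) = 0.
        total = sum(weights[name] * math.log(ranks[item])
                    for name, ranks in ranks_by_ranker.items())
        # Assumed "why" rule: best raw rank, ties broken by larger weight.
        why = min(ranks_by_ranker,
                  key=lambda name: (ranks_by_ranker[name][item], -weights[name]))
        rows.append((item, total, why))
    rows.sort(key=lambda row: row[1])  # lowest total first
    return [(item, position, round(total, 1), why)
            for position, (item, total, why) in enumerate(rows, start=1)]


# Ranks copied from the slide's "recent" and "major" columns.
ranks = {
    "recent": {1: 1, 2: 2, 3: 4, 4: 3},
    "major": {1: 4, 2: 2, 3: 3, 4: 1},
}
weights = {"recent": 1.0, "major": 1.5}

result = aggregate(ranks, weights)
for row in result:
    print(row)
```

Working on ranks rather than raw scores means the rankers do not need comparable score scales, and the log makes differences near the top of a list count more than differences far down it.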
  30. Separation of Concerns
     Main App
     • Built by software engineers, not data scientists
     • Knows about user immediately
     • Sends JSON profile with no feature engineering
     Recommender microservice
     • Knows about content, not users
     • Updated nightly with new content & statistics
     • Parses, engineers features, ranks
     • Returns ranked IDs
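A minimal sketch of that contract, assuming a plain function in place of a real HTTP endpoint: the main app sends the raw JSON profile, and the service does all parsing and feature engineering before returning ranked IDs. The names `CONTENT_TAGS`, `engineer_features`, and `handle_request`, the tag vocabulary, and the tag-overlap scoring are all hypothetical and stand in for whatever the actual service does.

```python
import json
from datetime import date, datetime

# What the service knows: content only (hypothetical tags, refreshed nightly).
CONTENT_TAGS = {
    17: {"math", "student"},
    4: {"art", "graduate"},
    12: {"math", "graduate"},
    97: {"student"},
}


def engineer_features(profile, today):
    """Turn the raw profile into a set of tags comparable with content tags."""
    grad = datetime.strptime(profile["grad_date"], "%Y/%m/%d").date()
    status = "graduate" if grad <= today else "student"
    return {profile["major"].strip().lower(), status}


def handle_request(body, today=None):
    """Parse the JSON profile, engineer features, rank, return ranked IDs."""
    profile = json.loads(body)
    features = engineer_features(profile, today or date.today())
    # Toy scoring: more overlapping tags first, ID as a deterministic tie-break.
    ranked = sorted(CONTENT_TAGS,
                    key=lambda cid: (-len(CONTENT_TAGS[cid] & features), cid))
    return json.dumps({"ranked_ids": ranked})


# The main app sends the profile as-is, with no feature engineering.
request = json.dumps({"major": "Math", "grad_date": "2018/05/15",
                      "college": "Yale", "skills": ["video games"]})
response = handle_request(request, today=date(2018, 9, 1))
print(response)
```

Keeping the date parsing and tag logic inside the service means the main app never has to change when features are added or retuned.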
  35. Metrics & Tuning
     • Need to store: User X was recommended Content A, B, C on Page Y, then read B
     • Metrics & A/B tests: Click-through Rate (did they like the suggestions?), Mean Reciprocal Rank (did they like the top items?)
     • Avoid hurting top KPIs!
     • Offline debugging tool is very handy
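Both metrics can be computed directly from the stored "recommended A, B, C, then read B" records. The sketch below assumes a hypothetical log format (a list of dicts with `recommended` and `read` keys) and one particular definition of each metric: click-through rate as reads divided by items shown, and mean reciprocal rank as the average of 1/position of the first item read, counting 0 when nothing was read. Other definitions are in common use.

```python
def click_through_rate(impressions):
    """Items read divided by items shown, across all logged impressions."""
    shown = sum(len(imp["recommended"]) for imp in impressions)
    read = sum(len(imp["read"]) for imp in impressions)
    return read / shown if shown else 0.0


def mean_reciprocal_rank(impressions):
    """Average of 1/position of the first item read (0 if none was read)."""
    if not impressions:
        return 0.0
    total = 0.0
    for imp in impressions:
        read = set(imp["read"])
        for position, item in enumerate(imp["recommended"], start=1):
            if item in read:
                total += 1.0 / position
                break
    return total / len(impressions)


# Hypothetical log: who was shown what, in which order, and what they read.
log = [
    {"user": "X", "page": "Y", "recommended": ["A", "B", "C"], "read": ["B"]},
    {"user": "Z", "page": "Y", "recommended": ["A", "C", "B"], "read": []},
    {"user": "W", "page": "Y", "recommended": ["C", "A", "B"], "read": ["C", "B"]},
]

ctr = click_through_rate(log)
mrr = mean_reciprocal_rank(log)
print(f"CTR={ctr:.3f} MRR={mrr:.3f}")
```

Storing the displayed order with each impression is what makes the rank-aware metric possible; a log of clicks alone would only support click-through rate.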
  42. Pros & Cons
     • Incredibly fast to prototype offline; fairly fast to build in production
     • Amenable to explanations
     • Easy to extend once history available (MF or LTR subrankers)
     • Easy to incorporate business priorities
     • Works with new users and new-ish content
     • Doesn’t work with a very large number of items; requires tuning
  43. Thank You!
     Harlan Harris
     harlan@wayup.com
     @harlanh on Twitter, Medium, GitHub
     http://harlan.harris.name
  44. What Happens When?
     • Real Time: ranking
     • Nightly: update content; compute popularity; refit collaborative ranker
     • Periodically: tuning parameters; exploring new rankers
