Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Evan Estola – Data Scientist, Meetup.com at MLconf ATL

2,532 views

Published on

Beyond Collaborative Filtering: using Machine Learning to power recommendations at Meetup
Collaborative filtering and other common recommendation algorithms are a powerful technique for some scenarios. I will cover how to design a recommendation system from the ground up using an ensemble classifier and supervised learning to avoid some of the pitfalls of collaborative filtering. From sampling to deployment, we’ve had to invent our approach with few non-academic and non-toy examples to follow. At Meetup we’re all about sharing information and empowering communities, so I’ll present the details of our model as well as some of the new features we are still developing.

Published in: Technology

Evan Estola – Data Scientist, Meetup.com at MLconf ATL

  1. 1. Beyond Collaborative Filtering: ML & Recommendations at Meetup Evan Estola Machine Learning Engineer Meetup.com evan@meetup.com @estola
  2. 2. Meetup what are you
  3. 3. Why Meetup data is cool ● Real people meeting up ● Every meetup could change someone's life ● No ads, just do the best thing ● Oh and >125 million rsvps by 18 million members ● 3 million rsvps in the last 30 days ○ 1/second
  4. 4. Tools at Meetup ● Hive - SQL on Hadoop ● Spark - Distributed Scala on Hadoop cluster ● Scala - Recommendations service ● R - Data analysis, Model building ● Python - Scripting, Data organizing ● Java - Backend of our web stack
  5. 5. Collaborative Filtering ● Classic recommendations approach ● Users who like this also like this
  6. 6. Weaknesses of CF ● Sparsity ● Cold Start ● Coverage ● Diversity
  7. 7. Why Recs at Meetup are hard ● Incomplete Data (topics) ● Cold start ● Asking user for data is hard ● Going to meetups is scary ● Sparsity ○ Location ○ Low rsvp/person ○ Membership: 0.001% ○ Compare to Netflix Prize Dataset: 1%
  8. 8. Supervised Learning/Classification ● “Inferring a function from labeled training data” ● Joined Meetup group/Didn’t join Meetup group
  9. 9. Preprocessing ● Schenectady ● Fake RSVP boosts (+100 guests!) ● Outliers ● Bucketing ● Etc etc
  10. 10. Problem definition and assumptions ● Assumption: if you’re not in a given group, you don’t want to be ○ Negative samples: groups you’re not in ○ Also a good classifier... ● Membership << expected error rate ○ Solution: sample to 50/50 join/no-join
  11. 11. Ranking ● Model output label no longer explicitly true ○ Luckily, we’d rather rank all of the results anyway ● Use a classifier that gives you a useful output ○ Fancy black box ○ Logistic Regression ■ Easier to explain
  12. 12. Meetup what are you
  13. 13. Ensemble Learning “... use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms”
  14. 14. Ensemble Learning ● Topic match (original algorithm) ● Collaborative Filtering on Topics ● Social algorithm ● Other simple features (Popularity, Gender…) ● Add output of algorithms as features into Logistic Regression model
  15. 15. Logistic Regression Output ● TopicScore 4.14 ● ExtendedTS 0.47 ● RelatedTS 0.66 ● FbFriends 2.02 ● 2ndFbFriends 0.09 ● AgeUnmatch -2.40 ● GenUnmatch -3.37 ● Distance -0.04 ● StateMatch 0.54 ● CountyMatch 0.41 ● ZipScore 0.06 ● RsvpScore 0.02
  16. 16. Facebook Likes ● Lots of information, but how to use? ● Map to topics, let training the model take care of the rest!
  17. 17. Mapping FB Likes to Meetup Topics ● Text based? ○ Go(game) vs Go(lang)? ○ Burton? ● Data approach! ○ Grab most popular topics across all members with the same like
  18. 18. Normalization ● Top topics for Burton-Likers ○ Meeting New People, Coffee, bla bla ○ Most popular still dominates ● Solution: Normalize based on expected topic occurrence in sample
  19. 19. Normalization ● For members with a given Like, compare percent with each topic to expected among total population ● Total population ○ 20% “Meeting New People” ○ 2% “Snowboarding ● Burton: ○ 20% “Meeting New People” ○ 9% “Snowboarding”
  20. 20. Results ● Generate top topics for all likes ○ Path from member to like to topic to group ● Add Facebook Like based topic match feature to model ● Positive weight ○ Very good sign! ● Deploy/Split test ○ TBD
  21. 21. Summary ● Supervised Learning for Ranking as Recommendations is cool ● Simple, interpretable models are cool ● Feature engineering is cool
  22. 22. Thanks! Smart people come work with me. http://www.meetup.com/jobs/

×