Machine learning and data at Meetup

Presentation given for Tech Talks at Meetup event on 8/27/13

Published in: Technology, Education

  1. Machine Learning and Data at Meetup Evan Estola Meetup.com evan@meetup.com @estola
  2. My Background ● Software Engineer/Data Scientist ● Machine learning team ● At Meetup since May 2012 ● BS Computer Science ○ Information Retrieval ○ Data Mining ○ Math ■ Linear Algebra ■ Graph Theory
  3. You ● Data Scientists? ● Engineers? ● Statisticians? ● Students? ● Non-technical?
  4. What this talk is ● Super secret peek into Meetup! ● Meetup recommendations examples ● How we do recommendations (model/features) ● Lessons learned/what’s next
  5. What this talk isn’t ● What is a data scientist? ● What is big data? ● How do matrix factorization, gradient boosted decision trees, MapReduce, or this framework I hope you’ll use work?
  6. Why Meetup data is cool ● Real people meeting up ● Every meetup could change someone's life ● No ads, just do the best thing ● Oh, and 114 million RSVPs by >14 million members ● 2.7 million RSVPs in the last 30 days ○ ~1/second
  7. Data at Meetup ● User data ● Site monitoring/performance ● AB testing ● Recommendations*
  8. “Everything is a recommendation” ● Not my phrase ● Not actually true yet ● Working on it
  9. Recommendation
  10. Topic Recommendations ● New registrant ● Don’t know anything about you yet! ● Most popular is boring/repetitive Algorithm: ○ Group local meetups by topic ○ Select topic with most groups ○ Remove those groups ○ Repeat
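The greedy loop on the Topic Recommendations slide can be sketched in Python. This is only an illustration of the four steps as listed, not Meetup's actual code: the function name `rank_topics`, the `max_topics` parameter, the alphabetical tie-break, and the example groups are all assumptions made for the sketch.

```python
from collections import Counter


def rank_topics(group_topics, max_topics=None):
    """Greedily pick topics so each pick covers the most not-yet-covered groups.

    group_topics: dict mapping a local group name to its set of topics
    (hypothetical input shape, chosen for this sketch).
    """
    remaining = {group: set(topics) for group, topics in group_topics.items()}
    picked = []
    while remaining and (max_topics is None or len(picked) < max_topics):
        # Step 1: group the remaining local meetups by topic (count per topic).
        counts = Counter(t for topics in remaining.values() for t in topics)
        if not counts:
            break  # only groups without topics are left
        # Step 2: select the topic with the most groups.
        # Ties are broken alphabetically so the result is deterministic.
        best = min(counts, key=lambda t: (-counts[t], t))
        picked.append(best)
        # Step 3: remove the groups covered by that topic, then repeat.
        remaining = {g: ts for g, ts in remaining.items() if best not in ts}
    return picked


# Made-up example data, not real Meetup groups.
example_groups = {
    "NY Hikers": {"hiking", "outdoors"},
    "Trail Runners": {"running", "outdoors"},
    "Python NYC": {"python", "tech"},
    "Data Science NYC": {"python", "data", "tech"},
    "Knitting Circle": {"knitting"},
    "ML Study": {"data", "tech"},
}
print(rank_topics(example_groups))
```

Removing covered groups before recounting is what keeps the list from being "boring/repetitive": once "tech" is picked, the closely related "python" and "data" topics no longer score, so the next picks come from a different part of the local scene.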
  11. Group/Event Recommendations ● Replaced a topic-only system ● Inputs: ○ Member, location, topics, Facebook friends? Demographics? ● Outputs: ○ Ranking
  12. Collaborative Filtering ● Classic recommendations approach ● Users who like this also like this
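As a rough illustration of "users who like this also like this", here is a minimal co-occurrence sketch in Python. It is a toy version of the classic idea, not something the talk describes Meetup running; `also_liked`, the `memberships` shape, and the example members are hypothetical.

```python
from collections import Counter


def also_liked(memberships, group, top_n=3):
    """Return the groups most often joined by members of `group`.

    memberships: dict mapping a member name to the set of groups they joined
    (hypothetical input shape, chosen for this sketch).
    """
    counts = Counter()
    for groups in memberships.values():
        if group in groups:
            # Count every other group this member also belongs to.
            counts.update(g for g in groups if g != group)
    # Sort by count (descending), then name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:top_n]]


# Made-up example data.
example_memberships = {
    "ann": {"Hiking", "Python", "Board Games"},
    "bob": {"Hiking", "Python"},
    "cat": {"Hiking", "Knitting"},
    "dan": {"Python", "Board Games"},
}
print(also_liked(example_memberships, "Hiking", top_n=2))
```

A sketch like this only works when members share many groups, which is why the sparsity and cold-start points raised in the talk matter so much for this approach.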
  13. Why Recs at Meetup are hard ● Incomplete Data (topics) ● Cold start ● Asking user for data is hard ● Going to meetups is scary ● Sparsity ○ Location ○ Groups/person ○ Membership: 0.001% ○ Compare to Netflix: 1%
  14. Supervised Learning/Classification ● “Inferring a function from labeled training data” ● Joined Meetup/Didn’t join Meetup ● “Features”
  15. Topic Match
  16. State Match
  17. Logistic Regression ● Score ○ “Probability” ○ Ranking ● Fast + Easy ● Weights!
  18. Group recommendation weights ● TopicMatch 1.21 ● TopicMatchExtended 0.17 ● FacebookFriends 0.15 ● SecondDegreeFacebook 0.79 ● AgeUnmatch -2.20 ● GenderUnmatch -2.6 ● StateMatchFeature 0.44 ● CityMatch 0.02 ● DistanceBucket <2 1.39 ● DistanceBucket 2-5 0.83 ● DistanceBucket 5-10 0.60 ● DistanceBucket >10 n/a
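To show how weights like these turn into a score and a ranking, here is a small Python sketch of logistic regression scoring. The weights are copied from the slide; everything else is an assumption for illustration: the intercept is not given in the talk and is set to 0, features are treated as 0/1 indicators, and the "n/a" bucket (>10) is treated as a reference level with weight 0.

```python
import math

# Weights as listed on the slide. "DistanceBucket >10" is "n/a" there;
# treating it as a zero-weight reference bucket is an assumption.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
    "DistanceBucket >10": 0.0,
}
BIAS = 0.0  # assumption: the intercept is not given in the talk


def score(features, weights=WEIGHTS, bias=BIAS):
    """Logistic score in (0, 1) for one member/group pair.

    features: dict of feature name -> value (0/1 indicators in this sketch).
    """
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank_groups(candidates):
    """Order candidate groups by descending score.

    candidates: dict of group name -> feature dict (hypothetical shape).
    """
    return sorted(candidates, key=lambda g: score(candidates[g]), reverse=True)


# Made-up candidates for one member.
example_candidates = {
    "nearby topic match": {"TopicMatch": 1, "DistanceBucket <2": 1},
    "topic match, age mismatch": {"TopicMatch": 1, "AgeUnmatch": 1},
}
print(rank_groups(example_candidates))
```

Because the sigmoid is monotonic, ranking by the score gives the same order as ranking by the weighted sum, which is one reason the "probability" on the slide is in quotes: only the ordering matters for recommendations.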
  19. Making up features ● “Zipscore” ● All topics not created equal ● Facebook likes
  20. Real data is gross ● Preprocessing is critical! ○ missing data ○ outliers ○ log scale ○ bucketing ○ selection/sampling (not introducing bias)
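A few of these preprocessing steps (missing data, outliers, log scale, bucketing) can be sketched in Python. The helper names, the cap of 1000, the default of 0 for missing values, and the handling of bucket boundaries are all assumptions for illustration; the bucket labels follow the DistanceBucket names from the weights slide, whose distance unit the talk does not state.

```python
import math


def clean_count(value, cap=1000, default=0):
    """Prepare a raw count (e.g. number of RSVPs) for use as a feature.

    - missing data: None is replaced by `default`
    - outliers: values are clipped to the range [0, cap]
    - log scale: log1p compresses the long tail and maps 0 to 0.0
    """
    if value is None:
        value = default
    value = min(max(value, 0), cap)
    return math.log1p(value)


def distance_bucket(distance):
    """Map a distance to one of the bucket labels used on the weights slide.

    Boundary handling (2 goes to "2-5", 5 goes to "5-10", 10 stays in
    "5-10") is an assumption made for this sketch.
    """
    if distance is None:
        return "unknown"  # hypothetical extra label for missing locations
    if distance < 2:
        return "<2"
    if distance < 5:
        return "2-5"
    if distance <= 10:
        return "5-10"
    return ">10"


print(clean_count(None), clean_count(10**6) == clean_count(1000))
print([distance_bucket(d) for d in (1.5, 3, 7, 25)])
```

Clipping before taking the log keeps something like a fake "+100 guests" RSVP from dominating a feature, and bucketing lets a linear model learn a separate weight for each distance range instead of a single slope.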
  21. Cleaning data ● Schenectady ● Beverly Hills ● Astronaut ● Fake RSVP boosts (+100 guests!) ● RSVP hogs
  22. TO THE FUTURE! ● Hadoop ● Clicks ● Impressions ● People to people recommendations? ● Recommending people to groups?
  23. Thanks! Smart people, come work with me. http://www.meetup.com/jobs/ Special thanks: ● Chris Halpert ● Victor J Wang
