Machine Learning and
Data at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory
You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?
What this talk is
● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations
(model/features)
● Lessons learned/what’s next
What this talk isn’t
● What is a data scientist?
● What is big data?
● How does matrix factorization or gradient-
boosted decision trees or MapReduce or this
framework I hope you’ll use work?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by >14 million
members
● 2.7 million RSVPs in the last 30 days
○ ~1/second
Data at Meetup
● User data
● Site monitoring/performance
● AB testing
● Recommendations*
“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it
Recommendation
Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
Algorithm:
○ Group local meetups by topic
○ Select topic with most groups
○ Remove those groups
○ Repeat
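The four steps above can be sketched as a greedy loop. This is a minimal illustration, not Meetup's production code: the function name `recommend_topics`, the `group_topics` input shape, the `max_topics` cutoff, and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import defaultdict


def recommend_topics(group_topics, max_topics=5):
    """Greedy topic picker (hypothetical helper).

    group_topics maps a local group id to the set of topics it is tagged with.
    """
    remaining = {group: set(topics) for group, topics in group_topics.items()}
    picked = []
    while remaining and len(picked) < max_topics:
        # Group local meetups by topic.
        by_topic = defaultdict(set)
        for group, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(group)
        if not by_topic:
            break  # only untagged groups are left
        # Select the topic with the most groups (ties broken alphabetically).
        best = min(by_topic, key=lambda t: (-len(by_topic[t]), t))
        picked.append(best)
        # Remove those groups, then repeat on what is left.
        for group in by_topic[best]:
            del remaining[group]
    return picked


local_groups = {
    "hikers": {"outdoors", "fitness"},
    "trail-runners": {"outdoors", "fitness"},
    "climbers": {"outdoors"},
    "python-nyc": {"tech"},
    "js-nyc": {"tech"},
}
print(recommend_topics(local_groups))
```

Because every group covered by a picked topic is removed before the next round, a second topic that mostly overlaps the first (here "fitness") drops out, which is what keeps the list from being repetitive.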
Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
○ Member, location, topics, Facebook friends?
Demographics?
● Outputs:
○ Ranking
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
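"Users who like this also like this" can be illustrated with a simple co-occurrence count. This is a toy sketch of the general idea, not the system described in the talk; `also_joined` and the `memberships` structure are hypothetical names chosen for the example.

```python
from collections import Counter


def also_joined(memberships, group, top_n=3):
    """Return groups most often joined by members of `group` (hypothetical helper).

    memberships maps a member id to the set of groups that member belongs to.
    """
    counts = Counter()
    for groups in memberships.values():
        if group in groups:
            # Count every other group this member also belongs to.
            counts.update(g for g in groups if g != group)
    # Highest co-occurrence first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, _ in ranked[:top_n]]


memberships = {
    "ann": {"hiking", "climbing"},
    "bob": {"hiking", "climbing", "python"},
    "cy": {"python"},
    "di": {"knitting"},
}
print(also_joined(memberships, "hiking"))
```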
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%
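One plausible reading of the two percentages is the share of filled cells in the member-by-group (or user-by-title) matrix. The sketch below shows that calculation under this assumption; the function name and the toy counts are invented for illustration and are not Meetup or Netflix figures.

```python
def matrix_density(observed_pairs, n_members, n_groups):
    """Fraction of member/group cells that hold an observed membership."""
    return observed_pairs / (n_members * n_groups)


# Toy numbers only: 3 memberships among 1,000 members and 300 groups.
density = matrix_density(3, 1_000, 300)
print(f"{density:.3%}")
```

At a density this low, almost no two members share a group, which is why pure collaborative filtering has little signal to work with.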
Supervised Learning/Classification
● “Inferring a function from labeled training
data”
● Joined Meetup/Didn’t join Meetup
● “Features”
Topic Match
State Match
Logistic Regression
● Score
○ “Probability”
○ Ranking
● Fast + Easy
● Weights!
Group recommendation weights
● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
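To show how such weights turn into a score and a ranking, here is a minimal sketch of logistic-regression scoring using the numbers listed above. Several things are assumptions, since the slide does not state them: the intercept is taken as 0, every feature is treated as a 0/1 indicator, and the ">10" distance bucket (listed as n/a) is treated as contributing nothing. The feature key spellings and the candidate names are made up for the example.

```python
import math

# Weights copied from the slide; key spellings are illustrative.
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket<2": 1.39,
    "DistanceBucket2-5": 0.83,
    "DistanceBucket5-10": 0.60,
}
INTERCEPT = 0.0  # assumption: the slide does not give an intercept


def score(features):
    """Logistic score in (0, 1) for a dict of feature name -> value."""
    z = INTERCEPT + sum(WEIGHTS.get(name, 0.0) * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Order candidate groups by descending score."""
    return sorted(candidates, key=lambda name: score(candidates[name]),
                  reverse=True)


candidates = {
    "near_topic_match": {"TopicMatch": 1, "DistanceBucket<2": 1},
    "far_topic_match": {"TopicMatch": 1},
    "age_mismatch": {"TopicMatch": 1, "AgeUnmatch": 1, "DistanceBucket<2": 1},
}
for name in rank(candidates):
    print(name, round(score(candidates[name]), 3))
```

The ordering depends only on the weighted sum, so the assumed intercept changes the "probability" values but not the ranking.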
Making up features
● “Zipscore”
● All topics not created equal
● Facebook likes
Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ selection/sampling (not introducing bias)
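A few of the preprocessing steps above, sketched in code. The bucket edges mirror the distance buckets from the weights slide; everything else is an assumption for illustration, including the function names, the "unknown" bucket for missing values, the boundary handling, and the cap of 1000 used to tame outliers.

```python
import math


def distance_bucket(distance):
    """Bucket a distance; missing values get their own bucket."""
    if distance is None:
        return "unknown"
    if distance < 2:
        return "<2"
    if distance < 5:
        return "2-5"
    if distance < 10:
        return "5-10"
    return ">10"


def log_scale(count, cap=1000):
    """Treat missing as 0, clip outliers to [0, cap], then apply log(1 + x)."""
    if count is None:
        count = 0
    clipped = min(max(count, 0), cap)
    return math.log1p(clipped)


print(distance_bucket(3.2), distance_bucket(None))
print(log_scale(5000) == log_scale(1000))
```

Clipping before the log keeps a single inflated count from dominating a feature, and bucketing lets a linear model learn a separate weight per distance range.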
Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People to people recommendations?
● Recommending people to groups?
Thanks!
Smart people come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang
