Machine Learning and Data at Meetup
Evan Estola
Meetup.com
evan@meetup.com
@estola
My Background
● Software Engineer/Data Scientist
● Machine learning team
● At Meetup since May 2012
● BS Computer Science
○ Information Retrieval
○ Data Mining
○ Math
■ Linear Algebra
■ Graph Theory
You
● Data Scientists?
● Engineers?
● Statisticians?
● Students?
● Non-technical?
What this talk is
● Super secret peek into Meetup!
● Meetup recommendations examples
● How we do recommendations (model/features)
● Lessons learned/what’s next
What this talk isn’t
● What is a data scientist?
● What is big data?
● How does matrix factorization or gradient boosted decision trees or map reduce or this framework I hope you’ll use work?
Why Meetup data is cool
● Real people meeting up
● Every meetup could change someone's life
● No ads, just do the best thing
● Oh, and 114 million RSVPs by >14 million members
● 2.7 million RSVPs in the last 30 days
○ ~1/second
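A quick check of that rate, assuming the 30-day window is counted as 30 × 86,400 seconds:

```latex
\frac{2.7 \times 10^{6}\ \text{RSVPs}}{30 \times 86{,}400\ \text{s}}
  = \frac{2.7 \times 10^{6}}{2.592 \times 10^{6}}\ \text{RSVPs/s}
  \approx 1.04\ \text{RSVPs/s}
```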
Data at Meetup
● User data
● Site monitoring/performance
● AB testing
● Recommendations*
“Everything is a recommendation”
● Not my phrase
● Not actually true yet
● Working on it
Recommendation
Topic Recommendations
● New registrant
● Don’t know anything about you yet!
● Most popular is boring/repetitive
Algorithm:
○ Group local meetups by topic
○ Select topic with most groups
○ Remove those groups
○ Repeat
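The four steps above can be sketched as a greedy loop. This is a minimal illustration, not Meetup's code: the `group_topics` input shape, the function name `pick_topics`, the alphabetical tie-break, and the toy groups are all assumptions made for the example.

```python
from collections import defaultdict


def pick_topics(group_topics, k):
    """Greedily pick up to k topics for a new registrant.

    group_topics maps a local group id to the set of topics it is tagged
    with (a hypothetical input shape, not Meetup's schema).
    """
    remaining = {group: set(topics) for group, topics in group_topics.items()}
    picked = []
    while remaining and len(picked) < k:
        # Step 1: group the remaining local meetups by topic.
        by_topic = defaultdict(set)
        for group, topics in remaining.items():
            for topic in topics:
                by_topic[topic].add(group)
        if not by_topic:
            break  # only untagged groups are left
        # Step 2: select the topic with the most groups
        # (ties broken alphabetically, an assumption).
        best = min(by_topic, key=lambda t: (-len(by_topic[t]), t))
        picked.append(best)
        # Step 3: remove those groups so the next pick covers different ones.
        for group in by_topic[best]:
            del remaining[group]
        # Step 4: repeat.
    return picked


# Toy data, invented for illustration.
local_groups = {
    "ny-tech": {"tech", "startups"},
    "ny-python": {"tech", "python"},
    "hackers": {"tech"},
    "trail-runners": {"running", "outdoors"},
    "hiking-club": {"outdoors"},
    "book-club": {"books"},
}
print(pick_topics(local_groups, 3))  # ['tech', 'outdoors', 'books']
```

Removing the covered groups before each new pick is what keeps the list from being "most popular" repeated: once "tech" is chosen, "startups" and "python" no longer have any groups to count.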
Group/Event Recommendations
● Replaced a topic-only system
● Inputs:
○ Member, location, topics, Facebook friends? Demographics?
● Outputs:
○ Ranking
Collaborative Filtering
● Classic recommendations approach
● Users who like this also like this
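"Users who like this also like this" can be illustrated with plain co-membership counts. This is a generic textbook-style sketch, not the approach the talk goes on to use; the function names and the toy memberships are invented for the example.

```python
from collections import Counter
from itertools import combinations


def co_membership_counts(memberships):
    """Count how many members each ordered pair of groups has in common."""
    counts = Counter()
    for groups in memberships.values():
        for a, b in combinations(sorted(groups), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_joined(group, memberships, n=3):
    """Return up to n groups most often joined together with `group`."""
    counts = co_membership_counts(memberships)
    related = {b: c for (a, b), c in counts.items() if a == group}
    # Highest count first; ties broken alphabetically (an assumption).
    return sorted(related, key=lambda g: (-related[g], g))[:n]


# Toy data: member -> set of groups joined.
memberships = {
    "ann": {"python", "data-science", "hiking"},
    "bob": {"python", "data-science"},
    "cat": {"python", "hiking"},
    "dan": {"data-science", "board-games"},
}
print(also_joined("python", memberships))  # ['data-science', 'hiking']
```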
Why Recs at Meetup are hard
● Incomplete Data (topics)
● Cold start
● Asking user for data is hard
● Going to meetups is scary
● Sparsity
○ Location
○ Groups/person
○ Membership: 0.001%
○ Compare to Netflix: 1%
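One way to read the two percentages is as the density of the member-by-group matrix: the share of all possible (member, group) pairs that are actual memberships. That reading is an assumption, and the counts below are round numbers chosen only to land on 0.001%; they are not Meetup figures.

```python
def density_percent(n_memberships, n_members, n_groups):
    """Share of possible (member, group) pairs that exist, as a percentage."""
    possible_pairs = n_members * n_groups
    return 100.0 * n_memberships / possible_pairs


# Made-up round numbers: 1M members, 100k groups, 1M memberships in total.
print(density_percent(1_000_000, 1_000_000, 100_000))
```

At that density a member-by-group matrix is almost entirely empty, which leaves "users who like this" methods with very little overlap to learn from.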
Supervised Learning/Classification
● “Inferring a function from labeled training data”
● Joined Meetup/Didn’t join Meetup
● “Features”
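A minimal sketch of what "inferring a function from labeled training data" can look like for joined/didn't-join labels. Everything here is illustrative: the two binary features, the twelve toy rows, and the plain gradient-descent fit are assumptions, not Meetup's training pipeline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit(rows, labels, steps=2000, lr=0.5):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this labeled example.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


# Toy labeled data: features are (topic_match, state_match),
# label is 1 if the member joined the group and 0 otherwise.
rows = [(1, 1)] * 4 + [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 4
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1]

weights, bias = fit(rows, labels)
print(weights, bias)
```

On this toy data both weights come out positive: a topic match or a state match each raises the predicted chance of joining.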
Topic Match
State Match
Logistic Regression
● Score
○ “Probability”
○ Ranking
● Fast + Easy
● Weights!
Group recommendation weights
● TopicMatch 1.21
● TopicMatchExtended 0.17
● FacebookFriends 0.15
● SecondDegreeFacebook 0.79
● AgeUnmatch -2.20
● GenderUnmatch -2.6
● StateMatchFeature 0.44
● CityMatch 0.02
● DistanceBucket <2 1.39
● DistanceBucket 2-5 0.83
● DistanceBucket 5-10 0.60
● DistanceBucket >10 n/a
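A sketch of how weights like these turn into a score and a ranking. The weight values are copied from the list above; everything else is an assumption for illustration: a bias of 0 (none is given), 0/1 feature values, the `>10` bucket treated as a reference level contributing nothing (it is listed as n/a), and the two example candidates.

```python
import math

# Weights from the slide. "DistanceBucket >10" is n/a there and omitted here,
# i.e. treated as contributing 0 (an assumption).
WEIGHTS = {
    "TopicMatch": 1.21,
    "TopicMatchExtended": 0.17,
    "FacebookFriends": 0.15,
    "SecondDegreeFacebook": 0.79,
    "AgeUnmatch": -2.20,
    "GenderUnmatch": -2.6,
    "StateMatchFeature": 0.44,
    "CityMatch": 0.02,
    "DistanceBucket <2": 1.39,
    "DistanceBucket 2-5": 0.83,
    "DistanceBucket 5-10": 0.60,
}


def score(features, bias=0.0):
    """Logistic-regression score in (0, 1) for one member/group pair."""
    z = bias + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rank(candidates):
    """Order candidate group names by descending score."""
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


# Two hypothetical candidate groups for the same member.
candidates = {
    "nearby-topic-match": {"TopicMatch": 1, "StateMatchFeature": 1, "DistanceBucket <2": 1},
    "far-age-mismatch": {"TopicMatch": 1, "AgeUnmatch": 1, "StateMatchFeature": 1,
                         "DistanceBucket 5-10": 1},
}
print(rank(candidates))  # ['nearby-topic-match', 'far-age-mismatch']
```

Because the sigmoid is monotonic, sorting by the score gives the same order as sorting by the weighted sum, so the "probability" only matters if it is reported or thresholded.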
Making up features
● “Zipscore”
● All topics not created equal
● Facebook likes
Real data is gross
● Preprocessing is critical!
○ missing data
○ outliers
○ log scale
○ bucketing
○ selection/sampling (not introducing bias)
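A small sketch of several of these steps applied to single values. The bucket edges follow the distance buckets shown with the model weights; the boundary handling, the "unknown" bucket for missing distances, and the cap of 10,000 are assumptions chosen for illustration.

```python
import math


def distance_bucket(distance):
    """Bucket a distance into the ranges used by the distance features."""
    if distance is None:
        return "unknown"  # missing data gets its own bucket (assumption)
    if distance < 2:
        return "<2"
    if distance < 5:
        return "2-5"
    if distance <= 10:
        return "5-10"
    return ">10"


def log_scale(count, cap=10_000):
    """Fill missing counts with 0, clip outliers to [0, cap], then take log(1 + x)."""
    if count is None:
        count = 0
    clipped = min(max(count, 0), cap)
    return math.log1p(clipped)


print(distance_bucket(1.5), distance_bucket(7), distance_bucket(None))
print(log_scale(None), log_scale(99), log_scale(10**9))
```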
Cleaning data
● Schenectady
● Beverly Hills
● Astronaut
● Fake RSVP boosts (+100 guests!)
● RSVP hogs
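The last two items could be handled with a cap and a filter, sketched below. The tuple layout, both thresholds, and the choice to drop heavy RSVPers entirely are hypothetical; the slides do not say how these cases were actually treated.

```python
from collections import Counter

MAX_GUESTS = 5       # hypothetical cap on guests counted per RSVP
HOG_THRESHOLD = 50   # hypothetical RSVP count above which a member is set aside


def clean_rsvps(rsvps):
    """Cap guest counts and drop members with implausibly many RSVPs.

    rsvps is a list of (member_id, event_id, guests) tuples.
    Returns (cleaned_rsvps, set_of_dropped_members).
    """
    per_member = Counter(member for member, _, _ in rsvps)
    hogs = {member for member, n in per_member.items() if n > HOG_THRESHOLD}
    cleaned = [
        (member, event, min(guests, MAX_GUESTS))
        for member, event, guests in rsvps
        if member not in hogs
    ]
    return cleaned, hogs


# Toy data: one "+100 guests" RSVP, one normal RSVP, one member with 60 RSVPs.
rsvps = [("a", "e1", 100), ("b", "e1", 0)]
rsvps += [("hog", f"e{i}", 0) for i in range(60)]
cleaned, hogs = clean_rsvps(rsvps)
print(cleaned, hogs)
```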
TO THE FUTURE!
● Hadoop
● Clicks
● Impressions
● People to people recommendations?
● Recommending people to groups?
Thanks!
Smart people, come work with me.
http://www.meetup.com/jobs/
Special thanks:
● Chris Halpert
● Victor J Wang
