When Recommendation Systems Go Bad
Evan Estola
12/10/15
About Me
● Evan Estola
● Senior Machine Learning Engineer @ Meetup
● evan@meetup.com
● @estola
Agenda
1. Meetup
2. Recommendation Systems
3. What can go wrong?
4. How to prevent it
We want a world full of real, local community.
Women’s Veterans Meetup, San Antonio, TX
Recommendation Systems: Collaborative Filtering
You just wanted a kitchen scale; now Amazon thinks you’re a drug dealer
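A rough Python sketch of the item-to-item collaborative filtering idea behind “customers who bought X also bought Y”, on made-up data (not Amazon’s or Meetup’s actual system):

    # Item-to-item collaborative filtering sketch (hypothetical data).
    import numpy as np

    # Rows = users, columns = items; 1 means the user bought the item.
    interactions = np.array([
        [1, 1, 0, 0],   # user 0
        [1, 1, 1, 0],   # user 1
        [0, 0, 1, 1],   # user 2
    ])
    items = ["kitchen scale", "tiny plastic bags", "vacuum sealer", "stand mixer"]

    def cosine_sim(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    def similar_items(item_idx, k=2):
        # Rank other items by cosine similarity of their user-interaction columns.
        target = interactions[:, item_idx]
        scores = [(items[j], cosine_sim(target, interactions[:, j]))
                  for j in range(interactions.shape[1]) if j != item_idx]
        return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

    # "Customers who bought a kitchen scale also bought..."
    print(similar_items(items.index("kitchen scale")))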
Recommendation Systems: Rating Prediction
● Netflix prize
● How many stars would user X give movie Y
● Boring
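For reference, a minimal sketch of that rating-prediction setup as matrix factorization on toy data with plain SGD (an illustration, not the prize-winning approach):

    # Learn user and movie factors so their dot product approximates observed stars.
    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_movies, k = 5, 4, 2
    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (2, 2, 1.0), (3, 3, 4.0), (4, 2, 2.0)]   # (user, movie, stars)

    P = rng.normal(scale=0.1, size=(n_users, k))    # user factors
    Q = rng.normal(scale=0.1, size=(n_movies, k))   # movie factors
    lr, reg = 0.05, 0.02

    for _ in range(200):                             # plain SGD over observed ratings
        for u, m, r in ratings:
            pu, qm = P[u].copy(), Q[m].copy()
            err = r - pu @ qm
            P[u] += lr * (err * qm - reg * pu)
            Q[m] += lr * (err * pu - reg * qm)

    print("fit for a seen rating:", P[0] @ Q[0])        # should approach 5
    print("prediction for an unseen pair:", P[0] @ Q[2])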
Recommendation Systems: Learning To Rank
● Active area of research
● Use ML model to solve a ranking problem
● Pointwise: Logistic Regression on binary label, use output for ranking
● Listwise: Optimize entire list
● Performance Metrics
○ Mean Average Precision
○ P@K
○ Discounted Cumulative Gain
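Two of those metrics, sketched in Python for a single ranked list (toy relevance labels):

    import math

    def precision_at_k(ranked_relevance, k):
        # ranked_relevance: 1/0 labels in the order the model ranked the items.
        return sum(ranked_relevance[:k]) / k

    def dcg_at_k(ranked_relevance, k):
        # Discounted Cumulative Gain: relevant items count less further down the list.
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(ranked_relevance[:k]))

    ranking = [1, 0, 1, 1, 0]   # hypothetical labels for a model's top 5 results
    print("P@3 =", precision_at_k(ranking, 3))
    print("DCG@5 =", dcg_at_k(ranking, 5))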
Data Science impacts lives
● News you’re exposed to
● Friends’ activity / Facebook feed
● Job openings you find/get
● If you can get a loan
● Route you take to get somewhere
● If you can get a ride
● Movies you watch on Netflix
● Products you buy on Amazon
● Price you pay for things
● If a product is available to you at all
● Apps you download
● Ads you see
Ego
● Member/customer/user first
● Focus on building the best product, not on being the most clever data scientist
● Much harder to spin a positive user story than a story about how smart you are
● Fake profiles, track ads
● Career coaching for “200k+” executive jobs ad
● Male group: 1,852 impressions
● Female group: 318 impressions
● “Black-sounding” names 25% more likely to be served an ad suggesting a criminal record
Ethics
We have accepted that Machine Learning can seem creepy; how do we prevent it from becoming immoral?
We have an ethical obligation not to teach machines to be prejudiced.
Data Ethics
Awareness
● Tell your friends
● Tell your coworkers
● Tell your boss
Awareness
● Start a conversation
○ Identify potentially marginalized user groups
○ Have an ethics strategy for evaluating whether to include sensitive features
Interpretable Models
● For simple problems, simple solutions are often worth a small concession in performance
● Inspectable models make it easier to debug problems in data collection, feature engineering, etc.
● Only include features that work the way you want
● Don’t include feature interactions that you don’t want
Logistic Regression
StraightDistanceFeature(-0.0311f),
ChapterZipScore(0.0250f),
RsvpCountFeature(0.0207f),
AgeUnmatchFeature(-1.5876f),
GenderUnmatchFeature(-3.0459f),
StateMatchFeature(0.4931f),
CountryMatchFeature(0.5735f),
FacebookFriendsFeature(1.9617f),
SecondDegreeFacebookFriendsFeature(0.1594f),
ApproxAgeUnmatchFeature(-0.2986f),
SensitiveUnmatchFeature(-0.1937f),
KeywordTopicScoreFeatureNoSuppressed(4.2432f),
TopicScoreBucketFeatureNoSuppressed(1.4469f,0.257f,10f),
TopicScoreBucketFeatureSuppressed(0.2595f,0.099f,10f),
ExtendedTopicsBucketFeatureNoSuppressed(1.6203f,1.091f,10f),
ChapterRelatedTopicsBucketFeatureNoSuppressed(0.1702f,0.252f,0.641f),
ChapterRelatedTopicsBucketFeatureNoSuppressed(0.4983f,0.641f,10f),
DoneChapterTopicsFeatureNoSuppressed(3.3367f)
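Part of the appeal of a model like the above: the score is just a weighted sum, so each weight can be read off and sanity-checked. A small sketch of that inspection, with weights taken from the slide and hypothetical feature values:

    import math

    weights = {
        "GenderUnmatchFeature": -3.0459,               # group is for a gender you don't match
        "FacebookFriendsFeature": 1.9617,              # you have friends in the group
        "StateMatchFeature": 0.4931,
        "KeywordTopicScoreFeatureNoSuppressed": 4.2432,
    }

    def score(features):
        # features: {name: value}; returns P(join) under a logistic model (intercept omitted).
        z = sum(weights.get(name, 0.0) * value for name, value in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    # Flip one feature at a time and watch how much the prediction moves.
    base = {"KeywordTopicScoreFeatureNoSuppressed": 0.8, "StateMatchFeature": 1.0}
    print(score(base))
    print(score({**base, "GenderUnmatchFeature": 1.0}))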
Feature Engineering and Interactions
● Good Feature:
○ Join! You’re interested in Tech x Meetup is about Tech
● Good Feature:
○ Don’t join! Group is intended only for Women x You are a Man
● Bad Feature:
○ Don’t join! Group is mostly Men x You are a Woman
● Horrible Feature:
○ Don’t join! Meetup is about Tech x You are a Woman
Meetup is not interested in propagating gender stereotypes
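A hypothetical sketch of building interactions by hand: the good crosses are constructed explicitly, and the bad cross (group topic × member gender) simply never becomes a feature. Field names are illustrative, not Meetup’s schema:

    # Hand-built interaction features; fields and names are illustrative.
    def interaction_features(member, group):
        feats = {}
        # Good: member interest x group topic
        feats["interest_topic_match"] = float(group["topic"] in member["interests"])
        # Good: gender-restricted group x member gender mismatch
        restriction = group.get("gender_restriction")
        feats["gender_restricted_unmatch"] = float(
            restriction is not None and restriction != member["gender"])
        # Deliberately absent: any cross of group topic with member gender,
        # e.g. "tech_group_x_is_woman" -- that feature is never created.
        return feats

    member = {"interests": {"tech", "hiking"}, "gender": "F"}
    group = {"topic": "tech", "gender_restriction": None}
    print(interaction_features(member, group))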
Ensemble Models and Data Segregation
Ensemble models combine the outputs of several classifiers for increased accuracy.
If you have features that are useful but you’re worried about their interactions (and your model builds interactions automatically), use ensemble modeling to restrict those features to separate models.
Ensemble Model, Data Segregation
Model 1 data: *Interests, Searches, Friends, Location → Model 1 prediction
Model 2 data: *Gender, Friends, Location → Model 2 prediction
Final model data: Model 1 prediction, Model 2 prediction → Final prediction
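A minimal sketch of that segregation, assuming scikit-learn and made-up data: each sub-model sees only its own columns, and the combiner sees only the two scores, so the learner cannot build interest × gender interactions on its own:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    X_interest = rng.normal(size=(n, 3))                           # interest/search/friend signals
    X_sensitive = rng.integers(0, 2, size=(n, 1)).astype(float)    # e.g. a gender flag
    y = (X_interest[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

    model_1 = LogisticRegression().fit(X_interest, y)    # never sees the sensitive column
    model_2 = LogisticRegression().fit(X_sensitive, y)   # sensitive signal kept isolated

    # The final model only combines the two sub-model predictions.
    Z = np.column_stack([model_1.predict_proba(X_interest)[:, 1],
                         model_2.predict_proba(X_sensitive)[:, 1]])
    final = LogisticRegression().fit(Z, y)
    print(final.predict_proba(Z[:5])[:, 1])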
Diversity controlled test data
● Make sure the product works for everybody
● Generate test data and evaluate your model against it to confirm no encapsulated prejudice
Diversity Controlled Testing
● CMU - AdFisher
○ Crawls ads with simulated user profiles
● Same technique can work to find bias in your own models!
○ Generate Test Data
■ Randomize sensitive feature in real data set
○ Run Model
■ Evaluate for unacceptable biased treatment
● Must identify what features are sensitive and what outcomes are unwanted
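A sketch of that loop, assuming scikit-learn, made-up data, and an arbitrary gap threshold: copy the evaluation set, randomize only the sensitive column, score it, and compare the two synthetic groups:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def diversity_controlled_test(model, X, sensitive_col, threshold=0.02, seed=0):
        rng = np.random.default_rng(seed)
        X_test = X.copy()
        # Randomize the sensitive feature so it carries no real information.
        X_test[:, sensitive_col] = rng.integers(0, 2, size=len(X_test))
        scores = model.predict_proba(X_test)[:, 1]
        gap = abs(scores[X_test[:, sensitive_col] == 0].mean()
                  - scores[X_test[:, sensitive_col] == 1].mean())
        # Any remaining gap can only come from the model treating the groups differently.
        return gap, gap > threshold

    # Toy usage: train a throwaway model, then test it.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 4))
    X[:, 3] = rng.integers(0, 2, size=1000)        # pretend column 3 is the sensitive one
    y = (X[:, 0] > 0).astype(int)
    model = LogisticRegression().fit(X, y)
    print(diversity_controlled_test(model, X, sensitive_col=3))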
You know racist computers are a bad idea.
Don’t let your company invent racist computers.

When recommendation systems go bad