A presentation on ethics in Machine Learning and Recommendation Systems given at the NYC Data Science Meetup at MeetupHQ on 12/10 http://www.meetup.com/NYC-Data-Science/events/226998694/
17. Recommendation Systems: Learning To Rank
● Active area of research
● Use ML model to solve a ranking problem
● Pointwise: Logistic Regression on binary label, use output for ranking
● Listwise: Optimize entire list
● Performance Metrics
○ Mean Average Precision
○ P@K
○ Discounted Cumulative Gain
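The pointwise setup and the metrics above can be sketched in plain Python. This is a minimal illustration, not a production implementation; the function names and the toy ranking are made up for the example:

```python
import math

def precision_at_k(relevant, ranked, k):
    """P@K: fraction of the top-k ranked items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def dcg_at_k(gains, k):
    """Discounted Cumulative Gain: graded relevance, discounted by
    log2 of the (1-indexed) rank position."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

# Example: items ordered by a pointwise model's scores
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(relevant, ranked, 2))  # 0.5

gains = [3, 0, 2, 0]  # graded relevance in ranked order
print(dcg_at_k(gains, 4))  # 4.0
```

In the pointwise approach, each item gets an independent score (e.g. a logistic-regression probability) and the list is sorted by that score; the metrics then judge the resulting order.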
19. Data Science impacts lives
● News you’re exposed to
● Friends’ activity / Facebook feed
● Job openings you find/get
● If you can get a loan
● Route you take to get somewhere
● If you can get a ride
● Movies you watch on Netflix
● Products you buy on Amazon
● Price you pay for things
● If a product is available to you at all
● Apps you download
● Ads you see
23. Ego
● Member/customer/user first
● Focus on building the best product, not on being the most clever data scientist
● Much harder to spin a positive user story than a story about how smart you are
24. ● Fake profiles used to track ads
● Ad for career coaching for “200k+” executive jobs
● Male group: 1,852 impressions
● Female group: 318 impressions
26. Ethics
We have accepted that Machine Learning can seem creepy; how do we prevent it from becoming immoral?
We have an ethical obligation not to teach machines to be prejudiced.
28. Awareness
● Start a conversation
○ Identify potentially marginalized user groups
○ Have an ethics strategy for evaluating whether to include sensitive features
30. Interpretable Models
● For simple problems, simple solutions are often worth a small concession in performance
● Inspectable models make it easier to debug problems in data collection, feature engineering, etc.
● Only include features that work the way you want
● Don’t include feature interactions that you don’t want
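One advantage of a linear model is that its weights are directly inspectable: each feature's contribution to the score is just weight times value, so you can audit whether every feature behaves the way you intend. A minimal sketch, with made-up feature names and weights:

```python
# Illustrative weights from a hypothetical linear ranking model.
weights = {
    "topic_match": 2.1,        # boosts matching interests: intended
    "distance_km": -0.4,       # nearby events rank higher: intended
    "gender_is_female": -0.9,  # penalizes women across the board: remove!
}

def audit(weights, sensitive={"gender_is_female"}):
    """Flag sensitive features that carry any nonzero weight."""
    return [f for f, w in weights.items() if f in sensitive and w != 0]

print(audit(weights))  # ['gender_is_female']
```

With a deep or heavily ensembled model, this kind of one-line audit is much harder, which is the performance concession the slide refers to.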
32. Feature Engineering and Interactions
● Good Feature:
○ Join! You’re interested in Tech x Meetup is about Tech
● Good Feature:
○ Don’t join! Group is intended only for Women x You are a Man
● Bad Feature:
○ Don’t join! Group is mostly Men x You are a Woman
● Horrible Feature:
○ Don’t join! Meetup is about Tech x You are a Woman
Meetup is not interested in propagating gender stereotypes
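With explicit feature engineering, the modeler decides which interactions exist at all. A sketch of the idea above, with hypothetical field names; the point is what is deliberately left out:

```python
def make_features(user, meetup):
    """Build explicit interaction features, omitting combinations
    we never want the model to learn (e.g. topic x gender)."""
    return {
        # Good: topical relevance (interest x meetup topic)
        "interest_topic_match": int(meetup["topic"] in user["interests"]),
        # Good: honor an explicit members-only rule (rule x user gender)
        "excluded_by_group_rule": int(
            meetup.get("members_only") not in (None, user["gender"])
        ),
        # Deliberately NOT built:
        # "topic_x_gender"  -- would teach the model gender stereotypes
        # "majority_gender_mismatch" -- would entrench existing imbalance
    }

user = {"interests": {"Tech"}, "gender": "Man"}
meetup = {"topic": "Tech", "members_only": "Woman"}
print(make_features(user, meetup))
# {'interest_topic_match': 1, 'excluded_by_group_rule': 1}
```

The model only sees the crosses you hand it, so the "horrible" interaction can never be learned.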
33. Ensemble Models and Data Segregation
Ensemble models combine the outputs of several classifiers for increased accuracy.
If you have features that are useful but you’re worried about an interaction (and your model builds interactions automatically), use ensemble modeling to restrict those features to separate models.
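The segregation idea can be sketched as two small models on disjoint feature sets, combined only at the score level, so no cross-set interaction can ever form. The weights and feature names here are illustrative, not trained values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, features):
    """Logistic score over one model's own feature set."""
    return sigmoid(sum(weights[f] * v for f, v in features.items()))

# Two submodels with deliberately disjoint features
content_weights = {"topic_match": 2.0, "friend_joined": 1.0}
logistics_weights = {"distance_km": -0.3}

def ensemble_score(content_feats, logistics_feats):
    # Average the submodel probabilities; because the feature sets
    # never meet inside one model, no cross-terms can be learned.
    return 0.5 * (score(content_weights, content_feats)
                  + score(logistics_weights, logistics_feats))

p = ensemble_score({"topic_match": 1, "friend_joined": 0},
                   {"distance_km": 5})
```

Each submodel can still interact features internally; only interactions *across* the segregated sets are ruled out by construction.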
36. Diversity-Controlled Test Data
● Make sure the product works for everybody
● Generate test data and evaluate your model against it to confirm no encapsulated prejudice
37. Diversity Controlled Testing
● CMU - AdFisher
○ Crawls ads with simulated user profiles
● Same technique can work to find bias in your own models!
○ Generate Test Data
■ Randomize sensitive feature in real data set
○ Run Model
■ Evaluate for unacceptable biased treatment
● Must identify what features are sensitive and what outcomes are unwanted
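The generate/run/evaluate loop above can be sketched as a simple probe: copy real records, randomize the sensitive feature, and compare the model's mean score per assigned value. Everything here (the probe function, the deliberately biased model) is illustrative:

```python
import random

def bias_probe(model, records, sensitive_key, values, trials=200, seed=0):
    """Randomize the sensitive feature over real records and report the
    mean model score per assigned value; a large gap flags biased
    treatment, since the assignment is independent of everything else."""
    rng = random.Random(seed)
    totals = {v: [0.0, 0] for v in values}
    for _ in range(trials):
        rec = dict(rng.choice(records))
        v = rng.choice(values)
        rec[sensitive_key] = v  # the only thing that varies by group
        totals[v][0] += model(rec)
        totals[v][1] += 1
    return {v: s / max(n, 1) for v, (s, n) in totals.items()}

def leaky_model(r):
    # Illustrative "bad" model that keys directly on gender.
    return 0.9 if r["gender"] == "M" else 0.3

records = [{"age": a, "gender": "M"} for a in range(30)]
gaps = bias_probe(leaky_model, records, "gender", ["M", "F"])
print(gaps)  # mean score per group; the 0.9 vs 0.3 gap exposes the bias
```

This mirrors AdFisher's simulated-profile approach, applied to your own model instead of a third party's ads.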
38. You know racist computers are a bad idea.
Don’t let your company invent racist computers.