
Evan Estola, Lead Machine Learning Engineer, Meetup at MLconf SEA - 5/20/16


When Recommendation Systems Go Bad: Machine learning and recommendation systems have changed the way we interact with not just the internet, but some of the basic products and services that we use to run our lives.

While the reach and impact of big data and algorithms will continue to grow, how do we ensure that people are treated justly? Certainly there are already algorithms in use that determine if someone will receive a job interview or be accepted into a school. Misuse of data in many of these cases could have serious public relations, legal, and ethical consequences.

As the people that build these systems, we have a social responsibility to consider their effect on humanity, and we should do whatever we can to prevent these models from perpetuating some of the prejudice and bias that exist in our society today.

In this talk I intend to cover some examples of recommendation systems that have gone wrong across various industries, as well as why they went wrong and what can be done about it. The first step towards solving this larger issue is raising awareness, but there are concrete technical approaches that can be employed as well. Three that will be covered are:
- Accepting simplicity with interpretable models.
- Data segregation via ensemble modeling.
- Designing test data sets for capturing unintended bias.



  1. When Recommendation Systems Go Bad. Evan Estola, 5/20/16
  2. About Me: Evan Estola, Lead Machine Learning Engineer @ Meetup. evan@meetup.com, @estola
  3. We want a world full of real, local community. (Women's Veterans Meetup, San Antonio, TX)
  4. Recommendation Systems: Collaborative Filtering
  5. Recommendation Systems: Rating Prediction. The Netflix Prize: how many stars would user X give movie Y? Boring.
  6. Recommendation Systems: Learning to Rank. An active area of research: use an ML model to solve a ranking problem. Pointwise: logistic regression on a binary label, using the output for ranking. Listwise: optimize the entire list. Performance metrics: Mean Average Precision, Precision@K, Discounted Cumulative Gain.
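A minimal sketch of the pointwise approach described on this slide, assuming scikit-learn; the data and the dcg_at_k helper are illustrative, not Meetup's production code:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Pointwise learning to rank: fit a binary classifier on joined/not-joined
    # labels, then rank candidate items by predicted probability.
    X = np.array([[0.9, 1.0], [0.2, 0.0], [0.7, 1.0], [0.1, 0.0]])  # one row per (user, item) pair
    y = np.array([1, 0, 1, 0])                                      # 1 = user joined the group
    model = LogisticRegression().fit(X, y)

    candidates = np.array([[0.8, 1.0], [0.3, 0.0], [0.6, 1.0]])
    scores = model.predict_proba(candidates)[:, 1]
    ranking = np.argsort(-scores)  # best-first indices into candidates
    print("ranked candidate indices:", ranking)

    def dcg_at_k(relevances, k):
        # Discounted Cumulative Gain: relevant items near the top count more.
        rel = np.asarray(relevances, dtype=float)[:k]
        return float(np.sum(rel / np.log2(np.arange(2, rel.size + 2))))

    print("DCG@3:", dcg_at_k([1, 0, 1], 3))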
  7. Data Science impacts lives: the ads you see, the apps you download, your friends' activity and Facebook feed, the news you're exposed to, whether a product is available, whether you can get a ride, the price you pay for things, admittance into college, the job openings you find and get, whether you can get a loan.
  8. You just wanted a kitchen scale; now Amazon thinks you're a drug dealer.
  9. Ego. Member/customer/user first. Focus on building the best product, not on being the most clever data scientist. It's much harder to spin a positive user story than a story about how smart you are.
  10. "Black-sounding" names were 25% more likely to be served an ad suggesting a criminal record.
  11. Ethics. We have accepted that machine learning can seem creepy; how do we prevent it from becoming immoral? We have an ethical obligation not to teach machines to be prejudiced.
  12. Data Ethics Awareness. Tell your friends. Tell your coworkers. Tell your boss. Identify groups that could be negatively impacted by your work. Make a choice. Take a stand.
  13. Interpretable Models. For simple problems, simple solutions are often worth a small concession in performance. Inspectable models make it easier to debug problems in data collection, feature engineering, etc. Only include features that work the way you want; don't include feature interactions that you don't want.
  14. Logistic Regression: feature names and learned weights.
      StraightDistanceFeature(-0.0311f), ChapterZipScore(0.0250f), RsvpCountFeature(0.0207f),
      AgeUnmatchFeature(-1.5876f), GenderUnmatchFeature(-3.0459f), StateMatchFeature(0.4931f),
      CountryMatchFeature(0.5735f), FacebookFriendsFeature(1.9617f), SecondDegreeFacebookFriendsFeature(0.1594f),
      ApproxAgeUnmatchFeature(-0.2986f), SensitiveUnmatchFeature(-0.1937f), KeywordTopicScoreFeatureNoSuppressed(4.2432f),
      TopicScoreBucketFeatureNoSuppressed(1.4469f,0.257f,10f), TopicScoreBucketFeatureSuppressed(0.2595f,0.099f,10f),
      ExtendedTopicsBucketFeatureNoSuppressed(1.6203f,1.091f,10f), ChapterRelatedTopicsBucketFeatureNoSuppressed(0.1702f,0.252f,0.641f),
      ChapterRelatedTopicsBucketFeatureNoSuppressed(0.4983f,0.641f,10f), DoneChapterTopicsFeatureNoSuppressed(3.3367f)
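The value of a model like this is that it can be audited weight by weight. A sketch of that kind of audit, assuming a scikit-learn model and hypothetical feature names (the weights above come from Meetup's own stack):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    feature_names = ["distance", "rsvp_count", "age_unmatch", "gender_unmatch", "topic_score"]
    rng = np.random.default_rng(0)
    X = rng.random((200, len(feature_names)))   # stand-in training data
    y = rng.integers(0, 2, 200)
    model = LogisticRegression().fit(X, y)

    # Pair every learned weight with its feature name and sort by magnitude:
    # a surprising sign or size on any feature is immediately visible.
    for name, w in sorted(zip(feature_names, model.coef_[0]), key=lambda p: -abs(p[1])):
        print(f"{name:16s} {w:+.4f}")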
  15. Feature Engineering and Interactions
      ● Good Feature:
        ○ Join! You're interested in Tech x Meetup is about Tech
      ● Good Feature:
        ○ Don't join! Group is intended only for Women x You are a Man
      ● Bad Feature:
        ○ Don't join! Group is mostly Men x You are a Woman
      ● Horrible Feature:
        ○ Don't join! Meetup is about Tech x You are a Woman
      Meetup is not interested in propagating gender stereotypes.
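One way to enforce the distinction this slide draws is to build interaction terms only from an explicit whitelist, rather than letting the model cross every feature with every other. A minimal sketch in Python, with hypothetical feature names:

    # Only whitelisted crosses become features; everything else stays out.
    ALLOWED_INTERACTIONS = [
        ("member_likes_tech", "group_is_tech"),   # the "good" cross above
        ("member_is_man", "group_women_only"),    # eligibility rule, also fine
    ]
    # Deliberately absent: ("member_is_woman", "group_is_tech") -- the
    # "horrible" cross that would bake gender stereotypes into rankings.

    def build_features(raw):
        # raw: dict of base feature name -> numeric value
        feats = dict(raw)
        for a, b in ALLOWED_INTERACTIONS:
            if a in raw and b in raw:
                feats[a + "_x_" + b] = raw[a] * raw[b]
        return feats

    print(build_features({"member_likes_tech": 1.0, "group_is_tech": 1.0,
                          "member_is_man": 0.0, "group_women_only": 1.0}))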
  16. Ensemble Models and Data Segregation. Ensemble models combine the outputs of several classifiers for increased accuracy. If you have features that are useful but whose interactions worry you (and your model builds interactions automatically), use ensemble modeling to restrict those features to separate models.
  17. Ensemble Model, Data Segregation (diagram): Model 1 sees one data set (interests, searches, friends, location); Model 2 sees another (gender, friends, location). Each model produces its own prediction, and the two predictions are combined into the final prediction.
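A minimal sketch of the segregated ensemble in this diagram, assuming a scikit-learn stack; the column layout and data are illustrative only:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    X = rng.random((300, 5))              # columns 0-2: interests/searches/location
    X[:, 3] = rng.integers(0, 2, 300)     # column 3: gender; column 4: age bucket (illustrative)
    y = rng.integers(0, 2, 300)

    INTEREST_COLS = [0, 1, 2]
    DEMOGRAPHIC_COLS = [3, 4]

    interest_model = LogisticRegression().fit(X[:, INTEREST_COLS], y)
    demo_model = LogisticRegression().fit(X[:, DEMOGRAPHIC_COLS], y)

    def final_prediction(rows):
        # Each model scores only the columns it was trained on; a simple average
        # combines them, so cross-model interactions (e.g. gender x topic)
        # can never be learned.
        p1 = interest_model.predict_proba(rows[:, INTEREST_COLS])[:, 1]
        p2 = demo_model.predict_proba(rows[:, DEMOGRAPHIC_COLS])[:, 1]
        return (p1 + p2) / 2

    print(final_prediction(X[:5]))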
  18. Fake profiles, track ads. An ad for career coaching for "200k+" executive jobs: male group, 1852 impressions; female group, 318.
  19. Diversity Controlled Testing. CMU's AdFisher crawls ads with simulated user profiles; the same technique can work to find bias in your own models! Generate test data: randomize the sensitive feature in a real data set. Run the model. Evaluate for unacceptable biased treatment. You must identify which features are sensitive and which outcomes are unwanted.
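A minimal sketch of that test loop, assuming a scikit-learn model and a made-up column layout: shuffle the sensitive column so it no longer correlates with anything real, rescore, and compare groups. A systematic score gap on randomized data means the model is keying on the sensitive feature itself:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    X = rng.random((500, 4))
    X[:, 0] = rng.integers(0, 2, 500)       # column 0: sensitive feature (e.g. gender)
    y = rng.integers(0, 2, 500)
    model = LogisticRegression().fit(X, y)  # stand-in for the model under test

    def randomized_group_gap(model, X, sensitive_col=0):
        X_test = X.copy()
        # Randomize the sensitive feature, severing any real correlation.
        X_test[:, sensitive_col] = rng.permutation(X_test[:, sensitive_col])
        scores = model.predict_proba(X_test)[:, 1]
        g = X_test[:, sensitive_col].astype(bool)
        return abs(float(scores[g].mean() - scores[~g].mean()))

    print(f"score gap across randomized groups: {randomized_group_gap(model, X):.4f}")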
  20. Tay.ai
      ● Twitter bot
      ● "Garbage in, garbage out"
      ● Responsibility?
      "In the span of 15 hours Tay referred to feminism as a "cult" and a "cancer," as well as noting "gender equality = feminism" and "i love feminism now." Tweeting "Bruce Jenner" at the bot got similar mixed response, ranging from "caitlyn jenner is a hero & is a stunning, beautiful woman!" to the transphobic "caitlyn jenner isn't a real woman yet she won woman of the year?""
  21. Diverse Test Data. Outliers can matter; the real world is messy. Some people will mess with you, and some people look and act different than you. Defense. Diversity. Design.
  22. You know racist computers are a bad idea. Don't let your company invent racist computers.
