Oct. 4, 2012

Presented at RecSys 2012. "BlurMe: Inferring and Obfuscating User Gender Based on Ratings" User demographics, such as age, gender and ethnicity, are routinely used for targeting content and advertising products to users. Similarly, recommender systems utilize user demographics for personalizing recommendations and overcoming the cold-start problem. Often, privacy-concerned users do not provide these details in their online profiles. In this work, we show that a recommender system can infer the gender of a user with high accuracy, based solely on the ratings provided by users (without additional metadata), and a relatively small number of users who share their demographics. We design techniques for effectively adding ratings to a user's profile for obfuscating the user's gender, while having an insignificant effect on the recommendations provided to that user.

Smriti Bhagat

- 1. BlurMe: Inferring and Obfuscating User Gender Based on Ratings. Smriti Bhagat. Joint work with Udi Weinsberg, Stratis Ioannidis, Nina Taft. Technicolor, Palo Alto
- 2. Motivation: What all do they know about me? What can they infer about me!!?! 2 9/17/12
- 3. Overview: Given a rating profile, can we infer the user's demographics (age, gender, ethnicity, income, etc.)? If so, is it possible to obfuscate a user's gender while maintaining useful recommendations?
- 4. Gender Inference
- 5. System Setting [Diagram components: Recommender System; Training Dataset (ratings and gender of users); Gender Inference (male or female); User Rating Profile; Gender Obfuscation; User Rating Profile (Obfuscated)]
- 6. Gender Inference – Methods
    - Notation, for a user we wish to classify: the vector of user ratings for M movies (0 if not rated), the indicator vector of movies rated, and the user's gender
    - Classification:
        - Three generative models
        - Binary logistic regression
        - Support vector machines (SVM)
- 7. Gender Inference – Generative Models (1) Classification: 1. Bernoulli Naïve Bayes: over the indicator vector of movies rated
- 8. Gender Inference – Generative Models (2) Classification: 2. Multinomial Naïve Bayes: over the user ratings vector
- 9. Gender Inference – Generative Models (3) Classification: 3. Mixed Naïve Bayes: considers both the indicator vector (Bernoulli distribution) and the ratings vector (normal distribution)
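As a rough illustration of the first generative model, the sketch below fits a Bernoulli naïve Bayes classifier on indicator vectors in plain Python. The function names, the Laplace smoothing constant `alpha`, and the four-movie toy data are all assumptions made for this example; the slides do not give the authors' implementation.

```python
import math

def train_bernoulli_nb(indicators, genders, alpha=1.0):
    """Fit per-gender rating probabilities with Laplace smoothing.

    indicators: list of 0/1 lists, one per user (1 = user rated the movie).
    genders: list of labels, one per user (for example "F" or "M").
    alpha: smoothing constant (an assumption, not taken from the slides).
    """
    num_movies = len(indicators[0])
    model = {}
    for g in set(genders):
        rows = [x for x, y in zip(indicators, genders) if y == g]
        prior = len(rows) / len(indicators)
        # Smoothed probability that a user of gender g rates movie j.
        probs = [
            (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(num_movies)
        ]
        model[g] = (prior, probs)
    return model

def predict_gender(model, indicator):
    """Return the gender with the highest log-posterior for one user."""
    best_gender, best_score = None, -math.inf
    for g, (prior, probs) in model.items():
        score = math.log(prior)
        for rated, p in zip(indicator, probs):
            score += math.log(p if rated else 1.0 - p)
        if score > best_score:
            best_gender, best_score = g, score
    return best_gender

# Made-up toy data: four users, four hypothetical movies.
train_x = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_y = ["F", "F", "M", "M"]
model = train_bernoulli_nb(train_x, train_y)
print(predict_gender(model, [1, 0, 0, 0]))
```

Only the rating event (rated or not) enters this model; the rating values themselves are ignored, which is the distinction between the Bernoulli variant and the multinomial and mixed variants.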
- 10. Gender Inference – Logistic Regression & SVMs
    - Logistic regression: accounts for the fact that movies may not be rated independently; applied over the ratings vector and over the indicator vector; the learned coefficients indicate how correlated a movie is with the inferred gender
    - SVM: applied over the ratings vector and over the indicator vector
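To make the role of the coefficients concrete, here is a minimal, unregularized logistic regression trained by batch gradient descent in plain Python. The label encoding (1 = female, 0 = male), the learning rate, the epoch count, and the toy data are assumptions for illustration only; a real system would more likely use a library implementation with regularization.

```python
import math

def train_logistic(features, labels, lr=0.5, epochs=500):
    """Batch gradient descent for binary logistic regression.

    features: list of per-user feature lists (for example indicator vectors).
    labels: list of 0/1 labels (assumed encoding: 1 = female, 0 = male).
    Returns (weights, bias); weights[j] is the coefficient of movie j.
    """
    n, m = len(features), len(features[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err
            for j in range(m):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Made-up toy data: four users, four hypothetical movies.
toy_x = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
toy_y = [1, 1, 0, 0]
weights, bias = train_logistic(toy_x, toy_y)
# Positive coefficients lean toward label 1, negative ones toward label 0.
for movie, w in sorted(enumerate(weights), key=lambda t: t[1], reverse=True):
    print(movie, round(w, 3))
```

Sorting movies by coefficient gives a per-movie score of association with each gender, which is the kind of score the obfuscation strategies later in the deck rely on.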
- 11. Gender Inference Results: AUC comparison
    - Datasets:
        - Flixster: 34k users, 17k movies, 5.8m ratings, M:F = 4:6
        - Movielens: 6k users, 3.7k movies, 1m ratings, M:F = 7:3
    - [Figure: AUC comparison, one panel each for Flixster and Movielens]
    - Logistic regression and SVM outperform the naïve Bayes methods
- 12. Gender Inference Results: Rating event

    | Classifier | Flixster AUC (Precision/Recall) | Movielens AUC (Precision/Recall) |
    |---|---|---|
    | SVM | 0.82 (0.73/0.70) | 0.86 (0.78/0.77) |
    | SVM | 0.80 (0.72/0.70) | 0.85 (0.78/0.77) |
    | Logistic Regression | 0.84 (0.76/0.77) | 0.85 (0.80/0.80) |
    | Logistic Regression | 0.83 (0.75/0.76) | 0.84 (0.78/0.79) |

    Rating event is a strong signal for classification
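For readers unfamiliar with the metrics in the results above, the following sketch computes AUC, precision, and recall in plain Python. The scores, labels, and the 0.5 decision threshold are made-up values; the slides do not say how the authors thresholded or averaged their precision and recall figures.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(predicted, labels):
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up classifier scores and true labels for four users.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5
print(auc(scores, labels), precision_recall(predicted, labels))
```

AUC depends only on how the classifier ranks users, so it is unaffected by the choice of threshold, whereas precision and recall change with it.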
- 13. Gender Inference Results: Most indicative movies [Table: top-10 movies most indicative of a user's gender, Flixster]
- 14. Gender Obfuscation
- 15. System Setting [Diagram components: Recommender System; Training Dataset (user ratings + gender); Gender Inference; Obfuscated Rating Profile; Gender Obfuscation; User Rating Profile]
- 16. Gender Obfuscation: Main Idea
    - Insert k additional ratings such that it is hard to infer the gender of the user, while minimally impacting the quality of recommendations received
    - Two key decisions:
        - Which movies should be added?
        - What rating should be assigned to each movie?
- 17. Gender Obfuscation: Mechanisms
    - Q: Which movie to add?
    - A: Three strategies for movie selection:
        - Random strategy: pick a movie uniformly at random from the list of movies indicative of the opposite gender
        - Sampled strategy: sample a movie based on the distribution of scores that indicate how strongly correlated a movie is with the opposite gender
        - Greedy strategy: greedily pick the movie most correlated with the opposite gender
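The three selection strategies can be sketched in a few lines of Python. The function name `select_movies`, the `scores` dictionary (assumed to hold a positive opposite-gender score per movie, for instance derived from classifier coefficients), and the sample data are hypothetical; the slides do not specify how the candidate list or the scores are built.

```python
import random

def select_movies(scores, rated, k, strategy, rng=None):
    """Pick k movies, not yet rated by the user, indicative of the opposite gender.

    scores: dict movie_id -> positive score; higher means more strongly
        correlated with the opposite gender (hypothetical input).
    rated: set of movie ids already in the user's profile.
    strategy: "random", "sampled", or "greedy".
    """
    rng = rng or random.Random()
    candidates = [m for m in scores if m not in rated]
    k = min(k, len(candidates))
    if strategy == "greedy":
        # Take the k highest-scoring candidates.
        return sorted(candidates, key=lambda m: scores[m], reverse=True)[:k]
    if strategy == "random":
        # Uniform choice without replacement.
        return rng.sample(candidates, k)
    if strategy == "sampled":
        # Draw without replacement, with probability proportional to score.
        pool, chosen = list(candidates), []
        for _ in range(k):
            pick = rng.choices(pool, weights=[scores[m] for m in pool], k=1)[0]
            chosen.append(pick)
            pool.remove(pick)
        return chosen
    raise ValueError("unknown strategy: " + strategy)

# Made-up scores for four hypothetical movies; the user has already rated "D".
scores = {"A": 0.9, "B": 0.5, "C": 0.2, "D": 0.7}
print(select_movies(scores, {"D"}, 2, "greedy"))
```

Greedy is deterministic, while the random and sampled strategies vary from run to run unless a seeded `random.Random` instance is passed in.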
- 18. Gender Obfuscation: Rating Assignment
    - Q: What should be the rating?
    - A: Two rating assignments:
        - Average movie rating
        - Predicted rating (using user and movie latent factors)
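The two rating assignments might look as follows in Python. The predicted rating here assumes a plain matrix-factorization model in which the prediction is the dot product of user and movie latent factors plus an optional global mean; the slides do not state which factorization model the authors used, and all numbers below are made up.

```python
def average_rating(movie_ratings):
    """Average of the ratings that other users gave to the movie."""
    return sum(movie_ratings) / len(movie_ratings)

def predicted_rating(user_factors, movie_factors, global_mean=0.0):
    """Dot product of latent factors plus a global mean (assumed model form)."""
    return global_mean + sum(u * v for u, v in zip(user_factors, movie_factors))

# Made-up example values.
print(average_rating([4, 5, 3]))
print(predicted_rating([1.0, 0.5], [2.0, 2.0]))
```

The average rating needs only the movie's existing ratings, whereas the predicted rating requires a trained factorization model for the specific user.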
- 19. Gender Obfuscation: Accuracy of Inference (Flixster)
    - Compared with 76.5% gender inference accuracy on unadulterated data, using the greedy strategy to add:
        - 1% ratings drops the accuracy to 15% (80% decrease)
        - 10% ratings drops the accuracy to 0.1%
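Assuming the "80% decrease" is meant as a relative drop from the 76.5% baseline, the arithmetic for the two greedy settings works out as:

```latex
\[
\frac{76.5 - 15}{76.5} = \frac{61.5}{76.5} \approx 0.804
\qquad\text{and}\qquad
\frac{76.5 - 0.1}{76.5} = \frac{76.4}{76.5} \approx 0.999
\]
```

So adding 1% extra ratings corresponds to a relative drop of roughly 80%, and adding 10% to a relative drop of roughly 99.9%.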
- 20. Gender Obfuscation: Impact on Recommendations (Flixster)
    - Overall change in RMSE is small: a maximum of 0.015 when 10% extra ratings are added using the random strategy
    - In theory, the change in RMSE can be 0 if the rating value is the user's predicted rating
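As a reminder of the utility metric, the sketch below computes RMSE and the change in RMSE between predictions made before and after obfuscation. The rating values are made up, and the slides do not describe the evaluation protocol (for example, which held-out ratings were used).

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up held-out ratings and predictions before and after obfuscation.
actual = [4.0, 3.0, 5.0]
before = [3.8, 3.2, 4.5]
after = [3.7, 3.2, 4.5]
print(rmse(after, actual) - rmse(before, actual))
```

A change close to zero means that the added ratings barely affect the quality of the recommendations the user receives.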
- 21. Gender Obfuscation: Privacy-Utility Tradeoff [Figure: privacy-utility tradeoff on Flixster; annotations: "Utility decreases", "Privacy increases"]
- 22. Conclusions
- 23. Contributions
    - Ratings alone can predict gender with about 80% accuracy
    - The act of rating a movie, regardless of the rating provided, is indicative of gender
    - The proposed obfuscation strategies reduce the success of gender inference by 80% by adding just 1% extra ratings, with an insignificant change in RMSE
- 24. Ongoing work
    - Repeat for other demographic attributes: age, ethnicity
    - Inference in an online setting: which movie ratings should be solicited from a user to quickly infer demographics
    - Design obfuscation mechanisms that provide theoretical guarantees on privacy against any statistical inference mechanism
- 25. Thank You! Intern, post-doc or join our research team @paris or @palo-alto #technicolor #recsys2012 Contact: smriti.bhagat@technicolor.com