(Ab)using Identifiers: Indiscernibility of Identity


Published on

Ben Gross at BayCHI November 10, 2009

Published in: Education, Technology
1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

(Ab)using Identifiers: Indiscernibility of Identity

  1. 1. (Ab)using Identifiers @ Ben Gross BayCHI 2009-11-10 University of Illinois Urbana Champaign Library and Information Science bgross@acm.org http://bengross.com/ @
  2. 2. @
  3. 3. Why I am interested @ bengross@gmail.com bgross@uiuc.edu @bengross bgross@acm.org bgross@bgross.com http://bengross.com http://flickr.com/bengross bgross@ischool.berkeley.edu http://facebook.com/bengross bgross@messagingnews.com @
  4. 4. How many @ Email addresses Web site logins Instant Social network messenger IDs profiles Domain names Phone numbers Do you have? @
  5. 5. All your @’s are belong to us @
  6. 6. Why you might care •Usability implications •Productivity implications •Security implications •Employee satisfaction @
  7. 7. How did I get here? •“I only have one email address...” •“Well, except that one I only use for...” •“And that other one I use with...” @
  8. 8. Half a million users “... average user has 6.5 passwords, each of which is shared across 3.9 different sites. Each user has about 25 accounts that require passwords, and types an average of 8 passwords per day.” Dinei Florêncio and Cormac Herley. A Large- Scale Study of Web Password Habits. WWW ’07 @
  9. 9. Population •Qualitative in-depth interview study •44 people across two Bay Area firms •Financial services firm (regulated) •Design firm (unregulated) • @
  10. 10. Data • Financial services • Average # of email addresses = 1.8 min 1 / max 4. IM = 1.8 min 1 / max 4 • Design Firm • Average # of email addresses = 3.6 min 1 / max 10 IM = 1.7 min 1 / max 3 • Combined total • Average = 3.3 @
  11. 11. “The individual in ordinary work situations presents himself and his activity to others, the ways in which he guides and controls the impression they form of him and the kinds of things he may and may not do while sustaining his performance before them.” Erving Goffman Presentation of Self in Everyday Life, 1959. @
  12. 12. Why more than one? @
  13. 13. Social factors •“I knew that my college one wasn't forever, so I wanted something more permanent after I graduated.” •“...I didn't like the name that I picked when it was my first email.” •“...you just say oh my first name and last name at gmail.com ... something easy to remember.” @
  14. 14. Technical factors •Namespace saturation AKA the jimsm1th77@hotmail.com problem •Firewalls and VPNs AKA “They don’t let me use Hotmail at work...” •Configuration problems AKA “What does SMTP-AUTH with MD5 checksums on port 567 mean?” @
  15. 15. Regulatory factors @
  16. 16. It’s Just Data... “We’re an information economy. They teach you that in school. What they don't tell you is that it's impossible to move, to live, to operate at any level without leaving traces, bits, seemingly meaningless fragments that can be retrieved amplified...” William Gibson Johnny Mnemonic @
  17. 17. What’s Underneath? •Developer Tools •FireBug/FireCookie •Safari Web Inspector •Charles Proxy/HTTP Analyzer •Forensic Tools @
  18. 18. Cookies @
  19. 19. More detail @
  20. 20. Bake Your Own @
  21. 21. Managing Flash Cookies http://www.macromedia.com/support/ documentation/en/flashplayer/help/ settings_manager07.html @
  22. 22. Referer (sic) •adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - [10/Nov/2009:14:50:56 -0800] "GET / wireless.html HTTP/1.1" 200 29149 "http://bengross.com/voip.html" "Mozilla/ 5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9" @
  23. 23. Leaky Headers On the Leakage of Personally Identifiable Information Via Online Social Networks Balachander Krishnamurthy and Craig Wills @
  24. 24. More Options •URL Munging and Session IDs in URL •Flash Cookies/Local Shared Object •Silverlight Cookies •Virtual Page Views, Event (Google Analytics) User Defined Values @
  25. 25. Synthetic IDs •Everything in the Referer header can be used to for a synthetic identifier. •The User Agent is a good source •IP addresses if you have them •Screen dimensions, user agent •Hash of IP address/remote ports @
  26. 26. Other Sources of Bits •Last Modified and ETag headers •HTTP Keepalive •SSL Session IDs •TCP Timestamps @
  27. 27. The Art of Being Lost •“We do not collect personal contact information from visitors to your website. Personal contact information means billing address, physical address, individual name, email address, etc.” (OpenTracker.com) @
  28. 28. Netflix Data Released •Dataset contains 100,480,507 movie ratings, created by 480,189 Netflix subscribers between December 1999 and December 2005. •“...all customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy...” •No unique identifiers or quasi-identifiers @
  29. 29. You Only Need Two •Robust De-anonymization of Large Sparse Datasets by Arvind Narayanan and Vitaly Shmatikov •IMBD as a source of entropy •“With 8 movie ratings (of which 2 may be completely wrong) and dates that may have a 14-day error, 99% of records can be uniquely identified in the dataset.” @
  30. 30. It comes down to this “Q: If you don't publicly rate movies on IMDb and similar forums, there is nothing to worry about. A: ...you should not ever mention any movies you watched prior to 2005 on a public blog or website. Everybody who was a Netflix subscriber prior to 2005 should restrain themselves from these activities... We do not think this is a feasible privacy policy.” FAQ “How to Break Anonymity of the Netflix Prize Dataset” @
  31. 31. Guessing Your SSN •Predicting Social Security Numbers from Public Data by Alessandro Acquisti and Ralph Gross •...I’ll just need the last 4 of your SSN for verification purposes... •“...we accurately predicted the first 5 digits of 2% of California records with 1980 birthdays, and 90% of Vermont records with 1995 birthdays.” @
  32. 32. Disclosure and UI •“Facebook Beacon is a way for you to bring actions you take online into Facebook. Beacon works by allowing affiliate websites to send stories about actions you take to Facebook.” •Launched November 2007 •Class action lawsuit August 2008 •Shut down September 2009 @
  33. 33. Opt Out: First Try @
  34. 34. Opt Out: Second Try @
  35. 35. Evasion •Ghostery •Opt Out Tools •Ad Blockers/Flash Blockers •HTTP Cookie/LSO Managers •Header Modification Tools •Proxies/Tor @
  36. 36. @
  37. 37. @
  38. 38. @
  39. 39. @
  40. 40. What’s Next? •Geolocation •Roll up for more large collections •More of addition bits need for de- anonymization available via social networks @
  41. 41. @ Ben Gross University of Illinois Urbana Champaign Library and Information Science bgross@acm.org http://bengross.com/ @