(Ab)using Identifiers: Indiscernibility of Identity
1. (Ab)using Identifiers
@ Ben Gross
BayCHI
2009-11-10
University of Illinois Urbana Champaign
Library and Information Science
bgross@acm.org
http://bengross.com/ @
6. Why you might care
• Usability implications
• Productivity implications
• Security implications
• Employee satisfaction
@
7. How did I get here?
• “I only have one email address...”
• “Well, except that one I only use for...”
• “And that other one I use with...”
@
8. Half a million users
“... average user has 6.5 passwords, each of
which is shared across 3.9 different sites.
Each user has about 25 accounts that require
passwords, and types an average of 8
passwords per day.”
Dinei Florêncio and Cormac Herley. A Large-Scale Study of Web Password Habits. WWW ’07
@
10. Data
• Financial services
  Average # of email addresses = 1.8 (min 1 / max 4)
  IM = 1.8 (min 1 / max 4)
• Design Firm
  Average # of email addresses = 3.6 (min 1 / max 10)
  IM = 1.7 (min 1 / max 3)
• Combined total
  Average = 3.3
@
11. “The individual in ordinary work situations
presents himself and his activity to others, the
ways in which he guides and controls the
impression they form of him and the kinds of
things he may and may not do while sustaining
his performance before them.”
Erving Goffman
The Presentation of Self in Everyday Life, 1959.
@
13. Social factors
• “I knew that my college one wasn't
forever, so I wanted something more
permanent after I graduated.”
• “...I didn't like the name that I
picked when it was my first email.”
• “...you just say oh my first name and
last name at gmail.com ... something
easy to remember.”
@
14. Technical factors
• Namespace saturation AKA the
jimsm1th77@hotmail.com problem
• Firewalls and VPNs AKA “They
don’t let me use Hotmail at work...”
• Configuration problems AKA “What
does SMTP-AUTH with MD5
checksums on port 567 mean?”
@
16. It’s Just Data...
“We’re an information economy. They
teach you that in school. What they don't
tell you is that it's impossible to move, to
live, to operate at any level without leaving
traces, bits, seemingly meaningless
fragments that can be retrieved,
amplified...”
William Gibson, Johnny Mnemonic
@
21. Managing Flash Cookies
http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager07.html @
22. Referer (sic)
• adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - [10/Nov/2009:14:50:56 -0800] "GET /wireless.html HTTP/1.1" 200 29149 "http://bengross.com/voip.html" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"
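The log entry on this slide looks like the common "combined" access log layout (host, identity, user, timestamp, request, status, size, referer, user agent). As a rough sketch under that assumption, the fields a site operator sees for every request can be pulled apart as follows. The pattern and the function name `parse_log_line` are illustrative, not taken from the talk.

```python
import re

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# The entry shown on the slide, rejoined into a single line.
line = (
    'adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - '
    '[10/Nov/2009:14:50:56 -0800] "GET /wireless.html HTTP/1.1" 200 29149 '
    '"http://bengross.com/voip.html" '
    '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) '
    'AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"'
)

entry = parse_log_line(line)
print(entry["referer"])  # the page the visitor came from
print(entry["agent"])    # browser, OS version, and locale
```

Even without cookies, this one line reveals the previous page visited, the browser build, the operating system version, the locale, and a hostname that appears to embed the visitor's IP address.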
@
23. Leaky Headers
On the Leakage of Personally Identifiable
Information Via Online Social Networks
Balachander Krishnamurthy and Craig Wills
@
24. More Options
• URL Munging and Session IDs in URL
• Flash Cookies/Local Shared Object
• Silverlight Cookies
• Virtual Page Views, Event (Google
Analytics) User Defined Values
@
25. Synthetic IDs
• Everything in the Referer header can
be used to form a synthetic identifier.
• The User Agent is a good source
• IP addresses if you have them
• Screen dimensions, user agent
• Hash of IP address/remote ports
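One way to picture a synthetic identifier is as a hash over whatever attributes the server can observe passively. The sketch below is only an illustration of that idea: the function name `synthetic_id`, the choice of SHA-256, the truncation to 16 hex characters, and the sample values are all assumptions, not something the talk specifies.

```python
import hashlib

def synthetic_id(ip, user_agent, screen):
    """Combine passively observed attributes into one stable identifier."""
    # A separator keeps ("ab", "c") from colliding with ("a", "bc").
    material = "\x1f".join([ip, user_agent, screen])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return digest[:16]  # shortened for readability; an arbitrary choice

# Hypothetical visitor attributes (192.0.2.10 is a documentation address).
agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us)"
first_visit = synthetic_id("192.0.2.10", agent, "1440x900")
second_visit = synthetic_id("192.0.2.10", agent, "1440x900")
other_visitor = synthetic_id("192.0.2.10", agent, "1024x768")

print(first_visit == second_visit)   # same attributes, same identifier
print(first_visit == other_visitor)  # one differing attribute changes it
```

No cookie is set and nothing is stored on the client, yet the same visitor maps to the same value on every request for as long as the attributes stay stable.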
@
26. Other Sources of Bits
• Last-Modified and ETag headers
• HTTP Keep-Alive
• SSL Session IDs
• TCP Timestamps
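Cache validators such as ETag can carry identifying bits because the browser echoes the value back in `If-None-Match` on the next request. The toy class below sketches that server-side logic with plain dictionaries standing in for HTTP headers. The class name `ETagTracker` and the token format are made up for illustration, and no real web framework is involved.

```python
import secrets

class ETagTracker:
    """Toy server logic that uses the ETag validator like a cookie."""

    def __init__(self):
        self.visits = {}  # ETag value -> number of requests seen

    def handle(self, request_headers):
        """Return (status, response_headers) for one request."""
        etag = request_headers.get("If-None-Match")
        if etag in self.visits:
            # Returning visitor: the cached validator identifies them.
            self.visits[etag] += 1
            return 304, {"ETag": etag}
        # New visitor: issue a unique validator the browser will cache.
        etag = '"' + secrets.token_hex(8) + '"'
        self.visits[etag] = 1
        return 200, {"ETag": etag}

tracker = ETagTracker()
status1, headers1 = tracker.handle({})
status2, headers2 = tracker.handle({"If-None-Match": headers1["ETag"]})
print(status1, status2)
```

The second request is answered with 304 Not Modified, so the browser keeps its cached copy along with the unique tag, and the server has linked the two visits.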
@
27. The Art of Being Lost
• “We do not collect personal contact
information from visitors to your
website. Personal contact information
means billing address, physical
address, individual name, email
address, etc.” (OpenTracker.com)
@
28. Netflix Data Released
• Dataset contains 100,480,507 movie
ratings, created by 480,189 Netflix
subscribers between December 1999 and
December 2005.
• “...all customer identifying information
has been removed; all that remains are
ratings and dates. This follows our
privacy policy...”
• No unique identifiers or quasi-identifiers
@
29. You Only Need Two
• Robust De-anonymization of Large Sparse
Datasets by Arvind Narayanan and Vitaly
Shmatikov
• IMDb as a source of entropy
• “With 8 movie ratings (of which 2 may be
completely wrong) and dates that may have
a 14-day error, 99% of records can be
uniquely identified in the dataset.”
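The quoted result can be illustrated with a much simpler procedure than the one in the paper: compare a handful of publicly known ratings against every "anonymous" record, tolerating date errors and a few wrong entries, and see how many records remain. The sketch below is a simplified tolerance match, not the authors' scoring algorithm, and all records, names, and thresholds in it are made up.

```python
from datetime import date

def matches(record, aux, max_days=14, max_wrong=2):
    """True if the auxiliary ratings are consistent with one record."""
    wrong = 0
    for movie, (rating, when) in aux.items():
        known = record.get(movie)
        if (known is None or known[0] != rating
                or abs((known[1] - when).days) > max_days):
            wrong += 1  # missing movie, different rating, or date too far off
    return wrong <= max_wrong

def candidates(dataset, aux, **kwargs):
    """Return the IDs of all records consistent with the auxiliary data."""
    return [rid for rid, rec in dataset.items() if matches(rec, aux, **kwargs)]

# Made-up "anonymized" records: movie -> (rating, date rated).
dataset = {
    "user_a": {"m1": (5, date(2004, 3, 1)), "m2": (3, date(2004, 3, 9)),
               "m3": (4, date(2004, 5, 20))},
    "user_b": {"m1": (5, date(2004, 3, 2)), "m2": (1, date(2005, 1, 15)),
               "m4": (2, date(2004, 7, 7))},
}
# Made-up public ratings (for example, from a review site) with imprecise dates.
aux = {"m1": (5, date(2004, 3, 5)), "m2": (3, date(2004, 3, 15)),
       "m3": (4, date(2004, 6, 1))}

print(candidates(dataset, aux, max_wrong=1))
```

With only three known ratings, the toy data already narrows to a single record. Sparse datasets behave this way because few people share the same combination of rarely rated items and dates.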
@
30. It comes down to this
“Q: If you don't publicly rate movies on IMDb and similar
forums, there is nothing to worry about.
A: ...you should not ever mention any movies you
watched prior to 2005 on a public blog or website.
Everybody who was a Netflix subscriber prior to 2005
should restrain themselves from these activities...
We do not think this is a feasible privacy policy.”
FAQ
“How to Break Anonymity of the Netflix Prize Dataset”
@
31. Guessing Your SSN
• Predicting Social Security Numbers
from Public Data by Alessandro Acquisti
and Ralph Gross
• ...I’ll just need the last 4 of your SSN for
verification purposes...
• “...we accurately predicted the first 5
digits of 2% of California records with
1980 birthdays, and 90% of Vermont
records with 1995 birthdays.”
@
32. Disclosure and UI
• “Facebook Beacon is a way for you to
bring actions you take online into
Facebook. Beacon works by allowing
affiliate websites to send stories about
actions you take to Facebook.”
• Launched November 2007
• Class action lawsuit August 2008
• Shut down September 2009
@