This document summarises a procedure for anonymising microdata from official statistics so as to protect privacy while maintaining data utility. It describes the trade-off between disclosure risk and data utility and outlines a multi-step procedure: describing the dataset and its intended users, applying statistical disclosure control (SDC) methods, and measuring disclosure risk and data utility. Recent research shows that a small number of spatio-temporal data points can uniquely identify individuals, and that a socio-demographic fingerprint of as few as three attributes (gender, date of birth, and municipality) can uniquely identify many records. Parameters for SDC methods include the age of the data, the subsample size, and the required levels of anonymity for identifying variables.
Towards a socio-demographic fingerprint (IASSIST 2013)
1. Towards a procedure to anonymise microdata
Anonymising data from official statistics for public use
IASSIST, Köln - 30.05.2013
Katelijne Gysen
katelijne.gysen@fors.unil.ch
2.
Outline
1. Promotion of official statistics
2. Anonymisation of data
2.1 Trade-off: disclosure risk versus data utility
2.2 Procedure
2.3 Parameter setting for Statistical Disclosure Control (SDC)
3. Uniqueness and k-anonymity
3.1 Concepts
3.2 Recent research on mobility data
3.3 The real fingerprint
3.4 Socio-demographic fingerprint
3.
1. Promotion of official statistics
Data from National Statistical Institute (NSI)
Labour Force Survey
Survey on Structure of Earnings
SILC (Survey on Income and Living Conditions)
PISA (Education)
Swiss Health Survey
Population Census and Business Census, …
Micro data for research and teaching purposes
Collaboration with our NSI:
4.
2. Anonymisation of data
2.1 Trade-off dilemma: disclosure risk versus data utility
The balance is between the researcher (data utility) and the data owner (data protection).
5.
2.2 Procedure (1)
Starting from the dataset:
1. Describe the dataset characteristics
2. Define the target public
3. Describe the intrusion scenario
4. Apply SDC methods
5. Measure the disclosure risk and the data utility
6. Is the risk/utility balance acceptable? If so, release the data and describe the access conditions.
6.
2.2 Procedure (2)
The same flow, now with explicit parameter setting:
1. Describe the dataset characteristics
2. Define the target public
3. Describe the intrusion scenario
4. Set the SDC parameters
5. Apply SDC methods
6. SDC parameters met? Check the disclosure risk.
7. Measure the data utility. If the balance is acceptable, release the data and describe the access conditions.
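The iterate-until-balanced loop on this slide can be sketched in Python. This is only an illustration of the control flow, not an actual SDC implementation: `apply_sdc`, `disclosure_risk`, and `anonymise` are invented placeholder names, and the toy SDC method (truncating a postal code) stands in for real recoding or suppression.

```python
from collections import Counter

def apply_sdc(records, level):
    """Placeholder SDC step: coarsen the data more aggressively at
    higher levels by truncating the 'zip' field to fewer digits."""
    digits = max(1, 4 - level)
    return [{**r, "zip": r["zip"][:digits]} for r in records]

def disclosure_risk(records, quasi_ids):
    """A simple risk measure: the share of records that are unique
    on the combination of quasi-identifiers."""
    keys = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    uniques = sum(1 for r in records
                  if keys[tuple(r[q] for q in quasi_ids)] == 1)
    return uniques / len(records)

def anonymise(records, quasi_ids, max_risk):
    """Raise the SDC level until the measured risk is acceptable;
    if no level works, return None (do not release the data)."""
    for level in range(4):
        released = apply_sdc(records, level)
        if disclosure_risk(released, quasi_ids) <= max_risk:
            return released, level
    return None, None

# Toy microdata (invented for illustration).
data = [{"zip": "8001", "age": 34}, {"zip": "8002", "age": 34},
        {"zip": "8001", "age": 51}, {"zip": "8003", "age": 34}]
released, level = anonymise(data, ["zip", "age"], max_risk=0.5)
```

Each extra level trades utility (geographic detail) for protection, which is exactly the balance the flowchart iterates over.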
7.
2.3 Parameter setting for Statistical Disclosure Control (SDC)
1. Age of the data (min.)
2. Subsample (min.)
3. Level of geographical detail (max.)
4. Global and individual risk (max.)
5. Number of indirect identifying variables (max.)
6. Degree of anonymity for socio-demographic characteristics (min.)
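The anonymity requirement behind parameters 5 and 6 can be checked directly with the k-anonymity measure (Sweeney, 2002): count how many records share each combination of indirect identifying variables and take the minimum. A minimal Python sketch; the sample records and variable names are invented for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of records sharing the
    same values on all quasi-identifiers. If k >= K, every record is
    hidden among at least K - 1 others."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers)
                     for r in records)
    return min(groups.values())

# Toy microdata using the socio-demographic fingerprint from the talk.
records = [
    {"gender": "F", "birth_year": 1975, "municipality": "Köln"},
    {"gender": "F", "birth_year": 1975, "municipality": "Köln"},
    {"gender": "M", "birth_year": 1980, "municipality": "Bonn"},
    {"gender": "M", "birth_year": 1980, "municipality": "Bonn"},
]
print(k_anonymity(records, ["gender", "birth_year", "municipality"]))
```

Recoding date of birth to birth year, as above, is one way to raise k; with the full date of birth, many records typically become unique (k = 1).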
9.
3.2 Recent research on mobility data
"… four randomly chosen spatio-temporal points (for example, mobile device pings to antennas) are enough to uniquely identify 95% of the individuals."
The mobility pattern is apparently unique.
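The uniqueness claim can be made concrete with a small simulation: treat each individual as a set of visited (antenna, hour) points and estimate how often a few randomly drawn points from one trace match nobody else. A hedged Python sketch on synthetic traces; the trace sizes and grid are invented, not taken from the de Montjoye et al. study.

```python
import random

def fraction_unique(traces, n_points, trials=200, seed=0):
    """Estimate the share of trials in which n_points randomly chosen
    spatio-temporal points from one person's trace identify that
    person uniquely among all traces."""
    rng = random.Random(seed)
    unique = 0
    for _ in range(trials):
        person = rng.randrange(len(traces))
        # Sort for determinism; sets are not directly sampleable.
        sample = set(rng.sample(sorted(traces[person]), n_points))
        matches = [i for i, t in enumerate(traces) if sample <= t]
        if matches == [person]:
            unique += 1
    return unique / trials

# Synthetic traces: 50 people, ~30 random (antenna, hour) pings each.
rng = random.Random(42)
traces = [{(rng.randrange(20), rng.randrange(24)) for _ in range(30)}
          for _ in range(50)]
print(fraction_unique(traces, n_points=4))
```

Even on this toy grid, four points almost always single a person out, while a single point rarely does; richer real traces are what drive the 95% figure.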
10. 10
3.3 The real fingerprint
“There are as many as 150 ridge characteristics (points) in the average fingerprint.
So how many points must a fingerprint examiner match in order to safely say the
prints are indeed those of a particular suspect?”
The answer is surprising.
“There is no standard number required. …
… In fact, the decision as to whether or not there is a match is left entirely to the
individual examiner. However, individual departments and agencies may have their
own set of standards in place that requires a certain number of points be matched
before making a positive identification.”
Source: http://www.leelofland.com/wordpress/comparing-fingerprints-whats-the-point/
References
de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., Blondel, V.D. Unique in the crowd: the privacy bounds of human mobility. Scientific Reports 3, article 1376, DOI: 10.1038/srep01376, 2013.
Franconi, L. Public Use Files: practices and methods to increase quality of released microdata. OECD, 2012.
Golle, P. Revisiting the uniqueness of simple demographics in the US population. Palo Alto Research Center, 2006.
Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., De Wolf, P.-P. Statistical Disclosure Control. Wiley, 2012.
Sweeney, L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3, Pittsburgh, 2000.
Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5), 2002, 557-570.
Meindl, B., Kowarik, A., Templ, M. Guidelines for the anonymisation of microdata using R-package sdcMicro. Vienna, 2012.
Find out more?
About FORS: www.fors.unil.ch
About public microdata for research in CH: www.compass.unil.ch
Let's connect!
Editor's Notes
25 juin 2013. Good afternoon, everybody. When I got the invitation from our project leader to send in an abstract for this conference, I hesitated at first over whether the subject I am working on would fit. But I have to admit that, arriving here, and being a sociologist, I started to count the word "confidentiality", and now I no longer have this concern. Let's have a look.
25 juin 2013. As an introduction I will briefly talk about the data we work with, and then I will move to the topic: anonymisation of data. First I present the subject as a trade-off, a balancing exercise; then I will show you why this exercise might get complicated, and the procedure we developed to simplify it again. The last point will be about the uniqueness of people, where I would like to introduce the concept of a socio-demographic fingerprint.
25 juin 2013. I'm lucky that this morning's plenary session was about the same kind of data, so you will probably know what I am talking about. Our task is to promote official statistics for research, which in our case means data stemming from the Federal Statistical Office in Neuchâtel. I list a few names to show you what it is about: the first three are European surveys for which Eurostat provides guidelines; PISA is international; then we have a couple of national surveys on subjects that have an equivalent in other countries. It is important to mention that we only work with microdata (rectangular): records and variables. Of course, this is a project in collaboration with our NSI. ------- Why public microdata? (Wirth, H.) The data from the FSO generally offer: large samples (precision), long time series, good quality. Definitions: public data / official data / data stemming from the NSI, collected with public money; open government data; microdata (variables for a record/entity on different characteristics). Anonymising: preventing identification. One can therefore apply SDC techniques, e.g. recoding, suppression of information, or perturbation of information. As an introduction, I will briefly talk about the datasets we are promoting for secondary use at FORS and about the big question that has to be answered when dealing with anonymisation of data; the core of the presentation presents the procedure. FORS is a national centre of expertise in the social sciences. Its primary activities consist of: 1. production of survey data, including national and international surveys; 2. preservation and dissemination of data for use in secondary analysis; 3. research in empirical social sciences, with a focus on survey methodology; 4. consulting services for researchers in Switzerland and abroad. FORS collaborates with researchers and research institutes in the social sciences in Switzerland and internationally.
25 juin 2013. As these data are not just weather data, or data about your opinion on the weather these days, we have to recognise that the confidentiality of the data requires a special approach and treatment. Let's keep it simple first: like all big questions in this world, it is all about a trade-off, about finding a balance. As has been said before, it is finding a balance between data utility and disclosure risk, so there will be some data protection. To what extent? It would not be too difficult to argue where to put the slider, were it not that quite a lot of elements play a role in the decision, and most of the time different players are involved: in our case, our data service (representing researchers) and the NSI.
25 juin 2013. It can get complex, so in order to keep an overview of the whole, we put things into a scheme / procedure; that is what the next slide is about. On the left you will find the different elements indicating where to put that slider. Then you study how a potential intruder might try to disclose information. I do not have time to go into detail, but I will mention two different threats: response knowledge, and the possibility to link the data with other datasets. For the left part we are developing guidelines that will appear as a kind of Checklist for Disclosure Potential. If you see that there is no disclosure risk, because for example the data are accessible only in a safe centre or via remote access, you can make the data accessible. Otherwise, you will have to apply some methods for statistical disclosure control. Check the balance, and you can publish the data. This is nice, but the question remains: how much anonymisation is enough? How much content do I need to have an interesting dataset?
25 juin 2013. This is the reason why we had to extend the scheme with two other boxes: the necessity to agree on thresholds, which I have called SDC parameters. The literature talks about disclosure risk measurement, so we can use those measures. Let's have a look at the parameters we used.
25 juin 2013. It is possible to fix a threshold for each of the following elements. In general: the older the data, the more difficult it is to disclose information; the smaller the subsample, the more difficult to disclose; the less detailed the geography, the more difficult to disclose; the smaller the global and individual risk, the more difficult to disclose; the smaller the number and the fewer the categories of indirect identifying variables, the more difficult to disclose; and the higher the degree of anonymity for the socio-demographic characteristics, the more difficult to disclose. In the next and last part of my presentation I will concentrate on this degree of anonymity for socio-demographic characteristics.
First, some basic concepts. In a microdataset you can divide the variables into those that are identifying and those that are not. Identifying variables are variables that are either rare, observable, or searchable, and we generally distinguish between direct identifiers and indirect identifiers. Some examples: it is common sense that you cannot make data available with direct identifiers, and it is by now also common sense that you have to be careful with indirect identifiers, as they may function as quasi-identifiers. Simply put, if you know that the data you are looking at come from a female, living in Ecublens next to Lausanne, who was born on 23.12.yyyy, that could be me. The next question is: how many statistical look-alikes does she have? Those indirect identifiers can thus be used as a key to disclose information. That is why it is important to describe the degree of anonymity.
25 juin 2013. Just a small jump into the real world. I will just cite this work because it is interesting; the references are at the end of the presentation. Two points identify 50% of the individuals. That is what they call a virtual fingerprint.
25 juin 2013. The real world, and a real fingerprint.
25 juin 2013. Now I come back to one of our parameters: the degree of anonymity for socio-demographic characteristics. As our experience shows that the biggest risk comes from linking our datasets with datasets containing socio-demographic characteristics, we concentrate on obtaining knowledge about the uniqueness of people on those characteristics. We started by looking at gender, age, and location, and then extended this with civil status and nationality.
25 juin 2013. Some figures. Here you find the anonymity of the Swiss population, given some simple demographics.
25 juin 2013. FORS. I will now present the work of our COMPASS team. Let us start at the beginning: there is the Federal Statistical Office in Neuchâtel, and there are universities, universities of applied sciences, and colleges spread across Switzerland. The Federal Statistical Office collects and processes data; when the results are published, they are mostly tables. But the office also holds the datasets themselves, and that is what this is about: datasets are data treasures. The universities have researchers and students; they do research and teach. One could almost assume a priori that there is an interest in these datasets (secondary analysis). One would think: fine, we simply call and ask for a dataset. But where should I call, whom should I ask, and which data should I request? Which are best suited for my research or teaching purposes? There is no department or contact point at the FSO whose task it is to provide an overview of everything. That is the gap we want to fill. Of course, I have allowed myself to present the whole situation in simplified form. I will now go into somewhat more detail.