Towards a procedure to anonymise micro data
Anonymising data from official statistics for public use
IASSIST, Köln - 30.05.2013
Katelijne Gysen
katelijne.gysen@fors.unil.ch
Outline
1. Promotion of official statistics
2. Anonymisation of data
2.1 Trade-off: disclosure risk versus data utility
2.2 Procedure
2.3 Parameter setting for Statistical Disclosure Control (SDC)
3. Uniqueness and k-anonymity
3.1 Concepts
3.2 Recent research on mobility data
3.3 The real fingerprint
3.4 Socio-demographic fingerprint
1. Promotion of official statistics
• Data from the National Statistical Institute (NSI):
  • Labour Force Survey
  • Survey on Structure of Earnings
  • SILC (Survey on Income and Living Conditions)
  • PISA (Education)
  • Swiss Health Survey
  • Population Census and Business Census, …
• Micro data for research and teaching purposes
Collaboration with our NSI, the Swiss Federal Statistical Office (FSO)
2. Anonymisation of data
2.1 Trade-off dilemma: disclosure risk versus data utility
Researcher (data utility) versus data owner (data protection)
2.2 Procedure (1)
Dataset
1. Describe the dataset characteristics
2. Define the target public
3. Describe the access conditions
4. Describe the intrusion scenario
5. Apply SDC methods
6. Measure disclosure risk and data utility
7. Risk/utility balance? If so, release the data
2.2 Procedure (2)
Dataset
1. Describe the dataset characteristics
2. Define the target public
3. Describe the access conditions
4. Describe the intrusion scenario
5. Set SDC parameters
6. Apply SDC methods
7. Disclosure risk: SDC parameters met?
8. Measure data utility
9. Release data
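The parameter-driven procedure can be sketched as a loop: fix the SDC parameters first, apply SDC methods step by step until the parameters are met, and only then measure data utility. This is a minimal illustration under stated assumptions: the only parameter checked is a minimum group size k, the recoding steps and the `utility` measure (share of unchanged values) are hypothetical, and the toy records are invented.

```python
from collections import Counter

def min_class_size(rows, key_vars):
    """Size of the smallest group of records sharing the same key-variable values."""
    return min(Counter(tuple(r[v] for v in key_vars) for r in rows).values())

def anonymise(rows, key_vars, recodings, k):
    """Apply hypothetical recoding steps in order until every key combination
    is shared by at least k records; return the protected rows and the number
    of steps used, or raise if the steps run out first."""
    steps = 0
    while min_class_size(rows, key_vars) < k:
        if steps == len(recodings):
            raise ValueError("SDC parameters not met with the available methods")
        rows = [recodings[steps](r) for r in rows]
        steps += 1
    return rows, steps

def utility(original, protected, var):
    """Crude utility measure: share of values of `var` left unchanged."""
    same = sum(o[var] == p[var] for o, p in zip(original, protected))
    return same / len(original)

# Hypothetical toy data and recoding steps.
rows = [
    {"age": 34, "geo": "Lausanne"},
    {"age": 37, "geo": "Lausanne"},
    {"age": 52, "geo": "Renens"},
    {"age": 55, "geo": "Lausanne"},
]
recodings = [
    lambda r: {**r, "age": r["age"] // 10 * 10},  # step 1: 10-year age bands
    lambda r: {**r, "geo": "VD"},                 # step 2: municipality -> canton
]

protected, steps = anonymise(rows, ["age", "geo"], recodings, k=2)
print(steps)                            # 2
print(utility(rows, protected, "age"))  # 0.0
```

Tightening the parameter (for example k=3) makes the loop raise on this toy data, which corresponds to going back to the choice of SDC methods.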
2.3 Parameter setting for Statistical Disclosure Control (SDC)
1. Age of the data (min.)
2. Subsample (min.)
3. Level of geographical detail (max.)
4. Global and individual risk (max.)
5. Number of indirect identifying variables (max.)
6. Degree of anonymity for socio-demographic characteristics (min.)
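Several of these parameters can be expressed as automatic checks on a candidate release file. The sketch below covers the age of the data, the number of key variables, individual and global risk, and the minimum group size (taken here as the degree of anonymity); subsampling and geographical detail are left out. The thresholds, names and data are hypothetical, and the risk measure is a naive frequency-based one, not the model-based estimators found in dedicated SDC software such as sdcMicro.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class SdcParameters:
    """Hypothetical release thresholds; real values are set by the data owner."""
    min_age_years: int          # 1. age of the data (min.)
    max_key_variables: int      # 5. number of indirect identifying variables (max.)
    max_individual_risk: float  # 4. individual risk (max.)
    max_global_risk: float      # 4. global risk (max.)
    min_k: int                  # 6. degree of anonymity, read as a minimum group size

def key_counts(rows, key_vars):
    """Number of records sharing each combination of key-variable values."""
    return Counter(tuple(r[v] for v in key_vars) for r in rows)

def parameters_met(rows, key_vars, data_age_years, p):
    """Check a candidate release against the thresholds in p."""
    counts = key_counts(rows, key_vars)
    # Naive individual risk 1/f, with f the sample frequency of the record's key.
    # Population frequencies are at least as large, so for a subsample this
    # overstates the risk.
    risks = [1 / counts[tuple(r[v] for v in key_vars)] for r in rows]
    global_risk = sum(risks) / len(rows)  # expected share of re-identified records
    checks = {
        "age of the data": data_age_years >= p.min_age_years,
        "number of key variables": len(key_vars) <= p.max_key_variables,
        "individual risk": max(risks) <= p.max_individual_risk,
        "global risk": global_risk <= p.max_global_risk,
        "k-anonymity": min(counts.values()) >= p.min_k,
    }
    return all(checks.values()), checks

# Hypothetical candidate file: 5 women and 5 men, all in the same region.
rows = [{"gender": g, "region": "VD"} for g in "FFFFFMMMMM"]
params = SdcParameters(min_age_years=5, max_key_variables=5,
                       max_individual_risk=0.25, max_global_risk=0.3, min_k=3)
ok, detail = parameters_met(rows, ["gender", "region"], data_age_years=10, p=params)
print(ok)  # True
```

Returning the individual checks alongside the overall verdict shows which parameter blocks a release, which is what decides the next SDC step.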
3. Uniqueness and k-anonymity
3.1 Concepts
Micro data contain:
• identifying variables (rare, observable, searchable)
• non-identifying variables
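To make the concepts concrete: a record is unique when no other record shares its combination of values on the identifying (key) variables, and a dataset is k-anonymous (Sweeney, 2002) when every such combination is shared by at least k records. The following sketch checks both on invented toy records; the function names and the choice of `gender` and `yob` as key variables are illustrative assumptions.

```python
from collections import Counter

def class_sizes(records, key_vars):
    """Count how many records share each combination of key-variable values."""
    return Counter(tuple(r[v] for v in key_vars) for r in records)

def uniques(records, key_vars):
    """Records whose key combination occurs exactly once (sample uniques)."""
    sizes = class_sizes(records, key_vars)
    return [r for r in records if sizes[tuple(r[v] for v in key_vars)] == 1]

def is_k_anonymous(records, key_vars, k):
    """True if every key combination is shared by at least k records."""
    return all(n >= k for n in class_sizes(records, key_vars).values())

# Hypothetical toy records: 'gender' and 'yob' act as key variables,
# 'income' is a non-identifying variable.
records = [
    {"gender": "F", "yob": 1970, "income": 52000},
    {"gender": "F", "yob": 1970, "income": 61000},
    {"gender": "M", "yob": 1985, "income": 48000},
    {"gender": "M", "yob": 1985, "income": 45000},
    {"gender": "M", "yob": 1990, "income": 39000},
]

print(len(uniques(records, ["gender", "yob"])))       # 1
print(is_k_anonymous(records, ["gender", "yob"], 2))  # False
print(is_k_anonymous(records, ["gender"], 2))         # True
```

Dropping `yob` from the key makes the file 2-anonymous, which illustrates why the number of indirect identifying variables is itself an SDC parameter.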
3.2 Recent research on mobility data
“… four, randomly chosen ‘spatio-temporal points’ (for example, mobile device pings to antennas) is enough to uniquely identify 95% of the individuals” (de Montjoye et al., 2013).
Individual mobility patterns are apparently unique.
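The uniqueness result can be illustrated with a small simulation in the spirit of de Montjoye et al. (2013): draw p points from one person's trace and check whether any other trace contains all of them. The sketch below uses invented toy traces of (antenna, hour) pairs; the data, function names and trial count are assumptions, and it does not reproduce the paper's figures.

```python
import random

# Hypothetical toy traces: each user is a set of (antenna_id, hour) points.
traces = {
    "u1": {("A", 8), ("B", 9), ("C", 18), ("A", 20)},
    "u2": {("A", 8), ("B", 9), ("D", 18), ("A", 21)},
    "u3": {("E", 7), ("B", 9), ("C", 18), ("F", 22)},
}

def is_unique_given_points(user, traces, p, rng):
    """Draw p points from the user's own trace and check whether the user is
    the only one whose trace contains all of them."""
    points = rng.sample(sorted(traces[user]), p)
    matches = [u for u, t in traces.items() if all(pt in t for pt in points)]
    return len(matches) == 1  # the user always matches their own points

def unicity(traces, p, trials=1000, seed=0):
    """Share of trials in which p random points single out a random user."""
    rng = random.Random(seed)
    users = sorted(traces)
    hits = sum(is_unique_given_points(rng.choice(users), traces, p, rng)
               for _ in range(trials))
    return hits / trials

print(unicity(traces, 1))  # a single point often matches several users
print(unicity(traces, 4))  # 1.0: each full toy trace is unique
```

`p` must not exceed the length of the shortest trace, otherwise `random.Random.sample` raises a `ValueError`.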
3.3 The real fingerprint
“There are as many as 150 ridge characteristics (points) in the average fingerprint.
So how many points must a fingerprint examiner match in order to safely say the
prints are indeed those of a particular suspect?”
The answer is surprising.
“There is no standard number required. …
… In fact, the decision as to whether or not there is a match is left entirely to the
individual examiner. However, individual departments and agencies may have their
own set of standards in place that requires a certain number of points be matched
before making a positive identification.”
Source: http://www.leelofland.com/wordpress/comparing-fingerprints-whats-the-point/
3.4 The socio-demographic fingerprint
• Gender
• Date of birth
• Municipality
• Civil status
• Nationality
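Each characteristic above narrows the set of people who share a profile, and their combination acts as a fingerprint. The following sketch, on an invented five-person population, shows how one SDC method, generalising date of birth to year of birth (global recoding), lowers the share of unique fingerprints; the names and data are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical toy population; real figures come from a register such as STATPOP.
people = [
    {"gender": "F", "dob": date(1970, 3, 14), "municipality": "Lausanne"},
    {"gender": "F", "dob": date(1970, 11, 2), "municipality": "Lausanne"},
    {"gender": "M", "dob": date(1985, 6, 30), "municipality": "Lausanne"},
    {"gender": "M", "dob": date(1985, 1, 5), "municipality": "Lausanne"},
    {"gender": "M", "dob": date(1985, 1, 5), "municipality": "Bern"},
]

def share_unique(rows, key):
    """Percentage of rows whose fingerprint key(row) is shared by nobody else."""
    counts = Counter(key(r) for r in rows)
    return 100 * sum(counts[key(r)] == 1 for r in rows) / len(rows)

def with_dob(r):
    # Fingerprint: gender * full date of birth * municipality.
    return (r["gender"], r["dob"], r["municipality"])

def with_yob(r):
    # Global recoding: generalise the full date of birth to the year of birth.
    return (r["gender"], r["dob"].year, r["municipality"])

print(share_unique(people, with_dob))  # 100.0
print(share_unique(people, with_yob))  # 20.0
```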
3.4 The socio-demographic fingerprint (2)
Source: STATPOP 2010, BFS.
k-anonymity: share of the population (%) whose combination of values is shared by at most k persons
DOB = date of birth, YOB = year of birth

Key variables                                             k=1    k=2    k=5    k=20   k=100  k=1000
Gender * DOB * Municipality                               74     86.9   95.3   100    100    100
Gender * YOB * Municipality                               0.7    1.9    6.3    27.6   68.3   92.1
Gender * YOB * Civil status * Municipality                3.2    6.4    14.9   41.5   77.9   96.6
Gender * YOB * Nationality * Municipality                 7.9    12.9   21.3   47.1   82     97.1
Gender * YOB * Civil status * Nationality * Municipality  12     18.6   31.1   59.6   87.4   98.9

Anonymity of the Swiss population given simple socio-demographics
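A table like this one can be computed from register micro data by counting, for each person, how many people share the same combination of key variables. The sketch below assumes the reading that each cell is the percentage of persons whose combination is shared by at most k persons; the function name and toy rows are hypothetical, and this is not the procedure used to produce the STATPOP figures.

```python
from collections import Counter

def anonymity_profile(rows, key_vars, ks=(1, 2, 5, 20, 100, 1000)):
    """For each k, percentage of rows whose key combination is shared by at most k rows."""
    sizes = Counter(tuple(r[v] for v in key_vars) for r in rows)
    per_row = [sizes[tuple(r[v] for v in key_vars)] for r in rows]
    n = len(rows)
    return {k: round(100 * sum(s <= k for s in per_row) / n, 1) for k in ks}

# Hypothetical toy population: one unique person, one pair, one group of seven.
rows = (
    [{"gender": "F", "yob": 1950}]
    + [{"gender": "M", "yob": 1960}] * 2
    + [{"gender": "F", "yob": 1980}] * 7
)
print(anonymity_profile(rows, ["gender", "yob"]))
# {1: 10.0, 2: 30.0, 5: 30.0, 20: 100.0, 100: 100.0, 1000: 100.0}
```

Under this reading, the k=1 column gives the share of population uniques, and each row rises to 100 once k exceeds the largest group size.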
References
• de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M., Blondel, V.D. Unique in the crowd: the privacy bounds of human mobility. Scientific Reports 3, article 1376, 2013. DOI: 10.1038/srep01376.
• Franconi, L. Public Use Files: practices and methods to increase quality of released microdata. OECD, 2012.
• Golle, P. Revisiting the uniqueness of simple demographics in the US population. Palo Alto Research Center, 2006.
• Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K., de Wolf, P.-P. Statistical Disclosure Control. Wiley, 2012.
• Meindl, B., Kowarik, A., Templ, M. Guidelines for the anonymisation of microdata using R-package sdcMicro. Vienna, 2012.
• Sweeney, L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh, 2000.
• Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10 (5), 2002, 557-570.
Find out more?
about FORS: www.fors.unil.ch
about public microdata for research in CH: www.compass.unil.ch
Let’s connect!

More Related Content

Viewers also liked

Repco home finance q4 fy13 earnings
Repco home finance q4 fy13 earningsRepco home finance q4 fy13 earnings
Repco home finance q4 fy13 earnings
Purv
 
Mr isafety overview_anaesthetists_2013_edit_4_online
Mr isafety overview_anaesthetists_2013_edit_4_onlineMr isafety overview_anaesthetists_2013_edit_4_online
Mr isafety overview_anaesthetists_2013_edit_4_online
mriphysics
 
Intel proccessor manufacturing
Intel proccessor manufacturingIntel proccessor manufacturing
Intel proccessor manufacturingTirtha Mal
 
Ethical hacking Book Review
Ethical hacking Book ReviewEthical hacking Book Review
Ethical hacking Book Review
Tirtha Mal
 
Fleet management system
Fleet management systemFleet management system
Fleet management system
Tirtha Mal
 
Leadership Style of the richest Indian Mukesh ambani
Leadership Style of the richest Indian Mukesh ambaniLeadership Style of the richest Indian Mukesh ambani
Leadership Style of the richest Indian Mukesh ambani
Tirtha Mal
 
Communication ppt
Communication pptCommunication ppt
Communication pptTirtha Mal
 

Viewers also liked (7)

Repco home finance q4 fy13 earnings
Repco home finance q4 fy13 earningsRepco home finance q4 fy13 earnings
Repco home finance q4 fy13 earnings
 
Mr isafety overview_anaesthetists_2013_edit_4_online
Mr isafety overview_anaesthetists_2013_edit_4_onlineMr isafety overview_anaesthetists_2013_edit_4_online
Mr isafety overview_anaesthetists_2013_edit_4_online
 
Intel proccessor manufacturing
Intel proccessor manufacturingIntel proccessor manufacturing
Intel proccessor manufacturing
 
Ethical hacking Book Review
Ethical hacking Book ReviewEthical hacking Book Review
Ethical hacking Book Review
 
Fleet management system
Fleet management systemFleet management system
Fleet management system
 
Leadership Style of the richest Indian Mukesh ambani
Leadership Style of the richest Indian Mukesh ambaniLeadership Style of the richest Indian Mukesh ambani
Leadership Style of the richest Indian Mukesh ambani
 
Communication ppt
Communication pptCommunication ppt
Communication ppt
 

Similar to Towards a socio demographic fingerprint ch-iassist 2013

What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin
eraser Juan José Calderón
 
Intelligence Analysis
Intelligence AnalysisIntelligence Analysis
Intelligence Analysis
Nicolae Sfetcu
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Academia Sinica
 
Big data for development
Big data for development Big data for development
Big data for development
Junaid Qadir
 
Altman - Perfectly Anonymous Data is Perfectly Useless Data
Altman - Perfectly Anonymous Data is Perfectly Useless DataAltman - Perfectly Anonymous Data is Perfectly Useless Data
Altman - Perfectly Anonymous Data is Perfectly Useless Data
National Information Standards Organization (NISO)
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Symeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
Eleftherios Spyromitros-Xioufis
 
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
Tony Vilchez Yarihuaman
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data science
University of Washington
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTP
Micah Altman
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Amit Sheth
 
"Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective""Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective"
Micah Altman
 
WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011Vincent Ducrey
 
IIR 2017, Lugano Switzerland
IIR 2017, Lugano SwitzerlandIIR 2017, Lugano Switzerland
IIR 2017, Lugano Switzerland
Marco Polignano
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics Perspective
Micah Altman
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspective
Micah Altman
 
DATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
DATA & PRIVACY PROTECTION Anna Monreale Università di PisaDATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
DATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
Laboratorio di Cultura Digitale, labcd.humnet.unipi.it
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
Micah Altman
 
Micah Altman NISO privacy in library systems
Micah Altman NISO privacy in library systemsMicah Altman NISO privacy in library systems
Micah Altman NISO privacy in library systems
National Information Standards Organization (NISO)
 

Similar to Towards a socio demographic fingerprint ch-iassist 2013 (20)

What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin What Data Can Do: A Typology of Mechanisms . Angèle Christin
What Data Can Do: A Typology of Mechanisms . Angèle Christin
 
Intelligence Analysis
Intelligence AnalysisIntelligence Analysis
Intelligence Analysis
 
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...Computational Social Science:The Collaborative Futures of Big Data, Computer ...
Computational Social Science:The Collaborative Futures of Big Data, Computer ...
 
Big data for development
Big data for development Big data for development
Big data for development
 
asi_22876_Rev
asi_22876_Revasi_22876_Rev
asi_22876_Rev
 
Altman - Perfectly Anonymous Data is Perfectly Useless Data
Altman - Perfectly Anonymous Data is Perfectly Useless DataAltman - Perfectly Anonymous Data is Perfectly Useless Data
Altman - Perfectly Anonymous Data is Perfectly Useless Data
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
A bluetooth-low-energy-dataset-for-the-analysis-of-social-inte 2020-data-in-
 
Data Responsibly: The next decade of data science
Data Responsibly: The next decade of data scienceData Responsibly: The next decade of data science
Data Responsibly: The next decade of data science
 
Big Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTPBig Data & Privacy -- Response to White House OSTP
Big Data & Privacy -- Response to White House OSTP
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
 
"Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective""Reproducibility from the Informatics Perspective"
"Reproducibility from the Informatics Perspective"
 
WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011WEF - Personal Data New Asset Report2011
WEF - Personal Data New Asset Report2011
 
IIR 2017, Lugano Switzerland
IIR 2017, Lugano SwitzerlandIIR 2017, Lugano Switzerland
IIR 2017, Lugano Switzerland
 
Scientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics PerspectiveScientific Reproducibility from an Informatics Perspective
Scientific Reproducibility from an Informatics Perspective
 
Reproducibility from an infomatics perspective
Reproducibility from an infomatics perspectiveReproducibility from an infomatics perspective
Reproducibility from an infomatics perspective
 
DATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
DATA & PRIVACY PROTECTION Anna Monreale Università di PisaDATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
DATA & PRIVACY PROTECTION Anna Monreale Università di Pisa
 
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCESBROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
 
Micah Altman NISO privacy in library systems
Micah Altman NISO privacy in library systemsMicah Altman NISO privacy in library systems
Micah Altman NISO privacy in library systems
 

Recently uploaded

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 

Recently uploaded (20)

Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 

Towards a socio demographic fingerprint ch-iassist 2013

  • 1. Towards a procedure to anonymise micro data Anonymising data from official statistics for public use IASSIST, Köln - 30.05.2013 Katelijne Gysen katelijne.gysen@fors.unil.ch
  • 2. 2 Outline 1. Promotion of official statistics 2. Anonymisation of data 2.1 Trade off: disclosure risk versus data utility 2.2 Procedure 2.3 Parameter setting for Statistical Disclosure Control (SDC) 1. Uniqueness and k-anonymity 3.1 Concepts 3.2 Recent research on mobility data 3.3 The real fingerprint 3.4 Socio-demographic fingerprint
  • 3. 3 1. Promotion of official statistics  Data from National Statistical Institute (NSI)  Labour Force Survey  Survey on Structure of Earnings  SILC (Survey on Income and Living Conditions)  PISA (Education)  Swiss Health Survey  Population Census and Business Census, …  Micro data for research and teaching purposes Collaboration with our NSI:
  • 4. 4 2. Anonymisation of data 2.1 Trade-off dilemma: disclosure risk versus data utility researcher versus data owner Data utility Data protection
  • 5. 5 2.2 Procedure (1)Dataset Release data Risk / utility Balance ? Describe Intrusion scenario Apply SDC methods Describe Dataset characteristics Define Target public Release data Disclosure risk ? Measure Data utility Describe access conditions
  • 6. 6 2.2 Procedure (2)Dataset Release data Data utility ? Describe Intrusion scenario Apply SDC methods Set SDC parametersDescribe Dataset characteristics Define Target public SDC parameters met ? Release data Disclosure risk ? Measure Data utility Describe access conditions
  • 7. 7 2.3 Parameter setting for Statistical Disclosure Control (SDC) 1. Age of the data (min.) 2. Subsample (min.) 3. Level of geographical detail (max.) 4. Global and individual risk (max.) 5. Number of indirect identifying variables (max.) 6. Degree of anonymity for socio-demographic characteristics (min.)
  • 8. 8 Micro data identifying variables Non identifying variables Rare Observable Searchable 3 Uniqueness and k-anonymity - 3.1 Concepts
  • 9. 9 3.2 Recent research about mobility data “… four, randomly chosen “spatio-temporal points” (for example, mobile device pings to antennas) is enough to: uniquely identify 95% of the individuals”. The mobility pattern is apparently unique.
  • 10. 10 3.3 The real fingerprint “There are as many as 150 ridge characteristics (points) in the average fingerprint. So how many points must a fingerprint examiner match in order to safely say the prints are indeed those of a particular suspect?” The answer is surprising. “There is no standard number required. … … In fact, the decision as to whether or not there is a match is left entirely to the individual examiner. However, individual departments and agencies may have their own set of standards in place that requires a certain number of points be matched before making a positive identification.” Source: http://www.leelofland.com/wordpress/comparing-fingerprints-whats-the-point /
  • 11. 11 3.4 The socio-demographic fingerprint  Gender  Date of birth  Municipality  Civil status  Nationality
  • 12. 12 3.4 The socio-demographic fingerprint (2) Source: STATPOP 2010, BFS. k-anonymity   1 2 5 20 100 1000 Gender * DOB * Municipality 74 86.9 95.3 100 100 100 Gender * YOB * Municipality  0.7 1.9 6.3 27.6 68.3 92.1 Gender * YOB * Civil status * Municipality  3.2 6.4 14.9 41.5 77.9 96.6 Gender * YOB * Nationality * Municipality  7.9 12.9 21.3 47.1 82 97.1 Gender * YOB * Civil * Nation * Municip.  12 18.6 31.1 59.6 87.4 98.9 Anonymity of the Swiss population given simple socio-demographics
  • 13. 13 References  de Montjoye, Y.A., Hidalgo C.A., Verleysen M., Blondel V.D. Unique in the crowd: the privacy bounds of human mobility. Scientific Reports 3, article 1376, DOI: 10.1038/srep01376. 2013  Franconi, L., Public Use Files: practices and methods to increase quality of released microdata. OECD, 2012.  Golle, P. Revisiting the uniqueness of simple demographics in the US population. Palo Alto Research Center. 2006  Hundepool, A., Domingo-Ferrer, J., Franconi, L., Giessing, S., Schulte Nordholt, E., Spicer, K. , De Wolf P.P., Statistical Disclosure Control. Wiley. 2012.  Sweeney, L. Simple Demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3. Pittsburgh 2000.  Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal on Uncertainty, Fuziness and Knowledge-based Systems, 10 (5), 2002, 557-570.  Meindl, B., Kowarik, A., Templ M. Guidelines for the anonymisation of microdata using R- package sdcMicro. Vienna. 2012
  • 14. Find out more? About FORS: www.fors.unil.ch. About public microdata for research in Switzerland: www.compass.unil.ch. Let's connect!
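The table on slide 12 reports, for each combination of key variables, values that rise towards 100 as k grows. A plausible reading, which is our assumption and not stated on the slide, is the percentage of the population belonging to a group of at most k people who share the same key values. The sketch below computes that quantity on a few hypothetical records (not STATPOP data); the function name `share_at_most_k` is our own.

```python
from collections import Counter

def share_at_most_k(records, keys, thresholds):
    """Percentage of records whose equivalence class (all records sharing
    the same values on `keys`) has at most k members, for each k."""
    # Size of each equivalence class defined by the key variables
    class_size = Counter(tuple(r[v] for v in keys) for r in records)
    n = len(records)
    result = {}
    for k in thresholds:
        # Each record inherits the size of the class it belongs to
        covered = sum(1 for r in records
                      if class_size[tuple(r[v] for v in keys)] <= k)
        result[k] = round(100 * covered / n, 1)
    return result

# Hypothetical toy records, not STATPOP data
people = [
    {"gender": "F", "yob": 1975, "municipality": "Ecublens"},
    {"gender": "F", "yob": 1975, "municipality": "Ecublens"},
    {"gender": "M", "yob": 1975, "municipality": "Ecublens"},
    {"gender": "M", "yob": 1980, "municipality": "Lausanne"},
    {"gender": "M", "yob": 1980, "municipality": "Lausanne"},
    {"gender": "M", "yob": 1980, "municipality": "Lausanne"},
]

print(share_at_most_k(people, ["gender", "yob", "municipality"], [1, 2, 5]))
# → {1: 16.7, 2: 50.0, 5: 100.0}
```

Adding a variable to the key can only split groups, never merge them, so the share at k = 1 can only stay the same or grow. This matches the direction of the slide 12 rows that add civil status or nationality to gender × YOB × municipality.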

Editor's Notes

  1. 25 June 2013 Good afternoon, everybody. When our project leader invited me to send in an abstract for this conference, I was at first a bit hesitant about whether the subject I'm working on would fit. I have to admit that, on arriving here, being a sociologist, I started counting how often the word "confidentiality" came up, and now I no longer have this concern. Let's have a look.
  2. As an introduction, I will briefly talk about the data we work with and then move on to the topic, anonymisation of data. First I present the subject as a trade-off, a balancing exercise; then I show why this exercise can get complicated, which is why we developed a procedure to simplify it again. The last point is about the uniqueness of people: I would like to introduce the concept of a socio-demographic fingerprint.
  3. I'm lucky that this morning's plenary session was about the same kind of data, so you probably know what I'm talking about. Our task is to promote official statistics for research, which in our case means data from the Federal Statistical Office (FSO) in Neuchâtel. I have listed a few names to show what this is about: the first three are European surveys, for which Eurostat provides guidelines; PISA is international; and then there are a couple of national surveys on subjects that have equivalents in other countries. It is important to mention that we only work with micro data, i.e. rectangular data made of records and variables. Of course, this is a project in collaboration with our NSI.
-------
Why public micro data? (Wirth, H.) In general, FSO data rely on large samples (precision), long time series and good quality.
Definitions: public data, official data or data stemming from an NSI, collected with public money; open government data; micro data (values of different characteristics, i.e. variables, for each record or entity). Anonymising means preventing identification; to this end one can apply SDC techniques, e.g. recoding, suppressing information or perturbing information.
As an introduction, I will briefly talk about the datasets we promote for secondary use at FORS and about the big question that has to be answered when anonymising data. The core of the presentation: presenting the procedure.
FORS is a national centre of expertise in the social sciences. Its primary activities are: 1. production of survey data, including national and international surveys; 2. preservation and dissemination of data for secondary analysis; 3. research in the empirical social sciences, with a focus on survey methodology; 4. consulting services for researchers in Switzerland and abroad. FORS collaborates with researchers and research institutes in the social sciences in Switzerland and internationally.
  4. These data are not just weather data, or your opinion about the weather these days, so we have to recognise that their confidentiality requires a special approach and treatment. Let's keep it simple first: like all big questions in this world, it is all about a trade-off, finding a balance. As has been said before, it is a balance between data utility and disclosure risk, which means some data protection will be applied. To what extent? It would not be too difficult to argue where to put the slider, were it not that quite a lot of elements play a role in the decision and that there are usually different players: in our case, our data service (representing researchers) and the NSI.
  5. It can get complex, so to keep an overview we put everything into a scheme, a procedure, which this slide shows. On the left you find the different elements that indicate where to put the slider. Then you study how a potential intruder might try to disclose information. Again, I do not have time to go into detail, but I will mention two possible threats: response knowledge (knowing that a particular person is in the data) and the possibility of linking the data with other datasets. For the left part we are developing guidelines, which will take the form of a kind of Checklist for Disclosure Potential. If there is no disclosure risk, for example because the data are only accessible in a safe centre or via remote access, you can make the data accessible. Otherwise you have to apply methods of statistical disclosure control, check the balance, and then you can publish the data. This is nice, but the questions remain: how much anonymisation is enough? How much content does a dataset need to remain interesting?
  6. That is why we had to extend the scheme with two more boxes, "Set SDC parameters" and "SDC parameters met?": we need to agree on thresholds, which I call SDC parameters. The literature describes disclosure risk measures, so we can build on them. Let's have a look at the parameters we used.
  7. Thresholds can be fixed for the following elements. In general: the older the data, the more difficult it is to disclose information; the smaller the subsample, the more difficult it is to disclose; the less detailed the geography, the more difficult it is to disclose; the lower the global and individual risk, the more difficult it is to disclose; the fewer the indirect identifying variables and their categories, the more difficult it is to disclose; and the higher the degree of anonymity for socio-demographic characteristics, the more difficult it is to disclose. In the next and last part of my presentation I will concentrate on this degree of anonymity for socio-demographic characteristics.
  8. First, some basic concepts. In a micro dataset you can divide the variables into identifying and non-identifying variables. Identifying variables are variables that are rare, observable or searchable. In general, we distinguish direct identifiers from indirect identifiers. Some examples: it is common sense that you cannot release data containing direct identifiers. And by now it is also common sense that you have to be careful with indirect identifiers, as they may function as quasi-identifiers. Put simply: if you know that the record you are looking at belongs to a woman living in Ecublens, next to Lausanne, who was born on 23.12.yyyy, it could be me. The next question is: how many statistical look-alikes do I have? Indirect identifiers can thus be used as a key to disclose information, which is why it is important to describe the degree of anonymity.
  9. Just a small jump into the real world. I will simply cite this work (de Montjoye et al., 2013) because it is interesting; the references are at the end of the presentation. Two spatio-temporal points identify 50% of the individuals. That is what they call a virtual fingerprint.
  10. The real world, and a real fingerprint.
  11. Now I come back to one of our parameters: the degree of anonymity for socio-demographic characteristics. Since our experience shows that the biggest risk comes from linking our datasets with other datasets containing socio-demographic characteristics, we concentrate on understanding how unique people are on those characteristics. We started with gender, age and location, then extended the set with civil status and nationality.
  12. Some figures: here you see the anonymity of the Swiss population given some simple demographics.
  13. FORS. I will now present the work of our team COMPASS. Let's start at the beginning: in Neuchâtel there is the Federal Statistical Office (BFS), and across Switzerland there are universities, universities of applied sciences and other higher education institutions. The Federal Statistical Office collects and processes data; when results are published, they mostly take the form of tables. But the office also holds datasets, and that is what this is about: datasets are data treasures. The universities have researchers and students; they do research and teach. One could almost guess a priori that there is an interest in datasets (secondary analysis). In principle, one might say: fine, we'll just call and ask for a dataset. But where should I call, whom should I ask, and which data should I request? Which are best suited to my research or teaching purposes? At the BFS there is no department or contact point whose task is to give an overview of everything. We would like to fill this gap. Of course, I have allowed myself to simplify the whole situation. I will now go into more detail.
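Speaker note 3 names three families of SDC techniques: recoding, suppression and perturbation. The sketch below applies one of each to a single hypothetical record; the variable names, the 10-year band width and the ±5% noise level are illustrative assumptions, not FSO practice.

```python
import random

def recode_year_of_birth(yob, width=10):
    """Global recoding: replace the exact year by the first year of its band."""
    return yob - yob % width

def suppress(record, variable):
    """Local suppression: blank out one value, leaving the input untouched."""
    out = dict(record)
    out[variable] = None
    return out

def perturb(value, rng, noise_share=0.05):
    """Perturbation: multiply by a random factor in [1 - share, 1 + share]."""
    return value * (1 + rng.uniform(-noise_share, noise_share))

# Hypothetical record; not taken from any FSO survey
record = {"gender": "F", "yob": 1975, "nationality": "BE", "income": 80000}
rng = random.Random(1)  # fixed seed so the run is reproducible

protected = suppress(record, "nationality")
protected["yob"] = recode_year_of_birth(protected["yob"])  # 1975 -> 1970
protected["income"] = round(perturb(protected["income"], rng))
print(protected)
```

Each function returns new values instead of editing data in place, so the original record stays available for measuring how much utility the protection cost.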
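Speaker notes 5 and 6 describe an iterative procedure: apply SDC methods, measure disclosure risk and data utility, and check the balance before release. The following is a minimal sketch of that loop under our own simplifying assumptions: risk is proxied by the share of records unique on the key variables, the only SDC method is recoding year of birth into wider bands, and utility is proxied by the number of year-of-birth categories kept. All names and thresholds are hypothetical.

```python
from collections import Counter

def share_unique(records, keys):
    """Risk proxy: share of records whose key combination occurs only once."""
    sizes = Counter(tuple(r[k] for k in keys) for r in records)
    return sum(sizes[tuple(r[k] for k in keys)] == 1 for r in records) / len(records)

def recode(records, width):
    """SDC method: recode year of birth into bands of `width` years."""
    return [{**r, "yob": r["yob"] - r["yob"] % width} for r in records]

def anonymise(records, keys, max_unique_share, widths=(1, 2, 5, 10, 20)):
    """Try increasingly coarse recodings and stop at the first one whose
    risk is acceptable, returning (width, protected data, utility)."""
    for width in widths:
        candidate = recode(records, width)
        if share_unique(candidate, keys) <= max_unique_share:
            # Utility proxy: how many distinct year-of-birth values survive
            utility = len({r["yob"] for r in candidate})
            return width, candidate, utility
    return None  # no acceptable balance: consider stricter access conditions

# Hypothetical sample of six women
sample = [{"gender": "F", "yob": y} for y in (1971, 1972, 1973, 1974, 1976, 1978)]
width, protected, utility = anonymise(sample, ["gender", "yob"], max_unique_share=0.0)
print(width, utility)  # → 5 2
```

Returning `None` rather than recoding ever more coarsely mirrors the scheme's other exit: when no acceptable balance is reached, the access conditions (for example a safe centre or remote access) come back into play.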
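Speaker note 6 refers to disclosure risk measures from the literature, and note 7 lists global and individual risk among the parameters. One simple, frequency-based way to approximate them, sketched below as an assumption rather than the measure FORS uses, takes a record's individual risk as 1 divided by the estimated number of people in the population sharing its key (estimated by summing the sampling weights of matching sample records), and the global risk as the sum of individual risks, i.e. the expected number of re-identifications. More refined estimators exist, for example in the R package sdcMicro cited in the references.

```python
from collections import defaultdict

def individual_risk(records, keys, weight="w"):
    """Naive risk per record: 1 / (sum of sampling weights of all records
    sharing its key), i.e. 1 / estimated population frequency of the key."""
    est_freq = defaultdict(float)
    for r in records:
        est_freq[tuple(r[k] for k in keys)] += r[weight]
    return [1.0 / est_freq[tuple(r[k] for k in keys)] for r in records]

def global_risk(records, keys, weight="w"):
    """Expected number of re-identifications: the sum of individual risks."""
    return sum(individual_risk(records, keys, weight))

# Hypothetical weighted sample: w = number of people each record represents
sample = [
    {"gender": "F", "yob": 1975, "w": 1},   # represents only herself
    {"gender": "F", "yob": 1980, "w": 50},
    {"gender": "F", "yob": 1980, "w": 50},
    {"gender": "M", "yob": 1975, "w": 20},
]
print(individual_risk(sample, ["gender", "yob"]))       # → [1.0, 0.01, 0.01, 0.05]
print(round(global_risk(sample, ["gender", "yob"]), 2))  # → 1.07
```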
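Speaker note 7 lists the elements for which thresholds can be fixed. The sketch below turns that list into a checklist that reports which agreed thresholds a candidate release fails. Every field name, the 0 to 3 coding of geographical detail and all numeric values are hypothetical. So is our reading of the subsample parameter as a maximum released share, based on the note's statement that the smaller the subsample, the more difficult disclosure becomes.

```python
from dataclasses import dataclass

@dataclass
class SDCParameters:
    """Thresholds agreed between data service and NSI (values hypothetical)."""
    min_age_years: float        # 1. age of the data
    max_sample_share: float     # 2. subsample, as share of original records released
    max_geo_level: int          # 3. geography: 0 = Switzerland, 3 = municipality
    max_global_risk: float      # 4. global risk
    max_individual_risk: float  # 4. individual risk (highest record-level value)
    max_indirect_ids: int       # 5. number of indirect identifying variables
    min_k: int                  # 6. smallest group size on socio-demographics

@dataclass
class Release:
    """Measured characteristics of a candidate public-use file."""
    age_years: float
    sample_share: float
    geo_level: int
    global_risk: float
    max_individual_risk: float
    indirect_ids: int
    smallest_group: int

def unmet(p: SDCParameters, r: Release) -> list:
    """Names of the parameters the candidate release does not satisfy."""
    checks = {
        "age of data": r.age_years >= p.min_age_years,
        "subsample": r.sample_share <= p.max_sample_share,
        "geographical detail": r.geo_level <= p.max_geo_level,
        "global risk": r.global_risk <= p.max_global_risk,
        "individual risk": r.max_individual_risk <= p.max_individual_risk,
        "indirect identifiers": r.indirect_ids <= p.max_indirect_ids,
        "socio-demographic anonymity": r.smallest_group >= p.min_k,
    }
    return [name for name, ok in checks.items() if not ok]

params = SDCParameters(2, 0.5, 2, 0.01, 0.1, 6, 3)
candidate = Release(3, 0.3, 3, 0.005, 0.2, 5, 3)
print(unmet(params, candidate))  # → ['geographical detail', 'individual risk']
```

An empty list corresponds to the "SDC parameters met?" box being answered with yes.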
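Speaker note 8 asks how many statistical look-alikes a person has once an intruder knows a few indirect identifiers. Counting them amounts to filtering the records that agree with everything the intruder knows. Below is a minimal sketch on hypothetical records; the variable names and the years of birth are invented for illustration, since the note itself leaves the year open.

```python
def look_alikes(records, known):
    """Number of records consistent with everything the intruder knows.
    `known` maps indirect identifiers to the target's values."""
    return sum(all(r.get(var) == val for var, val in known.items()) for r in records)

# Hypothetical records
records = [
    {"gender": "F", "municipality": "Ecublens", "birthday": "23.12", "yob": 1975},
    {"gender": "F", "municipality": "Ecublens", "birthday": "23.12", "yob": 1982},
    {"gender": "F", "municipality": "Lausanne", "birthday": "23.12", "yob": 1975},
    {"gender": "M", "municipality": "Ecublens", "birthday": "23.12", "yob": 1975},
]
known = {"gender": "F", "municipality": "Ecublens", "birthday": "23.12"}
print(look_alikes(records, known))                   # → 2
print(look_alikes(records, {**known, "yob": 1975}))  # → 1, i.e. unique
```

A count of 1 means the record is unique on that key; a k-anonymity requirement asks that this count be at least k for every record.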
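Speaker note 9 summarises the mobility study by de Montjoye et al. (2013), in which a few spatio-temporal points drawn from a person's trace were often enough to single that person out. The quantity behind that result, the share of individuals whose randomly drawn points match no one else's trace, can be sketched as follows. The data layout (a set of (antenna, hour) points per person) and the function name are our assumptions, and the toy traces are invented.

```python
import random

def unicity(traces, p, rng):
    """Share of individuals uniquely identified by p points drawn at random
    from their own trace. Each trace is a set of (antenna, hour) points."""
    unique = 0
    for trace in traces.values():
        points = set(rng.sample(sorted(trace), min(p, len(trace))))
        # Count every person whose trace contains all drawn points
        matches = sum(points <= other for other in traces.values())
        if matches == 1:  # only the person themselves
            unique += 1
    return unique / len(traces)

# Invented toy traces
traces = {
    "a": {("A1", 8), ("A2", 12), ("A3", 18)},
    "b": {("A1", 8), ("A2", 12), ("A4", 18)},
    "c": {("A1", 8), ("A5", 12), ("A3", 18)},
}
rng = random.Random(0)
print(unicity(traces, 3, rng))  # → 1.0: the full traces all differ
print(unicity(traces, 1, rng))
```

In the study the points come from real mobile-phone traces and unicity is estimated over many individuals; this toy version only shows the mechanics.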