Anonymising quantitative data
Dr Sharon Bolton
UK Data Service
UK Data Archive, University of Essex
Anonymising Research Data workshop
Dublin, 22 June 2016
The UK Data Service
• Single point of access to a wide range of social science data:
ukdataservice.ac.uk
• Funded by the ESRC to serve the academic community: training
and guidance; UK Data Archive established 1967
• Used by academic researchers and students; government analysts;
charities; business; research centres; think tanks
• Survey microdata; cohort studies; international macrodata; census
data; qualitative/mixed methods data
• Support and guide data creators, including disclosure review
(anonymisation) and preparation for archiving
Protecting confidentiality: the ‘5 Safes’
Five guiding principles:
• Safe people - educate researchers to use data safely
• Safe projects - research projects for ‘public good’
• Safe settings - SecureLab system for sensitive data
• Safe outputs - SecureLab project outputs screened
• Safe data - treat the data to protect respondent
confidentiality
• For this session, we will concentrate (mostly) on Safe
data
Data collection: planning
• Explain to respondents what archiving entails and gain
agreement for data sharing – informed consent
• Think about disclosure risks before starting – what kind
of information do you need to collect?
• Direct identifiers include: names; addresses; telephone
numbers; email addresses; photos; (perhaps) IP
addresses; do you really need them?
• Unless explicit consent has been obtained for sharing, direct
identifiers should always be removed from the data
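A minimal sketch of removing direct identifiers, assuming each respondent is held as a Python dict with identifiers in separately named fields. The field names in `DIRECT_IDENTIFIERS` are hypothetical; use the names from your own data dictionary. A study-assigned serial number (`id` here) is kept so records stay distinguishable within the dataset.

```python
# Hypothetical field names for direct identifiers; adapt to your own data.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def remove_direct_identifiers(records, identifiers=DIRECT_IDENTIFIERS):
    """Return new records with every direct-identifier field dropped.

    The input records are left untouched. Identifiers hidden inside other
    fields (e.g. a name typed into a free-text job title) are NOT removed.
    """
    return [
        {field: value for field, value in record.items() if field not in identifiers}
        for record in records
    ]


respondents = [
    {"id": 1, "name": "A. Smith", "email": "a.smith@example.org", "age": 34, "job": "nurse"},
    {"id": 2, "name": "B. Jones", "telephone": "01234 567890", "age": 51, "job": "GP"},
]
clean = remove_direct_identifiers(respondents)
print(clean)  # [{'id': 1, 'age': 34, 'job': 'nurse'}, {'id': 2, 'age': 51, 'job': 'GP'}]
```

Dropping whole fields is the easy part; identifiers typed into free-text variables still have to be found by inspection.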
Anonymising data: indirect identifiers
Indirect identifiers include:
• Sensitive information: health information/medical
conditions; crime victimisation/offending; drug/alcohol
use etc.
• ‘Less sensitive’ information: age/birth date; educational
characteristics; employment details; religious affiliation;
household size; geographic area
• Look at demographics in combination (e.g.
demographics + geographies)
• Text/string variables – too detailed?
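Looking at demographics in combination can be sketched as counting how many records share each combination of indirect identifiers and flagging combinations shared by fewer than k respondents (the idea behind k-anonymity). This is a minimal sketch: the threshold `k=3` and the variable names are illustrative assumptions, not a UK Data Service rule.

```python
from collections import Counter


def key_combination_counts(records, keys):
    """Count how many records share each combination of the key variables."""
    return Counter(tuple(record[key] for key in keys) for record in records)


def risky_records(records, keys, k=3):
    """Return the records whose key-variable combination occurs fewer than k times."""
    counts = key_combination_counts(records, keys)
    return [r for r in records if counts[tuple(r[key] for key in keys)] < k]


# Hypothetical respondents; the last combination occurs only once.
sample = [
    {"age_band": "21-30", "sex": "F", "region": "East"},
    {"age_band": "21-30", "sex": "F", "region": "East"},
    {"age_band": "21-30", "sex": "F", "region": "East"},
    {"age_band": "61-64", "sex": "M", "region": "East"},
]
print(risky_records(sample, ["age_band", "sex", "region"]))
# [{'age_band': '61-64', 'sex': 'M', 'region': 'East'}]
```

A combination that is unique in the sample is not necessarily unique in the population, but it is the natural place to start treatment.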
Anonymising indirect identifiers
• Aggregate categories to reduce precision
• Band ages, incomes, expenditure, etc. to disguise outliers
• Use standard coding frames – e.g. SOC2010
• Generalise meaning of detailed text
• Document the changes you make
• Talk to other researchers, archives, data services
Published guides:
• UCD Research Data Management Guide
http://libguides.ucd.ie/data/ethics
• ONS Disclosure control guidance for microdata produced from social
surveys
http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata
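Banding the top of a distribution (top-coding) is one way to disguise outliers such as very high incomes. A minimal sketch, assuming numeric values with `None` for missing; the ceiling of 100000 and the variable name `annual_income` are hypothetical. The function also returns a one-line note so that the change can be documented.

```python
def top_code(values, ceiling, label):
    """Cap values above `ceiling` and return (new_values, log_note).

    After top-coding, a value equal to `ceiling` means "ceiling or more";
    that meaning must be stated in the dataset documentation.
    Missing values (None) are passed through unchanged.
    """
    coded = [v if v is None else min(v, ceiling) for v in values]
    changed = sum(1 for v in values if v is not None and v > ceiling)
    note = f"{label}: top-coded at {ceiling} ({changed} value(s) changed)"
    return coded, note


incomes = [18000, 24500, None, 29000, 250000]  # hypothetical annual incomes
coded, note = top_code(incomes, 100000, "annual_income")
print(coded)  # [18000, 24500, None, 29000, 100000]
print(note)   # annual_income: top-coded at 100000 (1 value(s) changed)
```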
Anonymising data: new developments and tools
Statistical Disclosure Control (SDC) software is available:
• mu-Argus
  • standalone software package recommended by Eurostat for government statisticians
  • software and manual: http://neon.vb.cbs.nl/casc/mu.htm
• R package sdcMicro (with GUI)
  • software and manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
  • new documentation being developed by the UK Data Service, working with the R developers
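Local suppression (setting selected values to missing so that a rare combination of key variables no longer stands out) is one of the standard methods SDC packages provide. The sketch below is a deliberately simplified illustration, not the algorithm used by mu-Argus or sdcMicro: it blanks one chosen variable in every record whose combination occurs fewer than `k` times. The threshold and the variable names are assumptions.

```python
from collections import Counter


def locally_suppress(records, keys, suppress_key, k=3):
    """Set `suppress_key` to None in records whose key combination is rarer than k.

    Returns new records; the input is not modified. Real SDC tools choose
    which values to suppress so as to minimise information loss and then
    re-check the risk; this sketch does neither.
    """
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    result = []
    for record in records:
        copy = dict(record)
        if counts[tuple(record[key] for key in keys)] < k:
            copy[suppress_key] = None  # None marks a suppressed value
        result.append(copy)
    return result


# Hypothetical records: the male dentist is the only one of his kind.
sample = [{"age_band": "21-30", "sex": "F", "job": "Nurse"}] * 3 + [
    {"age_band": "21-30", "sex": "M", "job": "Dentist"}
]
print(locally_suppress(sample, ["age_band", "sex", "job"], "job"))
```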
Quiz 1: disclosive text in job title
Job title Frequency Valid Percent
nurse 73 73.0
carer for elderly man 1 1.0
hospital ward cleaner 1 1.0
social science researcher 1 1.0
head of dental practice 2 2.0
cleaner in electronics factory 1 1.0
Financial Director, Sunnyview Care Home, Colchester 1 1.0
general manager 1 1.0
GP 1 1.0
Manager, Cotterill Village Stores 1 1.0
works in electronics factory 1 1.0
on benefits, not working 1 1.0
police officer 2 2.0
consultant, geriatric psychiatry 1 1.0
Reetired 1 1.0
retired 1 1.0
Retired 1 1.0
retirement 1 1.0
geography teacher 2 2.0
Teacher, music 2 2.0
Seondary school teeacher 1 1.0
unemployed 1 1.0
web designer 2 2.0
Total 100 100.0
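One way to spot disclosive free text before release is to count each response and list those given by very few respondents. A minimal sketch, assuming an illustrative threshold of 3: it normalises case and spacing so that "Retired" and "retired" are counted together, but it will not merge misspellings such as "Reetired", which still need manual review.

```python
from collections import Counter


def normalise(text):
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def rare_responses(responses, threshold=3):
    """Return the normalised responses given by fewer than `threshold` people."""
    counts = Counter(normalise(r) for r in responses)
    return sorted(value for value, n in counts.items() if n < threshold)


# A few of the job titles from the table above
job_titles = ["nurse"] * 73 + ["Retired", "retired", "Manager, Cotterill Village Stores"]
print(rare_responses(job_titles))
# ['manager, cotterill village stores', 'retired']
```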
Quiz 1: jobs coded with SOC2010
Job title: SOC2010 Frequency Valid Percent
1131: Director, financial 1 1.0
1171: Manager, general 1 1.0
1190: Manager, retail 1 1.0
2231: Nurse 73 73.0
2426: Researcher 1 1.0
2215: Dentist 2 2.0
2211: Doctor, medical 2 2.0
3312: Officer, police 2 2.0
2314: Teacher, secondary 5 5.0
2137: Designer, web 2 2.0
6145: Carer 1 1.0
9139: Worker, factory 1 1.0
9233: Cleaner 2 2.0
Retired 4 4.0
Unemployed 2 2.0
Total 100 100.0
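Recoding to a standard frame can be sketched as a lookup from cleaned job titles to codes, with anything not found flagged for manual coding. The dictionary below is hypothetical and covers only a few Quiz 1 titles, using the codes shown in the table above; real coding should follow the published SOC2010 coding index.

```python
from collections import Counter

# Hypothetical lookup built for this example, keyed on normalised job titles.
SOC2010_LOOKUP = {
    "nurse": "2231: Nurse",
    "gp": "2211: Doctor, medical",
    "consultant, geriatric psychiatry": "2211: Doctor, medical",
    "geography teacher": "2314: Teacher, secondary",
    "teacher, music": "2314: Teacher, secondary",
    "seondary school teeacher": "2314: Teacher, secondary",  # misspelling as entered
}


def code_job(raw_title):
    """Return the SOC2010 code for a job title, or 'UNCODED' for manual review."""
    key = " ".join(raw_title.lower().split())
    return SOC2010_LOOKUP.get(key, "UNCODED")


raw = ["nurse", "GP", "Teacher, music", "geography teacher",
       "Seondary school teeacher", "head of dental practice"]
print(Counter(code_job(title) for title in raw))
```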
Quiz 2: detailed religion categories
Religious affiliation Frequency Valid Percent
1 Protestant 41 41.4
2 Anglican 4 4.0
3 Catholic 26 26.3
4 Muslim 8 8.1
5 Sikh 5 5.1
6 Jehovah's Witness 6 6.1
7 Methodist 1 1.0
8 Mormon 1 1.0
9 Baptist 1 1.0
10 Buddhist 3 3.0
11 None 1 1.0
12 No religion 1 1.0
13 Moravian 1 1.0
Total 99 100.0
Quiz 2: religion categories aggregated
Religious affiliation Frequency Valid Percent
1 Protestant 48 48.5
3 Catholic 26 26.3
4 Muslim 8 8.1
5 Sikh 5 5.1
6 Other religion 10 10.1
7 No religion 2 2.0
Total 99 100.0
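Aggregating categories amounts to mapping each detailed category to a broader group and summing the counts. A minimal sketch using the frequencies from the detailed religion table; the mapping is inferred from the category labels of the two tables and is an assumption about how the recode was done.

```python
from collections import Counter

# Frequencies from the detailed religion table (99 valid responses)
detailed = {
    "Protestant": 41, "Anglican": 4, "Catholic": 26, "Muslim": 8, "Sikh": 5,
    "Jehovah's Witness": 6, "Methodist": 1, "Mormon": 1, "Baptist": 1,
    "Buddhist": 3, "None": 1, "No religion": 1, "Moravian": 1,
}

# Assumed recode; categories not listed keep their own label.
GROUPS = {
    "Anglican": "Protestant", "Methodist": "Protestant", "Baptist": "Protestant",
    "Moravian": "Protestant", "Jehovah's Witness": "Other religion",
    "Mormon": "Other religion", "Buddhist": "Other religion", "None": "No religion",
}


def aggregate(counts, groups):
    """Sum detailed category counts into broader groups."""
    totals = Counter()
    for category, n in counts.items():
        totals[groups.get(category, category)] += n
    return totals


grouped = aggregate(detailed, GROUPS)
total = sum(grouped.values())
for category, n in grouped.most_common():
    print(f"{category:<16}{n:>4}{100 * n / total:>7.1f}")
```

Keeping the mapping in one place also gives you the record of the change that the documentation needs.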
Quiz 3: age in years
Age in years Frequency Valid Percent
16 3 3.0
17 3 3.0
18 9 9.0
19 9 9.0
20 16 16.0
21 4 4.0
22 2 2.0
23 2 2.0
24 2 2.0
25 2 2.0
26 2 2.0
27 2 2.0
28 2 2.0
29 2 2.0
30 2 2.0
31 1 1.0
32 1 1.0
40 11 11.0
41 1 1.0
42 1 1.0
43 3 3.0
49 1 1.0
50 13 13.0
51 1 1.0
60 1 1.0
61 1 1.0
62 1 1.0
63 1 1.0
64 1 1.0
Total 100 100.0
Quiz 3: banded age
Age (banded) Frequency Valid Percent
1 16-20 40 40.0
2 21-30 22 22.0
3 31-40 13 13.0
4 41-50 19 19.0
5 51-64 6 6.0
Total 100 100.0
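Banding can be sketched as looking up each age against the upper limits of the bands. A minimal sketch, assuming ten-year bands from 21 with an open top band of 51-64 (merging the sparse older ages) and the 16-64 age range seen in the raw table; the frequencies are those from the "Age in years" table.

```python
import bisect
from collections import Counter

# (inclusive upper limit, label) for each band; the band scheme is an assumption.
BANDS = [(20, "16-20"), (30, "21-30"), (40, "31-40"), (50, "41-50"), (64, "51-64")]
UPPER_LIMITS = [upper for upper, _ in BANDS]


def band_age(age):
    """Return the band label for an age between 16 and 64."""
    if not 16 <= age <= UPPER_LIMITS[-1]:
        raise ValueError(f"age {age} is outside the 16-64 range")
    return BANDS[bisect.bisect_left(UPPER_LIMITS, age)][1]


# Frequencies from the 'Age in years' table
ages = {16: 3, 17: 3, 18: 9, 19: 9, 20: 16, 21: 4, 31: 1, 32: 1, 40: 11,
        41: 1, 42: 1, 43: 3, 49: 1, 50: 13, 51: 1, 60: 1, 61: 1, 62: 1, 63: 1, 64: 1}
ages.update({age: 2 for age in range(22, 31)})  # ages 22-30: 2 respondents each

banded = Counter()
for age, n in ages.items():
    banded[band_age(age)] += n
print(sorted(banded.items()))
# [('16-20', 40), ('21-30', 22), ('31-40', 13), ('41-50', 19), ('51-64', 6)]
```

`bisect_left` returns the first band whose upper limit is at least the age, so band limits are inclusive.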
Access control
• Don’t over-anonymise: find a balance between protecting
respondents’ confidentiality and maintaining the research
usability of the data
• Can’t fully anonymise data without removing all the
useful detail? Go back to the 5 Safes – think about
access control: Safe people, Safe settings, Safe outputs
Access control
• At the UK Data Service, data are available under three access levels:
  • OPEN – open public access
  • SAFEGUARDED – downloadable, but use is traceable
    • registered users only (users agree not to try to identify any individual respondents)
    • special agreements/licence: permission-only access; approved projects, with usage agreed in advance
  • CONTROLLED – accredited users take a further training course
    • access via an on-site safe setting or a virtual secure environment (SecureLab)
    • outputs disclosure-checked before publication
Anonymising quantitative data: summary
• Informed consent
• Think about level of detail needed before data collection
• Remove direct identifiers
• Check and treat indirect identifiers to reduce disclosure
risk
• Document your changes
• Balance anonymisation with access control to preserve
data usability
Questions?
Guidance on anonymisation:
• UCD: http://libguides.ucd.ie/data/ethics
• UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
• Managing and Sharing Research Data book
https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297

বাংলাদেশ অর্থনৈতিক সমীক্ষা (Economic Review) ২০২৪ UJS App.pdf
 
How to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold MethodHow to Build a Module in Odoo 17 Using the Scaffold Method
How to Build a Module in Odoo 17 Using the Scaffold Method
 
How to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 InventoryHow to Setup Warehouse & Location in Odoo 17 Inventory
How to Setup Warehouse & Location in Odoo 17 Inventory
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
Film vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movieFilm vocab for eal 3 students: Australia the movie
Film vocab for eal 3 students: Australia the movie
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
How to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRMHow to Manage Your Lost Opportunities in Odoo 17 CRM
How to Manage Your Lost Opportunities in Odoo 17 CRM
 
Cognitive Development Adolescence Psychology
Cognitive Development Adolescence PsychologyCognitive Development Adolescence Psychology
Cognitive Development Adolescence Psychology
 

Anonymising quantitative data

  • 1. Anonymising quantitative data
    Dr Sharon Bolton, UK Data Service, UK Data Archive, University of Essex
    Anonymising Research Data workshop, Dublin, 22 June 2016
  • 2. The UK Data Service
      • Single point of access to a wide range of social science data: ukdataservice.ac.uk
      • Funded by the ESRC to serve the academic community: training and guidance; UK Data Archive established 1967
      • Used by academic researchers and students; government analysts; charities; business; research centres; think tanks
      • Survey microdata; cohort studies; international macrodata; census data; qualitative/mixed methods data
      • Supports and guides data creators, including disclosure review (anonymisation) and preparation for archiving
  • 3. Protecting confidentiality: the ‘5 Safes’
    Five guiding principles:
      • Safe people - educate researchers to use data safely
      • Safe projects - research projects for ‘public good’
      • Safe settings - SecureLab system for sensitive data
      • Safe outputs - SecureLab project outputs screened
      • Safe data - treat the data to protect respondent confidentiality
    For this session, we will concentrate (mostly) on Safe data.
  • 4. Data collection: planning
      • Explain to respondents what archiving entails and gain agreement for data sharing (informed consent)
      • Think about disclosure risks before starting: what kind of information do you need to collect?
      • Direct identifiers include: names; addresses; telephone numbers; email addresses; photos; (perhaps) IP addresses. Do you really need them?
      • Unless explicit consent has been obtained for sharing, direct identifiers should always be removed from the data
  • 5. Anonymising data: indirect identifiers
    Indirect identifiers include:
      • Sensitive information: health information/medical conditions; crime victimisation/offending; drug/alcohol use, etc.
      • ‘Less sensitive’ information: age/birth date; educational characteristics; employment details; religious affiliation; household size; geographic area
      • Look at demographics in combination (e.g. demographics + geographies)
      • Text/string variables: too detailed?
  • 6. Anonymising indirect identifiers
      • Aggregate categories to reduce precision
      • Band ages, incomes, expenditure, etc. to disguise outliers
      • Use standard coding frames, e.g. SOC2010
      • Generalise the meaning of detailed text
      • Document the changes you make
      • Talk to other researchers, archives and data services
    Published guides:
      • UCD Research Data Management Guide: http://libguides.ucd.ie/data/ethics
      • ONS Disclosure control guidance for microdata produced from social surveys: http://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/disclosurecontrol/policyforsocialsurveymicrodata
  • 7. Anonymising data: new developments and tools
    Statistical Disclosure Control (SDC) software is available:
      • mu-Argus
          • standalone software package recommended by Eurostat for government statisticians
          • software and manual: http://neon.vb.cbs.nl/casc/mu.htm
      • R tool: sdcMicro (with GUI)
          • software, manual: http://www.inside-r.org/packages/cran/sdcMicro/docs/sdcMicro
          • new documentation being developed by the UK Data Service, working with R developers
  • 8. Quiz 1: disclosive text in job title
      Freq.  Valid %  Job title
         73     73.0  nurse
          1      1.0  carer for elderly man
          1      1.0  hospital ward cleaner
          1      1.0  social science researcher
          2      2.0  head of dental practice
          1      1.0  cleaner in electronics factory
          1      1.0  Financial Director, Sunnyview Care Home, Colchester
          1      1.0  general manager
          1      1.0  GP
          1      1.0  Manager, Cotterill Village Stores
          1      1.0  works in electronics factory
          1      1.0  on benefits, not working
          2      2.0  police officer
          1      1.0  consultant, geriatric psychiatry
          1      1.0  Reetired
          1      1.0  retired
          1      1.0  Retired
          1      1.0  retirement
          2      2.0  geography teacher
          2      2.0  Teacher, music
          1      1.0  Seondary school teeacher
          1      1.0  unemployed
          2      2.0  web designer
        100    100.0  Total
  • 9. Quiz 1: jobs coded with SOC2010
      Freq.  Valid %  Job title: SOC2010
          1      1.0  1131: Director, financial
          1      1.0  1171: Manager, general
          1      1.0  1190: Manager, retail
         73     73.0  2231: Nurse
          1      1.0  2426: Researcher
          2      2.0  2215: Dentist
          2      2.0  2211: Doctor, medical
          2      2.0  3312: Officer, police
          5      5.0  2314: Teacher, secondary
          2      2.0  2137: Designer, web
          1      1.0  6145: Carer
          1      1.0  9139: Worker, factory
          2      2.0  9233: Cleaner
          4      4.0  Retired
          2      2.0  Unemployed
        100    100.0  Total
  • 10. Quiz 2: detailed religion categories
      Freq.  Valid %  Religious affiliation
         41     41.4  1 Protestant
          4      4.0  2 Anglican
         26     26.3  3 Catholic
          8      8.1  4 Muslim
          5      5.1  5 Sikh
          6      6.1  6 Jehovah's Witness
          1      1.0  7 Methodist
          1      1.0  8 Mormon
          1      1.0  9 Baptist
          3      3.0  10 Buddhist
          1      1.0  11 None
          1      1.0  12 No religion
          1      1.0  13 Moravian
         99    100.0  Total
  • 11. Quiz 2: religion categories aggregated
      Freq.  Valid %  Religious affiliation
         48     48.5  1 Protestant
         26     26.3  3 Catholic
          8      8.1  4 Muslim
          5      5.1  5 Sikh
         10     10.1  6 Other religion
          2      2.0  7 No religion
         99    100.0  Total
  • 12. Quiz 3: age in years
      Freq.  Valid %  Age in years
          3      3.0  16
          3      3.0  17
          9      9.0  18
          9      9.0  19
         16     16.0  20
          4      4.0  21
          2      2.0  22
          2      2.0  23
          2      2.0  24
          2      2.0  25
          2      2.0  26
          2      2.0  27
          2      2.0  28
          2      2.0  29
          2      2.0  30
          1      1.0  31
          1      1.0  32
         11     11.0  40
          1      1.0  41
          1      1.0  42
          3      3.0  43
          1      1.0  49
         13     13.0  50
          1      1.0  51
          1      1.0  60
          1      1.0  61
          1      1.0  62
          1      1.0  63
          1      1.0  64
        100    100.0  Total
  • 13. Quiz 3: banded age
      Freq.  Valid %  Age (banded)
         40     40.0  1 16-20
         22     22.0  2 21-30
         13     13.0  3 31-40
         19     19.0  4 41-50
          6      6.0  5 51-64
        100    100.0  Total
  • 14. Access control
      • Don’t over-anonymise: find a balance between protecting respondents’ confidentiality and maintaining the research usability of the data
      • Can’t fully anonymise the data without removing all the useful detail? Go back to the 5 Safes and think about access control: Safe people, Safe settings, Safe outputs
  • 15. Access control
    At the UK Data Service, data are available under three access levels:
      • OPEN: open public access
      • SAFEGUARDED: downloadable, but use is traceable
          • Registered users only (users agree not to try to identify any individual respondents)
          • Special agreements/licence: permission-only access; approved projects, with usage agreed in advance
      • CONTROLLED: accredited users take a further training course
          • Access via an on-site safe setting or a virtual secure environment (SecureLab)
          • Outputs disclosure-checked before publication
  • 16. Anonymising quantitative data: summary
      • Informed consent
      • Think about the level of detail needed before data collection
      • Remove direct identifiers
      • Check and treat indirect identifiers to reduce disclosure risk
      • Document your changes
      • Balance anonymisation with access control to preserve data usability
  • 17. Questions?
    Guidance on anonymisation:
      • UCD: http://libguides.ucd.ie/data/ethics
      • UKDS: www.data-archive.ac.uk/create-manage/consent-ethics/anonymisation
      • Managing and Sharing Research Data book: https://uk.sagepub.com/en-gb/eur/managing-and-sharing-research-data/book240297
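Slide 4 says direct identifiers should be removed unless explicit consent for sharing was obtained, and slide 6 asks you to document the changes you make. A minimal Python sketch of both steps follows. It assumes records are held as plain dictionaries; the column names in `DIRECT_IDENTIFIERS` and the sample records are hypothetical, not taken from any real dataset.

```python
from collections.abc import Iterable

# Hypothetical column names for direct identifiers; replace with your own.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "email", "photo", "ip_address"}


def drop_direct_identifiers(
    records: list[dict], consented: Iterable[str] = ()
) -> tuple[list[dict], list[str]]:
    """Remove direct-identifier columns from copies of each record.

    Columns named in `consented` are kept, on the assumption that explicit
    consent for sharing them was obtained. Also returns a sorted list of the
    columns that were dropped, so the change can be documented.
    """
    to_drop = DIRECT_IDENTIFIERS - set(consented)
    cleaned = [{k: v for k, v in rec.items() if k not in to_drop} for rec in records]
    dropped = sorted({k for rec in records for k in rec if k in to_drop})
    return cleaned, dropped


survey = [
    {"id": 1, "name": "Respondent A", "email": "a@example.org", "age": 34, "job": "nurse"},
    {"id": 2, "name": "Respondent B", "email": "b@example.org", "age": 51, "job": "web designer"},
]
cleaned, dropped = drop_direct_identifiers(survey)
print(dropped)     # ['email', 'name']
print(cleaned[0])  # {'id': 1, 'age': 34, 'job': 'nurse'}
```

The original records are left untouched, so the unmodified data can be kept securely while only the cleaned copy is shared.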
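Slide 5 warns that indirect identifiers become disclosive in combination. One common way to check this is to count how many records share each combination of key variables and flag combinations held by fewer than `k` records (a k-anonymity style check). The sketch below is a simplified illustration under that assumption, not the method or API of mu-Argus or sdcMicro; the variable names, the threshold `k` and the four sample records are hypothetical.

```python
from collections import Counter


def risky_combinations(records: list[dict], keys: list[str], k: int = 3) -> dict[tuple, int]:
    """Return {combination: count} for key-variable combinations held by fewer than k records."""
    counts = Counter(tuple(rec[key] for key in keys) for rec in records)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical sample: every single value occurs twice,
# but each combination of the three variables occurs only once.
survey = [
    {"age_band": "21-30", "sex": "F", "region": "Essex"},
    {"age_band": "21-30", "sex": "M", "region": "Dublin"},
    {"age_band": "51-64", "sex": "F", "region": "Dublin"},
    {"age_band": "51-64", "sex": "M", "region": "Essex"},
]

for key in ["age_band", "sex", "region"]:
    # Checked one at a time, no variable is flagged.
    print(key, risky_combinations(survey, [key], k=2))

combined = risky_combinations(survey, ["age_band", "sex", "region"], k=2)
print(len(combined))  # 4: every record is unique on the combination
```

The choice of key variables and of `k` is a judgement for the data creator; the same counting idea applies when a geography variable is added to the demographics.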
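The recoding on slides 8 and 9 can be approximated by normalising each free-text job title and looking it up in a coding table. The `SOC_LOOKUP` table below is a hypothetical, hand-built fragment covering a few titles from the quiz; it is not an official SOC2010 coder. Titles with no match are marked `UNCODED` for manual review, which is where detailed, potentially disclosive entries such as a named shop end up.

```python
from collections import Counter

# Hypothetical, hand-built fragment for this example only; NOT an official
# SOC2010 coding tool. Keys are normalised (lower-case, single-spaced) titles.
SOC_LOOKUP = {
    "nurse": "2231: Nurse",
    "geography teacher": "2314: Teacher, secondary",
    "teacher, music": "2314: Teacher, secondary",
    "seondary school teeacher": "2314: Teacher, secondary",
    "retired": "Retired",
    "reetired": "Retired",
    "retirement": "Retired",
    "unemployed": "Unemployed",
    "on benefits, not working": "Unemployed",
}


def code_job(title: str) -> str:
    """Normalise a free-text job title and return its coded category."""
    key = " ".join(title.lower().split())  # lower-case and collapse whitespace
    return SOC_LOOKUP.get(key, "UNCODED")  # UNCODED: needs manual review


titles = ["nurse", "Retired", "Reetired", "Teacher, music", "Manager, Cotterill Village Stores"]
coded = [code_job(t) for t in titles]
print(coded)
print(Counter(coded))
```

Normalisation merges spelling variants such as "Retired" and "retired" automatically, but misspellings like "Reetired" still need their own lookup entries.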
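The aggregation on slides 10 and 11 amounts to mapping each detailed category onto a broader one and summing the frequencies. The sketch uses the counts from the detailed table on slide 10; the `GROUPING` table is an assumption inferred from the category labels, since the slides do not state which detailed categories were merged.

```python
from collections import Counter

# Frequencies from the "detailed religion categories" table (slide 10).
DETAILED = {
    "Protestant": 41, "Anglican": 4, "Catholic": 26, "Muslim": 8, "Sikh": 5,
    "Jehovah's Witness": 6, "Methodist": 1, "Mormon": 1, "Baptist": 1,
    "Buddhist": 3, "None": 1, "No religion": 1, "Moravian": 1,
}

# Assumed grouping: small Protestant denominations merge into "Protestant",
# other small groups into "Other religion", and "None" into "No religion".
# Categories not listed here keep their own label.
GROUPING = {
    "Anglican": "Protestant", "Methodist": "Protestant", "Baptist": "Protestant",
    "Moravian": "Protestant", "Jehovah's Witness": "Other religion",
    "Mormon": "Other religion", "Buddhist": "Other religion", "None": "No religion",
}


def aggregate(detailed: dict[str, int], grouping: dict[str, str]) -> Counter:
    """Sum the frequencies of detailed categories into their aggregated category."""
    totals = Counter()
    for category, n in detailed.items():
        totals[grouping.get(category, category)] += n
    return totals


aggregated = aggregate(DETAILED, GROUPING)
valid_n = sum(aggregated.values())
for category, n in aggregated.items():
    # Label, frequency and valid percent, as in an SPSS frequency table.
    print(f"{category:<15} {n:>3} {100 * n / valid_n:>6.1f}")
```

Because aggregation only merges categories, the valid total is the same before and after (99 here); what changes is that no aggregated category is left with a single respondent.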
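Banding ages (slides 12 and 13) replaces each exact age with the range it falls in. The sketch below uses the age frequencies from slide 12. The band boundaries are an assumed scheme chosen for illustration, with a wider top band (51-64) so that the six respondents aged 51 and over are not left in near-empty cells.

```python
from bisect import bisect_right
from collections import Counter

# Age frequencies from the "age in years" table (slide 12): {age: count}.
AGE_COUNTS = {
    16: 3, 17: 3, 18: 9, 19: 9, 20: 16, 21: 4, 22: 2, 23: 2, 24: 2, 25: 2,
    26: 2, 27: 2, 28: 2, 29: 2, 30: 2, 31: 1, 32: 1, 40: 11, 41: 1, 42: 1,
    43: 3, 49: 1, 50: 13, 51: 1, 60: 1, 61: 1, 62: 1, 63: 1, 64: 1,
}

# Assumed banding scheme: lower bound of each band and its label.
LOWER_BOUNDS = [16, 21, 31, 41, 51]
LABELS = ["16-20", "21-30", "31-40", "41-50", "51-64"]


def band(age: int) -> str:
    """Return the label of the band containing `age` (ages 16-64 assumed)."""
    if not 16 <= age <= 64:
        raise ValueError(f"age {age} is outside the assumed 16-64 range")
    # bisect_right finds the last lower bound that is <= age.
    return LABELS[bisect_right(LOWER_BOUNDS, age) - 1]


banded = Counter()
for age, n in AGE_COUNTS.items():
    banded[band(age)] += n

for label in LABELS:
    print(label, banded[label])
```

Keeping the boundaries in one list makes the scheme easy to record alongside the data, in line with the advice to document the changes you make.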