Personal Identifiable
Information vs Attribute Data
An exploration of dealing with Personal
Identifiable Information in enabling analysis.
This slide deck was compiled by Jonathan Swan, ADR-UK Engineering Lead
Data Engineering and Operations | Data Growth and Operations
Office for National Statistics July 2023
Contents
Background and context
Examples (of PII)
The Legal Bit
Disclosure Control
Using PII
Basic Definitions
Personal Identifiable Information (PII)
• Personally identifiable information (PII) is information that, when
used alone or with other relevant data, can identify an individual.
Attribute data
• Data that can be used to describe or quantify an object
or entity.
• A characteristic or feature that is measured for each
observation (record) and can vary from one observation
to another. It might be measured in continuous values (e.g.
time spent on a web site), or in categorical values (e.g.
red, yellow, green). The terms "attribute" and "feature"
are used in the machine learning community; "variable"
is used in the statistics community. They are synonyms.
• Any data that are used for statistics, analysis, or
research, to describe a data subject, or structural data to
support such analysis.
Background and context
Relevant legislation
• DPA – Data Protection Act 2018
• GDPR – General Data Protection Regulation
• SRSA – Statistics and Registration Service Act 2007
• DEA – Digital Economy Act 2017
Why does it matter?
• Section 64 of the DEA allows sharing of “personal information” for research
purposes. Subsection (3) (a) requires that “the person’s identity is not specified
in the information”
• The third GDPR principle (DPA, S37) requires that processing of personal data is
“relevant and not excessive” i.e. proportionate
• The fifth GDPR principle (DPA S39 (1)) requires that personal data “must be kept
for no longer than is necessary for the purpose for which it is processed”
• In short
• PII cannot be shared with approved researchers, and
• Processors (like ONS) must be proportionate and time-sensitive when handling personal data
So why is PII required in research?
Why can’t it be removed ‘at source’?
• Data held in operational systems is often structured around “unique
identifiers”
• Unique identifiers, such as National Insurance Number, are often PII
• Unique identifiers like National Insurance Numbers (known as NINo) can be
needed to join data across sources
• PII, such as names, is essential to good-quality matching
• PII can be used to measure error and bias when matching or joining data
• This includes joining on identifiers (like NHS No.) where we know there is error
• PII can be used to create attributes (e.g. address used to derive
geographical data)
Implications
• It is illegal to share PII with Approved Researchers
• Separating processing of PII and attributes is (often) a proportionate approach
• Separation reduces the burden on, and protects, the people processing the data
• Yes I can see their names – but I don’t know anything about them
• I can see intimate details about people – but I don’t know who they are
• It’s far less likely that I find something out about people, maybe even a friend
• I can’t be accused of leaking personal details, if I can’t see them
In context
• Five Safes:
• safe people.
• safe projects.
• safe settings.
• safe data.
• safe outputs.
• Appropriate separation of PII and attributes helps ensure safe data.
Examples of PII
PII                        Attributes
Name                       Sex
Address                    Age at
Date of Birth              Post Code
Email                      Income
Phone Number               SIC
National Insurance No.     SOC
NHS No.                    Qualification
Employer Reference No.     Number of employees
Company Name
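Preparing data for analysis

Source record:
Forename   Surname   NINo        Company ID   Sex   DOB
Michael    Mouse     AB123456A   Disney1      M     01/05/1928

Lookup table:
NINo        ADRID
AB123456A   XYZ123

Prepared record:
ADRID    Company ID   Sex   Age at 1/1/23
XYZ123   PQ7TH89U     M     94

Steps applied:
• Suppress: Forename, Surname
• Lookup: NINo to ADRID
• Apply hash function: Company ID
• Derive variable: DOB to Age at 1/1/23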
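
The flow on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline ONS actually runs: the function names, the choice of HMAC-SHA256 as the hash, the secret key, and the truncation of the digest are all invented for the example. Only the field names and sample values come from the slide.

```python
import hashlib
import hmac
from datetime import date

# Hypothetical secret key; assumed to be stored separately from the data.
SECRET_KEY = b"example-key-not-for-real-use"
REFERENCE_DATE = date(2023, 1, 1)

# Lookup table held by the data processor only (NINo -> ADRID), as on the slide.
NINO_TO_ADRID = {"AB123456A": "XYZ123"}


def hash_identifier(value: str) -> str:
    """Keyed one-way hash (HMAC-SHA256), truncated here purely for readability."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]


def age_at(dob: date, reference: date) -> int:
    """Whole years between dob and reference."""
    had_birthday = (reference.month, reference.day) >= (dob.month, dob.day)
    return reference.year - dob.year - (0 if had_birthday else 1)


def prepare_record(record: dict) -> dict:
    """Suppress names, look up the ADRID, hash the company ID, derive age."""
    return {
        "ADRID": NINO_TO_ADRID[record["NINo"]],                   # Lookup
        "Company ID": hash_identifier(record["Company ID"]),      # Apply hash function
        "Sex": record["Sex"],                                     # Attribute, kept as-is
        "Age at 1/1/23": age_at(record["DOB"], REFERENCE_DATE),   # Derive variable
        # Forename, Surname, NINo and DOB are suppressed (not copied across).
    }


source = {
    "Forename": "Michael", "Surname": "Mouse", "NINo": "AB123456A",
    "Company ID": "Disney1", "Sex": "M", "DOB": date(1928, 5, 1),
}
prepared = prepare_record(source)
print(prepared)
```

The prepared record keeps only the ADRID, a hashed company ID, and attributes, so the analyst working with it never sees the name, NINo, or date of birth.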
The Legal Bit
Comparison of Legislation
Definition
• DPA/GDPR: “Personal data” means any information relating to an
identified or identifiable living individual …
• SRSA: “personal information” means information which relates to and
identifies a particular person (including a body corporate)
• DEA: … information is “personal information” if—
(a) it relates to a particular person (including a body corporate), but
(b) it is not information about the internal administrative arrangements
of a public authority.
Personal Information / data
• DPA/GDPR: personal data
• SRSA: personal information
• DEA: personal information
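Body Corporate
• DPA/GDPR: not covered
• SRSA: covered
• DEA: covered
Deceased
• DPA/GDPR: not covered
• SRSA: covered
• DEA: not explicit, assumed covered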
Bodies Corporate
• If you are used to GDPR – the concept of protecting the identity of a
corporate body may seem odd to you. But:
• Sole traders are covered under GDPR, and
• Corporate Bodies are explicitly covered under both the SRSA and DEA – so we
have to avoid identifying them.
• Bodies corporate definitely includes companies and charities, but
• Schools, local authorities, government departments, etc are also included
under the SRSA, and may be covered under the DEA in some circumstances.
• Best to include them as requiring protection of identity,
• But under some specific circumstances it may be possible to share
identifiers.
I see dead people
• The GDPR explicitly refers to “living individual[s]”
• The SRSA is interpreted to include dead people in scope
• The DEA is not explicit, but it should be assumed they are covered
• It is safest to assume the identity of dead people is protected
• But death registrations are public
• And the 100-year rule may apply (like for the Census)
• So it may be possible to use identifiable data on dead people in
specific circumstances.
Disclosure Control
Suppressing PII – part of the story
• Data made available to approved researchers are de-identified
• Published data must be anonymous
• Anonymisation is a high standard and has an explicit legal definition.
• De-identification: The act of removing identifiers from data
• Anonymous: “information which does not relate to an identified or
identifiable natural person or to personal data rendered anonymous in
such a manner that the data subject is not or no longer identifiable.”
(GDPR)
Isn’t Pseudonymisation enough?
• In short: NO!
• GDPR defines pseudonymisation: “…the processing of personal data in such
a manner that the personal data can no longer be attributed to a specific
data subject without the use of additional information, provided that such
additional information is kept separately and is subject to technical and
organisational measures to ensure that the personal data are not
attributed to an identified or identifiable natural person.”
• And GDPR says “…Personal data which have undergone
pseudonymisation, which could be attributed to a natural person by the
use of additional information should be considered to be information on an
identifiable natural person…”
• Pseudonymisation is a risk reduction method only, which is good practice
under certain circumstances.
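
To see why pseudonymised data still count as personal data, consider the sketch below. It assumes a hypothetical pseudonym lookup table (the "additional information" in the GDPR definition) and shows that anyone holding it can re-attribute records to people. All identifiers and values are invented for illustration.

```python
# Pseudonymised research data: no direct identifiers, only a pseudonym.
research_rows = [
    {"pseudonym": "XYZ123", "sex": "M", "age": 94},
    {"pseudonym": "QRS456", "sex": "F", "age": 41},
]

# The 'additional information', kept separately by the processor.
# All values are made up for this example.
pseudonym_to_nino = {"XYZ123": "AB123456A", "QRS456": "CD654321B"}


def reidentify(rows: list, key_table: dict) -> list:
    """Re-attach identifiers to rows for which the key table has an entry."""
    return [
        {**row, "nino": key_table[row["pseudonym"]]}
        for row in rows
        if row["pseudonym"] in key_table
    ]


# Without the key table nothing can be re-attached...
unlinked = reidentify(research_rows, {})
# ...but with it, every record is attributable to a person again.
relinked = reidentify(research_rows, pseudonym_to_nino)
print(len(unlinked), len(relinked))
```

The data themselves are unchanged between the two calls; only access to the separately held table differs, which is why pseudonymisation reduces risk without making the data anonymous.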
De-identification – a little more
• De-identification may involve the removal of postcode or other small
area identifiers (like output area) in order to ensure legislation
compliance or appropriate risk management.
• De-identification may also require other measures, like record
swapping or ‘blurring’ or rounding to prevent identification.
• For some variables removal of extreme outliers is required.
• e.g. Income data ‘capping’ may be required - very high salaries can become
identifiers.
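
The measures above can be sketched as simple row-level transformations. The cap of 150,000, the rounding base of 1,000, and the field names are assumptions chosen for the example; real thresholds would depend on the data set and the applicable rules.

```python
def cap(value: float, ceiling: float) -> float:
    """Top-code a value so extreme outliers no longer stand out."""
    return min(value, ceiling)


def round_to(value: float, base: int) -> int:
    """Round to the nearest multiple of base (a simple form of 'blurring')."""
    return int(round(value / base)) * base


def deidentify(row: dict, income_cap: float = 150_000, base: int = 1_000) -> dict:
    """Drop the postcode, then cap and round income."""
    out = {k: v for k, v in row.items() if k != "postcode"}  # remove small-area identifier
    out["income"] = round_to(cap(row["income"], income_cap), base)
    return out


# Invented records for illustration only.
rows = [
    {"postcode": "AB1 2CD", "income": 27_450},
    {"postcode": "EF3 4GH", "income": 2_300_000},
]
safe_rows = [deidentify(r) for r in rows]
print(safe_rows)
```

After processing, the very high salary is indistinguishable from any other income at or above the cap, so it can no longer act as an identifier on its own.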
Other measures
• To ensure legislative compliance, and avoid (re-)identification, other
measures are required
• Safe Projects avoid re-identification by avoiding toxic data mixes
• Safe Settings help prevent combining data with other data to enable
re-identification
• The higher the risk – the more stringent the measure
• These measures help to keep “safe data”
• Disclosure control, as above, ensures safe outputs
Five Safes:
• safe people.
• safe projects.
• safe settings.
• safe data.
• safe outputs.
Publishing (Disclosure) – Issues to be aware of
• Publishing requires that data / information are anonymous
• Re-identification must not be possible
• Care with dominance
• Especially for corporate bodies
• Caution for small geographies or other groupings
• Where one or two units provide the majority of a measure within a grouping
• Aggregate tables
• Sufficient aggregation required
• Small values an issue
• Specific requirements for some data sets
• Summary statistics
• Care with point values (max, min, etc)
• Computed statistics or models
• High detail may cause disclosure
• Graphical output
• Care with point values
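
Two of the checks listed above, small values in aggregate tables and dominance, can be sketched as follows. The minimum count of 10 is a hypothetical threshold, and the dominance rule simply encodes "one or two units provide the majority of a measure"; actual output-checking rules vary by data set.

```python
MIN_COUNT = 10          # hypothetical minimum cell count
DOMINANCE_SHARE = 0.5   # 'majority': flag if the top two units exceed this share


def suppress_small_counts(table: dict, min_count: int = MIN_COUNT) -> dict:
    """Replace counts below the threshold with None (suppressed)."""
    return {k: (v if v >= min_count else None) for k, v in table.items()}


def is_dominated(values: list, share: float = DOMINANCE_SHARE) -> bool:
    """True if the two largest units contribute more than `share` of the total."""
    total = sum(values)
    if total == 0:
        return False
    top_two = sum(sorted(values, reverse=True)[:2])
    return top_two / total > share


# Invented figures for illustration only.
counts = {"Area A": 120, "Area B": 4, "Area C": 37}
checked = suppress_small_counts(counts)
print(checked)

turnover_by_company = [900, 50, 30, 20]   # one company provides most of the total
print(is_dominated(turnover_by_company))
```

A grouping flagged by `is_dominated` would need further aggregation or suppression before release, since the total effectively reveals the largest unit's value.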
Publishing – output can be achieved without identification
• Individual case studies or examples are possible
• As long as the identity of the individual is not discoverable (by the researcher
or other parties)
• Qualitative results can be achieved
• And may avoid the identification issues that would occur by putting numbers
on the results
Using PII
PII as a resource
• We cannot share PII with Approved Researchers
• But we can use it to help Researchers achieve their aims
• It is entirely legitimate, and intended, that we process PII
• Matching and joining data
• The obvious way we can help
• But not the end of the story …
A brief aside
Hashing:
• The use of a cryptographic hash function to apply a one-way
transformation of a string of characters into a fixed-length encoded
string.
• ‘One way encryption of data’
• A secure and repeatable way of transforming text into a ‘random’
string that is practically irreversible.
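
The properties described above (repeatable, fixed length, one-way) can be seen with Python's standard library. This sketch assumes a keyed hash (HMAC-SHA256) with a hypothetical secret key and a simple normalisation step; the slides do not say which hash function is used. A key is assumed here because identifiers with a known, limited format could otherwise be guessed by hashing every possible value.

```python
import hashlib
import hmac

# Hypothetical secret key; assumed to be held only by the data processor.
KEY = b"example-key-not-for-real-use"


def hash_value(text: str, key: bytes = KEY) -> str:
    """Normalise the text, then apply a keyed one-way hash (HMAC-SHA256)."""
    normalised = text.strip().upper()
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


a = hash_value("AB123456A")
b = hash_value(" ab123456a ")   # same identifier, different formatting
c = hash_value("AB123456B")     # different identifier

print(a == b)   # repeatable: same input gives the same output
print(a == c)   # different inputs give different outputs
print(len(a))   # fixed length, whatever the input length
```

Normalising before hashing matters in practice: without it, the same identifier typed with different case or spacing would hash to unrelated strings and fail to match.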
Ways to use PII
• Hashing an ID (e.g. NHS no.) so that data from different sources can
be joined without using identifiers
• Hashing to enable analysis by group (e.g. hash school name, hospital,
company, etc. to enable analysis at unit level without disclosing the
unit – e.g. is the range of school performance large?)
• Creation of derived variables from PII – e.g. whether a company name
includes the word “partner”, or calculating the weekday where a date
(e.g. DoB) cannot be shared
• Applying algorithms to derive values, e.g. applying an algorithm
derived from test or anonymised data to real data – e.g. textual
analysis algorithms
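
Two of these uses, joining sources on a hashed ID and deriving shareable variables from PII, might look like the sketch below. The key, the helper names, the field names, and the sample records are all assumptions for illustration, and the identifiers are placeholders rather than real NHS numbers.

```python
import hashlib
import hmac
from datetime import date

KEY = b"example-key-not-for-real-use"  # hypothetical secret, held by the processor
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def hash_id(value: str) -> str:
    """Keyed one-way hash of an identifier (HMAC-SHA256)."""
    return hmac.new(KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(rows: list, id_field: str) -> dict:
    """Index rows by hashed identifier and drop the raw identifier."""
    return {
        hash_id(row[id_field]): {k: v for k, v in row.items() if k != id_field}
        for row in rows
    }


def derive(record: dict) -> dict:
    """Derived variables that can be shared when the underlying PII cannot."""
    return {
        "name_has_partner": "partner" in record["company_name"].lower(),
        "dob_weekday": DAYS[record["dob"].weekday()],
    }


# Two sources sharing an identifier (placeholder values).
health = [{"nhs_no": "ID-0001", "condition": "asthma"},
          {"nhs_no": "ID-0002", "condition": "none"}]
education = [{"nhs_no": "ID-0001", "qualification": "Level 3"},
             {"nhs_no": "ID-0003", "qualification": "Level 2"}]

left = pseudonymise(health, "nhs_no")
right = pseudonymise(education, "nhs_no")
# Join on the hashed ID only; the raw identifier never appears in the result.
joined = [{**left[h], **right[h]} for h in left.keys() & right.keys()]
print(joined)

derived = derive({"company_name": "Smith & Partners LLP", "dob": date(1928, 5, 1)})
print(derived)
```

Both sources must be hashed with the same function and key for the join to work, which is one reason the hashing step sits with the processor rather than with each data supplier.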
More ways to use PII
• Measure of error or bias in data
• Particularly linked data
• Including error in identifiers like NINo
• Hashing identifiers to enable frequency type analysis (e.g. does
having a rare name correlate to higher salary?)
• Correlation of PII and attribute – e.g. does forename correlate to a
characteristic (e.g. ethnicity)?
• Applying an imputed characteristic or proxy – using name or title to
imply sex
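
The frequency-type analysis mentioned above can be sketched by hashing names first and counting the hashes, so that name rarity is available for analysis while the names themselves are not. The key, the definition of "rare" (a name occurring once), and the salary figures are all made up for the example and are not results.

```python
import hashlib
import hmac
from collections import Counter
from statistics import mean

KEY = b"example-key-not-for-real-use"  # hypothetical secret, held by the processor


def hash_name(name: str) -> str:
    """Keyed one-way hash of a normalised forename."""
    return hmac.new(KEY, name.strip().lower().encode("utf-8"), hashlib.sha256).hexdigest()


# Invented (forename, salary) pairs for illustration only.
people = [("Alex", 30_000), ("Alex", 32_000), ("alex", 31_000), ("Zebedee", 50_000)]

# From here on only hashed names are used.
hashed = [(hash_name(name), salary) for name, salary in people]
frequency = Counter(h for h, _ in hashed)

rare = [salary for h, salary in hashed if frequency[h] == 1]
common = [salary for h, salary in hashed if frequency[h] > 1]
print(mean(rare), mean(common))
```

Only the frequency band would be passed on as an attribute; the hash itself is still a pseudonym and would be treated accordingly.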

  • 1. Personal Identifiable Information vs Attribute Data An exploration of dealing with Personal Identifiable Information in enabling analysis. This slide deck was compiled by Jonathan Swan, ADR-UK Engineering Lead Data Engineering and Operations | Data Growth and Operations Office for National Statistics July 2023
  • 2. Contents Background and context Slide 3 Examples (of PII) Slide 9 The Legal Bit Slide 12 Disclosure Control Slide 16 Using PII Slide 23
  • 3. Basic Definitions Personal Identifiable Information (PII) • Personally identifiable information (PII) is information that, when used alone or with other relevant data, can identify an individual. Attribute data • Data that can be used to describe or quantify an object or entity. • A characteristic or feature that is measured for each observation (record) and can vary from one observation to another. It might measured in continuous values (e.g. time spent on a web site), or in categorical values (e.g. red, yellow, green). The terms "attribute" and "feature" are used in the machine learning community, "variable" is used in the statistics community. They are synonyms. • Any data that that are used for statistics, analysis, or research, to describe a data subject, or structural data to support such analysis. Background and context
  • 4. • DPA – Data Protection Act 2018 • GDPR – General Data Protection Regulations • SRSA – Statistics and Registration Service Act 2007 • DEA – Digital Economy Act 2017 Relevant legislation Background and context
  • 5. Why does it matter? • Section 64 of the DEA allows sharing of “personal information” for research purposes. Subsection (3) (a) requires that “the person’s identity is not specified in the information” • The third GDPR principle (DPA, S37) requires that processing of personal data is “relevant and not excessive” i.e. proportionate • The fifth GDPR principle (DPA S39 (1)) requires that personal data “must be kept for no longer than is necessary for the purpose for which it is processed” • In short • PII can not be shared to approved researchers and • Processors (like ONS) must be proportionate and time sensitive for personal data Background and context
  • 6. So why is PII required in research? Why can’t it be removed ‘at source’? • Data held in operational systems is often structured around “unique identifiers” • Unique identifiers, such as National Insurance Number, are often PII • Unique identifiers like National Insurance Numbers (known as NINo) can be needed to join data across sources • PII, like names, are essential to good quality matching • PII, can be used to measure error and bias when matching or joining data • This includes joining on identifiers (like NHS no) where we know there is error • PII can be used to create attributes (e.g. address used to derive geographical data) Background and context
  • 7. Implications • It is illegal to share PII with Approved Researchers • Separating processing of PII and attributes is (often) a proportionate approach • Separation reduces the burden on, and protects, the people processing the data • Yes I can see their names – but I don’t know anything about them • I can see intimate details about people – but I don’t know who they are • It’s far less likely I find something out about people, maybe even a friend • I can’t be accused of leaking personal details, if I can’t see them Background and context
  • 8. In context • Five Safes: • safe people. • safe projects. • safe settings. • safe data. • safe outputs. • Appropriate separation of PII and attributes helps ensure safe data. Background and context
  • 10. PII Attributes Name Address Date of Birth Email Phone Number National Insurance No. NHS No. Employer Reference No. Company Name Sex Age at Post Code Income SIC SOC Qualification Number of employees Examples of PII
  • 11. Preparing data for analysis Forename Surname NINo Company ID Sex DOB Michael Mouse AB123456A Disney1 M 01/05/1928 NINo ADRID AB123456A XYZ123 ADRID Company ID Sex Age at 1/1/23 XYZ123 PQ7TH89U M 94 Supress Lookup Apply Hash Function Derive Variable Examples of PII
  • 13. Comparison of Legislation DPA/GDPR SRSA DEA Definition “Personal data” means any information relating to an identified or identifiable living individual … “personal information” means information which relates to and identifies a particular person (including a body corporate) … information is “personal information” if— (a)it relates to a particular person (including a body corporate), but (b)it is not information about the internal administrative arrangements of a public authority. Personal Information / data personal data personal information personal information Body Corporate    Deceased    The legal bit
  • 14. Bodies Corporate • If you are used to GDPR – the concept of protecting the identity of a corporate body may seem odd to you. But: • Sole traders are covered under GDPR, and • Corporate Bodies are explicitly covered under both the SRSA and DEA – so we have to avoid identifying them. • Bodies corporate definitely includes companies and charities, but • Schools, local authorities, government departments, etc. are also included under the SRSA, and may be covered under the DEA in some circumstances. • Best to include them as requiring protection of identity, • But under some specific circumstances it may be possible to share identifiers. The legal bit
  • 15. I see dead people • The GDPR explicitly refers to “living individual[s]” • The SRSA is interpreted to include dead people in scope • The DEA is not explicit, but it should be assumed they are covered • It is safest to assume the identity of dead people is protected • But death registrations are public • And the 100 year rule may apply (like for the Census) • So it may be possible to use identifiable data on dead people in specific circumstances. The legal bit
  • 17. Suppressing PII - Part of the story • Data made available to approved researchers are de-identified • Published data must be anonymous • Anonymisation is a high standard and has an explicit legal definition. • De-identification: The act of removing identifiers from data • Anonymous: “information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable.” (GDPR) Disclosure control
  • 18. Isn’t Pseudonymisation enough? • In short: NO! • GDPR defines pseudonymisation: “…the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” • And GDPR says “…Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person…” • Pseudonymisation is a risk reduction method only, which is good practice under certain circumstances. Disclosure control
  • 19. De-identification – a little more • De-identification may involve the removal of postcode or other small area identifiers (like output area) in order to ensure compliance with legislation or appropriate risk management. • De-identification may also require other measures, like record swapping, ‘blurring’ or rounding, to prevent identification. • For some variables removal of extreme outliers is required. • e.g. Income data ‘capping’ may be required - very high salaries can become identifiers. Disclosure control
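Capping and rounding, two of the measures named on slide 19, might look like the following. This is a minimal sketch: the cap of 150,000 and the rounding base of 1,000 are hypothetical values chosen for illustration, not thresholds from the deck or from any disclosure control guidance.

```python
import math

INCOME_CAP = 150_000   # hypothetical top-coding threshold
ROUNDING_BASE = 1_000  # hypothetical rounding base


def cap_value(value: float, cap: float) -> float:
    """Top-code a value so that extreme outliers no longer stand out."""
    return min(value, cap)


def round_to_base(value: float, base: int) -> int:
    """Round to the nearest multiple of base (halves round up)."""
    return base * math.floor(value / base + 0.5)


def protect_income(income: float, cap: float = INCOME_CAP,
                   base: int = ROUNDING_BASE) -> int:
    """Apply capping first, then rounding."""
    return round_to_base(cap_value(income, cap), base)


incomes = [18_250, 52_499, 1_250_000]
protected = [protect_income(x) for x in incomes]
print(protected)  # [18000, 52000, 150000]
```

The very high salary becomes indistinguishable from any other income at or above the cap, which is the point of the measure.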
  • 20. Other measures • To ensure compliance with legislation, and avoid (re)identification, other measures are required • Safe Projects avoid re-identification by avoiding toxic data mixes • Safe Settings help prevent combining data with other data to enable re-identification • The higher the risk – the more stringent the measure • These measures help to keep “safe data” • Disclosure control, as above, ensures safe outputs • Five Safes: • safe people. • safe projects. • safe settings. • safe data. • safe outputs. Disclosure control
  • 21. Publishing (Disclosure) – Issues to be aware of • Publishing requires data / information are anonymous • Re-identification must not be possible • Care with dominance • Especially for corporate bodies • Caution for small geographies or other groupings • Where one or two units provide the majority of a measure within a grouping • Aggregate tables • Sufficient aggregation required • Small values an issue • Specific requirements for some data sets • Summary statistics • Care with point values (max, min, etc) • Computed statistics or models • High detail may cause disclosure • Graphical output • Care with point values Disclosure control
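Two of the checks on slide 21, small values and dominance, can be sketched as a simple screen over an aggregate table. The thresholds here are assumptions made for the example (a minimum of 10 units per cell, and a cell flagged when its two largest units provide more than half of the total); real outputs would follow the rules set for the data set in question.

```python
from collections import defaultdict

MIN_COUNT = 10       # hypothetical minimum number of units per cell
TOP_N = 2            # "one or two units", as on the slide
MAX_TOP_SHARE = 0.5  # hypothetical: top units must not provide the majority


def assess_cells(records):
    """Group (group, value) pairs and flag cells that are risky to publish."""
    groups = defaultdict(list)
    for group, value in records:
        groups[group].append(value)

    report = {}
    for group, values in groups.items():
        total = sum(values)
        top_total = sum(sorted(values, reverse=True)[:TOP_N])
        too_small = len(values) < MIN_COUNT
        dominated = total > 0 and top_total / total > MAX_TOP_SHARE
        report[group] = {
            "count": len(values),
            "too_small": too_small,
            "dominated": dominated,
            # Withhold the total when either check fails.
            "publishable_total": None if (too_small or dominated) else total,
        }
    return report


records = (
    [("Area A", 10)] * 12                          # many similar units
    + [("Area B", 1000)] + [("Area B", 10)] * 11   # one dominant unit
    + [("Area C", 50)] * 3                         # too few units
)
report = assess_cells(records)
for area, result in report.items():
    print(area, result)
```

Only "Area A" passes both checks; "Area B" has enough units but one of them provides most of the total, and "Area C" has too few units.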
  • 22. Publishing – output can be achieved without identification • Individual case studies or examples are possible • As long as the identity of the individual is not discoverable (by the researcher or other parties) • Qualitative results can be achieved • And may avoid the identification issues that would occur by putting numbers on the results Disclosure control
  • 24. PII as a resource • We cannot share PII with Approved Researchers • But we can use it to help Researchers achieve their aims • It is entirely legitimate, and intended, that we process PII • Matching and joining data • The obvious way we can help • But not the end of the story … Using PII
  • 25. A brief aside Hashing: • The use of a cryptographic hash function to apply a one-way transformation of a string of characters to a fixed-length encoded string. • ‘One-way encryption of data’ • A secure and repeatable way of transforming text into a ‘random’ string that is practically irreversible. Using PII
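The properties listed on slide 25 (fixed length, repeatable, different output for different input) can be seen with Python's standard library. This is a sketch, not the method used by the author: SHA-256 and the keyed variant (HMAC) are assumptions made for illustration, and the key shown is a placeholder. A keyed hash is included because a plain hash of a structured identifier such as a NINo could in principle be reversed by hashing every possible value and comparing.

```python
import hashlib
import hmac


def plain_hash(text: str) -> str:
    """Unkeyed SHA-256 hash, returned as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def keyed_hash(text: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256); the output depends on a secret key."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()


first = plain_hash("AB123456A")
second = plain_hash("AB123456B")

print(len(first))                         # fixed length, whatever the input
print(first == plain_hash("AB123456A"))   # repeatable
print(first == second)                    # a one-character change alters the hash
print(keyed_hash("AB123456A", b"placeholder-key") == first)  # key changes output
```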
  • 26. Ways to use PII • Hashing an ID (e.g. NHS no.) so that data from different sources can be joined without using identifiers • Hashing to enable analysis by group (e.g. hash school-name, hospital, company, etc. to enable analysis at unit level without disclosing the unit – e.g. is the range of school performance large?) • Creation of derived variables from PII – e.g. Company name includes word “partner”, calculating weekday, where a date (e.g. DoB) cannot be shared • Applying algorithms to derive values, e.g. applying an algorithm derived from test or anonymised data to real data – e.g. textual analysis algorithms Using PII
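Two of the uses on slide 26, joining on a hashed ID and deriving variables from PII, can be sketched as follows. Everything here is hypothetical: the field names, the key, the dummy NHS number, and the choice to strip spaces before hashing are assumptions for the example rather than a described ONS process.

```python
import hashlib
import hmac
from datetime import date

KEY = b"hypothetical-shared-key"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def pseudonymise(identifier: str) -> str:
    """Hash an identifier after normalising it so formatting cannot break the join."""
    cleaned = identifier.replace(" ", "").upper()
    return hmac.new(KEY, cleaned.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(rows, id_field):
    """Replace the identifier in each row with its hash."""
    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != id_field}
        new_row["hashed_id"] = pseudonymise(row[id_field])
        result.append(new_row)
    return result


def join_on_hash(left, right):
    """Inner join of two de-identified sources on the hashed ID."""
    index = {row["hashed_id"]: row for row in right}
    return [{**row, **index[row["hashed_id"]]}
            for row in left if row["hashed_id"] in index]


def derive_from_pii(company_name: str, dob: date) -> dict:
    """Derive shareable attributes; the name and date themselves are not returned."""
    return {
        "name_has_partner": "partner" in company_name.lower(),
        "birth_weekday": WEEKDAYS[dob.weekday()],
    }


# Dummy data: the same person appears in both sources with different formatting.
health = [{"nhs_no": "943 476 5919", "admissions": 2}]
education = [{"nhs_no": "9434765919", "qualification": "Level 3"}]

joined = join_on_hash(deidentify(health, "nhs_no"),
                      deidentify(education, "nhs_no"))
derived = derive_from_pii("Mouse & Partners Ltd", date(1928, 5, 1))
print(joined)
print(derived)
```

Both sources must be hashed with the same key and the same normalisation for the join to work, which is one reason the hashing is done centrally rather than by researchers.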
  • 27. More ways to use PII • Measure of error or bias in data • Particularly linked data • Including error in identifiers like NINo • Hashing identifiers to enable frequency type analysis (e.g. does having a rare name correlate with higher salary?) • Correlation of PII and attribute – e.g. does forename correlate with a characteristic (e.g. ethnicity)? • Applying an imputed characteristic or proxy – using name or title to imply sex Using PII

Editor's Notes

  1. Attribute Data has a different meaning in “lean six sigma” methodology. Data Attributes is a distinct term used in coding.
  2. Illegal to share under the DEA, may be possible under a different gateway – under very specific circumstances.
  3. The word anonymisation is frequently misused – it is much more than just removing a name.
  4. Publishing includes removal from the safe environment. Full detail here is beyond scope – but the issue is relevant to the context of PII.
  5. Some of the examples may be a bit tenuous - but are intended to provoke ideas.