Data Privatisation, Data
Anonymisation, Data
Pseudonymisation and
Differential Privacy
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
Introduction, Purpose And Scope
• Present details on technology approaches to ensuring
compliance with data privacy
− Anonymisation
− Pseudonymisation
− Differential Privacy
January 4, 2022 2
Data Value
• Your data has value to your organisation and to third-parties
− The data was expensively obtained
− It represents a valuable asset on which a return must be generated
• Similarly third-party data can be used to augment your data to
increase its value
• To achieve the value inherent in the data you need to be able to
make it appropriately available to others, both within and
outside the organisation
• You need a process that enables you to make your data
available as widely as possible without exposing you to risks
associated with non-compliance with the wide range of
differing data privacy regulations
• You need a common approach and framework that works for all
data sharing while guaranteeing legislative and regulatory
compliance
Data Privatisation, Data Anonymisation, Data
Pseudonymisation and Differential Privacy
• Data privatisation is the removal of personally identifiable
information (PII) from data
• At a very high-level, data privatisation can be achieved in
one or both of two ways:
1. Data Summarisation – sets of individual data records are
compressed into summary statistics
2. Data Tokenisation – the personal data within a dataset that
allows an individual to be identified is replaced by a token
(possibly generated from the personal data such as by hashing),
either permanently (anonymisation) or reversibly
(pseudonymisation)
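The two routes above can be sketched in a few lines of Python. This is a minimal illustration only: the records, field layout and function names are invented for the example, and a random token stands in for whatever token scheme is actually used.

```python
import secrets
from statistics import mean

# Hypothetical records: (name, age, spend). All values are invented.
records = [
    ("Alice Example", 34, 120.0),
    ("Bob Example", 41, 80.0),
    ("Carol Example", 29, 100.0),
]

def summarise(rows):
    """Data summarisation: collapse individual rows into summary statistics."""
    return {
        "count": len(rows),
        "mean_age": mean(r[1] for r in rows),
        "total_spend": sum(r[2] for r in rows),
    }

def tokenise(rows):
    """Data tokenisation: replace the identifying field with a random token.

    Returns the tokenised rows and a lookup table (token -> name). Keeping the
    lookup table makes the replacement reversible (pseudonymisation);
    discarding it makes the replacement permanent (anonymisation).
    """
    lookup = {}
    out = []
    for name, age, spend in rows:
        token = secrets.token_hex(8)
        lookup[token] = name
        out.append((token, age, spend))
    return out, lookup

summary = summarise(records)
tokenised, lookup = tokenise(records)
```

Only `summary` or `tokenised` would be shared; `lookup` is either stored securely elsewhere or destroyed.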
Data Privacy And Risk Management
• The concept of risk is at the core of data protection regulations and
legislation
• GDPR contains many references to risk
− For example, GDPR encourages pseudonymisation as a means to “reduce the risks
to the data subjects”
• Appropriate technology appropriately implemented and operated is a
means of managing and reducing risks of re-identification by making the
time, skills, resources and money necessary to achieve this unrealistic
• Accepting that there will always be a residual risk of re-identification even
where data is protected by pseudonymisation, anonymisation and
differential privacy is a realistic approach in light of technology changes
and development
• A demonstrable technology-based approach to data privacy reduces an
organisation’s liability in the event of data breaches
− For example, where a data breach occurs, the controller is exempted from its
notification obligations where it can show that the breach is “unlikely to result in a
risk to the rights and freedoms of natural persons”, such as when pseudonymised
data leaks and the re-identification risk is remote
Data Privatisation – Anonymisation,
Pseudonymisation And Differential Privacy
[Diagram: source data is transformed into summarised data (differential privacy), anonymised data (anonymisation), or pseudonymised data plus a separately stored pseudonymisation key (pseudonymisation)]
• There are different routes to making data accessible and shareable
within and outside the organisation without compromising
compliance with data protection legislation and regulations, while
reducing the risk associated with allowing access to personal data
− Differential Privacy – source data is summarised and individual personal
references are removed
• The one-to-one correspondence between original and transformed data has been
removed
− Anonymisation – identifying data is destroyed and cannot be recovered so
individual cannot be identified
• There is still a one-to-one correspondence between original and transformed data
− Pseudonymisation – identifying data is encrypted and recovery data/token is
stored securely elsewhere
• There is still a one-to-one correspondence between original and transformed data
• These technologies and approaches are not mutually exclusive – each
is appropriate to differing data sharing and data access use cases
Data Privatisation Balance
• Perfect data privacy can be achieved by not sharing or making
accessible any data irrespective of whether it contains personal
identifiable information
− Data is unused
• Perfect data utility can be achieved by sharing and making accessible
all data
− There is no data privacy
• There is a need for a risk-based balancing act between data utility and
data privacy
Anonymisation And Pseudonymisation
• Functionally, anonymisation and pseudonymisation are closely
related
− Anonymisation deletes identifying data or replaces it with
unconnected data, and destroys the link with the original data
− Pseudonymisation replaces identifying data with derived data or a
token and stores the link between original and pseudonymised data
• A similar set of attacks that can be directed against
pseudonymised data can be applied to anonymised data:
− Mosaic attacks
− Differencing attacks
− Reconstruction attacks
Context Of Data Privatisation – Anonymisation,
Pseudonymisation And Differential Privacy
[Diagram: four interrelated areas and the links between them]
• Data Privacy Laws and Regulations – lots of these and increasing in number and complexity
• Technologies – mature secure data sharing and exchange technologies
• Value in Data Volumes and Data Assets – need to get value from expensively generated and collected data
• Data Processes and Business Data Trends – data needs to be shared with outsourcing and business partners
• Links between the areas:
− Compliance with laws and regulations acting as an inhibitor to achieving data value
− Technologies can embed compliance
− Technologies allow data to be shared and value to be realised
− Sharing data allows its value to be realised
Data Privatisation Topology – Data Privacy Laws and
Regulations
[Diagram: the Data Privacy Laws and Regulations area of the topology]
• Examples of data privacy laws and regulations:
− General Data Protection Regulation (EU)
− Personal Data Protection (Amendment) Act (Singapore)
− Lei Geral de Proteção de Dados (Brasil)
− California Consumer Privacy Act (CCPA)
− California Privacy Rights Act (CPRA)
− Protection of Personal Information Act (South Africa)
• The landscape of data protection and privacy legislation and regulations is
extensive, complex and growing – this is just a partial and incomplete view
• Organisations that share data externally need to be able to guarantee
compliance with all relevant and applicable legislation
Data Privatisation Topology – Value in Data
Volumes and Data Assets
[Diagram: the Value in Data Volumes and Data Assets area of the topology]
• Data volumes and data assets are growing in size and complexity
• Need to share data more widely for research purposes
• Open Data initiatives
• Need to get more value from data assets
• Organisations have more and more data of increasing complexity that
they want and need to share in order to generate value
Data Privatisation Topology – Technologies
[Diagram: the Technologies area of the topology]
• Pseudonymisation
• Deidentification
• Anonymisation
• Differential Privacy
• There are a range of well-proven technologies available for ensuring data privacy
Data Privatisation Topology – Data Processes and
Business Data Trends
[Diagram: the Data Processes and Business Data Trends area of the topology]
• Business process outsourcing
• Third-party analytics platforms and services
• Data sharing and data value extraction
• Organisations want to outsource their business processes and share their data
with partners to gain access to specialist analytics and research skills and tools
Context Of Data Privatisation – Anonymisation,
Pseudonymisation And Differential Privacy
• Value in Data Volumes and Data Assets – organisations have expended substantial resources in
gathering, processing and generating data
− This data has value that you want to realise by making it more widely available within and outside the
organisation
− The need to comply with the increasing body of data protection and privacy laws inhibits your ability to
achieve this
− Organisations are frequently data rich and information poor, lacking the skills, experience and resources
to convert raw data into value
• Data Privacy Laws and Regulations – you need to ensure that making your data available to a
wider range of individuals and organisations does not breach the ever-increasing set of data
protection and privacy legislation and regulations
− All too frequently the cost of and concerns around ensuring this compliance prevent this wider data
access
− The default approach is not to allow data access and data sharing
− Each data access and data sharing use case has to establish the need separately and independently
• Technologies – data anonymisation, pseudonymisation and differential privacy technologies are
mature, well-proven, industrialised and are independently certified
− They can be used to provide controlled, secure access to your data while guaranteeing compliance with
data protection and privacy legislation
− Using these technologies will embed such compliance by design into your data sharing and access facilities
− This will allow you to realise value from your data successfully
• Data Processes and Business Data Trends – third-party data access and sharing, business
process and other outsourcing activities require data sharing and third-party data access
− Technologies can enable secure data sharing
Data Trends
[Diagram: data trends]
• Broader and more extended data landscape
• More data capabilities and technologies available (especially cloud-based) and being looked for
• More data demands from business organisation
• More data types, entities and greater data landscape complexity
• Continuously changing data
• Varying data accuracy and uncertainty
• Greater data volumes
• Different times and rates of data generation
• Wider range of data contents
• More data sources
• More data formats
• Differing data value and utility
• Increasing complexity and pervasiveness of data privacy regulations
• Greater outsourcing and associated need for data sharing
Data Sharing And Third-Party Data Access Use Cases
• There are many data sharing use cases and scenarios that involve the
sharing of potentially personally identifiable information, such as:
1. Share data with other business functions within your organisation
2. Use third-party data processing and storage platform and facilities
3. Use third-party data access and sharing as a service platform and facilities
4. Use third-party data analytics platform and facilities
5. Engage third-party data research organisations to provide specialist services
6. Share data with external researchers
7. Outsource business processes and enable data sharing with third parties
8. Share data with industry business partners to gain industry insights
9. Share data to detect and avoid fraud
10. Share customer data with service providers at the request of the customer
11. Enable customer switching
12. Participate in Open Data initiatives
All These Data Trends Mean ...
• … We need a mechanism to industrialise and
operationalise the implementation of data privatisation
• That is proven, reliable and secure
• That is applied consistently and pervasively
• That does not need separate privacy impact assessments
and lengthy compliance checks before datasets containing
personal information can be used
Data Privacy By Design And By Default
• Pseudonymisation and differential privacy are proven
technologies that are already in use by large organisations
for data sharing while guaranteeing data privacy
Data Sharing And Data Privacy Is More Than A
Technology Issue
• There is a wider operational data sharing and data privacy
framework that includes technology aspects, among other key
areas
Data Sharing and Access Framework
• Business and Strategy Dimension
− Overall Objectives, Purposes and Goals
− Data Sharing Strategy
− Risk Management, Governance and Decision Making
− Charges and Payments
− Monitoring and Reporting
• Legal Dimension
− Data Privacy Legislation and Regulation Compliance
− Contract Development and Compliance
• Technology Dimension
− Data Sharing and Data Access Technology Selection
− Technology Standards Monitoring and Compliance
− Security Standards Monitoring and Compliance
• Development and Implementation Dimension
− Technology Platform and Toolset Selection and Implementation
− Functionality Model Development and Implementation
− Data Sharing and Access Implementations
− Data Sharing and Access Maintenance and Support
• Service Management Dimension
− Service Management Processes
− Operational and Service Level Agreement Management
− Maintain Inventory of Data Sharing Arrangements
− Service Monitoring and Reporting
− Issue Handling and Escalation
Data Sharing And Data Privacy Is More Than A
Technology Issue
• Having an overall data privacy management strategy
including a comprehensive data sharing and access
framework is part of the risk management approach
referred to earlier
Data Breaches And Attacks
• Unlike other attack scenarios, a key concern with data
access and sharing arrangements is that the entity being
provided with legitimate access to the data is the attacker
or the data access control arrangements at the entity are
weak
Pseudonymisation
• Pseudonymisation is an approach to deidentification where
personally identifiable information (PII) values are replaced
by tokens or artificial identifiers – pseudonyms
• Pseudonymisation is one technique to assist compliance
with EU General Data Protection Regulation (GDPR)
requirements for secure storage of personal information
• Pseudonymisation is intended to be reversible – the
pseudonymised data can be restored to its original state
Pseudonymisation – Field Level Transformation Or
Tokenisation
• Personal data fields can be individually pseudonymised so there is a
one-to-one correspondence between original source data fields and
transformed data fields or the personal data fields can be removed
and replaced with a token
• IDAT = Identifying Data
• ADAT = Analysis Data
• PIDAT = Pseudonymised Identifying Data

Original Data
Record  IDAT      ADAT      IDAT      ADAT      IDAT      IDAT      ADAT      ADAT
1       IDAT1.1   ADAT1.1   IDAT2.1   ADAT2.1   IDAT3.1   IDAT4.1   ADAT3.1   ADAT4.1
2       IDAT1.2   ADAT1.2   IDAT2.2   ADAT2.2   IDAT3.2   IDAT4.2   ADAT3.2   ADAT4.2
3       IDAT1.3   ADAT1.3   IDAT2.3   ADAT2.3   IDAT3.3   IDAT4.3   ADAT3.3   ADAT4.3
4       IDAT1.4   ADAT1.4   IDAT2.4   ADAT2.4   IDAT3.4   IDAT4.4   ADAT3.4   ADAT4.4
5       IDAT1.5   ADAT1.5   IDAT2.5   ADAT2.5   IDAT3.5   IDAT4.5   ADAT3.5   ADAT4.5
6       IDAT1.6   ADAT1.6   IDAT2.6   ADAT2.6   IDAT3.6   IDAT4.6   ADAT3.6   ADAT4.6
7       IDAT1.7   ADAT1.7   IDAT2.7   ADAT2.7   IDAT3.7   IDAT4.7   ADAT3.7   ADAT4.7
8       IDAT1.8   ADAT1.8   IDAT2.8   ADAT2.8   IDAT3.8   IDAT4.8   ADAT3.8   ADAT4.8
9       IDAT1.9   ADAT1.9   IDAT2.9   ADAT2.9   IDAT3.9   IDAT4.9   ADAT3.9   ADAT4.9
10      IDAT1.10  ADAT1.10  IDAT2.10  ADAT2.10  IDAT3.10  IDAT4.10  ADAT3.10  ADAT4.10

Option 1 - Pseudonymisation Of All Original Identifying Data
Record  PIDAT      ADAT      PIDAT      ADAT      PIDAT      PIDAT      ADAT      ADAT
1       PIDAT1.1   ADAT1.1   PIDAT2.1   ADAT2.1   PIDAT3.1   PIDAT4.1   ADAT3.1   ADAT4.1
2       PIDAT1.2   ADAT1.2   PIDAT2.2   ADAT2.2   PIDAT3.2   PIDAT4.2   ADAT3.2   ADAT4.2
3       PIDAT1.3   ADAT1.3   PIDAT2.3   ADAT2.3   PIDAT3.3   PIDAT4.3   ADAT3.3   ADAT4.3
4       PIDAT1.4   ADAT1.4   PIDAT2.4   ADAT2.4   PIDAT3.4   PIDAT4.4   ADAT3.4   ADAT4.4
5       PIDAT1.5   ADAT1.5   PIDAT2.5   ADAT2.5   PIDAT3.5   PIDAT4.5   ADAT3.5   ADAT4.5
6       PIDAT1.6   ADAT1.6   PIDAT2.6   ADAT2.6   PIDAT3.6   PIDAT4.6   ADAT3.6   ADAT4.6
7       PIDAT1.7   ADAT1.7   PIDAT2.7   ADAT2.7   PIDAT3.7   PIDAT4.7   ADAT3.7   ADAT4.7
8       PIDAT1.8   ADAT1.8   PIDAT2.8   ADAT2.8   PIDAT3.8   PIDAT4.8   ADAT3.8   ADAT4.8
9       PIDAT1.9   ADAT1.9   PIDAT2.9   ADAT2.9   PIDAT3.9   PIDAT4.9   ADAT3.9   ADAT4.9
10      PIDAT1.10  ADAT1.10  PIDAT2.10  ADAT2.10  PIDAT3.10  PIDAT4.10  ADAT3.10  ADAT4.10

Option 2 - Pseudonymisation Of Partial Original Identifying Data and Removal of Other Identifying Data and Their Replacement by a Token
Record  PIDAT      ADAT     ADAT     ADAT     ADAT
1       PIDAT1.1   ADAT11   ADAT11   ADAT11   ADAT11
2       PIDAT1.2   ADAT12   ADAT12   ADAT12   ADAT12
3       PIDAT1.3   ADAT13   ADAT13   ADAT13   ADAT13
4       PIDAT1.4   ADAT14   ADAT14   ADAT14   ADAT14
5       PIDAT1.5   ADAT15   ADAT15   ADAT15   ADAT15
6       PIDAT1.6   ADAT16   ADAT16   ADAT16   ADAT16
7       PIDAT1.7   ADAT17   ADAT17   ADAT17   ADAT17
8       PIDAT1.8   ADAT18   ADAT18   ADAT18   ADAT18
9       PIDAT1.9   ADAT19   ADAT19   ADAT19   ADAT19
10      PIDAT1.10  ADAT110  ADAT110  ADAT110  ADAT110
GDPR Origin Of Pseudonymisation
• “Pseudonymisation” is defined in Article 4(5) of the GDPR
• It means the processing of personal data in such a manner that the personal data can no longer
be attributed to a specific data subject without the use of additional information, provided that
such additional information is kept separately and is subject to technical and organisational
measures to ensure that the personal data are not attributed to an identified or identifiable
natural person
• Article 29 Working Party:
− “pseudonymisation is not a method of anonymisation. It merely reduces the linkability of a dataset with
the original identity of a data subject, and is accordingly a useful security measure.”
• Encryption is a form of pseudonymisation
− The original data cannot be read
− The process cannot be reversed without the correct decryption key
− GDPR requires that this additional information be kept separate from the pseudonymised data.
• Pseudonymisation reduces risks associated with data loss or unauthorised data access
− Pseudonymised data is still regarded as personal data and so remains covered by the GDPR
− It is viewed as part of the Data Protection By Design and By Default principle
• Pseudonymisation is not mandatory
− Implementing pseudonymisation with existing IT systems and processes would be complex and expensive
and, to that extent, pseudonymisation might be considered an example of unnecessary complexity within
the GDPR
GDPR Origin Of Pseudonymisation
• GDPR Recital 26
− The principles of data protection should apply to any information concerning an identified or
identifiable natural person. Personal data which have undergone pseudonymisation, which
could be attributed to a natural person by the use of additional information should be
considered to be information on an identifiable natural person. To determine whether a natural
person is identifiable, account should be taken of all the means reasonably likely to be used,
such as singling out, either by the controller or by another person to identify the natural person
directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the
natural person, account should be taken of all objective factors, such as the costs of and the
amount of time required for identification, taking into consideration the available technology at
the time of the processing and technological developments. The principles of data protection
should therefore not apply to anonymous information, namely information which does not
relate to an identified or identifiable natural person or to personal data rendered anonymous in
such a manner that the data subject is not or no longer identifiable. This Regulation does not
therefore concern the processing of such anonymous information, including for statistical or
research purposes.
• Pseudonymisation is not anonymisation
− Anonymisation means data cannot be attributed to a person
− Pseudonymisation means data can be attributed to a person using additional information
− Pseudonymisation just makes identifying persons from data more difficult, time-consuming and
expensive
GDPR Origin Of Pseudonymisation
• Article 89 (1): as a means of enhancing protection in case
of further use of data for research and statistics
• Article 6 (4): as a means of possibly contributing to the
compatibility of further use of data
• Article 25: as a means to contribute to “privacy by design”
in data applications
• Recital 28: “The application of pseudonymisation to
personal data can reduce the risks to the data subjects
concerned and help controllers and processors to meet
their data-protection obligations. The explicit introduction
of ‘pseudonymisation’ in this Regulation is not intended to
preclude any other measures of data protection.”
Why Pseudonymise?
• Personal identifiable data is pseudonymised when there is
a need to re-identify the data, for example, after it has
been worked on by a third-party
• Steps:
1. Original data
2. Pseudonymised data
3. Pseudonymisation key
4. Pseudonymised data transmitted to data processor
5. Processed data with additional processed data
6. Pseudonymised data with additional processed data returned
7. Original data merged with additional processed data
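The seven steps above can be sketched as a round trip. This is a minimal illustration, assuming a random token as the pseudonym and an in-memory dictionary as the pseudonymisation key; the record fields, the `third_party_process` function and its "risk band" output are hypothetical.

```python
import secrets

# Step 1: original data (hypothetical values).
original = [
    {"name": "Alice Example", "balance": 1200},
    {"name": "Bob Example", "balance": 300},
]

# Steps 2-3: pseudonymised data plus a pseudonymisation key kept in-house.
key = {}            # token -> original identifying value; never leaves the organisation
pseudonymised = []
for row in original:
    token = secrets.token_hex(8)
    key[token] = row["name"]
    pseudonymised.append({"token": token, "balance": row["balance"]})

# Steps 4-6: the processor sees only tokens and returns them with added data.
def third_party_process(rows):
    # Hypothetical processing: derive a risk band from the balance.
    return [{**r, "risk_band": "low" if r["balance"] >= 1000 else "high"} for r in rows]

returned = third_party_process(pseudonymised)

# Step 7: merge the returned data with the original data using the key.
merged = [
    {"name": key[r["token"]], "balance": r["balance"], "risk_band": r["risk_band"]}
    for r in returned
]
```

The processor never receives `key`, so the names can only be re-attached inside the organisation.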
Why Pseudonymise?
• Pseudonymisation is necessary when, for example, data is sent to
another entity, either within or outside the organisation, for processing
and the results of the processing need to be matched to the original data
[Diagram: steps 1 to 7 of the pseudonymisation workflow, divided between “Within Organisation” and “Outside Organisation”, with “Data Access/Data in Transit” between them]
Pseudonymisation And Data Breaches
[Diagram: points within and outside the organisation at which a data breach can occur]
• Data breach of pseudonymised data (marked at two points in the diagram, including outside the
organisation) – risk of reidentification of the pseudonymised data should be low
• Data breach of the pseudonymisation key – risk of reidentification of the pseudonymised data
will be high
• Data breach of the pseudonymisation algorithm – risk of reidentification of the pseudonymised
data will be high
Pseudonymisation And Data Breaches
• Pseudonymisation does not prevent data breaches but it
can significantly reduce the possibility of any identification
of individuals within the data
• The depseudonymisation key and the algorithm used
including seed values must be kept secure
Approach To Pseudonymisation
• The approach to pseudonymisation depends on the format
of the source data: text file, spreadsheet, database
• Pseudonymisation is a field-level activity – it is designed to
leave non-personal identifiable information unchanged
• You may want to implement an approach where all data is
converted to a common format before pseudonymisation
to ensure consistency
Growing Importance Of Pseudonymisation
• Schrems II judgement
− https://curia.europa.eu/juris/document/document.jsf?text=&docid=228677&pageIndex=0&doclang=en
• This increased the importance of pseudonymisation in relation to data transfers
outside the EU
− Judgement found that the US FISA (Foreign Intelligence Surveillance Act) does not respect the
minimum safeguards resulting from the principle of proportionality and cannot be regarded as
limited to what is strictly necessary
− While the changes apply to transfers outside the EU, especially the US, they can be adopted
pervasively to all data transfers to ensure consistency
• The European Data Protection Board (EDPB) adopted version 2 of its recommendations
on supplementary measures to enhance data transfer arrangements to ensure
compliance with EU personal data protection requirements
− https://edpb.europa.eu/system/files/2021-06/edpb_recommendations_202001vo.2.0_supplementarymeasurestransferstools_en.pdf
• Pseudonymisation must ensure that:
− Data is protected at the record and data set level as well as the field level so that the protection
travels with the data wherever it is sent
− Direct, indirect, and quasi-identifiers of personal information are protected
− Mosaic effect re-identification attacks are prevented by adding high levels of uncertainty to
pseudonymisation techniques
Approaches To Pseudonymisation
• Replace IDAT Fields With Linking Identifier
• Hashing
− Hash IDAT Fields
− Hash IDAT Fields With Additional Salting/Peppering
− Generate Hash From All Contents
Pseudonymisation By Replacing ID Fields With
Linking Identifier (Token)
• Replaces identifying data with
random value that can be
made available
• Creates a separate, non-accessible
set of data that links the random
value to the original record
• Original record can be
retrieved using the identifier
• The approach to
pseudonymisation must be
kept secure
• The depseudonymisation key
data must be kept secure
Source Data
Record ID  Personal Data  Analytic Data
1          IDAT1          ADAT1
2          IDAT2          ADAT2
3          IDAT3          ADAT3
4          IDAT4          ADAT4
5          IDAT5          ADAT5
6          IDAT6          ADAT6
7          IDAT7          ADAT7
8          IDAT8          ADAT8
9          IDAT9          ADAT9
10         IDAT10         ADAT10

Distributed Data (Pseudonymised Data)
Pseudonymised Personal Data Identifier  Analytic Data
189                                     ADAT1
157                                     ADAT2
189                                     ADAT3
252                                     ADAT4
271                                     ADAT5
174                                     ADAT6
196                                     ADAT7
144                                     ADAT8
232                                     ADAT9
210                                     ADAT10

Depseudonymisation Key (Additional Data Stored To Allow Recovery Of Personal Data)
Record ID  Pseudonymised Personal Data Identifier  Personal Data
1          189                                     IDAT1
2          157                                     IDAT2
3          189                                     IDAT3
4          252                                     IDAT4
5          271                                     IDAT5
6          174                                     IDAT6
7          196                                     IDAT7
8          144                                     IDAT8
9          232                                     IDAT9
10         210                                     IDAT10
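A minimal sketch of the linking-identifier approach, assuming random numeric identifiers drawn from Python's `secrets` module. The function names and the choice to redraw on a collision (so that every identifier maps back to exactly one record) are assumptions made for the example, not part of the slide.

```python
import secrets

def pseudonymise_with_tokens(rows):
    """rows: list of (record_id, idat, adat). Returns (distributed, key_table)."""
    used = set()
    distributed = []   # safe to share: identifier + analytic data only
    key_table = []     # depseudonymisation key: must be stored securely
    for record_id, idat, adat in rows:
        token = secrets.randbelow(10**9)
        while token in used:      # redraw so each identifier maps to one record
            token = secrets.randbelow(10**9)
        used.add(token)
        distributed.append((token, adat))
        key_table.append((record_id, token, idat))
    return distributed, key_table

def recover(token, key_table):
    """Look up the original record ID and personal data for an identifier."""
    for record_id, t, idat in key_table:
        if t == token:
            return record_id, idat
    raise KeyError(token)

source = [(i, f"IDAT{i}", f"ADAT{i}") for i in range(1, 11)]
distributed, key_table = pseudonymise_with_tokens(source)
```

Because the identifier is random rather than derived from the personal data, it carries no information about the original value; recovery depends entirely on the key table.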
Pseudonymisation By Replacing ID Fields With
Linking Identifier – Multiple ID Fields
• Replaces identifying
data with random
value that can be
made available
• Multiple sets of
identifying data can be
removed and replaced
with single identifier
Source Data
Record ID  Personal Data 1  Personal Data 2  Personal Data 3  Analytic Data
1          IDAT1.1          IDAT2.1          IDAT3.1          ADAT1
2          IDAT1.2          IDAT2.2          IDAT3.2          ADAT2
3          IDAT1.3          IDAT2.3          IDAT3.3          ADAT3
4          IDAT1.4          IDAT2.4          IDAT3.4          ADAT4
5          IDAT1.5          IDAT2.5          IDAT3.5          ADAT5
6          IDAT1.6          IDAT2.6          IDAT3.6          ADAT6
7          IDAT1.7          IDAT2.7          IDAT3.7          ADAT7
8          IDAT1.8          IDAT2.8          IDAT3.8          ADAT8
9          IDAT1.9          IDAT2.9          IDAT3.9          ADAT9
10         IDAT1.10         IDAT2.10         IDAT3.10         ADAT10

Distributed Data (Pseudonymised Data)
Pseudonymised Personal Data Identifier  Analytic Data
189                                     ADAT1
157                                     ADAT2
189                                     ADAT3
252                                     ADAT4
271                                     ADAT5
174                                     ADAT6
196                                     ADAT7
144                                     ADAT8
232                                     ADAT9
210                                     ADAT10

Depseudonymisation Key (Additional Data Stored To Allow Recovery Of Personal Data)
Record ID  Pseudonymised Personal Data Identifier
1          189
2          157
3          189
4          252
5          271
6          174
7          196
8          144
9          232
10         210
ID Field Hashing Pseudonymisation
• Replaces identifying data with
a hash code of the data
− SHA3-512(IDAT1) =
576c23e0ec773508ae7a03d1b286d75f3a7cfe524625b658a1961d3fa7b0ebb4cc01b3b530c634c9525631614ad3ebcb3afb69d33e5d8608a1587c2f43c16535
• Input identifying data cannot be
recalculated from the hash
directly
• Hash values of candidate inputs
can be easily calculated (“brute
force” attack) and compared to
pseudonymised values to
recover the original
identifying data
Source Data
Record ID  Personal Data  Analytic Data
1          IDAT1          ADAT1
2          IDAT2          ADAT2
3          IDAT3          ADAT3
4          IDAT4          ADAT4
5          IDAT5          ADAT5
6          IDAT6          ADAT6
7          IDAT7          ADAT7
8          IDAT8          ADAT8
9          IDAT9          ADAT9
10         IDAT10         ADAT10

Distributed Data (Pseudonymised Data)
Pseudonymised SHA3-512 Personal Data Identifier  Analytic Data
576c23e0ec…2f43c16535                            ADAT1
851e96103a…af098faa80                            ADAT2
2c0efa26c6…16d0e11e7a                            ADAT3
8d189e9f9d…5b536f446e                            ADAT4
fd3d9477f3…6ff3971823                            ADAT5
f056988e7e…672729e376                            ADAT6
7421c6c952…c6c7aef649                            ADAT7
e271bcb565…838f34f2d0                            ADAT8
418830d8b4…5afb7ae575                            ADAT9
f90de46242…ab093b5ee5                            ADAT10

Depseudonymisation Key (Additional Data Stored To Allow Recovery Of Personal Data)
Record ID  Pseudonymised SHA3-512 Personal Data Identifier
1          576c23e0ec…2f43c16535
2          851e96103a…af098faa80
3          2c0efa26c6…16d0e11e7a
4          8d189e9f9d…5b536f446e
5          fd3d9477f3…6ff3971823
6          f056988e7e…672729e376
7          7421c6c952…c6c7aef649
8          e271bcb565…838f34f2d0
9          418830d8b4…5afb7ae575
10         f90de46242…ab093b5ee5
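A minimal sketch of ID field hashing using `hashlib.sha3_512` from the Python standard library. The UTF-8 encoding of the input is an assumption, so the digests produced here are not expected to match the sample values shown on the slide.

```python
import hashlib

def hash_idat(idat: str) -> str:
    """Return the SHA3-512 digest of an identifying value as 128 hex characters."""
    return hashlib.sha3_512(idat.encode("utf-8")).hexdigest()

source = [(i, f"IDAT{i}", f"ADAT{i}") for i in range(1, 11)]

# Distributed data: hash in place of the personal data, analytic data unchanged.
distributed = [(hash_idat(idat), adat) for _, idat, adat in source]

# Depseudonymisation key: record ID against hash, stored securely.
key_table = [(record_id, hash_idat(idat)) for record_id, idat, _ in source]
```

The same input always yields the same digest, which is what allows the key table to be matched to the distributed data, and also what makes the brute-force comparison described above possible.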
Hashing And Identifier Codes
• If any of the IDAT fields contains a recognisable identifier code
then brute force hash attacks are very feasible, even with
modest computing resources
• Identifying data tends to be more structured than other data
• For example, consider an identifier code with a format such as:
− AAA-NNN-NNN-C
• Where
− A is an upper-case alphabetic character
− N is a number from 0-9
− C is a check character
• There are 17,576,000,000 possible combinations of this sample
identifier code – this may appear to be a large number
• A single high-specification PC could calculate all the SHA3-512
hash values for these combinations in a few hours
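The arithmetic and the attack can be sketched as follows. The count assumes the check character is fully determined by the other characters. The check-character rule itself (`check_char`) is hypothetical, and the search is restricted to the last three digits so that the example runs instantly; a real attack would enumerate the full space in the same way.

```python
import hashlib
import itertools
import string

# Keyspace of AAA-NNN-NNN-C: 26^3 letter combinations x 10^6 digit combinations.
# Assumes the check character C adds no extra combinations.
combinations = 26**3 * 10**6   # 17,576,000,000

def check_char(body: str) -> str:
    """Hypothetical check character: sum of character codes modulo 10."""
    return str(sum(ord(c) for c in body) % 10)

def sha3(value: str) -> str:
    return hashlib.sha3_512(value.encode("utf-8")).hexdigest()

def brute_force(target_hash: str, prefix: str):
    """Enumerate a reduced space: known 'AAA-NNN-' prefix, unknown final NNN."""
    for digits in itertools.product(string.digits, repeat=3):
        body = prefix + "".join(digits)
        candidate = f"{body}-{check_char(body)}"
        if sha3(candidate) == target_hash:
            return candidate
    return None

body = "ABC-123-456"
secret_id = f"{body}-{check_char(body)}"
found = brute_force(sha3(secret_id), "ABC-123-")
```

The attacker needs only the hash and knowledge of the identifier format; no key or secret is involved, which is why unsalted hashing of structured identifiers offers weak protection.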
ID Field Hashing Pseudonymisation With Data Salting
And Peppering
• Salt is an additional different data item added to each
identifying data item before hashing
• Pepper is a fixed item of data added to record or field level
data before hashing
• HASH(CONCATENATE(IDATi+SALTi+PEPPER)) = Hashed
Identifying Data
− SHA3-512(CONCATENATE(IDAT1+SALT1+PEPPER)) =
3fa075114200b2327092f18067059ba81a5b191b33d5a10a2042673adcb119fac4dc5d3f63c60d44e132f4db5996d416fd70216d4e055f1e5ccc0258ff15e1e1
• This approach eliminates almost all the risk from brute
force hash generation attacks unless the approach to
generating Salt and Pepper can be determined
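A minimal sketch of HASH(CONCATENATE(IDAT+SALT+PEPPER)), assuming byte-string salts and a byte-string pepper generated with Python's `secrets` module. The salt and pepper lengths and the function name are choices made for the example.

```python
import hashlib
import secrets

# Pepper: one fixed secret for the whole dataset, stored separately from the data.
PEPPER = secrets.token_bytes(32)

def salted_peppered_hash(idat: str, salt: bytes, pepper: bytes = PEPPER) -> str:
    """SHA3-512 over the identifying value concatenated with its salt and the pepper."""
    return hashlib.sha3_512(idat.encode("utf-8") + salt + pepper).hexdigest()

# Salt: a different random value for each identifying data item.
salt1 = secrets.token_bytes(16)
salt2 = secrets.token_bytes(16)

h1 = salted_peppered_hash("IDAT1", salt1)
h1_again = salted_peppered_hash("IDAT1", salt1)       # same salt: same hash
h1_other_salt = salted_peppered_hash("IDAT1", salt2)  # different salt: different hash
```

The salts and the pepper become part of the depseudonymisation key: without them the hashes cannot be regenerated, and with them they can, so they need the same protection as the key table.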
ID Field Hashing Pseudonymisation With Data Salting
And Peppering – Example
• One possible approach is to use a cryptographically secure
pseudo random number generator (CSPRNG) to generate salt
values such as:
− Fortuna - https://www.schneier.com/academic/fortuna/
• PCG - https://www.pcg-random.org/ - is a fast, statistically strong
general-purpose PRNG but it is not designed to be cryptographically
secure, so it is less suitable for generating salt values
• Other less secure PRNGs are vulnerable to attacks
• Using a CSPRNG ensures that the random salt values are very difficult
to determine which in turn makes brute force attacks
virtually impossible
− HASH(CONCATENATE(IDAT1+1144360296176+2356573852518))
− HASH(CONCATENATE(IDAT2+4700182946372+2356573852518))
− HASH(CONCATENATE(IDAT3+1112492458021+2356573852518))
− HASH(CONCATENATE(IDAT4+2755842713752+2356573852518))
− HASH(CONCATENATE(IDAT5+6908485085952+2356573852518))
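A minimal sketch of generating salt and pepper values of the kind shown above, assuming Python's `secrets` module, which draws from the operating system's cryptographically secure random source. The 13-digit decimal format mirrors the sample values; the function name is hypothetical.

```python
import secrets

def generate_salts(record_ids, digits=13):
    """One decimal salt per record, drawn from the OS CSPRNG via `secrets`."""
    return {rid: secrets.randbelow(10**digits) for rid in record_ids}

pepper = secrets.randbelow(10**13)     # one fixed value for the whole dataset
salts = generate_salts(range(1, 6))    # one value per record

# Inputs to the hash function: IDAT + zero-padded salt + zero-padded pepper.
inputs = [f"IDAT{rid}{salts[rid]:013d}{pepper:013d}" for rid in salts]
```

Python's general-purpose `random` module is not intended for security use, which is why `secrets` is used here.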
ID Field Hashing Pseudonymisation With Data Salting
And Peppering – Attacks
• HASH(CONCATENATE(IDAT1+1144360296176+2356573852518)) =
c47b08542113284a426e5db9fc19203cc2e464f600005e7fb00e2e2362088d5107b993cb141696887c17464aaa8d2e99b0aceff3421d942ae355ae7cedbfe888
• To identify the value that generated the hash with a brute force
attack you would have to:
1. Know the structure of the identifying data in order to permute its values
2. Know the PRNG algorithm and its seed values or the individual salt value
associated with a record
3. Know the pepper value
ID Field Hashing Pseudonymisation With Data Salting
And Peppering
• Replaces identifying data
with a hash code of the
data
• Input identifying data cannot
be recalculated from the
hash directly
• Hash values of candidate inputs
cannot be easily calculated
(“brute force” attack) and
compared to
pseudonymised values
to recover the original
identifying data
Source Data
Record ID  Personal Data  Analytic Data
1          IDAT1          ADAT1
2          IDAT2          ADAT2
3          IDAT3          ADAT3
4          IDAT4          ADAT4
5          IDAT5          ADAT5
6          IDAT6          ADAT6
7          IDAT7          ADAT7
8          IDAT8          ADAT8
9          IDAT9          ADAT9
10         IDAT10         ADAT10

Distributed Data (Pseudonymised Data After Salting and Peppering)
Pseudonymised SHA3-512 Personal Data Identifier  Analytic Data
3fa0751142…58ff15e1e1                            ADAT1
a8bb5547f4…4acdfb8897                            ADAT2
23ca9f1638…07b93affcf                            ADAT3
2891a8d93f…124c7153b7                            ADAT4
5245824d14…0802c1c711                            ADAT5
f707bc0c7f…20d041329f                            ADAT6
74d27921d7…7d64cb0368                            ADAT7
78d63bd6aa…beb8a13ac9                            ADAT8
8e8edb07f5…357f0e548b                            ADAT9
1e7604e8b4…ffc5bdc796                            ADAT10

Depseudonymisation Key (Additional Data Stored To Allow Recovery Of Personal Data)
Record ID  Pseudonymised SHA3-512 Personal Data Identifier
1          3fa0751142…58ff15e1e1
2          a8bb5547f4…4acdfb8897
3          23ca9f1638…07b93affcf
4          2891a8d93f…124c7153b7
5          5245824d14…0802c1c711
6          f707bc0c7f…20d041329f
7          74d27921d7…7d64cb0368
8          78d63bd6aa…beb8a13ac9
9          8e8edb07f5…357f0e548b
10         1e7604e8b4…ffc5bdc796
Content Hashing Pseudonymisation
• Generate a hash token
value based on the entire
record contents
− SHA3-
512(IDAT1,ADAT1,SALT1,PEPPER)
• This results in a very high
degree of variability in
the source data for the
hashes
• Increases the difficulty of
identifying the source
data that generated the
hash code
Source Data
Record ID   Personal Data   Analytic Data
1           IDAT1           ADAT1
2           IDAT2           ADAT2
3           IDAT3           ADAT3
4           IDAT4           ADAT4
5           IDAT5           ADAT5
6           IDAT6           ADAT6
7           IDAT7           ADAT7
8           IDAT8           ADAT8
9           IDAT9           ADAT9
10          IDAT10          ADAT10
Distributed Data (Pseudonymised Data)
Pseudonymised SHA3-512 Personal Data Identifier   Analytic Data
576c23e0ec…2f43c16535                             ADAT1
851e96103a…af098faa80                             ADAT2
2c0efa26c6…16d0e11e7a                             ADAT3
8d189e9f9d…5b536f446e                             ADAT4
fd3d9477f3…6ff3971823                             ADAT5
f056988e7e…672729e376                             ADAT6
7421c6c952…c6c7aef649                             ADAT7
e271bcb565…838f34f2d0                             ADAT8
418830d8b4…5afb7ae575                             ADAT9
f90de46242…ab093b5ee5                             ADAT10
Depseudonymisation Key (Additional Data Stored To Allow Recovery Of Personal Data)
Record ID   Pseudonymised SHA3-512 Personal Data Identifier
1           HASH(IDAT1,ADAT1,SALT1,PEPPER)
2           HASH(IDAT2,ADAT2,SALT2,PEPPER)
3           HASH(IDAT3,ADAT3,SALT3,PEPPER)
4           HASH(IDAT4,ADAT4,SALT4,PEPPER)
5           HASH(IDAT5,ADAT5,SALT5,PEPPER)
6           HASH(IDAT6,ADAT6,SALT6,PEPPER)
7           HASH(IDAT7,ADAT7,SALT7,PEPPER)
8           HASH(IDAT8,ADAT8,SALT8,PEPPER)
9           HASH(IDAT9,ADAT9,SALT9,PEPPER)
10          HASH(IDAT10,ADAT10,SALT10,PEPPER)
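Content hashing can be sketched in Python as below. The deck does not say how the fields are combined before hashing; serialising them as a JSON list is an assumption made here so that field boundaries stay unambiguous. The function name `content_hash` and the literal salt and pepper strings are hypothetical.

```python
import hashlib
import json


def content_hash(personal: str, analytic: str, salt: str, pepper: str) -> str:
    """SHA3-512 over the whole record, plus salt and pepper."""
    # A JSON list keeps field boundaries unambiguous: ("ab", "c") and
    # ("a", "bc") serialise differently, unlike plain concatenation.
    material = json.dumps([personal, analytic, salt, pepper]).encode("utf-8")
    return hashlib.sha3_512(material).hexdigest()


# Hypothetical salt and pepper values, for illustration only.
token = content_hash("IDAT1", "ADAT1", "SALT1", "PEPPER")
changed = content_hash("IDAT1", "ADAT1-amended", "SALT1", "PEPPER")
print(token != changed)  # True: any change to the record changes the token
```

One consequence of this approach is that a token is tied to a specific version of the record, so an update to the analytic data produces a new token.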
Hashing And Reversibility
• The hash of a value is always the same – there is no randomness in
hashing
• Hashes of very similar input values are very different – very small
input change leads to very large difference in the generated hash
− SHA3-512 – a 0.5% change in the input value leads to an 85%-95% difference in the hash output when the hexadecimal digits are compared position by position, which is consistent with roughly half of the output bits changing
− Given two hash values, it cannot be determined how similar the input values are or what the structure of the input values might be
− This non-correlation property (the avalanche effect) means that the output of the hash function appears erratic
• Hashing as a form of pseudonymisation is potentially vulnerable to brute force attacks because large numbers of hashes can be generated easily and quickly
− If you have some knowledge of the input value, you can generate large numbers of permutations and their hashes and compare them with the known hash to identify the original value
• Ultimately you have to have the exact input value to generate the
same hash – being very close is of no benefit
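The sensitivity of the hash to small input changes can be checked with a short Python sketch. The helper names are hypothetical, and the sample strings are shortened fragments of a sentence with a single character changed in case. No particular percentages are asserted here since they vary from input to input.

```python
import hashlib


def sha3_hex(text: str) -> str:
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


def hex_digit_difference(a: str, b: str) -> float:
    """Fraction of hexadecimal digit positions at which two digests differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bit_difference(a: str, b: str) -> float:
    """Fraction of bits that differ between two equal-length hex digests."""
    differing = bin(int(a, 16) ^ int(b, 16)).count("1")
    return differing / (4 * len(a))


first = sha3_hex('we have never attempted to fix the "ne plus ultra"')
second = sha3_hex('we have never attempted to fix the "Ne plus ultra"')

print(f"hex digits differing: {hex_digit_difference(first, second):.0%}")
print(f"bits differing:       {bit_difference(first, second):.0%}")
```

Comparing digit by digit gives a much higher figure than comparing bit by bit, because a hexadecimal digit counts as different when any one of its four bits changes.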
Hashing And Reversibility
• Combining the original data with even a small amount of
randomised data renders brute force attacks of hash values
ineffective
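A toy Python illustration of this point, assuming a deliberately small input space of four-digit identifiers. The identifier value, the helper names and the 16-byte random value are all hypothetical. The attacker in this sketch knows the identifier structure but not the random data.

```python
import hashlib
import secrets
from typing import Optional


def sha3_hex(text: str) -> str:
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


def brute_force(target_hash: str) -> Optional[str]:
    """Try every four-digit identifier and return the one that matches, if any."""
    for number in range(10_000):
        candidate = f"{number:04d}"
        if sha3_hex(candidate) == target_hash:
            return candidate
    return None


identifier = "4821"  # hypothetical four-digit identifier

# Plain hash: the attacker enumerates the small input space and finds a match.
print(brute_force(sha3_hex(identifier)))  # 4821

# Hash with secret random data appended: no candidate matches.
random_part = secrets.token_hex(16)
print(brute_force(sha3_hex(identifier + random_part)))  # None
```

The protection depends on the random data staying secret and being long enough that it cannot itself be enumerated.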
Hashing And Reversibility
• Small (single character) sample input value changes and hashes generated
Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus ultra" to the progress of ...
SHA3-512 Hash: e0ef7bd38b6b4bc6a27e7260d2162b2ea58cf5afa5098072d0f735f9d73b67f9b9f699b8b098ec41d44e117135e88b3cfb670876a2f34efd5734e7ce80b64450
Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "Ne plus ultra" to the progress of ...
SHA3-512 Hash: e0ab9f0efb8f4cc2b89b73439f7b1365e687b17b7e0bdc0ede00751a5a883ad8ee0877b9b6a3032ad23521a7bc25a0b199e5c57cdb2cb5d7500c997e133c41a1
Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne Plus ultra" to the progress of ...
SHA3-512 Hash: 61361212da56a824559b81409cf02ba5f8c3bf41d4c8038faa885a183e1bdac1705eefad72594af1fc3901aa55295c3166eb6635ca866f1e5cdf56c7ff0fb56a
Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus Ultra" to the progress of ...
SHA3-512 Hash: 833d8b7cc47843cf74fd42cbbf782e87543c677ecbdc1f7fe4d7ad9166557fac4c17d467fa81302a195e60a0a6f3f89c34e03a5c94eefcb3f19cabcfd87a37ad
Pseudonymisation – Calculation Or Storage
• Storing pseudonymisation values for depseudonymisation
involves a storage overhead
• Pseudonymisation values could be computed to avoid key
storage at the expense of computation overhead
• Computation overhead for generating the
depseudonymisation key for an individual value depends
on the approach to calculating the individual SALT value
• The relative position n of the record must be known or be fixed in order to generate the correct SALTn
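The trade-off between storing and recomputing salts can be sketched in Python. Both functions are hypothetical illustrations: the first replays a seeded PRNG stream up to position n, so its cost grows with n, while the second derives SALTn directly from the seed and the position with a single hash. Python's `random` module is used only to show the idea and is not cryptographically secure.

```python
import hashlib
import random


def salt_from_stream(seed: int, n: int) -> int:
    """SALTn as the n-th 64-bit value of a seeded PRNG stream (n starts at 1)."""
    rng = random.Random(seed)  # not cryptographically secure; illustration only
    salt = 0
    for _ in range(n):
        salt = rng.getrandbits(64)
    return salt


def salt_from_counter(seed: bytes, n: int) -> str:
    """SALTn derived directly from the seed and the record position."""
    return hashlib.sha3_256(seed + n.to_bytes(8, "big")).hexdigest()


# Recomputing the salt for record 5 gives the same value every time,
# but only if the record's position (5) is known.
assert salt_from_stream(2022, 5) == salt_from_stream(2022, 5)
assert salt_from_counter(b"seed", 5) != salt_from_counter(b"seed", 6)
```

In both cases nothing needs to be stored per record beyond the seed, at the price of recomputation whenever a value must be depseudonymised.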
Pseudonymisation And Data Lakes/Data Warehouses
[Figure: Source Data, Pseudonymised Data, Depseudonymisation Key, Data Lake and Data Warehouse, linked by numbered stages 1 to 4]
Pseudonymisation And Data Lakes/Data Warehouses
• Data should be pseudonymised before the data lake and/or data warehouse is populated
• This ensures data privacy by design and by default
• The high-level stages are:
1. As part of the ETL/ELT process, the source data is
pseudonymised and the depseudonymisation key is created
2. The pseudonymised data is passed to the data lake
3. The pseudonymised data created by the ETL/ELT process is
used to update the data warehouse directly
4. The pseudonymised data in the data lake is used to update the
data warehouse
Differential Privacy
• Differential privacy allows for the (public) sharing of information
about a group or aggregate by describing the patterns of groups
within the group or aggregate while suppressing information about
individuals in the group or aggregate
• A viewer of the information cannot tell if an individual's information
was or was not used in the group or aggregate
• This involves inserting noise into the results returned from a query of
the data
• Well-proven, widely used robust technique
− The Algorithmic Foundations of Differential Privacy -
https://www.cis.upenn.edu/~aaroth/privacybook.html
• Greatly reduces the possibility of re-identification of individuals from the dataset, to a degree controlled by the amount of noise added
• Individual-specific information is hidden in the results returned
• Automates the curation of data
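Inserting noise into a query result can be illustrated with the Laplace mechanism, one standard way of doing so. The deck does not say which mechanism a given platform uses, so this is a sketch under that assumption. The function names, the made-up ages and the value of `epsilon` (the privacy parameter, where smaller values mean more noise) are all illustrative.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Count matching records and add Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1 (sensitivity 1).
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [23, 35, 41, 47, 52, 58, 64, 70]  # made-up data
rng = random.Random()
print(noisy_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng))
```

Each run returns a different value scattered around the true count of 6, so a single answer does not reveal whether any one person's record was included.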
Differential Privacy Hosted Platform
• This illustrates a logical architecture for a hosted differential privacy
platform where organisation data is moved to the external platform
[Figure labels: Organisation Data Zone; Organisation Application Zone; Organisation DMZ; Data Sources; Data Extract Process; Data Gateway; User Directory; Differential Privacy Platform; Summarised Data; Metadata; Data Privacy Computation Engine; User Access API; Data Visualisation Interface; Privacy Audit Logging and Monitoring; Platform Performance and Usage; Authorised Users; Privatised Analysis Results]
Differential Privacy On-Premises Platform
• This illustrates a logical architecture for an on-premises differential
privacy platform where external access to the platform is enabled
[Figure labels: Organisation Data Zone; Organisation Application Zone; Organisation DMZ; Data Sources; Data Extract Process; User Directory; Differential Privacy Platform; Summarised Data; Metadata; Data Privacy Computation Engine; User Access API; Privacy Audit Logging and Monitoring; Platform Performance and Usage; Authorised User; Privatised Analysis Results]
Differential Privacy Logical Architecture
[Figure: Differential Privacy Logical Architecture, showing components numbered 1 to 19]
Differential Privacy Logical Architecture –
Components
Item Description
1 Core Data Privatisation/Differential Privacy Operational Platform – this is the core differential privacy platform. It can be installed on-premises or on a cloud platform. It takes and summarises data from designated data sources and provides different levels and types of computational access to authorised users via a data API. It also provides a range of management and administration functions.
2 Data Sources – these represent data held in a variety of databases and other data storage systems. The differential privacy platform needs read-only access to these data sources.
3 Data Access Connector – these are connectors that enable read-only access to data held in the data sources.
4 Data Ingestion and Summarisation – this takes data from data sources, processes it and outputs it in a format suitable for access. It includes features to manage data ingestion workflows, scheduling and error identification and handling.
5 Data Analysis Data Store – the core differential privacy platform creates pre-summarised versions of the raw data from the data sources. The platform never provides access to individual source data records. The data is encrypted while at rest in the data store.
6 Metadata Store – the platform creates and stores metadata about each data source. This is used to optimise data privacy of the result sets generated in response to data queries.
7 Batch Task Manager – in addition to running online data queries, asynchronous batch tasks can be run for longer data tasks.
8 Access and Usage Log – this logs data accesses.
9 User Access API – the platform provides an API for common data analytics tools to generate and retrieve privatised randomised sets of data summaries as well as providing data querying and analytics capabilities. Data results returned from queries are encrypted while in transit.
10 Data Visualisation Interface – this provides a data access and visualisation interface.
11 User Directory – the platform will use your existing user directories for user authentication and authorisation.
12 Authorised Internal Access – authorised internal users can access different datasets and perform different query types depending on their assigned rights.
13 Authorised External Access – authorised external users can access different datasets and perform different query types depending on their assigned rights.
14 Analytics and Reporting – this will allow you to analyse and report on user access to data managed by the platform.
15 Monitoring, Logging and Auditing – this will log both system events and user activities. This information can be used for platform management and planning as well as for identifying potential patterns of data use and possible abuse.
16 Data Access Creation, Validation and Deployment – this will allow new data sources to be onboarded and allow existing data sources to be managed and updated.
17 Management and Administration – this will provide facilities to manage the overall platform such as adding and removing users and user groups and applying data privacy settings to different datasets.
18 Security and Access Control – this allows the management of different types of user access to different datasets.
19 Billing System Interface – you may want to charge for data access, either at a flat rate or by access or a mix of both. This represents an optional link to a financial management system to enable this.
Differential Privacy – Privacy Budget
[Figure: the Differential Privacy Platform (Summarised Data, Metadata, Data Privacy Computation Engine, User Access API) holds a Dataset Specific Privacy Budget (Privacy Exposure Limit). The platform introduces fuzziness (randomisation) into query results, and every query has a privacy cost that is taken from the dataset privacy budget.]
Differential Privacy – Privacy Budget
• A differential privacy approach to data privacy assigns a
privacy budget to each dataset
• The differential privacy engine introduces a fuzziness into
the results of queries
− The greater the introduced fuzziness the greater the privacy but
the utility of the results is reduced
• Each query has a privacy cost
• The total privacy expenditure across all queries by all users
is tracked
• When the budget has been spent, no further data queries
can be performed until more privacy budget is allocated
• This provides a warning threshold for differencing attacks
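The budget accounting described above can be sketched as a small Python class. The class and method names, and the use of a single numeric cost per query, are assumptions made for illustration. A real platform would decide how each query's cost is calculated.

```python
class PrivacyBudgetExhausted(Exception):
    """Raised when a query would exceed the dataset's privacy budget."""


class PrivacyBudget:
    def __init__(self, total: float) -> None:
        self.total = total
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return self.total - self.spent

    def charge(self, cost: float) -> None:
        """Record the privacy cost of one query, or refuse the query."""
        if cost <= 0:
            raise ValueError("privacy cost must be positive")
        if self.spent + cost > self.total:
            raise PrivacyBudgetExhausted("no budget left for this query")
        self.spent += cost

    def allocate(self, extra: float) -> None:
        """Grant additional budget so that queries can resume."""
        self.total += extra


budget = PrivacyBudget(total=1.0)
budget.charge(0.5)
budget.charge(0.25)
try:
    budget.charge(0.5)  # would take the total spend to 1.25
except PrivacyBudgetExhausted:
    print("query refused; remaining budget:", budget.remaining)  # 0.25
```

Because spending is tracked across all queries by all users, a long run of overlapping queries exhausts the budget before the combined answers become precise enough to isolate an individual.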
Differential Privacy – Privacy Budget
• Effective and usable data privatisation and differential
privacy means finding the right balance between data
privacy and data utility
[Figure: three panels, each relating Level of Data Privacy, Amount and Complexity of Data Processing Allowed, and Level of Detail Contained in Results]
Differential Privacy And Data Attacks
[Figure: the hosted platform architecture annotated with two attacks. A Differencing Attack combines the results of multiple queries. A Mosaic Effect Attack combines results with other data sources.]
Differencing Attack, Reconstruction Attack And
Mosaic Effect Attack
• A reconstruction attack uses the information from a differencing attack to identify how the original dataset was processed to create the summary
− The process is compromised and individual data may be compromised
• A mosaic effect attack involves combining the data with data from other (public) sources to identify individuals
• For example, apparently anonymised medical data containing dates of death can be combined with public death notice records to identify individuals
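The death notice example can be made concrete with a small Python sketch. All records, names and dates below are made up, and the matching rule (a date of death shared with exactly one public notice) is an assumption chosen to keep the illustration simple.

```python
from collections import defaultdict

# Made-up "anonymised" medical records: no names, but a date of death.
medical = [
    {"record": "R1", "date_of_death": "2021-03-14", "condition": "C1"},
    {"record": "R2", "date_of_death": "2021-06-02", "condition": "C2"},
]

# Made-up public death notices.
notices = [
    {"name": "Person A", "date_of_death": "2021-03-14"},
    {"name": "Person B", "date_of_death": "2021-06-02"},
    {"name": "Person C", "date_of_death": "2021-06-02"},
]

names_by_date = defaultdict(list)
for notice in notices:
    names_by_date[notice["date_of_death"]].append(notice["name"])

# A record is re-identified when exactly one notice shares its date.
reidentified = {
    row["record"]: names_by_date[row["date_of_death"]][0]
    for row in medical
    if len(names_by_date[row["date_of_death"]]) == 1
}
print(reidentified)  # {'R1': 'Person A'}
```

Record R2 stays ambiguous only because two notices share its date. Further attributes such as age or location would narrow the match.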
Differencing Attack
• Multiple partially-overlapping queries can be run until the
results can be combined to identify an individual
− How many people in the group are aged greater than N?
− How many people in the group aged greater than N have attribute A?
− How many people in the group aged greater than N have attribute B?
− How many people with ages in the range N-9 to N-5 are male?
− How many people with ages in the range N-4 to N are male?
• After a number of queries you may be able to determine that individuals, or small numbers of individuals, of a given sex in a given age range have a defined attribute
• Apparently anonymous summary results can be combined to reveal potentially sensitive insights and compromise confidentiality
• Differential privacy is designed to reduce or eliminate the
threat of differencing attacks
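A differencing attack on exact (non-private) counts can be shown in a few lines of Python. The group, the ages and the attribute are made up, and the `count` helper is a hypothetical stand-in for a summary query interface.

```python
# Made-up group; names are shown only to make the leak visible.
people = [
    {"name": "P1", "age": 39, "attribute_a": False},
    {"name": "P2", "age": 41, "attribute_a": True},
    {"name": "P3", "age": 45, "attribute_a": False},
    {"name": "P4", "age": 52, "attribute_a": True},
]


def count(predicate) -> int:
    """An exact (non-private) counting query over the group."""
    return sum(1 for person in people if predicate(person))


# Two pairs of overlapping queries isolate the one person aged exactly 41.
older_than_40 = count(lambda p: p["age"] > 40)                         # 3
older_than_41 = count(lambda p: p["age"] > 41)                         # 2
a_older_than_40 = count(lambda p: p["age"] > 40 and p["attribute_a"])  # 2
a_older_than_41 = count(lambda p: p["age"] > 41 and p["attribute_a"])  # 1

group_size = older_than_40 - older_than_41         # 1 person in the gap
has_attribute = a_older_than_40 - a_older_than_41  # 1, so that person has A
print(group_size, has_attribute)  # 1 1
```

No single query returns information about an individual, yet subtracting the answers does. When noise is added to each answer, a difference of 1 is no longer distinguishable from the noise.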
Differencing Attack
[Figure: individual queries are run on summarised and reduced data records. The intersection of the query results can allow individuals to be identified.]
Differencing Attack, Reconstruction Attack And
Mosaic Effect
[Figure: original source data is processed to create summarised source data. A differencing attack runs multiple queries against the summarised source data and can lead to identification of individuals or small groups. It can also lead to a reconstruction attack, which seeks to understand the structure of the source data and can provide insight that allows identification of individuals. The mosaic effect, where results are combined with other data sets, can lead to identification of an expanded set of information about individuals.]
Summary
• Your data has value to your organisation and to relevant data sharing
partners
− The data was expensively obtained
− It represents a valuable asset on which a return must be generated
• To achieve the value inherent in the data you need to be able to make it
appropriately available to others, both within and outside the organisation
• This presentation has outlined technology approaches to achieving compliance with data privacy regulations and legislation while providing access to data
• Technology is part of a risk management approach to data privacy
• Using these technologies will embed such compliance by design into your
data sharing and access facilities
• This will allow you to realise value from your data successfully
• The data privacy regulatory landscape is complex and getting even more
complex so an approach to data access and sharing that embeds
compliance as a matter of course is required
• There is a wider operational data sharing and data privacy framework that includes technology aspects, among other key areas
More Information
Alan McSweeney
http://ie.linkedin.com/in/alanmcsweeney
https://www.amazon.com/dp/1797567616
4 January 2022 63

More Related Content

What's hot

Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Real-World DG Webinar: A Data Governance Framework for Success
Real-World DG Webinar: A Data Governance Framework for Success Real-World DG Webinar: A Data Governance Framework for Success
Real-World DG Webinar: A Data Governance Framework for Success DATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data GovernanceJohn Bao Vuu
 
Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Alan McSweeney
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data GovernanceChristopher Bradley
 
The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?DATAVERSITY
 
Reference master data management
Reference master data managementReference master data management
Reference master data managementDr. Hamdan Al-Sabri
 
Enterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewEnterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewJohn Bao Vuu
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata ManagementDATAVERSITY
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Data Governance
Data GovernanceData Governance
Data GovernanceRob Lux
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)DATAVERSITY
 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceDATAVERSITY
 
Data Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceAlation
 
Data Governance and MDM | Profisse, Microsoft, and CCG
Data Governance and MDM | Profisse, Microsoft, and CCGData Governance and MDM | Profisse, Microsoft, and CCG
Data Governance and MDM | Profisse, Microsoft, and CCGCCG
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 

What's hot (20)

Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Real-World DG Webinar: A Data Governance Framework for Success
Real-World DG Webinar: A Data Governance Framework for Success Real-World DG Webinar: A Data Governance Framework for Success
Real-World DG Webinar: A Data Governance Framework for Success
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Introduction to Data Governance
Introduction to Data GovernanceIntroduction to Data Governance
Introduction to Data Governance
 
Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...Data, Information And Knowledge Management Framework And The Data Management ...
Data, Information And Knowledge Management Framework And The Data Management ...
 
DMBOK and Data Governance
DMBOK and Data GovernanceDMBOK and Data Governance
DMBOK and Data Governance
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data Governance
 
The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?The Evolving Role of the Data Architect – What Does It Mean for Your Career?
The Evolving Role of the Data Architect – What Does It Mean for Your Career?
 
Reference master data management
Reference master data managementReference master data management
Reference master data management
 
Enterprise Data Management Framework Overview
Enterprise Data Management Framework OverviewEnterprise Data Management Framework Overview
Enterprise Data Management Framework Overview
 
Best Practices in Metadata Management
Best Practices in Metadata ManagementBest Practices in Metadata Management
Best Practices in Metadata Management
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Data Governance
Data GovernanceData Governance
Data Governance
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)
 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...
 
Glossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data GovernanceGlossaries, Dictionaries, and Catalogs Result in Data Governance
Glossaries, Dictionaries, and Catalogs Result in Data Governance
 
Data Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data IntelligenceData Catalog as the Platform for Data Intelligence
Data Catalog as the Platform for Data Intelligence
 
Data Governance and MDM | Profisse, Microsoft, and CCG
Data Governance and MDM | Profisse, Microsoft, and CCGData Governance and MDM | Profisse, Microsoft, and CCG
Data Governance and MDM | Profisse, Microsoft, and CCG
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 

Similar to Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy

Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdf
Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdfData Privacy Compliance Navigating the Evolving Regulatory Landscape.pdf
Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdfCIOWomenMagazine
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy RegulationJatin Kochhar
 
Privacy by Design and by Default + General Data Protection Regulation with Si...
Privacy by Design and by Default + General Data Protection Regulation with Si...Privacy by Design and by Default + General Data Protection Regulation with Si...
Privacy by Design and by Default + General Data Protection Regulation with Si...Peter Procházka
 
Introduction to EU General Data Protection Regulation: Planning, Implementat...
 Introduction to EU General Data Protection Regulation: Planning, Implementat... Introduction to EU General Data Protection Regulation: Planning, Implementat...
Introduction to EU General Data Protection Regulation: Planning, Implementat...Financial Poise
 
GDPR Privacy Introduction
GDPR Privacy IntroductionGDPR Privacy Introduction
GDPR Privacy IntroductionNiclasGranqvist
 
Introduction to EU General Data Protection Regulation: Planning, Implementati...
Introduction to EU General Data Protection Regulation: Planning, Implementati...Introduction to EU General Data Protection Regulation: Planning, Implementati...
Introduction to EU General Data Protection Regulation: Planning, Implementati...Financial Poise
 
Vuzion Love Cloud GDPR Event
Vuzion Love Cloud GDPR Event Vuzion Love Cloud GDPR Event
Vuzion Love Cloud GDPR Event Vuzion
 
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdpr
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdprSharp Cookie Advisors legal_botar_ai_dataskydd_gdpr
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdprSharp Cookie Advisors
 
GDPR: Your Journey to Compliance
GDPR: Your Journey to ComplianceGDPR: Your Journey to Compliance
GDPR: Your Journey to ComplianceCobweb
 
CBC GDPR The Physics
CBC GDPR The PhysicsCBC GDPR The Physics
CBC GDPR The PhysicsJason Chapman
 
GDPR Benefits and a Technical Overview
GDPR  Benefits and a Technical OverviewGDPR  Benefits and a Technical Overview
GDPR Benefits and a Technical OverviewErnest Staats
 
Data protection within development
Data protection within developmentData protection within development
Data protection within developmentowaspsuffolk
 
The Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI WebinarThe Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI WebinarEryk Budi Pratama
 
GDPR Enforcement is here. Are you ready?
GDPR Enforcement is here. Are you ready? GDPR Enforcement is here. Are you ready?
GDPR Enforcement is here. Are you ready? SecurityScorecard
 

Similar to Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy (20)

GDPR for your Payroll Bureau
GDPR for your Payroll BureauGDPR for your Payroll Bureau
GDPR for your Payroll Bureau
 
Data security and privacy
Data security and privacyData security and privacy
Data security and privacy
 
Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdf
Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdfData Privacy Compliance Navigating the Evolving Regulatory Landscape.pdf
Data Privacy Compliance Navigating the Evolving Regulatory Landscape.pdf
 
Global Data Privacy Regulation
Global Data Privacy RegulationGlobal Data Privacy Regulation
Global Data Privacy Regulation
 
Privacy by Design and by Default + General Data Protection Regulation with Si...
Privacy by Design and by Default + General Data Protection Regulation with Si...Privacy by Design and by Default + General Data Protection Regulation with Si...
Privacy by Design and by Default + General Data Protection Regulation with Si...
 
Introduction to EU General Data Protection Regulation: Planning, Implementat...
 Introduction to EU General Data Protection Regulation: Planning, Implementat... Introduction to EU General Data Protection Regulation: Planning, Implementat...
Introduction to EU General Data Protection Regulation: Planning, Implementat...
 
GDPR Privacy Introduction
GDPR Privacy IntroductionGDPR Privacy Introduction
GDPR Privacy Introduction
 
Prepare Your Firm for GDPR
Prepare Your Firm for GDPRPrepare Your Firm for GDPR
Prepare Your Firm for GDPR
 
Introduction to EU General Data Protection Regulation: Planning, Implementati...
Introduction to EU General Data Protection Regulation: Planning, Implementati...Introduction to EU General Data Protection Regulation: Planning, Implementati...
Introduction to EU General Data Protection Regulation: Planning, Implementati...
 
Vuzion Love Cloud GDPR Event
Vuzion Love Cloud GDPR Event Vuzion Love Cloud GDPR Event
Vuzion Love Cloud GDPR Event
 
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdpr
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdprSharp Cookie Advisors legal_botar_ai_dataskydd_gdpr
Sharp Cookie Advisors legal_botar_ai_dataskydd_gdpr
 
GDPR: Your Journey to Compliance
GDPR: Your Journey to ComplianceGDPR: Your Journey to Compliance
GDPR: Your Journey to Compliance
 
GDPR: What does it mean for your business?
GDPR: What does it mean for your business?GDPR: What does it mean for your business?
GDPR: What does it mean for your business?
 
CBC GDPR The Physics
CBC GDPR The PhysicsCBC GDPR The Physics
CBC GDPR The Physics
 
GDPR for your Payroll Bureau
GDPR for your Payroll BureauGDPR for your Payroll Bureau
GDPR for your Payroll Bureau
 
GDPR Benefits and a Technical Overview
GDPR  Benefits and a Technical OverviewGDPR  Benefits and a Technical Overview
GDPR Benefits and a Technical Overview
 
Data protection within development
Data protection within developmentData protection within development
Data protection within development
 
The Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI WebinarThe Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI Webinar
 
What does GDPR mean for your business?
What does GDPR mean for your business?What does GDPR mean for your business?
What does GDPR mean for your business?
 
GDPR Enforcement is here. Are you ready?
GDPR Enforcement is here. Are you ready? GDPR Enforcement is here. Are you ready?
GDPR Enforcement is here. Are you ready?
 





Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy

  • 1. Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy Alan McSweeney http://ie.linkedin.com/in/alanmcsweeney https://www.amazon.com/dp/1797567616
  • 2. Introduction, Purpose And Scope • Present details on technology approaches to ensuring compliance with data privacy − Anonymisation − Pseudonymisation − Differential Privacy January 4, 2022 2
  • 3. Data Value • Your data has value to your organisation and to third-parties − The data was expensively obtained − It represents a valuable asset on which a return must be generated • Similarly third-party data can be used to augment your data to increase its value • To achieve the value inherent in the data you need to be able to make it appropriately available to others, both within and outside the organisation • You need a process that enables you to make your data available as widely as possible without exposing you to risks associated with non-compliance with the wide range of differing data privacy regulations • You need a common approach and framework that works for all data sharing while guaranteeing legislative and regulatory compliance January 4, 2022 3
  • 4. Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy • Data privatisation is the removal of personal identifiable information (PII) from data • At a very high-level, data privatisation can be achieved in one or both of two ways: 1. Data Summarisation – sets of individual data records are compressed into summary statistics 2. Data Tokenisation – the personal data within a dataset that allows an individual to be identified is replaced by a token (possibly generated from the personal data such as by hashing), either permanently (anonymisation) or reversibly (pseudonymisation) January 4, 2022 4
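The two routes described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the records, field names and key are hypothetical, and the choice of a keyed hash (HMAC-SHA256) instead of a plain hash is an assumption, made because an unkeyed hash of a low-entropy value such as a name can be reversed by simply hashing candidate values.

```python
import hashlib
import hmac
from statistics import mean

# Hypothetical records: "name" is identifying data, "age" and "spend" are analysis data.
records = [
    {"name": "Alice Byrne", "age": 34, "spend": 120.0},
    {"name": "Brian Walsh", "age": 41, "spend": 80.0},
    {"name": "Alice Byrne", "age": 34, "spend": 40.0},
]


def summarise(rows):
    """Data summarisation: collapse individual records into summary statistics."""
    return {
        "count": len(rows),
        "mean_age": mean(r["age"] for r in rows),
        "total_spend": sum(r["spend"] for r in rows),
    }


def tokenise(rows, secret_key: bytes):
    """Data tokenisation: replace the identifying field with a keyed hash token."""
    out = []
    for r in rows:
        token = hmac.new(secret_key, r["name"].encode("utf-8"), hashlib.sha256).hexdigest()
        # Only the token and the analysis fields are carried forward.
        out.append({"token": token, "age": r["age"], "spend": r["spend"]})
    return out


summary = summarise(records)
# The key below is a placeholder assumption; a real key would be generated and stored securely.
tokenised = tokenise(records, secret_key=b"example-key-not-for-production")
```

The same person always maps to the same token, so records can still be linked for analysis without exposing the name. Whether the result behaves more like anonymisation or pseudonymisation depends on what happens to the key: while it is retained, anyone holding it can recompute the token for a known name.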
  • 5. Data Privacy And Risk Management • The concept of risk is at the core of data protection regulations and legislation • GDPR contains many references to risk − For example, GDPR encourages pseudonymisation as a means to “reduce the risks to the data subjects” • Appropriate technology, appropriately implemented and operated, is a means of managing and reducing the risk of re-identification by making the time, skills, resources and money needed to achieve it unrealistically high • Accepting that there will always be a residual risk of re-identification, even where data is protected by pseudonymisation, anonymisation and differential privacy, is a realistic approach in light of technology changes and development • A demonstrable technology-based approach to data privacy reduces an organisation’s liability in the event of data breaches − For example, where a data breach occurs, the controller is exempted from its notification obligations where it can show that the breach is “unlikely to result in a risk to the rights and freedoms of natural persons”, such as when pseudonymised data leaks and the re-identification risk is remote January 4, 2022 5
  • 6. Data Privatisation – Anonymisation, Pseudonymisation And Differential Privacy January 4, 2022 6 Source Data Differential Privacy Source data is summarised and individual personal references are removed Summarised Data Anonymisation Identifying data is destroyed and cannot be recovered so individual cannot be identified Pseudonymisation Identifying data is encrypted and recovery data/token is stored securely elsewhere Anonymised Data Pseudonymised Data Pseudonymisation Key
  • 7. Data Privatisation – Anonymisation, Pseudonymisation And Differential Privacy • There are different routes to making data accessible and shareable within and outside the organisation without compromising compliance with data protection legislation and regulations and removing the risk associated with allowing access to personal data − Differential Privacy – source data is summarised and individual personal references are removed • The one-to-one correspondence between original and transformed data has been removed − Anonymisation – identifying data is destroyed and cannot be recovered so individual cannot be identified • There is still a one-to-one correspondence between original and transformed data − Pseudonymisation – identifying data is encrypted and recovery data/token is stored securely elsewhere • There is still a one-to-one correspondence between original and transformed data • These technologies and approaches are not mutually exclusive – each is appropriate to differing data sharing and data access use cases January 4, 2022 7
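The slide describes differential privacy only as summarisation with individual references removed. As a rough illustration of how a summarised figure is commonly protected in practice, the sketch below uses the Laplace mechanism, which adds calibrated random noise to a count before release. The mechanism, the `epsilon` value and the sample ages are assumptions introduced here for illustration and are not taken from the slides.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential variates with mean `scale`
    # follows a Laplace distribution centred on zero with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count with Laplace noise added.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical source data: ages of individuals.
ages = [23, 35, 41, 52, 29, 64, 38, 47]
rng = random.Random(0)  # seeded only so the sketch is repeatable
released = noisy_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy at the cost of accuracy, which is one concrete form of the utility and privacy balance discussed in these slides.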
  • 8. Data Privatisation Balance • Perfect data privacy can be achieved by not sharing or making accessible any data irrespective of whether it contains personal identifiable information − Data is unused • Perfect data utility can be achieved by sharing and making accessible all data − There is no data privacy • There is a need for a risk-based balancing act between data utility and data privacy January 4, 2022 8
  • 9. Anonymisation And Pseudonymisation • Functionally anonymisation and pseudonymisation can be regarded as close − Anonymisation replaces or deletes identifying data with unconnected data and destroys link with original data − Pseudonymisation replaces identifying data with derived data or token and stores link between original and pseudonymised data • A similar set of attacks that can be directed against pseudonymised data can be applied to anonymised data: − Mosaic attacks − Differencing attacks − Reconstruction attacks January 4, 2022 9
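Of the attacks listed, the differencing attack is the simplest to show. The sketch below assumes a hypothetical query interface that only returns exact totals, never individual rows; the employee identifiers and salaries are invented for illustration.

```python
# Hypothetical dataset held behind a "summary statistics only" interface.
salaries = {"emp_a": 52000, "emp_b": 61000, "emp_c": 58000, "emp_d": 75000}


def total_salary(data, exclude=()):
    """Return an exact aggregate over everyone not in `exclude`."""
    return sum(v for k, v in data.items() if k not in exclude)


# Two individually harmless-looking aggregate queries...
q_all = total_salary(salaries)
q_without_d = total_salary(salaries, exclude=("emp_d",))

# ...whose difference reveals one person's exact value.
inferred_emp_d_salary = q_all - q_without_d
```

Neither query returns a personal record, yet together they disclose one. This is why exact summary statistics alone do not guarantee privacy.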
  • 10. Context Of Data Privatisation – Anonymisation, Pseudonymisation And Differential Privacy January 4, 2022 10 Data Privacy Laws and Regulations Technologies Value in Data Volumes and Data Assets Lots of These and Increasing in Number And Complexity Mature Secure Data Sharing and Exchange Technologies Need to Get Value From Expensively Generated and Collected Data Compliance With Laws and Regulations Acting As An Inhibitor To Achieving Data Value Technologies Can Embed Compliance Technologies Allow Data to be Shared Value To Be Realised Data Processes and Business Data Trends Data Needs to be Shared With Outsourcing and Business Partners Sharing Data Allows Its Value To Be Realised
  • 11. Data Privatisation Topology – Data Privacy Laws and Regulations January 4, 2022 11 General Data Protection Regulation (EU) Personal Data Protection (Amendment) Act (Singapore) Lei Geral de Proteção de Dados (Brasil) California Consumer Privacy Act (CCPA) California Privacy Rights Act (CPRA) Protection of Personal Information Act (South Africa) Data Privacy Laws and Regulations Technologies Value in Data Volumes and Data Assets Data Processes and Business Data Trends The landscape of data protection and privacy legislation and regulations is extensive, complex and growing – this is just a partial and incomplete view Organisations that share data externally need to be able to guarantee compliance with all relevant and applicable legislation
  • 12. Data Privatisation Topology – Value in Data Volumes and Data Assets January 4, 2022 12 Data Privacy Laws and Regulations Technologies Value in Data Volumes and Data Assets Data Processes and Business Data Trends Data Volumes And Data Assets Are Growing in Size And Complexity Need to Share Data More Widely For Research Purposes Open Data Initiatives Need to Get More Value From Data Assets Organisations have more and more data of increasing complexity that they want and need to share in order to generate value
  • 13. Data Privatisation Topology – Technologies January 4, 2022 13 Data Privacy Laws and Regulations Technologies Value in Data Volumes and Data Assets Data Processes and Business Data Trends Pseudonymisation Deidentification Anonymisation Differential Privacy There are a range of well-proven technologies available for ensuring data privacy
  • 14. Data Privatisation Topology – Data Processes and Business Data Trends January 4, 2022 14 Data Privacy Laws and Regulations Technologies Value in Data Volumes and Data Assets Data Processes and Business Data Trends Organisations want to outsource their business processes and share their data with partners to gain access to specialist analytics and research skills and tools Business Process Outsourcing Third-Party Analytics Platforms and Services Data Sharing and Data Value Extraction
  • 15. Context Of Data Privatisation – Anonymisation, Pseudonymisation And Differential Privacy • Value in Data Volumes and Data Assets – organisations have expended substantial resources in gathering, processing and generating data − This data has value that you want to realise by making it more widely available within and outside the organisation − The need to comply with the increasing body of data protection and privacy laws inhibits your ability to achieve this − Organisations are frequently data rich and information poor, lacking the skills, experience and resources to convert raw data into value • Data Privacy Laws and Regulations – you need to ensure that making your data available to a wider range of individuals and organisations does not breach the ever-increasing set of data protection and privacy legislation and regulations − All too frequently the cost of, and concerns around, ensuring this compliance prevent this wider data access − The default approach is not to allow data access and data sharing − Each data access and data sharing use case has to establish the need separately and independently • Technologies – data anonymisation, pseudonymisation and differential privacy technologies are mature, well-proven, industrialised and independently certified − They can be used to provide controlled, secure access to your data while guaranteeing compliance with data protection and privacy legislation − Using these technologies will embed such compliance by design into your data sharing and access facilities − This will allow you to realise value from your data successfully • Data Processes and Business Data Trends – third-party data access and sharing, business process and other outsourcing activities require data sharing and third-party data access − Technologies can enable secure data sharing January 4, 2022 15
  • 16. Data Trends January 4, 2022 16 Broader and More Extended Data Landscape More Data Capabilities and Technologies Available (Especially Cloud-Based) And Being Looked For More Data Demands From Business Organisation More Data Types, Entities and Greater Data Landscape Complexity Continuously Changing Data Varying Data Accuracy and Uncertainty Greater Data Volumes Different Times and Rates of Data Generation Wider Range of Data Contents More Data Sources More Data Formats Differing Data Value and Utility Increasing Complexity and Pervasiveness of Data Privacy Regulations Greater Outsourcing and Associated Need for Data Sharing
  • 17. Data Sharing And Third-Party Data Access Use Cases • There are many data sharing use cases and scenarios that involve sharing potentially personally identifiable information, such as: 1. Share data with other business functions within your organisation 2. Use third-party data processing and storage platform and facilities 3. Use third-party data access and sharing as a service platform and facilities 4. Use third-party data analytics platform and facilities 5. Engage third-party data research organisations to provide specialist services 6. Share data with external researchers 7. Outsource business processes and enable data sharing with third parties 8. Share data with industry business partners to gain industry insights 9. Share data to detect and avoid fraud 10. Share customer data with service providers at the request of the customer 11. Enable customer switching 12. Participate in Open Data initiatives January 4, 2022 17
  • 18. All These Data Trends Mean ... • … We need a mechanism to industrialise and operationalise the implementation of data privatisation • That is proven, reliable and secure • That is applied consistently and pervasively • That does not need separate privacy impact assessments and lengthy compliance checks before datasets containing personal information can be used Data Privacy By Design And By Default • Pseudonymisation and differential privacy are proven technologies that are already in use by large organisations for data sharing while guaranteeing data privacy January 4, 2022 18
  • 19. Data Sharing And Data Privacy Is More Than A Technology Issue • There is a wider operational data sharing and data privacy framework that includes technology aspects, among other key areas January 4, 2022 19 Data Sharing and Access Framework Business and Strategy Dimension Overall Objectives, Purposes and Goals Data Sharing Strategy Risk Management, Governance and Decision Making Charges and Payments Monitoring and Reporting Legal Dimension Data Privacy Legislation and Regulation Compliance Contract Development and Compliance Technology Dimension Data Sharing and Data Access Technology Selection Technology Standards Monitoring and Compliance Security Standards Monitoring and Compliance Development and Implementation Dimension Technology Platform and Toolset Selection and Implementation Functionality Model Development and Implementation Data Sharing and Access Implementations Data Sharing and Access Maintenance and Support Service Management Dimension Service Management Processes Operational and Service Level Agreement Management Maintain Inventory of Data Sharing Arrangements Service Monitoring and Reporting Issue Handling and Escalation
  • 20. Data Sharing And Data Privacy Is More Than A Technology Issue • Having an overall data privacy management strategy including a comprehensive data sharing and access framework is part of the risk management approach referred to earlier January 4, 2022 20
  • 21. Data Breaches And Attacks • Unlike other attack scenarios, a key concern with data access and sharing arrangements is that the entity being provided with legitimate access to the data is the attacker or the data access control arrangements at the entity are weak January 4, 2022 21
  • 22. Pseudonymisation • Pseudonymisation is an approach to deidentification where personally identifiable information (PII) values are replaced by tokens or artificial identifiers – pseudonyms • Pseudonymisation is one technique to assist compliance with EU General Data Protection Regulation (GDPR) requirements for secure storage of personal information • Pseudonymisation is intended to be reversible – the pseudonymised data can be restored to its original state
  • 23. Pseudonymisation – Field Level Transformation Or Tokenisation • Personal data fields can be individually pseudonymised, so that there is a one-to-one correspondence between original source data fields and transformed data fields, or the personal data fields can be removed and replaced with a token • IDAT = Identifying Data • ADAT = Analysis Data • PIDAT = Pseudonymised Identifying Data • Source data, records n = 1 to 10 − Columns: Record, IDAT, ADAT, IDAT, ADAT, IDAT, IDAT, ADAT, ADAT − Record n: IDAT1.n, ADAT1.n, IDAT2.n, ADAT2.n, IDAT3.n, IDAT4.n, ADAT3.n, ADAT4.n • Option 1 - Pseudonymisation Of All Original Identifying Data, records n = 1 to 10 − Columns: Record, PIDAT, ADAT, PIDAT, ADAT, PIDAT, PIDAT, ADAT, ADAT − Record n: PIDAT1.n, ADAT1.n, PIDAT2.n, ADAT2.n, PIDAT3.n, PIDAT4.n, ADAT3.n, ADAT4.n • Option 2 - Pseudonymisation Of Partial Original Identifying Data and Removal of Other Identifying Data and Their Replacement by a Token, records n = 1 to 10 − Columns: Record, PIDAT, ADAT, ADAT, ADAT, ADAT − Record n: the single token PIDAT1.n followed by the four analysis data values for record n
  • 24. GDPR Origin Of Pseudonymisation • “Pseudonymisation” is defined in Article 4(5) of the GDPR − It means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person • Article 29 Working Party: − “pseudonymisation is not a method of anonymisation. It merely reduces the linkability of a dataset with the original identity of a data subject, and is accordingly a useful security measure.” • Encryption is a form of pseudonymisation − The original data cannot be read − The process cannot be reversed without the correct decryption key − GDPR requires that this additional information (the key) be kept separate from the pseudonymised data • Pseudonymisation reduces risks associated with data loss or unauthorised data access − Pseudonymised data is still regarded as personal data and so remains covered by the GDPR − It is viewed as part of the Data Protection By Design and By Default principle • Pseudonymisation is not mandatory − Implementing pseudonymisation with existing IT systems and processes would be complex and expensive and, to that extent, pseudonymisation might be considered an example of unnecessary complexity within the GDPR
  • 25. GDPR Origin Of Pseudonymisation • GDPR Recital 26 − The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person. To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments. The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes. • Pseudonymisation is not anonymisation − Anonymisation means data cannot be attributed to a person − Pseudonymisation means data can be attributed to a person using additional information − Pseudonymisation just makes identifying persons from data more difficult, time-consuming and expensive January 4, 2022 25
  • 26. GDPR Origin Of Pseudonymisation • Article 89 (1): as a means of enhancing protection in case of further use of data for research and statistics • Article 6 (4): as a means of possibly contributing to the compatibility of further use of data • Article 25: as a means to contribute to “privacy by design” in data applications • Recital 28: “The application of pseudonymisation to personal data can reduce the risks to the data subjects concerned and help controllers and processors to meet their data-protection obligations. The explicit introduction of ‘pseudonymisation’ in this Regulation is not intended to preclude any other measures of data protection.” January 4, 2022 26
  • 27. Why Pseudonymise? • Personal identifiable data is pseudonymised when there is a need to re-identify the data, for example, after it has been worked on by a third-party • Steps: 1. Original data 2. Pseudonymised data 3. Pseudonymisation key 4. Pseudonymised data transmitted to data processor 5. Processed data with additional processed data 6. Pseudonymised data with additional processed data returned 7. Original data merged with additional processed data January 4, 2022 27
  • 28. Why Pseudonymise? • Pseudonymisation is necessary when, for example, data is sent to another entity, either within or outside the organisation, for processing and the results of the processing need to be matched to the original data • [Diagram: workflow steps 1 to 7 shown across three zones: Within Organisation, Data Access/Data in Transit and Outside Organisation]
  • 29. Pseudonymisation And Data Breaches • [Diagram: breach points across the Within Organisation and Outside Organisation zones] − Data breach of pseudonymised data (marked at two points in the diagram) – risk of reidentification of pseudonymised data should be low − Data breach of the pseudonymisation key – risk of reidentification of pseudonymised data will be high − Data breach of the pseudonymisation algorithm – risk of reidentification of pseudonymised data will be high
  • 30. Pseudonymisation And Data Breaches • Pseudonymisation does not prevent data breaches but it can significantly reduce the possibility of any identification of individuals within the data • The depseudonymisation key and the algorithm used including seed values must be kept secure January 4, 2022 30
  • 31. Approach To Pseudonymisation • The approach to pseudonymisation depends on the format of the source data: text file, spreadsheet, database • Pseudonymisation is a field-level activity – it is designed to leave non-personal identifiable information unchanged • You may want to implement an approach where all data is converted to a common format before pseudonymisation to ensure consistency
  • 32. Growing Importance Of Pseudonymisation • Schrems II judgement − https://curia.europa.eu/juris/document/document.jsf?text=&docid=228677&pageIndex=0&doclang=en • This increased the importance of pseudonymisation in relation to data transfers outside the EU − The judgement found that the US FISA (Foreign Intelligence Surveillance Act) does not respect the minimum safeguards resulting from the principle of proportionality and cannot be regarded as limited to what is strictly necessary − While the changes apply to transfers outside the EU, especially to the US, they can be adopted pervasively for all data transfers to ensure consistency • The European Data Protection Board (EDPB) adopted version 2 of its recommendations on supplementary measures to enhance data transfer arrangements to ensure compliance with EU personal data protection requirements − https://edpb.europa.eu/system/files/2021-06/edpb_recommendations_202001vo.2.0_supplementarymeasurestransferstools_en.pdf • Pseudonymisation must ensure that: − Data is protected at the record and data set level as well as the field level so that the protection travels with the data wherever it is sent − Direct, indirect, and quasi-identifiers of personal information are protected − Mosaic effect re-identification attacks are prevented by adding high levels of uncertainty to pseudonymisation techniques
  • 33. Approaches To Pseudonymisation • Approaches to Pseudonymisation − Replace IDAT Fields With Linking Identifier − Hashing: Hash IDAT Fields; Hash IDAT Fields With Additional Salting/Peppering; Generate Hash From All Contents
  • 34. Pseudonymisation By Replacing ID Fields With Linking Identifier (Token) • Replaces identifying data with a random value that can be made available • Creates a separate non-accessible set of data that links the random value to the original record • Original record can be retrieved using the identifier • The approach to pseudonymisation must be kept secure • The depseudonymisation key data must be kept secure • Source Data − Columns: Record ID, Personal Data, Analytic Data − Record n (n = 1 to 10): IDATn, ADATn • Pseudonymised Data (Distributed Data) − Columns: Pseudonymised Personal Data Identifier, Analytic Data − Rows: (189, ADAT1), (157, ADAT2), (189, ADAT3), (252, ADAT4), (271, ADAT5), (174, ADAT6), (196, ADAT7), (144, ADAT8), (232, ADAT9), (210, ADAT10) • Additional Data Stored To Allow Recovery Of Personal Data (Depseudonymisation Key) − Columns: Record ID, Pseudonymised Personal Data Identifier, Personal Data − Rows: (1, 189, IDAT1), (2, 157, IDAT2), (3, 189, IDAT3), (4, 252, IDAT4), (5, 271, IDAT5), (6, 174, IDAT6), (7, 196, IDAT7), (8, 144, IDAT8), (9, 232, IDAT9), (10, 210, IDAT10)
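The linking identifier approach on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `pseudonymise_with_tokens` and `depseudonymise` are hypothetical, the token is assumed to be a random hex string from the standard `secrets` module, and tokens are regenerated on collision so that each token maps back to exactly one record.

```python
import secrets


def pseudonymise_with_tokens(records):
    """Split (IDAT, ADAT) records into distributable data and a key table."""
    distributed = []  # can be shared: token plus analytic data only
    key_table = {}    # must be stored separately and securely: token -> IDAT
    for idat, adat in records:
        token = secrets.token_hex(8)
        while token in key_table:  # regenerate on the (unlikely) collision
            token = secrets.token_hex(8)
        key_table[token] = idat
        distributed.append((token, adat))
    return distributed, key_table


def depseudonymise(distributed, key_table):
    """Restore the identifying data using the separately held key table."""
    return [(key_table[token], adat) for token, adat in distributed]


source = [("IDAT1", "ADAT1"), ("IDAT2", "ADAT2"), ("IDAT3", "ADAT3")]
distributed, key_table = pseudonymise_with_tokens(source)
restored = depseudonymise(distributed, key_table)
```

Only `distributed` would leave the organisation; anyone holding `key_table` can reverse the process, which is why the key data has to be protected at least as well as the original records.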
  • 35. Pseudonymisation By Replacing ID Fields With Linking Identifier – Multiple ID Fields • Replaces identifying data with a random value that can be made available • Multiple sets of identifying data can be removed and replaced with a single identifier • Source Data − Columns: Record ID, Personal Data 1, Personal Data 2, Personal Data 3, Analytic Data − Record n (n = 1 to 10): IDAT1.n, IDAT2.n, IDAT3.n, ADATn • Pseudonymised Data (Distributed Data) − Columns: Pseudonymised Personal Data Identifier, Analytic Data − Rows: (189, ADAT1), (157, ADAT2), (189, ADAT3), (252, ADAT4), (271, ADAT5), (174, ADAT6), (196, ADAT7), (144, ADAT8), (232, ADAT9), (210, ADAT10) • Additional Data Stored To Allow Recovery Of Personal Data (Depseudonymisation Key) − Columns: Record ID, Pseudonymised Personal Data Identifier − Rows: (1, 189), (2, 157), (3, 189), (4, 252), (5, 271), (6, 174), (7, 196), (8, 144), (9, 232), (10, 210)
  • 36. ID Field Hashing Pseudonymisation • Replaces identifying data with a hash code of the data − SHA3-512(IDAT1) = 576c23e0ec773508ae7a03d1b286d75f3a7cfe524625b658a1961d3fa7b0ebb4cc01b3b530c634c9525631614ad3ebcb3afb69d33e5d8608a1587c2f43c16535 • The input identifying data cannot be recalculated directly from the hash • Hash values can be easily calculated (“brute force” attack) and compared to pseudonymised values to recover the original identifying data • Source Data − Columns: Record ID, Personal Data, Analytic Data − Record n (n = 1 to 10): IDATn, ADATn • Pseudonymised Data (Distributed Data) − Columns: Pseudonymised SHA3-512 Personal Data Identifier, Analytic Data − Rows: (576c23e0ec…2f43c16535, ADAT1), (851e96103a…af098faa80, ADAT2), (2c0efa26c6…16d0e11e7a, ADAT3), (8d189e9f9d…5b536f446e, ADAT4), (fd3d9477f3…6ff3971823, ADAT5), (f056988e7e…672729e376, ADAT6), (7421c6c952…c6c7aef649, ADAT7), (e271bcb565…838f34f2d0, ADAT8), (418830d8b4…5afb7ae575, ADAT9), (f90de46242…ab093b5ee5, ADAT10) • Additional Data Stored To Allow Recovery Of Personal Data (Depseudonymisation Key) − Columns: Record ID, Pseudonymised SHA3-512 Personal Data Identifier − Rows: (1, 576c23e0ec…2f43c16535), (2, 851e96103a…af098faa80), (3, 2c0efa26c6…16d0e11e7a), (4, 8d189e9f9d…5b536f446e), (5, fd3d9477f3…6ff3971823), (6, f056988e7e…672729e376), (7, 7421c6c952…c6c7aef649), (8, e271bcb565…838f34f2d0), (9, 418830d8b4…5afb7ae575), (10, f90de46242…ab093b5ee5)
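As a rough sketch of unsalted ID field hashing, the following uses `hashlib.sha3_512` from the Python standard library. The helper name `hash_identifier` and the sample values are assumptions for illustration; the digests it produces are not claimed to match the values printed on the slide, since the exact byte encoding used there is not stated.

```python
import hashlib


def hash_identifier(idat: str) -> str:
    """Return the SHA3-512 digest of an identifying value as 128 hex digits."""
    return hashlib.sha3_512(idat.encode("utf-8")).hexdigest()


source = [("IDAT1", "ADAT1"), ("IDAT2", "ADAT2")]

# Distributed data: the hash replaces the identifying field.
distributed = [(hash_identifier(idat), adat) for idat, adat in source]

# Depseudonymisation key: hashing is one-way, so a lookup table is kept
# (securely and separately) to map each hash back to its record.
key_table = {hash_identifier(idat): idat for idat, _ in source}
```

Because the same input always produces the same digest, anyone who can guess candidate inputs can rebuild this lookup table, which is the weakness the following slides address.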
  • 37. Hashing And Identifier Codes • If any of the IDAT fields contains a recognisable identifier code then brute force hash attacks are very feasible, even with modest computing resources • Identifying data tends to be more structured than other data • For example, consider an identifier code with a format such as: − AAA-NNN-NNN-C • Where − A is an upper-case alphabetic character − N is a digit from 0 to 9 − C is a check character • There are 17,576,000,000 possible combinations of this sample identifier code (26^3 × 10^6, taking the check character to be determined by the other characters) – this may appear to be a large number • A single high-specification PC could calculate all the SHA3-512 hash values for these combinations in a few hours
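The arithmetic and the attack can be illustrated with a small sketch. Several things here are assumptions: the check character rule in `check_char` is hypothetical (the slide does not define one), and the search is deliberately reduced to the last three digits (1,000 candidates) so that it runs instantly. A real attack would iterate over the full space in the same way.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits

# Three letters and six digits; the check character adds no combinations.
TOTAL_CODES = 26**3 * 10**6


def check_char(body: str) -> str:
    """Hypothetical check character: sum of character codes modulo 26."""
    return ascii_uppercase[sum(ord(c) for c in body) % 26]


def make_code(letters: str, first: str, second: str) -> str:
    """Build a code in the AAA-NNN-NNN-C format."""
    body = f"{letters}-{first}-{second}"
    return f"{body}-{check_char(body)}"


def sha3(value: str) -> str:
    return hashlib.sha3_512(value.encode("utf-8")).hexdigest()


def brute_force(target_hash: str, letters: str, first: str):
    """Try every value of the last three digits until the hash matches."""
    for combo in product(digits, repeat=3):
        candidate = make_code(letters, first, "".join(combo))
        if sha3(candidate) == target_hash:
            return candidate
    return None


secret_code = make_code("ABC", "123", "456")
recovered = brute_force(sha3(secret_code), "ABC", "123")
```

The attacker never inverts the hash; knowledge of the format is enough to enumerate candidates and compare digests.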
  • 38. ID Field Hashing Pseudonymisation With Data Salting And Peppering • Salt is an additional data item, different for each record, that is added to each identifying data item before hashing • Pepper is a fixed item of data added to record or field level data before hashing • HASH(CONCATENATE(IDATi+SALTi+PEPPER)) = Hashed Identifying Data − SHA3-512(CONCATENATE(IDAT1+SALT1+PEPPER)) = 3fa075114200b2327092f18067059ba81a5b191b33d5a10a2042673adcb119fac4dc5d3f63c60d44e132f4db5996d416fd70216d4e055f1e5ccc0258ff15e1e1 • This approach eliminates almost all the risk from brute force hash generation attacks unless the approach to generating the Salt and Pepper can be determined
  • 39. ID Field Hashing Pseudonymisation With Data Salting And Peppering – Example • One possible approach is to use a cryptographically secure pseudo random number generator (CSPRNG) to generate salt values, such as: − Fortuna - https://www.schneier.com/academic/fortuna/ • PCG (https://www.pcg-random.org/) is a fast general-purpose PRNG but is not designed to be cryptographically secure, so it is less suitable for generating secret salt values • Other less secure PRNGs are vulnerable to attacks • Using a CSPRNG ensures that the random salt values are very difficult to determine, which in turn makes brute force attacks virtually impossible − HASH(CONCATENATE(IDAT1+1144360296176+2356573852518)) − HASH(CONCATENATE(IDAT2+4700182946372+2356573852518)) − HASH(CONCATENATE(IDAT3+1112492458021+2356573852518)) − HASH(CONCATENATE(IDAT4+2755842713752+2356573852518)) − HASH(CONCATENATE(IDAT5+6908485085952+2356573852518))
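A minimal sketch of salted and peppered hashing follows. It swaps in Python's standard `secrets` module (which draws on the operating system's CSPRNG) in place of Fortuna, and it assumes 16-byte salt and pepper values and plain byte concatenation; the function names are hypothetical and none of these choices come from the slides.

```python
import hashlib
import secrets


def salted_peppered_hash(idat: str, salt: bytes, pepper: bytes) -> str:
    """SHA3-512 over the identifying value, its record salt and the pepper."""
    return hashlib.sha3_512(idat.encode("utf-8") + salt + pepper).hexdigest()


def pseudonymise(idats, pepper: bytes):
    """Return the pseudonyms and the key table needed to reverse them."""
    pseudonyms = []
    key_table = {}  # pseudonym -> (original value, salt); keep secure
    for idat in idats:
        salt = secrets.token_bytes(16)  # a different random salt per record
        digest = salted_peppered_hash(idat, salt, pepper)
        key_table[digest] = (idat, salt)
        pseudonyms.append(digest)
    return pseudonyms, key_table


pepper = secrets.token_bytes(16)  # one fixed secret for the whole dataset
pseudonyms, key_table = pseudonymise(["IDAT1", "IDAT1", "IDAT2"], pepper)
```

Note that the two occurrences of `IDAT1` receive different pseudonyms because their salts differ, so an attacker cannot even tell that two records share an identifying value.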
  • 40. ID Field Hashing Pseudonymisation With Data Salting And Peppering – Attacks • HASH(CONCATENATE(IDAT1+1144360296176+2356573852518)) = c47b08542113284a426e5db9fc19203cc2e464f600005e7fb00e2e2362088d5107b993cb141696887c17464aaa8d2e99b0aceff3421d942ae355ae7cedbfe888 • To identify the value that generated the hash with a brute force attack you would have to: 1. Know the structure of the identifying data in order to permute its values 2. Know the PRNG algorithm and its seed values or the individual salt value associated with a record 3. Know the pepper value
  • 41. ID Field Hashing Pseudonymisation With Data Salting And Peppering • Replaces identifying data with a hash code of the data • The input identifying data cannot be recalculated directly from the hash • Hash values cannot be easily calculated (“brute force” attack) and compared to pseudonymised values to recover the original identifying data • Source Data − Columns: Record ID, Personal Data, Analytic Data − Record n (n = 1 to 10): IDATn, ADATn • Pseudonymised Data After Salting and Peppering (Distributed Data) − Columns: Pseudonymised SHA3-512 Personal Data Identifier, Analytic Data − Rows: (3fa0751142…58ff15e1e1, ADAT1), (a8bb5547f4…4acdfb8897, ADAT2), (23ca9f1638…07b93affcf, ADAT3), (2891a8d93f…124c7153b7, ADAT4), (5245824d14…0802c1c711, ADAT5), (f707bc0c7f…20d041329f, ADAT6), (74d27921d7…7d64cb0368, ADAT7), (78d63bd6aa…beb8a13ac9, ADAT8), (8e8edb07f5…357f0e548b, ADAT9), (1e7604e8b4…ffc5bdc796, ADAT10) • Additional Data Stored To Allow Recovery Of Personal Data (Depseudonymisation Key) − Columns: Record ID, Pseudonymised SHA3-512 Personal Data Identifier − Rows: (1, 3fa0751142…58ff15e1e1), (2, a8bb5547f4…4acdfb8897), (3, 23ca9f1638…07b93affcf), (4, 2891a8d93f…124c7153b7), (5, 5245824d14…0802c1c711), (6, f707bc0c7f…20d041329f), (7, 74d27921d7…7d64cb0368), (8, 78d63bd6aa…beb8a13ac9), (9, 8e8edb07f5…357f0e548b), (10, 1e7604e8b4…ffc5bdc796)
  • 42. Content Hashing Pseudonymisation • Generate a hash token value based on the entire record contents − SHA3-512(IDAT1,ADAT1,SALT1,PEPPER) • This results in a very high degree of variability in the source data for the hashes • Increases the difficulty of identifying the source data that generated the hash code • Source Data − Columns: Record ID, Personal Data, Analytic Data − Record n (n = 1 to 10): IDATn, ADATn • Pseudonymised Data (Distributed Data) − Columns: Pseudonymised SHA3-512 Personal Data Identifier, Analytic Data − Rows: (576c23e0ec…2f43c16535, ADAT1), (851e96103a…af098faa80, ADAT2), (2c0efa26c6…16d0e11e7a, ADAT3), (8d189e9f9d…5b536f446e, ADAT4), (fd3d9477f3…6ff3971823, ADAT5), (f056988e7e…672729e376, ADAT6), (7421c6c952…c6c7aef649, ADAT7), (e271bcb565…838f34f2d0, ADAT8), (418830d8b4…5afb7ae575, ADAT9), (f90de46242…ab093b5ee5, ADAT10) • Additional Data Stored To Allow Recovery Of Personal Data (Depseudonymisation Key) − Columns: Record ID, Pseudonymised SHA3-512 Personal Data Identifier − Record n (n = 1 to 10): HASH(IDATn,ADATn,SALTn,PEPPER)
  • 43. Hashing And Reversibility • The hash of a value is always the same – there is no randomness in hashing • Hashes of very similar input values are very different – a very small input change leads to a very large difference in the generated hash − SHA3-512 – 0.5% change in input value leads to 85%-95% difference in hash output − Given two hash values, it cannot be determined how similar the input values are or what the structure of the input values might be − This non-correlation property means the hash function is characterised by erratic behaviour in its output generation • Hashing as a form of pseudonymisation is potentially vulnerable to brute force attacks because large numbers of hashes can be generated very easily and quickly – if you have some knowledge of the input value you can generate large numbers of permutations and their hashes and compare values with the known hash to identify the original value • Ultimately you have to have the exact input value to generate the same hash – being very close is of no benefit
  • 44. Hashing And Reversibility • Combining the original data with even a small amount of randomised data renders brute force attacks of hash values ineffective January 4, 2022 44
  • 45. Hashing And Reversibility • Small (single character) sample input value changes and hashes generated • Input 1: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus ultra" to the progress of ... − SHA3-512 Hash: e0ef7bd38b6b4bc6a27e7260d2162b2ea58cf5afa5098072d0f735f9d73b67f9b9f699b8b098ec41d44e117135e88b3cfb670876a2f34efd5734e7ce80b64450 • Input 2: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "Ne plus ultra" to the progress of ... − SHA3-512 Hash: e0ab9f0efb8f4cc2b89b73439f7b1365e687b17b7e0bdc0ede00751a5a883ad8ee0877b9b6a3032ad23521a7bc25a0b199e5c57cdb2cb5d7500c997e133c41a1 • Input 3: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne Plus ultra" to the progress of ... − SHA3-512 Hash: 61361212da56a824559b81409cf02ba5f8c3bf41d4c8038faa885a183e1bdac1705eefad72594af1fc3901aa55295c3166eb6635ca866f1e5cdf56c7ff0fb56a • Input 4: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus Ultra" to the progress of ... − SHA3-512 Hash: 833d8b7cc47843cf74fd42cbbf782e87543c677ecbdc1f7fe4d7ad9166557fac4c17d467fa81302a195e60a0a6f3f89c34e03a5c94eefcb3f19cabcfd87a37ad
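The effect of a single character change can be measured directly. The sketch below uses short sample strings of its own rather than the quotation above, and reports two hypothetical metrics: the fraction of hex digits that differ and the fraction of bits that differ. For unrelated digests roughly 15 in 16 hex digits and about half of the bits would be expected to differ, which is presumably the sense of the 85%-95% figure quoted earlier.

```python
import hashlib


def sha3_hex(text: str) -> str:
    return hashlib.sha3_512(text.encode("utf-8")).hexdigest()


def fraction_hex_digits_differing(a: str, b: str) -> float:
    """Share of hex digit positions where the two digests disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fraction_bits_differing(a: str, b: str) -> float:
    """Share of bit positions where the two digests disagree."""
    diff = int(a, 16) ^ int(b, 16)
    return bin(diff).count("1") / (len(a) * 4)


h1 = sha3_hex('to fix the "ne plus ultra" to the progress of')
h2 = sha3_hex('to fix the "Ne plus ultra" to the progress of')

hex_diff = fraction_hex_digits_differing(h1, h2)
bit_diff = fraction_bits_differing(h1, h2)
print(f"hex digits differing: {hex_diff:.1%}, bits differing: {bit_diff:.1%}")
```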
  • 46. Pseudonymisation – Calculation Or Storage • Storing pseudonymisation values for depseudonymisation involves a storage overhead • Pseudonymisation values could be computed to avoid key storage at the expense of computation overhead • Computation overhead for generating the depseudonymisation key for an individual value depends on the approach to calculating the individual SALT value • The relative position of the record n must be known or fixed to generate the correct SALTn
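One way to trade storage for computation is to derive SALTn deterministically from a secret seed and the record position n, so that salts never need to be stored. The sketch below is an assumption-laden illustration: it uses HMAC-SHA-256 keyed with the seed as the derivation function (a swapped-in technique, not the seeded PRNG the slides describe), and the seed, pepper and function names are hypothetical.

```python
import hashlib
import hmac


def derive_salt(seed: bytes, n: int) -> bytes:
    """Recompute SALTn from the secret seed and the record position n."""
    return hmac.new(seed, n.to_bytes(8, "big"), hashlib.sha256).digest()


def pseudonym(idat: str, n: int, seed: bytes, pepper: bytes) -> str:
    data = idat.encode("utf-8") + derive_salt(seed, n) + pepper
    return hashlib.sha3_512(data).hexdigest()


def depseudonymise(token: str, source_records, seed: bytes, pepper: bytes):
    """Recompute pseudonyms from the source records until one matches."""
    for n, idat in enumerate(source_records, start=1):
        if pseudonym(idat, n, seed, pepper) == token:
            return idat
    return None


SEED = b"example-seed-keep-secret"      # illustrative values only
PEPPER = b"example-pepper-keep-secret"

source_records = ["IDAT1", "IDAT2", "IDAT3"]
tokens = [pseudonym(idat, n, SEED, PEPPER)
          for n, idat in enumerate(source_records, start=1)]
recovered = depseudonymise(tokens[1], source_records, SEED, PEPPER)
```

The cost is visible in `depseudonymise`: without a stored key table each lookup recomputes hashes, and the result is only correct while record positions stay fixed.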
  • 47. Pseudonymisation And Data Lakes/Data Warehouses • [Diagram: Source Data, Pseudonymised Data, Depseudonymisation Key, Data Lake and Data Warehouse, connected by flows numbered 1 to 4]
  • 48. Pseudonymisation And Data Lakes/Data Warehouses • Data should be pseudonymised before the data lake and/or data warehouse is populated, which ensures data privacy by design and by default • The high-level stages are: 1. As part of the ETL/ELT process, the source data is pseudonymised and the depseudonymisation key is created 2. The pseudonymised data is passed to the data lake 3. The pseudonymised data created by the ETL/ELT process is used to update the data warehouse directly 4. The pseudonymised data in the data lake is used to update the data warehouse
  • 49. Differential Privacy • Differential privacy allows for the (public) sharing of information about a group or aggregate by describing the patterns of groups within the group or aggregate while suppressing information about individuals in the group or aggregate • A viewer of the information cannot tell if an individual's information was or was not used in the group or aggregate • This involves inserting noise into the results returned from a query of the data • Well-proven, widely used robust technique − The Algorithmic Foundations of Differential Privacy - https://www.cis.upenn.edu/~aaroth/privacybook.html • Eliminates the possibility of re-identification of individuals from the dataset • Individual-specific information is always hidden • Automates the curation of data January 4, 2022 49
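The idea of inserting noise into query results can be sketched with the Laplace mechanism, a standard technique described in the book referenced above. The slides do not say which mechanism a given platform uses, so the mechanism, the privacy parameter `epsilon` and the function names here are assumptions. A counting query changes by at most 1 when one individual is added or removed, so noise with scale 1/epsilon is applied.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two exponential samples."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Counting query released with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [23, 35, 41, 52, 67, 70]
# random.Random is used for illustration only; a production system
# would need a cryptographically secure noise source.
released = noisy_count(ages, lambda age: age > 40, epsilon=1.0,
                       rng=random.Random())
```

A smaller `epsilon` gives more noise and more privacy; a larger one gives more accurate results and less privacy.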
  • 50. Differential Privacy Hosted Platform • This illustrates a logical architecture for a hosted differential privacy platform where organisation data is moved to the external platform • [Diagram labels: Organisation Data Zone; Organisation Application Zone; Organisation DMZ; Data Sources; Data Extract Process; Data Gateway; Differential Privacy Platform; Summarised Data; Metadata; Data Privacy Computation Engine; User Access API; Data Visualisation Interface; User Directory (organisation); User Directory (platform); Privacy Audit Logging and Monitoring; Platform Performance and Usage; Authorised Users; Privatised Analysis Results]
  • 51. Differential Privacy On-Premises Platform • This illustrates a logical architecture for an on-premises differential privacy platform where external access to the platform is enabled • [Diagram labels: Organisation Data Zone; Organisation Application Zone; Organisation DMZ; Data Sources; Data Extract Process; Differential Privacy Platform; Summarised Data; Metadata; Data Privacy Computation Engine; User Access API; User Directory; Privacy Audit Logging and Monitoring; Platform Performance and Usage; Authorised User; Privatised Analysis Results]
  • 52. Differential Privacy Logical Architecture • [Diagram: logical architecture with components numbered 1 to 19, described in the component list on the following slide. Labels: Core Data Privatisation/Differential Privacy Operational Platform; Internally Located Data Source – Internally Owned; Internally Located Data Source - Third Party Owned; Externally Located Data Source - Third Party Owned; Data Access Connector (three instances); Data Ingestion and Summarisation; Data Analysis Data Store; Metadata Store; Batch Task Manager; Access and Usage Log; User Access API; Data Visualisation Interface; User Directory; Authorised Internal Access; Authorised External Data Access; Analytics and Reporting; Monitoring, Logging and Auditing; Data Access Creation, Validation and Deployment; Management and Administration; Security and Access Control; Billing System Interface]
  • 53. Differential Privacy Logical Architecture – Components • Item 1: Core Data Privatisation/Differential Privacy Operational Platform – this is the core differential privacy platform. This can be installed on-premises or on a cloud platform. It takes and summarises data from designated data sources and provides different levels and types of computational access to authorised users via a data API. It also provides a range of management and administration functions. • Item 2: Data Sources – these represent data held in a variety of databases and other data storage systems. The differential privacy platform needs read-only access to these data sources. • Item 3: Data Access Connector – these are connectors that enable read-only access to data held in the data sources. • Item 4: Data Ingestion and Summarisation – this takes data from data sources, processes it and outputs it in a format suitable for access. It includes features to manage data ingestion workflows, scheduling and error identification and handling. • Item 5: Data Analysis Data Store – the core differential privacy platform creates pre-summarised versions of the raw data from the data sources. The platform never provides access to individual source data records. The data is encrypted while at rest in the data store. • Item 6: Metadata Store – the platform creates and stores metadata about each data source. This is used to optimise data privacy of the result sets generated in response to data queries. • Item 7: Batch Task Manager – in addition to running online data queries, asynchronous batch tasks can be run for longer data tasks. • Item 8: Access and Usage Log – this logs data accesses. • Item 9: User Access API – the platform provides an API for common data analytics tools to generate and retrieve privatised randomised sets of data summaries as well as providing data querying and analytics capabilities. Data results returned from queries are encrypted while in transit. • Item 10: Data Visualisation Interface – this provides a data access and visualisation interface. • Item 11: User Directory – the platform will use your existing user directories for user authentication and authorisation. • Item 12: Authorised Internal Access – authorised internal users can access different datasets and perform different query types depending on their assigned rights. • Item 13: Authorised External Access – authorised external users can access different datasets and perform different query types depending on their assigned rights. • Item 14: Analytics and Reporting – this will allow you to analyse and report on user accesses to data managed by the platform. • Item 15: Monitoring, Logging and Auditing – this will log both system events and user activities. This information can be used both for platform management and planning as well as identifying potential patterns of data use and possible abuse. • Item 16: Data Access Creation, Validation and Deployment – this will allow new data sources to be onboarded and allow existing data sources to be managed and updated. • Item 17: Management and Administration – this will provide facilities to manage the overall platform such as adding and removing users and user groups and applying data privacy settings to different datasets. • Item 18: Security and Access Control – this allows the management of different types of user access to different datasets. • Item 19: Billing System Interface – you may want to charge for data access, either at a flat rate or by access or a mix of both. This represents an optional link to a financial management system to enable this.
  • 54. Differential Privacy – Privacy Budget • [Diagram: Differential Privacy Platform containing Summarised Data, Metadata, Data Privacy Computation Engine and User Access API] − Dataset Specific Privacy Budget (Privacy Exposure Limit) − Differential Privacy Platform Introduces Fuzziness (Randomisation) Into Query Results − Every Query Has a Privacy Cost That Is Taken From the Dataset Privacy Budget
  • 55. Differential Privacy – Privacy Budget • A differential privacy approach to data privacy assigns a privacy budget to each dataset • The differential privacy engine introduces a fuzziness into the results of queries − The greater the introduced fuzziness the greater the privacy but the utility of the results is reduced • Each query has a privacy cost • The total privacy expenditure across all queries by all users is tracked • When the budget has been spent, no further data queries can be performed until more privacy budget is allocated • This provides a warning threshold for differencing attacks January 4, 2022 55
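The budget mechanism described above can be sketched as a small wrapper that charges each query against a per-dataset allowance and refuses queries once it is spent. This is a simplified illustration: the class and exception names are hypothetical, the cost of a query is assumed to be its `epsilon` value, costs are assumed to add up linearly, and Laplace noise is assumed as the source of fuzziness.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a query would exceed the dataset's privacy budget."""


class PrivateDataset:
    def __init__(self, records, total_epsilon: float, rng=None):
        self._records = list(records)
        self.remaining = total_epsilon        # dataset-specific budget
        self._rng = rng or random.Random()    # illustration only, not a CSPRNG

    def noisy_count(self, predicate, epsilon: float) -> float:
        """Answer a counting query, charging epsilon to the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise BudgetExhausted("privacy budget spent; allocate more budget")
        self.remaining -= epsilon
        scale = 1.0 / epsilon  # lower cost per query means more fuzziness
        noise = (self._rng.expovariate(1 / scale)
                 - self._rng.expovariate(1 / scale))
        true_count = sum(1 for r in self._records if predicate(r))
        return true_count + noise


dataset = PrivateDataset(range(100), total_epsilon=1.0)
first = dataset.noisy_count(lambda x: x > 50, epsilon=0.4)
second = dataset.noisy_count(lambda x: x > 60, epsilon=0.4)
# A third query at epsilon=0.4 would now raise BudgetExhausted.
```

Because the budget is tracked per dataset rather than per user, queries from different users draw on the same allowance, matching the tracking of total expenditure across all users.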
  • 56. Differential Privacy – Privacy Budget • Effective and usable data privatisation and differential privacy means finding the right balance between data privacy and data utility • [Diagram, repeated for three different balances: Level of Data Privacy; Amount and Complexity of Data Processing Allowed; Level of Detail Contained in Results]
  • 57. Differential Privacy And Data Attacks • [Diagram: the hosted differential privacy platform architecture (Organisation Data Zone, Organisation Application Zone, Organisation DMZ, Data Sources, Data Extract Process, Data Gateway, Differential Privacy Platform, Summarised Data, Metadata, Data Privacy Computation Engine, User Access API, Data Visualisation Interface, User Directory, Privacy Audit Logging and Monitoring, Platform Performance and Usage) annotated with two attacks] − Differencing Attack: Combine Results of Multiple Queries − Mosaic Effect Attack: Combine Results With Other Data Sources
  • 58. Differencing Attack, Reconstruction Attack And Mosaic Effect Attack • A reconstruction attack uses the information from a differencing attack to identify how the original dataset was processed to create the summary − The process is compromised and individual data may be compromised • A mosaic effect attack involves combining the data with data from other (public) sources to identify individuals • For example, apparently anonymised medical data containing dates of death can be combined with public death notice records to identify individuals
  • 59. Differencing Attack
    • Multiple partially-overlapping queries can be run until the results can be combined to identify an individual
      − How many people in the group are aged greater than N?
      − How many people in the group aged greater than N have attribute A?
      − How many people in the group aged greater than N have attribute B?
      − How many people with ages in the range N-9 to N-5 are male?
      − How many people with ages in the range N-4 to N are male?
    • After a number of queries you may be able to establish that individuals, or small numbers of individuals, of a given sex in a given age range have a defined attribute
    • Apparently anonymous summary results can be combined to reveal potentially sensitive insights and compromise confidentiality
    • Differential privacy is designed to reduce or eliminate the threat of differencing attacks
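The overlapping "aged greater than N" queries above can be illustrated with a small sketch. It assumes exact counts with no added noise, integer ages, and invented sample records; the function names `count` and `infer_attribute` are hypothetical.

```python
def count(records, predicate):
    """An exact summary query: how many records satisfy the predicate."""
    return sum(1 for r in records if predicate(r))


def infer_attribute(records, n):
    """Infer the attribute of the only person aged exactly n, using counts alone.

    Returns True or False for that person's attribute, or None when the
    counts show that the age n is not unique in the group.
    """
    aged_over_prev = count(records, lambda r: r["age"] > n - 1)
    aged_over_n = count(records, lambda r: r["age"] > n)
    if aged_over_prev - aged_over_n != 1:
        return None  # zero or several people aged n: no single target

    attr_over_prev = count(records, lambda r: r["age"] > n - 1 and r["has_attribute"])
    attr_over_n = count(records, lambda r: r["age"] > n and r["has_attribute"])
    # The two counts differ only by the target's own contribution.
    return attr_over_prev - attr_over_n == 1


if __name__ == "__main__":
    # Invented records; the attacker never sees these rows, only the counts.
    group = [
        {"age": 34, "has_attribute": False},
        {"age": 41, "has_attribute": True},
        {"age": 52, "has_attribute": True},
        {"age": 58, "has_attribute": False},
        {"age": 63, "has_attribute": True},
    ]
    print(infer_attribute(group, 52))  # True
    print(infer_attribute(group, 58))  # False
```

Each individual answer is an aggregate, yet the difference between two of them is a fact about one person. Adding noise to each count makes that difference unreliable.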
  • 60. Differencing Attack
    [Figure: Summarised And Reduced Data Records; Individual Queries On Summarised Data; Intersection Of Query Results Can Allow Individuals To Be Identified]
  • 61. Differencing Attack, Reconstruction Attack And Mosaic Effect
    [Figure: flow diagram. Original Source Data; Source Data Processed to Create Summarised Data; Summarised Source Data; Multiple Queries Run Against Source Data; Differencing Attack (Can Lead To Identification Of Individuals Or Small Groups, and Can Lead To Reconstruction Attack); Reconstruction Attack (Seeks To Understand The Structure Of The Source Data; Can Provide Insight That Allows Identification Of Individuals); Mosaic Effect (Combined With Other Data Sets; Can Lead To Identification Of Expanded Set Of Information About Individuals)]
  • 62. Summary
    • Your data has value to your organisation and to relevant data sharing partners
      − The data was expensively obtained
      − It represents a valuable asset on which a return must be generated
    • To achieve the value inherent in the data you need to be able to make it appropriately available to others, both within and outside the organisation
    • This presentation has outlined technology approaches to achieving compliance with data privacy regulations and legislation while providing access to data
    • Technology is part of a risk management approach to data privacy
    • Using these technologies will embed such compliance by design into your data sharing and access facilities
    • This will allow you to realise value from your data successfully
    • The data privacy regulatory landscape is complex and becoming more complex, so an approach to data access and sharing that embeds compliance as a matter of course is required
    • There is a wider operational data sharing and data privacy framework that includes technology aspects, among other key areas