This paper describes how technologies such as data pseudonymisation and differential privacy enable access to sensitive data and unlock data opportunities and value while ensuring compliance with data privacy legislation and regulations.
Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy - Alan McSweeney
Your data has value to your organisation and to relevant data sharing partners. It has been expensively obtained. It represents a valuable asset on which a return must be generated. To achieve the value inherent in the data you need to be able to make it appropriately available to others, both within and outside the organisation.
Organisations are frequently data rich and information poor, lacking the skills, experience and resources to convert raw data into value.
These notes outline technology approaches to achieving compliance with data privacy regulations and legislation while providing access to data.
There are different routes to making data accessible and shareable within and outside the organisation without compromising compliance with data protection legislation and regulations, while removing the risk associated with allowing access to personal data:
• Differential Privacy – source data is summarised and individual personal references are removed. The one-to-one correspondence between original and transformed data has been removed
• Anonymisation – identifying data is destroyed and cannot be recovered so individuals cannot be identified. There is still a one-to-one correspondence between original and transformed data
• Pseudonymisation – identifying data is encrypted and recovery data/token is stored securely elsewhere. There is still a one-to-one correspondence between original and transformed data
These technologies and approaches are not mutually exclusive – each is appropriate to differing data sharing and data access use cases.
The data privacy regulatory and legislative landscape is complex and getting even more complex so an approach to data access and sharing that embeds compliance as a matter of course is required.
Appropriate technology, appropriately implemented and operated, is a means of managing and reducing the risks of re-identification by making the time, skills, resources and money necessary to achieve re-identification unrealistic.
Technology is part of a risk management approach to data privacy. There is a wider operational data sharing and data privacy framework that includes technology aspects, among other key areas. Using these technologies will embed such compliance by design into your data sharing and access facilities. This will allow you to realise value from your data successfully.
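The three approaches listed above can be illustrated with a minimal Python sketch. The sample record, field names and privacy-budget value are assumptions made for illustration only, not part of the original material:

```python
import random
import secrets

# Hypothetical sample record; field names are illustrative only.
record = {"name": "Jane Murphy", "age": 42, "city": "Dublin"}

# Pseudonymisation: replace the identifier with a token and keep the
# token-to-identity mapping in a separate, secured store. The one-to-one
# correspondence survives, so re-identification remains possible via the vault.
token_vault = {}                      # in practice: a separately secured system
token = secrets.token_hex(8)
token_vault[token] = record["name"]
pseudonymised = {**record, "name": token}

# Anonymisation: destroy the identifier irreversibly (here: drop it entirely).
# A keyed hash with the key discarded afterwards has a similar effect.
anonymised = {k: v for k, v in record.items() if k != "name"}

# Differential privacy (Laplace mechanism): release only a noisy aggregate,
# never individual rows. epsilon is the privacy budget; sensitivity is 1
# for a counting query, so the noise scale is 1/epsilon.
def dp_count(true_count: int, epsilon: float = 1.0) -> float:
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon)
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise
```

Note how pseudonymisation keeps a recoverable link, anonymisation severs it, and differential privacy never releases row-level data at all.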
Blockchain Essentials and Blockchain on Azure - Nuri Cankaya
In this presentation I cover the basics of Blockchain and deep-dive into the possibilities of Microsoft Azure for Blockchain projects.
What is Blockchain
Blockchain Disruption
Blockchain Business Scenarios
Microsoft’s Strategy on Blockchain
Blockchain 2.0: Smart Contracts
Blockchain 3.0: Cryptlets innovation
Blockchain on Microsoft Azure
Bletchley Project
Azure Blockchain Solutions
The presentation is about blockchain technology. It gives a basic idea of blockchain and its applications, i.e. where blockchain technology can be used.
This Edureka Blockchain technology tutorial will give you an understanding of how blockchain works and what blockchain technologies are. This tutorial covers the following topics:
1. What are Blockchain & Bitcoin
2. Blockchain Technologies
3. Peer to Peer Network
4. Cryptography
5. Proof of Work & Blockchain Program
6. Ethereum & Smart Contracts
7. Blockchain Applications and Use Cases
What is Blockchain?
Advantages of Blockchain
Working of Blockchain
Challenges of Blockchain
Blockchain in Bitcoin
Satoshi Nakamoto brief intro
Blockchain Mining
Need & Types of Blockchain Mining
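The proof-of-work and mining topics in the outline above can be sketched as a toy hash chain in Python. The block fields, difficulty value and sample transaction are illustrative assumptions, not Bitcoin's actual block format:

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Deterministic SHA-256 hash of a block's contents."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def mine(block: dict, difficulty: int = 3) -> dict:
    """Proof of work: find a nonce so the hash starts with `difficulty` zeros."""
    block = dict(block, nonce=0)
    while not block_hash(block).startswith("0" * difficulty):
        block["nonce"] += 1
    return block

# A toy two-block chain: each block commits to its predecessor's hash,
# which is what makes retroactive tampering detectable.
genesis = mine({"index": 0, "data": "genesis", "prev": "0" * 64})
block1 = mine({"index": 1, "data": "tx: A->B 5", "prev": block_hash(genesis)})

assert block_hash(block1).startswith("000")
```

Changing any field in `genesis` would change its hash, invalidating `block1.prev` and every block after it, which is the core tamper-evidence property the tutorial describes.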
- Learn what knowledge graphs are for
- Understand the structure of knowledge graphs (and how they relate to taxonomies and ontologies)
- Understand how knowledge graphs can be created using manual, semi-automatic, and fully automatic methods.
- Understand knowledge graphs as a basis for data integration in companies
- Understand knowledge graphs as tools for data governance and data quality management
- Implement and further develop knowledge graphs in companies
- Query and visualize knowledge graphs (including SPARQL and SHACL crash course)
- Use knowledge graphs and machine learning to enable information retrieval, text mining and document classification with the highest precision
- Develop digital assistants and question and answer systems based on semantic knowledge graphs
- Understand how knowledge graphs can be combined with text mining and machine learning techniques
- Apply knowledge graphs in practice: Case studies and demo applications
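The triple structure and SPARQL-style querying covered above can be sketched with a tiny in-memory store in Python. The entity and predicate names are made up for illustration; this mimics a SPARQL basic graph pattern match, not a real triple store:

```python
# A knowledge graph is a set of (subject, predicate, object) triples.
triples = {
    ("acme:ProductX", "rdf:type", "acme:Product"),
    ("acme:ProductX", "acme:madeBy", "acme:AcmeCorp"),
    ("acme:AcmeCorp", "rdf:type", "acme:Company"),
    ("acme:AcmeCorp", "acme:locatedIn", "acme:Ireland"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Equivalent to: SELECT ?who WHERE { acme:ProductX acme:madeBy ?who }
makers = [o for (_, _, o) in match(s="acme:ProductX", p="acme:madeBy")]
print(makers)  # ['acme:AcmeCorp']
```

In practice a library such as rdflib and a SPARQL endpoint would replace this, but the pattern-matching idea is the same.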
Blockchain Training | Blockchain Tutorial for Beginners | Blockchain Technology - Edureka!
This Edureka Blockchain training will give you a fundamental understanding regarding Blockchain and Bitcoin.
This session will cover the following topics:
1. Current Existing Monetary System
2. How can Blockchain and Bitcoin help?
3. What is Blockchain?
4. Blockchain concepts
5. Bitcoin Transaction
6. Blockchain features
7. Blockchain Use Case
8. Demo: Bitcoin Transaction
Generative AI's impact on creativity and productivity is undeniable. This presentation dives into real-world security and privacy risks, along with methods to address them. Can generative AI be used for cybersecurity? Let's explore!
A presentation explaining the concepts of Blockchain. It covers the introduction to blockchain, types of blockchain, the process of adding blocks in the bitcoin blockchain, the hyperledger block structure, and use cases of blockchain.
I created this presentation for a client who wanted to understand how blockchain technology can be used in healthcare, particularly for eHR (electronic health record). They wanted a non-technical overview.
Presentation on Data Mesh: the paradigm shift is a new type of ecosystem architecture, a shift left towards a modern distributed architecture that allows domain-specific ownership of data, views "data-as-a-product" and enables each domain to handle its own data pipelines.
Overview of Data Loss Prevention (DLP) Technology - Liwei Ren 任力偉
DLP is a technology that detects potential data breach incidents in a timely manner and prevents them by monitoring data in-use (endpoints), in-motion (network traffic), and at-rest (data storage). It has been driven by regulatory compliance and intellectual property protection. This talk will introduce DLP models that describe the capabilities and scope that a DLP system should cover. A few system categories will be discussed accordingly with high-level system architecture. DLP is an interesting technology in that it provides advanced content inspection techniques. As such, a few content inspection techniques will be proposed and investigated in rigorous terms.
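The content inspection the abstract refers to is often rule-based at its simplest level. The following sketch shows pattern-based detection over text content; the rule names and regular expressions are simplified assumptions, not a production DLP rule set:

```python
import re

# Illustrative detectors for inspecting data in-use, in-motion or at-rest.
RULES = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def inspect(text: str) -> list:
    """Scan content and return (rule, match) pairs for potential leaks."""
    hits = []
    for name, pattern in RULES.items():
        for m in pattern.finditer(text):
            hits.append((name, m.group()))
    return hits

findings = inspect("Contact jane@example.com, card 4111 1111 1111 1111")
```

Real DLP systems layer far more sophisticated techniques on top of this, such as document fingerprinting and exact/partial data matching, which is what the talk's rigorous treatment covers.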
White Paper - Process Neutral Data Modelling - David Walker
This paper describes in detail the process for creating an enterprise data warehouse physical data model that is less susceptible to change. Change is one of the largest on-going costs in a data warehouse and therefore reducing change reduces the total cost of ownership of the system. This is achieved by removing business process specific data and concentrating on core business information.
The white paper examines why data-modelling style is important and how issues arise when using a data model for reporting. It discusses a number of techniques and proposes a specific solution. The techniques should be considered when building a data warehouse solution even when an organisation decides against using the specific solution.
This paper is intended for a technical audience and project managers involved with the technical aspects of a data warehouse project.
These are the first 4 chapters of my book Refresh the Road Ahead (www.refreshroadahead.com). A book on how to work successfully with Microsoft. The full book is 12 chapters and 260 pages and you can buy it from the website www.refreshroadahead.com or on Amazon
Cambridge Centre for Alternative Finance, supported by Invesco, published the "3rd Global Cryptoasset Benchmarking Study" (attached). Excerpt on the key findings of the regulatory and compliance standards across the industry and geographies:
"Just over two out of five surveyed firms are licensed or in the process of obtaining a license; these firms are primarily located in Europe. However, the remaining 58% should not be perceived as the share of entities conducting unregulated activities or evading regulations: some surveyed service providers are engaged in activities that do not yet warrant any authorisation process (e.g. non-custodial functions) or are operating in jurisdiction(s) where no regulatory framework or guidance has been put forth.
Compliance with KYC/AML obligations is heterogeneous across regions. Nearly all customer accounts at European and North American service providers have been KYC’ed, whereas this is the case for only one out of two accounts at MEA-based service providers.
The share of cryptoasset-only companies that did not conduct any KYC checks at all dropped from 48% to 13% between 2018 and 2020, most likely resulting from the progressive harmonisation of KYC/AML standards across jurisdictions, such as initiated by the Financial Action Task Force (FATF).
The inclusion of firms exclusively supporting cryptoassets featured in FATF’s updated standards and recommendations is believed to have spurred greater compliance among this group of firms. However, this should not be interpreted as these companies becoming fully KYC compliant as some KYC checks are only applied to a subset of consumers.
54% of surveyed custodial service providers indicated that they performed an externally-led audit of their cryptoasset reserves over the past 12 months. This is a 24-percentage-point decline compared to our 2018 sample. Firms that have undergone an independent audit are most likely to be operating out of Europe or the APAC region."
"Teach us about online safety in schools", say young people in the Safer Internet Forum
In seeking to identify how national education systems approach online safety issues faced by children, the European Commission's Safer Internet Programme has carried out a consultation targeted at a broad range of stakeholders, the results of which have now been published in an "Assessment report on the status of online safety education in schools across Europe", written by an external expert.
The data architecture of solutions is frequently not given the attention it deserves or needs. Too little attention is paid to designing and specifying the data architecture within individual solutions and their constituent components. This is due to the behaviours of both solution architects and data architects.
Solution architecture tends to concern itself with the functional, technology and software components of the solution.
Data architecture tends not to get involved with the data aspects of technology solutions, leaving a data architecture gap. Solution architecture, in turn, frequently omits the detail of the data aspects of solutions, leading to a solution data architecture gap. Together these gaps result in a data blind spot for the organisation.
Data architecture tends to concern itself with data only after individual solutions have been delivered. Data architecture needs to shift left into the domain of solutions and their data and engage more actively with the data dimensions of individual solutions. Data architecture can take the lead in sealing these data gaps through a shift-left of its scope and activities, as well as by providing standards and common data tooling for solution data architecture.
The objective of data design for solutions is the same as that for overall solution design:
• To capture sufficient information to enable the solution design to be implemented
• To unambiguously define the data requirements of the solution and to confirm and agree those requirements with the target solution consumers
• To ensure that the implemented solution meets the requirements of the solution consumers and that no deviations have taken place during the solution implementation journey
Solution data architecture avoids problems with solution operation and use:
• Poor and inconsistent data quality
• Poor performance, throughput, response times and scalability
• Long data update and response times caused by poorly designed data structures, affecting solution usability and productivity and leading to transaction abandonment
• Poor reporting and analysis
• Poor data integration
• Poor solution serviceability and maintainability
• Manual workarounds for data integration, data extract for reporting and analysis
Data-design-related solution problems frequently become evident and manifest themselves only after the solution goes live. The benefits of solution data architecture are not always evident initially.
Solution Architecture and Solution Estimation - Alan McSweeney
Solution architects and the solution architecture function are ideally placed to create solution delivery estimates
Solution architects have the knowledge and understanding of the solution's constituent components and structure needed to create solution estimates:
• Knowledge of solution options
• Knowledge of solution component structure to define a solution breakdown structure
• Knowledge of available components and the options for reuse
• Knowledge of specific solution delivery constraints and standards that both control and restrain solution options
Accurate solution delivery estimates are needed to understand the likely cost/resources/time/options needed to implement a new solution within the context of a range of solutions and solution options. These estimates are a key input to investment management and making effective decisions on the portfolio of solutions to implement. They enable informed decision-making as part of IT investment management.
An estimate is not a single value. It is a range of values depending on a number of conditional factors such as level of knowledge, certainty, complexity and risk. The range will narrow as knowledge increases and uncertainty decreases.
There is no easy or magic way to create solution estimates. You have to engage with the complexity of the solution and its components. The more effort that is expended the more accurate the results of the estimation process will be. But there is always a need to create estimates (reasonably) quickly so a balance is needed between effort and quality of results.
The notes describe a structured solution estimation process and an associated template. They also describe the wider context of solution estimates in terms of IT investment and value management and control.
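The idea of an estimate as a range rather than a single value can be sketched with three-point (PERT) estimation, one common structured technique. The task names and figures below are illustrative assumptions, not taken from the notes' own template:

```python
# Three-point (beta-PERT) estimation: express each task as a range
# (optimistic, most likely, pessimistic) rather than one number.
def pert(optimistic: float, likely: float, pessimistic: float):
    """Return (expected, std_dev) under the classic beta-PERT weighting."""
    expected = (optimistic + 4 * likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Hypothetical solution breakdown structure with effort in days.
tasks = {"design": (5, 8, 15), "build": (10, 15, 30), "test": (4, 6, 12)}

expected = sum(pert(*t)[0] for t in tasks.values())
# Combined uncertainty: variances add for independent tasks.
std = sum(pert(*t)[1] ** 2 for t in tasks.values()) ** 0.5
print(f"{expected:.1f} days, std dev {std:.1f}")
```

The standard deviation narrows as knowledge improves and the optimistic/pessimistic bounds converge, which matches the point above about estimate ranges narrowing.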
Validating COVID-19 Mortality Data and Deaths for Ireland March 2020 – March 2021 - Alan McSweeney
This analysis seeks to validate published COVID-19 mortality statistics using mortality data derived from general mortality statistics, mortality estimated from population size and mortality rates, and death notice data.
Analysis of the Numbers of Catholic Clergy and Members of Religious in Ireland - Alan McSweeney
This analysis looks at the changes in the numbers of priests and nuns in Ireland for the years 1926 to 2016. It combines data from a range of sources to show the decline in the numbers of priests and nuns and their increasing age profile.
This analysis consists of the following sections:
• Summary - this highlights some of the salient points in the analysis.
• Overview of Analysis - this describes the approach taken in this analysis.
• Context – this provides background information on the number of Catholics in Ireland as a context to this analysis.
• Analysis of Census Data 1926 – 2016 - this analyses occupation age profile data for priests and nuns. It also includes sample projections on the numbers of priests and nuns.
• Analysis of Catholic Religious Mortality 2014-2021 - this analyses death notice data from RIP.ie to show the numbers of priests and nuns that have died in the years 2014 to 2021. It also looks at deaths of Irish priests and nuns outside Ireland and at the numbers of countries where Irish priests and nuns have worked.
• Analysis of Data on Catholic Clergy From Other Sources - this analyses data on priests and nuns from other sources.
• Notes on Data Sources and Data Processing - this lists the data sources used in this analysis.
IT Architecture’s Role In Solving Technical Debt - Alan McSweeney
Technical debt is an overworked term without an effective and commonly agreed understanding of what exactly it is, what causes it, what its consequences are, how to assess it and what to do about it.
Technical debt is the sum of additional direct and indirect implementation and operational costs incurred and risks and vulnerabilities created because of sub-optimal solution design and delivery decisions.
Technical debt is the sum of all the consequences of all the circumventions, budget reduction, time pressure, lack of knowledge, manual workarounds, short-cuts, avoidance, poor design and delivery quality and decisions to remove elements from solution scope and failure to provide foundational and backbone solution infrastructure.
Technical debt leads to a negative feedback cycle with short solution lifespan, earlier solution replacement and short-term tactical remedial actions.
All the disciplines within IT architecture have a role to play in promoting an understanding of and in the identification of how to resolve technical debt. IT architecture can provide the leadership in both remediating existing technical debt and preventing future debt.
Failing to take a complete view of the technical debt within the organisation means problems and risks remained unrecognised and unaddressed. The real scope of the problem is substantially underestimated. Technical debt is always much more than poorly written software.
Technical debt can introduce security risks and vulnerabilities into the organisation’s solution landscape. Failure to address technical debt leaves exploitable security risks and vulnerabilities in place.
Shadow IT or ghost IT is a largely unrecognised source of technical debt including security risks and vulnerabilities. Shadow IT is the consequence of a set of reactions by business functions to an actual or perceived inability or unwillingness of the IT function to respond to business needs for IT solutions. Shadow IT is frequently needed to make up for gaps in core business solutions, supplementing incomplete solutions and providing omitted functionality.
Solution Architecture And Solution Security - Alan McSweeney
This describes an approach to embedding security within the technology solution landscape. It describes a security model that encompasses the range of individual solution components up to the entire solution landscape. The solution security model allows the security status of a solution and its constituent delivery and operational components to be tracked wherever those components are located. This provides an integrated approach to solution security across all solution components and across the entire organisation topology of solutions. It allows the solution architect to validate the security of an individual solution, and it enables the security status of the entire solution landscape to be assessed and recorded.
Solution security is a wicked problem because there is no certainty about when the problem has been resolved and a state of security has been achieved. The security state of a solution can only be expressed along a subjective spectrum of better or worse rather than as a binary true or false. Solution security can have negative consequences: it prevents types of access, limits availability in different ways, restricts the functionality provided, makes the solution harder to use, lengthens solution delivery times, increases costs along the entire solution lifecycle, and leads to loss of usability, utility and rate of use.
Solution architects must be aware of the need for solution security and of the need to have enterprise-level controls that solutions can adopt.
The sets of components that comprise the extended solution landscape, including those components that provide common or shared functionality, are located in different zones, each with different security characteristics.
The functional and operational design of any solution and therefore its security will include many of these components, including those inherited by the solution or common components used by the solution.
The complete solution security view should refer explicitly to the components and their controls.
While each individual solution should be able to inherit the security controls provided by these components, the solution design should include explicit reference to them for completeness and to avoid unvalidated assumptions.
There is a common and generalised set of components, many of which are shared, within the wider solution topology that should be considered when assessing overall solution architecture and solution security.
Individual solutions must be able to inherit security controls, facilities and standards from common enterprise-level controls, standards, toolsets and frameworks.
Individual solutions must not be forced to implement individual infrastructural security facilities and controls. This is wasteful of solution implementation resources, results in multiple non-standard approaches to security and represents a security risk to the organisation.
The extended solution landscape potentially consists of a large number of interacting components and entities located in different zones, each with different security profiles, requirements and concerns. Different security concerns and therefore controls apply to each of these components.
Solution security is not covered by a single control. It involves multiple overlapping sets of controls providing layers of security.
Solution Architecture And (Robotic) Process Automation Solutions - Alan McSweeney
Automation is a technology trend IT architects should be aware of; they need to know how to respond to business requests as well as recommend automation technologies and solutions where appropriate. Automation is a bigger topic than just RPA (Robotic Process Automation).
Automation solutions, like all other technology solutions, should be subject to an architecture and design process. There are many approaches to and options for the automation of business activities. Too often automation solutions are tactical applications layered over existing business systems.
The objective of all IT solutions is to automate manual business processes and their activities to a certain extent. The requirement for RPA-type applications arises in part because of automation failures within existing applications or the need to automate the interactions with or integrations between separate, possibly legacy, applications.
One of the roles of IT architecture is to always seek to take the wider architectural view and to ensure that solutions are designed and delivered within a strategic framework to avoid, as much as is practical and realistic, short-term tactical solutions and approaches that lead to an accumulation of design, operations and support debt. Tactical solutions will always play a part in the organisation’s solution landscape.
The objective of these notes is to put automation into its wider and larger IT architecture context while accepting the need for tactical approaches in some instances.
These notes cover the following topics:
• Solution And Process Automation – The Wider Technology And Approach Landscape
• Business Processes, Business Solutions And Automation
• Organisation Process Model
• Strategic And Tactical Automation
• Deciding On The Scope Of Automation
• Digital Strategy, Digital Transformation And Automation
• Specifying The Automation Solution
• Business Process Model and Notation (BPMN)
• Sample Business Process – Order To Cash
• RPA (Robotic Process Automation)
Data Profiling, Data Catalogs and Metadata Harmonisation - Alan McSweeney
These notes discuss the related topics of Data Profiling, Data Catalogs and Metadata Harmonisation. They describe a detailed structure for data profiling activities and identify various open source and commercial tools and data profiling algorithms. Data profiling is a necessary prerequisite to constructing a data catalog. A data catalog makes an organisation's data more discoverable. The data collected during data profiling forms the metadata contained in the data catalog. This assists with ensuring data quality and is also a necessary activity for Master Data Management initiatives. These notes describe a metadata structure and provide details on metadata standards and sources.
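A data profiling pass of the kind that feeds a data catalog can be sketched in a few lines. The field names and sample rows below are illustrative assumptions; real profiling tools compute many more statistics per column:

```python
# Minimal column profiling: per-column counts, null rates and distinct values,
# i.e. the raw material for catalog metadata.
rows = [
    {"id": 1, "country": "IE", "amount": 120.0},
    {"id": 2, "country": "IE", "amount": None},
    {"id": 3, "country": "FR", "amount": 75.5},
]

def profile(rows: list) -> dict:
    columns = {}
    for row in rows:
        for col, val in row.items():
            stats = columns.setdefault(
                col, {"count": 0, "nulls": 0, "distinct": set()}
            )
            stats["count"] += 1
            if val is None:
                stats["nulls"] += 1
            else:
                stats["distinct"].add(val)
    # Convert to catalog-ready metadata per column.
    return {
        col: {
            "count": s["count"],
            "null_rate": s["nulls"] / s["count"],
            "distinct": len(s["distinct"]),
        }
        for col, s in columns.items()
    }

meta = profile(rows)
```

Statistics like the null rate computed here flag data quality issues early, which is why profiling precedes catalog construction in the approach the notes describe.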
Comparison of COVID-19 Mortality Data and Deaths for Ireland March 2020 – March 2021 - Alan McSweeney
This document compares published COVID-19 mortality statistics for Ireland with publicly available mortality data extracted from informal public data sources. This mortality data is taken from published death notices on the web site www.rip.ie. This is used as a substitute for poor quality and long-delayed officially published mortality statistics.
Death notice information on the web site www.rip.ie is available immediately and contains information at a greater level of detail than published statistics. There is a substantial lag in officially published mortality data and the level of detail is very low. However, the extraction of death notice data and its conversion into a usable and accurate format requires a great deal of processing.
The objective of this analysis is to assess the accuracy of published COVID-19 mortality statistics by comparing trends in mortality over the years 2014 to 2020 with both the numbers of deaths recorded from 2020 to 2021 and the COVID-19 statistics. It compares the numbers of deaths for the seven 13-month intervals:
1. Mar 2014 - Mar 2015
2. Mar 2015 - Mar 2016
3. Mar 2016 - Mar 2017
4. Mar 2017 - Mar 2018
5. Mar 2018 - Mar 2019
6. Mar 2019 - Mar 2020
7. Mar 2020 - Mar 2021
It focuses on the seventh interval, which is when COVID-19 deaths occurred. It combines an analysis of mortality trends with details on COVID-19 deaths. This is a fairly simplistic analysis that seeks to cross-check COVID-19 death statistics using data from other sources.
The subject of what constitutes a death from COVID-19 is controversial. This analysis is not concerned with addressing this controversy. It is concerned with comparing mortality data from a number of sources to identify potential discrepancies. It may be the case that while the total apparent excess number of deaths over an interval is less than the published number of COVID-19 deaths, the consequence of COVID-19 is to accelerate deaths that might have occurred later in the measurement interval.
Accurate data is needed to make informed decisions. Clearly there are issues with Irish COVID-19 mortality data. Accurate data is also needed to ensure public confidence in decision-making. Where published data is inaccurate, this can lead to a loss of confidence that can be exploited.
Analysis of Decentralised, Distributed Decision-Making For Optimising Domesti... - Alan McSweeney
This analysis looks at the potential impact that large numbers of electric vehicles could have on electricity demand, electricity generation capacity and on the electricity transmission and distribution grid in Ireland. It combines data from a number of sources – electricity usage patterns, vehicle usage patterns, electric vehicle current and possible future market share – to assess the potential impact of electric vehicles.
It then analyses a possible approach to electric vehicle charging where the domestic charging unit has some degree of decentralised intelligence and decision-making capability in deciding when to start vehicle charging to minimise electricity usage impact and optimise electricity generation usage.
The potential problem to be addressed is that if large numbers of electric cars are plugged-in and charging starts immediately when the drivers of those cars arrive home, the impact on demand for electricity will be substantial.
Operational Risk Management Data Validation Architecture - Alan McSweeney
This describes a structured approach to validating data used to construct and use an operational risk model. It details an integrated approach to operational risk data involving three components:
1. Using the Open Group FAIR (Factor Analysis of Information Risk) risk taxonomy to create a risk data model that reflects the required data needed to assess operational risk
2. Using the DMBOK model to define a risk data capability framework to assess the quality and accuracy of risk data
3. Applying standard fault analysis approaches - Fault Tree Analysis (FTA) and Failure Mode and Effect Analysis (FMEA) - to the risk data capability framework to understand the possible causes of risk data failures within the risk model definition, operation and use
Data Integration, Access, Flow, Exchange, Transfer, Load And Extract Architec... - Alan McSweeney
These notes describe a generalised data integration architecture framework and set of capabilities.
With many organisations, data integration tends to have evolved over time with many solution-specific tactical approaches implemented. The consequence of this is that there is frequently a mixed, inconsistent data integration topography. Data integrations are often poorly understood, undocumented and difficult to support, maintain and enhance.
Data interoperability and solution interoperability are closely related – you cannot have effective solution interoperability without data interoperability.
Data integration has multiple meanings and multiple ways of being used such as:
- Integration in terms of handling data transfers, exchanges, requests for information using a variety of information movement technologies
- Integration in terms of migrating data from a source to a target system and/or loading data into a target system
- Integration in terms of aggregating data from multiple sources and creating one source, with possibly date and time dimensions added to the integrated data, for reporting and analytics
- Integration in terms of synchronising two data sources or regularly extracting data from one data source to update a target
- Integration in terms of service orientation and API management to provide access to raw data or the results of processing
There are two aspects to data integration:
1. Operational Integration – allow data to move from one operational system and its data store to another
2. Analytic Integration – move data from operational systems and their data stores into a common structure for analysis
Ireland 2019 and 2020 Compared - Individual Charts - Alan McSweeney
This analysis compares some data areas - Economy, Crime, Aviation, Energy, Transport, Health, Mortality, Housing and Construction - for Ireland for the years 2019 and 2020, illustrating the changes that have occurred between the two years. It shows some of the impacts of COVID-19 and of actions taken in response to it, such as the various lockdowns and other restrictions.
The first lockdown clearly brought major changes to many aspects of Irish society. The third lockdown, which began at the end of the period analysed, will have as great an impact as the first.
The consequences of the events and actions that have caused these impacts could be felt for some time into the future.
Analysis of Irish Mortality Using Public Data Sources 2014-2020 - Alan McSweeney
This describes the use of published death notices on the web site www.rip.ie as a substitute to officially published mortality statistics. This analysis uses data from RIP.ie for the years 2014 to 2020.
Death notice information is available immediately and contains information at a greater level of detail than published statistics. There is a substantial lag in officially published mortality data.
Review of Information Technology Function Critical Capability Models - Alan McSweeney
IT function critical capabilities are key areas where the IT function needs to maintain significant levels of competence, skill, experience and practice in order to operate and deliver a service. There are several different IT capability frameworks. The objective of these notes is to assess the suitability and applicability of these frameworks. These models can be used to identify what is important for your IT function based on your current and desired/necessary activity profile.
Capabilities vary across organisations – not all capabilities have the same importance for all organisations. These frameworks do not readily accommodate variability in the relative importance of capabilities.
The assessment approach taken is to identify a generalised set of capabilities needed across the span of IT function operations, from strategy to operations and delivery. This generic model is then used to assess individual frameworks to determine their scope and coverage and to identify gaps.
The generic IT function capability model proposed here consists of five groups or domains of major capabilities that can be organised across the span of the IT function:
1. Information Technology Strategy, Management and Governance
2. Technology and Platforms Standards Development and Management
3. Technology and Solution Consulting and Delivery
4. Operational Run The Business/Business as Usual/Service Provision
5. Change The Business/Development and Introduction of New Services
In the context of trends and initiatives such as outsourcing, transition to cloud services and greater platform-based offerings, should the IT function develop and enhance its meta-capabilities – the management of the delivery of capabilities? Is capability identification and delivery management the most important capability? Outsourced service delivery in all its forms is not a fire-and-forget activity. You can outsource the provision of any service except the management of the supply of that service.
The following IT capability models have been evaluated:
• IT4IT Reference Architecture https://www.opengroup.org/it4it contains 32 functional components
• European e-Competence Framework (ECF) http://www.ecompetences.eu/ contains 40 competencies
• ITIL V4 https://www.axelos.com/best-practice-solutions/itil has 34 management practices
• COBIT 2019 https://www.isaca.org/resources/cobit has 40 management and control processes
• APQC Process Classification Framework - https://www.apqc.org/process-performance-management/process-frameworks version 7.2.1 has 44 major IT management processes
• IT Capability Maturity Framework (IT-CMF) https://ivi.ie/critical-capabilities/ contains 37 critical capabilities
The following model has not been evaluated:
• Skills Framework for the Information Age (SFIA) - http://www.sfia-online.org/ lists over 100 skills
Critical Review of Open Group IT4IT Reference Architecture - Alan McSweeney
This reviews the Open Group’s IT4IT Reference Architecture (https://www.opengroup.org/it4it) with respect to other operational frameworks to determine its suitability and applicability to the IT operating function.
IT4IT is intended to be a reference architecture for the management of the IT function. It aims to take a value chain approach to create a model of the functions that IT performs and the services it provides to assist organisations in the identification of the activities that contribute to business competitiveness. It is intended to be an integrated framework for the management of IT that emphasises IT service lifecycles.
This paper reviews what is meant by a value chain, with special reference to the Supply Chain Operations Reference (SCOR) model (https://www.apics.org/apics-for-business/frameworks/scor), the most widely used and most comprehensive such model.
The SCOR model is part of a wider set of operations reference models that describe a view of the critical elements in a value chain:
• Product Life Cycle Operations Reference model (PLCOR) - Manages the activities for product innovation and product and portfolio management
• Customer Chain Operations Reference model (CCOR) - Manages the customer interaction processes
• Design Chain Operations Reference model (DCOR) - Manages the product and service development processes
• Managing for Supply Chain Performance (M4SC) - Translates business strategies into supply chain execution plans and policies
It also compares the IT4IT Reference Architecture and its 32 functional components to other frameworks that purport to identify the critical capabilities of the IT function:
• IT Capability Maturity Framework (IT-CMF) https://ivi.ie/critical-capabilities/ contains 37 critical capabilities
• Skills Framework for the Information Age (SFIA) - http://www.sfia-online.org/ lists over 100 skills
• European e-Competence Framework (ECF) http://www.ecompetences.eu/ contains 40 competencies
• ITIL IT Service Management https://www.axelos.com/best-practice-solutions/itil
• COBIT 2019 https://www.isaca.org/resources/cobit has 40 management and control processes
Analysis of Possible Excess COVID-19 Deaths in Ireland From Jan 2020 to Jun 2020 - Alan McSweeney
This analysis seeks to determine if there are excess deaths that occurred in Ireland in the interval Jan – Jun 2020 that can be attributed to COVID-19. Excess deaths means deaths in excess of the number of expected deaths plus the number of deaths directly attributed to COVID-19. Conversely, a deficiency of deaths occurs when the number of expected deaths plus the number of deaths directly attributed to COVID-19 exceeds the actual number of deaths.
This analysis uses number of deaths taken from the web site RIP.ie to generate an estimate of the number of deaths in Jan – Jun 2020 in the absence of any other official source. The last data extract from the RIP.ie web site was taken on 3 Jul 2020.
The analysis uses historical data from RIP.ie from 2018 and 2019 to assess its accuracy as a data source.
The analysis then uses the following three estimation approaches to assess the excess or deficiency of deaths:
1. The pattern of deaths in 2020 can be compared to a previous comparable year or years. The additional COVID-19 deaths can be added to the comparable year and the difference between the expected deaths, the actual deaths from RIP.ie and the actual COVID-19 deaths can be analysed to generate an estimate of any excess or deficiency.
2. The age-specific mortality rates described on page 16 can be applied to estimates of population numbers to generate an estimate of expected deaths. This can be compared to the actual RIP.ie and actual COVID-19 deaths to generate an estimate of any excess or deficiency.
3. The range of death rates per 1,000 of population as described in Figure 10 on page 16 can be applied to estimates of population numbers to generate an estimate of expected deaths. This can be compared to the actual RIP.ie and actual COVID-19 deaths to generate an estimate of any excess or deficiency.
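All three approaches reduce to the same simple arithmetic: compare the actual death count with the expected baseline plus officially attributed COVID-19 deaths. The sketch below illustrates that calculation in Python; every number in it is an invented placeholder, not a figure from the analysis.

```python
# Illustrative sketch of the excess/deficiency calculation common to
# the three estimation approaches. All figures are hypothetical.

def excess_deaths(expected, covid_reported, actual):
    """Excess (+) or deficiency (-) of deaths relative to the
    expected baseline plus officially attributed COVID-19 deaths."""
    return actual - (expected + covid_reported)

# Hypothetical half-year figures for demonstration only.
expected_baseline = 15000   # e.g. derived from a comparable prior year
covid_reported = 1700       # officially attributed COVID-19 deaths
actual_from_ripie = 16400   # deaths counted from RIP.ie notices

diff = excess_deaths(expected_baseline, covid_reported, actual_from_ripie)
if diff > 0:
    print(f"Excess of {diff} deaths beyond expected plus COVID-19")
else:
    print(f"Deficiency of {-diff} deaths: COVID-19 may have accelerated "
          f"deaths that would otherwise have occurred later in the interval")
```

A negative result, as in this invented example, corresponds to the "deficiency" case discussed above, where COVID-19 may have brought forward deaths that would have occurred later in the measurement interval.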
This presentation describes a systematic, repeatable and co-ordinated approach to agile solution architecture and design. It describes a set of practical steps and activities embedded within a framework to allow an agile method to be adopted and used for solution design and delivery. This approach ensures consistency in the assessment of solution design options and in subsequent solution design and delivery activities, leading to the rapid design and delivery of realistic and achievable solutions that meet real solution consumer needs. The approach provides for effective solution decision-making and generates options and results quickly and consistently. Implementing such a framework creates a knowledgebase of previous solution design and delivery exercises, leading to an accumulated body of knowledge within the organisation.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy
1. Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy
This paper describes how technologies such as data pseudonymisation and differential privacy enable access to sensitive data and unlock data opportunities and value while ensuring compliance with data privacy legislation and regulations.
Alan McSweeney
January 2022
alan@alanmcsweeney.com
2. Data Privatisation, Data Anonymisation, Data Pseudonymisation and Differential Privacy
Page 2
Contents
Introduction.......................................................................................................................................................................4
Personal Information .........................................................................................................................................................6
Third-Party Data Sharing And Data Access Framework .....................................................................................................7
Data Privacy Technologies...............................................................................................................................................10
Context Of Data Privatisation – Anonymisation, Pseudonymisation And Differential Privacy .......................................... 11
Data Sharing Use Cases...............................................................................................................................................14
Pseudonymisation ........................................................................................................................................................... 15
Why Pseudonymise Rather Than Anonymise? .............................................................................................................16
GDPR Origin Of Pseudonymisation .............................................................................................................................16
Growing Importance Of Pseudonymisation .................................................................................................................19
Approaches To Pseudonymisation...............................................................................................................................19
Pseudonymisation By Replacing ID Fields With Linking Identifier (Token) ...............................................................20
Pseudonymisation By Replacing ID Fields With Linking Identifier – Multiple ID Fields..............................................21
ID Field Hashing Pseudonymisation.........................................................................................................................21
Hashing And Identifier Codes ..................................................................................................................................22
Hashing And Reversibility........................................................................................................................................23
ID Field Hashing Pseudonymisation With Data Salting And Peppering ....................................................................24
Data Attacks – ID Field Hashing Pseudonymisation With Data Salting And Peppering ............................................25
Content Hashing Pseudonymisation........................................................................................................................26
Pseudonymisation And Data Lakes/Data Warehouses................................................................................................. 27
Pseudonymisation Implementation.............................................................................................................................28
Data Breaches and Attacks ..............................................................................................................................................28
Pseudonymisation and Data Breaches.........................................................................................................................29
Differencing Attack .....................................................................................................................................................30
Differencing Attack, Reconstruction Attack And Mosaic Effect.................................................................................... 31
Differential Privacy ..........................................................................................................................................................32
Data Privatisation and Differential Privacy Solution Architecture Overview.....................................................................34
Differential Privacy Platform Solution Service Management Processes .......................................................................36
Differential Privacy Platform Deployment Options...................................................................................................... 37
On-Premises Deployment .......................................................................................................................................38
Cloud Deployment ..................................................................................................................................................39
Differential Privacy and Data Attacks ..........................................................................................................................40
Data Privatisation and Differential Privacy Solution Planning ..........................................................................................40
Data Privatisation and Differential Privacy Solution Operation and Use...........................................................................41
Data Privatisation and Differential Privacy Next Steps.....................................................................................................43
Early Business Engagement and Differential Privacy Opportunity Validation...............................................................45
Differential Privacy Detailed Design ............................................................................................................................46
Differential Privacy Readiness Assessment.................................................................................................................. 47
Differential Privacy Architecture Sprint .......................................................................................................................49
List of Figures
Figure 1 – Data Privacy Subject Areas ................................................................................................................................5
Figure 2 – Data Privacy and Data Utility Balancing Act.......................................................................................................5
Figure 3 – Data Sharing and Data Access Framework.........................................................................................................7
Figure 4 – Data Sharing and Access Topologies .................................................................................................................9
Figure 5 – Data Privatisation Spectrum ............................................................................................................................10
Figure 6 – Data Privacy Technologies............................................................................................................................... 11
Figure 7 – Context of Data Privatisation ...........................................................................................................................12
Figure 8 – Overview of Pseudonymisation ....................................................................................................................... 15
Figure 9 – Pseudonymisation for Data Sharing with External Business Partners...............................................................16
Figure 10 – Overview of Approaches to Pseudonymisation ..............................................................................................19
Figure 11 – Pseudonymisation By Replacing ID Fields With Linking Identifier...................................................................20
Figure 12 – Pseudonymisation By Replacing ID Fields With Linking Identifier – Multiple ID Fields ....................................21
Figure 13 – ID Field Hashing Pseudonymisation ...............................................................................................................22
Figure 14 – ID Field Hashing Pseudonymisation With Data Salting And Peppering...........................................................24
Figure 15 – Data Attacks – ID Field Hashing Pseudonymisation With Data Salting And Peppering ...................................25
Figure 16 – Content Hashing Pseudonymisation ..............................................................................................................26
Figure 17 – Pseudonymisation and Data Lakes/Data Warehouses .................................................................................... 27
Figure 18 – Pseudonymisation and Data Breaches ...........................................................................................................29
Figure 19 – Differential Privacy and Differencing Attacks................................................................................................. 31
Figure 20 – Differencing Attack, Reconstruction Attack And Mosaic Effect......................................................................32
Figure 21 – Differential Privacy Operation........................................................................................................................ 33
Figure 22 – Data Privatisation and Differential Privacy Balancing Act ..............................................................................34
Figure 23 – Operational Data Privatisation and Differential Privacy Solution Architecture ............................................... 35
Figure 24 – Sample High-Level On-Premises Deployment ...............................................................................................38
Figure 25 – Sample High-Level Cloud Deployment ..........................................................................................................39
Figure 26 – Data Privatisation and Differential Privacy Solution Journey..........................................................................43
Figure 27 – Approaches to Data Privatisation and Differential Privacy Solution Scoping and Definition ...........................44
Figure 28 – Early Business Engagement and Differential Privacy Opportunity Validation Process ....................................46
Figure 29 – Differential Privacy Detailed Design Views .................................................................................................... 47
Figure 30 – Areas Covered in Differential Privacy Readiness Assessment .........................................................................48
Introduction
This paper examines the related concepts of data privatisation, data anonymisation, data pseudonymisation, and
differential privacy.
Data has value. To realise this value, it may need to be made more widely available, both within and outside your
organisation for various types of access such as sharing data with outsourcing and service partners or making data
available to research partners. This data sharing must be performed in the context of maintaining personal data privacy.
This paper examines the technology options to provide different types of access to data while preserving privacy and
ensuring compliance with the many (and growing) data privacy regulatory and legislative requirements.
You need to take a risk management approach to data sharing and third-party data access. Appropriate technology,
appropriately implemented and operated, is a means of managing and reducing the risk of re-identification by making the
time, skills, resources and money needed to achieve it unrealistic. A demonstrable technology-based approach to
data privacy, supported by a data sharing business framework, reduces an organisation’s liability in the event of data
breaches.
For example, with the EU GDPR (General Data Protection Regulation)1 where a data breach occurs, the controller is
exempted from its notification obligations where it can show that the breach is ‘unlikely to result in a risk to the rights and
freedoms of natural persons’2 such as when pseudonymised data leaks and the re-identification risk is remote.
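To make concrete why the re-identification risk from leaked pseudonymised data can be remote, a common technique (one of several the paper goes on to cover, sketched here in Python as an assumption rather than as this paper's prescribed method) is to replace identifying fields with a keyed hash (HMAC), where the secret key, sometimes called a pepper, is stored separately from the shared data:

```python
# Minimal sketch of ID-field pseudonymisation using a keyed hash (HMAC).
# The secret "pepper" is held separately from the shared dataset (e.g. in
# a key vault), so a party holding only leaked records cannot feasibly
# reverse or brute-force the tokens. Field names are illustrative only.
import hmac
import hashlib

PEPPER = b"secret-key-stored-in-a-separate-key-vault"  # illustrative value

def pseudonymise(identifier: str) -> str:
    """Replace an identifying value with a deterministic keyed token."""
    return hmac.new(PEPPER, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Jane Murphy", "id_number": "1234567A", "diagnosis": "X"}
shared = {
    "person_token": pseudonymise(record["id_number"]),  # stable linking token
    "diagnosis": record["diagnosis"],                   # retained utility field
}
```

Because the same input always yields the same token, records about one individual remain linkable across datasets, preserving analytic utility, while re-identification requires access to the separately held key.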
Organisations need a well-defined and implemented process that enables them to make their data available as widely as
possible without exposure to the risks associated with non-compliance with the wide range of differing data privacy
regulations.
Managing data privacy in the context of data access and sharing arrangements encompasses the areas of:
• Data Governance
• Privacy Management
• Security Management
• Risk Management
1 See http://eur-lex.europa.eu/legal-content/en/TXT/?uri=CELEX%3A32016R0679
2 See GDPR recitals 80 and 85 and articles 27 and 33.
Figure 1 – Data Privacy Subject Areas
Managing data privacy in the context of data access and sharing arrangements is a balancing act between data
privacy and data utility. Perfect data privacy can be achieved by not sharing or making accessible any data, irrespective of
whether it contains personally identifiable information. The result is that the data is unused.
Perfect data utility can be achieved by sharing and making accessible all data. The result is that there is no data
privacy.
One aspect of data privacy management is taking a risk-based approach to this balancing act.
Figure 2 – Data Privacy and Data Utility Balancing Act
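One concrete way to see this balancing act is the privacy parameter epsilon in differential privacy: answers to queries are perturbed with random noise scaled to 1/epsilon, so a smaller epsilon gives stronger privacy but noisier, less useful answers. The following is a minimal sketch of a noisy counting query, offered as an illustration of the general technique rather than a description of any particular product covered later in this paper:

```python
# Sketch of the privacy/utility trade-off in differential privacy:
# a counting query answered with Laplace noise of scale 1/epsilon.
# Smaller epsilon -> stronger privacy -> noisier, less useful answers.
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via the inverse CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    """A counting query has sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon)

# Strong privacy (small epsilon) gives widely scattered answers; weak
# privacy (large epsilon) gives answers close to the true count of 1000.
for eps in (0.1, 1.0, 10.0):
    print(eps, round(noisy_count(1000, eps), 1))
```

A risk-based approach to the balancing act then becomes, in part, a question of choosing epsilon per data sharing use case.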
This paper describes some practical, realistic and achievable approaches to implementing data privatisation using
pseudonymisation and differential privacy approaches as a means of addressing your data sharing and access
requirements and opportunities.
This paper covers the following topics:
• Personal Information – this discusses what is meant by personal information.
• Third-Party Data Sharing And Data Access Framework – data sharing is enabled through technology but is
primarily a business concern, and any arrangements should be grounded in a business framework.
• Data Privacy Technologies and Context Of Data Privatisation – this discusses data privatisation approaches of
anonymisation, pseudonymisation and differential privacy. It covers the GDPR origin of pseudonymisation, the
growing importance of pseudonymisation and various approaches to pseudonymisation, hashing and
pseudonymisation and data lakes/data warehouses.
• Data Breaches and Attacks – this provides background information on data breaches and attacks and how data
privatisation approaches provide protection against them.
• Why Data Privatisation and Differential Privacy – this provides a context to the need for a robust, secure
operational data privatisation and differential privacy technology framework.
• Data Privatisation and Differential Privacy Solution Architecture Overview – how a differential privacy
solution sits within your existing information technology solution and data landscape, what its components are and
what the solution deployment options are.
• Data Privatisation and Differential Privacy Solution Planning – what an exercise to plan for the
implementation and operation of a successful data privatisation and differential privacy solution consists of.
• Data Privatisation and Differential Privacy Solution Operation and Use – how the data privatisation and
differential privacy solution is operated and used.
• Differential Privacy Next Steps – this describes a set of possible next steps and types of engagement to allow you
to move along the data privatisation and differential privacy journey successfully.
Personal Information
Personal information is any information relating to an identified or identifiable natural person. This can be direct –
information that directly identifies a single individual – or indirect (quasi-identifiers) – information that can be used to
identify an individual by being linked with other information.
Quasi-identifiers include information such as date of birth, date of death, post code and others. These do not specifically
link to an individual but such links can be determined.
Personal information can be structured or unstructured such as free-form text or it can take other forms such as images
(photographs, medical images) or other data types such as genomic data.
Personal information can be stored in multiple different ways from database tables and columns to data formats such as
documents and spreadsheets to image files. Personal information may also exist in the form of metadata attached to
data files.
The technologies underpinning data privatisation will need to handle all these data types and formats.
When considering data privatisation in the context of data access and sharing, the full set of personal information and the
range of data formats should be considered. The approach to handling quasi-identifiers may be different to that taken for
direct identifiers. Rather than completely removing them, they could be made more general, such as month and year for
date of birth, or a date range could be specified.
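As a minimal sketch of generalising quasi-identifiers rather than removing them (the field formats and masking rule here are illustrative assumptions, not a prescribed scheme):

```python
from datetime import date

def generalise_dob(dob: date) -> str:
    """Generalise a date of birth to year and month only."""
    return dob.strftime("%Y-%m")

def generalise_postcode(postcode: str, keep: int = 3) -> str:
    """Mask a postcode, keeping only a leading prefix (illustrative rule)."""
    return postcode[:keep] + "*" * (len(postcode) - keep)

print(generalise_dob(date(1984, 3, 17)))   # 1984-03
print(generalise_postcode("SW1A 1AA"))     # SW1*****
```

The generalised values remain useful for cohort-level analysis while weakening the link to any single individual.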
Third-Party Data Sharing And Data Access Framework
Managing data privacy in the context of data access and data sharing is not just a technology concern. The selection,
implementation and operation of technologies needed to ensure data privacy exist within a wider data sharing and access
framework. Organisations that intend to share and provide access to data should define such a framework. This will
provide an explicit approach rather than leaving such arrangements implicit and poorly defined. It will reduce the time
and effort required to implement data access and sharing. It will ensure a consistent and coherent approach. The
following diagram describes a possible structure for such a framework.
Data access and sharing covers both internal, such as other business units other than the originating business unit
accessing data, and external – third-parties being given access to data for business and research purposes.
Figure 3 – Data Sharing and Data Access Framework
This framework has the following dimensions:
1. Business and Strategy Dimension – this relates to the overall organisation posture relating to internal and external
data access and sharing and needs to cover topics such as:
• Overall Objectives, Purposes and Goals – this sets the context and overall direction of and the principles that
will underpin data sharing and data access arrangements. The objectives, purposes and goals of these
arrangements will be defined.
• Data Sharing Strategy – this will define the organisation’s strategy for internal and external data sharing and
access – why it is being done, who will be allowed access to data, the types of data to which access will be
granted, the types of access allowed and the technology approaches that will be used.
• Risk Management, Governance and Decision Making – this will cover how data sharing and access
arrangements will be governed and managed, how decisions will be made on these arrangements and how data
sharing and access risks will be managed.
• Charges and Payments – this will define the charges and payments structure, if applicable, that will apply to
data access and sharing arrangements.
• Monitoring and Reporting – this will document how the operation and use of data access and sharing
arrangements will be monitored, audited and reported on.
2. Legal Dimension – this encompasses the legal aspects of data sharing and needs to cover topics such as:
• Data Privacy Legislation and Regulation Compliance – this will cover the activities of researching and
monitoring the data privacy legislative and regulatory landscape and any changes and developments that may
impact data access and sharing.
• Contract Development and Compliance – this will encompass the development, negotiation and
implementation of contractual arrangements governing specific data access and sharing arrangements.
3. Technology Dimension – this covers technology and security standards and needs to cover topics such as:
• Data Sharing and Data Access Technology Selection – this covers the arrangements and responsibilities for
selecting the tools and technologies that will be used to implement data access and sharing.
• Technology Standards Monitoring and Compliance – this will define the responsibilities for and scope of
monitoring technology standards and developments, the organisation’s adoption of and compliance with those
standards and managing change as the standards change.
• Security Standards Monitoring and Compliance – this will describe both how data access and sharing security
standards should be monitored, how security is implemented for data sharing and access arrangements and
managing change as the standards change.
4. Development and Implementation Dimension – this relates to the implementation of data sharing technology tools
and platforms and of specific data access and sharing arrangements and needs to cover topics such as:
• Technology Platform and Toolset Selection and Implementation – this includes the selection and
implementation of specific data access and sharing technologies covering security and access control, the range
of data types and the data access facilities being offered.
• Functionality Model Development and Implementation – this relates to defining and implementing the data
access and sharing functionality, features being offered and the tools and technologies that will support them.
• Data Sharing and Access Implementations – this encompasses the specification and implementation of specific
data access and sharing arrangements.
• Data Sharing and Access Maintenance and Support – this covers the maintenance and support arrangements
both of the overall data access and sharing tools, platforms and technologies as well as the specific
arrangements.
5. Service Management Dimension – this defines the operational processes that should be defined and implemented in
order to operate data sharing and needs to cover topics such as:
• Service Management Processes – this defines the operational and service management processes that need to
be implemented and operated.
• Operational and Service Level Agreement Management – this covers the topic of defining and then managing
and monitoring compliance with operational and service level agreements for data access and sharing
arrangements.
• Maintain Inventory of Data Sharing Arrangements – this covers the maintenance of a list of current and
previous data sharing and access arrangements.
• Service Monitoring and Reporting – this defines how the data sharing arrangements will be monitored and
reported on.
• Issue Handling and Escalation – this covers how any issues relating to the operation and use of data sharing will
be recorded, handled and escalated.
There are different data sharing and access arrangements.
Figure 4 – Data Sharing and Access Topologies
Data can be made available more widely within the organisation for purposes for which it was not originally collected.
Data can be made publicly available. Once this has been done, it will not be possible to control who uses it, the uses to
which it is put or to recall it.
Data can be shared subject to some form of legal or contractual arrangement.
Data can be shared through some form of controlled and secure facility.
In the last two arrangements, some form of trust exists between the sharing entity and the data recipient. This sharing
may be supported by penalties (after disclosure) or by technology (disclosure prevention) or both.
Data can be pushed to the target or the data can be made available to the target through a pull or download facility.
The data sharing and access framework should cover all these possibilities.
Within the context of data access and sharing, data privatisation can be viewed as a spectrum from completely
identifiable data to data that is not linked to individuals.
Figure 5 – Data Privatisation Spectrum
The data privacy risk is reduced as you move further to the right. Data utility may also be reduced as you move to the
right.
The data sharing and access framework should combine both the data sharing and access topology and data privatisation
spectrum to get a more complete view of data access arrangements.
Data Privacy Technologies
Data privatisation is the removal of personal identifiable information (PII) from data. At a very high-level, data
privatisation can be achieved in one or both of two ways:
1. Data Summarisation – sets of individual data records are compressed into summary statistics with all personal
information removed
2. Data Tokenisation – the personal data within a dataset that allows an individual to be identified is replaced by a
token (possibly generated from the personal data such as by hashing), either permanently (anonymisation) or
reversibly (pseudonymisation)
Figure 6 – Data Privacy Technologies
There are different routes to making data accessible and shareable within and outside the organisation without
compromising compliance with data protection legislation and regulations and removing the risk associated with
allowing access to personal data.
• Differential Privacy – source data is summarised, and individual personal references are removed. The one-to-one
correspondence between original and transformed data has been removed.
• Anonymisation – identifying data is destroyed and cannot be recovered, so the individual cannot be identified. There is
still a one-to-one correspondence between original and transformed data.
• Pseudonymisation – identifying data is encrypted and the recovery data/token is stored securely elsewhere. There is
still a one-to-one correspondence between original and transformed data.
These technologies and approaches are not mutually exclusive – each is appropriate to different data sharing and data
access use cases.
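As an illustration of the differential privacy route – releasing summarised data with the one-to-one correspondence removed – here is a minimal sketch of the classic Laplace mechanism for a noisy count. The epsilon value and query are hypothetical, and this protects only a single released count:

```python
import math
import random

def noisy_count(true_count: int, epsilon: float) -> float:
    """Return true_count plus Laplace(0, 1/epsilon) noise.

    A counting query has sensitivity 1 (one individual changes the
    count by at most 1), so Laplace noise of scale 1/epsilon gives
    epsilon-differential privacy for that single released count.
    """
    scale = 1.0 / epsilon
    # Inverse-transform sampling of the Laplace distribution
    u = random.uniform(-0.5, 0.5)
    return true_count - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

# Hypothetical example: release a customer count with epsilon = 0.5
print(noisy_count(1000, 0.5))
```

Smaller epsilon values add more noise (more privacy, less utility), which is the privacy/utility balancing act described above made concrete.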
Context Of Data Privatisation – Anonymisation, Pseudonymisation And
Differential Privacy
The wider context of data privatisation and specific approaches for enabling it such as anonymisation, pseudonymisation
and differential privacy can be represented by the four interrelated areas of:
• Value in Data Volumes and Data Assets – you have expended substantial resources in gathering and processing and
generating data. This data has value that you want to realise by making it more widely available. The need to comply
with the increasing body of data protection and privacy laws inhibits your ability to achieve this.
• Data Privacy Laws and Regulations – you need to ensure that making your data available to a wider range of
individuals and organisations does not breach the ever-increasing set of data protection and privacy legislation and
regulations. All too frequently the cost of and concerns around ensuring this compliance prevents this wider data
access.
• Technologies – the various data privatisation technologies are mature, well-proven, industrialised and are
independently certified. They can be used to provide controlled, secure access to your data while guaranteeing
compliance with data protection and privacy legislation. Using these technologies will embed such compliance by
design into your data sharing and access facilities. This will allow you to realise value from your data successfully.
• Data Processes and Business Data Trends – the volumes of data available to organisations are increasing. The range
of analysis tools and technologies available are increasing. Data storage is moving to cloud-platforms that can handle
data volumes and provide analysis tools more easily than costly and complex on-premises solutions that are available
only to larger organisations. Organisations are outsourcing more business processes to third parties. These
outsourcing arrangements require the sharing of data.
Figure 7 – Context of Data Privatisation
To achieve the value inherent in your data you need to be able to make it appropriately available to others. You need a
process that enables you to make your data available as widely as possible without exposing you to risks associated with
non-compliance with the wide range of differing data privacy regulations. You need one data access framework and
associated set of technologies that work for all data access and sharing while guaranteeing legislative and regulatory
compliance.
Data Privatisation Topology – Data Privacy Laws and Regulations
The landscape of data protection and privacy legislation and regulations is extensive, complex and growing – this is just
a partial and incomplete view. Organisations that share data externally need to be able to guarantee compliance with all
relevant and applicable legislation.
Data Privatisation Topology – Value in Data Volumes and Data Assets
Organisations have more and more data of increasing complexity that they want and need to share in order to generate
value.
Data Privatisation Topology – Technologies
There is a range of well-proven technologies available for ensuring data privacy.
Data Privatisation Topology – Data Processes and Business Data Trends
Organisations want to outsource their business processes and share their data with partners to gain access to specialist
analytics and research skills and tools.
Data Sharing Use Cases
There are many data sharing use cases and scenarios that involve the sharing of potentially personal identifiable
information, such as:
• Share data with other business functions within your organisation
• Use third-party data processing and storage platform and facilities
• Use third-party data access and sharing as a service platform and facilities
• Use third-party data analytics platform and facilities
• Engage third-party data research organisations to provide specialist services
• Share data with external researchers
• Outsource business processes and enable data sharing with third parties
• Share data with industry business partners to gain industry insights
• Share data to detect and avoid fraud
• Share customer data with service providers at the request of the customer
• Enable customer switching
• Participate in Open Data initiatives
Pseudonymisation
Pseudonymisation is an approach to deidentification where personally identifiable information (PII) values are replaced
by tokens, artificial identifiers or pseudonyms.
Pseudonymisation is one technique to assist compliance with EU General Data Protection Regulation (GDPR)
requirements for secure storage of personal information.
Pseudonymisation is intended to be reversible: the pseudonymised data can be restored to its original state.
Personal data fields can be individually pseudonymised so there is a one-to-one correspondence between original source
data fields and transformed data fields or the personal data fields can be removed and replaced with a token.
Figure 8 – Overview of Pseudonymisation
Why Pseudonymise Rather Than Anonymise?
Personal identifiable data is pseudonymised when there is a need to re-identify the data, for example, after it has been
worked on by a third party, either within or outside the organisation, and the results of the processing need to be matched
to the original data. The following diagram illustrates such a scenario.
Figure 9 – Pseudonymisation for Data Sharing with External Business Partners
The numbered steps are:
1. Original Data – this is the original collected or processed data containing personal identifiable information.
2. Pseudonymised Data – the personal identifiable information within the data is pseudonymised.
3. Pseudonymisation Key – there is a separate pseudonymisation key that allows pseudonymised data to be re-identified
when needed. This needs to be kept separate from the pseudonymised data.
4. Pseudonymised Data Transmitted to Data Processor – the pseudonymised data is then sent to the external data
processor for their use.
5. Processed Data with Additional Processed Data – the data is enriched with the results of additional processing.
6. Pseudonymised Data with Additional Processed Data Returned – the enriched data is returned to the organisation.
7. Original Data Merged with Additional Processed Data – the enriched data is re-identified using the previously
created pseudonymisation key.
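The seven steps above can be sketched as follows. The record fields are hypothetical, and the "enrichment" step is an illustrative stand-in for whatever processing the external data processor performs:

```python
import secrets

# Hypothetical records: IDAT = identifying fields, ADAT = analytic fields
records = [
    {"name": "Alice Murphy", "dob": "1984-03-17", "balance": 1200},
    {"name": "Brian O'Neill", "dob": "1979-11-02", "balance": 380},
]

# Steps 2-3: pseudonymise the IDAT fields and build the key,
# which must be stored separately from the pseudonymised data
key = {}
pseudonymised = []
for rec in records:
    token = secrets.token_hex(16)
    key[token] = {"name": rec["name"], "dob": rec["dob"]}
    pseudonymised.append({"token": token, "balance": rec["balance"]})

# Steps 4-6: the external data processor enriches the data
# (an illustrative scoring rule, not a real service)
for rec in pseudonymised:
    rec["risk_score"] = "high" if rec["balance"] > 1000 else "low"

# Step 7: re-identify the enriched data using the key
merged = [{**key[rec["token"]], **rec} for rec in pseudonymised]
print(merged[0]["name"], merged[0]["risk_score"])  # Alice Murphy high
```

The processor only ever sees tokens and analytic fields; the key never leaves the organisation.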
Pseudonymisation can also be used as part of the archiving process for data containing personal identifiable information
after its main processing has been completed and the data is being retained for historical purposes.
GDPR Origin Of Pseudonymisation
The use of pseudonymisation as a form of encryption of personal identifiable information gained importance and
legitimacy from the GDPR. Pseudonymisation is referred to many times in the GDPR.
The term pseudonymisation is defined in Article 4(5) of the GDPR:
‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no
longer be attributed to a specific data subject without the use of additional information, provided that such
additional information is kept separately and is subject to technical and organisational measures to ensure that
the personal data are not attributed to an identified or identifiable natural person;
Pseudonymisation is also referred to in Recitals 26 and 28 of the GDPR:
Recital 26
The principles of data protection should apply to any information concerning an identified or identifiable
natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural
person by the use of additional information should be considered to be information on an identifiable natural
person. To determine whether a natural person is identifiable, account should be taken of all the means
reasonably likely to be used, such as singling out, either by the controller or by another person to identify the
natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the
natural person, account should be taken of all objective factors, such as the costs of and the amount of time
required for identification, taking into consideration the available technology at the time of the processing and
technological developments. The principles of data protection should therefore not apply to anonymous
information, namely information which does not relate to an identified or identifiable natural person or to
personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This
Regulation does not therefore concern the processing of such anonymous information, including for statistical
or research purposes.
Recital 28
The application of pseudonymisation to personal data can reduce the risks to the data subjects concerned and
help controllers and processors to meet their data-protection obligations. The explicit introduction of
‘pseudonymisation’ in this Regulation is not intended to preclude any other measures of data protection.
Article 32(1)(a), dealing with security, refers to the pseudonymisation and encryption of personal data, using
pseudonymisation to mean changing personal data so that the resulting data cannot be attributed to a specific person
without the use of additional information.
Article 89, covering safeguards and derogations relating to processing for archiving purposes in the public interest,
scientific or historical research purposes or statistical purposes, refers to pseudonymisation as follows:
1. Processing for archiving purposes in the public interest, scientific or historical research purposes or statistical
purposes, shall be subject to appropriate safeguards, in accordance with this Regulation, for the rights and
freedoms of the data subject. Those safeguards shall ensure that technical and organisational measures are in
place in particular in order to ensure respect for the principle of data minimisation. Those measures may
include pseudonymisation provided that those purposes can be fulfilled in that manner. Where those
purposes can be fulfilled by further processing which does not permit or no longer permits the identification of
data subjects, those purposes shall be fulfilled in that manner.
Article 6 (4), covering lawfulness of processing, refers to pseudonymisation as a means of possibly contributing to the
compatibility of further use of data:
Where the processing for a purpose other than that for which the personal data have been collected is not
based on the data subject's consent or on a Union or Member State law which constitutes a necessary and
proportionate measure in a democratic society to safeguard the objectives referred to in Article 23(1), the
controller shall, in order to ascertain whether processing for another purpose is compatible with the purpose for
which the personal data are initially collected, take into account, inter alia:
(a) any link between the purposes for which the personal data have been collected and the purposes of the
intended further processing;
(b) the context in which the personal data have been collected, in particular regarding the relationship between
data subjects and the controller;
(c) the nature of the personal data, in particular whether special categories of personal data are processed,
pursuant to Article 9, or whether personal data related to criminal convictions and offences are processed,
pursuant to Article 10;
(d) the possible consequences of the intended further processing for data subjects;
(e) the existence of appropriate safeguards, which may include encryption or pseudonymisation.
Article 25 refers to pseudonymisation as a means to contribute to data protection by design and by default in data
applications:
1. Taking into account the state of the art, the cost of implementation and the nature, scope, context and
purposes of processing as well as the risks of varying likelihood and severity for rights and freedoms of natural
persons posed by the processing, the controller shall, both at the time of the determination of the means for
processing and at the time of the processing itself, implement appropriate technical and organisational
measures, such as pseudonymisation, which are designed to implement data-protection principles, such as
data minimisation, in an effective manner and to integrate the necessary safeguards into the processing in
order to meet the requirements of this Regulation and protect the rights of data subjects.
Encryption is a form of pseudonymisation. The original data cannot be read. The process cannot be reversed without the
correct decryption key. GDPR requires that this additional information be kept separate from the pseudonymised data.
Pseudonymisation reduces risks associated with data loss or unauthorised data access. Pseudonymised data is still
regarded as personal data and so remains covered by the GDPR. It is viewed as part of the Data Protection By Design
and By Default principle.
Pseudonymisation is not mandatory. Implementing pseudonymisation with legacy IT systems and processes may be
complex and expensive and, to that extent, pseudonymisation might be considered an example of unnecessary
complexity within the GDPR.
In relation to processing that does not require identification, it is appropriate to refer to Article 11. Article 11(1) provides
that if the purposes for which a controller processes personal data do not, or no longer, require the identification of a data
subject by the controller, the controller shall not be obliged to maintain, acquire or process additional information in
order to identify the data subject for the sole purpose of complying with the GDPR. Where, in such cases, the controller is
able to demonstrate that it is not in a position to identify the data subject, the controller shall inform the data subject
accordingly, if possible. In such cases, Articles 15 to 20 shall not apply except where the data subject, for the purpose
of exercising his or her rights under those articles, provides additional information enabling his or her identification.
The GDPR has effectively made pseudonymisation the recommended approach to protecting personal identifiable
information.
Growing Importance Of Pseudonymisation
The Schrems II judgement3 has further increased the importance and relevance of data pseudonymisation, particularly
in relation to data transfers outside the EU. The judgement found that the US FISA (Foreign Intelligence Surveillance
Act) does not respect the minimum safeguards resulting from the principle of proportionality and cannot be regarded as
limited to what is strictly necessary. While the changes apply to transfers outside the EU, especially to the US, they can
be applied to all data transfers to ensure consistency.
The European Data Protection Board (EDPB) adopted version 2 of its recommendations on supplementary measures4 to
enhance data transfer arrangements and ensure compliance with EU personal data protection requirements.
In this context, data pseudonymisation must ensure that:
• Data is protected at the record and data set level as well as the field level so that the protection travels with the data
wherever it is sent
• Direct, indirect, and quasi-identifiers of personal information are protected
• The approach must attempt to protect against mosaic-effect re-identification attacks by adding high levels of
uncertainty to pseudonymisation techniques.
Approaches To Pseudonymisation
There are several potential approaches to pseudonymisation that can be implemented, as shown in the following
diagram:
Figure 10 – Overview of Approaches to Pseudonymisation
These approaches include:
• Replace IDAT Fields With Linking Identifier
• Hash IDAT Fields
3 https://curia.europa.eu/juris/document/document.jsf?text=&docid=228677&pageIndex=0&doclang=en
4 https://edpb.europa.eu/system/files/2021-06/edpb_recommendations_202001vo.2.0_supplementarymeasurestransferstools_en.pdf
• Hash IDAT Fields With Additional Salting/Peppering
• Generate Hash From All Contents
These approaches are explained in more detail in the next sections. In the following, IDAT means identifying data and
refers to personal identifiable information, and ADAT means analytic data.
Pseudonymisation By Replacing ID Fields With Linking Identifier (Token)
This approach involves replacing identifying data fields with a random value. These random values are then stored in a
separate secure non-accessible set of data that links the random value to the original record.
Figure 11 – Pseudonymisation By Replacing ID Fields With Linking Identifier
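A minimal sketch of this approach, using Python's secrets module for the random linking identifiers. The vault layout is an assumption; in practice it would be a separate, access-controlled store:

```python
import secrets

def tokenise(idat_value: str, vault: dict) -> str:
    """Replace an identifying value with a random linking identifier.

    vault maps token -> original value; in a real deployment it would
    live in a separate, secured location (hypothetical layout here).
    """
    while True:
        token = secrets.token_hex(8)  # 64 random bits
        if token not in vault:        # guard against (unlikely) collisions
            vault[token] = idat_value
            return token

vault = {}
t1 = tokenise("AAB-123-456-X", vault)
t2 = tokenise("AAB-123-456-X", vault)
# Unlike a hash, the same input yields different, unlinkable tokens
print(t1 != t2, vault[t1] == vault[t2])  # True True
```

Because the tokens are random rather than derived from the data, they reveal nothing about the original values and cannot be attacked by recomputation, unlike the hashing approaches described below.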
Pseudonymisation By Replacing ID Fields With Linking Identifier – Multiple ID Fields
Where there are multiple identifying data fields, these can each be replaced with random values. Alternatively, the
multiple identifying data fields can be removed and replaced with a single identifier.
Figure 12 – Pseudonymisation By Replacing ID Fields With Linking Identifier – Multiple ID Fields
Replacing multiple source fields with a single token field reduces the granularity with which the original source data can
be retrieved. The entire set of source fields must be retrieved using the depseudonymisation key before the required
individual field can be extracted.
ID Field Hashing Pseudonymisation
The hashing approach to pseudonymisation involves replacing identifying data with a hash code of the data. So, for
example, the SHA3-512 hash of IDAT1 in hexadecimal is:
576c23e0ec773508ae7a03d1b286d75f3a7cfe524625b658a1961d3fa7b0ebb4cc01b3b530c634c9525631614ad3ebcb3afb69d33e5d8608a1587c2f43c16535
The SHA3-512 algorithm returns a 512-bit value. The hexadecimal value above is represented as the following binary
string:
01010111011011000010001111100000111011000111011100110101000010001010111001111
01000000011110100011011001010000110110101110101111100111010011111001111111001
01001001000110001001011011011001010111101000011001011000011101001111111010011
11011000011101011101101001100110000000001101100111011010100110000110001100011
01001100100101010010010101100011000101100001010010101101001111101011110010110
01110101111101101101001110100110011111001011101100001100000100010100001010110
00011111000010111101000011110000010110010100110101
Storing a SHA3-512 hash code requires 64 bytes. In the case of some identifying data fields, this may be longer than the
field itself. So pseudonymisation will increase storage requirements by replacing shorter fields with longer ones and by
requiring the storage of separate depseudonymisation keys – see page 28.
The input identifying data cannot be recalculated from the hash directly. However, hash values can be calculated easily
and quickly (a “brute force” attack) and compared to the pseudonymised values to recover the original identifying data.
Figure 13 – ID Field Hashing Pseudonymisation
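The field-hashing step can be sketched with Python's standard hashlib. Whether a given digest matches the example value shown above depends on the exact bytes hashed; the UTF-8 encoding here is an assumption:

```python
import hashlib

def pseudonymise_field(idat: str) -> str:
    """Return the SHA3-512 hash of an identifying field as hex."""
    return hashlib.sha3_512(idat.encode("utf-8")).hexdigest()

digest = pseudonymise_field("IDAT1")
print(len(digest))  # 128 hex characters = 512 bits = 64 bytes stored
```

Note that the mapping is deterministic: the same input always yields the same digest, which is exactly what makes the brute-force comparison attack described above possible.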
Hashing And Identifier Codes
If any of the IDAT fields contains a recognisable identifier code then brute force hash attacks are very feasible, even with
modest computing resources. In general, identifying data tends to be more structured than other data – names,
addresses, codes and so on.
For example, consider an identifier code with a format such as:
AAA-NNN-NNN-C
where:
A is an upper-case alphabetic character
N is a number from 0-9
C is a check character
There are 17,576,000,000 possible combinations of this sample identifier code. This may appear to be a large number. But
a single high-specification PC could calculate all the SHA3-512 hash values for these combinations in a few hours.
So, unless the input to the hash generation is augmented with additional more random information, brute force attacks
are feasible.
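The scale of such an attack can be sketched in Python. The code format, the target value and the omission of the derived check character below are illustrative assumptions, not details taken from this paper:

```python
import hashlib

# Format AAA-NNN-NNN-C: 26^3 alphabetic prefixes x 10^6 numeric bodies.
# The check character is derived from the other characters, so it adds
# no extra combinations and is omitted here for simplicity.
print(26 ** 3 * 10 ** 6)  # 17576000000

def candidates(prefix):
    """Yield every candidate code for one three-letter prefix (10**6 codes)."""
    for n in range(10 ** 6):
        yield "{}-{:03d}-{:03d}".format(prefix, n // 1000, n % 1000)

# Suppose a leaked pseudonym is the hash of the (hypothetical) code ABC-042-117
target = hashlib.sha3_512(b"ABC-042-117").hexdigest()

# Scan a single prefix; a full attack simply loops over all 17,576 prefixes
found = next(c for c in candidates("ABC")
             if hashlib.sha3_512(c.encode()).hexdigest() == target)
print(found)  # ABC-042-117
```

Each prefix requires at most a million hashes, which a modern CPU core completes in about a second, which is why the full code space falls within hours.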
The following illustrates how a small (single character) change (in this case, changing a character from lower to upper case) in the sample input value generates very different hash codes.
Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus ultra" to the progress of ...
SHA3-512 Hash: e0ef7bd38b6b4bc6a27e7260d2162b2ea58cf5afa5098072d0f735f9d73b67f9b9f699b8b098ec41d44e117135e88b3cfb670876a2f34efd5734e7ce80b64450

Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "Ne plus ultra" to the progress of ...
SHA3-512 Hash: e0ab9f0efb8f4cc2b89b73439f7b1365e687b17b7e0bdc0ede00751a5a883ad8ee0877b9b6a3032ad23521a7bc25a0b199e5c57cdb2cb5d7500c997e133c41a1

Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne Plus ultra" to the progress of ...
SHA3-512 Hash: 61361212da56a824559b81409cf02ba5f8c3bf41d4c8038faa885a183e1bdac1705eefad72594af1fc3901aa55295c3166eb6635ca866f1e5cdf56c7ff0fb56a

Input: ... no man has the right to fix the boundary of a nation. No man has the right to say to his country, "Thus far shalt thou go and no further", and we have never attempted to fix the "ne plus Ultra" to the progress of ...
SHA3-512 Hash: 833d8b7cc47843cf74fd42cbbf782e87543c677ecbdc1f7fe4d7ad9166557fac4c17d467fa81302a195e60a0a6f3f89c34e03a5c94eefcb3f19cabcfd87a37ad
Hashing And Reversibility
The hash of a value is always the same – there is no randomness in hashing. However, as shown above, hashes of very similar input values are very different. A very small input change leads to a very large difference in the generated hash. For SHA3-512, a 0.5% change in the input value leads to an 85%-95% difference in the characters of the hash output.
So, given two hash values, it cannot be easily determined how similar the input values are or what the structure of the input values might be. This non-correlation property – the avalanche effect – means that the hash output appears erratic and carries no usable information about the structure of its input.
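This behaviour can be checked directly. The sketch below hashes two inputs differing in a single character and measures the fraction of hex characters that differ; the exact percentage varies from one input pair to another, so no specific value is claimed:

```python
import hashlib

h1 = hashlib.sha3_512(b'fix the "ne plus ultra" to the progress of').hexdigest()
h2 = hashlib.sha3_512(b'fix the "Ne plus ultra" to the progress of').hexdigest()

# Fraction of the 128 hex characters that differ between the two digests;
# for unrelated digests roughly 15/16 (about 94%) of characters differ
diff = sum(a != b for a, b in zip(h1, h2)) / len(h1)
print(round(diff, 2))
```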
Hashing as a form of pseudonymisation is potentially vulnerable to brute force attacks because large numbers of hashes can be generated very easily and quickly. If you have some knowledge of the input value, you can generate large numbers of permutations and their hashes and compare these with the known hash to identify the original value. But ultimately you have to have the exact input value to generate the same hash: being very close is of no benefit.
Therefore, combining the original data with even a small amount of randomised data renders brute force attacks on hash values much more complex.
ID Field Hashing Pseudonymisation With Data Salting And Peppering
Salt is an additional, different data item added to each identifying data field before hashing. Pepper is a fixed item of data added to record or field level data before hashing. With this approach the hashed identifying data is:
HASH(CONCATENATE(IDATi+SALTi+PEPPER))
For example, SHA3-512(CONCATENATE(IDAT1+SALT1+PEPPER)) =
3fa075114200b2327092f18067059ba81a5b191b33d5a10a2042673adcb119fac4dc5d3f63c60d44e132f4db5996d416fd70216d4e055f1e5ccc0258ff15e1e1
This approach eliminates almost all the risk from brute force hash generation attacks unless the approach to generating the Salt and Pepper values can be determined.
Figure 14 – ID Field Hashing Pseudonymisation With Data Salting And Peppering
While the Pepper value seems to add little to the randomisation of the hash, it makes determining the pseudo random
number generator harder and thus makes the hash more secure.
One possible approach to generating the Salt is to use a cryptographically secure pseudo random number generator5 (PRNG) to generate salt values. Other, less secure PRNGs are vulnerable to attacks.
This ensures that the random salt values are very difficult to determine, which in turn makes brute force attacks virtually impossible. The following shows some examples of random numbers added to identifying data to generate hash codes:
HASH(CONCATENATE(IDAT1+1144360296176+2356573852518))
HASH(CONCATENATE(IDAT2+4700182946372+2356573852518))
HASH(CONCATENATE(IDAT3+1112492458021+2356573852518))
HASH(CONCATENATE(IDAT4+2755842713752+2356573852518))
HASH(CONCATENATE(IDAT5+6908485085952+2356573852518))
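A hedged Python sketch of this salting and peppering scheme, using the standard library's secrets module as the cryptographically secure PRNG (the pepper value and the field value are illustrative assumptions):

```python
import hashlib
import secrets

# Fixed pepper, stored separately from both the data and the salts
# (hypothetical value)
PEPPER = "2356573852518"

def pseudonymise(idat):
    """Return (salt, token) for one identifying field.

    secrets draws from a cryptographically secure PRNG, so the
    per-field salts cannot be predicted by an attacker."""
    salt = str(secrets.randbelow(10 ** 13))
    token = hashlib.sha3_512((idat + salt + PEPPER).encode()).hexdigest()
    return salt, token

salt, token = pseudonymise("IDAT1")
print(len(token))  # 128
```

The salt must be retained with the depseudonymisation key: without it, the token cannot be regenerated from candidate identifying data.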
Data Attacks – ID Field Hashing Pseudonymisation With Data Salting And Peppering
Using this approach to augment the identifying data hash, in order to find the identifying data and additional random
data used to generate a hash code, you need to know three pieces of information:
1. The structure of the identifying data in order to generate all possible permutations
2. The pseudo random number generator used to generate the Salt values
3. The specific Pepper code used, if this has been added.
Figure 15 – Data Attacks – ID Field Hashing Pseudonymisation With Data Salting And Peppering
5 Examples of cryptographically secure pseudo random number generators are:
Fortuna - https://www.schneier.com/academic/fortuna/
PCG - https://www.pcg-random.org/
Content Hashing Pseudonymisation
Content hashing involves generating the hash token from the entire record contents rather than just individual
identifying fields. For example, the hash is generated from:
SHA3-512(IDAT1,ADAT1,SALT1,PEPPER) =
df767164078cb0779d06c1de02de74c62192461e82bbb0d01d60c3c3664c9c69111d5d2f07415333e85cc04acfc1f7a204eadd8deead25a63c5a5ad343a5b3f2
This results in a very high degree of variability in the source data for the hashes. It increases the difficulty of identifying
the source data that generated the hash code.
Figure 16 – Content Hashing Pseudonymisation
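A sketch of generating the token from the whole record rather than a single field; the field values, separator and salt/pepper values below are illustrative assumptions:

```python
import hashlib

# Hypothetical record: identifying (IDAT) and additional (ADAT) field values
record = ["IDAT1", "ADAT1"]
salt, pepper = "1144360296176", "2356573852518"

# Hash the entire record contents plus salt and pepper in one pass
payload = ",".join(record + [salt, pepper])
token = hashlib.sha3_512(payload.encode()).hexdigest()
print(len(token))  # 128
```

Because the non-identifying ADAT fields also feed the hash, two records with the same identifying data but different content produce different tokens.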
Pseudonymisation And Data Lakes/Data Warehouses
Data should be pseudonymised before the data lake and/or data warehouse is populated as part of a Data Privacy By
Design And By Default approach.
At a high-level the stages involved in this are:
1. As part of the standard ETL/ELT process, the source data is pseudonymised and the depseudonymisation key is
created.
2. The pseudonymised data is passed to the data lake. The data may remain in the data lake or it may be used to
populate the data warehouse.
3. The pseudonymised data created by the ETL/ELT process may be used to update the data warehouse directly,
bypassing the data lake stage.
4. The pseudonymised data in the data lake is used to update the data warehouse.
Figure 17 – Pseudonymisation and Data Lakes/Data Warehouses
The data in the data warehouse can be made available for more general use within the organisation without any concerns
about personal data being made available. This ensures compliance with GDPR article 6 (see page 17). In this case,
pseudonymisation is used as part of the archiving process for data containing personal identifiable information after its
main processing has been completed and the data is being retained for historical and analytical purposes.
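The ETL/ELT pseudonymisation stage described above can be sketched as a function that splits each source row into a pseudonymised row for the data lake and a depseudonymisation key entry held separately. The field names and salt scheme are illustrative assumptions:

```python
import hashlib
import secrets

def etl_pseudonymise(rows):
    """Split source rows into pseudonymised rows (for the data lake)
    and depseudonymisation key entries (stored separately and securely)."""
    lake_rows, key_rows = [], []
    for row in rows:
        salt = secrets.token_hex(8)
        token = hashlib.sha3_512((row["idat"] + salt).encode()).hexdigest()
        lake_rows.append({"token": token, "adat": row["adat"]})
        key_rows.append({"token": token, "idat": row["idat"], "salt": salt})
    return lake_rows, key_rows

lake, keys = etl_pseudonymise([{"idat": "Jane Doe", "adat": "2021-03"}])
print("idat" in lake[0])  # False - no identifying data reaches the lake
```

The shared token column is what later joins lake or warehouse rows back to the key table when authorised depseudonymisation is required.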
Pseudonymisation Implementation
As mentioned on page 22, storing a SHA3-512 hash code requires 64 bytes. In the case of some identifying data fields,
this may be longer than the field itself. So pseudonymisation will increase storage requirements by replacing shorter
fields with longer ones and by requiring the storage of separate depseudonymisation keys.
For example, a table in an Oracle database with 10 million records, five IDAT fields each with an average length of 20
bytes, five ADAT fields each with an average length of 8 bytes and one index column of 8 bytes will require about 1.22 GB
of storage.
With pseudonymisation of individual IDAT fields, these will be replaced with 64 bytes each. The table size will increase to
about 2.48 GB.
There will also be a depseudonymisation key table that will hold both the original five IDAT fields each with an average
length of 20 bytes and the five pseudonymisation fields of 64 bytes each as well as one index column of 8 bytes. This will
occupy 2.95 GB of storage.
So, in this example, pseudonymisation increases storage requirements from 1.22 GB to 5.43 GB, an increase of 4.21 GB.
As mentioned on page 21, replacing multiple source IDAT fields with a single pseudonymisation hash reduces the
granularity with which the original source data can be retrieved. The entire set of source fields must be retrieved from the
depseudonymisation key and then the individual field required can be retrieved. This reduces the storage overhead.
The use of a separate depseudonymisation key table is not required. The original source data, with its personal identifiable information, can be used as the depseudonymisation key. The pseudonymised data will need to store a link to the row in the original source data. The hash code contained in the pseudonymised data can then be compared with a hash code generated from the source data. However, in this case, if the hash generation process was augmented with salting and peppering, the correct salt would have to be regenerated.
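Comparing a stored token with a hash regenerated from candidate source data might look like the following sketch (hmac.compare_digest avoids timing side channels; the values are illustrative):

```python
import hashlib
import hmac

def matches(token, idat, salt, pepper):
    """Regenerate the salted, peppered hash from candidate source data
    and compare it with the stored pseudonymisation token."""
    candidate = hashlib.sha3_512((idat + salt + pepper).encode()).hexdigest()
    return hmac.compare_digest(candidate, token)

# Hypothetical stored token built from value "IDAT1", salt "s", pepper "p"
token = hashlib.sha3_512("IDAT1sp".encode()).hexdigest()
print(matches(token, "IDAT1", "s", "p"))  # True
print(matches(token, "IDAT2", "s", "p"))  # False
```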
Data Breaches and Attacks
The objectives of data privatisation technologies are:
• To prevent data breaches and attacks
• To minimise or eliminate the impact of a data breach or attack
Data privatisation technologies are just one of a number of layers of data protection an organisation should implement for its systems and data.
Data access and data sharing arrangements introduce an additional level of data privatisation complexity in that the person or organisation being given access to the data may be the attacker. Alternatively, the data protection arrangements implemented and operated by the person or organisation being given data access may not be to the same standard as those of the source organisation.
So, the source organisation should assume that data sharing and access arrangements are implicitly compromised and
act accordingly.
There are many security frameworks that can be used to define this wider organisational security approach, such as:
• Center for Internet Security (CIS) Critical Security Controls – https://www.cisecurity.org/controls/
• Control Objectives for Information Technologies (COBIT) – https://www.isaca.org/resources/cobit
• NIST: Cybersecurity Framework, 800-53, 800-171 – https://csrc.nist.gov/Projects/risk-management/sp800-53-controls/downloads
• US FedRAMP (Federal Risk and Authorization Management Program – https://tailored.fedramp.gov/) Security Controls Baseline – https://tailored.fedramp.gov/static/APPENDIX%20A%20-%20FedRAMP%20Tailored%20Security%20Controls%20Baseline.xlsx
• Cybersecurity Maturity Model Certification (CMMC) – https://www.acq.osd.mil/cmmc/documentation.html
• Cloud Security Alliance (CSA) Cloud Controls Matrix (CCM) – https://cloudsecurityalliance.org/research/cloud-controls-matrix/
The analysis of these security standards and frameworks is outside the scope of this paper.
Pseudonymisation and Data Breaches
Pseudonymisation protects against data breaches by making data unusable should it be exposed.
Figure 18 – Pseudonymisation and Data Breaches
The ways in which pseudonymised data can be exposed and the impact of these breaches include:
1. The data may be exposed, accidentally or deliberately, by the entity with which the data is shared. If the data is correctly pseudonymised and the pseudonymisation algorithm is protected then the impact of such a breach would be low.
2. The sharing organisation may cause the pseudonymised data to be exposed. For example, the data sharing mechanism used to share or provide access to the data may be compromised. The impact of such a breach would be low.
3. The depseudonymisation key may be compromised. The risk of personal data re-identification will be high if this happens.
4. The pseudonymisation algorithm may be compromised. The risk of personal data re-identification will be high if this happens.
Differencing Attack
Differencing attacks work by running multiple partially overlapping queries against summarised data until the results can be combined to identify an individual. Differencing attacks apply especially to differential privacy data access platforms. For example, the following set of queries can be run against the data:
• How many people in the group are aged greater than N?
• How many people in the group aged greater than N have attribute A?
• How many people in the group aged greater than N have attribute B?
• How many people with ages in the range N-9 to N-5 are male?
• How many people with ages in the range N-4 to N are male?
After a number of queries, you may be able to identify that individuals or small numbers of individuals in a given age range and of a given sex have a defined attribute. Apparently anonymous summary results can be combined to reveal potentially sensitive insights and compromise confidentiality.
Differential privacy can be designed to reduce or eliminate the threat of differencing attacks by attaching a cost to each
query. A budget is assigned to the dataset. The amount spent by queries against the dataset is tracked. When the budget
is expended, no more queries can be run until the budget is increased.
A differential privacy platform should be able to track queries performed by solution consumers given access to
determine potential patterns of abuse.
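The budget mechanism just described can be sketched minimally as follows; the cost and budget values are illustrative, and real platforms track a formal epsilon privacy cost per query:

```python
class PrivacyBudget:
    """Track the privacy cost spent by queries against one dataset."""

    def __init__(self, total):
        self.total = total
        self.spent = 0.0

    def charge(self, cost):
        """Charge a query's cost; refuse the query once the budget is spent."""
        if self.spent + cost > self.total:
            return False
        self.spent += cost
        return True

budget = PrivacyBudget(total=1.0)
print(budget.charge(0.4))  # True
print(budget.charge(0.4))  # True
print(budget.charge(0.4))  # False - refused until more budget is allocated
```

Because the budget is shared across all users of a dataset, colluding users cannot evade it by splitting their overlapping queries between accounts.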
Figure 19 – Differential Privacy and Differencing Attacks
Differencing Attack, Reconstruction Attack And Mosaic Effect
In addition to a differencing attack, there are various other types of attack that can be performed on data as it is made available, without the need to obtain other data access:
• A reconstruction attack uses the information from a differencing attack to identify how the original dataset was
processed to create the summary.
• A mosaic effect attack involves combining the data with data from other (public) sources to identify individuals. For example, apparently anonymised medical data containing dates of death can be combined with public death notice records to identify individuals.
This results in a data attack topology that should be monitored to ensure data privatisation is maintained.
Figure 20 – Differencing Attack, Reconstruction Attack And Mosaic Effect
Differential Privacy
Differential privacy allows for the (public) sharing of information about a group or aggregate by describing the patterns of
groups within the group or aggregate while suppressing information about individuals in the group or aggregate. Source data is aggregated and summarised and individual personal references are removed. The one-to-one correspondence
between original and transformed data has been removed.
A viewer of the information cannot (or should not be able to) tell if a specific individual's information was or was not used
in the group or aggregate. This involves inserting noise into the results returned from a query of the data by a differential
privacy middleware tool. The greater the noise introduced, the less usable the data will be but the lower the re-identification risk.
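The standard way to introduce such noise is the Laplace mechanism. The sketch below applies it to a simple counting query (sensitivity 1), where epsilon controls the privacy/utility trade-off:

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon):
    """Counting query with sensitivity 1: add Laplace(1/epsilon) noise.
    Smaller epsilon means more noise - stronger privacy, less utility."""
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(42)
print(round(dp_count(1000, epsilon=0.5), 1))  # a noisy count near 1000
```

Because the noise is centred on zero, repeated queries average out to the true count, which is exactly why the privacy budget described later must limit how many queries can be run.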
It is a well-proven, widely used robust technique6. It aims to eliminate the possibility of re-identification of individuals
from the dataset being analysed. Individual-specific information is always hidden.
Differential privacy technologies are more complex than anonymisation and pseudonymisation as an approach to data privatisation. They will require more technical skills and possibly the selection and implementation of a software platform.
The remainder of this paper covers the topic of differential privacy in more detail.
An effective data privatisation and differential privacy operational solution consists at its core of a computational layer that introduces deliberate randomisation into the summarised results returned from a data query. This means that the action of running multiple queries across the dataset cannot be used to reconstruct the underlying individual records. It thus enables Privacy Preserving Data Mining (PPDM). The objective is to prevent access to or identification of specific, individual personal records or sensitive information while preserving the aggregated or structural properties of the data.
6 See The Algorithmic Foundations of Differential Privacy https://www.cis.upenn.edu/~aaroth/privacybook.html.
Figure 21 – Differential Privacy Operation
Differential privacy assigns a privacy budget to each dataset. The differential privacy engine introduces a fuzziness into
the results of queries. Each query has a privacy cost. The total privacy expenditure across all queries by all users is tracked.
When the budget has been spent, no further data queries can be performed until more privacy budget is allocated.
Effective and usable data privatisation and differential privacy means finding the right balance between data privacy
and data utility. At one extreme, the solution would be to completely delete or prevent any access to data. While this
preserves absolute data privacy, it also eliminates the utility and usefulness of the data.
Figure 22 – Data Privatisation and Differential Privacy Balancing Act
This results in a balancing act between three factors:
1. Level of Detail Contained in Results Presented
2. Amount and Complexity of Data Processing Allowed
3. Level of Data Privacy
Relaxing or constraining one factor affects the other two. In order to determine the right equilibrium across these factors for your organisation and your data, you need to explicitly formalise your approach to data privacy and data utility in a policy. This policy should be accessible to and understood by those in charge of managing data. The policy should also be formally defined so its applicability and its subsequent implementation, operation and use can be verified. Differential privacy technology can then be used to operationalise this policy, including monitoring its operation and use.
Technology is a key enabler of data privatisation and differential privacy. It ensures and embeds Privacy By Design in
your data access solution rather than data privacy concerns being addressed as an afterthought.
Data Privatisation and Differential Privacy Solution Architecture Overview
This section describes the idealised architecture and design of an operational data privatisation and differential privacy
solution. This essentially illustrates a reference architecture that you can use to determine what solution components are
needed and what must be installed, implemented, and configured to create a usable and secure solution within your
organisation. It can be used as a structured framework to define business and technical requirements. It can also be used
to evaluate suitable products.
Figure 23 – Operational Data Privatisation and Differential Privacy Solution Architecture
The numbered components of this are:
1. Core Data Privatisation/Differential Privacy Operational Platform – this is the core differential privacy platform.
This can be installed on-premises or on a cloud platform such as AWS, Google Cloud and Azure. It takes and
summarises data from designated data sources and provides different levels and types of computational access to
authorised users via a data API. It also provides a range of management and administration functions.
2. Data Sources – these represent data held in a variety of databases such as Oracle, SQL Server and other data storage
systems such as HDFS, Cassandra, PostgreSQL and Teradata as well as external data stores such as AWS S3 and
Azure. The differential privacy platform needs read-only access to these data sources.
3. Data Access Connector – these are connectors that enable read-only access to data held in the data sources.
4. Data Ingestion and Summarisation – this takes data from data sources, processes it and outputs it in a format suitable for access. It includes features to manage data ingestion workflows, scheduling and error identification and handling.
5. Data Analysis Data Store – the core differential privacy platform creates pre-summarised versions of the raw data
from the data sources. The platform never provides access to individual source data records. The data is encrypted
while at rest in the data store.
6. Metadata Store – the platform creates and stores metadata about each data source. This is used to optimise data
privacy of the result sets generated in response to data queries.
7. Batch Task Manager – in addition to running online data queries, asynchronous batch tasks can be run for longer
data tasks.
8. Access and Usage Log – this logs data accesses.
9. User Access API – the platform provides an API for common data analytics tools such as Python and R to generate and retrieve privatised randomised sets of data summaries as well as providing data querying and analytics capabilities. Data results returned from queries are encrypted while in transit.
10. Data Visualisation Interface – this provides a data access and visualisation interface.
11. User Directory – the platform will use your existing user directories such as Active Directory or Azure Active Directory for user authentication and authorisation.
12. Authorised Internal Users – authorised internal users can access different datasets and perform different query types
depending on their assigned access rights.
13. Authorised External Users – authorised external users can access different datasets and perform different query
types depending on their assigned access rights.
14. Analytics and Reporting – this will allow you to analyse and report on user accesses to data managed by the platform.
15. Monitoring, Logging and Auditing – this will log both system events and user activities. This information can be used
both for platform management and planning as well as identifying potential patterns of data use and possible abuse.
16. Data Access Creation, Validation and Deployment – this will allow new data sources to be onboarded and allow
existing data sources to be managed and updated.
17. Management and Administration – this will provide facilities to manage the overall platform such as adding and
removing users and user groups and applying data privacy settings to different datasets.
18. Security and Access Control – this allows the management of different types of user access to different datasets.
19. Billing System Interface – you may want to charge for data access, either at a flat rate or per access or a mix of both. This represents an optional link to a financial management system to enable this.
Differential Privacy Platform Solution Service Management Processes
Just like any other information technology solution, service management processes should be implemented for an
operational differential privacy solution. Because a differential privacy solution exposes personal data, albeit in a
summarised, randomised and anonymised manner, these service management processes are important. They should be
part of any implementation project. This will maximise confidence in differential privacy technology in your organisation
and reduce project risk. In turn, this will maximise the success of the platform and ensure that return on investment is
optimised.
The following table lists what we regard as the most important service management processes in the context of a differential privacy solution.
Your organisation will already have invested in information technology service management processes. These should be
extended to the differential privacy platform.
Service Management Process Overview and Scope
Access Management This process is concerned with operationalising security management policies
relating to enabling authorised users access the differential privacy platform and
managing their access lifecycle.
Availability Management This process relates to ensuring the differential privacy platform meets its agreed
availability targets and obligations by planning, defining, measuring, analysing and
improving availability.
Capacity Management This is concerned with planning, defining, measuring, analysing and delivering the
required facilities to ensure that the differential privacy platform has sufficient
capacity to meet its service level commitments in the short-, medium- and long-term.
Compliance Management This process is focused on ensuring that the design, operation and use of the
differential privacy platform complies with legal and regulatory requirements and
obligations.
Knowledge Management This is about ensuring that knowledge about the implementation, operation and use
of the differential privacy platform is collated, stored and shared, maximising reuse
and eliminating the need for knowledge rediscovery.
Operations Management This process is concerned with implementing and operating the housekeeping
activities and tasks relating to the differential privacy solution, including monitoring
and controlling the platform and backup and recovery.
Risk Management This relates to the identification, evaluation and management of risks including
threats to and vulnerabilities of the differential privacy solution.
Security Management This is concerned with ensuring the confidentiality of the data assets contained in the
differential privacy solution. Your organisation will already have invested in security
management. This needs to be extended to the differential privacy solution.
Service Continuity Management This is focused on ensuring the continuity of operation of and access to the
differential privacy solution is maintained in the event of problems.
Service Level Management This relates to the definition of and the subsequent monitoring of service level targets
and service level agreements relating to the access to and use of the differential
privacy solution.
Differential Privacy Platform Deployment Options
This section outlines two solution deployment options: on-premises and in the cloud.
On-Premises Deployment
The following diagram illustrates the key components of an on-premises implementation of a differential privacy solution.
Figure 24 – Sample High-Level On-Premises Deployment
If users outside the organisation are to be given access to the data platform then either an existing external access facility
will be used to provide secure access or a new facility will have to be implemented.
Cloud Deployment
The following diagram illustrates the key components of a cloud implementation of a differential privacy solution.
Figure 25 – Sample High-Level Cloud Deployment
For a cloud deployment, the key differences relate to how on-premises data is processed and transferred to the cloud
platform and how data access users outside the organisation authenticate using an approach such as Azure Active
Directory.
Differential Privacy and Data Attacks
Data Privatisation and Differential Privacy Solution Planning
There are many different paths along the journey to the implementation of an operational data privatisation and
differential privacy solution. The section Data Privatisation and Differential Privacy Next Steps on page 43 lists some of
the possible stages along this journey. This section lists a possible set of activities and tasks that you can use to create a workplan for implementing a workable solution.
The goal is to create an operational, supportable, maintainable and usable solution that provides access to your data without compromising data privacy and security.
The implementation of a data privatisation and differential privacy solution is not very different from any other
information technology solution that your organisation wants to implement.
The following high-level set of steps can be iterated several times as you move from an initial pilot implementation to a
complete production solution over time.
• Create a prioritised inventory of potential data sources to which you would like to provide secure privatised
computational access
• Profile the data: understand the structure and contents of data, evaluate data quality and data conformance with
standards, identify terms and metadata used to describe data and identify data relationships and dependencies, data
sensitivity, Privacy Exposure Limit (PEL) and privacy requirements of each dataset
• Define the data extract processes
• Identify the target set of users for access to one or more of the datasets and define the type of access
• Define and agree user access processes and security requirements
• Define the subsets of data to be made available for querying
• Perform capacity planning and analysis in terms of raw data volumes, expected number and type of data access
transactions, data refresh frequency, caching of results for performance, creation of materialised views and other
factors that give rise to resource requirements
• Define and agree platform audit logging and reporting, user activity monitoring, event, exception and alert handling
processes
• Define data access charging and billing
• Define the platform operational administration, maintenance and support processes
• Create a cost model for the solution including license costs, infrastructure, support and maintenance and any
proposed revenue streams
• Decide on the deployment approach
• Define the organisational structures and service management processes needed to support the new solution
• Decide on the data integration approach, especially if the solution is to be deployed on a cloud platform
• Define the different types of training needed: administrator, support, data administrator, data query user
• Create, review, validate and approve a differential privacy solution architecture design that incorporates the
information gathered in the previous steps
• Conduct a security review of the differential privacy solution
• Acquire trial versions of platform licenses
• Acquire deployment infrastructure, either on-premises or cloud
• Configure the differential privacy platform and its data sources
• Validate the platform
• Allow user access to the platform in a phased and controlled manner
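As a rough illustration of the profiling step, the sketch below computes basic per-column statistics for a dataset. The record fields are hypothetical, and a real profiling exercise would also cover standards conformance, data relationships, sensitivity and the PEL.

```python
from collections import Counter

def profile_dataset(rows):
    """Basic per-column profile for a list of record dicts:
    non-null counts, distinct value counts and observed types."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(
                name, {"non_null": 0, "values": Counter(), "types": set()})
            if value is not None:
                stats["non_null"] += 1
                stats["values"][value] += 1
                stats["types"].add(type(value).__name__)
    return {
        name: {
            "non_null": stats["non_null"],
            "distinct": len(stats["values"]),
            "types": sorted(stats["types"]),
        }
        for name, stats in columns.items()
    }

# Hypothetical extract from a data source
rows = [
    {"patient_id": 1, "age": 34, "region": "North"},
    {"patient_id": 2, "age": None, "region": "North"},
    {"patient_id": 3, "age": 51, "region": "South"},
]
profile = profile_dataset(rows)
```

The resulting profile feeds directly into the later steps: distinct counts and types inform the definition of queryable data subsets, while the identified columns drive the sensitivity and PEL assessment.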
Data Privatisation and Differential Privacy Solution Operation and Use
The following table lists some key differential privacy platform use cases and what they entail.
These can be embedded into operational service management processes that are listed in the section Differential
Privacy Platform Solution Service Management Processes on page 36.
• User Enrolment – The user must be defined in the organisation’s user directory. The process for enrolling users outside
the organisation depends on the platform deployment model – on-premises or cloud. If the user is outside the
organisation, then you may choose to use a cloud-based directory such as Azure Active Directory as a SAML identity
provider. The user can be assigned to one or more groups, if needed. The user (or the groups to which the user belongs)
will have different access rights to different datasets. The access rights include details on the subsets of data sources
that can be queried, and the number and type of data queries the user can run before being prevented from running
additional requests.
• Platform Usage Reporting and Analysis – The usage of the platform can be analysed in several ways:
1. The overall platform performance, rate of usage, number of users, and number and type of data query transactions,
both online and batch, can be analysed and reported on. This will ensure that the platform is able to handle the
current and expected future volume of data and its use.
2. The amount of data privacy exposed by user queries can be analysed to ensure that the privacy of the data being
made available is maintained.
3. Any charges for access to your data can be determined and bills generated.
• Addition of Data Source – The data source should be profiled to understand its structure and content. A link must be
defined between the data source and the differential privacy platform summarised data subset. The data refresh
frequency must be defined. The Privacy Exposure Limit (PEL) of the dataset must also be defined. This is the maximum
amount of privacy that can be exposed by all data queries run on the dataset. As queries are run, the cumulative
exposure is incremented. Once the limit has been reached, no further access is possible.
• Platform Security Auditing – Platform auditing can be performed at three levels:
1. The overall differential privacy platform can be audited to ensure that it guarantees that no personal information
can be disclosed.
2. The privacy settings of individual datasets can be audited to ensure that they are appropriate for the sensitivity of
their information.
3. The use of the platform can be audited through the analysis of collected audit records to determine unusual
patterns of queries by users.
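The PEL behaviour described for the Addition of Data Source use case – each query consumes part of a fixed privacy budget until access is cut off – can be sketched as follows. The class and its use of the Laplace mechanism for noisy counts are illustrative assumptions, not the implementation of any particular platform.

```python
import random

class PrivacyBudgetExceeded(Exception):
    pass

class PrivateDataset:
    """Answers counting queries with Laplace noise while tracking
    cumulative privacy spend against a Privacy Exposure Limit (PEL)."""

    def __init__(self, records, privacy_exposure_limit):
        self._records = records
        self._limit = privacy_exposure_limit
        self._spent = 0.0

    def noisy_count(self, predicate, epsilon=0.1):
        # Refuse the query once the PEL would be exceeded
        if self._spent + epsilon > self._limit:
            raise PrivacyBudgetExceeded("Privacy Exposure Limit reached")
        self._spent += epsilon
        true_count = sum(1 for r in self._records if predicate(r))
        # A counting query has sensitivity 1, so Laplace noise with
        # scale 1/epsilon gives epsilon-differential privacy; the
        # difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon)
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return true_count + noise

    @property
    def remaining_budget(self):
        return self._limit - self._spent

ds = PrivateDataset([{"age": a} for a in (30, 45, 60)],
                    privacy_exposure_limit=0.2)
first = ds.noisy_count(lambda r: r["age"] > 40)   # spends 0.1
second = ds.noisy_count(lambda r: r["age"] > 40)  # spends the rest
```

A third query against this dataset would raise `PrivacyBudgetExceeded`, mirroring the platform behaviour of preventing further access once the PEL is reached.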
Data Privatisation and Differential Privacy Next Steps
The previous section Data Privatisation and Differential Privacy Solution Planning on page 40 contains a generic set of
steps involved in planning for differential privacy technology.
The journey to creating an industrialised and productionised differential privacy solution can involve a number of points
at which a decision to proceed to the next stage in the journey can be made.
Figure 26 – Data Privatisation and Differential Privacy Solution Journey
To allow your organisation to move along this journey, we have identified a number of practical engagement exercises
designed to answer specific questions you might have and to provide you with specific deliverables. These engagements
are:
1. Early Business Engagement and Differential Privacy Opportunity Validation
2. Differential Privacy Design Process
3. Differential Privacy Readiness Assessment
4. Differential Privacy Architecture Sprint
Implementing differential privacy technology is a means to an end rather than an end in itself. It is a way of resolving or
addressing a data access problem or opportunity. These engagements are designed with this in mind.
While these engagement types are described individually here, they can be combined to create a custom exercise to suit
your specific needs.
The following diagram illustrates at a high-level the scope of each of these engagements in terms of the duration and
where they fit into your journey to the successful implementation of differential privacy in your organisation.
Figure 27 – Approaches to Data Privatisation and Differential Privacy Solution Scoping and Definition
The following table summarises the characteristics of each of these engagements.
1. Early Business Engagement and Differential Privacy Opportunity Validation
− Question answered: I want a consulting exercise to define new business structures and associated solutions to
address the potential data access provision opportunity
− Level of detail included in deliverable: Medium to High
− Likely engagement duration: Medium
− What you get: A validated differential privacy opportunity across the areas of strategic fit, options evaluation and
identification, procurement and implementation, expected whole-life revenue and costs, and a realistic and
staged plan for achievement
2. Differential Privacy Detailed Design
− Question answered: I want a full detailed design created from an initial, not necessarily well-defined, idea that I
can pass to solution delivery
− Level of detail included in deliverable: High
− Likely engagement duration: Medium
− What you get: A detailed end-to-end design for a differential privacy solution encompassing all solution
components
3. Differential Privacy Readiness Assessment
− Question answered: I want generalised solution options identified for the potential data access provision
opportunity
− Level of detail included in deliverable: Low to Medium
− Likely engagement duration: Medium
− What you get: An understanding of the scope, requirements, objectives, approach and options for a differential
privacy platform, and a high-level understanding of the likely resources, timescale and cost required before
starting the solution implementation
4. Differential Privacy Architecture Sprint
− Question answered: I have a good idea of the potential data access solution I want and I am looking for a quick
view of the solution options and their indicative costs, resources and timescales to implement
− Level of detail included in deliverable: Low to Medium
− Likely engagement duration: Short
− What you get: A high-level design for an end-to-end differential privacy solution focusing on technology aspects
that identifies if the solution is feasible, worthwhile and justifiable
The following sections contain more detail on each of these engagement types.
Early Business Engagement and Differential Privacy Opportunity Validation
The engagement is concerned with analysing and defining the structure and operations of a business function within your
organisation that will operate a differential privacy platform to provide controlled access to your data. It describes a
target business model that includes identifying the differential privacy platform and its constituent components.
The objective is to create a realistic, achievable, implementable and operable target differential privacy platform business
justification to achieve the desired business targets.
This is not an exact engagement with an easily defined and understood extent and duration. It has an essential
investigative and exploratory aspect that requires a degree of latitude. This is not an excuse for excessive
analysis without reaching a conclusion. The goal is to produce results and answers within a reasonable time to allow
decisions to be made based on evidence.
Figure 28 – Early Business Engagement and Differential Privacy Opportunity Validation Process
The deliverables from this exercise will contain information in five key areas: strategic fit, options evaluation and
identification, procurement and implementation, expected whole-life revenue and costs and a realistic and staged plan
for achievement.
Strategic Fit
− Business need and its contribution to the organisation’s data strategy
− Key benefits to be realised
− Critical success factors and how they will be measured
Options Evaluation and Identification
− Cost/benefit analysis of realistic options for meeting the business need
− Statement of possible soft benefits that cannot be quantified in financial terms
− Identification of the preferred option and any trade-offs
Procurement and Implementation
− Proposed sourcing option with reasons
− Key features of proposed commercial arrangements
− Procurement approach/strategy with supporting details
Whole-Life Revenue and Costs
− Statement of available funding and details of projected whole-life revenue from and cost of the project (acquisition
and operation), including all relevant costs
− Expected financial benefits
Realistic and Staged Plan for Achievement
− Plan for achieving the desired outcome with key milestones and dependencies
− Contingency plans
− Risks identified and mitigation plan
− External supplier plans
− Resources, skills and experience required
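The whole-life revenue and cost element can be illustrated with a minimal calculation. The function and the figures are hypothetical – a sketch rather than a full cost model, which would also break out license, infrastructure, and support and maintenance costs separately.

```python
def whole_life_position(acquisition_cost, annual_operating_cost,
                        annual_revenue, years, discount_rate=0.0):
    """Net whole-life position: projected revenue less acquisition and
    operating costs over the evaluation period, optionally discounted."""
    total = -acquisition_cost
    for year in range(1, years + 1):
        net = annual_revenue - annual_operating_cost
        total += net / (1 + discount_rate) ** year
    return total

# Hypothetical figures: 100 to acquire, 10/year to run, 40/year revenue
position = whole_life_position(100, 10, 40, years=5)
```

Running the same figures with a non-zero discount rate reduces the position, which is why an agreed discount rate belongs in the statement of projected whole-life revenue and costs.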
Differential Privacy Detailed Design
This is a very comprehensive engagement that produces a detailed end-to-end design for a differential privacy solution
for your organisation. This approach to solution design is based on using six views as a structure to gather information
and to create the design. These six views are divided into two groups:
• Core Solution Architecture Views – concerned with the kernel of the solution:
• Business
• Functional
• Data
• Extended Solution Architecture Views – concerned with solution implementation and operation:
• Technical
• Implementation
• Management and Operation
Figure 29 – Differential Privacy Detailed Design Views
The core dimensions/views define what the differential privacy solution must do, how it must operate and the results it
will generate. The extended dimensions/views define how the solution must or should be implemented, managed and
operated. They describe factors that affect, drive and support decisions made during the solution design process. Many
of these factors will have been defined as requirements of the solution and so their delivery will be included in the
solution design.
Together these core and extended views describe the end-to-end solution design comprehensively.
Differential Privacy Readiness Assessment
The Differential Privacy Readiness Assessment is intended to allow the exploration of an as yet undefined solution that
addresses a data access opportunity using differential privacy technology.
The work is done from business, information technology and data perspectives. The objective is to understand the scope,
requirements, objectives, approach, options for a differential privacy platform and to get a high-level understanding of
the likely resources, timescale and cost required before starting the solution implementation.
It looks to identify the changes needed within the organisation in order to successfully adopt differential privacy
technology and use it to make your data more widely available.
Figure 30 – Areas Covered in Differential Privacy Readiness Assessment
These domains of change can be categorised as follows:
• Business-Oriented Change Areas
− Facilities – existing and new facilities of the organisation, their types and functions
− Business Processes – current and future business process definitions, requirements, characteristics,
performance
− Organisation and Structure – organisation resources and arrangement, business unit, function and team
structures and composition, relationships, reporting and management, roles and skills
• Technology-Oriented Change Areas
− Technology and Infrastructure – current and future technical infrastructure including security, constraints,
standards, technology trends, characteristics, performance requirements
− Applications and Systems – current and future applications and systems including the core differential
privacy platform and any extended components, their characteristics, constraints, assumptions,
requirements, design principles, interface standards, connectivity to business processes
− Information and Data – the data to which privatised access is to be provided, data and information
architecture, data integration, data access and management, data security and privacy
The analysis also includes an extended change domain that covers the organisation’s operating environment and business
landscape and its data access and data availability strategy.
This categorisation provides a structure for the engagement. It aims to define the changes needed across these domains
in order to use differential privacy technology to enable data access.
Differential Privacy Architecture Sprint
This engagement is designed to produce a high-level design for an end-to-end differential privacy technology solution.
The focus is on the breadth of the technology solution rather than on depth and detail. This engagement recognises that
the journey from initial business concept to operational solution is rarely simple. Not all business concepts progress to
solution delivery projects and not all solution delivery projects advance to a completed operational solution. There is
always an inevitable and necessary attrition during the process, and for many reasons: business and organisation needs
and the operational environment both change, and budgets and resources are prioritised elsewhere.
In this light, there is a need for a differential privacy solution design sprint that generates results quickly: one that
identifies feasible, worthwhile and justifiable concepts that merit proceeding to implementation and eliminates those
that are not cost-effective.
The areas analysed in the differential privacy solution design sprint are:
• Systems/Applications – these are existing systems and applications that will participate in the operation of the
differential privacy solution and which may need to be changed and new systems and applications that will have to
be delivered as part of the solution
• System Interfaces – these are links between systems for the transfer and exchange of data
• Actors – these are individuals, groups or business functions who will be involved in the operation and use of the
differential privacy solution
• Actor-System Interactions – interactions between Actors and Systems/Applications
• Actor-Actor Interactions – interactions between Actors
• Functions – these are activities that are performed by actors using facilities and functionality provided by systems
• Processes – business processes required to operate the differential privacy solution and the business processes
enabled by the solution, including new business processes and changes to existing business processes
• Journey – standard journey through processes/functions and exceptions/deviations from this “happy path”
• Logical Data View – data elements required
• Data Exchanges – movement of data between Systems/Applications
This set of information combines to provide a comprehensive view of the potential differential privacy solution at an early
stage.
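The areas above amount to a structured inventory of the solution landscape. A minimal sketch of how such an inventory might be captured follows; all class and field names are hypothetical illustrations rather than a prescribed model.

```python
from dataclasses import dataclass, field

@dataclass
class System:
    """An existing application, or a new one to be delivered."""
    name: str
    is_new: bool = False

@dataclass
class Actor:
    """An individual, group or business function using the solution."""
    name: str

@dataclass
class Interaction:
    """A system interface, actor-system or actor-actor interaction,
    or a data exchange between systems."""
    source: str
    target: str
    kind: str

@dataclass
class SprintInventory:
    systems: list = field(default_factory=list)
    actors: list = field(default_factory=list)
    interactions: list = field(default_factory=list)

    def systems_to_deliver(self):
        """Systems that must be delivered as part of the solution."""
        return [s.name for s in self.systems if s.is_new]

inventory = SprintInventory(
    systems=[System("CRM"),
             System("Differential Privacy Platform", is_new=True)],
    actors=[Actor("Data Analyst")],
    interactions=[Interaction("Data Analyst", "Differential Privacy Platform",
                              "actor-system")],
)
```

Capturing the sprint outputs in even this simple structured form makes the attrition decision easier: the count of new systems, interfaces and process changes gives an early indication of delivery effort.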
For more information, please contact:
Alan McSweeney
alan@alanmcsweeney.com