bipartisanpolicy
Privacy-preserving Computing
and
Secure Multi Party Computation
Ulf Mattsson
Chief Security Strategist
www.Protegrity.com
Tokenization Management and Security
Cloud Management and Security

Payment Card Industry (PCI) Security Standards Council (SSC):
1. Tokenization Task Force
2. Encryption Task Force, Point-to-Point Encryption Task Force
3. Risk Assessment SIG
4. eCommerce SIG
5. Cloud SIG, Virtualization SIG
6. Pre-Authorization SIG, Scoping SIG Working Group
Ulf Mattsson
ULFMATTSSON.COM
• Chief Security Strategist at Protegrity; previously Head of Innovation at
TokenEx, Chief Technology Officer at Atlantic BT, and Compliance Engineering
and IT Architect at IBM
• Products and services: data encryption, tokenization, data discovery, Cloud
Access Security Brokers (CASB), Web Application Firewalls (WAF), robotics,
and applications; Security Operations Center (SOC) and Managed Security
Services (MSSP)
• Inventor of more than 70 issued US patents; developed industry standards
with ANSI X9 and PCI SSC
Balance: Why, What & How?
Security, Compliance, Privacy
Regulations, Policies, Controls & Tools, Risk Management
Breaches and Opportunities
Opportunities: Internal and Individual Third-Party Data Sharing

Increased need for data analytics drives requirements.

(Diagram: internal and external data flow through a data pipeline, such as a
data lake, ETL and files, into analytics, data science, AI and ML, both
on-premises and in the cloud. Secure Multi Party Computation supports data
collaboration, and Policy Enforcement Points (PEP) with encryption key
management protect data fields for data privacy.)
http://homomorphicencryption.org
Use Cases for Secure Multi Party Computation &
Homomorphic Encryption (HE)
Use case - Financial services industry

Confidential financial datasets are vital for gaining significant insights.
• Using this data requires navigating a minefield of private client information, as well as sharing data
between independent financial institutions to create a statistically significant dataset.
• Data privacy regulations such as CCPA, GDPR and other emerging regulations around the world impose
data residency controls, while institutions still need to enable data sharing in a secure and private fashion.
Reduce and remove the legal, risk and compliance processes
• Collaboration across divisions, other organizations and across jurisdictions where data cannot be
relocated or shared
• Generating privacy-respectful datasets with higher analytical value for Data Science and Analytics
applications.
Use case – Retail - Data for Secondary Purposes

A large aggregator of credit card transaction data wanted to open a new revenue stream.
• By using its data with its business partners: retailers, banks and advertising companies.
• It could help its partners achieve better ad conversion rates, improved customer satisfaction, and more timely offerings.
• It needed to respect user privacy and specific regulations. In this specific case, it wanted to work with a retailer.
• The goal was to allow the retailer to gain insights while protecting user privacy and the credit card organization's IP.
• An analyst at each organization's office first used the software to link the data without exchanging any of the underlying
data.
The data was then used to train the machine learning and statistical models.
• In this specific use case, a logistic and linear regression model was trained using secure multi-party computation (SMC).
• In its simplest form, SMC splits a dataset into secret shares and enables you to train a model without needing to put together
the pieces.
• The information communicated between the peers is encrypted at all times and cannot be reverse engineered.
• The resulting machine learning model coefficients (the output of the training) were only shared with the partner identified as
the receiver of such information.
With the augmented dataset, the retailer was able to get a better picture of its customers' buying habits.
Use case: Bank - Internal Data Usage by Other Units

A large bank wanted to broaden access to its data lake without compromising data privacy, while preserving the data's analytical
value, at reasonable infrastructure costs.
• Current approaches to de-identify data did not fulfill the compliance requirements and business needs, which had led to
several bank projects being stopped.
• The issue with these techniques, like masking, tokenization, and aggregation, was that they did not sufficiently protect the
data without overly degrading data quality.
This approach allows creating privacy-protected datasets that retain their analytical value for Data Science and business
applications.
A plug-in to the organization's analytical pipeline enforced the compliance policies before the data was consumed by data
science and business teams from the data lake.
• The analytical quality of the data was preserved for machine learning purposes by using AI and leveraging privacy models like
differential privacy and k-anonymity.
Improved data access for teams increased the business' bottom line without adding excessive infrastructure costs, while
reducing the risk of consumer information exposure.
Source: Gartner
Important
Privacy-Preserving Computation
Techniques
https://royalsociety.org
Secure Multi-Party Computation (MPC)
Private multi-party machine learning with MPC
Using MPC, different parties send encrypted messages to each other, and obtain
the model F(A,B,C) they wanted to compute without revealing their own private
input, and without the need for a trusted central authority.

(Diagram: with a central trusted authority, parties A, B and C send their
inputs to the authority, which computes and returns F(A,B,C). With secure
multi-party machine learning, the parties exchange protected messages
directly and each obtains F(A,B,C) while the inputs stay private.)
Medium.com
Example of Multi-party Computation: Average Salary #1
Inpher
Example of Multi-party Computation: Average Salary #2
Say Allie's salary is $100k. In additive secret sharing, $100k is split
into three randomly generated pieces (or "secret shares"): $20k,
$30k, and $50k, for example.
• Allie keeps one of these secret shares ($50k) for herself, and
distributes one secret share to each of Brian ($30k) and Caroline
($20k).
• Brian and Caroline also secret-share their salaries following
the same process.
Each participant locally sums the secret shares they hold to
calculate a partial result; in our example, each partial
result is one third of the necessary information to
calculate the final answer.
• The partial results are then recombined, summing
the complete set of secret shares previously
distributed.
• Dividing the recombined total by three shows that Allie, Brian, and Caroline's average salary is $200k.
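The walkthrough above can be sketched in a few lines of Python. This is a toy illustration only: the modulus, the three-party setup, and the in-process "dealing" of shares are assumptions, and a real MPC deployment would run each party on its own machine over authenticated channels.

```python
import random

MOD = 10**9 + 7  # toy prime modulus; shares are uniform values mod MOD

def share(secret, n=3):
    # Split `secret` into n additive shares that sum to it mod MOD.
    shares = [random.randrange(MOD) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def average_salary(salaries):
    n = len(salaries)
    # Each participant secret-shares their salary; share j goes to party j.
    dealt = [share(s, n) for s in salaries]
    # Each party locally sums the one share it received from every participant.
    partials = [sum(dealt[i][j] for i in range(n)) % MOD for j in range(n)]
    # Recombining the partial results reveals only the total, not any input.
    total = sum(partials) % MOD
    return total / n

print(average_salary([100_000, 200_000, 300_000]))  # 200000.0
```

No single party ever sees another participant's salary: each holds one uniformly random share per input, which is information-theoretically useless on its own.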
Source: Gartner
Six Important Privacy-Preserving
Computation Techniques
April 2020
https://royalsociety.org
Homomorphic encryption (HE)
HE depicted in a client-server model
• The client sends encrypted data to a server, where a specific analysis is
performed on the encrypted data, without decrypting that data.
• The encrypted result is then sent back to the client, who can decrypt it to
obtain the result of the analysis they wished to outsource.

(Diagram: the client sends an encryption of x to the server; the server runs
the analysis and returns the encrypted F(x). Policy Enforcement Points (PEP)
and encryption key management protect the data fields.)
Privacy-Preserving Computation (PPC) techniques for analytics and machine
learning (ML) include 2-way (reversible) techniques such as Format Preserving
Encryption (FPE), Homomorphic Encryption (HE) and tokenization, 1-way
techniques such as masking and hashing, plus Differential Privacy (DP), the
K-anonymity model, and Secure Multi Party Computation (SMPC).

Considerations when using different de-identification techniques and formal privacy measurement models

(Diagram: differential privacy adds algorithmic random noise and is fast;
homomorphic encryption computes on encrypted data and is very slow;
format preserving encryption is slow; tokenization, masking and hashing are
fast, and tokenization and FPE preserve the data format.)
Homomorphic Encryption Market

(Charts: Homomorphic Encryption market size in USD million, per Market
Research Future; Encryption Software market size in USD billion, per Markets
and Markets Analysis.)

The global market for homomorphic encryption is estimated to reach USD 269
million by 2027.
Homomorphic Encryption Market Size (USD Million)
Source: Market Intellica

PHE (Partially Homomorphic Encryption) schemes are in general more efficient
than SHE and FHE, mainly because they are homomorphic w.r.t. only one type of
operation: addition or multiplication.

SHE (Somewhat Homomorphic Encryption) is more general than PHE in the sense
that it supports homomorphic operations with both additions and
multiplications, though only a bounded number of them.

Every SHE scheme is also a PHE scheme. This implies that PHEs are at least as
efficient as SHEs.

FHE (Fully Homomorphic Encryption) allows you to do an unbounded number of
both additions and multiplications.

SHE and PHE are not suitable for data sharing scenarios.
Homomorphic Encryption History

The Paillier cryptosystem is a probabilistic asymmetric algorithm with
additive homomorphic properties. This means that, given the ciphertexts of
two numbers, anyone can compute an encryption of the sum of these two
numbers.

The ElGamal encryption system is an asymmetric key encryption algorithm for
public-key cryptography which is based on the Diffie–Hellman key exchange.

Source: ResearchGate; Department of Software Engineering, University of
Engineering & Technology, Taxila, Pakistan

(Chart: process time of RSA, ElGamal and Paillier on file decryption.
Source: Medium.)
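The additive property described above can be demonstrated with a toy Paillier implementation. The tiny primes below are an assumption for readability, and a real deployment would use roughly 2048-bit primes and a vetted library, but the homomorphism itself (multiplying ciphertexts adds the underlying plaintexts) is the genuine Paillier mechanism:

```python
import math, random

def keygen(p=101, q=113):
    # Toy parameters only; real Paillier uses ~2048-bit primes.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                 # standard simplification for the generator
    mu = pow(lam, -1, n)      # with g = n+1, L(g^lam mod n^2) = lam mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    while True:
        r = random.randrange(1, n)          # fresh randomness per ciphertext
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    L = (pow(c, lam, n * n) - 1) // n       # the L(x) = (x-1)/n function
    return (L * mu) % n

pub, priv = keygen()
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
c_sum = (c1 * c2) % (pub[0] ** 2)   # multiplying ciphertexts adds plaintexts
print(decrypt(pub, priv, c_sum))    # 42
```

Note that the "sum" is computed without ever decrypting the two inputs, which is exactly the property the slide attributes to Paillier.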
Homomorphic Encryption

(Chart: 128-bit encryption performance of some algorithms; encryption time
for AES and RSA in milliseconds by file size. Source: Department of
Information Science & Engineering, BVBCET, Hubli, India.)

NTRU is an open-source public-key cryptosystem that uses lattice-based
cryptography to encrypt and decrypt data.
Homomorphic encryption algorithms & libraries

• BGV & BFV
• CKKS
• FHEW & TFHE

Libraries include Microsoft SEAL and libraries used by Duality.

SIMD (single instruction, multiple data) is a class of parallel computers.
• Microsoft SEAL is a
homomorphic encryption
library that allows additions
and multiplications to be
performed on encrypted
integers or real numbers.
• Other operations, such as
encrypted comparison, sorting,
or regular expressions, are in
most cases not feasible to
evaluate on encrypted data
using this technology.
Homomorphic encryption algorithms, operations & libraries

HE operation on encrypted data            BGV   BFV   CKKS
Addition                                  yes   yes   yes
Multiplication                            yes   yes   yes
Division                                  no    no    no
Exponentiation by an encrypted value      no    no    no
Non-polynomial operations                 no    no    no
Operates on integers only                 yes   yes   -
Complex numbers with limited precision    -     -     yes
Post-quantum cryptography research
1. Lattice-based cryptography
• This approach includes cryptographic systems such as learning with errors, ring learning with errors
(ring-LWE*), the ring learning with errors key exchange and the ring learning with errors signature.
2. Multivariate cryptography
3. Hash-based cryptography
4. Code-based cryptography
5. Supersingular elliptic curve isogeny cryptography
6. Symmetric key quantum resistance
*: Ring learning with errors (RLWE) is a computational problem which serves as the foundation of new cryptographic algorithms, such as NewHope,
designed to protect against cryptanalysis by quantum computers and also to provide the basis for homomorphic encryption.

Shortest vector problem (SVP)
In the SVP, a basis of a vector space V and a norm N (often L2) are given for
a lattice L, and one must find the shortest non-zero vector in V, as measured
by N, in L.
Lattice problems are examples of NP-hard problems, and worst-case to
average-case hardness reductions give confidence in the security of
cryptographic algorithms built on them.
How to code a HE application

1. Set context (plaintext modulus 65537, security level 128)
2. Key generation
3. Encrypt
4. Evaluate (add, multiply)
5. Decrypt result

Sample Program: Step 3 - Encryption

// First plaintext vector is encoded
std::vector<uint64_t> vectorOfInts1 = {1,2,3,4,5,6,7,8,9,10,11,12};
Plaintext plaintext1 = cryptoContext->MakePackedPlaintext(vectorOfInts1);
// Second plaintext vector is encoded
std::vector<uint64_t> vectorOfInts2 = {3,2,1,4,5,6,7,8,9,10,11,12};
Plaintext plaintext2 = cryptoContext->MakePackedPlaintext(vectorOfInts2);
// Third plaintext vector is encoded
std::vector<uint64_t> vectorOfInts3 = {1,2,5,2,5,6,7,8,9,10,11,12};
Plaintext plaintext3 = cryptoContext->MakePackedPlaintext(vectorOfInts3);

// The encoded vectors are encrypted
auto ciphertext1 = cryptoContext->Encrypt(keyPair.publicKey, plaintext1);
auto ciphertext2 = cryptoContext->Encrypt(keyPair.publicKey, plaintext2);
auto ciphertext3 = cryptoContext->Encrypt(keyPair.publicKey, plaintext3);

Sample Program: Step 4 - Evaluation

// Homomorphic additions
auto ciphertextAdd12 = cryptoContext->EvalAdd(ciphertext1, ciphertext2);
auto ciphertextAddResult = cryptoContext->EvalAdd(ciphertextAdd12, ciphertext3);
// Homomorphic multiplications
auto ciphertextMul12 = cryptoContext->EvalMult(ciphertext1, ciphertext2);
auto ciphertextMultResult = cryptoContext->EvalMult(ciphertextMul12, ciphertext3);

Sample Program: Step 5 - Decryption

// Decrypt the result of additions
Plaintext plaintextAddResult;
cryptoContext->Decrypt(keyPair.secretKey, ciphertextAddResult, &plaintextAddResult);
// Decrypt the result of multiplications
Plaintext plaintextMultResult;
cryptoContext->Decrypt(keyPair.secretKey, ciphertextMultResult, &plaintextMultResult);
// Output results
cout << plaintextAddResult << endl;
cout << plaintextMultResult << endl;
Source: Gartner
The Adoption of Homomorphic Encryption (HE) Remains Far From Mainstream
Apr 2020
Performance:
While massive improvements have been made in recent years, general Fully HE-based (FHE) processing remains 1,000 to
1,000,000 times slower than equivalent plaintext efforts.
• Consequently, the computational overhead remains too heavy for FHE in most general computing scenarios.
• However, there are plenty of opportunities to exploit FHE potential even in discrete use cases.
Lack of Standardization:
Like any early technology, HE efforts remain diverse and fragmented.
• A lack of standardization inhibits the consistency around which potential customers can rally to create economies of
scope and scale.
• For example, the HE community must continue to work to simplify and standardize APIs and SDKs.
Too Difficult for Typical IT Developers:
The overwhelming majority of historical HE research has come from elite corporate and academic cryptographic experts.
• As a result, most of the HE libraries remain too difficult for mainstream IT providers and customers to leverage without
intensive training.
• To succeed among mainstream end-user and TSP organizations, HE technology must be abstracted and simplified by
incorporating it into familiar developer languages, frameworks and platforms.
• Moreover, the IT industry must work with HE experts to train application developers on ways to incorporate it within
real-world practical solutions.
Source: Gartner
Important
Privacy-Preserving Computation
Techniques
https://royalsociety.org
Trusted Execution Environment (TEE)
TEEs are a secure area inside a processor.
• A Trusted Execution Environment (TEE) is a secure area inside a main
processor.
• TEEs are isolated from the rest of the system, so that the operating system
or hypervisor cannot read the code in the TEE.
• However, code running inside a TEE can access memory outside the TEE.
• TEEs can also protect data 'at rest', when it is not being analyzed,
through encryption.
• Like homomorphic encryption, TEEs might be used to securely outsource
computations on sensitive data to the cloud. Instead of a cryptographic
solution, TEEs offer a hardware-based way to ensure data and code cannot be
learnt by the server to which computation is outsourced.
• TEEs are a good place to store master cryptographic keys, for example.

(Diagram: applications run on an operating system, hypervisor and hardware;
the TEE holds protected code and data fields, with encryption key
management.)
https://www.eetimes.com/holy-grail-of-encryption-could-be-a-game-changer-for-ai/
SIMD-style, GPU-style computing, as well as FPGA-style computing
Single instruction, multiple
data (SIMD) is a class of
parallel computers with
multiple processing
elements.
FPGA (Field
Programmable Gate
Arrays) are
semiconductor devices
that are based around a
matrix of configurable
logic blocks (CLBs)
Example of secure collaboration on encrypted data in a hybrid environment:
linear computations run on a homomorphic encryption (HE) enabled device,
while non-linear computations run on trusted hardware in an unencrypted
domain, reducing the total cost.
Centre for Secure Information Technologies (CSIT), Queen’s University Belfast, Northern Ireland
Timings for multiplication
and small size encryption
with Fully homomorphic
encryption (FHE)
• FHE (Fully Homomorphic Encryption): Gentry, using lattice-based cryptography, described the first plausible
construction for a fully homomorphic encryption scheme.
• TSMC is the first foundry to provide 5-nanometer production capabilities, the most advanced semiconductor process
technology available in the world.
• GTX 16 series is a line of consumer graphics cards from NVIDIA.
• GPU (Graphics Processing Unit) is a specialized electronic circuit designed to rapidly manipulate and alter memory to
accelerate the creation of images.
• FPGA (Field Programmable Gate Arrays) are semiconductor devices that are based around a matrix of configurable
logic blocks.

Processor platform    Multiplication (milliseconds)    Encryption (seconds)
TSMC                  8                                2
GPU                   1                                2
GTX                   0.5                              0.01
Source: Gartner
Important
Privacy-Preserving Computation
Techniques
(Diagram: a policy enforcement point sits between an untrusted user or system
and trusted enterprise data. The data access policy is informed by enterprise
policies, industry compliance, threat intelligence, activity logs, a CDM
system, PKI, ID management and a SIEM system.)
A zero-trust architecture (US NIST, ANSI X9)
1. All data sources and computing services are considered resources.
2. All communication is secured regardless of network location.
3. Access to individual enterprise resources is granted on a per-session basis.
4. Access to resources is determined by dynamic policy—including the observable state of client identity, application, and
other behavioral attributes.
5. The enterprise monitors assets to ensure that they remain in the most secure state possible.
6. All resource authentication and authorization are dynamic and strictly enforced before access is allowed.
7. The enterprise collects as much information as possible about the current state of network infrastructure and communications, and uses it to improve its security posture.
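Tenets 3, 4 and 6 above, per-session access decided by dynamic policy over observable attributes, can be sketched as a minimal policy decision function. The attribute names and thresholds below are hypothetical; a real policy enforcement point would pull them from an identity provider, a device-posture service and the SIEM.

```python
# Hypothetical policy attributes for illustration only.
POLICY = {
    "min_auth_level": 2,                           # e.g. 2 = MFA completed
    "allowed_device_states": {"managed", "compliant"},
    "max_risk_score": 50,                          # from behavioral analytics
}

def authorize(request, policy=POLICY):
    # A fresh, per-session decision: identity strength, device posture and
    # observed risk must all satisfy the dynamic policy before access.
    return (request["auth_level"] >= policy["min_auth_level"]
            and request["device_state"] in policy["allowed_device_states"]
            and request["risk_score"] <= policy["max_risk_score"])

print(authorize({"auth_level": 2, "device_state": "managed", "risk_score": 10}))  # True
print(authorize({"auth_level": 1, "device_state": "managed", "risk_score": 10}))  # False
```

The key design point is that the decision is re-evaluated per session from current attributes rather than granted once and cached, matching the dynamic-policy tenet.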
Source: Gartner
Important
Privacy-Preserving Computation
Techniques
Source: Crypto Numerics
Private Set Intersection (PSI)
Identifies common customers without disclosing any other information
PSI replaces simplistic approaches, such as one-way hashing functions, that are susceptible to dictionary attacks.
• Applications for PSI include identifying the overlap with potential data partners (i.e., "Is there a large enough client
base in common to make working together worthwhile?"), as well as aligning datasets with data partners in preparation
for using MPC to train a machine learning model.
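A minimal sketch of the idea behind Diffie–Hellman-style PSI, assuming a toy prime-field group and a hash-into-the-group shortcut (real deployments use elliptic-curve groups, shuffling, and a vetted protocol). Because exponentiation commutes, both parties arrive at the same doubly masked value H(x)^(ab) for common items, and nothing else is revealed:

```python
import hashlib
import secrets

P = 2**127 - 1  # toy Mersenne prime modulus; real DH-PSI uses an EC group

def h(item):
    # Toy hash of an item into the multiplicative group mod P.
    return int.from_bytes(hashlib.sha256(str(item).encode()).digest(), "big") % P

def psi(set_a, set_b):
    a = secrets.randbelow(P - 2) + 1   # Alice's secret exponent
    b = secrets.randbelow(P - 2) + 1   # Bob's secret exponent
    # Round 1: each party masks its own items with its secret exponent.
    masked_a = {x: pow(h(x), a, P) for x in set_a}
    masked_b = [pow(h(y), b, P) for y in set_b]
    # Round 2: each side re-masks what it received from the other side.
    double_a = {x: pow(v, b, P) for x, v in masked_a.items()}  # H(x)^(a*b)
    double_b = {pow(v, a, P) for v in masked_b}                # H(y)^(b*a)
    # Matching doubly masked values identify the common items.
    return {x for x, v in double_a.items() if v in double_b}

print(sorted(psi({"alice", "bob", "carol"}, {"bob", "carol", "dave"})))  # ['bob', 'carol']
```

Unlike plain one-way hashing, the masked values depend on secret exponents, so an outsider cannot run a dictionary attack against them.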
Smaller HE Organizations / Vendors

(Feature matrix comparing 12 smaller homomorphic encryption organizations and
vendors, grouped by employee count: 200 to 50, 49 to 20, and 19 and smaller.
Featured functions include: HE standards (HomomorphicEncryption.org), COTS
products, AI models, federated learning, quantum-safe cryptography, fuzzy
search, local ML training, encrypted sort, encrypted query, private set
intersection, encryption proxies, two-person integrity, open-source
libraries, zero trust, secure execution environments, blockchain, and
differential privacy.)
TrustArc
Legal and regulatory risks are exploding
CCPA redefines "Personal information"
• CCPA states that "Personal information" means information that identifies, relates to, describes, is capable of
being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or
household.
PwC,
Micro Focus
Source: Microsoft
GDPR Security Requirements – Encryption and Tokenization

(Diagram: discover data assets; security by design; encryption and
tokenization.)
Homomorphic encryption cannot be used to enable data scientists to circumvent GDPR. For example, there is no way
for a cloud service to use homomorphic encryption to draw insights from encrypted customer data. Instead, the results of
encrypted computations remain encrypted and can only be decrypted by the owner of the data, not by, e.g., the cloud service.
Source: IBM
Find your sensitive data in the cloud and on-premises
www.protegrity.com
Source: Gartner
Important
Privacy-Preserving Computation
Techniques
https://royalsociety.org
Differential privacy
• Differential privacy comes with a mathematical proof (or guarantee).
• In differential privacy, the concern is privacy as the relative difference
in the result, whether a specific individual or entity is included in the
input or excluded.
• In contrast, 'anonymization' in data protection law is concerned with
removing any attributes associated with an individual that could make them
identifiable in a dataset.
• The two concepts can be brought together.

(Diagram: the real-world computation on the full input is compared with an
opt-out scenario that excludes one individual; differential privacy bounds
the difference between the two outputs. A curator or cleanser filter can
protect the input or output of the analysis. Apple, for example, uses a
curator model.)

INTERNATIONAL STANDARD ISO/IEC 20889
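The "relative difference" guarantee above is usually realized by adding calibrated noise. A minimal sketch of the Laplace mechanism for a counting query, whose sensitivity is 1 because adding or removing one individual changes the count by at most 1; the dataset and epsilon below are illustrative assumptions:

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of Laplace(0, scale) noise.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    # Sensitivity of a count is 1, so Laplace noise with scale 1/epsilon
    # yields epsilon-differential privacy for this query.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

ages = [23, 35, 45, 52, 61, 29, 41]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5)
print(round(noisy))  # close to the true count of 4, but randomized
```

Smaller epsilon means larger noise and stronger privacy; the analyst trades accuracy for a tighter bound on any individual's influence on the output.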
Six Differential Privacy Models

• Approximate differential privacy: more useful analysis can be performed;
well-studied; widely used.
• Random differential privacy: noise is very low; used in practice.
• Probabilistic differential privacy: can lead to unlikely outputs.
• Concentrated differential privacy: tailored to large numbers of
computations.
• Computational differential privacy: protection holds against
computationally bounded attackers.
• Multiparty differential privacy: aggregation is performed locally; can
ensure the privacy of individual contributions; strong degree of protection;
high accuracy.

A pure model provides protection even against attackers with unlimited
computational power.
K-anonymity model

The k-anonymity model ensures that groups smaller than k individuals cannot
be identified.
• Queries will return at least k records. K-anonymity is a formal privacy
measurement model that ensures that for each identifier there is a
corresponding equivalence class containing at least k records.

(Diagram: clear text data passes through a cleanser filter into the
database.)

Source: INTERNATIONAL STANDARD ISO/IEC 20889

Some of the de-identification techniques can be used either independently or
in combination with each other to satisfy the k-anonymity model.
Suppression techniques, generalization techniques, and microaggregation* can
be applied to different types of attributes in a dataset to achieve the
desired results.

*: Microaggregation replaces all values of continuous attributes with their
averages, computed in a certain algorithmic way.
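The definition above, that every equivalence class of quasi-identifier values contains at least k records, is easy to check mechanically. A small sketch with hypothetical generalized records:

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    # Group records by their quasi-identifier values; every equivalence
    # class must contain at least k records.
    classes = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in classes.values())

# Hypothetical records, already generalized (zip truncated, age bucketed).
records = [
    {"zip": "021**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-30", "diagnosis": "cold"},
    {"zip": "021**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "148**", "age": "40-50", "diagnosis": "asthma"},
    {"zip": "148**", "age": "40-50", "diagnosis": "flu"},
]
print(is_k_anonymous(records, ["zip", "age"], 2))  # True
print(is_k_anonymous(records, ["zip", "age"], 3))  # False
```

If a class falls below k, further suppression or generalization is applied until every class reaches the threshold.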
Enhancements to the concept of K-anonymity
Source: INTERNATIONAL STANDARD ISO/IEC 20889
L-diversity
1. L-diversity is an enhancement to K-anonymity for datasets with poor attribute variability.
2. It is designed to protect against deterministic inference attempts by ensuring that each equivalence class has at least L
well-represented values for each sensitive attribute.
3. L-diversity is not a single model but a group of models.
• Each model defines diversity slightly differently, e.g. by counting distinct values or by entropy.
4. L-diversity can be difficult to achieve and can cause significant loss in data utility due to the implicit assumption of how
values for each sensitive attribute are distributed.
5. Its ability to protect against inferences is also limited when data values are unevenly distributed.
6. This variant of K-anonymity is subject to attacks, which have led to the development of M-invariance and T-closeness.
T-closeness
• T-closeness is an enhancement to L-diversity for datasets with attributes that are unevenly distributed, belong to a small
range of values, or are categorical.
• It is a group-based anonymization technique used to preserve privacy in data sets by reducing the granularity of the data
representation. This reduction is a trade-off that results in some loss of effectiveness of data management or data mining
algorithms in order to gain some privacy.
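Distinct L-diversity, the simplest member of the group of models described above, can be checked much like k-anonymity: group by quasi-identifiers and count distinct sensitive values per equivalence class. A sketch with hypothetical records:

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_ids, sensitive, l):
    # Distinct L-diversity: every equivalence class (same quasi-identifier
    # values) must contain at least l distinct sensitive values.
    classes = defaultdict(set)
    for row in rows:
        classes[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return all(len(values) >= l for values in classes.values())

# Hypothetical records: the second class has only one diagnosis value,
# so an attacker who locates someone in it learns their diagnosis.
records = [
    {"zip": "021**", "age": "20-30", "diagnosis": "flu"},
    {"zip": "021**", "age": "20-30", "diagnosis": "cold"},
    {"zip": "148**", "age": "40-50", "diagnosis": "flu"},
    {"zip": "148**", "age": "40-50", "diagnosis": "flu"},
]
print(is_l_diverse(records, ["zip", "age"], "diagnosis", 2))  # False
```

The example shows why k-anonymity alone is not enough: the second class is 2-anonymous yet still leaks the sensitive attribute deterministically.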
Use-cases of some data privacy techniques

Shared responsibilities across cloud service models: Data Protection for
Multi-cloud

• Payment application (gateway, payment network, payment data): policy,
tokenization, encryption and keys
• Call center application (PI* data): tokenization
• Salesforce analytics application (PI* data): Differential Privacy (DP),
K-anonymity model
• Voting application, Microsoft ElectionGuard development kit (election
data): Homomorphic Encryption (HE)
• Data warehouse (PI* data): vault-less tokenization (VLT)
• Dev/test systems (PI* data): masking

*: PI Data (Personal information) means information that identifies, relates
to, describes, is capable of being associated with, or could reasonably be
linked, directly or indirectly, with a consumer or household, according to
CCPA.
Risk reduction of standardized de-identification techniques
Source: INTERNATIONAL STANDARD ISO/IEC 20889

Legend: T/U/S = data protected in Transit / Use / Storage; Truthful = data
truthfulness at record level; S/L/I = reduces the risk of Singling out /
Linking / Inference; Y = Yes, N = No, P = Partially.

Pseudonymization
  Tokenization          | Protects the data flow from attacks            | T/U/S: Y/Y/Y | Truthful: Y | Direct identifiers     | S/L/I: N/P/N
Cryptographic tools
  Deterministic encr.   | Protects the data when not used in processing  | T/U/S: Y/N/Y | Truthful: Y | All attributes         | S/L/I: N/P/N
  Order-preserving encr.| Protects the data from attacks                 | T/U/S: P/P/P | Truthful: Y | All attributes         | S/L/I: N/P/N
  Homomorphic encr.     | Protects the data also when used in processing | T/U/S: Y/Y/Y | Truthful: Y | All attributes         | S/L/I: N/N/N
Suppression
  Masking               | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: Y | Local identifiers      | S/L/I: Y/P/N
  Local suppression     | Analytical applications                        | T/U/S: Y/Y/Y | Truthful: Y | Identifying attributes | S/L/I: P/P/P
  Record suppression    | Removes the data from the data set             | T/U/S: Y/Y/Y | Truthful: Y | All attributes         | S/L/I: Y/Y/Y
  Sampling              | Exposes only a subset for analytics            | T/U/S: P/P/P | Truthful: Y | All attributes         | S/L/I: P/P/P
Generalization
  Generalization        | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: Y | Identifying attributes | S/L/I: P/P/P
  Rounding              | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: Y | Identifying attributes | S/L/I: N/P/P
  Top/bottom coding     | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: Y | Identifying attributes | S/L/I: N/P/P
Randomization
  Noise addition        | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: N | Identifying attributes | S/L/I: P/P/P
  Permutation           | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: N | Identifying attributes | S/L/I: P/P/P
  Micro aggregation     | Dev/test and analytical applications           | T/U/S: Y/Y/Y | Truthful: N | All attributes         | S/L/I: N/P/P
Privacy models
  Differential privacy  | Analytical applications                        | T/U/S: N/Y/Y | Truthful: N | Identifying attributes | S/L/I: Y/Y/P
  K-anonymity           | Analytical applications                        | T/U/S: N/Y/Y | Truthful: Y | Quasi-identifiers      | S/L/I: Y/P/N
Reduce risk by field-level protection

Reduce risk by not exposing the full data value to applications and users
that only need to operate on a limited representation of the data.

(Diagram: example of the data values stored in the file at rest versus what a
given role, for example Role 2, can see.)
(Chart: risk and user productivity both rise with access to more data sources
and fields, from low to high.)
Example of Cross-Border Data-centric Security using Tokenization

• Protecting Personally Identifiable Information (PII), including names,
addresses, phone, email, policy and account numbers
• Compliance with EU cross-border data protection laws
• Utilizing data tokenization and centralized policy, key management,
auditing, and reporting

(Diagram: data flows from data sources, where data should be protected, into
the data warehouse, with complete policy-enforced de-identification of
sensitive data across all bank entities.)
Data protection techniques: deployment on-premises and in clouds

(Matrix: techniques versus deployment models, covering centralized,
distributed, on-premises, public cloud and private cloud. Vault-less
tokenization, masking, hashing, differential privacy in both server and local
models, L-diversity and T-closeness apply across all of these deployment
models; format preserving encryption applies in most; vault-based
tokenization and homomorphic encryption apply in only a few.)

Privacy-enhancing data de-identification: terminology and classification of
techniques

• De-identification techniques: tokenization, cryptographic tools,
suppression techniques
• Formal privacy measurement models: Differential Privacy, K-anonymity model
Source: Gartner; Netskope

(Charts: current use or planned use; spending by deployment model, digital
commerce platforms, worldwide.)
Cloud-based searchable encryption systems

Build-Index searchable encryption systems commonly utilize an index structure
to keep track of occurrences of keywords in documents and insert them into
the index structure.
This Build-Index process is used by the data owner to generate a secure and
searchable structure that enables search over the encrypted data.
• An index structure is generally implemented in the form of a hash table,
metadata (markup), or an inverted index where each unique keyword is mapped
to documents.

High-Performing Cloud Computing (HPCC) Laboratory, School of Computing and
Informatics, University of Louisiana; and IXUP
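The Build-Index process above can be sketched as a keyed inverted index: the data owner derives a deterministic token (a trapdoor) per keyword, so the server can match search queries without ever seeing keywords in the clear. This is a toy sketch; real searchable-encryption schemes also hide access and search patterns, and the key value below is hypothetical.

```python
import hashlib
import hmac

def trapdoor(key, keyword):
    # Deterministic keyword token; only the key holder can derive it.
    return hmac.new(key, keyword.encode(), hashlib.sha256).hexdigest()

def build_index(key, docs):
    # docs: {doc_id: [keywords]} -> inverted index keyed by trapdoor tokens.
    index = {}
    for doc_id, words in docs.items():
        for w in set(words):
            index.setdefault(trapdoor(key, w), []).append(doc_id)
    return index

def search(index, key, keyword):
    # The server only ever sees the opaque token, never the keyword.
    return index.get(trapdoor(key, keyword), [])

key = b"data-owner-secret"   # hypothetical key for illustration
index = build_index(key, {"d1": ["invoice", "acme"], "d2": ["invoice"]})
print(sorted(search(index, key, "invoice")))  # ['d1', 'd2']
```

The server hosting `index` learns which (opaque) token maps to which documents, but without the key it cannot run the dictionary attacks that plain one-way hashing would allow.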
Legal Compliance and Nation-State Attacks
• Many companies have information that is attractive to governments and intelligence services.
• Others worry that litigation may result in a subpoena for all their data.
Securosis, 2019
Multi-Cloud Key Management considerations
Jurisdiction
• Cloud service providers,
especially IaaS vendors, offer
services in multiple countries,
often in more than one
region, with redundant data
centers.
• This redundancy is great for resilience, but regulatory concerns arise when
moving data across regions which may have different laws and jurisdictions.
SecuPi
Amazon S3 client-side encryption

• Amazon S3 encryption and decryption takes place in the EMRFS client on your cluster.
• Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded.
• The EMR (Elastic MapReduce) File System (EMRFS) is an implementation of HDFS that all Amazon EMR clusters use for
reading and writing regular files from Amazon EMR directly to Amazon S3.
AWS encryption key management
Multi-Cloud Key Management

(Chart: deployment models from in-house on-premises through private cloud and
hosted private cloud to out-sourced public cloud, plotted against risk,
elasticity and computing cost, with computing cost falling from high to low
as out-sourcing increases. Encryption/decryption points and encryption key
management stay with the protected data.)
A Data Security Gateway can protect sensitive data in the cloud and on-premises

(Diagram: a gateway applies policy enforcement and encryption key management
to protect data flowing between on-premises systems and the cloud.)
Big Data Protection with Granular Field Level Protection for Google Cloud

Protection throughout the lifecycle of data in Hadoop: tokenize or encrypt
sensitive data fields before they reach the cluster.

(Diagram: data producers feed big data analytics on Google Cloud. Enterprise
privacy policies may be managed on-premises or in the cloud platform; Policy
Enforcement Points (PEP) protect data fields, with separation of duties and
encryption key management, before the data reaches data users.)
Protection of data in AWS S3 with Separation of Duties

Protect data before landing: sensitive data streams from the enterprise
on-prem environment are protected before landing in an S3 bucket, and data
lifted to S3 is protected before use.

• Applications can use de-identified data or data in the clear based on
policies
• Protection of data in AWS S3 before landing in an S3 bucket

(Diagram: enterprise policies drive Policy Enforcement Points (PEP) with
separation of duties and encryption key management; applications consume the
de-identified data.)
Securosis, 2019
Consistency
• Most firms are quite familiar with their
on-premises encryption and key
management systems, so they often
prefer to leverage the same tool and skills
across multiple clouds.
• Firms often adopt a “best of breed” cloud
approach.
Examples of Hybrid Cloud considerations
Trust
• Some customers simply do not trust
their vendors.
Vendor Lock-in and Migration
• A common concern is vendor
lock-in, and an inability to
migrate to another cloud
service provider.
Cloud Gateway
Google Cloud AWS Cloud Azure Cloud
S3
Salesforce
60
References:
1. California Consumer Privacy Act, Oct 4, 2019, https://www.csoonline.com/article/3182578/california-consumer-privacy-act-what-you-need-to-know-to-be-compliant.html
2. CIS Controls V7.1 Mapping to NIST CSF, https://dataprivacylab.org/projects/identifiability/paper1.pdf
3. GDPR and Tokenizing Data, https://tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx
4. GDPR vs CCPA, https://wirewheel.io/wp-content/uploads/2018/10/GDPR-vs-CCPA-Cheatsheet.pdf
5. General Data Protection Regulation, https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
6. IBM Framework Helps Clients Prepare for the EU's General Data Protection Regulation, https://ibmsystemsmag.com/IBM-Z/03/2018/ibm-framework-gdpr
7. INTERNATIONAL STANDARD ISO/IEC 20889, https://webstore.ansi.org/Standards/ISO/ISOIEC208892018
8. INTERNATIONAL STANDARD ISO/IEC 27018, https://webstore.ansi.org/Standards/ISO/ISOIEC270182019
9. New Enterprise Application and Data Security Challenges and Solutions, https://www.brighttalk.com/webinar/new-enterprise-application-and-data-security-challenges-and-solutions/
10. Machine Learning and AI in a Brave New Cloud World, https://www.brighttalk.com/webcast/14723/357660/machine-learning-and-ai-in-a-brave-new-cloud-world
11. Emerging Data Privacy and Security for Cloud, https://www.brighttalk.com/webinar/emerging-data-privacy-and-security-for-cloud/
12. New Application and Data Protection Strategies, https://www.brighttalk.com/webinar/new-application-and-data-protection-strategies-2/
13. The Day When 3rd Party Security Providers Disappear into Cloud, https://www.brighttalk.com/webinar/the-day-when-3rd-party-security-providers-disappear-into-cloud/
14. Advanced PII/PI Data Discovery, https://www.brighttalk.com/webinar/advanced-pii-pi-data-discovery/
15. Emerging Application and Data Protection for Cloud, https://www.brighttalk.com/webinar/emerging-application-and-data-protection-for-cloud/
16. Data Security: On Premise or in the Cloud, ISSA Journal, December 2019, ulf@ulfmattsson.com
17. Webinars and slides, www.ulfmattsson.com
61
Ulf Mattsson
Chief Security Strategist
www.Protegrity.com
Thank You!

Privacy preserving computing and secure multi party computation

  • 1.
    1 bipartisanpolicy Ulf Mattsson Chief SecurityStrategist www.Protegrity.com Privacy-preserving Computing and Secure Multi Party Computation
  • 2.
    2 Tokenization Management and Security CloudManagement and Security Payment Card Industry (PCI) Security Standards Council (SSC): 1. Tokenization Task Force 2. Encryption Task Force, Point to Point Encryption Task Force 3. Risk Assessment SIG 4. eCommerce SIG 5. Cloud SIG, Virtualization SIG 6. Pre-Authorization SIG, Scoping SIG Working Group • Chief Security Strategist at Protegrity, previously Head of Innovation at TokenEx and Chief Technology Officer at Atlantic BT, Compliance Engineering, and IT Architect at IBM Ulf Mattsson ULFMATTSSON.COM • Products and Services: • Data Encryption, Tokenization, Data Discovery, Cloud Application Security Brokers (CASB), Web Application Firewalls (WAF), Robotics, and Applications • Security Operation Center (SOC), Managed Security Services (MSSP) • Inventor of more than 70 issued US Patents and developed Industry Standards with ANSI X9 and PCI SSC May 2020 Dec 2019 May 2020
  • 3.
  • 4.
    4 Increased need fordata analytics drives requirements. Data Lake, ETL, Files … • Policy Enforcement Point (PEP)Protected data fields U • Encryption Key Management U External Data Internal Data Secure Multi Party Computation Analytics, Data Science, AI and ML Data Pipeline Data Collaboration Data Pipeline Data Privacy On-premises Cloud Opportunities Internal and Individual Third-Party Data Sharing
  • 5.
    5 http://homomorphicencryption.org Use Cases forSecure Multi Party Computation & Homomorphic Encryption (HE)
  • 6.
    6 Use case -Financial services industry Confidential financial datasets which are vital for gaining significant insights. • The use of this data requires navigating a minefield of private client information as well as sharing data between independent financial institutions, to create a statistically significant dataset. • Data privacy regulations such as CCPA, GDPR and other emerging regulations around the world • Data residency controls as well as enable data sharing in a secure and private fashion. Reduce and remove the legal, risk and compliance processes • Collaboration across divisions, other organizations and across jurisdictions where data cannot be relocated or shared • Generating privacy respectful datasets with higher analytical value for Data Science and Analytics applications.
  • 7.
    7 Use case –Retail - Data for Secondary Purposes Large aggregator of credit card transaction data. Open a new revenue stream • Using its data with its business partners: retailers, banks and advertising companies. • They could help their partners achieve better ad conversion rate, improved customer satisfaction, and more timely offerings. • Needed to respect user privacy and specific regulations. In this specific case, they wanted to work with a retailer. • Allow the retailer to gain insights while protecting user privacy, and the credit card organization’s IP. • An analyst at each organization’s office first used the software to link the data without exchanging any of the underlying data. Data used to train the machine learning and statistical models. • In this specific use-case, a logistic and linear regression model was trained using secure multi-party computation (SMC). • In the simplest form SMC splits a dataset into secret shares and enables you to train a model without needing to put together the pieces. • The information that is communicated between the peers is encrypted at all times and cannot be reverse engineered. • The resultant machine learning model coefficients (output of the training) were only shared with the partner identified as the receiver of such information. With the augmented dataset, the retailer was able to get a better picture of its customers buying habits.
  • 8.
    8 Use case: Bank- Internal Data Usage by Other Units A large bank wanted to broaden access to its data lake without compromising data privacy, preserving the data’s analytical value, and at reasonable infrastructure costs. • Current approaches to de-identify data did not fulfill the compliance requirements and business needs, which had led to several bank projects being stopped. • The issue with these techniques, like masking, tokenization, and aggregation, was that they did not sufficiently protect the data without overly degrading data quality. This approach allows creating privacy protected datasets that retain their analytical value for Data Science and business applications. A plug-in to the organization’s analytical pipeline to enforce the compliance policies before the data was consumed by data science and business teams from the data lake. • The analytical quality of the data was preserved for machine learning purposes by-using AI and leveraging privacy models like differential privacy and k-anonymity. Improved data access for teams increased the business’ bottom line without adding excessive infrastructure costs, while reducing the risk of-consumer information exposure.
  • 9.
  • 10.
    10 https://royalsociety.org Secure Multi-Party Computation(MPC) Private multi-party machine learning with MPC Using MPC, different parties send encrypted messages to each other, and obtain the model F(A,B,C) they wanted to compute without revealing their own private input, and without the need for a trusted central authority. Secure Multi-Party machine learningCentral trusted authority A B C F(A, B,C) F(A, B,C) F(A, B,C) Protected data fields U B A C F(A, B,C) U U U
  • 11.
    11 Medium.com Example of Multi-partyComputation: Average Salary #1
  • 12.
    12 Inpher Example of Multi-partyComputation: Average Salary #2 Say Allie’s salary is $100k. In additive secret sharing, $100k is split into three randomly-generated pieces (or “secret shares”): $20k, $30k, and $50k for example. • Allie keeps one of these secret shares ($50k) for herself, and distributes one secret share to each Brian ($30k) and Caroline ($20k). • Brian and Caroline also secret-share their salaries while following the same process. Each participant locally sums their secret shares to calculate a partial result; in our example, each partial result is one third of the necessary information to calculate the final answer. • The partial results are then recombined, summing the complete set of secret shares previously distributed. • Allie, Brian, and Caroline’s average salary is $200k.
  • 13.
    13Source: Gartner Six ImportantPrivacy-Preserving Computation Techniques April 2020
  • 14.
    14 https://royalsociety.org Homomorphic encryption (HE) HEdepicted in a client-server model • The client sends encrypted data to a server, where a specific analysis is performed on the encrypted data, without decrypting that data. • The encrypted result is then sent to the client, who can decrypt it to obtain the result of the analysis they wished to outsource. Encryption of x Client Server Analysis Encrypted F(x) • Policy Enforcement Point (PEP) Protected data fields U • Encryption Key Management
  • 15.
    Differential Privacy (DP) 2-way Format Preserving Encryption (FPE) Homomorphic Encryption (HE) K-anonymity model Tokenization MaskingHashing 1-way Privacy-Preserving Computation (PPC) Analytics Machine Learning(ML) Secure Multi Party Computation (SMPC) Considerations when using different de-identification techniques and formal privacy measurement models Algorithmic Random Noise added Computing on encrypted data Format Preserving Fast Slow Very slow Fast Fast Format Preserving
  • 16.
    16 Homomorphic Encryption Market Size(USD Million) Encryption Software Market Size (USD billion) Markets and Markets Analysis Market Research Future Homomorphic Encryption Market The Global Market for Homomorphic Encryption is estimated to reach USD 269 million by 2027 Encryption Market
  • 17.
    17 Homomorphic Encryption MarketSize (USD Million) PHE (Partially Homomorphic Encryption) schemes are in general more efficient than SHE and FHE, mainly because they are homomorphic w.r.t to only one type of operation: addition or multiplication. SHE (Somewhat Homomorphic Encryption) is more general than PHE in the sense that it supports homomorphic operations with additions and multiplications. SHE scheme is also a PHE. This implies that PHE's are at least as efficient as SHE's. FHE (Fully Homomorphic Encryption) allows you to do an unbounded number of operations Source: Market Intellica SHE and PHE are not suitable for data sharing scenarios.
  • 18.
    18 Homomorphic Encryption History Paillier cryptosystem aprobabilistic asymmetric algorithm with additive homomorphic properties. This means that given the ciphertexts of two numbers, anyone can compute an encryption of the sum of these two numbers. El-Gamal encryption system is an asymmetric key encryption algorithm for public-key cryptography which is based on the Diffie– Hellman keys Source: Research Gate, Department of Software Engineering, University of Engineering & Technology, Taxila, Pakistan Source: Medium Process time of RSA, ElGamal and Paillier on File Decryption
  • 19.
    19 128 bit Encryption performance of somealgorithms Homomorphic Encryption Encryption Time for AES and RSA (milliseconds) by file size Department of Information Science & Engineering , BVBCET , Hubli 5800031,India NTRU is an open source public-key cryptosystem that uses lattice-based cryptography to encrypt and decrypt data.
  • 20.
    20 BGV & BFV CKKS FHEW& TFHE Microsoft Used by Duality Homomorphic encryption algorithms & libraries SIMD: Single instruction, multiple data is a class of parallel computers
  • 21.
    21 • Microsoft SEALis a homomorphic encryption library that allows additions and multiplications to be performed on encrypted integers or real numbers. • Other operations, such as encrypted comparison, sorting, or regular expressions, are in most cases not feasible to evaluate on encrypted data using this technology. Microsoft Used by Duality Homomorphic encryption algorithms, operations & libraries HE operation encrypted data BGV BFV CKKS Addition y y y Multiplication y y y Division n n n No exponentiating a number by an encrypted one n n n No non-polynomial operations n n n Only be performed on integers y y Complex numbers with limited precision y
  • 22.
    22 Post-quantum cryptography research 1.Lattice-based cryptography • This approach includes cryptographic systems such as learning with errors, ring learning with errors (ring-LWE*), the ring learning with errors key exchange and the ring learning with errors signature. 2. Multivariate cryptography 3. Hash-based cryptography 4. Code-based cryptography 5. Supersingular elliptic curve isogeny cryptography 6. Symmetric key quantum resistance *: Ring learning with errors (RLWE) is a computational problem which serves as the foundation of new cryptographic algorithms, such as NewHope, designed to protect against cryptanalysis by quantum computers and also to provide the basis for homomorphic encryption. Shortest vector problem (SVP) In the SVP, a basis of a vector space V and a norm N (often L2) are given for a lattice L and one must find the shortest non-zero vector in V, as measured by N, in L. Lattice problems are an example of NP-hard problems which have been shown to be average-case hard, providing a test case for the security of cryptographic algorithms
  • 23.
    23 // Decrypt theresult of additions Plaintext plaintextAddResult; cryptoContext->Decrypt(keyPair.secretKey, ciphertextAddResult, &plaintextAddResult); // Decrypt the result of multiplications Plaintext plaintextMultResult; cryptoContext->Decrypt(keyPair.secretKey, ciphertextMultResult, &plaintextMultResult); // Output results cout << plaintextAddResult << endl; cout << plaintextMultResult << endl; Sample Program: Step 3 – Encryption // First plaintext vector is encoded std::vector<uint64_t> vectorOfInts1 = {1,2,3,4,5,6,7,8,9,10,11,12}; Plaintext plaintext1 = cryptoContext- >MakePackedPlaintext(vectorOfInts1); // Second plaintext vector is encoded std::vector<uint64_t> vectorOfInts2 = {3,2,1,4,5,6,7,8,9,10,11,12}; Plaintext plaintext2 = cryptoContext- >MakePackedPlaintext(vectorOfInts2); // Third plaintext vector is encoded std::vector<uint64_t> vectorOfInts3 = {1,2,5,2,5,6,7,8,9,10,11,12}; Plaintext plaintext3 = cryptoContext- >MakePackedPlaintext(vectorOfInts3); // The encoded vectors are encrypted auto ciphertext1 = cryptoContext- >Encrypt(keyPair.publicKey, plaintext1); auto ciphertext2 = cryptoContext- >Encrypt(keyPair.publicKey, plaintext2); auto ciphertext3 = cryptoContext- >Encrypt(keyPair.publicKey, plaintext3); Sample Program: Step 4 – Evaluation // Homomorphic additions auto ciphertextAdd12 = cryptoContext- >EvalAdd(ciphertext1,ciphertext2); auto ciphertextAddResult = cryptoContext- >EvalAdd(ciphertextAdd12,ciphertext3); // Homomorphic multiplications auto ciphertextMul12 = cryptoContext- >EvalMult(ciphertext1,ciphertext2); auto ciphertextMultResult = cryptoContext- >EvalMult(ciphertextMul12,ciphertext3); 3. Encrypt 1. Set Context Modulus 65537 Security Level 128 5. Decrypt Result4. Add How to code a HE application
  • 24.
    24Source: Gartner The Adoptionof Homomorphic Encryption (HE) Remains Far From Mainstream Apr 2020 Performance: While massive improvements have been made in recent years, general Fully HE-based (FHE) processing remains 1,000 to 1,000,000 times slower than equivalent plaintext efforts. • Consequently, the computational overhead remains too heavy for FHE in most general computing scenarios. • However, there are plenty of opportunities to exploit FHE potential even in discrete use cases. Lack of Standardization: Like any early technology, HE efforts remain diverse and fragmented. • A lack of standardization inhibits consistency wherein potential customers can rally around to create an economy of scope and scale. • For example, the HE community must continue to work to simplify and standardize APIs and SDKs. Too Difficult for Typical IT Developers: The overwhelming historical HE research has come from elite corporate and academic cryptographic experts. • As a result, most of the HE libraries remain too difficult for mainstream IT providers and customers to leverage without intensive training. • To succeed among mainstream end-user and TSP organizations, HE technology must be abstracted and simplified by incorporating it into familiar developer languages, frameworks and platforms. • Moreover, the IT industry must work with HE experts to train application developers on ways to incorporate it within real-world practical solutions.
  • 25.
  • 26.
    26https://royalsociety.org/ Trusted Execution Environment(TEE) TEEs are a secure area inside a processor. • A Trusted Execution Environment (TEE) is a secure area inside a main processor. • TEEs are isolated from the rest of the system, so that the operating system or hypervisor cannot read the code in the TEE. • However, TEEs can access memory outside. • TEEs can also protect data ‘at rest’, when it is not being analyzed, through encryption. • Like homomorphic encryption, TEEs might be used to securely outsource computations on sensitive data to the cloud. Instead of a cryptographic solution, TEEs offer a hardware based way to ensure data and code cannot be learnt by a server to which computation is outsourced. • TEEs are a good place to store master cryptographic keys, for example. • Encryption Key Management Application Application Operating system Hypervisor Hardware Code Data Protected data fields U U
  • 27.
    27https://www.eetimes.com/holy-grail-of-encryption-could-be-a-game-changer-for-ai/ SIMD-style, GPU-style computing,as well as FPGA-style computing Single instruction, multiple data (SIMD) is a class of parallel computers with multiple processing elements. FPGA (Field Programmable Gate Arrays) are semiconductor devices that are based around a matrix of configurable logic blocks (CLBs) Example of Secure collaborations on encrypted data in a hybrid environment Linear computations Homomorphic encryption (HE) enabled device Non-linear computations Trusted hardware in an unencrypted domain Total cost
  • 28.
    28 Centre for SecureInformation Technologies (CSIT), Queen’s University Belfast, Northern Ireland Timings for multiplication and small size encryption with Fully homomorphic encryption (FHE) • FHE (Fully Homomorphic Encryption): Gentry, using lattice-based cryptography, described the first plausible construction for a fully homomorphic encryption scheme. • TSMC is the first foundry to provide 5-nanometer production capabilities, the most advanced semiconductor process technology available in the world. • GTX 16 SERIES. The Gaming Supercharger. Experience the breakthrough graphics performance. • GPU (Graphics Processing Unit) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images. • FPGA (Field Programmable Gate Arrays) are semiconductor devices that are based around a matrix of configurable logic blocks Processor platform Multiplication (milliseconds) Encryption (seconds) TSMC 8 2 GPU 1 2 GTX 0.5 0.01
  • 29.
  • 30.
    30 System Untrusted Trusted Policy Enforcement Point Data Enterprise Policies User CDM System Industry Compliance Threat Intelligence Activity Logs DataAccess Policy PKI ID Management SIEM System A zero-trust architecture (US NIST, ANSI X9) 1. All data sources and computing services are considered resources. 2. All communication is secured regardless of network location. 3. Access to individual enterprise resources is granted on a per-session basis. 4. Access to resources is determined by dynamic policy—including the observable state of client identity, application, and other behavioral attributes. 5. The enterprise monitors assets to ensure that they remain in the most secure state possible. 6. All resource authentication and authorization are dynamic and strictly enforced before access is allowed. 7. The enterprise collects the current state of network infrastructure and communications.
  • 31.
  • 32.
    32Crypto Numerics Private SetIntersection (PSI) Identifies common customers without disclosing any other information PSI replaces simplistic approaches such as one-way hashing functions that are susceptible to dictionary attacks. • Applications for PSI include identifying the overlap with potential data partners (i.e. “Is there a large enough client base in common to be worthwhile to work together), as well as aligning datasets with data partners in preparation for using MPC to train a machine learning model.
  • 33.
    33 Organization 1 23 4 5 6 7 8 9 10 11 12 # Employees HE Standards HE.org HE.org HE.org COTS COTS COTS COTS Models AI AI Federated Learning Quantum Safe Fuzzy Search Local ML Training Enc Sort Encrypted Query Private Set Intersection Enc Sort Encrypted Query Encr Proxy Encr Proxy Two-Person Integrity Lib Open Source Lib Zero Trust Secure Exec Env Block Chain Differential Priv Differential Priv Featured Functions Smaller HE Organizations / vendors 200 to 50 49 to 20 19 and smaller HomomorphicEncryption.org
  • 34.
  • 35.
    35 CCPA redefines ”Personalinformation” • CCPA states that ”Personal information” means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a particular consumer or household PwC, Micro Focus
  • 36.
    36Source: MS Encryption* and Tokenization Discover DataAssets Security by Design GDPR Security Requirements – Encryption and Tokenization Homomorphic encryption cannot be used to enable data scientist to circumvent GDPR. For example, there is no way for a cloud service to use homomorphic encryption to draw insights from encrypted customer data. Instead, results of encrypted computations remain encrypted and can only be decrypted by the owner of the data, e.g., a cloud service Source: IBM
  • 37.
    37 Find your sensitivedata in cloud and on premise www.protegrity.com
  • 38.
  • 39.
    39https://royalsociety.org/ Differential privacy • Thetwo concepts can be brought together • Differential privacy comes with a mathematical proof (or guarantee) • In differential privacy, the concern is about privacy as the relative difference in the result whether a specific individual or entity is included in the input or excluded • In contrast, ‘anonymization’ in data protection law is concerned about removing any attributes associated with an individual that could make them identifiable in a dataset Analysis / computation Opt-out scenario Difference Real world computation Input Output Analysis / computation Input Output Protected Curator* (Filter) Output Cleanser (Filter) Input Protected Database INTERNATIONAL STANDARD ISO/IEC 20889 *: Example: Apple
  • 40.
    40 Random differential privacy Probabilistic differential privacy Concentrated differential privacy Noise is verylow. Used in practice. Tailored to large numbers of computations. Approximate differential privacy More useful analysis can be performed. Well-studied. Can lead to unlikely outputs. Widely used Computational differential privacy Multiparty differential privacy Can ensure the privacy of individual contributions. Aggregation is performed locally. Strong degree of protection. High accuracy 6 Differential Privacy Models A pure model provides protection even against attackers with unlimited computational power.
  • 41.
    41 Clear text data Cleanser Filter Database K-anonymity model Thek-anonymity model that ensures that groups smaller than k individuals cannot be identified. • Queries will return at least k number of records. K- anonymity is a formal privacy measurement model that ensures that for each identifier there is a corresponding equivalence class containing at least K records. Source: INTERNATIONAL STANDARD ISO/IEC 20889 Some of the de-identification techniques can be used either independently or in combination with each other to satisfy the K- anonymity model. Suppression techniques, generalization techniques, and microaggregation* can be applied to different types of attributes in a dataset to achieve the desired results. *: Microaggregation replaces all values of continuous attributes with their averages computed in a certain algorithmic way.
  • 42.
    42 Enhancements to theconcept of K-anonymity Source: INTERNATIONAL STANDARD ISO/IEC 20889 L-diversity 1. L-diversity is an enhancement to K-anonymity for datasets with poor attribute variability. 2. It is designed to protect against deterministic inference attempts by ensuring that each equivalence class has at least L well-represented values for each sensitive attribute. 3. L-diversity is not a single model but a group of models • Each model has diversity defined slightly differently, e.g. by counting distinct values or by entropy. 4. L-diversity can be difficult to achieve and can cause significant loss in data utility due to the implicit assumption of how values for each sensitive attribute are distributed. 5. Its ability to protect against inferences is also limited when data values are unevenly distributed. 6. This variant of K-anonymity is subject to attacks, which have led to the development of M-invariance and T-closeness. T-closeness • T-closeness is an enhancement to L-diversity for datasets with attributes that are unevenly distributed, belong to a small range of values, or are categorical. • Group based anonymization that is used to preserve privacy in data sets by reducing the granularity of a data representation. This reduction is a trade off that results in some loss of effectiveness of data management or data mining algorithms in order to gain some privacy.
  • 43.
    43 Shared responsibili ties across cloud service models Data Protectionfor Multi- cloud Payment Application Payment Network Payment Data Policy, tokenization, encryption and keys Gateway Call Center Application PI* Data Tokenization Salesforce Analytics Application Differential Privacy (DP), K-anonymity model PI* Data Microsoft Election Guard development kit Election Data Homomorphic Encryption (HE) Data Warehouse PI* Data Vault-less tokenization (VLT) Use-cases of some data privacy techniques Voting Application *: PI Data (Personal information) means information that identifies, relates to, describes, is capable of being associated with, or could reasonably be linked, directly or indirectly, with a consumer or household according to CCPA Dev/test Systems Masking PI* Data
  • 44.
    44 Risk reduction of standardized de-identification techniques Source:INTERNATIONAL STANDARD ISO/IEC 20889 Transit Use Storage Singling out Linking Inference Pseudonymization Tokenization Protects the data flow from attacks Yes Yes Yes Yes Direct identifiers No Partially No Deterministic encryption Protects the data when not used in processing operations Yes No Yes Yes All attributes No Partially No Order-preserving encryption Protects the data from attacks Partially Partially Partially Yes All attributes No Partially No Homomorphic encryption Protects the data also when used in processing operations Yes Yes Yes Yes All attributes No No No Masking Protects the data in dev/test and analytical applications Yes Yes Yes Yes Local identifiers Yes Partially No Local suppression Protects the data in analytical applications Yes Yes Yes Yes Identifying attributes Partially Partially Partially Record suppression Removes the data from the data set Yes Yes Yes Yes All attributes Yes Yes Yes Sampling Exposes only a subset of the data for analytical applications Partially Partially Partially Yes All attributes Partially Partially Partially Generalization Protects the data in dev/test and analytical applications Yes Yes Yes Yes Identifying attributes Partially Partially Partially Rounding Protects the data in dev/test and analytical applications Yes Yes Yes Yes Identifying attributes No Partially Partially Top/bottom coding Protects the data in dev/test and analytical applications Yes Yes Yes Yes Identifying attributes No Partially Partially Noise addition Protects the data in dev/test and analytical applications Yes Yes Yes No Identifying attributes Partially Partially Partially Permutation Protects the data in dev/test and analytical applications Yes Yes Yes No Identifying attributes Partially Partially Partially Micro aggregation Protects the data in dev/test and analytical applications Yes Yes Yes No All attributes No Partially Partially Differential privacy Protects the 
data in analytical applications No Yes Yes No Identifying attributes Yes Yes Partially K-anonymity Protects the data in analytical applications No Yes Yes Yes Quai identifiers Yes Partially No Privacy models Applicable to types of attributes Reduces the risk of Cryptographic tools Suppression Generalization Technique name Data truthfulness at record level Use Case / User Story Data protected in Randomization
  • 45.
    45 Example of what Role2 can see Example of the data values that are stored in the file at rest Reduce risk by not exposing the full data value to applications and users that only need to operate on a limited representation of the data Reduce risk by field level protection
  • 46.
    46 Access to DataSources / FieldsLow High High - Low - I I Risk, productivity and access to more data fields User Productivity
  • 47.
    47 Data sources Data Warehouse Complete policy- enforcedde- identification of sensitive data across all bank entities Example of Cross Border Data-centric Security using Tokenization • Protecting Personally Identifiable Information (PII), including names, addresses, phone, email, policy and account numbers • Compliance with EU Cross Border Data Protection Laws • Utilizing Data Tokenization, and centralized policy, key management, auditing, and reporting Data should be protected
  • 48.
  • 49.
    49 Data protection techniques:Deployment on-premises, and clouds Data Warehouse Centralized Distributed On- premises Public Cloud Private Cloud Vault-based tokenization y y Vault-less tokenization y y y y y y Format preserving encryption y y y y y Homomorphic encryption y y Masking y y y y y y Hashing y y y y y y Server model y y y y y y Local model y y y y y y L-diversity y y y y y y T-closeness y y y y y y Privacy enhancing data de-identification terminology and classification of techniques De- identification techniques Tokenization Cryptographic tools Suppression techniques Formal privacy measurement models Differential Privacy K-anonymity model
  • 50.
    50Source: Gartner Source: Netskope Current useor plan to use: Spending by Deployment Model, Digital Commerce Platforms, Worldwide
  • 51.
    51 Cloud-based searchable encryptionsystems Build-Index Searchable encryption systems commonly utilize an index structure to keep track of occurrences of keywords in documents and insert them into the index structure. This Build-Index process is used by the data owner to generate a secure and searchable structure that enables search over the encrypted data. • An index structure is generally implemented in form of a hash table , meta data (markup) , or inverted index where each unique keyword is mapped to documents High-Performing Cloud Computing (HPCC) Laboratory, School of Computing and Informatics, University of Louisiana and IXUP
  • 52.
    52 Legal Compliance andNation-State Attacks • Many companies have information that is attractive to governments and intelligence services. • Others worry that litigation may result in a subpoena for all their data. Securosis, 2019 Multi-Cloud Key Management considerations Jurisdiction • Cloud service providers, especially IaaS vendors, offer services in multiple countries, often in more than one region, with redundant data centers. • This redundancy is great for resilience, but regulatory concerns arises when moving data across regions which may have different laws and jurisdictions. SecuPi
53
Amazon S3 client-side encryption
• Amazon S3 encryption and decryption take place in the EMRFS client on your cluster.
• Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded.
• The EMR (Elastic MapReduce) File System (EMRFS) is an implementation of HDFS that all Amazon EMR clusters use for reading and writing regular files from Amazon EMR directly to Amazon S3.
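The client-side pattern above is encrypt-then-upload and download-then-decrypt, so the storage service only ever sees ciphertext. In this sketch, a toy SHA-256 XOR keystream stands in for the AES encryption EMRFS actually performs, and an in-memory dict stands in for an S3 bucket; both are illustrative assumptions, not AWS APIs.

```python
import hashlib

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream for illustration only; EMRFS uses AES, not this construction."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))

decrypt = encrypt  # XOR with the same keystream is its own inverse

bucket: dict[str, bytes] = {}  # stand-in for an S3 bucket

def put_object(name: str, data: bytes, enc_key: bytes, nonce: bytes) -> None:
    bucket[name] = encrypt(enc_key, nonce, data)  # encrypted BEFORE "upload"

def get_object(name: str, enc_key: bytes, nonce: bytes) -> bytes:
    return decrypt(enc_key, nonce, bucket[name])  # decrypted AFTER "download"

k, n = b"client-key", b"nonce-1"
put_object("report.csv", b"sensitive rows", k, n)
assert bucket["report.csv"] != b"sensitive rows"  # storage sees only ciphertext
assert get_object("report.csv", k, n) == b"sensitive rows"
```

The point of the pattern is key custody: because encryption happens in the client, the keys never have to leave the customer's control.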
55
Multi-Cloud Key Management
[Diagram] Risk elasticity across deployment models, from in-house to out-sourced: on-premises → private cloud → hosted private cloud → public cloud. Risk elasticity runs low to high along this spectrum, while computing cost runs high to low.
• Encryption/decryption points and protected data
• Encryption key management
56
A Data Security Gateway can protect sensitive data in the cloud and on-premises
• Policy Enforcement Point (PEP): protected data
• Encryption key management: on-premises
57
Protection throughout the lifecycle of data in Hadoop
Tokenize or encrypt sensitive data fields. Enterprise privacy policies may be managed on-premises or in the cloud platform.
• Policy Enforcement Point (PEP): protected data fields
• Encryption key management, with separation of duties
• Data flows from data producers through big data analytics to data users
Big data protection with granular field-level protection for Google Cloud
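A Policy Enforcement Point of this kind can be sketched as a function that walks each record and protects only the fields the policy marks as sensitive. The policy contents, field names, and the HMAC-based pseudonymization below are illustrative assumptions standing in for a real tokenization service:

```python
import hashlib
import hmac

KEY = b"protect-key"  # hypothetical key, held by a separate key-management service

# Hypothetical enterprise policy: which fields to protect, and how.
POLICY = {"ssn": "tokenize", "email": "tokenize", "city": "clear"}

def protect_field(value: str) -> str:
    """Deterministic pseudonym, so joins and analytics still work on protected data."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def enforce(record: dict) -> dict:
    """PEP: apply the policy field by field; unlisted fields pass through in the clear."""
    out = {}
    for field, value in record.items():
        action = POLICY.get(field, "clear")
        out[field] = protect_field(value) if action == "tokenize" else value
    return out

rec = {"ssn": "123-45-6789", "email": "a@b.com", "city": "Oslo"}
prot = enforce(rec)
assert prot["city"] == "Oslo"          # non-sensitive field stays usable
assert prot["ssn"] != rec["ssn"]       # sensitive field is pseudonymized
```

Separation of duties falls out of the structure: the analytics side sees only protected fields, while the key needed to produce (or reverse, in a vault-based design) the pseudonyms lives with a different role.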
58
Protection of data in AWS S3 with separation of duties
Protect data before landing: sensitive data streams from the enterprise on-premises are protected before use, so data lifted to S3 is de-identified before it lands in an S3 bucket.
• Applications can use de-identified data or data in the clear, based on enterprise policies
• Policy Enforcement Point (PEP), with separation of duties
• Encryption key management
59
Examples of hybrid cloud considerations (Securosis, 2019)
Consistency
• Most firms are quite familiar with their on-premises encryption and key management systems, so they often prefer to leverage the same tools and skills across multiple clouds.
• Firms often adopt a "best of breed" cloud approach.
Trust
• Some customers simply do not trust their vendors.
Vendor lock-in and migration
• A common concern is vendor lock-in and an inability to migrate to another cloud service provider.
[Diagram] A cloud gateway connecting Google Cloud, AWS Cloud (S3), Azure Cloud, and Salesforce.
60
References:
1. California Consumer Privacy Act, Oct 4, 2019, https://www.csoonline.com/article/3182578/california-consumer-privacy-act-what-you-need-to-know-to-be-compliant.html
2. CIS Controls V7.1 Mapping to NIST CSF, https://dataprivacylab.org/projects/identifiability/paper1.pdf
3. GDPR and Tokenizing Data, https://tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx
4. GDPR vs CCPA, https://wirewheel.io/wp-content/uploads/2018/10/GDPR-vs-CCPA-Cheatsheet.pdf
5. General Data Protection Regulation, https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
6. IBM Framework Helps Clients Prepare for the EU's General Data Protection Regulation, https://ibmsystemsmag.com/IBM-Z/03/2018/ibm-framework-gdpr
7. ISO/IEC 20889, https://webstore.ansi.org/Standards/ISO/ISOIEC208892018?gclid=EAIaIQobChMIvI-k3sXd5gIVw56zCh0Y0QeeEAAYASAAEgLVKfD_BwE
8. ISO/IEC 27018, https://webstore.ansi.org/Standards/ISO/ISOIEC270182019?gclid=EAIaIQobChMIleWM6MLd5gIVFKSzCh3k2AxKEAAYASAAEgKbHvD_BwE
9. New Enterprise Application and Data Security Challenges and Solutions, https://www.brighttalk.com/webinar/new-enterprise-application-and-data-security-challenges-and-solutions/
10. Machine Learning and AI in a Brave New Cloud World, https://www.brighttalk.com/webcast/14723/357660/machine-learning-and-ai-in-a-brave-new-cloud-world
11. Emerging Data Privacy and Security for Cloud, https://www.brighttalk.com/webinar/emerging-data-privacy-and-security-for-cloud/
12. New Application and Data Protection Strategies, https://www.brighttalk.com/webinar/new-application-and-data-protection-strategies-2/
13. The Day When 3rd Party Security Providers Disappear into Cloud, https://www.brighttalk.com/webinar/the-day-when-3rd-party-security-providers-disappear-into-cloud/
14. Advanced PII/PI Data Discovery, https://www.brighttalk.com/webinar/advanced-pii-pi-data-discovery/
15. Emerging Application and Data Protection for Cloud, https://www.brighttalk.com/webinar/emerging-application-and-data-protection-for-cloud/
16. Data Security: On Premise or in the Cloud, ISSA Journal, December 2019, ulf@ulfmattsson.com
17. Webinars and slides, www.ulfmattsson.com
61
bipartisanpolicy
Ulf Mattsson
Chief Security Strategist
www.Protegrity.com
Thank You!