1. A Survey of Technical Privacy Solutions
Jonathan Oliver
2. Who am I?
• Dr Jonathan Oliver
• Data Scientist at Trend Micro
• 15 years
https://www.slideshare.net/JonathanOliver26/a-survey-of-technical-privacy-solutions
3. Business Benefit of Privacy
• Meet Regulatory Compliance
• Minimize the impact of data breaches
• Increased trust / loyalty
• Public
• Customers
• Investors
• Customer Acquisition
6. Agenda
1. Use cases and definitions
2. Where Privacy and Security intersect
3. Formal Approaches to Privacy
4. Determine how well they fit use cases
• Can we use off the shelf software / services?
7. Privacy Use Cases
• Data Collection and Storage
• Does the data contain PII (Personally Identifiable Information)?
• Data processed in another country
• Example: data processed on AWS in USA
• Published blogs / data releases
PII required | PII not required
Customer accounts | Optimizing sales / marketing
8. Privacy: Types of Data
Data type | Simple | Complex
Spreadsheets | 1 row per person | Multiple rows per person
Databases | 1 row per person | Multiple rows per person; DBs with joined tables
Log Files | | Nearly all log files
Privacy Solutions | Suitable | Not Suitable
9. Legal Definition Privacy (GDPR)
“Anonymisation results from processing personal data in order to irreversibly prevent identification.”
[Page 3] https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
18. Formal Approaches to Privacy
Privacy Technique | Method
Differential Privacy | Add “noise” (errors) to data
k-anonymity | Delete / suppress data
Homomorphic Encryption | Perform computation on encrypted data
Monero style privacy | Obfuscate who performed transactions (Ring Signatures)
Secure Multiparty Computation / Federated Learning | No single entity can see all the data
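To make "perform computation on encrypted data" concrete, here is a toy sketch of the Paillier scheme, which is only additively homomorphic (a partial form of homomorphic encryption, chosen here for brevity and not named in the slides). The primes, function names, and house values are illustrative assumptions; the key size is far too small for real use.

```python
import random
from math import gcd

def keygen(p, q):
    """Toy Paillier key generation from two (assumed prime) numbers."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because we use g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m (0 <= m < n) under public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    n2 = n * n
    l_value = (pow(c, lam, n2) - 1) // n
    return (l_value * mu) % n

rng = random.Random(0)
n, priv = keygen(1009, 1013)  # tiny illustrative primes, NOT secure

c1 = encrypt(n, 100_000, rng)
c2 = encrypt(n, 150_000, rng)

# Multiplying ciphertexts adds the underlying plaintexts:
# whoever computes c_sum never sees 100_000 or 150_000.
c_sum = (c1 * c2) % (n * n)
total = decrypt(n, priv, c_sum)
```

Only the key holder can read `total`; the party doing the multiplication works purely on ciphertexts.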
19. Differential Privacy
Person | Zipcode | House Value
Alice A | 12345 | $100,000
Bob B | 12345 | $150,000
Carol C | 99999 | $400,000
Doug D | 12345 | $150,000

Query | Real Answer | Diff Privacy Answer
Average House Value | $200,000 | $200,000
Average House Value in Zipcode 12345 | $133,000 | $140,000
Average House Value in Zipcode 99999 | $400,000 | $205,000
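One common way to produce noisy answers like those in the table is the Laplace mechanism. The sketch below is a minimal illustration, not the method used to generate the slide's numbers: the clamping bounds, the epsilon value, and the choice to treat the group size as public are all assumptions.

```python
import random

# Rows from the slide: (person, zipcode, house value in dollars)
ROWS = [
    ("Alice", "12345", 100_000),
    ("Bob", "12345", 150_000),
    ("Carol", "99999", 400_000),
    ("Doug", "12345", 150_000),
]

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_average(values, lo, hi, epsilon, rng):
    """Noisy mean of a non-empty list, assuming the count is public."""
    clamped = [min(max(v, lo), hi) for v in values]
    true_mean = sum(clamped) / len(clamped)
    # One person can move the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)
LO, HI, EPSILON = 0, 500_000, 1.0  # assumed bounds and privacy budget

all_values = [v for _, _, v in ROWS]
zip_12345 = [v for _, z, v in ROWS if z == "12345"]
zip_99999 = [v for _, z, v in ROWS if z == "99999"]

noisy_all = dp_average(all_values, LO, HI, EPSILON, rng)
noisy_12345 = dp_average(zip_12345, LO, HI, EPSILON, rng)
noisy_99999 = dp_average(zip_99999, LO, HI, EPSILON, rng)
```

Because the noise scale grows as the group shrinks, a query over a single person (zipcode 99999) gets the largest distortion, which is the pattern the table illustrates.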
21. K-anonymity
Person | Zipcode | House Value
Alice A | 12345 | $100,000
Bob B | 12345 | $150,000
Carol C | 99999 | $400,000
Doug D | 12345 | $150,000

Person | Zipcode | House Value
NULL | 12345 | $100,000
NULL | 12345 | $150,000
NULL | NULL | NULL
NULL | 12345 | $150,000
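The suppression shown above can be sketched as follows. This is a minimal illustration under assumptions: zipcode is treated as the only quasi-identifier, the person name as the only direct identifier, and k = 2 (the slide does not state k). The function and column names are hypothetical.

```python
from collections import Counter

ROWS = [
    {"person": "Alice", "zipcode": "12345", "house_value": 100_000},
    {"person": "Bob", "zipcode": "12345", "house_value": 150_000},
    {"person": "Carol", "zipcode": "99999", "house_value": 400_000},
    {"person": "Doug", "zipcode": "12345", "house_value": 150_000},
]

def k_anonymize(rows, quasi_identifiers, direct_identifiers, k):
    """Suppress direct identifiers, and whole rows in groups smaller than k."""
    group_sizes = Counter(
        tuple(row[q] for q in quasi_identifiers) for row in rows
    )
    released = []
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        if group_sizes[key] < k:
            # Too few people share these quasi-identifiers: drop everything.
            released.append({col: None for col in row})
        else:
            released.append({
                col: (None if col in direct_identifiers else value)
                for col, value in row.items()
            })
    return released

released = k_anonymize(ROWS, ["zipcode"], ["person"], k=2)
```

Carol's row is fully suppressed because she is the only person in zipcode 99999; the other three rows keep their zipcode and house value but lose the name.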
24. Formal Approaches to Privacy
Privacy Technique | Limitation
Differential Privacy | Not suitable for complex data
k-anonymity | Not suitable for complex data
Homomorphic Encryption | Really slow (1 million to 1 billion times slower)
Monero style privacy | Application specific
Secure Multiparty Computation / Federated Learning | Does a suitable trusted 3rd party exist?
25. Conclusion
• Does your privacy solution / approach generate business value?
• Tell people about it
• Measure it
• Identify areas where solutions can improve both security and privacy
• Privacy toolsets are not yet mature
• Use privacy toolsets where they fit the problem well
• Not suitable for complex data
26. Further Reading
1. List of Privacy Tools (NIST) https://www.nist.gov/itl/applied-cybersecurity/privacy-engineering/collaboration-space/focus-areas/de-id/tools
2. IBM differential privacy toolset.
https://github.com/IBM/differential-privacy-library
3. Data Segmentation (Datamation)
https://www.datamation.com/security/data-segmentation/
33. Step 3. Cluster / Correlate Table 3
Apply clustering / correlation / pivoting to Table 3
Given a group of rows:
• We do not know which, or how many, customers generated those rows
• We do know the minimum possible number of customers that generated those rows
34. Properties Table 3
When trying to identify which customer generated a given row, we are uncertain among up to R customers
When trying to extract all the rows for a given customer, we again face significant uncertainty (factor R)
35. Complexity for Attacker
Ring Signatures are computationally expensive.
Associate a large prime (e.g., a few hundred bits) with each PID
We can use a product of large primes as a “Light Weight Ring Signature”
• It meets many of the requirements of a Ring Signature
• It is very hard to factor a product of large primes
• It is very easy to determine whether a set of 2 (or more) large numbers have a common divisor (Euclidean algorithm)
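The asymmetry between factoring and the common-divisor check can be sketched in a few lines. The PID names and the prime assignment below are hypothetical; Mersenne primes are used only because they are well-known primes that need no library to generate, and a real scheme would need secret, randomly chosen primes (a product of publicly known primes is trivial to factor).

```python
from math import gcd, prod

# Hypothetical PID -> large prime mapping (Mersenne primes as stand-ins).
PID_PRIMES = {
    "pid_a": 2**89 - 1,
    "pid_b": 2**107 - 1,
    "pid_c": 2**127 - 1,
    "pid_d": 2**521 - 1,
}

def ring_tag(pids):
    """Product of the primes of every PID in the ring."""
    return prod(PID_PRIMES[pid] for pid in pids)

def share_member(tag1, tag2):
    """Euclidean algorithm: do two tags have a prime in common?"""
    return gcd(tag1, tag2) > 1

tag_row1 = ring_tag(["pid_a", "pid_b"])
tag_row2 = ring_tag(["pid_b", "pid_c"])
tag_row3 = ring_tag(["pid_c", "pid_d"])

rows_1_2_linked = share_member(tag_row1, tag_row2)  # rings overlap on pid_b
rows_1_3_linked = share_member(tag_row1, tag_row3)  # rings are disjoint
```

Note that the gcd of two overlapping tags is the shared prime itself, so the check reveals which prime is common even though neither tag is factored directly.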