The document discusses GDPR and data lakes. It explains that GDPR is the most important change in data privacy regulation in 20 years. It outlines the data controller's responsibilities under GDPR including lawfulness of processing, records of processing activities, and data protection by design. It also discusses data subjects' rights such as right of access, data portability, and right to be forgotten. The document then examines GDPR considerations from the perspective of a data lake, such as techniques for anonymization and pseudonymization of personal data. It provides recommendations for solutions to ensure GDPR compliance when using a data lake.
2. Understanding GDPR
GDPR from Data Lake perspective
Solving Data Controller’s responsibility
Solving Data Subject’s right
Process recommendation
Final thoughts
Disclaimer: This is not legal advice!
Goal: GDPR compliant Data Lake
3. GDPR is the most important change in data privacy regulation in 20 years
Enforced from 25th May 2018
4% of annual global turnover or €20 Million (whichever is greater)
General Data Protection Regulation
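The fine ceiling quoted on this slide is a simple maximum of two values. A minimal sketch of that arithmetic follows; the function name `max_gdpr_fine` is made up for illustration, and the result is only the upper bound stated on the slide, not a prediction of any actual penalty.

```python
def max_gdpr_fine(annual_global_turnover_eur: float) -> float:
    """Upper bound from the slide: 4% of turnover or EUR 20 million, whichever is greater."""
    four_percent = annual_global_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000)


# A company with EUR 100 million turnover: 4% is 4 million, so the 20 million floor applies.
small = max_gdpr_fine(100_000_000)
# A company with EUR 1 billion turnover: 4% is 40 million, which exceeds the floor.
large = max_gdpr_fine(1_000_000_000)
print(small, large)
```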
7. The EU General Data Protection Regulation (GDPR) is the most important change in data privacy regulation in 20 years
99 Articles
Data controller’s responsibility
Data subject’s right
GDPR
8. Data Controller
Lawfulness of processing based on consent
Records of processing activities and personal data
Data protection by design and default
Cooperation with supervisory authority
Data Controller’s Responsibility
9. Data Subject, consumer
Right of access
Data portability
Right to be forgotten
Right to object, rectify
Data Subject’s Right
10. Data Controller
Lawfulness of processing based on consent
Records of processing activities and personal data
Data protection by design and default
Cooperation with supervisory authority
Data Subject, consumer
Right of access
Data portability
Right to be forgotten
Right to object, rectify
GDPR from Data Lake Perspective
15. There is no silver bullet solution
Different solution approaches depending on the use case
Solution approach
16. Data Controller
Lawfulness of processing based on consent
Records of processing activities and personal data
Data protection by design and default
Cooperation with supervisory authority
Recap: Data Controller’s Responsibility
18. Anonymization: re-identification is NOT possible
Pseudonymization: re-identification is possible
Personal data: identifies a person directly or indirectly
Special categories of personal data: ethnic origin, political or religious views, health, etc.
Rest of the talk assumes: Personal
22. Personal data: Pseudonymized
[Diagram: data lake architecture. Labels: Sources, Batch source (x2), Streaming source, Ingestion, Transient Storage, Raw Storage, Aggregated Storage, Analytics, BI, Consumer Channels]
23. Pseudonymization techniques
• For each data source
• Direct identifiers
– Encryption
1. Symmetric/asymmetric
2. Per person/per purpose
– Hashing: ID + salt
– Save the hash/key mapping in a lookup table (consent, legal or legitimate interest)
• Indirect identifiers
– Aggregation, generalization, etc.
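The "hashing ID + salt" and "lookup table" bullets can be sketched as follows. This is a minimal illustration, not the talk's implementation: it uses HMAC-SHA256 as the salted hash (my choice of construction), and `LookupTable`, `msisdn` and `plan` are hypothetical names. The lookup table is assumed to be stored separately from the lake, under stricter access control.

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, salt: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 digest (HMAC)."""
    return hmac.new(salt, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class LookupTable:
    """Hypothetical token-to-identifier mapping, kept outside the data lake."""

    def __init__(self):
        self._by_token = {}

    def register(self, identifier: str, salt: bytes) -> str:
        token = pseudonymize(identifier, salt)
        self._by_token[token] = identifier
        return token

    def reidentify(self, token: str):
        # Re-identification is possible only through this table.
        return self._by_token.get(token)

    def forget(self, token: str) -> None:
        # Dropping the mapping removes the path back to the person.
        self._by_token.pop(token, None)


salt = secrets.token_bytes(32)  # secret salt, assumed to be managed separately
table = LookupTable()

record = {"msisdn": "+467308080", "plan": "prepaid"}
token = table.register(record["msisdn"], salt)
lake_record = {**record, "msisdn": token}  # what is written to the lake

print(lake_record["msisdn"] != record["msisdn"])  # the lake holds only the token
```

Because the same identifier and salt always yield the same token, records from different sources can still be joined on the token without exposing the original identifier.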
24. Personal data: in a single place
[Diagram: data lake architecture. Labels: Sources, Batch source (x2), Streaming source, Ingestion, Transient Storage, Raw Storage, Aggregated Storage, Analytics, BI, Consumer Channels]
25. Personal data: Pseudonymized
[Diagram: data lake architecture. Labels: Sources, Batch source (x2), Streaming source, Ingestion, Transient Storage, Analytics, BI, Consumer Channels, Consent]
27. Personal Data: Log Access
[Diagram: data lake architecture. Labels: Sources, Batch source (x2), Streaming source, Ingestion, Transient Storage, Analytics, BI, Consumer Channels, Consent]
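The slide only names the idea of logging access to personal data, so the following is a sketch under my own assumptions: every read of a subject's record goes through one wrapper that appends an audit entry. `AuditedStore`, its fields and the sample values are hypothetical.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical wrapper that records every read of personal data."""

    def __init__(self, records):
        self._records = dict(records)
        self.access_log = []

    def read(self, subject_token, accessor, purpose):
        # Log first, so that failed lookups are also traceable.
        self.access_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "subject": subject_token,
            "accessor": accessor,
            "purpose": purpose,
            "found": subject_token in self._records,
        })
        return self._records.get(subject_token)

    def accesses_for(self, subject_token):
        """All logged reads for one data subject, e.g. for an access report."""
        return [e for e in self.access_log if e["subject"] == subject_token]


store = AuditedStore({"tok-1": {"plan": "prepaid"}})
store.read("tok-1", accessor="bi-dashboard", purpose="customer_care")
store.read("tok-2", accessor="analyst-42", purpose="marketing_campaign")
print(len(store.accesses_for("tok-1")))
```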
28. If user withdraws a consent later
How would you restrict processing?
Multiple consents for the same data source

User | Marketing Campaign | Customer Care
+467308080 | Yes | Yes
+467000601 | Yes | Yes

User | Marketing Campaign | Customer Care
+467308080 | Yes | Yes
+467000601 | Yes
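One way to restrict processing after a withdrawal is to check a per-purpose consent matrix at read time. The sketch below reuses the two users from the slide; the purpose keys and helper names are hypothetical, and which consent the second user withdraws is my assumption, since the slide text does not preserve it.

```python
consents = {
    "+467308080": {"marketing_campaign": True, "customer_care": True},
    "+467000601": {"marketing_campaign": True, "customer_care": True},
}


def withdraw(consents, user, purpose):
    """Record a withdrawal; later reads for that purpose must skip the user."""
    consents[user][purpose] = False


def users_allowed_for(consents, purpose):
    """Users whose data may still be processed for the given purpose."""
    return [user for user, granted in consents.items() if granted.get(purpose, False)]


# Assumed scenario: the second user withdraws consent for marketing only.
withdraw(consents, "+467000601", "marketing_campaign")

print(users_allowed_for(consents, "marketing_campaign"))  # only the first user
print(users_allowed_for(consents, "customer_care"))       # both users
```

The cost of this design is that every job reading the source has to apply the filter, which is the problem the purpose-based model on the next slide addresses.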
29. Model around purpose
Pros
Simplifies GDPR compliance
Cons
Increased storage
Multiple consents for the same data source
p1 p2 … pn
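Modelling around purpose can be read as materializing one dataset per purpose (p1 … pn), each containing only records with consent for that purpose. The sketch below is an assumption about what that could look like, with hypothetical names; it also makes the storage cost visible, because a record is copied once per consented purpose.

```python
from collections import defaultdict


def build_purpose_datasets(records, consents):
    """Copy each record into every purpose dataset the user consented to."""
    datasets = defaultdict(list)
    for record in records:
        for purpose, granted in consents.get(record["user"], {}).items():
            if granted:
                datasets[purpose].append(record)
    return dict(datasets)


def withdraw_from(datasets, user, purpose):
    """A withdrawal touches exactly one purpose dataset."""
    datasets[purpose] = [r for r in datasets.get(purpose, []) if r["user"] != user]


records = [
    {"user": "+467308080", "plan": "prepaid"},
    {"user": "+467000601", "plan": "postpaid"},
]
consents = {
    "+467308080": {"marketing_campaign": True, "customer_care": True},
    "+467000601": {"marketing_campaign": True, "customer_care": True},
}

datasets = build_purpose_datasets(records, consents)
copies_before = sum(len(rows) for rows in datasets.values())  # 2 records, 2 purposes

withdraw_from(datasets, "+467000601", "marketing_campaign")
print(copies_before, [r["user"] for r in datasets["marketing_campaign"]])
```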
30. Minimization of personal data
Lawfulness of processing
Traceability of processing
Data protection by design and by default
Data Controller’s Responsibility: Solution
Principles
31. Data Subject, consumer
Right of access
Data portability
Right to be forgotten
Right to object, rectify
Recap: Data Subject’s Right
36. Governance in single place
Rich Metadata
Self service
Right of Data Subject: Solution Principles
37. Apply a PIA for each data source, with the DPO
Develop tests for anonymization with statisticians and data scientists
Test the anonymization level against existing data sources
Solutions need to be applied to Data Processors as well
Process
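A test of the anonymization level could, for example, measure k-anonymity over the indirect identifiers. The talk does not name a metric, so k-anonymity is my choice here, and the rows, field names and bucket widths are invented. A high k is only one signal and does not by itself establish that data is anonymous.

```python
from collections import Counter


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def generalize_age(age, width=10):
    """Generalize an exact age into a bucket such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


rows = [
    {"age": 23, "postcode": "11122"},
    {"age": 27, "postcode": "11133"},
    {"age": 34, "postcode": "11144"},
    {"age": 38, "postcode": "11155"},
]

# Raw data: every combination is unique, so each person stands alone.
k_raw = k_anonymity(rows, ["age", "postcode"])

# Generalized data: age buckets and truncated postcodes.
generalized = [
    {"age": generalize_age(r["age"]), "postcode": r["postcode"][:3] + "**"}
    for r in rows
]
k_generalized = k_anonymity(generalized, ["age", "postcode"])

print(k_raw, k_generalized)
```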
Broadened definition of personal data
More responsibility on the Data Controller, e.g. lawfulness of processing
Data Subject's rights, for example the right to be forgotten or the right to portability
Heavy fines
1. Vendors or products won't solve everything
2. There is no one-size-fits-all solution
Recommended for GDPR: processing and processors should not need to identify individuals.
Remember that pseudonymized data is still considered personal data, even if the keys are written down on paper or locked in a vault.
The GDPR defines pseudonymization in Article 4(5) as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymize a data set, the “additional information” must be “kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable person.”
Pseudonymization does not remove all identifying information from the data but merely reduces the linkability of a dataset with the original identity of an individual (e.g., via an encryption scheme).
Track all metadata and lineage, and keep the whole lineage graph
Services to track and build reports on each user's data, processing, etc.
Track metadata, lineage and tags in a single source of governance on the lake
Tag-based dynamic security
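Keeping the whole lineage graph makes it possible to answer "where did this person's data end up?" for access or erasure requests. A minimal sketch, assuming lineage is stored as a mapping from each dataset to the datasets derived from it; the dataset names are invented.

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets derived directly from it.
lineage = {
    "raw.customers": ["curated.customers"],
    "curated.customers": ["agg.churn", "mart.marketing"],
    "agg.churn": [],
    "mart.marketing": [],
}


def downstream(lineage, source):
    """Breadth-first walk returning the source and everything derived from it."""
    seen = [source]
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


# Every dataset that must be checked when a subject in raw.customers asks for erasure.
affected = downstream(lineage, "raw.customers")
print(affected)
```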