Shadid Chowdhury
GDPR and Data Lake
• Understanding GDPR
• GDPR from Data Lake perspective
• Solving Data Controller’s responsibilities
• Solving Data Subject’s rights
• Process recommendation
• Final thoughts
Disclaimer: This is not legal advice!
Goal: GDPR compliant Data Lake
• GDPR is the most important change in data privacy regulation in 20 years
• Enforced from 25th May 2018
• Fines of up to 4% of annual global turnover or €20 million (whichever is greater)
General Data Protection Regulation
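The "whichever is greater" rule is a plain maximum. A minimal Python sketch, assuming turnover is given in whole euros; `max_gdpr_fine_eur` is a hypothetical name, and it computes only the ceiling quoted on the slide, not the fine a regulator would actually impose.

```python
def max_gdpr_fine_eur(annual_global_turnover_eur: int) -> int:
    """Ceiling of the top-tier GDPR fine: the greater of 4% of annual
    global turnover and EUR 20 million (integer euros)."""
    four_percent = annual_global_turnover_eur * 4 // 100
    return max(four_percent, 20_000_000)


print(max_gdpr_fine_eur(100_000_000))    # 20000000 (the EUR 20 million floor applies)
print(max_gdpr_fine_eur(2_000_000_000))  # 80000000 (4% of turnover is larger)
```

Integer arithmetic avoids floating-point rounding on large turnover figures.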
GDPR from Data Lake perspective
[Figure: Aggregation; Pseudo-anonymization/Anonymization; Consent; Legitimate Interest; Vendor’s Solution]
• The EU General Data Protection Regulation (GDPR) is the most important change in data privacy regulation in 20 years
• 99 Articles
• Data controller’s responsibilities
• Data subject’s rights
GDPR
• Data Controller
• Lawfulness of processing based on consent
• Records of processing activities and personal data
• Data protection by design and by default
• Cooperation with the supervisory authority
Data Controller’s Responsibility
• Data Subject (consumer)
• Right of access
• Data portability
• Right to be forgotten
• Right to object and to rectification
Data Subject’s Right
• Data Controller
• Lawfulness of processing based on consent
• Records of processing activities and personal data
• Data protection by design and by default
• Cooperation with the supervisory authority
• Data Subject (consumer)
• Right of access
• Data portability
• Right to be forgotten
• Right to object and to rectification
GDPR from Data Lake Perspective
Understanding Data Lake
• Disjoint files
• Easy to replicate
• Different teams
• No built-in governance
Data Lake
GDPR & Data Lake
Image source: https://mindfulmvmnt.org/2016/08/09/sciatica-piriformis-syndrome-condition-breakdown-w-corrective-yoga/
Solution
• There is no silver bullet solution
• Different solution approaches depending on the use case
Solution approach
• Data Controller
• Lawfulness of processing based on consent
• Records of processing activities and personal data
• Data protection by design and by default
• Cooperation with the supervisory authority
Recap: Data Controller’s Responsibility
Lawfulness of processing
• Anonymization – re-identification is NOT possible
• Pseudo-anonymization – re-identification is possible
• Personal data – identifies a person directly or indirectly
• Special category of personal data – ethnic origin, political or religious views, health, etc.
Rest of the talk assumes personal data
True Anonymisation?
[Figure: chart of Value against Anonymization, both axes running from Low to High]
• Anonymized
• Pseudo-anonymized
• Personal data
• Special category of personal data
Personal Data Minimisation
[Figure label: Lake]
Anonymize everything
[Figure: pipeline from batch and streaming sources through ingestion to raw and aggregated storage, consumed by analytics and BI; column labels: Sources, Transient, Storage, Consumer Channels]
Personal data: pseudo-anonymized
[Figure: pipeline from batch and streaming sources through ingestion to raw and aggregated storage, consumed by analytics and BI; column labels: Sources, Transient, Storage, Consumer Channels]
Pseudo-anonymization techniques
• For each data source
• Direct identifiers
– Encryption
1. Symmetric/asymmetric
2. Per person/per purpose
– Hashing ID + salt
– Save the hash/key mapping in a lookup table (under a lawful basis: consent, legal obligation or legitimate interest)
• Indirect identifiers
– Aggregation, generalization, etc.
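The hashing and generalization techniques above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the field names (`phone`, `age`, `event`), the 10-year age buckets and all function names are hypothetical, and the in-memory `lookup_table` stands in for a separate, access-controlled store. It uses a random salt per person rather than one global salt, because with a global salt anyone holding the salt could recompute a pseudonym from a known phone number. Encryption, the other option listed, is not shown.

```python
import hashlib
import secrets

# Hypothetical lookup table: raw direct identifier -> (per-person salt, pseudonym).
# In a real lake this would sit in a separate, access-controlled store.
lookup_table: dict[str, tuple[bytes, str]] = {}


def pseudonymize_id(raw_id: str) -> str:
    """Return a stable pseudonym for raw_id, creating one the first time it is seen."""
    if raw_id not in lookup_table:
        salt = secrets.token_bytes(16)  # random salt, unique to this person
        digest = hashlib.sha256(salt + raw_id.encode("utf-8")).hexdigest()
        lookup_table[raw_id] = (salt, digest)
    return lookup_table[raw_id][1]


def generalize_age(age: int, bucket: int = 10) -> str:
    """Generalize an indirect identifier (age) into a range such as '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def pseudonymize_record(record: dict) -> dict:
    """Swap the direct identifier for its pseudonym and generalize the indirect one."""
    return {
        "user": pseudonymize_id(record["phone"]),
        "age_range": generalize_age(record["age"]),
        "event": record["event"],
    }


print(pseudonymize_record({"phone": "+467308080", "age": 34, "event": "call"}))
```

Only the output of `pseudonymize_record` would land on the lake; the lookup table is the "additional information" that must be kept apart from it.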
Personal data in a single place
[Figure: pipeline from batch and streaming sources through ingestion to raw and aggregated storage, consumed by analytics and BI; column labels: Sources, Transient, Storage, Consumer Channels]
Personal data: pseudo-anonymized
[Figure: pipeline from batch and streaming sources through ingestion to analytics and BI, with a Consent component; column labels: Sources, Transient, Storage, Consumer Channels]
Pseudo-anonymized data separated
[Figure: pipeline from batch and streaming sources through ingestion to analytics and BI, with a Consent component; column labels: Sources, Transient, Storage, Consumer Channels]
Personal data: log access
[Figure: pipeline from batch and streaming sources through ingestion to analytics and BI, with a Consent component; column labels: Sources, Transient, Storage, Consumer Channels]
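Logging every read of personal data, with who read it and for which purpose, is what makes processing traceable. A minimal Python sketch, assuming a hypothetical JSON-lines log kept in memory; the record fields, dataset layout and function names are illustrative, not a specific product's API.

```python
import json
from datetime import datetime, timezone

# Hypothetical append-only access log (one JSON line per read of personal data).
access_log: list[str] = []


def read_personal_data(dataset: dict, user: str, purpose: str, accessor: str) -> list[dict]:
    """Return one pseudonymized user's rows and record who read them, and why."""
    access_log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "accessor": accessor,
        "dataset": dataset["name"],
        "user": user,
        "purpose": purpose,
    }))
    return [row for row in dataset["rows"] if row["user"] == user]


def accesses_for(user: str) -> list[dict]:
    """Answer 'who processed this person's data, and for what purpose?'."""
    return [entry for entry in map(json.loads, access_log) if entry["user"] == user]


calls = {"name": "clean/calls", "rows": [{"user": "u1", "calls": 3}, {"user": "u2", "calls": 1}]}
read_personal_data(calls, "u1", "customer_care", "analyst_a")
print(accesses_for("u1"))
```

In practice the log would go to write-once storage so that it can serve as a record of processing activities.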
• If a user withdraws consent later
• How would you restrict processing?
Multiple consents for the same data source
User          Marketing Campaign   Customer Care
+467308080    Yes                  Yes
+467000601    Yes                  Yes

User          Marketing Campaign   Customer Care
+467308080    Yes                  Yes
+467000601    Yes for one purpose only (consent for the other withdrawn)
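One way to restrict processing after a withdrawal is to check consent at processing time rather than at ingestion. A minimal Python sketch under assumptions: the consent store, purpose names and function names are hypothetical; the slide's phone numbers are reused for readability, although on the lake the keys would be pseudonyms; and which consent is withdrawn in the example is an illustrative choice.

```python
# Hypothetical consent store: user -> purposes the user currently consents to.
# The slide's phone numbers are used for readability; on the lake the keys would be pseudonyms.
consents: dict[str, set[str]] = {
    "+467308080": {"marketing_campaign", "customer_care"},
    "+467000601": {"marketing_campaign", "customer_care"},
}


def withdraw(user: str, purpose: str) -> None:
    """Record that a user has withdrawn consent for one purpose."""
    consents.get(user, set()).discard(purpose)


def allowed_rows(rows: list[dict], purpose: str) -> list[dict]:
    """Restrict processing: keep only rows whose user consents to this purpose right now."""
    return [row for row in rows if purpose in consents.get(row["user"], set())]


events = [{"user": "+467308080", "event": "call"}, {"user": "+467000601", "event": "sms"}]
withdraw("+467000601", "marketing_campaign")  # illustrative withdrawal
print(allowed_rows(events, "marketing_campaign"))
print(allowed_rows(events, "customer_care"))
```

Because the check runs on every job, a withdrawal takes effect on the next run without rewriting stored data. The cost is that every consumer must call the filter.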
• Model around purpose
• Pros
– Simplifies GDPR compliance
• Cons
– Increased storage
Multiple consents for the same data source
Purposes: p1, p2, …, pn
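Modelling around purpose can be read as keeping one partition per purpose p1 … pn, holding only rows from users who consented to that purpose. A minimal Python sketch of that reading; the partition layout and function names are hypothetical. It shows both the pro (a withdrawal touches a single partition) and the con (a row is copied into every partition its user consented to).

```python
from collections import defaultdict


def partition_by_purpose(rows: list[dict], consents: dict[str, set[str]]) -> dict[str, list[dict]]:
    """Copy each row into every purpose partition its user has consented to."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        for purpose in sorted(consents.get(row["user"], set())):
            partitions[purpose].append(row)
    return dict(partitions)


def withdraw_consent(partitions: dict[str, list[dict]], user: str, purpose: str) -> None:
    """A withdrawal only drops the user's rows from that purpose's partition."""
    partitions[purpose] = [r for r in partitions.get(purpose, []) if r["user"] != user]


rows = [{"user": "u1", "value": 1}, {"user": "u2", "value": 2}]
consents = {"u1": {"p1", "p2"}, "u2": {"p1"}}
parts = partition_by_purpose(rows, consents)
withdraw_consent(parts, "u1", "p1")
print(parts)
```

Consumers for purpose p2 never see p1's data, so compliance checks reduce to "which partition may this job read".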
• Minimization of personal data
• Lawfulness of processing
• Traceability of processing
• Data protection by design and by default
Data Controller’s Responsibility: Solution Principles
• Data Subject (consumer)
• Right of access
• Data portability
• Right to be forgotten
• Right to object and to rectification
Recap: Data Subject’s Right
Right of Data Subject
• Removing the mapped key or hashed ID from the lookup table is sufficient on the lake to implement the right to be forgotten
Right to be forgotten
Keep metadata & lineage
[Figure: pipeline from batch and streaming sources through ingestion to analytics and BI, with a Consent component; column labels: Sources, Transient, Storage, Consumer Channels]
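Keeping lineage lets you find every dataset that was derived from a source holding a person's data, which is where access and erasure requests have to reach. A minimal Python sketch, assuming lineage is available as a hypothetical adjacency map from each dataset to the datasets built directly from it; the dataset names are illustrative.

```python
from collections import deque

# Hypothetical lineage graph: dataset -> datasets derived directly from it.
lineage: dict[str, list[str]] = {
    "raw/calls": ["clean/calls"],
    "clean/calls": ["agg/daily_calls", "marketing/targets"],
    "agg/daily_calls": ["bi/dashboard"],
}


def downstream(dataset: str) -> list[str]:
    """Breadth-first walk returning every dataset derived, directly or not, from `dataset`."""
    seen, queue, order = {dataset}, deque([dataset]), []
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("raw/calls"))
# ['clean/calls', 'agg/daily_calls', 'marketing/targets', 'bi/dashboard']
```

The `seen` set keeps the walk finite even if the lineage graph contains cycles.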
Self service: automated reports
[Figure: pipeline from batch and streaming sources through ingestion to analytics and BI, with a Consent component; column labels: Sources, Transient, Storage, Consumer Channels]
• Governance in a single place
• Rich metadata
• Self service
Right of Data Subject: Solution Principles
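With governance and rich metadata in one place, a subject access or portability request can be answered by an automated report. A minimal Python sketch, assuming a hypothetical catalog that records each dataset's purpose and rows keyed by pseudonymized user; the output is JSON because portability calls for a machine-readable format.

```python
import json

# Hypothetical catalog: dataset name -> purpose and rows keyed by pseudonymized user.
catalog: dict[str, dict] = {
    "clean/calls": {"purpose": "customer_care",
                    "rows": [{"user": "u1", "calls": 3}, {"user": "u2", "calls": 1}]},
    "marketing/targets": {"purpose": "marketing_campaign",
                          "rows": [{"user": "u1", "segment": "A"}]},
    "agg/daily_calls": {"purpose": "analytics",
                        "rows": [{"day": "2018-05-25", "calls": 4}]},
}


def subject_report(user: str) -> str:
    """Collect every row held about `user`, with each dataset's purpose, as JSON."""
    datasets = []
    for name, meta in sorted(catalog.items()):
        rows = [r for r in meta["rows"] if r.get("user") == user]
        if rows:
            datasets.append({"dataset": name, "purpose": meta["purpose"], "rows": rows})
    return json.dumps({"user": user, "datasets": datasets}, indent=2)


print(subject_report("u1"))
```

Aggregated datasets without a user column drop out naturally, which matches the idea that they are no longer personal data.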
• Apply a privacy impact assessment (PIA) to each data source, involving the DPO
• Develop tests for anonymization with statisticians and data scientists
• Test the anonymization level against existing data sources
• Solutions need to be reapplied to Data Processors as well
Process
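The slides do not say which anonymization test to use; k-anonymity is swapped in here as one common measure. A minimal Python sketch: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The column names and sample rows are hypothetical, and choosing the quasi-identifiers is exactly where statisticians are needed. Passing this test alone does not prove anonymization (for example, a group whose rows all share the same sensitive value still leaks it).

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Smallest group size over all combinations of quasi-identifier values (0 if no rows)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def passes_k(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every quasi-identifier combination appears in at least k rows."""
    return k_anonymity(rows, quasi_identifiers) >= k


sample = [
    {"age_range": "30-39", "zip3": "123", "diagnosis": "x"},
    {"age_range": "30-39", "zip3": "123", "diagnosis": "y"},
    {"age_range": "40-49", "zip3": "123", "diagnosis": "x"},
]
print(k_anonymity(sample, ["age_range", "zip3"]))  # 1: the 40-49 row is unique
```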
GDPR is a blessing in disguise!


Editor's Notes

  1. Broadened definition of personal data. More responsibility on the Data Controller. Lawfulness of processing. Data Subject’s rights, for example the right to be forgotten or the right to data portability. Heavy fines.
  2. 1. Vendors or products won't solve everything. 2. There is no one-size-fits-all solution.
  3. Pseudo-anonymization is recommended for GDPR: processing, and processors, do not need to identify individuals. Remember that pseudo-anonymized data is still considered personal data, even if the re-identification keys are written down on paper or locked in a vault. The GDPR defines pseudonymization in Article 4(5) as “the processing of personal data in such a way that the data can no longer be attributed to a specific data subject without the use of additional information.” To pseudonymize a data set, the “additional information” must be “kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable person.” Pseudonymization does not remove all identifying information from the data but merely reduces the linkability of a dataset with the original identity of an individual (e.g., via an encryption scheme).
  4. Track all metadata and lineage, and based on the lineage keep the whole graph. Services to track and build reports on each user’s data, processing, etc. Track metadata, lineage and tags, with a single source of governance on the lake. Tag-based dynamic security.
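Tag-based dynamic security means access is decided at query time from the tags on each column, so retagging a column changes who can see it without copying data. A minimal Python sketch under assumptions: the tag names, roles and function names are hypothetical, and a real lake would read tags and grants from its metadata catalog.

```python
# Hypothetical column tags and role grants; a real lake would read these from its catalog.
column_tags: dict[str, set[str]] = {
    "user": {"pseudonymized"},
    "age_range": {"indirect"},
    "diagnosis": {"special_category"},
    "calls": set(),  # untagged: visible to every role
}
role_allowed_tags: dict[str, set[str]] = {
    "analyst": {"pseudonymized", "indirect"},
    "dpo": {"pseudonymized", "indirect", "special_category"},
}


def visible_columns(role: str) -> list[str]:
    """Columns whose every tag is granted to the role."""
    allowed = role_allowed_tags.get(role, set())
    return [col for col, tags in column_tags.items() if tags <= allowed]


def apply_policy(rows: list[dict], role: str) -> list[dict]:
    """Project each row onto the columns the role may see, evaluated per query."""
    cols = visible_columns(role)
    return [{c: row[c] for c in cols if c in row} for row in rows]


rows = [{"user": "u1", "age_range": "30-39", "diagnosis": "x", "calls": 3}]
print(apply_policy(rows, "analyst"))  # diagnosis is hidden from analysts
```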