Real world data engineering practices for GDPR

A brief overview of how a company and/or an engineer can design a data collection system under GDPR.

Published in: Data & Analytics

  1. Real-World Data Engineering Practice for GDPR
     Ching-Yu Wu and Jeff Hung, SPN Data Team, Trend Micro
     2019/09/06 @ DataCon
  2. ⚠️ Disclaimer
     • Please treat this talk as a reference only
       – Detailed implementation varies with business requirements
       – It may not be suitable for every company
       – You MUST reach a consensus with your legal department before implementing your data pipeline
     © 2019 Trend Micro Inc.
  3. What Is GDPR?
     • General Data Protection Regulation
     • Applicable since 2018/5/25
     • Protects the personal data of individuals in the EU
     • Strengthens the privacy rights of EU individuals
  4. Key Changes
     • Increased territorial scope
       – Applies to all businesses that collect personal data of individuals in the EU
       – Regardless of the company's location
     • Breach notification
       – Notify the supervisory authority within 72 hours of becoming aware of a breach
     • Penalties
       – Up to 20M € or 4% of annual global turnover, whichever is higher
       – Google was fined 50M € on 2019/1/21
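The two numeric rules on this slide can be written down directly. A minimal sketch in Python; the function names are hypothetical, and it assumes the 72-hour clock starts when the company becomes aware of the breach and that the upper fine is the higher of the two amounts (GDPR Art. 83(5)):

```python
from datetime import datetime, timedelta, timezone

# Art. 33: notify the supervisory authority within 72 hours of becoming aware.
BREACH_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority about a breach."""
    return aware_at + BREACH_NOTIFICATION_WINDOW


def max_fine_eur(annual_global_turnover_eur: float) -> float:
    """Upper fine tier: 20M EUR or 4% of worldwide annual turnover, whichever is higher."""
    return max(20_000_000.0, 0.04 * annual_global_turnover_eur)


if __name__ == "__main__":
    aware = datetime(2019, 9, 6, 9, 0, tzinfo=timezone.utc)
    print(notification_deadline(aware))  # 2019-09-09 09:00:00+00:00
    print(max_fine_eur(100_000_000))     # 20000000.0 (the fixed amount dominates)
    print(max_fine_eur(1_000_000_000))   # 40000000.0 (4% dominates)
```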
  5. Highlighted Individuals' Rights
     • Right to access
     • Right to erasure
     • Data portability
     • Privacy by design
  6. Simple Data Pipeline for GDPR
  7. Topics
     • Legal & Compliance
       – Data Collection Declaration
       – Data Categorization
     • Security
       – Anonymization
       – Permission Control
       – Data Encryption
     • User's Rights
       – Right to Access and Erasure
     • Role & Responsibility
       – Data Abuse Prevention
  8. Data Collection Declaration
     • Clearly declare the purposes in the Terms of Use
       – What data will be sent?
         • List all the categories
       – Why is the data collected?
         • Is it essential for the service?
       – A clear consent
         • An opt-in check box (GDPR does not accept pre-ticked boxes as consent), with an equally easy way to opt out later
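One way to honor the declared purposes in code is to gate every collection purpose on the recorded consent. A minimal sketch; the purpose names, the `Consent` class and `may_collect` are hypothetical, and letting "essential" purposes skip the opt-in assumes Legal has confirmed a legal basis other than consent for them:

```python
from dataclasses import dataclass, field

# Hypothetical purposes, as declared in the Terms of Use.
ESSENTIAL_PURPOSES = {"threat_detection"}
OPTIONAL_PURPOSES = {"product_improvement", "marketing"}


@dataclass
class Consent:
    # Purposes the user explicitly opted in to; empty by default (no pre-ticked boxes).
    opted_in: set = field(default_factory=set)

    def opt_in(self, purpose: str) -> None:
        if purpose not in OPTIONAL_PURPOSES:
            raise ValueError(f"undeclared purpose: {purpose}")
        self.opted_in.add(purpose)

    def opt_out(self, purpose: str) -> None:
        # Withdrawing must be as easy as giving consent.
        self.opted_in.discard(purpose)


def may_collect(consent: Consent, purpose: str) -> bool:
    """Essential purposes (assumed non-consent basis) pass; optional ones need an opt-in."""
    if purpose in ESSENTIAL_PURPOSES:
        return True
    return purpose in consent.opted_in
```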
  9. Data Categorization
     • Definition of personal data
       – Personally Identifiable Information (PII)
       – Three levels: Non-PII, PII, and Sensitive-PII
         • PII: name, account ID, email address, date of birth, gender, etc.
         • Sensitive-PII: health data, sexual orientation, race, etc.
       – Collecting Sensitive-PII is prohibited in principle
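A sketch of how the categorization could be enforced at ingestion time, assuming a Legal-approved catalog that maps each field name to a category. The catalog contents and the `categorize` function are hypothetical:

```python
from enum import Enum


class Category(Enum):
    NON_PII = "non-pii"
    PII = "pii"
    SENSITIVE_PII = "sensitive-pii"


# Hypothetical field catalog; in practice Legal defines and approves it.
FIELD_CATEGORIES = {
    "os_version": Category.NON_PII,
    "detection_count": Category.NON_PII,
    "email": Category.PII,
    "account_id": Category.PII,
    "date_of_birth": Category.PII,
    "health_condition": Category.SENSITIVE_PII,
}


def categorize(record: dict) -> dict:
    """Group a record's fields by category; reject undeclared or Sensitive-PII fields."""
    grouped = {category: {} for category in Category}
    for name, value in record.items():
        category = FIELD_CATEGORIES.get(name)
        if category is None:
            raise ValueError(f"undeclared field: {name}")
        if category is Category.SENSITIVE_PII:
            raise ValueError(f"Sensitive-PII must not be collected: {name}")
        grouped[category][name] = value
    return grouped
```

Rejecting undeclared fields means a new field cannot reach storage until it has been added to the catalog through review.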
  10. It's All About Compliance
     • The definitions MUST be established by the Legal Department
     • Build a review process into the development cycle
       – The product team provides a clear description of the data being collected
       – Legal reviews, approves, and archives it
       – Clearer documents mean better communication
  12. Separated Databases
     • De-identify analytical data
       – Keep a clear separation between user data and analytical data
         • No one can access both
       – User data (users' behavior and personal information)
         • Purchase records, login records, etc.
       – Analytical data (neutral logs)
         • Detection logs, activity data, etc.
  13. Anonymization
     • In line with GDPR's emphasis on pseudonymization, use a unified anonymous ID across all systems
       – Stop using the e-mail address or other personal information as the unique ID
       – Avoid storing personal information in each service/application
         • Use foreign keys or similar concepts
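A minimal sketch of a unified anonymous ID using SQLite from the Python standard library. The table and column names (`users`, `licenses`, `anon_id`) are hypothetical; the point is that only the main `users` table holds the e-mail address, and every other table refers to a random ID through a foreign key:

```python
import sqlite3
import uuid


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    # Main table: the only place that holds the e-mail address.
    conn.execute("""CREATE TABLE users (
        anon_id TEXT PRIMARY KEY,
        email   TEXT UNIQUE NOT NULL)""")
    # Other services store the anonymous ID, never the e-mail.
    conn.execute("""CREATE TABLE licenses (
        license_key TEXT PRIMARY KEY,
        anon_id     TEXT NOT NULL REFERENCES users(anon_id))""")


def get_or_create_anon_id(conn: sqlite3.Connection, email: str) -> str:
    row = conn.execute("SELECT anon_id FROM users WHERE email = ?", (email,)).fetchone()
    if row:
        return row[0]
    anon_id = str(uuid.uuid4())  # random, so it cannot be derived from the e-mail
    conn.execute("INSERT INTO users (anon_id, email) VALUES (?, ?)", (anon_id, email))
    return anon_id
```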
  14. Anonymization (cont'd)
     • How to de-identify an identifiable field?
       – Irreversible encoding
       – Simplest way: one-way hash
         • With or without salt?
         • Refresh the salt or not?
       – Ways to avoid double counting (e.g., DAU and MAU)
         • Synchronize the salt between client and server
         • Use a one-way hash (unsalted or with a fixed salt)
         • Change the definition of "active"
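The double-counting problem can be reproduced in a few lines. This sketch uses HMAC-SHA256 as the keyed one-way hash standing in for "hash with salt"; the secret values and events are made up. With a fixed key the same user maps to the same pseudonym every day, while a daily key turns one user into several:

```python
import hashlib
import hmac


def pseudonymize(identifier: str, key: bytes) -> str:
    """One-way keyed hash (HMAC-SHA256) of an identifier."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def count_active(events, key_for_day) -> int:
    """Count distinct pseudonymous IDs over (day, identifier) events."""
    return len({pseudonymize(user, key_for_day(day)) for day, user in events})


def fixed_key(day: int) -> bytes:
    return b"fixed-secret"  # hypothetical: same key every day


def daily_key(day: int) -> bytes:
    return f"secret-{day}".encode()  # hypothetical: rotated every day


events = [(1, "alice@example.com"), (2, "alice@example.com"), (2, "bob@example.com")]

if __name__ == "__main__":
    print(count_active(events, fixed_key))  # 2: alice is counted once
    print(count_active(events, daily_key))  # 3: alice is counted once per day
```

A fixed key keeps DAU/MAU correct but makes the pseudonym stable over time, so the key itself has to be protected like user data.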
  15. Anonymization (cont'd)
     • Where to de-identify a field?
       – Ideally on the client side (before the data is sent out)
       – At the latest, at the very first step of the server-side ETL process
         • The mapping table of identifiable data is treated as user data
         • The operation MUST be isolated
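A sketch of the server-side fallback: de-identification as the very first ETL step, returning the analytical records and, separately, the mapping that belongs in the isolated user-data store. The field list, the key handling and the `_hash` suffix are assumptions for illustration:

```python
import hashlib
import hmac

# Hypothetical identifiable fields, taken from the Legal-approved catalog.
IDENTIFIABLE_FIELDS = {"email", "account_id"}


def deidentify_stage(records, key: bytes):
    """First ETL step: replace identifiable fields with a keyed hash.

    Returns (analytical_records, mapping). The mapping links each token back
    to the original value and must be written only to the isolated user-data store.
    """
    analytical, mapping = [], {}
    for record in records:
        clean = {}
        for name, value in record.items():
            if name in IDENTIFIABLE_FIELDS:
                token = hmac.new(key, f"{name}:{value}".encode(), hashlib.sha256).hexdigest()
                mapping[token] = (name, value)
                clean[name + "_hash"] = token
            else:
                clean[name] = value  # neutral fields pass through unchanged
        analytical.append(clean)
    return analytical, mapping
```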
  16. Permission Control
     • Principles of analytical data permission control
       – ACL on buckets
         • Few users/service accounts can read
         • Even fewer service accounts can write
       – Human users cannot have write permission
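The bucket rules can be expressed as a deny-by-default check. The bucket name, principals and `is_allowed` helper are hypothetical; a real deployment would rely on the storage provider's own IAM/ACL mechanism rather than application code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    name: str
    is_service_account: bool


# Hypothetical policy: few readers, even fewer writers.
BUCKET_ACL = {
    "analytics-logs": {
        "read": {"etl-service", "report-service", "analyst-alice"},
        "write": {"etl-service"},
    },
}


def is_allowed(principal: Principal, bucket: str, action: str) -> bool:
    acl = BUCKET_ACL.get(bucket)
    if acl is None or action not in acl:
        return False  # deny by default
    if action == "write" and not principal.is_service_account:
        return False  # human users never get write permission
    return principal.name in acl[action]
```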
  17. Limited Data Retention
     • Data shouldn't be kept "just in case"
     • Periodically remove outdated data
       – Set the retention period according to:
         • Business value (the application's needs)
         • Data volume (cost)
         • Other legal issues
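A periodic purge job can be as simple as a dated `DELETE`. This sketch assumes a hypothetical `detection_logs` table whose `created_at` column stores UTC ISO-8601 strings in one consistent format (so they compare correctly as text), and a 180-day period chosen purely as an example:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per table, agreed with the business and Legal.
RETENTION_DAYS = {"detection_logs": 180}


def purge_expired(conn: sqlite3.Connection, table: str, now: datetime) -> int:
    """Delete rows older than the table's retention period; return the number removed."""
    cutoff = (now - timedelta(days=RETENTION_DAYS[table])).isoformat()
    # The table name comes from the fixed RETENTION_DAYS dict, never from user input.
    cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE detection_logs (id INTEGER, created_at TEXT)")
    now = datetime(2019, 9, 6, tzinfo=timezone.utc)
    conn.executemany("INSERT INTO detection_logs VALUES (?, ?)",
                     [(1, (now - timedelta(days=200)).isoformat()),
                      (2, (now - timedelta(days=10)).isoformat())])
    print(purge_expired(conn, "detection_logs", now))  # 1
```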
  18. Data Encryption
     • All data should be encrypted in storage and in transit
       – Bucket-level encryption
       – SSL/TLS connections
       – Audit logs
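For the transmission side, a client can refuse unverified or outdated connections. A minimal sketch with Python's standard `ssl` module; requiring TLS 1.2 as the minimum is an assumption rather than a figure from the talk, and bucket-level encryption at rest is configured in the storage service itself:

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """TLS client context that verifies certificates and hostnames and refuses < TLS 1.2."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED and check_hostname=True
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

The context is then passed to `ctx.wrap_socket(sock, server_hostname=host)` or to any client library that accepts an `ssl.SSLContext`.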
  20. Rights to Access and Erasure
     • If the user and analytical databases are separated
       – Just dump/delete the related records in the user database
     • Otherwise
       – It's a big project…
  21. The Design of the User Database
     • Dumping/deleting from the user database is challenging
       – Try not to put historical data in the user database (if you can)
       – Try to concentrate personal data in a few tables
       – Use foreign keys or a similar concept for storing "key information"
         • Then just mark the record in the main table as "removed"
       – Consider the data export and deletion processes at the design phase
         • Minimize the number of actions to take
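A sketch of export and erasure when personal data is concentrated in a few tables, using the standard-library `sqlite3` module. The schema is hypothetical; erasure deletes the dependent rows and only marks the main record as `removed`, so foreign keys elsewhere still point at a valid row:

```python
import json
import sqlite3

# Hypothetical user database: personal data concentrated in two tables.
SCHEMA = """
CREATE TABLE users (anon_id TEXT PRIMARY KEY, email TEXT,
                    status TEXT NOT NULL DEFAULT 'active');
CREATE TABLE purchases (anon_id TEXT REFERENCES users(anon_id),
                        product TEXT, purchased_at TEXT);
"""
PERSONAL_TABLES = ("users", "purchases")


def export_user(conn: sqlite3.Connection, anon_id: str) -> str:
    """Right to access: dump every row about this user as JSON."""
    dump = {}
    for table in PERSONAL_TABLES:  # fixed table names, not user input
        cur = conn.execute(f"SELECT * FROM {table} WHERE anon_id = ?", (anon_id,))
        columns = [c[0] for c in cur.description]
        dump[table] = [dict(zip(columns, row)) for row in cur.fetchall()]
    return json.dumps(dump, sort_keys=True)


def erase_user(conn: sqlite3.Connection, anon_id: str) -> None:
    """Right to erasure: delete dependent rows, then mark the main record removed."""
    conn.execute("DELETE FROM purchases WHERE anon_id = ?", (anon_id,))
    conn.execute("UPDATE users SET email = NULL, status = 'removed' WHERE anon_id = ?",
                 (anon_id,))
    conn.commit()
```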
  23. Data Abuse Prevention
     • Fulfill marketing's requirements
       – When you have to associate user and analytical data, e.g.
         • Sending promotional e-mails to inactive users
         • Giving active users a discount when they purchase a new edition
       – Do the association at the last step
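The "associate at the last step" rule, sketched with in-memory data: the inactivity analysis runs only on anonymous IDs, and e-mail addresses are looked up only for the final selection. The sample data, the 90-day threshold and the function names are hypothetical:

```python
from datetime import date

# Analytical side: anonymous IDs and neutral activity data only (sample data).
last_seen = {"a1": date(2019, 8, 30), "a2": date(2019, 5, 1), "a3": date(2019, 4, 2)}

# User side, kept in a separate store: anonymous ID -> e-mail (sample data).
user_contacts = {"a1": "alice@example.com", "a2": "bob@example.com",
                 "a3": "carol@example.com"}


def inactive_ids(last_seen: dict, today: date, days: int = 90) -> list:
    """Analysis step: works entirely on anonymous IDs."""
    return sorted(a for a, seen in last_seen.items() if (today - seen).days > days)


def build_mailing_list(anon_ids: list, contacts: dict) -> list:
    """Last step: resolve only the selected IDs to e-mail addresses."""
    return [contacts[a] for a in anon_ids if a in contacts]
```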
  24. Role & Responsibility
     • Appoint a Data Protection Officer (DPO); GDPR Art. 37 makes this mandatory for many, though not all, organizations
       – Organize a task force to review incoming inquiries
       – Audit data usage
         • Use an audit log parser to monitor data access
       – Monitor for data breaches
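A toy audit-log parser that flags any principal touching both the user-data store and the analytical store, the combination the separated-database rule forbids. The log line format and resource prefixes are assumptions; a real parser would follow the storage service's actual audit-log schema:

```python
from collections import defaultdict

# Hypothetical resource prefixes for the two separated stores.
USER_DATA_PREFIX = "user-db/"
ANALYTICAL_PREFIX = "analytics-logs/"


def parse_line(line: str):
    """Parse '<timestamp> <principal> <action> <resource>' (assumed format)."""
    timestamp, principal, action, resource = line.split()
    return timestamp, principal, action, resource


def principals_crossing_stores(lines) -> list:
    """Return principals that accessed both user data and analytical data."""
    touched = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        _, principal, _, resource = parse_line(line)
        if resource.startswith(USER_DATA_PREFIX):
            touched[principal].add("user")
        elif resource.startswith(ANALYTICAL_PREFIX):
            touched[principal].add("analytical")
    return sorted(p for p, stores in touched.items() if stores == {"user", "analytical"})
```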
  25. Summary
  26. Summary
     • Recommended practices for engineers
       – Good communication with Legal
         • Documentation
       – Separate user data and analytical data
         • De-identify all analytical data
         • Permission control
         • Data retention period
  27. Q & A
  28. Automated hybrid cloud workload protection via calls to Trend Micro APIs. Created with real data by Trend Micro threat researcher and artist Jindrich Karasek.
