Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Practical experiences using Atlas and Ranger to implement GDPR

242 views

Published on

GDPR is the current focus of all organizations working with data in Europe. Svenska Spel are using Atlas and Ranger as an essential part to make our
data warehouse to conform to GDPR.
This presentation will show how Atlas and Ranger, together with Hive are used in practice to control access and masking of data in our data warehouse. We will demonstrate how we with a model driven development process, using data vault modeling, have extended our model with metadata. From the metadata we generate our access control configuration. The configuration is automatically synchronized to Atlas and Ranger in our deployment process.

Speaker
Magnus Runesson, Data Engineer, Svenska Spel

Published in: Technology
  • Be the first to comment

  • Be the first to like this

Practical experiences using Atlas and Ranger to implement GDPR

  1. 1. Magnus.Runesson@svenskaspel.se DataWorks Summit 2018-04-19 Practical experiences using Atlas and Ranger to implement GDPR 2018-04-25 1
  2. 2. 22018-04-25 Who is talking? Magnus Runesson Data Engineer @ Svenska Spel Developer Ops RDBMS BigData High performance
  3. 3. Gaming is for everyone´s enjoyment
  4. 4. 42018-04-25 • This talk does not cover all we have done around GDPR • This is NOT a way to say if you do this you are GDPR compliant. • Some details are left out or simplified Disclaimer
  5. 5. 52018-04-25 • Why? • Svenska Spel’s data warehouse • Atlas & Ranger • How did we implement it? • Experiences and conclusions Agenda
  6. 6. 62018-04-25 GDPR requires • clear purpose for PII data • privacy by design • clear consent or legal ground • not to use/store PII if not needed • people own their own data. • penalty if not followed Why?
  7. 7. 72018-04-25 • Our customers and partners integrity is protected • Users have only access to data aimed for current purpose • Keep doing our required processing • Adaptable for new requirements • Maintainable solution Goals
  8. 8. 82018-04-25 Svenska Spel’s data warehouse
  9. 9. 92018-04-25 • Moved from classic Cognos + Oracle • HDP 2.6 using Hive • Includes Personal Identifiable Information (PII) • 300+ event streams in • 150 published tables and views Svenska Spel’s data warehouse
  10. 10. 102018-04-25 • Used data are • Understood • Documented • Modelled • Modelled with Data Vault • Oracle SQL Developer Data Modeler • SQL code generated from model Model based development
  11. 11. 112018-04-25 • History tracking • Uniquely linked • Pattern based • Easy to generate code Data Vault Link Hub Hub Satellite Satellite Satellite
  12. 12. 12 CRM mart ETL Anonymization DataLake Integration DataVault Dimension mart ETL BI mart Exasol Tableau Hadoop Presentation Rolebasedaccess CRM Whitelisting …
  13. 13. 132018-04-25 Apache Atlas and Ranger
  14. 14. 142018-04-25 • Metadata about resources • Resource is • Table • Column • Schema • File on HDFS • … • Lineage Apache Atlas
  15. 15. 152018-04-25 • Tags have no meaning themselves • Your business vocabulary define the meaning • Example of tags: • Business entity owning the data • Indication of sensitive data • The rules in Ranger enforces the policy • Separate metadata from policy implementation Atlas tags PII
  16. 16. 162018-04-25 • Is user U allowed to do operation O on resource R? • Access • Row based filtering • Masking • Audit logging • Resources referred with tags Apache Ranger
  17. 17. 172018-04-25 customer Customer_id Name Postal_code Has_phone Marketing 1 Steve 12345 False False 2 Bill 54321 True False 3 Paul 54672 False True Table in Hive before we started our work
  18. 18. 182018-04-25 customer Customer_id Name Postal_code Has_phone Marketing 1 Steve 12345 False False 2 Bill 54321 True False 3 Paul 54672 False True PII_table PII Add PII tags on table and columns in Atlas. No behaviour change. PII
  19. 19. 192018-04-25 customer Customer_id Name Postal_code Has_phone Marketing 17 ABC 12345 False False 42 DEF 54321 True False 13 BDE 54672 False True PII We set a rule in Ranger to mask PII columns Analyst view PII_table PII
  20. 20. 202018-04-25 customer Customer_id Name Postal_code Has_phone Marketing 3 Paul 54672 False True PII Ranger restrict our CRM user to only see rows with Marketing = True PII_table PII
  21. 21. 212018-04-25 How did we implement this?
  22. 22. 222018-04-25 Development process Change m odel Store m odel G enerate code Deploy PII Add rules
  23. 23. 232018-04-25 • In-house tool • Template based generation of SQL/HQL • Generate files with tag-information • Tables and columns respectively HQL generator HQL generator CSV SQL PII
  24. 24. 242018-04-25 schema;table;attribute;tags dim_mart;customer_d;customer_id;PII,Sensitive dim_mart;customer_d;has_phone; Corresponding file for tables without attribute(column) Tag file for columns
  25. 25. 252018-04-25 • Hand coded of rules per tag • Policy tool applies rule on all tables with the tag • Can be different rules for different users • Filter gets appended to where condition by Ranger • Used for • Row based filtering (access) • Masking (anonymization) • Catch all rule to deny access to tables not in our model Ranger rules
  26. 26. 262018-04-25 tag;groups;users;filter PII_table;CRM_USERS;;EXISTS (SELECT 1 FROM customer_whitelist_crm whitelist_crm WERE whitelist_crm.customer_id = $table.customer_id) $table get replaced at deployment time Example tag_row_policies.csv
  27. 27. 272018-04-25 Deployment process *.sql table_tags.csv column_tags.csv tag_row_policies.csv Apply *.sql DDL Policy tool - tag files Policy tool - policy file
  28. 28. 282018-04-25 • Makes it easy to manage • Atlas tags • Ranger policy rules • Command line tool • Consumes tags and policy CSV files • Calls Atlas and Ranger API • Less than 1000 rows of Python Policytool Policytool CSV
  29. 29. 292018-04-25 Put everything together
  30. 30. 302018-04-25 Development process Change m odel Store m odel G enerate code Deploy PII Add rules *.sql column_tags.csv table_tags.csv tag_row_policies.csv
  31. 31. 312018-04-25 Deployment process *.sql column_tags.csv table_tags.csv tag_row_policies.csv Apply *.sql DDL Policy tool - tag files Policy tool - policy file
  32. 32. 322018-04-25 Change in view of an Analyst Before CRM Analyst
  33. 33. 332018-04-25 • Simple and easy model • Limited performance penalty • Tag on table with masking rule => all columns masked • Hard to understand API doc • Restriction on Ranger row based filtering (not on tags) • Row based filtering and masking not on direct file access Experiences of Atlas and Ranger
  34. 34. 342018-04-25 • Our customers and partners integrity is protected • Users have only access to data aimed for current purpose • Keep doing our required processing • Adaptable for new requirements • Maintainable solution Reached Goals
  35. 35. 352018-04-25 • Goals reached • No SQL changes • Scale when new datasets added • Our data model is guaranteed in sync • Better comments in Hive • Minimal impact on ETL developers workflow Conclusions
  36. 36. 362018-04-25 • Make it as simple as possible • Automate • Know your tool • Be clear on your authorization model • Know your data Takeaways
  37. 37. Magnus.Runesson@svenskaspel.se @MRunesson Thank you! 2018-04-25 37 karriar.svenskaspel.se

×