
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)

HighLoad++ 2017

Cape Town Hall, November 8, 16:00

Abstract:
http://www.highload.ru/2017/abstracts/3115.html

During this session we will cover the latest developments in ProxySQL to support regular expressions (RE2 and PCRE), and how to combine this powerful technique with ProxySQL's query rules to anonymize live data quickly and transparently. We will explain the mechanism and how to generate these rules quickly. We will show a live demo covering all the challenges we received from the community, and we will finish the session with an interactive brainstorm, testing queries from the audience.

Published in: Engineering

  1. 1. Inexpensive Datamasking for MySQL with ProxySQL René Cannaò
  2. 2. Who we are: René Cannaò, founder of ProxySQL and MySQL SRE at Dropbox. Thanks to: Frédéric Descamps, MySQL Community Manager.
  3. 3. Other Sessions
      273. ProxySQL, MaxScale, MySQL Router and other database traffic managers / Petr Zaitsev (Percona)
      155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
  4. 4. Agenda
      ● Database overview
      ● What is ProxySQL
      ● Features overview
      ● Data masking
      ● Rules
      ● Masking rules
      ● Obfuscation with mysqldump
      ● Examples
  5. 5. Overview of ProxySQL
  6. 6. Application and Database layers (diagram: APPLICATIONS, DATABASES)
  7. 7. Main motivations
      ● empower the DBAs
      ● improve manageability
      ● understand and improve performance
      ● high performance and high availability
      ● create a proxy layer to shield the database
  8. 8. Database as a Service (layered) (diagram labels: APPLICATIONS; DATABASES + MANAGER(s); DAAS, REVERSE PROXY)
  9. 9. What is ProxySQL? The MySQL data stargate
  10. 10. How to deploy
  11. 11. How to deploy
  12. 12. ProxySQL Features (short list)
      ● High Availability and Scalability
      ● seamless failover
      ● firewall
      ● query throttling
      ● query timeout
      ● query mirroring
      ● runtime reconfiguration
      ● Scheduler
      ● Support for Galera/PXC and Group Replication
      ● on-the-fly rewrite of queries
      ● caching reads outside the database
      ● connection pooling and multiplexing
      ● complex query routing and r/w split
      ● load balancing
      ● real time statistics
      ● monitoring
      ● Data masking
      ● Multiple instances on same ports
      ● Native Clustering
  13. 13. Support for ClickHouse
  14. 14. Data Masking: Data masking, or data obfuscation, is the process of hiding original data with random characters or data. The main reason for applying masking to a data field is to protect data that is classified as personally identifiable data, sensitive personal data or commercially sensitive data; however, the data must remain usable for the purposes of undertaking valid test cycles.
  15. 15. Why use ProxySQL as a data masking solution?
      ● Open source and free as in beer
      ● Other solutions are expensive or do not work
      ● Not worse than the other solutions, as currently none is perfect
      ● The best solution would be to have this feature implemented in the server, just after the handler API
  16. 16. Query Rules: instructions to "program" ProxySQL behavior
      ● matching criteria
      ● actions
      ● flow control and chains
  17. 17. Query Rewrite
      ● Dynamically rewrite queries sent by the application/client
      ● without the client being aware
      ● on the fly
      ● using ProxySQL query rules
      ● rules defined using regular expressions, s/match/replace/
  18. 18. The concept
      We use regular expressions to modify the clients' SQL statements and replace the column(s) we want to hide with some characters, or to generate fake data. We split the solution into two use cases:
      ● provide developers with access to the database
      ● generate a dump to populate a database to share
      Only the statements of the defined users (a developer in our example) are modified.
  19. 19. The concept (2)
      We will also create two categories:
      ● data masking
      ● data obfuscation
  20. 20. Data Masking: Here we simply mask the full value of the column, or part of it, with a generic character.
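A minimal sketch of what masking looks like in plain SQL, assuming the `employees` sample schema that the rules in this deck refer to. The expression is the one used in the replace pattern of rule #2; the `LIMIT` is only there to keep the output short.

```sql
-- Keep the first 2 characters of first_name and pad with a fixed run of 'X'.
SELECT emp_no,
       CONCAT(LEFT(first_name, 2), REPEAT('X', 10)) AS first_name
FROM employees
LIMIT 5;
```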
  21. 21. Data Obfuscation: Here we replace the value of the column with random characters of the same type; in other words, we create fake data.
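As a sketch, obfuscation can be previewed directly in MySQL, assuming the `salaries` table of the `employees` sample schema. The random expression is the one used in the salary rule of this deck and yields integers from 10000 to 59999; every execution returns different values.

```sql
-- Replace the real salary with a random integer between 10000 and 59999.
SELECT emp_no,
       FLOOR(RAND() * 50000) + 10000 AS salary,
       from_date,
       to_date
FROM salaries
LIMIT 5;
```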
  22. 22. Access
      Create a user for masking:
      INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('devel','devel',1,1);
      Create a user for backups:
      INSERT INTO mysql_users (username, password, active, default_hostgroup) VALUES ('backup','dumpme',1,1);
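Rows inserted into `mysql_users` through the ProxySQL admin interface do not take effect until they are loaded to runtime. A minimal sketch of the follow-up commands, assuming the two INSERT statements from this slide were run on the admin interface:

```sql
-- Activate the new users in the running proxy.
LOAD MYSQL USERS TO RUNTIME;
-- Persist them so they survive a ProxySQL restart.
SAVE MYSQL USERS TO DISK;
-- Optional check of what was configured.
SELECT username, active, default_hostgroup FROM mysql_users;
```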
  23. 23. Rules: Avoid SELECT * for the developer. We need to create some rules to block any SELECT * variant on the table; if the column is part of many tables, we need to do so for each of them.
  24. 24. Rules (2)
      Mask or obfuscate the field when it is selected in the column list. We need:
      ● to replace the column by showing the first 2 characters followed by a certain number of X characters, or to generate a random string
      ● to keep the column name
      ● for mysqldump, to allow SELECT * but mask and/or obfuscate sensitive values
  25. 25. Rules overview: Rule #1
      rule_id: 1
      active: 1
      username: devel
      schemaname: employees
      flagIN: 0
      match_pattern: `*first_name*`
      re_modifiers: caseless,global
      flagOUT: NULL
      replace_pattern: first_name
      apply: 0
  26. 26. Rule #2
      rule_id: 2
      active: 1
      username: devel
      schemaname: employees
      flagIN: 0
      match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])
      re_modifiers: caseless,global
      flagOUT: NULL
      replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4
      apply: 0
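To illustrate what rule #2 does, here is a sketch of a query before and after the rewrite. It assumes the capture groups are referenced as `\1` to `\4` in the replace pattern, and it uses a query with no table prefix or parenthesis around the column, so groups 1 to 3 are empty and group 4 is the space that follows the column name.

```sql
-- Query sent by the 'devel' user:
SELECT emp_no, first_name FROM employees;

-- Query that reaches MySQL after the rewrite:
SELECT emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) first_name FROM employees;
```

The column keeps its name in the result set because the replacement appends `first_name` again as an alias after the masking expression.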
  27. 27. Rule #2 - obfuscating
      Let's imagine we want to provide fake numbers for the `salaries`.`salary` column. Instead of the previous rule, we could use this one:
      rule_id: 158
      active: 1
      username: devel
      schemaname: employees
      flagIN: 0
      match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])
      negate_match_pattern: 0
      re_modifiers: CASELESS,GLOBAL
      flagOUT: NULL
      replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4
  28. 28. Rule #3
      rule_id: 3
      active: 1
      username: devel
      schemaname: employees
      flagIN: 0
      match_pattern: \)(\)?) first_name\s+(\w),
      re_modifiers: caseless,global
      flagOUT: NULL
      replace_pattern: )\1 \2,
      apply: 1
  29. 29. Rule #4
      rule_id: 4
      active: 1
      username: devel
      schemaname: employees
      flagIN: 0
      match_pattern: \)(\)?) first_name\s+(.*)\s+from
      re_modifiers: caseless,global
      flagOUT: NULL
      replace_pattern: )\1 \2 from
      apply: 1
  30. 30. Rule #5
      rule_id: 5
      active: 1
      username: devel
      schemaname: employees
      match_pattern: ^SELECT\s+\*.*FROM.*employees
      re_modifiers: caseless,global
      error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
      apply: 0
  31. 31. Rule #6
      rule_id: 6
      active: 1
      username: devel
      schemaname: employees
      match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees
      re_modifiers: caseless,global
      error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
      apply: 0
  32. 32. Rule #7
      rule_id: 7
      active: 1
      username: devel
      schemaname: employees
      match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)
      re_modifiers: caseless,global
      error_msg: Query not allowed due to sensitive information, please contact dba@acme.com
      apply: 0
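A sketch of how one of these blocking rules could be loaded through the ProxySQL admin interface, using rule #5 as the example. The pattern is written with the backslashes a regex needs (`\s`, `\*`), which is an assumption about the original slide; the column names are those of the `mysql_query_rules` admin table.

```sql
-- Block "SELECT * ... FROM ... employees" for the 'devel' user.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, match_pattern, re_modifiers, error_msg, apply)
VALUES
  (5, 1, 'devel', 'employees',
   '^SELECT\s+\*.*FROM.*employees',
   'caseless,global',
   'Query not allowed due to sensitive information, please contact dba@acme.com',
   0);

-- Make the rule effective, then persist it across restarts.
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```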
  33. 33. Rules for mysqldump
      To provide a dump that might be used by developers, Q/A or support, we need to:
      ● generate valid data
      ● obfuscate sensitive information
      ● rewrite SQL statements issued by mysqldump
      ● only for tables and columns with sensitive data
  34. 34. mysqldump rules: Rule #8
      rule_id: 8
      active: 1
      user: backup
      schema: employees
      flagIN: 0
      match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`
      replace: SQL_NO_CACHE emp_no, ROUND(RAND()*100000), from_date, to_date FROM salaries
      flagOUT: NULL
      apply: 1
  35. 35. mysqldump rules: Rule #9
      rule_id: 9
      active: 1
      user: backup
      schema: employees
      flagIN: 0
      match: \* FROM `employees`
      replace: emp_no, CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date, CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name, CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name, gender, hire_date FROM employees
      flagOUT: NULL
      apply: 1
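The expressions in rule #9 can be tried directly against MySQL before wiring them into a rule. A sketch, assuming the standard `employees` sample table: the date expression keeps the first two digits of the year and the month/day part, and swaps the last two digits of the year for a random value between 10 and 59.

```sql
-- Preview of the obfuscated rows that the rewritten dump query would return.
SELECT emp_no,
       CONCAT(LEFT(birth_date, 2), FLOOR(RAND() * 50) + 10, RIGHT(birth_date, 6)) AS birth_date,
       CONCAT(LEFT(first_name, 2), REPEAT('x', LENGTH(first_name) - 2)) AS first_name,
       CONCAT(LEFT(last_name, 3), REPEAT('x', LENGTH(last_name) - 3)) AS last_name,
       gender,
       hire_date
FROM employees
LIMIT 5;
```

One caveat worth checking on real data: a February 29 birth date can land in a non-leap year after the year digits are replaced, which would not be a valid date.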
  36. 36. Limitations
      ● better support in ProxySQL >= 1.4.x
        ○ RE2 and PCRE regexes
      ● all fields with the same name are masked, whatever the table, within the same schema
      ● the regexes may never cover every case
      ● block any query that does not match a whitelisted SQL statement
      ● the dump via ProxySQL seems to be the best solution
  37. 37. Make it easy
      This is not really easy, is it? You can use this small bash script (https://github.com/lefred/maskit) to generate the rules:
      # ./maskit.sh -c first_name -t employees -d employees
      column: first_name
      table: employees
      schema: employees
      let's add the rules...
  38. 38. Examples
      Easy ones:
      SELECT * FROM employees;
      SELECT emp_no, last_name, first_name FROM employees;
  39. 39. Examples (2)
      More difficult:
      select emp_no, concat(first_name), last_name from employees;
      select emp_no, first_name, first_name from employees.employees
      select emp_no, `first_name` from employees;
      select emp_no, first_name
          -> from employees; (*)
  40. 40. Examples (3)
      More difficult:
      select t1.first_name from employees.employees as t1;
      select emp_no, first_name as fred from employees;
      select emp_no, first_name rene from employees;
      select emp_no, first_name `as` from employees;
      select first_name as `as`, last_name from employees;
      select `t1`.`first_name` from employees.employees as t1;
  41. 41. Examples (4)
      More difficult:
      select first_name fred, last_name from employees;
      select emp_no, first_name /* first_name */ from employees.employees;
      /* */ select last_name, first_name from employees;
      select CUSTOMERS.* from myapp.CUSTOMERS;
      select a.* from employees.employees a;
  42. 42. We need you!
  43. 43. Thank you! Questions? E: rene@proxysql.com
