TOOLS OF THE TRADE:
Automated Database Sanitization with AWS
Dee Wilcox
Nashville PHP Monthly Meetup
April 11, 2017
About Me
● Senior Software Developer at NASBA.
● Not a Nashville native, but it’s been home for 7 years.
● Married for 14 years, 2 daughters ages 6 and 1.
● Teaching myself to code since 2002.
● Passionate about the maker movement, mentoring and empowering women in tech, and building healthy teams.
Full Stack Dev Tools of the Trade
Webserver
● Linux
● Apache
● Easy to create development environments with AWS EC2s or Lightsail, Docker, and a host of other providers.
Database
● MySQL or Postgres
○ Integrated LAMP webserver
● Dedicated database server
● AWS RDS instance
Application Layer
● Application code runs on the webserver and connects out to the database
● Environments are managed through application configuration files
The Problem:
How do we cleanly reload production data to a development or testing environment without compromising security?
Automated Database Sanitization
Option 1: Web Based On Demand Solution
One option is to create a simple web application that is designed to retrieve, sanitize, and store sanitized MySQL dump files so that they are easily accessible.
Benefits of an On Demand Solution
Easy Maintenance
● PHP application code is easy to maintain
● Development team can modify and improve
Easy on the Database
● Dump files are only created as needed
● Better for storage management
Easy Tracking
● Easy tracking for recent requests
● Helps eliminate duplicate requests
Easy Storage and Retrieval
● A common storage and retrieval mechanism streamlines processes for Ops & Development
In Practice
Failing Un-Gracefully
● Failures to find or execute the sanitization routines were not captured or returned to the user, causing data dumps to either not be created, or to be created while still containing sensitive data.
Room for improvement:
● Capturing all types of MySQL errors
● Logging and notification controls for successful and unsuccessful processing.
Too Tightly Coupled
● Tightly coupling the sanitization code from the data layer with the application code made it difficult to maintain a separation of concerns.
● In an environment with separated development, operations, and database administration roles, this made the process more cumbersome.
Room for improvement:
● Separate application and data layers.
Option 2: Automated Solution
Another option is to create a simple shell script (or two) that runs nightly on a utility server with read-only access to the production database (or production replica).
Benefits of an Automated Solution
Easier Maintenance
● Bash code is easy to read and maintain for both Operations and Development
Easier on the Database
● Dump files are created when the databases are least-used
Easier Tracking
● No request management. Dump files are delivered for all databases nightly
Easy Storage and Retrieval
● A common storage and retrieval mechanism streamlines processes for Ops & Development
In Practice
Boring Bash
● Bash shell scripting has its place, but it’s not necessarily a beloved programming language in the PHP development community.
● Luckily, it’s still easy to work with.
Room for improvement:
● The bash script is long and procedural. It could be organized into functions, which would be easier for most PHP developers to follow.
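As a rough illustration of that idea, one step of the script could be wrapped in a small function. The name `database_from_routine` is hypothetical and not part of the original script; the substitutions follow the ones the script uses to derive a database name from a routine file, with the dot escaped.

```bash
# Hypothetical helper: derive the database name from a routine file name,
# e.g. sanitize_accounts.sql -> accounts.
database_from_routine() {
    printf '%s\n' "$1" | sed 's/\.sql$// ; s/^sanitize_// ; s/_noop//'
}

# Example call with a made-up file name.
database_from_routine "sanitize_accounts_noop.sql"   # prints: accounts
```

Each remaining step (create, load, sanitize, dump, upload) could be given the same treatment, so the main loop reads as a short list of function calls.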
Tightly Defined Sanitization Parameters
● The current sanitization scripts perform standardized sanitization on table columns known to contain sensitive data.
● They do not scan the data tables utilizing regex to identify or mask additional PII.
Room for improvement:
● Smarter sanitization logic.
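A minimal sketch of what regex-based masking might look like, assuming the data is available as a text stream (for example, a dump file). The function name `mask_emails`, the placeholder address, and the pattern are all illustrative assumptions; the pattern is deliberately simple and would miss some valid addresses and other kinds of PII.

```bash
# Hypothetical filter: replace anything that looks like an email address
# with a fixed placeholder. Reads stdin, writes stdout.
mask_emails() {
    sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/redacted@example.com/g'
}

printf '%s\n' "contact jane.doe@corp.example.org today" | mask_emails
# prints: contact redacted@example.com today
```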
Leveraging AWS
Webserver
● LightSail or EC2
● Utility Scripts
○ Automated sanitization and storage
○ Automated retrieval
Database
● MySQL RDS
● MySQL Replica
Storage & Retrieval
● S3 and Glacier
Step 1: Set up the Environment
Key dependencies:
● Linux webserver
● AWS Credentials for S3
● AWS CLI
● Database credentials
○ Username and password OR
○ Login path
● Repository for sanitizer_db
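A preflight check can confirm the command-line dependencies before the script does any work. This is a sketch under assumptions: the function name `missing_commands` is hypothetical, and the list of tools is inferred from the commands the script calls.

```bash
# Hypothetical preflight helper: print the names of any commands that are
# not available on PATH, separated by spaces.
missing_commands() {
    missing=""
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing$cmd "
    done
    printf '%s' "$missing"
}

# The tools the script relies on; adjust the list to match your environment.
missing=$(missing_commands mysql mysqldump aws bzip2)
if [ -n "$missing" ]; then
    echo "Missing dependencies: $missing"
fi
```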
Setting up Sanitizer DB
Create Statements
● Define the databases that need to be sanitized.
● Include specific and accurate create statements that match the production configuration for these databases.
Grants and Definers
● Make sure your new database user has read-only access to the other databases and write access to create and drop new databases.
Sanitization Routines
● Clearly define the data to be sanitized.
● Use queries or stored routines - whatever fits your environment best.
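To make the stored-routine option concrete, here is a sketch of a generator that prints a skeleton routine for one database. Everything in it is an assumption for illustration: the function name `print_routine_skeleton`, the parameter name `run_flag`, and the `users`, `email`, and `id` names are placeholders. The only details taken from the script are the `sanitize_<database>.sanitize_<database>` naming and the fact that the caller treats non-empty output as success.

```bash
# Hypothetical generator: print a skeleton stored routine for one database.
# The table and column names (users, email, id) are placeholders.
print_routine_skeleton() {
    db=$1
    cat <<SQL
DELIMITER //
CREATE PROCEDURE sanitize_${db}.sanitize_${db}(IN run_flag INT)
BEGIN
    -- Overwrite a column known to hold sensitive data.
    UPDATE users SET email = CONCAT('user', id, '@example.com');
    -- Return a row so the caller sees non-empty output on success.
    SELECT 'sanitized' AS result;
END //
DELIMITER ;
SQL
}

print_routine_skeleton accounts
```

The output could be saved as a routine file (for example, `sanitize_accounts.sql`) and edited by hand to cover the real sensitive columns.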
Step 2: Set up the Database
# Create databases and definers
mysql --login-path=local < /data/sites/sanitizer_db/databases/create.sql
mysql --login-path=local < /data/sites/sanitizer_db/databases/definers.sql
# Loop through sanitization routines (stop if the directory is missing)
cd /data/sites/sanitizer_db/routines || exit 1
for routine in sanitize*.sql
do
# Strip the .sql extension to get the routine name
routine_name=$(echo "$routine" | sed 's/\.sql$//')
Step 3: Compile the Sanitization Routines and Empty Databases
# Derive the database name: drop the .sql extension, the sanitize_ prefix, and any _noop marker
database=$(echo "$routine" | sed 's/\.sql$// ; s/^sanitize_// ; s/_noop//')
# $filename is assumed to be set earlier in the script (not shown on the slides)
database_filename="$database$filename"
# Drop the database in sanitizer if it already exists
mysql --login-path=local -e "drop database if exists sanitize_$database;"
# Create a database
mysql --login-path=local -e "create database sanitize_$database default character set utf8;"
# Compile the stored routine for sanitization
mysql --login-path=local < "$routine"
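Because the database name is interpolated directly into `drop database` and `create database` statements, it may be worth validating it first. The guard below is a sketch, not part of the original script; the function name `is_safe_identifier` and the allowed character set are assumptions.

```bash
# Hypothetical guard: succeed only if the name is non-empty and contains
# nothing but letters, digits, and underscores.
is_safe_identifier() {
    case "$1" in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for candidate in "accounts" "bad;name"; do
    if is_safe_identifier "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

Inside the loop, a rejected name could simply `continue` to the next routine file.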
Step 4: Load and Sanitize
# Copy each database into its sanitize_ counterpart
mysqldump --login-path=local --lock-tables=false "$database" | mysql --login-path=local "sanitize_$database"
# Run sanitization and capture output
sanitized=$(echo "call sanitize_$database.sanitize_$database(1);" | mysql --login-path=local)
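Step 5: Catch Errors
# A non-zero exit status from mysql means the routine failed
if [ "$?" -ne 0 ]
then
    echo "There was a problem executing the stored routine."
fi
# Empty output also counts as a failure
if [ -z "$sanitized" ]
then
    sanitized_fail+="$database "
fi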
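The two checks above can be combined into one decision. The sketch below uses a hypothetical `classify_run` function and a stand-in command (`false`) in place of the real mysql call; it also shows saving the exit status immediately, since any later command overwrites `$?`.

```bash
# Hypothetical helper: classify one sanitization run from the exit status
# and the captured output.
classify_run() {
    run_status=$1
    run_output=$2
    if [ "$run_status" -ne 0 ]; then
        echo "error"      # the command itself failed
    elif [ -z "$run_output" ]; then
        echo "empty"      # the command ran but returned nothing
    else
        echo "ok"
    fi
}

# Stand-in for the mysql call: 'false' produces no output and exits 1.
if sanitized=$(false); then status=0; else status=$?; fi
classify_run "$status" "$sanitized"   # prints: error
```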
Step 6: Dump and Compress Sanitized Data
if [ "$sanitized" ]
then
    # Add the database to the list of successes.
    sanitized_success+="$database "
    # Remove existing sanitized file
    rm -f "$local_directory"/"$database_filename"
    # Create compressed mysqldump file from the sanitized copy, not the source database
    mysqldump --login-path=local --lock-tables=false --no-create-info --skip-triggers "sanitize_$database" | bzip2 > "$local_directory"/"$database_filename"
    # Send to S3
    /usr/local/bin/aws --profile "$s3_profile" s3 mv "$local_directory"/"$database_filename" "$s3_url"/"$database_filename" --region "$s3_region"
fi
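When testing the script, it can help to see the upload command without running it. The wrapper below is a sketch: the function name `upload_dump`, the `DRY_RUN` variable, and the example profile, region, bucket, and path are all assumptions. The `aws` invocation itself matches the one used in the script.

```bash
# Hypothetical wrapper around the upload step. With DRY_RUN=1 it prints the
# command instead of running it.
upload_dump() {
    src=$1
    dest=$2
    set -- /usr/local/bin/aws --profile "$s3_profile" s3 mv "$src" "$dest" --region "$s3_region"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder values; the real ones come from the script's configuration.
s3_profile="example-profile"
s3_region="us-east-1"
DRY_RUN=1
upload_dump "/tmp/accounts.sql.bz2" "s3://example-bucket/accounts.sql.bz2"
```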
Step 7: Clean Up the Environment
# Drop the sanitized database
mysql --login-path=local -e "drop database sanitize_$database;"
# Remove the SQL file if it still exists
rm -f "$database"_sanitized.sql
done
Storage and Retrieval in S3
Using the AWS CLI
● Credentials must be defined and exist in ~/.aws/config
● Include parameters for region and profile
● Encryption flag is only needed on retrieval of the file
Using a Scheduler
● Use simple crontab functionality to create a scheduled job.
● Alternatively, use AWS Lambda to schedule events.
Audit Controls
● Use the $sanitized_fail and $sanitized_success lists to track successes and failures.
● Make use of logging and notifications to meet audit requirements and immediately notify users of any issues.
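One way to turn the success and failure lists into an audit line is sketched below. The function names `count_words` and `summarize_run` are hypothetical; the lists use the same space-separated, trailing-space format the script builds.

```bash
# Hypothetical helpers for an end-of-run summary.
count_words() {
    # Word-split the list on whitespace and report how many items it holds.
    set -- $1
    echo "$#"
}

summarize_run() {
    printf 'sanitized=%s failed=%s\n' "$(count_words "$1")" "$(count_words "$2")"
}

# Example lists in the same trailing-space format the script builds.
sanitized_success="accounts billing "
sanitized_fail="exams "
summarize_run "$sanitized_success" "$sanitized_fail"
# prints: sanitized=2 failed=1
```

The summary line could be appended to a log file or sent in a notification. For scheduling, a crontab entry such as `0 2 * * * /path/to/sanitize.sh >> /var/log/sanitize.log 2>&1` (both paths are placeholders) would run the script nightly at 02:00.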
Let’s Discuss!
What have you tried in your environment?
What is working?
Not working?
Where to Find Me
Twitter
https://twitter.com/dee_wilcox
LinkedIn
https://www.linkedin.com/in/deewilcox
Google
https://plus.google.com/+DeeWilcoxOnline
GitHub
https://github.com/deewilcox
