The document describes ClearCare's migration of its PostgreSQL database architecture to AWS to meet scalability, availability, automation, and HIPAA compliance requirements. Key aspects include a multi-AZ deployment with streaming replication for high availability, auto scaling read replicas, automated backups using EBS snapshots, role-based access control with LDAP, encryption of data at rest and in transit, and centralized logging and auditing for compliance. The new architecture improves performance, security, and automation while providing a cost-effective platform for ClearCare's growing business.
7. Requirements - High Availability
• No impact from the loss of any single server
• Recover from the loss of the master DB in under 5 minutes
8. Requirements - Scalable
Decouple infrastructure from app
• Scale without code changes
• Architecture is transparent to the app
Built-in flexibility
• Easily isolate certain DB traffic
19. Backups
WAL-E backups were taking 6-12 hours with a ~2 TB cluster
EBS volume snapshots
Incremental backups
WAL-E only for WAL archiving
New volumes are created from the latest snapshot for new slaves and for PITR (point-in-time recovery)
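The snapshot-based flow above can be sketched with boto3. This is a minimal illustration, not ClearCare's tooling. It assumes `ec2` is a `boto3.client("ec2")` and that snapshots are tagged `cluster=<name>`, which is a hypothetical convention. It also ignores pagination. Replaying archived WAL (from WAL-E) on the restored volume for PITR is not shown.

```python
def take_snapshot(ec2, volume_id, cluster):
    """Start an EBS snapshot of a DB data volume (EBS stores snapshots incrementally)."""
    resp = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"{cluster} PostgreSQL data volume",
        TagSpecifications=[{
            "ResourceType": "snapshot",
            "Tags": [{"Key": "cluster", "Value": cluster}],  # hypothetical tag scheme
        }],
    )
    return resp["SnapshotId"]


def latest_completed_snapshot(ec2, cluster):
    """Return the ID of the newest completed snapshot tagged for this cluster, or None."""
    resp = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[
            {"Name": "tag:cluster", "Values": [cluster]},
            {"Name": "status", "Values": ["completed"]},
        ],
    )
    snapshots = resp["Snapshots"]
    if not snapshots:
        return None
    return max(snapshots, key=lambda s: s["StartTime"])["SnapshotId"]


def volume_for_new_replica(ec2, cluster, availability_zone):
    """Create a volume from the latest snapshot, e.g. for a new slave or a PITR base."""
    snapshot_id = latest_completed_snapshot(ec2, cluster)
    if snapshot_id is None:
        raise RuntimeError(f"no completed snapshot for cluster {cluster!r}")
    resp = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
        VolumeType="gp2",  # placeholder volume type
    )
    return resp["VolumeId"]
```

The volume must be created in the same AZ as the instance that will attach it.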
21. High Availability - Read Tier
HAProxy with app servers
• Docker best practice… 1 process per container.
pgBouncers with app servers
• Connection management and Docker best practice.
pgPool
• More infrastructure to manage.
HAProxy servers
• More infrastructure to manage.
AWS Elastic Load Balancer (ELB)
• Decreased visibility.
22. High Availability - Read Tier
Pros:
• Easy to set up and highly available
• Fully managed service provided by AWS (low maintenance)
• Automatically routes traffic to the correct AZ
Cons:
• Limited visibility
• The DBs see only the ELB IPs as the source and do not know the client IPs
• ELB is an extra hop, which introduces some latency
• Additional ELB cost
23. Scalability
AWS Auto Scaling Groups (ASG)
Integrates with ELB
Easily increase or decrease the instance count in the read tier
The AMI is generated by Jenkins jobs
Automates instance recovery
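This is a sketch of resizing the read tier through the Auto Scaling API. It assumes `autoscaling` is a `boto3.client("autoscaling")`, that the group name is hypothetical, and that the group is attached to the ELB. The function only clamps the request to the group's min/max and sets the desired capacity. The group itself launches instances from the Jenkins-built AMI and handles ELB registration.

```python
def scale_read_tier(autoscaling, group_name, desired):
    """Set the read tier's instance count, clamped to the group's MinSize/MaxSize.

    `autoscaling` is a boto3 Auto Scaling client; group_name is hypothetical.
    Returns the capacity actually requested.
    """
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[group_name]
    )["AutoScalingGroups"]
    if not groups:
        raise LookupError(f"no Auto Scaling group named {group_name!r}")
    group = groups[0]
    capacity = max(group["MinSize"], min(desired, group["MaxSize"]))
    if capacity != group["DesiredCapacity"]:
        # New instances boot from the group's AMI and join the attached ELB;
        # instances removed by the group are deregistered from it.
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=capacity,
            HonorCooldown=False,
        )
    return capacity
```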
34. Securing the Data at Rest
Encrypted EBS volumes
Create a new encrypted volume from the latest unencrypted snapshot
Performance impact less than 5%
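This is a minimal boto3 sketch of creating an encrypted volume from an unencrypted snapshot. It assumes `ec2` is a `boto3.client("ec2")`. The AZ, volume type and optional KMS key are placeholders. When `KmsKeyId` is omitted, EBS encrypts with the account's default KMS key for EBS.

```python
def encrypted_volume_from_snapshot(ec2, snapshot_id, availability_zone, kms_key_id=None):
    """Create an encrypted EBS volume from a (possibly unencrypted) snapshot."""
    params = {
        "SnapshotId": snapshot_id,
        "AvailabilityZone": availability_zone,
        "Encrypted": True,
        "VolumeType": "gp2",  # placeholder volume type
    }
    if kms_key_id is not None:
        params["KmsKeyId"] = kms_key_id  # customer-managed key instead of the default
    volume_id = ec2.create_volume(**params)["VolumeId"]
    # Block until the volume can be attached to the DB instance.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    return volume_id
```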
35. Securing the Data in Motion
SSL in PostgreSQL
• Traffic from app to pgBouncer is encrypted with SSL (supported since pgBouncer 1.7).
• Traffic from pgBouncer to DB is local, so no encryption is required.
• Replication traffic uses SSL.
Issue encountered
• Site slowdown
• Large number of “idle in transaction” connections on DB
• pgBouncer CPU ‘pegged’ at 100%:
104283 pgbounce 20 0 52860 16m 3604 R 100 0.0 4031:47 /usr/bin/pgbouncer -R -d -q /etc/pgbouncer/pgbouncer.ini
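This Python sketch shows one way to find which DB users are holding “idle in transaction” sessions. The SQL is an assumed diagnostic query against `pg_stat_activity`. The helper counts long-idle sessions per user. It groups by `usename` rather than `client_addr` because, behind pgBouncer, the DB sees pgBouncer's local address instead of the app server's. The 30-second threshold is an arbitrary example.

```python
from collections import Counter
from datetime import timedelta

# Assumed diagnostic query, run on the master. pg_stat_activity has had the
# `state` and `state_change` columns since PostgreSQL 9.2.
IDLE_IN_TXN_SQL = """
SELECT pid, usename, now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;
"""


def idle_txn_offenders(rows, threshold=timedelta(seconds=30)):
    """Count 'idle in transaction' sessions per DB user that exceed `threshold`.

    `rows` are (pid, usename, idle_for) tuples as returned by IDLE_IN_TXN_SQL;
    a driver such as psycopg2 maps the interval column to a timedelta.
    """
    return Counter(user for _pid, user, idle_for in rows if idle_for >= threshold)
```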
36. Securing the Data in Motion
Diagram: HAProxy distributes connections round robin across PGB 1 and PGB 2 (pgBouncer), which connect to the PostgreSQL DB.
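HAProxy's round-robin balancing across the two pgBouncers can be illustrated with a tiny Python model. This is a simplified sketch, not HAProxy's implementation. It has no weights, and health checks are reduced to manual `mark_down`/`mark_up` calls. The backend addresses are hypothetical; 6432 is pgBouncer's default port.

```python
class RoundRobinPool:
    """Rotate through pgBouncer backends in order, skipping ones marked down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._next = 0
        self._down = set()

    def mark_down(self, backend):
        self._down.add(backend)

    def mark_up(self, backend):
        self._down.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            backend = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if backend not in self._down:
                return backend
        raise RuntimeError("no healthy pgBouncer backends")
```

Each pgBouncer process is single-threaded. Spreading clients across two of them therefore lets SSL work use more than one CPU core.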
40. DB User Management
LDAP authentication using JumpCloud (pg_ldap_sync)
Postgres roles used to grant appropriate permissions
AWS Security Group settings lock down access
SSH access controlled via JumpCloud
Access limited to select users, connecting only from a dedicated server
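This is a minimal sketch of what an LDAP-to-Postgres role sync computes. It assumes the JumpCloud directory has already been read into a list of usernames. It is not pg_ldap_sync's actual implementation. The `readonly` group role and the protected role names are hypothetical. Roles are created with `LOGIN` and no password, on the assumption that `pg_hba.conf` uses LDAP authentication for these users.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap in double quotes, double any inside."""
    return '"' + name.replace('"', '""') + '"'


def plan_role_sync(ldap_users, pg_login_roles, protected=("postgres",)):
    """Return (to_create, to_drop) so Postgres login roles mirror the directory.

    Roles in `protected` (superuser, app user, ...) are never created or dropped.
    """
    keep = set(protected)
    ldap = set(ldap_users) - keep
    pg = set(pg_login_roles) - keep
    return sorted(ldap - pg), sorted(pg - ldap)


def sync_statements(ldap_users, pg_login_roles, group_role="readonly"):
    """Build the SQL to apply; `group_role` is a hypothetical NOLOGIN permission role."""
    to_create, to_drop = plan_role_sync(ldap_users, pg_login_roles)
    stmts = []
    for user in to_create:
        # No PASSWORD: assumes pg_hba.conf authenticates these users via LDAP.
        stmts.append(f"CREATE ROLE {quote_ident(user)} LOGIN;")
        stmts.append(f"GRANT {quote_ident(group_role)} TO {quote_ident(user)};")
    for user in to_drop:
        stmts.append(f"DROP ROLE {quote_ident(user)};")
    return stmts
```

`DROP ROLE` fails while the role still owns objects or holds privileges. A real sync would therefore reassign or revoke those first.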
41. DB Credentials In App
HIPAA: No shared users
Removed passwords from source code
KMS used for password encryption
KMS key protected by IAM
IAM role assigned only to app servers
Password decrypted at runtime
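This is a sketch of the runtime decryption step using boto3's KMS client. It assumes the ciphertext was produced earlier with `aws kms encrypt`. It also assumes the ciphertext is delivered base64-encoded in an environment variable whose name is hypothetical. The app server's IAM role must allow `kms:Decrypt` on the key, so hosts without that role cannot recover the password.

```python
import base64
import os


def db_password(kms, env_var="DB_PASSWORD_ENCRYPTED"):
    """Decrypt the DB password at runtime; `kms` is a boto3 KMS client.

    The env var name is hypothetical; it holds base64-encoded KMS ciphertext.
    """
    ciphertext = base64.b64decode(os.environ[env_var])
    # Symmetric KMS ciphertext embeds the key ID, so no KeyId argument is needed.
    plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
    return plaintext.decode("utf-8")
```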
43. DB Logging
HIPAA: Auditability (Who did what? When?)
• App traffic can be audited from application logs.
• Full query logging for all other DB users.
App and query logs are shipped to Loggly (under a BAA).
ALTER USER myuser SET log_min_duration_statement=0;
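Applying the per-user setting above to every non-app account can be scripted. This sketch only builds the SQL, and the user lists are assumed inputs (for example, read from `pg_roles`). A superuser must run the statements, because `log_min_duration_statement` is a superuser-only setting. The setting applies to new sessions of each role.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier: wrap in double quotes, double any inside."""
    return '"' + name.replace('"', '""') + '"'


def full_logging_statements(db_users, app_users):
    """Build ALTER ROLE statements that log every query for non-app users.

    log_min_duration_statement = 0 logs each completed statement with its
    duration; app users are skipped because app traffic is audited from app logs.
    """
    skip = set(app_users)
    return [
        f"ALTER ROLE {quote_ident(user)} SET log_min_duration_statement = 0;"
        for user in sorted(db_users)
        if user not in skip
    ]
```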
44. Security/Compliance Audit
HIPAA: Document, Configure, Report.
• Is role-based access set up as we expect?
• Who’s viewing and making changes?
• Are we using dedicated instances?
• Are we using encrypted EBS?
• Is traffic encrypted?
• Do we have backups, and are they working?
• Are IAM roles (password decryption) set up correctly?
• Are security groups set up correctly?
Auditing is time-consuming, so automate!
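Several of the checklist questions map directly onto EC2 API calls, so they can run on a schedule. Below is a minimal boto3 sketch, assuming `ec2` is a `boto3.client("ec2")`. It covers encrypted EBS, dedicated tenancy, and security groups open to 0.0.0.0/0 on the Postgres port. It ignores pagination and IPv6 ranges. The remaining questions (roles, backups, IAM) would need their own checks.

```python
DB_PORT = 5432  # PostgreSQL default port; adjust if the cluster uses another


def unencrypted_volumes(ec2):
    """Are we using encrypted EBS? IDs of volumes that are not encrypted."""
    volumes = ec2.describe_volumes()["Volumes"]
    return sorted(v["VolumeId"] for v in volumes if not v.get("Encrypted", False))


def shared_tenancy_instances(ec2):
    """Are we using dedicated instances? IDs of instances on shared (default) tenancy."""
    ids = []
    for reservation in ec2.describe_instances()["Reservations"]:
        for inst in reservation["Instances"]:
            if inst.get("Placement", {}).get("Tenancy") not in ("dedicated", "host"):
                ids.append(inst["InstanceId"])
    return sorted(ids)


def world_open_db_groups(ec2, port=DB_PORT):
    """Are security groups set up correctly? Groups open to 0.0.0.0/0 on the DB port."""
    flagged = []
    for group in ec2.describe_security_groups()["SecurityGroups"]:
        for perm in group.get("IpPermissions", []):
            proto = perm.get("IpProtocol")
            if proto == "-1":  # "-1" means all protocols and ports
                covers_port = True
            elif proto in ("tcp", "6"):
                covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", -1)
            else:
                covers_port = False
            open_to_world = any(
                r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
            )
            if covers_port and open_to_world:
                flagged.append(group["GroupId"])
                break  # one finding per group is enough
    return sorted(flagged)


def audit_report(ec2):
    """Run all checks; an empty list means that check passed."""
    return {
        "unencrypted_volumes": unencrypted_volumes(ec2),
        "shared_tenancy_instances": shared_tenancy_instances(ec2),
        "world_open_db_groups": world_open_db_groups(ec2),
    }
```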