Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017

5,653 views

Published on

Attend this session for a technical deep dive about RDS Postgres and Aurora Postgres. Come hear from Mark Porter, the General Manager of Aurora PostgreSQL and RDS at AWS, as he covers service specific use cases and applications within the AWS worldwide public sector community. Learn More: https://aws.amazon.com/government-education/

Published in: Technology

RDS Postgres and Aurora Postgres | AWS Public Sector Summit 2017

  1. 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Mark Porter, General Manager Amazon Aurora, Amazon RDS June 14, 2017 Managed PostgreSQL in the Amazon Cloud Amazon RDS for PostgreSQL & Amazon Aurora PostgreSQL
  2. 2. RDS – Amazon Relational Database Service We make it easy to set up, operate, and scale relational databases in the cloud. We automate the time-consuming database administration tasks so you can focus on application optimization. We (try to) do it with a higher bar of security, durability, and availability than you can achieve either on-premises or on EC2. You choose the database that is right for you: Amazon Aurora, MySQL, MariaDB, Oracle, Microsoft SQL Server, and PostgreSQL. We give you choices from tiny databases on small CPUs to enterprise-class databases on enterprise hardware. We work to save you money through elasticity, operational efficiencies, and a deep passion for delighting you.
  3. 3. If you run your databases on-premises Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups High availability DB s/w installs OS installation you Scaling Hardware management Operating system management Database management App optimization Database deep dive Design consultation Best practicesWhat you want to spend your time on…
  4. 4. Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups Scaling High availability DB s/w installs OS installation If you run your databases on-premises ? you Hardware management Operating System management Database management App optimization Database deep dive Design consultation Best practicesWhat you want to spend your time on…
  5. 5. And then you move to the cloud… Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups Scaling High availability DB s/w installs OS installation you Database management Your cloud vendor App optimization Database deep dive Design consultation Best practicesWhat you want to spend your time on… Hardware management Operating system management
  6. 6. OS patches DB s/w patches Database backups Scaling High availability DB s/w installs App optimization Power, HVAC, net Rack & stack Server maintenance OS installation And then you move to the cloud… ? Hardware management Operating system managementDatabase management you Your cloud vendor Database deep dive Design consultation Best practicesWhat you want to spend your time on…
  7. 7. And now move to RDS… Power, HVAC, net Rack & stack Server maintenance OS patches DB s/w patches Database backups High availability DB s/w installs OS installation Scaling Database deep dive Design consultation App optimization Best practices Hardware management Operating system management Database management What you want to spend your time on… you Your cloud vendor
  8. 8. A single management platform for all your databases  Single Pane of Glass  Console or API  Monitor via email, SMS
  9. 9. Manageable  Rapid deployment with pre-configured parameters  Patch management  Monitoring and metrics Available and durable Scalability Secure  Encryption in transit and at rest  TDE with Oracle Database and SQL Server Inexpensive Elastic Easy to script RDS key features
  10. 10. DB Snapshots  User-driven snapshots of database  Kept until explicitly deleted Automated Backups  Nightly system snapshots + transaction backup  Enables point-in-time restore to any point in retention period, up to the last 5 minutes  Max retention period = 35 days Backups and recovery
  11. 11. Enterprise-grade fault tolerance solution for production databases With a few clicks, Amazon RDS creates and synchronously maintains a standby in a different Availability Zone Automatic failover (~60-90 sec) in case of:  Loss of availability in primary AZ  Loss of connectivity to primary  Host or storage failure on primary  Vertical scaling of CPU or storage  Software patching High availability: Multi-AZ deployments
  12. 12. Scale nodes vertically up or down  t2.small (1 virtual core, 2GiB)  m3.2xlarge (8 virtual cores, 30GiB)  r3.8xlarge (32 virtual cores, 244GiB)  r4.16xlarge (64 virtual cores, 488GiB)  1.32xlarge (~128 virtual cores, ~1952 GiB) • (Coming soon) Convert storage to PIOPS or GP2  Consistent throughput + low I/O latencies Scale storage vertically with minimal downtime  Increase throughput by spreading data across additional volumes  Independently scale provisioned IOPS Push-button scaling
  13. 13. Horizontal scaling with Read Replicas Add Read Replicas with a single button push • Horizontal scaling of read-heavy workloads • Offload reporting • Cross Region as well (VPN, etc.) Available for MySQL, MariaDB, PostgreSQL, Aurora Overcoming challenges • RDS makes it easy to re-create if fallen behind • Deploy a proxy to round robin requests • With Aurora, promote chosen read replica to master
  14. 14. But RDS is only as good as the engines it manages….
  15. 15. Amazon RDS RDS Platform OPEN SOURCE ENGINES COMMERCIAL ENGINES • Advanced monitoring • Routine maintenance • Push-button scaling • Automatic failover • Backup & recovery • X-region replication • Isolation & security • Industry compliance • Automated patching CLOUD NATIVE ENGINE
  16. 16. Agenda  Why did we build Amazon Aurora?  Why add PostgreSQL compatibility?  Durability and availability architecture  Performance results vs. PostgreSQL  Performance architecture  Announcing Performance Insights  Feature Roadmap  Questions +
  17. 17. Traditional relational databases are hard to scale Multiple layers of functionality all in a monolithic stack SQL Transactions Caching Logging Storage
  18. 18. Traditional approaches to scale databases Each architecture is limited by the monolithic mindset SQL Transactions Caching Logging SQL Transactions Caching Logging Application Application SQL Transactions Caching Logging SQL Transactions Caching Logging Storage Application Storage Storage SQL Transactions Caching Logging Storage SQL Transactions Caching Logging Storage
  19. 19. Reimagining the relational database What if you were inventing the database today? You would break apart the stack You would build something that:  Can scale out…  Is self-healing…  Leverages distributed services…
  20. 20. A service-oriented architecture applied to the database Move the logging and storage layer into a multitenant, scale-out, database-optimized storage service Integrate with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control & monitoring Make it a managed service — using Amazon RDS. Takes care of management and administrative functions. Amazon DynamoDB Amazon SWF Amazon Route 53 Logging + Storage SQL Transactions Caching Amazon S3 1 2 3 Amazon RDS
  21. 21. In 2014, we launched Amazon Aurora with MySQL compatibility. Now, we are adding PostgreSQL compatibility. Customers can now choose how to use the Amazon cloud-optimized relational database, with the performance and availability of commercial databases and the simplicity and cost- effectiveness of open source databases. Making Amazon Aurora better
  22. 22. Start with the customer – Why add PostgreSQL?
  23. 23. Start with the customer – Why add PostgreSQL?
  24. 24. Open source database In active development for 20 years Owned by a foundation, not a single company Permissive innovation-friendly open source license PostgreSQL fast facts Open Source Initiative
  25. 25. High performance out of the box Object-oriented and ANSI-SQL:2008 compatible Most geospatial features of any open source database Supports stored procedures in 12 languages (Java, Perl, Python, Ruby, Tcl, C/C++, its own Oracle-like PL/pgSQL, etc.) PostgreSQL fast facts
  26. 26. Most Oracle-compatible open source database Highest AWS Schema Conversion Tool automatic conversion rates are from Oracle to PostgreSQL PostgreSQL fast facts
  27. 27. What does Aurora PostgreSQL mean? (1 of 2) PostgreSQL 9.6 + Amazon Aurora cloud-optimized storage Performance: Up to 2x+ better performance than PostgreSQL alone Availability: Failover time of < 30 seconds Durability: 6 copies across 3 Availability Zones Read Replicas: Single-digit millisecond lag times on up to 15 replicas Fully compatible with PostgreSQL, now and for the foreseeable future Not a compatibility layer — native PostgreSQL implementation Amazon Aurora Storage
  28. 28. What does Aurora PostgreSQL mean? (2 of 2) Cloud-native security and encryption AWS Key Management Service (KMS) and AWS Identity and Access Management (IAM) Easy to manage with Amazon RDS Easy to load and unload AWS Database Migration Service and AWS Schema Conversion Tool Fully compatible with PostgreSQL, now and for the foreseeable future Not a compatibility layer — native PostgreSQL implementation AWS DMS Amazon RDS PostgreSQL
  29. 29. Amazon Aurora PostgreSQL Durability & availability
  30. 30. Scale-out, distributed, log structured storage Master Replica Replica Replica Availability Zone 1 Shared Storage Volume – Transaction Aware Primary Database Node Read Replica / Secondary Node Read Replica / Secondary Node Read Replica / Secondary Node Availability Zone 2 Availability Zone 3 AWS Region Storage Monitoring Database and Instance Monitoring
  31. 31. Amazon Aurora Storage Engine overview Data is replicated 6 times across 3 Availability Zones Continuous backup to Amazon S3 (built for 11 9s durability) Continuous monitoring of nodes and disks for repair 10 GB segments as unit of repair or hotspot rebalance Quorum system for read/write; latency tolerant Quorum membership changes do not stall writes Storage volume automatically grows up to 64 TB AZ 1 AZ 2 AZ 3 Amazon S3 Database Node Storage Node Storage Node Storage Node Storage Node Storage Node Storage Node Storage Monitoring
  32. 32. What can fail? Segment failures (disks) Node failures (machines) AZ failures (network or data center) Optimizations 4 out of 6 write quorum 3 out of 6 read quorum Peer-to-peer replication for repairs SQL Transaction AZ 1 AZ 2 AZ 3 Caching SQL Transaction AZ 1 AZ 2 AZ 3 Caching Amazon Aurora Storage Engine fault tolerance
  33. 33. Amazon Aurora Replicas Availability Failing database nodes are automatically detected and replaced Failing database processes are automatically detected and recycled Replicas are automatically promoted to primary if needed (failover) Customer specifiable failover order AZ 1 AZ 3AZ 2 Primary Node Primary Node Primary Database Node Primary Node Primary Node Read Replica Primary Node Primary Node Read Replica Database and Instance Monitoring Performance Customer applications can scale out read traffic across read replicas Read balancing across read replicas
  34. 34. Amazon Aurora Continuous Backup Segment snapshot Log records Recovery point Segment 1 Segment 2 Segment 3 Time • Take periodic snapshot of each segment in parallel; stream the logs to Amazon S3 • Backup happens continuously without performance or availability impact • At restore, retrieve the appropriate segment snapshots and log streams to storage nodes • Apply log streams to segment snapshots in parallel and asynchronously
  35. 35. Faster, more predictable failover with Amazon Aurora App RunningFailure Detection DNS Propagation Recovery Database Failure Amazon RDS for PostgreSQL is good: failover times of ~60 seconds Replica-Aware App Running Failure Detection DNS Propagation Recovery Database Failure Amazon Aurora is better: failover times < 30 seconds 1 5 - 2 0 s e c 3 - 1 0 s e c App Running
  36. 36. Amazon Aurora PostgreSQL Performance
  37. 37. PostgreSQL Benchmark system configurations Amazon Aurora AZ 1 EBS EBS EBS 45,000 total IOPS AZ 1 AZ 2 AZ 3 Amazon S3 m4.16xlarge database instance Storage Node Storage Node Storage Node Storage Node Storage Node Storage Node c4.8xlarge client driver m4.16xlarge database instance c4.8xlarge client driver ext4 filesystem m4.16xlarge (64 VCPU, 256GiB), c4.8xlarge (36 VCPU, 60GiB)
  38. 38. Amazon Aurora is >=2x faster on PgBench pgbench “tpcb-like” workload, scale 2000 (30GiB). All configurations run for 60 minutes
  39. 39. Amazon Aurora is 2x-3x faster on SysBench Amazon Aurora delivers 2x the absolute peak of PostgreSQL and 3x PostgreSQL performance at high client counts SysBench oltp (write-only) workload with 30 GB database with 250 tables and 400,000 initial rows per table
  40. 40. Amazon Aurora: Over 120,000 writes/sec OLTP test statistics: queries performed: read: 0 write: 432772903 other:(begin + commit) 216366749 total: 649139652 transactions: 108163671 (30044.73 per sec.) read/write requests: 432772903 (120211.75 per sec.) other operations: 216366749 (60100.40 per sec.) ignored errors: 39407 (10.95 per sec.) reconnects: 0 (0.00 per sec.) sysbench write-only 10GB workload with 250 tables and 25,000 initial rows per table. 10-minute warmup, 3,076 clients Ignored errors are key constraint errors, designed into sysbench Sustained sysbench throughput over 120K writes/sec
  41. 41. Amazon Aurora loads data 3x faster Database initialization is three times faster than PostgreSQL using the standard PgBench benchmark Command: pgbench -i -s 2000 –F 90
  42. 42. Amazon Aurora gives >2x faster response times Response time under heavy write load >2x faster than PostgreSQL (and >10x more consistent) SysBench oltp(write-only) 23GiB workload with 250 tables and 300,000 initial rows per table. 10-minute warmup.
  43. 43. Amazon Aurora has more consistent throughput While running at load, performance is more than three times more consistent than PostgreSQL PgBench “tpcb-like” workload at scale 2000. Amazon Aurora was run with 1280 clients. PostgreSQL was run with 512 clients (the concurrency at which it delivered the best overall throughput)
  44. 44. Amazon Aurora is 3x faster at large scale Scales from 1.5x to 3x faster as database grows from 10 GiB to 100 GiB SysBench oltp(write-only) – 10 GiB with 250 tables & 150,000 rows and 100 GiB with 250 tables & 1,500,000 rows 75,666 27,491 112,390 82,714 0 20,000 40,000 60,000 80,000 100,000 120,000 10GB 100GB writes/sec SysBench Test Size SysBench write-only PostgreSQL Amazon Aurora
  45. 45. Amazon Aurora delivers up to 85x faster recovery SysBench oltp (write-only) 10GiB workload with 250 tables & 150,000 rows Writes per Second 69,620 Writes per Second 32,765 Writes per Second 16,075 Writes per Second 92,415 Recovery Time (seconds) 102.0 Recovery Time (seconds) 52.0 Recovery Time (seconds) 13.0 Recovery Time (seconds) 1.2 0 20 40 60 80 100 120 140 0 20,000 40,000 60,000 80,000 PostgreSQL 12.5GB Checkpoint PostgreSQL 8.3GB Checkpoint PostgreSQL 2.1GB Checkpoint Amazon Aurora No Checkpoints Recovery Time in Seconds Writes Per Second Crash Recovery Time - SysBench 10GB Write Workload Transaction-aware storage system recovers almost instantly
  46. 46. Amazon Aurora with PostgreSQL compatibility Performance by the numbers Measurement Result PgBench >= 2x faster SysBench 2x-3x faster Data Loading 3x faster Response Time >2x faster Throughput Jitter >3x more consistent Throughput at Scale 3x faster Recovery Speed Up to 85x faster
  47. 47. Amazon Aurora PostgreSQL Performance architecture
  48. 48. Do fewer I/Os Minimize network packets Offload the database engine DO LESS WORK Process asynchronously Reduce latency path Use lock-free data structures Batch operations together BE MORE EFFICIENT How does Amazon Aurora achieve high performance? DATABASES ARE ALL ABOUT I/O NETWORK-ATTACHED STORAGE IS ALL ABOUT PACKETS/SECOND HIGH-THROUGHPUT PROCESSING NEEDS CPU AND MEMORY OPTIMIZATIONS
  49. 49. Write I/O traffic in Amazon RDS for PostgreSQL WAL DATA COMMIT LOG & FILES RDS FOR POSTGRESQL WITH MULTI-AZ EBS mirrorEBS mirror AZ 1 AZ 2 Amazon S3 EBS Amazon Elastic Block Store (EBS) Primary Database Node Standby Database Node 1 2 3 4 5 Issue write to Amazon EBS, EBS issues to mirror, acknowledge when both done Stage write to standby instance Issue write to EBS on standby instance I/O FLOW Steps 1, 3, 5 are sequential and synchronous This amplifies both latency and jitter Many types of writes for each user operation OBSERVATIONS T Y P E O F W R IT E
  50. 50. Write I/O traffic in an Amazon Aurora database node AZ 1 AZ 3 Primary Database Node Amazon S3 AZ 2 Read Replica / Secondary Node AMAZON AURORA ASYNC 4/6 QUORUM DISTRIBUTED WRITES DATAAMAZON AURORA + WAL LOG COMMIT LOG & FILES I/O FLOW Only write WAL records; all steps asynchronous No data block writes (checkpoint, cache replacement) 6X more log writes, but 9X less network traffic Tolerant of network and storage outlier latency OBSERVATIONS 2x or better PostgreSQL Community Edition performance on write-only or mixed read-write workloads PERFORMANCE Boxcar log records – fully ordered by LSN Shuffle to appropriate segments – partially ordered Boxcar to storage nodes and issue writes WAL T Y P E O F W R IT E Read Replica / Secondary Node
  51. 51. Asynchronous group commits Read Write Commit Read Read T1 Commit (T1) Commit (T2) Commit (T3) LSN 10 LSN 12 LSN 22 LSN 50 LSN 30 LSN 34 LSN 41 LSN 47 LSN 20 LSN 49 Commit (T4) Commit (T5) Commit (T6) Commit (T7) Commit (T8) LSN GROWTH Durable LSN at head-node COMMIT QUEUE Pending commits in LSN order TIME GROUP COMMIT TRANSACTIONS Read Write Commit Read Read T1 Read Write Commit Read Read Tn TRADITIONAL APPROACH AMAZON AURORA Maintain a buffer of log records to write out to disk Issue write when buffer full or time out waiting for writes First writer has latency penalty when write rate is low Request I/O with first write, fill buffer till write picked up Individual write durable when 4 of 6 storage nodes ACK Advance DB Durable point up to earliest pending ACK
  52. 52. Write I/O traffic in an Amazon Aurora storage node LOG RECORDS Primary Database Node INCOMING QUEUE STORAGE NODE AMAZON S3 BACKUP 1 2 3 4 5 6 7 8 UPDATE QUEUE ACK HOT LOG DATA BLOCKS POINT IN TIME SNAPSHOT GC SCRUB COALESCE SORT GROUP PEER TO PEER GOSSIPPeer Storage Nodes All steps are asynchronous Only steps 1 and 2 are in foreground latency path Input queue is far smaller than PostgreSQL Favors latency-sensitive operations Uses disk space to buffer against spikes in activity OBSERVATIONS I/O FLOW ① Receive record and add to in-memory queue ② Persist record and acknowledge ③ Organize records and identify gaps in log ④ Gossip with peers to fill in holes ⑤ Coalesce log records into new data block versions ⑥ Periodically stage log and new block versions to Amazon S3 ⑦ Periodically garbage collect old versions ⑧ Periodically validate CRC codes on blocks
  53. 53. I/O traffic in Aurora Replicas PAGE CACHE UPDATE Aurora Master 30% Read 70% Write Aurora Replica 100% New Reads Shared Multi-AZ Storage PostgreSQL Master 30% Read 70% Write PostgreSQL Replica 30% New Reads 70% Write SINGLE-THREADED WAL APPLY Data Volume Data Volume Physical: Ship redo (WAL) to Replica Write workload similar on both instances Independent storage Physical: Ship redo (WAL) from Master to Replica Replica shares storage. No writes performed Cached pages have redo applied Advance read view when all commits seen POSTGRESQL READ SCALING AMAZON AURORA READ SCALING
  54. 54. Applications restart faster with survivable caches Cache normally lives inside the operating system database process– and goes away when/if that database dies Aurora moves the cache out of the database process Cache remains warm in the event of a database restart Lets the database resume fully loaded operations much faster Cache lives outside the database process and remains warm across database restarts SQL Transactions Caching SQL Transactions Caching SQL Transactions Caching RUNNING CRASH AND RESTART RUNNING
  55. 55. Amazon Aurora PostgreSQL Announcing Performance Insights
  56. 56. First step: Enhanced Monitoring Released 2016 O/S Metrics Process & thread list Up to 1 second granularity
  57. 57. Second step: Performance Insights Database Engine Performance Tuning
  58. 58. Beyond database load • Lock detection • Execution plans • API access • Included with RDS • 35 days data retention • Support for all RDS database engines in 2017
  59. 59. Amazon Aurora PostgreSQL Roadmap
  60. 60. High Performance Easy to Operate & Compatible High Availability Secure by Design Amazon Aurora with PostgreSQL compatibility — Launch Roadmap  2-4x faster than PostgreSQL on large instances  Up to 64 TB of storage per instance  Write jitter reduction  Near synchronous replicas (<25ms)  Reader endpoint  Enhanced OS monitoring  Performance Insights  Push-button migration  Auto-scaling storage  Continuous backup and PITR  Easy provisioning / patching  All PostgreSQL features  All RDS for PostgreSQL extensions  AWS DMS supported inbound  Failover in less than 30 seconds  Customer specifiable failover order  Up to 15 readable failover targets  Instant crash recovery  Survivable buffer cache  X-region snapshot copy  Encryption at rest (AWS KMS)  Encryption in transit (SSL)  Amazon VPC by default  Row level security
  61. 61. Amazon Aurora Available Durable The Amazon Aurora database family AWS DMS Amazon RDS AWS IAM, KMS, & VPC Amazon S3 Convenient Compatible Automatic Failover Read Replicas X 6 Copies High Performance & Scale Secure Encryption at rest and in transit Enterprise Performance 64TB Storage PostgreSQL MySQL
  62. 62. Thank You! Mark Porter markpor@amazon.com

×