
Deep Dive on Amazon Aurora - Covering New Feature Announcements



Amazon Aurora is a MySQL-compatible relational database engine that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. Amazon Aurora is a disruptive technology in the database space, bringing a new architectural model and distributed system techniques to provide far higher performance, availability and durability than previously available using conventional monolithic database techniques. In this session, we will do a deep-dive into some of the key innovations behind Amazon Aurora, discuss best practices and configurations, and share customer experiences from the field.

Learning Objectives:
• Learn about the capabilities and features of Amazon Aurora and its new features
• Learn about the benefits of Amazon Aurora and how it delivers 5x the performance and 1/10th the cost
• Learn about the different use cases
• Learn how to get started using Amazon Aurora


  1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Steve Abraham – Solutions Architect. December 14, 2016. Deep Dive on Amazon Aurora, Including New Features
  2. Agenda  What is Aurora?  Review of Aurora performance  New performance enhancements  Review of Aurora availability  New availability enhancements  Other recent and upcoming feature enhancements
  3. What is Amazon Aurora? An open-source-compatible relational database with the performance and availability of commercial databases and the simplicity and cost-effectiveness of open-source databases.
  4. Performance
  5. Scaling with instance sizes: Aurora scales with instance size for both reads and writes. (Charts: write and read performance for Aurora vs. MySQL 5.6 and MySQL 5.7 across instance sizes.)
  6. Real-life data – gaming workload: Aurora vs. RDS MySQL on r3.4xlarge, Multi-AZ. Aurora is 3x faster on the r3.4xlarge.
  7. How did we achieve this? Do less work: do fewer IOs, minimize network packets, cache prior results, offload the database engine. Be more efficient: process asynchronously, shorten the latency path, use lock-free data structures, batch operations together. Databases are all about I/O; network-attached storage is all about packets/second; high-throughput processing is all about context switches.
  8. IO traffic in MySQL. MySQL with a replica writes several types of data – binlog, data blocks, double-write buffer, log, and FRM files – in AZ 1:  Issue the write to EBS; EBS issues it to its mirror and ACKs when both are done  Stage the write to the standby instance through DRBD  Issue the write to EBS (and its mirror) on the standby instance in AZ 2, with backups going to Amazon S3. Observations:  Steps 1, 3, and 4 are sequential and synchronous  This amplifies both latency and jitter  There are many types of writes for each user operation  Data blocks have to be written twice to avoid torn writes. Performance (30-minute SysBench write-only workload, 100 GB dataset, RDS Multi-AZ, 30K PIOPS):  780K transactions  7,388K I/Os per million transactions (excludes mirroring and standby)  Average of 7.4 I/Os per transaction
  9. IO traffic in Aurora. The primary instance issues asynchronous 4/6-quorum distributed writes of redo log records across three AZs, with replica instances in AZ 2 and AZ 3 and continuous backup to Amazon S3; of the MySQL write types (binlog, data, double-write, log, FRM files), only the log is written. IO flow:  Boxcar redo log records – fully ordered by LSN  Shuffle them to the appropriate segments – partially ordered  Boxcar them to storage nodes and issue the writes. Observations:  Only redo log records are written; all steps are asynchronous  No data block writes (checkpoint, cache replacement)  6x more log writes, but 9x less network traffic  Tolerant of network and storage outlier latency. Performance:  27,378K transactions (35x more)  950K I/Os per 1M transactions at 6x amplification (7.7x less)
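The 4/6 write quorum on this slide can be sketched in a few lines of Python. This is a toy model, not Aurora's implementation: `send` is a hypothetical callback that persists one copy of the record on one node and returns success.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

QUORUM, COPIES = 4, 6  # a write is durable once 4 of the 6 segment copies ACK

def quorum_write(record, nodes, send):
    """Issue a redo log record to all six copies in parallel and return
    as soon as a 4/6 write quorum has acknowledged it."""
    pool = ThreadPoolExecutor(max_workers=COPIES)
    futures = [pool.submit(send, node, record) for node in nodes]
    acks = 0
    for fut in as_completed(futures):
        if fut.result():
            acks += 1
            if acks >= QUORUM:
                pool.shutdown(wait=False)  # stragglers finish in background
                return True
    pool.shutdown(wait=False)
    return False  # fewer than 4 copies succeeded; write is not durable
```

Because the function returns at the fourth ACK, one slow or failed copy does not add to write latency, which is the outlier-tolerance property noted above.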
  10. IO traffic in Aurora (storage node). Log records from the primary instance enter the storage node's incoming queue; the node ACKs once they reach the hot log, then sorts, groups, and gossips them with peer storage nodes, coalesces them into data blocks, stages point-in-time snapshots to S3 for backup, and periodically garbage-collects and scrubs. IO flow: 1. Receive the record and add it to an in-memory queue. 2. Persist the record and ACK. 3. Organize records and identify gaps in the log. 4. Gossip with peers to fill in holes. 5. Coalesce log records into new data block versions. 6. Periodically stage the log and new block versions to S3. 7. Periodically garbage-collect old versions. 8. Periodically validate CRC codes on blocks. Observations:  All steps are asynchronous  Only steps 1 and 2 are in the foreground latency path  The input queue is 46x smaller than MySQL's (unamplified, per node)  Latency-sensitive operations are favored  Disk space is used to buffer against spikes in activity
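Step 4 of the flow above – gossiping with peers to fill holes – presupposes that a node can identify the gaps in the log it has received. A minimal sketch of that gap detection, assuming for simplicity that LSNs are consecutive integers:

```python
def find_log_gaps(received_lsns):
    """Given the set of LSNs a storage node has persisted, return the
    missing (start, end) ranges it should request from peers via gossip."""
    lsns = sorted(received_lsns)
    gaps = []
    for prev, cur in zip(lsns, lsns[1:]):
        if cur > prev + 1:            # a hole between two persisted records
            gaps.append((prev + 1, cur - 1))
    return gaps
```

For example, a node that has persisted LSNs {1, 2, 5, 6, 9} would ask its peers for ranges (3, 4) and (7, 8).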
  11. IO traffic in Aurora replicas. MySQL read scaling is logical: SQL statements are shipped to the replica through a single-threaded binlog apply, so the write workload is similar on both instances (master: 30% reads / 70% writes; replica: 30% new reads / 70% writes), storage is independent, and data can drift between master and replica. Aurora read scaling is physical: redo is shipped from the master (30% reads / 70% writes) to the replica (100% new reads), the replica shares the Multi-AZ storage volume and performs no writes, cached pages have redo applied as page-cache updates, and the read view advances once all commits have been seen.
  12. Real-life data – read replica latency. “In MySQL, we saw replica lag spike to almost 12 minutes, which is almost absurd from an application’s perspective. With Aurora, the maximum read replica lag across 4 replicas never exceeded 20 ms.”
  13. Asynchronous group commits. Traditional approach: maintain a buffer of log records to write out to disk; issue the write when the buffer is full or a timeout expires while waiting for writes; the first writer pays a latency penalty when the write rate is low. Amazon Aurora: request the I/O with the first write and keep filling the buffer until the write is picked up; an individual write is durable when 4 of 6 storage nodes ACK; the DB durable point (the durable LSN at the head node) advances up to the earliest pending ACK, with pending commits held in a commit queue and acknowledged in LSN order as the LSN grows.
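A toy model of the commit queue described above: commits wait in LSN order and are released only once the volume-durable LSN advances past them. Names like `on_quorum_ack` are invented here for illustration.

```python
import heapq

class CommitQueue:
    """Sketch of asynchronous group commit: commits are queued by LSN and
    acknowledged only after storage reports that LSN durable."""

    def __init__(self):
        self.pending = []      # min-heap of (commit_lsn, txn_id)
        self.durable_lsn = 0   # highest LSN acknowledged by a 4/6 quorum

    def add_commit(self, lsn, txn):
        heapq.heappush(self.pending, (lsn, txn))

    def on_quorum_ack(self, durable_lsn):
        """Storage reports a new durable LSN; release, in LSN order,
        every pending commit at or below it."""
        self.durable_lsn = max(self.durable_lsn, durable_lsn)
        released = []
        while self.pending and self.pending[0][0] <= self.durable_lsn:
            released.append(heapq.heappop(self.pending)[1])
        return released
```

Note that no commit ever waits on a timer or a full buffer: each one is released as soon as its own LSN is covered by a quorum ACK, which is why the first writer pays no penalty at low write rates.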
  14. Adaptive thread pool. Standard MySQL uses one thread per connection, which doesn’t scale with connection count; MySQL EE assigns connections to thread groups but requires careful stall-threshold tuning. In the Aurora thread model, re-entrant client connections are multiplexed onto active threads: a kernel-space epoll() inserts work into a latch-free task queue, the thread pool is sized dynamically, and the engine gracefully handles 5,000+ concurrent client sessions on an r3.8xlarge.
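The pooled model can be illustrated with a small sketch: a fixed number of workers drain a shared task queue, so thousands of connections do not each need a dedicated thread. Python's `queue.Queue` stands in here for Aurora's latch-free queue, which it is not.

```python
import queue
import threading

def run_pool(tasks, num_workers):
    """Run many tasks (stand-ins for ready connection events) on a
    fixed-size worker pool draining one shared queue."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:          # sentinel: no more work
                return
            out = task()
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:                # many tasks, few threads
        q.put(task)
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results
```

The key property is that worker count is independent of task (connection) count, so memory and scheduler pressure stay flat as sessions grow.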
  15. Aurora lock management, needed to support many concurrent sessions and high update throughput. Where the MySQL lock manager serializes access, the Aurora lock manager keeps the same locking semantics as MySQL while allowing concurrent access to lock chains: scans, deletes, and inserts proceed in parallel, multiple scanners are allowed in an individual lock chain, and deadlock detection is lock-free.
  16. New Performance Enhancements
  17. Cached read performance: 25% throughput gain. Catalog concurrency: improved data dictionary synchronization and cache eviction. NUMA-aware scheduler: the Aurora scheduler is now NUMA-aware, which helps it scale on multi-socket instances. Read views: Aurora now uses a latch-free concurrent read-view algorithm to construct read views. (Chart: thousands of read requests/sec for MySQL 5.6, MySQL 5.7, Aurora 2015, and Aurora 2016; r3.8xlarge instance, <1 GB dataset, SysBench.)
  18. Non-cached read performance: 10% throughput gain. Smart scheduler: the Aurora scheduler now dynamically assigns threads between IO-heavy and CPU-heavy workloads. Smart selector: Aurora reduces read latency by reading the copy of the data on the storage node with the best performance. Logical read-ahead (LRA): we avoid read IO waits by prefetching pages based on their order in the B-tree. (Chart: thousands of requests/sec for MySQL 5.6, MySQL 5.7, Aurora 2015, and Aurora 2016; r3.8xlarge instance, 1 TB dataset, SysBench.)
  19. Hot row contention. Highly contended workloads had high memory and CPU usage in the lock manager.  Release 1.9 (November) added lock compression (a bitmap for hot locks)  1.9 also replaced spinlocks with a blocking futex – up to a 12x reduction in CPU and a 3x improvement in throughput  December uses dynamic programming to release locks, reducing the cost from O(totalLocks * waitLocks) to O(totalLocks). Throughput on Percona TPC-C 100 improved 29x (from 1,452 txns/min to 42,181 txns/min).
  20. Hot row contention (numbers in tpmC, measured with release 1.10 on an r3.8xlarge; MySQL numbers use RDS and EBS with 30K PIOPS). Percona TPC-C – 10 GB: 500 connections – MySQL 5.6: 6,093, MySQL 5.7: 25,289, Aurora: 73,955 (2.92x improvement); 5,000 connections – MySQL 5.6: 1,671, MySQL 5.7: 2,592, Aurora: 42,181 (16.3x). Percona TPC-C – 100 GB: 500 connections – MySQL 5.6: 3,231, MySQL 5.7: 11,868, Aurora: 70,663 (5.95x); 5,000 connections – MySQL 5.6: 5,575, MySQL 5.7: 13,005, Aurora: 30,221 (2.32x).
  21. Insert performance.  Aurora accelerates batch inserts sorted by primary key by caching the cursor position in an index traversal: MySQL traverses the B-tree from the root for every insert, while Aurora’s inserts avoid the traversal  This avoids contention when acquiring latches while navigating down the tree  It dynamically turns itself on or off based on the data pattern  It is bi-directional and works across all insert statements: LOAD INFILE, INSERT INTO SELECT, INSERT INTO REPLACE, and multi-value inserts.
  22. Faster index build: 2-4x better than MySQL 5.6 or MySQL 5.7.  MySQL 5.6 leverages Linux read-ahead, but that requires consecutive block addresses in the B-tree, and it inserts entries top-down into the new B-tree, causing splits and lots of logging  Aurora’s scan prefetches blocks based on their position in the tree, not their block address  Aurora builds the leaf blocks first and then the branches of the tree  No splits during the build  Each page is touched only once  One log record per page. (Chart: index build time in hours for RDS MySQL 5.6, RDS MySQL 5.7, and Aurora 2016 on r3.large/10 GB, r3.8xlarge/10 GB, and r3.8xlarge/100 GB datasets.)
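A bottom-up build like the one described can be sketched as follows: pack the sorted keys into full leaf pages, then derive each branch level from the first key of its children, so no page is ever split or rewritten. The fanout and page layout here are simplified assumptions, not Aurora's actual format.

```python
def build_bottom_up(sorted_keys, fanout=4):
    """Build a B-tree bottom-up from sorted keys: leaves first, then
    branches. Returns (root_separators, all_pages_written)."""
    # Pack keys into full leaf pages -- each leaf is written exactly once.
    level = [sorted_keys[i:i + fanout] for i in range(0, len(sorted_keys), fanout)]
    pages = list(level)
    # Build each branch level from the first key of every child page;
    # since children already exist, no splits ever occur.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            children = level[i:i + fanout]
            parents.append([child[0] for child in children])
        pages.extend(parents)
        level = parents
    return level[0], pages
```

Contrast with a top-down insert loop, where each key descends from the root and full pages must split, touching (and logging) the same pages repeatedly.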
  23. Why a spatial index? We need to store and reason about spatial data, e.g. “find all people within 1 mile of a hospital.” Spatial data is multi-dimensional, while B-tree indexes are one-dimensional. Aurora supports spatial data types (point/polygon) through the GEOMETRY data types inherited from MySQL 5.6, but that spatial data cannot be indexed there. Two possible approaches: a specialized access method for spatial data (e.g., an R-tree), or mapping spatial objects to one-dimensional space and storing them in a B-tree via a space-filling curve over a grid approximation. (Diagram: spatial relations between objects A and B – COVERS/COVEREDBY, CONTAINS/INSIDE, TOUCH, OVERLAPBDYINTERSECT, OVERLAPBDYDISJOINT, EQUAL, DISJOINT, ON.)
  24. Spatial indexes in Aurora. Challenges with R-trees (used in MySQL 5.7): keeping them efficient while balanced; rectangles should not overlap or cover empty space; they degenerate over time; and re-indexing is expensive. Aurora instead uses a Z-index, a dimensionally ordered space-filling curve:  Uses a regular B-tree for storing and indexing  Removes sensitivity to the resolution parameter  Adapts to the granularity of the actual data without user declaration  E.g. as in GeoWave (National Geospatial-Intelligence Agency).
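A Z-order curve maps a 2-D grid point to a single B-tree key by interleaving the bits of its coordinates. A minimal sketch of that mapping (the general Z-order technique, not Aurora's exact encoding):

```python
def z_order(x, y, bits=16):
    """Interleave the bits of two grid coordinates into one Z-order key,
    mapping a 2-D point onto a 1-D space-filling curve that an ordinary
    B-tree can index."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return key
```

Points that are close in space tend to share their high-order key bits, so a B-tree range scan over Z-order keys touches spatially clustered data, which is what makes a one-dimensional index usable for multi-dimensional queries.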
  25. Spatial index benchmarks: SysBench with points and polygons. (Chart: select-only reads/sec and write-only writes/sec, Aurora vs. MySQL 5.7; r3.8xlarge using SysBench on a <1 GB dataset; write-only: 4,000 clients, select-only: 2,000 clients, ST_EQUALS.)
  26. Availability: “Performance only matters if your database is up.”
  27. Storage durability.  Storage volume automatically grows up to 64 TB  Quorum system for reads/writes; latency tolerant  Peer-to-peer gossip replication to fill in holes  Continuous backup to S3 (built for 11 9s durability)  Continuous monitoring of nodes and disks for repair  10 GB segments as the unit of repair or hotspot rebalance  Quorum membership changes do not stall writes. (Diagram: storage spread across AZ 1, AZ 2, and AZ 3, with backup to Amazon S3.)
  28. More replicas.  An Aurora cluster contains a primary node and up to fifteen replicas  Failing database nodes are automatically detected and replaced  Failing database processes are automatically detected and recycled  Customer applications may scale out read traffic across replicas  A replica is automatically promoted on a persistent outage. (Diagram: primary node in AZ 1 with secondary nodes spread across AZ 2 and AZ 3.)
  29. Continuous backup.  Take periodic snapshots of each segment in parallel; stream the redo logs to Amazon S3  Backup happens continuously without performance or availability impact  At restore, retrieve the appropriate segment snapshots and log streams to the storage nodes  Apply the log streams to the segment snapshots in parallel and asynchronously. (Diagram: per-segment snapshots plus log records covering the recovery point.)
  30. Instant crash recovery. Traditional databases have to replay logs since the last checkpoint, typically 5 minutes apart; replay is single-threaded in MySQL and requires a large number of disk accesses, so a crash at T0 requires re-applying the redo log since the last checkpoint. In Amazon Aurora, the underlying storage replays redo records on demand as part of a disk read – parallel, distributed, and asynchronous – so a crash at T0 results in redo logs being applied to each segment on demand, with no replay at startup.
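On-demand replay can be modeled very simply: a page read applies only that page's redo records on top of its last materialized version, so no global replay is needed before the database opens. The record and snapshot formats below are invented for illustration.

```python
def read_page(page_id, snapshots, redo_log):
    """Materialize a page at read time: start from its last coalesced
    block version and apply only the redo records for this page."""
    page = dict(snapshots[page_id])          # last coalesced version
    for record in redo_log:
        if record["page"] == page_id:        # other pages' redo is ignored
            page[record["key"]] = record["value"]
    return page
```

Because each segment does this independently when a page is first touched, recovery cost is spread across ordinary reads instead of being paid up front in a single-threaded replay.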
  31. Survivable caches.  We moved the cache out of the database process  The cache remains warm in the event of a database restart  This lets you resume fully loaded operations much faster  Instant crash recovery + survivable cache = quick and easy recovery from DB failures. (Diagram: the caching process sits outside the DB process, alongside SQL and transactions, and stays warm across a database restart.)
  32. Faster fail-over. MySQL: app running → DB failure → failure detection → DNS propagation → recovery → app running again, typically 15-20 seconds. Aurora with the MariaDB driver: DB failure → failure detection → recovery in 3-20 seconds.
  33. Database fail-over time (histograms of observed fail-over durations): 0-5 s – 30% of fail-overs; 5-10 s – 40%; 10-20 s – 25%; 20-30 s – 5%.
  34. New Availability Enhancements
  35. Availability is about more than HW failures. You also incur availability disruptions when you: 1. Patch your database software 2. Modify your database schema 3. Perform large-scale database reorganizations 4. Recover from DBA errors requiring database restores
  36. Zero-downtime patching. Before ZDP, user sessions terminate during patching as the old DB engine is replaced by the new one. With ZDP, networking and application state are parked while the engine is swapped underneath them, so user sessions remain active through patching; the storage service is unaffected.
  37. Zero-downtime patching – current constraints. We fall back to the current patching model when we can’t park connections:  Long-running queries  Open transactions  Binlog enabled  Parameter changes pending  Temporary tables open  Locked tables  SSL connections open  Read replica instances. We are working on addressing the above.
  38. Database cloning: create a copy of a database without duplicate storage costs.  Creation of a clone is nearly instantaneous – we don’t copy data  Data copy happens only on write – when original and cloned volume data differ. Typical use cases: clone a production DB to run tests, reorganize a database, or save a point-in-time snapshot for analysis without impacting the production system.
  39. How does it work? At clone time, both the source database and the cloned database reference the same physical pages on the shared distributed storage system, so no data is copied.
  40. How does it work? (contd.) As the databases diverge, new page versions are written privately for whichever database changed them, while both databases continue to reference the pages they still have in common on the shared distributed storage system.
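The copy-on-write behavior of cloning can be sketched with a clone object that reads through to the source until a page is written. This is a simplification: in Aurora the sharing is tracked in the storage tier, not in the database instance.

```python
class ClonedVolume:
    """Sketch of copy-on-write cloning: the clone starts by referencing
    the source's pages; a private copy exists only for pages it wrote."""

    def __init__(self, source_pages):
        self.source = source_pages   # shared pages, read-only from here
        self.own = {}                # pages that diverged via clone writes

    def read(self, page_id):
        # Prefer the clone's private version; fall back to the shared page.
        return self.own.get(page_id, self.source[page_id])

    def write(self, page_id, data):
        # First write to a page allocates a private copy; the source's
        # version is untouched, so storage cost grows only with divergence.
        self.own[page_id] = data
```

Until either side writes, the clone consumes essentially no extra storage, which is why creation is nearly instantaneous.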
  41. Online DDL: Aurora vs. MySQL. MySQL:  Full table copy; rebuilds all indexes – can take hours or days to complete  Needs temporary space for DML operations  The DDL operation impacts DML throughput  A table lock is applied to apply DML changes. Amazon Aurora:  We add an entry to a metadata table (table name, operation, column name, timestamp) and use schema versioning to decode the block  We added a modify-on-write primitive that upgrades a block to the latest schema when it is modified  We currently support adding a NULLable column at the end of a table  The priority is to support other add-column cases, drop/reorder, and modifying data types.
  42. Online DDL performance. On r3.large: 10 GB table – Aurora 0.27 sec, MySQL 5.6 3,960 sec, MySQL 5.7 1,600 sec; 50 GB table – 0.25 sec vs. 23,400 sec vs. 5,040 sec; 100 GB table – 0.26 sec vs. 53,460 sec vs. 9,720 sec. On r3.8xlarge: 10 GB table – Aurora 0.06 sec, MySQL 5.6 900 sec, MySQL 5.7 1,080 sec; 50 GB table – 0.08 sec vs. 4,680 sec vs. 5,040 sec; 100 GB table – 0.15 sec vs. 14,400 sec vs. 9,720 sec.
  43. What is online point-in-time restore? Online point-in-time restore is a quick way to bring the database to a particular point in time without having to restore from backups:  Rewind the database to quickly recover from unintentional DML/DDL operations  Rewind multiple times to determine the desired point in the database state – for example, quickly iterate over schema changes without having to restore multiple times. (Diagram: rewinding from t2 back to t1, then from t4 back to t3, with the rewound-past log ranges made invisible.)
  44. Online vs. offline point-in-time restore (PiTR). Online PiTR:  The operation changes the state of the current DB  The current DB is available within seconds, even for multi-terabyte DBs  No additional storage cost, as the current DB is restored to a prior point in time  Multiple iterative online PiTRs are practical  The rewind has to be within the allowed rewind period, based on purchased rewind storage  Cross-region online PiTR is not supported. Offline PiTR:  Creates a new DB at the desired point in time from the backup of the current DB  The new DB instance takes hours to restore for multi-terabyte DBs  Each restored DB is billed for its own storage  Multiple iterative offline PiTRs are time-consuming  The restore has to be within the configured backup window or from snapshots  Aurora supports cross-region offline PiTR.
  45. How does it work?  Take periodic snapshots within each segment in parallel and store them locally  At rewind time, each segment picks the previous local snapshot and applies its log stream to the snapshot to produce the desired state of the DB. (Diagram: per-segment snapshots and log records in the storage segments, relative to the rewind point.)
  46. How does it work? (contd.) Logs within the log stream are made visible or invisible based on the branch within the LSN tree, providing a consistent view of the DB:  The first rewind, performed at t2 to bring the DB back to t1, makes the logs between t1 and t2 invisible  The second rewind, performed at t4 to bring the DB back to t3, makes the logs between t3 and t4 invisible as well.
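The visibility rule can be modeled by treating each rewind as hiding the LSN range it skipped over. A toy model, with an invented representation: `rewinds` holds (performed_at, rewound_to) LSN pairs, and log records are (lsn, payload) tuples.

```python
def visible_records(log, rewinds):
    """Return the log records that remain visible after a series of
    rewinds: a rewind performed at LSN `at` back to LSN `to` hides
    every record with an LSN in the half-open range (to, at]."""
    hidden = [(to, at) for at, to in rewinds]

    def is_visible(lsn):
        return not any(lo < lsn <= hi for lo, hi in hidden)

    return [rec for rec in log if is_visible(rec[0])]
```

Filtering visibility instead of deleting records is what makes repeated rewinds cheap: every branch of history stays on disk, and only the view changes.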
  47. Removing Blockers
  48. My applications require PostgreSQL.  Amazon Aurora PostgreSQL compatibility is now in preview  Same underlying scale-out, 3-AZ, 6-copy, fault-tolerant, self-healing, expanding database-optimized storage tier  Integrated with a PostgreSQL 9.6-compatible database.
  49. An r3.large is too expensive for my use case. Instance pricing (Virginia, hourly): db.t2.medium – 2 vCPU, 4 GiB, $0.082; db.r3.large – 2 vCPU, 15.25 GiB, $0.29; db.r3.xlarge – 4 vCPU, 30.5 GiB, $0.58; db.r3.2xlarge – 8 vCPU, 61 GiB, $1.16; db.r3.4xlarge – 16 vCPU, 122 GiB, $2.32; db.r3.8xlarge – 32 vCPU, 244 GiB, $4.64. T2 RI discounts: up to 34% with a 1-year RI, up to 57% with a 3-year RI. T2.small coming in Q1 2017.
  50. My databases need to meet certifications.  Amazon Aurora gives each database instance IP firewall protection  Aurora offers transparent encryption at rest and SSL protection for data in transit  Amazon VPC lets you isolate and control network configuration and connect securely to your IT infrastructure  AWS Identity and Access Management provides resource-level permission controls (*New*).
  51. Aurora auditing. The MariaDB server audit plugin creates the event string (DDL, DML, query, DCL, connect) and writes to file serially; Aurora’s native audit support creates event strings in parallel and feeds them through a latch-free queue to multiple file writers, sustaining over 500K events/sec. SysBench select-only workload on an 8xlarge instance: audit off – MySQL 5.7: 95K vs. Aurora: 615K (6.47x); audit on – 33K vs. 525K (15.9x).
  52. AWS ecosystem.  Lambda: generate Lambda events from Aurora stored procedures  S3: load data from S3; store snapshots and backups in S3  IAM: use IAM roles to manage database access control (*New*)  CloudWatch: upload systems metrics and audit logs to CloudWatch (*New* Q1).
  53. MySQL compatibility: MySQL 5.6 / InnoDB compatible.  No application compatibility issues reported in the last 18 months  MySQL ISV applications – business intelligence, data integration, query and monitoring – run pretty much as is  Working on 5.7 compatibility: running a bit slower than expected, but we hope to make it available soon; we back-ported 81 fixes from different MySQL releases. “We ran our compatibility test suites against Amazon Aurora and everything just worked.” – Dan Jewett, VP, Product Management at Tableau
  54. Timeline of performance, availability, security, and ecosystem features – available now, in December, and in Q1: PCI/DSS, HIPAA/BAA, fast online schema change, managed MySQL-to-Aurora replication, cross-region snapshot copy, online point-in-time restore, database cloning, zero-downtime patching, spatial indexing, lock compression, replacing spinlocks with a blocking futex, faster index build, Aurora auditing, IAM integration, copy-on-write volume, T2.medium, T2.small, and CloudWatch for metrics and audit.
  55. Thank you! Steve Abraham –