Adventures in RDS Load Testing
Mike Harnish, KSM Technology Partners LLC
Objectives
Empirical basis for evaluation
 - Of RDS as a platform for future development
 - Of performance of different configurations

Platform for future load testing
 - Of different configurations, schemas, and load profiles

Not strictly scientific
 - Did not try to isolate all possible sources of variability

Not benchmarking
Not exhaustive
 - Some configurations not tested
Why RDS? Why Oracle?
Why not DynamoDB/NoSQL?
 - Nothing at all against them
 - Testing platform design does not exclude them

Why not MySQL/SQLServer?
 - Ran out of time

Why not PostgreSQL?
 - Ran out of time, but would be my next choice

RDBMS migration path
How We Tested
Provision RDS servers
Generate test data
Introduce distributed load
 - Persistent and relentless
 - Rough-grained “batches” of work
 - For a finite number of transactions

Monitor servers
 - With CloudWatch (a metric-query sketch follows this list)

Analyze per-batch statistics
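The monitoring charts later in the deck come from the standard AWS/RDS CloudWatch metrics (WriteIOPS, ReadIOPS, DiskQueueDepth, and friends). As a hedged illustration of the kind of query behind them, here is a minimal AWS SDK for Java (v1) sketch that pulls one-minute WriteIOPS averages; the instance identifier and time window are placeholders, and this is not the tooling used for the talk.

import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import java.util.Date;

public class RdsMetrics {
    public static void main(String[] args) {
        AmazonCloudWatch cw = AmazonCloudWatchClientBuilder.defaultClient();

        // One-minute averages of WriteIOPS over the last hour for a hypothetical instance.
        GetMetricStatisticsRequest req = new GetMetricStatisticsRequest()
                .withNamespace("AWS/RDS")
                .withMetricName("WriteIOPS")             // also: ReadIOPS, DiskQueueDepth, CPUUtilization
                .withDimensions(new Dimension()
                        .withName("DBInstanceIdentifier")
                        .withValue("loadgen-victim-01")) // placeholder instance id
                .withStartTime(new Date(System.currentTimeMillis() - 3_600_000L))
                .withEndTime(new Date())
                .withPeriod(60)
                .withStatistics("Average");

        for (Datapoint dp : cw.getMetricStatistics(req).getDatapoints()) {
            System.out.printf("%s  %.1f%n", dp.getTimestamp(), dp.getAverage());
        }
    }
}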
RDS Server Configurations
db.m2.4xlarge
 - High-Memory Quadruple Extra Large DB Instance: 68 GB of memory, 26 ECUs (8 virtual cores with 3.25 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
 - At 3000 and 1000 PIOPS
 - $3.14 base/hour, Oracle license included
 - The largest supported instance type for Oracle

db.m1.xlarge
 - Extra Large DB Instance: 15 GB of memory, 8 ECUs (4 virtual cores with 2 ECUs each), 64-bit platform, High I/O Capacity, Provisioned IOPS Optimized: 1000Mbps
 - No PIOPS
 - $1.13 base/hour, license included, on-demand
Test Schema
CREATE TABLE loadgen.account(
  account_id   NUMBER(9) CONSTRAINT pk_account PRIMARY KEY,
  balance      NUMBER(6,2) DEFAULT 0 NOT NULL);

CREATE TABLE loadgen.tx(
  tx_id        NUMBER(9) CONSTRAINT pk_tx PRIMARY KEY,
  account_id   NUMBER(9) CONSTRAINT fk_tx_account
               REFERENCES loadgen.account(account_id),
  amount       NUMBER(6,2) NOT NULL,
  description  VARCHAR2(100),
  tx_timestamp TIMESTAMP DEFAULT SYSDATE);

CREATE INDEX loadgen.idx_tx_lookup ON loadgen.tx(account_id, tx_timestamp)
…
CREATE SEQUENCE loadgen.seq_tx_id
…
Baseline Test Data
5,037,003 accounts
353,225,005 transactions
 - Roughly 70 initial transactions per account (a loading sketch follows below)

300GB provisioned storage
 - Mostly to get higher PIOPS

Using ~67GB of it
 - According to CloudWatch
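The deck doesn't show how this baseline data was loaded. Below is a minimal JDBC sketch of one way to seed data with this shape, using batched inserts through the schema above; the connection URL, credentials, account-id range, and commit interval are illustrative assumptions, not the values used for these runs.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ThreadLocalRandom;

public class SeedLoader {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and credentials, not the ones from the talk.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@rds-endpoint:1521:ORCL", "loadgen", "secret")) {
            con.setAutoCommit(false);
            try (PreparedStatement acct = con.prepareStatement(
                     "INSERT INTO loadgen.account (account_id, balance) VALUES (?, 0)");
                 PreparedStatement tx = con.prepareStatement(
                     "INSERT INTO loadgen.tx (tx_id, account_id, amount, description) " +
                     "VALUES (loadgen.seq_tx_id.NEXTVAL, ?, ?, 'seed')")) {

                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (long accountId = 10001; accountId <= 20000; accountId++) { // small illustrative range
                    acct.setLong(1, accountId);
                    acct.addBatch();
                    for (int i = 0; i < 70; i++) {                // ~70 seed transactions per account
                        tx.setLong(1, accountId);
                        tx.setBigDecimal(2, BigDecimal.valueOf(rnd.nextInt(-99999, 100000), 2));
                        tx.addBatch();
                    }
                    if (accountId % 500 == 0) {                   // flush and commit periodically
                        acct.executeBatch();
                        tx.executeBatch();
                        con.commit();
                    }
                }
                acct.executeBatch();
                tx.executeBatch();
                con.commit();
            }
        }
    }
}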
Test Environment
c1.xlarge
 - 8 vCPU
 - 20 ECU
 - 7 GB memory
 - High network performance
 - JDBC to the RDS instances

t1.micro
 - SQLPlus

RDS Instances
Processing View
Lightweight batch specs (2,000 batches by 500 tx; sketched in code below), e.g.:

{"targetReadRatio":3,"targetWriteRatio":1,"size":500,"run":"run01","id":13,"accountRange":{"start":10001,"count":5040800}}

Flow: Producer -> Tx Queue -> Consumers (12-24) -> RDS Instances (the victims)
      Consumers -> Batch Performance Stats (also JSON formatted – tl;dr) -> Stats Queue -> Stats Collector -> .csv

 - 1M JDBC tx/run
 - 3 read : 1 write ratio
 - Randomized over the known set of pre-loaded accounts
 - Commit per tx (not per batch)
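To make the flow concrete, here is a hedged Java sketch of the batch spec and a consumer loop in that shape. The class and field names mirror the JSON above and the chart series that follow; the queue plumbing, the TxWorker interface, and all of the wiring are my reconstruction, not the actual harness.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ThreadLocalRandom;

// Mirrors the batch-spec JSON above.
class BatchSpec {
    int id;
    String run;
    int size;                 // transactions per batch (500 in these runs)
    int targetReadRatio;      // 3
    int targetWriteRatio;     // 1
    long accountRangeStart;   // e.g. 10001
    long accountRangeCount;   // e.g. 5040800
}

// Per-batch stats reported to the collector; field names follow the chart series.
class BatchStats {
    int batchId;
    long elapsedTimeMillis;
    double totalTxPerSecond;
    double avgTxLatencyMs;     // latency fields would be filled from per-tx timings (omitted here)
    double medianWriteLatency;
    double highWriteLatency;
}

// Per-transaction JDBC work; see the Transaction Specifications slide.
interface TxWorker {
    void readTransaction(long accountId);
    void writeTransaction(long accountId);
}

class BatchConsumer implements Runnable {
    private final BlockingQueue<BatchSpec> txQueue;
    private final BlockingQueue<BatchStats> statsQueue;
    private final TxWorker worker;

    BatchConsumer(BlockingQueue<BatchSpec> txQueue, BlockingQueue<BatchStats> statsQueue, TxWorker worker) {
        this.txQueue = txQueue;
        this.statsQueue = statsQueue;
        this.worker = worker;
    }

    @Override public void run() {
        try {
            while (true) {
                BatchSpec spec = txQueue.take();
                long start = System.currentTimeMillis();
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                int ratioTotal = spec.targetReadRatio + spec.targetWriteRatio;
                for (int i = 0; i < spec.size; i++) {
                    // Randomize over the known set of pre-loaded accounts.
                    long account = spec.accountRangeStart + rnd.nextLong(spec.accountRangeCount);
                    if (rnd.nextInt(ratioTotal) < spec.targetReadRatio) {
                        worker.readTransaction(account);   // 3 out of 4 on average
                    } else {
                        worker.writeTransaction(account);  // commit per tx, not per batch
                    }
                }
                BatchStats stats = new BatchStats();
                stats.batchId = spec.id;
                stats.elapsedTimeMillis = System.currentTimeMillis() - start;
                stats.totalTxPerSecond = spec.size * 1000.0 / Math.max(1, stats.elapsedTimeMillis);
                statsQueue.put(stats);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}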
Transaction Specifications
Read Transaction
 - Query random ACCOUNT for balance
 - Query TX for last 10 tx by TIMESTAMP DESC
 - Scan the returned cursor

Write Transaction
 - Insert a random (+/-) amount into the TX table for a random account
 - Update the ACCOUNT table by applying that amount to the current balance
 - Commit (or rollback on failure)
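A minimal JDBC sketch of these two transaction shapes against the schema shown earlier. The exact SQL (including the ROWNUM form of the "last 10 by timestamp" query) and the method names are assumptions for illustration, not the author's code.

import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LoadgenTx {

    // Read transaction: balance lookup plus last-10-transactions scan for one account.
    static void readTransaction(Connection con, long accountId) throws SQLException {
        try (PreparedStatement bal = con.prepareStatement(
                 "SELECT balance FROM loadgen.account WHERE account_id = ?");
             PreparedStatement last10 = con.prepareStatement(
                 "SELECT * FROM (SELECT tx_id, amount, tx_timestamp FROM loadgen.tx " +
                 " WHERE account_id = ? ORDER BY tx_timestamp DESC) WHERE ROWNUM <= 10")) {
            bal.setLong(1, accountId);
            try (ResultSet rs = bal.executeQuery()) {
                if (rs.next()) rs.getBigDecimal("balance");   // value discarded; we only drive load
            }
            last10.setLong(1, accountId);
            try (ResultSet rs = last10.executeQuery()) {
                while (rs.next()) { /* scan the returned cursor */ }
            }
        }
    }

    // Write transaction: insert a random +/- amount, apply it to the balance,
    // then commit (or roll back on failure). Commit is per transaction, not per batch.
    static void writeTransaction(Connection con, long accountId, BigDecimal amount) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (PreparedStatement ins = con.prepareStatement(
                 "INSERT INTO loadgen.tx (tx_id, account_id, amount, description) " +
                 "VALUES (loadgen.seq_tx_id.NEXTVAL, ?, ?, 'load test')");
             PreparedStatement upd = con.prepareStatement(
                 "UPDATE loadgen.account SET balance = balance + ? WHERE account_id = ?")) {
            ins.setLong(1, accountId);
            ins.setBigDecimal(2, amount);
            ins.executeUpdate();
            upd.setBigDecimal(1, amount);
            upd.setLong(2, accountId);
            upd.executeUpdate();
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}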
[1] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea)
Cumulative: 5765 tps

Run 01
[Chart: ElapsedTimeMillis and NetTPS per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS)]
[1] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea)
Run 01 Monitoring Results
 - Peaked @ 2200 Write IOPS
 - Disk Queue Depth > 100
 - What’s up with Read IOPS?
[2] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea) … again
Cumulative: 4804 tps

Run 02
[Chart: ElapsedTimeMillis and TotalTxPerSecond per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS); one stretch is annotated "???"]
[2] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea) … again
Run 02 Monitoring Results
 - Peaked @ 2500+ Write IOPS
 - Disk Queue Depth tracks Write IOPS (or vice versa)
[3] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea) … third run
Cumulative: 4842 tps

Run 03
[Chart: ElapsedTimeMillis and TotalTxPerSecond per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS)]
[3] db.m2.4xlarge, 3000 PIOPS
(4 consumers @ 6 threads ea) … third run
Run 03 Monitoring Results
 - Peaked @ 2500+ Write IOPS
 - Very curious what’s going on in this interval, from peak to end of run
 - Disk Queue Depth tracks Write IOPS (or vice versa)
[4] db.m2.4xlarge, 1000 PIOPS
(2 consumers @ 6 threads ea)
Cumulative: 2854 tps
Run 04
Dialed back concurrency, on the hunch that Oracle is resetting too many connections
[Chart: ElapsedTimeMillis and TotalTxPerSecond per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS)]
[4] db.m2.4xlarge, 1000 PIOPS
(2 consumers @ 6 threads ea)
Run 04 Monitoring Results
[5] db.m2.4xlarge, 1000 PIOPS
(4 consumers @ 6 threads ea)
Cumulative: 2187 tps

Run 05
Dialing back up made it worse
[Chart: ElapsedTimeMillis and TotalTxPerSecond per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS)]
[5] db.m2.4xlarge, 1000 PIOPS
(4 consumers @ 6 threads ea)
Run 05 Monitoring Results
[6] db.m1.xlarge, No PIOPS
(2 consumers @ 6 threads ea)
Cumulative: 1061 tps

Run 06
Some early flutter, but not much
[Chart: ElapsedTimeMillis and TotalTxPerSecond per batch, by Batch Received by Stats Collector (axes: Milliseconds Elapsed per Batch, TPS)]
[6] db.m1.xlarge, No PIOPS
(2 consumers @ 6 threads ea)
Run 06 Monitoring Results
 - Different colors than on previous slides
Latency: Run 1 (3000 PIOPS)
Run 01 Batch Latencies (all milliseconds)

[Chart: AvgTxLatencyMs, MedianWriteLatency, and HighWriteLatency per batch, by Batch Received by Stats Collector (axes: High Write Latency; AverageTx/Median Write Latency)]
Latency: Run 6 (No PIOPS)
Run 06 Batch Latencies (all milliseconds)
[Chart: AvgTxLatencyMs, MedianWriteLatency, and HighWriteLatency per batch, by Batch Received by Stats Collector (axes: High Write Latency; AverageTx/Median Write Latency)]
Pricing

(does not include cost of backup storage)

Single AZ
              Instance Type    PIOPS  Storage (GB)  Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month
Runs 1,2,3    db.m2.4xlarge    3000   300           $3.14         $0.10        $0.13              $2,598.30
Runs 4,5      db.m2.4xlarge    1000   300           $3.14         $0.10        $0.13              $2,398.30
Run 6         db.m1.xlarge     0      300           $1.13         $0.10        $0.10              $843.60

Multi-AZ
              Instance Type    PIOPS  Storage (GB)  Hourly O/D**  PIOPS/Month  Storage/GB-month*  Cost/Month
Runs 1,2,3    db.m2.4xlarge    3000   300           $6.28         $0.20        $0.25              $5,196.60
Runs 4,5      db.m2.4xlarge    1000   300           $6.28         $0.20        $0.25              $4,796.60
Run 6         db.m1.xlarge     0      300           $2.26         $0.20        $0.20              $1,687.20

*Non-PIOPS storage also incurs I/O requests at $0.10/million requests
**Oracle “license-included” pricing. Significant savings for reserved instances.
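For reference, the Cost/Month figures are consistent with a simple sum of instance-hours, PIOPS, and storage, assuming roughly 720 hours per month and a PIOPS-storage rate of about $0.125/GB-month (shown rounded to $0.13 in the table); those assumptions are my reconstruction, not stated in the deck.

Runs 1-3, Single AZ: $3.14 × 720 h + 3,000 PIOPS × $0.10 + 300 GB × $0.125 = $2,260.80 + $300.00 + $37.50 = $2,598.30
Run 6, Single AZ:    $1.13 × 720 h + 300 GB × $0.10 = $813.60 + $30.00 = $843.60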
Conclusions and Takeaways
PIOPS matters
 - For throughput and latency

Need larger sampling periods
 - To mitigate the effect of warm-up of instruments and subject

Need to try different R/W ratios
 - And to gauge how they impact realized PIOPS

Backup and restore takes time
 - Consider use of promotable read replicas, for platforms that support it
 - Otherwise I might have had more samples
Questions?


Editor's Notes

  • #6 All single-AZ, all in us-east-1d because I’m a glutton for punishment