This document summarizes Librato's experience migrating its Cassandra infrastructure from Amazon EC2 instance storage to Amazon EBS volumes, Elastic Network Interfaces (ENIs), and Amazon VPC. The migration improved performance, cut real-time ring costs by 35%, simplified operations by reducing maintenance time, and added flexibility and capacity headroom for scaling. Key steps included testing configurations against production traffic, tracking down write timeouts, optimizing commitlog storage, and tuning disk access modes (mmap vs. standard I/O).
11. Librato circa 2015
●Cassandra 2.0.11 + patches
●i2.2xlarge
• Instance type preferred by DataStax and Netflix
●160 instances
●Never Amazon EBS – only instance store
●RAID 0 over instance store (1.5 TB)
12. Operational challenges
●CPU/cost ratio low on i2.2xlarge
• Kept rings hot to maximize efficiency
●Persistent data tied to instance
• Long MTTR to stream large data sets
●Instance store capped maximum data volume size
• Had to scale rings out just for data capacity
15. Enter Amazon VPC migration – 3Q 2015
●Librato moving Classic → Amazon VPC
●Code all the things: SaltStack / Terraform / Flask
●Opportunity to overhaul Cassandra
●Emboldened by CrowdStrike's Amazon EBS talk @ re:Invent 2015
• We can do this!
• Anticipating a big win
21. Write timeouts – When did this start?
[Timeline: write timeouts started Feb 2016; bisected in March between the last known-good tested version and the first bad build]
22. Write timeouts – Found it!
● No timeouts in C* 2.1.4
● Appear at 2.1.5
[Timeline: timeouts traced to the 2.1.4 → 2.1.5 upgrade]
23. Write timeouts – CASSANDRA-11302
24. EBS metrics
● Spent a lot of time second-guessing EBS performance
● No metrics for GP2 burst-credit scheduling
● EBS CloudWatch metrics only at 5-minute resolution
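For reference, a minimal boto3 sketch of pulling those EBS metrics; the volume ID and region are placeholders, and Period=300 reflects the 5-minute floor the slide calls out:

```python
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)

# Volume ID and region are placeholders.
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EBS",
    MetricName="VolumeWriteOps",
    Dimensions=[{"Name": "VolumeId", "Value": "vol-0123456789abcdef0"}],
    StartTime=end - timedelta(hours=6),
    EndTime=end,
    Period=300,  # 300 s: the finest granularity basic EBS monitoring offers
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    # VolumeWriteOps is a count per period; divide by seconds for avg IOPS.
    print(point["Timestamp"], point["Sum"] / 300.0)
```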
26. Commitlog scaling – April 2016
● Started with a 200 GB GP2 volume
● 600 IOPS max
● Hit bottlenecks during tests (15-30 min+)
● Workaround
• Bumped to a 1 TB commitlog volume (3K IOPS)
• Tested sharing the commitlog on the data disk
[Timeline: commitlog scaling began April 2016]
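The 600 and 3,000 IOPS figures follow from GP2's provisioning rule of 3 IOPS per GB. A small sketch with a hypothetical helper name (limits as of 2016); note that gp2 volumes under 1 TB can also burst to 3,000 IOPS from a credit bucket, which is consistent with bottlenecks that only show up 15-30 minutes into a test:

```python
def gp2_baseline_iops(size_gb):
    """Hypothetical helper: gp2 baseline is 3 IOPS/GB, floor 100,
    capped at 10,000 (the limit circa 2016)."""
    return min(max(3 * size_gb, 100), 10_000)

print(gp2_baseline_iops(200))   # 600  -> the original commitlog volume
print(gp2_baseline_iops(1000))  # 3000 -> the 1 TB workaround volume
```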
28. New commitlog config
● Throughput Optimized HDD (st1)
● Use a 600 GB st1 partition
● Cost <50% of GP2
● Commitlog separate from data
[Timeline: st1 GA April 9; st1 testing through May]
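Provisioning that volume is a one-liner with boto3; a sketch with placeholder AZ and tags. st1 is provisioned for throughput rather than IOPS (baseline 40 MB/s per TB), which suits the commitlog's append-only, sequential write pattern:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 600 GB Throughput Optimized HDD for the commitlog; AZ and tag are placeholders.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=600,
    VolumeType="st1",
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "role", "Value": "cassandra-commitlog"}],
    }],
)
print(volume["VolumeId"])
```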
29. Ring connection timeouts – June 2016
● Added our read load
● Small message drops
● Slow start that grew to collapse of the ring
● Rolling restarts fixed it for a day
[Timeline: connection timeouts began June 2016]
31. Ring connection timeouts
● Called The Last Pickle
• Fix: otc_coalescing_strategy: DISABLED
33. Production reached! – July 2016
We’ll live with more network traffic for now
34. Librato today
Split ring configurations:

            Real Time                      Long Retention
Retention   One week                       Over a year
Instance    c4.4xlarge                     m4.2xlarge
Data        EBS 2 TB GP2 data partition    EBS 4 TB GP2 data partition
Commitlog   EBS 600 GB ST1 commitlog       EBS 600 GB ST1 commitlog
36. Real-time rings: before and after
Before
● 120 * i2.2xlarge
● Instance cost: $62k monthly*
After
● 66 * c4.4xlarge
● 2 TB GP2 + 600 GB ST1
● Instance cost: $25k monthly*
● EBS cost: $15k monthly
● Total: $40k monthly
Total savings: 35%
(*) 1-year up-front pricing
37. Long retention rings: before and after
Before
● 36 * i2.2xlarge
● Instance cost: $19k monthly*
After
● 30 * m4.2xlarge
● 4 TB GP2 + 600 GB ST1
● Instance cost: $6k monthly*
● EBS cost: $13k monthly
● Total: $19k monthly
Even cost, 2x+ more disk capacity
(*) 1-year up-front pricing
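A quick check of the arithmetic behind both comparisons, using the monthly figures straight from the two slides:

```python
# Monthly figures from the two slides; (*) = 1-year up-front pricing.
rt_before = 62_000                 # 120 * i2.2xlarge
rt_after = 25_000 + 15_000         # 66 * c4.4xlarge instances + EBS
print(f"real-time savings: {1 - rt_after / rt_before:.0%}")  # -> 35%

lr_before = 19_000                 # 36 * i2.2xlarge
lr_after = 6_000 + 13_000          # 30 * m4.2xlarge instances + EBS
print(f"long-retention monthly delta: ${lr_after - lr_before}")  # -> $0, even
```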
39. Reducing MTTR
● Two critical pieces of state for Cassandra:
• Data files (commitlog and sstables)
• Network interface
● Data now on EBS
● ENI provides a detachable IP address (Amazon VPC only)
● Mobility provides a lot of flexibility
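In practice that mobility means a replacement instance can adopt a dead node's volumes and ring IP. A minimal boto3 sketch; the IDs, device names, and helper name are ours, not Librato's tooling:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def adopt_node_state(instance_id, data_vol, commitlog_vol, eni_id):
    """Hypothetical helper: attach a dead node's volumes and ring IP
    to a replacement instance. Device names must match the image's
    mount configuration."""
    ec2.attach_volume(InstanceId=instance_id, VolumeId=data_vol,
                      Device="/dev/xvdf")
    ec2.attach_volume(InstanceId=instance_id, VolumeId=commitlog_vol,
                      Device="/dev/xvdg")
    # The ENI carries the node's IP, so the replacement rejoins the
    # ring with the same identity: no token movement, no streaming.
    ec2.attach_network_interface(NetworkInterfaceId=eni_id,
                                 InstanceId=instance_id, DeviceIndex=1)
```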
40. Bring them up, bring them down
● New rings now come up fast
● Easier to automate
When not in use
● Shut down nodes
● Park the disks
… Or just destroy them
41. We’ve grown up: managing resources with Terraform
● Query Terraform for state
● Create EBS
● Create ENI
● Create Security groups
42. Organized to let us remove resources
Keeps us from being cloud hoarders. When we’re done with a ring, we
can remove resources easily.
● Remove EBS
● Remove ENI
● Remove Security groups
Snapshots are still available in case we need them.
43. We launch rings with SaltStack
● Launch instances
● Attach EBS and ENI
● Configure rings
● Augment with the Salt API
● Clear guardrails built into the process
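A sketch of what "augment with the Salt API" can look like from the master once Terraform has created the volumes, ENIs, and security groups; the target glob and state name are hypothetical:

```python
import salt.client

# Target the new ring's minions and apply the Cassandra state.
local = salt.client.LocalClient()
result = local.cmd("ring-realtime-*", "state.apply", ["cassandra"])
for minion, ret in result.items():
    print(minion, "ok" if ret else "failed")
```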
45. Disaster recovery
● Previously
• Tablesnap to Amazon S3
• Required constant pruning (tablechop)
• High Amazon S3 bill
● Now
• EBS snapshots
• Cron job to snapshot EBS via Ops API
• Cron job to clean old snapshots via Ops API
• Snapshots store only block differences: no pruning needed
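The "Ops API" in the deck is Librato-internal; as a stand-in, the two cron jobs look roughly like this with direct boto3 calls (the tag key and retention window are assumptions):

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="us-east-1")
RETAIN_DAYS = 14  # retention window is an assumption

def snapshot_ring_volumes():
    # Snapshot every volume tagged as Cassandra data; tag key is hypothetical.
    vols = ec2.describe_volumes(
        Filters=[{"Name": "tag:role", "Values": ["cassandra-data"]}])
    for vol in vols["Volumes"]:
        ec2.create_snapshot(VolumeId=vol["VolumeId"],
                            Description="cassandra nightly")

def prune_old_snapshots():
    # EBS snapshots are block-incremental, so deleting old ones is cheap
    # and there is no per-sstable pruning to do.
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETAIN_DAYS)
    snaps = ec2.describe_snapshots(
        OwnerIds=["self"],
        Filters=[{"Name": "description", "Values": ["cassandra nightly"]}])
    for snap in snaps["Snapshots"]:
        if snap["StartTime"] < cutoff:
            ec2.delete_snapshot(SnapshotId=snap["SnapshotId"])
```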
46. In-place ring scale up
Now we have a button to push
● Sudden load change
● Rolling operation to scale up instances
● Example: scale from c4.4xl → c4.8xl
Once comfortable with capacity, we can still scale the ring out with bootstrap
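One node's worth of the rolling operation, sketched with boto3; because data lives on EBS and the ring IP on an ENI, both survive the stop/start. The helper name is ours; drain Cassandra first:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def resize_node(instance_id, new_type="c4.8xlarge"):
    """Hypothetical helper: stop, change instance type, start.
    Run nodetool drain on the node before calling this; the EBS
    volumes and ENI stay attached across the stop/start."""
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
```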
48. Disk access mode
● MMap (4 KB faults) and Standard (read/write syscalls)
● MMap works well for small, random-access row reads
● Read ahead kept small for performant small reads
● Large compaction operations are sequential I/O
● What does this mean?
51. This impacts cost
● We must provision for a high baseline IOPS load
● Disk size much larger than used capacity
● EBS GP2 counts I/O in units of up to 256 KB
● What can be done?
52. Hybrid disk access mode
● MMap reads for row queries
● Standard mode (read/write) during compaction
● Ensure reads are chunked
● Chunk size configurable per disk (e.g., 256 KB for GP2)
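Cassandra's actual implementation is Java; the sketch below is only a Python illustration of the two access patterns and of why chunking matters on GP2 (names and constants are ours, mmap flags are POSIX-only):

```python
import mmap

CHUNK = 256 * 1024  # match GP2's 256 KB I/O unit: one read per IOP

def row_read(path, offset, length):
    # Row queries: mmap and touch only the pages holding the row, so a
    # small read costs a few 4 KB page faults (readahead kept small).
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ) as mm:
            return mm[offset:offset + length]

def compaction_scan(path):
    # Compaction: sequential buffered reads in 256 KB chunks, so the same
    # bytes cost roughly 64x fewer I/O operations than 4 KB faults would.
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            yield chunk
```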
55. Wrapup
● Make it easy to test with production traffic
● Instance flexibility with EBS
● Operational simplicity and reduced MTTR
● Reduced cost and increased headroom
Future
● Debug network coalescing
● Cassandra 3.0
● More testing of hybrid disk access modes