A summary of past Cassandra benchmarks performed by Netflix and a description of how Netflix uses Cassandra, interspersed with a live demo, automated with Jenkins and JMeter, that created two 12-node Cassandra clusters from scratch on AWS: one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.
Cassandra Performance and Scalability on AWS
1. Cassandra Performance and Scalability on AWS
August 8th, 2012
Adrian Cockcroft
@adrianco #netflixcloud #cassandra12
http://www.linkedin.com/in/adriancockcroft
14. Scalability from 48 to 288 nodes on AWS
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Chart: Client Writes/s by node count – Replication Factor = 3
48 nodes: 174,373 writes/s; 96 nodes: 366,828; 144 nodes: 537,172; 288 nodes: 1,099,837
Used 288 of m1.xlarge (4 CPU, 15 GB RAM, 8 ECU), Cassandra 0.8.6
Benchmark config only existed for about 1hr
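Worked out per node, the chart shows near-linear scaling: roughly 174,373 / 48 ≈ 3,633 client writes/s per node at the small end and 1,099,837 / 288 ≈ 3,819 writes/s per node at 288 nodes, so adding nodes added throughput almost proportionally.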
15. Blah Blah Blah
(I’m skipping all the cloud intro etc. Netflix
runs in the cloud, if you hadn’t figured that
out already you aren’t paying attention and
should go read slideshare.net/netflix)
16. “Some people skate to the puck,
I skate to where the puck is going to be”
Wayne Gretzky
21. Major Front End Services
• Non-member Web Site www.netflix.com
– Marketing driven, sign up flow, SOX/PCI scope
• Member Web Site movies.netflix.com
– Personalization driven
• CDNs for delivering bulk video/audio
– Netflix CDN: openconnect.netflix.com
• API for external and device user interfaces
– Mostly private APIs, public API docs at developer.netflix.com
• API for controlling video playback
– DRM, QoS management, Bookmarks
22. Netflix Deployed on AWS
• 2009 – Content: Content Management, EC2 Encoding, S3 Petabytes
• 2009 – Logs: S3 Terabytes, EMR Hive & Pig, Business Intelligence
• 2010 – Play: DRM, CDN routing, Bookmarks, Logging
• 2010 – WWW: Sign-Up, Search, Movie Choosing, Ratings
• 2010 – API: Metadata, Device Config, TV Movie Choosing, Social Facebook
• 2011 – CS: International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
• Delivery: CDNs -> ISPs (Terabits) -> Customers
23. Cassandra Instance Architecture
• Linux Base AMI (CentOS)
• Java7
• Tomcat/Java7 running Priam – Cassandra Manager
– Token management, backups, autoscaling
• Cassandra 1.0.9
• AppDynamics appagent monitoring Cassandra
• AppDynamics machineagent
• Monitoring, log rotation, GC and thread dump logging, etc.
24. Priam – Cassandra Automation
Available at http://github.com/netflix
• Netflix Platform Tomcat Code
• Zero touch auto-configuration
• State management for Cassandra JVM
• Token allocation and assignment
• Broken node auto-replacement
• Full and incremental backup to S3
• Restore sequencing from S3
• Grow/Shrink Cassandra “ring”
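The token allocation and grow/shrink features above come down to spacing initial tokens evenly around the ring. The sketch below is illustrative only (it is not Priam's code) and assumes the RandomPartitioner token range of 0 to 2^127:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only (not Priam's actual implementation): evenly
// spaced initial tokens for an N-node ring under the RandomPartitioner,
// whose token range is 0 .. 2^127.
public class TokenSketch {
    static final BigInteger RANGE = BigInteger.valueOf(2).pow(127);

    static List<BigInteger> evenTokens(int nodeCount) {
        List<BigInteger> tokens = new ArrayList<BigInteger>();
        for (int i = 0; i < nodeCount; i++) {
            // token_i = i * 2^127 / nodeCount
            tokens.add(RANGE.multiply(BigInteger.valueOf(i))
                            .divide(BigInteger.valueOf(nodeCount)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // e.g. one token per node for a 12-node ring, as in the demo clusters
        for (BigInteger t : evenTokens(12)) {
            System.out.println(t);
        }
    }
}
```

Doubling a ring this way keeps data movement predictable: each new node's token falls halfway between two existing tokens, so it takes over half of one existing node's range.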
25. Astyanax
Available at http://github.com/netflix
• Features
– Complete abstraction of connection pool from RPC protocol
– Fluent Style API
– Operation retry with backoff
– Token aware
• Recipes
– Distributed row lock (without Zookeeper)
– Multi-DC row lock
– Uniqueness constraint
– Multi-row uniqueness constraint
– Large file storage
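A minimal usage sketch against the 2012-era Astyanax 1.x API, showing the fluent style and the connection-pool abstraction; class and method names are recalled from that era and may differ slightly in other versions:

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class AstyanaxSketch {
    public static void main(String[] args) throws Exception {
        // Build a context: the connection pool is abstracted from the RPC protocol
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("TestCluster")
            .forKeyspace("TestKeyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
            .withConnectionPoolConfiguration(
                new ConnectionPoolConfigurationImpl("MyPool")
                    .setPort(9160)
                    .setMaxConnsPerHost(3)
                    .setSeeds("127.0.0.1:9160"))
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        Keyspace keyspace = context.getEntity();   // getClient() in later versions

        // Serializers are defined once on the ColumnFamily, not per call
        ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
            "Standard1", StringSerializer.get(), StringSerializer.get());

        // Fluent-style write; the wrapped operation can be retried with backoff
        MutationBatch m = keyspace.prepareMutationBatch();
        m.withRow(cf, "user-123").putColumn("email", "someone@example.com", null);
        m.execute();

        // Fluent-style read of the same row
        ColumnList<String> row = keyspace.prepareQuery(cf)
            .getKey("user-123")
            .execute()
            .getResult();
        System.out.println(row.getColumnByName("email").getStringValue());
    }
}
```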
32. High Availability
• Cassandra stores 3 local copies, 1 per zone
– Synchronous access, durable, highly available
– Read/Write ONE is fastest; use for fire and forget
– Read/Write QUORUM (2 of 3); use for read-after-write
• AWS Availability Zones
– Separate buildings
– Separate power etc.
– Fairly close together
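With RF = 3, QUORUM reads and QUORUM writes each touch 2 of the 3 replicas, so they always overlap on at least one replica; that overlap is what makes read-after-write work. A hedged Astyanax 1.x snippet for setting those defaults (method names from memory):

```java
import com.netflix.astyanax.AstyanaxConfiguration;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.model.ConsistencyLevel;

public class ConsistencySketch {
    // Defaults applied to every read/write issued through a Keyspace built
    // with this configuration (Astyanax 1.x method names, from memory).
    static AstyanaxConfiguration quorumDefaults() {
        return new AstyanaxConfigurationImpl()
            .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_QUORUM)   // read-after-write
            .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_QUORUM); // 2 of 3 replicas
        // CL_ONE instead gives the fastest, fire-and-forget behavior.
    }
}
```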
33. “Traditional” Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Not Token Aware
(Diagram: non token aware clients writing to Cassandra nodes spread across Zones A, B and C)
1. Client writes to any Cassandra node
2. Coordinator node replicates to nodes and zones
3. Nodes return ack to coordinator
4. Coordinator returns ack to client
5. Data written to internal commit log disk (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.
34. Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
(Diagram: token aware clients writing directly to replica nodes spread across Zones A, B and C)
1. Client writes to Cassandra nodes and zones
2. Nodes return ack to client
3. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.
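The difference between the two flows above is whether the client routes each operation directly to a replica that owns the row's token. A hedged Astyanax 1.x snippet for requesting token-aware routing (enum and method names from memory):

```java
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;

public class TokenAwareSketch {
    // Token-aware routing sends each operation straight to a replica that
    // owns the row key's token, skipping the extra coordinator hop shown
    // in the "traditional" flow (Astyanax 1.x names, from memory).
    static AstyanaxConfigurationImpl tokenAwareConfig() {
        return new AstyanaxConfigurationImpl()
            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)       // learn the ring from Cassandra
            .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE);  // route by token
    }
}
```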
35. Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
(Diagram: US and EU clients writing to Cassandra nodes spread across Zones A, B and C in each region, with 100+ms latency between regions)
1. Client writes to local replicas
2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator (100+ms latency between regions)
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up.
Nightly global compare and repair jobs ensure everything stays consistent.
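A hedged Astyanax 1.x snippet for issuing a write at LOCAL_QUORUM, so the client waits only for 2 of 3 replicas in its own region while the remote region is updated asynchronously (the column family and values are made up for the example):

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.StringSerializer;

public class LocalQuorumSketch {
    static final ColumnFamily<String, String> CF = new ColumnFamily<String, String>(
        "Standard1", StringSerializer.get(), StringSerializer.get());

    // Wait only for 2 of 3 replicas in the local region; replication to the
    // remote region happens asynchronously (Astyanax 1.x names, from memory).
    static void writeLocalQuorum(Keyspace keyspace) throws Exception {
        MutationBatch m = keyspace.prepareMutationBatch()
            .setConsistencyLevel(ConsistencyLevel.CL_LOCAL_QUORUM);
        m.withRow(CF, "user-123").putColumn("last_bookmark", "42:17", null);
        m.execute();
    }
}
```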
36. Extending to Multi-Region
Added production UK/Ireland support with no downtime
Minimize impact on original cluster using bulk backup move
Take a Boeing 737 on a domestic flight, upgrade it to a 747 by adding more engines, fuel and bigger wings, and fly it to Europe without landing it on the way…
1. Create cluster in EU
2. Backup US cluster to S3
3. Restore backup in EU
4. Local repair EU cluster
5. Global repair/join
(Diagram: US and EU Cassandra clusters, each spread across Zones A, B and C, linked via S3, with 100+ms latency between regions)
37. Cassandra Backup
• Full Backup
– Time based snapshot
– SSTable compress -> S3
• Incremental Backup
– SSTable write triggers compressed copy to S3
• Archive
– Copy cross region
(Diagram: Cassandra nodes backing up to S3)
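An illustrative sketch of the incremental backup idea, not Priam's actual code: when a new SSTable is flushed, compress it and copy it to S3 with the AWS SDK for Java. The bucket and key layout here are invented for the example:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch only (not Priam's backup code): gzip one newly
// written SSTable file and copy it to S3 under a per-node prefix.
public class SSTableBackupSketch {
    public static void backup(File sstable, String bucket, String node) throws Exception {
        File gz = new File(sstable.getPath() + ".gz");

        // Compress the SSTable before upload
        FileInputStream in = new FileInputStream(sstable);
        GZIPOutputStream out = new GZIPOutputStream(new FileOutputStream(gz));
        byte[] buf = new byte[64 * 1024];
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
        }
        in.close();
        out.close();

        // Copy to S3; credentials come from the environment or instance role.
        // The "incremental" key layout is hypothetical.
        AmazonS3 s3 = new AmazonS3Client();
        s3.putObject(bucket, node + "/incremental/" + gz.getName(), gz);
    }
}
```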
38. ETL for Cassandra
• Data is de-normalized over many clusters!
• Too many to restore from backups for ETL
• Solution – read backup files using Hadoop
• Aegisthus
– http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
– High throughput raw SSTable processing
– Re-normalizes many clusters to a consistent view
– Extract, Transform, then Load into Teradata
39. Netflix Open Source Strategy
• Release PaaS Components git-by-git
– Source at github.com/netflix – we build from it…
– Intros and techniques at techblog.netflix.com
– Blog post or new code every few weeks
• Motivations
– Give back to Apache licensed OSS community
– Motivate, retain, hire top engineers
– “Peer pressure” code cleanup, external contributions
40. Open Source Projects and Posts
Legend: Github / Techblog, Apache Contributions, Techblog Post, Coming Soon
• Priam – Cassandra as a Service
• Exhibitor – Zookeeper as a Service
• Servo and Autoscaling Scripts
• Astyanax – Cassandra client for Java
• Curator – Zookeeper Patterns
• Honu – Log4j streaming to Hadoop
• EVCache – Memcached as a Service
• CassJMeter – Cassandra test suite
• Circuit Breaker – Robust service pattern
• Cassandra – Multi-region EC2 datastore support
• Asgard – AutoScaleGroup based AWS console
• Eureka / Discovery – Service Directory
• Aegisthus – Hadoop ETL for Cassandra
• Archaius – Dynamic Properties Service
• Chaos Monkey – Robustness verification
41. Chaos Monkey
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
• Computers (Datacenter or AWS) randomly die
– Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient
– Allow any instance to fail without customer impact
• Chaos Monkey hours
– Monday-Friday 9am-3pm random instance kill
• Application configuration option
– Apps now have to opt-out from Chaos Monkey
43. Roadmap for 2012
• More resiliency and improved availability
• More automation, orchestration
• “Hardening” the platform, code clean-up
• Lower latency for web services and devices
• IPv6 – running now, see techblog for details
• More open sourced components
• Las Vegas in November - AWS Re:Invent
45. Disclaimers
• We didn’t have time to tune the demo
• These are the plots from the live demo run
• Run’s need to be longer to get to steady state
• Data size only reached around 5GB per node
• Plenty of “I wonder why it did that” remains
• It’s a fair comparison, but not the best absolute
performance possible for this workload and
configuration
• When you remove the IO bottleneck, the next
few bottlenecks appear…
46. Activity during the talk 10:30-11:30
Custom AppDynamics dashboard showing CPU and IOPS per node
47. Jmeter Plots
• Plots are the output of the Jenkins build
• Each instance has its own set of plots
• Each availability zone has its own summary plots
• One of the three zone summary plots is compared for
each metric
• Plot collection is currently duplicated as we are
transitioning from “Epic” to “Atlas”
56. Takeaway
Netflix has built and deployed a scalable global platform based on
Cassandra and AWS.
Key components of the Netflix PaaS are being released as Open Source
projects so you can build your own custom PaaS.
If you like lots of SSDs, come and work for us…
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud #cassandra12
Editor's Notes
• Complete connection pool abstraction. Queries and mutations are wrapped in objects created by the Keyspace implementation, making it possible to retry failed operations. This differs from other connection pool implementations, in which the operation is created on a specific connection and must be completely redone if it fails.
• Simplified serialization via method overloading. The low level thrift library only understands data that is serialized to a byte array. Hector requires serializers to be specified for nearly every call. Astyanax minimizes the places where serializers are specified by using predefined ColumnFamily and ColumnPath definitions which specify the serializers. The API also overloads set and get operations for common data types.
• The internal library does not log anything. All internal events are instead ... calls to a ConnectionPoolMonitor interface. This allows customization of log levels and filtering of repeating events outside of the scope of the connection pool.
• Super columns will soon be replaced by Composite column names. As such it is recommended to not use super columns at all and to use Composite column names instead. There is some support for super columns in Astyanax, but those methods have been deprecated and will eventually be removed.