
Cassandra Performance and Scalability on AWS



A summary of past Cassandra benchmarks performed by Netflix and a description of how Netflix uses Cassandra, interspersed with a live demo automated with Jenkins and Jmeter that created two 12-node Cassandra clusters from scratch on AWS, one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.

  • Here's the video of the talk
  • Thanks for posting this. Detailed reporting from the front lines is hard to come by. Noticed the new EC2 instances with SSD and wondered who was using them. We've had Intel 710 drives in production for six months underneath Postgresql in a few boxes and I doubt I'll buy another hard drive for a server.
  • The talk was recorded on video, when that's available I will link it here. The Jenkins automation and demo benchmark setup was written by Denis Sheahan, cass_jmeter is already available at along with Priam and Asgard.

Cassandra Performance and Scalability on AWS

  1. Cassandra Performance and Scalability on AWS August 8th, 2012 Adrian Cockcroft @adrianco #netflixcloud #cassandra12
  2. Things we don’t do
  3. Things we do do. Run benchmarks. Now.
  4. YOLO
  5. Screenshots from Live Demo: backup slides from pre-runs of the demo, with some updates to show what actually happened
  6. Asgard: cass_perf apps, with no instances running
  7. Jenkins: perf_test jobs
  8. Jmeter Setup: Build parameters
  9. Jmeter Setup: Build parameters
  10. Jmeter Setup: Build parameters
  11. Asgard: Initial set of cass instances up and running
  12. Back to Presentation: While the load gets going…
  13. Scalability from 48 to 288 nodes on AWS (chart): Client Writes/s by node count, Replication Factor = 3. Used up to 288 m1.xlarge instances (4 CPU, 15 GB RAM, 8 ECU), Cassandra 0.86; the benchmark config only existed for about 1 hr. Data points: 174,373 writes/s at 48 nodes, 366,828 at 96, 537,172 at 144, and 1,099,837 at 288 nodes.
  14. Blah Blah Blah (I’m skipping all the cloud intro etc. Netflix runs in the cloud; if you hadn’t figured that out already you aren’t paying attention and should go read…)
  15. “Some people skate to the puck, I skate to where the puck is going to be” – Wayne Gretzky
  16. Cassandra on AWS
      • The Past: m2.4xlarge – Storage: 2 drives, 1.7TB; CPU: 8 cores, 26 ECU; RAM: 68GB; Network: 1Gbit; IOPS: ~500; Throughput: ~100Mbyte/s; Cost: $1.80/hr
      • The Future: hi1.4xlarge – Storage: 2 SSD volumes, 2TB; CPU: 8 HT cores, 35 ECU; RAM: 64GB; Network: 10Gbit; IOPS: ~100,000; Throughput: ~1Gbyte/s; Cost: $3.10/hr
  17. Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
  18. Live Demo Workload
      • Jenkins automation
        – Jmeter load driver
        – Asgard provisioning
        – Priam instance management
      • Traffic
        – Reading/writing whole 100-column rows
        – Randomly selected from 25M row keys
        – Run for 10 minutes, then double ring size
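      To make the traffic description concrete, here is a minimal sketch of what one load-driver worker's operations might look like, assuming an already-configured Astyanax Keyspace and a placeholder "TestRows" column family. The class, key, and column names are illustrative only; this is not the actual cass_jmeter code.

      import java.util.Random;
      import com.netflix.astyanax.Keyspace;
      import com.netflix.astyanax.MutationBatch;
      import com.netflix.astyanax.model.ColumnFamily;
      import com.netflix.astyanax.model.ColumnList;
      import com.netflix.astyanax.serializers.StringSerializer;

      // Illustrative workload: read/write whole 100-column rows,
      // with keys picked at random from a fixed population of 25M.
      public class DemoWorkload {
          private static final int KEY_POPULATION = 25_000_000;
          private static final int COLUMNS_PER_ROW = 100;
          private static final ColumnFamily<String, String> CF =
              new ColumnFamily<>("TestRows", StringSerializer.get(), StringSerializer.get());

          private final Keyspace keyspace;   // supplied by the Astyanax client setup
          private final Random random = new Random();

          public DemoWorkload(Keyspace keyspace) {
              this.keyspace = keyspace;
          }

          // Write one whole 100-column row under a random key.
          public void writeOneRow() throws Exception {
              String rowKey = "key-" + random.nextInt(KEY_POPULATION);
              MutationBatch batch = keyspace.prepareMutationBatch();
              for (int c = 0; c < COLUMNS_PER_ROW; c++) {
                  batch.withRow(CF, rowKey).putColumn("col" + c, "value" + c, null);
              }
              batch.execute();
          }

          // Read one whole row under a random key; returns the column count fetched.
          public int readOneRow() throws Exception {
              String rowKey = "key-" + random.nextInt(KEY_POPULATION);
              ColumnList<String> row =
                  keyspace.prepareQuery(CF).getKey(rowKey).execute().getResult();
              return row.size();
          }
      }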
  19. The Netflix Streaming Service
  20. Major Front End Services
      • Non-member Web Site – Marketing driven, sign up flow, SOX/PCI scope
      • Member Web Site – Personalization driven
      • CDNs for delivering bulk video/audio – Netflix CDN:
      • API for external and device user interfaces – Mostly private APIs, public API docs at
      • API for controlling video playback – DRM, QoS management, Bookmarks
  21. Netflix Deployed on AWS (diagram): migration timeline by function, with Content and Logs on AWS in 2009, Play, WWW, and API in 2010, and CS in 2011. Functions shown include content management and EC2 encoding with petabytes in S3; log analysis with terabytes in S3, EMR, Hive & Pig, and business intelligence; DRM, CDN routing, device config, bookmarks, and logging; sign-up, search, movie and TV choosing, and ratings; metadata, diagnostics & actions, and social (Facebook); and international CS lookup, customer call log, and CS analytics. CDNs and ISPs deliver terabits to customers.
  22. Cassandra Instance Architecture (diagram): Linux base AMI (CentOS) running Priam (Cassandra Manager on Tomcat/Java7 for token management, backups, and autoscaling), Cassandra 1.09 on Java7, AppDynamics appagent and machineagent monitoring, plus log rotation, GC and thread dump logging, etc.
  23. Priam – Cassandra Automation, available at
      • Netflix Platform Tomcat Code
      • Zero touch auto-configuration
      • State management for Cassandra JVM
      • Token allocation and assignment
      • Broken node auto-replacement
      • Full and incremental backup to S3
      • Restore sequencing from S3
      • Grow/Shrink Cassandra “ring”
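      Token allocation and assignment amounts to spacing initial tokens evenly around the ring for a given cluster size. Below is a rough sketch of that idea for the RandomPartitioner's 0 to 2^127 token range; it illustrates the concept only and is not Priam's actual implementation, which also handles multi-region offsets, node replacement, and ring growth.

      import java.math.BigInteger;
      import java.util.ArrayList;
      import java.util.List;

      // Evenly spaced initial tokens for the RandomPartitioner ring [0, 2^127).
      public class TokenAllocator {
          private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

          public static List<BigInteger> evenTokens(int nodeCount) {
              BigInteger step = RING_SIZE.divide(BigInteger.valueOf(nodeCount));
              List<BigInteger> tokens = new ArrayList<>();
              for (int i = 0; i < nodeCount; i++) {
                  // node i gets i * (2^127 / N)
                  tokens.add(step.multiply(BigInteger.valueOf(i)));
              }
              return tokens;
          }

          public static void main(String[] args) {
              // e.g. a 12-node ring like the demo clusters: 12 evenly spaced tokens
              for (BigInteger t : evenTokens(12)) {
                  System.out.println(t);
              }
          }
      }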
  24. Astyanax, available at
      • Features
        – Complete abstraction of connection pool from RPC protocol
        – Fluent Style API
        – Operation retry with backoff
        – Token aware
      • Recipes
        – Distributed row lock (without zookeeper)
        – Multi-DC row lock
        – Uniqueness constraint
        – Multi-row uniqueness constraint
        – Large file storage
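      As a rough illustration of the fluent, token-aware client described above, here is a hedged sketch of building an Astyanax keyspace client and issuing a write and a read. Cluster, keyspace, seed, and column family names are placeholders, and exact accessor names can differ slightly between Astyanax versions (for example getClient() vs. getEntity() on AstyanaxContext).

      import com.netflix.astyanax.AstyanaxContext;
      import com.netflix.astyanax.Keyspace;
      import com.netflix.astyanax.MutationBatch;
      import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
      import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
      import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
      import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
      import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
      import com.netflix.astyanax.model.ColumnFamily;
      import com.netflix.astyanax.model.ColumnList;
      import com.netflix.astyanax.serializers.StringSerializer;
      import com.netflix.astyanax.thrift.ThriftFamilyFactory;

      public class AstyanaxExample {
          public static void main(String[] args) throws Exception {
              // Build a token-aware connection pool; names and seeds are placeholders.
              AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                  .forCluster("DemoCluster")
                  .forKeyspace("DemoKeyspace")
                  .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                      .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)
                      .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
                  .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("DemoPool")
                      .setPort(9160)
                      .setMaxConnsPerHost(3)
                      .setSeeds("127.0.0.1:9160"))
                  .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                  .buildKeyspace(ThriftFamilyFactory.getInstance());
              context.start();
              Keyspace keyspace = context.getClient();

              ColumnFamily<String, String> cf =
                  new ColumnFamily<>("TestRows", StringSerializer.get(), StringSerializer.get());

              // Fluent-style write...
              MutationBatch m = keyspace.prepareMutationBatch();
              m.withRow(cf, "row1").putColumn("col1", "value1", null);
              m.execute();

              // ...and read.
              ColumnList<String> row =
                  keyspace.prepareQuery(cf).getKey("row1").execute().getResult();
              System.out.println("Read " + row.size() + " columns");
          }
      }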
  25. Scale Up: Return to live demo to watch new nodes coming online
  26. Kiklos: Clusters growing from 12 to 24 (in-service, bootstrapping, garbage-collecting, cass-down)
  27. Kiklos: Clusters growing from 12 to 24 (in-service, bootstrapping, garbage-collecting, cass-down)
  28. Asgard: Showed 24 nodes per cluster, but didn’t get a screenshot
  29. Back to Presentation: While Jenkins/Jmeter collects graphs and shuts down the systems
  30. Cassandra on AWS: A highly available and durable deployment pattern
  31. High Availability
      • Cassandra stores 3 local copies, 1 per zone
        – Synchronous access, durable, highly available
        – Read/Write One is fastest; use for fire and forget
        – Read/Write Quorum is 2 of 3; use for read-after-write
      • AWS Availability Zones
        – Separate buildings
        – Separate power etc.
        – Fairly close together
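      To make the One vs. Quorum trade-off concrete, here is a hedged Astyanax-style sketch of choosing the consistency level per operation: CL_ONE for fire-and-forget writes, CL_QUORUM (2 of 3 zone replicas) when read-after-write behaviour is needed. Column family, row key, and column names are placeholders.

      import com.netflix.astyanax.Keyspace;
      import com.netflix.astyanax.MutationBatch;
      import com.netflix.astyanax.model.ColumnFamily;
      import com.netflix.astyanax.model.ConsistencyLevel;
      import com.netflix.astyanax.serializers.StringSerializer;

      public class ConsistencyExample {
          private static final ColumnFamily<String, String> CF =
              new ColumnFamily<>("TestRows", StringSerializer.get(), StringSerializer.get());

          // Fire-and-forget: ack as soon as one of the three zone replicas has the write.
          static void writeOne(Keyspace keyspace) throws Exception {
              MutationBatch m = keyspace.prepareMutationBatch()
                  .setConsistencyLevel(ConsistencyLevel.CL_ONE);
              m.withRow(CF, "user123").putColumn("lastSeen", "2012-08-08", null);
              m.execute();
          }

          // Read-after-write: a quorum read (2 of 3) will see any prior quorum write.
          static String readQuorum(Keyspace keyspace) throws Exception {
              return keyspace.prepareQuery(CF)
                  .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
                  .getKey("user123")
                  .execute()
                  .getResult()
                  .getStringValue("lastSeen", null);
          }
      }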
  32. “Traditional” Cassandra Write Data Flows (diagram): Single Region, Multiple Availability Zone, Not Token Aware
      1. Client writes to any Cassandra node
      2. Coordinator node replicates to nodes and zones
      3. Nodes return ack to coordinator
      4. Coordinator returns ack to client
      5. Data written to internal commit log disk (no more than 10 seconds later)
      If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
  33. Astyanax – Cassandra Write Data Flows (diagram): Single Region, Multiple Availability Zone, Token Aware
      1. Client writes to Cassandra nodes and zones
      2. Nodes return ack to client
      3. Data written to internal commit log disks (no more than 10 seconds later)
      If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
  34. Data Flows for Multi-Region Writes (diagram): Token Aware, Consistency Level = Local Quorum
      1. Client writes to local replicas
      2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
      3. Local coordinator writes to remote coordinator (100+ms latency between the US and EU regions)
      4. When data arrives, remote coordinator node acks and copies to other remote zones
      5. Remote nodes ack to local coordinator
      6. Data flushed to internal commit log disks (no more than 10 seconds later)
      If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
  35. Extending to Multi-Region (diagram): Added production UK/Ireland support with no downtime, minimizing impact on the original cluster by using a bulk backup move
      1. Create cluster in EU
      2. Backup US cluster to S3
      3. Restore backup in EU
      4. Local repair EU cluster
      5. Global repair/join
      Analogy: take a Boeing 737 on a domestic flight, upgrade it to a 747 by adding more engines, fuel and bigger wings, and fly it to Europe without landing it on the way…
  36. Cassandra Backup (diagram)
      • Full Backup
        – Time based snapshot
        – SSTable compress -> S3
      • Incremental
        – SSTable write triggers compressed copy to S3
      • Archive
        – Copy cross region
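      As a rough sketch of the incremental backup idea above (a new SSTable file appears, gets compressed, and is pushed to S3), here is an illustrative Java 7 snippet using the AWS SDK for Java. The bucket name and directory path are placeholders, and Priam's real implementation adds throttling, retries, and a structured S3 key layout.

      import java.io.*;
      import java.nio.file.*;
      import java.util.zip.GZIPOutputStream;
      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3Client;

      // Watch a Cassandra backups directory and copy each new SSTable file,
      // gzip-compressed, to S3. Paths and bucket are illustrative placeholders.
      public class IncrementalBackup {
          public static void main(String[] args) throws Exception {
              Path backupDir = Paths.get("/var/lib/cassandra/data/ks/cf/backups");
              AmazonS3 s3 = new AmazonS3Client();   // credentials from environment/instance role
              WatchService watcher = FileSystems.getDefault().newWatchService();
              backupDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);

              while (true) {
                  WatchKey key = watcher.take();
                  for (WatchEvent<?> event : key.pollEvents()) {
                      Path sstable = backupDir.resolve((Path) event.context());
                      File gz = compress(sstable.toFile());
                      s3.putObject("my-cassandra-backups", "incremental/" + gz.getName(), gz);
                  }
                  key.reset();
              }
          }

          private static File compress(File in) throws IOException {
              File out = new File(in.getPath() + ".gz");
              try (InputStream is = new FileInputStream(in);
                   OutputStream os = new GZIPOutputStream(new FileOutputStream(out))) {
                  byte[] buf = new byte[8192];
                  int n;
                  while ((n = is.read(buf)) > 0) {
                      os.write(buf, 0, n);
                  }
              }
              return out;
          }
      }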
  37. ETL for Cassandra
      • Data is de-normalized over many clusters!
      • Too many to restore from backups for ETL
      • Solution – read backup files using Hadoop
      • Aegisthus
        – High throughput raw SSTable processing
        – Re-normalizes many clusters to a consistent view
        – Extract, Transform, then Load into Teradata
  38. Netflix Open Source Strategy
      • Release PaaS Components git-by-git
        – Source at – we build from it…
        – Intros and techniques at
        – Blog post or new code every few weeks
      • Motivations
        – Give back to Apache licensed OSS community
        – Motivate, retain, hire top engineers
        – “Peer pressure” code cleanup, external contributions
  39. Open Source Projects and Posts (grid; legend: Github / Techblog, Apache Contributions, Techblog Post, Coming Soon)
      • Priam – Cassandra as a Service
      • Exhibitor – Zookeeper as a Service
      • Servo and Autoscaling Scripts
      • Astyanax – Cassandra client for Java
      • Honu – Log4j streaming to Hadoop
      • Curator – Zookeeper Patterns
      • EVCache – Memcached as a Service
      • CassJMeter – Cassandra test suite
      • Circuit Breaker – Robust service pattern
      • Cassandra – Multi-region EC2 datastore support
      • Asgard – AutoScaleGroup based AWS console
      • Eureka / Discovery – Service Directory
      • Aegisthus – Hadoop ETL for Cassandra
      • Archaius – Dynamic Properties Service
      • Chaos Monkey – Robustness verification
  40. Chaos Monkey
      • Computers (Datacenter or AWS) randomly die
        – Fact of life, but too infrequent to test resiliency
      • Test to make sure systems are resilient
        – Allow any instance to fail without customer impact
      • Chaos Monkey hours
        – Monday-Friday 9am-3pm random instance kill
      • Application configuration option
        – Apps now have to opt-out from Chaos Monkey
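      A minimal sketch of the core idea follows: pick a random running instance and terminate it, assuming the AWS SDK for Java. The opt-out filtering and the Monday-Friday 9am-3pm scheduling are left as comments; this is illustrative only, not the actual Chaos Monkey code.

      import java.util.ArrayList;
      import java.util.List;
      import java.util.Random;
      import com.amazonaws.services.ec2.AmazonEC2;
      import com.amazonaws.services.ec2.AmazonEC2Client;
      import com.amazonaws.services.ec2.model.Instance;
      import com.amazonaws.services.ec2.model.Reservation;
      import com.amazonaws.services.ec2.model.TerminateInstancesRequest;

      // Kill one random running instance. In practice this would only run during
      // Chaos Monkey hours and would skip apps that have opted out.
      public class ChaosMonkeySketch {
          public static void main(String[] args) {
              AmazonEC2 ec2 = new AmazonEC2Client();   // credentials from environment/instance role
              List<String> candidates = new ArrayList<>();
              for (Reservation r : ec2.describeInstances().getReservations()) {
                  for (Instance i : r.getInstances()) {
                      if ("running".equals(i.getState().getName())) {
                          candidates.add(i.getInstanceId());
                      }
                  }
              }
              if (!candidates.isEmpty()) {
                  String victim = candidates.get(new Random().nextInt(candidates.size()));
                  System.out.println("Chaos Monkey terminating " + victim);
                  ec2.terminateInstances(new TerminateInstancesRequest().withInstanceIds(victim));
              }
          }
      }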
  41. Asgard
      • Replacement for AWS Console at Scale
        – Groovy/Grails/JVM based
        – Supports all AWS regions on a global basis
        – Specific to AWS feature set
      • Hides the AWS credentials
        – Use AWS IAM to issue restricted keys for Asgard
        – Each Asgard instance manages one account
        – One install each for test, prod, audit accounts
  42. Roadmap for 2012
      • More resiliency and improved availability
      • More automation, orchestration
      • “Hardening” the platform, code clean-up
      • Lower latency for web services and devices
      • IPv6 – running now, see techblog for details
      • More open sourced components
      • Las Vegas in November - AWS Re:Invent
  43. Back to Live Demo
  44. Disclaimers
      • We didn’t have time to tune the demo
      • These are the plots from the live demo run
      • Runs need to be longer to get to steady state
      • Data size only reached around 5GB per node
      • Plenty of “I wonder why it did that” remains
      • It’s a fair comparison, but not the best absolute performance possible for this workload and configuration
      • When you remove the IO bottleneck, the next few bottlenecks appear…
  45. Activity during the talk, 10:30-11:30: Custom AppDynamics dashboard showing CPU and IOPS per node
  46. Jmeter Plots
      • Plots are the output of the Jenkins build
      • Each instance has its own set of plots
      • Each availability zone has its own summary plots
      • One of the three zone summary plots is compared for each metric
      • Plot collection is currently duplicated as we are transitioning from “Epic” to “Atlas”
  47. Jenkins: Collected results and graphs after the job has completed
  48. Instances per zone (plot): the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  49. Transactions per zone, same as total client transactions (plot): the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  50. Comparison plot: the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  51. Thousands of microseconds (plot): the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  52. Microseconds (plot): the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  53. Comparison plot: the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  54. Comparison plot: the past (m2.4xlarge) vs. the future (hi1.4xlarge)
  55. Takeaway: Netflix has built and deployed a scalable global platform based on Cassandra and AWS. Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS. If you like lots of SSDs, come and work for us… @adrianco #netflixcloud #cassandra12