Cassandra Performance and Scalability on AWS

  • 14,228 views

Summary of past Cassandra benchmarks performed by Netflix and a description of how Netflix uses Cassandra, interspersed with a live demo, automated using Jenkins and Jmeter, that created two 12-node Cassandra clusters from scratch on AWS, one with regular disks and one with SSDs. Both clusters were scaled up to 24 nodes each during the demo.


Upload Details

Uploaded as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

  • Here's the video of the talk
    http://www.youtube.com/watch?v=Wo-zkUH1R8A
  • Thanks for posting this. Detailed reporting from the front lines is hard to come by. Noticed the new EC2 instances with SSD and wondered who was using them. We've had Intel 710 drives in production for six months underneath Postgresql in a few boxes and I doubt I'll buy another hard drive for a server.
  • The talk was recorded on video; when that's available I will link it here. The Jenkins automation and demo benchmark setup was written by Denis Sheahan; cass_jmeter is already available at netflix.github.com along with Priam and Asgard.
  • Complete connection pool abstraction: queries and mutations are wrapped in objects created by the Keyspace implementation, making it possible to retry failed operations. This differs from other connection pool implementations, where the operation is created on a specific connection and must be completely redone if it fails.
  • Simplified serialization via method overloading: the low level thrift library only understands data that is serialized to a byte array. Hector requires serializers to be specified for nearly every call. Astyanax minimizes the places where serializers are specified by using predefined ColumnFamily and ColumnPath definitions which specify the serializers. The API also overloads set and get operations for common data types.
  • The internal library does not log anything; all internal events are instead ... calls to a ConnectionPoolMonitor interface. This allows customization of log levels and filtering of repeating events outside of the scope of the connection pool.
  • Super columns will soon be replaced by Composite column names, so it is recommended not to use super columns at all and to use Composite column names instead. There is some support for super columns in Astyanax but those methods have been deprecated and will eventually be removed.
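
As an aside, here is a minimal illustrative sketch (not taken from the deck) of what predefined serializers and overloaded typed calls look like in Astyanax; the keyspace wiring is assumed to exist already, and the column family, row key type and column names are invented:

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.LongSerializer;
import com.netflix.astyanax.serializers.StringSerializer;

public class SerializationSketch {
    // Serializers are declared once on the ColumnFamily definition
    // (long row keys, string column names), not on every call.
    private static final ColumnFamily<Long, String> CF_VIEWS =
        new ColumnFamily<Long, String>("views", LongSerializer.get(), StringSerializer.get());

    public static void writeRow(Keyspace keyspace, long rowKey) throws Exception {
        MutationBatch m = keyspace.prepareMutationBatch();
        // putColumn is overloaded for common types (String, long, boolean, ...)
        m.withRow(CF_VIEWS, rowKey)
            .putColumn("title", "Cassandra on AWS", null)
            .putColumn("view_count", 14228L, null);
        m.execute();  // the whole batch, not a single connection, is what can be retried
    }

    public static long readViewCount(Keyspace keyspace, long rowKey) throws Exception {
        ColumnList<String> row = keyspace.prepareQuery(CF_VIEWS)
            .getKey(rowKey).execute().getResult();
        // typed getters mirror the overloaded setters
        return row.getLongValue("view_count", 0L);
    }
}
```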

Cassandra Performance and Scalability on AWS: Presentation Transcript

  • Cassandra Performance and Scalability on AWS – August 8th, 2012 – Adrian Cockcroft, @adrianco, #netflixcloud #cassandra12 – http://www.linkedin.com/in/adriancockcroft
  • Things we don’t do
  • Things we do do. Run benchmarks. Now.
  • YOLO
  • Screenshots from Live Demo: backup slides from pre-runs of the demo, with some updates to show what actually happened
  • Asgard: cass_perf apps, with no instances running
  • Jenkins: perf_test jobs
  • Jmeter Setup: Build parameters
  • Jmeter Setup: Build parameters
  • Jmeter Setup: Build parameters
  • Asgard: initial set of cass instances up and running
  • Back to Presentation: while the load gets going…
  • Scalability from 48 to 288 nodes on AWS – http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
    Chart: Client Writes/s by node count, Replication Factor = 3, from 174,373 writes/s at 48 nodes to 366,828 at 96, 537,172 at 144, and 1,099,837 at 288 nodes (node counts per the linked techblog post).
    Used 288 m1.xlarge instances (4 CPU, 15 GB RAM, 8 ECU), Cassandra 0.8.6; the benchmark configuration only existed for about one hour.
  • Blah Blah Blah (I’m skipping all the cloud intro etc. Netflix runs in the cloud, if you hadn’t figured that out already you aren’t paying attention and should go read slideshare.net/netflix)
  • “Some people skate to the puck, I skate to where the puck is going to be” – Wayne Gretzky
  • Cassandra on AWS
    The Past: instance m2.4xlarge; storage 2 drives, 1.7TB; CPU 8 cores, 26 ECU; RAM 68GB; network 1Gbit; IOPS ~500; throughput ~100 Mbyte/s; cost $1.80/hr
    The Future: instance hi1.4xlarge; storage 2 SSD volumes, 2TB; CPU 8 HT cores, 35 ECU; RAM 64GB; network 10Gbit; IOPS ~100,000; throughput ~1 Gbyte/s; cost $3.10/hr
  • Cassandra Disk vs. SSD Benchmark: Same Throughput, Lower Latency, Half Cost
  • Live Demo Workload
    – Jenkins automation: Jmeter load driver, Asgard provisioning, Priam instance management
    – Traffic: reading/writing whole 100 column rows, randomly selected from 25M row keys; run for 10 minutes, then double the ring size
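
For concreteness, a rough sketch of the read/write loop this traffic pattern implies, assuming an Astyanax Keyspace is already wired up; the column family name, key format and helper structure are invented, and only the 25M keys and 100-column rows come from the slide:

```java
import java.util.Random;

import com.netflix.astyanax.ColumnListMutation;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class DemoWorkloadSketch {
    private static final int ROW_KEYS = 25000000;    // 25M row keys (from the slide)
    private static final int COLUMNS_PER_ROW = 100;  // whole 100 column rows (from the slide)
    private static final ColumnFamily<String, String> CF_PERF =  // invented column family name
        new ColumnFamily<String, String>("perf_test", StringSerializer.get(), StringSerializer.get());

    private final Random random = new Random();

    // Write a whole 100-column row for a randomly chosen key.
    public void writeRandomRow(Keyspace keyspace) throws Exception {
        String key = "key" + random.nextInt(ROW_KEYS);
        MutationBatch m = keyspace.prepareMutationBatch();
        ColumnListMutation<String> row = m.withRow(CF_PERF, key);
        for (int c = 0; c < COLUMNS_PER_ROW; c++) {
            row.putColumn("col" + c, "value" + c, null);
        }
        m.execute();
    }

    // Read back a whole row for a randomly chosen key.
    public void readRandomRow(Keyspace keyspace) throws Exception {
        String key = "key" + random.nextInt(ROW_KEYS);
        keyspace.prepareQuery(CF_PERF).getKey(key).execute().getResult();
    }
}
```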
  • The Netflix Streaming Service
  • Major Front End Services
    – Non-member Web Site www.netflix.com: marketing driven, sign-up flow, SOX/PCI scope
    – Member Web Site movies.netflix.com: personalization driven
    – CDNs for delivering bulk video/audio; Netflix CDN: openconnect.netflix.com
    – API for external and device user interfaces: mostly private APIs, public API docs at developer.netflix.com
    – API for controlling video playback: DRM, QoS management, bookmarks
  • Netflix Deployed on AWS (diagram: migration by tier and year, 2009–2011, across Content, Logs, Play, WWW, API and CS; from content encoding and log analysis on S3, EC2/EMR, Hive & Pig, through DRM, CDN routing, sign-up, metadata, search, movie and TV choosing, ratings, device config, bookmarks, diagnostics and actions, to customer service lookup, social/Facebook integration and CS analytics; CDNs and ISPs deliver terabits to customers)
  • Cassandra Instance Architecture (diagram): Linux Base AMI (CentOS); Java 7; Tomcat/Java 7 running Priam, the Cassandra manager (token management, backups, autoscaling); AppDynamics appagent monitoring; Cassandra 1.0.9; AppDynamics machineagent; plus log rotation, GC and thread dump logging, etc.
  • Priam – Cassandra Automation – Available at http://github.com/netflix
    – Netflix Platform Tomcat code
    – Zero touch auto-configuration
    – State management for Cassandra JVM
    – Token allocation and assignment
    – Broken node auto-replacement
    – Full and incremental backup to S3
    – Restore sequencing from S3
    – Grow/Shrink Cassandra “ring”
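
To illustrate the token allocation idea, here is a sketch of even ring splitting for the RandomPartitioner; this is not Priam's actual code, which also offsets tokens per region and handles doubling the ring:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class TokenSplitSketch {
    // RandomPartitioner token space runs from 0 to 2^127
    private static final BigInteger RING_SIZE = BigInteger.valueOf(2).pow(127);

    // Evenly space one initial token per node around the ring.
    public static List<BigInteger> evenTokens(int nodeCount) {
        List<BigInteger> tokens = new ArrayList<BigInteger>();
        BigInteger step = RING_SIZE.divide(BigInteger.valueOf(nodeCount));
        for (int i = 0; i < nodeCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // 12-node ring, as in the live demo clusters
        for (BigInteger token : evenTokens(12)) {
            System.out.println(token);
        }
    }
}
```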
  • Astyanax – Available at http://github.com/netflix
    – Features: complete abstraction of connection pool from RPC protocol; fluent style API; operation retry with backoff; token aware
    – Recipes: distributed row lock (without zookeeper); multi-DC row lock; uniqueness constraint; multi-row uniqueness constraint; large file storage
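
By way of illustration (not from the deck), this is roughly how a token-aware connection pool and a backoff retry policy are configured in Astyanax; the cluster name, keyspace, seed list and pool sizes are made up, and exact configuration method names vary slightly between Astyanax releases:

```java
import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolType;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.retry.ExponentialBackoff;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

public class AstyanaxSetupSketch {
    public static Keyspace connect() {
        AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("demo_cluster")           // invented cluster name
            .forKeyspace("perf_test")             // invented keyspace name
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE)      // discover the ring topology
                .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)  // route to the replica that owns the token
                .setRetryPolicy(new ExponentialBackoff(250, 5)))        // retry failed operations with backoff
            .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("demo_pool")
                .setPort(9160)
                .setMaxConnsPerHost(10)
                .setSeeds("127.0.0.1:9160"))      // invented seed list
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor()) // events go to a monitor, not a logger
            .buildKeyspace(ThriftFamilyFactory.getInstance());
        context.start();
        return context.getEntity();               // later releases call this getClient()
    }
}
```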
  • Scale Up: return to the live demo to watch new nodes coming online
  • Kiklos: clusters growing from 12 to 24 (in-service, bootstrapping, garbage-collecting, cass-down) – http://explorers.us-east-1.dyntest.netflix.net:7001/jr/cassandradashboard
  • Kiklos: clusters growing from 12 to 24 (in-service, bootstrapping, garbage-collecting, cass-down)
  • Asgard: showed 24 nodes per cluster, but didn’t get a screen shot
  • Back to Presentation: while Jenkins/Jmeter collects graphs and shuts down the systems
  • Cassandra on AWS: a highly available and durable deployment pattern
  • High Availability
    – Cassandra stores 3 local copies, 1 per zone: synchronous access, durable, highly available
    – Read/Write One is fastest; use it for fire and forget. Read/Write Quorum is 2 of 3; use it for read-after-write
    – AWS Availability Zones: separate buildings, separate power etc., fairly close together
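
A short sketch of how this consistency-level tradeoff shows up in client code, using Astyanax since that is the client discussed elsewhere in the deck; the column family and column names are invented:

```java
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.model.ConsistencyLevel;
import com.netflix.astyanax.serializers.StringSerializer;

public class ConsistencySketch {
    // Invented column family with string row keys and string column names
    private static final ColumnFamily<String, String> CF_USERS =
        new ColumnFamily<String, String>("users", StringSerializer.get(), StringSerializer.get());

    // CL_ONE: ack from a single replica, fastest, fine for fire-and-forget writes.
    public static void fireAndForgetWrite(Keyspace keyspace) throws Exception {
        MutationBatch m = keyspace.prepareMutationBatch()
            .setConsistencyLevel(ConsistencyLevel.CL_ONE);
        m.withRow(CF_USERS, "user123").putColumn("last_seen", "2012-08-08", null);
        m.execute();
    }

    // CL_QUORUM: 2 of 3 replicas must respond, giving read-after-write consistency.
    public static String readAfterWrite(Keyspace keyspace) throws Exception {
        ColumnList<String> row = keyspace.prepareQuery(CF_USERS)
            .setConsistencyLevel(ConsistencyLevel.CL_QUORUM)
            .getKey("user123")
            .execute()
            .getResult();
        return row.getStringValue("last_seen", null);
    }
}
```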
  • “Traditional” Cassandra Write Data Flows – Single Region, Multiple Availability Zone, Not Token Aware (diagram: non-token-aware clients and Cassandra nodes with disks in Zones A, B and C)
    1. Client writes to any Cassandra node
    2. Coordinator node replicates to nodes and zones
    3. Nodes return ack to coordinator
    4. Coordinator returns ack to client
    5. Data written to internal commit log disk (no more than 10 seconds later)
    Notes: If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
  • Astyanax – Cassandra Write Data Flows – Single Region, Multiple Availability Zone, Token Aware (diagram: token-aware clients write directly to the replica nodes in Zones A, B and C)
    1. Client writes to Cassandra nodes and zones
    2. Nodes return ack to client
    3. Data written to internal commit log disks (no more than 10 seconds later)
    Notes: If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
  • Data Flows for Multi-Region Writes – Token Aware, Consistency Level = Local Quorum (diagram: US and EU rings, each spanning three zones, linked with 100+ms latency)
    1. Client writes to local replicas
    2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
    3. Local coordinator writes to remote coordinator (100+ms latency)
    4. When data arrives, remote coordinator node acks and copies to other remote zones
    5. Remote nodes ack to local coordinator
    6. Data flushed to internal commit log disks (no more than 10 seconds later)
    Notes: If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
  • Extending to Multi-Region – Added production UK/Ireland support with no downtime; minimized impact on the original cluster using a bulk backup move. (“Take a Boeing 737 on a domestic flight, upgrade it to a 747 by adding more engines, fuel and bigger wings, and fly it to Europe without landing it on the way…”)
    1. Create cluster in EU
    2. Backup US cluster to S3
    3. Restore backup in EU
    4. Local repair EU cluster
    5. Global repair/join
    (Diagram: US and EU rings, each spanning three zones, 100+ms latency apart, with S3 as the transfer path)
  • Cassandra Backup (diagram: Cassandra nodes copying SSTables to S3 and archiving cross-region)
    – Full backup: time based snapshot; SSTables compressed and copied to S3
    – Incremental backup: SSTable write triggers a compressed copy to S3
    – Archive: copy cross region
  • ETL for Cassandra
    – Data is de-normalized over many clusters!
    – Too many to restore from backups for ETL
    – Solution: read backup files using Hadoop
    – Aegisthus (http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html): high throughput raw SSTable processing; re-normalizes many clusters to a consistent view; extract, transform, then load into Teradata
  • Netflix Open Source Strategy
    – Release PaaS components git-by-git: source at github.com/netflix – we build from it; intros and techniques at techblog.netflix.com; blog post or new code every few weeks
    – Motivations: give back to the Apache licensed OSS community; motivate, retain, hire top engineers; “peer pressure” code cleanup, external contributions
  • Open Source Projects and Posts (grid slide; legend distinguishes Github/Techblog releases, Apache contributions, techblog posts, and coming soon)
    – Priam: Cassandra as a Service
    – Exhibitor: Zookeeper as a Service
    – Servo and Autoscaling Scripts
    – Astyanax: Cassandra client for Java
    – Honu: Log4j streaming to Hadoop
    – Curator: Zookeeper patterns
    – EVCache: Memcached as a Service
    – CassJMeter: Cassandra test suite
    – Circuit Breaker: robust service pattern
    – Cassandra: multi-region EC2 datastore support
    – Asgard: AutoScaleGroup based AWS console
    – Eureka / Discovery: service directory
    – Aegisthus: Hadoop ETL for Cassandra
    – Archaius: dynamic properties service
    – Chaos Monkey: robustness verification
  • Chaos Monkey – http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
    – Computers (datacenter or AWS) randomly die: a fact of life, but too infrequent to test resiliency
    – Test to make sure systems are resilient: allow any instance to fail without customer impact
    – Chaos Monkey hours: Monday-Friday, 9am-3pm, random instance kill
    – Application configuration option: apps now have to opt out of Chaos Monkey
  • Asgard – http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
    – Replacement for AWS Console at scale: Groovy/Grails/JVM based; supports all AWS regions on a global basis; specific to AWS feature set
    – Hides the AWS credentials: use AWS IAM to issue restricted keys for Asgard; each Asgard instance manages one account; one install each for test, prod, and audit accounts
  • Roadmap for 2012
    – More resiliency and improved availability
    – More automation, orchestration
    – “Hardening” the platform, code clean-up
    – Lower latency for web services and devices
    – IPv6 – running now, see techblog for details
    – More open sourced components
    – Las Vegas in November – AWS Re:Invent
  • Back to Live Demo
  • Disclaimers
    – We didn’t have time to tune the demo
    – These are the plots from the live demo run
    – Runs need to be longer to get to steady state
    – Data size only reached around 5GB per node
    – Plenty of “I wonder why it did that” remains
    – It’s a fair comparison, but not the best absolute performance possible for this workload and configuration
    – When you remove the IO bottleneck, the next few bottlenecks appear…
  • Activity during the talk, 10:30-11:30: custom AppDynamics dashboard showing CPU and IOPS per node
  • Jmeter Plots
    – Plots are the output of the Jenkins build
    – Each instance has its own set of plots
    – Each availability zone has its own summary plots
    – One of the three zone summary plots is compared for each metric
    – Plot collection is currently duplicated as we are transitioning from “Epic” to “Atlas”
  • Jenkins: collected results and graphs after the job has completed
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): instances per zone
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): transactions per zone, same as total client transactions
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): [comparison plot]
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): [comparison plot, units: thousands of microseconds]
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): [comparison plot, units: microseconds]
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): [comparison plot]
  • The past (m2.4xlarge) vs. the future (hi1.4xlarge): [comparison plot]
  • Takeaway: Netflix has built and deployed a scalable global platform based on Cassandra and AWS. Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS. If you like lots of SSDs, come and work for us…
    http://github.com/Netflix
    http://techblog.netflix.com
    http://slideshare.net/Netflix
    http://www.linkedin.com/in/adriancockcroft
    @adrianco #netflixcloud #cassandra12