Cassandra On EC2



1. Cassandra On EC2
   Matthew F. Dennis // @mdennis

2. Instance Sizes
   ● m1.xlarge is by far the most common size
   ● m1.large is OK for many use cases
   ● m2.4xlarge in some cases
     ● e.g. to keep the entire dataset in memory
   ● c1.xlarge / cc1.4xlarge
     ● Smallish but very hot set of data, regardless of how much data is on disk
     ● Extremely high request rate
     ● Encrypted node-to-node communication and high traffic
     ● Usually better off with many m1.xlarge instances because of the extra memory, but not always

3. Configuration
   ● Stripe all ephemeral drives
   ● Data directory and commit log on the same volume
     ● Only applies to EC2 and SSDs, not physical HW
     ● Why?
   ● 6-8 GB heap on m1.xlarge
   ● 3-4 GB heap on m1.large
   ● Phi convict threshold (phi_convict_threshold)? Maybe ...

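The heap sizes above can be turned into a setting mechanically. This is a minimal sketch, assuming the heap is set through the `MAX_HEAP_SIZE` variable in `conf/cassandra-env.sh`; it only knows the two instance types the slide gives numbers for and refuses to guess for any other. Newer `cassandra-env.sh` scripts also expect `HEAP_NEWSIZE` to be set whenever `MAX_HEAP_SIZE` is, which this sketch does not cover.

```python
# Heap ranges quoted on the Configuration slide, in GB.
# Other instance types are not covered by the slide and are deliberately left out.
SLIDE_HEAP_RANGES_GB = {
    "m1.xlarge": (6, 8),
    "m1.large": (3, 4),
}


def max_heap_line(instance_type: str, use_upper_bound: bool = False) -> str:
    """Return a cassandra-env.sh style MAX_HEAP_SIZE assignment for an instance type."""
    try:
        low, high = SLIDE_HEAP_RANGES_GB[instance_type]
    except KeyError:
        raise ValueError(f"no heap guidance on the slide for {instance_type!r}") from None
    size_gb = high if use_upper_bound else low
    return f'MAX_HEAP_SIZE="{size_gb}G"'


if __name__ == "__main__":
    for itype in SLIDE_HEAP_RANGES_GB:
        print(itype, max_heap_line(itype))
```

Starting at the low end of the range and raising it only after watching GC behaviour is a judgment call, not something the slide prescribes.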
4. EBS versus Ephemeral
   ● Ephemeral drives are:
     ● Generally faster for C*
     ● More stable (no pauses/freezes; outages?)
     ● Cheaper
     ● Easier to initially configure
   ● Striped EBS?
     ● Yeah, about that …
   ● TL;DR: don't use EBS for C* on EC2

5. Multi-Zone
   ● Alternate zones in your token topology
     ● No really, this is important: alternate zones
       – We should probably fix this ...
   ● Adding new zones after the initial deployment is "complicated, but possible"
   ● Never move a *token* to a different region or zone
     ● If you think that is what you want to do, what you really want is to bootstrap a new node at token-1 in the new region/zone and then decommission the old one

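The "alternate zones" rule can be made concrete with a small token calculator. This is a sketch under stated assumptions: RandomPartitioner (tokens spread evenly over a space of 2**127) and the same number of nodes in every zone; the zone names in the usage are placeholders. Tokens are spaced evenly and zones are assigned round-robin, so walking the ring you never meet two consecutive nodes in the same zone. Each printed token would go into that node's `initial_token` setting.

```python
from typing import List, Sequence, Tuple

# Token space of RandomPartitioner (tokens run from 0 up to 2**127).
RING_SIZE = 2 ** 127


def alternating_ring(num_nodes: int, zones: Sequence[str]) -> List[Tuple[int, str]]:
    """Evenly spaced tokens, with zones assigned round-robin around the ring."""
    if not zones or num_nodes % len(zones) != 0:
        # With unequal node counts per zone, two ring neighbours end up in the same zone.
        raise ValueError("use the same number of nodes in every zone")
    return [
        (i * RING_SIZE // num_nodes, zones[i % len(zones)])
        for i in range(num_nodes)
    ]


if __name__ == "__main__":
    for token, zone in alternating_ring(6, ["us-east-1a", "us-east-1b", "us-east-1c"]):
        print(f"{zone}  initial_token: {token}")
```

Rejecting uneven zone counts is the point of the check: with, say, four nodes in three zones, the last and first nodes on the ring share a zone, which is exactly the layout the slide warns against.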
6. Multi-Region C* on EC2
   ● Connectivity is the complicated part
     ● Ec2MultiRegionSnitch is not the entire answer
   ● Don't try to make a "fail over" DC; just go with active-active
     ● If you insist, do the fail over in your application and configure C* the same as you would for active-active
   ● Generally requires a lot more storage
     ● Doesn't matter, though, because you're using ephemeral drives (right?) and don't want a TB of data on each node anyway

7. Multi-Region Connectivity Options
   ● VPN
   ● Encrypted node-to-node communication
     ● CPU utilization is often a downside
   ● VPNCubed / VPCPlus
     ● I've never deployed it, but I've heard good things about it
   ● Amazon VPC
     ● Does anyone know if a single VPC can span regions yet?
   ● SSH tunnels
   ● EC2 security groups
   ● IPTables
   ● Encrypted node-to-node + public IP binding + AWS security groups + IPTables (EIPs may simplify this; never actually tried it)

8. Recovery From Failures
   ● Don't "fix" EC2 nodes, replace them
     ● Bootstrap at token-1, remove the old token
       – Bootstrap can be slow, but will get better
   ● Other than that, it's the same in EC2 as not ...

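The "bootstrap at token-1, remove the old token" recipe boils down to one subtraction and one cleanup command. This sketch assumes RandomPartitioner's 2**127 token space and wraps around below zero; `nodetool removetoken` is the command of the Cassandra releases this talk targets (later releases replaced it with `removenode`). The plan dictionary and its keys are illustrative, not a Cassandra API.

```python
# Token space of RandomPartitioner (tokens run from 0 up to 2**127).
RING_SIZE = 2 ** 127


def replacement_plan(dead_token: int) -> dict:
    """Token for the replacement node plus the follow-up cleanup command."""
    # token-1 places the new node immediately before the dead one, so once the
    # old token is removed the new node owns (almost) the same range.
    # Wrap around so that replacing token 0 lands at the top of the ring.
    new_token = (dead_token - 1) % RING_SIZE
    return {
        "initial_token": new_token,                         # set in the new node's cassandra.yaml
        "then_run": f"nodetool removetoken {dead_token}",   # after the bootstrap finishes
    }


if __name__ == "__main__":
    print(replacement_plan(85070591730234615865843651857942052864))
```

Only the single token at the boundary changes hands, so the replacement streams essentially the same data the failed node held.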
9. Node Maintenance
   ● "Maintenance" on EC2?
   ● Usually not required (just replace the node)
   ● If it is, just stop C*; CL + HH / repair / RR (consistency level, hinted handoff, repair, read repair) will fix it
     ● Same as physical HW
   ● Stop Trying To Decom Nodes Just To Replace a Disk !!!

10. Backups
   ● C* snapshots and push to S3
   ● Directory watcher that pushes new files to S3
     ● SimpleGeo:
     ● Netflix:
   ● Keep a log of all incoming writes
     ● Not specific to S3
     ● Can be coupled with snapshots / S3
     ● Useful for other reasons as well
   ● Compression in transit to S3 (or wherever) can be done on a separate EC2 instance to avoid burning CPU
     ● Usually not worth the extra complexity / cost

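A directory watcher of the kind mentioned above can be sketched as a polling loop. This is not the SimpleGeo or Netflix tool; it is a minimal sketch that relies on SSTables being immutable (each file needs uploading only once) and assumes files still being written carry a `-tmp-` marker in their name, which you should check against your Cassandra version. `upload` is a hypothetical callable you would back with your S3 client of choice. Files deleted by compaction are not removed from S3 here.

```python
import os
import time
from typing import Callable, List, Optional, Set


def sync_new_files(data_dir: str, seen: Set[str], upload: Callable[[str], None]) -> List[str]:
    """Upload every file under data_dir not uploaded before; return the new paths."""
    uploaded = []
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            # Assumption: files still being written carry a "-tmp-" marker in their name.
            if "-tmp-" in name:
                continue
            path = os.path.join(root, name)
            if path not in seen:
                upload(path)      # hypothetical: e.g. a function that PUTs the file to S3
                seen.add(path)    # SSTables are immutable, so one upload per file suffices
                uploaded.append(path)
    return uploaded


def watch(data_dir: str, upload: Callable[[str], None],
          interval_s: float = 30.0, rounds: Optional[int] = None) -> None:
    """Poll data_dir forever (or for `rounds` iterations) and upload new files."""
    seen: Set[str] = set()
    done = 0
    while rounds is None or done < rounds:
        sync_new_files(data_dir, seen, upload)
        done += 1
        time.sleep(interval_s)
```

A file is marked as seen only after `upload` returns, so a failed upload raises and the file is retried on the next pass once the watcher is restarted.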
11. Changing Node Sizes
   ● Start a new instance
   ● rsync data from the original node to the new node
   ● Shut down C* on the original node
   ● rsync data from the original node to the new node (again)
   ● Start C* on the new node
   ● Shut down the original instance
   ● NB: Assumes same token, region, zone, etc.

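The resize steps can be written down as an ordered plan, which makes the reason for the double rsync visible. This sketch only builds the step list; the data path `/var/lib/cassandra/`, the host names, and the "stop/start Cassandra" wording are assumptions to adapt to your install. `rsync -a --delete` is standard rsync: archive mode, and remove files on the target that no longer exist on the source.

```python
from typing import List, Tuple

# Assumption: default data location; adjust to your data_file_directories / commitlog_directory.
DEFAULT_DATA_DIR = "/var/lib/cassandra/"


def resize_plan(old_host: str, new_host: str,
                data_dir: str = DEFAULT_DATA_DIR) -> List[Tuple[str, str]]:
    """Ordered (where, action) steps for moving a node to a differently sized instance."""
    # Pulled from the new node. The trailing slash copies directory contents, and
    # --delete drops files that compaction removed on the source between passes.
    rsync = f"rsync -a --delete {old_host}:{data_dir} {data_dir}"
    return [
        ("operator", f"start instance {new_host} (same region and zone)"),
        (new_host, rsync),              # bulk copy while the old node keeps serving
        (old_host, "stop Cassandra"),   # freeze the data set
        (new_host, rsync),              # short second pass: only what changed
        (new_host, "start Cassandra"),  # same token as the old node
        ("operator", f"terminate instance {old_host}"),
    ]


if __name__ == "__main__":
    for where, action in resize_plan("10.0.0.5", "10.0.0.9"):
        print(f"[{where}] {action}")
```

The first rsync moves the bulk of the data while the node is still live; the second, run after Cassandra is stopped, only has to pick up the few SSTables flushed or compacted since, which keeps the downtime short.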
12. Elastic Load Balancers
   ● They're awesome, use them
     ● Could be more awesome (e.g. better integration with Route 53)
     ● What I really want is TCP anycast for ELB across regions (AWS could make it work)
   ● Balance across regions with GeoIP / GeoDNS
     ● Zerigo, TZOHA, Neustar, "homegrown", etc.
     ● Route 53? You wish (though Route 53 itself is run over anycast)
       – "in the future we plan for Route 53 to also give you greater control over … the route your users take to reach an endpoint" (Werner Vogels)
   ● Put them in front of your app servers, not your C* instances
   ● Keep your app servers stateless, or at least "weakly" stateless (e.g. no sticky sessions required)

13. AMIs versus Scripted Setup
   ● DataStax publishes C* AMIs
   ● Chef recipes as well
   ● Or roll your own …
   ● Whatever you do, just make sure it's automated and repeatable
   ● *Personally* I prefer scripting the setup remotely, but this is … "less than ideal"
   ● PSSH is, in general, awesome

14. WTF?!
   ● Your zone X is not the same as my zone X
     ● Consistent within an EC2 account
     ● Problematic across accounts
     ● Does not apply to regions (i.e. your region X is my region X)
   ● EIPs resolve to private IPs from within AWS
   ● EBS volumes sometimes just "freeze"
     ● AWS: "yeah, that happens sometimes under load"
   ● steal% is sometimes 20% or more (1%-3% is "normal")
     ● This is AWS literally stealing your money
     ● Thankfully not all that common, but watch out for it

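steal% is the share of time your virtual CPU was ready to run but the hypervisor ran something else; `top` and `vmstat` show it as `st`. A minimal sketch of measuring it yourself, assuming a Linux guest whose `/proc/stat` aggregate `cpu` line includes the steal counter as its eighth value (user, nice, system, idle, iowait, irq, softirq, steal). The percentage is taken over the first eight counters, the same basis `top` uses.

```python
import time
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    """
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(v) for v in fields[1:9]]


def steal_percent(before: str, after: str) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total else 0.0


def sample_steal(interval_s: float = 5.0) -> float:
    """Sample /proc/stat twice (Linux only) and return steal% for the interval."""
    def read() -> str:
        with open("/proc/stat") as f:
            return f.readline()  # the first line is the aggregate "cpu" line
    first = read()
    time.sleep(interval_s)
    return steal_percent(first, read())
```

Because the counters are cumulative since boot, only the difference between two samples says anything about current conditions; run `sample_steal()` periodically and alert when it stays well above the 1%-3% the slide calls normal.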
15. Missing AWS Features
   ● ELB over anycast
     ● Probably doable by AWS, but not others ...
   ● GeoDNS from Route 53
     ● No really, WTF? Why doesn't Route 53 do GeoDNS?!?!
   ● Multi-region VPC
   ● Local SSDs

16. We're Hiring!
   ● Developers
   ● QA
   ● Community Manager
   ● Sales / SE
   ● Interns
     – Dev
     – Support
     – QA
   ● Smart People Interested In Cassandra

17. Cassandra On EC2
   Q? (yes, I'll post the slides on SlideShare)