Cassandra On EC2



Matthew F. Dennis // @mdennis


                         @mdennis
Instance Sizes
●   m1.xlarge is by far the most common size
●   m1.large is ok for many use cases
●   m2.4xlarge in some cases
    ●   keep the entire dataset in memory
●   c1.xlarge / cc1.4xlarge
    ●   Smallish but very hot set of data
        –   regardless of how much data is on disk
    ●   Extremely high request rate
    ●   Encrypted node-node communications and high traffic
    ●   Usually better off with many m1.xlarge instances because of
        the extra memory, but not always

Configuration
●   Stripe All Ephemeral Drives
●   data directory and commit log on same volume
    ●   Only applies to EC2 and SSDs, not physical HW
    ●   Why?
●   6-8 GB heap on m1.xlarge
●   3-4 GB heap on m1.large
●   Phi Convict Threshold? Maybe ...
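A minimal sketch of turning the heap guidance above into a setting, assuming the heap is configured through the `MAX_HEAP_SIZE` variable in `cassandra-env.sh`. The helper name and the choice between the lower and upper bound are illustrative, not part of the talk.

```python
# Heap ranges (in GB) taken from the slide above, keyed by EC2 instance type.
HEAP_GB = {
    "m1.xlarge": (6, 8),
    "m1.large": (3, 4),
}


def heap_setting(instance_type, prefer_upper=False):
    """Return a MAX_HEAP_SIZE line for cassandra-env.sh (hypothetical helper)."""
    low, high = HEAP_GB[instance_type]
    size = high if prefer_upper else low
    return 'MAX_HEAP_SIZE="%dG"' % size


print(heap_setting("m1.xlarge"))
print(heap_setting("m1.large", prefer_upper=True))
```

Starting at the lower bound and raising it only if GC behaviour calls for it is one reasonable default; the slide itself gives only the ranges.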



EBS versus Ephemeral
●   Ephemeral drives are:
    ●   Generally faster for C*
    ●   More stable (no pauses/freezes; outages?)
    ●   Cheaper
    ●   Easier to initially configure
●   Striped EBS?
    ●   yeah, about that …
●   TL;DL don't use EBS for C* on EC2


Multi-Zone
●   Alternate zones in your token topology
    ●   No really, this is important, alternate zones
        –   We should probably fix this ...
●   “complicated, but possible” to add new zones
    after initial deployment
●   Never move a *token* to a different region or
    zone
    ●   If you think that is what you want to do, really you
        want to bootstrap a new one at token-1 in the new
        region/zone and then decom the old one
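A rough sketch of what "alternate zones in your token topology" could look like, assuming RandomPartitioner (token space 0 to 2**127) and evenly spaced tokens. The function names and the zone names are illustrative only.

```python
RING_SIZE = 2 ** 127  # token space of RandomPartitioner (assumption)


def alternating_ring(node_count, zones):
    """Evenly spaced tokens, with zones assigned round-robin around the ring."""
    if node_count % len(zones) != 0:
        # Otherwise the wrap-around would put two nodes of one zone side by side.
        raise ValueError("node_count should be a multiple of the zone count")
    ring = []
    for i in range(node_count):
        token = i * RING_SIZE // node_count
        ring.append((token, zones[i % len(zones)]))
    return ring


def zones_alternate(ring):
    """True if no two ring neighbours (including the wrap-around) share a zone."""
    return all(
        ring[i][1] != ring[(i + 1) % len(ring)][1] for i in range(len(ring))
    )


ring = alternating_ring(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
for token, zone in ring:
    print(zone, token)
print(zones_alternate(ring))
```

The point of the check is that replicas are placed on ring neighbours, so neighbours in the same zone would put several copies of the same data behind one zone failure.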

Multi-Region C* on EC2
●   Connectivity is the complicated part
    ●   Ec2MultiRegionSnitch is not the entire answer
        –   https://issues.apache.org/jira/browse/CASSANDRA-2452
●   Don't try to make a “fail over” DC, just go with active-active
    ●   If you insist, then do the fail over in your application and configure C* the
        same as you would active-active
●   Generally requires a lot more storage
    ●   Doesn't matter though because you're using ephemeral drives (right?)
        and don't want a TB of data on each node anyway




Multi-Region Connectivity Options
●   VPN
●   Encrypted node-node communication
    ●   CPU utilization is often a downside
●   VPNCubed / VPCPlus
    ●   I've never deployed it, heard good things about it though
●   Amazon VPC
    ●   anyone know if a single VPC can span regions yet?
●   SSH Tunnels
●   EC2 security groups
●   IPTables
●   Encrypted node-node + public IP binding + AWS security groups +
    IPTables (EIPs may simplify this, never actually tried it)

Recovery From Failures
●   Don't “fix” EC2 nodes, replace them
    ●   bootstrap at token-1, remove old token
        –   bootstrap can be slow, but will get better

●   Other than that it's the same in EC2 as not ...
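The replace-rather-than-fix flow above can be sketched as a small planning helper. This assumes RandomPartitioner tokens and that the new node's token is set via `initial_token` in `cassandra.yaml`; the step wording and function names are illustrative, and nothing here talks to a real cluster.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption)


def replacement_token(old_token):
    """Token for the replacement node: one below the node being replaced."""
    return (old_token - 1) % RING_SIZE


def replacement_plan(old_token):
    """Ordered, human-readable steps; a checklist, not an executor."""
    new_token = replacement_token(old_token)
    return [
        "launch a fresh instance",
        "set initial_token: %d in cassandra.yaml and bootstrap" % new_token,
        "remove the old token %d from the ring" % old_token,
    ]


for step in replacement_plan(2 ** 126):
    print(step)
```

Sitting at token-1 means the new node takes over essentially the same range as the old one, so once the old token is removed very little data has to move again.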




Node Maintenance
●   “Maintenance” On EC2?
●   Usually not required (just replace the node)
●   If it is, just stop C*; consistency level (CL) + hinted handoff (HH) / repair / read repair (RR) will fix it
    ●   Same as physical HW
    ●   https://issues.apache.org/jira/browse/CASSANDRA-2034


●   Stop Trying To Decom Nodes Just To Replace a Disk !!!




Backups
●   C* snapshots and push to S3
●   Directory Watcher that pushes new files to S3
    ●   SimpleGeo: https://github.com/simplegeo/tablesnap
●   Netflix: http://slidesha.re/NFOnCassBkup
●   Keep a log of all incoming writes
    ●   Not specific to S3
    ●   Can be coupled with snapshots / S3
    ●   Useful for other reasons as well
●   Compression in transit to S3 (or wherever) can be done on
    a separate EC2 instance to avoid burning CPU
    ●   Usually not worth the extra complexity / cost
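A minimal polling sketch of the "directory watcher" idea above. It is not how tablesnap or the Netflix tooling is implemented; the `upload` callable stands in for whatever pushes a file to S3, and all names here are hypothetical. It relies on SSTables being immutable once written, so a file only ever needs to be pushed once.

```python
import os
import tempfile


def new_files(data_dir, already_pushed):
    """Return paths under data_dir not yet pushed, sorted for stable output."""
    found = []
    for root, _dirs, names in os.walk(data_dir):
        for name in names:
            path = os.path.join(root, name)
            if path not in already_pushed:
                found.append(path)
    return sorted(found)


def push_new_files(data_dir, already_pushed, upload):
    """Upload each unseen file once; `upload` is a caller-supplied callable."""
    pushed = []
    for path in new_files(data_dir, already_pushed):
        upload(path)
        already_pushed.add(path)
        pushed.append(path)
    return pushed


# Demo with a stand-in uploader that only records what it was given.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "a-Data.db"), "w").close()
seen, uploaded = set(), []
push_new_files(demo_dir, seen, uploaded.append)
open(os.path.join(demo_dir, "b-Data.db"), "w").close()
push_new_files(demo_dir, seen, uploaded.append)
print([os.path.basename(p) for p in uploaded])
```

A real watcher would also need to skip files that are still being written and persist the set of pushed paths across restarts; both are left out here to keep the sketch short.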

Changing Node Sizes
●   Start a new instance
●   rsync data from original node to new node
●   Shut down C* on original node
●   rsync again from original node to new node (catches changes since the first pass)
●   Start C* on new node
●   Shut down original instance
●   NB: Assumes same token, region, zone, etc
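The two-pass copy above, written out as an ordered plan. This only builds the list of steps and runs nothing. The host names and the `/var/lib/cassandra/` path are assumptions for illustration, and the `--delete` flag is an addition not on the slide: it keeps SSTables that were compacted away between the passes from lingering on the new node.

```python
def resize_plan(old_host, new_host, data_dir="/var/lib/cassandra/"):
    """Ordered (where, what) steps for moving a node to a new instance size."""
    # Trailing slashes make rsync copy the directory contents, not the directory.
    rsync = "rsync -a --delete %s %s:%s" % (data_dir, new_host, data_dir)
    return [
        ("any", "start new instance %s" % new_host),
        (old_host, rsync),             # bulk copy while C* is still serving
        (old_host, "stop Cassandra"),
        (old_host, rsync),             # short catch-up pass, data is now static
        (new_host, "start Cassandra"),
        ("any", "shut down instance %s" % old_host),
    ]


for where, what in resize_plan("old-node", "new-node"):
    print("%-10s %s" % (where, what))
```

The first pass moves the bulk of the data with the node still up, so the window where the node is actually down is only as long as the second, much smaller pass.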


Elastic Load Balancers
●   They're awesome, use them
    ●   Could be more awesome (e.g. better integration with Route 53)
    ●   What I really want is TCP anycast for ELB across regions (AWS could
        make it work)
●   Balance across regions with GeoIP / GeoDNS
    ●   Zerigo, TZOHA, Neustar, “homegrown”, etc
    ●   Route 53? You wish (though Route 53 itself is run over anycast)
        –   “in the future we plan for Route 53 to also give you greater control over
            … the route your users take to reach an endpoint” --Werner Vogels
●   Put them in front of your app servers, not your C* instances
●   Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky
    sessions required)




AMIs versus Scripted Setup
●   DataStax publishes C* AMIs
●   Chef Recipes as well
●   Or roll your own …
●   Whatever you do, just make sure it's automated
    and repeatable
●   *personally* I prefer scripting the setup
    remotely, but this is … “less than ideal”
●   PSSH is, in general, awesome

WTF?!
●   Your zone X is not the same as my zone X
    ●   Consistent within an EC2 account
    ●   Problematic across accounts
    ●   Does not apply to regions (i.e. your region X is my region X)
●   EIPs resolve to private IPs from within AWS
●   EBS volumes sometimes just “freeze”
    ●   AWS: “yeah, that happens sometimes under load”
●   steal% sometimes 20% or more (1%-3% is “normal”)
    ●   This is AWS literally stealing your money
    ●   Thankfully not all that common, but watch out for it
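One way to watch for the steal% problem above is to sample the aggregate `cpu` line of `/proc/stat` twice and compare. The sketch below assumes a Linux guest; the sample lines are made up so the arithmetic can be followed without a live machine.

```python
def steal_percent(before, after):
    """Percent of CPU time stolen between two 'cpu ...' lines from /proc/stat."""
    a = [int(x) for x in before.split()[1:]]
    b = [int(x) for x in after.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only the first eight are summed.
    delta = [y - x for x, y in zip(a[:8], b[:8])]
    total = sum(delta)
    return 100.0 * delta[7] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """First line of /proc/stat is the aggregate over all CPUs."""
    with open(path) as f:
        return f.readline()


# Synthetic samples: 240 of 1100 elapsed ticks were stolen.
sample_1 = "cpu 1000 0 500 8000 100 0 0 400 0 0"
sample_2 = "cpu 1100 0 550 8700 110 0 0 640 0 0"
print(round(steal_percent(sample_1, sample_2), 1))
```

On a live node the two samples would come from `read_cpu_line()` taken a few seconds apart; a sustained value far above the 1%-3% the slide calls normal is the signal to consider replacing the instance.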

Missing AWS Features
●   ELB over anycast
    ●   Probably doable by AWS, but not others ...
●   GeoDNS from Route53
    ●   No really, WTF Doesn't Route53 Do GeoDNS ?!?!
●   Multi-Region VPC
●   Local SSDs




We're Hiring !
●   Developers
●   QA
●   Community Manager
●   Sales / SE
●   Interns
       –   Dev
       –   Support
       –   QA
●   Smart People Interested In Cassandra


Cassandra On EC2



    Q?
 (yes, I'll post the slides on slideshare)




Cassandra On EC2

  • 1. Cassandra On EC2 Matthew F. Dennis // @mdennis @mdennis
  • 2. Instance Sizes ● m1.xlarge is by far the most common size ● m1.large is ok for many use cases ● m2.4xlarge in some cases ● keep the entire dataset in memory ● c1.xlarge / cc1.4xlarge ● Smallish but very hot set of data – regardless of how much data is on disk ● Extremely high request rate ● Encrypted node-node communications and high traffic ● Usually better off with many m1.xlarge instances because of the extra memory, but not always @mdennis
  • 3. Configuration ● Stripe All Ephemeral Drives ● data directory and commit log on same volume ● Only applies to EC2 and SSDs, not physical HW ● Why? ● 6-8 GB heap on m1.xlarge ● 3-4 GB heap on m1.large ● Phi Convict Threshold? Maybe ... @mdennis
  • 4. EBS versus Ephemeral ● Ephemeral drives are: ● Generally faster for C* ● More stable (no pauses/freezes; outages?) ● Cheaper ● Easier to initially configure ● Striped EBS? ● yeah, about that … ● TL;DL don't use EBS for C* on EC2 @mdennis
  • 5. Multi-Zone ● Alternate zones in your token topology ● No really, this is important, alternate zones – We should probably fix this ... ● “complicated, but possible” to add new zones after initial deployment ● Never move a *token* to a different region or zone ● If you think that is what you want to do, really you want to bootstrap new one at token-1 in the new region/zone and then decom the old one @mdennis
  • 6. Multi-Region C* on EC2
      ● Connectivity is the complicated part
          ● Ec2MultiRegionSnitch is not the entire answer
              – https://issues.apache.org/jira/browse/CASSANDRA-2452
      ● Don't try to make a “fail over” DC, just go with active-active
          ● If you insist, then do the fail over in your application and configure C* the same as you would active-active
      ● Generally requires a lot more storage
          ● Doesn't matter though because you're using ephemeral drives (right?) and don't want a TB of data on each node anyway
  • 7. Multi-Region Connectivity Options
      ● VPN
      ● Encrypted node-node communication
          ● CPU utilization is often a downside
      ● VPNCubed / VPCPlus
          ● I've never deployed it, heard good things about it though
      ● Amazon VPC
          ● anyone know if a single VPC can span regions yet?
      ● SSH Tunnels
      ● EC2 security groups
      ● IPTables
      ● Encrypted node-node + public IP binding + AWS security groups + IPTables (EIPs may simplify this, never actually tried it)
  • 8. Recovery From Failures
      ● Don't “fix” EC2 nodes, replace them
          ● bootstrap at token-1, remove old token
              – bootstrap can be slow, but will get better
      ● Other than that it's the same in EC2 as not ...
  • 9. Node Maintenance
      ● “Maintenance” On EC2?
          ● Usually not required (just replace the node)
          ● If it is, just stop C*, CL+HH/repair/RR will fix it
      ● Same as physical HW
          ● https://issues.apache.org/jira/browse/CASSANDRA-2034
      ● Stop Trying To Decom Nodes Just To Replace a Disk !!!
  • 10. Backups
      ● C* snapshots and push to S3
      ● Directory Watcher that pushes new files to S3
          ● SimpleGeo: https://github.com/simplegeo/tablesnap
          ● Netflix: http://slidesha.re/NFOnCassBkup
      ● Keep a log of all incoming writes
          ● Not specific to S3
          ● Can be coupled with snapshots / S3
          ● Useful for other reasons as well
      ● Compression in transit to S3 (or wherever) can be done on a separate EC2 instance to avoid burning CPU
          ● Usually not worth the extra complexity / cost
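
The "directory watcher" idea from slide 10 can be sketched roughly as below. This is a polling simplification, not how tablesnap or the Netflix tooling is implemented. It assumes a file is complete once it appears in the directory, and it leaves the actual S3 transfer to a caller-supplied `upload(path)` hook (hypothetical; plug in whatever client you use).

```python
import os


def new_files(directory, seen):
    """Return sorted names of regular files in `directory` not yet in `seen`."""
    current = {
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }
    return sorted(current - seen)


def push_new_files(directory, seen, upload):
    """Upload each unseen file exactly once and return the names pushed.

    `upload` is a caller-supplied callable taking a file path; a name is
    recorded in `seen` only after `upload` returned without raising, so a
    failed transfer is retried on the next pass.
    """
    pushed = []
    for name in new_files(directory, seen):
        upload(os.path.join(directory, name))
        seen.add(name)
        pushed.append(name)
    return pushed
```

Calling `push_new_files` in a loop with a sleep between passes gives a crude watcher. Uploading each file only once relies on SSTables being immutable after they are written.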
  • 11. Changing Node Sizes
      ● Start a new instance
      ● rsync data from original node to new node
      ● Shutdown C* on original node
      ● rsync data from original node to new node
      ● Start C* on new node
      ● Shutdown original instance
      ● NB: Assumes same token, region, zone, etc
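
The two-pass rsync procedure from slide 11 can be written down as an ordered command plan. This sketch only builds the commands, it does not run them. Host names, the data directory, and the stop/start commands are all placeholders supplied by the caller, and it assumes the new node can reach the original one over SSH. Launching and terminating the instances is left out.

```python
def resize_plan(old_host, new_host, data_dir, stop_cmd, start_cmd):
    """Return (description, argv) steps for moving a node to a new instance.

    The same rsync runs twice: once while the original node is still up
    (the bulk copy), and once after Cassandra is stopped (the catch-up).
    """
    path = data_dir.rstrip("/") + "/"
    # Pull from the original node; --delete drops files that no longer
    # exist on the source by the time of the second pass.
    rsync = ["ssh", new_host, "rsync", "-a", "--delete",
             f"{old_host}:{path}", path]
    return [
        ("bulk copy while the original node is still serving", rsync),
        ("stop Cassandra on the original node", ["ssh", old_host, stop_cmd]),
        ("catch-up copy of what changed since the first pass", rsync),
        ("start Cassandra on the new node", ["ssh", new_host, start_cmd]),
    ]


if __name__ == "__main__":
    # All values below are hypothetical examples.
    plan = resize_plan("old-node", "new-node", "/var/lib/cassandra",
                       "sudo service cassandra stop",
                       "sudo service cassandra start")
    for description, argv in plan:
        print(f"# {description}")
        print(" ".join(argv))
```

The point of the first pass is to keep the window where C* is down as short as possible: the second pass only has to move whatever was flushed or compacted in between.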
  • 12. Elastic Load Balancers
      ● They're awesome, use them
          ● Could be more awesome (e.g. better integration with Route 53)
          ● What I really want is TCP anycast for ELB across regions (AWS could make it work)
      ● Balance across regions with GeoIP / GeoDNS
          ● Zerigo, TZOHA, Neustar, “homegrown”, etc
          ● Route 53? You wish (though Route 53 itself is run over anycast)
              – “in the future we plan for Route 53 to also give you greater control over … the route your users take to reach an endpoint” --Werner Vogels
      ● Put them in front of your app servers, not your C* instances
      ● Keep your app servers stateless or at least “weakly” stateless (e.g. no sticky sessions required)
  • 13. AMIs versus Scripted Setup
      ● DataStax publishes C* AMIs
      ● Chef Recipes as well
      ● Or roll your own …
      ● Whatever you do, just make sure it's automated and repeatable
      ● *personally* I prefer scripting the setup remotely, but this is … “less than ideal”
      ● PSSH is, in general, awesome
  • 14. WTF?!
      ● Your zone X is not the same as my zone X
          ● Consistent within an EC2 account
          ● Problematic across accounts
          ● Does not apply to regions (i.e. your region X is my region X)
      ● EIPs resolve to private IPs from within AWS
      ● EBS volumes sometimes just “freeze”
          ● AWS: “yeah, that happens sometimes under load”
      ● steal% sometimes 20% or more (1%-3% is “normal”)
          ● This is AWS literally stealing your money
          ● Thankfully not all that common, but watch out for it
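
One way to watch for the steal% problem from slide 14 is to diff two readings of the aggregate `cpu` line in Linux's `/proc/stat`. A minimal sketch, assuming a kernel that reports the steal column (the eighth value on that line); the function names are made up for this example.

```python
def cpu_times(stat_text):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before, after):
    """Percentage of CPU time stolen between two /proc/stat readings."""
    # Columns: user nice system idle iowait irq softirq steal [guest ...]
    # Guest time is already included in user, so only the first eight
    # columns are summed for the total.
    first, second = cpu_times(before)[:8], cpu_times(after)[:8]
    if len(first) < 8 or len(second) < 8:
        raise ValueError("this kernel does not report a steal column")
    deltas = [b - a for a, b in zip(first, second)]
    total = sum(deltas)
    return 100.0 * deltas[7] / total if total > 0 else 0.0
```

In practice, read `/proc/stat` twice a few seconds apart and pass both strings in. Per the slide, sustained values well above the 1%-3% range are the ones to watch out for.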
  • 15. Missing AWS Features
      ● ELB over anycast
          ● Probably doable by AWS, but not others ...
      ● GeoDNS from Route53
          ● No really, WTF Doesn't Route53 Do GeoDNS ?!?!
      ● Multi-Region VPC
      ● Local SSDs
  • 16. We're Hiring !
      ● Developers
      ● QA
      ● Community Manager
      ● Sales / SE
      ● Interns
          – Dev
          – Support
          – QA
      ● Smart People Interested In Cassandra
  • 17. Cassandra On EC2
      Q? (yes, I'll post the slides on slideshare)