Ben Bromhead
Cassandra… Every day I’m scaling
© DataStax, All Rights Reserved.
Who am I and What do I do?
• Co-founder and CTO of Instaclustr -> www.instaclustr.com
• Instaclustr provides Cassandra-as-a-Service in the cloud.
• Currently support AWS, Azure, Heroku, Softlayer and Private DCs with more to come.
• Approaching 1000 nodes under management
• Yes… we are hiring! Come live in Australia!
1 Why scaling sucks in Cassandra
2 It gets better
3 Then it gets really awesome
Linear Scalability – In theory
Linear Scalability – In practice
What’s supposed to happen
• Scaling Cassandra is just “bootstrap new nodes”
• That works if your cluster is under provisioned and has 30% disk usage
What actually happens
• Add 1 node
• Bootstrapping node fails (1 day)
• WTF - Full disk on bootstrapping node? (5 minutes)
• If using STCS, run sstablesplit on large SSTables on the original nodes (2 days)
• Attach super sized network storage (EBS) and bind mount to bootstrapping node.
What actually happens
• Restart bootstrapping process
• Disk alert 70% (2 days later)
• Throttle streaming throughput to below compaction throughput
• Bootstrapping finishes (5 days later)
• Cluster latency spikes because bootstrap finished but there were a million compactions
remaining
• Take node offline and let compaction finish
• Run repair on node (10 years)
• Add next node.
What actually happens
Scalability in Cassandra sucks
• So much over-streaming
• LCS and Bootstrap – Over stream then compact all the data!
• STCS and bootstrap – Over stream all the data and run out of disk space
Scalability in Cassandra sucks
• No vnodes? – Can only double your cluster
• Vnodes? – Can only add one node at a time
• Bootstrap – Fragile and not guaranteed to be consistent
Why does it suck for you
Your database never meets your business requirements from a capacity perspective
(bad) and if you try…
• You could interrupt availability and performance (really bad)
• You could lose data (really really bad)
How did it get this way?
It’s actually a hard problem:
• Moving large amounts of data between nodes requires just as much attention
from a CAP perspective as client-facing operations.
• New features don’t tend to consider their impact on scaling operations
• Features that help ops tend to be less sexy
Does it get better?
Yes!
Does it get better? Consistent bootstrap
Strongly consistent membership and ownership – CASSANDRA-9667
• Using LWT to propose and claim ownership of new token allocations in a consistent
manner
• Work in progress
• You can do this today by pre-assigning non-overlapping (including replicas) vnode tokens
and setting cassandra.consistent.simultaneousmoves.allow=true as a JVM property
before bootstrapping your nodes
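A minimal sketch of the pre-assignment step, assuming the Murmur3 partitioner's token space and assuming tokens are pinned through the `initial_token` setting in cassandra.yaml. The function names are hypothetical. It only guarantees that every node gets a distinct set of tokens; it does not look at existing ring members or replica placement, which you would still have to check to satisfy the "including replicas" condition above.

```python
# Murmur3Partitioner tokens fall in [-2**63, 2**63 - 1].
MIN_TOKEN = -2**63
TOKEN_SPACE = 2**64


def preassign_tokens(node_count, vnodes_per_node):
    """Return one list of evenly spaced, unique tokens per node.

    Tokens are interleaved so that neighbouring tokens on the ring
    belong to different nodes.
    """
    total = node_count * vnodes_per_node
    step = TOKEN_SPACE // total
    ring = [MIN_TOKEN + i * step for i in range(total)]
    # Node n takes every node_count-th token, starting at offset n.
    return [ring[n::node_count] for n in range(node_count)]


def initial_token_line(tokens):
    """Format tokens as a comma-separated initial_token entry."""
    return "initial_token: " + ",".join(str(t) for t in tokens)


if __name__ == "__main__":
    for node, tokens in enumerate(preassign_tokens(3, 4)):
        print(f"node{node}: {initial_token_line(tokens)}")
    # JVM property named on the slide, passed as a system property.
    print("-Dcassandra.consistent.simultaneousmoves.allow=true")
```

Each node would get its own line in its configuration before it is started for the first time.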
Does it get better? Bootstrap stability
Keep-alives for all streaming operations – CASSANDRA-11841
• Streaming currently implements a timeout. You can reduce it to be more aggressive, but large
SSTables will then never stream
Resumable bootstrap – CASSANDRA-8942 & CASSANDRA-8838
• You can do this in 2.2+
Incremental bootstrap – CASSANDRA-8494
• Being worked on; hard to do with vnodes right now (try it… the error message uses
the word “thusly”). Instead, throttle streaming and uncap compaction to ensure the
node doesn’t get overloaded during bootstrap
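The throttle-and-uncap workaround can be sketched as two `nodetool` calls. This is an illustration, not a recommended setting: the helper name and the 50 megabit/s figure are assumptions. `setstreamthroughput` takes megabits per second and caps outbound streaming, so it would normally be run on the nodes that stream to the joining node, while `setcompactionthroughput 0` removes the compaction cap on the joining node.

```python
import shlex


def bootstrap_throttle_commands(stream_megabits_per_sec):
    """Build nodetool invocations that cap streaming and uncap compaction."""
    if stream_megabits_per_sec <= 0:
        # For nodetool, 0 means "no throttling", the opposite of what we want.
        raise ValueError("pass a positive streaming cap")
    return [
        ["nodetool", "setstreamthroughput", str(stream_megabits_per_sec)],
        ["nodetool", "setcompactionthroughput", "0"],
    ]


if __name__ == "__main__":
    for cmd in bootstrap_throttle_commands(50):
        print(shlex.join(cmd))
```

The commands are returned as argument lists so they can be handed to `subprocess.run` or whatever remote execution tooling you already use.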
Can we make it even better?
Yes!
Can we make it even better?
• Let’s try scaling without data ownership changes
• Take advantage of Cassandra’s normal partition and availability mechanisms
• With a little help from our cloud providers!
Introducing token pinned scaling
• Probably needs a better name
• Here is how it works
Introducing token pinned scaling
With the introduction of:
• Partitioning SSTables by Range (CASSANDRA-6696)
• Range Aware Compaction (CASSANDRA-10540)
• A few extra lines of code to save/load a map of token to disks (coming soon)
Cassandra will now keep data associated with specific tokens in a single data directory.
This could let us treat a disk as the unit around which we scale!
But first what do these two features actually let us do?
Introducing token pinned scaling
Before Partitioning SSTables by Range and Range Aware Compaction:
[Diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1]
Introducing token pinned scaling
After Partitioning SSTables by Range and Range Aware Compaction:
[Diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1]
Data within a token range is now kept on a specific disk
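To make the range-to-disk idea concrete, here is a rough sketch using the three ranges from the diagram. It is not Cassandra's actual allocation logic: the function names are hypothetical, ranges are treated as inclusive (start, end) pairs, and ranges are divided by count rather than by share of the token space.

```python
def assign_ranges_to_disks(ranges, disks):
    """Give each disk a contiguous run of the node's sorted token ranges."""
    ordered = sorted(ranges)
    per_disk = -(-len(ordered) // len(disks))  # ceiling division
    return {
        disk: ordered[i * per_disk:(i + 1) * per_disk]
        for i, disk in enumerate(disks)
    }


def disk_for_token(token, layout):
    """Return the disk holding `token`, or None if this node does not own it."""
    for disk, owned in layout.items():
        for start, end in owned:
            if start <= token <= end:
                return disk
    return None


if __name__ == "__main__":
    layout = assign_ranges_to_disks(
        [(1, 100), (901, 1000), (1401, 1500)], ["Disk0", "Disk1"]
    )
    print(layout)
    print(disk_for_token(950, layout))
    print(disk_for_token(1450, layout))
```

Because every token resolves to exactly one disk, moving that disk moves all of the node's data for those ranges with it.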
Introducing token pinned scaling
Your SSTables will converge to contain a single vnode range when things get big enough
[Diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1]
Leveraging EBS to separate I/O from CPU
• Amazon Web Services provides a network-attached block store called EBS (Elastic
Block Store).
• Each volume is isolated to a single availability zone
• We can attach and reattach EBS disks ad hoc, in seconds to minutes
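Reattaching a volume could look roughly like the following, assuming `ec2` is a boto3 EC2 client (for example `boto3.client("ec2")`). The helper name and the default device path are hypothetical, and the sketch assumes Cassandra has already been stopped and the filesystem unmounted on the source instance.

```python
def move_volume(ec2, volume_id, target_instance_id, device="/dev/sdf"):
    """Detach an EBS volume and attach it to another instance in the same AZ."""
    ec2.detach_volume(VolumeId=volume_id)
    # Wait until the volume is free before attaching it elsewhere.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(
        VolumeId=volume_id, InstanceId=target_instance_id, Device=device
    )
    # Wait until AWS reports the volume as attached to the new instance.
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```

Passing the client in as a parameter keeps the function free of credentials and region handling, and lets it be exercised against a stub.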
Adding it all together
• Make each EBS disk a data directory in Cassandra
• Cassandra guarantees only data from a specific token range will exist on a given disk
• When throughput is low, attach all disks in a single AZ to a single node and specify all the
ranges from each disk via a comma-separated list of tokens.
• Up to 40 disks per instance!
• When load is high, launch more instances and spread the disks across the new
instances.
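The bullets above amount to a small planning step: decide which disks go to which instance, then derive each instance's comma-separated token list. A sketch under stated assumptions: the token groups are the ones shown later in the deck, the pairing of those groups with device names sda to sdf is invented for illustration, and `spread_disks` is a hypothetical helper rather than anything in Cassandra.

```python
RACK_DISKS = {
    "sda": [1, 5, 10],
    "sdb": [2, 6, 11],
    "sdc": [3, 22, 44],
    "sdd": [4, 23, 45],
    "sde": [102, 134, 167],
    "sdf": [101, 122, 155],
}


def spread_disks(disk_tokens, instances):
    """Distribute one rack's disks round-robin over its running instances.

    Returns {instance: {"disks": [...], "tokens": "t1,t2,..."}}.
    """
    if not instances:
        raise ValueError("need at least one instance")
    plan = {inst: {"disks": [], "tokens": []} for inst in instances}
    for i, (disk, tokens) in enumerate(sorted(disk_tokens.items())):
        target = plan[instances[i % len(instances)]]
        target["disks"].append(disk)
        target["tokens"].extend(tokens)
    return {
        inst: {
            "disks": p["disks"],
            "tokens": ",".join(str(t) for t in sorted(p["tokens"])),
        }
        for inst, p in plan.items()
    }


if __name__ == "__main__":
    # Low load: one instance holds every disk and every token.
    print(spread_disks(RACK_DISKS, ["node-a"]))
    # High load: six instances, one disk each.
    print(spread_disks(RACK_DISKS, [f"node-{c}" for c in "abcdef"]))
```

With a single instance the token string is `1,2,3,4,5,6,10,11,22,23,44,45,101,102,122,134,155,167`; with six instances the first one ends up with `1,5,10`.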
Adding it all together
• Make each EBS disk a data directory in Cassandra
[Diagram: Amazon EBS volumes sda, sdb, sdc, sdd, sde and sdf]
Adding it all together
• Cassandra guarantees only data from a specific token range will exist on a given disk
Amazon EBS
Adding it all together
• When throughput is low attach all disks in a single AZ to a single node
Amazon EBS
200 op/s
Adding it all together
• When load is high, launch more instances and spread the disks across the new instances.
Amazon EBS
10,000 op/s
How it works - Scaling
• Normally you have to provision your cluster at your maximum operations per second +
30% (headroom in case you get it wrong).
• Provision enough IOPS, CPU, RAM etc
• Makes Cassandra an $$$ solution
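As a worked example of that rule, the following computes how many nodes a workload needs once the 30% headroom is added. The 200 and 10,000 op/s figures come from the slides; the per-node capacity of 1,000 op/s is a made-up number for illustration only.

```python
def nodes_needed(ops_per_sec, ops_per_node, headroom_pct=30):
    """Nodes required to serve a workload plus a percentage of headroom."""
    required = ops_per_sec * (100 + headroom_pct)
    capacity = ops_per_node * 100
    return -(-required // capacity)  # integer ceiling division


if __name__ == "__main__":
    print(nodes_needed(10_000, 1_000))  # peak load
    print(nodes_needed(200, 1_000))     # quiet period
```

Under these assumptions a cluster sized for the peak runs 13 nodes at all times, even though a single node would cover the quiet period.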
[Chart: provisioned workload vs. actual workload]
How it works - Scaling
• Let’s make our resources match our workload
[Chart: provisioned IOPS workload and provisioned CPU & RAM workload vs. actual workload]
How it works - Consistency
• No range movements! You don’t need a Jepsen test to see how bad range
movements are for consistency.
• Tokens and ranges are fixed during all scaling operations
• Range movements are where you see most consistency badness in Cassandra
(bootstrap, node replacement, decommission) and need to rely on repair.
How it works - Consistency
• Treat each rack as a giant “meta-node”; NetworkTopologyStrategy ensures replicas are
on different racks.
• AWS Rack == AZ
• The tokens for a node change with the disks it holds, but the replica topology stays the
same
• You can only swap disks between instances within the same AZ
• Scale one rack at a time… scale your cluster in constant time!
• If you want to do this with a single rack, you will have a bad time
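The "meta-node" argument can be checked with a toy model: if disks only ever move between instances in the same AZ, the set of tokens owned by each rack never changes. The data layout and function names below are hypothetical and exist only to illustrate that invariant.

```python
def rack_tokens(cluster):
    """Collect the set of tokens owned by each rack (AZ)."""
    owned = {}
    for node in cluster.values():
        owned.setdefault(node["az"], set()).update(
            t for tokens in node["disks"].values() for t in tokens
        )
    return owned


def move_disk(cluster, disk, source, target):
    """Move a disk between instances, refusing to cross an AZ boundary."""
    if cluster[source]["az"] != cluster[target]["az"]:
        raise ValueError("disks can only move within one AZ (rack)")
    cluster[target]["disks"][disk] = cluster[source]["disks"].pop(disk)


if __name__ == "__main__":
    cluster = {
        "a": {"az": "az1", "disks": {"d1": [1, 5, 10], "d2": [2, 6, 11]}},
        "b": {"az": "az1", "disks": {}},
        "c": {"az": "az2", "disks": {"d3": [3, 22, 44]}},
    }
    before = rack_tokens(cluster)
    move_disk(cluster, "d2", "a", "b")
    print(rack_tokens(cluster) == before)
```

Node-level ownership changes after the move, but the per-rack view that replica placement depends on is identical.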
How it works - Consistency
[Diagram: six disks holding tokens 1,5,10; 2,6,11; 3,22,44; 4,23,45; 102,134,167; and 101,122,155, alongside a single combined token list 1,2,3,4,5,6,10,11,22,23,44,45,101 …]
How it works - Consistency
[Diagram: the six token groups 1,5,10; 2,6,11; 3,22,44; 4,23,45; 102,134,167; and 101,122,155, shown twice with identical token lists]
How it works - TODO
Some issues remain:
• Hinted handoff breaks (handoff is based on endpoint rather than token)
• Time for gossip to settle on any decent sized cluster
• Currently just clearing out the system.local folder to allow booting
• Can’t do this while repair is running… for some people this is all the time
• You’ll need to run repair more often as scaling intentionally introduces outages
• Breaks consistency and everything where RF > number of racks (usually the
system_auth keyspace).
• More work needed!
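One of these issues is easy to check for up front. A small sketch, assuming you already have each keyspace's replication factor for the datacenter in question; the function name and the example values are hypothetical.

```python
def keyspaces_at_risk(replication_factors, rack_count):
    """List keyspaces whose replication factor exceeds the number of racks."""
    return sorted(
        ks for ks, rf in replication_factors.items() if rf > rack_count
    )


if __name__ == "__main__":
    print(keyspaces_at_risk({"system_auth": 5, "app": 3}, rack_count=3))
```

Any keyspace this returns would need its replication factor lowered, or more racks added, before trying this style of scaling.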
How it works – Real world
• No production tests yet
• Have gone from a 3 node cluster to a 36 node cluster in around 50 minutes.
• Plenty left to optimize (e.g. bake everything into an AMI to reduce startup time)
• Could get this down to 10 minutes per rack depending on how responsive AWS is!
• No performance overhead compared to Cassandra on EBS.
• Check out the code here: https://github.com/benbromhead/Cassandra/tree/ic-token-pinning
How it works – Real world
• Really this is bending some new and impending changes to do funky stuff
Questions?
Questions?

More Related Content

What's hot

Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...DataStax
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...DataStax
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...DataStax
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applicationsBen Slater
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinChristian Johannsen
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsJulien Anguenot
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...DataStax
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in CassandraShogo Hoshii
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...DataStax
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japanHiromitsu Komatsu
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCaleb Rackliffe
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityHiromitsu Komatsu
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...DataStax
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsJulien Anguenot
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...DataStax
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...DataStax
 

What's hot (20)

Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
Apache Cassandra Multi-Datacenter Essentials (Julien Anguenot, iLand Internet...
 
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
What We Learned About Cassandra While Building go90 (Christopher Webster & Th...
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
Tales From the Field: The Wrong Way of Using Cassandra (Carlos Rolo, Pythian)...
 
Load testing Cassandra applications
Load testing Cassandra applicationsLoad testing Cassandra applications
Load testing Cassandra applications
 
Apache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek BerlinApache Cassandra at the Geek2Geek Berlin
Apache Cassandra at the Geek2Geek Berlin
 
Cassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentialsCassandra multi-datacenter operations essentials
Cassandra multi-datacenter operations essentials
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
DataStax | Data Science with DataStax Enterprise (Brian Hess) | Cassandra Sum...
 
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
Clock Skew and Other Annoying Realities in Distributed Systems (Donny Nadolny...
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Instaclustr webinar 2017 feb 08 japan
Instaclustr webinar 2017 feb 08   japanInstaclustr webinar 2017 feb 08   japan
Instaclustr webinar 2017 feb 08 japan
 
Cassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE SearchCassandra Summit 2015: Intro to DSE Search
Cassandra Summit 2015: Intro to DSE Search
 
Cassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra CommunityCassandra CLuster Management by Japan Cassandra Community
Cassandra CLuster Management by Japan Cassandra Community
 
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
Cassandra on Mesos Across Multiple Datacenters at Uber (Abhishek Verma) | C* ...
 
Apache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentialsApache Cassandra multi-datacenter essentials
Apache Cassandra multi-datacenter essentials
 
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
A Detailed Look At cassandra.yaml (Edward Capriolo, The Last Pickle) | Cassan...
 
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
Tuning Speculative Retries to Fight Latency (Michael Figuiere, Minh Do, Netfl...
 

Viewers also liked

How Jenkins Builds the Netflix Global Streaming Service
How Jenkins Builds the Netflix Global Streaming ServiceHow Jenkins Builds the Netflix Global Streaming Service
How Jenkins Builds the Netflix Global Streaming ServiceGareth Bowles
 
C* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonC* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonDataStax Academy
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraEric Evans
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLEric Evans
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internalsnarsiman
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Rick Branson
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talkPatrick McFadin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
Database Performance Analysis with Time Series
Database Performance Analysis with Time SeriesDatabase Performance Analysis with Time Series
Database Performance Analysis with Time SeriesGwen (Chen) Shapira
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureDan McKinley
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationSchubert Zhang
 

Viewers also liked (11)

How Jenkins Builds the Netflix Global Streaming Service
How Jenkins Builds the Netflix Global Streaming ServiceHow Jenkins Builds the Netflix Global Streaming Service
How Jenkins Builds the Netflix Global Streaming Service
 
C* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick BransonC* Summit 2013: Cassandra at Instagram by Rick Branson
C* Summit 2013: Cassandra at Instagram by Rick Branson
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
Cassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQLCassandra: Not Just NoSQL, It's MoSQL
Cassandra: Not Just NoSQL, It's MoSQL
 
Cassandra internals
Cassandra internalsCassandra internals
Cassandra internals
 
Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)Cassandra at Instagram (August 2013)
Cassandra at Instagram (August 2013)
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
Database Performance Analysis with Time Series
Database Performance Analysis with Time SeriesDatabase Performance Analysis with Time Series
Database Performance Analysis with Time Series
 
Etsy Activity Feeds Architecture
Etsy Activity Feeds ArchitectureEtsy Activity Feeds Architecture
Etsy Activity Feeds Architecture
 
Cassandra Compression and Performance Evaluation
Cassandra Compression and Performance EvaluationCassandra Compression and Performance Evaluation
Cassandra Compression and Performance Evaluation
 

Similar to Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016

M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsJeff Jirsa
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Johnny Miller
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Ceph Community
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax Academy
 
Cassandra Bootstap from Backups
Cassandra Bootstap from BackupsCassandra Bootstap from Backups
Cassandra Bootstap from BackupsInstaclustr
 
Cassandra Bootstrap from Backups
Cassandra Bootstrap from BackupsCassandra Bootstrap from Backups
Cassandra Bootstrap from BackupsInstaclustr
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxManish Maheshwari
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successErick Ramirez
 
DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920Daniel Cohen
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaManish Maheshwari
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataHakka Labs
 
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld
 

Similar to Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016 (20)

M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Using Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series WorkloadsUsing Time Window Compaction Strategy For Time Series Workloads
Using Time Window Compaction Strategy For Time Series Workloads
 
Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?Why does my choice of storage matter with cassandra?
Why does my choice of storage matter with cassandra?
 
Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt Scaling Ceph at CERN - Ceph Day Frankfurt
Scaling Ceph at CERN - Ceph Day Frankfurt
 
DataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The SequelDataStax: Extreme Cassandra Optimization: The Sequel
DataStax: Extreme Cassandra Optimization: The Sequel
 
Cassandra Bootstap from Backups
Cassandra Bootstap from BackupsCassandra Bootstap from Backups
Cassandra Bootstap from Backups
 
Cassandra Bootstrap from Backups
Cassandra Bootstrap from BackupsCassandra Bootstrap from Backups
Cassandra Bootstrap from Backups
 
Strata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptxStrata London 2019 Scaling Impala.pptx
Strata London 2019 Scaling Impala.pptx
 
CASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for successCASSANDRA MEETUP - Choosing the right cloud instances for success
CASSANDRA MEETUP - Choosing the right cloud instances for success
 
DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920DataStax Enterprise in the Field – 20160920
DataStax Enterprise in the Field – 20160920
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
 
Cassandra
CassandraCassandra
Cassandra
 
Strata London 2019 Scaling Impala
Strata London 2019 Scaling ImpalaStrata London 2019 Scaling Impala
Strata London 2019 Scaling Impala
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
SFHUG Kudu Talk
SFHUG Kudu TalkSFHUG Kudu Talk
SFHUG Kudu Talk
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast DataDatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
DatEngConf SF16 - Apache Kudu: Fast Analytics on Fast Data
 
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
VMworld 2013: Just Because You Could, Doesn't Mean You Should: Lessons Learne...
 

Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016

  • 2. © DataStax, All Rights Reserved.
  • 3. Who am I and What do I do?
      • Co-founder and CTO of Instaclustr -> www.instaclustr.com
      • Instaclustr provides Cassandra-as-a-Service in the cloud.
      • Currently support AWS, Azure, Heroku, Softlayer and Private DCs with more to come.
      • Approaching 1000 nodes under management
      • Yes… we are hiring! Come live in Australia!
  • 4. Agenda
      1. Why scaling sucks in Cassandra
      2. It gets better
      3. Then it gets really awesome
  • 5. Linear Scalability – In theory
  • 6. Linear Scalability – In practice
  • 7. What’s supposed to happen
      • Scaling Cassandra is just “bootstrap new nodes”
      • That works if your cluster is under provisioned and has 30% disk usage
  • 8. What actually happens
      • Add 1 node
      • Bootstrapping node fails (1 day)
      • WTF - Full disk on bootstrapping node? (5 minutes)
      • If STCS, run SSTableSplit on large SSTables on original nodes (2 days)
      • Attach super-sized network storage (EBS) and bind mount to bootstrapping node.
  • 9. What actually happens
      • Restart bootstrapping process
      • Disk alert 70% (2 days later)
      • Throttle streaming throughput to below compaction throughput
      • Bootstrapping finishes (5 days later)
      • Cluster latency spikes because bootstrap finished but there were a million compactions remaining
      • Take node offline and let compaction finish
      • Run repair on node (10 years)
      • Add next node.
  • 10. What actually happens
  • 11. Scalability in Cassandra sucks
      • So much over-streaming
      • LCS and bootstrap – Over-stream, then compact all the data!
      • STCS and bootstrap – Over-stream all the data and run out of disk space
  • 12. Scalability in Cassandra sucks
      • No vnodes? – Can only double your cluster
      • Vnodes? – Can only add one node at a time
      • Bootstrap – Fragile and not guaranteed to be consistent
  • 13. Why does it suck for you?
      Your database never meets your business requirements from a capacity perspective (bad) and if you try…
      • You could interrupt availability and performance (really bad)
      • You could lose data (really really bad)
  • 14. How did it get this way?
      It’s actually a hard problem:
      • Moving large amounts of data between nodes requires just as much attention from a CAP perspective as client-facing stuff.
      • New features don’t tend to consider their impact on scaling operations
      • Features that help ops tend to be less sexy
  • 15. Does it get better? Yes!
  • 16. Does it get better? Consistent bootstrap
      Strongly consistent membership and ownership – CASSANDRA-9667
      • Using LWT to propose and claim ownership of new token allocations in a consistent manner
      • Work in progress
      • You can do this today by pre-assigning non-overlapping (inc replicas) vnode tokens and using cassandra.consistent.simultaneousmoves.allow=true as a JVM property before bootstrapping your nodes
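The "pre-assign non-overlapping vnode tokens" step on slide 16 could be sketched roughly as below. This is a minimal illustration, not the speaker's tooling: the function names are hypothetical, it assumes the Murmur3 token range of -2^63 to 2^63-1, and it only guarantees that every node gets distinct, evenly spaced tokens. It does not check replica placement, which the slide says must also not overlap.

```python
# Hypothetical sketch: hand out evenly spaced, distinct tokens to nodes
# before they bootstrap. Assumes the Murmur3 partitioner token range.
MIN_TOKEN = -(2 ** 63)
RING_SIZE = 2 ** 64


def preassign_tokens(node_count, vnodes_per_node):
    """Return {node_index: [tokens]} with no token shared between nodes."""
    total = node_count * vnodes_per_node
    step = RING_SIZE // total
    assignment = {node: [] for node in range(node_count)}
    for i in range(total):
        # Interleave nodes around the ring so each owns scattered ranges.
        assignment[i % node_count].append(MIN_TOKEN + i * step)
    return assignment


def initial_token_line(tokens):
    """Format tokens as a comma-separated initial_token setting."""
    return "initial_token: " + ",".join(str(t) for t in tokens)


if __name__ == "__main__":
    for node, tokens in preassign_tokens(3, 4).items():
        print(node, initial_token_line(tokens))
```

Each node would then be started with its own token list and the JVM property named on the slide.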
  • 17. Does it get better? Bootstrap stability
      Keep-alives for all streaming operations – CASSANDRA-11841
      • Streaming currently implements a timeout. You can reduce it to be more aggressive, but large SSTables will then never stream
      Resumable bootstrap – CASSANDRA-8942 & CASSANDRA-8838
      • You can do this in 2.2+
      Incremental bootstrap – CASSANDRA-8494
      • Being worked on, hard to do with vnodes right now (try it… the error message uses the word “thusly”). Instead, throttle streaming and uncap compaction to ensure the node doesn’t get overloaded during bootstrap
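The "throttle streaming and uncap compaction" advice on slide 17 might look like the sketch below. It only builds the command lines and does not run them. The helper name is hypothetical. `nodetool setstreamthroughput` and `nodetool setcompactionthroughput` are existing nodetool subcommands, and a compaction value of 0 disables compaction throttling. The units and the example value of 50 are assumptions to check against `nodetool help` for your release.

```python
# Hypothetical helper: build the nodetool invocations for a bootstrapping
# node. Nothing is executed here; pass each list to subprocess.run if wanted.
def bootstrap_throttle_commands(stream_throughput):
    """Return nodetool commands that uncap compaction and cap streaming."""
    if stream_throughput <= 0:
        # 0 would mean "unthrottled" for streaming, the opposite of the goal.
        raise ValueError("stream_throughput must be a positive number")
    return [
        ["nodetool", "setcompactionthroughput", "0"],  # 0 = no compaction cap
        ["nodetool", "setstreamthroughput", str(stream_throughput)],
    ]


if __name__ == "__main__":
    for cmd in bootstrap_throttle_commands(50):
        print(" ".join(cmd))
```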
  • 18. Can we make it even better? Yes!
  • 19. Can we make it even better?
      • Let’s try scaling without data ownership changes
      • Take advantage of Cassandra’s normal partition and availability mechanisms
      • With a little help from our cloud providers!
  • 20. Introducing token pinned scaling
      • Probably needs a better name
      • Here is how it works
  • 21. Introducing token pinned scaling
      With the introduction of:
      • Partitioning SSTables by Range (CASSANDRA-6696)
      • Range Aware Compaction (CASSANDRA-10540)
      • A few extra lines of code to save/load a map of token to disks (coming soon)
      Cassandra will now keep data associated with specific tokens in a single data directory. This could let us treat a disk as a unit to scale around! But first, what do these two features actually let us do?
  • 22. Introducing token pinned scaling
      Before Partitioning SSTables by Range and Range Aware Compaction:
      (diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1)
  • 23. Introducing token pinned scaling
      After Partitioning SSTables by Range and Range Aware Compaction:
      (diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1)
      Data within a token range is now kept on a specific disk
  • 24. Introducing token pinned scaling
      Your SSTables will converge to contain a single vnode range when things get big enough
      (diagram: SSTables for token ranges 1-100, 901-1000 and 1401-1500 on Disk0 and Disk1)
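The "map of token to disks" idea from slides 21 to 24 can be pictured with a small sketch. Everything here is illustrative: the function names are hypothetical, and the contiguous-chunk assignment is an assumption standing in for whatever split Cassandra actually uses. The ranges and disk names are the ones from the slide diagrams.

```python
# Hypothetical sketch: pin each token range to exactly one data directory,
# then look up which disk holds a given token.
from bisect import bisect_right


def assign_ranges_to_disks(ranges, disks):
    """Give each disk a contiguous chunk of the sorted (start, end) ranges."""
    ordered = sorted(ranges)
    per_disk = -(-len(ordered) // len(disks))  # ceiling division
    return {rng: disks[i // per_disk] for i, rng in enumerate(ordered)}


def disk_for_token(token, range_to_disk):
    """Return the disk that owns the token, or None if no range covers it."""
    ordered = sorted(range_to_disk)
    starts = [start for start, _ in ordered]
    idx = bisect_right(starts, token) - 1
    if idx >= 0 and ordered[idx][0] <= token <= ordered[idx][1]:
        return range_to_disk[ordered[idx]]
    return None


if __name__ == "__main__":
    mapping = assign_ranges_to_disks(
        [(1, 100), (901, 1000), (1401, 1500)], ["Disk0", "Disk1"]
    )
    print(mapping)
    print(disk_for_token(950, mapping))
```

Because a range never spans two disks in this model, moving a disk moves every SSTable for its ranges with it.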
  • 25. Leveraging EBS to separate I/O from CPU
      • Amazon Web Services provides a network-attached block store called EBS (Elastic Block Store).
      • Isolated to each availability zone
      • We can attach and reattach EBS disks ad hoc and in seconds/minutes
  • 26. Adding it all together
      • Make each EBS disk a data directory in Cassandra
      • Cassandra guarantees only data from a specific token range will exist on a given disk
      • When throughput is low, attach all disks in a single AZ to a single node and specify all the ranges from each disk via a comma-separated list of tokens.
      • Up to 40 disks per instance!
      • When load is high, launch more instances and spread the disks across the new instances.
  • 27. Adding it all together
      • Make each EBS disk a data directory in Cassandra
      (diagram: Amazon EBS volumes sda, sdb, sdc, sdd, sde, sdf)
  • 28. Adding it all together
      • Cassandra guarantees only data from a specific token range will exist on a given disk
      (diagram: Amazon EBS)
  • 29. Adding it all together
      • When throughput is low, attach all disks in a single AZ to a single node
      (diagram: Amazon EBS, 200 op/s)
  • 30. Adding it all together
      • When load is high, launch more instances and spread the disks across the new instances.
      (diagram: Amazon EBS, 10,000 op/s)
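A rough sketch of the attach-and-spread step from slides 26 to 30 follows. The function names and the round-robin placement are assumptions for illustration. The 40-disk limit is the figure quoted on slide 26, and the volume names and tokens in the usage example are made up to resemble the diagrams.

```python
# Hypothetical sketch: decide which EBS disks go to which instance in one
# AZ, and derive each instance's comma-separated token list from its disks.
MAX_DISKS_PER_INSTANCE = 40  # limit quoted on the slide


def spread_disks(disk_tokens, instance_count):
    """Round-robin disks over instances; returns one {disk: tokens} per instance."""
    if instance_count < 1:
        raise ValueError("need at least one instance")
    disks = sorted(disk_tokens)
    if len(disks) > instance_count * MAX_DISKS_PER_INSTANCE:
        raise ValueError("too many disks for this many instances")
    layout = [dict() for _ in range(instance_count)]
    for i, disk in enumerate(disks):
        layout[i % instance_count][disk] = disk_tokens[disk]
    return layout


def tokens_for_instance(attached):
    """Comma-separated list of every token on the instance's attached disks."""
    tokens = sorted(t for toks in attached.values() for t in toks)
    return ",".join(str(t) for t in tokens)


if __name__ == "__main__":
    disks = {"sda": [1, 5, 10], "sdb": [2, 6, 11]}
    print([tokens_for_instance(i) for i in spread_disks(disks, 1)])  # low load
    print([tokens_for_instance(i) for i in spread_disks(disks, 2)])  # high load
```

With one instance the node owns every token in the AZ. With more instances each node owns only the tokens on the disks it received.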
  • 31. How it works - Scaling
      • Normally you have to provision your cluster at your maximum operations per second + 30% (headroom in case you get it wrong).
      • Provision enough IOPS, CPU, RAM etc
      • Makes Cassandra an $$$ solution
      (chart: provisioned workload vs. actual workload)
  • 33. How it works - Scaling
      • Let’s make our resources match our workload
      (chart: provisioned IOPS workload, actual workload, provisioned CPU & RAM workload)
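To make the cost argument on these scaling slides concrete, here is a small arithmetic sketch. The 30% headroom comes from slide 31 and the 200 and 10,000 op/s figures from the earlier diagrams. The per-instance capacity of 1,000 op/s is purely an assumed number for illustration, and the function names are hypothetical.

```python
# Hypothetical arithmetic: fixed provisioning at peak + headroom versus
# sizing compute to the current workload.
def static_provision(peak_ops, headroom_pct=30):
    """Ops/s to provision for when sizing at peak plus headroom (rounded up)."""
    return -(-peak_ops * (100 + headroom_pct) // 100)


def instances_needed(ops, ops_per_instance):
    """Instances required to serve the load, never fewer than one."""
    return max(1, -(-ops // ops_per_instance))


if __name__ == "__main__":
    capacity = 1000  # assumed ops/s per instance, not from the talk
    fixed = instances_needed(static_provision(10_000), capacity)
    quiet = instances_needed(200, capacity)
    busy = instances_needed(10_000, capacity)
    print(fixed, quiet, busy)
```

Under these assumptions a statically provisioned cluster runs 13 instances around the clock, while a cluster that follows its workload runs 1 instance when quiet and 10 at peak.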
  • 35. How it works - Consistency
      • No range movements! You don’t need a Jepsen test to see how bad range movements are for consistency.
      • Tokens and ranges are fixed during all scaling operations
      • Range movements are where you see most consistency badness in Cassandra (bootstrap, node replacement, decommission) and need to rely on repair.
  • 36. How it works - Consistency
      • Treats Racks as a giant “meta-node”, network topology strategy ensures replicas are on different racks.
      • AWS Rack == AZ
      • As tokens for a node change based on the disk they have, replica topology stays the same
      • You can only swap disks between instances within the same AZ
      • Scale one rack at a time… scale your cluster in constant time!
      • If you want to do this with a single rack, you will have a bad time
  • 37. How it works - Consistency
      (diagram: disks holding tokens 1,5,10 | 2,6,11 | 3,22,44 | 4,23,45 | 102,134,167 | 101,122,155, with one node owning 1,2,3,4,5,6,10,11,22,23,44,45,101 …)
  • 38. How it works - Consistency
      (diagram: the same disks, 1,5,10 | 2,6,11 | 3,22,44 | 4,23,45 | 102,134,167 | 101,122,155, each shown with a node owning the matching tokens)
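The "rack as a giant meta-node" point can be checked with a tiny invariant: however the disks are shuffled between instances inside one AZ, the set of tokens the rack owns stays the same. The sketch below uses the token groups from the slide 37 and 38 diagrams. The function names and the nested-list data model are assumptions for illustration.

```python
# Hypothetical sketch: a rack is a list of instances, an instance is a list
# of attached disks, and a disk is a list of tokens.
def rack_tokens(instances):
    """All tokens owned by the rack, regardless of which instance holds them."""
    return sorted(t for disks in instances for tokens in disks for t in tokens)


def same_ownership(before, after):
    """True if a disk reshuffle left the rack's token ownership unchanged."""
    return rack_tokens(before) == rack_tokens(after)


if __name__ == "__main__":
    disks = [
        [1, 5, 10], [2, 6, 11], [3, 22, 44],
        [4, 23, 45], [102, 134, 167], [101, 122, 155],
    ]
    one_node = [disks]                   # low load: one instance, six disks
    six_nodes = [[d] for d in disks]     # high load: one disk per instance
    print(same_ownership(one_node, six_nodes))
```

Since replicas are placed per rack, keeping this per-rack token set fixed is what lets replica topology stay the same while instance counts change.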
  • 39. How it works - TODO
      Some issues remain:
      • Hinted handoff breaks (handoff is based on endpoint rather than token)
      • Time for gossip to settle on any decent sized cluster
      • Currently just clearing out the system.local folder to allow booting
      • Can’t do this while repair is running… for some people this is all the time
      • You’ll need to run repair more often as scaling intentionally introduces outages
      • Breaks consistency and everything where RF > number of racks (usually the system_auth keyspace).
      • More work needed!
  • 40. How it works – Real world
      • No production tests yet
      • Have gone from a 3 node cluster to a 36 node cluster in around 50 minutes.
      • Plenty left to optimize (e.g. bake everything into an AMI to reduce startup time)
      • Could get this down to 10 minutes per rack depending on how responsive AWS is!
      • No performance overhead compared to Cassandra on EBS.
      • Check out the code here: https://github.com/benbromhead/Cassandra/tree/ic-token-pinning
  • 41. How it works – Real world
      • Really this is bending some new and impending changes to do funky stuff