Always On:
Building Highly Available
Applications on Cassandra
Robbie Strickland
Who Am I?
Robbie Strickland
VP, Software Engineering
rstrickland@weather.com
@rs_atl An IBM Business
Who Am I?
• Contributor to C*
community since 2010
• DataStax MVP 2014/15/16
• Author, Cassandra High
Availability & Cassandra 3.x
High Availability
• Founder, ATL Cassandra User
Group
What is HA?
• Five nines – 99.999% uptime?
– That’s still about 5 minutes of down time per year
– … and 99.9% means roughly 9 hours – a full work day of down time!
• Can we do better?
Cassandra + HA
• No SPOF
• Multi-DC replication
• Incremental backups
• Client-side failure handling
• Server-side failure handling
• Lots of JMX stats
HA by Design (it’s not an add-on)
• Properly designed topology
• Data model that respects C* architecture
• Application that handles failure
• Monitoring strategy with early warning
• DevOps mentality
Table Stakes
• NetworkTopologyStrategy
• GossipingPropertyFileSnitch
– Or [YourCloud]Snitch
• At least 5 nodes
• RF=3
• No load balancer
HA Topology
Consistency Basics
• Start with LOCAL_QUORUM reads & writes
– Balances performance & availability, and provides full consistency within a single DC
– Experiment with eventual consistency (e.g. CL=ONE) in a controlled environment
• Avoid non-local CLs in multi-DC environments
– Otherwise it’s a crap shoot
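To make that concrete, here is a minimal sketch (assuming the DataStax Java driver 3.x, an existing session, and the contacts table used later in the deck) that pins a single statement to LOCAL_QUORUM; a cluster-wide default via QueryOptions works the same way:

import com.datastax.driver.core.{ConsistencyLevel, SimpleStatement}

// Read at LOCAL_QUORUM for this statement only
val stmt = new SimpleStatement("SELECT * FROM contacts WHERE id = 1")
stmt.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
val rows = session.execute(stmt)  // `session` is assumed to exist already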
Rack Failure
• Don’t put all your nodes in one rack!
• Use rack awareness
– Places replicas in different racks
• But don’t use the RackInferringSnitch
Rack Awareness
[Diagram: replicas R1, R2, R3 spread across Rack A and Rack B]
GossipingPropertyFileSnitch – cassandra-rackdc.properties on each node:
Rack A nodes:
dc=dc1
rack=a
Rack B nodes:
dc=dc1
rack=b
Rack Awareness (Cloud Edition)
[Diagram: replicas R1, R2, R3 spread across Availability Zone A and Availability Zone B]
[YourCloud]Snitch
(it’s automagic!)
Data Center Replication
[Diagram: two data centers, dc=us-1 and dc=eu-1, replicating to each other]
CREATE KEYSPACE myKeyspace
WITH REPLICATION = {
  'class': 'NetworkTopologyStrategy',
  'us-1': 3,
  'eu-1': 3
};
Multi-DC Consistency?
Assumption: LOCAL_QUORUM
[Diagram: dc=us-1 and dc=eu-1 are each fully consistent within themselves; replication between the two DCs is eventually consistent]
Multi-DC Routing with LOCAL CL
[Diagram: the client app in each region talks only to its local DC (us-1 or eu-1)]
Multi-DC Routing with non-LOCAL CL
[Diagram: with a non-LOCAL CL, each client app’s requests can span both us-1 and eu-1]
Multi-DC Routing
• Use DCAwareRoundRobinPolicy wrapped by
TokenAwarePolicy
– This is the default
– Prefers local DC – chosen based on host distance
and seed list
– BUT this can fail for logical DCs that are physically
co-located, or for improperly defined seed lists!
Multi-DC Routing
Pro tip:
import com.datastax.driver.core.policies.{DCAwareRoundRobinPolicy, TokenAwarePolicy}

val localDC = "us-1"  // in practice, read this from config
val dcPolicy =
  new TokenAwarePolicy(
    DCAwareRoundRobinPolicy.builder()
      .withLocalDc(localDC)  // pin the driver to the local DC
      .build()
  )
Be explicit!!
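For completeness, a hypothetical wiring sketch (driver 3.x; the contact point and keyspace name are placeholders) showing where dcPolicy plugs in, along with a LOCAL_QUORUM default:

import com.datastax.driver.core.{Cluster, ConsistencyLevel, QueryOptions}

val cluster = Cluster.builder()
  .addContactPoint("10.0.0.1")          // placeholder seed address
  .withLoadBalancingPolicy(dcPolicy)    // the explicit DC-aware policy from above
  .withQueryOptions(
    new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
  .build()
val session = cluster.connect("myKeyspace")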
Handling DC Failure
• Make sure backup DC has sufficient capacity
– Don’t try to add capacity on the fly!
• Try to limit updates
– Avoids potential consistency issues on recovery
• Be careful with retry logic
– Isolate it to a single point in the stack
– Don’t DDoS yourself with retries!
Topology Lessons
• Leverage rack awareness
• Use LOCAL_QUORUM
– Full local consistency
– Eventual consistency across DCs
• Run incremental repairs to maintain inter-DC
consistency
• Explicitly route local app to local C* DC
• Plan for DC failure
Data Modeling
Quick Primer
• C* is a distributed hash table
– Partition key (first field in PK declaration)
determines placement in the cluster
– Efficient queries MUST know the key!
• Data for a given partition is naturally sorted
based on clustering columns
• Column range scans are efficient
Quick Primer
• All writes are immutable
– Deletes create tombstones
– Updates do not immediately purge old data
– Compaction has to sort all this out
Who Cares?
• Bad performance = application downtime &
lost users
• Lagging compaction is an operations
nightmare
• Some models & query patterns create serious
availability problems
Do
• Choose a partition key that distributes evenly
• Model your data based on common read
patterns
• Denormalize using collections & materialized
views
• Use efficient single-partition range queries
Don’t
• Create hot spots in either data or traffic
patterns
• Build a relational data model
• Create an application-side join
• Run multi-node queries
• Use batches to group unrelated writes
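On the last point: batches exist for atomicity within a partition, not for bulk loading. A minimal sketch (driver 3.x, with a hypothetical user_events table partitioned by user_id) of a reasonable batch, where every statement targets the same partition:

import com.datastax.driver.core.{BatchStatement, SimpleStatement}

val batch = new BatchStatement(BatchStatement.Type.LOGGED)
batch.add(new SimpleStatement(
  "INSERT INTO user_events (user_id, ts, event) VALUES (42, 1, 'login')"))
batch.add(new SimpleStatement(
  "INSERT INTO user_events (user_id, ts, event) VALUES (42, 2, 'click')"))
session.execute(batch)  // unrelated writes belong in separate statements, not one batch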
Problem Case #1
SELECT *
FROM contacts
WHERE id IN (1,3,5,7)
[Diagram: a 6-node cluster with keys 1–8 replicated across the nodes – the client’s coordinator must ask 4 of the 6 nodes to satisfy quorum for these keys]
With two of those nodes down:
“Not enough replicas available for query at consistency LOCAL_QUORUM”
Keys 1, 3, and 5 all have sufficient replicas, yet the entire query fails because of 7.
Solution #1
• Option 1: Be optimistic and run it anyway
– If it fails, you can fall back to option 2
• Option 2: Run parallel queries for each key
– Return the results that are available
– Fall back to CL ONE for failed keys
– Client token awareness means coordinator does less
work
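A minimal sketch of Option 2 (assuming driver 3.x, the contacts table from the example, and an existing session); the keys are queried one by one here for brevity, but each lookup could just as easily be an executeAsync call run in parallel:

import com.datastax.driver.core.{ConsistencyLevel, Row, Session, SimpleStatement}
import scala.collection.JavaConverters._
import scala.util.Try

def fetchContacts(session: Session, ids: Seq[Int]): Seq[Row] =
  ids.flatMap { id =>
    def query(cl: ConsistencyLevel) = Try {
      val stmt = new SimpleStatement("SELECT * FROM contacts WHERE id = ?", Int.box(id))
      stmt.setConsistencyLevel(cl)
      session.execute(stmt).all().asScala
    }
    query(ConsistencyLevel.LOCAL_QUORUM)      // try full local consistency first
      .orElse(query(ConsistencyLevel.ONE))    // degrade only the keys that failed
      .getOrElse(Seq.empty)                   // no replicas at all for this key
  }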
Problem Case #2
CREATE INDEX ON contacts(birth_year)
SELECT *
FROM contacts
WHERE birth_year=1975
[Diagram: each of the 6 nodes holds secondary-index entries for birth_year=1975 (Jim, Sue, Sam, Tim, …)]
The index lives with the source data … so 5 nodes must be queried!
With two of those nodes down:
“Not enough replicas available for query at consistency LOCAL_QUORUM”
Solution #2
• Option 1: Build your own index
– App has to maintain the index
• Option 2: Use a materialized view
– Not available before 3.0
• Option 3: Run it anyway
– Ok for small amounts of data (think 10s to 100s of
rows) that can live in memory
– Good for parallel analytics jobs (Spark, Hadoop, etc.)
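A sketch of Option 2 under the assumption of Cassandra 3.0+ and a contacts table keyed by id (the view name here is made up): create the view once at schema-setup time, then read it by birth_year instead of hitting the secondary index. Note the trade-off: all contacts for a given birth_year now share one view partition.

// Executed once, alongside the rest of the schema
session.execute(
  """CREATE MATERIALIZED VIEW IF NOT EXISTS contacts_by_birth_year AS
    |  SELECT * FROM contacts
    |  WHERE birth_year IS NOT NULL AND id IS NOT NULL
    |  PRIMARY KEY (birth_year, id)""".stripMargin)

// Reads become single-partition lookups on the view
val people = session.execute(
  "SELECT * FROM contacts_by_birth_year WHERE birth_year = 1975")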
Problem Case #3
CREATE TABLE sensor_readings (
sensorID uuid,
timestamp int,
reading decimal,
PRIMARY KEY (sensorID, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Problem Case #3
• Partition will grow unbounded
– i.e. it creates wide rows
• Unsustainable number of columns in each
partition
• No way to archive off old data
Solution #3
CREATE TABLE sensor_readings (
  sensorID uuid,
  time_bucket int,
  timestamp int,
  reading decimal,
  PRIMARY KEY ((sensorID, time_bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
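To show how the bucket is used at read time, here is a minimal sketch (assuming day-sized buckets and driver 3.x; the bucket granularity is a design choice, not part of the original slide). The client computes the bucket, so every read stays within a single partition:

import com.datastax.driver.core.{ResultSet, Session, SimpleStatement}
import java.util.UUID

def dayBucket(epochSeconds: Long): Int = (epochSeconds / 86400).toInt  // one bucket per day

def readingsSince(session: Session, sensorID: UUID, since: Int): ResultSet = {
  // Single-partition range query: (sensorID, time_bucket) is the partition key
  val stmt = new SimpleStatement(
    "SELECT timestamp, reading FROM sensor_readings " +
      "WHERE sensorID = ? AND time_bucket = ? AND timestamp >= ?",
    sensorID, Int.box(dayBucket(since.toLong)), Int.box(since))
  session.execute(stmt)
}
// Reads spanning several days simply issue one such query per bucket.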
Monitoring
Monitoring Basics
• Enable remote JMX
• Connect a stats collector (jmxtrans, collectd,
etc.)
• Use nodetool for quick single-node queries
• C* tells you pretty much everything via JMX
Thread Pools
• C* is a SEDA architecture
– Essentially message queues feeding thread pools
– nodetool tpstats
• Pending messages are bad:
Pool Name Active Pending Completed Blocked All time blocked
CounterMutationStage 0 0 0 0 0
ReadStage 0 0 103 0 0
RequestResponseStage 0 0 0 0 0
MutationStage 0 13234794 0 0 0
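Because all of this is exposed over JMX, the same pending-task counts can be collected programmatically. A hedged sketch (the MBean name follows the org.apache.cassandra.metrics ThreadPools pattern used in recent versions, so verify it against your version with a JMX browser; 7199 is the default JMX port):

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi")
val connector = JMXConnectorFactory.connect(url)
val mbeans = connector.getMBeanServerConnection

// Assumed metric: pending tasks for the mutation stage
val pending = mbeans.getAttribute(
  new ObjectName("org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks"),
  "Value")
println(s"MutationStage pending: $pending")
connector.close()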
Lagging Compaction
• Lagging compaction is the reason for many
performance issues
• Reads can grind to a halt in the worst case
• Use nodetool tablestats/cfstats &
compactionstats
Lagging Compaction
• Size-Tiered: watch for high SSTable counts:
Keyspace: my_keyspace
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Flushes: 0
Table: my_table
SSTable count: 84
Lagging Compaction
• Leveled: watch for SSTables remaining in L0:
Keyspace: my_keyspace
Read Count: 11207
Read Latency: 0.047931114482020164 ms.
Write Count: 17598
Write Latency: 0.053502954881236506 ms.
Pending Flushes: 0
Table: my_table
SSTable Count: 70
SSTables in each level: [50/4, 15/10, 5/100]
50 in L0 (should be 4)
Lagging Compaction Solution
• Triage:
– Check stats history to see if it’s a trend or a blip
– Increase compaction throughput using nodetool
setcompactionthroughput
– Temporarily switch to SizeTiered
• Do some digging:
– I/O problem?
– Add nodes?
Wide Rows / Hotspots
• Only takes one to wreak havoc
• It’s a data model problem
• Early detection is key!
• Watch partition max bytes
– Make sure it doesn’t grow unbounded
– … or become significantly larger than mean bytes
Wide Rows / Hotspots
• Use nodetool toppartitions to sample
reads/writes and find the offending partition
• Take action early to avoid OOM issues with:
– Compaction
– Streaming
– Reads
For More Info…
(shameless book plug)
Thanks!
Robbie Strickland
rstrickland@weather.com
@rs_atl An IBM Business


Editor's Notes

  1. Thank you for joining me for my talk today. My name is Robbie Strickland, and I’m going to talk about how to build highly available applications on Cassandra. If this is not the session you’re looking for, this would be a good time to head out and find the right one. Alternatively, if you’re an expert on this subject, please come talk to me afterward and maybe I can find you a new job…
  2. A little background for those who don’t know me. I lead the analytics team at The Weather Company, based in the beautiful city of Atlanta. I am responsible for our data warehouse and our analytics platform, as well as a team of engineers who get to work on cool analytics projects on massive and varied data sets. We were recently acquired by IBM’s analytics group, and so my role has expanded to include work on the larger IBM platform efforts as well.
  3. Why am I qualified to talk about this? I’ve been around the community for a while, since 2010 and Cassandra 0.5 to be exact, and I’ve worked on a variety of Cassandra-related open source projects. If there’s a way to screw things up with Cassandra, I’ve done it. If you’re interested in learning more about that, you can pick up a copy of my book, Cassandra High Availability, which has a newly released second edition focusing on the 3x series.
  4. I’d like to start by asking the question: what do we mean by high availability?
  5. A common definition is the so-called five nines of uptime. This sounds really good, until you do the math: even 99.999% still allows about five minutes of down time per year, while 99.9% works out to roughly nine hours, a full work day of down time per year! I don't know about your business, but to me that sounds like an unacceptable number.
  6. Can we do better than this?
  7. The conversation around HA and Cassandra is complex and multi-faceted, so it would be impossible to cover everything that needs to be said in a half hour talk. Today I’m going to touch on the highlights, and hopefully take away many of the unknown unknowns. Fortunately Cassandra was built from the ground up to be highly available, and if properly used can deliver 100% uptime on your critical applications. This is possible by leveraging some key capabilities, such as its distributed design with no single point of failure, including replication across data centers. It supports incremental backups, and robust failure handling features on both the client and the server. And Cassandra exposes pretty much anything you’d like to know about its inner workings via a host of JMX stats, so ignorance is no excuse.
  8. As you begin to design your application, I would encourage you to channel the Cassandra architects and think about availability from the start. It’s very difficult to bolt on HA capability to an existing app, and this is especially true with Cassandra.
  9. Let’s talk about the ingredients that comprise a successful HA deployment, starting with a properly designed topology. By this I mean the physical deployment of both the database and your application
  10. Next you need a data model that leverages Cassandra’s strengths and mitigates its weaknesses.
  11. You’ll want to make sure that your application handles failure as well, and there are some specific strategies I’ll discuss to drive that point home.
  12. You will need to keep a close watch on the key performance metrics so you have reaction time before a failure…
  13. and lastly you’ll need to cultivate a devops mentality if you don’t already think this way.
  14. Let’s lay a few ground rules. I’m going to assume a few things about your configuration that are commonly considered to be table stakes for any production Cassandra deployment.
  15. First, you should be using NetworkTopologyStrategy …
16. and either the GossipingPropertyFileSnitch or the appropriate snitch for your cloud provider. For the record, we run many multi-region EC2 clusters, yet we still use the Gossiping snitch because it gives us more control.
17. Next, I’m assuming you have at least 5 nodes, since anything less is really insufficient for production use.
18. A replication factor of three is the de facto standard; while there are reasons to have more or fewer, your reason probably isn’t valid. Pop quiz: If you set your replication factor to two, what constitutes a quorum? That’s right: two. Now let’s say you have five nodes. How many nodes can fail before some subset of your data becomes unavailable at quorum? Zero. So at RF=2, every node in your cluster becomes a catastrophic failure point.
  19. Lastly, please don’t put your cluster behind a load balancer. You will break the client-side smarts built into the driver and produce a lot of unnecessary overhead on your cluster.
  20. With that out of the way, let’s talk about how we build an HA topology.
21. As I’m sure you’re aware, Cassandra has a robust consistency model with a number of knobs to turn. There are plenty of great resources that cover this, so I’m going to leave you with just a few rules of thumb and let you explore further on your own.
22. I always recommend that people start with LOCAL_QUORUM reads and writes, because this gives you a good balance of performance and availability, and you don’t have to deal with eventual consistency within a single data center. As a corollary, my suggestion is to experiment with eventual consistency (meaning something less than quorum) in a controlled environment. You’ll want to gain some operational experience handling eventually consistent behavior before deploying a mission critical app.
  23. Second, don’t use non-local consistency levels in multi-data center environments, because the behavior will be unpredictable. I’ll cover this situation in detail later.
  24. If you follow the basic replication and consistency guidelines I just outlined, single node failures will be relatively straightforward to recover from. But what happens when someone trips over the power cord to your rack, or a switch fails? Fortunately Cassandra offers a mechanism to handle this, as long as you’re smart about your topology.
  25. Obviously if you put all your nodes in a single rack, you’re kind of on your own—so don’t do that!
  26. Assuming you have multiple racks, you can leverage the rack awareness feature, which places replicas in different racks.
27. However, I would advise against using the old RackInferringSnitch, which infers rack and data center from a node’s IP address; it makes assumptions about your network configuration that may not always hold.
28. Let’s look at how rack awareness works. Assuming you have two racks, A and B, Cassandra will ensure that the three replicas of each key are distributed across racks. This means you’ll have at least one available replica even if an entire rack is down. In this case, if rack B goes down, your application will have to support reading at CL ONE if you want to continue to serve that data.
29. To set this up with the GossipingPropertyFileSnitch, you’ll need to add a cassandra-rackdc.properties file to the config directory, where you’ll specify which data center and rack the node belongs to. This information is automatically gossiped to the rest of the cluster, so there’s no need to keep files in sync as with the legacy PropertyFileSnitch.
  30. Alternatively, if you’re using a cloud snitch, you can accomplish the same thing by locating your nodes in different availability zones. The cloud snitches will map the region to a data center and the availability zone to a rack. Just as with physical racks, it’s important to evenly distribute your nodes across zones if you want this to work properly.
31. Once you’ve improved local availability, it’s likely that you’ll want or need to expand geographically. There are a variety of reasons for this, such as disaster recovery, failover, and bringing the data closer to your users. Cassandra handles this multi-DC replication automatically through the keyspace definition. In my example here, I have a data center in the US, which we’re calling us-1, and one in Europe (but not England), which we’re calling eu-1.
  32. The setup for this is straightforward using the “with replication” clause on the create keyspace CQL command. You can specify a list of data center names with the corresponding number of replicas you want maintained in each.
  33. One important question when it comes to multi-DC operations is what sort of consistency guarantees you get, again assuming local_quorum reads and writes.
  34. I’ve already established that within a given DC, local_quorum gives you full consistency,
  35. But what guarantee do you get between data centers?
36. The answer is eventual consistency. This is an extremely important point when designing your application, and it brings me to a closely related topic: client-side routing.
  37. At the risk of stating the obvious, the ideal scenario is to have each client app only talk to local Cassandra nodes using a local consistency level.
  38. This is the right approach, but it’s surprisingly easy to mess this up. I’ve seen this simple rule break down due to misunderstanding about the relationship between consistency level and client load balancing policy.
  39. The breakdown often comes due to failure to set the consistency level to a local variant. This illustrates what happens when you don’t run a local consistency level.
  40. Don’t do this. You end up with traffic running all over the place, because Cassandra is trying to check replicas in the remote data center to satisfy your requested consistency guarantee. This can also happen if you give your app a list of all nodes in your cluster. So make sure you explicitly set a local consistency level, and make sure your client is only connecting to local nodes.
41. If you want to guarantee that traffic from your app is routed to the right node in the local DC, you’ll want to leverage the DCAwareRoundRobinPolicy, wrapped by the TokenAwarePolicy. The good news is this is the default configuration for the DataStax driver, but there is still potential for problems when relying on the default. If you don’t explicitly specify the DC, it will be chosen automatically using the provided seed list and host distance. We have run into issues where a non-local node was accidentally included in the seed list, which of course caused the driver to learn about other nodes and begin directing traffic to those nodes.
  42. To solve this, obtain your local DC using a local environment configuration, then explicitly specify it using the withLocalDc option, as I’ve shown here. This is essentially a fail-safe against a non-local node getting inadvertently added to your seed list.
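To make that concrete, here is a minimal sketch using the DataStax Java driver 3.x API. The class name, the contact points, and the idea of passing the DC name in from local environment configuration are my own illustration, not the code from the slide:

  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.Session;
  import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
  import com.datastax.driver.core.policies.TokenAwarePolicy;

  public class LocalDcClusterFactory {
    // localDc comes from local environment config (e.g. "us-1");
    // localContactPoints should list nodes in that DC only.
    public static Session connect(String localDc, String... localContactPoints) {
      Cluster cluster = Cluster.builder()
          .addContactPoints(localContactPoints)
          .withLoadBalancingPolicy(new TokenAwarePolicy(
              DCAwareRoundRobinPolicy.builder()
                  .withLocalDc(localDc)   // fail-safe: pin the local DC explicitly
                  .build()))
          .build();
      return cluster.connect();
    }
  }

Even if a remote node sneaks into the contact point list, the driver will still treat only the named DC as local.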
  43. So how can you handle the failure of an entire DC?
  44. First, assuming you plan to fail over to another DC, please make sure your backup DC can handle the extra load. Trying to add capacity on the fly is unwise, as you’ll be introducing bootstrap overhead as well as the additional traffic. This is very likely to result in failure of your backup data center as well!
45. Second, try to limit updates, as they can cause consistency issues when you try to bring the downed data center back online. Many applications will have a read-only failure mode, which can be significantly better than being down altogether.
  46. Lastly, be very careful when designing your retry logic. Make sure to isolate the retries to a single point in the stack, so you don’t end up bringing your app down due to your own retry explosion.
  47. To recap the lessons learned on topology, make sure you’re leveraging rack awareness, use local quorum for full local consistency, run incremental repairs to maintain inter-DC consistency, explicitly set the local DC in your app, and create a plan to handle the failure of a DC.
  48. Now let’s move on to one of the most critical aspects of availability, and frankly the one that trips up most people. It’s easy to become lulled by the familiarity of the CQL syntax, but you really need to pay close attention to what Cassandra is doing with your data. Otherwise you’ll almost certainly run into performance and availability problems.
  49. I’ll begin with a quick primer, though I’m sure many of you know this stuff already. But these are critical points, so a quick recap is in order just in case.
50. First, Cassandra is a distributed hash table, and the partition key determines where data lives in the cluster, specifically which nodes contain replicas.
51. Data for a given partition is sorted based on the clustering column values, using the natural sort order of the type.
  52. It follows that queries resulting in column range scans are efficient, because they leverage this natural sorting.
53. Lastly, all Cassandra writes are immutable. Inserts and updates are really the same operation, and deletes write special markers called tombstones that shadow the old values. The old data hangs around after an update or delete, and compaction has to reconcile it, both to avoid holding onto a bunch of garbage and to keep reads efficient.
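To ground those points, here is a small hypothetical table, not one from the talk: events_by_user is partitioned by user_id, and rows within each partition are sorted by event_time, so a slice over a time range within one partition is cheap.

  CREATE TABLE events_by_user (
    user_id    uuid,
    event_time timestamp,
    payload    text,
    PRIMARY KEY (user_id, event_time)   -- user_id is the partition key, event_time the clustering column
  );

  -- Efficient: a range scan confined to one partition, served in clustering order
  SELECT * FROM events_by_user
  WHERE user_id = 62c36092-82a1-3a00-93d1-46196ee77204
    AND event_time >= '2016-01-01' AND event_time < '2016-02-01';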
  54. So why do we even care about these details?
  55. Obviously bad performance results in down time.
  56. Maybe less obviously, bad data models can result in significant compaction overhead, which can cause compaction to lag. Lagging compaction is a serious operations problem, especially if it’s allowed to continue undetected for too long.
  57. Also, some models and patterns have significant and inherent availability implications.
  58. A couple of general do’s and don’ts. First some rules of thumb:
  59. Choose your partition key carefully, such that you get even distribution across the cluster.
  60. And unlike your favorite third normal form data model, you’ll want to model based on your most common read patterns.
  61. To help accomplish this, you’ll want to denormalize your data. Collections and the new materialized view feature are valuable tools that can help you accomplish this.
  62. And this last point could be considered the unifying theory for Cassandra data modeling: always run single partition range queries. If you’re unsure what constitutes a range query, there are a number of excellent resources available to explain this, including a talk I did at a past summit called CQL under the hood.
  63. Now for a list of don’ts.
64. First off, avoid models that result in hot spots in either data load or traffic patterns. A hot spot is simply an unusually large amount of data being written to or read from a single partition key.
65. Secondly, if you find yourself building foreign key style relationships, you need to think differently about the problem. Relational models do not translate well to the Cassandra paradigm.
66. A corollary principle is to avoid joining data on the application side, unless the join table is just a few rows that you can cache in memory.
  67. Next, don’t run queries that require many nodes to answer. I’ll cover a couple of these cases in a minute.
  68. And lastly, batches are not meant for grouping unrelated writes. In a single-node relational database, batching can be very efficient for loading large amounts of data, but again, this does not translate to Cassandra. If you need to do this, there’s a bulk loader Java API that can be leveraged for this purpose.
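As a hedged illustration of that last point, using the hypothetical events_by_user table from earlier: a logged batch that spans unrelated partitions makes a single coordinator babysit writes that belong on different replica sets.

  -- Anti-pattern: unrelated partitions grouped only for convenience
  BEGIN BATCH
    INSERT INTO events_by_user (user_id, event_time, payload)
      VALUES (11111111-1111-1111-1111-111111111111, '2016-09-07 10:00:00', 'a');
    INSERT INTO events_by_user (user_id, event_time, payload)
      VALUES (22222222-2222-2222-2222-222222222222, '2016-09-07 10:00:01', 'b');
  APPLY BATCH;

Prefer individual, ideally asynchronous, writes for this kind of load, and reserve batches for writes to the same partition or for cases where you genuinely need them applied together.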
  69. Now let’s examine a few problem cases and talk about how they can be addressed. Case number one looks innocuous. You have a contacts table, and you want to retrieve a set of them by ID. So you use an IN clause to filter your results. Why would this be an issue?
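Concretely, the query in question might look something like this; a contacts table keyed by an integer id is my assumption for the example:

  SELECT * FROM contacts WHERE id IN (1, 3, 5, 7);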
  70. Let’s look at what’s happening here: The client issues the request, which will be routed to a coordinator based on one of the keys in the list. Assuming a quorum read, the coordinator will have to find two replicas of every key you asked for, resulting in four out of six nodes participating in the query.
  71. Now suppose we lose two nodes. Cassandra can satisfy quorum for keys 1, 3, and 5, but there aren’t enough available replicas to return the query for 7. Because the keys are grouped using the IN clause, the entire query will fail.
  72. There are two potential solutions to this problem.
  73. Option one is to just throw caution to the wind and do it anyway, then in the failure case you can fall back to option two,
  74. which is to run parallel queries for each key. If you do this, you are able to return any available results, which may be better than nothing at all. In addition, you can choose to reduce your consistency level to ONE for any failed keys, so you’re effectively taking a best effort approach to returning the latest data. This approach also allows the client to more effectively leverage token awareness, so the coordinator is doing less work.
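Here is a rough sketch of that approach with the DataStax Java driver; the contacts schema, the method shape, and the per-key CL ONE fallback are illustrative assumptions rather than code from the talk:

  import com.datastax.driver.core.*;
  import java.util.*;

  public class ContactLookup {
    // One query per key instead of a single IN clause; fall back to CL ONE per failed key.
    public static List<Row> fetchByIds(Session session, List<Integer> ids) {
      PreparedStatement stmt = session.prepare("SELECT * FROM contacts WHERE id = ?");
      Map<Integer, ResultSetFuture> futures = new LinkedHashMap<>();
      for (Integer id : ids) {
        futures.put(id, session.executeAsync(stmt.bind(id)));   // queries run in parallel
      }
      List<Row> rows = new ArrayList<>();
      for (Map.Entry<Integer, ResultSetFuture> e : futures.entrySet()) {
        try {
          rows.addAll(e.getValue().getUninterruptibly().all());
        } catch (Exception quorumFailure) {
          try {
            // Best effort: retry just this key at a reduced consistency level
            rows.addAll(session.execute(
                stmt.bind(e.getKey()).setConsistencyLevel(ConsistencyLevel.ONE)).all());
          } catch (Exception ignored) {
            // Give up on this key and return whatever else we have
          }
        }
      }
      return rows;
    }
  }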
75. Even after years of warning, our next case seems to stick around like a lingering cold. Revisiting the contacts table, let’s say you have a field called birth year that you want to use to filter your results. So you do what any good relational database architect would do: you create an index on that field.
76. But this is a bad plan, because index entries are stored locally on each node, alongside that node’s portion of the source table. Architecturally this is a sound strategy, but it means the coordinator has to consult the index across the cluster to find out which nodes contain data matching the value you’re querying. This pattern does not scale well, and it is prone to availability issues just like the IN clause.
  77. As in the previous example, if you lose two nodes you can no longer satisfy quorum for the query.
78. There are three potential alternatives to secondary indexes.
  79. One option is to build your own index, which may work well if the column you’re indexing has reasonable distribution. Birth year would likely be such a column, but a boolean value would not. The disadvantage is that your app has to maintain the index and deal with potential consistency and orphan issues. But this is a tried and true approach, and may be the right solution for some cases.
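A sketch of what option one might look like for the birth year case; the table name and column types are assumptions:

  CREATE TABLE contacts_by_birth_year (
    birth_year int,
    id         int,
    PRIMARY KEY (birth_year, id)
  );
  -- The application writes (birth_year, id) here whenever it writes a contact,
  -- then answers the query by reading this table and fetching full rows from contacts by id.

The cost, as noted, is that your code owns the consistency between the two tables.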
80. Option two is to use a materialized view, which gives you essentially the same underlying structure as option one, but has the advantage that Cassandra maintains it for you, alleviating the burden on the application. The downside is that you’ll need 3.x to get this feature.
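For comparison, a sketch of the materialized view version of the same idea (3.x syntax, same hypothetical columns), where Cassandra keeps the copy up to date for you:

  CREATE MATERIALIZED VIEW contacts_by_birth_year_mv AS
    SELECT * FROM contacts
    WHERE birth_year IS NOT NULL AND id IS NOT NULL
    PRIMARY KEY (birth_year, id);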
  81. The last option is to run it anyway, which may be ok if you have only small amounts of data that you’re returning. Indexes can also be a good pairing with analytics frameworks, where good parallelism is important. In this case, the distribution of the query across the cluster is actually a positive attribute.
  82. For our last case, let’s assume you want to capture sensor data, which is inherently time series, and you want to be able to read the latest few values. The obvious model would look like this, where you partition by sensorID and then group by timestamp. You can add the clustering order by clause to reverse the sort order, such that it’s stored with the latest value first. This model allows you to query a given sensor and obtain the readings in descending order by timestamp.
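That obvious model might look something like this; the names and types are my own stand-ins:

  CREATE TABLE sensor_readings (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
  ) WITH CLUSTERING ORDER BY (reading_time DESC);

  -- Latest few readings for one sensor, newest first
  SELECT * FROM sensor_readings
  WHERE sensor_id = 62c36092-82a1-3a00-93d1-46196ee77204
  LIMIT 10;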
83. But this model suffers from one of the most insidious of Cassandra evils: the unbounded partition, which is the worst form of the wide row problem.
  84. Eventually your partition sizes will become unsustainable, which will result in serious problems with compaction and streaming at the very least.
  85. Unfortunately, if you find yourself in this situation you will also realize that there’s no way to efficiently archive off old data, because doing so would create a significant number of tombstones and therefore compound the problem.
  86. The solution is to create a compound partition key, using sensorID plus time bucket to create a boundary for growth, where the time bucket can be known at query time. One important trick here is to choose your time bucket such that you only need at most two buckets to satisfy your most common queries. The reason is to limit the number of nodes that will have to be consulted to answer the query. It’s also worth noting that this concept where you add a known value to your partition key to limit growth and provide better distribution will generalize to other use cases as well.
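Here is a sketch of the bucketed version, assuming a daily bucket is coarse enough that your common queries touch at most one or two buckets; the bucket format is an assumption:

  CREATE TABLE sensor_readings_by_day (
    sensor_id    uuid,
    day          text,        -- e.g. '2016-09-07'
    reading_time timestamp,
    value        double,
    PRIMARY KEY ((sensor_id, day), reading_time)
  ) WITH CLUSTERING ORDER BY (reading_time DESC);

  -- "Latest readings" now hits today's bucket, falling back to yesterday's if needed
  SELECT * FROM sensor_readings_by_day
  WHERE sensor_id = 62c36092-82a1-3a00-93d1-46196ee77204 AND day = '2016-09-07'
  LIMIT 10;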
  87. Now to the last subject I want to cover today. Monitoring is a key part of any HA strategy, for reasons that I hope are obvious. What may be less obvious is exactly what you should be looking for to determine the health of the system. While I cannot hope to cover every possible scenario, I’m going to touch on some of the more critical problem areas that may be less obvious.
88. First, a few basic concepts.
  89. Before you can collect anything you’ll need to enable remote JMX in the cassandra-env.sh script as well as on the JVM itself.
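The exact flags vary by version and distribution, but the shape of it is roughly this, in cassandra-env.sh; treat this as an illustrative sketch and be sure to enable authentication (and ideally SSL) for anything production-facing:

  # cassandra-env.sh (illustrative; check your version's defaults)
  LOCAL_JMX=no
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=7199"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"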
90. Then you’ll need some way to collect the stats, using jmxtrans, collectd, or something similar.
91. For simple, single-node diagnostics, nodetool provides a convenient interface to some of the more common questions you want to answer.
  92. Beyond that, Cassandra exposes just about every stat you can imagine through its JMX interface. So let’s look at a few things you might want to watch.
93. One very important metric to keep an eye on is the state of the thread pools. Cassandra uses a staged event driven architecture, or SEDA, that’s essentially a set of message queues feeding thread pools where the workers are dequeuing the messages. Nodetool tpstats gives you a view into what’s going on with the pools.
  94. The important thing to look for here is a buildup of pending messages, in this case on the mutation pool. As you may have guessed, this indicates that writes to disk aren’t keeping up with the queued requests. This doesn’t tell you why that’s happening, but if you’re also monitoring related areas like disk I/O, you should be able to quickly diagnose the problem. The point is to catch this early so you can resolve the situation before it gets out of hand.
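In practice that check looks something like this; the threshold for "too many pending" is a judgment call rather than a hard number:

  nodetool tpstats
  # Watch the Pending column for pools such as MutationStage and ReadStage.
  # A sustained, growing Pending count on MutationStage means writes are queuing
  # faster than they can be serviced; correlate with disk I/O before acting.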
95. Another very common but largely misunderstood problem relates to compaction, specifically when it gets behind.
96. Lagging compaction can cause significant performance issues, especially with reads.
97. Because compaction is responsible for maintaining a sane distribution of keys across SSTables, falling behind means reads may have to consult many tables; in the worst case, read latencies spiral out of control and eventually time out.
  98. To diagnose compaction issues, use nodetool tablestats and compactionstats.
  99. The metric you’re looking for will depend on the compaction strategy you’re using. For Size-tiered, keep an eye on SSTable count, which should stay within a reasonable margin. If your monitoring system shows a consistent growth in SSTables, you’ll need to take action to avoid the situation getting out of hand.
100. When leveled compaction gets behind the curve, you’ll start to see a buildup of SSTables in the lower levels, specifically in level 0. Since leveled compaction is designed to very quickly compact level 0 SSTables, you should never see more than a handful at a time.
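A quick sketch of what to run and what to look for; the keyspace name is a placeholder and the exact labels shift a little between versions:

  nodetool compactionstats        # pending tasks: a steadily growing number is the red flag
  nodetool tablestats mykeyspace  # "SSTable count" for size-tiered tables,
                                  # "SSTables in each level" (watch level 0) for leveled tables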
101. Dealing with a lagging compaction situation involves a two-part solution. The first part is quick triage to get back to as stable a state as possible. Start by making sure it’s a trend and not just a blip; this is where history is important, as you can’t make good decisions from a single data point. If you have the ability to do so, consider increasing compaction throughput. In some cases this may be all you need, as long as the cluster recovers successfully and can handle the extra I/O. If you’re running leveled compaction, which requires substantially more I/O than size-tiered, you can often recover by temporarily switching to size-tiered to catch up.
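Those triage knobs map to commands along these lines; the throughput number is illustrative, and the strategy switch is a temporary measure to apply with care:

  nodetool getcompactionthroughput
  nodetool setcompactionthroughput 64    # MB/s; 0 removes the throttle entirely

  -- CQL: temporarily fall back from leveled to size-tiered on a struggling table
  ALTER TABLE mykeyspace.mytable
    WITH compaction = { 'class': 'SizeTieredCompactionStrategy' };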
  102. Ultimately you’ll need to figure out what’s causing compaction to lag. Sometimes you can just turn up the throughput, but often there’s an underlying problem, such as poor disk performance, or perhaps you’re underprovisioned and need to add nodes. Either way, these guidelines can help you keep your system running, buying you time to get to the bottom of it.
  103. As I mentioned earlier, one of the worst Cassandra problems I’ve personally experienced is related to wide rows.
104. It really only takes one to completely ruin your day.
105. Fundamentally, wide rows are a data model problem, so the fix is usually retooling your model, which is rarely quick or easy.
106. This is why you’ll want to find out about it as soon as possible, so you have some runway to deal with it before it takes down your application.
107. The key here is to watch the max partition bytes metric for each table. Make sure you don’t see unbounded growth, or a value that greatly exceeds the mean partition bytes.
  108. Once you detect a problem, you can often use the nodetool toppartitions tool to sample your traffic and get a list of candidate partition keys. This works as long as the traffic pattern at the time of the sampling is indicative of the hotspot pattern.
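A sketch of that detection workflow; the table name and sample duration are placeholders, and the tablestats labels may differ slightly by version:

  nodetool tablestats mykeyspace.contacts
  # Compare "Compacted partition maximum bytes" with "Compacted partition mean bytes";
  # a max that dwarfs the mean, or keeps growing between samples, suggests a wide row.

  nodetool toppartitions mykeyspace contacts 60000
  # Samples traffic for 60 seconds (duration is in milliseconds; see nodetool help)
  # and reports the hottest partition keys, assuming current traffic reflects the hot spot.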
  109. When you find a wide row, deal with it as soon as possible, or you may start seeing OOM issues with compaction, streaming, and reads.
  110. I’ve covered a lot of territory, but there’s much more detail to this subject. If you’d like to learn more, there’s this really amazing new book that was just printed, and I’d shamelessly encourage you to get your very own copy today.
  111. Thanks again for coming out to my talk today, and I’d love to answer any questions you may have.