SlideShare a Scribd company logo
1 of 35
Thom Valley
Distributing the Enterprise, Safely
1 Distributing the Enterprise
2 The Fault Tolerance Fallacy
3 Testing (On the Spectrum)
4 Critical Factors
5 Example Test Scenarios
2© DataStax, All Rights Reserved.
Distributing the Enterprise
Distributing the Enterprise - Assumptions
© DataStax, All Rights Reserved. 4
Can Install / Configure
Have Appropriate / Adequate Hardware (?)
Sound Data Model
Distributing the Enterprise
© DataStax, All Rights Reserved. 5
San
Francisco
New York
Dubai
Massively Scalable
Operationally Simple
Always On
Geographically Distributed
Fault Tolerant
The Fault Tolerance Fallacy
The Fault Tolerance Fallacy
© DataStax, All Rights Reserved. 7
The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
--Sun Microsystem Fellows…
The Fault Tolerance Fallacy
© DataStax, All Rights Reserved. 8
My cluster has N copies of my data, I don’t need to backup…
Cassandra / DSE is easy to scale, just add nodes…
If a node fails, I will just replace it…
If a DC fails, everything will just fail over…
X and Y (big company) are using it, it must just work…
The Fault Tolerance Fallacy
© DataStax, All Rights Reserved. 9
IF YOU CAN’T RECOVER, YOU’RE NOT FAULT TOLERANT
The Fault Tolerance Fallacy
© DataStax, All Rights Reserved. 10
Very few organizations test for failure (at scale)
Most configurations / circumstances are unique
Anything that you haven’t proven is a potential failure point
Almost everything works at a small scale
The features designed to save you may cause your failure
Understanding Mean Time to Recover (MTTR)
The Fault Tolerance Fallacy
© DataStax, All Rights Reserved. 11
Very few organizations test for failure (at scale)
Most configurations / circumstances are unique
Anything that you haven’t proven is a potential failure point
Almost everything works at a small scale
The features designed to save you may cause your failure
Understanding Mean Time to Recover (MTTR)
Testing (On the Spectrum)
Testing (On the Spectrum)
© DataStax, All Rights Reserved. 13
TEST?
testing acumen
Testing - Guidance
© DataStax, All Rights Reserved. 14
Test at scale (nodes / data size / throughput)
Test with a real data model / production compaction strategies
Test with your full stack
Don’t just test for speeds and feeds
Test for extended time periods (days or weeks, not hours)
Test failure scenarios
Test recovery scenarios (understand MTTR)
Critical Factors
Critical Factors
© DataStax, All Rights Reserved. 16
Node Performance / Density
LAN Performance
WAN Performance
Bootstrap/Rebuild Window
DC Rebuild Window
Cluster Capacity
Configuration*
Hinted Handoff (?)
Repair
Rack / AZ Outages
Rolling Restarts
Upgrades
Critical Factors - Configuration
© DataStax, All Rights Reserved. 17
Compaction Strategy
VNODES
Heap Settings
Throttles
Critical Factors - Understanding
© DataStax, All Rights Reserved. 18
Impact of workload on cluster / nodes
Impact of failed state on cluster
Impact of recovery process on cluster
Mean time to recover - MTTR (node, data center)
Critical Factors - Results
© DataStax, All Rights Reserved. 19
Proper node / cluster sizing
Configuration validation
Operational readiness (understand MTTR):
Single Node
Full Rack / AZ
Full DC
Test Scenarios
General Guidelines
© DataStax, All Rights Reserved. 21
Whenever possible, be running full workload simulation
Run repair in background (where possible)
Monitor application / cluster
Test scenarios more than once
Single Node Failure (within hint window)
© DataStax, All Rights Reserved. 22
General Process
• Get workload running / monitoring
• Fail node within hint window (2 hour target)
• Bring node back online
Single Node Failure (within hint window)
© DataStax, All Rights Reserved. 23
Things to Watch
• Impact to workload during failure, replay, compaction
• Load on cluster during same
• Can you recover at node density / workload?
• MTTR to get to normal state?
Single Node Failure (within hint window)
© DataStax, All Rights Reserved. 24
Other Considerations
• Hint compaction (pre 3.0)
• Multi-DC with distributed workload / hint replay
• Impact of latency on Multi-DC
• Impact of throttling (or lack) on hint replay
• What’s a sustainable hint window w/ workload?
Single Node Failure (beyond hint window)
© DataStax, All Rights Reserved. 25
General Process
• Get workload running / monitoring
• Fail node for more than hint window
• Bring node back online / rebuild node
• (Optionally truncate hints if that’s your ops plan)
Single Node Failure (beyond hint window)
© DataStax, All Rights Reserved. 26
Things to Watch
• Impact to workload during failure, replay, compaction
• Impact to workload during node rebuild
• Load on cluster during same
• MTTR for rebuilding node with workload running,
including compaction
Single Node Failure (beyond hint window)
© DataStax, All Rights Reserved. 27
Other Considerations
• How does node density impact node rebuild?
• How does compaction strategy impact node rebuild?
• What is the impact of VNODES on streaming /
compaction during rebuild?
Rack / AZ Failure (within hint window)
© DataStax, All Rights Reserved. 28
General Process
Same as above (single node test) but for full AZ
Considerations
Can cluster sustain workload? At what utilization?
Can cluster sustain hint buildup?
What is a sustainable hint window with workload?
Rack / AZ Failure (beyond hint window)
© DataStax, All Rights Reserved. 29
General Process
Same as above (single node test) but for full Rack / AZ
Considerations
Can cluster sustain workload? At what utilization?
Can cluster sustain hint buildup? Truncate Hints?
Can cluster sustain rebuilding full Rack / AZ
Full DC Failure (within hint window)
© DataStax, All Rights Reserved. 30
General Process
Same as above (single node test) but for full DC
Considerations
Can cluster sustain workload? At what utilization?
Can cluster sustain hint buildup?
Impact of hint replay on remote DCs? WAN Utilization?
What is a sustainable hint window with workload?
Full DC Failure (beyond hint window)
© DataStax, All Rights Reserved. 31
General Process
Same as above (single node test) but for full DC
Considerations
Can cluster sustain workload? At what utilization?
Can cluster sustain hint buildup? Truncate Hints?
Can cluster sustain rebuilding full DC?
Impact on WAN links during full DC rebuild?
Long Running Workload
© DataStax, All Rights Reserved. 32
Considerations
Change in response times (read and/or write)
Impact of delete workloads
Impact of compaction settings
Changing impact of repair operations
Other Scenarios to Consider
© DataStax, All Rights Reserved. 33
Flapping node
Flapping network (single node)
Flapping network (WAN)
Over consuming WAN (Multi-DC)
Strategies
© DataStax, All Rights Reserved. 34
Pick scenarios that matter to you
Test in advance at as close to scale as possible
Repeat tests on major changes:
Infrastructure
Compaction Strategies
Workload (read/write mix, volume, etc.)
Thank You

More Related Content

What's hot

Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
dplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Datadplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale DataCloudera, Inc.
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2DataStax Academy
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningDataWorks Summit
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...Cloudera, Inc.
 
Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014Joelith
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackAnirvan Chakraborty
 
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...DataStax
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxDataStax
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid DataWorks Summit
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesCloudera, Inc.
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with SupersetDataWorks Summit
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryDatabricks
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerDataWorks Summit
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoophadooparchbook
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreDataWorks Summit
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/KuduChris George
 

What's hot (20)

Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
dplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Datadplyr Interfaces to Large-Scale Data
dplyr Interfaces to Large-Scale Data
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2
 
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep LearningApache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
Apache Spark 2.4 Bridges the Gap Between Big Data and Deep Learning
 
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...How to build leakproof stream processing pipelines with Apache Kafka and Apac...
How to build leakproof stream processing pipelines with Apache Kafka and Apac...
 
Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014Coherence Overview - OFM Canberra July 2014
Coherence Overview - OFM Canberra July 2014
 
Real-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stackReal-time personal trainer on the SMACK stack
Real-time personal trainer on the SMACK stack
 
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
Oracle: Let My People Go! (Shu Zhang, Ilya Sokolov, Symantec) | Cassandra Sum...
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Performance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache Impala
 
How jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStaxHow jKool Analyzes Streaming Data in Real Time with DataStax
How jKool Analyzes Streaming Data in Real Time with DataStax
 
Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid Sherlock: an anomaly detection service on top of Druid
Sherlock: an anomaly detection service on top of Druid
 
How to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issuesHow to use Impala query plan and profile to fix performance issues
How to use Impala query plan and profile to fix performance issues
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
Application Architectures with Hadoop
Application Architectures with HadoopApplication Architectures with Hadoop
Application Architectures with Hadoop
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
Oracle Coherence
Oracle CoherenceOracle Coherence
Oracle Coherence
 
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
 

Similar to DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Summit 2016

Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...DataStax
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
The Value of Reactive
The Value of ReactiveThe Value of Reactive
The Value of ReactiveVMware Tanzu
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliencyMasashi Narumoto
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyondMatija Gobec
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...DataStax
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msIlya Ganelin
 
CAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertCAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertArangoDB Database
 
BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsMatthew Dennis
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msApache Apex
 
Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Revelation Technologies
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Vinay Kumar Chella
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Toronto-Oracle-Users-Group
 
Upgrading to Oracle SOA Suite 11g While Maintaining 100% Uptime
Upgrading to Oracle SOA Suite 11g While Maintaining 100% UptimeUpgrading to Oracle SOA Suite 11g While Maintaining 100% Uptime
Upgrading to Oracle SOA Suite 11g While Maintaining 100% UptimeRevelation Technologies
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016DataStax
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Application-level Disaster Recovery on OpenStack
Application-level Disaster Recovery on OpenStackApplication-level Disaster Recovery on OpenStack
Application-level Disaster Recovery on OpenStackAli Hodroj
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...DataStax
 
VMUGIT UC 2013 - 04 Duncan Epping
VMUGIT UC 2013 - 04 Duncan EppingVMUGIT UC 2013 - 04 Duncan Epping
VMUGIT UC 2013 - 04 Duncan EppingVMUG IT
 

Similar to DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Summit 2016 (20)

Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
The value of reactive
The value of reactiveThe value of reactive
The value of reactive
 
The Value of Reactive
The Value of ReactiveThe Value of Reactive
The Value of Reactive
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
 
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
Cassandra Tuning - Above and Beyond (Matija Gobec, SmartCat) | Cassandra Summ...
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
CAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin SchönertCAP and the Architectural Consequences by martin Schönert
CAP and the Architectural Consequences by martin Schönert
 
BigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current TrendsBigData as a Platform: Cassandra and Current Trends
BigData as a Platform: Cassandra and Current Trends
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 ms
 
Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!Double the Performance of Oracle SOA Suite 11g? Absolutely!
Double the Performance of Oracle SOA Suite 11g? Absolutely!
 
Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0Live traffic capture and replay in cassandra 4.0
Live traffic capture and replay in cassandra 4.0
 
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?Extreme Availability using Oracle 12c Features: Your very last system shutdown?
Extreme Availability using Oracle 12c Features: Your very last system shutdown?
 
Upgrading to Oracle SOA Suite 11g While Maintaining 100% Uptime
Upgrading to Oracle SOA Suite 11g While Maintaining 100% UptimeUpgrading to Oracle SOA Suite 11g While Maintaining 100% Uptime
Upgrading to Oracle SOA Suite 11g While Maintaining 100% Uptime
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Application-level Disaster Recovery on OpenStack
Application-level Disaster Recovery on OpenStackApplication-level Disaster Recovery on OpenStack
Application-level Disaster Recovery on OpenStack
 
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
Cassandra Tools and Distributed Administration (Jeffrey Berger, Knewton) | C*...
 
VMUGIT UC 2013 - 04 Duncan Epping
VMUGIT UC 2013 - 04 Duncan EppingVMUGIT UC 2013 - 04 Duncan Epping
VMUGIT UC 2013 - 04 Duncan Epping
 

More from DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

More from DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Recently uploaded

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 

Recently uploaded (20)

GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 

DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Summit 2016

  • 1. Thom Valley Distributing the Enterprise, Safely
  • 2. 1 Distributing the Enterprise 2 The Fault Tolerance Fallacy 3 Testing (On the Spectrum) 4 Critical Factors 5 Example Test Scenarios 2© DataStax, All Rights Reserved.
  • 4. Distributing the Enterprise - Assumptions © DataStax, All Rights Reserved. 4 Can Install / Configure Have Appropriate / Adequate Hardware (?) Sound Data Model
  • 5. Distributing the Enterprise © DataStax, All Rights Reserved. 5 San Francisco New York Dubai Massively Scalable Operationally Simple Always On Geographically Distributed Fault Tolerant
  • 7. The Fault Tolerance Fallacy © DataStax, All Rights Reserved. 7 The network is reliable Latency is zero Bandwidth is infinite The network is secure Topology doesn't change There is one administrator Transport cost is zero --Sun Microsystem Fellows…
  • 8. The Fault Tolerance Fallacy © DataStax, All Rights Reserved. 8 My cluster has N copies of my data, I don’t need to backup… Cassandra / DSE is easy to scale, just add nodes… If a node fails, I will just replace it… If a DC fails, everything will just fail over… X and Y (big company) are using it, it must just work…
  • 9. The Fault Tolerance Fallacy © DataStax, All Rights Reserved. 9 IF YOU CAN’T RECOVER, YOU’RE NOT FAULT TOLERANT
  • 10. The Fault Tolerance Fallacy © DataStax, All Rights Reserved. 10 Very few organizations test for failure (at scale) Most configurations / circumstances are unique Anything that you haven’t proven is a potential failure point Almost everything works at a small scale The features designed to save you may cause your failure Understanding Mean Time to Recover (MTTR)
  • 11. The Fault Tolerance Fallacy © DataStax, All Rights Reserved. 11 Very few organizations test for failure (at scale) Most configurations / circumstances are unique Anything that you haven’t proven is a potential failure point Almost everything works at a small scale The features designed to save you may cause your failure Understanding Mean Time to Recover (MTTR)
  • 12. Testing (On the Spectrum)
  • 13. Testing (On the Spectrum) © DataStax, All Rights Reserved. 13 TEST? testing acumen
  • 14. Testing - Guidance © DataStax, All Rights Reserved. 14 Test at scale (nodes / data size / throughput) Test with a real data model / production compaction strategies Test with your full stack Don’t just test for speeds and feeds Test for extended time periods (days or weeks, not hours) Test failure scenarios Test recovery scenarios (understand MTTR)
  • 16. Critical Factors © DataStax, All Rights Reserved. 16 Node Performance / Density LAN Performance WAN Performance Bootstrap/Rebuild Window DC Rebuild Window Cluster Capacity Configuration* Hinted Handoff (?) Repair Rack / AZ Outages Rolling Restarts Upgrades
  • 17. Critical Factors - Configuration © DataStax, All Rights Reserved. 17 Compaction Strategy VNODES Heap Settings Throttles
  • 18. Critical Factors - Understanding © DataStax, All Rights Reserved. 18 Impact of workload on cluster / nodes Impact of failed state on cluster Impact of recovery process on cluster Mean time to recover - MTTR (node, data center)
  • 19. Critical Factors - Results © DataStax, All Rights Reserved. 19 Proper node / cluster sizing Configuration validation Operational readiness (understand MTTR): Single Node Full Rack / AZ Full DC
  • 21. General Guidelines © DataStax, All Rights Reserved. 21 Whenever possible, be running full workload simulation Run repair in background (where possible) Monitor application / cluster Test scenarios more than once
  • 22. Single Node Failure (within hint window) © DataStax, All Rights Reserved. 22 General Process • Get workload running / monitoring • Fail node within hint window (2 hour target) • Bring node back online
  • 23. Single Node Failure (within hint window) © DataStax, All Rights Reserved. 23 Things to Watch • Impact to workload during failure, replay, compaction • Load on cluster during same • Can you recover at node density / workload? • MTTR to get to normal state?
  • 24. Single Node Failure (within hint window) © DataStax, All Rights Reserved. 24 Other Considerations • Hint compaction (pre 3.0) • Multi-DC with distributed workload / hint replay • Impact of latency on Multi-DC • Impact of throttling (or lack) on hint replay • What’s a sustainable hint window w/ workload?
  • 25. Single Node Failure (beyond hint window) © DataStax, All Rights Reserved. 25 General Process • Get workload running / monitoring • Fail node for more than hint window • Bring node back online / rebuild node • (Optionally truncate hints if that’s your ops plan)
  • 26. Single Node Failure (beyond hint window) © DataStax, All Rights Reserved. 26 Things to Watch • Impact to workload during failure, replay, compaction • Impact to workload during node rebuild • Load on cluster during same • MTTR for rebuilding node with workload running, including compaction
  • 27. Single Node Failure (beyond hint window) © DataStax, All Rights Reserved. 27 Other Considerations • How does node density impact node rebuild? • How does compaction strategy impact node rebuild? • What is the impact of VNODES on streaming / compaction during rebuild?
  • 28. Rack / AZ Failure (within hint window) © DataStax, All Rights Reserved. 28 General Process Same as above (single node test) but for full AZ Considerations Can cluster sustain workload? At what utilization? Can cluster sustain hint buildup? What is a sustainable hint window with workload?
  • 29. Rack / AZ Failure (beyond hint window) © DataStax, All Rights Reserved. 29 General Process Same as above (single node test) but for full Rack / AZ Considerations Can cluster sustain workload? At what utilization? Can cluster sustain hint buildup? Truncate Hints? Can cluster sustain rebuilding full Rack / AZ
  • 30. Full DC Failure (within hint window) © DataStax, All Rights Reserved. 30 General Process Same as above (single node test) but for full DC Considerations Can cluster sustain workload? At what utilization? Can cluster sustain hint buildup? Impact of hint replay on remote DCs? WAN Utilization? What is a sustainable hint window with workload?
  • 31. Full DC Failure (beyond hint window) © DataStax, All Rights Reserved. 31 General Process Same as above (single node test) but for full DC Considerations Can cluster sustain workload? At what utilization? Can cluster sustain hint buildup? Truncate Hints? Can cluster sustain rebuilding full DC? Impact on WAN links during full DC rebuild?
  • 32. Long Running Workload © DataStax, All Rights Reserved. 32 Considerations Change in response times (read and/or write) Impact of delete workloads Impact of compaction settings Changing impact of repair operations
  • 33. Other Scenarios to Consider © DataStax, All Rights Reserved. 33 Flapping node Flapping network (single node) Flapping network (WAN) Over consuming WAN (Multi-DC)
  • 34. Strategies © DataStax, All Rights Reserved. 34 Pick scenarios that matter to you Test in advance at as close to scale as possible Repeat tests on major changes: Infrastructure Compaction Strategies Workload (read/write mix, volume, etc.)

Editor's Notes

  1. So let’s start off with some assumptions we are going to make. You have adequate hardware to meet your workload needs (or at least you think you do.) You are adept at installing and configuring Cassandra or DSE. You are able to and have designed a sound data model. There are lot’s of other presentations about data modeling…
  2. So why do we build distributed systems and especially use distributed databases like Cassandra? To get these specific capabilities. None of this should be a surprise to anyone. While these are fantastic reasons for using distributed systems, these capabilities don’t come without any cost. Understanding how distributed systems can fail is even more critical, because there are many connected parts that make up the powerful whole. If a component fails and you aren’t prepared for it, or don’t understand what failure looks like, or what recovery takes, you are no better of than with a single image system, in fact you may be in a worse place.
  3. While Distributed systems are much complex than single image systems
  4. Go through list… These are assumptions that I see people using Cassandra in the field make all the time The scariest part of this is, all of these are true (well, except for the first one) But they aren’t true in all circumstances, or even in many circumstances. Making blind assumptions about how something as complex as Cassandra is going to work is dangerous. What we are here to talk about today are the factors we need to be aware of in a testing plan and approaches you should be taking to ensure that Cassandra / DSE is going to provide you the benefits it promises.
  5. Go through list… These are assumptions that I see people using Cassandra in the field make all the time The scariest part of this is, all of these are true (well, except for the first one) But they aren’t true in all circumstances, or even in many circumstances. Making blind assumptions about how something as complex as Cassandra is going to work is dangerous. What we are here to talk about today are the factors we need to be aware of in a testing plan and approaches you should be taking to ensure that Cassandra / DSE is going to provide you the benefits it promises.
  6. Testing for a high write workload on STCS does not prepare you for a high read workload on LCS.
  7. Testing for a high write workload on STCS does not prepare you for a high read workload on LCS.
  8. While Distributed systems are much complex than single image systems
  9. While Distributed systems are much complex than single image systems
  10. While Distributed systems are much complex than single image systems
  11. While Distributed systems are much complex than single image systems
  12. This is the test that almost everyone runs (at least once)
  13. This is the test that almost everyone runs (at least once)
  14. This is the test that almost everyone runs (at least once)
  15. This is the test that almost everyone runs (at least once)
  16. This is the test that almost everyone runs (at least once)
  17. This is the test that almost everyone runs (at least once)
  18. While Distributed systems are much complex than single image systems
  19. While Distributed systems are much complex than single image systems
  20. While Distributed systems are much complex than single image systems
  21. While Distributed systems are much complex than single image systems
  22. While Distributed systems are much complex than single image systems
  23. While Distributed systems are much complex than single image systems
  24. While Distributed systems are much complex than single image systems