2. Topics
● Cassandra Configuration
● Amazon Web Services
3. Why Cassandra?
● High write throughput
● Low latency
● Multiple datacenter replication
● Fault tolerant
● Large online community
● Linear scalability
4. Keyspace Configuration
● One keyspace with two column families
● Multiple secondary indexes
● No super column families
● Counter columns
● Consistent column counts
● Large row and key cache
● Compression
5. Keyspace Configuration
● placement_strategy = 'NetworkTopologyStrategy'
● strategy_options = {eu-west : 1, us-west : 1, us-east : 1}
● Replication factor of 1 in each datacenter
● 9 Nodes total
● 3 Nodes at each datacenter
● EC2 Snitch
[Diagram: token ring with Node 1, Node 2, Node 3]
6. HasOffers Keyspace Statistics
● Number of Keys (estimate): 318453248
● Key TTL: 90 days
● Approximately 13.8 Million daily inserts
● Replication to 3 Datacenters
● Approximately 3 Million daily queries
● Compacted row mean size: 1408 bytes
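To put the daily figures above in per-second terms, here is a back-of-envelope sketch. It assumes load is spread evenly across the day, that every insert creates a new key, and that every key lives for its full 90-day TTL; none of these assumptions come from the slides, and the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(daily_count):
    """Average rate per second, assuming load is uniform over the day."""
    return daily_count / SECONDS_PER_DAY


def steady_state_keys(daily_inserts, ttl_days):
    """Upper bound on live keys if every insert is a new key kept for the full TTL."""
    return daily_inserts * ttl_days


if __name__ == "__main__":
    print(f"inserts/sec: {average_per_second(13_800_000):.1f}")
    print(f"queries/sec: {average_per_second(3_000_000):.1f}")
    print(f"live keys (upper bound): {steady_state_keys(13_800_000, 90):,}")
```

Real traffic is rarely uniform, so peak rates are likely to be well above these averages.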
7. Daily Inserts
[Chart: daily inserts (Keys) by Month for USW, USE, EUW, and ALL; y-axis from 0 to 16,000,000]
8. Cassandra Configuration
● Keep Commitlog and Data on separate disks
● Set Initial Token to prevent hotspots
● RandomPartitioner for good data distribution
● commitlog_sync: batch vs. periodic
● Batch mode won't ack writes until the commit log has been synced to disk, to prevent dropped mutations.
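The initial token advice above can be illustrated with the commonly used even-spacing calculation for RandomPartitioner, whose token space runs from 0 to 2**127. This is a minimal sketch assuming a single ring of `node_count` nodes; the function name is illustrative. Multi-datacenter clusters usually compute tokens per datacenter and offset them so that no two nodes share a token, which this sketch does not attempt.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space: 0 .. 2**127


def evenly_spaced_tokens(node_count):
    """Return one initial_token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Three nodes, matching the per-datacenter node count in the slides.
    for node, token in enumerate(evenly_spaced_tokens(3), start=1):
        print(f"Node {node}: initial_token = {token}")
```

Each printed value would go into that node's `initial_token` setting before the node first joins the ring.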
9. Cassandra Configuration
● max_hint_window_in_ms:
● Depends on replication factor and response time
● flush_largest_memtables_at: 0.95
● Depends on heap size
● rpc_timeout_in_ms: 18000
● Network latency and datacenter location are factors
● index_interval: 512
● Larger index interval can lower memory usage
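The memory effect of index_interval can be sketched with simple arithmetic: roughly one row key out of every index_interval keys is held in the in-memory index sample. The estimate below counts sampled entries only, not bytes. It uses the key estimate from the statistics slide, and the interval of 128 is included purely for comparison, not as a value taken from the slides.

```python
def sampled_index_entries(row_keys, index_interval):
    """Approximate number of row keys kept in the in-memory index sample."""
    if index_interval < 1:
        raise ValueError("index_interval must be at least 1")
    # Integer ceiling division: one sample per started interval.
    return -(-row_keys // index_interval)


if __name__ == "__main__":
    keys = 318_453_248  # "Number of Keys (estimate)" from the statistics slide
    for interval in (128, 512):
        print(f"index_interval={interval}: "
              f"~{sampled_index_entries(keys, interval):,} sampled keys")
```

The trade-off is that a larger interval means more index entries must be scanned on disk per lookup, so reads can become slower as memory use drops.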
12. Data Recovery
● nodetool repair
● Resource intensive
● Depends on data locality
● sstableloader
● snapshots
13. AWS
● Instance Sizes
● Complex Cassandra queries require more memory
● Ephemeral vs. EBS vs. Provisioned IOPS (PIOPS)
● Root partition: instance store vs. EBS
14. AWS Instances
● 14 different instance types
● Varying specifications and prices
High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
1690 GB of instance storage
64-bit platform
I/O Performance: High
EBS-Optimized Available: 1000 Mbps
API name: m2.4xlarge
15. Disk Options
● EBS RAID
● Good performance
● Easy implementation
● Provisioned IOPS
● Additional cost
● Very Good performance
● Optimized for use with only some instance types
● Ephemeral
● Good performance
● Lost when instance is stopped
17. Provisioned IOPS for Amazon EBS
“Provisioned IOPS are a new EBS volume type designed to
deliver predictable, high performance for I/O intensive
workloads, such as database applications, that rely on
consistent and fast response times. With EBS Provisioned
IOPS, customers can flexibly specify both volume size and
volume performance, and Amazon EBS will consistently
deliver the desired performance over the lifetime of the
volume. Customers can then attach multiple volumes to an
Amazon EC2 instance and stripe across them to deliver
thousands of IOPS to their application.”
*EBS-Optimized instances deliver dedicated throughput
between Amazon EC2 and Amazon EBS, with options
between 500 Megabits per second and 1,000 Megabits per
second depending on the instance type used.
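The striping and throughput figures quoted above lend themselves to a quick sizing calculation. The sketch below is illustrative only: the target of 4,000 IOPS and the 1,000 IOPS provisioned per volume are hypothetical inputs, not figures from the slides or from AWS limits. Only the 500 and 1,000 Megabit per second options come from the quoted text.

```python
def volumes_needed(target_iops, iops_per_volume):
    """Number of striped volumes required to reach target_iops (rounded up)."""
    if iops_per_volume < 1:
        raise ValueError("iops_per_volume must be at least 1")
    return -(-target_iops // iops_per_volume)  # integer ceiling division


def megabits_to_megabytes_per_second(mbps):
    """Convert a link speed in megabits/s to megabytes/s (8 bits per byte)."""
    return mbps / 8


if __name__ == "__main__":
    # Hypothetical sizing inputs, chosen only for illustration.
    print("volumes:", volumes_needed(target_iops=4000, iops_per_volume=1000))
    for link in (500, 1000):  # EBS-Optimized options from the quote above
        print(f"{link} Mbps = {megabits_to_megabytes_per_second(link)} MB/s")
```

The conversion shows that a 1,000 Mbps EBS-Optimized link tops out at 125 MB/s, so the dedicated link can become the limit before the striped volumes do.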