Seattle Cassandra Meetup - HasOffers


HasOffers presentation on using Apache Cassandra in AWS


  • 1. Apache Cassandra @ HasOffers
      Tuesday, August 28, 2012
  • 2. Topics
      ● Cassandra Configuration
      ● Amazon Web Services
  • 3. Why Cassandra?
      ● High write throughput
      ● Low latency
      ● Multiple datacenter replication
      ● Fault tolerant
      ● Large online community
      ● Linear scalability
  • 4. Keyspace Configuration
      ● One keyspace with two column families
      ● Multiple secondary indexes
      ● No super column families
      ● Counter columns
      ● Consistent column counts
      ● Large row and key cache
      ● Compression
  • 5. Keyspace Configuration
      ● placement_strategy = NetworkTopologyStrategy
      ● strategy_options = {eu-west : 1, us-west : 1, us-east : 1}
      ● Replication factor of 1
      ● 9 nodes total
      ● 3 nodes at each datacenter
      [Diagram: token ring with EC2 snitch, showing Node 1, Node 2, Node 3]
  • 6. HasOffers Keyspace Statistics
      ● Number of Keys (estimate): 318453248
      ● Key TTL: 90 days
      ● Approximately 13.8 million daily inserts
      ● Replication to 3 datacenters
      ● Approximately 3 million daily queries
      ● Compacted row mean size: 1408
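A quick back-of-the-envelope reading of the slide 6 figures, offered as a sketch. It assumes inserts are spread evenly over the day and that every insert creates a new key that lives for the full 90-day TTL. Neither assumption comes from the slides, and the variable names are illustrative.

```python
# Figures quoted on slide 6.
DAILY_INSERTS = 13_800_000      # approximate daily inserts
KEY_TTL_DAYS = 90               # key TTL
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: inserts arrive at a uniform rate (real traffic will have peaks).
avg_inserts_per_second = DAILY_INSERTS / SECONDS_PER_DAY

# Assumption: each insert is a new key that expires after exactly 90 days,
# so the number of live keys levels off at daily inserts * TTL.
steady_state_keys = DAILY_INSERTS * KEY_TTL_DAYS

print(f"average inserts/second: {avg_inserts_per_second:.1f}")
print(f"live keys at steady state: {steady_state_keys:,}")
```

The second figure is a cluster-wide upper bound under those assumptions. The key estimate on the slide may be measured differently (for example per node), so the two numbers are not expected to match.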
  • 7. Daily Inserts
      [Chart: keys inserted per day, by month; series USW, USE, EUW, ALL; y-axis from 0 to 16,000,000 keys]
  • 8. Cassandra Configuration
      ● Keep commit log and data on separate disks
      ● Set initial_token to prevent hotspots
      ● RandomPartitioner for good data distribution
      ● commitlog_sync: batch vs. periodic
          ● Batch mode won't ack writes until the log has been synced to disk, to prevent dropped mutations.
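For RandomPartitioner, whose token space runs from 0 to 2**127 - 1, evenly spaced initial tokens are commonly computed as i * 2**127 / N. A minimal sketch follows. The function name and the per-datacenter offset are illustrative assumptions: shifting each datacenter's ring by a small amount is a common convention with NetworkTopologyStrategy, not something stated in the slides.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space is [0, 2**127)


def initial_tokens(node_count, offset=0):
    """Return evenly spaced initial_token values for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [(i * RING_SIZE // node_count + offset) % RING_SIZE
            for i in range(node_count)]


# Hypothetical layout: 3 nodes in each of 3 datacenters. Each datacenter is
# balanced on its own and shifted by a distinct offset so no two nodes
# in the cluster share a token.
datacenters = ["us-east", "us-west", "eu-west"]
layout = {dc: initial_tokens(3, offset=n) for n, dc in enumerate(datacenters)}

for dc, tokens in layout.items():
    for token in tokens:
        print(dc, token)
```

Each printed value would go into the `initial_token` setting of one node before it first joins the ring.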
  • 9. Cassandra Configuration
      ● max_hint_window_in_ms:
          ● Depends on replication factor and response time
      ● flush_largest_memtables_at: 0.95
          ● Depends on heap size
      ● rpc_timeout_in_ms: 18000
          ● Network latency and datacenter location are factors
      ● index_interval: 512
          ● Larger index interval can lower memory usage
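To illustrate why a larger index_interval lowers memory usage: Cassandra keeps roughly one in-memory sample per index_interval row keys. The sketch below compares the slide's value of 512 with 128, the default in Cassandra 1.x. Using the slide 6 key estimate as the number of indexed keys is an assumption, and the function name is hypothetical.

```python
import math


def sampled_index_entries(key_count, index_interval):
    """Approximate number of row-index samples kept in memory."""
    return math.ceil(key_count / index_interval)


KEY_ESTIMATE = 318_453_248   # "Number of Keys (estimate)" from slide 6

# 128 is the Cassandra 1.x default; 512 is the value shown on the slide.
for interval in (128, 512):
    entries = sampled_index_entries(KEY_ESTIMATE, interval)
    print(f"index_interval={interval}: ~{entries:,} sampled entries")
```

Quadrupling the interval cuts the sample count to a quarter, at the cost of scanning more index entries on disk per lookup.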
  • 10. Cassandra Configuration
      ● Nodes networked behind a VPN
      ● Pycassa client library
      ● Multiple-clause, resource-intensive queries
      ● Key-based queries
      ● Regional failover strategy controlled by client script
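One way a client script could drive regional failover is to try regions in order of preference and use the first one that answers. This is a sketch of that idea only, not the HasOffers script: the function names, the region order, and the stand-in query function are all hypothetical, and no Cassandra client library is used.

```python
def query_with_failover(regions, run_query):
    """Try each region in order; return (region, result) from the first success.

    regions   -- region names in order of preference (hypothetical values)
    run_query -- callable taking a region name; raises on failure
    """
    errors = {}
    for region in regions:
        try:
            return region, run_query(region)
        except Exception as exc:  # a real client would catch its library's specific errors
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")


# Usage with a stand-in query function in which the preferred region is down.
def fake_query(region):
    if region == "us-west":
        raise ConnectionError("us-west unreachable")
    return f"row served from {region}"


region, result = query_with_failover(["us-west", "us-east", "eu-west"], fake_query)
print(region, result)
```

Because every datacenter holds a replica of the keyspace, a read can in principle be served from any region once replication has caught up.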
  • 11. Cassandra Configuration
      ● Pycassa client scripts
      ● Query with exception handling
          ● Retry
          ● Reconnect
          ● Fail
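The retry, reconnect, fail sequence can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the presenters' code: `query_with_retry`, its parameters, and the simulated query are hypothetical, and a real script would catch the specific exceptions raised by its client library instead of `Exception`.

```python
import time


def query_with_retry(run_query, reconnect, max_attempts=3, delay_seconds=0.0):
    """Run run_query(); on error, reconnect and retry; fail after max_attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                reconnect()                # rebuild the connection before retrying
                time.sleep(delay_seconds)  # optional pause between attempts
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error


# Usage: a simulated query that times out twice, then succeeds.
calls = {"queries": 0, "reconnects": 0}


def flaky_query():
    calls["queries"] += 1
    if calls["queries"] < 3:
        raise ConnectionError("simulated timeout")
    return "ok"


def fake_reconnect():
    calls["reconnects"] += 1


outcome = query_with_retry(flaky_query, fake_reconnect)
print(outcome, calls)
```

Bounding the number of attempts keeps a dead node from stalling the caller indefinitely. The final failure is raised with the last error attached so the calling script can decide what to do next.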
  • 12. Data Recovery
      ● nodetool repair
          ● Resource intensive
          ● Depends on data locality
      ● sstableloader
      ● Snapshots
  • 13. AWS
      ● Instance sizes
      ● Complex Cassandra queries require more memory
      ● Ephemeral vs. EBS vs. PIO
      ● Root partition: instance store or EBS
  • 14. AWS Instances
      ● 14 different instance types
      ● Varying specifications and prices
      High-Memory Quadruple Extra Large Instance
          68.4 GB of memory
          26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)
          1690 GB of instance storage
          64-bit platform
          I/O Performance: High
          EBS-Optimized Available: 1000 Mbps
          API name: m2.4xlarge
  • 15. Disk Options
      ● EBS RAID
          ● Good performance
          ● Easy implementation
      ● Provisioned IOPS
          ● Additional cost
          ● Very good performance
          ● Optimized for use with only some instance types
      ● Ephemeral
          ● Good performance
          ● Lost when instance is stopped
  • 16. EBS RAID Performance on AWS
  • 17. Provisioned IOPS for Amazon EBS
      “Provisioned IOPS are a new EBS volume type designed to deliver predictable, high performance for I/O intensive workloads, such as database applications, that rely on consistent and fast response times. With EBS Provisioned IOPS, customers can flexibly specify both volume size and volume performance, and Amazon EBS will consistently deliver the desired performance over the lifetime of the volume. Customers can then attach multiple volumes to an Amazon EC2 instance and stripe across them to deliver thousands of IOPS to their application.”
      *EBS-Optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS, with options between 500 Megabits per second and 1,000 Megabits per second depending on the instance type used.
  • 18. Questions?