SlideShare a Scribd company logo
1 of 18
Download to read offline
Apache Cassandra @ HasOffers




Tuesday, August 28, 2012
Topics
●   Cassandra Configuration
●   Amazon Web Services
Why Cassandra?
●   High write throughput
●   Low latency
●   Multiple datacenter replication
●   Fault tolerant
●   Large online community
●   Linear scalability
Keyspace Configuration
●   One keyspace with two column families
●   Multiple secondary indexes
●   No super column families
●   Counter columns
●   Consistent column counts
●   Large row and key cache
●   Compression
Keyspace Configuration
●    placement_strategy =
    'NetworkTopologyStrategy'
●   strategy_options = {eu-west : 1, us-west : 1,
    us-east : 1}
●   Replication factor of 1
●   9 Nodes total
●   3 Nodes at each datacenter               Token Ring

    EC2 Snitch
                                    Node 1
●                                   Node 2
                                    Node 3
HasOffers Keyspace Statistics
●   Number of Keys (estimate): 318453248
●   Key ttl: 90 days
●   Approximately 13.8 Million daily inserts
●   Replication to 3 Datacenters
●   Approximately 3 Million daily queries
●   Compacted row mean size: 1408
Daily Inserts


       16000000


       14000000


       12000000


       10000000
                                  USW
                                  USE
                                  EUW
Keys




        8000000
                                  ALL


        6000000


        4000000


        2000000


             0

                  Month
Cassandra Configuration
●   Keep Commitlog and Data on separate disks
●   Set Initial Token to prevent hotspots
●   RandomPartitioner for good data distribution
●   commitlog_sync, batch vs. Periodic
    ●   Batch mode won't ack writes until log has been
        synced to disk to prevent dropped mutations.
Cassandra Configuration
●   max_hint_window_in_ms:
    ●   Depends on replication factor and response time
●   flush_largest_memtables_at: 0.95
    ●   Depends on heap size
●   rpc_timeout_in_ms: 18000
    ●   Network latency and datacenter location are factors
●   index_interval: 512
    ●   Larger index interval can lower memory usage
Cassandra Configuration
●   Nodes networked behind a VPN
●   Pycassa client library
●   Multiple clause resource intensive queries
●   Key based queries
●   Regional failover strategy controlled by client
    script
Cassandra Configuration
●   Pycassa client scripts
●   Query with exception handling
    ●   Retry
    ●   Reconnect
    ●   Fail
Data Recovery
●   nodetool repair
    ●   Resource intensive
    ●   Depends on data locality
●   sstable loader
●   snapshots
AWS
●   Instance Sizes
●   Complex cassandra queries require more
    memory
●   Ephemeral vs. EBS vs. PIO
●   Root Partition instance store/EBS
AWS Instances
●   14 different instance types
●   Varying specifications and prices

            High-Memory Quadruple Extra Large Instance

            68.4 GB of memory

            26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)

            1690 GB of instance storage

            64-bit platform

            I/O Performance: High

            EBS-Optimized Available: 1000 Mbps

            API name: m2.4xlarge
Disk Options
●   EBS RAID
    ●   Good performance
    ●   Easy implementation
●   Provisions IO's
    ●   Additional cost
    ●   Very Good performance
    ●   Optimized for use with only some instance types
●   Ephemeral
    ●   Good performance
    ●   Lost when instance is stopped
EBS Raid Performance on AWS
Provisioned IOPS for Amazon EBS
  “Provisioned IOPS are a new EBS volume type designed to
  deliver predictable, high performance for I/O intensive
  workloads, such as database applications, that rely on
  consistent and fast response times. With EBS Provisioned
  IOPS, customers can flexibly specify both volume size and
  volume performance, and Amazon EBS will consistently
  deliver the desired performance over the lifetime of the
  volume. Customers can then attach multiple volumes to an
  Amazon EC2 instance and stripe across them to deliver
  thousands of IOPS to their application.”
  *EBS-Optimized instances deliver dedicated throughput
  between Amazon EC2 and Amazon EBS, with options
  between 500 Megabits per second and 1,000 Megabits per
  second depending on the instance type used.
Questions?

More Related Content

What's hot

Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScyllaDB
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0ScyllaDB
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQHakka Labs
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016Tzach Livyatan
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScyllaDB
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesScyllaDB
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScyllaDB
 
Scylla’s Journey Towards Being an Elastic Cloud Native Database
Scylla’s Journey Towards Being an Elastic Cloud Native DatabaseScylla’s Journey Towards Being an Elastic Cloud Native Database
Scylla’s Journey Towards Being an Elastic Cloud Native DatabaseScyllaDB
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScyllaDB
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScyllaDB
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!ScyllaDB
 
Instaclustr Apache Cassandra Best Practices & Toubleshooting
Instaclustr Apache Cassandra Best Practices & ToubleshootingInstaclustr Apache Cassandra Best Practices & Toubleshooting
Instaclustr Apache Cassandra Best Practices & ToubleshootingInstaclustr
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and SeastarTzach Livyatan
 
Kafka Summit SF 2017 - Infrastructure for Streaming Applications
Kafka Summit SF 2017 - Infrastructure for Streaming Applications Kafka Summit SF 2017 - Infrastructure for Streaming Applications
Kafka Summit SF 2017 - Infrastructure for Streaming Applications confluent
 
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...DataStax
 
Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseScyllaDB
 

What's hot (19)

Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair UpdatesScylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
Scylla Summit 2018: Scylla Feature Talks - Scylla Streaming and Repair Updates
 
Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0Introducing Scylla Open Source 4.0
Introducing Scylla Open Source 4.0
 
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQDataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ
 
ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016ScyllaDB @ Apache BigData, may 2016
ScyllaDB @ Apache BigData, may 2016
 
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
Instaclustr Webinar 50,000 Transactions Per Second with Apache Spark on Apach...
 
Scylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent DatabasesScylla Summit 2018: Consensus in Eventually Consistent Databases
Scylla Summit 2018: Consensus in Eventually Consistent Databases
 
Measuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS InstancesMeasuring Database Performance on Bare Metal AWS Instances
Measuring Database Performance on Bare Metal AWS Instances
 
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDBScylla Summit 2022: New AWS Instances Perfect for ScyllaDB
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
 
Scylla’s Journey Towards Being an Elastic Cloud Native Database
Scylla’s Journey Towards Being an Elastic Cloud Native DatabaseScylla’s Journey Towards Being an Elastic Cloud Native Database
Scylla’s Journey Towards Being an Elastic Cloud Native Database
 
Scylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi KivityScylla Summit 2019 Keynote - Avi Kivity
Scylla Summit 2019 Keynote - Avi Kivity
 
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond CassandraScylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
Scylla Summit 2019 Keynote - Dor Laor - Beyond Cassandra
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
Scylla Summit 2018: Keeping Your Latency SLAs No Matter What!
 
Instaclustr Apache Cassandra Best Practices & Toubleshooting
Instaclustr Apache Cassandra Best Practices & ToubleshootingInstaclustr Apache Cassandra Best Practices & Toubleshooting
Instaclustr Apache Cassandra Best Practices & Toubleshooting
 
Back to the future with C++ and Seastar
Back to the future with C++ and SeastarBack to the future with C++ and Seastar
Back to the future with C++ and Seastar
 
Kafka Summit SF 2017 - Infrastructure for Streaming Applications
Kafka Summit SF 2017 - Infrastructure for Streaming Applications Kafka Summit SF 2017 - Infrastructure for Streaming Applications
Kafka Summit SF 2017 - Infrastructure for Streaming Applications
 
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
Terror & Hysteria: Cost Effective Scaling of Time Series Data with Cassandra ...
 
Critical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency DatabaseCritical Attributes for a High-Performance, Low-Latency Database
Critical Attributes for a High-Performance, Low-Latency Database
 

Similar to Seattle Cassandra Meetup - HasOffers

DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefGaurav "GP" Pal
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4Gaurav "GP" Pal
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...HostedbyConfluent
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Amazon Web Services
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloudOVHcloud
 
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon Web Services
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalVigyan Jain
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBScott Mansfield
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesAmazon Web Services
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engineBhuvaneshwaran R
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionSearce Inc
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentationEdward Capriolo
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAmazon Web Services
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance CachingScyllaDB
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataacelyc1112009
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSMongoDB
 

Similar to Seattle Cassandra Meetup - HasOffers (20)

DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and ChefDevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
DevOps for ETL processing at scale with MongoDB, Solr, AWS and Chef
 
stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4stackArmor presentation for DevOpsDC ver 4
stackArmor presentation for DevOpsDC ver 4
 
EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)EVCache & Moneta (GoSF)
EVCache & Moneta (GoSF)
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
Deep Dive on Amazon EC2 Instances - AWS Summit Cape Town 2017
 
Introduction on Amazon EC2
Introduction on Amazon EC2Introduction on Amazon EC2
Introduction on Amazon EC2
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best PracticesAmazon EC2 Instances, Featuring Performance Optimisation Best Practices
Amazon EC2 Instances, Featuring Performance Optimisation Best Practices
 
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-FinalSizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
Sizing MongoDB on AWS with Wired Tiger-Patrick and Vigyan-Final
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDBEVCache: Lowering Costs for a Low Latency Cache with RocksDB
EVCache: Lowering Costs for a Low Latency Cache with RocksDB
 
Getting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute ServicesGetting Started with Amazon EC2 and Compute Services
Getting Started with Amazon EC2 and Compute Services
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
Optimizing elastic search on google compute engine
Optimizing elastic search on google compute engineOptimizing elastic search on google compute engine
Optimizing elastic search on google compute engine
 
Running ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in ProductionRunning ElasticSearch on Google Compute Engine in Production
Running ElasticSearch on Google Compute Engine in Production
 
M6d cassandrapresentation
M6d cassandrapresentationM6d cassandrapresentation
M6d cassandrapresentation
 
AWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDSAWS Webcast - Cost and Performance Optimization in Amazon RDS
AWS Webcast - Cost and Performance Optimization in Amazon RDS
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
 
How does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsDataHow does Apache Pegasus (incubating) community develop at SensorsData
How does Apache Pegasus (incubating) community develop at SensorsData
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 

Seattle Cassandra Meetup - HasOffers

  • 1. Apache Cassandra @ HasOffers Tuesday, August 28, 2012
  • 2. Topics ● Cassandra Configuration ● Amazon Web Services
  • 3. Why Cassandra? ● High write throughput ● Low latency ● Multiple datacenter replication ● Fault tolerant ● Large online community ● Linear scalability
  • 4. Keyspace Configuration ● One keyspace with two column families ● Multiple secondary indexes ● No super column families ● Counter columns ● Consistent column counts ● Large row and key cache ● Compression
  • 5. Keyspace Configuration ● placement_strategy = 'NetworkTopologyStrategy' ● strategy_options = {eu-west : 1, us-west : 1, us-east : 1} ● Replication factor of 1 ● 9 Nodes total ● 3 Nodes at each datacenter Token Ring EC2 Snitch Node 1 ● Node 2 Node 3
  • 6. HasOffers Keyspace Statistics ● Number of Keys (estimate): 318453248 ● Key ttl: 90 days ● Approximately 13.8 Million daily inserts ● Replication to 3 Datacenters ● Approximately 3 Million daily queries ● Compacted row mean size: 1408
  • 7. Daily Inserts 16000000 14000000 12000000 10000000 USW USE EUW Keys 8000000 ALL 6000000 4000000 2000000 0 Month
  • 8. Cassandra Configuration ● Keep Commitlog and Data on separate disks ● Set Initial Token to prevent hotspots ● RandomPartitioner for good data distribution ● commitlog_sync, batch vs. Periodic ● Batch mode won't ack writes until log has been synced to disk to prevent dropped mutations.
  • 9. Cassandra Configuration ● max_hint_window_in_ms: ● Depends on replication factor and response time ● flush_largest_memtables_at: 0.95 ● Depends on heap size ● rpc_timeout_in_ms: 18000 ● Network latency and datacenter location are factors ● index_interval: 512 ● Larger index interval can lower memory usage
  • 10. Cassandra Configuration ● Nodes networked behind a VPN ● Pycassa client library ● Multiple clause resource intensive queries ● Key based queries ● Regional failover strategy controlled by client script
  • 11. Cassandra Configuration ● Pycassa client scripts ● Query with exception handling ● Retry ● Reconnect ● Fail
  • 12. Data Recovery ● nodetool repair ● Resource intensive ● Depends on data locality ● sstable loader ● snapshots
  • 13. AWS ● Instance Sizes ● Complex cassandra queries require more memory ● Ephemeral vs. EBS vs. PIO ● Root Partition instance store/EBS
  • 14. AWS Instances ● 14 different instance types ● Varying specifications and prices High-Memory Quadruple Extra Large Instance 68.4 GB of memory 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each) 1690 GB of instance storage 64-bit platform I/O Performance: High EBS-Optimized Available: 1000 Mbps API name: m2.4xlarge
  • 15. Disk Options ● EBS RAID ● Good performance ● Easy implementation ● Provisions IO's ● Additional cost ● Very Good performance ● Optimized for use with only some instance types ● Ephemeral ● Good performance ● Lost when instance is stopped
  • 17. Provisioned IOPS for Amazon EBS “Provisioned IOPS are a new EBS volume type designed to deliver predictable, high performance for I/O intensive workloads, such as database applications, that rely on consistent and fast response times. With EBS Provisioned IOPS, customers can flexibly specify both volume size and volume performance, and Amazon EBS will consistently deliver the desired performance over the lifetime of the volume. Customers can then attach multiple volumes to an Amazon EC2 instance and stripe across them to deliver thousands of IOPS to their application.” *EBS-Optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS, with options between 500 Megabits per second and 1,000 Megabits per second depending on the instance type used.