Seattle Cassandra Meetup - HasOffers

btoddb

HasOffers presentation on using Apache Cassandra in AWS

Apache Cassandra @ HasOffers

Tuesday, August 28, 2012

Topics
● Cassandra Configuration
● Amazon Web Services

Why Cassandra?
● High write throughput
● Low latency
● Multiple datacenter replication
● Fault tolerant
● Large online community
● Linear scalability

Keyspace Configuration
● One keyspace with two column families
● Multiple secondary indexes
● No super column families
● Counter columns
● Consistent column counts
● Large row and key cache
● Compression

Keyspace Configuration
● placement_strategy = 'NetworkTopologyStrategy'
● strategy_options = {eu-west : 1, us-west : 1, us-east : 1}
● Replication factor of 1 per datacenter (3 replicas in total)
● 9 nodes total
● 3 nodes in each datacenter
(Diagram: token ring with Node 1, Node 2 and Node 3; EC2 Snitch)
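To make the placement concrete, here is a minimal Python sketch under a simplified model. It assumes RandomPartitioner tokens (MD5 of the key, read as a signed integer and made non-negative), evenly spaced node tokens in each datacenter shifted by a small per-datacenter offset, and one replica per datacenter as in strategy_options above. The node names, offsets and helper functions are hypothetical, and rack awareness (availability zones under the EC2 snitch) is ignored.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0 .. 2**127


def random_partitioner_token(key: bytes) -> int:
    # MD5 digest read as a signed big-endian integer, then made non-negative.
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def build_ring(datacenters, nodes_per_dc):
    """Hypothetical layout: evenly spaced tokens per DC, shifted by the DC's index."""
    ring = {}
    for offset, dc in enumerate(datacenters):
        ring[dc] = sorted(
            (i * RING_SIZE // nodes_per_dc + offset, f"{dc}-node{i + 1}")
            for i in range(nodes_per_dc)
        )
    return ring


def replicas(key: bytes, ring, strategy_options):
    """Walk each DC's ring clockwise from the key's token (racks ignored)."""
    token = random_partitioner_token(key)
    placement = {}
    for dc, count in strategy_options.items():
        tokens = [t for t, _ in ring[dc]]
        # First node whose token >= key token owns the key; wrap past the end.
        start = bisect.bisect_left(tokens, token) % len(tokens)
        placement[dc] = [ring[dc][(start + i) % len(tokens)][1] for i in range(count)]
    return placement


strategy_options = {"eu-west": 1, "us-west": 1, "us-east": 1}
ring = build_ring(list(strategy_options), nodes_per_dc=3)
print(replicas(b"offer:12345", ring, strategy_options))
print(sum(strategy_options.values()), "replicas per key")
```

With one replica per datacenter, every key lives on exactly one node in each region, so each region holds a full copy of the data but loses access to a key's local copy if that single node is down.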

HasOffers Keyspace Statistics
● Number of keys (estimate): 318,453,248
● Key TTL: 90 days
● Approximately 13.8 million daily inserts
● Replication to 3 datacenters
● Approximately 3 million daily queries
● Compacted row mean size: 1,408 bytes
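The daily figures translate into modest average rates. A small Python sketch of the arithmetic, assuming load is spread evenly over 24 hours (real traffic peaks will be higher) and that the 13.8 million figure counts client inserts, each of which is written once in each of the 3 datacenters:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count: float) -> float:
    """Average rate, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


daily_inserts = 13_800_000
daily_queries = 3_000_000
datacenters = 3  # one replica per datacenter

print(f"inserts/s: {per_second(daily_inserts):.0f}")
print(f"queries/s: {per_second(daily_queries):.0f}")
print(f"replica writes/day: {daily_inserts * datacenters:,}")
```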

Daily Inserts
(Chart: daily inserts by month for USW, USE, EUW and ALL; y-axis: keys, 0 to 16,000,000; x-axis: month)

Cassandra Configuration
● Keep the commit log and data on separate disks
● Set initial_token on each node to prevent hotspots
● RandomPartitioner for good data distribution
● commitlog_sync: batch vs. periodic
● Batch mode won't acknowledge a write until the commit log has been synced to disk, so acknowledged mutations are not lost if the node crashes.
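With RandomPartitioner, evenly spaced initial_token values give each node an equal share of the 0 to 2**127 token range. A minimal Python sketch, assuming the common multi-datacenter convention of computing tokens independently for each datacenter and shifting each datacenter by a small offset (here its index) so no two nodes share a token; the datacenter order and offsets are illustrative, not taken from the slides:

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0 .. 2**127


def initial_tokens(nodes_per_dc: int, datacenters: list[str]) -> dict[str, list[int]]:
    """Evenly spaced tokens per datacenter, each DC shifted by its index."""
    return {
        dc: [i * RING_SIZE // nodes_per_dc + offset for i in range(nodes_per_dc)]
        for offset, dc in enumerate(datacenters)
    }


tokens = initial_tokens(3, ["us-east", "us-west", "eu-west"])
for dc, dc_tokens in tokens.items():
    for n, token in enumerate(dc_tokens, start=1):
        # Each value goes into that node's cassandra.yaml as initial_token.
        print(f"{dc} node {n}: initial_token: {token}")
```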

Cassandra Configuration
● max_hint_window_in_ms:
● Depends on replication factor and response time
● flush_largest_memtables_at: 0.95
● Depends on heap size
● rpc_timeout_in_ms: 18000
● Network latency and datacenter location are factors
● index_interval: 512
● Larger index interval can lower memory usage
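index_interval sets how many row keys lie between consecutive entries of the in-memory index sample (roughly one sample per index_interval keys), and flush_largest_memtables_at is a fraction of the maximum heap above which Cassandra flushes its largest memtables after a full garbage collection. A rough Python sketch of both calculations; the per-node key count and the 8 GiB heap are hypothetical inputs, and 128 is the stock index_interval in Cassandra 1.x:

```python
def index_samples(keys_per_node: int, index_interval: int) -> int:
    """Approximate sampled index entries kept in memory (ceiling division)."""
    return -(-keys_per_node // index_interval)


def emergency_flush_threshold(heap_bytes: int, fraction: float = 0.95) -> int:
    """Heap usage in bytes at which the largest memtables get flushed."""
    return int(heap_bytes * fraction)


keys_per_node = 100_000_000  # hypothetical per-node key count
for interval in (128, 512):
    print(interval, index_samples(keys_per_node, interval))

heap = 8 * 1024 ** 3  # hypothetical 8 GiB heap
print(emergency_flush_threshold(heap))
```

Raising the interval from 128 to 512 cuts the sample to a quarter of its size, at the cost of scanning more of the on-disk index per lookup.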

Cassandra Configuration
● Nodes networked behind a VPN
● Pycassa client library
● Resource-intensive multiple-clause queries
● Key-based queries
● Regional failover strategy controlled by a client script
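The slides do not show the failover script itself. As a hypothetical sketch of the idea, a client can try its local region first and fall back to the other regions in a fixed order. Here run_query is an assumed callable that raises ConnectionError when a region is unreachable; no Pycassa API is used.

```python
REGIONS = ["us-east", "us-west", "eu-west"]


def failover_order(local_region: str, regions=REGIONS) -> list[str]:
    """Local region first, then the remaining regions in configured order."""
    if local_region not in regions:
        raise ValueError(f"unknown region: {local_region}")
    return [local_region] + [r for r in regions if r != local_region]


def query_with_failover(run_query, local_region, regions=REGIONS):
    """Try each region in turn; run_query(region) is a hypothetical callable."""
    errors = {}
    for region in failover_order(local_region, regions):
        try:
            return region, run_query(region)
        except ConnectionError as exc:
            errors[region] = exc  # remember why this region failed
    raise RuntimeError(f"all regions failed: {sorted(errors)}")
```

Keeping the ordering in the client means each application server decides where to send traffic, which matches the slide's point that failover is driven by the client script rather than by Cassandra.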

Cassandra Configuration
● Pycassa client scripts
● Query with exception handling
● Retry
● Reconnect
● Fail
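The retry, reconnect, fail sequence can be sketched as a small wrapper. This is a hypothetical illustration, not the HasOffers script: connect() and query(conn) are assumed callables, TimeoutError stands in for a transient client error, and Pycassa's own exception classes and pool settings are not used.

```python
import time


class QueryFailed(Exception):
    """Raised once every retry and reconnect attempt has been used up."""


def query_with_retries(connect, query, retries=3, reconnects=1, delay=0.5):
    """Retry on the same connection, then reconnect and retry, then fail."""
    last_error = None
    for _ in range(reconnects + 1):
        conn = connect()  # a fresh connection for each round
        for attempt in range(retries):
            try:
                return query(conn)
            except TimeoutError as exc:
                last_error = exc
                time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise QueryFailed("query failed after retries and reconnects") from last_error
```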

Data Recovery
● nodetool repair
● Resource intensive
● Depends on data locality
● sstableloader
● Snapshots

AWS
● Instance Sizes
● Complex Cassandra queries require more memory
● Ephemeral vs. EBS vs. Provisioned IOPS (PIOPS) EBS
● Root partition: instance store vs. EBS

AWS Instances
● 14 different instance types
● Varying specifications and prices

High-Memory Quadruple Extra Large Instance

68.4 GB of memory

26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each)

1690 GB of instance storage

64-bit platform

I/O Performance: High

EBS-Optimized Available: 1000 Mbps

API name: m2.4xlarge

Disk Options
● EBS RAID
● Good performance
● Easy implementation
● Provisioned IOPS
● Additional cost
● Very good performance
● Optimized for use with only some instance types
● Ephemeral
● Good performance
● Data lost when the instance is stopped

EBS RAID Performance on AWS
(Benchmark chart; not recoverable from the extracted slides)

Provisioned IOPS for Amazon EBS
“Provisioned IOPS are a new EBS volume type designed to
deliver predictable, high performance for I/O intensive
workloads, such as database applications, that rely on
consistent and fast response times. With EBS Provisioned
IOPS, customers can flexibly specify both volume size and
volume performance, and Amazon EBS will consistently
deliver the desired performance over the lifetime of the
volume. Customers can then attach multiple volumes to an
Amazon EC2 instance and stripe across them to deliver
thousands of IOPS to their application.”
*EBS-Optimized instances deliver dedicated throughput
between Amazon EC2 and Amazon EBS, with options
between 500 Megabits per second and 1,000 Megabits per
second depending on the instance type used.
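A rough sizing check for striping Provisioned IOPS volumes behind an EBS-optimized link, in Python. It assumes RAID 0 aggregate IOPS is roughly the sum of the volumes and that the link is the 1,000 Mbps figure listed for m2.4xlarge; the four 1,000-IOPS volumes and the 16 KiB I/O size are hypothetical inputs.

```python
def stripe_iops(volume_iops):
    """RAID 0 across volumes: aggregate IOPS is roughly the sum of the volumes."""
    return sum(volume_iops)


def link_iops_ceiling(link_mbps: float, io_size_bytes: int) -> float:
    """IOPS the EBS-optimized link can carry at a given I/O size."""
    bytes_per_second = link_mbps * 1_000_000 / 8  # megabits -> bytes
    return bytes_per_second / io_size_bytes


volumes = [1000, 1000, 1000, 1000]  # hypothetical: four 1,000-IOPS volumes
ceiling = link_iops_ceiling(1000, 16 * 1024)  # 1,000 Mbps link, 16 KiB I/Os
print(stripe_iops(volumes), round(ceiling))
print(min(stripe_iops(volumes), ceiling))  # the lower of the two limits applies
```

Under these assumptions the stripe, not the link, is the bottleneck; adding volumes helps until aggregate IOPS approaches the link ceiling.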
