Measuring Database Performance on Bare Metal AWS Instances

AWS recently announced a new instance type targeted at I/O-intensive applications, the i3.metal. That instance does away with the virtualization layer altogether and gives the resources that the hypervisor would otherwise use back to the application.

To use all of those resources (72 CPUs and 512GB of memory), a database needs the ability to scale both up and out.

In this webinar we will look at the performance of Scylla running on a few of those instances versus Apache Cassandra running in its sweet spot, a larger fleet of smaller instances. We will discuss how much of the gain comes from the database design and how much comes from the removal of the virtualization layer.

Key takeaways:

How to properly compare two different database technologies while being fair to both
How to choose the optimal setup for your Scylla deployment
How AWS’s bare metal servers give Scylla users a significant performance boost

Published in: Technology

  1. 1. Measuring Database Performance on Bare Metal AWS Instances (Webinar)
        Tomer Sandler, Solution Architect, ScyllaDB
        Glauber Costa, VP Field Engineering, ScyllaDB
  2. 2. + Next-generation NoSQL database
        + Drop-in replacement for Cassandra
        + 10X the performance & low tail latency
        + Open source and enterprise editions
        + Founded by the creators of KVM hypervisor
        + HQs: Palo Alto, CA; Herzelia, Israel
  3. 3. Join real-time big-data database developers and users from start-ups and leading enterprises from around the globe for two days of sharing ideas, hearing innovative use cases, and getting practical tips and tricks from your peers and NoSQL gurus.
  4. 4. Tomer Sandler is a Solution Architect at ScyllaDB. Tomer joined Scylla 18 months ago. Prior to Scylla, Tomer worked at EMC, mostly on software-defined storage.
        Glauber Costa is VP of Field Engineering at ScyllaDB. He splits his time between the engineering department, working on upcoming Scylla features, and helping customers succeed. Before ScyllaDB, Glauber worked on virtualization in the Linux kernel for 10 years.
  5. 5. + Good Benchmarking: Keep It Fair
          + Run each database on its optimal setup and hardware
          + Compare as apples-to-apples as possible
        + Think About the User Perspective
          + Volume, throughput, and latency need to meet the business needs
          + Gauge the resources each database needs to meet established requirements
  6. 6. + Scylla 2.2 vs Cassandra 3.11
        + AWS EC2
        + Seastar infrastructure, NUMA awareness, JVM
        + Scylla: i3.metal (4 nodes) vs Cassandra: i3.4xlarge (40 nodes)
        + cassandra-stress
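  7. 7. Clusters:
        |                           | Scylla Cluster                  | Cassandra Cluster                                |
        |---------------------------|---------------------------------|--------------------------------------------------|
        | EC2 instance type         | i3.metal (72 vCPU, 512 GiB RAM) | i3.4xlarge (16 vCPU, 122 GiB RAM)                |
        | Storage (ephemeral disks) | 8 NVMe drives, each 1900GB      | 2 NVMe drives, each 1900GB                       |
        | Network                   | 25Gbps                          | Up to 10Gbps                                     |
        | Cluster size              | 4-node cluster on single DC     | 40-node cluster on single DC                     |
        | Total CPU and RAM         | CPU count: 288, RAM size: 2TB   | CPU count: 640, RAM size: ~4.76TB                |
        | DB SW version             | Scylla 2.2                      | Cassandra 3.11.2 (OpenJDK build 1.8.0_171-b10)   |
        Loaders (c-s = cassandra-stress):
        |               | Scylla Loaders                                                          | Cassandra Loaders                                                        |
        |---------------|-------------------------------------------------------------------------|--------------------------------------------------------------------------|
        | Population    | 4 x m4.2xlarge (8 vCPU, 32 GiB RAM); 8 c-s clients, 2 per instance      | 16 x m4.2xlarge (8 vCPU, 32 GiB RAM); 16 c-s clients, 1 per instance     |
        | Latency tests | 7 x i3.8xlarge (up to 10Gb network); 14 c-s clients, 2 per instance     | 8 x i3.8xlarge (up to 10Gb network); 16 c-s clients, 2 per instance      |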
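The "Total CPU and RAM" row follows from the per-node figures and the cluster sizes. A minimal sketch of that arithmetic, assuming the RAM totals are obtained by dividing GiB by 1024 (the names `CLUSTERS` and `cluster_totals` are illustrative, not from the slides):

```python
# Per-node specs as listed on the slide (vCPUs, RAM in GiB).
CLUSTERS = {
    "scylla": {"nodes": 4, "vcpus": 72, "ram_gib": 512},
    "cassandra": {"nodes": 40, "vcpus": 16, "ram_gib": 122},
}


def cluster_totals(spec):
    """Return (total vCPUs, total RAM in TiB) for one cluster."""
    total_vcpus = spec["nodes"] * spec["vcpus"]
    # Assumption: 1 TiB = 1024 GiB.
    total_ram_tib = spec["nodes"] * spec["ram_gib"] / 1024
    return total_vcpus, total_ram_tib


for name, spec in CLUSTERS.items():
    vcpus, ram_tib = cluster_totals(spec)
    print(f"{name}: {vcpus} vCPUs, {ram_tib:.2f} TiB RAM")
```

Under that assumption the Cassandra fleet has roughly 2.2 times the vCPUs and 2.4 times the RAM of the Scylla cluster.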
  8. 8. + 38.85 billion partitions (~11TB)
        + Replication factor (RF) = 3
        + 50:50 read/write ratio
        + Latency: up to 10 milliseconds for the 99th percentile
        + Throughput requirements: 300K, 200K, and 100K IOPS
        + Gaussian distribution (range: 38.85B, median: 19.425B, standard deviation: 6.475B)
        + Each test: 90 min
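The Gaussian parameters on this slide are consistent with a distribution centered on the key range whose endpoints sit three standard deviations from the center. The sketch below checks that reading and derives the implied partition size; the "+/- 3 sigma" interpretation and the decimal-terabyte reading of "~11TB" are assumptions, not statements from the slides.

```python
PARTITIONS = 38.85e9      # total partitions, from the slide
DATASET_BYTES = 11e12     # "~11TB", read here as decimal terabytes (assumption)
REPLICATION_FACTOR = 3    # RF = 3, from the slide


def gaussian_params(n):
    """Center a Gaussian on the range 1..n so the range spans +/- 3 sigma."""
    center = n / 2
    stdev = n / 6
    return center, stdev


center, stdev = gaussian_params(PARTITIONS)
bytes_per_partition = DATASET_BYTES / PARTITIONS
replicated_tb = DATASET_BYTES * REPLICATION_FACTOR / 1e12

print(f"center: {center / 1e9:.3f}B, stdev: {stdev / 1e9:.3f}B")
print(f"~{bytes_per_partition:.0f} bytes per partition")
print(f"~{replicated_tb:.0f} TB on disk after replication, before compression and overhead")
```

A Gaussian key distribution concentrates requests near the middle of the range, so part of the data set is hot and the rest is touched rarely.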
  9. 9. jvm.options:
          -Xms48G
          -Xmx48G
          -XX:+UseG1GC
          -XX:G1RSetUpdatingPauseTimePercent=5
          -XX:MaxGCPauseMillis=500
          -XX:InitiatingHeapOccupancyPercent=70
          -XX:ParallelGCThreads=16
          -XX:PrintFLSStatistics=1
          -Xloggc:/var/log/cassandra/gc.log
          #-XX:+CMSClassUnloadingEnabled
          #-XX:+UseParNewGC
          #-XX:+UseConcMarkSweepGC
          #-XX:+CMSParallelRemarkEnabled
          #-XX:SurvivorRatio=8
          #-XX:MaxTenuringThreshold=1
          #-XX:CMSInitiatingOccupancyFraction=75
          #-XX:+UseCMSInitiatingOccupancyOnly
          #-XX:CMSWaitDuration=10000
          #-XX:+CMSParallelInitialMarkEnabled
          #-XX:+CMSEdenChunksRecordAlways
        cassandra.yaml:
          buffer_pool_use_heap_if_exhausted: true
          disk_optimization_strategy: ssd
          row_cache_size_in_mb: 10240
          concurrent_compactors: 16
          compaction_throughput_mb_per_sec: 960
        IO tuning:
          echo 1 > /sys/block/md0/queue/nomerges
          echo 8 > /sys/block/md0/queue/read_ahead_kb
          echo deadline > /sys/block/md0/queue/scheduler
  10. 10. [Diagram: Query, Commitlog, and Compaction each feed their own queue; the three queues feed the Userspace I/O Scheduler, which feeds the Disk.]
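  11. 11. |                                    | Scylla          | Cassandra        |
          |------------------------------------|-----------------|------------------|
          | Total storage used                 | ~32.5 TB        | ~27 TB           |
          | nodetool status server load (avg.) | ~8.12 TB / node | ~690.9 GB / node |
          | /dev/md0 (avg.)                    | ~8.18 TB / node | ~692 GB / node   |
          | Data size / RAM ratio              | ~16.25 : 1      | ~5½ : 1          |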
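The data-to-RAM ratios can be reproduced from the total storage used here and the cluster RAM totals given on the setup slide (2TB and ~4.76TB). A small sketch, with `data_to_ram_ratio` as an illustrative helper name:

```python
def data_to_ram_ratio(storage_tb, ram_tb):
    """How many terabytes of stored data each terabyte of cluster RAM backs."""
    return storage_tb / ram_tb


# Total storage used (this slide) and total cluster RAM (setup slide).
scylla_ratio = data_to_ram_ratio(32.5, 2.0)
cassandra_ratio = data_to_ram_ratio(27.0, 4.76)

print(f"Scylla:    {scylla_ratio:.2f} : 1")
print(f"Cassandra: {cassandra_ratio:.2f} : 1")
```

A higher ratio means a smaller share of the data set can be served from memory, so the Scylla nodes in this test lean much harder on their NVMe drives.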
  12. 12. Scylla 2.2: estimated cost for a one-year term: ~$112K
          ● 4 x i3.metal cost: $112,100 (1-year contract, all upfront payment)
          ● 99th percentile latency: up to 11X lower
          ● 99.9th percentile latency: up to 45X lower
          Cassandra 3.11: estimated cost for a one-year term: ~$278.6K
          ● 40 x i3.4xlarge cost: $278,560 (1-year contract, all upfront payment)
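To put the two yearly totals side by side, the sketch below derives the per-instance price and the overall cost ratio from the figures on this slide. Only the two totals and the node counts come from the slide; the variable names are illustrative.

```python
SCYLLA_COST_USD = 112_100      # 4 x i3.metal, 1-year, all upfront
CASSANDRA_COST_USD = 278_560   # 40 x i3.4xlarge, 1-year, all upfront

scylla_per_node = SCYLLA_COST_USD / 4
cassandra_per_node = CASSANDRA_COST_USD / 40
cost_ratio = CASSANDRA_COST_USD / SCYLLA_COST_USD
savings = 1 - SCYLLA_COST_USD / CASSANDRA_COST_USD

print(f"i3.metal:   ${scylla_per_node:,.0f} per node per year")
print(f"i3.4xlarge: ${cassandra_per_node:,.0f} per node per year")
print(f"Cassandra fleet costs {cost_ratio:.2f}x the Scylla cluster "
      f"({savings:.0%} saved with Scylla)")
```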
  13. 13. 13 I like!
  16. 16. 12% more resources, same price.
  17. 17. |                       | i3.16xlarge | i3.metal    | Diff |
          |-----------------------|-------------|-------------|------|
          | Sequential 1MB Writes | 6,231 MB/s  | 6,228 MB/s  | +0%  |
          | Sequential 1MB Reads  | 15,732 MB/s | 15,767 MB/s | +0%  |
          | Random 4kB Writes     | 1.45 M IOPS | 1.44 M IOPS | +0%  |
          | Random 4kB Reads      | 2.82 M IOPS | 3.08 M IOPS | +9%  |
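The "Diff" column can be recomputed as the relative change from i3.16xlarge to i3.metal. A short sketch using the values above (`RESULTS` and `percent_diff` are illustrative names):

```python
# (workload, i3.16xlarge result, i3.metal result), units as on the slide.
RESULTS = [
    ("Sequential 1MB writes (MB/s)", 6231, 6228),
    ("Sequential 1MB reads (MB/s)", 15732, 15767),
    ("Random 4kB writes (M IOPS)", 1.45, 1.44),
    ("Random 4kB reads (M IOPS)", 2.82, 3.08),
]


def percent_diff(baseline, candidate):
    """Relative change of candidate versus baseline, in percent."""
    return (candidate - baseline) / baseline * 100


for label, virtualized, bare_metal in RESULTS:
    print(f"{label}: {percent_diff(virtualized, bare_metal):+.1f}%")
```

The first three differences stay under 1% in either direction, which the slide reports as +0%; only random 4kB reads show a clear gain.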
  19. 19. + Virtualization overhead is expected to be lower with the KVM-based Nitro Hypervisor
          + For now, cheaper interrupts mean faster I/O for IOPS-based workloads
          + The same logic applies to networking
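  23. 23. |             | Average latency | 95th latency | 99th latency | 99.9th latency |
          |-------------|-----------------|--------------|--------------|----------------|
          | i3.16xlarge | 3.7 ms          | 6.0 ms       | 9.8 ms       | 37.3 ms        |
          | i3.metal    | 0.9 ms          | 1.1 ms       | 2.4 ms       | 4.6 ms         |
          | Better by   | 4x              | 5x           | 4x           | 8x             |
          Throughput gains: 31% (more than what the 12% added CPUs would grant)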
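The "better by" row and the throughput remark can be checked with a few lines of arithmetic. The sketch assumes 64 vCPUs for i3.16xlarge, a figure that does not appear on the slides; 72 vCPUs for i3.metal and the 31% throughput gain do.

```python
# Latency in ms: (i3.16xlarge, i3.metal), from the slide.
LATENCY_MS = {
    "average": (3.7, 0.9),
    "95th": (6.0, 1.1),
    "99th": (9.8, 2.4),
    "99.9th": (37.3, 4.6),
}

speedups = {name: virt / metal for name, (virt, metal) in LATENCY_MS.items()}

VCPUS_16XLARGE = 64      # assumption: not stated on the slides
VCPUS_METAL = 72         # from the setup slide
THROUGHPUT_GAIN = 0.31   # from the slide

cpu_gain = VCPUS_METAL / VCPUS_16XLARGE - 1
# Throughput gain left over after accounting for the extra vCPUs.
per_vcpu_gain = (1 + THROUGHPUT_GAIN) / (1 + cpu_gain) - 1

for name, factor in speedups.items():
    print(f"{name} latency: {factor:.1f}x lower on i3.metal")
print(f"extra vCPUs: {cpu_gain:.1%}, throughput per vCPU: {per_vcpu_gain:+.1%}")
```

Under these assumptions, roughly 16% of the throughput gain remains after normalizing for CPU count, which is the portion the slide attributes to something other than the added CPUs.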
  24. 24. SCYLLA SUMMIT 2018: Join us for 2 days of technical sessions on Scylla, NoSQL & adjacent technologies. Register at: scylladb.com/scylla-summit-2018
  25. 25. United States Israel www.scylladb.com @scylladb
