In this talk AWS’ Ken Krupa, Head of Specialized Solutions Architecture, will describe the architecture and capabilities of two new AWS EC2 instance types perfect for data-intensive storage and IO-heavy workloads like ScyllaDB: the Intel-based I4i and the Graviton2-based I4g series.
The Intel Xeon Ice Lake-based I4i series provides unparalleled raw horsepower for your most demanding workloads. Meanwhile, the Graviton2-powered I4g instances provide a lower cost per TB of storage on a power-efficient platform to deploy your cloud-native applications.
Ken will also describe the AWS Nitro SSD, a new form of high-speed NVMe storage with a Flash Translation Layer built with Nitro controllers, which powers both of these instance families.
ScyllaDB VP of Product Tzach Livyatan will then share benchmarking results showing how ScyllaDB behaves under load on these two instance types, providing maximum system utility and efficiency.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Scylla Summit 2022: New AWS Instances Perfect for ScyllaDB
1. New AWS Instances
Perfect for ScyllaDB
Ken Krupa, Head of Specialized SA, Compute, AWS
Tzach Livyatan, VP Product, ScyllaDB
2. Ken Krupa
■ Based out of NY Metro area
■ Past experience includes app dev, databases, financial services
■ Let’s just say > 25 years of experience
■ Joined AWS in August of 2020
Head of Specialized SA, Compute, AWS
3. Tzach Livyatan
■ Bsc, MSc in Computer Science
■ More than 10 years of Product Management at Scylla, Oracle, and others
VP Product, ScyllaDB
4. Agenda
■ State of ScyllaDB on AWS
■ Latest and greatest AWS Instances: Intel and Graviton2
powered I4 series
■ ScyllaDB on the new AWS instances
■ Summary and plans for the future
6. Deployment options
Install in Your Datacenter
➔ Scylla Open Source
➔ Scylla Enterprise
➔ AWS Outposts
Deploy at a Cloud Provider
➔ Scylla Open Source
➔ Scylla Enterprise
Database as a Service
➔ Fully managed Scylla
clusters
➔ Bring Your Own Acct
(BYOA) option
On-Prem | Cloud Hosted | Scylla Cloud
7. ScyllaDB & AWS Outposts Better Together
■ Amazon DynamoDB compatible API
■ Run on Outposts and in AWS Regions
8. Challenges
■ Needed to store sensitive healthcare data
on-prem to meet HIPAA/GDPR requirements
■ DynamoDB not available on-prem
■ Minimize turnaround time for ISO releases
Scylla Solution
■ Use Scylla Alternator for on-prem
deployment
■ No need to change application code
■ Frictionless extension of DynamoDB to
hybrid deployment
Edison AI Workbench | Cloud
Edison AI Workbench | On-Prem
On Edison Platform Hardware
Extending DynamoDB to
on-prem hybrid deployment
9. Preferred Instances
■ Requirements:
• High Speed Storage
• Disk / RAM Ratio
■ Preferred instance types:
I3 family
■ Performance: ~30,000
operations per core
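As a rough sizing sketch, the ~30,000 operations per core figure above can be turned into a core-count estimate. The function name, the headroom parameter, and the example workload are illustrative assumptions and not part of the talk; Scylla Calc remains the tool the slides recommend for real sizing.

```python
import math

# Approximate per-core throughput quoted on the slide.
OPS_PER_CORE = 30_000


def estimate_cores(target_ops_per_sec: float, headroom: float = 0.0) -> int:
    """Return a rough core count for a target throughput.

    headroom is a hypothetical safety margin (0.5 means 50% extra capacity).
    """
    if target_ops_per_sec < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    required = target_ops_per_sec * (1.0 + headroom) / OPS_PER_CORE
    # Always provision at least one core, and round up partial cores.
    return max(1, math.ceil(required))


if __name__ == "__main__":
    # Made-up workload: 600k ops/s, with and without 50% headroom.
    print(estimate_cores(600_000))
    print(estimate_cores(600_000, headroom=0.5))
```

Real workloads vary with payload size, read/write mix, and latency targets, so this should be read only as a first approximation.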
15. What to use? Follow this script
■ Use Scylla Calc to find #vCPU and Storage Size
■ Is Disk to vCPU ratio < 2.5?
• Use im4gn (if available);
• Use i3;
■ Else
• Use is4gen (if available);
• Use i3en;
21. Latest Results I3 vs I4 - one node
I3.16xlarge vs i4.16xlarge (64 vCPU servers)
50% Reads / 50% Writes
Latency tests with 50% of the max throughput
22. Latest Results I3 vs I4 - 3 node cluster
I3.16xlarge vs i4.16xlarge (64 vCPU servers)
50% Reads / 50% Writes
Latency tests with 50% of the max throughput
Big thanks to Michał
Chojnowski for benchmarking
all the new AWS instance
types!
23. What to use? Follow this script
■ Use Scylla Calc to find #vCPU and Storage Size
■ Is Disk to vCPU ratio < 2.5?
• Use I4i (if available);
• Use im4gn (if available);
• Use i3;
■ Else
• Use is4gen (if available);
• Use i3en;
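The script on this slide can be sketched as a small function. This is a minimal sketch, not an official ScyllaDB tool: the function name and the `available` set are hypothetical, and the ratio is taken as a plain number because the slide does not state its unit.

```python
from typing import Iterable


def pick_instance_family(disk_to_vcpu_ratio: float,
                         available: Iterable[str]) -> str:
    """Follow the slide's decision script.

    disk_to_vcpu_ratio: the Disk to vCPU ratio derived from Scylla Calc
        output (unit as used on the slide; not specified there).
    available: instance families offered in the target region (assumption:
        the caller looks this up elsewhere).
    """
    offered = set(available)
    if disk_to_vcpu_ratio < 2.5:
        # Compute-leaning clusters: prefer I4i, then im4gn, else i3.
        preference = ["i4i", "im4gn", "i3"]
    else:
        # Storage-leaning clusters: prefer is4gen, else i3en.
        preference = ["is4gen", "i3en"]
    for family in preference[:-1]:
        if family in offered:
            return family
    # The last entry is the unconditional fallback on the slide.
    return preference[-1]


if __name__ == "__main__":
    print(pick_instance_family(1.0, {"i4i", "im4gn"}))
    print(pick_instance_family(3.0, set()))
```

The last entry of each list is returned without an availability check, mirroring the slide, where i3 and i3en carry no "(if available)" qualifier.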
24. Summary
■ AWS is the most popular Scylla deployment
option
■ New I4 instances are great for ScyllaDB and
Scylla Cloud Customers
25. Thank you!
Stay in touch
Ken Krupa & Tzach Livyatan
@kenkrupa @tzachl
kenkrupa@amazon.com
tzach@scylladb.com
Editor's Notes
Scylla offers a wide variety of deployment options. You can run Scylla in your datacenter on your own hardware or on AWS Outposts; Scylla is the only certified NoSQL database that runs on Outposts.
Or you can deploy Scylla to any cloud provider. We are giving you freedom of choice.
Or take advantage of our fully managed DBaaS offering, even deploy Scylla Cloud within your own account if you have privacy concerns.
• The benefits of our hybrid model are not limited to developers and IT; a consistent model also benefits your finance and line-of-business teams, as well as your customers.
• Having a consistent hybrid experience is important so that you can easily scale processes and keep the same pace of innovation as you would in the cloud.
• AWS Outposts has the same reliable, secure, and high-performance infrastructure, which gives you the same operational consistency as you have within your AWS Cloud environment.
• Outposts also utilizes the same services, APIs, tools for automation, deployment pipelines, and security controls.
• Having a consistent hybrid experience like this provides you with the same pace of innovation as you have in the cloud.
Here’s a good example: GE Healthcare. They had already built their solution on DynamoDB, but due to compliance regulations they need to be able to run it on premises in a private cloud as well. Since DynamoDB is not available outside of AWS, they decided to use Scylla to run their workloads on premises, using our Project Alternator, which provides DynamoDB-compatible APIs. With that they were able to extend their existing system to a hybrid deployment. They didn’t have to change their application code, and the hybrid deployment extension was frictionless and performs very well.
Scylla Summit talk https://www.scylladb.com/presentations/enabling-precision-health-in-edison-ai/
What would be the best instance type to use on AWS?
One can run Scylla on *every* AWS instance type, including EBS-backed ones, but on some you won't get the best cost/performance ratio.
Scylla takes full advantage of the latest and greatest storage (e.g. SSD), scales linearly with the number of cores, and takes full advantage of the RAM.
Out of all the (many) AWS options, we found the i3 instance family to give the best cost/performance ratio, with i3en for cases where storage is the bottleneck, not CPU.
You can still run Scylla on all other instances.
Amazon EC2 Im4gn and Is4gen instances are next-generation, storage-optimized instances designed for running applications that require high throughput and low-latency access to large amounts of data on local SSD storage such as SQL databases (MySQL, MariaDB, PostgreSQL), NoSQL databases (Cassandra, ScyllaDB, MongoDB), search engines, analytics, streaming, and large distributed file systems. They are powered by AWS Graviton2 processors and provide up to 30 TB of storage with AWS Nitro SSDs. The AWS Nitro SSDs are AWS-designed SSDs that provide high I/O performance, low latency, minimal latency variability, and security with always-on encryption.
Im4gn instances provide the best price performance for storage-optimized workloads in Amazon EC2 and up to 100 Gbps of networking bandwidth. They are ideal for running applications such as MySQL databases, NoSQL databases, and file systems, which require dense local SSD storage and higher compute performance compared to I3 instances.
Is4gen instances provide the lowest cost per TB and highest density per vCPU of SSD storage in Amazon EC2. They are ideal for running applications such as stream processing and monitoring, real-time databases, log analytics, and distributed file systems.
Amazon’s relentless drive for innovation brings us new instance types optimal for ScyllaDB database workloads: the x86-based Amazon EC2 I4i series as well as a new Graviton2-based I4g series — specifically the Im4gn and Is4gen instances.
This new server family based on the Nitro System provides improved CPU density and chip performance and expanded memory over the existing I3 and I3en series. It also supports faster and more predictable I/O and a maximum NVMe storage level that sits nestled squarely between the other two.
The version of Scylla under test was scylla-4.5.rc7 with commit 7df9deb628bb31906c additionally cherry-picked from master (it’s a small patch which fixes a minor performance bug on ARM).
All tests were performed with shard-aware cassandra-stress on a cluster of size 3, with replication factor of 3 and consistency level QUORUM.
Source https://docs.google.com/spreadsheets/d/1Ik5Yp_p3KdnFRKuOk0VUxMsgtkbMl2_W6nlphzSOQ7o/edit#gid=0
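For readers less familiar with the test settings above, the sketch below shows how many replicas a QUORUM operation must hear from for a given replication factor. It uses the standard majority rule, floor(RF / 2) + 1; the function names are illustrative and not taken from the benchmark scripts.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level QUORUM."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # replication factor used in the benchmark
    print(quorum(rf), tolerated_replica_failures(rf))
```

With a replication factor of 3, each request in the benchmark therefore waits for 2 of the 3 replicas.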
With better performance/price and significantly greater storage density, im4gn is looking to be a solid upgrade over i3.
is4gen presents a tradeoff. Since it has the highest storage density of all instance types, it is now the optimal choice for space-bounded clusters, but in Scylla’s case it’s a worse choice than i3en for performance-bound clusters. (Though for performance-bound clusters the best choice is likely neither of the two, but im4gn!)
The formula is ((i4 throughput / i4 price) / (i3 throughput / i3 price)).
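The formula above can be written as a small helper. This is a sketch only: the function name is an assumption, and the numbers in the usage example are made-up placeholders, not benchmark results or AWS prices.

```python
def relative_price_performance(new_throughput: float, new_price: float,
                               old_throughput: float, old_price: float) -> float:
    """Return (new throughput / new price) / (old throughput / old price).

    A value above 1.0 means the new instance type delivers more
    throughput per unit of price than the old one.
    """
    if min(new_price, old_price, old_throughput) <= 0:
        raise ValueError("prices and baseline throughput must be positive")
    return (new_throughput / new_price) / (old_throughput / old_price)


if __name__ == "__main__":
    # Placeholder inputs: 3x the throughput at 2x the price.
    print(relative_price_performance(300.0, 2.0, 100.0, 1.0))
```

Throughput and price only need to use the same units on both sides, since the units cancel in the ratio.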
The I4i series sports an x86-based architecture centered on the Intel Ice Lake Xeon scalable processor, with 3.5 GHz speeds.
Compare that to the i3.metal’s Intel Xeon E5-2686 v4 Broadwell operating at 2.3 GHz with bursts up to 3 GHz and the I3en’s Xeon Skylake or Cascade Lake (capable of Turbo bursts of 3.1 GHz).
The top number of vCPUs is also increased compared with the i3 and i3en series. Whereas the i3.metal tops out at 72, and the i3en.metal sports 96, the i4i.32xlarge offers 128. That gives the high end of the I4i series 33% – 77% more vCPUs per instance.
To take advantage of a server like that, one *must* use Scylla.
No other DB scales linearly with the number of cores.
The same amount of RAM is associated with each vCPU across the I3en and I4i series — an 8:1 ratio. So for example, the base i4i.large has 2 vCPUs and 16 GiB of RAM. However, because of the higher CPU density of the largest instance type, you get higher maximum RAM in comparison to the i3.metal and i3en.metal: 33% – 77% more with the i4i.24xlarge.
The i3.metal supports 8 x 1.9 TB NVMe SSD drives for 15.2 TB total storage per instance, whereas the i3en.metal supports 8 x 7,500 GB drives for up to 60 TB storage total. The i4i.24xlarge has 8 x 3,750 GB AWS Nitro SSD drives, for 30 TB storage — comfortably nestling it between the other two instance types. It has twice the capacity of the i3.metal, and half the capacity of the i3en.metal.
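The vCPU and storage comparisons in these notes are simple ratios, and a short sketch makes them easy to re-check. The figures are copied from the notes above; the dictionary layout, the "i4i" key (standing for the largest I4i size as quoted), and the helper name are assumptions for illustration.

```python
# Top vCPU counts as quoted in the notes.
VCPUS = {"i3.metal": 72, "i3en.metal": 96, "i4i.32xlarge": 128}

# Total local NVMe storage in TB: drive count x drive size, as quoted.
STORAGE_TB = {
    "i3.metal": 8 * 1.9,    # 15.2 TB
    "i3en.metal": 8 * 7.5,  # 60 TB
    "i4i": 8 * 3.75,        # 30 TB (largest I4i size as quoted)
}


def percent_more(new: float, old: float) -> float:
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    for name in ("i3en.metal", "i3.metal"):
        gain = percent_more(VCPUS["i4i.32xlarge"], VCPUS[name])
        print(f"vCPUs vs {name}: {gain:.1f}% more")
    for name in ("i3.metal", "i3en.metal"):
        ratio = STORAGE_TB["i4i"] / STORAGE_TB[name]
        print(f"storage vs {name}: {ratio:.2f}x")
```

The vCPU gains come out at roughly 33% and 78% (quoted as 77% in the notes), and the storage ratios at roughly 2x and exactly 0.5x.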
Both of the I4g series instance types, like the I4i, sport AWS Nitro SSDs.
The full report will be shared soon.
I4 instance prices are not available yet.
Source
https://docs.google.com/spreadsheets/d/17dBFZERu5S6gZ8gBwS2rSEpmTvgEzv3xiOSiVHgoPuc/edit#gid=0 (COPY)
https://docs.google.com/spreadsheets/d/1JVvJsbcVwENGyCu-x16RWUXummHSUbuvXnLY4Z1xWTA/edit#gid=0 (ORIGINAL)
I4i.32xlarge will be 2x as good!