Eyal Gutkind, VP of Solutions
Maheedhar Gunturu, Solutions Architect
Sizing your Scylla Cluster
A walk through the process
Presenters
Maheedhar Gunturu, Solutions Architect
Prior to ScyllaDB, Maheedhar held senior roles both in engineering and sales
organizations. He has over a decade of experience designing & developing server-side
applications in the cloud and working on big data and ETL frameworks in companies
such as Samsung, MapR, Apple, VoltDB, Zscaler and Qualcomm. He holds a master's in
Electrical and Computer Engineering from the University of Texas at San Antonio.
Eyal Gutkind, VP of Solutions
Eyal Gutkind is VP of Solutions at ScyllaDB. Prior to joining ScyllaDB, Eyal held product
management roles at Mirantis and DataStax, and spent 12 years with Mellanox
Technologies in various engineering management and product marketing roles. Eyal
holds a BSc. degree in Electrical and Computer Engineering from Ben Gurion
University in Israel and an MBA from Fuqua School of Business at Duke University.
Agenda
+ Understand your workload
+ The machines
+ Let’s build a system
+ How it will look at different IaaS
+ Our sizing process
Thinking about database cluster sizing?!
Make sure you have all requirements set
■ Business
■ Application
■ Infrastructure
■ Resiliency
■ Developer and Operator Friendliness
The obvious...
■ Data volume ingested per second/hour/day/year
■ Data retention (attrition) policy
■ Data format: text, binary blob
■ Required replication factor
■ What’s in your storage system?
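These inputs translate directly into a first storage estimate. A minimal sketch (the ingest rate, retention period, and replication factor below are hypothetical placeholders, and compaction/compression overheads are ignored):

```python
def raw_storage_bytes(ingest_bytes_per_sec, retention_days, replication_factor):
    """Estimate on-disk data volume from ingest rate, retention, and RF."""
    retention_seconds = retention_days * 24 * 3600
    return ingest_bytes_per_sec * retention_seconds * replication_factor

# Hypothetical workload: 1 MB/s ingest, 90-day retention, RF=3
total = raw_storage_bytes(1_000_000, 90, 3)
print(f"{total / 1e12:.2f} TB")  # ~23.33 TB
```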
The shape of your workload
https://www.scylladb.com/2019/05/23/workload-prioritization-running-oltp-and-olap-traffic-on-the-same-superhighway/
Impact of data model
■ Materialized View, Secondary Index
● Increased CPU and I/O
■ Partition size and per-operation payload
■ Tables and keyspaces
Impact of auxiliary systems
■ Consider the workload from
● Spark
● Kafka/Nifi
● KairosDB
● JanusGraph
The Machines
Someone needs to do the job!
Memory and data volume
■ Keep the ratio of disk space to memory reasonable
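As a rough illustration, assuming a guideline of roughly 30:1 disk-to-RAM for NVMe-backed nodes (a commonly cited ballpark; verify against current ScyllaDB guidance), you can sanity-check a node shape like this (the i3.4xlarge figures are assumed specs):

```python
def disk_to_ram_ratio(disk_gb, ram_gb):
    """Ratio of local disk capacity to RAM for one node."""
    return disk_gb / ram_gb

# i3.4xlarge-class node: ~3,800 GB NVMe, ~122 GB RAM (assumed specs)
ratio = disk_to_ram_ratio(3800, 122)
print(f"{ratio:.0f}:1")  # ~31:1, close to the ~30:1 ballpark
```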
Connectivity
■ Enable enough bandwidth for client-server and server-server communication
■ Geo-replication benefits
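A back-of-the-envelope bandwidth estimate, using the workload figures from the worked example in this deck (150,000 ops/s, 3KB records, 30% writes, RF=3) and ignoring protocol overhead:

```python
def client_bandwidth_mb_s(ops_per_sec, payload_bytes):
    """Client <-> server traffic, payload only (protocol overhead ignored)."""
    return ops_per_sec * payload_bytes / 1e6

def replication_bandwidth_mb_s(write_ops_per_sec, payload_bytes, rf):
    """Server <-> server traffic: each write is forwarded to RF-1 replicas."""
    return write_ops_per_sec * payload_bytes * (rf - 1) / 1e6

# 150,000 ops/s total, 3 KB records, 30% writes, RF=3
client = client_bandwidth_mb_s(150_000, 3_000)          # 450.0 MB/s
replica = replication_bandwidth_mb_s(45_000, 3_000, 3)  # 270.0 MB/s
print(client, replica)
```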
Let’s build a system
Business requirements:
■ 1st year customers: 30M profiles
■ Number of records per profile: 12
■ Avg. size of each record: 3KB
■ Data types: text
Application requirements:
■ 99% write response time: 15ms
■ 99% read response time: 10ms
■ Peak throughput: 150,000 operations/sec
■ Read:Write ratio: 70:30
Infrastructure:
■ AWS
■ Available instance types: all
■ Multi-DC: Oregon and N. Virginia
■ OS: CentOS 7.6
■ Replication Factor: 3
Auxiliary applications:
■ Spark
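The business requirements above pin down the first-year raw data volume:

```python
profiles = 30_000_000        # 1st-year customers
records_per_profile = 12
record_bytes = 3_000         # 3 KB average record

raw_bytes = profiles * records_per_profile * record_bytes
print(f"{raw_bytes / 1e12:.2f} TB")  # ~1.08 TB of raw (unreplicated) data
```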
How it looks at different IaaS
Let’s build a system
■ End-of-year total raw data size: 2TB, starting at ~1TB
■ Typical record size read/written by the application: 3KB
■ Data model: 20-30 tables, up to 20 columns per row, 10-50 rows per
partition, mainly text
■ Latency requirements: 15ms write, 10ms read, at the 99th percentile
■ Read:Write ratio: 70:30
■ Throughput: 150,000 database op/s
■ IaaS: AWS, multi-region, multi-availability-zone, N. Virginia and
Oregon
■ Replication Factor: 3
■ Spark for analytics
Let’s build a system- recap
■ ~10,000 operations per second per physical core
■ We use STCS, so plan for 2x the disk space of the replicated data
■ Weigh latency requirements against the amount of RAM needed to raise the cache hit ratio
■ Make sure the cluster uses a reasonable number of tables/keyspaces
■ Any usage of Materialized Views, secondary indexes, or auxiliary systems?
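Putting the recap's rules of thumb together for the worked example (150,000 ops/s peak, 2TB end-of-year raw data, RF=3):

```python
OPS_PER_PHYSICAL_CORE = 10_000   # rule of thumb from the recap
STCS_HEADROOM = 2                # STCS can need up to 2x the replicated data

peak_ops = 150_000
raw_tb = 2.0                     # end-of-year raw data size
rf = 3

cores_needed = peak_ops / OPS_PER_PHYSICAL_CORE   # 15 physical cores (30 threads)
disk_tb_needed = raw_tb * rf * STCS_HEADROOM      # 12 TB per data center
print(cores_needed, disk_tb_needed)               # 15.0 12.0
```

These two numbers are exactly the "Requires 30 threads, 15 physical cores" and "Needed disk space: 12TB" figures repeated on the per-IaaS slides that follow.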
Let’s build a system, Amazon Web Services
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 3 x i3.4xlarge → Total disk: 11.5TB
and
● 3 x i3.2xlarge for the Spark cluster
● 1 x i3.2xlarge for Scylla Monitoring and Scylla Manager
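A quick check of the two AWS options against the 12TB-per-DC target. The per-node NVMe figures below are assumptions based on AWS's published specs for these instance types; verify them against current AWS documentation:

```python
# (node count, per-node NVMe capacity in TB) - assumed AWS specs
options = {
    "3 x i3.4xlarge":   (3, 3.8),   # 2 x 1.9 TB NVMe per node
    "3 x i3en.3xlarge": (3, 7.5),   # 2 x 3.75 TB NVMe per node
}

needed_tb = 12
for name, (nodes, per_node_tb) in options.items():
    total_tb = nodes * per_node_tb
    print(f"{name}: {total_tb:.1f} TB total (target: {needed_tb} TB)")
```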
Let’s build a system with i3en, AWS
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 3 x i3en.3xlarge → Total disk: 22.5TB!
and
● 3 x i3.2xlarge for the Spark cluster
● 1 x i3.2xlarge for Scylla Monitoring and Scylla Manager
Let’s build a system, Azure
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 3 x standard L16 v2 → Total disk: 11.5TB
and
● 3 x standard L8 v2 for the Spark cluster
● 1 x standard L8 v2 for Scylla Monitoring and Scylla Manager
Let’s build a system, Google Cloud
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 6 x n1-standard-16 + 5x NVMe based
direct attached, 375GB drives
and
● 3 x n1-standard-8 for the Spark cluster
● 1 x n1-standard-8 for Scylla Monitoring and Scylla Manager
Let’s build a system, Scylla Cloud
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 3 x i3.4xlarge → Total disk: 11.5TB
and
● 3 x i3.2xlarge for the Spark cluster
Let’s build a system, on premise
■ Per Data Center
● Needed disk space: 12TB
● Media type: NVMe drives to meet latency SLA
● Requires 30 threads, 15 physical cores
■ Per Data center instance options
● 3 machines with 8 physical cores each and at least 4TB of SSD direct attached drives
● 3 machines with 4 or more physical cores for Spark
● 1 x machine for Scylla Monitoring and Scylla Manager
● Scylla nodes and Spark nodes: 128GB RAM
● Network: 10GbE
Our sizing process
Scylla’s sizing sheet
Summary
■ Do not think only about storage!
■ Gather application and business requirements
● Throughput and SLAs
● Growth expectations
● Security and compliance needs
■ Select the right infrastructure
■ Think about resiliency and high availability
Ask us questions!
Thank you! Stay in touch
Any questions?
Eyal Gutkind
eyal@scylladb.com
@gutkinde
Maheedhar Gunturu
maheedhar@scylladb.com
@vanguard_space
