5 Factors When Selecting a
High Performance, Low
Latency Database
Peter Corless — Director of Technical Advocacy, ScyllaDB
Arthur Pesa — Solutions Architect, ScyllaDB
Brought to you by
VIRTUAL EVENT | OCTOBER 19 + 20
All Things Performance
The event for developers who care about P99
percentiles and high-performance, low-latency
applications.
Register at p99conf.io
Poll
Where are you in your NoSQL adoption?
Introductions
Peter Corless, Director of Technical Advocacy, ScyllaDB
+ Editor of and frequent contributor to the ScyllaDB blog
+ Program chair for ScyllaDB Summit and P99 CONF
+ Host of ScyllaDB Masterclass series
+ @PeterCorless on Twitter
Arthur Pesa, Solutions Architect, ScyllaDB
+ Helps customers successfully implement databases
+ Formerly at Nike, DataStax, Columbia Sportswear
What We’ll Talk About
+ Five Factors — What’s most important for making a database decision for your
organization?
+ ScyllaDB — How our big, fast NoSQL database holds up against these
considerations
What We Won’t Talk About
+ “SQL vs. NoSQL” — If you need a table JOIN, you need a JOIN; if you need a
wide column, you need a wide column
+ 394 other database systems — Feel free to use these criteria to compare other
databases listed on DB-Engines.com. Your Mileage May Vary (YMMV)
What is ScyllaDB?
SILL-ah DEE BEE
ScyllaDB Intro
+ ScyllaDB is the database for data-intensive apps that require high performance and low
latency
+ ScyllaDB is a wide-column NoSQL database compatible with Apache Cassandra CQL &
Amazon DynamoDB APIs — only much faster
+ ScyllaDB, the company, started in 2016
+ ScyllaDB, the database, is available as Open Source, Enterprise and Cloud
The Database Built for Gamechangers
+ InfoWorld 2020 Technology of the Year!
+ Founded by designers of the KVM hypervisor
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
+ Resolves challenges of legacy NoSQL databases
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ DBaaS/Cloud, Enterprise and Open Source solutions
+ Proven globally at scale
400+ Gamechangers Leverage ScyllaDB
+ Seamless experiences across content + devices
+ Fast computation of flight pricing
+ Corporate fleet management
+ Real-time analytics
+ 2,000,000 SKU e-commerce management
+ Real-time location tracking for friends/family
+ Video recommendation management
+ IoT for industrial machines
+ Synchronize browser properties for millions
+ Threat intelligence service using JanusGraph
+ Real-time fraud detection across 6M transactions/day
+ Uber-scale, mission-critical chat & messaging app
+ Network security threat detection
+ Power ~50M X1 DVRs with billions of reqs/day
+ Precision healthcare via Edison AI
+ Inventory hub for retail operations
+ Property listings and updates
+ Unified ML feature store across the business
+ Cryptocurrency exchange app
+ Geography-based recommendations
+ Distributed storage for distributed ledger tech
+ Global operations for Avon, Body Shop + more
+ Predictable performance for on-sale surges
+ GPS-based exercise tracking
The Five Factors
1. Software Architecture — Does the database use the most efficient data structures, flexible
data models, and rich query languages to support your workloads and query patterns?
2. Hardware Utilization — Can it take full advantage of modern hardware platforms? Or will
you be leaving a significant amount of CPU cycles underutilized?
3. Interoperability — How easy is it to integrate into your development environment? Does it
support your programming languages, frameworks and projects? Was it designed to
integrate into your microservices and event streaming architecture?
4. RASP — Does it have the necessary qualities of Reliability, Availability, Scalability,
Serviceability and, of course, Performance?
5. Deployment — Does this database only work in a limited environment, such as only
on-premises, or only in a single datacenter or a single cloud vendor? Or does it lend itself to
being deployed exactly where and how you want around the globe?
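One way to apply these criteria when comparing candidate databases is a weighted scoring matrix. The sketch below is a hypothetical illustration, not a ScyllaDB methodology: the factor weights, the 1-5 scores, and the candidate names `db_a` and `db_b` are invented placeholders you would replace with your own assessments.

```python
from __future__ import annotations

from dataclasses import dataclass

FACTORS = ("architecture", "hardware", "interoperability", "rasp", "deployment")


@dataclass
class Candidate:
    name: str
    scores: dict[str, int]  # factor -> score on a hypothetical 1-5 scale


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted average of a candidate's scores over the five factors."""
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(weights[f] * candidate.scores[f] for f in FACTORS) / total_weight


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(c.name, weighted_score(c, weights)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights: this team cares most about hardware efficiency and RASP.
weights = {"architecture": 2, "hardware": 3, "interoperability": 1, "rasp": 3, "deployment": 1}
candidates = [
    Candidate("db_a", {"architecture": 4, "hardware": 5, "interoperability": 3, "rasp": 4, "deployment": 4}),
    Candidate("db_b", {"architecture": 5, "hardware": 2, "interoperability": 5, "rasp": 3, "deployment": 3}),
]
for name, score in rank(candidates, weights):
    print(f"{name}: {score:.2f}")
```

Making the weights explicit keeps the comparison honest: a database that excels everywhere except on the factor you weight most heavily will not automatically win.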
Software Architecture
Does the database use the most efficient data structures, flexible data models, and
rich query languages to support your workloads and query patterns?
+ Workload — Transactional or Analytical? Hybrid?
+ Data Model — Key-Value, Wide Column, Column Store, Document, Graph, RDBMS, or other?
+ Query Language — SQL, SQL-like (CQL), JSON, or other?
+ Transactions/Operations/CAP — Which is more important, Consistency or Availability?
+ Data Distribution — Multi-datacenter or local clustering? Cross-cluster updates?
Hardware Utilization
Can it take full advantage of modern hardware platforms? Or will you be leaving a
significant amount of CPU cycles underutilized?
+ CPU utilization / efficiency — Process distribution; single- or multi-threading
+ RAM utilization / efficiency — read path and write path; caching; [JVM, heap tuning, etc.]
+ Storage utilization / efficiency — storage format, mutability, concurrency, tiering
+ Network utilization / efficiency — client/server vs. intra-cluster communications
Interoperability
How easy is it to integrate into your development environment? Does it support your
programming languages, frameworks and projects? Was it designed to integrate into
your microservices and event streaming architecture?
+ Programming Languages/Frameworks — Clients, Libraries, SDKs, ORMs, Packages
+ Event Streaming/Message Queuing — Sink and/or Source, Kafka, Pulsar, RabbitMQ
+ APIs — RESTful, GraphQL, microservices
+ Other — e.g., serving as a pluggable storage layer for other systems, such as JanusGraph
RASP
Does it have the necessary qualities of Reliability, Availability, Scalability, Serviceability
and, of course, Performance?
+ Reliability — Durability, Survivability, Guardrails
+ Availability — “Five Nines” (99.999% uptime)
+ Scalability — Capacity, Elasticity
+ Serviceability — Manageability, Observability, Usability
+ Performance — Throughput, latency
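“Five Nines” means 99.999% availability. The short calculation below converts an availability target into the maximum downtime it allows per year. It assumes a 365-day year and says nothing about how any particular SLA measures downtime.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a 365-day year


def allowed_downtime_minutes(nines: int) -> float:
    """Maximum yearly downtime, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** -n)
    print(f"{availability:.3f}% -> {allowed_downtime_minutes(n):,.2f} min/year")
```

At five nines the budget is about 5.26 minutes per year, which is why availability at this level depends on surviving node failures rather than on fast manual recovery.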
Deployment
Does this database only work in a limited environment, such as only on-premises, or
only in a single datacenter or a single cloud vendor? Or does it lend itself to being
deployed exactly where and how you want around the globe?
+ Cloud Vendor Lock-in?
+ On-Prem Deployable?
+ Kubernetes (k8s)
+ Multi-Cloud
ScyllaDB — How Does it Work?
1. ScyllaDB Architecture
+ Architected from the ground up based on Seastar
+ Seastar is an advanced, open-source C++ framework for high-performance server
applications on modern hardware.
+ Seastar uses a shared-nothing model that shards all requests onto individual cores.
+ Seastar shares information between CPU cores through explicit message passing rather
than time-consuming locking.
+ Seastar is the differentiator that lets ScyllaDB run directly on the hardware as native
C++ rather than inside a JVM
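The shared-nothing idea can be sketched in a few lines: each shard owns a private slice of the data, and a request for a key owned by another shard is queued to that shard instead of being applied under a shared lock. This is a single-threaded Python simulation of the concept, not Seastar's C++ API (Seastar uses futures and explicit cross-core message queues); the `Shard` class, `submit_put`, and CRC32-based routing are illustrative assumptions.

```python
from __future__ import annotations

import zlib
from collections import deque


class Shard:
    """Owns a private key/value store that no other shard touches."""

    def __init__(self, shard_id: int):
        self.shard_id = shard_id
        self.store: dict[str, str] = {}
        self.inbox: deque = deque()  # requests queued by other shards or clients

    def process_inbox(self) -> None:
        # Only the owning shard drains its inbox and writes its store,
        # so there is no shared mutable state that would need a lock.
        while self.inbox:
            op, key, value = self.inbox.popleft()
            if op == "put":
                self.store[key] = value


def owner_of(key: str, nr_shards: int) -> int:
    """Deterministic owner shard for a key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode()) % nr_shards


def submit_put(shards: list[Shard], key: str, value: str) -> int:
    """Send a write as a message to the owning shard; return that shard's id."""
    target = owner_of(key, len(shards))
    shards[target].inbox.append(("put", key, value))
    return target


shards = [Shard(i) for i in range(4)]
for i in range(10):
    submit_put(shards, f"user:{i}", f"value-{i}")
for shard in shards:
    shard.process_inbox()
    print(shard.shard_id, sorted(shard.store))
```

Each key lives on exactly one shard, so correctness comes from ownership rather than from locks.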
Cassandra CQL & DynamoDB Queries
+ ScyllaDB supports the Apache Cassandra CQL query language
+ If you're a Cassandra user today, you will have the same experience when using CQL
in both cqlsh and your applications
+ ScyllaDB also supports a DynamoDB-compatible API, called “Alternator”
+ Also supports DynamoDB Streams (“Alternator Streams”)
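Because Alternator speaks the DynamoDB API, requests carry items in DynamoDB's typed JSON format, where each value is wrapped in a type descriptor such as `S` (string), `N` (number, sent as a string), `BOOL`, `NULL`, `L` (list), or `M` (map). The helper below is a minimal sketch of that conversion for a subset of types (binary and set types are omitted). In practice the AWS SDKs do this for you, and the `users` table and its attributes are hypothetical.

```python
import json
from numbers import Number


def to_attribute_value(value):
    """Convert a plain Python value to DynamoDB's typed AttributeValue form."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # check bool before Number: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, Number):
        return {"N": str(value)}  # DynamoDB transmits numbers as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Body of a PutItem request for a hypothetical "users" table.
item = {"user_id": "u-42", "age": 31, "active": True, "tags": ["a", "b"]}
request = {
    "TableName": "users",
    "Item": {k: to_attribute_value(v) for k, v in item.items()},
}
print(json.dumps(request, indent=2))
```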
Data Model
+ Wide Column NoSQL
+ “Key-Key-Value” row store (Partition Key, Clustering Key)
+ Highly optimized for OLTP workloads
+ Not to be confused with “columnar stores” like ClickHouse, Druid or Pinot (OLAP-oriented)
+ Designed for extremely fast data access
+ Data is ordered within each partition based on Clustering Key(s)
+ Data retrieval latencies measured in single-digit milliseconds
+ Query-driven data modeling: typically a single table per query
+ ScyllaDB employs indexing, secondary indexes and materialized views that are far
superior in performance to Cassandra’s
Data Model Example
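The “key-key-value” model can be pictured as a map from partition key to rows kept sorted by clustering key. The in-memory sketch below mimics that layout to show why a query that names the partition key and a clustering-key range is cheap: it touches one partition and reads a contiguous, already-sorted slice. The `WideColumnTable` class and the sensor-reading schema are hypothetical, and real ScyllaDB storage (memtables, SSTables, replication) is far more involved. A roughly equivalent CQL table would be `CREATE TABLE readings (sensor_id text, ts int, value double, PRIMARY KEY (sensor_id, ts));`.

```python
from __future__ import annotations

import bisect
from collections import defaultdict


class WideColumnTable:
    """Partition key -> rows sorted by clustering key (in-memory illustration only)."""

    def __init__(self) -> None:
        self._keys = defaultdict(list)  # partition key -> sorted clustering keys
        self._rows = defaultdict(dict)  # partition key -> {clustering key: value}

    def upsert(self, partition_key: str, clustering_key: int, value: float) -> None:
        if clustering_key not in self._rows[partition_key]:
            bisect.insort(self._keys[partition_key], clustering_key)  # keep clustering order
        self._rows[partition_key][clustering_key] = value  # writes overwrite (upsert)

    def range_query(self, partition_key: str, start: int, end: int) -> list[tuple[int, float]]:
        """Rows in one partition with start <= clustering key < end, in clustering order."""
        keys = self._keys.get(partition_key, [])
        rows = self._rows.get(partition_key, {})
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return [(ck, rows[ck]) for ck in keys[lo:hi]]


table = WideColumnTable()
for ts, v in [(30, 21.5), (10, 20.0), (20, 20.7), (40, 22.1)]:
    table.upsert("sensor-1", ts, v)
table.upsert("sensor-2", 10, 18.0)
print(table.range_query("sensor-1", 10, 30))  # -> [(10, 20.0), (20, 20.7)]
```

Because rows arrive in any order but are stored sorted, the range read needs no sort at query time; this is the layout that “single table per query” modeling is designed around.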
Shard-per-Core Software Architecture
+ Shard-per-core — each vCPU assigned its own data partitions
+ NUMA-aware — each vCPU also assigned its own RAM
+ Single-threaded per vCPU
+ Custom CPU and IO schedulers
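Shard-per-core routing boils down to mapping each partition's token onto exactly one shard. The sketch below is simplified: it assumes signed 64-bit tokens (as produced by Murmur3 partitioning) and splits the token range into `nr_shards` equal slices. It uses BLAKE2b from `hashlib` as a stand-in for Murmur3, which is not in Python's standard library, and it ignores an extra most-significant-bits adjustment ScyllaDB applies, so it will not reproduce real shard assignments.

```python
import hashlib

TOKEN_MIN = -(2 ** 63)


def token_for(partition_key: bytes) -> int:
    """Stand-in token: a signed 64-bit value from BLAKE2b (ScyllaDB uses Murmur3)."""
    digest = hashlib.blake2b(partition_key, digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def shard_for_token(token: int, nr_shards: int) -> int:
    """Map a token in [-2**63, 2**63) onto one of nr_shards equal slices."""
    unsigned = token - TOKEN_MIN         # shift into [0, 2**64)
    return (unsigned * nr_shards) >> 64  # integer arithmetic only, no rounding issues


for key in (b"user:1", b"user:2", b"user:3"):
    tok = token_for(key)
    print(key, tok, "-> shard", shard_for_token(tok, nr_shards=8))
```

Because the mapping is a pure function of the token and the shard count, a client that knows both can compute the owning shard itself, which is the idea behind shard-aware drivers.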
2. Maximize Hardware Utilization
+ Linear scalability for the latest cloud computing hardware
+ i4i.metal: 128 vCPUs, 1 TB RAM, 30 TB NVMe SSD per node
+ i3en.metal: up to 60 TB NVMe SSD per node
+ iotune and Diskplorer
+ Optimizing NVMe SSD
+ CPU + IO Schedulers
+ Best utilization of HW
[Figure: I3en and I4i]
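Shard-per-core also means each shard works with a fixed slice of the node. A back-of-the-envelope calculation with the slide's rounded i4i.metal figures (128 vCPUs, 1 TB RAM, 30 TB NVMe) shows roughly what one shard owns. It assumes decimal units (1 TB = 1,000 GB) and an even split, and ignores memory and CPU reserved for the operating system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    vcpus: int
    ram_gb: float
    nvme_gb: float


def per_shard(node: Node) -> dict[str, float]:
    """Even split of RAM and NVMe storage across one shard per vCPU."""
    return {
        "ram_gb": node.ram_gb / node.vcpus,
        "nvme_gb": node.nvme_gb / node.vcpus,
    }


i4i_metal = Node(vcpus=128, ram_gb=1_000, nvme_gb=30_000)  # rounded slide figures
print(per_shard(i4i_metal))  # {'ram_gb': 7.8125, 'nvme_gb': 234.375}
```

So on this instance each shard handles roughly 7.8 GB of RAM and about 234 GB of local NVMe, which it manages without coordinating with other cores.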
3. ScyllaDB Interoperability
Basic Connectivity
+ Apache Cassandra CQL Drivers
+ Shard-Aware ScyllaDB CQL Drivers
+ AWS DynamoDB SDKs
Streaming
+ Kafka Sink & Source Connectors [also Pulsar]
+ DynamoDB Streams [“Alternator Streams”]
Any Cassandra ecosystem solution
Programming Languages / Drivers
CQL
+ ScyllaDB uses a shard-per-core architecture and has its own shard-aware drivers
+ These make better use of ScyllaDB’s built-in efficiencies
+ Shard-aware drivers are available in Rust, Python, Go, and C++
+ ScyllaDB also supports drivers that use the standard Apache Cassandra native transport
(the CQL binary protocol)
+ Such drivers exist for almost every programming language in use today.
DynamoDB API
+ ScyllaDB has its own DynamoDB-compatible API, called Alternator, which lets you point your
existing DynamoDB-based application directly at ScyllaDB
+ Applications can use any of the AWS SDKs for DynamoDB with ScyllaDB; the SDK typically
only needs to be pointed at the Alternator endpoint
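The benefit of shard-aware drivers can be estimated with a little arithmetic. Suppose a driver that is not shard-aware sends each request over a connection that lands on a uniformly random shard. The request then reaches the shard that owns its data only 1 time in `n`; otherwise the receiving shard must hand the work to the owner, an extra cross-core hop. The uniform-random assumption is a simplification for illustration; a shard-aware driver aims to send each request straight to the owning shard.

```python
from fractions import Fraction
from itertools import product


def cross_shard_fraction(nr_shards: int) -> Fraction:
    """Fraction of requests that need a cross-core hop, by exhaustive enumeration.

    Assumes the connection's shard and the data's owning shard are independent
    and uniformly distributed over nr_shards shards.
    """
    pairs = list(product(range(nr_shards), repeat=2))  # (connection shard, owner shard)
    misses = sum(1 for conn, owner in pairs if conn != owner)
    return Fraction(misses, len(pairs))


for n in (2, 8, 16, 64):
    f = cross_shard_fraction(n)
    print(f"{n:>3} shards: {f} ({float(f):.1%}) of requests hop cores")
```

The fraction is (n - 1)/n, so the larger the machine, the closer a shard-unaware client gets to paying the extra hop on every request.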
Event Streaming
+ Kafka Sink Connector — Shard-Aware, optimized for ScyllaDB
+ Kafka Source Connector — based on Debezium
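One property that makes sinking a Kafka topic into ScyllaDB forgiving is that CQL writes are upserts: writing a row with the same values again leaves it unchanged. If a sink restarts from an older offset and redelivers records it already wrote (at-least-once delivery), replaying them in order converges to the same final table. The sketch models the table as a dict keyed by primary key; `apply_upserts` and the order records are hypothetical and do not reflect the connector's implementation.

```python
def apply_upserts(table: dict, records: list) -> None:
    """Apply (primary key, columns) records in order; each write overwrites those columns."""
    for key, columns in records:
        table.setdefault(key, {}).update(columns)


stream = [
    ("order-1", {"status": "created"}),
    ("order-2", {"status": "created"}),
    ("order-1", {"status": "paid"}),
    ("order-2", {"status": "shipped"}),
]

delivered_once = {}
apply_upserts(delivered_once, stream)

# Simulate a crash after two records, then a restart that redelivers from offset 1.
redelivered = {}
apply_upserts(redelivered, stream[:2])
apply_upserts(redelivered, stream[1:])

print(delivered_once == redelivered)  # True
```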
4. RASP
+ Reliability
  + Partition tolerant: with replication, you can lose a node and still handle traffic
  + “I just want the thing to run without any babysitting at all.”
+ Availability
  + Always-on architecture, tunable consistency
+ Scalability
  + Add more nodes when needed
  + Vertical as well as horizontal scalability — any number of vCPUs and any amount of SSD storage
+ Serviceability
  + ScyllaDB Monitoring Stack — real-time observability makes identifying problems simple
  + ScyllaDB Manager — for backups and repairs
+ Performance
  + Millions of ops per second at single-digit ms P99 latencies
  + Allows full use of available resources: CPU, memory and storage
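Tunable consistency and node-loss tolerance come from the same arithmetic. With replication factor RF, a QUORUM is floor(RF/2) + 1 replicas; if a read contacts R replicas and a write is acknowledged by W replicas with R + W > RF, every read overlaps at least one replica holding the latest acknowledged write. The function below computes these numbers for a single datacenter. It is a simplified model that ignores multi-datacenter levels such as LOCAL_QUORUM and EACH_QUORUM, and `summarize` is a hypothetical helper.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a QUORUM at replication factor rf."""
    return rf // 2 + 1


def summarize(rf: int, read_replicas: int, write_replicas: int) -> dict:
    return {
        "quorum": quorum(rf),
        # Read and write replica sets must overlap for reads to see the latest write.
        "strongly_consistent": read_replicas + write_replicas > rf,
        # Replicas of a partition that can be down while reads and writes both succeed.
        "tolerated_failures": rf - max(read_replicas, write_replicas),
    }


rf = 3
print("QUORUM/QUORUM:", summarize(rf, quorum(rf), quorum(rf)))
print("ONE/ONE:      ", summarize(rf, 1, 1))
print("ONE/ALL:      ", summarize(rf, 1, rf))
```

With RF = 3 and QUORUM for both reads and writes, each operation needs 2 of 3 replicas, so one replica can be lost while reads still see the latest acknowledged write. Lowering the level to ONE tolerates more failures but gives up that overlap guarantee.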
5. Deployment
ScyllaDB Open Source
ScyllaDB Enterprise
ScyllaDB Operator for k8s
ScyllaDB Cloud
On Premises or Any Cloud
Poll
How much data do you have under management in your
transactional database?
Q&A
WANT TO KEEP LEARNING?
Join ScyllaDB University for Free:
university.scylladb.com
SCYLLADB VIRTUAL WORKSHOP
Getting Started with ScyllaDB
29 September 2022, 12 PM GMT | 8 AM ET | 5:30 PM IST
Thank you
for joining us today.
@scylladb scylladb/
slack.scylladb.com
@scylladb company/scylladb/
scylladb/
