Performance Testing: Scylla vs. Cassandra vs. Datastax

Performance Test
Scylla, DataStax DSE and Apache Cassandra
Linda Xu

Linda Xu
VP, data platform/TechOps
Ticketmaster
■ Who we are and our challenges
■ Cassandra test project

Somewhere in the world every 20 minutes
is a Live Nation Event
We power unforgettable moments for joy!

TECH LANDSCAPE
■ 27 Ticketing Systems and over 250 unique products.
■ Hybrid Cloud with over 20,000 VMs across 7 global data centers, and
multiple AWS regions.
■ Thousands of databases in hybrid cloud deployments, spanning
RDBMS and NoSQL.

Big Scale
Big Challenges
That’s a spike of >8 GBps!
Black Friday and Cyber Monday combined!
On-sales = Black Friday every day!
■ Huge spikes / demand for tickets
■ Global company = across time zones
■ Limited inventory
■ Multiple sales channels
0 to 150M transactions in minutes!
Predictable OnSale Traffic
Can we be more prepared?

We have predictable business traffic,
so we are looking for predictable backend solutions.
Database technology we are looking for:
a. Predictable behavior as traffic grows
b. Elastic: able to scale down as well as scale up
c. Unified deployment to both cloud and OnPrem with shippable technology
d. Balance between features and costs
e. Performance, performance, and performance.

Ticketmaster’s Cassandra Story
Early Cassandra adoption
First enterprise deployment 2019
Potential standardized key-value DB solution
Solution for different workloads and business tiers
Find balance between cost and performance
Easy evaluation and deployment toolset

Database Cluster Setup
▪ Single Region, one dc with 6 nodes across 3 AZs
▪ EC2: r5.2xlarge
▪ EBS: io1 + 10K iops
Testing nodes
▪ EC2: t2.2xlarge
▪ Single node vs Six nodes
▪ Same region where the database cluster exists
Data warmup
▪ Each test starts with a 50M data preparation step.
Cassandra Test Project
Cassandra Stress
▪ Ticketmaster customized workload
▪ Uses the cassandra-stress binary from each distribution
Test Workloads
▪ 100% read
▪ 100% write
▪ 50% read and 50% write
▪ 80% write and 20% read
▪ 20% write and 80% read
Test Duration: 20 mins
Test Design
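
The five workload mixes above can be expressed as operation ratios for cassandra-stress in user mode. The following is a minimal sketch, not the team's actual test harness: the profile file name `ticket_profile.yaml` and the node address are hypothetical placeholders, and `insert` / `select_user` are the operation names from the customized YAML profile shown later in the deck.

```python
import shlex

# Read/write mixes from the slide, expressed as (write_weight, read_weight).
WORKLOADS = {
    "100% read": (0, 1),
    "100% write": (1, 0),
    "50% read and 50% write": (1, 1),
    "80% write and 20% read": (4, 1),
    "20% write and 80% read": (1, 4),
}


def stress_command(write_weight, read_weight,
                   profile="ticket_profile.yaml",  # hypothetical file name
                   node="127.0.0.1",               # hypothetical node address
                   duration="20m"):                # test duration from the slide
    """Build a cassandra-stress 'user' mode argument list for one mix."""
    ops = []
    if write_weight:
        ops.append(f"insert={write_weight}")
    if read_weight:
        ops.append(f"select_user={read_weight}")
    return [
        "cassandra-stress", "user",
        f"profile={profile}",
        f"ops({','.join(ops)})",
        f"duration={duration}",
        "-node", node,
    ]


for name, (write_weight, read_weight) in WORKLOADS.items():
    print(f"{name}: {shlex.join(stress_command(write_weight, read_weight))}")
```

The weights are relative, so `ops(insert=4,select_user=1)` asks for roughly four writes per read, which matches the 80/20 mix.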

Cassandra Test Project - Why EBS?
Performance testing:
■ Write Performance is CPU bound
■ Read Performance is memory bound
■ NVMe favors random reads
RTO
■ EBS: 1~10 mins for single node or entire DC recovery.
■ NVMe: The bigger the data set, the longer it takes.
For a 6 TB data set, it takes 10 hours or longer.
Memory and CPU scale up/down
■ EBS: 1~10 minutes
■ NVMe: same as RTO
NVMe vs EBS
testing in 2019
Start with EBS
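
As a rough worked example of the NVMe recovery figure above, the sketch below derives the average rebuild rate implied by "6 TB in 10 hours" and uses it to estimate other data set sizes. It assumes decimal units and a recovery time that scales linearly with data size; both are simplifications for illustration, not measurements from the test.

```python
# Figures from the slide: a 6 TB data set takes about 10 hours to recover on NVMe.
DATA_SET_TB = 6
RECOVERY_HOURS = 10


def implied_throughput_mb_per_s(data_tb, hours):
    """Average rebuild rate in MB/s, using decimal units (1 TB = 1,000,000 MB)."""
    return data_tb * 1_000_000 / (hours * 3600)


def recovery_hours(data_tb, rate_mb_per_s):
    """Estimated recovery time in hours, assuming the rate stays constant."""
    return data_tb * 1_000_000 / rate_mb_per_s / 3600


rate = implied_throughput_mb_per_s(DATA_SET_TB, RECOVERY_HOURS)
print(f"Implied average rebuild rate: {rate:.0f} MB/s")
print(f"Estimated recovery for 3 TB: {recovery_hours(3, rate):.1f} hours")
```

The implied rate is about 167 MB/s, which is the gap being compared against the 1~10 minute EBS recovery listed above.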

#
# Keyspace info
#
keyspace: user_space
#
# The CQL for creating a keyspace (optional if it already exists)
#
keyspace_definition: |
  CREATE KEYSPACE user_space WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1-nonprod1': 3};
#
Customized Yaml - keyspace

#
# Table info
#
table: user_table
#
# The CQL for creating a table you wish to stress (optional if it already exists)
#
table_definition: |
  CREATE TABLE user_table (
    user_id text,
    ticket_id text,
    ticket_type text,
    ticket_value double,
    time bigint,
    PRIMARY KEY (user_id, time)
  ) WITH CLUSTERING ORDER BY (time DESC)
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'};
Customized Yaml - table

Customized Yaml - columnspec
columnspec:
  - name: user_id
    population: seq(1..5b)        # 5 billion potential user_ids
    size: fixed(16)
  - name: ticket_type
    size: uniform(10..20)         # ticket_type is 10-20 chars
    population: uniform(1..10)    # there are 10 distinct ticket_types
  - name: ticket_id
    size: fixed(16)
    population: seq(1..5b)        # 5 billion unique ticket_ids
  - name: ticket_value
    population: gaussian(0..1000) # ticket_values range from 0-1000 and follow a gaussian distribution
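
To make the columnspec distributions concrete, here is a small Python sketch that mimics how values could be drawn for each column. It is an illustration only, not cassandra-stress's own generator: it assumes `gaussian(min..max)` is centered on the midpoint with the range edges three standard deviations away and clamped to the bounds, and the helper names (`seq`, `uniform`, `gaussian`, `sample_rows`) are hypothetical.

```python
import random


def seq(lo, hi):
    """Yield lo, lo+1, ..., hi and then wrap around, like seq(lo..hi)."""
    while True:
        for value in range(lo, hi + 1):
            yield value


def uniform(lo, hi, rng):
    """Pick an integer uniformly from lo..hi inclusive, like uniform(lo..hi)."""
    return rng.randint(lo, hi)


def gaussian(lo, hi, rng, stdevs_to_edge=3.0):
    """Draw from a normal distribution centered on the midpoint of lo..hi.

    Assumption: the edges sit stdevs_to_edge standard deviations from the
    mean, and out-of-range draws are clamped to the bounds.
    """
    mean = (lo + hi) / 2
    stdev = (mean - lo) / stdevs_to_edge
    return min(hi, max(lo, rng.gauss(mean, stdev)))


def sample_rows(n, seed=0):
    """Generate n rows of seeds and sizes following the columnspec above."""
    rng = random.Random(seed)
    user_ids = seq(1, 5_000_000_000)
    ticket_ids = seq(1, 5_000_000_000)
    rows = []
    for _ in range(n):
        rows.append({
            "user_id_seed": next(user_ids),            # seq(1..5b)
            "ticket_id_seed": next(ticket_ids),        # seq(1..5b)
            "ticket_type_seed": uniform(1, 10, rng),   # 10 distinct types
            "ticket_type_len": uniform(10, 20, rng),   # 10-20 chars
            "ticket_value": gaussian(0, 1000, rng),    # bell curve around 500
        })
    return rows


for row in sample_rows(3):
    print(row)
```

With sequential populations, each generated row gets a new user_id until the sequence wraps, so the 50M warmup rows land in distinct partitions under these assumptions.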

Customized Yaml - operation
insert:
  partitions: fixed(1)
  batchtype: UNLOGGED
  select: fixed(10)/10
queries:
  select_user:
    cql: select user_id, ticket_value, time from user_table where user_id = ? and time >= 90 LIMIT 10
    fields: samerow
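
The `select_user` query reads one partition (one `user_id`), keeps rows with `time >= 90`, and returns at most 10 of them, newest first because the table clusters by `time DESC`. The sketch below models that behavior in memory. It is a toy stand-in for illustration, not a Cassandra or Scylla client: the `UserTable` class is hypothetical, and it sorts on read where a real node would store rows already in clustering order.

```python
from collections import defaultdict


class UserTable:
    """Toy model of user_table: partition key user_id, clustering key time."""

    def __init__(self):
        # user_id -> {time: row}; one inner dict per partition.
        self._partitions = defaultdict(dict)

    def insert(self, user_id, time, ticket_value):
        # Writing the same (user_id, time) primary key again overwrites the row.
        self._partitions[user_id][time] = {
            "user_id": user_id,
            "ticket_value": ticket_value,
            "time": time,
        }

    def select_user(self, user_id, min_time=90, limit=10):
        """Model: where user_id = ? and time >= 90 LIMIT 10, newest first."""
        rows = self._partitions.get(user_id, {})
        newest_first = sorted((t for t in rows if t >= min_time), reverse=True)
        return [rows[t] for t in newest_first[:limit]]


table = UserTable()
for t in range(200):
    table.insert("user-1", t, ticket_value=float(t))

result = table.select_user("user-1")
print([row["time"] for row in result])
```

Because the query touches a single partition and stops after 10 rows, its cost stays bounded even when a user has many rows.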

Testing Setup: ScyllaDB
(Diagram labels: AWS resource, Pre-build Machine Image contains ScyllaDB, Database boost script, OnPrem DB logs, Monitoring and Dashboard)

Testing Setup: DataStax DSE
(Diagram labels: AWS resource, contains DSE, OnPrem DB logs)

Testing Setup: Apache Cassandra
(Diagram labels: AWS resource, contains Apache Cassandra, OnPrem DB logs)

Scylla is a great database platform
to handle high traffic demands.

Big thanks to my team:
Aidan Wong
Erica Ip
Leon Katz
Djerdj Srdanov
Sri Rangisetti
Aron Kumar

Thank You
@lindaxudata
linda.xu@ticketmaster.com
Linda Xu
