This presentation gives a brief overview of a high-load service that stores users' actions. The service handles up to 240k writes per second with a 95th-percentile latency under 2 ms on just a few ScyllaDB nodes packed with HDDs. The hardware setup, cluster specification, live load numbers, and achieved latencies are presented, and the problems we encountered with the HDD setup are described along with possible solutions.
2. Kirill Alekseev
Software Engineering Team Lead, Mail.Ru Group
■ Software Engineering Team Lead @ Mail Service @ Mail.Ru Group
■ Master’s degree in Computer Science in 2019 @ Lomonosov Moscow State University
■ Love coding, music and parties
10. Problems of previous storage
The previous storage had the following problems:
▪ poor scalability
▪ difficult to maintain
▪ lack of must-have DBMS features (secondary indexes, tunable replication, query language, etc.)
11. Scylla as a storage for users’ actions
Cluster and data model overview, hardware specs
12. HTTP API
Serves Mail Service, Cloud Service, and Calendar Service:
▪ write an action by a user
▪ read a list of actions by a user
13. Cluster overview
▪ 2 DCs, 4+5 nodes, RF=1 inside each DC
▪ CL=ONE for writes/reads
▪ Bare metal
• 2 x Intel Xeon Gold 6230
• 6 x 32GB DDR4 2666 MHz
• 2 x SATA SSD 1TB in RAID 1 for commitlogs, 10 x HDD 16TB in RAID 10 for data
• 10 Gb/s Network
14. CREATE TABLE becca.events (
user text, year smallint, week tinyint,
time timeuuid,
project_id smallint, event_id smallint,
ip inet, args map<text, text>,
PRIMARY KEY ((user, year, week, project_id), time)
) WITH CLUSTERING ORDER BY (time DESC)
Data model
▪ Partition is a list of actions sorted by time
▪ Partition is identified by user, year, week and project
▪ Thanks to the promoted index, large partitions can be iterated using the ‘time’ column
▪ We use Time Window Compaction Strategy with a window size of 1 week
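The per-week partitioning above can be sketched as follows; `partition_key` is a hypothetical helper (not from the talk) that derives the (user, year, week, project_id) partition key from an event timestamp, assuming ISO year/week numbering:

```python
from datetime import datetime, timezone

def partition_key(user: str, project_id: int, ts: datetime) -> tuple:
    """Derive the (user, year, week, project_id) partition key for an
    event timestamp, assuming ISO year/week numbering."""
    iso = ts.isocalendar()           # (ISO year, ISO week, ISO weekday)
    return (user, iso[0], iso[1], project_id)

# Events from the same user, project, and calendar week share a partition:
a = partition_key("alice", 7, datetime(2021, 1, 4, tzinfo=timezone.utc))
b = partition_key("alice", 7, datetime(2021, 1, 10, tzinfo=timezone.utc))
assert a == b == ("alice", 2021, 1, 7)
```

With a one-week compaction window, all rows of one partition then fall into a single time window.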
15. Reading by a secondary key
▪ Out-of-the-box secondary indexes give unpredictable performance and lots of random IO
▪ Materialized views require a read-before-update for every write operation (not feasible with HDDs)
▪ Our choice: duplicating writes to a separate table with a different partition key
16. CREATE TABLE becca.events_by_ip (
ip inet, year smallint, week tinyint,
user text, time timeuuid,
project_id smallint, event_id smallint,
args map<text, text>,
PRIMARY KEY ((ip, year, week, project_id), time, user)
) WITH CLUSTERING ORDER BY (time DESC)
Secondary key data model
▪ Requires 2x space and 2x write load
▪ Gives predictable performance on reads
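A minimal sketch of the dual-write approach, with in-memory dicts standing in for the two tables (names and row shapes are illustrative, not the production code):

```python
from collections import defaultdict

# In-memory stand-ins for becca.events and becca.events_by_ip;
# each dict key mirrors the table's partition key.
events = defaultdict(list)        # keyed by (user, year, week, project_id)
events_by_ip = defaultdict(list)  # keyed by (ip, year, week, project_id)

def write_event(user, ip, year, week, project_id, time, event_id, args):
    """Duplicate every write into both tables under different partition
    keys, so reads by user and reads by ip each hit one partition."""
    row = {"user": user, "ip": ip, "time": time,
           "event_id": event_id, "args": args}
    events[(user, year, week, project_id)].append(row)
    events_by_ip[(ip, year, week, project_id)].append(row)

write_event("alice", "10.0.0.1", 2021, 1, 7, "t1", 42, {})
assert events[("alice", 2021, 1, 7)][0]["ip"] == "10.0.0.1"
assert events_by_ip[("10.0.0.1", 2021, 1, 7)][0]["user"] == "alice"
```

This is where the 2x space and 2x write load come from: every event is stored twice, once per access path.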
20. Using Scylla with HDDs
Potential problems and possible solutions to them
21. num-io-queues
▪ num-io-queues stands for the number of threads that interact with the disks
▪ You have to find your sweet spot so that throughput is optimal and latencies are acceptable (Little’s Law)
▪ 10 HDDs in RAID 10 provide a maximum write concurrency of 5, so set num-io-queues to 4–5
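A back-of-the-envelope sketch of this reasoning (hypothetical helpers; the most accurate way, as the talk notes, is still to benchmark): Little's Law says the mean number of in-flight requests is L = λ·W, and RAID 10 mirrors disks in pairs, so 10 HDDs offer about 5 independent write streams:

```python
def raid10_write_concurrency(n_disks: int) -> int:
    """RAID 10 mirrors disks in pairs and every write goes to both
    halves of a pair, so independent write streams = n_disks / 2."""
    return n_disks // 2

def inflight_requests(throughput_rps: float, latency_s: float) -> float:
    """Little's Law: mean number of in-flight requests L = lambda * W."""
    return throughput_rps * latency_s

assert raid10_write_concurrency(10) == 5
# e.g. ~500 writes/s per node at ~10 ms per HDD operation keeps ~5 in flight,
# matching the disks' useful concurrency:
assert abs(inflight_requests(500, 0.01) - 5) < 1e-9
```

If the configured queue count exceeds what the disks can serve concurrently, extra requests only wait longer; if it is lower, throughput is left on the table.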
22. Cluster repairs
▪ nodetool repair does not finish in acceptable time (months)
▪ nodetool repair overloads the cluster (read latencies grow 4 times)
▪ We came up with a more IO-efficient way to repair the cluster in our case
24. Cluster repairs
▪ nodetool refresh will finish quickly
▪ compactions of new data will be triggered, but the
cluster will not be overloaded
▪ compactions will finish in a couple of hours
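One hypothetical building block for this recovery path (not shown in the talk, and file-naming details are illustrative): on a healthy replica, select only the SSTable data files written at or after the known failure moment, transfer them to the affected node, and run nodetool refresh:

```python
import os
import tempfile
from pathlib import Path

def sstables_since(data_dir: str, failed_at_epoch: float) -> list:
    """Hypothetical helper: list SSTable data files modified at or after
    the known failure moment, to copy before `nodetool refresh`."""
    return sorted(
        str(p) for p in Path(data_dir).glob("*-Data.db")
        if p.stat().st_mtime >= failed_at_epoch
    )

# Demo on a throwaway directory with fake SSTable file names:
with tempfile.TemporaryDirectory() as d:
    old = Path(d, "md-1-big-Data.db"); old.touch()
    os.utime(old, (1000, 1000))       # written before the failure
    new = Path(d, "md-2-big-Data.db"); new.touch()
    os.utime(new, (2000, 2000))       # written after the failure
    assert sstables_since(d, 1500) == [str(new)]
```

This avoids the full scan a regular repair would do: only data newer than the failure moment is transferred and compacted.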
25. Problems yet to be solved
The following problems are yet to be solved:
▪ latencies grow during compactions, cleanup, bootstrap
▪ latencies grow when a node is down
▪ slow bootstrapping
28. Results
We have achieved the following results:
▪ we have built a high-load service for storing users’ actions with Scylla and HDDs
▪ the service handles 240 000 writes per second with a 95th-percentile latency of 1.5 ms on just a few Scylla nodes
▪ we have implemented an approach to serve reads by a secondary key with predictable performance
29. Future work
In 2021:
▪ third DC
▪ optimize Scylla and clients to get even better latencies
▪ integrate Scylla into more projects
30. Special Thanks
I would like to give special thanks to:
▪ Dmitry Pavlov, Pavel Buchinchik, Igor Platonov
▪ Vladislav Zolotarov, Avi Kivity, Raphael Carvalho
▪ The whole ScyllaDB team
Speaker notes
▪ Let’s talk numbers: the figures do not include bots, only real users
▪ We store every action
▪ A user may want to see what happened in their mailbox
▪ Other examples: investigating possible attacks, sorting out user complaints
▪ The thing we wanted to replace in this scheme was the storage
▪ Writes outnumber reads roughly 1000:1
▪ Explain why the DCs have different numbers of nodes
▪ Prepare an answer for questions about CL=ONE: we want to stay available when a DC goes down, and it is acceptable for us to serve inconsistent reads
▪ All user data is split by weeks and projects
▪ The number of network requests to other nodes is ambiguous
▪ We can’t transform all those writes into reads
▪ We create another table and duplicate all writes there from the application
▪ Latencies are measured from the client
▪ Write RPS = API RPS + replication (RF) + secondary-index duplication
▪ Remind that we are talking about HDDs; ScyllaDB does not recommend HDDs, which is reasonable
▪ In SSD setups num-io-queues will probably be set to some large value, like the number of shards
▪ The most accurate way is to run benchmarks with different values of num-io-queues
▪ Say one node failed and we know the exact moment when it happened
▪ Normally nodetool repair would run a full scan, but we know the exact moment the problem happened
▪ We go to nodes in a different DC, transfer data to the affected node, and run nodetool repair
▪ Refresh finishes soon, then compactions run without overloading the cluster; in our case they finished in 6 hours
▪ Latencies stay in a reasonable range
▪ Resharding is slow but faster than repair and does not overload the cluster
▪ A whole section is dedicated to problems with HDDs: explain why