Materialized Views and Secondary Indexes in Scylla: They Are Finally Here!

Piotr Sarna
Software Engineer @ScyllaDB

Presenter bio
Piotr is a software engineer very keen on open-source projects
and C++. He previously developed an open-source distributed
file system and had a brief adventure with the Linux kernel during
an apprenticeship at Samsung Electronics. Piotr graduated from
the University of Warsaw with an MSc in Computer Science.

Agenda
▪ Introduction
▪ Materialized Views
▪ Secondary Indexes
▪ Filtering
▪ Summary

Why finally?
▪ Materialized views
• experimental in 2.0, production-ready since 3.0
▪ Secondary indexes
• experimental in 2.1, production-ready since 3.0
▪ Filtering
• production-ready since 3.0

Before Materialized Views
▪ How to query by something other than the primary key
columns?

CREATE TABLE t (p int, c1 int, c2 int, v int, PRIMARY KEY (p, c1, c2));
▪ Querying for a regular column v:
• CREATE TABLE t2 (v int, p int, c1 int, c2 int, PRIMARY KEY(v, p, c1, c2));
• SELECT * FROM t2 WHERE v = 7;
▪ Querying for a non-prefix part of the primary key:
• CREATE TABLE t2 (c1 int, p int, c2 int, PRIMARY KEY(c1, p, c2));
• SELECT * FROM t2 WHERE c1 = 7;

▪ Manual denormalization - problems
• updating the base table may require read-before-write
• there may be multiple denormalization tables for a table
• what if one of the writes fails?
• what if somebody forgets to write to one of the denormalized parts?

Read before write
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
CREATE TABLE denormalized (
v int,
p int,
c int,
PRIMARY KEY (v, p, c)
);

Read before write
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
UPDATE base_table
SET v = 9
WHERE p = 0 AND c = 1;
CREATE TABLE denormalized (
v int,
p int,
c int,
PRIMARY KEY (v, p, c)
);
v | p | c
---+---+---
8 | 0 | 1
DELETE FROM denormalized
WHERE v = 8 AND p = 0 AND c = 1; -- how do we know it’s 8?
INSERT INTO denormalized (v, p, c)
VALUES (9, 0, 1);
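The read-before-write step can be illustrated with a small in-memory sketch. This is only a model of the problem, not Scylla code: the two dictionaries and the `update_v` helper are hypothetical stand-ins for the tables and for application logic.

```python
# Hypothetical in-memory stand-ins for the two tables (not a Scylla API).
base_table = {(0, 1): 8}      # (p, c) -> v
denormalized = {(8, 0, 1)}    # set of (v, p, c) rows


def update_v(p, c, new_v):
    """Update v in the base table and keep the denormalized table in sync."""
    # Read before write: the old value of v is part of the denormalized
    # row's key, so it has to be read before the stale row can be deleted.
    old_v = base_table.get((p, c))
    if old_v is not None:
        denormalized.discard((old_v, p, c))
    base_table[(p, c)] = new_v
    denormalized.add((new_v, p, c))


update_v(0, 1, 9)
```

Without the initial read, the application would have no way of knowing which denormalized row to delete.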

Materialized Views
▪ Let Scylla denormalize a table for you
• view updates are generated automatically and transparently
• read-before-write is performed when needed
• useful statistics are exposed

Materialized Views
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
CREATE MATERIALIZED VIEW
view_table AS
SELECT * FROM base_table
WHERE v IS NOT NULL
AND p IS NOT NULL
AND c IS NOT NULL
PRIMARY KEY(v, p, c);
v | p | c
---+---+---
8 | 0 | 1

Materialized Views
▪ Materialized view’s partition key can be a subset of primary key
parts and/or a regular column
• currently limited to a single regular column
• the whole base primary key must be included in view’s primary key
• all primary key fields must be restricted with IS NOT NULL
▪ Each table is allowed to have multiple views
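The key rules above can be written down as a small checker. This is a minimal sketch of the rules as stated on the slide, not Scylla's actual validation code; `check_view_key` and its parameters are hypothetical names.

```python
def check_view_key(base_pk, base_regular, view_pk):
    """Return a list of rule violations for a proposed view primary key.

    base_pk      - base table primary key columns
    base_regular - base table regular (non-key) columns
    view_pk      - proposed primary key columns of the view
    """
    problems = []
    # Rule: the whole base primary key must be part of the view's key.
    missing = [col for col in base_pk if col not in view_pk]
    if missing:
        problems.append("missing base primary key columns: " + ", ".join(missing))
    # Rule: at most one regular column may be promoted into the view's key.
    regular = [col for col in view_pk if col in base_regular]
    if len(regular) > 1:
        problems.append("more than one regular column: " + ", ".join(regular))
    # Sanity check: every view key column must exist in the base table.
    unknown = [col for col in view_pk
               if col not in base_pk and col not in base_regular]
    if unknown:
        problems.append("unknown columns: " + ", ".join(unknown))
    return problems
```

For example, `check_view_key(["p", "c"], ["v"], ["v", "p", "c"])` reports no problems, while a key that drops `c` or promotes two regular columns is rejected.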

More examples
CREATE TABLE base_table (
p int,
c int,
v1 int,
v2 int,
v3 int,
v4 int,
v5 int,
PRIMARY KEY (p, c)
);
p | c | v1 | v2 | v3 | v4 | v5
---+---+----+----+----+----+----
0 | 1 | 8 | 9 | 10 | 11 | 12
CREATE MATERIALIZED VIEW
view_table AS
SELECT c, p FROM base_table
WHERE c IS NOT NULL
AND p IS NOT NULL
PRIMARY KEY(c, p);
c | p
---+---
1 | 0

More examples
CREATE TABLE base_table (
p1 int,
p2 int,
c1 int,
c2 int,
v1 int,
v2 int,
v3 int,
PRIMARY KEY ((p1, p2), c1, c2)
);
p1 | p2 | c1 | c2 | v1 | v2 | v3
----+----+----+----+----+----+----
0 | 1 | 2 | 3 | 8 | 9 | 10
CREATE MATERIALIZED VIEW
view_table AS
SELECT c2, p1, p2, c1, v2 FROM
base_table
WHERE c2 IS NOT NULL
AND p1 IS NOT NULL
AND p2 IS NOT NULL
AND c1 IS NOT NULL
PRIMARY KEY(c2, p1, p2, c1);
c2 | p1 | p2 | c1 | v2
----+----+----+----+----
3 | 0 | 1 | 2 | 9

Challenges
▪ View rows must be eventually consistent with their base counterparts
• all updates should be propagated - inserts, updates, deletes
• updates should not be lost in case of temporary failures/restarts
▪ Cluster must not be overloaded with MV updates - backpressure
• each base write may trigger multiple independent updates
• so can streaming
▪ Views created on an existing table should fill themselves
with existing base data - view building

Consistency - synchronous model
[Sequence diagram between C, B, V1 and V2. Message labels: w(p: 1, v: 10); d(v: 5); w(v: 10, p: 1); r: Ok; r: Ok]
Consistency - asynchronous model
[Sequence diagram between C, B, V1 and V2. Message labels: w(p: 1, v: 10); d(v: 5); w(v: 10, p: 1); r: Ok; r: Ok]
Consistency
[Sequence diagram between C, B, V1 and V2. Message labels: w(p: 1, v: 10); d(v: 5); w(v: 10, p: 1); r: Ok; r: Ok]
solution: hinted handoff

Hinted handoff for materialized views
▪ Failed updates are stored on base node as hints
▪ They will be resent once the paired node is available
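The two bullets above can be sketched as a toy model. The classes below are hypothetical and only illustrate the idea of storing failed view updates on the base node and replaying them later; they do not reflect Scylla's actual hint storage or scheduling.

```python
from collections import deque


class ViewReplica:
    """Toy view replica that can be switched off to simulate a failure."""

    def __init__(self):
        self.up = True
        self.rows = []

    def apply(self, update):
        if not self.up:
            raise ConnectionError("view replica unavailable")
        self.rows.append(update)


class BaseReplica:
    """Toy base replica that keeps failed view updates as hints."""

    def __init__(self, paired_view):
        self.paired_view = paired_view
        self.hints = deque()

    def send_view_update(self, update):
        try:
            self.paired_view.apply(update)
        except ConnectionError:
            # The update is not lost: it is stored locally as a hint.
            self.hints.append(update)

    def replay_hints(self):
        # Resend hints in order; stop at the first failure and retry later.
        while self.hints:
            try:
                self.paired_view.apply(self.hints[0])
            except ConnectionError:
                return
            self.hints.popleft()
```

Calling `replay_hints()` once the paired replica is back drains the queue in the original order, so the view eventually sees every update.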

View building
▪ Views created on existing tables will be incrementally built from
existing data
▪ Progress can be tracked via system tables:
• system.views_builds_in_progress
• system.built_views

Backpressure
▪ a single user write can trigger multiple MV updates
▪ backpressure prevents overloading the cluster with them
• base replicas report their load to the coordinator
• coordinator is allowed to delay serving new user writes to lower the pressure
▪ there’s a whole presentation about the topic by Nadav Har’El
Public design document: https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo

Streaming
▪ Efficient way of sending data from one node to another
▪ Moves data directly to sstables of the target node,
bypassing the full write path
▪ Used under the hood of several cluster operations, e.g.:
• bootstrap
• repair
• rebuild

Streaming
▪ In some cases, streamed data should generate materialized view
updates to ensure consistency
• unconditionally during node repair
• when the view is not yet completely built
▪ Affected sstables are stored and used to generate MV updates

Before Secondary Indexes
▪ Searching on non-partition columns
• full table scan + client-side filtering
• schema redesign + manual denormalization
• using materialized views

Secondary Indexes
Global
▪ based on materialized views
▪ reading - scalable
▪ writing - distributed
▪ low cardinality = wide partitions
▪ high cardinality = no problem
Local
▪ require custom code
▪ reading - doesn’t scale
▪ writing - fast, local operation
▪ low cardinality = wide local
partitions
▪ high cardinality = too many lookups

Secondary Indexes
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
CREATE INDEX ON base_table(v);
v | token | p | c
---+-------+---+---
8 | 0x123 | 0 | 1
CREATE INDEX ON base_table(c);
c | token | p
---+-------+---
1 | 0x123 | 0
SELECT * FROM base_table WHERE v = 8;
SELECT * FROM base_table WHERE c = 1;
Global secondary indexes
▪ Receive a query that may need indexes
▪ Check whether a matching index exists
▪ Execute the index query, retrieve matching base primary keys
▪ Execute the base query using mentioned primary keys
▪ Return query results
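The steps above can be sketched with plain dictionaries. This is an illustrative model only: `base_table`, `indexes`, `build_index` and `select_where` are hypothetical names, and a real index also stores the token and lives on other nodes.

```python
# Hypothetical in-memory base table: (p, c) -> v
base_table = {(0, 1): 8, (0, 2): 5, (3, 1): 8}


def build_index(table):
    """Map each indexed value to the base primary keys that hold it."""
    index = {}
    for key, value in table.items():
        index.setdefault(value, []).append(key)
    return index


# One entry per indexed column; only v is indexed in this sketch.
indexes = {"v": build_index(base_table)}


def select_where(column, value):
    # Check whether a matching index exists.
    if column not in indexes:
        raise LookupError(f"no index on {column}")
    # Step 1: index query, returns matching base primary keys.
    keys = indexes[column].get(value, [])
    # Step 2: base query using those primary keys.
    return [{"p": p, "c": c, "v": base_table[(p, c)]} for (p, c) in keys]
```

The second step is what makes an indexed read a two-step query, in contrast to reading a materialized view directly.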

Secondary index paging
▪ Rows in the index table are small
• only the indexed column, base primary keys and token are stored, all with size limits
• it’s near impossible for 100 index rows to hit the query size limit
▪ Base rows may be much bigger
• even a single row may exceed the query size limit
• not to mention 100 of them
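A rough sketch of why the base read may come back short: the page of keys fits easily, but the rows behind them may not. The function and its parameters are hypothetical and only model the size limit; they are not Scylla's paging implementation.

```python
def read_base_page(keys, row_sizes, size_limit):
    """Fetch base rows for a page of index keys until the size limit is hit.

    Returns (fetched_keys, short_read). short_read is True when the page
    was cut before all requested keys were served.
    """
    fetched = []
    used = 0
    for key in keys:
        size = row_sizes[key]
        # Always return at least one row so that the query makes progress,
        # even if that single row alone exceeds the limit.
        if fetched and used + size > size_limit:
            return fetched, True
        fetched.append(key)
        used += size
    return fetched, False
```

After a short read, the remaining keys of the page still have to be queried before the next index page is fetched.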

Secondary Index Paging
[Diagram: C, I and B exchanging messages. Labels: page_size=100; 100 keys; page_size=100, allow_short_read; 100 rows]

Secondary Index Paging
[Diagram: C, I and B exchanging messages. Labels: page_size=100; 100 keys; page_size=100, allow_short_read; 3 rows, short_read=true]

Secondary Indexes vs Materialized Views
Secondary indexes
▪ transparent - the same table is used for querying
▪ may be more efficient with storage
▪ creating/deleting them is easier and more straightforward
▪ can cooperate with filtering
▪ uses 2-step query to join results
Materialized views
▪ querying doesn’t involve two steps, which influences performance
▪ more flexible with primary keys and complicated schemas
▪ denormalizes existing data

Filtering
> SELECT * FROM base_table WHERE v = 8;
Cannot execute this query as it might involve data filtering and thus may have unpredictable
performance. If you want to execute this query despite the performance unpredictability, use
ALLOW FILTERING.

Filtering
▪ Query restrictions that may need filtering
• non-key fields (WHERE v = 1)
• parts of primary keys that are not prefixes (WHERE pk = 1 and c2 = 3)
• partition keys with something other than an equality relation (WHERE pk >= 1)
• clustering keys with a range restriction followed by other conditions
(WHERE pk = 1 and c1 > 2 and c2 = 3)
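The four cases above can be approximated by a small classifier. This is a simplified model written for illustration: `needs_filtering` is a hypothetical helper, restrictions are reduced to a column-to-operator mapping, and real CQL has more cases (IN, token(), multi-column relations, indexes) that are ignored here.

```python
def needs_filtering(partition_key, clustering_key, restrictions):
    """Decide whether a WHERE clause needs ALLOW FILTERING (simplified).

    restrictions maps a column name to its operator, e.g. {"pk": "=", "c1": ">"}.
    """
    key_columns = set(partition_key) | set(clustering_key)
    # Case 1: any restriction on a non-key column.
    if any(col not in key_columns for col in restrictions):
        return True
    # Case 3: partition key restricted by something other than equality.
    pk_restricted = [col for col in partition_key if col in restrictions]
    if any(restrictions[col] != "=" for col in pk_restricted):
        return True
    ck_restricted = [col for col in clustering_key if col in restrictions]
    # Partition not fully specified: any key restriction implies a scan.
    if len(pk_restricted) != len(partition_key):
        return bool(pk_restricted or ck_restricted)
    # Cases 2 and 4: clustering restrictions must form a prefix, and nothing
    # may follow a range restriction.
    prefix_closed = False
    for col in clustering_key:
        if col not in restrictions:
            prefix_closed = True
        elif prefix_closed:
            return True
        elif restrictions[col] != "=":
            prefix_closed = True
    return False
```

With `partition_key=["pk"]` and `clustering_key=["c1", "c2"]`, each example restriction from the list above makes the function return `True`, while `{"pk": "=", "c1": "=", "c2": ">"}` does not.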

Coordinator-side filtering
▪ Coordinator node retrieves all data from nodes
▪ Filtering is applied
▪ Only matching rows are returned to the client
▪ Easily extensible with optimizations (pre-filtering on data nodes
can be implemented and added)

Query selectivity
Low selectivity queries:
▪ return a large share of the rows (e.g. 70%)
▪ good candidate for filtering
High selectivity queries:
▪ return only a few rows (e.g. 1)
▪ bad candidate for filtering
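A quick way to see the difference is to compare how many rows a filter has to read with how many it returns. The sketch below uses made-up data and a hypothetical `matched_fraction` helper.

```python
def matched_fraction(rows, predicate):
    """Fraction of scanned rows that the filter actually returns."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if predicate(row)) / len(rows)


# Made-up table with 1000 rows; v cycles through 0..9.
rows = [{"id": i, "v": i % 10} for i in range(1000)]

# Low selectivity: most of the scanned rows are returned.
low = matched_fraction(rows, lambda row: row["v"] < 7)

# High selectivity: all 1000 rows are scanned to return a single one.
high = matched_fraction(rows, lambda row: row["id"] == 42)
```

In the second case almost all of the read work is thrown away, which is why an index or a view fits such queries better than filtering.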

Filtering alternatives
▪ Materialized views and their alternatives
▪ Secondary indexes and their alternatives

Combining filtering with indexes
CREATE TABLE t (p int, c1 int, c2 int, v1 int, v2 int, PRIMARY KEY(p, c1, c2));
CREATE INDEX ON t(c2);
▪ SELECT * FROM t WHERE c2 = 3 and v1 = 1 and v2 = 3 ALLOW FILTERING;
• extract rows using index on c2
• filter rows that match (v1 == 1 and v2 == 3)
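The two steps can be sketched as an index lookup followed by an in-memory filter. The rows, `index_on_c2` and `select` below are hypothetical and only model the order of operations.

```python
# Made-up contents of table t.
rows = [
    {"p": 0, "c1": 0, "c2": 3, "v1": 1, "v2": 3},
    {"p": 0, "c1": 1, "c2": 3, "v1": 2, "v2": 3},
    {"p": 1, "c1": 0, "c2": 4, "v1": 1, "v2": 3},
]

# Toy index on c2: indexed value -> positions of matching rows.
index_on_c2 = {}
for position, row in enumerate(rows):
    index_on_c2.setdefault(row["c2"], []).append(position)


def select(c2, v1, v2):
    # Step 1: extract candidate rows using the index on c2.
    candidates = [rows[i] for i in index_on_c2.get(c2, [])]
    # Step 2: filter the candidates on the remaining restrictions.
    return [row for row in candidates if row["v1"] == v1 and row["v2"] == v2]
```

Only rows that the index returns are ever filtered, so the cost of filtering depends on how selective the indexed restriction is.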

Multiple indexing
CREATE INDEX ON t(v1);
• extract rows using index on c2
• filter rows that match (v1 == 1 and v2 == 3)

Key prefix optimizations
▪ SELECT * FROM t WHERE p = 0 and c1 = 1 and v2 = 7 ALLOW FILTERING;
• extract rows only from partition p=0 and sliced by c1=1
• filter rows that match (v2 = 7)

Key prefix optimizations
▪ SELECT * FROM t WHERE p = 0 and c1 = 1 and v1 = 7 ALLOW FILTERING;
• extract rows from index v1, including p=0 and c1=1 in index query restrictions
• no filtering needed!

Future: selectivity statistics
Having selectivity statistics for every index would help with:
▪ identifying data model problems
• was indexing the right choice for the use case? Would filtering fit better?
▪ choosing the best index to query from in multiple index queries
• one index is used to retrieve results from the base replica
• remaining restrictions are filtered
• which combination is the best?

Conclusions
▪ As of 3.0, the following features are going GA:
• materialized views
• secondary indexes
• filtering support

Future plans
▪ MV repair
▪ optimized multi-index support
▪ add replica-side filtering optimizations
▪ as always - optimize even further

Thank You
Any Questions?
Please stay in touch
sarna@scylladb.com
