This document summarizes a presentation about materialized views, secondary indexes, and filtering in ScyllaDB. Materialized views allow querying data by non-primary key columns through automatic denormalization. Secondary indexes provide an alternative through global indexes. Filtering queries that don't use the primary key are now supported with the ALLOW FILTERING option. The presentation covered how these features work, consistency models, and combining indexes with filtering for optimized queries. Future work includes improving materialized view repair and adding selectivity statistics.
2. Presenter bio
Piotr is a software engineer very keen on open-source projects
and C++. He previously developed an open-source distributed
file system and had a brief adventure with the Linux kernel during
an apprenticeship at Samsung Electronics. Piotr graduated from
the University of Warsaw with an MSc in Computer Science.
10. Before Materialized Views
CREATE TABLE t (p int, c1 int, c2 int, v int, PRIMARY KEY (p, c1, c2));
▪ Querying for a regular column v:
• CREATE TABLE t2 (v int, p int, c1 int, c2 int, PRIMARY KEY(v, p, c1, c2));
• SELECT * FROM t2 WHERE v = 7;
▪ Querying for a non-prefix part of the primary key:
• CREATE TABLE t2 (c1 int, p int, c2 int, PRIMARY KEY(c1, p, c2));
• SELECT * FROM t2 WHERE c1 = 7;
11. Before Materialized Views
▪ Manual denormalization - problems
• updating the base table may require read-before-write
• there may be multiple denormalization tables for a table
• what if one of the writes fails?
• what if somebody forgets to write to one of the denormalized parts?
12. Read before write
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
CREATE TABLE denormalized (
v int,
p int,
c int,
PRIMARY KEY (v, p, c)
);
13. Read before write
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
CREATE TABLE denormalized (
v int,
p int,
c int,
PRIMARY KEY (v, p, c)
);
v | p | c
---+---+---
8 | 0 | 1
14. Read before write
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
UPDATE base_table
SET v = 9
WHERE p = 0 AND c = 1;
CREATE TABLE denormalized (
v int,
p int,
c int,
PRIMARY KEY (v, p, c)
);
v | p | c
---+---+---
8 | 0 | 1
DELETE FROM denormalized
WHERE v = 8; -- how do we know it’s 8?
INSERT INTO denormalized (v, p, c)
VALUES (9, 0, 1);
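The read-before-write problem above can be sketched with a toy in-memory model. This is only an illustration: the dict and set stand in for the two CQL tables, and the helper name update_base is made up for this sketch.

```python
# Toy in-memory stand-ins for the two tables (assumption: not real CQL).
base_table = {}       # (p, c) -> v
denormalized = set()  # rows stored as (v, p, c)


def update_base(p, c, new_v):
    """Set v in the base table and keep the denormalized table in sync."""
    # Read-before-write: the old v is part of the denormalized row's key,
    # so it must be read before the stale row can be deleted.
    old_v = base_table.get((p, c))
    if old_v is not None:
        denormalized.discard((old_v, p, c))
    base_table[(p, c)] = new_v
    denormalized.add((new_v, p, c))


update_base(0, 1, 8)  # initial insert: no stale row to remove
update_base(0, 1, 9)  # update: the row keyed by v = 8 must go first
print(sorted(denormalized))  # [(9, 0, 1)]
```

Without the read of old_v, the row (8, 0, 1) would stay in the denormalized table forever.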
15. Materialized Views
▪ Let Scylla denormalize a table for you
• view updates are generated automatically and transparently
• read-before-write is performed when needed
• useful statistics are exposed
16. Materialized Views
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
CREATE MATERIALIZED VIEW
view_table AS
SELECT * FROM base_table
WHERE v IS NOT NULL
AND p IS NOT NULL
AND c IS NOT NULL
PRIMARY KEY(v, p, c);
v | p | c
---+---+---
8 | 0 | 1
17. Materialized Views
▪ Materialized view’s partition key can be a subset of primary key
parts and/or a regular column
• currently limited to a single regular column
• the whole base primary key must be included in view’s primary key
• all primary key fields must be restricted with IS NOT NULL
▪ Each table is allowed to have multiple views
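The key rules above can be written down as a small check. This is a hedged sketch, not Scylla's validation code: view_key_problems is a hypothetical helper that only looks at column names.

```python
def view_key_problems(base_key, view_key):
    """Return rule violations for a proposed view primary key ([] means OK)."""
    problems = []
    # Rule: the whole base primary key must be part of the view's primary key.
    missing = [col for col in base_key if col not in view_key]
    if missing:
        problems.append("missing base key columns: " + ", ".join(missing))
    # Rule: at most one regular (non-key) base column may join the view key.
    regular = [col for col in view_key if col not in base_key]
    if len(regular) > 1:
        problems.append("more than one regular column: " + ", ".join(regular))
    return problems


print(view_key_problems(["p", "c"], ["v", "p", "c"]))  # valid key
print(view_key_problems(["p", "c"], ["v", "p"]))       # c is missing
```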
18. More examples
CREATE TABLE base_table (
p int,
c int,
v1 int,
v2 int,
v3 int,
v4 int,
v5 int,
PRIMARY KEY (p, c)
);
p | c | v1 | v2 | v3 | v4 | v5
---+---+----+----+----+----+----
0 | 1 | 8 | 9 | 10 | 11 | 12
CREATE MATERIALIZED VIEW
view_table AS
SELECT c, p FROM base_table
WHERE c IS NOT NULL
AND p IS NOT NULL
PRIMARY KEY(c, p);
c | p
---+---
1 | 0
19. More examples
CREATE TABLE base_table (
p1 int,
p2 int,
c1 int,
c2 int,
v1 int,
v2 int,
v3 int,
PRIMARY KEY ((p1, p2), c1, c2)
);
p1 | p2 | c1 | c2 | v1 | v2 | v3
----+----+----+----+----+----+----
0 | 1 | 2 | 3 | 8 | 9 | 10
CREATE MATERIALIZED VIEW
view_table AS
SELECT c2, p1, p2, c1, v2 FROM
base_table
WHERE c2 IS NOT NULL
AND p1 IS NOT NULL
AND p2 IS NOT NULL
AND c1 IS NOT NULL
PRIMARY KEY(c2, p1, p2, c1);
c2 | p1 | p2 | c1 | v2
----+----+----+----+----
3 | 0 | 1 | 2 | 9
20. Challenges
▪ View rows must be eventually consistent with their base counterparts
• all updates should be propagated - inserts, updates, deletes
• updates should not be lost in case of temporary failures/restarts
▪ Cluster must not be overloaded with mv updates - backpressure
• each base write may trigger multiple independent updates
• so can streaming
▪ Views created on an existing table should fill themselves
with existing base data - view building
26. Hinted handoff for materialized views
▪ Failed updates are stored on base node as hints
▪ They will be resent once the paired node is available
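A minimal sketch of the idea, assuming an in-memory queue and hypothetical class names (BaseReplica, FakeViewNode); a real node would persist its hints rather than keep them in memory.

```python
from collections import deque


class FakeViewNode:
    """Hypothetical stand-in for the paired view replica."""

    def __init__(self):
        self.up = True
        self.received = []

    def receive(self, update):
        if self.up:
            self.received.append(update)
        return self.up  # True means the update was accepted


class BaseReplica:
    def __init__(self, send):
        self.send = send      # callable(update) -> bool
        self.hints = deque()  # failed view updates, in arrival order

    def push_view_update(self, update):
        # A failed update is not dropped; it is kept as a hint.
        if not self.send(update):
            self.hints.append(update)

    def replay_hints(self):
        # Resend in order and stop at the first failure.
        while self.hints and self.send(self.hints[0]):
            self.hints.popleft()


view_node = FakeViewNode()
base = BaseReplica(view_node.receive)
view_node.up = False
base.push_view_update("row (9, 0, 1)")  # stored as a hint
view_node.up = True
base.replay_hints()                     # delivered now
```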
27. View building
▪ Views created on existing tables will be incrementally built from
existing data
▪ Progress can be tracked via system tables:
• system.views_builds_in_progress
• system.built_views
28. Backpressure
▪ a single user write can trigger multiple mv updates
▪ backpressure prevents overloading the cluster with them
• base replicas report their load to the coordinator
• coordinator is allowed to delay serving new user writes to lower the pressure
▪ there’s a whole presentation about the topic by Nadav Har’El
Public design document: https://docs.google.com/document/d/1J6GeLBvN8_c3SbLVp8YsOXHcLc9nOLlRY7pC6MH3JWo
29. Streaming
▪ Efficient way of sending data from one node to another
▪ Moves data directly to sstables of the target node,
bypassing the full write path
▪ Used under the hood of several cluster operations, e.g.:
• bootstrap
• repair
• rebuild
30. Streaming
▪ In some cases, streamed data should generate materialized view
updates to ensure consistency
• unconditionally during node repair
• when the view is not yet completely built
▪ Affected sstables are stored and used to generate MV updates
32. Before Secondary Indexes
▪ Searching on non-partition columns
• full table scan + client-side filtering
• schema redesign + manual denormalization
• using materialized views
33. Secondary Indexes
Global
▪ based on materialized views
▪ reading - scalable
▪ writing - distributed
▪ low cardinality = wide partitions
▪ high cardinality = no problem
Local
▪ require custom code
▪ reading - doesn’t scale
▪ writing - fast, local operation
▪ low cardinality = wide local
partitions
▪ high cardinality = too many lookups
34. Secondary Indexes
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
CREATE INDEX ON base_table(v);
v | token | p | c
---+-------+---+---
8 | 0x123 | 0 | 1
CREATE INDEX ON base_table(c);
c | token | p
---+-------+---
1 | 0x123 | 0
35. Secondary Indexes
CREATE TABLE base_table (
p int,
c int,
v int,
PRIMARY KEY (p, c)
);
p | c | v
---+---+---
0 | 1 | 8
SELECT * FROM base_table WHERE v = 8;
SELECT * FROM base_table WHERE c = 1;
CREATE INDEX ON base_table(v);
v | token | p | c
---+-------+---+---
8 | 0x123 | 0 | 1
CREATE INDEX ON base_table(c);
c | token | p
---+-------+---
1 | 0x123 | 0
37. Global secondary indexes
▪ Receive a query that may need indexes
▪ Check whether a matching index exists
▪ Execute the index query, retrieve matching base primary keys
▪ Execute the base query using mentioned primary keys
▪ Return query results
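The two-step flow above can be sketched with plain dicts. This is a simplified model: the token column of the index table is left out, and select_where_v is a made-up name.

```python
# Base table: (p, c) -> v. The dict is an in-memory stand-in for the table.
base_table = {(0, 1): 8, (0, 2): 5, (3, 4): 8}

# Index "table" on v: indexed value -> list of base primary keys.
index_on_v = {}
for key, v in base_table.items():
    index_on_v.setdefault(v, []).append(key)


def select_where_v(value):
    """Model of SELECT * FROM base_table WHERE v = value."""
    # Step 1: the index query returns matching base primary keys.
    keys = index_on_v.get(value, [])
    # Step 2: the base query fetches full rows for those keys.
    return [(p, c, base_table[(p, c)]) for (p, c) in keys]


print(select_where_v(8))
```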
38. Secondary index paging
▪ Rows in the index table are small
• only the indexed column, base primary keys and token are stored, all with size limits
• it’s near impossible for 100 index rows to hit the query size limit
▪ Base rows may be much bigger
• even a single row may exceed the query size limit
• not to mention 100 of them
41. Secondary Indexes vs Materialized Views
Secondary Indexes
▪ transparent - the same table is
used for querying
▪ may be more efficient with
storage
▪ creating/deleting them is easier
and more straightforward
▪ can cooperate with filtering
▪ uses 2-step query to join results
Materialized Views
▪ querying doesn’t involve two steps,
which influences performance
▪ more flexible with primary keys
and complicated schemas
▪ denormalizes existing data
43. Filtering
> SELECT * FROM base_table WHERE v = 8;
Cannot execute this query as it might involve data filtering and thus may have unpredictable
performance. If you want to execute this query despite the performance unpredictability, use
ALLOW FILTERING.
44. Filtering
▪ Query restrictions that may need filtering
• non-key fields (WHERE v = 1)
• parts of primary keys that are not prefixes (WHERE pk = 1 and c2 = 3)
• partition keys with something other than an equality relation (WHERE pk >= 1)
• clustering keys with a range restriction and then by other conditions
(WHERE pk = 1 and c1 > 2 and c2 = 3)
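The four cases above can be approximated by a small classifier. This is a simplified model under stated assumptions: needs_filtering is a hypothetical helper, restrictions are a dict of column to operator, and IN, token() and secondary indexes are ignored.

```python
def needs_filtering(partition_key, clustering_key, restrictions):
    """Rough check whether a WHERE clause would need ALLOW FILTERING.

    restrictions maps a column name to its operator, e.g. {"pk": "=", "c1": ">"}.
    """
    key_columns = set(partition_key) | set(clustering_key)
    # Case 1: a non-key column is restricted.
    if any(col not in key_columns for col in restrictions):
        return True
    # Case 3: partition key columns accept only equality, and all or none.
    pk = [col for col in partition_key if col in restrictions]
    if any(restrictions[col] != "=" for col in pk):
        return True
    if pk and len(pk) != len(partition_key):
        return True
    ck = [col for col in clustering_key if col in restrictions]
    if ck and not pk:
        return True
    # Case 2: restricted clustering columns must form a prefix of the key.
    if ck != list(clustering_key[:len(ck)]):
        return True
    # Case 4: only the last restricted clustering column may be a range.
    return any(restrictions[col] != "=" for col in ck[:-1])


print(needs_filtering(["pk"], ["c1", "c2"], {"pk": "=", "c2": "="}))
```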
45. Coordinator-side filtering
▪ Coordinator node retrieves all data from nodes
▪ Filtering is applied
▪ Only matching rows are returned to the client
▪ Easily extensible with optimizations (pre-filtering on data nodes
can be implemented and added)
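A minimal sketch of the steps above, with lists of dicts standing in for the data held by each node; coordinator_filter is a made-up name and no networking is modeled.

```python
def coordinator_filter(replica_results, predicate):
    """Gather rows from every node, then filter them on the coordinator."""
    # Step 1: the coordinator retrieves all data from the nodes.
    fetched = [row for rows in replica_results for row in rows]
    # Steps 2 and 3: filtering is applied; only matches go to the client.
    return [row for row in fetched if predicate(row)]


node_a = [{"p": 0, "c": 1, "v": 8}, {"p": 0, "c": 2, "v": 5}]
node_b = [{"p": 3, "c": 4, "v": 8}]
matching = coordinator_filter([node_a, node_b], lambda row: row["v"] == 8)
print(matching)
```

Pre-filtering on data nodes would move the predicate into step 1, so fewer rows travel to the coordinator.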
47. Query selectivity
Low selectivity queries:
▪ return almost all rows (e.g. 70%)
▪ good candidate for filtering
High selectivity queries:
▪ return only a few rows (e.g. 1)
▪ bad candidate for filtering
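One way to see why, sketched with a hypothetical helper returned_fraction: filtering reads every row, so the share of rows that survive the filter shows how much of that work was useful.

```python
def returned_fraction(rows, predicate):
    """Fraction of scanned rows that the filter lets through."""
    if not rows:
        return 0.0
    matched = sum(1 for row in rows if predicate(row))
    return matched / len(rows)


# Ten rows: seven have v = 8, the rest have distinct values.
values = [8, 8, 8, 8, 8, 8, 8, 1, 2, 3]

# Low selectivity: most of the scan is useful work.
print(returned_fraction(values, lambda v: v == 8))  # 0.7
# High selectivity: 90% of the scanned rows are thrown away.
print(returned_fraction(values, lambda v: v == 1))  # 0.1
```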
51. Combining filtering with indexes
CREATE TABLE t (p int, c1 int, c2 int, v1 int, v2 int, PRIMARY KEY(p, c1, c2));
CREATE INDEX ON t(c2);
▪ SELECT * FROM t WHERE c2 = 3 and v1 = 1 and v2 = 3 ALLOW FILTERING;
• extract rows using index on c2
• filter rows that match (v1 == 1 and v2 == 3)
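The two bullets above can be sketched in memory. This is a toy model with made-up data; the dicts stand in for table t and its index on c2, and select is a hypothetical helper.

```python
# Table t: (p, c1, c2) -> regular columns.
rows = {
    (0, 1, 3): {"v1": 1, "v2": 3},
    (0, 2, 3): {"v1": 1, "v2": 4},
    (1, 1, 3): {"v1": 2, "v2": 3},
    (1, 1, 5): {"v1": 1, "v2": 3},
}

# Index on c2: c2 value -> base primary keys.
index_on_c2 = {}
for key in rows:
    index_on_c2.setdefault(key[2], []).append(key)


def select(c2, v1, v2):
    """Model of WHERE c2 = ? and v1 = ? and v2 = ? ALLOW FILTERING."""
    # The index narrows the candidates to rows with a matching c2 ...
    candidates = index_on_c2.get(c2, [])
    # ... and the remaining restrictions are applied as a filter.
    return [k for k in candidates if rows[k]["v1"] == v1 and rows[k]["v2"] == v2]


print(select(3, 1, 3))  # [(0, 1, 3)]
```

Only the three rows with c2 = 3 are read and filtered; the fourth row is never touched.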
53. Multiple indexing
CREATE TABLE t (p int, c1 int, c2 int, v1 int, v2 int, PRIMARY KEY(p, c1, c2));
CREATE INDEX ON t(c2);
CREATE INDEX ON t(v1);
▪ SELECT * FROM t WHERE c2 = 3 and v1 = 1 and v2 = 3 ALLOW FILTERING;
• extract rows using index on c2
• filter rows that match (v1 == 1 and v2 == 3)
54. Key prefix optimizations
CREATE TABLE t (p int, c1 int, c2 int, v1 int, v2 int, PRIMARY KEY(p, c1, c2));
CREATE INDEX ON t(c2);
CREATE INDEX ON t(v1);
▪ SELECT * FROM t WHERE p = 0 and c1 = 1 and v2 = 7 ALLOW FILTERING;
• extract rows only from partition p=0 and sliced by c1=1
• filter rows that match (v2 = 7)
55. Key prefix optimizations
CREATE TABLE t (p int, c1 int, c2 int, v1 int, v2 int, PRIMARY KEY(p, c1, c2));
CREATE INDEX ON t(c2);
CREATE INDEX ON t(v1);
▪ SELECT * FROM t WHERE p = 0 and c1 = 1 and v1 = 7 ALLOW FILTERING;
• extract rows from index v1, including p=0 and c1=1 in index query restrictions
• no filtering needed!
56. Future: selectivity statistics
Having selectivity statistics for every index would help with:
▪ identifying data model problems
• was indexing the right choice for the use case? Would filtering fit better?
▪ choosing the best index to query from in multiple index queries
• one index is used to retrieve results from the base replica
• remaining restrictions are filtered
• which combination is the best?